url_loader
langroid/parsing/url_loader.py
BaseCrawlerConfig
Bases: BaseSettings
Base configuration for web crawlers.
TrafilaturaConfig
FirecrawlConfig
BaseCrawler(config)
Bases: ABC
Abstract base class for web crawlers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | BaseCrawlerConfig | Configuration for the crawler | required |
needs_parser (abstractmethod, property)
Indicates whether the crawler requires a parser.
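The abstract-property pattern described here can be sketched in isolation. This is a minimal illustration, not the library's source: `SketchCrawler`, `RawHtmlCrawler`, and `CleanTextCrawler` are hypothetical stand-ins, and which real crawlers return `True` or `False` is not stated on this page.

```python
from abc import ABC, abstractmethod


class SketchCrawler(ABC):
    """Hypothetical stand-in for an abstract crawler base class."""

    @property
    @abstractmethod
    def needs_parser(self) -> bool:
        """Whether the crawler needs a parser to post-process its output."""


class RawHtmlCrawler(SketchCrawler):
    """Hypothetical crawler that fetches raw content, so it needs a parser."""

    @property
    def needs_parser(self) -> bool:
        return True


class CleanTextCrawler(SketchCrawler):
    """Hypothetical crawler whose backend already returns clean text."""

    @property
    def needs_parser(self) -> bool:
        return False
```

Because the property is abstract, instantiating `SketchCrawler` directly raises `TypeError`; every concrete subclass has to state whether it needs a parser.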
CrawlerFactory
Factory for creating web crawlers.
create_crawler(config) (staticmethod)
Create a crawler instance based on configuration type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | BaseCrawlerConfig | Configuration for the crawler | required |
Returns:
Type | Description |
---|---|
BaseCrawler | A BaseCrawler instance |
Raises:
Type | Description |
---|---|
ValueError | If config type is not supported |
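A factory that selects a crawler from the configuration type, and raises `ValueError` otherwise, might look roughly like the sketch below. All class names and fields here (`SketchConfig`, `FastConfig`, `RemoteConfig`, `threads`, `api_key`) are hypothetical stand-ins chosen for illustration; the actual dispatch logic in `CrawlerFactory` is not shown on this page and may differ.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a base crawler configuration."""


@dataclass
class FastConfig(SketchConfig):
    threads: int = 2  # hypothetical field


@dataclass
class RemoteConfig(SketchConfig):
    api_key: str = ""  # hypothetical field


class SketchCrawler:
    """Hypothetical crawler base that just stores its configuration."""

    def __init__(self, config: SketchConfig) -> None:
        self.config = config


class FastCrawler(SketchCrawler):
    pass


class RemoteCrawler(SketchCrawler):
    pass


class SketchFactory:
    @staticmethod
    def create_crawler(config: SketchConfig) -> SketchCrawler:
        # Dispatch on the concrete configuration type.
        if isinstance(config, FastConfig):
            return FastCrawler(config)
        if isinstance(config, RemoteConfig):
            return RemoteCrawler(config)
        # Anything else, including the bare base config, is unsupported.
        raise ValueError(
            f"Unsupported crawler configuration type: {type(config).__name__}"
        )
```

With this shape, callers only construct a configuration object; the choice of crawler class stays in one place, and adding a backend means adding one config class and one branch.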
TrafilaturaCrawler(config)
Bases: BaseCrawler
Crawler implementation using Trafilatura.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | TrafilaturaConfig | Configuration for the crawler | required |
FirecrawlCrawler(config)
Bases: BaseCrawler
Crawler implementation using Firecrawl.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | FirecrawlConfig | Configuration for the crawler | required |
ExaCrawler(config)
Bases: BaseCrawler
Crawler implementation using Exa API.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | ExaCrawlerConfig | Configuration for the crawler | required |
crawl(urls)
Crawl the given URLs using Exa SDK.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
urls | List[str] | List of URLs to crawl | required |
Returns:
Type | Description |
---|---|
List[Document] | List of Documents with content extracted from the URLs |
Raises:
Type | Description |
---|---|
LangroidImportError | If the exa package is not installed |
ValueError | If the Exa API key is not set |
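The two failure modes listed above (missing package, missing API key) follow a common guard pattern, sketched here without calling any real SDK. Everything in this block is an assumption made for illustration: `MissingDependencyError` stands in for `LangroidImportError`, the module name `exa_py` and the environment variable `EXA_API_KEY` are guesses rather than values confirmed by this page, and the order of the two checks is arbitrary.

```python
import importlib
import os
from typing import List


class MissingDependencyError(ImportError):
    """Hypothetical stand-in for a library-specific import error."""


def crawl_sketch(
    urls: List[str],
    sdk_module: str = "exa_py",    # assumed module name
    key_env: str = "EXA_API_KEY",  # assumed environment variable
) -> List[str]:
    # Guard 1: the optional SDK must be importable.
    try:
        importlib.import_module(sdk_module)
    except ImportError as exc:
        raise MissingDependencyError(
            f"Optional dependency '{sdk_module}' is not installed"
        ) from exc

    # Guard 2: an API key must be configured.
    if not os.environ.get(key_env):
        raise ValueError(f"API key not set: define {key_env}")

    # A real crawler would call the SDK here; this sketch only
    # returns one placeholder string per URL.
    return [f"<content of {url}>" for url in urls]
```

Checking both preconditions before any network call means a misconfigured environment fails fast with a specific exception instead of a partial crawl.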
URLLoader(urls, parsing_config=ParsingConfig(), crawler_config=None)
Loads URLs and extracts text using a specified crawler.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
urls | List[Any] | List of URLs to load | required |
parsing_config | ParsingConfig | Configuration for parsing | ParsingConfig() |
crawler_config | Optional[BaseCrawlerConfig] | Configuration for the crawler | None |