Using the LLM-based PDF Parser

  • Converts PDF content into Markdown format using multimodal models.
  • Uses multimodal models to describe images within PDFs.
  • Supports page-wise or chunk-based processing for optimized performance.

Initializing the LLM-based PDF Parser

Make sure you have set up your API key for whichever model you specify in model_name below.

You can initialize the LLM PDF parser as follows:

from langroid.parsing.parser import (
    ParsingConfig,
    PdfParsingConfig,
    LLMPdfParserConfig,
)

parsing_config = ParsingConfig(
    n_neighbor_ids=2,
    pdf=PdfParsingConfig(
        library="llm-pdf-parser",
        llm_parser_config=LLMPdfParserConfig(
            model_name="gemini/gemini-2.0-flash",
            split_on_page=True,
            max_tokens=7000,
            requests_per_minute=5,
            timeout=60,  # increase this for large documents
        ),
    ),
)
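
Once defined, the parsing_config can be attached to a document-chat agent. A minimal sketch, assuming Langroid's DocChatAgentConfig exposes parsing settings via its parsing field (verify the field and method names against your installed version):

```python
from langroid.agent.special.doc_chat_agent import DocChatAgent, DocChatAgentConfig
from langroid.parsing.parser import (
    ParsingConfig,
    PdfParsingConfig,
    LLMPdfParserConfig,
)

parsing_config = ParsingConfig(
    n_neighbor_ids=2,
    pdf=PdfParsingConfig(
        library="llm-pdf-parser",
        llm_parser_config=LLMPdfParserConfig(
            model_name="gemini/gemini-2.0-flash",
            split_on_page=True,
            max_tokens=7000,
            requests_per_minute=5,
            timeout=60,
        ),
    ),
)

# Assumption: the `parsing` field carries the ParsingConfig; the file path
# "my_document.pdf" is a hypothetical placeholder.
agent_config = DocChatAgentConfig(parsing=parsing_config)
agent = DocChatAgent(agent_config)
agent.ingest_doc_paths(["my_document.pdf"])
```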

Parameters

model_name

Specifies the model to use for PDF conversion. Default: gemini/gemini-2.0-flash.

max_tokens

Limits the number of tokens in the input.

  • The model's output limit is 8192 tokens.
  • Default: 7000 tokens (leaving room for generated captions).
  • Optional parameter.

split_on_page

Determines whether to process the document page by page.

  • Default: True.
  • If set to False, the parser creates chunks based on max_tokens while respecting page boundaries.
  • When False, the parser sends chunks containing multiple pages (e.g., pages [11, 12, 13, 14, 15]).
  • Advantages of False:
      • Reduces the number of API calls to the LLM.
      • Lowers token usage, since system prompts are not repeated per page.
  • Disadvantages of False:
      • You do not get per-page splitting; each group of pages is returned as a single "page".
  • If your use case does not require strict page-by-page parsing, consider setting this to False.
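
When split_on_page is False, the grouping behavior can be pictured as greedy packing of consecutive pages under the max_tokens budget. A standalone sketch of that idea (an illustration, not Langroid's actual implementation):

```python
def group_pages(page_token_counts: list[int], max_tokens: int) -> list[list[int]]:
    """Greedily pack consecutive pages into chunks whose total token
    count stays within max_tokens, never splitting a single page."""
    chunks: list[list[int]] = []
    current: list[int] = []
    current_tokens = 0
    for page_no, n_tokens in enumerate(page_token_counts, start=1):
        # Start a new chunk if adding this page would exceed the budget.
        if current and current_tokens + n_tokens > max_tokens:
            chunks.append(current)
            current, current_tokens = [], 0
        current.append(page_no)
        current_tokens += n_tokens
    if current:
        chunks.append(current)
    return chunks

# Five pages of ~2000 tokens each, with a 7000-token budget:
print(group_pages([2000] * 5, 7000))  # → [[1, 2, 3], [4, 5]]
```

Each inner list is one request to the LLM, which is why fewer, larger chunks reduce both API calls and repeated system-prompt tokens.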

requests_per_minute

Limits API request frequency to avoid rate limits.

  • If you encounter rate-limit errors, set this to 1 or 2.
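
The effect of requests_per_minute is simply to space out calls. A minimal standalone sketch of the idea (not Langroid's actual throttling code):

```python
import time

class RateLimiter:
    """Enforce a minimum interval between calls, derived from
    a requests-per-minute budget."""

    def __init__(self, requests_per_minute: int):
        # e.g. 5 requests/minute -> one request every 12 seconds
        self.interval = 60.0 / requests_per_minute
        self.last_call = 0.0

    def wait(self) -> None:
        """Block just long enough to keep calls under the budget."""
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self.last_call = time.monotonic()

limiter = RateLimiter(requests_per_minute=5)
# Call limiter.wait() before each LLM request.
```

Lowering requests_per_minute to 1 or 2 widens this interval, which trades throughput for staying safely under a provider's rate limit.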