Using the LLM-based PDF Parser

  • Converts PDF content into Markdown format using multimodal models.
  • Uses multimodal models to describe images within PDFs.
  • Supports page-wise or chunk-based processing for optimized performance.

Initializing the LLM-based PDF Parser

Make sure you have set up your API key for whichever model you specify in model_name below.

You can initialize the LLM PDF parser as follows:

from langroid.parsing.parser import (
    ParsingConfig,
    PdfParsingConfig,
    LLMPdfParserConfig,
)

parsing_config = ParsingConfig(
    n_neighbor_ids=2,
    pdf=PdfParsingConfig(
        library="llm-pdf-parser",
        llm_parser_config=LLMPdfParserConfig(
            model_name="gemini/gemini-2.0-flash",
            split_on_page=True,
            max_tokens=7000,
            requests_per_minute=5,
            timeout=60,  # increase this for large documents
        ),
    ),
)
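
Once defined, the parsing_config can be attached to a document-chat agent. A minimal sketch, assuming Langroid's DocChatAgentConfig exposes parsing settings via its parsing field (verify the field and method names against your installed version):

```python
from langroid.agent.special.doc_chat_agent import DocChatAgent, DocChatAgentConfig
from langroid.parsing.parser import (
    ParsingConfig,
    PdfParsingConfig,
    LLMPdfParserConfig,
)

parsing_config = ParsingConfig(
    n_neighbor_ids=2,
    pdf=PdfParsingConfig(
        library="llm-pdf-parser",
        llm_parser_config=LLMPdfParserConfig(
            model_name="gemini/gemini-2.0-flash",
            split_on_page=True,
            max_tokens=7000,
            requests_per_minute=5,
            timeout=60,
        ),
    ),
)

# Assumption: the `parsing` field carries the ParsingConfig; the file path
# "my_document.pdf" is a hypothetical placeholder.
agent_config = DocChatAgentConfig(parsing=parsing_config)
agent = DocChatAgent(agent_config)
agent.ingest_doc_paths(["my_document.pdf"])
```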

Parameters

model_name

Specifies the model to use for PDF conversion. Default: gemini/gemini-2.0-flash.

max_tokens

Limits the number of tokens in the input.

  • The model's output limit is 8192 tokens.
  • Default: 7000 tokens (leaving room for generated captions).
  • Optional parameter.

split_on_page

Determines whether to process the document page by page.

  • Default: True.
  • If set to False, the parser creates chunks based on max_tokens while respecting page boundaries.
  • When False, the parser sends chunks containing multiple pages (e.g., pages [11, 12, 13, 14, 15]).
  • Advantages of False:
      • Reduces the number of API calls to the LLM.
      • Lowers token usage, since system prompts are not repeated per page.
  • Disadvantages of False:
      • You do not get per-page splitting; each group of pages is returned as a single "page".
  • If your use case does not require strict page-by-page parsing, consider setting this to False.
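
When split_on_page is False, the grouping behavior can be pictured as greedy packing of consecutive pages under the max_tokens budget. A standalone sketch of that idea (an illustration, not Langroid's actual implementation):

```python
def group_pages(page_token_counts: list[int], max_tokens: int) -> list[list[int]]:
    """Greedily pack consecutive pages into chunks whose total token
    count stays within max_tokens, never splitting a single page."""
    chunks: list[list[int]] = []
    current: list[int] = []
    current_tokens = 0
    for page_no, n_tokens in enumerate(page_token_counts, start=1):
        # Start a new chunk if adding this page would exceed the budget.
        if current and current_tokens + n_tokens > max_tokens:
            chunks.append(current)
            current, current_tokens = [], 0
        current.append(page_no)
        current_tokens += n_tokens
    if current:
        chunks.append(current)
    return chunks

# Five pages of ~2000 tokens each, with a 7000-token budget:
print(group_pages([2000] * 5, 7000))  # → [[1, 2, 3], [4, 5]]
```

Each inner list is one request to the LLM, which is why fewer, larger chunks reduce both API calls and repeated system-prompt tokens.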

requests_per_minute

Limits API request frequency to avoid rate limits.

  • If you encounter rate-limit errors, set this to 1 or 2.
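
The effect of requests_per_minute is simply to space out calls. A minimal standalone sketch of the idea (not Langroid's actual throttling code):

```python
import time

class RateLimiter:
    """Enforce a minimum interval between calls, derived from
    a requests-per-minute budget."""

    def __init__(self, requests_per_minute: int):
        # e.g. 5 requests/minute -> one request every 12 seconds
        self.interval = 60.0 / requests_per_minute
        self.last_call = 0.0

    def wait(self) -> None:
        """Block just long enough to keep calls under the budget."""
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self.last_call = time.monotonic()

limiter = RateLimiter(requests_per_minute=5)
# Call limiter.wait() before each LLM request.
```

Lowering requests_per_minute to 1 or 2 widens this interval, which trades throughput for staying safely under a provider's rate limit.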