Using the Gemini PDF Parser¶

Converts PDF content into Markdown format.
Uses Gemini multimodal models to describe images within PDFs.
Supports page-wise or chunk-based processing for optimized performance.

Initializing the Gemini PDF Parser¶

Make sure you have set up your gemini api key in env as GEMINI_API_KEY

You can initialize the Gemini PDF parser as follows:

parsing_config = ParsingConfig(
    n_neighbor_ids=2,
    pdf=PdfParsingConfig(
        library="gemini",
        gemini_config=GeminiConfig(
            model_name="gemini-2.0-flash",
            split_on_page=True,
            max_tokens=7000,
            requests_per_minute=5,
        ),
    ),
)

Parameters¶

`model_name`¶

Specifies the Gemini model to use for PDF conversion. Default: gemini-2.0-flash.

`max_tokens`¶

Limits the number of tokens in the input. The model's output limit is 8192 tokens. - Default: 7000 tokens (leaving room for generated captions). - Optional parameter.

`split_on_page`¶

Determines whether to process the document page by page. - Default: True - If set to False, the parser will create chunks based on max_tokens while respecting page boundaries. - When False, the parser will send chunks containing multiple pages (e.g., [11,12,13,14,15]). - Advantages of False: - Reduces API calls to the LLM. - Lowers token usage since system prompts are not repeated per page. - Disadvantages of False: - You will not get per page splitting but group of pages as a page. - If your use case does not require strict page-by-page parsing, consider setting this to False.

`requests_per_minute`¶

Limits API request frequency to avoid rate limits. - If you encounter rate limits, set this to 1 or 2.

Using the Gemini PDF Parser¶

Initializing the Gemini PDF Parser¶

Parameters¶

model_name¶

max_tokens¶

split_on_page¶

requests_per_minute¶

`model_name`¶

`max_tokens`¶

`split_on_page`¶

`requests_per_minute`¶