Skip to content

Using the Gemini PDF Parser

  • Converts PDF content into Markdown format.
  • Uses Gemini multimodal models to describe images within PDFs.
  • Supports page-wise or chunk-based processing for optimized performance.

Initializing the Gemini PDF Parser

Make sure you have set up your gemini api key in env as GEMINI_API_KEY

You can initialize the Gemini PDF parser as follows:

parsing_config = ParsingConfig(
    n_neighbor_ids=2,
    pdf=PdfParsingConfig(
        library="gemini",
        gemini_config=GeminiConfig(
            model_name="gemini-2.0-flash",
            split_on_page=True,
            max_tokens=7000,
            requests_per_minute=5,
        ),
    ),
)

Parameters

model_name

Specifies the Gemini model to use for PDF conversion. Default: gemini-2.0-flash.

max_tokens

Limits the number of tokens in the input. The model's output limit is 8192 tokens. - Default: 7000 tokens (leaving room for generated captions). - Optional parameter.

split_on_page

Determines whether to process the document page by page. - Default: True - If set to False, the parser will create chunks based on max_tokens while respecting page boundaries. - When False, the parser will send chunks containing multiple pages (e.g., [11,12,13,14,15]). - Advantages of False: - Reduces API calls to the LLM. - Lowers token usage since system prompts are not repeated per page. - Disadvantages of False: - You will not get per page splitting but group of pages as a page. - If your use case does not require strict page-by-page parsing, consider setting this to False.

requests_per_minute

Limits API request frequency to avoid rate limits. - If you encounter rate limits, set this to 1 or 2.