PDF Files and Image Inputs to LLMs

Langroid supports sending PDF files and images (either URLs or local files) directly to Large Language Models with multi-modal capabilities. This lets a model "see" PDFs, images, and other documents directly, and it works with most multi-modal models served via an OpenAI-compatible API, e.g.:

OpenAI's GPT-4o series and GPT-4.1 series
Gemini models
Claude series models (via OpenAI-compatible providers like OpenRouter or LiteLLM)

For example usage, see:

tests: test_llm.py, test_llm_async.py, test_chat_agent.py.
example script: pdf-json-no-parse.py, which shows how you can directly extract structured information from a document without having to first parse it to markdown (which is inherently lossy).

Basic Usage directly with LLM `chat` and `achat` methods

First, create a FileAttachment object using one of the from_ methods. For image files (png, jpg/jpeg), you can use FileAttachment.from_path(p), where p is either a local file path or an http/https URL. For PDF files, you can use from_path with a local file, or from_bytes or from_io (see below). The examples below use PDF files only.

from langroid.language_models.base import LLMMessage, Role
from langroid.parsing.file_attachment import FileAttachment
import langroid.language_models as lm

# Create a file attachment
attachment = FileAttachment.from_path("path/to/document.pdf")

# Create messages with attachment
messages = [
    LLMMessage(role=Role.SYSTEM, content="You are a helpful assistant."),
    LLMMessage(
        role=Role.USER, content="What's the title of this document?", 
        files=[attachment]
    )
]

# Set up LLM with model that supports attachments
llm = lm.OpenAIGPT(lm.OpenAIGPTConfig(chat_model=lm.OpenAIChatModel.GPT4o))

# Get response
response = llm.chat(messages=messages)

Supported File Formats

Currently, the OpenAI API supports:

PDF files (including image-based PDFs)
image files and URLs
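As a rough illustration of the two supported categories, the sketch below classifies a local path or URL as a PDF or an image before you build an attachment from it. The helper name `guess_attachment_kind` is hypothetical and not part of Langroid; it looks only at the file extension via Python's standard `mimetypes` module, so it is a pre-flight check under that assumption rather than a guarantee that a given model will accept the file.

```python
import mimetypes
from typing import Optional
from urllib.parse import urlparse


def guess_attachment_kind(path_or_url: str) -> Optional[str]:
    """Return "pdf", "image", or None, judging by the file extension alone.

    Hypothetical helper for illustration; it does not inspect file contents.
    """
    # For http/https URLs, use only the path component so that
    # query strings such as "?size=large" do not hide the extension.
    parsed = urlparse(path_or_url)
    name = parsed.path if parsed.scheme in ("http", "https") else path_or_url

    mime, _ = mimetypes.guess_type(name)
    if mime == "application/pdf":
        return "pdf"
    if mime is not None and mime.startswith("image/"):
        return "image"
    return None


if __name__ == "__main__":
    for candidate in ["report.pdf", "https://example.com/a/chart.png?size=large", "notes.txt"]:
        print(candidate, "->", guess_attachment_kind(candidate))
```

A check like this can be run before calling FileAttachment.from_path, so that unsupported files are rejected early instead of failing at the API call.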

Creating Attachments

There are multiple ways to create file attachments:

from io import BytesIO

from langroid.parsing.file_attachment import FileAttachment

# From a file path
attachment = FileAttachment.from_path("path/to/file.pdf")

# From bytes
with open("path/to/file.pdf", "rb") as f:
    pdf_bytes = f.read()  # raw bytes of the PDF
attachment = FileAttachment.from_bytes(pdf_bytes, filename="document.pdf")

# From a file-like object (reusing the bytes read above)
file_obj = BytesIO(pdf_bytes)
attachment = FileAttachment.from_io(file_obj, filename="document.pdf")
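The bytes and file-like variants carry the same data, which is why they are interchangeable. The sketch below is not Langroid's implementation; it only illustrates, under the assumption that file content travels inside a JSON request as base64 text, how raw bytes and a BytesIO object yield the same `data:` URI. The function name `to_data_uri` and the stand-in bytes are hypothetical.

```python
import base64
from io import BytesIO


def to_data_uri(data: bytes, mime_type: str = "application/pdf") -> str:
    """Encode raw bytes as a base64 data URI (RFC 2397 style)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in content for illustration; a real PDF would be read from disk.
pdf_bytes = b"%PDF-1.4 minimal stand-in"

# The same content, wrapped in a file-like object.
file_obj = BytesIO(pdf_bytes)

uri_from_bytes = to_data_uri(pdf_bytes)
uri_from_io = to_data_uri(file_obj.read())
```

Because both paths end in the same encoded payload, the choice between from_bytes and from_io is mainly about what your code already has in hand: a bytes object, or an open stream.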

Follow-up Questions

You can continue the conversation with follow-up questions that reference the attached files:

messages.append(LLMMessage(role=Role.ASSISTANT, content=response.message))
messages.append(LLMMessage(role=Role.USER, content="What is the main topic?"))
response = llm.chat(messages=messages)

Multiple Attachments

Langroid allows multiple files to be sent in a single message. However, as of 16 Apr 2025, the APIs do not appear to properly support multiple PDF files in one message (they seem to use only the last file attached), although sending multiple images does work.

# attachment1 and attachment2 are FileAttachment objects, created as shown above
messages = [
    LLMMessage(
        role=Role.USER,
        content="Compare these documents",
        files=[attachment1, attachment2]
    )
]
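Given the multiple-PDF limitation noted above, one possible workaround is to query each file in its own request and then combine the answers in a final text-only prompt. The sketch below shows only the control flow. The names `ask_each` and `build_comparison_prompt` are hypothetical, and `ask_one` is an assumed callable that you would implement yourself, for example by wrapping a call to `llm.chat` with a single attachment in `files`.

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")  # whatever represents one file, e.g. an attachment object


def ask_each(
    files: Sequence[F],
    question: str,
    ask_one: Callable[[str, F], str],
) -> List[str]:
    """Ask the same question about each file separately, preserving order."""
    return [ask_one(question, f) for f in files]


def build_comparison_prompt(names: Sequence[str], answers: Sequence[str]) -> str:
    """Combine per-file answers into one text-only prompt for a final comparison."""
    sections = [f"## {name}\n{answer}" for name, answer in zip(names, answers)]
    return "Compare the following document summaries:\n\n" + "\n\n".join(sections)


# Stand-in for a real LLM call, so the sketch runs without network access.
def fake_ask(question: str, file_name: str) -> str:
    return f"summary of {file_name}"


names = ["a.pdf", "b.pdf"]
answers = ask_each(names, "Summarize this document.", fake_ask)
prompt = build_comparison_prompt(names, answers)
```

The trade-off is one extra request per file plus a final comparison request, in exchange for not depending on how a given API handles several PDFs in one message.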

Using File Attachments with Agents

Agents can also process file attachments via the llm_response method, which takes a ChatDocument object as input. To pass file attachments, set the files field of the ChatDocument in addition to its content:

import langroid as lr
from langroid.agent.chat_document import ChatDocument, ChatDocMetaData
from langroid.mytypes import Entity


agent = lr.ChatAgent(lr.ChatAgentConfig())

user_input = ChatDocument(
    content="What is the title of this document?",
    files=[attachment],
    metadata=ChatDocMetaData(
        sender=Entity.USER,
    )
)
# or more simply, use the agent's `create_user_response` method:
# user_input = agent.create_user_response(
#     content="What is the title of this document?",
#     files=[attachment],    
# )
response = agent.llm_response(user_input)

Using File Attachments with Tasks

In Langroid, Task.run() can take a ChatDocument object as input and, as mentioned above, that object can carry attached files in its files field. For proper orchestration, metadata fields such as sender should be set as well. Langroid provides a convenient create_user_response method that creates a ChatDocument with the necessary metadata, so you only need to specify the content and files fields:

from langroid.parsing.file_attachment import FileAttachment
from langroid.agent.task import Task

agent = ...
# Create task
task = Task(agent, interactive=True)

# Create a file attachment
attachment = FileAttachment.from_path("path/to/document.pdf")

# Create input with attachment
input_message = agent.create_user_response(
    content="Extract data from this document",
    files=[attachment]
)

# Run task with file attachment
result = task.run(input_message)

See the script pdf-json-no-parse.py for a complete example of using file attachments with tasks.

Practical Applications

PDF document analysis and data extraction
Report summarization
Structured information extraction from documents
Visual content analysis

For more complex applications, consider using the Task and Agent infrastructure in Langroid to orchestrate multi-step document processing workflows.