Code Injection Protection with full_eval Flag¶
Available in Langroid since v0.53.15.
Langroid provides a security feature that helps protect against code injection vulnerabilities when evaluating pandas expressions in TableChatAgent and VectorStore. This protection is controlled by the full_eval flag, which defaults to False for maximum security but can be set to True when working in trusted environments.
Background¶
When executing dynamic pandas expressions within TableChatAgent and in VectorStore.compute_from_docs(), there is a risk of code injection if malicious input is provided. To mitigate this risk, Langroid implements a command sanitization system that validates and restricts the operations that can be performed.
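To make the risk concrete, here is a minimal sketch in plain Python of what unsanitized evaluation of an expression string allows. It is an illustration only, not Langroid's code: the naive_eval helper and the dict standing in for a DataFrame are hypothetical, and the "malicious" payload is deliberately harmless.

```python
# Hypothetical stand-in for a table; a real agent would hold a pandas DataFrame.
df = {"price": [10, 20, 30]}


def naive_eval(expr: str):
    """Evaluate expr with only `df` passed in. NOT safe; for illustration only."""
    # Python adds the builtins to the globals dict automatically,
    # so the expression can still reach __import__ and everything behind it.
    return eval(expr, {"df": df})


# An expression of the kind the caller intended to support.
benign = naive_eval("sum(df['price'])")

# A harmless stand-in for an injected payload: it imports a module and
# calls into it, even though only `df` was exposed to the expression.
injected = naive_eval("__import__('os').getcwd()")

print(benign)
print(type(injected).__name__)
```

The second call never touches df, yet it runs arbitrary library code. An expression string that originates from an LLM or an end user can do the same, which is the situation the sanitization step is meant to prevent.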
How It Works¶
The sanitization system uses AST (Abstract Syntax Tree) analysis to enforce a security policy that:
- Restricts DataFrame methods to a safe whitelist
- Prevents access to potentially dangerous methods and arguments
- Limits expression depth and method chaining
- Validates literals and numeric values to be within safe bounds
- Blocks access to any variables other than the provided DataFrame
When full_eval=False (the default), all expressions are run through this sanitization process before evaluation. When full_eval=True, the sanitization is bypassed, allowing full access to pandas functionality.
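The policy listed above can be sketched with Python's standard ast module. This is a simplified illustration under stated assumptions, not Langroid's implementation: the sanitize function, the UnsafeExpressionError class, the whitelist contents, and the numeric limits are all hypothetical, and the real whitelist and checks are more extensive.

```python
import ast

# Hypothetical values for illustration; Langroid's actual policy differs.
ALLOWED_METHODS = {"head", "groupby", "mean", "sort_values", "pivot_table"}
MAX_DEPTH = 12          # limit on AST nesting (covers method chaining)
MAX_NUMBER = 1_000_000  # bound on numeric literals


class UnsafeExpressionError(ValueError):
    """Raised when an expression violates the sketch policy."""


def _depth(node: ast.AST) -> int:
    """Return the height of the AST rooted at node."""
    return 1 + max((_depth(c) for c in ast.iter_child_nodes(node)), default=0)


def sanitize(expr: str, df_name: str = "df") -> str:
    """Return expr unchanged if it passes every check, else raise."""
    tree = ast.parse(expr, mode="eval")  # expressions only, no statements
    if _depth(tree) > MAX_DEPTH:
        raise UnsafeExpressionError("expression too deeply nested")
    for node in ast.walk(tree):
        # Only the DataFrame variable may be referenced by name.
        if isinstance(node, ast.Name) and node.id != df_name:
            raise UnsafeExpressionError(f"unknown variable: {node.id}")
        # Attribute access (including method calls) must be whitelisted.
        if isinstance(node, ast.Attribute) and node.attr not in ALLOWED_METHODS:
            raise UnsafeExpressionError(f"method not allowed: {node.attr}")
        # Lambdas would let arbitrary logic slip through.
        if isinstance(node, ast.Lambda):
            raise UnsafeExpressionError("lambda not allowed")
        # Numeric literals must stay within bounds.
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            if abs(node.value) > MAX_NUMBER:
                raise UnsafeExpressionError("numeric literal out of bounds")
    return expr


for candidate in ["df.head()", "df.eval('x')", "__import__('os').getcwd()"]:
    try:
        sanitize(candidate)
        print("allowed:", candidate)
    except UnsafeExpressionError as exc:
        print("blocked:", candidate, "->", exc)
```

Because the check works on the parsed tree rather than on the raw string, it cannot be bypassed with whitespace or quoting tricks, and nothing is executed until the whole expression has been validated.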
Configuration Options¶
In TableChatAgent¶
from langroid.agent.special.table_chat_agent import TableChatAgentConfig, TableChatAgent
config = TableChatAgentConfig(
    data=my_dataframe,
    full_eval=False,  # Default is False; set to True only for trusted input
)
agent = TableChatAgent(config)
In VectorStore¶
from langroid.vector_store.lancedb import LanceDBConfig, LanceDB
config = LanceDBConfig(
    collection_name="my_collection",
    full_eval=False,  # Default is False; set to True only for trusted input
)
vectorstore = LanceDB(config)
When to Use full_eval=True¶
Set full_eval=True only when:
- All input comes from trusted sources (not from users or external systems)
- You need full pandas functionality that goes beyond the whitelisted methods
- You're working in a controlled development or testing environment
Security Considerations¶
- By default, full_eval=False provides a good balance of security and functionality
- The whitelisted operations support most common pandas operations
- Setting full_eval=True removes all protection and should be used with caution
- Even with protection, always validate input when possible
Affected Classes¶
The full_eval flag affects the following components:
- TableChatAgentConfig and TableChatAgent: controls sanitization in the pandas_eval method
- VectorStoreConfig and VectorStore: controls sanitization in the compute_from_docs method
- All implementations of VectorStore (ChromaDB, LanceDB, MeiliSearch, PineconeDB, PostgresDB, QdrantDB, WeaviateDB)
Example: Safe Pandas Operations¶
When full_eval=False, the following operations are allowed:
# Allowed operations (non-exhaustive list)
df.head()
df.groupby('column')['value'].mean()
df[df['column'] > 10]
df.sort_values('column', ascending=False)
df.pivot_table(...)
Some operations that might be blocked include:
# Potentially blocked operations
df.eval("dangerous_expression")
df.query("dangerous_query")
df.apply(lambda x: dangerous_function(x))
Testing Considerations¶
When writing tests that use TableChatAgent or VectorStore.compute_from_docs() with pandas expressions that go beyond the whitelisted operations, you may need to set full_eval=True for the tests to pass.