pydantic_utils
langroid/utils/pydantic_utils.py
flatten_dict(d, parent_key='', sep='.')
Flatten a nested dictionary, joining nested keys with a separator (sep, default "."). Useful for pydantic_v1 models with nested fields: first use dct = mdl.model_dump() to get a nested dictionary, then use this function to flatten it.
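A minimal standalone sketch of this kind of flattening, for illustration only. It is not langroid's source, and it assumes that only plain dict values count as nested:

```python
from typing import Any, Dict


def flatten_dict(d: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Recursively flatten nested dicts, joining the key path with `sep`."""
    items: Dict[str, Any] = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into the sub-dict, carrying the key path built so far.
            items.update(flatten_dict(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items


nested = {"a": 1, "b": {"c": 2, "d": {"e": 3}}}
print(flatten_dict(nested))  # {'a': 1, 'b.c': 2, 'b.d.e': 3}
print(flatten_dict(nested, sep="__"))  # {'a': 1, 'b__c': 2, 'b__d__e': 3}
```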
has_field(model_class, field_name)
Check if a Pydantic model class has a field with the given name.
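A sketch assuming Pydantic v2, where a model class lists its declared fields in the class-level model_fields mapping. It checks top-level names only; whether nested fields should also count is an assumption left open by the description:

```python
from typing import Type

from pydantic import BaseModel


def has_field(model_class: Type[BaseModel], field_name: str) -> bool:
    # Pydantic v2 stores declared fields in the class-level `model_fields` dict.
    return field_name in model_class.model_fields


class Address(BaseModel):
    city: str


class Person(BaseModel):
    name: str
    address: Address


print(has_field(Person, "name"))  # True
print(has_field(Person, "city"))  # False: "city" belongs to the nested Address
```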
flatten_pydantic_model(model, base_model=BaseModel)
Given a possibly nested Pydantic class, return a flattened version of it by constructing top-level fields whose names are formed from the path through the nested structure, separated by double underscores (e.g. a__b__c).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Type[BaseModel] | The Pydantic model to flatten. | required |
base_model | Type[BaseModel] | The base model to use for the flattened model. Defaults to BaseModel. | BaseModel |
Returns:
Type | Description |
---|---|
Type[BaseModel] | The flattened Pydantic model. |
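A sketch of the flattening idea, assuming Pydantic v2 (model_fields, FieldInfo.is_required() and create_model with __base__). It is an illustration rather than langroid's implementation: only fields annotated directly with a BaseModel subclass are recursed into, and fields that use default_factory are not handled:

```python
from typing import Any, Dict, Tuple, Type

from pydantic import BaseModel, create_model


def flatten_pydantic_model(
    model: Type[BaseModel], base_model: Type[BaseModel] = BaseModel
) -> Type[BaseModel]:
    """Build a flat model whose field names are nested paths joined by '__'."""
    fields: Dict[str, Tuple[Any, Any]] = {}

    def collect(m: Type[BaseModel], prefix: str) -> None:
        for name, info in m.model_fields.items():
            path = f"{prefix}__{name}" if prefix else name
            ann = info.annotation
            if isinstance(ann, type) and issubclass(ann, BaseModel):
                collect(ann, path)  # recurse into a nested model class
            else:
                # Keep the leaf type; `...` marks the field as required.
                fields[path] = (ann, ... if info.is_required() else info.default)

    collect(model, "")
    return create_model(f"Flat{model.__name__}", __base__=base_model, **fields)


class Inner(BaseModel):
    x: int
    y: str = "hi"


class Outer(BaseModel):
    inner: Inner
    z: float = 1.0


FlatOuter = flatten_pydantic_model(Outer)
print(list(FlatOuter.model_fields))  # ['inner__x', 'inner__y', 'z']
```

Defaults from the nested model carry over, so FlatOuter(inner__x=3) fills in inner__y and z automatically.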
get_field_names(model)
Get all field names from a possibly nested Pydantic model.
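The description does not say whether nested names come back as full paths or as leaf names. The sketch below assumes Pydantic v2 and returns leaf names; treat that choice as an assumption rather than the documented behavior:

```python
from typing import List, Type

from pydantic import BaseModel


def get_field_names(model: Type[BaseModel]) -> List[str]:
    names: List[str] = []
    for name, info in model.model_fields.items():
        ann = info.annotation
        if isinstance(ann, type) and issubclass(ann, BaseModel):
            names.extend(get_field_names(ann))  # descend; keep only leaf names
        else:
            names.append(name)
    return names


class Meta(BaseModel):
    author: str
    year: int


class Doc(BaseModel):
    content: str
    metadata: Meta


print(get_field_names(Doc))  # ['content', 'author', 'year']
```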
generate_simple_schema(model, exclude=[])
Generates a JSON schema for a Pydantic model, with options to exclude specific fields.
This function traverses the Pydantic model's fields, including nested models, to generate a dictionary representing the JSON schema. Fields specified in the exclude list will not be included in the generated schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Type[BaseModel] | The Pydantic model class to generate the schema for. | required |
exclude | List[str] | A list of string field names to be excluded from the generated schema. Defaults to an empty list. | [] |
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary representing the JSON schema of the provided model, with specified fields excluded. |
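A sketch of the traversal, assuming Pydantic v2. The {"type": name} shape for leaf fields is a hypothetical output format chosen for illustration; the exact schema the library produces may differ. Excluded names are skipped at every nesting level, and exclude defaults to None here to avoid the shared-mutable-default pitfall of exclude=[]:

```python
from typing import Any, Dict, List, Optional, Type

from pydantic import BaseModel


def generate_simple_schema(
    model: Type[BaseModel], exclude: Optional[List[str]] = None
) -> Dict[str, Any]:
    exclude = exclude or []
    schema: Dict[str, Any] = {}
    for name, info in model.model_fields.items():
        if name in exclude:
            continue  # skip excluded field names (at every nesting level)
        ann = info.annotation
        if isinstance(ann, type) and issubclass(ann, BaseModel):
            schema[name] = generate_simple_schema(ann, exclude)  # nested model
        else:
            schema[name] = {"type": getattr(ann, "__name__", str(ann))}
    return schema


class Author(BaseModel):
    name: str
    email: str


class Book(BaseModel):
    title: str
    pages: int
    author: Author


print(generate_simple_schema(Book, exclude=["email"]))
# {'title': {'type': 'str'}, 'pages': {'type': 'int'}, 'author': {'name': {'type': 'str'}}}
```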
flatten_pydantic_instance(instance, prefix='', force_str=False)
Given a possibly nested Pydantic instance, return a flattened version of it, as a dict where nested traversal paths are translated to keys a__b__c.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
instance | BaseModel | The Pydantic instance to flatten. | required |
prefix | str | The prefix to use for the top-level fields. | '' |
force_str | bool | Whether to force all values to be strings. | False |
Returns:
Type | Description |
---|---|
Dict[str, Any] | The flattened dict. |
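A sketch assuming Pydantic v2, that the prefix is prepended verbatim (so a nested call receives e.g. "metadata__"), and that force_str converts with str(). These are illustrative assumptions, not a copy of langroid's code:

```python
from typing import Any, Dict

from pydantic import BaseModel


def flatten_pydantic_instance(
    instance: BaseModel, prefix: str = "", force_str: bool = False
) -> Dict[str, Any]:
    flat: Dict[str, Any] = {}
    for name in type(instance).model_fields:
        value = getattr(instance, name)
        key = f"{prefix}{name}"
        if isinstance(value, BaseModel):
            # Nested model: recurse with the path so far plus "__" as the prefix.
            flat.update(flatten_pydantic_instance(value, f"{key}__", force_str))
        else:
            flat[key] = str(value) if force_str else value
    return flat


class Meta(BaseModel):
    author: str
    year: int


class Doc(BaseModel):
    content: str
    metadata: Meta


doc = Doc(content="hello", metadata=Meta(author="Ann", year=2024))
print(flatten_pydantic_instance(doc))
# {'content': 'hello', 'metadata__author': 'Ann', 'metadata__year': 2024}
print(repr(flatten_pydantic_instance(doc, force_str=True)["metadata__year"]))  # '2024'
```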
extract_fields(doc, fields)
Extract specified fields from a Pydantic object. Supports dotted field names, e.g. "metadata.author". Dotted fields are matched exactly against the corresponding path; non-dotted fields are matched against the last part of the path. Clashes are ignored.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
doc | BaseModel | The Pydantic object. | required |
fields | List[str] | The list of fields to extract. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary of field names and values. |
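To keep it dependency-free, this sketch works on a nested dict such as the one returned by doc.model_dump() rather than on the Pydantic object itself. Resolving clashes by taking the first depth-first match is an assumption about what "clashes are ignored" means:

```python
from typing import Any, Dict, Iterator, List, Tuple


def _leaves(d: Dict[str, Any], path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (path, value) for every non-dict value, depth first."""
    for key, value in d.items():
        if isinstance(value, dict):
            yield from _leaves(value, path + (key,))
        else:
            yield path + (key,), value


def extract_fields(data: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    leaves = list(_leaves(data))
    result: Dict[str, Any] = {}
    for field in fields:
        if "." in field:
            # Dotted name: the whole path must match exactly.
            matches = [v for p, v in leaves if p == tuple(field.split("."))]
        else:
            # Plain name: match against the last path component only.
            matches = [v for p, v in leaves if p[-1] == field]
        if matches:
            result[field] = matches[0]  # assumption: first match wins on clashes
    return result


data = {
    "content": "some text",
    "metadata": {"author": "Ann", "source": "web", "extra": {"author": "Bob"}},
}
print(extract_fields(data, ["metadata.author", "source", "author", "missing"]))
# {'metadata.author': 'Ann', 'source': 'web', 'author': 'Ann'}
```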
nested_dict_from_flat(flat_data, sub_dict='')
Given a flattened version of a nested dict, reconstruct the nested dict. Field names in the flattened dict are assumed to be of the form "field1__field2__field3", going from top level down.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flat_data | Dict[str, Any] | The flattened dict. | required |
sub_dict | str | The name of the sub-dict to extract from the flattened dict. Defaults to "" (extract the whole dict). | '' |
Returns:
Type | Description |
---|---|
Dict[str, Any] | The nested dict. |
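A standalone sketch of the reconstruction, for illustration. It assumes that a non-empty sub_dict keeps only keys starting with sub_dict + "__" and strips that prefix:

```python
from typing import Any, Dict


def nested_dict_from_flat(flat_data: Dict[str, Any], sub_dict: str = "") -> Dict[str, Any]:
    prefix = f"{sub_dict}__" if sub_dict else ""
    nested: Dict[str, Any] = {}
    for key, value in flat_data.items():
        if not key.startswith(prefix):
            continue  # key lies outside the requested sub-dict
        *parents, leaf = key[len(prefix):].split("__")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate levels on demand
        node[leaf] = value
    return nested


flat = {"a__b__c": 1, "a__d": 2, "e": 3}
print(nested_dict_from_flat(flat))  # {'a': {'b': {'c': 1}, 'd': 2}, 'e': 3}
print(nested_dict_from_flat(flat, sub_dict="a"))  # {'b': {'c': 1}, 'd': 2}
```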
pydantic_obj_from_flat_dict(flat_data, model, sub_dict='')
Convert a flattened dict with a__b__c-style keys into a nested dict, and from that into an instance of the given Pydantic model.
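A sketch of this pipeline, assuming Pydantic v2's model_validate. In its default lax mode, validation also coerces string values such as "2024" back to int, which is convenient when the flat dict came from flattening with force_str=True:

```python
from typing import Any, Dict, Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)


def _nest(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {'a__b': 1} into {'a': {'b': 1}}."""
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        *parents, leaf = key.split("__")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def pydantic_obj_from_flat_dict(flat_data: Dict[str, Any], model: Type[T], sub_dict: str = "") -> T:
    if sub_dict:
        prefix = f"{sub_dict}__"
        flat_data = {k[len(prefix):]: v for k, v in flat_data.items() if k.startswith(prefix)}
    # model_validate builds the nested models and coerces compatible types.
    return model.model_validate(_nest(flat_data))


class Meta(BaseModel):
    author: str
    year: int


class Doc(BaseModel):
    content: str
    metadata: Meta


flat = {"content": "hello", "metadata__author": "Ann", "metadata__year": "2024"}
doc = pydantic_obj_from_flat_dict(flat, Doc)
print(doc.metadata.year)  # 2024
```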
temp_params(config, field, temp)
Context manager to temporarily override field in a config; the original value is restored when the context exits.
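A minimal sketch of such a context manager using contextlib. ChatConfig is a hypothetical dataclass standing in for a real config object; the try/finally guarantees the original value comes back even if the body raises:

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Iterator


@contextmanager
def temp_params(config: Any, field: str, temp: Any) -> Iterator[None]:
    original = getattr(config, field)
    setattr(config, field, temp)
    try:
        yield
    finally:
        # Restore the original value even if the body raised.
        setattr(config, field, original)


@dataclass
class ChatConfig:  # hypothetical config class for illustration
    temperature: float = 0.2


cfg = ChatConfig()
with temp_params(cfg, "temperature", 0.9):
    print(cfg.temperature)  # 0.9
print(cfg.temperature)  # 0.2
```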
numpy_to_python_type(numpy_type)
Converts a numpy data type to its Python equivalent.
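One possible implementation, using np.issubdtype to test dtype families. The set of families handled and the object fallback are assumptions for illustration; the library's own mapping may differ:

```python
import numpy as np


def numpy_to_python_type(numpy_type) -> type:
    """Map a numpy dtype (or scalar type) to the closest builtin Python type."""
    # bool is checked first; np.bool_ is not a subtype of np.integer anyway.
    if np.issubdtype(numpy_type, np.bool_):
        return bool
    if np.issubdtype(numpy_type, np.integer):
        return int
    if np.issubdtype(numpy_type, np.floating):
        return float
    if np.issubdtype(numpy_type, np.str_):
        return str
    return object  # hypothetical fallback for anything else (e.g. object dtype)


print(numpy_to_python_type(np.dtype("int64")))  # <class 'int'>
print(numpy_to_python_type(np.float32))  # <class 'float'>
```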
dataframe_to_pydantic_model(df)
Make a Pydantic model from a dataframe.
dataframe_to_pydantic_objects(df)
Make a list of Pydantic objects from a dataframe.
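A minimal sketch of dataframe_to_pydantic_model and dataframe_to_pydantic_objects together, assuming Pydantic v2 and pandas. The dtype mapping uses pandas' pd.api.types helpers and falls back to str for every other column; both choices are assumptions for illustration, not necessarily how the library maps types:

```python
from typing import List, Type

import pandas as pd
from pydantic import BaseModel, create_model


def _python_type(dtype) -> type:
    """Simplified dtype -> builtin mapping using pandas' dtype helpers."""
    if pd.api.types.is_bool_dtype(dtype):
        return bool
    if pd.api.types.is_integer_dtype(dtype):
        return int
    if pd.api.types.is_float_dtype(dtype):
        return float
    return str  # assumption: treat every other column as text


def dataframe_to_pydantic_model(df: pd.DataFrame) -> Type[BaseModel]:
    # One required field per column, typed from the column's dtype.
    fields = {str(col): (_python_type(dtype), ...) for col, dtype in df.dtypes.items()}
    return create_model("DataFrameRow", **fields)


def dataframe_to_pydantic_objects(df: pd.DataFrame) -> List[BaseModel]:
    Model = dataframe_to_pydantic_model(df)
    # to_dict(orient="records") gives one {column: value} dict per row.
    return [Model(**row) for row in df.to_dict(orient="records")]


df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.0], "qty": [1, 2]})
rows = dataframe_to_pydantic_objects(df)
print(rows[1])
```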
first_non_null(series)
dataframe_to_document_model(df, content='content', metadata=[], exclude=[])
Make a subclass of Document from a dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe. | required |
content | str | The name of the column containing the content, which will map to the Document.content field. | 'content' |
metadata | List[str] | A list of column names containing metadata; these will be included in the Document.metadata field. | [] |
exclude | List[str] | A list of column names to exclude from the model (e.g. "vector" when lance is used to add an embedding vector to the df). | [] |
Returns:
Type | Description |
---|---|
Type[BaseModel] | A Pydantic model subclassing Document. |
dataframe_to_documents(df, content='content', metadata=[], doc_cls=None)
Make a list of Document objects from a dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | pd.DataFrame | The dataframe. | required |
content | str | The name of the column containing the content, which will map to the Document.content field. | 'content' |
metadata | List[str] | A list of column names containing metadata; these will be included in the Document.metadata field. | [] |
doc_cls | Type[BaseModel] | A Pydantic model subclassing Document. Defaults to None. | None |
Returns:
Type | Description |
---|---|
List[Document] | The list of Document objects. |
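A combined sketch of dataframe_to_document_model and dataframe_to_documents, assuming Pydantic v2 and pandas. Document and DocMetaData here are minimal hypothetical stand-ins, not langroid's classes, and turning leftover (non-content, non-metadata) columns into optional top-level fields is an assumption:

```python
from typing import Any, List, Optional, Type

import pandas as pd
from pydantic import BaseModel, Field, create_model


class DocMetaData(BaseModel):  # hypothetical stand-in, not langroid's class
    source: str = "unknown"


class Document(BaseModel):  # hypothetical stand-in, not langroid's class
    content: str
    metadata: DocMetaData = Field(default_factory=DocMetaData)


def _python_type(dtype: Any) -> type:
    if pd.api.types.is_bool_dtype(dtype):
        return bool
    if pd.api.types.is_integer_dtype(dtype):
        return int
    if pd.api.types.is_float_dtype(dtype):
        return float
    return str


def dataframe_to_document_model(df, content="content", metadata=None, exclude=None) -> Type[Document]:
    metadata = metadata or []
    df = df.drop(columns=exclude or [])
    # Metadata columns become optional fields on a DocMetaData subclass.
    meta_fields = {c: (Optional[_python_type(df[c].dtype)], None) for c in metadata}
    Meta = create_model("DataFrameMetaData", __base__=DocMetaData, **meta_fields)
    # Assumption: remaining non-content columns become optional top-level fields.
    other = {
        c: (Optional[_python_type(df[c].dtype)], None)
        for c in df.columns
        if c != content and c not in metadata
    }
    return create_model("DataFrameDocument", __base__=Document, metadata=(Meta, ...), **other)


def dataframe_to_documents(df, content="content", metadata=None, doc_cls=None) -> List[Document]:
    metadata = metadata or []
    doc_cls = doc_cls or dataframe_to_document_model(df, content, metadata)
    meta_cls = doc_cls.model_fields["metadata"].annotation
    docs = []
    for row in df.to_dict(orient="records"):
        meta = meta_cls(**{c: row[c] for c in metadata})
        # Pass along any other columns that the document class declares.
        rest = {k: v for k, v in row.items() if k in doc_cls.model_fields and k not in ("content", "metadata")}
        docs.append(doc_cls(content=str(row[content]), metadata=meta, **rest))
    return docs


df = pd.DataFrame({
    "content": ["hello", "world"],
    "source": ["a.txt", "b.txt"],
    "year": [2023, 2024],
    "vector": [[0.1], [0.2]],
})
DocCls = dataframe_to_document_model(df, metadata=["source", "year"], exclude=["vector"])
docs = dataframe_to_documents(df.drop(columns=["vector"]), metadata=["source", "year"], doc_cls=DocCls)
print(docs[1].metadata.year)  # 2024
```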
extra_metadata(document, doc_cls=Document)
Checks for extra fields in a document's metadata that are not defined in the original metadata schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
document | Document | The document instance to check for extra fields. | required |
doc_cls | Type[Document] | The class type derived from Document, used as a reference to identify extra fields in the document's metadata. | Document |
Returns:
Type | Description |
---|---|
List[str] | A list of strings representing the keys of the extra fields found in the document's metadata. |
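A sketch assuming Pydantic v2, where a metadata model configured with extra="allow" keeps undeclared keys and includes them in model_dump(). Document and DocMetaData are hypothetical stand-ins defined only for this example:

```python
from typing import List, Type

from pydantic import BaseModel, ConfigDict, Field


class DocMetaData(BaseModel):  # hypothetical stand-in that accepts extra keys
    model_config = ConfigDict(extra="allow")
    source: str = "unknown"


class Document(BaseModel):  # hypothetical stand-in
    content: str
    metadata: DocMetaData = Field(default_factory=DocMetaData)


def extra_metadata(document: Document, doc_cls: Type[Document] = Document) -> List[str]:
    # Metadata fields declared by the reference class...
    meta_cls = doc_cls.model_fields["metadata"].annotation
    declared = set(meta_cls.model_fields)
    # ...versus every key actually present on this document's metadata.
    return [k for k in document.metadata.model_dump() if k not in declared]


doc = Document(content="hi", metadata=DocMetaData(source="web", author="Ann", year=2024))
print(extra_metadata(doc))  # ['author', 'year']
```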
extend_document_class(d)
Generates a new Pydantic class based on a given document instance.
This function dynamically creates a new Pydantic class with additional fields based on the "extra" metadata fields present in the given document instance. The new class is a subclass of the original Document class, with the original metadata fields retained and extra fields added as normal fields to the metadata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
d | Document | An instance of the Document class. | required |
Returns:
Type | Description |
---|---|
Type[Document] | A new subclass of the Document class that includes the additional fields found in the metadata of the given document instance. |
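A sketch of the idea, assuming Pydantic v2 and hypothetical stand-in Document and DocMetaData classes. Typing each new field as Optional of its current value's type is an assumption made for illustration:

```python
from typing import Optional, Type

from pydantic import BaseModel, ConfigDict, Field, create_model


class DocMetaData(BaseModel):  # hypothetical stand-in that accepts extra keys
    model_config = ConfigDict(extra="allow")
    source: str = "unknown"


class Document(BaseModel):  # hypothetical stand-in
    content: str
    metadata: DocMetaData = Field(default_factory=DocMetaData)


def extend_document_class(d: Document) -> Type[Document]:
    meta_cls = type(d.metadata)
    # Extra keys are the dumped metadata keys the class does not declare.
    extras = {k: v for k, v in d.metadata.model_dump().items() if k not in meta_cls.model_fields}
    # Each extra key becomes an optional field typed after its current value.
    new_fields = {k: (Optional[type(v)], None) for k, v in extras.items()}
    NewMeta = create_model(f"Extended{meta_cls.__name__}", __base__=meta_cls, **new_fields)
    return create_model(
        f"Extended{type(d).__name__}",
        __base__=type(d),
        metadata=(NewMeta, Field(default_factory=NewMeta)),
    )


doc = Document(content="hi", metadata=DocMetaData(source="web", author="Ann", year=2024))
NewDoc = extend_document_class(doc)
new_meta_cls = NewDoc.model_fields["metadata"].annotation
print(list(new_meta_cls.model_fields))  # ['source', 'author', 'year']
```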