base
VectorStore(config)
Bases: ABC
Abstract base class for a vector store.
Source code in langroid/vector_store/base.py
clear_empty_collections()
abstractmethod
Clear all empty collections in the vector store. Returns the number of collections deleted.
clear_all_collections(really=False, prefix='')
abstractmethod
Clear all collections in the vector store.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
really | bool | Whether to really clear all collections. Defaults to False. | False |
prefix | str | Prefix of collections to clear. | '' |
Returns: int: Number of collections deleted.
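As a rough illustration of how the `really` guard and the `prefix` filter could interact, here is a minimal in-memory sketch. The `ToyStore` class and its dict-backed storage are hypothetical and are not langroid's implementation; the behavior when `really=False` (delete nothing and return 0) is an assumption.

```python
from typing import Dict, List


class ToyStore:
    """Hypothetical in-memory stand-in for a vector store (not langroid code)."""

    def __init__(self) -> None:
        # Maps collection name -> list of stored texts.
        self.collections: Dict[str, List[str]] = {}

    def clear_all_collections(self, really: bool = False, prefix: str = "") -> int:
        # Assumption: without explicit confirmation, nothing is deleted.
        if not really:
            return 0
        # Only collections whose name starts with the prefix are removed.
        doomed = [name for name in self.collections if name.startswith(prefix)]
        for name in doomed:
            del self.collections[name]
        return len(doomed)


store = ToyStore()
store.collections = {"test-a": ["x"], "test-b": [], "prod": ["y"]}
dry_run = store.clear_all_collections(prefix="test-")  # guard blocks deletion
deleted = store.clear_all_collections(really=True, prefix="test-")
remaining = sorted(store.collections)
```

With the default `prefix=''`, every collection name matches, so a confirmed call in this sketch would empty the whole store.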
list_collections(empty=False)
abstractmethod
List all collections in the vector store (only non-empty collections if empty=False).
set_collection(collection_name, replace=False)
Set the current collection to the given collection name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
collection_name | str | Name of the collection. | required |
replace | bool | Whether to replace the collection if it already exists. Defaults to False. | False |
create_collection(collection_name, replace=False)
abstractmethod
Create a collection with the given name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
collection_name | str | Name of the collection. | required |
replace | bool | Whether to replace the collection if it already exists. Defaults to False. | False |
compute_from_docs(docs, calc)
Compute a result on a set of documents, using a dataframe calc string like df.groupby('state')['income'].mean().
maybe_add_ids(documents)
Add ids to document metadata if absent, since some vector databases do not accept blank ids.
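The idea can be sketched in a few lines. Everything here is illustrative: `ToyDoc` is a hypothetical document type, the metadata key `"id"` is assumed, and generating ids with random UUIDs is an assumption rather than a statement of what langroid does.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyDoc:
    """Hypothetical document type: text content plus a metadata dict."""

    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def maybe_add_ids(documents: List[ToyDoc]) -> None:
    """Give every document lacking a non-empty id a fresh one, in place."""
    for doc in documents:
        if not doc.metadata.get("id"):  # missing key, None, or ""
            doc.metadata["id"] = str(uuid.uuid4())  # assumption: random UUIDs


docs = [ToyDoc("alpha", {"id": "a1"}), ToyDoc("beta"), ToyDoc("gamma", {"id": ""})]
maybe_add_ids(docs)
ids = [d.metadata["id"] for d in docs]
```

Existing ids are left untouched, so calling the function repeatedly on the same documents does not change them after the first pass.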
similar_texts_with_scores(text, k=1, where=None)
abstractmethod
Find the k texts most similar to the given text, according to a vector distance metric (e.g., cosine similarity).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to find similar texts for. | required |
k | int | Number of similar texts to retrieve. Defaults to 1. | 1 |
where | Optional[str] | Where clause to filter the search. | None |
Returns:
Type | Description |
---|---|
List[Tuple[Document, float]] | List of (Document, score) tuples. |
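To make the top-k retrieval concrete, the following toy sketch ranks plain strings by cosine similarity. It is not a subclass of VectorStore: the word-count `embed` function stands in for a real embedding model, the `corpus` argument replaces the stored collection, and the `where` filter is omitted. All of these are simplifying assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similar_texts_with_scores(
    corpus: List[str], text: str, k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k corpus entries with the highest similarity to text."""
    query = embed(text)
    scored = [(doc, cosine(query, embed(doc))) for doc in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


corpus = ["the cat sat on the mat", "dogs chase cats", "the mat is red"]
top = similar_texts_with_scores(corpus, "cat on the mat", k=2)
```

A real backend would compare dense embedding vectors and use an index rather than scoring every stored text, but the return shape, a list of (item, score) pairs sorted best first, is the same idea.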
add_context_window(docs_scores, neighbors=0)
In each doc's metadata, there may be a window_ids field indicating the ids of the chunks around the current chunk. These window_ids may overlap, so we:
- coalesce each group of overlapping windows into a single window (maintaining ordering),
- create a new document for each part, preserving metadata.
We may have stored a longer list of window_ids during chunking than we need. Here we keep only the requested number of neighbors on each side of the center of the window_ids list.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
docs_scores | List[Tuple[Document, float]] | List of (document, match score) pairs to add context windows to. | required |
neighbors | int | Number of neighbors on "each side" of match to retrieve. Defaults to 0. "Each side" here means before and after the match, in the original text. | 0 |
Returns:
Type | Description |
---|---|
List[Tuple[Document, float]] | List of (Document, score) tuples. |
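The trimming step described above, keeping only a fixed number of ids on each side of the center of a stored window, might look like the sketch below. The helper name `trim_window` is hypothetical, and choosing the middle index with integer division (which matters for even-length lists) is an assumption.

```python
from typing import List


def trim_window(window_ids: List[str], neighbors: int) -> List[str]:
    """Keep `neighbors` ids on each side of the middle entry of window_ids."""
    center = len(window_ids) // 2  # assumption: the matched chunk sits in the middle
    start = max(0, center - neighbors)  # clamp so a large `neighbors` cannot wrap around
    return window_ids[start : center + neighbors + 1]


# Window saved at chunking time; the matched chunk is "c5".
stored = ["c3", "c4", "c5", "c6", "c7"]
trimmed = trim_window(stored, neighbors=1)
only_match = trim_window(stored, neighbors=0)
```

With neighbors=0 only the matched chunk survives, which is consistent with the documented default.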
remove_overlaps(windows)
staticmethod
Given a collection of windows, where each window is a sequence of ids, identify groups of overlapping windows, and for each overlapping group, order the chunk-ids using topological sort so they appear in the original order in the text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
windows | List[int \| str] | List of windows, where each window is a sequence of ids. | required |
Returns:
Type | Description |
---|---|
List[List[str]] | List of windows, where each window is a sequence of ids, and no two windows overlap. |
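One way to realize the grouping and ordering described above is sketched here with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The function name `merge_windows` and the grouping strategy are illustrative assumptions, not langroid's actual code. Within each window, "a directly precedes b" is treated as an ordering constraint.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def merge_windows(windows: List[List[str]]) -> List[List[str]]:
    """Merge windows that share ids; order each merged window topologically."""
    # Step 1: group windows that overlap, directly or transitively.
    groups: List[List[List[str]]] = []
    for window in windows:
        merged = [window]
        rest = []
        for group in groups:
            if any(set(window) & set(other) for other in group):
                merged.extend(group)  # absorb every group this window touches
            else:
                rest.append(group)
        groups = rest + [merged]

    # Step 2: in each group, build a predecessor graph and sort it.
    result: List[List[str]] = []
    for group in groups:
        preds: Dict[str, Set[str]] = {}
        for window in group:
            for chunk_id in window:
                preds.setdefault(chunk_id, set())
            for before, after in zip(window, window[1:]):
                preds[after].add(before)
        result.append(list(TopologicalSorter(preds).static_order()))
    return result


windows = [["a", "b", "c"], ["x", "y"], ["b", "c", "d"]]
merged = merge_windows(windows)
```

Here the first and third windows share "b" and "c", so they collapse into one window ordered a, b, c, d, while ["x", "y"] stays separate. The order of the returned groups is not significant in this sketch.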
get_all_documents(where='')
abstractmethod
¶
Get all documents in the current collection, possibly filtered by where.