
models

langroid/embedding_models/models.py

FastEmbedEmbeddingsConfig

Bases: EmbeddingModelsConfig

Configuration for embeddings computed with qdrant's FastEmbed library; see https://github.com/qdrant/fastembed

EmbeddingFunctionCallable(embed_model, batch_size=512)

A callable class that generates embeddings for a list of texts using the OpenAI or Azure OpenAI API, with automatic retries on failure.

Attributes:

- embed_model (EmbeddingModel): An instance of EmbeddingModel that provides configuration and utilities for generating embeddings.
- batch_size (int): Batch size used when generating embeddings.

Methods:

- __call__: Generate embeddings (Embeddings) for a list of input texts (List[str]).

Parameters:

- embed_model (EmbeddingModel): The embedding model to use for generating embeddings, e.g. an instance of OpenAIEmbeddings or AzureOpenAIEmbeddings. Required.
- batch_size (int): Batch size. Default: 512.
Source code in langroid/embedding_models/models.py
def __init__(self, embed_model: EmbeddingModel, batch_size: int = 512):
    """
    Initialize the EmbeddingFunctionCallable with a specific model.

    Args:
        embed_model (EmbeddingModel): An instance of EmbeddingModel (e.g.
                        OpenAIEmbeddings or AzureOpenAIEmbeddings) to use for
                        generating embeddings.
        batch_size (int): Batch size
    """
    self.embed_model = embed_model
    self.batch_size = batch_size
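The `__call__` body is not shown on this page. As a rough sketch of the batching-with-retries behaviour the description mentions, the class below splits the input into chunks of `batch_size` and retries each chunk with exponential backoff. `embed_batch`, `max_retries` and `base_delay` are hypothetical names, and the retry policy is an assumption, not langroid's actual one.

```python
import time
from typing import Callable, List, Sequence

Embedding = List[float]


class BatchedEmbeddingCallable:
    """Hypothetical stand-in for EmbeddingFunctionCallable (not langroid code):
    embeds texts in chunks of `batch_size`, retrying a failed chunk."""

    def __init__(
        self,
        embed_batch: Callable[[List[str]], List[Embedding]],
        batch_size: int = 512,
        max_retries: int = 3,
        base_delay: float = 0.5,
    ) -> None:
        if batch_size < 1 or max_retries < 1:
            raise ValueError("batch_size and max_retries must be >= 1")
        self.embed_batch = embed_batch  # hypothetical: one API call per batch
        self.batch_size = batch_size
        self.max_retries = max_retries
        self.base_delay = base_delay

    def _embed_with_retries(self, batch: List[str]) -> List[Embedding]:
        for attempt in range(self.max_retries):
            try:
                return self.embed_batch(batch)
            except Exception:
                if attempt == self.max_retries - 1:
                    raise  # give up after the last attempt
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                time.sleep(self.base_delay * 2**attempt)
        raise AssertionError("unreachable: max_retries >= 1")

    def __call__(self, texts: Sequence[str]) -> List[Embedding]:
        items = list(texts)
        result: List[Embedding] = []
        for start in range(0, len(items), self.batch_size):
            chunk = items[start : start + self.batch_size]
            result.extend(self._embed_with_retries(chunk))
        return result


# Usage with a fake embedder that returns the text length as a 1-d vector.
fake = BatchedEmbeddingCallable(
    lambda batch: [[float(len(t))] for t in batch], batch_size=2
)
print(fake(["a", "bb", "ccc"]))  # [[1.0], [2.0], [3.0]]
```

Keeping the retry loop per batch means a transient failure only re-sends the affected chunk, not texts that were already embedded.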

OpenAIEmbeddings(config=OpenAIEmbeddingsConfig())

Bases: EmbeddingModel

Source code in langroid/embedding_models/models.py
def __init__(self, config: OpenAIEmbeddingsConfig = OpenAIEmbeddingsConfig()):
    super().__init__()
    self.config = config
    load_dotenv()
    self.config.api_key = os.getenv("OPENAI_API_KEY", "")
    self.config.organization = os.getenv("OPENAI_ORGANIZATION", "")
    if self.config.api_key == "":
        raise ValueError(
            """OPENAI_API_KEY env variable must be set to use 
            OpenAIEmbeddings. Please set the OPENAI_API_KEY value 
            in your .env file.
            """
        )
    self.client = OpenAI(base_url=self.config.api_base, api_key=self.config.api_key)
    self.tokenizer = tiktoken.encoding_for_model(self.config.model_name)
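`__init__` reads `OPENAI_API_KEY` from the environment (after `load_dotenv()`) and raises `ValueError` when it is empty. Because the default `config=OpenAIEmbeddingsConfig()` is evaluated once, when the function is defined, every instance created without an explicit config shares that one config object, and the assignment to `self.config.api_key` mutates it. Below is a minimal stdlib sketch of the same key check that uses the common `None`-default idiom to get a fresh config per instance. `EmbeddingsConfig` and `EnvKeyEmbedder` are hypothetical names, and unlike the code above, this sketch keeps an explicitly supplied `api_key`.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmbeddingsConfig:
    """Hypothetical config holding only the field this sketch needs."""

    api_key: str = ""


class EnvKeyEmbedder:
    """Hypothetical class mirroring the API-key check in OpenAIEmbeddings."""

    def __init__(self, config: Optional[EmbeddingsConfig] = None) -> None:
        # None default: each instance gets its own config object.
        self.config = config if config is not None else EmbeddingsConfig()
        if not self.config.api_key:
            # Fall back to the environment only when no key was supplied.
            self.config.api_key = os.getenv("OPENAI_API_KEY", "")
        if self.config.api_key == "":
            raise ValueError(
                "OPENAI_API_KEY env variable must be set to use this embedder."
            )
```

With this shape, `EnvKeyEmbedder(EmbeddingsConfig(api_key="..."))` works without the environment variable, while `EnvKeyEmbedder()` raises `ValueError` unless `OPENAI_API_KEY` is set.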

truncate_texts(texts)

Tokenize each text and truncate it to the embedding model's context length, returning one list of token IDs per text. TODO (from the source): maybe show a warning, and consider doing T5 summarization?

Source code in langroid/embedding_models/models.py
def truncate_texts(self, texts: List[str]) -> List[List[int]]:
    """
    Truncate texts to the embedding model's context length.
    TODO: Maybe we should show warning, and consider doing T5 summarization?
    """
    return [
        self.tokenizer.encode(text, disallowed_special=())[
            : self.config.context_length
        ]
        for text in texts
    ]
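`truncate_texts` encodes each text with the tiktoken tokenizer and keeps only the first `context_length` token IDs, so it returns token IDs rather than strings. Passing `disallowed_special=()` makes tiktoken encode special-token text such as `<|endoftext|>` as ordinary text instead of raising an error. The sketch below reproduces the slicing logic with the stdlib only; `char_encode` is a hypothetical character-level tokenizer standing in for tiktoken, so its IDs are not real model tokens.

```python
from typing import Callable, List


def truncate_texts(
    texts: List[str], encode: Callable[[str], List[int]], context_length: int
) -> List[List[int]]:
    """Encode each text and keep at most `context_length` token IDs."""
    return [encode(text)[:context_length] for text in texts]


def char_encode(text: str) -> List[int]:
    # Hypothetical tokenizer: one "token" per character, ID = Unicode code point.
    return [ord(ch) for ch in text]


print(truncate_texts(["hello", "hi"], char_encode, context_length=3))
# [[104, 101, 108], [104, 105]]
```

Texts shorter than the limit pass through unchanged; only longer ones lose their trailing tokens, silently, which is what the TODO about showing a warning refers to.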

AzureOpenAIEmbeddings(config=AzureOpenAIEmbeddingsConfig())

Bases: EmbeddingModel

Azure OpenAI embeddings model implementation.

Parameters:

- config (AzureOpenAIEmbeddingsConfig): Configuration for the Azure OpenAI embeddings model. Default: AzureOpenAIEmbeddingsConfig().

Raises:

- ValueError: If required Azure config values (api_key, api_base) are not set.

Source code in langroid/embedding_models/models.py
def __init__(
    self, config: AzureOpenAIEmbeddingsConfig = AzureOpenAIEmbeddingsConfig()
):
    """
    Initializes Azure OpenAI embeddings model.

    Args:
        config: Configuration for Azure OpenAI embeddings model.
    Raises:
        ValueError: If required Azure config values are not set.
    """
    super().__init__()
    self.config = config
    load_dotenv()

    if self.config.api_key == "":
        raise ValueError(
            """AZURE_OPENAI_API_KEY env variable must be set to use 
        AzureOpenAIEmbeddings. Please set the AZURE_OPENAI_API_KEY value 
        in your .env file."""
        )

    if self.config.api_base == "":
        raise ValueError(
            """AZURE_OPENAI_API_BASE env variable must be set to use 
        AzureOpenAIEmbeddings. Please set the AZURE_OPENAI_API_BASE value 
        in your .env file."""
        )
    self.client = AzureOpenAI(
        api_key=self.config.api_key,
        api_version=self.config.api_version,
        azure_endpoint=self.config.api_base,
        azure_deployment=self.config.deployment_name,
    )
    self.tokenizer = tiktoken.encoding_for_model(self.config.model_name)
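The constructor checks `api_key` first and `api_base` second, so a user missing both values only learns about the second after fixing the first. A hypothetical variant that reports every missing setting in one error is sketched below. `AzureSettings`, `missing_azure_settings` and `validate_azure_settings` are illustrative names, not part of langroid; the environment-variable names are taken from the error messages above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AzureSettings:
    """Hypothetical subset of the Azure config fields used above."""

    api_key: str = ""
    api_base: str = ""


def missing_azure_settings(settings: AzureSettings) -> List[str]:
    # Map each env variable named in the error messages to its config value.
    required = {
        "AZURE_OPENAI_API_KEY": settings.api_key,
        "AZURE_OPENAI_API_BASE": settings.api_base,
    }
    return [name for name, value in required.items() if value == ""]


def validate_azure_settings(settings: AzureSettings) -> None:
    missing = missing_azure_settings(settings)
    if missing:
        raise ValueError(
            "Set "
            + ", ".join(missing)
            + " in your .env file to use AzureOpenAIEmbeddings."
        )
```

The trade-off is a slightly longer error message in exchange for fixing all configuration problems in one pass.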

truncate_texts(texts)

Tokenize each text and truncate it to the embedding model's context length, returning one list of token IDs per text. TODO (from the source): maybe show a warning, and consider doing T5 summarization?

Source code in langroid/embedding_models/models.py
def truncate_texts(self, texts: List[str]) -> List[List[int]]:
    """
    Truncate texts to the embedding model's context length.
    TODO: Maybe we should show warning, and consider doing T5 summarization?
    """
    return [
        self.tokenizer.encode(text, disallowed_special=())[
            : self.config.context_length
        ]
        for text in texts
    ]

embedding_fn()

Get the embedding function for Azure OpenAI.

Returns:

- Callable[[List[str]], Embeddings]: Callable that generates embeddings for input texts.

Source code in langroid/embedding_models/models.py
def embedding_fn(self) -> Callable[[List[str]], Embeddings]:
    """Get the embedding function for Azure OpenAI.

    Returns:
        Callable that generates embeddings for input texts.
    """
    return EmbeddingFunctionCallable(self, self.config.batch_size)
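`embedding_fn` returns an `EmbeddingFunctionCallable` bound to the model and its configured `batch_size`. Code that consumes it only relies on the `Callable[[List[str]], Embeddings]` signature, so any function with the same shape can stand in for it, for example in tests. In this sketch, `FakeEmbeddingModel` and `build_index` are hypothetical, and `Embeddings` is assumed to be a list of float vectors.

```python
from typing import Callable, Dict, List

Embeddings = List[List[float]]  # assumption: one float vector per input text
EmbedFn = Callable[[List[str]], Embeddings]


class FakeEmbeddingModel:
    """Hypothetical test double exposing the same embedding_fn() shape."""

    def __init__(self, dims: int = 3) -> None:
        self.dims = dims

    def embedding_fn(self) -> EmbedFn:
        def embed(texts: List[str]) -> Embeddings:
            # Deterministic vectors: the text length repeated `dims` times.
            return [[float(len(t))] * self.dims for t in texts]

        return embed


def build_index(docs: List[str], embed_fn: EmbedFn) -> Dict[str, List[float]]:
    """Hypothetical consumer: maps each document to its embedding."""
    vectors = embed_fn(docs)
    if len(vectors) != len(docs):
        raise ValueError("embedding function must return one vector per text")
    return dict(zip(docs, vectors))


index = build_index(["ab", "c"], FakeEmbeddingModel(dims=2).embedding_fn())
```

Returning a callable rather than exposing a method keeps consumers decoupled from the concrete model class.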

embedding_model(embedding_fn_type='openai')

Parameters:

- embedding_fn_type (str): Type of embedding model to use. Default: 'openai'. Options are:
  - "openai"
  - "azure-openai"
  - "fastembed"
  - "llamacppserver"
  - "sentencetransformer" (also used for any unrecognized value)
  (others may be added in the future)

Returns: The corresponding EmbeddingModel subclass (the class itself, not an instance).

Source code in langroid/embedding_models/models.py
def embedding_model(embedding_fn_type: str = "openai") -> EmbeddingModel:
    """
    Args:
        embedding_fn_type: Type of embedding model to use. Options are:
         - "openai",
         - "azure-openai",
         - "fastembed",
         - "llamacppserver", or
         - "sentencetransformer" (also used for any unrecognized value).
            (others may be added in the future)
    Returns:
        The corresponding EmbeddingModel subclass (the class, not an instance).
    """
    if embedding_fn_type == "openai":
        return OpenAIEmbeddings  # type: ignore
    elif embedding_fn_type == "azure-openai":
        return AzureOpenAIEmbeddings  # type: ignore
    elif embedding_fn_type == "fastembed":
        return FastEmbedEmbeddings  # type: ignore
    elif embedding_fn_type == "llamacppserver":
        return LlamaCppServerEmbeddings  # type: ignore
    else:  # default sentence transformer
        return SentenceTransformerEmbeddings  # type: ignore
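`embedding_model` maps a string to a class (not an instance), and any unrecognized string falls through to `SentenceTransformerEmbeddings`. The same dispatch can be written as a dictionary lookup with a default, which also makes the precise return type `Type[EmbeddingModel]` expressible without `# type: ignore`. The classes below are empty hypothetical stubs that reuse langroid's names only for readability.

```python
from typing import Dict, Type


class EmbeddingModel:  # stub base class, not langroid's
    pass


class OpenAIEmbeddings(EmbeddingModel):
    pass


class AzureOpenAIEmbeddings(EmbeddingModel):
    pass


class FastEmbedEmbeddings(EmbeddingModel):
    pass


class LlamaCppServerEmbeddings(EmbeddingModel):
    pass


class SentenceTransformerEmbeddings(EmbeddingModel):
    pass


_MODELS: Dict[str, Type[EmbeddingModel]] = {
    "openai": OpenAIEmbeddings,
    "azure-openai": AzureOpenAIEmbeddings,
    "fastembed": FastEmbedEmbeddings,
    "llamacppserver": LlamaCppServerEmbeddings,
}


def embedding_model(embedding_fn_type: str = "openai") -> Type[EmbeddingModel]:
    # Unknown types fall back to sentence-transformers, as in the if/elif chain.
    return _MODELS.get(embedding_fn_type, SentenceTransformerEmbeddings)
```

Because the function returns a class, the caller still has to instantiate it, typically with a config object.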