
Local embeddings provision via llama.cpp server

As of Langroid v0.30.0, you can use llama.cpp as a provider of embeddings for any of Langroid's vector stores, giving access to a wide variety of GGUF-compatible embedding models, e.g. nomic-ai's Embed Text V1.5.

When defining a VecDB, you can provide an instance of LlamaCppServerEmbeddingsConfig in the VecDB config to instantiate the llama.cpp embeddings-server handler, as shown in the sketch after the config block below.

To configure LlamaCppServerEmbeddingsConfig, several parameters should be adjusted to match your model and server:

```python
from langroid.embedding_models import LlamaCppServerEmbeddingsConfig

embed_cfg = LlamaCppServerEmbeddingsConfig(
    api_base="your-address-here",  # IP + port, e.g. localhost:5001
    dims=768,  # Change this to match the dimensions of your embedding model
    context_length=2048,  # Change to match the config of the model
    batch_size=2048,  # Safest to ensure this matches context_length
)
```
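Once defined, the config is plugged into a vector store via the `embedding` field of the store's config. Below is a minimal sketch using Qdrant in local mode; the collection name and storage path are hypothetical placeholders, and the field names assume Langroid's standard vector-store config interface:

```python
from langroid.vector_store.qdrantdb import QdrantDBConfig

vecdb_cfg = QdrantDBConfig(
    collection_name="llamacpp-embeddings-demo",  # hypothetical collection name
    storage_path=".qdrant/llamacpp-demo/",  # hypothetical local storage path
    cloud=False,  # use local storage rather than Qdrant cloud
    embedding=embed_cfg,  # the LlamaCppServerEmbeddingsConfig defined above
)
```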

The embed_cfg above is sufficient for a llama.cpp server running the example nomic embedding model, launched with the command:

```bash
./llama-server -ngl 100 -c 2048 -m ~/nomic-embed-text-v1.5.Q8_0.gguf --host IP_ADDRESS --port PORT --embeddings -b 2048 -ub 2048
```
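Before wiring the server into Langroid, it can help to sanity-check it directly. The sketch below assumes llama.cpp's native `/embedding` endpoint and the example address `localhost:5001`; the exact route and response shape can vary across llama.cpp versions, so inspect the output rather than relying on a fixed schema:

```python
import requests

# Hypothetical address; use the host/port you passed to llama-server.
resp = requests.post(
    "http://localhost:5001/embedding",
    json={"content": "hello world"},
)
resp.raise_for_status()

# Inspect the response: the embedding vector's length should match the
# `dims` value (768 here) set in LlamaCppServerEmbeddingsConfig.
print(resp.json())
```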

An example setup can be found in examples/docqa/chat.py.
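For a self-contained illustration, the following sketch ingests two documents and runs a similarity search against the llama.cpp-backed store. The names used here (`Document`, `DocMetaData`, `add_documents`, `similar_texts_with_scores`) reflect Langroid's usual vector-store interface, so treat this as a sketch under those assumptions rather than a verbatim excerpt from that example:

```python
from langroid.mytypes import Document, DocMetaData
from langroid.vector_store.qdrantdb import QdrantDB

vecdb = QdrantDB(vecdb_cfg)  # embeddings are computed by the llama.cpp server

docs = [
    Document(content="llama.cpp serves GGUF embedding models.", metadata=DocMetaData()),
    Document(content="Langroid supports several vector stores.", metadata=DocMetaData()),
]
vecdb.add_documents(docs)

# Each query is embedded via the llama.cpp server before the search runs.
for doc, score in vecdb.similar_texts_with_scores("GGUF models", k=1):
    print(score, doc.content)
```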