Setup¶
Install¶
Ensure you are using Python 3.11. It is best to work in a virtual environment:
# go to your repo root (which may be langroid-examples)
cd <your repo root>
python3 -m venv .venv
. ./.venv/bin/activate
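To confirm that the activated environment matches the requirement above, a quick check like the following can help. This is a minimal sketch and not part of Langroid: the helper names `is_supported_python` and `in_virtualenv` are hypothetical, and it assumes "Python 3.11" means any 3.11.x release.

```python
import sys


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True for a 3.11.x interpreter (assumed reading of the requirement)."""
    return tuple(version_info[:2]) == (3, 11)


def in_virtualenv() -> bool:
    """Return True when running inside an environment created by `python3 -m venv`."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base interpreter installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python 3.11:", is_supported_python())
    print("Virtual environment active:", in_virtualenv())
```

Running this with the interpreter from `.venv` should report both checks as true if the setup steps succeeded.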
Take a look at the langroid-examples repo, which can be a good starting point for your own repo.
The langroid-examples repo already contains a pyproject.toml file, so you can use Poetry to manage your virtual environment and dependencies. For example, you can run poetry install to install the dependencies listed in pyproject.toml.
Alternatively, use pip to install langroid into your virtual environment:
pip install langroid
The core Langroid package lets you use OpenAI Embeddings models via their API. If you instead want to use the sentence-transformers embedding models from HuggingFace, install Langroid like this:
- For the doc-chat features, use the doc-chat extra: pip install "langroid[doc-chat]"
- For "chat with databases", use the db extra: pip install "langroid[db]"
- You can specify multiple extras by separating them with commas, e.g.: pip install "langroid[doc-chat,db]"
- To simply install all optional dependencies, use the all extra (but note that this will result in longer load/startup times and a larger install size): pip install "langroid[all]"
Optional Installs for using SQL Chat with a PostgreSQL DB
If you are using SQLChatAgent (e.g. the script examples/data-qa/sql-chat/sql_chat.py) with a postgres db, you will need to:
- Install PostgreSQL dev libraries for your platform, e.g. sudo apt-get install libpq-dev on Ubuntu, brew install postgresql on Mac, etc.
- Install langroid with the postgres extra, e.g. pip install langroid[postgres] or poetry add langroid[postgres] or poetry install -E postgres. If this gives you an error, try pip install psycopg2-binary in your virtualenv.
Work in a nice terminal, such as iTerm2, rather than a notebook
All of the examples we will go through are command-line applications. For the best experience, we recommend working in a terminal that supports colored output, such as iTerm2.
OpenAI GPT-4/GPT-4o is required
The various LLM prompts and instructions in Langroid have been tested to work well with GPT-4 (and to some extent GPT-4o). Switching to other LLMs (local/open and proprietary) is easy (see guides mentioned below), and may suffice for some applications, but in general you may see inferior results unless you adjust the prompts and/or the multi-agent setup.
mysqlclient errors
If you get strange errors involving mysqlclient, try doing pip uninstall mysqlclient followed by pip install mysqlclient.
Set up tokens/keys¶
To get started, all you need is an OpenAI API Key. If you don't have one, see this OpenAI Page. (Note that while this is the simplest way to get started, Langroid works with practically any LLM, not just those from OpenAI. See the guides to using Open/Local LLMs, and other non-OpenAI proprietary LLMs.)
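In the root of the repo, copy the .env-template file to a new file .env (e.g. cp .env-template .env), then add your OpenAI API key to it. Your .env file should look like this:
OPENAI_API_KEY=your-key-here-without-quotes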
Alternatively, you can set this as an environment variable in your shell (you will need to do this every time you open a new shell):
export OPENAI_API_KEY=your-key-here-without-quotes
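Once the key is in the process environment (exported in the shell, or loaded from .env by whatever loader you use), a script can read it under the name OPENAI_API_KEY. As a minimal sketch, the helper below fails early with a readable message when the key is missing. The name `require_openai_key` is hypothetical and not part of Langroid.

```python
import os
from typing import Mapping


def require_openai_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI key found in `env`, or raise with a hint on how to set it."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it to .env or export it in your shell"
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real shell environment.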
All of the following environment variable settings are optional, and some are only needed to use specific features (as noted below).
- Qdrant Vector Store API Key, URL. This is only required if you want to use Qdrant cloud. Langroid uses LanceDB as the default vector store in its DocChatAgent class (for RAG). Alternatively, Chroma is also currently supported. We use the local-storage version of Chroma, so there is no need for an API key.
- Redis Password, host, port: This is optional, and only needed to cache LLM API responses using Redis Cloud. Redis offers a free 30MB Redis account, which is more than sufficient to try out Langroid and even beyond. If you don't set these up, Langroid will use a pure-python Redis in-memory cache via the Fakeredis library.
- Momento Serverless Caching of LLM API responses (as an alternative to Redis). To use Momento instead of Redis:
  - enter your Momento Token in the .env file, as the value of MOMENTO_AUTH_TOKEN (see example file below),
  - in the .env file set CACHE_TYPE=momento (instead of CACHE_TYPE=redis, which is the default).
- GitHub Personal Access Token (required for apps that need to analyze git repos; token-based API calls are less rate-limited). See this GitHub page.
- Google Custom Search API Credentials: Only needed to enable an Agent to use the GoogleSearchTool. To use Google Search as an LLM Tool/Plugin/function-call, you'll need to set up a Google API key, then set up a Google Custom Search Engine (CSE) and get the CSE ID. (Documentation for these can be challenging; we suggest asking GPT-4 for a step-by-step guide.) After obtaining these credentials, store them as values of GOOGLE_API_KEY and GOOGLE_CSE_ID in your .env file. Full documentation on using this (and other such "stateless" tools) is coming soon, but in the meantime take a peek at the test tests/main/test_google_search_tool.py to see how to use it.
If you add all of these optional variables, your .env file should look like this:
OPENAI_API_KEY=your-key-here-without-quotes
GITHUB_ACCESS_TOKEN=your-personal-access-token-no-quotes
CACHE_TYPE=redis # or momento
REDIS_PASSWORD=your-redis-password-no-quotes
REDIS_HOST=your-redis-hostname-no-quotes
REDIS_PORT=your-redis-port-no-quotes
MOMENTO_AUTH_TOKEN=your-momento-token-no-quotes # instead of REDIS* variables
QDRANT_API_KEY=your-key
QDRANT_API_URL=https://your.url.here:6333 # note port number must be included
GOOGLE_API_KEY=your-key
GOOGLE_CSE_ID=your-cse-id
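To sanity-check a file like the one above, the following sketch parses KEY=value lines and reports which cache backend the settings would select according to the rules listed earlier. It is an illustration only: `parse_env_text` and `cache_backend` are hypothetical helpers, not Langroid functions, and the parser assumes unquoted values with optional ` # ...` inline comments, as in the example file. A full-featured loader such as python-dotenv handles quoting and escapes that this sketch ignores.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank and comment lines."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat " #" as the start of an inline comment, as in the example file.
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def cache_backend(values: Dict[str, str]) -> str:
    """Describe which cache the settings select, per the rules in this section."""
    cache_type = values.get("CACHE_TYPE", "redis")  # redis is the documented default
    if cache_type == "momento":
        if values.get("MOMENTO_AUTH_TOKEN"):
            return "momento"
        return "momento (token missing)"
    redis_keys = ("REDIS_PASSWORD", "REDIS_HOST", "REDIS_PORT")
    if all(values.get(name) for name in redis_keys):
        return "redis"
    # Without Redis settings, the text above says Fakeredis is used instead.
    return "fakeredis (in-memory fallback)"


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        settings = parse_env_text(handle.read())
    print("Cache backend:", cache_backend(settings))
```

Keeping the parsing separate from the cache decision means each piece can be tried on a short string before pointing it at a real .env file.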
Microsoft Azure OpenAI setup [Optional]¶
This section applies only if you are using Microsoft Azure OpenAI. When using Azure OpenAI, additional environment variables are required in the .env file. The Microsoft Azure OpenAI page provides more information, and you can set each environment variable as follows:
- AZURE_OPENAI_API_KEY, from the value of API_KEY.
- AZURE_OPENAI_API_BASE, from the value of ENDPOINT; it typically looks like https://your.domain.azure.com.
- For AZURE_OPENAI_API_VERSION, you can use the default value in .env-template; the latest version is listed in Microsoft's Azure OpenAI documentation.
- AZURE_OPENAI_DEPLOYMENT_NAME is the name of the deployed model, which is defined by the user during the model setup.
- AZURE_OPENAI_MODEL_NAME: Azure OpenAI allows only specific model names when you select the model for your deployment. You need to enter exactly the model name that was selected. For example, GPT-3.5 (should be gpt-35-turbo-16k or gpt-35-turbo) or GPT-4 (should be gpt-4-32k or gpt-4).
- AZURE_OPENAI_MODEL_VERSION is required if AZURE_OPENAI_MODEL_NAME is gpt-4; it helps Langroid determine the cost of the model.
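The Azure settings above can be checked with a small helper before running anything. This is a sketch under stated assumptions: `missing_azure_vars` is a hypothetical name, not a Langroid function, and it treats the first five variables as always required and the model version as required only when the model name is exactly gpt-4, which is one reading of the list above.

```python
from typing import List, Mapping

# Variables assumed to be needed for every Azure OpenAI deployment.
REQUIRED_AZURE_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_BASE",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_MODEL_NAME",
)


def missing_azure_vars(env: Mapping[str, str]) -> List[str]:
    """Return the names of Azure settings that are absent or empty in `env`."""
    missing = [name for name in REQUIRED_AZURE_VARS if not env.get(name)]
    # The model version is only described as required for gpt-4 deployments.
    needs_version = env.get("AZURE_OPENAI_MODEL_NAME") == "gpt-4"
    if needs_version and not env.get("AZURE_OPENAI_MODEL_VERSION"):
        missing.append("AZURE_OPENAI_MODEL_VERSION")
    return missing
```

Calling `missing_azure_vars(os.environ)` after `import os` returns an empty list when nothing is missing, so the result can be used directly in an `if` check.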
Next steps¶
Now you should be ready to use Langroid! As a next step, you may want to see how you can use Langroid to interact directly with the LLM (OpenAI GPT models only for now).