Simple RAG pipeline: Dockerized, open source

Published 4 days ago

https://github.com/Emissary-Tech/legit-rag

Legit-RAG

A modular Retrieval-Augmented Generation (RAG) system built with FastAPI, Qdrant, and OpenAI.

System Components

Components - Individual RAG components
Workflow Components - RAG workflow implementation
Logging System - Event logging and visualization

Workflow Components

The system follows a 5-step RAG workflow:

1. Query Routing (router.py)
- Determines whether a query can be answered (ANSWER), needs clarification (CLARIFY), or should be rejected (REJECT)
- Uses an LLM to make the routing decision
- Extensible through the BaseRequestRouter interface
2. Query Reformulation (reformulator.py)
- Refines the original query for better retrieval
- Extracts keywords for hybrid search
- Implements the BaseQueryReformulator interface, so custom reformulation strategies can be swapped in
3. Context Retrieval (retriever.py)
- Performs hybrid search combining:
  - Semantic search using embeddings
  - Keyword-based search
- Currently uses Qdrant for vector storage
- Extensible through the BaseRetriever interface
4. Completion Check (completion_checker.py)
- Evaluates whether the retrieved context is sufficient to answer the query
- Returns a confidence score
- Uses a threshold that can be customized through configuration
- Implements the BaseCompletionChecker interface
5. Answer Generation (answer_generator.py)
- Generates the final response from the retrieved context
- Includes relevant citations
- Provides a confidence score
- Extensible through the BaseAnswerGenerator interface

Extensibility

The system is designed for easy extension and modification:

LLM Providers
- Currently uses OpenAI
- Can be extended to support other providers (Anthropic, Bedrock, etc.)
- Each component uses abstract base classes for provider independence
Vector Databases
- Currently implements Qdrant
- Can be extended to support other vector DBs (Pinecone, Weaviate, etc.)
- Abstract BaseRetriever interface for new implementations
Document Management
- Flexible document model with metadata support
- Extensible for different document types and sources
Search Strategies
- Hybrid search combining semantic and keyword approaches
- Customizable result merging strategies
- Extensible for additional search methods
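
The Search Strategies item mentions customizable result merging. Reciprocal Rank Fusion (RRF) is one widely used way to merge a semantic ranking with a keyword ranking, and the sketch below applies it to plain document IDs. It illustrates a common technique and is not necessarily the merging strategy legit-rag ships with. The constant k = 60 is the conventional RRF default, not a project setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document IDs into one fused ranking.

    Each list adds 1 / (k + rank) to every ID it contains (rank starts at 1),
    so documents ranked highly by either search rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Take semantic results [a, b, c] and keyword results [b, d]. Document b wins because both searches return it, and the fused order is b, a, d, c. Because RRF uses only ranks, it sidesteps the problem that cosine similarities and keyword scores sit on different scales.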

Setup and Installation

Prerequisites

Python 3.10+
Docker and Docker Compose
OpenAI API key

Setup Steps

Clone the repository:

git clone https://github.com/Emissary-Tech/legit-rag.git
cd legit-rag

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Create .env file:

cp .env.example .env

Edit .env and add your OpenAI API key:

OPENAI_API_KEY=your-key-here

Running the System

Start our API server and the Qdrant vector database:

docker-compose up -d

The API will be available at http://localhost:8000
The Qdrant db will be available at http://localhost:6333
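
docker-compose up -d returns as soon as the containers start, which may be before the services accept requests. A quick reachability check is handy here. This standard-library sketch treats any HTTP response, even an error status, as "up". Probing /docs for the API reuses the Swagger UI path listed under API Documentation, and the Qdrant URL is the one given above.

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to url gets any response within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a non-2xx status, so it is still reachable.
        return True
    except OSError:
        # URLError, refused connections and timeouts all derive from OSError.
        return False


if __name__ == "__main__":
    for name, url in [("API", "http://localhost:8000/docs"),
                      ("Qdrant", "http://localhost:6333")]:
        print(f"{name}: {'up' if is_up(url) else 'down'}")
```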

To run the API server directly (e.g. in a debugger), first stop the API server's Docker container, then start it with:

python -m src.api

API Endpoints

Add Documents

POST /documents
{
    "documents": [
        {
            "text": "Your document text here",
            "metadata": {"source": "wiki", "topic": "example"}
        }
    ]
}

Query

POST /query
{
    "query": "Your question here"
}

Example Usage

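import requests

API_URL = "http://localhost:8000"

# Add documents
docs = {
    "documents": [
        {
            "text": "Example document text",
            "metadata": {"source": "example"}
        }
    ]
}
response = requests.post(f"{API_URL}/documents", json=docs, timeout=30)
response.raise_for_status()  # fail loudly on a non-2xx status

# Query (LLM calls can be slow, so allow a generous timeout)
query = {
    "query": "What does the document say?"
}
response = requests.post(f"{API_URL}/query", json=query, timeout=60)
response.raise_for_status()
print(response.json())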

API Documentation

Once the server is running, you can access the API documentation at:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Configuration

Key configuration options in config.py:

LLM models for each component
Vector DB settings
Completion threshold
API endpoints and ports
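
The README lists what config.py controls but not its contents. The sketch below shows one way such settings could be grouped: a frozen dataclass with environment-variable overrides. Every field name, the RAG_* variable names, the placeholder model names and the 0.5 threshold are hypothetical. Only the Qdrant URL and the API port come from this README.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field

# Hypothetical component keys for the per-component LLM model setting.
LLM_COMPONENTS = ("router", "reformulator", "completion_checker", "answer_generator")


@dataclass(frozen=True)
class Settings:
    # LLM model for each component (placeholder names, not real defaults).
    llm_models: dict[str, str] = field(
        default_factory=lambda: {name: "your-model-name" for name in LLM_COMPONENTS}
    )
    # Vector DB settings: the URL matches the Qdrant port used by docker-compose.
    qdrant_url: str = "http://localhost:6333"
    collection_name: str = "documents"
    # Minimum completion-check confidence needed before an answer is generated.
    completion_threshold: float = 0.5
    # API server binding; 8000 matches the port used above.
    api_host: str = "0.0.0.0"
    api_port: int = 8000

    def __post_init__(self) -> None:
        if not 0.0 <= self.completion_threshold <= 1.0:
            raise ValueError("completion_threshold must be between 0 and 1")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings, letting hypothetical RAG_* variables override the defaults."""
    return Settings(
        qdrant_url=env.get("RAG_QDRANT_URL", Settings.qdrant_url),
        completion_threshold=float(
            env.get("RAG_COMPLETION_THRESHOLD", Settings.completion_threshold)
        ),
        api_port=int(env.get("RAG_API_PORT", Settings.api_port)),
    )
```

Freezing the dataclass stops components from mutating shared settings at runtime. Validating the threshold in __post_init__ catches a bad override at startup rather than on the first query.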

Future Enhancements

Provider-agnostic LLM interface
Support for streaming responses
Additional vector database implementations
Enhanced document preprocessing
Caching layer for frequent queries
Batch document processing
Advanced result ranking strategies