Cortex

Self-hosted RAG engine with hybrid semantic + lexical retrieval

PythonFastAPINext.js 15React 19OpenAI APIFAISSSQLiteDockerAWSTypeScriptTailwind CSS
Cortex RAG Architecture: Hybrid Retrieval System combining Semantic Search (FAISS vectors), BM25 Lexical Search, and Reciprocal Rank Fusion (RRF) for optimal document retrieval across multi-format documents

Cortex: Production-Ready RAG System

Overview

Cortex is a self-hosted Retrieval Augmented Generation (RAG) system that enables semantic understanding and querying across multi-format documents. It combines intelligent memory management, hybrid retrieval strategies, and confidence scoring to deliver accurate, context-aware responses.

Problem

Organizations struggle to extract insights from large document collections. Traditional search methods fail to understand semantic meaning, and manual document analysis is time-consuming and inconsistent. Teams spend hours reviewing documents to find relevant information, leading to inefficient workflows and missed insights.

Solution

Cortex implements a production-ready RAG architecture with three core innovations:

  1. Intelligent Memory System: Maintains context-aware sessions with conversation history, allowing for multi-turn interactions that build on previous queries.

  2. Hybrid Retrieval: Combines Semantic Search (FAISS vectors), BM25 Lexical Search, and Reciprocal Rank Fusion (RRF) to maximize retrieval accuracy across different query types.

  3. Confidence Scoring: Provides visual confidence badges and source attribution, helping users understand the reliability of generated answers.

Architecture

The system follows a modular architecture:

  • Backend: FastAPI serves as the API layer, handling document processing, embedding generation, and LLM interactions
  • Vector Store: FAISS enables fast similarity search across document embeddings
  • Database: SQLite stores session metadata and conversation history
  • Frontend: Next.js 15 with React 19 provides a responsive, modern interface
  • Deployment: Docker Compose orchestrates services with Nginx as a reverse proxy on AWS

Technical Breakdown

Key Technologies

  • Python for backend processing and AI integration
  • FastAPI for high-performance API endpoints
  • FAISS for efficient vector similarity search
  • OpenAI API for embeddings and LLM inference
  • Next.js 15 with React 19 for the frontend
  • Docker for containerization and deployment
  • AWS EC2 for hosting

Challenges Solved

  1. Document Chunking: Implemented intelligent chunking that preserves context while respecting token limits
  2. Hybrid Search: Balanced semantic and lexical retrieval using RRF to handle both conceptual and keyword queries
  3. Memory Management: Built session-based memory that maintains conversation context without storing full histories
  4. Confidence Scoring: Developed a scoring mechanism that evaluates retrieval quality and answer relevance

Results

  • 95%+ accuracy in document retrieval and question answering
  • Time reduction: Document analysis time reduced from hours to minutes
  • Scalable architecture: Handles large document collections with efficient vector search
  • Production-ready: Containerized deployment enables easy setup and maintenance

What I Learned

Building Cortex taught me the importance of balancing semantic and lexical search strategies. The hybrid approach significantly outperformed either method alone. Additionally, implementing session-based memory required careful design to balance context retention with performance.

Next Steps

Future enhancements could include:

  • Support for additional document formats (Markdown, CSV)
  • Multi-tenant architecture for enterprise deployments
  • Advanced caching strategies for faster response times
  • Integration with more LLM providers for flexibility

Key Features & Capabilities

Intelligent Memory System for context-aware sessions and conversation history

Hybrid Retrieval combining Semantic Search, BM25 Lexical Search, and Reciprocal Rank Fusion

Confidence Scoring with source attribution and visual confidence badges

Multi-format document pipeline (PDF/DOCX/TXT) with session-based metadata

Modular LLM integration supporting multiple models and providers

Vector similarity search using FAISS for semantic document retrieval

Containerized deployment with Docker Compose and Nginx

Responsive web interface with modern UX patterns