LegalAI-tech/LegalAI-FastApi-Backend

⚖️ Nyay Mitra

Production-Grade Agentic Legal AI Platform for Indian Law

Multi-agent reasoning · Adaptive RAG · Lawyer workspace · Multilingual support


📌 Overview

Nyay Mitra is a full-stack, production-grade legal AI platform built for the Indian legal system. Instead of the traditional single-LLM chatbot pattern, it uses a LangGraph-based agentic execution pipeline: a controlled, parallel, role-aware multi-agent reasoning system.

Unlike conventional chatbots, Nyay Mitra:

  • Routes queries through a deterministic + semantic intent classification system
  • Executes multiple agents in parallel (RAG, Chat, Document Analysis)
  • Aggregates and synthesizes results with context-aware memory
  • Supports lawyer workspace mode with persistent matter context
  • Handles multilingual Indian legal queries with translation pipelines

✨ Features

For Citizens (Normal Users)

  • Normal Mode — Direct lightweight chat for quick legal queries
  • Agent Mode — Full LangGraph execution with RAG, Document Analysis, and Chat
  • Document Upload & Chat — Upload PDFs/text files and ask questions
  • Multilingual Support — Query in Hindi, Bengali, Urdu, Tamil, Telugu, and more
  • Contextual Memory — Rolling conversation summary for follow-on queries

For Lawyers

  • Matter-Based Workspace — Persistent legal memory per matter (case)
  • Workspace Memory Injection — Party summaries, legal issues, chronology injected into every AI response
  • Multi-Document RAG — Vector-indexed matter documents with semantic retrieval
  • Document Analysis — Structured legal analysis with clause-by-clause breakdown
  • Contract Review — Professional (lawyer) and client-friendly review modes
  • Client Communications — Auto-generate WhatsApp messages, emails, and voice note scripts
  • Team Collaboration — Multiple lawyers on a matter with role-based access

Platform

  • Adaptive RAG — Direct, map-reduce, and section-based retrieval strategies based on document size
  • Parallel Execution — LangGraph fan-out with asyncio.gather for concurrent agent execution
  • Intelligent Synthesis — Single LLM call produces both final response and updated memory summary
  • Document Deduplication — SHA-256 content hashing prevents duplicate ingestion
  • HuggingFace Hub Storage — Documents stored on HF Hub with background reindexing on restart
  • Legal Document Generation — Jinja2 template-based legal draft generation
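
The deduplication bullet above can be illustrated with a minimal sketch. It assumes the hash is taken over the raw file bytes and kept in an in-memory dict; `DocumentRegistry` and its method names are hypothetical and are not taken from the repository.

```python
import hashlib


class DocumentRegistry:
    """Hypothetical in-memory registry mapping content hashes to document IDs."""

    def __init__(self) -> None:
        self._by_hash = {}

    def register(self, content: bytes, document_id: str) -> tuple:
        """Return (document_id, is_new). Identical bytes map to the first ID seen."""
        digest = hashlib.sha256(content).hexdigest()
        existing = self._by_hash.get(digest)
        if existing is not None:
            return existing, False  # duplicate: skip chunking and embedding
        self._by_hash[digest] = document_id
        return document_id, True


registry = DocumentRegistry()
first = registry.register(b"lease agreement text", "doc-1")
second = registry.register(b"lease agreement text", "doc-2")
```

Here the second upload resolves to the ID of the first, so the caller can reuse the existing vectors instead of ingesting the file again.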

🏗️ System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Frontend (React)                         │
└─────────────────────────┬───────────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────────┐
│              Node.js BFF (Express + Prisma)                     │
│                                                                 │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────────┐  │
│  │ Citizen API │  │  Lawyer API  │  │  Workspace Memory API  │  │
│  │  /citizen   │  │   /lawyer    │  │  Matter · Documents    │  │
│  └─────────────┘  └──────────────┘  └────────────────────────┘  │
│                                                                 │
│  PostgreSQL (via Prisma) — Conversations, Matters, Memory       │
└─────────────────────────┬───────────────────────────────────────┘
                          │ HTTP
┌─────────────────────────▼───────────────────────────────────────┐
│              Python FastAPI Backend (v3)                        │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                  LangGraph Orchestrator                  │   │
│  │                                                          │   │
│  │workspace_context → language_detection → translation_query│   │
│  │                            ↓                   ↓         │   │
│  │                      intent_router ← document_ingestion  │   │
│  │                             ↓                            │   │
│  │              ┌──────────────┬──────────┐                 │   │
│  │              ▼              ▼          ▼                 │   │
│  │          chat_node      rag_node   doc_analysis          │   │
│  │              └──────────────┴──────────┘                 │   │
│  │                            ↓                             │   │
│  │                       aggregator                         │   │
│  │                            ↓                             │   │
│  │                       synthesizer                        │   │
│  │                            ↓                             │   │
│  │                   translation_response                   │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │
│  │  RAG Service │  │ Chat Service │  │ Translation Service  │   │
│  │  ChromaDB    │  │  Gemini LLM  │  │   Gemini (Argos      │   │
│  │  Gemini Emb  │  │  Rolling Mem │  │   fallback planned)  │   │
│  └──────────────┘  └──────────────┘  └──────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
                          │
              ┌───────────▼────────────┐
              │   HuggingFace Hub      │
              │   Document Storage     │
              │   (PDF/TXT uploads)    │
              └────────────────────────┘

🧠 LangGraph Execution Pipeline

Execution Graph

[START]
   │
   ▼
workspace_context_node     ← Injects matter/workspace context for lawyer mode
   │
   ▼
language_detection_node    ← Detects language, sets translation flags
   │                          Early exit if input_language = "en"
   ├──[needs translation]──► translation_query_node
   │
   ▼
document_ingestion_node    ← SHA-256 dedup, chunking, Gemini embeddings → ChromaDB
   │                          Skips if no storage_url
   ▼
intent_router_node         ← 3-layer routing:
   │                          L1: Deterministic (has_document, is_summary, is_web)
   │                          L2: LLM semantic classifier (7 intent classes)
   │                          L3: Safety net (chat fallback)
   │
   ├──[use_chat]────────────► chat_node
   ├──[use_rag]─────────────► rag_node          ← Parallel fan-out
   ├──[use_document_analysis]► doc_analysis_node ← via asyncio.gather
   └──[use_web]─────────────► web_node (planned)
                │
                ▼
          aggregator_node    ← Collects outputs from all parallel nodes
                │
                ▼
          synthesizer_node   ← Context-aware LLM synthesis
                │              previous_summary + all outputs → final_response + updated_summary
                │              Single LLM call for both outputs
                ├──[chat only]──► passthrough (no LLM call)
                │
                ▼
   translation_response_node ← Translates response if output_language ≠ en
                │
                ▼
            [END] → final_response + updated_summary → BFF
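
The fan-out and aggregation steps in the graph above can be sketched with plain `asyncio`. This is an illustration under assumptions, not the repository's LangGraph wiring: the node bodies are stand-ins, and `fan_out` and `NODES` are hypothetical names.

```python
import asyncio


async def chat_node(query: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for an LLM call
    return {"chat": f"chat answer for: {query}"}


async def rag_node(query: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for retrieval plus an LLM call
    return {"rag": f"rag answer for: {query}"}


# Routing flag -> node coroutine (flag names follow the diagram above).
NODES = {"use_chat": chat_node, "use_rag": rag_node}


async def fan_out(query: str, flags: dict) -> dict:
    """Run every node whose routing flag is set, then merge the outputs."""
    selected = [node for flag, node in NODES.items() if flags.get(flag)]
    results = await asyncio.gather(*(node(query) for node in selected))
    merged = {}
    for result in results:  # aggregator step: collect all node outputs
        merged.update(result)
    return merged


outputs = asyncio.run(fan_out("compare clauses", {"use_chat": True, "use_rag": True}))
```

Because the selected nodes are awaited together, total latency is bounded by the slowest node rather than the sum of all of them.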

Intent Classification (3 Layers)

| Layer | Trigger | Method | Latency |
|---|---|---|---|
| L1 Deterministic | has_document, is_summary, is_web | State-based rules | ~0ms |
| L2 LLM Semantic | General queries, no document | Gemini classifier (7 intents) | ~1-2s |
| L3 Safety Net | LLM timeout/failure | Fallback to chat | ~0ms |

7 Intent Classes:

legal_explanation  → chat only
document_question  → rag only
document_summary   → document_analysis only
comparison         → rag + chat
legal_research     → web + chat
greeting           → chat only
followup           → chat only
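
A minimal sketch of the three routing layers follows. The intent-to-agent mapping is copied from the list above; the exact L1 rule order, the `route` function, and the two stand-in classifiers are assumptions made for illustration.

```python
import asyncio

INTENT_TO_AGENTS = {
    "legal_explanation": ["chat"],
    "document_question": ["rag"],
    "document_summary": ["document_analysis"],
    "comparison": ["rag", "chat"],
    "legal_research": ["web", "chat"],
    "greeting": ["chat"],
    "followup": ["chat"],
}


async def route(state: dict, classify, timeout: float = 10.0) -> list:
    """Return the agents to run for this request."""
    # L1: deterministic rules on state flags (the rule order here is assumed)
    if state.get("has_document"):
        return ["document_analysis"] if state.get("is_summary") else ["rag"]
    if state.get("is_web"):
        return ["web"]
    # L2: semantic classification, bounded by a timeout
    try:
        intent = await asyncio.wait_for(classify(state["query"]), timeout)
        return list(INTENT_TO_AGENTS[intent])
    except Exception:
        # L3: timeout, classifier error, or unknown label falls back to chat
        return ["chat"]


async def fake_classifier(query: str) -> str:
    return "comparison"  # stand-in for the Gemini classifier


async def slow_classifier(query: str) -> str:
    await asyncio.sleep(1)  # simulates a call that exceeds the timeout
    return "greeting"
```

Calling `route` with `slow_classifier` and a short timeout exercises the safety net: the classifier is cancelled and the request falls back to chat.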

🔄 Adaptive RAG System

Retrieval Strategy Selection

Document chunks:
  ≤ 6 chunks  → Direct analysis (single LLM call)
  7–20 chunks → Summarization analysis (parallel batch LLM calls)
  > 20 chunks → Section-based analysis (concurrent section LLM calls)

Query retrieval:
  ≤ 4 chunks  → Direct RAG (single LLM call)
  > 4 chunks  → Map-Reduce RAG (parallel batches → reduce)

RAG Pipeline

Query → Semantic Embedding → ChromaDB Vector Search
     → Top-K Retrieval (semantic / full mode)
     → Context Construction (token budget: 8000)
     → Direct / Map-Reduce LLM Execution
     → Answer + Sources
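
The choice between direct and map-reduce execution can be sketched as below. The thresholds come from the configuration reference; `fake_llm`, `answer_query`, and the `calls` list are hypothetical stand-ins, and the real prompts and retrieval step are not shown.

```python
import asyncio

RAG_DIRECT_THRESHOLD = 4  # at or below this many chunks: single call
RAG_BATCH_SIZE = 4        # chunks per map batch
MAX_CONCURRENT_CALLS = 5  # mirrors Semaphore(5)

calls = []  # records every prompt so the number of LLM calls is visible


async def fake_llm(prompt: str) -> str:
    """Stand-in for a Gemini call."""
    calls.append(prompt)
    await asyncio.sleep(0.01)
    return f"partial answer ({len(prompt)} chars)"


async def answer_query(query: str, chunks: list) -> str:
    if len(chunks) <= RAG_DIRECT_THRESHOLD:
        # Direct RAG: one call over all retrieved chunks
        return await fake_llm(query + "\n" + "\n".join(chunks))

    semaphore = asyncio.Semaphore(MAX_CONCURRENT_CALLS)

    async def map_batch(batch: list) -> str:
        async with semaphore:  # cap concurrent LLM calls
            return await fake_llm(query + "\n" + "\n".join(batch))

    batches = [chunks[i:i + RAG_BATCH_SIZE] for i in range(0, len(chunks), RAG_BATCH_SIZE)]
    partials = await asyncio.gather(*(map_batch(b) for b in batches))
    # Reduce: one final call over the partial answers
    return await fake_llm(query + "\n" + "\n".join(partials))


final = asyncio.run(answer_query("notice period?", [f"chunk {i}" for i in range(10)]))
```

Ten chunks produce three map batches (4, 4, 2) plus one reduce call, so `calls` ends up with four entries.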

Performance Metrics

| Scenario | Before Optimization | After Optimization |
|---|---|---|
| Document Analysis (13 chunks) | ~46s sequential | ~5-8s parallel |
| RAG Query (large doc) | ~36s sequential | ~6-8s map-reduce |
| English query routing | ~5s (LLM detection) | ~0ms (early exit) |
| First doc upload | | ~5-10s |
| Follow-on doc query | | ~3-5s |

🧩 Agent Internals

general_chat_agent

  • Wraps chat_service.get_chatbot_response()
  • Memory: previous_summary (rolling, 10-sentence LLM summary) + history[-6:] (exact window)
  • Two concurrent LLM calls: chat response + summary update (asyncio.gather)
  • Workspace context injection for lawyer mode

rag_agent

  • Selects retrieval strategy based on document size and query type
  • Meta-query detection (avoids vector search for "what document is this?" type queries)
  • Two prompt variants: semantic precision and comprehensive retrieval

document_analysis_agent

  • Three analysis tiers (direct, summarization, section-based)
  • Analysis cache: keyed by document_id:mode:top_k, max 100 entries FIFO
  • All batch/section LLM calls parallelized
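
The tier selection and the FIFO cache described above could look roughly like this. The chunk thresholds and the `document_id:mode:top_k` key format come from this README; `analysis_tier` and `AnalysisCache` are hypothetical names, and an `OrderedDict` is only one way to get FIFO eviction.

```python
from collections import OrderedDict


def analysis_tier(chunk_count: int) -> str:
    """Pick an analysis tier from the number of document chunks."""
    if chunk_count <= 6:
        return "direct"
    if chunk_count <= 20:
        return "summarization"
    return "section_based"


class AnalysisCache:
    """FIFO cache: the oldest inserted entry is evicted first."""

    def __init__(self, max_entries: int = 100) -> None:
        self._entries = OrderedDict()
        self._max = max_entries

    @staticmethod
    def key(document_id: str, mode: str, top_k: int) -> str:
        return f"{document_id}:{mode}:{top_k}"

    def get(self, key: str):
        return self._entries.get(key)  # reads do not reorder: FIFO, not LRU

    def put(self, key: str, value: str) -> None:
        if key not in self._entries and len(self._entries) >= self._max:
            self._entries.popitem(last=False)  # evict the oldest insertion
        self._entries[key] = value


cache = AnalysisCache(max_entries=2)
for name in ("a", "b", "c"):
    cache.put(AnalysisCache.key(name, "full", 8), f"analysis of {name}")
```

With a capacity of two, inserting a third entry evicts the first one regardless of how recently it was read.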

translation_agent

  • Wraps translate_service.translate_text()
  • Gemini-based translation with legal terminology preservation
  • Prefix stripping for clean output

language_detection_agent

  • langdetect library with confidence scoring
  • User-provided language early exit (skips agent entirely)
  • Falls back to English for unsupported languages
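
The early exit and the English fallback can be sketched as follows. The detector is injected as a callable returning `(language, confidence)`, which in practice might be a thin wrapper around langdetect; the supported-language set, the confidence threshold, and all names below are assumptions.

```python
from typing import Callable, Optional, Tuple

SUPPORTED_LANGUAGES = {"en", "hi", "bn", "ur", "ta", "te"}  # assumed subset

detector_calls = []  # lets the example show when detection is skipped


def fake_detect(text: str) -> Tuple[str, float]:
    """Stand-in detector returning (language code, confidence)."""
    detector_calls.append(text)
    return ("hi", 0.95) if text == "hindi sample" else ("xx", 0.99)


def resolve_language(
    text: str,
    user_language: Optional[str],
    detect: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.8,  # assumed threshold
) -> dict:
    if user_language:
        # Early exit: trust the caller and skip detection entirely
        language = user_language
    else:
        language, confidence = detect(text)
        if language not in SUPPORTED_LANGUAGES or confidence < min_confidence:
            language = "en"  # fall back to English
    return {"language": language, "needs_translation": language != "en"}


skipped = resolve_language("anything", "en", fake_detect)
```

When the caller supplies the language, the detector is never invoked, which is where the near-zero routing latency for English queries comes from.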

🧾 Memory Architecture

Citizen Conversations

previous_summary (rolling, 10 sentences, LLM-generated)
    + history[-6:] (exact last 6 messages, 800 char/msg cap)
    → chat_service.main_chat_prompt
    → response + updated_summary (parallel)
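
As a rough sketch, the memory block above could be assembled like this. The window size and character cap come from the chat service constants; `build_memory_block` and the prompt wording are hypothetical.

```python
HISTORY_WINDOW = 6      # recent messages kept verbatim
MESSAGE_CHAR_CAP = 800  # per-message cap in the recent block


def build_memory_block(previous_summary: str, history: list) -> str:
    """Combine the rolling summary with the last few messages."""
    recent = history[-HISTORY_WINDOW:]
    lines = [f"{m['role']}: {m['content'][:MESSAGE_CHAR_CAP]}" for m in recent]
    return (
        "Summary so far:\n" + previous_summary
        + "\n\nRecent messages:\n" + "\n".join(lines)
    )


history = [{"role": "user", "content": f"msg{i}"} for i in range(8)]
block = build_memory_block("User is asking about bail.", history)
```

Only the last six messages appear in `block`, so prompt size stays bounded no matter how long the conversation runs.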

Lawyer Workspace Conversations

previous_summary (conversation-level rolling memory)
    + workspace_context (matter-level persistent memory)
        - partySummary, legalIssues, factChronology
        - lawyerNotes, aiSummary, keyDates, documentIndex
    + history[-10:] (larger window for lawyer workflows)
    → injected into chat prompt + synthesizer prompt

Synthesizer Memory

For all non-chat paths (RAG, Analysis, multi-source):
    previous_summary + query + all node outputs
    → Single LLM call → final_response + updated_summary
    No extra LLM call — both outputs from same inference
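
One way to get both outputs from a single inference is to ask the model for a JSON object and parse it, stripping any Markdown fence first. The sketch below assumes that reply format; the key names match the fields above, but `strip_fence` and `parse_synthesis` are hypothetical and the actual prompt is not shown in this README.

```python
import json

FENCE = "`" * 3  # built at runtime so the literal fence never appears here


def strip_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence, if present."""
    text = text.strip()
    if text.startswith(FENCE) and text.endswith(FENCE) and len(text) >= 2 * len(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        if text.startswith("json"):
            text = text[len("json"):]  # drop the language tag
    return text.strip()


def parse_synthesis(raw: str) -> tuple:
    """Split one LLM reply into (final_response, updated_summary)."""
    data = json.loads(strip_fence(raw))
    return data["final_response"], data["updated_summary"]


reply = (
    FENCE
    + 'json\n{"final_response": "Bail is ...", "updated_summary": "User asked about bail."}\n'
    + FENCE
)
parsed = parse_synthesis(reply)
```

A reply that is not valid JSON raises `json.JSONDecodeError`, so a caller would need its own fallback for that case.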

📁 Project Structure

python_backend/
├── main.py                          # FastAPI app, lifespan, routers
├── orchestrator/
│   ├── graph.py                     # LangGraph StateGraph definition
│   ├── state.py                     # GraphState with reducers
│   └── service_registry.py          # Agent singletons
├── nodes/
│   ├── workspace_context_node.py    # Lawyer workspace injection
│   ├── language_detection_node.py   # Language detection + early exit
│   ├── translation_query_node.py    # Query translation
│   ├── document_ingestion_node.py   # Upload, chunk, embed, store
│   ├── intent_router_node.py        # 3-layer intent classification
│   ├── chat_node.py                 # General chat execution
│   ├── rag_node.py                  # RAG execution
│   ├── document_analysis_node.py    # Document analysis execution
│   ├── aggregator_node.py           # Parallel output collection
│   ├── synthesizer_node.py          # Context-aware synthesis
│   └── translation_response_node.py # Response translation
├── agents/
│   ├── general_chat_agent.py        # Chat agent wrapper
│   ├── rag_agent.py                 # RAG agent (no inner LangGraph)
│   ├── document_analysis_agent.py   # Analysis agent (no inner LangGraph)
│   ├── translation_agent.py         # Translation agent
│   └── language_detection_agent.py  # Language detection agent
├── services/
│   ├── rag_service.py               # ChromaDB, embeddings, retrieval, analysis
│   ├── chat_service.py              # Chat LLM, rolling memory, summary
│   ├── docgen_service.py            # Document generation service
│   ├── translate_service.py         # Gemini translation
│   └── language_detection_service.py # langdetect wrapper
├── routers/
│   ├── agent_router.py              # /agent/chat, /agent/upload-and-chat
│   ├── chatbot_router.py            # Legacy direct chat
│   ├── translate_router.py          # Translation API
│   ├── language_detection_router.py # Language detection API
│   └── docgen_router.py             # Document generation API
└── utils/
    ├── llm_service.py               # llm_generate with fence stripping + JSON extraction
    ├── graph_utils.py               # is_executed() guard
    ├── logger.py                    # Structured logging
    ├── config.py                    # Settings from env
    └── constants.py                 # Node name constants

node_backend/
├── src/
│   ├── module/
│   │   ├── lawyer/
│   │   │   ├── chat/                # Lawyer chat service + controller
│   │   │   ├── comms/               # Client communication generation
│   │   │   ├── contract/            # Contract review
│   │   │   ├── document/            # Matter document management
│   │   │   ├── matter/              # Matter CRUD
│   │   │   └── workspace/           # WorkspaceMemory service
│   │   ├── citizen/                 # Citizen chat + agent mode
│   │   └── auth/                    # JWT auth
│   ├── services/
│   │   └── python-backend.service.ts # Python API client
│   └── config/
│       └── database.ts              # Prisma client
└── prisma/
    └── schema.prisma                # Matter, WorkspaceMemory, LawyerConversation...

🌐 API Reference

Agent Endpoints (Python — /api/v3)

POST /agent/chat

Full LangGraph agent chat for text queries.

{
  "query": "what is anticipatory bail under CrPC",
  "session_id": "uuid",
  "document_id": "chromadb-uuid",
  "history": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}],
  "previous_summary": "Prior conversation summary...",
  "conversation_id": "conv-uuid",
  "conversation_type": "matter_workspace | standalone | client_comms_generation",
  "user_role": "LAWYER | CITIZEN",
  "matter_id": "matter-uuid",
  "workspace_memory": {
    "partySummary": "...",
    "factChronology": "...",
    "legalIssues": "...",
    "lawyerNotes": "...",
    "aiSummary": "...",
    "keyDates": [{"label": "Next hearing", "date": "2024-08-01"}],
    "documentIndex": [{"id": "...", "title": "...", "vectorDocId": "..."}]
  },
  "input_language": "en",
  "output_language": "hi"
}

Response:

{
  "response": "...",
  "session_id": "uuid",
  "updated_summary": "...",
  "agents_used": ["intent_router", "rag_agent", "synthesizer"],
  "execution_trace": [...],
  "agent_outputs": {...},
  "language_info": {"detected_language": "en", "translated_query": "..."}
}
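
A small client sketch for this endpoint, using only the standard library. The base URL is an assumption taken from the local testing commands in this README, and treating `query` and `session_id` as the only mandatory fields is also an assumption; `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v3"  # assumed local address


def build_chat_request(query: str, session_id: str, **optional) -> urllib.request.Request:
    """Build (but do not send) a POST request for /agent/chat."""
    payload = {"query": query, "session_id": session_id, **optional}
    return urllib.request.Request(
        f"{BASE_URL}/agent/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "what is anticipatory bail under CrPC",
    "uuid",
    input_language="en",
    output_language="hi",
)
# To send it against a running server:
# with urllib.request.urlopen(request) as resp:
#     body = json.loads(resp.read())
```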

POST /agent/upload-and-chat

Upload a document and ask a question in one request (multipart/form-data).

| Field | Type | Required |
|---|---|---|
| file | binary | |
| query | string | |
| session_id | string | |
| previous_summary | string | |
| input_language | string | |
| output_language | string | |
| user_role | string | |

Response adds document_id and storage_url to the standard agent response.

POST /translate

Translate text using Gemini with legal terminology preservation.

{
  "text": "...",
  "source_lang": "en",
  "target_lang": "hi"
}

GET /health

System health check covering the RAG, translation, and language detection services.


🚀 Deployment

HuggingFace Spaces (Python Backend)

The Python backend is designed for HuggingFace Spaces deployment:

# In your HF Space app.py / main.py
# ChromaDB uses /tmp/chroma_db (ephemeral but restored on restart via _reindex_from_hub)
# Background reindexing runs on startup — server is immediately available

On startup the system:

  1. Initializes all services (translation, language detection, RAG)
  2. Starts background re-indexing from HF Hub (non-blocking)
  3. Restores all documents from HF Hub storage within 30-60s
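
The non-blocking startup sequence can be illustrated with `asyncio.create_task`. Only the name `_reindex_from_hub` comes from this README; the `Backend` class, its attributes, and the simulated download are hypothetical, and a real FastAPI app would do this inside its lifespan handler.

```python
import asyncio


class Backend:
    def __init__(self) -> None:
        self.ready = False
        self.restored = []
        self._reindex_task = None

    async def _reindex_from_hub(self, document_names: list) -> None:
        for name in document_names:
            await asyncio.sleep(0.01)  # stands in for download + re-embedding
            self.restored.append(name)

    async def startup(self, document_names: list) -> None:
        self.ready = True  # step 1: service initialisation is elided here
        # Step 2: schedule re-indexing without awaiting it
        self._reindex_task = asyncio.create_task(self._reindex_from_hub(document_names))

    async def wait_until_restored(self) -> None:
        await self._reindex_task  # step 3: documents become available later


async def demo() -> tuple:
    backend = Backend()
    await backend.startup(["a.pdf", "b.txt"])
    ready_before_restore = backend.ready and not backend.restored
    await backend.wait_until_restored()
    return ready_before_restore, backend.restored


result = asyncio.run(demo())
```

The server reports ready before any document has been restored, which means queries against a document may fail until its re-indexing has finished.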

Environment Notes

  • /tmp/chroma_db — ephemeral, restored on restart via _reindex_from_hub()
  • Documents uploaded to HF Hub dataset survive restarts
  • Translation service is Gemini-based — no package downloads needed
  • Language detection uses langdetect (lightweight, no model downloads)

🔐 Security

  • JWT authentication for all lawyer endpoints
  • authenticateLawyer middleware on all lawyer routes
  • Matter ownership verified on every request
  • Document access scoped to matter membership
  • CORS configured (restrict allow_origins in production)
  • API keys via environment variables only

📊 Key Technical Decisions

| Decision | Rationale |
|---|---|
| LangGraph over custom orchestrator | Reliable parallel execution, built-in state management, conditional routing |
| Gemini 2.5 Flash over GPT-4 | Cost efficiency, strong multilingual Indian language support |
| ChromaDB over Pinecone | Self-hosted, no per-query cost, sufficient for current scale |
| Rolling summary over full history | Bounded token cost regardless of conversation length |
| Synthesizer owns updated_summary | Single source of truth; no inconsistency across agents |
| SHA-256 deduplication | Prevents re-ingestion of same document across uploads/restarts |
| Background reindexing | Server available immediately; documents restored asynchronously |
| Gemini translation over Argos | Quality for Indian legal content; no model downloads, correct terminology |

🔧 Configuration Reference

RAG Service Constants

CHUNK_SIZE = 800
CHUNK_OVERLAP = 150
MAX_CHUNKS_PER_QUERY = 16
MAX_CONTEXT_TOKENS = 8000
MAX_ANALYSIS_TOKENS = 20000
RAG_DIRECT_THRESHOLD = 4      # chunks; at or below this: single LLM call
RAG_BATCH_SIZE = 4             # chunks per map-reduce batch
_llm_semaphore = Semaphore(5)  # max concurrent Gemini calls
_analysis_cache_max = 100      # FIFO eviction
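
To show how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a sliding-window chunker. It assumes both constants are measured in characters, which this README does not state; `chunk_text` is a hypothetical helper and the real splitter may respect sentence or section boundaries.

```python
CHUNK_SIZE = 800
CHUNK_OVERLAP = 150


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list:
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
chunks = chunk_text(sample)
```

With these values each new chunk starts 650 characters after the previous one, so a 2000-character text yields three chunks.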

Chat Service Constants

HISTORY_WINDOW = 6             # recent messages (exact)
SUMMARY_MAX_SENTENCES = 10     # rolling summary cap
MESSAGE_CHAR_CAP = 800         # per message in recent_block
TEMPERATURE_CHAT = 0.5         # balanced for legal accuracy
TEMPERATURE_SUMMARY = 0.0      # deterministic summaries

Intent Router Constants

LLM_CLASSIFY_TIMEOUT = 10.0   # seconds before fallback

🧪 Testing

Agent Chat (Swagger UI)

GET /docs → Swagger UI
POST /api/v3/agent/chat

Health Check

curl http://localhost:8080/health

Translation Test

curl -X POST http://localhost:8080/api/v3/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "What is bail in Indian law", "source_lang": "en", "target_lang": "hi"}'

📈 Roadmap

  • Web search agent integration (Serper/Tavily)
  • Workspace context injection for lawyer mode (in progress)
  • Hinglish (Roman script Hindi) detection and handling
  • Multi-document RAG (query across multiple documents simultaneously)
  • Persistent ChromaDB storage (upgrade to paid HF Spaces)
  • Query rewriting for ambiguous pronoun resolution
  • Streaming responses
  • Rate limiting and request queuing
  • Gemini paid tier upgrade (removes 20 req/day limit)

⚠️ Disclaimer

Nyay Mitra is an AI-powered tool for legal information purposes only. It does not constitute legal advice. Always consult a qualified legal professional for advice specific to your situation.


👨‍💻 Authors

Tejasvi Aryan — AI/ML Architecture, LangGraph Pipeline, RAG System, Python Backend

Anubrata Guin — Node.js BFF, Prisma Schema, Lawyer Dashboard, Workspace System


⭐ Star this repository if you find it useful

Made with ❤️ for Indian legal accessibility
