Lexis.ai is an industry-grade AI-powered legal document intelligence platform that combines OCR, LLM-based extraction, risk analytics, and conversational AI to help you analyze contracts and uncover hidden risks.
- Smart Upload: Drag-and-drop document upload with automatic processing
- OCR Extraction: Extract text from images and PDFs using Tesseract
- AI Classification: Automatically classify documents (invoices, receipts, contracts)
- Structured Extraction: Extract key fields (vendor, amount, date, invoice number)
- Thumbnail Generation: Auto-generate document previews
- Chat with Documents: Ask natural language questions about your documents
- Context-Aware: Select multiple documents for cross-document queries
- Smart Responses: Powered by local LLM (Ollama) for privacy
- Financial Analytics: Track total spend, average amounts, highest transactions
- Vendor Analysis: See spending breakdown by vendor
- Monthly Trends: Visualize spending patterns over time
- Natural Language Queries: Ask questions like "What's my total spend with Acme Corp?"
- Pre-built Workflows: Invoice processing, receipt categorization, contract analysis
- Real-time Progress: Visual feedback for each workflow step
- Export Options: CSV, QuickBooks IIF, Excel formats
- Modern Design: Purple gradient theme with glassmorphism effects
- Smooth Animations: Framer Motion powered interactions
- Dark Mode: Full dark mode support
- Responsive: Works on desktop, tablet, and mobile
- Framework: React 18 with TypeScript
- Styling: Tailwind CSS + shadcn/ui components
- Animations: Framer Motion
- State Management: React Hooks
- Database: Supabase (PostgreSQL)
- Authentication: Supabase Auth
- Framework: FastAPI
- LLM: Ollama (local) with Gemma 3:4b model
- OCR: Tesseract + pytesseract
- PDF Processing: pdfplumber
- Image Processing: Pillow
- Cloud Integration: Vultr Object Storage (simulated)
- Local LLM: Ollama (privacy-first, no external API calls)
- Model: Gemma 3:4b (efficient, fast, accurate)
- RAG: Retrieval-Augmented Generation for document Q&A
- Extraction: LLM-based structured data extraction
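The LLM-based extraction step can be sketched against Ollama's local HTTP API (which listens on port 11434 by default). The prompt wording, field names, and `build_payload`/`extract_fields` helpers below are illustrative assumptions, not the project's actual code:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(ocr_text: str, model: str = "gemma3:4b") -> dict:
    """Build a request body for Ollama's /api/generate endpoint."""
    prompt = (
        "Extract vendor, amount, date, and invoice_number from the document "
        "below. Respond with JSON only.\n\n" + ocr_text
    )
    # "format": "json" asks Ollama to constrain the model to valid JSON output
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}

def extract_fields(ocr_text: str) -> dict:
    """Send OCR text to the local LLM and parse the JSON fields it returns."""
    data = json.dumps(build_payload(ocr_text)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return json.loads(body["response"])
```

Because everything goes to `localhost`, no document text ever leaves the machine, which is the privacy property the stack above relies on.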
- Node.js 18+ and npm
- Python 3.10+
- Ollama (for local LLM)
- Tesseract OCR
- Supabase Account (free tier works)
git clone https://github.com/yourusername/Lexis.ai.git
cd Lexis.ai
cd backend
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Copy environment template
copy .env.example .env # Windows
# cp .env.example .env # Mac/Linux
# Install Ollama and pull the model
ollama pull gemma3:4b
# Run the backend
uvicorn main:app --reload
The backend will run on http://localhost:8000
cd frontend
# Install dependencies
npm install
# Copy environment template
copy .env.example .env.local # Windows
# cp .env.example .env.local # Mac/Linux
# Edit .env.local and add your Supabase credentials
# VITE_SUPABASE_URL=your_supabase_url
# VITE_SUPABASE_PUBLISHABLE_KEY=your_supabase_anon_key
# Run the frontend
npm run dev
The frontend will run on http://localhost:8080
- Create a free account at supabase.com
- Create a new project
- Run the SQL schema (see database/schema.sql)
- Copy your project URL and anon key to .env.local
- React 18
- TypeScript
- Vite
- Tailwind CSS
- shadcn/ui
- Framer Motion
- React Router
- Supabase Client
- FastAPI
- Python 3.10+
- Ollama (LLM)
- Tesseract OCR
- pdfplumber
- Pillow
- python-dotenv
- Supabase (PostgreSQL)
- Supabase Auth
- Ollama
- Gemma 3:4b
- Tesseract OCR
- ✅ No API Keys in Code: All secrets in environment variables
- ✅ Local LLM: Privacy-first with Ollama (no external API calls)
- ✅ Secure Auth: Supabase authentication with JWT
- ✅ CORS Protection: Configured for production
- ✅ .gitignore: Protects sensitive files
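The CORS protection mentioned above can be configured in FastAPI roughly as follows; this is a sketch, and the allowed origin shown is the local dev frontend from this README, which you would replace with your production domain:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow only the known frontend origin instead of "*" in production.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:8080"],  # dev frontend; swap for prod URL
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["Authorization", "Content-Type"],
)
```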
# Optional: only needed if using Gemini instead of Ollama
GEMINI_API_KEY=your_key_here
# Ollama configuration (backend .env)
OLLAMA_MODEL=gemma3:4b
# Frontend .env.local
VITE_SUPABASE_URL=your_supabase_project_url
VITE_SUPABASE_PUBLISHABLE_KEY=your_supabase_anon_key
- Upload → User uploads a document (PDF/image)
- OCR → Tesseract extracts text
- Classification → LLM identifies the document type
- Extraction → LLM extracts structured fields
- Storage → Saved to Supabase + Vultr backup
- Ready → Available for chat and analytics
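The OCR step of this pipeline routes PDFs to pdfplumber and images to Tesseract. A minimal sketch, assuming a single `extract_text` entry point (the helper names here are illustrative, not the project's actual module layout):

```python
def choose_engine(path: str) -> str:
    """Pick an extraction engine from the file extension."""
    return "pdfplumber" if path.lower().endswith(".pdf") else "tesseract"

def extract_text(path: str) -> str:
    """OCR step of the pipeline: pull text out of a PDF or a scanned image."""
    if choose_engine(path) == "pdfplumber":
        import pdfplumber  # third-party; lazy import keeps the sketch self-contained
        with pdfplumber.open(path) as pdf:
            # extract_text() can return None for image-only pages
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    import pytesseract  # Python bindings for the Tesseract engine
    from PIL import Image
    return pytesseract.image_to_string(Image.open(path))
```

Scanned PDFs with no embedded text layer would need an extra rasterize-then-OCR pass, which is omitted here for brevity.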
- Uses RAG (Retrieval-Augmented Generation)
- Combines extracted data + OCR text for context
- Local LLM ensures privacy
- Supports multi-document queries
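The RAG context assembly described above amounts to concatenating each selected document's extracted fields and OCR text ahead of the user's question. A hedged sketch; the `build_chat_prompt` helper and the document dict shape (`name`, `fields`, `ocr_text`) are assumptions for illustration:

```python
def build_chat_prompt(question: str, docs: list[dict]) -> str:
    """Combine extracted fields + OCR text from each selected document,
    then append the user's question — supports multi-document queries."""
    context = "\n\n".join(
        f"Document: {d['name']}\nFields: {d['fields']}\nText: {d['ocr_text'][:1000]}"
        for d in docs  # truncate OCR text to keep the prompt within context limits
    )
    return (
        "Answer using only the documents below. "
        "If the answer is not in them, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The assembled prompt is then sent to the local Ollama model, so the retrieval context never leaves the machine.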
- Real-time calculation from extracted data
- Vendor aggregation
- Monthly trend analysis
- Natural language query support
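The vendor and monthly aggregations above reduce to grouping the extracted invoice fields. A minimal sketch, assuming invoices are dicts with `vendor`, `amount`, and ISO-formatted `date` fields (the helper names are illustrative):

```python
from collections import defaultdict

def vendor_totals(invoices: list[dict]) -> dict[str, float]:
    """Sum extracted amounts per vendor."""
    totals: dict[str, float] = defaultdict(float)
    for inv in invoices:
        totals[inv["vendor"]] += float(inv["amount"])
    return dict(totals)

def monthly_totals(invoices: list[dict]) -> dict[str, float]:
    """Group spend by YYYY-MM, assuming ISO dates like 2024-05-01."""
    totals: dict[str, float] = defaultdict(float)
    for inv in invoices:
        totals[inv["date"][:7]] += float(inv["amount"])
    return dict(totals)
```

A query such as "What's my total spend with Acme Corp?" then becomes a lookup into `vendor_totals(...)` once the LLM has identified the vendor name.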
- Rate Limits: The demo is limited to 3 uploads per user and 5 questions per document
- OCR Accuracy: Depends on image quality
- LLM Speed: Local Ollama may be slower than cloud APIs
- File Size: Large PDFs (>10MB) may take longer to process
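The demo rate limits above can be enforced with a simple in-memory counter. This `DemoLimiter` class is a hypothetical sketch of that bookkeeping, not the project's actual implementation:

```python
from collections import defaultdict

UPLOAD_LIMIT = 3    # demo: max uploads per user
QUESTION_LIMIT = 5  # demo: max questions per document

class DemoLimiter:
    """Track demo usage per user and per document (in-memory, resets on restart)."""

    def __init__(self) -> None:
        self.uploads: dict[str, int] = defaultdict(int)
        self.questions: dict[str, int] = defaultdict(int)

    def allow_upload(self, user_id: str) -> bool:
        if self.uploads[user_id] >= UPLOAD_LIMIT:
            return False
        self.uploads[user_id] += 1
        return True

    def allow_question(self, doc_id: str) -> bool:
        if self.questions[doc_id] >= QUESTION_LIMIT:
            return False
        self.questions[doc_id] += 1
        return True
```

A production deployment would persist these counters (e.g. in Supabase) rather than holding them in process memory.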
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.
- Ollama - Local LLM runtime
- Supabase - Backend as a Service
- Tesseract - OCR engine
- shadcn/ui - Beautiful UI components
- Vultr - Cloud infrastructure partner
For questions or support, please open an issue on GitHub.
Built with ❤️ for legal intelligence and contract protection