Yatimai/reasonforge

ReasonForge

Iterative fine-tuning of LLMs on Text-to-SQL using STaR (Self-Taught Reasoner).

Overview

This project implements a self-improvement loop for Text-to-SQL generation. The model samples several SQL candidates per question, the pipeline executes each one against the corresponding SQLite database, and the model is then fine-tuned on the candidates that proved correct.

Spider Dataset (7K train, 1K dev)
         ↓
Ministral-8B generates k=8 SQL candidates
         ↓
Execute on SQLite → keep correct ones
         ↓
Fine-tune LoRA on (question, SQL)
         ↓
Repeat for 3 iterations
         ↓
Baseline 60.1% (greedy) → SFT 68.8% (greedy) → STaR 78.0% (self-consistency, k=16)
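
The "execute on SQLite → keep correct ones" step can be illustrated with a minimal sketch. It assumes that "correct" means a candidate returns the same rows as the reference (gold) query, compared without regard to row order; the README does not spell out the exact matching rule used in src/sql_utils.py, and the helper names `run_query` and `keep_correct` and the toy `singer` table are hypothetical.

```python
import sqlite3

def run_query(conn, sql):
    """Execute sql and return its rows as an order-insensitive tuple, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort by repr so rows mixing NULLs, numbers and text can still be ordered.
    return tuple(sorted(rows, key=repr))

def keep_correct(conn, gold_sql, candidates):
    """Return the distinct candidates whose result set equals the gold query's."""
    gold = run_query(conn, gold_sql)
    if gold is None:
        return []
    kept = []
    for sql in candidates:
        if run_query(conn, sql) == gold and sql not in kept:
            kept.append(sql)
    return kept

# Toy database standing in for one Spider schema (assumption, not real Spider data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 45), (3, 'Cy', 52);
""")

gold_sql = "SELECT name FROM singer WHERE age > 40"
candidates = [
    "SELECT name FROM singer WHERE age > 40 ORDER BY name DESC",  # same rows, other order
    "SELECT name FROM singer WHERE age >= 30",                    # wrong rows
    "SELECT nme FROM singer",                                     # execution error
    "SELECT name FROM singer WHERE NOT age <= 40",                # equivalent query
]
kept = keep_correct(conn, gold_sql, candidates)
print(kept)
```

Ignoring row order is a simplification: it would accept a candidate that drops a required ORDER BY, so a stricter comparison may be preferable for questions that ask for sorted output.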

Results

Dev Set Performance (1,034 examples)

| Model | Accuracy | Method |
|---|---|---|
| Ministral-8B baseline | 60.1% | greedy |
| + SFT on Spider | 68.8% | greedy |
| + STaR (3 iterations) | 78.0% | self-consistency k=16 |

STaR Training Progression

| Iteration | Train Accuracy | Dev Accuracy |
|---|---|---|
| 1 | 82.2% | 71.0% |
| 2 | 88.0% | 71.7% |
| 3 | 88.0% | 72.1% (greedy) / 78.0% (k=16) |
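
The gap between the greedy and k=16 numbers comes from self-consistency decoding. One common way to apply it to SQL, sketched below, is to execute every sampled candidate and return a query from the largest group that agrees on the result. This is an assumption about the general technique, not a description of what src/evaluation/eval_dev.py does; `result_key`, `self_consistent_pick` and the toy table are hypothetical.

```python
import sqlite3
from collections import Counter

def result_key(conn, sql):
    """Return the query's rows as a hashable, order-insensitive key, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))

def self_consistent_pick(conn, candidates):
    """Pick the first candidate belonging to the most common execution result."""
    keyed = [(sql, result_key(conn, sql)) for sql in candidates]
    # Candidates that fail to execute get no vote.
    votes = Counter(key for _, key in keyed if key is not None)
    if not votes:
        return None
    best_key, _ = votes.most_common(1)[0]
    return next(sql for sql, key in keyed if key == best_key)

# Toy database (assumption, not real Spider data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 45), (3, 'Cy', 52);
""")

candidates = [
    "SELECT count(*) FROM singer WHERE age > 40",     # result 2
    "SELECT count(*) FROM singer",                    # result 3
    "SELECT count(name) FROM singer WHERE age > 40",  # result 2
    "SELECT count(*) FROM singers",                   # error: no such table
]
picked = self_consistent_pick(conn, candidates)
print(picked)
```

Voting on execution results rather than on SQL strings lets differently written but equivalent queries pool their votes.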

Tech Stack

| Component | Technology |
|---|---|
| Base Model | Ministral-8B-Instruct-2410 |
| Fine-tuning | LoRA via LLaMA-Factory |
| Inference | vLLM |
| Dataset | Spider (Yale), 7K train, 1K dev |
| Method | STaR with k=8 candidates |
| Hardware | NVIDIA H200 |

Project Structure

reasonforge/
├── src/
│   ├── star_train.py         # Main STaR training pipeline
│   ├── star_retrain.py       # Mini-STaR retrain on errors
│   ├── sql_utils.py          # SQL execution and matching
│   ├── prompts.py            # Prompts and generation config
│   └── evaluation/
│       └── eval_dev.py       # Dev set evaluation
├── configs/
│   ├── config.yaml           # STaR configuration
│   └── sft_ministral8b.yaml  # SFT configuration
├── data/
│   ├── spider/               # Spider dataset (166 DBs)
│   └── errors/               # Error analysis
├── models/
│   ├── merged_iter_1/        # STaR iteration 1
│   ├── merged_iter_2/        # STaR iteration 2
│   ├── merged_iter_3/        # STaR iteration 3 (best)
│   └── history.json          # Training history
├── reports/                  # Evaluation results
└── scripts/
    └── download_spider.py    # Dataset download

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Download Spider dataset
python scripts/download_spider.py

# 3. Evaluate baseline model
python src/evaluation/eval_dev.py \
    --model_path mistralai/Ministral-8B-Instruct-2410 \
    --k 1 \
    --temperature 0.0 \
    --no_self_consistency

# 4. Run STaR training
python src/star_train.py

# 5. Evaluate fine-tuned model (k=16, self-consistency)
python src/evaluation/eval_dev.py \
    --model_path ./models/merged_iter_3 \
    --k 16 \
    --self_consistency

Configuration

The main configuration lives in configs/config.yaml:

star:
  k_candidates: 8              # Candidates per question
  train_from_base: true        # Always start from base model
  difficulty_resampling: true  # Oversample hard questions

self_improvement:
  max_iterations: 5
  eval_sample_size: 7000

License

MIT
