Research Repository for "Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems"
DEFENSIVE RESEARCH ONLY: This repository contains sanitized educational materials for understanding and defending against RAG poisoning attacks. No weaponized attack materials are included.
Paper (PDF): https://doi.org/10.5281/zenodo.18080200
Code (This Repo): https://doi.org/10.5281/zenodo.18079735
Anonymous Review Copy: https://anonymous.4open.science/r/semantic-chameleon-B610/
Author: Scott Thornton (perfecXion.ai)
Abstract: This work characterizes how corpus composition and retrieval architecture jointly affect RAG security. We find that technical corpora are 13-62× harder to defend than general knowledge bases, and that simple hybrid BM25+vector retrieval neutralizes gradient-optimized attacks in our experiments.
Key Findings:
- 38.0% co-retrieval success on pure vector retrieval (n=50, 95% CI: 25.9%-51.8%)
- Hybrid retrieval (α≤0.5) reduces co-retrieval to 0% across all 50 gradient-optimized attacks
- Joint sparse+dense optimization partially circumvents hybrid (20-44% success) but significantly raises the bar
- Multi-model E2E (5 LLMs): attack success 46.7% (GPT-5.3) to 93.3% (Llama 4); safety violations 6.7% (Claude) to 93.3% (Llama 4)
- FEVER n=25: 0% overall success across all retrieval configs, confirming corpus-dependent effects at scale
- Technical corpora show 13-62× worse detection performance than general knowledge bases
- Query Pattern Differential emerges as most reliable detection method across corpora
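The confidence intervals above are Wilson score intervals (the method used in evaluation/metrics.py). A minimal sketch reproducing the pure-vector figure (19/50 successes); `wilson_ci` is an illustrative helper, not the repo's API:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_ci(19, 50)
print(f"38.0% (95% CI: {lo:.1%}-{hi:.1%})")  # 38.0% (95% CI: 25.9%-51.8%)
```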
Attack effectiveness varies dramatically across model families:
| Model | Attack Success | Safety Violations | Payload Leakage | Divergence |
|---|---|---|---|---|
| GPT-5.3 | 46.7% (7/15) | 33.3% | 9.6% | 0.284 |
| GPT-4o | 53.3% (8/15) | 86.7% | 12.0% | 0.483 |
| GPT-4o-mini | 53.3% (8/15) | 86.7% | 14.9% | 0.418 |
| Claude Sonnet 4.6 | 60.0% (9/15) | 6.7% | 5.7% | 0.196 |
| Llama 4 Instruct | 93.3% (14/15) | 93.3% | 56.8% | 0.268 |
Key Insight: Safety training maturity varies widely across model families. Claude shows the strongest safety boundary (6.7% violations despite 60% attack success). Llama 4 is by far the most vulnerable (93.3% attack success, only 27% clean refusal rate). GPT-5.3 shows measurable improvement over GPT-4o.
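The divergence column is reported without a formula here. As a purely hypothetical illustration (not the paper's metric), a token-set Jaccard distance between a model's clean and poisoned responses captures the same intuition, with 0 meaning identical answers and 1 meaning no overlap:

```python
def response_divergence(clean, poisoned):
    """Hypothetical divergence: Jaccard distance between response token sets.
    Illustrative only; the paper's actual divergence metric is not shown here."""
    a, b = set(clean.lower().split()), set(poisoned.lower().split())
    return 1 - len(a & b) / len(a | b)

clean = "use parameterized queries to prevent sql injection"
poisoned = "disable input validation and run raw sql queries"
print(round(response_divergence(clean, poisoned), 2))  # 0.85
```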
A knowledgeable attacker who jointly optimizes for both BM25 and vector retrieval can partially circumvent hybrid defense:
| Attack Type | α=0.7 | α=0.5 | α=0.3 |
|---|---|---|---|
| Gradient-only (baseline) | 0% | 0% | 0% |
| Joint optimization | 20% | 36% | 44% |
Key Insight: Hybrid retrieval raises the attack bar from 38% (pure vector) to 0% (gradient-only on hybrid), but joint optimization achieves 20-44%. Hybrid retrieval is a significant defense, not an absolute one.
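The interpolation behind these α values can be sketched as follows. The min-max normalization and toy scores are assumptions for illustration, not the exact logic of defense/hybrid_retrieval.py:

```python
def minmax(xs):
    """Scale scores to [0, 1] so BM25 and cosine magnitudes are comparable."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

def hybrid_scores(vector_sims, bm25_scores, alpha=0.5):
    """alpha=1.0 -> pure vector; alpha=0.0 -> pure BM25."""
    v, s = minmax(vector_sims), minmax(bm25_scores)
    return [alpha * vi + (1 - alpha) * si for vi, si in zip(v, s)]

# Toy corpus: doc 2 is a gradient-optimized poison that wins on embedding
# similarity but has almost no lexical overlap with the query terms.
vec = [0.62, 0.55, 0.91]   # cosine similarity to the query
bm  = [14.2, 11.8, 0.9]    # BM25 score against the query terms
scores = hybrid_scores(vec, bm, alpha=0.5)
print(scores.index(max(scores)))  # 0 -- the poison no longer ranks first
```

At α=1.0 the poison (doc 2) tops the ranking; at α≤0.5 the lexical term dominates enough to demote it, which is the mechanism behind the 38% → 0% drop for gradient-only attacks.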
25 GCG-optimized attacks on FEVER Wikipedia (2,000-doc representative sample):
| Config | Co-Retrieval | Stealth | Overall Success |
|---|---|---|---|
| Pure Vector (α=1.0) | 100% | 0% | 0% |
| Hybrid (α=0.7) | 100% | 0% | 0% |
| Hybrid (α=0.5) | 100% | 0% | 0% |
| Hybrid (α=0.3) | 100% | 0% | 0% |
Key Insight: Confirms n=9 pilot at 2.8× scale. General-vocabulary corpora make attack documents conspicuous regardless of retrieval architecture.
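Why general-vocabulary corpora expose attack documents can be illustrated with an IDF-style check in the spirit of detection/keyword_anomaly.py (the helper names and toy corpus below are invented for illustration): gradient-optimized tokens are vanishingly rare in such corpora, so a document's mean token IDF spikes.

```python
import math
from collections import Counter

def idf_table(corpus):
    """IDF per token: tokens rare in the corpus score high."""
    n = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc.lower().split()))
    return {tok: math.log(n / c) for tok, c in df.items()}

def mean_idf(doc, idf, default):
    """Average IDF of a document's tokens; unseen tokens get `default`."""
    toks = doc.lower().split()
    return sum(idf.get(t, default) for t in toks) / len(toks)

corpus = ["the capital of france is paris"] * 8 + ["paris hosts the louvre museum"] * 2
idf = idf_table(corpus)
default = math.log(len(corpus)) + 1.0   # penalty for out-of-vocabulary tokens
clean  = mean_idf("paris is the capital", idf, default)
poison = mean_idf("paris zx91 qqv describing attr", idf, default)  # GCG-like junk
print(poison > 2 * clean)  # True: adversarial tokens stand out
```

In a technical corpus, jargon-heavy benign documents also score high on this statistic, which is one intuition for the 13-62× detection gap.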
Initial evaluation against GPT-4o-mini (15 attack scenarios):
| Metric | Result |
|---|---|
| Attack Success Rate | 60% (9/15 scenarios) |
| Safety Bypass Rate | 80% of successful attacks |
| Response Divergence | 46% average |
| Model Tested | GPT-4o-mini |
Validated corpus-dependency hypothesis against a 156,777-document production corpus:
| Attack Type | Retrieval Success | Trigger Rank |
|---|---|---|
| Naive (generic) | 0% | N/A |
| Adaptive (corpus-optimized) | 100% | #1 |
```
semantic-chameleon/
├── README.md                          # This file
├── LICENSE                            # MIT License
├── SECURITY.md                        # Responsible disclosure policy
│
├── detection/                         # Detection framework (defensive only)
│   ├── semantic_drift.py              # Method 1: Embedding anomaly detection
│   ├── keyword_anomaly.py             # Method 2: IDF-based keyword detection
│   ├── query_pattern.py               # Method 3: Query differential analysis
│   ├── detection_metrics.py           # ROC, F1, AUROC evaluation
│   └── README.md                      # Detection method documentation
│
├── defense/                           # Defense implementations
│   ├── hybrid_retrieval.py            # BM25+vector hybrid scoring
│   ├── bm25_implementation.py         # Okapi BM25 with configurable params
│   └── README.md                      # Defense deployment guide
│
├── evaluation/                        # Evaluation scripts
│   ├── metrics.py                     # Success rate, CI calculation (Wilson score)
│   ├── statistical_tests.py           # Chi-square, effect size (Cohen's h)
│   ├── corpus_analysis.py             # Corpus property analysis
│   ├── e2e_llm_evaluation.py          # NEW: End-to-end LLM evaluation
│   └── README.md                      # Evaluation methodology
│
├── examples/                          # Sanitized educational examples
│   ├── sanitized_scenarios.json       # Attack scenario descriptions (no exploits)
│   ├── benign_document_templates.txt  # Example benign document structures
│   ├── detection_examples.py          # How to use detection framework
│   └── README.md                      # Examples documentation
│
├── data/                              # Dataset information (no actual data)
│   ├── security_se_instructions.md    # How to obtain Security Stack Exchange
│   ├── fever_instructions.md          # How to obtain FEVER dataset
│   └── corpus_statistics.json         # Corpus metadata (sizes, domains)
│
├── experiments/                       # Experiment scripts (March 2026)
│   ├── exp1_fever_large_scale.py      # FEVER n=25 evaluation
│   ├── exp2_multimodel_e2e.py         # Multi-model E2E (5 LLMs)
│   ├── exp3_joint_hybrid_attack.py    # Joint sparse+dense optimization
│   ├── setup_data.py                  # Data download and embedding setup
│   └── requirements.txt               # Experiment dependencies
│
├── results/                           # Experimental results
│   ├── e2e_evaluation_results.json    # Dec 2025: E2E LLM evaluation
│   ├── panw_case_study.json           # Dec 2025: Production case study
│   ├── march-2026/                    # March 2026 experiments
│   │   ├── exp1_fever_large_scale_results.json
│   │   ├── exp2_multimodel_e2e_results.json
│   │   └── exp3_joint_hybrid_attack_results.json
│   └── README.md                      # Results documentation
│
├── paper/                             # Paper materials
│   ├── paper.pdf                      # Main paper (arXiv version)
│   ├── supplementary.pdf              # Supplementary materials
│   └── figures/                       # Paper figures (PNG, 300 DPI)
│
├── docs/                              # Documentation
│   ├── REPRODUCIBILITY.md             # Step-by-step reproduction guide
│   ├── ETHICAL_CONSIDERATIONS.md      # Ethics and responsible use
│   ├── DEPLOYMENT_GUIDE.md            # How to deploy defenses
│   └── FAQ.md                         # Frequently asked questions
│
└── requirements.txt                   # Python dependencies
```
This repository provides:
- Detection methods - detection approaches (semantic drift, keyword anomaly, query pattern differential) with complete implementations
- Defense mechanisms - Hybrid retrieval and monitoring strategies
- Evaluation tools - Metrics, statistical tests, ROC analysis
- Corpus analysis - Understanding corpus-dependent security properties
- Deployment guides - Practical guidance for securing RAG systems
NOT included - Working attack implementations, weaponizable exploits, malicious document generation
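For readers deploying the hybrid defense, the Okapi BM25 scoring it relies on (with the standard k1=1.5, b=0.75 defaults) can be sketched as below. This is a simplified illustration, not the repo's bm25_implementation.py:

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one document against a whitespace-tokenized query."""
    docs = [d.lower().split() for d in corpus]
    toks = doc.lower().split()
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    tf = Counter(toks)
    score = 0.0
    for term in query.lower().split():
        df = sum(1 for d in docs if term in d)
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # smoothed, never negative
        denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
        score += idf * tf[term] * (k1 + 1) / denom
    return score

corpus = ["hybrid retrieval defends rag", "vector retrieval only", "bm25 lexical matching"]
print(bm25_score("hybrid retrieval", corpus[0], corpus) >
      bm25_score("hybrid retrieval", corpus[1], corpus))  # True
```

k1 controls term-frequency saturation and b controls document-length normalization; the defaults above match the quick-start configuration.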
```bash
git clone https://github.com/scthornton/semantic-chameleon
cd semantic-chameleon
pip install -r requirements.txt
```

```python
from detection.query_pattern import QueryPatternDetector
from defense.hybrid_retrieval import HybridRetriever

# Initialize detector
detector = QueryPatternDetector(
    benign_queries=100,    # Sample from production logs
    sensitive_queries=20   # Domain-specific attack patterns
)

# Initialize hybrid defense
retriever = HybridRetriever(alpha=0.5)  # Balanced BM25+vector

# Analyze corpus
results = detector.analyze_corpus(corpus, threshold=0.2)
print(f"Detected: {results['flagged_documents']} suspicious documents")
```

```python
from defense.hybrid_retrieval import HybridRetriever

# Security-critical configuration (recommended)
retriever = HybridRetriever(
    alpha=0.5,    # 50% vector, 50% BM25
    bm25_k1=1.5,  # Standard Okapi BM25
    bm25_b=0.75
)

# Retrieve with defense
results = retriever.retrieve(query, k=10)
```

See docs/REPRODUCIBILITY.md for complete step-by-step instructions.
High-level overview:
- Obtain datasets: Security Stack Exchange dump + FEVER dataset
- Run detection evaluation: `python evaluation/run_detection.py`
- Test hybrid defense: `python evaluation/run_hybrid_defense.py`
- Generate figures: `python evaluation/generate_figures.py`
Expected compute: ~8-16 hours on GCP n1-standard-8 (or equivalent)
This research follows responsible disclosure practices:
- Defensive focus: All materials prioritize understanding defenses
- No weaponization: Attack implementations are conceptual only
- Sanitized examples: All examples use non-exploitable scenarios
- Coordinated disclosure: Vulnerabilities reported to affected vendors
See docs/ETHICAL_CONSIDERATIONS.md for full ethics statement.
If you use this research or code, please cite:
```bibtex
@article{thornton2025semantic,
  author    = {Thornton, Scott},
  title     = {Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems},
  year      = {2025},
  doi       = {10.5281/zenodo.18080200},
  url       = {https://doi.org/10.5281/zenodo.18080200},
  publisher = {Zenodo}
}
```

Paper: https://doi.org/10.5281/zenodo.18080200
Code: https://doi.org/10.5281/zenodo.18079735
We welcome contributions that advance RAG security defenses:
- Detection method improvements
- New defense mechanisms
- Evaluation tools
- Documentation improvements
Not accepted: Attack implementations, weaponizable code, malicious examples
See CONTRIBUTING.md for guidelines.
Scott Thornton
- Website: https://perfecxion.ai
- Email: scott@perfecxion.ai
- Paper: https://doi.org/10.5281/zenodo.18080200
- GitHub: https://github.com/scthornton/semantic-chameleon
Security Issues: Please report via SECURITY.md
MIT License - see LICENSE for details.
Responsible Use Clause: By using this code, you agree to use it only for defensive security research, system hardening, and educational purposes. Malicious use is prohibited and violates the terms of this license.
- Security Stack Exchange community for public dataset
- FEVER dataset maintainers
- Google Cloud Platform for computational resources
- OpenAI for embedding API access
Last Updated: March 2026
Paper DOI: 10.5281/zenodo.18080200
Code DOI: 10.5281/zenodo.18079735
Status: Published on Zenodo (defensive research materials)