TangibleResearch/Halgorithem

Halgorithem

Deterministic hallucination detection for checking AI output against trusted source material.

Halgorithem takes source documents and AI output, extracts factual claims, retrieves the closest evidence, and labels each claim as supported, weakly supported, contradicted, hallucinated, or an unverifiable denial.

What It Does

  • Verifies AI-generated factual claims against supplied sources.
  • Splits multi-fact text into atomic claims.
  • Retrieves evidence chunks from one or more sources.
  • Detects date, number, unit, negation, source-qualifier, and simple entity-role conflicts.
  • Flags time-sensitive claims such as "current", "latest", "today", and "now".
  • Returns a structured result for every extracted claim.
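
As a rough illustration of two of the simpler checks listed above, the sketch below flags time-sensitive wording and detects a year mismatch between a claim and a piece of evidence. It is not Halgorithem's implementation: the function names `temporal_terms` and `year_conflict` and the year regex are assumptions made for this example, and the real detectors are likely more thorough.

```python
import re

# Assumption: these marker words come from the feature list above;
# the actual verifier may recognize more.
TEMPORAL_TERMS = ("current", "latest", "today", "now")

# Assumption: a "year" is any standalone number from 1000 to 2099.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def temporal_terms(claim: str) -> list[str]:
    """Return the time-sensitive marker words that appear in a claim."""
    words = set(re.findall(r"[a-z]+", claim.lower()))
    return [term for term in TEMPORAL_TERMS if term in words]


def year_conflict(claim: str, evidence: str) -> bool:
    """True when both texts mention years and have none in common."""
    claim_years = set(YEAR_PATTERN.findall(claim))
    evidence_years = set(YEAR_PATTERN.findall(evidence))
    return (
        bool(claim_years)
        and bool(evidence_years)
        and claim_years.isdisjoint(evidence_years)
    )
```

A claim with no year at all is not treated as a conflict here; in this sketch, absence of a date is a support question rather than a contradiction.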

What It Does Not Do

  • It does not prove truth in the real world.
  • It does not browse the web unless you use the optional URL wrapper.
  • It does not replace source quality review.
  • It does not guarantee perfect paraphrase understanding, especially with the local fallback embedder.
  • It does not use an LLM for verification.

Install

python -m pip install -e .

Recommended NLP model:

python -m spacy download en_core_web_lg

Lightweight fallback:

python -m spacy download en_core_web_sm

If neither spaCy model is installed, Halgorithem falls back to spacy.blank("en") with reduced linguistic accuracy instead of crashing.
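
The fallback order described above can be pictured as a "first loader that works" loop. The sketch below is an assumption about the general pattern, not the library's code; `load_first_available` is a hypothetical helper, written generically so it runs without spaCy installed.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def load_first_available(
    names: Sequence[str],
    loader: Callable[[str], T],
    fallback: Callable[[], T],
) -> T:
    """Try each model name in order; use the fallback if none loads."""
    for name in names:
        try:
            return loader(name)
        except OSError:
            # spacy.load raises OSError when a model package is not installed.
            continue
    return fallback()
```

With spaCy installed, the call would look like `load_first_available(["en_core_web_lg", "en_core_web_sm"], spacy.load, lambda: spacy.blank("en"))`.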

Quick Start

from Halgorithem import Halgorithm

algo = Halgorithm()

results = algo.compare_to_docs(
    truth_docs=[
        {
            "file_id": 1,
            "file_path": "source.txt",
            "text": "BASIC was created in 1964 by John Kemeny at Dartmouth College.",
        }
    ],
    ai_output="BASIC was created in 1972 by NASA.",
)

for result in results:
    print(result["status"], result["claim"], result["reason"])
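
Because each result is a plain dictionary, a run can be summarized with the standard library. The sketch below uses hand-written sample results rather than real verifier output; the statuses and reasons shown are invented for illustration, and `summarize` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical sample data. Real results would come from compare_to_docs(),
# and their statuses and reasons may differ from what is shown here.
sample_results = [
    {"status": "CONTRADICTION", "claim": "BASIC was created in 1972.", "reason": "date mismatch"},
    {"status": "HALLUCINATION", "claim": "BASIC was created by NASA.", "reason": "no supporting evidence"},
]


def summarize(results):
    """Count how many claims received each status."""
    return Counter(result["status"] for result in results)


for status, count in summarize(sample_results).items():
    print(f"{status}: {count}")
```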

Python API

from Halgorithem import Halgorithm

algo = Halgorithm(sentences_per_chunk=2, sentence_overlap=1)
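
To give a feel for what `sentences_per_chunk` and `sentence_overlap` might control, here is a minimal sliding-window sketch. It is an assumption about the general technique, not the library's chunker; `chunk_sentences` is a hypothetical function that takes sentences that have already been split.

```python
def chunk_sentences(sentences, sentences_per_chunk=2, sentence_overlap=1):
    """Group sentences into overlapping windows.

    Each window holds up to `sentences_per_chunk` sentences and shares
    `sentence_overlap` sentences with the window before it.
    """
    if sentences_per_chunk < 1:
        raise ValueError("sentences_per_chunk must be at least 1")
    if not 0 <= sentence_overlap < sentences_per_chunk:
        raise ValueError("sentence_overlap must be in [0, sentences_per_chunk)")

    stride = sentences_per_chunk - sentence_overlap
    chunks = []
    for start in range(0, len(sentences), stride):
        chunks.append(sentences[start:start + sentences_per_chunk])
        # Stop once a window has reached the final sentence.
        if start + sentences_per_chunk >= len(sentences):
            break
    return chunks
```

With the settings shown above, three sentences `["a", "b", "c"]` would yield `[["a", "b"], ["b", "c"]]` in this sketch, so a fact that spans a sentence boundary still appears whole in at least one chunk.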

Verify in-memory documents:

algo.compare_to_docs(
    truth_docs="BASIC was created in 1964.",
    ai_output="BASIC was created in 1964.",
)

Verify files:

algo.compare_to_files(
    truth_file_paths=["sources/basic.txt"],
    ai_output="BASIC was created by NASA.",
)

Optional generation wrapper:

from engine import run

result = run(
    prompt="Summarize this source.",
    truth_file_paths=["sources/basic.txt"],
)

The wrapper may call OpenAI for generation. The verifier in Halgorithem/ remains deterministic.

CLI Usage

Interactive terminal UI:

python tui.py

Benchmark:

python bench.py

Tests

python -m pytest

The pytest suite is designed to be network-free and uses local documents.

Benchmark

bench.py runs a release benchmark across the following categories:

  • supported claims
  • paraphrases
  • weak support
  • hallucinations
  • date mismatches
  • entity-role swaps
  • unit errors
  • current/latest claims
  • table-like facts
  • multi-source disagreement
  • denial claims
  • missing-source cases

It reports accuracy, accuracy by category, a confusion matrix, failures, temporal warning checks, and a pass/fail threshold.
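
The metrics listed above are straightforward to compute from labeled cases. The following sketch shows one way to do it. It does not reproduce bench.py: the `(category, expected, predicted)` tuple format, the function names, and the threshold value are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def score(cases):
    """Score an iterable of (category, expected_status, predicted_status)."""
    cases = list(cases)
    totals = defaultdict(int)
    hits = defaultdict(int)
    confusion = Counter()
    failures = []
    for category, expected, predicted in cases:
        totals[category] += 1
        confusion[(expected, predicted)] += 1
        if expected == predicted:
            hits[category] += 1
        else:
            failures.append((category, expected, predicted))
    accuracy = sum(hits.values()) / len(cases) if cases else 0.0
    by_category = {category: hits[category] / totals[category] for category in totals}
    return {
        "accuracy": accuracy,
        "by_category": by_category,
        "confusion": confusion,
        "failures": failures,
    }


def meets_threshold(report, threshold=0.9):
    """Pass/fail gate; the 0.9 default is a made-up value."""
    return report["accuracy"] >= threshold
```

The confusion matrix is stored as a counter keyed by `(expected, predicted)` pairs, which keeps the sketch short and still makes off-diagonal errors easy to list.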

Output Schema

Every claim result includes:

{
    "claim": str,
    "status": "SUPPORTED | WEAK_SUPPORT | CONTRADICTION | HALLUCINATION | UNVERIFIABLE_DENIAL | ERROR",
    "confidence": float,
    "score": float,
    "matched_source": str | None,
    "matched_chunk_id": int | None,
    "matched_chunk": str,
    "chunk_text": str,
    "evidence": list,
    "unsupported_terms": list[str],
    "reason": str,
    "warning": str | None,
}
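
Downstream code may want to check that a result has the expected shape before relying on it. The sketch below validates only the key names and the status value from the schema above; `validate_result` is a hypothetical helper and not part of the package.

```python
# Key names and status values are copied from the schema above.
REQUIRED_KEYS = {
    "claim", "status", "confidence", "score", "matched_source",
    "matched_chunk_id", "matched_chunk", "chunk_text", "evidence",
    "unsupported_terms", "reason", "warning",
}
STATUSES = {
    "SUPPORTED", "WEAK_SUPPORT", "CONTRADICTION",
    "HALLUCINATION", "UNVERIFIABLE_DENIAL", "ERROR",
}


def validate_result(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - result.keys())]
    if "status" in result and result["status"] not in STATUSES:
        problems.append(f"unknown status: {result['status']!r}")
    return problems
```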

Verdict Meanings

  • SUPPORTED: strong evidence is present in the supplied sources.
  • WEAK_SUPPORT: related evidence exists, but the claim is inferential or not fully direct.
  • CONTRADICTION: relevant source evidence conflicts with the claim.
  • HALLUCINATION: the claim lacks adequate source support.
  • UNVERIFIABLE_DENIAL: the claim denies a fact or entity absent from the sources, so absence alone cannot prove it.
  • ERROR: the verifier could not parse or evaluate the claim, most often because of malformed math.
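
One way to act on these verdicts is to collapse them into a single decision per AI output. The grouping below is one possible policy, not something Halgorithem prescribes; the names `BLOCKING`, `NEEDS_REVIEW`, and `triage` are assumptions for this sketch.

```python
# Assumption: which statuses block and which only need review is a
# project-level choice; adjust the sets to suit your own risk tolerance.
BLOCKING = {"CONTRADICTION", "HALLUCINATION", "ERROR"}
NEEDS_REVIEW = {"WEAK_SUPPORT", "UNVERIFIABLE_DENIAL"}


def triage(results):
    """Return 'block', 'review', or 'pass' for a list of claim results."""
    statuses = {result["status"] for result in results}
    if statuses & BLOCKING:
        return "block"
    if statuses & NEEDS_REVIEW:
        return "review"
    return "pass"
```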

Runtime Hardening

Halgorithem handles:

  • missing files
  • empty sources
  • empty AI output
  • malformed truth_docs
  • missing text fields
  • bad UTF-8 file encodings
  • no extracted claims
  • math parse errors
  • missing spaCy or embedding models
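
As an example of the kind of defensive handling listed above, the sketch below reads a source file without raising on a missing path or invalid UTF-8 bytes. It is an illustration under assumptions, not the library's loader; `read_source` is a hypothetical name, and the real code may report these conditions differently.

```python
from pathlib import Path
from typing import Optional


def read_source(path: str) -> Optional[str]:
    """Return the file's text, or None when it is missing or unreadable."""
    try:
        raw = Path(path).read_bytes()
    except OSError:
        # Covers missing files, directories, and permission errors.
        return None
    # Invalid UTF-8 bytes become U+FFFD instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="replace")
```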

Limitations

  • Rule-based entity-role detection handles common simple patterns, not arbitrary grammar.
  • Local hashing embeddings are deterministic and CI-safe but less semantic than sentence-transformers.
  • Multi-source disagreement is surfaced when the source qualifier is explicit.
  • Table-like facts work best when row values are near each other in text.
  • Current/latest claims trigger a warning; they are not refreshed against external sources.
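
To illustrate why hashing embeddings are deterministic yet weak on paraphrase, here is a minimal feature-hashing sketch. It is a generic version of the technique, not the embedder shipped with Halgorithem; the dimension, tokenizer, and function names are assumptions.

```python
import hashlib
import math
import re


def hash_embed(text: str, dim: int = 64) -> list[float]:
    """Map text to a fixed-size vector using signed feature hashing."""
    vector = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # reduces collision bias
        vector[index] += sign
    return vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

The same text always produces the same vector, with no model download or random seed involved, which is what makes this style of embedder safe for CI. The trade-off is that two different words, such as a term and its synonym, land in unrelated buckets unless they happen to collide, so similarity reflects shared tokens rather than shared meaning.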

Roadmap

  • Optional structured table parser.
  • Better contradiction handling for passive and nested clauses.
  • Calibrated benchmark sets by domain.
  • Machine-readable benchmark artifacts.
  • More CLI commands beyond the interactive TUI.

Release Readiness

v1.0 readiness means tests pass, the benchmark meets its threshold, CI passes, packaging installs, README limitations are documented, and the release checklist is complete.

About

An algorithm designed to detect AI hallucinations.
