The 38th parallel. North Korean military activity risk analysis — fine-tuned LFM2-350M evaluated against naive and classical ML baselines on 11 years of GDELT data.
Signal 38 ingests weekly clusters of GDELT v2 events involving North Korean military actors and produces structured risk assessments: escalation level (1–5), situation summary, key actors, watch indicators, and projected trajectories.
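The assessment fields above can be sketched as a minimal schema check. Field names here are assumptions for illustration; the actual keys are defined by the labeling prompt in the notebooks:

```python
import json

# Assumed field names for the structured risk assessment (illustrative only;
# the real keys come from the labeling prompt used during distillation).
REQUIRED_FIELDS = {
    "escalation_level",   # int, 1-5
    "summary",            # one-paragraph situation summary
    "key_actors",         # actor codes / names
    "watch_indicators",   # signals to monitor
    "trajectories",       # projected escalation paths
}

def parse_assessment(raw: str) -> dict:
    """Parse a model completion and verify it matches the expected schema."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    level = data["escalation_level"]
    if not (isinstance(level, int) and 1 <= level <= 5):
        raise ValueError(f"escalation_level out of range: {level!r}")
    return data

sample = json.dumps({
    "escalation_level": 2,
    "summary": "Routine exercises; no unusual activity.",
    "key_actors": ["PRKMIL"],
    "watch_indicators": ["missile test preparations"],
    "trajectories": ["stable"],
})
assert parse_assessment(sample)["escalation_level"] == 2
```

A check like this is what makes the "Valid JSON" column in the results table measurable for the fine-tuned model.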
The landing page displays pre-computed assessments from the fine-tuned model on the held-out test set. Live in-browser inference via WebGPU is planned — the merged model will be exported to ONNX and loaded via transformers.js.
```
GDELT v2 events (11 years, NK military CAMEO codes)
  → weekly clusters (event counts, Goldstein scale, tone, actor codes)
  → Claude-labeled risk assessments (knowledge distillation)
  → three models evaluated:
      1. Naive baseline — Goldstein-scale threshold rule
      2. Classical ML — XGBoost on GDELT features
      3. LFM2-350M QLoRA — fine-tuned on 463 labeled examples
  → LoRA adapter merged → pushed to HuggingFace Hub (signal38/lfm2-nk-risk)
```
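The naive baseline can be sketched as a simple threshold rule. The Goldstein scale runs from -10 (most conflictual) to +10 (most cooperative), so a lower weekly mean implies higher risk; the cut points below are illustrative, not the ones used in `01_baseline.ipynb`:

```python
# Naive baseline sketch: map a week's mean Goldstein score to an
# escalation level (1-5). Thresholds are illustrative assumptions;
# the actual rule lives in 01_baseline.ipynb.
def goldstein_to_escalation(mean_goldstein: float) -> int:
    thresholds = [(-8.0, 5), (-5.0, 4), (-2.0, 3), (1.0, 2)]
    for cutoff, level in thresholds:
        if mean_goldstein <= cutoff:
            return level
    return 1  # cooperative week → minimal escalation

assert goldstein_to_escalation(-9.2) == 5
assert goldstein_to_escalation(3.4) == 1
```

A rule this simple is the point: it sets the floor that XGBoost and the fine-tuned model must beat on escalation MAE.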
| Model | Approach | Escalation MAE | Valid JSON |
|---|---|---|---|
| Naive baseline | Goldstein threshold rule | TBD | n/a |
| XGBoost | GDELT feature vector | TBD | n/a |
| LFM2-350M (fine-tuned) | QLoRA knowledge distillation | TBD | TBD |
Results populated by 03_evaluate.ipynb — see data/outputs/results.json and data/outputs/test_predictions.json.
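The two headline metrics can be computed along these lines, assuming each record in `test_predictions.json` carries a `"predicted"` raw completion and a `"true_level"` integer (hypothetical field names; the real schema is set by `03_evaluate.ipynb`):

```python
import json

# Sketch of the evaluation metrics. Field names ("predicted", "true_level",
# "escalation_level") are assumptions for illustration.
def score(records: list[dict]) -> dict:
    abs_errors, valid = [], 0
    for rec in records:
        try:
            pred = json.loads(rec["predicted"])
            level = int(pred["escalation_level"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # malformed output hurts the Valid JSON rate only
        valid += 1
        abs_errors.append(abs(level - rec["true_level"]))
    return {
        "escalation_mae": sum(abs_errors) / len(abs_errors) if abs_errors else None,
        "valid_json_rate": valid / len(records),
    }

records = [
    {"predicted": '{"escalation_level": 3}', "true_level": 2},
    {"predicted": "not json", "true_level": 4},
]
result = score(records)
assert result["valid_json_rate"] == 0.5
assert result["escalation_mae"] == 1.0
```

One design choice worth noting: malformed completions are excluded from the MAE rather than assigned a penalty level, so the two columns of the results table stay independent.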
Run in order. Notebooks 02–05 require a T4 GPU runtime. All publish their artifacts back to this repo automatically.
| Notebook | What it does | Runtime |
|---|---|---|
| 00_acled_labels.ipynb | ACLED ground truth labels (optional) | CPU, ~2 min |
| 01_baseline.ipynb | Naive rule + XGBoost baseline | CPU, ~5 min |
| 02_finetune.ipynb | LFM2-350M QLoRA fine-tuning | T4 GPU, ~20 min |
| 03_evaluate.ipynb | Three-model evaluation + results export | T4 GPU, ~10 min |
| 04_export_onnx.ipynb | Merge adapter → fp16 PyTorch → HuggingFace Hub | T4 GPU, ~5 min |
| 05_export_gguf.ipynb | Merge adapter → GGUF (Q4_K_M) → HuggingFace Hub | T4 GPU, ~5 min |
```
signal38.github.io/
├── notebooks/       # Colab-ready pipeline notebooks
├── scripts/         # Shared helpers (colab_utils, features, metrics)
├── src/             # App source (WebGPU inference, UI)
├── docs/            # GitHub Pages site
├── data/
│   ├── clusters/    # Weekly GDELT event clusters
│   ├── labeled/     # Claude-generated risk assessments
│   ├── training/    # Train / val / test splits
│   └── outputs/     # Evaluation results (published by notebooks)
└── models/          # LoRA adapter weights and ONNX export (published by notebooks 02, 04)
```
```
pip install -r requirements.txt
```

Colab notebooks are self-contained. Open any notebook via the badge above, connect a T4 runtime, and run all cells.
The notebooks share common setup via scripts/colab_utils.py. Notebooks that publish artifacts back to this repo require a GITHUB_TOKEN_SIGNAL38 Colab secret. Notebook 00 additionally requires ACLED_EMAIL and ACLED_PASSWORD.
Setting up the GitHub token (one-time):
Create a fine-grained personal access token with Contents: Read and Write permission. When creating the token, set Resource Owner to signal38 (the org) — the form defaults to your personal account, which produces a token with the wrong scope. Then add it to Colab: open the key icon in the left sidebar → Secrets → Add new secret, name it GITHUB_TOKEN_SIGNAL38, paste the token, and enable notebook access.
For ACLED credentials (notebook 00), register at acleddata.com and add ACLED_EMAIL and ACLED_PASSWORD as Colab secrets.
For HuggingFace Hub (notebooks 04 and 05), create a write token at huggingface.co/settings/tokens and add it as HF_TOKEN.
- Diya Mirji — @dvm14
- Jonas Neves — @jonasneves
- Mike Saju — @Michaelsaju1
Built for AIPI 540.01 — Deep Learning, Spring 2026, Duke University AIPI Program.
MIT
