The 38th parallel. North Korean military activity risk analysis — fine-tuned LFM2-350M evaluated against naive and classical ML baselines on 11 years of GDELT data.
Signal 38 ingests weekly clusters of GDELT v2 events involving North Korean military actors and produces structured risk assessments: escalation level (1–5), situation summary, key actors, watch indicators, and projected trajectories.
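The assessment fields above can be sketched as a minimal schema check. Field names here are assumptions for illustration; the actual keys are defined by the labeling prompt in the notebooks:

```python
import json

# Assumed field names for the structured risk assessment (illustrative only;
# the real keys come from the labeling prompt used during distillation).
REQUIRED_FIELDS = {
    "escalation_level",   # int, 1-5
    "summary",            # one-paragraph situation summary
    "key_actors",         # actor codes / names
    "watch_indicators",   # signals to monitor
    "trajectories",       # projected escalation paths
}

def parse_assessment(raw: str) -> dict:
    """Parse a model completion and verify it matches the expected schema."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    level = data["escalation_level"]
    if not (isinstance(level, int) and 1 <= level <= 5):
        raise ValueError(f"escalation_level out of range: {level!r}")
    return data

sample = json.dumps({
    "escalation_level": 2,
    "summary": "Routine exercises; no unusual activity.",
    "key_actors": ["PRKMIL"],
    "watch_indicators": ["missile test preparations"],
    "trajectories": ["stable"],
})
assert parse_assessment(sample)["escalation_level"] == 2
```

A check like this is what makes the "Valid JSON" column in the results table measurable for the fine-tuned model.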
The landing page displays pre-computed assessments from the fine-tuned model on the held-out test set. Live in-browser inference via WebGPU is planned — the merged model will be exported to ONNX and loaded via transformers.js.
```
GDELT v2 events (11 years, NK military CAMEO codes)
  → weekly clusters (event counts, Goldstein scale, tone, actor codes)
  → Claude-labeled risk assessments (knowledge distillation)
  → three models evaluated:
      1. Naive baseline — Goldstein-scale threshold rule
      2. Classical ML — XGBoost on GDELT features
      3. LFM2-350M QLoRA — fine-tuned on 463 labeled examples
  → LoRA adapter merged → pushed to HuggingFace Hub (signal38/lfm2-nk-risk)
```
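The naive baseline can be sketched as a simple threshold rule. The Goldstein scale runs from -10 (most conflictual) to +10 (most cooperative), so a lower weekly mean implies higher risk; the cut points below are illustrative, not the ones used in `01_baseline.ipynb`:

```python
# Naive baseline sketch: map a week's mean Goldstein score to an
# escalation level (1-5). Thresholds are illustrative assumptions;
# the actual rule lives in 01_baseline.ipynb.
def goldstein_to_escalation(mean_goldstein: float) -> int:
    thresholds = [(-8.0, 5), (-5.0, 4), (-2.0, 3), (1.0, 2)]
    for cutoff, level in thresholds:
        if mean_goldstein <= cutoff:
            return level
    return 1  # cooperative week → minimal escalation

assert goldstein_to_escalation(-9.2) == 5
assert goldstein_to_escalation(3.4) == 1
```

A rule this simple is the point: it sets the floor that XGBoost and the fine-tuned model must beat on escalation MAE.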
| Model | Approach | Escalation MAE | Valid JSON |
|---|---|---|---|
| Naive baseline | Goldstein threshold rule | TBD | n/a |
| XGBoost | GDELT feature vector | TBD | n/a |
| LFM2-350M (fine-tuned) | QLoRA knowledge distillation | TBD | TBD |
Results populated by 03_evaluate.ipynb — see data/outputs/results.json and data/outputs/test_predictions.json.
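The two headline metrics can be computed along these lines, assuming each record in `test_predictions.json` carries a `"predicted"` raw completion and a `"true_level"` integer (hypothetical field names; the real schema is set by `03_evaluate.ipynb`):

```python
import json

# Sketch of the evaluation metrics. Field names ("predicted", "true_level",
# "escalation_level") are assumptions for illustration.
def score(records: list[dict]) -> dict:
    abs_errors, valid = [], 0
    for rec in records:
        try:
            pred = json.loads(rec["predicted"])
            level = int(pred["escalation_level"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # malformed output hurts the Valid JSON rate only
        valid += 1
        abs_errors.append(abs(level - rec["true_level"]))
    return {
        "escalation_mae": sum(abs_errors) / len(abs_errors) if abs_errors else None,
        "valid_json_rate": valid / len(records),
    }

records = [
    {"predicted": '{"escalation_level": 3}', "true_level": 2},
    {"predicted": "not json", "true_level": 4},
]
result = score(records)
assert result["valid_json_rate"] == 0.5
assert result["escalation_mae"] == 1.0
```

One design choice worth noting: malformed completions are excluded from the MAE rather than assigned a penalty level, so the two columns of the results table stay independent.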
Run in order. Notebooks 02–05 require a T4 GPU runtime. All publish their artifacts back to this repo automatically.
| Notebook | What it does | Runtime |
|---|---|---|
| 00_acled_labels.ipynb | ACLED ground truth labels (optional) | CPU, ~2 min |
| 01_baseline.ipynb | Naive rule + XGBoost baseline | CPU, ~5 min |
| 02_finetune.ipynb | LFM2-350M QLoRA fine-tuning | T4 GPU, ~20 min |
| 03_evaluate.ipynb | Three-model evaluation + results export | T4 GPU, ~10 min |
| 04_export_onnx.ipynb | Merge adapter → fp16 PyTorch → HuggingFace Hub | T4 GPU, ~5 min |
| 05_export_gguf.ipynb | Merge adapter → GGUF (Q4_K_M) → HuggingFace Hub | T4 GPU, ~5 min |
```
signal38.github.io/
├── notebooks/       # Colab-ready pipeline notebooks
├── scripts/         # Shared helpers (colab_utils, features, metrics)
├── src/             # App source (WebGPU inference, UI)
├── docs/            # GitHub Pages site
├── data/
│   ├── clusters/    # Weekly GDELT event clusters
│   ├── labeled/     # Claude-generated risk assessments
│   ├── training/    # Train / val / test splits
│   └── outputs/     # Evaluation results (published by notebooks)
└── models/          # LoRA adapter weights and ONNX export (published by notebooks 02, 04)
```
```
pip install -r requirements.txt
```

Colab notebooks are self-contained. Open any notebook via the badge above, connect a T4 runtime, and run all cells.
The notebooks share common setup via scripts/colab_utils.py. Notebooks that publish artifacts back to this repo require a GITHUB_TOKEN_SIGNAL38 Colab secret. Notebook 00 additionally requires ACLED_EMAIL and ACLED_PASSWORD.
Setting up the GitHub token (one-time):
Create a fine-grained personal access token with Contents: Read and Write permission. When creating the token, set Resource Owner to signal38 (the org) — the form defaults to your personal account, which produces a token with the wrong scope. Then add it to Colab: open the key icon in the left sidebar → Secrets → Add new secret, name it GITHUB_TOKEN_SIGNAL38, paste the token, and enable notebook access.
For ACLED credentials (notebook 00), register at acleddata.com and add ACLED_EMAIL and ACLED_PASSWORD as Colab secrets.
For HuggingFace Hub (notebooks 04 and 05), create a write token at huggingface.co/settings/tokens and add it as HF_TOKEN.
- Diya Mirji — @dvm14
- Jonas Neves — @jonasneves
- Mike Saju — @Michaelsaju1
Built for AIPI 540.01 — Deep Learning, Spring 2026, Duke University AIPI Program.
MIT
