Add adapter generation demo scripts#5
Merged
Add two tutorial scripts under tutorials/scripts/ that compose a
Granite Switch model with the RAG, core, and guardian adapter
libraries and demonstrate invoking every adapter end-to-end:
- run_adapter_generation.py: plain HuggingFace transformers path.
Activates adapters by passing adapter_name=... to
tokenizer.apply_chat_template, following the README idiom.
One dedicated demo_<adapter> function per adapter with the
message / document layout the adapter expects. For the three
per-span LoRA adapters (citations, context-attribution,
hallucination_detection) the script also applies io.yaml's
sentence_boundaries tagging and appends the io.yaml instruction
as a final user turn, so their output shape matches Mellea.
Results saved to a timestamped JSON file.
- run_adapter_generation_mellea.py: Mellea + vLLM path. Starts a
vLLM server for the composed model, wires a Mellea OpenAIBackend
with embedded-adapter registration, and calls the corresponding
intrinsic wrapper (rag.rewrite_question, core.check_certainty,
guardian.policy_guardrails, ...) for each adapter.
Both scripts mirror one another's demo inputs so outputs can be
compared side by side. Common flags: --output, --model-dir,
--base-model, --max-tokens (HF only); the Mellea script additionally
exposes --port, --gpu-memory-utilization, --max-model-len for the
vLLM server.
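As a sketch, the flag surface described above could be declared with argparse (the defaults here are illustrative assumptions, not the scripts' actual values):

```python
import argparse

def build_parser(mellea: bool = False) -> argparse.ArgumentParser:
    # Flags shared by both tutorial scripts; defaults are illustrative.
    p = argparse.ArgumentParser(description="Run the adapter generation demos")
    p.add_argument("--output", help="Path for the timestamped results JSON")
    p.add_argument("--model-dir", help="Directory of the composed model")
    p.add_argument("--base-model", help="Base model id to compose against")
    p.add_argument("--max-tokens", type=int, default=512,
                   help="Generation cap (HF script only)")
    if mellea:
        # Extra knobs forwarded to the vLLM server by the Mellea script.
        p.add_argument("--port", type=int, default=8000)
        p.add_argument("--gpu-memory-utilization", type=float, default=0.9)
        p.add_argument("--max-model-len", type=int, default=8192)
    return p
```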
The 14 demos cover:
- RAG: query_rewrite, query_clarification, answerability, citations,
hallucination_detection
- Core: context-attribution, requirement-check, uncertainty
- Guardian: guardian-core (social_bias / harm / safe variants),
policy-guardrails, factuality-detection, factuality-correction
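Counting the demos above works out to 14 once guardian-core is expanded into its three variants, which a quick sketch confirms:

```python
# Demo inventory taken from the list above; guardian-core runs three times,
# once per variant (social_bias / harm / safe).
RAG = ["query_rewrite", "query_clarification", "answerability",
       "citations", "hallucination_detection"]
CORE = ["context-attribution", "requirement-check", "uncertainty"]
GUARDIAN = [f"guardian-core/{v}" for v in ("social_bias", "harm", "safe")]
GUARDIAN += ["policy-guardrails", "factuality-detection", "factuality-correction"]
ALL_DEMOS = RAG + CORE + GUARDIAN
assert len(ALL_DEMOS) == 14  # 5 RAG + 3 core + 6 guardian
```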
- Tighten sentence-boundary tagging and instruction-templates
comment blocks; drop development-flavored asides.
- Rewrite demo_policy_guardrails docstring from pseudocode sketch
to concrete context/returns description.
- Replace magic-number slicing in strip_adapter_suffix with
str.removesuffix (Python 3.9+ idiom).
- Minor rephrasing in _split_sentences, _mark_sentence_boundaries,
and the granite_switch.hf import comment.
antonpibm reviewed May 11, 2026
- Default to ibm-granite/granite-switch-4.1-3b-preview instead of
composing a model at startup. Dropped compose_model(), LIBRARIES,
BUILD_TIMEOUT, --base-model, and the subprocess/tempfile plumbing
from both scripts.
- Load the HF model with device_map="auto" instead of .to("cuda"),
so it works on multi-GPU / no-GPU-0 setups.
- Remove the Mellea-sync claim from run_adapter_generation.py's
top-level docstring; the per-adapter attribution comments stay
since they point at the exact Mellea source the prompts were
copied from.
added 2 commits on May 12, 2026 11:17
Soften the hard cuda.is_available() check to a warning — with device_map="auto", plain HF transformers loads onto whatever is available (CUDA / MPS / CPU). CPU is slow but not unsupported.
Makes the naming symmetric with run_adapter_generation_mellea.py and clarifies that this script talks to the model directly via HuggingFace transformers, with no framework layer in between.
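The softened check from the first commit could look like this (function name and message wording are illustrative; the real script may differ):

```python
import warnings

def warn_if_no_cuda(cuda_available: bool) -> None:
    # With device_map="auto", transformers places weights on whatever backend
    # exists (CUDA / MPS / CPU), so a missing GPU only merits a warning.
    if not cuda_available:
        warnings.warn(
            "CUDA not available; the model will run on CPU or MPS, "
            "which is slow but supported."
        )
```

In the script this would be called with `torch.cuda.is_available()`; it is parameterized here so the sketch runs without torch installed.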
antonpibm reviewed May 12, 2026
Addresses PR #5 review comment. Strips every Mellea mention from the script: section-header attribution comments, "Copied verbatim from mellea..." markers, the CRITERIA_BANK subset note, and the per-demo "Mirrors X.Y()" docstrings (rewritten as plain behavior descriptions). The script now stands on its own without pointing at Mellea's source.
antonpibm approved these changes May 12, 2026