
Add adapter generation demo scripts#5

Merged
antonpibm merged 6 commits into main from feature/add-demo-scripts on May 12, 2026

Conversation

@AlonMalach
Collaborator

Add two tutorial scripts under tutorials/scripts/ that compose a
Granite Switch model with the RAG, core, and guardian adapter
libraries and demonstrate invoking every adapter end-to-end:

- run_adapter_generation.py: plain HuggingFace transformers path. Activates adapters by passing adapter_name=... to tokenizer.apply_chat_template, following the README idiom. One dedicated demo_<adapter> function per adapter with the message / document layout the adapter expects. For the three per-span LoRA adapters (citations, context-attribution, hallucination_detection) the script also applies io.yaml's sentence_boundaries tagging and appends the io.yaml instruction as a final user turn, so their output shape matches Mellea. Results saved to a timestamped JSON file.

- run_adapter_generation_mellea.py: Mellea + vLLM path. Starts a vLLM server for the composed model, wires a Mellea OpenAIBackend with embedded-adapter registration, and calls the corresponding intrinsic wrapper (rag.rewrite_question, core.check_certainty, guardian.policy_guardrails, ...) for each adapter.
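The per-span sentence-boundary tagging mentioned above can be sketched roughly as follows; note that the `<iN>` tag format and the regex sentence splitter here are illustrative assumptions, not io.yaml's actual scheme:

```python
import re

def mark_sentence_boundaries(text: str) -> str:
    """Prefix each sentence with an index tag so per-span adapters
    (citations, context-attribution, hallucination_detection) can
    reference individual spans in their output.

    The <iN> tag format is a stand-in, not io.yaml's real scheme.
    """
    # Naive splitter: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(f"<i{i}> {s}" for i, s in enumerate(sentences) if s)
```

A real implementation would reuse whatever splitter and tag vocabulary io.yaml specifies, so that the adapter sees exactly the span markup it was trained on.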

Both scripts mirror one another's demo inputs so outputs can be
compared side by side. Common flags: --output, --model-dir,
--base-model, --max-tokens (HF only); the Mellea script additionally
exposes --port, --gpu-memory-utilization, --max-model-len for the
vLLM server.
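The shared flag surface could be wired up along these lines; only the flag names come from the description above, while the defaults and help strings are illustrative:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flag names from the PR description; defaults are illustrative.
    p = argparse.ArgumentParser(
        description="Run every adapter demo end-to-end."
    )
    p.add_argument("--output", default=None,
                   help="Where to write the timestamped JSON results.")
    p.add_argument("--model-dir", default=None,
                   help="Directory of the composed Granite Switch model.")
    p.add_argument("--base-model", default=None,
                   help="Base model to compose adapters onto.")
    p.add_argument("--max-tokens", type=int, default=512,
                   help="Generation cap (HF path only).")
    return p
```

The Mellea variant would extend the same parser with `--port`, `--gpu-memory-utilization`, and `--max-model-len` for the vLLM server.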

The 14 demos cover:
- RAG: query_rewrite, query_clarification, answerability, citations, hallucination_detection
- Core: context-attribution, requirement-check, uncertainty
- Guardian: guardian-core (social_bias / harm / safe variants), policy-guardrails, factuality-detection, factuality-correction

@AlonMalach AlonMalach requested a review from antonpibm May 11, 2026 05:48
@AlonMalach AlonMalach self-assigned this May 11, 2026
@AlonMalach AlonMalach requested a review from freunda May 11, 2026 05:50
    - Tighten sentence-boundary tagging and instruction-templates
      comment blocks; drop development-flavored asides.
    - Rewrite demo_policy_guardrails docstring from pseudocode sketch
      to concrete context/returns description.
    - Replace magic-number slicing in strip_adapter_suffix with
      str.removesuffix (Python 3.9+ idiom).
    - Minor rephrasing in _split_sentences, _mark_sentence_boundaries,
      and the granite_switch.hf import comment.
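The `str.removesuffix` change in the list above amounts to the following; the `"_lora"` suffix is a guessed example, not necessarily the one the script strips:

```python
def strip_adapter_suffix(name: str, suffix: str = "_lora") -> str:
    # str.removesuffix (Python 3.9+) returns `name` unchanged when the
    # suffix is absent -- safer than magic-number slicing like name[:-5].
    return name.removesuffix(suffix)
```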
- Default to ibm-granite/granite-switch-4.1-3b-preview instead of
  composing a model at startup. Dropped compose_model(), LIBRARIES,
  BUILD_TIMEOUT, --base-model, and the subprocess/tempfile plumbing
  from both scripts.
- Load the HF model with device_map="auto" instead of .to("cuda"),
  so it works on multi-GPU / no-GPU-0 setups.
- Remove the Mellea-sync claim from run_adapter_generation.py's
  top-level docstring; the per-adapter attribution comments stay
  since they point at the exact Mellea source the prompts were
  copied from.
@AlonMalach AlonMalach requested a review from yairallouche as a code owner May 12, 2026 07:55
alon added 2 commits May 12, 2026 11:17
Soften the hard cuda.is_available() check to a warning — with
device_map="auto", plain HF transformers loads onto whatever is
available (CUDA / MPS / CPU). CPU is slow but not unsupported.
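The softened check can be sketched as a plain warning helper; the helper name and message are illustrative, and the real script would presumably pass in `torch.cuda.is_available()`:

```python
import warnings

def warn_if_no_cuda(cuda_available: bool) -> None:
    # Warn instead of raising: with device_map="auto" the model still
    # loads onto MPS or CPU -- slow, but not unsupported.
    if not cuda_available:
        warnings.warn(
            "No CUDA device detected; generation on CPU/MPS will be slow.",
            RuntimeWarning,
        )
```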
Makes the naming symmetric with run_adapter_generation_mellea.py
and clarifies that this script talks to the model directly via
HuggingFace transformers, with no framework layer in between.
Addresses PR #5 review comment. Strips every Mellea mention from
the script: section-header attribution comments, "Copied verbatim
from mellea..." markers, the CRITERIA_BANK subset note, and the
per-demo "Mirrors X.Y()" docstrings (rewritten as plain behavior
descriptions). The script now stands on its own without pointing
at Mellea's source.
@antonpibm antonpibm merged commit 436f698 into main May 12, 2026
@antonpibm antonpibm deleted the feature/add-demo-scripts branch May 12, 2026 09:25