Skip to content

Staging to Main#69

Merged
cryptopoly merged 3 commits into
mainfrom
staging
Jun 2, 2026
Merged

Staging to Main#69
cryptopoly merged 3 commits into
mainfrom
staging

Conversation

@cryptopoly
Copy link
Copy Markdown
Owner

No description provided.

…old start

User-facing
- HTML Challenge prompt library: 32 curated single-page prompts across 4
  categories (Games / Simulations / Tech Demos / Creative Tools) behind an
  Option-C tabbed picker (category tabs + search + card grid). New
  challengePromptLibrary.ts + ChallengePromptLibraryModal.tsx, wired into
  HtmlChallengeTab via a "Prompt library" button.
- GGUF MTP speculative-decoding toggle in the launch modal (FU-074) — was
  backend-only (FU-047) with no UI; now shown for MTP-GGUF + llama.cpp.
- Qwen3.5/3.6 re-tagged multimodal across the catalog (FU-072) — upstream
  unified them onto Qwen3_5ForConditionalGeneration with vision_config;
  FU-040's text-only assumption was stale. Runtime supportsVision stays
  per-engine gated, so badges never produce a broken Attach-image button.

Version + upstream deps
- Bump to 0.9.3 (package.json, pyproject, Cargo, tauri.conf).
- turboquant-mlx-full floor >=0.5.0 (FU-069, parallel expert prefetch).
- mlx-vlm floor >=0.5.0 (FU-063).
- ggml-org Qwen3.6-{27B,35B-A3B}-GGUF non-MTP catalog rows (FU-064).

MLX speculative decoding — genuine engagement (were silently falling back)
- FU-075: mlx_worker_lifecycle imported the removed top-level
  configure_full_attention_split → ImportError disabled DFlash/DDTree/MTPLX
  for everyone. Use the dflash-mlx 0.1.5 target_ops adapter; split only on
  hybrid_gdn families.
- FU-076: MTP tensor probe missed top-level mtp.* keys (Qwen3.5/3.6) → MTPLX
  never selected. Match a bare "mtp." prefix.
- FU-077: harden install-mtplx.sh verify to import the server module +
  auto-retry; a truncated venv (missing numpy/fastapi/...) passed before.
- FU-078: MtplxEngine handed MTPLX a bare repo id; resolve the local HF
  snapshot dir when the candidate isn't an on-disk path.
- FU-071: DDTree availability probe checked a pre-0.1.5 symbol name.

Startup load time (FU-080)
- Backend cold import 2.6s -> ~0.85s: cache-strategy is_available() probes
  imported diffusers.hooks (pulling torch) at startup. New _diffusers_probe
  gates on importlib.metadata version (no import); real import stays lazy.
  torch/diffusers/mlx no longer in sys.modules after import backend_service.app.

Test infrastructure
- E2E suite: hardened DFlash + MTPLX checks (asserted structured engagement,
  not note substrings that the fallback note also matched); net-new DDTree,
  GGUF-MTP, and catalog-vision checks. CLI load surfaces treeBudget /
  dflashDraftModel / visionEnabled.
- cache-strategy matrix: classify missing-download as SKIP not FAIL (FU-070);
  fix tok/s capture + dflashAcceptanceRate; MTPLX cell targets canonical
  Qwen/Qwen3.5-4B (FU-073).
- Startup-import-purity guards + version-probe + all the above unit tests.

Validation: pytest green, vitest 453, tsc clean, cache-matrix 11/11,
E2E 39/39 (every spec-dec lane genuinely engaged).
…n-any-HF, connect presets)

Five local-AI-app parity features to close gaps vs Ollama / LM Studio,
each reusing existing infra rather than adding new heavy subsystems.

1. Out-of-box RAG — one-click nomic-embed-text-v1.5 install
   (/api/setup/install-embedding-model) + /api/rag/status. Chat doc
   panel shows vector vs lexical mode and offers the upgrade
   (RagStatusBadge). Retrieval was silently lexical-only without a model.

2. Server "Connect your app" presets — base_url + Python/JS snippets +
   Open WebUI / Continue.dev / Ollama presets in ServerTab.

3. Ollama-compatible API — /api/{chat,generate,tags,show,version,
   embeddings,embed} layered over the existing OpenAI generation path,
   translating SSE to NDJSON. Inherits auth + format->json_schema.
   Unlocks Ollama-preset tools (Open WebUI, Continue, Raycast, n8n).

4. Import Ollama / LM Studio models by reference — scans the Ollama blob
   store (manifest -> blob) and LM Studio cache, symlinks into a managed
   imported-models dir (no re-download), auto-registers for library scan.

5. Run any Hugging Face repo — /api/models/resolve-hf classifies backend,
   picks the GGUF file, and infers context + capabilities from the repo's
   own metadata; loads with canonicalRepo set to bypass the FU-041
   catalog fuzzy-match that mis-tagged off-catalog models (RunFromHuggingFace).

Tests: +42 backend (test_embedding_setup, test_hf_resolve, test_model_import,
+ Ollama shim cases in test_backend_service); vitest 453 green; tsc clean;
i18n 100%; full E2E suite 8/8 phases pass incl. new phase-0 checks.

Known follow-ups: stage llama-embedding binary for packaged builds (#1);
Windows symlink privilege (#4); raw-safetensors repos flagged vLLM/CUDA (#5).
@cryptopoly cryptopoly merged commit 17cb93e into main Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant