benchmarking#21

Draft
danielmccannsayles wants to merge 5 commits into main from dmccanns/remove-hf-cli

danielmccannsayles (Member) commented Jun 20, 2026

Summary by cubic

Adds three small Go benchmarks (diskwrite, netread, and a stdlib-only naive downloader) to compare the hf CLI (huggingface_hub[hf_xet]) against plain HTTPS and to isolate network cost from disk cost, informing the removal of hf from modelwrap’s supply chain. Also includes a probe to confirm whether the HF path uses Xet, sample result TSVs, and a go.mod for standalone builds.

  • New Features
    • bench/naive: sequential downloader via Hub tree API + resolve URLs; records per-file network and disk times; --sync option to fsync each file; optional HF_TOKEN; run.sh writes naive.tsv.
    • bench/netread: streams each file to io.Discard to measure pure network throughput; run.sh writes netread.tsv.
    • bench/diskwrite: measures raw write and fsync throughput; run.sh writes diskwrite.tsv.
    • Docs and probe: bench/README.md, writeup.md; bench/xet_probe.py monkeypatches xet_get/http_get and snapshots TCP peers with/without HF_HUB_DISABLE_XET.
    • Extras: sample results in bench/results/*.tsv; bench/go.mod for isolated builds.

Written for commit a665498. Summary will update on new commits.

danielmccannsayles changed the title from "first pass (slop)" to "benchmarking" Jun 26, 2026