Add ClickBench example and create-chkit scaffold#128

Draft
KeKs0r wants to merge 9 commits into main from marc/num-7213-activation-add-chkit-benchmark-example-project

KeKs0r commented May 25, 2026

Adds a ClickBench CHKit example with schema and a separate full dataset load migration from datasets.clickhouse.com. Adds the create-chkit scaffolding package so users can start from curated examples, and refreshes getting-started docs around example-first and existing-project flows. Clarifies that --migration-id is an escape hatch for overriding the default timestamp migration prefix. Validated with package CLI typecheck/lint and a docs build.

KeKs0r added 4 commits May 25, 2026 19:09
Adds examples/clickbench/ with the full ClickBench hits schema and a
load migration that ingests the public ClickBench dataset via the
ClickHouse url() table function. Targets ObsessionDB via the
plugin-obsessiondb plugin.

Includes a .gitignore exception so example clickhouse.config.ts files
are tracked, and clarifies the --migration-id flag description (used
while authoring the load migration).

create-chkit downloads a curated example from
github:obsessiondb/chkit/examples/<name> via giget, rewrites the
project name and repins chkit + @chkit/* deps to latest, then runs
install with the auto-detected package manager. Default example is
clickbench. Built with @clack/prompts for the UI.
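
The rename-and-repin step described above can be sketched as a pure function over a parsed package.json. This is a minimal illustration, not the create-chkit source: `PackageJson`, `isChkitPackage`, `repinChkitDeps`, and `rewritePackageJson` are hypothetical names, and pinning to the literal string `latest` is an assumption about what "repins to latest" means.

```typescript
// Hypothetical sketch of the rename + repin step; not create-chkit's actual code.
type DepMap = Record<string, string>;

interface PackageJson {
  name?: string;
  dependencies?: DepMap;
  devDependencies?: DepMap;
  [key: string]: unknown;
}

// True for the `chkit` package itself and anything under the `@chkit/` scope.
function isChkitPackage(depName: string): boolean {
  return depName === "chkit" || depName.startsWith("@chkit/");
}

// Return a copy of a dependency map with chkit packages set to `version`.
function repinChkitDeps(deps: DepMap, version: string): DepMap {
  const out: DepMap = {};
  for (const [depName, range] of Object.entries(deps)) {
    out[depName] = isChkitPackage(depName) ? version : range;
  }
  return out;
}

// Rename the project and repin chkit deps without mutating the input object.
function rewritePackageJson(
  pkg: PackageJson,
  projectName: string,
  version: string = "latest", // assumption: the dist-tag is written verbatim
): PackageJson {
  const result: PackageJson = { ...pkg, name: projectName };
  if (pkg.dependencies) {
    result.dependencies = repinChkitDeps(pkg.dependencies, version);
  }
  if (pkg.devDependencies) {
    result.devDependencies = repinChkitDeps(pkg.devDependencies, version);
  }
  return result;
}
```

Unrelated dependencies keep their original ranges, so only the `chkit` and `@chkit/*` entries change when the example is copied.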

Restructures Getting Started docs into two pages, Start with an
example (using create-chkit) and Add to an existing project (using
chkit init), and updates the docs link printed by chkit init.

The clickbench load migration truncates default.hits before reloading
the 70 GB ClickBench dataset. ClickHouse's default
max_table_size_to_drop (50 GB) blocks the TRUNCATE once the table is
already populated, leaving the migration stuck. Pass
max_table_size_to_drop = 0 and max_partition_size_to_drop = 0 on the
TRUNCATE so the migration is re-runnable against a partially or fully
loaded table.
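
A minimal sketch of the re-runnable TRUNCATE described above, expressed as a statement plus per-query settings. `TruncateCommand` and `truncateWithoutDropGuards` are hypothetical names for illustration; how the migration actually attaches the settings is not shown in this PR text. With @clickhouse/client, a descriptor like this would presumably map onto `client.command({ query, clickhouse_settings })`.

```typescript
// Hypothetical sketch; the real migration may attach these settings differently.
interface TruncateCommand {
  query: string;
  settings: Record<string, number>;
}

// Plain identifiers only, so the names can be interpolated without quoting.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function truncateWithoutDropGuards(database: string, table: string): TruncateCommand {
  if (!IDENTIFIER.test(database) || !IDENTIFIER.test(table)) {
    throw new Error(`unsupported identifier: ${database}.${table}`);
  }
  return {
    query: `TRUNCATE TABLE ${database}.${table}`,
    settings: {
      // 0 disables the size guard (the default limit is 50 GB).
      max_table_size_to_drop: 0,
      max_partition_size_to_drop: 0,
    },
  };
}
```

Keeping the settings next to the statement scopes the override to this one TRUNCATE rather than to the whole connection.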
@@ -0,0 +1,16 @@
import { defineConfig } from '@chkit/core'
@@ -0,0 +1,122 @@
import { schema, table } from '@chkit/core'
KeKs0r added 5 commits May 26, 2026 00:08
The @clickhouse/client default request timeout is 30s, which kills
migrate in-flight on long-running DDL or INSERT statements. The
ClickBench dataset load is the canonical example: the load query took
longer than 30s, the client closed the socket, and the server cancelled
the INSERT. Lift the default to 120s across the stateless, session, and
DDL-fallback clients. Properly exposing a per-config timeout is a
follow-up.
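
For the follow-up, one possible shape for a per-config timeout with the new 120s fallback is sketched below. Everything here is an assumption: the `requestTimeoutMs` field, the `TimeoutConfig` type, and `resolveRequestTimeoutMs` do not exist in chkit as far as this PR shows. The resolved value would be handed to @clickhouse/client as `request_timeout` (milliseconds).

```typescript
// Hypothetical sketch of the follow-up; `requestTimeoutMs` is an assumed config field.
const DEFAULT_REQUEST_TIMEOUT_MS = 120_000; // the 120s default introduced in this commit

interface TimeoutConfig {
  requestTimeoutMs?: number;
}

// Resolve the timeout once so every client kind can share the same value.
function resolveRequestTimeoutMs(config: TimeoutConfig = {}): number {
  const value = config.requestTimeoutMs;
  if (value === undefined) {
    return DEFAULT_REQUEST_TIMEOUT_MS;
  }
  if (!Number.isFinite(value) || value <= 0) {
    throw new RangeError(`requestTimeoutMs must be a positive number, got ${value}`);
  }
  return value;
}
```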

The single `INSERT ... FROM url(hits_{0..99}.parquet)` exceeds the
hard request-duration limit on edge proxies in front of managed
ClickHouse deployments (the ObsessionDB customer-benchmark endpoint
returned a 504 at ~10 min, with ~55M of ~100M rows loaded). Split into
five 20-file chunks so each INSERT fits well under typical proxy
budgets, and pin max_execution_time = 0 on every chunk so the
server-side query timer does not cut the query off. Verified end-to-end
against ObsessionDB: full 99,997,497 rows / 8.69 GiB loaded.
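
The 5×20 split can be sketched as follows. This is an illustration, not the migration itself: `chunkRanges` and `chunkInsertSql` are hypothetical helpers, `baseUrl` is a caller-supplied placeholder for the dataset location, and `SELECT *` stands in for whatever column list the real migration uses.

```typescript
// Hypothetical sketch of the chunked url() load; not the migration's actual SQL.
interface FileChunk {
  first: number; // index of the first file in the chunk, inclusive
  last: number;  // index of the last file in the chunk, inclusive
}

// Split file indexes 0..totalFiles-1 into consecutive chunks of at most chunkSize.
function chunkRanges(totalFiles: number, chunkSize: number): FileChunk[] {
  if (!Number.isInteger(totalFiles) || !Number.isInteger(chunkSize) || totalFiles < 1 || chunkSize < 1) {
    throw new RangeError("totalFiles and chunkSize must be positive integers");
  }
  const chunks: FileChunk[] = [];
  for (let first = 0; first < totalFiles; first += chunkSize) {
    chunks.push({ first, last: Math.min(first + chunkSize, totalFiles) - 1 });
  }
  return chunks;
}

// Build one INSERT per chunk using ClickHouse's {N..M} range glob in url().
function chunkInsertSql(baseUrl: string, chunk: FileChunk): string {
  const glob = `${baseUrl}/hits_{${chunk.first}..${chunk.last}}.parquet`;
  return (
    `INSERT INTO default.hits SELECT * FROM url('${glob}', 'Parquet') ` +
    `SETTINGS max_execution_time = 0`
  );
}
```

`chunkRanges(100, 20)` yields the ranges 0..19, 20..39, 40..59, 60..79, and 80..99, one INSERT each.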

Replace the 5×20-file chunked url() load with a single s3() INSERT.
s3() does native partitioned-Parquet parallelism that url() doesn't;
combined with max_download_threads = 32 and max_insert_threads = 16
this is expected to drop wall time from ~13 min to ~3-5 min and bring
the whole load comfortably under typical edge-proxy request budgets,
removing the need for chunking. max_execution_time = 0 is still
required to disable the server-side query timer. The dataset URL
changes from datasets.clickhouse.com (CloudFront alias) to the
underlying clickhouse-public-datasets S3 bucket.
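
A rough sketch of how the single s3() INSERT and its settings might be assembled. `LOAD_SETTINGS`, `settingsClause`, and `s3InsertSql` are hypothetical names, the bucket URL is passed in rather than assumed, and `SELECT *` is a placeholder for the migration's real column list.

```typescript
// Hypothetical sketch of the single s3() load; not the migration's actual SQL.
const LOAD_SETTINGS: Record<string, number> = {
  max_download_threads: 32,
  max_insert_threads: 16,
  max_execution_time: 0, // 0 disables the server-side query timer
};

// Render settings as "name = value, name = value" in insertion order.
function settingsClause(settings: Record<string, number>): string {
  return Object.entries(settings)
    .map(([name, value]) => `${name} = ${value}`)
    .join(", ");
}

// Build one INSERT covering every file matched by the glob in `s3Url`.
function s3InsertSql(s3Url: string, settings: Record<string, number> = LOAD_SETTINGS): string {
  return (
    `INSERT INTO default.hits SELECT * FROM s3('${s3Url}', 'Parquet') ` +
    `SETTINGS ${settingsClause(settings)}`
  );
}
```

Keeping the settings in one map makes the thread counts easy to tune if the expected ~3-5 min wall time does not hold on a given deployment.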