
Development Guide

Welcome! This document is for contributors and maintainers of Context Builder. It covers how to set up a development environment, build, test, lint, benchmark, and release the project.

For user-facing documentation and examples, see README.md. For performance work, see BENCHMARKS.md. For release history, see CHANGELOG.md.


Prerequisites

  • Rust toolchain (stable) with support for the 2024 edition.
    • Install via rustup: https://rustup.rs
    • Keep your toolchain up-to-date: rustup update
  • Git

Optional but recommended:

  • IDE with Rust Analyzer
  • Just or Make for local task automation (if you prefer)
  • Node.js (only needed if you want to serve Criterion’s HTML benchmark reports locally; not required for development)

Getting the code

git clone https://github.com/igorls/context-builder.git
cd context-builder

Project layout

  • Cargo.toml — crate metadata, dependencies, features
  • README.md — user-facing documentation
  • CHANGELOG.md — release notes
  • DEVELOPMENT.md — this file
  • BENCHMARKS.md — running and understanding benchmarks
  • scripts/
    • generate_samples.rs — synthetic dataset generator for benchmarking
  • benches/
    • context_bench.rs — Criterion benchmark suite
  • src/
    • main.rs — binary entry point
    • lib.rs — core orchestration and run() implementation
    • cli.rs — clap parser and CLI arguments
    • file_utils.rs — directory traversal, filter/ignore collection, prompts
    • markdown.rs — core rendering logic, streaming, line numbering, binary/text sniffing
    • tree.rs — file tree structure building and printing
  • samples/ — optional persistent datasets (ignored in VCS) for benchmarking
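Among the pieces above, the line-numbering done in markdown.rs is the easiest to picture. The sketch below is illustrative only (the real implementation streams output rather than collecting a String, and `number_lines` is a hypothetical name):

```rust
// Illustrative sketch of line-numbered rendering; the actual code in
// markdown.rs streams to the writer instead of building a String.
fn number_lines(text: &str) -> String {
    text.lines()
        .enumerate()
        // 1-based line numbers, right-aligned in a 4-character column.
        .map(|(i, line)| format!("{:>4} | {}\n", i + 1, line))
        .collect()
}

fn main() {
    print!("{}", number_lines("fn main() {\n    println!(\"hi\");\n}"));
}
```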

Building and running

Build:

cargo build

Run the CLI:

cargo run -- --help
cargo run -- -d . -o out.md -f rs -f toml -i target --line-numbers

Notes:

  • By default, parallel processing is enabled via the parallel feature (uses rayon).
  • Logging uses env_logger; set RUST_LOG to control verbosity:
    • Linux/macOS: RUST_LOG=info cargo run -- ...
    • Windows PowerShell: $env:RUST_LOG='info'; cargo run -- ...

Features

  • parallel (enabled by default)

    • Enables parallel file processing in markdown generation via rayon.
    • Disable defaults (sequential run):
      • Build/Run: cargo run --no-default-features -- ...
      • As a dependency in another crate: set default-features = false in Cargo.toml.
  • samples-bin

    • Exposes the dataset generator as a cargo bin (development-only).
    • Usage (identical on Linux/macOS and Windows PowerShell):
      • cargo run --no-default-features --features samples-bin --bin generate_samples -- --help
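For the dependency case mentioned above, disabling the default parallel feature looks roughly like this in the downstream crate’s Cargo.toml (the version is a placeholder):

```toml
[dependencies]
context-builder = { version = "*", default-features = false }
```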

Testing

Run all tests:

cargo test

Tips:

  • Unit tests cover CLI parsing, file filtering/ignoring, markdown formatting (including line numbers and binary handling), and tree building.
  • Avoid adding interactive prompts inside tests. The library is structured so that prompts can be injected/mocked (see Prompter trait).
  • For additional diagnostics during tests:
    • Linux/macOS: RUST_LOG=info cargo test
    • Windows PowerShell: $env:RUST_LOG='info'; cargo test
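To illustrate the injectable-prompter pattern mentioned above, here is a minimal sketch. The trait shape and names (`confirm`, `AutoYes`, `maybe_overwrite`) are assumptions for illustration; consult the actual Prompter trait in the source for the real signature:

```rust
// Hypothetical shape of the Prompter trait; the real one in the crate
// may differ in method name and arguments.
trait Prompter {
    fn confirm(&self, question: &str) -> bool;
}

/// Test double that auto-confirms without touching stdin.
struct AutoYes;

impl Prompter for AutoYes {
    fn confirm(&self, _question: &str) -> bool {
        true
    }
}

// Example of code written against the trait rather than stdin directly,
// so tests and benches never block on interactive input.
fn maybe_overwrite(prompter: &dyn Prompter, path: &str) -> bool {
    prompter.confirm(&format!("Overwrite {}?", path))
}

fn main() {
    assert!(maybe_overwrite(&AutoYes, "out.md"));
    println!("auto-confirm ok");
}
```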

Linting and formatting

Format:

cargo fmt --all

Clippy (lints):

cargo clippy --all-targets --all-features -- -D warnings

Please ensure code is formatted and clippy-clean before opening a PR.


Benchmarks

We use Criterion for micro/meso benchmarks and dataset-driven performance checks.

  • See BENCHMARKS.md for details, including dataset generation, silent runs, and HTML report navigation.
  • Quick start:
    cargo bench --bench context_bench

Environment variables

  • CB_SILENT

    • When set to “1” or “true” (case-insensitive), suppresses user-facing prints in the CLI.
    • The benchmark harness sets this to “1” by default to avoid skewing timings with console I/O.
    • Override locally:
      • Linux/macOS: CB_SILENT=0 cargo bench --bench context_bench
      • Windows PowerShell: $env:CB_SILENT='0'; cargo bench --bench context_bench
  • CB_BENCH_MEDIUM

    • When set to “1”, enables the heavier “medium” dataset scenarios during benches.
  • CB_BENCH_DATASET_DIR

    • Allows pointing the benchmark harness to an external root containing datasets:
      • <CB_BENCH_DATASET_DIR>/<preset>/project
    • If not set and no ./samples/<preset>/project is present, benches will synthesize datasets in a temp dir.
  • RUST_LOG

    • Controls log verbosity (env_logger). Example:
      • Linux/macOS: RUST_LOG=info cargo run -- ...
      • Windows PowerShell: $env:RUST_LOG='info'; cargo run -- ...
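The CB_SILENT semantics described above (“1” or “true”, case-insensitive) can be sketched like this; the helper name `is_silent` is an assumption, not the crate’s actual function:

```rust
// Sketch of the CB_SILENT check: "1" or "true" (case-insensitive)
// means silent; anything else, or an unset variable, means not silent.
fn is_silent(raw: Option<&str>) -> bool {
    match raw {
        Some(v) => {
            let v = v.trim().to_ascii_lowercase();
            v == "1" || v == "true"
        }
        None => false,
    }
}

fn main() {
    // Read the process environment and fall back to "not silent".
    let silent = std::env::var("CB_SILENT").ok();
    println!("silent = {}", is_silent(silent.as_deref()));
}
```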

Coding guidelines

  • Edition: 2024
  • Error handling:
    • Use io::Result where appropriate; prefer returning errors over panicking.
    • It’s okay to use unwrap() and expect() in tests/benches and small setup helpers, but not in core library logic.
  • Performance:
    • Prefer streaming reads/writes for large files (see markdown.rs).
    • Keep binary detection lightweight (current sniff logic checks for NUL bytes and UTF-8 validity).
    • Keep language detection simple and deterministic; add new mappings in one place.
  • Cross-platform:
    • Normalize path separators in tests where string comparisons are used.
  • Logging:
    • Use log::{info, warn, error}; let env_logger control emission.
  • CLI:
    • Add new flags in cli.rs. Ensure you update tests covering parsing, and propagate options cleanly through run_with_args.
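The binary-detection guideline above (NUL bytes plus UTF-8 validity) amounts to something like the following sketch. The function name is hypothetical and the real sniff in markdown.rs only examines a leading sample of the file, where a multi-byte UTF-8 character may be cut off mid-sequence:

```rust
// Sketch of lightweight binary sniffing: a NUL byte almost never
// appears in text, and non-UTF-8 data is treated as binary.
fn looks_binary(sample: &[u8]) -> bool {
    if sample.contains(&0) {
        return true;
    }
    std::str::from_utf8(sample).is_err()
}

fn main() {
    println!("{}", looks_binary(b"fn main() {}")); // text source
    println!("{}", looks_binary(&[0x00, 0x01]));   // NUL byte => binary
}
```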

Submitting changes

  1. Fork and create a feature branch:

    git checkout -b feat/my-improvement
  2. Make changes, add tests, and keep the code formatted and clippy-clean:

    cargo fmt --all
    cargo clippy --all-targets --all-features -- -D warnings
    cargo test
  3. If you modified performance-sensitive code, run benches (see BENCHMARKS.md).

  4. Update CHANGELOG.md if the change is user-visible or noteworthy.

  5. Open a PR with:

    • A concise title
    • Description of changes and rationale
    • Notes on performance impact (if any)
    • Any relevant screenshots or benchmark snippets

Suggested commit message convention: short, imperative subject; optionally scope (e.g., feat(cli): add --no-parallel flag).


Releasing (maintainers)

  1. Ensure the tree is green:

    • cargo fmt --all
    • cargo clippy --all-targets --all-features -- -D warnings
    • cargo test
    • Optionally: cargo bench
  2. Update versions and docs:

    • Bump version in Cargo.toml.
    • Add a new entry to CHANGELOG.md.
    • Verify README.md and DEVELOPMENT.md are up to date.
  3. Tag the release:

    git commit -am "chore(release): vX.Y.Z"
    git tag vX.Y.Z
    git push && git push --tags
  4. Publish to crates.io:

    cargo publish --dry-run
    cargo publish
  5. Create a GitHub release, paste changelog highlights, and attach links to benchmarks if relevant.


Tips and pitfalls

  • Prompts during runs
    • The library uses a Prompter trait for confirmation flows. Inject a test-friendly prompter to avoid interactive I/O in tests and benches.
  • Output file overwrites
    • The CLI confirms overwrites by default. In tests/benches, use the injected prompter that auto-confirms.
  • Large datasets
    • Prefer parallel builds for performance.
    • Consider dataset size and line-numbering effects when measuring.

Questions?

Open an issue or start a discussion on GitHub. Thanks for contributing!