
Development Guide

Welcome! This document is for contributors and maintainers of Context Builder. It covers how to set up a development environment, build, test, lint, benchmark, and release the project.

For user-facing documentation and examples, see README.md. For performance work, see BENCHMARKS.md. For release history, see CHANGELOG.md.


Prerequisites

  • Rust toolchain (stable) with support for the 2024 edition.
    • Install via rustup: https://rustup.rs
    • Keep your toolchain up-to-date: rustup update
  • Git

Optional but recommended:

  • IDE with Rust Analyzer
  • Just or Make for local task automation (if you prefer)
  • Node.js (only needed if you want to serve Criterion’s HTML benchmark reports locally; not required for development)

Getting the code

git clone https://github.com/igorls/context-builder.git
cd context-builder

Project layout

  • Cargo.toml — crate metadata, dependencies, features
  • README.md — user-facing documentation
  • CHANGELOG.md — release notes
  • DEVELOPMENT.md — this file
  • BENCHMARKS.md — running and understanding benchmarks
  • scripts/
    • generate_samples.rs — synthetic dataset generator for benchmarking
  • benches/
    • context_bench.rs — Criterion benchmark suite
  • src/
    • main.rs — binary entry point
    • lib.rs — core orchestration and run() implementation
    • cli.rs — clap parser and CLI arguments
    • file_utils.rs — directory traversal, filter/ignore collection, prompts
    • markdown.rs — core rendering logic, streaming, line numbering, binary/text sniffing
    • tree.rs — file tree structure building and printing
  • samples/ — optional persistent datasets (ignored in VCS) for benchmarking
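Among the pieces above, the line-numbering done in markdown.rs is the easiest to picture. The sketch below is illustrative only (the real implementation streams output rather than collecting a String, and `number_lines` is a hypothetical name):

```rust
// Illustrative sketch of line-numbered rendering; the actual code in
// markdown.rs streams to the writer instead of building a String.
fn number_lines(text: &str) -> String {
    text.lines()
        .enumerate()
        // 1-based line numbers, right-aligned in a 4-character column.
        .map(|(i, line)| format!("{:>4} | {}\n", i + 1, line))
        .collect()
}

fn main() {
    print!("{}", number_lines("fn main() {\n    println!(\"hi\");\n}"));
}
```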

Building and running

Build:

cargo build

Run the CLI:

cargo run -- --help
cargo run -- -d . -o out.md -f rs -f toml -i target --line-numbers

Notes:

  • By default, parallel processing is enabled via the parallel feature (uses rayon).
  • Logging uses env_logger; set RUST_LOG to control verbosity:
    • Linux/macOS: RUST_LOG=info cargo run -- ...
    • Windows PowerShell: $env:RUST_LOG='info'; cargo run -- ...

Features

  • parallel (enabled by default)

    • Enables parallel file processing in markdown generation via rayon.
    • Disable defaults (sequential run):
      • Build/Run: cargo run --no-default-features -- ...
      • As a dependency in another crate: set default-features = false in Cargo.toml.
  • samples-bin

    • Exposes the dataset generator as a cargo bin (development-only).
    • Usage (identical on Linux/macOS and Windows PowerShell):
      • cargo run --no-default-features --features samples-bin --bin generate_samples -- --help
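For the dependency case mentioned above, disabling the default parallel feature looks roughly like this in the downstream crate’s Cargo.toml (the version is a placeholder):

```toml
[dependencies]
context-builder = { version = "*", default-features = false }
```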

Testing

Run all tests:

cargo test

Tips:

  • Unit tests cover CLI parsing, file filtering/ignoring, markdown formatting (including line numbers and binary handling), and tree building.
  • Avoid adding interactive prompts inside tests. The library is structured so that prompts can be injected/mocked (see Prompter trait).
  • For additional diagnostics during tests:
    • Linux/macOS: RUST_LOG=info cargo test
    • Windows PowerShell: $env:RUST_LOG='info'; cargo test
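To illustrate the injectable-prompter pattern mentioned above, here is a minimal sketch. The trait shape and names (`confirm`, `AutoYes`, `maybe_overwrite`) are assumptions for illustration; consult the actual Prompter trait in the source for the real signature:

```rust
// Hypothetical shape of the Prompter trait; the real one in the crate
// may differ in method name and arguments.
trait Prompter {
    fn confirm(&self, question: &str) -> bool;
}

/// Test double that auto-confirms without touching stdin.
struct AutoYes;

impl Prompter for AutoYes {
    fn confirm(&self, _question: &str) -> bool {
        true
    }
}

// Example of code written against the trait rather than stdin directly,
// so tests and benches never block on interactive input.
fn maybe_overwrite(prompter: &dyn Prompter, path: &str) -> bool {
    prompter.confirm(&format!("Overwrite {}?", path))
}

fn main() {
    assert!(maybe_overwrite(&AutoYes, "out.md"));
    println!("auto-confirm ok");
}
```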

Linting and formatting

Format:

cargo fmt --all

Clippy (lints):

cargo clippy --all-targets --all-features -- -D warnings

Please ensure code is formatted and clippy-clean before opening a PR.


Benchmarks

We use Criterion for micro/meso benchmarks and dataset-driven performance checks.

  • See BENCHMARKS.md for details, including dataset generation, silent runs, and HTML report navigation.
  • Quick start:
    cargo bench --bench context_bench

Environment variables

  • CB_SILENT

    • When set to “1” or “true” (case-insensitive), suppresses user-facing prints in the CLI.
    • The benchmark harness sets this to “1” by default to avoid skewing timings with console I/O.
    • Override locally:
      • Linux/macOS: CB_SILENT=0 cargo bench --bench context_bench
      • Windows PowerShell: $env:CB_SILENT='0'; cargo bench --bench context_bench
  • CB_BENCH_MEDIUM

    • When set to “1”, enables the heavier “medium” dataset scenarios during benches.
  • CB_BENCH_DATASET_DIR

    • Allows pointing the benchmark harness to an external root containing datasets:
      • <CB_BENCH_DATASET_DIR>/<preset>/project
    • If not set and no ./samples/<preset>/project is present, benches will synthesize datasets in a temp dir.
  • RUST_LOG

    • Controls log verbosity (env_logger). Example:
      • Linux/macOS: RUST_LOG=info cargo run -- ...
      • Windows PowerShell: $env:RUST_LOG='info'; cargo run -- ...
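The CB_SILENT semantics described above (“1” or “true”, case-insensitive) can be sketched like this; the helper name `is_silent` is an assumption, not the crate’s actual function:

```rust
// Sketch of the CB_SILENT check: "1" or "true" (case-insensitive)
// means silent; anything else, or an unset variable, means not silent.
fn is_silent(raw: Option<&str>) -> bool {
    match raw {
        Some(v) => {
            let v = v.trim().to_ascii_lowercase();
            v == "1" || v == "true"
        }
        None => false,
    }
}

fn main() {
    // Read the process environment and fall back to "not silent".
    let silent = std::env::var("CB_SILENT").ok();
    println!("silent = {}", is_silent(silent.as_deref()));
}
```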

Coding guidelines

  • Edition: 2024
  • Error handling:
    • Use io::Result where appropriate; prefer returning errors over panicking.
    • It’s okay to use unwrap() and expect() in tests/benches and small setup helpers, but not in core library logic.
  • Performance:
    • Prefer streaming reads/writes for large files (see markdown.rs).
    • Keep binary detection lightweight (current sniff logic checks for NUL bytes and UTF-8 validity).
    • Keep language detection simple and deterministic; add new mappings in one place.
  • Cross-platform:
    • Normalize path separators in tests where string comparisons are used.
  • Logging:
    • Use log::{info, warn, error}; let env_logger control emission.
  • CLI:
    • Add new flags in cli.rs. Ensure you update tests covering parsing, and propagate options cleanly through run_with_args.
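The binary-detection guideline above (NUL bytes plus UTF-8 validity) amounts to something like the following sketch. The function name is hypothetical and the real sniff in markdown.rs only examines a leading sample of the file, where a multi-byte UTF-8 character may be cut off mid-sequence:

```rust
// Sketch of lightweight binary sniffing: a NUL byte almost never
// appears in text, and non-UTF-8 data is treated as binary.
fn looks_binary(sample: &[u8]) -> bool {
    if sample.contains(&0) {
        return true;
    }
    std::str::from_utf8(sample).is_err()
}

fn main() {
    println!("{}", looks_binary(b"fn main() {}")); // text source
    println!("{}", looks_binary(&[0x00, 0x01]));   // NUL byte => binary
}
```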

Submitting changes

  1. Fork and create a feature branch:

    git checkout -b feat/my-improvement
  2. Make changes, add tests, and keep the code formatted and clippy-clean:

    cargo fmt --all
    cargo clippy --all-targets --all-features -- -D warnings
    cargo test
  3. If you modified performance-sensitive code, run benches (see BENCHMARKS.md).

  4. Update CHANGELOG.md if the change is user-visible or noteworthy.

  5. Open a PR with:

    • A concise title
    • Description of changes and rationale
    • Notes on performance impact (if any)
    • Any relevant screenshots or benchmark snippets

Suggested commit message convention: short, imperative subject; optionally scope (e.g., feat(cli): add --no-parallel flag).


Releasing (maintainers)

  1. Ensure the tree is green:

    • cargo fmt --all
    • cargo clippy --all-targets --all-features -- -D warnings
    • cargo test
    • Optionally: cargo bench
  2. Update versions and docs:

    • Bump version in Cargo.toml.
    • Add a new entry to CHANGELOG.md.
    • Verify README.md and DEVELOPMENT.md are up to date.
  3. Tag the release:

    git commit -am "chore(release): vX.Y.Z"
    git tag vX.Y.Z
    git push && git push --tags
  4. Publish to crates.io:

    cargo publish --dry-run
    cargo publish
  5. Create a GitHub release, paste changelog highlights, and attach links to benchmarks if relevant.


Tips and pitfalls

  • Prompts during runs
    • The library uses a Prompter trait for confirmation flows. Inject a test-friendly prompter to avoid interactive I/O in tests and benches.
  • Output file overwrites
    • The CLI confirms overwrites by default. In tests/benches, use the injected prompter that auto-confirms.
  • Large datasets
    • Prefer parallel builds for performance.
    • Consider dataset size and line-numbering effects when measuring.

Questions?

Open an issue or start a discussion on GitHub. Thanks for contributing!