From 6071b4263e81f91d379a39e338de9124f11c8a3e Mon Sep 17 00:00:00 2001 From: 16bit-ykiko <119843247+16bit-ykiko@users.noreply.github.com> Date: Wed, 17 Jun 2026 13:42:24 +0000 Subject: [PATCH] docs: sync catter --- en/catter/design/architecture.md | 164 +++++++++ en/catter/design/hook-mechanism.md | 218 ++++++++++++ en/catter/design/ipc-protocol.md | 221 ++++++++++++ en/catter/dev/build.md | 66 ++++ en/catter/dev/contribution.md | 58 ++++ en/catter/features/build-profiling.md | 21 ++ en/catter/features/command-tree.md | 43 +++ en/catter/features/compilation-database.md | 67 ++++ en/catter/features/fake-compilation.md | 24 ++ en/catter/features/target-tree.md | 39 +++ en/catter/guide/configuration.md | 80 +++++ en/catter/guide/quick-start.md | 77 +++++ en/catter/guide/what-is-catter.md | 49 +++ en/catter/index.md | 29 ++ en/catter/scripting/builtin-modules.md | 379 +++++++++++++++++++++ en/catter/scripting/command-analysis.md | 217 ++++++++++++ en/catter/scripting/option-parsing.md | 213 ++++++++++++ en/catter/scripting/overview.md | 94 +++++ en/catter/scripting/service-api.md | 230 +++++++++++++ en/catter/sidebar.yaml | 38 +++ zh/catter/design/architecture.md | 164 +++++++++ zh/catter/design/hook-mechanism.md | 218 ++++++++++++ zh/catter/design/ipc-protocol.md | 221 ++++++++++++ zh/catter/dev/build.md | 66 ++++ zh/catter/dev/contribution.md | 58 ++++ zh/catter/features/build-profiling.md | 21 ++ zh/catter/features/command-tree.md | 43 +++ zh/catter/features/compilation-database.md | 67 ++++ zh/catter/features/fake-compilation.md | 24 ++ zh/catter/features/target-tree.md | 39 +++ zh/catter/guide/configuration.md | 80 +++++ zh/catter/guide/quick-start.md | 77 +++++ zh/catter/guide/what-is-catter.md | 49 +++ zh/catter/index.md | 29 ++ zh/catter/scripting/builtin-modules.md | 379 +++++++++++++++++++++ zh/catter/scripting/command-analysis.md | 217 ++++++++++++ zh/catter/scripting/option-parsing.md | 213 ++++++++++++ zh/catter/scripting/overview.md | 94 +++++ 
zh/catter/scripting/service-api.md | 230 +++++++++++++ zh/catter/sidebar.yaml | 38 +++ 40 files changed, 4654 insertions(+) create mode 100644 en/catter/design/architecture.md create mode 100644 en/catter/design/hook-mechanism.md create mode 100644 en/catter/design/ipc-protocol.md create mode 100644 en/catter/dev/build.md create mode 100644 en/catter/dev/contribution.md create mode 100644 en/catter/features/build-profiling.md create mode 100644 en/catter/features/command-tree.md create mode 100644 en/catter/features/compilation-database.md create mode 100644 en/catter/features/fake-compilation.md create mode 100644 en/catter/features/target-tree.md create mode 100644 en/catter/guide/configuration.md create mode 100644 en/catter/guide/quick-start.md create mode 100644 en/catter/guide/what-is-catter.md create mode 100644 en/catter/index.md create mode 100644 en/catter/scripting/builtin-modules.md create mode 100644 en/catter/scripting/command-analysis.md create mode 100644 en/catter/scripting/option-parsing.md create mode 100644 en/catter/scripting/overview.md create mode 100644 en/catter/scripting/service-api.md create mode 100644 en/catter/sidebar.yaml create mode 100644 zh/catter/design/architecture.md create mode 100644 zh/catter/design/hook-mechanism.md create mode 100644 zh/catter/design/ipc-protocol.md create mode 100644 zh/catter/dev/build.md create mode 100644 zh/catter/dev/contribution.md create mode 100644 zh/catter/features/build-profiling.md create mode 100644 zh/catter/features/command-tree.md create mode 100644 zh/catter/features/compilation-database.md create mode 100644 zh/catter/features/fake-compilation.md create mode 100644 zh/catter/features/target-tree.md create mode 100644 zh/catter/guide/configuration.md create mode 100644 zh/catter/guide/quick-start.md create mode 100644 zh/catter/guide/what-is-catter.md create mode 100644 zh/catter/index.md create mode 100644 zh/catter/scripting/builtin-modules.md create mode 100644 
zh/catter/scripting/command-analysis.md create mode 100644 zh/catter/scripting/option-parsing.md create mode 100644 zh/catter/scripting/overview.md create mode 100644 zh/catter/scripting/service-api.md create mode 100644 zh/catter/sidebar.yaml diff --git a/en/catter/design/architecture.md b/en/catter/design/architecture.md new file mode 100644 index 0000000..fc4c842 --- /dev/null +++ b/en/catter/design/architecture.md @@ -0,0 +1,164 @@ +# System Architecture + +Catter is a three-part system for intercepting and processing build commands. Each component has a distinct responsibility, and they communicate over IPC to form a pipeline that captures every compiler invocation a build system makes. + +## The Three Components + +### 1. HOOK -- Process Creation Interceptor + +A platform-specific shared library injected into build system processes. It intercepts calls to process creation functions (`execve` on Unix, `CreateProcess` on Windows) and rewrites them so that every child process is routed through `catter-proxy` before execution. + +- **Linux**: `libcatter-hook-unix.so`, injected via `LD_PRELOAD` +- **macOS**: `libcatter-hook-unix.dylib`, injected via `DYLD_INSERT_LIBRARIES` +- **Windows**: `catter-hook-win64.dll`, injected via DLL injection (`VirtualAllocEx` + `LoadLibraryA`) + +The hook is passive -- it does not make decisions. It simply redirects process creation to the proxy. + +### 2. PROXY (`catter-proxy`) -- Compiler Wrapper and Hook Manager + +A standalone executable that operates in two modes: + +- **Injector Mode**: Launched by `catter` to start the build system with the hook library attached. This is the first proxy instance in a session. +- **Wrapper Mode**: Launched by the hook library when a build command is intercepted. Each intercepted command spawns a new `catter-proxy` process that acts as a stand-in for the original compiler/tool. 
+ +In both modes, the proxy connects to the `catter` daemon over IPC, sends information about the captured command, receives a decision (execute, drop, or modify), and acts on it. + +### 3. DECISION (`catter`) -- The Daemon + +The main process that the user invokes. It: + +1. Loads and initializes a JavaScript script via the embedded QuickJS runtime +2. Spawns `catter-proxy` in injector mode to start the build +3. Listens on a Unix domain socket (or named pipe on Windows) for IPC connections +4. Receives intercepted commands from proxy instances +5. Invokes JavaScript callbacks (`onCommand`, `onExecution`, `onStart`, `onFinish`) to decide how to handle each command +6. Maintains a session tree that mirrors the process tree of the build + +The JS runtime is single-threaded -- all script callbacks run sequentially on the event loop, so scripts do not need to handle concurrency. + +## Complete Workflow + +Here is a step-by-step walkthrough of what happens when you run: + +```bash +catter script::cdb -o compile_commands.json -- make +``` + +1. **User invokes `catter`**. The DECISION daemon starts, loads the `script::cdb` script, and initializes the QuickJS runtime. The script's `onStart()` callback runs. + +2. **`catter` spawns `catter-proxy -- make`**. This is the proxy in **injector mode**. The proxy is a child process of `catter`. + +3. **Proxy connects to `catter` via IPC**. It sends a `CHECK_MODE` request to confirm the daemon is in inject mode. + +4. **`catter` responds: inject mode**. The proxy now knows it should launch the build command with the hook library attached. + +5. **Proxy starts `make` with the hook injected**. On Linux, this means adding `libcatter-hook-unix.so` to `LD_PRELOAD` and setting environment variables (`__key_catter_proxy_path_v1`, `__key_catter_command_id_v1`) before calling the real `execve`. On Windows, the process is created suspended, the hook DLL is injected, then the process is resumed. + +6. 
**`make` runs and tries to spawn `g++ main.cpp -o main.o`**. The build system calls `execve("g++", ...)` (or `CreateProcess` on Windows). + +7. **The HOOK intercepts the `execve()` call**. The hook library, loaded in `make`'s address space, catches the call before it reaches the kernel. + +8. **HOOK rewrites the command**. Instead of executing `g++` directly, the hook rewrites the command to: + ``` + catter-proxy -p --exec /usr/bin/g++ -- g++ main.cpp -o main.o + ``` + It also cleans the environment: removes catter-specific variables and strips the hook library from `LD_PRELOAD` to prevent the proxy itself from being hooked. + +9. **A new `catter-proxy` instance starts** (PROXY in **wrapper mode**). The hook starts it by calling the real `execve` with the rewritten command. + +10. **Wrapper proxy connects to `catter` via IPC**. It sends a `CREATE` request (registering itself with its parent ID), then a `MAKE_DECISION` request containing the full command details: working directory, resolved executable path, arguments, and environment. + +11. **`catter` invokes `onCommand(ctx)` in the JS script**. The script inspects the command and returns an action: execute as-is, execute with modifications, or drop. + +12. **Proxy acts on the decision**: + - **INJECT**: Execute the command with the hook library re-attached (so grandchild processes are also intercepted) + - **WRAP**: Execute the command directly, capturing stdout/stderr + - **DROP**: Skip execution, return exit code 0 + +13. **After execution, the proxy sends `FINISH` to `catter`**. The result includes the exit code, captured stdout, and captured stderr. + +14. **`catter` invokes `onExecution(ctx)` in the JS script** with the execution result. + +15. **When `make` finishes**, the original injector proxy exits, and `catter` invokes `onFinish(result)`. + +16. **`catter` shuts down** and writes any output (e.g., `compile_commands.json`).
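The three outcomes in step 12 can be sketched as a small lookup. This is an illustrative sketch, not catter's implementation: `ActionType`, `ExecutionPlan`, and `plan_for` are hypothetical names, and the plan only models whether the command runs and whether the hook is re-attached.

```cpp
#include <cassert>
#include <cstdint>

// The three decisions named in step 12 of the workflow.
enum class ActionType : std::uint8_t { DROP, INJECT, WRAP };

// Hypothetical summary of what the proxy does for a given decision.
struct ExecutionPlan {
    bool run_command;  // false: the command is skipped
    bool attach_hook;  // true: the hook library is re-attached for children
};

// Map a daemon decision to the proxy's behaviour.
inline ExecutionPlan plan_for(ActionType type) {
    switch (type) {
        case ActionType::DROP:   return {false, false};  // skip, report exit code 0
        case ActionType::INJECT: return {true, true};    // run, children are intercepted
        case ActionType::WRAP:   return {true, false};   // run directly, no hook
    }
    return {false, false};  // not reached for valid enum values
}
```

For example, `plan_for(ActionType::WRAP)` yields a plan that runs the command without attaching the hook, which matches the description of leaf commands such as a compiler invocation.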
+ +## Sequence Diagram + +```mermaid +sequenceDiagram + participant User + participant Catter as catter (DECISION) + participant Proxy1 as catter-proxy (Injector) + participant Make as make (Build System) + participant Hook as HOOK (in make) + participant Proxy2 as catter-proxy (Wrapper) + participant GCC as g++ (Compiler) + + User->>Catter: catter script::cdb -- make + Catter->>Catter: Load script, call onStart() + Catter->>Proxy1: Spawn catter-proxy -- make + Proxy1->>Catter: IPC: CHECK_MODE + Catter-->>Proxy1: INJECT mode + Proxy1->>Make: Start make with HOOK + Make->>Hook: execve("g++", ...) + Hook->>Proxy2: Rewrite to catter-proxy -p ID --exec g++ -- ... + Proxy2->>Catter: IPC: CREATE(parent_id) + Catter-->>Proxy2: New session ID + Proxy2->>Catter: IPC: MAKE_DECISION(command) + Catter->>Catter: Call onCommand(ctx) + Catter-->>Proxy2: Action: execute + Proxy2->>GCC: Execute g++ main.cpp + GCC-->>Proxy2: Exit code 0 + Proxy2->>Catter: IPC: FINISH(result) + Catter->>Catter: Call onExecution(ctx) + Make-->>Proxy1: make exits + Proxy1-->>Catter: Process result + Catter->>Catter: Call onFinish(result) +``` + +## Two Modes of catter-proxy + +The proxy binary serves dual purposes depending on how it is invoked: + +### Injector Mode + +Launched by `catter` to run the build system with hooks. This is always the first proxy instance in a session. + +``` +catter-proxy -- make -j8 +``` + +The injector: +1. Connects to the daemon, confirms inject mode +2. Prepares the environment with `LD_PRELOAD` (or performs DLL injection on Windows) +3. Sets catter-specific environment variables for the hook to read +4. Launches the build command +5. Waits for the build to complete, capturing stdout/stderr + +### Wrapper Mode + +Launched by the hook library when a child process is intercepted. Each intercepted command creates a new wrapper instance. + +``` +catter-proxy -p --exec /usr/bin/g++ -- g++ main.cpp -o main.o +``` + +The wrapper: +1. Connects to the daemon +2. 
Registers itself as a child of `parent_id` via `CREATE` +3. Sends the captured command via `MAKE_DECISION` +4. Executes (or drops) based on the daemon's response +5. Reports the result via `FINISH` + +## Key Design Decisions + +**Single-threaded JS runtime**. All JavaScript callbacks run sequentially on the event loop. The daemon handles IPC connections concurrently (via async I/O with `kota`), but script execution is serialized. This eliminates race conditions in user scripts. + +**Binary IPC over Unix domain sockets / named pipes**. Communication between proxy and daemon uses Bincode serialization via `kota::ipc::BincodePeer`. This is fast and avoids the overhead of text-based protocols. See the [IPC Protocol](ipc-protocol.md) document for details. + +**Recursive hooking**. On Unix, `LD_PRELOAD` is inherited by child processes, so any subprocess spawned by the build system (including nested `make` invocations, shell scripts, or build tool wrappers) is automatically hooked. On Windows, the hook DLL's `CreateProcess` detour ensures every child process gets the DLL injected before it starts. The interception is transparent and recursive -- the build system has no way to tell it is being monitored. + +**Environment scrubbing**. The hook cleans its own traces from the environment before launching the proxy. If it did not strip itself from `LD_PRELOAD`, the proxy binary would itself be hooked, causing infinite recursion. The proxy re-adds `LD_PRELOAD` only when launching commands that need interception (INJECT action). + +**Session tree**. The daemon maintains a tree of session IDs that mirrors the build process tree. Each proxy instance registers with its parent's session ID, allowing the daemon to track which processes spawned which. This is essential for features like target tree reconstruction and build profiling. 
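A minimal sketch of the session tree idea, assuming sessions are numbered sequentially and ID 0 stands for the root. `SessionTree`, `create`, and `ancestors` are hypothetical names chosen for illustration; the daemon's real data structure may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using ipcid_t = std::int32_t;

// Hypothetical in-memory session tree; not catter's actual data structure.
class SessionTree {
public:
    // Register a new session under parent_id and return its fresh ID.
    // ID 0 is treated as the pre-existing root in this sketch.
    ipcid_t create(ipcid_t parent_id) {
        assert(parent_id == 0 || parent_.count(parent_id) == 1);
        ipcid_t id = next_id_++;
        parent_[id] = parent_id;
        return id;
    }

    // Walk from id up to the root and return the chain of ancestors.
    std::vector<ipcid_t> ancestors(ipcid_t id) const {
        std::vector<ipcid_t> chain;
        while (id != 0) {
            id = parent_.at(id);
            chain.push_back(id);
        }
        return chain;
    }

private:
    ipcid_t next_id_ = 1;
    std::unordered_map<ipcid_t, ipcid_t> parent_;
};
```

Registering `make`, then a shell under it, then a compiler under the shell produces a chain that `ancestors` can walk back to the root, which is the kind of lookup that profiling or error attribution would need.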
diff --git a/en/catter/design/hook-mechanism.md b/en/catter/design/hook-mechanism.md new file mode 100644 index 0000000..32dc3d6 --- /dev/null +++ b/en/catter/design/hook-mechanism.md @@ -0,0 +1,218 @@ +# Hook Mechanism + +The hook is the lowest-level component of catter. It is a shared library that gets loaded into every process spawned by the build system. Its sole job is to intercept process creation calls and rewrite them so that every child process goes through `catter-proxy` instead of executing directly. + +The hook implementation is entirely platform-specific. Unix and Windows use fundamentally different interception techniques. + +## Unix (Linux and macOS) -- LD_PRELOAD Interception + +The Unix hook is a shared library loaded via the dynamic linker's preload mechanism: + +- **Linux**: `libcatter-hook-unix.so` loaded via `LD_PRELOAD` +- **macOS**: `libcatter-hook-unix.dylib` loaded via `DYLD_INSERT_LIBRARIES` + +### Intercepted Functions + +The hook replaces all standard POSIX functions for creating new processes: + +**exec family**: +- `execve()`, `execv()` +- `execvpe()`, `execvp()`, `execvP()` +- `execl()`, `execlp()`, `execle()` + +**posix_spawn family**: +- `posix_spawn()` +- `posix_spawnp()` + +On Linux, the hook uses `dlsym(RTLD_NEXT, "execve")` to obtain the original function pointer from the next library in the load order. It then provides a replacement function with the same signature. When the replacement is called, the hook can inspect and modify the arguments before optionally calling the original. + +On macOS, the hook uses the `DYLD_INTERPOSE` macro to replace functions at the dyld level, which is the preferred technique on that platform. + +### Key Classes + +The hook is implemented as a set of cooperating classes: + +- **`Session`** -- Reads and stores session information from the environment. Provides the proxy path and parent session ID. +- **`Resolver`** -- Resolves the target executable path. 
Handles PATH lookups, relative path resolution, and edge cases like missing executables. +- **`CmdBuilder`** -- Constructs the rewritten command that invokes `catter-proxy` instead of the original executable. +- **`EnvGuard`** -- RAII guard that scrubs the environment before the real `execve` is called. Removes catter variables and strips the hook library from `LD_PRELOAD`. +- **`Executor`** -- Orchestrates the interception. Validates the session, resolves the executable, builds the proxy command, cleans the environment, and calls the original function. +- **`Linker`** -- Abstraction for the real system call. Wraps `dlsym(RTLD_NEXT, ...)` to call the original `execve` or `posix_spawn`. + +### Hook Initialization + +The hook library is loaded via the dynamic linker's constructor mechanism (`__attribute__((constructor))` or equivalent). During initialization, the library: + +1. Sets up logging (to `log/catter-hook.log` in the catter data directory) +2. Is ready to intercept -- the `Executor` lazily reads session state from the environment on first interception + +### Environment Variables + +The hook reads session information from these environment variables, set by the proxy when it launches the build command: + +| Variable | Purpose | +|----------|---------| +| `__key_catter_proxy_path_v1` | Absolute path to the `catter-proxy` binary | +| `__key_catter_command_id_v1` | Session ID of the parent process | + +### Interception Flow + +When any process creation function is called (e.g., `execve("/usr/bin/g++", argv, envp)`): + +1. The hook's replacement function is entered. + +2. **`Executor`** validates the session. If the session is invalid (missing environment variables), it builds an error command that reports the problem to the daemon. + +3. **`Resolver`** resolves the target executable to an absolute path. For functions like `execvp()` and `execvpe()`, it searches directories in `PATH`. For `execve()`, it resolves relative to the current directory. + +4. 
**`CmdBuilder`** constructs the proxy command: + ``` + -p --exec -- + ``` + The original `argv[0]` and all subsequent arguments are preserved after the `--` separator. + +5. **`EnvGuard`** (RAII) modifies the environment array: + - Removes `__key_catter_proxy_path_v1` and `__key_catter_command_id_v1` + - Strips the hook library name from `LD_PRELOAD` (or `DYLD_INSERT_LIBRARIES`) + - If `LD_PRELOAD` becomes empty after stripping, removes it entirely + +6. **`Linker`** calls the **real** `execve()` (obtained via `dlsym(RTLD_NEXT, ...)`) with the rewritten command. + +7. If `execve` succeeds, it does not return (the current process image is replaced). If it fails, the hook restores `errno` and returns the error to the caller. + +### Why Clean the Environment? + +This step is critical. If the hook library remained in `LD_PRELOAD` when `catter-proxy` is launched: + +1. `catter-proxy` itself would be hooked +2. When the proxy tries to execute the actual compiler, the hook would intercept that call +3. The hook would rewrite it to launch another `catter-proxy` +4. This creates **infinite recursion** + +By stripping the hook from `LD_PRELOAD`, the proxy executes without interception. The proxy re-adds `LD_PRELOAD` only when it launches a command with the `INJECT` action, ensuring the hook is attached to the right processes. + +### Recursive Hooking + +Since `LD_PRELOAD` is inherited by child processes through the environment, interception is automatically recursive. Consider this build scenario: + +``` +make + -> sh -c "gcc main.c -o main" + -> gcc main.c -o main + -> cc1 main.c -o main.s + -> as main.s -o main.o + -> ld main.o -o main +``` + +Every process in this tree inherits `LD_PRELOAD` and gets hooked. Each invocation goes through `catter-proxy`, which asks the daemon what to do. The build system is completely unaware of the interception. 
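The environment scrubbing described above can be sketched as a pure string operation. This is a hedged illustration rather than the `EnvGuard` code: `strip_preload_entry` is a hypothetical helper, it matches entries by exact string comparison, and it assumes entries are separated by colons or spaces.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Remove every entry equal to hook_path from an LD_PRELOAD-style list.
// Returns std::nullopt when nothing is left, meaning the variable
// should be removed from the environment entirely.
inline std::optional<std::string> strip_preload_entry(std::string_view value,
                                                      std::string_view hook_path) {
    assert(!hook_path.empty());
    std::string kept;
    std::size_t pos = 0;
    while (pos <= value.size()) {
        // Entries are assumed to be separated by ':' or ' '.
        std::size_t end = value.find_first_of(": ", pos);
        if (end == std::string_view::npos) end = value.size();
        std::string_view entry = value.substr(pos, end - pos);
        if (!entry.empty() && entry != hook_path) {
            if (!kept.empty()) kept += ':';
            kept.append(entry);
        }
        pos = end + 1;
    }
    if (kept.empty()) return std::nullopt;
    return kept;
}
```

Returning an empty optional instead of an empty string keeps the "remove the variable entirely" case explicit for the caller.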
+ +--- + +## Windows -- DLL Injection and API Hooking + +The Windows hook uses a fundamentally different approach because Windows has no equivalent to `LD_PRELOAD`. Instead, catter combines: + +1. **DLL injection** -- to load the hook library into target processes +2. **API hooking via MinHook** -- to intercept `CreateProcess` calls within those processes + +### Hook DLL + +The hook is compiled as `catter-hook-win64.dll`. It uses MinHook, a lightweight x86/x64 API hooking library that works by overwriting the first few bytes of a target function with a jump to a detour function (trampoline hooking). + +### Environment Variables + +| Variable | Purpose | +|----------|---------| +| `CATTER_IPC_ID` | Session ID of the parent process | +| `CATTER_PROXY_PATH` | Absolute path to the `catter-proxy.exe` binary | + +### DLL Injection Process + +When `catter-proxy` (in injector mode) needs to start a build command with the hook attached: + +1. **Create the target process suspended**. The proxy calls `CreateProcessA()` with the `CREATE_SUSPENDED` flag. The process is created but its main thread does not run. + +2. **Set environment variables**. `CATTER_IPC_ID` and `CATTER_PROXY_PATH` are set in the target process environment before creation (passed via the environment block). + +3. **Allocate memory in the target process**. The proxy calls `VirtualAllocEx()` to allocate a region of memory in the target process's address space. + +4. **Write the DLL path**. The proxy calls `WriteProcessMemory()` to write the full path of `catter-hook-win64.dll` into the allocated memory. + +5. **Create a remote thread to load the DLL**. The proxy tries three methods in order: + - `CreateRemoteThread()` -- Documented Win32 API. Tried first as it is the most widely supported. + - `NtCreateThreadEx()` -- Undocumented NT API. Fallback that works reliably on modern Windows (Vista+). + - `RtlCreateUserThread()` -- Another undocumented NT API. Last resort fallback. 
+ + The remote thread's entry point is `LoadLibraryA`, and its argument is the pointer to the DLL path string written in step 4. When the thread runs, it loads `catter-hook-win64.dll` into the target process. + +6. **Wait for injection to complete**. The proxy waits for the remote thread to finish (with a 3-second timeout). + +7. **Resume the main thread**. The proxy calls `ResumeThread()` to let the target process start executing. By this point, the hook DLL is loaded and its hooks are active. + +### Hooked Windows APIs + +The hook DLL installs MinHook trampoline hooks on these functions during `DLL_PROCESS_ATTACH`: + +- **`CreateProcessA()`** -- ANSI version of process creation +- **`CreateProcessW()`** -- Wide-character version of process creation +- **`CreateProcessAsUserA()`** -- Passthrough (currently not rewritten) +- **`CreateProcessAsUserW()`** -- Passthrough (currently not rewritten) + +MinHook works by: +1. Saving the first few instructions of the target function +2. Overwriting them with a jump to the detour function +3. Providing a trampoline that contains the saved instructions followed by a jump back, so the original function can still be called + +### Hook Detour Logic + +When `CreateProcessA` or `CreateProcessW` is called by the hooked process: + +1. **Extract the command line**. The hook reads `lpApplicationName` and `lpCommandLine`. + +2. **Resolve the absolute path**. The hook resolves the executable using `resolve_abspath()`, which searches the program directory, current directory, System32, Windows directory, and `PATH` (matching the standard Windows search order). + +3. **Rewrite the command line**: + ``` + {proxy_path} -p {ipc_id} --exec {resolved_path} -- {original_cmdline} + ``` + The `lpApplicationName` is set to `nullptr` so that the command line is parsed by `CreateProcess` normally. + +4. **Call the original `CreateProcess`** via the MinHook trampoline with the modified command line. 
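The rewrite in step 3 amounts to string assembly following the template shown there. The sketch below is an assumption-laden illustration: `rewrite_command_line` is a hypothetical function, the session ID is passed as a string, and no quoting is applied, so paths containing spaces would need extra handling in real code.

```cpp
#include <cassert>
#include <string>

// Build "{proxy_path} -p {ipc_id} --exec {resolved_path} -- {original_cmdline}".
// No quoting or escaping is performed in this sketch.
inline std::string rewrite_command_line(const std::string& proxy_path,
                                        const std::string& ipc_id,
                                        const std::string& resolved_path,
                                        const std::string& original_cmdline) {
    assert(!proxy_path.empty() && !resolved_path.empty());
    return proxy_path + " -p " + ipc_id + " --exec " + resolved_path +
           " -- " + original_cmdline;
}
```

The original command line is kept intact after the `--` separator, so the proxy can still see exactly what the build system asked for.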
+ +### Key Difference from Unix + +On Unix, `LD_PRELOAD` is an environment variable inherited automatically by all child processes. The hook is "viral" by default -- every subprocess gets it. + +On Windows, there is no such mechanism. The hook DLL must be explicitly injected into each new process. This happens because: + +1. The hook's `CreateProcess` detour intercepts every process creation +2. It rewrites the command to go through `catter-proxy` +3. When the proxy decides to execute with `INJECT` action, it performs DLL injection into the new process (create suspended, inject, resume) + +This means the proxy is responsible for re-injecting the hook into each generation of child processes, maintaining the recursive interception chain. + +### DLL Lifecycle + +``` +DLL_PROCESS_ATTACH: + MH_Initialize() -- Initialize MinHook + MH_CreateHook(...) -- Install hooks on CreateProcess variants + MH_EnableHook(...) -- Activate all hooks + +DLL_PROCESS_DETACH: + MH_DisableHook(...) -- Deactivate all hooks + MH_Uninitialize() -- Clean up MinHook +``` + +`DisableThreadLibraryCalls()` is called during attach to suppress `DLL_THREAD_ATTACH` and `DLL_THREAD_DETACH` notifications, reducing overhead. + +## Platform Comparison + +| Aspect | Linux | macOS | Windows | +|--------|-------|-------|---------| +| Hook library | `libcatter-hook-unix.so` | `libcatter-hook-unix.dylib` | `catter-hook-win64.dll` | +| Injection method | `LD_PRELOAD` env var | `DYLD_INSERT_LIBRARIES` env var | `VirtualAllocEx` + `LoadLibraryA` | +| Interposition | `dlsym(RTLD_NEXT, ...)` | `DYLD_INTERPOSE` macro | MinHook trampoline hooking | +| Intercepted APIs | `execve`, `execvp`, `posix_spawn`, etc. 
| Same as Linux | `CreateProcessA`, `CreateProcessW` | +| Recursive hooking | Automatic (env inherited) | Automatic (env inherited) | Explicit (DLL re-injected per process) | +| Env vars for session | `__key_catter_proxy_path_v1`, `__key_catter_command_id_v1` | Same as Linux | `CATTER_PROXY_PATH`, `CATTER_IPC_ID` | diff --git a/en/catter/design/ipc-protocol.md b/en/catter/design/ipc-protocol.md new file mode 100644 index 0000000..3df092f --- /dev/null +++ b/en/catter/design/ipc-protocol.md @@ -0,0 +1,221 @@ +# IPC Protocol + +Catter uses inter-process communication between the daemon (`catter`) and proxy instances (`catter-proxy`). The daemon acts as a server, and each proxy instance connects as a client. + +## Transport + +The transport layer is platform-specific: + +| Platform | Mechanism | Path / Name | +|----------|-----------|-------------| +| Linux / macOS | Unix domain socket | `$XDG_DATA_HOME/pipe-catter-ipc.sock` (typically `~/.local/share/pipe-catter-ipc.sock`) | +| Windows | Named pipe | `\\.\pipe\catter-ipc` | + +The daemon creates the listening socket/pipe at startup. Each `catter-proxy` instance connects to it as a client when it starts. The connection persists for the lifetime of the proxy process. + +## Serialization + +All messages are encoded using [Bincode](https://github.com/bincode-org/bincode), a compact binary serialization format. The implementation uses `kota::ipc::BincodePeer`, which provides a request-response abstraction over a stream transport (`kota::ipc::StreamTransport`). + +Bincode was chosen for its efficiency -- messages are small and fast to encode/decode, which matters because every intercepted build command results in multiple IPC round-trips. + +## Request-Response Model + +Communication follows a strict request-response pattern. The client (proxy) sends a request and waits for the server (daemon) to respond before proceeding. Each request type has a well-defined parameter type and result type. 
+ +The protocol is defined in `src/common/util/data.h` using C++ template specialization: + +```cpp +enum class RequestType : uint8_t { + CHECK_MODE, + CREATE, + MAKE_DECISION, + REPORT_ERROR, + FINISH, +}; +``` + +Each request type maps to a `Request` specialization that declares `Params` (the request payload) and `Result` (the response payload). + +## Request Types + +### CHECK_MODE + +Asks the daemon what service mode is active. + +| Field | Type | Description | +|-------|------|-------------| +| **Params** | `ServiceMode` (enum) | The mode to check (currently only `INJECT`) | +| **Result** | `bool` | `true` if the daemon is in the requested mode | + +This is the first request a proxy sends after connecting. It confirms the daemon is ready to handle intercepted commands. + +### CREATE + +Registers a new process session with the daemon. + +| Field | Type | Description | +|-------|------|-------------| +| **Params** | `ipcid_t` (`int32_t`) | The parent session ID | +| **Result** | `ipcid_t` (`int32_t`) | A unique session ID assigned to this proxy instance | + +The parent ID establishes the parent-child relationship in the session tree. For the first proxy (injector mode), the parent ID is provided by the daemon when it spawns the proxy. For wrapper-mode proxies, the parent ID comes from the `__key_catter_command_id_v1` environment variable (or `CATTER_IPC_ID` on Windows), which was set by the hook. + +### MAKE_DECISION + +The core request. The proxy sends a captured command and the daemon decides what to do with it. 
+ +**Params** -- `command`: + +| Field | Type | Description | +|-------|------|-------------| +| `cwd` | `string` | Working directory of the intercepted process | +| `executable` | `string` | Resolved absolute path to the executable | +| `args` | `string[]` | Full argument array (including `argv[0]`) | +| `env` | `string[]` | Environment variables (in `KEY=VALUE` format) | + +**Result** -- `action`: + +| Field | Type | Description | +|-------|------|-------------| +| `type` | `uint8_t` enum | One of `DROP`, `INJECT`, or `WRAP` | +| `cmd` | `command` | The command to execute (may be modified by the script) | + +**Action types**: + +- **`DROP` (0)** -- Do not execute the command. The proxy returns exit code 0 immediately. Used when the script determines a command is irrelevant (e.g., a compiler invocation the user wants to skip). +- **`INJECT` (1)** -- Execute the command with the hook library attached. The proxy re-adds `LD_PRELOAD` (or performs DLL injection on Windows) so that child processes of this command are also intercepted. This is the default for build commands whose children should be monitored. +- **`WRAP` (2)** -- Execute the command directly without hooking. The proxy runs the command and captures its stdout/stderr, but does not inject the hook. Used for leaf commands (like actual compiler invocations) that do not spawn further build processes. + +The daemon may modify the command in the returned action. For example, a script could change compiler flags, redirect output paths, or substitute a different executable. + +### REPORT_ERROR + +Reports an error condition from the hook or proxy back to the daemon. 
+ +**Params**: + +| Field | Type | Description | +|-------|------|-------------| +| `parent_id` | `ipcid_t` | Session ID of the parent process | +| `error_msg` | `string` | Human-readable error description | + +**Result**: `null` (no response payload) + +This is used when the hook encounters an invalid state (e.g., missing environment variables) or when the proxy catches an exception during command processing. The daemon logs the error and can notify the user. + +### FINISH + +Reports that a command has completed execution. + +**Params** -- `process_result`: + +| Field | Type | Description | +|-------|------|-------------| +| `code` | `int64_t` | Process exit code | +| `std_out` | `string` | Captured standard output | +| `std_err` | `string` | Captured standard error | + +**Result**: `null` (no response payload) + +After the daemon receives this request, it invokes the `onExecution()` JavaScript callback with the result data. The proxy then disconnects. + +## Typical Message Sequence + +A complete proxy lifecycle involves this sequence of IPC messages: + +### Injector Mode (first proxy) + +``` +Proxy -> Daemon: CHECK_MODE(INJECT) +Daemon -> Proxy: true +[Proxy launches build command with hook attached] +[Proxy waits for build to complete] +[Proxy exits when build finishes] +``` + +The injector proxy also sends `CREATE`, `MAKE_DECISION`, and `FINISH` for the top-level build command, so the build system command itself passes through `onCommand`/`onExecution` like any other intercepted command. 
+ +### Wrapper Mode (intercepted command) + +``` +Proxy -> Daemon: CHECK_MODE(INJECT) +Daemon -> Proxy: true +Proxy -> Daemon: CREATE(parent_id) +Daemon -> Proxy: new_session_id +Proxy -> Daemon: MAKE_DECISION(command) +Daemon -> Proxy: action {type, cmd} +[Proxy executes or drops the command] +Proxy -> Daemon: FINISH(process_result) +Daemon -> Proxy: null +[Proxy disconnects] +``` + +### Error Case + +``` +Proxy -> Daemon: REPORT_ERROR(parent_id, error_message) +Daemon -> Proxy: null +[Proxy exits with code -1] +``` + +## Session Model + +The daemon maintains a session tree that mirrors the process tree of the build: + +``` +Session 0 (root -- injector proxy) + +-- Session 1 (make -> gcc file1.c) + +-- Session 2 (make -> gcc file2.c) + +-- Session 3 (make -> ar rcs libfoo.a file1.o file2.o) + +-- Session 4 (make -> sh -c "gcc file3.c") + +-- Session 5 (sh -> gcc file3.c) +``` + +Each session is identified by a unique `ipcid_t` (32-bit integer). The tree is built incrementally as `CREATE` requests arrive, each specifying a `parent_id`. This structure enables: + +- **Target tree reconstruction** -- by knowing which object files are linked into which targets +- **Build profiling** -- by tracking timing data per session and visualizing the parallelism +- **Error attribution** -- by tracing errors back to the build command that caused them + +## Connection Lifecycle + +1. **Daemon starts** -- Creates and binds the Unix domain socket or named pipe. +2. **Proxy connects** -- Opens a connection to the socket/pipe. The connection is established using `kota::pipe::connect()` on the proxy side. +3. **Daemon accepts** -- For each incoming connection, the daemon creates a `kota::ipc::BincodePeer` and registers request handlers (`on_request>(...)`) for each request type. +4. **Request handling** -- The daemon's event loop dispatches incoming requests to the appropriate handler. Handlers may be async (using `co_await`) to interact with the JS runtime or other daemon state. +5. 
**Peer disconnects** -- When the proxy process exits, the transport closes and the `BincodePeer::run()` coroutine completes. The daemon logs the disconnection and cleans up the session. + +## Data Types + +The core data structures used across the IPC boundary: + +```cpp +// Unique identifier for an IPC session +using ipcid_t = int32_t; + +// Captured command from the build system +struct command { + std::string cwd; // Working directory + std::string executable; // Resolved executable path + std::vector<std::string> args; // Argument array + std::vector<std::string> env; // Environment (KEY=VALUE entries) +}; + +// Result of executing a command +struct process_result { + int64_t code = -1; // Exit code + std::string std_out; // Captured stdout + std::string std_err; // Captured stderr +}; + +// Decision returned by the daemon +struct action { + enum : uint8_t { + DROP, // Do not execute + INJECT, // Execute with hook attached + WRAP, // Execute without hook + } type; + command cmd; // Possibly modified command +}; +``` diff --git a/en/catter/dev/build.md b/en/catter/dev/build.md new file mode 100644 index 0000000..d4b6a9d --- /dev/null +++ b/en/catter/dev/build.md @@ -0,0 +1,66 @@ +# Building from Source + +## Prerequisites + +- [pixi](https://pixi.sh) -- Environment and dependency management +- [XMake](https://xmake.io/) -- Build system (not installed by pixi; install separately) +- Git + +## Supported Platforms + +| Platform | Toolchain | +|---|---| +| Windows (win-64) | MSVC (VS2022) | +| Linux (linux-64) | GCC 14.2 | +| macOS (osx-arm64) | Clang 20.1 | + +## Build Commands + +All builds are managed through pixi tasks: + +```bash +# Configure the build (mode: debug, release, or releasedbg) +pixi run cfg debug + +# Build the project +pixi run build + +# Configure in release mode +pixi run cfg release + +# Build JavaScript/TypeScript API +pixi run build-js + +# Install npm dependencies +pixi run npm-install +``` + +## Build System + +The project uses [XMake](https://xmake.io/) as the build
system, managed through pixi tasks. The C++ standard is C++23. + +### Build Targets + +- `catter` -- Main CLI executable (the DECISION daemon) +- `catter-proxy` -- Proxy process manager +- `catter-hook-unix` -- Unix hook shared library (Linux/macOS) +- `catter-hook-win64` -- Windows hook DLL +- `catter-core` -- Core library +- `common` -- Shared utilities library + +## JavaScript Build + +The TypeScript API under `api/` is compiled to JavaScript and embedded into the catter binary. Build with: + +```bash +pixi run build-js +``` + +This runs Rollup to bundle the TypeScript, and the resulting JS is compiled into the binary as a resource. + +## Key Dependencies + +- [QuickJS-ng](https://github.com/quickjs-ng/quickjs) (v0.11.0) -- Embedded JavaScript engine +- [spdlog](https://github.com/gabime/spdlog) (1.15.3) -- Logging +- [kotatsu](https://github.com/clice-io/kotatsu) -- Async runtime, testing framework, option parsing +- [MinHook](https://github.com/TsudaKageyu/minhook) (v1.3.4) -- Windows API hooking (Windows only) diff --git a/en/catter/dev/contribution.md b/en/catter/dev/contribution.md new file mode 100644 index 0000000..85bb082 --- /dev/null +++ b/en/catter/dev/contribution.md @@ -0,0 +1,58 @@ +# Contributing + +## Testing + +Three ways to run tests: + +```bash +# Run everything (build + unit tests + integration tests) +pixi run -e dev test + +# Unit tests only +pixi run -e dev unit-test + +# Integration tests only +pixi run -e dev integration-test +``` + +### Unit Tests + +Unit tests use the Kotatsu testing framework (`kota::zest`).
They live in `tests/unit/`, and the structure mirrors `src/`: + +- `tests/unit/common/` -- Tests for shared utilities and option parsing +- `tests/unit/catter/` -- Tests for core logic (compiler identification, JS runtime) +- `tests/unit/catter-hook/unix/` -- Tests for Unix hook payload + +To enable test targets in the build: + +```bash +pixi run cfg # or: xmake config --test=y +``` + +### Integration Tests + +Integration tests use the [LLVM Lit](https://llvm.org/docs/CommandGuide/lit.html) framework. Located in `tests/integration/`, they compile and run C++ test programs, using FileCheck for output verification. + +```bash +# Run integration tests with verbose output +lit ./tests/integration -sav +``` + +## Commit Message Format + +Use [conventional commits](https://www.conventionalcommits.org/): + +``` +<type>(<scope>): <subject> +``` + +**Types**: `feat`, `fix`, `refactor`, `chore`, `docs`, `ci`, `test`, `perf`, `style`, `revert` + +Scopes should match source directories or feature names. Keep the subject line under 70 characters. + +## Code Style + +- C++23 standard +- Use `.clang-format` for C++ formatting +- Use ESLint + Prettier for TypeScript +- Pre-commit hooks are configured (`.pre-commit-config.yaml`) diff --git a/en/catter/features/build-profiling.md b/en/catter/features/build-profiling.md new file mode 100644 index 0000000..61517b8 --- /dev/null +++ b/en/catter/features/build-profiling.md @@ -0,0 +1,21 @@ +# Build Profiling + +> [!WARNING] +> This feature is planned but not yet implemented. + +## Concept + +Build profiling captures per-process timing data during a build: + +- Process start time and duration +- Parent-child relationships +- CPU utilization + +This data is rendered as a visual timeline (Gantt-chart style) in a browser, allowing developers to diagnose build performance issues. + +## Planned Capabilities + +- **Identify serialization bottlenecks** -- Find stages where the build runs single-threaded despite available parallelism. +- **Measure actual vs.
theoretical parallelism** -- Compare the observed concurrency level against the number of available cores. +- **Find slow compilation units** -- Pinpoint individual source files that take disproportionately long to compile. +- **Debug build system configuration** -- Detect misconfigured dependencies, unnecessary serialization, or suboptimal job scheduling. diff --git a/en/catter/features/command-tree.md b/en/catter/features/command-tree.md new file mode 100644 index 0000000..16a4950 --- /dev/null +++ b/en/catter/features/command-tree.md @@ -0,0 +1,43 @@ +# Command Tree + +The command tree visualizes the build command DAG as an ASCII tree, showing parent-child process relationships and their commands. + +Built-in script: `script::cmd-tree` + +## Usage + +```bash +catter script::cmd-tree [options] -- +``` + +## Options + +| Option | Description | +|--------|-------------| +| `-d, --depth ` | Limit render depth. | +| `-a, --args ` | Number of arguments to show per command. Default: all. | +| `-w, --argWidth ` | Truncate long arguments to this width. Default: 10 characters. | + +## Output + +The output is an ANSI-colored ASCII tree using box-drawing characters (`│`, `├──`, `└──`). Colors cycle through 4 values by depth level, making it easy to distinguish nesting at a glance. + +A typical build tree might look like: + +``` +make -j8 +├── gcc -c main.c -o main.o +│ └── as -o main.o /tmp/ccXXXX.s +├── gcc -c util.c -o util.o +│ └── as -o util.o /tmp/ccYYYY.s +└── gcc main.o util.o -o app + └── ld -o app main.o util.o -lc +``` + +Each node shows the process command line, and its children are the processes it spawned. The tree is constructed from actual process interception, so it reflects exactly what happened during the build. + +## Use Cases + +- **Understanding build orchestration** -- See how a build system spawns compilers, assemblers, and linkers. +- **Debugging unexpected process spawning** -- Identify processes you did not expect the build to invoke. 
+- **Verifying capture completeness** -- Confirm that all expected compilations are being intercepted by catter. diff --git a/en/catter/features/compilation-database.md b/en/catter/features/compilation-database.md new file mode 100644 index 0000000..b41f919 --- /dev/null +++ b/en/catter/features/compilation-database.md @@ -0,0 +1,67 @@ +# Compilation Database + +The compilation database (CDB) is catter's primary feature. It intercepts compiler invocations during a build and generates a standard `compile_commands.json` file, compatible with clang tooling, language servers, and IDEs. + +Built-in script: `script::cdb` + +## Usage + +```bash +catter script::cdb [options] -- +``` + +## Script Options + +| Option | Description | +|--------|-------------| +| `-o, --output ` | Output path for `compile_commands.json`. Defaults to `build/compile_commands.json`. | +| `--abort-on-command-failure` | Abort the entire build if any intercepted command fails. | +| `--save-on-failure` | Save partial CDB even if the build fails. | + +## Behavior + +By default, catter **merges** with an existing `compile_commands.json` if one is found at the output path. New entries for the same source file replace old ones, so you can incrementally rebuild without losing entries from previous runs. + +Internally, catter: + +1. Intercepts each compiler invocation during the build. +2. Analyzes the command using `CompilerAnalysis` to identify source files, output files, and compiler flags. +3. Builds a `FlatTree` (DAG) of command relationships to track input-to-output edges. +4. On completion, traverses the tree to leaf source files and generates one CDB entry per source file. 
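
The merge rule described above (a new entry for the same source file replaces the old one, and entries for files that were not rebuilt are kept) can be sketched roughly as follows. This is an illustrative sketch only, not catter's implementation: `CdbEntry` and `merge_cdb` are hypothetical names, and the sketch assumes the source file path is the only merge key.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of one compile_commands.json entry.
// The field names follow the JSON keys; the type itself is an assumption.
struct CdbEntry {
    std::string directory;
    std::string file;
    std::vector<std::string> arguments;
    std::string output;
};

// Merge `fresh` entries into `existing`, keyed by source file path.
// A fresh entry replaces an existing entry for the same file; entries
// for files that were not rebuilt in this run are kept unchanged.
// The result is ordered by file path because std::map is ordered.
std::vector<CdbEntry> merge_cdb(const std::vector<CdbEntry>& existing,
                                const std::vector<CdbEntry>& fresh) {
    std::map<std::string, CdbEntry> by_file;
    for (const auto& entry : existing) by_file[entry.file] = entry;
    for (const auto& entry : fresh) by_file[entry.file] = entry;  // newer wins

    std::vector<CdbEntry> merged;
    merged.reserve(by_file.size());
    for (auto& kv : by_file) merged.push_back(std::move(kv.second));
    return merged;
}
```

Keying on the file path alone means a source file compiled twice with different flags would collapse into one entry; whether catter distinguishes such cases is not stated here.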
+ +## CDB Entry Format + +The output follows the standard clang JSON compilation database format: + +```json +[ + { + "directory": "/path/to/build", + "file": "/path/to/source.cpp", + "arguments": ["clang++", "-std=c++20", "-c", "source.cpp", "-o", "source.o"], + "output": "source.o" + } +] +``` + +## Supported Compilers + +- **C/C++**: GCC, Clang + +Compiler wrappers such as `ccache`, `distcc`, and `sccache` are recognized and handled transparently. Catter can also identify other compilers (Flang, ifort, NVCC, etc.) but `CompilerAnalysis` currently only generates CDB entries for GCC and Clang commands. + +## Examples + +```bash +# Basic usage with make +catter script::cdb -- make -j8 + +# Specify output path +catter script::cdb -o build/compile_commands.json -- ninja + +# With CMake +catter script::cdb -- cmake --build build + +# With any build system +catter script::cdb -- ./build.sh +``` diff --git a/en/catter/features/fake-compilation.md b/en/catter/features/fake-compilation.md new file mode 100644 index 0000000..17c031e --- /dev/null +++ b/en/catter/features/fake-compilation.md @@ -0,0 +1,24 @@ +# Fake Compilation + +> [!WARNING] +> This feature is planned but not yet implemented. + +## Concept + +Instead of forwarding compilation to the real compiler, catter generates fake placeholder `.o` files. This lets the build system complete its full run -- including link steps -- without actual compilation. + +The result: + +- **A complete CDB in a fraction of the time.** No real compilation work is performed, so the build finishes as fast as the build system can schedule it. +- **Full linker command capture.** Because placeholder object files exist on disk, linker invocations proceed normally and can be intercepted. + +## Smart Dependency Analysis + +Not all commands can be faked. Code generators -- such as LLVM TableGen -- must still be built genuinely, because they produce headers that other compilations depend on. 
+ +Catter will analyze dependencies to distinguish between: + +- **Regular compilation** -- Can be faked. The output `.o` file is only consumed by the linker. +- **Code generators** -- Must be built. Their output (generated headers, source files) is needed as input by other compilation steps. + +This achieves a "minimal build": only build what is strictly necessary for correct CDB generation and header production, and fake everything else. diff --git a/en/catter/features/target-tree.md b/en/catter/features/target-tree.md new file mode 100644 index 0000000..eea3b38 --- /dev/null +++ b/en/catter/features/target-tree.md @@ -0,0 +1,39 @@ +# Target Tree + +The target tree renders build artifacts as a dependency forest, showing what was built from what. + +Built-in script: `script::target-tree` + +## Usage + +```bash +catter script::target-tree [options] -- +``` + +## Options + +| Option | Description | +|--------|-------------| +| `-d, --depth ` | Limit render depth. | + +## Output + +The target tree inverts the perspective of the command tree. Instead of showing process parent-child relationships, it shows output artifacts (executables, libraries, object files) as parents, with their input files as children. + +For example, where the command tree shows `gcc main.o util.o -o app`, the target tree shows: + +``` +app +├── main.o +│ └── main.c +├── util.o +│ └── util.c +``` + +This makes it clear which source files contribute to which final artifacts. + +## Use Cases + +- **Understanding target dependencies** -- See the full dependency chain from final binaries down to source files. +- **Analyzing build graphs for C++20 modules** -- Trace module interface units and their consumers. +- **Identifying source contributions** -- Determine which sources contribute to which targets, useful for splitting or restructuring builds. 
diff --git a/en/catter/guide/configuration.md b/en/catter/guide/configuration.md new file mode 100644 index 0000000..3d7aa81 --- /dev/null +++ b/en/catter/guide/configuration.md @@ -0,0 +1,80 @@ +# Configuration + +## catter CLI + +``` +catter [options]