From 46c7e3cfd16b5677278f13d5ff55fbe7e0cad661 Mon Sep 17 00:00:00 2001
From: Akshith Gunasekaran <akshithg@users.noreply.github.com>
Date: Thu, 11 Jun 2026 17:05:43 -0700
Subject: [PATCH] Document SQL generation review guidance

---
 AGENTS.md                     |  13 ++++
 README.md                     |   7 +++
 docs/sql-generation-review.md | 110 ++++++++++++++++++++++++++++++++++
 3 files changed, 130 insertions(+)
 create mode 100644 AGENTS.md
 create mode 100644 docs/sql-generation-review.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 00000000..9abe70ff
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,13 @@
+# Agent Guidance
+
+This file gives automated coding agents lightweight project context. Maintainer
+review and the repository documentation remain authoritative.
+
+- Start with `README.md` for setup, supported dialects, and test commands.
+- For generated SQL changes, use `docs/sql-generation-review.md`.
+- Prefer small changes that follow the existing `DbMap`, `TableMap`, `ColumnMap`,
+  `IndexMap`, and dialect patterns.
+- Add focused tests for changed behavior; include affected dialects when SQL
+  syntax or quoting differs by dialect.
+- Keep tests at the public API level where practical.
+- Run `gofmt` and relevant `go test` commands before submitting changes.
diff --git a/README.md b/README.md
index fc375c14..7023d78a 100644
--- a/README.md
+++ b/README.md
@@ -689,6 +689,13 @@ Valid `GORP_TEST_DIALECT` values are: "mysql"(for mymysql),
 `test_all.sh` script for examples of all 3 databases.  This is the
 script I run locally to test the library.
 
+## SQL Generation Review
+
+Changes that affect generated SQL, identifier quoting, mapper metadata, dialect
+behavior, or cached query plans should also consider
+[`docs/sql-generation-review.md`](docs/sql-generation-review.md). It records the main SQL
+synthesis invariants reviewers should keep in mind.
+
 
 ## Contributors
 
diff --git a/docs/sql-generation-review.md b/docs/sql-generation-review.md
new file mode 100644
index 00000000..101c3816
--- /dev/null
+++ b/docs/sql-generation-review.md
@@ -0,0 +1,110 @@
+# SQL Generation Review Guide
+
+borp turns mapper metadata and Go values into SQL. This guide is grounded in the
+code paths that already synthesize SQL. When reviewing those paths, first
+classify each input as one of:
+
+- SQL syntax emitted by borp.
+- Values passed to `database/sql` as bind parameters.
+- Caller-provided SQL fragments that are intentionally embedded in generated
+  statements.
+
+## Representative Code Anchors
+
+Use these anchors as starting points for review, not as an exhaustive inventory.
+Update this section when SQL synthesis moves or private helpers are renamed.
+
+- Raw SQL entry points: `DbMap.Select`, `DbMap.SelectOne`, `DbMap.ExecContext`,
+  and the matching `Transaction` methods accept caller-authored SQL.
+- Generated DML and reads: the `TableMap` binding path, currently including
+  `bindInsert`, `bindUpdate`, `bindDelete`, and `bindGet`, builds statements
+  from mapper metadata and model values.
+- Generated DDL and table maintenance: table creation, index creation and
+  deletion, table drops, and truncate helpers build statements from table,
+  column, index, and dialect metadata.
+- Dialect boundary: `Dialect.QuoteField`, `QuotedTableForQuery`, `BindVar`,
+  `QuerySuffix`, `AutoIncrInsertSuffix`, and DDL suffix methods define how
+  logical names, bind placeholders, and dialect-specific syntax are rendered.
+- Mapper metadata: `TableMap`, `ColumnMap`, `IndexMap`, and struct tags passed
+  through `AddTable` or `AddTableWithName` feed generated SQL.
+- Cached plans: `TableMap` caches insert, update, delete, and get plans and
+  exposes `ResetSql` for mapper changes that require regeneration.
+- Transaction helper names: savepoint helpers build SQL from caller-provided
+  names and currently pass those names through dialect identifier quoting.
+
+Some of these contracts are explicit in public APIs and comments, while others
+are implicit in the way metadata is currently appended or quoted. Raw SQL entry
+points, the `Dialect` interface, and `ResetSql` are explicit anchors; metadata
+passed through `QuoteField` or appended directly is an implementation-derived
+anchor. If a change depends on an implicit contract, prefer clarifying it with
+documentation or tests instead of relying on this guide alone.
+
+## Input Kinds
+
+Model field values should normally be bind parameters. In generated DML, values
+should flow through dialect bind placeholders and the argument list rather than
+being copied into SQL text.
+
+Identifier metadata should be rendered as identifiers through the active
+dialect. This includes table, schema, column, index, and savepoint names.
+Identifier quoting must preserve identifier context even when names contain
+dialect delimiter characters.
+
+Raw SQL fragments are caller-owned SQL. Query strings passed to raw SQL entry
+points and metadata that is appended directly, such as defaults, index types, and
+dialect suffixes, should be treated as SQL fragments unless the API is changed to
+make them structured values.
+
+## Invariants
+
+- Generated identifiers must remain identifiers. Embedded quote characters,
+  separators, comments, or suffixes must not escape the identifier context.
+- Generated values should remain bind parameters unless an API explicitly
+  accepts a SQL expression.
+- Raw SQL escape hatches should be explicit in documentation and tests.
+- Generated statement shape should match the API call. For example, inserting
+  one model object should not create additional rows through mapper metadata.
+- Transaction helpers should keep generated savepoint names within their
+  intended SQL context.
+
+Cached-plan inputs must be explicit. If an API input can change the generated
+statement, it needs to be part of the cache key, the plan must be built per call,
+or the API should avoid exposing that extra statement shape.
+
+Dialect behavior should be tested per dialect. SQLite, MySQL, and PostgreSQL do
+not quote or bind every construct the same way.
+
+## Review Checklist
+
+For every SQL-generation change, ask:
+
+- Which bytes are SQL syntax and which values are bind parameters?
+- Is each metadata field an identifier, a bind value, or an intentional raw SQL
+  fragment?
+- Can a struct tag, exported metadata field, or dialect hook influence those
+  bytes?
+- Are all identifier delimiters escaped for each affected dialect?
+- Is the generated plan cached, and if so, what inputs are part of the cache key?
+- Does the test fail if the generated SQL targets a different table, column,
+  row count, statement, or predicate?
+
+## Regression Expectations
+
+Prefer focused deterministic regressions that inspect both behavior and generated
+SQL when practical. Tests should execute through real borp public APIs when
+possible.
+
+Useful regression patterns include:
+
+- A malicious quote character in every identifier position affected by the fix.
+- A cached-plan test that first builds the unsafe plan and then exercises the
+  later call shape that should differ.
+- A single-statement integrity test for generated DML, such as proving only one
+  intended row is inserted or updated.
+- Dialect-specific tests for syntax that differs across SQLite, MySQL, and
+  PostgreSQL.
+
+Fuzzing can be useful for stable, pure invariants such as identifier quoting, but
+it should supplement deterministic generated-SQL tests rather than replace them.
+
+Run `go test ./...` after changing query generation.