Skip to content

Correctness fixes: dml-refresh data loss, LABEL escaping, dead columnstore guard, init port, misc#706

Open
joshmarkovic wants to merge 9 commits into
dbt-msft:masterfrom
joshmarkovic:fix/audit-quick-wins
Open

Correctness fixes: dml-refresh data loss, LABEL escaping, dead columnstore guard, init port, misc#706
joshmarkovic wants to merge 9 commits into
dbt-msft:masterfrom
joshmarkovic:fix/audit-quick-wins

Conversation

@joshmarkovic

@joshmarkovic joshmarkovic commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Batch of small, independent correctness fixes. Every change was verified against a live SQL Server 2022 container (devops/server.Dockerfile image): the relevant functional suites (test_table_refresh_method, test_query_options, test_index, test_data_types) pass, unit tests pass 110/110, and each fix has a targeted reproduction described below.

Note: this batch originally also fixed the sys.types join in the catalog queries (joining on system_type_id, which is not unique, fans out and produces duplicate rows / wrong type names for UDT and sysname columns). That commit was dropped while merging master: #289 (persist-docs) independently fixed the same join, so the fix is already on master and is no longer part of this PR.


1. dml refresh: a failed swap could commit an empty target table

Issue: The dml refresh swap runs BEGIN TRANSACTION; DELETE FROM target; INSERT INTO target SELECT ... ; COMMIT; as one batch on an autocommit connection, without SET XACT_ABORT ON. Statement-aborting errors (e.g. a NULL or constraint violation on the INSERT) do not stop a T-SQL batch — execution continues to the trailing COMMIT, which persists the DELETE. Result: the target is left committed-empty. Silent data loss, reproduced live (other connections see 0 rows, @@TRANCOUNT = 0, nothing to roll back).

Solution: SET XACT_ABORT ON at the start of the swap batch (table_dml_refresh.sql). Any statement failure now rolls back the entire transaction. Verified for both error classes (statement-aborting and batch-aborting): the target retains its pre-run rows after a failed refresh.

2. dbt init suggested the Postgres port

Issue: profile_template.yml shipped port: default: 5432 — the Postgres port — so every dbt init user accepting defaults got a non-working profile.

Solution: Default changed to 1433. Verified through dbt's own InitTask.generate_target_from_input code path: accepting the default yields port: 1433 (int).

3. Python float mapped to SQL Server bigint

Issue: The datatypes mapping in sqlserver_constants.py (consumed by SQLServerConnectionManager.data_type_code_to_name to report column type names for query metadata) translated Python float to "bigint".

Solution: Map it to "float". Verified live: a cast(1.5 as float) result column produces cursor type_code = <class 'float'>, which now resolves to float.

4. A single quote in query_tag broke every emitted query (and allowed OPTION-clause injection)

Issue: get_query_options() (and the deprecated apply_label()) interpolated the user-supplied query_tag config into OPTION (LABEL = '...') without escaping. A tag containing ' produced a syntax error in every statement the adapter emits for that model; a crafted tag could inject arbitrary text into the OPTION clause.

Solution: Escape single quotes (''') at both build sites in metadata.sql. Verified: a model with query_tag: "rob's o'clock tag" now builds cleanly on both code paths, including statements wrapped in EXEC('...') (where the pre-escaped label composes correctly with the wrapper's own quote doubling).

5. Columnstore-index existence guard was dead code; misleading comment on the default incremental strategy

Issue: sqlserver__create_clustered_columnstore_index guarded its DROP INDEX with object_id('<schema>_<table>') — an underscore-joined name that never resolves (verified live: always NULL). The IF EXISTS branch could therefore never fire, and re-creating a CCI on a table that already has one fails with "You cannot create more than one clustered index". Separately, the comment in incremental_strategies.sql claimed the default strategy with a unique_key performs delete+insert, when it actually emits a MERGE via get_incremental_merge_sql.

Solution: Use object_id('[schema].[table]') in indexes.sql; the guard now finds the existing index and the macro drops + recreates it (verified by invoking it against a table with an existing CCI). Comment corrected to say MERGE (separate commit).

Known pre-existing limitation, unchanged here: tables built via a __dbt_tmp intermediate + rename keep the index name <schema>_<table>__dbt_tmp_cci, which the macro's computed name does not match, so the guard cannot protect that case.

6. Docs/tooling: duplicated README section, lint args targeting py39, broken make clean

  • README documented dbt_sqlserver_use_default_schema_concat twice, with conflicting flags:-vs-vars: guidance. Merged into a single section matching the actual implementation (schema.sql:61-63: behavior flag first, vars as backwards-compat fallback).
  • pre-commit: the auto black hook used --target-version=py39 and isort used --python-version 39, while requires-python >= 3.10 and the manual black-check hook already targeted py310. Aligned both to py310. Verified that re-running black/isort over the repo with the new targets changes zero files, so no reformatting churn for contributors.
  • Makefile: the clean target had a .PHONY declaration and recipe lines but the clean: rule line itself was missing, so make clean did nothing. Restored (and it now shows in make help).

(Each of these three is its own commit.)

@joshmarkovic joshmarkovic marked this pull request as ready for review June 11, 2026 15:01
Without XACT_ABORT, a statement-aborting error (e.g. a constraint
violation on the INSERT) does not stop the swap batch: execution
continues to the trailing COMMIT, which persists the DELETE and leaves
the target committed-empty. Verified live against SQL Server 2022; with
XACT_ABORT ON the whole transaction rolls back and the target keeps its
pre-run rows.
A query_tag containing a single quote broke every query the adapter
emitted, and allowed injection into the OPTION clause.
The underscore-joined name never resolves, so the existence check was
always false and the DROP never ran. Use a bracketed schema.table name.
With a unique_key the default strategy emits a MERGE via
get_incremental_merge_sql, not delete+insert as the comment claimed.
README documented dbt_sqlserver_use_default_schema_concat twice with
conflicting flags-vs-vars guidance; merged into one section matching
the code (behavior flag primary, vars fallback).
black/isort pre-commit hooks targeted py39 while requires-python is
>=3.10 and the manual black-check already used py310; aligned both.
The clean target had a .PHONY declaration and recipe lines but no
'clean:' rule line, so 'make clean' did nothing.
@joshmarkovic joshmarkovic force-pushed the fix/audit-quick-wins branch from c479a29 to 1b56f99 Compare June 12, 2026 13:14
@joshmarkovic joshmarkovic changed the title Correctness fixes: dml-refresh data loss, catalog UDT types, LABEL escaping, dead columnstore guard, init port, misc Correctness fixes: dml-refresh data loss, LABEL escaping, dead columnstore guard, init port, misc Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant