feat: add elementary_extra_indexes config var for custom index injection#1019
Conversation
Add a new get_indexes_for_model macro that merges base indexes with extra indexes passed via the elementary_extra_indexes dbt var. This allows downstream consumers (e.g. Elementary Cloud) to inject additional indexes without modifying the OSS package. The macro deduplicates indexes by sorted column set to prevent duplicate index creation when base and extra indexes overlap. Updated all models with postgres indexes to use get_indexes_for_model: - dbt_run_results - elementary_test_results - data_monitoring_metrics - dbt_source_freshness_results - test_result_rows Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
👋 @haritamar |
|
Warning Review limit reached
More reviews will be available in 6 minutes and 58 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds a default ChangesIndex Configuration Refactor
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
macros/edr/system/system_utils/get_indexes_for_model.sql (2)
4-4: 💤 Low valueUse
elementary.get_config_varfor consistency with the new default config key.You added
elementary_extra_indexestodefault__get_default_config(), but here it's read via rawvar(...), bypassing the dispatched default config and runtime-config plumbing. Functionally equivalent today (default is{}), but reading it through the config accessor keeps a single source of truth.♻️ Suggested change
- {% set extra_indexes_config = var("elementary_extra_indexes", {}) %} + {% set extra_indexes_config = elementary.get_config_var("elementary_extra_indexes") %}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@macros/edr/system/system_utils/get_indexes_for_model.sql` at line 4, Replace the direct var(...) call with the config accessor to honor the new default config plumbing: change the assignment of extra_indexes_config (currently using var("elementary_extra_indexes", {})) to use elementary.get_config_var("elementary_extra_indexes", {}) so the macro get_indexes_for_model reads the value through the centralized config accessor rather than raw var().
10-24: 💤 Low valueClarify index dedup/override semantics so
unique/typearen’t ignored when column sets match.
macros/edr/system/system_utils/get_indexes_for_model.sql(lines 10-24) dedups byidx.columns | sort | join(","), so entries that differ only byunique/type(or by column order) are silently dropped;base_indexesalways wins for an identical column set.macros/commands/create_indexes.sqlalso only checkscolumnswhen deciding whether to create a new index, so a requestedunique/typechange can be skipped if any index with the same column set already exists.- If an
elementary_extra_indexesentry is missingcolumns, dbt compilation will fail because the macro unconditionally readsidx.columns.Document the “dedup by column set; base wins” precedence (or update the dedup/existence key to also consider
unique/type), and optionally guard/validate extras requirecolumns.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@macros/edr/system/system_utils/get_indexes_for_model.sql` around lines 10 - 24, The current index-merge logic in get_indexes_for_model.sql (looping base_indexes and extra_indexes, tracking seen_column_sets and appending to merged_indexes) dedups solely by sorted column set which ignores differing attributes like unique and type and also assumes idx.columns exists; update the dedup/existence key to include attribute equality (e.g., include idx.unique and idx.type alongside the sorted, order-normalized columns) so extra_indexes with different unique/type are not silently dropped (and ensure column order is normalized via sort/join), and add a validation/guard for elementary_extra_indexes entries so you check idx.columns exists before reading it (or raise a clear error); mirror the same comparison change in create_indexes.sql where existence is checked so creation respects unique/type differences.models/edr/run_results/test_result_rows.sql (1)
1-1: 💤 Low valueStale comment. The Postgres-only gating now lives in
get_indexes_for_model, so this top-of-file note is redundant and could be removed (or trimmed) to avoid implying the model still does inline warehouse gating.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/edr/run_results/test_result_rows.sql` at line 1, Remove or trim the stale top-of-file comment in models/edr/run_results/test_result_rows.sql that states "indexes are not supported in all warehouses, relevant to postgres only" since warehouse-specific gating is handled in get_indexes_for_model; update the comment to reflect current behavior (or delete it) so it no longer implies inline warehouse gating in this file and instead points readers to get_indexes_for_model for Postgres-specific logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@macros/edr/system/system_utils/get_indexes_for_model.sql`:
- Line 4: Replace the direct var(...) call with the config accessor to honor the
new default config plumbing: change the assignment of extra_indexes_config
(currently using var("elementary_extra_indexes", {})) to use
elementary.get_config_var("elementary_extra_indexes", {}) so the macro
get_indexes_for_model reads the value through the centralized config accessor
rather than raw var().
- Around line 10-24: The current index-merge logic in get_indexes_for_model.sql
(looping base_indexes and extra_indexes, tracking seen_column_sets and appending
to merged_indexes) dedups solely by sorted column set which ignores differing
attributes like unique and type and also assumes idx.columns exists; update the
dedup/existence key to include attribute equality (e.g., include idx.unique and
idx.type alongside the sorted, order-normalized columns) so extra_indexes with
different unique/type are not silently dropped (and ensure column order is
normalized via sort/join), and add a validation/guard for
elementary_extra_indexes entries so you check idx.columns exists before reading
it (or raise a clear error); mirror the same comparison change in
create_indexes.sql where existence is checked so creation respects unique/type
differences.
In `@models/edr/run_results/test_result_rows.sql`:
- Line 1: Remove or trim the stale top-of-file comment in
models/edr/run_results/test_result_rows.sql that states "indexes are not
supported in all warehouses, relevant to postgres only" since warehouse-specific
gating is handled in get_indexes_for_model; update the comment to reflect
current behavior (or delete it) so it no longer implies inline warehouse gating
in this file and instead points readers to get_indexes_for_model for
Postgres-specific logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2036b863-d7a3-464c-a3b9-70e641dc3d52
📒 Files selected for processing (7)
macros/edr/system/system_utils/get_config_var.sqlmacros/edr/system/system_utils/get_indexes_for_model.sqlmodels/edr/data_monitoring/data_monitoring/data_monitoring_metrics.sqlmodels/edr/dbt_artifacts/dbt_run_results.sqlmodels/edr/run_results/dbt_source_freshness_results.sqlmodels/edr/run_results/elementary_test_results.sqlmodels/edr/run_results/test_result_rows.sql
Address CodeRabbit review feedback: - Use elementary.get_config_var for consistency with centralized config - Remove stale warehouse-gating comment from test_result_rows.sql Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
| {% macro get_indexes_for_model(model_name, base_indexes) %} | ||
| {%- if target.type != "postgres" -%} {{ return([]) }} {%- endif -%} | ||
|
|
||
| {% set extra_indexes_config = elementary.get_config_var( |
There was a problem hiding this comment.
Good catch — switched to elementary.get_config_var("elementary_extra_indexes") in d4a280c.
| {% set extra_indexes = extra_indexes_config.get(model_name, []) %} | ||
|
|
||
| {% set seen_column_sets = [] %} | ||
| {% set merged_indexes = [] %} |
There was a problem hiding this comment.
Intentionally keeping dedup by column set only. The current use case is purely adding simple B-tree indexes (no unique/type variants), and the create_indexes.sql macro also only checks column sets when deciding whether to create. Adding unique/type to the dedup key would be over-engineering for the current need. If that changes in the future, we can extend it then.
| @@ -0,0 +1,29 @@ | |||
| {% macro get_indexes_for_model(model_name, base_indexes) %} | |||
| {%- if target.type != "postgres" -%} {{ return([]) }} {%- endif -%} | |||
There was a problem hiding this comment.
Nit: now that the Postgres gating moved from the individual model files into this macro, it would be good to document why — something like:
{# dbt's indexes config is Postgres-only. Returns [] for all other targets. #}| {% set extra_indexes_config = elementary.get_config_var( | ||
| "elementary_extra_indexes" | ||
| ) %} | ||
| {% set extra_indexes = extra_indexes_config.get(model_name, []) %} |
There was a problem hiding this comment.
Nit: if a user passes an entry missing the columns key (e.g. a typo like {"column": ["created_at"]}), idx.columns will cause a cryptic Jinja compilation error. A small guard would make the error much clearer:
{% for idx in extra_indexes %}
{% if idx.columns is not defined %}
{{ exceptions.raise_compiler_error(
"elementary_extra_indexes entry for '" ~ model_name ~ "' is missing 'columns': " ~ idx
) }}
{% endif %}
...
{% endfor %}Not a blocker — the compile error would still point to the right area — but would improve DX for anyone configuring this var.
There was a problem hiding this comment.
Good suggestion — added the guard in 1510b03. It now raises a clear compiler error pointing to the model name and the offending entry.
| ) %} | ||
| {% set extra_indexes = extra_indexes_config.get(model_name, []) %} | ||
|
|
||
| {% set seen_column_sets = [] %} |
There was a problem hiding this comment.
Nit: the dedup logic only considers the sorted column set, so an extra index with {"columns": ["id"], "unique": true} would be silently dropped if the base already has {"columns": ["id"]}. I see Devin's response that this is intentional for now — makes sense. A one-line comment would help future readers understand the precedence:
{# Dedup by sorted column set only. Base indexes take precedence on conflicts. #}There was a problem hiding this comment.
Added the comment in 1510b03 — {# Dedup by sorted column set only. Base indexes take precedence on conflicts. #}.
MikaKerman
left a comment
There was a problem hiding this comment.
Looks good! A few minor nits posted as inline comments — none are blockers.
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Summary
Adds a new
elementary_extra_indexesdbt var (default{}) that lets downstream consumers inject additional PostgreSQL indexes into Elementary models without modifying this package.New macro:
elementary.get_indexes_for_model(model_name, base_indexes)— merges the model's built-in indexes with any extras fromelementary_extra_indexes, deduplicating by sorted column set. Returns[]on non-postgres targets.Usage — pass a dict mapping model names to index configs via
--vars:All 5 models with postgres indexes now use
get_indexes_for_model:dbt_run_results,elementary_test_results,data_monitoring_metrics,dbt_source_freshness_results,test_result_rows.Motivation: Elementary Cloud needs cloud-specific indexes (e.g.
(created_at)for time-range queries) that don't belong in the OSS defaults. This avoids polluting the base package while keeping index definitions co-located with models."Link to Devin session: https://app.devin.ai/sessions/278ceddbe6d24dbea36a83ecc232f3b6
Requested by: @haritamar
Summary by CodeRabbit