Skip to content

Remove DatetimeOutputCastRule on the analytics-engine route (#5420)#5454

Open
mengweieric wants to merge 2 commits into
opensearch-project:mainfrom
mengweieric:experiment/option-a-no-cast-rule
Open

Remove DatetimeOutputCastRule on the analytics-engine route (#5420)#5454
mengweieric wants to merge 2 commits into
opensearch-project:mainfrom
mengweieric:experiment/option-a-no-cast-rule

Conversation

@mengweieric
Copy link
Copy Markdown
Collaborator

@mengweieric mengweieric commented May 19, 2026

Problem

On the analytics-engine route, every datetime root column is wrapped in CAST(<DATE/TIME/TIMESTAMP> AS VARCHAR) by DatetimeOutputCastRule. A matching DatetimeOutputCastRewriter on the engine side then translates that cast into DataFusion's to_char extension. Whenever the rewriter's format string and the PPL formatter disagree (e.g. trailing Z, T separator), users see wire-format divergence — issue #5420.

Solution

Drop the cast rule and let the engine return real datetime cells. The PPL response pipeline already handles datetime → string conversion natively at the formatter layer:

  • AnalyticsExecutionEngine.convertRows feeds engine cells (LocalDateTime / LocalDate / LocalTime) through ExprValueUtils.fromObjectValue, which produces ExprTimestampValue / ExprDateValue / ExprTimeValue.
  • Their value() methods render the documented PPL space-separated format (yyyy-MM-dd HH:mm:ss[.SSSSSSSSS] etc.).

The companion change in opensearch-project/OpenSearch#21748 removes the matching DatetimeOutputCastRewriter and its callsites.

Changes

  • Delete DatetimeOutputCastRule.
  • Drop the unused UdtMapping.isDatetimeType helper.
  • Rewrite the four cast-shape assertions in DatetimeExtensionTest to assert the post-removal RelNode (no CAST(... VARCHAR) wrapper, schema reports DATE / TIME / TIMESTAMP).
  • Add CalciteAnalyticsDatetimeWireFormatIT as the wire-format regression net.

Verification

1. Unit tests — RelNode shape + return-type assertions

api/src/test/java/org/opensearch/sql/api/spec/datetime/DatetimeExtensionTest.java (PPL V3 parser path) covers:

Test What it pins
testUdfResultNormalized DATE/TIME/TIMESTAMP UDFs produce standard Calcite types, no CAST(... VARCHAR) wrapper
testNestedUdfCallsNormalized UDT normalization survives through nested calls (DATEDIFF(DATE(name), DATE(name)))
testDateLiteralNormalized Literal DATE('2024-01-01') returns DATE, not VARCHAR
testFilterWithTimestampLiteral WHERE created_at > "..." keeps TIMESTAMP end-to-end
testComparisonWithDatetimeUdf Mixed-type comparison promotes DATETIMESTAMP, never VARCHAR
testStandardDatetimeFieldsNotWrapped Direct datetime field projection — no projection wrapper at all
testNonDatetimeFieldsNotWrapped Non-datetime fields are untouched (regression guard)
testDatetimeFieldsPreserveStandardTypes End-to-end JDBC schema check — compiles + executes a plan, asserts ResultSetMetaData reports java.sql.Types.DATE / TIME / TIMESTAMP, not VARCHAR

DatetimeExtensionSqlTest covers the same shape on the SQL V2 parser path (3 cases).

CalcitePlannerConcurrencyIT (added in #5458) is preserved — its testSequentialPlanCallsDoNotCorruptShuttleStack case is a regression guard for the singleton-shuttle bug and exercises the post-analysis pipeline that this PR slims down.

2. Integration test — CalciteAnalyticsDatetimeWireFormatIT

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAnalyticsDatetimeWireFormatIT.java provisions a parquet-backed index on the force-routed analytics-engine cluster and asserts:

  • Schema labels are timestamp / date / time (never string).
  • Row values are PPL space-separated (2024-03-15 10:30:00), never ISO T-separator or trailing Z.

3. Curl wire-format probes (manual, pre-PR)

Driver: /tmp/ae-curl-verify/run-verify.py against a force-routed runTask cluster on parquet_wire_format_dt. Coverage:

  • Routing proofs_explain shows LogicalTableScan + lowercase opensearch (AE route) on the parquet index, and CalciteLogicalIndexScan + capital OpenSearch on the legacy probe index, confirming routing is binary.
  • No CAST(... VARCHAR)_explain of fields ts, fields d, t, and sort ts | fields ts contain neither CAST(... VARCHAR) nor to_char.
  • 11 pinned wire-format queries — equality filter, projection, eval-rebind, min/max/stats, > filter, sort, nanosecond precision, dc(). Each asserts schema label and PPL space-separator value.
  • 7 broader querieswhere+fields, head+fields, stats min/max, group by date, dc(d), multi-column projection sorted by ts, nanos > filter.

All 18 wire-format probes return timestamp/date/time schema labels with space-separated values; no T separator, no trailing Z, no string schema labels.

4. Local CI gates (run before each push)

  • :api:test --tests DatetimeExtensionTest → 8/8 pass
  • :api:test --tests DatetimeExtensionSqlTest → 3/3 pass
  • :api:spotlessJavaCheck
  • :integ-test:compileTestJava

Test plan

  • UTs (DatetimeExtensionTest, DatetimeExtensionSqlTest)
  • Wire-format curl sweep (18/18 PASS)
  • CalciteAnalyticsDatetimeWireFormatIT shape
  • CI gates
  • :integ-test:integTestRemote against force-routed AE cluster

Fixes #5420

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Reviewer Guide 🔍

(Review updated until commit c5aa1d5)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@mengweieric mengweieric force-pushed the experiment/option-a-no-cast-rule branch from bfc54ca to cf16199 Compare May 19, 2026 23:09
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit cf16199

Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! Please check the CI failure.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit b5e8976

@mengweieric mengweieric force-pushed the experiment/option-a-no-cast-rule branch from b5e8976 to aea12b6 Compare May 21, 2026 22:57
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit aea12b6

@mengweieric mengweieric force-pushed the experiment/option-a-no-cast-rule branch from aea12b6 to c5aa1d5 Compare May 21, 2026 23:03
@mengweieric mengweieric added PPL Piped processing language feature skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis. labels May 21, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit c5aa1d5

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Make routing assertion more robust

The assertion checks for a hardcoded lowercase string "opensearch" in the explain
output. If the index name or table reference format changes, this check will fail.
Consider making the assertion more robust by checking for the presence of
LogicalTableScan separately from the table name pattern.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAnalyticsDatetimeWireFormatIT.java [101-105]

 String explained = explainQueryToString(query);
 Assert.assertTrue(
-    "Expected analytics-engine route (LogicalTableScan + lowercase 'opensearch'), got: "
-        + explained,
-    explained.contains("LogicalTableScan(table=[[opensearch,"));
+    "Expected analytics-engine route with LogicalTableScan, got: " + explained,
+    explained.contains("LogicalTableScan(table=[["));
+Assert.assertTrue(
+    "Expected lowercase 'opensearch' in table reference, got: " + explained,
+    explained.toLowerCase().contains("opensearch"));
Suggestion importance[1-10]: 3

__

Why: The suggestion proposes splitting the assertion into two separate checks, which could provide slightly more granular error messages. However, the original assertion is already clear and the improvement is marginal. The use of toLowerCase() in the improved version could actually weaken the test by accepting uppercase variations.

Low

dai-chen
dai-chen previously approved these changes May 21, 2026
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some minor comments. Thanks for the fix!

Comment thread api/src/test/java/org/opensearch/sql/api/spec/datetime/DatetimeExtensionTest.java Outdated
@dai-chen
Copy link
Copy Markdown
Collaborator

CI seems failing on spotless check

@mengweieric mengweieric force-pushed the experiment/option-a-no-cast-rule branch from 7c96e13 to 16c51d7 Compare May 22, 2026 00:55
…ch-project#5420)

On the analytics-engine route, every datetime root column is wrapped in
`CAST(<DATE/TIME/TIMESTAMP> AS VARCHAR)` by `DatetimeOutputCastRule`.
A matching `DatetimeOutputCastRewriter` on the engine side then translates
that cast into DataFusion's `to_char` extension. Whenever the rewriter's
format string and the PPL formatter disagree (e.g. trailing `Z`, `T`
separator), users see wire-format divergence — issue opensearch-project#5420.

Drop the cast rule and let the engine return real datetime cells.
The PPL response pipeline already handles datetime → string conversion
natively at the formatter layer:
- `AnalyticsExecutionEngine.convertRows` feeds engine cells
  (`LocalDateTime` / `LocalDate` / `LocalTime`) through
  `ExprValueUtils.fromObjectValue`, which produces
  `ExprTimestampValue` / `ExprDateValue` / `ExprTimeValue`.
- Their `value()` methods render the documented PPL space-separated
  format (`yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]` etc.).

The companion change in `opensearch-project/OpenSearch` removes the
matching `DatetimeOutputCastRewriter` and its callsites.

- Delete `DatetimeOutputCastRule`.
- Drop the unused `UdtMapping.isDatetimeType` helper.
- Rewrite the four cast-shape assertions in `DatetimeExtensionTest`
  to assert the post-removal RelNode (no `CAST(... VARCHAR)` wrapper,
  schema reports `DATE` / `TIME` / `TIMESTAMP`).

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…h-project#5420

Wire-format regression coverage for sql#5420. With DatetimeOutputCastRule
deleted (sql#5454) and DatetimeOutputCastRewriter deleted
(opensearch#21748), datetime root columns must reach the user as PPL's
documented `yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]` format with typed schema
labels (`timestamp` / `date` / `time`, never `string`) on the
analytics-engine route.

The IT skips cleanly when `-Dtests.analytics.parquet_indices=true` is
not set — Calcite-legacy was never affected by sql#5420 and asserting
the same contract on it is duplicative noise.

Coverage:
- Wire-format round trip (typed schema + space-separator value) on
  TIMESTAMP / DATE / TIME root columns, plus eval-derived TIMESTAMP and
  `min(ts)` aggregation.
- Datetime processing inside AE (parsing for WHERE comparison, scalar
  extract functions year/month/day/hour, ORDER BY).
- Nanosecond precision preservation via `date_nanos`.
- Aggregation beyond min(): max(ts), dc(ts).

Each test asserts the query routes to AE (LogicalTableScan with
lowercase `opensearch`) before checking wire format, so a future
regression that silently routes to Calcite-legacy can't leave the
contract green by accident.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric force-pushed the experiment/option-a-no-cast-rule branch from 16c51d7 to 15e883e Compare May 22, 2026 01:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature PPL Piped processing language skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Datetime output cast format diverges between Calcite and DataFusion engines

4 participants