Remove DatetimeOutputCastRule on the analytics-engine route (#5420)#5454
Remove DatetimeOutputCastRule on the analytics-engine route (#5420)#5454mengweieric wants to merge 2 commits into
Conversation
PR Reviewer Guide 🔍(Review updated until commit c5aa1d5)Here are some key observations to aid the review process:
|
bfc54ca to
cf16199
Compare
|
Persistent review updated to latest commit cf16199 |
dai-chen
left a comment
There was a problem hiding this comment.
Thanks for the fix! Please check the CI failure.
|
Persistent review updated to latest commit b5e8976 |
b5e8976 to
aea12b6
Compare
|
Persistent review updated to latest commit aea12b6 |
aea12b6 to
c5aa1d5
Compare
|
Persistent review updated to latest commit c5aa1d5 |
PR Code Suggestions ✨Explore these optional code suggestions:
|
c5aa1d5 to
7c96e13
Compare
dai-chen
left a comment
There was a problem hiding this comment.
Add some minor comments. Thanks for the fix!
|
CI seems failing on spotless check |
7c96e13 to
16c51d7
Compare
…ch-project#5420) On the analytics-engine route, every datetime root column is wrapped in `CAST(<DATE/TIME/TIMESTAMP> AS VARCHAR)` by `DatetimeOutputCastRule`. A matching `DatetimeOutputCastRewriter` on the engine side then translates that cast into DataFusion's `to_char` extension. Whenever the rewriter's format string and the PPL formatter disagree (e.g. trailing `Z`, `T` separator), users see wire-format divergence — issue opensearch-project#5420. Drop the cast rule and let the engine return real datetime cells. The PPL response pipeline already handles datetime → string conversion natively at the formatter layer: - `AnalyticsExecutionEngine.convertRows` feeds engine cells (`LocalDateTime` / `LocalDate` / `LocalTime`) through `ExprValueUtils.fromObjectValue`, which produces `ExprTimestampValue` / `ExprDateValue` / `ExprTimeValue`. - Their `value()` methods render the documented PPL space-separated format (`yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]` etc.). The companion change in `opensearch-project/OpenSearch` removes the matching `DatetimeOutputCastRewriter` and its callsites. - Delete `DatetimeOutputCastRule`. - Drop the unused `UdtMapping.isDatetimeType` helper. - Rewrite the four cast-shape assertions in `DatetimeExtensionTest` to assert the post-removal RelNode (no `CAST(... VARCHAR)` wrapper, schema reports `DATE` / `TIME` / `TIMESTAMP`). Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…h-project#5420 Wire-format regression coverage for sql#5420. With DatetimeOutputCastRule deleted (sql#5454) and DatetimeOutputCastRewriter deleted (opensearch#21748), datetime root columns must reach the user as PPL's documented `yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]` format with typed schema labels (`timestamp` / `date` / `time`, never `string`) on the analytics-engine route. The IT skips cleanly when `-Dtests.analytics.parquet_indices=true` is not set — Calcite-legacy was never affected by sql#5420 and asserting the same contract on it is duplicative noise. Coverage: - Wire-format round trip (typed schema + space-separator value) on TIMESTAMP / DATE / TIME root columns, plus eval-derived TIMESTAMP and `min(ts)` aggregation. - Datetime processing inside AE (parsing for WHERE comparison, scalar extract functions year/month/day/hour, ORDER BY). - Nanosecond precision preservation via `date_nanos`. - Aggregation beyond min(): max(ts), dc(ts). Each test asserts the query routes to AE (LogicalTableScan with lowercase `opensearch`) before checking wire format, so a future regression that silently routes to Calcite-legacy can't leave the contract green by accident. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
16c51d7 to
15e883e
Compare
Problem
On the analytics-engine route, every datetime root column is wrapped in
CAST(<DATE/TIME/TIMESTAMP> AS VARCHAR)byDatetimeOutputCastRule. A matchingDatetimeOutputCastRewriteron the engine side then translates that cast into DataFusion'sto_charextension. Whenever the rewriter's format string and the PPL formatter disagree (e.g. trailingZ,Tseparator), users see wire-format divergence — issue #5420.Solution
Drop the cast rule and let the engine return real datetime cells. The PPL response pipeline already handles datetime → string conversion natively at the formatter layer:
AnalyticsExecutionEngine.convertRowsfeeds engine cells (LocalDateTime/LocalDate/LocalTime) throughExprValueUtils.fromObjectValue, which producesExprTimestampValue/ExprDateValue/ExprTimeValue.value()methods render the documented PPL space-separated format (yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]etc.).The companion change in opensearch-project/OpenSearch#21748 removes the matching
DatetimeOutputCastRewriterand its callsites.Changes
DatetimeOutputCastRule.UdtMapping.isDatetimeTypehelper.DatetimeExtensionTestto assert the post-removal RelNode (noCAST(... VARCHAR)wrapper, schema reportsDATE/TIME/TIMESTAMP).CalciteAnalyticsDatetimeWireFormatITas the wire-format regression net.Verification
1. Unit tests — RelNode shape + return-type assertions
api/src/test/java/org/opensearch/sql/api/spec/datetime/DatetimeExtensionTest.java(PPL V3 parser path) covers:testUdfResultNormalizedDATE/TIME/TIMESTAMPUDFs produce standard Calcite types, noCAST(... VARCHAR)wrappertestNestedUdfCallsNormalizedDATEDIFF(DATE(name), DATE(name)))testDateLiteralNormalizedDATE('2024-01-01')returnsDATE, notVARCHARtestFilterWithTimestampLiteralWHERE created_at > "..."keepsTIMESTAMPend-to-endtestComparisonWithDatetimeUdfDATE→TIMESTAMP, neverVARCHARtestStandardDatetimeFieldsNotWrappedtestNonDatetimeFieldsNotWrappedtestDatetimeFieldsPreserveStandardTypesResultSetMetaDatareportsjava.sql.Types.DATE/TIME/TIMESTAMP, notVARCHARDatetimeExtensionSqlTestcovers the same shape on the SQL V2 parser path (3 cases).CalcitePlannerConcurrencyIT(added in #5458) is preserved — itstestSequentialPlanCallsDoNotCorruptShuttleStackcase is a regression guard for the singleton-shuttle bug and exercises the post-analysis pipeline that this PR slims down.2. Integration test —
CalciteAnalyticsDatetimeWireFormatITinteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAnalyticsDatetimeWireFormatIT.javaprovisions a parquet-backed index on the force-routed analytics-engine cluster and asserts:timestamp/date/time(neverstring).2024-03-15 10:30:00), never ISOT-separator or trailingZ.3. Curl wire-format probes (manual, pre-PR)
Driver:
/tmp/ae-curl-verify/run-verify.pyagainst a force-routed runTask cluster onparquet_wire_format_dt. Coverage:_explainshowsLogicalTableScan+ lowercaseopensearch(AE route) on the parquet index, andCalciteLogicalIndexScan+ capitalOpenSearchon the legacy probe index, confirming routing is binary.CAST(... VARCHAR)—_explainoffields ts,fields d, t, andsort ts | fields tscontain neitherCAST(... VARCHAR)norto_char.min/max/stats,>filter,sort, nanosecond precision,dc(). Each asserts schema label and PPL space-separator value.where+fields,head+fields,stats min/max,group by date,dc(d), multi-column projection sorted byts, nanos>filter.All 18 wire-format probes return
timestamp/date/timeschema labels with space-separated values; noTseparator, no trailingZ, nostringschema labels.4. Local CI gates (run before each push)
:api:test --tests DatetimeExtensionTest→ 8/8 pass:api:test --tests DatetimeExtensionSqlTest→ 3/3 pass:api:spotlessJavaCheck:integ-test:compileTestJavaTest plan
DatetimeExtensionTest,DatetimeExtensionSqlTest)CalciteAnalyticsDatetimeWireFormatITshape:integ-test:integTestRemoteagainst force-routed AE clusterFixes #5420