Skip to content

UnifiedQueryPlanner should preserve original SQL expression names instead of EXPR$N #5332

@vamsimanohar

Description

@vamsimanohar

Problem

When executing SQL queries with unnamed aggregate expressions through UnifiedQueryPlanner, Calcite's default column naming produces EXPR$0, EXPR$1, etc. instead of the original SQL expression text.

Example:

SELECT COUNT(*) FROM hits

Returns column name EXPR$0 instead of count(*).

SELECT SUM(advengineid), COUNT(*), AVG(resolutionwidth) FROM hits

Returns EXPR$0, EXPR$1, EXPR$2 instead of sum(advengineid), count(*), avg(resolutionwidth).

Note: Explicitly aliased expressions work correctly — SELECT COUNT(*) AS cnt returns cnt.

Expected Behavior

Match PostgreSQL/MySQL convention — use the original SQL expression text as the column name when no alias is provided:

Engine SELECT COUNT(*) column name
PostgreSQL count
MySQL COUNT(*)
DuckDB count_star()
Spark SQL count(1)
Trino/Presto _col0
Current (Calcite) EXPR$0

Root Cause

In UnifiedQueryPlanner.plan(), the flow is:

  1. Parse SQL → SqlNode (has original expression text like COUNT(*))
  2. Validate → SqlNode (still has expression text)
  3. Convert to RelNodeCalcite replaces unnamed expressions with EXPR$N

The original expression text is available in the SqlNode (specifically SqlSelect.getSelectList()) but is lost during SqlToRelConverter.

Proposed Fix

In UnifiedQueryPlanner, after parsing but before/after converting to RelNode:

  1. Extract the select-list expression names from the SqlNode (e.g., SqlCall.toString() gives COUNT(*))
  2. After SqlToRelConverter produces the RelNode, rename any EXPR$N fields back to the original expression text (lowercased to match PostgreSQL convention)

This could be done by wrapping the resulting RelNode in a LogicalProject that renames the fields, or by overriding the RelDataType field names.

Context

This affects the lakehouse SQL endpoint (_lakehouse/sql) in the analytics-engine plugin. The UnifiedQueryService calls UnifiedQueryPlanner.plan() and uses RelNode.getRowType().getFieldNames() to get column names for the response. Since UnifiedQueryPlanner is in the org.opensearch.sql.api package (compiled dependency in dsl-query-executor), downstream consumers cannot fix this without modifying the planner.

Workaround

Downstream consumers can post-process column names (e.g., replace EXPR$N with _colN), but this loses the original expression text.

Metadata

Metadata

Assignees

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions