[FEATURE] Unified SQL language across OpenSearch, Flint and Spark SQL

**Is your feature request related to a problem?**

OpenSearch SQL is currently fragmented across two repositories and four engines. This fragmentation creates the same class of problems identified in the PPL unification effort (opensearch-project/opensearch-spark#1136): semantic divergence, engineering duplication, feature lag, and missed optimization opportunities.

| Repository | Engine | Scope |
|-----------|--------|-------|
| `opensearch-project/sql` | Legacy (NLPchina) | JOINs, UNIONs, subqueries |
| `opensearch-project/sql` | V2 | Simple queries, improved correctness |
| `opensearch-project/sql` | Calcite-based ANSI SQL (new) | Analytics engine integration, ANSI SQL compliance |
| `opensearch-project/opensearch-spark` | Spark SQL + Flint SQL | Spark queries, secondary index DDL |

> A gap analysis that runs the full V2 + Legacy integration test suite (807 queries) against the new Calcite engine shows a 67.9% pass rate, with 259 failures across 13 categories. This quantifies the distance between the current engines and a unified SQL baseline. See https://github.com/opensearch-project/sql/issues/5346#issuecomment-4247257399 below for the full breakdown.

**What solution would you like?**

Define a **Unified SQL Language** specification — an ANSI SQL core with internal language extensions that model capabilities such as custom identifier/parsing rules, relevance search and Flint DDL — and implement it once in the Calcite-based engine for reuse across both the OpenSearch and Spark.

```
Unified SQL Language
├── ANSI SQL Core
├── Search Extension (full-text search, vector search, identifier rules)
└── Secondary Index Extension (Flint DDL)
```

**Phase 1: Analytics engine integration**
- Core DQL — SELECT, subqueries, CTEs, window functions
- Explain support
- Full-text search functions

**Phase 2: Full language unification**
- V2/Legacy backward compatibility (identifier rules, type mappings, syntax gaps)
- Vector search (#5320)
- Flint DDL (skipping index, covering index, materialized view)
- Spark SQL alignment (via ANSI mode)
- Others: prepared statement, cursor/pagination

> See https://github.com/opensearch-project/sql/issues/5346#issuecomment-4247262252 below for the full design and roadmap.

**What alternatives have you considered?**

- Keep all engines permanently: Maintenance burden of 4 code paths, inconsistent behavior, no convergence.
- Strict ANSI SQL only: Breaks existing users who depend on full-text search functions, backtick quoting, and OpenSearch identifier conventions
- Enable V2 SQL: Since SQL V2 and PPL V3 share the same AST abstraction, it could be routed through the existing `CalciteRelNodeVisitor` as PPL V3. However, this bypasses Calcite built-in logical optimizations, ANSI SQL features like CTEs etc.

**Do you have any additional context?**

- Related issue: #5248 (SQL query support for Analytics engine integration)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] Unified SQL language across OpenSearch, Flint and Spark SQL #5346

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Repository	Engine	Scope
`opensearch-project/sql`	Legacy (NLPchina)	JOINs, UNIONs, subqueries
`opensearch-project/sql`	V2	Simple queries, improved correctness
`opensearch-project/sql`	Calcite-based ANSI SQL (new)	Analytics engine integration, ANSI SQL compliance
`opensearch-project/opensearch-spark`	Spark SQL + Flint SQL	Spark queries, secondary index DDL

[FEATURE] Unified SQL language across OpenSearch, Flint and Spark SQL #5346

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions