Commit caf5f19

composition-of-indirects (#636)
Summary:

- Support for **limited** composition of indirects, including `view`, `materialized view`, subqueries and user space tables.
- Uplift compound query string splitting from naive prior state to instead use existing parser function `SplitStatementToPieces()`.
- View documentation updated in `docs/views.md`.
- Added robot test `View Depth Limitation Error Message Shows Correct Max`.
- Added robot test `View JOIN View Returns Results`.
- Added robot test `View JOIN Provider Table Returns Results`.
- Added robot test `Subquery JOIN Subquery Returns Results`.
- Added robot test `CTE Within View Returns Results`.
- Added robot test `Shell Session Multiple Statements Inline`.
- Added robot test `Shell Session Multi Line Then Multi Statement`.
- Added robot test `View JOIN Materialized View Returns Results`.
- Added robot test `View JOIN Subquery Returns Results`.

1 parent 4963566 commit caf5f19

14 files changed

Lines changed: 440 additions & 29 deletions

.github/workflows/build.yml

Lines changed: 15 additions & 0 deletions
@@ -1388,6 +1388,21 @@ jobs:
         if: always()
         run: |
           cat ./test/robot/reports/output.xml
+
+      - name: Upload Robot Reports Artifact
+        uses: actions/upload-artifact@v4.3.1
+        if: always() && github.repository == 'stackql/stackql-devel'
+        with:
+          name: stackql_darwin_amd64_robot_reports
+          path: test/robot/reports
+
+
+      - name: Upload Robot Tmp Artifact
+        uses: actions/upload-artifact@v4.3.1
+        if: always() && github.repository == 'stackql/stackql-devel'
+        with:
+          name: stackql_darwin_amd64_robot_tmp
+          path: test/robot/functional/tmp

       - name: Run robot integration tests
         if: env.AZURE_CLIENT_SECRET != '' && startsWith(env.STATE_SOURCE_TAG, 'build-release')

.vscode/launch.json

Lines changed: 1 addition & 0 deletions
@@ -198,6 +198,7 @@
         "select lhs.proj, lhs.bucket from (select 'testing-project' as proj, 'silly-bucket' as bucket) lhs LEFT OUTER join (select name from google.storage.buckets where project = 'testing-project') rhs on lhs.bucket = rhs.name where rhs.name;",
         "insert into google.storage.buckets( project, data__name) select lhs.proj, lhs.bucket from (select 'testing-project' as proj, 'silly-bucket' as bucket) lhs LEFT OUTER join (select name from google.storage.buckets where project = 'testing-project') rhs on lhs.bucket = rhs.name where rhs.name is null returning *;",
         "select description, price_monthly, price_hourly from digitalocean.sizes.sizes where price_monthly = 7.0 order by description desc;",
+        "create or replace view vw_repos_name as select name from stackql_repositories; create or replace view vw_repos_url as select name, url from stackql_repositories; select v1.name from vw_repos_name v1 inner join vw_repos_url v2 on v1.name = v2.name;",
       ],
       "default": "show providers;"
     },

docs/data_flow.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+
+# Data flow analysis in stackql
+
+Data flow analysis is implemented as multiple passes on:
+
+- An initial abstract syntax tree (AST) from the parser.
+- Annotated derivatives of the AST.
+- `any-sdk` `{ provider, service, resource, method, schema... }` graphs.
+- `gonum` DAG adaptations with data flow dependencies representing edges.
+
+Some other aspects of data flow analysis:
+
+- Relational algebra is implemented in a coupled RDBMS (embedded `sqlite` or `postgres` over TCP). There is a query rewriting process to stringify "containers" for this.
+- There are `transaction control counter` objects and corresponding RDBMS columns to bound relational algebra "containers" and future proof for garbage collection. Some mutex protection is in place.
+- Views in `stackql` permit clobbering of where clause arguments from outside the view. The canonical case is a document-based view in a provider document. A good example is in [test/registry/src/aws/v0.1.0/services/pseudo_s3.yaml](/test/registry/src/aws/v0.1.0/services/pseudo_s3.yaml) at `...s3_bucket_list_and_detail.config.views.select`; one can overwrite `region` here.
+- Views, subqueries, materialized views and user space tables are modelled as "indirections".
+
+
+## Open Issues
+
+## Indirection Data Flow Analysis and Query Execution
+
+Data flow analysis for indirections is not composable:
+
+- It is impossible to join heterogeneous collections of these with each other or with conventional resources. There is no recursive and stable data flow analysis.
+- While `stackql` does have a `max depth` parameter, I do not believe it is stably enforced eagerly; i.e. queries that are too complex should fail at analysis time. Cannot remember the param name or default.
+
+The expected fix for this issue:
+
+- Joins, unions etc on indirections work to arbitrary and configurable depth. For depth violations, failure is eager in the analysis phase and the error message is plain and in the canonical err stream already widely used.
+- Data flow analysis includes assurance on required params and viability of projections, joins, etc.
+- Support for CTEs internal to these indirections is in place.
+- Mocked robot tests are added to the canonical test suite, covering off this function.
+
+
+## Glossary of terms
+
+| Term | Expansion |
+|---|---|
+| AST | Abstract Syntax Tree |
+| CTE | Common Table Expression |
+| DAG | Directed Acyclic Graph |
+| GC | Garbage Collection |
+| RDBMS | Relational Database Management System |
+| TCP | Transmission Control Protocol |
+| | |

docs/views.md

Lines changed: 50 additions & 4 deletions
@@ -2,7 +2,7 @@

 # Views

-## *a priori*
+## *a priori*

 At definition time, it is apparent:

@@ -24,22 +24,68 @@ The runtime representation of views must support:
 - StackQL views DDL stored in some special stackql table designated for this purpose.
 - Physical table name such as `__iql__.views`.
 - Views need not exist until the `SELECT ... FROM <view>` portion of the query is executed.
-This is advantagesous on RDBMS systems where view creation will fail if physical tables do not exist.
+This is advantageous on RDBMS systems where view creation will fail if physical tables do not exist.
 - We may need a layer of indirection for views to execute, wrt table names containing generation ID.
 Simplest option is input table name.
 - SQL view definitions (translated to physical tables) are stored in the RDBMS.
 - This implies that even quite early in analysis, it must be known that a view is being referenced.
 - Some part of the namespace must be reserved for these views; configurable using existing regex / template namespacing?
 - Quite possibly some specialised object(s) or extension of the `table` interface stages are used for view analysis and parameter routing.
 - Once analysis is complete:
-- Acquistion occurs as normal through primitive DAG.
+- Acquisition occurs as normal through primitive DAG.
 - Selection phase uses physical views.

+## Materialized views
+
+Materialized views are similar in nature to views, although eagerly executed and without mutation of internal `WHERE` clauses from outside.
+
+## User space tables
+
+These map to RDBMS tables. The DDL is somewhat impaired; we imagine these are useful for staging in general and applications across: ELT, IAC.
+

 ## Subqueries

-Some aspects of subquery analysis and execution will be similar to views, but not all. What are the considerations for view implementation in the short term such that subsequent subquery implmentation is expedited and natural.
+Some aspects of subquery analysis and execution will be similar to views, but not all. What are the considerations for view implementation in the short term such that subsequent subquery implementation is expedited and natural.

 To be continued...


+## Joins and aliasing on Views etc
+
+### Views (lazily evaluated)
+
+Views are rendered as inline subqueries `( SELECT ... ) AS "alias"` in the final SQL. When a user alias is provided (e.g. `FROM my_view v1`), the alias `v1` replaces the view name in the `AS` clause.
+
+**Supported:**
+- View aliased and selected from: `SELECT * FROM my_view v1`.
+- View JOIN view: `SELECT ... FROM v1 INNER JOIN v2 ON ...`.
+- View JOIN provider table: `SELECT ... FROM my_view v1 INNER JOIN provider.svc.resource r ON ...`.
+- View JOIN subquery: `SELECT ... FROM my_view v1 INNER JOIN (SELECT ...) sq ON ...`.
+- View JOIN materialized view: `SELECT ... FROM my_view v1 INNER JOIN mv ON ...`.
+- Nested views (view wrapping a view): supported up to configurable depth (`--indirect-depth-max`, default 5).
+- WHERE clause parameter clobbering from outside the view, using **unqualified** parameters (e.g. `WHERE region = 'us-east-1'`).
+
+**Not supported:**
+- Table-qualified parameter clobbering into views (e.g. `WHERE v1.region = 'us-east-1'` will not override the view's internal `region` parameter).
+- Joins of three or more heterogeneous indirections (e.g. `view JOIN subquery JOIN provider_table`). Binary joins work; three-way and beyond fail with parameter count mismatches in the SQL composition layer.
+
+### Materialized views (eagerly evaluated)
+
+Materialized views are persisted as physical tables in the RDBMS. They are referenced by their table name directly (not as inline subqueries).
+
+**Supported:**
+- Materialized view aliased and selected from.
+- Materialized view joined with provider tables, user space tables, views and subqueries.
+- `CREATE`, `DROP`, `REFRESH`, `CREATE OR REPLACE` lifecycle.
+
+**Not supported:**
+- WHERE clause parameter clobbering from outside (materialized views are snapshot-based).
+
+### Subqueries
+
+Subqueries appear as inline `( SELECT ... )` expressions. CTEs (`WITH ... AS`) are converted to subqueries at AST level and handled identically.
+
+### User space tables
+
+User space tables are RDBMS-resident tables created via `CREATE TABLE`. They can participate in joins with any other indirection type.
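
Editorial note: the rule that only **unqualified** `WHERE` parameters may clobber a view's internals can be illustrated with a small stand-alone sketch. The `paramKey` type and `unqualifiedOnly` function are hypothetical names invented for this illustration; they are not the `parserutil` API, and the real parameter map carries more state than a plain Go map.

```go
package main

import "fmt"

// paramKey is a hypothetical stand-in for a WHERE-clause parameter key:
// Alias is the table qualifier ("" when the parameter is unqualified).
type paramKey struct {
	Alias string
	Name  string
}

// unqualifiedOnly returns a copy containing only alias-free parameters,
// i.e. the ones that may safely be passed down into a view.
func unqualifiedOnly(params map[paramKey]string) map[paramKey]string {
	out := make(map[paramKey]string)
	for k, v := range params {
		if k.Alias == "" {
			out[k] = v
		}
	}
	return out
}

func main() {
	outer := map[paramKey]string{
		{Name: "region"}:           "us-east-1", // WHERE region = 'us-east-1'
		{Alias: "r", Name: "org"}: "stackql",   // WHERE r.org = 'stackql'
	}
	inner := unqualifiedOnly(outer)
	fmt.Println(len(inner), inner[paramKey{Name: "region"}])
}
```

Under these assumptions, `region` reaches the view while `r.org` stays with the outer query, where the alias `r` can actually be resolved.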

internal/stackql/acid/tsm_physio/best_effort_orchestrator.go

Lines changed: 3 additions & 2 deletions
@@ -2,8 +2,8 @@ package tsm_physio //nolint:revive,stylecheck // prefer this nomenclature

 import (
 	"fmt"
-	"strings"

+	"github.com/stackql/stackql-parser/go/vt/sqlparser"
 	"github.com/stackql/stackql/internal/stackql/acid/binlog"
 	"github.com/stackql/stackql/internal/stackql/acid/tsm"
 	"github.com/stackql/stackql/internal/stackql/acid/txn_context"
@@ -42,7 +42,8 @@ func (orc *bestEffortOrchestrator) processQueryOrQueries(
 ) ([]internaldto.ExecutorOutput, bool) {
 	var retVal []internaldto.ExecutorOutput
 	cmdString := handlerCtx.GetRawQuery()
-	for _, s := range strings.Split(cmdString, ";") {
+	splitQueries, _ := sqlparser.SplitStatementToPieces(cmdString)
+	for _, s := range splitQueries {
 		response, hasResponse := orc.processQuery(handlerCtx, s)
 		if hasResponse {
 			retVal = append(retVal, response...)

internal/stackql/acid/tsm_physio/txn_orchestrator.go

Lines changed: 3 additions & 2 deletions
@@ -2,9 +2,9 @@ package tsm_physio //nolint:stylecheck,revive // prefer this nomenclature

 import (
 	"fmt"
-	"strings"

 	"github.com/stackql/any-sdk/pkg/constants"
+	"github.com/stackql/stackql-parser/go/vt/sqlparser"
 	"github.com/stackql/stackql/internal/stackql/acid/tsm"
 	"github.com/stackql/stackql/internal/stackql/acid/txn_context"
 	"github.com/stackql/stackql/internal/stackql/handler"
@@ -68,7 +68,8 @@ func (orc *standardOrchestrator) processQueryOrQueries(
 ) ([]internaldto.ExecutorOutput, bool) {
 	var retVal []internaldto.ExecutorOutput
 	cmdString := handlerCtx.GetRawQuery()
-	for _, s := range strings.Split(cmdString, ";") {
+	splitQueries, _ := sqlparser.SplitStatementToPieces(cmdString)
+	for _, s := range splitQueries {
 		response, hasResponse := orc.processQuery(handlerCtx, s)
 		if hasResponse {
 			retVal = append(retVal, response...)

internal/stackql/astanalysis/earlyanalysis/ast_expand.go

Lines changed: 19 additions & 3 deletions
@@ -4,7 +4,6 @@ import (
 	"fmt"
 	"strings"

-	"github.com/stackql/any-sdk/pkg/constants"
 	"github.com/stackql/any-sdk/pkg/logging"
 	"github.com/stackql/stackql/internal/stackql/astanalysis/annotatedast"
 	"github.com/stackql/stackql/internal/stackql/astindirect"
@@ -141,14 +140,31 @@ func (v *indirectExpandAstVisitor) processCTEReference(
 }

 func (v *indirectExpandAstVisitor) processIndirect(node sqlparser.SQLNode, indirect astindirect.Indirect) error {
+	// Eager depth check: fail before recursively analyzing an indirection that would exceed the limit.
+	if v.indirectionDepth+1 > v.handlerCtx.GetRuntimeContext().IndirectDepthMax {
+		return fmt.Errorf(
+			"query error: indirection chain length %d > %d and is therefore disallowed; please do not cite views at too deep a level", //nolint:lll
+			v.indirectionDepth+1,
+			v.handlerCtx.GetRuntimeContext().IndirectDepthMax,
+		)
+	}
 	err := indirect.Parse()
 	if err != nil {
 		return nil //nolint:nilerr //TODO: investigate
 	}
+	// Filter parent WHERE params to only pass down unqualified (alias-free) entries.
+	// Aliased params like "r.org" reference specific outer tables and must not
+	// leak into child indirection analysis, where the alias would be unresolvable.
+	filteredWhereParams := parserutil.NewParameterMap()
+	for k, val := range v.whereParams.GetMap() {
+		if k.Alias() == "" {
+			filteredWhereParams.Set(k, val) //nolint:errcheck // best effort
+		}
+	}
 	childAnalyzer, err := NewEarlyScreenerAnalyzer(
 		v.primitiveGenerator,
 		v.annotatedAST,
-		v.whereParams.Clone(),
+		filteredWhereParams,
 		v.indirectionDepth+1,
 	)
 	if err != nil {
@@ -178,7 +194,7 @@ func (v *indirectExpandAstVisitor) processIndirect(node sqlparser.SQLNode, indir
 		return fmt.Errorf(
 			"query error: indirection chain length %d > %d and is therefore disallowed; please do not cite views at too deep a level", //nolint:lll
 			maxIndirectCount,
-			constants.LimitsIndirectMaxChainLength,
+			v.handlerCtx.GetRuntimeContext().IndirectDepthMax,
 		)
 	}
 	indirectPrimitiveGenerator.GetPrimitiveComposer().GetAst()

internal/stackql/astvisit/from_rewrite.go

Lines changed: 11 additions & 2 deletions
@@ -650,6 +650,7 @@ func (v *standardFromRewriteAstVisitor) Visit(node sqlparser.SQLNode) error {

 	case *sqlparser.AliasedTableExpr:
 		var exprStr, partitionStr string
+		aliasHandledByIndirect := false
 		if node.Expr != nil {
 			anCtx, ok := v.annotations[node]
 			if !ok {
@@ -664,9 +665,17 @@ func (v *standardFromRewriteAstVisitor) Visit(node sqlparser.SQLNode) error {
 				indirectType := indirect.GetType()
 				switch indirectType {
 				case astindirect.ViewType:
-					templateString := fmt.Sprintf(` ( %%s ) AS "%s" `, name)
+					// Use the user-specified alias if present, otherwise the view name.
+					// The alias is embedded in the template to prevent double aliasing
+					// when the node.As fallthrough at the end of this case would append it again.
+					viewAlias := name
+					if !node.As.IsEmpty() {
+						viewAlias = node.As.GetRawVal()
+					}
+					templateString := fmt.Sprintf(` ( %%s ) AS "%s" `, viewAlias)
 					v.rewrittenQuery = templateString
 					v.indirectContexts = append(v.indirectContexts, indirect.GetSelectContext())
+					aliasHandledByIndirect = true
 				case astindirect.SubqueryType:
 					// Note: CTEs are converted to SubqueryType at AST level,
 					// so this path handles both regular subqueries and CTEs.
@@ -726,7 +735,7 @@ func (v *standardFromRewriteAstVisitor) Visit(node sqlparser.SQLNode) error {
 			partitionStr = v.GetRewrittenQuery()
 		}
 		q := fmt.Sprintf("%s%s", exprStr, partitionStr)
-		if !node.As.IsEmpty() {
+		if !node.As.IsEmpty() && !aliasHandledByIndirect {
 			node.As.Accept(v)
 			asStr := v.GetRewrittenQuery()
 			q = fmt.Sprintf("%s as %v", q, asStr)
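
Editorial note: the two-stage template used for views (`%%s` survives the first `Sprintf` as `%s`, to be filled with the view body later) and the alias choice can be shown in a few lines. `renderView` is a hypothetical helper written for this note, not a function in the repository; it assumes the alias contains no `%` or `"` characters.

```go
package main

import "fmt"

// renderView inlines a view body as a subquery, aliased by the user alias
// when one is given and by the view name otherwise. Hypothetical helper.
func renderView(viewName, userAlias, body string) string {
	alias := viewName
	if userAlias != "" {
		alias = userAlias
	}
	// Stage 1: bake the alias into a template, leaving one %s placeholder.
	template := fmt.Sprintf(` ( %%s ) AS "%s" `, alias)
	// Stage 2: substitute the view's SELECT body.
	return fmt.Sprintf(template, body)
}

func main() {
	body := "select name from stackql_repositories"
	fmt.Println(renderView("vw_repos_name", "v1", body))
	fmt.Println(renderView("vw_repos_name", "", body))
}
```

Since the alias is already part of the template, appending a second `as v1` afterwards would double-alias the subquery; that is the situation the `aliasHandledByIndirect` flag guards against in the hunk above.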

internal/stackql/cmd/shell.go

Lines changed: 21 additions & 13 deletions
@@ -25,6 +25,7 @@ import (

 	"github.com/stackql/any-sdk/pkg/dto"
 	"github.com/stackql/any-sdk/pkg/logging"
+	"github.com/stackql/stackql-parser/go/vt/sqlparser"
 	"github.com/stackql/stackql/internal/stackql/config"
 	"github.com/stackql/stackql/internal/stackql/driver"
 	"github.com/stackql/stackql/internal/stackql/entryutil"
@@ -225,20 +226,27 @@ var shellCmd = &cobra.Command{
 			if inlineCommentIdx > -1 {
 				line = line[:inlineCommentIdx]
 			}
-			semiColonIdx := strings.Index(line, ";")
-			if semiColonIdx > -1 {
-				line = strings.TrimSpace(line[:semiColonIdx+1])
-				subSemiColonIdx := strings.Index(line, ";")
-				sb.WriteString(" " + line[:subSemiColonIdx+1])
-				rawQuery := sb.String()
-				queryToExecute, qErr := entryutil.PreprocessInline(runtimeCtx, rawQuery)
-				if qErr != nil {
-					io.WriteString(outErrFile, "\r\n"+qErr.Error()+"\r\n") //nolint:errcheck // TODO: investigate
+			hasRHSSemiColon := strings.HasSuffix(strings.TrimSpace(line), ";")
+			splitQueries, _ := sqlparser.SplitStatementToPieces(line)
+			if len(splitQueries) > 0 {
+				for i, s := range splitQueries {
+					if i == len(splitQueries)-1 && !hasRHSSemiColon {
+						// Last piece has no trailing semicolon;
+						// accumulate for multi-line continuation.
+						sb.Reset()
+						sb.WriteString(s)
+						continue
+					}
+					sb.WriteString(" " + s)
+					rawQuery := sb.String()
+					queryToExecute, qErr := entryutil.PreprocessInline(runtimeCtx, rawQuery)
+					if qErr != nil {
+						io.WriteString(outErrFile, "\r\n"+qErr.Error()+"\r\n") //nolint:errcheck // TODO: investigate
+					}
+					l.WriteToHistory(rawQuery) //nolint:errcheck // TODO: investigate
+					sessionRunnerInstance.RunCommand(queryToExecute)
+					sb.Reset()
 				}
-				l.WriteToHistory(rawQuery) //nolint:errcheck // TODO: investigate
-				sessionRunnerInstance.RunCommand(queryToExecute)
-				sb.Reset()
-				sb.WriteString(line[subSemiColonIdx+1:])
 			} else {
 				sb.WriteString(" " + line)
 			}