96 changes: 96 additions & 0 deletions INTEGRATION-TESTS.md
@@ -0,0 +1,96 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Integration Tests – Presto

End-to-end tests that drive the Presto platform operators (TableSource, Filter,
Projection, Join) through the Wayang API against a live PrestoDB cluster.

| Test | Module | Needs |
|------|--------|-------|
| `AllOperatorsIT` | `wayang-presto` | a local Presto (Docker) |

The test **skips** (it does not fail) when Presto is unreachable.
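This document does not show how the test detects an unreachable Presto. As an illustration only, a probe could send an HTTP GET to `/v1/info` (the endpoint the compose healthcheck polls) on host port 8081 and treat any connection error or non-200 status as "unreachable". `PrestoProbe` and `isReachable` are hypothetical names, not part of Wayang:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical reachability probe for a Presto coordinator (not Wayang API). */
final class PrestoProbe {
    private PrestoProbe() {}

    /** True if a GET on {@code infoUrl} answers HTTP 200 within {@code timeoutMillis}. */
    static boolean isReachable(String infoUrl, long timeoutMillis) {
        Duration timeout = Duration.ofMillis(timeoutMillis);
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(infoUrl))
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Connection refused, connect/request timeout, etc.: treat as unreachable.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://localhost:8081/v1/info";
        System.out.println(url + " reachable: " + isReachable(url, 2_000));
    }
}
```

In a JUnit 5 test, a `@BeforeAll` method could then call `Assumptions.assumeTrue(PrestoProbe.isReachable("http://localhost:8081/v1/info", 2000), "Presto not reachable")`. A failed assumption aborts the tests, so they should be reported as skipped rather than failed.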

---

## Prerequisites

- **JDK 17** (required): the Scala sources in `wayang-spark` do not compile on
  JDK 21+, and the build targets release 17, so JDK 11 is too old. Point Maven at
  a JDK 17:
  ```bash
  export JAVA_HOME=/path/to/jdk-17  # e.g. .../corretto-17.jdk/Contents/Home
  ```
- **Maven 3.8+**
- **Docker**

### Common Maven flags

The repo's root build runs RAT + license + prerequisite checks that are noisy for
local runs; skip them:

```
-Drat.skip=true -Dlicense.skip=true -Dmaven.javadoc.skip=true -Pskip-prerequisite-check
```

> The Presto command below runs Maven offline (`-o`). On the first build, drop `-o` so Maven can download dependencies.

---

## Presto

The test is self-contained: it creates and seeds its own `memory.wayang_it`
tables in Presto's built-in **in-memory connector** (scaled to 120k rows so the
optimizer elects SQL pushdown) and drops them afterwards — no Hive metastore or
object storage required.

```bash
# 1. start a single PrestoDB node with the in-memory connector
cd presto-setup && docker compose up -d --wait && cd ..

# 2. run the operator tests (JDK 17)
JAVA_HOME=/path/to/jdk-17 \
mvn -o test -pl wayang-platforms/wayang-presto \
-Dtest=AllOperatorsIT -Dsurefire.failIfNoSpecifiedTests=false \
-Drat.skip=true -Dlicense.skip=true -Dmaven.javadoc.skip=true -Pskip-prerequisite-check

# 3. tear down when done
cd presto-setup && docker compose down -v && cd ..
```

Expected: `Tests run: 4, Failures: 0, Errors: 0, Skipped: 0`.

`docker compose up -d --wait` blocks on the container healthcheck, so Presto is
query-ready when it returns. Presto listens on host port **8081** (container 8080).
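The suite's seeding code is not shown here. One straightforward way to fill 120k-row fixtures in the memory connector is a series of batched multi-row `INSERT ... VALUES` statements, sketched below. The table name, the `(id, 'name_<id>')` row shape, and the batch size are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds batched INSERT statements for a synthetic fixture table. */
final class SeedSql {
    private SeedSql() {}

    /**
     * Returns INSERT statements that together add {@code rows} rows of the form
     * (id, 'name_<id>') to {@code table}, with at most {@code batchSize} rows each.
     * No trailing semicolons, since Presto's parser rejects them.
     */
    static List<String> insertStatements(String table, int rows, int batchSize) {
        if (rows < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and batchSize > 0");
        }
        List<String> statements = new ArrayList<>();
        for (int start = 0; start < rows; start += batchSize) {
            int end = Math.min(start + batchSize, rows);
            StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" VALUES ");
            for (int id = start; id < end; id++) {
                if (id > start) {
                    sql.append(", ");
                }
                sql.append('(').append(id).append(", 'name_").append(id).append("')");
            }
            statements.add(sql.toString());
        }
        return statements;
    }

    public static void main(String[] args) {
        // 120k rows in batches of 1,000 gives 120 statements (illustrative numbers).
        List<String> stmts = insertStatements("memory.wayang_it.customers", 120_000, 1_000);
        System.out.println(stmts.size() + " INSERT statements");
    }
}
```

Each string would be run with a plain JDBC `Statement.execute(...)` after the fixture's `CREATE TABLE`; larger batches mean fewer round trips but longer statements for Presto to parse.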

---

## Notes

- **Pushdown is cost-gated.** On tiny tables Wayang's optimizer prefers a full
scan + Java-side filter/projection, so pushdown only appears once a table is
large enough (hence the test scales to 120k rows). Each test asserts both correct
results and that the expected SQL reached Presto (`system.runtime.queries`).
- **Join.** A JDBC join is verified through the operator's SQL-clause contract
executed on Presto, not the high-level `WayangContext` API — the logical
`JoinOperator` emits `Tuple2<Record,Record>`, which cannot connect to a `Record`
sink before the SQL pushdown flattens it.
- **Trailing semicolons.** Presto's SQL parser rejects a trailing `;` in
`executeQuery`, so this branch also carries the jdbc-template change that stops
emitting one (shared with the other JDBC platforms; Postgres/SQLite tolerate its
absence).
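The branch fixes the semicolon problem at the source by no longer emitting one. For SQL assembled by hand (for example, fixture statements in a test), a defensive normalization could strip trailing semicolons before calling `executeQuery`. This is a hypothetical helper, not the jdbc-template change itself, and it deliberately ignores trailing SQL comments:

```java
/** Hypothetical helper: drop trailing semicolons that Presto's parser rejects. */
final class SqlText {
    private SqlText() {}

    /** Removes the trailing run of ';' and whitespace characters; the rest is untouched. */
    static String stripTrailingSemicolons(String sql) {
        int end = sql.length();
        while (end > 0) {
            char c = sql.charAt(end - 1);
            if (c == ';' || Character.isWhitespace(c)) {
                end--;
            } else {
                break;
            }
        }
        return sql.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingSemicolons("SELECT count(*) FROM memory.wayang_it.orders;"));
    }
}
```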
120 changes: 120 additions & 0 deletions improvement.md
@@ -0,0 +1,120 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Presto engine-only integration test

## 1. What this branch demonstrates

The question this branch answers is **not** "does Presto execute some single
operator?" but:

> From `WayangContext.execute(...)` to the end of the whole Wayang plan, do all
> data processing **and** the final sink run inside Presto, **without** registering
> `Java.basicPlugin()`?

On this branch the answer is **yes**. `PrestoOperatorsIT`:

- registers **only** `Presto.plugin()` — no `Java.basicPlugin()`;
- ends **every** Wayang plan in a Presto `TableSink`, which compiles to a single
`CREATE TABLE ... AS SELECT` executed inside Presto;
- after `WayangContext.execute(...)` returns, JUnit reads the result table with a
plain JDBC query (assertion only — not part of the Wayang plan);
- handles the join `Tuple2<Record, Record>` vs flat `Record` mismatch with a
test-only flatten mapping (see §4). This is a test-only scheme, not a final
decision on Tuple-to-Record semantics for JDBC platforms.

This mirrors the Trino-only work on `wayang-trino-only-test`; the contrast is the
older mixed branch `wayang-presto`, which registered both `Java.basicPlugin()` and
`Presto.plugin()` and ended most operator tests in a Java `LocalCallbackSink`.

## 2. Execution shape

```text
Presto TableSource -> Presto operator(s) -> Presto TableSink
                                                   |
                                                   v
                        CREATE TABLE memory.wayang_it.operator_result AS SELECT ...

WayangContext.execute(...) returns
             |
             v
JUnit queries the result table over JDBC (assertions only)
```

The final JDBC query is part of the test only: it is not in the Wayang logical
plan, it is not a Wayang Java execution operator, and it does not process plan
data on Presto's behalf — it just inspects what Presto already wrote.

## 3. The shared executor change

All JDBC platforms share `wayang-jdbc-template`'s `JdbcExecutor`. When a stage's
terminal task is a `JdbcTableSinkOperator`, `JdbcExecutor.executeSinkStage(...)`
composes and runs the `CREATE TABLE ... AS SELECT` directly on the connection.

The previous Presto branch's `executeSinkStage` had two gaps that only surface
once **every** test ends in a `TableSink`:

1. It asserted a stage has a single source, so a join (orders + customers, two
sources) could not be composed. It also lacked the `selectStartTask(...)` helper
that picks the correct left/`FROM` table.
2. It only collected filter, projection and join; it threw `WayangException` for
global reduce, reduce-by and sort, and passed `null` for them to
`createSqlString(...)`.

This branch ports the engine-only `executeSinkStage` (identical to the file on
`wayang-trino-only-test`): it uses `selectStartTask(...)` for multi-source joins
and collects global reduce / reduce-by / sort, passing them into the existing
`createSqlString(...)`. The file is platform-agnostic. (Assertions are enabled
under Maven — `pom.xml` `enableAssertions=true` — so without this change a
join/reduce/sort sink would fail loudly, not silently.)
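The real `selectStartTask(...)` and `createSqlString(...)` signatures are not shown in this document. As a simplified, hypothetical model of what composing a sink stage means, the sketch below assembles one `CREATE TABLE ... AS SELECT` from clauses collected across a stage: a start (`FROM`) table, joins against further sources, a filter, reduce-by keys as `GROUP BY`, and sort keys as `ORDER BY`. A global reduce fits the same shape as an aggregate projection with no grouping keys. All names and columns are illustrative:

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical, simplified model of assembling a sink stage's collected clauses
 * into a single CREATE TABLE ... AS SELECT. Not the real JdbcExecutor API.
 */
final class SinkSqlSketch {
    private SinkSqlSketch() {}

    /** A join against an additional source table. */
    record Join(String table, String condition) {}

    /** Clauses gathered from one stage; empty lists / Optional mean "clause absent". */
    record Stage(String startTable,        // left / FROM table (cf. selectStartTask)
                 List<String> projection,  // empty means SELECT *; aggregates model reduces
                 List<Join> joins,
                 Optional<String> filter,
                 List<String> groupBy,     // reduce-by keys
                 List<String> orderBy) {}  // sort keys

    static String createTableAs(String targetTable, Stage s) {
        StringBuilder sql = new StringBuilder("CREATE TABLE ").append(targetTable).append(" AS SELECT ");
        sql.append(s.projection().isEmpty() ? "*" : String.join(", ", s.projection()));
        sql.append(" FROM ").append(s.startTable());
        for (Join j : s.joins()) {
            sql.append(" JOIN ").append(j.table()).append(" ON ").append(j.condition());
        }
        s.filter().ifPresent(f -> sql.append(" WHERE ").append(f));
        if (!s.groupBy().isEmpty()) {
            sql.append(" GROUP BY ").append(String.join(", ", s.groupBy()));
        }
        if (!s.orderBy().isEmpty()) {
            sql.append(" ORDER BY ").append(String.join(", ", s.orderBy()));
        }
        return sql.toString(); // no trailing ';' (Presto rejects it)
    }

    public static void main(String[] args) {
        Stage stage = new Stage("orders o",
                List.of("c.name", "SUM(o.amount) AS total"),
                List.of(new Join("customers c", "o.customer_id = c.id")),
                Optional.of("o.amount > 10"),
                List.of("c.name"),
                List.of("total DESC"));
        System.out.println(createTableAs("memory.wayang_it.operator_result", stage));
    }
}
```

Choosing the start table matters because a multi-source stage has no single obvious `FROM` table; every other source must appear as a `JOIN`, which is the gap the old single-source assertion could not handle.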

## 4. The join flatten mapping

A logical `JoinOperator` emits `Tuple2<Record, Record>`, while a pushed-down JDBC
join already emits a flat `Record`. The test wires an explicit flatten `MapOperator`
(named `JOIN_FLATTEN_NAME`) and registers a test-only `JoinFlattenMapping` on the
configuration whitelist; the mapping rewrites that named map into a
`PrestoProjectionOperator`, so the flatten is also pushed into Presto SQL and the
plan stays entirely in Presto. (Same approach as the Trino-only test, using
`PrestoProjectionOperator` + `PrestoPlatform`.)
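To make the shape change concrete, here is a sketch with hypothetical stand-in types (`Row` for `Record`, `Pair` for `Tuple2<Record, Record>`), not Wayang's classes. `flatten` is what a Java-side flatten map would compute; `flattenProjection` shows the equivalent projection text, which is why rewriting the named map into a projection operator lets the work stay in Presto. Column-name clashes between the two sides (both having `id`, say) would need aliases and are not handled here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins illustrating (left, right) -> one flat row. Not Wayang API. */
final class JoinFlatten {
    private JoinFlatten() {}

    /** Stand-in for a flat Record. */
    record Row(List<Object> values) {
        static Row of(Object... values) {
            return new Row(Arrays.asList(values));
        }
    }

    /** Stand-in for a join result Tuple2<Record, Record>. */
    record Pair(Row left, Row right) {}

    /** Concatenates left fields, then right fields, preserving order. */
    static Row flatten(Pair joined) {
        List<Object> out = new ArrayList<>(
                joined.left().values().size() + joined.right().values().size());
        out.addAll(joined.left().values());
        out.addAll(joined.right().values());
        return new Row(out);
    }

    /** The same flatten expressed as projection text, so it can be pushed into SQL. */
    static String flattenProjection(String leftAlias, List<String> leftCols,
                                    String rightAlias, List<String> rightCols) {
        List<String> cols = new ArrayList<>();
        for (String c : leftCols) {
            cols.add(leftAlias + "." + c);
        }
        for (String c : rightCols) {
            cols.add(rightAlias + "." + c);
        }
        return "SELECT " + String.join(", ", cols);
    }

    public static void main(String[] args) {
        System.out.println(flatten(new Pair(Row.of(1, "o-1"), Row.of(7, "Ada"))));
        System.out.println(flattenProjection("o", List.of("id", "amount"), "c", List.of("name")));
    }
}
```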

## 5. Coverage and results

`PrestoOperatorsIT` runs 13 tests (8 operator-level + 5 high-level
`JavaPlanBuilder`) covering `TableSource`, `Filter`, `Projection`, `Join`,
`GlobalReduce`, `ReduceBy`, `Sort`, `TableSink`. Each composes a
`CREATE TABLE ... AS SELECT` and additionally asserts, via
`system.runtime.queries`, that the expected SQL actually reached Presto.
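The suite's assertion code is not reproduced in this document. A hypothetical helper for the "SQL reached Presto" check could read the `query` column of `system.runtime.queries` over JDBC and look for an expected fragment, comparing case- and whitespace-insensitively because the executed text need not match the expected string byte for byte:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: did a statement containing a given SQL fragment reach Presto? */
final class QueryLogCheck {
    private QueryLogCheck() {}

    /** Reads the SQL text of the queries the coordinator currently remembers. */
    static List<String> recentQueries(Connection conn) throws SQLException {
        List<String> texts = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT query FROM system.runtime.queries")) {
            while (rs.next()) {
                texts.add(rs.getString(1));
            }
        }
        return texts;
    }

    /** True if any recorded text contains the fragment, ignoring case and whitespace runs. */
    static boolean anyContains(List<String> queryTexts, String expectedFragment) {
        String needle = normalize(expectedFragment);
        for (String text : queryTexts) {
            if (text != null && normalize(text).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    private static String normalize(String sql) {
        return sql.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> seen = List.of("CREATE TABLE memory.wayang_it.operator_result AS\n  SELECT 1");
        System.out.println(anyContains(seen, "create table memory.wayang_it.operator_result as select"));
    }
}
```

With the PrestoDB JDBC driver on the classpath, `DriverManager.getConnection("jdbc:presto://localhost:8081/memory/wayang_it", "test", null)` would connect to the compose stack (PrestoDB's URL form is `jdbc:presto://host:port/catalog/schema`; the user name is arbitrary without authentication). The table holds only the coordinator's recent query history, so the check belongs right after the plan executes.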

The high-level tests also rely on the `withSqlUdf` / `withSqlUdfs` additions to
`DataQuantaBuilder.scala` (ported from `wayang-trino-only-test`) so reduce / join /
sort builders can carry SQL implementations.

```bash
docker compose -f presto-setup/docker-compose.yml up -d --wait

JAVA_HOME=<jdk17> mvn test -pl wayang-platforms/wayang-presto -am \
-Dtest=PrestoOperatorsIT -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false \
-Drat.skip=true -Dlicense.skip=true -Pskip-prerequisite-check

docker compose -f presto-setup/docker-compose.yml down
```

Expected: `Tests run: 13, Failures: 0, Errors: 0, Skipped: 0`. The suite scales
its fixtures to 120k rows and creates/drops its own `memory.wayang_it` schema.
If Presto is unreachable the whole class is skipped (not failed).
47 changes: 47 additions & 0 deletions presto-setup/docker-compose.yml
@@ -0,0 +1,47 @@
---
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Stack: a single PrestoDB coordinator/worker with the in-memory connector.
#
# The `memory` connector supports CREATE SCHEMA / CREATE TABLE / INSERT / SELECT
# entirely in-memory, so the integration test is fully self-contained — no Hive
# metastore, object storage, or external catalog required.
#
# Ports:
# Presto: http://localhost:8081 (UI + JDBC; container listens on 8080)
#
# The host port is 8081 to avoid clashing with the Trino stack (which uses 8080).

services:

  presto:
    image: prestodb/presto:0.289
    container_name: presto
    ports:
      - "8081:8080"
    volumes:
      # Enable the in-memory connector by adding a catalog properties file.
      - ./etc/catalog/memory.properties:/opt/presto-server/etc/catalog/memory.properties
    # Presto needs ~20-60s before it accepts queries. Gate on /v1/info so callers
    # (and `docker compose up -d --wait`) can wait for readiness; the container
    # reports "Up" long before the coordinator is ready.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/v1/info"]
      interval: 10s
      timeout: 5s
      retries: 15
      start_period: 30s
26 changes: 26 additions & 0 deletions presto-setup/etc/catalog/memory.properties
@@ -0,0 +1,26 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# PrestoDB in-memory connector.
# Backs the `memory` catalog used by the Wayang Presto integration tests:
# tables created here (CREATE TABLE memory.<schema>.<table>) live in worker
# memory and are dropped at teardown.
connector.name=memory
# Cap on-heap connector data per node. The image's default heap is -Xmx1G with
# -XX:+ExitOnOutOfMemoryError, so keep this well under the heap; 256MB is ample
# for the small, transient test tables. Raise the heap (custom jvm.config) before
# increasing this if you reuse the stack for larger inserts.
memory.max-data-per-node=256MB