[KYUUBI][PROPOSAL] Add Spark Connect gRPC server support to Kyuubi#7510
Closed
turboFei wants to merge 1 commit into
Adds openspec planning artifacts for implementing Apache Spark Connect protocol support in Kyuubi, enabling Kyuubi to act as a multi-tenant gRPC gateway for Spark Connect clients (PySpark, Go, Rust thin clients).

Artifacts created:
- proposal.md: motivation, capability list, and impact analysis
- design.md: proxy architecture, key technical decisions, risks, open questions
- specs/spark-connect-frontend/spec.md: gRPC server lifecycle, RPC surface, TLS, config keys
- specs/spark-connect-session/spec.md: session lifecycle, isolation, event logging
- specs/spark-connect-engine-proxy/spec.md: port discovery, transparent forwarding, failure handling
- specs/spark-connect-auth/spec.md: bearer token auth, user identity propagation, authz checks
- tasks.md: 37 implementation tasks across 10 groups (module → proto → config → session → proxy → auth → frontend → engine → tests → docs)
Why are the changes needed?
Apache Spark 3.4 introduced Spark Connect, a gRPC-based client-server protocol using Protocol Buffers that decouples clients from the Spark driver. PySpark users can now connect via `SparkSession.builder.remote("sc://host:port/")` instead of embedding a local SparkSession. Currently Kyuubi has no gRPC listener and cannot accept Spark Connect clients, so PySpark thin clients and any non-JVM Spark Connect clients (Go, Rust) cannot benefit from Kyuubi's multi-tenancy, session pooling, and access control.

This PR adds planning artifacts (proposal, design, specs, tasks) for implementing Spark Connect support in Kyuubi as a multi-tenant gRPC proxy gateway.
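To make the client side concrete, here is a minimal Python sketch of how an `sc://host:port/` URL could be split into a host, a port, and optional `;key=value` parameters. It follows the Spark Connect connection-string convention, where 15002 is the default port and parameters such as `token` and `user_id` ride after the path. `ConnectTarget` and `parse_connect_url` are hypothetical names used only for illustration; they are not part of Spark, PySpark, or Kyuubi.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit

# Default Spark Connect port; the proposal uses the same default for Kyuubi's endpoint.
DEFAULT_PORT = 15002


@dataclass
class ConnectTarget:
    """Hypothetical holder for the pieces of an sc:// URL."""
    host: str
    port: int
    params: dict = field(default_factory=dict)


def parse_connect_url(url: str) -> ConnectTarget:
    """Split 'sc://host[:port]/[;key=value;...]' into its parts."""
    parts = urlsplit(url)
    if parts.scheme != "sc":
        raise ValueError(f"expected an sc:// URL, got {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {url!r}")

    # Everything after the first ';' in the path is a ';'-separated parameter list.
    _, _, raw_params = parts.path.partition(";")
    params = {}
    for item in raw_params.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed parameter {item!r} in {url!r}")
        params[key] = value

    return ConnectTarget(parts.hostname, parts.port or DEFAULT_PORT, params)


if __name__ == "__main__":
    print(parse_connect_url("sc://kyuubi.example.com:15002/;token=abc;user_id=alice"))
```

A gateway in front of the engines would see the `token` value as a bearer credential on each call, which is where the authentication design of this proposal comes in.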
Architecture
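As rough orientation for the diagram in this section, the request path it describes (authenticate the caller, resolve that user's engine, forward the payload untouched) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the proposed implementation: `ConnectRequest` and `GatewaySketch` are hypothetical names, and two in-memory dicts stand in for the authentication chain and for service discovery. The sketch also only looks up an existing engine, whereas the real gateway would open or reuse a session.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConnectRequest:
    """Hypothetical stand-in for one incoming Spark Connect call."""
    authorization: str  # gRPC metadata value, e.g. "Bearer <token>"
    user_name: str      # client-supplied UserContext.user_name (untrusted)
    payload: bytes      # opaque protobuf bytes, forwarded unchanged


class GatewaySketch:
    """Toy model of the auth -> session -> proxy path; not Kyuubi code."""

    def __init__(self, token_to_user: dict, user_to_engine: dict):
        self._token_to_user = token_to_user    # assumption: stands in for the auth chain
        self._user_to_engine = user_to_engine  # assumption: stands in for service discovery

    def authenticate(self, authorization: str) -> str:
        scheme, _, token = authorization.partition(" ")
        if scheme != "Bearer" or token not in self._token_to_user:
            raise PermissionError("invalid or missing bearer token")
        return self._token_to_user[token]

    def route(self, request: ConnectRequest):
        """Return (engine endpoint, request to forward) for an authenticated call."""
        user = self.authenticate(request.authorization)
        endpoint = self._user_to_engine.get(user)
        if endpoint is None:
            raise LookupError(f"no engine registered for user {user!r}")
        # The authenticated identity replaces whatever the client claimed.
        return endpoint, replace(request, user_name=user)


if __name__ == "__main__":
    gateway = GatewaySketch(
        token_to_user={"t-alice": "alice", "t-bob": "bob"},
        user_to_engine={"alice": "10.0.0.1:40001", "bob": "10.0.0.2:40002"},
    )
    spoofed = ConnectRequest("Bearer t-alice", user_name="bob", payload=b"plan")
    print(gateway.route(spoofed))
```

The point of the design choice shown here is isolation: a client holding alice's token cannot reach bob's engine by setting `user_name` to `bob`, because routing depends only on the authenticated identity.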
```mermaid
flowchart TD
    subgraph Clients
        A1["PySpark\nspark.remote('sc://...')"]
        A2["Go / Rust\nSpark Connect client"]
    end
    subgraph KyuubiServer ["Kyuubi Server (kyuubi-spark-connect module)"]
        B["NettyGrpcServer\n:15002"]
        C["AuthInterceptor\n(Bearer token → KyuubiUser)"]
        D["SparkConnectFrontendService\n(SparkConnectServiceGrpc)"]
        E["SparkConnectSessionManager\n(session lifecycle, user isolation)"]
        F["SparkConnectEngineProxy\n(gRPC channel to engine)"]
        G["ServiceDiscovery\n(ZooKeeper / Kubernetes)"]
    end
    subgraph EngineAlice ["Spark Engine – user alice"]
        H1["SparkConnect gRPC\n(embedded, auto-port)"]
        H2["SparkSession"]
    end
    subgraph EngineBob ["Spark Engine – user bob"]
        I1["SparkConnect gRPC\n(embedded, auto-port)"]
        I2["SparkSession"]
    end
    A1 -- "gRPC (Spark Connect proto)" --> B
    A2 -- "gRPC (Spark Connect proto)" --> B
    B --> C
    C -- "KyuubiUser" --> D
    D --> E
    E -- "open / reuse session" --> F
    F -- "discover sparkConnectPort" --> G
    F -- "proxy protobuf stream\n(alice)" --> H1
    F -- "proxy protobuf stream\n(bob)" --> I1
    H1 --- H2
    I1 --- I2
```

What Changes
This is a planning/proposal PR containing `openspec/` artifacts only; no production code is changed. The design proposes:
- A new module `kyuubi-spark-connect` exposing a `SparkConnectService` gRPC endpoint (default port 15002).
- Spark engines running in `--spark-connect` mode. Kyuubi does NOT embed a SparkSession (preserving the server/engine module boundary).
- Sessions handled through `SessionManager`; per-user engine limits and idle timeouts apply unchanged.
- Authentication through the `AuthenticationProvider` chain; the authenticated user identity overrides the client-supplied `UserContext.user_name`.
- Disabled by default (`kyuubi.frontend.spark.connect.enabled=false`); additive and non-breaking.

Artifacts included
- `openspec/changes/kyuubi-spark-connect-grpc-server/proposal.md`
- `openspec/changes/kyuubi-spark-connect-grpc-server/design.md`
- `openspec/changes/kyuubi-spark-connect-grpc-server/specs/spark-connect-frontend/spec.md`
- `openspec/changes/kyuubi-spark-connect-grpc-server/specs/spark-connect-session/spec.md`
- `openspec/changes/kyuubi-spark-connect-grpc-server/specs/spark-connect-engine-proxy/spec.md`
- `openspec/changes/kyuubi-spark-connect-grpc-server/specs/spark-connect-auth/spec.md`
- `openspec/changes/kyuubi-spark-connect-grpc-server/tasks.md`

How was this patch tested?
This PR contains planning artifacts only; no code to test. Implementation PRs will include unit and integration tests per `tasks.md` section 9.

Was this patch authored or co-authored using generative AI tooling?
Assisted-by: Claude:claude-sonnet-4-6