
[KYUUBI][PROPOSAL] Add Spark Connect gRPC server support to Kyuubi #7510

Closed
turboFei wants to merge 1 commit into apache:master from turboFei:kyuubi-spark-connect-grpc-server

turboFei (Member) commented Jun 12, 2026

Why are the changes needed?

Apache Spark 3.4 introduced Spark Connect, a gRPC-based client-server protocol built on Protocol Buffers that decouples clients from the Spark driver. PySpark users can now connect with SparkSession.builder.remote("sc://host:port/").getOrCreate() instead of embedding a local SparkSession. Kyuubi currently has no gRPC listener and cannot accept Spark Connect clients, so PySpark thin clients and non-JVM Spark Connect clients (Go, Rust) cannot benefit from Kyuubi's multi-tenancy, session pooling, and access control.
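For illustration, a client pointing at the proposed Kyuubi endpoint might build its connection string as sketched below. This is a minimal sketch, not part of the proposal: the helper name `spark_connect_url` and the host `kyuubi.example.com` are hypothetical, and whether the Kyuubi listener would read a bearer token from the `token` connection-string parameter is an assumption. The `sc://host:port/;key=value` layout follows the Spark Connect connection string format.

```python
from urllib.parse import quote


def spark_connect_url(host: str, port: int = 15002, **params: str) -> str:
    """Build a Spark Connect connection string: sc://host:port/;key=value;...

    Hypothetical helper for illustration; 15002 is the default port named
    in the proposal.
    """
    # Each parameter is appended as ";key=value", with the value percent-encoded.
    suffix = "".join(
        f";{key}={quote(str(value), safe='')}" for key, value in params.items()
    )
    return f"sc://{host}:{port}/{suffix}"


# Hypothetical gateway host and token, for illustration only.
url = spark_connect_url("kyuubi.example.com", token="my-token")

# With the PySpark Spark Connect client installed, the URL would be used as:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.remote(url).getOrCreate()
```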

This PR adds planning artifacts (proposal, design, specs, tasks) for implementing Spark Connect support in Kyuubi as a multi-tenant gRPC proxy gateway.

Architecture

flowchart TD
    subgraph Clients
        A1["PySpark\nSparkSession.builder.remote('sc://...')"]
        A2["Go / Rust\nSpark Connect client"]
    end

    subgraph KyuubiServer ["Kyuubi Server (kyuubi-spark-connect module)"]
        B["NettyGrpcServer\n:15002"]
        C["AuthInterceptor\n(Bearer token → KyuubiUser)"]
        D["SparkConnectFrontendService\n(SparkConnectServiceGrpc)"]
        E["SparkConnectSessionManager\n(session lifecycle, user isolation)"]
        F["SparkConnectEngineProxy\n(gRPC channel to engine)"]
        G["ServiceDiscovery\n(ZooKeeper / Kubernetes)"]
    end

    subgraph EngineAlice ["Spark Engine – user alice"]
        H1["SparkConnect gRPC\n(embedded, auto-port)"]
        H2["SparkSession"]
    end

    subgraph EngineBob ["Spark Engine – user bob"]
        I1["SparkConnect gRPC\n(embedded, auto-port)"]
        I2["SparkSession"]
    end

    A1 -- "gRPC (Spark Connect proto)" --> B
    A2 -- "gRPC (Spark Connect proto)" --> B
    B --> C
    C -- "KyuubiUser" --> D
    D --> E
    E -- "open / reuse session" --> F
    F -- "discover sparkConnectPort" --> G
    F -- "proxy protobuf stream\n(alice)" --> H1
    F -- "proxy protobuf stream\n(bob)" --> I1
    H1 --- H2
    I1 --- I2
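The request path in the diagram (bearer token to user, then user to that user's engine) can be sketched roughly as follows. This is an illustrative model only: `Gateway`, `EngineRef`, the in-memory token table, and the in-memory registry are hypothetical stand-ins for the AuthInterceptor, the engine proxy, and ZooKeeper/Kubernetes discovery, and none of these names are Kyuubi APIs.

```python
from dataclasses import dataclass


@dataclass
class EngineRef:
    """Where a user's engine listens (stand-in for a discovered sparkConnectPort)."""
    user: str
    host: str
    port: int


@dataclass
class Gateway:
    tokens: dict[str, str]          # bearer token -> user (hypothetical token store)
    registry: dict[str, EngineRef]  # user -> engine (stand-in for service discovery)

    def authenticate(self, metadata: dict[str, str]) -> str:
        """Map an 'authorization: Bearer <token>' header to a user name."""
        scheme, _, token = metadata.get("authorization", "").partition(" ")
        if scheme.lower() != "bearer" or token not in self.tokens:
            raise PermissionError("unauthenticated")
        return self.tokens[token]

    def route(self, metadata: dict[str, str]) -> EngineRef:
        """Authenticate the caller, then pick the engine that belongs to them."""
        user = self.authenticate(metadata)
        return self.registry[user]


# Hypothetical data: alice's engine was registered on an auto-assigned port.
gateway = Gateway(
    tokens={"t-alice": "alice"},
    registry={"alice": EngineRef("alice", "10.0.0.5", 40123)},
)
engine = gateway.route({"authorization": "Bearer t-alice"})
```

In the proposed design the proxy would then forward the protobuf stream to `engine.host:engine.port` unchanged; that forwarding step is omitted here.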

What Changes

This is a planning/proposal PR containing openspec/ artifacts only; no production code is changed. The design proposes:

  • A new profile-gated Maven module kyuubi-spark-connect exposing a SparkConnectService gRPC endpoint (default port 15002).
  • Proxy architecture: Kyuubi authenticates clients and routes Spark Connect protobuf streams to a Spark engine running in --spark-connect mode. Kyuubi does NOT embed a SparkSession (preserving the server/engine module boundary).
  • Session lifecycle mapped onto Kyuubi's existing SessionManager; per-user engine limits and idle timeouts apply unchanged.
  • Engine gRPC port discovered via ZooKeeper node attribute or Kubernetes pod label.
  • Bearer-token auth via existing AuthenticationProvider chain; user identity overrides client-supplied UserContext.user_name.
  • Feature off by default (kyuubi.frontend.spark.connect.enabled=false); additive and non-breaking.

Artifacts included

| File | Description |
| --- | --- |
| openspec/changes/kyuubi-spark-connect-grpc-server/proposal.md | Motivation, capability list, impact |
| openspec/changes/kyuubi-spark-connect-grpc-server/design.md | Architecture, key decisions, risks, open questions |
| openspec/changes/kyuubi-spark-connect-grpc-server/specs/spark-connect-frontend/spec.md | gRPC server lifecycle, RPC surface, TLS, config keys |
| openspec/changes/kyuubi-spark-connect-grpc-server/specs/spark-connect-session/spec.md | Session lifecycle, user isolation, event logging |
| openspec/changes/kyuubi-spark-connect-grpc-server/specs/spark-connect-engine-proxy/spec.md | Port discovery, transparent forwarding, failure handling |
| openspec/changes/kyuubi-spark-connect-grpc-server/specs/spark-connect-auth/spec.md | Auth interceptor, identity propagation, authz checks |
| openspec/changes/kyuubi-spark-connect-grpc-server/tasks.md | 37 implementation tasks across 10 groups |

How was this patch tested?

This PR contains planning artifacts only; no code to test. Implementation PRs will include unit and integration tests per tasks.md section 9.

Was this patch authored or co-authored using generative AI tooling?

Assisted-by: Claude:claude-sonnet-4-6

Adds openspec planning artifacts for implementing Apache Spark Connect
protocol support in Kyuubi, enabling Kyuubi to act as a multi-tenant
gRPC gateway for Spark Connect clients (PySpark, Go, Rust thin clients).

Artifacts created:
- proposal.md: motivation, capability list, and impact analysis
- design.md: proxy architecture, key technical decisions, risks, open questions
- specs/spark-connect-frontend/spec.md: gRPC server lifecycle, RPC surface, TLS, config keys
- specs/spark-connect-session/spec.md: session lifecycle, isolation, event logging
- specs/spark-connect-engine-proxy/spec.md: port discovery, transparent forwarding, failure handling
- specs/spark-connect-auth/spec.md: bearer token auth, user identity propagation, authz checks
- tasks.md: 37 implementation tasks across 10 groups (module → proto → config → session → proxy → auth → frontend → engine → tests → docs)