Commit d1e132c

feat(epp): Add plugin lifecycle and stability levels proposal
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
1 parent 8676fae commit d1e132c

2 files changed

Lines changed: 323 additions & 0 deletions

config/crd/kustomization.yaml

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ resources:
 - bases/inference.networking.x-k8s.io_inferenceobjectives.yaml
 - bases/inference.networking.x-k8s.io_inferencepoolimports.yaml
 - bases/inference.networking.k8s.io_inferencepools.yaml
+- rbac-aggregation.yaml
 # +kubebuilder:scaffold:crdkustomizeresource

 patches:
Lines changed: 322 additions & 0 deletions
@@ -0,0 +1,322 @@
# Plugin Lifecycle and Stability Levels

Author(s): @hexfusion

Related issues:
- https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/2653
- https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/1405

## Proposal Status
***Draft***

## Summary

GIE's plugin system is growing. Extension points now support
multiple implementations, and more plugin types are coming as
the EPP evolves (data layer sources, parsers, flow control
policies). This growth is healthy: it lets contributors
experiment with new approaches and iterate quickly.

Today there is no mechanism to communicate plugin maturity to
operators. A plugin either exists in the registry or it doesn't.
There is no way to distinguish "this plugin is experimental and
may change" from "this plugin is stable and its config API is
committed." Without a clear support contract, operators can't
make informed deployment decisions, and maintainers can't iterate
on plugin designs without risking silent breakage for users who
adopted them early.

A plugin lifecycle model would let experimentation and stability
coexist: contributors can ship new plugins without the pressure
of immediate stability guarantees, and operators can see exactly
what they're opting into.

## Goals

* Define maturity tiers for EPP plugins (Alpha, Beta, Stable)
  with clear support contracts at each tier
* Gate experimental plugins behind feature flags so they are
  off by default and require explicit opt-in
* Reject removed plugins at config validation time with
  actionable error messages
* Communicate stability to operators at startup via structured
  log messages

## Non-Goals

* Runtime stability negotiation (plugins don't change stability
  while running)
* Out-of-tree plugin certification, conformance testing, or
  governance of stability declarations
* CRD-level stability annotations (this proposal covers compiled
  EPP plugins only)

## Prior Art

kube-scheduler gates alpha plugins via feature flags and
hard-rejects removed plugins at config validation time. Gateway
API uses [Standard/Experimental channels](https://gateway-api.sigs.k8s.io/concepts/versioning/) with
formal graduation criteria. Neither system puts stability
metadata in the plugin interface itself.

## Proposed Design

Stability is managed through the plugin registry, feature gates,
and config validation, not through the `Plugin` interface.

### Stability Levels

Plugin stability uses three maturity tiers: Alpha, Beta, and
Stable. These are plugin-specific labels, not Kubernetes API
versions. There is no separate "Deprecated" level: deprecation
is a signal (a message indicating replacement), not a maturity
tier. The plugin's current level determines its removal timeline.

| Level | Default | Config Contract | Removal Policy |
|-------|---------|-----------------|----------------|
| **Alpha** | Gated off (requires feature gate) | No compatibility guarantee. Config schema may change between releases. | Can be removed in any release. |
| **Beta** | Gated on | Config schema is stable. Behavioral changes require release notes. | 2 releases + 6 months after deprecation notice. |
| **Stable** | Always available | Full backward compatibility within config API version. | Not removed within a config API major version. |

**Deprecation** is orthogonal to level. A plugin at any level
can carry a deprecation message signaling that it will be
removed. The level determines how long it must remain available
after that signal. When the policy window expires, the plugin is
removed from the registry entirely. A separate validation
tombstone provides the migration message for stale configs that
still reference it.

**Removal** is not a stability level. Removed plugins are
deleted from the registry. A tombstone map in the validation
layer catches stale configs and returns actionable errors with
migration guidance.

These tiers and their removal policies are defined by this
proposal and are specific to GIE's plugin system. They do not
map to Kubernetes API versions and are independent of the
`EndpointPickerConfig` API version.
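
The removal policies in the table can be read as a small decision function. The sketch below is illustrative only and rests on assumptions the proposal does not state: it treats "2 releases + 6 months" as two conditions that must both hold, and the `Deprecation` struct, `RemovalAllowed`, and `exampleNotice` are hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"time"
)

// StabilityLevel mirrors the three tiers in the table above.
type StabilityLevel string

const (
	Alpha  StabilityLevel = "Alpha"
	Beta   StabilityLevel = "Beta"
	Stable StabilityLevel = "Stable"
)

// Deprecation is hypothetical bookkeeping for a deprecated plugin.
// Neither field is defined by the proposal.
type Deprecation struct {
	NoticedAt     time.Time // date the deprecation notice was published
	ReleasesSince int       // releases shipped since the notice
}

// exampleNotice is an arbitrary date used by the demo in main.
var exampleNotice = time.Date(2025, time.January, 15, 0, 0, 0, 0, time.UTC)

// RemovalAllowed reports whether a deprecated plugin at the given
// level may be removed at time now.
// ASSUMPTION: for Beta, both "2 releases" and "6 months" must have
// elapsed since the notice.
func RemovalAllowed(level StabilityLevel, d Deprecation, now time.Time) bool {
	switch level {
	case Alpha:
		// Alpha plugins can be removed in any release.
		return true
	case Beta:
		sixMonthsLater := d.NoticedAt.AddDate(0, 6, 0)
		return d.ReleasesSince >= 2 && !now.Before(sixMonthsLater)
	default:
		// Stable plugins are not removed within a config API
		// major version; this sketch never allows it.
		return false
	}
}

func main() {
	d := Deprecation{NoticedAt: exampleNotice, ReleasesSince: 2}
	// Three months after the notice: too early for a Beta plugin.
	fmt.Println(RemovalAllowed(Beta, d, exampleNotice.AddDate(0, 3, 0))) // false
	// Seven months and two releases after the notice: allowed.
	fmt.Println(RemovalAllowed(Beta, d, exampleNotice.AddDate(0, 7, 0))) // true
}
```

If the intended reading is "whichever comes later" or "either condition", only the Beta branch changes; the level-driven structure stays the same.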

### Key Mechanisms

**Registry metadata.** The existing `plugin.Registry` (a
`map[string]FactoryFunc`) is extended to carry a stability
level, a feature gate name, and a deprecation message alongside
the factory function. This is the single source of truth for
plugin maturity. No changes to the `Plugin` interface are needed.

**Feature gate integration.** Alpha plugins require an explicit
feature gate in `EndpointPickerConfig.FeatureGates`. GIE already
has a `FeatureGates []string` field on the config; this proposal
extends its use to cover per-plugin gating.

**Config validation.** At config load time:
* Alpha plugins without their feature gate enabled are rejected
  with an actionable error
* Removed plugins are rejected with migration guidance
* Plugins with a deprecation message are accepted but log a
  warning with the replacement and removal timeline

**Startup logging.** Every loaded plugin is logged with its
stability level and any deprecation message. This gives
operators immediate visibility into what they're running.

## Implementation

The implementation is scoped to the GIE framework packages. No
changes to the `Plugin` interface or individual plugin code are
required in Phase 1 or 2.

### Current State

Today `plugin.Registry` is a `map[string]FactoryFunc` with no
metadata. Feature gates are phase-level (`prepareDataPlugins`,
`experimentalDatalayer`, `flowControl`), not per-plugin.
Validation checks profile references and gate names but knows
nothing about plugin maturity.

### Phase 1: Registry Metadata + Startup Logging

**Goal:** Every plugin in the registry carries stability
metadata. Operators see stability at startup.

**Changes to `pkg/epp/framework/interface/plugin/registry.go`:**

```go
import "fmt"

// StabilityLevel defines the maturity of a registered plugin.
// The three maturity tiers define the config contract and
// removal policy. These are plugin-specific labels, not
// Kubernetes API versions. Deprecation is orthogonal (a
// message, not a level). Removal means the plugin leaves
// the registry entirely.
type StabilityLevel string

const (
	// Unknown is the default level, assigned to plugins
	// registered via the backward-compatible Register() path
	// that have not yet opted into the lifecycle model. It is
	// not the Go zero value: an empty StabilityLevel is
	// rejected by IsValid.
	Unknown StabilityLevel = "Unknown"
	Alpha   StabilityLevel = "Alpha"
	Beta    StabilityLevel = "Beta"
	Stable  StabilityLevel = "Stable"
)

// IsValid returns true if s is a recognized stability level.
// Unknown is recognized but indicates the plugin has not
// declared its stability and carries no support contract.
func (s StabilityLevel) IsValid() bool {
	switch s {
	case Unknown, Alpha, Beta, Stable:
		return true
	}
	return false
}

// RegistryEntry holds a plugin factory and its lifecycle
// metadata.
type RegistryEntry struct {
	// Factory instantiates the plugin.
	Factory FactoryFunc

	// Stability is the maturity level of this plugin.
	// Unknown for plugins registered via Register();
	// Alpha, Beta, or Stable for plugins registered
	// via MustRegister().
	Stability StabilityLevel

	// FeatureGate is the feature gate name required for
	// Alpha plugins. Must be non-empty when Stability is
	// Alpha.
	FeatureGate string

	// DeprecationMessage, if non-empty, signals that this
	// plugin will be removed in a future release. Logged as
	// a warning at startup. The plugin remains fully
	// functional. The removal timeline is determined by the
	// plugin's stability level.
	DeprecationMessage string
}

// Registry is the global plugin registry, keyed by plugin
// type string. All registration must complete before
// LoadRawConfig is called. Concurrent registration is not
// supported.
var Registry = map[string]RegistryEntry{}

// Register adds a plugin factory to the registry without
// stability metadata. Plugins registered this way get Unknown
// stability and will log a warning at startup prompting the
// author to migrate to MustRegister. This preserves backward
// compatibility for out-of-tree plugins that have not yet
// opted into the lifecycle model.
func Register(pluginType string, factory FactoryFunc) {
	Registry[pluginType] = RegistryEntry{
		Factory:   factory,
		Stability: Unknown,
	}
}

// MustRegister adds a plugin factory with explicit lifecycle
// metadata and panics if that metadata is invalid.
func MustRegister(pluginType string, entry RegistryEntry) {
	if !entry.Stability.IsValid() {
		panic(fmt.Sprintf(
			"plugin %q: invalid stability level %q",
			pluginType, entry.Stability))
	}
	if entry.Stability == Alpha && entry.FeatureGate == "" {
		panic(fmt.Sprintf(
			"plugin %q: alpha plugins must specify a FeatureGate",
			pluginType))
	}
	if entry.Factory == nil {
		panic(fmt.Sprintf(
			"plugin %q: Factory must not be nil",
			pluginType))
	}
	Registry[pluginType] = entry
}
```

**Startup logging** is a separate pass (`logPluginStability`)
that runs after validation but before factory calls. It logs
each plugin's name, type, and stability level. Plugins with a
`DeprecationMessage` get an additional warning.
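
A minimal sketch of what such a pass could produce is shown below. It is not the proposed implementation: it returns plain strings instead of using GIE's logger so it stays self-contained, and `entryMeta`, `pluginSpec`, `stabilityLogLines`, the plugin type names, and the exact message wording are all assumptions made for illustration.

```go
package main

import "fmt"

type StabilityLevel string

const (
	Unknown StabilityLevel = "Unknown"
	Alpha   StabilityLevel = "Alpha"
	Beta    StabilityLevel = "Beta"
	Stable  StabilityLevel = "Stable"
)

// entryMeta is a trimmed stand-in for RegistryEntry (factory omitted).
type entryMeta struct {
	Stability          StabilityLevel
	DeprecationMessage string
}

// pluginSpec is a hypothetical stand-in for one configured plugin.
type pluginSpec struct {
	Name string
	Type string
}

// stabilityLogLines returns the lines a logPluginStability pass
// might emit, in config order. Types missing from the registry
// are skipped because instantiation reports them later.
func stabilityLogLines(specs []pluginSpec, registry map[string]entryMeta) []string {
	var lines []string
	for _, spec := range specs {
		meta, ok := registry[spec.Type]
		if !ok {
			continue
		}
		lines = append(lines, fmt.Sprintf(
			"INFO plugin=%q type=%q stability=%q",
			spec.Name, spec.Type, meta.Stability))
		if meta.Stability == Unknown {
			// Registered via Register(): prompt migration.
			lines = append(lines, fmt.Sprintf(
				"WARN plugin=%q has no declared stability; migrate to MustRegister",
				spec.Name))
		}
		if meta.DeprecationMessage != "" {
			lines = append(lines, fmt.Sprintf(
				"WARN plugin=%q is deprecated: %s",
				spec.Name, meta.DeprecationMessage))
		}
	}
	return lines
}

func main() {
	registry := map[string]entryMeta{
		"example-scorer": {Stability: Beta, DeprecationMessage: "use example-scorer-v2"},
		"legacy-filter":  {Stability: Unknown},
	}
	specs := []pluginSpec{
		{Name: "scorer", Type: "example-scorer"},
		{Name: "filter", Type: "legacy-filter"},
	}
	for _, line := range stabilityLogLines(specs, registry) {
		fmt.Println(line)
	}
}
```

Keeping the pass read-only means it cannot fail startup on its own; rejection stays the job of validation.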

**Migration path:** Existing `plugin.Register()` calls continue
to work with `Unknown` stability. Plugin authors adopt
`MustRegister()` at their own pace.
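
The sketch below shows how legacy and migrated call sites might coexist. It restates a reduced form of the registry so that it runs on its own; the simplified `FactoryFunc` signature, `newPlugin`, and the plugin type and gate names are placeholders, not GIE identifiers.

```go
package main

import "fmt"

type StabilityLevel string

const (
	Unknown StabilityLevel = "Unknown"
	Alpha   StabilityLevel = "Alpha"
	Beta    StabilityLevel = "Beta"
)

// FactoryFunc is simplified for this sketch; the real signature
// lives in the GIE plugin package.
type FactoryFunc func() any

type RegistryEntry struct {
	Factory     FactoryFunc
	Stability   StabilityLevel
	FeatureGate string
}

var Registry = map[string]RegistryEntry{}

// Register is the backward-compatible path: Unknown stability.
func Register(pluginType string, factory FactoryFunc) {
	Registry[pluginType] = RegistryEntry{Factory: factory, Stability: Unknown}
}

// MustRegister is a reduced version of the proposed function,
// keeping only the alpha-gate and nil-factory checks.
func MustRegister(pluginType string, entry RegistryEntry) {
	if entry.Stability == Alpha && entry.FeatureGate == "" {
		panic(fmt.Sprintf(
			"plugin %q: alpha plugins must specify a FeatureGate", pluginType))
	}
	if entry.Factory == nil {
		panic(fmt.Sprintf("plugin %q: Factory must not be nil", pluginType))
	}
	Registry[pluginType] = entry
}

// newPlugin is a placeholder factory.
func newPlugin() any { return struct{}{} }

func main() {
	// Legacy call site, untouched by the migration.
	Register("legacy-filter", newPlugin)

	// Migrated call sites declare their stability explicitly.
	MustRegister("example-scorer", RegistryEntry{
		Factory:   newPlugin,
		Stability: Beta,
	})
	MustRegister("experimental-picker", RegistryEntry{
		Factory:     newPlugin,
		Stability:   Alpha,
		FeatureGate: "experimentalPicker",
	})

	for _, t := range []string{"legacy-filter", "example-scorer", "experimental-picker"} {
		fmt.Printf("%s: %s\n", t, Registry[t].Stability)
	}
}
```

Because `MustRegister` panics during registration, an alpha plugin missing its gate name fails at process start rather than at config load.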

### Phase 2: Alpha Gating + Removed Plugin Rejection

**Goal:** Alpha plugins require explicit opt-in. Removed plugins
produce actionable errors. Stability validation runs before
plugin factories are called.

```go
// removedPlugins is a tombstone map for plugins that have been
// deleted from the registry. When an operator's config
// references a removed plugin, validation returns an actionable
// error with migration guidance instead of the generic "not
// registered" error from instantiatePlugins. Tombstones are
// permanent and small.
var removedPlugins = map[string]string{
	// Populated as plugins are removed. Key is the plugin type,
	// value is the migration message. Example:
	// "old-plugin": "Use new-plugin instead. See https://...",
}

func validatePluginStability(
	cfg *configapi.EndpointPickerConfig,
) error {
	enabledGates := sets.New(cfg.FeatureGates...)

	for _, spec := range cfg.Plugins {
		// Check tombstones first -- give a useful migration
		// error instead of the generic "not registered" from
		// instantiatePlugins.
		if msg, ok := removedPlugins[spec.Type]; ok {
			return fmt.Errorf(
				"plugin type '%s' has been removed: %s",
				spec.Type, msg,
			)
		}

		entry, ok := fwkplugin.Registry[spec.Type]
		if !ok {
			continue // Will be caught by instantiatePlugins.
		}

		// Alpha plugins require their feature gate to be
		// explicitly enabled.
		if entry.Stability == fwkplugin.Alpha {
			if !enabledGates.Has(entry.FeatureGate) {
				return fmt.Errorf(
					"plugin '%s' (type: %s) is alpha and "+
						"requires feature gate '%s' to be "+
						"enabled in featureGates",
					spec.Name, spec.Type, entry.FeatureGate,
				)
			}
		}
	}
	return nil
}
```

**Removed plugins** are deleted from the registry. The
maintainer removes the `MustRegister` call and adds a tombstone
to `removedPlugins`. Tombstones are permanent and small.
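
To make the operator-facing effect concrete, the following sketch runs a simplified version of the validation against three configs: one that references a removed plugin, one that uses an alpha plugin without its gate, and one that enables the gate. The types are cut-down stand-ins (`pluginSpec`, `entryMeta`, `validate`), a plain map replaces the `sets` package, and all plugin and gate names are hypothetical.

```go
package main

import "fmt"

// pluginSpec is a stand-in for one plugin entry in the config.
type pluginSpec struct {
	Name string
	Type string
}

// entryMeta keeps only what gating needs from a registry entry.
type entryMeta struct {
	Alpha       bool
	FeatureGate string
}

// validate mirrors the order of checks in the proposal:
// tombstones first, then alpha gating. Unregistered types are
// left for plugin instantiation to report.
func validate(
	specs []pluginSpec,
	gates []string,
	registry map[string]entryMeta,
	removed map[string]string,
) error {
	enabled := make(map[string]bool, len(gates))
	for _, g := range gates {
		enabled[g] = true
	}
	for _, spec := range specs {
		if msg, ok := removed[spec.Type]; ok {
			return fmt.Errorf("plugin type %q has been removed: %s", spec.Type, msg)
		}
		meta, ok := registry[spec.Type]
		if !ok {
			continue
		}
		if meta.Alpha && !enabled[meta.FeatureGate] {
			return fmt.Errorf(
				"plugin %q (type: %s) is alpha and requires feature gate %q "+
					"to be enabled in featureGates",
				spec.Name, spec.Type, meta.FeatureGate)
		}
	}
	return nil
}

func main() {
	registry := map[string]entryMeta{
		"new-plugin":          {},
		"experimental-picker": {Alpha: true, FeatureGate: "experimentalPicker"},
	}
	// "old-plugin" was deleted from the registry and tombstoned.
	removed := map[string]string{"old-plugin": "Use new-plugin instead."}

	stale := []pluginSpec{{Name: "p1", Type: "old-plugin"}}
	alpha := []pluginSpec{{Name: "p2", Type: "experimental-picker"}}

	fmt.Println(validate(stale, nil, registry, removed))
	fmt.Println(validate(alpha, nil, registry, removed))
	fmt.Println(validate(alpha, []string{"experimentalPicker"}, registry, removed))
}
```

Checking tombstones before the registry lookup is what turns a generic "not registered" failure into a migration hint.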

**Feature gate registration:** Feature gates for alpha plugins
are registered manually via `loader.RegisterFeatureGate()`,
called alongside `plugin.MustRegister()`.

## Open Questions

1. Should alpha plugins be completely invisible in the default
   config, or just gated off?
2. Should graduation criteria be GIE-specific, or adopt Gateway
   API's requirements?
3. Where should the stability policy live: in
   `docs/plugin-lifecycle.md`, in `CONTRIBUTING.md`, or in a
   dedicated proposal?
