feat(os-index): implement OS getIndicesStats and getClusterHealth for Phase 1 portlet display#35853
fabrizzio-dotCMS wants to merge 6 commits into
…ts.jsp (#35820)

Adds a temporary debug panel and per-row provider column to the indices portlet to identify the origin of each listed index during an ES→OS Phase 1 migration bootstrap.

Console output (Logger.info, tag [DEBUG #35820]):
- Active migration phase ordinal
- Raw keys from listDotCMSIndices(), getIndicesStats(), getClusterHealth()
- ES store slots from indicies table
- OS store slots from versioned_indices table (os:: tag stripped)
- Per-index: origin label, hasStats, hasHealth, active, building flags

Page output:
- Dark debug panel above the indices table with all diagnostic data
- Extra "[DBG] Provider / Origin" column on each index row showing ES-store / OS-store / ORPHAN label, plus in-DB confirmation

To be removed once root cause is confirmed on the cloud instance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude finished @fabrizzio-dotCMS's task in 3m 33s

Claude PR Review
Scope drift: PR metadata is stale

The title and body describe a temporary Phase 1 diagnostic ("debug code will be removed after root cause is confirmed"), but commit […]

Findings

1. […]

    if (bytes < 1_048_576L) return String.format("%.1fkb", bytes / 1_024.0);

2. The ES path ( […]

3. No […] Fresh cluster / pre-bootstrap path: a call before any index matches […]

4. Silent merge collision:

    final Map<String, IndexStats> merged = new HashMap<>(esImpl.getIndicesStats());
    merged.putAll(osImpl.getIndicesStats());

The javadoc says "On key collision the OS entry wins." In Phase 1/2 ES holds T0 names and OS holds T1 names, so collisions shouldn't happen, but if naming ever overlaps (misconfig, shared prefix across clusters) ES data is dropped without any signal. A […]

5. Broad […] Renders as "no data" in the portlet, which is what was happening before this PR for a different reason. A transient network blip + cached empty map gives the operator the same misleading screen the ghost-index investigation was chasing. Matches existing patterns in the file, so not a blocker, but worth narrowing to OS client exceptions, or at least logging at […]

6. Test assertion strength […]

7. Nit: fully-qualified […] The OS client also has a class named […]

Tests cover the right cases (Phase 0 ES-only, Phase 1 dual-merge for both stats and health). Routing aggregation in […]
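One way the silent-collision concern in finding 4 could be addressed is to report each overlapping key before the OS entry overwrites the ES entry. The following is only a minimal sketch under stated assumptions: `StatsMergeSketch`, `mergeOsWins`, and the `onCollision` callback are hypothetical names that do not appear in the PR, and the value type is left generic rather than tied to `IndexStats`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper, not part of the PR: merges two per-index maps and
// surfaces key collisions instead of dropping the ES entry silently.
final class StatsMergeSketch {

    /**
     * Returns a new map holding all ES entries overlaid with all OS entries.
     * The OS entry wins on collision; each colliding key is passed to
     * onCollision first so the caller can log it.
     */
    static <V> Map<String, V> mergeOsWins(final Map<String, V> es,
                                          final Map<String, V> os,
                                          final Consumer<String> onCollision) {
        final Map<String, V> merged = new HashMap<>(es);
        for (final Map.Entry<String, V> entry : os.entrySet()) {
            if (merged.containsKey(entry.getKey())) {
                onCollision.accept(entry.getKey());
            }
            merged.put(entry.getKey(), entry.getValue());
        }
        return merged;
    }

    public static void main(final String[] args) {
        final List<String> collisions = new ArrayList<>();
        // Index names below are made up for illustration.
        final Map<String, Long> merged = mergeOsWins(
                Map.of("working_T0", 10L, "shared", 1L),
                Map.of("working_T1", 20L, "shared", 2L),
                collisions::add);
        System.out.println("merged=" + merged.size() + " collisions=" + collisions);
    }
}
```

In real code the callback would presumably be a warn-level log call; a plain `Consumer<String>` keeps the sketch free of any logging dependency.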
Old reindex-leftover indices had hasHealth=true but hasStats=false (closed indices are excluded from _stats). They incorrectly fell through to UNKNOWN.

Now:
- hasStats → Stale ES index (orphan in cluster, not in DB)
- hasHealth + isClosed → Stale CLOSED ES index
- hasHealth → Stale ES index (possibly closed, no stats)
- neither → UNKNOWN (truly unresolvable)

Also switched inEs/inOs lookups to use bare (stripped) name so the comparison is correct regardless of cluster-prefix presence in the indices list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
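The fall-through order in this commit message can be read as a small decision function. The sketch below is an illustration only: `OrphanOriginSketch`, `classify`, and the enum constants are hypothetical names, and it covers only indices that were found in neither the ES store nor the OS store.

```java
// Hypothetical illustration of the label order described in the commit message.
final class OrphanOriginSketch {

    enum Origin {
        STALE_ES,            // stats present: orphan in cluster, not in DB
        STALE_CLOSED_ES,     // health present and the index is closed
        STALE_ES_NO_STATS,   // health present, no stats (possibly closed)
        UNKNOWN              // neither stats nor health
    }

    /** Checks are ordered from most to least informative signal. */
    static Origin classify(final boolean hasStats,
                           final boolean hasHealth,
                           final boolean isClosed) {
        if (hasStats) {
            return Origin.STALE_ES;
        }
        if (hasHealth && isClosed) {
            return Origin.STALE_CLOSED_ES;
        }
        if (hasHealth) {
            return Origin.STALE_ES_NO_STATS;
        }
        return Origin.UNKNOWN;
    }

    public static void main(final String[] args) {
        // A closed reindex leftover: health entry exists, stats entry does not.
        System.out.println(classify(false, true, true));
    }
}
```

The point of the ordering is that a missing stats entry alone no longer sends an index to UNKNOWN; only the absence of both signals does.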
… portlet display

Both methods were stubs returning empty maps. OS indices appeared in the admin portlet with all metrics as n/a in Phase 1/2.

OSIndexAPIImpl: query OS cluster via indices().stats() and cluster().health() using the existing clusterPrefix wildcard and hasClusterPrefix/removeClusterIdFromName helpers. Errors are caught and logged: a degraded portlet beats a hard failure.

IndexAPIImpl.getIndicesStats: changed from router.read() (ES-only) to the same dual-write aggregation pattern used by getClusterHealth(), which merges ES and OS maps so both providers contribute stats in Phase 1/2. OS entry wins on key collision.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
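The prefix handling mentioned in this commit (keep only indices that carry the cluster prefix, then strip the prefix before returning names) might look roughly like the sketch below. It does not reproduce the real hasClusterPrefix/removeClusterIdFromName helpers, whose implementations are not shown in this PR: `ClusterPrefixSketch`, `hasPrefix`, `stripPrefix`, `ownIndices`, and the `cluster_abc.` prefix value are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for the prefix helpers named in the commit message.
final class ClusterPrefixSketch {

    /** True when the index name starts with the given cluster prefix. */
    static boolean hasPrefix(final String prefix, final String indexName) {
        return indexName != null && indexName.startsWith(prefix);
    }

    /** Removes the cluster prefix if present; otherwise returns the name unchanged. */
    static String stripPrefix(final String prefix, final String indexName) {
        return hasPrefix(prefix, indexName)
                ? indexName.substring(prefix.length())
                : indexName;
    }

    /** Keeps only entries owned by this cluster and re-keys them by bare name. */
    static <V> Map<String, V> ownIndices(final String prefix, final Map<String, V> raw) {
        final Map<String, V> result = new LinkedHashMap<>();
        raw.forEach((name, value) -> {
            if (hasPrefix(prefix, name)) {
                result.put(stripPrefix(prefix, name), value);
            }
        });
        return result;
    }

    public static void main(final String[] args) {
        final Map<String, Long> raw = new LinkedHashMap<>();
        raw.put("cluster_abc.working_1", 42L);   // owned by this cluster
        raw.put("cluster_zzz.working_1", 7L);    // another cluster's index
        System.out.println(ownIndices("cluster_abc.", raw).keySet());
    }
}
```

Filtering on the client side as well as through the wildcard is a defensive choice in this sketch; whether the actual implementation does both is not stated in the PR.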
Root cause confirmed. OS stats/health now implemented in OSIndexAPIImpl and IndexAPIImpl, so debug scaffolding is no longer needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…egation

Unit (IndexAPIImplStatsAggregationTest, 8 cases, no cluster needed):
- Phase 0/3: single-provider delegation for getIndicesStats and getClusterHealth
- Phase 1: dual-write merge (disjoint keys, OS wins on collision, OS degraded fallback)

Integration (OSIndexAPIImplIntegrationTest, live OS cluster):
- getIndicesStats: verifies doc count >= 0, raw size >= 0, non-empty size string
- getClusterHealth: verifies non-null status, shards > 0, replicas >= 0

(replaces the previous "returns non-null" smoke test with assertions on actual data)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
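The cases listed in this commit (single-provider delegation in Phase 0/3, merge in Phase 1, ES-only result when OS is degraded) suggest an aggregation shape along the following lines. This is a sketch, not the PR's code: `PhaseRoutingSketch` and `aggregate` are hypothetical names, the phase is passed as a plain int, Phase 3 is assumed to delegate to OS only, and the degraded fallback is assumed to mean that an OS failure leaves the ES map as the result.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of phase-aware aggregation across two providers.
final class PhaseRoutingSketch {

    static <V> Map<String, V> aggregate(final int phase,
                                        final Supplier<Map<String, V>> es,
                                        final Supplier<Map<String, V>> os) {
        if (phase <= 0) {
            return es.get();            // Phase 0: ES is the only provider
        }
        if (phase >= 3) {
            return os.get();            // Phase 3: assumed to be OS only
        }
        // Phase 1/2: start from ES, overlay OS so the OS entry wins on collision.
        final Map<String, V> merged = new HashMap<>(es.get());
        try {
            merged.putAll(os.get());
        } catch (final RuntimeException e) {
            // Assumed degraded behavior: keep the ES data rather than fail the call.
            System.err.println("OS provider unavailable: " + e.getMessage());
        }
        return merged;
    }

    public static void main(final String[] args) {
        final Map<String, Long> result = PhaseRoutingSketch.<Long>aggregate(
                1,
                () -> Map.of("working_T0", 10L),
                () -> Map.of("working_T1", 20L));
        System.out.println(result.keySet());
    }
}
```

Passing suppliers rather than already-fetched maps means the unused provider is never queried in Phase 0 or Phase 3.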
…ntegration)

Removes the Mockito-based unit test and adds three real-cluster cases to ContentletIndexAPIImplMigrationIT, the existing suite that exercises phase-aware routing against live ES and OS containers:
- getIndicesStats Phase 1: ES(T0) + OS(T1) both present in merged map
- getIndicesStats Phase 0: ES only (skipped on single-cluster profile)
- getClusterHealth Phase 1: ES(T0) + OS(T1) both present with valid status/shards

Also strengthens OSIndexAPIImplIntegrationTest: replaces smoke tests with assertions on actual data (doc count >= 0, size non-blank, shards > 0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- OSIndexAPIImpl.getIndicesStats() and OSIndexAPIImpl.getClusterHealth() were unimplemented stubs returning empty maps. OS shadow indices appeared in the admin indices portlet with all metrics as n/a (count, size, replicas, health).
- IndexAPIImpl.getIndicesStats() routed exclusively to the read provider (ES in Phase 1) via router.read(), so OS stats were never consulted even if implemented.

Changes

OSIndexAPIImpl implements two previously stubbed methods:
- getIndicesStats(): queries OS cluster via indices().stats(cluster_xxx.*), extracts primaries.docs().count() and primaries.store().sizeInBytes() per index, returns vendor-neutral IndexStats map
- getClusterHealth(): queries OS cluster via cluster().health(cluster_xxx.*) at Indices level, returns vendor-neutral ClusterIndexHealth map

IndexAPIImpl.getIndicesStats() changed from router.read() (ES-only) to the same dual-write aggregation pattern used by getClusterHealth(): merges ES and OS maps in Phase 1/2, OS entry wins on key collision.

Tests

ContentletIndexAPIImplMigrationIT (integration, live ES + OS clusters):
- getIndicesStats Phase 1: ES(T0) and OS(T1) both present in merged map with valid data
- getIndicesStats Phase 0: ES only (auto-skipped on single-cluster profile)
- getClusterHealth Phase 1: ES(T0) and OS(T1) both present with non-null status and positive shard count

OSIndexAPIImplIntegrationTest (integration, live OS cluster):
- getIndicesStats: creates two indices, verifies doc count ≥ 0, raw size ≥ 0, size string non-empty
- getClusterHealth: creates an index, verifies status non-empty, shards > 0, replicas ≥ 0

Test plan

- ContentletIndexAPIImplMigrationIT with FEATURE_FLAG_OPEN_SEARCH_PHASE=1: ES(T0) and OS(T1) indices should both appear in getIndicesStats and getClusterHealth results
- […] /dotAdmin/#/dot-layout-grid) shows real count/size/replicas/health for OS shadow indices in Phase 1 instead of n/a

Closes #35820
🤖 Generated with Claude Code