Add comprehensive end-to-end user test suite #24
Merged
Adds tests/user/ with end-to-end subprocess-driven coverage:
- test_rest_full.py: every Flask endpoint, response envelopes, snapshot lifecycle, observe diff, Prometheus metrics.
- test_mcp_protocol.py: NDJSON framing, all 49 MCP tools smoked, stdout purity (logs to stderr).
- test_predicates_full.py: all 9 assert_state predicate kinds plus AND.
- test_element_actions_full.py: focus/set_value/invoke/select/hover/drag/key_into/clear_text/propose-confirm flow.
- test_scenarios_user.py, test_trace_replay.py, test_ascii_render_snapshot.py, test_budget_redaction_audit.py, test_setup_config_live.py.
- Optional-deps tests (test_ocr_real_tesseract.py, test_vlm_real_ollama.py, test_ollama_setup_live.py, test_xvfb_live.py) skip gracefully without the underlying binaries / daemons.

Adds pytest.ini with markers (user, slow_llm, slow_vlm, needs_display, needs_tesseract). Updates ci.yml to run the new tier alongside regression. Documents the test surface in README.md.

https://claude.ai/code/session_01Q7eSEmS8XK4wU5GsK5Ey1z
Summary
This PR introduces a complete end-to-end user test suite (tests/user/) that exercises OSScreenObserver by spawning real python main.py subprocesses and driving them over the wire (REST HTTP and MCP stdio). This complements the existing in-process regression tests and catches threading, serialization, header, and protocol issues that in-process testing cannot expose.

Key Changes

Test infrastructure (conftest.py):
- oso_server_factory fixture that spawns configurable OSO subprocesses with mock adapter
- oso_mcp_server fixture for MCP stdio mode testing
- HttpJson helper class for urllib-based JSON HTTP requests over loopback
- MCPClient helper for newline-delimited JSON-RPC 2.0 framing

REST API coverage (test_rest_full.py):

MCP protocol (test_mcp_protocol.py):

Scenario and trace testing:
- test_scenarios_user.py: Drives login.yaml end-to-end with reactions and oracles
- test_trace_replay.py: Record/replay round-trip with divergence detection

Predicate and action coverage:
- test_predicates_full.py: All 9 assert_state predicate kinds (element_exists, element_absent, value_equals, value_matches, text_visible, window_focused, window_exists, tree_hash_equals, AND combination)
- test_element_actions_full.py: Focus, set_value, invoke, select_option, hover, drag, key_into, clear_text

Specialized tests:
- test_ascii_render_snapshot.py: ASCII sketch renderer against stored snapshot
- test_budget_redaction_audit.py: Budget caps, redaction, audit log enforcement
- test_ocr_real_tesseract.py: Real Tesseract OCR against generated PIL images
- test_vlm_real_ollama.py: Vision-LLM pipeline against real Ollama daemon
- test_setup_config_live.py: setup_config.py subprocess execution
- test_xvfb_live.py: Live X11 adapter against real Xvfb display

Test configuration:
- pytest.ini: Marker definitions for user, slow_llm, slow_vlm, needs_display, needs_tesseract
- [...] -m "not user") from user tests

Documentation: Updated README with testing tier explanation and user test coverage details
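The newline-delimited JSON-RPC 2.0 framing that the MCPClient helper is described as handling can be illustrated with a minimal sketch. This is not the PR's code: the class name NdjsonRpcClient, its constructor arguments, and the canned server replies are assumptions made for illustration, and in-memory streams stand in for the subprocess pipes.

```python
import io
import json
from typing import Any, Optional, TextIO


class NdjsonRpcClient:
    """Hypothetical stand-in for an MCP stdio client: one JSON-RPC 2.0 message per line."""

    def __init__(self, writer: TextIO, reader: TextIO) -> None:
        self._writer = writer  # text stream feeding the server's stdin
        self._reader = reader  # text stream carrying the server's stdout
        self._next_id = 0

    def request(self, method: str, params: Optional[dict] = None) -> Any:
        self._next_id += 1
        message = {"jsonrpc": "2.0", "id": self._next_id, "method": method}
        if params is not None:
            message["params"] = params
        # json.dumps escapes embedded newlines, so the only raw "\n" is the frame delimiter.
        self._writer.write(json.dumps(message, separators=(",", ":")) + "\n")
        self._writer.flush()
        while True:
            line = self._reader.readline()
            if not line:
                raise EOFError("server closed stdout before replying")
            # A stray log line on stdout would fail to parse here,
            # which is how a stdout-purity problem would surface.
            reply = json.loads(line)
            if reply.get("id") == self._next_id:
                return reply  # notifications (no matching id) are skipped


# Demo with in-memory streams instead of a real subprocess (assumed replies).
to_server = io.StringIO()
from_server = io.StringIO(
    '{"jsonrpc":"2.0","method":"notifications/message","params":{}}\n'
    '{"jsonrpc":"2.0","id":1,"result":{"tools":[]}}\n'
)
client = NdjsonRpcClient(to_server, from_server)
reply = client.request("tools/list")
```

In a real test the two streams would be the stdin and stdout pipes of the spawned server process; keeping the framing logic independent of the transport makes it checkable without starting a subprocess.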
Implementation Details
- [...] PYTHONUNBUFFERED=1 and stderr redirected to log files for debugging
- [...] _wait_for_http() polling with configurable timeout
- [...] oso_server_factory to amortize startup cost
- [...] needs_display and skipped when no X11
- [...] slow_llm/slow_vlm/needs_tesseract and skipped when unavailable
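Readiness polling with a configurable timeout, as mentioned for _wait_for_http(), might look roughly like the sketch below. The names http_ready and wait_for, their parameters, and the default values are assumptions for illustration; the PR's actual helper may differ in signature and behavior.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ready(url: str) -> bool:
    """Return True once anything answers at url, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=1.0):
            return True
    except urllib.error.HTTPError:
        return True  # the server is listening; it only rejected this path
    except (OSError, http.client.HTTPException):
        return False  # connection refused, timeout, or a malformed reply


def wait_for(probe: Callable[[], bool], timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Call probe repeatedly until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout  # monotonic clock is immune to wall-clock jumps
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example wiring (hypothetical URL; not executed here):
# ready = wait_for(lambda: http_ready("http://127.0.0.1:5001/"), timeout=15.0)
```

Separating the probe from the retry loop lets the loop be tested with a fake probe, while the same loop can wait on an HTTP port in the real fixture.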