This document outlines the vision, goals, and planned milestones for pytest-test-categories.
Become the de facto standard for test categorization, timing enforcement, and resource isolation in the Python ecosystem, enabling teams to maintain fast, reliable, hermetic test suites that follow the best practices described in Google's "Software Engineering at Google".
pytest-test-categories is the foundational component of a commercial Python testing ecosystem:
┌─────────────────────┐     ┌─────────────────────┐     ┌─────────────────────┐
│  pytest-test-       │     │  pytest-test-       │     │  [mutation          │
│  categories         │     │  impact             │     │  testing tool]      │
│                     │     │                     │     │                     │
│  "Which tests are   │     │  "Which tests cover │     │  "Are my tests      │
│   fast/hermetic?"   │     │   this code?"       │     │   catching bugs?"   │
│                     │     │                     │     │                     │
└─────────┬───────────┘     └──────────┬──────────┘     └──────────┬──────────┘
          │                            │                           │
          └────────────────┬───────────┴───────────────────────────┘
                           │
                           ▼
                ┌─────────────────────┐
                │       dioxide       │
                │  "How do I write    │
                │   testable code?"   │
                └─────────────────────┘
The killer integration:
# Mutation test using only fast, hermetic tests that cover the changed code
pytest --mutate --impacted-by-diff origin/main -m small
This integration provides 10x faster mutation testing by combining:
- Test size filtering (pytest-test-categories): Only run fast tests
- Impact analysis (pytest-test-impact): Only run tests that cover mutated code
- Hermeticity enforcement (pytest-test-categories): Ensure reliable, non-flaky results
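Conceptually, the combined selection above is a filter on two properties: test size and coverage of the changed code. The sketch below is illustrative only. `CollectedTest`, `select_for_mutation`, and the sample data are hypothetical names invented for this example and are not part of either plugin's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectedTest:
    """Hypothetical record describing one collected test."""

    node_id: str
    size: str  # "small", "medium", "large", or "xlarge"
    covered_files: frozenset


def select_for_mutation(tests, changed_files):
    """Keep only small tests that cover at least one changed file."""
    changed = set(changed_files)
    return [
        test.node_id
        for test in tests
        if test.size == "small" and test.covered_files & changed
    ]


tests = [
    CollectedTest("test_a.py::test_parse", "small", frozenset({"parser.py"})),
    CollectedTest("test_b.py::test_db", "medium", frozenset({"parser.py", "db.py"})),
    CollectedTest("test_c.py::test_fmt", "small", frozenset({"fmt.py"})),
]

# Only the small test that touches parser.py survives both filters.
selected = select_for_mutation(tests, ["parser.py"])
```

Each filter shrinks the set of tests that must run per mutant, which is where the speedup is expected to come from.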
- Foundation: Be the cornerstone of the commercial Python testing ecosystem
- Hermeticity: Enforce resource isolation so small tests are truly hermetic
- Best Practices: Promote Google's test size philosophy across the Python community
- Integration: Enable seamless integration with pytest-test-impact and mutation testing
- Performance: Near-zero overhead for test categorization and timing (target: < 1% of test execution time)
- Extensibility: Pluggable architecture for custom categories and resource policies
- ✅ Four test size categories (small, medium, large, xlarge)
- ✅ Timing enforcement with configurable limits (default: 1s/300s/900s/900s)
- ✅ Distribution validation with target percentages (80% small / 15% medium / 5% large and xlarge)
- ✅ Distribution enforcement modes (off/warn/strict)
- ✅ Test size reporting (basic, detailed, and JSON)
- ✅ Base test classes for easy categorization
- ✅ Comprehensive test coverage (100%)
- ✅ CI/CD pipeline with multi-version Python support (3.11, 3.12, 3.13, 3.14)
- ✅ Pre-commit hooks for quality enforcement
- ✅ Hexagonal architecture (Ports and Adapters pattern throughout)
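To make the distribution check listed above concrete, here is a minimal sketch of how a suite's size mix could be compared against the 80/15/5 targets. It is not the plugin's implementation: the function names, the 5-point tolerance, and the choice to count xlarge tests together with large ones are all assumptions made for illustration.

```python
# Target share of the suite per size bucket, in percent.
# Assumption: xlarge tests are counted in the "large" bucket.
TARGETS = {"small": 80.0, "medium": 15.0, "large": 5.0}


def distribution_percentages(counts):
    """Convert raw per-size test counts into percentages per bucket."""
    grouped = {
        "small": counts.get("small", 0),
        "medium": counts.get("medium", 0),
        "large": counts.get("large", 0) + counts.get("xlarge", 0),
    }
    total = sum(grouped.values())
    if total == 0:
        return {name: 0.0 for name in TARGETS}
    return {name: 100.0 * n / total for name, n in grouped.items()}


def distribution_violations(counts, tolerance=5.0):
    """List buckets whose share deviates from target by more than tolerance."""
    actual = distribution_percentages(counts)
    return [
        f"{name}: {actual[name]:.1f}% (target {target:.0f}%)"
        for name, target in TARGETS.items()
        if abs(actual[name] - target) > tolerance
    ]


healthy = distribution_violations({"small": 80, "medium": 15, "large": 4, "xlarge": 1})
skewed = distribution_violations({"small": 50, "medium": 40, "large": 10})
```

In a real run, the enforcement mode (off/warn/strict) would decide whether a non-empty result is ignored, reported as a warning, or fails the session.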
All resource isolation features are fully implemented and production-ready:
- ✅ Network Isolation - Block all network access for small tests, localhost-only for medium
- ✅ Filesystem Isolation - Block filesystem access for small tests (except tmp_path, tempdir)
- ✅ Process Isolation - Block subprocess spawning in small tests
- ✅ Database Isolation - Block database connections in small tests (including in-memory SQLite)
- ✅ Sleep Blocking - Block time.sleep() and asyncio.sleep() in small tests
- ✅ Thread Monitoring - Warn when small tests use threading primitives
- ✅ External Systems Detection - Warn when medium tests use testcontainers/docker
- ✅ Enforcement modes - off (default), warn, and strict modes
- ✅ Configurable allowed paths - --test-categories-allowed-paths CLI option
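As an illustration of the network rule in the list above (no network for small tests, localhost only for medium tests), the sketch below guards `socket.socket.connect` for the duration of a block. This is one possible technique, not a description of how the plugin is implemented. `block_network`, `allow_localhost`, and `NetworkAccessBlocked` are hypothetical names.

```python
import socket
from contextlib import contextmanager

LOCALHOST = {"127.0.0.1", "::1", "localhost"}


class NetworkAccessBlocked(RuntimeError):
    """Raised when guarded code attempts a disallowed connection."""


@contextmanager
def block_network(allow_localhost=False):
    """Temporarily replace socket.socket.connect with a guard.

    allow_localhost=False models the small-test rule (no network at all);
    allow_localhost=True models the medium-test rule (localhost only).
    """
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        # AF_INET/AF_INET6 addresses are tuples whose first item is the host.
        host = address[0] if isinstance(address, tuple) else None
        if allow_localhost and host in LOCALHOST:
            return original_connect(self, address)
        raise NetworkAccessBlocked(
            f"connection to {address!r} blocked; use a test double "
            "or move the test to a larger size category"
        )

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Always restore the real method, even if the guarded code raised.
        socket.socket.connect = original_connect
```

A production-grade guard would also need to cover other entry points such as `connect_ex` and `socket.create_connection`; this sketch covers only the direct `connect` call.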
This plugin intentionally provides NO per-test override markers (e.g., @pytest.mark.allow_network).
This is a deliberate architectural decision, not a missing feature.
Rationale:
- Small tests must be hermetic. Period. No escape hatches.
- If a test needs external resources, it should be @pytest.mark.medium, not a small test with an exception.
- Override markers would undermine the entire philosophy and make enforcement meaningless.
- The correct remediation is always to either mock the dependency or upgrade the test category.
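The first remediation named above, replacing the dependency with a test double, can be sketched as follows. The function `fetch_greeting` and the URL are hypothetical. The point is that the code under test receives its HTTP client as a parameter, so a small test can pass a fake and never touch the network.

```python
from unittest.mock import Mock


def fetch_greeting(http_get):
    """Fetch a greeting through an injected HTTP callable and normalize it."""
    body = http_get("https://example.com/greeting")
    return body.strip().title()


def test_fetch_greeting_is_hermetic():
    # In a real suite this test would carry the small-size marker.
    fake_get = Mock(return_value=" hello world \n")

    assert fetch_greeting(fake_get) == "Hello World"
    fake_get.assert_called_once_with("https://example.com/greeting")
```

If the behavior being verified genuinely depends on the real service, the other remediation applies: mark the test as medium (or larger) instead of faking the call.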
See each ADR in docs/architecture/ for detailed rationale per resource type.
- ✅ Filesystem isolation implementation - DONE
- ✅ Sleep blocking for small tests - DONE
- ✅ Configurable time limits - DONE
- ✅ JSON report export - DONE
- Comprehensive documentation review
- Final testing and polish
Based on development velocity with Claude Code assistance, the project is ~6 weeks ahead of schedule.
Delivered: v0.4.0 - v0.7.0
- ✅ Network access blocking for small tests
- ✅ Localhost-only restriction for medium tests
- ✅ Process/subprocess blocking for small tests
- ✅ Database connection blocking for small tests
- ✅ Filesystem isolation for small tests
- ✅ Sleep blocking for small tests
- ✅ Thread monitoring with warnings
- ✅ External systems detection for medium tests
- ✅ Enforcement modes: off (default), warn, and strict
- ✅ Clear error messages with remediation guidance
- ✅ Configurable time limits via CLI and ini options
- ✅ JSON report export
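The three enforcement modes in the list above differ only in what happens once a violation is detected. A minimal sketch of that dispatch, assuming a hypothetical `handle_violation` helper and `HermeticityViolation` exception (neither name is taken from the plugin):

```python
import warnings


class HermeticityViolation(Exception):
    """Raised in strict mode when a test breaks an isolation rule."""


def handle_violation(mode, message):
    """Apply the enforcement mode to a detected violation.

    Returns True if the violation was reported, False if it was ignored.
    """
    if mode == "off":
        return False  # Detection is disabled; nothing is reported.
    if mode == "warn":
        warnings.warn(message, UserWarning, stacklevel=2)
        return True  # The test keeps running, but the issue is surfaced.
    if mode == "strict":
        raise HermeticityViolation(message)  # The test fails immediately.
    raise ValueError(f"unknown enforcement mode: {mode!r}")
```

Starting in warn mode and moving to strict once the warnings are cleared is a common way to adopt this kind of enforcement incrementally.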
Delivered: v0.7.0
- ✅ Comprehensive user guide documentation
- ✅ Architecture documentation with ADRs
- ✅ Migration guide and common patterns
- ✅ API reference documentation
- ✅ Ecosystem integration guides
- ✅ Real-world example test suite
- ✅ Performance benchmarks
- ✅ Security audit
Target: v1.0.0
Acceptance Criteria:
- Network isolation enforcement
- Process isolation enforcement
- Database isolation enforcement
- Filesystem isolation enforcement
- Sleep blocking for small tests
- Thread monitoring
- Distribution enforcement modes
- Configurable time limits and tolerances
- JSON reporting
- Comprehensive documentation
- Zero known critical bugs (final verification)
- Security audit completed
- Performance benchmarks published
Target: v1.1.0 - v1.3.0
Scope:
- Integration with pytest-test-impact
- pytest-xdist parallel execution support
- Dashboard integrations (Allure, ReportPortal)
- Historical trend tracking
Target: v2.0.0
Scope:
- Custom test categories
- dioxide DI integration (automatic faking for small tests)
- ML-based test categorization suggestions
- Flaky test detection
- ✅ Configurable Time Limits
  - Allow users to override default limits
  - Support per-category configuration via CLI and ini
  - Validate configuration at startup
- ✅ Sleep Blocking
  - time.sleep() and asyncio.sleep() blocked for small tests
  - Warning/strict modes
  - Clear error messages with remediation
- ✅ Filesystem Isolation
  - Block filesystem access for small tests (except temp dirs)
  - Configurable allowed paths via --test-categories-allowed-paths
  - Full implementation matching ADR-002
- ✅ Enhanced Reporting
  - JSON export for CI integration via --test-size-report=json
  - Hermeticity violation reports
- pytest-test-impact Integration
  - Size metadata API for impact queries
  - Combined filtering examples
  - CI optimization patterns
- Parallel Execution Support
  - Full pytest-xdist compatibility
  - Per-worker timer isolation
  - Correct distribution validation
- Dashboard Integration
  - Allure integration
  - ReportPortal integration
  - Historical trend tracking
- Custom Test Categories
  - User-defined categories
  - Custom resource policies
  - Category inheritance
- dioxide Integration
  - Automatic test double injection
  - Profile-based configuration
  - Premium feature tier
- Advanced Analytics
  - ML-based categorization suggestions
  - Flaky test detection
  - Optimization recommendations
Acceptance Criteria (ALL COMPLETE):
- Configurable time limits via pyproject.toml/pytest.ini
- Sleep blocking for small tests
- Filesystem isolation for small tests
- JSON report export for CI integration
- Comprehensive documentation
- All ADRs updated to "Implemented" status
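For the configurable time limits listed above, the following sketch shows one way per-category overrides could be merged with the documented defaults (1s/300s/900s/900s) and validated before any test runs. The helper names `resolve_limits` and `exceeds_limit` are hypothetical, and the actual option names used in pyproject.toml or pytest.ini are deliberately not shown here.

```python
# Documented default limits per test size, in seconds.
DEFAULT_LIMITS = {"small": 1.0, "medium": 300.0, "large": 900.0, "xlarge": 900.0}


def resolve_limits(overrides=None):
    """Merge user overrides into the defaults, rejecting invalid values."""
    limits = dict(DEFAULT_LIMITS)
    for size, seconds in (overrides or {}).items():
        if size not in DEFAULT_LIMITS:
            raise ValueError(f"unknown test size: {size!r}")
        if seconds <= 0:
            raise ValueError(f"time limit for {size!r} must be positive")
        limits[size] = float(seconds)
    return limits


def exceeds_limit(size, duration_seconds, limits):
    """Return True when a test of the given size ran longer than allowed."""
    return duration_seconds > limits[size]


limits = resolve_limits({"small": 2})
```

Failing fast on a bad override keeps a typo in configuration from silently disabling timing enforcement.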
Acceptance Criteria:
- Full resource isolation (network, process, database, filesystem, sleep)
- Configurable time limits
- JSON reporting
- Comprehensive documentation
- Final testing and bug verification
- Security audit completed
- Performance benchmarks
Note: All v1.0.0 features are implemented. Release is pending final testing and polish.
Acceptance Criteria:
- Size metadata API for pytest-test-impact
- Combined filtering documentation
- CI optimization examples
- Integration test suite
Acceptance Criteria:
- Custom test categories
- dioxide integration (optional)
- ML-based suggestions
- Flaky test detection
- Code Quality: 100% test coverage maintained
- Security: Zero unpatched vulnerabilities
- Performance: < 1% overhead on test execution
- Documentation: 100% of public API documented
- Integration: Seamless with pytest-test-impact
- Adoption: Used by mutation testing tool users
- Reliability: Zero flaky tests in hermeticity-enforced suites
- Contributors: Growing contributor base
- Issues: < 7 day median response time
- PRs: < 14 day median merge time
- Releases: Monthly patches, quarterly minors
Following Semantic Versioning 2.0.0:
- MAJOR (e.g., 1.0.0 → 2.0.0): Breaking changes to public API
- MINOR (e.g., 1.0.0 → 1.1.0): New features, backward compatible
- PATCH (e.g., 1.0.0 → 1.0.1): Bug fixes, backward compatible
- Patch releases: As needed for bug fixes (1-2 weeks)
- Minor releases: Quarterly for new features
- Major releases: Annually or when breaking changes required
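The MAJOR/MINOR/PATCH rules above can be checked mechanically. The sketch below classifies the step between two plain `X.Y.Z` version strings. `classify_bump` is a hypothetical helper written for this example, and it deliberately ignores pre-release and build metadata suffixes.

```python
def classify_bump(old, new):
    """Return "major", "minor", or "patch" for a version increase."""
    old_parts = tuple(int(part) for part in old.split("."))
    new_parts = tuple(int(part) for part in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected versions of the form X.Y.Z")
    if new_parts <= old_parts:
        raise ValueError(f"{new} is not newer than {old}")
    # The first component that differs determines the kind of release.
    for name, before, after in zip(("major", "minor", "patch"), old_parts, new_parts):
        if after != before:
            return name


examples = {
    ("1.0.0", "2.0.0"): classify_bump("1.0.0", "2.0.0"),
    ("1.0.0", "1.1.0"): classify_bump("1.0.0", "1.1.0"),
    ("1.0.0", "1.0.1"): classify_bump("1.0.0", "1.0.1"),
}
```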
This roadmap is a living document that evolves based on:
- Ecosystem Needs: Integration requirements with pytest-test-impact and mutation testing
- Community Feedback: Your needs and priorities
- Industry Trends: Emerging best practices
- Technical Capabilities: New technologies and approaches
- Share Your Use Case: Open a discussion describing how you use pytest-test-categories
- Propose Features: Use the feature request template
- Vote on Issues: React with 👍 to issues you care about
- Contribute: Submit PRs for features you want to see
- Provide Feedback: Comment on proposed features
Last Updated: November 30, 2025
Next Review: January 2026 (v1.0.0 Release)