The DevOps Execution Engine includes 11 comprehensive DevOps skills.
Skill: k8s-debug
Purpose: Troubleshoot Kubernetes pods, deployments, and nodes
Use cases:
- Diagnose CrashLoopBackOff pods
- Investigate OOMKilled containers
- Check resource usage
- Analyze pod events
- Review logs across multiple pods
Example commands:
"Debug pods in production namespace"
"Why is api-service crashing?"
"Check resource usage for worker nodes"
"Show me logs for all api pods"
Risk level: LOW (read-only diagnosis)
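As a rough illustration of what read-only diagnosis can look like, the sketch below builds a handful of `kubectl` invocations for a first look at a crashing pod. It is not the skill's actual implementation: the function name `diagnosis_commands` and the pod name are hypothetical, and the commands are only assembled, never executed.

```python
from typing import List


def diagnosis_commands(namespace: str, pod: str, tail: int = 100) -> List[List[str]]:
    """Build read-only kubectl invocations for a first look at a failing pod.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    return [
        # Pod status, restart counts, and node placement
        ["kubectl", "get", "pods", "-n", namespace, "-o", "wide"],
        # Events, last state (e.g. OOMKilled), and container exit codes
        ["kubectl", "describe", "pod", pod, "-n", namespace],
        # Logs from the previous (crashed) container instance
        ["kubectl", "logs", pod, "-n", namespace, "--previous", f"--tail={tail}"],
        # Events in the namespace, sorted by last-seen time
        ["kubectl", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp"],
    ]


if __name__ == "__main__":
    # "api-service-abc123" is a made-up pod name for illustration
    for cmd in diagnosis_commands("production", "api-service-abc123"):
        print(" ".join(cmd))
```

Every command here only reads cluster state (`get`, `describe`, `logs`), which is what keeps this kind of diagnosis in the LOW risk band.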
Skill: k8s-deploy
Purpose: Safe Kubernetes deployment workflows with rollback
Use cases:
- Deploy new versions
- Rollback failed deployments
- Blue-green deployments
- Canary releases
- Update deployment configs
Example commands:
"Deploy api-service v2.1.0 to production"
"Rollback the last deployment"
"Show deployment history for api-service"
Risk level: MEDIUM to HIGH (depending on environment)
Skill: argocd-gitops
Purpose: GitOps workflows with ArgoCD
Use cases:
- Check ArgoCD sync status
- Trigger manual sync
- Review application health
- Investigate sync failures
- Manage ArgoCD applications
Example commands:
"Check ArgoCD sync status"
"Sync the api-service app"
"Why did the sync fail?"
Risk level: MEDIUM
Skill: aws-ops
Purpose: AWS operations, queries, and resource management
Use cases:
- List EC2 instances
- Check RDS status
- Review S3 buckets
- Analyze CloudWatch metrics
- Manage IAM resources
Example commands:
"List all EC2 instances"
"Check RDS database status"
"Show S3 bucket sizes"
"What's the CloudWatch alarm status?"
Risk level: LOW (reads) to HIGH (modifications)
Skill: cost-optimization
Purpose: Cloud cost analysis and optimization
Use cases:
- Analyze AWS spending
- Find idle resources
- Identify oversized instances
- Suggest cost savings
- Track cost trends
Example commands:
"Analyze AWS costs this month"
"Find underutilized resources"
"Suggest cost optimizations"
"Show me the top 10 expensive resources"
Risk level: LOW (analysis) to MEDIUM (recommendations)
Skill: terraform-workflow
Purpose: Infrastructure as Code workflows and best practices
Use cases:
- Plan Terraform changes
- Review state
- Validate configurations
- Manage workspaces
- Detect drift
Example commands:
"Run terraform plan"
"Check terraform state"
"Validate terraform config"
"Detect infrastructure drift"
Risk level: LOW (plan) to CRITICAL (apply/destroy)
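The split between LOW and CRITICAL can be pictured as a lookup from Terraform subcommand to risk level. This is a minimal sketch, not the engine's real classifier: `terraform_risk` is a hypothetical name, and rating `validate` and `show` as LOW is an assumption, since the text above only rates `plan`, `apply`, and `destroy`.

```python
RISK_BY_SUBCOMMAND = {
    "plan": "LOW",           # stated above: plan is LOW
    "validate": "LOW",       # assumption: read-only, so treated like plan
    "show": "LOW",           # assumption: read-only, so treated like plan
    "apply": "CRITICAL",     # stated above
    "destroy": "CRITICAL",   # stated above
}


def terraform_risk(subcommand: str) -> str:
    """Return the risk level for a Terraform subcommand.

    Unknown subcommands default to CRITICAL so they always go through review.
    """
    return RISK_BY_SUBCOMMAND.get(subcommand, "CRITICAL")


if __name__ == "__main__":
    for sub in ("plan", "apply", "import"):
        print(sub, terraform_risk(sub))
```

Defaulting unknown subcommands to the highest level is a deliberately conservative choice: a classifier that fails open would undermine the approval step.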
Skill: docker-ops
Purpose: Docker container operations and debugging
Use cases:
- List running containers
- Check container logs
- Inspect container configs
- Analyze image sizes
- Debug networking
Example commands:
"List all running containers"
"Show logs for api container"
"Inspect the nginx image"
"Check container resource usage"
Risk level: LOW (inspect) to MEDIUM (restart/remove)
Skill: incident-response
Purpose: Structured incident response playbooks
Use cases:
- SEV1/SEV2 incident handling
- Service outage response
- High error rate investigation
- Performance degradation
- Security incident response
Example commands:
"We have a SEV1 - API is down"
"High error rates in payment service"
"Database is slow"
"Security incident detected"
Risk level: Varies (diagnosis is LOW, mitigation is HIGH)
Workflow:
1. Triage - Assess severity and impact
2. Diagnose - Identify root cause
3. Mitigate - Generate action plan
4. Approve - Human approves mitigation
5. Execute - Apply fixes
6. Verify - Confirm resolution
7. Document - Create incident report
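The workflow above is strictly ordered, with a human gate before anything is executed. One way to sketch that ordering is a small state machine. The `Stage` enum and `next_stage` function are illustrative names, not part of the engine.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    TRIAGE = 1
    DIAGNOSE = 2
    MITIGATE = 3
    APPROVE = 4
    EXECUTE = 5
    VERIFY = 6
    DOCUMENT = 7


def next_stage(current: Stage, approved: bool = False) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after DOCUMENT.

    Leaving APPROVE requires explicit human approval.
    """
    if current is Stage.APPROVE and not approved:
        raise PermissionError("mitigation plan has not been approved")
    if current is Stage.DOCUMENT:
        return None
    return Stage(current.value + 1)


if __name__ == "__main__":
    stage: Optional[Stage] = Stage.TRIAGE
    while stage is not None:
        print(stage.name)
        stage = next_stage(stage, approved=True)
```

Raising an exception at the approval gate, rather than silently staying put, makes a missing approval hard to overlook in calling code.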
Skill: log-analysis
Purpose: Cross-platform log analysis patterns
Use cases:
- Parse and analyze logs
- Find error patterns
- Correlate events
- Extract metrics
- Identify anomalies
Example commands:
"Analyze logs for errors"
"Show me 5xx responses in the last hour"
"Find slow queries in postgres logs"
"Correlate API errors with database issues"
Risk level: LOW (read-only)
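To make a request like "Show me 5xx responses" concrete, here is a small sketch that counts 5xx responses per request path. The log format is an assumption (an HTTP method, a path, and a three-digit status somewhere in each line), `count_5xx` is a hypothetical helper, and time-window filtering is left out for brevity.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape: "... <METHOD> <path> <status>", e.g. "GET /api/orders 502"
REQUEST_RE = re.compile(r"\b(?:GET|POST|PUT|PATCH|DELETE) (\S+) (\d{3})\b")


def count_5xx(lines: Iterable[str]) -> Counter:
    """Count 5xx responses per request path; lines that do not match are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2).startswith("5"):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2024-01-01T10:00:00Z GET /api/users 200",
        "2024-01-01T10:00:01Z GET /api/orders 502",
        "2024-01-01T10:00:02Z POST /api/orders 500",
        "2024-01-01T10:00:03Z GET /healthz 200",
    ]
    print(count_5xx(sample).most_common())  # [('/api/orders', 2)]
```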
Skill: system-health
Purpose: Quick system health checks (disk, memory, CPU, processes)
Use cases:
- Cluster health overview
- Node resource usage
- Disk space monitoring
- Memory pressure detection
- Process monitoring
Example commands:
"Check system health"
"Show disk usage across nodes"
"Check memory usage"
"Is any node under pressure?"
Risk level: LOW (monitoring only)
Skill: git-workflow
Purpose: Git workflows, branching strategies, and DevOps practices
Use cases:
- Check git status
- Review recent commits
- Manage branches
- Resolve merge conflicts
- CI/CD integration
Example commands:
"Show git status"
"What changed in the last deploy?"
"Check recent commits"
"Create a feature branch"
Risk level: LOW (read) to MEDIUM (commits/pushes)
You: API is returning 500 errors
Clawd uses:
1. system-health - Check cluster nodes
2. k8s-debug - Inspect API pods
3. log-analysis - Analyze error logs
4. incident-response - Structure the response
Clawd: I found the issue:
- Database connection pool exhausted
- API pods hitting memory limits
Would you like me to generate a mitigation plan?
You: yes
Clawd uses:
- k8s-deploy - Generate deployment patch plan
- Generates execution plan to increase resources
[You approve, Clawd executes]
Clawd uses:
- k8s-debug - Verify fix
- log-analysis - Confirm errors stopped
You: Analyze our AWS costs
Clawd uses:
1. aws-ops - List all resources
2. cost-optimization - Analyze usage patterns
3. system-health - Check resource utilization
Clawd: Found potential savings:
- 5 idle EC2 instances ($800/month)
- 3 oversized RDS databases ($1200/month)
- Unattached EBS volumes ($150/month)
Total potential savings: $2150/month
Would you like detailed recommendations?
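The total in this example is the sum of the individual findings. The sketch below reproduces that arithmetic and shows how a threshold such as the `savings_threshold` setting might filter small findings. `total_savings` is a hypothetical helper, and the strict "greater than" comparison is inferred from the config comment ("Only suggest if >$100/month").

```python
from typing import Dict


def total_savings(findings: Dict[str, int], threshold: int = 0) -> int:
    """Sum monthly savings in USD, keeping only findings strictly above `threshold`."""
    return sum(amount for amount in findings.values() if amount > threshold)


if __name__ == "__main__":
    findings = {
        "idle EC2 instances": 800,
        "oversized RDS databases": 1200,
        "unattached EBS volumes": 150,
    }
    print(total_savings(findings))                 # 2150
    print(total_savings(findings, threshold=100))  # 2150 (all findings exceed $100)
```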
In config.yaml:
enabled_skills:
- k8s-debug
- k8s-deploy
- incident-response
- system-health

Leave empty to enable all skills.
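The "empty means all" rule can be sketched as follows. This assumes the config has already been parsed into a Python list. `resolve_enabled` is a hypothetical name, and rejecting unknown skill names is an assumption about desirable behavior, not documented engine behavior.

```python
from typing import Iterable, List, Optional

ALL_SKILLS = [
    "k8s-debug", "k8s-deploy", "argocd-gitops", "aws-ops",
    "cost-optimization", "terraform-workflow", "docker-ops",
    "incident-response", "log-analysis", "system-health", "git-workflow",
]


def resolve_enabled(enabled: Optional[Iterable[str]]) -> List[str]:
    """Return the active skills; an empty or missing list enables every skill."""
    requested = list(enabled or [])
    if not requested:
        return list(ALL_SKILLS)
    unknown = [name for name in requested if name not in ALL_SKILLS]
    if unknown:
        # Assumption: fail loudly on typos instead of silently ignoring them
        raise ValueError(f"unknown skills: {unknown}")
    return requested


if __name__ == "__main__":
    print(resolve_enabled(["k8s-debug", "system-health"]))
    print(len(resolve_enabled([])))  # 11
```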
Some skills support additional configuration:
skills:
  k8s-debug:
    default_namespace: production
    log_tail_lines: 100
  aws-ops:
    default_region: us-east-1
    profile: production
  cost-optimization:
    savings_threshold: 100 # Only suggest if >$100/month

To add a new skill:
- Create a skill directory in `skills/`
- Add `SKILL.md` with documentation
- Create execution plan templates
- Test in isolation
- Submit a PR
See CONTRIBUTING.md for details.
| Skill | Maturity | Test Coverage | Documentation |
|---|---|---|---|
| k8s-debug | Stable | High | Complete |
| k8s-deploy | Stable | High | Complete |
| argocd-gitops | Stable | Medium | Complete |
| aws-ops | Stable | Medium | Complete |
| cost-optimization | Beta | Medium | Complete |
| terraform-workflow | Stable | Medium | Complete |
| docker-ops | Stable | High | Complete |
| incident-response | Stable | High | Complete |
| log-analysis | Stable | Medium | Complete |
| system-health | Stable | High | Complete |
| git-workflow | Stable | Medium | Complete |
- Try each skill in read-only mode first
- Review generated execution plans
- Start with LOW risk operations
- Build confidence over time
- See EXAMPLES.md for detailed usage examples
Each skill follows the same safety model: Plan → Approve → Execute
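A minimal sketch of the Plan → Approve → Execute model, assuming a plan is a list of callables and approval is a callback. `ExecutionPlan` and `run` are illustrative names and do not describe the engine's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExecutionPlan:
    description: str
    risk: str  # "LOW", "MEDIUM", "HIGH", or "CRITICAL"
    steps: List[Callable[[], str]] = field(default_factory=list)


def run(plan: ExecutionPlan, approve: Callable[[ExecutionPlan], bool]) -> List[str]:
    """Execute the plan's steps in order, but only if the approver accepts it."""
    if not approve(plan):
        return []  # rejected: no step is run
    return [step() for step in plan.steps]


if __name__ == "__main__":
    plan = ExecutionPlan(
        description="Increase memory limit for api-service",
        risk="HIGH",
        steps=[lambda: "patched deployment", lambda: "rollout verified"],
    )
    # Stand-in for a human decision: auto-approve only LOW-risk plans
    print(run(plan, approve=lambda p: p.risk == "LOW"))  # []
    print(run(plan, approve=lambda p: True))
```

Keeping approval as a separate callback means the plan can be generated and reviewed in full before any step has side effects.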