Skills Reference

The DevOps Execution Engine includes 11 DevOps skills, grouped into four categories: Kubernetes, Cloud, Infrastructure, and Operations.

Kubernetes Skills

k8s-debug

Purpose: Troubleshoot Kubernetes pods, deployments, and nodes

Use cases:

Diagnose CrashLoopBackOff pods
Investigate OOMKilled containers
Check resource usage
Analyze pod events
Review logs across multiple pods

Example commands:

"Debug pods in production namespace"
"Why is api-service crashing?"
"Check resource usage for worker nodes"
"Show me logs for all api pods"

Risk level: LOW (read-only diagnosis)
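Diagnosing CrashLoopBackOff and OOMKilled pods mostly comes down to reading container statuses from the Kubernetes API. The sketch below is not the skill's implementation. It assumes pod data in the shape returned by `kubectl get pods -o json`, and the helper name `find_unhealthy_containers` is hypothetical.

```python
from typing import Any


def find_unhealthy_containers(pod_list: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (pod, container, reason) for crash-looping or OOM-killed containers.

    `pod_list` is assumed to be parsed output of `kubectl get pods -o json`.
    """
    findings = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            container = status["name"]
            # A container stuck restarting is reported as waiting with this reason.
            waiting = status.get("state", {}).get("waiting", {})
            if waiting.get("reason") == "CrashLoopBackOff":
                findings.append((pod_name, container, "CrashLoopBackOff"))
            # OOMKilled appears on the previous termination of a restarted container.
            last = status.get("lastState", {}).get("terminated", {})
            if last.get("reason") == "OOMKilled":
                findings.append((pod_name, container, "OOMKilled"))
    return findings
```

For example, the output of `kubectl get pods -n production -o json` can be loaded with `json.load` and passed straight in; a single container often shows both reasons when memory limits cause the crash loop.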

k8s-deploy

Purpose: Safe Kubernetes deployment workflows with rollback

Use cases:

Deploy new versions
Rollback failed deployments
Blue-green deployments
Canary releases
Update deployment configs

Example commands:

"Deploy api-service v2.1.0 to production"
"Rollback the last deployment"
"Show deployment history for api-service"

Risk level: MEDIUM to HIGH (depending on environment)
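Rollback and deployment history correspond to standard `kubectl rollout` subcommands. A minimal sketch, assuming the skill drives `kubectl` (the source does not say so); `build_rollout_plan` and the rule that only the `production` namespace is HIGH risk are hypothetical illustrations of "depending on environment". The function only builds the plan, so nothing runs before approval.

```python
from dataclasses import dataclass


@dataclass
class RolloutPlan:
    commands: list[list[str]]  # commands to run in order, not yet executed
    risk: str                  # "LOW", "MEDIUM", or "HIGH"


def build_rollout_plan(action: str, deployment: str, namespace: str) -> RolloutPlan:
    """Build (but do not run) kubectl commands for a rollout action.

    Hypothetical risk rule: read-only history is LOW; a rollback is HIGH in the
    "production" namespace and MEDIUM elsewhere.
    """
    target = f"deployment/{deployment}"
    if action == "history":
        return RolloutPlan([["kubectl", "rollout", "history", target, "-n", namespace]], "LOW")
    if action == "rollback":
        commands = [
            # Revert to the previous revision recorded in the rollout history.
            ["kubectl", "rollout", "undo", target, "-n", namespace],
            # Wait for the rolled-back rollout to finish (or report failure).
            ["kubectl", "rollout", "status", target, "-n", namespace],
        ]
        risk = "HIGH" if namespace == "production" else "MEDIUM"
        return RolloutPlan(commands, risk)
    raise ValueError(f"unsupported action: {action}")
```

After approval, each entry in `commands` could be passed to `subprocess.run(cmd, check=True)` in order.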

argocd-gitops

Purpose: GitOps workflows with ArgoCD

Use cases:

Check ArgoCD sync status
Trigger manual sync
Review application health
Investigate sync failures
Manage ArgoCD applications

Example commands:

"Check ArgoCD sync status"
"Sync the api-service app"
"Why did the sync fail?"

Risk level: MEDIUM
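Sync status, health, and the reason for a failed sync all live in the `status` block of an ArgoCD Application resource. This sketch assumes input parsed from `argocd app get <name> -o json`; the `summarize_app` helper is hypothetical and not part of the skill.

```python
from typing import Any


def summarize_app(app: dict[str, Any]) -> str:
    """One-line summary of an ArgoCD Application resource.

    `app` is assumed to be parsed JSON from `argocd app get <name> -o json`.
    """
    status = app.get("status", {})
    sync = status.get("sync", {}).get("status", "Unknown")      # e.g. Synced, OutOfSync
    health = status.get("health", {}).get("status", "Unknown")  # e.g. Healthy, Degraded
    summary = f"{app['metadata']['name']}: sync={sync} health={health}"
    op = status.get("operationState", {})
    # A failed or errored sync operation carries a human-readable message,
    # which is the first thing to read when asking "why did the sync fail?".
    if op.get("phase") in ("Failed", "Error"):
        summary += f" last_sync_failed: {op.get('message', 'no message')}"
    return summary
```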

Cloud Skills

aws-ops

Purpose: AWS operations, queries, and resource management

Use cases:

List EC2 instances
Check RDS status
Review S3 buckets
Analyze CloudWatch metrics
Manage IAM resources

Example commands:

"List all EC2 instances"
"Check RDS database status"
"Show S3 bucket sizes"
"What's the CloudWatch alarm status?"

Risk level: LOW (reads) to HIGH (modifications)
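A read-only query such as "List all EC2 instances" ends up walking the nested `Reservations` → `Instances` structure that the EC2 API returns. The sketch assumes a response dict shaped like boto3's `ec2.describe_instances()` output; `count_instances_by_state` is a hypothetical helper, and the skill may query AWS differently.

```python
from collections import Counter
from typing import Any


def count_instances_by_state(response: dict[str, Any]) -> Counter:
    """Count EC2 instances per state ("running", "stopped", ...).

    `response` is assumed to have the shape returned by boto3's
    `ec2.describe_instances()`: Reservations -> Instances -> State.Name.
    """
    counts: Counter = Counter()
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            counts[instance["State"]["Name"]] += 1
    return counts
```

Because `describe_instances` paginates, a real caller would feed in each page from `boto3.client("ec2").get_paginator("describe_instances").paginate()` and add the counters together.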

cost-optimization

Purpose: Cloud cost analysis and optimization

Use cases:

Analyze AWS spending
Find idle resources
Identify oversized instances
Suggest cost savings
Track cost trends

Example commands:

"Analyze AWS costs this month"
"Find underutilized resources"
"Suggest cost optimizations"
"Show me the top 10 most expensive resources"

Risk level: LOW (analysis) to MEDIUM (recommendations)

Infrastructure Skills

terraform-workflow

Purpose: Infrastructure as Code workflows and best practices

Use cases:

Plan Terraform changes
Review state
Validate configurations
Manage workspaces
Detect drift

Example commands:

"Run terraform plan"
"Check terraform state"
"Validate terraform config"
"Detect infrastructure drift"

Risk level: LOW (plan) to CRITICAL (apply/destroy)
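Drift checks can stay at the LOW-risk end by relying only on `terraform plan -detailed-exitcode`, which exits 0 when the plan is empty, 1 on error, and 2 when changes are pending. A non-empty plan covers both drift and unapplied configuration edits; adding `-refresh-only` narrows the check to drift alone. This is a sketch under the assumption that Terraform is on `PATH`; the helper names are hypothetical.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` exit codes to a verdict."""
    if code == 0:
        return "no-changes"  # plan succeeded, empty diff
    if code == 2:
        return "changes"     # plan succeeded, non-empty diff (drift or pending edits)
    return "error"           # 1 (or anything unexpected): the plan failed


def check_for_changes(workdir: str) -> str:
    """Run a read-only plan in `workdir`; never applies anything."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```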

docker-ops

Purpose: Docker container operations and debugging

Use cases:

List running containers
Check container logs
Inspect container configs
Analyze image sizes
Debug networking

Example commands:

"List all running containers"
"Show logs for api container"
"Inspect the nginx image"
"Check container resource usage"

Risk level: LOW (inspect) to MEDIUM (restart/remove)
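Listing running containers is an inspect-level operation. `docker ps --format '{{json .}}'` prints one JSON object per container, which is easier to parse reliably than the default table. The `parse_docker_ps` helper below is a hypothetical sketch that keeps only the name, image, and status fields.

```python
import json


def parse_docker_ps(output: str) -> list[dict[str, str]]:
    """Parse `docker ps --format '{{json .}}'` output (one JSON object per line)."""
    containers = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        row = json.loads(line)
        containers.append({"name": row["Names"], "image": row["Image"], "status": row["Status"]})
    return containers
```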

Operations Skills

incident-response

Purpose: Structured incident response playbooks

Use cases:

SEV1/SEV2 incident handling
Service outage response
High error rate investigation
Performance degradation
Security incident response

Example commands:

"We have a SEV1 - API is down"
"High error rates in payment service"
"Database is slow"
"Security incident detected"

Risk level: Varies (diagnosis is LOW, mitigation is HIGH)

Workflow:

Triage - Assess severity and impact
Diagnose - Identify root cause
Mitigate - Generate action plan
Approve - Human approves mitigation
Execute - Apply fixes
Verify - Confirm resolution
Document - Create incident report
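The workflow is strictly ordered, and the Approve step is the gate between generating a mitigation plan and executing it. The sketch below models that ordering as a small state tracker. `Phase` and `Incident` are hypothetical names, and recording the approver as a string is an assumption; the source only says a human approves.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    TRIAGE = 1
    DIAGNOSE = 2
    MITIGATE = 3
    APPROVE = 4
    EXECUTE = 5
    VERIFY = 6
    DOCUMENT = 7


class Incident:
    """Tracks one incident through the seven phases, strictly in order."""

    def __init__(self, severity: str) -> None:
        self.severity = severity
        self.completed: list[Phase] = []
        self.approved_by: Optional[str] = None

    def next_phase(self) -> Optional[Phase]:
        done = len(self.completed)
        # Phase values are 1..7, so the next phase is the value after those completed.
        return Phase(done + 1) if done < len(Phase) else None

    def complete(self, phase: Phase, approver: Optional[str] = None) -> None:
        expected = self.next_phase()
        if phase is not expected:
            raise RuntimeError(f"expected {expected}, got {phase}")
        # Approve must name a human before Execute (the next phase) is allowed.
        if phase is Phase.APPROVE:
            if not approver:
                raise PermissionError("mitigation plan requires a human approver")
            self.approved_by = approver
        self.completed.append(phase)
```

Skipping ahead (for example, calling `complete(Phase.EXECUTE)` right after Mitigate) raises instead of silently applying fixes.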

log-analysis

Purpose: Cross-platform log analysis patterns

Use cases:

Parse and analyze logs
Find error patterns
Correlate events
Extract metrics
Identify anomalies

Example commands:

"Analyze logs for errors"
"Show me 5xx responses in the last hour"
"Find slow queries in postgres logs"
"Correlate API errors with database issues"

Risk level: LOW (read-only)
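A request like "Show me 5xx responses in the last hour" reduces to parsing each log line's timestamp and status code and filtering on both. This sketch assumes access logs in Common/Combined Log Format; other formats (JSON logs, cloud-provider logs) would need a different parser, and `count_5xx` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable

# Common/Combined Log Format, e.g.:
# 10.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /api/users HTTP/1.1" 503 512
LOG_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "[^"]*" (?P<status>\d{3})\b')


def count_5xx(lines: Iterable[str], now: datetime, window: timedelta = timedelta(hours=1)) -> int:
    """Count 5xx responses whose timestamp falls within `window` before `now`.

    `now` must be timezone-aware, because the parsed log timestamps are.
    """
    count = 0
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue  # skip lines that are not in access-log format
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if now - window <= ts <= now and m.group("status").startswith("5"):
            count += 1
    return count
```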

system-health

Purpose: Quick system health checks (disk, memory, CPU, processes)

Use cases:

Cluster health overview
Node resource usage
Disk space monitoring
Memory pressure detection
Process monitoring

Example commands:

"Check system health"
"Show disk usage across nodes"
"Check memory usage"
"Is any node under pressure?"

Risk level: LOW (monitoring only)
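On a single host, a disk-space check needs nothing beyond the standard library; at cluster level, `kubectl top nodes` (which requires metrics-server) gives per-node CPU and memory. The sketch covers only the host-level disk check, and its 85% default threshold is a hypothetical value, not a documented skill setting.

```python
import shutil


def disk_pressure(path: str = "/", threshold_pct: float = 85.0) -> tuple[float, bool]:
    """Return (percent used, over_threshold) for the filesystem holding `path`.

    The 85% default is a hypothetical threshold chosen for illustration.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    pct_used = usage.used / usage.total * 100
    return pct_used, pct_used >= threshold_pct
```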

git-workflow

Purpose: Git workflows, branching strategies, and DevOps practices

Use cases:

Check git status
Review recent commits
Manage branches
Resolve merge conflicts
CI/CD integration

Example commands:

"Show git status"
"What changed in the last deploy?"
"Check recent commits"
"Create a feature branch"

Risk level: LOW (read) to MEDIUM (commits/pushes)
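"Show git status" is a read-only operation, and `git status --porcelain` is the stable, script-friendly form of its output: two status columns (index, then working tree) followed by the path. The `summarize_porcelain` helper is a hypothetical sketch of grouping that output; it does not handle quoted paths or rename arrows, for which `--porcelain -z` is safer.

```python
def summarize_porcelain(output: str) -> dict[str, list[str]]:
    """Group paths from `git status --porcelain` (v1) output."""
    groups: dict[str, list[str]] = {"staged": [], "unstaged": [], "untracked": []}
    for line in output.splitlines():
        if len(line) < 4:
            continue  # every entry is "XY path", so shorter lines are not entries
        x, y, path = line[0], line[1], line[3:]
        if x == "?" and y == "?":
            groups["untracked"].append(path)
            continue
        if x != " ":
            groups["staged"].append(path)    # change recorded in the index
        if y != " ":
            groups["unstaged"].append(path)  # change only in the working tree
    return groups
```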

How Skills Work Together

Example: Complete Incident Response

You: API is returning 500 errors

Clawd uses:
1. system-health - Check cluster nodes
2. k8s-debug - Inspect API pods
3. log-analysis - Analyze error logs
4. incident-response - Structure the response

Clawd: I found the issue:
- Database connection pool exhausted
- API pods hitting memory limits

Would you like me to generate a mitigation plan?

You: yes

Clawd uses:
- k8s-deploy - Generate an execution plan (deployment patch) to increase resources

[You approve, Clawd executes]

Clawd uses:
- k8s-debug - Verify fix
- log-analysis - Confirm errors stopped

Example: Cost Optimization Audit

You: Analyze our AWS costs

Clawd uses:
1. aws-ops - List all resources
2. cost-optimization - Analyze usage patterns
3. system-health - Check resource utilization

Clawd: Found potential savings:
- 5 idle EC2 instances ($800/month)
- 3 oversized RDS databases ($1200/month)
- Unattached EBS volumes ($150/month)

Total potential savings: $2150/month

Would you like detailed recommendations?
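The savings total in the example above is a plain sum ($800 + $1200 + $150 = $2150). The sketch below reproduces that arithmetic and applies the `savings_threshold` setting under `cost-optimization` in config.yaml, assuming it means "drop findings at or below this many dollars per month" as its config comment suggests. `Finding` and `total_savings` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    description: str
    monthly_savings_usd: float


def total_savings(findings: list[Finding], savings_threshold: float = 100) -> tuple[list[Finding], float]:
    """Keep findings above the threshold and sum their monthly savings.

    Assumes savings_threshold means "only suggest if > threshold per month".
    """
    kept = [f for f in findings if f.monthly_savings_usd > savings_threshold]
    return kept, sum(f.monthly_savings_usd for f in kept)
```

With the three findings from the audit example and the configured threshold of 100, all three are kept and the total is 2150; raising the threshold to 200 would drop the EBS finding.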

Skill Configuration

Enable Specific Skills

In config.yaml:

enabled_skills:
  - k8s-debug
  - k8s-deploy
  - incident-response
  - system-health

Leave the list empty to enable all skills.
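The `enabled_skills` rule can be sketched as a small resolver over an already-parsed config dict (for example, the result of PyYAML's `yaml.safe_load`). Treating a missing key like an empty list, and rejecting unknown skill names, are assumptions; `resolve_enabled_skills` is a hypothetical helper.

```python
ALL_SKILLS = [
    "k8s-debug", "k8s-deploy", "argocd-gitops",
    "aws-ops", "cost-optimization",
    "terraform-workflow", "docker-ops",
    "incident-response", "log-analysis", "system-health", "git-workflow",
]


def resolve_enabled_skills(config: dict) -> list[str]:
    """Apply the enabled_skills rule: an empty (or, by assumption, missing) list enables all."""
    # `enabled_skills:` with no items parses as None in YAML, so treat it like [].
    requested = config.get("enabled_skills") or []
    if not requested:
        return list(ALL_SKILLS)
    unknown = [s for s in requested if s not in ALL_SKILLS]
    if unknown:
        raise ValueError(f"unknown skills in enabled_skills: {unknown}")
    # Preserve the canonical skill order regardless of the order in config.yaml.
    return [s for s in ALL_SKILLS if s in requested]
```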

Skill-Specific Config

Some skills support additional configuration:

skills:
  k8s-debug:
    default_namespace: production
    log_tail_lines: 100
    
  aws-ops:
    default_region: us-east-1
    profile: production
    
  cost-optimization:
    savings_threshold: 100  # Only suggest if >$100/month

Adding Custom Skills

1. Create a skill directory under skills/
2. Add a SKILL.md file that documents the skill
3. Create execution plan templates
4. Test the skill in isolation
5. Submit a pull request

See CONTRIBUTING.md for details.
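Before submitting, a quick local check can confirm that each directory under skills/ includes the required SKILL.md. The discovery logic below is a hypothetical sketch, not the engine's actual skill loader; it reports directories that are still missing their documentation.

```python
from pathlib import Path


def discover_skills(skills_dir: str = "skills") -> tuple[list[str], list[str]]:
    """Return (valid, missing_doc) skill directory names under `skills_dir`."""
    valid: list[str] = []
    missing_doc: list[str] = []
    for entry in sorted(Path(skills_dir).iterdir()):
        if not entry.is_dir():
            continue  # loose files in skills/ are not skills
        if (entry / "SKILL.md").is_file():
            valid.append(entry.name)
        else:
            missing_doc.append(entry.name)
    return valid, missing_doc
```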

Skill Maturity

Skill	Maturity	Test Coverage	Documentation
k8s-debug	Stable	High	Complete
k8s-deploy	Stable	High	Complete
argocd-gitops	Stable	Medium	Complete
aws-ops	Stable	Medium	Complete
cost-optimization	Beta	Medium	Complete
terraform-workflow	Stable	Medium	Complete
docker-ops	Stable	High	Complete
incident-response	Stable	High	Complete
log-analysis	Stable	Medium	Complete
system-health	Stable	High	Complete
git-workflow	Stable	Medium	Complete

Next Steps

Try each skill in read-only mode first
Review generated execution plans
Start with LOW risk operations
Build confidence over time
See EXAMPLES.md for detailed usage examples

Each skill follows the same safety model: Plan → Approve → Execute
