machine-ops
The machine-ops Claude Code skill provides lifecycle management (restart, pause, resume), real-time metrics analysis, and backup operations for server instances. Load this skill when diagnosing server health issues, monitoring system performance trends, or managing machine state changes. It includes a structured diagnostic framework that progressively investigates SSH connectivity, resource utilization, Docker health, system logs, and network status to identify root causes.
git clone --depth 1 https://github.com/nixopus/nixopus /tmp/machine-ops && cp -r /tmp/machine-ops/api/skills/machine-ops ~/.claude/skills/machine-ops
SKILL.md
# Machine Operations

## Lifecycle Management

You can check and control the machine instance state:

- get_machine_lifecycle_status → current state (Running, Paused, Stopped), PID, uptime
- restart_machine → restart the instance (requires user approval)
- pause_machine → pause the instance (requires user approval)
- resume_machine → resume a paused instance (requires user approval)

Always check get_machine_lifecycle_status before performing restart/pause/resume.

## Metrics & Events

- get_machine_metrics → historical time-series metrics (CPU, memory, disk, network)
- get_machine_metrics_summary → summarized averages, peaks, and trends
- get_machine_events → lifecycle events (restarts, failures, state changes)

Use metrics for trend analysis and incident correlation. Use get_machine_stats for a point-in-time snapshot.

## Backups

- get_backup_schedule → current backup schedule configuration
- update_backup_schedule → modify backup frequency, retention, timing
- list_machine_backups → list available backups with timestamps and status
- trigger_machine_backup → create an immediate backup (requires approval)

## Diagnostic Layers (IN ORDER, stop on root cause)

1. get_servers_ssh_status → reachable?
2. get_machine_stats → CPU, RAM, disk, load, uptime
3. Anomalies:
   - mem>90% → host_exec "ps aux --sort=-%mem | head -20"
   - disk>85% → host_exec "du -sh /var/lib/docker/* 2>/dev/null | sort -rh | head -10"
   - cpu>80% → host_exec "ps aux --sort=-%cpu | head -20"
   - load>2x cores → overloaded
4. Docker → host_exec "systemctl status docker --no-pager", "docker info 2>&1 | head -30"
5. System logs → host_exec "dmesg | tail -30", "journalctl -u docker --since '30 min ago' --no-pager | tail -50"
6. Proxy/domain: follow domain-tls-routing skill. Caddy status/logs/validate via host_exec. For domain CRUD or reachability checks, defer to Infrastructure Agent.
7. Network → host_exec "ss -tlnp"
8. Cleanup → host_exec "docker system df"

Root cause: bold summary, evidence in code block, fix in 1-2 sentences.

No anomalies: report healthy with key metrics.
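
The threshold rules in step 3 of the diagnostic layers can be sketched as a small lookup, shown here in Python purely for illustration. The `Stats` dataclass and its field names are hypothetical: the skill does not document the shape of the get_machine_stats response, so treat them as assumptions. The thresholds and follow-up commands are copied from the list above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Thresholds from step 3 of the diagnostic layers.
MEM_PCT = 90.0
DISK_PCT = 85.0
CPU_PCT = 80.0
LOAD_PER_CORE = 2.0


@dataclass
class Stats:
    """Hypothetical shape of a get_machine_stats result (field names are assumed)."""
    cpu_pct: float
    mem_pct: float
    disk_pct: float
    load_avg: float
    cores: int


def triage(stats: Stats) -> List[Tuple[str, Optional[str]]]:
    """Return (anomaly, follow-up host_exec command) pairs.

    An empty list means no threshold was exceeded, so the investigation
    would move on to the Docker layer (step 4).
    """
    findings: List[Tuple[str, Optional[str]]] = []
    if stats.mem_pct > MEM_PCT:
        findings.append(("mem", "ps aux --sort=-%mem | head -20"))
    if stats.disk_pct > DISK_PCT:
        findings.append(
            ("disk", "du -sh /var/lib/docker/* 2>/dev/null | sort -rh | head -10")
        )
    if stats.cpu_pct > CPU_PCT:
        findings.append(("cpu", "ps aux --sort=-%cpu | head -20"))
    if stats.load_avg > LOAD_PER_CORE * stats.cores:
        # The source only labels this case "overloaded"; no command is given.
        findings.append(("load", None))
    return findings
```

The comparisons are strict, matching the `>` in the rules, so a reading of exactly 90% memory does not count as an anomaly in this sketch.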
Reference for all Nixopus API operations callable via nixopus_api(method, path, body)
Generate Caddyfile configurations for static sites and reverse proxies — SPA fallback routing, cache headers, compression, redirects, and error pages. Use when deploying a static site that needs custom Caddy configuration, or when the user needs SPA routing, caching, or redirect rules.
Generate docker-compose.yml for multi-service setups including databases, caches, and service dependencies. Use when the app needs a database, cache, message broker, or has multiple independently deployable services.
Size container memory and CPU limits, diagnose OOM kills and CPU throttling, and recommend resource adjustments by ecosystem. Use when containers are being OOM-killed, running slowly, or when setting initial resource limits for a deployment.
Build and deploy C/C++ applications — CMake, Meson, Ninja, and Dockerfile patterns. Use when deploying a C or C++ project, or when CMakeLists.txt or meson.build is detected.
Run database migrations safely during deployment — framework-specific commands, pre-deploy vs post-deploy timing, health gates, and rollback strategies. Use when the app has a database migration system and needs migrations run during deployment.
Build and deploy Deno applications — version detection, dependency caching, and Dockerfile patterns. Use when deploying a Deno project, or when deno.json or deno.jsonc is detected.
Sub-agent routing table — which agent handles diagnostics, machine health, infrastructure, GitHub, billing, and notifications. Load when the current task is not a direct deployment.