ClaudeWave
Skill · 1.4k repo stars · updated 27d ago

domain-tls-routing

**domain-tls-routing** diagnoses failures in domain resolution, TLS certificate provisioning, and request routing. Use this skill when a domain fails to resolve, TLS certificates don't provision, the proxy returns 502/503/504 errors, or custom domains remain stuck in pending status. It systematizes troubleshooting across DNS configuration, certificate issuance, and routing layer health to identify whether issues originate from external DNS settings, certificate provisioning failures, or internal routing misconfigurations.

Install in Claude Code
git clone --depth 1 https://github.com/nixopus/nixopus /tmp/domain-tls-routing && cp -r /tmp/domain-tls-routing/api/skills/domain-tls-routing ~/.claude/skills/domain-tls-routing
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Domain, TLS, and Routing Diagnosis

## CRITICAL: Never Reveal Internals

The user knows ONLY that they have "a server" with deployed apps and domains. They must NEVER learn about:

- The existence of multiple proxy layers, edge proxies, or any layered routing architecture
- VMs, Firecracker, guest IPs, infra hosts, tenant hosts, or that their "server" is a virtual machine
- SSH tunnels, Caddy admin API, Caddy config JSON, server blocks, route matchers, or handler arrays
- Abyss, provisioning system internals, reconciler, pending removal sets, extension domain hashes
- Cloudflare API, zone IDs, or that the system manages DNS records on the user's behalf behind the scenes
- Redis queues, KrakenD, internal service routing, upstream host resolution, or multi-layer proxy hops
- Any internal IP addresses, internal ports (like 2019), or internal service names

Banned phrases in user-facing output: "edge proxy", "VM proxy", "VM", "virtual machine", "infra host", "guest IP", "edge Caddy", "upstream dial", "Caddy admin API", "reconciler", "Cloudflare zone", "two-layer", "proxy layer", "on-demand TLS", "ACME challenge on", "registration queue", "server block", "route matcher".

Always say: "your server", "your domain", "DNS settings", "TLS certificate", "routing configuration", "proxy service".
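
As a rough illustration of how a draft reply could be screened against the banned list, here is a minimal sketch. The helper name `find_banned_phrases` is hypothetical and not part of the skill; only the phrase list comes from the text above.

```python
import re

# Phrases copied from the banned list above.
BANNED_PHRASES = [
    "edge proxy", "VM proxy", "VM", "virtual machine", "infra host",
    "guest IP", "edge Caddy", "upstream dial", "Caddy admin API",
    "reconciler", "Cloudflare zone", "two-layer", "proxy layer",
    "on-demand TLS", "ACME challenge on", "registration queue",
    "server block", "route matcher",
]

# Word boundaries keep short entries such as "VM" from matching inside other words.
_BANNED_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in BANNED_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_banned_phrases(message: str) -> list[str]:
    """Return the banned phrases found in a draft message (hypothetical helper)."""
    return sorted({m.group(1).lower() for m in _BANNED_RE.finditer(message)})
```

A non-empty result would mean the draft should be reworded with the approved vocabulary ("your server", "routing configuration", and so on) before it is sent.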

## How Routing Works (Agent-Only Context — NEVER Describe This to Users)

Requests reach the user's app through a routing chain. The agent needs to understand this to diagnose issues, but must never describe the architecture to the user.

The routing chain has an outer layer and an inner layer. The outer layer receives public traffic and forwards it to the correct server. The inner layer runs on the user's server and routes to the correct application container.

The outer layer handles wildcard TLS for `*.nixopus.ai` subdomains and forwards custom domain traffic. The inner layer handles per-application routing and TLS for application-specific domains.

DNS records (A and wildcard A) are managed by the system for `*.nixopus.ai` subdomains. Custom domains require the user to set up a CNAME pointing to their assigned `subdomain.nixopus.ai`.

When diagnosing, check from outside in: public reachability first, then server-level proxy config, then container-level app health. If the outer layer is misconfigured, the user can't fix it — escalate internally. If the inner layer is misconfigured, use `proxy_config` and domain tools to fix it.
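
The outside-in ordering can be sketched as a small helper that reports the outermost failing check. The check labels and the function name are assumptions made for illustration; the skill itself only prescribes the order.

```python
from typing import Optional

# Check order taken from the paragraph above: outside in.
# The labels themselves are hypothetical.
CHECK_ORDER = ["public_reachability", "server_proxy_config", "container_app_health"]


def first_failing_check(results: dict[str, bool]) -> Optional[str]:
    """Return the outermost failing check, or None if every check passed.

    A check with no recorded result is treated as failing, so the
    diagnosis never skips past a layer that was not actually examined.
    """
    for name in CHECK_ORDER:
        if not results.get(name, False):
            return name
    return None
```

Stopping at the outermost failure avoids chasing container-level symptoms that are only a consequence of a problem further out.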

## Domain Types

| Type | Example | How it works |
|---|---|---|
| Auto-generated subdomain | `a1b2c3d4.example.nixopus.ai` | Created during app deployment; DNS is pre-configured |
| Custom domain | `app.userdomain.com` | User adds CNAME pointing to their `subdomain.nixopus.ai`; requires DNS verification |
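
A minimal sketch of telling the two domain types apart by suffix. This is an assumption for illustration only: the function name is hypothetical, and the status returned by `get_domains` remains the source of truth.

```python
def classify_domain(domain: str, platform_suffix: str = ".nixopus.ai") -> str:
    """Classify a hostname as 'auto_generated' or 'custom' by its suffix."""
    # Normalize case and drop a trailing root dot before comparing.
    host = domain.strip().rstrip(".").lower()
    return "auto_generated" if host.endswith(platform_suffix) else "custom"
```

For example, `classify_domain("a1b2c3d4.example.nixopus.ai")` yields `"auto_generated"`, while `classify_domain("app.userdomain.com")` yields `"custom"`.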

## Domain Lifecycle

### Auto-generated subdomain

1. `generate_random_subdomain` creates an 8-char prefix + org domain
2. Domain added to application via `add_application_domain`
3. Server proxy registers the route (domain → container)
4. Wildcard DNS already covers `*.subdomain.nixopus.ai`
5. TLS provisioned automatically on first request

Failure points: step 3 (route registration fails), step 5 (TLS provisioning fails if DNS doesn't resolve to the server).
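
Step 1 can be pictured with the sketch below. It is not the implementation of `generate_random_subdomain`: the character set (lowercase letters and digits) and the function name are assumptions, since the text only specifies an 8-character prefix plus the org domain.

```python
import secrets
import string

# Assumption: lowercase letters and digits; the source only says "8-char prefix".
ALPHABET = string.ascii_lowercase + string.digits


def random_subdomain(org_domain: str, length: int = 8) -> str:
    """Build '<random prefix>.<org domain>' (illustrative only)."""
    prefix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{prefix}.{org_domain}"
```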

### Custom domain

1. User provides domain name
2. System returns DNS instructions:
   - CNAME: `app.userdomain.com` → `subdomain.nixopus.ai`
   - TXT: `_nixopus-verify.app.userdomain.com` → verification token
3. User configures DNS at their provider
4. Verification checks CNAME/A records and TXT record
5. On success: status moves to `dns_verified`, routing configured
6. Application domain binding adds the route on the server
7. TLS provisioned on first request via ACME

Failure points: step 3 (user misconfigures DNS), step 4 (DNS propagation delay), steps 5-6 (routing registration fails), step 7 (ACME challenge fails because DNS doesn't resolve correctly).
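
The DNS instructions from step 2 can be sketched as a pair of records. The `DnsRecord` type and the function name are hypothetical; only the record names and the `_nixopus-verify` prefix come from the steps above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsRecord:
    record_type: str
    name: str
    value: str


def custom_domain_dns_records(domain: str, target: str, token: str) -> list[DnsRecord]:
    """Build the CNAME and TXT records the user must create for a custom domain."""
    return [
        # Points the custom domain at the assigned platform subdomain.
        DnsRecord("CNAME", domain, target),
        # Carries the verification token checked in step 4.
        DnsRecord("TXT", f"_nixopus-verify.{domain}", token),
    ]
```

Calling it with `"app.userdomain.com"`, `"subdomain.nixopus.ai"`, and a placeholder token produces the two records to relay to the user in step 2.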

## Diagnostic Flows

### Domain not resolving (user reports "site can't be reached")

1. **Check domain status**
   - `get_domains` to find the domain and its current status
   - If status is `pending_dns`: DNS not yet configured or verified — guide user through DNS setup
   - If status is `dns_verified`: DNS is good, problem is downstream

2. **Check DNS resolution**
   - `network_diagnostics` with type `dns` targeting the domain
   - Expected: resolves to the server's public IP
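   - If it fails for a custom domain: the user's DNS is misconfigured or has not propagated yet; the CNAME should point to `subdomain.nixopus.ai`
   - If it fails for an auto-generated domain: DNS is system-managed and should resolve automatically, so the user cannot fix it; escalate internally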

3. **Check reachability**
   - `http_probe` the domain on port 443
   - If DNS resolves but HTTP fails: routing or TLS issue (continue below)

Tell the user: "Your domain's DNS is not pointing to the correct server" or "DNS is configured correctly but there's a routing issue on the server."
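
The flow above can be condensed into a small decision function. This is a sketch under stated assumptions: the function name and boolean inputs are hypothetical stand-ins for the results of `get_domains`, `network_diagnostics`, and `http_probe`, and the messages are the two quoted above.

```python
from typing import Optional


def triage_unreachable(status: str, dns_resolves: bool, https_ok: bool) -> Optional[str]:
    """Map the three checks to a user-facing message, or None if nothing is wrong."""
    # Steps 1 and 2: unverified or non-resolving DNS is reported first.
    if status == "pending_dns" or not dns_resolves:
        return "Your domain's DNS is not pointing to the correct server."
    # Step 3: DNS is fine but the HTTPS probe failed.
    if not https_ok:
        return "DNS is configured correctly but there's a routing issue on the server."
    return None
```

The DNS branch comes first because neither routing nor TLS can be judged until the domain resolves.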

### TLS certificate errors (ERR_CERT, SSL_ERROR, mixed content)

1. **Verify DNS first** — TLS provisioning requires DNS to resolve to the server
   - `network_diagnostics` type `dns` on the domain
   - If DNS doesn't resolve: TLS can't be provisioned, fix DNS first

2. **Check proxy config**
   - `proxy_config` for the application
   - If `tls_enabled` is false: TLS not configured for this route
   - If `tls_enabled` is true but cert errors persist: certificate provisioning may have failed

3. **Check HTTP vs HTTPS**
   - `http_probe` on port 80 (HTTP) — if it works but 443 doesn't, TLS provisioning failed
   - `http_probe` on port 443 (HTTPS) — if cert error, the certificate is invalid or missing

4. **Common TLS failure causes**

| Symptom | Cause | Fix |
|---|---|---|
| `ERR_CERT_AUTHORITY_INVALID` | Certificate not yet provisioned or provisioning failed | Verify DNS points to the server; wait a few minutes for automatic provisioning |
| `ERR_CERT_COMMON_NAME_INVALID` | Certificate issued for wrong domain | Check the domain binding matches the actual domain name |
| `SSL_ERROR_RX_RECORD_TOO_LONG` | App serving