azure-diagnostics
**WORKFLOW SKILL** — Debug and troubleshoot Azure production issues: Container Apps + Function Apps diagnostics, KQL log analysis, health checks. WHEN: 'debug production issues', 'troubleshoot container apps', 'troubleshoot function apps', 'image pull failures', 'cold start issues', 'health probe failures'. DO NOT USE FOR: pre-deployment validation (azure-validate), cost analysis (azure-cost-optimization).
git clone --depth 1 https://github.com/jonathan-vella/apex /tmp/azure-diagnostics && cp -r /tmp/azure-diagnostics/.github/skills/azure-diagnostics ~/.claude/skills/azure-diagnosticsSKILL.md
# Azure Diagnostics
> **AUTHORITATIVE GUIDANCE — MANDATORY COMPLIANCE**
>
> This document is the **official source** for debugging and troubleshooting Azure production issues. Follow these instructions to diagnose and resolve common Azure service problems systematically.
## Triggers
Activate this skill when user wants to:
- Debug or troubleshoot production issues
- Diagnose errors in Azure services
- Analyze application logs or metrics
- Fix image pull, cold start, or health probe issues
- Investigate why Azure resources are failing
- Find root cause of application errors
- Troubleshoot Azure Function Apps (invocation failures, timeouts, binding errors)
- Find the App Insights or Log Analytics workspace linked to a Function App
## Rules
1. Start with systematic diagnosis flow
2. Use AppLens (MCP) for AI-powered diagnostics when available
3. Check resource health before deep-diving into logs
4. Select appropriate troubleshooting guide based on service type
5. Document findings and attempted remediation steps
---
## Steps
1. **Identify symptoms** - What's failing?
2. **Check resource health** - Is Azure healthy?
3. **Review logs** - What do logs show?
4. **Analyze metrics** - Performance patterns?
5. **Investigate recent changes** - What changed?
---
## Troubleshooting Guides by Service
| Service | Common Issues | Reference |
| ------------------ | --------------------------------------------------------------------------------------------- | ------------------------------------------------------ |
| **Container Apps** | Image pull failures, cold starts, health probes, port mismatches | [container-apps/](references/container-apps/README.md) |
| **Function Apps** | App details, invocation failures, timeouts, binding errors, cold starts, missing app settings | [functions/](references/functions/README.md) |
---
## Quick Reference
### Common Diagnostic Commands
```bash
# Check resource health
az resource show --ids RESOURCE_ID
# View activity log
az monitor activity-log list -g RG --max-events 20
# Container Apps logs
az containerapp logs show --name APP -g RG --follow
# Function App logs (query App Insights traces)
az monitor app-insights query --apps APP-INSIGHTS -g RG \
--analytics-query "traces | where timestamp > ago(1h) | order by timestamp desc | take 50"
```
### AppLens (MCP Tools)
For AI-powered diagnostics, use:
```
mcp_azure-mcp_applens
intent: "diagnose issues with <resource-name>"
command: "diagnose"
parameters:
resourceId: "<resource-id>"
Provides:
- Automated issue detection
- Root cause analysis
- Remediation recommendations
```
### Azure Monitor (MCP Tools)
For querying logs and metrics:
```
mcp_azure-mcp_monitor
intent: "query logs for <resource-name>"
command: "logs_query"
parameters:
workspaceId: "<workspace-id>"
query: "<KQL-query>"
```
See [kql-queries.md](references/kql-queries.md) for common diagnostic queries.
---
## Check Azure Resource Health
### Using MCP
```
mcp_azure-mcp_resourcehealth
intent: "check health status of <resource-name>"
command: "get"
parameters:
resourceId: "<resource-id>"
```
### Using CLI
```bash
# Check specific resource health
az resource show --ids RESOURCE_ID
# Check recent activity
az monitor activity-log list -g RG --max-events 20
```
---
## References
- [KQL Query Library](references/kql-queries.md)
- [Azure Resource Graph Queries](references/azure-resource-graph.md)
- [InfraOps KQL Templates](references/infraops-kql-templates.md) — custom Azure Resource Graph and Log Analytics queries
- [InfraOps Health Checks](references/infraops-health-checks.md) — per-resource-type diagnostic commands
- [InfraOps Remediation Playbooks](references/infraops-remediation-playbooks.md) — 6-phase diagnostic workflow
- [Function Apps Troubleshooting](references/functions/README.md)
## Reference Index
Load these on demand — do NOT read all at once:
| Reference | When to Load |
| ---------------------------------------------- | ------------------------------ |
| `references/azure-resource-graph.md` | Azure Resource Graph |
| `references/infraops-health-checks.md` | Infraops Health Checks |
| `references/infraops-kql-templates.md` | Infraops Kql Templates |
| `references/infraops-remediation-playbooks.md` | Infraops Remediation Playbooks |
| `references/kql-queries.md` | Kql Queries |Guidance for instrumenting webapps with Azure Application Insights. Provides telemetry patterns, SDK setup, and configuration references. WHEN: how to instrument app, App Insights SDK, telemetry patterns, what is App Insights, Application Insights guidance, instrumentation examples, APM best practices.
Use for Azure AI: Search, Speech, OpenAI, Document Intelligence. Helps with search, vector/hybrid search, speech-to-text, text-to-speech, transcription, OCR. WHEN: AI Search, query search, vector search, hybrid search, semantic search, speech-to-text, text-to-speech, transcribe, OCR, convert text to speech.
Configure Azure API Management as an AI Gateway for AI models, MCP tools, and agents. WHEN: semantic caching, token limit, content safety, load balancing, AI model governance, MCP rate limiting, jailbreak detection, add Azure OpenAI backend, add AI Foundry model, test AI gateway, LLM policies, configure AI backend, token metrics, AI cost control, convert API to MCP, import OpenAPI to gateway.
ROUTING SKILL — delegates to specialized diagram skills. USE FOR: any diagram request when the caller does not know which tool to use. Routes to drawio, python-diagrams, or mermaid based on diagram type.
Build and deploy GitHub Copilot SDK apps to Azure. WHEN: build copilot app, create copilot app, copilot SDK, @github/copilot-sdk, scaffold copilot project, copilot-powered app, deploy copilot app, host on azure, azure model, BYOM, bring your own model, use my own model, azure openai model, DefaultAzureCredential, self-hosted model, copilot SDK service, chat app with copilot, copilot-sdk-service template, azd init copilot, CopilotClient, createSession, sendAndWait, GitHub Models API.
Troubleshoot and resolve issues with Azure Messaging SDKs for Event Hubs and Service Bus. Covers connection failures, authentication errors, message processing issues, and SDK configuration problems. WHEN: event hub SDK error, service bus SDK issue, messaging connection failure, AMQP error, event processor host issue, message lock lost, send timeout, receiver disconnected, SDK troubleshooting, azure messaging SDK, event hub consumer, service bus queue issue, topic subscription error, enable logging event hub, service bus logging, eventhub python, servicebus java, eventhub javascript, servicebus dotnet, event hub checkpoint, event hub not receiving messages, service bus dead letter.
Authoritative reference for VS Code Copilot customization mechanisms: instructions, prompt files, custom agents, agent skills, MCP servers, hooks, and plugins. Use when deciding which customization type to use, creating new .instructions.md/.prompt.md/.agent.md/SKILL.md/mcp.json files from scratch, or debugging why a customization is not loading. DO NOT USE FOR: routine file edits where the format is already known.
Provides canonical entity counts from count-manifest.json. Use when agents need to reference how many agents, skills, instructions, or validators exist. Prevents hard-coded counts. WHEN: agent count, skill count, how many agents, how many skills, entity inventory, project statistics.