Skip to main content
ClaudeWave
Skill1.1k repo starsupdated 4d ago

cloudwatch

This CloudWatch skill enables monitoring and observability of AWS resources through metrics collection, log querying with CloudWatch Insights, alarm configuration, and dashboard creation. Use it when setting up application monitoring, creating automated alerts based on performance thresholds, troubleshooting issues through log analysis, configuring metric filters, or building custom dashboards to track AWS environment health and performance.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/itsmostafa/aws-agent-skills /tmp/cloudwatch && cp -r /tmp/cloudwatch/skills/cloudwatch ~/.claude/skills/cloudwatch
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# AWS CloudWatch

Amazon CloudWatch provides monitoring and observability for AWS resources and applications. It collects metrics, logs, and events, enabling you to monitor, troubleshoot, and optimize your AWS environment.

## Table of Contents

- [Core Concepts](#core-concepts)
- [Common Patterns](#common-patterns)
- [CLI Reference](#cli-reference)
- [Best Practices](#best-practices)
- [Troubleshooting](#troubleshooting)
- [References](#references)

## Core Concepts

### Metrics

Time-ordered data points published to CloudWatch. Key components:
- **Namespace**: Container for metrics (e.g., `AWS/Lambda`)
- **Metric name**: Name of the measurement (e.g., `Invocations`)
- **Dimensions**: Name-value pairs for filtering (e.g., `FunctionName=MyFunc`)
- **Statistics**: Aggregations (Sum, Average, Min, Max, SampleCount, pN)

### Logs

Log data from AWS services and applications:
- **Log groups**: Collections of log streams
- **Log streams**: Sequences of log events from same source
- **Log events**: Individual log entries with timestamp and message

### Alarms

Automated actions based on metric thresholds:
- **States**: OK, ALARM, INSUFFICIENT_DATA
- **Actions**: SNS notifications, Auto Scaling, EC2 actions

## Common Patterns

### Create a Metric Alarm

**AWS CLI:**

```bash
# CPU utilization alarm for EC2
aws cloudwatch put-metric-alarm \
  --alarm-name "HighCPU-i-1234567890abcdef0" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
  --ok-actions arn:aws:sns:us-east-1:123456789012:alerts
```

**boto3:**

```python
import boto3

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_alarm(
    AlarmName='HighCPU-i-1234567890abcdef0',
    MetricName='CPUUtilization',
    Namespace='AWS/EC2',
    Statistic='Average',
    Period=300,
    Threshold=80.0,
    ComparisonOperator='GreaterThanThreshold',
    EvaluationPeriods=2,
    Dimensions=[
        {'Name': 'InstanceId', 'Value': 'i-1234567890abcdef0'}
    ],
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:alerts'],
    OKActions=['arn:aws:sns:us-east-1:123456789012:alerts']
)
```

### Lambda Error Rate Alarm

```bash
aws cloudwatch put-metric-alarm \
  --alarm-name "LambdaErrorRate-MyFunction" \
  --metrics '[
    {
      "Id": "errors",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/Lambda",
          "MetricName": "Errors",
          "Dimensions": [{"Name": "FunctionName", "Value": "MyFunction"}]
        },
        "Period": 60,
        "Stat": "Sum"
      },
      "ReturnData": false
    },
    {
      "Id": "invocations",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/Lambda",
          "MetricName": "Invocations",
          "Dimensions": [{"Name": "FunctionName", "Value": "MyFunction"}]
        },
        "Period": 60,
        "Stat": "Sum"
      },
      "ReturnData": false
    },
    {
      "Id": "errorRate",
      "Expression": "errors/invocations*100",
      "Label": "Error Rate",
      "ReturnData": true
    }
  ]' \
  --threshold 5 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
```

### Query Logs with Insights

```bash
# Find errors in Lambda logs
aws logs start-query \
  --log-group-name /aws/lambda/MyFunction \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string '
    fields @timestamp, @message
    | filter @message like /ERROR/
    | sort @timestamp desc
    | limit 50
  '

# Get query results
aws logs get-query-results --query-id <query-id>
```

**boto3:**

```python
import boto3
import time

logs = boto3.client('logs')

# Start query
response = logs.start_query(
    logGroupName='/aws/lambda/MyFunction',
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString='''
        fields @timestamp, @message
        | filter @message like /ERROR/
        | sort @timestamp desc
        | limit 50
    '''
)

query_id = response['queryId']

# Wait for results
while True:
    result = logs.get_query_results(queryId=query_id)
    if result['status'] == 'Complete':
        break
    time.sleep(1)

for row in result['results']:
    print(row)
```

### Create Metric Filter

Extract metrics from log patterns:

```bash
# Create metric filter for error count
aws logs put-metric-filter \
  --log-group-name /aws/lambda/MyFunction \
  --filter-name ErrorCount \
  --filter-pattern "ERROR" \
  --metric-transformations \
    metricName=ErrorCount,metricNamespace=MyApp,metricValue=1,defaultValue=0
```

### Publish Custom Metrics

```python
import boto3

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[
        {
            'MetricName': 'OrdersProcessed',
            'Value': 1,
            'Unit': 'Count',
            'Dimensions': [
                {'Name': 'Environment', 'Value': 'Production'},
                {'Name': 'OrderType', 'Value': 'Standard'}
            ]
        }
    ]
)
```

### Create Dashboard

```bash
cat > dashboard.json << 'EOF'
{
  "widgets": [
    {
      "type": "metric",
      "x": 0, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "Lambda Invocations",
        "metrics": [
          ["AWS/Lambda", "Invocations", "FunctionName", "MyFunction"]
        ],
        "period": 60,
        "stat": "Sum",
        "region": "us-east-1"
      }
    },
    {
      "type": "log",
      "x": 12, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "Recent Errors",
        "query": "SOURCE '/aws/lambda/MyFunction' | filter @message like /ERROR/ | limit 20",
        "region": "us-east-1"
      }
    }
  ]
}
EOF

aws cloudwatch put-dashboard \
  --dashboard-name MyAppDas
api-gatewaySkill

AWS API Gateway for REST and HTTP API management. Use when creating APIs, configuring integrations, setting up authorization, managing stages, implementing rate limiting, or troubleshooting API issues.

bedrockSkill

AWS Bedrock foundation models for generative AI. Use when invoking foundation models, building AI applications, creating embeddings, configuring model access, or implementing RAG patterns.

cloudformationSkill

AWS CloudFormation infrastructure as code for stack management. Use when writing templates, deploying stacks, managing drift, troubleshooting deployments, or organizing infrastructure with nested stacks.

cognitoSkill

AWS Cognito user authentication and authorization service. Use when setting up user pools, configuring identity pools, implementing OAuth flows, managing user attributes, or integrating with social identity providers.

dynamodbSkill

AWS DynamoDB NoSQL database for scalable data storage. Use when designing table schemas, writing queries, configuring indexes, managing capacity, implementing single-table design, or troubleshooting performance issues.

ec2Skill

>

ecsSkill

AWS ECS container orchestration for running Docker containers. Use when deploying containerized applications, configuring task definitions, setting up services, managing clusters, or troubleshooting container issues.

eksSkill

AWS EKS Kubernetes management for clusters, node groups, and workloads. Use when creating clusters, configuring IRSA, managing node groups, deploying applications, or integrating with AWS services.