ai-observability
This Spring Boot skill provides observability for AI applications by enabling built-in Micrometer instrumentation in Spring AI 1.0+. Use it to monitor LLM operations, track token consumption through auto-generated OpenTelemetry metrics, configure Prometheus endpoint exposure, and implement custom AI metrics with configurable prompt and completion logging for production deployments.
Install in Claude Code
`git clone --depth 1 https://github.com/rrezartprebreza/spring-boot-skills /tmp/ai-observability && cp -r /tmp/ai-observability/skills/ai-observability ~/.claude/skills/ai-observability`
Then open a new Claude Code session; the skill loads automatically.
Definition
SKILL.md
# AI Observability
## Dependencies
```xml
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```
## Spring AI Built-in Observability
Spring AI 1.0+ includes built-in Micrometer instrumentation:
```yaml
spring:
  ai:
    chat:
      observations:
        log-prompt: true       # GA renamed include-prompt → log-prompt. OFF in prod (PII).
        log-completion: true   # GA renamed include-completion → log-completion
management:
  metrics:
    tags:
      application: order-service
  endpoints:
    web:
      exposure:
        include: health,prometheus,metrics
```
Auto-generated metrics (OpenTelemetry GenAI semantic conventions):
- `gen_ai.client.operation` — model call latency, tagged with provider and model
- `gen_ai.client.token.usage` — token counts (input/output/total)
- `spring.ai.chat.client` — ChatClient-level operation timer/span
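Once these meters are scraped through the Prometheus registry, the dotted names appear in snake_case with type-specific suffixes. The sketch below approximates that mapping so you know what to search for in Prometheus. `PrometheusNames` and its methods are hypothetical helpers, not part of Micrometer, and the suffix rules are a simplification of Micrometer's naming convention; check `/actuator/prometheus` for the names your registry version actually emits.

```java
import java.util.List;

// Hypothetical helper: approximates how Micrometer's Prometheus registry renames meters.
// Simplifying assumptions: dots become underscores, timers expose *_seconds_{count,sum,max},
// and counters get a _total suffix.
final class PrometheusNames {

    // Converts a dotted meter name to its snake_case base name.
    static String base(String meterName) {
        return meterName.replace('.', '_');
    }

    // Series names a timer is expected to publish under the assumptions above.
    static List<String> timerSeries(String meterName) {
        String b = base(meterName) + "_seconds";
        return List.of(b + "_count", b + "_sum", b + "_max");
    }

    // Series name a counter is expected to publish under the assumptions above.
    static String counterSeries(String meterName) {
        String b = base(meterName);
        return b.endsWith("_total") ? b : b + "_total";
    }

    public static void main(String[] args) {
        System.out.println(timerSeries("gen_ai.client.operation"));
        System.out.println(counterSeries("gen_ai.client.token.usage"));
    }
}
```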
## Custom AI Metrics
```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.util.function.Supplier;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Component;

@Component
@RequiredArgsConstructor
public class AiMetrics {

    private final MeterRegistry meterRegistry;

    // Times the supplied call. Timer.record(Supplier) is used instead of recordCallable,
    // which declares a checked Exception and would not compile here.
    public <T> T track(String operation, String model, Supplier<T> call) {
        return Timer.builder("ai.prompt.latency")
                .description("LLM prompt latency")
                .tag("operation", operation)
                .tag("model", model)
                .register(meterRegistry) // returns the existing timer if already registered
                .record(call);
    }

    // Records input and output tokens as two series of the same counter, split by "type".
    public void recordTokens(String operation, String model, int inputTokens, int outputTokens) {
        tokenCounter(operation, model, "input").increment(inputTokens);
        tokenCounter(operation, model, "output").increment(outputTokens);
    }

    private Counter tokenCounter(String operation, String model, String type) {
        return Counter.builder("ai.tokens.used")
                .description("Total tokens consumed")
                .tag("operation", operation)
                .tag("model", model)
                .tag("type", type)
                .register(meterRegistry);
    }
}
```
## Prompt/Response Logging Advisor
Spring AI 1.0 GA replaced the advisor API: `CallAroundAdvisor` → `CallAdvisor`, `AdvisedRequest` →
`ChatClientRequest`, and `AdvisedResponse` → `ChatClientResponse`. Separately, `Usage.getGenerationTokens()`
became `getCompletionTokens()`. Agents tend to generate the pre-GA names, which do not compile on 1.0.
```java
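import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClientRequest;
import org.springframework.ai.chat.client.ChatClientResponse;
import org.springframework.ai.chat.client.advisor.api.CallAdvisor;
import org.springframework.ai.chat.client.advisor.api.CallAdvisorChain;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.core.Ordered;
import org.springframework.stereotype.Component;

@Component
public class AiAuditAdvisor implements CallAdvisor {

    private static final Logger log = LoggerFactory.getLogger(AiAuditAdvisor.class);

    @Override
    public ChatClientResponse adviseCall(ChatClientRequest request, CallAdvisorChain chain) {
        String requestId = UUID.randomUUID().toString();
        long start = System.currentTimeMillis();
        // Log only the prompt length, never the prompt text, to keep PII out of the audit log
        log.info("[AI-AUDIT] requestId={} promptLength={}",
                requestId, request.prompt().getUserMessage().getText().length());
        try {
            ChatClientResponse response = chain.nextCall(request);
            long latency = System.currentTimeMillis() - start;
            ChatResponse chatResponse = response.chatResponse();
            if (chatResponse != null && chatResponse.getMetadata() != null) {
                Usage usage = chatResponse.getMetadata().getUsage();
                log.info("[AI-AUDIT] requestId={} latencyMs={} inputTokens={} outputTokens={}",
                        requestId, latency,
                        usage.getPromptTokens(), usage.getCompletionTokens()); // GA: not getGenerationTokens()
            }
            return response;
        } catch (RuntimeException e) {
            log.error("[AI-AUDIT] requestId={} FAILED after {}ms", requestId,
                    System.currentTimeMillis() - start, e);
            throw e;
        }
    }

    @Override
    public String getName() { return "AiAuditAdvisor"; }

    @Override
    public int getOrder() { return Ordered.LOWEST_PRECEDENCE; }
}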
```
## Cost Estimation
```java
import java.util.Map;
import org.springframework.stereotype.Service;

@Service
public class AiCostEstimator {

    // Prices in USD per million tokens. Treat these as a snapshot: verify against each
    // provider's current price list and update when pricing changes.
    private static final Map<String, double[]> PRICING = Map.of(
            "claude-sonnet-4-20250514", new double[]{3.0, 15.0}, // [input, output] per 1M tokens
            "claude-haiku-4-5-20251001", new double[]{0.8, 4.0},
            "gpt-4o", new double[]{5.0, 15.0},
            "gpt-4o-mini", new double[]{0.15, 0.6}
    );

    public double estimateCost(String model, int inputTokens, int outputTokens) {
        // Unknown models fall back to {5.0, 15.0}, so an unlisted model is still costed
        double[] prices = PRICING.getOrDefault(model, new double[]{5.0, 15.0});
        return (inputTokens * prices[0] + outputTokens * prices[1]) / 1_000_000;
    }
}
```
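To make the per-million-token arithmetic concrete, here is a minimal standalone sketch. It is a variant rather than the estimator above: `CostSketch`, the model name `example-model`, and its prices are hypothetical, and unknown models return an empty result instead of silently falling back to a default price.

```java
import java.util.Map;
import java.util.OptionalDouble;

// Sketch under stated assumptions: placeholder model name and prices, no Spring wiring.
final class CostSketch {

    // [input, output] USD per 1M tokens (illustrative values only)
    private static final Map<String, double[]> PRICING = Map.of(
            "example-model", new double[]{3.0, 15.0});

    // Returns the estimated cost in USD, or empty when the model has no price entry.
    static OptionalDouble estimateCost(String model, long inputTokens, long outputTokens) {
        double[] prices = PRICING.get(model);
        if (prices == null) {
            return OptionalDouble.empty();
        }
        double usd = (inputTokens * prices[0] + outputTokens * prices[1]) / 1_000_000.0;
        return OptionalDouble.of(usd);
    }

    public static void main(String[] args) {
        // 12,000 input and 800 output tokens:
        // (12000 * 3.0 + 800 * 15.0) / 1,000,000 = 48000 / 1,000,000 = 0.048 USD
        System.out.println(estimateCost("example-model", 12_000, 800));
        System.out.println(estimateCost("unknown-model", 1, 1).isPresent());
    }
}
```

Returning an empty value makes a missing price entry visible to the caller, which may be preferable when the estimate feeds a billing or budget alert; the silent default is simpler when a rough figure is enough.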
## Structured AI Audit Log (DB)
```java
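import jakarta.persistence.*;
import java.time.Instant;
import java.util.UUID;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Entity
@Table(name = "ai_audit_log")
public class AiAuditLog {
    @Id @GeneratedValue(strategy = GenerationType.UUID)
    private UUID id;
    private String operation;
    private String model;
    private int inputTokens;
    private int outputTokens;
    private double estimatedCostUsd;
    private long latencyMs;
    private boolean success;
    private Instant createdAt;
    // getters and setters omitted
}

// Assumes a Spring Data repository named AiAuditLogRepository (not shown) and @EnableAsync on a
// configuration class. @Async only takes effect when the method is called from another bean.
@Service
class AiAuditLogWriter {
    private final AiAuditLogRepository auditLogRepository;

    AiAuditLogWriter(AiAuditLogRepository auditLogRepository) {
        this.auditLogRepository = auditLogRepository;
    }

    // Async to avoid blocking main flow
    @Async
    public void saveAuditLog(AiAuditLog entry) {
        auditLogRepository.save(entry);
    }
}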
```
## application.yml — Full Observability
```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,prometheus,metrics,info
  metrics:
    distribution:
      percentiles-histogram:
        ai.prompt.latency: true   # publishes histogram buckets so P50/P95/P99 can be computed
  tracing:
    sampling:
      probability: 1.0            # 100% trace sampling in dev, reduce in prod
logging:
  level:
    org.springframework.ai: DEBUG # enable in dev only
```
## Gotchas
- Agent implements `CallAroundAdvisor`/`AdvisedRequest` — removed in GA; use `CallAdvisor`/`ChatClientRequest`
- Agent calls `usage.getGenerationT