langchain4j-testing-strategies
This Claude Code skill provides unit testing, integration testing, and AI mocking patterns for LangChain4j applications in Java. It includes strategies for mocking LLM responses, testing retrieval and RAG workflows with Testcontainers, and validating AI services. Use it when building unit tests for LangChain4j services, integration testing with containerized models such as Ollama, or testing LLM-based Java applications without calling external APIs.
git clone --depth 1 https://github.com/giuseppe-trisciuoglio/developer-kit /tmp/langchain4j-testing-strategies && cp -r /tmp/langchain4j-testing-strategies/plugins/developer-kit-java/skills/langchain4j-testing-strategies ~/.claude/skills/langchain4j-testing-strategies
# LangChain4j Testing Strategies
## Overview
Patterns for unit testing with mocks, integration testing with Testcontainers, and end-to-end validation of RAG systems, AI Services, and tool execution.
## When to Use
- **Unit testing AI services**: When you need fast, isolated tests for services using LangChain4j AiServices
- **Integration testing LangChain4j components**: When testing real ChatModel, EmbeddingModel, or RAG pipelines with Testcontainers
- **Mocking AI models**: When you need deterministic responses without calling external APIs
- **Testing LLM-based Java applications**: When validating RAG workflows, tool execution, or retrieval chains
## Instructions
### 1. Unit Testing with Mocks
Use mock models for fast, isolated testing. See `references/unit-testing.md`.
```java
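// Assumes the LangChain4j 1.x API, where AiServices calls ChatModel.chat(ChatRequest).
ChatModel mockModel = mock(ChatModel.class);
when(mockModel.chat(any(ChatRequest.class)))
    .thenReturn(ChatResponse.builder()
        .aiMessage(AiMessage.from("Mocked response"))
        .build());

// AiService is the application's own interface, e.g. with a String chat(String) method.
var service = AiServices.builder(AiService.class)
    .chatModel(mockModel)
    .build();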
```
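Where Mockito is not available, or where a test needs to inspect what was sent to the model, a hand-written fake can serve the same purpose. The following is a minimal sketch in plain Java: `TextModel` and `FakeTextModel` are hypothetical names introduced for illustration and are not part of LangChain4j.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal model abstraction, standing in for a real chat model interface.
interface TextModel {
    String chat(String prompt);
}

// Hand-written fake: returns canned answers by keyword and records every prompt it receives.
final class FakeTextModel implements TextModel {
    private final Map<String, String> cannedByKeyword = new LinkedHashMap<>();
    private final List<String> receivedPrompts = new ArrayList<>();
    private final String defaultAnswer;

    FakeTextModel(String defaultAnswer) {
        this.defaultAnswer = defaultAnswer;
    }

    FakeTextModel whenPromptContains(String keyword, String answer) {
        cannedByKeyword.put(keyword, answer);
        return this;
    }

    @Override
    public String chat(String prompt) {
        receivedPrompts.add(prompt);
        // The first registered keyword found in the prompt wins.
        for (Map.Entry<String, String> entry : cannedByKeyword.entrySet()) {
            if (prompt.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return defaultAnswer;
    }

    List<String> receivedPrompts() {
        return List.copyOf(receivedPrompts);
    }
}
```

A test can then assert both on the returned answer and on `receivedPrompts()`, which is useful for checking that prompt templates were filled in as expected.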
### 2. Configure Testing Dependencies
Set up Maven/Gradle dependencies. See `references/testing-dependencies.md`.
- `langchain4j-test` - Guardrail assertions
- `testcontainers` - Containerized testing
- `mockito` - Mock external dependencies
- `assertj` - Fluent assertions
### 3. Integration Testing with Testcontainers
Test with real services. See `references/integration-testing.md`.
```java
@Testcontainers
class OllamaIntegrationTest {

    @Container
    static GenericContainer<?> ollama = new GenericContainer<>(
            DockerImageName.parse("ollama/ollama:0.5.4"))
        .withExposedPorts(11434)
        // Health check: Ollama's root endpoint returns 200 once the server is up
        .waitingFor(Wait.forHttp("/").forPort(11434)
            .withStartupTimeout(Duration.ofSeconds(30)));

    @Test
    void shouldGenerateResponse() {
        // Verify container is healthy
        assertTrue(ollama.isRunning());

        // GenericContainer has no getEndpoint(); build the URL from host and mapped port
        ChatModel model = OllamaChatModel.builder()
            .baseUrl("http://" + ollama.getHost() + ":" + ollama.getMappedPort(11434))
            .modelName("llama3.2") // assumed name; the model must already be pulled into the container
            .build();

        // Verify model responds before running tests (LangChain4j 1.x: chat(String) returns String)
        assertDoesNotThrow(() -> model.chat("ping"));

        String response = model.chat("Test query");
        assertNotNull(response);
    }
}
```
### 4. Advanced Features
Patterns for streaming, memory, and error handling are in `references/advanced-testing.md`.
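As an illustration of the streaming case, the sketch below collects tokens from a fake streaming model and blocks until completion or a timeout. It uses only the JDK; `TokenListener`, `FakeStreamingModel`, and `StreamingTestSupport` are hypothetical names, and the callback shape is a simplified assumption rather than LangChain4j's actual streaming handler.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical callback shape with a token/complete/error split.
interface TokenListener {
    void onToken(String token);
    void onComplete();
    void onError(Throwable error);
}

// Fake streaming model that emits a fixed token sequence on a background thread.
final class FakeStreamingModel {
    private final List<String> tokens;

    FakeStreamingModel(List<String> tokens) {
        this.tokens = List.copyOf(tokens);
    }

    void stream(String prompt, TokenListener listener) {
        CompletableFuture.runAsync(() -> {
            try {
                tokens.forEach(listener::onToken);
                listener.onComplete();
            } catch (RuntimeException e) {
                listener.onError(e);
            }
        });
    }
}

final class StreamingTestSupport {
    // Joins streamed tokens into one string; fails if the stream does not finish in time.
    static String collect(FakeStreamingModel model, String prompt, long timeoutMillis) {
        StringBuilder buffer = new StringBuilder();
        CompletableFuture<String> done = new CompletableFuture<>();
        model.stream(prompt, new TokenListener() {
            @Override public void onToken(String token) { buffer.append(token); }
            @Override public void onComplete() { done.complete(buffer.toString()); }
            @Override public void onError(Throwable error) { done.completeExceptionally(error); }
        });
        return done.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).join();
    }
}
```

Bridging the callbacks into a `CompletableFuture` keeps the test body synchronous, so ordinary assertions can run on the collected text.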
### 5. Testing Workflow
Follow the testing pyramid from `references/workflow-patterns.md`:
- **70% Unit Tests**: Fast, isolated with mocks
- **20% Integration Tests**: Real services with health checks
- **10% End-to-End Tests**: Complete workflows
```
70% Unit Tests ─ Mock ChatModel, guardrails, edge cases
20% Integration Tests ─ Testcontainers, vector stores, RAG
10% End-to-End Tests ─ Complete user journeys
```
### Troubleshooting
- **Container fails to start**: Check Docker daemon is running, verify image exists, increase timeout
- **Model not responding**: Verify baseUrl is correct, check container logs, ensure model is loaded
- **Test timeout**: Increase `@Timeout` duration for slow models, check container resource limits
- **Flaky tests**: Add retry logic or health checks before assertions
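The "health checks before assertions" advice can be sketched as a small polling helper for projects that do not have Awaitility on the classpath. `Readiness.waitUntil` is a hypothetical helper written for this example, using only the JDK.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class Readiness {
    // Polls the condition until it holds or the timeout elapses; returns whether it became true.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        long start = System.nanoTime();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - start >= timeout.toNanos()) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would typically call this once in a setup method, for example with a condition that probes the service, and fail fast with a clear message when it returns `false` instead of letting later assertions time out.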
## Examples
### Unit Test
```java
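@Test
void shouldProcessQueryWithMock() {
    // Assumes LangChain4j 1.x: AiServices invokes ChatModel.chat(ChatRequest)
    ChatModel mockModel = mock(ChatModel.class);
    when(mockModel.chat(any(ChatRequest.class)))
        .thenReturn(ChatResponse.builder()
            .aiMessage(AiMessage.from("Test response"))
            .build());

    // AiService is the application's own interface with a String chat(String) method
    var service = AiServices.builder(AiService.class)
        .chatModel(mockModel)
        .build();

    String result = service.chat("What is Java?");
    assertEquals("Test response", result);
}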
```
### Integration Test with Testcontainers
```java
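@Testcontainers
class RAGIntegrationTest {

    // Model names below are assumptions; both must already be pulled into the container.
    @Container
    static GenericContainer<?> ollama = new GenericContainer<>(
            DockerImageName.parse("ollama/ollama:0.5.4"))
        .withExposedPorts(11434)
        // Ollama's root endpoint returns 200 once the server is up
        .waitingFor(Wait.forHttp("/").forPort(11434)
            .withStartupTimeout(Duration.ofSeconds(60)));

    @Test
    void shouldCompleteRAGWorkflow() {
        assertTrue(ollama.isRunning());
        // GenericContainer has no getEndpoint(); build the URL from host and mapped port
        String baseUrl = "http://" + ollama.getHost() + ":" + ollama.getMappedPort(11434);

        var chatModel = OllamaChatModel.builder()
            .baseUrl(baseUrl)
            .modelName("llama3.2")
            .build();
        var embeddingModel = OllamaEmbeddingModel.builder()
            .baseUrl(baseUrl)
            .modelName("nomic-embed-text")
            .build();

        // Seed the store so retrieval has something to return
        var store = new InMemoryEmbeddingStore<TextSegment>();
        var segment = TextSegment.from("Spring Boot is a Java framework for building applications.");
        store.add(embeddingModel.embed(segment).content(), segment);

        // The retriever takes a store and an embedding model; it has no chat model setting
        var retriever = EmbeddingStoreContentRetriever.builder()
            .embeddingStore(store)
            .embeddingModel(embeddingModel)
            .build();
        var assistant = AiServices.builder(RagAssistant.class)
            .chatModel(chatModel)
            .contentRetriever(retriever)
            .build();

        String response = assistant.chat("What is Spring Boot?");
        assertNotNull(response);
        assertTrue(response.contains("Spring"));
    }
}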
```
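Retrieval ranking can also be checked at unit-test speed, without a container, by substituting a deterministic stand-in for the embedding model. The sketch below uses term-frequency vectors and cosine similarity purely as an assumption for illustration; `ToyRetriever` is a hypothetical class and does not reflect how LangChain4j or Ollama compute embeddings.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ToyRetriever {
    // Deterministic stand-in for an embedding: lowercase word counts.
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    // Returns the document most similar to the query.
    static String retrieveBest(String query, List<String> documents) {
        Map<String, Integer> queryVector = embed(query);
        return documents.stream()
                .max(Comparator.comparingDouble(doc -> cosine(queryVector, embed(doc))))
                .orElseThrow();
    }
}
```

Because the vectors are fully deterministic, a test can assert on the exact document returned for a query, leaving the containerized test to cover the behavior of the real models.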
## Best Practices
- Use `@BeforeEach`/`@AfterEach` for test isolation
- Never call real APIs in unit tests; use mocks
- Include `@Timeout` for external service calls
- Test both success and error handling scenarios
- Validate response coherence and edge cases
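To make the "success and error handling" practice concrete, here is a minimal sketch of a service wrapper whose failure path is testable with a throwing stub. `ResilientAssistant` and its fallback message are hypothetical and exist only for this example; the model is represented as a plain `Function<String, String>`.

```java
import java.util.function.Function;

// Hypothetical service wrapper: delegates to a model function and degrades gracefully on failure.
final class ResilientAssistant {
    static final String FALLBACK = "Sorry, the assistant is unavailable right now.";

    private final Function<String, String> model;

    ResilientAssistant(Function<String, String> model) {
        this.model = model;
    }

    String chat(String question) {
        // Edge case: reject blank input before it reaches the model.
        if (question == null || question.isBlank()) {
            throw new IllegalArgumentException("question must not be blank");
        }
        try {
            return model.apply(question);
        } catch (RuntimeException e) {
            // Error path: the model call failed, so return a stable fallback.
            return FALLBACK;
        }
    }
}
```

A suite built on this shape would cover three cases: a stub that answers normally, a stub that throws, and blank input that should be rejected before any model call.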
## Common Patterns
### Mock Strategy
```java
// Stubs for direct calls to ChatModel.chat(String) (LangChain4j 1.x); later stubbings take precedence.
ChatModel mockModel = mock(ChatModel.class);
when(mockModel.chat(anyString())).thenReturn("Mocked");
when(mockModel.chat(eq("Hello"))).thenReturn("Hi");
when(mockModel.chat(contains("Java"))).thenReturn("Java");
```
### Assertion Helpers
```java
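assertThat(response).isNotNull().isNotEmpty();
// expectedKeywords is a List<String>; AssertJ string assertions use contains(Iterable), not containsAll
assertThat(response).contains(expectedKeywords);
assertThat(response).doesNotContain("error");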
```
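Since LLM output varies in casing and wording, a keyword check that ignores case and reports what is missing tends to produce clearer failures than a bare boolean. The helper below is a sketch in plain Java; `ResponseChecks.missingKeywords` is a hypothetical name, not part of AssertJ or LangChain4j.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ResponseChecks {
    // Returns the expected keywords that do not appear in the response (case-insensitive).
    static List<String> missingKeywords(String response, List<String> expectedKeywords) {
        String haystack = response == null ? "" : response.toLowerCase(Locale.ROOT);
        List<String> missing = new ArrayList<>();
        for (String keyword : expectedKeywords) {
            if (!haystack.contains(keyword.toLowerCase(Locale.ROOT))) {
                missing.add(keyword);
            }
        }
        return missing;
    }
}
```

In a test, asserting that the returned list is empty yields a failure message that names exactly which keywords were absent.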
## Reference Documentation
- **[Testing Dependencies](references/testing-dependencies.md)** - Maven/Gradle configuration
- **[Unit Testing](references/unit-testing.md)** - Mock