DivInit: More Effective Agentic Search Without Retraining Models

When scaling a search agent by executing multiple trajectories in parallel, intuition suggests that more attempts should yield richer answers. In practice, however, models tend to formulate very similar initial queries, so each thread retrieves overlapping documents and the trajectories barely diverge after the first step. The result: diminishing returns as more rollouts are added, with little new information gained.

That is exactly the problem addressed by DivInit (arXiv:2606.17209), published this week by researchers from Carnegie Mellon University.

The Problem: Redundancy in the First Query

Inference-time scaling for agentic search has two classic dimensions: depth (more turns and tokens per trajectory) and width (more parallel trajectories). The work focuses on the width axis and precisely diagnoses its Achilles heel.

When a model generates k queries independently for k rollouts, natural sampling variance is insufficient to ensure each thread explores a different angle of the question. The resulting queries are semantically similar, retrieved passages overlap, and trajectories end up being redundant. All the additional compute does little good.
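One way to make this redundancy visible is to measure how much the documents retrieved by each rollout overlap. The following is a minimal sketch, not the paper's metric: it assumes each rollout's first retrieval is available as a set of document IDs, and the function names and sample IDs are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_overlap(retrieved: list[set[str]]) -> float:
    """Average Jaccard similarity over all pairs of rollouts.

    Values near 1.0 mean the rollouts fetched almost the same documents,
    so the extra trajectories add little new evidence.
    """
    pairs = list(combinations(retrieved, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical document IDs returned by the first query of three rollouts.
rollouts = [
    {"d1", "d2", "d3"},
    {"d1", "d2", "d4"},
    {"d1", "d2", "d3"},
]

overlap = mean_pairwise_overlap(rollouts)
```

A high score on the first retrieval step is a cheap signal that width is being spent on near-duplicate trajectories.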

The Solution: Deliberate Diversity at the Start

DivInit solves this with a surgical intervention and zero training cost. Instead of sampling k queries independently, the method:

1. Generates n query candidates in a single model call (n > k).
2. Selects k seeds by maximizing diversity among them.
3. Launches each seed as an independent parallel trajectory.

The effect is that each thread starts from a different angle in the search space, retrieves complementary evidence, and arrives at less correlated reasoning. The final aggregation thus has more non-redundant information to work with.
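The selection step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the article does not say which diversity criterion DivInit uses, so the sketch swaps in greedy max-min (farthest-point) selection over a Jaccard token distance. The candidate queries are hypothetical stand-ins for the output of the single model call, and a real system would more likely compare embeddings.

```python
def tokens(query: str) -> set[str]:
    """Lowercased whitespace tokens of a query."""
    return set(query.lower().split())


def distance(a: str, b: str) -> float:
    """Jaccard distance between token sets (assumed proxy for semantic distance)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    similarity = len(ta & tb) / len(union) if union else 1.0
    return 1.0 - similarity


def select_diverse(candidates: list[str], k: int) -> list[str]:
    """Greedily pick up to k candidates, each maximizing its minimum
    distance to the seeds already chosen (farthest-point heuristic)."""
    if k <= 0 or not candidates:
        return []
    selected = [candidates[0]]  # start from the model's first candidate
    remaining = candidates[1:]
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: min(distance(c, s) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical n = 4 candidates from one model call; we keep k = 3 seeds.
candidates = [
    "who founded acme corp",
    "who founded acme corporation",
    "acme corp headquarters city",
    "population of springfield 2020",
]

seeds = select_diverse(candidates, k=3)
```

Here the near-duplicate "who founded acme corporation" is the candidate left out. Each remaining seed would then be launched as its own trajectory.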

Results: Five to Seven Points in Multi-Hop QA

The authors evaluate DivInit on five open-source models and eight benchmarks. The results are consistent: average gains of five to seven points on multi-hop QA compared to standard parallel sampling, at an equivalent compute budget. This is not a marginal improvement for a specific use case; it replicates across all models and most benchmarks.

What matters from a practical standpoint is the cost of adoption: none. DivInit requires no fine-tuning, does not alter the agent architecture, and can be applied as a layer on top of any existing agentic framework. Code is available at github.com/cxcscmu/diverse-query-initialization.

Who This Matters For

The finding has relevance at several levels:

Teams building search agents on Claude Code with specialized sub-agents for information retrieval: DivInit offers a concrete way to improve response quality without increasing their token budget.
Developers of multi-hop RAG pipelines: the technique directly attacks the weak point of these systems when scaling through voting or best-of-k.
Researchers studying test-time compute: the paper adds evidence to the discussion on when width scaling is worthwhile and when compute is wasted due to lack of diversity.

The work also has implications for how hooks and sub-agents are designed in environments like Claude Code: if multiple search sub-agents are launched in parallel for the same task, coordinating their initial queries can make the difference between obtaining complementary perspectives and getting the same result five times over.

Broader Context

The idea of diversifying exploration is not new in search or planning, but applying it so directly to the first query of each agentic trajectory, and demonstrating it empirically at this scale, is what gives the paper its value. The community has spent months discussing the limits of width scaling; DivInit offers an operational answer, not just a theoretical one.

---

From our perspective, this is a solid result precisely because it is simple to implement and generalizable. If you are running agentic searches with multiple rollouts, it is worth reviewing the repository before investing in more compute.
