guidance
Guidance is a constrained generation framework from Microsoft Research that controls large language model outputs through regex patterns and formal grammars to guarantee valid JSON, XML, and code generation while enforcing structured data formats and reducing latency compared to traditional prompting. Use it when you need deterministic structured outputs, must prevent invalid formats, require guaranteed syntax compliance, or need to build multi-step workflows with programmatic control flow.
git clone --depth 1 https://github.com/NousResearch/hermes-agent /tmp/guidance && cp -r /tmp/guidance/optional-skills/mlops/guidance ~/.claude/skills/guidanceSKILL.md
# Guidance: Constrained LLM Generation
## When to Use This Skill
Use Guidance when you need to:
- **Control LLM output syntax** with regex or grammars
- **Guarantee valid JSON/XML/code** generation
- **Reduce latency** vs traditional prompting approaches
- **Enforce structured formats** (dates, emails, IDs, etc.)
- **Build multi-step workflows** with Pythonic control flow
- **Prevent invalid outputs** through grammatical constraints
**GitHub Stars**: 18,000+ | **From**: Microsoft Research
## Installation
```bash
# Base installation
pip install guidance
# With specific backends
pip install guidance[transformers] # Hugging Face models
pip install guidance[llama_cpp] # llama.cpp models
```
## Quick Start
### Basic Example: Structured Generation
```python
from guidance import models, gen
# Load model (supports OpenAI, Transformers, llama.cpp)
lm = models.OpenAI("gpt-4")
# Generate with constraints
result = lm + "The capital of France is " + gen("capital", max_tokens=5)
print(result["capital"]) # "Paris"
```
### With Anthropic Claude
```python
from guidance import models, gen, system, user, assistant
# Configure Claude
lm = models.Anthropic("claude-sonnet-4-5-20250929")
# Use context managers for chat format
with system():
lm += "You are a helpful assistant."
with user():
lm += "What is the capital of France?"
with assistant():
lm += gen(max_tokens=20)
```
## Core Concepts
### 1. Context Managers
Guidance uses Pythonic context managers for chat-style interactions.
```python
from guidance import system, user, assistant, gen
lm = models.Anthropic("claude-sonnet-4-5-20250929")
# System message
with system():
lm += "You are a JSON generation expert."
# User message
with user():
lm += "Generate a person object with name and age."
# Assistant response
with assistant():
lm += gen("response", max_tokens=100)
print(lm["response"])
```
**Benefits:**
- Natural chat flow
- Clear role separation
- Easy to read and maintain
### 2. Constrained Generation
Guidance ensures outputs match specified patterns using regex or grammars.
#### Regex Constraints
```python
from guidance import models, gen
lm = models.Anthropic("claude-sonnet-4-5-20250929")
# Constrain to valid email format
lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
# Constrain to date format (YYYY-MM-DD)
lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}")
# Constrain to phone number
lm += "Phone: " + gen("phone", regex=r"\d{3}-\d{3}-\d{4}")
print(lm["email"]) # Guaranteed valid email
print(lm["date"]) # Guaranteed YYYY-MM-DD format
```
**How it works:**
- Regex converted to grammar at token level
- Invalid tokens filtered during generation
- Model can only produce matching outputs
#### Selection Constraints
```python
from guidance import models, gen, select
lm = models.Anthropic("claude-sonnet-4-5-20250929")
# Constrain to specific choices
lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")
# Multiple-choice selection
lm += "Best answer: " + select(
["A) Paris", "B) London", "C) Berlin", "D) Madrid"],
name="answer"
)
print(lm["sentiment"]) # One of: positive, negative, neutral
print(lm["answer"]) # One of: A, B, C, or D
```
### 3. Token Healing
Guidance automatically "heals" token boundaries between prompt and generation.
**Problem:** Tokenization creates unnatural boundaries.
```python
# Without token healing
prompt = "The capital of France is "
# Last token: " is "
# First generated token might be " Par" (with leading space)
# Result: "The capital of France is Paris" (double space!)
```
**Solution:** Guidance backs up one token and regenerates.
```python
from guidance import models, gen
lm = models.Anthropic("claude-sonnet-4-5-20250929")
# Token healing enabled by default
lm += "The capital of France is " + gen("capital", max_tokens=5)
# Result: "The capital of France is Paris" (correct spacing)
```
**Benefits:**
- Natural text boundaries
- No awkward spacing issues
- Better model performance (sees natural token sequences)
### 4. Grammar-Based Generation
Define complex structures using context-free grammars.
```python
from guidance import models, gen
lm = models.Anthropic("claude-sonnet-4-5-20250929")
# JSON grammar (simplified)
json_grammar = """
{
"name": <gen name regex="[A-Za-z ]+" max_tokens=20>,
"age": <gen age regex="[0-9]+" max_tokens=3>,
"email": <gen email regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}" max_tokens=50>
}
"""
# Generate valid JSON
lm += gen("person", grammar=json_grammar)
print(lm["person"]) # Guaranteed valid JSON structure
```
**Use cases:**
- Complex structured outputs
- Nested data structures
- Programming language syntax
- Domain-specific languages
### 5. Guidance Functions
Create reusable generation patterns with the `@guidance` decorator.
```python
from guidance import guidance, gen, models
@guidance
def generate_person(lm):
"""Generate a person with name and age."""
lm += "Name: " + gen("name", max_tokens=20, stop="\n")
lm += "\nAge: " + gen("age", regex=r"[0-9]+", max_tokens=3)
return lm
# Use the function
lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm = generate_person(lm)
print(lm["name"])
print(lm["age"])
```
**Stateful Functions:**
```python
@guidance(stateless=False)
def react_agent(lm, question, tools, max_rounds=5):
"""ReAct agent with tool use."""
lm += f"Question: {question}\n\n"
for i in range(max_rounds):
# Thought
lm += f"Thought {i+1}: " + gen("thought", stop="\n")
# Action
lm += "\nAction: " + select(list(tools.keys()), name="action")
# Execute tool
tool_result = tools[lm["action"]]()
lm += f"\nObservation: {tool_result}\n\n"
# Check if done
lm += "Done? " + select(["Yes", "No"], name="done")
if lm["done"] == "Yes":
break
# Final answer
lm += "\nFinaOperate the Antigravity CLI (agy): plugins, auth, sandbox.
Delegate coding tasks to Blackbox AI CLI agent. Multi-model agent with built-in judge that runs tasks through multiple LLMs and picks the best result. Requires the blackbox CLI and a Blackbox AI API key.
Delegate coding to xAI Grok Build CLI (features, PRs).
Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, dialectic reasoning, session summaries, and context budget enforcement. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation, recall, and dialectic settings.
Delegate coding to OpenHands CLI (model-agnostic, LiteLLM).
Read-only EVM client: wallets, tokens, gas across 8 chains.
Hyperliquid market data, account history, trade review.
Query Solana blockchain data with USD pricing — wallet balances, token portfolios with values, transaction details, NFTs, whale detection, and live network stats. Uses Solana RPC + CoinGecko. No API key required.