# mcp-vl-msa-rs
A searchable long-term memory for AI agents, exposed as an MCP stdio server.
Index documents, notes and past conversations into collections; retrieve the
top-k relevant chunks for a query and inject the original text back to the
model; add or drop agent memories with `msa_remember` / `msa_forget`. Pure
Rust, BM25 over [tantivy](https://github.com/quickwit-oss/tantivy), zero ML
deps in the default build; optional in-process dense rerank.
Any MCP client (Claude Code, Codex, or anything speaking MCP stdio) gets the
same memory: a queryable corpus that survives across sessions and model swaps,
with no cloud account and no embedding service required. Use it to give an
agent durable recall over a knowledge base, a docs tree, or its own chat
history — retrieval that returns the original text, not just embeddings.
It is one half of a two-part memory: this server is the **library** (corpus
recall), its companion [mcp-memory-rs](https://github.com/DioNanos/mcp-memory-rs)
is the **notebook** (curated state). An agent that swaps models loses neither.
```mermaid
flowchart LR
A["AI agent<br/>(any MCP client)"]
A -->|"curated state<br/>read / write / sync"| M["mcp-memory-rs<br/><i>the notebook</i>"]
A -->|"corpus recall<br/>index / search / fetch"| V["mcp-vl-msa-rs<br/><i>the library</i>"]
M --- D1[("JSON categories<br/>SQLite FTS5")]
V --- D2[("tantivy BM25<br/>collections")]
```
**The name**: `msa` is the retrieval pattern it borrows from the Memory Sparse
Attention paper (arXiv:2603.23516) — an *extrinsic* approximation, not the
neural model; distinct from MiniMax's MSA-architecture LLMs, which are
*intrinsic* (in-model) generators. `vl` is for Vivling (`codex-vl`), its first
adopter — but the server is fully AI-agnostic and depends on nothing from it.
**Status**: v0.4 — hybrid sparse+dense optional.
## Why
The original [Memory Sparse Attention](https://arxiv.org/abs/2603.23516) paper (EverMind-AI) describes an end-to-end trainable sparse attention layer over chunk-pooled KV caches. That is a neural artifact and is not portable to a pure-Rust MCP server. What *is* portable, and what this repo aims to deliver, is the MSA macro pattern:
1. **Chunked storage** of long-form text with a small fixed pool size (`P=64` words by default, mirroring the paper).
2. **Top-k sparse routing** over chunks (BM25 surrogate; learned routing is out of scope).
3. **Original text injection** (the paper's §4.3 ablation reports -37.1% without it): `msa_search` returns chunks, `msa_fetch_doc` returns the full document.
4. **Memory Interleave** as a *protocol* (shipped in v0.3 as `msa_search_iterative`): the AI client orchestrates multi-hop retrieval through repeated tool calls with a server-side cursor.
Design and rationale are documented in the project notes (negative results, gate methodology); see [`docs/NEGATIVE_RESULTS.md`](docs/NEGATIVE_RESULTS.md).
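
Step 1 of the pattern can be illustrated with a small word-based chunker. This is a minimal sketch, not the server's implementation: the function name `chunk_words` is hypothetical, whitespace splitting is an assumption about what counts as a "word", and the two parameters mirror the `chunk_size` / `overlap` keys shown in the config example.

```rust
/// Split `text` into chunks of at most `chunk_size` whitespace-separated words.
/// Consecutive chunks share `overlap` words. Assumption: a "word" is any run of
/// non-whitespace characters, and original spacing is collapsed to single spaces.
fn chunk_words(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let words: Vec<&str> = text.split_whitespace().collect();
    let step = chunk_size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With `chunk_size = 64` and `overlap = 0`, a 130-word document yields three chunks of 64, 64, and 2 words. A small fixed pool keeps each retrieved unit short enough to inject verbatim.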
## Tool surface
| Tool | Since | Description |
|---|---|---|
| `msa_index` | v0.1 | Index a document; existing chunks for `doc_id` are replaced. |
| `msa_search` | v0.1 | Top-k chunks, score normalized 0.0–1.0. |
| `msa_fetch_doc` | v0.1 | Full original text of a document. |
| `msa_delete` | v0.1 | Remove a document and all its chunks. |
| `msa_list_collections` | v0.1 | Collections open in the registry. |
| `msa_stats` | v0.1 | Per-collection statistics (exact `num_documents` / `total_tokens`). |
| `SearchFilter` | v0.2 | Metadata filter (`where_eq`/`where_in`/`created_*`), post-retrieval. |
| `msa_search_iterative` | v0.3 | Memory Interleave with server-side cursor; dedups across rounds. |
| `msa_drop_session` | v0.3 | Force-evict a Memory Interleave session before TTL. |
| `dense_alpha` on `msa_search` | v0.4 | Hybrid BM25 + cosine rerank. Requires `--features embeddings` + `[embeddings]` config. |
| `msa_remember` / `msa_forget` | v0.4 | Agent-memory surface: enrich + low-signal gate + content-hash dedup; standard metadata (`kind` / `source_id` / `created_at`). |
| `msa_sync_path` | v0.4 | Mirror a directory into a collection (filesystem source; blake3 delta sync). |
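
The `msa_search_iterative` row describes a server-side cursor that dedups across rounds, and `msa_drop_session` mentions a TTL. The bookkeeping behind that idea might look like the sketch below. Everything named here (`InterleaveSession`, its fields, `next_round`, string chunk ids) is hypothetical and chosen for illustration; the server's actual session type may differ.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Hypothetical per-session cursor: remembers which chunk ids were already returned.
struct InterleaveSession {
    seen: HashSet<String>,
    last_used: Instant,
    ttl: Duration,
}

impl InterleaveSession {
    fn new(ttl: Duration) -> Self {
        Self { seen: HashSet::new(), last_used: Instant::now(), ttl }
    }

    /// True once the session has been idle for longer than its TTL.
    fn is_expired(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_used) > self.ttl
    }

    /// From a ranked candidate list, keep up to `k` ids that no earlier round
    /// returned, and record them so later rounds skip them.
    fn next_round(&mut self, ranked_chunk_ids: &[&str], k: usize) -> Vec<String> {
        self.last_used = Instant::now();
        let mut fresh = Vec::new();
        for id in ranked_chunk_ids {
            if fresh.len() == k {
                break;
            }
            // `insert` returns false when the id was already seen.
            if self.seen.insert((*id).to_string()) {
                fresh.push((*id).to_string());
            }
        }
        fresh
    }
}
```

Calling `next_round` twice with overlapping candidate lists returns disjoint results, which is what lets a client run multi-hop retrieval without receiving the same chunk again.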
## Install
**Prebuilt binary** (recommended) — download the archive for your platform from
the [latest release](https://github.com/DioNanos/mcp-vl-msa-rs/releases/latest),
extract, and point your MCP client at the binary:
```bash
tar xzf mcp-vl-msa-rs-x86_64-unknown-linux-gnu.tar.gz
install -m755 mcp-vl-msa-rs-*/mcp-vl-msa-rs ~/.local/bin/
```
Prebuilt targets (Linux + Android): `x86_64-unknown-linux-gnu`,
`x86_64-unknown-linux-musl`, `aarch64-unknown-linux-gnu`,
`aarch64-unknown-linux-musl` (edge / ARM / Termux), `aarch64-linux-android`.
**macOS**: no prebuilt binary is shipped (it would need Apple code-signing).
Install from source instead — `cargo install` below compiles it on your Mac in
one command, no signing needed.
**From source** (Rust toolchain): `--locked` is required because the workspace
`Cargo.lock` pins a working `time` / `tantivy-common` resolution and a fresh
resolve breaks the build. The package name is `mcp-msa-server`; the binary it
installs is `mcp-vl-msa-rs`:
```bash
cargo install --git https://github.com/DioNanos/mcp-vl-msa-rs \
--locked --features source-fs mcp-msa-server
```
## Build & test
```bash
git clone https://github.com/DioNanos/mcp-vl-msa-rs
cd mcp-vl-msa-rs
# Default: pure BM25, zero network deps
cargo build --release
cargo test
# Hybrid sparse + dense (in-process Candle rerank, no external service)
cargo build --release --features embeddings
cargo test --features embeddings
```
### Hybrid mode config
Add an `[embeddings]` section to the config file that `MCP_MSA_CONFIG` points
to in order to activate dense rerank. Without this section the server stays in
BM25-only mode even when the binary was built with `--features embeddings`.
The production backend is `candle-modernbert`: the encoder runs **in-process**
(Candle), offline-deterministic, from a local model bundle — no daemon, no
network at runtime, no automatic downloads. Prepare the bundle once with
`scripts/prepare-granite-r2-97m.sh`.
```toml
[storage]
storage_dir = "~/.local/state/mcp-vl-msa-rs"
[chunking]
chunk_size = 64
overlap = 0
[embeddings]
backend = "candle-modernbert"
model_dir = "~/.local/share/mcp-vl-msa-rs/models/granite-r2-97m"
dim = 768
model_id = "granite-r2-97m"
```
A transitional `backend = "ollama"` (HTTP to an Ollama-compatible service)
still exists but is **deprecated and scheduled for removal in v0.6** — do not
build new setups on it.
The AI client opts into hybrid scoring per-call by passing `dense_alpha`
to `msa_search` (or any future tool that supports it). `dense_alpha = 1.0`
(default) is BM25-only; `0.0` is dense-only; intermediate values are a
linear blend `α·bm25 + (1-α)·((cos+1)/2)`. Cosine is shifted to `[0,1]`
so it composes linearly with the already max-normalized BM25 score.
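
The blend above can be written out directly. The following is a sketch under stated assumptions: the function names are hypothetical, and clamping `alpha` to `[0, 1]` and returning zeros when no score is positive are choices made here, not documented server behavior.

```rust
/// Scale raw BM25 scores so the best hit gets 1.0 (max-normalization).
/// Assumption: if no score is positive, every normalized score is 0.0.
fn max_normalize(raw: &[f32]) -> Vec<f32> {
    let max = raw.iter().cloned().fold(0.0_f32, f32::max);
    if max <= 0.0 {
        return vec![0.0; raw.len()];
    }
    raw.iter().map(|s| s / max).collect()
}

/// Linear blend: alpha * bm25 + (1 - alpha) * ((cos + 1) / 2).
/// `alpha = 1.0` is BM25-only and `alpha = 0.0` is dense-only.
fn hybrid_score(alpha: f32, bm25_norm: f32, cosine: f32) -> f32 {
    let alpha = alpha.clamp(0.0, 1.0); // assumption: out-of-range values are clamped
    let dense = (cosine + 1.0) / 2.0; // shift cosine from [-1, 1] to [0, 1]
    alpha * bm25_norm + (1.0 - alpha) * dense
}
```

For example, with `alpha = 0.5`, a normalized BM25 score of 0.8 and a cosine of 0.0 blend to `0.5 * 0.8 + 0.5 * 0.5 = 0.65`.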
## Run as MCP stdio
```bash
# Default storage: ~/.local/state/mcp-vl-msa-rs/
./target/release/mcp-vl-msa-rs
# With explicit config
MCP_VL_MSA_CONFIG=~/.config/mcp-vl-msa-rs/config.toml \
MCP_DEVICE=my-node \
./target/release/mcp-vl-msa-rs
```
Example `~/.codex/config.toml` entry:
```toml
[mcp_servers.vl_msa]
command = "/path/to/mcp-vl-msa-rs/target/release/mcp-vl-msa-rs"
env = { MCP_DEVICE = "my-node" }
# let the model call tools without a per-call approval prompt
default_tools_approval_mode = "approve"
```
Equivalent `~/.claude.json` entry for Claude Code:
```json
{
"mcpServers": {
"vl_msa": {
"command": "/path/to/mcp-vl-msa-rs/target/release/mcp-vl-msa-rs",
"env": { "MCP_DEVICE": "my-node" }
}
}
}
```
### AI client compatibility
- Clients with partial MCP support may not surface the server's `instructions`
text. The tool descriptions and request-field descriptions are self-contained,
so a model can work from those alone.
- Read-only tools (`msa_search`, `msa_fetch_doc`, `msa_stats`,
`msa_list_collections`, `msa_manifest`, `msa_search_iterative`,
`msa_interleave_round`) carry the `readOnlyHint` annotation, which lets a
gating client auto-approve them.
- If a model reports an "unsupported call" or "user cancelled" on Codex, that is
the approval gate, not a server fault. Set `default_tools_approval_mode`
(above) so tool calls are not blocked on a prompt.
## Storage layout
```
~/.local/state/mcp-vl-msa-rs/
├── <collection_a>/ ← tantivy index directory
├── <collection_b>/
└── ...
```
Each collection is an independent tantivy index. Collection names are validated
(rejected if they contain path separators, `..`, etc.) so a collection cannot
escape the root.
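
One way to enforce this kind of rule is an allowlist on the name before joining it to the storage root. The sketch below is an assumption about how such a check could be written, not the server's validator: `is_valid_collection_name` and `collection_dir` are hypothetical names, and the allowed character set and the 64-byte limit are illustrative choices that are stricter than the rule described above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical validator: accept only ASCII letters, digits, `_` and `-`.
/// Because `.`, `/` and `\` are not in the allowlist, `..` and any path
/// separator are rejected automatically.
fn is_valid_collection_name(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= 64 // illustrative limit, not taken from the project
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

/// Resolve a collection directory under `root`, or `None` if the name is invalid.
fn collection_dir(root: &Path, name: &str) -> Option<PathBuf> {
    is_valid_collection_name(name).then(|| root.join(name))
}
```

An allowlist is easier to reason about than a blocklist here: anything not explicitly permitted cannot reach `Path::join`, so a name such as `../etc` never produces a path outside the root.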
## Roadmap
Shipped:
- **v0.2** — `SearchFilter` (where_eq / where_in / created range), post-retrieval.
- **v0.3** — `msa_search_iterative` Memory Interleave with server-side cursor + TTL'd `MsaSession` registry.
- **v0.4** — hybrid BM25 + dense rerank behind feature flag `embeddings`, Ollama backend, per-call `dense_alpha`; agent-memory surface (`msa_remember` / `msa_forget`); filesystem source metadata (`created_at` / `source` / `ext` / `dir`) at index time; exact `num_documents` / `total_tokens` in `msa_stats`; `msa-bench` reproducible benchmark crate; prebuilt-binary packaging.
Next (not yet built):
- Query-time tantivy filter (today `SearchFilter` runs post-retrieval; fine for
normal corpora, but a pre-filter would help when selectivity is high on a very
large index).
- ACL for multi-tenant collections.
- Tool-description tuning.
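
The first roadmap item is easier to see with a toy example of post-retrieval filtering. This is a sketch only: the `Hit` struct, its fields, and `post_filter_eq` are hypothetical and stand in for whatever the server uses internally; the filter shown corresponds loosely to a single `where_eq` condition.

```rust
use std::collections::HashMap;

/// Hypothetical search hit: a chunk id, a score, and string metadata.
struct Hit {
    chunk_id: String,
    score: f32,
    metadata: HashMap<String, String>,
}

impl Hit {
    fn new(chunk_id: &str, score: f32, meta: &[(&str, &str)]) -> Self {
        Self {
            chunk_id: chunk_id.to_string(),
            score,
            metadata: meta.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
        }
    }
}

/// Post-retrieval filtering: `ranked` is the top-k list the index already
/// returned; keep only hits whose metadata field `key` equals `value`.
fn post_filter_eq<'a>(ranked: &'a [Hit], key: &str, value: &str) -> Vec<&'a Hit> {
    ranked
        .iter()
        .filter(|h| h.metadata.get(key).map(String::as_str) == Some(value))
        .collect()
}
```

If the index returns three hits and only one matches the filter, the caller receives one result even though more matching chunks may exist further down the ranking. A query-time filter would apply the condition before the top-k cut, which matters when the filter is highly selective on a very large index.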
## Related work
- **MSA paper** ([arXiv:2603.23516](https://arxiv.org/abs/2603.23516)) — the
architectural inspiration (neural, intrinsic); this repo is an extrinsic,
pure-Rust approximation of the macro pattern.
- **Vivling** (in `codex-vl`) — the first downstream consumer: this server is
its long-term memory.
- **[mcp-memory-rs](https://github.com/DioNanos/mcp-memory-rs)** — the companion
server for *curated* agent state (named JSON categories, per-device ACL,
fleet sync). This server does corpus recall; together they cover both halves
of agent memory: the curated notebook and the queryable library.
## License
Apache-2.0. See [LICENSE](./LICENSE).