Multi-source search aggregation tool that unifies retrieval across diverse data sources — Confluence, MCP servers, local files, and more — using AI-powered search and response synthesis through a single query interface.
```bash
pip install fusesearch
```

With all optional dependencies (MCP server, local embeddings, OpenAI):

```bash
pip install fusesearch[all]
```

Quick start:

```bash
make build
make start
make index            # index docs from data/docs
make search "your query"
```

Search pipeline:

```
Query → Embed → Vector Search ─┐
                               ├→ RRF Fusion → Rerank (optional) → Results
        Keyword Search ────────┘
```
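The fusion stage is Reciprocal Rank Fusion (RRF). Here is a minimal sketch of the idea in Python, assuming the conventional constant k = 60; FuseSearch's actual constant and internals may differ:

```python
def rrf_fuse(vector_hits: list[str], keyword_hits: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly by either retriever float to the top.
    """
    scores: dict[str, float] = {}
    for hits in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: "b" appears in both lists, so it outranks single-list hits.
print(rrf_fuse(["a", "b", "c"], ["b", "d"]))  # ['b', 'a', 'd', 'c']
```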
FuseSearch supports three embedding providers. Each uses a separate Qdrant collection due to different vector dimensions.
| | Local (default) | OpenAI | Ollama |
|---|---|---|---|
| Model | all-MiniLM-L6-v2 (384 dims) | text-embedding-3-small (1536 dims) | nomic-embed-text (768 dims) |
| Quality | Good for general English | Better for nuanced/complex queries | Good, varies by model |
| Cost | Free | ~$0.02 per 1M tokens | Free |
| Privacy | Data stays local | Data sent to OpenAI | Data stays local |
| Offline | Yes | No | Yes |
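Since the three models emit different vector sizes, each needs its own collection. A sketch of that layout with qdrant-client; the fusesearch_* collection names here are hypothetical, not FuseSearch's actual naming scheme:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# One collection per embedder, sized to that model's output dimension.
DIMS = {"local": 384, "openai": 1536, "ollama": 768}

client = QdrantClient(url="http://localhost:6333")
for provider, dim in DIMS.items():
    name = f"fusesearch_{provider}"  # hypothetical naming scheme
    if not client.collection_exists(name):
        client.create_collection(
            collection_name=name,
            vectors_config=VectorParams(size=dim, distance=Distance.COSINE),
        )
```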
Local (default): uses sentence-transformers. Runs entirely on your machine, no API key needed.
OpenAI: uses OpenAI's API. Higher quality embeddings, but requires an API key.
To use OpenAI embeddings, add to your .env:
```
FUSESEARCH_EMBEDDER=openai
OPENAI_API_KEY=sk-...
```

Or pass via CLI:
```bash
fusesearch --embedder openai index data/docs
fusesearch --embedder openai search "your query"
```

Rate limits: OpenAI Tier 1 accounts have a 40k tokens-per-minute limit on embeddings. FuseSearch retries automatically on rate limit errors, but initial indexing of large document sets will be slow. Higher tiers (auto-upgrade as you spend) increase this significantly. See OpenAI rate limits.
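The retry pattern here is ordinary exponential backoff on rate-limit errors. A sketch with the openai SDK; the delays and batch handling are illustrative, not FuseSearch's actual retry code:

```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_with_retry(texts: list[str], max_retries: int = 5) -> list[list[float]]:
    """Embed a batch, backing off exponentially when the rate limit is hit."""
    for attempt in range(max_retries):
        try:
            resp = client.embeddings.create(
                model="text-embedding-3-small", input=texts
            )
            return [item.embedding for item in resp.data]
        except RateLimitError:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```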
Ollama: uses Ollama to run embedding models locally. No API key needed.
- Install from ollama.com
- Pull an embedding model:
  ```bash
  ollama pull nomic-embed-text
  ```
Then add to your .env:

```
FUSESEARCH_EMBEDDER=ollama
```

Or pass via CLI:
```bash
fusesearch --embedder ollama index data/docs
fusesearch --embedder ollama search "your query"
```

Docker: When running FuseSearch in Docker with Ollama on the host, set OLLAMA_HOST so the container can reach it:

```
OLLAMA_HOST=http://host.docker.internal:11434
```

Other Ollama embedding models: bge-m3, mxbai-embed-large, snowflake-arctic-embed. Configure with:
```
OLLAMA_EMBED_MODEL=bge-m3
```

Reranking uses a cross-encoder model to rescore search results after retrieval. The cross-encoder evaluates each (query, document) pair directly, producing more accurate relevance scores than initial retrieval alone.
When enabled, FuseSearch overfetches 3x candidates from hybrid search, then reranks down to the requested limit.
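The pattern is straightforward with sentence-transformers' CrossEncoder. A sketch of the idea, not FuseSearch's internals; the model name is the FUSESEARCH_RERANK_MODEL default listed below:

```python
from sentence_transformers import CrossEncoder

def rerank(query: str, candidates: list[str], limit: int = 10) -> list[str]:
    """Rescore overfetched candidates with a cross-encoder, keep the top `limit`."""
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, doc) for doc in candidates])  # one score per pair
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:limit]]

# With 3x overfetch: hybrid search returns limit * 3 candidates,
# and rerank() trims them back down to `limit`.
```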
Per-request via CLI flag or API parameter:

```bash
fusesearch search "your query" --rerank
```

```bash
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "your query", "rerank": true}'
```

Or enable globally via environment variable:

```
FUSESEARCH_RERANK=true
```

| Variable | Default | Description |
|---|---|---|
| FUSESEARCH_RERANK | false | Enable reranking globally |
| FUSESEARCH_RERANKER | local | Reranker provider |
| FUSESEARCH_RERANK_MODEL | cross-encoder/ms-marco-MiniLM-L-6-v2 | Cross-encoder model |
The reranker is independent of the embedding provider — it works on raw text, not vectors. You can use FUSESEARCH_EMBEDDER=openai with FUSESEARCH_RERANK=true. The local reranker requires the [local] extra (sentence-transformers).
The ask command searches your indexed documents and uses an LLM to synthesize an answer with citations. This is optional — search works without any LLM provider installed.
```bash
fusesearch ask "What is Drupal?"
```

```bash
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Drupal?"}'
```

| Provider | Model | Extra | Cost |
|---|---|---|---|
| Anthropic | claude-sonnet-4-20250514 | [anthropic] | Pay-as-you-go API |
| OpenAI | gpt-4o-mini | [openai] | Pay-as-you-go API |
| Ollama | llama3.2 | [ollama] | Free (runs locally) |
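Under the hood, ask is retrieve-then-synthesize. A sketch of the synthesis half using the Anthropic provider; the prompt wording and function shape are illustrative assumptions, only the model name comes from the table above:

```python
import anthropic

def synthesize_answer(query: str, chunks: list[str]) -> str:
    """Turn already-retrieved chunks into a cited answer."""
    # Number each retrieved chunk so the model can cite it as [1], [2], ...
    sources = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))
    prompt = (
        "Answer the question using only the sources below. "
        f"Cite sources by number, e.g. [1].\n\nSources:\n{sources}\n\nQuestion: {query}"
    )
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text
```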
Anthropic: requires a separate API key (a Claude Pro/Team subscription does not include API access).
- Create an account at console.anthropic.com
- Add billing under Settings > Billing (pay-as-you-go)
- Create a key under Settings > API Keys
Then add to your .env:

```
FUSESEARCH_LLM=anthropic
ANTHROPIC_API_KEY=sk-ant-...
```

OpenAI: if you already have an API key for OpenAI embeddings, the same key works here.
- Go to platform.openai.com/api-keys
- Create a new secret key
Then add to your .env:

```
FUSESEARCH_LLM=openai
OPENAI_API_KEY=sk-...
```

Ollama: no API key needed. Runs entirely on your machine.
- Install from ollama.com
- Pull a model:
  ```bash
  ollama pull llama3.2
  ```

Then add to your .env:

```
FUSESEARCH_LLM=ollama
```

If FUSESEARCH_LLM is not set, FuseSearch auto-detects the first installed provider. If none are installed, ask returns a clear error — search continues to work normally.
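A sketch of what that auto-detection could look like; the probe order and import-based mechanism are assumptions, not FuseSearch's actual logic:

```python
import importlib.util

def detect_llm_provider() -> str | None:
    """Return the first LLM provider whose client package is importable."""
    for provider in ("anthropic", "openai", "ollama"):
        if importlib.util.find_spec(provider) is not None:
            return provider
    return None  # `ask` will report that no provider is installed
```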
The fusesearch-mcp Docker service exposes a streamable HTTP endpoint on port 8001. Tools: search (hybrid search), count (indexed chunks).
Claude Code:

```bash
claude mcp add fusesearch http://localhost:8001/mcp --transport http
```

Claude Desktop:

Option 1: Connectors UI (recommended)

In Claude Desktop, go to Settings > Connectors > Add custom connector and enter http://localhost:8001/mcp.
Option 2: Config file with mcp-remote bridge (local dev)
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
```json
{
  "mcpServers": {
    "fusesearch": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "http://localhost:8001/mcp", "--allow-http"]
    }
  }
}
```

Requires Node.js >= 18. --allow-http is required for plain HTTP (not needed for HTTPS).
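Beyond Claude, any MCP client can call the search and count tools. A sketch using the official MCP Python SDK over streamable HTTP; the "query" argument name is an assumption beyond what this README documents:

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main() -> None:
    # Connect to the fusesearch-mcp streamable HTTP endpoint.
    async with streamablehttp_client("http://localhost:8001/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Call the `search` tool; the argument name is an assumption.
            result = await session.call_tool("search", {"query": "What is Drupal?"})
            print(result.content)

asyncio.run(main())
```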

