🤖 MCP Integration
Acts as a Model Context Protocol server, allowing AI assistants to query websites using natural language.
NLWeb transforms natural language queries into structured Schema.org responses
Existing structured web data
Python scripts ingest & embed content
Semantic index for retrieval
Python service orchestrating the pipeline
Pluggable model providers
Returns Schema.org JSON responses
Humans & agents consume the API
NLWeb is to MCP/A2A what HTML is to HTTP.
Every instance is an MCP server. Uses Schema.org for responses.
MIT licensed; runs on anything from clusters to laptops.
Uses structured web data formats already deployed on over 100 million websites.
Compatible with Windows, macOS, and Linux. Supports multiple vector databases and LLMs.
Works with Qdrant, Snowflake, Milvus, Azure AI Search, Elasticsearch, Postgres, and Cloudflare AutoRAG.
Clean protocol for natural language queries returning JSON responses with Schema.org vocabulary.
List, summarize, or generate responses based on your use case needs.
git clone https://github.com/microsoft/NLWeb
cd NLWeb
python -m venv myenv
source myenv/bin/activate # On Windows: myenv\Scripts\activate
cd code/python
pip install -r requirements.txt
cp .env.template .env
# Edit .env with your LLM provider credentials
# Load a podcast RSS feed
python -m data_loading.db_load https://feeds.libsyn.com/121695/rss behind-the-tech
python app-aiohttp.py
# Access at http://localhost:8000/
POST /ask - Standard response format
POST /mcp - MCP client-compatible format

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | ✓ | Natural language query |
| mode | string | | list, summarize, or generate (default: list) |
| site | string | | Backend subset/site token |
| prev | string | | Comma-separated previous queries for context |
| streaming | boolean | | Enable streaming (default: true) |
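The parameters above can be assembled into a request body with a small helper. The following is a minimal sketch, not part of NLWeb: the function name `build_ask_payload` is hypothetical, and it assumes the server accepts these parameters as a JSON object, as the curl example on this page suggests.

```python
VALID_MODES = {"list", "summarize", "generate"}


def build_ask_payload(query, mode="list", site=None, prev=None, streaming=True):
    """Assemble a request body from the documented parameters.

    Hypothetical helper: only the parameter names and defaults come from
    the table; the validation rules are illustrative assumptions.
    """
    if not query or not query.strip():
        raise ValueError("query is required")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")

    payload = {"query": query, "mode": mode, "streaming": streaming}
    if site is not None:
        payload["site"] = site
    if prev:
        # The table describes prev as comma-separated previous queries.
        payload["prev"] = ",".join(prev)
    return payload
```

Because `prev` is comma-separated, an earlier query that itself contains a comma would be ambiguous once joined; how the server handles that case is not specified here.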
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{
"query": "What are the latest tech podcasts?",
"mode": "list"
}'
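The same request can be issued from Python using only the standard library. This is a sketch under stated assumptions: the helper names `make_ask_request` and `ask` are hypothetical, and `streaming` is set to false on the assumption that the server then returns a single JSON document rather than a stream.

```python
import json
import urllib.request


def make_ask_request(query, mode="list", base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /ask endpoint."""
    # Assumption: disabling streaming yields one JSON body that is easy to parse.
    body = json.dumps({"query": query, "mode": mode, "streaming": False})
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query, mode="list", base_url="http://localhost:8000", timeout=30):
    """Send the request and decode the JSON reply (needs a running instance)."""
    request = make_ask_request(query, mode, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a local instance running, `ask("What are the latest tech podcasts?")` would send the same query as the curl command above.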
{
  "query_id": "abc123",
  "results": [
    {
      "name": "Behind the Tech Episode 42",
      "url": "https://example.com/episode-42",
      "site": "behind-the-tech",
      "score": 0.95,
      "description": "Discussion about AI and the future of technology",
      "schema_object": {
        "@type": "PodcastEpisode",
        "name": "Behind the Tech Episode 42",
        "datePublished": "2024-01-15"
      }
    }
  ]
}
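A client will typically want to turn a response like the one above into typed objects. The sketch below is illustrative only: `AskResult` and `parse_ask_response` are hypothetical names, the field list is taken from the sample response rather than from a formal schema, and sorting by `score` is a choice made for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class AskResult:
    name: str
    url: str
    site: str
    score: float
    description: str
    schema_type: str  # the "@type" of the embedded Schema.org object


def parse_ask_response(raw):
    """Parse a JSON response string into (query_id, results sorted by score)."""
    data = json.loads(raw)
    results = []
    for item in data.get("results", []):
        schema = item.get("schema_object") or {}
        results.append(
            AskResult(
                name=item.get("name", ""),
                url=item.get("url", ""),
                site=item.get("site", ""),
                score=float(item.get("score", 0.0)),
                description=item.get("description", ""),
                schema_type=schema.get("@type", ""),
            )
        )
    # Highest-scoring results first (an illustrative choice, not server behavior).
    results.sort(key=lambda r: r.score, reverse=True)
    return data.get("query_id"), results


# Trimmed copy of the sample response shown above.
SAMPLE = """
{
  "query_id": "abc123",
  "results": [
    {
      "name": "Behind the Tech Episode 42",
      "url": "https://example.com/episode-42",
      "site": "behind-the-tech",
      "score": 0.95,
      "description": "Discussion about AI and the future of technology",
      "schema_object": {"@type": "PodcastEpisode", "datePublished": "2024-01-15"}
    }
  ]
}
"""

query_id, results = parse_ask_response(SAMPLE)
```

Missing fields fall back to empty values rather than raising, which keeps the parser tolerant of results whose Schema.org objects carry different properties.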