# Self-Hosted LLMs
Run AI models locally on your machine — your data never leaves your computer.
## Why Self-Host?
| Benefit | Description |
|---|---|
| 🔒 Complete Privacy | No data sent to cloud APIs. Everything runs on your hardware. |
| 💰 Zero Cost | No API fees, no subscriptions, no usage limits. |
| ⚡ No Internet Required | Works offline — on planes, in air-gapped networks, anywhere. |
| 🏢 Enterprise Compliance | Meet data residency, HIPAA, SOC 2, and other regulatory requirements. |
| 🎛️ Full Control | Choose your model, quantization, and context length. |
## Supported Local LLM Servers
Vibe Browser works with any server that exposes an OpenAI-compatible API (/v1/chat/completions). A quick way to test compatibility with curl is shown after the table.
| Server | Best For | Install |
|---|---|---|
| Ollama | Easiest setup, CLI-first, great model library | One command |
| LM Studio | GUI-first, drag-and-drop models | Desktop app |
| LocalAI | Docker-based, API-first | Docker |
| vLLM | Production GPU inference | pip install |
| llama.cpp server | Minimal, single binary | Build from source |
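To check that a server speaks the API, you can call the endpoint directly. A minimal sketch, assuming Ollama's default port 11434 and the qwen3.5 model from the quick start below (other servers use different default ports, e.g. LM Studio typically serves on 1234 and vLLM on 8000):

```bash
# Minimal chat completion request against a local OpenAI-compatible server.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```

A JSON response containing a "choices" array means the server is compatible.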
## Quick Start with Ollama (Recommended)
```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh   # Linux
brew install ollama                             # macOS
winget install Ollama.Ollama                    # Windows

# 2. Pull a model
ollama pull qwen3.5

# 3. Open Vibe Browser → Settings → Select "Ollama (Self-Hosted)"
#    Vibe auto-detects Ollama, lists installable models from the library,
#    and downloads any selected model that isn't installed yet.
```
👉 See the full Ollama Setup Guide for detailed instructions.
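Before switching Vibe over, you can confirm that Ollama is serving and the model answers, using the standard Ollama CLI:

```bash
# List locally downloaded models
ollama list

# Run a one-off prompt to confirm the model loads and responds
ollama run qwen3.5 "Reply with the single word: ready"
```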
## How It Works

```
┌─────────────────┐   localhost:11434     ┌──────────────┐
│  Vibe Browser   │ ◄──────────────────►  │    Ollama    │
│   Extension     │  OpenAI-compatible    │ (local LLM)  │
│                 │ /v1/chat/completions  │              │
└─────────────────┘                       └──────────────┘
        │                                        │
        │  Your prompts & responses              │  Model runs
        │  stay on your machine                  │  on your CPU/GPU
        │                                        │
        └──  No cloud. No tracking. No API keys. ──┘
```
The extension communicates directly with the local LLM server over localhost. No proxy, no wrapper — a direct connection.
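You can confirm the server is bound to the loopback interface only, so nothing outside your machine can reach it. A sketch assuming Ollama's default port 11434:

```bash
# Show what address the LLM server is listening on.
# An address like 127.0.0.1:11434 means only local processes can connect.
lsof -iTCP:11434 -sTCP:LISTEN -n -P    # macOS or Linux
ss -tlnp | grep 11434                  # Linux alternative
```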
## Privacy Architecture
When using a self-hosted model:
- ✅ Prompts stay on your machine
- ✅ Responses stay on your machine
- ✅ Browsing context (page content, screenshots) stays on your machine
- ✅ No telemetry sent about your queries
- ✅ No API keys needed — the model runs locally
- ❌ Cloud providers: never contacted
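These claims are verifiable from the outside. A rough sketch for Linux (requires root; on macOS, replace any with a specific interface such as en0):

```bash
# Print any TCP traffic that leaves the loopback interface.
# With a self-hosted model selected, LLM traffic should not show up here.
sudo tcpdump -n -i any 'tcp and not (src net 127.0.0.0/8 or dst net 127.0.0.0/8)'
```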
This makes self-hosted mode ideal for:
- Legal professionals handling privileged communications
- Healthcare workers with HIPAA requirements
- Researchers working with sensitive data
- Anyone who values privacy
## Recommended Models
| Model | Parameters | RAM Needed | Best For |
|---|---|---|---|
| qwen3.5 | 7B+ | ~6 GB | Default recommendation with strong reasoning and tool use |
| llama3.1:8b | 8B | ~6 GB | Great general-purpose model |
| qwen3:4b | 4B | ~3 GB | Good for machines with 8 GB RAM |
| deepseek-r1:8b | 8B | ~6 GB | Strong reasoning and math |
| smollm2:1.7b | 1.7B | ~1.5 GB | Runs on anything, good for testing |
### Hardware Rule of Thumb
Models need roughly 1.2× their file size in available RAM to run; an 8 GB model file therefore needs about 10 GB of free RAM (8 × 1.2 ≈ 9.6).
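To check whether a given model will fit, compare the estimate against your free memory. A sketch for Linux, with the model size as an example value (substitute the size that ollama list reports for your model):

```bash
# Estimate RAM needed via the 1.2x rule of thumb, then compare
# against currently available memory (Linux, /proc/meminfo).
MODEL_SIZE_GB=8   # example value; use your model's actual file size
NEEDED=$(awk "BEGIN { printf \"%.1f\", $MODEL_SIZE_GB * 1.2 }")
AVAILABLE=$(awk '/MemAvailable/ { printf "%.1f", $2 / 1048576 }' /proc/meminfo)
echo "Need ~${NEEDED} GB free RAM; currently available: ${AVAILABLE} GB"
```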