# Self-Hosted LLMs
Run AI models locally on your machine — your data never leaves your computer.
## Why Self-Host?
| Benefit | Description |
|---|---|
| 🔒 Complete Privacy | No data sent to cloud APIs. Everything runs on your hardware. |
| 💰 Zero Cost | No API fees, no subscriptions, no usage limits. |
| ⚡ No Internet Required | Works offline — on planes, in air-gapped networks, anywhere. |
| 🏢 Enterprise Compliance | Meet data residency, HIPAA, SOC 2, and other regulatory requirements. |
| 🎛️ Full Control | Choose your model, quantization, and context length. |
## Supported Local LLM Servers
Vibe Browser works with any server that exposes an OpenAI-compatible API (/v1/chat/completions). A quick way to test compatibility with curl is shown after the table.
| Server | Best For | Install |
|---|---|---|
| Ollama | Easiest setup, CLI-first, great model library | One command |
| LM Studio | GUI-first, drag-and-drop models | Desktop app |
| LocalAI | Docker-based, API-first | Docker |
| vLLM | Production GPU inference | pip install |
| llama.cpp server | Minimal, single binary | Build from source |
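To check that a server speaks the API, you can call the endpoint directly. A minimal sketch, assuming Ollama's default port 11434 and the qwen3.5 model from the quick start below (other servers use different default ports, e.g. LM Studio typically serves on 1234 and vLLM on 8000):

```bash
# Minimal chat completion request against a local OpenAI-compatible server.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```

A JSON response containing a "choices" array means the server is compatible.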
## Quick Start with Ollama (Recommended)
```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh   # Linux
brew install ollama                             # macOS
winget install Ollama.Ollama                    # Windows

# 2. Pull a model
ollama pull qwen3.5

# 3. Open Vibe Browser → Settings → Select "Ollama (Self-Hosted)"
#    Vibe auto-detects Ollama, lists installable models from the library,
#    and downloads any selected model that isn't installed yet.
```
👉 See the full Ollama Setup Guide for detailed instructions.
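Before switching Vibe over, you can confirm that Ollama is serving and the model answers, using the standard Ollama CLI:

```bash
# List locally downloaded models
ollama list

# Run a one-off prompt to confirm the model loads and responds
ollama run qwen3.5 "Reply with the single word: ready"
```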
## How It Works

```
┌─────────────────┐   localhost:11434     ┌──────────────┐
│  Vibe Browser   │ ◄──────────────────►  │    Ollama    │
│   Extension     │  OpenAI-compatible    │ (local LLM)  │
│                 │ /v1/chat/completions  │              │
└─────────────────┘                       └──────────────┘
        │                                        │
        │  Your prompts & responses              │  Model runs
        │  stay on your machine                  │  on your CPU/GPU
        │                                        │
        └──  No cloud. No tracking. No API keys. ──┘
```
The extension communicates directly with the local LLM server over localhost. No proxy, no wrapper — a direct connection.
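You can confirm the server is bound to the loopback interface only, so nothing outside your machine can reach it. A sketch assuming Ollama's default port 11434:

```bash
# Show what address the LLM server is listening on.
# An address like 127.0.0.1:11434 means only local processes can connect.
lsof -iTCP:11434 -sTCP:LISTEN -n -P    # macOS or Linux
ss -tlnp | grep 11434                  # Linux alternative
```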
## Privacy Architecture
When using a self-hosted model:
- ✅ Prompts stay on your machine
- ✅ Responses stay on your machine
- ✅ Browsing context (page content, screenshots) stays on your machine
- ✅ No telemetry sent about your queries
- ✅ No API keys needed — the model runs locally
- ❌ Cloud providers: never contacted
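These claims are verifiable from the outside. A rough sketch for Linux (requires root; on macOS, replace any with a specific interface such as en0):

```bash
# Print any TCP traffic that leaves the loopback interface.
# With a self-hosted model selected, LLM traffic should not show up here.
sudo tcpdump -n -i any 'tcp and not (src net 127.0.0.0/8 or dst net 127.0.0.0/8)'
```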
This makes self-hosted mode ideal for:
- Legal professionals handling privileged communications
- Healthcare workers with HIPAA requirements
- Researchers working with sensitive data
- Anyone who values privacy
## Recommended Models
| Model | Parameters | RAM Needed | Best For |
|---|---|---|---|
| qwen3.5 | 7B+ | ~6 GB | Default recommendation with strong reasoning and tool use |
| llama3.1:8b | 8B | ~6 GB | Great general-purpose model |
| qwen3:4b | 4B | ~3 GB | Good for machines with 8 GB RAM |
| deepseek-r1:8b | 8B | ~6 GB | Strong reasoning and math |
| smollm2:1.7b | 1.7B | ~1.5 GB | Runs on anything, good for testing |
### Hardware Rule of Thumb
Models need roughly 1.2× their file size in available RAM to run; an 8 GB model file therefore needs about 10 GB of free RAM (8 × 1.2 ≈ 9.6).
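To check whether a given model will fit, compare the estimate against your free memory. A sketch for Linux, with the model size as an example value (substitute the size that ollama list reports for your model):

```bash
# Estimate RAM needed via the 1.2x rule of thumb, then compare
# against currently available memory (Linux, /proc/meminfo).
MODEL_SIZE_GB=8   # example value; use your model's actual file size
NEEDED=$(awk "BEGIN { printf \"%.1f\", $MODEL_SIZE_GB * 1.2 }")
AVAILABLE=$(awk '/MemAvailable/ { printf "%.1f", $2 / 1048576 }' /proc/meminfo)
echo "Need ~${NEEDED} GB free RAM; currently available: ${AVAILABLE} GB"
```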