Self-Hosted LLMs

Run AI models locally on your machine — your data never leaves your computer.

Why Self-Host?

| Benefit | Description |
| --- | --- |
| 🔒 Complete Privacy | No data sent to cloud APIs. Everything runs on your hardware. |
| 💰 Zero Cost | No API fees, no subscriptions, no usage limits. |
| No Internet Required | Works offline — on planes, in air-gapped networks, anywhere. |
| 🏢 Enterprise Compliance | Meet data residency, HIPAA, SOC2, and regulatory requirements. |
| 🎛️ Full Control | Choose your model, quantization, and context length. |

Supported Local LLM Servers

Vibe Browser works with any server that exposes an OpenAI-compatible API (/v1/chat/completions).
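
If you're unsure whether a server qualifies, send it a standard chat completion request. Here is a minimal sanity check, assuming Ollama on its default port 11434 and the qwen3.5 model pulled in the quick start below; any OpenAI-compatible server should answer the same request shape:

```bash
# Minimal OpenAI-compatible request (assumes Ollama on its default
# port 11434 and a locally pulled model named "qwen3.5").
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5",
    "messages": [{"role": "user", "content": "Reply with one word: ready?"}]
  }'
```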

| Server | Best For | Install |
| --- | --- | --- |
| Ollama | Easiest setup, CLI-first, great model library | One command |
| LM Studio | GUI-first, drag-and-drop models | Desktop app |
| LocalAI | Docker-based, API-first | Docker |
| vLLM | Production GPU inference | pip install |
| llama.cpp server | Minimal, single binary | Build from source |
Quick Start (Ollama)

```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh   # Linux
brew install ollama                             # macOS
winget install Ollama.Ollama                    # Windows

# 2. Pull a model
ollama pull qwen3.5

# 3. Open Vibe Browser → Settings → Select "Ollama (Self-Hosted)"
#    Vibe auto-detects Ollama, lists installable models from the library,
#    and installs any model you select that isn't downloaded yet.
```

👉 See the full Ollama Setup Guide for detailed instructions.
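
Before pointing Vibe at the server, it can help to confirm Ollama is actually up. A quick check using Ollama's standard endpoints and CLI, assuming the default port:

```bash
# The root endpoint replies "Ollama is running" when the server is up.
curl http://localhost:11434/

# List the models you have pulled locally.
ollama list
```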

How It Works

```
┌─────────────────┐      localhost:11434       ┌──────────────┐
│  Vibe Browser   │ ◄────────────────────────► │    Ollama    │
│   Extension     │     OpenAI-compatible      │ (local LLM)  │
│                 │    /v1/chat/completions    │              │
└─────────────────┘                            └──────────────┘
         │                                            │
         │  Your prompts & responses                  │  Model runs
         │  stay on your machine                      │  on your CPU/GPU
         │                                            │
         └───── No cloud. No tracking. No API keys. ──┘
```

The extension communicates directly with the local LLM server over localhost. No proxy, no wrapper — a direct connection.
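
If you want to verify this yourself, you can inspect what is bound to the port. Here is a small check for macOS/Linux, assuming Ollama's default port:

```bash
# Show the local process listening on Ollama's default port.
# You should see ollama bound to localhost; no remote endpoints involved.
lsof -i :11434
```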

Privacy Architecture

When using a self-hosted model:

  • Prompts stay on your machine
  • Responses stay on your machine
  • Browsing context (page content, screenshots) stays on your machine
  • No telemetry sent about your queries
  • No API keys needed — the model runs locally
  • ❌ Cloud providers are not contacted at all

This makes self-hosted mode ideal for:

  • Legal professionals handling privileged communications
  • Healthcare workers with HIPAA requirements
  • Researchers working with sensitive data
  • Anyone who values privacy

Recommended Models

| Model | Parameters | RAM Needed | Best For |
| --- | --- | --- | --- |
| qwen3.5 | 7B+ | ~6 GB | Default recommendation with strong reasoning and tool use |
| llama3.1:8b | 8B | ~6 GB | Great general-purpose model |
| qwen3:4b | 4B | ~3 GB | Good for machines with 8 GB RAM |
| deepseek-r1:8b | 8B | ~6 GB | Strong reasoning and math |
| smollm2:1.7b | 1.7B | ~1.5 GB | Runs on anything, good for testing |

Hardware Rule of Thumb

Models need approximately 1.2× their file size in available RAM to run. An 8 GB model needs ~10 GB free RAM.
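
The sketch below is just that rule of thumb as a one-liner; the 1.2 multiplier is the approximation given above, not a precise measurement:

```bash
# Estimate free RAM needed for a model, using the ~1.2x rule of thumb.
# Usage: ram_needed <model-file-size-in-GB>
ram_needed() {
  awk -v size="$1" 'BEGIN { printf "~%.1f GB free RAM\n", size * 1.2 }'
}

ram_needed 8   # -> ~9.6 GB free RAM
```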