
Self-Hosting Large Language Models in Trusted Execution Environments

A Comprehensive Survey of GPU and TPU Security Architectures

Technical Research Report
Date: January 2026
Version: 1.0

Abstract

Deploying Large Language Models (LLMs) such as Llama, DeepSeek, and Mistral in privacy-sensitive environments raises significant challenges for data confidentiality and the protection of model intellectual property. Trusted Execution Environments (TEEs) offer a hardware-based way to protect data in use through cryptographic isolation and remote attestation. This paper surveys current TEE technologies for GPU- and TPU-accelerated LLM inference, examining performance overheads, implementation architectures, and practical deployment considerations for privacy-focused enterprise customers. Our analysis synthesizes findings from recent academic research (2023-2025) and industry implementations to provide actionable guidance for self-hosted confidential AI inference systems.

Keywords: Trusted Execution Environment, Large Language Models, Confidential Computing, GPU TEE, NVIDIA H100, AMD SEV-SNP, Intel TDX, Privacy-Preserving AI

Table of Contents

  1. Introduction
  2. Background: Trusted Execution Environments
  3. TEE Technologies for AI Accelerators
  4. Performance Analysis and Benchmarks
  5. Practical Implementation Architectures
  6. Cloud Provider Offerings
  7. Open-Source Frameworks and Tools
  8. TPU TEE Considerations
  9. Security Analysis and Threat Model
  10. Recommendations for Self-Hosted Deployment
  11. Future Directions
  12. Conclusion
  13. References

1. Introduction

1.1 Problem Statement

The rapid advancement of Large Language Models has created unprecedented opportunities for enterprise AI applications. However, deploying these models in cloud environments raises critical concerns:

  1. Data Confidentiality: User prompts and responses may contain sensitive personal, financial, or healthcare information
  2. Model Intellectual Property: Fine-tuned models represent significant investment and competitive advantage
  3. Regulatory Compliance: GDPR, HIPAA, and industry-specific regulations mandate data protection during processing
  4. Supply Chain Trust: Organizations may not fully trust cloud infrastructure operators

Traditional encryption protects data at rest and in transit, but data must be decrypted for processing, creating a vulnerability window. Confidential computing addresses this "data-in-use" protection gap through hardware-enforced isolation.

1.2 Scope and Objectives

This research examines:

1.3 Contribution

We provide the first comprehensive synthesis of TEE-based LLM deployment strategies, consolidating findings from 15+ academic papers (2023-2025) with industry documentation to deliver actionable guidance for privacy-focused enterprise deployments.

2. Background: Trusted Execution Environments

2.1 Definition and Properties

A Trusted Execution Environment (TEE) is a secure area within a processor that provides:

  1. Confidentiality: Code and data are protected from external access, including privileged software
  2. Integrity: Execution cannot be tampered with by external entities
  3. Attestation: Remote parties can verify the TEE's identity and configuration

The Confidential Computing Consortium (CCC), whose members include Microsoft, Google, and AMD, defines confidential computing as:

"Confidential Computing protects data in use by performing computation in a hardware-based, attested Trusted Execution Environment. These secure and isolated environments prevent unauthorized access or modification of applications and data while in use." [1]

2.2 Evolution of TEE Technologies

| Generation | Technology | Isolation Granularity | Memory Protection |
| --- | --- | --- | --- |
| 1st Gen | Intel SGX | Process (Enclave) | Encrypted memory regions |
| 2nd Gen | AMD SEV | Virtual Machine | Full VM encryption |
| 3rd Gen | AMD SEV-SNP, Intel TDX | VM with integrity | Encryption + integrity checking |
| 4th Gen | NVIDIA CC, AMD SEV-TIO | VM + Accelerator | End-to-end GPU/peripheral protection |

2.3 Threat Model

TEEs protect against:

TEEs do NOT protect against:

3. TEE Technologies for AI Accelerators

3.1 NVIDIA Confidential Computing (Hopper Architecture)

NVIDIA introduced hardware-based confidential computing with the H100 (Hopper) GPU architecture, representing the first production-ready GPU TEE solution [2].

3.1.1 Architecture

┌─────────────────────────────────────────────────┐
│                 NVIDIA H100 GPU                 │
│ ┌─────────────────────────────────────────────┐ │
│ │ On-chip AES-256 Encryption Engine           │ │
│ │   - HBM3 memory encryption at line rate     │ │
│ │   - Per-VM key isolation                    │ │
│ └─────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────┐ │
│ │ Secure Boot & Attestation                   │ │
│ │   - Hardware root of trust                  │ │
│ │   - Firmware measurement and reporting      │ │
│ └─────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────┐ │
│ │ NVLink Encryption                           │ │
│ │   - GPU-to-GPU encrypted communication      │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘

3.1.2 Operational Modes

| Mode | Description | Use Case |
| --- | --- | --- |
| CC-On | Full confidential computing with memory encryption | Production deployment |
| CC-DevTools | Relaxed security for debugging | Development/testing |
| CC-Off | Standard GPU operation | Non-confidential workloads |

3.1.3 Key Features

Documentation: https://docs.nvidia.com/nvtrust/

3.2 AMD SEV-SNP (Secure Encrypted Virtualization - Secure Nested Paging)

AMD SEV-SNP provides VM-level memory encryption with integrity protection, essential for protecting the CPU portion of LLM workloads [3].

3.2.1 Evolution by EPYC Generation

| Generation | Processor | Key Features |
| --- | --- | --- |
| 1st Gen | EPYC 7001 (Naples) | SEV: 128-bit AES, 15 encryption keys |
| 2nd Gen | EPYC 7002 (Rome) | SEV-ES: Encrypted register state, 509 keys |
| 3rd Gen | EPYC 7003 (Milan) | SEV-SNP: Integrity protection via RMP |
| 4th Gen | EPYC 9004 (Genoa) | 256-bit AES-XTS, CXL memory encryption |
| 5th Gen | EPYC 9005 (Turin) | SEV-TIO: Trusted I/O for PCIe devices, 1006 keys |

3.2.2 SEV-SNP Security Features

3.2.3 SEV-TIO for GPU Integration

AMD's SEV-TIO (Trusted I/O) extends confidential computing to PCIe devices via the TDISP (TEE Device Interface Security Protocol):

┌─────────────────┐     TDISP      ┌─────────────────┐
│  CPU (SEV-SNP)  │◄──────────────►│  GPU (H100 CC)  │
│  Encrypted VM   │   Encrypted    │  Encrypted HBM  │
└─────────────────┘   PCIe Link    └─────────────────┘

Documentation: https://www.amd.com/en/developer/sev.html

3.3 Intel TDX (Trust Domain Extensions)

Intel TDX provides VM-level isolation through Trust Domains (TDs) with hardware memory encryption [4].

3.3.1 Architecture Components

3.3.2 AI-Specific Features

3.4 ARM TrustZone and CCA

ARM's Confidential Compute Architecture (CCA) provides TEE capabilities for mobile and edge deployments [6].

3.4.1 Realm Management Extension (RME)

3.4.2 TZ-LLM Research

Wang et al. (2025) demonstrated on-device LLM protection using ARM TrustZone [7]:

4. Performance Analysis and Benchmarks

4.1 GPU TEE Overhead (NVIDIA H100)

Zhu et al. (2024) conducted comprehensive benchmarks of NVIDIA H100 Confidential Computing [8]:

| Configuration | Batch Size | Input Length | Overhead |
| --- | --- | --- | --- |
| LLM Inference | Small (1-4) | Short (128) | ~7% |
| LLM Inference | Medium (8-16) | Medium (512) | ~4% |
| LLM Inference | Large (32+) | Long (2048) | ~0% |

Key Finding: Overhead is dominated by CPU-GPU PCIe data transfer encryption, not GPU computation. Larger batch sizes amortize transfer costs, approaching native performance.
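The amortization effect can be illustrated with a toy cost model. This is a sketch under assumed numbers, not a reproduction of the measurements in [8]: the function `cc_overhead` and the parameters `fixed_cc_cost_ms`, `transfer_ms`, and `compute_ms_per_token` are hypothetical and chosen only to show the trend.

```python
def cc_overhead(batch_size: int, tokens_per_seq: int,
                fixed_cc_cost_ms: float = 0.2,
                transfer_ms: float = 0.5,
                compute_ms_per_token: float = 0.02) -> float:
    """Relative slowdown of CC mode versus native for one batch (toy model)."""
    # GPU compute grows with the number of tokens in the batch.
    compute_ms = batch_size * tokens_per_seq * compute_ms_per_token
    native_ms = transfer_ms + compute_ms
    # ASSUMPTION: CC mode adds a roughly fixed encryption cost per transfer,
    # independent of how much GPU compute the batch triggers.
    cc_ms = native_ms + fixed_cc_cost_ms
    return cc_ms / native_ms - 1.0


for batch, tokens in [(1, 128), (8, 512), (32, 2048)]:
    overhead = cc_overhead(batch, tokens)
    print(f"batch={batch:>2} tokens={tokens:>4} overhead={overhead:.2%}")
```

Because the extra cost sits on the transfer path while the denominator grows with batch size and input length, the relative overhead shrinks as batches get larger. The absolute percentages depend entirely on the assumed parameters.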

4.2 CPU TEE Performance (Intel TDX/AMD SEV-SNP)

Chrapek et al. (2025) from ETH Zurich published the first comprehensive CPU+GPU TEE benchmark [5]:

| Platform | Metric | Overhead | Notes |
| --- | --- | --- | --- |
| Intel TDX | Throughput | <10% | With AMX acceleration |
| Intel TDX | Latency | <20% | End-to-end inference |
| AMD SEV-SNP | Throughput | <10% | EPYC 9004 series |

Models Tested: Llama2-7B, 13B, 70B

4.3 Hybrid TEE-GPU Architectures

Recent research explores splitting computation between CPU TEE and GPU for optimal security/performance trade-offs:

| System | Architecture | Speedup vs CPU-Only TEE | Reference |
| --- | --- | --- | --- |
| PKUS | TEE for adapters, GPU for backbone | 8.1-11.9x | Cai et al. (2025) [9] |
| TwinShield | TEE for sensitive layers, GPU offload | 4.0-6.1x | Xue et al. (2025) [10] |
| SecureInfer | SGX for non-linear, GPU for linear ops | Varies | Nayan et al. (2025) [11] |

4.4 Comparative Cost Analysis

Chrapek et al. (2025) analyzed cost-effectiveness across platforms [5]:

| Deployment | $/Token (Relative) | Best For |
| --- | --- | --- |
| GPU (No TEE) | 1.0x | Non-sensitive workloads |
| GPU + CPU TEE | 1.05-1.08x | Production confidential AI |
| CPU-Only TEE | 8-12x | Maximum security, small models |
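To turn the relative figures into a budget estimate, the multipliers can be applied to a baseline price. The sketch below is illustrative only: the multipliers come from the table above, while the baseline of 2.00 USD per million tokens and the names `RELATIVE_COST` and `cost_range` are hypothetical.

```python
# Relative $/token multipliers (low, high) taken from the cost table.
RELATIVE_COST = {
    "GPU (No TEE)": (1.0, 1.0),
    "GPU + CPU TEE": (1.05, 1.08),
    "CPU-Only TEE": (8.0, 12.0),
}


def cost_range(baseline_usd_per_mtok: float, deployment: str) -> tuple[float, float]:
    """Return the (low, high) cost per million tokens for a deployment type."""
    low, high = RELATIVE_COST[deployment]
    return baseline_usd_per_mtok * low, baseline_usd_per_mtok * high


# ASSUMPTION: a hypothetical non-TEE baseline of 2.00 USD per million tokens.
for name in RELATIVE_COST:
    low, high = cost_range(2.00, name)
    print(f"{name:<14} {low:.2f} to {high:.2f} USD per million tokens")
```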

5. Practical Implementation Architectures

5.1 Reference Architecture: End-to-End Confidential LLM

┌───────────────────────────────────────────────────────────────────────┐
│               Confidential VM (AMD SEV-SNP / Intel TDX)               │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                        Application Layer                        │  │
│  │  ┌──────────────────┐  ┌──────────────────┐  ┌───────────────┐  │  │
│  │  │ vLLM / TGI /     │  │ Attestation      │  │ Model         │  │  │
│  │  │ llama.cpp        │  │ Client           │  │ Encryption    │  │  │
│  │  └──────────────────┘  └──────────────────┘  └───────────────┘  │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                          Runtime Layer                          │  │
│  │  - Confidential Container Runtime (CoCo)                        │  │
│  │  - Guest OS with minimal TCB                                    │  │
│  │  - Encrypted memory (CPU TEE enforced)                          │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                         Hardware Layer                          │  │
│  │  - NVIDIA H100 GPU (CC Mode) - Encrypted HBM3                   │  │
│  │  - AMD EPYC 9004+ (SEV-SNP) / Intel Xeon 4th Gen+ (TDX)         │  │
│  │  - Encrypted PCIe via TDISP                                     │  │
│  └─────────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────────┘

5.2 Attestation Flow

┌──────────┐     1. Request      ┌──────────────┐
│  Client  │ ──────────────────► │ Confidential │
│          │                     │      VM      │
│          │ ◄────────────────── │              │
│          │   2. Attestation    │              │
│          │      Evidence       │              │
└──────────┘                     └──────────────┘
     │                                  ▲
     │ 3. Verify                        │
     ▼                                  │
┌──────────────────┐                    │
│   Attestation    │  4. Certificate    │
│     Service      │ ───────────────────┘
│  (NVIDIA/AMD/    │
│   Intel/Azure)   │
└──────────────────┘
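The challenge-response pattern in this flow can be sketched in a few lines. This is a simplified illustration, not a vendor protocol: real evidence is signed by a hardware-held asymmetric key and validated against a vendor certificate chain, whereas the sketch uses an HMAC with a shared demo key purely to stay within the Python standard library. The names `make_evidence`, `verify_evidence`, and `DEMO_SIGNING_KEY` are hypothetical, and `verify_evidence` plays the role of the attestation service.

```python
import hashlib
import hmac
import secrets

# ASSUMPTION: stand-in for the hardware root of trust. A real TEE signs its
# report with an asymmetric key that chains to the silicon vendor.
DEMO_SIGNING_KEY = b"demo-only-key"


def make_evidence(nonce: bytes, measurement: bytes,
                  key: bytes = DEMO_SIGNING_KEY) -> dict:
    """Step 2: the confidential VM binds the client's nonce to its measurement."""
    signature = hmac.new(key, nonce + measurement, hashlib.sha256).digest()
    return {"nonce": nonce, "measurement": measurement, "signature": signature}


def verify_evidence(evidence: dict, expected_nonce: bytes,
                    expected_measurement: bytes,
                    key: bytes = DEMO_SIGNING_KEY) -> bool:
    """Steps 3-4: check the signature, nonce freshness, and measurement."""
    expected_sig = hmac.new(
        key, evidence["nonce"] + evidence["measurement"], hashlib.sha256
    ).digest()
    return (
        hmac.compare_digest(evidence["signature"], expected_sig)
        and hmac.compare_digest(evidence["nonce"], expected_nonce)
        and hmac.compare_digest(evidence["measurement"], expected_measurement)
    )


# Step 1: the client sends a fresh nonce with its request.
nonce = secrets.token_bytes(32)
# Reference value the client expects for an approved guest image (illustrative).
golden = hashlib.sha256(b"approved guest image").digest()
evidence = make_evidence(nonce, golden)
print(verify_evidence(evidence, nonce, golden))
```

The fresh nonce is what prevents replay: evidence captured from an earlier session fails the freshness check even if its signature is valid.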

Attestation Evidence Includes:

5.3 Deployment Models

5.3.1 Cloud-Based Confidential VMs

Advantages:

Suitable For: Organizations preferring operational simplicity

5.3.2 On-Premise Deployment

Requirements:

Advantages:

5.3.3 Hybrid TEE-GPU (Research Stage)

Architecture: Security-critical operations (attention heads, adapters, non-linear layers) execute in CPU TEE; compute-intensive linear operations offload to GPU
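A minimal sketch of such a placement rule follows. It is not taken from any of the cited systems: the `Op` type, the `sensitive` flag, and the set `LINEAR_KINDS` are hypothetical names used to illustrate the split described above.

```python
from dataclasses import dataclass

# Op kinds treated as compute-intensive linear algebra (illustrative set).
LINEAR_KINDS = {"linear", "matmul"}


@dataclass(frozen=True)
class Op:
    name: str
    kind: str                # e.g. "linear", "softmax", "layernorm"
    sensitive: bool = False  # e.g. a proprietary adapter


def place(ops: list[Op]) -> dict[str, str]:
    """Map each op name to "cpu_tee" or "gpu"."""
    placement = {}
    for op in ops:
        # Only non-sensitive linear ops leave the CPU TEE.
        offload = op.kind in LINEAR_KINDS and not op.sensitive
        placement[op.name] = "gpu" if offload else "cpu_tee"
    return placement


layers = [
    Op("q_proj", "linear"),
    Op("softmax", "softmax"),
    Op("lora_adapter", "linear", sensitive=True),
    Op("mlp_up", "linear"),
]
print(place(layers))
```

Defaulting to the TEE and offloading only what is explicitly safe keeps mistakes on the conservative side: an op with an unrecognized kind stays protected.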

Implementations:

6. Cloud Provider Offerings

6.1 Microsoft Azure Confidential Computing

Among the providers surveyed here, Azure offers the broadest confidential computing portfolio, covering both CPU and GPU TEEs:

| VM Series | CPU TEE | GPU TEE | Use Case |
| --- | --- | --- | --- |
| DCasv5/DCadsv5 | AMD SEV-SNP | - | CPU-only workloads |
| DCesv5/DCedsv5 | Intel TDX | - | Intel-based workloads |
| NCCadsH100v5 | AMD SEV-SNP | NVIDIA H100 CC | GPU-accelerated confidential AI |

Attestation: Microsoft Azure Attestation (MAA)

Documentation: https://learn.microsoft.com/en-us/azure/confidential-computing/

6.2 Google Cloud Confidential Computing

| Service | Technology | GPU Support |
| --- | --- | --- |
| Confidential VMs (N2D) | AMD SEV | No |
| Confidential VMs (C3) | Intel TDX + AMX | No |
| Confidential VMs (A3) | Intel TDX + H100 CC | Yes |
| Confidential GKE Nodes | SEV-SNP/TDX | Varies |
| Confidential Space | Multi-party computation | Limited |

AI-Specific: C3 machine series with Intel AMX provides CPU-based matrix acceleration for inference workloads within TDX.

Documentation: https://cloud.google.com/confidential-computing

6.3 AWS Nitro Enclaves

AWS takes a different approach with hypervisor-based isolation:

| Feature | Capability |
| --- | --- |
| Isolation Type | Hypervisor-enforced (not hardware TEE) |
| GPU TEE Support | Not available |
| Attestation | Nitro Hypervisor signed documents |
| Use Cases | Key management, tokenization |

Limitation: Nitro Enclaves do not extend TEE protection to GPU accelerators.

Documentation: https://aws.amazon.com/ec2/nitro/nitro-enclaves/

6.4 Comparison Summary

| Provider | CPU TEE | GPU TEE | Attestation | LLM Suitability |
| --- | --- | --- | --- | --- |
| Azure | SEV-SNP, TDX | H100 CC | MAA | Excellent |
| GCP | SEV, TDX | H100 CC | Platform | Excellent |
| AWS | Nitro | None | Nitro | Limited |

7. Open-Source Frameworks and Tools

7.1 NVIDIA nvTrust

Repository: https://github.com/NVIDIA/nvtrust

Components:

Usage:

# Query the GPU ready state (reported when CC mode is active)
nvidia-smi conf-compute -grs

# Generate attestation evidence
python3 -m nvtrust.attestation.generate_evidence

7.2 Confidential Containers (CoCo)

Repository: https://github.com/confidential-containers/confidential-containers

Features:

Architecture:

┌─────────────────────────────────────────┐
│           Kubernetes Cluster            │
│  ┌───────────────────────────────────┐  │
│  │         Confidential Pod          │  │
│  │  ┌─────────────────────────────┐  │  │
│  │  │   Container (unmodified)    │  │  │
│  │  └─────────────────────────────┘  │  │
│  │  ┌─────────────────────────────┐  │  │
│  │  │   Kata Containers Runtime   │  │  │
│  │  └─────────────────────────────┘  │  │
│  │  ┌─────────────────────────────┐  │  │
│  │  │    TEE VM (SEV-SNP/TDX)     │  │  │
│  │  └─────────────────────────────┘  │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘

7.3 Additional Tools

| Project | Purpose | URL |
| --- | --- | --- |
| Gramine | Library OS for Intel SGX | https://github.com/gramineproject/gramine |
| Occlum | Memory-safe LibOS for SGX | https://github.com/occlum/occlum |
| Enarx | Cross-platform TEE runtime | https://github.com/enarx/enarx |
| AMD SEV Tool | SEV platform management | https://github.com/AMDESE/sev-tool |

8. TPU TEE Considerations

8.1 Current State: No Native TPU TEE Support

As of January 2026, Google TPUs do not provide native TEE capabilities. This represents a significant gap for organizations requiring hardware-enforced confidential computing on TPU infrastructure.

8.2 Architectural Challenges

| Challenge | Description |
| --- | --- |
| Custom ASIC Design | TPUs lack built-in TEE mechanisms present in general-purpose processors |
| Memory Architecture | TPU HBM does not have hardware encryption like NVIDIA's implementation |
| Attestation | No hardware root of trust for TPU-specific attestation |
| Multi-tenancy | TPU pods share infrastructure without hardware isolation |

8.3 Current Workarounds

  1. Confidential VM Wrapper: Run TPU workloads within SEV-SNP/TDX Confidential VMs

    • Protects CPU memory and control plane
    • Does NOT extend protection to TPU memory/computation
  2. Software-Based Encryption: Encrypt data before sending to TPU

    • Protects data in transit
    • Data must be decrypted for TPU computation
  3. Differential Privacy: Apply noise to protect individual data points

    • Provides statistical privacy guarantees
    • Impacts model accuracy
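The third workaround can be illustrated with the Laplace mechanism, a standard way to add calibrated noise to an aggregate statistic. The sketch below is generic and not specific to TPUs: the function names, the counting query, and the `epsilon` value are illustrative assumptions.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample."""
    # The difference of two i.i.d. exponential samples is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism)."""
    # One individual changes a count by at most `sensitivity`,
    # so the noise scale is sensitivity / epsilon.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
print(private_count(1000, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives stronger privacy and more noise, which is the accuracy cost noted in the list above.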

8.4 Research Gap

We are not aware of any academic papers (as of 2025) that describe native TPU TEE implementations. This remains an open research area.

9. Security Analysis and Threat Model

9.1 What TEEs Protect Against

| Threat | Protection Level | Notes |
| --- | --- | --- |
| Cloud operator access | Strong | Cannot access encrypted memory |
| Hypervisor compromise | Strong | Memory remains encrypted |
| Physical memory dump | Strong | Cold boot attacks mitigated |
| Privileged malware | Strong | Cannot access TEE memory |
| Network eavesdropping | N/A (use TLS) | TEE scope is compute |

9.2 Residual Risks

| Risk | Mitigation | Status |
| --- | --- | --- |
| Side-channel attacks | Hardware countermeasures, software mitigations | Partially mitigated in Hopper/Genoa |
| Speculative execution | Microcode updates | Ongoing |
| Supply chain attacks | Hardware attestation, firmware verification | Strong |
| Application vulnerabilities | Secure coding, minimal TCB | Developer responsibility |

9.3 Attestation Importance

Critical Principle: Never trust a TEE without verifying its attestation evidence.
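One way to enforce this principle is to gate secrets, such as a model decryption key, on the attestation verdict. The sketch below assumes a verifier has already produced a set of claims. The `AttestationClaims` type, its fields, and `release_model_key` are hypothetical and not part of any vendor SDK; the mode names come from Section 3.1.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttestationClaims:
    verified: bool    # signature chain and nonce were checked by the verifier
    cc_mode: str      # "CC-On", "CC-DevTools", or "CC-Off"
    measurement: str  # hex digest of the measured guest and firmware


def release_model_key(claims: AttestationClaims,
                      allowed_measurements: set[str],
                      model_key: bytes) -> bytes:
    """Return the model key only if every policy check passes."""
    if not claims.verified:
        raise PermissionError("attestation evidence was not verified")
    # CC-DevTools relaxes security for debugging, so it is rejected here.
    if claims.cc_mode != "CC-On":
        raise PermissionError(f"GPU mode {claims.cc_mode!r} is not allowed")
    if claims.measurement not in allowed_measurements:
        raise PermissionError("measurement is not on the allowlist")
    return model_key


claims = AttestationClaims(verified=True, cc_mode="CC-On", measurement="ab12")
print(release_model_key(claims, {"ab12"}, b"secret-key") == b"secret-key")
```

Failing closed with an exception, rather than returning a flag the caller might ignore, keeps the key out of reach whenever any check is skipped or fails.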

Attestation ensures:

10. Recommendations for Self-Hosted Deployment

10.1 Hardware Selection

| Component | Recommendation | Rationale |
| --- | --- | --- |
| GPU | NVIDIA H100 (80GB HBM3) | Only production GPU with TEE support |
| CPU | AMD EPYC 9004+ (Genoa) | Best SEV-SNP support, TDISP ready |
| Alternative CPU | Intel Xeon 4th Gen+ | TDX support, AMX acceleration |
| Memory | DDR5 with ECC | Required for enterprise reliability |

10.2 Software Stack

# Recommended Configuration
Operating System: Ubuntu 22.04+ / RHEL 9+
Kernel: Linux 6.2+ (TDX/SEV-SNP support)
GPU Driver: NVIDIA Datacenter Driver 550+ (TRD release)
Container Runtime: containerd + Kata Containers
Orchestration: Kubernetes 1.28+ with CoCo operator
Inference Engine: vLLM / TensorRT-LLM / llama.cpp
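A pre-flight script can check the simplest of these requirements before deployment. The sketch below covers only the kernel version listed above. The function `kernel_at_least` is a hypothetical helper, and a sufficiently new version alone does not guarantee that a distribution kernel was built with TDX or SEV-SNP support.

```python
import platform
import re

# Minimum kernel version from the recommended configuration.
MIN_KERNEL = (6, 2)


def kernel_at_least(release: str, minimum: tuple[int, int] = MIN_KERNEL) -> bool:
    """Compare a release string such as "6.5.0-14-generic" against a minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


# Only meaningful on Linux hosts; other platforms report unrelated versions.
if platform.system() == "Linux":
    print(kernel_at_least(platform.release()))
```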

10.3 Deployment Checklist

10.4 Model-Specific Considerations

| Model | Parameters | Memory Requirement | Recommended Configuration |
| --- | --- | --- | --- |
| Llama-3.1-8B | 8B | ~16GB | Single H100 80GB CC |
| Mistral-7B | 7B | ~14GB | Single H100 80GB CC |
| DeepSeek-R1-32B | 32B | ~64GB | Single H100 80GB CC |
| Llama-3.1-70B | 70B | ~140GB | 2x H100 with NVLink CC |
| DeepSeek-R1-671B | 671B | ~1.3TB | Multi-node with secure NVLink |
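The memory figures in the table correspond to 2 bytes per parameter (16-bit weights). A rough sizing helper under that assumption is sketched below. The function names are hypothetical, and the estimate covers weights only: KV cache and activations need additional headroom.

```python
import math


def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight memory in GB (1e9 params times bytes, over 1e9 bytes/GB)."""
    return params_billion * bytes_per_param


def min_gpus(params_billion: float, gpu_memory_gb: float = 80.0,
             bytes_per_param: float = 2.0) -> int:
    """Lower bound on GPUs needed to hold the weights alone."""
    return math.ceil(weight_memory_gb(params_billion, bytes_per_param) / gpu_memory_gb)


for name, size in [("Llama-3.1-8B", 8), ("Llama-3.1-70B", 70), ("DeepSeek-R1-671B", 671)]:
    print(f"{name}: ~{weight_memory_gb(size):.0f} GB, at least {min_gpus(size)} x 80GB GPU")
```

Lower-precision weights shrink the footprint proportionally: passing `bytes_per_param=1.0` halves each estimate.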

11. Future Directions

11.1 Emerging Technologies

  1. NVIDIA Blackwell Architecture: Expected enhanced CC capabilities with increased performance
  2. AMD SEV-TIO GA: Full Trusted I/O support for GPU passthrough
  3. Intel TDX 2.0: Improved performance and attestation features
  4. TPU TEE: Potential future Google development (not announced)

11.2 Research Frontiers

  1. Hybrid TEE Architectures: Optimizing security/performance trade-offs for specific model architectures
  2. Secure Multi-Party Computation: Combining TEE with MPC for multi-stakeholder scenarios
  3. Homomorphic Encryption + TEE: Exploring complementary cryptographic techniques
  4. Formal Verification: Proving security properties of TEE-based AI systems

11.3 Standardization Efforts

12. Conclusion

Trusted Execution Environments provide a practical and production-ready solution for self-hosting LLMs with strong privacy guarantees. The combination of NVIDIA H100 Confidential Computing with AMD SEV-SNP or Intel TDX enables end-to-end protection of data and models with acceptable performance overhead (up to about 7% for GPU inference, approaching zero at large batch sizes, and <20% for CPU TEEs).

Key Findings:

  1. GPU TEE is Production-Ready: NVIDIA H100 CC mode is deployed in major cloud providers with documented performance characteristics
  2. Overhead is Manageable: Larger batch sizes effectively amortize encryption costs
  3. Attestation is Critical: Remote verification should be mandatory for any confidential deployment
  4. TPU Gap Exists: No native TPU TEE support; use GPU for confidential workloads
  5. Open-Source Ecosystem: CoCo and nvTrust provide solid foundations for Kubernetes-based deployment

For VibeBrowser's privacy-focused enterprise customers, we recommend:

  1. Deploy on Azure NCCadsH100v5 or equivalent for fastest time-to-market
  2. Implement remote attestation verification in client applications
  3. Use Confidential Containers for Kubernetes-native orchestration
  4. Plan for on-premise deployment using H100 + EPYC 9004 for maximum control

13. References

[1] Confidential Computing Consortium, "Confidential Computing: Hardware-Based Trusted Execution for Applications and Data," CCC Whitepaper, 2022. https://confidentialcomputing.io/

[2] NVIDIA Corporation, "NVIDIA Hopper Confidential Computing," NVIDIA Documentation, 2024. https://docs.nvidia.com/nvtrust/

[3] AMD Inc., "AMD SEV-SNP: Strengthening VM Isolation with Integrity Protection and More," AMD Technical Documentation, 2024. https://www.amd.com/en/developer/sev.html

[4] Intel Corporation, "Intel Trust Domain Extensions (Intel TDX)," Intel Technology Documentation, 2024.

[5] M. Chrapek, M. Copik, E. Mettaz, and T. Hoefler, "Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs," arXiv:2509.18886, 2025.

[6] ARM Ltd., "ARM Confidential Compute Architecture," ARM Architecture Documentation, 2024.

[7] X. Wang, J. Shi, Z. Zhao, Y. Yu, Z. Hua, and J. Gu, "TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone," arXiv:2511.13717, 2025.

[8] S. Zhu et al., "Benchmarking NVIDIA H100 Confidential Computing for LLM Inference," arXiv:2409.03992, 2024.

[9] Y. Cai, Z. An, Y. Meng, H. Liu, P. Wang, H. Lei, Y. Guo, and D. Li, "PKUS: Trustworthy and Controllable Professional Knowledge Utilization in LLMs with TEE-GPU Execution," arXiv:2512.16238, 2025.

[10] J. Xue, Y. Zhao, M. Zheng, F. Yao, Y. Solihin, and Q. Lou, "TwinShield: Securing Transformer-based AI Execution via Unified TEEs and Crypto-protected Accelerators," arXiv:2507.03278, 2025.

[11] T. Nayan, Z. Zhang, and R. Sun, "SecureInfer: Heterogeneous TEE-GPU Architecture for Privacy-Critical Tensors for LLM Deployment," arXiv:2510.19979, 2025.

[12] A. Chan, A. Ding, F. Chen, A. Wu, B. Zhang, and A. Tian, "Optimistic TEE-Rollups: A Hybrid Architecture for Scalable and Verifiable Generative AI Inference on Blockchain," arXiv:2512.20176, 2025.

[13] R. Zhang, Y. Zhao, N. Javidnia, M. Zheng, and F. Koushanfar, "AttestLLM: Efficient Attestation Framework for Billion-scale On-device LLMs," arXiv:2509.06326, 2025.

[14] H. Yu, Y. Wang, F. Dai, D. Liu, H. Fan, and X. Gu, "CMIF: Towards Confidential and Efficient LLM Inference with Dual Privacy Protection," arXiv:2509.09091, 2025.

[15] D. Ben, H. Feng, and Q. Wang, "Distilled Large Language Model in Confidential Computing Environment for System-on-Chip Design," arXiv:2507.16226, 2025.

Appendix A: Glossary

| Term | Definition |
| --- | --- |
| TEE | Trusted Execution Environment |
| SEV | Secure Encrypted Virtualization (AMD) |
| SNP | Secure Nested Paging (AMD) |
| TDX | Trust Domain Extensions (Intel) |
| CC | Confidential Computing |
| VCEK | Versioned Chip Endorsement Key |
| RMP | Reverse Map Table |
| TDISP | TEE Device Interface Security Protocol |
| CoCo | Confidential Containers |
| MAA | Microsoft Azure Attestation |
| HBM | High Bandwidth Memory |

Appendix B: Quick Reference Commands

NVIDIA GPU CC Mode Configuration

# Check whether the CC feature is active
nvidia-smi conf-compute -f

# Query the GPU ready state
nvidia-smi conf-compute -grs

# Set the GPU ready state so it accepts workloads (typically after attestation succeeds)
nvidia-smi conf-compute -srs 1

AMD SEV-SNP Verification

# Check SEV capability
dmesg | grep -i sev

# Verify SNP is enabled
cat /sys/module/kvm_amd/parameters/sev_snp
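The same check can be scripted for use in provisioning or health checks. The sketch below reads the sysfs path shown above. The helper name `snp_enabled` is hypothetical, and the assumption that the file contains `Y` or `1` when SNP is enabled should be confirmed on the target kernel.

```python
from pathlib import Path

# Host-side KVM module parameter, as used in the shell command above.
SNP_PARAM = Path("/sys/module/kvm_amd/parameters/sev_snp")


def snp_enabled(param_path: Path = SNP_PARAM) -> bool:
    """Return True if the kvm_amd module reports SEV-SNP as enabled."""
    try:
        value = param_path.read_text().strip()
    except OSError:
        # Missing file: not an AMD host, module not loaded, or kernel too old.
        return False
    # ASSUMPTION: boolean module parameters read back as "Y"/"N" or "1"/"0".
    return value in {"Y", "1"}


print(snp_enabled())
```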

Attestation SDK Usage

# NVIDIA Attestation (Python)
from nvtrust.attestation import verify_gpu_attestation

result = verify_gpu_attestation(gpu_index=0)
if result.verified:
    print("GPU attestation verified")

Document prepared for VibeBrowser internal research purposes. Last updated: January 2026.