Zero-Trust RAG Design
Version 1.0 | Last Updated: December 2024
Abstract
This document describes an architecture for Retrieval-Augmented Generation (RAG) systems built on zero-trust security principles. The design enforces document-level access control, encrypts data at rest, in transit, and in memory, and writes audit log entries at every stage of the retrieval pipeline. Target audience: security architects, ML engineers, and infrastructure teams implementing enterprise RAG systems for regulated industries (healthcare, finance, defense).
Zero-Trust Principles Applied to RAG
Zero-trust architecture operates on the assumption that threats exist both inside and outside the network. Applied to RAG systems, every retrieval request is treated as potentially hostile until proven otherwise.
1. Never Trust, Always Verify
Every query is authenticated and authorized before accessing the vector database. No implicit trust based on network location or previous access.
2. Assume Breach
Design the system as if attackers are already inside. Encrypt data in motion and at rest. Limit lateral movement through micro-segmentation.
3. Verify Explicitly
Use all available data points for authentication: user identity, device health, location, time of access, risk score.
4. Least Privilege Access
Users retrieve only the documents they need. Document-level permissions are enforced dynamically at query time rather than fixed at ingestion.
5. Continuous Monitoring
Log and analyze all retrievals in real-time. Detect anomalies (unusual access patterns, bulk retrievals, off-hours queries).
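The anomaly checks named above (bulk retrievals, off-hours queries) can be illustrated with a small in-memory monitor. This is a minimal sketch, not a production detector: the class name `RetrievalMonitor`, the 5-minute window, the 50-document limit, and the 08:00 to 18:59 business hours are all assumptions chosen for illustration, and a real deployment would more likely feed these events into a SIEM or stream processor.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; tune them against your own baseline traffic.
BULK_WINDOW = timedelta(minutes=5)
BULK_LIMIT = 50                 # max documents per user inside the window
BUSINESS_HOURS = range(8, 19)   # hours 8..18 inclusive


class RetrievalMonitor:
    """Flags bulk retrievals and off-hours queries per user."""

    def __init__(self) -> None:
        # user_id -> deque of (timestamp, documents_retrieved)
        self._events = defaultdict(deque)

    def record(self, user_id: str, doc_count: int, when: datetime) -> list:
        events = self._events[user_id]
        events.append((when, doc_count))
        # Drop events that have fallen out of the sliding window.
        while events and when - events[0][0] > BULK_WINDOW:
            events.popleft()

        alerts = []
        if sum(count for _, count in events) > BULK_LIMIT:
            alerts.append("bulk_retrieval")
        if when.hour not in BUSINESS_HOURS:
            alerts.append("off_hours")
        return alerts
```

Each call to `record` returns the list of alerts triggered by that retrieval, so the caller can decide whether to log, notify, or block.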
Secure RAG Pipeline Architecture
Complete Zero-Trust RAG Pipeline
- 1. Authentication: SSO/SAML/OAuth; MFA required; session validation
- 2. Authorization: check user roles; validate permissions; risk assessment
- 3. Input validation: sanitize query; detect injection; rate limit check
- 4. Embedding: convert query to vector; encrypt in memory; log query hash
- 5. Vector search: retrieve top-K similar docs (K=20, pre-filter)
- 6. ACL filter: check user permissions; remove unauthorized docs; keep top-5 (post-filter)
- 7. Access audit: document IDs retrieved; user ID and timestamp; access justification
- 8. LLM generation: context + query → LLM; generate response
- 9. Output filtering: DLP scan for PII/PHI; toxicity check; citation generation
- 10. Response audit: full response logged; user feedback captured
Figure 1: Zero-trust RAG pipeline with security controls at each stage. Every step validates authorization and logs actions for audit trail. Grey boxes indicate security checkpoints.
Document-Level Access Control
RBAC (Role-Based Access Control)
Traditional approach: Assign roles to users, assign permissions to roles.
Example RBAC Configuration:
{
  "roles": {
    "healthcare_provider": {
      "permissions": ["read:patient_records", "read:treatment_plans"]
    },
    "researcher": {
      "permissions": ["read:anonymized_data", "read:publications"]
    },
    "admin": {
      "permissions": ["read:*", "write:*", "delete:*"]
    }
  },
  "users": {
    "dr.smith@hospital.com": ["healthcare_provider"],
    "researcher.jones@university.edu": ["researcher"]
  }
}
ABAC (Attribute-Based Access Control)
More flexible: Access decisions based on attributes (user, resource, environment).
Example ABAC Policy:
ALLOW access IF:
    user.department == document.department
    AND user.clearance_level >= document.classification_level
    AND current_time.hour >= 8
    AND current_time.hour <= 18
    AND user.location == "corporate_network"
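The example policy translates almost directly into a predicate over user, document, and environment attributes. The sketch below assumes two hypothetical attribute containers, `UserAttrs` and `DocAttrs`, that carry exactly the fields the policy mentions; a real system would typically load policies from a policy engine rather than hard-code them.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UserAttrs:          # hypothetical container for user attributes
    department: str
    clearance_level: int
    location: str


@dataclass(frozen=True)
class DocAttrs:           # hypothetical container for document attributes
    department: str
    classification_level: int


def abac_allows(user: UserAttrs, doc: DocAttrs, now: datetime) -> bool:
    """Return True only if every condition of the example policy holds."""
    return (
        user.department == doc.department
        and user.clearance_level >= doc.classification_level
        and 8 <= now.hour <= 18
        and user.location == "corporate_network"
    )
```

Note that the condition `current_time.hour <= 18` admits requests up to 18:59; if access should end at 18:00 sharp, the upper bound would need to be `hour < 18`.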
Dynamic Policy Evaluation
Evaluate access policies at query time, immediately after the vector similarity search and before any content reaches the LLM:
- 1. Query Vector DB: Retrieve top-20 most similar document chunks
- 2. Fetch Document Metadata: For each chunk, get: owner, classification, tags
- 3. Apply ACL Filter: Remove chunks user is not authorized to see
- 4. Return Top-5: Of the remaining authorized chunks, return highest scoring
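The four steps above reduce to "over-fetch, filter, truncate". The following is a minimal synchronous sketch of steps 3 and 4, assuming the candidates have already been fetched with their metadata attached; the `Chunk` type and its `classification` field are hypothetical stand-ins for whatever the vector database returns.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Chunk:                # hypothetical search result with metadata attached
    doc_id: str
    score: float            # similarity score, higher is better
    classification: int     # hypothetical metadata field used by the ACL check


def post_filter(
    candidates: Iterable[Chunk],
    is_authorized: Callable[[Chunk], bool],
    keep: int = 5,
) -> List[Chunk]:
    """Drop unauthorized chunks, then keep the `keep` best-scoring ones."""
    allowed = [c for c in candidates if is_authorized(c)]
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:keep]
```

If fewer than `keep` chunks survive the filter, the caller simply receives fewer results. Fetching more than 20 candidates, or pushing the ACL predicate into the vector search itself where the database supports metadata filters, are two possible mitigations.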
Encryption Strategy
| Data State | Encryption Method | Key Management |
|---|---|---|
| At-Rest (Vector DB) | AES-256-GCM | AWS KMS, Azure Key Vault, or HSM |
| In-Transit (API Calls) | TLS 1.3 | Certificate rotation every 90 days |
| In-Memory (Embeddings) | Hardware-isolated memory (Intel SGX enclaves or ARM TrustZone) | Hardware-backed keys |
| Backup Storage | AES-256-GCM + GPG | Offline keys in secure vault |
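As one illustration of the in-transit row, a Python client can refuse anything older than TLS 1.3 using only the standard `ssl` module. This is a sketch of the client side only; the function name is an assumption, and server configuration and certificate management are out of scope here.

```python
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    # create_default_context() enables certificate verification and hostname checks.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

The returned context can be passed to HTTP clients that accept an `ssl.SSLContext`, so every call to the vector database or LLM endpoint negotiates TLS 1.3 or fails.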
Key Rotation Policy
- Master Keys: Rotate every 12 months or immediately upon compromise
- Data Encryption Keys: Rotate every 90 days
- API Keys/Tokens: Rotate every 30 days
- TLS Certificates: Rotate every 90 days (automated with an ACME client, for example against Let's Encrypt)
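A rotation policy like this is easy to check mechanically. The sketch below encodes the intervals from the list above and reports whether a given key is due; the key-type names and the `rotation_due` function are assumptions, and "12 months" is approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone

# Intervals from the rotation policy; "12 months" approximated as 365 days.
ROTATION_INTERVALS = {
    "master_key": timedelta(days=365),
    "data_encryption_key": timedelta(days=90),
    "api_token": timedelta(days=30),
    "tls_certificate": timedelta(days=90),
}


def rotation_due(
    kind: str,
    created_at: datetime,
    now: datetime,
    compromised: bool = False,
) -> bool:
    """Return True if the key has reached its interval or is known compromised."""
    if compromised:
        return True  # rotate immediately, regardless of age
    return now - created_at >= ROTATION_INTERVALS[kind]
```

A scheduled job could run this check over the key inventory and open a rotation ticket, or trigger automated rotation, for every key that returns `True`.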
Audit & Compliance
Query Logging (Who, What, When)
Required Log Fields:
{
"timestamp": "2024-12-18T01:15:42Z",
"user_id": "dr.smith@hospital.com",
"session_id": "sess_a1b2c3d4e5",
"query_hash": "sha256:abc123...def456",
"retrieved_document_ids": ["doc_001", "doc_003", "doc_007"],
"access_justification": "patient_care",
"ip_address": "10.0.1.45",
"user_agent": "AgenixChat/1.0",
"response_time_ms": 342
}
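A record with these fields can be assembled as follows. This is a sketch under the assumption that the query is stored only as a SHA-256 digest, as the `query_hash` field suggests; the function name and parameter list are illustrative. For short or guessable queries a plain hash can be brute-forced, so a keyed HMAC may be preferable.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional, Sequence


def build_audit_record(
    user_id: str,
    session_id: str,
    query: str,
    doc_ids: Sequence[str],
    justification: str,
    ip_address: str,
    user_agent: str,
    response_time_ms: int,
    now: Optional[datetime] = None,
) -> dict:
    """Build one audit log entry; the raw query text is never stored."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "user_id": user_id,
        "session_id": session_id,
        "query_hash": "sha256:" + hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "retrieved_document_ids": list(doc_ids),
        "access_justification": justification,
        "ip_address": ip_address,
        "user_agent": user_agent,
        "response_time_ms": response_time_ms,
    }
```

The result is a plain dictionary, so `json.dumps(record)` produces a line suitable for an append-only log.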
Document Access Tracking
For HIPAA compliance, track every access to PHI:
- Access Logs: Who accessed which document, when, and why
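- Retention: 6 years minimum for HIPAA-required documentation (45 CFR 164.316(b)(2)(i)); a longer period such as 7 years may apply under state law or internal policy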
- Tamper-Proof: Write-once storage (WORM) or blockchain-backed logs
- Alerts: Notify on unusual access (bulk downloads, off-hours activity, use of accounts belonging to terminated employees)
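The tamper-proof requirement is usually met by the storage layer (WORM) rather than by application code, but the underlying idea can be shown with a simple hash chain. The sketch below is a tamper-evident log, not WORM storage and not a blockchain: each record stores the hash of its predecessor, so editing any earlier entry breaks verification. All function names are illustrative.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, entry: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    payload = prev_hash + json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain: list, entry: dict) -> None:
    """Append an entry whose hash covers both its content and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"entry": entry, "prev_hash": prev_hash, "hash": _digest(prev_hash, entry)}
    )


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered record makes this False."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True
```

An attacker who can rewrite the entire log can also recompute the chain, so the latest hash should be anchored somewhere the log writer cannot modify, such as WORM storage or a separate system.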
Retrieval Justification
In highly regulated environments, users must provide justification for data access:
Example Justification Flow:
- 1. User Query: "What medications is patient John Doe taking?"
- 2. System Prompt: "Why do you need access to this patient's records?"
- 3. User Response: "Ongoing treatment for scheduled appointment"
- 4. System: Logs justification, proceeds with retrieval if policy allows
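The flow above can be sketched as a gate that runs before retrieval. The `AccessRequest` type, the `gate_on_justification` function, and the three return values are assumptions for illustration; `policy_allows` stands in for whatever RBAC or ABAC check the deployment uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class AccessRequest:            # hypothetical request object
    user_id: str
    query: str
    justification: Optional[str] = None


def gate_on_justification(
    request: AccessRequest,
    policy_allows: Callable[[AccessRequest], bool],
    audit_log: List[dict],
) -> str:
    """Return 'justification_required', 'allowed', or 'denied'."""
    # Step 2: no justification yet, so prompt the user instead of retrieving.
    if not request.justification or not request.justification.strip():
        return "justification_required"

    # Step 4: record the justification together with the policy decision.
    decision = "allowed" if policy_allows(request) else "denied"
    audit_log.append(
        {
            "user_id": request.user_id,
            "justification": request.justification.strip(),
            "decision": decision,
        }
    )
    return decision
```

Denied requests are logged as well, which gives reviewers a record of attempts that the policy blocked.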
Implementation Example (Python Pseudocode)
# Zero-Trust RAG Query with ACL Filtering
# Helper names (check_rate_limit, sanitize_input, vector_db, llm, ...) are
# placeholders for services the surrounding application must provide.
import hashlib
from datetime import datetime, timezone

TOP_K_PREFILTER = 20  # Retrieve more than needed so ACL filtering leaves enough
TOP_K_FINAL = 5


async def query_rag_with_acl(query: str, user: User) -> str:
    # 1. Authentication & Authorization
    if not user.is_authenticated():
        raise Unauthorized("User not authenticated")
    if not await check_rate_limit(user.id):
        raise RateLimitExceeded("Too many requests")

    # 2. Input Validation
    query = sanitize_input(query)
    if detect_prompt_injection(query):
        log_security_event(user, "prompt_injection_attempt")
        raise SecurityViolation("Invalid query detected")

    # 3. Embedding Generation
    query_embedding = await embed_query(query)  # Encrypted in memory

    # 4. Vector Search (Pre-Filter); results are assumed sorted by similarity
    results = await vector_db.search(
        embedding=query_embedding,
        top_k=TOP_K_PREFILTER,
    )

    # 5. Document-Level ACL Filter: stop once enough authorized docs are found
    authorized_docs = []
    for doc in results:
        doc_metadata = await get_document_metadata(doc.id)
        if await check_user_permission(user, doc_metadata):
            authorized_docs.append(doc)
            if len(authorized_docs) == TOP_K_FINAL:
                break

    if not authorized_docs:
        return "No authorized documents found for your query."

    # Log only the documents that are actually passed to the LLM
    for doc in authorized_docs:
        await log_document_access(
            user_id=user.id,
            document_id=doc.id,
            timestamp=datetime.now(timezone.utc),
            justification="query_response",
        )

    # 6. Generate Response
    context = "\n".join(doc.content for doc in authorized_docs)
    response = await llm.generate(
        prompt=f"Context: {context}\n\nQuery: {query}"
    )

    # 7. Output Filtering (DLP)
    response = await filter_pii(response)
    response = await filter_phi(response)

    # 8. Log Response (query stored as a hash, matching the audit log fields)
    query_hash = "sha256:" + hashlib.sha256(query.encode("utf-8")).hexdigest()
    await log_query_response(user.id, query_hash, response)
    return response