Technical Documentation

Zero-Trust RAG Design

Version 1.0 | Last Updated: December 2024

Abstract

This document provides an architecture design for implementing Retrieval-Augmented Generation (RAG) systems with zero-trust security principles. It ensures document-level access control, encryption at every stage, and comprehensive audit logging throughout the retrieval pipeline. Target audience: security architects, ML engineers, and infrastructure teams implementing enterprise RAG systems for regulated industries (healthcare, finance, defense).

Zero-Trust Principles Applied to RAG

Zero-trust architecture operates on the assumption that threats exist both inside and outside the network. Applied to RAG systems, every retrieval request is treated as potentially hostile until proven otherwise.

1. Never Trust, Always Verify

Every query is authenticated and authorized before it reaches the vector database. No implicit trust is granted based on network location or previous access.

2. Assume Breach

Design the system as if attackers are already inside. Encrypt data in motion and at rest. Minimize lateral movement through micro-segmentation.

3. Verify Explicitly

Use all available data points for authentication: user identity, device health, location, time of access, risk score.
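
One way to combine these data points is a simple additive risk score. The sketch below is illustrative only: the `AccessContext` fields, the weights, and the deny threshold are all assumptions, not values taken from this design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessContext:
    """Signals gathered for one request (all field names are hypothetical)."""
    mfa_verified: bool
    device_compliant: bool
    on_corporate_network: bool
    hour: int  # local hour of access, 0-23


def risk_score(ctx: AccessContext) -> int:
    """Return a 0-100 risk score; the weights are illustrative, not calibrated."""
    score = 0
    if not ctx.mfa_verified:
        score += 40
    if not ctx.device_compliant:
        score += 30
    if not ctx.on_corporate_network:
        score += 20
    if not 8 <= ctx.hour <= 18:  # outside assumed business hours
        score += 10
    return score


def decide(ctx: AccessContext, deny_at: int = 50) -> str:
    """Map the score to a decision; the threshold is an assumption."""
    return "deny" if risk_score(ctx) >= deny_at else "allow"
```

In practice the score would more likely feed a policy engine (for example, to trigger step-up MFA) than produce a binary decision on its own.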

4. Least Privilege Access

Users retrieve only the documents they need. Document-level permissions are enforced dynamically at query time, not fixed at ingestion.

5. Continuous Monitoring

Log and analyze all retrievals in real-time. Detect anomalies (unusual access patterns, bulk retrievals, off-hours queries).
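
As a rough sketch of what such detection could look like, the function below scans a batch of retrieval events for two of the patterns named above: bulk retrievals and off-hours queries. The event shape (`user_id`, `timestamp`), the threshold, and the working-hours window are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

BULK_THRESHOLD = 50  # retrievals per user per batch; hypothetical value


def find_anomalies(events, bulk_threshold=BULK_THRESHOLD, work_hours=range(8, 19)):
    """Return alert strings for bulk retrievals and off-hours access.

    Each event is a dict with "user_id" (str) and "timestamp" (datetime).
    """
    alerts = []

    # Bulk retrieval: one user appears in the batch more often than allowed.
    per_user = Counter(e["user_id"] for e in events)
    for user_id, count in sorted(per_user.items()):
        if count > bulk_threshold:
            alerts.append(f"bulk_retrieval:{user_id}")

    # Off-hours: any retrieval whose hour falls outside the working window.
    off_hours_users = sorted(
        {e["user_id"] for e in events if e["timestamp"].hour not in work_hours}
    )
    alerts.extend(f"off_hours:{user_id}" for user_id in off_hours_users)
    return alerts
```

A production system would typically run this kind of rule inside a streaming pipeline or SIEM rather than over in-memory batches.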

Secure RAG Pipeline Architecture

Complete Zero-Trust RAG Pipeline

1. User Query Input
2. Authentication
  • SSO/SAML/OAuth
  • MFA Required
  • Session Validation
3. Authorization
  • Check User Roles
  • Validate Permissions
  • Risk Assessment
4. Input Validation
  • Sanitize Query
  • Detect Injection
  • Rate Limit Check
5. Embedding Generation
  • Convert to Vector
  • Encrypt in Memory
  • Log Query Hash
6. Vector Search with ACL Filtering
  Vector DB Query
    • Retrieve Top-K Similar Docs
    • K=20 (pre-filter)
  Document-Level ACL Filter
    • Check User Permissions
    • Remove Unauthorized Docs
    • Keep Top-5 (post-filter)
7. Retrieval Logging
  • Document IDs Retrieved
  • User ID/Timestamp
  • Access Justification
8. LLM Generation
  • Context + Query → LLM
  • Generate Response
9. Output Filtering
  • DLP Scan PII/PHI
  • Toxicity Check
  • Citation Generation
10. Response Logging
  • Full Response Logged
  • User Feedback Captured
11. Return to User

Figure 1: Zero-trust RAG pipeline with security controls at each stage. Every step validates authorization and logs actions for audit trail. Grey boxes indicate security checkpoints.
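
The rate limit check in stage 4 could be implemented in several ways. The sketch below uses an in-memory sliding window; the class name and parameters are hypothetical, and the default of 100 requests per hour mirrors the figure in the compliance checklist. A multi-node deployment would need a shared store instead of process memory.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowRateLimiter:
    """Allow at most max_requests per user within a rolling time window."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 3600.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # user_id -> timestamps of accepted requests

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        """Record the request and return True if it is within the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]

        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()

        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that keeps retrying does not extend its own lockout.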

Document-Level Access Control

RBAC (Role-Based Access Control)

Traditional approach: Assign roles to users, assign permissions to roles.

Example RBAC Configuration:

{
  "roles": {
    "healthcare_provider": {
      "permissions": ["read:patient_records", "read:treatment_plans"]
    },
    "researcher": {
      "permissions": ["read:anonymized_data", "read:publications"]
    },
    "admin": {
      "permissions": ["read:*", "write:*", "delete:*"]
    }
  },
  "users": {
    "dr.smith@hospital.com": ["healthcare_provider"],
    "researcher.jones@university.edu": ["researcher"]
  }
}
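
A configuration like this can be evaluated with a small lookup. The following sketch assumes the wildcard in entries such as "read:*" is meant as a glob pattern, which the configuration does not state explicitly; the function names are hypothetical.

```python
from fnmatch import fnmatchcase

ROLES = {
    "healthcare_provider": {"permissions": ["read:patient_records", "read:treatment_plans"]},
    "researcher": {"permissions": ["read:anonymized_data", "read:publications"]},
    "admin": {"permissions": ["read:*", "write:*", "delete:*"]},
}

USERS = {
    "dr.smith@hospital.com": ["healthcare_provider"],
    "researcher.jones@university.edu": ["researcher"],
}


def role_allows(role: str, required: str) -> bool:
    """True if any permission granted to the role matches the required one."""
    granted = ROLES.get(role, {}).get("permissions", [])
    return any(fnmatchcase(required, pattern) for pattern in granted)


def has_permission(user_id: str, required: str) -> bool:
    """True if any of the user's roles allows the required permission."""
    return any(role_allows(role, required) for role in USERS.get(user_id, []))
```

Unknown users and unknown roles resolve to no permissions, so the check fails closed.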

ABAC (Attribute-Based Access Control)

More flexible: Access decisions based on attributes (user, resource, environment).

Example ABAC Policy:

ALLOW access IF:
  user.department == document.department AND
  user.clearance_level >= document.classification_level AND
  current_time.hour >= 8 AND current_time.hour <= 18 AND
  user.location == "corporate_network"
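
The policy above translates almost directly into code. This is a minimal sketch: the dataclass names are hypothetical, and clearance and classification levels are assumed to be comparable integers.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UserAttrs:
    department: str
    clearance_level: int
    location: str


@dataclass(frozen=True)
class DocAttrs:
    department: str
    classification_level: int


def is_allowed(user: UserAttrs, doc: DocAttrs, now: datetime) -> bool:
    """Evaluate the example ABAC policy; every condition must hold."""
    return (
        user.department == doc.department
        and user.clearance_level >= doc.classification_level
        and 8 <= now.hour <= 18
        and user.location == "corporate_network"
    )
```

Note that `current_time.hour <= 18` admits requests up to 18:59; use `< 18` if access should end at 18:00.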

Dynamic Policy Evaluation

Evaluate access policies in real-time during vector similarity search:

  1. Query Vector DB: Retrieve top-20 most similar document chunks
  2. Fetch Document Metadata: For each chunk, get: owner, classification, tags
  3. Apply ACL Filter: Remove chunks user is not authorized to see
  4. Return Top-5: Of the remaining authorized chunks, return highest scoring
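
Steps 3 and 4 can be isolated as a pure function, which makes the filtering logic easy to test apart from the vector database. The `Chunk` type and the `is_authorized` callback below are assumptions for illustration.

```python
from typing import Callable, List, NamedTuple


class Chunk(NamedTuple):
    doc_id: str
    score: float  # similarity score, higher is better


def filter_and_rank(
    candidates: List[Chunk],
    is_authorized: Callable[[Chunk], bool],
    final_k: int = 5,
) -> List[Chunk]:
    """Drop unauthorized chunks, then keep the final_k highest-scoring ones."""
    allowed = [c for c in candidates if is_authorized(c)]
    allowed.sort(key=lambda c: c.score, reverse=True)
    return allowed[:final_k]
```

One consequence of filtering after the search is that fewer than five chunks may survive when many of the top 20 are unauthorized; callers should handle a short or empty result.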

Encryption Strategy

Data State             | Encryption Method                          | Key Management
-----------------------|--------------------------------------------|---------------------------------
At-Rest (Vector DB)    | AES-256-GCM                                | AWS KMS, Azure Key Vault, or HSM
In-Transit (API Calls) | TLS 1.3                                    | Certificate rotation every 90 days
In-Memory (Embeddings) | Encrypted RAM (Intel SGX or ARM TrustZone) | Hardware-backed keys
Backup Storage         | AES-256-GCM + GPG                          | Offline keys in secure vault
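
For the in-transit row, a client can refuse anything older than TLS 1.3 using Python's standard `ssl` module. The helper name below is hypothetical; the `ssl` calls are standard library.

```python
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Build a client context that verifies certificates and requires TLS 1.3."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Refuse to negotiate TLS 1.2 or older.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

The returned context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=ctx)`.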

Key Rotation Policy

  • Master Keys: Rotate every 12 months or immediately upon compromise
  • Data Encryption Keys: Rotate every 90 days
  • API Keys/Tokens: Rotate every 30 days
  • TLS Certificates: Rotate every 90 days (automated via the ACME protocol, e.g., Let's Encrypt)
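
A scheduled job could apply this policy with a check along the following lines. This is a sketch: the key-type names are hypothetical, 12 months is approximated as 365 days, and treating a compromise flag as an immediate trigger for every key type (not only master keys) is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Maximum key age per type, taken from the rotation policy above.
ROTATION_INTERVALS = {
    "master_key": timedelta(days=365),  # "12 months", approximated
    "data_encryption_key": timedelta(days=90),
    "api_key": timedelta(days=30),
    "tls_certificate": timedelta(days=90),
}


def rotation_due(
    key_type: str,
    created_at: datetime,
    now: datetime,
    compromised: bool = False,
) -> bool:
    """Return True if the key has reached its maximum age or is compromised."""
    if compromised:
        return True
    return now - created_at >= ROTATION_INTERVALS[key_type]
```

An unknown `key_type` raises `KeyError`, which is intentional here: a key with no policy should surface as an error, not silently pass.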

Audit & Compliance

Query Logging (Who, What, When)

Required Log Fields:

{
  "timestamp": "2024-12-18T01:15:42Z",
  "user_id": "dr.smith@hospital.com",
  "session_id": "sess_a1b2c3d4e5",
  "query_hash": "sha256:abc123...def456",
  "retrieved_document_ids": ["doc_001", "doc_003", "doc_007"],
  "access_justification": "patient_care",
  "ip_address": "10.0.1.45",
  "user_agent": "AgenixChat/1.0",
  "response_time_ms": 342
}
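
A record of this shape can be produced with the standard library alone. The sketch below builds a subset of the fields; the function names are hypothetical, and hashing the raw query text with unsalted SHA-256 is an assumption about how `query_hash` is derived.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import List, Optional


def query_hash(query: str) -> str:
    """Hash the query so the log does not store its raw text."""
    return "sha256:" + hashlib.sha256(query.encode("utf-8")).hexdigest()


def build_log_record(
    user_id: str,
    session_id: str,
    query: str,
    document_ids: List[str],
    justification: str,
    now: Optional[datetime] = None,
) -> str:
    """Serialize one audit record as a JSON line with a UTC timestamp."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = {
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "user_id": user_id,
        "session_id": session_id,
        "query_hash": query_hash(query),
        "retrieved_document_ids": document_ids,
        "access_justification": justification,
    }
    return json.dumps(record, sort_keys=True)
```

Short or predictable queries can be recovered from an unsalted hash by guessing; a keyed hash (`hmac` with a secret key) is a common alternative when that matters.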

Document Access Tracking

For HIPAA compliance, track every access to PHI:

  • Access Logs: Who accessed which document, when, and why
  • Retention: 7 years minimum (this exceeds the 6-year documentation retention period in the HIPAA Security Rule, 45 CFR 164.316(b)(2))
  • Tamper-Proof: Write-once storage (WORM) or blockchain-backed logs
  • Alerts: Notify on unusual access (bulk downloads, off-hours, terminated employees)
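
Where WORM storage is not available, a hash chain is one simple way to make tampering detectable. It is the core idea behind blockchain-backed logs in reduced form, not a replacement for write-once storage. The entry layout and function names below are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical record encoding."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, record: dict) -> None:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"record": record, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, record)}
    )


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

An attacker who can rewrite the whole log can also recompute the chain, so the latest hash should be anchored somewhere the attacker cannot reach (for example, a separate system or periodic signed checkpoint).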

Retrieval Justification

In highly regulated environments, users must provide justification for data access:

Example Justification Flow:

  1. User Query: "What medications is patient John Doe taking?"
  2. System Prompt: "Why do you need access to this patient's records?"
  3. User Response: "Ongoing treatment for scheduled appointment"
  4. System: Logs justification, proceeds with retrieval if policy allows
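
The gate in step 4 might be sketched as follows. Everything here is hypothetical: the purpose codes, the minimum note length, and the exception name are illustrative choices, not part of this design.

```python
from dataclasses import dataclass
from typing import Set


class JustificationRequired(Exception):
    """Raised when access is requested without an acceptable justification."""


@dataclass(frozen=True)
class Justification:
    purpose: str  # coarse purpose code, e.g. "patient_care"
    note: str     # the user's free-text explanation


def require_justification(
    purpose: str, note: str, allowed_purposes: Set[str], min_note_length: int = 10
) -> Justification:
    """Validate the user's justification before any retrieval takes place."""
    note = note.strip()
    if purpose not in allowed_purposes:
        raise JustificationRequired(f"purpose not permitted: {purpose!r}")
    if len(note) < min_note_length:
        raise JustificationRequired("justification note is too short")
    return Justification(purpose=purpose, note=note)
```

The returned object can then be written to the audit log alongside the retrieved document IDs.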

Implementation Example (Python Pseudocode)

# Zero-Trust RAG Query with ACL Filtering
#
# User, vector_db, llm, the exception classes, and helpers such as
# check_rate_limit and embed_query are assumed to be provided by the
# surrounding application.

from datetime import datetime, timezone

PRE_FILTER_K = 20   # over-fetch so ACL filtering still leaves enough results
POST_FILTER_K = 5   # number of authorized chunks passed to the LLM


async def query_rag_with_acl(query: str, user: User) -> str:
    # 1. Authentication & Rate Limiting
    if not user.is_authenticated():
        raise Unauthorized("User not authenticated")

    if not await check_rate_limit(user.id):
        raise RateLimitExceeded("Too many requests")

    # 2. Input Validation
    query = sanitize_input(query)
    if detect_prompt_injection(query):
        log_security_event(user, "prompt_injection_attempt")
        raise SecurityViolation("Invalid query detected")

    # 3. Embedding Generation
    query_embedding = await embed_query(query)  # Encrypted in memory

    # 4. Vector Search (Pre-Filter)
    results = await vector_db.search(
        embedding=query_embedding,
        top_k=PRE_FILTER_K,  # Retrieve more than needed for ACL filtering
    )

    # 5. Document-Level ACL Filter (results are assumed sorted by similarity)
    authorized_docs = []
    for doc in results:
        doc_metadata = await get_document_metadata(doc.id)
        if await check_user_permission(user, doc_metadata):
            authorized_docs.append(doc)
            if len(authorized_docs) == POST_FILTER_K:
                break  # lower-ranked chunks would not be used anyway

    if not authorized_docs:
        return "No authorized documents found for your query."

    # 6. Retrieval Logging: record only the documents actually passed to the LLM
    for doc in authorized_docs:
        await log_document_access(
            user_id=user.id,
            document_id=doc.id,
            timestamp=datetime.now(timezone.utc),  # timezone-aware UTC
            justification="query_response",
        )

    # 7. Generate Response
    context = "\n".join(doc.content for doc in authorized_docs)
    response = await llm.generate(
        prompt=f"Context: {context}\n\nQuery: {query}"
    )

    # 8. Output Filtering (DLP)
    response = await filter_pii(response)
    response = await filter_phi(response)

    # 9. Log Response
    await log_query_response(user.id, query, response)

    return response

Zero-Trust RAG Compliance Checklist

Authentication required for all queries (MFA for high-risk access)
Document-level access control enforced at query time
All data encrypted at rest (AES-256), in transit (TLS 1.3), and in memory
Comprehensive audit logging (tamper-proof, 7-year retention)
Rate limiting in place (100 requests/hour/user)
Input validation and prompt injection detection
Output filtering for PII/PHI (DLP integration)
Key rotation policy implemented (90-day rotation for data encryption keys and TLS certificates)
Security monitoring with anomaly detection
