Technical Architecture

Local Model Execution & Data Sovereignty

How AML Labs deploys agentic AI for compliance without transferring sensitive data to third-party ML providers, external APIs, or cross-border cloud regions, ensuring full regulatory control.

Why Local Execution Matters

Financial institutions operating under AML/KYC regulations face strict data residency and processing requirements. Transmitting customer PII, transaction records, or risk assessments to external ML inference APIs introduces regulatory, security, and operational risks that are incompatible with enterprise compliance mandates.

Zero Data Exfiltration

All model inference runs within the client's own infrastructure boundary. No customer data, embeddings, or compliance artifacts leave the institution's controlled environment, not even in encrypted form.

๐Ÿ›๏ธ

Regulatory Alignment

Satisfies data residency requirements under GDPR, UAE PDPL, DIFC Data Protection Law, ADGM regulations, and sector-specific guidance from CBUAE, DFSA, and FSRA without requiring cross-border data processing agreements.

Deterministic Latency

No dependency on external API rate limits, provider outages, or internet routing. Inference latency is bounded by local compute, enabling real-time compliance decisions during onboarding and transaction monitoring.

Full Auditability

Every model version, prompt template, retrieval source, and inference output is logged within the institution's audit perimeter. Complete chain-of-custody for regulatory examination.
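
To make the chain-of-custody concrete, here is a minimal sketch of a hash-chained audit record, in which each entry commits to its predecessor so tampering with earlier entries is detectable. The field names and log path are illustrative assumptions, not the production schema.

```python
# Minimal hash-chained audit log sketch. LOG_PATH and the record fields are
# assumptions for illustration; any append-only store would work the same way.
import hashlib
import json
from datetime import datetime, timezone

LOG_PATH = "/var/log/aml/inference_audit.jsonl"  # assumed log location

def append_audit_record(prev_hash: str, model_version: str, prompt_template: str,
                        retrieval_sources: list[str], output: str) -> str:
    """Append one audit record chained to its predecessor; return its hash."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "prompt_template": prompt_template,
        "retrieval_sources": retrieval_sources,
        "output": output,
        "prev_hash": prev_hash,  # links this record to the previous one
    }
    digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    record["record_hash"] = digest
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps(record) + "\n")
    return digest
```

Verifying the chain is a linear scan that recomputes each record's hash and compares it against the next record's prev_hash.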

On-Premise Deployment Architecture

The system is composed of three isolated tiers (ingestion, intelligence, and integration), all executing within the institution's network boundary. No component makes outbound calls to external ML services.

Fig. 1: High-Level System Architecture
[Diagram: all components sit inside the client infrastructure boundary. Data ingestion tier: core banking system (KYC/CDD/EDD records), transaction monitor (alerts/STRs/SARs), document store (IDs, proof of address, UBO), and a local sanctions/PEP list mirror (daily sync), feeding an ETL pipeline (normalize, validate, chunk) into a local vector database (pgvector/Milvus). AI intelligence tier: local LLM runtime (vLLM/Ollama/TGI on A100/H100/L40S GPUs), RAG engine, and agent orchestrator (LangGraph/CrewAI) running KYC, EDD, TM, and QC agents, with all inference written to the audit log. Integration tier: case management (auto-populated decisions), analyst dashboard (review/approve/override), regulatory reporting (goAML/FIU feeds), and risk scoring engine (dynamic recalculation), exposed over an internal-only REST/gRPC API layer with RBAC and mTLS zero-trust auth. No external ML API calls.]

Cloud ML APIs vs. Local Execution

Traditional approaches to AI-powered compliance rely on sending sensitive data to third-party inference endpoints. Our architecture eliminates this entirely; a sketch of a local inference call follows the comparison below.

Typical Cloud ML Approach
  • Customer PII sent to external inference APIs (OpenAI, Azure, AWS Bedrock)
  • Data may be processed in regions outside the regulatory jurisdiction
  • Prompt content potentially used for model training by the provider
  • Latency dependent on internet connectivity and API rate limits
  • Vendor lock-in to a specific model provider's pricing and availability
  • Audit trail gaps: inference logs held by a third party
  • Requires complex Data Processing Agreements and cross-border safeguards
AML Labs: Local Execution
  • All inference runs on institution-controlled GPU infrastructure
  • Data never leaves the designated compliance region or jurisdiction
  • No data shared with any ML provider: zero training data leakage
  • Sub-100ms inference latency on local hardware, no API dependency
  • Model-agnostic: swap between Llama, Mistral, Qwen, or fine-tuned variants
  • Complete audit trail within the institution's own SIEM / log infrastructure
  • Simplified compliance: no cross-border data transfer mechanisms needed
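
The sketch below shows what the swap looks like in code: the standard OpenAI-style Python client pointed at a local vLLM server, which exposes an OpenAI-compatible endpoint. The base URL, model tag, and prompt are illustrative assumptions, not production values.

```python
# Hedged sketch: the same client code as a cloud integration, but base_url
# targets the local inference cluster, so no request leaves the network.
from openai import OpenAI

client = OpenAI(
    base_url="http://llm.internal:8000/v1",  # assumed local vLLM endpoint
    api_key="unused",  # vLLM's OpenAI-compatible server accepts a dummy key
)

completion = client.chat.completions.create(
    model="llama-3.1-70b-aml-ft",  # assumed tag in the local model registry
    messages=[{"role": "user", "content": "Summarize EDD triggers for PEPs."}],
)
print(completion.choices[0].message.content)
```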

Inference Pipeline: Zero External Transfer

A step-by-step view of how a compliance query is processed entirely within the institution's perimeter, from initial trigger through to analyst review.

Fig. 2: Inference Request Lifecycle
1. TRIGGER: alert or onboarding event received via internal queue
2. RETRIEVE: vector search on local embeddings (pgvector / Milvus)
3. REASON: local LLM inference with RAG context (vLLM on GPU cluster)
4. VALIDATE: QC agent verifies output and confidence (rule-based plus LLM check)
5. DELIVER: structured result to case management (REST API to dashboard)
Persistent audit log: every stage writes timestamp, model version, input hash, output, confidence score, retrieval sources, and latency.
Network boundary, no outbound ML traffic: firewall rules explicitly block egress to known ML inference endpoints (api.openai.com, *.azure.com/openai, bedrock.*.amazonaws.com), and network monitoring alerts on any attempted outbound connection to ML service IPs, integrated with the SOC / SIEM.
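
As an illustration of the five stages, here is a minimal end-to-end sketch assuming a local vLLM server with an OpenAI-compatible API; retrieval and validation are stubbed, and the endpoint, model tag, log path, and confidence logic are assumptions rather than production code.

```python
# Sketch of the Fig. 2 lifecycle: retrieve -> reason -> validate -> deliver,
# with every request written to a local append-only audit log.
import hashlib
import json
import time
from datetime import datetime, timezone

import requests  # internal-network HTTP only; no external egress

VLLM_URL = "http://llm.internal:8000/v1/chat/completions"  # assumed endpoint
MODEL = "llama-3.1-70b-aml-ft"  # assumed local model tag

def retrieve(query: str) -> list[str]:
    """Stage 2: vector search on local embeddings (stubbed for brevity)."""
    return ["Internal EDD policy excerpt ...", "CBUAE guidance excerpt ..."]

def reason(query: str, context: list[str]) -> str:
    """Stage 3: local LLM inference with RAG context."""
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
    resp = requests.post(
        VLLM_URL,
        json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def validate(answer: str) -> float:
    """Stage 4: QC check; a trivial rule-based confidence stand-in."""
    return 0.9 if answer.strip() else 0.0

def handle_event(query: str) -> dict:
    """Stages 1-5 plus the persistent audit record from Fig. 2."""
    start = time.monotonic()
    context = retrieve(query)
    answer = reason(query, context)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": MODEL,
        "input_hash": hashlib.sha256(query.encode()).hexdigest(),
        "output": answer,
        "confidence": validate(answer),
        "retrieval_sources": context,
        "latency_ms": round((time.monotonic() - start) * 1000, 1),
    }
    with open("/var/log/aml/inference_audit.jsonl", "a") as log:  # assumed path
        log.write(json.dumps(record) + "\n")
    return record  # Stage 5: handed to case management over the internal API
```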

Reference Implementation

A proven stack of open-source and enterprise-grade components, each selected for on-premise deployability, auditability, and compliance-readiness.

LLM Inference
vLLM / Ollama / TGI
High-throughput local serving with PagedAttention, continuous batching, and quantized model support (GPTQ, AWQ, GGUF).
Base Models
Llama 3.x / Mistral / Qwen
Open-weight models fine-tuned on AML/KYC domain data. No dependency on any proprietary model API.
Embeddings
BGE / E5 / Nomic Embed
Locally hosted embedding models for document vectorization. No external embedding API calls.
Vector Store
pgvector / Milvus
On-premise vector database for semantic search over compliance documents, regulations, and client records (a retrieval sketch follows this list).
Orchestration
LangGraph / CrewAI
Multi-agent orchestration with structured workflows for KYC, EDD, transaction monitoring, and quality control.
Document Processing
Unstructured / DocTR
Local OCR and document parsing for IDs, proof of address, corporate documents, and UBO structures.
Compute
NVIDIA A100 / H100 / L40S
On-premise or private cloud GPU infrastructure. Kubernetes-orchestrated for scaling and failover.
Observability
Prometheus / Grafana / ELK
Full inference monitoring, latency tracking, model drift detection, and audit log aggregation within the institution's SIEM.
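
To ground the embeddings and vector store components above, here is a hedged retrieval sketch using a locally stored BGE model via sentence-transformers and a pgvector similarity query; the model path, database DSN, and table schema are assumptions.

```python
# Local semantic search sketch: embed the query on-box, then run a
# cosine-distance search in pgvector. Nothing calls out of the network.
import psycopg2
from sentence_transformers import SentenceTransformer

# Weights come from the local model registry; nothing is downloaded at runtime.
embedder = SentenceTransformer("/models/bge-large-en-v1.5")

def search_policies(query: str, k: int = 5) -> list[tuple[str, float]]:
    vec = embedder.encode(query, normalize_embeddings=True)
    literal = "[" + ",".join(f"{x:.6f}" for x in vec) + "]"  # pgvector input form
    conn = psycopg2.connect("dbname=compliance host=db.internal")  # internal only
    with conn, conn.cursor() as cur:
        # '<=>' is pgvector's cosine-distance operator.
        cur.execute(
            "SELECT chunk_text, embedding <=> %s::vector AS dist "
            "FROM policy_chunks ORDER BY dist LIMIT %s",
            (literal, k),
        )
        return cur.fetchall()
```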

Network Isolation & Trust Zones

The deployment uses a defense-in-depth approach with distinct network zones, each enforcing strict ingress/egress rules to guarantee that sensitive data and model inference remain within the compliance perimeter.

Fig. 3: Network Security Zones
[Diagram, three zones separated by firewalls. DMZ: sanctions feed (inbound only, TLS), FIU / goAML (outbound reports, mTLS), reverse proxy (WAF plus rate limiting), egress firewall (blocks *.openai.com and *.azure.com/openai). Application zone, restricted: API gateway (mTLS, JWT, RBAC), analyst UI (SSO with MFA enforced), agent orchestrator and RAG engine (business logic, no external network access), local LLM inference cluster (GPU nodes, vLLM serving, no egress permitted), message queue (Kafka / RabbitMQ), monitoring (Prometheus plus Grafana). Data zone, highest trust, no external network access: customer PII store (encrypted at rest, AES-256), vector database (embeddings, no raw PII), audit database (immutable append-only log), model registry (versioned weights and configs).]
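
A perimeter like this should be continuously verified, not just configured. Below is a hedged sketch of a validation check, run from inside the application zone, asserting that connections to known ML endpoints fail; the hostnames echo Fig. 2, and the ports and timeout are assumptions.

```python
# Egress verification sketch: every listed endpoint must be unreachable.
# Suitable as a CI / deployment gate or a scheduled SOC check.
import socket

BLOCKED_ENDPOINTS = [
    ("api.openai.com", 443),
    ("bedrock-runtime.us-east-1.amazonaws.com", 443),  # one regional example
]

def egress_is_blocked(host: str, port: int, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # connection succeeded: egress is NOT blocked
    except OSError:
        return True  # refused, unreachable, or timed out: blocked as expected

for host, port in BLOCKED_ENDPOINTS:
    assert egress_is_blocked(host, port), f"egress to {host}:{port} is open!"
print("egress checks passed: no route to external ML endpoints")
```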

Implementation Workflow

A structured deployment process ensures the local AI infrastructure meets compliance requirements from day one, with validation at every stage.

01

Infrastructure Assessment & Provisioning

Evaluate existing compute infrastructure and provision GPU nodes within the institution's data centre or private cloud. Establish network segmentation, firewall rules blocking outbound ML endpoints, and mTLS certificates for internal service communication.

02

Model Selection & Domain Fine-Tuning

Select open-weight base models and fine-tune on the institution's anonymized compliance data, including past KYC decisions, EDD reports, and regulatory correspondence. All training runs locally; no data leaves the environment. Model weights are stored in a versioned local registry. A minimal fine-tuning sketch follows.
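
As a rough sketch of what local fine-tuning can look like with transformers and peft, assuming base weights already sit in the local registry and an anonymized JSONL training file with a "text" field; all paths, hyperparameters, and the dataset format are illustrative.

```python
# LoRA fine-tuning sketch: weights in, weights out, all on local storage.
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "/models/llama-3.1-8b"                   # local model registry (assumed)
DATA = "/data/anonymized_kyc_decisions.jsonl"   # never leaves the environment

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship no pad token
model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         task_type=TaskType.CAUSAL_LM))

dataset = load_dataset("json", data_files=DATA, split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=dataset.column_names,
)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="/models/registry/llama-aml-lora-v1",
                           per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```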

03

RAG Pipeline & Knowledge Base Construction

Ingest and vectorize institutional knowledge: internal policies, regulatory guidance (CBUAE, FATF, EU AMLDs), sanctions lists, and historical case files. Embeddings are generated locally using open-source models and stored in the on-premise vector database, as in the ingestion sketch below.
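
A hedged ingestion sketch under the same assumptions as the retrieval example: naive fixed-size chunking, a local BGE embedder, and a pgvector table whose name and schema are illustrative.

```python
# Ingestion sketch: chunk -> embed locally -> insert into pgvector.
import psycopg2
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("/models/bge-large-en-v1.5")  # local weights

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Naive fixed-size character chunking with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(doc_id: str, text: str) -> None:
    pieces = chunk(text)
    vectors = embedder.encode(pieces, normalize_embeddings=True)
    conn = psycopg2.connect("dbname=compliance host=db.internal")  # internal only
    with conn, conn.cursor() as cur:
        for piece, vec in zip(pieces, vectors):
            literal = "[" + ",".join(f"{x:.6f}" for x in vec) + "]"
            cur.execute(
                "INSERT INTO policy_chunks (doc_id, chunk_text, embedding) "
                "VALUES (%s, %s, %s::vector)",
                (doc_id, piece, literal),
            )
```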

04

Agent Design & Workflow Configuration

Configure specialized agents (KYC reviewer, EDD analyst, transaction monitoring assessor, quality control verifier) with structured prompt templates, tool access permissions, and escalation rules. Define human-in-the-loop checkpoints. A minimal orchestration sketch follows.
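
Here is a minimal orchestration sketch in LangGraph, wiring a KYC-review node to a QC gate that either delivers the result or escalates to a human analyst. Node logic is stubbed; the state fields, node names, and routing condition are assumptions.

```python
# KYC -> QC workflow sketch with a conditional escalation edge.
from typing import TypedDict
from langgraph.graph import END, START, StateGraph

class CaseState(TypedDict):
    case_id: str
    findings: str
    qc_passed: bool

def kyc_review(state: CaseState) -> dict:
    # A local LLM call (via the RAG engine) would produce this in practice.
    return {"findings": f"KYC review for {state['case_id']}: no adverse media."}

def qc_check(state: CaseState) -> dict:
    # Rule-based plus LLM verification in the real pipeline; stubbed here.
    return {"qc_passed": bool(state["findings"])}

def deliver(state: CaseState) -> dict:
    return {"findings": state["findings"] + " [sent to case management]"}

def escalate(state: CaseState) -> dict:
    return {"findings": state["findings"] + " [escalated to analyst]"}

graph = StateGraph(CaseState)
graph.add_node("kyc_review", kyc_review)
graph.add_node("qc_check", qc_check)
graph.add_node("deliver", deliver)
graph.add_node("escalate", escalate)
graph.add_edge(START, "kyc_review")
graph.add_edge("kyc_review", "qc_check")
graph.add_conditional_edges(
    "qc_check",
    lambda s: "deliver" if s["qc_passed"] else "escalate",
    {"deliver": "deliver", "escalate": "escalate"},
)
graph.add_edge("deliver", END)
graph.add_edge("escalate", END)

app = graph.compile()
print(app.invoke({"case_id": "C-1042", "findings": "", "qc_passed": False}))
```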

05

Integration & Validation Testing

Connect to existing core banking, case management, and regulatory reporting systems via internal APIs. Run parallel testing against historical cases to validate accuracy, measure false positive reduction, and calibrate confidence thresholds before go-live.
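
One way to quantify false positive reduction during parallel testing is to replay analyst-confirmed false positives through the pipeline and count how many it would confidently close. The case record fields, the "close" heuristic, and the 0.8 threshold below are assumptions.

```python
# Back-testing sketch: measure the share of historical false positives the
# pipeline would suppress. 'pipeline' is any callable shaped like
# handle_event() from the Fig. 2 sketch.
from typing import Callable

def false_positive_reduction(cases: list[dict],
                             pipeline: Callable[[str], dict],
                             threshold: float = 0.8) -> float:
    known_fps = [c for c in cases if c["analyst_disposition"] == "false_positive"]
    suppressed = 0
    for case in known_fps:
        result = pipeline(case["query"])
        # Count a historical false positive as suppressed when the model
        # confidently recommends closure.
        if result["confidence"] >= threshold and "close" in result["output"].lower():
            suppressed += 1
    return suppressed / max(len(known_fps), 1)
```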

06

Production Deployment & Continuous Monitoring

Gradual rollout with real-time monitoring of inference latency, model accuracy, and drift metrics. Automated alerting on anomalies. Regular model retraining cycles using updated institutional data, always executed locally. A drift-detection sketch follows.
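
For drift detection, one simple and auditable signal is the Population Stability Index (PSI) over the model's confidence scores, computed entirely from the local audit log. The bucket count and the common ~0.2 retraining trigger below are assumptions, not calibrated values.

```python
# PSI drift sketch: compare the confidence-score distribution from a baseline
# window against the current window; larger values mean more drift.
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, buckets: int = 10) -> float:
    edges = np.quantile(baseline, np.linspace(0, 1, buckets + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range scores
    b = np.histogram(baseline, edges)[0] / len(baseline)
    c = np.histogram(current, edges)[0] / len(current)
    b, c = np.clip(b, 1e-6, None), np.clip(c, 1e-6, None)  # avoid log(0)
    return float(np.sum((c - b) * np.log(c / b)))

# Example: flag for retraining review when last week's scores drift past the
# assumed 0.2 trigger.
# if psi(baseline_scores, last_week_scores) > 0.2: trigger_retraining_review()
```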

Data Sovereignty Compliance Matrix

Local model execution directly addresses the data handling requirements of major regulatory frameworks applicable to financial institutions in the UAE, EU, and globally.

Regulation | Requirement | Local Execution
GDPR (EU) | Data minimization, purpose limitation, restricted cross-border transfers | ✓ Satisfied: no external transfer
UAE Federal PDPL | Personal data processed within UAE or approved jurisdiction | ✓ Satisfied: on-premise in UAE
DIFC DPL | Adequate data protection for DIFC-based entities | ✓ Satisfied: local processing
ADGM DPR | Data protection principles for ADGM entities | ✓ Satisfied: no third-party sharing
CBUAE AML Guidelines | Secure handling of customer due diligence data | ✓ Satisfied: full audit trail
FATF Recommendation 15 | Appropriate controls for new technologies in AML/CFT | ✓ Satisfied: controlled, auditable AI