01 Problem & Threat Landscape
AI-assisted server management introduces a fundamental conflict: the AI needs raw data to reason effectively, but raw data contains secrets that must not leave the infrastructure boundary.
Current State (Without This Architecture)
1. cat wp-config.php: DB credentials, auth keys, and salts sent to a third party
2. grep -r password /etc/: system credentials in AI context
3. tail -f /var/log/auth.log: usernames, IPs, session data exposed
4. cat /root/.ssh/id_rsa: private key material in third-party memory
5. Application logs containing customer PII (names, emails, IPs) processed by public AI
Proposed State (With This Architecture)
02 System Architecture
The architecture implements a layered isolation pattern across four security zones. The Claude Code instance runs in the Company WAN — isolated from the internet, accessible only to a defined user set. All intelligence that touches raw data runs in a separate trusted zone that only the Claude Code instance can reach.
Internet (Anthropic Cloud)
- Claude Opus / Sonnet API inference
- Web Search (research)
- All data entering this zone is considered permanently disclosed

Only anonymized context reaches Anthropic.

Company WAN (Claude Code Instance)
- High-level reasoning (via Anthropic API)
- Web browsing for research
- Task planning & delegation
- No direct SSH to targets
- Calls MCP Server via MCP protocol
- Only allowed outbound to trusted zone

The Claude Code → MCP Server link is the only connection from the Company WAN into the Trusted Zone.

Trusted Zone
Each component runs as a separate instance: VMs, bare-metal, or Kubernetes/Docker resources.

MCP Server: SSH orchestration
- Ephemeral key generation
- Session management
- Command execution

MCP Server: anonymization
- Layer 1: Regex (deterministic)
- Layer 2: Local AI (contextual)
- Canary token validation

MCP Server: escalation
- Local-first resolution
- Anonymize + escalate to Claude Code
- Web search for research

MCP Server: logging
- Anonymized I/O logging
- Command audit trail
- Cost tracking

Local AI (MiniMax 2.5)
- Self-hosted LLM inference
- Raw data access for local reasoning
- Anonymization Layer 2
- No external network access

Storage (PostgreSQL + Qdrant)
- PostgreSQL: session metadata, JSONB
- Qdrant: vector embeddings
- Auto-purge raw logs after embedding
- No external network access

Target Machines (internal)
- Company infrastructure
- SSH inbound from MCP Server only
- No AI agent runs on targets

Target Machines (customer)
- Customer webspace, VPS, servers
- SSH inbound from MCP Server IP
- Ephemeral keys, source-restricted
03 Security Zones & Network Topology
The architecture defines four distinct security zones with strict, unidirectional data flow. Network policies enforce zone boundaries at the infrastructure level. Each zone has clearly defined trust boundaries and communication rules.
Component Responsibilities
| Component | Zone | Role |
|---|---|---|
| Anthropic Cloud | Internet | LLM inference (Opus/Sonnet), web search |
| Claude Code Instance | Company WAN | High-level planning, task delegation, research, user interface |
| MCP Server | Trusted Zone | SSH orchestration, anonymization, escalation, web search, logging |
| Local AI (MiniMax 2.5) | Trusted Zone (separate instance) | Low-level reasoning on raw data, anonymization Layer 2 |
| Storage (PG + Qdrant) | Trusted Zone (separate instance) | Anonymized session logs, vector embeddings |
| Target Machines | Variable | SSH endpoints — internal company servers or external customer infrastructure |
Network Policy Matrix
| Source → Destination | Protocol | Policy | Notes |
|---|---|---|---|
| Claude Code → Anthropic Cloud | HTTPS :443 | ALLOW | API calls, web search |
| Claude Code → MCP Server | MCP (JSON-RPC) | ALLOW | Only permitted outbound path into the Trusted Zone |
| Claude Code → Targets | SSH / Any | DENY | No direct access |
| MCP Server → Claude Code | Response only | ALLOW | MCP responses |
| MCP Server → Local AI | gRPC / HTTP :8080 | ALLOW | Inference requests |
| MCP Server → Storage | PostgreSQL / HTTP | ALLOW | Logging, embeddings |
| MCP Server → Targets | SSH :22 | ALLOW | Ephemeral key only |
| MCP Server → Internet | HTTPS :443 | ALLOW | Web search for research |
| Local AI → Internet | Any | DENY | Air-gapped |
| Storage → Internet | Any | DENY | Air-gapped |
| Targets → MCP Server | Any | DENY | SSH is outbound-only from MCP |
| Internet → Company WAN | Any | DENY | WAN-only access |
| Internet → Trusted Zone | Any | DENY | No inbound |
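The matrix can also be read as a default-deny lookup. The following is a minimal sketch, assuming shortened zone names (`claude_code`, `mcp_server`, and so on) that are illustrative labels, not identifiers from the deployment. It ignores the protocol column for brevity and treats any flow not listed in the matrix as denied, which is an assumption.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    source: str
    destination: str
    allow: bool


# Rows transcribed from the Network Policy Matrix (zone names are illustrative).
RULES = [
    Rule("claude_code", "anthropic_cloud", True),
    Rule("claude_code", "mcp_server", True),
    Rule("claude_code", "targets", False),
    Rule("mcp_server", "claude_code", True),
    Rule("mcp_server", "local_ai", True),
    Rule("mcp_server", "storage", True),
    Rule("mcp_server", "targets", True),
    Rule("mcp_server", "internet", True),
    Rule("local_ai", "internet", False),
    Rule("storage", "internet", False),
    Rule("targets", "mcp_server", False),
    Rule("internet", "company_wan", False),
    Rule("internet", "trusted_zone", False),
]


def is_allowed(source: str, destination: str) -> bool:
    """Return True only when the matrix explicitly allows the flow."""
    for rule in RULES:
        if rule.source == source and rule.destination == destination:
            return rule.allow
    return False  # assumption: anything not listed is denied
```

A check like this could run in CI against the rendered network policies, so that a newly added flow has to appear in the matrix before it is permitted.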
Kubernetes NetworkPolicy (Trusted Zone — Local AI)
Kubernetes NetworkPolicy (Trusted Zone — Storage)
04 SSH Key Lifecycle
SSH authentication uses ephemeral, API-issued keys with a hardcoded 15-minute TTL. The key lifecycle is fully automated with no persistent secrets.
A session starts with an ssh_connect call that names the target host and user. The MCP server generates an Ed25519 keypair in memory.
The CA signs the public key into a certificate with a 15-minute validity window (-V +15m). The CA private key is stored in Kubernetes Secrets, never in the container image. The certificate includes force-command and source-address restrictions.
Target Machine sshd_config
Even a stolen key is useless after expiry, and before expiry it is accepted only from the MCP server IP.
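The signing step can be sketched as the construction of an `ssh-keygen` command line. This is a minimal illustration, not the service's actual implementation: the function name, file paths, key ID, and forced command are hypothetical, while the `-s`, `-I`, `-n`, `-V`, and `-O` options are standard `ssh-keygen` certificate flags. The function only builds the argument list and does not execute anything.

```python
def build_sign_command(
    ca_key_path: str,
    public_key_path: str,
    key_id: str,
    principal: str,
    source_ip: str,
    forced_command: str,
    ttl_minutes: int = 15,
) -> list[str]:
    """Build the ssh-keygen argv that signs a public key into a short-lived cert."""
    return [
        "ssh-keygen",
        "-s", ca_key_path,                           # CA private key used for signing
        "-I", key_id,                                # certificate identity, shown in sshd logs
        "-n", principal,                             # login user the certificate is valid for
        "-V", f"+{ttl_minutes}m",                    # validity window from now
        "-O", f"source-address={source_ip}",         # accepted only from this address
        "-O", f"force-command={forced_command}",     # restricts what the session can run
        public_key_path,
    ]
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell interpolation of the forced command or principal.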
05 Data Flow & Escalation
The system implements a local-first resolution strategy. MiniMax 2.5 handles as much as possible without involving the public AI. Escalation only happens when local reasoning is insufficient.
Escalation Decision Matrix
| Scenario | Local (MiniMax) | Escalate (Claude) | Rationale |
|---|---|---|---|
| Service restart | Yes | | Routine operation, no reasoning needed |
| Log analysis (pattern match) | Yes | | Known error patterns, local resolution |
| Config syntax error | Yes | | Deterministic validation |
| Complex debugging | | Yes | Multi-step reasoning on anonymized context |
| Architecture decisions | | Yes | Requires broader knowledge |
| Performance optimization | | Yes | Needs advanced reasoning on metrics |
| Security incident response | Yes | | Sensitive data stays local, no escalation |
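The decision matrix can be expressed as a small routing table. The sketch below is an illustration only: the scenario keys are hypothetical identifiers, the local/escalate assignment follows the rationale column, and the fallback to local handling for unknown scenarios is an assumption based on the local-first strategy.

```python
from enum import Enum


class Route(Enum):
    LOCAL = "local"        # handled by the self-hosted model on raw data
    ESCALATE = "escalate"  # anonymized first, then sent to Claude


# Scenario keys are illustrative; assignments follow the rationale column.
ROUTES = {
    "service_restart": Route.LOCAL,
    "log_pattern_match": Route.LOCAL,
    "config_syntax_error": Route.LOCAL,
    "complex_debugging": Route.ESCALATE,
    "architecture_decision": Route.ESCALATE,
    "performance_optimization": Route.ESCALATE,
    "security_incident": Route.LOCAL,
}


def route_task(scenario: str) -> Route:
    """Pick a route; unknown scenarios stay local (assumed local-first default)."""
    return ROUTES.get(scenario, Route.LOCAL)
```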
06 Anonymization Pipeline
Every piece of data passes through two independent anonymization layers before leaving the self-hosted infrastructure. The layers are applied at two trigger points: (1) before escalation to Claude and (2) before writing to persistent logs.
Layer 1: Regex (deterministic)
Coverage: ~85% of sensitive patterns
Pattern categories:
• IPv4/IPv6 addresses → [IP_REDACTED]
• Email addresses → [EMAIL_REDACTED]
• API keys (AWS, GCP, Stripe, etc.) → [APIKEY_REDACTED]
• Connection strings → [CONNSTR_REDACTED]
• Private key material → [PRIVKEY_REDACTED]
• JWT tokens → [JWT_REDACTED]
• Basic auth headers → [AUTH_REDACTED]
• File paths matching sensitive patterns
• Known file formats (.env, wp-config.php, etc.)
Layer 2: Local AI (contextual)
Coverage: Edge cases Layer 1 misses
Catches:
• Credentials in non-standard formats
• Passwords embedded in shell scripts
• PII in application log messages
• Custom tokens in proprietary formats
• Hostnames revealing customer identity
• Database content with personal data
• Secrets in comments or documentation
• Encoded/base64 sensitive values
Mode: Classification-only. Cannot modify instructions. Output is a list of byte ranges to redact.
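Because Layer 2 only returns byte ranges, the redaction itself can stay deterministic code in the MCP server. A minimal sketch of that step follows, assuming half-open `(start, end)` ranges and a generic `[REDACTED]` placeholder; both are illustrative choices, not details from the pipeline.

```python
def merge_ranges(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort ranges and merge any that overlap or touch."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def apply_redactions(
    data: bytes,
    ranges: list[tuple[int, int]],
    placeholder: bytes = b"[REDACTED]",
) -> bytes:
    """Replace each half-open byte range [start, end) with the placeholder."""
    out = bytearray()
    cursor = 0
    for start, end in merge_ranges(ranges):
        out += data[cursor:start]  # keep bytes before the sensitive span
        out += placeholder         # drop the span itself
        cursor = end
    out += data[cursor:]
    return bytes(out)
```

Keeping the model's output to ranges means a manipulated model can at worst redact too much or too little; it cannot rewrite the text that is passed on.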
Example: Anonymization in Action
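As an illustration of Layer 1, the sketch below applies a few of the listed pattern categories with Python's `re` module. The regular expressions are simplified assumptions (IPv4 only, AWS access key IDs only, a loose email pattern) and are not the pipeline's actual rule set; the placeholders are the ones named above.

```python
import re

# Simplified, illustrative patterns; order matters (private keys before the rest).
PATTERNS = [
    (
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.DOTALL,
        ),
        "[PRIVKEY_REDACTED]",
    ),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[APIKEY_REDACTED]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL_REDACTED]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP_REDACTED]"),
]


def anonymize(text: str) -> str:
    """Apply each pattern in turn and return the redacted text."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(anonymize("Failed login for bob@example.com from 203.0.113.7"))
# prints: Failed login for [EMAIL_REDACTED] from [IP_REDACTED]
```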
Canary Token Validation
Continuous pipeline integrity verification without manual auditing. Canary tokens are known-fake sensitive values injected into the pipeline at regular intervals.
| Canary Type | Example | Expected Result | On Failure |
|---|---|---|---|
| AWS Key | AKIA_CANARY_TOKEN_001 | [APIKEY_REDACTED] | HALT pipeline |
| IPv4 Address | 10.255.255.1 | [IP_REDACTED] | HALT pipeline |
| Email Address | canary@test.internal | [EMAIL_REDACTED] | HALT pipeline |
| Private Key | -----BEGIN RSA PRIVATE KEY-----\nCANARY... | [PRIVKEY_REDACTED] | HALT pipeline |
| Embedded PII | User John Doe (john@canary.test) logged in | User [PII] ([EMAIL_REDACTED]) logged in | HALT pipeline |
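A canary check can be sketched as a function that feeds known-fake values through whatever anonymizer is in use and halts if any of them survive. The `PipelineHalted` exception, the `validate_canaries` name, and the surrounding sample sentences are hypothetical; the canary values are taken from the table.

```python
from typing import Callable


class PipelineHalted(RuntimeError):
    """Raised when a canary value survives anonymization."""


# (sample input, canary value that must not appear in the anonymized output)
CANARIES = [
    ("key=AKIA_CANARY_TOKEN_001", "AKIA_CANARY_TOKEN_001"),
    ("connect from 10.255.255.1", "10.255.255.1"),
    ("mail to canary@test.internal", "canary@test.internal"),
]


def validate_canaries(anonymize: Callable[[str], str]) -> None:
    """Run every canary through the anonymizer; halt on the first leak."""
    for sample, secret in CANARIES:
        if secret in anonymize(sample):
            raise PipelineHalted(f"canary leaked: {secret!r}")
```

The check only asserts that the fake value is gone, not that a specific placeholder appears, so it stays valid if placeholder names change.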
07 Prompt Injection Defense
Remote file content is inherently untrusted. A malicious actor could embed LLM instructions inside config files, logs, or even filenames to manipulate MiniMax 2.5 into exfiltrating data or executing unintended commands.
Attack Vectors
Defense Layers
- D1: Strict Context Separation (System Prompt Architecture)
  MiniMax 2.5's system prompt uses explicit boundary markers: <SYSTEM_INSTRUCTIONS> for trusted instructions and <UNTRUSTED_DATA> for file content. The model is trained to never interpret content within untrusted blocks as instructions. The system prompt is immutable and loaded from a signed config, not from any user-accessible source.
- D2: Data Envelope Wrapping
  All SSH output is wrapped before reaching MiniMax: <FILE_CONTENT source="ssh" trust="none" path="[REDACTED]">...</FILE_CONTENT>. The model processes envelope metadata (file type, size, encoding) but treats the content as opaque data to reason about, never as instructions to follow.
- D3: Output Validation & Exfiltration Detection
  A post-processing layer checks MiniMax responses for: (a) verbatim content copied from input files, (b) base64-encoded strings that decode to input data, (c) responses that are suspiciously long relative to the question, and (d) attempts to include raw data in "reasoning" text. Flagged responses are blocked and logged.
- D4: Constrained Tool Access
  MiniMax 2.5 cannot issue SSH commands directly. It can only analyze data, classify content, and return structured JSON responses to the MCP server. The MCP server alone decides which commands to execute. Even if injection succeeds, the model has no tools to act on it.
- D5: Behavioral Monitoring
  Statistical baselines are kept for MiniMax response patterns. Anomalous behavior triggers a circuit breaker: (a) a sudden increase in response size, (b) responses containing patterns that match sensitive data formats, or (c) requests to "ignore" or "override" in model output. The circuit breaker halts the session and alerts operations.
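Two of these layers lend themselves to a short illustration: the D2 envelope and checks (a) and (b) of D3. The sketch below is an assumption-laden example, not the production validator: function names, the 24-character window, and the base64 token pattern are hypothetical. Escaping the content is an added precaution so that file content cannot close the envelope early.

```python
import base64
import re
from xml.sax.saxutils import escape, quoteattr

B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def wrap_untrusted(content: str, path: str = "[REDACTED]") -> str:
    """D2: wrap SSH output in an envelope; <, > and & in the content are escaped."""
    return (
        f'<FILE_CONTENT source="ssh" trust="none" path={quoteattr(path)}>'
        f"{escape(content)}</FILE_CONTENT>"
    )


def leaks_input(response: str, source: str, min_len: int = 24) -> bool:
    """D3 (a) and (b): flag long verbatim copies or base64 copies of the input."""
    # (a) any min_len-character window of the source appearing in the response
    for i in range(max(len(source) - min_len + 1, 0)):
        if source[i:i + min_len] in response:
            return True
    # (b) base64-looking tokens that decode to text found in the source
    for token in B64_TOKEN.findall(response):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8", "ignore")
        except ValueError:
            continue  # not valid base64 (binascii.Error is a ValueError)
        if len(decoded) >= 8 and decoded in source:
            return True
    return False
```

The sliding-window comparison is quadratic in the worst case; for large outputs a hashed n-gram index would be a more practical choice. Sources shorter than `min_len` are never flagged by check (a) in this sketch.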
08 Threat Model & Attack Surface
Systematic analysis of attack vectors, their likelihood, impact, and mitigations. Follows STRIDE methodology.
| Threat | Vector | Severity | Mitigation |
|---|---|---|---|
| Data exfiltration via Claude context | Anonymization bypass | Critical | Dual-layer anonymization + canary tokens + pipeline halt |
| SSH key theft | Memory dump of MCP server | High | 15min TTL, in-memory only, source-IP restriction on cert |
| Prompt injection via file content | Malicious file on target | High | 5-layer defense (D1–D5), constrained tool access |
| MiniMax model compromise | Adversarial input to LLM | High | No external network, no tool access, output validation |
| CA key compromise | K8s secret extraction | Critical | HSM backing, RBAC, audit logging on secret access |
| Lateral movement via MCP server | Container escape | High | Minimal container (distroless), read-only rootfs, no capabilities |
| Log data re-identification | Correlation attack on embeddings | Medium | Raw log purge after embedding, no raw data retention |
| Man-in-the-middle on MCP protocol | Network interception | Low | MCP runs over stdio (local process), no network exposure |
| Denial of service on MiniMax | Large payload flooding | Medium | Request size limits, rate limiting, circuit breaker |
| Supply chain attack on dependencies | Compromised npm/Docker package | Medium | Immutable images, pinned versions, Trivy scanning in CI |
Blast Radius Analysis
- Worst Case: Full anonymization failure
  Raw data reaches Claude/Anthropic. Containment: Canary system detects within seconds. Pipeline halts. Affected session quarantined. Data Processing Agreement with Anthropic required under DSGVO. Blast radius: Limited to one session's data (max ~15 minutes of SSH output). No persistent exposure beyond Anthropic's retention policy.
- Moderate Case: SSH key compromise
  Attacker obtains ephemeral key. Containment: Key expires in 15 minutes. Valid only from MCP server IP. Certificate includes source-address restriction. Blast radius: Access to one target machine for ≤15 minutes, only from the MCP server's network.
- Contained Case: MiniMax prompt injection
  Attacker manipulates MiniMax output. Containment: MiniMax has no tools, no network, no SSH access. Output validation catches anomalies. Blast radius: Incorrect analysis returned to MCP server. No data exfiltration possible, because the model cannot initiate outbound connections.
09 Logging & Observability
Every SSH session is fully auditable, but raw data is never retained beyond processing. The logging pipeline implements a strict ingest → embed → purge lifecycle.
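The purge step of this lifecycle can be sketched with an in-memory SQLite database standing in for PostgreSQL. Only the `stdout_anon` and `stderr_anon` column names come from this document; the table name, the `output_hash` column, and the `embedded` flag are hypothetical.

```python
import sqlite3


def purge_embedded(conn: sqlite3.Connection) -> int:
    """NULL the stored output of every row that has already been embedded."""
    cur = conn.execute(
        "UPDATE session_log SET stdout_anon = NULL, stderr_anon = NULL "
        "WHERE embedded = 1 "
        "AND (stdout_anon IS NOT NULL OR stderr_anon IS NOT NULL)"
    )
    conn.commit()
    return cur.rowcount  # number of rows purged in this run


# SQLite stands in for PostgreSQL here; JSONB columns are modelled as TEXT.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE session_log ("
    "id INTEGER PRIMARY KEY, output_hash TEXT, "
    "stdout_anon TEXT, stderr_anon TEXT, embedded INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO session_log (output_hash, stdout_anon, stderr_anon, embedded) "
    "VALUES (?, ?, ?, ?)",
    [("h1", '{"lines": 3}', None, 1), ("h2", '{"lines": 9}', None, 0)],
)
purged = purge_embedded(conn)
```

Gating the purge on an explicit `embedded` flag means a row is never cleared before its embedding exists, while the hash and other metadata survive for the audit trail.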
Session Log Schema (PostgreSQL)
Data Lifecycle
After embedding, the stdout_anon and stderr_anon JSONB fields are set to NULL. Only structural metadata remains: hashes, counts, durations, anonymization stats. Vector embeddings are non-reversible: the original text cannot be reconstructed.
Observability Stack
| Metric | Source | Alert Threshold |
|---|---|---|
| Anonymization latency (p99) | MCP Server | >500ms → warning, >2s → critical |
| Canary token pass rate | Canary validator | <100% → HALT |
| Escalation ratio | MCP Server | >40% → review MiniMax effectiveness |
| SSH session duration | Session logger | >14 min → warning (approaching key expiry) |
| MiniMax inference latency (p99) | MiniMax service | >5s → scale up |
| Output validation rejections | Post-processor | >5% → investigate prompt injection attempts |
| Raw log purge lag | Purge worker | >10 min → critical (DSGVO exposure) |
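The thresholds in this table map naturally onto a small rule set. The following sketch covers four of the rows; the metric keys and the `evaluate` function are hypothetical names, and in practice these rules would more likely live in the alerting system than in application code.

```python
# Each metric maps to (limit, level) pairs, checked from the highest limit down.
THRESHOLDS = {
    "anonymization_latency_p99_ms": [(2000, "critical"), (500, "warning")],
    "ssh_session_minutes": [(14, "warning")],
    "raw_log_purge_lag_minutes": [(10, "critical")],
    "escalation_ratio": [(0.40, "review")],
}


def evaluate(metric: str, value: float) -> str:
    """Return the alert level for a metric value, or 'ok' if no limit is exceeded."""
    for limit, level in THRESHOLDS[metric]:
        if value > limit:
            return level
    return "ok"
```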
10 DSGVO Compliance
Privacy is not a feature; it is the architecture. Every component is designed with DSGVO (GDPR) principles as structural constraints, not add-ons.
| DSGVO Requirement | Implementation | Verification |
|---|---|---|
| Privacy by Design (Art. 25) | Anonymization is a structural component, not a filter | Architecture review, canary testing |
| Storage Limitation (Art. 5(1)(e)) | Raw logs purged after embedding (≤5 min) | Purge lag monitoring, audit trail |
| Data Minimization (Art. 5(1)(c)) | Only anonymized embeddings retained long-term | Schema constraints (JSONB fields nulled) |
| Security of Processing (Art. 32) | Ephemeral SSH keys, network isolation, K8s policies | Penetration testing, NetworkPolicy audit |
| Access Control | RBAC on all data stores, audit logging | K8s RBAC review, access log analysis |
| Data Processing Agreement | Required with Anthropic only if anonymization fails | Canary system provides continuous proof |
11 Infrastructure Deployment
The Trusted Zone runs as a single-namespace Kubernetes deployment with strict resource isolation, network policies, and immutable infrastructure. The Claude Code instance runs separately in the Company WAN. All components can alternatively be deployed as VMs or bare-metal servers.
| Component | K8s Resource | Replicas | Resources | Stack |
|---|---|---|---|---|
| MCP Server | Deployment | 2 (HA) | 512Mi / 1 CPU | Node.js 22, Agent SDK |
| MiniMax 2.5 | Deployment | 1–4 (HPA) | 8Gi / 4 CPU (or GPU) | Self-hosted LLM |
| PostgreSQL | StatefulSet | 1 (primary) | 2Gi / 1 CPU | PostgreSQL 16 |
| Qdrant | StatefulSet | 1 | 2Gi / 1 CPU | Qdrant 1.x |
| SSH Key Service | Deployment | 2 (HA) | 128Mi / 0.25 CPU | Go, Ed25519 |
| Purge Worker | CronJob | 1 (every 5 min) | 256Mi / 0.5 CPU | Node.js |
| Canary Validator | CronJob | 1 (every 1 min) | 128Mi / 0.25 CPU | Node.js |
Security Hardening
- Distroless container images
  No shell, no package manager, no debugging tools in production images. Attack surface minimized to application binary and runtime dependencies only.
- Read-only root filesystem
  All containers run with readOnlyRootFilesystem: true. Ephemeral data uses emptyDir volumes. No write access to the container filesystem prevents persistent malware.
- No privileged capabilities
  drop: [ALL] in securityContext. No NET_RAW, no SYS_ADMIN. Containers run as non-root user (UID 1000). allowPrivilegeEscalation: false.
- Secrets management
  CA private key stored in Kubernetes Secrets (or external HSM via CSI driver). Mounted as read-only volume, never as environment variable. Access audited via K8s audit log.
- Image provenance
  All images built in CI, signed with cosign, verified at admission via Kyverno/OPA policy. No :latest tags. Pinned digests only. Trivy vulnerability scanning on every build.
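Several of these rules can be checked mechanically against a container spec. The sketch below is a hypothetical lint function operating on a container definition loaded as a Python dict; the field names (`readOnlyRootFilesystem`, `allowPrivilegeEscalation`, `runAsNonRoot`, `capabilities.drop`) are standard Kubernetes securityContext fields, while the function and its messages are illustrative. In the deployment itself, admission policies would enforce these rules.

```python
def hardening_violations(container: dict) -> list[str]:
    """Return the hardening rules a container spec fails (empty list if none)."""
    sc = container.get("securityContext", {})
    problems = []
    if sc.get("readOnlyRootFilesystem") is not True:
        problems.append("root filesystem is writable")
    if sc.get("allowPrivilegeEscalation") is not False:
        problems.append("privilege escalation not disabled")
    if sc.get("runAsNonRoot") is not True:
        problems.append("container may run as root")
    if "ALL" not in sc.get("capabilities", {}).get("drop", []):
        problems.append("capabilities not dropped")
    if "@sha256:" not in container.get("image", ""):
        problems.append("image not pinned by digest")
    return problems
```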
Horizontal Pod Autoscaler (MiniMax)
12 Roadmap
Implementation phases ordered by dependency and risk. Each phase is independently deployable and testable.