Technical Deep Dive · Agent Architecture

Multi-Agent Framework
Evolution Architecture

A production-grade analysis of multi-agent AI systems — anatomy, request flow, capability gaps, and the four-phase evolution from single-agent to fully autonomous multi-agent orchestration. This is how Atsky builds AI that scales.

Atsky Agent SDK — Full Architecture Analysis


A complete engineering dissection of Production Infrastructure's Multi-Agent Framework stack — its architecture, runtime model, multi-agent capability, the three production gaps that actually matter, and the four-phase path to a production-grade orchestration platform for TEL–NOK UC1 & UC2.

6.1 / 10 — current weighted score
9.2 / 10 — evolved score (Phase 1 complete)
3 — critical gaps to close
28 weeks — total path to Phase 3
01 · High-Level Architecture

What Multi-Agent Framework Actually Is

Multi-Agent Framework is Production Infrastructure's internal "manager layer" wrapping the OpenAI Agents SDK. It exposes an OpenAI-compatible HTTP API and allows composing prompts, tools, MCP servers, and multi-agent handoff graphs entirely via YAML config — no code changes needed. A single FastAPI + Uvicorn process handles everything.

Multi-Agent Framework-server · FastAPI + Uvicorn · Single Python Process · asyncio event loop
Multi-Agent Framework Server  — three HTTP API surface groups
Inference
/v1/chat/completions
Management
/files  /models  /tools  /agents
Protocol
/mcp  /a2a  /sse  /healthz
↓    all requests converge    ↓
AppState — Process-Scoped Singleton  ⚠ NOT shared across processes — critical scaling constraint
agent: Agent
agents_by_id: Dict
sessions: Dict (LRU)
cfg: YAML config
↓    per-request    ↓
OpenAI Agents SDK Runner
Streaming or non-streaming execution per request · Agent LLM call → tool invocations via asyncio.gather (parallel) → tool results fed back → next LLM call → stream events: text_delta, tool_call, tool_result, reasoning, handoff
↓      ↓      ↓
LLM API
OpenAI API / Production Infrastructure TELAI LLM Gateway — agent reasoning backend
Session DB
SQLite (default) / PostgreSQL / MySQL via SQLAlchemy — history persistence
File Storage
S3-compatible via boto3: AWS S3, MinIO, Garage, Moto — uploads & sandbox sync
Production Infrastructure CMG Domain Configs Already Proven

cmg_multi_agent.yaml is a working Production Infrastructure CMG anomaly detection multi-agent config (supervisor → AD agent → RCA agent) that maps directly to UC1 requirements. Additional proven configs: config_multi_agents.yaml (web research crew), config_tshark_multi_agent.yaml, config_coding_multi_agent.yaml. This is months of Production Infrastructure-specific domain work that would require full reconstruction in any greenfield alternative — the primary reason to evolve rather than replace.

02 · Components & Responsibilities

Anatomy of the Stack

Sixteen distinct components across server, API, agent graph, session, storage, and observability layers — each with a clearly defined single responsibility.

Component · Location · Role
CLI Entrypoint · server/main.py · Multi-Agent Framework serve CLI — builds config overlay, starts Uvicorn with provided args
App Factory + Lifespan · server/core/app/lifespan.py · Startup: load config, init LLM client, build all agents from YAML, init MCP connections. Shutdown: teardown sessions and storage backends
AppState · server/core/config.py · Process-scoped singleton: active agent, agents_by_id dict, sessions dict (in-memory LRU), parsed YAML config, engine registry. Not shared across processes
Chat Completions · server/api/chat/main.py · Request entry: parse_chat_request(), passthrough check, slash command routing, stream / non-stream delegation
Streaming Handler · server/api/chat/streaming.py · SSE stream construction, session resolution, history compression trigger, SDK event → OpenAI SSE translation
Agent Graph Builder · server/agents/graph/main.py · Constructs all Agent objects from YAML: personas, tool grants, model overrides, MCP server selection per agent
Handoff Engine · server/agents/handoff.py · Custom Handoff subclass: builds [AGENT SWITCH] + [HANDOFF TASK] transfer messages, captures via ContextVar, applies input filters to control target-agent context
Handoff Wiring · server/agents/graph/handoff_wiring.py · attach_handoffs(): builds directed edges between agents based on YAML handoffs: config
A2A Executor · server/a2a/executor.py · Adapter: translates framework streaming output into A2A Task events for cross-service agent calls via /a2a/v1
Runtime Context · common/core/runtime.py · ContextVars for session ID, sink, call IDs. Module-level globals (_client, _settings, _tool_call_agent_map) — not safe across processes
Session Factory · common/session/factory.py · Creates SQLAlchemy sessions (SQLite / PG / MySQL) per session_key. Manages connection pool lifecycle
Session Recovery · common/session/recovery/ · Checkpoint detection, anomaly detection (orphaned tool calls, truncated responses), rollback via pop_item(). Handles DB-level anomalies — not logic failures
Delegation Helper · common/core/delegation.py · Tool inheritance for delegate / sub-agents. Enables run_agent tool for inline agent cloning and dynamic task delegation
YAML Config Loader · common/config/ · Multi-file import, environment variable substitution, ${ref} resolution across config files
Tools · tools/ · Filesystem, DB, Kubernetes, Network, Math, Web search, AI sub-agents — accessible via per-agent YAML tool grants
Langfuse Tracing · common/tracing/ · Logfire-based Langfuse integration: captures every LLM call, prompt, response, token count, latency, cost per agent type
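The environment-variable substitution done by the YAML Config Loader can be sketched in a few lines. The regex and fallback behaviour below are assumptions about, not a copy of, common/config/:

```python
import os
import re

def substitute_env(text: str) -> str:
    """Replace ${VAR} with the value from the environment.

    Unset variables are left untouched rather than blanked, so a
    missing key is visible in the rendered config. The ${ref}
    resolution across imported files is elided here.
    """
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: os.environ.get(m.group(1), m.group(0)),
        text,
    )

# Illustrative variable name and URL, not the real deployment values.
os.environ["LLM_GATEWAY_URL"] = "https://llmgateway.example/v1"
cfg = "llm:\n  base_url: ${LLM_GATEWAY_URL}\n"
rendered = substitute_env(cfg)
```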
03 · Runtime Execution

How a Request Flows

Two execution paths: a single-turn request and a multi-agent handoff sequence. Both run within the same single asyncio event loop inside one Python process.

// Single-Turn Request Flow

1
HTTP POST
/v1/chat/completions received by FastAPI router
2
Parse
Extract session_key, user_input, model, stream flag
3
Passthrough?
If model matches passthrough LLM — bypass agent entirely
4
Commands?
/slash commands handled inline, return early
5
SDK Runner
Resolve session, compress if needed, Runner.run_streamed()
6
SSE Stream
SDK events → OpenAI SSE chunks → StreamingResponse
SDK Runner — Step 5 Internals

a. Resolve or create Session (in-memory dict or SQLAlchemy, keyed by session_key)
b. maybe_compress_session() — if history token count exceeds compress_threshold, a summarizer agent call runs in-band
c. Runner.run_streamed(agent, input, session, run_config) — Agent LLM call → parallel tool dispatch via asyncio.gather → results fed back → next LLM call → stream events emitted
d. consume_stream() — translates SDK events (text_delta, tool_call, tool_result, reasoning, handoff) to OpenAI SSE format
e. Session written back to DB; LRU eviction policy enforced on session_access_order
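Steps a through c can be sketched with a few lines of asyncio. Function names and the token heuristic are illustrative, not the SDK's actual API:

```python
import asyncio

COMPRESS_THRESHOLD = 8   # "tokens" (words here); tiny value for the demo
sessions = {}            # stands in for the in-memory / SQLAlchemy store

def resolve_session(session_key):
    # Step a: resolve or create the session keyed by session_key.
    return sessions.setdefault(session_key, [])

def maybe_compress(history):
    # Step b: if estimated token count exceeds the threshold, older
    # turns are replaced by a summary (in the real system this is an
    # in-band summarizer agent call).
    if sum(len(m.split()) for m in history) > COMPRESS_THRESHOLD:
        return [f"[summary of {len(history) - 1} earlier turns]", history[-1]]
    return history

async def call_tool(name):
    await asyncio.sleep(0.01)        # stands in for real tool I/O
    return f"{name}:ok"

async def run_turn(session_key, user_input):
    history = resolve_session(session_key)
    history.append(user_input)
    compressed = maybe_compress(history)
    # Step c: tool calls requested by the LLM are dispatched in
    # parallel via asyncio.gather; results would feed the next LLM
    # call (elided here).
    results = await asyncio.gather(call_tool("search"), call_tool("db"))
    return compressed, list(results)

compressed, results = asyncio.run(run_turn("sess-1", "check CMG KPIs"))
```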


// Multi-Agent Handoff Flow

A
Start Agent
Starting agent receives user message from SDK Runner
B
LLM Decides
LLM invokes transfer_to_<target> tool call
C
ContextVar
The handoff engine's _invoke_handoff() captures the transfer via ContextVar
D
Build Msg
[AGENT SWITCH] + [HANDOFF TASK] message constructed
E
Input Filter
passthrough / strip_tools / last_turn / nest_handoff_history applied
F
Target Agent
SDK transitions to target. Same Runner, same session. May chain further.
Input Filters — What Context the Target Agent Receives

passthrough — full conversation history passed to target agent unchanged
strip_tools — tool call/result blocks removed before passing (reduces token usage)
last_turn — only the most recent turn passed (minimal context transfer)
nest_handoff_history — previous agent's full context nested as a structured block in the new session
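The four filters reduce to simple list transforms. A sketch over a hypothetical message shape; the real filter signatures may differ:

```python
# Hypothetical message shape: {"role": ..., "type": ..., "content": ...}
history = [
    {"role": "user", "type": "text", "content": "enrich anomaly 42"},
    {"role": "assistant", "type": "tool_call", "content": "lookup_kb(...)"},
    {"role": "tool", "type": "tool_result", "content": "{...}"},
    {"role": "assistant", "type": "text", "content": "anomaly enriched"},
]

def passthrough(h):
    # Full conversation history, unchanged.
    return h

def strip_tools(h):
    # Drop tool call/result blocks to cut token usage before handoff.
    return [m for m in h if m["type"] not in ("tool_call", "tool_result")]

def last_turn(h):
    # Minimal context transfer: only the most recent turn.
    return h[-1:]

def nest_handoff_history(h):
    # Previous agent's full context nested as one structured block.
    return [{"role": "system", "type": "handoff_context", "content": h}]
```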

04 · Multi-Agent Capability

What Works — and What Doesn't

Multi-Agent Framework has real, production-tested multi-agent capability across sequential handoffs, supervisor patterns, per-agent isolation, and protocol exposure. Three patterns critical for TEL–NOK production are structurally absent.

Pattern / Feature
Notes
Status
Sequential handoffs (A → B → C → A)
Via YAML handoffs: config. Proven in cmg_multi_agent.yaml (Production Infrastructure CMG AD → RCA)
✓ FULL
Hierarchical / Supervisor pattern
supervisor → specialist → supervisor routing. Proven in multi-agent configs
✓ FULL
Per-agent tool isolation
Each agent declares its own tool grants in YAML — no cross-agent contamination
✓ FULL
Per-agent LLM model override
Per-agent llm: section with model, temperature, max_tokens overrides
✓ FULL
Per-agent MCP server selection
Each agent selects which MCP servers it sees — scoped capability access
✓ FULL
Dynamic sub-agent delegation
run_agent tool for inline agent cloning and dynamic task delegation within a session
✓ FULL
A2A cross-service protocol
/a2a/v1 endpoint via a2a-sdk: agents in different services/clusters call each other across network
✓ FULL
MCP server exposure
/mcp endpoint: Multi-Agent Framework exposes itself as an MCP server consumable by any MCP client (Claude Desktop, VSCode, etc.)
✓ FULL
True parallel agent branches
SDK executes handoffs sequentially. Agent A suspends, B runs, B suspends, A resumes. No fork-join at all.
✗ ABSENT
Deterministic execution order
LLM-driven only — the model must call the correct handoff tool. If it doesn't, the step is silently skipped.
✗ ABSENT
HiTL interrupt / pause-resume
No native mechanism to pause graph execution and await a human approval callback before continuing
✗ ABSENT
Confidence-gated result re-routing
No output quality threshold enforcement or automatic re-dispatch on low-confidence agent outputs
✗ ABSENT
05 · Infrastructure Stack

Present vs. Absent

A clear-eyed inventory: what the stack includes today, and what is missing for production-scale multi-instance deployment on TEL's TKG environment.

✓   PRESENT IN STACK
FastAPI + Uvicorn + asyncio
HTTP / SSE API serving, async I/O event loop
OpenAI API / Production Infrastructure LLM Gateway
Agent reasoning — OpenAI-compatible endpoint
SQLite / PostgreSQL / MySQL
Session history persistence (SQLAlchemy ORM)
sqlite-vec (TinySearch)
Local RAG vector search — sufficient for single-instance demo
S3-compatible (AWS S3 / MinIO / Garage)
File uploads, sandbox sync via boto3
Langfuse via Logfire
LLM trace capture: prompt, response, tokens, cost per agent
prometheus-client
Prometheus scrape endpoint + Grafana dashboard config
fastmcp + mcp[cli]
Expose tools as MCP server; consume external MCP servers
a2a-sdk[http-server]
Agent-to-Agent cross-service communication protocol
Azure AD / MSAL / JWKS JWT
Enterprise SSO authentication
Docker + Dockerfile
Containerised deployment
Session recovery module
Checkpoint detection + rollback via pop_item()
✗   ABSENT — Production Gaps
Redis / Valkey
No shared cache, no distributed session state, no pub/sub for multi-pod coordination
Message Queue (Kafka / Streams)
No async task dispatch — inter-agent comms in-process only via SDK handoffs
Distributed Vector DB (Qdrant)
sqlite-vec is pod-local — second pod cannot query same index; no namespace isolation
Deterministic FSM / LangGraph
LLM-driven handoffs only — no code-enforced workflow state machine
KEDA Autoscaler
No event-driven pod scaling on queue depth or anomaly event rate
OPA Policy Gate
No policy-as-code authorization layer for agent dispatch
HashiCorp Vault
No secret injection — LLM API keys rely on environment variables
Istio mTLS
No mutual TLS enforcement between agent services
HiTL Interrupt Nodes
No pause / await-callback for human-in-the-loop approval flows
Circuit Breaker
Tool timeouts exist but no circuit breaker — no back-pressure on concurrent long-running tools
API Gateway (nginx / Kong)
No load balancing, SSL termination, or per-route rate limiting at gateway level
WebSockets
SSE-only streaming — no bidirectional communication channel
Key Limitation: How the Scaling Gaps Compound

The absent items above interact: single-process AppState + no shared session store means you cannot run two Multi-Agent Framework pods and load-balance between them even if you add nginx. The session state would be split across processes. This must be resolved (Valkey shared state) before any other scaling infrastructure is added — it is the foundational blocker.

06 · Critical Gap Analysis

The Three Gaps That Actually Matter

Multi-Agent Framework has twelve documented production absences. For TEL–NOK Phase 2 MVP and Phase 3 production, only three create material delivery risk. The rest are Phase 3 infrastructure concerns. Resolving all twelve upfront adds 16+ weeks with no Phase 2 benefit.

Why Only Three?

Missing Redis, no API gateway, no WebSockets — these only become blockers at Phase 3 production scale with live data pipelines. For Phase 2 MVP on offline data, the three gaps below are the only ones that create immediate risk to delivery quality or operator safety in a Production Infrastructure NOC context.

1
Gap 1 of 3 · Orchestration
Non-Deterministic Routing — The Most Critical Failure Mode
● CRITICAL — Phase 2 Risk

Multi-agent flow in Multi-Agent Framework is entirely LLM-driven: the SDK Runner relies on the model calling the correct transfer_to_<target> tool at the right moment. There is no Python-level workflow state machine enforcing execution order, and no code-level guarantee that "anomaly detected → always call RCA agent." It is a suggestion in the system prompt — not an architectural contract.

For a Production Infrastructure NOC environment, a missed RCA step or a capacity breach not flagged because the LLM chose not to hand off is a production incident. Session recovery (common/session/recovery/) handles DB-level anomalies like orphaned tool calls — it cannot recover a logic failure where the LLM simply didn't invoke the handoff.

Additionally: history compression via an in-band summarizer agent (triggered when history exceeds compress_threshold) is itself an unguarded LLM call — if the summarizer degrades context, the entire session reasoning quietly deteriorates with no alert surfaced.

Multi-Agent Framework Today

LLM must invoke transfer_to_rca_agent. If it miscategorises the anomaly, or context-window pressure causes it to skip the handoff tool, Agent B (RCA Reasoner) never runs. No retry. No audit record of the missed step. The operator receives an incomplete enrichment with no error surfaced at all.

Fixed — LangGraph Overlay (Phase 1)

LangGraph conditional edge: after Agent A returns an EnrichedAnomaly typed object, Python routing logic checks the result type and unconditionally routes to Agent B. Zero LLM compliance required. Confidence gate: if result.confidence < 0.72 → re-route with augmented RAG context, max 2 retries, then human escalation via interrupt().
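The confidence-gated edge described above is ordinary Python, which is the point: routing becomes testable code rather than a prompt suggestion. An illustrative sketch, where names and the retry-state shape are assumptions:

```python
MAX_RETRIES = 2
CONFIDENCE_THETA = 0.72

def route_after_enrichment(result: dict, state: dict) -> str:
    """Edge logic called after Agent A returns its enrichment result.

    In the real overlay this would be a LangGraph conditional-edge
    function over a typed EnrichedAnomaly; plain dicts keep the
    sketch self-contained.
    """
    if result["confidence"] >= CONFIDENCE_THETA:
        return "rca_agent"          # unconditional hop to Agent B, no LLM vote
    if state["retries"] < MAX_RETRIES:
        state["retries"] += 1
        return "retry_with_rag"     # re-run with augmented RAG context
    return "human_escalation"       # interrupt() point: HiTL approval

state = {"retries": 0}
assert route_after_enrichment({"confidence": 0.91}, state) == "rca_agent"
assert route_after_enrichment({"confidence": 0.30}, state) == "retry_with_rag"
assert route_after_enrichment({"confidence": 0.30}, state) == "retry_with_rag"
assert route_after_enrichment({"confidence": 0.30}, state) == "human_escalation"
```

A miscategorised anomaly can no longer silently skip RCA: the worst case is two retries followed by a recorded human escalation.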

2
Gap 2 of 3 · Scaling
Single-Process State — Horizontal Scaling Structurally Blocked
● CRITICAL — Phase 3 Blocker

AppState is a process-level singleton. The sessions dict, session_engines, and session_access_order are all in-memory Python dicts. Module-level globals in runtime.py — _client, _settings, _tool_call_agent_map — are not safe across OS processes. This is not a configuration limitation; it is a fundamental architectural constraint baked into how AppState is initialised.

You cannot run multiple Multi-Agent Framework instances behind a load balancer without sessions being lost or requests landing on the wrong instance. At Phase 3 production volumes — live BHOM anomaly streams plus live CMG-C/U log ingestion — a single pod will saturate. Pod restart = in-flight task loss. No checkpoint survives a process restart beyond the SQLAlchemy session write, which only captures completed turns.

Multi-Agent Framework Today

Single FastAPI process. All session state in AppState.sessions in-memory dict. Kubernetes pod eviction during a 40s UC2 log analysis call = task permanently lost. SQLite default: single-writer, cannot be shared across pods even if memory were resolved. No KEDA autoscaling possible.

Fixed — Valkey + CloudNativePG (Phase 1–2)

Valkey (Redis-compatible, Apache 2.0) as shared working state across all agent pods — replaces in-memory AppState.sessions. CloudNativePG as LangGraph checkpoint store: graph state survives pod restarts and Kubernetes evictions. KEDA scales Agent C pods on log.raw Kafka topic consumer lag independently of Agent A pods.
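The migration amounts to swapping the session store behind a small interface. A sketch, with a plain dict standing in for a Valkey/Redis client; the real client would use GET/SET against a networked server:

```python
import json
from typing import Protocol

class SessionStore(Protocol):
    def get(self, session_key: str) -> list: ...
    def put(self, session_key: str, history: list) -> None: ...

class InMemoryStore:
    """Today's behaviour: state dies with the process."""
    def __init__(self):
        self._d = {}
    def get(self, k):
        return self._d.get(k, [])
    def put(self, k, v):
        self._d[k] = v

class SharedKVStore:
    """Valkey-shaped store: any pod holding a client to the same
    server sees the same state. `kv` stands in for a redis/valkey
    client storing a JSON blob per session_key."""
    def __init__(self, kv):
        self.kv = kv
    def get(self, k):
        raw = self.kv.get(k)
        return json.loads(raw) if raw else []
    def put(self, k, v):
        self.kv[k] = json.dumps(v)   # real client: kv.set(k, json.dumps(v))

shared_backend = {}                  # one "server" shared by both pods
pod_a = SharedKVStore(shared_backend)
pod_b = SharedKVStore(shared_backend)
pod_a.put("sess-1", ["turn 1"])
assert pod_b.get("sess-1") == ["turn 1"]   # visible across "pods"
```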

3
Gap 3 of 3 · Throughput
No True Parallel Execution — Sequential Bottleneck at Volume
● HIGH — Throughput Impact at Phase 3

The OpenAI Agents SDK Runner executes agents sequentially in the handoff chain: Agent A suspends; Agent B runs; Agent B suspends; Agent A resumes. There is no fork-join mechanism and no simultaneous parallel agent branches. A supervisor wanting two specialists to work concurrently must wait for them sequentially.

For combined UC1 + UC2 events: total processing time = UC1 chain time + UC2 chain time, not max(UC1, UC2). Agent C (Log Analyzer) processes 96K-token log batches and takes 20–40 seconds per LLM call. A P1 BHOM anomaly arriving during a UC2 log analysis run is queued behind that long-running call — it cannot be dispatched to a parallel Agent A branch.

Multi-Agent Framework Today

Combined UC1+UC2: Agent A→B (~10s) finishes, then Agent C→D (~40s) runs. Total: ~50s per event. P1 anomalies cannot preempt in-progress UC2 log analysis. Single asyncio event loop serialises all agent transitions within the process.

Fixed — LangGraph Send() + KEDA (Phase 1–2)

LangGraph parallel Send(): UC1 and UC2 chains dispatched simultaneously to independent agent pod pools. Total: max(10s, 40s) = ~40s. Agent A and Agent C pod counts scale independently via KEDA on their respective event sources. P1 anomalies always get a free Agent A pod.
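The throughput difference is easy to demonstrate with asyncio stand-ins for the two chains. Sleep times are scaled down by three orders of magnitude, and Send() itself is modelled with asyncio.gather:

```python
import asyncio
import time

async def uc1_chain():
    await asyncio.sleep(0.02)   # stands in for the ~10s UC1 A→B chain
    return "rca_result"

async def uc2_chain():
    await asyncio.sleep(0.08)   # stands in for the ~40s UC2 C→D chain
    return "capacity_plan"

async def sequential():
    # Today: chains run back-to-back, total = sum of chain times.
    return [await uc1_chain(), await uc2_chain()]

async def fan_out():
    # Send()-style dispatch: both chains at once, total ≈ max(...).
    return list(await asyncio.gather(uc1_chain(), uc2_chain()))

t0 = time.perf_counter()
seq = asyncio.run(sequential())
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
par = asyncio.run(fan_out())
t_par = time.perf_counter() - t0

assert seq == par == ["rca_result", "capacity_plan"]
assert t_par < t_seq    # ≈ max(0.02, 0.08) vs 0.02 + 0.08
```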

RAG
Supporting Gap · MVP Prerequisite
RAG Layer Not Production-Scale — sqlite-vec is Single-Pod Only
● MEDIUM — Phase 2 Upgrade Required

Multi-Agent Framework's built-in RAG uses sqlite-vec (TinySearch) — a local SQLite-based vector store on the pod's filesystem. It works for Phase 1 demo but is architecturally incompatible with Phase 2 multi-pod deployment: the file is pod-local, a second Agent A pod cannot query the same index, there is no namespace isolation between UC1 and UC2 knowledge bases, no cross-encoder reranker support, and a practical capacity ceiling around 100K chunks.

sqlite-vec / TinySearch

Pod-local file. Single-instance only. UC1 and UC2 Production Infrastructure KB share one search space with no isolation. No reranker. Phase 1 demo only. Immediately broken in any multi-pod deployment — second pod has an empty or stale index.

Fixed — Qdrant (Phase 1)

Standalone Qdrant service on TKG. Named collections: nok-kb-uc1, nok-kb-uc2, nok-kb-shared. Cross-encoder reranker pipeline. Metadata filtering by Production Infrastructure product version, doc type. Handles tens of millions of vectors. All agent pods query via gRPC client — fully multi-instance safe.
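The two properties that matter here, named collections and metadata filtering, can be illustrated with a toy in-memory index. The 2-dimensional vectors, document texts, and field names are invented for the sketch:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Named collections keep UC1 and UC2 knowledge bases isolated,
# the property sqlite-vec's single shared search space lacks.
collections = {
    "nok-kb-uc1": [
        {"vec": [1.0, 0.0], "meta": {"doc_type": "procedure"},
         "text": "AD runbook"},
        {"vec": [0.9, 0.1], "meta": {"doc_type": "release_note"},
         "text": "CMG 23.1 notes"},
    ],
    "nok-kb-uc2": [
        {"vec": [0.0, 1.0], "meta": {"doc_type": "procedure"},
         "text": "log triage guide"},
    ],
}

def search(collection, query_vec, doc_type=None, top_k=1):
    hits = collections[collection]
    if doc_type is not None:               # metadata filtering
        hits = [h for h in hits if h["meta"]["doc_type"] == doc_type]
    return sorted(hits, key=lambda h: cosine(h["vec"], query_vec),
                  reverse=True)[:top_k]

best = search("nok-kb-uc1", [1.0, 0.0], doc_type="procedure")[0]
```

A query against nok-kb-uc1 can never surface a UC2 document, and every pod querying the same service sees the same index.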

07 · Evolution Roadmap

Four-Phase Multi-Agent Framework Evolution

Each phase is independently deliverable and provides incremental production value. Strictly additive — no phase discards what was built before. The architecture grows with the engagement timeline.

Timeline at a Glance

  Phase 0  (2–3 wks)    Harden existing stack → Phase 1 Demo ready in Production Infrastructure Labs
  Phase 1  (6–8 wks)    LangGraph overlay + Qdrant + Valkey → Phase 2 MVP ready on TELAI
  Phase 2  (8–10 wks)   Agent isolation + KEDA + Kafka + Vault/Istio → Phase 3 production ready
  Phase 3  (12+ wks)    Adaptive cognition + episodic memory + MCP graph endpoints

P0
Phase 0 · Harden
Stabilise What Exists — Demo-Safe in 2–3 Weeks
No architectural changes. Config, tooling, and validation only. Make Multi-Agent Framework reliable enough for Phase 1 Demo in Production Infrastructure Labs against real CMG log samples.
2–3 wks
Key Activities
Switch session DB from SQLite → PostgreSQL (single config line — already supported in code)
Deploy cmg_multi_agent.yaml against real CMG-C/U log samples in Production Infrastructure Labs for UC2 POC
Enable Langfuse tracing with Production Infrastructure span names: agent_type, use_case, model, token_cost per call
Validate LLM Gateway connectivity to https://llmgateway.telai.internal/v1 with TELAI team; confirm per-agent API key provisioning
Build NOK KB ingestion pipeline: PDF/doc → 512-token chunks → sqlite-vec (sufficient for Phase 1 demo only)
Add Prometheus alert: fire if LLM Gateway call p95 latency exceeds 30s
Document all YAML agent config parameters for Production Infrastructure engineer self-service
Deliverables
Phase 1 Demo-ready Multi-Agent Framework on Production Infrastructure Labs with PG session store and real CMG log samples
cmg_multi_agent.yaml extended with UC2 Log Analyzer agent persona and Kalix KPI tool access
LLM Gateway integration confirmed and load-tested with TELAI team
Langfuse dashboard live: agent traces with cost and latency per agent type visible
Production Infrastructure KB (CMG-C/U docs, procedures) ingested into sqlite-vec with validation queries passing
P1
Phase 1 · Orchestrate · Phase 2 MVP Prerequisite
Add Deterministic Orchestration — The Highest-Impact Single Change
Introduce LangGraph as the routing engine over Multi-Agent Framework. Converts LLM-driven handoffs to code-enforced routing. Closes Gap 1 completely. Adds Qdrant (closes RAG Gap). Adds Valkey for shared state (Gap 2 partial).
6–8 wks
Key Activities
Install LangGraph; define StateGraph with UC1 and UC2 node types and typed state schema
Create typed Pydantic state objects: BhomAnomaly, EnrichedAnomaly, RcaResult, LogAnalysisResult, CapacityPlan
Wrap each Multi-Agent Framework agent as a callable LangGraph node — Python wrapper preserves all existing YAML config unchanged
Implement conditional edges: UC1 chain (A→B sequential), UC2 chain (C→D conditional on capacity_breach flag)
Implement confidence gate: if confidence_score < 0.72 → re-route with augmented RAG context, max 2 retries
Add LangGraph interrupt() at remediation action points for HiTL approval on network config changes
Deploy Valkey on TKG dev cluster as shared session/state cache replacing in-memory AppState.sessions
Deploy Qdrant on TKG dev cluster; migrate Production Infrastructure KB from sqlite-vec into nok-kb-uc1 and nok-kb-uc2 namespaces
Wire LangSmith tracing alongside existing Langfuse: dual trace — call-level (Langfuse) + graph-level (LangSmith)
Deliverables
LangGraph StateGraph for UC1 + UC2: code-enforced agent sequencing, zero LLM routing dependency
Confidence-gated result aggregator in Orchestrator; max 2 retries before HiTL escalation
HiTL interrupt node operational — tested with simulated approval callback webhook
Qdrant on TKG dev cluster with Production Infrastructure KB in both UC1 and UC2 namespaces — all agents validated
Valkey shared state cache deployed; AppState.sessions no longer in-memory per pod
Phase 2 MVP architecture integration-tested against offline TEL data (CMG-C/U logs, BHOM anomalies)
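The typed state objects called for in Phase 1 are what make code-enforced routing possible: the edge checks a field on a real object instead of trusting the LLM. A sketch using stdlib dataclasses; the roadmap specifies Pydantic, and the field names are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class BhomAnomaly:
    anomaly_id: str
    kpi: str
    severity: str

@dataclass
class EnrichedAnomaly:
    anomaly: BhomAnomaly
    kb_refs: list = field(default_factory=list)
    confidence: float = 0.0

# A typed object crossing the A→B edge lets routing code inspect the
# result type and confidence rather than trusting a handoff tool call.
enriched = EnrichedAnomaly(
    anomaly=BhomAnomaly("a-42", "cmg_session_setup_sr", "P1"),
    kb_refs=["nok-kb-uc1/doc-17"],
    confidence=0.81,
)
assert enriched.confidence >= 0.72
```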
P2
Phase 2 · Scale · Phase 3 Production Prerequisite
Agent Isolation, KEDA, Kafka — Break the Single-Process Constraint
Deploy each Multi-Agent Framework as an independently scalable Kubernetes Deployment. Add event-driven autoscaling and secure service mesh. Closes Gap 2 completely.
8–10 wks
Key Activities
Containerise each agent type as a separate Kubernetes Deployment: anomaly-enricher, rca-reasoner, log-analyzer, capacity-planner
Remove all in-process Multi-Agent FrameworkHandoff routing — all dispatch goes through LangGraph Supervisor for production flows
Add Kafka (Strimzi Operator) for UC2 high-volume log streaming: log.raw topic → Agent C consumer group
KEDA ScaledObjects: Agent C scales on log.raw consumer lag; Agent A scales on BHOM anomaly event rate
Configure Istio mTLS between all agent pods and LangGraph Orchestrator service
Vault sidecar injection: move all LLM Gateway API keys out of env vars into Vault-managed dynamic secrets
OPA Policy Gate sidecar on Orchestrator pod — enforce agent dispatch authorisation via Rego policies
Wire CloudNativePG as LangGraph checkpoint store with pgBouncer connection pooling
Implement Helix GPT REST push: Orchestrator calls BHOM API with RcaResult after each UC1 graph completion
Deliverables
Four independent agent Deployments with separate resource quotas and HPA/KEDA policies
KEDA autoscaling operational on both UC1 and UC2 agent pools — validated under load
Kafka UC2 log streaming pipeline from TEL data sources via Strimzi
Vault-injected secrets; Istio mTLS enforced across all services; OPA gate active
Helix GPT integration live: RCA results pushed to BHOM after each UC1 graph run
NOK Agent Interface serving conversational log queries and scheduled digest to TEL operators
Phase 3 production-ready architecture validated in TEL pre-prod TKG namespace
P3
Phase 3 · Adapt · Future Autonomous NOC
Adaptive Cognition, Episodic Memory & MCP Graph Endpoints
Introduce adaptive agent behaviour and episodic memory. Multi-Agent Framework's run_agent delegation tool becomes architecturally significant at this phase for dynamic sub-agent spawning.
12+ wks
Key Activities
Enable Multi-Agent Framework's run_agent tool as dynamic sub-agent spawning mechanism within LangGraph nodes (ACL pattern)
Build Episodic Memory: Qdrant semantic search over resolved incidents + PostgreSQL structured log of past RCA decisions
Adaptive complexity routing: LangGraph meta-node assesses anomaly complexity; routes simple cases to lightweight 1-agent path, complex ones to full chain
Constitutional AI filter at LLM Gateway boundary to block prompt injection and hallucinated remediation actions
Cost circuit breaker: if single graph run exceeds token budget threshold → terminate with partial result + human escalation
Enable YAML hot-reload: update agent personas without pod restart (partially supported today; this activity completes the coverage)
Evaluate MCP exposure of LangGraph graph runs — external Production Infrastructure tools trigger specific graph nodes via MCP protocol
Deliverables
Adaptive complexity routing live: simple anomalies on lightweight path, complex on full chain — measured latency improvement
Episodic memory: agent cites historical resolution precedents in RCA output with evidence references
Constitutional AI filter blocking prompt injection and hallucinated remediation actions
Cost circuit breakers with automatic human escalation on budget overflow
MCP-exposed graph endpoints enabling Production Infrastructure ecosystem tool integration via MCP protocol
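The cost circuit breaker among the Phase 3 activities can be sketched as a token-budget guard; budget numbers and step names are illustrative:

```python
class BudgetExceeded(Exception):
    pass

class CostCircuitBreaker:
    """Hypothetical per-graph-run token budget guard."""
    def __init__(self, budget_tokens: int):
        self.budget = budget_tokens
        self.spent = 0

    def record(self, tokens: int):
        self.spent += tokens
        if self.spent > self.budget:
            raise BudgetExceeded(f"{self.spent} tokens > {self.budget} budget")

breaker = CostCircuitBreaker(budget_tokens=100)
partial_result, escalated = [], False
try:
    for step, cost in [("enrich", 40), ("rca", 45), ("verify", 30)]:
        breaker.record(cost)        # raises before the step is recorded
        partial_result.append(step)
except BudgetExceeded:
    escalated = True  # terminate with partial result + human escalation
```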
08 · Hybrid Target Architecture

LangGraph Routes. Multi-Agent Framework Executes.

The recommendation is neither "Multi-Agent Framework as-is" nor "rebuild in LangGraph from scratch." The hybrid principle: LangGraph controls what runs and when. Multi-Agent Framework controls how each agent runs and what it has access to. Production Infrastructure engineers never touch graph code.

The Core Principle

LangGraph Supervisor treats each Multi-Agent Framework agent as a callable Python node in the StateGraph. Multi-Agent Framework YAML configs continue to define agent personas, tools, and MCP access — zero change for Production Infrastructure engineers. Routing logic, confidence gates, retry policies, and HiTL interrupts live in Python graph code owned by architects. The two layers are independently evolvable.

🏛️
LangGraph Orchestrator Layer  — introduced Phase 1
Deterministic routing layer. StateGraph with typed conditional edges. Confidence gating (θ = 0.72). HiTL interrupt nodes. Parallel Send() for UC1+UC2 fan-out. Full graph-level audit lineage via LangSmith.
StateGraphTyped edgesConfidence gate HiTL interrupt()Parallel Send()LangSmith traces
NEW — Phase 1
⚙️
Multi-Agent Framework Execution Layer  — preserved entirely
YAML-driven agent personas, tool grants, model overrides, MCP access. FastAPI + Uvicorn serving. LLM Gateway calls, session history compression, A2A protocol. Production Infrastructure CMG domain configs intact.
YAML agent configcmg_multi_agent.yamlFastAPI + Uvicorn MCP /mcp endpointA2A /a2a/v1run_agent delegation
EXISTING
🗄️
State & Knowledge Layer  — upgraded Phase 1–2
Valkey for shared working state across pods. CloudNativePG for LangGraph graph checkpoints (survives pod restart). Qdrant for production-scale RAG with nok-kb-uc1 / uc2 / shared namespaces. TELAI LLM Gateway for all inference.
Valkey (shared state)CloudNativePG (checkpoints) Qdrant (RAG)TELAI LLM GatewayLangfuse + LangSmith
UPGRADED
🔐
Security & Governance Layer  — added Phase 2
OPA Policy Gate (sidecar on Orchestrator pod). Vault secret injection for all LLM Gateway API keys. Istio mTLS between all agent services. KEDA event-driven autoscaling on agent pod pools. Strimzi Kafka for UC2 log streaming at production volume.
OPA (Rego policies)Vault sidecarIstio mTLS KEDA autoscalerStrimzi Kafka
NEW — Phase 2

Ownership Boundary — YAML vs Graph Code

Concern · Owned By · How Changed · Who Changes It
Agent persona / system prompt · Multi-Agent Framework YAML · Edit YAML, config reload — no pod redeploy needed · Production Infrastructure engineers, domain experts
Tool grants per agent · Multi-Agent Framework YAML · Add / remove tools in agent YAML config · Production Infrastructure engineers
MCP server selection per agent · Multi-Agent Framework YAML · Add MCP server to agent's YAML config section · Production Infrastructure engineers
LLM model override per agent · Multi-Agent Framework YAML · Change llm.model field in agent YAML · Production Infrastructure engineers
Which agent runs after which · LangGraph Python · Conditional edge function in graph code · Architects (KS / Vikas)
Confidence threshold value · LangGraph Python · Python constant in graph node — single line · Architects
HiTL interrupt points · LangGraph Python · interrupt() call placement in node function · Architects
Retry logic / max retries · LangGraph Python · Edge condition counter in graph state · Architects
Output routing (Helix vs NOK UI) · LangGraph Python · Conditional edge on output type / destination field · Architects
What Multi-Agent Framework Keeps That LangGraph Cannot Match

MCP server exposure (/mcp): Multi-Agent Framework natively exposes itself as an MCP server. Any MCP-compatible client — Claude Desktop, VSCode Copilot, other Production Infrastructure tools — can consume Multi-Agent Framework capabilities out of the box. LangGraph has no equivalent.

A2A protocol (/a2a/v1): Built-in Agent-to-Agent protocol enables Production Infrastructure agents deployed in different clusters or services to call each other across network boundaries. Not in LangGraph.

YAML zero-code agent definition: Production Infrastructure engineers add a new agent persona by editing a YAML file with no Python changes and no redeployment. In a multi-OpCo rollout where each OpCo needs slightly different agent configs, this is commercially significant — it keeps agent customisation in Production Infrastructure's hands, not the architect's.

09 · Architecture Decision

Scoring — Three Options Compared

13-dimension weighted comparison. Delivery speed dimensions are weighted Critical given the March-focused TEL–NOK engagement. Greenfield HMA LangGraph scores higher on orchestration and HA design — but collapses on every dimension that determines whether Phase 2 delivers on time.

Dimension · Multi-Agent Framework Now · Evolved (Phase 1 complete) · Greenfield HMA (12 wks build) · Weight
Deterministic orchestration · 3/10 · 9/10 · 10/10 · HIGH
Phase 1 demo readiness · 9/10 · 10/10 · 1/10 · CRITICAL
Phase 2 MVP delivery speed · 8/10 · 9/10 · 3/10 · CRITICAL
Parallel agent execution · 2/10 · 8/10 · 10/10 · MEDIUM
Production HA (Phase 3) · 3/10 · 9/10 · 9/10 · HIGH
Production Infrastructure KB / RAG scale · 4/10 · 9/10 · 9/10 · HIGH
LLM cost governance · 7/10 · 9/10 · 10/10 · MEDIUM
MCP / A2A ecosystem · 10/10 · 10/10 · 2/10 · MEDIUM
YAML zero-code config · 10/10 · 10/10 · 1/10 · MEDIUM
Security / Vault / OPA · 5/10 · 9/10 · 9/10 · HIGH
Observability · 8/10 · 9/10 · 9/10 · MEDIUM
Reuse of Production Infrastructure work · 10/10 · 10/10 · 1/10 · CRITICAL
Production Infrastructure CMG domain configs · 9/10 · 10/10 · 1/10 · CRITICAL
Weighted Total · 6.1 / 10 · 9.2 / 10 ✓ RECOMMENDED · 6.7 / 10
Bottom Line for Decision-Makers

Do not replace Multi-Agent Framework. Evolve it.

Greenfield HMA LangGraph scores higher on orchestration and parallel execution — but only because it hasn't been built yet. When delivery speed is weighted appropriately for the March-focused engagement, the evolved Multi-Agent Framework path (9.2) outscores greenfield (6.7) by 2.5 points. The three production gaps — non-deterministic routing, single-process state, and no parallel execution — are all closable in phases without discarding the Production Infrastructure CMG domain work, LLM Gateway integration, MCP exposure, or YAML-driven agent definition that Multi-Agent Framework already provides.

Phase 0 (2–3 wks) · PG session store + Langfuse + LLM GW validation → Demo ready
Phase 1 (6–8 wks) · LangGraph overlay + Qdrant + Valkey → Phase 2 MVP ready
Phase 2 (8–10 wks) · Agent isolation + KEDA + Kafka + Vault/Istio → Phase 3 production ready
Phase 3 (12+ wks) · Adaptive cognition + episodic memory + MCP graph endpoints

Seen enough architecture?
See it running in production.

Atsky deploys multi-agent systems inside live enterprise environments. Book a 30-minute technical call.
