Multi-Agent Framework
Evolution Architecture
A complete engineering dissection of Production Infrastructure's Multi-Agent Framework stack — its architecture, runtime model, multi-agent capability, the three production gaps that actually matter, and the four-phase path to a production-grade orchestration platform for TEL–NOK UC1 & UC2.
What Multi-Agent Framework Actually Is
Multi-Agent Framework is Production Infrastructure's internal "manager layer" wrapping the OpenAI Agents SDK. It exposes an OpenAI-compatible HTTP API and allows composing prompts, tools, MCP servers, and multi-agent handoff graphs entirely via YAML config — no code changes needed. A single FastAPI + Uvicorn process handles everything.
cmg_multi_agent.yaml is a working Production Infrastructure CMG anomaly detection multi-agent config (supervisor → AD agent → RCA agent) that maps directly to UC1 requirements. Additional proven configs: config_multi_agents.yaml (web research crew), config_tshark_multi_agent.yaml, config_coding_multi_agent.yaml. This is months of Production Infrastructure-specific domain work that would require full reconstruction in any greenfield alternative — the primary reason to evolve rather than replace.
Anatomy of the Stack
Sixteen distinct components across server, API, agent graph, session, storage, and observability layers — each with a clearly defined single responsibility.
| Component | Location | Role |
|---|---|---|
| CLI Entrypoint | server/main.py | Multi-Agent Framework serve CLI — builds config overlay, starts Uvicorn with provided args |
| App Factory + Lifespan | server/core/app/lifespan.py | Startup: load config, init LLM client, build all agents from YAML, init MCP connections. Shutdown: teardown sessions and storage backends |
| AppState | server/core/config.py | Process-scoped singleton: active agent, agents_by_id dict, sessions dict (in-memory LRU), parsed YAML config, engine registry. Not shared across processes. |
| Chat Completions | server/api/chat/main.py | Request entry: parse_chat_request(), passthrough check, slash command routing, stream / non-stream delegation |
| Streaming Handler | server/api/chat/streaming.py | SSE stream construction, session resolution, history compression trigger, SDK event → OpenAI SSE translation |
| Agent Graph Builder | server/agents/graph/main.py | Constructs all Agent objects from YAML: personas, tool grants, model overrides, MCP server selection per agent |
| Handoff Engine | server/agents/handoff.py | Multi-Agent Framework Handoff class: builds [AGENT SWITCH] + [HANDOFF TASK] transfer messages, captures via ContextVar, applies input filters to control target-agent context |
| Handoff Wiring | server/agents/graph/handoff_wiring.py | attach_handoffs(): builds directed edges between agents based on YAML handoffs: config |
| A2A Executor | server/a2a/executor.py | Adapter: translates Multi-Agent Framework streaming output into A2A Task events for cross-service agent calls via /a2a/v1 |
| Runtime Context | common/core/runtime.py | ContextVars for session ID, sink, call IDs. Module-level globals: _client, _settings, _tool_call_agent_map — not safe across processes |
| Session Factory | common/session/factory.py | Creates SQLAlchemy sessions (SQLite / PG / MySQL) per session_key. Manages connection pool lifecycle |
| Session Recovery | common/session/recovery/ | Checkpoint detection, anomaly detection (orphaned tool calls, truncated responses), rollback via pop_item(). Handles DB-level anomalies — not logic failures. |
| Delegation Helper | common/core/delegation.py | Tool inheritance for delegate / sub-agents. Enables run_agent tool for inline agent cloning and dynamic task delegation |
| YAML Config Loader | common/config/ | Multi-file import, environment variable substitution, ${ref} resolution across config files |
| Tools | tools/ | Filesystem, DB, Kubernetes, Network, Math, Web search, AI sub-agents — accessible via per-agent YAML tool grants |
| Langfuse Tracing | common/tracing/ | Logfire-based Langfuse integration: captures every LLM call, prompt, response, token count, latency, cost per agent type |
How a Request Flows
Two execution paths: a single-turn request and a multi-agent handoff sequence. Both run within the same single asyncio event loop inside one Python process.
// Single-Turn Request Flow
a. Resolve or create Session (in-memory dict or SQLAlchemy, keyed by session_key)
b. maybe_compress_session() — if history token count exceeds compress_threshold, a summarizer agent call runs in-band
c. Runner.run_streamed(agent, input, session, run_config) — Agent LLM call → parallel tool dispatch via asyncio.gather → results fed back → next LLM call → stream events emitted
d. consume_stream() — translates SDK events (text_delta, tool_call, tool_result, reasoning, handoff) to OpenAI SSE format
e. Session written back to DB; LRU eviction policy enforced on session_access_order
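Steps (a), (b), and (e) above can be sketched in a few lines. This is a minimal stand-in, not the framework's actual code: `compress_threshold`, `maybe_compress_session`, and `session_access_order` are named in this document, but the token counter, the summariser stub, and the LRU capacity below are hypothetical placeholders.

```python
from collections import OrderedDict

MAX_SESSIONS = 128          # hypothetical LRU capacity
COMPRESS_THRESHOLD = 8_000  # token count that triggers in-band summarisation

sessions: "OrderedDict[str, list[dict]]" = OrderedDict()  # stands in for AppState.sessions

def approx_tokens(history: list[dict]) -> int:
    # crude stand-in for a real tokenizer: ~4 characters per token
    return sum(len(m["content"]) for m in history) // 4

def maybe_compress_session(history: list[dict]) -> list[dict]:
    """If history exceeds the threshold, fold older turns into a summary turn."""
    if approx_tokens(history) <= COMPRESS_THRESHOLD:
        return history
    summary = {"role": "system", "content": "[summary of earlier turns]"}  # summariser-agent stand-in
    return [summary] + history[-2:]  # keep the most recent exchange verbatim

def resolve_session(session_key: str) -> list[dict]:
    """Step (a): fetch-or-create, refresh LRU order, evict the oldest."""
    history = sessions.setdefault(session_key, [])
    sessions.move_to_end(session_key)    # mirrors session_access_order
    while len(sessions) > MAX_SESSIONS:
        sessions.popitem(last=False)     # evict least-recently-used
    return history
```

The point of the sketch: both the compression trigger and the eviction policy live in process-local memory, which is exactly why they do not survive a pod restart.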
// Multi-Agent Handoff Flow
On handoff, a YAML-configured input filter controls how much of the source agent's context the target agent receives:
passthrough — full conversation history passed to target agent unchanged
strip_tools — tool call/result blocks removed before passing (reduces token usage)
last_turn — only the most recent turn passed (minimal context transfer)
nest_handoff_history — previous agent's full context nested as a structured block in the new session
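The four filter modes above reduce to simple transformations over the message history. A minimal sketch, assuming a simplified message shape (the real filter signatures in server/agents/handoff.py may differ):

```python
# Each filter takes the source agent's history and returns what the target sees.
def passthrough(history: list[dict]) -> list[dict]:
    return history  # full conversation, unchanged

def strip_tools(history: list[dict]) -> list[dict]:
    # drop tool call / tool result blocks to cut token usage
    return [m for m in history if m["role"] not in ("tool_call", "tool_result")]

def last_turn(history: list[dict]) -> list[dict]:
    # minimal transfer: only the most recent user/assistant exchange
    return history[-2:]

def nest_handoff_history(history: list[dict]) -> list[dict]:
    # previous agent's full context nested as one structured block
    return [{"role": "system", "content": "[handoff context]", "nested": history}]
```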
What Works — and What Doesn't
Multi-Agent Framework has real, production-tested multi-agent capability across sequential handoffs, supervisor patterns, per-agent isolation, and protocol exposure. Three patterns critical for TEL–NOK production are structurally absent.
What works today: sequential handoffs via the handoffs: config, proven in cmg_multi_agent.yaml (Production Infrastructure CMG AD → RCA); a per-agent llm: section with model, temperature, and max_tokens overrides; and the run_agent tool for inline agent cloning and dynamic task delegation within a session.
Present vs. Absent
A clear-eyed inventory: what the stack includes today, and what is missing for production-scale multi-instance deployment on TEL's TKG environment.
The absent items above interact: single-process AppState + no shared session store means you cannot run two Multi-Agent Framework pods and load-balance between them even if you add nginx. The session state would be split across processes. This must be resolved (Valkey shared state) before any other scaling infrastructure is added — it is the foundational blocker.
The Three Gaps That Actually Matter
Multi-Agent Framework has twelve documented production absences. For TEL–NOK Phase 2 MVP and Phase 3 production, only three create material delivery risk. The rest are Phase 3 infrastructure concerns. Resolving all twelve upfront adds 16+ weeks with no Phase 2 benefit.
Missing Redis, no API gateway, no WebSockets — these only become blockers at Phase 3 production scale with live data pipelines. For Phase 2 MVP on offline data, the three gaps below are the only ones that create immediate risk to delivery quality or operator safety in a Production Infrastructure NOC context.
Multi-agent flow in Multi-Agent Framework is entirely LLM-driven: the SDK Runner relies on the model calling the correct transfer_to_<target> tool at the right moment. There is no Python-level workflow state machine enforcing execution order. No code guarantee that "anomaly detected → always call RCA agent." It is a suggestion in the system prompt — not an architectural contract.
For a Production Infrastructure NOC environment, a missed RCA step or a capacity breach not flagged because the LLM chose not to hand off is a production incident. Session recovery (common/session/recovery/) handles DB-level anomalies like orphaned tool calls — it cannot recover a logic failure where the LLM simply didn't invoke the handoff.
Additionally: history compression via an in-band summarizer agent (triggered when history exceeds compress_threshold) is itself an unguarded LLM call — if the summarizer degrades context, the entire session reasoning quietly deteriorates with no alert surfaced.
The LLM must invoke transfer_to_rca_agent. If it miscategorises the anomaly, or context-window pressure causes it to skip the handoff tool, Agent B (RCA Reasoner) never runs. No retry. No audit record of the missed step. The operator receives an incomplete enrichment with no error surfaced at all.
LangGraph conditional edge: after Agent A returns an EnrichedAnomaly typed object, Python routing logic checks the result type and unconditionally routes to Agent B. Zero LLM compliance required. Confidence gate: if result.confidence < 0.72 → re-route with augmented RAG context, max 2 retries, then human escalation via interrupt().
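A dependency-free sketch of the routing contract such a conditional edge enforces. The EnrichedAnomaly type, the 0.72 gate, and the 2-retry cap come from the text above; the function and node names are illustrative, not the actual LangGraph API:

```python
from dataclasses import dataclass

CONFIDENCE_GATE = 0.72
MAX_RETRIES = 2

@dataclass
class EnrichedAnomaly:
    confidence: float

def route_after_agent_a(result: object, retries: int) -> str:
    """Deterministic Python routing: no LLM compliance required."""
    if not isinstance(result, EnrichedAnomaly):
        return "error_handler"            # malformed output never passes silently
    if result.confidence < CONFIDENCE_GATE:
        if retries < MAX_RETRIES:
            return "agent_a_with_rag"     # re-route with augmented RAG context
        return "human_escalation"         # interrupt() in the real graph
    return "rca_agent"                    # anomaly detected -> Agent B always runs
```

Because the decision is an `isinstance` check plus a float comparison, "anomaly detected → always call RCA agent" becomes an architectural contract rather than a system-prompt suggestion.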
AppState is a process-level singleton. The sessions dict, session_engines, and session_access_order are all in-memory Python dicts. Module-level globals in runtime.py — _client, _settings, _tool_call_agent_map — are not safe across OS processes. This is not a configuration limitation; it is a fundamental architectural constraint baked into how AppState is initialised.
You cannot run multiple Multi-Agent Framework instances behind a load balancer without sessions being lost or requests landing on the wrong instance. At Phase 3 production volumes — live BHOM anomaly streams plus live CMG-C/U log ingestion — a single pod will saturate. Pod restart = in-flight task loss. No checkpoint survives a process restart beyond the SQLAlchemy session write, which only captures completed turns.
Single FastAPI process. All session state in AppState.sessions in-memory dict. Kubernetes pod eviction during a 40s UC2 log analysis call = task permanently lost. SQLite default: single-writer, cannot be shared across pods even if memory were resolved. No KEDA autoscaling possible.
Valkey (Redis-compatible, Apache 2.0) as shared working state across all agent pods — replaces in-memory AppState.sessions. CloudNativePG as LangGraph checkpoint store: graph state survives pod restarts and Kubernetes evictions. KEDA scales Agent C pods on log.raw Kafka topic consumer lag independently of Agent A pods.
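The shape such a shared store might take, sketched with an in-memory fake standing in for a Valkey connection (the real implementation would use a Redis-compatible client issuing GET/SETEX on serialised session blobs; everything below is illustrative):

```python
import json
from typing import Protocol

class SessionStore(Protocol):
    def load(self, session_key: str) -> list[dict]: ...
    def save(self, session_key: str, history: list[dict]) -> None: ...

class FakeValkeyStore:
    """Stands in for a Valkey-backed store: every pod talking to the same
    server sees the same state, unlike the in-memory AppState.sessions dict."""
    def __init__(self) -> None:
        self._kv: dict[str, str] = {}   # real store: GET / SETEX with a TTL

    def load(self, session_key: str) -> list[dict]:
        raw = self._kv.get(f"session:{session_key}")
        return json.loads(raw) if raw else []

    def save(self, session_key: str, history: list[dict]) -> None:
        self._kv[f"session:{session_key}"] = json.dumps(history)
```

The design point is the interface boundary: once sessions go through a `SessionStore` rather than a process dict, a load balancer can route any request to any pod.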
The OpenAI Agents SDK Runner executes agents sequentially in the handoff chain: Agent A suspends; Agent B runs; Agent B suspends; Agent A resumes. There is no fork-join mechanism and no simultaneous parallel agent branches. A supervisor wanting two specialists to work concurrently must wait for them sequentially.
For combined UC1 + UC2 events: total processing time = UC1 chain time + UC2 chain time, not max(UC1, UC2). Agent C (Log Analyzer) processes 96K-token log batches and takes 20–40 seconds per LLM call. A P1 BHOM anomaly arriving during a UC2 log analysis run is queued behind that long-running call — it cannot be dispatched to a parallel Agent A branch.
Combined UC1+UC2: Agent A→B (~10s) finishes, then Agent C→D (~40s) runs. Total: ~50s per event. P1 anomalies cannot preempt in-progress UC2 log analysis. Single asyncio event loop serialises all agent transitions within the process.
LangGraph parallel Send(): UC1 and UC2 chains dispatched simultaneously to independent agent pod pools. Total: max(10s, 40s) = ~40s. Agent A and Agent C pod counts scale independently via KEDA on their respective event sources. P1 anomalies always get a free Agent A pod.
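The max-versus-sum property that Send() buys can be illustrated with plain asyncio; the sleeps below are scaled-down stand-ins for the ~10s and ~40s chain times, not real agent calls:

```python
import asyncio
import time

async def uc1_chain() -> str:
    await asyncio.sleep(0.10)   # stands in for the ~10s Agent A->B chain
    return "uc1 done"

async def uc2_chain() -> str:
    await asyncio.sleep(0.40)   # stands in for the ~40s Agent C->D chain
    return "uc2 done"

async def combined_event() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Parallel dispatch: wall time ~= max(uc1, uc2), not their sum --
    # the same property Send() provides across independent pod pools.
    results = await asyncio.gather(uc1_chain(), uc2_chain())
    return list(results), time.perf_counter() - start
```

Sequential execution of the same two chains would take the sum (~0.50 here); concurrent dispatch finishes in roughly the longer chain's time.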
Multi-Agent Framework's built-in RAG uses sqlite-vec (TinySearch) — a local SQLite-based vector store on the pod's filesystem. It works for Phase 1 demo but is architecturally incompatible with Phase 2 multi-pod deployment: the file is pod-local, a second Agent A pod cannot query the same index, there is no namespace isolation between UC1 and UC2 knowledge bases, no cross-encoder reranker support, and a practical capacity ceiling around 100K chunks.
Pod-local file. Single-instance only. UC1 and UC2 Production Infrastructure KB share one search space with no isolation. No reranker. Phase 1 demo only. Immediately broken in any multi-pod deployment — second pod has an empty or stale index.
Standalone Qdrant service on TKG. Named collections: nok-kb-uc1, nok-kb-uc2, nok-kb-shared. Cross-encoder reranker pipeline. Metadata filtering by Production Infrastructure product version, doc type. Handles tens of millions of vectors. All agent pods query via gRPC client — fully multi-instance safe.
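The isolation model (named collections plus metadata filters) is the substantive change, and it can be shown without a vector engine. A toy stand-in, assuming the collection names above; a real deployment would use the qdrant-client search API with a query filter rather than this list scan:

```python
from dataclasses import dataclass, field

@dataclass
class Point:
    text: str
    payload: dict          # e.g. {"doc_type": "runbook", "product_version": "..."}

@dataclass
class CollectionStore:
    collections: dict[str, list[Point]] = field(default_factory=dict)

    def upsert(self, collection: str, point: Point) -> None:
        self.collections.setdefault(collection, []).append(point)

    def search(self, collection: str, payload_filter: dict) -> list[Point]:
        # Named collections give UC1/UC2 namespace isolation; the payload
        # filter mirrors Qdrant-style metadata filtering within a collection.
        return [
            p for p in self.collections.get(collection, [])
            if all(p.payload.get(k) == v for k, v in payload_filter.items())
        ]
```

Contrast with sqlite-vec: there is no collection argument at all, so UC1 and UC2 documents land in one shared search space on one pod's disk.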
Four-Phase Multi-Agent Framework Evolution
Each phase is independently deliverable and provides incremental production value. Strictly additive — no phase discards what was built before. The architecture grows with the engagement timeline.
■ Phase 0 (2–3 wks) Harden existing stack → Phase 1 Demo ready in Production Infrastructure Labs
■ Phase 1 (6–8 wks) LangGraph overlay + Qdrant + Valkey → Phase 2 MVP ready on TELAI
■ Phase 2 (8–10 wks) Agent isolation + KEDA + Kafka + Vault/Istio → Phase 3 production ready
■ Phase 3 (12+ wks) Adaptive cognition + episodic memory + MCP graph endpoints
LangGraph Routes. Multi-Agent Framework Executes.
The recommendation is neither "Multi-Agent Framework as-is" nor "rebuild in LangGraph from scratch." The hybrid principle: LangGraph controls what runs and when. Multi-Agent Framework controls how each agent runs and what it has access to. Production Infrastructure engineers never touch graph code.
LangGraph Supervisor treats each Multi-Agent Framework agent as a callable Python node in the StateGraph. Multi-Agent Framework YAML configs continue to define agent personas, tools, and MCP access — zero change for Production Infrastructure engineers. Routing logic, confidence gates, retry policies, and HiTL interrupts live in Python graph code owned by architects. The two layers are independently evolvable.
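A sketch of the node-wrapping boundary. The `call_agent` callable stands in for a POST to the framework's OpenAI-compatible chat endpoint and is injected so the node stays testable; the state keys are hypothetical, not a fixed schema:

```python
from typing import Callable

def make_agent_node(agent_id: str, call_agent: Callable[[str, str], str]):
    """Wrap one YAML-defined agent as a graph node: the graph owns routing
    state, while persona, tools, and MCP access stay in the agent's YAML."""
    def node(state: dict) -> dict:
        reply = call_agent(agent_id, state["input"])
        return {**state, "last_agent": agent_id, "output": reply}
    return node
```

Because the node only knows the agent's ID and transport, Production Infrastructure engineers can change anything inside the YAML persona without the graph code noticing, which is the independent-evolvability claim above.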
Ownership Boundary — YAML vs Graph Code
| Concern | Owned By | How Changed | Who Changes It |
|---|---|---|---|
| Agent persona / system prompt | Multi-Agent Framework YAML | Edit YAML, config reload — no pod redeploy needed | Production Infrastructure engineers, domain experts |
| Tool grants per agent | Multi-Agent Framework YAML | Add / remove tools in agent YAML config | Production Infrastructure engineers |
| MCP server selection per agent | Multi-Agent Framework YAML | Add MCP server to agent's YAML config section | Production Infrastructure engineers |
| LLM model override per agent | Multi-Agent Framework YAML | Change llm.model field in agent YAML | Production Infrastructure engineers |
| Which agent runs after which | LangGraph Python | Conditional edge function in graph code | Architects (KS / Vikas) |
| Confidence threshold value | LangGraph Python | Python constant in graph node — single line | Architects |
| HiTL interrupt points | LangGraph Python | interrupt() call placement in node function | Architects |
| Retry logic / max retries | LangGraph Python | Edge condition counter in graph state | Architects |
| Output routing (Helix vs NOK UI) | LangGraph Python | Conditional edge on output type / destination field | Architects |
MCP server exposure (/mcp): Multi-Agent Framework natively exposes itself as an MCP server. Any MCP-compatible client — Claude Desktop, VSCode Copilot, other Production Infrastructure tools — can consume Multi-Agent Framework capabilities out of the box. LangGraph has no equivalent.
A2A protocol (/a2a/v1): Built-in Agent-to-Agent protocol enables Production Infrastructure agents deployed in different clusters or services to call each other across network boundaries. Not in LangGraph.
YAML zero-code agent definition: Production Infrastructure engineers add a new agent persona by editing a YAML file with no Python changes and no redeployment. In a multi-OpCo rollout where each OpCo needs slightly different agent configs, this is commercially significant — it keeps agent customisation in Production Infrastructure's hands, not the architect's.
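A hypothetical config fragment illustrating the zero-code pattern. The key names (llm:, handoffs:, per-agent tool grants) echo those mentioned elsewhere in this document, but the exact schema shown here is illustrative, not the framework's authoritative format:

```yaml
agents:
  - id: ad_agent
    persona: "You are a CMG anomaly-detection specialist..."
    llm:
      model: gpt-4o          # per-agent model override
      temperature: 0.1
      max_tokens: 4096
    tools: [db, kubernetes]   # per-agent tool grants
    handoffs: [rca_agent]     # directed edge wired by attach_handoffs()
  - id: rca_agent
    persona: "You perform root-cause analysis on confirmed anomalies..."
    llm:
      model: gpt-4o
```

An OpCo-specific variant is a copy of this file with a different persona or tool list, applied via config reload rather than a redeploy.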
Scoring — Three Options Compared
13-dimension weighted comparison. Delivery speed dimensions are weighted Critical given the March-focused TEL–NOK engagement. Greenfield HMA LangGraph scores higher on orchestration and HA design — but collapses on every dimension that determines whether Phase 2 delivers on time.
| Dimension | Multi-Agent Framework Now | Evolved Multi-Agent Framework (Phase 1 complete) | Greenfield HMA (12 wks build) | Weight |
|---|---|---|---|---|
| Deterministic orchestration | 3/10 | 9/10 | 10/10 | HIGH |
| Phase 1 demo readiness | 9/10 | 10/10 | 1/10 | CRITICAL |
| Phase 2 MVP delivery speed | 8/10 | 9/10 | 3/10 | CRITICAL |
| Parallel agent execution | 2/10 | 8/10 | 10/10 | MEDIUM |
| Production HA (Phase 3) | 3/10 | 9/10 | 9/10 | HIGH |
| Production Infrastructure KB / RAG scale | 4/10 | 9/10 | 9/10 | HIGH |
| LLM cost governance | 7/10 | 9/10 | 10/10 | MEDIUM |
| MCP / A2A ecosystem | 10/10 | 10/10 | 2/10 | MEDIUM |
| YAML zero-code config | 10/10 | 10/10 | 1/10 | MEDIUM |
| Security / Vault / OPA | 5/10 | 9/10 | 9/10 | HIGH |
| Observability | 8/10 | 9/10 | 9/10 | MEDIUM |
| Reuse of Production Infrastructure work | 10/10 | 10/10 | 1/10 | CRITICAL |
| Production Infrastructure CMG domain configs | 9/10 | 10/10 | 1/10 | CRITICAL |
| Weighted Total | 6.1 / 10 | 9.2 / 10 ✓ RECOMMENDED | 6.7 / 10 | |
Do not replace Multi-Agent Framework. Evolve it.
Greenfield HMA LangGraph scores higher on orchestration and parallel execution — but only because it hasn't been built yet. When delivery speed is weighted appropriately for the March-focused engagement, the evolved Multi-Agent Framework path (9.2) outscores greenfield (6.7) by 2.5 points. The three production gaps — non-deterministic routing, single-process state, and no parallel execution — are all closable in phases without discarding the Production Infrastructure CMG domain work, LLM Gateway integration, MCP exposure, or YAML-driven agent definition that Multi-Agent Framework already provides.
Phase 0 (2–3 wks) · PG session store + Langfuse + LLM GW validation → Demo ready
Phase 1 (6–8 wks) · LangGraph overlay + Qdrant + Valkey → Phase 2 MVP ready
Phase 2 (8–10 wks) · Agent isolation + KEDA + Kafka + Vault/Istio → Phase 3 production ready
Phase 3 (12+ wks) · Adaptive cognition + episodic memory + MCP graph endpoints