Technical Deep Dive · Hierarchical Multi-Agent Orchestration

HMA Architecture Deep Dive

A production-grade technical reference for Hierarchical Multi-Agent Orchestration — the architectural pattern Atsky uses to deploy AI-driven network log analytics and AIOps at enterprise scale. Covers the LangGraph supervisor, every agent component, the RAG/skills layer, state management, and security controls.

⊕ RECOMMENDED ARCHITECTURE · HMA Blueprint 1 · Deep Dive

Hierarchical Multi-Agent Orchestration for Enterprise Network Log Analytics

A full elaboration of how, when, and where the LangGraph Orchestrator, Atsky LLM Gateway, Specialist Agents, RAG Skills, and supporting infrastructure interact — and who invokes what — across UC1 (Anomaly Enrichment) and UC2 (Intelligent Log Analyzer), from event ingestion to final output surfacing in BHOM and the NOK Agent Interface.

  • Architecture: HMA · Hierarchical Multi-Agent Orchestration
  • Use Cases: UC1 Anomaly Enrichment · UC2 Log Analyzer
  • Orchestrator: LangGraph · Supervisor Agent
  • LLM Gateway: Atsky LLM Lite GW · OpenAI + OSS via URL
  • SOW Phase: Phase 1 Demo → Phase 2 MVP → Phase 3 Prod
// 01 · RATIONALE

Why Hierarchical Multi-Agent Orchestration?

The Enterprise Operator engagement operates across two distinct use case families — UC1 Anomaly Enrichment (reactive, driven by BHOM situations) and UC2 Intelligent Log Analyzer (proactive, driven by structured log ingestion from CMG-C/U, Kalix KPI reports, BHOM counters, and Cflowd). Both require deterministic, auditable, compliance-grade behaviour from day one. The HMA pattern is recommended because:

  • 1 · Central control point — full lineage of every LLM call, routing decision, and tool result
  • 4+ · Specialist agents dispatched in parallel per graph execution — no sequential bottleneck
  • 100% · LLM calls proxied through Atsky LLM Gateway — no agent bypasses cost or rate governance
  • 3 · Output channels: Helix GPT (UC1), NOK Agent UI (UC2 on-demand), Periodic Digest (UC2 scheduled)

Orchestration Philosophy

The Orchestrator does not execute domain logic. It is a pure router and state machine — it decomposes incoming events into typed tasks, assigns them to agents with typed inputs, tracks their outputs in shared graph state, and decides what to do next based on those outputs. Domain intelligence lives exclusively inside specialist agents and their attached skills. This separation is what makes the system testable, replaceable, and auditable.
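This "pure routing over typed state" idea can be sketched in a few lines. A minimal illustration — not the production graph; the dataclass fields and agent names are simplified stand-ins:

```python
from dataclasses import dataclass, field

@dataclass
class GraphState:
    """Shared state the supervisor threads through the graph (illustrative)."""
    uc_path: str = ""                             # "UC1", "UC2", or "combined"
    results: dict = field(default_factory=dict)   # agent name -> typed output

def route(state: GraphState) -> list[str]:
    """Pure routing: decide the next node(s) from state alone, no domain logic."""
    if state.uc_path == "UC1":
        if "agent_a" not in state.results:
            return ["agent_a"]          # enrichment must run first
        if "agent_b" not in state.results:
            return ["agent_b"]          # RCA depends on Agent A's output
    elif state.uc_path == "UC2":
        if "agent_c" not in state.results:
            return ["agent_c"]
        if state.results["agent_c"].get("capacity_breach") and "agent_d" not in state.results:
            return ["agent_d"]          # conditional dispatch on breach flag
    return ["output"]                   # nothing left to dispatch
```

Because `route` touches only state, it can be unit-tested exhaustively without any agent, LLM, or network dependency — which is precisely the testability benefit the pattern claims.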

// 02 · ARCHITECTURE ANATOMY

Component Map — Every Layer Explained

The HMA architecture is composed of eight distinct layers. Each has a single responsibility, a defined interface to its neighbours, and a clear operational owner.

🏛️

LangGraph Orchestrator (Supervisor Agent)

The central nervous system. A LangGraph state machine that owns the full execution graph. It receives all incoming triggers (BHOM anomaly push, log scheduler, user prompt), decomposes them into typed task nodes, dispatches agents in parallel or sequential order depending on dependency, tracks all intermediate state in a shared checkpoint, and decides termination or re-routing. It is the only component that calls the Policy Gate — no agent has direct access to the auth layer.

LangGraph v0.2+ · Python · Supervisor pattern · Checkpointing · HiTL interrupt

Atsky LLM Gateway

The single egress point for all LLM inference. Hosted as part of the ENTERPRISE AI framework and reached by all agents via a single URL: https://llmgateway.enterprise-ai.internal/v1 (OpenAI-compatible API). Every agent sends a structured HTTP POST with its system prompt, user context, and tool schema. The gateway applies token budgets, rate limits, model routing (GPT-4.1 → LLaMA3-70B → Mistral-7B fallback), prompt caching, and cost tagging per agent type before forwarding to the actual model host.

OpenAI-compatible · OSS hosted · Model router · Cost governance · ENTERPRISE AI framework
🔍

UC1 Agent Pair — Anomaly Enrichment

Agent A — Anomaly Enricher: Receives BHOM annotated situations. Queries Qdrant (Network Infrastructure Platform KB + telecom standards). Constructs enriched prompt, calls LLM Gateway. Returns structured JSON: severity, context, probable root cause hints.

Agent B — RCA Reasoner: Takes enriched context from Agent A, applies chain-of-thought reasoning via LLM Gateway, produces ranked root-cause hypotheses with confidence scores. Result feeds directly into Helix GPT for Next Best Actions.

RAG · Qdrant · CoT reasoning · BHOM → Helix GPT · Confidence score
📊

UC2 Agent Pair — Log Analyzer

Agent C — Log Analyzer: Ingests CMG-C/U config, Kalix KPI reports, BHOM counters, Cflowd, cmd printouts. Applies log-pattern RAG skill (Network Infrastructure Platform log library vector store). Calls LLM Gateway to summarize, identify severity anomalies, flag capacity breach signals. Output: structured log analysis JSON.

Agent D — Capacity Planner: Triggered only if Agent C flags breach. Uses trend data + Network Infrastructure Platform planning knowledge base to call LLM Gateway for capacity enhancement proposals. Produces human-readable report for NOK Agent UI and periodic digest.

Log ingestion · Capacity detect · CMG-C/U · Kalix KPI · Cflowd · Scheduled
📚

RAG / Skills Layer — Qdrant Vector Store

The domain knowledge backbone. Two namespaced vector stores in Qdrant: nok-kb-uc1 (Network Infrastructure Platform technical docs, telecom standards, anomaly resolution playbooks) and nok-kb-uc2 (Network Infrastructure Platform log pattern library, CMG-C/U documentation, capacity planning guides, Kalix metric definitions). Embeddings generated at ingestion time using a licensed embedding model. Retrieved via cosine similarity search with a cross-encoder reranker before being injected into the agent's LLM Gateway prompt as context.

Qdrant · nok-kb-uc1 · nok-kb-uc2 · Reranker · RAG pipeline
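The retrieve-then-rerank step can be illustrated with an in-memory stand-in for Qdrant — toy two-dimensional vectors and a pluggable scorer in place of the real cross-encoder:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: list[float], store: list[dict], top_k: int = 5,
             rerank=None) -> list[str]:
    """Cosine top-k over a namespaced store, then an optional reranking pass."""
    scored = sorted(store, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    candidates = scored[:top_k]
    if rerank is not None:            # cross-encoder in production; any scorer here
        candidates.sort(key=rerank, reverse=True)
    return [d["text"] for d in candidates]
```

In the real pipeline the store is a Qdrant namespace (`nok-kb-uc1` / `nok-kb-uc2`), the vectors come from the licensed embedding model, and the reranker is a cross-encoder scoring query-chunk pairs.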
💾

State & Memory Layer

Valkey (Apache 2.0 Redis fork): In-flight agent working state, intermediate results between Agent A→B and C→D, agent liveness heartbeats.

PostgreSQL (CloudNativePG): LangGraph checkpoint persistence — the full graph execution state survives pod restarts. Immutable audit log (every LLM call, tool result, routing decision appended, never updated).

LangSmith: Full distributed trace per graph execution — every Orchestrator → Gateway → Agent hop with token counts, latency, and prompt/response captured.

Valkey · PostgreSQL · LangSmith traces · Immutable audit
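The immutability guarantee is a discipline, not a database feature: the application layer only ever appends. A sketch of that append-only contract, using SQLite as a self-contained stand-in for the PostgreSQL audit table (schema illustrative):

```python
import json
import sqlite3

def open_audit(conn: sqlite3.Connection) -> None:
    """Create the append-only audit table if absent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_log ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " run_id TEXT NOT NULL,"
        " event_type TEXT NOT NULL,"          # llm_call / tool_result / routing
        " payload TEXT NOT NULL,"
        " ts TEXT DEFAULT CURRENT_TIMESTAMP)"
    )

def audit_append(conn: sqlite3.Connection, run_id: str,
                 event_type: str, payload: dict) -> None:
    """Append one record; this layer never issues UPDATE or DELETE."""
    conn.execute(
        "INSERT INTO audit_log (run_id, event_type, payload) VALUES (?, ?, ?)",
        (run_id, event_type, json.dumps(payload)),
    )
    conn.commit()
```

In production the same records are also forwarded to the SIEM, so tampering with the table would be visible downstream.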
🔐

Security Layer — Policy Gate

The single authorization chokepoint. OPA (Open Policy Agent) evaluates every task dispatch from the Orchestrator: is this agent allowed to receive this task type? Is the requesting tenant/user JWT valid? Does the task match the RBAC policy for this graph run?

All LLM Gateway API keys are injected at pod startup via HashiCorp Vault (Vault Agent sidecar pattern) — never stored in environment variables or prompts. mTLS enforced between all services via Istio service mesh.

OPA · JWT/RBAC · Vault · mTLS · Istio
📤

Output / Sink Layer

Three distinct output channels:

① BHOM / Helix GPT: UC1 enriched anomaly + ranked RCA pushed back to BMC Helix via REST API. Helix GPT uses this as context for Next Best Action recommendations to NOC operators.

② NOK Agent Interface: UC2 conversational bot — operator queries logs in natural language; Agent C+D respond on-demand through the interface.

③ Scheduled Digest: Curated periodic prompt runs (daily/weekly) → Agent C generates log summary report → delivered as email / dashboard widget.

Helix GPT REST · NOK Agent UI · Periodic Digest · Email / Dashboard
// 03 · ORCHESTRATOR DEEP-DIVE

Who Orchestrates? The LangGraph Supervisor Explained

Core Responsibility

The LangGraph Supervisor Agent is the only component that understands the full task graph. It is a Python process running a LangGraph StateGraph with typed nodes for each agent and typed edges representing conditional routing logic. It never executes business logic itself — it decomposes, delegates, collects, and decides.

The Orchestrator operates through four internal phases on every graph execution:

D
Phase D · Decompose

Task Decomposition & Classification

On receiving an input event (BHOM anomaly push or log trigger or user prompt), the Orchestrator's entry node runs an initial classification LLM call through the Atsky LLM Gateway. This call uses a lightweight system prompt to determine:

  • UC path: Is this a UC1 (anomaly) event, UC2 (log) event, or combined?
  • Complexity level: Simple (single agent sufficient) vs. complex (multi-agent parallel required)
  • Priority: P1 (immediate NOC alert) vs. P2 (scheduled analysis)

This single gateway call uses the cheapest available model (Mistral-7B) since it is purely a classification task — no domain knowledge required. Output is a typed TaskPlan Pydantic object.
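The production object is a Pydantic model; a dataclass stand-in showing the same typed shape (field names beyond the three dimensions listed above are illustrative):

```python
from dataclasses import dataclass
from enum import Enum

class UcPath(str, Enum):
    UC1 = "UC1"              # anomaly event
    UC2 = "UC2"              # log event
    COMBINED = "combined"

class Priority(str, Enum):
    P1 = "P1"                # immediate NOC alert
    P2 = "P2"                # scheduled analysis

@dataclass(frozen=True)
class TaskPlan:
    """Typed output of the classification call (Mistral-7B via the gateway)."""
    uc_path: UcPath
    complex_run: bool        # False: single agent suffices; True: parallel multi-agent
    priority: Priority

def parse_task_plan(raw: dict) -> TaskPlan:
    """Validate the classifier's JSON into the typed plan, failing loudly on bad values."""
    return TaskPlan(
        uc_path=UcPath(raw["uc_path"]),
        complex_run=bool(raw["complex"]),
        priority=Priority(raw["priority"]),
    )
```

Strict parsing here matters: every downstream routing decision trusts this object, so a malformed classifier response must fail at the entry node, not three agents later.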

A
Phase A · Authorize

Policy Gate Evaluation

Before dispatching any agent, the Orchestrator calls the OPA Policy Gate with the TaskPlan and the requesting entity's JWT. OPA evaluates three policies:

  • Agent authorization: Can this caller invoke Agent A/B/C/D?
  • Data access scope: Is the input data type (BHOM anomaly / CMG-C logs / Cflowd) accessible to this tenant?
  • Rate limit check: Is this graph run within the defined quota for this time window?

If any policy fails, the Orchestrator terminates the graph with an auth-failure event written to the audit log. No agent is ever invoked before OPA clears the task.
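In the real system these three checks are Rego rules in one policy bundle; their decision logic, sketched in Python purely for illustration (claim and field names hypothetical):

```python
def policy_allow(task_plan: dict, jwt_claims: dict,
                 quota_used: int, quota_max: int) -> tuple[bool, str]:
    """Mirror of the three OPA checks: agent auth, data scope, rate limit."""
    # 1. Agent authorization: every requested agent must be in the caller's grants
    allowed_agents = set(jwt_claims.get("allowed_agents", []))
    if not set(task_plan["agents"]) <= allowed_agents:
        return False, "agent-authorization-denied"
    # 2. Data access scope: the input data type must be visible to this tenant
    if task_plan["data_type"] not in jwt_claims.get("data_scopes", []):
        return False, "data-scope-denied"
    # 3. Rate limit: this graph run must fit the quota for the time window
    if quota_used >= quota_max:
        return False, "rate-limit-exceeded"
    return True, "allow"
```

Any `False` result maps to the auth-failure path described above: the graph terminates and the denial reason lands in the audit log.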

R
Phase R · Route & Dispatch

Parallel Fan-Out to Specialist Agents

The Orchestrator uses LangGraph's Send API to dispatch agents in parallel where there are no data dependencies:

  • UC1 path: Agent A (Anomaly Enricher) is always dispatched first. Agent B (RCA Reasoner) is dispatched after Agent A returns — it depends on Agent A's enriched context. Sequential dependency, not parallel.
  • UC2 path: Agent C (Log Analyzer) and Agent D (Capacity Planner) can run in parallel if the input already contains a prior breach signal. If Agent C must first detect the breach, Agent D is dispatched conditionally after Agent C returns.
  • Combined event: UC1 and UC2 agent chains run fully in parallel — they share no intermediate state.

Each dispatch packages a typed AgentInput containing: task type, data payload, RAG namespace to use, token budget override (if any), and trace correlation ID linking to LangSmith.
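The effect of the Send fan-out — dispatch all dependency-free agents at once, collect typed results keyed by agent — can be approximated with a thread pool. A simplified stand-in (the real graph uses LangGraph `Send` objects and checkpointed state, not raw threads):

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_parallel(agent_fns: dict, agent_input: dict) -> dict:
    """Fan out to agents with no mutual data dependency; collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(agent_fns))) as pool:
        futures = {name: pool.submit(fn, agent_input) for name, fn in agent_fns.items()}
        # Blocking on each future mirrors the aggregator waiting for all branches
        return {name: f.result() for name, f in futures.items()}
```

The key property carried over from the architecture: the caller sees only named, typed results — never partial mutable state shared between branches.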

M
Phase M · Merge & Decide

Result Aggregation, Confidence Gating & Output Routing

As agents return, the Orchestrator's aggregator node collects typed AgentResult objects. It then applies:

  • Confidence gate: If any result has confidence_score < θ (default 0.72), the Orchestrator re-routes that agent's task with an augmented prompt (adds additional context from the State Store or retrieves similar historical cases from Qdrant). Max 2 re-runs before surfacing with a low-confidence flag.
  • Output routing: UC1 results → REST push to BHOM/Helix GPT. UC2 results → NOK Agent Interface response payload or scheduled digest message queue depending on trigger type.
  • Audit flush: Full execution record (inputs, all LLM calls + responses, routing decisions, outputs, confidence scores, latency) written to PostgreSQL audit table and LangSmith trace finalized.
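The confidence gate reduces to a bounded retry loop around the agent call. A minimal sketch with the defaults stated above (θ = 0.72, at most 2 re-runs; `run_agent` and `augment` are placeholders for the re-dispatch and context-augmentation steps):

```python
THETA = 0.72        # confidence threshold from the aggregator config
MAX_RERUNS = 2      # hard cap before surfacing with a low-confidence flag

def confidence_gate(run_agent, augment, result: dict) -> dict:
    """Re-run a below-threshold agent with augmented context, at most twice."""
    reruns = 0
    while result["confidence_score"] < THETA and reruns < MAX_RERUNS:
        reruns += 1
        result = run_agent(augment(result))   # adds state-store / Qdrant context
    if result["confidence_score"] < THETA:
        result["low_confidence"] = True       # surfaced to the consumer, not hidden
    return result
```

The cap matters operationally: without it, a systematically uncertain agent would loop (and bill) indefinitely through the gateway.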

Human-in-the-Loop (HiTL) Interrupt Points

The Orchestrator graph includes two interrupt nodes — points where execution pauses and waits for an operator approval callback before continuing:

  • Before any remediation action (UC1): If Agent B's RCA result includes a recommended next best action that involves a network configuration change, the graph halts and sends an approval request to the NOC operator via BHOM. Execution resumes only on approved=true callback.
  • Before capacity enhancement proposal delivery (UC2): Agent D's capacity plan is held at an interrupt node for a planning team reviewer sign-off before being sent to the scheduled digest channel.
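Both interrupt points follow the same mechanic: persist state at the interrupt node, stop, and resume only on an approved callback. A minimal sketch of that pause/resume contract (an in-memory dict stands in for the PostgreSQL checkpointer; function names hypothetical):

```python
checkpoints: dict[str, dict] = {}    # stand-in for the PostgreSQL checkpointer

def reach_interrupt(run_id: str, state: dict) -> str:
    """Persist graph state at the interrupt node and halt execution."""
    checkpoints[run_id] = state
    return "awaiting-approval"        # approval request goes out to BHOM here

def approval_callback(run_id: str, approved: bool) -> str:
    """Resume (or cancel) the paused graph when the operator responds."""
    state = checkpoints.pop(run_id)
    if not approved:
        return "cancelled"            # denial is recorded, graph never resumes
    state["approved"] = True
    return "resumed"                  # LangGraph continues from the checkpoint
```

Because the checkpoint is durable in the real system, the pod can restart between the interrupt and the operator's response without losing the run.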
// 04 · LLM GATEWAY TANDEM

How the Orchestrator & Agents Work in Tandem with the LLM Gateway

The relationship between the Orchestrator, its agents, and the Atsky LLM Gateway is the most architecturally critical interface in the system. No component calls an LLM directly. Every inference request flows through https://llmgateway.enterprise-ai.internal/v1 — an OpenAI-compatible REST endpoint hosted within the ENTERPRISE AI framework.

The Three-Party Contract

Every LLM invocation in this architecture involves exactly three parties in sequence:

  • Caller (Agent or Orchestrator) — sends a structured HTTP POST with system prompt, user message, tool schema, and agent-type header
  • Atsky LLM Gateway — validates the API key (from Vault), applies rate limit and token budget per agent type, selects the appropriate model, optionally retrieves a cached response, forwards to the model host, and returns the structured response
  • Model Host — either OpenAI API (GPT-4.1 / GPT-4o) or an OSS model endpoint (LLaMA3-70B / Mistral-7B) hosted within ENTERPRISE AI infrastructure

The gateway implements a model routing policy that maps agent types to model tiers:

| Agent / Call Type | Trigger Condition | Model Selected | Why |
| --- | --- | --- | --- |
| Orchestrator — task classification | Every graph entry | Mistral-7B | Simple classification — cost-optimized |
| Agent A — Anomaly Enricher | Every UC1 event | GPT-4o | Needs broad telecom knowledge + structured JSON output |
| Agent B — RCA Reasoner | After Agent A returns | GPT-4.1 | Complex multi-step chain-of-thought reasoning required |
| Agent C — Log Analyzer | Every UC2 event | GPT-4o | Long-context log summarization + pattern recognition |
| Agent D — Capacity Planner | Conditional: breach detected | GPT-4.1 | Analytical reasoning over KPI trends + planning output |
| Confidence Evaluator (Orchestrator) | On low-confidence result | Mistral-7B | Simple scoring task — cost-optimized re-check |
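This routing policy is, operationally, a lookup the gateway applies per request — which is why a model swap is a one-line table change with zero agent code touched. A sketch (key names and the cheap-tier fallback are assumptions):

```python
# Gateway-side routing table: agent type -> model tier
MODEL_ROUTES = {
    "orchestrator-classifier": "mistral-7b",
    "anomaly-enricher":        "gpt-4o",
    "rca-reasoner":            "gpt-4.1",
    "log-analyzer":            "gpt-4o",
    "capacity-planner":        "gpt-4.1",
    "confidence-evaluator":    "mistral-7b",
}

def select_model(agent_type: str) -> str:
    """Resolve the model for a call; unknown callers fall back to the cheap tier."""
    return MODEL_ROUTES.get(agent_type, "mistral-7b")
```

Replacing GPT-4.1 for the RCA Reasoner would mean editing only the `"rca-reasoner"` entry — the governance benefit the quote below describes.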

"The LLM Gateway is not just a proxy — it is a governance layer. Swapping GPT-4.1 for a new model requires a change in the gateway routing table only — zero changes to any agent code. This is the architectural benefit of the gateway pattern in a multi-year engagement like Enterprise Operator."

// 05 · AGENT ELABORATION

Every Agent — What It Does, When It's Invoked, What It Returns

0
Step 0 · Entry · Triggered by: BHOM push / Scheduler / User prompt

Orchestrator Entry Node — Event Ingestion & Classification

Who: LangGraph Supervisor Agent (Orchestrator process)
When: On every incoming event — BHOM webhook push (UC1), cron-scheduled log batch (UC2), or user query via NOK Agent Interface (UC2 on-demand)
Where: Runs as the FastAPI application pod on Tanzu Kubernetes. Listens on internal service endpoint.
How: Deserializes incoming payload → validates schema → makes a lightweight classification call to LLM Gateway (Mistral-7B) → creates a typed GraphState object with task type, priority, and routing plan → starts the LangGraph execution loop. The full event is written to State Store (Valkey) with a TTL of 4 hours.

1
Step 1 · Auth · Triggered by: Orchestrator after classification

Policy Gate — OPA Authorization Check

Who: OPA (Open Policy Agent) sidecar — called synchronously by the Orchestrator
When: Immediately after task classification, before any agent dispatch
Where: OPA runs as a sidecar container in the Orchestrator pod (sidecar pattern) — zero network hop
How: Orchestrator calls OPA's local HTTP endpoint (localhost:8181/v1/data/policy/allow) with the task plan as input data. OPA evaluates the Rego policy bundle (loaded from Vault at startup) and returns allow: true/false within <5ms. Policy covers agent authorization, data scope, and rate limits. On deny, graph terminates immediately with audit record.

2a
Step 2a · UC1 Agent A · Triggered by: OPA allow on UC1 task

Agent A — Anomaly Enricher

Who: Dedicated Python microservice (agent-anomaly-enricher) running as a Kubernetes Deployment
When: Dispatched by Orchestrator via internal gRPC call for every UC1 event. Runs immediately after auth approval — no pre-conditions on other agents.
Where: Tanzu Kubernetes, namespace enterprise-operator-agents, separate pod from Orchestrator with its own resource quota (2 vCPU, 4Gi)
How (4-step internal loop):

  • ① Receive: Typed BhomAnomaly input (situation ID, annotations, timestamp, affected nodes)
  • ② RAG Query: Embeds anomaly description → queries Qdrant namespace nok-kb-uc1 (Network Infrastructure Platform technical docs + anomaly playbooks) with cosine similarity → applies cross-encoder reranker → retrieves top-5 relevant document chunks
  • ③ LLM Gateway Call: Constructs prompt with system context (Network Infrastructure Platform domain expert persona) + RAG chunks + anomaly data → POST to llmgateway.enterprise-ai.internal/v1/chat/completions with X-Agent-Type: anomaly-enricher header → GPT-4o returns structured JSON (severity, affected service, context summary, probable cause hints)
  • ④ Return: Typed EnrichedAnomaly object back to Orchestrator via gRPC response. Includes confidence score derived from RAG retrieval quality + LLM certainty markers.
2b
Step 2b · UC1 Agent B · Triggered by: Agent A completion (sequential dependency)

Agent B — RCA Reasoner

Who: Dedicated Python microservice (agent-rca-reasoner)
When: Dispatched by Orchestrator only after Agent A returns EnrichedAnomaly. This is a sequential dependency edge in the LangGraph — Agent B cannot start until Agent A completes. Typical trigger latency: 2–4 seconds after Agent A returns.
Where: Same Kubernetes namespace, separate pod (3 vCPU, 6Gi — larger due to long-context reasoning calls)
How:

  • ① Receive: EnrichedAnomaly from Orchestrator state + original BHOM situation
  • ② Second RAG Query (optional): If confidence from Agent A < 0.8, queries Qdrant for historical similar anomalies and their resolution records to augment context
  • ③ LLM Gateway Call (Chain-of-Thought): Uses a multi-turn prompt structure — system prompt instructs GPT-4.1 to reason step-by-step through root causes using the Network Infrastructure Platform Packet Core domain context. Temperature set to 0.2 (deterministic reasoning preferred). Response is streamed for latency optimization.
  • ④ Return: RcaResult object: ranked root-cause hypotheses (max 3), each with probability score, supporting evidence citations from Network Infrastructure Platform KB, and recommended next best action category.
3a
Step 3a · UC2 Agent C · Triggered by: OPA allow on UC2 task (scheduler or user prompt)

Agent C — Log Analyzer

Who: Dedicated Python microservice (agent-log-analyzer)
When: Two trigger modes: (i) Scheduled — cron job fires at configured interval (e.g., every 6 hours), Orchestrator receives scheduled event from the Task Scheduler, dispatches Agent C with a batch of recent logs. (ii) On-demand — operator sends a natural language query via NOK Agent Interface ("Show me CMG-C errors in the last 24 hours"), Orchestrator receives and dispatches Agent C with query context. Both paths use the same agent code — only the input wrapper differs.
Where: Kubernetes pod, 4 vCPU / 8Gi — largest agent due to log context windows. Reads log data from a pre-staged object store (S3-compatible MinIO bucket, populated by the log ingestion pipeline).
How (5-step):

  • ① Data Fetch: Retrieves relevant log files from MinIO based on time window — CMG-C config logs, CMG-U config logs, Kalix KPI CSV exports, BHOM counter snapshots, Cflowd records, cmd printouts
  • ② Preprocessing: Parses and normalizes log formats (different formats per source). Extracts key metrics (counter values, error codes, timestamps). Produces a structured log summary document (typically 4–12K tokens).
  • ③ RAG Query: For each error code or anomalous pattern detected in preprocessing, queries Qdrant namespace nok-kb-uc2 (Network Infrastructure Platform log pattern library, CMG documentation, known issue register) to retrieve relevant context and historical precedents.
  • ④ LLM Gateway Call: Sends structured prompt to GPT-4o via gateway — system prompt is the Network Infrastructure Platform Core Network Log Analyst persona. User message contains the preprocessed log summary + RAG context. Instructions: identify top issues, assess severity, flag capacity breach indicators, suggest immediate actions. Response is structured JSON.
  • ⑤ Return: LogAnalysisResult — issue list with severity scores, capacity breach flag (bool + breach percentage), top 5 recommended actions, anomaly correlation with BHOM counters.
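Step ② hinges on pulling normalized error codes and counter values out of heterogeneous log lines. An illustrative extractor — the regex patterns are hypothetical, since real CMG-C/U and Kalix formats differ per source:

```python
import re

# Hypothetical patterns; each real log source needs its own parser
ERROR_CODE = re.compile(r"\b(ERR-\d{4})\b")
COUNTER = re.compile(r"counter=(\w+):(\d+)")

def preprocess(lines: list[str]) -> dict:
    """Collect distinct error codes and the latest value seen per counter."""
    codes: set[str] = set()
    counters: dict[str, int] = {}
    for line in lines:
        codes.update(ERROR_CODE.findall(line))
        for name, value in COUNTER.findall(line):
            counters[name] = int(value)    # last write wins per counter
    return {"error_codes": sorted(codes), "counters": counters}
```

Each extracted error code then drives one `nok-kb-uc2` RAG query in step ③, and the counters feed the structured log summary passed to the LLM in step ④.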
3b
Step 3b · UC2 Agent D · Triggered by: Agent C result with capacity_breach=true

Agent D — Capacity Planner

Who: Dedicated Python microservice (agent-capacity-planner)
When: Conditionally dispatched — the Orchestrator's routing logic checks LogAnalysisResult.capacity_breach == true. If false, Agent D is never invoked and the graph proceeds directly to output. If true, Agent D is dispatched. This conditional edge is a core LangGraph routing pattern — if breach detected → route_to_agent_d else → route_to_output.
Where: Same Kubernetes namespace, lighter pod (2 vCPU, 4Gi)
How:

  • ① Receive: LogAnalysisResult + historical KPI trends from the State Store (Valkey cache of recent Kalix metrics)
  • ② RAG Query: Queries Qdrant nok-kb-uc2 for Network Infrastructure Platform capacity planning guidelines, CMG-C scaling procedures, and prior capacity enhancement case studies from the Network Infrastructure Platform knowledge base
  • ③ LLM Gateway Call: GPT-4.1 with Network Infrastructure Platform Capacity Planning Engineer persona. Prompt contains: current breach metrics, trend data, RAG context (planning guidelines). Output: capacity enhancement proposals in structured format with priority ranking, estimated effort, and risk assessment.
  • ④ Return: CapacityPlan — 3–5 enhancement proposals, each with: recommendation text, affected nodes, priority (P1/P2/P3), estimated capacity gain percentage, implementation complexity.
4
Step 4 · Aggregation · Triggered by: All dispatched agents returned

Orchestrator Aggregator Node — Merge, Gate, Route

Who: Orchestrator (LangGraph aggregator node)
When: Executes when all parallel/sequential agent branches for a given graph run have completed (or timed out with partial results)
How: Collects all AgentResult objects from Valkey state. Applies confidence gate (re-routes below-threshold agents, max 2 retries). Constructs final output payload. Routes to appropriate sink(s): UC1 → BHOM REST API, UC2 (on-demand) → NOK Agent Interface response, UC2 (scheduled) → digest queue. Writes full audit record to PostgreSQL. Finalizes LangSmith trace.

5
Step 5 · Output · Final delivery to consuming systems

Output Sinks — BHOM / NOK Agent UI / Scheduled Digest

BHOM / Helix GPT (UC1): Orchestrator calls the BHOM REST API with the enriched anomaly + RCA result as structured JSON. Helix GPT ingests this as additional context for its Next Best Action generation. The NOC operator sees enriched incident details directly in the BMC Helix AIOps interface.

NOK Agent Interface (UC2 on-demand): The Orchestrator returns the LogAnalysisResult (and optionally CapacityPlan) to the FastAPI response stream that the operator's browser is connected to. Rendered as a conversational response in the NOK Agent bot interface.

Scheduled Digest (UC2 scheduled): Orchestrator publishes the analysis to a digest message queue. A lightweight notification service picks this up, formats it as a human-readable report (Markdown → HTML email), and dispatches to the configured distribution list or inserts into the NOC dashboard widget.

// 06 · SKILLS & RAG

Agent Skills — Domain Knowledge as Modular Capabilities

In the HMA architecture, Skills are the modular, reusable domain knowledge packages that agents load at invocation time. A Skill is the combination of: a Qdrant vector namespace (the knowledge corpus), an embedding configuration, a retrieval strategy (similarity threshold, top-k, reranker model), and a system-prompt fragment that tells the LLM Gateway how to use that knowledge in its response. Skills are defined as YAML configuration files and loaded by agents at startup — an agent can be re-skilled by updating its config without code changes.

SKILL · UC1

Network Infrastructure Platform Anomaly Resolution Playbook

Vectorized Network Infrastructure Platform technical documentation covering Packet Core anomaly patterns, resolution procedures, and known-issue registers. Loaded by Agent A (Anomaly Enricher).

Invoked by: Agent A · Qdrant ns: nok-kb-uc1 · Top-k: 5 · Reranker: cross-encoder

SKILL · UC1

RCA Chain-of-Thought Reasoning

A structured reasoning prompt template + historical RCA resolution corpus. Guides the LLM to reason step-by-step through root cause hypotheses using Network Infrastructure Platform domain vocabulary.

Invoked by: Agent B · System prompt template + Qdrant RCA history corpus

SKILL · UC2

Network Infrastructure Platform Log Pattern Library

Vectorized CMG-C/U documentation, Kalix metric definitions, known error code catalogue, BHOM counter semantics, and Cflowd interpretation guides.

Invoked by: Agent C · Qdrant ns: nok-kb-uc2 · Top-k: 8 · Similarity: cosine >0.72

SKILL · UC2

Log Summarization Persona

A structured LLM Gateway system prompt that configures the model as a Network Infrastructure Platform Core Network Log Analyst — governing tone, output schema, severity classification criteria, and citation format.

Invoked by: Agent C · Gateway header: X-Skill: log-summarizer · Model: GPT-4o

SKILL · UC2

Capacity Planning Guidelines

Network Infrastructure Platform CMG-C scaling procedures, capacity planning best practices, and historical enhancement case studies — used by Agent D to generate grounded, evidence-backed capacity proposals.

Invoked by: Agent D · Qdrant ns: nok-kb-uc2/capacity · Top-k: 6

SKILL · SHARED

Telecom Standards Context

3GPP standards, ETSI NFV specifications, and Network Infrastructure Platform whitepaper content relevant to both UC1 (anomaly standards) and UC2 (KPI threshold definitions). Shared across both agent families.

Invoked by: Agent A, B, C · Qdrant ns: nok-kb-shared · On-demand retrieval

How Skills Get Built (RAG Ingestion Pipeline)

Skills are built through an offline ingestion pipeline that runs periodically (or triggered by Network Infrastructure Platform KB updates):

  • Source ingestion: Network Infrastructure Platform PDFs, Word docs, HTML pages, and structured CSV exports are fetched from the Network Infrastructure Platform knowledge base S3 bucket
  • Chunking: Documents are split into overlapping chunks (512 tokens, 50-token overlap) using semantic sentence boundaries
  • Embedding: Each chunk is embedded using a licensed embedding model (Apache 2.0 compliant — e.g., bge-m3 or e5-mistral) via the LLM Gateway embedding endpoint
  • Storage: Vectors stored in Qdrant with metadata: source document, chunk index, namespace (uc1/uc2/shared), Network Infrastructure Platform product version tag, and ingestion timestamp
  • Update strategy: New documents are incrementally added. Outdated chunks are marked stale and filtered from retrieval results. Full re-index runs quarterly.
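The chunking step above (512-token windows, 50-token overlap) reduces to a sliding window. A sketch operating on a pre-tokenized list — real ingestion would tokenize with the embedding model's tokenizer and respect semantic sentence boundaries, which this simplification omits:

```python
def chunk(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Slide a fixed window with overlap so no boundary content is lost."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break            # last window already reached the end
    return chunks
```

The overlap guarantees that a sentence falling on a chunk boundary appears intact in at least one chunk, which keeps boundary facts retrievable.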
// 07 · END-TO-END FLOWS

Complete Execution Flows — UC1 & UC2 Side by Side

● UC1 — Anomaly Enrichment & RCA Flow

  • BHOM → Orchestrator: anomaly push (webhook)
  • Orchestrator → OPA Gate: policy eval (task plan + JWT) → allow: true
  • Orchestrator → Agent A: dispatch (AgentInput)
  • Agent A → RAG (Qdrant): query nok-kb-uc1 → top-5 chunks returned
  • Agent A → LLM Gateway: GPT-4o enrichment call → EnrichedAnomaly JSON
  • Agent A → Orchestrator: AgentResult
  • Orchestrator → Agent B: dispatch (EnrichedAnomaly input)
  • Agent B → LLM Gateway: GPT-4.1 CoT RCA call → RcaResult JSON
  • Orchestrator → Helix GPT: push to BHOM/Helix GPT

● UC2 — Intelligent Log Analyzer Flow (Scheduled Trigger)

  • Scheduler → Orchestrator: cron trigger (batch window)
  • Orchestrator → Agent C: dispatch (log batch input)
  • Agent C → MinIO: fetch logs (CMG-C/U, Kalix, Cflowd)
  • Agent C → RAG (nok-kb-uc2): query (error codes, patterns) → log pattern context returned
  • Agent C → LLM Gateway: GPT-4o call (log summarize + severity) → LogAnalysisResult (breach=true)
  • Agent C → Orchestrator: result · capacity_breach=true → route to Agent D
  • Orchestrator → Agent D: dispatch (breach data + KPI trends)
  • Agent D → LLM Gateway: GPT-4.1 call (capacity plan) → CapacityPlan JSON
  • Orchestrator → NOK UI + scheduled digest: deliver
// 08 · HOW · WHO · WHEN · WHERE

Complete Invocation Reference Matrix

Every significant invocation in the system — LLM calls, agent dispatches, RAG queries, and infrastructure interactions — catalogued with the four key dimensions.

InvocationWho InvokesHowWhen (Trigger)Where (Infrastructure)
Task Classification LLM call Orchestrator HTTP POST to LLM GW · Mistral-7B · system: classifier prompt · returns typed TaskPlan JSON Every graph entry — BHOM webhook, cron event, or user query received Orchestrator pod → Gateway service · internal cluster DNS · sub-10ms routing
OPA Policy Gate check Orchestrator HTTP GET localhost:8181/v1/data/policy/allow · OPA sidecar in same pod After classification, before every agent dispatch In-pod sidecar — zero network hop · <5ms latency
Agent A dispatch Orchestrator LangGraph Send() API → gRPC call to agent-anomaly-enricher service · typed AgentInput Every UC1 event, immediately after OPA allow Kubernetes service: agent-anomaly-enricher.enterprise-operator-agents.svc
RAG query — nok-kb-uc1 Agent A Qdrant gRPC client · embed anomaly desc → cosine search top-5 → cross-encoder rerank Inside Agent A, after receiving input, before LLM Gateway call Qdrant StatefulSet · Tanzu persistent volume · same namespace
Anomaly Enrichment LLM call Agent A HTTP POST to LLM GW · GPT-4o · Network Infrastructure Platform domain expert persona · RAG context + anomaly in user msg · structured JSON output schema After RAG retrieval within Agent A LLM Gateway service → OpenAI API (or OSS endpoint) · API key from Vault
Agent B dispatch Orchestrator LangGraph conditional edge: Agent A result received → dispatch Agent B with EnrichedAnomaly After Agent A returns — sequential dependency (not parallel) Kubernetes service: agent-rca-reasoner.enterprise-operator-agents.svc
RCA Reasoning LLM call Agent B HTTP POST to LLM GW · GPT-4.1 · multi-turn CoT prompt · temperature 0.2 · streaming response Inside Agent B, with enriched context + optional RAG augmentation LLM Gateway → OpenAI GPT-4.1 · Streaming response for <P95 latency
BHOM/Helix GPT push (UC1 output) [Orchestrator]
How: REST API call to the BHOM endpoint; enriched anomaly + RCA result as the JSON body
When: After the Orchestrator aggregator node receives the RcaResult from Agent B
Where: BHOM REST endpoint · ENTERPRISE AI perimeter · authenticated with a service account token

Agent C dispatch (scheduled) [Orchestrator]
How: LangGraph entry triggered by a cron event; dispatches Agent C with a batch time window + log source config
When: Configured cron schedule (e.g., every 6h); a Kubernetes CronJob fires the Orchestrator webhook
Where: Kubernetes service: agent-log-analyzer.enterprise-operator-agents.svc

Agent C dispatch (on-demand) [Orchestrator]
How: User query received via the NOK Agent Interface FastAPI endpoint; the Orchestrator parses the NL query and dispatches Agent C with the query context
When: When an operator submits a query in the NOK Agent bot interface
Where: FastAPI response stream held open · Agent C response streamed back to the UI

Log data fetch (MinIO) [Agent C]
How: S3 client GET of time-windowed log files (CMG-C/U config, Kalix KPI CSV, Cflowd, BHOM counters, cmd printouts)
When: First step inside Agent C, before any RAG or LLM calls
Where: MinIO StatefulSet (S3-compatible) · Tanzu persistent volume · same namespace

RAG query — nok-kb-uc2 [Agent C]
How: Per detected error code: embed → cosine search of the Qdrant uc2 namespace → top-8 results → rerank → inject into prompt context
When: After log preprocessing, for each flagged error pattern
Where: Qdrant · log pattern library namespace · same cluster

Log Analysis LLM call [Agent C]
How: HTTP POST to the LLM GW · GPT-4o · Network Infrastructure Platform Core Log Analyst persona · structured log summary + RAG context · output schema: LogAnalysisResult
When: After data fetch + RAG retrieval within Agent C
Where: LLM Gateway → GPT-4o · long-context window (up to 128K) · response time 4–8s

Agent D dispatch (conditional) [Orchestrator]
How: LangGraph conditional edge: if LogAnalysisResult.capacity_breach == true → Send(Agent D), else route directly to the output node
When: Only when Agent C flags that the capacity breach threshold is exceeded
Where: Kubernetes service: agent-capacity-planner.enterprise-operator-agents.svc

Capacity Plan LLM call [Agent D]
How: HTTP POST to the LLM GW · GPT-4.1 · Network Infrastructure Platform Capacity Planning Engineer persona · breach metrics + KPI trends + RAG capacity guidelines
When: Inside Agent D, after RAG retrieval of the capacity planning guidelines
Where: LLM Gateway → GPT-4.1 · response: CapacityPlan JSON with 3–5 ranked proposals

Confidence gate re-route [Orchestrator]
How: The Orchestrator checks the confidence_score on each AgentResult and calls the LLM GW (Mistral-7B) to evaluate quality; if the score is below θ, the agent is re-dispatched with augmented context
When: On receiving any AgentResult with confidence_score < 0.72 · max 2 re-runs
Where: Orchestrator logic node · additional LLM GW call to Mistral-7B for cheap evaluation

Audit log write [Orchestrator]
How: Append-only INSERT into the PostgreSQL audit table (never UPDATE/DELETE) · full execution record per graph run
When: On every LLM call completion, routing decision, and graph finalization
Where: PostgreSQL (CloudNativePG) · Tanzu persistent volume · SIEM-forwarded via Fluent Bit

HiTL interrupt (remediation) [Orchestrator]
How: A LangGraph interrupt node pauses the graph, sends an approval request to the BHOM notification API, and waits for an async callback with approved=true/false
When: When Agent B's RCA result recommends a network configuration action
Where: BHOM notification API · the NOC operator sees an approval prompt in the Helix UI · the graph resumes on callback
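The conditional dispatch and confidence-gate rows above can be sketched as plain routing logic. This is a minimal sketch, not the deployed code: LangGraph's conditional-edge/Send mechanics are modeled as simple functions, and `AgentResult` is a hypothetical stand-in for the real state schema.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.72  # the θ from the confidence-gate row
MAX_RERUNS = 2               # max 2 re-runs per agent result

@dataclass
class AgentResult:
    agent: str
    confidence_score: float
    capacity_breach: bool = False

def route_after_agent_c(result: AgentResult) -> str:
    """Mirror of the conditional edge: capacity breach -> Agent D, else output."""
    return "agent_d" if result.capacity_breach else "output"

def confidence_gate(result: AgentResult, reruns_so_far: int) -> str:
    """Re-dispatch the agent with augmented context while confidence < θ,
    capped at MAX_RERUNS; otherwise accept the result."""
    if result.confidence_score < CONFIDENCE_THRESHOLD and reruns_so_far < MAX_RERUNS:
        return "re_dispatch"
    return "accept"
```

In the real graph these decisions are edges, not function calls, but the branch conditions are the same: the breach flag selects the path, and the threshold plus re-run cap bounds how often a low-confidence result loops back.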
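The append-only audit write can also be sketched in miniature. Here sqlite3 stands in for the PostgreSQL (CloudNativePG) audit store, and the schema and trigger pattern are illustrative assumptions, not the deployed DDL; the point is the "INSERT only, never UPDATE/DELETE" invariant enforced at the store level.

```python
import json
import sqlite3
import time

def open_audit_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open an audit store whose table rejects UPDATE and DELETE outright."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS audit_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        ts REAL NOT NULL,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL)""")
    # Enforce append-only behaviour: any UPDATE or DELETE aborts.
    db.execute("""CREATE TRIGGER IF NOT EXISTS no_update BEFORE UPDATE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END""")
    db.execute("""CREATE TRIGGER IF NOT EXISTS no_delete BEFORE DELETE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END""")
    return db

def audit(db: sqlite3.Connection, event_type: str, payload: dict) -> None:
    """Append one execution record: LLM call, routing decision, or finalization."""
    db.execute("INSERT INTO audit_log (ts, event_type, payload) VALUES (?, ?, ?)",
               (time.time(), event_type, json.dumps(payload)))
    db.commit()
```

In production the same invariant would be enforced with PostgreSQL grants or rules rather than triggers alone, and each record would carry the full graph-run context described in the table.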
// 09 · SECURITY & GOVERNANCE

Security, Secrets, and Observability Architecture

🔑

Secrets Management

All LLM Gateway API keys, BHOM service account tokens, PostgreSQL credentials, and Qdrant access keys are stored in HashiCorp Vault and injected into pods at startup via the Vault Agent sidecar (annotation-based injection on every pod). Keys are short-lived (24h TTL) and rotated automatically. No secrets appear in environment variables, pod specs, or prompt text; any attempt to include credentials in an LLM prompt would be caught by the Guardrail filter.

HashiCorp Vault · Vault Agent sidecar · 24h TTL rotation · No env secrets
🛡️

Network Security

mTLS everywhere via Istio service mesh — all inter-pod communication is mutually authenticated and encrypted. Agents cannot communicate directly with each other (no east-west agent-to-agent traffic). All routing goes through the Orchestrator. The LLM Gateway is the only egress point outside the cluster — it is the sole component with outbound internet access (to OpenAI API). All other pods are network-policy restricted to cluster-internal only.

Istio mTLS · Network policies · Single egress · Agent isolation
📡

Observability Stack

LangSmith: Full distributed trace per graph execution — every Orchestrator decision, agent invocation, LLM Gateway call (with token count, model, latency, prompt hash) captured in a single trace. Prometheus + Grafana: Per-agent metrics (invocation count, p95 latency, LLM call duration, confidence score distribution). OpenTelemetry: Distributed spans propagated via traceparent header through Orchestrator → Agent → LLM Gateway. Fluent Bit → SIEM: Audit log events streamed to the SIEM for security monitoring.

LangSmith · Prometheus · Grafana · OTEL · SIEM
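The `traceparent` propagation mentioned above follows the W3C Trace Context header format. A minimal sketch of generating and propagating that header with the standard library, to show the field layout only; this is not a replacement for the OpenTelemetry SDK, which handles this automatically:

```python
import secrets

def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes  -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

def child_traceparent(parent: str) -> str:
    """Propagate the trace across a hop: keep the trace-id, mint a new span-id.
    Orchestrator -> Agent -> LLM Gateway each create a child span this way,
    so all hops land in the same distributed trace."""
    version, trace_id, _parent_span, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because every hop reuses the trace-id, LangSmith and the OTEL backend can stitch the Orchestrator decision, agent invocation, and gateway call into one trace.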
💰

LLM Cost Governance

The LLM Gateway enforces per-agent-type token budgets: Agent A (8K input / 1K output max), Agent B (16K input / 2K output), Agent C (96K input / 4K output, to accommodate long-context logs), Agent D (8K input / 2K output). If an agent exceeds its budget, the gateway returns a truncated-context error; the Orchestrator logs it and re-submits with a summarized input. Monthly cost is tracked per use case (UC1 vs UC2) via gateway cost-tag headers and surfaced in the ENTERPRISE AI billing dashboard.

Token budgets per agent · Cost tagging · UC1 vs UC2 allocation · ENTERPRISE AI billing
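The budget check described above can be sketched as follows. The budget table mirrors the figures in the text; the error type and function shape are illustrative assumptions, not the gateway's actual API.

```python
# Per-agent-type token budgets from the text: (max input tokens, max output tokens)
BUDGETS = {
    "agent_a": (8_000, 1_000),
    "agent_b": (16_000, 2_000),
    "agent_c": (96_000, 4_000),  # long-context logs
    "agent_d": (8_000, 2_000),
}

class TruncatedContextError(Exception):
    """Modeled on the gateway's truncated-context error: the Orchestrator
    catches it, logs it, and re-submits with a summarized input."""

def check_budget(agent_type: str, input_tokens: int, max_output_tokens: int) -> None:
    """Reject any request that exceeds the agent type's token budget."""
    max_in, max_out = BUDGETS[agent_type]
    if input_tokens > max_in or max_output_tokens > max_out:
        raise TruncatedContextError(
            f"{agent_type}: request {input_tokens} in / {max_output_tokens} out "
            f"exceeds budget {max_in} in / {max_out} out")
```

Keeping the budgets in the gateway rather than in each agent is what makes the "100% of LLM calls proxied" governance claim enforceable: no agent can opt out of the check.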
// 10 · PHASED DELIVERY

What Gets Built When — SOW Phase Mapping

P1
Phase 1 · Demo · NOK Labs · 4–6 Weeks

Demoware — Prove the LLM + RAG Capability

Build a simplified, single-agent version of Agent C (Log Analyzer) running in the Network Infrastructure Platform Labs environment. It demonstrates reading logs with LLMs via the LLM Gateway, summarizing core network logs using the Network Infrastructure Platform knowledge base (manual Qdrant ingestion of a curated subset), and identifying key problems plus RCA hints. There is no Orchestrator yet; a single FastAPI endpoint calls the LLM Gateway directly. Goal: prove the functional sufficiency of the log analysis approach to TEL stakeholders. Deliverables: demoware, a design + architecture document, and a demo video.
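The Phase 1 shape (one endpoint, no Orchestrator, a direct gateway call) reduces to assembling a single chat request. A minimal sketch of that request assembly; the model field, persona wording, and payload shape are illustrative assumptions, not the real gateway API:

```python
def build_log_analysis_request(log_text: str, kb_snippets: list) -> dict:
    """Compose the chat payload: analyst persona as the system prompt,
    curated KB context first, then the raw log excerpt to analyze."""
    context = "\n---\n".join(kb_snippets)
    return {
        "model": "gpt-4o",  # assumption: the demo's default model
        "messages": [
            {"role": "system",
             "content": "You are a core network log analyst. "
                        "Summarize key problems and give RCA hints."},
            {"role": "user",
             "content": f"Knowledge base context:\n{context}\n\nLogs:\n{log_text}"},
        ],
    }
```

In the demo, a FastAPI handler would POST this payload to the LLM Gateway and return the response body; everything the later phases add (orchestration, routing, state, audit) wraps around this same call.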

P2
Phase 2 · MVP · ENTERPRISE AI Environment · Offline TEL Data · 10–14 Weeks

Working MVP — Full HMA Architecture on ENTERPRISE AI Stack

Deploy the full HMA architecture on the ENTERPRISE AI framework using offline (batch) TEL data sets. Includes: the LangGraph Orchestrator, Agents A+B (UC1), Agents C+D (UC2), Qdrant vector stores with full Network Infrastructure Platform KB ingestion, LLM Gateway integration (Atsky LLM Lite GW), the Valkey state store, PostgreSQL audit, and both output channels (BHOM REST + NOK Agent Interface). Integration testing runs against TEL pre-prod, with prompt engineering and model tuning driven by TEL feedback. HiTL interrupt nodes are operational, and the live BHOM integration is wired but fed with offline replay data.

P2.5
Phase 2.5 · Support · 3 Months Post-MVP

Model Tuning, Hallucination Management & Drift Monitoring

Data quality monitoring, agent output quality tracking (confidence score trends, RAG retrieval quality metrics), LLM response hallucination detection (fact-checking against the Network Infrastructure Platform KB), prompt engineering refinements based on production feedback, and SOP documentation. The Health Dashboard goes live and a support ticket SLA is defined.

P3
Phase 3 · Production · Live TEL Systems · Full Data Pipelines

Production Deployment — Live Data, Full Integration, Scale

Wire the live BHOM anomaly webhook and the live log ingestion pipeline (real-time CMG-C/U, Kalix, and Cflowd feeds), scale agent deployments for production load, activate the Kafka-based event bus for high-volume log streaming (BP2 ERAM elements overlaid on the HMA graph), finalize the Gateway API migration, and activate the full Vault + Istio production security posture. Operations follow the SOW RACI matrix, with TEL owning data availability and Network Infrastructure Platform owning model development + deployment.

Bottom Line — Why This Architecture Holds for the Full Journey

The HMA blueprint is the correct choice for Phase 1 through Phase 3 because it gives Network Infrastructure Platform and Enterprise Operator a single mental model that scales: start with a demo-grade single agent (Phase 1), grow into the full multi-agent graph (Phase 2), layer event-driven Kafka components on top when live throughput demands it (Phase 3+), and eventually evolve selected graph nodes into adaptive cognitive loops for deep autonomous RCA (future). The architecture grows with the engagement — it does not require a re-architecture at each phase boundary. The LangGraph state machine simply adds nodes and edges as capabilities mature.

Related Architecture Resources
AI Blueprints
Agentic AI Architecture Blueprints
HMA, ERAM, and ACL patterns — choose the right architecture for your use case.
Evolution Path
Multi-Agent Framework Evolution
Four-phase evolution from single agent to fully autonomous multi-agent systems.
Infrastructure
AI Enrichment Network Architecture
Kubernetes and network-level topology of the production AI enrichment pipeline.

Want this architecture
running in your environment?

Atsky has deployed HMA in live enterprise network operations. Book a technical session to discuss your use case.

Book a Technical Call → Telecom Use Cases