
# Architecture

## High-Level Overview

The Agentic AI platform follows a layered architecture with a supervisor orchestration pattern. User requests flow through a Next.js frontend to a FastAPI backend, where a supervisor agent routes each request to the most appropriate specialist agent, with all inference served by a local Ollama instance.

```mermaid
flowchart TB
    subgraph "Presentation Layer"
        UI[Next.js 14 UI<br/>shadcn/ui + Tailwind]
    end

    subgraph "API Layer"
        API[FastAPI<br/>Streaming SSE]
    end

    subgraph "Orchestration Layer"
        SUP[Supervisor Agent<br/>Request Classification]
        K8S[Kubernetes Agent]
        TF[Terraform Agent]
        AWS[AWS Agent]
        PY[Python Agent]
        FE[Frontend Agent]
        ARCH[Architect Agent]
    end

    subgraph "Intelligence Layer"
        OLLAMA[Ollama<br/>qwen2.5:32b]
        RAG[RAG Pipeline<br/>nomic-embed-text]
    end

    subgraph "Storage Layer"
        DDB[(DynamoDB<br/>Sessions & History)]
        S3[(S3<br/>Documents & Knowledge)]
        PGV[(pgvector<br/>Embeddings)]
        REDIS[(Redis<br/>Cache)]
    end

    UI -->|HTTP/SSE Stream| API
    API -->|Route Request| SUP
    SUP -->|Delegate| K8S & TF & AWS & PY & FE & ARCH
    K8S & TF & AWS & PY & FE & ARCH -->|Inference| OLLAMA
    K8S & TF & AWS & PY & FE & ARCH -->|Context| RAG
    RAG -->|Embeddings| PGV
    API -->|Session State| DDB
    RAG -->|Document Retrieval| S3
    API -->|Response Cache| REDIS
```
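The API layer streams tokens to the browser over Server-Sent Events. Below is a minimal sketch of such an endpoint; the route path, request model, and `run_supervisor` generator are illustrative assumptions, not taken from the actual codebase.

```python
# Hypothetical SSE endpoint sketch; names and payload shapes are assumptions.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    session_id: str
    message: str

async def run_supervisor(session_id: str, message: str):
    """Stand-in for the supervisor pipeline: classify, delegate, stream tokens."""
    for token in ("Routing", " to", " a", " specialist", "..."):
        yield token

@app.post("/chat")
async def chat(req: ChatRequest):
    async def event_stream():
        async for token in run_supervisor(req.session_id, req.message):
            yield f"data: {token}\n\n"   # one SSE frame per token
        yield "data: [DONE]\n\n"         # sentinel so the UI can close the stream
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```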

## Supervisor Orchestration

The platform uses agent-squad's supervisor pattern rather than a sequential chain. The supervisor agent analyzes each incoming request and routes it to the most appropriate specialist.

```mermaid
sequenceDiagram
    participant U as User
    participant API as FastAPI
    participant S as Supervisor
    participant A as Specialist Agent
    participant O as Ollama
    participant R as RAG Pipeline

    U->>API: Send message
    API->>API: Load session history (DynamoDB)
    API->>S: Route request
    S->>O: Classify intent
    O-->>S: Best agent: "kubernetes"
    S->>A: Delegate to Kubernetes Agent
    A->>R: Retrieve relevant context
    R->>R: Embed query (nomic-embed-text)
    R-->>A: Top-k documents
    A->>O: Generate response (with context)
    O-->>A: Streaming tokens
    A-->>API: Stream response
    API-->>U: SSE stream
    API->>API: Persist to DynamoDB
```
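The retrieval step in the middle of this sequence (embed the query, fetch the top-k documents) can be sketched as follows. The `documents` table and connection string are assumptions; the Ollama embeddings endpoint and pgvector's `<=>` cosine-distance operator are standard.

```python
# Sketch of the RAG retrieval step; table name and DSN are assumptions.
import psycopg2
import requests

def retrieve_context(query: str, k: int = 5) -> list[str]:
    # Embed the query with nomic-embed-text via Ollama's embeddings API.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": query},
    )
    embedding = resp.json()["embedding"]

    # Top-k nearest documents by cosine distance (pgvector's <=> operator).
    conn = psycopg2.connect("postgresql://app:app@localhost:5432/rag")
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(embedding), k),
        )
        return [row[0] for row in cur.fetchall()]
```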

### Routing Strategy

| Routing Mode | Description |
|---|---|
| Supervisor | LLM-based classification: the supervisor analyzes the request and selects the best agent |
| Direct | Explicit agent selection: the user or API specifies which agent to use |

The supervisor agent uses the descriptions and capabilities defined in each agent's YAML configuration to make routing decisions.
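A minimal sketch of that classification step, assuming the agent YAMLs live under `blueprints/devassist/agents/` and that a `general` fallback agent exists:

```python
# Sketch of LLM-based routing; prompt wording and fallback name are assumptions.
from pathlib import Path

import requests
import yaml

def route(message: str, agents_dir: str = "blueprints/devassist/agents") -> str:
    # Build a catalog of name/description pairs from the agent YAML files.
    agents = [yaml.safe_load(p.read_text()) for p in Path(agents_dir).glob("*.yaml")]
    catalog = "\n".join(f"- {a['name']}: {a['description']}" for a in agents)

    prompt = (
        "Pick the single best agent for this request. Answer with the name only.\n"
        f"Agents:\n{catalog}\n\nRequest: {message}\nAgent:"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "qwen2.5:32b", "prompt": prompt, "stream": False},
    )
    choice = resp.json()["response"].strip().lower()

    # Graceful degradation: fall back if the LLM answers off-list.
    return choice if any(a["name"] == choice for a in agents) else "general"
```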

## Module-Blueprint Pattern

The infrastructure follows a two-tier composition pattern that separates reusable modules from domain-specific deployments.

```mermaid
flowchart TB
    subgraph "Terraform Modules (Reusable)"
        DDB_MOD[dynamodb module<br/>Tables, indexes, capacity]
        S3_MOD[s3 module<br/>Buckets, policies, lifecycle]
        PGV_MOD[pgvector module<br/>PostgreSQL + extension]
        OBS_MOD[observability module<br/>Metrics, traces, dashboards]
    end

    subgraph "Blueprints (Domain-Specific)"
        subgraph "DevAssist Blueprint"
            DA_TF[terraform/<br/>Compose modules]
            DA_AGENTS[agents/<br/>7 YAML definitions]
            DA_KB[knowledge/<br/>RAG documents]
            DA_CFG[config.yaml<br/>Blueprint config]
        end

        subgraph "Future Blueprint"
            FB[Custom composition<br/>of same modules]
        end
    end

    DA_TF -->|Uses| DDB_MOD & S3_MOD & PGV_MOD & OBS_MOD
    FB -.->|Reuses| DDB_MOD & S3_MOD & PGV_MOD & OBS_MOD
```

### Blueprint Configuration

Each blueprint is defined by a `config.yaml` that specifies:

| Field | Description |
|---|---|
| `name` | Blueprint identifier (e.g., `devassist`) |
| `description` | Human-readable purpose |
| `supervisor_mode` | Orchestration strategy (`supervisor`) |
| `agents` | List of agent YAML files to load |
| `knowledge_base` | RAG document paths |
| `storage` | Backend configuration (DynamoDB tables, S3 buckets) |
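A minimal loader for this schema might look like the following; the dataclass and its exact field types are illustrative, not the platform's actual loader.

```python
# Illustrative blueprint loader; field types mirror the table above.
from dataclasses import dataclass, field

import yaml

@dataclass
class BlueprintConfig:
    name: str
    description: str
    supervisor_mode: str
    agents: list[str] = field(default_factory=list)
    knowledge_base: list[str] = field(default_factory=list)
    storage: dict = field(default_factory=dict)

def load_blueprint(path: str) -> BlueprintConfig:
    with open(path) as f:
        return BlueprintConfig(**yaml.safe_load(f))

cfg = load_blueprint("blueprints/devassist/config.yaml")
```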

### Agent YAML Definition

Agents are defined declaratively with personality, capabilities, and tool access:

```yaml
# blueprints/devassist/agents/kubernetes.yaml
name: kubernetes
description: "Kubernetes specialist for manifests, debugging, and cluster operations"
model: qwen2.5:32b
system_prompt: |
  You are a Kubernetes expert...
tools:
  - kubectl_explain
  - manifest_generator
capabilities:
  - kubernetes_manifests
  - cluster_debugging
  - helm_charts
```

## Storage Architecture

```mermaid
flowchart LR
    subgraph "Application"
        CORE[Core Backend]
    end

    subgraph "Session Storage"
        DDB[(ScyllaDB Alternator<br/>DynamoDB-compatible)]
    end

    subgraph "Document Storage"
        S3_LS[(LocalStack S3)]
    end

    subgraph "Vector Storage"
        PGV[(PostgreSQL<br/>+ pgvector extension)]
    end

    subgraph "Cache"
        REDIS[(Redis)]
    end

    CORE -->|Sessions, Chat History| DDB
    CORE -->|Knowledge Base, Uploads| S3_LS
    CORE -->|Embeddings, Similarity Search| PGV
    CORE -->|Response Cache, Rate Limits| REDIS
```
| Store | Technology | Purpose | Data |
|---|---|---|---|
| Sessions | ScyllaDB Alternator (DynamoDB API) | Conversation persistence | Chat history, session metadata |
| Documents | LocalStack S3 | Knowledge base storage | RAG source documents, uploads |
| Embeddings | PostgreSQL + pgvector | Vector similarity search | Document embeddings, search index |
| Cache | Redis | Performance optimization | Response cache, rate limiting |
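Because every backend speaks a standard API, the application can use stock clients with overridden endpoints. A sketch, assuming default local ports (Alternator on 8000, LocalStack on 4566) and hypothetical table and database names:

```python
# Endpoint-override sketch; hosts, ports, and names are assumptions.
import boto3
import psycopg2
import redis

# Sessions: ScyllaDB Alternator speaks the DynamoDB wire protocol.
dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",
    region_name="us-east-1",
    aws_access_key_id="local",
    aws_secret_access_key="local",
)
sessions = dynamodb.Table("sessions")

# Documents: LocalStack serves the S3 API on its edge port.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
    aws_access_key_id="local",
    aws_secret_access_key="local",
)

# Embeddings and cache: plain PostgreSQL and Redis clients.
vectors = psycopg2.connect("postgresql://app:app@localhost:5432/rag")
cache = redis.Redis(host="localhost", port=6379)
```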

## Development Modes

The platform supports two development workflows:

### Local Development

```mermaid
flowchart LR
    subgraph "Local Machine"
        UV[Uvicorn<br/>FastAPI Backend]
        NEXT[Next.js Dev<br/>Frontend]
        OLL[Ollama<br/>LLM Server]
    end

    subgraph "K8s Cluster (port-forward)"
        PG[PostgreSQL + pgvector]
        SCYLLA[ScyllaDB]
        RED[Redis]
        LS[LocalStack S3]
    end

    UV -->|Port Forward| PG & SCYLLA & RED & LS
    NEXT -->|API Calls| UV
    UV -->|Inference| OLL
```
| Aspect | Detail |
|---|---|
| Backend | `uvicorn` with auto-reload |
| Frontend | `next dev` with fast refresh |
| LLM | Ollama running natively |
| Databases | Port-forwarded from the fleet-infra cluster |
| Command | `make dev-local` |

### Kubernetes Development (Skaffold)

```mermaid
flowchart LR
    subgraph "Skaffold"
        SK[Skaffold<br/>File Sync + Hot Reload]
    end

    subgraph "K8s Cluster"
        BE_POD[Backend Pod<br/>FastAPI]
        FE_POD[Frontend Pod<br/>Next.js]
        OLL_POD[Ollama Pod]
        DB_PODS[Database Pods]
    end

    SK -->|Build & Deploy| BE_POD & FE_POD
    SK -->|File Sync| BE_POD & FE_POD
    BE_POD -->|In-Cluster| OLL_POD & DB_PODS
```
| Aspect | Detail |
|---|---|
| Orchestration | Skaffold with two configs: `backend-only` and `full` |
| Hot Reload | File sync maps local changes into running pods |
| Networking | In-cluster service discovery (no port-forwarding needed) |
| Command | `make dev-k8s` or `skaffold dev -p backend-only` |
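One way the two modes can share a single codebase is to resolve service endpoints at startup: in-cluster pods expose `KUBERNETES_SERVICE_HOST`, while local mode falls back to port-forwarded localhost addresses. The in-cluster DNS names below are assumptions:

```python
# Endpoint resolution sketch; the in-cluster service names are assumptions.
import os

IN_CLUSTER = "KUBERNETES_SERVICE_HOST" in os.environ

POSTGRES_HOST = "postgresql.databases.svc" if IN_CLUSTER else "localhost"
REDIS_HOST = "redis.databases.svc" if IN_CLUSTER else "localhost"
SCYLLA_ENDPOINT = (
    "http://scylladb.databases.svc:8000" if IN_CLUSTER else "http://localhost:8000"
)
S3_ENDPOINT = (
    "http://localstack.databases.svc:4566" if IN_CLUSTER else "http://localhost:4566"
)
OLLAMA_URL = (
    "http://ollama.ai.svc:11434" if IN_CLUSTER else "http://localhost:11434"
)
```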

## Integration with Fleet Infrastructure

The agentic-ai platform runs on the Kubernetes cluster provisioned by terraform-infra and managed by fleet-infra.

```mermaid
flowchart TB
    subgraph "terraform-infra"
        TF[OpenTofu/Terraform]
    end

    subgraph "fleet-infra (GitOps)"
        FLUX[Flux CD]
        PROM[kube-prometheus-stack]
        PG[PostgreSQL Cluster]
        REDIS[Redis Sentinel]
        SCYLLA[ScyllaDB]
        LS[LocalStack]
        TRAEFIK[Traefik Ingress]
    end

    subgraph "agentic-ai"
        CORE[Core Backend]
        UIFE[Next.js Frontend]
        OLLAMA[Ollama]
    end

    TF -->|Provisions Cluster| FLUX
    FLUX -->|Manages| PROM & PG & REDIS & SCYLLA & LS & TRAEFIK
    CORE -->|Metrics| PROM
    CORE -->|Database| PG & SCYLLA
    CORE -->|Cache| REDIS
    CORE -->|Object Store| LS
    TRAEFIK -->|Ingress| UIFE
```
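The metrics edge in this diagram implies the backend exposes a `/metrics` endpoint for the kube-prometheus-stack scraper. A sketch using the standard `prometheus_client` library; the metric names and labels are assumptions:

```python
# Metrics exposure sketch; metric names and labels are assumptions.
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()

ROUTED_REQUESTS = Counter(
    "chat_requests_total", "Chat requests routed, by specialist agent", ["agent"]
)
RESPONSE_LATENCY = Histogram(
    "chat_response_seconds", "End-to-end chat response latency"
)

# Mount the Prometheus ASGI app so the cluster's scraper can collect metrics.
app.mount("/metrics", make_asgi_app())
```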

## Design Decisions

### agent-squad over LangChain/LangGraph

The agent-squad framework (AWS Labs) was chosen for its lightweight supervisor pattern and native multi-agent support. Unlike LangChain's chain-based approach, agent-squad provides:

- Built-in supervisor routing without custom graph definitions
- YAML-driven agent configuration for rapid iteration
- Minimal abstraction overhead compared to LangGraph's state machines

### Ollama for Local-First Inference

All LLM inference runs locally via Ollama, eliminating cloud API costs and network latency:

- **Zero API costs**: no OpenAI/Anthropic billing
- **Full data privacy**: no data leaves the local network
- **Offline capable**: works without internet connectivity
- **Model flexibility**: swap models by changing one config line (see the sketch below)
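The sketch below shows what that one-line model swap looks like against Ollama's streaming chat API; the prompt and host are placeholders.

```python
# Streaming inference sketch against Ollama's /api/chat endpoint.
import json

import requests

MODEL = "qwen2.5:32b"  # swap to e.g. "llama3.1:8b" and nothing else changes

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Explain a Kubernetes Deployment."}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)  # one NDJSON object per streamed token batch
    if not chunk.get("done"):
        print(chunk["message"]["content"], end="", flush=True)
```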

### Supervisor Pattern vs. Direct Routing

The supervisor agent dynamically classifies requests using the LLM itself, rather than keyword matching or static rules:

- **Adaptive routing**: decisions derive from the agent descriptions in YAML
- **Graceful degradation**: falls back to a general-purpose agent
- **No maintenance**: adding a new agent YAML automatically updates routing

### Dual Development Modes

Supporting both local (uvicorn) and Kubernetes (Skaffold) development accommodates different workflows:

- **Local mode**: fastest iteration, minimal resource usage; ideal for backend changes
- **K8s mode**: production-like environment, tests the service mesh; ideal for integration testing