Architecture Decision Records

Key architectural decisions made during the development of the Agentic AI platform.

ADR Index

ADR	Title	Status	Impact
ADR-001	agent-squad over LangChain/LangGraph	Accepted	Core framework
ADR-002	Ollama for Local-First LLM Inference	Accepted	LLM runtime
ADR-003	pnpm Monorepo Structure	Accepted	Project organization
ADR-004	Supervisor Routing Pattern	Accepted	Agent orchestration
ADR-005	YAML-Driven Agent Configuration	Accepted	Agent management
ADR-006	pgvector for Embeddings	Accepted	Vector storage
ADR-007	ScyllaDB Alternator for DynamoDB-Compatible Storage	Accepted	Session storage
ADR-008	Skaffold for Kubernetes Development	Accepted	Dev workflow
ADR-009	Module-Blueprint Pattern	Accepted	Infrastructure

ADR-001: agent-squad over LangChain/LangGraph

Accepted

Context

The platform requires a multi-agent orchestration framework that supports supervisor-based routing, YAML-driven agent definitions, and lightweight integration with local LLMs. The primary contenders were:

LangChain/LangGraph — the most popular AI framework ecosystem
agent-squad (AWS Labs) — a lightweight multi-agent orchestration library
CrewAI — role-based agent framework

Decision

Use agent-squad (AWS Labs) as the core orchestration framework.

Consequences

Positive	Negative
Native supervisor routing pattern — no custom graph building	Smaller community and ecosystem than LangChain
Lightweight — minimal abstraction layers	Fewer pre-built tools and integrations
Easy YAML-to-agent mapping with agent factory pattern	Less documentation and tutorials available
AWS-maintained with production usage patterns	Tighter coupling to AWS service patterns

ADR-002: Ollama for Local-First LLM Inference

Accepted

Context

The platform needs LLM inference capabilities. Options include cloud APIs (OpenAI, Anthropic) and local inference servers (Ollama, vLLM, llama.cpp).

Decision

Use Ollama as the exclusive LLM inference runtime, running models locally.
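
As a rough illustration of what local-first inference looks like from the backend, the sketch below builds a non-streaming request against Ollama's `/api/generate` endpoint using only the standard library. The address `http://localhost:11434` is Ollama's default and is an assumption about this deployment; the helper names `build_generate_request` and `generate` are hypothetical and not taken from the platform's code.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(
    model: str, prompt: str, base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request to /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request to the local Ollama server and return the text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama replies with a single JSON object whose
    # "response" field holds the completion text.
    return body["response"]
```

Because the model is just a string in the payload, switching models can stay a configuration change, for example `generate("qwen2.5:32b", "Summarize this ticket")`.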

Consequences

Positive	Negative
Zero API costs — no per-token billing	Requires significant local hardware (32GB+ RAM for qwen2.5:32b)
Full data privacy — no data leaves the network	Slower inference than cloud GPUs
Works offline without internet	Model quality may lag behind latest cloud offerings
Simple model management (`ollama pull`)	Limited to models in Ollama's library or models that can be imported through a Modelfile (for example, GGUF weights)
Easy model switching via config change	No multi-GPU scaling without additional setup

ADR-003: pnpm Monorepo Structure

Accepted

Context

The project contains a Python backend (core) and a TypeScript frontend (ui) that share deployment lifecycle. Options: separate repos, npm workspaces, pnpm workspaces, Turborepo.

Decision

Use pnpm workspaces for monorepo management with packages under packages/.

Consequences

Positive	Negative
Single repo for unified versioning and CI/CD	pnpm is less common than npm (learning curve)
Shared configuration (ESLint, Prettier, Husky)	Python package (core) doesn't use pnpm — mixed tooling
Efficient disk usage via pnpm's content-addressed store	pnpm's strict, symlinked node_modules layout can break packages that assume npm-style hoisting
Simpler cross-package development workflow	Monorepo CI can be slower without proper caching

ADR-004: Supervisor Routing Pattern

Accepted

Context

With multiple specialist agents, the system needs a strategy for routing user requests to the appropriate agent. Options: keyword matching, user-selected routing, LLM-based classification (supervisor pattern).

Decision

Use a supervisor agent that leverages the LLM to dynamically classify and route requests to specialist agents.
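
The routing decision can be pictured with a minimal sketch. It is not agent-squad's API: the `Supervisor` class, the `Classifier` callable, and the `"general"` fallback name are all hypothetical, and the classifier is a stand-in for the LLM call. The point is only to show classification followed by a guarded fallback.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A classifier receives the user message plus {agent name: description} and
# returns the chosen agent's name. In a real system it would wrap an LLM call.
Classifier = Callable[[str, Dict[str, str]], str]


@dataclass
class Supervisor:
    classify: Classifier
    agents: Dict[str, str]      # lowercase agent name -> description
    fallback: str = "general"   # hypothetical name of the general agent

    def route(self, message: str) -> str:
        """Return the name of the agent that should handle the message."""
        try:
            choice = self.classify(message, self.agents).strip().lower()
        except Exception:
            # Classifier failure (timeout, malformed output): degrade gracefully.
            return self.fallback
        # Guard against the classifier naming an agent that does not exist.
        return choice if choice in self.agents else self.fallback


def keyword_stub(message: str, agents: Dict[str, str]) -> str:
    """Stand-in classifier for this example; an LLM would decide differently."""
    return "billing" if "invoice" in message.lower() else "unknown-agent"
```

Registering a new agent only means adding an entry to `agents`; the classifier sees the updated descriptions on the next call, which is the property the decision relies on.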

Consequences

Positive	Negative
Adaptive — understands nuanced requests	Adds one extra LLM call per request (latency + compute)
No manual routing rules to maintain	Routing errors are possible (wrong agent selected)
Adding new agents automatically updates routing	Supervisor prompt engineering required
Graceful degradation to general agent	Debugging routing decisions is less transparent

ADR-005: YAML-Driven Agent Configuration

Accepted

Context

Agent definitions (personality, model, tools, capabilities) need to be configurable without code changes. Options: Python classes, JSON config, YAML config, database-driven.

Decision

Define agents as YAML files under blueprints/<name>/agents/, loaded by an agent factory at startup.
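
Since YAML carries no type information, a factory would typically validate each definition at startup. The sketch below assumes the file has already been parsed into a dictionary (for instance with PyYAML's `yaml.safe_load`). The field names `name`, `model`, `description`, and `tools` are illustrative assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class AgentSpec:
    name: str
    model: str
    description: str
    tools: List[str] = field(default_factory=list)


# Hypothetical required keys; the real schema may differ.
REQUIRED_FIELDS = ("name", "model", "description")


def parse_agent_spec(raw: Dict[str, Any], source: str = "<unknown>") -> AgentSpec:
    """Validate one parsed agent definition and convert it to an AgentSpec."""
    if not isinstance(raw, dict):
        raise ValueError(f"{source}: expected a mapping at the top level")
    missing = [key for key in REQUIRED_FIELDS if not raw.get(key)]
    if missing:
        raise ValueError(f"{source}: missing required field(s): {', '.join(missing)}")
    for key in REQUIRED_FIELDS:
        if not isinstance(raw[key], str):
            raise ValueError(f"{source}: '{key}' must be a string")
    tools = raw.get("tools") or []
    if not isinstance(tools, list) or not all(isinstance(t, str) for t in tools):
        raise ValueError(f"{source}: 'tools' must be a list of strings")
    return AgentSpec(raw["name"], raw["model"], raw["description"], list(tools))
```

Failing fast with the file name in the message turns a malformed definition into a startup error rather than a surprise at request time.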

Consequences

Positive	Negative
Non-developers can modify agent behavior	YAML lacks type safety — runtime errors possible
Version-controlled agent configurations	Complex logic still requires code changes
Easy to add/remove/modify agents	No IDE autocompletion for custom YAML schema
Blueprint-scoped — different blueprints have different agents	Validation must be implemented separately

ADR-006: pgvector for Embeddings

Accepted

Context

The RAG pipeline requires vector storage for document embeddings and similarity search. Options: Pinecone, Weaviate, ChromaDB, pgvector.

Decision

Use PostgreSQL with the pgvector extension for vector storage, leveraging the existing PostgreSQL cluster from fleet-infra.
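
To make the SQL-based interface concrete, here is a small sketch that prepares a nearest-neighbour query using pgvector's cosine-distance operator `<=>`. The `documents` table and its columns are a hypothetical schema, and the `%s` placeholders assume a psycopg-style driver. No database connection is opened here.

```python
from typing import Sequence, Tuple

# Hypothetical schema: documents(id, content, embedding vector(N)).
SIMILARITY_SQL = """
SELECT id, content, embedding <=> %s::vector AS cosine_distance
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    if not embedding:
        raise ValueError("embedding must not be empty")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def similarity_query(embedding: Sequence[float], limit: int = 5) -> Tuple[str, tuple]:
    """Return (sql, params) for a nearest-neighbour search by cosine distance."""
    literal = to_vector_literal(embedding)
    return SIMILARITY_SQL, (literal, literal, limit)
```

The returned pair can be passed to a cursor's `execute(sql, params)`. Ordinary `WHERE` clauses on other columns can be added to the same statement to filter by metadata.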

Consequences

Positive	Negative
Reuses existing PostgreSQL infrastructure (CNPG from fleet-infra)	Less optimized for vector-only workloads than dedicated vector DBs
No additional service to manage	Limited ANN index types compared to Pinecone/Weaviate
SQL-based queries — familiar interface	Scaling vector search independently is harder
Open-source, self-hosted	No built-in hybrid search: combining vector similarity with full-text ranking or heavily filtered queries requires hand-written SQL and tuning

ADR-007: ScyllaDB Alternator for DynamoDB-Compatible Storage

Accepted

Context

The platform needs a key-value store for session management and chat history. The agent-squad framework uses DynamoDB patterns. Options: AWS DynamoDB, ScyllaDB Alternator, DynamoDB Local, Redis.

Decision

Use ScyllaDB Alternator (DynamoDB-compatible API) running in the fleet-infra cluster for session and history storage.
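
Because Alternator speaks the DynamoDB API, pointing the AWS SDK at it is mostly a matter of overriding the endpoint. The sketch below only assembles the keyword arguments for `boto3.client`. The service address `http://scylla-alternator:8000` and the placeholder credentials are assumptions for illustration, not values from this platform.

```python
from typing import Any, Dict

# Assumption: Alternator is reachable at this in-cluster address. The service
# name and port are hypothetical; use whatever the cluster actually exposes.
DEFAULT_ENDPOINT = "http://scylla-alternator:8000"


def alternator_client_kwargs(endpoint_url: str = DEFAULT_ENDPOINT) -> Dict[str, Any]:
    """Keyword arguments for boto3.client("dynamodb", **kwargs) aimed at Alternator."""
    if not endpoint_url.startswith(("http://", "https://")):
        raise ValueError("endpoint_url must include the scheme, e.g. http://host:port")
    return {
        "endpoint_url": endpoint_url,
        # botocore needs a region and credentials to sign requests even when
        # the target is not AWS. These placeholder values are assumptions;
        # whether Alternator verifies them depends on its authorization settings.
        "region_name": "us-east-1",
        "aws_access_key_id": "placeholder",
        "aws_secret_access_key": "placeholder",
    }


# Usage (requires boto3, which is not imported here so the sketch stays
# standard-library only):
#   import boto3
#   dynamodb = boto3.client("dynamodb", **alternator_client_kwargs())
#   dynamodb.list_tables()
```

Keeping the endpoint in one place also preserves the migration path: dropping `endpoint_url` and supplying real credentials would direct the same client at AWS DynamoDB.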

Consequences

Positive	Negative
DynamoDB API compatibility — works with AWS SDK	Not 100% DynamoDB feature parity
Self-hosted — no AWS dependency or costs	Additional operational complexity vs managed service
Already running in fleet-infra cluster	ScyllaDB has higher resource requirements than simple KV stores
Migration path to real DynamoDB if moving to cloud	Alternator-specific quirks may require workarounds

ADR-008: Skaffold for Kubernetes Development

Accepted

Context

Developing against Kubernetes requires a workflow that bridges local code changes with in-cluster deployment. Options: Skaffold, Tilt, DevSpace, manual kubectl apply.

Decision

Use Skaffold with file-sync for Kubernetes development, providing hot-reload of code changes into running pods.

Consequences

Positive	Negative
File sync enables hot-reload without image rebuilds	Skaffold configuration can be complex
Multiple profiles (backend-only, full) for different workflows	File sync has edge cases with certain file types
Google-maintained with active development	Adds tooling dependency beyond kubectl
Integrates with existing Kustomize manifests	Initial setup and debugging can be time-consuming

ADR-009: Module-Blueprint Pattern

Accepted

Context

The platform needs infrastructure that is both reusable across different AI assistant configurations and customizable per deployment. Traditional Terraform module composition doesn't capture the domain-specific agent + infra relationship.

Decision

Adopt a Module-Blueprint pattern where:

terraform/modules/ contains reusable, general-purpose infrastructure modules
blueprints/<name>/ composes those modules with agent definitions and knowledge bases
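
The layout above can be walked with a few lines of standard-library code. This sketch assumes agent definitions sit in an `agents/` directory inside each blueprint and use a `.yaml` or `.yml` extension. The function name `discover_agent_files` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def discover_agent_files(repo_root: Path) -> Dict[str, List[Path]]:
    """Map each blueprint name to the sorted agent YAML files it contains."""
    blueprints_dir = repo_root / "blueprints"
    found: Dict[str, List[Path]] = {}
    if not blueprints_dir.is_dir():
        return found
    for blueprint in sorted(p for p in blueprints_dir.iterdir() if p.is_dir()):
        agents_dir = blueprint / "agents"
        files: List[Path] = []
        if agents_dir.is_dir():
            # Accept both common YAML extensions and ignore everything else.
            files = sorted(
                f for f in agents_dir.iterdir()
                if f.is_file() and f.suffix in {".yaml", ".yml"}
            )
        # A blueprint without an agents/ directory maps to an empty list.
        found[blueprint.name] = files
    return found
```

A listing like this is one way to check that every blueprint is a self-contained unit, for example by asserting in CI that no blueprint maps to an empty list.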

Consequences

Positive	Negative
Clear separation of reusable infra from domain config	Two-level indirection can be confusing initially
New blueprints reuse existing modules without duplication	Blueprint-specific Terraform overrides add complexity
Agent definitions live alongside their infrastructure	Pattern is non-standard — new contributors need onboarding
Self-contained deployable units	Testing blueprint compositions requires integration tests