Architecture Decision Records¶
Key architectural decisions made during the development of the Agentic AI platform.
ADR Index¶
| ADR | Title | Status | Impact |
|---|---|---|---|
| ADR-001 | agent-squad over LangChain/LangGraph | Accepted | Core framework |
| ADR-002 | Ollama for Local-First LLM Inference | Accepted | LLM runtime |
| ADR-003 | pnpm Monorepo Structure | Accepted | Project organization |
| ADR-004 | Supervisor Routing Pattern | Accepted | Agent orchestration |
| ADR-005 | YAML-Driven Agent Configuration | Accepted | Agent management |
| ADR-006 | pgvector for Embeddings | Accepted | Vector storage |
| ADR-007 | ScyllaDB Alternator for DynamoDB-Compatible Storage | Accepted | Session storage |
| ADR-008 | Skaffold for Kubernetes Development | Accepted | Dev workflow |
| ADR-009 | Module-Blueprint Pattern | Accepted | Infrastructure |
ADR-001: agent-squad over LangChain/LangGraph¶
Accepted
Context¶
The platform requires a multi-agent orchestration framework that supports supervisor-based routing, YAML-driven agent definitions, and lightweight integration with local LLMs. The primary contenders were:
- LangChain/LangGraph — the most popular AI framework ecosystem
- agent-squad (AWS Labs) — a lightweight multi-agent orchestration library
- CrewAI — role-based agent framework
Decision¶
Use agent-squad (AWS Labs) as the core orchestration framework.
Consequences¶
| Positive | Negative |
|---|---|
| Native supervisor routing pattern — no custom graph building | Smaller community and ecosystem than LangChain |
| Lightweight — minimal abstraction layers | Fewer pre-built tools and integrations |
| Easy YAML-to-agent mapping with agent factory pattern | Less documentation and tutorials available |
| AWS-maintained with production usage patterns | Tighter coupling to AWS service patterns |
ADR-002: Ollama for Local-First LLM Inference¶
Accepted
Context¶
The platform needs LLM inference capabilities. Options include cloud APIs (OpenAI, Anthropic) and local inference servers (Ollama, vLLM, llama.cpp).
Decision¶
Use Ollama as the exclusive LLM inference runtime, running models locally.
Consequences¶
| Positive | Negative |
|---|---|
| Zero API costs — no per-token billing | Requires significant local hardware (32GB+ RAM for qwen2.5:32b) |
| Full data privacy — no data leaves the network | Slower inference than cloud GPUs |
| Works offline without internet | Model quality may lag behind latest cloud offerings |
Simple model management (ollama pull) |
Limited to models available in Ollama's registry |
| Easy model switching via config change | No multi-GPU scaling without additional setup |
ADR-003: pnpm Monorepo Structure¶
Accepted
Context¶
The project contains a Python backend (core) and a TypeScript frontend (ui) that share deployment lifecycle. Options: separate repos, npm workspaces, pnpm workspaces, Turborepo.
Decision¶
Use pnpm workspaces for monorepo management with packages under packages/.
Consequences¶
| Positive | Negative |
|---|---|
| Single repo for unified versioning and CI/CD | pnpm is less common than npm (learning curve) |
| Shared configuration (ESLint, Prettier, Husky) | Python package (core) doesn't use pnpm — mixed tooling |
| Efficient disk usage via pnpm's content-addressed store | Workspace hoisting can cause subtle dependency issues |
| Simpler cross-package development workflow | Monorepo CI can be slower without proper caching |
ADR-004: Supervisor Routing Pattern¶
Accepted
Context¶
With multiple specialist agents, the system needs a strategy for routing user requests to the appropriate agent. Options: keyword matching, user-selected routing, LLM-based classification (supervisor pattern).
Decision¶
Use a supervisor agent that leverages the LLM to dynamically classify and route requests to specialist agents.
Consequences¶
| Positive | Negative |
|---|---|
| Adaptive — understands nuanced requests | Adds one extra LLM call per request (latency + compute) |
| No manual routing rules to maintain | Routing errors are possible (wrong agent selected) |
| Adding new agents automatically updates routing | Supervisor prompt engineering required |
| Graceful degradation to general agent | Debugging routing decisions is less transparent |
ADR-005: YAML-Driven Agent Configuration¶
Accepted
Context¶
Agent definitions (personality, model, tools, capabilities) need to be configurable without code changes. Options: Python classes, JSON config, YAML config, database-driven.
Decision¶
Define agents as YAML files under blueprints/<name>/agents/, loaded by an agent factory at startup.
Consequences¶
| Positive | Negative |
|---|---|
| Non-developers can modify agent behavior | YAML lacks type safety — runtime errors possible |
| Version-controlled agent configurations | Complex logic still requires code changes |
| Easy to add/remove/modify agents | No IDE autocompletion for custom YAML schema |
| Blueprint-scoped — different blueprints have different agents | Validation must be implemented separately |
ADR-006: pgvector for Embeddings¶
Accepted
Context¶
The RAG pipeline requires vector storage for document embeddings and similarity search. Options: Pinecone, Weaviate, ChromaDB, pgvector.
Decision¶
Use PostgreSQL with the pgvector extension for vector storage, leveraging the existing PostgreSQL cluster from fleet-infra.
Consequences¶
| Positive | Negative |
|---|---|
| Reuses existing PostgreSQL infrastructure (CNPG from fleet-infra) | Less optimized for vector-only workloads than dedicated vector DBs |
| No additional service to manage | Limited ANN index types compared to Pinecone/Weaviate |
| SQL-based queries — familiar interface | Scaling vector search independently is harder |
| Open-source, self-hosted | Missing some advanced features (hybrid search, metadata filtering) |
ADR-007: ScyllaDB Alternator for DynamoDB-Compatible Storage¶
Accepted
Context¶
The platform needs a key-value store for session management and chat history. The agent-squad framework uses DynamoDB patterns. Options: AWS DynamoDB, ScyllaDB Alternator, local DynamoDB, Redis.
Decision¶
Use ScyllaDB Alternator (DynamoDB-compatible API) running in the fleet-infra cluster for session and history storage.
Consequences¶
| Positive | Negative |
|---|---|
| DynamoDB API compatibility — works with AWS SDK | Not 100% DynamoDB feature parity |
| Self-hosted — no AWS dependency or costs | Additional operational complexity vs managed service |
| Already running in fleet-infra cluster | ScyllaDB has higher resource requirements than simple KV stores |
| Migration path to real DynamoDB if moving to cloud | Alternator-specific quirks may require workarounds |
ADR-008: Skaffold for Kubernetes Development¶
Accepted
Context¶
Developing against Kubernetes requires a workflow that bridges local code changes with in-cluster deployment. Options: Skaffold, Tilt, DevSpace, manual kubectl apply.
Decision¶
Use Skaffold with file-sync for Kubernetes development, providing hot-reload of code changes into running pods.
Consequences¶
| Positive | Negative |
|---|---|
| File sync enables hot-reload without image rebuilds | Skaffold configuration can be complex |
| Multiple profiles (backend-only, full) for different workflows | File sync has edge cases with certain file types |
| Google-maintained with active development | Adds tooling dependency beyond kubectl |
| Integrates with existing Kustomize manifests | Initial setup and debugging can be time-consuming |
ADR-009: Module-Blueprint Pattern¶
Accepted
Context¶
The platform needs infrastructure that is both reusable across different AI assistant configurations and customizable per deployment. Traditional Terraform module composition doesn't capture the domain-specific agent + infra relationship.
Decision¶
Adopt a Module-Blueprint pattern where:
terraform/modules/contains reusable, general-purpose infrastructure modulesblueprints/<name>/composes those modules with agent definitions and knowledge bases
Consequences¶
| Positive | Negative |
|---|---|
| Clear separation of reusable infra from domain config | Two-level indirection can be confusing initially |
| New blueprints reuse existing modules without duplication | Blueprint-specific Terraform overrides add complexity |
| Agent definitions live alongside their infrastructure | Pattern is non-standard — new contributors need onboarding |
| Self-contained deployable units | Testing blueprint compositions requires integration tests |