AI Customer Support
A Multi-Agent AI System for Banking Customer Support — Capstone Write-Up, Technical Deep Dive, and Everything I Learned Building It
When a bank customer fires off "My debit card hasn't arrived in three weeks!", something deceptively complex needs to happen. The system must understand that this is a complaint — not a question — empathise with the frustration, generate a unique ticket, persist it to a database, and respond with warmth and a reference number. All in under four seconds. All without a human in the loop.
That's what I set out to build for my Applied Generative AI capstone: a production-grade, multi-agent customer support system for a bank. No monolithic prompt. No single-agent loop. A directed graph of specialised AI agents, each owning exactly one job.
This post walks through what I built, every feature it has, the architectural decisions behind it, and — most importantly — why I chose RAG, why I chose MCP, and why those choices matter beyond this project.
The Problem With "Just Use an LLM"
The first instinct for an AI customer support system is to stuff everything into one prompt:
"You are a banking support assistant.
Classify the message, respond empathetically,
check the database if needed, create tickets if needed..."
This breaks in production for three reasons:
1. Reliability collapses under complexity. A single prompt trying to classify, retrieve data, write to a database, and generate a response will occasionally fail at one of those steps — and you'll have no idea which step failed or why.
2. No auditability. Banking is a regulated domain. When something goes wrong, you need to know: did the classifier misfire? Did the database write fail? Did the LLM hallucinate a policy? A monolithic prompt gives you none of that.
3. No knowledge of your bank. Claude was trained on public internet data. It has zero knowledge of your bank's KYC document requirements, SLA commitments, or card replacement procedures. Ask it, and it will confidently make something up.
The solution is a multi-agent architecture with three key technologies: LangGraph for orchestration, RAG for grounded knowledge, and MCP for tool abstraction. Let me explain each.
What I Built: AI Customer Support
AI Customer Support handles six real customer scenarios with distinct agent paths:
| Customer Says | What Happens |
|---|---|
| "Thank you, your team was amazing!" | Classifier → Positive feedback agent → Personalised warm reply |
| "My debit card hasn't arrived in 3 weeks" | Classifier → Negative feedback agent → Empathetic reply + auto-created ticket |
| "What documents do I need for KYC?" | Classifier → Query router → RAG agent → Grounded answer from policy docs |
| "What is the status of ticket TKT042?" | Classifier → Query router → Ticket lookup → Live DB response |
| "Please close ticket TKT042" | Classifier → Query router → Close ticket (with ownership validation) |
| "Why was my forex card blocked abroad?" | Classifier → Query router → RAG → Low confidence → Fallback ticket |
Every single one of these paths is handled by a different agent — or combination of agents — each with a specific system prompt, specific tools, and specific failure modes.
Tech Stack — Full Rationale
Backend Decision Matrix
| Component | Chosen | Alternatives Considered | Why This Choice |
|---|---|---|---|
| LLM | Claude claude-sonnet-4-5 | GPT-4o, Gemini 1.5 Pro, Mistral | Reliable structured JSON output (critical for classifier), strong instruction following, low hallucination rate on constrained prompts |
| Agent Orchestration | LangGraph 0.2.28 | LangChain AgentExecutor, CrewAI, AutoGen, custom DAG | Explicit stateful graph with TypedDict — full auditability required for banking; conditional edges map directly to routing logic; no "reasoning loop" ambiguity |
| LLM Client | Anthropic SDK 0.34.2 | LangChain ChatAnthropic | Direct SDK avoids LangChain abstraction overhead; simpler retry logic; cleaner streaming interface |
| Vector Store | FAISS-CPU 1.9.0 | Pinecone, Weaviate, Chroma, Qdrant | Zero infrastructure — single Python process, no network calls; in-process L2 search at <5 ms; deployable without external services; swappable via LangChain interface |
| Embedding Model | all-MiniLM-L6-v2 (sentence-transformers) | OpenAI text-embedding-ada-002, Cohere embed, bge-large-en | Local execution (no API key, zero cost, no latency); 384 dimensions sufficient for 5-doc corpus; pre-normalised vectors |
| RAG Framework | LangChain 0.3.1 | LlamaIndex, Haystack | FAISS + RecursiveCharacterTextSplitter integration is standard; only used for ingestion utilities — retrieval logic is custom |
| API Framework | FastAPI 0.115.0 | Flask, Django REST, Starlette | Auto OpenAPI docs (swagger at /docs); Pydantic v2 native; async support for future streaming; dependency injection for DB sessions |
| Database ORM | SQLAlchemy 2.0.35 | Tortoise-ORM, Peewee, raw sqlite3 | Mature, battle-tested; WAL mode support; easy migration path to PostgreSQL |
| Database | SQLite (WAL) | PostgreSQL, MySQL | Zero infrastructure for a capstone demo; WAL mode handles two processes (port 8000 and 8001) reading/writing concurrently; trivial migration path |
| HTTP Client (agent→MCP) | httpx 0.27.2 | requests, aiohttp | Sync + async in one library; connection pooling; HTTPX timeout control critical for MCP call reliability |
| Retry Logic | tenacity 8.x | custom loops, backoff library | Declarative @retry decorator with exponential backoff; used on Claude API calls |
| Validation | Pydantic v2 2.9.2 | dataclasses, attrs, marshmallow | V2 performance improvement; native FastAPI integration; used in both API schemas and MCP schemas |
Frontend Decision Matrix
| Component | Chosen | Alternatives Considered | Why This Choice |
|---|---|---|---|
| UI Framework | React 18 | Vue 3, Svelte, SolidJS | Widest ecosystem; hooks model clean for chat + streaming state; team familiarity in most settings |
| Build Tool | Vite 5 | Create React App, Webpack, Parcel | Sub-100 ms HMR; first-class TypeScript; native ESM in dev |
| State Management | Zustand 5 | Redux Toolkit, Jotai, Recoil, Context | Minimal boilerplate; no provider wrapping; perfect for chat message array + customer ID |
| HTTP | Axios 1.7.7 | Fetch API, SWR, React Query | Interceptors for error handling; automatic JSON parsing; cleaner TypeScript generics |
| Styling | Tailwind CSS 3 | styled-components, CSS Modules, Emotion | Utility-first prevents style drift; design tokens via config; no JS-in-CSS overhead |
| Icons | Lucide React | Heroicons, Feather, React Icons | Consistent line-weight; tree-shakeable; named imports |
| Router | React Router v6 | TanStack Router, Next.js | Standard SPA routing; declarative route tree |
The Agent Graph: 9 Nodes, 3 Decision Points
The system is a directed acyclic graph built with LangGraph. Think of it as a flowchart where every box is an AI agent and every arrow is a conditional routing decision.
Customer Message
│
┌────────▼─────────┐
│ CLASSIFIER │ ← Claude reads the message,
│ AGENT │ outputs a JSON label
└────────┬─────────┘
│
┌─────────────────┼──────────────────┐
▼ ▼ ▼
[Positive Feedback] [Negative Feedback] [Query Router]
Agent Agent Agent
(warm reply) (empathy + ticket) (intent flags)
│
┌───────────────────┼──────────────────┐
▼ ▼ ▼
[Close Ticket] [Ticket Lookup] [RAG Agent]
Agent Agent (FAISS search)
│
┌────────────┴────────────┐
▼ ▼
[Grounded [Fallback Ticket]
Answer] Agent
(conf ≥ 0.55) (conf < 0.55)
│
└──────► [LOG NODE] ──► END
Three things make this powerful:
Every node is replaceable. Want to swap Claude for Gemini on the classifier? Change one function. The rest of the graph is untouched.
State flows through the entire graph. Every node reads from and writes to a shared AgentState TypedDict. By the time the response reaches the user, the state carries the customer ID, classification label, RAG confidence score, ticket ID, route chain, tool called, and list of agents invoked — all assembled automatically.
Routing is explicit, not inferred. In reasoning-loop architectures (like ReAct agents), the LLM decides what to do next. Here, conditional Python functions make those decisions. This matters enormously in banking: you want deterministic, auditable routing — not an LLM that occasionally takes a creative detour.
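As a concrete sketch of what "explicit routing" means here, the conditional edges are ordinary Python functions over the shared state — no LLM involved in the decision itself. Names below mirror the node names in the diagram but are illustrative, not the project's actual code:

```python
from typing import Optional, TypedDict

class AgentState(TypedDict, total=False):
    """Subset of the shared state that flows through every node (sketch)."""
    message: str
    classification: str          # set by the classifier node
    rag_confidence: Optional[float]
    route_taken: str

def route_after_classification(state: AgentState) -> str:
    """Deterministic 3-way branch: a dict lookup, fully unit-testable."""
    return {
        "positive_feedback": "positive_feedback_node",
        "negative_feedback": "negative_feedback_node",
        "query": "query_router_node",
    }[state["classification"]]

def route_after_rag(state: AgentState) -> str:
    """The confidence gate: answer only when retrieval looked good."""
    if state.get("rag_confidence") is not None and state["rag_confidence"] >= 0.55:
        return "log_node"
    return "fallback_ticket_node"
```

In LangGraph these functions would be wired in via `add_conditional_edges`; the point is that the branch itself is plain, auditable Python.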
Feature 1: Intent Classification
Every message enters through the same gate: the Classifier Agent.
Input: "My debit card hasn't arrived in 3 weeks"
Output: { "classification": "negative_feedback" }
Input: "Thank you so much for your help!"
Output: { "classification": "positive_feedback" }
Input: "What is the status of ticket TKT042?"
Output: { "classification": "query" }
The system prompt is deliberately minimal:
"You are a classifier. Categorise the message into exactly one of: positive_feedback, negative_feedback, query. Return only valid JSON:
{"classification": "<label>"}"
The constraint to output only JSON is critical. No preamble, no explanation — just the label. This makes the classifier composable: the routing function after it reads state["classification"] and picks the next node. If the LLM outputs natural language instead, the routing breaks. The JSON-only instruction prevents that.
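Even with a JSON-only instruction, production code should parse defensively. A minimal sketch of the parsing step, assuming (my assumption, not the project's documented behaviour) that unparseable output falls back to the `query` route as the safest default:

```python
import json

VALID_LABELS = {"positive_feedback", "negative_feedback", "query"}

def parse_classification(raw: str) -> str:
    """Parse the classifier's JSON-only reply.

    Falls back to 'query' (assumed safest route) if the model drifted
    into natural language or emitted an unknown label.
    """
    try:
        data = json.loads(raw.strip())
        label = data.get("classification", "") if isinstance(data, dict) else ""
    except json.JSONDecodeError:
        return "query"
    return label if label in VALID_LABELS else "query"
```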
Feature 2: Personalised Positive Feedback
When the classifier returns positive_feedback, the system does something small but meaningful: it fetches the customer's name from the database before generating the reply.
MCP call: GET /mcp/get_customer_profile/CUST001
← { customer_name: "Priya Sharma", segment: "PREMIUM" }
Claude generates: "Thank you for your kind words, Priya! We're absolutely
delighted to hear about your positive experience with SecureBank. Your
satisfaction means the world to us..."
The difference between "Thank you for your feedback!" and "Thank you, Priya!" seems minor. In customer support, it's the entire experience.
Feature 3: Negative Feedback + Auto-Ticket
This is where the system earns its keep. When a complaint comes in, three things happen automatically:
- A unique ticket number is generated (TKT042 — 3 uppercase letters + 3 digits = 17.5 million combinations)
- The ticket is written to the database with status OPEN
- Claude generates an empathetic response that includes the ticket number
Customer: "My debit card replacement still hasn't arrived after 3 weeks!"
System:
→ MCP: generate_ticket_number() → "TKT042"
→ MCP: create_support_ticket(...) → ticket persisted, status = OPEN
→ Claude: "We sincerely apologize for the inconvenience, Priya. We've
created ticket TKT042 for your debit card replacement. Our team will
follow up within 24 hours..."
No human had to read the complaint, assess it, or create a ticket. The entire triage happened in under 3 seconds.
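The ticket-number scheme is simple enough to sketch in a few lines. This is an illustrative implementation (the real `generate_ticket_number` MCP tool presumably also checks the database for collisions):

```python
import random
import string

def generate_ticket_number(existing: set) -> str:
    """3 uppercase letters + 3 digits: 26**3 * 10**3 = 17,576,000 combos.

    Regenerates on the (rare) collision with an already-issued ticket.
    """
    while True:
        candidate = (
            "".join(random.choices(string.ascii_uppercase, k=3))
            + "".join(random.choices(string.digits, k=3))
        )
        if candidate not in existing:
            return candidate
```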
Feature 4: RAG-Grounded Policy Q&A
This is the most technically interesting feature — and the one that required the most thought.
When a customer asks "What documents do I need for KYC at SecureBank?", Claude has no idea. It was trained on generic internet data, not your bank's specific policies. If you ask it without grounding, it will either refuse or hallucinate a plausible-sounding answer that may be completely wrong.
RAG (Retrieval-Augmented Generation) solves this. Here's how it works end to end:
Step 1: Ingest Policy Documents (Run Once)
The bank's policy documents are chunked into ~100 overlapping pieces of 512 characters each, then converted to 384-dimensional vectors using a local embedding model (all-MiniLM-L6-v2). These vectors are stored in a FAISS index on disk.
debit_card_policy.txt (5.4 KB) ─┐
kyc_guidelines.txt (6.2 KB) ─┤ → ~100 text chunks
dispute_resolution.txt (6.9 KB) ─┤ → 100 × 384-dim vectors
net_banking_reset.txt (5.8 KB) ─┤ → FAISS index (saved to disk)
sla_commitments.txt (6.8 KB) ─┘
Step 2: Retrieve on Every Query
When the customer asks a question, it's embedded with the same model and compared against all 100 stored vectors using L2 distance. The four closest chunks are returned in ~5 milliseconds.
Step 3: Ground Claude's Answer
Claude never reads the original documents. It only sees the retrieved chunks, with a strict instruction:
"Base your answer STRICTLY on the provided context documents. Do NOT use external knowledge or make assumptions. If the context does not contain enough information, say so."
The result: Claude answers with the bank's actual policies, and cannot invent content it wasn't given.
Step 4: Confidence Gate
This is the part most RAG tutorials skip. Not every customer question is answerable from the knowledge base. A query about "why my forex card was blocked in Singapore" might not appear in any policy document.
The system computes a confidence score after retrieval:
confidence = 0.7 × (top match score) + 0.3 × (average of top 3 scores)
If confidence ≥ 0.55: answer using the retrieved chunks.
If confidence < 0.55: don't guess — create a fallback ticket for a human specialist.
This single gate is what prevents the system from confidently hallucinating answers to questions it doesn't know.
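The gate is small enough to show in full. A minimal sketch, assuming the L2 distances have already been converted into similarity scores in (0, 1]:

```python
def rag_confidence(similarities: list) -> float:
    """confidence = 0.7 * top-1 + 0.3 * mean(top-3), per the formula above."""
    top = sorted(similarities, reverse=True)
    top3 = top[:3]
    return 0.7 * top[0] + 0.3 * (sum(top3) / len(top3))

def should_answer(similarities: list, threshold: float = 0.55) -> bool:
    """True → ground the answer in the chunks; False → fallback ticket."""
    return rag_confidence(similarities) >= threshold
```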
Feature 5: Live Ticket Status Lookup
Customer: "What is the status of ticket TKT042?"
→ Query router extracts "TKT042" from the message
→ MCP: GET /mcp/get_ticket_status/TKT042
← { status: "IN_PROGRESS", days_open: 2, sla_breached: false }
→ Claude: "Your ticket TKT042 is currently marked as In Progress.
It was opened 2 days ago and is within our SLA commitment."
The SLA breach flag is computed in real time — not stored. If a ticket is OPEN and older than 3 days, sla_breached is true. This means it's always accurate, with no batch job needed.
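Computing the flag on read rather than storing it is a one-liner. A sketch under the stated rule (OPEN and older than 3 days), with the clock injectable for testing:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SLA_DAYS = 3  # per the "older than 3 days" rule above

def sla_breached(status: str, created_at: datetime,
                 now: Optional[datetime] = None) -> bool:
    """Computed on every read, never stored — so it can't go stale."""
    now = now or datetime.now(timezone.utc)
    return status == "OPEN" and (now - created_at) > timedelta(days=SLA_DAYS)
```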
Feature 6: Ownership-Validated Ticket Closure
A subtle but important security feature: customers can only close their own tickets.
Customer CUST001: "Please close ticket TKT042"
MCP ownership check:
ticket.customer_id == "CUST001"? ✅
→ UPDATE support_tickets SET status='CLOSED' WHERE ticket_id='TKT042'
Customer CUST002 trying to close CUST001's ticket:
ticket.customer_id == "CUST002"? ❌ HTTP 403
→ "You can only close tickets that belong to your account."
This business rule lives in exactly one place — the MCP tool layer — and is enforced regardless of which agent calls the tool.
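A sketch of how that single enforcement point might look inside the tool layer. The exception-to-403 mapping and the dict-shaped ticket are simplifications of the real FastAPI + SQLAlchemy code:

```python
class OwnershipError(Exception):
    """Raised by the tool; the MCP server layer maps it to HTTP 403 (sketch)."""

def close_ticket(ticket: dict, caller_customer_id: str) -> dict:
    """The ownership rule lives here, in the tool — not in any agent."""
    if ticket["customer_id"] != caller_customer_id:
        raise OwnershipError(
            "You can only close tickets that belong to your account."
        )
    ticket["status"] = "CLOSED"
    return ticket
```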
Feature 7: Full Agent Trace on Every Response
Every response the system generates includes a collapsible debug trace in the UI:
{
"classification": "query",
"route_taken": "classifier → query_router → ticket_lookup(TKT042:IN_PROGRESS)",
"rag_confidence": null,
"tool_called": "get_ticket_status",
"agents_invoked": ["classifier_node", "query_router_node", "ticket_lookup_node", "log_node"],
"latency_ms": 2341
}
In a banking context, this isn't optional — it's necessary. When a response is wrong, you need to know if the classifier mislabelled, if the query router extracted the wrong ticket number, or if the LLM ignored the retrieved context.
The Technology Behind It: Three Core Choices
Why LangGraph?
LangGraph treats the agent pipeline as a stateful directed graph. Each node is a function that receives the full agent state and returns an updated state. Routing decisions are explicit Python functions, not LLM inferences.
The alternative is a "reasoning loop" agent (like LangChain's AgentExecutor or AutoGen), where the LLM itself decides what to do next. For general-purpose assistants, that's fine. For banking — where every decision needs to be auditable and every routing path needs to be testable — explicit graph routing is the right choice.
LangGraph also gives you the trace for free. Because state grows as it passes through each node (appending to route_taken and agents_invoked), you get a complete audit trail without any extra instrumentation.
| Framework | Routing | State | Auditability | Banking Suitability |
|---|---|---|---|---|
| LangGraph | Explicit conditional edges | TypedDict, full propagation | Full trace | ✅ High |
| CrewAI | Agent roles with task assignment | Per-agent memory | Medium | ⚠️ Medium |
| AutoGen | Multi-agent conversation | Message history | Low | ⚠️ Low |
| LangChain AgentExecutor | ReAct loop (reason-act) | Tool call history | Low | ❌ Low |
| Custom Python | Manual if/else | Dict passing | Custom | ✅ (if well-built) |
Why MCP?
MCP (Model Context Protocol) is an architectural pattern where tools — database operations, API calls, business logic — are exposed as HTTP endpoints for agents to call. Agents never import database models or write SQL.
Here's the concrete benefit. Imagine the negative feedback agent needs to check ownership before updating a ticket. Without MCP, that ownership check logic lives inside the agent. If you add a second agent that also updates tickets, you either duplicate the check or extract it into a shared utility. Then a third agent. Then a fourth.
With MCP, the ownership check lives in one place: the update_ticket_status tool. Every agent that calls this endpoint gets the validation automatically. Change the business rule once; all agents benefit immediately.
WITHOUT MCP WITH MCP
───────────────────────────────── ──────────────────────────────────────
Agent imports db models directly Agent calls HTTP endpoint
Agent writes SQLAlchemy queries MCP server owns all DB logic
Agent knows about TicketStatus enum Agent only knows tool name + JSON shape
Change DB schema → fix every agent Change DB schema → fix MCP only
Can't reuse logic across agents Any agent can call any tool
The six tools in this system cover the complete lifecycle of a support interaction:
| Tool | What It Does |
|---|---|
| generate_ticket_number | Returns a unique alphanumeric ID |
| create_support_ticket | Persists a new ticket with deduplication |
| get_ticket_status | Fetches live status with SLA breach calculation |
| update_ticket_status | Updates status with ownership validation |
| get_customer_profile | Returns profile with email masking (PII protection) |
| log_interaction | Async audit trail — never blocks the response |
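On the agent side, the whole contract reduces to "tool name + JSON shape". A sketch of the agent-side client with the transport injected (the real project uses httpx with explicit timeouts; injecting it here just makes the shape testable without a running server — all names are illustrative):

```python
from typing import Any, Callable, Optional

MCP_BASE = "http://localhost:8001"  # assumed MCP server address

# A transport takes (method, url, json_payload) and returns the parsed JSON.
Transport = Callable[[str, str, Optional[dict]], dict]

def call_tool(transport: Transport, method: str, name: str,
              payload: Optional[dict] = None) -> dict:
    """Every DB-touching action funnels through this one narrow choke point.

    Agents know only the tool name and the JSON shape — never the schema.
    """
    return transport(method, f"{MCP_BASE}/mcp/{name}", payload)
```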
Why RAG?
The short answer: because you cannot fine-tune your way to correctness, and you cannot prompt-engineer your way to accuracy on proprietary policies.
Fine-tuning would require retraining the model every time a policy changes. RAG requires adding a .txt file and re-running an ingestion script.
Prompt-stuffing the entire policy corpus into the context window is expensive, slow, and hits token limits. RAG retrieves only the relevant 4 chunks — around 2,000 characters — per query.
The key insight is that RAG separates what the model knows how to do (generate fluent, empathetic text) from what it knows (your specific bank's policies). Claude provides the language skill; the FAISS index provides the knowledge.
System-Level Design
Process Topology
┌─────────────────────────────────────────────────────────────────────────────┐
│ Client Browser │
│ React SPA (Vite dev server :5173 / Nginx :80 in prod) │
│ │
│ Pages: Chat │ Tickets │ Logs │ Evaluation │
└──────────────────────────────┬──────────────────────────────────────────────┘
│ HTTP/JSON (REST over localhost or CDN)
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ FastAPI Application Server (:8000) │
│ │
│ POST /api/query → routers/query.py → run_graph() │
│ GET /api/tickets → routers/tickets.py │
│ GET /api/logs → routers/logs.py │
│ GET /api/evaluation → routers/evaluation.py │
│ GET /health │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ LangGraph Orchestrator │ │
│ │ (compiled StateGraph, singleton at module import) │ │
│ │ │ │
│ │ classifier_node → route_after_classification │ │
│ │ ├── positive_feedback_node │ │
│ │ ├── negative_feedback_node │ │
│ │ └── query_router_node → route_after_query │ │
│ │ ├── ticket_lookup_node │ │
│ │ ├── close_ticket_node │ │
│ │ └── rag_node → route_after_rag │ │
│ │ ├── log_node → END │ │
│ │ └── fallback_ticket_node → log_node │ │
│ └───────────────────┬────────────────────────┬───────────────────┘ │
│ │ httpx │ local │
│ ▼ ▼ │
│ MCP Server calls FAISS Index │
│ (localhost:8001) (rag/faiss_index/) │
└──────────────────────┬─────────────────────────────────────────────────────┘
│ HTTP/JSON (inter-process, localhost)
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ MCP Tool Server (:8001) — FastAPI app (mcp_app) │
│ │
│ POST /mcp/generate_ticket_number │
│ POST /mcp/create_support_ticket │
│ GET /mcp/get_ticket_status/{ticket_id} │
│ POST /mcp/update_ticket_status │
│ GET /mcp/get_customer_profile/{customer_id} │
│ POST /mcp/log_interaction │
│ │
│ mcp/server.py → mcp/tools.py → db/crud.py │
└──────────────────────────────────────┬──────────────────────────────────────┘
│ SQLAlchemy ORM
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ SQLite (banking_support.db, WAL mode) │
│ │
│ Tables: customers · support_tickets · interaction_logs │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ Anthropic API (external, HTTPS) │
│ claude-sonnet-4-5 — called from llm_client.py by every agent node │
└─────────────────────────────────────────────────────────────────────────────┘
Request Lifecycle — Sequence Diagram
Browser FastAPI(:8000) LangGraph MCP(:8001) Anthropic API
│ │ │ │ │
│ POST /api/query │ │ │ │
│ ─────────────────►│ │ │ │
│ │ run_graph() │ │ │
│ │──────────────►│ │ │
│ │ │ classify_message │ │
│ │ │─────────────────────────────────►│
│ │ │◄─────────────────────────────────│
│ │ │ (label: "negative_feedback") │
│ │ │ │ │
│ │ │ POST /mcp/generate_ticket_number │
│ │ │─────────────────►│ │
│ │ │◄─────────────────│ │
│ │ │ ("TKT042") │ │
│ │ │ │ │
│ │ │ POST /mcp/create_support_ticket │
│ │ │─────────────────►│ │
│ │ │◄─────────────────│ │
│ │ │ (ticket created) │ │
│ │ │ │ │
│ │ │ generate empathy reply │
│ │ │─────────────────────────────────►│
│ │ │◄─────────────────────────────────│
│ │ │ │ │
│ │ │ POST /mcp/log_interaction (async) │
│ │ │─────────────────►│ │
│ │ │ (fire-and-forget) │ │
│ │ │ │ │
│ │◄──────────────│ final state │ │
│ │ build response│ │ │
│◄──────────────────│ │ │ │
│ { response, trace } │ │ │
Thread Safety Model
Port 8000 (FastAPI) Port 8001 (MCP Server)
│ │
│ Both processes share │
└──────────────┬───────────────┘
│
banking_support.db
(SQLite, WAL mode)
WAL (Write-Ahead Log) mode allows:
- Multiple concurrent READERS
- One WRITER at a time
- Readers don't block the writer
- busy_timeout = 5000 ms makes a blocked writer wait up to 5 s instead of failing immediately with "database is locked"
PRAGMA journal_mode = WAL; ← set at connection time in db/database.py
PRAGMA busy_timeout = 5000; ← 5s timeout before "database locked" error
FastAPI + MCP Server each use scoped_session per request thread.
Connection returned to pool after each request — no cross-process leaks.
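The PRAGMAs above can be applied at connection time. A stdlib-sqlite3 sketch of what db/database.py does (the project itself sets these through SQLAlchemy, so treat the function below as illustrative):

```python
import sqlite3

def connect(db_path: str) -> sqlite3.Connection:
    """Open a SQLite connection with WAL mode and a 5 s busy timeout."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode = WAL;")   # concurrent readers + one writer
    conn.execute("PRAGMA busy_timeout = 5000;")  # wait before 'database is locked'
    return conn
```

Note: in-memory databases ignore WAL (they report `memory`); the setting only takes effect on file-backed databases like banking_support.db.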
Module Dependency Graph
main.py
└── routers/query.py
└── agents/orchestrator.py ← compiled LangGraph DAG
├── agents/classifier_agent.py
│ └── agents/llm_client.py
├── agents/feedback_agent.py
│ ├── agents/llm_client.py
│ └── httpx → MCP :8001
├── agents/query_router_agent.py
│ └── agents/llm_client.py
├── agents/rag_agent.py
│ ├── rag/retriever.py
│ │ └── rag/faiss_index/ (disk)
│ ├── agents/llm_client.py
│ └── httpx → MCP :8001
├── agents/ticket_agent.py
│ ├── agents/llm_client.py
│ └── httpx → MCP :8001
└── (log_node inline in orchestrator.py)
└── httpx → MCP :8001
mcp/server.py (mcp_app)
└── mcp/tools.py
└── db/crud.py
└── db/models.py
└── db/database.py (SQLite engine)
rag/ingest.py (one-time script)
└── rag/documents/*.txt
└── LangChain loaders + splitter
└── sentence-transformers (HuggingFaceEmbeddings)
└── FAISS → rag/faiss_index/ (disk)
Architecture Diagram
High-Level Architecture
╔═══════════════════════════════════════════════════════════════════════════╗
║ AI Customer Support PLATFORM ║
║ ║
║ ┌─────────────────────────────────────────────────────────────────────┐ ║
║ │ PRESENTATION LAYER │ ║
║ │ │ ║
║ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │ ║
║ │ │ Chat │ │ Tickets │ │ Logs │ │ Evaluation │ │ ║
║ │ │ Page │ │ Page │ │ Page │ │ Dashboard │ │ ║
║ │ └────┬─────┘ └─────┬────┘ └────┬─────┘ └──────┬────────┘ │ ║
║ │ └───────────────┴─────────────┴─────────────────┘ │ ║
║ │ Axios + Zustand │ ║
║ └──────────────────────────────────┬───────────────────────────────────┘ ║
║ │ REST/JSON ║
║ ┌──────────────────────────────────▼───────────────────────────────────┐ ║
║ │ API GATEWAY LAYER │ ║
║ │ FastAPI :8000 │ ║
║ │ POST /api/query │ GET /api/tickets │ GET /api/logs │ ║
║ └──────────────────────────────────┬───────────────────────────────────┘ ║
║ │ ║
║ ┌──────────────────────────────────▼───────────────────────────────────┐ ║
║ │ ORCHESTRATION LAYER │ ║
║ │ LangGraph StateGraph │ ║
║ │ │ ║
║ │ ┌─────────────┐ │ ║
║ │ │ CLASSIFIER │─── positive ──►┌──────────────────────┐ │ ║
║ │ │ NODE │─── negative ──►│ POSITIVE FEEDBACK │ │ ║
║ │ │ (Claude) │ │ NODE (Claude) │ │ ║
║ │ └──────┬──────┘ └──────────────────────┘ │ ║
║ │ │ query │ ║
║ │ │ ┌──────────────────────┐ │ ║
║ │ ▼ │ NEGATIVE FEEDBACK │ │ ║
║ │ ┌─────────────┐ │ NODE (Claude + MCP) │ │ ║
║ │ │ QUERY │── close+num ──►└──────────────────────┘ │ ║
║ │ │ ROUTER │ ┌──────────────────────┐ │ ║
║ │ │ NODE │── num only ───►│ CLOSE TICKET NODE │ │ ║
║ │ │ (Claude) │── no number ──►│ (Claude + MCP) │ │ ║
║ │ └─────────────┘ ├──────────────────────┤ │ ║
║ │ │ TICKET LOOKUP NODE │ │ ║
║ │ │ (Claude + MCP) │ │ ║
║ │ └──────────────────────┘ │ ║
║ │ │ ║
║ │ ┌──────────────────────┐ │ ║
║ │ │ RAG NODE │──conf≥0.55──► │ ║
║ │ │ (FAISS + Claude) │ │ ║
║ │ └──────────┬───────────┘ │ ║
║ │ │ conf<0.55 │ ║
║ │ ▼ │ ║
║ │ ┌──────────────────────┐ │ ║
║ │ │ FALLBACK TICKET │ │ ║
║ │ │ NODE (Claude + MCP) │ │ ║
║ │ └──────────────────────┘ │ ║
║ │ │ all paths │ ║
║ │ ▼ │ ║
║ │ ┌──────────────────────┐ │ ║
║ │ │ LOG NODE │──────► END │ ║
║ │ │ (MCP async) │ │ ║
║ │ └──────────────────────┘ │ ║
║ └───────────────────────────────────────┬───────────────────────────-┘ ║
║ ┌────────────────────────┤ ║
║ │ │ ║
║ ┌───────────────▼───────┐ ┌─────────────▼──────────────────────────┐ ║
║ │ KNOWLEDGE LAYER │ │ TOOL LAYER (MCP :8001) │ ║
║ │ │ │ │ ║
║ │ FAISS Vector Index │ │ generate_ticket_number │ ║
║ │ all-MiniLM-L6-v2 │ │ create_support_ticket │ ║
║ │ 384-dim embeddings │ │ get_ticket_status │ ║
║ │ │ │ update_ticket_status │ ║
║ │ 5 policy documents: │ │ get_customer_profile │ ║
║ │ · debit_card_policy │ │ log_interaction │ ║
║ │ · kyc_guidelines │ │ │ ║
║ │ · dispute_resolution │ │ Business Rules: │ ║
║ │ · net_banking_reset │ │ · Ownership validation │ ║
║ │ · sla_commitments │ │ · SLA breach detection │ ║
║ │ │ │ · Email masking (PII) │ ║
║ └───────────────────────┘ │ · Ticket deduplication │ ║
║ └──────────────────────┬────────────────-─┘ ║
║ │ SQLAlchemy ORM ║
║ ┌──────────────────────▼────────────────────┐ ║
║ │ PERSISTENCE LAYER │ ║
║ │ SQLite (WAL mode) │ ║
║ │ customers │ support_tickets │ logs │ ║
║ └────────────────────────────────────────────┘ ║
║ ║
║ ┌────────────────────────────────────────────────────────────────────┐ ║
║ │ EXTERNAL SERVICES │ ║
║ │ Anthropic API (claude-sonnet-4-5) │ ║
║ │ Called by: classifier, feedback, rag, ticket agents │ ║
║ └────────────────────────────────────────────────────────────────────┘ ║
╚═══════════════════════════════════════════════════════════════════════════╝
Data Layer Architecture
┌─────────────────────────────────────────────────────────────────────────┐
│ DATA ARCHITECTURE │
│ │
│ STRUCTURED DATA (SQLite) VECTOR DATA (FAISS on disk) │
│ ───────────────────── ───────────────────────────── │
│ │
│ customers rag/faiss_index/ │
│ ┌──────────────────────┐ ┌─────────────────────────────┐ │
│ │ customer_id (PK) │ │ index.faiss │ │
│ │ customer_name │ │ (FAISS FlatL2 index) │ │
│ │ segment │ │ ~100 384-dim float vectors │ │
│ │ email (maskable) │ └─────────────────────────────┘ │
│ │ account_since │ │
│ │ preferred_lang │ ┌─────────────────────────────┐ │
│ │ created_at │ │ index.pkl │ │
│ └──────────┬───────────┘ │ (LangChain FAISS wrapper) │ │
│ │ 1:N │ doc metadata + content │ │
│ support_tickets └─────────────────────────────┘ │
│ ┌──────────────────────┐ │
│ │ ticket_id (PK) │ SOURCE DOCUMENTS │
│ │ customer_id (FK) │ 5 × .txt policy files (~31 KB) │
│ │ issue_text │ Chunked: ~100 × 512-char pieces │
│ │ status │ Embedded once, stored in FAISS │
│ │ sla_breached │ │
│ │ created_at │ │
│ │ updated_at │ │
│ └──────────┬───────────┘ │
│ │ 1:N (nullable) │
│ interaction_logs │
│ ┌──────────────────────┐ │
│ │ id (PK auto) │ │
│ │ customer_id (FK) │ │
│ │ message │ │
│ │ classification │ │
│ │ route_taken │ │
│ │ response_text │ │
│ │ tool_called │ │
│ │ rag_confidence │ │
│ │ ticket_id (FK null) │ │
│ │ created_at │ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
RAG Pipeline — Deep Dive
The Two Phases
RAG has two completely separate phases. Ingestion happens once. Retrieval happens on every query.
┌─────────────────────────────────────────────────────────────────────┐
│ PHASE 1 — INGESTION │
│ Run once: python -m rag.ingest │
│ │
│ Policy documents (.txt) │
│ │ │
│ ▼ Split into chunks │
│ │ RecursiveCharacterTextSplitter │
│ │ chunk_size = 512 chars · overlap = 128 chars │
│ │ │
│ ▼ ~100 overlapping text chunks │
│ │ │
│ ▼ Convert each chunk to a vector │
│ │ HuggingFaceEmbeddings (all-MiniLM-L6-v2) │
│ │ → 384 floating-point numbers per chunk │
│ │ │
│ ▼ Build the FAISS index │
│ │ FAISS.from_documents(chunks, embeddings) │
│ │ │
│ └──▶ Saved to disk │
│ rag/faiss_index/index.faiss │
│ rag/faiss_index/index.pkl │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ PHASE 2 — RETRIEVAL │
│ Runs on every customer query │
│ │
│ Customer question: "What documents do I need for KYC?" │
│ │ │
│ ▼ Embed the question with the same model │
│ │ all-MiniLM-L6-v2 → [0.21, -0.44, 0.87, ...] │
│ │ │
│ ▼ Search the FAISS index │
│ │ similarity_search_with_score(query_vector, k=4) │
│ │ → returns top 4 closest chunks + their L2 distances │
│ │ │
│ ▼ Convert distances → similarity scores │
│ │ score = 1 / (1 + L2_distance) → range (0, 1] │
│ │ │
│ ▼ Compute confidence │
│ │ conf = 0.7 × top1_score + 0.3 × avg(top3_scores) │
│ │ │
│ ├── conf ≥ 0.55 ──▶ Send chunks to Claude → answer │
│ │ │
│ └── conf < 0.55 ──▶ Fallback ticket created │
└─────────────────────────────────────────────────────────────────────┘
How Embeddings Work
An embedding model converts text into a list of numbers (a vector) that captures the meaning of the text. Texts with similar meaning produce numerically similar vectors.
Text Vector (384 numbers)
─────────────────────────────────────────────────────────────────────
"What documents for KYC?" → [0.21, -0.44, 0.87, 0.13, ...]
"KYC requires Aadhaar or PAN..." → [0.23, -0.41, 0.85, 0.15, ...]
↑ very similar numbers = same topic ✅
"Cards dispatched via post..." → [-0.31, 0.72, -0.12, -0.55, ...]
↑ very different numbers = different topic ❌
FAISS finds the KYC chunk as the closest match because its vector is numerically close to the query vector. This is pure math — no LLM involved at this step.
Why We Chunk Documents
Full document (150 lines) — too large
┌───────────────────────────────────────────────────────────────┐
│ Debit Card Policy │
│ Section 1: Issuance... │ ← about issuance
│ Section 2: Replacement... │ ← about replacement
│ Section 3: Blocking... │ ← about blocking
└───────────────────────────────────────────────────────────────┘
One vector for all topics → weak signal for any specific question
After chunking (512 chars each, 128 overlap)
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Chunk 1 │ │ Chunk 2 │ │ Chunk 3 │
│ (Issuance) │ │ (Replacement) │ │ (Blocking) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
→ high score for → high score for → high score for
issuance query replacement query blocking query
Why overlap of 128 characters? Sentences that span a chunk boundary are not lost. The last 128 characters of each chunk are repeated as the first 128 of the next, preserving context continuity.
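The overlap mechanics can be sketched as a sliding window that steps `size - overlap` characters at a time. This is a simplification — RecursiveCharacterTextSplitter additionally prefers to break at paragraph and sentence boundaries — but the overlap guarantee is the same:

```python
def chunk(text: str, size: int = 512, overlap: int = 128) -> list:
    """Fixed-size sliding window: each chunk repeats the previous
    chunk's last `overlap` characters, so boundary sentences survive."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```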
Confidence Scoring — The Formula
Step 1: Convert L2 distance to similarity score
similarity = 1 / (1 + L2_distance)
L2_distance = 0 → similarity = 1.0 (identical)
L2_distance = 1 → similarity = 0.5
L2_distance → ∞ → similarity → 0.0 (completely different)
Step 2: Compute weighted confidence
confidence = 0.7 × top1_similarity
+ 0.3 × average(top1, top2, top3 similarities)
Using only top1 can be misleading — if the best chunk scores 0.70 but chunks 2 and 3 score only 0.30, the question may be covered only partially. Blending in the top-3 average (30% weight) penalises that poor broader coverage.
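In code, the weighted confidence is a one-liner over the sorted score list (a sketch of the formula above, not the project's exact function):

```python
def rag_confidence(scores: list[float]) -> float:
    """scores: similarity scores sorted best-first (e.g. FAISS top-k)."""
    top3 = scores[:3]
    return 0.7 * scores[0] + 0.3 * (sum(top3) / len(top3))

rag_confidence([0.69, 0.64, 0.61, 0.48])  # ≈ 0.677 → answered (≥ 0.55)
rag_confidence([0.53, 0.47, 0.41, 0.38])  # ≈ 0.512 → fallback (< 0.55)
```

These two score lists are the ones from the high-confidence and fallback data flows later in this post, so you can verify the arithmetic directly.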
Threshold Calibration
Score distribution for this corpus (all-MiniLM-L6-v2):
0.40 0.50 0.55 0.60 0.70
│ │ │ │ │
──┼─────────────┼───────┼───────┼─────────────┼──
│ off-topic │ gap │ │ on-topic │
│ queries │ │ THRESHOLD │
↑
0.55
| Query | Confidence | Result |
|---|---|---|
| Debit card replacement procedures | 0.69 | Answered by RAG |
| Net banking password reset | 0.67 | Answered by RAG |
| KYC document requirements | 0.64 | Answered by RAG |
| SLA response time commitments | 0.61 | Answered by RAG |
| Forex card blocked internationally | 0.52 | Fallback ticket created |
| Completely off-topic question | 0.40 | Fallback ticket created |
The original threshold of 0.65 was too high — KYC queries scored 0.635 and were incorrectly sent to fallback. After empirical testing, 0.55 was the natural split for this corpus.
What Claude Actually Receives
Claude never reads the raw .txt files. It only sees the retrieved chunks:
System:
"Base your answer strictly on the provided context.
Do NOT use external knowledge or make assumptions.
If the context does not contain enough information, say exactly:
'I don't have enough information in our knowledge base to answer this.'"
User:
CONTEXT DOCUMENTS:
[Document 1 — Source: kyc guidelines]
To apply for KYC, customers must submit a valid government-issued
photo ID such as Aadhaar, PAN, or Passport. Address proof must be
less than 3 months old...
[Document 2 — Source: kyc guidelines]
Digital KYC is available via the SecureBank mobile app for
Aadhaar-linked accounts...
────────────────────────────────────────────────────────────
CUSTOMER QUESTION:
What documents do I need for KYC?
Answer using ONLY the context above.
This grounding constraint is what prevents hallucination. Claude cannot say "you need a utility bill" unless that is in the retrieved chunks.
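A sketch of how such a grounded prompt can be assembled (the function name and chunk format here are illustrative, not the actual rag_agent.py code):

```python
def build_grounded_prompt(chunks: list[tuple[str, str]], question: str) -> str:
    """chunks: (source_name, chunk_text) pairs retrieved from the vector index."""
    context = "\n".join(
        f"[Document {i} — Source: {source}]\n{text}"
        for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "CONTEXT DOCUMENTS:\n"
        f"{context}\n"
        f"{'-' * 60}\n"
        "CUSTOMER QUESTION:\n"
        f"{question}\n"
        "Answer using ONLY the context above."
    )

prompt = build_grounded_prompt(
    [("kyc guidelines", "Customers must submit Aadhaar, PAN, or Passport...")],
    "What documents do I need for KYC?",
)
```

Keeping the context assembly in ordinary Python means the grounding contract is enforced by code, not by hoping the LLM remembers an instruction buried in history.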
MCP — Deep Dive
Architecture: Three Layers
┌──────────────────────────────────────────────────────────────────────┐
│ AGENT LAYER (orchestrator.py · ticket_agent.py · rag_agent.py) │
│ │
│ Agents know: tool name + JSON input shape │
│ Agents do: httpx.post("http://localhost:8001/mcp/tool_name") │
│ Agents don't: know SQL, ORM models, or business rules │
└──────────────────────────────┬───────────────────────────────────────┘
│ HTTP / JSON (:8001)
┌──────────────────────────────▼───────────────────────────────────────┐
│ MCP SERVER LAYER (mcp/server.py) │
│ │
│ Receives HTTP requests, validates with Pydantic │
│ Calls tool functions in mcp/tools.py │
│ Returns typed JSON responses │
│ Handles HTTP errors (404 ticket not found, 403 ownership error) │
└──────────────────────────────┬───────────────────────────────────────┘
│ Python function calls
┌──────────────────────────────▼───────────────────────────────────────┐
│ TOOL LOGIC LAYER (mcp/tools.py · db/crud.py) │
│ │
│ All business rules live here: │
│ - Ownership checks before ticket updates │
│ - SLA breach calculation (OPEN > 3 days) │
│ - Email masking before returning customer profile │
│ - Ticket ID deduplication on creation │
│ - Auto-create anonymous customer for demo flows │
└──────────────────────────────┬───────────────────────────────────────┘
│ SQLAlchemy ORM
┌──────────────────────────────▼───────────────────────────────────────┐
│ DATABASE LAYER (SQLite) │
└──────────────────────────────────────────────────────────────────────┘
The 6 MCP Tools — Full Specification
Tool 1: generate_ticket_number
POST /mcp/generate_ticket_number
Input: none
Output: { "ticket_number": "TKT042" }
Logic: 3 random uppercase letters + 3 random digits
26³ × 10³ = 17,576,000 possible combinations
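A sketch of that generation logic as specified (the real tool may differ; sample outputs like "TKT042" are just one of the 17.5M possibilities):

```python
import random
import string

def generate_ticket_number() -> str:
    """3 random uppercase letters + 3 random digits, e.g. 'TKT042'."""
    letters = "".join(random.choices(string.ascii_uppercase, k=3))
    digits = "".join(random.choices(string.digits, k=3))
    return letters + digits

assert 26 ** 3 * 10 ** 3 == 17_576_000  # total possible combinations
```

Collisions are unlikely but possible, which is why create_support_ticket (Tool 2) regenerates on a duplicate ID.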
Tool 2: create_support_ticket
POST /mcp/create_support_ticket
Input: { "customer_id": "CUST001", "issue_text": "...", "ticket_id": "TKT042" }
Output: { "ticket_id": "TKT042", "status": "OPEN", "created_at": "..." }
Logic: 1. Ticket ID collision → regenerate
2. Unknown customer_id → auto-create customer record
3. Persist ticket with status OPEN
Tool 3: get_ticket_status
GET /mcp/get_ticket_status/{ticket_id}
Output: { "status": "IN_PROGRESS", "days_open": 3, "sla_breached": false, ... }
Logic: days_open = (now - created_at).days
sla_breached = status == OPEN AND days_open > 3
Returns 404 if ticket not found
Tool 4: update_ticket_status
POST /mcp/update_ticket_status
Input: { "ticket_id": "TKT042", "customer_id": "CUST001", "new_status": "CLOSED" }
Output: { "old_status": "OPEN", "new_status": "CLOSED", "updated_at": "..." }
Errors: 404 TICKET_NOT_FOUND | 403 TICKET_OWNERSHIP_ERROR
Tool 5: get_customer_profile
GET /mcp/get_customer_profile/{customer_id}
Output: { "customer_name": "Priya Sharma", "segment": "PREMIUM",
"email": "p***@gmail.com" } ← masked for PII
Logic: Never fails — returns "Valued Customer" / "RETAIL" if unknown
Tool 6: log_interaction
POST /mcp/log_interaction
Input: { customer_id, message, classification, route_taken,
response_text, tool_called, rag_confidence, ticket_id }
Output: { "log_id": 47, "status": "LOGGED" }
Design: Fire-and-forget — failure silently swallowed, never blocks user response
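The fire-and-forget contract can be sketched independently of any HTTP client; here `post` is a stand-in for any callable (e.g. httpx.post), not the project's actual helper:

```python
def log_interaction_safely(post, payload: dict) -> None:
    """Best-effort audit logging: any failure is swallowed so the
    customer-facing response is never delayed or blocked."""
    try:
        post("http://localhost:8001/mcp/log_interaction", json=payload, timeout=2.0)
    except Exception:
        pass  # deliberately ignored; logging must never break the reply
```

The trade-off is explicit: a lost log row is acceptable, a blocked customer response is not.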
End-to-End: How an Agent Calls MCP
Here is the full journey for "Please close ticket TKT042":
1. AGENT CODE (close_ticket_node in ticket_agent.py)
result = httpx.post(
"http://localhost:8001/mcp/update_ticket_status",
json={"ticket_id": "TKT042", "customer_id": "CUST001", "new_status": "CLOSED"},
timeout=10.0
)
2. MCP SERVER (mcp/server.py)
@mcp_app.post("/mcp/update_ticket_status")
def update_ticket_status(data: UpdateTicketStatusInput, db: Session):
try:
return _update_ticket_status(db, data)
except ValueError as e:
if "TICKET_NOT_FOUND" in str(e): raise HTTPException(404, ...)
if "TICKET_OWNERSHIP_ERROR" in str(e): raise HTTPException(403, ...)
3. TOOL LOGIC (mcp/tools.py)
ticket = get_ticket(db, "TKT042")
if ticket.customer_id != "CUST001":
raise ValueError("TICKET_OWNERSHIP_ERROR:TKT042")
updated = db_update_ticket_status(db, "TKT042", TicketStatus.CLOSED)
return UpdateTicketStatusOutput(old_status="OPEN", new_status="CLOSED", ...)
4. BACK IN THE AGENT
→ Claude call: "Confirm closure of TKT042 for Priya..."
← "Dear Priya, your ticket TKT042 has been successfully closed..."
5. USER SEES
"Dear Priya, your ticket TKT042 has been successfully closed.
Thank you for banking with SecureBank."
Database Design
Schema
CREATE TABLE customers (
customer_id VARCHAR(50) PRIMARY KEY,
customer_name VARCHAR(100) NOT NULL,
segment VARCHAR(20) DEFAULT 'RETAIL'
CHECK (segment IN ('RETAIL', 'PREMIUM', 'CORPORATE', 'STUDENT')),
email VARCHAR(150),
account_since DATETIME,
preferred_lang VARCHAR(10) DEFAULT 'en',
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE support_tickets (
ticket_id VARCHAR(6) PRIMARY KEY,
customer_id VARCHAR(50) NOT NULL
REFERENCES customers(customer_id) ON DELETE RESTRICT,
issue_text TEXT NOT NULL,
status VARCHAR(20) DEFAULT 'OPEN'
CHECK (status IN ('OPEN', 'IN_PROGRESS', 'RESOLVED', 'CLOSED')),
sla_breached BOOLEAN DEFAULT FALSE,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE interaction_logs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
customer_id VARCHAR(50) REFERENCES customers(customer_id),
message TEXT NOT NULL,
classification VARCHAR(30) CHECK (classification IN
('positive_feedback', 'negative_feedback', 'query', 'unknown')),
route_taken VARCHAR(300),
response_text TEXT,
tool_called VARCHAR(100),
rag_confidence FLOAT CHECK (rag_confidence IS NULL OR
(rag_confidence >= 0.0 AND rag_confidence <= 1.0)),
ticket_id VARCHAR(6) REFERENCES support_tickets(ticket_id),
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
SLA Breach Logic — Why It's Computed, Not Stored
sla_breached is never precomputed by a batch job and written back as a stored value. It's computed at query time:
# In mcp/tools.py — get_ticket_status
days_open = (datetime.now() - ticket.created_at).days
sla_breached = ticket.status == TicketStatus.OPEN and days_open > 3
If stored, it would be stale the moment a ticket ages past day 3. Computed at query time from created_at, it is always accurate with zero maintenance.
WAL Mode — Why Two Processes Need It
Standard SQLite: Writer locks entire file → readers BLOCKED during write
FastAPI + MCP Server sharing the file = intermittent lock errors
WAL Mode: Writer appends to .wal file
Readers see last committed snapshot
Reader and writer proceed simultaneously
→ No lock contention between the two services
PRAGMA journal_mode = WAL; ← set at connection time
PRAGMA busy_timeout = 5000; ← 5s before "database locked" error
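With the stdlib sqlite3 driver, the two pragmas look like this (SQLAlchemy users would issue the same pragmas from a connect event listener). Note that WAL requires a file-backed database, not :memory::

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "support.db")
conn = sqlite3.connect(db_path)

# journal_mode is persistent: once set, later connections also see WAL
mode = conn.execute("PRAGMA journal_mode = WAL;").fetchone()[0]
conn.execute("PRAGMA busy_timeout = 5000;")  # wait up to 5s on a locked db
print(mode)  # "wal"
```

Both the FastAPI process and the MCP server open the same file; with WAL plus the busy timeout, concurrent reads and a single write proceed without "database is locked" errors.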
Six Complete Data Flows
Path 1: Positive Feedback
Input: "Thanks for sorting out my account issue so quickly!"
classifier_node → "positive_feedback"
positive_feedback → MCP: get_customer_profile/CUST001
→ Claude: warm reply using "Priya Sharma"
log_node → MCP: log_interaction (async)
Latency: ~2s | Side effects: none (read-only)
Path 2: Negative Feedback + Auto-Ticket
Input: "My debit card replacement still hasn't arrived after 3 weeks!"
classifier_node → "negative_feedback"
negative_feedback → MCP: generate_ticket_number() → "TKT042"
→ MCP: create_support_ticket(...) → OPEN ticket
→ MCP: get_customer_profile/CUST001 → "Priya Sharma"
→ Claude: empathy reply with TKT042
log_node → MCP: log_interaction (async)
Latency: ~3s | Side effects: new row in support_tickets
Path 3: RAG Query — High Confidence
Input: "What documents do I need for KYC at SecureBank?"
classifier_node → "query"
query_router_node → { has_ticket_number: false, close_intent: false }
rag_node → FAISS search, scores: [0.69, 0.64, 0.61, 0.48]
confidence = 0.7×0.69 + 0.3×avg(0.69, 0.64, 0.61) = 0.677 ✅
→ Claude: grounded answer from kyc_guidelines chunks
log_node → MCP: log_interaction(rag_confidence=0.677)
Latency: ~3-4s | Side effects: none
Path 4: RAG Fallback — Low Confidence
Input: "Why was my forex card declined in Singapore?"
classifier_node → "query"
query_router_node → { has_ticket_number: false }
rag_node → FAISS scores: [0.53, 0.47, 0.41, 0.38]
confidence = 0.512 < 0.55 ❌ → fallback
fallback_ticket → MCP: generate_ticket_number() → "TKT043"
→ MCP: create_support_ticket(...) → OPEN ticket
→ Claude: apology + specialist escalation message
log_node → MCP: log_interaction(rag_confidence=0.512)
Latency: ~4s | Side effects: new fallback ticket for specialist
Path 5: Ticket Status Lookup
Input: "What is the status of ticket TKT042?"
classifier_node → "query"
query_router_node → { has_ticket_number: true, ticket_number: "TKT042", close_intent: false }
ticket_lookup_node → MCP: get_ticket_status/TKT042
← { status: "IN_PROGRESS", days_open: 2, sla_breached: false }
→ Claude: formats status reply
log_node → MCP: log_interaction
Latency: ~2s | Side effects: none (read-only)
Path 6: Ticket Closure (Ownership-Validated)
Input: "Please close my ticket TKT042"
classifier_node → "query"
query_router_node → { has_ticket_number: true, ticket_number: "TKT042", close_intent: true }
close_ticket_node → MCP: update_ticket_status(TKT042, CUST001, CLOSED)
Ownership check: ticket.customer_id == "CUST001" ✅
← { old_status: "OPEN", new_status: "CLOSED" }
→ Claude: confirmation reply
log_node → MCP: log_interaction
Failure: CUST002 tries to close CUST001's ticket → HTTP 403 → "You can only close your own tickets."
Latency: ~2s | Side effects: ticket status updated to CLOSED
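The routing decisions in Paths 3–6 above hinge on two deterministic checks. A sketch of that logic (the regex and intent keywords are assumptions about query_router_node, not its actual code):

```python
import re

TICKET_RE = re.compile(r"\b[A-Z]{3}\d{3}\b")   # e.g. TKT042
CLOSE_WORDS = ("close", "cancel", "resolve")   # assumed intent keywords

def route_query(message: str) -> dict:
    match = TICKET_RE.search(message.upper())
    return {
        "has_ticket_number": match is not None,
        "ticket_number": match.group(0) if match else None,
        "close_intent": any(w in message.lower() for w in CLOSE_WORDS),
    }

route_query("Please close my ticket TKT042")
# {'has_ticket_number': True, 'ticket_number': 'TKT042', 'close_intent': True}
```

Because these checks are plain string operations rather than LLM calls, this routing step is both free and fully deterministic, which is why routing errors always trace back to the classifier.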
The Dashboard
The React frontend has four pages:
Chat — A standard chat interface where you can switch between customer IDs and see the agent trace panel on every response. The RAG confidence bar is colour-coded: green if the system answered from the knowledge base, orange if it created a fallback ticket.
Tickets — A filterable list of all support tickets with status badges, days open, and SLA indicators.
Logs — A paginated table of every interaction: who sent what, how the system classified it, what route it took, what tool it called, and the final response.
Evaluation — Aggregated metrics over the last N days: classification breakdown, RAG answer rate vs fallback rate, average confidence score, ticket counts by status.
Evaluation: How Do You Know If It Works?
For the LLMOps component of the capstone, the system is evaluated across four dimensions:
1. Classification accuracy — Hold-out set of 30 labelled messages (10 per class). Target: ≥ 90% F1 on all three labels. Ambiguous messages default to query, which routes to the safest path.
2. RAG answer quality — 5 in-domain questions (one per policy document) + 5 deliberately out-of-scope questions. In-domain queries should score ≥ 0.55; out-of-domain should fall below the threshold.
3. Agent routing accuracy — 25 test messages (5 per route). Because routing is deterministic after classification, routing errors are always traceable to classification errors.
4. Response quality — For negative feedback: does the reply acknowledge the issue? Is there an apology? Is the ticket number included? Is the tone professional? All verifiable from the logs table.
Metrics Collected
Every interaction logged to interaction_logs enables:
Classification: total_interactions, count per label, % breakdown
RAG: rag_queries_total, rag_answered, rag_fallback,
avg_rag_confidence, rag_answer_rate
Tickets: tickets_created, tickets_by_status, sla_breached_count
Routing: route_distribution, tool_usage counts
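Given rows shaped like the interaction_logs schema, most of these aggregates fall out of a few lines (field names follow the schema above; splitting "answered" from "fallback" by the 0.55 threshold is an assumption, as the real system records the route taken):

```python
from collections import Counter

def summarise(logs: list[dict]) -> dict:
    by_label = Counter(row["classification"] for row in logs)
    rag_rows = [row for row in logs if row.get("rag_confidence") is not None]
    answered = sum(1 for row in rag_rows if row["rag_confidence"] >= 0.55)
    return {
        "total_interactions": len(logs),
        "by_label": dict(by_label),
        "rag_queries_total": len(rag_rows),
        "rag_answer_rate": answered / len(rag_rows) if rag_rows else None,
    }
```

The same query can of course run as SQL GROUP BYs against interaction_logs; the point is that every dashboard number is recomputable from the raw log rows.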
Logs and Debugging View
Per Interaction:
customer_id → who sent the message
classification → label assigned by classifier
route_taken → "classifier→query_router→rag(conf=0.68)"
tool_called → last MCP tool used
rag_confidence → float or null
ticket_id → created or referenced ticket
Debug use cases:
Classification wrong? → check 'classification', note which messages misclassify
RAG not answering? → filter by route containing 'rag', check rag_confidence
Ticket not created? → filter classification = 'negative_feedback', check tool_called
Routing error? → compare route_taken to expected path for message type
Non-Functional Properties
Latency Budget
| Path | Breakdown | Total |
|---|---|---|
| Positive feedback | 1×Claude + 1×MCP | ~1.5s |
| Negative feedback + ticket | 1×Claude + 3×MCP | ~2.5s |
| RAG query (high confidence) | 2×Claude + FAISS(5ms) | ~3–4s |
| RAG fallback + ticket | 2×Claude + FAISS + 2×MCP | ~4–5s |
| Ticket lookup | 1×Claude + 1×MCP | ~2s |
| Ticket closure | 1×Claude + 1×MCP | ~2s |
Bottleneck: Claude API (1–2s per call). FAISS: ~5ms. MCP: ~5–10ms.
Scalability Path
Step 1: SQLite → PostgreSQL — change DATABASE_URL in .env only
Step 2: FAISS → Pinecone — change rag/retriever.py only (LangChain interface)
Step 3: MCP Server → separate — change MCP_SERVER_URL in .env only
Step 4: FastAPI → multi-worker — uvicorn --workers 4 (graph compiled at import, stateless)
Security Properties
PII Protection: email masked in get_customer_profile (p***@gmail.com)
Ownership checks: ticket updates require customer_id to match ticket owner
HTTP 403 on mismatch
Input validation: Pydantic on all MCP inputs; SQLAlchemy parameterised queries
Missing (production): JWT auth, rate limiting per customer_id,
audit log immutability, encryption at rest
Tech Stack at a Glance
| Layer | Technology | Why |
|---|---|---|
| LLM | Claude (claude-sonnet-4-5) | Reliable JSON output, strong instruction following |
| Agent graph | LangGraph | Explicit routing, typed state, full auditability |
| Tool layer | MCP (FastAPI :8001) | Decoupled business logic, one place for rules |
| Vector search | FAISS + all-MiniLM-L6-v2 | Zero infra, ~5ms local search, swappable |
| API | FastAPI | Auto docs, Pydantic v2, async-ready |
| Database | SQLite (WAL mode) | Zero infra, two-process safe, Postgres-migratable |
| Frontend | React + Zustand + Tailwind | Minimal boilerplate, clean state management |
Extension Points
Adding a New Agent (Example: Escalation)
Step 1: Add label to classifier_agent.py
labels = ["positive_feedback", "negative_feedback", "query", "escalation"]
Step 2: Create escalation_agent.py
def escalation_node(state: AgentState) -> AgentState:
# MCP: notify supervisor system
# Claude: generate handoff message
return { ...state, "final_response": "...", "route_taken": "...→escalation" }
Step 3: Add to orchestrator.py
g.add_node("escalation_node", escalation_node)
g.add_edge("escalation_node", "log_node")
# update route_after_classification to include "escalation"
Step 4: Add MCP tool if needed
POST /mcp/notify_supervisor in mcp/server.py + mcp/tools.py
No other files change.
Adding a New RAG Document
Step 1: Add .txt file to backend/rag/documents/
Step 2: python -m rag.ingest (rebuilds the FAISS index)
Step 3: Restart backend
No code changes needed.
Upgrading to Streaming Responses
@router.post("/api/query/stream")
async def stream_query(request: QueryRequest):
async def generate():
async for token in claude.stream_tokens(state):
yield f"data: {token}\n\n"
return StreamingResponse(generate(), media_type="text/event-stream")
Upgrading to Native Claude Tool Use
response = client.messages.create(
model="claude-sonnet-4-5",
messages=[{"role": "user", "content": customer_message}],
tools=[
{"name": "create_support_ticket", "description": "...", "input_schema": {...}},
{"name": "get_ticket_status", "description": "...", "input_schema": {...}},
{"name": "update_ticket_status", "description": "...", "input_schema": {...}},
{"name": "search_knowledge_base", "description": "...", "input_schema": {...}},
]
)
# Claude decides which tool to call
# Tool execution calls the same MCP HTTP endpoints — MCP Server unchanged
The Bigger Lesson
The most important thing I learned from this project is that architecture matters more than model quality for production AI systems.
A single GPT-4 call with a perfect prompt would answer most of these customer messages correctly most of the time. But "most of the time" isn't acceptable in banking. When it fails — and it will fail — you need to know exactly where and why.
The multi-agent architecture with explicit graph routing, MCP tool abstraction, RAG confidence gating, and async audit logging isn't over-engineering. Each piece solves a specific, concrete failure mode:
- LangGraph solves the "I don't know what the system did" problem
- MCP solves the "business logic is scattered across agents" problem
- RAG + confidence gate solves the "LLM hallucinating bank policies" problem
- Async logging solves the "adding observability slows down responses" problem
Build the simplest thing that works. Then identify the failure modes. Then add exactly the architecture you need to prevent them. That's the process this project followed — and it's the process I'll carry into every AI system I build after this.
What I Would Do Differently
Streaming responses. The current system waits for the full Claude response before sending anything to the browser. For a 4-second response, that's a blank screen for 4 seconds. FastAPI supports SSE; LangGraph supports streaming nodes. This is the highest-impact UX improvement.
Native Claude tool use. Right now, routing decisions are made by explicit Python conditionals. The more scalable approach is to pass all MCP tools to Claude directly and let the model decide which tool to call. The MCP server stays completely unchanged — only the orchestration logic moves from Python to Claude's reasoning. For 6 tools, explicit routing is cleaner. At 50 tools, native tool use becomes necessary.
PostgreSQL for production. SQLite with WAL mode handles this demo well. At real banking scale — concurrent agents, millions of tickets, audit requirements — you'd migrate to PostgreSQL. The only change is DATABASE_URL in the environment config; SQLAlchemy handles the rest.
Authentication layer. Right now, any client can call POST /api/query with any customer_id. A production deployment needs JWT authentication and per-customer authorisation before any agent runs.
Try It Yourself
# 1. Set your Anthropic API key
cp .env.example .env
# Add ANTHROPIC_API_KEY to .env
# 2. Install and build
cd backend && pip install -r requirements.txt
python -m rag.ingest # Build the FAISS index from policy docs
python seed_data.py # Seed demo customers and tickets
# 3. Start three processes
uvicorn mcp.server:mcp_app --port 8001 # Tool layer
uvicorn main:app --port 8000 # API + agents
cd ../ui && npm install && npm run dev # React UI at localhost:5173
Then open localhost:5173, pick a customer ID, and try:
- "Thanks for sorting out my net banking issue!" — positive feedback path
- "My debit card hasn't arrived in 3 weeks" — auto-ticket path
- "What documents do I need for KYC?" — RAG path
- "What is the status of ticket TKT042?" — live lookup path
- "Why was my forex card blocked abroad?" — RAG confidence gate → fallback path
Each message shows a different agent path, different tools called, and different confidence scores in the trace panel.
Live Demo
The video below walks through AI Customer Support end-to-end — sending customer messages, watching the agent graph route in real time, seeing tickets created automatically, and observing the RAG confidence score on policy queries.
The full source code for AI Customer Support is available on GitHub. Refer to the repository for implementation details, setup instructions, and the complete codebase. github.com/nselvar/AICustomerSupport