InsightForge: AI-Powered Business Intelligence Assistant | Nisha Selvarajan

InsightForge

AI-Powered Business Intelligence Assistant — Comprehensive Technical Document & Blog Post

LangChain · RAG · LangGraph · OpenAI GPT-4 · FastAPI · pgvector · Next.js 13 · Nivo Charts

Building InsightForge: A Conversational BI Assistant with LangChain, RAG, and LLMs

From raw CSV data to natural-language insights — the architecture decisions, lessons learned, and why each choice matters.

01. Problem Statement

1.1 The Business Intelligence Gap

In today's data-centric business environment, organizations across industries accumulate vast amounts of operational and transactional data. However, the ability to transform this raw data into actionable insights remains a significant challenge — particularly for small to medium-sized enterprises (SMEs) that lack the budget, personnel, and infrastructure for advanced BI tools.

Traditional BI solutions require:

  • Dedicated data analysts with specialized skill sets
  • Complex BI platforms (Tableau, Power BI) with steep learning curves
  • Significant upfront investment in licensing and infrastructure
  • Time-intensive report generation cycles

This creates a compounding knowledge gap: the organizations that need data-driven decisions most urgently are precisely those least equipped to generate them.

1.2 Opportunity in AI Advancement

Recent breakthroughs in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems offer an unprecedented opportunity to democratize business intelligence:

  • Conversational data exploration — query data in plain English without SQL knowledge
  • Contextual insight generation — synthesize patterns across multiple data dimensions simultaneously
  • Grounded responses — RAG ensures answers are tied to actual business data, not hallucinated
  • Automated visualization — auto-select and render appropriate charts for each query

1.3 The InsightForge Solution

InsightForge addresses these challenges by delivering an automated, conversational AI Business Intelligence Assistant that enables any business user — regardless of technical background — to extract meaningful insights from their data through natural language interaction.

02. Project Overview & Objectives

2.1 Project Summary

| Attribute | Detail |
|---|---|
| Project Name | InsightForge |
| Type | AI-Powered Business Intelligence Assistant |
| Primary Technologies | LangChain, RAG, LLMs (OpenAI GPT-4), FastAPI, Next.js |
| Target Users | Business analysts, SME owners, operations teams |
| Data Domain | Sales, product, regional, and customer analytics |

2.2 Core Objectives

| Objective | Description |
|---|---|
| Analyze Business Data | Identify key trends and patterns across time, products, regions, and customer demographics |
| Generate Insights & Recommendations | Use NLP to produce actionable business insights with structured recommendations |
| Visualize Data Insights | Present findings through interactive charts and dashboards for intuitive interpretation |

2.3 Project Phases

Part 1: AI-Powered Business Intelligence Assistant
Steps 1–6: Data preparation → Knowledge base → LLM development → Chain prompts → RAG setup → Memory integration

Part 2: LLMOps & User Interface
Steps 7+: Model evaluation (QAEvalChain) → Data visualization → Next.js UI creation


The Problem: Data Rich, Insight Poor

Every business generates data. Sales records, customer demographics, regional performance, product metrics — the numbers accumulate in spreadsheets and databases month after month. Yet for most small and medium-sized businesses, that data just sits there.

Extracting meaningful insights has traditionally required:

  • A dedicated analyst (expensive and slow)
  • A BI platform like Tableau or Power BI (steep learning curve)
  • Custom reporting scripts (technical barrier, ongoing maintenance)

What if a business owner could simply ask their data — "Which product is performing best in the South region?" — and get a clear, accurate, actionable answer in seconds?

That's exactly what InsightForge was built to solve.

What Is InsightForge?

InsightForge is a conversational Business Intelligence Assistant that combines:

  • LangChain — to orchestrate LLM workflows and retrieval chains
  • Retrieval-Augmented Generation (RAG) — to ground every response in real business data
  • OpenAI GPT-4 / GPT-3.5 — for natural language understanding and generation

You upload your sales CSV, ask questions in plain English, and receive structured insights with supporting statistics and visualizations — no SQL, no code required.

How It Works: The Architecture

The system is deliberately layered. Every query passes through a smart router before touching the LLM, ensuring the cheapest appropriate tool handles each request:

User (Browser)
      │
      ▼
Next.js Frontend  (React + Tailwind + Nivo Charts)
      │
      ▼
FastAPI Backend
      │
      ├── Query Router ──► chart keywords?    → ChartEngine
      │                ──► analytics keywords? → DuckDB SQL
      │                ──► open-ended?         → RAG + LLM
      │
      ├── RAG Pipeline  (embed → pgvector → GPT-3.5)
      ├── LangGraph     (4-node GPT-4 analysis workflow)
      ├── BusinessAnalyzer (Pandas + scikit-learn + SciPy)
      └── Dual-layer Memory (PostgreSQL short-term + summaries)

This division of labor is the core architectural insight: LLMs are synthesizers and communicators, not databases or calculators. Let DuckDB handle SQL. Let scikit-learn cluster. Let the LLM understand intent and communicate insights.

Building the Knowledge Base

The first design challenge: how do you make a CSV retrievable by an LLM?

Dumping raw rows into a prompt doesn't scale. Instead, InsightForge uses semantic chunking by business domain — when a file is uploaded, the RAG Manager segments the data into four meaning-complete summary types before embedding and storing them in pgvector:

| Chunk Type | Represents |
|---|---|
| Category | Product × Region cross-tabulation statistics |
| Regional | Total revenue and satisfaction per region |
| Time-based | Monthly and weekly aggregated trends |
| Product | Per-product performance and market share |

Each chunk carries business intent, not just text. A query about "East region performance" retrieves the chunk built specifically for that context — not a random fragment. This precision is what makes the RAG system reliable rather than noisy.
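As an illustration, a "Regional" chunk can be assembled with a plain pandas groupby before embedding. This is a hedged sketch — the function name `build_regional_chunks` and the exact chunk wording are illustrative, not the RAG Manager's actual code:

```python
import pandas as pd

def build_regional_chunks(df: pd.DataFrame) -> list:
    """One meaning-complete summary string per region (the 'Regional'
    chunk type), ready to be embedded and stored in pgvector."""
    chunks = []
    for region, grp in df.groupby("Region"):
        chunks.append(
            f"{region} region: ${grp['Sales'].sum():,.0f} total, "
            f"{grp['Customer_Satisfaction'].mean():.1f}/5 satisfaction"
        )
    return chunks

# Toy frame using the sample dataset's columns:
df = pd.DataFrame({
    "Region": ["North", "North", "South"],
    "Sales": [100.0, 200.0, 50.0],
    "Customer_Satisfaction": [4.0, 3.0, 5.0],
})
print(build_regional_chunks(df))
# → ['North region: $300 total, 3.5/5 satisfaction',
#    'South region: $50 total, 5.0/5 satisfaction']
```

Because each string covers one region completely, the embedding for that chunk sits close to any query about that region.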

The Smart Query Router

One of the highest-ROI decisions was building a three-way router that intercepts queries before any LLM call:

  • "Show me a sales chart" → ChartEngine returns a spec in <5ms (zero LLM cost)
  • "What is total sales by region?" → DuckDB SQL executes in <10ms (zero LLM cost)
  • "Why might the South be underperforming?" → Full RAG + LLM (nuanced reasoning needed)

This alone reduces API costs and latency dramatically on the majority of queries, which tend to be structured aggregations rather than open-ended reasoning.

Memory: Making Conversations Coherent

A one-shot Q&A tool isn't a real BI assistant. Real analysis is iterative — you start broad, then drill down, compare, and follow up. InsightForge maintains two memory layers to support this:

Short-term memory persists every message in PostgreSQL and injects the last N exchanges as chat_history into each new prompt. This enables natural pronoun resolution:

User: "What are sales in the North region?"
AI:   "North had $45,230 in total sales..."
User: "How does that compare to South?"
AI:   [Resolves "that" correctly] "Compared to North's $45,230, South had $38,100..."

Long-term memory prevents context overflow in extended sessions. After each exchange, a compressed LLM summary is written to session_summaries, preserving key insights without replaying the full message history.

This is the same principle behind ChatGPT's persistent memory: compress what matters, discard what doesn't.

Prompt Engineering That Works

Getting reliable business insights out of an LLM requires disciplined prompt design. The core analysis template enforces a consistent, actionable structure:

You are a business intelligence expert analyzing data metrics.

Available Metrics: {context}
User Question:     {question}
Previous Conversation: {chat_history}

Respond with:
1. Key Findings:          2–3 main insights
2. Detailed Analysis:     In-depth explanation
3. Business Implications: Strategic meaning
4. Recommendations:       Specific, actionable steps

  • Explicit expert role — anchors domain tone and prevents vague responses
  • Mandatory output structure — forces organized findings over freeform prose
  • Injected context — retrieved chunks ensure every answer is grounded in actual data

03. System Architecture

3.1 High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        USER LAYER                               │
│              Next.js Frontend (Port 3000)                       │
│   ┌──────────┐  ┌───────────┐  ┌──────────┐  ┌─────────────┐  │
│   │  Chat UI  │  │ Analytics │  │ History  │  │ Evaluation  │  │
│   └──────────┘  └───────────┘  └──────────┘  └─────────────┘  │
└────────────────────────────┬────────────────────────────────────┘
                             │ HTTP/REST
┌────────────────────────────▼────────────────────────────────────┐
│                     API GATEWAY LAYER                           │
│              FastAPI Backend (Port 9100)                        │
│  /query  /upload  /session  /chart  /evaluate                   │
│  /analytics  /dashboard  /feedback  /history                    │
└──────────────┬──────────────────┬───────────────────┬───────────┘
               │                  │                   │
    ┌──────────▼──────┐  ┌────────▼────────┐  ┌──────▼──────────┐
    │   LLM LAYER     │  │ ANALYTICS LAYER │  │  STORAGE LAYER  │
    │                 │  │                 │  │                 │
    │ LangChain       │  │ DuckDB (SQL)    │  │ PostgreSQL      │
    │ LangGraph       │  │ Pandas          │  │ + pgvector      │
    │ OpenAI GPT-4    │  │ scikit-learn    │  │                 │
    │ GPT-3.5-turbo   │  │ SciPy           │  │ Sessions,Chunks │
    │ text-ada-002    │  │ BusinessAnalyzer│  │ Metrics, Sales  │
    └─────────────────┘  └─────────────────┘  └─────────────────┘

3.2 Component Architecture

BusinessAnalytics/
├── backend/
│   ├── main.py                    ← App entry point + router registration
│   ├── models.py                  ← SQLAlchemy ORM (10 tables)
│   ├── db.py                      ← Database session management
│   ├── llm/
│   │   ├── analysis_graph.py      ← LangGraph 4-node analysis workflow
│   │   ├── business_retriever.py  ← Custom RAG retriever (cosine similarity)
│   │   ├── rag_manager.py         ← RAG pipeline: embed → retrieve → generate
│   │   └── business_prompts.py    ← Prompt templates & manager
│   ├── analysis/
│   │   └── business_analyzer.py   ← Time-series, products, regions, K-means
│   ├── visualization/
│   │   └── chart_engine.py        ← 6 chart types
│   ├── chains/
│   │   ├── analysis_chain.py
│   │   ├── business_analysis_chain.py  ← Memory-integrated chain
│   │   └── visualization_chain.py
│   ├── evaluation/
│   │   ├── qa_eval.py             ← QAEvalChain (4-dimension scoring)
│   │   └── model_monitor.py       ← Feedback collection & monitoring
│   ├── utils/
│   │   ├── embeddings.py, data_processor.py, rag_prompts.py
│   └── routes/
│       ├── rag.py, analytics.py, session.py, feedback.py
│       └── chart.py, upload.py, dashboard.py, evaluate.py
├── frontend/
│   ├── pages/                     ← 6 application pages
│   ├── components/                ← Reusable React components
│   └── styles/globals.css
├── sample_sales_data.csv          ← 2,501-record sample dataset
├── requirements.txt
└── run_all.sh                     ← Startup orchestration script

3.3 Request Routing Flow

Every incoming query is classified before any LLM call is made, routing it to the cheapest appropriate handler:

POST /api/query
    │
    ├─→ classify_query()
    │        ├── chart keywords?      → ChartEngine.create_visualization()
    │        ├── analytics keywords?  → natural_to_sql() → DuckDB execute
    │        └── default              → RAG Pipeline
    │                                        │
    │                                   get_embedding() [OpenAI]
    │                                   pgvector similarity search
    │                                   cosine_similarity re-ranking
    │                                   Top-3 chunks → LLM (GPT-3.5)
    │                                   Session memory update
    ▼
Structured JSON Response → Frontend

04. Technology Stack

4.1 Backend

| Category | Technology | Version | Purpose |
|---|---|---|---|
| Web Framework | FastAPI | Latest | Async REST API with Pydantic validation |
| LLM Orchestration | LangChain | Latest | Chains, retrievers, prompt templates |
| LLM Workflow | LangGraph | Latest | 4-node stateful analysis graph |
| LLM — Deep Analysis | OpenAI GPT-4 | Latest | Complex analysis (temperature=0.2) |
| LLM — Fast Responses | GPT-3.5-turbo | Latest | RAG synthesis + evaluation |
| Embeddings | text-embedding-ada-002 | Latest | Query & document vectorization (1536-dim) |
| Vector Search | pgvector | Latest | Semantic similarity search |
| Analytics DB | DuckDB | Latest | In-memory OLAP SQL queries |
| Data Processing | Pandas | Latest | DataFrame operations & aggregations |
| Machine Learning | scikit-learn | Latest | K-means clustering, StandardScaler |
| Statistics | SciPy | Latest | Skew, kurtosis, statistical moments |
| ORM | SQLAlchemy | Latest | Database session management |
| Migrations | Alembic | Latest | Schema version control |
| Server | Uvicorn | Latest | ASGI server for FastAPI |

4.2 Frontend

| Category | Technology | Version | Purpose |
|---|---|---|---|
| Framework | Next.js | 13.4.7 | SSR/SSG React framework |
| UI Library | React | 18.2.0 | Component-based UI |
| Styling | Tailwind CSS | 3.3.2 | Utility-first CSS framework |
| Charts | Nivo | 0.99.0 | D3-based data visualization |
| Icons | lucide-react | 0.244.0 | SVG icon system |
| Animations | framer-motion | 10.12.16 | Smooth transitions |

4.3 Storage & Infrastructure

| Component | Technology | Port | Purpose |
|---|---|---|---|
| Primary Database | PostgreSQL + pgvector | 5432 | Sessions, messages, embeddings, metrics |
| Analytics Engine | DuckDB | n/a | In-memory OLAP over uploaded CSVs |
| Backend API | Uvicorn (FastAPI) | 9100 | REST API + Swagger UI at /docs |
| Frontend | Next.js Dev Server | 3000 | React application |

05. Data Architecture & Schema

5.1 Database Schema (PostgreSQL — 10 Tables)

┌──────────────────┐     ┌─────────────────────┐
│      users       │     │    chat_sessions     │
├──────────────────┤     ├─────────────────────┤
│ id (PK)          │◄────│ user_id (FK)         │
│ email            │     │ id (PK)             │
│ created_at       │     │ title               │
└──────────────────┘     │ created_at          │
                         │ updated_at          │
                         └──────────┬──────────┘
                                    │
            ┌───────────────────────┼──────────────────────┐
            ▼                       ▼                      ▼
┌───────────────────┐  ┌──────────────────────┐  ┌─────────────────┐
│     messages      │  │  session_summaries   │  │    feedback     │
├───────────────────┤  ├──────────────────────┤  ├─────────────────┤
│ id (PK)           │  │ session_id (FK, PK)  │  │ id (PK)         │
│ session_id (FK)   │  │ summary (TEXT)       │  │ session_id (FK) │
│ role              │  │ updated_at           │  │ message_id (FK) │
│ content (TEXT)    │  └──────────────────────┘  │ rating (1–5)    │
│ created_at        │                            │ feedback_text   │
└───────────────────┘                            └─────────────────┘

┌──────────────────────┐    ┌─────────────────────┐
│  business_documents  │    │   document_chunks   │
├──────────────────────┤    ├─────────────────────┤
│ id (PK)              │◄───│ document_id (FK)    │
│ source_type          │    │ chunk_text (TEXT)   │
│ title / raw_content  │    │ chunk_index         │
│ metadata_json        │    │ embedding (vector)  │
└──────────────────────┘    └─────────────────────┘

┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│ business_metrics │   │  sales_records   │   │    products      │
├──────────────────┤   ├──────────────────┤   ├──────────────────┤
│ metric_name      │   │ order_date       │   │ product_id (PK)  │
│ definition       │   │ region           │   │ product_name     │
│ formula / owner  │   │ product_id (FK)  │   │ category         │
└──────────────────┘   │ revenue, qty     │   └──────────────────┘
                       └──────────────────┘   ┌──────────────────┐
                                               │   customers      │
                                               ├──────────────────┤
                                               │ customer_id (PK) │
                                               │ segment          │
                                               │ geography        │
                                               │ demographic_group│
                                               └──────────────────┘

5.2 Sample Dataset

| Column | Type | Values / Range |
|---|---|---|
| Date | DATE | YYYY-MM-DD (from 2022-01-01) |
| Product | STRING | Widget A, B, C, D |
| Region | STRING | North, South, East, West |
| Sales | FLOAT | Transaction amount |
| Customer_Age | INT | 18–80 |
| Customer_Gender | STRING | Male, Female |
| Customer_Satisfaction | FLOAT | 1.0–5.0 |

2,501 records · 4 products × 4 regions × 3 demographic groups × time series = dozens of analytical dimensions from a simple 7-column file.

5.3 Column Alias Normalization

_COLUMN_ALIASES = {
    "date": "Date",  "product": "Product",  "region": "Region",
    "sales": "Sales",  "revenue": "Sales",  "units": "Sales",
    "customer_age": "Customer_Age",  "age": "Customer_Age",
    "customer_gender": "Customer_Gender",  "gender": "Customer_Gender",
    "customer_satisfaction": "Customer_Satisfaction",
    "satisfaction": "Customer_Satisfaction"
}
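A minimal sketch of how this alias map might be applied at upload time; the helper name `normalize_columns` is an assumption, not the actual implementation:

```python
_COLUMN_ALIASES = {
    "date": "Date", "product": "Product", "region": "Region",
    "sales": "Sales", "revenue": "Sales", "units": "Sales",
    "customer_age": "Customer_Age", "age": "Customer_Age",
    "customer_gender": "Customer_Gender", "gender": "Customer_Gender",
    "customer_satisfaction": "Customer_Satisfaction",
    "satisfaction": "Customer_Satisfaction",
}

def normalize_columns(columns):
    """Map uploaded header names onto the canonical schema: lower-case
    and strip each header, then look it up in the alias table.
    Unknown columns pass through unchanged."""
    return [_COLUMN_ALIASES.get(c.strip().lower(), c) for c in columns]

print(normalize_columns(["REVENUE", " Age ", "Store"]))
# → ['Sales', 'Customer_Age', 'Store']
```

This lets the analytics layer accept CSVs whose headers say "revenue" or "units" without any schema configuration from the user.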

06. Core Components

6.1 RAG Manager (backend/llm/rag_manager.py)

| Method | Description |
|---|---|
| process_business_data() | Splits data into 4 semantic chunk types and stores embeddings in pgvector |
| get_relevant_context() | Cosine similarity search against stored embeddings; returns top-k chunks |
| generate_response() | Assembles context + session history into an LLM prompt and returns the answer |
| extract_insights() | Parses structured insight fields from raw LLM response text |

Embedding model: text-embedding-ada-002 (1,536-dim vectors). Token usage tracked via get_openai_callback().

6.2 Custom Business Retriever (backend/llm/business_retriever.py)

class BusinessRetriever(BaseRetriever):

    def _get_time_based_metrics(self, df) -> dict:
        # 7-day / 30-day moving averages
        # Day-of-week seasonality patterns
        # Month-over-month growth rates
        # Peak day identification

    def _get_product_metrics(self, df) -> dict:
        # Product × Region cross-tabulation
        # Age-based segmentation analysis
        # Satisfaction–sales correlation

    def _get_customer_metrics(self, df) -> dict:
        # Age segment profiling (young/middle/senior)
        # Satisfaction tiers (low/medium/high)
        # Purchase behavior patterns

    def get_relevant_documents(self, query: str) -> List[Document]:
        # 1. Embed query with OpenAI
        # 2. Cosine similarity vs. all stored chunk embeddings
        # 3. Return top-2 most relevant metric documents

6.3 LangGraph Analysis Workflow (backend/llm/analysis_graph.py)

              ┌──────────────────┐
              │  route_by_type() │  ← Classifies query by keywords
              └────────┬─────────┘
       ┌───────────────┼───────────────┬──────────────┐
       ▼               ▼               ▼              ▼
  ┌─────────┐   ┌───────────┐   ┌──────────┐  ┌──────────┐
  │  Trend  │   │Comparative│   │Predictive│  │ General  │
  │Analysis │   │ Analysis  │   │ Analysis │  │ Analysis │
  └────┬────┘   └─────┬─────┘   └────┬─────┘  └────┬─────┘
       └───────────────┴──────────────┴──────────────┘
                              │
                     ┌────────▼────────┐
                     │  GPT-4 Synthesis │
                     │  (temp = 0.2)   │
                     └─────────────────┘

| Analysis Type | Routing Keywords |
|---|---|
| Trend | "trend", "over time", "growth", "decline" |
| Comparative | "compare", "vs", "difference", "better", "worse" |
| Predictive | "predict", "forecast", "future", "expect" |
| General | (default fallback) |
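The routing node above reduces to a plain keyword matcher. The sketch below is an illustrative stand-in for `route_by_type()` using the keyword table verbatim; plain substring matching is a deliberate simplification (production code might tokenize first):

```python
ROUTING_KEYWORDS = {
    "trend":       ["trend", "over time", "growth", "decline"],
    "comparative": ["compare", "vs", "difference", "better", "worse"],
    "predictive":  ["predict", "forecast", "future", "expect"],
}

def route_by_type(query: str) -> str:
    """Classify a query into an analysis branch by keyword match.
    The first matching keyword family wins; anything unmatched
    falls through to the 'general' branch."""
    q = query.lower()
    for analysis_type, keywords in ROUTING_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            return analysis_type
    return "general"

print(route_by_type("Forecast next quarter's sales"))  # → predictive
print(route_by_type("Why is satisfaction low?"))       # → general
```

All four branches converge on the same GPT-4 synthesis node, so the router only selects which intermediate analysis is computed first.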

6.4 Business Analyzer (backend/analysis/business_analyzer.py)

| Method | What It Computes |
|---|---|
| analyze_time_series() | Daily aggregations, 7-day rolling avg, MoM growth, trend direction via np.polyfit |
| analyze_products() | Revenue per product, market share (%), top/bottom performers, satisfaction scores |
| analyze_regions() | Revenue by region, satisfaction per region, product mix, performance index |
| analyze_customer_segments() | K-means (n=3) on age + revenue after StandardScaler normalization |
| calculate_statistics() | Mean, median, std, skewness, kurtosis, Q1, Q3, IQR for each numeric column |
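The "trend direction via np.polyfit" technique reduces to reading the sign of a degree-1 fit's slope. A minimal sketch (the function name is hypothetical):

```python
import numpy as np
import pandas as pd

def trend_direction(daily_sales: pd.Series) -> str:
    """Fit sales against a 0..N-1 day index with a degree-1 polynomial
    and classify the overall trend by the sign of the slope."""
    x = np.arange(len(daily_sales))
    slope = np.polyfit(x, daily_sales.to_numpy(dtype=float), deg=1)[0]
    if slope > 0:
        return "increasing"
    if slope < 0:
        return "decreasing"
    return "flat"

print(trend_direction(pd.Series([100, 110, 125, 140])))  # → increasing
```

`np.polyfit` returns coefficients highest-degree first, so index 0 is the slope; a least-squares line is robust enough here because the question is only directional, not predictive.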

07. RAG System Design

7.1 End-to-End RAG Pipeline

━━━━ PHASE 1: INDEXING (at upload time) ━━━━━━━━━━━━━━━━━━━━━

CSV Upload → POST /api/upload
    │
    ▼
RAGManager.process_business_data()
    ├── Category Summary    (Product × Region cross-tabs)
    ├── Regional Summary    (revenue + satisfaction per region)
    ├── Time-based Summary  (monthly/weekly aggregates)
    └── Product Summary     (per-product performance metrics)
    │
    ▼
Text serialization → OpenAI text-embedding-ada-002
    │
    ▼
1536-dim vectors → INSERT INTO document_chunks (pgvector)

━━━━ PHASE 2: RETRIEVAL (at query time) ━━━━━━━━━━━━━━━━━━━━━

User Query → Embed with text-embedding-ada-002
    │
    ▼
pgvector approximate search → cosine_similarity() re-ranking
    │
    ▼
Top-3 most relevant chunks returned as context

━━━━ PHASE 3: GENERATION ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Context chunks + session history + user query
    │
    ▼
BusinessPromptManager assembles prompt → GPT-3.5-turbo
    │
    ▼
Structured response with insights
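The Phase 2 search can be expressed with pgvector's cosine-distance operator `<=>`. The helper below only assembles the SQL; the helper name, parameter style, and exact query text are assumptions for illustration, though the operator itself is pgvector's:

```python
def pgvector_topk_sql(k: int = 3) -> str:
    """Assemble the top-k retrieval query. `<=>` is pgvector's
    cosine-distance operator, so ordering ascending returns the
    nearest chunks first; the query embedding is bound as a
    parameter by the caller (e.g. via psycopg2)."""
    return (
        "SELECT chunk_text, embedding <=> %(query_emb)s AS distance "
        "FROM document_chunks "
        "ORDER BY distance "
        f"LIMIT {k}"
    )

print(pgvector_topk_sql())
```

Because cosine distance is 1 minus cosine similarity, the database-side ANN search and the Python-side `cosine_similarity()` re-ranking shown in 7.3 agree on ordering.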

7.2 Semantic Chunk Strategy

| Chunk Type | Content | Example |
|---|---|---|
| Category | Product × Region stats | "Widget A in North: 150 sales, avg satisfaction 3.2" |
| Regional | Revenue + satisfaction | "South region: $45,230 total, 3.4/5 satisfaction" |
| Time-based | Monthly trends | "January 2022: 210 transactions, 12% MoM growth" |
| Product | Per-product metrics | "Widget B: 28% market share, top performer in East" |

7.3 Cosine Similarity

import numpy as np

def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Applied during retrieval:
similarities = [cosine_similarity(query_emb, chunk.embedding) for chunk in all_chunks]
top_k = sorted(zip(similarities, all_chunks), key=lambda pair: pair[0], reverse=True)[:3]

Range: −1 to +1. Scores above ~0.75 are considered highly relevant for this business domain.

08. Memory Integration

InsightForge uses a dual-layer memory architecture so multi-turn conversations remain coherent across both short and long spans.

8.1 Short-Term Memory (Conversation Buffer)

Every user query and assistant response is persisted as a Message row linked to the active ChatSession. The last N messages are injected as chat_history into every new LLM prompt.

# Stored on every exchange:
Message(session_id=sid, role="user",      content=query)
Message(session_id=sid, role="assistant", content=answer)

8.2 Long-Term Memory (Session Summarization)

After each exchange, a compressed LLM summary of the session is stored in session_summaries, preventing token-window overflow while preserving key analytical context.

existing_summary = await get_session_summary(session_id)
new_summary      = await llm.compress(existing_summary + new_messages)
await update_session_summary(session_id, new_summary)

8.3 Memory Architecture

┌──────────────────────────────────────────────────────────┐
│                    MEMORY LAYERS                         │
│                                                          │
│  Short-Term (messages table)   Long-Term (summaries)     │
│  ┌──────────────────────┐      ┌─────────────────────┐   │
│  │ [msg 1] user         │      │ Compressed session  │   │
│  │ [msg 2] assistant    │─LLM─▶│ context snapshot    │   │
│  │ [msg N] assistant    │compress                    │   │
│  └──────────┬───────────┘      └──────────┬──────────┘   │
│             └─────────────────────────────┘              │
│                               ▼                          │
│                    ┌─────────────────────┐               │
│                    │  Combined Context   │               │
│                    │  injected into LLM  │               │
│                    └─────────────────────┘               │
└──────────────────────────────────────────────────────────┘

8.4 Memory in Practice

| User Action | Memory Role |
|---|---|
| "Which product did we discuss earlier?" | Short-term buffer recalls previous turn |
| Region → product → trend drill-down | Each step builds on accumulated context |
| Resuming a session later | Long-term summary preserves key findings |
| Long analysis spanning many queries | Summarization prevents prompt overflow |

09. LLM Orchestration & Prompt Engineering

9.1 Core Analysis Prompt Template

You are a business intelligence expert analyzing data metrics.

Available Metrics:
{context}

User Question: {question}

Previous Conversation:
{chat_history}

Provide a detailed analysis following these guidelines:
1. Focus on the specific metrics relevant to the question
2. Identify key trends, patterns, or anomalies
3. Provide actionable insights based on the data
4. Highlight statistical significance where applicable
5. Suggest potential business implications

Response Format:
1. Key Findings:          [2-3 main insights]
2. Detailed Analysis:     [In-depth explanation]
3. Business Implications: [Strategic meaning]
4. Recommendations:       [Specific, actionable steps]

  • Explicit role — anchors the LLM's tone and domain expertise
  • Structured output format — forces organized findings rather than freeform prose
  • Context injection — retrieved chunks ground every answer in real data
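Assembled in plain Python, the template looks like this — a stdlib stand-in for LangChain's `PromptTemplate` with the same three input variables; the constant and function names are illustrative:

```python
ANALYSIS_TEMPLATE = (
    "You are a business intelligence expert analyzing data metrics.\n\n"
    "Available Metrics:\n{context}\n\n"
    "User Question: {question}\n\n"
    "Previous Conversation:\n{chat_history}\n\n"
    "Response Format:\n"
    "1. Key Findings\n2. Detailed Analysis\n"
    "3. Business Implications\n4. Recommendations\n"
)

def build_analysis_prompt(context: str, question: str, chat_history: str = "") -> str:
    """Fill the template's three slots before the LLM call: retrieved
    chunks, the user's question, and the memory buffer."""
    return ANALYSIS_TEMPLATE.format(
        context=context, question=question, chat_history=chat_history
    )

prompt = build_analysis_prompt(
    context="North region: $45,230 total sales",
    question="Which region leads?",
)
print(prompt)
```

Keeping the template a single constant makes it easy to diff prompt revisions, which matters once evaluation scores start pointing at specific output sections.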

9.2 System Prompts by Context

| Context | System Prompt |
|---|---|
| RAG Query | "You are a business analytics assistant. Provide clear, data-driven responses." |
| Analysis Graph | "You are a business intelligence analyst. Use tools to analyze data and provide insights." |
| Evaluation | "You are an expert evaluator for a business intelligence AI assistant." |

9.3 Chain Architecture

| Chain | File | Memory |
|---|---|---|
| General analysis | analysis_chain.py | No |
| Business analysis with history | business_analysis_chain.py | Yes — ConversationBufferMemory |
| Chart type recommendation | visualization_chain.py | No |

9.4 Model Selection

| Task | Model | Rationale |
|---|---|---|
| Deep analysis (LangGraph) | GPT-4, temp=0.2 | Highest accuracy; deterministic outputs |
| RAG synthesis | GPT-3.5-turbo | Speed + cost balance for frequent queries |
| Evaluation scoring | GPT-3.5-turbo | Reliable structured output parsing |
| All embeddings | text-embedding-ada-002 | Standard, high-quality 1536-dim vectors |

10. Analytics Engine

10.1 Query Classification

from typing import Literal

def classify_query(query: str) -> Literal["analytics", "chart", "rag"]:
    analytics_kw = ["total", "average", "sum", "trend", "compare", "top",
                    "by region", "by product", "sales", "revenue",
                    "how many", "count", "highest", "lowest"]
    chart_kw     = ["chart", "plot", "graph", "visualize", "show me"]

    q = query.lower()
    if any(kw in q for kw in chart_kw):     return "chart"
    if any(kw in q for kw in analytics_kw): return "analytics"
    return "rag"

| Query Type | Handler | Typical Latency |
|---|---|---|
| Chart request | ChartEngine spec | <5ms |
| Aggregation / comparison | DuckDB SQL | <10ms |
| Open-ended / reasoning | RAG + LLM | 2–5s |

10.2 DuckDB SQL Analytics

# "trend over time"  → GROUP BY Date ORDER BY Date
# "by product"       → GROUP BY Product
# "satisfaction"     → AVG(Customer_Satisfaction) GROUP BY Region
# "top N"            → ORDER BY Sales DESC LIMIT N

10.3 Statistical Analysis Suite

| Measure | Method | Business Use |
|---|---|---|
| Mean / Median | df.mean() / df.median() | Baseline benchmarks |
| Std Deviation | df.std() | Volatility / consistency |
| Skewness | scipy.stats.skew() | Detect revenue outlier bias |
| Kurtosis | scipy.stats.kurtosis() | Detect extreme value concentration |
| Q1 / Q3 / IQR | df.quantile() | Outlier detection range |
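Taken together on a toy series (values invented for illustration), these measures flag a skewed distribution and give an IQR outlier fence:

```python
import numpy as np
from scipy import stats

sales = np.array([100.0, 105.0, 98.0, 110.0, 500.0])  # one outlier value

skewness = stats.skew(sales)       # > 0: large values pull the right tail
kurt = stats.kurtosis(sales)       # Fisher definition: 0 for a normal dist.
q1, q3 = np.percentile(sales, [25, 75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr       # classic IQR outlier cutoff

print(f"skew={skewness:.2f}  kurtosis={kurt:.2f}  "
      f"IQR=({q1}, {q3})  upper_fence={upper_fence}")
# the 500.0 record sits above the fence and would be flagged as an outlier
```

A positive skew plus a record beyond the upper fence is exactly the signal that "revenue outlier bias" exists, so an LLM summary of mean revenue alone would mislead.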

10.4 K-Means Customer Segmentation

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = df.groupby('Customer_Age').agg(
    {'Sales': 'sum', 'Customer_Satisfaction': 'mean'}
).reset_index()

scaler  = StandardScaler()
X       = scaler.fit_transform(features[['Sales', 'Customer_Satisfaction']])
kmeans  = KMeans(n_clusters=3, random_state=42)
features['Segment'] = kmeans.fit_predict(X)
# Segments → Young (18–35) | Middle-aged (36–55) | Senior (56+)

12. Model Evaluation & LLMOps

12.1 QAEvalChain Evaluator

class BusinessInsightEvaluator:
    async def evaluate_response(self, question, answer, ground_truth):
        # 1. Build rubric prompt with all three inputs
        # 2. Call GPT-3.5-turbo
        # 3. Parse 4 scores via regex
        # 4. Generate improvement recommendations
        # 5. Append to metrics_history

| Metric | Scale | Criteria |
|---|---|---|
| Accuracy | 1–5 | Factual correctness vs. ground truth |
| Completeness | 1–5 | Coverage of all relevant aspects |
| Relevance | 1–5 | Direct address of the question asked |
| Business Value | 1–5 | Actionability and strategic utility |

Thresholds: <3.0 → critical · <4.0 → advisory · ≥4.0 → satisfactory
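The score-parsing step can be sketched with the same regex shown in the evaluation pipeline below; the helper name `parse_scores` is an assumption:

```python
import re

SCORE_PATTERN = re.compile(
    r"(accuracy|completeness|relevance|business_value):\s*(\d+)"
)

def parse_scores(raw: str) -> dict:
    """Extract the 1-5 rubric scores from the judge model's free-text
    reply; dimensions the model omitted are simply absent."""
    return {name: int(score) for name, score in SCORE_PATTERN.findall(raw.lower())}

reply = "Accuracy: 4\nCompleteness: 3\nRelevance: 5\nBusiness_value: 4"
print(parse_scores(reply))
# → {'accuracy': 4, 'completeness': 3, 'relevance': 5, 'business_value': 4}
```

Regex extraction is deliberately forgiving: even if the judge model pads its reply with prose, the four labeled scores still parse cleanly.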

12.2 Evaluation Pipeline

POST /api/evaluate  { question, answer, ground_truth }
    │
    ▼
BusinessInsightEvaluator.evaluate_response()
    ├── Assemble rubric prompt
    ├── Call GPT-3.5-turbo
    ├── Parse scores: r"(accuracy|completeness|relevance|business_value):\s*(\d+)"
    ├── Identify improvement areas (threshold-based)
    └── Append to metrics_history
    │
    ▼
{
  metrics:           { accuracy: 4.2, completeness: 3.8, relevance: 4.5, business_value: 3.9 },
  feedback:          "The response demonstrates...",
  improvement_areas: ["Consider adding quantitative specifics..."]
}

12.3 Monitoring Signals

| Signal | Source |
|---|---|
| User ratings (1–5 stars) | feedback table |
| Per-session feedback text | feedback table |
| QAEvalChain score history | metrics_history in-memory list |
| Token usage | OpenAI callback per request |

13. API Reference

13.1 Endpoint Catalog

| Method | Endpoint | Purpose |
|---|---|---|
| POST | /api/query | Classify → route → RAG/SQL/chart → respond |
| GET | /api/analytics | DuckDB SQL analytics query |
| GET | /api/dashboard/metrics | Summary metrics + 6 chart datasets |
| GET | /api/chart/{metric_id} | Generate chart spec for a saved metric |
| POST | /api/upload | Upload CSV/Excel data file |
| GET | /api/session/{id} | Retrieve session with full message history |
| GET | /api/history | List all sessions (newest first) |
| POST | /api/feedback | Submit star rating + feedback text |
| GET | /api/feedback/{session_id} | Retrieve all feedback for a session |
| POST | /api/evaluate | Run QAEvalChain evaluation |

13.2 POST /api/query — Primary Query Endpoint

{
  "query": "What are the top-performing products by region?",
  "session_id": "uuid-optional"
}

// Response:
{
  "answer": "Widget B leads in the East region with 28% market share...",
  "citations": ["Regional summary chunk", "Product analysis chunk"],
  "chart_spec": { "type": "bar", "data": [...], "title": "..." },
  "follow_up_questions": ["What is Widget B's satisfaction score in the East?"],
  "session_id": "uuid-generated",
  "type": "rag"
}

15. Configuration & Deployment

15.1 Environment Variables

| Variable | Purpose | Example |
|---|---|---|
| DATABASE_URL | PostgreSQL connection | postgresql+psycopg2://user:pass@localhost:5432/insightforge |
| OPENAI_API_KEY | OpenAI authentication | sk-... |
| OPENAI_BASE_URL | API endpoint (proxy-friendly) | https://openai.vocareum.com/v1 |

15.2 Deployment Steps

# 1. Python environment
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Database setup
createdb insightforge
cd backend && alembic upgrade head

# 3. Environment variables
cp backend/.env.example backend/.env
# Set DATABASE_URL and OPENAI_API_KEY

# 4. Frontend dependencies
cd ../frontend && npm install

# 5. Start all services
./run_all.sh
# Backend  → http://localhost:9100
# Frontend → http://localhost:3000
# API docs → http://localhost:9100/docs

Measuring Quality with QAEvalChain

Building the system is only half the work. Knowing whether it's good requires an evaluation framework.

InsightForge implements a QAEvalChain-style evaluator that scores responses across four dimensions (each 1–5) using GPT-3.5-turbo as the judge:

| Dimension | What It Measures |
|---|---|
| Accuracy | Factual correctness vs. ground truth |
| Completeness | Coverage of all relevant aspects |
| Relevance | How directly it addresses the question |
| Business Value | Actionability and strategic utility |

Scores below 3.0 are flagged as critical; below 4.0 as advisory. This creates a continuous improvement loop — every poor-scoring response points to a specific, fixable gap.

Key Lessons Learned

Lesson 1

Smart routing beats universal RAG. Not every question deserves an LLM call. Routing simple aggregations to DuckDB SQL cuts latency by ~95% for those queries with no quality loss.

Lesson 2

Domain-semantic chunking beats fixed-size windows. Chunks that represent complete business domains retrieve far more precisely than arbitrary 512-token text slices. The chunk carries intent.

Lesson 3

Temperature matters more than model version. GPT-4 at temperature=0.2 for analysis; GPT-3.5 at temperature=0.3 for generation. Low temperature is what delivers factual accuracy.

Lesson 4

Memory is what separates Q&A from analysis. Users naturally ask multi-turn questions. The dual-layer architecture — short-term buffer plus long-term summaries — is what makes InsightForge feel like a genuine analyst.

Lesson 5

Evaluation is not an afterthought. Without a scoring framework, there's no feedback loop. Building QAEvalChain in from the start created a reliable signal for where the system was failing and why.

Lesson 6

Structured prompts beat longer prompts. The four-section output format (Findings → Analysis → Implications → Recommendations) consistently outperforms longer unstructured prompts.

Live Demo

The video below walks through InsightForge end-to-end — uploading a sales CSV, querying in plain English, and watching the system route, retrieve, and respond in real time.

Conclusion

InsightForge shows that combining RAG, LLMs, and classical analytics tools solves a real-world problem: making business intelligence accessible without enterprise budgets or dedicated data teams.

The key insight: each technology does what it does best. DuckDB handles SQL. scikit-learn handles clustering. The LLM handles intent understanding and communication. Memory makes it conversational. Evaluation makes it trustworthy. Structured prompting makes it actionable.

That division of responsibility — not a single monolithic LLM call for everything — is what transforms a language model into a genuine business intelligence assistant.

Source Code

The full source code for InsightForge is available on GitHub. Refer to the repository for implementation details, setup instructions, and the complete codebase. github.com/nselvar/AIBusinessAnalytics