Production AI systems. Real numbers, real customers.

Six case studies from live deployments — voice agents taking calls, chatbots deflecting tickets, agents reactivating customers. Every metric below is from production traffic, not a demo environment. NDAs kept where required; architecture shared openly.

Voice Agents · Chatbots · AI Agents · RAG · Evaluation · HIPAA
Voice Agent · Restaurant · NYC

Milina — AI voice agent for a NYC restaurant at $0.09 per call

50+ reservations a night, bilingual (English + Spanish), sub-700ms median (p50) response latency. LiveKit + Deepgram + GPT-4o-mini + Cartesia. Callers routinely don't realize they're talking to AI.

LiveKit · Deepgram Nova-2 · GPT-4o-mini · Cartesia · Resy · Toast POS
91% Task completion
$0.09 Per call
+22% Bookings MoM
<700ms p50 latency
Read the Milina case →
Voice Agent · HIPAA · Dental

CleverAnswerAI — HIPAA dental receptionist, 20+ offices

Self-hosted LiveKit on a BAA-covered stack, live for a year. 100% answer rate. 28% more new-patient bookings. Direct integrations with Dentrix, Open Dental, Curve, and Eaglesoft.

LiveKit (self-hosted) · Deepgram Enterprise · Azure OpenAI · ElevenLabs Enterprise · Dentrix
100% Answer rate
+28% New bookings
20+ Offices
Read the CleverAnswerAI case →
LLM Evaluation · iGaming

iGaming QA — 66% to 91% with schema-guided reasoning

Took a Tier-1 operator's QA accuracy from 66% to 91% and coverage from 2% to 25%. Rubric-as-code, 1,200-case eval harness, two-model ensemble on regulatory criteria.

GPT-4o · Claude 3.5 Sonnet · LangGraph · LangSmith · Pydantic
66% → 91% Accuracy
2% → 25% Coverage
$0.04 Per audit
Read the iGaming QA case →
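
For readers wondering what "rubric-as-code" with a two-model check on regulatory criteria can look like, here is a minimal sketch. It is an illustration under stated assumptions, not the production pipeline: the `Criterion`, `Verdict`, and `audit` names are hypothetical, the judges are plain functions standing in for LLM calls, and the rule "both models must agree on regulatory criteria, otherwise fail closed and flag for human review" is one plausible reading of the ensemble rather than a documented design. It uses standard-library dataclasses to stay self-contained, where a real rubric would more likely be a Pydantic model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Criterion:
    """One rubric line item (hypothetical schema)."""
    id: str
    description: str
    regulatory: bool = False


@dataclass(frozen=True)
class Verdict:
    criterion_id: str
    passed: bool
    needs_review: bool


# A judge takes a criterion and a transcript and returns pass/fail.
# In practice this would wrap an LLM call; here it is any callable.
Judge = Callable[[Criterion, str], bool]


def audit(transcript: str, rubric: List[Criterion],
          primary: Judge, secondary: Judge) -> List[Verdict]:
    """Score a transcript against the rubric.

    Non-regulatory criteria use the primary judge only.
    Regulatory criteria (assumption) need both judges to agree;
    a disagreement fails closed and is flagged for human review.
    """
    verdicts = []
    for criterion in rubric:
        first = primary(criterion, transcript)
        if not criterion.regulatory:
            verdicts.append(Verdict(criterion.id, first, False))
            continue
        second = secondary(criterion, transcript)
        verdicts.append(
            Verdict(criterion.id, first and second, first != second)
        )
    return verdicts


# Stub judges for illustration only.
def keyword_judge(criterion: Criterion, transcript: str) -> bool:
    keywords = {"greeting": "hello", "age_check": "age"}
    return keywords[criterion.id] in transcript.lower()


def lenient_judge(criterion: Criterion, transcript: str) -> bool:
    return True


rubric = [
    Criterion("greeting", "Agent greets the customer"),
    Criterion("age_check", "Agent verifies customer age", regulatory=True),
]
results = audit("Hello, welcome back!", rubric, keyword_judge, lenient_judge)
```

Keeping the rubric as data means the same criteria list can drive both the audit and the eval harness, and a change to a criterion shows up in version control.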
AI Agent · Retail · Reactivation

Dry cleaning chain — AI reactivation agent, 3.5x ROI

192K customer × category intervals scored daily. LangGraph agent picks channel, message, offer, and timing per customer. 18.7% reactivation across 23 treatment categories.

LangGraph · GPT-4o · Twilio SMS · WhatsApp Business · n8n
3.5x ROI vs. control
18.7% Reactivation
60+ Locations
Read the reactivation case →
Call QA · Sales Ops · B2B SaaS

ConvoTune — AI call transcription & scoring for a 40-seat sales org

3,000+ calls scored per month against a 30-point playbook. 89% agreement with human reviewers. Real-time coaching prompts delivered in <300ms. The entire pipeline runs in the client's AWS account.

Whisper (fine-tuned) · Deepgram Nova-2 · Azure OpenAI · LangGraph · Terraform
3,000+ Calls/month
89% Scoring agreement
$34 Per seat/mo
Read the ConvoTune case →
RAG · Research · Open Benchmark

Enterprise RAG Challenge — winning architecture

Short technical case study. Hybrid retrieval (BM25 + dense + rerank), structured document parsing, schema-validated refusal, query decomposition. Same architecture we ship to clients.

GPT-4o · bge-reranker-large · Pinecone · OpenSearch · Unstructured.io
1st Place
>90% Top-5 hit rate
0 Hallucinations
Read the RAG Challenge case →
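
To make "hybrid retrieval" concrete, the sketch below merges a keyword (BM25) ranking and a dense-vector ranking into one candidate list. The page does not say how the two result sets are combined, so this uses reciprocal rank fusion (RRF) as a stand-in, which is a common choice and an assumption here. The function name, document IDs, and the `k = 60` constant are illustrative, and the rerank step (for example with bge-reranker-large) would then re-score the top of the fused list; it is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); documents are returned by descending total
    score, with ties broken by document ID for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers for one query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF works on ranks rather than raw scores, so BM25 and cosine-similarity scores never need to be normalized against each other.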

Data platform and analytics engineering case studies.

Before we focused on AI voice and chatbot work, we built data stacks on dbt, Snowflake, and Arabic-optimized analytics platforms. That work still pays the bills for those clients, and we still take on selective analytics engineering for existing AI clients, but it is no longer our primary practice.

Archive

Selected data-platform work: fitness-clubs analytics, medical aesthetics data platform, premium clinics analytics. These pages remain online for reference but aren't featured in our current navigation.

Want similar results? Let's see if your use case ships.

One 20-minute call. Bring us your call volume, your tech stack, or your current conversion rate — we'll tell you honestly whether we can build it, what the architecture looks like, and what it'll cost.