Figure: Phonx AI system architecture (real-time voice AI). Hot path (solid arrows): PSTN caller (8 kHz audio) → Twilio Programmable Voice + Media Streams → Krisp (noise suppression + acoustic turn-taking) → Upsampler (8 kHz → 16 kHz) → Groq Whisper (streaming STT, multilingual) → Conversation Manager (turn state, orchestration) → Groq LLM (reasoning + response tokens) → ElevenLabs TTS (streamed audio back to the caller). The real-time inference loop has a ~1s end-to-end turn budget (caller stop → first TTS audio frame). Stores: Neo4j VKG (plans, eligibility, regulatory edges, vector embeddings on nodes) and Redis (live session state, turn history, async outcome queue). Async path (dashed arrows, after the call completes, does not block the conversation): CRM sync worker (retry + DLQ) → GoHighLevel (structured outcomes); Postgres on EC2 (call records, customers). All services run on AWS EC2.
Voice AI · Insurance · Real-Time · Compliance

Phonx AI

Production voice AI for US insurance — sub-second turn-taking, compliance-aware reasoning, CRM-native execution.

Industry: US Insurance (Medicare, ACA enrollment)
Role: Technical Lead & Systems Architect
Stack: Twilio · Groq · ElevenLabs · Neo4j · Redis · AWS
Status: Outbound in production · Inbound in pilot
01 — Overview

Project Overview

Phonx AI is a voice agent system built for US insurance agencies running high-volume outbound enrollment and inbound policyholder support. It places and receives calls through Twilio, runs real-time speech understanding and reasoning under a one-second turn budget, and writes outcomes back into the agency's CRM as structured events — not transcripts.

The system is designed around a hard constraint that defines voice AI in this market: a human caller will hang up if the agent feels slow, robotic, or evasive on regulated topics. Every architectural decision in Phonx — from the choice of inference provider to how state is held across barge-ins — is a decision about preserving that one-second budget without losing accuracy on insurance-specific reasoning.

Key constraint

The one-second turn budget (end-of-utterance → first TTS audio frame) is the single number every architectural decision is tested against. Inference provider, retrieval strategy, barge-in handling — all chosen to protect that margin.

Today Phonx handles outbound enrollment booking flows in production, with inbound qualification in pilot.

02 — Problem Statement

The Problem

US insurance agencies operate under three pressures that legacy IVR and offshore call centres can't resolve simultaneously:

Volume is bursty and seasonal.

Open Enrollment Periods compress months of demand into weeks. Headcount can't flex that fast, and missed calls are missed policies.

Compliance is non-negotiable.

Every statement an agent makes about coverage, eligibility, or benefits is a potential audit finding. Scripts drift, agents improvise, and quality monitoring is sampled — not exhaustive.

The CRM is the source of truth, not the call.

A call that doesn't land in the CRM as a structured outcome — booked appointment, qualified lead, opted-out contact — effectively didn't happen. Most voice tools produce transcripts; agencies need state changes.

Phonx is built to absorb the volume, hold the compliance line on every call, and write directly into the workflows agencies already run.

03 — Solution

What We Built

A voice agent system structured as four cooperating layers, each with its own latency budget and failure mode:

Telephony

Audio moves between the caller and the system over Twilio's Media Streams, with Krisp running noise suppression and acoustic turn-taking on the inbound leg before any model sees the audio.

Inference

The inference layer runs STT, LLM reasoning, and TTS as one streaming pipeline. Groq handles both Whisper-based STT and LLM inference, where token-level latency is the constraint; ElevenLabs handles voice synthesis, where naturalness under interruption matters.

Knowledge

Conversation state and domain knowledge live across Redis, for hot session state and fast reads during a turn, and Neo4j, for the insurance knowledge graph. The graph encodes plan structures, eligibility rules, and the relationships between products, carriers, and regulatory boundaries so the agent's answers are grounded in domain logic, not just retrieved text.

Execution

Structured outcomes write back into GoHighLevel today, with persistent records in Postgres-on-EC2. CRM sync runs out-of-band so the conversation never blocks on it.

The four layers are deliberately decoupled: telephony failures don't corrupt state, and model latency spikes don't break call control.

04 — Architecture

System Architecture

Inbound Path
Telephony Gateway — Twilio + Krisp

Twilio absorbs carrier-side complexity — DID provisioning, STIR/SHAKEN, regional routing. Krisp runs noise suppression and acoustic turn-taking before STT, because residual noise on US PSTN calls degrades transcription accuracy more than it degrades human comprehension.

Streaming STT — Groq Whisper

PSTN audio arrives at 8 kHz and is upsampled to 16 kHz before transcription, since Whisper expects 16 kHz. Streaming, not batch: partial transcripts feed the conversation manager so the system reasons in parallel instead of waiting for the full utterance.
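As a rough illustration of the upsampling step, the sketch below decodes a Twilio Media Streams payload (base64-encoded 8-bit mu-law at 8 kHz) to 16-bit PCM and doubles the sample rate by linear interpolation. The source does not say which resampling method Phonx uses; linear interpolation is an assumption chosen for brevity, and the function names are illustrative. A production pipeline would more likely use a filtered (polyphase) resampler.

```python
import base64


def mulaw_to_pcm16(data: bytes) -> list[int]:
    """Decode G.711 mu-law bytes into signed 16-bit PCM sample values."""
    samples = []
    for byte in data:
        u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
        sign = u & 0x80
        exponent = (u >> 4) & 0x07
        mantissa = u & 0x0F
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
        samples.append(-magnitude if sign else magnitude)
    return samples


def upsample_2x(samples: list[int]) -> list[int]:
    """Double the sample rate by inserting the midpoint between neighbours."""
    out = []
    for i, current in enumerate(samples):
        following = samples[i + 1] if i + 1 < len(samples) else current
        out.append(current)
        out.append((current + following) // 2)
    return out


def twilio_payload_to_16k(payload_b64: str) -> list[int]:
    """Turn one base64 mu-law media payload into 16 kHz PCM samples."""
    return upsample_2x(mulaw_to_pcm16(base64.b64decode(payload_b64)))
```

A 20 ms frame of 160 mu-law bytes becomes 320 PCM samples, which is the shape a 16 kHz STT model expects for the same 20 ms of audio.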

Real-time Loop
Conversation Manager

The orchestration layer. Holds turn state, manages barge-in when a caller starts speaking mid-response, consumes the acoustic turn-taking signal from Krisp, and decides when to commit a partial response to TTS versus wait. Fallbacks and timeouts are first-class.

LLM Reasoning — Groq Inference

Groq is chosen for token throughput at low latency. For a voice agent, the binding constraint is not just tokens per dollar but tokens per second under a one-second budget. Reasoning prompts stay narrow and grounded; insurance-specific knowledge comes from retrieval, not prompt bloat.

TTS — ElevenLabs

Synthesised audio is streamed back to the caller over the Twilio media channel. Voice choice and prosody matter for trust on insurance calls: the agent is not trying to pass as human, but it cannot sound like an old IVR either.

Persistence & Sync
Knowledge Layer — Neo4j Vector Knowledge Graph

The proprietary asset. Plans, carriers, eligibility rules, and regulatory constraints are modelled as a graph with vector embeddings on relevant nodes. A question like "does this plan cover insulin pumps under Part D" resolves through graph traversal and semantic match — not one or the other.
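One way to picture "graph traversal and semantic match" working together is shown below: traversal narrows the candidate set to nodes reachable from the plan in question, and cosine similarity ranks those candidates against the query embedding. This is a toy in-memory sketch, not the Neo4j implementation; the node ids, edge types, two-dimensional vectors, and function names are all hypothetical, and the source does not describe how the two signals are actually combined.

```python
import math

# Toy graph: node -> list of (relationship, neighbour). All ids are hypothetical.
EDGES = {
    "plan:example": [("COVERS", "benefit:insulin_pump"), ("COVERS", "benefit:dental")],
    "benefit:insulin_pump": [("GOVERNED_BY", "rule:example_rule")],
}

# Toy embeddings attached to nodes (real ones would be high-dimensional).
EMBEDDINGS = {
    "benefit:insulin_pump": [0.9, 0.1],
    "benefit:dental": [0.1, 0.9],
    "rule:example_rule": [0.7, 0.3],
}


def reachable(start: str, max_hops: int = 2) -> set[str]:
    """Collect nodes within max_hops of start by breadth-first traversal."""
    seen: set[str] = set()
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for _rel, neighbour in EDGES.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def hybrid_retrieve(plan: str, query_vec: list[float], top_k: int = 2) -> list[str]:
    """Rank only the nodes reachable from the plan by similarity to the query."""
    candidates = [n for n in reachable(plan) if n in EMBEDDINGS]
    ranked = sorted(candidates, key=lambda n: cosine(EMBEDDINGS[n], query_vec), reverse=True)
    return ranked[:top_k]
```

Because ranking happens only inside the traversed neighbourhood, a semantically similar node that belongs to a different plan never enters the candidate set.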

State & Sync — Redis · Postgres · GHL Connector

Redis holds live session state: turn history, extracted entities, and compliance flags. Postgres-on-EC2 holds persistent call records. Once the call closes, a separate worker writes structured outcomes into GoHighLevel. CRM writes are async, retryable, and never block the caller experience.

05 — Engineering Challenges

The hard problems behind the one-second budget.

One-second budget

Holding a one-second turn budget end-to-end.

Total latency = audio capture + STT + end-of-utterance detection + retrieval + LLM + TTS + audio playback. Each component has its own budget, and none can consume the whole second. Groq inference and streaming STT partials create most of the headroom. The remaining margin is spent on retrieval against the knowledge graph, which has its own timeout and a degraded fallback that uses a smaller cached subgraph.
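The retrieval timeout with a degraded fallback can be sketched with asyncio as follows. The 150 ms timeout, the cache contents, and the function names are assumptions for illustration only; query_full_graph is a stand-in that simulates latency rather than calling Neo4j.

```python
import asyncio

# Hypothetical pre-warmed subset of the graph, keyed by plan id.
CACHED_SUBGRAPH = {"plan:example": ["cached fact about plan:example"]}


async def query_full_graph(plan_id: str, delay_s: float) -> list[str]:
    """Stand-in for a full knowledge-graph query; delay_s simulates its latency."""
    await asyncio.sleep(delay_s)
    return [f"full-graph fact about {plan_id}"]


async def retrieve_with_budget(
    plan_id: str, delay_s: float, timeout_s: float = 0.15
) -> tuple[list[str], str]:
    """Return (facts, mode), where mode is 'full' or 'degraded'."""
    try:
        facts = await asyncio.wait_for(query_full_graph(plan_id, delay_s), timeout=timeout_s)
        return facts, "full"
    except asyncio.TimeoutError:
        # Retrieval overran its slice of the turn budget: answer from the cache.
        return CACHED_SUBGRAPH.get(plan_id, []), "degraded"
```

Returning the mode alongside the facts lets the caller log how often turns were answered from the degraded path, which is one signal that the retrieval budget is set too tight.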

Barge-in integrity

Barge-in without context loss.

When a caller interrupts mid-response, the system has to stop TTS playback quickly, decide whether the new utterance replaces or refines the prior turn, and preserve the partial response so the agent does not repeat itself if the interruption was just a backchannel like "uh-huh" or "right". This is handled in the conversation manager, not outsourced to the LLM.
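A minimal sketch of that decision, assuming TTS playback has already been paused and that the conversation manager tracks how much of the response was actually played. The backchannel list, the character-offset bookkeeping, and the action names are hypothetical simplifications; a real system would likely also use timing and acoustic cues.

```python
from dataclasses import dataclass

# Assumed list of short acknowledgements that should not end the agent's turn.
BACKCHANNELS = {"uh-huh", "mm-hmm", "right", "ok", "okay", "yeah", "yes"}


@dataclass
class AgentTurn:
    full_text: str
    spoken_chars: int = 0  # how much of full_text reached the caller before the interruption

    @property
    def unspoken(self) -> str:
        return self.full_text[self.spoken_chars:]


def normalise(utterance: str) -> str:
    return utterance.lower().strip(" .,!?")


def handle_barge_in(turn: AgentTurn, caller_utterance: str) -> tuple[str, str]:
    """Return (action, text).

    'resume' continues with only the unspoken remainder, so nothing is repeated.
    'replan' hands the caller's utterance to the reasoning step as a new turn.
    """
    if normalise(caller_utterance) in BACKCHANNELS:
        return "resume", turn.unspoken
    return "replan", caller_utterance
```

Keeping this as a deterministic check in the orchestration layer avoids spending an LLM round trip just to decide whether "right" was an interruption.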

Compliance at retrieval

Compliance as a runtime constraint, not a post-hoc filter.

Insurance regulations restrict what an agent can say about coverage, premiums, and eligibility. The constraint is encoded as edges in the knowledge graph, connecting benefit claims to the regulatory boundaries that govern them. The agent does not generate a claim and then filter it; retrieval is constrained before generation.
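The idea of constraining retrieval before generation can be illustrated with a small sketch. Here each candidate claim carries the ids of the regulatory boundaries that govern it (standing in for graph edges), and only claims whose boundaries are all satisfied for the current call reach the prompt. The claim texts, rule ids, and function names are hypothetical, and the real system resolves this in the knowledge graph rather than in Python lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    governed_by: tuple[str, ...]  # ids of regulatory boundaries linked to this claim


def allowed_claims(candidates: list[Claim], satisfied_rules: set[str]) -> list[Claim]:
    """Keep only claims whose every governing rule is satisfied on this call."""
    return [c for c in candidates if set(c.governed_by) <= satisfied_rules]


def build_prompt(question: str, claims: list[Claim]) -> str:
    """Build the generation prompt from pre-filtered claims only."""
    facts = "\n".join(f"- {c.text}" for c in claims)
    return f"Answer using only these facts:\n{facts}\nQuestion: {question}"
```

Because a disallowed claim never appears in the prompt, there is nothing for a post-hoc filter to catch: the model cannot restate a fact it was never given.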

Dropped call recovery

State recovery on dropped calls.

PSTN calls drop. When a call disconnects mid-flow, Redis holds session state long enough that a callback can resume from the last completed turn rather than starting over. This matters most in enrollment flows, where re-collecting fields kills conversion.

CRM resilience

CRM synchronisation under failure.

GoHighLevel's API can hit rate limits and latency spikes. The CRM connector queues outcome events in Redis and replays them with exponential backoff, with a dead-letter path for repeated failures. The call experience never depends on CRM availability in real time.
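The replay logic can be sketched as capped exponential backoff plus a dead-letter list. The base delay, cap, attempt limit, and event shape are assumptions; send stands in for whatever call writes an outcome to the CRM, and no GoHighLevel API is shown. For brevity this version retries each event inline, whereas a production worker would more likely re-enqueue failed events with a delay so one bad event does not hold up the rest.

```python
import random
import time


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 60.0, jitter: bool = True) -> float:
    """Delay before retry number `attempt` (0-based): base * 2^attempt, capped."""
    delay = min(cap_s, base_s * 2 ** attempt)
    # Full jitter spreads retries out so workers do not retry in lockstep.
    return random.uniform(0, delay) if jitter else delay


def drain(queue, send, dead_letters, max_attempts: int = 5, sleep=time.sleep) -> None:
    """Send each queued event, retrying with backoff; park persistent failures."""
    for event in queue:
        for attempt in range(max_attempts):
            try:
                send(event)
                break
            except Exception:
                sleep(backoff_delay(attempt))
        else:
            # Every attempt failed: move the event to the dead-letter path.
            dead_letters.append(event)
```

Passing sleep as a parameter keeps the function testable without real waiting, and the dead-letter list gives operators a place to inspect and replay events after an outage.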

06 — Tech Stack

Technologies Used

Telephony & Media: Twilio · Krisp
Inference: Groq Whisper STT · Groq LLM · ElevenLabs TTS
Data: Neo4j · Redis · Postgres on EC2
Infrastructure: AWS EC2 · Docker · Python
Integration: GoHighLevel
07 — Impact & Results

Outcomes

<1s
End-to-end turn latency

Measured from end-of-utterance to first TTS audio frame, p50 across production outbound calls.

Prod
Outbound enrollment in production

Currently handling open enrollment booking flows for partner agencies in the US insurance market.

Pilot
Inbound pilot underway

Qualification and routing flows are in active development, with production rollout scheduled to follow pilot validation.

08 — Engagement Path

From Interest to Delivery

01 · Call flow audit

Map current intents, volume distribution, and compliance boundaries against the existing agency stack.

02 · Scoped outbound pilot

Deploy on a single high-volume flow, typically enrollment booking, measured against the agency's existing baseline.

03 · Production rollout with CRM sync

Full integration with the agency's CRM and telephony, with compliance logging and outcome tracking.