Each platform owns one capability and stays in its lane. The intelligence lives in how they hand off: Retell carries the voice, Make does the orchestration, Twilio handles telephony and SMS, Google Workspace is the system of record. Replace any one of them and the rest still works.
Hosts the agent. An 11-node conversation graph runs on GPT-4.1 Mini for live dialogue, with Claude Sonnet 4.6 doing structured extraction the moment the call ends.
Ten scenarios chained end-to-end. One paces the dialer. One handles every post-call event. Eight more govern the email cadence — triggered, branched, and idempotent.
Outbound calls go over a SIP trunk to Retell. Confirmation SMS flows back through a Messaging Service. Brand registered, A2P 10DLC compliant, CNAM verified end-to-end.
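For flavor, a rough Python sketch of the confirmation-SMS leg (in production this is a Make module, not code); the credentials, Messaging Service SID, phone numbers, and message copy are all placeholders.

```python
# Sketch only: the production version is a Make module, not code.
# Sending through the A2P-registered Messaging Service lets Twilio pick the
# sending number and keep the 10DLC compliance on its side.
from twilio.rest import Client

client = Client("ACXXXXXXXX", "auth_token")      # placeholder credentials

client.messages.create(
    messaging_service_sid="MGXXXXXXXX",          # placeholder Messaging Service SID
    to="+15551234567",                           # prospect's number from the lead row
    body="Hi Sam, confirming Thursday at 10am with our team. Reply C to cancel.",
)
```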
Sheets are the database — Call Queue plus an append-only log. Docs auto-generate per-call qualified-lead reports from a template. Drive holds collateral the emails attach.
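Same caveat for the log: Make writes the rows in production, but the append-only pattern looks roughly like this. The spreadsheet title, tab name, and column layout are stand-ins.

```python
import gspread

gc = gspread.service_account(filename="creds.json")      # placeholder service-account key
log = gc.open("Georgia Pipeline").worksheet("Call Log")  # stand-in title and tab name

# Append-only: every post-call event becomes one immutable row in the log.
log.append_row([
    "2025-06-12T14:03:22Z",   # timestamp
    "lead_0042",              # lead id (illustrative column layout)
    "call_analyzed",          # event
    "meeting_booked",         # outcome
])
```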
From the moment a row is added to the lead sheet to the morning the team lead gets a text saying a meeting was just booked — every hop, every API call, every state mutation. The accent path is the happy case.
The accent paths trace the qualified-lead flow. The agent never has to know about the email cascade or the report generator — every system reads from the sheet, mutates one column, and exits. Composability falls out naturally.
Georgia never ends a call without either (a) a meeting on the calendar or (b) a specific callback day and time. The conversation graph enforces this with three global nodes that any state can route into — objection handling, callback, not-interested — and four terminal nodes that close the call cleanly.
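To make the routing guarantee concrete, here's the shape of that graph sketched as plain Python. The three global handlers are named above; every other node name is a placeholder, and the real graph lives in Retell's builder, not in code.

```python
# Illustrative only: the real 11-node graph is configured in Retell, not in code.
# The invariant it sketches: any conversational node can route into the three
# global handlers, and every path ends in one of four explicit terminal nodes.

GLOBAL_NODES = {"objection_handling", "callback", "not_interested"}
TERMINAL_NODES = {                      # placeholder terminal names
    "end_meeting_booked", "end_callback_set", "end_not_interested", "end_voicemail",
}

GRAPH = {                               # placeholder conversational nodes
    "opening":            {"pitch"},
    "pitch":              {"qualify"},
    "qualify":            {"book_meeting"},
    "book_meeting":       {"end_meeting_booked"},
    "objection_handling": {"pitch", "end_not_interested"},
    "callback":           {"end_callback_set"},
    "not_interested":     {"end_not_interested"},
}

def next_nodes(current: str) -> set[str]:
    """Every non-terminal node can also route into the three global handlers."""
    if current in TERMINAL_NODES:
        return set()
    return GRAPH.get(current, set()) | GLOBAL_NODES
```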
Each scenario owns one stage of the lead's lifecycle. They share state through the lead sheet, so any one of them can be paused, edited, or replaced without touching the others. This is the entire outbound cadence as it actually runs in production.
Each model is picked for the job it does best, not for the resume. Retell runs GPT-4.1 Mini for live conversation because latency is the constraint that matters. Sonnet 4.6 handles structured extraction post-call where accuracy on edge cases is what matters. A separate Sonnet 4 instance writes per-prospect copy in the email cadence. The cheapest option, GPT-4o-mini, handles the bulk enrichment job where simple JSON is the whole product.
Georgia speaks on GPT-4.1 Mini — fast, cheap, fluent enough to hold a live cold call without dead air. The post-call extraction is a different problem: pull thirteen structured fields out of a free-form transcript, including nuanced ones like user_sentiment and objections_raised. Sonnet 4.6 takes that pass.
Both run inside Retell. The only thing my code has to do is write the agent prompt and define the extraction schema.
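Roughly what that extraction schema looks like. Only user_sentiment and objections_raised are named above, so every other field here is a stand-in for one of the remaining eleven.

```python
# Sketch of the post-call extraction schema handed to Retell's analysis pass.
# user_sentiment and objections_raised are the real field names from the build;
# the rest are stand-ins to show the shape (thirteen fields in production).
EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "user_sentiment":    {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "objections_raised": {"type": "array", "items": {"type": "string"}},
        "meeting_booked":    {"type": "boolean"},   # stand-in
        "callback_datetime": {"type": "string"},    # stand-in
        "decision_maker":    {"type": "boolean"},   # stand-in
        # ...eight more fields in the production schema
    },
    "required": ["user_sentiment", "objections_raised"],
}
```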
Every new lead gets searched on Serper, then handed to GPT-4o-mini with strict instructions: return JSON with two fields, a "your X" distinctive phrase and a plural industry keyword. The same fields get used by the cold email, the dialer's prompt to Georgia, and three of the email-cadence scenarios.
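Sketched as Python rather than the Make HTTP modules it actually runs on, the enrichment step is about this much code. The prompt is a paraphrase and the JSON keys are illustrative; Serper's search endpoint and the OpenAI JSON-mode call are the only real APIs in the block.

```python
# Sketch of the enrichment step (runs as Make HTTP modules in the real build).
import json
import requests
from openai import OpenAI

def enrich(company: str, website: str) -> dict:
    # 1. Pull public context about the company from Serper.
    hits = requests.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": "SERPER_KEY"},          # placeholder key
        json={"q": f"{company} {website}"},
        timeout=15,
    ).json()
    snippets = " ".join(r.get("snippet", "") for r in hits.get("organic", [])[:5])

    # 2. Ask GPT-4o-mini for exactly two JSON fields (prompt paraphrased).
    client = OpenAI()
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                "Return JSON with two keys. "
                "'your_x': a short 'your ...' phrase naming something distinctive about this company. "
                "'industry_plural': a plural industry keyword (e.g. 'law firms'). "
                "If the snippets are too thin, use 'your work in this space' and 'local businesses'.\n\n"
                f"Snippets: {snippets}"
            ),
        }],
    )
    return json.loads(reply.choices[0].message.content)
```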
Two of the seven email scenarios call Sonnet 4 inline, mid-pipeline, to generate a single sentence sized exactly to the email. The 1st Non-VM email gets a "your X" phrase that bridges into the body; the 2nd gets a 15–30 word sentence that connects a voicemail apology to a case study reference.
Strict prompt rules: no hype words, no geography, no questions, no quotes. Output drops in clean.
Each prompt has an escape hatch. If GPT-4o-mini can't find enough about a company, it writes "your work in this space" and "local businesses". If Sonnet returns junk, the email body still uses the static enriched phrase from the row, not the fresh generation — the AI is layered on, not load-bearing.
The system survives a model failure. That was the point of the design.
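If you flattened the inline-generation scenario into code, the guard would read roughly like this. The model ID, prompt wording, and column name are illustrative; the rules and the fallback behavior are the ones described above.

```python
# Sketch: inline copy generation with the static enriched phrase as the fallback.
# Model ID and prompt wording are illustrative; the real rules live in the Make scenario.
import anthropic

RULES = "One sentence, 15-30 words. No hype words, no geography, no questions, no quotes."

def bridge_sentence(row: dict) -> str:
    try:
        reply = anthropic.Anthropic().messages.create(
            model="claude-sonnet-4-20250514",          # illustrative model ID
            max_tokens=80,
            messages=[{
                "role": "user",
                "content": f"{RULES}\nConnect a voicemail apology to our case study "
                           f"for a company like {row['company']}.",
            }],
        )
        sentence = reply.content[0].text.strip()
        if 15 <= len(sentence.split()) <= 30 and '"' not in sentence:
            return sentence
    except anthropic.APIError:
        pass
    # Escape hatch: the email still sends with the static enriched phrase from the row.
    return row["your_x_phrase"]                        # hypothetical column name
```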
Twelve services, and each one earns its place. There's no "we already had it" in this list; every entry is a decision.
The kinds of decisions that don't show up in a feature list, but separate "I wired some tools together" from "I built a system."
The first instinct is four scenarios: Call 1, Call 2, Call 3, Call 4. I built one scenario instead, with a 4-block OR filter and a 4-route router. The filter evaluates the row's state to find which attempt is due; the router stamps the right column. Three fewer scenarios to maintain, one place to change the cadence.
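The same consolidation, sketched as a plain function instead of a filter and router. Column names and the day gaps between attempts are stand-ins, not the production cadence.

```python
# One scenario instead of four: the filter decides which attempt is due,
# the "router" stamps the matching column.  Column names and pacing are stand-ins.
from datetime import date

def days_since(iso_date: str, today: date) -> int:
    return (today - date.fromisoformat(iso_date)).days

def due_attempt(row: dict, today: date) -> int | None:
    """Return which call attempt (1-4) is due for this lead row, or None."""
    if row.get("status") in ("booked", "not_interested"):
        return None
    checks = [
        (1, not row.get("call_1_date")),
        (2, row.get("call_1_date") and not row.get("call_2_date")
            and days_since(row["call_1_date"], today) >= 2),
        (3, row.get("call_2_date") and not row.get("call_3_date")
            and days_since(row["call_2_date"], today) >= 3),
        (4, row.get("call_3_date") and not row.get("call_4_date")
            and days_since(row["call_3_date"], today) >= 4),
    ]
    for attempt, is_due in checks:
        if is_due:
            return attempt      # the router then stamps call_{attempt}_date
    return None
```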
Every scenario reads the row, decides if it should run, mutates one column, and exits. No central state machine, no queue. Adding a new touchpoint means writing a filter, not editing a workflow. The whole pipeline can be re-built one scenario at a time.
call_ended fires before Retell finishes the LLM extraction; listen for it and half the fields are still empty when the webhook lands. call_analyzed waits for the post-call pass to complete, so the router has every field it needs on first contact.
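The receiving end, sketched as a small Flask handler standing in for the Make webhook module. The event names match Retell's webhook events; the payload field path is from memory and may differ.

```python
# Sketch of the post-call webhook guard (a Make webhook module in the real build).
from flask import Flask, request

app = Flask(__name__)

@app.post("/retell-webhook")
def retell_webhook():
    payload = request.get_json(force=True)
    if payload.get("event") != "call_analyzed":
        # call_started / call_ended arrive first, before extraction has finished.
        return "", 204
    call = payload.get("call", {})
    analysis = call.get("call_analysis", {})      # illustrative field path
    route_post_call(call_id=call.get("call_id"), analysis=analysis)
    return "", 200

def route_post_call(call_id: str, analysis: dict) -> None:
    ...  # stamp the lead row, branch on outcome, trigger the email cadence
```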
Two reasons. Legally, it's the safe path. Practically, it's the right answer to the most common objection ("Are you a robot?"), so the script handles it warmly and converts it into the pitch. Hiding it would create the very objection the rest of the call has to absorb.
The dialer passes call_day and call_date as dynamic variables on every call, so Georgia can say "today's Tuesday" without the agent prompt knowing what today is. Same agent prompt, different conversation, every day.
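Sketched as a raw HTTP call, the dialer leg passes those variables like this. The endpoint path and field names follow Retell's create-phone-call API as best I recall it; treat them as illustrative.

```python
# Sketch of the dialer leg (a Make HTTP module in production).
from datetime import date
import requests

today = date.today()
requests.post(
    "https://api.retellai.com/v2/create-phone-call",   # endpoint path as recalled
    headers={"Authorization": "Bearer RETELL_KEY"},    # placeholder key
    json={
        "from_number": "+15550001111",
        "to_number": "+15552223333",
        "retell_llm_dynamic_variables": {
            "call_day": today.strftime("%A"),           # e.g. "Tuesday"
            "call_date": today.isoformat(),
            # prospect-specific fields from the lead row ride along here too
        },
    },
    timeout=30,
)
```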
GPT for the live call. Sonnet for the structured extraction. Sonnet again for personalized copy. GPT-4o-mini for the cheap bulk job. None of the choices were "we already use OpenAI" — each one was the cheapest model that met the bar for that job.
Georgia is outbound for a B2B services firm. I built the same shape for individual athletes — different stack, different audience, same problem of running personalized outreach at scale.
Or return to the index for contact and other work.