Email Automation Workflow
Avara Labs
AI-augmented email automation for a Shopify D2C brand — built as a freelance engagement, deployed in 2023.
Project Overview
This was a freelance engagement with Avara Labs to build and deploy an AI-augmented email automation workflow for one of their D2C brand clients on Shopify. The system classifies inbound customer emails by intent, generates draft responses that follow the brand's tone and policies, and escalates uncertain or sensitive cases to a human via Zapier.
The pragmatic choice on this project was to assemble the system from off-the-shelf components — OpenAI for classification and generation, Shopify webhooks for triggering, Zapier for human escalation — rather than build custom infrastructure. The brand needed something working in weeks, not a platform.
The Problem
D2C support teams spend most of their time on the same handful of email categories: order status, returns, product questions, and shipping issues. Response quality drifts across agents and shifts. For a small brand, hiring more support staff is not always economical, but unanswered emails directly hurt repeat purchase rates.
The brand needed automation that handled the repetitive majority of emails reliably and routed the genuinely complex cases to a human — without months of platform engineering or a six-figure SaaS contract.
What I Built
Shopify webhooks and the brand's support inbox feed incoming customer emails into the pipeline.
OpenAI handles both steps: it classifies each email by intent (order status, return request, product question, complaint, or shipping issue) and generates a draft response grounded in the brand's voice and policies.
If the model's confidence in either the classification or the response falls below a set threshold, the email is routed to a human via Zapier instead of being sent directly. This was the most important design decision on the project: AI-generated customer responses can fail in ways serious enough that the threshold has to be conservative.
Per-brand prompt templates encode tone, policies, and escalation rules so the same architecture can be repointed at a new brand without code changes — just configuration.
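A minimal sketch of what a per-brand configuration could look like, assuming the brand-specific parts live in a small config object that fills one shared prompt template. The names (`BrandConfig`, `render_system_prompt`, `escalate_categories`, `confidence_threshold`), the template wording, and the 0.9 default are hypothetical illustrations, not the project's actual schema or values.

```python
from dataclasses import dataclass


# Hypothetical per-brand settings; field names are illustrative only.
@dataclass(frozen=True)
class BrandConfig:
    brand_name: str
    tone: str
    policies: dict[str, str]
    escalate_categories: frozenset[str] = frozenset({"complaint"})
    confidence_threshold: float = 0.9  # placeholder, not the shipped value


# One shared template; only the values plugged into it change per brand.
SYSTEM_TEMPLATE = (
    "You are the customer support assistant for {brand_name}.\n"
    "Tone: {tone}\n"
    "Follow these policies exactly:\n{policies}\n"
    "If no policy covers the question, say a teammate will follow up."
)


def render_system_prompt(cfg: BrandConfig) -> str:
    """Fill the shared template with one brand's tone and policies."""
    policy_lines = "\n".join(
        f"- {name}: {text}" for name, text in sorted(cfg.policies.items())
    )
    return SYSTEM_TEMPLATE.format(
        brand_name=cfg.brand_name, tone=cfg.tone, policies=policy_lines
    )


# Placeholder brand and policy text, for illustration only.
example = BrandConfig(
    brand_name="Example Brand",
    tone="warm, concise, no emojis",
    policies={
        "returns": "<return policy text>",
        "shipping": "<shipping policy text>",
    },
)
print(render_system_prompt(example))
```

Under this layout, onboarding a second brand means writing a new `BrandConfig` rather than touching pipeline code.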
The conservative confidence threshold wasn't a technical afterthought; it was the core product decision. An AI-generated reply that upsets a customer costs more than a missed automation, so routing to a human on uncertainty is the right default for D2C support at this scale.
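The escalation rule reduces to a small, pure decision function. This sketch assumes each email arrives with a category plus separate confidence scores for classification and response; the `Route` and `Triage` names, treating `complaint` as an always-escalate category, and the 0.9 threshold are hypothetical, and the actual Zapier hand-off is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_SEND = "auto_send"        # send or queue the draft
    HUMAN_REVIEW = "human_review"  # hand off to a person (Zapier in the original)


@dataclass(frozen=True)
class Triage:
    category: str
    classification_confidence: float
    response_confidence: float


# Hypothetical defaults: the real threshold and sensitive list are not published.
DEFAULT_THRESHOLD = 0.9
SENSITIVE_CATEGORIES = frozenset({"complaint"})


def route(
    t: Triage,
    threshold: float = DEFAULT_THRESHOLD,
    sensitive: frozenset[str] = SENSITIVE_CATEGORIES,
) -> Route:
    """Escalate on a sensitive category or on low confidence in either step."""
    if t.category in sensitive:
        return Route.HUMAN_REVIEW
    # Auto-send only when BOTH scores clear the threshold.
    if min(t.classification_confidence, t.response_confidence) < threshold:
        return Route.HUMAN_REVIEW
    return Route.AUTO_SEND


print(route(Triage("order_status", 0.97, 0.95)))  # Route.AUTO_SEND
print(route(Triage("order_status", 0.97, 0.62)))  # Route.HUMAN_REVIEW
print(route(Triage("complaint", 0.99, 0.99)))     # Route.HUMAN_REVIEW
```

Taking the minimum of the two scores encodes the "either below threshold" rule directly: an email is auto-sent only when both the classification and the draft clear the bar.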
System Architecture
Shopify webhooks and the brand's support inbox trigger the pipeline on each new customer email.
OpenAI handles intent classification and draft response generation in the same pipeline, grounded by per-brand prompt templates.
Below-threshold predictions route to human review via Zapier; above-threshold responses are sent or queued for sending.
Per-brand prompt templates encode tone, policies, and escalation rules. Repointing the system at a new brand takes configuration changes rather than rebuilding core logic.
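The write-up doesn't specify how confidence scores were produced. One common approach, assumed here, is to have the model answer with a single-letter category code and read that token's probability from the log-probabilities the OpenAI Chat Completions API can return (`logprobs=True` with `top_logprobs` set; in the v1 Python SDK they sit under `choices[0].logprobs.content[0].top_logprobs`). The letter-to-category mapping and `category_confidence` are hypothetical. The function takes plain `(token, logprob)` pairs so it runs without an API key, and it covers only the classification score, not the response score.

```python
import math

# Hypothetical label scheme: one letter per category keeps each label to a
# single token, so its probability can be read from the first output token.
CATEGORY_CODES = {
    "A": "order_status",
    "B": "return_request",
    "C": "product_question",
    "D": "complaint",
    "E": "shipping_issue",
}


def category_confidence(top_logprobs: list[tuple[str, float]]) -> tuple[str, float]:
    """Return (category, probability) for the likeliest category code.

    `top_logprobs` holds (token, logprob) pairs for the first generated token.
    Variants such as "A" and " A" are merged. The raw probability is used
    (no renormalising over the codes), so mass the model puts on off-label
    tokens lowers the score instead of being ignored.
    """
    probs: dict[str, float] = {}
    for token, logprob in top_logprobs:
        code = token.strip()
        if code in CATEGORY_CODES:
            probs[code] = probs.get(code, 0.0) + math.exp(logprob)
    if not probs:
        return "unknown", 0.0  # nothing recognisable: treat as zero confidence
    best = max(probs, key=probs.__getitem__)
    return CATEGORY_CODES[best], probs[best]


# Made-up alternatives for one email, not real model output.
alternatives = [("A", math.log(0.86)), ("E", math.log(0.09)), ("hmm", math.log(0.03))]
label, conf = category_confidence(alternatives)
print(label, round(conf, 2))  # order_status 0.86
```

Skipping renormalisation is the conservative choice: a score inflated by ignoring off-label tokens would push more emails past the threshold than the model's output justifies.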
Technologies Used
OpenAI, Shopify webhooks, Zapier
Result
Deployed in 2023 to a D2C beauty brand on Shopify. The system handled the high-volume repetitive categories (order status, return requests, basic product questions) without human intervention, while routing complex or ambiguous cases through Zapier to the brand's support team.
I haven't been involved in operations since handoff, so I can't speak to current production status or long-term volume metrics. What I can speak to is the build: working pipeline, clean handoff, and a system that fit the brand's actual scale rather than over-engineering for hypothetical scale.
What I'd Approach Differently Today
Two things.
Tooling has caught up.
In 2023, OpenAI + Zapier was the right choice for a fast freelance build. Today I'd evaluate purpose-built helpdesk tools with LLM features, such as Gorgias or Re:amaze, or vertical-specific agents before writing a custom pipeline. For a single-brand deployment, the build-vs-buy math has shifted toward buy.
Confidence calibration matters more than I thought.
The conservative threshold I shipped with worked, but I didn't have a principled way to set it; it was tuned by trial and error. Today I'd build a small held-out evaluation set from the brand's actual email history and calibrate the threshold by weighing the cost of escalations against the miss rate.
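A minimal sketch of that calibration, assuming each held-out email carries the model's confidence plus a human judgement of whether the auto-generated reply would have been acceptable to send. `LabelledEmail`, the cost weights, and the sample numbers are hypothetical; the idea is to pick the cutoff that minimises average cost, where an escalation costs agent time and a bad auto-sent reply costs more.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledEmail:
    confidence: float     # model confidence on this held-out email
    auto_reply_ok: bool   # human judgement: would the draft have been fine to send?


def expected_cost(
    data: list[LabelledEmail], threshold: float, miss_cost: float, escalation_cost: float
) -> float:
    """Average cost per email if everything below `threshold` is escalated."""
    total = 0.0
    for e in data:
        if e.confidence < threshold:
            total += escalation_cost  # a person handles it
        elif not e.auto_reply_ok:
            total += miss_cost        # a bad reply would have gone out automatically
    return total / len(data)


def calibrate_threshold(
    data: list[LabelledEmail], miss_cost: float, escalation_cost: float
) -> float:
    """Pick the cutoff with the lowest expected cost on the held-out set.

    Candidates are every observed confidence, plus 0.0 (escalate nothing)
    and infinity (escalate everything). Ties go to the higher threshold.
    """
    candidates = sorted({e.confidence for e in data} | {0.0, math.inf})
    return min(
        candidates,
        key=lambda t: (expected_cost(data, t, miss_cost, escalation_cost), -t),
    )


# Made-up evaluation set, for illustration only.
held_out = [
    LabelledEmail(0.95, True),
    LabelledEmail(0.90, True),
    LabelledEmail(0.80, False),
    LabelledEmail(0.60, False),
]
# Assume a bad auto-reply costs ten times as much as an escalation.
print(calibrate_threshold(held_out, miss_cost=10.0, escalation_cost=1.0))  # 0.9
```

Breaking ties toward the higher threshold keeps the choice on the conservative side whenever the evaluation set can't distinguish between cutoffs.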