Agentic Engineering — software that reasons, decides, and learns — applied to one of e-commerce's most persistent operational challenges.
LumaCogent exists to close a gap that has widened for years. Ambitious, growing e-commerce companies operate with a fraction of the operational intelligence available to the largest players in the market. Amazon and its peers have spent a decade building proprietary AI systems that make their operations faster, smarter, and self-improving. Mid-market companies largely have not. We are building the systems that change that.
We started with reverse logistics — one of the most pressing problems in e-commerce, and one that reaches into almost every function of the business: customer experience, warehouse operations, finance, inventory, and merchandising.
We asked ourselves: how might we make reverse logistics fundamentally better — not incrementally, but from first principles? Recent advances in Agentic Engineering — software that reasons, decides, and learns — reveal an opportunity to do exactly that. The question is not how to automate the existing process. It is whether the entire function can be redesigned into something structurally different. The answer, we found, is yes — and the impact is substantially larger than optimising the current state.
The same systems architecture applies to other operational functions across e-commerce. Returns is where we started.
What follows covers what we found when we mapped the as-is reality, what we built, the architecture underneath it, the methodology that guided every decision, and a working proof of concept.
The business impact is well understood by anyone operating at scale. What is less visible is why these problems persist — and what is actually driving them.
To understand what was driving these outcomes, we mapped the operation end-to-end — every actor, every system, every handoff — across every stage of the return journey.
The mapping revealed 43 distinct issues. These were not edge cases — they were structural, recurring, and compounding. The most impactful:
| # | Issue | Stage / Function | Priority |
|---|---|---|---|
| 1 | Reason code accuracy is structurally low — shoppers select the fastest dropdown option; all downstream analytics are corrupted at the source | Initiation | P1 |
| 2 | Inspection findings never reach Merchandising — damage type, quality defects, and sizing issues are not reported to the Buying team; high-return SKUs are reordered without modification | Inspection → Merchandising | P1 |
| 3 | Restocked returns never reach Merchandising — Buying team not notified when returned inventory re-enters stock; cannot inform promotions, markdowns, or buying decisions | Inventory → Merchandising | P1 |
| 4 | Restock cycle time is 5–10 business days — popular SKUs are off-market during the window; direct margin loss on every day a sellable item is unavailable | Inventory Management | P1 |
| 5 | Returns data fragmented across 3–5 disconnected systems — OMS, WMS, CS platform, and accounting each hold a piece; no single source of truth; reporting assembled manually in Excel, always 5–14 days lagging | Reporting | P1 |
| 6 | WMS has no advance return notice — warehouse receives zero pre-notification of inbound returns; receiving is entirely reactive; peak backlogs stretch processing to 1–2 weeks | Warehouse Receiving | P1 |
| 7 | Refund-on-receipt fraud window — most retailers refund before inspection; items in poor condition, wrong items, or empty boxes are discovered only after the refund is already processed | Refund Processing | P1 |
| 8 | Total cost of returns is systematically underestimated — direct costs tracked; indirect costs excluded: lost sale during restock window, liquidation discount vs. full-price recovery, customer churn | Finance / Reporting | P1 |
| 9 | Operations Supervisor has no real-time visibility across Stages 3–7 — no dashboard for in-transit volume, receiving throughput, inspection queue, disposition split, or restock cycle time; all decisions are reactive | Operations | P1 |
| 10 | Grading standards are informal and unenforced — no documented rubric with reference photographs; grades drift across staff, shifts, and days; borderline items decided individually with no documented criteria | Inspection | P1 |
| 11 | No photographic documentation at inspection — disputed grades cannot be reviewed; fraud evidence is lost; damage attribution is subjective and costly | Inspection | P1 |
| 12 | Disposition rules are informal and not programmatic — similar items in similar condition receive different dispositions depending on which staff member processes them; liquidation is reactive and poorly timed | Disposition | P1 |
| 13 | OMS reason code data corrupts all downstream analytics — Shopify Analytics and NetSuite reports give a misleading picture of why items are actually returned; no cycle time tracking, no disposition split, no SKU-level return rate | All downstream | P1 |
| 14 | Returns portal absent for smaller retailers ($5M–$30M ARR) — CS agents manually verify eligibility and create RMAs in Shopify; not scalable; no reason code capture; no WMS notification | Initiation | P1 |
| 15 | Re-tagging and inventory re-entry is entirely manual — every item individually re-tagged with SKU, price, and condition; scales linearly with return volume; bottleneck at peak | Inventory Management | P2 |
| 16 | SKU entry errors create ghost inventory — returned items entered under wrong SKU, size, or colour appear available in OMS but are physically misplaced; error propagates until next inventory count | Inventory / OMS | P2 |
| 17 | WMS → OMS sync is not real-time — receiving confirmation and inventory updates batch-sync; shopper's refund is held even though item has physically arrived; creates CS contacts and cash flow lag | Receiving / Inventory | P2 |
| 18 | 3PL data handoff is inconsistent and delayed — batch files from 3PL to merchant are sometimes incomplete; receiving confirmation, condition flag, and RMA match data are unreliable | Warehouse Receiving | P2 |
| 19 | RMA matching at receiving is manual and error-prone — missing, damaged, or wrong-item returns require manual investigation; no-RMA returns create further bottleneck | Warehouse Receiving | P2 |
| 20 | BNPL refunds are a separate, manual workflow — Klarna, Afterpay, and Affirm refunds not automatically triggered by OMS; require manual action in each provider's portal; high error rate and delay | Refund Processing | P2 |
| 21 | Partial refund calculation has no systematic tool — damage-adjusted refund amounts determined by individual judgment with no documented framework; inconsistent outcomes, no audit trail | Refund Processing | P2 |
| 22 | Chargebacks are under-tracked and costly — rate and cost of bank disputes not tracked separately from returns costs in P&L; no proactive chargeback management | Finance | P2 |
| 23 | Liquidation management is reactive and manual — items accumulate in staging for weeks before vendor contact; batches poorly curated; timing driven by storage pressure not price optimisation | Disposition | P2 |
| 24 | Shrinkage in staging zones goes undetected — items can be moved or mis-routed without WMS record; discovered only at weekly or monthly inventory reconciliation | Disposition | P2 |
| 25 | Condition codes are freeform or inconsistently defined — "Like New" means different things across WMS configurations and staff members; condition data is structurally unreliable | Inspection | P2 |
| 26 | Paper-based inspection workflows eliminate data entirely — at less mature operations, condition data captured on paper and never entered into WMS; inspection outcome data does not exist in any system | Inspection | P2 |
| 27 | Portal analytics siloed from back-end operations data — Loop/Narvar dashboards cover initiation and reason code distribution but are not connected to WMS inspection outcomes, OMS refund records, or carrier performance | Reporting | P2 |
| 28 | Return policy is buried and ambiguous — not surfaced at product page, cart, or checkout; vague language generates pre-emptive CS contacts before return is initiated | Pre-initiation | P2 |
| 29 | No real-time outstanding refund obligation view — OMS refund data is a daily or weekly rollup; Operations Supervisor cannot model total refund exposure for open returns; cash flow pressure at peak | Finance / Operations | P2 |
| 30 | High-value / high-fraud-risk escalation is ad hoc — no WMS flag triggers supervisor review for high-value or high-fraud-risk items; inspector decides unilaterally with no formal protocol | Inspection | P2 |
| 31 | Damage attribution is subjective — determining customer-caused vs. manufacturing-caused damage requires product expertise floor staff often lack; conservative, costly disposition decisions result | Inspection | P2 |
| 32 | Exchange processing is double-handling — an exchange must be processed as a return plus a new order; two OMS records, double labour, poor data integrity, no native tooling at mid-market scale | Refund / OMS | P2 |
| 33 | OMS not receiving WMS inspection data — return record stays at "Received" even after inspection completes; refund trigger may be delayed or incorrect depending on merchant's refund policy | Inspection → OMS | P2 |
| 34 | WMS holds most granular returns data with no BI connection — inspection outcomes, disposition splits, and cycle times only accessible via manual CSV export; not connected to any BI tool at mid-market scale | Reporting | P2 |
| 35 | No self-service refund status tracking for shoppers — shopper cannot check refund status between receipt confirmation and bank posting; every inquiry requires a CS contact | Post-refund | P2 |
| 36 | Refund timing is unpredictable and uncommunicated — 5–14 business day total processing is common with no proactive status update after receipt confirmation | Refund Processing | P2 |
| 37 | Webhook failure risk on tracking events — if OMS tracking webhook fails, return status stays at "Pending" despite item being in transit; shopper receives no confirmation; CS spike | In-transit | P3 |
| 38 | Accounting system sync is batch — QuickBooks/NetSuite do not receive refund events in real time; Finance reconciliation is always lagging; cost of returns always delayed in P&L | Finance | P3 |
| 39 | Multi-carrier tracking is fragmented — merchants using multiple carriers have no unified performance view; transit time, loss rate, and damage rate siloed in individual carrier portals | In-transit | P3 |
| 40 | No benchmarking capability — cannot compare return rate, processing time, or recovery rate against segment peers; no way to assess whether performance is good, average, or poor | Reporting | P3 |
| 41 | Packaging burden on shopper — shopper must source own materials; no packaging provided; friction point that increases return abandonment | Return Shipping | P3 |
| 42 | Label delivery friction — PDF label requires home printer; QR code requires working scanner at drop-off; both create friction and abandonment for less tech-savvy shoppers | Initiation / Shipping | P3 |
| 43 | USPS scan gaps create phantom in-transit periods — packages accepted at USPS post office may not be scanned for 24–48 hours; no tracking event fires; CS contacts spike; Ops Supervisor has no visibility | In-transit | P3 |
These were not operational inefficiencies to be optimised. They were structural failures — and they pointed to a different question: not how to make the current process faster, but whether it should be replaced entirely.
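Issue 8 in the table above separates the direct costs that are usually tracked from the indirect costs that are usually left out. The arithmetic can be sketched as follows. This is a minimal illustration, not LumaCogent's costing model: the field names, the way each indirect cost is estimated, and every number in the example are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReturnCostInputs:
    """Hypothetical per-item inputs, all in the same currency."""
    handling: float               # direct: shipping label, receiving, inspection labour
    refund_processing: float      # direct: payment fees and CS time
    daily_margin_at_risk: float   # expected margin lost per day the item is off-market
    restock_days: float           # days until the item is sellable again
    full_price_recovery: float    # value recovered if resold at full price
    liquidation_recovery: float   # value recovered if liquidated instead
    liquidation_share: float      # fraction of returns that are liquidated (0 to 1)
    churn_cost: float             # expected customer value lost to return-driven churn


def direct_cost(c: ReturnCostInputs) -> float:
    """The part that typically shows up in reporting today."""
    return c.handling + c.refund_processing


def indirect_cost(c: ReturnCostInputs) -> float:
    """Lost sales during restock, the liquidation discount, and churn."""
    lost_sales = c.daily_margin_at_risk * c.restock_days
    liquidation_gap = c.liquidation_share * (
        c.full_price_recovery - c.liquidation_recovery
    )
    return lost_sales + liquidation_gap + c.churn_cost


def fully_loaded_cost(c: ReturnCostInputs) -> float:
    return direct_cost(c) + indirect_cost(c)


# Made-up figures, chosen only to make the arithmetic easy to follow.
example = ReturnCostInputs(
    handling=8.0, refund_processing=2.0,
    daily_margin_at_risk=1.5, restock_days=7,
    full_price_recovery=40.0, liquidation_recovery=10.0,
    liquidation_share=0.25, churn_cost=3.0,
)
print(direct_cost(example))        # 10.0
print(fully_loaded_cost(example))  # 31.0
```

With these illustrative inputs the fully loaded figure is roughly three times the direct figure. The restock window and the liquidation share both sit inside the indirect term, so shortening one or reducing the other lowers the total even when handling costs stay the same.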
Having mapped the as-is state and defined the problem, we shifted to design. We created a To-Be Service Blueprint that reimagined every actor's role, every system interaction, and every handoff from first principles. Alongside it, we wrote experience scripts — one for each human in the operation — describing what their working day should feel like in the redesigned world. These two artefacts became the specification the architecture was built to deliver.
The shopper opens the brand's website and describes their problem in their own words. No dropdown. No form. The conversation agent identifies the order, classifies the return reason accurately behind the scenes, and schedules a home pickup. Done in under two minutes. For the first time, the reason code that enters the system reflects what actually happened.
Inside the operation, the item arrives pre-matched to its record — the warehouse knew it was coming the moment the shopper completed their return. No reactive receiving. No manual RMA matching. The inspector reviews a pre-assembled context package: AI condition assessment, reference photographs, SKU return history, and a fraud signal score. They adjudicate rather than guess, and every decision is photographically documented. A disposition recommendation arrives with full reasoning. The inventory manager approves or overrides — starting from knowledge, not from scratch.
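A disposition recommendation that "arrives with full reasoning" and can be approved or overridden might look roughly like the sketch below. The grade rubric, the fraud threshold, the rule set, and all names are hypothetical; the point is only that identical inputs always yield the same recommendation, and that the reasons travel with it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Disposition(Enum):
    RESTOCK = "restock"
    REFURBISH = "refurbish"
    LIQUIDATE = "liquidate"
    SUPERVISOR_REVIEW = "supervisor_review"


@dataclass(frozen=True)
class InspectedItem:
    sku: str
    grade: str          # hypothetical rubric: "A" (like new) to "D" (unsellable)
    fraud_score: float  # 0.0 to 1.0, higher means more suspicious


@dataclass(frozen=True)
class Recommendation:
    disposition: Disposition
    reasons: Tuple[str, ...]


def recommend(item: InspectedItem, fraud_threshold: float = 0.7) -> Recommendation:
    """Apply the same illustrative rules to every item and record why each fired."""
    if item.fraud_score >= fraud_threshold:
        return Recommendation(
            Disposition.SUPERVISOR_REVIEW,
            (f"fraud score {item.fraud_score:.2f} is at or above {fraud_threshold:.2f}",),
        )
    by_grade = {
        "A": (Disposition.RESTOCK, "grade A: sellable as new"),
        "B": (Disposition.REFURBISH, "grade B: minor rework needed before resale"),
    }
    disposition, reason = by_grade.get(
        item.grade,
        (Disposition.LIQUIDATE, f"grade {item.grade}: not recoverable at full price"),
    )
    return Recommendation(disposition, (reason, "fraud score below threshold"))


def final_disposition(
    rec: Recommendation, override: Optional[Disposition] = None
) -> Disposition:
    """The inventory manager's override, when given, always wins."""
    return override if override is not None else rec.disposition


rec = recommend(InspectedItem(sku="TEE-001-M", grade="B", fraud_score=0.1))
print(rec.disposition.value, "|", "; ".join(rec.reasons))
print(final_disposition(rec, override=Disposition.LIQUIDATE).value)
```

Keeping the override as an explicit argument rather than editing the recommendation means both the system's suggestion and the human decision remain available for later review.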
The operations supervisor opens a single view: in-transit volume, inspection queue, disposition split, and restock cycle time — live, from one source. Not assembled from five systems in a spreadsheet. The finance team's reconciliation is pre-assembled; BNPL refunds trigger automatically; chargeback evidence assembles itself.
To show what the redesigned operation actually feels like to work inside, we wireframed the future experience of the Operations Supervisor — the role that, in the current state, spends most of its day assembling data that should arrive automatically.
In the redesigned operation, Diana, the Operations Supervisor, opens a single surface. She sees live signals ranked by severity, her team's workload at a glance, and a queue of decisions that genuinely need her judgment. Policy proposals from the Learning Agent surface automatically: patterns the system has spotted and structured for her approval. The Cold Start indicator in the header is not cosmetic; it communicates exactly where the system stands in its trust-building phase, and what level of autonomous authority it currently holds. She is no longer a data assembler. The interface is designed around that shift.
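One way to picture how a Cold Start indicator could map accumulated trust to a level of autonomous authority is sketched below. The tier names, the two inputs (logged decisions and the rate at which humans agreed with the system), and every threshold are assumptions made for illustration; the source does not specify how the real indicator is computed.

```python
from enum import IntEnum


class Authority(IntEnum):
    RECOMMEND_ONLY = 0  # cold start: every decision needs human approval
    AUTO_LOW_RISK = 1   # may act alone on low-risk items
    AUTO_STANDARD = 2   # may act alone on everything except escalations


# (minimum logged decisions, minimum human agreement rate, authority granted)
# Hypothetical placeholder thresholds, checked from the highest tier down.
TIERS = (
    (5000, 0.97, Authority.AUTO_STANDARD),
    (500, 0.90, Authority.AUTO_LOW_RISK),
)


def authority_level(decisions_logged: int, agreement_rate: float) -> Authority:
    """Return the highest tier whose volume and agreement thresholds are both met."""
    for min_decisions, min_agreement, level in TIERS:
        if decisions_logged >= min_decisions and agreement_rate >= min_agreement:
            return level
    return Authority.RECOMMEND_ONLY


def header_indicator(decisions_logged: int, agreement_rate: float) -> str:
    """Text a dashboard header could show for the current trust phase."""
    level = authority_level(decisions_logged, agreement_rate)
    return f"{level.name} ({decisions_logged} decisions, {agreement_rate:.0%} agreement)"


print(header_indicator(120, 0.95))  # RECOMMEND_ONLY (120 decisions, 95% agreement)
print(header_indicator(800, 0.93))  # AUTO_LOW_RISK (800 decisions, 93% agreement)
```

Requiring both volume and agreement means a short run of lucky decisions cannot unlock more authority, and a drop in agreement moves the system back down a tier.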
The merchandising buyer receives what was previously impossible: structured intelligence from the warehouse — defect patterns by SKU, sizing failure distributions, quality alerts tied to the buying cycle. The feedback loop that has compounded return rates for years is now structural and automatic. High-return SKUs get corrected before the next season's purchase order is cut.
None of these improvements is isolated. When reason code accuracy goes from the 40–60% range to 90% or more, the entire downstream picture changes: fraud detection sharpens, merchandising intelligence becomes reliable, and the return rate starts to fall. When restock cycle time halves, sellable inventory returns to market faster, margin recovery accelerates, and the operation handles the same volume with fewer hands. These are not efficiency gains on individual tasks. They are structural changes that reinforce each other.
| Metric | Today — typical mid-market | Redesigned Operation |
|---|---|---|
| Shopper return experience | Static portal, 10+ minutes | Conversational, <3 minutes |
| Restock cycle time | 5–10 business days | <48 hours |
| Inspection throughput | ~50 items / person / day (manual) | ~120+ items / person / day (AI-assisted) |
| Disposition decision time | Hours to days (manual, per-item) | Minutes (recommended with reasoning) |
| Reason code accuracy | ~40–60% (dropdown selection) | 90%+ (NLP-derived from conversation) |
| Cost visibility lag | 5–14 days (CSV / Excel) | Real-time (unified dashboard) |
| Merchandising feedback loop | Broken (manual, ad-hoc) | Structural and automatic |
| True cost of returns visibility | Partial (direct costs only) | Fully loaded |
Modeled projections based on architectural design and industry benchmarks. Not observed production data.
The Human Actor Transformation Model shows what changes for each of the six roles in the operation — from where they spend their time today, to the role the redesigned system creates for them. Volume moves to the system. Judgment stays human. What each person actually does becomes meaningfully different — and, by design, more valuable.
Six roles, transformed. The system handles volume; humans handle judgment.
| Role | Today | Transformed | What the System Handles |
|---|---|---|---|
| Operations Supervisor | Data assembler — CSV exports, Excel reports, manual issue identification | Strategic supervisor — reviews prepared alerts, approves decisions | All data aggregation, all SLA monitoring, all routine reporting, all inbound forecasting |
| Returns Inspector | Manual inspector — every item, manual grade, informal criteria | Expert adjudicator — reviews AI recommendations, confirms or overrides | All grading of high-confidence items; all SKU history lookup; all fraud signal matching |
| Inventory Management | Disposition decision-maker from scratch — no system support | Recommendation approver — reviews AI reasoning, overrides when context differs | All disposition recommendations with reasoning; all vendor communication on approval |
| Finance Team | Manual reconciler — BNPL portals, chargeback spreadsheets, lagging data | Exception reviewer — two-minute exception queue, pre-assembled packages | All BNPL routing; all chargeback evidence assembly; all accounting sync |
| Merchandising / Buying | Data chaser — emails Operations for return rate data that rarely arrives | Intelligence consumer — structured, buying-cycle-aware intelligence automatically routed | All returns-to-merchandising routing; all defect threshold alerting |
| Warehouse Receiving | Manual operator — scans every package, matches RMAs by hand | Exception handler — monitors terminal, handles exception lane only | All pre-matched receiving; all conveyor routing for in-scope returns |
"We didn't build a returns tool. We rebuilt the operation — and the same approach reimagines every function in e-commerce."
The methodology determined what problems to solve and for whom. The design principles governed what we optimised for in every decision. The architecture principles governed how we built the system that delivered it.
The design principles governed every product and experience decision throughout the design process. They are not aspirations: each one was tested against the experience scripts written for each human actor in the operation. Where the design failed the principle, the design changed.
The architecture principles governed structural decisions: what it means to be AI-native rather than AI-decorated. Each principle has a direct implication for how the system learns, scales, and remains trustworthy as autonomous authority expands over time.
The returns function runs on six independent layers. Each has a single responsibility. Each upgrades without touching the others. The system improves continuously without being rebuilt.
Most enterprise AI deployments add models on top of unchanged processes. The model reasons; the underlying system does not. Our architecture is structurally different. Knowing, deciding, and doing are separated into six independent layers — each with a single responsibility, each upgradeable without the others knowing. The agents never query databases directly. All context is assembled by a purpose-built service before any reasoning begins. Every decision is logged to memory — and every log is training data. The result: a system that learns from every activation, expands its autonomous authority as it earns trust, and does not require a rebuild to become smarter. This is what AI-native means in practice.
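The rule that agents never query databases directly can be illustrated with a small sketch in which a context assembly service owns all data access and hands the agent a read-only package. The class names, the source keys, and the return-rate threshold are hypothetical, and the lambdas stand in for real systems; this is one possible shape, not the production design.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Callable, Dict, Mapping


@dataclass(frozen=True)
class ContextPackage:
    return_id: str
    facts: Mapping[str, Any]  # read-only view handed to the agent


class ContextAssemblyService:
    """Owns all data access. Sources are hypothetical fetch functions keyed by name."""

    def __init__(self, sources: Dict[str, Callable[[str], Any]]) -> None:
        self._sources = dict(sources)

    def assemble(self, return_id: str) -> ContextPackage:
        facts = {name: fetch(return_id) for name, fetch in self._sources.items()}
        return ContextPackage(return_id, MappingProxyType(facts))


def inspection_agent(context: ContextPackage) -> Dict[str, Any]:
    """Reasons only over the package it is given; no data store is in scope here."""
    history = context.facts["sku_return_history"]
    return {
        "return_id": context.return_id,
        "flag_for_merchandising": history["return_rate"] > 0.30,  # illustrative
    }


# Stand-ins for real systems such as an OMS or WMS; the values are made up.
service = ContextAssemblyService({
    "order": lambda rid: {"sku": "TEE-001-M", "price": 29.0},
    "sku_return_history": lambda rid: {"return_rate": 0.42},
})
package = service.assemble("R-1001")
print(inspection_agent(package))
```

Because the agent is a function of the package alone, a data source can be swapped or upgraded inside the service without the agent changing, which is the kind of layer independence the paragraph above describes.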
Layer 3 is where the system thinks. Each agent perceives relevant events, assembles a context package via the Context Assembly Service, reasons to a structured output, and acts within a strictly enforced tool registry. Every decision is logged to episodic memory — training data for the next activation. The mesh starts with four agents at MVP and expands to ten as the system earns trust and accumulates episodic memory.
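The activation cycle described above (perceive an event, assemble context, reason to a structured output, act through the tool registry, log the episode) can be sketched in a few dozen lines. Everything here is an assumption made for illustration: the registry API, the agent and tool names, and the shape of the decision dictionary are not taken from the production system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set


class ToolNotPermitted(Exception):
    """Raised when an agent calls a tool it has not been granted."""


class ToolRegistry:
    """Maps tool names to callables and to the agents allowed to use them."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._grants: Dict[str, Set[str]] = {}

    def register(self, name: str, fn: Callable[..., Any], agents: Set[str]) -> None:
        self._tools[name] = fn
        self._grants[name] = set(agents)

    def call(self, agent: str, name: str, **kwargs: Any) -> Any:
        if agent not in self._grants.get(name, set()):
            raise ToolNotPermitted(f"{agent} may not call {name}")
        return self._tools[name](**kwargs)


@dataclass(frozen=True)
class Episode:
    agent: str
    event: Dict[str, Any]
    context: Dict[str, Any]
    decision: Dict[str, Any]
    outcome: str


def activate(agent, event, assemble, reason, registry, memory) -> str:
    """One activation: assemble context, reason, act via the registry, log."""
    context = assemble(event)
    decision = reason(context)  # structured output: {"tool": ..., "args": {...}}
    try:
        outcome = str(registry.call(agent, decision["tool"], **decision["args"]))
    except ToolNotPermitted as exc:
        outcome = f"blocked: {exc}"
    memory.append(Episode(agent, event, context, decision, outcome))
    return outcome


registry = ToolRegistry()
registry.register("route_to_restock", lambda return_id: f"{return_id} routed",
                  {"inspection_agent"})
registry.register("issue_refund", lambda return_id: f"{return_id} refunded",
                  {"resolution_agent"})
memory: List[Episode] = []


def assemble(event):
    return {"return_id": event["return_id"], "grade": "A"}  # made-up context


def restock(ctx):
    return {"tool": "route_to_restock", "args": {"return_id": ctx["return_id"]}}


def refund(ctx):
    return {"tool": "issue_refund", "args": {"return_id": ctx["return_id"]}}


print(activate("inspection_agent", {"return_id": "R-1001"}, assemble, restock, registry, memory))
print(activate("inspection_agent", {"return_id": "R-1002"}, assemble, refund, registry, memory))
```

The second call is refused because the hypothetical inspection agent was never granted `issue_refund`, yet the attempt is still written to memory. Logging blocked actions alongside successful ones keeps the record complete for later review or training.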
A working two-agent prototype demonstrates the inspection and resolution pipeline on real return records — the same agent design, the same authority boundaries, the same reasoning pattern as the production architecture.
The architecture we built for returns is domain-agnostic. The same six layers, the same agent mesh, the same self-learning loop — applied to other operational functions in e-commerce.
What we built is not a returns company. It's a way of reimagining e-commerce companies — one operational function at a time.
LumaCogent is an early-stage company. The architecture described in this piece is designed and specified. The methodology is proven through the work documented here. We are now building the first production version of this system, and we are in early conversations with a small number of mid-market apparel retailers who want to be part of that build.
We are not pitching a finished product. We are looking for merchants who understand that their returns problem is both a cost problem and a data problem, and are ready to approach it that way.
If that's your situation, we'd like to talk.
Start the conversation →
No deck. No proposal. A direct conversation about your operation.