Supply Chain · Service Analyst Case Study

A distribution network that couldn’t see its own blind spots.

A network of 6 distribution centers, 126 stores, and 20 products, simulated across 90 days. Sales were quietly leaking to stockouts no one saw coming — buried under messy, real-world data. This is the end-to-end analysis that found the money, then built a control tower to keep watch.

$1.53M
estimated lost sales to stockouts
4.1%
of CEDIS-days ran short
6
distribution centers (CEDIS)

All data is 100% synthetic. Lost sales is a labeled estimate.

Mexico CityGuadalajaraMonterreyPueblaMeridaTijuana

6 CEDIS · simulated national network

02 · The Raw Data

Real operational data never arrives clean.

These are raw shipment records, straight from the source — the kind of messy operational data an analyst actually inherits. Of 13,715 rows, 2,583 (18.8%) carried at least one quality issue. A representative sample, every problem flagged:

Sample of raw shipment rows. Cells with data-quality issues are highlighted and named in the Flags column: SKU typos, incomplete loads, late deliveries, missing or negative quantities, unknown (phantom) stores, and duplicate records.
ShipmentStoreSKUOrderedReceivedPromisedActualFlags
SH-000002ST-0001sku-00565652024-01-062024-01-06SKU typo
SH-000009ST-0001sku-01936362024-01-292024-02-02SKU typolate delivery
SH-000005ST-0001SKU-01125142024-01-162024-01-16incomplete load
SH-000008ST-0001SKU-01757492024-01-262024-01-26incomplete load
SH-000220ST-9008SKU-00758582024-02-202024-02-20unknown store
SH-000350ST-9000SKU-01582822024-03-292024-03-29unknown store
SH-000002ST-0001SKU-01749492024-01-062024-01-07late delivery
SH-000005ST-0001SKU-01847472024-01-162024-01-18late delivery
SH-000059ST-0003SKU-020-50502024-02-032024-02-03negative qty
SH-000025ST-0001SKU-00649null2024-03-212024-03-21missing qty
SH-001394ST-0071SKU-00148482024-01-312024-01-31duplicate
SH-001394ST-0071SKU-00148482024-01-312024-01-31duplicate
amber — recoverable (SKU typo, incomplete load, duplicate)red — hard error (late, missing / negative qty, phantom store)

Note that one shipment ID can legitimately repeat across its SKUs — which is exactly why duplicates are detected on shipment + SKU, not the ID alone. Nothing is discarded yet: next, how each defect is detected and resolved — recovering what we can, dropping only the unusable.

03 · The Cleaning

Most problems get fixed, not thrown away.

Every injected defect was detected, then resolved by the cheapest honest means: recover in place where the data allows, flag and keep the real service events, and drop only what is genuinely unusable. 456 issues were repaired without losing a row; just 279 rows were dropped.

Raw intake

13,715

rows received

98.0%

retained

Clean output

13,436

rows kept

456 recovered · 2,000flagged & kept279 removed

Recovered

456

fixed in place — kept

  • SKU typos394

    Standardized to canonical SKU (uppercase; O→0, I/L→1). Rows retained.

  • Negative quantities62

    Sign error corrected to absolute value. Rows retained.

Flagged

2,000

real events — kept for follow-up

  • Incomplete loads801

    Flagged is_incomplete=True for follow-up. Rows retained.

  • Late deliveries1,199

    Flagged is_late=True for follow-up. Rows retained.

Removed

279

unusable — dropped

  • Duplicate shipment lines203

    Dropped exact duplicate (shipment_id, SKU) lines; kept first occurrence.

  • Phantom shipments12

    Dropped — store_id absent from the store master.

  • Missing received quantity64

    Dropped — received quantity missing and cannot be honestly imputed.

Detection check: every injected defect was found — 738/738 matched (typos, duplicates, phantom stores, missing/negative quantities).

04 · The Findings

Where the sales leaked — and why.

An estimated $1.53M (est.) in sales never converted — demand that arrived to empty shelves. Three questions: where did the loss concentrate, when do deliveries fail, and how good is the forecast?

Three CEDIS — Mexico City, Guadalajara, Monterrey — drive 62% of lost sales.

Lost sales concentrate sharply. Fixing the top three distribution centers first captures most of the recoverable upside — that is where to act.

Orders placed Thursday–Saturday slip the most.

Friday orders run late 12.4% of the time vs 6.2% on Tuesday — a weekend dispatch backlog. The dashed line marks the 8.9% network average.

Forecast error is ~18.9% across the board.

Only ~2 points higher for the event-driven categories (Snacks, Beverages) — worth noting, not a headline.

05 · The Control Tower

Everything above — live.

Filter by region, brand, and date. Every KPI and chart recomputes from the same daily aggregate. With no filter applied, the numbers equal the analysis exactly — the same data, made interactive.

Loading live data…

06 · The Prediction

Now — see it coming.

Everything so far looked backward. The last move is forward: a stock-out risk score for every SKU at every CEDIS — and, deliberately, one you can read. Not a black box; a transparent scoring heuristic where every point traces to a cause a planner can act on. That legibility is the point, not a limitation.

How the score works

A 0–100 score from four drivers, each adding points:

  • Days of cover — the dominant driver. Under 1 day adds 50, tiering to 0 by ~7 days.
  • Below safety stock — +20 when stock is under the buffer.
  • Demand volatility (CV) — up to +20 for erratic demand.
  • SKU rotation — +10 high / +5 medium; a fast-mover costs more to lose.

Capped at 100, bucketed Low <35 · Medium 35–59 · High 60+.

Worked example

SKU-015 · FrutaViva @ Guadalajara

Days of cover (1.48 days)
+40
Below safety stock (619 < 1,042)
+20
Demand volatility (CV 0.247)
+20
High rotation (fast-mover)
+10
Total
90/100 → High

The 7-day risk view

Of 120 SKU×CEDIS positions, the model flags a deliberately tight critical list — 13 High, 27 Medium, 80 Low. Most of the network is fine; attention goes where it’s earned.

Risk landscape

x: days of cover · y: score · dashed = Medium (35) & High (60)

Critical list — 13 highest-risk positions
SKUCEDISCoverCovSafVolRotScore
SKU-015GDL1.48d4020201090
SKU-010GDL0.58d50206581
SKU-017MER2.29d3020201080
SKU-001GDL2.46d3020201080
SKU-011MTY0.71d50200575
SKU-006GDL1.55d40206571
SKU-006TIJ1.65d40206571
SKU-002MTY2.45d302012567
SKU-014MX1.64d40206066
SKU-009GDL1.66d40206066
SKU-007GDL1.18d40206066
SKU-012MER1.93d40206066
SKU-017GDL2.94d300201060

The backward and forward views agree

Guadalajara drove 20% of historical lost sales — the #2 CEDIS in the Pareto — and now holds 7 of the 13 highest forward-looking risk scores. The place the analysis flagged in hindsight is exactly where the model says to look next. Two independent methods, one answer.

07 · The Recommendation

What I’d do Monday morning.

Three moves, in priority order. Each ties back to a finding above — none is generic, and the numbers are the analysis’s, not a wishlist.

  1. 01

    Concentrate replenishment where the loss is.

    Findings Three CEDIS — Mexico City, Guadalajara, Monterrey — drive 62% of an estimated $1.53M in lost sales.

    Review safety stock, reorder points, and inbound lead times for those three centers first. The loss is concentrated, so the earliest fixes capture most of the recoverable upside before touching the long tail.

  2. 02

    Move the weekend dispatch cutoff earlier.

    Findings Friday orders run late 12.4% of the time vs 6.2% on Tuesday — a weekend dispatch backlog.

    Bring the Thursday/Friday cutoff forward, or add weekend dispatch capacity, so late-week orders clear before the backlog builds. A scheduling change, not a capital one — cheap to pilot.

  3. 03

    Make the forecast event-aware, and run the risk watchlist weekly.

    Findings + Prediction Forecast error of 18.9% that misses the event surge, and 13 positions already flagged High-risk.

    Add an event- and payday-aware uplift to the forecast for beverages & snacks, and stand up the High-risk list as a standing weekly report — so planners pre-position stock instead of reacting after the shelf is empty.

If I could run only one play first: the Guadalajara replenishment review. It is the #2 historical loss and 7 of the 13 forward risks — the one place hindsight and the model agree. Highest confidence, first hour.