Perspective · 2026-05-26

From forecasting to decision: why ML models aren't enough in retail

Retail forecasts have never been more accurate, yet operational results haven't budged. Here's the gap between predicting and deciding — and how to close it.

Damien Didelot · 12 min read

For a few years now, the same talking points have been everywhere in retail. "We're putting machine learning in." "Our new forecast reaches 92% accuracy." "We trained a demand model at the SKU/store level." The data leadership of large chains has tooled up, vendors have multiplied their offerings, and it's no longer rare to come across forecasting projects with several full-time data scientists.

And yet, when you look at operational results — stockout rates, overstock levels, markdown rates, replenishment performance — the situation has barely moved in most chains. Forecasts are more accurate than five years ago. Decisions, however, remain largely sub-optimal.

This dissonance isn't a failure of machine learning. It's the expression of a structural misunderstanding about what an ML model can, and can't, do in a retail context. A forecast is not a decision. And until this distinction is understood — and especially, translated into platform architecture — data science investments will keep producing very pretty predictions that change nothing on the ground.

This article looks straight at the gap between forecasting and decision: why it exists, why it's so persistent, and what to put in place to cross it.

The founding misunderstanding: confusing predicting with deciding

In most retail data projects of the past decade, the implicit reasoning has been: "if we predict demand better, we'll automatically make better decisions." This idea sounds like common sense. It's deeply wrong.

A forecast, however accurate, says only one thing: how many units of a product are likely to sell in a store over a given horizon. It doesn't say whether you should replenish, mark down, transfer, or do nothing. It doesn't say how deep to mark down. It doesn't say which store to transfer to. It doesn't say how to arbitrate between these options when several are possible simultaneously. It doesn't say whether the decision is compatible with your business rules, supplier constraints, or current commercial strategy.

Between prediction and decision, there's a gap that holds almost all the operational complexity of retail. ML forecasting models don't cross that gap. They don't even claim to: it's their users who projected onto them a promise they never made.

The result is familiar to every retail leadership team that has invested in this kind of project. You get a technically remarkable forecast, praised in committee, presented at sector conferences. And three months later, on the ground, stores keep getting inadequate replenishments, markdowns keep being set in Excel, and operational performance hasn't budged.

Why an excellent forecast doesn't produce, on its own, a good decision

Several structural reasons explain this gap. They're worth looking at in detail, because they determine what needs to be built on top of the forecast for it to become really useful.

Reason #1: a forecast doesn't carry a trade-off

A typical retail decision is almost always a trade-off between several competing objectives. Should you mark down now at -20%, or wait two weeks and risk going to -40%? Should you transfer this stock from store A to store B, or leave it and hope a local promo revives sales? Should you replenish this SKU at the risk of creating overstock, or accept a few days of stockout?

Each of these questions implies weighing costs: markdown cost vs carrying cost, transfer cost vs sale probability, stockout cost vs overstock risk. The forecast provides part of the inputs needed for this weighting. It doesn't do the weighting itself. Doing the weighting requires economic logic, business rules, strategic preferences — all elements that are not in a demand model.
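To make the weighting concrete, here is a minimal sketch of the "mark down now or wait" trade-off. Everything in it is an illustrative assumption: the `MarkdownOption` structure, the cost figures, and the simplification that unsold units have no residual value. The forecast only supplies `expected_units`; the prices, costs, and comparison rule come from the business.

```python
from dataclasses import dataclass


@dataclass
class MarkdownOption:
    name: str
    discount: float        # fraction off the full price, e.g. 0.20 for -20%
    expected_units: float  # units the forecast expects to sell under this option
    weeks_waited: int      # weeks the stock is carried before the markdown starts


def expected_margin(option, stock, full_price, unit_cost, weekly_carrying_cost):
    """Discounted revenue minus cost of goods sold and carrying cost (hypothetical model)."""
    units_sold = min(option.expected_units, stock)
    revenue = units_sold * full_price * (1 - option.discount)
    carrying = stock * weekly_carrying_cost * option.weeks_waited
    return revenue - units_sold * unit_cost - carrying


def best_option(options, **economics):
    """Pick the option with the highest expected margin."""
    return max(options, key=lambda o: expected_margin(o, **economics))


# Illustrative numbers only.
options = [
    MarkdownOption("mark down now at -20%", 0.20, expected_units=80, weeks_waited=0),
    MarkdownOption("wait, then -40%", 0.40, expected_units=95, weeks_waited=2),
]
economics = dict(stock=100, full_price=50.0, unit_cost=20.0, weekly_carrying_cost=0.5)
choice = best_option(options, **economics)
print(choice.name)
```

With these made-up figures the early markdown wins, but a different carrying cost or unit cost could flip the answer while the forecast stays exactly the same. That is the part of the decision the demand model never sees.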

Reason #2: a forecast ignores real constraints

A demand model predicts what could sell. It knows nothing of the constraints that limit what can be done. The supplier imposes an MOQ of 200 units when the prediction suggests ordering 50. The store contract forbids certain markdowns before W+8. The category's floor margin doesn't allow a discount past 35%. The transfer's logistics cost exceeds the recovered margin. A store's receiving capacity caps what you can ship this week.

None of these constraints are in the forecast. And yet they're what determines what's actually doable. A system that proposes decisions without integrating these constraints produces, by construction, recommendations that are either inapplicable or ignored by teams — which amounts to the same.
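As a rough illustration, the constraints listed above can be expressed as a filter applied after the forecast. The function names, parameters, and rules below are hypothetical; real constraint sets are richer and usually live in configuration rather than code.

```python
def feasible_order_qty(suggested_qty, moq, receiving_capacity):
    """Adjust a forecast-driven order to the supplier MOQ and store receiving capacity.

    Returns 0 when no quantity satisfies both constraints.
    """
    if suggested_qty <= 0:
        return 0
    if moq > receiving_capacity:
        return 0  # the smallest allowed order does not fit in the store this week
    qty = max(suggested_qty, moq)        # never order below the MOQ
    return min(qty, receiving_capacity)  # never ship more than the store can receive


def feasible_discount(suggested_discount, max_discount, weeks_since_launch, earliest_markdown_week):
    """Clamp a suggested markdown to the category floor margin and the contract calendar."""
    if weeks_since_launch < earliest_markdown_week:
        return 0.0  # the contract forbids any markdown this early
    return min(suggested_discount, max_discount)


# The forecast suggests 50 units, the supplier imposes 200.
print(feasible_order_qty(50, moq=200, receiving_capacity=500))
# The model suggests -45%, the category floor margin allows -35% at most.
print(feasible_discount(0.45, max_discount=0.35, weeks_since_launch=10, earliest_markdown_week=8))
```

Note that rounding 50 units up to an MOQ of 200 is feasible but not necessarily wise: whether to place that order at all is again a trade-off, not something the constraint filter can settle alone.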

Reason #3: a forecast is probabilistic, a decision is binary

A good ML model doesn't give a single prediction: it gives a distribution. "There's a 70% chance weekly sales will be between 8 and 12 units, 20% above, 10% below." This information is valuable: it quantifies the uncertainty around the prediction.

But in real life, you can't replenish "70% of an order." You can't mark down "probably at -25%." At some point, you have to commit: order or not, mark down or not, transfer or not. This translation from probabilistic continuum to binary decision requires threshold logic, error-cost logic, risk aversion. Again: not in the forecast.
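One classical way to make that translation is the newsvendor rule: order up to the demand quantile given by the ratio of stockout cost to total error cost. The sketch below is a textbook technique swapped in for illustration, not a description of any particular system. The three-point distribution loosely mirrors the 10% / 70% / 20% example above (each band collapsed to one representative value), and the cost figures are assumptions.

```python
def order_up_to(demand_pmf, stockout_cost, overstock_cost):
    """Newsvendor rule on a discrete demand distribution.

    demand_pmf maps a demand level (units) to its probability.
    Returns the smallest level q with P(demand <= q) >= critical ratio.
    """
    critical_ratio = stockout_cost / (stockout_cost + overstock_cost)
    cumulative = 0.0
    for units in sorted(demand_pmf):
        cumulative += demand_pmf[units]
        if cumulative >= critical_ratio:
            return units
    return max(demand_pmf)  # guard against rounding in the cumulative sum


# Hypothetical distribution: 10% low, 70% central, 20% high.
weekly_demand = {6: 0.10, 10: 0.70, 14: 0.20}

# A missed sale costs 6 per unit, a leftover unit costs 2: cover the central case.
print(order_up_to(weekly_demand, stockout_cost=6, overstock_cost=2))
# A missed sale costs 18 per unit: cover the high case as well.
print(order_up_to(weekly_demand, stockout_cost=18, overstock_cost=2))
```

The same distribution yields two different orders depending on the error costs. The forecast is identical in both calls; only the economics changed.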

Reason #4: a forecast is local, a decision is systemic

Retail ML models generally predict at the SKU/store/week level. Each prediction is made independently, or with little regard for the other predictions. In reality, retail decisions are interdependent. Marking down product A can cannibalize product B's sales. Transferring stock between two stores creates a global logistics cost. Replenishing one category consumes receiving capacity no longer available for another.

A retail decision at scale is a problem of systemic constrained optimization, not a sum of individual predictions. Handing the final decision to an aggregate of SKU/store forecasts means ignoring the entire orchestration dimension that makes the difference between effective and incoherent steering.
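A small example of that interdependence: several SKUs compete for one store's receiving capacity. The sketch below uses a greedy allocation by expected margin per unit, which is only a simple heuristic chosen for readability; a production system would more likely hand this to a linear or mixed-integer solver. The data layout and numbers are assumptions.

```python
def allocate_capacity(requests, capacity):
    """Share a store's receiving capacity between competing replenishment requests.

    requests: list of (sku, requested_units, expected_margin_per_unit).
    Greedy heuristic: serve the highest margin per unit first.
    """
    plan = {}
    remaining = capacity
    for sku, units, _margin in sorted(requests, key=lambda r: r[2], reverse=True):
        shipped = min(units, remaining)
        plan[sku] = shipped
        remaining -= shipped
    return plan


# Each request looks reasonable on its own; together they exceed what the store can receive.
requests = [("A", 60, 4.0), ("B", 50, 9.0), ("C", 40, 2.5)]
plan = allocate_capacity(requests, capacity=100)
print(plan)
```

Taken one by one, all three replenishments are justified by their forecasts. Only a view across SKUs reveals that A must be cut and C dropped this week.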

Reason #5: a forecast is silent on what action to take

This is the most fundamental limit, and the one that makes all the others operative. Even if a model has perfectly predicted that a product will underperform in a store, it doesn't say what to do. Launch a markdown? Transfer the stock? Activate a targeted promotion? Request a supplier return? Do nothing and accept the loss?

Each of these actions has a cost, a delay, a risk, consequences on other business dimensions. Choosing between them isn't prediction anymore — it's prescription. And prescription requires a software layer different from the ML model: a decision engine that, based on forecasts and business rules, formulates the right action recommendation for each case.
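In its simplest form, the prescription step can be sketched as ranking candidate actions by expected net value, with "do nothing" as the fallback. The `CandidateAction` fields, the action names, and the figures are hypothetical; in practice the gains would come from forecasts and the `allowed` flag from business rules.

```python
from dataclasses import dataclass


@dataclass
class CandidateAction:
    name: str
    expected_gain: float  # margin recovered if the action plays out as forecast
    cost: float           # discount, logistics, or handling cost of the action
    allowed: bool = True  # False when a business rule forbids the action


def recommend(candidates):
    """Return the name of the allowed action with the best positive net value."""
    feasible = [c for c in candidates if c.allowed]
    best = max(feasible, key=lambda c: c.expected_gain - c.cost, default=None)
    if best is None or best.expected_gain - best.cost <= 0:
        return "do_nothing"
    return best.name


candidates = [
    CandidateAction("markdown", expected_gain=900.0, cost=400.0),
    CandidateAction("transfer", expected_gain=1100.0, cost=450.0),
    CandidateAction("supplier_return", expected_gain=1500.0, cost=200.0, allowed=False),
]
print(recommend(candidates))
```

The supplier return has the highest net value on paper but is ruled out, so the transfer is recommended. The output is an action, not a number.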

The accuracy trap: why chasing 95% is often a strategic mistake

The industry long confused the performance of a data project with the performance of its forecast. How many projects have been launched with the primary goal of "going from 78% to 90% accuracy on demand forecasting"? How many steering committees have dedicated meetings to analyzing why the model got it wrong on a cluster or category?

It's not that accuracy doesn't matter. It does. But it counts much less than people think — and its return drops sharply past a certain threshold.

Here's why. Forecast accuracy improves a decision only to the extent that the decision is sensitive to it. In most operational cases, the decision isn't that sensitive. A product selling half as fast as its peers will be marked down, whether you predict 5 units per week or 7. A structurally overstocked store will be identified as such whether prediction accuracy is 90% or 95%. The final decision depends on thresholds, rules, trade-offs — not on the second decimal of the prediction.
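The point about sensitivity can be checked with a toy threshold rule. The rule and its 60% threshold are assumptions made for illustration: a SKU is flagged for markdown when its forecast sales rate falls below a fraction of its peers' rate.

```python
def markdown_decision(forecast_units_per_week, peer_units_per_week, threshold=0.6):
    """Flag a SKU for markdown when it sells below a fraction of its peers' rate."""
    return forecast_units_per_week < threshold * peer_units_per_week


peer_rate = 14  # hypothetical weekly sales rate of comparable products

# Two forecasts of different accuracy lead to the same decision.
print(markdown_decision(5, peer_rate), markdown_decision(7, peer_rate))
# Extra accuracy only matters for SKUs whose forecast sits near the threshold.
print(markdown_decision(9, peer_rate))
```

Whether the forecast says 5 or 7, the action is identical. Accuracy gains pay off only for the minority of cases that sit close to a decision boundary.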

Meanwhile, going from 90% to 95% accuracy typically costs more than going from 70% to 90%: you need more data, more features, more training, more governance. And that marginal cost is, in practice, almost always spent on the wrong problem.

The real value reservoir isn't in the last five points of forecast accuracy. It's in turning existing forecasts into executed decisions. That passage from prediction to action holds most of the recoverable margin — and is precisely what most data projects neglect.

The solitary data scientist: a frequent anti-pattern

Let's say it clearly, because it's one of the structural causes of the gap between forecast and decision: in many retail organizations, data and operations live in separate worlds.

On one side, a data team, often attached to IT or a cross-functional leadership, builds increasingly sophisticated models. It talks about RMSE, cross-validation, feature engineering, model drift. It delivers its predictions as files, dashboards, or APIs.

On the other, the merchandising, supply, and operations teams talk about cover, margin floors, sell-through, replenishment, and the commercial calendar. These teams carry the business rules in their heads, know the operational constraints from daily practice, and have a feel for the field that no one else has.

Between the two, the gap is rarely bridged. Data scientists produce technically excellent forecasts disconnected from real business rules. Operations teams receive these forecasts and have no trust in predictions that ignore the constraints they know by heart. Result: the forecast is used at the margin — or not used at all — and decisions keep being made the old way, on experience and Excel files.

It's nobody's fault. It's the result of an organizational architecture that separates those who predict from those who decide. And as long as this separation exists, no ML investment will produce the expected gains.

What to build on top of ML: the decision layer

The question is no longer whether to do machine learning in retail — the answer is obviously yes. The question is: what do you build on top to turn these predictions into operational performance?

Four bricks are necessary.

Brick 1: a decision engine that turns predictions into action recommendations. Based on forecasts (of demand, markdown risk, stockout probability), the engine formulates for each SKU/store a concrete recommendation: transfer, mark down, replenish, activate a promotion, return to supplier, or do nothing. The output is no longer a number, it's an action.

Brick 2: native integration of business rules and constraints. Margin floors, supplier constraints, commercial calendars, logistics capacities, category arbitration rules — all of this must be codified into the engine so recommendations are both optimal and applicable. A perfect recommendation that violates a business rule will never be executed. A less optimal but constraint-compatible recommendation, however, will be implemented.

Brick 3: systemic optimization logic. Beyond individual SKU/store decisions, the engine must account for interactions: cannibalization between products, pooling of logistics costs, allocation of constrained capacities. It's this systemic view that allows moving from a sum of good local intentions to a global network optimization.

Brick 4: execution without a break in the chain. A decision has value only if it's executed. The engine must therefore be connected downstream to execution systems (ERP, WMS, e-commerce, pricing) to propagate validated actions without manual re-entry. This prediction → decision → execution continuum is what turns ML from an analytical exercise into an operational lever.

This four-brick architecture is what separates a retail data project that produces margin from a retail data project that produces dashboards. And it, not forecast accuracy, deserves the bulk of the investment.
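The four bricks can be read as four stages of one pipeline. The skeleton below is a sketch under assumptions, not a reference design: `Recommendation`, `run_pipeline`, and the stub functions are hypothetical names, and each stage would be far richer in a real platform.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    sku: str
    store: str
    action: str


def run_pipeline(forecasts, decide, is_allowed, optimize, execute):
    """Chain the four bricks: decide, check rules, optimize across the network, execute."""
    proposals = [decide(f) for f in forecasts]          # brick 1: prediction -> action
    feasible = [p for p in proposals if is_allowed(p)]  # brick 2: business rules
    plan = optimize(feasible)                           # brick 3: network-level arbitration
    for rec in plan:                                    # brick 4: push to execution systems
        execute(rec)
    return plan


# Stub stages with made-up rules, for illustration only.
forecasts = [
    {"sku": "A", "store": "S1", "weeks_of_cover": 14.0},
    {"sku": "B", "store": "S1", "weeks_of_cover": 3.0},
    {"sku": "C", "store": "S2", "weeks_of_cover": 20.0},
]


def decide(forecast):
    action = "markdown" if forecast["weeks_of_cover"] > 10 else "do_nothing"
    return Recommendation(forecast["sku"], forecast["store"], action)


def is_allowed(rec):
    # Hypothetical rule: SKU C is under a contract that forbids markdowns.
    return not (rec.action == "markdown" and rec.sku == "C")


def optimize(recs):
    # Placeholder for cross-SKU optimization: here it only drops no-op recommendations.
    return [r for r in recs if r.action != "do_nothing"]


sent = []  # stands in for an ERP or pricing system connector
plan = run_pipeline(forecasts, decide, is_allowed, optimize, sent.append)
print(plan)
```

Three forecasts go in and one executed action comes out. The forecast model sits entirely upstream of `run_pipeline`, which is where this article argues most of the investment should go.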

ML's real role in this architecture

Let's be clear: none of this diminishes the value of machine learning. On the contrary — ML remains indispensable in a modern retail decision platform. It provides the inputs without which the decision engine would be blind: demand forecasts, price elasticities, stockout probabilities, risk scores, product segmentations, store clustering.

But its role changes. It stops being the project's purpose to become what it should always have been: a component in a broader architecture whose purpose is operational decision. Project success is no longer measured by model accuracy, but by the quality and speed of decisions executed on the floor.

This paradigm shift has concrete implications. You accept that a forecast at 85% accuracy, exploited by an intelligent decision engine and executed frictionlessly, will produce much more value than a 93% forecast that ends up ignored in a dashboard. You stop chasing decimal points of accuracy to invest in the prediction → decision → action transformation chain. And you realign the organization around a single objective — operational performance — rather than around disjoint technical goals.

The Solya approach: from forecast to executed action

That's precisely the architecture Solya puts in place at retailers who want to turn their data investments into operational performance. Not by proposing yet another forecasting engine competing with existing models — but by building the missing layer that turns predictions, whether they come from Solya or elsewhere, into decisions executed on the floor.

Concretely, Solya connects to your data sources (POS, ERP, e-commerce, supply chain, internal tools) and rebuilds a unified view of the network at the SKU/store level. That data feeds a decision engine combining demand predictions, the chain's business rules, and operational constraints to formulate, continuously, optimal actions: transfers, markdowns, replenishments, supplier returns, or a justified status quo. Validated recommendations are propagated to existing execution systems without manual re-entry.

ML models, whether developed in-house by your data teams or integrated by Solya, are no longer the goal — they become one input among others in a system designed to decide and execute. And it's this end-to-end integration, from forecast to floor action, that makes the economic impact of your data investments measurable.

The real question to ask

It's not "is our forecast accurate enough?". It's "how many operational decisions does our organization make, each week, based on our predictions?".

If the answer counts in the hundreds rather than the tens of thousands, your models are probably producing far more analysis than action. And the next step isn't to perfect the models. It's to build the layer that turns their outputs into executed decisions at scale.

In a sector where combinatorial complexity has long exceeded human arbitration capacity, prediction quality is no longer what makes the difference. It's the ability to convert predictions, continuously and frictionlessly, into concrete network actions. That's where, and nowhere else, the next generation of retail performance gains plays out.


Do your predictions deserve to be executed?

At Solya, we offer retail leadership teams a personalized 30-minute diagnostic to assess, on your own data stack, the gap between your predictive capabilities and your decision capabilities, and to quantify the value you're leaving on the table in that blind spot.

👉 [Book your Solya diagnostic] — 30 minutes, by video, with one of our retail experts.

You'll walk away with:

  • A map of the current chain between your forecasts and your operational decisions
  • An estimate of margin potential recoverable by industrializing this chain
  • The first high-ROI use cases to turn your predictions into floor performance
