Activity forecasting: why mathematical sobriety outperforms Big Data

Jean-Sébastien Mackiewicz and Jérémy Nusa — Hub One DataTrust conference at Global Industrie

At the latest Global Industrie trade show, Jean-Sébastien Mackiewicz and Jérémy Nusa, PhD in mathematics, of Hub One DataTrust, shared a vision that runs counter to the traditional "Big Data" mindset. Between a probabilistic approach and "Ockham's razor", here is a look back at a method in which algorithmic efficiency stems above all from sobriety and domain expertise.

The illusion of volume: the sobriety approach

In an industrial landscape saturated with terms such as "Hyperscalers", "Big Data" or "Digital Twins", the temptation to accumulate data is strong. Yet Hub One DataTrust advocates the opposite approach, grounded in a 14th-century philosophical principle: Ockham's Razor.

As Jérémy Nusa reminded the audience, this principle, later given a mathematical formalisation in Solomonoff's theory of inductive inference, holds that the shortest explanation (or the simplest model) is often the most effective. To achieve robust activity forecasts, you do not need "more" data, but better-chosen data, selected according to three criteria:

  1. Representativeness: data must be anchored in operational reality. In logistics, a vehicle's cargo volume is representative data; the truck brand, far less so.
  2. Stability: using ten years of historical data is pointless if the information systems have changed three times over. Success relies on data whose structure has remained consistent over time.
  3. Relevant historisation: to detect cycles and "patterns", three years of reliable data is often more than enough to build an effective predictive model.
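The third criterion can be illustrated with a minimal sketch (synthetic data, not Hub One DataTrust's actual model): with three years of daily volumes whose structure is stable, a deliberately simple model, one mean per weekday, is enough to recover the weekly cycle, because shorter-lived fluctuations average out over the history.

```python
import math
import statistics

DAYS = 3 * 365  # three years of history, per the "relevant historisation" criterion

# Synthetic daily cargo volumes: a fixed weekly baseline (Mon..Sun)
# plus a quarterly fluctuation standing in for operational noise.
def volume(day: int) -> float:
    weekly = [120, 130, 125, 140, 160, 90, 70][day % 7]
    drift = 10 * math.sin(2 * math.pi * day / 90)
    return weekly + drift

history = [volume(d) for d in range(DAYS)]

# The whole "model" is seven numbers: short, stable, explainable.
seasonal_mean = [
    statistics.mean(history[d] for d in range(DAYS) if d % 7 == w)
    for w in range(7)
]

def forecast(day: int) -> float:
    return seasonal_mean[day % 7]

# Over three years the quarterly drift averages out, so each seasonal
# mean lands close to its weekday baseline.
print([round(m, 1) for m in seasonal_mean])
```

Nothing here requires ten years of data or a large feature set; the weekday index is the one "representative" variable, and the pattern is already detectable.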

AI & Modelling: Stepping out of the "Black Box"

One of the key messages of this talk concerned the nature of the models used. Faced with the rise of Large Language Models (LLMs), Hub One DataTrust reminds us that, for operational industrial forecasting, mathematical rigour must take precedence over model size.

The recommended approach rests on two pillars:

  • Probabilism: a model must natively account for the fact that data is biased or incomplete. By tackling the subject from a probabilistic angle, the algorithm learns to ignore "noise" and anomalies in order to retain only the real underlying trend.
  • Explainability: unlike opaque neural networks, explainable models make it possible to understand why a forecast is made. It is this transparency that secures buy-in from business teams.
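The probabilistic pillar can be sketched in a few lines (illustrative only, with made-up numbers): a robust estimator such as the median discounts anomalies that a naive mean would absorb, retaining the real underlying level.

```python
import statistics

# Daily volumes around a true level of ~100, with two anomalous spikes
# (e.g. a data-entry error and an exceptional one-off order).
observations = [98, 102, 101, 99, 100, 97, 103, 100, 950, 101, 99, 1020]

naive = statistics.mean(observations)     # dragged upward by the spikes
robust = statistics.median(observations)  # unaffected by the two outliers

print(f"mean={naive:.1f}  median={robust:.1f}")
# → mean=247.5  median=100.5
```

The median is also trivially explainable to business teams ("half the days are above, half below"), which is exactly the transparency the second pillar calls for.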

From model to action: the role of the Trusted Third Party

A forecast, however mathematically perfect, only has value if it is actionable. For Hub On