
From Classical Models to AI: A Forecasting Overview

This guide offers a brief overview of the main forecasting methods used today. It covers classical statistics, hybrid approaches, machine learning, deep learning and modern foundation models, showing what each group does well and when it is most useful.

1. Classical & Statistical Methods

ARIMA (AutoRegressive Integrated Moving Average) models a time series using its own past values (AR), differences (I) and past forecast errors (MA). It is very strong for stationary or difference-stationary series with clear autocorrelation structures. SARIMA extends ARIMA with seasonal components, making it suitable for data with weekly, monthly, or yearly patterns. These models are transparent and interpretable, and remain a standard baseline in industry.
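
A minimal SARIMA sketch with statsmodels, assuming a hypothetical monthly series with yearly seasonality (the random data below is only a placeholder):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# placeholder monthly series; replace with real data
y = pd.Series(np.random.randn(120).cumsum(),
              index=pd.date_range("2015-01-01", periods=120, freq="MS"))

# SARIMA(1,1,1)(1,1,1,12): non-seasonal and seasonal AR, differencing and MA terms
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)

forecast = result.get_forecast(steps=12)
print(forecast.predicted_mean)   # 12-month point forecast
print(forecast.conf_int())       # prediction intervals
```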

ETS (Error, Trend, Seasonality) models combine exponential smoothing with an explicit structure for trend and seasonal components. Holt–Winters is a classic ETS-type model that updates level, trend and season in a recursive, adaptive way. It works especially well for short- to medium-term operational forecasts where recent data are more important than older observations. The model is easy to explain and often surprisingly competitive in practice.
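
A Holt–Winters sketch with statsmodels, reusing the hypothetical monthly series y from the ARIMA example:

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# additive trend and additive yearly seasonality (Holt-Winters)
hw = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12)
hw_fit = hw.fit()            # smoothing parameters estimated from the data
print(hw_fit.forecast(12))   # next 12 periods
```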

GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models time-varying volatility rather than the level of the series. It is widely used in finance to forecast risk and volatility of returns. The model assumes that large shocks cluster in time and that volatility itself follows an autoregressive process. While it does not forecast the mean very well, it is essential when volatility and risk are the main focus.
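
A GARCH(1,1) sketch with the arch package, assuming a hypothetical array of daily percentage returns (GARCH is normally fit on returns, not prices):

```python
import numpy as np
from arch import arch_model

# placeholder daily percentage returns; replace with real data
returns = np.random.randn(1000) * 0.8

# GARCH(1,1) with a constant mean: forecasts volatility, not the level
am = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
res = am.fit(disp="off")

fc = res.forecast(horizon=5)
print(fc.variance.iloc[-1])   # forecast conditional variances for horizons 1..5
```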

The Theta method decomposes the time series into “theta lines” that emphasize different aspects of trend and curvature, and then combines their forecasts. It became famous for its strong performance in the M3 and M4 forecasting competitions. The method is conceptually simple but surprisingly powerful and robust across many domains. It is often used as a strong classical benchmark for univariate time-series forecasting.
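
A brief Theta sketch via statsmodels, again using the hypothetical monthly series y:

```python
from statsmodels.tsa.forecasting.theta import ThetaModel

# deseasonalizes, applies the theta decomposition, then reseasonalizes
theta_fit = ThetaModel(y, period=12).fit()
print(theta_fit.forecast(12))
```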

Vector Autoregression (VAR) extends univariate AR models to multiple time series by letting each variable depend on past values of all variables in the system. VARMA adds moving-average terms, allowing both autoregressive and shock dynamics in a multivariate setting. VECM (Vector Error Correction Model) is a special form designed for nonstationary but cointegrated series and focuses on modeling both short-term dynamics and long-run equilibrium relationships. These models are widely used in macroeconomics and finance when feedback effects between variables are important and interpretability of cross-impacts is required.
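
A minimal VAR sketch with statsmodels, assuming a hypothetical frame of two already-stationary (e.g. differenced) series:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# placeholder: two related stationary series, e.g. differenced macro indicators
df = pd.DataFrame(np.random.randn(200, 2), columns=["gdp_growth", "inflation"])

var_fit = VAR(df).fit(maxlags=8, ic="aic")                # lag order chosen by AIC
fc = var_fit.forecast(df.values[-var_fit.k_ar:], steps=4)
print(fc)                                                 # 4-step forecasts for both series
```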

State space models represent a time series through latent states that evolve over time and produce observations through an observation equation. The Kalman filter is the recursive algorithm that estimates these hidden states in real time for linear Gaussian models, delivering optimal forecasts and uncertainty estimates. Many classical models such as ARIMA and exponential smoothing can be written in state space form, but the framework also allows custom components for trend, seasonality, cycles and exogenous effects. State space models handle missing data naturally and are very flexible for building structured, interpretable forecasting systems.
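
A structural state-space sketch using statsmodels' UnobservedComponents on the hypothetical monthly series y; the latent states are estimated internally by the Kalman filter:

```python
from statsmodels.tsa.statespace.structural import UnobservedComponents

# local linear trend plus a yearly seasonal component
uc = UnobservedComponents(y, level="local linear trend", seasonal=12)
uc_fit = uc.fit(disp=False)
print(uc_fit.forecast(12))   # forecasts produced from the filtered/smoothed states
```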

The Croston method is tailored to intermittent demand series with many zeros, such as spare parts or long-tail product sales. It separately updates estimates of demand size and of the interval between nonzero demands, and uses their ratio as the forecast, which avoids the bias that standard methods show on sparse data. The Syntetos–Boylan Approximation (SBA) corrects a known bias in the original method, while the TSB variant adds explicit smoothing for the probability of demand occurrence. These methods are standard baselines in inventory management and service-level planning for items with sporadic demand.
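
A hand-rolled sketch of the classic Croston recursion (not a library implementation) on a toy intermittent-demand vector:

```python
import numpy as np

def croston(demand, alpha=0.1):
    """Classic Croston forecast: smoothed demand size / smoothed demand interval."""
    size = interval = None
    periods_since_demand = 1
    for d in demand:
        if d > 0:
            if size is None:                  # initialize at the first non-zero demand
                size, interval = float(d), float(periods_since_demand)
            else:
                size += alpha * (d - size)
                interval += alpha * (periods_since_demand - interval)
            periods_since_demand = 1
        else:
            periods_since_demand += 1
    return 0.0 if size is None else size / interval

demand = np.array([0, 0, 3, 0, 0, 0, 5, 0, 2, 0, 0, 4])
print(croston(demand))   # flat per-period forecast; SBA would multiply by (1 - alpha / 2)
```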

Markov switching models allow the parameters of a time series model to change according to an unobserved discrete state that follows a Markov chain. Typical examples are models that switch between recession and expansion regimes or between low and high volatility states. Each regime has its own dynamics, for instance different means or variances, and the model estimates both state specific parameters and transition probabilities between states. This framework captures structural breaks and nonlinear behavior that single regime models miss and is widely used in macroeconomics and financial time series analysis.
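
A two-regime Markov switching sketch with statsmodels, reusing the hypothetical returns array from the GARCH example:

```python
from statsmodels.tsa.regime_switching.markov_regression import MarkovRegression

# two regimes with different means and variances (e.g. calm vs. turbulent periods)
ms_fit = MarkovRegression(returns, k_regimes=2, switching_variance=True).fit()
print(ms_fit.summary())

# (nobs, k_regimes) smoothed probabilities of being in each regime at each time step
probs = ms_fit.smoothed_marginal_probabilities
```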

2. Hybrid Models (Statistics × Machine Learning)

Prophet is an additive model that decomposes a time series into trend, seasonality and special events (e.g. holidays). It is designed for business time series that have strong seasonal patterns, growth, and many outliers or missing values. The model automatically detects changepoints in the trend and handles multiple seasonalities (e.g. weekly and yearly). Its main strength is ease of use and interpretability, rather than squeezing out the last possible percentage of accuracy.
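
A minimal Prophet sketch; the column names ds and y are required by the library, and the random data is only a placeholder:

```python
import numpy as np
import pandas as pd
from prophet import Prophet

df = pd.DataFrame({
    "ds": pd.date_range("2020-01-01", periods=730, freq="D"),
    "y": np.random.randn(730).cumsum(),   # placeholder daily series
})

m = Prophet(yearly_seasonality=True, weekly_seasonality=True)
m.fit(df)

future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```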

Auto-ARIMA automates the process of selecting the best ARIMA configuration (p, d, q and seasonal parameters) based on information criteria such as AIC or BIC. It systematically searches over many candidate models and chooses the one that balances fit quality and simplicity. This saves analysts from manual trial-and-error in model selection. It is particularly useful as a robust and scalable baseline in production pipelines.
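
An auto-ARIMA sketch with the pmdarima package on the hypothetical monthly series y:

```python
import pmdarima as pm

auto = pm.auto_arima(y, seasonal=True, m=12,
                     information_criterion="aic",
                     stepwise=True, suppress_warnings=True)
print(auto.order, auto.seasonal_order)   # selected (p, d, q) and seasonal terms
print(auto.predict(n_periods=12))
```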

BATS and TBATS are extensions of exponential smoothing that use Box–Cox transformations, ARMA error structures, trend and trigonometric (Fourier) representations of seasonality. They are designed for time series with multiple and possibly non-integer seasonalities, such as hourly data with daily and weekly patterns. TBATS is especially effective when standard seasonal models fail to capture complex or long seasonal cycles. It is a go-to method when you have many overlapping seasonal patterns.

In STL + ARIMA/ETS, the series is first decomposed into trend, seasonal and remainder components using STL (Seasonal and Trend decomposition using Loess). The remainder is then modeled with ARIMA or ETS, while the seasonal and trend components are forecast separately and recombined. This approach allows more flexibility, because each component can be treated with a different specialized method. It is robust against changes in seasonality and often provides very stable performance.
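
A sketch of the STL-plus-ARIMA pattern using statsmodels' STLForecast on the hypothetical monthly series y:

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.forecasting.stl import STLForecast

# ARIMA is fit to the seasonally adjusted series; the seasonal component
# is forecast by STL and added back automatically
stlf_fit = STLForecast(y, ARIMA, model_kwargs={"order": (1, 1, 1)}, period=12).fit()
print(stlf_fit.forecast(12))
```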

ARIMAX extends ARIMA by including exogenous variables (X), such as marketing spend, weather, or price changes, directly into the model. It allows you to quantify how external drivers influence the target time series while still capturing autocorrelation and seasonality. Dynamic regression is the broader concept of combining regression with time-series structures in the residuals. These models are important when you want explanatory, cause–effect aware forecasts rather than purely pattern-based predictions.
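
An ARIMAX sketch with statsmodels, assuming hypothetical frames exog (historical drivers aligned with y) and exog_future (their known or planned values over the horizon):

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

# exog: e.g. price and promotion columns; exog_future must cover the forecast horizon
model = SARIMAX(y, exog=exog, order=(1, 0, 1))
res = model.fit(disp=False)

fc = res.get_forecast(steps=12, exog=exog_future)
print(fc.predicted_mean)   # forecasts conditional on the assumed future drivers
```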

Ensemble models combine forecasts from multiple individual models, such as ARIMA, ETS, and ML methods, to obtain a more robust overall forecast. The idea is that different models capture different aspects of the data, and averaging them reduces variance and model-specific bias. Ensembles often outperform any single method, especially in real-world noisy data. They are widely used in competitions and production systems for extra stability.

GAMs are regression models where the effect of each predictor is modeled as a smooth, possibly nonlinear function, but the overall model remains additive and interpretable. In time-series forecasting, GAMs can incorporate calendar effects, seasonal patterns, and external drivers with transparent smooth curves. They serve as a middle ground between classical statistics and opaque ML models. This makes them attractive for domains that value both flexibility and interpretability.

Gradient-boosted hybrid models take features from classical time-series analysis (lags, rolling statistics, seasonal indicators) and feed them into boosting algorithms such as XGBoost or LightGBM. The time-series structure is encoded in the features, while the boosting model handles nonlinear interactions. This approach can significantly improve accuracy when many external features are available. It is a pragmatic way to combine domain knowledge from statistics with the power of ML.
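
A sketch of the feature-engineering-plus-boosting pattern with LightGBM; the daily placeholder series and the chosen lags are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

# placeholder daily series; replace with real data
y_daily = pd.Series(np.random.randn(400).cumsum(),
                    index=pd.date_range("2023-01-01", periods=400, freq="D"))

def make_features(y: pd.Series) -> pd.DataFrame:
    """Turn a daily series into a tabular frame of lag, rolling and calendar features."""
    df = pd.DataFrame({"y": y})
    for lag in (1, 7, 28):
        df[f"lag_{lag}"] = y.shift(lag)
    df["rolling_mean_7"] = y.shift(1).rolling(7).mean()
    df["dayofweek"] = y.index.dayofweek
    df["month"] = y.index.month
    return df.dropna()

feats = make_features(y_daily)
X, target = feats.drop(columns="y"), feats["y"]

model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X[:-28], target[:-28])    # hold out the last 28 days for evaluation
pred = model.predict(X[-28:])
```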

Conformal prediction is not a forecasting model itself but a calibration layer that can be added on top of any model to produce valid prediction intervals. It uses past residuals to determine how wide intervals should be to achieve a desired coverage probability. The method is distribution-free and makes few assumptions about the underlying model. This makes it very attractive in practice for robust, trustworthy uncertainty quantification.
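
A minimal split-conformal sketch; y_cal, yhat_cal and yhat_test are hypothetical arrays of calibration actuals, calibration predictions and test predictions from any point forecaster:

```python
import numpy as np

abs_residuals = np.abs(y_cal - yhat_cal)      # errors on a held-out calibration window

alpha = 0.1                                   # target 90% coverage
q = np.quantile(abs_residuals, 1 - alpha)     # a finite-sample correction is often added

lower = yhat_test - q                         # symmetric conformal interval
upper = yhat_test + q
```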

ES-RNN is a hybrid model that combines exponential smoothing components with a recurrent neural network, originally developed for the M4 forecasting competition. Exponential smoothing handles local level and seasonality updates, while the RNN learns complex patterns across many series at once from shared representations. This design brings strong inductive bias from classical time-series modeling together with the flexibility of deep learning. ES-RNN demonstrated that such hybrids can significantly outperform both pure statistical and pure neural approaches on large forecasting benchmarks.

NeuralProphet is a neural extension of Facebook Prophet that keeps the familiar additive structure of trend, seasonality and holiday effects but augments it with learned autoregressive and lagged components. Built on PyTorch, it can use neural networks to model parts of the signal that are difficult for simple linear components, while still exposing clear decompositions for interpretability. NeuralProphet supports multiple seasonalities, future covariates and custom events, and often improves accuracy over Prophet when the data exhibits nonlinear or more complex dynamics. It aims to balance ease of use and explainability with the extra power of deep learning.

3. Machine Learning Models

Random Forests are ensembles of decision trees trained on bootstrapped samples of the data with random feature subsets. In time series, they operate on features like lags, moving averages, and calendar indicators rather than raw sequences. They are robust to noise, handle nonlinear relationships, and provide variable importance measures. However, they do not natively model temporal dependence, so good feature engineering is critical.

XGBoost is a highly optimized gradient-boosting library that builds trees sequentially to correct previous errors. In forecasting, it is used on engineered time-series features such as lags, seasonal dummies, and external variables. It often delivers top performance in competitive environments and Kaggle-style projects. Its main trade-off is lower interpretability compared to classical models, but tools like SHAP can help explain predictions.

LightGBM is another gradient-boosting framework, optimized for speed and scalability using histogram-based algorithms and leaf-wise tree growth. It can handle large feature sets and high-frequency time-series data efficiently. With appropriate feature engineering, it can capture complex nonlinear relationships and interactions between variables. It is often used in production for its combination of speed, accuracy, and reasonable resource usage.

Support Vector Regression (SVR) uses kernel methods to learn nonlinear relationships between input features and the target variable. For time series, features are typically lags, differences, and seasonality indicators. SVR can perform very well on moderate-sized datasets with clean patterns but does not scale easily to very large datasets. It is also less straightforward to interpret compared to linear or additive models.

kNN forecasting predicts future values based on similar patterns observed in the past. It searches for historical windows that resemble the current situation and uses their subsequent values as a forecast. This approach is simple and intuitive, and can work well when repeating motifs exist in the data. However, performance degrades when the series changes regime or when the dimensionality of features becomes very high.

Quantile regression adaptations of forests and boosting models predict specific quantiles of the future distribution rather than a single mean forecast. This allows direct estimation of prediction intervals and risk measures. These methods are useful when asymmetric risks matter, for example when underestimating demand is far more costly than overestimating it. They can be used standalone or combined with conformal prediction techniques.
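
A sketch of quantile forecasting with scikit-learn's gradient boosting; X_train, y_train and X_test are hypothetical tabular feature arrays built from the series:

```python
from sklearn.ensemble import GradientBoostingRegressor

# one model per quantile of interest
quantiles = (0.1, 0.5, 0.9)
models = {q: GradientBoostingRegressor(loss="quantile", alpha=q, n_estimators=300)
          for q in quantiles}
for q, m in models.items():
    m.fit(X_train, y_train)

lower = models[0.1].predict(X_test)    # pessimistic scenario
median = models[0.5].predict(X_test)   # central forecast
upper = models[0.9].predict(X_test)    # optimistic scenario
```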

Rule-based models derive human-readable rules from tree ensembles and then use them in a linear model. For time series, rules might describe conditions such as “if last week’s sales were high and it is December, then demand will likely be very high.” This gives a bridge between flexible ML models and interpretable business logic. Such models are especially helpful when stakeholders need to understand decision logic in plain language.

Feature-based approaches automatically compute hundreds or thousands of statistical features from time series (e.g., autocorrelation, trend strength, entropy). These features are then used with generic ML models such as random forests or boosting. This strategy allows you to reuse powerful tabular ML algorithms while capturing rich time-series characteristics. It is particularly useful when you have many short time series and want to classify or forecast them in a unified way.

CatBoost is a gradient boosting library that is particularly strong with categorical features due to its ordered target statistics encoding and specialized regularization. In time series forecasting it is typically used on tabular features derived from the series, such as lags, rolling statistics, calendar variables and categorical identifiers for items or locations. CatBoost often matches or exceeds the performance of other tree boosting libraries while requiring less manual preprocessing of categorical variables. It is a strong candidate for production-ready forecasting models when there are many covariates and complex interactions.

4. Deep Learning & AI Models
4.1 Recurrent & Convolutional Models

LSTMs are recurrent neural networks designed to capture long-term dependencies by using gates to control information flow. They can model complex nonlinear relationships, multiple inputs, and long history in time-series data. LSTMs are widely used in forecasting, especially when simple linear models fail to capture intricate patterns. However, they can be slow to train and may require careful tuning and regularization.
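
A minimal PyTorch LSTM forecaster sketch (architecture only; the training loop and data preparation are omitted):

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Maps a history window to a one-step-ahead forecast."""
    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, window_length, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # predict from the last hidden state

model = LSTMForecaster()
windows = torch.randn(32, 48, 1)          # 32 placeholder windows of length 48
print(model(windows).shape)               # torch.Size([32, 1])
```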

GRUs are a simplified variant of LSTMs with fewer gates and parameters. They often achieve similar performance to LSTMs but train faster and are easier to tune. In forecasting, GRUs are commonly used for multivariate sequences and problems with moderate sequence length. They are a good default choice when moving from classical models to deep sequence models.

TCNs use dilated, causal convolutions to capture long-range temporal dependencies without recurrence. They process sequences in parallel, making them faster and more stable to train than many RNNs. For forecasting, TCNs can model both short- and long-term patterns effectively. They are particularly attractive for large-scale applications where training speed matters.

BiTCN extends TCNs by using bidirectional convolutions to learn from both past and future context in a training window. During training it can exploit the full sequence, improving feature extraction and pattern recognition. At prediction time, it typically uses only past context, but with richer learned representations. This can lead to more accurate forecasts, especially for complex and noisy time series.

DeepAR is a probabilistic recurrent neural network model designed to forecast many related time series simultaneously. It learns a global model across all series, which is especially useful when each individual series is short or noisy. The model outputs full predictive distributions rather than just point forecasts. It is heavily used in large-scale retail and supply-chain forecasting.

Encoder–decoder LSTM models, also called sequence-to-sequence (Seq2Seq) models, use one recurrent network to encode a history window into a latent representation and a second network to decode that representation into future values. This architecture is well suited to multi-step forecasting because it can generate an entire forecast trajectory instead of predicting each horizon independently. Attention mechanisms can be added so the decoder focuses on the most relevant past time steps when producing each future point. Seq2Seq models are flexible and powerful for multivariate, multi-horizon forecasting but usually require substantial data and careful regularization.

4.2 Transformer-Based Models

Transformers use self-attention mechanisms to weigh different time points in the input sequence and capture long-range dependencies without recurrence. They process all time steps in parallel, which scales well with modern hardware. For time series, they can handle long histories and heterogeneous inputs, including categorical and continuous covariates. The main challenges are data requirements and computational cost for very long sequences.

TFT is a specialized transformer architecture for forecasting that combines attention with recurrent layers and variable selection networks. It can handle static features, known future inputs (e.g. holidays), and observed past covariates in a unified way. TFT also provides interpretability through attention weights and variable selection scores, showing which inputs mattered when. This makes it attractive for industrial forecasting where both accuracy and explainability are required.

iTransformer “inverts” the standard time-series representation: instead of treating each time step as a token, it treats each variable’s series as a token and applies attention across variables. This can improve performance in multivariate settings where relationships between variables are as important as temporal patterns. It is designed to be parameter-efficient while still capturing complex dependencies. As a modern architecture, it often outperforms earlier transformer variants on benchmark datasets.

PatchTST segments the time series into overlapping “patches,” similar to how Vision Transformers operate on image patches. Each patch is embedded and processed by transformer layers, allowing the model to learn local and global patterns efficiently. This patch-based representation helps handle long context windows without exploding computational cost. PatchTST has shown strong performance on long-horizon forecasting tasks.

TimesNet maps one-dimensional time series into multiple two-dimensional representations, one per dominant period found in the frequency domain, and processes them with convolutional blocks. This allows the model to capture periodic patterns at multiple scales in a structured way. It has been reported to excel on diverse long-term forecasting benchmarks. TimesNet is particularly suitable when the data exhibits strong multi-periodic behavior.

Informer introduces probabilistic sparse self-attention to make transformers scalable to very long input sequences. It selectively focuses attention on the most informative time steps rather than all of them. This reduces memory and computation while maintaining good forecasting performance. It is often used for long-horizon forecasting in domains like energy and traffic.

Autoformer integrates an “auto-correlation” mechanism to explicitly model periodic patterns and long-term dependencies. It decomposes the input series into trend and seasonal parts and processes them separately. This decomposition helps the model handle long sequences without losing focus on main periodicities. Autoformer is particularly targeted at long-term forecasting tasks with clear periodic structure.

FEDformer (Frequency-Enhanced Decomposed Transformer) operates partly in the frequency domain, decomposing series into components and applying attention selectively. By working in frequency space, it can capture periodic and seasonal behaviors more efficiently. It combines time-domain and frequency-domain information to improve robustness. This architecture is especially useful when data exhibits complex, overlapping seasonal cycles.

Moirai is a recent time-series foundation model that uses transformer-based architectures optimized for multi-scale, multi-horizon forecasting. It is designed to be pretrained on large time-series corpora and then adapted to specific tasks. The model emphasizes efficiency and flexible context windows, making it suitable for real-world deployments. Moirai aims to act as a general-purpose backbone for many forecasting problems.

ETSformer is a transformer architecture that explicitly incorporates ideas from exponential smoothing and classical ETS models into its design. It decomposes time series into level, trend and seasonal components and uses specialized attention and smoothing-like operations to model each component. By embedding these inductive biases, ETSformer produces forecasts that extrapolate trends and seasonal patterns more reliably and are easier to interpret than generic transformers. It often delivers strong performance on long-horizon tasks where stable trend and seasonality modeling are crucial.

Crossformer is a transformer variant tailored to multivariate time series that emphasizes cross-variable dependencies. It organizes the data as a two-dimensional structure of variables and time segments and uses attention both along the temporal dimension and across variables. This design helps the model learn how different series influence each other, for example how weather variables relate to energy demand or how multiple sensors interact in an industrial system. Crossformer is particularly effective when high-dimensional time series require modeling both temporal patterns and rich inter-series relationships.

4.3 Modern Foundation & Mixture-of-Experts Models

TimeGPT is a large foundation model for time series, pretrained on massive collections of temporal data. It can be adapted to new tasks with minimal fine-tuning or even used “out of the box” for many forecasting problems. The model aims to provide strong baseline performance without extensive model engineering. It reflects a shift toward “time-series-as-a-service” with powerful generic backbones.

Chronos is a token-based, pretrained time-series model that treats numeric values as discretized tokens, similar to language models. It learns generic temporal patterns and probabilistic behavior across many datasets. At inference time, it can generate full predictive distributions for new series with minimal adaptation. Chronos is particularly interesting for probabilistic forecasting and risk-sensitive applications.

Time-LLM refers to architectures that combine large language models (LLMs) with time-series inputs, often via prompt engineering or learned adapters. The idea is to leverage the reasoning and pattern-recognition capabilities of LLMs for forecasting, anomaly detection, or scenario analysis. Time-LLM systems may describe uncertainty and context in natural language, making outputs more accessible to non-technical stakeholders. This is an emerging area where best practices are still evolving.

TiDE (Time-series Dense Encoder) is a neural architecture built from a dense encoder–decoder structure, using relatively simple MLP building blocks instead of attention. It focuses on efficiency and strong performance on practical, real-world forecasting tasks. TiDE models can handle multiple covariates and long input windows while remaining relatively lightweight. This makes them attractive for industrial deployments where latency and resource usage matter.

Lag-Llama is a decoder-only transformer foundation model for probabilistic forecasting, pretrained on a large corpus of time series and using lagged values of the series as its input features. This lag-based representation lets it capture dependencies at multiple delays without hand-crafted features. The model generalizes across multiple datasets and tasks, offering useful zero-shot performance that improves further with fine-tuning. Its strength lies in flexible, transferable representation learning for varied time-series problems.

TSMixer adapts the MLP-Mixer idea from computer vision to time series, using simple multi-layer perceptrons to mix information across time and feature dimensions. This results in a relatively lightweight architecture with fewer parameters than many transformer models. Despite its simplicity, TSMixer can achieve competitive performance on several benchmarks. It is appealing when you want deep learning benefits without the complexity and cost of full transformers.

TimeMixer is a related architecture that mixes information across multiple temporal scales, combining coarse (downsampled) and fine-grained views of the series through specialized mixing layers. It aims to leverage both local and global patterns by mixing information across scales efficiently. This can reduce training cost while maintaining strong forecasting accuracy. TimeMixer belongs to a broader family of models that seek to replace heavy attention mechanisms with more efficient mixing operations.

SOFTS (Series-cOre Fused Time Series forecaster) is an efficient MLP-based model for multivariate forecasting. It aggregates information from all series into a shared “core” representation and then redistributes it back to each individual series, capturing cross-series dependencies without expensive pairwise attention. It has been reported to match attention-based models on several multivariate benchmarks at much lower computational cost. This makes it attractive when many related series must be forecast jointly and efficiently.

RMoK (Reversible Mixture of KAN experts) combines reversible instance normalization with a mixture of Kolmogorov–Arnold network experts. It represents complex nonlinear relationships as compositions of simpler learnable functions, organized in a mixture-of-experts style where a gate routes each input to suitable experts. This allows the model to capture a wide range of behaviors with good approximation power. RMoK can be seen as a bridge between functional approximation theory and practical sequence modeling.

TimeXer is a transformer-based forecaster designed to bring exogenous variables into the model: it applies patch-level self-attention to the target (endogenous) series and cross-attention to series-level representations of the exogenous inputs. This lets external drivers inform the forecast without overwhelming the temporal modeling of the target itself. It is particularly useful when covariates such as weather, prices or promotions carry much of the predictive signal. TimeXer is a strong choice when exogenous information matters as much as the series’ own history.

Time-MoE (Time Mixture-of-Experts) is a broader family of models that use multiple expert subnetworks and a gating function for time-series forecasting. Different experts might focus on, for example, short-term dynamics, long-term trends, or specific seasonalities. The mixture structure allows the overall system to adapt to various patterns without being overly complex locally. Time-MoE models can scale to large datasets while keeping inference efficient through sparse expert activation.

KANs are neural networks inspired by the Kolmogorov–Arnold representation theorem, which states that multivariate continuous functions can be decomposed into sums of univariate functions and their compositions. Instead of standard linear layers, KANs use learnable basis functions and spline-like operations. For time series, they can model complex nonlinear relationships with fewer parameters and potentially better interpretability. They are a promising alternative to classical MLPs.

KAN Experts are mixture-of-experts architectures where each expert is a Kolmogorov–Arnold Network. The gating network selects which KAN experts to activate for a given input pattern. This combines the expressive power of KANs with the flexibility of expert mixtures, allowing different experts to specialize in different temporal regimes. Such models aim to achieve strong performance with structured, interpretable functional components.

N-BEATS is a deep neural architecture designed specifically for univariate forecasting, built from stacks of fully connected blocks that each produce a backcast and a forecast and are linked by doubly residual connections. It learns trend and seasonality components directly from data without hand-crafted features. The model achieved state-of-the-art performance in major forecasting competitions and is relatively simple to implement. It is widely used as a strong deep-learning baseline for many time-series tasks.

N-HiTS (Neural Hierarchical Interpolation for Time Series) extends N-BEATS with multi-rate, hierarchical interpolation of signals. It is particularly powerful for long-horizon forecasting where different resolutions of the series matter. By modeling low- and high-frequency components at different scales, N-HiTS manages to capture both overall shape and local details. It often outperforms earlier deep models on long-term benchmarks.

DeepFactor combines global neural networks with local state-space models to capture both shared patterns across series and idiosyncratic behavior of each individual series. The global network learns latent factors, while the local component accounts for series-specific dynamics. This leads to flexible probabilistic forecasts for large collections of related time series. It is particularly useful in domains like retail, where thousands of products share common dynamics.

Intermittent time series are characterized by many zeros and occasional non-zero spikes, common in spare-parts or long-tail demand. Classical models like Croston’s method and its variants (SBA, TSB) estimate demand size and inter-arrival time separately. Neural variants extend these ideas with recurrent or transformer architectures designed to handle sparse and bursty patterns. They aim to improve accuracy on long-tail items where standard models tend to underperform.

Synthefy Migas is a lightweight Mixture-of-Experts forecasting model that intelligently combines multiple pretrained time series foundation models. Instead of relying on a single general-purpose backbone, Migas learns the individual biases and strengths of different experts and adapts them to new data with minimal fine-tuning. This enables strong zero-shot performance and fast customization on domain-specific datasets. With only around ten million parameters, Migas achieves state-of-the-art accuracy on benchmarks like GIFT Eval and consistently outperforms its individual experts as well as classical statistical baselines. The model is not open source and is available exclusively through Synthefy’s platform.

5. Probabilistic & Uncertainty-Focused Methods

Conformal prediction provides a general way to construct prediction intervals with guaranteed coverage, assuming only exchangeability of data. For time series, it is often applied on top of point predictors by analyzing recent residuals. The method can adapt intervals to changing volatility or regime shifts. Its main advantage is that it is model-agnostic and can be applied to classical, ML, or deep-learning models alike.

Gaussian Process Regression models a distribution over functions rather than fixed parameters, defined by a mean function and a covariance kernel. For time series forecasting, kernels can encode assumptions such as smooth trends or periodic seasonality, and the model produces a Gaussian predictive distribution for each future time point. This provides natural and coherent uncertainty estimates that increase where data is sparse or noisy. Gaussian processes are very flexible and interpretable but scale poorly with long series or large numbers of observations, so they are best suited to moderate sized problems or used with sparse approximations.
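
A Gaussian process sketch with scikit-learn on the hypothetical monthly series y, using the time index as the only feature; the kernel choice (smooth trend plus yearly periodicity plus noise) is an illustrative assumption:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

t = np.arange(len(y)).reshape(-1, 1)
kernel = RBF(length_scale=30.0) + ExpSineSquared(periodicity=12.0) + WhiteKernel()

gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t, y.values)

t_future = np.arange(len(y), len(y) + 12).reshape(-1, 1)
mean, std = gp.predict(t_future, return_std=True)   # predictive mean and std per step
```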

Bayesian deep learning approaches adapt neural networks to produce uncertainty estimates instead of only point forecasts. Monte Carlo Dropout keeps dropout active at prediction time and performs multiple forward passes, interpreting the variation in outputs as model uncertainty. Deep ensembles train several independent networks and use the spread of their predictions as a measure of uncertainty. Both techniques are simple to implement with existing architectures and convert standard deep forecasting models into probabilistic ones, although their intervals often benefit from additional calibration such as conformal prediction.
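
A Monte Carlo Dropout sketch for any PyTorch forecasting model that contains dropout layers; model and x are placeholders:

```python
import torch

def mc_dropout_predict(model, x, n_samples=100):
    """Keep dropout active at inference and aggregate many stochastic forward passes."""
    model.train()                              # .train() keeps dropout layers switched on
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)   # point forecast and spread
```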

6. Frameworks, Libraries & Tools

Darts is a Python library that offers a unified interface for many forecasting models, from ARIMA and ETS to N-BEATS, TFT, PatchTST and more. It simplifies experiments by providing consistent APIs for training, backtesting, and ensembling. Users can quickly compare classical, ML, and deep-learning approaches on the same data. It is very convenient for building end-to-end forecasting workflows.
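
A typical Darts workflow sketch with placeholder monthly sales data; the same fit/predict pattern applies to its ML and deep-learning models:

```python
import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.models import ExponentialSmoothing
from darts.metrics import mape

# placeholder monthly sales frame; replace with real data
df = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=120, freq="MS"),
    "sales": np.random.rand(120) * 100 + 50,
})

series = TimeSeries.from_dataframe(df, time_col="date", value_cols="sales")
train, val = series[:-24], series[-24:]

model = ExponentialSmoothing()
model.fit(train)
forecast = model.predict(24)
print(mape(val, forecast))   # swap in NBEATSModel, TFTModel, etc. with the same API
```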

GluonTS is a deep-learning toolkit for probabilistic time-series modeling, originally built on MXNet and now with PyTorch integration. It includes reference implementations of models like DeepAR, DeepState, DeepFactor, and Transformer-based variants. The library emphasizes probabilistic forecasting and evaluation, including predictive intervals and distributions. It is particularly used in research and industrial projects around large-scale forecasting.

sktime is a scikit-learn–like framework for time-series analysis in Python. It supports forecasting, classification, and transformation of time series with a unified API. You can benchmark classical and ML models, compose pipelines, and integrate with scikit-learn tools. It is especially useful for systematic comparisons and reproducible experiments.

neuralforecast is a library focused on modern deep-learning models for forecasting, including N-BEATS, N-HiTS, PatchTST, TimesNet, TFT and others. It provides efficient implementations optimized for GPUs and large datasets. The library integrates well with typical Python data science stacks and includes utilities for evaluation and hyperparameter tuning. It is well suited for users who want cutting-edge deep models without reinventing the wheel.

StatsForecast is a Python library optimized for fast and scalable classical forecasting methods such as ARIMA, ETS, Theta and Croston variants. It uses efficient compiled code and parallelization to fit models on large collections of series, making it suitable for industrial settings with thousands or millions of time series. The library provides automatic model selection and interval forecasts and is often used to build strong statistical baselines or full production systems. It integrates well with modern deep learning toolkits, allowing hybrid pipelines that combine classical and neural models.
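
A StatsForecast sketch, assuming a hypothetical long-format frame df_long with the library's expected columns unique_id, ds and y; exact API details may differ between versions:

```python
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS

sf = StatsForecast(
    models=[AutoARIMA(season_length=12), AutoETS(season_length=12)],
    freq="MS",
    n_jobs=-1,
)
sf.fit(df_long)
fc = sf.predict(h=12, level=[90])   # point forecasts plus 90% intervals per model
```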

PyTorch Forecasting is a high level library built on PyTorch Lightning that simplifies training of advanced deep learning models for time series, such as Temporal Fusion Transformer, DeepAR and generic Seq2Seq architectures. It offers convenient dataset abstractions, automatic handling of covariates and time indexing, and built in tools for backtesting and hyperparameter tuning. The library also includes interpretation utilities, for example visualizing attention weights or feature importances for TFT. It significantly reduces boilerplate for deep forecasting projects and helps practitioners move from raw data to trained models quickly.

Kats is a general purpose toolkit for time series analysis released by Meta. It bundles forecasting models, anomaly and change point detection algorithms, feature extraction methods and simulation tools under one interface. For forecasting, Kats provides wrappers around models like Prophet, ARIMA and various ensemble strategies, as well as integrations with NeuralProphet. It is particularly useful in exploratory work where one needs to try different approaches quickly and combine forecasting with diagnostics such as change point detection or feature based analysis.

Orbit is a Python package from Uber for Bayesian time series modeling and forecasting. It focuses on structural models such as local and global trend, seasonality and regression components, fitted using Bayesian inference engines like Stan or Pyro. Orbit provides simple interfaces to obtain full posterior predictive distributions, enabling credible intervals and causal impact style analyses. It is well suited for applications where interpretability and honest quantification of uncertainty are as important as point forecast accuracy.
