
From Classical Models to AI: A Forecasting Overview

This guide offers a brief overview of the main forecasting methods used today. It covers classical statistics, hybrid approaches, machine learning, deep learning and modern foundation models, showing what each group does well and when it is most useful.

1. Classical & Statistical Methods

ARIMA (AutoRegressive Integrated Moving Average) models a time series using its own past values (AR), differences (I) and past forecast errors (MA). It is very strong for stationary or difference-stationary series with clear autocorrelation structures. SARIMA extends ARIMA with seasonal components, making it suitable for data with weekly, monthly, or yearly patterns. These models are transparent and interpretable, and remain a standard baseline in industry.
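The AR and I parts can be illustrated with a minimal sketch, assuming a reduced ARIMA(1,1,0) fitted by ordinary least squares. The MA and seasonal terms are left out, and the function names and toy series are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_ar1_on_differences(y):
    """Least-squares fit of d[t] = c + phi * d[t-1] on first differences."""
    d = np.diff(np.asarray(y, dtype=float))            # "I": difference once
    X = np.column_stack([np.ones(len(d) - 1), d[:-1]])
    c, phi = np.linalg.lstsq(X, d[1:], rcond=None)[0]  # "AR": regress on own past
    return float(c), float(phi)

def forecast_levels(y, c, phi, steps):
    """Iterate the fitted recursion on differences, then cumulate back to levels."""
    level, diff = float(y[-1]), float(y[-1] - y[-2])
    path = []
    for _ in range(steps):
        diff = c + phi * diff
        level += diff
        path.append(level)
    return path

# Hypothetical toy series whose differences follow d[t] = 1 + 0.5 * d[t-1] exactly.
diffs = [4.0]
for _ in range(7):
    diffs.append(1.0 + 0.5 * diffs[-1])
y = np.concatenate([[0.0], np.cumsum(diffs)])

c, phi = fit_ar1_on_differences(y)
forecast = forecast_levels(y, c, phi, steps=3)
print(c, phi, forecast)
```

In practice a library implementation with maximum-likelihood estimation would normally be preferred. The sketch only shows how differencing and autoregression fit together.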

ETS (Error, Trend, Seasonality) models combine exponential smoothing with an explicit structure for trend and seasonal components. Holt–Winters is a classic ETS-type model that updates level, trend and season in a recursive, adaptive way. It works especially well for short- to medium-term operational forecasts where recent data are more important than older observations. The model is easy to explain and often surprisingly competitive in practice.
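The recursive update of level, trend and season can be sketched as follows for the additive Holt–Winters form. The smoothing parameters and the naive initialization from the first two seasons are assumptions made for illustration. A real implementation would estimate them from the data.

```python
def holt_winters_additive(y, period, alpha=0.3, beta=0.1, gamma=0.2, steps=4):
    """Additive Holt-Winters recursion; needs at least two full seasons of data."""
    # Naive initialization (an assumption, not an optimized starting point).
    level = sum(y[:period]) / period
    trend = (sum(y[period:2 * period]) - sum(y[:period])) / period ** 2
    season = [y[i] - level for i in range(period)]

    for t in range(period, len(y)):
        s = season[t % period]
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (y[t] - level) + (1 - gamma) * s

    n = len(y)
    # h-step-ahead forecast: extrapolate level and trend, reuse the matching season.
    return [level + h * trend + season[(n - 1 + h) % period]
            for h in range(1, steps + 1)]

# Hypothetical purely seasonal series with no trend.
history = [10.0, 20.0, 30.0, 40.0] * 3
hw_forecast = holt_winters_additive(history, period=4)
print(hw_forecast)
```

Larger values of alpha, beta and gamma make the model react faster to recent observations, which is the sense in which recent data are weighted more heavily than older data.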

GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models time-varying volatility rather than the level of the series. It is widely used in finance to forecast risk and volatility of returns. The model assumes that large shocks cluster in time and that volatility itself follows an autoregressive process. GARCH says little about the conditional mean, which is usually modeled separately (for example with an ARMA component), but it is essential when volatility and risk are the main focus.
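The autoregressive behaviour of volatility can be sketched with the GARCH(1,1) variance recursion. The parameter values below are hypothetical and are assumed to be given. Estimating them, typically by maximum likelihood, is not shown.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) with known parameters.

    sigma2[t+1] = omega + alpha * r[t]**2 + beta * sigma2[t]
    The recursion starts at the unconditional variance omega / (1 - alpha - beta).
    """
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2  # the last element is the one-step-ahead variance forecast

# Hypothetical returns: calm period, one large shock, calm again.
returns = [0.0, 0.0, 3.0, 0.0, 0.0]
path = garch11_variance(returns, omega=0.1, alpha=0.1, beta=0.8)
print(path)
```

After the shock the variance jumps and then decays only gradually, which is the volatility clustering the model is built to capture.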

The Theta method decomposes the time series into “theta lines” that emphasize different aspects of trend and curvature, and then combines their forecasts. It became famous for its strong performance in the M3 forecasting competition and was later used as a standard benchmark in M4. The method is conceptually simple but powerful and robust across many domains. It is often used as a strong classical benchmark for univariate time-series forecasting.

Vector autoregression (VAR) extends univariate AR models to multiple time series by letting each variable depend on past values of all variables in the system. VARMA adds moving-average terms, allowing both autoregressive and shock dynamics in a multivariate setting. The vector error correction model (VECM) is a special form designed for nonstationary but cointegrated series; it models both short-term dynamics and long-run equilibrium relationships. These models are widely used in macroeconomics and finance when feedback effects between variables are important and the cross-variable impacts need to be interpretable.

State space models represent a time series through latent states that evolve over time and produce observations through an observation equation. The Kalman filter is the recursive algorithm that estimates these hidden states in real time for linear Gaussian models, delivering optimal forecasts and uncertainty estimates. Many classical models such as ARIMA and exponential smoothing can be written in state space form, but the framework also allows custom components for trend, seasonality, cycles and exogenous effects. State space models handle missing data naturally and are very flexible for building structured, interpretable forecasting systems.
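A minimal sketch of the Kalman filter for the simplest state space model, the local level model, is shown here. The noise variances are assumed to be known, and a missing observation is represented by None. Both are simplifications for illustration.

```python
def local_level_filter(y, obs_var, state_var, m0=0.0, p0=1e6):
    """Kalman filter for: state[t] = state[t-1] + w,  y[t] = state[t] + v.

    Returns the filtered state means and the final state variance.
    A large p0 expresses an uninformative prior on the initial level.
    """
    m, p = m0, p0
    means = []
    for obs in y:
        p = p + state_var                  # predict: uncertainty grows
        if obs is not None:                # update only when data is present
            k = p / (p + obs_var)          # Kalman gain
            m = m + k * (obs - m)
            p = (1.0 - k) * p
        means.append(m)
    return means, p

# Hypothetical series with one missing value.
means, final_var = local_level_filter([5.0, None, 5.0, 5.0], obs_var=1.0, state_var=0.1)
print(means, final_var)
```

When an observation is missing, the filter skips the update step and simply carries the prediction forward with increased uncertainty, which is why missing data need no special treatment.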

Croston's method is tailored to intermittent demand series with many zeros, such as spare parts or long-tail product sales. It separately updates estimates of demand size and the interval between nonzero demands, and uses their ratio as the forecast, which avoids the bias that standard methods show on sparse data. The Syntetos–Boylan Approximation (SBA) corrects a known bias in the original method, while the TSB variant adds explicit smoothing for the probability of demand occurrence. These methods are standard baselines in inventory management and service-level planning for items with sporadic demand.
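The size and interval updates can be sketched as below. The smoothing parameter, the initialization from the first nonzero demand and the function name are assumptions for illustration. The optional SBA factor is the usual (1 - alpha / 2) correction.

```python
def croston(demand, alpha=0.1, sba=False):
    """Croston forecast of demand per period for an intermittent series."""
    size = None          # smoothed nonzero demand size
    interval = None      # smoothed number of periods between nonzero demands
    periods_since = 1
    for d in demand:
        if d > 0:
            if size is None:             # initialize on the first nonzero demand
                size, interval = float(d), float(periods_since)
            else:                        # update only when demand occurs
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    if size is None:
        return 0.0                       # no demand observed at all
    rate = size / interval
    return rate * (1.0 - alpha / 2.0) if sba else rate

# Hypothetical spare-part demand: 6 units every third period.
demand = [0, 0, 6, 0, 0, 6, 0, 0, 6]
print(croston(demand), croston(demand, sba=True))
```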

Markov switching models allow the parameters of a time series model to change according to an unobserved discrete state that follows a Markov chain. Typical examples are models that switch between recession and expansion regimes or between low and high volatility states. Each regime has its own dynamics, for instance different means or variances, and the model estimates both state specific parameters and transition probabilities between states. This framework captures structural breaks and nonlinear behavior that single regime models miss and is widely used in macroeconomics and financial time series analysis.
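How the unobserved regime is inferred can be sketched with a basic Hamilton-style filter for Gaussian regimes that differ in mean. All parameters are assumed to be known here, and the values are hypothetical. In practice they are estimated, for example by maximum likelihood.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def regime_filter(y, means, sds, transition, prior):
    """Filtered regime probabilities Pr(state[t] = j | y[0..t]).

    transition[i][j] is Pr(next state = j | current state = i).
    """
    k = len(means)
    probs = list(prior)
    filtered = []
    for obs in y:
        # Predict: push last period's probabilities through the Markov chain.
        predicted = [sum(probs[i] * transition[i][j] for i in range(k)) for j in range(k)]
        # Update: weight each regime by how well it explains the observation.
        joint = [predicted[j] * normal_pdf(obs, means[j], sds[j]) for j in range(k)]
        total = sum(joint)
        probs = [v / total for v in joint]
        filtered.append(probs)
    return filtered

# Hypothetical two-regime example: low-mean regime 0, high-mean regime 1.
observations = [0.1, -0.2, 5.1, 4.9]
filtered = regime_filter(
    observations,
    means=[0.0, 5.0], sds=[1.0, 1.0],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    prior=[0.5, 0.5],
)
print(filtered)
```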

2. Hybrid Models (Statistics × Machine Learning)

Prophet is an additive model that decomposes a time series into trend, seasonality and special events (e.g. holidays). It is designed for business time series that have strong seasonal patterns, growth, and many outliers or missing values. The model automatically detects changepoints in the trend and handles multiple seasonalities (e.g. weekly and yearly). Its main strength is ease of use and interpretability, rather than squeezing out the last possible percentage of accuracy.

Auto-ARIMA automates the process of selecting the best ARIMA configuration (p, d, q and seasonal parameters) based on information criteria such as AIC or BIC. It systematically searches over many candidate models and chooses the one that balances fit quality and simplicity. This saves analysts from manual trial-and-error in model selection. It is particularly useful as a robust and scalable baseline in production pipelines.

BATS and TBATS are extensions of exponential smoothing that use Box–Cox transformations, ARMA error structures, trend and trigonometric (Fourier) representations of seasonality. They are designed for time series with multiple and possibly non-integer seasonalities, such as hourly data with daily and weekly patterns. TBATS is especially effective when standard seasonal models fail to capture complex or long seasonal cycles. It is a go-to method when you have many overlapping seasonal patterns.
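The trigonometric representation of seasonality can be sketched as a set of Fourier regressors. This shows only the seasonal basis, not a TBATS implementation, and the function name and chosen orders are hypothetical.

```python
import numpy as np

def fourier_terms(t, period, order):
    """Sine/cosine pairs for one seasonal period; the period may be non-integer."""
    t = np.asarray(t, dtype=float)
    columns = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

# Hourly data with daily (24) and weekly (168) cycles, two harmonics each.
hours = np.arange(24 * 14)
daily = fourier_terms(hours, period=24, order=2)
weekly = fourier_terms(hours, period=168, order=2)
seasonal_features = np.hstack([daily, weekly])
print(seasonal_features.shape)
```

Because the period enters only through the angle, a value such as 365.25 works just as well as an integer. A few harmonics can also describe a long cycle without one parameter per seasonal position.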

In STL + ARIMA/ETS, the series is first decomposed into trend, seasonal and remainder components using STL (Seasonal and Trend decomposition using Loess). The remainder is then modeled with ARIMA or ETS, while the seasonal and trend components are forecast separately and recombined. This approach allows more flexibility, because each component can be treated with a different specialized method. It is robust against changes in seasonality and often provides very stable performance.

ARIMAX extends ARIMA by including exogenous variables (X), such as marketing spend, weather, or price changes, directly into the model. It allows you to quantify how external drivers influence the target time series while still capturing autocorrelation and seasonality. Dynamic regression is the broader concept of combining regression with time-series structures in the residuals. These models are important when you want explanatory, cause–effect aware forecasts rather than purely pattern-based predictions.

Ensemble models combine forecasts from multiple individual models, such as ARIMA, ETS, and ML methods, to obtain a more robust overall forecast. The idea is that different models capture different aspects of the data, and averaging them reduces variance and model-specific bias. Ensembles often outperform any single method, especially in real-world noisy data. They are widely used in competitions and production systems for extra stability.
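A minimal sketch of forecast combination follows, assuming each model's error on a validation period is already available. Weighting by inverse error is only one of several possible schemes, and the numbers are hypothetical.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1 / validation error, normalized to sum to 1."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]

def combine(forecasts, weights):
    """Weighted average of several forecast paths with the same horizon."""
    horizon = len(forecasts[0])
    return [sum(w * f[h] for w, f in zip(weights, forecasts)) for h in range(horizon)]

# Hypothetical forecasts from two models and their validation errors (e.g. MAE).
model_a = [10.0, 12.0]
model_b = [14.0, 16.0]
weights = inverse_error_weights([1.0, 1.0])
ensemble = combine([model_a, model_b], weights)
print(weights, ensemble)
```

With equal errors this reduces to the simple average, which is often a hard baseline to beat.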

GAMs are regression models where the effect of each predictor is modeled as a smooth, possibly nonlinear function, but the overall model remains additive and interpretable. In time-series forecasting, GAMs can incorporate calendar effects, seasonal patterns, and external drivers with transparent smooth curves. They serve as a middle ground between classical statistics and opaque ML models. This makes them attractive for domains that value both flexibility and interpretability.

Gradient-boosted hybrid models take features from classical time-series analysis (lags, rolling statistics, seasonal indicators) and feed them into boosting algorithms such as XGBoost or LightGBM. The time-series structure is encoded in the features, while the boosting model handles nonlinear interactions. This approach can significantly improve accuracy when many external features are available. It is a pragmatic way to combine domain knowledge from statistics with the power of ML.
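Encoding the time-series structure in features can be sketched as follows. The chosen lags, the window length and the function name are assumptions for illustration. The resulting matrix could then be passed to any regressor, including a boosting library.

```python
import numpy as np

def make_lag_features(y, lags=(1, 2, 7), window=3):
    """Build a tabular design matrix from a univariate series.

    Each row uses only values strictly before the target time step,
    so no future information leaks into the features.
    """
    y = np.asarray(y, dtype=float)
    start = max(max(lags), window)
    rows, targets = [], []
    for t in range(start, len(y)):
        features = [y[t - lag] for lag in lags]       # lagged values
        features.append(y[t - window:t].mean())       # rolling mean of the past window
        rows.append(features)
        targets.append(y[t])
    return np.array(rows), np.array(targets)

# Hypothetical toy series 0, 1, ..., 9.
X, target = make_lag_features(np.arange(10), lags=(1, 2), window=3)
print(X[0], target[0], X.shape)
```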

ES-RNN is a hybrid model that combines exponential smoothing components with a recurrent neural network, originally developed for the M4 forecasting competition. Exponential smoothing handles local level and seasonality updates, while the RNN learns complex patterns across many series at once from shared representations. This design brings the strong inductive bias of classical time-series modeling together with the flexibility of deep learning. ES-RNN demonstrated that such hybrids can significantly outperform both pure statistical and pure neural approaches on large forecasting benchmarks.

NeuralProphet is a neural extension of Facebook Prophet that keeps the familiar additive structure of trend, seasonality and holiday effects but augments it with learned autoregressive and lagged components. Built on PyTorch, it can use neural networks to model parts of the signal that are difficult for simple linear components, while still exposing clear decompositions for interpretability. NeuralProphet supports multiple seasonalities, future covariates and custom events, and often improves accuracy over Prophet when the data exhibits nonlinear or more complex dynamics. It aims to balance ease of use and explainability with the extra power of deep learning.

3. Machine Learning Models

Random Forests are ensembles of decision trees trained on bootstrapped samples of the data with random feature subsets. In time series, they operate on features like lags, moving averages, and calendar indicators rather than raw sequences. They are robust to noise, handle nonlinear relationships, and provide variable importance measures. However, they do not natively model temporal dependence, so good feature engineering is critical.

XGBoost is a highly optimized gradient-boosting library that builds trees sequentially to correct previous errors. In forecasting, it is used on engineered time-series features such as lags, seasonal dummies, and external variables. It often delivers top performance in competitive environments and Kaggle-style projects. Its main trade-off is lower interpretability compared to classical models, but tools like SHAP can help explain predictions.

LightGBM is another gradient-boosting framework, optimized for speed and scalability using histogram-based algorithms and leaf-wise tree growth. It can handle large feature sets and high-frequency time-series data efficiently. With appropriate feature engineering, it can capture complex nonlinear relationships and interactions between variables. It is often used in production for its combination of speed, accuracy, and reasonable resource usage.

Support Vector Regression (SVR) uses kernel methods to learn nonlinear relationships between input features and the target variable. For time series, features are typically lags, differences, and seasonality indicators. SVR can perform very well on moderate-sized datasets with clean patterns but does not scale easily to very large datasets. It is also less straightforward to interpret compared to linear or additive models.

kNN forecasting predicts future values based on similar patterns observed in the past. It searches for historical windows that resemble the current situation and uses their subsequent values as a forecast. This approach is simple and intuitive, and can work well when repeating motifs exist in the data. However, performance degrades when the series changes regime or when the dimensionality of features becomes very high.
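The window search can be sketched as below, assuming Euclidean distance on raw values and a plain average of the neighbours' continuations. The function name and parameters are hypothetical.

```python
import numpy as np

def knn_forecast(y, window, horizon, k=3):
    """Average the continuations of the k historical windows closest to the latest one."""
    y = np.asarray(y, dtype=float)
    query = y[-window:]
    candidates = []
    # Only windows whose full continuation lies inside the history are eligible.
    for start in range(len(y) - window - horizon + 1):
        past = y[start:start + window]
        future = y[start + window:start + window + horizon]
        candidates.append((np.linalg.norm(past - query), future))
    candidates.sort(key=lambda item: item[0])
    nearest = [future for _, future in candidates[:k]]
    return np.mean(nearest, axis=0)

# Hypothetical series with a repeating motif.
series = [1.0, 2.0, 3.0, 4.0] * 5
knn_prediction = knn_forecast(series, window=4, horizon=2, k=2)
print(knn_prediction)
```

In practice the windows are often normalized before the distances are computed, so that similar shapes at different levels can still match.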

Quantile regression adaptations of forests and boosting models predict specific quantiles of the future distribution rather than a single mean forecast. This allows direct estimation of prediction intervals and risk measures. These methods are useful when asymmetric risks matter, for example, underestimating demand is far more costly than overestimating. They can be used standalone or combined with conformal prediction techniques.
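Quantile models are typically trained with the pinball (quantile) loss. A minimal sketch, with hypothetical numbers, shows how it penalizes under- and over-forecasting asymmetrically:

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q in (0, 1)."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        diff = actual - predicted
        # Under-forecasts (diff > 0) cost q per unit, over-forecasts cost (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)

# For the 0.9 quantile, forecasting too low is nine times as costly as too high.
too_low = pinball_loss([10.0], [8.0], q=0.9)
too_high = pinball_loss([10.0], [12.0], q=0.9)
print(too_low, too_high)
```

Minimizing this loss at q = 0.9 pushes the prediction toward a level that actual demand exceeds only about 10% of the time, which matches the case where underestimating demand is the more expensive error.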

Rule-based models derive human-readable rules from tree ensembles and then use them in a linear model. For time series, rules might describe conditions such as “if last week’s sales were high and it is December, then demand will likely be very high.” This gives a bridge between flexible ML models and interpretable business logic. Such models are especially helpful when stakeholders need to understand decision logic in plain language.

Feature-based approaches automatically compute hundreds or thousands of statistical features from time series (e.g., autocorrelation, trend strength, entropy). These features are then used with generic ML models such as random forests or boosting. This strategy allows you to reuse powerful tabular ML algorithms while capturing rich time-series characteristics. It is particularly useful when you have many short time series and want to classify or forecast them in a unified way.

CatBoost is a gradient boosting library that is particularly strong with categorical features due to its ordered target statistics encoding and specialized regularization. In time series forecasting it is typically used on tabular features derived from the series, such as lags, rolling statistics, calendar variables and categorical identifiers for items or locations. CatBoost often matches or exceeds the performance of other tree boosting libraries while requiring less manual preprocessing of categorical variables. It is a strong candidate for production ready forecasting models when there are many covariates and complex interactions.

4. Deep Learning & AI Models
4.1 Recurrent & Convolutional Models

LSTMs are recurrent neural networks designed to capture long-term dependencies by using gates to control information flow. They can model complex nonlinear relationships, multiple inputs, and long history in time-series data. LSTMs are widely used in forecasting, especially when simple linear models fail to capture intricate patterns. However, they can be slow to train and may require careful tuning and regularization.

GRUs are a simplified variant of LSTMs with fewer gates and parameters. They often achieve similar performance to LSTMs but train faster and are easier to tune. In forecasting, GRUs are commonly used for multivariate sequences and problems with moderate sequence length. They are a good default choice when moving from classical models to deep sequence models.

TCNs use dilated, causal convolutions to capture long-range temporal dependencies without recurrence. They process sequences in parallel, making them faster and more stable to train than many RNNs. For forecasting, TCNs can model both short- and long-term patterns effectively. They are particularly attractive for large-scale applications where training speed matters.
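A single dilated causal convolution can be sketched in plain NumPy. A real TCN stacks many such layers with learned kernels, nonlinearities and residual connections. The kernel values here are hypothetical.

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """out[t] = sum_j kernel[j] * x[t - j * dilation], with zeros before the start."""
    x = np.asarray(x, dtype=float)
    pad = (len(kernel) - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])     # left padding keeps the layer causal
    out = np.zeros(len(x))
    for t in range(len(x)):
        for j, weight in enumerate(kernel):
            out[t] += weight * padded[pad + t - j * dilation]
    return out

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
conv_out = causal_dilated_conv(signal, kernel=[1.0, 1.0], dilation=2)
print(conv_out)
```

Because only left padding is used, an output at time t never depends on later inputs. Stacking layers with dilations 1, 2 and 4 and kernel size 2 would give a receptive field of 8 time steps, which is how long-range dependencies are reached with few layers.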

BiTCN extends TCNs by using bidirectional convolutions to learn from both past and future context in a training window. During training it can exploit the full sequence, improving feature extraction and pattern recognition. At prediction time, it typically uses only past context, but with richer learned representations. This can lead to more accurate forecasts, especially for complex and noisy time series.

DeepAR is a probabilistic recurrent neural network model designed to forecast many related time series simultaneously. It learns a global model across all series, which is especially useful when each individual series is short or noisy. The model outputs full predictive distributions rather than just point forecasts. It is heavily used in large-scale retail and supply-chain forecasting.

Encoder-decoder LSTM models, also called sequence-to-sequence or Seq2Seq models, use one recurrent network to encode a history window into a latent representation and a second network to decode that representation into future values. This architecture is well suited for multi-step forecasting because it can generate an entire forecast trajectory instead of predicting each horizon independently. Attention mechanisms can be added so the decoder focuses on the most relevant past time steps when producing each future point. Seq2Seq models are flexible and powerful for multivariate and multi-horizon forecasting but usually require substantial data and careful regularization.

xLSTMTime applies extended LSTM architectures to time series forecasting, using modern recurrent designs that improve scalability and long range sequence modeling. It can be seen as a renewed recurrent alternative to transformer based forecasting models. The model is useful when sequential inductive bias is important but standard LSTMs are too limited for modern large scale datasets. xLSTMTime belongs near the boundary between recurrent deep learning models and modern foundation style architectures.

4.2 Transformer-Based Models

Transformers use self-attention mechanisms to weigh different time points in the input sequence and capture long-range dependencies without recurrence. They process all time steps in parallel, which scales well with modern hardware. For time series, they can handle long histories and heterogeneous inputs, including categorical and continuous covariates. The main challenges are data requirements and computational cost for very long sequences.
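Single-head scaled dot-product self-attention over time steps can be sketched as follows. The causal mask, the random weights and the dimensions are assumptions for illustration. Actual forecasting transformers add multiple heads, positional information and feed-forward layers.

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """X has shape (time steps, features); each step attends to itself and the past."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)          # hide future time steps
    scores = scores - scores.max(axis=1, keepdims=True) # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                 # 5 time steps, 4 input features
Wq, Wk, Wv = (rng.normal(size=(4, 3)) for _ in range(3))
attended, attention_weights = causal_self_attention(X, Wq, Wk, Wv)
print(attention_weights.round(2))
```

The score matrix has one entry per pair of time steps, which is the source of the quadratic cost for very long sequences.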

TFT is a specialized transformer architecture for forecasting that combines attention with recurrent layers and variable selection networks. It can handle static features, known future inputs (e.g. holidays), and observed past covariates in a unified way. TFT also provides interpretability through attention weights and variable selection scores, showing which inputs mattered when. This makes it attractive for industrial forecasting where both accuracy and explainability are required.

iTransformer introduces an “inverted” perspective on the time-series representation, often treating variables as tokens and leveraging attention in a more flexible way. This can improve performance in multivariate settings where relationships between variables are as important as temporal patterns. It is designed to be parameter-efficient while still capturing complex dependencies. As a modern architecture, it often outperforms earlier transformer variants on benchmark datasets.

PatchTST segments the time series into overlapping “patches,” similar to how Vision Transformers operate on image patches. Each patch is embedded and processed by transformer layers, allowing the model to learn local and global patterns efficiently. This patch-based representation helps handle long context windows without exploding computational cost. PatchTST has shown strong performance on long-horizon forecasting tasks.
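The patching step itself is simple and can be sketched as below. The patch length and stride are hypothetical values, and the embedding and transformer layers that follow are not shown.

```python
import numpy as np

def make_patches(x, patch_len, stride):
    """Split a series into (possibly overlapping) patches of equal length."""
    x = np.asarray(x, dtype=float)
    n_patches = (len(x) - patch_len) // stride + 1
    return np.stack([x[i * stride:i * stride + patch_len] for i in range(n_patches)])

patches = make_patches(np.arange(10), patch_len=4, stride=2)
print(patches)
```

Here 10 time steps become 4 tokens. Since attention cost grows with the square of the number of tokens, treating patches rather than single points as tokens is what keeps long context windows affordable.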

TimesNet reshapes a one-dimensional time series into multiple two-dimensional representations, one for each dominant period, and processes them with convolutional blocks rather than self-attention. This allows the model to capture periodic patterns at multiple scales in a structured way. It has been reported to perform well on diverse long-term forecasting benchmarks. TimesNet is particularly suitable when the data exhibits strong multi-periodic behavior.

Informer introduces probabilistic sparse self-attention to make transformers scalable to very long input sequences. It selectively focuses attention on the most informative time steps rather than all of them. This reduces memory and computation while maintaining good forecasting performance. It is often used for long-horizon forecasting in domains like energy and traffic.

Autoformer integrates an “auto-correlation” mechanism to explicitly model periodic patterns and long-term dependencies. It decomposes the input series into trend and seasonal parts and processes them separately. This decomposition helps the model handle long sequences without losing focus on main periodicities. Autoformer is particularly targeted at long-term forecasting tasks with clear periodic structure.

FEDformer (Frequency-Enhanced Decomposed Transformer) operates partly in the frequency domain, decomposing series into components and applying attention selectively. By working in frequency space, it can capture periodic and seasonal behaviors more efficiently. It combines time-domain and frequency-domain information to improve robustness. This architecture is especially useful when data exhibits complex, overlapping seasonal cycles.

Crossformer is a transformer variant tailored for multivariate time series that emphasizes cross variable dependencies. It organizes the data as a two dimensional structure of variables and time segments and uses attention both along the temporal dimension and across variables. This design helps the model learn how different series influence each other, for example how weather variables relate to energy demand or how multiple sensors interact in an industrial system. Crossformer is particularly effective when high dimensional time series require modeling both temporal patterns and rich inter series relationships.

ETSformer is a transformer architecture that explicitly incorporates ideas from exponential smoothing and classical ETS models into its design. It decomposes time series into level, trend and seasonal components and uses specialized attention and smoothing like operations to model each component. By embedding these inductive biases, ETSformer produces forecasts that extrapolate trends and seasonal patterns more reliably and are easier to interpret than generic transformers. It often delivers strong performance on long horizon tasks where stable trend and seasonality modeling are crucial.

4.3 Foundation Forecasting Models

TimeGPT is a large foundation model for time series, pretrained on massive collections of temporal data. It can be adapted to new tasks with minimal fine-tuning or even used “out of the box” for many forecasting problems. The model aims to provide strong baseline performance without extensive model engineering. It reflects a shift toward “time-series-as-a-service” with powerful generic backbones.

Chronos is a token-based family of pretrained time-series foundation models developed by Amazon. It represents numerical values as discrete tokens, similar to the vocabulary of a language model. It learns generic temporal patterns and probabilistic behavior across large collections of time series and can generate zero-shot or few-shot forecasts with minimal task-specific adaptation. The model family is particularly relevant for probabilistic forecasting, uncertainty estimation and scalable cross-domain forecasting applications.
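The idea of turning numerical values into discrete tokens can be sketched with scaling followed by uniform binning. The scaling rule, the number of bins and the bin range below are assumptions for illustration and are not taken from the model's actual configuration.

```python
import numpy as np

def tokenize(context, n_bins=10, low=-5.0, high=5.0):
    """Scale a series by its mean absolute value, then map values to bin indices."""
    context = np.asarray(context, dtype=float)
    scale = np.mean(np.abs(context))
    if scale == 0.0:
        scale = 1.0                                   # avoid division by zero
    edges = np.linspace(low, high, n_bins + 1)
    tokens = np.clip(np.digitize(context / scale, edges) - 1, 0, n_bins - 1)
    return tokens, scale, edges

def detokenize(tokens, scale, edges):
    """Map tokens back to values using bin centers and the stored scale."""
    centers = (edges[:-1] + edges[1:]) / 2.0
    return centers[tokens] * scale

tokens, scale, edges = tokenize([1.0, 2.0, 4.2])
reconstructed = detokenize(tokens, scale, edges)
print(tokens, reconstructed)
```

Once values are tokens, a sequence model can predict a categorical distribution over the next token. Sampling from that distribution and mapping the samples back through the bin centers gives a probabilistic forecast.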

Chronos-Bolt extends the original Chronos architecture with improved inference efficiency, lower latency and better scalability for industrial production environments. It preserves strong probabilistic forecasting capabilities across diverse datasets and forecasting horizons.

Chronos-2 further expands the model family toward more universal forecasting by supporting multivariate forecasting, covariates and richer contextual inputs in addition to univariate forecasting. It is designed as a more general-purpose foundation forecasting model that improves transfer learning, flexibility and practical usability across heterogeneous real-world forecasting tasks.

TimesFM is a time-series foundation model developed by Google Research for zero-shot and few-shot forecasting. It uses a decoder-only architecture and is pretrained on very large collections of time series, allowing it to generalize to new domains without task-specific training. TimesFM 2.0 extends this direction with stronger few-shot capabilities, where examples from related series can improve forecasts for a target series. TimesFM 2.5 further improves scalability, context handling and transfer performance across heterogeneous forecasting datasets while maintaining efficient inference for large-scale industrial deployment. The model family is especially relevant when organizations want a general-purpose forecasting backbone that can work across many datasets with minimal manual feature engineering.

Moirai 1.0 and Moirai 1.1 are universal time-series foundation models from Salesforce AI Research. They are designed for probabilistic forecasting across different domains, frequencies and horizons, using large-scale pretraining on heterogeneous time-series data. The original Moirai family uses a masked-encoder architecture and supports flexible patch sizes to handle different temporal resolutions. These models count as foundation models rather than ordinary transformer models because their main strength is transferability across many forecasting tasks.

Moirai 2.0 is the next generation of the Moirai family and uses a simpler decoder-only architecture for universal time-series forecasting. Compared with Moirai 1.0, it replaces several more complex design choices with quantile forecasting, recursive multi-quantile decoding and a more efficient model structure. The result is a smaller and faster foundation model that still delivers strong probabilistic forecasts across many domains. Moirai 2.0 is treated separately from Moirai 1.0 here because it represents a notable architectural shift within the same model family.

Kairos is a modern time series foundation model designed for scalable forecasting and temporal representation learning across heterogeneous domains. It uses large scale pretraining to capture reusable temporal structures that can generalize to unseen datasets and forecasting tasks with limited adaptation. The model focuses on robust cross-domain transfer learning, efficient inference and flexible handling of different forecasting horizons and frequencies. Kairos represents the growing trend toward universal pretrained forecasting backbones for industrial and research applications.

TiRex is a pretrained time series foundation model focused on learning rich temporal representations for forecasting and related downstream tasks. It combines large scale pretraining with architectures optimized for long horizon forecasting and complex temporal dependencies. The model is designed to generalize across different industries, frequencies and forecasting settings while reducing the need for dataset specific model engineering. TiRex is particularly relevant as part of the new generation of transferable time series foundation models.

4.4 Universal & Transfer Learning Time Series Models

MOMENT is a family of open time series foundation models designed for general purpose time series analysis. It is pretrained on a large public collection of time series and can be adapted to tasks such as forecasting, classification, anomaly detection and imputation. MOMENT is important because it treats time series foundation modeling as a broad representation learning problem rather than only a forecasting problem. It is especially useful in low data or limited supervision settings where pretrained temporal representations can reduce the need for large task specific datasets.

UniTime is a unified time series model designed to handle multiple temporal tasks and domains within a common architecture. Instead of building separate models for each dataset or forecasting horizon, UniTime aims to learn transferable temporal representations. It is relevant for organizations that want to standardize forecasting, classification or anomaly detection workflows across many related datasets. UniTime belongs to the modern foundation model category because of its emphasis on generality and transfer learning.

Toto is a modern time-series foundation model focused on general-purpose forecasting from pretrained temporal representations. It belongs to the group of models that aim to reduce the need for task-specific model design by learning broad patterns from many datasets. Such models are especially useful when data comes from multiple domains and manual model selection would be expensive. Toto belongs among the newer foundation-model approaches, although its practical relevance depends on availability, benchmarks and tooling.

Sundial is a recent foundation-style model for time series that focuses on flexible forecasting across heterogeneous temporal data. Like other pretrained models, it aims to learn general temporal structure that can transfer to unseen datasets. Its relevance lies in the growing ecosystem of general-purpose forecasting backbones that compete with models such as Chronos, TimesFM, Moirai and TimeGPT.

4.5 Efficient & Compact Foundation Models

TinyTimeMixer, also known as TTM and included in IBM Granite Time Series models, is a compact pretrained time series foundation model family. It focuses on efficient forecasting with very small model sizes and can often run with low infrastructure requirements compared with larger transformer based foundation models. TTM is designed for zero shot and fine tuned forecasting, making it attractive for industrial use cases where speed, cost and deployability matter. It is a good example of the trend toward small, specialized foundation models for time series rather than only very large models.

Timer and Timer XL are time series foundation models that use large scale pretraining to learn reusable temporal representations. They are designed to support forecasting across diverse datasets and horizons, with larger variants aiming to improve generalization and long context handling. The model family reflects the broader move from task specific forecasting models toward pretrained backbones that can be adapted to many time series problems. Timer XL is particularly relevant for long horizon forecasting and cross domain transfer. 

Lag-Llama is a modern time-series model inspired by large language models but optimized for lag-based temporal data. It uses sequence modeling over lagged embeddings to capture complex patterns and dependencies. The model can often generalize across multiple datasets and tasks, acting as a semi-foundation model. Its strength lies in flexible representation learning for varied time-series problems.

TabPFN TS is a time series forecasting approach derived from the TabPFN foundation model framework for probabilistic tabular learning. It adapts pretrained transformer style inference mechanisms to temporal forecasting tasks by converting forecasting problems into structured tabular prediction representations. The model emphasizes strong zero-shot and few-shot forecasting performance with minimal hyperparameter tuning or feature engineering. TabPFN TS is especially relevant for practitioners who want fast, automated forecasting systems that perform competitively even with relatively small datasets and limited training time.

4.6 Mixture-of-Experts & Modular Foundation Models

Moirai MoE is a mixture-of-experts extension of the Moirai foundation model family. Instead of using a single model path for all time series, a gating mechanism routes inputs to different expert components that specialize in different temporal structures or domains. This makes the model more flexible for heterogeneous forecasting problems where series may differ strongly in frequency, seasonality, volatility or business context. Moirai MoE is both a foundation model and a mixture-of-experts model.

TimeXer is a time-series model based on a mixture-of-experts transformer backbone, where different experts specialize in different temporal patterns or regimes. A gating mechanism decides which experts to consult for a given input, improving flexibility and accuracy. This structure can handle heterogeneous datasets where no single model works best everywhere. It is particularly useful when data come from many domains or show strong regime shifts.

Time-MoE (Time Mixture-of-Experts) is a broader family of models that use multiple expert subnetworks and a gating function for time-series forecasting. Different experts might focus on, for example, short-term dynamics, long-term trends, or specific seasonalities. The mixture structure allows the overall system to adapt to various patterns without being overly complex locally. Time-MoE models can scale to large datasets while keeping inference efficient through sparse expert activation.
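The gating idea shared by the models in this subsection can be illustrated with a small sketch. It is not the Time-MoE or Moirai MoE implementation: the three "experts" are hand-written stand-ins for learned subnetworks, the gate scores are assumed to come from a gating network that is not shown, and all function names are hypothetical. The point is only that sparse activation runs the top-scoring experts and skips the rest.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe_forecast(window, experts, gate_scores, top_k=2):
    """Combine only the top_k experts picked by the gate; the others are never called."""
    # Indices of the k highest gate scores.
    chosen = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:top_k]
    # Renormalise the gate weights over the selected experts only.
    weights = softmax([gate_scores[i] for i in chosen])
    forecast = sum(w * experts[i](window) for w, i in zip(weights, chosen))
    return forecast, chosen


# Hypothetical stand-in experts; in a real model these are learned subnetworks.
def last_value(w):
    return w[-1]


def mean_value(w):
    return sum(w) / len(w)


def linear_trend(w):
    return w[-1] + (w[-1] - w[0]) / (len(w) - 1)


experts = [last_value, mean_value, linear_trend]
window = [10.0, 12.0, 14.0, 16.0]
gate_scores = [0.1, -2.0, 1.5]  # assumed output of a gating network for this window

forecast, chosen = sparse_moe_forecast(window, experts, gate_scores, top_k=2)
print(chosen, round(forecast, 3))
```

With these scores the gate selects the trend and last-value experts, so the mean expert costs nothing at inference time.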

Synthefy Migas is a lightweight Mixture of Experts forecasting model that combines multiple pretrained time series foundation models. Instead of relying on a single general purpose backbone, Migas learns the individual strengths and biases of different experts and adapts them to new datasets with minimal fine tuning. This enables strong zero shot forecasting capabilities and efficient customization for domain specific applications. With roughly ten million parameters, Migas has reported strong benchmark performance on evaluations such as GIFT Eval while remaining computationally efficient compared with larger foundation models. The model is proprietary rather than open source and is available through Synthefy’s platform.

KAN Experts are mixture-of-experts architectures where each expert is a Kolmogorov–Arnold Network. The gating network selects which KAN experts to activate for a given input pattern. This combines the expressive power of KANs with the flexibility of expert mixtures, allowing different experts to specialize in different temporal regimes. Such models aim to achieve strong performance with structured, interpretable functional components.

RMoK (Reversible Mixture of KAN Experts) combines reversible instance normalization with a mixture-of-experts layer whose experts are Kolmogorov–Arnold Network variants. A gating network assigns each input to the KAN experts best suited to it, so complex nonlinear relationships are represented as compositions of simpler learned univariate functions. This allows the model to capture a wide range of behaviors with good approximation power. RMoK can be seen as a bridge between functional approximation theory and practical sequence modeling.

4.7 LLM & Prompt Based Forecasting

PromptCast formulates time series forecasting as a prompting task for large language models. Historical observations and task descriptions are converted into textual prompts, and the language model is asked to generate future values or forecast descriptions. This approach is still more experimental than classical or specialized neural forecasting models, but it is important because it shows how natural language interfaces can be combined with forecasting. PromptCast is most relevant for scenarios where explanation, context and human readable interaction are part of the forecasting workflow.
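A minimal sketch of the prompting workflow is given below. The prompt wording, the function names and the example answer string are all assumptions made for illustration; they are not taken from the PromptCast paper, and the call to an actual language model is deliberately left out.

```python
import re


def build_prompt(values, unit, subject, horizon=1):
    """Turn a numeric history into a natural-language question (hypothetical template)."""
    history = ", ".join(str(v) for v in values)
    return (
        f"The {subject} over the past {len(values)} days was {history} {unit}. "
        f"What will the {subject} be over the next {horizon} day(s)?"
    )


def parse_forecast(answer, horizon):
    """Pull the first `horizon` numbers out of the model's free-text answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", answer)
    return [float(n) for n in numbers[:horizon]]


prompt = build_prompt([21, 23, 22], "degrees Celsius", "average temperature")
print(prompt)

# Stand-in for a language model response; a real system would send `prompt` to an LLM.
example_answer = "The average temperature will be 24.5 degrees Celsius."
print(parse_forecast(example_answer, horizon=1))
```

Parsing free text back into numbers is the fragile step in practice, which is one reason this approach remains experimental.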

LLMTime uses large language models for time series forecasting by converting numerical sequences into token-based representations that language model architectures can process. The core idea is that pretrained language models may capture sequential structure and extrapolation patterns even outside natural language. LLMTime is particularly interesting from a research perspective because it bridges numerical forecasting and generative language modeling. It sits alongside Time-LLM and PromptCast in the family of LLM-based forecasting methods.
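The conversion from numbers to tokens can be sketched as follows. This is an illustration in the spirit of digit-level serialization, not the LLMTime code: the fixed precision, the separators and the function names are assumptions, and the sketch handles non-negative values only.

```python
def encode_series(values, decimals=1):
    """Serialize numbers as digit strings: fixed precision, decimal point dropped,
    digits separated by spaces, time steps separated by ' , '."""
    tokens = []
    for v in values:
        scaled = round(v * 10 ** decimals)
        if scaled < 0:
            raise ValueError("this sketch handles non-negative values only")
        tokens.append(" ".join(str(scaled)))
    return " , ".join(tokens)


def decode_series(text, decimals=1):
    """Invert encode_series: strip the spaces, then restore the decimal point."""
    values = []
    for chunk in text.split(","):
        digits = chunk.replace(" ", "")
        if digits:
            values.append(int(digits) / 10 ** decimals)
    return values


encoded = encode_series([15.1, 16.7, 9.0])
print(encoded)
print(decode_series(encoded))
```

Splitting each number into single digits keeps the tokenization predictable, so a language model continuing the string produces something that can be decoded back into values.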

Time-LLM refers to architectures that combine large language models (LLMs) with time-series inputs, often via prompt engineering or learned adapters. The idea is to leverage the reasoning and pattern-recognition capabilities of LLMs for forecasting, anomaly detection, or scenario analysis. Time-LLM systems may describe uncertainty and context in natural language, making outputs more accessible to non-technical stakeholders. This is an emerging area where best practices are still evolving.

TEMPO is a time series model that combines decomposition ideas with pretrained language or sequence models. It typically separates temporal structure into components such as trend, seasonality and residual dynamics, then uses powerful neural backbones to model these components. This makes it a hybrid between classical time series decomposition and modern foundation model thinking. TEMPO is relevant for long horizon forecasting where explicit structure and pretrained representations can complement each other.

4.8 Vision & Multimodal Time Series Models

VisionTS adapts ideas from computer vision and visual representation learning to time series forecasting by transforming temporal data into structured representations that can be processed with architectures inspired by vision transformers and image patch based models. These approaches reuse techniques from modern computer vision and multimodal AI to capture complex temporal patterns, long range dependencies and cross-domain structures in time series data. VisionTS++ extends the original VisionTS direction with improved scalability, stronger feature extraction and better cross-domain generalization for heterogeneous forecasting tasks. The VisionTS family highlights the growing convergence between computer vision, multimodal foundation models and modern time series forecasting.

4.9 Linear & MLP Based Forecasting Models

MLP forecasting models use multi layer perceptrons to learn temporal relationships directly from lagged observations and optional covariates without relying on recurrent or attention based architectures. Modern MLP based forecasting approaches have shown that relatively simple feedforward networks can achieve highly competitive performance on many forecasting benchmarks while remaining computationally efficient and easy to scale. Variants such as multivariate MLP forecasting models extend this idea to jointly model dependencies between multiple related time series. MLP forecasting architectures are especially attractive in practical applications where efficiency, simplicity and stable training behavior are important.

N-BEATS is a deep neural architecture designed for univariate time series forecasting. It is built from stacked fully connected blocks that each emit a backcast and a forecast, linked by residual connections. It learns trend and seasonality components directly from data without relying on hand-crafted statistical assumptions and reported state-of-the-art accuracy on forecasting competition datasets such as M3 and M4. N-BEATSx extends this architecture by incorporating exogenous variables and covariates such as calendar effects, weather data or pricing information, making the model more suitable for practical business forecasting tasks. Together, the N-BEATS family represents an important class of efficient MLP-based forecasting architectures that combine strong predictive performance with a relatively simple and scalable model design.

N-HiTS (Neural Hierarchical Interpolation for Time Series) extends N-BEATS with multi-rate, hierarchical interpolation of signals. It is particularly powerful for long-horizon forecasting where different resolutions of the series matter. By modeling low- and high-frequency components at different scales, N-HiTS manages to capture both overall shape and local details. It often outperforms earlier deep models on long-term benchmarks.

DLinear is a simple but influential neural forecasting model based on linear layers and time series decomposition. It separates a series into trend and remainder components and applies linear mappings to forecast future values. Despite its simplicity, DLinear has shown surprisingly strong performance on several long horizon forecasting benchmarks. It is important because it challenged the assumption that complex transformer architectures are always necessary for time series forecasting.
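The decomposition-plus-linear idea can be sketched in a few lines. This is a simplified illustration, not the reference DLinear code: the weight matrices are hypothetical values chosen by hand instead of being learned, and the kernel size is an arbitrary small number.

```python
def moving_average(series, kernel):
    """Centered moving average with edge padding, so the output keeps the input length."""
    half = (kernel - 1) // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * (kernel - 1 - half)
    return [sum(padded[i:i + kernel]) / kernel for i in range(len(series))]


def linear_map(weights, x):
    """Apply a (horizon x lookback) weight matrix to a lookback-length vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def dlinear_forecast(series, w_trend, w_remainder, kernel=3):
    """Split into trend and remainder, forecast each with its own linear layer, then add."""
    trend = moving_average(series, kernel)
    remainder = [s - t for s, t in zip(series, trend)]
    trend_part = linear_map(w_trend, trend)
    remainder_part = linear_map(w_remainder, remainder)
    return [a + b for a, b in zip(trend_part, remainder_part)]


series = [1.0, 2.0, 3.0, 4.0]
# Hypothetical weights: carry the last trend value forward and ignore the remainder.
w_trend = [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]]
w_remainder = [[0.0] * 4, [0.0] * 4]

forecast = dlinear_forecast(series, w_trend, w_remainder)
print(forecast)
```

In a trained model both weight matrices are fitted by gradient descent; everything else in the forward pass is as simple as shown.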

NLinear is a normalization-based linear forecasting model that subtracts the last observed value before applying a linear forecast layer and then adds it back to the prediction. This simple design helps handle distribution shifts and changing levels in the time series. NLinear is lightweight, fast and often competitive with much larger deep learning architectures. It is a strong baseline for long-horizon forecasting, alongside DLinear and RLinear.
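The subtract, map, add-back step is short enough to write out. The sketch below assumes a hand-picked weight matrix in place of a learned one, and the function name is hypothetical.

```python
def nlinear_forecast(series, weights):
    """Subtract the last value, apply a (horizon x lookback) linear map, add the value back."""
    last = series[-1]
    centered = [v - last for v in series]
    mapped = [sum(w * c for w, c in zip(row, centered)) for row in weights]
    return [m + last for m in mapped]


# Hypothetical weights standing in for a trained linear layer.
weights = [[-0.5, 0.0, 0.5], [-1.0, 0.0, 1.0]]

base = nlinear_forecast([10.0, 12.0, 14.0], weights)
shifted = nlinear_forecast([110.0, 112.0, 114.0], weights)
print(base, shifted)
```

Shifting the whole input by 100 shifts the forecast by exactly 100, which is the level invariance that makes the model robust to changing levels.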

RLinear is a robust linear forecasting architecture that extends the family of efficient linear time series models. It focuses on stable long horizon forecasting with minimal architectural complexity. The model is relevant because it shows that careful normalization, residual design and simple linear mappings can compete with heavier neural architectures in many benchmark settings. RLinear is useful as both a practical baseline and a reminder that model simplicity can be a major advantage.

CARD (Channel Aligned Robust Blend Transformer) is a transformer-based forecasting architecture designed to improve representation learning for long-horizon prediction. It focuses on capturing complex temporal dependencies, including dependencies across channels, while keeping the model efficient enough for practical use. CARD belongs to the family of recent deep forecasting models that try to balance accuracy, scalability and robustness. Although it is grouped here with efficient architectures, it is an attention-based model rather than a linear or MLP model.

TSMixer is a lightweight forecasting architecture inspired by the MLP-Mixer concept from computer vision. Instead of relying on attention mechanisms, it uses multi-layer perceptrons to mix information across temporal and feature dimensions, resulting in efficient and scalable forecasting models with relatively low computational cost. TSMixerx extends this approach by incorporating exogenous variables and additional covariates such as calendar features, weather signals or item specific metadata, making it more suitable for practical industrial forecasting applications. The TSMixer family demonstrates that carefully designed MLP based architectures can achieve competitive forecasting performance while remaining significantly simpler and more efficient than many transformer based models.

TimeMixer is a related architecture that mixes temporal segments or patches through specialized mixing layers. It aims to leverage both local and global patterns by mixing information across segments efficiently. This can reduce training cost while maintaining strong forecasting accuracy. TimeMixer belongs to a broader family of models that seek to replace heavy attention mechanisms with more efficient mixing operations.

TiDE is a neural architecture that uses a dense encoder–decoder structure, often with relatively simple building blocks compared to transformers. It focuses on efficiency and strong performance on practical, real-world forecasting tasks. TiDE models can handle multiple covariates and long input windows while remaining relatively lightweight. This makes them attractive for industrial deployments where latency and resource usage matter.

SOFTS, Series Core Fused Time Series forecaster, is an efficient MLP based model for multivariate time series forecasting. It uses a STAR module to capture interactions between multiple series while keeping the architecture lightweight. SOFTS is designed for scalable long horizon forecasting and is especially useful when many related variables must be modeled efficiently. It belongs to the modern MLP and efficient neural forecasting family rather than to foundation models.

4.10 Convolutional, Frequency & Efficient Neural Models

ModernTCN is an updated temporal convolutional architecture designed for strong long term time series forecasting. It revisits convolutional modeling with modern design choices such as larger receptive fields, improved normalization and efficient channel mixing. Compared with transformer models, it can offer better computational efficiency while still capturing long range temporal dependencies. ModernTCN is important because it shows that convolutional models remain highly competitive in modern forecasting.

SCINet is a neural forecasting model based on sample convolution and interaction mechanisms. It recursively splits the time series into sub sequences and learns interactions between them, allowing the model to capture both local and global temporal structure. SCINet is especially relevant for long horizon forecasting where hierarchical temporal patterns matter. It belongs to the group of efficient deep learning architectures that are neither classical RNNs nor standard transformers. 

MICN is a multi-scale convolutional network for time series forecasting. It is designed to capture both local short-term patterns and global long-term dependencies through different convolutional scales. This makes it suitable for series with mixed periodicities or patterns operating at different resolutions. MICN represents the family of multi-scale convolutional forecasting models.

FiLM (Frequency improved Legendre Memory model) is a model for long-term time series forecasting. It focuses on capturing important frequency components and using them to improve extrapolation over longer horizons. The model is useful when data contain strong cyclic or periodic structures that are easier to represent in the frequency domain. FiLM fits well beside FEDformer and other frequency-aware architectures, but it is typically lighter and more specialized.

FreTS is a frequency domain neural forecasting model that uses spectral representations to capture temporal dependencies. Instead of modeling all patterns directly in the time domain, it learns relationships in transformed frequency components. This can be efficient and effective for series with periodic or quasi periodic structures. FreTS is a useful complement to transformer and convolution based methods because it highlights the role of frequency domain modeling in modern forecasting.
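To show why a frequency-domain view helps with periodic data, the sketch below computes a plain discrete Fourier transform and reads off the dominant period. It is not the FreTS or FiLM architecture, which learn on top of such representations; the function names and the synthetic sine series are assumptions for illustration.

```python
import cmath
import math


def dft(series):
    """Naive O(n^2) discrete Fourier transform, enough for a short example."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(series))
        for k in range(n)
    ]


def dominant_period(series):
    """Period, in time steps, of the strongest non-constant frequency component."""
    n = len(series)
    mean = sum(series) / n
    spectrum = dft([x - mean for x in series])
    strongest = max(range(1, n // 2 + 1), key=lambda k: abs(spectrum[k]))
    return n / strongest


# Synthetic series with a cycle of 8 steps, observed for 32 steps.
series = [math.sin(2 * math.pi * t / 8) for t in range(32)]
print(dominant_period(series))
```

A pattern that spans the whole window in the time domain collapses into a single large coefficient in the frequency domain, which is the kind of compact structure frequency-aware models exploit.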

4.11 Functional, Dynamical & Graph Based Models

KANs are neural networks inspired by the Kolmogorov–Arnold representation theorem, which states that multivariate continuous functions can be decomposed into sums of univariate functions and their compositions. Instead of standard linear layers, KANs use learnable basis functions and spline-like operations. For time series, they can model complex nonlinear relationships with fewer parameters and potentially better interpretability. They are a promising alternative to classical MLPs.
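For reference, the representation theorem is usually written as below for a continuous function f on the n-dimensional unit cube, with continuous univariate outer functions Φ_q and inner functions φ_{q,p}. The symbol names follow common convention and are not fixed by this overview.

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

A KAN replaces these fixed functions with learnable univariate functions, typically splines, and stacks more than two such layers.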

Koopa is a forecasting model inspired by Koopman operator theory, which seeks to represent nonlinear dynamics in a space where evolution is more linear and easier to forecast. This gives the model a strong theoretical connection to dynamical systems. Koopa is relevant for time series with complex nonlinear behavior where ordinary linear models are too simple but fully black box models are hard to interpret. It belongs to the modern efficient neural forecasting family.

StemGNN is a graph neural network architecture for multivariate time series forecasting that combines temporal modeling with learned inter-series dependency graphs. The model jointly captures temporal dynamics and relationships between variables by integrating spectral graph learning with sequence forecasting components. This allows StemGNN to model complex interactions between correlated time series, such as sensors, financial assets or energy systems, more effectively than purely independent forecasting models. StemGNN is particularly useful for high dimensional multivariate forecasting problems where cross variable structure plays an important role.

5. Probabilistic & Uncertainty-Focused Methods

Conformal prediction provides a general way to construct prediction intervals with finite-sample coverage guarantees, assuming only exchangeability of the data. Time series rarely satisfy exchangeability exactly, so the method is usually applied on top of a point predictor by calibrating on recent residuals, and adaptive variants adjust the intervals to changing volatility or regime shifts. Its main advantage is that it is model-agnostic and can be applied to classical, ML, or deep-learning models alike.
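A minimal sketch of the residual-based (split conformal) construction follows. It assumes a point forecast and a set of held-out residuals are already available; the function name and the example numbers are hypothetical, and no adaptive time-series variant is shown.

```python
import math


def conformal_interval(point_forecast, calibration_residuals, alpha=0.1):
    """Symmetric interval whose half-width is a finite-sample quantile of |residuals|."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    # Conformal quantile rank: ceil((n + 1) * (1 - alpha)), counted from 1.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        raise ValueError("not enough calibration residuals for this alpha")
    half_width = scores[rank - 1]
    return point_forecast - half_width, point_forecast + half_width


# Hypothetical residuals (actual minus predicted) from a held-out calibration window.
residuals = [0.5, -1.0, 0.2, 1.5, -0.7, 0.3, -2.0, 0.9, -0.1, 1.1]

print(conformal_interval(100.0, residuals, alpha=0.1))
print(conformal_interval(100.0, residuals, alpha=0.2))
```

The point predictor never appears in the function, which is what model-agnostic means here: any forecaster that yields residuals can be wrapped this way.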

DeepFactor combines global neural networks with local state-space models to capture both shared patterns across series and idiosyncratic behavior of each individual series. The global network learns latent factors, while the local component accounts for series-specific dynamics. This leads to flexible probabilistic forecasts for large collections of related time series. It is particularly useful in domains like retail, where thousands of products share common dynamics.

Gaussian Process Regression models a distribution over functions rather than fixed parameters, defined by a mean function and a covariance kernel. For time series forecasting, kernels can encode assumptions such as smooth trends or periodic seasonality, and the model produces a Gaussian predictive distribution for each future time point. This provides natural and coherent uncertainty estimates that increase where data is sparse or noisy. Gaussian processes are very flexible and interpretable but scale poorly with long series or large numbers of observations, so they are best suited to moderate sized problems or used with sparse approximations.

Bayesian deep learning approaches adapt neural networks to produce uncertainty estimates instead of only point forecasts. Monte Carlo Dropout keeps dropout active at prediction time and performs multiple forward passes, interpreting the variation in outputs as model uncertainty. Deep ensembles train several independent networks and use the spread of their predictions as a measure of uncertainty. Both techniques are simple to implement with existing architectures and convert standard deep forecasting models into probabilistic ones, although their intervals often benefit from additional calibration such as conformal prediction.
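Both techniques end with the same post-processing step: several forecasts for the same horizon are summarized by their mean and spread. The sketch below shows only that step, under the assumption that the individual forecasts (one per ensemble member or per dropout pass) have already been produced; the numbers are made up.

```python
import statistics


def ensemble_summary(member_forecasts):
    """Mean and sample standard deviation per horizon step across members or passes."""
    per_step = list(zip(*member_forecasts))
    means = [statistics.fmean(step) for step in per_step]
    spreads = [statistics.stdev(step) for step in per_step]
    return means, spreads


# Hypothetical two-step forecasts from three ensemble members (or three dropout passes).
member_forecasts = [
    [10.0, 11.0],
    [12.0, 15.0],
    [14.0, 13.0],
]

means, spreads = ensemble_summary(member_forecasts)
print(means, spreads)
```

The spread reflects disagreement between members, not observation noise, which is why such intervals are often too narrow before calibration.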

Bayesian Structural Time Series models combine trend, seasonality, regression and intervention components in a Bayesian state space framework. They produce full posterior distributions for forecasts and can be used for causal impact analysis when an intervention or campaign occurs. BSTS models are interpretable because each component has a clear meaning, while Bayesian inference provides credible intervals for uncertainty. They are useful in marketing, economics and business analytics where explanation matters as much as prediction. 

Quantile forecasting estimates specific points of the predictive distribution, such as the 10th, 50th and 90th percentiles, instead of only a single mean forecast. This allows direct construction of prediction intervals and asymmetric risk estimates. Many modern models, including gradient boosting methods and foundation models such as Moirai 2.0, use quantile objectives to produce probabilistic forecasts. Quantile forecasting is especially important in business settings where over prediction and under prediction have different costs.
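The quantile objective mentioned above is commonly the pinball loss. A minimal sketch, with a hypothetical function name and toy numbers:

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


# For the 90th percentile, forecasting too low costs far more than forecasting too high.
too_low = pinball_loss([10.0], [8.0], q=0.9)
too_high = pinball_loss([10.0], [12.0], q=0.9)
print(too_low, too_high)
```

The asymmetry is the whole mechanism: minimizing this loss pushes the forecast toward the level that is exceeded only a fraction 1 - q of the time.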

Distributional forecasting predicts a full probability distribution for future values rather than only point forecasts or selected quantiles. Models may output Gaussian, Student t, negative binomial, mixture or other likelihood based distributions depending on the data. This approach is common in probabilistic deep learning models such as DeepAR and GluonTS based architectures. It is valuable when decisions require risk assessment, service level planning or simulation of future scenarios.

Deep state space models combine neural networks with classical latent state space modeling. Neural components learn complex nonlinear transitions or emissions, while the state space structure provides temporal consistency and uncertainty estimates. This makes them powerful for multivariate, noisy or partially observed time series. They are especially useful when one wants the flexibility of deep learning together with the probabilistic discipline of state space models.

6. Frameworks, Libraries & Tools
6.2 Statistical & Machine Learning Forecasting Libraries

StatsForecast is a Python library optimized for fast and scalable classical forecasting methods such as ARIMA, ETS, Theta and Croston variants. It uses efficient compiled code and parallelization to fit models on large collections of series, making it suitable for industrial settings with thousands or millions of time series. The library provides automatic model selection and interval forecasts and is often used to build strong statistical baselines or full production systems. It integrates well with modern deep learning toolkits, allowing hybrid pipelines that combine classical and neural models.

MLForecast is a library for machine learning based time series forecasting using tabular models such as LightGBM, XGBoost, CatBoost and linear models. It automates the creation of lag features, rolling statistics and calendar features while keeping the modeling workflow efficient and scalable. MLForecast is especially useful for large panels of related time series where tree based models can exploit item level and calendar covariates. It complements StatsForecast and NeuralForecast in production forecasting pipelines.
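The feature construction that such a library automates can be written by hand for a single series. The sketch below is plain Python and does not use the MLForecast API; the function name and column names are hypothetical.

```python
def make_lag_table(series, lags=(1, 2), window=3):
    """Build one row per time step with lag features, a rolling mean and the target."""
    rows = []
    start = max(max(lags), window)  # first step where every feature is available
    for t in range(start, len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        # Rolling mean over the `window` values strictly before t, to avoid leakage.
        row[f"rolling_mean_{window}"] = sum(series[t - window:t]) / window
        row["target"] = series[t]
        rows.append(row)
    return rows


table = make_lag_table([1, 2, 3, 4, 5, 6])
for row in table:
    print(row)
```

Each row is an ordinary tabular training example, so any regressor such as a gradient boosting model can be fitted on the feature columns to predict the target.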

AutoGluon Time Series is an AutoML framework for forecasting that can automatically train, tune and ensemble multiple model families. It supports statistical models, machine learning models and deep learning architectures within a unified workflow. The main advantage is fast development of strong baselines without extensive manual model selection. It is especially useful for teams that want competitive forecasting performance with limited time for algorithm engineering. 

Orbit is a Python package from Uber for Bayesian time series modeling and forecasting. It focuses on structural models such as local and global trend, seasonality and regression components, fitted using Bayesian inference engines like Stan or Pyro. Orbit provides simple interfaces to obtain full posterior predictive distributions, enabling credible intervals and causal impact style analyses. It is well suited for applications where interpretability and honest quantification of uncertainty are as important as point forecast accuracy.

pyts is a Python package for time series classification, transformation and feature extraction. It includes methods for converting time series into alternative representations such as recurrence plots, Gramian angular fields and symbolic aggregate approximations. While it is more focused on classification and representation learning than on forecasting itself, it is useful in preprocessing, feature engineering and time series ML workflows. It broadens the ecosystem beyond pure forecasting libraries.
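To make one of these representations concrete, the sketch below computes a Gramian angular summation field by hand. It does not call pyts; the function name is hypothetical and the series is assumed to be non-constant so that rescaling is well defined.

```python
import math


def gramian_angular_field(series):
    """Summation variant: rescale to [-1, 1], map to angles, take cos(angle_i + angle_j)."""
    lo, hi = min(series), max(series)
    scaled = [2 * (x - lo) / (hi - lo) - 1 for x in series]
    # Clip to guard against rounding slightly outside the domain of acos.
    angles = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]


field = gramian_angular_field([0.0, 1.0, 2.0])
for row in field:
    print([round(v, 3) for v in row])
```

The result is a square, symmetric matrix that can be treated as an image, which is how a one-dimensional series becomes input for image-based classifiers.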

6.3 Deep Learning & Foundation Model Frameworks

GluonTS is a deep-learning toolkit for probabilistic time-series modeling, originally built on MXNet and now with PyTorch integration. It includes reference implementations of models like DeepAR, DeepState, DeepFactor, and Transformer-based variants. The library emphasizes probabilistic forecasting and evaluation, including predictive intervals and distributions. It is widely used in research and industrial projects on large-scale forecasting.

NeuralForecast is a library focused on modern deep-learning models for forecasting, including N-BEATS, N-HiTS, PatchTST, TimesNet, TFT and others. It provides efficient implementations optimized for GPUs and large datasets. The library integrates well with typical Python data science stacks and includes utilities for evaluation and hyperparameter tuning. It is well suited for users who want cutting-edge deep models without reinventing the wheel.

PyTorch Forecasting is a high level library built on PyTorch Lightning that simplifies training of advanced deep learning models for time series, such as Temporal Fusion Transformer, DeepAR and generic Seq2Seq architectures. It offers convenient dataset abstractions, automatic handling of covariates and time indexing, and built in tools for backtesting and hyperparameter tuning. The library also includes interpretation utilities, for example visualizing attention weights or feature importances for TFT. It significantly reduces boilerplate for deep forecasting projects and helps practitioners move from raw data to trained models quickly.

Nixtla is a modern forecasting ecosystem that includes tools such as StatsForecast, MLForecast, NeuralForecast and TimeGPT. It provides scalable implementations for classical, machine learning, deep learning and foundation-model-based forecasting. The ecosystem is especially strong for production use because it focuses on speed, backtesting, cross-validation and clean APIs. Several tools already mentioned in this overview belong to or integrate with this ecosystem.

tsai is a deep learning library for time series classification, regression and forecasting built around PyTorch and fastai concepts. It provides many modern neural architectures and convenient training utilities for temporal data. The library is useful when time series forecasting is part of a broader time series machine learning workflow. It is especially attractive for practitioners already familiar with the fastai ecosystem. 

Time Series Library, commonly known as TSLib, is a widely used open source benchmark and research framework for deep learning based time series analysis and forecasting. It provides standardized implementations of many modern architectures such as Informer, Autoformer, FEDformer, PatchTST, TimesNet and iTransformer within a unified experimental environment. TSLib is especially important for reproducible benchmarking, comparative research and rapid experimentation with state of the art forecasting models. The framework has become a common reference ecosystem in academic research on modern time series forecasting architectures.

6.4 Foundation Model Ecosystems & Platforms

TSFM, the IBM Granite Time Series Foundation Models toolkit, provides code, notebooks and utilities for working with IBM time series foundation models such as Tiny Time Mixers (TTM). It supports tasks such as zero-shot forecasting, fine-tuning and deployment of pretrained time series models. The toolkit is relevant because it connects research models with practical implementation workflows. It sits alongside Darts, GluonTS and NeuralForecast in the modern forecasting tooling ecosystem.

Hugging Face Time Series Models refers to the growing ecosystem of pretrained forecasting and temporal representation models available through the Hugging Face platform and Transformers ecosystem. It includes foundation models, transformer architectures and probabilistic forecasting models that can be shared, fine tuned and deployed using standardized APIs and model hubs. The ecosystem enables researchers and practitioners to access modern time series models with workflows similar to those used in natural language processing and computer vision. It is especially important for open model sharing, reproducibility and integration of time series foundation models into broader multimodal AI pipelines.

BigQuery TimesFM is Google Cloud’s managed integration of the TimesFM foundation forecasting model within the BigQuery ecosystem. It allows users to perform large scale zero shot and few shot time series forecasting directly inside cloud based analytical workflows without manually training forecasting models. The integration combines the scalability of BigQuery with pretrained foundation model forecasting capabilities, making advanced forecasting accessible through SQL oriented data pipelines and enterprise analytics environments. BigQuery TimesFM is especially relevant for industrial forecasting applications that require scalable deployment, cloud integration and minimal operational overhead.

Uni2TS is a universal time series modeling framework associated with the Moirai model family. It provides infrastructure and model components for building and evaluating pretrained time series models across different datasets and forecasting scenarios. Uni2TS is relevant not only as a model ecosystem but also as a practical research framework for foundation model based forecasting. It fits naturally beside Moirai, because it supports the same direction of universal and transferable time series modeling.

6.5 Benchmarks, Datasets & Evaluation

GIFT Eval is a benchmark for evaluating time series foundation models across diverse datasets, frequencies and forecasting settings. It is useful because foundation models require evaluation beyond single-domain benchmarks to test whether they truly generalize. Models such as Moirai, Chronos, TimesFM and others are often compared on broad benchmark suites of this kind. GIFT Eval is an evaluation resource rather than a forecasting library.

The Monash Time Series Forecasting Repository is a widely used collection of benchmark datasets for forecasting research. It contains time series from many domains, frequencies and forecasting horizons, making it useful for comparing algorithms under diverse conditions. Many modern forecasting papers use datasets from this repository to evaluate generalization. It is an important resource for reproducible benchmarking.

TFB (Time Series Forecasting Benchmark) is a benchmarking framework and evaluation suite designed for systematic comparison of forecasting models across diverse datasets, horizons and temporal domains. It provides standardized datasets, evaluation protocols and reproducible experimental settings for assessing statistical models, machine learning methods, deep learning architectures and modern foundation models. TFB is especially useful for measuring how well forecasting approaches generalize across heterogeneous real-world scenarios rather than only excelling on isolated benchmark datasets. It complements resources such as GIFT Eval and the Monash Time Series Forecasting Repository.

6.6 AI Assisted Forecasting & Copilot Systems

TimeCopilot is an AI driven framework for automated time series analysis, forecasting and decision support that combines large language models with statistical and machine learning forecasting pipelines. It is designed to assist users in tasks such as model selection, feature engineering, anomaly detection, forecasting interpretation and scenario analysis through natural language interaction. By integrating forecasting workflows with conversational AI capabilities, TimeCopilot represents a broader shift toward intelligent forecasting assistants and AI powered analytics systems. It is especially relevant for organizations that want to make advanced forecasting workflows more accessible to non specialist users.
