# Anofox Forecast - Time Series Forecasting for DuckDB
> [!IMPORTANT]
> This extension is in early development, so bugs and breaking changes are expected.
> Please use the [issues page](https://github.com/DataZooDE/anofox-forecast/issues) to report bugs or request features.
A time series forecasting extension for DuckDB with 31 models, data preparation, and analytics, all in pure SQL.
## Key Features
### Forecasting (31 Models)
- **AutoML**: AutoETS, AutoARIMA, AutoMFLES, AutoMSTL, AutoTBATS
- **Statistical**: ETS, ARIMA, Theta, Holt-Winters, Seasonal Naive
- **Advanced**: TBATS, MSTL, MFLES (multiple seasonality)
- **Intermittent Demand**: Croston, ADIDA, IMAPA, TSB
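Croston's method, listed above for intermittent demand, splits a sparse series into nonzero demand sizes and the intervals between them, smooths each with simple exponential smoothing, and forecasts their ratio as a flat line. The pure-Python sketch below shows the classic formulation to make the idea concrete; it is not the extension's C++ implementation, and the smoothing constant `alpha = 0.1` and the initialization from the first nonzero observation are assumptions of this sketch (implementations differ on both).

```python
def croston(y, alpha=0.1, horizon=1):
    """Classic Croston forecast for an intermittent (mostly zero) demand series.

    Returns a flat forecast: smoothed demand size / smoothed demand interval.
    Assumption: both smoothers are initialized from the first nonzero demand.
    """
    nonzero = [i for i, v in enumerate(y) if v > 0]
    if not nonzero:
        return [0.0] * horizon  # no demand observed yet
    first = nonzero[0]
    z = float(y[first])   # smoothed size of nonzero demands
    p = float(first + 1)  # smoothed interval between demands, in periods
    q = 1                 # periods elapsed since the most recent demand
    for v in y[first + 1:]:
        if v > 0:
            # Both exponential smoothers are updated only when demand occurs.
            z += alpha * (v - z)
            p += alpha * (q - p)
            q = 1
        else:
            q += 1
    return [z / p] * horizon
```

Because the smoothers only move when demand occurs, long runs of zeros lengthen the estimated interval and pull the per-period forecast down, instead of dragging a single smoothed level toward zero as plain exponential smoothing would.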
### Complete Workflow
- **EDA & Data Quality**: 5 functions (2 table functions, 3 macros) for exploratory analysis and data quality assessment
- **Data Preparation**: 12 macros for cleaning and transformation
- **Evaluation**: 12 metrics including coverage analysis
- **Seasonality Detection**: Automatic period identification
- **Changepoint Detection**: Regime identification with probabilities
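As a generic illustration of automatic period identification (not necessarily the method the extension uses), one common approach picks the lag at which the sample autocorrelation function (ACF) has its strongest local peak. In this pure-Python sketch, the default search range of half the series length and the acceptance threshold `min_acf = 0.3` are arbitrary assumptions.

```python
def autocorrelation(y, lag):
    """Sample autocorrelation of y at a given lag (biased estimator)."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    if denom == 0:
        return 0.0  # a constant series has no meaningful autocorrelation
    num = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, n))
    return num / denom


def detect_period(y, max_lag=None, min_acf=0.3):
    """Return the lag of the strongest ACF peak, or None if none is convincing.

    Assumptions: lags up to len(y) // 2 are searched, and a peak must reach
    min_acf to count as seasonality.
    """
    max_lag = max_lag or len(y) // 2
    acf = {k: autocorrelation(y, k) for k in range(1, max_lag + 1)}
    # Candidate periods are local maxima of the ACF.
    peaks = [
        k for k in range(2, max_lag)
        if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]
    ]
    best = max(peaks, key=lambda k: acf[k], default=None)
    if best is None or acf[best] < min_acf:
        return None
    return best
```

For daily data with a weekly pattern, the ACF peaks at lags 7, 14, 21, and so on; the biased estimator shrinks longer lags slightly, so the fundamental period of 7 wins.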
### Feature Calculation
- **76+ Statistical Features**: Extract comprehensive time series features for ML pipelines
- **GROUP BY & Window Support**: Native DuckDB parallelization for multi-series feature extraction
- **Flexible Configuration**: Select specific features, customize parameters, or use JSON/CSV configs
- **tsfresh-Compatible**: Feature vectors compatible with tsfresh for integration into existing ML workflows (hctsa support is planned)
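To illustrate what per-series feature extraction with GROUP BY computes, the pure-Python sketch below groups `(series_id, timestamp, value)` rows by series and derives a handful of features whose definitions follow tsfresh's `mean`, `standard_deviation`, `abs_energy`, `mean_abs_change`, and `maximum`. The function names and the dictionary output are assumptions for illustration only; the extension's own feature names, parameters, and result layout are described in the Time Series Features guide.

```python
import math
from collections import defaultdict


def series_features(x):
    """A few features for one series, following tsfresh's definitions."""
    n = len(x)
    mean = sum(x) / n
    diffs = [b - a for a, b in zip(x, x[1:])]
    return {
        "mean": mean,
        # tsfresh uses the population standard deviation (numpy.std, ddof=0).
        "standard_deviation": math.sqrt(sum((v - mean) ** 2 for v in x) / n),
        "abs_energy": sum(v * v for v in x),
        # 0.0 for a single point, where tsfresh would return NaN.
        "mean_abs_change": sum(abs(d) for d in diffs) / len(diffs) if diffs else 0.0,
        "maximum": max(x),
    }


def features_by_series(rows):
    """Mimic GROUP BY series_id: rows are (series_id, timestamp, value) tuples."""
    groups = defaultdict(list)
    for series_id, _timestamp, value in sorted(rows):  # order each series by time
        groups[series_id].append(value)
    return {sid: series_features(values) for sid, values in groups.items()}
```

Each group is processed independently, which is what lets DuckDB spread multi-series feature extraction across threads.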
### Performance
- **Parallel**: Native DuckDB parallelization on GROUP BY
- **Scalable**: Handles millions of series
- **Memory Efficient**: Columnar storage, streaming operations
- **Native C++ Operators**: High-performance native implementations for data preparation (e.g., `ts_fill_gaps_operator`, with a 6-258x speedup)
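Gap filling is one of the data-preparation steps these native operators accelerate. As a conceptual illustration only, the sketch below expands one series onto a regular daily grid and marks missing timestamps with `None`, the Python analogue of SQL `NULL`. The daily step, the grid spanning first to last observation, and the use of `None` for filled rows are assumptions of this sketch; the exact semantics of `ts_fill_gaps_operator` may differ, so check the API reference.

```python
from datetime import date, timedelta


def fill_gaps(rows, step=timedelta(days=1)):
    """Expand (timestamp, value) rows onto a regular grid.

    The grid runs from the first to the last observed timestamp in fixed
    steps; timestamps with no observation get the value None.
    """
    observed = dict(rows)
    if not observed:
        return []
    current, end = min(observed), max(observed)
    filled = []
    while current <= end:
        filled.append((current, observed.get(current)))
        current += step
    return filled


daily = [(date(2024, 1, 1), 5.0), (date(2024, 1, 4), 7.0), (date(2024, 1, 2), 6.0)]
for day, value in fill_gaps(daily):
    print(day, value)
```

Observed timestamps that do not fall on the grid would be silently dropped here; handling such off-grid points and choosing the frequency are details a production operator has to settle.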
### User-Friendly API
- **Zero Setup**: All macros load automatically
- **Consistent**: MAP-based parameters
- **Composable**: Chain operations easily
- **Multi-Language**: Use from Python, R, Julia, C++, Rust, and more!
## Table of Contents
- [Installation](#installation)
- [Multi-Language Support](#multi-language-support)
- [API Reference](#api-reference)
- [Guides](#guides-and-api-sections)
- [Performance](#performance)
- [License](#license)
## Why a C++ Port?
We implemented time-series forecasting algorithms in C++ with native DuckDB
integration, drawing from multiple open-source implementations including
StatsForecast (Python) and various Rust libraries focused on financial analytics.
- **Zero Python overhead** - No subprocess calls, no serialization, pure native execution
- **Automatic parallelization** - DuckDB handles parallel execution across CPU cores natively
- **In-database forecasting** - Generate forecasts directly in SQL without moving data
- **Production-ready performance** - C++ speed with DuckDB's query optimization
- **Portability** - Run forecasts anywhere DuckDB runs, including the browser via WebAssembly (WASM), and from any language with DuckDB bindings
## Attribution
This extension includes C++ ports of algorithms from several open-source projects.
See [THIRD_PARTY_NOTICES](THIRD_PARTY_NOTICES) for complete attribution and license information.
## Installation
### Community Extension
```sql
INSTALL anofox_forecast FROM community;
LOAD anofox_forecast;
```
### From Source
```bash
# Clone the repository with submodules
git clone --recurse-submodules https://github.com/DataZooDE/anofox-forecast.git
cd anofox-forecast
# Build the extension
make release
# The extension will be built to:
# build/release/extension/anofox_forecast/anofox_forecast.duckdb_extension
```
## Quick Start on the M5 Dataset
Running this example takes about 2 minutes on a Dell XPS 13 and requires DuckDB v1.4.2.
```sql
-- Load extensions (httpfs reads the Parquet file over HTTPS)
LOAD httpfs;
LOAD anofox_forecast;
CREATE OR REPLACE TABLE m5 AS
SELECT item_id, CAST(timestamp AS TIMESTAMP) AS ds, demand AS y FROM 'https://m5-benchmarks.s3.amazonaws.com/data/train/target.parquet'
ORDER BY item_id, timestamp;
CREATE OR REPLACE TABLE m5_train AS
SELECT * FROM m5 WHERE ds < DATE '2016-04-25';
CREATE OR REPLACE TABLE m5_test AS
SELECT * FROM m5 WHERE ds >= DATE '2016-04-25';
-- Perform baseline forecast and evaluate performance
CREATE OR REPLACE TABLE forecast_results AS (
SELECT *
FROM anofox_fcst_ts_forecast_by('m5_train', item_id, ds, y, 'SeasonalNaive', 28, {'seasonal_period': 7})
UNION ALL
SELECT *
FROM anofox_fcst_ts_forecast_by('m5_train', item_id, ds, y, 'Theta', 28, {'seasonal_period': 7})
UNION ALL
SELECT *
FROM anofox_fcst_ts_forecast_by('m5_train', item_id, ds, y, 'AutoARIMA', 28, {'seasonal_period': 7})
);
-- MAE and Bias of Forecasts
CREATE OR REPLACE TABLE evaluation_results AS (
SELECT
item_id,
model_name,
anofox_fcst_ts_mae(LIST(y), LIST(point_forecast)) AS mae,
anofox_fcst_ts_bias(LIST(y), LIST(point_forecast)) AS bias
FROM (
-- Join Forecast with Test Data
SELECT
m.item_id,
m.ds,
m.y,
n.model_name,
n.point_forecast
FROM forecast_results n
JOIN m5_test m ON n.item_id = m.item_id AND n.date = m.ds
)
GROUP BY item_id, model_name
);
-- Summarise evaluation results by model
SELECT
model_name,
AVG(mae) AS avg_mae,
STDDEV(mae) AS std_mae,
AVG(bias) AS avg_bias,
STDDEV(bias) AS std_bias
FROM evaluation_results
GROUP BY model_name
ORDER BY avg_mae;
```
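To make the logic of the Quick Start easy to check, the pure-Python sketch below reproduces its SeasonalNaive baseline and evaluation on in-memory lists: each forecast step repeats the value observed one season (`seasonal_period = 7`) earlier, per-item MAE and bias are computed, and the per-item scores are averaged as in the final `GROUP BY model_name`. It is an illustration, not the extension's implementation: it aligns forecasts and actuals by position instead of joining on the date column, and the bias sign convention (forecast minus actual) is an assumption, so check the Evaluation section of the API reference for the definition `anofox_fcst_ts_bias` uses.

```python
from statistics import mean


def seasonal_naive(history, horizon, season_length):
    """Forecast each step with the value observed one full season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[step % season_length] for step in range(horizon)]


def mae(actual, forecast):
    return mean(abs(a - f) for a, f in zip(actual, forecast))


def bias(actual, forecast):
    # Assumed sign convention: mean(forecast - actual), so > 0 means over-forecasting.
    return mean(f - a for a, f in zip(actual, forecast))


def evaluate(train, test_data, horizon=28, season_length=7):
    """Per-item MAE/bias, then the average over items (like GROUP BY model_name).

    train, test_data: dict of item_id -> list of values; test_data[item_id][k]
    is the actual value for forecast step k (the SQL joins on date instead).
    """
    scores = {}
    for item_id, history in train.items():
        forecast = seasonal_naive(history, horizon, season_length)
        actual = test_data[item_id][:horizon]
        scores[item_id] = (mae(actual, forecast), bias(actual, forecast))
    avg_mae = mean(m for m, _ in scores.values())
    avg_bias = mean(b for _, b in scores.values())
    return scores, avg_mae, avg_bias
```

Averaging per-item scores gives every series equal weight regardless of its sales volume, which is also what `AVG(mae)` over the per-item table does in the SQL above.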
---
## Multi-Language Support
**Write SQL once, use everywhere!** The extension works from any language with DuckDB bindings.
| Language | Status | Guide |
|----------|--------|-------|
| **Python** | ✅ | [Python Usage](guides/81_python_integration.md) |
| **R** | ✅ | [R Usage](guides/82_r_integration.md) |
| **Julia** | ✅ | [Julia Usage](guides/83_julia_integration.md) |
| **C++** | ✅ | Via DuckDB C++ bindings |
| **Rust** | ✅ | Via DuckDB Rust bindings |
| **Node.js** | ✅ | Via DuckDB Node bindings |
| **Go** | ✅ | Via DuckDB Go bindings |
| **Java** | ✅ | Via DuckDB JDBC driver |
**See**: [Multi-Language Overview](guides/80_multi_language_overview.md) for polyglot workflows!
---
## API Reference
For complete function signatures, parameters, and detailed documentation, see the [API Reference](docs/API_REFERENCE.md).
### Guides and API Sections
| Guide | API Reference Section |
|-------|----------------------|
| [Quick Start](guides/01_quickstart.md) | [Forecasting](docs/API_REFERENCE.md#forecasting) |
| [EDA & Data Preparation](guides/11_exploratory_analysis.md) | [Exploratory Data Analysis](docs/API_REFERENCE.md#exploratory-data-analysis), [Data Quality](docs/API_REFERENCE.md#data-quality), [Data Preparation](docs/API_REFERENCE.md#data-preparation) |
| [Detecting Seasonality](guides/12_detecting_seasonality.md) | [Seasonality](docs/API_REFERENCE.md#seasonality) |
| [Detecting Changepoints](guides/13_detecting_changepoints.md) | [Changepoint Detection](docs/API_REFERENCE.md#changepoint-detection) |
| [Time Series Features](guides/20_time_series_features.md) | [Time Series Features](docs/API_REFERENCE.md#time-series-features) |
| [Basic Forecasting](guides/30_basic_forecasting.md) | [Forecasting](docs/API_REFERENCE.md#forecasting) |
| [Evaluation Metrics](guides/50_evaluation_metrics.md) | [Evaluation](docs/API_REFERENCE.md#evaluation) |
| Forecasting Model Parameters | [Supported Models](docs/API_REFERENCE.md#supported-models), [Parameter Reference](docs/API_REFERENCE.md#parameter-reference) |
## Development
### Prerequisites
Before building, install the required dependencies:
**Manjaro/Arch Linux**:
```bash
sudo pacman -S base-devel cmake ninja openssl eigen
```
**Ubuntu/Debian**:
```bash
sudo apt update
sudo apt install build-essential cmake ninja-build libssl-dev libeigen3-dev
```
**Fedora/RHEL**:
```bash
sudo dnf install gcc-c++ cmake ninja-build openssl-devel eigen3-devel
```
**macOS**:
```bash
brew install cmake ninja openssl eigen
```
**Windows** (Option 1 - vcpkg, recommended):
```powershell
# Install vcpkg
git clone https://github.com/Microsoft/vcpkg.git
.\vcpkg\bootstrap-vcpkg.bat
# Install dependencies
.\vcpkg\vcpkg install eigen3 openssl
# Build with vcpkg toolchain
cmake -DCMAKE_TOOLCHAIN_FILE=.\vcpkg\scripts\buildsystems\vcpkg.cmake .
cmake --build . --config Release
```
**Windows** (Option 2 - MSYS2/MinGW):
```bash
# In MSYS2 MinGW64 terminal
pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-cmake mingw-w64-x86_64-ninja
pacman -S mingw-w64-x86_64-openssl mingw-w64-x86_64-eigen3
# Then build as normal
make -j$(nproc)
```
**Windows** (Option 3 - WSL, easiest):
```powershell
# From an elevated PowerShell prompt; installs WSL with Ubuntu by default
wsl --install
# Then, inside the Ubuntu shell, follow the Ubuntu/Debian instructions above
```
**Required**:
- C++ compiler (GCC 9+ or Clang 10+)
- CMake 3.15+
- OpenSSL (development libraries)
- Eigen3 (linear algebra library)
- Make or Ninja (build system)
### Build from Source
```bash
# Clone with submodules
git clone --recurse-submodules https://github.com/DataZooDE/anofox-forecast.git
cd anofox-forecast
# Build (choose one)
make -j$(nproc) # With Make
GEN=ninja make release # With Ninja (faster)
```
### Verify Installation
```bash
# Test the extension
./build/release/duckdb -c "
LOAD 'build/release/extension/anofox_forecast/anofox_forecast.duckdb_extension';
SELECT 'Extension loaded successfully! ✅' AS status;
"
```
### Load Extension
```sql
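-- In the DuckDB CLI (locally built extensions are unsigned, so start it with `duckdb -unsigned`)
LOAD 'path/to/anofox_forecast.duckdb_extension';
-- Smoke test (assumes an existing table `sales` with `date` and `amount` columns)
SELECT * FROM TS_FORECAST('sales', date, amount, 'AutoETS', 7, {'seasonal_period': 7});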
```
## License
**Business Source License 1.1 (BSL 1.1)**
### Key Points
- ✅ **Free for production use** - Use it internally in your business
- ✅ **Free for development** - Build applications with it
- ✅ **Free for research** - Academic and research use
- ❌ **Cannot be offered as a hosted service** - No SaaS offerings to third parties
- ❌ **Cannot be embedded in a commercial product** - For distribution to third parties
- **Converts to MPL 2.0** - Five years after first publication
See [LICENSE](LICENSE) for full terms.
## Contributing
We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## Support
- **Documentation**: [guides/](guides/)
- **Issues**: [GitHub Issues](https://github.com/DataZooDE/anofox-forecast/issues)
- **Discussions**: [GitHub Discussions](https://github.com/DataZooDE/anofox-forecast/discussions)
- **Email**: sm@data-zoo.de
## Citation
If you use this extension in research, please cite:
```bibtex
@software{anofox_forecast,
title = {Anofox Forecast: Time Series Forecasting for DuckDB},
author = {Joachim Rosskopf and Simon Müller and {DataZoo GmbH}},
year = {2025},
url = {https://github.com/DataZooDE/anofox-forecast}
}
```
## Acknowledgments
Built on top of:
- [DuckDB](https://duckdb.org) - Amazing analytical database
- [anofox-time](https://github.com/anofox/anofox-time) - Core forecasting library
Special thanks to the DuckDB team for making extensions possible!
---
**Made with ❤️ by the Anofox Team**
**Star us on GitHub** if you find this useful!
**Follow us** for updates: [@datazoo](https://www.linkedin.com/company/datazoo/)
**Get started now**: `LOAD 'anofox_forecast';`