normet: Normalisation, Decomposition, and Counterfactual Modelling for Air Quality Time-series

normet is an R package designed for air quality time-series analysis. It provides a powerful and user-friendly suite of tools for air quality research, causal inference, and policy evaluation.

✨ Core Strengths

Automated & Intelligent: Powered by an H2O AutoML backend, it searches candidate models automatically and keeps the best performer on the chosen metric, reducing manual tuning.
All-in-One Solution: Offers high-level functions that cover the entire workflow: data preprocessing, model training, weather normalisation, decomposition, and counterfactual modelling.
Robust Causal Inference: Integrates both classic (SCM) and machine-learning-based (ML-SCM) Synthetic Control Methods.
Uncertainty Quantification: Provides comprehensive tools for uncertainty estimation, including Bootstrap, Jackknife, and Placebo Tests (in-space).
High Performance: Built-in memory management and parallel processing for handling large datasets.

🚀 Workflow Overview

The analysis typically follows this sequence:

Initialize: Start the H2O backend (nm_init_h2o).
Prepare: Process raw data and create time-based features (nm_prepare_data).
Train: Build predictive models (nm_train_model).
Analyse:
- Importance: Rank predictors by influence (nm_feature_importance).
- Explain: Visualize marginal effects (nm_pdp).
- Normalise: Remove weather effects (nm_normalise / nm_normalise_auto).
- Decompose: Isolate emission vs. meteorological contributions (nm_decompose).
- Evaluate Policy: Estimate causal effects (nm_run_scm).
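
As an illustration of the "Prepare" step, the time-based features named later in this README (date_unix, day_julian, weekday, hour) could be derived from a timestamp roughly as follows. This is a base-R sketch on made-up hourly timestamps; how nm_prepare_data computes them internally is an assumption, not confirmed by the package.

```r
# Hypothetical hourly timestamps; column names mirror the predictors used in this README.
dates <- seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
             by = "hour", length.out = 48)

time_features <- data.frame(
  date       = dates,
  date_unix  = as.numeric(dates),                  # seconds since 1970-01-01 (long-term trend)
  day_julian = as.integer(format(dates, "%j")),    # day of year (seasonal cycle)
  weekday    = as.integer(format(dates, "%u")),    # ISO weekday, 1 = Monday (weekly cycle)
  hour       = as.integer(format(dates, "%H"))     # hour of day, 0-23 (diurnal cycle)
)

head(time_features, 3)
```

Keeping these time terms in the model while resampling only the weather variables is what allows the normalised series to retain trend and seasonal structure.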

🔧 Installation

Install the latest development version from GitHub:

# install.packages("devtools")
devtools::install_github("normet-dev/normet-r")

Backend Setup

normet relies on H2O. Ensure Java is installed, then install the h2o package:

install.packages("h2o")

💡 Quick Start: The "Do-All" Pipeline

For standard weather normalisation, use nm_do_all to handle data preparation, training, and normalisation in one step.

library(normet)
library(dplyr)

# 1. Initialize Backend
nm_init_h2o()

# 2. Load Data
data("MY1")

# 3. Define Features
# 'predictors' includes weather + time variables for training
predictors <- c("u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m",
                "date_unix", "day_julian", "weekday", "hour")

# 'weather_vars' are the variables to be resampled (shuffled) during normalisation
weather_vars <- c("u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m")

# 4. Run Pipeline
results <- nm_do_all(
  df = my1,
  value = "PM2.5",
  predictors = predictors,
  resample_vars = weather_vars,
  n_samples = 300,  # Number of resampling iterations
  model_config = list(include_algos = c("GBM"), max_runtime_secs = 60, sort_metric = "AUTO")
)

# 5. Inspect Results
head(results$out)       # Normalised Data (Date, Observed, Normalised)
print(results$model)    # Trained Model

🛠️ Step-by-Step Advanced Workflow

For greater control, you can execute each stage manually.

1. Data Preparation & Model Training

# Prepare data with time features and train/test splits
df_prep <- nm_prepare_data(
  df = my1,
  value = 'PM2.5',
  predictors = weather_vars, # Will automatically add time features
  split_method = 'random',
  fraction = 0.75
)

# Configure H2O AutoML
h2o_cfg <- list(
  include_algos = c("GBM"),
  max_runtime_secs = 60,
  sort_metric = "AUTO"
)

# Train the model
model <- nm_train_model(
  df = df_prep,
  value = 'value',
  backend = "h2o",
  variables = predictors,
  model_config = h2o_cfg
)

# Evaluate Performance
nm_modStats(df_prep, model)

# (Optional) Save and Load Model
nm_save_model(model, path = "./", filename = "my_automl")
model <- nm_load_model(path = "./", filename = "my_automl")
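
To make the random 75/25 split and the held-out evaluation concrete, here is a minimal base-R sketch on simulated data. A linear model stands in for the AutoML model, and the metrics shown (RMSE, R2) are only assumed to resemble what nm_modStats reports; all object names are hypothetical.

```r
set.seed(42)
n <- 400
demo <- data.frame(t2m = rnorm(n, 10, 5), blh = runif(n, 100, 1500))
demo$value <- 30 - 0.5 * demo$t2m - 0.01 * demo$blh + rnorm(n, sd = 2)

# Random split: 75% of rows for training, the rest held out for testing
train_idx <- sample.int(n, size = floor(0.75 * n))
train <- demo[train_idx, ]
test  <- demo[-train_idx, ]

fit  <- lm(value ~ t2m + blh, data = train)   # stand-in for the AutoML model
pred <- predict(fit, newdata = test)

# Held-out performance
rmse <- sqrt(mean((test$value - pred)^2))
r2   <- cor(test$value, pred)^2
c(n_train = nrow(train), n_test = nrow(test), RMSE = rmse, R2 = r2)
```

Judging the model on rows it never saw during training guards against reading an overfitted model as a good one.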

2. Model Explainability

Feature Importance

Identify which variables have the strongest influence on the model's predictions.

# Extract feature importance table
importance_table <- nm_feature_importance(model)
print(head(importance_table))

Partial Dependence Plots (PDP)

Partial dependence plots show how the predicted pollutant concentration responds to one variable while the others are averaged out, revealing whether the relationship is linear or non-linear.

# Compute PDP for all variables
pdp_all <- nm_pdp(df_prep, model)
print(head(pdp_all))

# Compute PDP for specific variables
pdp_data <- nm_pdp(df_prep, model, var_list = c('blh', 'rh2m'))
print(head(pdp_data))
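
For readers unfamiliar with the technique, partial dependence can be sketched in a few lines of base R: fix one variable at each value of a grid for every row, predict, and average. The helper partial_dependence below is hypothetical and uses a linear model on simulated data; it is not how nm_pdp is implemented.

```r
set.seed(1)
n <- 300
demo <- data.frame(blh = runif(n, 100, 1500), rh2m = runif(n, 20, 100))
demo$value <- 40 - 0.015 * demo$blh + 0.1 * demo$rh2m + rnorm(n)
fit <- lm(value ~ blh + rh2m, data = demo)

partial_dependence <- function(model, data, var, grid_size = 10) {
  grid <- seq(min(data[[var]]), max(data[[var]]), length.out = grid_size)
  avg_pred <- vapply(grid, function(g) {
    modified <- data
    modified[[var]] <- g                        # force one grid value onto every row
    mean(predict(model, newdata = modified))    # average over the other variables
  }, numeric(1))
  data.frame(variable = var, value = grid, mean_prediction = avg_pred)
}

pdp_blh <- partial_dependence(fit, demo, "blh")
pdp_blh
```

In this simulated case the mean prediction falls as blh rises, reflecting the negative coefficient used to generate the data.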

3. Weather Normalisation

Standard Normalisation

Use the trained model to generate the weather-normalised time-series. The function detects the model's features automatically.

# aggregate = TRUE returns the mean normalised value
df_dew <- nm_normalise(
  df = df_prep,
  model = model,
  resample_vars = weather_vars,
  n_samples = 600,
  aggregate = TRUE
)
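
The resampling idea behind weather normalisation can be sketched in base R. For each iteration the weather columns are replaced by rows drawn at random from the data, the time columns are left alone, and the predictions are averaged. The function normalise_sketch is a hypothetical illustration with a linear stand-in model, not the package's implementation.

```r
set.seed(123)
n <- 200
demo <- data.frame(
  date_unix = seq_len(n),                 # stand-in for the time trend
  t2m = rnorm(n, 10, 5),
  blh = runif(n, 100, 1500)
)
demo$value <- 50 - 0.05 * demo$date_unix - 0.5 * demo$t2m - 0.01 * demo$blh + rnorm(n)
fit <- lm(value ~ date_unix + t2m + blh, data = demo)

normalise_sketch <- function(model, data, resample_vars, n_samples = 100) {
  preds <- replicate(n_samples, {
    shuffled <- data
    rows <- sample.int(nrow(data), replace = TRUE)
    # Swap in weather from random rows; rows are drawn jointly so that
    # the weather variables keep their mutual correlation.
    shuffled[resample_vars] <- data[rows, resample_vars, drop = FALSE]
    predict(model, newdata = shuffled)
  })
  rowMeans(preds)                         # average over all resamples
}

demo$normalised <- normalise_sketch(fit, demo, c("t2m", "blh"))
head(demo[, c("date_unix", "value", "normalised")])
```

Because the weather is averaged out, the normalised series varies less than the observations and mostly follows the time trend.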

Automatic Normalisation (Auto-Convergence)

Instead of guessing n_samples, let the algorithm determine the optimal number of resampling iterations required for the result to stabilize.

# Automatically find best n_samples
auto_result <- nm_normalise_auto(
  df = df_prep,
  model = model,
  resample_vars = weather_vars
)

# Check the optimal N found
cat("Optimal samples used:", auto_result$best_n, "\n")

# Access the normalised result
head(auto_result$res)
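
One way such an automatic stopping rule could work is to keep adding resamples in batches until the running mean stops changing by more than a tolerance. The sketch below assumes that criterion; the actual rule used by nm_normalise_auto is not documented here, and find_n_samples, step, max_n and tol are hypothetical names.

```r
set.seed(7)
n <- 150
demo <- data.frame(date_unix = seq_len(n), t2m = rnorm(n, 10, 5))
demo$value <- 40 - 0.05 * demo$date_unix - 0.8 * demo$t2m + rnorm(n)
fit <- lm(value ~ date_unix + t2m, data = demo)

# One resampled prediction: shuffle the weather column, keep the time column.
one_draw <- function() {
  shuffled <- demo
  shuffled$t2m <- sample(demo$t2m, replace = TRUE)
  unname(predict(fit, newdata = shuffled))
}

find_n_samples <- function(step = 50, max_n = 2000, tol = 0.05) {
  running_sum <- numeric(n)
  previous <- NULL
  used <- 0
  while (used < max_n) {
    for (i in seq_len(step)) running_sum <- running_sum + one_draw()
    used <- used + step
    current <- running_sum / used
    # Stop once another batch changes the running mean by less than tol on average
    if (!is.null(previous) && mean(abs(current - previous)) < tol) break
    previous <- current
  }
  list(best_n = used, res = current)
}

auto_demo <- find_n_samples()
auto_demo$best_n
```

A tighter tolerance yields a smoother result at the cost of more resamples; max_n caps the run time if the tolerance is never met.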

Custom Resampling Pool

Use a specific historical period (e.g., specific year or season) as the weather baseline.

# Create a custom pool (e.g., first 100 observations)
resample_pool <- df_prep %>% dplyr::slice(1:100)

df_dew_custom <- nm_normalise(
  df = df_prep,
  model = model,
  resample_df = resample_pool, # <--- Use custom pool
  resample_vars = weather_vars,
  n_samples = 600
)
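
The example above slices the first 100 rows for brevity. Selecting a particular year or season as the pool might look like the following base-R sketch, which assumes the data has a POSIXct column named date; the data and object names are made up.

```r
# Two years of hypothetical daily data
demo <- data.frame(
  date = seq(as.POSIXct("2019-01-01", tz = "UTC"),
             as.POSIXct("2020-12-31", tz = "UTC"), by = "day")
)
demo$t2m <- 10 + 8 * sin(2 * pi * as.integer(format(demo$date, "%j")) / 365)

# Pool restricted to a single year
pool_2019 <- demo[format(demo$date, "%Y") == "2019", ]

# Pool restricted to winter months (December to February) of any year
pool_winter <- demo[format(demo$date, "%m") %in% c("12", "01", "02"), ]

c(all = nrow(demo), year_2019 = nrow(pool_2019), winter = nrow(pool_winter))
```

Either subset could then be supplied as resample_df, so that the normalised series reflects weather typical of that period only.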

Rolling Normalisation

Perform normalisation in a moving window to capture changing trends.

df_rolling <- nm_rolling(
  df = df_prep,
  value = 'value',
  model = model,
  resample_vars = weather_vars,
  n_samples = 300,
  window_days = 14,
  rolling_every = 7
)
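
To visualise what window_days = 14 and rolling_every = 7 could mean, the sketch below lays out 14-day windows that start every 7 days. This reading of the two parameters is an assumption; nm_rolling may define its windows differently.

```r
days <- seq(as.Date("2020-01-01"), as.Date("2020-02-29"), by = "day")
window_days   <- 14
rolling_every <- 7

# Window start dates, keeping only windows that fit fully inside the data
starts  <- seq(min(days), max(days) - (window_days - 1), by = rolling_every)
windows <- data.frame(start = starts, end = starts + (window_days - 1))
windows
```

With a step shorter than the window, consecutive windows overlap, so each day is normalised against several neighbouring weather pools.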

4. Time-Series Decomposition

Decompose the signal into Emission (human activity) and Meteorology (weather) drivers.

# Isolate Emission contribution
df_emi <- nm_decompose(method = "emission", df = df_prep, value = "value", model = model, n_samples = 300)

# Isolate Meteorology contribution
df_met <- nm_decompose(method = "meteorology", df = df_prep, value = "value", model = model, n_samples = 300)

5. Uncertainty Quantification (Ensemble)

Run an ensemble of models to estimate confidence intervals for the normalised trend.

unc_results <- nm_do_all_unc(
  df = my1,
  value = 'PM2.5',
  predictors = predictors,
  resample_vars = weather_vars,
  n_models = 5, # Train 5 models with different seeds
  n_samples = 300
)
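
The ensemble idea can be sketched in base R: fit several models that differ only in their random seed, compute a normalised series from each, and summarise the spread. In this sketch each member is a linear model fitted to a bootstrap resample, and weather is held at its mean (for a linear model this equals the expected resampled prediction). The structure of the nm_do_all_unc output is not assumed.

```r
set.seed(2024)
n <- 120
demo <- data.frame(date_unix = seq_len(n), t2m = rnorm(n, 10, 5))
demo$value <- 40 - 0.05 * demo$date_unix - 0.8 * demo$t2m + rnorm(n, sd = 2)

n_models <- 5
runs <- sapply(seq_len(n_models), function(seed) {
  set.seed(seed)
  boot <- demo[sample.int(n, replace = TRUE), ]       # each member sees a different resample
  fit  <- lm(value ~ date_unix + t2m, data = boot)
  at_mean_weather <- transform(demo, t2m = mean(demo$t2m))
  unname(predict(fit, newdata = at_mean_weather))
})

# One column per model; summarise across columns for each time step
ensemble <- data.frame(
  date_unix = demo$date_unix,
  mean  = rowMeans(runs),
  lower = apply(runs, 1, quantile, probs = 0.025),
  upper = apply(runs, 1, quantile, probs = 0.975)
)
head(ensemble)
```

With only five members the quantiles are coarse; a larger n_models gives steadier intervals at proportionally higher training cost.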

⚖️ Causal Inference: Synthetic Control Methods (SCM)

Evaluate the effectiveness of policy interventions using SCM or Machine Learning SCM (ML-SCM).

1. Setup Data

data("SCM")
df_scm <- scm

# Define the intervention date
intervention_date <- as.Date("2015-10-23")

# Identify the Target Unit and the Donor Pool
target_unit <- unique(df_scm$ID[df_scm$group == "target"])
control_pool <- unique(df_scm$ID[df_scm$group == "control"])

cat("Target Unit:", target_unit, "\n")
cat("Donor Pool size:", length(control_pool), "\n")

2. Run SCM / ML-SCM

# Classic SCM
scm_res <- nm_run_scm(
  df = df_scm, date_col = "date", outcome_col = "SO2wn", unit_col = "ID",
  treated_unit = target_unit, donors = control_pool, cutoff_date = intervention_date,
  scm_backend = "scm" # Or "mlscm"
)
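
The classic synthetic control idea is to find non-negative donor weights that sum to one and reproduce the treated unit before the intervention, then read the effect as the post-intervention gap. The base-R sketch below illustrates this on simulated data, using a softmax parametrisation with optim to enforce the constraints. It is a simplified stand-in, not the solver used by nm_run_scm.

```r
set.seed(11)
n_pre <- 40; n_post <- 20; n_donors <- 4
donors <- matrix(rnorm((n_pre + n_post) * n_donors, mean = 20, sd = 3), ncol = n_donors)
true_w <- c(0.6, 0.3, 0.1, 0)
treated <- as.vector(donors %*% true_w) + rnorm(n_pre + n_post, sd = 0.2)
post <- (n_pre + 1):(n_pre + n_post)
treated[post] <- treated[post] - 5          # simulated policy effect

pre <- seq_len(n_pre)
# Softmax maps any real vector to non-negative weights that sum to one
softmax <- function(z) exp(z - max(z)) / sum(exp(z - max(z)))
# Pre-intervention squared error between treated unit and weighted donors
loss <- function(z) sum((treated[pre] - donors[pre, ] %*% softmax(z))^2)

fit <- optim(rep(0, n_donors), loss, method = "BFGS")
weights <- softmax(fit$par)

synthetic <- as.vector(donors %*% weights)
effect <- treated - synthetic
round(weights, 3)
mean(effect[post])                          # average post-intervention gap
```

The weights are fitted on pre-intervention data only, so any gap that opens afterwards is attributed to the intervention rather than to the fitting.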

3. Placebo Tests & Confidence Bands

Validate results by running "Placebo in Space" tests (treating control units as if they were treated).

# Run Placebo Test
placebo_out <- nm_placebo_in_space(
  df = df_scm, date_col = "date", outcome_col = "SO2wn", unit_col = "ID",
  treated_unit = target_unit, donors = control_pool, cutoff_date = intervention_date,
  scm_backend = "scm", # Or "mlscm" for the ML-SCM backend
  verbose = FALSE
)

# Calculate and Plot 95% Confidence Bands
bands <- nm_effect_bands_space(placebo_out, level = 0.95, method = "quantile")
nm_plot_effect_with_bands(bands, cutoff_date = intervention_date, title = "SCM Effect (95% Placebo)")
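
Quantile bands from placebo runs can be illustrated in base R: at each time step, take the quantiles of the placebo effects and check whether the treated unit's effect falls outside them. The simulated matrices below are hypothetical and do not reflect the structure of the objects returned by nm_placebo_in_space or nm_effect_bands_space.

```r
set.seed(5)
n_time <- 30; n_placebos <- 20
# Rows = time steps, columns = control units treated "as if" they were the target
placebo_effects <- matrix(rnorm(n_time * n_placebos, sd = 1), nrow = n_time)
# Treated unit: no effect for 20 steps, then a drop of about 5 units
treated_effect <- c(rnorm(20, sd = 1), rnorm(10, mean = -5, sd = 1))

level <- 0.95
alpha <- (1 - level) / 2
bands_demo <- data.frame(
  time   = seq_len(n_time),
  effect = treated_effect,
  lower  = apply(placebo_effects, 1, quantile, probs = alpha),
  upper  = apply(placebo_effects, 1, quantile, probs = 1 - alpha)
)
bands_demo$outside <- with(bands_demo, effect < lower | effect > upper)
tail(bands_demo)
```

An effect that stays outside the placebo band after the cutoff is larger than what untreated units show by chance, which is the evidence the placebo test looks for.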

4. Uncertainty (Bootstrap / Jackknife)

Bootstrap and jackknife resampling provide alternative uncertainty estimates to the placebo test.

# Jackknife (Leave-One-Out)
jack_res <- nm_uncertainty_bands(
  df = df_scm, date_col = "date", outcome_col = "SO2wn", unit_col = "ID",
  scm_backend = "scm", # Or "mlscm" for the ML-SCM backend
  treated_unit = target_unit, donors = control_pool, cutoff_date = intervention_date,
  method = "jackknife", # Or "bootstrap"
  verbose = FALSE
)
nm_plot_uncertainty_bands(jack_res, cutoff_date = intervention_date, title = "SCM Effect (Jackknife)")
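
The leave-one-out logic behind a jackknife can be sketched in base R: drop each donor in turn, re-estimate the effect, and use the spread of the estimates as an envelope. In this sketch a simple donor average stands in for a fitted SCM, and the min/max envelope is one possible way to form bands; how nm_uncertainty_bands builds them is not assumed.

```r
set.seed(9)
n_time <- 30; n_donors <- 6
donors <- matrix(rnorm(n_time * n_donors, mean = 20, sd = 2), nrow = n_time)
treated <- rowMeans(donors) + rnorm(n_time, sd = 0.3)
treated[21:30] <- treated[21:30] - 4        # simulated policy effect

# Stand-in estimator: effect = treated minus the plain donor average
estimate_effect <- function(donor_matrix) treated - rowMeans(donor_matrix)

# Re-estimate with each donor left out once (one column per omitted donor)
jack <- sapply(seq_len(n_donors), function(j) {
  estimate_effect(donors[, -j, drop = FALSE])
})

jack_bands <- data.frame(
  time   = seq_len(n_time),
  effect = estimate_effect(donors),
  lower  = apply(jack, 1, min),
  upper  = apply(jack, 1, max)
)
tail(jack_bands)
```

A narrow envelope indicates that no single donor drives the estimate; a wide one suggests the result depends heavily on particular donors.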

📦 Dependencies

R (>= 4.0)
Core: h2o, dplyr, data.table, lubridate, foreach, doSNOW
SCM: glmnet, quadprog
Visualization: ggplot2

📜 How to Cite

@Manual{normet-pkg,
  title = {normet: Normalisation, Decomposition, and Counterfactual Modelling for Air Quality Time-Series},
  author = {Congbo Song and Other Contributors},
  year = {2025},
  note = {R package version 0.0.1},
  organization = {University of Manchester},
  url = {https://github.com/normet-dev/normet-r},
}

📄 License

GNU General Public License.

🤝 Contributing

Contributions are welcome! Please submit issues and pull requests via GitHub.
