ILAMB Evaluation Workflow: CMORisation with Batch Processing

This guide covers the end-to-end workflow for preparing ACCESS-ESM1-6 model output for evaluation with ILAMB (International Land Model Benchmarking). The workflow uses ACCESS-MOPPy’s batch processing system to CMORise multiple land, atmosphere, and biogeochemistry variables in parallel on NCI’s Gadi HPC.

Note

ACCESS-ESM1-6 is not yet officially registered in the CMIP controlled vocabularies. As a temporary workaround, we use ACCESS-ESM1-5 as the source_id during CMORisation. This means some outputs may appear labelled as ACCESS-ESM1-5, which can be confusing. We are aware of this limitation and plan to remove it as soon as ACCESS-ESM1-6 is validated and added to the CMIP vocabulary.


Overview

ILAMB evaluates land surface model performance against observational benchmarks. It expects variables in CF-compliant NetCDF format with standard CMIP names and units. ACCESS-MOPPy handles the conversion (CMORisation) from raw ACCESS-ESM1-6 output to this format, with each variable submitted as an independent PBS job.

Experiment: historical-02 (ACCESS-ESM1-6 production run)

Variables covered: 22 variables across Emon, Lmon, and Amon CMIP tables

Note

ILAMB does not require strict CMIP6 publication compliance. CMORisation here is used to standardise variable names, units, and metadata — not for data submission.


Prerequisites

Requirement          Details
NCI project access   p73, xp65
Software             conda/analysis3-26.04 via xp65 modules
Scheduler            PBS Pro (Gadi)

Note: tm70 appears in the example paths throughout this guide (e.g. /scratch/tm70/$USER/…) but is not a required prerequisite. Substitute your own project code where appropriate.


Variables

The workflow processes 22 variables organised by CMIP table. Variables marked with * require daily input files; all others use monthly input.

Emon — Monthly ecosystem diagnostics

Variable   Long name
cSoil      Carbon mass in soil pool
fBNF       Biological nitrogen fixation

Lmon — Monthly land surface

Variable   Long name
cVeg       Carbon mass in vegetation
gpp        Gross primary production
lai        Leaf area index
nbp        Net biome production
ra         Plant respiration
rh         Heterotrophic respiration
tsl        Temperature of soil layers
mrro       Total runoff

Amon — Monthly atmosphere / surface energy balance

Variable   Long name
evspsbl    Evaporation
hfls       Surface upward latent heat flux
hfss       Surface upward sensible heat flux
hurs       Near-surface relative humidity
pr         Precipitation
rlds       Surface downwelling longwave radiation
rlus       Surface upwelling longwave radiation
rsds       Surface downwelling shortwave radiation
rsus       Surface upwelling shortwave radiation
tas        Near-surface air temperature
tasmax *   Daily maximum near-surface air temperature
tasmin *   Daily minimum near-surface air temperature

tasmax and tasmin are Amon-table variables but are derived from daily ACCESS output (*dai.nc). Their file patterns differ from the other Amon variables.

Omon — Ocean (pending)

Variable                                   Status
hfds (Downward heat flux at sea surface)   Pending: ocean file pattern not yet confirmed


Batch Configuration

Save the following as ilamb_variables_cmorise.yml. Update output_folder before running.

# Batch CMORisation configuration: historical-02 (ACCESS-ESM1-6)
# Run with: moppy-cmorise ilamb_variables_cmorise.yml

# Variables to process (one PBS job per variable)
variables:
  # --- Emon ---
  - Emon.cSoil
  - Emon.fBNF
  # --- Lmon ---
  - Lmon.cVeg
  - Lmon.gpp
  - Lmon.lai
  - Lmon.nbp
  - Lmon.ra
  - Lmon.rh
  - Lmon.tsl
  - Lmon.mrro
  # --- Amon ---
  - Amon.evspsbl
  - Amon.hfls
  - Amon.hfss
  - Amon.hurs
  - Amon.pr
  - Amon.rlds
  - Amon.rlus
  - Amon.rsds
  - Amon.rsus
  - Amon.tasmax
  - Amon.tasmin
  - Amon.tas
  # --- Omon ---
  # Omon.hfds  # TODO: add ocean file pattern once known

# CMIP6 metadata
experiment_id: historical
source_id: ACCESS-ESM1-5
variant_label: r1i1p1f1
grid_label: gn
activity_id: CMIP

# Input and output paths
input_folder: "/g/data/p73/archive/CMIP7/ACCESS-ESM1-6/production/historical-02"
output_folder: "YOUR_OUTPUT_PATH"

# File patterns (relative to input_folder)
# Monthly atmosphere/land variables share one pattern; tasmax and tasmin use daily files
file_patterns:
  Emon.cSoil:   "/output1*/atmosphere/netCDF/*mon.nc"
  Emon.fBNF:    "/output1*/atmosphere/netCDF/*mon.nc"
  Lmon.cVeg:    "/output1*/atmosphere/netCDF/*mon.nc"
  Lmon.gpp:     "/output1*/atmosphere/netCDF/*mon.nc"
  Lmon.lai:     "/output1*/atmosphere/netCDF/*mon.nc"
  Lmon.nbp:     "/output1*/atmosphere/netCDF/*mon.nc"
  Lmon.ra:      "/output1*/atmosphere/netCDF/*mon.nc"
  Lmon.rh:      "/output1*/atmosphere/netCDF/*mon.nc"
  Lmon.tsl:     "/output1*/atmosphere/netCDF/*mon.nc"
  Lmon.mrro:    "/output1*/atmosphere/netCDF/*mon.nc"
  Amon.evspsbl: "/output1*/atmosphere/netCDF/*mon.nc"
  Amon.hfls:    "/output1*/atmosphere/netCDF/*mon.nc"
  Amon.hfss:    "/output1*/atmosphere/netCDF/*mon.nc"
  Amon.hurs:    "/output1*/atmosphere/netCDF/*mon.nc"
  Amon.pr:      "/output1*/atmosphere/netCDF/*mon.nc"
  Amon.rlds:    "/output1*/atmosphere/netCDF/*mon.nc"
  Amon.rlus:    "/output1*/atmosphere/netCDF/*mon.nc"
  Amon.rsds:    "/output1*/atmosphere/netCDF/*mon.nc"
  Amon.rsus:    "/output1*/atmosphere/netCDF/*mon.nc"
  Amon.tasmax:  "/output1*/atmosphere/netCDF/*dai.nc"
  Amon.tasmin:  "/output1*/atmosphere/netCDF/*dai.nc"
  Amon.tas:     "/output1*/atmosphere/netCDF/*mon.nc"

# PBS job configuration (defaults for all variables)
queue: "normal"
cpus_per_node: 12
mem: "190GB"
jobfs: "100GB"
walltime: "02:00:00"
scheduler_options: "#PBS -P iq82"  # Example: substitute your own compute project
storage: "gdata/tm70+gdata/xp65+gdata/p73+scratch/tm70"  # Example: list the storage your projects need

# Environment setup
worker_init: |
  module use /g/data/xp65/public/modules
  module load conda/analysis3-26.04

wait_for_completion: false

Key configuration notes

File patterns

The pattern /output1*/atmosphere/netCDF/*mon.nc uses shell globbing:

  • output1* — matches all restart chunks beginning with output1 (e.g. output100, output101, …). Adjust the prefix if your archive uses a different naming scheme.

  • *mon.nc — monthly-frequency files. Daily files (*dai.nc) are used only for tasmax and tasmin.
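To see what a pattern matches before submitting, the glob can be expanded by hand. The sketch below is a minimal illustration and assumes the pattern is simply appended to input_folder; how ACCESS-MOPPy combines the two internally is not confirmed here. preview_pattern is a hypothetical helper, not part of ACCESS-MOPPy.

```python
import glob


def preview_pattern(input_folder, pattern):
    """Return the sorted list of files matched by input_folder + pattern.

    Assumption: the pattern is appended to input_folder as plain text.
    The patterns in this guide start with "/", so os.path.join is avoided
    (it would discard input_folder when the second part is absolute).
    """
    return sorted(glob.glob(input_folder.rstrip("/") + pattern))


if __name__ == "__main__":
    root = "/g/data/p73/archive/CMIP7/ACCESS-ESM1-6/production/historical-02"
    monthly = preview_pattern(root, "/output1*/atmosphere/netCDF/*mon.nc")
    daily = preview_pattern(root, "/output1*/atmosphere/netCDF/*dai.nc")
    print(f"{len(monthly)} monthly files, {len(daily)} daily files")
```

An empty list for either pattern suggests the output1* prefix or the file suffix does not match the archive layout.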

Resource allocation

Each job requests 12 CPUs and 190 GB of memory. This is sized for land/atmosphere monthly data at N96 resolution. If you add ocean variables (e.g. Omon.hfds), you may need per-variable overrides:

variable_resources:
  Omon.hfds:
    cpus_per_node: 28
    mem: "190GB"
    walltime: "04:00:00"
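One way to read such overrides is as a merge on top of the global defaults. The sketch below assumes that an override replaces only the keys it names and inherits the rest; this is an illustration of the idea, not confirmed ACCESS-MOPPy behaviour, and resolve_resources is a hypothetical helper.

```python
def resolve_resources(config, variable):
    """Merge top-level PBS defaults with any per-variable override.

    Assumption: keys named under variable_resources replace the defaults,
    and every other key is inherited unchanged.
    """
    keys = ("queue", "cpus_per_node", "mem", "jobfs", "walltime")
    resources = {k: config[k] for k in keys if k in config}
    overrides = config.get("variable_resources", {}).get(variable, {})
    resources.update(overrides)
    return resources


if __name__ == "__main__":
    config = {
        "queue": "normal",
        "cpus_per_node": 12,
        "mem": "190GB",
        "jobfs": "100GB",
        "walltime": "02:00:00",
        "variable_resources": {
            "Omon.hfds": {"cpus_per_node": 28, "mem": "190GB", "walltime": "04:00:00"},
        },
    }
    print(resolve_resources(config, "Amon.tas"))
    print(resolve_resources(config, "Omon.hfds"))
```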

wait_for_completion: false

The controller submits all 22 jobs and exits immediately. Jobs run independently on Gadi. Use the tracking database or PBS commands to monitor progress (see Monitoring below).


Running the Workflow

1. Set your output path

Replace YOUR_OUTPUT_PATH in the config with a path on scratch, e.g.:

output_folder: "/scratch/tm70/$USER/ilamb_cmorised/historical-02"

2. Submit the batch

From a Gadi login node, with ACCESS-MOPPy available in your environment:

module use /g/data/xp65/public/modules
module load conda/analysis3-26.04

moppy-cmorise ilamb_variables_cmorise.yml

This will:

  1. Parse and validate the configuration

  2. Create a SQLite tracking database at output_folder/cmor_tasks.db

  3. Generate PBS and Python scripts under output_folder/cmor_job_scripts/

  4. Submit 22 PBS jobs (one per variable) via qsub

  5. Print submitted job IDs and exit
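Before submitting, it can help to run a quick consistency check of your own on the configuration. The sketch below is a hypothetical helper, not part of ACCESS-MOPPy, and it does not reproduce whatever validation moppy-cmorise performs internally. It works on the configuration as a plain dictionary and only checks that the variables and file_patterns sections agree and that the output path placeholder has been replaced.

```python
def check_config(config):
    """Return a list of human-readable problems found in a batch config dict."""
    problems = []
    variables = config.get("variables", [])
    patterns = config.get("file_patterns", {})

    # Every variable to be processed needs a file pattern.
    for var in variables:
        if var not in patterns:
            problems.append(f"{var}: no entry in file_patterns")

    # A pattern without a matching variable is probably a typo or a leftover.
    for var in patterns:
        if var not in variables:
            problems.append(f"{var}: has a file pattern but is not listed in variables")

    # The template ships with a placeholder output path.
    if config.get("output_folder") in (None, "", "YOUR_OUTPUT_PATH"):
        problems.append("output_folder: placeholder has not been replaced")

    return problems
```

The dictionary could be loaded with yaml.safe_load from PyYAML, assuming PyYAML is available in your environment; an empty return value means none of these three checks failed.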


Monitoring

PBS commands

# List your running jobs
qstat -u $USER

# Check a specific job
qstat -f <job_id>

# Watch overall queue
qstat -q normal

Tracking database

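import os
import sqlite3

import pandas as pd

# Python does not expand $USER inside string literals, so expand it explicitly.
db_path = os.path.expandvars(
    "/scratch/tm70/$USER/ilamb_cmorised/historical-02/cmor_tasks.db"
)
conn = sqlite3.connect(db_path)

# One row per variable, ordered by job start time.
df = pd.read_sql_query("""
    SELECT variable, status, start_time, end_time, error_message
    FROM cmor_tasks
    ORDER BY start_time
""", conn)

print(df)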

Possible status values: pending / running / completed / failed.
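For a quick progress summary without pandas, the same table can be grouped by status. This is a minimal sketch that assumes only the cmor_tasks table and status column used in the query above; status_counts is a hypothetical helper name.

```python
import sqlite3


def status_counts(db_path):
    """Return {status: number_of_variables} from the cmor_tasks table."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT status, COUNT(*) FROM cmor_tasks GROUP BY status"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

When all 22 variables are reported as completed, the batch has finished.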

Log files

Each job writes stdout and stderr under the cmor_job_scripts/ directory:

cmor_job_scripts/
├── cmor_Amon_pr.sh       # Generated PBS script
├── cmor_Amon_pr.py       # Generated Python script
├── cmor_Amon_pr.out      # stdout
└── cmor_Amon_pr.err      # stderr (check here first on failure)
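With 22 jobs, opening each .err file by hand gets tedious. The sketch below collects the last few lines of every non-empty stderr file, assuming the cmor_<table>_<variable>.err naming shown above. nonempty_error_logs is a hypothetical helper. A non-empty stderr file does not necessarily mean the job failed, since warnings are written there too.

```python
from pathlib import Path


def nonempty_error_logs(script_dir, tail_lines=5):
    """Map each non-empty cmor_*.err file name to its last few lines."""
    report = {}
    for err in sorted(Path(script_dir).glob("cmor_*.err")):
        text = err.read_text(errors="replace")
        if text.strip():
            report[err.name] = text.splitlines()[-tail_lines:]
    return report


if __name__ == "__main__":
    for name, tail in nonempty_error_logs("cmor_job_scripts").items():
        print(f"== {name}")
        print("\n".join(tail))
```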

Error Recovery

Failed jobs can be resubmitted by re-running the same command:

moppy-cmorise ilamb_variables_cmorise.yml

Completed variables are skipped automatically; only failed or pending variables are resubmitted.

To manually reset a failed variable in the database:

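import os, sqlite3

conn = sqlite3.connect(os.path.expandvars(
    "/scratch/tm70/$USER/ilamb_cmorised/historical-02/cmor_tasks.db"))
# Mark the failed task as pending so the next moppy-cmorise run resubmits it.
conn.execute("""
    UPDATE cmor_tasks
    SET status = 'pending', start_time = NULL, end_time = NULL, error_message = NULL
    WHERE variable = 'Amon.tas' AND status = 'failed'
""")
conn.commit()
conn.close()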

Common failures

Symptom                      Likely cause                         Fix
FileNotFoundError in .err    File pattern matches nothing         Check output1* prefix against actual archive directory names
MemoryError / job killed     Data larger than mem allocation      Increase mem or reduce cpus_per_node
Job never starts (stuck Q)   Insufficient project allocation      Run nci_account -P iq82 to check SU balance
Module not found in job      worker_init not sourcing correctly   Test module use command interactively on a compute node
tasmax/tasmin empty output   Wrong file pattern                   Confirm daily files exist at *dai.nc; adjust glob if needed


Output Structure

When drs_root is not set (the default for this workflow), all CMORised files land directly in output_folder with CMIP6-standard filenames:

output_folder/
├── pr_Amon_ACCESS-ESM1-5_historical_r1i1p1f1_gn_185001-201412.nc
├── tas_Amon_ACCESS-ESM1-5_historical_r1i1p1f1_gn_185001-201412.nc
├── gpp_Lmon_ACCESS-ESM1-5_historical_r1i1p1f1_gn_185001-201412.nc
├── ...
└── cmor_tasks.db
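The file names follow the CMIP6 pattern of underscore-separated fields, so the metadata can be recovered from the name alone. The sketch below is a hypothetical helper that assumes the seven-field form shown in the listing (including a time range); it is handy for confirming which source_id a file carries, given the ACCESS-ESM1-5 workaround described at the top of this guide.

```python
def parse_cmip6_filename(filename):
    """Split a CMIP6-style file name into its underscore-separated fields."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_")
    if len(parts) != 7:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    keys = (
        "variable_id",
        "table_id",
        "source_id",
        "experiment_id",
        "variant_label",
        "grid_label",
        "time_range",
    )
    return dict(zip(keys, parts))


if __name__ == "__main__":
    info = parse_cmip6_filename(
        "pr_Amon_ACCESS-ESM1-5_historical_r1i1p1f1_gn_185001-201412.nc"
    )
    print(info["variable_id"], info["table_id"], info["source_id"])
```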

Preparing ILAMB-Ready Files

ILAMB requires a specific directory layout rooted at ILAMB_ROOT:

ILAMB_ROOT/
├── DATA/          → observational reference datasets
└── MODELS/
    └── <model_name>/
        ├── tas.nc → <CMORised file>
        ├── pr.nc  → <CMORised file>
        └── ...

Use create_ilamb_data_tree to build this layout in one call after all batch jobs complete:

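import os

from access_moppy.utilities import create_ilamb_data_tree

# Python does not expand $USER in string literals, so expand it explicitly.
scratch = os.path.expandvars("/scratch/tm70/$USER")

create_ilamb_data_tree(
    output_dir=f"{scratch}/ilamb_cmorised/historical-02",
    ilamb_root=f"{scratch}/ilamb_root",
    model_name="ACCESS-ESM1-6",
)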

This call:

  1. Creates ILAMB_ROOT/DATA as a symlink to the NCI observational dataset replica at /g/data/ct11/access-nri/replicas/ILAMB (default).

  2. Creates ILAMB_ROOT/MODELS/ACCESS-ESM1-6/ and populates it with <variable>.nc symlinks pointing at the CMORised files in output_dir.

The resulting layout:

/scratch/tm70/$USER/ilamb_root/
├── DATA  →  /g/data/ct11/access-nri/replicas/ILAMB
└── MODELS/
    └── ACCESS-ESM1-6/
        ├── pr.nc   →  /scratch/tm70/$USER/ilamb_cmorised/historical-02/pr_Amon_…nc
        ├── tas.nc  →  /scratch/tm70/$USER/ilamb_cmorised/historical-02/tas_Amon_…nc
        ├── gpp.nc  →  /scratch/tm70/$USER/ilamb_cmorised/historical-02/gpp_Lmon_…nc
        └── ...
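Since a failed or missing CMORisation job leaves a gap in this tree, it is worth checking the links before launching ILAMB. The sketch below is a hypothetical helper, not part of ACCESS-MOPPy, that assumes the <variable>.nc naming shown above and reports which expected variables have no link and which links point at a file that no longer exists.

```python
from pathlib import Path


def check_model_links(model_dir, expected_variables):
    """Return (missing, broken) variable lists for <variable>.nc links."""
    model_dir = Path(model_dir)
    missing, broken = [], []
    for var in expected_variables:
        link = model_dir / f"{var}.nc"
        if not link.is_symlink() and not link.exists():
            # Nothing at all under this name.
            missing.append(var)
        elif not link.exists():
            # A symlink is present but its target is gone.
            broken.append(var)
    return missing, broken
```

For example, check_model_links(model_dir, ["tas", "pr", "gpp"]) returns two empty lists when all three links resolve.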

Optional: custom observational source

To point DATA at a different observational archive, pass obs_source:

import os

scratch = os.path.expandvars("/scratch/tm70/$USER")  # $USER is not expanded inside Python strings

create_ilamb_data_tree(
    output_dir=f"{scratch}/ilamb_cmorised/historical-02",
    ilamb_root=f"{scratch}/ilamb_root",
    model_name="ACCESS-ESM1-6",
    obs_source="/path/to/custom/obs",
)

Optional: step-by-step control

To create the DATA link and model symlinks independently:

import os

from access_moppy.utilities import (
    create_ilamb_observational_symlinks,
    create_ilamb_model_symlinks,
)

# Python does not expand $USER in string literals, so expand it explicitly.
scratch = os.path.expandvars("/scratch/tm70/$USER")

# Step 1: observational data
create_ilamb_observational_symlinks(
    ilamb_root=f"{scratch}/ilamb_root",
)

# Step 2: model output
create_ilamb_model_symlinks(
    output_dir=f"{scratch}/ilamb_cmorised/historical-02",
    ilamb_dir=f"{scratch}/ilamb_root/MODELS/ACCESS-ESM1-6",
)

Set ILAMB_ROOT to the root directory when running ILAMB:

export ILAMB_ROOT=/scratch/tm70/$USER/ilamb_root

After set-up, run ILAMB with the following commands:

module use /g/data/xp65/public/modules
module load conda/analysis3-26.04


export ILAMB_ROOT=YOUR-ILAMB-ROOT
export CARTOPY_DATA_DIR=/g/data/kj13/admin/ILAMB/script_github_ilamb
export BUILD_DIR=YOUR-BUILD-DIR

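# Remove any previous build directory (note the $: without it, a literal "BUILD_DIR" is removed)
rm -rf "$BUILD_DIR"
mpiexec -n <NUMBER OF PROCESSES> ilamb-run --config <YOUR ILAMB CONFIG FILE> --model_setup <YOUR DATASET SETUP .txt FILE> --regions global --build_dir "$BUILD_DIR"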