Variable Mapping Reference
This page is intended for developers who need to understand how ACCESS-MOPPy maps raw ACCESS model output variables to CMIP-compliant output, or who want to add support for new variables or models.
Overview
ACCESS-MOPPy uses JSON mapping files to describe how raw model variables (e.g.
UM STASH codes such as fld_s02i208, or MOM5/MOM6 diagnostics such as temp)
correspond to CMIP output variables (e.g. Amon.rsds).
At runtime, load_model_mappings() reads the appropriate
JSON file, finds the requested CMIP variable, and returns the mapping dictionary.
The relevant CMORiser subclass then uses the mapping to
load, transform, and write the data.
The mapping system also handles CMIP7 compound names transparently: a CMIP7 name is first resolved to its CMIP6 equivalent via a separate translation table, and the CMIP6 mapping is then applied as normal.
Mapping Files
Location
All mapping files live inside the installed package under:
src/access_moppy/mappings/
The files shipped with ACCESS-MOPPy are:
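| File | Description |
|---|---|
| ACCESS-ESM1.6_mappings.json | Primary mapping file for ACCESS-ESM1.6 (atmosphere, ocean, land, sea ice, aerosol) |
| ACCESS-CM3_mappings.json | Mapping file for ACCESS-CM3 (atmosphere, ocean) |
| ACCESS-OM3_mappings.json | Mapping file for ACCESS-OM3 (ocean, time-invariant) |
| cmip7_to_cmip6_compound_name_mapping.json | Translation table: CMIP7 branded name → CMIP6 compound name |
| cmip6_to_cmip7_compound_name_mapping.json | Translation table: CMIP6 compound name → CMIP7 branded name |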
Selecting a mapping file
The mapping file to use is determined by the model_id argument of
ACCESS_ESM_CMORiser (default: "ACCESS-ESM1.6").
load_model_mappings() constructs the filename as
{model_id}_mappings.json and looks for it inside the access_moppy.mappings
package resource directory.
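The filename convention can be illustrated with a short sketch. The helper names mapping_filename and load_mapping_file, and the explicit mappings_dir argument, are hypothetical and exist only for this example; the real load_model_mappings() resolves the file from the access_moppy.mappings package resources rather than from a directory passed by the caller.

```python
import json
from pathlib import Path


def mapping_filename(model_id: str) -> str:
    """Build the mapping filename from a model identifier."""
    return f"{model_id}_mappings.json"


def load_mapping_file(mappings_dir, model_id: str = "ACCESS-ESM1.6") -> dict:
    """Read and parse the mapping JSON for model_id found in mappings_dir."""
    path = Path(mappings_dir) / mapping_filename(model_id)
    if not path.is_file():
        raise FileNotFoundError(f"No mapping file for model {model_id!r}: {path}")
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


print(mapping_filename("ACCESS-CM3"))
```

Raising a clear error when the file is missing makes a mistyped model_id easy to spot.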
Top-level Structure of a Mapping File
Each mapping file is a JSON object with the following top-level keys:
{
  "model_info": { ... },
  "aerosol": { "var1": { ... }, "var2": { ... } },
  "atmosphere": { "var1": { ... }, "var2": { ... } },
  "land": { "var1": { ... }, "var2": { ... } },
  "landIce": { "var1": { ... }, "var2": { ... } },
  "ocean": { "var1": { ... }, "var2": { ... } },
  "sea_ice": { "var1": { ... }, "var2": { ... } }
}
model_info
A metadata block describing the model, its components, and, most importantly, how to discover raw output files automatically.
{
  "model_id": "ACCESS-ESM1.6",
  "components": ["aerosol", "atmosphere", "land", "landIce", "ocean", "sea_ice"],
  "description": "Variable mappings for ACCESS-ESM1.6 Earth System Model",
  "file_discovery": { ... }
}
The file_discovery sub-block drives automatic file discovery (see File Discovery below).
Each component key (aerosol, atmosphere, etc.) maps CMIP variable names to
their entry dictionaries. When load_model_mappings() is
called with, say, compound_name="Amon.rsds", it extracts the CMIP name rsds
and searches each component in turn until it finds the entry.
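The component search described above can be sketched as follows. This is a minimal illustration on an in-memory dictionary, not the library's implementation: find_variable_entry and toy_mappings are hypothetical names, and the toy entries carry only a units field.

```python
def find_variable_entry(mappings: dict, compound_name: str):
    """Return (component, entry) for a CMIP6 compound name such as 'Amon.rsds'."""
    # The CMIP variable name is the part after the table name.
    cmip_name = compound_name.split(".")[-1]
    for component, entries in mappings.items():
        if component == "model_info":
            continue  # metadata block, not a component
        if cmip_name in entries:
            return component, entries[cmip_name]
    raise KeyError(f"No mapping found for {compound_name!r}")


# Toy mapping with the same top-level shape as a real mapping file.
toy_mappings = {
    "model_info": {"model_id": "ACCESS-ESM1.6"},
    "atmosphere": {"rldscs": {"units": "W m-2"}},
    "ocean": {"hfds": {"units": "W m-2"}},
}

component, entry = find_variable_entry(toy_mappings, "Omon.hfds")
print(component, entry["units"])
```

Because components are searched in turn, the first component that defines the variable name wins in this sketch.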
Variable Entry Fields
Each variable entry inside a component block shares the same set of optional and required fields:
| Field | Required | Description |
|---|---|---|
| CF standard Name | Yes | The CF conventions standard name for the output variable. May be an empty string. |
| dimensions | Yes | Ordered dictionary that maps model dimension names (keys) to CMIP dimension names (values). This tells the CMORiser how to rename coordinates. Example: "dimensions": {"time": "time", "lat": "lat", "lon": "lon"} |
| units | Yes | Expected physical units of the CMIP output variable (e.g. "W m-2"). |
| positive | Yes | Sign convention, for example "down", or null when no direction applies. |
| model_variables | Yes | List of raw model variable names that must be loaded from the input files. These are passed by name into the calculation context. |
| calculation | Yes | Dictionary that specifies how to derive the output variable. See Calculation Types below. |
| zaxis | No | Present for variables on vertical levels. Describes the vertical coordinate type and the variables needed to reconstruct it. See Vertical Axis (zaxis) Field below. |
| file_pattern | No | A glob pattern (or list of patterns), relative to the input folder (archive root). Example: "file_pattern": "output[0-9][0-9][0-9]/ocean/ocean-2d-surface_temp-1mon-mean-y_*.nc" When absent, discovery falls back to the component-level frequency_patterns in model_info.file_discovery. |
| ressource_file | No | Name of a bundled NetCDF resource file (stored under src/access_moppy/resources/). |
Example: simple direct variable
"rldscs": {
  "CF standard Name": "surface_downwelling_longwave_flux_in_air_assuming_clear_sky",
  "dimensions": {"time": "time", "lat": "lat", "lon": "lon"},
  "units": "W m-2",
  "positive": "down",
  "model_variables": ["fld_s02i208"],
  "calculation": {
    "type": "direct",
    "formula": "fld_s02i208"
  }
}
Calculation Types
The calculation dictionary always contains a "type" key. The five supported
types are described below.
direct
The output variable is taken straight from one model variable with no transformation.
"calculation": {
  "type": "direct",
  "formula": "<model_variable_name>"
}
formula
The output is derived by calling a registered function from the custom_functions registry (see Custom Functions Registry).
"calculation": {
  "type": "formula",
  "operation": "<function_name>",
  "args": ["<var1>", "<var2>", ...],
  "kwargs": {"<key>": "<var_or_literal>"}
}
- args is a list of positional arguments. Each item is either a string (variable name looked up in the input context), a number (used as-is), or a nested expression dictionary (see Expression Language below).
- kwargs is a dictionary of keyword arguments. Values follow the same rules as args items.
Alternatively, operands may be used instead of args for legacy entries; both are treated identically by the expression evaluator.
Optional operands example (ocean hfds):
"calculation": {
  "type": "formula",
  "operation": "calc_hfds",
  "args": ["sfc_hflux_from_runoff", "sfc_hflux_coupler", "sfc_hflux_pme"],
  "kwargs": {
    "frazil_3d_int_z": {"optional": "frazil_3d_int_z"},
    "frazil_2d": {"optional": "frazil_2d"}
  }
}
Wrapping a value in {"optional": "<var>"} means the variable is passed as
None if it is absent from the input dataset, instead of raising a
KeyError.
operation
A shorthand for common two-argument arithmetic operations. Functionally equivalent to formula but expressed more compactly:
"calculation": {
  "type": "operation",
  "operation": "<op_name>",
  "args": ["<var1>", "<var2>"]
}
Supported operation values: "add", "subtract", "multiply",
"divide", "power".
Example (land npp, net primary productivity divided by tile fraction):
"calculation": {
  "type": "operation",
  "operation": "divide",
  "args": ["fld_s03i262", "fld_s03i395"]
}
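One way to picture how an operation entry is evaluated is a lookup table from operation names to Python operators. The sketch below is an assumption about the general approach, not the library's code: ARITHMETIC and apply_operation are hypothetical names, and plain floats stand in for the xarray DataArrays that would be used in practice (which support the same operators).

```python
import operator
from functools import reduce

# Hypothetical stand-in for the arithmetic entries of the custom_functions registry.
ARITHMETIC = {
    "add": lambda *args: reduce(operator.add, args),  # any number of arguments
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
    "power": operator.pow,
}


def apply_operation(calculation: dict, context: dict):
    """Evaluate an 'operation'-type calculation against loaded variables."""
    func = ARITHMETIC[calculation["operation"]]
    # Each argument names a variable that was loaded into the context.
    operands = [context[name] for name in calculation["args"]]
    return func(*operands)


calculation = {
    "type": "operation",
    "operation": "divide",
    "args": ["fld_s03i262", "fld_s03i395"],
}
# Illustrative numbers only.
context = {"fld_s03i262": 6.0, "fld_s03i395": 0.5}
print(apply_operation(calculation, context))
```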
dataset_function
Calls a more complex dataset-level function that receives the entire xarray Dataset and may modify dimensions or coordinates (e.g. interpolating from hybrid-height levels to physical height levels).
"calculation": {
  "type": "dataset_function",
  "function": "<function_name>",
  "kwargs": {}
}
Available dataset_function values: "cl_level_to_height",
"cli_level_to_height", "clw_level_to_height", "level_to_height".
These functions are defined in
access_moppy.derivations.calc_atmos and registered in
custom_functions.
internal
The output variable is computed entirely internally from ancillary information (grid geometry, etc.) without reading any user-provided input file.
"calculation": {
  "type": "internal",
  "function": "<function_name>",
  "args": []
}
Currently the only available function is "calculate_areacella" (atmospheric
grid-cell area, computed from latitude/longitude coordinate arrays).
Variables that use this type do not require input_data to be passed to
ACCESS_ESM_CMORiser.
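To make the idea of a grid-cell area computed purely from coordinates concrete, here is a minimal sketch for a regular latitude/longitude grid on a sphere. It is not the calculate_areacella implementation: the function name cell_areas, the use of cell edges as input, and the Earth radius value are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius; the real function may differ


def cell_areas(lat_edges_deg, lon_edges_deg, radius=EARTH_RADIUS_M):
    """Return a [nlat][nlon] nested list of grid-cell areas in m2.

    Each cell spans consecutive latitude and longitude edges; its area on a
    sphere is R^2 * (lon_east - lon_west) * (sin(lat_north) - sin(lat_south)).
    """
    areas = []
    for south, north in zip(lat_edges_deg[:-1], lat_edges_deg[1:]):
        band = math.sin(math.radians(north)) - math.sin(math.radians(south))
        row = [
            radius**2 * math.radians(east - west) * band
            for west, east in zip(lon_edges_deg[:-1], lon_edges_deg[1:])
        ]
        areas.append(row)
    return areas


# A coarse 4 x 8 global grid: the cell areas should sum to the sphere's surface.
lat_edges = [-90.0, -45.0, 0.0, 45.0, 90.0]
lon_edges = [i * 45.0 for i in range(9)]
areas = cell_areas(lat_edges, lon_edges)
total = sum(sum(row) for row in areas)
print(math.isclose(total, 4 * math.pi * EARTH_RADIUS_M**2))
```

Checking that the areas sum to the surface of the sphere is a cheap sanity test for any area calculation of this kind.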
Expression Language
The formula calculation type uses a small recursive expression language that is
evaluated by evaluate_expression().
An expression can be one of:
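| Expression form | Meaning |
|---|---|
| String | Look up the named variable in the input context (an xarray DataArray). |
| Number | A literal numeric value (integer or float). |
| | Explicit literal, useful when the value might be a string or ambiguous. |
| {"optional": "<var>"} | Look up the variable; return None if it is absent from the input dataset. |
| Nested expression dictionary | Nested function call: recursively evaluate its arguments, then call the named function. |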
Expressions can be arbitrarily nested, allowing compound derivations to be expressed in a single JSON structure.
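A recursive evaluator of this kind can be sketched in a few lines. The version below is illustrative only and is not evaluate_expression() itself: the names evaluate and REGISTRY are hypothetical, the nested-call form is assumed to reuse the "operation", "args", and "kwargs" keys of the formula calculation, and the explicit-literal form is left out.

```python
import operator

# Hypothetical stand-in for the custom_functions registry.
REGISTRY = {"add": operator.add, "multiply": operator.mul}


def evaluate(expr, context):
    """Recursively evaluate an expression against a dict of loaded variables."""
    if isinstance(expr, str):
        return context[expr]  # variable lookup; KeyError if missing
    if isinstance(expr, (int, float)):
        return expr  # numeric literal, used as-is
    if isinstance(expr, dict):
        if "optional" in expr:
            # Optional variable: None when absent instead of a KeyError.
            return context.get(expr["optional"])
        if "operation" in expr:  # assumed shape of a nested function call
            func = REGISTRY[expr["operation"]]
            args = [evaluate(arg, context) for arg in expr.get("args", [])]
            kwargs = {
                key: evaluate(value, context)
                for key, value in expr.get("kwargs", {}).items()
            }
            return func(*args, **kwargs)
    raise TypeError(f"Unsupported expression: {expr!r}")


context = {"a": 2.0, "b": 3.0}
nested = {
    "operation": "add",
    "args": ["a", {"operation": "multiply", "args": ["b", 10]}],
}
print(evaluate(nested, context))
```

Because arguments are evaluated before the function is called, nesting depth is limited only by the JSON structure.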
Custom Functions Registry
All functions available to the formula, operation, and dataset_function
calculation types are registered in the dictionary
access_moppy.derivations.custom_functions.
Built-in operations
| Name | Description |
|---|---|
| add | Sum of any number of arguments |
| subtract | Difference |
| multiply | Product |
| divide | Ratio |
| power | Exponentiation |
| | |
| | Arithmetic mean of multiple arguments |
| | |
| | |
| | Select a single index slice |
| | Resample to monthly minimum |
| | Resample to monthly maximum |
| | Drop a named dimension/axis |
| | Drop the time dimension (for time-invariant fields stored in time-varying files) |
| | Squeeze (remove) size-1 dimensions |
Atmosphere functions
Defined in access_moppy.derivations.calc_atmos.
| Name | Description |
|---|---|
| cl_level_to_height | Convert cloud fraction from hybrid-height levels to physical height levels |
| cli_level_to_height | Convert cloud ice content from hybrid-height levels to physical height levels |
| clw_level_to_height | Convert cloud liquid water from hybrid-height levels to physical height levels |
| level_to_height | Generic hybrid-height level → physical height conversion |
| calculate_areacella | Compute atmospheric grid-cell area from lat/lon coordinates |
Aerosol functions
Defined in access_moppy.derivations.calc_aerosol.
| Name | Description |
|---|---|
| | Sum spectral band optical depths to produce a broadband aerosol optical depth |
Land functions
Defined in access_moppy.derivations.calc_land.
| Name | Description |
|---|---|
| | Extract top-soil layer diagnostic |
| | Derive land cover fractions from tile data |
| | Extract a specific tile fraction |
| | Weighted sum over surface tiles |
| | Convert carbon pool units to kg m⁻² |
| | Total land carbon including wood products |
| | Convert mass pool to kg m⁻² |
| | Convert nitrogen pool units to kg m⁻² |
| | Compute frozen soil moisture |
| | Compute liquid soil moisture |
| | Compute total soil moisture |
| | Compute soil temperature profile |
Ocean functions
Defined in access_moppy.derivations.calc_ocean.
| Name | Description |
|---|---|
| | Compute ocean grid-cell area |
| calc_hfds | Downward ocean heat flux (composite of runoff, coupler, P-E terms, plus optional frazil) |
| | Upward geothermal heat flux |
| | Barotropic mass streamfunction |
| | Meridional overturning circulation streamfunction |
| | Shortwave radiation absorbed in ocean |
| | Volume-weighted global ocean average |
| | Total mass transport across an ocean section |
| | Zonal mass transport corrected for barotropic flow |
| | Meridional mass transport corrected for barotropic flow |
| | Global mean thermosteric sea level change |
| | Extract ocean floor (bottom-cell) values |
Sea ice functions
Defined in access_moppy.derivations.calc_seaice.
| Name | Description |
|---|---|
| | Sea ice extent (area where concentration > 15 %) |
| | Hemisphere-specific sea ice aggregate |
| | Northern/southern hemisphere sea ice area |
| | Northern/southern hemisphere sea ice volume |
| | Northern/southern hemisphere sea ice snow mass |
| | Northern/southern hemisphere sea ice extent |
Vertical Axis (zaxis) Field
For variables defined on vertical levels the mapping entry may include a zaxis block that describes the vertical coordinate:
"zaxis": {
  "type": "hybrid_height",
  "coordinate_variables": {
    "sigma_theta": "b",
    "surface_altitude": "orog",
    "theta_level_height": "lev"
  },
  "formula": "z = a + b*orog"
}
- type: currently always "hybrid_height" (UM eta-based hybrid height coordinate).
- coordinate_variables: mapping from the UM variable name (key) to the CMIP output coordinate name (value).
- formula: human-readable label for the vertical coordinate reconstruction formula.
The actual vertical interpolation is carried out by the dataset_function
registered functions (e.g. level_to_height) using the auxiliary variables
identified in coordinate_variables.
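The formula label z = a + b*orog can be read as a per-level, per-point calculation. The sketch below works through it with plain Python lists; hybrid_height is a hypothetical helper and the numbers are illustrative, not values taken from any model configuration.

```python
def hybrid_height(a, b, orog):
    """Return heights z[k][i] = a[k] + b[k] * orog[i] for levels k and points i."""
    if len(a) != len(b):
        raise ValueError("a and b must have one value per model level")
    return [[a_k + b_k * h for h in orog] for a_k, b_k in zip(a, b)]


# Illustrative values only: three levels over sea level and a 1000 m mountain.
a = [20.0, 500.0, 5000.0]  # height-like coefficient per level (m)
b = [1.0, 0.5, 0.0]        # terrain-following weight, decaying with height
orog = [0.0, 1000.0]       # surface altitude (m)

z = hybrid_height(a, b, orog)
for level in z:
    print(level)
```

Near the surface (b close to 1) the levels follow the terrain, while at the top (b equal to 0) they are flat, which is why interpolation to fixed physical heights needs both coefficients and orog.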
Resource Files
Some variables (e.g. areacello, zfull) are derived from static ancillary
data that is bundled with ACCESS-MOPPy rather than read from user-supplied files.
These are listed in the ressource_file field (note the non-standard spelling,
kept for historical compatibility).
Bundled resource files live under:
src/access_moppy/resources/
When ressource_file is set and no input_data is provided to
ACCESS_ESM_CMORiser, the bundled file is resolved via
importlib.resources.files() and used automatically.
CMIP7 Compound Name Translation
CMIP7 uses a longer “branded” compound name format:
realm.variable.operation.frequency.domain
(e.g. atmos.rsds.tavg-u-hxy-u.mon.GLB).
The files cmip7_to_cmip6_compound_name_mapping.json and
cmip6_to_cmip7_compound_name_mapping.json provide a bidirectional look-up
table between these names and the familiar CMIP6 table.variable form.
These mappings are generated from the official CMIP7 Data Request API and contain
~1 974 entries. The function
_get_cmip7_to_cmip6_mapping() resolves a CMIP7 name
to its CMIP6 equivalent (with support for regex patterns when a single exact match
is not available).
The resolved CMIP6 name is then passed to load_model_mappings()
as usual, so the variable-level mapping files only need to be maintained in CMIP6
terms.
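The translation step can be pictured as a dictionary lookup with a pattern-based fallback. The sketch below is a guess at the general shape, not the behaviour of _get_cmip7_to_cmip6_mapping(): the table contents are illustrative, the function names are hypothetical, and the fallback rule (treat the input as a regular expression and accept it only if exactly one key matches) is an assumption.

```python
import re

# Illustrative table only; the shipped JSON file holds the real entries.
CMIP7_TO_CMIP6 = {
    "atmos.rsds.tavg-u-hxy-u.mon.GLB": "Amon.rsds",
    "atmos.rsds.tavg-u-hxy-u.day.GLB": "day.rsds",
}


def split_cmip7_name(name: str) -> dict:
    """Split a branded name into its five dot-separated fields."""
    realm, variable, operation, frequency, domain = name.split(".")
    return {
        "realm": realm,
        "variable": variable,
        "operation": operation,
        "frequency": frequency,
        "domain": domain,
    }


def cmip7_to_cmip6(name: str, table: dict = CMIP7_TO_CMIP6) -> str:
    """Resolve a CMIP7 name to a CMIP6 compound name."""
    if name in table:
        return table[name]  # exact match
    # Assumed fallback: treat the name as a regex and require a unique match.
    pattern = re.compile(name)
    matches = [value for key, value in table.items() if pattern.fullmatch(key)]
    if len(matches) == 1:
        return matches[0]
    raise KeyError(f"{name!r} matched {len(matches)} entries, expected exactly 1")


print(cmip7_to_cmip6("atmos.rsds.tavg-u-hxy-u.mon.GLB"))
print(split_cmip7_name("atmos.rsds.tavg-u-hxy-u.mon.GLB")["frequency"])
```

Rejecting ambiguous patterns keeps a loose query from silently selecting the wrong frequency or domain.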
Adding New Mappings
To add support for a new variable, open the relevant model mapping JSON file and add an entry under the appropriate component key.
Checklist
1. Identify the correct component (atmosphere, ocean, etc.) based on the model realm.
2. Use the CMIP6 variable short name as the JSON key.
3. Fill in all required fields: CF standard Name, dimensions, units, positive, model_variables, calculation.
4. Choose the simplest applicable calculation.type:
   - Single variable, no transform → direct
   - Arithmetic on two variables → operation
   - Custom function with ≥ 1 argument → formula
   - Dataset-level level interpolation → dataset_function
   - No input data needed → internal
5. If the function you need does not yet exist in custom_functions, implement it in the appropriate calc_*.py module under access_moppy.derivations, import it in access_moppy.derivations.__init__, and register it in the custom_functions dictionary.
6. Run the test suite to ensure no regressions.
Example: adding a new atmosphere variable
Suppose you want to add huss (near-surface specific humidity, fld_s03i237):
"huss": {
  "CF standard Name": "specific_humidity",
  "dimensions": {"time": "time", "lat": "lat", "lon": "lon"},
  "units": "1",
  "positive": null,
  "model_variables": ["fld_s03i237"],
  "calculation": {
    "type": "direct",
    "formula": "fld_s03i237"
  }
}
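Before running the full test suite, a new entry can be checked against the required fields with a few lines of Python. The validator below is a hypothetical helper, not part of ACCESS-MOPPy; the rule that a direct formula must name one of the model_variables is an assumption based on the examples on this page.

```python
REQUIRED_FIELDS = (
    "CF standard Name",
    "dimensions",
    "units",
    "positive",
    "model_variables",
    "calculation",
)
CALCULATION_TYPES = {"direct", "formula", "operation", "dataset_function", "internal"}


def validate_entry(name: str, entry: dict) -> list:
    """Return a list of problems found in a variable entry (empty if none)."""
    problems = [
        f"{name}: missing required field {field!r}"
        for field in REQUIRED_FIELDS
        if field not in entry
    ]
    calc = entry.get("calculation", {})
    calc_type = calc.get("type")
    if calc_type not in CALCULATION_TYPES:
        problems.append(f"{name}: unknown calculation type {calc_type!r}")
    # Assumed rule: a direct formula refers to a listed model variable.
    if calc_type == "direct" and calc.get("formula") not in entry.get("model_variables", []):
        problems.append(f"{name}: direct formula must name one of model_variables")
    return problems


huss = {
    "CF standard Name": "specific_humidity",
    "dimensions": {"time": "time", "lat": "lat", "lon": "lon"},
    "units": "1",
    "positive": None,  # JSON null
    "model_variables": ["fld_s03i237"],
    "calculation": {"type": "direct", "formula": "fld_s03i237"},
}
print(validate_entry("huss", huss))
```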
Adding a new model
1. Create src/access_moppy/mappings/<MODEL_ID>_mappings.json following the same top-level structure (model_info + component keys).
2. Add a file_discovery block to model_info (see File Discovery) so that MOPPy can auto-discover raw files for the new model.
3. Pass model_id="<MODEL_ID>" to ACCESS_ESM_CMORiser to activate the new mapping file.
4. If the model uses a different CMORiser class (e.g. a new ocean component), implement a CMORiser subclass and wire it up in access_moppy.driver.
File Discovery
Overview
discover_files() automatically locates raw
model output files for a CMIP variable given only an archive root directory.
It is used as a fallback by the batch system when no explicit file_patterns
entry is present in the batch config.
Resolution order
1. Per-variable file_pattern in the mapping entry: explicit override for edge cases (unusual filenames, legacy layouts, or derived variables drawing from multiple file types).
2. Component-level frequency_patterns from model_info.file_discovery: the normal path, resolved by substituting {model_var} from model_variables and globbing under input_folder.
3. FileDiscoveryError: raised when neither source provides a pattern; the batch job fails with an actionable message.
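The resolution order can be expressed as a small function that returns the filename globs to use for one variable. This is a sketch under stated assumptions rather than the body of discover_files(): select_patterns is a hypothetical name, and the FileDiscoveryError class defined here is a local stand-in for the library's exception.

```python
class FileDiscoveryError(Exception):
    """Local stand-in for the error raised when no pattern can be resolved."""


def select_patterns(entry: dict, file_discovery: dict, component: str, frequency: str) -> list:
    """Return the glob patterns for one variable, following the resolution order."""
    # 1. A per-variable file_pattern (string or list) overrides everything else.
    override = entry.get("file_pattern")
    if override:
        return [override] if isinstance(override, str) else list(override)
    # 2. Otherwise use the component-level pattern for the requested frequency.
    comp = file_discovery.get("components", {}).get(component, {})
    pattern = comp.get("frequency_patterns", {}).get(frequency)
    if pattern:
        if "{model_var}" not in pattern:
            return [pattern]
        return [pattern.replace("{model_var}", var) for var in entry["model_variables"]]
    # 3. Nothing matched: fail with a message that names what was missing.
    raise FileDiscoveryError(
        f"No file pattern for component {component!r} at frequency {frequency!r}"
    )


file_discovery = {
    "components": {
        "atmosphere": {"frequency_patterns": {"mon": "*.pa-*_mon.nc"}},
        "ocean": {"frequency_patterns": {"mon": "ocean-*-{model_var}-1mon-mean-y_*.nc"}},
    }
}
print(select_patterns({"model_variables": ["temp"]}, file_discovery, "ocean", "mon"))
```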
file_discovery block in model_info
"file_discovery": {
  "output_dir_pattern": "output[0-9][0-9][0-9]",
  "components": {
    "atmosphere": {
      "subdir": "atmosphere/netCDF",
      "frequency_patterns": {
        "mon": "*.pa-*_mon.nc",
        "day": "*.pe-*_dai.nc",
        "3hr": "*.pi-*_3hr.nc",
        "6hr": "*.pj-*_6hr.nc",
        "subhr": "*.pc-*.nc"
      }
    },
    "sea_ice": {
      "subdir": "ice",
      "frequency_patterns": {
        "mon": "iceh-1monthly-mean_*.nc",
        "day": "iceh-1daily-mean_*.nc"
      }
    },
    "ocean": {
      "subdir": "ocean",
      "frequency_patterns": {
        "mon": "ocean-*-{model_var}-1mon-mean-y_*.nc",
        "day": "ocean-*-{model_var}-1day-mean-y_*.nc",
        "yr": "ocean-*-{model_var}-1yr-mean-y_*.nc",
        "fx": "ocean-*-{model_var}-fx.nc"
      }
    }
  }
}
- output_dir_pattern: Glob fragment matched against the top-level output sub-directories (e.g. output000, output001, …).
- subdir: Path relative to the output directory where the component’s files live.
- frequency_patterns: Dictionary keyed by frequency token ("mon", "day", "3hr", "6hr", "yr", "fx") mapping to a filename glob.
  - When the pattern contains {model_var}, it is substituted with each entry in the variable’s model_variables list and all results are merged. This is how multi-variable derivations (e.g. ocean overturning computed from two transport fields) collect all required files.
  - When the pattern has no {model_var} placeholder (atmosphere, sea ice), a single glob is issued that returns all files for that frequency, regardless of which variable is requested. All variables at that frequency are packed into the same file in those components.
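How the three settings combine into an on-disk search can be sketched with pathlib. The function below is illustrative and not the library's discover_files(): the name discover is hypothetical, and the directory layout (archive root, then output directory, then subdir, then filename glob) is an assumption inferred from the example patterns on this page.

```python
from pathlib import Path

# Cut-down file_discovery block with only the ocean component.
DISCOVERY = {
    "output_dir_pattern": "output[0-9][0-9][0-9]",
    "components": {
        "ocean": {
            "subdir": "ocean",
            "frequency_patterns": {"mon": "ocean-*-{model_var}-1mon-mean-y_*.nc"},
        }
    },
}


def discover(root, discovery: dict, component: str, frequency: str, model_variables: list) -> list:
    """Glob for raw files under root using a file_discovery block."""
    comp = discovery["components"][component]
    pattern = comp["frequency_patterns"][frequency]
    # Substitute each model variable, or keep the pattern if it has no placeholder.
    if "{model_var}" in pattern:
        filename_globs = [pattern.replace("{model_var}", var) for var in model_variables]
    else:
        filename_globs = [pattern]
    found = set()
    for filename_glob in filename_globs:
        # Assumed layout: <root>/<output_dir_pattern>/<subdir>/<filename glob>
        relative = f"{discovery['output_dir_pattern']}/{comp['subdir']}/{filename_glob}"
        found.update(Path(root).glob(relative))
    return sorted(found)  # merged and de-duplicated across model variables
```

Collecting results in a set before sorting means a file matched by more than one substituted pattern is returned only once.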
frequency_patterns keys
The key is derived from the CMIP table name of the requested variable:
| Key | CMIP tables that map to it |
|---|---|
| mon | |
| day | |
| 3hr | |
| 6hr | |
| subhr | |
| yr | |
| fx | |
Year-based filtering
discover_files() accepts optional
start_year and end_year arguments. After globbing, files outside the
requested range are removed by parsing the year directly from the filename
— no files are opened, so filtering adds negligible overhead even for large
archives.
The following filename conventions are recognised:
- ocean-2d-tos-1mon-mean-y_1850.nc → year 1850
- iceh-1monthly-mean_1850-01.nc → year 1850
- aiihca.pa-185001_mon.nc → year 1850
- tos_mean_ocean_1mon_185001-185012.nc → start year 1850 (proposed unified naming scheme)
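Filename-based year filtering for the four conventions listed above can be sketched with regular expressions. The patterns and helper names below are written for this illustration and are not taken from the library; in particular, keeping files whose year cannot be parsed is an assumption.

```python
import re

# Tried in order; each captures the four-digit (start) year.
YEAR_PATTERNS = [
    re.compile(r"_(\d{4})\d{2}-\d{6}\.nc$"),         # tos_mean_ocean_1mon_185001-185012.nc
    re.compile(r"-(\d{4})\d{2}_[A-Za-z0-9]+\.nc$"),  # aiihca.pa-185001_mon.nc
    re.compile(r"_(\d{4})-\d{2}\.nc$"),              # iceh-1monthly-mean_1850-01.nc
    re.compile(r"_(\d{4})\.nc$"),                    # ocean-2d-tos-1mon-mean-y_1850.nc
]


def year_from_filename(filename: str):
    """Return the year encoded in a filename, or None if no convention matches."""
    for pattern in YEAR_PATTERNS:
        match = pattern.search(filename)
        if match:
            return int(match.group(1))
    return None


def filter_by_year(filenames, start_year=None, end_year=None):
    """Keep files whose parsed year lies within [start_year, end_year]."""
    kept = []
    for name in filenames:
        year = year_from_filename(name)
        if year is None:
            kept.append(name)  # assumption: keep files whose year cannot be parsed
            continue
        if start_year is not None and year < start_year:
            continue
        if end_year is not None and year > end_year:
            continue
        kept.append(name)
    return kept


files = [
    "ocean-2d-tos-1mon-mean-y_1850.nc",
    "iceh-1monthly-mean_1850-01.nc",
    "aiihca.pa-185001_mon.nc",
    "tos_mean_ocean_1mon_185001-185012.nc",
    "ocean-2d-tos-1mon-mean-y_1900.nc",
]
print(filter_by_year(files, start_year=1850, end_year=1860))
```

Only the filename string is inspected, which matches the point made above that no files need to be opened.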
Adapting to a different file layout
Each model version can have its own mapping file with a different
file_discovery block. To add support for a new file organisation:
1. Create (or copy) a mapping JSON, e.g. ACCESS-ESM1.6-unified_mappings.json.
2. Update model_info.file_discovery with the new subdir values and frequency_patterns globs.
3. For per-variable overrides (renamed files, unusual paths), add a file_pattern field directly to the variable entry.
4. Point the batch config at the new mapping: model_id: ACCESS-ESM1.6-unified.
Old and new runs can be processed side-by-side; the model_id in each
batch config selects the correct layout independently.