Input Files¶
CATChem requires various input files to perform atmospheric chemistry simulations. This guide describes the required and optional input files, their formats, and how to prepare them.
Overview¶
CATChem input files can be organized into several categories:
- Configuration files - YAML files controlling model behavior
- Meteorological data - Weather and atmospheric state variables
- Emission data - Anthropogenic and biogenic emission sources
- Chemical data - Species properties and reaction mechanisms
- Geographic data - Land use, topography, and surface properties
Configuration Files¶
Main Configuration File¶
The main configuration file (typically catchem_config.yml) controls all aspects of the model run:
# CATChem Configuration File
model:
  name: "My CATChem Run"
  version: "2.1.0"
  simulation_start: "2024-01-01T00:00:00"
  simulation_end: "2024-01-02T00:00:00"
  timestep: 300 # seconds
domain:
  grid_type: "latlon"
  nx: 100
  ny: 80
  nz: 50
  dx: 0.25 # degrees
  dy: 0.25 # degrees
processes:
  - name: chemistry
    enabled: true
    scheme: cb6_full
    timestep_factor: 1
  - name: emissions
    enabled: true
    scheme: edgar_v6
    timestep_factor: 1
  - name: dry_deposition
    enabled: true
    scheme: wesely1989
    timestep_factor: 2
input:
  meteorology:
    file_template: "met_data_%Y%m%d_%H.nc"
    directory: "/data/meteorology/"
  emissions:
    anthropogenic:
      file_template: "emis_anthro_%Y%m.nc"
      directory: "/data/emissions/"
    biogenic:
      file_template: "emis_bio_%Y%m%d.nc"
      directory: "/data/emissions/"
output:
  directory: "/output/catchem_run/"
  file_template: "catchem_output_%Y%m%d_%H.nc"
  frequency: 3600 # seconds
  variables:
    - "O3"
    - "NO2"
    - "PM25"
    - "SO2"
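The timing values in this configuration are related by simple arithmetic. The following is a minimal sketch, not part of CATChem, that checks the example values for consistency. It assumes (this guide does not confirm it) that the run length and the output frequency should both be whole multiples of the timestep; the helper name `check_timing` is hypothetical.

```python
from datetime import datetime

def check_timing(start, end, timestep, output_frequency):
    """Return (number of model steps, steps between outputs).

    Hypothetical helper: raises ValueError if the run length or the
    output frequency is not a whole multiple of the timestep (seconds).
    """
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    run_seconds = int(delta.total_seconds())
    if run_seconds <= 0:
        raise ValueError("simulation_end must be after simulation_start")
    if run_seconds % timestep != 0:
        raise ValueError("run length is not a multiple of the timestep")
    if output_frequency % timestep != 0:
        raise ValueError("output frequency is not a multiple of the timestep")
    return run_seconds // timestep, output_frequency // timestep

# Values copied from the example configuration.
n_steps, steps_per_output = check_timing(
    "2024-01-01T00:00:00", "2024-01-02T00:00:00", 300, 3600
)
print(n_steps, steps_per_output)  # 288 12
```

With a 300-second timestep, the one-day example run takes 288 steps and writes output every 12 steps.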
Process-Specific Configuration¶
Each process can have detailed configuration in separate files:
Chemistry Configuration (chemistry_config.yml):
chemistry:
  mechanism: cb6_full
  solver: rosenbrock
  solver_options:
    relative_tolerance: 1.0e-3
    absolute_tolerance: 1.0e-12
    max_iterations: 1000
  species:
    - name: O3
      initial_concentration: 50.0e-9 # 50 ppbv, expressed in mol/mol
      units: "mol/mol"
    - name: NO2
      initial_concentration: 10.0e-9 # 10 ppbv, expressed in mol/mol
      units: "mol/mol"
  reactions:
    photolysis:
      lookup_table: "photolysis_rates.nc"
      update_frequency: 3600
    thermal:
      temperature_dependence: arrhenius
      pressure_dependence: troe
Emissions Configuration (emissions_config.yml):
emissions:
  anthropogenic:
    inventory: EDGAR_v6
    sectors:
      - power_plants
      - transportation
      - industrial
      - residential
    temporal_profiles:
      diurnal: "diurnal_profiles.nc"
      weekly: "weekly_profiles.nc"
      seasonal: "seasonal_profiles.nc"
  biogenic:
    model: MEGAN
    land_use_data: "land_use.nc"
    plant_functional_types: "pft_data.nc"
    emission_factors: "megan_ef.nc"
  point_sources:
    file: "point_sources.csv"
    format: csv
    plume_rise: briggs
Meteorological Data¶
Required Variables¶
CATChem requires the following meteorological variables:
3D Variables:
- Temperature (K)
- Pressure (Pa)
- Specific humidity (kg/kg)
- Zonal wind (m/s)
- Meridional wind (m/s)
- Vertical velocity (m/s)
- Turbulent kinetic energy (m²/s²)
2D Variables:
- Surface pressure (Pa)
- 2m temperature (K)
- 2m specific humidity (kg/kg)
- 10m wind speed (m/s)
- Precipitation rate (kg/m²/s)
- Boundary layer height (m)
- Surface heat flux (W/m²)
File Format¶
Meteorological data should be in NetCDF format and follow the CF conventions:
netcdf met_data_20240101_00 {
dimensions:
time = 1 ;
lon = 100 ;
lat = 80 ;
lev = 50 ;
variables:
double time(time) ;
time:units = "hours since 2024-01-01 00:00:00" ;
time:calendar = "gregorian" ;
float lon(lon) ;
lon:units = "degrees_east" ;
lon:standard_name = "longitude" ;
float lat(lat) ;
lat:units = "degrees_north" ;
lat:standard_name = "latitude" ;
float lev(lev) ;
lev:units = "Pa" ;
lev:standard_name = "air_pressure" ;
lev:positive = "down" ;
float temperature(time, lev, lat, lon) ;
temperature:units = "K" ;
temperature:standard_name = "air_temperature" ;
float pressure(time, lev, lat, lon) ;
pressure:units = "Pa" ;
pressure:standard_name = "air_pressure" ;
}
Data Sources¶
Common Meteorological Data Sources:
- NCEP GFS - Global Forecast System
- ECMWF ERA5 - Reanalysis data
- NCAR WRF - Weather Research and Forecasting model
- NOAA NAM - North American Mesoscale model
Data Preparation:
# Convert GRIB to NetCDF
cdo -f nc copy gfs_data.grib met_data.nc
# Regrid to target domain
cdo remapbil,target_grid.txt input_met.nc output_met.nc
# Extract variables
cdo select,name=t,u,v,q,ps input_met.nc extracted_met.nc
Emission Data¶
Anthropogenic Emissions¶
File Format:
netcdf emis_anthro_202401.nc {
dimensions:
time = 744 ; // hourly for January
lon = 100 ;
lat = 80 ;
sector = 10 ;
variables:
float NOx_emissions(time, sector, lat, lon) ;
NOx_emissions:units = "mol/m2/s" ;
NOx_emissions:long_name = "NOx emissions" ;
float CO_emissions(time, sector, lat, lon) ;
CO_emissions:units = "mol/m2/s" ;
float SO2_emissions(time, sector, lat, lon) ;
SO2_emissions:units = "mol/m2/s" ;
char sector_names(sector, 50) ;
sector_names:long_name = "Emission sector names" ;
}
Recommended Minimal Species:
- NOx (as NO2 equivalent)
- CO
- SO2
- NH3
- VOCs (individual species or lumped)
- PM2.5 and PM10
- Black carbon (BC)
- Organic carbon (OC)
Biogenic Emissions¶
MEGAN Format:
netcdf emis_bio_20240101.nc {
dimensions:
time = 24 ; // hourly
lon = 100 ;
lat = 80 ;
pft = 16 ; // plant functional types
variables:
float isoprene_ef(pft, lat, lon) ;
isoprene_ef:units = "mg/m2/h" ;
isoprene_ef:long_name = "Isoprene emission factor" ;
float monoterpene_ef(pft, lat, lon) ;
monoterpene_ef:units = "mg/m2/h" ;
float lai(time, lat, lon) ;
lai:units = "m2/m2" ;
lai:long_name = "Leaf area index" ;
float temperature_2m(time, lat, lon) ;
temperature_2m:units = "K" ;
}
Point Source Emissions¶
CSV Format:
source_id,longitude,latitude,stack_height,stack_diameter,exit_velocity,exit_temperature,NOx_rate,SO2_rate,CO_rate
PLANT001,-74.5,40.2,100.0,3.5,15.2,423.0,0.025,0.015,0.008
PLANT002,-74.3,40.1,80.0,2.8,12.1,398.0,0.018,0.012,0.006
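A point-source file in this layout can be read with the Python standard library. The sketch below is illustrative only: the function name `read_point_sources` is hypothetical, the longitude range of -180 to 180 is an assumption (some datasets use 0 to 360), and the units of the rate columns are not interpreted because the CSV example does not state them.

```python
import csv
import io

# Sample rows copied from the CSV example above.
SAMPLE = """source_id,longitude,latitude,stack_height,stack_diameter,exit_velocity,exit_temperature,NOx_rate,SO2_rate,CO_rate
PLANT001,-74.5,40.2,100.0,3.5,15.2,423.0,0.025,0.015,0.008
PLANT002,-74.3,40.1,80.0,2.8,12.1,398.0,0.018,0.012,0.006
"""

def read_point_sources(handle):
    """Parse point-source rows into dicts, converting numeric columns to float."""
    sources = []
    for row in csv.DictReader(handle):
        record = {"source_id": row["source_id"]}
        for key, value in row.items():
            if key != "source_id":
                record[key] = float(value)
        # Basic sanity checks on coordinates and stack geometry.
        if not -180.0 <= record["longitude"] <= 180.0:
            raise ValueError(f"{record['source_id']}: longitude out of range")
        if not -90.0 <= record["latitude"] <= 90.0:
            raise ValueError(f"{record['source_id']}: latitude out of range")
        if record["stack_height"] <= 0:
            raise ValueError(f"{record['source_id']}: stack height must be positive")
        sources.append(record)
    return sources

sources = read_point_sources(io.StringIO(SAMPLE))
print(len(sources), sources[0]["NOx_rate"])
```

For a file on disk, pass `open("point_sources.csv", newline="")` instead of the in-memory sample.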
NetCDF Format:
netcdf point_sources.nc {
dimensions:
source = 1000 ;
time = 8760 ; // hourly for a 365-day year
variables:
double longitude(source) ;
double latitude(source) ;
float stack_height(source) ;
float NOx_emissions(time, source) ;
NOx_emissions:units = "kg/s" ;
}
Chemical Data¶
Species Properties¶
Species Definition File (species.yml):
species:
  - name: O3
    molecular_weight: 48.0
    henry_constant: 1.0e-2 # M/atm
    diffusivity: 1.8e-5 # m2/s
    reactivity: moderate
  - name: NO2
    molecular_weight: 46.0
    henry_constant: 7.0e-3
    diffusivity: 1.6e-5
    reactivity: high
  - name: SO2
    molecular_weight: 64.1
    henry_constant: 1.4
    diffusivity: 1.2e-5
    reactivity: high
Chemical Mechanisms¶
Mechanism Definition:
mechanism:
  name: CB6_full
  description: "Carbon Bond 6 full mechanism"
  species:
    gas_phase: [O3, NO, NO2, CO, SO2, NH3, ...]
    aerosol_phase: [SO4, NO3, NH4, BC, OC, ...]
  reactions:
    - equation: "NO2 + hv -> NO + O"
      rate_constant: "photolysis"
      products: [NO, O]
    - equation: "NO + O3 -> NO2 + O2"
      rate_constant:
        A: 1.8e-14
        Ea: -1370.0
        units: "cm3/molec/s"
Geographic Data¶
Land Use Data¶
File Format:
netcdf land_use.nc {
dimensions:
lon = 100 ;
lat = 80 ;
category = 24 ; // USGS land use categories
variables:
float land_use_fraction(category, lat, lon) ;
land_use_fraction:units = "1" ;
land_use_fraction:long_name = "Land use category fraction" ;
int dominant_category(lat, lon) ;
dominant_category:long_name = "Dominant land use category" ;
char category_names(category, 100) ;
}
Land Use Categories:
- Urban and Built-up
- Dryland Cropland and Pasture
- Irrigated Cropland and Pasture
- Mixed Dryland/Irrigated Cropland
- Cropland/Grassland Mosaic
- Cropland/Woodland Mosaic
- Grassland
- Shrubland
- Mixed Shrubland/Grassland
- Savanna
- Deciduous Broadleaf Forest
- Deciduous Needleleaf Forest
- Evergreen Broadleaf Forest
- Evergreen Needleleaf Forest
- Mixed Forest
- Water Bodies
- Herbaceous Wetland
- Wooded Wetland
- Barren or Sparsely Vegetated
- Herbaceous Tundra
- Wooded Tundra
- Mixed Tundra
- Bare Ground Tundra
- Snow or Ice
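The `land_use_fraction` and `dominant_category` variables are related: the dominant category of a cell is the one with the largest fraction. The sketch below shows that relationship for a single cell. It is not CATChem code; the function name is hypothetical, and it assumes that categories are numbered from 1 in the order listed above and that the fractions of a cell sum to 1.

```python
def dominant_category(fractions, tolerance=1e-3):
    """Return the 1-based index of the largest land-use fraction in one cell.

    `fractions` holds one value per category; they are expected to sum to 1.
    """
    total = sum(fractions)
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"fractions sum to {total:.4f}, expected 1")
    if any(f < 0 for f in fractions):
        raise ValueError("negative land-use fraction")
    largest = max(range(len(fractions)), key=lambda i: fractions[i])
    return largest + 1  # assumption: categories are numbered from 1

# One cell: 60% urban (category 1), 30% grassland (7), 10% water (16).
cell = [0.0] * 24
cell[0], cell[6], cell[15] = 0.6, 0.3, 0.1
print(dominant_category(cell))  # 1
```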
Topography Data¶
netcdf topography.nc {
dimensions:
lon = 100 ;
lat = 80 ;
variables:
float elevation(lat, lon) ;
elevation:units = "m" ;
elevation:standard_name = "surface_altitude" ;
float roughness_length(lat, lon) ;
roughness_length:units = "m" ;
roughness_length:long_name = "Surface roughness length" ;
}
Data Preparation Tools¶
Python Scripts¶
Meteorological Data Processing:
import xarray as xr
import numpy as np

def prepare_met_data(input_file, output_file):
    """Convert meteorological data to CATChem format"""
    # Read input data
    ds = xr.open_dataset(input_file)
    # Rename variables to CATChem conventions
    ds = ds.rename({
        't': 'temperature',
        'u': 'u_wind',
        'v': 'v_wind',
        'q': 'specific_humidity'
    })
    # Add required attributes
    ds.temperature.attrs['units'] = 'K'
    ds.temperature.attrs['standard_name'] = 'air_temperature'
    # Save in CATChem format
    ds.to_netcdf(output_file, unlimited_dims=['time'])

# Usage
prepare_met_data('gfs_data.nc', 'catchem_met.nc')
Emission Data Processing:
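import xarray as xr

def prepare_emission_data(edgar_file, output_file):
    """Convert EDGAR emissions to CATChem format"""
    ds = xr.open_dataset(edgar_file)
    # Convert units from kg/m2/s to mol/m2/s (kg -> g, then divide by g/mol)
    mw_nox = 46.0  # g/mol (NOx as NO2 equivalent)
    ds['NOx_emissions'] = ds['NOx'] * 1000 / mw_nox
    # Add temporal variation. create_temporal_profiles() and
    # apply_temporal_profiles() are placeholders for user-supplied helpers;
    # they are not defined in this guide.
    time_factors = create_temporal_profiles()
    ds = apply_temporal_profiles(ds, time_factors)
    ds.to_netcdf(output_file)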
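The function above calls two helpers, `create_temporal_profiles` and `apply_temporal_profiles`, that this guide does not define. One common way to add temporal variation is to multiply a time-averaged emission rate by diurnal and weekly scaling factors whose mean is 1, so that the total is conserved. The sketch below illustrates that idea with plain Python; the factor values and function names are hypothetical and are not taken from CATChem or EDGAR.

```python
from datetime import datetime

def normalize(factors):
    """Scale factors so that their mean is 1, which conserves the total."""
    mean = sum(factors) / len(factors)
    return [f / mean for f in factors]

# Hypothetical profile shapes, for illustration only.
# 24 hourly values: low at night, peaks in the morning and evening.
DIURNAL = normalize([0.5] * 6 + [1.5] * 4 + [1.0] * 6 + [1.5] * 4 + [0.5] * 4)
# 7 daily values, Monday to Sunday: lower at the weekend.
WEEKLY = normalize([1.1] * 5 + [0.75] * 2)

def scaled_emission(base_rate, when):
    """Apply the diurnal and weekly factors to a time-averaged emission rate."""
    return base_rate * DIURNAL[when.hour] * WEEKLY[when.weekday()]

base = 2.0e-9  # mol/m2/s, hypothetical time-averaged flux
rush_hour = scaled_emission(base, datetime(2024, 1, 1, 8))  # Monday 08:00
night = scaled_emission(base, datetime(2024, 1, 6, 2))      # Saturday 02:00
print(rush_hour > base > night)  # True
```

A seasonal profile could be applied the same way with twelve monthly factors.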
Command Line Tools¶
CDO (Climate Data Operators):
# Regrid meteorological data
cdo remapbil,target_grid.nc input_met.nc output_met.nc
# Extract time period
cdo seldate,2024-01-01,2024-01-31 input.nc output.nc
# Calculate daily means
cdo daymean input.nc daily_output.nc
# Merge multiple files
cdo mergetime input_*.nc merged_output.nc
NCO (NetCDF Operators):
# Extract variables
ncks -v temperature,pressure,humidity input.nc output.nc
# Rename variables
ncrename -v t,temperature input.nc
# Add attributes
ncatted -a units,temperature,c,c,"K" input.nc
# Concatenate along time dimension
ncrcat input_*.nc output.nc
Data Quality Control¶
Validation Checks¶
Meteorological Data:
import xarray as xr

def validate_met_data(filename):
    """Validate meteorological input data"""
    ds = xr.open_dataset(filename)
    # Check for required variables
    required_vars = ['temperature', 'pressure', 'humidity']
    missing_vars = [v for v in required_vars if v not in ds.variables]
    if missing_vars:
        raise ValueError(f"Missing variables: {missing_vars}")
    # Check for reasonable ranges (K)
    if ds.temperature.min() < 200 or ds.temperature.max() > 350:
        raise ValueError("Temperature values outside reasonable range")
    # Check for missing data
    if ds.temperature.isnull().any():
        raise ValueError("Missing temperature data detected")
    print("Meteorological data validation passed")
Emission Data:
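def validate_emission_data(filename):
    """Validate emission input data"""
    ds = xr.open_dataset(filename)
    # Check for negative emissions
    for var in ds.data_vars:
        # Skip non-numeric variables such as sector_names
        if ds[var].dtype.kind not in "fiu":
            continue
        if (ds[var] < 0).any():
            print(f"Warning: Negative emissions found in {var}")
    # Sum the NOx flux over all cells and times. This is a plain sum of
    # fluxes (mol/m2/s), not an area-weighted total in mol/s.
    total_nox = float(ds.NOx_emissions.sum())
    print(f"Sum of NOx emission fluxes: {total_nox:.2e} mol/m2/s")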
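To report a domain total in mol/s, each cell's flux in mol/m2/s has to be multiplied by the cell area before summing. The following is a minimal sketch for a regular latitude-longitude grid, assuming a spherical Earth of radius 6.371e6 m. The function names are hypothetical, and the check at the end uses a synthetic global grid rather than real emission data.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius; spherical-Earth assumption

def cell_area(lat_center_deg, dlat_deg, dlon_deg):
    """Area (m2) of a latitude-longitude cell on a sphere."""
    lat_south = math.radians(lat_center_deg - dlat_deg / 2)
    lat_north = math.radians(lat_center_deg + dlat_deg / 2)
    return (EARTH_RADIUS_M ** 2 * math.radians(dlon_deg)
            * (math.sin(lat_north) - math.sin(lat_south)))

def total_emission(flux, lat_centers_deg, dlat_deg, dlon_deg):
    """Integrate a flux field flux[j][i] (mol/m2/s) to a total in mol/s."""
    total = 0.0
    for row, lat in zip(flux, lat_centers_deg):
        # All cells in one latitude row share the same area.
        total += sum(row) * cell_area(lat, dlat_deg, dlon_deg)
    return total

# Check: a uniform unit flux over the whole globe integrates to the sphere's area.
lats = [-89.5 + j for j in range(180)]
unit_flux = [[1.0] * 360 for _ in lats]
area = total_emission(unit_flux, lats, 1.0, 1.0)
print(f"{area:.3e}")  # close to 4*pi*R^2
```

Cell area shrinks toward the poles, so an unweighted sum over a latitude-longitude grid over-represents high-latitude cells.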
File Organization¶
Recommended Directory Structure¶
/data/catchem_run/
├── config/
│   ├── catchem_config.yml
│   ├── chemistry_config.yml
│   └── emissions_config.yml
├── meteorology/
│   ├── met_data_20240101_00.nc
│   ├── met_data_20240101_06.nc
│   └── ...
├── emissions/
│   ├── anthropogenic/
│   │   ├── emis_anthro_202401.nc
│   │   └── emis_anthro_202402.nc
│   ├── biogenic/
│   │   ├── emis_bio_20240101.nc
│   │   └── emis_bio_20240102.nc
│   └── point_sources.csv
├── geographic/
│   ├── land_use.nc
│   ├── topography.nc
│   └── roughness.nc
└── chemistry/
    ├── species.yml
    ├── mechanism.yml
    └── photolysis_rates.nc
File Naming Conventions¶
Time-based Files:
- %Y - 4-digit year (2024)
- %m - 2-digit month (01-12)
- %d - 2-digit day (01-31)
- %H - 2-digit hour (00-23)
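These placeholders have the same meaning as the corresponding `strftime` codes, so Python's `datetime.strftime` can be used to preview which file names a template produces. The sketch below is illustrative and says nothing about how CATChem itself expands templates; the function name `expected_files` and the 6-hour file interval are assumptions.

```python
from datetime import datetime, timedelta

def expected_files(template, start, end, step_hours):
    """List file names for every step from start (inclusive) to end (exclusive)."""
    names = []
    current = start
    while current < end:
        names.append(current.strftime(template))
        current += timedelta(hours=step_hours)
    return names

files = expected_files(
    "met_data_%Y%m%d_%H.nc",
    datetime(2024, 1, 1, 0),
    datetime(2024, 1, 2, 0),
    step_hours=6,  # assumption: 6-hourly meteorology files
)
for name in files:
    print(name)
# met_data_20240101_00.nc
# met_data_20240101_06.nc
# met_data_20240101_12.nc
# met_data_20240101_18.nc
```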
Troubleshooting¶
Common Issues¶
File Not Found Errors:
# Check file paths in configuration
grep -r "file_template\|directory" config/
# Verify file existence
ls -la /data/meteorology/met_data_20240101_00.nc
# Check file permissions
stat /data/meteorology/met_data_20240101_00.nc
Format Errors:
# Check NetCDF file structure
ncdump -h problematic_file.nc
# Validate CF conventions
cfchecks problematic_file.nc
# Check for corrupted files
ncdump -t problematic_file.nc > /dev/null
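A quick first check, before running the tools above, is to look at the leading bytes of a file: classic NetCDF files start with `CDF` followed by a version byte, and NetCDF-4 files normally start with the 8-byte HDF5 signature. The sketch below uses only the Python standard library. The function name is hypothetical, and the check only inspects the signature, so a truncated file can still pass it.

```python
import tempfile
from pathlib import Path

def netcdf_kind(path):
    """Classify a file by its leading bytes; returns None if it is not NetCDF."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header == b"\x89HDF\r\n\x1a\n":
        return "netCDF-4 (HDF5)"
    if header[:3] == b"CDF" and header[3:4] in (b"\x01", b"\x02", b"\x05"):
        return "netCDF classic"
    return None

# Demonstration with two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.nc"
    good.write_bytes(b"CDF\x01" + b"\x00" * 28)
    bad = Path(tmp) / "bad.nc"
    bad.write_bytes(b"<html>404 Not Found</html>")
    kinds = (netcdf_kind(good), netcdf_kind(bad))
print(kinds)  # ('netCDF classic', None)
```

This catches a common failure mode in which a download saved an HTML error page under a `.nc` name.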
Unit Conversion Issues:
# Common unit conversions
# Temperature: Celsius to Kelvin
temp_k = temp_c + 273.15
# Pressure: hPa to Pa
pressure_pa = pressure_hpa * 100
# Mixing ratio: kg/kg to mol/mol
mixing_ratio_mol = mixing_ratio_kg * mw_air / mw_species
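The conversions above can be wrapped in small functions so that the units are stated once and tested. The sketch below is illustrative: the function names are hypothetical, and `MW_AIR` uses an approximate molar mass of dry air of 28.97 g/mol. It also includes the ppbv and flux conversions used elsewhere in this guide.

```python
MW_AIR = 28.97  # g/mol, approximate molar mass of dry air

def celsius_to_kelvin(temp_c):
    """Temperature: degrees Celsius -> K."""
    return temp_c + 273.15

def hpa_to_pa(pressure_hpa):
    """Pressure: hPa -> Pa."""
    return pressure_hpa * 100.0

def mass_to_molar_mixing_ratio(mmr_kg_per_kg, mw_species):
    """Mixing ratio: kg/kg -> mol/mol, with mw_species in g/mol."""
    return mmr_kg_per_kg * MW_AIR / mw_species

def ppbv_to_mol_per_mol(ppbv):
    """Mixing ratio: parts per billion by volume -> mol/mol."""
    return ppbv * 1.0e-9

def flux_kg_to_mol(flux_kg_m2_s, mw_species):
    """Flux: kg/m2/s -> mol/m2/s, with mw_species in g/mol."""
    return flux_kg_m2_s * 1000.0 / mw_species

print(hpa_to_pa(1013.25))  # 101325.0
print(ppbv_to_mol_per_mol(50.0))             # O3 initial value from chemistry_config.yml
print(mass_to_molar_mixing_ratio(1.0e-7, 48.0))  # O3, molecular weight 48.0 g/mol
print(flux_kg_to_mol(4.6e-8, 46.0))          # NOx as NO2 equivalent
```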
Best Practices¶
- Validate all input data before running simulations
- Use consistent units throughout all input files
- Follow CF conventions for NetCDF metadata
- Document data sources and processing steps
- Check for missing data and handle appropriately
- Use appropriate temporal resolution for your application
- Verify spatial alignment between different datasets
- Test with small domains before full simulations