freva-client databrowser data-overview --host https://www.gems.dkrz.de

Available search flavours:
- freva
- cmip6
- cmip5
- cordex
- nextgems
- user
Search attributes by flavour:
  cmip5:
  - experiment
124 more lines truncated

from freva_client import databrowser
db = databrowser(host="https://www.gems.dkrz.de",
                 flavour="cmip6",
                 mip_era="mpi-ge", variable_id="tas")
db

databrowser.metadata_search(host="https://www.gems.dkrz.de", flavour="cmip6")

{'activity_id': ['aerchemmip',
                 'afr-22',
                 'afr-44',
                 'amsre_soilmoisture',
                 'amsre_sst',
                 'arc-44',
                 'arc-44i',
                 'arctic_lead_fraction_amsre',
                 'arctic_lead_fraction_cryosat2',
                 'arctic_meltponds',
... (9738 more lines truncated) ...

databrowser.metadata_search(host="https://www.gems.dkrz.de", flavour="cmip6").keys()

dict_keys(['member_id', 'experiment_id', 'institution_id', 'source_id', 'activity_id', 'mip_era', 'realm', 'time_aggregation', 'frequency', 'variable_id'])

freva-client databrowser metadata-search --host https://www.gems.dkrz.de

member_id: r001i1850p3,  r001i2005p3,  r002i1850p3, ...
experiment_id: 1pctco2,  historical,  picontrol, ...
institution_id: mpi-m
source_id: mpi-esm
activity_id: output1
mip_era: mpi-ge
realm: atmos,  land,  ocean, ...
time_aggregation: mean
frequency: mon,  yr, ...
variable_id: baresoilfrac,  c3pftfrac,  c4pftfrac, ...
1 more lines truncated

freva-client databrowser metadata-search --host https://www.gems.dkrz.de --facet mpi-ge --flavour cmip6 \
--json | jq -cr 'keys'

activity_id, experiment_id, frequency, institution_id, member_id, mip_era, realm, source_id, time_aggregation, variable_id

freva-client databrowser metadata-search --facet mpi-ge \
 --host https://www.gems.dkrz.de \
 --flavour cmip6

member_id: r001i1850p3,  r001i2005p3,  r002i1850p3, ...
experiment_id: 1pctco2,  historical,  picontrol, ...
institution_id: mpi-m
source_id: mpi-esm
activity_id: output1
mip_era: mpi-ge
realm: atmos,  land,  ocean, ...
time_aggregation: mean
frequency: mon,  yr, ...
variable_id: baresoilfrac,  c3pftfrac,  c4pftfrac, ...

databrowser.metadata_search("mpi-ge", host="https://www.gems.dkrz.de", flavour="cmip6")

{'activity_id': ['output1'],
 'experiment_id': ['1pctco2', 'historical', 'picontrol', '...'],
 'frequency': ['mon', 'yr'],
 'institution_id': ['mpi-m'],
 'member_id': ['r001i1850p3', 'r001i2005p3', 'r002i1850p3', '...'],
 'mip_era': ['mpi-ge'],
 'realm': ['atmos', 'land', 'ocean', '...'],
 'source_id': ['mpi-esm'],
 'time_aggregation': ['mean'],
 'variable_id': ['baresoilfrac', 'c3pftfrac', 'c4pftfrac', '...']}

freva-client databrowser metadata-search --host www.gems.dkrz.de --flavour cmip6 \
mip_era=mpi-ge variable_id=tas

member_id: r001i1850p3,  r001i2005p3,  r002i1850p3, ...
experiment_id: 1pctco2,  historical,  picontrol, ...
institution_id: mpi-m
source_id: mpi-esm
activity_id: output1
mip_era: mpi-ge
realm: atmos,  land, ...
time_aggregation: mean
frequency: mon
variable_id: tas

databrowser.metadata_search(
    flavour="cmip6",
    mip_era="mpi-ge",
    variable_id="tas",
    host="www.gems.dkrz.de"
)

{'activity_id': ['output1'],
 'experiment_id': ['1pctco2', 'historical', 'picontrol', '...'],
 'frequency': ['mon'],
 'institution_id': ['mpi-m'],
 'member_id': ['r001i1850p3', 'r001i2005p3', 'r002i1850p3', '...'],
 'mip_era': ['mpi-ge'],
 'realm': ['atmos', 'land'],
 'source_id': ['mpi-esm'],
 'time_aggregation': ['mean'],
 'variable_id': ['tas']}

freva-client databrowser data-count  --host www.gems.dkrz.de --flavour cmip6 \
mip_era=mpi-ge frequency=mon variable_id=pr variable_id=tas \
time='2025-01 to 2100-12'

602

db_yr = databrowser(
    host="https://www.gems.dkrz.de",
    flavour="cmip6",
    mip_era="mpi-ge",
    frequency="mon",
    variable_id="tas",
    time="2025-01 to 2100-12"
)
len(db_yr)

602

freva-client databrowser data-search mip_era=mpi-ge variable_id=tas \
frequency=mon --time "2025-01 to 2100-12" experiment_id=picontrol \
--host  https://www.gems.dkrz.de --flavour cmip6

/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc
/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_200001-209912.nc

pcontrol = databrowser(mip_era="mpi-ge",
                       variable_id="tas",
                       frequency="mon", time="2025-01 to 2100-12", experiment_id="picontrol",
                       host="https://www.gems.dkrz.de", flavour="cmip6")
list(pcontrol)

['/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc',
 '/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_200001-209912.nc']

freva-client databrowser metadata-search --host www.gems.dkrz.de \
file=/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc

ensemble: r001i1850p3
experiment: picontrol
institute: mpi-m
model: mpi-esm
product: output1
project: mpi-ge
realm: atmos
time_aggregation: mean
time_frequency: mon
variable: tas

db.metadata_search(
    file="/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc",
    host="www.gems.dkrz.de"
)

{'ensemble': ['r001i1850p3'],
 'experiment': ['picontrol'],
 'institute': ['mpi-m'],
 'model': ['mpi-esm'],
 'product': ['output1'],
 'project': ['mpi-ge'],
 'realm': ['atmos'],
 'time_aggregation': ['mean'],
 'time_frequency': ['mon'],
 'variable': ['tas']}

from pathlib import Path
from freva_client import authenticate, databrowser
import xarray as xr
token = authenticate(host="www.gems.dkrz.de", token_file=Path("~/.token.json").expanduser())
data = databrowser(host="www.gems.dkrz.de", flavour="cmip6",
                   mip_era="mpi-ge",
                   variable_id="tas",
                   experiment_id="historical",
                   stream_zarr=True)
uri = list(data)[0]
uri

'https://www.nextgems.dkrz.de/api/freva-nextgen/data-portal/zarr/5b8ec334-d2c9-536d-9b8e-249bbd34ac23.zarr'

ds = xr.open_dataset(
    uri,
    engine="zarr",
    decode_times=xr.coders.CFDatetimeCoder(use_cftime=True),
    storage_options={
        "headers":{"Authorization": f"Bearer {token['access_token']}"}
   })
print(ds)

<xarray.Dataset> Size: 138MB
Dimensions:  (time: 1872, lat: 96, lon: 192)
Coordinates:
  * time     (time) object 15kB 1850-01-31 23:52:00 ... 2005-12-31 23:52:00
  * lon      (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
  * lat      (lat) float64 768B -88.57 -86.72 -84.86 -83.0 ... 84.86 86.72 88.57
Data variables:
    tas      (time, lat, lon) float32 138MB ...
Attributes:
    CDI:          Climate Data Interface version 1.9.5 (http://mpimet.mpg.de/...
    Conventions:  CF-1.6
    history:      Thu Dec 13 08:45:54 2018: cdo -f nc -r setpartabn,tas.parta...
    institution:  Max-Planck-Institute for Meteorology
    CDO:          Climate Data Operators version 1.9.5 (http://mpimet.mpg.de/...

$ module load clint gems
$ da-workshop-setup

export PATH=/sw/spack-levante/cdo-2.2.2-4z4icb/bin:$PATH
freva-client --version

freva-client: 2508.0.0

from freva_client import databrowser, __version__
print(__version__)

2508.0.0

$ freva-client databrowser data-overview

Available search flavours:
- freva
- cmip6
- cmip5
- cordex
- nextgems
- user
Search attributes by flavour:
  cmip5:
  - experiment
  - member_id
  - fs_type
  - grid_label
  - institution_id
  - model_id
  - project
118 more lines truncated

databrowser.metadata_search(host="https://www.gems.dkrz.de").keys()

dict_keys(['ensemble', 'experiment', 'institute', 'model', 'product', 'project', 'realm', 'time_aggregation', 'time_frequency', 'variable'])

$ freva-client databrowser metadata-search --host https://www.gems.dkrz.de \
--facet mpi-ge --json | jq -r 'to_entries[] | select(.value[] == "mpi-ge") \
| .key'

project

$ freva-client databrowser metadata-search project=mpi-ge --json \
| jq -rc '.variable | index("tas") != null'

true

metadata = databrowser.metadata_search("mpi-ge", host="https://www.gems.dkrz.de")
facets_with_mpi_ge = [facet for facet, values in metadata.items() if "mpi-ge" in values]
print(facets_with_mpi_ge)

['project']

"tas" in databrowser.metadata_search("mpi-ge",
                                     host="https://www.gems.dkrz.de")["variable"]

True

$ freva-client databrowser metadata-search --host www.gems.dkrz.de --flavour cmip6 \
mip_era=mpi-ge variable_id=tas --time '2025-01 to 2100-12'

member_id: r001i1850p3,  r001i2005p3,  r002i2005p3, ...
experiment_id: picontrol,  rcp26,  rcp45, ...
institution_id: mpi-m
source_id: mpi-esm
activity_id: output1
mip_era: mpi-ge
realm: atmos,  land, ...
time_aggregation: mean
frequency: mon
variable_id: tas
1 more lines truncated

from freva_client import databrowser
db = databrowser(host="https://www.gems.dkrz.de",
                 flavour="cmip6",
                 mip_era="mpi-ge",
                 variable_id="tas",
                 time="2025-01 to 2100-12")
db

$ freva-client databrowser metadata-search  --host www.gems.dkrz.de \
--flavour cmip6 mip_era=mpi-ge variable_id=tas --json \
| jq -c .frequency

["mon"]

$ freva-client databrowser data-count  --host www.gems.dkrz.de --flavour cmip6 \
mip_era=mpi-ge frequency=mon variable_id=pr variable_id=tas \
time='2025-01 to 2100-12'

602

db.metadata["frequency"]

['mon']

db_yr =  databrowser(
    host="https://www.gems.dkrz.de", 
    flavour="cmip6", 
    mip_era="mpi-ge", 
    frequency="mon", 
    variable_id="tas",
    time="2025-01 to 2100-12"
)
len(db_yr)

602

$ freva-client databrowser metadata-search mip_era=mpi-ge variable_id=tas \
frequency=mon --time "2025-01 to 2100-12" --json \
--host  https://www.gems.dkrz.de --flavour cmip6 \
 | jq -c ".experiment_id"

["picontrol","rcp26","rcp45","rcp85"]

$ freva-client databrowser data-count mip_era=mpi-ge variable_id=tas \
frequency=mon --time "2025-01 to 2100-12" --json experiment_id=picontrol \
--host  https://www.gems.dkrz.de --flavour cmip6

2

$ freva-client databrowser data-search mip_era=mpi-ge variable_id=tas \
frequency=mon --time "2025-01 to 2100-12" experiment_id=picontrol \
--host  https://www.gems.dkrz.de --flavour cmip6

/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc
/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_200001-209912.nc
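
Both files are returned although neither lies fully inside "2025-01 to 2100-12". A plausible reading is that the search keeps every file whose period overlaps the requested range (the STAC download URL printed later in this section carries `time_select=flexible`). The sketch below illustrates that overlap test locally. It is an assumption about the semantics, not the server code. `file_period` and `overlaps` are hypothetical helpers, and the third file name is made up for contrast.

```python
import re
from typing import Tuple

YearMonth = Tuple[int, int]
Period = Tuple[YearMonth, YearMonth]


def file_period(filename: str) -> Period:
    """Read the (start, end) year/month pairs from '..._YYYYMM-YYYYMM.nc'."""
    match = re.search(r"_(\d{4})(\d{2})-(\d{4})(\d{2})\.nc$", filename)
    if match is None:
        raise ValueError(f"No time period found in {filename!r}")
    y0, m0, y1, m1 = (int(group) for group in match.groups())
    return (y0, m0), (y1, m1)


def overlaps(file_range: Period, query_range: Period) -> bool:
    """True if the two closed intervals share at least one month."""
    (f_start, f_end), (q_start, q_end) = file_range, query_range
    return f_start <= q_end and q_start <= f_end


query = ((2025, 1), (2100, 12))  # "2025-01 to 2100-12"
names = [
    "tas_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc",
    "tas_Amon_MPI-ESM_piControl_r001i1850p3_200001-209912.nc",
    "tas_Amon_MPI-ESM_piControl_r001i1850p3_190001-199912.nc",  # hypothetical file
]
for name in names:
    print(name, overlaps(file_period(name), query))
```

The first two names are the ones listed above and pass the test; the hypothetical 1900 to 1999 file does not.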

db_yr.metadata["experiment_id"]

['picontrol', 'rcp26', 'rcp45', 'rcp85']

pcontrol = databrowser(mip_era="mpi-ge",
                       variable_id="tas",
                       frequency="mon", time="2025-01 to 2100-12", experiment_id="picontrol",
                       host="https://www.gems.dkrz.de", flavour="cmip6")
len(pcontrol)

2

for num, file in enumerate(pcontrol):
    print(file)
    if num > 1:
        break

/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc
/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_200001-209912.nc

$ freva-client databrowser metadata-search file=/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc --host=www.gems.dkrz.de

ensemble: r001i1850p3
experiment: picontrol
institute: mpi-m
model: mpi-esm
product: output1
project: mpi-ge
realm: atmos
time_aggregation: mean
time_frequency: mon
variable: tas

db.metadata_search(
    file="/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc",
    host="www.gems.dkrz.de"
)

{'ensemble': ['r001i1850p3'],
 'experiment': ['picontrol'],
 'institute': ['mpi-m'],
 'model': ['mpi-esm'],
 'product': ['output1'],
 'project': ['mpi-ge'],
 'realm': ['atmos'],
 'time_aggregation': ['mean'],
 'time_frequency': ['mon'],
 'variable': ['tas']}

freva-client databrowser metadata-search mip_era=mpi-ge variable_id=tas \
frequency=mon --time="2025-01 to 2100-12"  experiment_id='!picontrol' --json \
--host https://www.gems.dkrz.de --flavour cmip6| jq -rc '.experiment_id| join(" ")'

rcp26 rcp45 rcp85

db = databrowser(mip_era="mpi-ge", variable_id="tas",
                 frequency="mon", time="2025-01 to 2100-12",
                 experiment_id="!picontrol", flavour="cmip6",
                 host="https://www.gems.dkrz.de"
                )
db.metadata["experiment_id"]

['rcp26', 'rcp45', 'rcp85']
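
The `!` prefix in `experiment_id="!picontrol"` excludes a value instead of selecting it. Its effect on the result can be mimicked locally with a plain list filter. This only illustrates the behaviour seen in the output above, not how the server evaluates the query, and `apply_facet_filter` is a hypothetical name.

```python
from typing import List


def apply_facet_filter(values: List[str], pattern: str) -> List[str]:
    """Keep values equal to ``pattern``; with a leading '!' drop them instead."""
    if pattern.startswith("!"):
        excluded = pattern[1:]
        return [value for value in values if value != excluded]
    return [value for value in values if value == pattern]


# Experiments found for the unfiltered search earlier in this section
experiments = ["picontrol", "rcp26", "rcp45", "rcp85"]
print(apply_facet_filter(experiments, "!picontrol"))
print(apply_facet_filter(experiments, "picontrol"))
```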

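# $experiments is expected to hold the experiment names found above,
# e.g. experiments="rcp26 rcp45 rcp85"
temp_dir=$(mktemp -d --suffix cdo)
for exp in $experiments ;do
    outlist=()
    # Let's get only the first 5 ensemble members for brevity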
    members=$(freva-client databrowser metadata-search \
    project=mpi-ge variable=tas time_frequency=mon --time="2025-01 to 2100-12" experiment="$exp" --json |
    jq -r '.ensemble | unique | .[:5] | join(" ")')
    for ens in $members;do
        echo -ne "Reading data and calculating TS for experiment $exp in ens: $ens\r"
        files=$(freva-client databrowser data-search project=mpi-ge variable=tas time_frequency=mon --time="2025-01 to 2100-12" experiment=$exp ensemble=$ens realm=atmos)
        outfile="$temp_dir/tas_mean_${exp}_${ens}.nc"
        cdo -s fldmean -mergetime $files "$outfile"
        outlist+=("$outfile")
    done
    cdo mergetime "${outlist[@]}" "$temp_dir/tas_ensemble_${exp}.nc"
done
cdo mergetime $temp_dir/tas_ensemble_*.nc tas_all_experiments.nc

cdo    mergetime: Processed 5640 values from 5 variables over 5640 timesteps [0.03s 25MB]
cdo    mergetime: Processed 5640 values from 5 variables over 5640 timesteps [0.03s 25MB]
cdo    mergetime: Processed 5640 values from 5 variables over 5640 timesteps [0.03s 25MB]

import matplotlib.pyplot as plt

colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
plt.figure(figsize=(12, 5))

for i, exp in enumerate(data.experiment.values):
    ts = data.sel(experiment=exp).resample(time="1YE").mean()
    ts_min = ts.min(dim="ensemble").squeeze()
    ts_max = ts.max(dim="ensemble").squeeze()
    ts_mean = ts.mean(dim="ensemble").squeeze()
    plt.fill_between(
        ts.time,
        ts_min,
        ts_max,
        color=colors[i % len(colors)],
        alpha=0.2,
        label=None,
    )
    plt.plot(
        ts.time,
        ts_mean,
        color=colors[i % len(colors)],
        linewidth=2,
        label=f"{exp}"
    )
plt.title("Ensemble Mean and Spread for Each Experiment")
plt.xlabel("Time")
plt.ylabel(f"{data.attrs['long_name']} [{data.attrs['units']}]")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

$ freva-client databrowser intake-catalogue  --host www.gems.dkrz.de --flavour cmip6 \
mip_era=mpi-ge frequency=mon variable_id=tas --time "2025-01 to 2100-12" \
experiment_id=picontrol

{
   "esmcat_version": "0.1.0",
   "attributes": [
      {
         "column_name": "mip_era",
         "vocabulary": ""
      },
      {
         "column_name": "activity_id",
         "vocabulary": ""
      },
      {
         "column_name": "institution_id",
         "vocabulary": ""
      },
      {
         "column_name": "source_id",
         "vocabulary": ""
      },
      {
         "column_name": "experiment_id",
         "vocabulary": ""
      },
      {
         "column_name": "frequency",
         "vocabulary": ""
      },
      {
         "column_name": "realm",
         "vocabulary": ""
      },
      {
         "column_name": "variable_id",
         "vocabulary": ""
      },
      {
         "column_name": "member_id",
         "vocabulary": ""
      },
      {
         "column_name": "table_id",
         "vocabulary": ""
      },
      {
         "column_name": "fs_type",
         "vocabulary": ""
      },
      {
         "column_name": "grid_label",
         "vocabulary": ""
      },
      {
         "column_name": "format",
         "vocabulary": ""
      }
   ],
   "assets": {
      "column_name": "file",
      "format_column_name": "format"
   },
   "id": "freva",
   "description": "Catalogue from freva-databrowser v2507.0.0",
   "title": "freva-databrowser catalogue",
   "last_updated": "2025-08-05T14:06:40.039213",
   "aggregation_control": {
      "variable_column_name": "variable_id",
      "groupby_attrs": [],
      "aggregations": [
         {
            "type": "union",
            "attribute_name": "mip_era",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "activity_id",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "institution_id",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "source_id",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "experiment_id",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "frequency",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "realm",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "variable_id",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "member_id",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "table_id",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "fs_type",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "grid_label",
            "options": {}
         },
         {
            "type": "union",
            "attribute_name": "format",
            "options": {}
         }
      ]
   }
,
   "catalog_dict": [
   {
   "file": "/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc",
   "project": "MPI-GE",
   "product": "output1",
   "institute": "MPI-M",
   "model": "MPI-ESM",
   "experiment": "piControl",
   "time_frequency": "mon",
   "realm": "atmos",
   "variable": "tas",
   "ensemble": "r001i1850p3",
   "cmor_table": "Amon",
   "fs_type": "posix",
   "grid_label": "gn",
   "format": "nc"
},
   {
   "file": "/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_200001-209912.nc",
   "project": "MPI-GE",
   "product": "output1",
   "institute": "MPI-M",
   "model": "MPI-ESM",
   "experiment": "piControl",
   "time_frequency": "mon",
   "realm": "atmos",
   "variable": "tas",
   "ensemble": "r001i1850p3",
   "cmor_table": "Amon",
   "fs_type": "posix",
   "grid_label": "gn",
   "format": "nc"
},
   {
   "file": "/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/pr/r001i1850p3/v20190123/pr_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc",
   "project": "MPI-GE",
   "product": "output1",
   "institute": "MPI-M",
   "model": "MPI-ESM",
   "experiment": "piControl",
   "time_frequency": "mon",
   "realm": "atmos",
   "variable": "pr",
   "ensemble": "r001i1850p3",
   "cmor_table": "Amon",
   "fs_type": "posix",
   "grid_label": "gn",
   "format": "nc"
},
   {
   "file": "/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/pr/r001i1850p3/v20190123/pr_Amon_MPI-ESM_piControl_r001i1850p3_200001-209912.nc",
   "project": "MPI-GE",
   "product": "output1",
   "institute": "MPI-M",
   "model": "MPI-ESM",
   "experiment": "piControl",
   "time_frequency": "mon",
   "realm": "atmos",
   "variable": "pr",
   "ensemble": "r001i1850p3",
   "cmor_table": "Amon",
   "fs_type": "posix",
   "grid_label": "gn",
   "format": "nc"
}
   ]
}
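
The JSON above can be inspected without any extra package. The sketch below uses only the standard library and an abbreviated copy of the catalogue (two entries, most keys left out). `files_for_variable` is a hypothetical helper. Note that the entries in `catalog_dict` use the key `variable`, while the `attributes` block lists `variable_id`; the sketch follows the keys as they appear in `catalog_dict`.

```python
import json
from typing import Any, Dict, List

# Two entries copied from the catalogue printed above; the remaining keys
# are omitted to keep the sample short.
CATALOGUE_JSON = """
{
  "esmcat_version": "0.1.0",
  "assets": {"column_name": "file", "format_column_name": "format"},
  "catalog_dict": [
    {"file": "/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/tas/r001i1850p3/v20190123/tas_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc",
     "variable": "tas", "experiment": "piControl", "format": "nc"},
    {"file": "/work/mh1007/CMOR/MPI-GE/output1/MPI-M/MPI-ESM/piControl/mon/atmos/pr/r001i1850p3/v20190123/pr_Amon_MPI-ESM_piControl_r001i1850p3_210001-219912.nc",
     "variable": "pr", "experiment": "piControl", "format": "nc"}
  ]
}
"""


def files_for_variable(catalogue: Dict[str, Any], variable: str) -> List[str]:
    """Return the asset paths of all entries whose 'variable' matches."""
    path_column = catalogue["assets"]["column_name"]  # "file" in the output above
    return [
        entry[path_column]
        for entry in catalogue["catalog_dict"]
        if entry.get("variable") == variable
    ]


catalogue = json.loads(CATALOGUE_JSON)
for path in files_for_variable(catalogue, "tas"):
    print(path)
```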

db = databrowser(mip_era="mpi-ge", variable_id="tas",
                 frequency="mon", time="2025-01 to 2100-12",
                 experiment_id="picontrol", flavour="cmip6",
                 host="https://www.gems.dkrz.de"
                )
db.intake_catalogue()

$ freva-client databrowser stac-catalogue --host www.gems.dkrz.de --flavour cmip6 \
mip_era=mpi-ge frequency=mon variable_id=tas --time "2025-01 to 2100-12" \
experiment_id=picontrol

Downloading the STAC catalog started ...
STAC catalog saved to: /Users/mo/dev/20250805/Talks/talks/DataSearchWorkshop2025/stac-catalog-Dataset-cmip6-ca87bec5-e66-file.zip (size: 0.02 MB). Or simply download from: https://www.gems.dkrz.de/api/freva-nextgen/databrowser/stac-catalogue/cmip6/file?multi-version=False&mip_era=mpi-ge&frequency=mon&variable_id=pr&variable_id=tas&experiment_id=picontrol&time=2025-01+to+2100-12&time_select=flexible

db = databrowser(mip_era="mpi-ge", variable_id="tas",
                 frequency="mon", time="2025-01 to 2100-12",
                 experiment_id="picontrol", flavour="cmip6",
                 host="https://www.gems.dkrz.de"
                )
db.stac_catalogue()

Downloading the STAC catalog started ...

'STAC catalog saved to: /Users/mo/dev/20250805/Talks/talks/DataSearchWorkshop2025/stac-catalog-Dataset-cmip6-ec013869-2c5-file.zip (size: 0.01 MB). Or simply download from: https://www.gems.dkrz.de/api/freva-nextgen/databrowser/stac-catalogue/cmip6/file?multi-version=False&mip_era=mpi-ge&variable_id=tas&frequency=mon&experiment_id=picontrol&time=2025-01+to+2100-12&time_select=flexible'
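
The download link in the message encodes the whole search as a query string. A small standard-library sketch to read it back is shown below. The URL is the one printed above, and treating the second-to-last path segment as the flavour is an assumption based on that single example.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "https://www.gems.dkrz.de/api/freva-nextgen/databrowser/stac-catalogue/cmip6/file"
    "?multi-version=False&mip_era=mpi-ge&variable_id=tas&frequency=mon"
    "&experiment_id=picontrol&time=2025-01+to+2100-12&time_select=flexible"
)

parts = urlsplit(url)
# parse_qs decodes '+' to a space and collects repeated keys into lists
query = parse_qs(parts.query)
flavour = parts.path.rsplit("/", 2)[-2]

print("flavour:", flavour)
for key, values in query.items():
    print(f"{key}: {', '.join(values)}")
```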

USER=$(whoami)
OUTFILE="dummy_sst_${USER}.nc"
GRIDFILE="$(mktemp)"
cat > "$GRIDFILE" << EOF
gridtype = lonlat
xsize    = 171
ysize    = 121
xfirst   = 120
xinc     = 1
yfirst   = -30
yinc     = 0.5
xname    = lon
yname    = lat
xunits   = degrees_east
yunits   = degrees_north
EOF
cdo -f nc const,0,"$GRIDFILE" base.nc
cdo -f nc \
    -setcalendar,standard \
    -settaxis,2025-01-01,00:00:00,1month \
    -expr,'sst=28-0.006*(clon(const)-230)' \
    -setattribute,sst@long_name="Idealized Pacific SST" \
    -setattribute,sst@units="°C" \
    base.nc "$OUTFILE"
rm -f base.nc "$GRIDFILE"
echo "Wrote $OUTFILE → exists? $( [[ -f $OUTFILE ]] && echo yes || echo no )"

import numpy as np, xarray as xr, os
from getpass import getuser
from freva_client import databrowser, authenticate
from pathlib import Path
time = np.arange("2025-01","2026-01",dtype="datetime64[M]")
lat, lon = np.linspace(-30,30,121), np.linspace(120,290,171)
da = (xr.DataArray(28 - 0.006*(lon-230), dims=("lon",), coords={"lon":lon})
      .expand_dims(time=time, lat=lat))
da.name = "sst"; da.attrs.update(long_name="Idealized Pacific SST", units="°C")
da.to_netcdf(f"dummy_sst_{getuser()}.nc")
print(f"check if dummy_sst_{getuser()}.nc exists: {os.path.exists(f'dummy_sst_{getuser()}.nc')}")

check if dummy_sst_mo.nc exists: True

import matplotlib.pyplot as plt, cartopy.crs as ccrs, cartopy.feature as cfeature

fig, ax = plt.subplots(figsize=(8,4), subplot_kw=dict(projection=ccrs.PlateCarree(180)))
da.mean("time").plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree(), cmap="coolwarm", add_colorbar=True)

ax.add_feature(cfeature.LAND, facecolor="white", zorder=2); ax.coastlines(zorder=3); ax.add_feature(cfeature.BORDERS, linestyle=":", zorder=3)
plt.title("Pacific SST"); plt.tight_layout(); plt.show()

cat > ~/.freva-access-token.json << 'EOF'

EOF

token = """ """
_ = (Path.home() / ".freva-access-token.json").write_text(token)

global_attributes = {"project": "userdata", "product": "stats", "model": "IFS", "experiment": "ETCCDI", "realm": "atmos"}
databrowser.userdata(
    action="add",
    userdata_items=[f"{os.getcwd()}/dummy_sst_{getuser()}.nc"],
    metadata=global_attributes,
)

1 have been successfully added to the databrowser. 0 files were duplicates and not added.

global_attributes = {"project": "userdata", "product": "stats", "model": "IFS", "experiment": "ETCCDI", "realm": "atmos"}
databrowser.userdata(
    action="delete",
    metadata=global_attributes,
)

User data deleted successfully

$ freva-client databrowser user-data add --path dummy_sst_mo.nc --facet project=userdata --facet product=stats --facet model=IFS --facet experiment=ETCCDI --facet realm=atmos --token-file ~/.freva-access-token.json

1 have been successfully added to the databrowser. 0 files were duplicates and not added.

$ freva-client databrowser user-data delete --search-key project=userdata --search-key product=stats --search-key model=IFS --search-key experiment=ETCCDI --search-key realm=atmos --token-file ~/.freva-access-token.json

User data deleted successfully

#conda install awscli ||pip install awscli
aws s3 ls s3://ncar-cesm2-lens/ --no-sign-request

alias awss3='aws s3 --no-sign-request'
awss3 cp --recursive  s3://ncar-cesm2-lens/atm/static/grid.zarr /tmp/grid.zarr

module load minio-client
mc alias set aws https://s3.amazonaws.com "" "" # empty keys = no authentication
mc ls aws/ncar-cesm2-lens/ice/static/grid.zarr

mc alias set dkrz https://s3.eu-dkrz-1.dkrz.cloud "" "" # no authentication ->  read-only!!!
mc ls dkrz/freva
#mc cp /scratch/${USER:0:1}/${USER}/path/to/my/file.txt dkrz/freva/workshop/${USER}/myfile.txt

dataset_url = 's3://ncar-cesm2-lens/ocn/static/grid.zarr'
s3_opts = {'anon': True }
# s3_minio_opt = { # From playground:  https://docs.min.io/community/minio-object-store/developers/python/minio-py.html#id3
#     'key': 'Q3AM3UQ867SPQQA43P2F',
#     'secret': 'zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG',
#     'client_kwargs':{'endpoint_url': 'https://play.min.io'}
# }

import xarray as xr
ds = xr.open_dataset(dataset_url, engine='zarr', storage_options=s3_opts)

import zarr
zarr.open(dataset_url, storage_options=s3_opts)

import s3fs
s3 = s3fs.S3FileSystem(endpoint_url="https://s3.eu-dkrz-1.dkrz.cloud", anon=True)
with s3.open('s3://freva/workshop/tas.nc') as f:
    ds = xr.open_dataset(f)

$ module load clint gems
$ da-workshop-setup

strip_dict(databrowser.metadata_search(fs_type="s3"))

{'ensemble': ['r1i1p1f1'],
 'experiment': ['historical', 'ssp370'],
 'institute': ['ncar'],
 'model': ['cesm2-le'],
 'product': ['output'],
 'project': ['cesm2-le'],
 'realm': ['atm', 'ice', 'lnd', '...'],
 'time_aggregation': ['mean'],
 'time_frequency': ['daily', 'monthly', 'static'],
 'variable': ['aice', 'aice_d', 'dic', '...']}
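
`strip_dict` is not defined in this section. Judging from the output, it shortens each facet list for display. A minimal sketch under that assumption follows; the name is taken from the call above, while the three-item cutoff and the `'...'` marker are inferred from the printed result and are not part of the freva-client API.

```python
from typing import Dict, List


def strip_dict(metadata: Dict[str, List[str]], max_items: int = 3) -> Dict[str, List[str]]:
    """Return a copy of ``metadata`` with every value list cut to ``max_items``.

    Hypothetical display helper: lists longer than ``max_items`` get a
    trailing '...' so it stays visible that entries were dropped.
    """
    shortened = {}
    for facet, values in metadata.items():
        values = list(values)
        if len(values) > max_items:
            shortened[facet] = values[:max_items] + ["..."]
        else:
            shortened[facet] = values
    return shortened


example = {"realm": ["atm", "ice", "lnd", "ocn"], "institute": ["ncar"]}
print(strip_dict(example))
```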

search_keys = {
    'fs_type': 's3',
    'project':'cesm2-le',
    'experiment': 'historical',
    'realm': 'atm',
    'time_frequency':'monthly'
}
'ts' in databrowser.metadata_search(**search_keys)['variable']

True

db = databrowser(variable='ts', **search_keys)
list(db)

['s3://ncar-cesm2-lens/atm/monthly/cesm2LE-historical-smbb-TS.zarr',
 's3://ncar-cesm2-lens/atm/monthly/cesm2LE-historical-cmip6-TS.zarr']

import xarray as xr
xr.open_zarr(list(db)[0], storage_options={ 'anon':True })
print(xr.open_dataset(list(db)[0], engine='zarr', storage_options={ 'anon':True }))

<xarray.Dataset> Size: 22GB
Dimensions:    (member_id: 50, time: 1980, lat: 192, lon: 288, nbnd: 2)
Coordinates:
  * lat        (lat) float64 2kB -90.0 -89.06 -88.12 ... 88.12 89.06 90.0
  * lon        (lon) float64 2kB 0.0 1.25 2.5 ... 355.0 356.2 357.5 358.8
  * member_id  (member_id) <U12 2kB 'r10i1191p1f2' ... 'r9i1171p1f2'
  * time       (time) object 16kB 1850-01-16 12:00 ... 2014-12-16 12:00:00
    time_bnds  (time, nbnd) object 32kB ...
Dimensions without coordinates: nbnd
Data variables:
    TS         (member_id, time, lat, lon) float32 22GB ...
Attributes:
    Conventions:       CF-1.0
    logname:           sunseon
    model_doi_url:     https://doi.org/10.5065/D67H1H0V
    source:            CAM
    time_period_freq:  month_1
    topography_file:   /mnt/lustre/share/CESM/cesm_input/atm/cam/topo/fv...

from pathlib import Path
s3_opts = { 'anon':True }
time_series = {}

for fileurl in db:
    print(f"Opening {fileurl}...")
    engine = {'engine':'zarr'} if fileurl.endswith('zarr') else {}
    ds = xr.open_dataset(fileurl, **engine, storage_options=s3_opts)
    ds = ds.rename({'member_id':'ensemble'})
    
    # Going through all ensemble members might take some time,
    # so let's cut at 5 members for demo purposes.
    # Memory usage might spike to ~70GB
    members = ds.ensemble[:5]
    ds = ds.sel(ensemble=members)
    
    # cesm2LE-historical-cmip6-TS cesm2LE-historical-smbb-TS
    dataset_name=Path(fileurl).stem  
    
    mean_ts = field_mean(ds["TS"])
    mean_ts.attrs['source_dataset'] = fileurl
    time_series[dataset_name] = mean_ts

Opening s3://ncar-cesm2-lens/atm/monthly/cesm2LE-historical-smbb-TS.zarr
Opening s3://ncar-cesm2-lens/atm/monthly/cesm2LE-historical-cmip6-TS.zarr
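
`field_mean` is called in the loop above but not defined in this section. A minimal sketch follows, assuming it is meant to compute an area-weighted mean over `lat` and `lon` with cos(latitude) weights, which is a common choice on a regular grid. The actual workshop helper may differ. Attributes are kept because the plotting cell reads `long_name` and `units` from the result.

```python
import numpy as np
import xarray as xr


def field_mean(da: xr.DataArray) -> xr.DataArray:
    """Area-weighted mean over lat/lon using cos(latitude) as weight."""
    weights = np.cos(np.deg2rad(da["lat"]))
    weights.name = "weights"
    return da.weighted(weights).mean(dim=("lat", "lon"), keep_attrs=True)


# Tiny synthetic field: value 1 at the equator, 4 at 60 degrees north
demo = xr.DataArray(
    [[1.0, 1.0], [4.0, 4.0]],
    dims=("lat", "lon"),
    coords={"lat": [0.0, 60.0], "lon": [0.0, 180.0]},
    attrs={"units": "K"},
)
print(float(field_mean(demo)))  # weighted mean, smaller than the plain mean of 2.5
```

With weights 1 and 0.5 the weighted mean is (1 * 1 + 0.5 * 4) / 1.5 = 2, whereas the unweighted mean would be 2.5.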

import matplotlib.pyplot as plt
import numpy as np

# Create a color cycle for different experiments
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]

plot = plt.figure(figsize=(12, 5))

for i, (exp, data) in enumerate(time_series.items()):
    ts = data.resample(time="1YE").mean()
    ts_min = ts.min(dim="ensemble").squeeze()
    ts_max = ts.max(dim="ensemble").squeeze()
    ts_mean = ts.mean(dim="ensemble").squeeze()
    time_values = np.array(ts.time.values, dtype='datetime64[ns]')
    # Plot min–max shading
    plt.fill_between(
        time_values,
        ts_min.values,
        ts_max.values,
        color=colors[i % len(colors)],
        alpha=0.2,
        label=None,
    )
    # Plot mean line
    plt.plot(
        time_values,
        ts_mean.values,
        color=colors[i % len(colors)],
        linewidth=2,
        label=f"{exp}"
    )

# Add plot decorations
plt.title("Ensemble Mean and Spread for Each Dataset")
plt.xlabel("Time")
plt.ylabel(f"{data.attrs['long_name']} [{data.attrs['units']}]")  # Replace with actual units
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

from getpass import getuser # to get USER env variable
USERNAME = getuser()
s3_config = {
    'bucket' : 'freva',
    'endpoint' :'https://s3.eu-dkrz-1.dkrz.cloud', # DKRZ Minio S3
    'prefix' : f'workshop/{USERNAME}', # Avoid users writing objects with the same prefix
    'access_key_id' : "s3handson", # Only valid during the workshop
    'secret_access_key' : "s3handson", # Only valid during the workshop
    'region' : 'eu-dkrz-1',
}

import s3fs
s3 = s3fs.S3FileSystem(
    key = s3_config['access_key_id'],
    secret = s3_config['secret_access_key'],
    client_kwargs = {'endpoint_url': s3_config['endpoint']},
)

full_prefix = s3_config['bucket']+'/'+s3_config['prefix'] 
testobj = f'{full_prefix}/hi.txt'
s3.write_bytes(testobj, b'Hi!\n')

print(f"Writing data to S3 works! Test it with:\ncurl {s3_config['endpoint']}/{testobj}")

Writing data to S3 works! Test it with:
curl https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/k202186/hi.txt

!curl https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/${USER}/hi.txt

Hi!

figure_path = f'{full_prefix}/figure-ts-mean.png'
with s3.open(figure_path, 'wb') as f:
    plot.savefig(f)
print(f"Open this link to get the image: {s3_config['endpoint']}/{figure_path}")

Open this link to get the image: https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/k202186/figure-ts-mean.png

def write_netcdf(s3_path, dataset:xr.Dataset):
    import os

    ## Stage the file on local scratch first; the file name is derived from
    ## the target object so the function does not rely on a global `name`
    tmp_name = f"/scratch/{USERNAME[0]}/{USERNAME}/{os.path.basename(s3_path)}"

    dataset.to_netcdf(tmp_name, engine='h5netcdf')

    ## Copy to s3
    with s3.open(s3_path,'wb') as s3file:
        with open(tmp_name, 'rb') as tmpf:
            s3file.write(tmpf.read())

    ## Check if we can open with Xarray!
    with s3.open(s3_path,'rb') as s3file:
        xr.open_dataset(s3file, engine='h5netcdf').close()

    ## We can now remove the local copy
    os.remove(tmp_name)

def write_zarr(store, dataset: xr.Dataset):     
    import zarr
    zarr.config.set(default_zarr_format=2)
    
    import numcodecs
    codec = numcodecs.Blosc(shuffle=1, clevel=6)
    
    data_encoding = {}
    for var in dataset.variables:
        data_encoding[var] = { "compressors": codec}

    dataset.to_zarr(store,
                    mode='w', # OVERWRITES existing data!!
                    encoding=data_encoding,
                    consolidated=True, # consolidate metadata for fast access
                    storage_options= { # We cannot use the anonymous mode anymore
                        'key':s3_config['access_key_id'], 
                        'secret':s3_config['secret_access_key'],
                        'client_kwargs':{
                            'endpoint_url': s3_config['endpoint']
                        },
                    })

def write_dataset_to_s3(name:str, dataset:xr.Dataset, file_format='nc'):
    _supported_types = ('nc','zarr')
    file_format = file_format.replace('.','')
    if file_format not in _supported_types:
        raise ValueError(f'Unsupported file format {file_format!r}, use one of {_supported_types}')
    
    s3_path = f'{full_prefix}/{name}.{file_format}'
    if file_format == 'nc':
        write_netcdf(s3_path, dataset)

    elif file_format == 'zarr':
        write_zarr(f's3://{s3_path}', dataset)

    netcdf_mode= 'zarr' if file_format == 'zarr' else 'bytes'
    print(f"Try running:\n\t" \
          f"/fastdata/freva/opt/bin/ncdump -h {s3_config['endpoint']}/{s3_path}#mode=s3,{netcdf_mode}\n\t"\
          f"/fastdata/freva/opt/bin/cdo sinfo {s3_config['endpoint']}/{s3_path}#mode=s3,{netcdf_mode}")

for name, data in time_series.items():
    ## for CDO it is important that time is the first dimension
    ## also it cannot have indexers of type string
    ensemble = data.ensemble.values
    ensemble_id = np.arange(len(ensemble))
    
    dataset = data.to_dataset() \
            .transpose('time',...) \
            .assign_coords(ensemble_id=('ensemble',ensemble_id)) \
            .swap_dims({'ensemble':'ensemble_id'}) \
            .reset_coords(drop=True)\
            .assign_attrs(ensembles= ', '.join(data.ensemble.values))
    
    write_dataset_to_s3(name, dataset, file_format='nc')
    write_dataset_to_s3(name, dataset, file_format='zarr')
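
Because the string labels are replaced by integer `ensemble_id` values, the original member names only survive in the global `ensembles` attribute. A minimal sketch of how they could be recovered when reading the file back, assuming the attribute keeps the `', '`-joined form written by the loop above (`decode_ensembles` is a hypothetical helper):

```python
def decode_ensembles(attr):
    """Map each integer ensemble_id to its original label.

    Assumes the attribute was written as ', '.join(labels), with ids
    assigned in the same order (0, 1, 2, ...).
    """
    return dict(enumerate(attr.split(", ")))

# Attribute value in the form written by the loop above
ensembles_attr = "r10i1191p1f2, r11i1231p1f2, r11i1251p1f2, r11i1281p1f2, r11i1301p1f2"
id_to_label = decode_ensembles(ensembles_attr)

for ensemble_id, label in id_to_label.items():
    print(ensemble_id, label)
```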

public_url = f"https://eu-dkrz-1.dkrz.cloud/browser/{s3_config['bucket']}/{s3_config['prefix']}"
print(f"Browse the data at {public_url}")

Try running:
	/fastdata/freva/opt/bin/ncdump -h https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/k202186/cesm2LE-historical-smbb-TS.nc#mode=s3,bytes
...
Browse the data at https://eu-dkrz-1.dkrz.cloud/browser/freva/workshop/k202186

export PATH=/fastdata/freva/opt/bin/:${PATH}
ncdump -h https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/tas.nc#mode=s3,bytes
ncdump -h https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/tas.zarr#mode=s3,zarr
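
The URLs passed to `ncdump` and `cdo` follow one pattern: endpoint, object path, then a `#mode=s3,...` fragment that selects `bytes` for a NetCDF file or `zarr` for a Zarr store. The sketch below builds such URLs in Python; `netcdf_s3_url` is a hypothetical helper that mirrors the string formatting in `write_dataset_to_s3` and is not part of any library.

```python
def netcdf_s3_url(endpoint, s3_path, file_format):
    """Build the URL-with-fragment form used in the commands above."""
    mode = "zarr" if file_format == "zarr" else "bytes"
    return f"{endpoint}/{s3_path}#mode=s3,{mode}"

endpoint = "https://s3.eu-dkrz-1.dkrz.cloud"
nc_url = netcdf_s3_url(endpoint, "freva/workshop/tas.nc", "nc")
zarr_url = netcdf_s3_url(endpoint, "freva/workshop/tas.zarr", "zarr")

print(nc_url)
print(zarr_url)
```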

!/fastdata/freva/opt/bin/ncdump -h https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/$USER/cesm2LE-historical-smbb-TS.nc#mode=s3,bytes
!/fastdata/freva/opt/bin/cdo sinfo https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/$USER/cesm2LE-historical-smbb-TS.nc#mode=s3,bytes

netcdf cesm2LE-historical-smbb-TS {
dimensions:
	time = 1980 ;
	ensemble_id = 5 ;
variables:
	double time(time) ;
		time:_FillValue = NaN ;
		string time:units = "days since 1850-01-01" ;
		string time:calendar = "noleap" ;
	double TS(time, ensemble_id) ;
		TS:_FillValue = NaN ;
		string TS:cell_methods = "time: mean" ;
		string TS:long_name = "Surface temperature (radiative)" ;
		string TS:units = "K" ;
		string TS:source_dataset = "s3://ncar-cesm2-lens/atm/monthly/cesm2LE-historical-smbb-TS.zarr" ;
	int64 ensemble_id(ensemble_id) ;

// global attributes:
		string :ensembles = "r10i1191p1f2, r11i1231p1f2, r11i1251p1f2, r11i1281p1f2, r11i1301p1f2" ;
}
   File format : NetCDF4
    -1 : Institut Source   T Steptype Levels Num    Points Num Dtype : Parameter ID
     1 : unknown  unknown  v instant       1   1         5   1  F64  : -1            
   Grid coordinates :
     1 : generic                  : points=5
                      ensemble_id : 0 to 4 by 1
   Vertical coordinates :
     1 : surface                  : levels=1
   Time coordinate :
                             time : 1980 steps
     RefTime =  1850-01-01 00:00:00  Units = days  Calendar = 365_day
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  1850-01-16 12:00:00  1850-02-15 00:00:00  1850-03-16 12:00:00  1850-04-16 00:00:00
  1850-05-16 12:00:00  1850-06-16 00:00:00  1850-07-16 12:00:00  1850-08-16 12:00:00
  1850-09-16 00:00:00  1850-10-16 12:00:00  1850-11-16 00:00:00  1850-12-16 12:00:00
  1851-01-16 12:00:00  1851-02-15 00:00:00  1851-03-16 12:00:00  1851-04-16 00:00:00
  1851-05-16 12:00:00  1851-06-16 00:00:00  1851-07-16 12:00:00  1851-08-16 12:00:00
  1851-09-16 00:00:00  1851-10-16 12:00:00  1851-11-16 00:00:00  1851-12-16 12:00:00
  1852-01-16 12:00:00  1852-02-15 00:00:00  1852-03-16 12:00:00  1852-04-16 00:00:00
  1852-05-16 12:00:00  1852-06-16 00:00:00  1852-07-16 12:00:00  1852-08-16 12:00:00
  1852-09-16 00:00:00  1852-10-16 12:00:00  1852-11-16 00:00:00  1852-12-16 12:00:00
  1853-01-16 12:00:00  1853-02-15 00:00:00  1853-03-16 12:00:00  1853-04-16 00:00:00
  1853-05-16 12:00:00  1853-06-16 00:00:00  1853-07-16 12:00:00  1853-08-16 12:00:00
  1853-09-16 00:00:00  1853-10-16 12:00:00  1853-11-16 00:00:00  1853-12-16 12:00:00
  1854-01-16 12:00:00  1854-02-15 00:00:00  1854-03-16 12:00:00  1854-04-16 00:00:00
  1854-05-16 12:00:00  1854-06-16 00:00:00  1854-07-16 12:00:00  1854-08-16 12:00:00
  1854-09-16 00:00:00  1854-10-16 12:00:00  1854-11-16 00:00:00  1854-12-16 12:00:00
   ................................................................................
   ................................................................................
   ..........
  2010-01-16 12:00:00  2010-02-15 00:00:00  2010-03-16 12:00:00  2010-04-16 00:00:00
  2010-05-16 12:00:00  2010-06-16 00:00:00  2010-07-16 12:00:00  2010-08-16 12:00:00
  2010-09-16 00:00:00  2010-10-16 12:00:00  2010-11-16 00:00:00  2010-12-16 12:00:00
  2011-01-16 12:00:00  2011-02-15 00:00:00  2011-03-16 12:00:00  2011-04-16 00:00:00
  2011-05-16 12:00:00  2011-06-16 00:00:00  2011-07-16 12:00:00  2011-08-16 12:00:00
  2011-09-16 00:00:00  2011-10-16 12:00:00  2011-11-16 00:00:00  2011-12-16 12:00:00
  2012-01-16 12:00:00  2012-02-15 00:00:00  2012-03-16 12:00:00  2012-04-16 00:00:00
  2012-05-16 12:00:00  2012-06-16 00:00:00  2012-07-16 12:00:00  2012-08-16 12:00:00
  2012-09-16 00:00:00  2012-10-16 12:00:00  2012-11-16 00:00:00  2012-12-16 12:00:00
  2013-01-16 12:00:00  2013-02-15 00:00:00  2013-03-16 12:00:00  2013-04-16 00:00:00
  2013-05-16 12:00:00  2013-06-16 00:00:00  2013-07-16 12:00:00  2013-08-16 12:00:00
  2013-09-16 00:00:00  2013-10-16 12:00:00  2013-11-16 00:00:00  2013-12-16 12:00:00
  2014-01-16 12:00:00  2014-02-15 00:00:00  2014-03-16 12:00:00  2014-04-16 00:00:00
  2014-05-16 12:00:00  2014-06-16 00:00:00  2014-07-16 12:00:00  2014-08-16 12:00:00
  2014-09-16 00:00:00  2014-10-16 12:00:00  2014-11-16 00:00:00  2014-12-16 12:00:00
cdo    sinfo: Processed 1 variable over 1980 timesteps [0.13s 3000MB]

!/fastdata/freva/opt/bin/ncdump -h https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/$USER/cesm2LE-historical-smbb-TS.zarr#mode=s3,zarr
!/fastdata/freva/opt/bin/cdo sinfo https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/$USER/cesm2LE-historical-smbb-TS.zarr#mode=s3,zarr

netcdf cesm2LE-historical-smbb-TS {
dimensions:
	time = 1980 ;
	ensemble_id = 5 ;
variables:
	double TS(time, ensemble_id) ;
		TS:_FillValue = NaN ;
		TS:cell_methods = "time: mean" ;
		TS:long_name = "Surface temperature (radiative)" ;
		TS:units = "K" ;
		TS:source_dataset = "s3://ncar-cesm2-lens/atm/monthly/cesm2LE-historical-smbb-TS.zarr" ;
	int64 time(time) ;
		time:units = "hours since 1850-01-16 12:00:00.000000" ;
		time:calendar = "noleap" ;
	int64 ensemble_id(ensemble_id) ;

// global attributes:
		:ensembles = "r10i1191p1f2, r11i1231p1f2, r11i1251p1f2, r11i1281p1f2, r11i1301p1f2" ;
}
   File format : NCZarr filter
    -1 : Institut Source   T Steptype Levels Num    Points Num Dtype : Parameter ID
     1 : unknown  unknown  v instant       1   1         5   1  F64f : -1            
   Grid coordinates :
     1 : generic                  : points=5
                      ensemble_id : 0 to 4 by 1
   Vertical coordinates :
     1 : surface                  : levels=1
   Time coordinate :
                             time : 1980 steps
     RefTime =  1850-01-16 12:00:00  Units = hours  Calendar = 365_day
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  1850-01-16 12:00:00  1850-02-15 00:00:00  1850-03-16 12:00:00  1850-04-16 00:00:00
  1850-05-16 12:00:00  1850-06-16 00:00:00  1850-07-16 12:00:00  1850-08-16 12:00:00
  1850-09-16 00:00:00  1850-10-16 12:00:00  1850-11-16 00:00:00  1850-12-16 12:00:00
  1851-01-16 12:00:00  1851-02-15 00:00:00  1851-03-16 12:00:00  1851-04-16 00:00:00
  1851-05-16 12:00:00  1851-06-16 00:00:00  1851-07-16 12:00:00  1851-08-16 12:00:00
  1851-09-16 00:00:00  1851-10-16 12:00:00  1851-11-16 00:00:00  1851-12-16 12:00:00
  1852-01-16 12:00:00  1852-02-15 00:00:00  1852-03-16 12:00:00  1852-04-16 00:00:00
  1852-05-16 12:00:00  1852-06-16 00:00:00  1852-07-16 12:00:00  1852-08-16 12:00:00
  1852-09-16 00:00:00  1852-10-16 12:00:00  1852-11-16 00:00:00  1852-12-16 12:00:00
  1853-01-16 12:00:00  1853-02-15 00:00:00  1853-03-16 12:00:00  1853-04-16 00:00:00
  1853-05-16 12:00:00  1853-06-16 00:00:00  1853-07-16 12:00:00  1853-08-16 12:00:00
  1853-09-16 00:00:00  1853-10-16 12:00:00  1853-11-16 00:00:00  1853-12-16 12:00:00
  1854-01-16 12:00:00  1854-02-15 00:00:00  1854-03-16 12:00:00  1854-04-16 00:00:00
  1854-05-16 12:00:00  1854-06-16 00:00:00  1854-07-16 12:00:00  1854-08-16 12:00:00
  1854-09-16 00:00:00  1854-10-16 12:00:00  1854-11-16 00:00:00  1854-12-16 12:00:00
   ................................................................................
   ................................................................................
   ..........
  2010-01-16 12:00:00  2010-02-15 00:00:00  2010-03-16 12:00:00  2010-04-16 00:00:00
  2010-05-16 12:00:00  2010-06-16 00:00:00  2010-07-16 12:00:00  2010-08-16 12:00:00
  2010-09-16 00:00:00  2010-10-16 12:00:00  2010-11-16 00:00:00  2010-12-16 12:00:00
  2011-01-16 12:00:00  2011-02-15 00:00:00  2011-03-16 12:00:00  2011-04-16 00:00:00
  2011-05-16 12:00:00  2011-06-16 00:00:00  2011-07-16 12:00:00  2011-08-16 12:00:00
  2011-09-16 00:00:00  2011-10-16 12:00:00  2011-11-16 00:00:00  2011-12-16 12:00:00
  2012-01-16 12:00:00  2012-02-15 00:00:00  2012-03-16 12:00:00  2012-04-16 00:00:00
  2012-05-16 12:00:00  2012-06-16 00:00:00  2012-07-16 12:00:00  2012-08-16 12:00:00
  2012-09-16 00:00:00  2012-10-16 12:00:00  2012-11-16 00:00:00  2012-12-16 12:00:00
  2013-01-16 12:00:00  2013-02-15 00:00:00  2013-03-16 12:00:00  2013-04-16 00:00:00
  2013-05-16 12:00:00  2013-06-16 00:00:00  2013-07-16 12:00:00  2013-08-16 12:00:00
  2013-09-16 00:00:00  2013-10-16 12:00:00  2013-11-16 00:00:00  2013-12-16 12:00:00
  2014-01-16 12:00:00  2014-02-15 00:00:00  2014-03-16 12:00:00  2014-04-16 00:00:00
  2014-05-16 12:00:00  2014-06-16 00:00:00  2014-07-16 12:00:00  2014-08-16 12:00:00
  2014-09-16 00:00:00  2014-10-16 12:00:00  2014-11-16 00:00:00  2014-12-16 12:00:00
cdo    sinfo: Processed 1 variable over 1980 timesteps [0.10s 3000MB]

print("Opening")
for name in time_series.keys():
    file = f'https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/{USERNAME}/{name}'
    print(f"{file}.nc\n{file}.zarr")
    xr.open_dataset(f'{file}.nc', engine='h5netcdf')
    xr.open_zarr(f'{file}.zarr')

print('All output datasets opened!')

Opening
https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/k202186/cesm2LE-historical-smbb-TS.nc
https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/k202186/cesm2LE-historical-smbb-TS.zarr
https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/k202186/cesm2LE-historical-cmip6-TS.nc
https://s3.eu-dkrz-1.dkrz.cloud/freva/workshop/k202186/cesm2LE-historical-cmip6-TS.zarr
All output datasets opened!

Environment	Installation Command
modules (Recommended)	`module load clint gems`
conda	`conda create -n freva-client-env -c conda-forge freva-client -y`
pip	`pip install freva-client`

databrowser(flavour=cmip6, host=https://www.gems.dkrz.de/api/freva-nextgen/databrowser, multi_version=False, mip_era=mpi-ge, variable_id=tas)
# objects	921
Available search facets for cmip6 flavour	experiment_id, member_id, fs_type, grid_label, institution_id, source_id, mip_era, activity_id, realm, variable_id, time, bbox, time_aggregation, frequency, table_id, dataset, format, grid_id, level_type
Available flavours	freva, cmip6, cmip5, cordex, nextgems, user

databrowser(flavour=cmip6, host=https://www.gems.dkrz.de/api/freva-nextgen/databrowser, multi_version=False, mip_era=mpi-ge, variable_id=tas, time=2025-01 to 2100-12, time_select=flexible)
# objects	602
Available search facets for cmip6 flavour	experiment_id, member_id, fs_type, grid_label, institution_id, source_id, mip_era, activity_id, realm, variable_id, time, bbox, time_aggregation, frequency, table_id, dataset, format, grid_id, level_type
Available flavours	freva, cmip6, cmip5, cordex, nextgems, user

hostname	command (on Levante)	notes
https://gems.dkrz.de	`module load clint gems`	data browser only
https://freva.dkrz.de	`module load clint freva`	with plugins ⚠️ batch scheduling info must be added in `Extra scheduler options` ⚠️

	unique
file	2
project	1
product	1
institute	1
model	1
experiment	1
time_frequency	1
realm	1
variable	1
ensemble	1
cmor_table	1
fs_type	1
grid_label	1
format	1
derived_variable_id	0

Freva - Data search and analysis framework for the Community

Primer:

Common Problem: Finding and accessing Data

Yet another solution: The Freva framework

Researchers

Perfect for Every Research Task:

Why Choose Freva?

Smart Architecture:

Setup

The Client Library

Flexible Search

Metadata overview

Metadata search

What if I don't know the search keys?

Access Metadata

Counting the number of datasets

Accessing dataset paths:

Pro Tip: Query data of files

Remote Access

Freva Databrowser: Hands-on content table

Search (Findability)

Cataloging Data (Reusability)

Customize Dataset (Interoperability)

S3 Access (Accessibility)

Setup

Initial Check

Search (Findability)

MPI-GE (Grand Ensemble)

Search

Find data via STAC Browser

Cataloging Data (Reusability)

Catalog

Customize Dataset (Interoperability)

Custom

Auth

S3 Access (Accessibility)

S3

File vs Object Storage

File Storage

Object Storage

Local vs Remote

HTTP aka internet language!?

Requests

Replies

Object Store APIs

S3 - Simple Storage Service
