Zarr Streaming API#
Definition of endpoints for loading/streaming and manipulating data.
This API exposes asynchronous services that convert files or objects into Zarr stores and stream them to clients. The conversion is performed by a background worker and communicated via a message broker; the REST endpoints return references (URLs) to the resulting Zarr datasets rather than the data itself.
—
Creating zarr endpoints for streaming data#
- GET /api/freva-nextgen/databrowser/load/(str: flavour)#
This endpoint searches for datasets and streams the results as Zarr data. The Zarr format allows for efficient storage and retrieval of large, multidimensional arrays. This endpoint can be used to query datasets and receive the results in a format that is suitable for further analysis and processing with Zarr. If the catalogue-type parameter is set to “intake”, it can generate Intake-ESM catalogues that point to the generated Zarr endpoints.
- Parameters:
flavour (str) – The Data Reference Syntax (DRS) standard specifying the type of climate datasets to query. The available DRS standards can be retrieved using the GET /api/datasets/overview method.
- Query Parameters:
start – Specify the starting point for receiving search results. Default is 0.
multi-version – Use versioned datasets for querying instead of the latest datasets. Default is false.
translate – Translate the metadata output to the required DRS flavour. Default is true.
catalogue-type – Set the type of catalogue you want to create from this query.
public – Indicate whether you want to create a publicly available temporary Zarr URL. By default, users need to be authenticated in order to access the Zarr URLs. Default is false.
ttl_seconds – Set for how many seconds the public Zarr URL should be valid, if any. Default is 86,400 (1 day).
access_pattern – Optimise the chunk sizes for the given access pattern. Choose from map or time_series. Default is map.
map_primary_chunksize – If the access pattern is map, set the chunk size of the primary axis (e.g. time).
reload – Force a server-side cache refresh. By default, data store requests are cached to improve performance. Set to true to bypass the cache and fetch fresh data.
chunk_size – Target chunk size in megabytes.
**search_facets – Any other query parameters refine your data search. Depending on the DRS standard flavour, these could be product, project, model etc.
- Request Headers:
Authorization – Bearer token for authentication.
Content-Type – application/json
- Status Codes:
200 OK – no error
400 Bad Request – no entries found for this query
422 Unprocessable Entity – invalid query parameters
- Response Headers:
Content-Type –
text/plain: zarr endpoints for the data
Example Request#
The logic works just like for the data-search and intake-catalogue endpoints. We constrain the data search by key=value search pairs. The only difference is that we have to authenticate by using an access token. You will also have to use a valid access token if you want to access the Zarr data via HTTP. Please refer to the Authentication & Authorization chapter for more details.

    GET /api/freva-nextgen/databrowser/load/freva?dataset=cmip6-fs HTTP/1.1
    Host: www.freva.dkrz.de
    Authorization: Bearer your_access_token
Example Response#
    HTTP/1.1 200 OK
    Content-Type: text/plain

    https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/dcb608a0-9d77-5045-b656-f21dfb5e9acf.zarr
    https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/f56264e3-d713-5c27-bc4e-c97f15b5fe86.zarr
Example#
Below you can find example usages of this request in different scripting and programming languages.
Shell:

    curl -X GET \
        'https://www.freva.dkrz.de/api/freva-nextgen/databrowser/load/freva?dataset=cmip6-fs' \
        -H "Authorization: Bearer YOUR_ACCESS_TOKEN"

Python:

    import requests

    response = requests.get(
        "https://www.freva.dkrz.de/api/freva-nextgen/databrowser/load/freva",
        params={"dataset": "cmip6-fs"},
        headers={"Authorization": "Bearer YOUR_ACCESS_TOKEN"},
        stream=True,
    )
    # Each non-empty line of the plain-text response is one Zarr URL.
    files = [line for line in response.iter_lines(decode_unicode=True) if line]

R:

    library(httr)

    response <- GET(
        "https://www.freva.dkrz.de/api/freva-nextgen/databrowser/load/freva",
        query = list(dataset = "cmip6-fs"),
        add_headers(Authorization = "Bearer YOUR_ACCESS_TOKEN")
    )
    data <- strsplit(content(response, as = "text", encoding = "UTF-8"), "\n")[[1]]

Julia:

    using HTTP

    response = HTTP.get(
        "https://www.freva.dkrz.de/api/freva-nextgen/databrowser/load/freva",
        headers = Dict("Authorization" => "Bearer YOUR_ACCESS_TOKEN"),
        query = Dict("dataset" => "cmip6-fs")
    )
    data = split(String(HTTP.body(response)), "\n")

C:

    #include <stdio.h>
    #include <curl/curl.h>

    int main() {
        CURL *curl;
        CURLcode res;
        const char *url = "https://www.freva.dkrz.de/api/freva-nextgen/databrowser/load/freva";

        // Query parameters
        const char *dataset = "cmip6-fs";
        const int start = 0;
        const int multi_version = 0; // 0 for false, 1 for true

        // Build the query string
        char query[256];
        snprintf(query, sizeof(query),
                 "?dataset=%s&start=%d&multi-version=%d", dataset, start, multi_version);

        // Initialize curl
        curl = curl_easy_init();
        if (!curl) {
            fprintf(stderr, "Failed to initialize curl\n");
            return 1;
        }

        // Construct the full URL with query parameters
        char full_url[512];
        snprintf(full_url, sizeof(full_url), "%s%s", url, query);

        // Set the URL to fetch
        curl_easy_setopt(curl, CURLOPT_URL, full_url);

        // Authenticate with the bearer token
        struct curl_slist *headers = NULL;
        headers = curl_slist_append(headers, "Authorization: Bearer YOUR_ACCESS_TOKEN");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

        // Perform the request
        res = curl_easy_perform(curl);
        if (res != CURLE_OK) {
            fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
        }

        // Clean up
        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
        return 0;
    }
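The request URL and the plain-text response of the load endpoint can also be handled with the Python standard library alone. The following is a minimal sketch, not part of the API: the helper names `build_load_url` and `parse_zarr_urls` are hypothetical, and encoding the `public` flag as the string `true` is an assumption about how the server parses booleans.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.freva.dkrz.de/api/freva-nextgen/databrowser/load"


def build_load_url(flavour, public=False, ttl_seconds=86400, **search_facets):
    """Assemble the request URL for the load endpoint (hypothetical helper)."""
    params = dict(search_facets)
    if public:
        # Assumption: the server accepts "true" for boolean query parameters.
        params["public"] = "true"
        params["ttl_seconds"] = ttl_seconds
    return f"{BASE_URL}/{flavour}?{urlencode(params, doseq=True)}"


def parse_zarr_urls(body):
    """Split a text/plain response body into a list of Zarr URLs."""
    return [line.strip() for line in body.splitlines() if line.strip()]


url = build_load_url("freva", dataset="cmip6-fs")
zarr_urls = parse_zarr_urls("https://example.org/a.zarr\nhttps://example.org/b.zarr\n")
```

The resulting `url` can be passed to any HTTP client together with the `Authorization` header; `parse_zarr_urls` then turns the response text into one entry per Zarr store.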
Request asynchronous Zarr conversion#
- POST /api/freva-nextgen/data-portal/zarr/convert#
Submit one or more file or object paths to be converted into Zarr stores. This endpoint only publishes a message to the data‑portal worker via a broker; it does not verify that the paths exist or perform the conversion itself. It returns a JSON object with a urls array; each entry contains a UUID that identifies the future Zarr dataset.
If the data‑loading service cannot access a file, it will record the failure and the corresponding Zarr dataset will be in a failed state with a reason. You can query the status endpoint to check whether the conversion succeeded or failed.
- Request JSON Object:
path (string) – Absolute or object‑store paths to the input files. Can be a single path or a list of paths.
public (boolean) – Whether the Zarr store to be created should be anonymously (publicly) accessible.
ttl_seconds (integer) – How long a public URL should remain valid, in seconds.
access_pattern (string) – Optimise the chunk sizes for the given access pattern. Choose from map or time_series. Default is map.
map_primary_chunksize (integer) – If the access pattern is map, set the chunk size of the primary axis (e.g. time).
reload (boolean) – Force a server-side cache refresh. By default, data store requests are cached to improve performance. Set to true to bypass the cache and fetch fresh data.
chunk_size (float) – Target chunk size in megabytes.
aggregate (string) –
String indicating how the aggregation should be done:
null or omitted: do not aggregate data (default).
“auto”: let the system choose whether the datasets should be concatenated or merged.
“merge”: merge datasets as variables.
“concat”: concatenate datasets along a dimension.
join (string) –
String indicating how to combine differing indexes:
“outer”: use the union of object indexes.
“inner”: use the intersection of object indexes.
“left”: use indexes from the first object with each dimension.
“right”: use indexes from the last object with each dimension.
“exact”: instead of aligning, raise an error when indexes to be aligned are not equal.
This option is only taken into account if aggregate is set.
compat (string) –
String indicating how to compare non-concatenated variables of the same name:
“equals”: all values and dimensions must be the same.
“no_conflicts”: only values which are not null in both datasets must be equal. The returned dataset then contains the combination of all non-null values.
“override”: skip comparing and pick the variable from the first dataset.
This option is only taken into account if aggregate is set.
data_vars (string) –
These data variables will be combined together:
“minimal”: Only data variables in which the dimension already appears are included.
“different”: Data variables which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all for which dimension already appears).
“all”: All data variables will be concatenated.
This option is only taken into account if aggregate is set.
coords (string) –
These coordinate variables will be combined together:
“minimal”: Only coordinates in which the dimension already appears are included.
“different”: Coordinates which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all for which dimension already appears).
“all”: All coordinates will be concatenated.
This option is only taken into account if aggregate is set.
dim (string) –
Name of the dimension to concatenate along. This can either be a new dimension name, in which case it is added along axis=0, or an existing dimension name, in which case the location of the dimension is unchanged.
This option is only taken into account if aggregate is set.
group_by (string) – If set, forces grouping by a signature key. Otherwise grouping is attempted only when direct combine fails.
- Request Headers:
Authorization – Bearer token for authentication.
- Status Codes:
200 OK – JSON object with a urls array containing the Zarr endpoint URLs.
401 Unauthorized – The user could not be authenticated.
503 Service Unavailable – Service is currently unavailable.
500 Internal Server Error – Internal error while publishing the data request.
- Response Headers:
Content-Type –
application/json
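Because the combine options (join, compat, data_vars, coords, dim) are only taken into account when aggregate is set, it can help to assemble and check the request body on the client before sending it. The following is a minimal sketch under stated assumptions: `build_convert_payload` is a hypothetical helper, the allowed values are copied from the parameter list above, and silently leaving out combine options when aggregate is not set is a design choice of this sketch, not documented server behaviour.

```python
# Allowed values as listed in the request parameters above.
AGGREGATE_CHOICES = {"auto", "merge", "concat"}
COMBINE_CHOICES = {
    "join": {"outer", "inner", "left", "right", "exact"},
    "compat": {"equals", "no_conflicts", "override"},
    "data_vars": {"minimal", "different", "all"},
    "coords": {"minimal", "different", "all"},
}
# Options that only take effect together with aggregate.
COMBINE_ONLY = set(COMBINE_CHOICES) | {"dim"}


def build_convert_payload(paths, aggregate=None, **options):
    """Return a JSON-serialisable body for the convert request (hypothetical helper)."""
    payload = {"path": [paths] if isinstance(paths, str) else list(paths)}
    if aggregate is not None:
        if aggregate not in AGGREGATE_CHOICES:
            raise ValueError(f"aggregate must be one of {sorted(AGGREGATE_CHOICES)}")
        payload["aggregate"] = aggregate
    for name, value in options.items():
        if name in COMBINE_ONLY and aggregate is None:
            continue  # would not be taken into account without aggregate
        allowed = COMBINE_CHOICES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
        payload[name] = value
    return payload


payload = build_convert_payload(
    ["/work/abc123/myuser/mydata_1.nc", "/work/abc123/myuser/mydata_2.nc"],
    aggregate="concat",
    dim="time",
    join="outer",
)
```

The returned dictionary can be passed as the JSON body of the POST request, for example via the `json=` argument of `requests.post`.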
Example Request#
    POST /api/freva-nextgen/data-portal/zarr/convert?path=/work/abc123/myuser/mydata_1.nc&path=/work/abc123/myuser/mydata_2.nc HTTP/1.1
    Host: www.freva.dkrz.de
    Authorization: Bearer YOUR_ACCESS_TOKEN
Example Response#
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
        "urls": [
            "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/907f6bca-1234-5678-9abc-def012345678.zarr",
            "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/aa8432de-9abc-4567-def0-123456789abc.zarr"
        ]
    }
Examples#
Below are example usages of this request in different languages.
Shell:

    curl -X POST \
        'https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/convert' \
        --header "Authorization: Bearer YOUR_ACCESS_TOKEN" \
        --data-urlencode 'path=/work/abc123/myuser/mydata_1.nc' \
        --data-urlencode 'path=/work/abc123/myuser/mydata_2.nc'

Python:

    import requests

    response = requests.post(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/convert",
        json={
            "path": [
                "/work/abc123/myuser/mydata_1.nc",
                "/work/abc123/myuser/mydata_2.nc",
            ]
        },
        headers={"Authorization": "Bearer YOUR_ACCESS_TOKEN"},
    )
    zarr_locations = response.json()["urls"]

R:

    library(httr)
    library(jsonlite)

    response <- POST(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/convert",
        # Repeat the name once per path; httr expects one value per query entry.
        query = list(
            path = "/work/abc123/myuser/mydata_1.nc",
            path = "/work/abc123/myuser/mydata_2.nc"
        ),
        add_headers(Authorization = "Bearer YOUR_ACCESS_TOKEN")
    )
    zarr_locations <- fromJSON(content(response, as = "text", encoding = "UTF-8"))$urls

Julia:

    using HTTP
    using JSON

    headers = Dict("Authorization" => "Bearer YOUR_ACCESS_TOKEN")
    query = Dict("path" => [
        "/work/abc123/myuser/mydata_1.nc",
        "/work/abc123/myuser/mydata_2.nc",
    ])
    response = HTTP.post(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/convert";
        headers = headers,
        query = query,
    )
    zarr_locations = JSON.parse(String(response.body))["urls"]
Generic Zarr key retrieval#
In addition to the specific metadata and chunk routes listed above, the
data portal also exposes a catch‑all endpoint that serves arbitrary keys
from a Zarr store. This interface is compatible with remote storage
backends used by xarray and zarr, allowing clients to
access consolidated metadata, per‑variable metadata and individual chunk
payloads via HTTP.
- GET /api/freva-nextgen/data-portal/zarr/{token}.zarr/{zarr_key}#
Retrieve metadata or chunk data from the store identified by token. The zarr_key parameter must contain the slash‑separated key within the Zarr store. The following conventions apply:
Root‑level keys such as .zmetadata, .zgroup and .zattrs return the consolidated metadata, group information and attributes for the entire store. These keys have no variable prefix.
Keys of the form <variable>/.zarray and <variable>/.zattrs return the array encoding and variable attributes for the specified variable. Replace <variable> with the actual variable name.
Data chunk keys use the pattern <variable>/<chunk> where chunk encodes the chunk indices separated by dots. For example, tas/0.0.0 requests the first chunk of the variable tas.
- Parameters:
token – Unique identifier returned by the load endpoint when a dataset has been registered for streaming. Each token corresponds to one Zarr store.
zarr_key – Slash‑separated key within the Zarr store. Refer to the descriptions above for valid patterns.
- Status Codes:
200 OK – The requested metadata or chunk was found and is returned in the response body.
400 Bad Request – The key is malformed or refers to a metadata file that requires a variable context (for example foo/.zarray at the root level).
404 Not Found – The requested key does not exist in the store.
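The key conventions described for this endpoint can be expressed in a few lines of Python. This is a minimal sketch: `chunk_key` and `key_url` are hypothetical helper names, and `<token>` is left as a placeholder for the identifier returned by the load endpoint.

```python
def chunk_key(variable, indices):
    """Build a data chunk key: the variable name, a slash, dot-separated indices."""
    return f"{variable}/" + ".".join(str(index) for index in indices)


def key_url(store_url, zarr_key):
    """Append a Zarr key to the URL of a store."""
    return f"{store_url.rstrip('/')}/{zarr_key}"


store = "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/<token>.zarr"
first_chunk = chunk_key("tas", (0, 0, 0))  # first chunk of the variable tas
metadata_url = key_url(store, ".zmetadata")  # root-level key, no variable prefix
array_url = key_url(store, "tas/.zarray")  # per-variable array metadata
chunk_url = key_url(store, first_chunk)
```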
Example
To retrieve the consolidated metadata for a store, use .zmetadata as the key:

    curl -X GET \
        "https://api.freva.de/api/freva-nextgen/data-portal/zarr/<token>.zarr/.zmetadata"

You can open a remote store directly in xarray using the fsspec interface:

    import fsspec
    import xarray as xr

    mapper = fsspec.get_mapper(
        "https://api.freva.de/api/freva-nextgen/data-portal/zarr/<token>.zarr"
    )
    dset = xr.open_zarr(mapper, consolidated=True)
    dset.load()
This endpoint mirrors the behaviour of the previous one but requires a valid sig parameter obtained from POST /api/freva-nextgen/data-portal/share-zarr. The signature authenticates the request and maps to the underlying token.
Example
Retrieve the consolidated metadata for a shared store:

    curl -X GET \
        "https://api.freva.de/api/freva-nextgen/data-portal/share/<sig>/<token>.zarr/.zmetadata"

C (libcurl) example for the POST /api/freva-nextgen/data-portal/zarr/convert request described under “Request asynchronous Zarr conversion”:

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void) {
        CURL *curl = curl_easy_init();
        if (curl) {
            struct curl_slist *headers = NULL;
            headers = curl_slist_append(headers, "Authorization: Bearer YOUR_ACCESS_TOKEN");

            // Note: encode special characters in the paths as needed
            curl_easy_setopt(curl, CURLOPT_URL,
                "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/convert"
                "?path=/work/abc123/myuser/mydata_1.nc&"
                "path=/work/abc123/myuser/mydata_2.nc");
            curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
            // An empty body switches the request method to POST
            curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "");

            CURLcode res = curl_easy_perform(curl);
            if (res != CURLE_OK) {
                fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
            }

            curl_slist_free_all(headers);
            curl_easy_cleanup(curl);
        }
        return 0;
    }
—
Create a public pre-signed zarr url#
Create a short-lived, shareable pre-signed URL for a specific Zarr chunk. The caller must authenticate with a normal OAuth2 access token.
The returned URL includes expires and sig query parameters. Anyone who knows the URL can perform a GET request on the target resource until the expiry time is reached, without needing an access token.
- Request JSON Object:
path (string) – Fully qualified URL of the resource to pre-sign, relative to this API. Must contain /api/freva-nextgen/data-portal/zarr/ and typically points to a single Zarr url.
ttl_seconds (integer) – How long the pre-signed URL should remain valid, in seconds.
- Request Headers:
Authorization – Bearer token for authentication.
- Status Codes:
200 OK – JSON object with a url entry containing the pre-signed Zarr endpoint URL.
401 Unauthorized – The user could not be authenticated.
503 Service Unavailable – Service is currently unavailable.
500 Internal Server Error – Internal error while publishing the data request.
- Response Headers:
Content-Type –
application/json
Example Request#
    POST /api/freva-nextgen/data-portal/share-zarr/?path=https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/432a5670.zarr HTTP/1.1
    Host: www.freva.dkrz.de
    Authorization: Bearer YOUR_ACCESS_TOKEN
Example Response#
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
        "url": "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/share/hRNJcEug/breezy-maxwell.zarr",
        "sig": "AbCdEf",
        "token": "123e4567",
        "expires": 1763540778,
        "method": "GET"
    }
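Since a pre-signed URL stops working once its expiry time is reached, a client may want to check the remaining lifetime before handing the URL on. The following is a minimal sketch, assuming that the expires field is a Unix timestamp in seconds (UTC), which the example value suggests but this document does not state explicitly; `seconds_until_expiry` is a hypothetical helper.

```python
from datetime import datetime, timezone


def seconds_until_expiry(share_response, now=None):
    """Return the remaining lifetime of a pre-signed URL in seconds (0 if expired)."""
    now = datetime.now(timezone.utc) if now is None else now
    # Assumption: "expires" is a Unix timestamp in seconds.
    expires = datetime.fromtimestamp(share_response["expires"], tz=timezone.utc)
    return max(0.0, (expires - now).total_seconds())


share_response = {
    "url": "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/share/hRNJcEug/breezy-maxwell.zarr",
    "expires": 1763540778,
}
one_minute_before = datetime.fromtimestamp(1763540778 - 60, tz=timezone.utc)
remaining = seconds_until_expiry(share_response, now=one_minute_before)
```

Passing `now` explicitly keeps the function easy to test; in normal use it can be omitted so that the current time is used.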
Examples#
Below are example usages of this request in different languages.
Shell:

    curl -X POST \
        'https://www.freva.dkrz.de/api/freva-nextgen/data-portal/share-zarr?path=https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/432a5670.zarr' \
        -H "Authorization: Bearer YOUR_ACCESS_TOKEN"

Python:

    import requests

    response = requests.post(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/share-zarr",
        json={
            "path": (
                "https://www.freva.dkrz.de/api/freva-nextgen/"
                "data-portal/zarr/432a5670.zarr"
            ),
        },
        headers={"Authorization": "Bearer YOUR_ACCESS_TOKEN"},
    )
    public_zarr = response.json()["url"]

R:

    library(httr)
    library(jsonlite)

    response <- POST(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/share-zarr",
        query = list(
            path = "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/432a5670.zarr"
        ),
        add_headers(Authorization = "Bearer YOUR_ACCESS_TOKEN")
    )
    public_zarr <- fromJSON(
        content(response, as = "text", encoding = "UTF-8")
    )$url

Julia:

    using HTTP
    using JSON

    headers = Dict("Authorization" => "Bearer YOUR_ACCESS_TOKEN")
    response = HTTP.post(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/share-zarr";
        headers = headers,
        query = Dict(
            "path" => "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr/432a5670.zarr",
        ),
    )
    public_zarr = JSON.parse(String(response.body))["url"]

C:

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void) {
        CURL *curl = curl_easy_init();
        if (curl) {
            CURLcode res;
            struct curl_slist *headers = NULL;
            headers = curl_slist_append(
                headers, "Authorization: Bearer YOUR_ACCESS_TOKEN"
            );
            curl_easy_setopt(
                curl, CURLOPT_URL,
                "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/"
                "share-zarr?path=https://www.freva.dkrz.de/api/freva-nextgen/"
                "data-portal/zarr/432a5670.zarr"
            );
            curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
            curl_easy_setopt(curl, CURLOPT_POST, 1L);
            // Send an empty body so that libcurl does not wait for input on stdin
            curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "");
            res = curl_easy_perform(curl);
            if (res != CURLE_OK) {
                fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
            }
            curl_slist_free_all(headers);
            curl_easy_cleanup(curl);
        }
        return 0;
    }
—
Checking the status of a dynamically created Zarr dataset#
- GET /api/freva-nextgen/data-portal/zarr-utils/status#
This endpoint returns the current status of a dynamically created Zarr dataset. After submitting an instruction to generate a Zarr dataset, clients can poll this endpoint to track the progress of the conversion process. The response includes a machine-readable integer status code and a human-readable explanation.
- Request Headers:
Authorization – Bearer token for authentication.
Content-Type – application/json
- Status Codes:
200 OK – Status returned successfully.
401 Unauthorized – Unauthorised / invalid or missing access token.
404 Not Found – The token is not known to the system.
503 Service Unavailable – The service is currently unavailable.
- Response Headers:
Content-Type –
application/json
Example Request#
To check the status of a dataset conversion, simply perform a GET request with a valid access token:

    GET /api/freva-nextgen/data-portal/zarr-utils/status HTTP/1.1
    Host: www.freva.dkrz.de
    Authorization: Bearer your_access_token
Example Response#
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
        "status": 2,
        "reason": "Dataset is currently being prepared"
    }
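Polling this endpoint until the conversion has finished could look roughly like the sketch below. It is deliberately generic: `wait_until_finished` is a hypothetical helper, the caller supplies a function that performs the actual status request, and the caller also supplies the predicate that decides when to stop. Only the value 2 (“Dataset is currently being prepared”) appears in this document, so treating every other status value as finished, as in the usage lines, is an assumption.

```python
import time


def wait_until_finished(fetch_status, is_finished, interval=2.0,
                        max_attempts=30, sleep=time.sleep):
    """Call fetch_status() repeatedly until is_finished(status) is true."""
    for attempt in range(1, max_attempts + 1):
        status = fetch_status()
        if is_finished(status):
            return status
        if attempt < max_attempts:
            sleep(interval)  # wait before asking again
    raise TimeoutError(f"Dataset not ready after {max_attempts} status requests")


# Usage with canned responses instead of real HTTP requests.
responses = iter([
    {"status": 2, "reason": "Dataset is currently being prepared"},
    {"status": 0, "reason": "hypothetical final state"},
])
final = wait_until_finished(
    fetch_status=lambda: next(responses),
    is_finished=lambda status: status["status"] != 2,  # assumption, see above
    sleep=lambda seconds: None,  # skip the real delay in this demonstration
)
```

In real use, `fetch_status` would wrap the GET request shown in the examples for this endpoint and return the decoded JSON body.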
Example#
Below you can find example usages of this request in different scripting and programming languages.
Shell:

    curl -X GET \
        'https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr-utils/status' \
        -H "Authorization: Bearer YOUR_ACCESS_TOKEN"

Python:

    import requests

    response = requests.get(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr-utils/status",
        headers={"Authorization": "Bearer YOUR_ACCESS_TOKEN"},
    )
    status = response.json()
    print(status["status"], status["reason"])

R:

    library(httr)

    response <- GET(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr-utils/status",
        add_headers(Authorization = "Bearer YOUR_ACCESS_TOKEN")
    )
    data <- content(response, as = "parsed")

Julia:

    using HTTP
    using JSON3

    response = HTTP.get(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr-utils/status",
        headers = Dict("Authorization" => "Bearer YOUR_ACCESS_TOKEN")
    )
    status = JSON3.read(String(HTTP.body(response)))

C:

    #include <stdio.h>
    #include <curl/curl.h>

    int main() {
        CURL *curl;
        CURLcode res;

        curl = curl_easy_init();
        if (!curl) {
            fprintf(stderr, "Failed to initialize curl\n");
            return 1;
        }

        curl_easy_setopt(curl, CURLOPT_URL,
            "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr-utils/status");

        // Authenticate with the bearer token
        struct curl_slist *headers = NULL;
        headers = curl_slist_append(headers, "Authorization: Bearer YOUR_ACCESS_TOKEN");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

        res = curl_easy_perform(curl);
        if (res != CURLE_OK) {
            fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
        }

        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
        return 0;
    }
—
Retrieving an HTML representation of a Zarr dataset#
- GET /api/freva-nextgen/data-portal/zarr-utils/html#
This endpoint returns a human-readable HTML representation of a Zarr dataset. Internally, the dataset is opened using Xarray and rendered using its built-in HTML formatter. The resulting HTML document contains a rich, interactive summary that is well-suited for inspection in a web browser.
This endpoint is intended for interactive exploration only. It does not return Zarr data directly and should not be used for programmatic access or large-scale data processing.
- Request Headers:
Authorization – Bearer token for authentication.
Content-Type – application/json
- Status Codes:
200 OK – HTML representation returned successfully.
401 Unauthorized – Unauthorised / invalid or missing token.
404 Not Found – Dataset not found or unknown.
503 Service Unavailable – The service is currently unavailable.
- Response Headers:
Content-Type –
text/html
Example Request#
    GET /api/freva-nextgen/data-portal/zarr-utils/html HTTP/1.1
    Host: www.freva.dkrz.de
    Authorization: Bearer your_access_token
Example Response#
    HTTP/1.1 200 OK
    Content-Type: text/html; charset=UTF-8

    <!DOCTYPE html>
    <html>
      <head>
        <meta charset="utf-8">
        <title>Xarray Dataset</title>
      </head>
      <body>
        <!-- HTML rendering of the dataset -->
        <div class="xr-wrap"> ... </div>
      </body>
    </html>
Example#
Below you can find example usages of this request in different programming languages.
Shell:

    curl -X GET \
        'https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr-utils/html' \
        -H "Authorization: Bearer YOUR_ACCESS_TOKEN"

Python:

    import requests

    response = requests.get(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr-utils/html",
        headers={"Authorization": "Bearer YOUR_ACCESS_TOKEN"},
    )
    html = response.text
    print(html)

R:

    library(httr)

    response <- GET(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr-utils/html",
        add_headers(Authorization = "Bearer YOUR_ACCESS_TOKEN")
    )
    html <- content(response, as = "text")

Julia:

    using HTTP

    response = HTTP.get(
        "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr-utils/html",
        headers = Dict("Authorization" => "Bearer YOUR_ACCESS_TOKEN")
    )
    html = String(HTTP.body(response))

C:

    #include <stdio.h>
    #include <curl/curl.h>

    int main() {
        CURL *curl;
        CURLcode res;

        curl = curl_easy_init();
        if (!curl) {
            fprintf(stderr, "Failed to initialize curl\n");
            return 1;
        }

        curl_easy_setopt(
            curl, CURLOPT_URL,
            "https://www.freva.dkrz.de/api/freva-nextgen/data-portal/zarr-utils/html"
        );

        struct curl_slist *headers = NULL;
        headers = curl_slist_append(headers, "Authorization: Bearer YOUR_ACCESS_TOKEN");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

        res = curl_easy_perform(curl);
        if (res != CURLE_OK) {
            fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
        }

        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
        return 0;
    }