Open Dataset Reference (engine='ee')
This page is the canonical user-facing reference for calling:
xr.open_dataset(..., engine='ee')
How The Call Chain Works
In plain terms:
You call xarray.open_dataset(..., engine='ee').
Xarray routes that call to Xee’s backend entrypoint method: xee.EarthEngineBackendEntrypoint.open_dataset.
That entrypoint creates and uses xee.EarthEngineStore internally to stream pixels and metadata.
EarthEngineStore is an internal/core backend type. Most users should treat
xr.open_dataset(..., engine='ee') as the public API and use this page as the
parameter reference.
Required vs Optional Parameters
When engine='ee', the grid parameters are required at call time:
crs
crs_transform
shape_2d
Most other parameters are optional tuning or decoding controls.
Input source (filename_or_obj) can be one of:
An ee.ImageCollection object
An ee.Image object (wrapped internally as an ImageCollection)
An asset id string/path, including ee://... / ee:... style URIs
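To make the accepted string forms concrete, here is a minimal sketch of how an ee:// or ee: style URI relates to a plain asset id. The helper name asset_id_from_uri is hypothetical and is not part of Xee; the real backend may parse and validate string inputs differently.

```python
def asset_id_from_uri(uri: str) -> str:
    """Strip an optional 'ee://' or 'ee:' prefix (illustrative only)."""
    # Check the longer prefix first so 'ee://' is not half-stripped as 'ee:'.
    for prefix in ("ee://", "ee:"):
        if uri.startswith(prefix):
            return uri[len(prefix):]
    # No scheme: assume the string is already a plain asset id.
    return uri


examples = [
    "ee://ECMWF/ERA5_LAND/MONTHLY_AGGR",
    "ee:ECMWF/ERA5_LAND/MONTHLY_AGGR",
    "ECMWF/ERA5_LAND/MONTHLY_AGGR",
]
for uri in examples:
    print(uri, "->", asset_id_from_uri(uri))
```

All three strings print the same asset id, ECMWF/ERA5_LAND/MONTHLY_AGGR.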
Canonical Parameter List
The signature and parameter docs below are rendered from the backend method used at runtime, so this reference stays aligned with implementation behavior.
- EarthEngineBackendEntrypoint.open_dataset(filename_or_obj, crs, crs_transform, shape_2d, drop_variables=None, io_chunks=None, n_images=-1, mask_and_scale=True, decode_times=True, decode_timedelta=None, use_cftime=None, concat_characters=True, decode_coords=True, primary_dim_name=None, primary_dim_property=None, ee_mask_value=None, request_byte_limit=50331648, ee_init_if_necessary=False, ee_init_kwargs=None, executor_kwargs=None, getitem_kwargs=None, fast_time_slicing=False)
Open an Earth Engine ImageCollection as an Xarray Dataset.
- Parameters:
filename_or_obj (str | PathLike[Any] | ImageCollection) – An asset ID for an ImageCollection, or an ee.ImageCollection object.
crs (str) – The coordinate reference system (a CRS code or WKT string). This defines the frame of reference to coalesce all variables upon opening.
crs_transform (tuple[float, float, float, float, float, float] | Affine) – Transform matrix describing the grid origin and scale relative to the CRS.
shape_2d (tuple[int, int]) – Dimensions of the pixel grid in the form (width, height).
drop_variables (optional) – Variables or bands to drop before opening.
io_chunks (optional) – Specifies the chunking strategy for loading data from EE. By default, this automatically calculates optimal chunks based on request_byte_limit.
n_images (optional) – The max number of EE images in the collection to open. Useful when there are a large number of images in the collection since calculating collection size can be slow. -1 indicates that all images should be included.
mask_and_scale (optional) – Lazily scale (using scale_factor and add_offset) and mask (using _FillValue).
decode_times (optional) – Decode CF times (e.g., integers with units such as "hours since 2000-01-01") to np.datetime64.
decode_timedelta (optional) – If True, decode variables and coordinates with time units in {“days”, “hours”, “minutes”, “seconds”, “milliseconds”, “microseconds”} into timedelta objects. If False, leave them encoded as numbers. If None (default), assume the same value of decode_times.
use_cftime (optional) – Only relevant if encoded dates come from a standard calendar (e.g. “gregorian”, “proleptic_gregorian”, “standard”, or not specified). If None (default), attempt to decode times to np.datetime64[ns] objects; if this is not possible, decode times to cftime.datetime objects. If True, always decode times to cftime.datetime objects, regardless of whether or not they can be represented using np.datetime64[ns] objects. If False, always decode times to np.datetime64[ns] objects; if this is not possible, raise an error.
concat_characters (optional) – Should character arrays be concatenated to strings, for example: ["h", "e", "l", "l", "o"] -> "hello".
decode_coords (optional) – bool or {"coordinates", "all"}. Controls which variables are set as coordinate variables. Use "coordinates" (or True) to set variables referenced by the 'coordinates' attribute of datasets or individual variables as coordinate variables. Use "all" to additionally set variables referenced by 'grid_mapping', 'bounds', and related attributes as coordinate variables.
primary_dim_name (optional) – Override the name of the primary dimension of the output Dataset. By default, the name is 'time'.
primary_dim_property (optional) – Override the ee.Image property from which to derive the values of the primary dimension. By default, this is 'system:time_start'.
ee_mask_value (optional) – Sentinel value used to mask EE nodata values. By default, this is np.iinfo(np.int32).max (i.e., 2147483647).
request_byte_limit (int) – The max allowed bytes to request at a time from Earth Engine. By default, it is 48 * 1024 * 1024 (48 MB).
ee_init_if_necessary (bool) – Boolean flag controlling whether Earth Engine auto-initialization should be attempted. Set to True if using distributed compute frameworks.
ee_init_kwargs (dict[str, Any] | None) – Keyword arguments to pass to Earth Engine Initialize when attempting to auto-initialize on remote workers.
executor_kwargs (optional) – A dictionary of keyword arguments to pass to the ThreadPoolExecutor that handles the parallel computation of pixels, for example {'max_workers': 2}.
getitem_kwargs (optional) – Exponential backoff kwargs passed into the xarray function used to index the array (robust_getitem). Supported keys include 'max_retries' (maximum retry attempts, default 6) and 'initial_delay' (initial delay in milliseconds before the first retry, default 500).
fast_time_slicing (optional) – Whether to perform an optimization that makes slicing an ImageCollection across time faster. This optimization loads EE images in a slice by ID, so any modifications to images in a computed ImageCollection will not be reflected.
- Returns:
An xarray.Dataset that streams in remote data from Earth Engine.
- Return type:
Dataset
Parameter Name Mapping (User API vs Core Backend)
Most users should pass arguments to xr.open_dataset(..., engine='ee').
Some names differ in the core backend API (EarthEngineStore.open).
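| User-facing (xr.open_dataset) | Core backend (EarthEngineStore.open) | Notes |
|---|---|---|
| filename_or_obj | image_collection | Backend always operates on an ee.ImageCollection |
| ee_mask_value | mask_value | Same concept, different name at different layers |
|  |  | Same behavior |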
If you are reading backend API pages, these name differences are expected.
Practical Parameter Guide
The list below explains the most common practical usage patterns for parameters you may see in user docs and backend API docs.
image_collection (ee.ImageCollection)
Backend/core parameter corresponding to user-facing filename_or_obj.
You usually pass either an EE object (ee.ImageCollection / ee.Image) or an asset URI string into xr.open_dataset; Xee normalizes to an ee.ImageCollection internally.
Asset paths usually come from one of:
The public Earth Engine Data Catalog: https://developers.google.com/earth-engine/datasets/catalog
The Awesome GEE Community Catalog (community datasets): https://gee-community-catalog.org/projects/
Your own Earth Engine assets (personal, team, or project-owned): https://developers.google.com/earth-engine/guides/asset_manager
Example catalog path: ECMWF/ERA5_LAND/MONTHLY_AGGR (or URI form ee://ECMWF/ERA5_LAND/MONTHLY_AGGR).
crs (str)
Output coordinate reference system for all opened variables.
Required at runtime for engine='ee'.
Prefer helpers.extract_grid_params(...) / helpers.fit_geometry(...) unless you explicitly need a manual override.
crs_transform (tuple[float, float, float, float, float, float] | Affine)
Geotransform defining pixel size/origin in the selected CRS.
Required at runtime for engine='ee'.
Keep this consistent with shape_2d; mismatches can cause confusing bounds or orientation outcomes.
shape_2d (tuple[int, int])
Pixel grid size in (width, height) order.
Required at runtime for engine='ee'.
Large shapes increase memory and request pressure.
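Because crs_transform and shape_2d have to agree, it can help to compute the bounding box they imply before opening anything. The sketch below assumes the six-element transform is in affine order (x_scale, x_shear, x_origin, y_shear, y_scale, y_origin), which matches the manual override recipe on this page. grid_bounds is a hypothetical helper, not an Xee function.

```python
def grid_bounds(crs_transform, shape_2d):
    """Return (x_min, y_min, x_max, y_max) covered by the pixel grid."""
    x_scale, x_shear, x_origin, y_shear, y_scale, y_origin = crs_transform
    width, height = shape_2d  # (width, height) order, as on this page

    # Map the four pixel-grid corners through the affine transform.
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    xs = [x_scale * col + x_shear * row + x_origin for col, row in corners]
    ys = [y_shear * col + y_scale * row + y_origin for col, row in corners]
    return (min(xs), min(ys), max(xs), max(ys))


bounds = grid_bounds((0.25, 0, -180, 0, -0.25, 90), (1440, 720))
print(bounds)  # (-180.0, -90.0, 180.0, 90.0)
```

If the shape is accidentally passed as (height, width), the same check returns a y_min of -270.0, which makes the mismatch easy to spot.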
chunks (int | dict[Any, Any] | Literal['auto'] | None)
Default: None.
Dask/Xarray chunking in the returned dataset.
Affects downstream compute scheduling/memory behavior, not just EE request boundaries.
Start with modest time chunks and tune only when needed.
n_images (int)
Default: -1 (include all images).
Limit the number of images loaded from the collection (-1 means all).
Useful for quick iteration, debugging, or very large collections.
primary_dim_name (str | None)
Default: None (resolved to time).
Rename the primary stacked dimension (default: time).
Usually keep default unless integrating with an existing schema.
primary_dim_property (str | None)
Default: None (resolved to system:time_start).
EE image property used to derive primary-dimension coordinate values (default: system:time_start).
Change only if your collection indexing semantics depend on another property.
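The default property, system:time_start, is stored by Earth Engine as milliseconds since the Unix epoch. The sketch below only illustrates that unit conversion with the standard library. ee_millis_to_datetime is a hypothetical helper, and Xee's own decoding path may differ.

```python
from datetime import datetime, timezone


def ee_millis_to_datetime(millis: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


stamp = ee_millis_to_datetime(946684800000)
print(stamp.isoformat())  # 2000-01-01T00:00:00+00:00
```

A custom primary_dim_property that is not a timestamp would not go through this kind of conversion, which is one reason to change it only when the collection is indexed by something other than time.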
mask_value (float | None)
Default: None (resolved to np.iinfo(np.int32).max, i.e. 2147483647).
Backend/core mask sentinel corresponding to user-facing ee_mask_value.
Used to convert EE nodata/sentinel pixels to NaN-friendly behavior.
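As a rough illustration of what a mask sentinel means, the sketch below replaces the default sentinel value with NaN in a plain Python list. It is a toy model of the idea, assuming masking happens by equality with the sentinel; mask_sentinel is a hypothetical helper and does not reproduce Xee's internal masking code.

```python
import math

EE_MASK_SENTINEL = 2**31 - 1  # same value as np.iinfo(np.int32).max


def mask_sentinel(values, sentinel=EE_MASK_SENTINEL):
    """Return floats, with any value equal to the sentinel replaced by NaN."""
    return [math.nan if v == sentinel else float(v) for v in values]


row = [12, 2147483647, 7]
masked = mask_sentinel(row)
print(masked)  # [12.0, nan, 7.0]
```

The practical consequence is that a band whose valid data can legitimately equal the sentinel needs a different ee_mask_value.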
request_byte_limit (int)
Default: 48 * 1024 * 1024 (48 MB).
Upper bound for per-request payload size.
Advanced tuning control: Earth Engine size constraints vary by workload.
Prefer lowering this value when you hit request-size instability.
Avoid increasing unless validated for your specific dataset/query pattern.
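A back-of-the-envelope size check can help when deciding whether to lower the limit. The sketch below assumes the payload scales with images x width x height x bytes per pixel for a single band; that is a simplification, and Xee's internal accounting may differ. estimated_request_bytes is a hypothetical helper.

```python
DEFAULT_REQUEST_BYTE_LIMIT = 48 * 1024 * 1024  # 50331648, the documented default


def estimated_request_bytes(n_images, width, height, bytes_per_pixel):
    """Rough payload estimate for one band over one request window."""
    return n_images * width * height * bytes_per_pixel


# 24 images of 256 x 256 pixels at 4 bytes per pixel (for example float32).
size = estimated_request_bytes(24, 256, 256, 4)
print(size, size <= DEFAULT_REQUEST_BYTE_LIMIT)  # 6291456 True
```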
ee_init_kwargs (dict[str, Any] | None)
Default: None.
Keyword arguments forwarded to ee.Initialize(...) during optional worker auto-initialization.
Useful in distributed settings where workers need credentials/project config.
ee_init_if_necessary (bool)
Default: False.
Whether Xee should attempt EE initialization on demand (commonly for remote workers).
Keep False for standard local workflows where EE is already initialized.
executor_kwargs (dict[str, Any] | None)
Default: None (internally treated as {}).
Thread pool settings for parallel pixel retrieval.
Advanced tuning: increasing worker count may improve throughput or trigger more rate/quota pressure depending on workload.
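To show what forwarding these settings to a thread pool looks like, here is a minimal standard-library sketch. fetch_tile is a hypothetical stand-in for a pixel request and does not call Earth Engine; only the way the dictionary is unpacked into ThreadPoolExecutor is the point.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_tile(tile_index: int) -> int:
    """Stand-in for a pixel request; a real fetch would call Earth Engine."""
    return tile_index * tile_index


executor_kwargs = {"max_workers": 2}

with ThreadPoolExecutor(**executor_kwargs) as pool:
    # map() preserves input order even though tiles run concurrently.
    results = list(pool.map(fetch_tile, range(5)))

print(results)  # [0, 1, 4, 9, 16]
```

With max_workers set to 2, at most two stand-in requests are in flight at once, which is the lever that trades throughput against rate/quota pressure.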
getitem_kwargs (dict[str, int] | None)
Default: None (uses internal defaults: max_retries=6, initial_delay=500 ms).
Retry/backoff tuning for array indexing fetches.
Useful for transient quota/rate errors.
Tune conservatively (max_retries, initial_delay) and prefer reducing concurrency before aggressive retry expansion.
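To get a feel for how much waiting the defaults can add, the sketch below lists base delays under the assumption that the delay doubles on every retry. This is an illustration of exponential backoff in general; the real retry routine may add random jitter or differ in detail, and base_backoff_delays is a hypothetical helper.

```python
def base_backoff_delays(max_retries=6, initial_delay=500):
    """Base delay in milliseconds before each retry, doubling each attempt."""
    return [initial_delay * 2**attempt for attempt in range(max_retries)]


delays = base_backoff_delays()
print(delays)       # [500, 1000, 2000, 4000, 8000, 16000]
print(sum(delays))  # 31500
```

Under that assumption a request that exhausts all six retries spends at least about 31.5 seconds waiting, so raising max_retries grows worst-case latency quickly.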
fast_time_slicing (bool)
Default: False.
Enables a faster slice path by loading images by ID.
Important: for computed/modified ImageCollections, this can return original asset images (looked up by ID) rather than your computed image values.
fast_time_slicing Deep Dive
fast_time_slicing=True is an important optimization, but it changes how time
slices are resolved.
What it does:
False (default): Xee slices directly from the in-memory EE ImageCollection object.
True: Xee slices by system:id first and then loads by those IDs.
Why this can be confusing:
If your collection is computed/modified (for example: .map(...), band math, clipping/masking, or replacing images), slicing by ID can bypass those computed modifications and return the original images associated with the IDs.
In other words, fast_time_slicing=True can be faster but may not reflect computed collection transformations.
When to use it:
Good fit: direct/stored collections where you want faster time slicing and are not depending on computed per-image transformations.
Use caution: computed collections where transformed pixel values must be preserved in reads.
Practical recommendation:
Start with fast_time_slicing=False for correctness-sensitive workflows.
Enable fast_time_slicing=True only after validating that sampled outputs match your intended processing semantics.
If enabled and your collection lacks image IDs, Xee logs a warning and falls back to default (non-fast) behavior.
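The difference between the two slice paths can be shown with a toy model in plain Python. Nothing here touches Earth Engine or Xee: the dictionary stands in for stored asset images, the list stands in for a computed collection, and both slice functions are hypothetical.

```python
# Stored asset images, keyed by ID (stand-in for an Earth Engine asset).
stored = {"img_0": 1.0, "img_1": 2.0, "img_2": 3.0}

# A "computed" collection: every image value has been scaled by 10.
computed = [(image_id, value * 10) for image_id, value in stored.items()]


def slice_default(collection, start, stop):
    """Slice the computed collection itself, keeping its modifications."""
    return [value for _, value in collection[start:stop]]


def slice_by_id(collection, start, stop, store):
    """Take IDs from the slice, then reload each image from the store."""
    return [store[image_id] for image_id, _ in collection[start:stop]]


print(slice_default(computed, 0, 2))        # [10.0, 20.0]
print(slice_by_id(computed, 0, 2, stored))  # [1.0, 2.0]
```

The ID-based path returns the unscaled values, which mirrors the caveat above: a lookup by ID sees the stored image, not the computed one.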
Common Recipes
1. Match source projection/resolution
Use this when you want output aligned to the dataset’s native grid.
import ee
import xarray as xr
from xee import helpers
ic = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR')
grid = helpers.extract_grid_params(ic)
ds = xr.open_dataset(ic, engine='ee', **grid)
2. Manual grid override
Use this when you must align with an external raster/grid spec.
import xarray as xr
manual_crs = 'EPSG:4326'
manual_transform = (0.25, 0, -180, 0, -0.25, 90)
manual_shape = (1440, 720) # (width, height)
ds = xr.open_dataset(
    'ee://ECMWF/ERA5_LAND/MONTHLY_AGGR',
    engine='ee',
    crs=manual_crs,
    crs_transform=manual_transform,
    shape_2d=manual_shape,
)
3. Performance/chunking tuning
Use this when throughput or memory behavior needs tuning.
import ee
import xarray as xr
from xee import helpers
ic = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR')
grid = helpers.extract_grid_params(ic)
ds = xr.open_dataset(
    ic,
    engine='ee',
    **grid,
    chunks={'time': 12},
    io_chunks={'time': 24, 'x': 256, 'y': 256},
    request_byte_limit=32 * 1024 * 1024,
)
Advanced tuning only
io_chunks and request_byte_limit are advanced controls. Earth Engine
imposes response/request size constraints, so these values usually require
trial-and-error for each workload.
Start from defaults and tune conservatively. In most cases, reducing request size is safer than increasing it.
Notes:
chunks controls Dask chunking in Xarray.
io_chunks controls request windows used by Xee for Earth Engine reads.
request_byte_limit limits per-request payload size. Prefer reducing this if you encounter request-size failures or unstable reads.
Avoid increasing request_byte_limit unless you have validated behavior against Earth Engine limits for your specific dataset and query pattern.
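One way to reason about io_chunks is to count how many request windows a given grid would be split into. The sketch below assumes simple ceiling-division tiling along each dimension and reuses the key names from the recipe above; the image count of 48 is an arbitrary illustrative value, count_request_windows is a hypothetical helper, and Xee's actual tiling may differ.

```python
import math


def count_request_windows(n_images, shape_2d, io_chunks):
    """Count windows needed to cover the grid, assuming ceil-division tiling."""
    width, height = shape_2d
    return (
        math.ceil(n_images / io_chunks["time"])
        * math.ceil(width / io_chunks["x"])
        * math.ceil(height / io_chunks["y"])
    )


windows = count_request_windows(
    n_images=48,
    shape_2d=(1440, 720),
    io_chunks={"time": 24, "x": 256, "y": 256},
)
print(windows)  # 36
```

Smaller windows mean more, lighter requests; larger windows mean fewer requests that each come closer to the byte limit.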
Object vs URI Inputs
For engine='ee', these are equivalent in outcome once resolved:
Passing an EE object (ee.ImageCollection / ee.Image)
Passing a URI/asset id string (ee://... style or asset path string)
Object inputs are often convenient in notebooks where you’ve already built a computed collection. URI/asset id strings are useful for concise, declarative loading and config-driven workflows.