# Open Dataset Reference (`engine='ee'`)

This page is the canonical user-facing reference for calling:

```python
xr.open_dataset(..., engine='ee')
```

## How The Call Chain Works

In plain terms:

1. You call `xarray.open_dataset(..., engine='ee')`.
2. Xarray routes that call to Xee's backend entrypoint method, `xee.EarthEngineBackendEntrypoint.open_dataset`.
3. That entrypoint creates and uses `xee.EarthEngineStore` internally to stream pixels and metadata.

`EarthEngineStore` is an internal/core backend type. Most users should treat `xr.open_dataset(..., engine='ee')` as the public API and use this page as the parameter reference.

Related API pages:

- [EarthEngineBackendEntrypoint autosummary](_autosummary/xee.EarthEngineBackendEntrypoint)
- [EarthEngineStore autosummary](_autosummary/xee.EarthEngineStore)

## Required vs Optional Parameters

When `engine='ee'`, the grid parameters are required at call time:

- `crs`
- `crs_transform`
- `shape_2d`

Most other parameters are optional tuning or decoding controls.

The input source (`filename_or_obj`) can be one of:

- An `ee.ImageCollection` object
- An `ee.Image` object (wrapped internally as an `ee.ImageCollection`)
- An asset ID string/path, including `ee://...` / `ee:...` style URIs

## Canonical Parameter List

The signature and parameter docs below are rendered from the backend method used at runtime, so this reference stays aligned with implementation behavior.

```{eval-rst}
.. currentmodule:: xee

.. automethod:: EarthEngineBackendEntrypoint.open_dataset
```

## Parameter Name Mapping (User API vs Core Backend)

Most users should pass arguments to `xr.open_dataset(..., engine='ee')`. Some names differ in the core backend API (`EarthEngineStore.open`).

| User-facing (`xr.open_dataset`) | Core backend (`EarthEngineStore.open`) | Notes |
|---|---|---|
| `filename_or_obj` | `image_collection` | The backend always operates on an `ee.ImageCollection` |
| `io_chunks` | `chunk_store` / `chunks` | Same concept, different name at different layers |
| `ee_mask_value` | `mask_value` | Same behavior |

If you are reading backend API pages, these name differences are expected.

## Practical Parameter Guide

The sections below describe common, practical usage patterns for parameters you may see in both the user docs and the backend API docs.

### `image_collection` (`ee.ImageCollection`)

- Backend/core parameter corresponding to the user-facing `filename_or_obj`.
- You usually pass either an EE object (`ee.ImageCollection`/`ee.Image`) or an asset URI string to `xr.open_dataset`; Xee normalizes either one to an `ee.ImageCollection` internally.
- Asset paths usually come from one of:
  - The public Earth Engine Data Catalog
  - The Awesome GEE Community Catalog (community datasets)
  - Your own Earth Engine assets (personal, team, or project-owned)
- Example catalog path: `ECMWF/ERA5_LAND/MONTHLY_AGGR` (or, in URI form, `ee://ECMWF/ERA5_LAND/MONTHLY_AGGR`).

### `crs` (`str`)

- Output coordinate reference system for all opened variables.
- Required at runtime for `engine='ee'`.
- Prefer deriving it with `helpers.extract_grid_params(...)` or `helpers.fit_geometry(...)` unless you explicitly need a manual override.

### `crs_transform` (`tuple[float, float, float, float, float, float] | Affine`)

- Geotransform defining pixel size and origin in the selected CRS.
- Required at runtime for `engine='ee'`.
- Keep this consistent with `shape_2d`; a mismatch can produce unexpected bounds or orientation.

### `shape_2d` (`tuple[int, int]`)

- Pixel grid size in `(width, height)` order.
- Required at runtime for `engine='ee'`.
- Larger shapes increase memory use and the load placed on Earth Engine requests.

### `chunks` (`int | dict[Any, Any] | Literal['auto'] | None`)

- Default: `None`.
- Dask/Xarray chunking of the returned dataset.
- Affects downstream compute scheduling and memory behavior, not just Earth Engine request boundaries.
- Start with modest time chunks and tune only when needed.

### `n_images` (`int`)

- Default: `-1` (include all images).
- Limits the number of images loaded from the collection.
- Useful for quick iteration, debugging, or very large collections.

### `primary_dim_name` (`str | None`)

- Default: `None` (resolved to `time`).
- Name of the primary stacked dimension.
- Usually keep the default unless you are integrating with an existing schema.

### `primary_dim_property` (`str | None`)

- Default: `None` (resolved to `system:time_start`).
- Earth Engine image property used to derive the primary-dimension coordinate values.
- Change this only if your collection is indexed by a different property.

### `mask_value` (`float | None`)

- Default: `None` (resolved to `np.iinfo(np.int32).max`, i.e. `2147483647`).
- Backend/core mask sentinel corresponding to the user-facing `ee_mask_value`.
- Marks Earth Engine nodata (masked) pixels so they can be handled as missing (NaN) values.

### `request_byte_limit` (`int`)

- Default: `48 * 1024 * 1024` (48 MiB).
- Upper bound on the payload size of each request.
- Advanced tuning control: Earth Engine size constraints vary by workload.
- Prefer lowering this value when you hit request-size instability.
- Avoid increasing it unless you have validated it for your specific dataset and query pattern.

### `ee_init_kwargs` (`dict[str, Any] | None`)

- Default: `None`.
- Keyword arguments forwarded to `ee.Initialize(...)` during optional worker auto-initialization.
- Useful in distributed settings where workers need credentials or project configuration.

### `ee_init_if_necessary` (`bool`)

- Default: `False`.
- Whether Xee should attempt to initialize Earth Engine on demand (commonly on remote workers).
- Keep `False` for standard local workflows where Earth Engine is already initialized.
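The two initialization parameters are meant to be used together: enabling auto-initialization without the configuration workers need is rarely useful. The sketch below bundles them into one dict you can unpack into the call. The helper name `ee_worker_kwargs` and the project ID `'my-ee-project'` are hypothetical placeholders, and `project` is the only `ee.Initialize` keyword this sketch assumes.

```python
from typing import Any

# Hypothetical Cloud project ID; replace it with the project your workers should use.
EE_PROJECT = 'my-ee-project'


def ee_worker_kwargs(project: str) -> dict[str, Any]:
    """Hypothetical helper: options that let remote workers initialize EE on demand.

    The returned dict is meant to be unpacked into
    ``xr.open_dataset(..., engine='ee', **kwargs)``.
    """
    return {
        # Ask Xee to initialize Earth Engine itself on workers that are not initialized.
        'ee_init_if_necessary': True,
        # Forwarded as keyword arguments to ee.Initialize(...).
        'ee_init_kwargs': {'project': project},
    }


kwargs = ee_worker_kwargs(EE_PROJECT)
print(kwargs)
```

You would then pass these alongside the grid parameters, for example `xr.open_dataset(ic, engine='ee', **grid, **kwargs)`. Building both settings in one place keeps the initialization config consistent across every dataset opened for the same cluster.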
### `executor_kwargs` (`dict[str, Any] | None`)

- Default: `None` (internally treated as `{}`).
- Thread pool settings for parallel pixel retrieval.
- Advanced tuning: increasing the worker count may improve throughput, but it can also add rate/quota pressure, depending on the workload.

### `getitem_kwargs` (`dict[str, int] | None`)

- Default: `None` (uses internal defaults: `max_retries=6`, `initial_delay=500` ms).
- Retry/backoff tuning for array indexing fetches.
- Useful for transient quota/rate errors.
- Tune conservatively (`max_retries`, `initial_delay`), and prefer reducing concurrency before aggressively increasing retries.

### `fast_time_slicing` (`bool`)

- Default: `False`.
- Enables a faster slicing path that loads images by ID.
- Important: for computed/modified ImageCollections, this can return the original asset images (looked up by ID) rather than your computed image values.

## `fast_time_slicing` Deep Dive

`fast_time_slicing=True` is an important optimization, but it changes how time slices are resolved.

What it does:

- `False` (default): Xee slices directly from the EE `ImageCollection` object you passed in, so computations defined on it are preserved.
- `True`: Xee first slices by `system:id` and then loads images by those IDs.

Why this can be confusing:

- If your collection is computed or modified (for example, with `.map(...)`, band math, clipping/masking, or replaced images), slicing by ID can bypass those modifications and return the original images associated with the IDs.
- In other words, `fast_time_slicing=True` can be faster but may not reflect computed collection transformations.

When to use it:

- Good fit: stored collections read as-is, where you want faster time slicing and do not depend on computed per-image transformations.
- Use caution: computed collections whose transformed pixel values must be preserved in reads.

Practical recommendation:

1. Start with `fast_time_slicing=False` for correctness-sensitive workflows.
2. Enable `fast_time_slicing=True` only after validating that sampled outputs match your intended processing semantics.
3. If it is enabled and your collection lacks image IDs, Xee logs a warning and falls back to the default (non-fast) behavior.

## Common Recipes

### 1. Match source projection/resolution

Use this when you want the output aligned to the dataset's native grid.

```python
import ee
import xarray as xr
from xee import helpers

ee.Initialize()  # Assumes Earth Engine credentials are already configured.

ic = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR')
# Derive crs, crs_transform, and shape_2d from the collection's native grid.
grid = helpers.extract_grid_params(ic)
ds = xr.open_dataset(ic, engine='ee', **grid)
```

### 2. Manual grid override

Use this when you must align with an external raster/grid spec.

```python
import ee
import xarray as xr

ee.Initialize()  # Assumes Earth Engine credentials are already configured.

# Global 0.25-degree grid: 360 / 0.25 = 1440 columns, 180 / 0.25 = 720 rows.
manual_crs = 'EPSG:4326'
# (x scale, x shear, x origin, y shear, y scale, y origin); negative y scale = north-up.
manual_transform = (0.25, 0, -180, 0, -0.25, 90)
manual_shape = (1440, 720)  # (width, height)

ds = xr.open_dataset(
    'ee://ECMWF/ERA5_LAND/MONTHLY_AGGR',
    engine='ee',
    crs=manual_crs,
    crs_transform=manual_transform,
    shape_2d=manual_shape,
)
```

### 3. Performance/chunking tuning

Use this when throughput or memory behavior needs tuning.

```python
import ee
import xarray as xr
from xee import helpers

ee.Initialize()  # Assumes Earth Engine credentials are already configured.

ic = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR')
grid = helpers.extract_grid_params(ic)
ds = xr.open_dataset(
    ic,
    engine='ee',
    **grid,
    chunks={'time': 12},  # Dask chunks in the returned dataset.
    io_chunks={'time': 24, 'x': 256, 'y': 256},  # Earth Engine request windows.
    request_byte_limit=32 * 1024 * 1024,  # 32 MiB, below the 48 MiB default.
)
```

```{admonition} Advanced tuning only
:class: warning

`io_chunks` and `request_byte_limit` are advanced controls. Earth Engine imposes response/request size constraints, so these values usually require trial and error for each workload. Start from the defaults and tune conservatively. In most cases, reducing request size is safer than increasing it.
```

Notes:

- `chunks` controls Dask chunking in Xarray.
- `io_chunks` controls the request windows Xee uses for Earth Engine reads.
- `request_byte_limit` limits the payload size of each request. Prefer reducing it if you encounter request-size failures or unstable reads.
- Avoid increasing `request_byte_limit` unless you have validated the behavior against Earth Engine limits for your specific dataset and query pattern.

## Object vs URI Inputs

For `engine='ee'`, the following are equivalent once resolved, provided they refer to the same collection:

- Passing an EE object (`ee.ImageCollection`/`ee.Image`)
- Passing a URI/asset ID string (`ee://...` style or a plain asset path)

Object inputs are often convenient in notebooks where you have already built a computed collection. URI/asset ID strings are useful for concise, declarative loading and config-driven workflows.
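To make the equivalence concrete, the sketch below shows how the `ee://...` and `ee:...` URI forms reduce to the same bare asset ID that a plain asset path already is. The helper `asset_id_from_uri` is hypothetical and only illustrates the idea; Xee's own URI parsing may handle additional cases.

```python
def asset_id_from_uri(source: str) -> str:
    """Hypothetical helper: strip an Earth Engine URI scheme to get a bare asset ID."""
    # Check the longer 'ee://' prefix first so it is not treated as 'ee:' followed by '//...'.
    for prefix in ('ee://', 'ee:'):
        if source.startswith(prefix):
            return source[len(prefix):]
    # Plain asset paths are already bare IDs.
    return source


sources = [
    'ee://ECMWF/ERA5_LAND/MONTHLY_AGGR',
    'ee:ECMWF/ERA5_LAND/MONTHLY_AGGR',
    'ECMWF/ERA5_LAND/MONTHLY_AGGR',
]
ids = {asset_id_from_uri(s) for s in sources}
print(ids)  # {'ECMWF/ERA5_LAND/MONTHLY_AGGR'}
```

In this sketch, all three spellings collapse to one ID, which is why `ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR')` and the URI string describe the same input. A computed collection (for example, one built with `.map(...)`) has no such string form, which is where object inputs are required.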