SpatialData object#

class spatialdata.SpatialData(images=None, labels=None, points=None, shapes=None, tables=None, attrs=None)#

Bases: object

The SpatialData object.

The SpatialData object is a modular container for arbitrary combinations of SpatialElements and annotation tables. The elements can be accessed separately and are stored as standard types (anndata.AnnData, geopandas.GeoDataFrame, xarray.DataArray).

The elements need to pass a validation step. To construct valid elements you can use the parsers that we provide:

  • Image2DModel,

  • Image3DModel,

  • Labels2DModel,

  • Labels3DModel,

  • PointsModel,

  • ShapesModel,

  • TableModel

Parameters:
  • images (Optional[dict[str, DataArray | DataTree]] (default: None)) – Dict of 2D and 3D image elements. The following parsers are available: Image2DModel, Image3DModel.

  • labels (Optional[dict[str, DataArray | DataTree]] (default: None)) – Dict of 2D and 3D labels elements. Labels are regions, they can’t contain annotation, but they can be annotated by a table. The following parsers are available: Labels2DModel, Labels3DModel.

  • points (Optional[dict[str, DataFrame]] (default: None)) – Dict of points elements. Points can contain annotations. The following parser is available: PointsModel.

  • shapes (Optional[dict[str, GeoDataFrame]] (default: None)) – Dict of 2D shapes elements (circles, polygons, multipolygons). Shapes are regions, they can’t contain annotation, but they can be annotated by a table. The following parser is available: ShapesModel.

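  • tables – Dict of AnnData tables containing annotations for regions (labels and shapes). The following parser is available: TableModel.

  • attrs (Optional[Mapping[Any, Any]] (default: None)) – Additional attributes to store in the SpatialData object.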

Notes

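The SpatialElements are stored with standard types:

  • images and labels are stored as xarray.DataArray or DataTree objects;

  • points are stored as DataFrame (Dask) objects;

  • shapes are stored as geopandas.GeoDataFrame objects;

  • tables are stored as anndata.AnnData objects.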

The table can annotate regions (shapes or labels) and can be used to store additional information. Points are not regions but 0-dimensional locations. They can’t be annotated by a table, but they can store annotations directly.

add_image(name, image, storage_options=None, overwrite=False)#

Deprecated. Use sdata[name] = image instead.

Return type:

None

add_labels(name, labels, storage_options=None, overwrite=False)#

Deprecated. Use sdata[name] = labels instead.

Return type:

None

add_points(name, points, overwrite=False)#

Deprecated. Use sdata[name] = points instead.

Return type:

None

add_shapes(name, shapes, overwrite=False)#

Deprecated. Use sdata[name] = shapes instead.

Return type:

None

aggregate(values_sdata=None, values=None, by_sdata=None, by=None, value_key=None, agg_func='sum', target_coordinate_system='global', fractions=False, region_key='region', instance_key='instance_id', deepcopy=True, table_name='table', **kwargs)#

Aggregate values by given region.

Return type:

SpatialData

Notes

This function calls spatialdata.aggregate(), with the convenience that values and by can be strings without having to specify values_sdata and by_sdata, which in that case are replaced by self.

Please see spatialdata.aggregate() for the complete docstring.

delete_element_from_disk(element_name)#

Delete an element, or list of elements, from the Zarr store associated with the SpatialData object.

The element must be available in-memory and will not be removed from the SpatialData object in-memory storage.

Parameters:

element_name (str | list[str]) – The name(s) of the element(s) to delete.

Return type:

None

Notes

If you pass a list of names, the elements will be deleted one by one. If an error occurs during the deletion of an element, the deletion of the remaining elements will not be attempted.

Important note on overwriting elements saved on disk. In general, it is not recommended to delete an element from the Zarr store with the intention of saving an updated version of the element that is available only in-memory. This is because data loss may occur if the execution is interrupted during writing.

Here are some recommendations:

  • the above scenario may be acceptable when the element to save can be easily recreated from the data;

  • if data recreation is not possible or computationally expensive, it is recommended to first save the element to a different location and then copy it to the original desired location. Please note that this approach is not guaranteed to be always safe (e.g. if multiple processes are trying to write to the same Zarr store simultaneously, then the backup data may become corrupted).

Ultimately, it is the responsibility of the user to consider the implications of the current computational environment (e.g. operating system, local vs network storage, file permissions, …) and call this function appropriately (or implement a tailored solution), to prevent data loss.
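
The second recommendation (save to a different location first, then move the result into place) can be sketched with the standard library alone. The helper name replace_directory_safely and its write_fn callback are hypothetical and not part of spatialdata; the sketch treats a Zarr store on local disk as a plain directory and, as noted above, offers no protection against concurrent writers.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def replace_directory_safely(target: Path, write_fn: Callable[[Path], None]) -> None:
    """Write new content next to `target`, then swap it in.

    Hypothetical helper: `write_fn` receives an empty directory and must
    fill it with the new version of the data.
    """
    target = Path(target)
    # Stage the new data in a sibling directory so the final rename stays
    # on the same filesystem as the target.
    staging = Path(tempfile.mkdtemp(prefix=target.name + ".new.", dir=target.parent))
    try:
        write_fn(staging)
    except BaseException:
        # Writing failed or was interrupted: the old data was never touched.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    backup = target.with_name(target.name + ".bak")
    shutil.rmtree(backup, ignore_errors=True)  # drop a stale backup, if any
    if target.exists():
        target.rename(backup)  # keep the old version until the swap is done
    staging.rename(target)
    shutil.rmtree(backup, ignore_errors=True)
```

With this layout, an interruption during write_fn leaves the original directory intact, which is the property the recommendation is after.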

elements_are_self_contained()#

Describe if elements are self-contained as a dict of element_name to bool.

Return type:

dict[str, bool]

Returns:

: A dictionary of element_name to boolean values indicating whether the elements are self-contained.

Notes

Please see spatialdata.SpatialData.is_self_contained() for more information on the semantics of self-contained elements.

elements_paths_in_memory()#

Get the paths of the elements in the SpatialData object.

Return type:

list[str]

Returns:

: A list of paths of the elements in the SpatialData object.

Notes

The paths are relative to the root of the SpatialData object and are in the format “element_type/element_name”.

elements_paths_on_disk()#

Get the paths of the elements saved in the Zarr store.

Return type:

list[str]

Returns:

: A list of paths of the elements saved in the Zarr store.
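
Because both path listings use the “element_type/element_name” layout, they can be compared directly. The sketch below uses plain lists as stand-ins; in real use they would presumably come from elements_paths_in_memory() and elements_paths_on_disk(). The helper element_paths and all element names are hypothetical.

```python
def element_paths(elements_by_type: dict[str, list[str]]) -> list[str]:
    """Build "element_type/element_name" paths from a type -> names mapping."""
    return [
        f"{element_type}/{name}"
        for element_type, names in elements_by_type.items()
        for name in names
    ]


# Hypothetical stand-ins for the two listings.
in_memory = element_paths(
    {"images": ["he_image"], "shapes": ["cells", "nuclei"], "tables": ["table"]}
)
on_disk = ["images/he_image", "shapes/cells"]

# Elements that exist in memory but have not been saved yet.
saved = set(on_disk)
only_in_memory = [path for path in in_memory if path not in saved]
```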

filter_by_coordinate_system(coordinate_system, filter_tables=True, include_orphan_tables=False)#

Filter the SpatialData by one coordinate system (or a list of coordinate systems).

This returns a SpatialData object with the elements containing a transformation mapping to the specified coordinate system(s).

Parameters:
  • coordinate_system (str | list[str]) – The coordinate system(s) to filter by.

  • filter_tables (bool (default: True)) – If True (default), the tables will be filtered to only contain regions of an element belonging to the specified coordinate system(s).

  • include_orphan_tables (bool (default: False)) – If True (not default), include tables that do not annotate SpatialElement(s). Only has an effect if filter_tables is also set to True.

Return type:

SpatialData

Returns:

: The filtered SpatialData.
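
The selection rule (keep every element that has a transformation to at least one of the requested coordinate systems) can be sketched as follows. This models the behavior described above and is not the library's implementation; the function name and the element names are hypothetical.

```python
from typing import Union


def names_in_coordinate_systems(
    element_coordinate_systems: dict[str, set[str]],
    coordinate_system: Union[str, list[str]],
) -> list[str]:
    """Return the names of elements mapped to at least one requested system."""
    if isinstance(coordinate_system, str):
        wanted = {coordinate_system}
    else:
        wanted = set(coordinate_system)
    return [
        name
        for name, systems in element_coordinate_systems.items()
        if systems & wanted  # non-empty intersection: the element is kept
    ]


# Hypothetical elements and the coordinate systems they map to.
stand_in = {
    "image_a": {"global", "sample_a"},
    "image_b": {"global", "sample_b"},
    "cells_b": {"sample_b"},
}
kept_one = names_in_coordinate_systems(stand_in, "sample_b")
kept_many = names_in_coordinate_systems(stand_in, ["sample_a", "sample_b"])
```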

static from_elements_dict(elements_dict, attrs=None)#

Create a SpatialData object from a dict of elements.

Parameters:
  • elements_dict (dict[str, DataArray | DataTree | GeoDataFrame | DataFrame | AnnData]) – Dict of elements. The keys are the names of the elements and the values are the elements. At most one table can be present in the dict; its name is not used and can be anything.

  • attrs (Optional[Mapping[Any, Any]] (default: None)) – Additional attributes to store in the SpatialData object.

Return type:

SpatialData

Returns:

: The SpatialData object.

gen_elements()#

Generate elements within the SpatialData object.

This method generates elements in the SpatialData object (images, labels, points, shapes and tables).

Return type:

Generator[tuple[str, str, DataArray | DataTree | GeoDataFrame | DataFrame | AnnData], None, None]

Returns:

: A generator that yields tuples containing the element_type (string), name, and element objects themselves.

gen_spatial_elements()#

Generate spatial elements within the SpatialData object.

This method generates spatial elements (images, labels, points and shapes).

Return type:

Generator[tuple[str, str, DataArray | DataTree | GeoDataFrame | DataFrame], None, None]

Returns:

: A generator that yields tuples containing the element_type (string), name, and SpatialElement objects themselves.

get(key, default_value=None)#

Get element from SpatialData object based on corresponding name.

Parameters:
  • key (str) – The key to lookup in the spatial elements.

  • default_value (Union[DataArray, DataTree, GeoDataFrame, DataFrame, AnnData, None] (default: None)) – The default value (a SpatialElement or a table) to return if the key is not found. Default is None.

Return type:

DataArray | DataTree | GeoDataFrame | DataFrame | AnnData | None

Returns:

: The SpatialData element associated with the given key, if found. Otherwise, the default value is returned.

static get_annotated_regions(table)#

Get the regions annotated by a table.

Parameters:

table (AnnData) – The AnnData table for which to retrieve annotated regions.

Return type:

str | list[str]

Returns:

: The annotated regions.

get_attrs(key, return_as=None, sep='_', flatten=True)#

Retrieve a specific key from sdata.attrs and return it in the specified format.

Parameters:
  • key (str) – The key to retrieve from the attrs.

  • return_as (Optional[Literal['dict', 'json', 'df']] (default: None)) – The format in which to return the data. Options are ‘dict’, ‘json’, ‘df’. If None, the function returns the data in its original format.

  • sep (str (default: '_')) – Separator for nested keys in flattened data. Defaults to “_”.

  • flatten (bool (default: True)) – If True, flatten the data if it is a mapping. Defaults to True.

Return type:

dict[str, Any] | str | DataFrame

Returns:

: The data associated with the specified key, returned in the specified format. The format can be a dictionary, JSON string, or Pandas DataFrame, depending on the value of return_as.
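
To illustrate what flatten and sep mean for a nested mapping, here is a minimal standard-library sketch. The function flatten_mapping and the sample data are hypothetical; the exact keys produced by get_attrs may differ, so treat this only as a model of the idea of joining nested keys with a separator.

```python
from collections.abc import Mapping
from typing import Any


def flatten_mapping(data: Mapping[str, Any], sep: str = "_", prefix: str = "") -> dict[str, Any]:
    """Flatten nested mappings by joining the keys along each path with `sep`."""
    flat: dict[str, Any] = {}
    for key, value in data.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse into nested mappings, carrying the joined key as prefix.
            flat.update(flatten_mapping(value, sep=sep, prefix=full_key))
        else:
            flat[full_key] = value
    return flat


# Hypothetical attribute content.
nested = {"scanner": {"model": "X1", "settings": {"gain": 2}}, "operator": "anon"}
flat = flatten_mapping(nested)
dotted = flatten_mapping(nested, sep=".")
```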

static get_instance_key_column(table)#

Return the instance key column in table.obs containing for each row the instance id of that row.

Parameters:

table (AnnData) – The AnnData table.

Return type:

Series

Returns:

: The instance key column.

Raises:

KeyError – If the instance key column is not found in table.obs.

static get_region_key_column(table)#

Get the column of table.obs containing per row the region annotated by that row.

Parameters:

table (AnnData) – The AnnData table.

Return type:

Series

Returns:

: The region key column.

Raises:

KeyError – If the region key column is not found in table.obs.

classmethod init_from_elements(elements, tables=None, attrs=None)#

Create a SpatialData object from a dict of named elements and an optional table.

Parameters:
  • elements (dict[str, DataArray | DataTree | GeoDataFrame | DataFrame]) – A dict of named elements.

  • tables (Union[AnnData, dict[str, AnnData], None] (default: None)) – An optional table or dictionary of tables

  • attrs (Optional[Mapping[Any, Any]] (default: None)) – Additional attributes to store in the SpatialData object.

Return type:

SpatialData

Returns:

: The SpatialData object.

is_backed()#

Check if the data is backed by a Zarr storage or if it is in-memory.

Return type:

bool

is_self_contained(element_name=None)#

Check if an object is self-contained; self-contained objects have a simpler disk storage layout.

A SpatialData object is said to be self-contained if all its SpatialElements or AnnData tables are self-contained. A SpatialElement or AnnData table is said to be self-contained when it does not depend on a Dask computational graph (i.e. it is not “lazy”) or when it is Dask-backed and each file that is read in the Dask computational graph is contained within the Zarr store associated with the SpatialElement.

Currently, Points, Labels and Images are always represented lazily, while Shapes and Tables are always in-memory. Therefore, the latter are always self-contained.

Printing a SpatialData object will show if any of its elements are not self-contained.

Parameters:

element_name (Optional[str] (default: None)) – The name of the element to check. If None, the SpatialData object is checked instead.

Return type:

bool

Returns:

: A boolean value indicating whether the SpatialData object is self-contained.

Notes

Generally, it is preferred to work with self-contained SpatialData objects; working with non-self-contained SpatialData objects is possible but requires more care when performing IO operations:

  1. Non-self-contained elements depend on files outside the Zarr store associated with the SpatialData object. Therefore, changes to these external files (such as deletion) will be reflected in the SpatialData object.

  2. When calling write_element() and write_metadata(), the changes will be applied to the Zarr store associated with the SpatialData object, not to the external files.
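
The definition above reduces to a path check: a lazily loaded element is self-contained when every file read by its Dask graph lies inside the Zarr store. The sketch below models only that check with pure paths; the function name and the file paths are hypothetical, and extracting the file list from a Dask graph is left out.

```python
from pathlib import PurePosixPath


def files_within_store(store: str, backing_files: list[str]) -> bool:
    """Return True if every backing file lies inside the store directory.

    An empty list (an in-memory element) counts as self-contained.
    """
    store_path = PurePosixPath(store)
    return all(store_path in PurePosixPath(f).parents for f in backing_files)


# Hypothetical paths, for illustration only.
store = "/data/sample.zarr"
inside = ["/data/sample.zarr/points/transcripts/part.0.parquet"]
outside = ["/data/raw/transcripts.parquet"]

contained = files_within_store(store, inside)
not_contained = files_within_store(store, inside + outside)
```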

locate_element(element)#

Locate a SpatialElement within the SpatialData object and return its Zarr paths relative to the root.

Parameters:

element (DataArray | DataTree | GeoDataFrame | DataFrame) – The queried SpatialElement

Return type:

list[str]

Returns:

: A list of Zarr paths of the element relative to the root (multiple copies of the same element are allowed). The list is empty if the element is not present.

static read(file_path, selection=None)#

Read a SpatialData object from a Zarr storage (on-disk or remote).

Parameters:
  • file_path (Path | str) – The path or URL to the Zarr storage.

  • selection (Optional[tuple[str]] (default: None)) – The elements to read (images, labels, points, shapes, table). If None, all elements are read.

Return type:

SpatialData

Returns:

: The SpatialData object.

rename_coordinate_systems(rename_dict)#

Rename coordinate systems.

Parameters:

rename_dict (dict[str, str]) – A dictionary mapping old coordinate system names to new coordinate system names.

Return type:

None

Notes

The method does not allow renaming a coordinate system to the name of an existing one, unless the existing one is also renamed in the same call.
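
The rule in the note can be expressed as a small check. This is a sketch of the documented constraint only, not the library's code; check_rename and the coordinate system names are hypothetical.

```python
def check_rename(existing: set[str], rename_dict: dict[str, str]) -> None:
    """Raise ValueError if a new name collides with a name that is kept."""
    # Names that survive the call unchanged: everything not renamed away.
    kept = existing - set(rename_dict)
    for old, new in rename_dict.items():
        if new in kept:
            raise ValueError(
                f"cannot rename {old!r} to {new!r}: {new!r} already exists "
                "and is not renamed in the same call"
            )


existing = {"global", "aligned"}

# Swapping two names is allowed because both are renamed in the same call.
check_rename(existing, {"global": "aligned", "aligned": "global"})

# Renaming onto a name that is kept is rejected.
try:
    check_rename(existing, {"global": "aligned"})
    rejected = False
except ValueError:
    rejected = True
```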

set_channel_names(element_name, channel_names, write=False)#

Set the channel names for an image SpatialElement in the SpatialData object.

This method assumes that the SpatialData object and the element are already stored on disk, as it will also overwrite the channel names metadata on disk. If either the SpatialData object or the element is not stored on disk, please use SpatialData.set_image_channel_names instead.

Parameters:
  • element_name (str) – Name of the image SpatialElement.

  • channel_names (str | list[str]) – The channel names to be assigned to the c dimension of the image SpatialElement.

  • write (bool (default: False)) – Whether to overwrite the channel metadata on disk.

Return type:

None

set_table_annotates_spatialelement(table_name, region, region_key=None, instance_key=None)#

Set the SpatialElement annotation target of a given AnnData table.

Parameters:
  • table_name (str) – The name of the table to set the annotation target for.

  • region (str | Series | list[str]) – The name(s) of the target element(s) for the annotation. This can be a string, a list of strings, or a pandas Series object.

  • region_key (Optional[str] (default: None)) – The region key for the annotation. If not specified, defaults to None which means the currently set region key is reused.

  • instance_key (Optional[str] (default: None)) – The instance key for the annotation. If not specified, defaults to None which means the currently set instance key is reused.

Raises:
  • ValueError – If the annotation SpatialElement target is not present in the SpatialData object.

  • TypeError – If no current annotation metadata is found and both region_key and instance_key are not specified.

Return type:

None

Notes

Before calling this function, you may need to replace the values of the region_key column, or add a new region_key column. For example, by calling: sdata["table"].obs["region"] = "my_new_instances".

subset(element_names, filter_tables=True, include_orphan_tables=False)#

Subset the SpatialData object.

Parameters:
  • element_names (list[str]) – The names of the elements to include in the subset. If an element name is the name of a table, that table is included in full in the subset, even if filter_tables is True.

  • filter_tables (bool (default: True)) – If True (default), the tables are filtered to only contain rows that annotate regions contained within element_names.

  • include_orphan_tables (bool (default: False)) – If True (not default), include tables that do not annotate SpatialElement(s). Only has an effect if filter_tables is also set to True.

Return type:

SpatialData

Returns:

: The subsetted SpatialData object.
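
The effect of filter_tables on a table can be pictured as a row filter on the region column. The sketch below uses a list of dicts as a hypothetical stand-in for table.obs, with the column names region and instance_id chosen for illustration; it is not how the library filters AnnData tables.

```python
def rows_annotating(rows: list[dict], element_names: list[str], region_key: str = "region") -> list[dict]:
    """Keep only the rows whose region is one of the requested elements."""
    wanted = set(element_names)
    return [row for row in rows if row[region_key] in wanted]


# Hypothetical stand-in for table.obs: one row per annotated instance.
obs_rows = [
    {"region": "cells", "instance_id": 1},
    {"region": "cells", "instance_id": 2},
    {"region": "nuclei", "instance_id": 1},
]
kept_rows = rows_annotating(obs_rows, ["cells"])
```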

transform_element_to_coordinate_system(element_name, target_coordinate_system, maintain_positioning=False)#

Transform an element to a given coordinate system.

Parameters:
  • element_name (str) – The name of the element to transform.

  • target_coordinate_system (str) – The target coordinate system.

  • maintain_positioning (bool (default: False)) – Default False (most common use case). If True, the data will be transformed but a transformation will be added so that the positioning of the data in the target coordinate system will not change. If you want to align datasets to a common coordinate system you should use the default value.

Return type:

DataArray | DataTree | GeoDataFrame | DataFrame

Returns:

: The transformed element.

transform_to_coordinate_system(target_coordinate_system, maintain_positioning=False)#

Transform the SpatialData to a given coordinate system.

Parameters:
  • target_coordinate_system (str) – The target coordinate system.

  • maintain_positioning (bool (default: False)) – Default False (most common use case). If True, the data will be transformed but a transformation will be added so that the positioning of the data in the target coordinate system will not change. If you want to align datasets to a common coordinate system you should use the default value.

Return type:

SpatialData

Returns:

: The transformed SpatialData.

static update_annotated_regions_metadata(table, region_key=None)#

Update the annotation target of the table using the region_key column in table.obs.

The table must already contain annotation metadata, e.g. the region, region_key and instance_key must already be specified for the table. If this is not the case please use TableModel.parse instead and specify the annotation metadata by passing the correct arguments to that function.

Parameters:
  • table (AnnData) – The AnnData table for which to set the annotation target.

  • region_key (Optional[str] (default: None)) – The column in table.obs containing the rows specifying the SpatialElements being annotated. If None the current value for region_key in the annotation metadata of the table is used. If specified but different from the current region_key, the current region_key is overwritten.

Return type:

AnnData

Returns:

: The table for which the annotation target has been set.

validate_table_in_spatialdata(table)#

Validate the presence of the annotation target of a SpatialData table in the SpatialData object.

This method validates a table in the SpatialData object to ensure that, if annotation metadata is present, the annotation target (SpatialElement) is present in the SpatialData object and the dtypes of the instance key column in the table and the annotation target match. Otherwise, a warning is raised.

Parameters:

table (AnnData) – The table potentially annotating a SpatialElement

Raises:
  • UserWarning – If the table is annotating elements not present in the SpatialData object.

  • UserWarning – If the dtypes of the instance key column in the table and the annotation target do not match.

Return type:

None

write(file_path, overwrite=False, consolidate_metadata=True, format=None)#

Write the SpatialData object to a Zarr store.

Parameters:
  • file_path (str | Path) – The path to the Zarr store to write to.

  • overwrite (bool (default: False)) – If True, overwrite the Zarr store if it already exists. If False, write() will fail if the Zarr store already exists.

  • consolidate_metadata (bool (default: True)) – If True, triggers zarr.convenience.consolidate_metadata(), which writes all the metadata in a single file at the root directory of the store. This makes the data cloud accessible, which is required for certain cloud stores (such as S3).

  • format (Union[SpatialDataFormat, list[SpatialDataFormat], None] (default: None)) – The format to use for writing the elements of the SpatialData object. It is recommended to leave this parameter equal to None (which defaults to the latest format for all elements). If not None, it must be either a format for an element, or a list of formats. For example, it can be a subset of the following list: [RasterFormatVXX(), ShapesFormatVXX(), PointsFormatVXX(), TablesFormatVXX()] (XX denotes the version number and should be replaced with the respective version; the version numbers can differ across elements). By default, the latest format is used for all elements, i.e. CurrentRasterFormat, CurrentShapesFormat, CurrentPointsFormat, CurrentTablesFormat.

Return type:

None

write_channel_names(element_name=None)#

Write channel names to disk for a single image element, or for all image elements, without rewriting the data.

Parameters:

element_name (Optional[str] (default: None)) – The name of the element to write the channel names of. If None, write the channel names of all image elements.

Return type:

None

write_element(element_name, overwrite=False, format=None)#

Write a single element, or a list of elements, to the Zarr store used for backing.

The element must already be present in the SpatialData object.

Parameters:
  • element_name (str | list[str]) – The name(s) of the element(s) to write.

  • overwrite (bool (default: False)) – If True, overwrite the element if it already exists.

  • format (Union[SpatialDataFormat, list[SpatialDataFormat], None] (default: None)) – It is recommended to leave this parameter equal to None. See more details in the documentation of SpatialData.write().

Return type:

None

Notes

If you pass a list of names, the elements will be written one by one. If an error occurs during the writing of an element, the writing of the remaining elements will not be attempted.
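
If stopping at the first error is not what you want, one option is to process the names one at a time and collect the failures yourself. The helper apply_to_each below is hypothetical; in real use the action passed to it could be a call such as sdata.write_element with a single name, while the sketch uses a fake writer so that it runs without a Zarr store.

```python
from typing import Callable


def apply_to_each(names: list[str], action: Callable[[str], None]) -> dict[str, Exception]:
    """Call `action` for every name and return the failures by name."""
    failures: dict[str, Exception] = {}
    for name in names:
        try:
            action(name)
        except Exception as exc:
            # Record the error and continue with the remaining names.
            failures[name] = exc
    return failures


written: list[str] = []


def fake_write(name: str) -> None:
    """Stand-in for a per-element write that fails for one element."""
    if name == "labels_b":
        raise OSError(f"could not write {name}")
    written.append(name)


failures = apply_to_each(["image_a", "labels_b", "points_c"], fake_write)
```

Continuing past a failure is a trade-off: it writes as much as possible, but the caller must inspect the returned failures rather than rely on an exception.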

write_metadata(element_name=None, consolidate_metadata=None, write_attrs=True)#

Write the metadata of a single element, or of all elements, to the Zarr store, without rewriting the data.

Currently only the transformations and the consolidated metadata can be re-written without re-writing the data.

Future versions of SpatialData will support writing the following metadata without requiring a rewrite of the data:

  • .uns[‘spatialdata_attrs’] metadata for AnnData;

  • .attrs[‘spatialdata_attrs’] metadata for DaskDataFrame;

  • OMERO metadata for the channel name of images.

Parameters:
  • element_name (Optional[str] (default: None)) – The name of the element to write. If None, write the metadata of all elements.

  • consolidate_metadata (Optional[bool] (default: None)) – If True, consolidate the metadata to more easily support remote reading. By default (None), the consolidated metadata is written only if the metadata was already consolidated.

Return type:

None

Notes

When using the methods write() and write_element(), the metadata is written automatically.

write_transformations(element_name=None)#

Write transformations to disk for a single element, or for all elements, without rewriting the data.

Parameters:

element_name (Optional[str] (default: None)) – The name of the element to write. If None, write the transformations of all elements.

Return type:

None

property attrs: dict[Any, Any]#

Dictionary of global attributes on this SpatialData object.

Notes

Operations on SpatialData objects such as subset(), query(), …, will pass the .attrs by reference. If you want to modify the .attrs without affecting the original object, you should either use copy.deepcopy(sdata.attrs) or copy the whole SpatialData object using spatialdata.deepcopy().

property images: Images#

Return images as a Dict of name to image data.

property labels: Labels#

Return labels as a Dict of name to label data.

property path: Path | None#

Path to the Zarr storage.

property points: Points#

Return points as a Dict of name to point data.

property query: QueryManager#

An accessor to the query operations.

Examples

>>> sdata.query.bounding_box_query(...)
>>> sdata.query.polygon_query(...)

property shapes: Shapes#

Return shapes as a Dict of name to shape data.

property table: None | AnnData#

Return the table named “table” from tables, if it exists.

Returns:

The table.

property tables: Tables#

Return tables dictionary.

Returns:

dict[str, AnnData] Either the empty dictionary or a dictionary with the table names (strings) as keys and the AnnData tables themselves as values.