SpatialData object#
- class spatialdata.SpatialData(images=None, labels=None, points=None, shapes=None, tables=None, attrs=None)#
Bases:
object
The SpatialData object.
The SpatialData object is a modular container for arbitrary combinations of SpatialElements and annotation tables. The elements can be accessed separately and are stored as standard types (anndata.AnnData, geopandas.GeoDataFrame, xarray.DataArray).
The elements need to pass a validation step. To construct valid elements you can use the parsers that we provide: Image2DModel, Image3DModel, Labels2DModel, Labels3DModel, PointsModel, ShapesModel, TableModel.
- Parameters:
images (Optional[dict[str, DataArray | DataTree]] (default: None)) – Dict of 2D and 3D image elements. The following parsers are available: Image2DModel, Image3DModel.
labels (Optional[dict[str, DataArray | DataTree]] (default: None)) – Dict of 2D and 3D labels elements. Labels are regions; they can’t contain annotation, but they can be annotated by a table. The following parsers are available: Labels2DModel, Labels3DModel.
points (Optional[dict[str, DataFrame]] (default: None)) – Dict of points elements. Points can contain annotations. The following parser is available: PointsModel.
shapes (Optional[dict[str, GeoDataFrame]] (default: None)) – Dict of 2D shapes elements (circles, polygons, multipolygons). Shapes are regions; they can’t contain annotation, but they can be annotated by a table. The following parser is available: ShapesModel.
tables – AnnData tables containing annotations for regions (labels and shapes). The following parser is available: TableModel.
Notes
The SpatialElements are stored with standard types:
- images and labels are stored as xarray.DataArray or datatree.DataTree objects.
- points are stored as dask.dataframe.DataFrame objects.
- shapes are stored as geopandas.GeoDataFrame objects.
- tables are stored as anndata.AnnData objects, with the spatial coordinates stored in the obsm slot.
The table can annotate regions (shapes or labels) and can be used to store additional information. Points are not regions but 0-dimensional locations. They can’t be annotated by a table, but they can store annotation directly.
- add_image(name, image, storage_options=None, overwrite=False)#
Deprecated. Use sdata[name] = image instead.
- Return type:
None
- add_labels(name, labels, storage_options=None, overwrite=False)#
Deprecated. Use sdata[name] = labels instead.
- Return type:
None
- add_points(name, points, overwrite=False)#
Deprecated. Use sdata[name] = points instead.
- Return type:
None
- add_shapes(name, shapes, overwrite=False)#
Deprecated. Use sdata[name] = shapes instead.
- Return type:
None
- aggregate(values_sdata=None, values=None, by_sdata=None, by=None, value_key=None, agg_func='sum', target_coordinate_system='global', fractions=False, region_key='region', instance_key='instance_id', deepcopy=True, table_name='table', **kwargs)#
Aggregate values by given region.
- Return type:
Notes
This function calls spatialdata.aggregate() with the convenience that values and by can be strings, without having to specify values_sdata and by_sdata, which in that case will be replaced by self.
Please see spatialdata.aggregate() for the complete docstring.
- delete_element_from_disk(element_name)#
Delete an element, or list of elements, from the Zarr store associated with the SpatialData object.
The element must be available in-memory and will not be removed from the SpatialData object in-memory storage.
- Parameters:
element_name (str | list[str]) – The name(s) of the element(s) to delete.
- Return type:
None
Notes
If you pass a list of names, the elements will be deleted one by one. If an error occurs during the deletion of an element, the deletion of the remaining elements will not be attempted.
Important note on overwriting elements saved on disk. In general, it is not recommended to delete an element from the Zarr store with the intention of saving an updated version of the element that is available only in-memory. This is because data loss may occur if the execution is interrupted during writing.
Here are some recommendations:
- The above scenario may be acceptable when the element to save can be easily recreated from the data.
- If data recreation is not possible or is computationally expensive, it is recommended to first save the element to a different location and then copy it to the original desired location. Please note that this approach is not guaranteed to always be safe (e.g. if multiple processes are trying to write to the same Zarr store simultaneously, then the backup data may become corrupted).
Ultimately, it is the responsibility of the user to consider the implications of the current computational environment (e.g. operating system, local vs network storage, file permissions, …) and call this function appropriately (or implement a tailored solution), to prevent data loss.
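The "save to a different location first, then move into place" recommendation can be sketched with the standard library alone. This is a minimal illustration and not part of spatialdata: `safe_replace_dir` and its `write_fn` callback are hypothetical names, and in practice the callback would be whatever writes the updated element to the temporary location. It does not address the concurrent-writer caveat mentioned above.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def safe_replace_dir(target: Path, write_fn: Callable[[Path], None]) -> None:
    """Write new data next to `target`, then swap it in (hypothetical helper)."""
    target = Path(target)
    # Write the new version into a sibling temporary directory first, so the
    # original data stays untouched if writing is interrupted or fails.
    tmp = Path(tempfile.mkdtemp(prefix=target.name + ".tmp-", dir=target.parent))
    try:
        write_fn(tmp)
    except BaseException:
        shutil.rmtree(tmp, ignore_errors=True)
        raise
    # Only after a successful write, move the old data aside and swap.
    backup = target.with_name(target.name + ".bak")
    if target.exists():
        os.replace(target, backup)
    os.replace(tmp, target)
    shutil.rmtree(backup, ignore_errors=True)
```

If `write_fn` raises, the original directory is left as it was and the temporary directory is removed.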
- elements_are_self_contained()#
Describe if elements are self-contained as a dict of element_name to bool.
- Return type:
dict[str, bool]
- Returns:
A dictionary of element_name to boolean values indicating whether the elements are self-contained.
Notes
Please see spatialdata.SpatialData.is_self_contained() for more information on the semantics of self-contained elements.
- elements_paths_in_memory()#
Get the paths of the elements in the SpatialData object.
- Return type:
list[str]
- Returns:
A list of paths of the elements in the SpatialData object.
Notes
The paths are relative to the root of the SpatialData object and are in the format “element_type/element_name”.
- elements_paths_on_disk()#
Get the paths of the elements saved in the Zarr store.
- Return type:
list[str]
- Returns:
A list of paths of the elements saved in the Zarr store.
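Comparing the two path lists is one way to see which elements exist only in memory or only on disk. The sketch below uses plain lists as stand-ins for the return values of elements_paths_in_memory() and elements_paths_on_disk(); the element names are made up, and it assumes the on-disk paths use the same “element_type/element_name” format as the in-memory ones.

```python
# Stand-ins for sdata.elements_paths_in_memory() and
# sdata.elements_paths_on_disk(); the element names are hypothetical.
paths_in_memory = ["images/blobs_image", "labels/blobs_labels", "shapes/blobs_circles"]
paths_on_disk = ["images/blobs_image", "shapes/blobs_circles", "points/old_points"]

# Elements present in memory but not yet written to the Zarr store.
only_in_memory = sorted(set(paths_in_memory) - set(paths_on_disk))
# Elements saved in the Zarr store but not present in memory.
only_on_disk = sorted(set(paths_on_disk) - set(paths_in_memory))

# Each path splits into its element type and element name.
element_type, element_name = only_in_memory[0].split("/", 1)
```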
- filter_by_coordinate_system(coordinate_system, filter_tables=True, include_orphan_tables=False)#
Filter the SpatialData by one coordinate system or by a list of coordinate systems.
This returns a SpatialData object with the elements containing a transformation mapping to the specified coordinate system(s).
- Parameters:
coordinate_system (str | list[str]) – The coordinate system(s) to filter by.
filter_tables (bool (default: True)) – If True (default), the tables will be filtered to only contain regions of an element belonging to the specified coordinate system(s).
include_orphan_tables (bool (default: False)) – If True (not default), include tables that do not annotate SpatialElement(s). Only has an effect if filter_tables is also set to True.
- Return type:
SpatialData
- Returns:
The filtered SpatialData.
- static from_elements_dict(elements_dict, attrs=None)#
Create a SpatialData object from a dict of elements.
- Parameters:
elements_dict (dict[str, DataArray | DataTree | GeoDataFrame | DataFrame | AnnData]) – Dict of elements. The keys are the names of the elements and the values are the elements. At most one table can be present in the dict; its name is not used and can be anything.
attrs (Optional[Mapping[Any, Any]] (default: None)) – Additional attributes to store in the SpatialData object.
- Return type:
SpatialData
- Returns:
The SpatialData object.
- gen_elements()#
Generate elements within the SpatialData object.
This method generates elements in the SpatialData object (images, labels, points, shapes and tables).
- Return type:
Generator[tuple[str, str, DataArray | DataTree | GeoDataFrame | DataFrame | AnnData], None, None]
- Returns:
A generator that yields tuples containing the element_type (string), name, and element objects themselves.
- gen_spatial_elements()#
Generate spatial elements within the SpatialData object.
This method generates spatial elements (images, labels, points and shapes).
- Return type:
Generator[tuple[str, str, DataArray | DataTree | GeoDataFrame | DataFrame], None, None]
- Returns:
A generator that yields tuples containing the element_type (string), name, and SpatialElement objects themselves.
- get(key, default_value=None)#
Get element from SpatialData object based on corresponding name.
- Parameters:
key (str) – The key to look up in the spatial elements.
default_value (Union[DataArray, DataTree, GeoDataFrame, DataFrame, AnnData, None] (default: None)) – The default value (a SpatialElement or a table) to return if the key is not found. Default is None.
- Return type:
DataArray | DataTree | GeoDataFrame | DataFrame | AnnData | None
- Returns:
The SpatialData element associated with the given key, if found. Otherwise, the default value is returned.
- static get_annotated_regions(table)#
Get the regions annotated by a table.
- Parameters:
table (AnnData) – The AnnData table for which to retrieve annotated regions.
- Return type:
str | list[str]
- Returns:
The annotated regions.
- get_attrs(key, return_as=None, sep='_', flatten=True)#
Retrieve a specific key from sdata.attrs and return it in the specified format.
- Parameters:
key (str) – The key to retrieve from the attrs.
return_as (Optional[Literal['dict', 'json', 'df']] (default: None)) – The format in which to return the data. Options are ‘dict’, ‘json’, ‘df’. If None, the function returns the data in its original format.
sep (str (default: '_')) – Separator for nested keys in flattened data. Defaults to “_”.
flatten (bool (default: True)) – If True, flatten the data if it is a mapping. Defaults to True.
- Return type:
dict[str, Any] | str | DataFrame
- Returns:
The data associated with the specified key, returned in the specified format. The format can be a dictionary, JSON string, or Pandas DataFrame, depending on the value of return_as.
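To make the roles of sep and flatten concrete, here is a minimal sketch of what flattening a nested mapping with a separator can look like. The `flatten_mapping` helper and the sample attrs are hypothetical and written for illustration only; the exact key naming produced by get_attrs() is an assumption and may differ.

```python
from collections.abc import Mapping
from typing import Any


def flatten_mapping(data: Mapping[str, Any], sep: str = "_", parent: str = "") -> dict[str, Any]:
    """Flatten nested mappings, joining nested keys with `sep` (illustrative helper)."""
    flat: dict[str, Any] = {}
    for key, value in data.items():
        full_key = f"{parent}{sep}{key}" if parent else str(key)
        if isinstance(value, Mapping):
            # Recurse into nested mappings, carrying the joined key as prefix.
            flat.update(flatten_mapping(value, sep=sep, parent=full_key))
        else:
            flat[full_key] = value
    return flat


# Hypothetical nested attributes stored under a single key of sdata.attrs.
nested = {"scanner": {"model": "X1", "settings": {"gain": 2}}, "run": 7}
flat_default = flatten_mapping(nested)          # keys joined with "_"
flat_dotted = flatten_mapping(nested, sep=".")  # keys joined with "."
```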
- static get_instance_key_column(table)#
Return the instance key column in table.obs containing for each row the instance id of that row.
- Parameters:
table (AnnData) – The AnnData table.
- Return type:
Series
- Returns:
The instance key column.
- Raises:
KeyError – If the instance key column is not found in table.obs.
- static get_region_key_column(table)#
Get the column of table.obs containing per row the region annotated by that row.
- Parameters:
table (AnnData) – The AnnData table.
- Return type:
Series
- Returns:
The region key column.
- Raises:
KeyError – If the region key column is not found in table.obs.
- classmethod init_from_elements(elements, tables=None, attrs=None)#
Create a SpatialData object from a dict of named elements and an optional table.
- Parameters:
elements (dict[str, DataArray | DataTree | GeoDataFrame | DataFrame]) – A dict of named elements.
tables (Union[AnnData, dict[str, AnnData], None] (default: None)) – An optional table or dictionary of tables.
attrs (Optional[Mapping[Any, Any]] (default: None)) – Additional attributes to store in the SpatialData object.
- Return type:
SpatialData
- Returns:
The SpatialData object.
- is_backed()#
Check if the data is backed by a Zarr storage or if it is in-memory.
- Return type:
bool
- is_self_contained(element_name=None)#
Check if an object is self-contained; self-contained objects have a simpler disk storage layout.
A SpatialData object is said to be self-contained if all its SpatialElements or AnnData tables are self-contained. A SpatialElement or AnnData table is said to be self-contained when it does not depend on a Dask computational graph (i.e. it is not “lazy”) or when it is Dask-backed and each file that is read in the Dask computational graph is contained within the Zarr store associated with the SpatialElement.
Currently, Points, Labels and Images are always represented lazily, while Shapes and Tables are always in-memory. Therefore, the latter are always self-contained.
Printing a SpatialData object will show if any of its elements are not self-contained.
- Parameters:
element_name (Optional[str] (default: None)) – The name of the element to check. If None, the SpatialData object is checked instead.
- Return type:
bool
- Returns:
A boolean value indicating whether the SpatialData object, or the named element, is self-contained.
Notes
Generally, it is preferred to work with self-contained SpatialData objects; working with non-self-contained SpatialData objects is possible but requires more care when performing IO operations:
- Non-self-contained elements depend on files outside the Zarr store associated with the SpatialData object. Therefore, changes to these external files (such as deletion) will be reflected in the SpatialData object.
- When calling write_element() and write_metadata(), the changes will be applied to the Zarr store associated with the SpatialData object, not to the external files.
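The relationship between the object-level check and the per-element check follows from the definition above: the object is self-contained only if every element is. A minimal sketch, using a plain dict as a stand-in for the return value of elements_are_self_contained(); the element names and values are made up.

```python
# Stand-in for the dict returned by sdata.elements_are_self_contained();
# the element names and values are hypothetical.
per_element = {"blobs_image": True, "blobs_points": False, "blobs_circles": True}

# The object as a whole is self-contained only if every element is.
object_is_self_contained = all(per_element.values())

# Names of the elements that would need extra care during IO operations.
needs_attention = [name for name, ok in per_element.items() if not ok]
```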
- locate_element(element)#
Locate a SpatialElement within the SpatialData object and return its Zarr paths relative to the root.
- Parameters:
element (DataArray | DataTree | GeoDataFrame | DataFrame) – The queried SpatialElement.
- Return type:
list[str]
- Returns:
A list of Zarr paths of the element relative to the root (multiple copies of the same element are allowed). The list is empty if the element is not present.
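The "multiple copies" and "empty list" cases can be illustrated with plain Python containers. This sketch assumes that the lookup is by object identity rather than by value, which the text above does not state explicitly; `locate` and the nested dict are hypothetical stand-ins, not the library's implementation.

```python
from typing import Any


def locate(element: Any, elements_by_type: dict[str, dict[str, Any]]) -> list[str]:
    """Return "element_type/name" paths whose stored object is `element` (illustrative)."""
    return [
        f"{element_type}/{name}"
        for element_type, named in elements_by_type.items()
        for name, candidate in named.items()
        if candidate is element  # identity check: an assumption of this sketch
    ]


circles = ["stand-in for a GeoDataFrame"]
store = {
    "shapes": {"cells": circles, "cells_copy": circles},  # same object stored twice
    "points": {"spots": ["another object"]},
}

found = locate(circles, store)
missing = locate(["stand-in for a GeoDataFrame"], store)  # equal value, different object
```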
- static read(file_path, selection=None)#
Read a SpatialData object from a Zarr storage (on-disk or remote).
- Parameters:
file_path (Path | str) – The path or URL to the Zarr storage.
selection (Optional[tuple[str]] (default: None)) – The elements to read (images, labels, points, shapes, table). If None, all elements are read.
- Return type:
SpatialData
- Returns:
The SpatialData object.
- rename_coordinate_systems(rename_dict)#
Rename coordinate systems.
- Parameters:
rename_dict (dict[str, str]) – A dictionary mapping old coordinate system names to new coordinate system names.
- Return type:
None
Notes
A coordinate system cannot be renamed to the name of an existing one, unless the existing one is also renamed in the same call.
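The renaming rule in the note can be sketched as a small check over sets of names. `renamed_systems` is a hypothetical helper written for illustration; raising ValueError, ignoring names that do not exist, and rejecting two systems mapped to the same new name are assumptions of this sketch rather than documented behaviour.

```python
def renamed_systems(existing: set[str], rename_dict: dict[str, str]) -> set[str]:
    """Return the coordinate system names after renaming, or raise ValueError."""
    # Only names that actually exist are renamed (an assumption of this sketch).
    applicable = {old: new for old, new in rename_dict.items() if old in existing}
    untouched = existing - set(applicable)
    new_names = list(applicable.values())
    if len(set(new_names)) != len(new_names):
        raise ValueError("two coordinate systems would get the same name")
    # A new name may reuse an old one only if that old one is renamed too.
    clash = untouched & set(new_names)
    if clash:
        raise ValueError(f"target name(s) already in use: {sorted(clash)}")
    return untouched | set(new_names)


# Allowed: "aligned" is itself renamed in the same call.
ok = renamed_systems({"global", "aligned"}, {"global": "aligned", "aligned": "raw"})
```

Calling `renamed_systems({"global", "aligned"}, {"global": "aligned"})` raises, because "aligned" already exists and is not renamed in the same call.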
- set_channel_names(element_name, channel_names, write=False)#
Set the channel names for an image SpatialElement in the SpatialData object.
This method assumes that the SpatialData object and the element are already stored on disk, as it will also overwrite the channel names metadata on disk. In case either the SpatialData object or the element are not stored on disk, please use SpatialData.set_image_channel_names instead.
- Parameters:
element_name (str) – Name of the image SpatialElement.
channel_names (str | list[str]) – The channel names to be assigned to the c dimension of the image SpatialElement.
write (bool (default: False)) – Whether to overwrite the channel metadata on disk.
- Return type:
None
- set_table_annotates_spatialelement(table_name, region, region_key=None, instance_key=None)#
Set the SpatialElement annotation target of a given AnnData table.
- Parameters:
table_name (str) – The name of the table to set the annotation target for.
region (str | Series | list[str]) – The name of the target element for the annotation. This can be a string, a list of strings, or a pandas Series object.
region_key (Optional[str] (default: None)) – The region key for the annotation. If not specified, defaults to None, which means the currently set region key is reused.
instance_key (Optional[str] (default: None)) – The instance key for the annotation. If not specified, defaults to None, which means the currently set instance key is reused.
- Raises:
ValueError – If the annotation SpatialElement target is not present in the SpatialData object.
TypeError – If no current annotation metadata is found and both region_key and instance_key are not specified.
- Return type:
None
Notes
Before calling this function, you may need to replace the values of the region_key column, or add a new region_key column, for example by calling sdata["table"].obs["region"] = "my_new_instances".
- subset(element_names, filter_tables=True, include_orphan_tables=False)#
Subset the SpatialData object.
- Parameters:
element_names (list[str]) – The names of the elements to subset. If an element name is the name of a table, this table is completely included in the subset even if filter_tables is True.
filter_tables (bool (default: True)) – If True (default), the table is filtered to only contain rows that are annotating regions contained within the element_names.
include_orphan_tables (bool (default: False)) – If True (not default), include tables that do not annotate SpatialElement(s). Only has an effect if filter_tables is also set to True.
- Return type:
SpatialData
- Returns:
The subsetted SpatialData object.
- transform_element_to_coordinate_system(element_name, target_coordinate_system, maintain_positioning=False)#
Transform an element to a given coordinate system.
- Parameters:
element_name (str) – The name of the element to transform.
target_coordinate_system (str) – The target coordinate system.
maintain_positioning (bool (default: False)) – Default False (most common use case). If True, the data will be transformed but a transformation will be added so that the positioning of the data in the target coordinate system will not change. If you want to align datasets to a common coordinate system you should use the default value.
- Return type:
DataArray | DataTree | GeoDataFrame | DataFrame
- Returns:
The transformed element.
- transform_to_coordinate_system(target_coordinate_system, maintain_positioning=False)#
Transform the SpatialData to a given coordinate system.
- Parameters:
target_coordinate_system (str) – The target coordinate system.
maintain_positioning (bool (default: False)) – Default False (most common use case). If True, the data will be transformed but a transformation will be added so that the positioning of the data in the target coordinate system will not change. If you want to align datasets to a common coordinate system you should use the default value.
- Return type:
SpatialData
- Returns:
The transformed SpatialData.
- static update_annotated_regions_metadata(table, region_key=None)#
Update the annotation target of the table using the region_key column in table.obs.
The table must already contain annotation metadata, i.e. the region, region_key and instance_key must already be specified for the table. If this is not the case, please use TableModel.parse instead and specify the annotation metadata by passing the correct arguments to that function.
- Parameters:
table (AnnData) – The AnnData table for which to set the annotation target.
region_key (Optional[str] (default: None)) – The column in table.obs containing the rows specifying the SpatialElements being annotated. If None, the current value for region_key in the annotation metadata of the table is used. If specified but different from the current region_key, the current region_key is overwritten.
- Return type:
AnnData
- Returns:
The table for which the annotation target has been set.
- validate_table_in_spatialdata(table)#
Validate the presence of the annotation target of a SpatialData table in the SpatialData object.
This method validates a table in the SpatialData object to ensure that, if annotation metadata is present, the annotation target (SpatialElement) is present in the SpatialData object and the dtypes of the instance key column in the table and of the annotation target match. Otherwise, a warning is raised.
- Parameters:
table (AnnData) – The table potentially annotating a SpatialElement.
- Raises:
UserWarning – If the table is annotating elements not present in the SpatialData object.
UserWarning – If the dtypes of the instance key column in the table and the annotation target do not match.
- Return type:
None
- write(file_path, overwrite=False, consolidate_metadata=True, format=None)#
Write the SpatialData object to a Zarr store.
- Parameters:
file_path (str | Path) – The path to the Zarr store to write to.
overwrite (bool (default: False)) – If True, overwrite the Zarr store if it already exists. If False, write() will fail if the Zarr store already exists.
consolidate_metadata (bool (default: True)) – If True, triggers zarr.convenience.consolidate_metadata(), which writes all the metadata in a single file at the root directory of the store. This makes the data cloud accessible, which is required for certain cloud stores (such as S3).
format (Union[SpatialDataFormat, list[SpatialDataFormat], None] (default: None)) – The format to use for writing the elements of the SpatialData object. It is recommended to leave this parameter equal to None (default to latest format for all the elements). If not None, it must be either a format for an element, or a list of formats. For example, it can be a subset of the following list: [RasterFormatVXX(), ShapesFormatVXX(), PointsFormatVXX(), TablesFormatVXX()] (XX denotes the version number and should be replaced with the respective version; the version numbers can differ across elements). By default, the latest format is used for all elements, i.e. CurrentRasterFormat, CurrentShapesFormat, CurrentPointsFormat, CurrentTablesFormat.
- Return type:
None
- write_channel_names(element_name=None)#
Write channel names to disk for a single image element, or for all image elements, without rewriting the data.
- Parameters:
element_name (Optional[str] (default: None)) – The name of the element to write the channel names of. If None, write the channel names of all image elements.
- Return type:
None
- write_element(element_name, overwrite=False, format=None)#
Write a single element, or a list of elements, to the Zarr store used for backing.
The element must already be present in the SpatialData object.
- Parameters:
element_name (str | list[str]) – The name(s) of the element(s) to write.
overwrite (bool (default: False)) – If True, overwrite the element if it already exists.
format (Union[SpatialDataFormat, list[SpatialDataFormat], None] (default: None)) – It is recommended to leave this parameter equal to None. See more details in the documentation of SpatialData.write().
- Return type:
None
Notes
If you pass a list of names, the elements will be written one by one. If an error occurs during the writing of an element, the writing of the remaining elements will not be attempted.
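Because a list call stops at the first error, a caller who wants every element attempted can loop over the names and collect failures instead. The sketch below is a generic pattern, not part of spatialdata: `write_each` is a hypothetical helper, and `write_one` stands in for a single-element call such as sdata.write_element(name). The same pattern applies to delete_element_from_disk().

```python
from typing import Callable


def write_each(names: list[str], write_one: Callable[[str], None]) -> dict[str, Exception]:
    """Attempt every name; return the exceptions of the ones that failed."""
    failures: dict[str, Exception] = {}
    for name in names:
        try:
            write_one(name)
        except Exception as exc:  # keep going, but remember what failed
            failures[name] = exc
    return failures


# Simulated single-element writer used only to exercise the helper.
written: list[str] = []


def fake_write(name: str) -> None:
    if name == "broken_points":
        raise OSError("simulated write failure")
    written.append(name)


failures = write_each(["image", "broken_points", "cells"], fake_write)
```

Whether continuing after a failure is appropriate depends on the situation; stopping early, as the list call does, is the more conservative choice.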
- write_metadata(element_name=None, consolidate_metadata=None, write_attrs=True)#
Write the metadata of a single element, or of all elements, to the Zarr store, without rewriting the data.
Currently only the transformations and the consolidated metadata can be re-written without re-writing the data.
Future versions of SpatialData will support writing the following metadata without requiring a rewrite of the data:
.uns[‘spatialdata_attrs’] metadata for AnnData;
.attrs[‘spatialdata_attrs’] metadata for DaskDataFrame;
OMERO metadata for the channel name of images.
- Parameters:
element_name (Optional[str] (default: None)) – The name of the element to write. If None, write the metadata of all elements.
consolidate_metadata (Optional[bool] (default: None)) – If True, consolidate the metadata to more easily support remote reading. By default, the consolidated metadata is written only if the metadata was already consolidated.
- Return type:
None
Notes
When using the methods write() and write_element(), the metadata is written automatically.
- write_transformations(element_name=None)#
Write transformations to disk for a single element, or for all elements, without rewriting the data.
- Parameters:
element_name (Optional[str] (default: None)) – The name of the element to write. If None, write the transformations of all elements.
- Return type:
None
- property attrs: dict[Any, Any]#
Dictionary of global attributes on this SpatialData object.
Notes
Operations on SpatialData objects such as subset(), query(), …, will pass the .attrs by reference. If you want to modify the .attrs without affecting the original object, you should either use copy.deepcopy(sdata.attrs) or copy the SpatialData object using spatialdata.deepcopy().
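The difference between a by-reference hand-off and a deep copy can be seen with plain dictionaries. In this sketch, `original_attrs` is a made-up stand-in for sdata.attrs and `shared` mimics what an operation that passes .attrs by reference hands to the resulting object.

```python
import copy

# Stand-in for sdata.attrs; the content is hypothetical.
original_attrs = {"experiment": {"id": 1}}

# What a by-reference hand-off amounts to: both names point at the same dict.
shared = original_attrs
# An independent copy, including the nested dictionary.
independent = copy.deepcopy(original_attrs)

shared["experiment"]["id"] = 2       # also changes original_attrs
independent["experiment"]["id"] = 3  # leaves original_attrs untouched
```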
- property images: Images#
Return images as a Dict of name to image data.
- property labels: Labels#
Return labels as a Dict of name to label data.
- property path: Path | None#
Path to the Zarr storage.
- property points: Points#
Return points as a Dict of name to point data.
- property query: QueryManager#
An accessor to the query operations.
Examples
>>> sdata.query.bounding_box_query(...)
>>> sdata.query.polygon_query(...)
- property shapes: Shapes#
Return shapes as a Dict of name to shape data.
- property table: None | AnnData#
Return table with name table from tables if it exists.
- Returns:
The table.
- property tables: Tables#
Return tables dictionary.
- Returns:
dict[str, AnnData]
Either the empty dictionary or a dictionary whose keys are the strings representing the table names and whose values are the AnnData tables themselves.