Operations

Operations on SpatialData objects.

spatialdata.bounding_box_query(element, axes, min_coordinate, max_coordinate, target_coordinate_system, return_request_only=False, filter_table=True, **kwargs)

spatialdata.bounding_box_query(sdata, axes, min_coordinate, max_coordinate, target_coordinate_system, filter_table=True)

spatialdata.bounding_box_query(image, axes, min_coordinate, max_coordinate, target_coordinate_system, return_request_only=False)

spatialdata.bounding_box_query(points, axes, min_coordinate, max_coordinate, target_coordinate_system)

spatialdata.bounding_box_query(polygons, axes, min_coordinate, max_coordinate, target_coordinate_system)

Query a SpatialData object or SpatialElement within a bounding box.

This function can also be accessed as a method of a SpatialData object, via sdata.query.bounding_box(...), without specifying element.

Parameters:

element (DataArray | DataTree | GeoDataFrame | DataFrame | SpatialData) – The SpatialElement or SpatialData object to query.
axes (tuple[str, ...]) – The axes min_coordinate and max_coordinate refer to.
min_coordinate (list[int | float] | ndarray[tuple[Any, ...], dtype[floating[Any]]]) – The upper left hand corners of the bounding boxes (i.e., minimum coordinates along all dimensions). Shape: (n_boxes, n_axes) or (n_axes,) for a single box.
max_coordinate (list[int | float] | ndarray[tuple[Any, ...], dtype[floating[Any]]]) – The lower right hand corners of the bounding boxes (i.e., the maximum coordinates along all dimensions). Shape: (n_boxes, n_axes)
target_coordinate_system (str) – The coordinate system the bounding box is defined in.
filter_table (bool (default: True)) – If True, the table is filtered to only contain rows that are annotating regions contained within the bounding box.
return_request_only (bool (default: False)) – If True, the function returns the bounding box coordinates in the target coordinate system. Only valid with DataArray and DataTree elements.

Return type:

Returns:

The SpatialData object or SpatialElement containing the requested data. Any empty elements are omitted from the returned SpatialData object.

Notes

If the object contains a points element, the query may suffer from performance issues, depending on the number of points. Consider filtering the object before calling this function by calling the subset() method of SpatialData.
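As a rough illustration of the selection rule (not the library's implementation), the pure-Python sketch below keeps the points whose coordinates lie between min_coordinate and max_coordinate along every axis. The helper name points_in_box and the sample points are hypothetical, and treating the bounds as inclusive is an assumption made for this sketch.

```python
def points_in_box(points, min_coordinate, max_coordinate):
    """Return the points inside the axis-aligned box (bounds assumed inclusive)."""
    return [
        p
        for p in points
        if all(lo <= c <= hi for c, lo, hi in zip(p, min_coordinate, max_coordinate))
    ]


# Hypothetical 2D points, with coordinates given in the order axes=("x", "y").
points = [(0.5, 0.5), (2.0, 3.0), (4.0, 1.0), (9.0, 9.0)]
selected = points_in_box(points, min_coordinate=[0, 0], max_coordinate=[5, 2])
print(selected)
```

With the library itself, the corresponding call would pass the same axes, min_coordinate and max_coordinate, together with a target_coordinate_system, to bounding_box_query() or to sdata.query.bounding_box(...).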

spatialdata.polygon_query(element, polygon, target_coordinate_system, filter_table=True, clip=False)

spatialdata.polygon_query(sdata, polygon, target_coordinate_system, filter_table=True, clip=False)

spatialdata.polygon_query(image, polygon, target_coordinate_system, return_request_only=False, **kwargs)

spatialdata.polygon_query(points, polygon, target_coordinate_system, **kwargs)

spatialdata.polygon_query(element, polygon, target_coordinate_system, clip=False, **kwargs)

Query a SpatialData object or a SpatialElement by a polygon or multipolygon.

This function can also be accessed as a method of a SpatialData object, via sdata.query.polygon(...), without specifying element.

Parameters:

element (DataArray | DataTree | GeoDataFrame | DataFrame | SpatialData) – The SpatialElement or SpatialData object to query.
polygon (Polygon | MultiPolygon) – The polygon/multipolygon to query by.
target_coordinate_system (str) – The coordinate system of the polygon/multipolygon.
filter_table (bool (default: True)) – Specifies whether to filter the tables to only include tables that annotate elements in the retrieved SpatialData object of the query.
clip (bool (default: False)) – If True, the shapes are clipped to the polygon. This behavior is implemented only when querying polygons/multipolygons or circles, and it is ignored for other types of elements (images, labels, points). Importantly, when clipping is enabled, the circles will be converted to polygons before the clipping. This may affect downstream operations that rely on the circle radius or on performance, so it is recommended to disable clipping when querying circles or when querying a SpatialData object that contains circles.

Return type:

Returns:

The queried SpatialData object or SpatialElement containing the requested data. Any empty elements are omitted from the returned SpatialData object.

Examples

A common use case involves multipolygons. If you have a sequence of polygons/multipolygons, in particular a GeoDataFrame, and you want to query the data that belongs to any one of these shapes, you can call this function on the multipolygon obtained by merging all the polygons, for instance with a unary union.
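The merging step can be sketched with shapely as follows. The two boxes are hypothetical stand-ins for the geometries of a GeoDataFrame (for a GeoDataFrame gdf, the same call would be unary_union(gdf.geometry)); the final query is shown only as a comment because it assumes an existing SpatialData object named sdata with a "global" coordinate system.

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Two hypothetical, non-overlapping regions of interest.
regions = [box(0, 0, 1, 1), box(2, 2, 3, 3)]

# Merge them into a single geometry; disjoint polygons yield a MultiPolygon.
merged = unary_union(regions)
print(merged.geom_type, merged.area)

# The merged geometry can then be used as the `polygon` argument, e.g.
# (assuming an existing SpatialData object `sdata`):
# queried = spatialdata.polygon_query(sdata, polygon=merged, target_coordinate_system="global")
```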

spatialdata.get_values(value_key, element=None, sdata=None, element_name=None, table_name=None, table_layer=None, return_obsm_as_is=False)

Get the values from the element, from any location: df columns, obs or var columns (table).

Parameters:

value_key (str | list[str]) – Name of the column or channel to get the values from.
element (DataArray | DataTree | GeoDataFrame | DataFrame | AnnData | None (default: None)) – SpatialElement object or AnnData table; either element or (sdata, element_name) must be provided
sdata (SpatialData | None (default: None)) – SpatialData object; either element or (sdata, element_name) must be provided
element_name (str | None (default: None)) – Name of the element; either element or (sdata, element_name) must be provided. In case of element being an AnnData table, element_name can also be provided to subset the AnnData table to only include those rows annotating the element_name.
table_name (str | None (default: None)) – Name of the table to get the values from.
table_layer (str | None (default: None)) – Layer of the table to get the values from. If None, the values are taken from X.
return_obsm_as_is (bool (default: False)) – Applies when the value is stored in obsm. If True, the value of the key is returned as is; otherwise a dataframe is created and returned.

Return type:

DataFrame | ndarray[tuple[Any, ...], dtype[floating[Any]]]

Returns:

DataFrame with the values requested.

Notes

The index of the returned dataframe is the instance_key of the table for the specified element.
If the element is a labels element, the background (0), if present, is not included in the dataframe of returned values.

spatialdata.get_element_instances(element, return_background=False)

spatialdata.get_element_instances(element)

Get the instances (index values) of the SpatialElement.

Parameters:

element (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement.
return_background (bool (default: False)) – If True, the background label (0) is included in the output.

Return type:

Index

Returns:

pd.Index with the instances (index values) of the SpatialElement.

spatialdata.get_extent(e, coordinate_system='global', exact=True, has_images=True, has_labels=True, has_points=True, has_shapes=True, elements=None)

spatialdata.get_extent(e, coordinate_system='global', exact=True)

spatialdata.get_extent(e, coordinate_system='global')

Get the extent (bounding box) of a SpatialData object or a SpatialElement.

Parameters:

e (SpatialData | DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialData object or SpatialElement to compute the extent of.
exact (bool (default: True)) – Whether the extent is computed exactly. If True, the extent is computed exactly. If False, an approximation that is faster to compute is returned. The approximation is guaranteed to contain all the data; see the Notes for details.
has_images (bool (default: True)) – If True, images are included in the computation of the extent.
has_labels (bool (default: True)) – If True, labels are included in the computation of the extent.
has_points (bool (default: True)) – If True, points are included in the computation of the extent.
has_shapes (bool (default: True)) – If True, shapes are included in the computation of the extent.
elements (default: None) – If not None, only the elements with the given names are included in the computation of the extent.

Return type:

dict[str, tuple[float, float]]

Returns:

The bounding box description.

min_coordinate – The minimum coordinate of the bounding box.
max_coordinate – The maximum coordinate of the bounding box.
axes – The names of the dimensions of the bounding box.

Notes

The extent of a SpatialData object is the extent of the union of the extents of all its elements. The extent of a SpatialElement is the extent of the element in the coordinate system specified by the argument coordinate_system.

If exact is False, the extent of the SpatialElement is first computed before any transformation is applied; this extent is then transformed to the target coordinate system. This is faster than computing the extent after the transformation, since the transformation is applied only to the extent of the untransformed data, as opposed to transforming all the data and then computing the extent.

The exact and approximate extent are the same if the transformation does not contain any rotation or shear, or in the case in which the transformation is affine but all the corners of the extent of the untransformed data (bounding box corners) are part of the dataset itself. Note that this is always the case for raster data.

An extreme case is a dataset composed of the two points (0, 0) and (1, 1), rotated anticlockwise by 45 degrees. The exact extent is the bounding box [minx, miny, maxx, maxy] = [0, 0, 0, 1.414], while the approximate extent is the box [minx, miny, maxx, maxy] = [-0.707, 0, 0.707, 1.414].
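The numbers in this example can be reproduced with a few lines of plain Python. This is only a worked illustration of the two strategies described above, not the library's code; the helper names rotate and extent are hypothetical.

```python
import math


def rotate(point, degrees):
    """Rotate a 2D point anticlockwise around the origin."""
    t = math.radians(degrees)
    x, y = point
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))


def extent(points):
    """Return [minx, miny, maxx, maxy] for a list of 2D points."""
    xs, ys = zip(*points)
    return [min(xs), min(ys), max(xs), max(ys)]


data = [(0, 0), (1, 1)]

# Exact: transform the data, then take the bounding box.
exact = extent([rotate(p, 45) for p in data])

# Approximate: take the bounding box first, then transform its four corners.
minx, miny, maxx, maxy = extent(data)
corners = [(minx, miny), (maxx, miny), (minx, maxy), (maxx, maxy)]
approx = extent([rotate(c, 45) for c in corners])

print([round(v, 3) for v in exact])
print([round(v, 3) for v in approx])
```

The approximate box is wider along x because the corners (1, 0) and (0, 1) of the untransformed bounding box are not part of the dataset, yet they are rotated along with the corners that are.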

spatialdata.get_centroids(e, coordinate_system='global', return_background=False)

spatialdata.get_centroids(e, coordinate_system='global')

Get the centroids of the geometries contained in a SpatialElement, as a new Points element.

Parameters:

e (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement. Only points, shapes (circles, polygons and multipolygons) and labels are supported.
coordinate_system (str (default: 'global')) – The coordinate system in which the centroids are computed.
return_background (bool (default: False)) – If True, the centroid of the background label (0) is included in the output.

Return type:

DataFrame

Notes

For Multipolygon.

spatialdata.join_spatialelement_table(sdata=None, spatial_element_names=None, spatial_elements=None, table_name=None, table=None, how='left', match_rows='no', filter_label_pixels=None)

Join SpatialElement(s) and table together in SQL like manner.

The function allows the user to perform SQL like joins of SpatialElements and a table. The elements are not returned together in one dataframe-like structure, but instead filtered elements are returned. To determine matches, for the SpatialElement the index is used and for the table the region key column and instance key column. The elements are not overwritten in the SpatialData object.

The following joins are supported: 'left', 'left_exclusive', 'inner', 'right' and 'right_exclusive'. In case of a 'left' join the SpatialElements are returned in a dictionary as is while the table is filtered to only include matching rows. In case of 'left_exclusive' join None is returned for table while the SpatialElements returned are filtered to only include indices not present in the table. The cases for 'right' joins are symmetric to the 'left' joins. In case of an 'inner' join of SpatialElement(s) and a table, for each an element is returned only containing the rows that are present in both the SpatialElement and table.

For Points and Shapes elements every valid join for argument how is supported. For Labels elements only the 'left' and 'right_exclusive' joins are supported. For Labels, the background label (0) is not included in the output and it will not be returned.
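The join semantics described above can be sketched on plain lists of instance ids. This is a simplified model, not the library's implementation: the helper join_indices is hypothetical, it ignores the region key, and the 'right' cases are written as the mirror image of the 'left' cases, which is how the description above is interpreted here.

```python
def join_indices(element_index, table_instances, how="left"):
    """Return (kept element indices, kept table instances); None means 'not returned'."""
    element_set, table_set = set(element_index), set(table_instances)
    if how == "left":
        return list(element_index), [i for i in table_instances if i in element_set]
    if how == "left_exclusive":
        return [i for i in element_index if i not in table_set], None
    if how == "inner":
        return (
            [i for i in element_index if i in table_set],
            [i for i in table_instances if i in element_set],
        )
    if how == "right":
        return [i for i in element_index if i in table_set], list(table_instances)
    if how == "right_exclusive":
        return None, [i for i in table_instances if i not in element_set]
    raise ValueError(f"Unsupported join type: {how}")


# Hypothetical element with instances 1..4 and a table annotating instances 3..5.
for how in ("left", "left_exclusive", "inner", "right", "right_exclusive"):
    print(how, join_indices([1, 2, 3, 4], [3, 4, 5], how))
```

In this model the 'left' and 'right_exclusive' joins never filter the element itself, which is consistent with these being the joins listed as supported for Labels.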

Parameters:

sdata (SpatialData | None (default: None)) – SpatialData object containing all the elements and tables. This parameter can be None; in that case both the names and the values of the elements and the table must be provided.
spatial_element_names (str | list[str] | None (default: None)) – Required. The name(s) of the spatial elements to be joined with the table. If a list of names is given and sdata is None, the indices must match those of the list of SpatialElements passed via the argument spatial_elements.
spatial_elements (DataArray | DataTree | GeoDataFrame | DataFrame | list[DataArray | DataTree | GeoDataFrame | DataFrame] | None (default: None)) – This parameter should be specified exactly when sdata is None. The SpatialElement(s) to be joined with the table. In case of a list of SpatialElements, the indices must match exactly the indices in the list spatial_element_names.
table_name (str | None (default: None)) – The name of the table to join with the spatial elements. Optional, table can be provided instead.
table (AnnData | None (default: None)) – The table to join with the spatial elements. When sdata is not None, table_name can be used instead.
how (Literal['left', 'left_exclusive', 'inner', 'right', 'right_exclusive'] (default: 'left')) – The type of SQL like join to perform, default is 'left'. Options are 'left', 'left_exclusive', 'inner', 'right' and 'right_exclusive'.
match_rows (Literal['no', 'left', 'right'] (default: 'no')) – Whether to match the indices of the element and table and if so how. If 'left', element_indices take priority and if 'right' table instance ids take priority.
filter_label_pixels (bool | None (default: None)) – Controls pixel-level filtering of label elements for 'right' and 'inner' joins. If True, pixels whose instance id is not present in the table are set to zero. If None (default), label elements are returned unfiltered and a warning is issued. If False, label elements are returned unfiltered silently (no warning).

Return type:

tuple[dict[str, Any], AnnData]

Returns:

A tuple containing the joined elements as a dictionary and the joined table as an AnnData object.

Raises:

ValueError – If spatial_element_names is not provided.
ValueError – If sdata is None but spatial_elements is not None; if sdata is not None, but spatial_elements is None.
ValueError – If table_name is provided but not present in the SpatialData object, or if table_name is provided but sdata is None.
ValueError – If not exactly one of table_name and table is provided.
ValueError – If no valid elements are provided for the join operation.
ValueError – If the provided join type is not supported.
ValueError – If an incorrect value is given for match_rows.

Notes

For a graphical representation of the join operations, see the Tables tutorial.

See also

match_element_to_table: Function to match a spatial element to a table.
join_spatialelement_table: General function, to join spatial elements with a table with more control.

spatialdata.match_sdata_to_table(sdata, table_name, table=None, how='right', filter_label_pixels=None)

Filter the elements of a SpatialData object to match only the rows present in the table.

Parameters:

sdata (SpatialData) – SpatialData object containing all the elements and tables.
table (AnnData | None (default: None)) – The table to join with the spatial elements. Has precedence over table_name.
table_name (str) – The name of the table to join with the SpatialData object if table is not provided. If table is provided, table_name is used to name the table in the returned SpatialData object.
how (Literal['left', 'left_exclusive', 'inner', 'right', 'right_exclusive'] (default: 'right')) – The type of join to perform. See spatialdata.join_spatialelement_table(). Default is “right”.
filter_label_pixels (bool | None (default: None)) – Controls pixel-level filtering of label elements. True filters pixels, None (default) leaves them unfiltered and warns, False leaves them unfiltered silently. See spatialdata.join_spatialelement_table() for details.

Return type:

SpatialData

Notes

For a graphical representation of the join operations, see the Tables tutorial.

spatialdata.filter_by_table_query(sdata, table_name, filter_tables=True, element_names=None, obs_expr=None, var_expr=None, x_expr=None, obs_names_expr=None, var_names_expr=None, layer=None, how='right', filter_label_pixels=None)

Filter the SpatialData object based on a set of table queries.

Parameters:

sdata (SpatialData) – The SpatialData object to filter.
table_name (str) – The name of the table to filter the SpatialData object by.
filter_tables (bool (default: True)) – If True (default), the table is filtered to only contain rows that are annotating regions contained within the element_names.
element_names (list[str] | None (default: None)) – The names of the elements to filter the SpatialData object by.
obs_expr (Predicates | None (default: None)) – A Predicate or an iterable of annsel Predicates to filter anndata.AnnData.obs by.
var_expr (Predicates | None (default: None)) – A Predicate or an iterable of annsel Predicates to filter anndata.AnnData.var by.
x_expr (Predicates | None (default: None)) – A Predicate or an iterable of annsel Predicates to filter anndata.AnnData.X by.
obs_names_expr (Predicates | None (default: None)) – A Predicate or an iterable of annsel Predicates to filter anndata.AnnData.obs_names by.
var_names_expr (Predicates | None (default: None)) – A Predicate or an iterable of annsel Predicates to filter anndata.AnnData.var_names by.
layer (str | None (default: None)) – The layer of the anndata.AnnData to filter the SpatialData object by, only used with x_expr.
how (Literal['left', 'left_exclusive', 'inner', 'right', 'right_exclusive'] (default: 'right')) – The type of join to perform. See spatialdata.join_spatialelement_table(). Default is “right”.
filter_label_pixels (bool | None (default: None)) – Controls pixel-level filtering of label elements. True filters pixels, None (default) leaves them unfiltered and warns, False leaves them unfiltered silently. See spatialdata.join_spatialelement_table() for details.

Return type:

SpatialData

Returns:

The filtered SpatialData object.

Notes

You can also use spatialdata.SpatialData.filter_by_table_query() with the convenience that sdata is the current SpatialData object.

For a graphical representation of the join operations, see the Tables tutorial.

For more examples on table queries, see the Table queries tutorial.

spatialdata.concatenate(sdatas, region_key=None, instance_key=None, concatenate_tables=False, obs_names_make_unique=True, modify_tables_inplace=False, merge_coordinate_systems_on_name=False, attrs_merge=None, **kwargs)

Concatenate a list of spatial data objects.

Parameters:

sdatas (Iterable[SpatialData] | dict[str, SpatialData]) – The spatial data objects to concatenate. The names of the elements across the SpatialData objects must be unique. If they are not unique, you can pass a dictionary with the suffixes as keys and the spatial data objects as values. This will rename each SpatialElement to ensure uniqueness of names across SpatialData objects. See the Notes for details.
region_key (str | None (default: None)) – The key to use for the region column in the concatenated object. If None and all region_keys are the same, the region_key is used.
instance_key (str | None (default: None)) – The key to use for the instance column in the concatenated object. If None and all instance_keys are the same, the instance_key is used.
concatenate_tables (bool (default: False)) – Whether to merge the tables in case of having the same element name.
obs_names_make_unique (bool (default: True)) – Whether to make the obs_names unique by calling AnnData.obs_names_make_unique() on each table of the concatenated object. If you passed a dictionary with the suffixes as keys and the SpatialData objects as values and if concatenate_tables is True, the obs_names will be made unique by adding the corresponding suffix instead.
modify_tables_inplace (bool (default: False)) – Whether to modify the tables in place. If True, the tables will be modified in place. If False, the tables will be copied before modification. Copying is enabled by default but can be disabled for performance reasons.
merge_coordinate_systems_on_name (bool (default: False)) – Whether to keep coordinate system names unchanged (True) or add suffixes (False).
attrs_merge (Union[Literal['same', 'unique', 'first', 'only'], Callable[[list[dict[Any, Any]]], dict[Any, Any]], None] (default: None)) – How the elements of .attrs are selected. Uses the same set of strategies as the uns_merge argument of [anndata.concat](https://anndata.readthedocs.io/en/latest/generated/anndata.concat.html)
kwargs (Any) – See anndata.concat() for more details.

Return type:

SpatialData

Returns:

The concatenated spatialdata.SpatialData object.

Notes

If you pass a dictionary with the suffixes as keys and the SpatialData objects as values, the names of each SpatialElement will be renamed to ensure uniqueness of names across SpatialData objects by adding the corresponding suffix. To ensure the matching between existing table annotations, the region metadata of each table, and the values of the region_key column in each table, will be altered by adding the suffix. In addition, the obs_names of each table will be altered (a suffix will be added). Finally, a suffix will be added to the name of each table iff rename_tables is False.

If you need more control in the renaming, please give us feedback, as we are still trying to find the right balance between ergonomics and control. Also, you are welcome to copy and adjust the code of _fix_ensure_unique_element_names() directly.

spatialdata.transform(data, transformation=None, maintain_positioning=False, to_coordinate_system=None)

Transform a SpatialElement, either with the given transformation or to the given coordinate system, and return the transformed element.

Parameters:

data (Any) – SpatialElement to transform.
transformation (BaseTransformation | None (default: None)) – The transformation to apply to the element. This parameter can be used only when maintain_positioning=True, otherwise to_coordinate_system must be used.
maintain_positioning (bool (default: False)) – The default and recommended behavior is to leave this parameter set to False.
- If True, in the transformed element, each transformation that was present in the original element will be prepended with the inverse of the transformation used to transform the data (i.e. the current transformation for which .transform() is called). In this way the data is transformed but the positioning (for each coordinate system) is maintained. A use case is changing the orientation/scale/etc. of the data but keeping the alignment of the data within each coordinate system.
- If False, the data is transformed and the positioning changes; only the coordinate system to which the data is transformed is kept. For raster data, the translation part of the transformation is assigned to the element (see Notes below for more details). Furthermore, for raster data, the returned object will have a translation that accounts for the position of the pixel (0, 0). Also, rotated raster data will be padded in the corners with a black color; such padding will be reflected into the rotation. Please see the Notes for more details on how this parameter interacts with xarray.DataArray for raster data.
to_coordinate_system (str | None (default: None)) – The coordinate system to which the data should be transformed. The coordinate system must be present in the element.

Return type:

Any

Returns:

The transformed SpatialElement.

Notes

An affine transformation contains a linear transformation and a translation. For raster types, only the linear transformation is applied to the data (e.g. the data is rotated or resized), but not the translation part. This means that calling Translation(…).transform(raster_element) will have the same effect as prepending the translation to each transformation of the raster element (if maintain_positioning=True), or assigning this translation to the element in the new coordinate system (if maintain_positioning=False). Analogous considerations apply to the black corner padding due to the rotation part of the transformation. We are considering changing this behavior by letting translations modify the coordinates stored with xarray.DataArray; this is tracked here: scverse/spatialdata#308
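To see why prepending the inverse keeps the positioning when maintain_positioning=True, consider the following 1D simplification, where an affine map is a (scale, offset) pair. The helpers apply, inverse and compose and the numeric values are hypothetical; this sketches the underlying algebra only and does not use the library's transformation classes.

```python
def apply(t, x):
    """Apply the 1D affine map t = (scale, offset) to x."""
    scale, offset = t
    return scale * x + offset


def inverse(t):
    """Invert the map x -> scale * x + offset."""
    scale, offset = t
    return (1 / scale, -offset / scale)


def compose(outer, inner):
    """Return the map x -> outer(inner(x))."""
    return (outer[0] * inner[0], outer[0] * inner[1] + outer[1])


to_global = (2.0, 10.0)  # hypothetical transformation stored on the element
applied = (4.0, 0.0)     # hypothetical transformation used to transform the data

x = 3.0
x_transformed = apply(applied, x)

# Prepend the inverse of the applied map: it runs first, then the original map.
new_to_global = compose(to_global, inverse(applied))

print(apply(to_global, x), apply(new_to_global, x_transformed))
```

Both printed values coincide: the data coordinate changed, but its position in the target coordinate system did not.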

spatialdata.rasterize(data, axes, min_coordinate, max_coordinate, target_coordinate_system, target_unit_to_pixels=None, target_width=None, target_height=None, target_depth=None, sdata=None, value_key=None, table_name=None, return_regions_as_labels=False, agg_func=None, return_single_channel=None)

Rasterize a SpatialData object or a SpatialElement (image, labels, points, shapes).

Parameters:

data (SpatialData | DataArray | DataTree | GeoDataFrame | DataFrame | str) – The SpatialData object or SpatialElement to rasterize. Alternatively, the name of the SpatialElement in the SpatialData object, when the SpatialData object is passed to sdata.
axes (tuple[str, ...]) – The axes that min_coordinate and max_coordinate refer to.
min_coordinate (list[int | float] | ndarray[tuple[Any, ...], dtype[floating[Any]]]) – The minimum coordinates of the bounding box.
max_coordinate (list[int | float] | ndarray[tuple[Any, ...], dtype[floating[Any]]]) – The maximum coordinates of the bounding box.
target_coordinate_system (str) – The coordinate system in which we define the bounding box. This will also be the coordinate system of the produced rasterized image.
target_unit_to_pixels (float | None (default: None)) – The number of pixels per unit that the target image should have. It is mandatory to specify precisely one of the following options: target_unit_to_pixels, target_width, target_height, target_depth.
target_width (float | None (default: None)) – The width of the target image in units. It is mandatory to specify precisely one of the following options: target_unit_to_pixels, target_width, target_height, target_depth.
target_height (float | None (default: None)) – The height of the target image in units. It is mandatory to specify precisely one of the following options: target_unit_to_pixels, target_width, target_height, target_depth.
target_depth (float | None (default: None)) – The depth of the target image in units. It is mandatory to specify precisely one of the following options: target_unit_to_pixels, target_width, target_height, target_depth.
sdata (SpatialData | None (default: None)) – SpatialData object containing the values to aggregate if value_key refers to values from a table. Must be None when data is a SpatialData object.
value_key (str | None (default: None)) –
Name of the column containing the values to aggregate; can refer both to numerical or categorical values.

The key can be:
- the name of a column(s) in the dataframe (Dask DataFrame for points or GeoDataFrame for shapes);
- the name of obs column(s) in the associated AnnData table (for points, shapes, and labels);
- the name of a var(s), referring to the column(s) of the X matrix in the table (for points, shapes, and labels).
See the notes for more details on the default behavior. Must be None when data is a SpatialData object.
table_name (str | None (default: None)) – The table optionally containing the value_key and the name of the table in the returned SpatialData object. Must be None when data is a SpatialData object, otherwise it assumes the default value of 'table'.
return_regions_as_labels (bool (default: False)) – By default, single-scale images of shape (c, y, x) are returned. If True, returns labels, shapes and points as labels of shape (y, x) as opposed to an image of shape (c, y, x). Images are always returned as images, and multiscale raster data is always returned as single-scale data.
agg_func (str | Reduction | None (default: None)) – Available only when rasterizing points and shapes. A reduction function from datashader (its name, or a Callable). See the notes for more details on the default behavior. Must be None when data is a SpatialData object.
return_single_channel (bool | None (default: None)) – Only used when rasterizing points and shapes and when value_key refers to a categorical column. If False, each category will be rasterized in a separate channel.

Return type:

SpatialData | DataArray

Returns:

The rasterized SpatialData object or SpatialData supported DataArray. Each SpatialElement will be rasterized into a DataArray (not a DataTree). So if a SpatialData object with elements is passed, a SpatialData object with single-scale images and labels will be returned.

When return_regions_as_labels is True, the returned DataArray object will have an attribute called label_index_to_category that maps the label index to the category name. You can access it via returned_data.attrs["label_index_to_category"]. The returned labels will start from 1 (0 is reserved for the background), and will be contiguous.

Notes

For images and labels, the parameters value_key, table_name, agg_func, and return_single_channel are not used.

Instead, when rasterizing shapes and points, the following table clarifies the default datashader reduction used for various combinations of parameters.

In particular, the first two rows refer to the default behavior when the parameters (value_key, table_name, return_single_channel, agg_func) are kept to their default values.

value_key	Shapes or Points	return_single_channel	datashader reduction	table_name
None*	Points (default)	NA	count	‘table’
None**	Shapes (default)	True	first	‘table’
None**	Shapes	False	count_cat	‘table’
category	NA	True	first	‘table’
category	NA	False	count_cat	‘table’
int/float	NA	NA	sum	‘table’

Explicitly, the default behaviors are as follows.

For points, each pixel counts the number of points belonging to it (the count function is applied to an artificial column of ones).
For shapes, each pixel gets a single index among those of the shapes that intersect it (the index of the shapes is interpreted as a categorical column and then the first function is used).
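The default behavior for points can be pictured with a small pure-Python sketch that counts points per pixel. It does not use datashader and is not the library's implementation; the helper count_points and the sample points are hypothetical, and the box is assumed to be 2D with axes ordered (x, y).

```python
import math


def count_points(points, min_coordinate, max_coordinate, unit_to_pixels):
    """Count points per pixel inside the box; the result is indexed as image[y][x]."""
    width = round((max_coordinate[0] - min_coordinate[0]) * unit_to_pixels)
    height = round((max_coordinate[1] - min_coordinate[1]) * unit_to_pixels)
    image = [[0] * width for _ in range(height)]
    for x, y in points:
        col = math.floor((x - min_coordinate[0]) * unit_to_pixels)
        row = math.floor((y - min_coordinate[1]) * unit_to_pixels)
        # Points falling outside the bounding box are ignored.
        if 0 <= col < width and 0 <= row < height:
            image[row][col] += 1
    return image


points = [(0.2, 0.2), (0.4, 0.1), (1.5, 0.5), (1.9, 1.9)]
image = count_points(points, [0, 0], [2, 2], unit_to_pixels=1)
print(image)
```

Here unit_to_pixels plays the role of target_unit_to_pixels: increasing it yields a finer grid over the same bounding box.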

spatialdata.rasterize_bins(sdata, bins, table_name, col_key, row_key, value_key=None, return_region_as_labels=False)

Rasterizes grid-like binned shapes/points annotated by a table (e.g. Visium HD data).

Parameters:

sdata (SpatialData) – The spatial data object containing the grid-like binned element to be rasterized.
bins (str) – The name of the SpatialElement that defines the grid-like bins.
table_name (str) – The name of the table annotating the SpatialElement.
col_key (str) – Name of a column in sdata[table_name].obs containing the column indices (integer) for the bins.
row_key (str) – Name of a column in sdata[table_name].obs containing the row indices (integer) for the bins.
value_key (str | list[str] | None (default: None)) – The key(s) (obs columns/var names) in the table that will be used to rasterize the bins. If None, all the var names will be used, and the returned object will be lazily constructed. Ignored if return_region_as_labels is True.
return_region_as_labels (bool (default: False)) – If False, this function returns an xarray.DataArray of shape (c, y, x), with the dimension of c equal to the number of key(s) specified in value_key, or to the number of var names in table_name if value_key is None. If True, it returns labels of shape (y, x), where each bin of the bins element is represented as a pixel. By default, the table will not be set to annotate the new rasterized labels; this can be achieved using the helper function spatialdata.rasterize_bins_link_table_to_labels().

Return type:

DataArray

Returns:

A spatial image object created by rasterizing the specified bins from the spatial data.

Notes

Before calling this function, ensure that the data geometries are organized in grid-like bins (e.g. Visium HD data, but not Visium data). Also ensure that the bin indices (integer) are defined in the .obs dataframe of the table associated with the spatial geometries. If variables from table.X are being rasterized (typically, gene counts), then table.X should be a csc_matrix (this can be done by calling sdata[table_name].X = sdata[table_name].X.tocsc()).

The returned image will have one pixel for each bin, and a coordinate transformation to map the image to the original data orientation. In particular, the bins of Visium HD data are in a grid that is slightly rotated; the coordinate transformation will adjust for this, so that the returned data is aligned to the original geometries.

If spatialdata-plot is used to visualize the returned image, the parameter scale='full' needs to be passed to .render_shapes(), to disable an automatic rasterization that would conflict with the rasterization performed here.
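
The one-pixel-per-bin layout described in the notes above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the bin tuples, the rasterize helper, and the choice to offset indices by their minimum are assumptions made for illustration, and the coordinate transformation is left out.

```python
# Hypothetical bins: (row index, column index, value), as they might appear
# in the row_key, col_key and value_key columns of the table.
bins = [
    (10, 4, 1.0),
    (10, 5, 2.0),
    (11, 4, 3.0),
    (12, 5, 4.0),
]


def rasterize(bins):
    """Place each bin's value at the pixel given by its row/column indices."""
    rows = [r for r, _, _ in bins]
    cols = [c for _, c, _ in bins]
    # Assumption of this sketch: indices are offset so the image starts at (0, 0).
    min_row, min_col = min(rows), min(cols)
    height = max(rows) - min_row + 1
    width = max(cols) - min_col + 1
    # Pixels without a corresponding bin stay at 0.
    image = [[0.0] * width for _ in range(height)]
    for r, c, value in bins:
        image[r - min_row][c - min_col] = value
    return image


image = rasterize(bins)
for line in image:
    print(line)
```

Each bin ends up as exactly one pixel, and grid positions with no bin remain zero.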

spatialdata.rasterize_bins_link_table_to_labels(sdata, table_name, rasterized_labels_name)#

Change the annotation target of the table to the rasterized labels.

This function should be called after having rasterized the bins (calling rasterize_bins() with return_region_as_labels=True) and after having added the rasterized labels to the spatial data object.

Parameters:

sdata (SpatialData) – The spatial data object containing the rasterized labels.
table_name (str) – The name of the table to be annotated.
rasterized_labels_name (str) – The name of the rasterized labels in the spatial data object.

Return type:

None

spatialdata.to_circles(data, radius=None)#

spatialdata.to_circles(element, **kwargs)

spatialdata.to_circles(element, radius=None)

Convert a set of geometries (2D/3D labels, 2D shapes) to approximated circles/spheres.

Parameters:

data (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement representing the geometries to approximate as circles/spheres.
radius (float | ndarray[tuple[Any, ...], dtype[floating[Any]]] | None (default: None)) –

Radius/radii for the circles. For points elements, radius can either be specified as an argument, or be a column of the dataframe. For non-points elements, radius must be None.

Return type:

GeoDataFrame

Returns:

: The approximated circles/spheres.

Notes

The approximation is done by computing the centroids and the area/volume of the geometries. The geometries are then replaced by circles/spheres with the same centroids and area/volume.
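
As a rough illustration of the approximation described in the notes above, the following sketch computes the circle with the same centroid and area as a simple 2D polygon. The helper name equivalent_circle and the example square are hypothetical, and the sketch does not reproduce how the library handles labels or 3D data.

```python
import math


def equivalent_circle(vertices):
    """Return (cx, cy, radius) of the circle sharing centroid and area
    with a simple polygon given as a list of (x, y) vertices."""
    signed_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # shoelace term for this edge
        signed_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    signed_area *= 0.5
    cx /= 6.0 * signed_area
    cy /= 6.0 * signed_area
    # A circle of area A has radius sqrt(A / pi).
    radius = math.sqrt(abs(signed_area) / math.pi)
    return cx, cy, radius


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
cx, cy, radius = equivalent_circle(square)
print(cx, cy, radius)
```

For the 2 x 2 square, the resulting circle is centered at (1, 1) and its area equals the square's area of 4.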

spatialdata.to_polygons(data, buffer_resolution=None)#

spatialdata.to_polygons(element, **kwargs)

spatialdata.to_polygons(gdf, buffer_resolution=16)

spatialdata.to_polygons(element, **kwargs)

Convert a set of geometries (2D labels, 2D shapes) to approximated 2D polygons/multipolygons.

For optimal performance when converting rasters (xarray.DataArray or datatree.DataTree) to polygons, it is recommended to configure Dask to use ‘processes’ rather than ‘threads’. For example, you can set this configuration with:

>>> import dask
>>> dask.config.set(scheduler='processes')

Parameters:

data (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement representing the geometries to approximate as 2D polygons/multipolygons.
buffer_resolution (int | None (default: None)) – Used only when constructing polygons from circles. Value of the resolution parameter for the internal buffer() call.

Return type:

GeoDataFrame

Returns:

: The approximated 2D polygons/multipolygons in the specified coordinate system.
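
The effect of buffer_resolution when converting circles can be sketched without any geometry library. The sketch assumes that the resolution counts the number of segments per quarter circle, as in Shapely's buffer(); the helpers circle_to_polygon and polygon_area are hypothetical and are not part of spatialdata.

```python
import math


def circle_to_polygon(cx, cy, radius, resolution=16):
    """Approximate a circle with a regular polygon of 4 * resolution vertices."""
    n = 4 * resolution
    return [
        (cx + radius * math.cos(2 * math.pi * k / n),
         cy + radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


exact = math.pi  # area of the unit circle
coarse = polygon_area(circle_to_polygon(0.0, 0.0, 1.0, resolution=4))
fine = polygon_area(circle_to_polygon(0.0, 0.0, 1.0, resolution=16))
print(exact - coarse, exact - fine)
```

A higher resolution gives a polygon whose area is closer to the circle's, at the cost of more vertices per geometry.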

spatialdata.aggregate(values, by, values_sdata=None, by_sdata=None, value_key=None, agg_func='sum', target_coordinate_system='global', fractions=False, region_key='region', instance_key='instance_id', deepcopy=True, table_name=None, buffer_resolution=16, **kwargs)#

Aggregate values by given region.

Parameters:

values_sdata (SpatialData | None (default: None)) – SpatialData object containing the values to aggregate: if None, values must be a SpatialElement; if not None, values must be a string.
values (DataFrame | GeoDataFrame | DataArray | DataTree | str) – The values to aggregate: if values_sdata is None, must be a SpatialElement, otherwise must be a string specifying the name of the SpatialElement in values_sdata
by_sdata (SpatialData | None (default: None)) – Regions to aggregate by: if None, by must be a SpatialElement; if not None, by must be a string.
by (GeoDataFrame | DataArray | DataTree | str) – The regions to aggregate by: if by_sdata is None, must be a SpatialElement, otherwise must be a string specifying the name of the SpatialElement in by_sdata
value_key (list[str] | str | None (default: None)) –
Name (or list of names) of the columns containing the values to aggregate; can refer to either numerical or categorical values. If the values are categorical, value_key can’t be a list.

The key can be:
- the name of a column(s) in the dataframe (Dask DataFrame for points or GeoDataFrame for shapes);
- the name of obs column(s) in the associated AnnData table (for points, shapes and labels);
- the name of a var(s), referring to the column(s) of the X matrix in the table (for points, shapes and labels).
If nothing is passed here, it defaults to the equivalent of a column of ones. Defaults to FEATURE_KEY for points (if present).
agg_func (str | list[str] (default: 'sum')) – Aggregation function to apply over point values, e.g. "mean", "sum", "count". Passed to pandas.DataFrame.groupby.agg() or to xrspatial.zonal_stats() according to the type of values.
target_coordinate_system (str (default: 'global')) – Coordinate system to transform to before aggregating.
fractions (bool (default: False)) –
Adjusts for partial area overlap between regions in values and by. More precisely: when a region in by partially overlaps with a region in values, this setting specifies whether the value to aggregate should be taken as it is (fractions = False) or multiplied by the following ratio: “area of the intersection between the two regions” / “area of the region in values”.

Additional details:
- default is fractions = False.
- when aggregating points this parameter must be left to False, as the points don’t have area (otherwise a table of zeros would be obtained);
- for categorical values "count" and "sum" are equivalent when fractions = False, but when fractions = True, "count" and "sum" are different: "count" would not give meaningful results and so it’s not allowed, while "sum" actually sums the values of the intersecting regions, and should therefore be used.
- aggregating categorical values with agg_func = "mean" is not allowed, as it would not give meaningful results.
region_key (str (default: 'region')) – Name that will be given to the new region column in the returned aggregated table.
instance_key (str (default: 'instance_id')) – Name that will be given to the new instance id column in the returned aggregated table.
deepcopy (bool (default: True)) – Whether to deepcopy the shapes in the returned SpatialData object. If the shapes are large (e.g. large multiscale labels), you may consider disabling the deepcopy to use a lazy Dask representation.
table_name (str | None (default: None)) – The table optionally containing the value_key and the name of the table in the returned SpatialData object.
buffer_resolution (int (default: 16)) – Resolution parameter to pass to the .buffer() method to convert circles to polygons. A higher value results in a more accurate representation of the circle, but also in a more complex polygon and a more expensive computation.
kwargs (Any) – Additional keyword arguments to pass to xrspatial.zonal_stats().

Return type:

SpatialData

Returns:

: Returns a SpatialData object with the by shapes as SpatialElement and a table with the aggregated values annotating the shapes.

If value_key refers to a categorical variable, the table in the SpatialData object has shape (by.shape[0], <n categories>).

Notes

This function returns a SpatialData object, so to access the aggregated table you can use the table attribute.

The shapes in the returned SpatialData objects are a reference to the original one. If you want them to be a different object you can do a deepcopy manually (this loads the data into memory), or you can save the SpatialData object to disk and reload it (this keeps the data lazily represented).

When aggregating points by shapes, the current implementation loads all the points into memory and thus could lead to a large memory usage. The GitHub issue scverse/spatialdata#210 keeps track of the changes required to address this behavior.
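
The ratio used by the fractions parameter can be made concrete with a small sketch on axis-aligned rectangles. This is only an illustration of the formula “area of the intersection” / “area of the region in values”; the helper names and the rectangle representation (xmin, ymin, xmax, ymax) are assumptions, not spatialdata code.

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax); 0 if empty."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def intersection(a, b):
    """Intersection of two rectangles (may be empty, i.e. have zero area)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def contribution(value, values_region, by_region, fractions):
    """Amount that one region in `values` contributes to one region in `by`."""
    overlap = rect_area(intersection(values_region, by_region))
    if overlap == 0.0:
        return 0.0  # the regions do not intersect
    if not fractions:
        return value  # the value is taken as it is
    return value * overlap / rect_area(values_region)


values_region = (0.0, 0.0, 4.0, 2.0)   # area 8
by_region = (3.0, 0.0, 10.0, 10.0)     # overlaps values_region on an area of 2

whole = contribution(10.0, values_region, by_region, fractions=False)
partial = contribution(10.0, values_region, by_region, fractions=True)
print(whole, partial)
```

With fractions = False the full value of 10 is counted; with fractions = True only a quarter of it (2 / 8) is, since only a quarter of the region in values lies inside the region in by.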

spatialdata.map_raster(data, func, func_kwargs=mappingproxy({}), blockwise=True, depth=None, chunks=None, c_coords=None, dims=None, transformations=None, relabel=True, **kwargs)#

Apply a callable to raster data.

Applies a func callable to raster data. If blockwise is set to True, distributed processing will be achieved with:

dask.array.map_overlap() if depth is not None

dask.array.map_blocks(), if depth is None

otherwise func is applied to the full data.

Parameters:

data (DataArray | DataTree) – The data to process. It can be an xarray.DataArray or a datatree.DataTree. If it’s a DataTree, the callable is applied to the first scale (scale0, the full-resolution data).
func (Callable[[Array], Array]) – The callable that is applied to the data.
func_kwargs (Mapping[str, Any] (default: mappingproxy({}))) – Additional keyword arguments to pass to the callable func.
blockwise (bool (default: True)) – If True, func will be distributed with dask.array.map_overlap() or dask.array.map_blocks(), otherwise func is applied to the full data. If False, depth, chunks and kwargs are ignored.
depth (int | tuple[int, ...] | dict[int, int] | None (default: None)) – Specifies the overlap between chunks, i.e. the number of elements that each chunk should share with its neighboring chunks. If not None, distributed processing will be achieved with dask.array.map_overlap(), otherwise with dask.array.map_blocks().
chunks (tuple[tuple[int, ...], ...] | None (default: None)) – Chunk shape of resulting blocks if the callable does not preserve the data shape. For example, if the input block has shape: (3,100,100) and the resulting block after the map_raster call has shape: (1, 100,100), the argument chunks should be passed accordingly. Passed to dask.array.map_overlap() or dask.array.map_blocks(). Ignored if blockwise is False.
c_coords (Iterable[int] | Iterable[str] | None (default: None)) – The channel coordinates for the output data. If not provided, the channel coordinates of the input data are used. If the callable func is expected to change the number of channel coordinates, this argument should be provided, otherwise will default to range(len(output_coords)).
dims (tuple[str, ...] | None (default: None)) – The dimensions of the output data. If not provided, the dimensions of the input data are used. It must be specified if the callable changes the data dimensions, e.g. ('c', 'y', 'x') -> ('y', 'x').
transformations (dict[str, Any] | None (default: None)) – The transformations of the output data. If not provided, the transformations of the input data are copied to the output data. It should be specified if the callable changes the data transformations.
relabel (bool (default: True)) – Whether to relabel the blocks of the output data. This option is ignored when the output data is not a labels layer (i.e., when dims contains c). It is recommended to enable relabeling if func returns labels that are not unique across chunks. Relabeling is done by performing a bit shift. When a cell or entity to be labeled is split between two adjacent chunks, the current implementation does not assign the same label across blocks. See scverse/spatialdata#664 for discussion.
kwargs (Any) – Additional keyword arguments to pass to dask.array.map_overlap() or dask.array.map_blocks(). Ignored if blockwise is set to False.

Return type:

DataArray

Returns:

: The processed data as an xarray.DataArray.
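
The relabel option mentions that relabeling is done with a bit shift. One way such a scheme can make labels unique across chunks is sketched below: each non-zero label is shifted left by enough bits to store the index of its block. The exact scheme used by the library is not described here, so the helper relabel_block and its details are assumptions.

```python
def relabel_block(labels, block_id, num_blocks):
    """Make the non-zero labels of one block unique across all blocks."""
    # Number of low bits reserved for the block index (assumed scheme).
    shift = max(1, (num_blocks - 1).bit_length())
    return [
        [(label << shift) | block_id if label != 0 else 0 for label in row]
        for row in labels
    ]


# Two blocks whose local labels collide (both use 1 and 2).
block_a = [[0, 1], [2, 2]]
block_b = [[1, 1], [0, 2]]

new_a = relabel_block(block_a, block_id=0, num_blocks=2)
new_b = relabel_block(block_b, block_id=1, num_blocks=2)
print(new_a, new_b)
```

The background value 0 is preserved and no label is shared between the two blocks. Consistent with the limitation noted for relabel, an object that straddles both blocks would still receive two different labels under such a scheme.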

spatialdata.unpad_raster(raster)#

Remove padding from a raster that may have been added by the rotation component of a transformation.

Parameters:

raster (DataArray | DataTree) – The raster to unpad. Contiguous zero values are considered padding.

Return type:

DataArray | DataTree

Returns:

: The unpadded raster.
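
The idea of treating contiguous zero values as padding can be sketched on a plain 2D list. This is a simplified illustration only: the helper unpad is hypothetical, works on a single 2D plane, and ignores the coordinates and transformations that a real raster element carries.

```python
def unpad(raster):
    """Crop a 2D raster (list of rows) to the bounding box of its non-zero values."""
    rows = [i for i, row in enumerate(raster) if any(v != 0 for v in row)]
    if not rows:
        return []  # the raster contains only padding
    cols = [j for j in range(len(raster[0])) if any(row[j] != 0 for row in raster)]
    return [row[cols[0]:cols[-1] + 1] for row in raster[rows[0]:rows[-1] + 1]]


padded = [
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
unpadded = unpad(padded)
print(unpadded)
```

Only the all-zero border is removed; the zero inside the bounding box of the data is kept.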

spatialdata.relabel_sequential(arr)#

Relabels integers in a Dask array sequentially.

This function assigns sequential labels to the integers in a Dask array starting from 1. For example, if the unique values in the input array are [0, 9, 5], they will be relabeled to [0, 1, 2] respectively. Note that, currently, if a cell or entity to be labeled is split across adjacent chunks, the same label is not assigned to it across blocks. See the discussion in scverse/spatialdata#664.

Parameters:

arr (Array) – input array.

Return type:

Array

Returns:

: The relabeled array.

spatialdata.are_extents_equal(extent0, extent1, atol=0.1)#

Check if two data extents, as returned by get_extent(), are equal up to approximation errors.

Parameters:

extent0 (dict[str, tuple[float, float]]) – The first data extent.
extent1 (dict[str, tuple[float, float]]) – The second data extent.
atol (float (default: 0.1)) – The absolute tolerance to use when comparing the extents.

Return type:

bool

Returns:

: Whether the extents are equal or not.

Notes

The default value of atol is currently high because of a bug in rasterize() that makes the extent of the rasterized data slightly different from the extent of the original data. This bug is tracked in scverse/spatialdata#165.
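
A comparison of extents up to an absolute tolerance can be sketched as follows. The function extents_close is a hypothetical stand-in written for illustration, not the library's implementation, and the example extents are made up.

```python
import math


def extents_close(extent0, extent1, atol=0.1):
    """Compare two extents of the form {axis: (min, max)} up to `atol`."""
    if extent0.keys() != extent1.keys():
        return False  # different axes can never match
    return all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=atol)
        for axis in extent0
        for a, b in zip(extent0[axis], extent1[axis])
    )


original = {"x": (0.0, 100.0), "y": (0.0, 50.0)}
rasterized = {"x": (0.0, 100.05), "y": (-0.02, 50.0)}  # small numerical drift
shifted = {"x": (0.0, 101.0), "y": (0.0, 50.0)}        # a real difference

print(extents_close(original, rasterized))
print(extents_close(original, shifted))
```

With the default tolerance of 0.1, the small drift in the second extent is accepted while the shift by one unit is not.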

spatialdata.deepcopy(element)#

spatialdata.deepcopy(sdata)

spatialdata.deepcopy(element)

spatialdata.deepcopy(gdf)

spatialdata.deepcopy(df)

spatialdata.deepcopy(adata)

Deepcopy a SpatialData or SpatialElement object.

Deepcopy will load the data in memory. Using this function for large Dask-backed objects is discouraged. In that case, please save the SpatialData object to a different disk location and read it back again.

Parameters:

Return type:

Returns:

: A deepcopy of the SpatialData or SpatialElement object

Notes

The order of the columns of a deepcopied points element may differ from the original one; for more details, see scverse/spatialdata#486.

spatialdata.get_pyramid_levels(image, attr=None, n=None)#

Access the data/attribute of the pyramid levels of a multiscale spatial image.

Parameters:

image (DataTree) – The multiscale spatial image.
attr (str | None (default: None)) – If None, return the data of the pyramid level as a DataArray; if not None, return the specified attribute within the DataArray data.
n (int | None (default: None)) – If not None, return only the n-th pyramid level.

Return type:

list[Any] | Any

Returns:

: The pyramid levels data (or an attribute of it) as a list or a generator.

spatialdata.sanitize_name(name, is_dataframe_column=False)#

Sanitize a name to comply with SpatialData naming rules.

This function converts invalid names into valid ones by:
1. Converting to string if not already
2. Removing invalid characters
3. Handling special cases like “__” prefix
4. Ensuring the name is not empty
5. Handling special cases for dataframe columns

See a discussion on the naming rules, and how to avoid naming collisions, here: scverse/spatialdata#707

Parameters:

name (str) – The name to sanitize
is_dataframe_column (bool (default: False)) – Whether this name is for a dataframe column (additional restrictions apply)

Return type:

str

Returns:

: A sanitized version of the name that complies with SpatialData naming rules. If a sanitized name cannot be generated, it returns “unnamed”.

Examples

>>> sanitize_name("my@invalid#name")
'my_invalid_name'
>>> sanitize_name("__private")
'private'
>>> sanitize_name("_index", is_dataframe_column=True)
'index'

spatialdata.sanitize_table(data, inplace=True)#

Sanitize all keys in an AnnData table to comply with SpatialData naming rules.

This function sanitizes all keys in obs, var, obsm, obsp, varm, varp, uns, and layers while maintaining case-insensitive uniqueness. It can either modify the table in-place or return a new sanitized copy.

See a discussion on the naming rules here: scverse/spatialdata#707

Parameters:

data (AnnData) – The AnnData table to sanitize
inplace (bool (default: True)) – Whether to modify the table in-place or return a new copy

Return type:

AnnData | None

Returns:

: If inplace is False, returns a new AnnData object with sanitized keys. If inplace is True, returns None as the original object is modified.

Examples

>>> import anndata as ad
>>> import pandas as pd
>>> from spatialdata import sanitize_table
>>> adata = ad.AnnData(obs=pd.DataFrame({"@invalid#": [1, 2]}))
>>> # Create a new sanitized copy (inplace defaults to True, which returns None)
>>> sanitized = sanitize_table(adata, inplace=False)
>>> print(sanitized.obs.columns)
Index(['invalid_'], dtype='object')
>>> # Or modify in-place
>>> sanitize_table(adata, inplace=True)
>>> print(adata.obs.columns)
Index(['invalid_'], dtype='object')
