Operations
Operations on SpatialData objects.
- spatialdata.bounding_box_query(element, axes, min_coordinate, max_coordinate, target_coordinate_system, return_request_only=False, filter_table=True, **kwargs)
- spatialdata.bounding_box_query(sdata, axes, min_coordinate, max_coordinate, target_coordinate_system, filter_table=True)
- spatialdata.bounding_box_query(image, axes, min_coordinate, max_coordinate, target_coordinate_system, return_request_only=False)
- spatialdata.bounding_box_query(points, axes, min_coordinate, max_coordinate, target_coordinate_system)
- spatialdata.bounding_box_query(polygons, axes, min_coordinate, max_coordinate, target_coordinate_system)
Query a SpatialData object or SpatialElement within a bounding box.
This function can also be accessed as a method of a SpatialData object, via sdata.query.bounding_box(...), without specifying element.
- Parameters:
element (DataArray | DataTree | GeoDataFrame | DataFrame | SpatialData) – The SpatialElement or SpatialData object to query.
axes (tuple[str, ...]) – The axes that min_coordinate and max_coordinate refer to.
min_coordinate (list[int | float] | ndarray[tuple[Any, ...], dtype[floating[Any]]]) – The upper left-hand corners of the bounding boxes (i.e., the minimum coordinates along all dimensions). Shape: (n_boxes, n_axes), or (n_axes,) for a single box.
max_coordinate (list[int | float] | ndarray[tuple[Any, ...], dtype[floating[Any]]]) – The lower right-hand corners of the bounding boxes (i.e., the maximum coordinates along all dimensions). Shape: (n_boxes, n_axes), or (n_axes,) for a single box.
target_coordinate_system (str) – The coordinate system the bounding box is defined in.
filter_table (bool (default: True)) – If True, the table is filtered to only contain rows that annotate regions contained within the bounding box.
return_request_only (bool (default: False)) – If True, the function returns the bounding box coordinates in the target coordinate system. Only valid with DataArray and DataTree elements.
- Return type:
DataArray | DataTree | GeoDataFrame | DataFrame | SpatialData | None
- Returns:
The SpatialData object or SpatialElement containing the requested data. Empty elements are omitted from the returned SpatialData object.
Notes
If the object has a points element, the query may suffer from performance issues, depending on the number of points. Consider filtering the object before calling this function, using the subset() method of SpatialData.
- spatialdata.polygon_query(element, polygon, target_coordinate_system, filter_table=True, clip=False)
- spatialdata.polygon_query(sdata, polygon, target_coordinate_system, filter_table=True, clip=False)
- spatialdata.polygon_query(image, polygon, target_coordinate_system, return_request_only=False, **kwargs)
- spatialdata.polygon_query(points, polygon, target_coordinate_system, **kwargs)
- spatialdata.polygon_query(element, polygon, target_coordinate_system, clip=False, **kwargs)
Query a SpatialData object or a SpatialElement by a polygon or multipolygon.
This function can also be accessed as a method of a SpatialData object, via sdata.query.polygon(...), without specifying element.
- Parameters:
element (DataArray | DataTree | GeoDataFrame | DataFrame | SpatialData) – The SpatialElement or SpatialData object to query.
polygon (Polygon | MultiPolygon) – The polygon/multipolygon to query by.
target_coordinate_system (str) – The coordinate system of the polygon/multipolygon.
filter_table (bool (default: True)) – Specifies whether to filter the tables to only include those that annotate elements in the SpatialData object retrieved by the query.
clip (bool (default: False)) – If True, the shapes are clipped to the polygon. This behavior is implemented only when querying polygons/multipolygons or circles, and it is ignored for other types of elements (images, labels, points). Importantly, when clipping is enabled, circles are converted to polygons before clipping. This may affect performance and downstream operations that rely on the circle radius, so it is recommended to disable clipping when querying circles or when querying a SpatialData object that contains circles.
- Return type:
DataArray | DataTree | GeoDataFrame | DataFrame | SpatialData | None
- Returns:
The queried SpatialData object or SpatialElement containing the requested data. Empty elements are omitted from the returned SpatialData object.
Examples
Here is an example for the multipolygon use case. If you have a sequence of polygons/multipolygons, in particular a GeoDataFrame, and you want to query the data that belongs to any one of these shapes, you can call this function on the multipolygon obtained by merging all the polygons. To merge them, you can use a unary union.
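The merge step could be sketched as follows. This is a minimal illustration that uses shapely only: the two squares are made-up data, and the final polygon_query call is left as a comment because it needs a real SpatialData object (hypothetically named sdata here) and a coordinate system name that exists in it.

```python
from shapely.geometry import Point, Polygon
from shapely.ops import unary_union

# Two made-up, non-overlapping regions of interest.
regions = [
    Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
    Polygon([(2, 2), (3, 2), (3, 3), (2, 3)]),
]

# Merge all the shapes into one geometry; disjoint polygons become a MultiPolygon.
merged = unary_union(regions)

# With a GeoDataFrame `gdf`, the same merge would be `unary_union(list(gdf.geometry))`.
# The merged geometry can then be passed as the `polygon` argument, for example:
# result = spatialdata.polygon_query(sdata, polygon=merged, target_coordinate_system="global")
```

Querying once with the merged geometry selects the data that falls in any of the input shapes.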
- spatialdata.get_values(value_key, element=None, sdata=None, element_name=None, table_name=None, table_layer=None, return_obsm_as_is=False)
Get the values from the element, from any location: df columns, obs or var columns (table).
- Parameters:
value_key (str | list[str]) – Name of the column/channel to get the values from.
element (DataArray | DataTree | GeoDataFrame | DataFrame | AnnData | None (default: None)) – SpatialElement object or AnnData table; either element or (sdata, element_name) must be provided.
sdata (SpatialData | None (default: None)) – SpatialData object; either element or (sdata, element_name) must be provided.
element_name (str | None (default: None)) – Name of the element; either element or (sdata, element_name) must be provided. If element is an AnnData table, element_name can also be provided to subset the AnnData table to only include the rows annotating element_name.
table_name (str | None (default: None)) – Name of the table to get the values from.
table_layer (str | None (default: None)) – Layer of the table to get the values from. If None, the values are taken from X.
return_obsm_as_is (bool (default: False)) – If the value is in obsm and return_obsm_as_is is True, the value of the key is returned as is; otherwise a dataframe is created and returned.
- Return type:
- Returns:
DataFrame with the values requested.
Notes
The index of the returned dataframe is the instance_key of the table for the specified element.
If the element is a labels element, the background label (0), if present, is not included in the returned dataframe.
- spatialdata.get_element_instances(element, return_background=False)
- spatialdata.get_element_instances(element, return_background=False)
- spatialdata.get_element_instances(element)
Get the instances (index values) of the SpatialElement.
- Parameters:
element (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement.
return_background (bool (default: False)) – If True, the background label (0) is included in the output.
- Return type:
Index
- Returns:
pd.Index with the instances (index values) of the SpatialElement.
- spatialdata.get_extent(e, coordinate_system='global', exact=True, has_images=True, has_labels=True, has_points=True, has_shapes=True, elements=None)
- spatialdata.get_extent(e, coordinate_system='global', exact=True, has_images=True, has_labels=True, has_points=True, has_shapes=True, elements=None)
- spatialdata.get_extent(e, coordinate_system='global', exact=True)
- spatialdata.get_extent(e, coordinate_system='global')
Get the extent (bounding box) of a SpatialData object or a SpatialElement.
- Parameters:
e (SpatialData | DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialData object or SpatialElement to compute the extent of.
- Return type:
dict[str, tuple[float, float]]
- Returns:
The bounding box description.
- min_coordinate
The minimum coordinate of the bounding box.
- max_coordinate
The maximum coordinate of the bounding box.
- axes
The names of the dimensions of the bounding box.
- exact
Whether the extent is computed exactly or not.
If True, the extent is computed exactly.
If False, an approximation that is faster to compute is given.
The approximation is guaranteed to contain all the data; see the notes for details.
- has_images
If True, images are included in the computation of the extent.
- has_labels
If True, labels are included in the computation of the extent.
- has_points
If True, points are included in the computation of the extent.
- has_shapes
If True, shapes are included in the computation of the extent.
- elements
If not None, only the elements with the given names are included in the computation of the extent.
Notes
The extent of a SpatialData object is the extent of the union of the extents of all its elements. The extent of a SpatialElement is the extent of the element in the coordinate system specified by the argument coordinate_system.
If exact is False, the extent of the SpatialElement is first computed before any transformation, and then transformed to the target coordinate system. This is faster than computing the exact extent, since the transformation is applied only to the extent of the untransformed data, as opposed to transforming the data and then computing the extent.
The exact and approximate extents are the same if the transformation does not contain any rotation or shear, or if the transformation is affine and all the corners of the extent of the untransformed data (bounding box corners) are part of the dataset itself. Note that this is always the case for raster data.
An extreme case is a dataset composed of the two points (0, 0) and (1, 1), rotated anticlockwise by 45 degrees. The exact extent is the bounding box [minx, miny, maxx, maxy] = [0, 0, 0, 1.414], while the approximate extent is the box [minx, miny, maxx, maxy] = [-0.707, 0, 0.707, 1.414].
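The two-point case can be checked with a short standalone sketch. It uses only the standard library and does not call get_extent; the helper names rotate and bounding_box are illustrative and not part of spatialdata.

```python
import math


def rotate(point, degrees):
    """Rotate a 2D point anticlockwise around the origin."""
    theta = math.radians(degrees)
    x, y = point
    return (
        x * math.cos(theta) - y * math.sin(theta),
        x * math.sin(theta) + y * math.cos(theta),
    )


def bounding_box(points):
    """Return [minx, miny, maxx, maxy] for a collection of 2D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs), max(ys)]


data = [(0.0, 0.0), (1.0, 1.0)]

# Exact: transform the data first, then compute the bounding box.
exact = bounding_box([rotate(p, 45) for p in data])

# Approximate: compute the bounding box first, then transform its four corners.
minx, miny, maxx, maxy = bounding_box(data)
corners = [(minx, miny), (maxx, miny), (minx, maxy), (maxx, maxy)]
approximate = bounding_box([rotate(c, 45) for c in corners])

print([round(v, 3) for v in exact])
print([round(v, 3) for v in approximate])
```

The approximate box is wider because the corners (1, 0) and (0, 1) of the untransformed bounding box are not data points, yet they are rotated along with the corners that are.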
- spatialdata.get_centroids(e, coordinate_system='global', return_background=False)
- spatialdata.get_centroids(e, coordinate_system='global', return_background=False)
- spatialdata.get_centroids(e, coordinate_system='global')
Get the centroids of the geometries contained in a SpatialElement, as a new Points element.
- Parameters:
e (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement. Only points, shapes (circles, polygons and multipolygons) and labels are supported.
coordinate_system (str (default: 'global')) – The coordinate system in which the centroids are computed.
return_background (bool (default: False)) – If True, the centroid of the background label (0) is included in the output.
- Return type:
Notes
For
Multipolygon.
- spatialdata.join_spatialelement_table(sdata=None, spatial_element_names=None, spatial_elements=None, table_name=None, table=None, how='left', match_rows='no')
Join SpatialElement(s) and a table together in an SQL-like manner.
The function allows the user to perform SQL-like joins of SpatialElements and a table. The elements are not returned together in one dataframe-like structure; instead, the filtered elements are returned. To determine matches, the index is used for the SpatialElement, and the region key column and instance key column are used for the table. The elements are not overwritten in the SpatialData object.
The following joins are supported: 'left', 'left_exclusive', 'inner', 'right' and 'right_exclusive'. In case of a 'left' join, the SpatialElements are returned in a dictionary as is, while the table is filtered to only include matching rows. In case of a 'left_exclusive' join, None is returned for the table, while the returned SpatialElements are filtered to only include indices not present in the table. The 'right' joins are symmetric to the 'left' joins. In case of an 'inner' join of SpatialElement(s) and a table, each returned element contains only the rows that are present in both the SpatialElement and the table.
For Points and Shapes elements, every valid value of the argument how is supported. For Labels elements, only the 'left' and 'right_exclusive' joins are supported. For Labels, the background label (0) is never included in the output.
- Parameters:
sdata (SpatialData | None (default: None)) – SpatialData object containing all the elements and tables. This parameter can be None; in that case, both the names and the values of the elements and the table must be provided.
spatial_element_names (str | list[str] | None (default: None)) – Required. The name(s) of the spatial elements to be joined with the table. If a list of names is passed and sdata is None, the indices must match those of the list of SpatialElements passed via the argument spatial_elements.
spatial_elements (DataArray | DataTree | GeoDataFrame | DataFrame | list[DataArray | DataTree | GeoDataFrame | DataFrame] | None (default: None)) – This parameter should be specified exactly when sdata is None. The SpatialElement(s) to be joined with the table. In case of a list of SpatialElements, the indices must match exactly those of the list passed as spatial_element_names.
table_name (str | None (default: None)) – The name of the table to join with the spatial elements. Optional; table can be provided instead.
table (AnnData | None (default: None)) – The table to join with the spatial elements. When sdata is not None, table_name can be used instead.
how (Literal['left', 'left_exclusive', 'inner', 'right', 'right_exclusive'] (default: 'left')) – The type of SQL-like join to perform. Options are 'left', 'left_exclusive', 'inner', 'right' and 'right_exclusive'.
match_rows (Literal['no', 'left', 'right'] (default: 'no')) – Whether to match the indices of the element and the table, and if so how. If 'left', the element indices take priority; if 'right', the table instance ids take priority.
- Return type:
tuple[dict[str, Any], AnnData]
- Returns:
A tuple containing the joined elements as a dictionary and the joined table as an AnnData object.
- Raises:
ValueError – If spatial_element_names is not provided.
ValueError – If sdata is None and spatial_elements is also None, or if sdata is not None but spatial_elements is provided.
ValueError – If table_name is provided but not present in the SpatialData object, or if table_name is provided but sdata is None.
ValueError – If not exactly one of table_name and table is provided.
ValueError – If no valid elements are provided for the join operation.
ValueError – If the provided join type is not supported.
ValueError – If an incorrect value is given for match_rows.
Notes
For a graphical representation of the join operations, see the Tables tutorial.
See also
match_element_to_table – Function to match elements to a table.
join_spatialelement_table – Function to join spatial elements with a table.
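The join types documented for join_spatialelement_table can be modeled on plain Python lists. The sketch below is a toy model of the documented semantics only, not the library's implementation: join_indices is a hypothetical helper, the element is reduced to its index values, the table is reduced to its instance ids, and the region key and match_rows are ignored.

```python
def join_indices(element_index, table_instance_ids, how="left"):
    """Return (element indices kept, table instance ids kept) for a join type."""
    element_set = set(element_index)
    table_set = set(table_instance_ids)
    if how == "left":
        # Element returned as is; table filtered to the matching rows.
        return list(element_index), [i for i in table_instance_ids if i in element_set]
    if how == "left_exclusive":
        # Element filtered to the indices missing from the table; no table returned.
        return [i for i in element_index if i not in table_set], None
    if how == "inner":
        # Both filtered to the indices present on both sides.
        return (
            [i for i in element_index if i in table_set],
            [i for i in table_instance_ids if i in element_set],
        )
    if how == "right":
        # Table returned as is; element filtered to the matching indices.
        return [i for i in element_index if i in table_set], list(table_instance_ids)
    if how == "right_exclusive":
        # Table filtered to the rows missing from the element; no element returned.
        return None, [i for i in table_instance_ids if i not in element_set]
    raise ValueError(f"Unsupported join type: {how!r}")


element_index = [1, 2, 3]
table_instance_ids = [2, 3, 4]
for how in ("left", "left_exclusive", "inner", "right", "right_exclusive"):
    print(how, join_indices(element_index, table_instance_ids, how))
```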
- spatialdata.match_element_to_table(sdata, element_name, table_name)
Filter the elements and make the indices match those in the table.
- Parameters:
sdata (SpatialData) – SpatialData object.
element_name (str | list[str]) – The name(s) of the spatial elements to be joined with the table. Not supported for Labels elements.
table_name (str) – The name of the table to join with the spatial elements.
- Return type:
tuple[dict[str, Any], AnnData]
- Returns:
A tuple containing the joined elements as a dictionary and the joined table as an AnnData object.
Notes
For a graphical representation of the join operations, see the Tables tutorial.
See also
match_table_to_element – Function to match a table to a spatial element.
join_spatialelement_table – General function to join spatial elements with a table, with more control.
- spatialdata.match_table_to_element(sdata, element_name, table_name='table')
Filter the table and reorder its rows to match the instances (rows/labels) of the specified SpatialElement.
- Parameters:
sdata (SpatialData) – SpatialData object.
element_name (str) – The name of the spatial element to be joined with the table.
table_name (str (default: 'table')) – The name of the table to match to the element.
- Return type:
- Returns:
Table with the rows matching the instances of the element.
Notes
For a graphical representation of the join operations, see the Tables tutorial.
See also
match_element_to_table – Function to match a spatial element to a table.
join_spatialelement_table – General function to join spatial elements with a table, with more control.
- spatialdata.match_sdata_to_table(sdata, table_name, table=None, how='right')
Filter the elements of a SpatialData object to match only the rows present in the table.
- Parameters:
sdata (SpatialData) – SpatialData object containing all the elements and tables.
table (AnnData | None (default: None)) – The table to join with the spatial elements. Has precedence over table_name.
table_name (str) – The name of the table to join with the SpatialData object if table is not provided. If table is provided, table_name is used to name the table in the returned SpatialData object.
how (Literal['left', 'left_exclusive', 'inner', 'right', 'right_exclusive'] (default: 'right')) – The type of join to perform. See spatialdata.join_spatialelement_table(). Default is 'right'.
- Return type:
Notes
For a graphical representation of the join operations, see the Tables tutorial.
- spatialdata.filter_by_table_query(sdata, table_name, filter_tables=True, element_names=None, obs_expr=None, var_expr=None, x_expr=None, obs_names_expr=None, var_names_expr=None, layer=None, how='right')
Filter the SpatialData object based on a set of table queries.
- Parameters:
sdata (SpatialData) – The SpatialData object to filter.
table_name (str) – The name of the table to filter the SpatialData object by.
filter_tables (bool (default: True)) – If True (default), the table is filtered to only contain rows that annotate regions contained within element_names.
element_names (list[str] | None (default: None)) – The names of the elements to filter the SpatialData object by.
obs_expr (Expr | str | Series | Iterable[Expr | str | Series] | None (default: None)) – A Predicate or an iterable of annsel Predicates to filter anndata.AnnData.obs by.
var_expr (Expr | str | Series | Iterable[Expr | str | Series] | None (default: None)) – A Predicate or an iterable of annsel Predicates to filter anndata.AnnData.var by.
x_expr (Expr | str | Series | Iterable[Expr | str | Series] | None (default: None)) – A Predicate or an iterable of annsel Predicates to filter anndata.AnnData.X by.
obs_names_expr (Expr | str | Series | Iterable[Expr | str | Series] | None (default: None)) – A Predicate or an iterable of annsel Predicates to filter anndata.AnnData.obs_names by.
var_names_expr (Expr | str | Series | Iterable[Expr | str | Series] | None (default: None)) – A Predicate or an iterable of annsel Predicates to filter anndata.AnnData.var_names by.
layer (str | None (default: None)) – The layer of the anndata.AnnData to filter the SpatialData object by; only used with x_expr.
how (Literal['left', 'left_exclusive', 'inner', 'right', 'right_exclusive'] (default: 'right')) – The type of join to perform. See spatialdata.join_spatialelement_table(). Default is 'right'.
- Return type:
- Returns:
The filtered SpatialData object.
Notes
You can also use spatialdata.SpatialData.filter_by_table_query(), with the convenience that sdata is the current SpatialData object.
For a graphical representation of the join operations, see the Tables tutorial.
For more examples on table queries, see the Table queries tutorial.
- spatialdata.concatenate(sdatas, region_key=None, instance_key=None, concatenate_tables=False, obs_names_make_unique=True, modify_tables_inplace=False, merge_coordinate_systems_on_name=False, attrs_merge=None, **kwargs)
Concatenate a list of spatial data objects.
- Parameters:
sdatas (Iterable[SpatialData] | dict[str, SpatialData]) – The spatial data objects to concatenate. The names of the elements across the SpatialData objects must be unique. If they are not unique, you can pass a dictionary with the suffixes as keys and the spatial data objects as values. This will rename each SpatialElement to ensure uniqueness of names across SpatialData objects. See the notes for more details.
region_key (str | None (default: None)) – The key to use for the region column in the concatenated object. If None and all region_keys are the same, that region_key is used.
instance_key (str | None (default: None)) – The key to use for the instance column in the concatenated object. If None and all instance_keys are the same, that instance_key is used.
concatenate_tables (bool (default: False)) – Whether to merge the tables in case of having the same element name.
obs_names_make_unique (bool (default: True)) – Whether to make the obs_names unique by calling AnnData.obs_names_make_unique() on each table of the concatenated object. If you passed a dictionary with the suffixes as keys and the SpatialData objects as values, and if concatenate_tables is True, the obs_names will be made unique by adding the corresponding suffix instead.
modify_tables_inplace (bool (default: False)) – Whether to modify the tables in place. If True, the tables will be modified in place. If False, the tables will be copied before modification. Copying is enabled by default but can be disabled for performance reasons.
merge_coordinate_systems_on_name (bool (default: False)) – Whether to keep coordinate system names unchanged (True) or add suffixes (False).
attrs_merge (Union[Literal['same', 'unique', 'first', 'only'], Callable[[list[dict[Any, Any]]], dict[Any, Any]], None] (default: None)) – How the elements of .attrs are selected. Uses the same set of strategies as the uns_merge argument of [anndata.concat](https://anndata.readthedocs.io/en/latest/generated/anndata.concat.html).
kwargs (Any) – See anndata.concat() for more details.
- Return type:
- Returns:
The concatenated spatialdata.SpatialData object.
Notes
If you pass a dictionary with the suffixes as keys and the SpatialData objects as values, each SpatialElement will be renamed by adding the corresponding suffix, to ensure uniqueness of names across SpatialData objects. To preserve the matching between existing table annotations, the region metadata of each table and the values of the region_key column in each table will be altered by adding the suffix. In addition, the obs_names of each table will be altered (a suffix will be added). Finally, a suffix will be added to the name of each table if and only if rename_tables is False.
If you need more control over the renaming, please give us feedback, as we are still trying to find the right balance between ergonomics and control. You are also welcome to copy and adjust the code of _fix_ensure_unique_element_names() directly.
- spatialdata.transform(data, transformation=None, maintain_positioning=False, to_coordinate_system=None)
- spatialdata.transform(data, transformation=None, maintain_positioning=False, to_coordinate_system=None)
Transform a SpatialElement to a coordinate system using the given transformation, and return the transformed element.
- Parameters:
data (Any) – SpatialElement to transform.
transformation (BaseTransformation | None (default: None)) – The transformation to apply to the element. This parameter can be used only when maintain_positioning=True, otherwise to_coordinate_system must be used.
maintain_positioning (bool (default: False)) – The default and recommended behavior is to leave this parameter set to False.
- If True, in the transformed element, each transformation that was present in the original element will be prepended with the inverse of the transformation used to transform the data (i.e. the current transformation for which .transform() is called). In this way the data is transformed but the positioning (for each coordinate system) is maintained. A use case is changing the orientation/scale/etc. of the data but keeping the alignment of the data within each coordinate system.
- If False, the data is transformed and the positioning changes; only the coordinate system to which the data is transformed is kept. For raster data, the translation part of the transformation is assigned to the element (see the Notes below for more details). Furthermore, for raster data, the returned object will have a translation to account for the position of the pixel (0, 0). Also, rotated raster data will be padded in the corners with black; this padding will be reflected in the rotation. Please see the Notes for more details on how this parameter interacts with xarray.DataArray for raster data.
to_coordinate_system (str | None (default: None)) – The coordinate system to which the data should be transformed. The coordinate system must be present in the element.
- Return type:
Any
- Returns:
The transformed SpatialElement.
Notes
An affine transformation contains a linear transformation and a translation. For raster types, only the linear transformation is applied to the data (e.g. the data is rotated or resized), but not the translation part. This means that calling Translation(…).transform(raster_element) will have the same effect as prepending the translation to each transformation of the raster element (if maintain_positioning=True), or assigning this translation to the element in the new coordinate system (if maintain_positioning=False). Analogous considerations apply to the black corner padding due to the rotation part of the transformation. We are considering changing this behavior by letting translations modify the coordinates stored with xarray.DataArray; this is tracked here: scverse/spatialdata#308
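The idea behind maintain_positioning=True can be illustrated with one-dimensional affine maps. This is a simplified sketch for coordinate-like (non-raster) data, not spatialdata code: each map is a hypothetical (scale, offset) pair, and the helpers apply, inverse and compose are illustrative names.

```python
def apply(affine, x):
    """Apply the 1D affine map x -> scale * x + offset."""
    scale, offset = affine
    return scale * x + offset


def inverse(affine):
    """Return the inverse of a 1D affine map."""
    scale, offset = affine
    return (1.0 / scale, -offset / scale)


def compose(first, second):
    """Return the affine map that applies `first` and then `second`."""
    s1, o1 = first
    s2, o2 = second
    return (s2 * s1, s2 * o1 + o2)


to_global = (2.0, 3.0)  # made-up transformation stored in the element
applied = (0.5, 1.0)    # made-up transformation used to transform the data

x = 10.0
position_before = apply(to_global, x)

# Transform the data, and prepend the inverse to the stored transformation.
x_transformed = apply(applied, x)
new_to_global = compose(inverse(applied), to_global)
position_after = apply(new_to_global, x_transformed)

print(position_before, position_after)
```

The stored coordinate changes, but its position in the target coordinate system does not, which is what "the positioning is maintained" refers to.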
- spatialdata.rasterize(data, axes, min_coordinate, max_coordinate, target_coordinate_system, target_unit_to_pixels=None, target_width=None, target_height=None, target_depth=None, sdata=None, value_key=None, table_name=None, return_regions_as_labels=False, agg_func=None, return_single_channel=None)
Rasterize a SpatialData object or a SpatialElement (image, labels, points, shapes).
- Parameters:
data (SpatialData | DataArray | DataTree | GeoDataFrame | DataFrame | str) – The SpatialData object or SpatialElement to rasterize. Alternatively, the name of the SpatialElement in the SpatialData object, when the SpatialData object is passed to sdata.
axes (tuple[str, ...]) – The axes that min_coordinate and max_coordinate refer to.
min_coordinate (list[int | float] | ndarray[tuple[Any, ...], dtype[floating[Any]]]) – The minimum coordinates of the bounding box.
max_coordinate (list[int | float] | ndarray[tuple[Any, ...], dtype[floating[Any]]]) – The maximum coordinates of the bounding box.
target_coordinate_system (str) – The coordinate system in which the bounding box is defined. This will also be the coordinate system of the produced rasterized image.
target_unit_to_pixels (float | None (default: None)) – The number of pixels per unit that the target image should have. Exactly one of the following options must be specified: target_unit_to_pixels, target_width, target_height, target_depth.
target_width (float | None (default: None)) – The width of the target image in units. Exactly one of the following options must be specified: target_unit_to_pixels, target_width, target_height, target_depth.
target_height (float | None (default: None)) – The height of the target image in units. Exactly one of the following options must be specified: target_unit_to_pixels, target_width, target_height, target_depth.
target_depth (float | None (default: None)) – The depth of the target image in units. Exactly one of the following options must be specified: target_unit_to_pixels, target_width, target_height, target_depth.
sdata (SpatialData | None (default: None)) – SpatialData object containing the values to aggregate if value_key refers to values from a table. Must be None when data is a SpatialData object.
value_key (str | None (default: None)) – Name of the column containing the values to aggregate; can refer to either numerical or categorical values. The key can be:
- the name of a column(s) in the dataframe (Dask DataFrame for points or GeoDataFrame for shapes);
- the name of obs column(s) in the associated AnnData table (for points, shapes, and labels);
- the name of a var(s), referring to the column(s) of the X matrix in the table (for points, shapes, and labels).
See the notes for more details on the default behavior. Must be None when data is a SpatialData object.
table_name (str | None (default: None)) – The table optionally containing the value_key and the name of the table in the returned SpatialData object. Must be None when data is a SpatialData object, otherwise it assumes the default value of 'table'.
return_regions_as_labels (bool (default: False)) – By default, single-scale images of shape (c, y, x) are returned. If True, returns labels, shapes and points as labels of shape (y, x) as opposed to an image of shape (c, y, x). Images are always returned as images, and multiscale raster data is always returned as single-scale data.
agg_func (str | Reduction | None (default: None)) – Available only when rasterizing points and shapes. A reduction function from datashader (its name, or a Callable). See the notes for more details on the default behavior. Must be None when data is a SpatialData object.
return_single_channel (bool | None (default: None)) – Only used when rasterizing points and shapes and when value_key refers to a categorical column. If False, each category will be rasterized in a separate channel.
- Return type:
- Returns:
The rasterized SpatialData object or SpatialData-supported DataArray. Each SpatialElement will be rasterized into a DataArray (not a DataTree). So if a SpatialData object with elements is passed, a SpatialData object with single-scale images and labels will be returned.
When return_regions_as_labels is True, the returned DataArray object will have an attribute called label_index_to_category that maps the label index to the category name. You can access it via returned_data.attrs["label_index_to_category"]. The returned labels will start from 1 (0 is reserved for the background), and will be contiguous.
Notes
For images and labels, the parameters value_key, table_name, agg_func, and return_single_channel are not used.
Instead, when rasterizing shapes and points, the following table clarifies the default datashader reduction used for various combinations of parameters.
In particular, the first two rows refer to the default behavior when the parameters (value_key, table_name, return_single_channel, agg_func) are kept to their default values.

=========  ================  =====================  ====================  ==========
value_key  Shapes or Points  return_single_channel  datashader reduction  table_name
=========  ================  =====================  ====================  ==========
None*      Point (default)   NA                     count                 'table'
None**     Shapes (default)  True                   first                 'table'
None**     Shapes            False                  count_cat             'table'
category   NA                True                   first                 'table'
category   NA                False                  count_cat             'table'
int/float  NA                NA                     sum                   'table'
=========  ================  =====================  ====================  ==========

Explicitly, the default behaviors are as follows:
- for points, each pixel counts the number of points belonging to it (the count function is applied to an artificial column of ones);
- for shapes, each pixel gets a single index among the ones of the shapes that intersect it (the index of the shapes is interpreted as a categorical column and then the first function is used).
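The default count reduction for points can be pictured with a small standalone sketch. It is a conceptual illustration in plain Python and does not use datashader or spatialdata; the points, the bounding box and the pixels_per_unit value (standing in for target_unit_to_pixels) are made up, and details such as how points on the box boundary are handled are assumptions.

```python
import math

# Made-up points and bounding box, in the units of the target coordinate system.
points = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.5), (1.7, 1.9)]
min_x, min_y = 0.0, 0.0
max_x, max_y = 2.0, 2.0
pixels_per_unit = 1.0  # plays the role of target_unit_to_pixels

width = math.ceil((max_x - min_x) * pixels_per_unit)
height = math.ceil((max_y - min_y) * pixels_per_unit)

# counts[row][col] holds the number of points falling into that pixel.
counts = [[0] * width for _ in range(height)]
for x, y in points:
    if min_x <= x < max_x and min_y <= y < max_y:
        col = int((x - min_x) * pixels_per_unit)
        row = int((y - min_y) * pixels_per_unit)
        counts[row][col] += 1

for row in counts:
    print(row)
```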
- spatialdata.rasterize_bins(sdata, bins, table_name, col_key, row_key, value_key=None, return_region_as_labels=False)
Rasterizes grid-like binned shapes/points annotated by a table (e.g. Visium HD data).
- Parameters:
sdata (SpatialData) – The spatial data object containing the grid-like binned element to be rasterized.
bins (str) – The name of the SpatialElement that defines the grid-like bins.
table_name (str) – The name of the table annotating the SpatialElement.
col_key (str) – Name of a column in sdata[table_name].obs containing the column indices (integer) for the bins.
row_key (str) – Name of a column in sdata[table_name].obs containing the row indices (integer) for the bins.
value_key (str | list[str] | None (default: None)) – The key(s) (obs columns/var names) in the table that will be used to rasterize the bins. If None, all the var names will be used, and the returned object will be lazily constructed. Ignored if return_region_as_labels is True.
return_region_as_labels (bool (default: False)) – If False, this function returns an xarray.DataArray of shape (c, y, x), with the dimension of c equal to the number of key(s) specified in value_key, or to the number of var names in table_name if value_key is None. If True, it returns labels of shape (y, x), where each bin of the bins element is represented as a pixel. By default, the table is not set to annotate the new rasterized labels; this can be achieved using the helper function spatialdata.rasterize_bins_link_table_to_labels().
- Return type:
- Returns:
: A spatial image object created by rasterizing the specified bins from the spatial data.
Notes
Before calling this function you should ensure that the data geometries are organized in grid-like bins (e.g. Visium HD data, but not Visium data). You should also ensure that bin indices (integer) are defined in the .obs dataframe of the table associated with the spatial geometries. If variables from table.X are being rasterized (typically, gene counts), then table.X should be a csc_matrix (this can be done by calling sdata[table_name].X = sdata[table_name].X.tocsc()).
The returned image will have one pixel for each bin, and a coordinate transformation to map the image to the original data orientation. In particular, the bins of Visium HD data are in a grid that is slightly rotated; the coordinate transformation will adjust for this, so that the returned data is aligned to the original geometries.
If spatialdata-plot is used to visualize the returned image, the parameter scale='full' needs to be passed to .render_shapes(), to disable an automatic rasterization that would conflict with the rasterization performed here.
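The "one pixel for each bin" layout can be pictured with a small pure-Python sketch. It is illustrative only: bins_to_grid is a hypothetical helper, the offset by the minimum row/column index is an assumption, and the real function additionally attaches a coordinate transformation that is not modelled here.

```python
def bins_to_grid(rows, cols, values, fill=0):
    """Place one value per bin at its (row, col) index, giving one pixel per bin."""
    min_r, min_c = min(rows), min(cols)
    height = max(rows) - min_r + 1
    width = max(cols) - min_c + 1
    grid = [[fill] * width for _ in range(height)]
    for r, c, v in zip(rows, cols, values):
        grid[r - min_r][c - min_c] = v  # bins absent from the table keep the fill value
    return grid


# Hypothetical bin indices, standing in for the row_key and col_key columns of the table.
print(bins_to_grid(rows=[0, 0, 1], cols=[0, 1, 1], values=[5, 7, 9]))
# [[5, 7], [0, 9]]
```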
- spatialdata.rasterize_bins_link_table_to_labels(sdata, table_name, rasterized_labels_name)#
Change the annotation target of the table to the rasterized labels.
This function should be called after having rasterized the bins (calling rasterize_bins() with return_region_as_labels=True) and after having added the rasterized labels to the spatial data object.
- Parameters:
sdata (SpatialData) – The spatial data object containing the rasterized labels.
table_name (str) – The name of the table to be annotated.
rasterized_labels_name (str) – The name of the rasterized labels in the spatial data object.
- Return type:
None
- spatialdata.to_circles(data, radius=None)#
- spatialdata.to_circles(element, **kwargs)
- spatialdata.to_circles(element, **kwargs)
- spatialdata.to_circles(element, **kwargs)
- spatialdata.to_circles(element, radius=None)
Convert a set of geometries (2D/3D labels, 2D shapes) to approximated circles/spheres.
- Parameters:
data (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement representing the geometries to approximate as circles/spheres.
radius (float | ndarray[tuple[Any, ...], dtype[floating[Any]]] | None (default: None)) – Radius/radii for the circles. For points elements, radius can either be specified as an argument, or be a column of the dataframe. For non-points elements, radius must be None.
- Return type:
- Returns:
: The approximated circles/spheres.
Notes
The approximation is done by computing the centroids and the area/volume of the geometries. The geometries are then replaced by circles/spheres with the same centroids and area/volume.
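Under the description above, the radius of each replacement circle or sphere follows from inverting the area or volume formula. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of spatialdata:

```python
import math


def equivalent_circle_radius(area):
    """Radius of the circle whose area equals `area` (area = pi * r**2)."""
    return math.sqrt(area / math.pi)


def equivalent_sphere_radius(volume):
    """Radius of the sphere whose volume equals `volume` (volume = 4/3 * pi * r**3)."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


# A 2D geometry with area 50.0 is replaced by a circle of this radius at its centroid.
print(equivalent_circle_radius(50.0))
# A 3D label with volume 100.0 is replaced by a sphere of this radius at its centroid.
print(equivalent_sphere_radius(100.0))
```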
- spatialdata.to_polygons(data, buffer_resolution=None)#
- spatialdata.to_polygons(element, **kwargs)
- spatialdata.to_polygons(element, **kwargs)
- spatialdata.to_polygons(gdf, buffer_resolution=16)
- spatialdata.to_polygons(element, **kwargs)
Convert a set of geometries (2D labels, 2D shapes) to approximated 2D polygons/multipolygons.
For optimal performance when converting rasters (xarray.DataArray or datatree.DataTree) to polygons, it is recommended to configure Dask to use ‘processes’ rather than ‘threads’. For example, you can set this configuration with:
>>> import dask
>>> dask.config.set(scheduler='processes')
- Parameters:
data (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement representing the geometries to approximate as 2D polygons/multipolygons.
buffer_resolution (int | None (default: None)) – Used only when constructing polygons from circles. Value of the resolution parameter for the buffer() internal call.
- Return type:
- Returns:
: The approximated 2D polygons/multipolygons in the specified coordinate system.
- spatialdata.aggregate(values, by, values_sdata=None, by_sdata=None, value_key=None, agg_func='sum', target_coordinate_system='global', fractions=False, region_key='region', instance_key='instance_id', deepcopy=True, table_name=None, buffer_resolution=16, **kwargs)#
Aggregate values by given region.
- Parameters:
values_sdata (SpatialData | None (default: None)) – SpatialData object containing the values to aggregate: if None, values must be a SpatialElement; if not None, values must be a string.
values (DataFrame | GeoDataFrame | DataArray | DataTree | str) – The values to aggregate: if values_sdata is None, must be a SpatialElement, otherwise must be a string specifying the name of the SpatialElement in values_sdata.
by_sdata (SpatialData | None (default: None)) – Regions to aggregate by: if None, by must be a SpatialElement; if not None, by must be a string.
by (GeoDataFrame | DataArray | DataTree | str) – The regions to aggregate by: if by_sdata is None, must be a SpatialElement, otherwise must be a string specifying the name of the SpatialElement in by_sdata.
value_key (list[str] | str | None (default: None)) – Name (or list of names) of the columns containing the values to aggregate; can refer to either numerical or categorical values. If the values are categorical, value_key can’t be a list.
The key can be:
- the name of a column(s) in the dataframe (Dask DataFrame for points or GeoDataFrame for shapes);
- the name of obs column(s) in the associated AnnData table (for points, shapes and labels);
- the name of a var(s), referring to the column(s) of the X matrix in the table (for points, shapes and labels).
If nothing is passed here, it defaults to the equivalent of a column of ones. Defaults to FEATURE_KEY for points (if present).
agg_func (str | list[str] (default: 'sum')) – Aggregation function to apply over point values, e.g. "mean", "sum", "count". Passed to pandas.DataFrame.groupby.agg() or to xrspatial.zonal_stats() according to the type of values.
target_coordinate_system (str (default: 'global')) – Coordinate system to transform to before aggregating.
fractions (bool (default: False)) – Adjusts for partial area overlap between regions in values and by. More precisely: when a region in by partially overlaps with a region in values, this setting specifies whether the value to aggregate should be taken as it is (fractions = False) or multiplied by the following ratio: “area of the intersection between the two regions” / “area of the region in values”.
Additional details:
- default is fractions = False;
- when aggregating points this parameter must be left to False, as the points don’t have area (otherwise a table of zeros would be obtained);
- for categorical values "count" and "sum" are equivalent when fractions = False, but when fractions = True, "count" and "sum" are different: "count" would not give meaningful results and so it’s not allowed, while "sum" actually sums the values of the intersecting regions, and should therefore be used;
- aggregating categorical values with agg_func = "mean" is not allowed as it does not give meaningful results.
region_key (str (default: 'region')) – Name that will be given to the new region column in the returned aggregated table.
instance_key (str (default: 'instance_id')) – Name that will be given to the new instance id column in the returned aggregated table.
deepcopy (bool (default: True)) – Whether to deepcopy the shapes in the returned SpatialData object. If the shapes are large (e.g. large multiscale labels), you may consider disabling the deepcopy to use a lazy Dask representation.
table_name (str | None (default: None)) – The table optionally containing the value_key and the name of the table in the returned SpatialData object.
buffer_resolution (int (default: 16)) – Resolution parameter to pass to the .buffer() method to convert circles to polygons. A higher value results in a more accurate representation of the circle, but also in a more complex polygon and computation.
kwargs (Any) – Additional keyword arguments to pass to xrspatial.zonal_stats().
- Return type:
- Returns:
: Returns a SpatialData object with the by shapes as SpatialElement and a table with the aggregated values annotating the shapes.
If value_key refers to a categorical variable, the table in the SpatialData object has shape (by.shape[0], <n categories>).
Notes
This function returns a SpatialData object, so to access the aggregated table you can use the table attribute.
The shapes in the returned SpatialData object are a reference to the original ones. If you want them to be a different object you can do a deepcopy manually (this loads the data into memory), or you can save the SpatialData object to disk and reload it (this keeps the data lazily represented).
When aggregating points by shapes, the current implementation loads all the points into memory and thus could lead to large memory usage. The GitHub issue scverse/spatialdata#210 keeps track of the changes required to address this behavior.
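The ratio used by the fractions parameter can be checked with plain arithmetic. The sketch below only illustrates the weighting of a single value; the function name and the example areas are hypothetical, and the geometric intersection is assumed to be computed elsewhere.

```python
def weighted_contribution(value, intersection_area, values_region_area, fractions=False):
    """Contribution of one region in `values` to one overlapping region in `by`."""
    if not fractions:
        return value  # the value is taken as it is
    # scale by "area of the intersection" / "area of the region in values"
    return value * intersection_area / values_region_area


# A region with value 10.0 and area 8.0 overlaps a `by` region over an area of 2.0.
print(weighted_contribution(10.0, 2.0, 8.0, fractions=False))  # 10.0
print(weighted_contribution(10.0, 2.0, 8.0, fractions=True))   # 2.5
```

With zero-area geometries such as points the intersection area is zero, which matches the remark that fractions = True would produce a table of zeros.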
- spatialdata.map_raster(data, func, func_kwargs=mappingproxy({}), blockwise=True, depth=None, chunks=None, c_coords=None, dims=None, transformations=None, relabel=True, **kwargs)#
Apply a callable to raster data.
Applies a func callable to raster data. If blockwise is set to True, distributed processing will be achieved with:
- dask.array.map_overlap(), if depth is not None
- dask.array.map_blocks(), if depth is None
otherwise func is applied to the full data.
- Parameters:
data (DataArray | DataTree) – The data to process. It can be an xarray.DataArray or a datatree.DataTree. If it’s a DataTree, the callable is applied to the first scale (scale0, the full-resolution data).
func (Callable[[Array], Array]) – The callable that is applied to the data.
func_kwargs (Mapping[str, Any] (default: mappingproxy({}))) – Additional keyword arguments to pass to the callable func.
blockwise (bool (default: True)) – If True, func will be distributed with dask.array.map_overlap() or dask.array.map_blocks(), otherwise func is applied to the full data. If False, depth, chunks and kwargs are ignored.
depth (int | tuple[int, ...] | dict[int, int] | None (default: None)) – Specifies the overlap between chunks, i.e. the number of elements that each chunk should share with its neighboring chunks. If not None, distributed processing will be achieved with dask.array.map_overlap(), otherwise with dask.array.map_blocks().
chunks (tuple[tuple[int, ...], ...] | None (default: None)) – Chunk shape of resulting blocks if the callable does not preserve the data shape. For example, if the input block has shape (3, 100, 100) and the resulting block after the map_raster call has shape (1, 100, 100), the argument chunks should be passed accordingly. Passed to dask.array.map_overlap() or dask.array.map_blocks(). Ignored if blockwise is False.
c_coords (Iterable[int] | Iterable[str] | None (default: None)) – The channel coordinates for the output data. If not provided, the channel coordinates of the input data are used. If the callable func is expected to change the number of channel coordinates, this argument should be provided, otherwise it will default to range(len(output_coords)).
dims (tuple[str, ...] | None (default: None)) – The dimensions of the output data. If not provided, the dimensions of the input data are used. It must be specified if the callable changes the data dimensions, e.g. ('c', 'y', 'x') -> ('y', 'x').
transformations (dict[str, Any] | None (default: None)) – The transformations of the output data. If not provided, the transformations of the input data are copied to the output data. It should be specified if the callable changes the data transformations.
relabel (bool (default: True)) – Whether to relabel the blocks of the output data. This option is ignored when the output data is not a labels layer (i.e., when dims contains c). It is recommended to enable relabeling if func returns labels that are not unique across chunks. Relabeling will be done by performing a bit shift. When a cell or entity to be labeled is split between two adjacent chunks, the current implementation does not assign the same label across blocks. See scverse/spatialdata#664 for discussion.
kwargs (Any) – Additional keyword arguments to pass to dask.array.map_overlap() or dask.array.map_blocks(). Ignored if blockwise is set to False.
- Return type:
- Returns:
: The processed data as an xarray.DataArray.
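The relabel option is described as working through a bit shift. One way such a scheme can make per-chunk labels globally unique is sketched below in pure Python; the exact encoding used by map_raster is not given here, so the function name, the choice of shift width, and the placement of the block id in the low bits are all assumptions.

```python
def relabel_block(labels, block_id, num_blocks):
    """Make labels unique across blocks by packing the block id into the low bits."""
    # number of bits needed to store any block id in range(num_blocks)
    shift = max(1, (num_blocks - 1).bit_length())
    # background (0) is left untouched; other labels are shifted and tagged
    return [(lab << shift) | block_id if lab != 0 else 0 for lab in labels]


# Two chunks out of four that both use the local labels 1 and 2.
print(relabel_block([0, 1, 2], block_id=0, num_blocks=4))  # [0, 4, 8]
print(relabel_block([0, 1, 2], block_id=1, num_blocks=4))  # [0, 5, 9]
```

Note that an object split across two chunks still ends up with two different labels under this scheme, which is consistent with the limitation mentioned for relabel.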
- spatialdata.unpad_raster(raster)#
Remove padding from a raster that may have been added by the rotation component of a transformation.
- spatialdata.relabel_sequential(arr)#
Relabels integers in a Dask array sequentially.
This function assigns sequential labels to the integers in a Dask array, starting from 1. For example, if the unique values in the input array are [0, 9, 5], the relabeled array contains the values [0, 1, 2]. Note that currently, if a cell or entity to be labeled is split across adjacent chunks, the same label is not assigned to it across blocks. See the discussion in scverse/spatialdata#664.
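The idea of sequential relabeling can be sketched on a plain Python list. This is not the Dask implementation: the helper name is hypothetical, and ranking the labels in ascending order (with 0 always kept as 0) is an assumption of this sketch.

```python
def relabel_sequential_sketch(values):
    """Map the distinct labels to 0, 1, 2, ... while keeping 0 as background."""
    unique = sorted(set(values) | {0})  # 0 always takes rank 0
    mapping = {old: new for new, old in enumerate(unique)}
    return [mapping[v] for v in values]


print(relabel_sequential_sketch([0, 9, 5, 9]))  # [0, 2, 1, 2]
```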
- spatialdata.are_extents_equal(extent0, extent1, atol=0.1)#
Check if two data extents, as returned by get_extent(), are equal up to approximation errors.
- Parameters:
extent0 (dict[str, tuple[float, float]]) – The first data extent.
extent1 (dict[str, tuple[float, float]]) – The second data extent.
atol (float (default: 0.1)) – The absolute tolerance to use when comparing the extents.
- Return type:
bool
- Returns:
: Whether the extents are equal or not.
Notes
The default value of atol is currently high because of a bug in rasterize() that makes the extent of the rasterized data slightly different from the extent of the original data. This bug is tracked in scverse/spatialdata#165.
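The comparison can be pictured as a per-axis check of the minimum and maximum values against the tolerance. The sketch below is a simplified stand-in with a hypothetical name; it uses a purely absolute tolerance, which may differ in detail from the library's own comparison.

```python
def extents_close(extent0, extent1, atol=0.1):
    """Compare two {axis: (min, max)} extents up to an absolute tolerance."""
    if extent0.keys() != extent1.keys():
        return False  # different axes can never match
    return all(
        abs(a - b) <= atol
        for axis in extent0
        for a, b in zip(extent0[axis], extent1[axis])
    )


original = {"x": (0.0, 10.0), "y": (0.0, 5.0)}
rasterized = {"x": (0.05, 9.96), "y": (0.0, 5.0)}  # hypothetical, slightly shifted extent
print(extents_close(original, rasterized))             # True
print(extents_close(original, rasterized, atol=0.01))  # False
```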
- spatialdata.deepcopy(element)#
- spatialdata.deepcopy(sdata)
- spatialdata.deepcopy(element)
- spatialdata.deepcopy(element)
- spatialdata.deepcopy(gdf)
- spatialdata.deepcopy(df)
- spatialdata.deepcopy(adata)
Deepcopy a SpatialData or SpatialElement object.
Deepcopy will load the data in memory. Using this function for large Dask-backed objects is discouraged. In that case, please save the SpatialData object to a different disk location and read it back again.
- Parameters:
element (SpatialData | DataArray | DataTree | GeoDataFrame | DataFrame | AnnData) – The SpatialData or SpatialElement object to deepcopy.
- Return type:
SpatialData | DataArray | DataTree | GeoDataFrame | DataFrame | AnnData
- Returns:
: A deepcopy of the SpatialData or SpatialElement object.
Notes
The order of the columns of a deepcopied points element may differ from the original one; see scverse/spatialdata#486 for details.
- spatialdata.get_pyramid_levels(image, attr=None, n=None)#
Access the data/attribute of the pyramid levels of a multiscale spatial image.
- Parameters:
image (DataTree) – The multiscale spatial image.
attr (str | None (default: None)) – If None, return the data of the pyramid level as a DataArray; if not None, return the specified attribute within the DataArray data.
n (int | None (default: None)) – If not None, return only the n-th pyramid level.
- Return type:
list[Any] | Any
- Returns:
: The pyramid levels data (or an attribute of it) as a list or a generator.
- spatialdata.sanitize_name(name, is_dataframe_column=False)#
Sanitize a name to comply with SpatialData naming rules.
This function converts invalid names into valid ones by:
1. Converting to string if not already
2. Removing invalid characters
3. Handling special cases like “__” prefix
4. Ensuring the name is not empty
5. Handling special cases for dataframe columns
See a discussion on the naming rules, and how to avoid naming collisions, here: scverse/spatialdata#707
- Parameters:
name (str) – The name to sanitize.
is_dataframe_column (bool (default: False)) – Whether this name is for a dataframe column (additional restrictions apply).
- Return type:
str
- Returns:
: A sanitized version of the name that complies with SpatialData naming rules. If a sanitized name cannot be generated, it returns “unnamed”.
Examples
>>> sanitize_name("my@invalid#name")
'my_invalid_name'
>>> sanitize_name("__private")
'private'
>>> sanitize_name("_index", is_dataframe_column=True)
'index'
- spatialdata.sanitize_table(data, inplace=True)#
Sanitize all keys in an AnnData table to comply with SpatialData naming rules.
This function sanitizes all keys in obs, var, obsm, obsp, varm, varp, uns, and layers while maintaining case-insensitive uniqueness. It can either modify the table in-place or return a new sanitized copy.
See a discussion on the naming rules here: scverse/spatialdata#707
- Parameters:
data (AnnData) – The AnnData table to sanitize.
inplace (bool (default: True)) – Whether to modify the table in-place or return a new copy.
- Return type:
AnnData | None
- Returns:
: If inplace is False, returns a new AnnData object with sanitized keys. If inplace is True, returns None as the original object is modified.
Examples
>>> import anndata as ad
>>> import pandas as pd
>>> adata = ad.AnnData(obs=pd.DataFrame({"@invalid#": [1, 2]}))
>>> # Create a new sanitized copy
>>> sanitized = sanitize_table(adata, inplace=False)
>>> print(sanitized.obs.columns)
Index(['invalid_'], dtype='object')
>>> # Or modify in-place
>>> sanitize_table(adata, inplace=True)
>>> print(adata.obs.columns)
Index(['invalid_'], dtype='object')