Operations
Operations on SpatialData objects.
- spatialdata.bounding_box_query(element, axes, min_coordinate, max_coordinate, target_coordinate_system, return_request_only=False, filter_table=True, **kwargs)
Query a SpatialData object or SpatialElement within a bounding box.
This function can also be accessed as a method of a SpatialData object, via sdata.query.bounding_box(...), without specifying element.
- Parameters:
element (DataArray | DataTree | GeoDataFrame | DataFrame | SpatialData) – The SpatialElement or SpatialData object to query.
axes (tuple[str, ...]) – The axes that min_coordinate and max_coordinate refer to.
min_coordinate (list[int | float] | ndarray[Any, dtype[floating[Any]]]) – The upper left hand corners of the bounding boxes (i.e., minimum coordinates along all dimensions). Shape: (n_boxes, n_axes) or (n_axes,) for a single box.
max_coordinate (list[int | float] | ndarray[Any, dtype[floating[Any]]]) – The lower right hand corners of the bounding boxes (i.e., the maximum coordinates along all dimensions). Shape: (n_boxes, n_axes).
target_coordinate_system (str) – The coordinate system the bounding box is defined in.
filter_table (bool (default: True)) – If True, the table is filtered to only contain rows that are annotating regions contained within the bounding box.
return_request_only (bool (default: False)) – If True, the function returns the bounding box coordinates in the target coordinate system. Only valid with DataArray and DataTree elements.
- Return type:
DataArray | DataTree | GeoDataFrame | DataFrame | SpatialData | None
- Returns:
The SpatialData object or SpatialElement containing the requested data. Any empty elements are omitted from the returned SpatialData object.
Notes
If the object has a points element, the query may suffer from performance issues, depending on the number of points. Please consider filtering the object before calling this function by calling the subset() method of SpatialData.
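The shape convention for min_coordinate and max_coordinate can be illustrated with a small pure-Python sketch. This is not the library's implementation: the helper points_in_boxes is hypothetical, and treating both bounds as inclusive is an assumption made only for the illustration.

```python
def points_in_boxes(points, min_coordinate, max_coordinate):
    """Return, for each box, the points that lie inside it.

    Hypothetical helper; bounds are treated as inclusive (an assumption).
    """
    # A single box may be given with shape (n_axes,); wrap it to (1, n_axes).
    if not isinstance(min_coordinate[0], (list, tuple)):
        min_coordinate, max_coordinate = [min_coordinate], [max_coordinate]
    results = []
    for lo, hi in zip(min_coordinate, max_coordinate):
        inside = [
            p for p in points
            if all(l <= c <= h for c, l, h in zip(p, lo, hi))
        ]
        results.append(inside)
    return results


points = [(0.5, 0.5), (2.0, 3.0), (5.0, 5.0)]

# One box, shape (n_axes,)
single = points_in_boxes(points, [0, 0], [1, 1])

# Two boxes, shape (n_boxes, n_axes)
multi = points_in_boxes(points, [[0, 0], [1, 2]], [[1, 1], [6, 6]])
```

Each box yields its own list of matching points, so a single box gives a result of length one.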
- spatialdata.polygon_query(element, polygon, target_coordinate_system, filter_table=True, clip=False, shapes=True, points=True, images=True, labels=True)
Query a SpatialData object or a SpatialElement by a polygon or multipolygon.
This function can also be accessed as a method of a SpatialData object, via sdata.query.polygon(...), without specifying element.
- Parameters:
element (DataArray | DataTree | GeoDataFrame | DataFrame | SpatialData) – The SpatialElement or SpatialData object to query.
polygon (Polygon | MultiPolygon) – The polygon/multipolygon to query by.
target_coordinate_system (str) – The coordinate system of the polygon/multipolygon.
filter_table (bool (default: True)) – Specifies whether to filter the tables to only include tables that annotate elements in the retrieved SpatialData object of the query.
clip (bool (default: False)) – If True, the shapes are clipped to the polygon. This behavior is implemented only when querying polygons/multipolygons or circles, and it is ignored for other types of elements (images, labels, points). Importantly, when clipping is enabled, the circles will be converted to polygons before the clipping. This may affect downstream operations that rely on the circle radius or on performance, so it is recommended to disable clipping when querying circles or when querying a SpatialData object that contains circles.
shapes, points, images, labels – [Deprecated] These arguments are now ignored and will be removed. Please filter the SpatialData object before calling this function.
- Return type:
DataArray | DataTree | GeoDataFrame | DataFrame | SpatialData | None
- Returns:
The queried SpatialData object or SpatialElement containing the requested data. Any empty elements are omitted from the returned SpatialData object.
Examples
An example of the multipolygon use case: if you have a sequence of polygons/multipolygons, in particular a GeoDataFrame, and you want to query the data that belongs to any one of these shapes, you can call this function on the multipolygon obtained by merging all the polygons. To merge them, you can use a unary union.
- spatialdata.get_values(value_key, element=None, sdata=None, element_name=None, table_name=None, table_layer=None, return_obsm_as_is=False)
Get the values from the element, from any location: df columns, obs or var columns (table).
- Parameters:
value_key (str | list[str]) – Name of the column/channel to get the values from.
element (Union[DataArray, DataTree, GeoDataFrame, DataFrame, AnnData, None] (default: None)) – SpatialElement object or AnnData table; either element or (sdata, element_name) must be provided.
sdata (Optional[SpatialData] (default: None)) – SpatialData object; either element or (sdata, element_name) must be provided.
element_name (Optional[str] (default: None)) – Name of the element; either element or (sdata, element_name) must be provided. If element is an AnnData table, element_name can also be provided to subset the AnnData table to only include those rows annotating the element_name.
table_name (Optional[str] (default: None)) – Name of the table to get the values from.
table_layer (Optional[str] (default: None)) – Layer of the table to get the values from. If None, the values are taken from X.
return_obsm_as_is (bool (default: False)) – If the value is in obsm, the value of the key is returned as is when return_obsm_as_is is True; otherwise a dataframe is created and returned.
- Return type:
- Returns:
DataFrame with the values requested.
Notes
The index of the returned dataframe is the instance_key of the table for the specified element.
If the element is a labels element, the background (0), if present, is not included in the dataframe of returned values.
- spatialdata.get_element_instances(element, return_background=False)
Get the instances (index values) of the SpatialElement.
- Parameters:
element (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement.
return_background (bool (default: False)) – If True, the background label (0) is included in the output.
- Return type:
Index
- Returns:
pd.Index with the instances (index values) of the SpatialElement.
- spatialdata.get_extent(e, coordinate_system='global', exact=True, has_images=True, has_labels=True, has_points=True, has_shapes=True, elements=None)
Get the extent (bounding box) of a SpatialData object or a SpatialElement.
- Parameters:
e (SpatialData | DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialData object or SpatialElement to compute the extent of.
- Return type:
dict[str, tuple[float, float]]
- Returns:
The bounding box description.
- min_coordinate
The minimum coordinate of the bounding box.
- max_coordinate
The maximum coordinate of the bounding box.
- axes
The names of the dimensions of the bounding box.
- exact
Whether the extent is computed exactly or not. If True, the extent is computed exactly. If False, an approximation that is faster to compute is given. The approximation is guaranteed to contain all the data, see notes for details.
- has_images
If True, images are included in the computation of the extent.
- has_labels
If True, labels are included in the computation of the extent.
- has_points
If True, points are included in the computation of the extent.
- has_shapes
If True, shapes are included in the computation of the extent.
- elements
If not None, only the elements with the given names are included in the computation of the extent.
Notes
The extent of a SpatialData object is the extent of the union of the extents of all its elements. The extent of a SpatialElement is the extent of the element in the coordinate system specified by the argument coordinate_system.
If exact is False, first the extent of the SpatialElement before any transformation is computed. Then, the extent is transformed to the target coordinate system. This is faster than computing the extent after the transformation, since the transformation is applied to the extent of the untransformed data, as opposed to transforming the data and then computing the extent.
The exact and approximate extent are the same if the transformation does not contain any rotation or shear, or in the case in which the transformation is affine but all the corners of the extent of the untransformed data (bounding box corners) are part of the dataset itself. Note that this is always the case for raster data.
An extreme case is a dataset composed of the two points (0, 0) and (1, 1), rotated anticlockwise by 45 degrees. The exact extent is the bounding box [minx, miny, maxx, maxy] = [0, 0, 0, 1.414], while the approximate extent is the box [minx, miny, maxx, maxy] = [-0.707, 0, 0.707, 1.414].
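The two-point case from the notes can be checked numerically. The following is a minimal sketch using only the standard library; the helpers rotate and extent are hypothetical names introduced for illustration and are not part of spatialdata.

```python
import math


def rotate(point, degrees):
    """Rotate a 2D point anticlockwise around the origin."""
    t = math.radians(degrees)
    x, y = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))


def extent(points):
    """Axis-aligned bounding box as [minx, miny, maxx, maxy]."""
    xs, ys = zip(*points)
    return [min(xs), min(ys), max(xs), max(ys)]


data = [(0.0, 0.0), (1.0, 1.0)]

# Exact: transform the data first, then compute the bounding box.
exact = extent([rotate(p, 45) for p in data])

# Approximate: compute the bounding box of the untransformed data,
# transform its four corners, then take the bounding box of those.
minx, miny, maxx, maxy = extent(data)
corners = [(minx, miny), (maxx, miny), (minx, maxy), (maxx, maxy)]
approx = extent([rotate(c, 45) for c in corners])
```

The approximate box contains the exact one, and it is wider along x because the two box corners that are not data points, (1, 0) and (0, 1), are rotated as well.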
- spatialdata.get_centroids(e, coordinate_system='global', return_background=False)
Get the centroids of the geometries contained in a SpatialElement, as a new Points element.
- Parameters:
e (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement. Only points, shapes (circles, polygons and multipolygons) and labels are supported.
coordinate_system (str (default: 'global')) – The coordinate system in which the centroids are computed.
return_background (bool (default: False)) – If True, the centroid of the background label (0) is included in the output.
- Return type:
DataFrame
Notes
For
Multipolygon
.
- spatialdata.join_spatialelement_table(sdata=None, spatial_element_names=None, spatial_elements=None, table_name=None, table=None, how='left', match_rows='no')
Join SpatialElement(s) and table together in SQL like manner.
The function allows the user to perform SQL like joins of SpatialElements and a table. The elements are not returned together in one dataframe-like structure, but instead filtered elements are returned. To determine matches, the index is used for the SpatialElement, and the region key column and instance key column are used for the table. The elements are not overwritten in the SpatialData object.
The following joins are supported: 'left', 'left_exclusive', 'inner', 'right' and 'right_exclusive'. In case of a 'left' join the SpatialElements are returned in a dictionary as is while the table is filtered to only include matching rows. In case of a 'left_exclusive' join None is returned for the table while the SpatialElements returned are filtered to only include indices not present in the table. The cases for 'right' joins are symmetric to the 'left' joins. In case of an 'inner' join of SpatialElement(s) and a table, for each SpatialElement an element is returned containing only the rows that are present in both the SpatialElement and the table.
For Points and Shapes elements every valid join for argument how is supported. For Labels elements only the 'left' and 'right_exclusive' joins are supported. For Labels, the background label (0) is not included in the output and it will not be returned.
- Parameters:
sdata (Optional[SpatialData] (default: None)) – SpatialData object containing all the elements and tables. This parameter can be None; in that case both the names and values for the elements and the table must be provided.
spatial_element_names (Union[list[str], str, None] (default: None)) – Required. The name(s) of the spatial elements to be joined with the table. If a list of names is given and sdata is None, the indices must match the list of SpatialElements passed via the argument spatial_elements.
spatial_elements (Union[DataArray, DataTree, GeoDataFrame, DataFrame, list[DataArray | DataTree | GeoDataFrame | DataFrame], None] (default: None)) – This parameter should be specified exactly when sdata is None. The SpatialElement(s) to be joined with the table. In case of a list of SpatialElements the indices must match exactly with the indices in the list of spatial_element_names.
table_name (Optional[str] (default: None)) – The name of the table to join with the spatial elements. Optional, table can be provided instead.
table (Optional[AnnData] (default: None)) – The table to join with the spatial elements. When sdata is not None, table_name can be used instead.
how (Literal['left', 'left_exclusive', 'inner', 'right', 'right_exclusive'] (default: 'left')) – The type of SQL like join to perform, default is 'left'. Options are 'left', 'left_exclusive', 'inner', 'right' and 'right_exclusive'.
match_rows (Literal['no', 'left', 'right'] (default: 'no')) – Whether to match the indices of the element and table and if so how. If 'left', element_indices take priority and if 'right' table instance ids take priority.
- Return type:
tuple[dict[str, Any], AnnData]
- Returns:
A tuple containing the joined elements as a dictionary and the joined table as an AnnData object.
- Raises:
ValueError – If spatial_element_names is not provided.
ValueError – If sdata is None but spatial_elements is not None; if sdata is not None, but spatial_elements is None.
ValueError – If table_name is provided but not present in the SpatialData object, or if table_name is provided but sdata is None.
ValueError – If not exactly one of table_name and table is provided.
ValueError – If no valid elements are provided for the join operation.
ValueError – If the provided join type is not supported.
ValueError – If an incorrect value is given for match_rows.
See also
match_element_to_table
Function to match elements to a table.
join_spatialelement_table
Function to join spatial elements with a table.
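The join types described for join_spatialelement_table can be sketched on plain index lists. This is only an illustration of the semantics as described above, not the library's code: join_indices is a hypothetical helper, and the reading of "symmetric" for the 'right' and 'right_exclusive' cases is an assumption.

```python
def join_indices(element_index, table_instances, how="left"):
    """Return (kept element indices, kept table instances) for one element.

    Hypothetical helper; None stands for "nothing is returned on this side".
    """
    in_table = set(table_instances)
    in_element = set(element_index)
    if how == "left":
        # Element as is, table filtered to matching rows.
        return list(element_index), [i for i in table_instances if i in in_element]
    if how == "left_exclusive":
        # Only element indices absent from the table; no table.
        return [i for i in element_index if i not in in_table], None
    if how == "inner":
        # Only what is present on both sides.
        return ([i for i in element_index if i in in_table],
                [i for i in table_instances if i in in_element])
    if how == "right":
        # Assumed mirror of 'left': table as is, element filtered.
        return [i for i in element_index if i in in_table], list(table_instances)
    if how == "right_exclusive":
        # Assumed mirror of 'left_exclusive': only unmatched table rows.
        return None, [i for i in table_instances if i not in in_element]
    raise ValueError(f"Unsupported join type: {how}")


element_index = [1, 2, 3, 4]
table_instances = [3, 4, 5]
joins = {how: join_indices(element_index, table_instances, how)
         for how in ("left", "left_exclusive", "inner", "right", "right_exclusive")}
```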
- spatialdata.match_element_to_table(sdata, element_name, table_name)
Filter the elements and make the indices match those in the table.
- Parameters:
sdata (SpatialData) – SpatialData object.
element_name (str | list[str]) – The name(s) of the spatial elements to be joined with the table. Not supported for Label elements.
table_name (str) – The name of the table to join with the spatial elements.
- Return type:
tuple[dict[str, Any], AnnData]
- Returns:
A tuple containing the joined elements as a dictionary and the joined table as an AnnData object.
See also
match_table_to_element
Function to match a table to a spatial element.
join_spatialelement_table
General function, to join spatial elements with a table with more control.
- spatialdata.match_table_to_element(sdata, element_name, table_name='table')
Filter the table and reorder the rows to match the instances (rows/labels) of the specified SpatialElement.
- Parameters:
sdata (SpatialData) – SpatialData object.
element_name (str) – The name of the spatial element to be joined with the table.
table_name (str (default: 'table')) – The name of the table to match to the element.
- Return type:
- Returns:
Table with the rows matching the instances of the element.
See also
match_element_to_table
Function to match a spatial element to a table.
join_spatialelement_table
General function, to join spatial elements with a table with more control.
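The "filter and reorder" behavior described for match_table_to_element can be pictured with plain Python containers. The sketch below is an assumption-laden illustration: match_rows_to_instances is a hypothetical helper, and a dict keyed by instance id stands in for the table.

```python
def match_rows_to_instances(table_rows, instances):
    """Keep only rows whose instance id is in `instances`, in that order.

    table_rows: dict mapping instance id -> row payload (stand-in for a table).
    instances: ordered instance ids of the element.
    """
    return [(i, table_rows[i]) for i in instances if i in table_rows]


table_rows = {3: "c", 1: "a", 2: "b", 9: "z"}
element_instances = [2, 1, 7]

matched = match_rows_to_instances(table_rows, element_instances)
```

Rows 3 and 9 are dropped because the element has no such instances, instance 7 has no row in the table, and the remaining rows follow the order of the element.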
- spatialdata.concatenate(sdatas, region_key=None, instance_key=None, concatenate_tables=False, obs_names_make_unique=True, modify_tables_inplace=False, attrs_merge=None, **kwargs)
Concatenate a list of spatial data objects.
- Parameters:
sdatas (Iterable[SpatialData] | dict[str, SpatialData]) – The spatial data objects to concatenate. The names of the elements across the SpatialData objects must be unique. If they are not unique, you can pass a dictionary with the suffixes as keys and the spatial data objects as values. This will rename each SpatialElement to ensure uniqueness of names across SpatialData objects. See the notes for more details.
region_key (Optional[str] (default: None)) – The key to use for the region column in the concatenated object. If None and all region_keys are the same, the region_key is used.
instance_key (Optional[str] (default: None)) – The key to use for the instance column in the concatenated object. If None and all instance_keys are the same, the instance_key is used.
concatenate_tables (bool (default: False)) – Whether to merge the tables in case of having the same element name.
obs_names_make_unique (bool (default: True)) – Whether to make the obs_names unique by calling AnnData.obs_names_make_unique() on each table of the concatenated object. If you passed a dictionary with the suffixes as keys and the SpatialData objects as values and if concatenate_tables is True, the obs_names will be made unique by adding the corresponding suffix instead.
modify_tables_inplace (bool (default: False)) – Whether to modify the tables in place. If True, the tables will be modified in place. If False, the tables will be copied before modification. Copying is enabled by default but can be disabled for performance reasons.
attrs_merge (Union[Literal['same', 'unique', 'first', 'only'], Callable[[list[dict[Any, Any]]], dict[Any, Any]], None] (default: None)) – How the elements of .attrs are selected. Uses the same set of strategies as the uns_merge argument of [anndata.concat](https://anndata.readthedocs.io/en/latest/generated/anndata.concat.html).
kwargs (Any) – See anndata.concat() for more details.
- Return type:
- Returns:
The concatenated spatialdata.SpatialData object.
Notes
If you pass a dictionary with the suffixes as keys and the SpatialData objects as values, each SpatialElement will be renamed by adding the corresponding suffix, to ensure uniqueness of names across SpatialData objects. To preserve the matching with existing table annotations, the region metadata of each table, and the values of the region_key column in each table, will be altered by adding the suffix. In addition, the obs_names of each table will be altered (a suffix will be added). Finally, a suffix will be added to the name of each table iff rename_tables is False.
If you need more control in the renaming, please give us feedback, as we are still trying to find the right balance between ergonomics and control. Also, you are welcome to copy and adjust the code of _fix_ensure_unique_element_names() directly.
- spatialdata.transform(data, transformation=None, maintain_positioning=False, to_coordinate_system=None)
Transform a SpatialElement using the transformation to a coordinate system, and return the transformed element.
- Parameters:
data (Any) – SpatialElement to transform.
transformation (Optional[BaseTransformation] (default: None)) – The transformation to apply to the element. This parameter can be used only when maintain_positioning=True, otherwise to_coordinate_system must be used.
maintain_positioning (bool (default: False)) – The default and recommended behavior is to leave this parameter set to False.
- If True, in the transformed element, each transformation that was present in the original element will be prepended with the inverse of the transformation used to transform the data (i.e. the current transformation for which .transform() is called). In this way the data is transformed but the positioning (for each coordinate system) is maintained. A use case is changing the orientation/scale/etc. of the data but keeping the alignment of the data within each coordinate system.
- If False, the data is transformed and the positioning changes; only the coordinate system to which the data is transformed is kept. For raster data, the translation part of the transformation is assigned to the element (see Notes below for more details). Furthermore, for raster data, the returned object will have a translation to account for the pixel (0, 0) position. Also, rotated raster data will be padded in the corners with a black color; such padding will be reflected into the rotation. Please see notes for more details of how this parameter interacts with xarray.DataArray for raster data.
to_coordinate_system (Optional[str] (default: None)) – The coordinate system to which the data should be transformed. The coordinate system must be present in the element.
- Return type:
Any
- Returns:
SpatialElement: Transformed SpatialElement.
Notes
An affine transformation contains a linear transformation and a translation. For raster types, only the linear transformation is applied to the data (e.g. the data is rotated or resized), but not the translation part. This means that calling Translation(…).transform(raster_element) will have the same effect as prepending the translation to each transformation of the raster element (if maintain_positioning=True), or assigning this translation to the element in the new coordinate system (if maintain_positioning=False). Analogous considerations apply to the black corner padding due to the rotation part of the transformation. We are considering changing this behavior by letting translations modify the coordinates stored with xarray.DataArray; this is tracked here: scverse/spatialdata#308
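The split of an affine transformation into a linear part and a translation, as used in the notes above, can be made concrete with a small sketch. It assumes a 3x3 homogeneous matrix acting on (x, y) points; the helpers split_affine and apply_affine are hypothetical and do not reflect how spatialdata stores or orders axes.

```python
def split_affine(matrix):
    """Split a 3x3 homogeneous 2D affine matrix into (linear 2x2, translation)."""
    linear = [row[:2] for row in matrix[:2]]
    translation = [row[2] for row in matrix[:2]]
    return linear, translation


def apply_affine(matrix, point):
    """Apply the full affine matrix to an (x, y) point."""
    x, y = point
    return tuple(r[0] * x + r[1] * y + r[2] for r in matrix[:2])


# Scale by 2, then translate by (10, 5).
affine = [[2.0, 0.0, 10.0],
          [0.0, 2.0, 5.0],
          [0.0, 0.0, 1.0]]
linear, translation = split_affine(affine)

# Applying the linear part and then adding the translation
# gives the same result as applying the full affine matrix.
p = (3.0, 4.0)
linear_only = (linear[0][0] * p[0] + linear[0][1] * p[1],
               linear[1][0] * p[0] + linear[1][1] * p[1])
recomposed = (linear_only[0] + translation[0], linear_only[1] + translation[1])
full = apply_affine(affine, p)
```

For raster data, the notes say that only the part corresponding to linear is applied to the pixels, while the part corresponding to translation is kept as a transformation attached to the element.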
- spatialdata.rasterize(data, axes, min_coordinate, max_coordinate, target_coordinate_system, target_unit_to_pixels=None, target_width=None, target_height=None, target_depth=None, sdata=None, value_key=None, table_name=None, return_regions_as_labels=False, agg_func=None, return_single_channel=None)
Rasterize a SpatialData object or a SpatialElement (image, labels, points, shapes).
- Parameters:
data (SpatialData | DataArray | DataTree | GeoDataFrame | DataFrame | str) – The SpatialData object or SpatialElement to rasterize. Alternatively, the name of the SpatialElement in the SpatialData object, when the SpatialData object is passed to sdata.
axes (tuple[str, ...]) – The axes that min_coordinate and max_coordinate refer to.
min_coordinate (list[int | float] | ndarray[Any, dtype[floating[Any]]]) – The minimum coordinates of the bounding box.
max_coordinate (list[int | float] | ndarray[Any, dtype[floating[Any]]]) – The maximum coordinates of the bounding box.
target_coordinate_system (str) – The coordinate system in which we define the bounding box. This will also be the coordinate system of the produced rasterized image.
target_unit_to_pixels (Optional[float] (default: None)) – The number of pixels per unit that the target image should have. It is mandatory to specify precisely one of the following options: target_unit_to_pixels, target_width, target_height, target_depth.
target_width (Optional[float] (default: None)) – The width of the target image in units. It is mandatory to specify precisely one of the following options: target_unit_to_pixels, target_width, target_height, target_depth.
target_height (Optional[float] (default: None)) – The height of the target image in units. It is mandatory to specify precisely one of the following options: target_unit_to_pixels, target_width, target_height, target_depth.
target_depth (Optional[float] (default: None)) – The depth of the target image in units. It is mandatory to specify precisely one of the following options: target_unit_to_pixels, target_width, target_height, target_depth.
sdata (Optional[SpatialData] (default: None)) – SpatialData object containing the values to aggregate if value_key refers to values from a table. Must be None when data is a SpatialData object.
value_key (Optional[str] (default: None)) – Name of the column containing the values to aggregate; can refer both to numerical or categorical values. The key can be:
- the name of a column(s) in the dataframe (Dask DataFrame for points or GeoDataFrame for shapes);
- the name of obs column(s) in the associated AnnData table (for points, shapes, and labels);
- the name of a var(s), referring to the column(s) of the X matrix in the table (for points, shapes, and labels).
See the notes for more details on the default behavior. Must be None when data is a SpatialData object.
table_name (Optional[str] (default: None)) – The table optionally containing the value_key and the name of the table in the returned SpatialData object. Must be None when data is a SpatialData object, otherwise it assumes the default value of 'table'.
return_regions_as_labels (bool (default: False)) – By default, single-scale images of shape (c, y, x) are returned. If True, returns labels, shapes and points as labels of shape (y, x) as opposed to an image of shape (c, y, x). Images are always returned as images, and multiscale raster data is always returned as single-scale data.
agg_func (Union[str, Reduction, None] (default: None)) – Available only when rasterizing points and shapes. A reduction function from datashader (its name, or a Callable). See the notes for more details on the default behavior. Must be None when data is a SpatialData object.
return_single_channel (Optional[bool] (default: None)) – Only used when rasterizing points and shapes and when value_key refers to a categorical column. If False, each category will be rasterized in a separate channel.
- Return type:
- Returns:
The rasterized SpatialData object or SpatialData supported DataArray. Each SpatialElement will be rasterized into a DataArray (not a DataTree). So if a SpatialData object with elements is passed, a SpatialData object with single-scale images and labels will be returned.
When return_regions_as_labels is True, the returned DataArray object will have an attribute called label_index_to_category that maps the label index to the category name. You can access it via returned_data.attrs["label_index_to_category"]. The returned labels will start from 1 (0 is reserved for the background), and will be contiguous.
Notes
For images and labels, the parameters value_key, table_name, agg_func, and return_single_channel are not used.
Instead, when rasterizing shapes and points, the following table clarifies the default datashader reduction used for various combinations of parameters.
In particular, the first two rows refer to the default behavior when the parameters (value_key, table_name, return_single_channel, agg_func) are kept to their default values.

=========  ================  =====================  ====================  ==========
value_key  Shapes or Points  return_single_channel  datashader reduction  table_name
=========  ================  =====================  ====================  ==========
None*      Point (default)   NA                     count                 'table'
None**     Shapes (default)  True                   first                 'table'
None**     Shapes            False                  count_cat             'table'
category   NA                True                   first                 'table'
category   NA                False                  count_cat             'table'
int/float  NA                NA                     sum                   'table'
=========  ================  =====================  ====================  ==========

Explicitly, the default behaviors are as follows.
- for points, each pixel counts the number of points belonging to it (the count function is applied to an artificial column of ones);
- for shapes, each pixel gets a single index among the ones of the shapes that intersect it (the index of the shapes is interpreted as a categorical column and then the first function is used).
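The default "count" behavior for points can be pictured with a tiny pure-Python rasterizer. This is a sketch of the idea only and does not use datashader: rasterize_count is a hypothetical helper, the unit_to_pixels argument merely mirrors the role of target_unit_to_pixels, and the rounding and axis conventions are assumptions.

```python
import math


def rasterize_count(points, min_coordinate, max_coordinate, unit_to_pixels):
    """Count the (x, y) points falling into each pixel of a bounding box.

    Returns a list of rows, indexed as image[row (y)][column (x)].
    """
    (min_x, min_y), (max_x, max_y) = min_coordinate, max_coordinate
    width = round((max_x - min_x) * unit_to_pixels)
    height = round((max_y - min_y) * unit_to_pixels)
    image = [[0] * width for _ in range(height)]
    for x, y in points:
        # floor (not int) so that points left of / below the box stay negative
        col = math.floor((x - min_x) * unit_to_pixels)
        row = math.floor((y - min_y) * unit_to_pixels)
        if 0 <= row < height and 0 <= col < width:
            image[row][col] += 1  # equivalent to summing a column of ones
    return image


points = [(0.2, 0.2), (0.4, 0.1), (1.5, 0.5), (1.9, 1.9), (3.0, 0.0)]
image = rasterize_count(points, (0, 0), (2, 2), unit_to_pixels=1)
```

The point (3.0, 0.0) lies outside the bounding box and is not counted.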
- spatialdata.rasterize_bins(sdata, bins, table_name, col_key, row_key, value_key=None, return_region_as_labels=False)
Rasterizes grid-like binned shapes/points annotated by a table (e.g. Visium HD data).
- Parameters:
sdata (SpatialData) – The spatial data object containing the grid-like binned element to be rasterized.
bins (str) – The name of the SpatialElement which defines the grid-like bins.
table_name (str) – The name of the table annotating the SpatialElement.
col_key (str) – Name of a column in sdata[table_name].obs containing the column indices (integer) for the bins.
row_key (str) – Name of a column in sdata[table_name].obs containing the row indices (integer) for the bins.
value_key (Union[list[str], str, None] (default: None)) – The key(s) (obs columns/var names) in the table that will be used to rasterize the bins. If None, all the var names will be used, and the returned object will be lazily constructed. Ignored if return_region_as_labels is True.
return_region_as_labels – If False, this function returns an xarray.DataArray of shape (c, y, x) with dimension of c equal to the number of key(s) specified in value_key, or the number of var names in table_name if value_key is None. If True, will return labels of shape (y, x), where each bin of the bins element will be represented as a pixel. The table by default will not be set to annotate the new rasterized labels; this can be achieved using the helper function spatialdata.rasterize_bins_link_table_to_labels().
- Return type:
- Returns:
A spatial image object created by rasterizing the specified bins from the spatial data.
Notes
Before calling this function you should ensure that the data geometries are organized in grid-like bins (e.g. Visium HD data, but not Visium data). Also you should ensure that bin indices (integer) are defined in the .obs dataframe of the table associated with the spatial geometries. If variables from table.X are being rasterized (typically, gene counts), then table.X should be a csc_matrix (this can be done by calling sdata[table_name].X = sdata[table_name].X.tocsc()).
The returned image will have one pixel for each bin, and a coordinate transformation to map the image to the original data orientation. In particular, the bins of Visium HD data are in a grid that is slightly rotated; the coordinate transformation will adjust for this, so that the returned data is aligned to the original geometries.
If spatialdata-plot is used to visualize the returned image, the parameter scale='full' needs to be passed to .render_shapes(), to disable an automatic rasterization that would conflict with the rasterization performed here.
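The "one pixel per bin" idea from the notes can be sketched with plain lists. The helper bins_to_grid below is hypothetical and ignores the coordinate transformation and the sparse-matrix handling; it only shows how integer row and column indices place each bin's value into a grid, with missing bins left at a fill value (an assumption).

```python
def bins_to_grid(rows, cols, values, fill=0):
    """Place one value per bin into a 2D grid using integer row/col indices."""
    row0, col0 = min(rows), min(cols)
    n_rows = max(rows) - row0 + 1
    n_cols = max(cols) - col0 + 1
    grid = [[fill] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, values):
        grid[r - row0][c - col0] = v
    return grid


# Three bins of a 2 x 2 grid; the bin at (row 11, col 3) is missing.
rows = [10, 10, 11]
cols = [3, 4, 4]
values = [7, 8, 9]

grid = bins_to_grid(rows, cols, values)
```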
- spatialdata.rasterize_bins_link_table_to_labels(sdata, table_name, rasterized_labels_name)
Change the annotation target of the table to the rasterized labels.
This function should be called after having rasterized the bins (calling rasterize_bins() with return_region_as_labels=True) and after having added the rasterized labels to the spatial data object.
- Parameters:
sdata (SpatialData) – The spatial data object containing the rasterized labels.
table_name (str) – The name of the table to be annotated.
rasterized_labels_name (str) – The name of the rasterized labels in the spatial data object.
- Return type:
None
- spatialdata.to_circles(data, radius=None)
Convert a set of geometries (2D/3D labels, 2D shapes) to approximated circles/spheres.
- Parameters:
data (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement representing the geometries to approximate as circles/spheres.
radius (Union[float, ndarray[Any, dtype[floating[Any]]], None] (default: None)) – Radius/radii for the circles. For points elements, radius can either be specified as an argument, or be a column of the dataframe. For non-points elements, radius must be None.
- Return type:
- Returns:
The approximated circles/spheres.
Notes
The approximation is done by computing the centroids and the area/volume of the geometries. The geometries are then replaced by circles/spheres with the same centroids and area/volume.
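For the 2D case, "same centroid and area" means the circle radius is sqrt(area / pi). A minimal sketch for a simple polygon follows, using the shoelace formula; the helpers polygon_area_centroid and equivalent_circle are hypothetical names for illustration and are not how the library computes the result.

```python
import math


def polygon_area_centroid(vertices):
    """Area and centroid of a simple polygon via the shoelace formula."""
    signed_area = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        signed_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    signed_area *= 0.5
    centroid = (cx / (6 * signed_area), cy / (6 * signed_area))
    return abs(signed_area), centroid


def equivalent_circle(vertices):
    """Circle (centroid, radius) with the same centroid and area as the polygon."""
    area, centroid = polygon_area_centroid(vertices)
    return centroid, math.sqrt(area / math.pi)


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center, radius = equivalent_circle(square)
```

For the 2 x 2 square, the circle is centered at (1, 1) and its area, pi * radius ** 2, equals the square's area of 4.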
- spatialdata.to_polygons(data, buffer_resolution=None)
Convert a set of geometries (2D labels, 2D shapes) to approximated 2D polygons/multipolygons.
For optimal performance when converting rasters (xarray.DataArray or datatree.DataTree) to polygons, it is recommended to configure Dask to use ‘processes’ rather than ‘threads’. For example, you can set this configuration with:
>>> import dask
>>> dask.config.set(scheduler='processes')
- Parameters:
data (DataArray | DataTree | GeoDataFrame | DataFrame) – The SpatialElement representing the geometries to approximate as 2D polygons/multipolygons.
buffer_resolution (Optional[int] (default: None)) – Used only when constructing polygons from circles. Value of the resolution parameter for the buffer() internal call.
- Return type:
- Returns:
The approximated 2D polygons/multipolygons in the specified coordinate system.
- spatialdata.aggregate(values, by, values_sdata=None, by_sdata=None, value_key=None, agg_func='sum', target_coordinate_system='global', fractions=False, region_key='region', instance_key='instance_id', deepcopy=True, table_name=None, buffer_resolution=16, **kwargs)#
Aggregate values by given region.
- Parameters:
values_sdata (
Optional
[SpatialData
] (default:None
)) – SpatialData object containing the values to aggregate: ifNone
,values
must be a SpatialElement; if notNone
,values
must be a string.values (
DataFrame
|GeoDataFrame
|DataArray
|DataTree
|str
) – The values to aggregate: if values_sdata
is None
, must be a SpatialElement, otherwise must be a string specifying the name of the SpatialElement in values_sdata
by_sdata (
Optional
[SpatialData
] (default:None
)) – Regions to aggregate by: if None
, by
must be a SpatialElement; if not None
, by
must be a string.by (
GeoDataFrame
|DataArray
|DataTree
|str
) – The regions to aggregate by: if by_sdata
is None, must be a SpatialElement, otherwise must be a string specifying the name of the SpatialElement in by_sdata
value_key (
Union
[list
[str
],str
,None
] (default:None
)) – Name (or list of names) of the columns containing the values to aggregate; can refer to either numerical or categorical values. If the values are categorical,
value_key
can’t be a list. The key can be:
- the name of a column(s) in the dataframe (Dask DataFrame for points or GeoDataFrame for shapes);
- the name of obs column(s) in the associated AnnData table (for points, shapes and labels);
- the name of a var(s), referring to the column(s) of the X matrix in the table (for points, shapes and labels).
If nothing is passed here, it defaults to the equivalent of a column of ones. Defaults to
FEATURE_KEY
for points (if present).agg_func (
str
|list
[str
] (default:'sum'
)) – Aggregation function to apply over point values, e.g."mean"
,"sum"
,"count"
. Passed to pandas.DataFrame.groupby.agg()
or to xrspatial.zonal_stats()
according to the type of values
.target_coordinate_system (
str
(default:'global'
)) – Coordinate system to transform to before aggregating.fractions (
bool
(default:False
)) – Adjusts for partial area overlap between regions in
values
and by
. More precisely: when a region in by
partially overlaps with a region in values
, this setting specifies whether the value to aggregate is used as is (fractions = False
) or is multiplied by the following ratio: “area of the intersection between the two regions” / “area of the region in values
”. Additional details:
- the default is fractions = False;
- when aggregating points this parameter must be left to False, as points have no area (otherwise a table of zeros would be obtained);
- for categorical values, "count" and "sum" are equivalent when fractions = False, but when fractions = True they differ: "count" would not give meaningful results and is therefore not allowed, while "sum" actually sums the values of the intersecting regions and should therefore be used;
- aggregating categorical values with agg_func = "mean" is not allowed, as it would not give meaningful results.
region_key (
str
(default:'region'
)) – Name that will be given to the new region column in the returned aggregated table.instance_key (
str
(default:'instance_id'
)) – Name that will be given to the new instance id column in the returned aggregated table.deepcopy (
bool
(default:True
)) – Whether to deepcopy the shapes in the returnedSpatialData
object. If the shapes are large (e.g. large multiscale labels), you may consider disabling the deepcopy to use a lazy Dask representation.table_name (
Optional
[str
] (default:None
)) – The table optionally containing thevalue_key
and the name of the table in the returnedSpatialData
object.buffer_resolution (
int
(default:16
)) – Resolution parameter to pass to the .buffer() method to convert circles to polygons. A higher value results in a more accurate representation of the circle, but also in a more complex polygon and more computation.kwargs (
Any
) – Additional keyword arguments to pass toxrspatial.zonal_stats()
.
- Return type:
- Returns:
: Returns a
SpatialData
object with theby
shapes as SpatialElement and a table with the aggregated values annotating the shapes. If
value_key
refers to a categorical variable, the table in the SpatialData
object has shape (by.shape[0]
, <n categories>).
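The table shape stated above for the categorical case can be pictured with a toy, library-free sketch: one row per region in `by` and one column per category. The points, the boxes and the `contains` helper are all hypothetical illustrations and do not use the spatialdata API.

```python
from collections import Counter

# Hypothetical toy data: each point is (x, y, category).
points = [
    (0.5, 0.5, "A"),
    (0.7, 0.2, "B"),
    (1.5, 0.5, "A"),
    (1.6, 0.4, "A"),
    (5.0, 5.0, "B"),  # falls outside every region
]
# Hypothetical regions as axis-aligned boxes: (xmin, ymin, xmax, ymax).
regions = [(0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 2.0, 1.0)]
categories = sorted({category for _, _, category in points})


def contains(box, x, y):
    """Return True if (x, y) lies inside the half-open box."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x < xmax and ymin <= y < ymax


# One row per region, one column per category (counts of points).
table = []
for box in regions:
    counts = Counter(c for x, y, c in points if contains(box, x, y))
    table.append([counts[c] for c in categories])
```

Here `table` has `len(regions)` rows and `len(categories)` columns, mirroring the (`by.shape[0]`, <n categories>) shape.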
Notes
This function returns a
SpatialData
object, so to access the aggregated table you can use the table
attribute. The shapes in the returned
SpatialData
object are a reference to the original ones. If you want them to be a different object you can do a deepcopy manually (this loads the data into memory), or you can save the SpatialData
object to disk and reload it (this keeps the data lazily represented). When aggregating points by shapes, the current implementation loads all the points into memory and thus could lead to large memory usage. GitHub issue scverse/spatialdata#210 keeps track of the changes required to address this behavior.
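The ratio used by the `fractions` parameter of this function can be worked through on two overlapping rectangles. This is only an arithmetic sketch with hypothetical helpers (`box_area`, `intersection`, `contribution`) on axis-aligned boxes; the library operates on general geometries and its internals are not shown here.

```python
def box_area(box):
    """Area of an axis-aligned box (xmin, ymin, xmax, ymax); 0 if degenerate."""
    xmin, ymin, xmax, ymax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def intersection(a, b):
    """Intersection of two boxes (may be degenerate if they do not overlap)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def contribution(value, values_box, by_box, fractions):
    """Value that a `values` region contributes to a `by` region."""
    overlap = box_area(intersection(values_box, by_box))
    if overlap == 0.0:
        return 0.0
    if not fractions:
        return value  # taken as is
    # Scaled by: area of the intersection / area of the region in `values`.
    return value * overlap / box_area(values_box)


values_box = (0.0, 0.0, 2.0, 2.0)  # area 4
by_box = (1.0, 0.0, 3.0, 2.0)      # overlaps half of values_box
whole = contribution(10.0, values_box, by_box, fractions=False)
partial = contribution(10.0, values_box, by_box, fractions=True)
```

With half of the `values` region inside the `by` region, `fractions=True` contributes half of the value, while `fractions=False` contributes all of it.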
- spatialdata.map_raster(data, func, func_kwargs=mappingproxy({}), blockwise=True, depth=None, chunks=None, c_coords=None, dims=None, transformations=None, relabel=True, **kwargs)#
Apply a callable to raster data.
Applies a
func
callable to raster data. If blockwise
is set to True
, distributed processing will be achieved with:
- dask.array.map_overlap(), if depth is not None;
- dask.array.map_blocks(), if depth is None;
otherwise
func
is applied to the full data.
- Parameters:
data (
DataArray
|DataTree
) – The data to process. It can be a xarray.DataArray
or datatree.DataTree
. If it’s a DataTree
, the callable is applied to the first scale (scale0
, the full-resolution data).func (
Callable
[[Array
],Array
]) – The callable that is applied to the data.func_kwargs (
Mapping
[str
,Any
] (default:mappingproxy({})
)) – Additional keyword arguments to pass to the callablefunc
.blockwise (
bool
(default:True
)) – IfTrue
,func
will be distributed withdask.array.map_overlap()
ordask.array.map_blocks()
, otherwisefunc
is applied to the full data. IfFalse
,depth
,chunks
andkwargs
are ignored.depth (
Union
[int
,tuple
[int
,...
],dict
[int
,int
],None
] (default:None
)) – Specifies the overlap between chunks, i.e. the number of elements that each chunk should share with its neighboring chunks. If notNone
, distributed processing will be achieved withdask.array.map_overlap()
, otherwise withdask.array.map_blocks()
.chunks (
Optional
[tuple
[tuple
[int
,...
],...
]] (default:None
)) – Chunk shape of resulting blocks if the callable does not preserve the data shape. For example, if the input block hasshape: (3,100,100)
and the resulting block after themap_raster
call hasshape: (1, 100,100)
, the argumentchunks
should be passed accordingly. Passed todask.array.map_overlap()
ordask.array.map_blocks()
. Ignored ifblockwise
isFalse
.c_coords (
Union
[Iterable
[int
],Iterable
[str
],None
] (default:None
)) – The channel coordinates for the output data. If not provided, the channel coordinates of the input data are used. If the callablefunc
is expected to change the number of channel coordinates, this argument should be provided, otherwise it will default to range(len(output_coords))
.dims (
Optional
[tuple
[str
,...
]] (default:None
)) – The dimensions of the output data. If not provided, the dimensions of the input data are used. It must be specified if the callable changes the data dimensions, e.g.('c', 'y', 'x') -> ('y', 'x')
.transformations (
Optional
[dict
[str
,Any
]] (default:None
)) – The transformations of the output data. If not provided, the transformations of the input data are copied to the output data. It should be specified if the callable changes the data transformations.relabel (
bool
(default:True
)) – Whether to relabel the blocks of the output data. This option is ignored when the output data is not a labels layer (i.e., when dims
contains c
). It is recommended to enable relabeling if func
returns labels that are not unique across chunks. Relabeling will be done by performing a bit shift. When a cell or entity to be labeled is split between two adjacent chunks, the current implementation does not assign the same label across blocks. See scverse/spatialdata#664 for discussion.kwargs (
Any
) – Additional keyword arguments to pass todask.array.map_overlap()
ordask.array.map_blocks()
. Ignored ifblockwise
is set toFalse
.
- Return type:
- Returns:
: The processed data as an
xarray.DataArray
.
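Why the `depth` parameter of this function matters can be seen in a one-dimensional emulation that uses neither Dask nor spatialdata. The functions `map_blocks_1d` and `map_overlap_1d` below are hypothetical stand-ins for the chunk-by-chunk and overlapping strategies, and `moving_max` is an arbitrary example of a `func` that looks at neighbouring elements; the handling of the array ends (no padding) is a simplification.

```python
def moving_max(values):
    """Example `func`: 3-element moving maximum, truncated at the ends."""
    n = len(values)
    return [max(values[max(0, i - 1):min(n, i + 2)]) for i in range(n)]


def map_blocks_1d(func, data, chunk_size):
    """Apply `func` to each chunk independently (no shared elements)."""
    out = []
    for start in range(0, len(data), chunk_size):
        out.extend(func(data[start:start + chunk_size]))
    return out


def map_overlap_1d(func, data, chunk_size, depth):
    """Apply `func` to chunks extended by `depth` neighbours, then trim."""
    out = []
    for start in range(0, len(data), chunk_size):
        stop = min(start + chunk_size, len(data))
        lo = max(0, start - depth)
        hi = min(len(data), stop + depth)
        result = func(data[lo:hi])
        # Keep only the part of the result that belongs to this chunk.
        out.extend(result[start - lo:start - lo + (stop - start)])
    return out


data = [1, 2, 3, 9, 4, 5, 6, 7]
full = moving_max(data)
no_overlap = map_blocks_1d(moving_max, data, chunk_size=4)
with_overlap = map_overlap_1d(moving_max, data, chunk_size=4, depth=1)
```

In this toy case the result without overlap differs from the full-data result at the chunk border, while an overlap of one element reproduces it.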
- spatialdata.unpad_raster(raster)#
Remove padding from a raster that may have been added by the rotation component of a transformation.
- spatialdata.relabel_sequential(arr)#
Relabels integers in a Dask array sequentially.
This function assigns sequential labels to the integers in a Dask array starting from 1. For example, if the unique values in the input array are [0, 5, 9], they will be relabeled to [0, 1, 2] respectively. Note that currently, if a cell or entity to be labeled is split across adjacent chunks, the same label is not assigned to it across blocks. See the discussion in scverse/spatialdata#664.
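A plain-Python sketch of sequential relabeling on a small nested list is given below. It is not the library implementation (which works on Dask arrays); it assumes that 0 is kept as background and that the remaining labels are renumbered in increasing order of their original values. The name `relabel_sequential_sketch` is hypothetical.

```python
def relabel_sequential_sketch(labels):
    """Renumber nonzero labels as 1, 2, ... in increasing order; keep 0 as 0."""
    nonzero = sorted({value for row in labels for value in row if value != 0})
    mapping = {0: 0}
    mapping.update({old: new for new, old in enumerate(nonzero, start=1)})
    return [[mapping[value] for value in row] for row in labels]


labels = [
    [0, 7, 7],
    [20, 0, 3],
]
relabeled = relabel_sequential_sketch(labels)
```

Under these assumptions the labels 3, 7 and 20 become 1, 2 and 3, and the background stays 0.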
- spatialdata.are_extents_equal(extent0, extent1, atol=0.1)#
Check if two data extents, as returned by
get_extent()
, are equal up to approximation errors.
- Parameters:
extent0 (
dict
[str
,tuple
[float
,float
]]) – The first data extent.extent1 (
dict
[str
,tuple
[float
,float
]]) – The second data extent.atol (
float
(default:0.1
)) – The absolute tolerance to use when comparing the extents.
- Return type:
bool
- Returns:
: Whether the extents are equal or not.
Notes
The default value of
atol
is currently high because of a bug in rasterize()
that makes the extent of the rasterized data slightly different from the extent of the original data. This bug is tracked in scverse/spatialdata#165
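The role of `atol` can be sketched with a small comparison over extent dictionaries of the documented form (axis name mapped to a (min, max) tuple). The function `extents_close` is a hypothetical stand-in that uses a purely absolute tolerance via `math.isclose`; the library may compare values differently.

```python
import math


def extents_close(extent0, extent1, atol=0.1):
    """Return True if both extents have the same axes and bounds within atol."""
    if extent0.keys() != extent1.keys():
        return False
    return all(
        math.isclose(extent0[ax][0], extent1[ax][0], rel_tol=0.0, abs_tol=atol)
        and math.isclose(extent0[ax][1], extent1[ax][1], rel_tol=0.0, abs_tol=atol)
        for ax in extent0
    )


original = {"x": (0.0, 100.0), "y": (0.0, 50.0)}
# Slightly different bounds, e.g. after a rasterization round trip.
rasterized = {"x": (0.0, 100.05), "y": (-0.02, 50.0)}
# A difference larger than the default tolerance.
shifted = {"x": (0.0, 100.5), "y": (0.0, 50.0)}

same_default = extents_close(original, rasterized)
same_strict = extents_close(original, rasterized, atol=0.01)
same_shifted = extents_close(original, shifted)
```

With the default tolerance the small discrepancies are accepted, while a stricter `atol` or a larger shift makes the comparison fail.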
- spatialdata.deepcopy(element)#
Deepcopy a SpatialData or SpatialElement object.
Deepcopy will load the data in memory. Using this function for large Dask-backed objects is discouraged. In that case, please save the SpatialData object to a different disk location and read it back again.
- Parameters:
element (
SpatialData
|DataArray
|DataTree
|GeoDataFrame
|DataFrame
|AnnData
) – The SpatialData or SpatialElement object to deepcopy.
- Return type:
SpatialData
|DataArray
|DataTree
|GeoDataFrame
|DataFrame
|AnnData
- Returns:
: A deepcopy of the SpatialData or SpatialElement object
Notes
The order of the columns for a deepcopied points element may differ from the original one; for more details see scverse/spatialdata#486
- spatialdata.get_pyramid_levels(image, attr=None, n=None)#
Access the data/attribute of the pyramid levels of a multiscale spatial image.
- Parameters:
image (
DataTree
) – The multiscale spatial image.attr (
Optional
[str
] (default:None
)) – If None
, return the data of the pyramid level as a DataArray
; if not None, return the specified attribute within the DataArray
data.n (
Optional
[int
] (default:None
)) – If not None, return only pyramid level n.
- Return type:
Union
[list
[Any
],Any
]- Returns:
: The pyramid levels data (or an attribute of it) as a list or a generator.