Models


The elements (building-blocks) that constitute SpatialData.

class spatialdata.models.Image2DModel

Bases: RasterSchema

classmethod parse(data, dims=None, c_coords=None, transformations=None, scale_factors=None, method=None, chunks=None, **kwargs)

Validate (or parse) raster data.

Parameters:

data (ndarray[tuple[Any, ...], dtype[floating[Any]]] | DataArray | Array) – Data to validate (or parse). The shape of the data should be c(z)yx for 2D (3D) images and (z)yx for 2D (3D) labels. If you have a 2D image with shape yx, you can use numpy.expand_dims() (or an equivalent function) to add a channel dimension.
dims (Sequence[str] | None (default: None)) – Dimensions of the data (e.g. ['c', 'y', 'x'] for 2D image data). If the data is an xarray.DataArray, the dimensions can also be inferred from the data. If the dimensions are not in the order (c)(z)yx, the data will be transposed to match the order.
c_coords (str | list[str] | None (default: None)) – Channel names of image data. The number of names must equal the length of dimension 'c'. Only supported for Image models.
transformations (dict[str, BaseTransformation] | None (default: None)) – Dictionary of transformations to apply to the data. The key is the name of the target coordinate system, the value is the transformation to apply. By default, a single Identity transformation mapping to the "global" coordinate system is applied.
scale_factors (Sequence[dict[str, int] | int] | None (default: None)) – Scale factors to apply to construct a multiscale image (datatree.DataTree). If None, an xarray.DataArray is returned instead. Importantly, each scale factor is relative to the previous scale factor. For example, if the scale factors are [2, 2, 2], the returned multiscale image will have 4 scales: the original image followed by the 2x, 4x and 8x downsampled images.
method (Methods | None (default: None)) –
Method to use for multiscale downsampling. The default (None) differs between images and labels:
- Images (Image2DModel, Image3DModel): uses multiscale_spatial_image.to_multiscale() with method=Methods.XARRAY_COARSEN. This is the same default as in spatialdata <= 0.7.2 and is fast.
- Labels (Labels2DModel, Labels3DModel): uses a lazy implementation based on ome-zarr-py’s resize() (order=0, nearest-neighbour). This has lower peak memory usage than the multiscale_spatial_image implementation. Note: for images this ome-zarr-py path shows a significant performance regression (both time and memory); see GitHub issue #1079.
To override the default, pass any Methods value, which will force the multiscale_spatial_image.to_multiscale() code path for all element types. For example:
- method=Methods.XARRAY_COARSEN — coarsening via xarray (fast, default for images).
- method=Methods.DASK_IMAGE_NEAREST — nearest-neighbour via dask-image (not lazy as of multiscale-spatial-image==2.0.3, so it leads to higher memory usage).
chunks (int | tuple[int, ...] | tuple[tuple[int, ...], ...] | Mapping[Any, None | int | tuple[int, ...]] | None (default: None)) – Chunks to use for dask array.
kwargs (Any) – Additional arguments for to_spatial_image(). In particular the c_coords kwargs argument (an iterable) can be used to set the channel coordinates for image data. c_coords is not available for labels data as labels do not have channels.
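
Because each scale factor is relative to the previous scale, the total downsampling of a level is the product of all factors up to that level. The following pure-Python sketch only illustrates that arithmetic; the helper names are hypothetical, and it assumes integer factors and image sizes that divide evenly. It does not call spatialdata.

```python
def cumulative_downsampling(scale_factors):
    """Return the total downsampling factor of every scale, starting at 1 (the original)."""
    totals = [1]
    for factor in scale_factors:
        # Each factor is relative to the previous scale, so the factors multiply.
        totals.append(totals[-1] * factor)
    return totals


def approximate_shapes(shape_yx, scale_factors):
    """Estimate the (y, x) shape of every scale, assuming the sizes divide evenly."""
    return [
        tuple(size // total for size in shape_yx)
        for total in cumulative_downsampling(scale_factors)
    ]


totals = cumulative_downsampling([2, 2, 2])
shapes = approximate_shapes((512, 512), [2, 2, 2])
print(totals)  # one entry per scale, including the original image
print(shapes)
```

Three scale factors therefore give four levels, matching the [2, 2, 2] example in the scale_factors description.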

Return type:

DataArray | DataTree

Returns:

xarray.DataArray or datatree.DataTree

Notes

RGB images

If you have an image with 3 or 4 channels and you want to interpret it as an RGB or RGB(A) image, you can use the c_coords argument to specify the channel coordinates as ["r", "g", "b"] or ["r", "g", "b", "a"].

You can also pass the rgb argument to kwargs to automatically set the c_coords to ["r", "g", "b"]. Please refer to to_spatial_image() for more information. Note: if you set rgb=None in kwargs, 3-4 channel images will be interpreted automatically as RGB(A) images.

Setting axes / dims

If the data is a NumPy or Dask array, it has no named axes yet. In that case, the dimensions given by the user in the dims argument of .parse are tried first, and the data is transposed if necessary to match the order (c)(z)yx (see the description of the dims argument above). If dims is not specified, the dims are set to (c)(z)yx, depending on the number of dimensions of the data.
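
The axis handling described above can be sketched in pure Python. The helpers below are hypothetical and are not the library's implementation: one guesses default dims from the number of dimensions, the other computes the permutation that reorders user-supplied dims into (c)(z)yx. The resulting permutation has the form expected by numpy.transpose().

```python
TARGET_ORDER = ("c", "z", "y", "x")


def default_dims(ndim, is_image):
    """Guess dims when none are given: images carry a leading 'c', labels do not."""
    n_spatial = ndim - 1 if is_image else ndim
    if n_spatial not in (2, 3):
        raise ValueError(f"Cannot infer dims for ndim={ndim}")
    spatial = ("z", "y", "x")[-n_spatial:]
    return ("c",) + spatial if is_image else spatial


def transpose_order(dims):
    """Return (target_dims, permutation) that reorders dims into (c)(z)yx."""
    dims = tuple(dims)
    target = tuple(d for d in TARGET_ORDER if d in dims)
    if len(target) != len(dims):
        raise ValueError(f"Unknown or duplicated dims: {dims}")
    # permutation[i] is the position in the input of the i-th output axis.
    permutation = tuple(dims.index(d) for d in target)
    return target, permutation


target, permutation = transpose_order(("y", "x", "c"))
print(target, permutation)
```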

classmethod validate(data)

Validate data.

Parameters: data (Any) – Data to validate.
Raises: ValueError – If data is not valid.
Return type: None

class spatialdata.models.Image3DModel

Bases: RasterSchema

classmethod parse(data, dims=None, c_coords=None, transformations=None, scale_factors=None, method=None, chunks=None, **kwargs)

Validate (or parse) raster data.

Parameters:

data (ndarray[tuple[Any, ...], dtype[floating[Any]]] | DataArray | Array) – Data to validate (or parse). The shape of the data should be c(z)yx for 2D (3D) images and (z)yx for 2D (3D) labels. If you have a 2D image with shape yx, you can use numpy.expand_dims() (or an equivalent function) to add a channel dimension.
dims (Sequence[str] | None (default: None)) – Dimensions of the data (e.g. ['c', 'y', 'x'] for 2D image data). If the data is an xarray.DataArray, the dimensions can also be inferred from the data. If the dimensions are not in the order (c)(z)yx, the data will be transposed to match the order.
c_coords (str | list[str] | None (default: None)) – Channel names of image data. The number of names must equal the length of dimension 'c'. Only supported for Image models.
transformations (dict[str, BaseTransformation] | None (default: None)) – Dictionary of transformations to apply to the data. The key is the name of the target coordinate system, the value is the transformation to apply. By default, a single Identity transformation mapping to the "global" coordinate system is applied.
scale_factors (Sequence[dict[str, int] | int] | None (default: None)) – Scale factors to apply to construct a multiscale image (datatree.DataTree). If None, an xarray.DataArray is returned instead. Importantly, each scale factor is relative to the previous scale factor. For example, if the scale factors are [2, 2, 2], the returned multiscale image will have 4 scales: the original image followed by the 2x, 4x and 8x downsampled images.
method (Methods | None (default: None)) –
Method to use for multiscale downsampling. The default (None) differs between images and labels:
- Images (Image2DModel, Image3DModel): uses multiscale_spatial_image.to_multiscale() with method=Methods.XARRAY_COARSEN. This is the same default as in spatialdata <= 0.7.2 and is fast.
- Labels (Labels2DModel, Labels3DModel): uses a lazy implementation based on ome-zarr-py’s resize() (order=0, nearest-neighbour). This has lower peak memory usage than the multiscale_spatial_image implementation. Note: for images this ome-zarr-py path shows a significant performance regression (both time and memory); see GitHub issue #1079.
To override the default, pass any Methods value, which will force the multiscale_spatial_image.to_multiscale() code path for all element types. For example:
- method=Methods.XARRAY_COARSEN — coarsening via xarray (fast, default for images).
- method=Methods.DASK_IMAGE_NEAREST — nearest-neighbour via dask-image (not lazy as of multiscale-spatial-image==2.0.3, so it leads to higher memory usage).
chunks (int | tuple[int, ...] | tuple[tuple[int, ...], ...] | Mapping[Any, None | int | tuple[int, ...]] | None (default: None)) – Chunks to use for dask array.
kwargs (Any) – Additional arguments for to_spatial_image(). In particular the c_coords kwargs argument (an iterable) can be used to set the channel coordinates for image data. c_coords is not available for labels data as labels do not have channels.
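
The default selection described for the method parameter amounts to a small decision table. The sketch below restates it in pure Python; the function name and the returned strings are hypothetical labels, and nothing here calls spatialdata, multiscale_spatial_image or ome-zarr-py.

```python
IMAGE_MODELS = {"Image2DModel", "Image3DModel"}
LABEL_MODELS = {"Labels2DModel", "Labels3DModel"}


def downsampling_path(model_name, method=None):
    """Name the code path that the parameter description says is used."""
    if model_name not in IMAGE_MODELS | LABEL_MODELS:
        raise ValueError(f"Not a raster model: {model_name}")
    if method is not None:
        # Any explicit method forces the multiscale_spatial_image path for all elements.
        return f"multiscale_spatial_image.to_multiscale(method={method})"
    if model_name in IMAGE_MODELS:
        return "multiscale_spatial_image.to_multiscale(method=XARRAY_COARSEN)"
    return "ome-zarr-py resize (order=0, nearest-neighbour, lazy)"


for name in ("Image3DModel", "Labels3DModel"):
    print(name, "->", downsampling_path(name))
print(downsampling_path("Labels2DModel", method="DASK_IMAGE_NEAREST"))
```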

Return type:

DataArray | DataTree

Returns:

xarray.DataArray or datatree.DataTree

Notes

RGB images

If you have an image with 3 or 4 channels and you want to interpret it as an RGB or RGB(A) image, you can use the c_coords argument to specify the channel coordinates as ["r", "g", "b"] or ["r", "g", "b", "a"].

You can also pass the rgb argument to kwargs to automatically set the c_coords to ["r", "g", "b"]. Please refer to to_spatial_image() for more information. Note: if you set rgb=None in kwargs, 3-4 channel images will be interpreted automatically as RGB(A) images.
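
As a rough illustration of the channel naming above, the following sketch checks explicit c_coords against the channel count and derives RGB(A) names for 3 or 4 channels. Both helpers are hypothetical; the real behaviour is determined by to_spatial_image() and the parser.

```python
RGBA = ["r", "g", "b", "a"]


def check_c_coords(c_coords, n_channels):
    """Normalise c_coords to a list and check it against the channel count."""
    names = [c_coords] if isinstance(c_coords, str) else list(c_coords)
    if len(names) != n_channels:
        raise ValueError(f"Expected {n_channels} channel names, got {len(names)}")
    return names


def rgb_channel_names(n_channels):
    """Return RGB or RGB(A) channel names for 3 or 4 channels."""
    if n_channels not in (3, 4):
        raise ValueError("Only 3 or 4 channels can be interpreted as RGB(A)")
    return RGBA[:n_channels]


print(rgb_channel_names(3))
print(rgb_channel_names(4))
print(check_c_coords(["DAPI", "GFP"], 2))
```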

Setting axes / dims

If the data is a NumPy or Dask array, it has no named axes yet. In that case, the dimensions given by the user in the dims argument of .parse are tried first, and the data is transposed if necessary to match the order (c)(z)yx (see the description of the dims argument above). If dims is not specified, the dims are set to (c)(z)yx, depending on the number of dimensions of the data.

classmethod validate(data)

Validate data.

Parameters: data (Any) – Data to validate.
Raises: ValueError – If data is not valid.
Return type: None

class spatialdata.models.Labels2DModel

Bases: RasterSchema

classmethod parse(*args, **kwargs)

Validate (or parse) raster data.

Parameters:

data – Data to validate (or parse). The shape of the data should be c(z)yx for 2D (3D) images and (z)yx for 2D (3D) labels. If you have a 2D image with shape yx, you can use numpy.expand_dims() (or an equivalent function) to add a channel dimension.
dims – Dimensions of the data (e.g. ['c', 'y', 'x'] for 2D image data). If the data is an xarray.DataArray, the dimensions can also be inferred from the data. If the dimensions are not in the order (c)(z)yx, the data will be transposed to match the order.
c_coords (str | list[str] | None) – Channel names of image data. The number of names must equal the length of dimension 'c'. Only supported for Image models.
transformations – Dictionary of transformations to apply to the data. The key is the name of the target coordinate system, the value is the transformation to apply. By default, a single Identity transformation mapping to the "global" coordinate system is applied.
scale_factors – Scale factors to apply to construct a multiscale image (datatree.DataTree). If None, an xarray.DataArray is returned instead. Importantly, each scale factor is relative to the previous scale factor. For example, if the scale factors are [2, 2, 2], the returned multiscale image will have 4 scales: the original image followed by the 2x, 4x and 8x downsampled images.
method –
Method to use for multiscale downsampling. The default (None) differs between images and labels:
- Images (Image2DModel, Image3DModel): uses multiscale_spatial_image.to_multiscale() with method=Methods.XARRAY_COARSEN. This is the same default as in spatialdata <= 0.7.2 and is fast.
- Labels (Labels2DModel, Labels3DModel): uses a lazy implementation based on ome-zarr-py’s resize() (order=0, nearest-neighbour). This has lower peak memory usage than the multiscale_spatial_image implementation. Note: for images this ome-zarr-py path shows a significant performance regression (both time and memory); see GitHub issue #1079.
To override the default, pass any Methods value, which will force the multiscale_spatial_image.to_multiscale() code path for all element types. For example:
- method=Methods.XARRAY_COARSEN — coarsening via xarray (fast, default for images).
- method=Methods.DASK_IMAGE_NEAREST — nearest-neighbour via dask-image (not lazy as of multiscale-spatial-image==2.0.3, so it leads to higher memory usage).
chunks – Chunks to use for dask array.
kwargs (Any) – Additional arguments for to_spatial_image(). In particular the c_coords kwargs argument (an iterable) can be used to set the channel coordinates for image data. c_coords is not available for labels data as labels do not have channels.
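
The labels default uses nearest-neighbour (order=0) resampling. One way to see why this suits labels is to compare it with block averaging on a tiny label image: averaging can produce values that are not label ids at all. The sketch below is pure Python on nested lists and assumes sizes divisible by the factor; its strided sampling is a simplified stand-in, not ome-zarr-py's resize().

```python
def downsample_nearest(labels, factor):
    """Keep every factor-th pixel along y and x (simple order-0 stand-in)."""
    return [row[::factor] for row in labels[::factor]]


def downsample_mean(labels, factor):
    """Average factor x factor blocks, as one might do for intensity images."""
    out = []
    for y in range(0, len(labels), factor):
        out_row = []
        for x in range(0, len(labels[0]), factor):
            block = [labels[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


labels = [
    [1, 1, 7, 7],
    [1, 1, 7, 7],
    [1, 7, 7, 7],
    [4, 7, 7, 7],
]
nearest = downsample_nearest(labels, 2)
averaged = downsample_mean(labels, 2)
print(nearest)   # only values that already exist as label ids
print(averaged)  # may contain values that are not label ids
```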

Return type:

DataArray | DataTree

Returns:

xarray.DataArray or datatree.DataTree

Notes

RGB images

If you have an image with 3 or 4 channels and you want to interpret it as an RGB or RGB(A) image, you can use the c_coords argument to specify the channel coordinates as ["r", "g", "b"] or ["r", "g", "b", "a"].

You can also pass the rgb argument to kwargs to automatically set the c_coords to ["r", "g", "b"]. Please refer to to_spatial_image() for more information. Note: if you set rgb=None in kwargs, 3-4 channel images will be interpreted automatically as RGB(A) images.

Setting axes / dims

If the data is a NumPy or Dask array, it has no named axes yet. In that case, the dimensions given by the user in the dims argument of .parse are tried first, and the data is transposed if necessary to match the order (c)(z)yx (see the description of the dims argument above). If dims is not specified, the dims are set to (c)(z)yx, depending on the number of dimensions of the data.

classmethod validate(data)

Validate data.

Parameters: data (Any) – Data to validate.
Raises: ValueError – If data is not valid.
Return type: None

class spatialdata.models.Labels3DModel

Bases: RasterSchema

classmethod parse(*args, **kwargs)

Validate (or parse) raster data.

Parameters:

data – Data to validate (or parse). The shape of the data should be c(z)yx for 2D (3D) images and (z)yx for 2D (3D) labels. If you have a 2D image with shape yx, you can use numpy.expand_dims() (or an equivalent function) to add a channel dimension.
dims – Dimensions of the data (e.g. ['c', 'y', 'x'] for 2D image data). If the data is an xarray.DataArray, the dimensions can also be inferred from the data. If the dimensions are not in the order (c)(z)yx, the data will be transposed to match the order.
c_coords (str | list[str] | None) – Channel names of image data. The number of names must equal the length of dimension 'c'. Only supported for Image models.
transformations – Dictionary of transformations to apply to the data. The key is the name of the target coordinate system, the value is the transformation to apply. By default, a single Identity transformation mapping to the "global" coordinate system is applied.
scale_factors – Scale factors to apply to construct a multiscale image (datatree.DataTree). If None, an xarray.DataArray is returned instead. Importantly, each scale factor is relative to the previous scale factor. For example, if the scale factors are [2, 2, 2], the returned multiscale image will have 4 scales: the original image followed by the 2x, 4x and 8x downsampled images.
method –
Method to use for multiscale downsampling. The default (None) differs between images and labels:
- Images (Image2DModel, Image3DModel): uses multiscale_spatial_image.to_multiscale() with method=Methods.XARRAY_COARSEN. This is the same default as in spatialdata <= 0.7.2 and is fast.
- Labels (Labels2DModel, Labels3DModel): uses a lazy implementation based on ome-zarr-py’s resize() (order=0, nearest-neighbour). This has lower peak memory usage than the multiscale_spatial_image implementation. Note: for images this ome-zarr-py path shows a significant performance regression (both time and memory); see GitHub issue #1079.
To override the default, pass any Methods value, which will force the multiscale_spatial_image.to_multiscale() code path for all element types. For example:
- method=Methods.XARRAY_COARSEN — coarsening via xarray (fast, default for images).
- method=Methods.DASK_IMAGE_NEAREST — nearest-neighbour via dask-image (not lazy as of multiscale-spatial-image==2.0.3, so it leads to higher memory usage).
chunks – Chunks to use for dask array.
kwargs (Any) – Additional arguments for to_spatial_image(). In particular the c_coords kwargs argument (an iterable) can be used to set the channel coordinates for image data. c_coords is not available for labels data as labels do not have channels.

Return type:

DataArray | DataTree

Returns:

xarray.DataArray or datatree.DataTree

Notes

RGB images

If you have an image with 3 or 4 channels and you want to interpret it as an RGB or RGB(A) image, you can use the c_coords argument to specify the channel coordinates as ["r", "g", "b"] or ["r", "g", "b", "a"].

You can also pass the rgb argument to kwargs to automatically set the c_coords to ["r", "g", "b"]. Please refer to to_spatial_image() for more information. Note: if you set rgb=None in kwargs, 3-4 channel images will be interpreted automatically as RGB(A) images.

Setting axes / dims

If the data is a NumPy or Dask array, it has no named axes yet. In that case, the dimensions given by the user in the dims argument of .parse are tried first, and the data is transposed if necessary to match the order (c)(z)yx (see the description of the dims argument above). If dims is not specified, the dims are set to (c)(z)yx, depending on the number of dimensions of the data.

classmethod validate(data)

Validate data.

Parameters: data (Any) – Data to validate.
Raises: ValueError – If data is not valid.
Return type: None

class spatialdata.models.ShapesModel

Bases: object

classmethod parse(data, **kwargs)

classmethod parse(cls, data, geometry, offsets=None, radius=None, index=None, transformations=None)

classmethod parse(cls, data, radius=None, index=None, transformations=None, **kwargs)

classmethod parse(cls, data, transformations=None)

Parse shapes data.

Parameters:

data (Any) –
Data to parse:
- If numpy.ndarray, the shapes are assumed to be given as ragged arrays (in the case of shapely.Polygon or shapely.MultiPolygon), so the additional arguments offsets and geometry must be provided.
- If Path or str, it’s read as a GeoJSON file.
- If geopandas.GeoDataFrame, it’s validated. The object needs to have a column called geometry that is a geopandas.GeoSeries or contains shapely objects. Valid geometries are shapely.Polygon, shapely.MultiPolygon and shapely.Point. If the geometries are Point, there must be another column called radius.
geometry –
Geometry type of the shapes. The following geometries are supported:
- 0: Circles
- 3: Polygon
- 6: MultiPolygon
offsets – In the case of shapely.Polygon or shapely.MultiPolygon shapes, the offsets of the polygons must be provided in order to initialize the shapes from their ragged array representation. Alternatively, you can call the parser as ShapesModel.parse(data), where data is a GeoDataFrame object, and omit the offsets parameter (recommended).
radius – Size of the Circles. It must be provided if the shapes are Circles.
index – Index of the shapes, must be of type str. If None, it’s generated automatically.
transformations – Transformations of shapes.
kwargs (Any) – Additional arguments for GeoJSON reader.
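
To make the ragged array idea concrete, the sketch below splits one flat list of vertices into per-polygon vertex lists using an offsets list. The layout assumed here (each offset marks the first vertex of a polygon, with a final entry equal to the total number of vertices) and the helper name are assumptions for illustration only; the layout ShapesModel.parse expects may differ. The geometry codes are the ones listed for the geometry parameter.

```python
GEOMETRY_CODES = {0: "Circles", 3: "Polygon", 6: "MultiPolygon"}


def split_ragged(coords, offsets):
    """Split a flat list of (x, y) vertices into one vertex list per polygon."""
    if not offsets or offsets[0] != 0 or offsets[-1] != len(coords):
        raise ValueError("offsets must start at 0 and end at len(coords)")
    return [coords[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]


# A closed triangle (4 vertices) followed by a closed square (5 vertices).
coords = [
    (0, 0), (1, 0), (0, 1), (0, 0),
    (2, 2), (3, 2), (3, 3), (2, 3), (2, 2),
]
offsets = [0, 4, 9]
polygons = split_ragged(coords, offsets)
print(GEOMETRY_CODES[3], [len(ring) for ring in polygons])
```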

Return type:

GeoDataFrame

Returns:

geopandas.GeoDataFrame

classmethod validate(data)

Validate data.

Parameters:

data (GeoDataFrame) – geopandas.GeoDataFrame to validate.

Return type:

None

Returns:

None

classmethod validate_shapes_not_mixed_types(gdf)

Check that the Shapes element is either composed of Point or Polygon/MultiPolygon.

Parameters: gdf (GeoDataFrame) – The Shapes element.
Raises: ValueError – When the geometry column composing the object does not satisfy the type requirements.
Return type: None

Notes

This function is not called by ShapesModel.validate() because computing the unique types by default could be expensive.
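
The rule being checked (either all Point, or only Polygon/MultiPolygon) can be sketched on plain geometry type names. This is a hypothetical stand-in that works on a list of strings rather than a GeoDataFrame. It also shows why the check has a cost: the set of unique types must be computed over every row.

```python
ALLOWED = {"Point", "Polygon", "MultiPolygon"}


def check_not_mixed(geometry_types):
    """Raise ValueError if Point geometries are mixed with Polygon/MultiPolygon."""
    kinds = set(geometry_types)  # requires a pass over every geometry
    unknown = kinds - ALLOWED
    if unknown:
        raise ValueError(f"Unsupported geometry types: {sorted(unknown)}")
    if "Point" in kinds and len(kinds) > 1:
        raise ValueError("Point cannot be mixed with Polygon/MultiPolygon")


check_not_mixed(["Polygon", "MultiPolygon", "Polygon"])  # passes silently
check_not_mixed(["Point", "Point"])  # passes silently
try:
    check_not_mixed(["Point", "Polygon"])
except ValueError as error:
    print("rejected:", error)
```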

class spatialdata.models.PointsModel

Bases: object

classmethod parse(data, **kwargs)

classmethod parse(cls, data, annotation=None, feature_key=None, instance_key=None, transformations=None, **kwargs)

classmethod parse(cls, data, coordinates=None, feature_key=None, instance_key=None, transformations=None, **kwargs)

Validate (or parse) points data.

Parameters:

data (Any) –
Data to parse:
- If numpy.ndarray, an annotation pandas.DataFrame can be provided, as well as a feature_key column in the annotation dataframe. Furthermore, numpy.ndarray is assumed to have shape (n_points, axes), with axes being “x”, “y” and optionally “z”.
- If pandas.DataFrame, a coordinates mapping can be provided with key as valid axes (‘x’, ‘y’, ‘z’) and value as column names in dataframe. If the dataframe already has columns named ‘x’, ‘y’ and ‘z’, the mapping can be omitted.
annotation – Annotation dataframe. Only if data is numpy.ndarray. If data is an array, the index of the annotations will be used as the index of the parsed points.
coordinates – Mapping of axes names (keys) to column names (values) in data. Only if data is pandas.DataFrame. Example: {'x': 'my_x_column', 'y': 'my_y_column'}. If not provided and data is pandas.DataFrame, and x, y and optionally z are column names, then they will be used as coordinates.
feature_key – Optional, feature key in annotation or data. Example use case: gene id categorical column describing the gene identity of each point.
instance_key – Optional, instance key in annotation or data. Example use case: cell id column, describing which cell a point belongs to. This argument is likely going to be deprecated: scverse/spatialdata#503.
transformations – Transformations of points.
kwargs (Any) – Additional arguments for dask.dataframe.from_array().
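
The way the coordinates mapping is resolved can be sketched on a plain list of column names. The helper below is hypothetical and only mirrors the rules stated above: use the explicit mapping when given, otherwise fall back to columns named x, y and optionally z.

```python
VALID_AXES = ("x", "y", "z")


def resolve_coordinates(columns, coordinates=None):
    """Return {axis: column_name} for a table with the given column names."""
    if coordinates is None:
        if "x" not in columns or "y" not in columns:
            raise ValueError("No mapping given and no 'x'/'y' columns found")
        axes = ["x", "y"] + (["z"] if "z" in columns else [])
        return {axis: axis for axis in axes}
    for axis, column in coordinates.items():
        if axis not in VALID_AXES:
            raise ValueError(f"Invalid axis: {axis}")
        if column not in columns:
            raise ValueError(f"Column not found: {column}")
    return dict(coordinates)


explicit = resolve_coordinates(
    ["my_x_column", "my_y_column", "gene"],
    {"x": "my_x_column", "y": "my_y_column"},
)
implicit = resolve_coordinates(["x", "y", "z", "gene"])
print(explicit)
print(implicit)
```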

Return type:

DataFrame

Returns:

dask.dataframe.core.DataFrame

Notes

The order of the columns of the dataframe returned by the parser is not guaranteed to be the same as the order of the columns in the dataframe passed as an argument.

classmethod validate(data)

Validate data.

Parameters:

data (DataFrame) – dask.dataframe.core.DataFrame to validate.

Return type:

None

Returns:

None

class spatialdata.models.TableModel

Bases: object

classmethod parse(adata, region=None, region_key=None, instance_key=None, overwrite_metadata=False)

Parse the anndata.AnnData to be compatible with the model.

Parameters:

adata (AnnData) – The AnnData object.
region (str | list[str] | None (default: None)) – Region(s) to be used.
region_key (str | None (default: None)) – Key in adata.obs that specifies the region.
instance_key (str | None (default: None)) – Key in adata.obs that specifies the instance.
overwrite_metadata (bool (default: False)) – If True, the region, region_key and instance_key metadata will be overwritten.
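
How region, region_key and instance_key relate to adata.obs can be sketched with a plain dict of columns standing in for obs. The consistency rule used here (every value in the region_key column must be one of the given regions) is an assumption for illustration, and the helper is hypothetical; the checks that TableModel.parse actually performs are not listed on this page.

```python
def check_table_annotation(obs, region, region_key, instance_key):
    """Check that the annotation keys exist in obs and agree with the given region(s)."""
    regions = [region] if isinstance(region, str) else list(region)
    for key in (region_key, instance_key):
        if key not in obs:
            raise ValueError(f"Key not found in obs: {key}")
    unexpected = set(obs[region_key]) - set(regions)
    if unexpected:
        raise ValueError(f"Regions in obs not listed in region: {sorted(unexpected)}")
    return {"region": regions, "region_key": region_key, "instance_key": instance_key}


# obs is a stand-in for adata.obs: one list per column, one entry per row.
obs = {
    "region": ["cells", "cells", "nuclei"],
    "cell_id": [1, 2, 1],
}
metadata = check_table_annotation(obs, ["cells", "nuclei"], "region", "cell_id")
print(metadata)
```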

Return type:

AnnData

Returns:

The parsed data.

classmethod validate(data)

Validate the data.

Parameters:

data (AnnData) – The data to validate.

Return type:

AnnData

Returns:

The validated data.
