I. Use SpatialData with your data: the SpatialData object.#

The spatialdata framework has three ways to construct SpatialData objects:

  1. You can read a SpatialData object that has already been saved to .zarr in the SpatialData Zarr format.

    1. From disk.

    2. From the cloud.

  2. You can use the reader functions from spatialdata-io.

  3. You can construct a SpatialData object from scratch using our Python spatialdata APIs.

    1. Using the SpatialData class.

    2. Extending it with the Incremental IO APIs.

This tutorial is divided into two parts. The first part (this notebook) covers all of the above. The second part covers how to construct the basic components of a SpatialData object (images, labels, points, shapes, tables).

Reading SpatialData .zarr data#

The distinction between Zarr, OME-NGFF and the SpatialData format#

Let’s start with a clarification on the storage format.

Zarr is a storage format for saving data on disk or in the cloud in a performant and interoperable way. A Zarr object saved on disk or in the cloud is referred to as a Zarr store. Effectively, a Zarr store is not a file but a folder containing data and metadata. Zarr is optimized for storing tensor data (such as large images).

OME-NGFF is a specification that describes how to structure the storage of bioimaging data and metadata. For instance, it defines a community-agreed system for storing multiple resolutions of large images and for dividing them into smaller chunks. It also defines how to specify axes, coordinate systems and coordinate transformations to describe the spatial context of the data. OME-NGFF does not require the data to be saved to Zarr, but the most widely used implementation of the specification is based on Zarr and is called OME-Zarr.
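To make the multiscale and chunking ideas concrete, here is a small pure-Python sketch of the arithmetic involved. It assumes that each pyramid level halves the spatial axes (y, x) while leaving the channel axis (c) untouched; under that assumption it reproduces the three levels of `blobs_multiscale_image` printed later in this notebook. The helpers `pyramid_shapes` and `n_chunks` and the chunk shape `(3, 256, 256)` are hypothetical illustrations, not spatialdata APIs or defaults.

```python
import math


def pyramid_shapes(shape, axes, scale_factors):
    """Shapes of each multiscale level; only spatial axes ('x', 'y', 'z') are downsampled."""
    levels = [tuple(shape)]
    for factor in scale_factors:
        prev = levels[-1]
        # floor division models a simple downsampling; real implementations may round differently
        levels.append(tuple(
            size // factor if ax in ("x", "y", "z") else size
            for size, ax in zip(prev, axes)
        ))
    return levels


def n_chunks(shape, chunk_shape):
    """Number of chunks needed to tile an array of `shape` with blocks of `chunk_shape`."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunk_shape))


levels = pyramid_shapes((3, 512, 512), axes="cyx", scale_factors=[2, 2])
print(levels)                                         # [(3, 512, 512), (3, 256, 256), (3, 128, 128)]
print([n_chunks(s, (3, 256, 256)) for s in levels])   # [4, 1, 1]
```

A viewer can then display the coarse levels when zoomed out and only read the few full-resolution chunks that are actually visible when zoomed in.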

The SpatialData Zarr format, described in our design doc, is an extension of the OME-NGFF specification that makes use of the OME-Zarr, AnnData Zarr and Parquet file formats. We need this combination of technologies because OME-NGFF does not currently provide all the fundamentals required for storing spatial omics datasets; nevertheless, we try to stay as close to OME-NGFF as possible, and we are contributing to OME-NGFF so that spatial omics support can ultimately be available in pure OME-NGFF.

Compatible .zarr stores#

spatialdata can read SpatialData Zarr data. In practice, this is data that has previously been saved using the spatialdata APIs in Python. Outside Python, there are preliminary efforts to make it possible to save SpatialData Zarr objects, for instance in R: https://github.com/HelenaLC/SpatialData (not yet ready!).

Non-compatible .zarr stores#

spatialdata cannot read arbitrary Zarr files. For instance, the feature_slice.zarr file in Visium HD data is not a SpatialData Zarr file (we will see how to read Visium HD data later). spatialdata also cannot read arbitrary OME-Zarr files, although our eventual aim is to make every OME-Zarr file compatible.

Example datasets#

You can download example SpatialData Zarr files from our documentation; a few examples are listed below.

| Technology | Sample | File size | Filename (spatialdata-sandbox) | Download data | License |
|---|---|---|---|---|---|
| Visium HD | Mouse intestine [^1] | 1 GB | visium_hd_3.0.0_id | .zarr.zip | CCA |
| Visium | Breast cancer [^2] | 1.5 GB | visium_associated_xenium_io | .zarr.zip | CCA |
| Xenium | Breast cancer [^2] | 2.8 GB | xenium_rep1_io | .zarr.zip | CCA |

Sources.

  1. From https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine

  2. Janesick, A. et al. High resolution mapping of the breast cancer tumor microenvironment using integrated single cell, spatial and in situ analysis of FFPE tissue. bioRxiv 2022.10.06.510405 (2022) doi:10.1101/2022.10.06.510405.
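The example datasets are distributed as .zarr.zip archives, so they need to be unpacked before reading. Here is a minimal standard-library sketch, assuming the archive contains a top-level directory whose name ends in `.zarr`; if your archive unpacks differently, point the reader at the extracted folder directly. The helper `extract_zarr_zip` is hypothetical, not part of spatialdata.

```python
import zipfile
from pathlib import Path


def extract_zarr_zip(zip_path, dest_dir):
    """Extract a .zarr.zip archive and return the top-level *.zarr directories it contains."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # first path component of every member, e.g. "sample.zarr/images/..." -> "sample.zarr"
        top_level = {Path(name).parts[0] for name in zf.namelist() if name.strip("/")}
    return sorted(dest / name for name in top_level if name.endswith(".zarr"))
```

Once extracted, a store can be opened with `read_zarr()` as shown in the next section, e.g. `read_zarr(extract_zarr_zip("xenium_rep1_io.zarr.zip", "data")[0])` (the archive name here is illustrative; use the file you downloaded).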

APIs to read SpatialData .zarr data from disk#

Here is an example of writing an in-memory example SpatialData object to disk in the SpatialData Zarr format and then reading it back.

from pathlib import Path
from tempfile import TemporaryDirectory

from spatialdata import SpatialData, read_zarr
from spatialdata.datasets import blobs

sdata = blobs()
print(sdata)
print()

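# keep a reference to the TemporaryDirectory object: once it is garbage-collected, its folder is deleted
tmp = TemporaryDirectory()
tmpdir = tmp.name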
f = Path(tmpdir) / "data.zarr"
sdata.write(f)
# 2 equivalent alternatives:
from_disk = read_zarr(f)
from_disk = SpatialData.read(f)
print(from_disk)
SpatialData object
├── Images
│     ├── 'blobs_image': SpatialImage[cyx] (3, 512, 512)
│     └── 'blobs_multiscale_image': MultiscaleSpatialImage[cyx] (3, 512, 512), (3, 256, 256), (3, 128, 128)
├── Labels
│     ├── 'blobs_labels': SpatialImage[yx] (512, 512)
│     └── 'blobs_multiscale_labels': MultiscaleSpatialImage[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     └── 'blobs_points': DataFrame with shape: (<Delayed>, 4) (2D points)
├── Shapes
│     ├── 'blobs_circles': GeoDataFrame shape: (5, 2) (2D shapes)
│     ├── 'blobs_multipolygons': GeoDataFrame shape: (2, 1) (2D shapes)
│     └── 'blobs_polygons': GeoDataFrame shape: (5, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (26, 3)
with coordinate systems:
    ▸ 'global', with elements:
        blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)

INFO     The Zarr backing store has been changed from None the new file path: /tmp/tmpz9ihcc5q/data.zarr           
SpatialData object, with associated Zarr store: /private/tmp/tmpz9ihcc5q/data.zarr
├── Images
│     ├── 'blobs_image': SpatialImage[cyx] (3, 512, 512)
│     └── 'blobs_multiscale_image': MultiscaleSpatialImage[cyx] (3, 512, 512), (3, 256, 256), (3, 128, 128)
├── Labels
│     ├── 'blobs_labels': SpatialImage[yx] (512, 512)
│     └── 'blobs_multiscale_labels': MultiscaleSpatialImage[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     └── 'blobs_points': DataFrame with shape: (200, 4) (2D points)
├── Shapes
│     ├── 'blobs_circles': GeoDataFrame shape: (5, 2) (2D shapes)
│     ├── 'blobs_multipolygons': GeoDataFrame shape: (2, 1) (2D shapes)
│     └── 'blobs_polygons': GeoDataFrame shape: (5, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (26, 3)
with coordinate systems:
    ▸ 'global', with elements:
        blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)

APIs to read SpatialData .zarr data from the cloud#

Remote access to .zarr data is currently only partially supported; see https://github.com/scverse/spatialdata/discussions/526 for more details.

Reader functions from spatialdata-io#

If you have raw data from common commercial technologies (e.g. Visium HD or MERSCOPE), you can easily convert it into the SpatialData Zarr format using the spatialdata-io library.

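from spatialdata_io import visium_hd
import spatialdata as sd

# placeholder paths: replace them with your own
path_read = "path/to/raw/data"  # folder with the raw Visium HD output
path_write = "path/to/output.zarr"  # where the SpatialData Zarr store will be written

# represent the raw data in-memory
sdata = visium_hd(path_read)

# write the data to disk
sdata.write(path_write)

# read and print the Zarr data
sdata = sd.read_zarr(path_write)
print(sdata)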

The visium_hd() function accepts additional parameters. For instance, here is how to also load the CytAssist image and a high-resolution microscopy image of the tissue:

sdata = visium_hd(
    path_read,
    load_all_images=True,
    fullres_image_file="Visium_HD_Mouse_Small_Intestine_tissue_image.btf",
)

Please consult the spatialdata-io documentation for a detailed description of all the reader functions and parameters.

Warning! It is important to call .write() and then read the data again after using functions from spatialdata-io. The raw data is sometimes not stored in an optimized way (e.g. large CSV files for points, and non-chunked, non-multiscale TIFF files for images). In those cases, if you performed operations such as viewing the data with napari-spatialdata without first writing and re-reading the data, napari would have extremely poor performance and be unusable.
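A rough back-of-the-envelope sketch of why chunking matters for interactive viewing: if a reader has to load every chunk that a viewport overlaps, a non-chunked image forces it to load the whole image for even a small view. The numbers below (a 10000 x 10000 single-channel 8-bit image, 1024 x 1024 chunks, a 600 x 600 viewport) are hypothetical, and the estimate ignores compression, caching and multiscale levels; the helpers are illustrative, not napari or spatialdata APIs.

```python
import math


def chunks_touched(window, chunk_shape):
    """Count chunks overlapped by ((y_start, y_stop), (x_start, x_stop)); stop is exclusive."""
    count = 1
    for (start, stop), chunk in zip(window, chunk_shape):
        first = start // chunk       # index of the first chunk along this axis
        last = (stop - 1) // chunk   # index of the last chunk along this axis
        count *= last - first + 1
    return count


def max_bytes_read(window, chunk_shape, itemsize=1):
    """Upper bound on uncompressed bytes loaded when whole chunks must be read."""
    return chunks_touched(window, chunk_shape) * math.prod(chunk_shape) * itemsize


# hypothetical 600 x 600 viewport on a 10000 x 10000 uint8 image
window = ((1000, 1600), (1000, 1600))
print(max_bytes_read(window, chunk_shape=(1024, 1024)))    # 4194304 (4 chunks)
print(max_bytes_read(window, chunk_shape=(10000, 10000)))  # 100000000 (the whole image)
```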

Construct a SpatialData object from scratch.#

The readers from spatialdata-io offer a good starting point, but sometimes you need a higher degree of customization and composability. In such cases you can create new SpatialData objects from scratch, or extend/modify existing ones.

Tip: if you need some code to get started, you can look at the source code of the readers implemented in spatialdata-io.

The SpatialData class#

Let’s see how to use the SpatialData class to construct a SpatialData object. For the moment, let’s assume that we already have a set of images, labels, points, shapes and tables: we will take them from another SpatialData object. Later we will show how to construct them from scratch.

sdata
SpatialData object, with associated Zarr store: /private/tmp/tmpz9ihcc5q/data.zarr
├── Images
│     ├── 'blobs_image': SpatialImage[cyx] (3, 512, 512)
│     └── 'blobs_multiscale_image': MultiscaleSpatialImage[cyx] (3, 512, 512), (3, 256, 256), (3, 128, 128)
├── Labels
│     ├── 'blobs_labels': SpatialImage[yx] (512, 512)
│     └── 'blobs_multiscale_labels': MultiscaleSpatialImage[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     └── 'blobs_points': DataFrame with shape: (<Delayed>, 4) (2D points)
├── Shapes
│     ├── 'blobs_circles': GeoDataFrame shape: (5, 2) (2D shapes)
│     ├── 'blobs_multipolygons': GeoDataFrame shape: (2, 1) (2D shapes)
│     └── 'blobs_polygons': GeoDataFrame shape: (5, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (26, 3)
with coordinate systems:
    ▸ 'global', with elements:
        blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)

You can use any name for the elements, as long as the names are unique. You can also reuse the same element under different names, as done for the points below (on disk, they will be written as separate objects).

my_images = {"you": sdata["blobs_image"]}
my_labels = {"can": sdata["blobs_labels"], "use": sdata["blobs_multiscale_labels"]}
my_points = {"any": sdata["blobs_points"], "unique": sdata["blobs_points"]}
my_shapes = {"name": sdata["blobs_circles"]}
my_tables = {"here": sdata["table"]}

The SpatialData constructor takes as input a dict for each element type. We can also omit some dicts (or even all of them).

# empty object
SpatialData()
SpatialData object
with coordinate systems:
# just points
SpatialData(points=my_points)
SpatialData object
└── Points
      ├── 'any': DataFrame with shape: (<Delayed>, 4) (2D points)
      └── 'unique': DataFrame with shape: (<Delayed>, 4) (2D points)
with coordinate systems:
    ▸ 'global', with elements:
        any (Points), unique (Points)
# full object
SpatialData(images=my_images, labels=my_labels, points=my_points, shapes=my_shapes, tables=my_tables)
SpatialData object
├── Images
│     └── 'you': SpatialImage[cyx] (3, 512, 512)
├── Labels
│     ├── 'can': SpatialImage[yx] (512, 512)
│     └── 'use': MultiscaleSpatialImage[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     ├── 'any': DataFrame with shape: (<Delayed>, 4) (2D points)
│     └── 'unique': DataFrame with shape: (<Delayed>, 4) (2D points)
├── Shapes
│     └── 'name': GeoDataFrame shape: (5, 2) (2D shapes)
└── Tables
      └── 'here': AnnData (26, 3)
with coordinate systems:
    ▸ 'global', with elements:
        you (Images), can (Labels), use (Labels), any (Points), unique (Points), name (Shapes)

Here is a shortcut to create the object from a single dict.

merged_dict = my_images | my_labels | my_points | my_shapes | my_tables
print(merged_dict.keys())

sdata = SpatialData.from_elements_dict(merged_dict)
print(sdata)
dict_keys(['you', 'can', 'use', 'any', 'unique', 'name', 'here'])
SpatialData object
├── Images
│     └── 'you': SpatialImage[cyx] (3, 512, 512)
├── Labels
│     ├── 'can': SpatialImage[yx] (512, 512)
│     └── 'use': MultiscaleSpatialImage[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     ├── 'any': DataFrame with shape: (<Delayed>, 4) (2D points)
│     └── 'unique': DataFrame with shape: (<Delayed>, 4) (2D points)
├── Shapes
│     └── 'name': GeoDataFrame shape: (5, 2) (2D shapes)
└── Tables
      └── 'here': AnnData (26, 3)
with coordinate systems:
    ▸ 'global', with elements:
        you (Images), can (Labels), use (Labels), any (Points), unique (Points), name (Shapes)
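One caveat with the `|` shortcut: when two dicts share a key, Python keeps the value from the right-hand dict without any warning, so an element could silently disappear before `from_elements_dict()` ever sees it. Here is a minimal sketch of a stricter merge; the helper `merge_unique` is hypothetical and not part of spatialdata.

```python
def merge_unique(*dicts):
    """Merge element dicts like `a | b | c`, but raise instead of silently overwriting."""
    merged = {}
    for d in dicts:
        duplicates = merged.keys() & d.keys()  # dict views support set operations
        if duplicates:
            raise KeyError(f"Duplicate element names: {sorted(duplicates)}")
        merged.update(d)
    return merged


# the plain `|` operator keeps the right-hand value for a repeated key
print({"a": 1} | {"a": 2})  # {'a': 2}
```

With the dicts above, `SpatialData.from_elements_dict(merge_unique(my_images, my_labels, my_points, my_shapes, my_tables))` would behave like the shortcut, but fail loudly on a name clash.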

You can also add or remove additional elements (as long as the names are unique).

# shapes before
print(list(sdata.shapes.keys()))

# let's add the same element under a second name (no copy is made: both names refer to the same object)
element = sdata["name"]
sdata["another_shape"] = element
print(list(sdata.shapes.keys()))

# let's add a deep copy (we provide APIs for deepcopying elements)
from spatialdata import deepcopy

element2 = deepcopy(sdata["name"])
sdata["yet_another_shape"] = element2
print(list(sdata.shapes.keys()))
['name']
['name', 'another_shape']
['name', 'another_shape', 'yet_another_shape']
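Why the distinction matters can be seen with plain Python objects standing in for elements: assigning under a new key only adds a second name for the same object, so in-place changes show up under both names, whereas a deep copy is independent. This is an analogy using the standard library's `copy.deepcopy` on a hypothetical dict of lists; for actual SpatialData elements use `spatialdata.deepcopy` as in the cell above.

```python
import copy

# plain Python stand-ins for elements (hypothetical, not spatialdata objects)
elements = {"name": {"radius": [51, 51, 51]}}

elements["another_shape"] = elements["name"]                     # same object, new key
elements["yet_another_shape"] = copy.deepcopy(elements["name"])  # independent copy

elements["name"]["radius"][0] = 10  # modify in place
print(elements["another_shape"]["radius"])      # [10, 51, 51]
print(elements["yet_another_shape"]["radius"])  # [51, 51, 51]
```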
# if you reuse an existing name, the previous element is overwritten (as long as the new element is of the same element type)
sdata["name"] = element

import pytest

# instead, reusing an existing name for an element of a different type raises an error
with pytest.raises(KeyError, match="Key `name` already exists."):
    sdata.images["name"] = sdata["you"]

You can also delete elements from the in-memory object, or even subset the entire object to a list of elements. Let’s see this:

del sdata["can"]
assert "can" not in sdata
sdata = sdata.subset(["you", "yet_another_shape"])
sdata
SpatialData object
├── Images
│     └── 'you': SpatialImage[cyx] (3, 512, 512)
└── Shapes
      └── 'yet_another_shape': GeoDataFrame shape: (5, 2) (2D shapes)
with coordinate systems:
    ▸ 'global', with elements:
        you (Images), yet_another_shape (Shapes)

Finally, let’s write the SpatialData object to disk. If you wish (for instance, to free memory and lazily load the elements), you can then read it again (note: currently not all element types support lazy loading; this will be discussed in the second part of this tutorial).

f = Path(tmpdir) / "data.zarr"
sdata.write(f, overwrite=True)
read_again = read_zarr(f)
print(read_again)
INFO     The Zarr backing store has been changed from None the new file path: /tmp/tmpz9ihcc5q/data.zarr           
SpatialData object, with associated Zarr store: /private/tmp/tmpz9ihcc5q/data.zarr
├── Images
│     └── 'you': SpatialImage[cyx] (3, 512, 512)
└── Shapes
      └── 'yet_another_shape': GeoDataFrame shape: (5, 2) (2D shapes)
with coordinate systems:
    ▸ 'global', with elements:
        you (Images), yet_another_shape (Shapes)

Extending existing objects with the incremental IO APIs#

Above, we showed that with the sdata['name'] = element syntax we can modify an object in memory after having created it. This is also possible on disk, as we will show in this section.

Note: at the time of writing, these functionalities have not yet been released on PyPI; nevertheless, the implementation is already fully available at https://github.com/scverse/spatialdata/pull/501.

Associated Zarr store#

When we create a SpatialData object in memory, it has no associated Zarr path. Instead, when we write an object to disk, or when we read it from disk, the Zarr store path is stored in the object (in its .path attribute).

SpatialData().path is None
True
# sdata has been previously written to disk
sdata.path
PosixPath('/tmp/tmpz9ihcc5q/data.zarr')
# read_again has been previously read from disk
read_again.path
PosixPath('/tmp/tmpz9ihcc5q/data.zarr')

Removing elements#

It is possible to remove existing elements from this Zarr store, or to write new elements into it. Let’s remove one from disk.

sdata.delete_element_from_disk("yet_another_shape")

As you can see, the element is still available in-memory, but not on-disk.

sdata["yet_another_shape"]
geometry radius
0 POINT (291.062 197.065) 51
1 POINT (259.026 371.319) 51
2 POINT (194.973 204.414) 51
3 POINT (149.926 188.623) 51
4 POINT (369.422 258.900) 51
sdata.elements_paths_on_disk()
['images/you']
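These paths are simply folders inside the Zarr store. To illustrate this, here is a simplified standard-library re-creation of the idea behind `elements_paths_on_disk()`, assuming the `<element type>/<element name>` layout visible in its output. The helper `element_paths` is hypothetical, not the library's implementation, and it does not check that each folder is a well-formed element.

```python
from pathlib import Path

ELEMENT_TYPES = ("images", "labels", "points", "shapes", "tables")


def element_paths(store):
    """List '<element type>/<name>' for every element folder found in a store directory."""
    store = Path(store)
    paths = []
    for element_type in ELEMENT_TYPES:
        type_dir = store / element_type
        if type_dir.is_dir():
            # skip hidden entries such as Zarr metadata files/folders
            paths += [
                f"{element_type}/{p.name}"
                for p in type_dir.iterdir()
                if p.is_dir() and not p.name.startswith(".")
            ]
    return sorted(paths)
```

For the store used in this notebook, `element_paths(sdata.path)` should give the same list as `sdata.elements_paths_on_disk()`, as long as the layout assumption holds.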

Let’s now also delete it in-memory.

del sdata["yet_another_shape"]

Adding elements#

Similarly, we can add new elements.

new_image = sdata["you"]
sdata["new_image"] = new_image
sdata.write_element("new_image")
sdata.elements_paths_on_disk()
['images/new_image', 'images/you']

In practice, these functions enable workflows in which simpler SpatialData objects are created first and then gradually enriched with new elements, for instance new aligned microscopy images or new segmentation masks.

Modifying the metadata#

As with adding new elements, it is possible to update the metadata of existing elements, such as the coordinate transformations, using the following APIs (please consult the documentation for details on how to use them):

  • write_transformations()

  • write_metadata()

  • write_consolidated_metadata().

These functions enable workflows that are convenient when registering large datasets, as shown in the notebook “Use landmark annotations to align multiple -omics layers”: you can first save large images to Zarr, load them efficiently, and then try multiple coordinate transformations to perform registration. When you are satisfied with the result, you can modify the transformation metadata on disk without having to rewrite the large image data.
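Trying many alignments is cheap because a 2D affine coordinate transformation is only a handful of numbers, while the pixels stay untouched. Below is a pure-Python sketch representing affine transformations as 3x3 matrices acting on (x, y, 1) and comparing two candidate alignments of a landmark point; it also shows that the order of composition matters. The helpers are hypothetical and only illustrate the arithmetic: spatialdata has its own transformation classes in `spatialdata.transformations` (e.g. `Scale`, `Translation`, `Affine`), which are what you would actually attach to elements before persisting them with `write_transformations()`; see the documentation for their exact usage.

```python
def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def compose(outer, inner):
    """Matrix product outer @ inner: apply `inner` first, then `outer`."""
    return [[sum(outer[i][k] * inner[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(matrix, x, y):
    """Map the point (x, y) through a 3x3 affine matrix."""
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# two candidate alignments: only these 9 numbers differ, never the image data
candidates = {
    "scale_then_shift": compose(translation(10, 20), scale(2, 2)),
    "shift_then_scale": compose(scale(2, 2), translation(10, 20)),
}
for name, m in candidates.items():
    print(name, apply(m, 1, 1))
# scale_then_shift (12, 22)
# shift_then_scale (22, 42)
```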