Raster data and translations (technical topic)#

The following is a technical topic, and may not be relevant for all users.

Due to differences between the on-disk model (the NGFF specification) and the in-memory model (Xarray), our current implementation works as follows: if you call transform() on a raster element (images, labels), the translation part of the transformation will not be applied to the pixel data; it remains attached to the transformed element. Similarly, if you perform a query operation on the data, the returned raster data will carry a translation object that accounts for the offset of the (0, 0) pixel.

This can create confusion: if you have an image and some points aligned to it, it may look as if, after calling transform(), the pixels of the image and the coordinates of the points no longer correspond. But this is expected by design: the elements are not guaranteed to be aligned coordinate-wise; what is aligned is the data after applying the transformations to a common coordinate system, and since the image still carries a translation, the alignment is preserved.
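To make this concrete, here is a minimal numpy sketch (with made-up numbers, independent of spatialdata) of why a residual translation on the raster element preserves alignment:

```python
import numpy as np

# Made-up numbers, independent of spatialdata: after transform(), the image
# keeps a residual translation t, while vector data is fully transformed.
t = np.array([2000.0, -2000.0])              # residual translation on the image (x, y)
pixel_of_feature = np.array([100.0, 50.0])   # a feature in the transformed image, in pixels
point_coord = pixel_of_feature + t           # the matching point, fully transformed (world)

# Alignment is preserved once the raster's stored translation is applied:
world_from_raster = pixel_of_feature + t
world_from_vector = point_coord
assert np.allclose(world_from_raster, world_from_vector)
```

The pixel indices and the point coordinates differ, but both map to the same world location once the raster's stored translation is taken into account.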

We are implementing a cleaner approach in this repository: https://github.com/BioImageTools/ngff-transformations. The plan is to move the transformation code out of the spatialdata framework so that it is accessible to the broader bioimaging community, not only to users working with spatial omics data.

Let’s discuss the currently implemented approach via code in the examples below.

import matplotlib.pyplot as plt
import spatialdata
import spatialdata_plot
from spatialdata.transformations import (
    Affine,
    Identity,
    Translation,
    get_transformation,
    get_transformation_between_coordinate_systems,
    set_transformation,
)

Please download the data from here: mouse_liver dataset and adjust the variable containing the location of the .zarr file.

sdata = spatialdata.read_zarr("./mouse_liver.zarr")

For simplicity, we will transform the multiscale raw image (DataTree) as a “flat” DataArray:

sdata["raw_image"] = sdata["raw_image"]["scale0"]["image"]

Let’s translate the image and the nucleus boundaries. As you can see, spatialdata-plot shows the data correctly (notice how the x and y ranges change in the subplot on the right).

translation = Translation([2000.0, -2000.0], axes=("x", "y"))

set_transformation(sdata["raw_image"], translation)
set_transformation(sdata["nucleus_boundaries"], translation)

sdata.pl.render_images("raw_image").pl.render_shapes("nucleus_boundaries").pl.show(coordinate_systems=["global"])
INFO     Rasterizing image for faster rendering.

Now let’s transform the data; as you would expect, we obtain the same plot.

from spatialdata import transform

transformed_image = transform(sdata["raw_image"], to_coordinate_system="global")
transformed_nucleus_boundaries = transform(sdata["nucleus_boundaries"], to_coordinate_system="global")

sdata["transformed_raw_image"] = transformed_image
sdata["transformed_nucleus_boundaries"] = transformed_nucleus_boundaries

sdata.pl.render_images("transformed_raw_image").pl.render_shapes("transformed_nucleus_boundaries").pl.show()
INFO     Dropping coordinate system 'my_other_space0' since it doesn't have relevant elements.                     
INFO     Rasterizing image for faster rendering.

But notice how the transformed image still carries a translation, while the transformed nucleus_boundaries no longer do.

print(get_transformation(sdata["raw_image"]))
print(get_transformation(sdata["nucleus_boundaries"]))
print(get_transformation(sdata["transformed_raw_image"]))
print(get_transformation(sdata["transformed_nucleus_boundaries"]))
Translation (x, y)
    [ 2000. -2000.]
Translation (x, y)
    [ 2000. -2000.]
Translation (c, y, x)
    [    0. -2000.  2000.]
Identity

Let’s manually plot the data to show the source of the confusion.

plt.figure()
ax = plt.gca()

sdata["raw_image"].sel(c=0).plot.imshow(ax=ax, yincrease=False)
ax.scatter(sdata["nucleus_boundaries"].geometry.centroid.x, sdata["nucleus_boundaries"].geometry.centroid.y, s=10)
<matplotlib.collections.PathCollection at 0x3753d80d0>

Below you will see that the data appears misaligned, but this is expected: as we saw above, when the transformations are taken into account (e.g. when plotting with spatialdata-plot), the data is correctly shown as aligned.

plt.figure()
ax = plt.gca()

sdata["transformed_raw_image"].sel(c=0).plot.imshow(ax=ax, yincrease=False)
ax.scatter(
    sdata["transformed_nucleus_boundaries"].geometry.centroid.x,
    sdata["transformed_nucleus_boundaries"].geometry.centroid.y,
    s=10,
)
<matplotlib.collections.PathCollection at 0x375366ef0>

Workarounds#

Being aware of the implemented behavior is generally enough to avoid bugs like the one that appears in the plot above.

Still, there are two functions that can be useful: rasterize() and transform_to_data_extent().

The first function, rasterize(), lets you rasterize the data into a target bounding box specified in a desired coordinate system. Translations, rotations, etc. will be applied, and 0 (black) will be used to pad the portions of the target bounding box outside the image extent. This function is helpful because the translation is effectively applied to the pixel data (with 0 padding) rather than being kept as metadata.

from spatialdata import rasterize
rasterized_raw_image = rasterize(
    sdata["raw_image"],
    min_coordinate=[-2000, -2000],
    max_coordinate=[10_000, 10_000],
    axes=("x", "y"),
    target_coordinate_system="global",
    target_unit_to_pixels=1,
)
sdata["rasterized_raw_image"] = rasterized_raw_image

sdata.pl.render_images("rasterized_raw_image").pl.render_shapes(fill_alpha=0.2).pl.show()
INFO     Dropping coordinate system 'my_other_space0' since it doesn't have relevant elements.                     
INFO     Rasterizing image for faster rendering.

The second function, transform_to_data_extent(), first computes the extent of the data using spatialdata.get_extent() and then transforms all the elements (images, shapes, etc.) to the data extent, so that all the elements end up with the same transformation. In this way the correspondence between pixels and coordinates is guaranteed.

from spatialdata._core.operations._utils import transform_to_data_extent

sdata2 = transform_to_data_extent(
    sdata.subset(["raw_image", "nucleus_boundaries"]), coordinate_system="global", target_unit_to_pixels=1
)
sdata2.pl.render_images("raw_image").pl.render_shapes().pl.show()
INFO     Rasterizing image for faster rendering.

Crucially, now all the elements have the same transformation (up to a half-pixel difference, due to the bug referenced in the previous notebook)!

print("before:")
# before (raster data still has the translation but not the vector data)
print(get_transformation(sdata["transformed_raw_image"]))
print(get_transformation(sdata["transformed_nucleus_boundaries"]))

print("\nafter:")
# all the data has the same transformation
print(get_transformation(sdata2["raw_image"]))
print(get_transformation(sdata2["nucleus_boundaries"]))
before:
Translation (c, y, x)
    [    0. -2000.  2000.]
Identity 

after:
Sequence 
    Translation (z, y, x)
        [-0.5 -0.5 -0.5]
    Scale (y, x)
        [1. 1.]
    Translation (y, x)
        [-2000.  2000.]
    Translation (z, y, x)
        [0.5 0.5 0.5]
Sequence 
    Sequence 
        Scale (y, x)
            [1. 1.]
        Translation (y, x)
            [-2000.  2000.]
        Identity

This means that now we can manually slice both the raster and vector data, since they are aligned together also in the intrinsic (“pixel”) coordinate system.

axes = plt.subplots(1, 2, figsize=(10, 4))[1]

sdata["transformed_raw_image"].sel(c=0).plot.imshow(ax=axes[0], yincrease=False)
axes[0].scatter(
    sdata["transformed_nucleus_boundaries"].geometry.centroid.x,
    sdata["transformed_nucleus_boundaries"].geometry.centroid.y,
    s=10,
)
axes[0].set_title("before")

sdata2["raw_image"].sel(c=0).plot.imshow(ax=axes[1], yincrease=False)
axes[1].scatter(
    sdata2["nucleus_boundaries"].geometry.centroid.x, sdata2["nucleus_boundaries"].geometry.centroid.y, s=10
)
axes[1].set_title("after")
Text(0.5, 1.0, 'after')

A simpler approach that we will implement#

Let’s show a preview of what we are implementing next. Here we are slicing the vector and raster data using the xarray and geopandas APIs (no coordinate systems are used, we operate on the coordinates).

# let's manually slice the data
x_slice = slice(1000, 2000)
y_slice = slice(1000, 2000)

# .sel is the coordinate-based indexing available via the xarray API
image_slice = sdata["raw_image"].sel(x=x_slice, y=y_slice, c=0)
# .cx is the coordinate-based indexer available via the geopandas API
nucleus_boundaries_slice = sdata["nucleus_boundaries"].cx[x_slice, y_slice]

plt.figure()
ax = plt.gca()

image_slice.plot.imshow(ax=ax, yincrease=False)
ax.scatter(nucleus_boundaries_slice.geometry.centroid.x, nucleus_boundaries_slice.geometry.centroid.y, s=20)
<matplotlib.collections.PathCollection at 0x33dfa81f0>

The simplicity of the operation above comes with several limitations, in particular:

  • no affine transformations are supported

  • no support for multiple coordinate systems

  • the transformations are not serialized in an interoperable format like NGFF.
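The first limitation can be made concrete with a small numpy sketch (hypothetical helper, made-up numbers): under a pure translation, slicing in world coordinates reduces to shifting the requested range into pixel space, but a rotation would mix the x and y axes and break this axis-aligned shortcut.

```python
import numpy as np

# Hypothetical helper with made-up numbers: slicing a translated image in
# world coordinates reduces to shifting the requested range into pixel space
# (pixel = world - t). A rotation would mix x and y and break this shortcut.
t_x, t_y = 2000.0, -2000.0                 # the image's residual translation (x, y)
image = np.arange(20.0).reshape(4, 5)      # toy (y, x) pixel grid

def world_slice(arr, x_lo, x_hi, y_lo, y_hi):
    xs = slice(int(x_lo - t_x), int(x_hi - t_x))
    ys = slice(int(y_lo - t_y), int(y_hi - t_y))
    return arr[ys, xs]

patch = world_slice(image, 2001, 2004, -2000, -1998)
print(patch.shape)  # (2, 3)
```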

We designed, and are gradually implementing, a new system that will bridge the NGFF transformations and the coordinate-based xarray approach. This will make it possible to write code that operates interchangeably on NGFF transformations and on xarray coordinates (as in the code snippet above), greatly simplifying the challenges discussed previously without sacrificing the power of NGFF transformations. See the status of our implementation here: https://github.com/scverse/spatialdata/issues/308 and here: https://github.com/BioImageTools/ngff-transformations.

Other technical topics#

Let’s conclude the tutorial with some important, but more technical, topics.

Transformations and Axes#

As quickly shown above, given a transformation object, we can compute an affine matrix from it with the spatialdata.transformations.Transformation.to_affine_matrix() method. This method takes as input the axes of the input element and the axes of the output element. The affine matrix is built consistently according to the input and output axes.

Let’s see an example. Consider this Translation object.

print(translation)
Translation (x, y)
    [ 2000. -2000.]

We can choose the affine matrix representation that we need, by specifying input and output axes.

print(translation.to_affine_matrix(input_axes=("x", "y"), output_axes=("x", "y")))
print(translation.to_affine_matrix(input_axes=("x", "y", "z"), output_axes=("x", "y", "z")))
[[ 1.e+00  0.e+00  2.e+03]
 [ 0.e+00  1.e+00 -2.e+03]
 [ 0.e+00  0.e+00  1.e+00]]
[[ 1.e+00  0.e+00  0.e+00  2.e+03]
 [ 0.e+00  1.e+00  0.e+00 -2.e+03]
 [ 0.e+00  0.e+00  1.e+00  0.e+00]
 [ 0.e+00  0.e+00  0.e+00  1.e+00]]

The input and output axes can also differ, and the affine matrix will be built accordingly.

This is what internally allows specifying the same transformation for different types of elements (for instance xyz points and cyx images) and still getting consistent results.

print(translation.to_affine_matrix(input_axes=("x", "y", "c"), output_axes=("c", "z", "y", "x")))
[[    0.     0.     1.     0.]
 [    0.     0.     0.     0.]
 [    0.     1.     0. -2000.]
 [    1.     0.     0.  2000.]
 [    0.     0.     0.     1.]]

Notice that the z row is all zeros: z was not among the input axes, so the output z coordinate cannot be derived from the input. The c axis, on the other hand, is passed through since it was among the input axes. The table below shows the same matrix that gets printed as output above, with rows labeled by the output axes and columns by the input axes.

|       | x | y | c | t     |
|-------|---|---|---|-------|
| **c** | 0 | 0 | 1 | 0     |
| **z** | 0 | 0 | 0 | 0     |
| **y** | 0 | 1 | 0 | -2000 |
| **x** | 1 | 0 | 0 | 2000  |
|       | 0 | 0 | 0 | 1     |
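As a sanity check, the same matrix can be assembled by hand with numpy (illustrative, not a spatialdata API):

```python
import numpy as np

# Assemble by hand the (x, y, c) -> (c, z, y, x) affine for a pure (x, y)
# translation. Columns follow the input axes (x, y, c) plus the translation
# column; rows follow the output axes (c, z, y, x) plus the homogeneous row.
t_x, t_y = 2000.0, -2000.0
matrix = np.array([
    [0.0, 0.0, 1.0, 0.0],   # c: passed through from the input c axis
    [0.0, 0.0, 0.0, 0.0],   # z: all zeros, z was not among the input axes
    [0.0, 1.0, 0.0, t_y],   # y: identity plus the y translation
    [1.0, 0.0, 0.0, t_x],   # x: identity plus the x translation
    [0.0, 0.0, 0.0, 1.0],   # homogeneous row
])
print(matrix)
```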

Transformations between coordinate systems#

As we mentioned before, SpatialData doesn’t allow storing transformations between coordinate systems, or between elements; only transformations between elements and coordinate systems can be stored. Still, SpatialData can compute the transformation between any pair of {element, coordinate system}. This is possible with the spatialdata.transformations.get_transformation_between_coordinate_systems() function, which internally builds the graph of transformations and finds a path among the transformations and their inverses (when they exist).

The function takes as arguments the SpatialData object and the two coordinate systems (or elements). It returns the transformation that needs to be applied to the elements of the first coordinate system to map them to the second coordinate system.

Let’s see an example.

import math

theta = math.pi / 6
rotation = Affine(
    [
        [math.cos(theta), -math.sin(theta), 0],
        [math.sin(theta), math.cos(theta), 0],
        [0, 0, 1],
    ],
    input_axes=("x", "y"),
    output_axes=("x", "y"),
)

set_transformation(sdata.images["raw_image"], rotation)
print(get_transformation_between_coordinate_systems(sdata, sdata.images["raw_image"], "global"))
print(get_transformation_between_coordinate_systems(sdata, "global", sdata.images["raw_image"]))
Sequence 
    Affine (x, y -> x, y)
        [ 0.8660254 -0.5        0.       ]
        [0.5       0.8660254 0.       ]
        [0. 0. 1.]
Sequence 
    Affine (x, y -> x, y)
        [0.8660254 0.5       0.       ]
        [-0.5        0.8660254  0.       ]
        [0. 0. 1.]

In the example above, we are effectively only returning the transformation that is needed to map the elements to the global coordinate system, and vice versa. Notice how one transformation is the inverse of the other.
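We can verify the inverse relationship with plain numpy, independently of spatialdata:

```python
import math
import numpy as np

# Independent numpy check: the transformation from "global" back to the
# element is the matrix inverse of the rotation we assigned.
theta = math.pi / 6
forward = np.array([
    [math.cos(theta), -math.sin(theta), 0.0],
    [math.sin(theta),  math.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
backward = np.linalg.inv(forward)
print(np.round(backward, 7))
assert np.allclose(forward @ backward, np.eye(3))
```

For a pure rotation, the inverse is simply the transpose, which matches the second Affine printed above.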

To complete the example, let’s use the function to derive a transformation between two different coordinate systems (above we used it to derive the transformation between an element and a coordinate system). We first need to create a new coordinate system for some of the elements; we will just assign a spatialdata.transformations.Identity transformation.

set_transformation(sdata.images["raw_image"], Identity(), "my_other_space0")
print(get_transformation(sdata.images["raw_image"], get_all=True))
print(sdata.coordinate_systems)
{'global': Affine (x, y -> x, y)
    [ 0.8660254 -0.5        0.       ]
    [0.5       0.8660254 0.       ]
    [0. 0. 1.], 'my_other_space0': Identity }
['my_other_space0', 'global']

We then get the transformation between the "global" coordinate system and the "my_other_space0" coordinate system.

As expected, it is a spatialdata.transformations.Sequence transformation composed of a spatialdata.transformations.Affine and a spatialdata.transformations.Identity.

get_transformation_between_coordinate_systems(sdata, "global", "my_other_space0")
Sequence 
    Affine (x, y -> x, y)
        [0.8660254 0.5       0.       ]
        [-0.5        0.8660254  0.       ]
        [0. 0. 1.]
    Identity