Spatial query#

Given a single-modal or multi-modal dataset, spatial queries allow to retrieve all the spatial elements and instances that are within a given rectangular window or polygonal shape. The returned object is a valid SpatialData object.

Overview#

In this tutorial, we will learn how to filter SpatialData datasets by coordinate system and spatial queries. This is useful for selecting regions of interest for more in-depth processing and for generating visualizations. We will use a Visium dataset as an example in which we first select data by coordinate system and then by a bounding box.

Load the data#

First, we will load the Visium dataset. You can download the data here. Please name it visium_brain.zarr (eventually using symlinks). After loading the dataset, inspect the resulting spatialdata.SpatialData object. We see there are data from two Visium slides (i.e., two image and shapes pairs). The Images are the H&E images and the Shapes are the Visium spots. The sequence data from the spots are stored in the Table.

%load_ext jupyter_black

import matplotlib.patches as patches
import spatialdata as sd
import spatialdata_plot
from matplotlib import pyplot as plt

# from napari_spatialdata import Interactive
plt.style.use("dark_background")
sdata = sd.read_zarr("visium_brain.zarr")
print(sdata)
SpatialData object with:
├── Images
│     ├── 'ST8059048_hires_image': SpatialImage[cyx] (3, 2000, 1969)
│     ├── 'ST8059048_lowres_image': SpatialImage[cyx] (3, 600, 591)
│     ├── 'ST8059050_hires_image': SpatialImage[cyx] (3, 2000, 1968)
│     └── 'ST8059050_lowres_image': SpatialImage[cyx] (3, 600, 590)
├── Shapes
│     ├── 'ST8059048': GeoDataFrame shape: (2987, 2) (2D shapes)
│     └── 'ST8059050': GeoDataFrame shape: (3497, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (6484, 31053)
with coordinate systems:
▸ 'ST8059048', with elements:
        ST8059048_hires_image (Images), ST8059048 (Shapes)
▸ 'ST8059050', with elements:
        ST8059050_hires_image (Images), ST8059050 (Shapes)

Plotting the data#

We can plot data and see that the visium spots from both datasets. We see that the two datasets are in separate coordinate systems.

sdata.pl.render_shapes(color="library").pl.show()
../../../../_images/69da0dc74daf97bb426de71bfeafafb1e8e34d19a4142eaa8243243cabf3c8e6.png

Filtering by coordinate system#

For the remainder of this exercise, we will focus on the data from the library ST8059050. We can extract all of the data in our spatialdata.SpatialData dataset related to spatialdata.SpatialData by filtering by coordinate system. You can do so by calling the spatialdata.SpatialData.filter_by_coordinate_system() method on the spatialdata.SpatialData object and providing the desired coordinate system (ST8059050 in this case). When we plot the filtered dataset (sdata_ST8059050) we now see it only contains the data corresponding to library ST8059050.

sdata_ST8059050 = sdata.filter_by_coordinate_system("ST8059050")

f, ax = plt.subplots(figsize=(7, 7))
sdata_ST8059050.pl.render_shapes(color="library").pl.show(ax=ax)
../../../../_images/7c44375f64261b7c29f5748d86cdbda8223b77fc7f244d2cf7f03dee6eb667c9.png

Spatial query#

Bounding box query#

We can also subset our SpatialData objects by spatial regions via the spatial query interface. For example, we can select the data contained within the rectangular bounding box show below in black.

bb_xmin = 600
bb_ymin = 1000
bb_w = 200
bb_h = 200
bb_xmax = bb_xmin + bb_w
bb_ymax = bb_ymin + bb_h
f, ax = plt.subplots(figsize=(7, 77))
sdata_ST8059050.pl.render_shapes(color="library").pl.show(ax=ax)
rect = patches.Rectangle((bb_xmin, bb_ymin), bb_w, bb_h, linewidth=5, edgecolor="red", facecolor="none")
ax.add_patch(rect)
<matplotlib.patches.Rectangle at 0x7f0474682200>
../../../../_images/1f3fbfecccd8f953234c1f00b61bd7b83521e82997944eacb38a32274c446413.png

We can subset the SpatialData object via the bounding_box query (spatialdata.SpatialData.query()). We must provide the minimum and maximum coordinates of the bounding box as well as the target coordintate system the bounding box coordinates correspond to. We see the resulting cropped_sdata object still contains the Image, Shapes, and Table elements, but they are all smaller. We can confirm this by plotting the resuling visium spots colored by their Sox2 expression below.

cropped_sdata = sdata_ST8059050.query.bounding_box(
    axes=["x", "y"],
    min_coordinate=[bb_xmin, bb_ymin],
    max_coordinate=[bb_xmax, bb_ymax],
    target_coordinate_system="ST8059050",
)

cropped_sdata
SpatialData object with:
├── Images
│     └── 'ST8059050_hires_image': SpatialImage[cyx] (3, 200, 200)
├── Shapes
│     └── 'ST8059050': GeoDataFrame shape: (72, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (72, 31053)
with coordinate systems:
▸ 'ST8059050', with elements:
        ST8059050_hires_image (Images), ST8059050 (Shapes)
cropped_sdata.pl.render_shapes(color="Sox2").pl.show()
../../../../_images/0595d58fbeb2b79f7601d86a1d40bfdf765324e375a87cc748c4a1d2529622e6.png

Transformations are preserved after spatial query#

Importantly, transformations are preserved in the spatial query result such that the resulting data are still positioned correctly within their target coordinate system. Notice below that the cropped_sdata image now has an translation that accounts for the fact that the image has been cropped. Additionally, when we plot the cropped_sdata overlaid on the original sdata_ST8059050, we see that it remains positioned correctly.

sdata_ST8059050.images["ST8059050_hires_image"]
<xarray.SpatialImage 'image' (c: 3, y: 2000, x: 1968)>
dask.array<from-zarr, shape=(3, 2000, 1968), dtype=uint8, chunksize=(3, 2000, 1968), chunktype=numpy.ndarray>
Coordinates:
  * c        (c) int64 0 1 2
  * y        (y) float64 0.5 1.5 2.5 3.5 ... 1.996e+03 1.998e+03 1.998e+03 2e+03
  * x        (x) float64 0.5 1.5 2.5 3.5 ... 1.966e+03 1.966e+03 1.968e+03
Attributes:
    transform:  {'ST8059050': Identity }
cropped_sdata.images["ST8059050_hires_image"]
<xarray.SpatialImage 'image' (c: 3, y: 200, x: 200)>
dask.array<getitem, shape=(3, 200, 200), dtype=uint8, chunksize=(3, 200, 200), chunktype=numpy.ndarray>
Coordinates:
  * c        (c) int64 0 1 2
  * y        (y) float64 0.5 1.5 2.5 3.5 4.5 ... 195.5 196.5 197.5 198.5 199.5
  * x        (x) float64 0.5 1.5 2.5 3.5 4.5 ... 195.5 196.5 197.5 198.5 199.5
Attributes:
    transform:  {'ST8059050': Sequence \n    Translation (x, y)\n        [ 60...

Polygon query#

from shapely import Polygon

polygon = Polygon(
    [
        (200, 200),
        (450, 600),
        (450, 350),
        (900, 1000),
        (1200, 600),
    ]
)
polygon
../../../../_images/62e612b5312feefcb1fe2b6b2629281174b854ed936f911256a9dfaf0b4c12d9.svg
from spatialdata import polygon_query

cropped_sdata2 = polygon_query(
    sdata_ST8059050,
    polygon=polygon,
    target_coordinate_system="ST8059050",
)

cropped_sdata2
SpatialData object with:
├── Images
│     └── 'ST8059050_hires_image': SpatialImage[cyx] (3, 800, 1000)
├── Shapes
│     └── 'ST8059050': GeoDataFrame shape: (435, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (435, 31053)
with coordinate systems:
▸ 'ST8059050', with elements:
        ST8059050_hires_image (Images), ST8059050 (Shapes)

Plotting the data together#

f, ax = plt.subplots(figsize=(7, 7))
sdata_ST8059050.pl.render_shapes(color="library", cmap="Pastel1").pl.show(ax=ax)
cropped_sdata.pl.render_shapes(color="Sox2").pl.show(ax=ax, colorbar=False)
cropped_sdata2.pl.render_shapes(color="Sox2").pl.show(ax=ax, colorbar=False)

cropped_sdata.pl.render_images().pl.show(ax=ax)
cropped_sdata2.pl.render_images().pl.show(ax=ax)

# show the bounding box and the polygons used for the queries
rect = patches.Rectangle((bb_xmin, bb_ymin), bb_w, bb_h, linewidth=5, edgecolor="red", facecolor="none")
poly_patch = patches.Polygon(list(polygon.exterior.coords), linewidth=5, edgecolor="red", facecolor="none")
ax.add_patch(rect)
ax.add_patch(poly_patch)
<matplotlib.patches.Polygon at 0x7f047650ffd0>
../../../../_images/17390eb106a986da38040e4f49c908eee13d603b7a1a83db5b37f0b06f12b587.png