Spatial query#
Given a single-modal or multi-modal dataset, spatial queries allow to retrieve all the spatial elements and instances that are within a given rectangular window or polygonal shape.
The returned object is a valid SpatialData
object.
Overview#
In this tutorial, we will learn how to filter SpatialData datasets by coordinate system and spatial queries. This is useful for selecting regions of interest for more in-depth processing and for generating visualizations. We will use a Visium dataset as an example in which we first select data by coordinate system and then by a bounding box.
Load the data#
First, we will load the Visium dataset. You can download the data here. Please name it visium_brain.zarr
(eventually using symlinks). After loading the dataset, inspect the resulting spatialdata.SpatialData
object. We see there are data from two Visium slides (i.e., two image and shapes pairs). The Images
are the H&E images and the Shapes
are the Visium spots. The sequence data from the spots are stored in the Table
.
%load_ext jupyter_black
import matplotlib.patches as patches
import spatialdata as sd
import spatialdata_plot
from matplotlib import pyplot as plt
# from napari_spatialdata import Interactive
plt.style.use("dark_background")
sdata = sd.read_zarr("visium_brain.zarr")
print(sdata)
SpatialData object with:
├── Images
│ ├── 'ST8059048_hires_image': SpatialImage[cyx] (3, 2000, 1969)
│ ├── 'ST8059048_lowres_image': SpatialImage[cyx] (3, 600, 591)
│ ├── 'ST8059050_hires_image': SpatialImage[cyx] (3, 2000, 1968)
│ └── 'ST8059050_lowres_image': SpatialImage[cyx] (3, 600, 590)
├── Shapes
│ ├── 'ST8059048': GeoDataFrame shape: (2987, 2) (2D shapes)
│ └── 'ST8059050': GeoDataFrame shape: (3497, 2) (2D shapes)
└── Tables
└── 'table': AnnData (6484, 31053)
with coordinate systems:
▸ 'ST8059048', with elements:
ST8059048_hires_image (Images), ST8059048 (Shapes)
▸ 'ST8059050', with elements:
ST8059050_hires_image (Images), ST8059050 (Shapes)
Plotting the data#
We can plot data and see that the visium spots from both datasets. We see that the two datasets are in separate coordinate systems.
sdata.pl.render_shapes(color="library").pl.show()
Filtering by coordinate system#
For the remainder of this exercise, we will focus on the data from the library ST8059050
. We can extract all of the data in our spatialdata.SpatialData
dataset related to spatialdata.SpatialData
by filtering by coordinate system. You can do so by calling the spatialdata.SpatialData.filter_by_coordinate_system()
method on the spatialdata.SpatialData
object and providing the desired coordinate system (ST8059050
in this case). When we plot the filtered dataset (sdata_ST8059050
) we now see it only contains the data corresponding to library ST8059050
.
sdata_ST8059050 = sdata.filter_by_coordinate_system("ST8059050")
f, ax = plt.subplots(figsize=(7, 7))
sdata_ST8059050.pl.render_shapes(color="library").pl.show(ax=ax)
Spatial query#
Bounding box query#
We can also subset our SpatialData
objects by spatial regions via the spatial query interface. For example, we can select the data contained within the rectangular bounding box show below in black.
bb_xmin = 600
bb_ymin = 1000
bb_w = 200
bb_h = 200
bb_xmax = bb_xmin + bb_w
bb_ymax = bb_ymin + bb_h
f, ax = plt.subplots(figsize=(7, 77))
sdata_ST8059050.pl.render_shapes(color="library").pl.show(ax=ax)
rect = patches.Rectangle((bb_xmin, bb_ymin), bb_w, bb_h, linewidth=5, edgecolor="red", facecolor="none")
ax.add_patch(rect)
<matplotlib.patches.Rectangle at 0x7f0474682200>
We can subset the SpatialData
object via the bounding_box
query (spatialdata.SpatialData.query()
). We must provide the minimum and maximum coordinates of the bounding box as well as the target coordintate system the bounding box coordinates correspond to. We see the resulting cropped_sdata
object still contains the Image
, Shapes
, and Table
elements, but they are all smaller. We can confirm this by plotting the resuling visium spots colored by their Sox2 expression below.
cropped_sdata = sdata_ST8059050.query.bounding_box(
axes=["x", "y"],
min_coordinate=[bb_xmin, bb_ymin],
max_coordinate=[bb_xmax, bb_ymax],
target_coordinate_system="ST8059050",
)
cropped_sdata
SpatialData object with:
├── Images
│ └── 'ST8059050_hires_image': SpatialImage[cyx] (3, 200, 200)
├── Shapes
│ └── 'ST8059050': GeoDataFrame shape: (72, 2) (2D shapes)
└── Tables
└── 'table': AnnData (72, 31053)
with coordinate systems:
▸ 'ST8059050', with elements:
ST8059050_hires_image (Images), ST8059050 (Shapes)
cropped_sdata.pl.render_shapes(color="Sox2").pl.show()
Transformations are preserved after spatial query#
Importantly, transformations are preserved in the spatial query result such that the resulting data are still positioned correctly within their target coordinate system. Notice below that the cropped_sdata
image now has an translation that accounts for the fact that the image has been cropped. Additionally, when we plot the cropped_sdata
overlaid on the original sdata_ST8059050
, we see that it remains positioned correctly.
sdata_ST8059050.images["ST8059050_hires_image"]
<xarray.SpatialImage 'image' (c: 3, y: 2000, x: 1968)> dask.array<from-zarr, shape=(3, 2000, 1968), dtype=uint8, chunksize=(3, 2000, 1968), chunktype=numpy.ndarray> Coordinates: * c (c) int64 0 1 2 * y (y) float64 0.5 1.5 2.5 3.5 ... 1.996e+03 1.998e+03 1.998e+03 2e+03 * x (x) float64 0.5 1.5 2.5 3.5 ... 1.966e+03 1.966e+03 1.968e+03 Attributes: transform: {'ST8059050': Identity }
cropped_sdata.images["ST8059050_hires_image"]
<xarray.SpatialImage 'image' (c: 3, y: 200, x: 200)> dask.array<getitem, shape=(3, 200, 200), dtype=uint8, chunksize=(3, 200, 200), chunktype=numpy.ndarray> Coordinates: * c (c) int64 0 1 2 * y (y) float64 0.5 1.5 2.5 3.5 4.5 ... 195.5 196.5 197.5 198.5 199.5 * x (x) float64 0.5 1.5 2.5 3.5 4.5 ... 195.5 196.5 197.5 198.5 199.5 Attributes: transform: {'ST8059050': Sequence \n Translation (x, y)\n [ 60...
Polygon query#
from shapely import Polygon
polygon = Polygon(
[
(200, 200),
(450, 600),
(450, 350),
(900, 1000),
(1200, 600),
]
)
polygon
from spatialdata import polygon_query
cropped_sdata2 = polygon_query(
sdata_ST8059050,
polygon=polygon,
target_coordinate_system="ST8059050",
)
cropped_sdata2
SpatialData object with:
├── Images
│ └── 'ST8059050_hires_image': SpatialImage[cyx] (3, 800, 1000)
├── Shapes
│ └── 'ST8059050': GeoDataFrame shape: (435, 2) (2D shapes)
└── Tables
└── 'table': AnnData (435, 31053)
with coordinate systems:
▸ 'ST8059050', with elements:
ST8059050_hires_image (Images), ST8059050 (Shapes)
Plotting the data together#
f, ax = plt.subplots(figsize=(7, 7))
sdata_ST8059050.pl.render_shapes(color="library", cmap="Pastel1").pl.show(ax=ax)
cropped_sdata.pl.render_shapes(color="Sox2").pl.show(ax=ax, colorbar=False)
cropped_sdata2.pl.render_shapes(color="Sox2").pl.show(ax=ax, colorbar=False)
cropped_sdata.pl.render_images().pl.show(ax=ax)
cropped_sdata2.pl.render_images().pl.show(ax=ax)
# show the bounding box and the polygons used for the queries
rect = patches.Rectangle((bb_xmin, bb_ymin), bb_w, bb_h, linewidth=5, edgecolor="red", facecolor="none")
poly_patch = patches.Polygon(list(polygon.exterior.coords), linewidth=5, edgecolor="red", facecolor="none")
ax.add_patch(rect)
ax.add_patch(poly_patch)
<matplotlib.patches.Polygon at 0x7f047650ffd0>