Technology focus: Xenium#

This notebook will present an overview of the plotting functionalities of the spatialdata framework, in the context of a Xenium dataset.

Other notebooks, focused on data manipualtion, are also available for Xenium data:

Alignment of a Xenium and Visium datasets (this is one of the reproducibility notebooks for the analysis of our paper) notebook
Workflow showing saving annotations in Xenium Explorer and accessing them in SpaitalData, and viceversa notebook. Tutorial from Quentin Blampey.

%load_ext jupyter_black
%load_ext autoreload
%autoreload 2

Loading the data#

A reader for Xenium data is available in spatialdata-io. We used it to parse and convert to Zarr a Xenium dataset of Human Lung Cancer.

You can download the data from the link above, or from this convenience python script and convert it to spatialdata format with this script, rename the .zarr store to xenium.zarr and place it in the current folder (in alternatively you can use symlinks to make the data visible).

The dataset used in this notebook is the latest Xenium Multimodal Cell Segmentation extension of the Xenium technology, available for data processed using Xenium Analyzer version 2.0.0 and when Cell Segmentation Kit was used. Nevertheless, the xenium() reader supports all the Xenium versions.

xenium_path = "./xenium_2.0.0.zarr"

%%time
import spatialdata as sd

sdata = sd.read_zarr(xenium_path)
sdata

CPU times: user 1.11 s, sys: 173 ms, total: 1.28 s
Wall time: 1.36 s

SpatialData object with:
├── Images
│     ├── 'he_image': MultiscaleSpatialImage[cyx] (3, 45087, 11580), (3, 22543, 5790), (3, 11271, 2895), (3, 5635, 1447), (3, 2817, 723)
│     └── 'morphology_focus': MultiscaleSpatialImage[cyx] (5, 17098, 51187), (5, 8549, 25593), (5, 4274, 12796), (5, 2137, 6398), (5, 1068, 3199)
├── Labels
│     ├── 'cell_labels': MultiscaleSpatialImage[yx] (17098, 51187), (8549, 25593), (4274, 12796), (2137, 6398), (1068, 3199)
│     └── 'nucleus_labels': MultiscaleSpatialImage[yx] (17098, 51187), (8549, 25593), (4274, 12796), (2137, 6398), (1068, 3199)
├── Points
│     └── 'transcripts': DataFrame with shape: (12165021, 11) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (162254, 1) (2D shapes)
│     ├── 'cell_circles': GeoDataFrame shape: (162254, 2) (2D shapes)
│     └── 'nucleus_boundaries': GeoDataFrame shape: (156628, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (162254, 377)
with coordinate systems:
▸ 'global', with elements:
        he_image (Images), morphology_focus (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), nucleus_boundaries (Shapes)

The datasets contains 2 large microscopy image, represented as a multiscale, chunked image; the first one, he_image is the H&E image of the tissue, stained post-Xenium measurement, and the morphology_focus image which consists of the image of the tissue used for the Xenium experiment.

Furthermore, it also contains label images, that are the segmentation masks of cells or nuclei (cell_labels and nucleus_labels respectively).

Plotting the images#

Let’s visualize the images.

import matplotlib.pyplot as plt
import spatialdata_plot

axes = plt.subplots(1, 2, figsize=(10, 10))[1].flatten()
sdata.pl.render_images("he_image").pl.show(ax=axes[0], title="H&E image")
sdata.pl.render_images("morphology_focus").pl.show(ax=axes[1], title="Morphology image")

../../../../_images/1c4df40a470fdcc7d9f936696333edcf9014fbf9105a73a2da1411c14d7d6503.png

We can also crop and visualize a smaller region to appreciate morphological differences measured by the two modalities, together with the cell segmentation mask. Spatial cropping is introduced in a dedicated notebook in the documentation.

from spatialdata import bounding_box_query

axes = plt.subplots(3, 1, figsize=(20, 13))[1].flatten()
crop0 = lambda x: bounding_box_query(
    x,
    min_coordinate=[20_000, 8000],
    max_coordinate=[22_000, 8500],
    axes=("x", "y"),
    target_coordinate_system="global",
)
crop0(sdata).pl.render_images("he_image").pl.show(ax=axes[0], title="H&E image", coordinate_systems="global")
crop0(sdata).pl.render_images("morphology_focus").pl.show(
    ax=axes[1], title="Morphology image", coordinate_systems="global"
)
crop0(sdata).pl.render_labels("cell_labels").pl.show(ax=axes[2], title="Cell labels", coordinate_systems="global")

../../../../_images/7d21a354a58ec1072424701fdc08e14b9a30981916d3dbb71e94afc7429b432b.png

Exploring the multimodal segmentation#

Let’s visualize the 4 channels that are used, for this dataset, for computing the cell segmentation. This is specific for datasets profiled using the Cell Segmentation Kit and processed with the Xenium Analyzer >= 2.0.0.

crop0(sdata).pl.render_images("morphology_focus", channel=0).pl.show(title="DAPI (nuclear)", figsize=(10, 10))
crop0(sdata).pl.render_images("morphology_focus", channel=1).pl.show(
    title="AlphaSMA/Vimentin (interior - protein)", figsize=(10, 10)
)
crop0(sdata).pl.render_images("morphology_focus", channel=2).pl.show(
    title="ATP1A1/CD45/E-Cadherin (boundary)", figsize=(10, 10)
)
crop0(sdata).pl.render_images("morphology_focus", channel=3).pl.show(title="18S (interior - RNA)", figsize=(10, 10))

../../../../_images/4cb15b35e09fa46c87c9d0a66be79857ccafb80293039224c65d6cc7a4b7a0b5.png

../../../../_images/1bc3064fd75738636570cb8f8d9b174506d343a63fa98253f2b45b3bf913f0a0.png

../../../../_images/cb49cf97c2edcd98532eeb56a8eceee16a29c75061beba33e1edc77e1fc43fda.png

../../../../_images/c91beb2040c553c76b7fd01f6d18e96c8ab4481563885ffc9258a9d7a9a19d00.png

Plotting the gene expression data#

With SpatialData we can also plot gene expression data on top of the images. We can plot the expression of a single gene, or the expression of multiple genes. To showcase different functionalities, we will use the cell_circles Shapes element to overlay gene expression over the image, instead of the raster labels showed above.

Let’s first normalize and select highly variable genes.

import scanpy as sc

sc.pp.normalize_total(sdata.tables["table"])
sc.pp.log1p(sdata.tables["table"])
sc.pp.highly_variable_genes(sdata.tables["table"])
sdata.tables["table"].var.sort_values("means")

	gene_ids	feature_types	genome	highly_variable	means	dispersions	dispersions_norm
UMOD	ENSG00000169344	Gene Expression	Unknown	False	0.000855	0.733388	0.142545
LY6D	ENSG00000167656	Gene Expression	Unknown	False	0.001364	0.557849	-0.404188
SST	ENSG00000157005	Gene Expression	Unknown	False	0.001424	0.184340	-1.567515
TCF15	ENSG00000125878	Gene Expression	Unknown	False	0.001587	0.189523	-1.551373
CCL27	ENSG00000213927	Gene Expression	Unknown	False	0.001663	0.526737	-0.501089
...	...	...	...	...	...	...	...
CAPN8	ENSG00000203697	Gene Expression	Unknown	True	0.859493	1.032884	1.000000
EPCAM	ENSG00000119888	Gene Expression	Unknown	False	0.897788	0.938736	-0.707107
TCIM	ENSG00000176907	Gene Expression	Unknown	True	0.948361	1.136240	0.707107
CYP2B6	ENSG00000197408	Gene Expression	Unknown	True	0.989477	1.524039	1.000000
MALL	ENSG00000144063	Gene Expression	Unknown	True	1.119610	1.181179	1.000000

377 rows × 7 columns

gene_name = "EPCAM"
crop0(sdata).pl.render_images("morphology_focus").pl.render_shapes(
    "cell_circles",
    color=gene_name,
).pl.show(title=f"{gene_name} expression over Morphology image", coordinate_systems="global", figsize=(10, 5))

../../../../_images/32f6651ff2ca1c5aa48a8b5b70e40769f0f557bf125e496fd4ed7834215d9249.png

We can also assign the table to a different region, in order to overlay the gene expression for the specified region. SpatialData stores the information of which region is the table in the following places:

in sdata.tables[<table_name>].uns["spatialdata_attrs]["region"] it stores the regions that the table is annotating.
The name of the region is stored under sdata.tables[<table_name>].obs["region_key"].

We can use a utility function to modify that information, and hence plot the gene expression for a different shapes.

sdata.tables["table"].obs["region"] = "cell_boundaries"
sdata.set_table_annotates_spatialelement("table", region="cell_boundaries")

crop0(sdata).pl.render_images("he_image").pl.render_shapes(
    "cell_boundaries",
    color=gene_name,
).pl.show(title=f"{gene_name} expression over H&E image", coordinate_systems="global", figsize=(10, 5))

../../../../_images/fd5d8d6ca5852f0a02b97c0b3cf0d8409e8f955f8893a59eeec2ec270d7640ce.png

Finally, we can also plot transcript locations, together with the same layers selected as before.

crop0(sdata).pl.render_images("he_image").pl.render_shapes(
    "cell_boundaries",
    color=gene_name,
).pl.render_points(
    "transcripts",
    color="feature_name",
    groups=gene_name,
    palette="orange",
).pl.show(title=f"{gene_name} expression over H&E image", coordinate_systems="global", figsize=(10, 5))

../../../../_images/b78646ddd37fa6ffeb0ca3ebf68a612a61e424dcb8473b44cba5aed29a012823.png

Technology focus: Xenium

Contents

Technology focus: Xenium#

Loading the data#

Plotting the images#

Exploring the multimodal segmentation#

Plotting the gene expression data#