spatialdata.aggregate

spatialdata.aggregate(values, by, values_sdata=None, by_sdata=None, value_key=None, agg_func='sum', target_coordinate_system='global', fractions=False, region_key='region', instance_key='instance_id', deepcopy=True, table_name=None, buffer_resolution=16, **kwargs)

Aggregate values by given region.

Parameters:
  • values_sdata (Optional[SpatialData] (default: None)) – SpatialData object containing the values to aggregate: if None, values must be a SpatialElement; if not None, values must be a string.

  • values (DataFrame | GeoDataFrame | SpatialImage | MultiscaleSpatialImage | str) – The values to aggregate: if values_sdata is None, must be a SpatialElement, otherwise must be a string specifying the name of the SpatialElement in values_sdata

  • by_sdata (Optional[SpatialData] (default: None)) – SpatialData object containing the regions to aggregate by: if None, by must be a SpatialElement; if not None, by must be a string.

  • by (GeoDataFrame | SpatialImage | MultiscaleSpatialImage | str) – The regions to aggregate by: if by_sdata is None, must be a SpatialElement, otherwise must be a string specifying the name of the SpatialElement in by_sdata

  • value_key (Union[str, list[str], None] (default: None)) –

    Name (or list of names) of the columns containing the values to aggregate; can refer to either numerical or categorical values. If the values are categorical, value_key can’t be a list.

    The key can be:

    • the name of column(s) in the dataframe (Dask DataFrame for points or GeoDataFrame for shapes);

    • the name of obs column(s) in the associated AnnData table (for shapes and labels);

    • the name of var(s), referring to the column(s) of the X matrix in the table (for shapes and labels).

    If nothing is passed here, it defaults to the equivalent of a column of ones. Defaults to FEATURE_KEY for points (if present).

  • agg_func (str | list[str] (default: 'sum')) – Aggregation function to apply over point values, e.g. "mean", "sum", "count". Passed to pandas.DataFrame.groupby.agg() or to xrspatial.zonal_stats() according to the type of values.

  • target_coordinate_system (str (default: 'global')) – Coordinate system to transform to before aggregating.

  • fractions (bool (default: False)) –

    Adjusts for partial area overlap between regions in values and by. More precisely: when a region in by partially overlaps a region in values, this setting specifies whether the value to aggregate is used as it is (fractions = False) or is multiplied by the following ratio: “area of the intersection between the two regions” / “area of the region in values”.

    Additional details:

    • default is fractions = False.

    • when aggregating points this parameter must be left to False, as points don’t have area (otherwise a table of zeros would be obtained);

    • for categorical values "count" and "sum" are equivalent when fractions = False, but when fractions = True they differ: "count" would not give meaningful results and so is not allowed, while "sum" actually sums the values of the intersecting regions and should therefore be used;

    • aggregating categorical values with agg_func = "mean" is not allowed, as it does not give meaningful results.

  • region_key (str (default: 'region')) – Name that will be given to the new region column in the returned aggregated table.

  • instance_key (str (default: 'instance_id')) – Name that will be given to the new instance id column in the returned aggregated table.

  • deepcopy (bool (default: True)) – Whether to deepcopy the shapes in the returned SpatialData object. If the shapes are large (e.g. large multiscale labels), you may consider disabling the deepcopy to use a lazy Dask representation.

  • table_name (Optional[str] (default: None)) – Name of the table that optionally contains the value_key; also used as the name of the table in the returned SpatialData object.

  • buffer_resolution (int (default: 16)) – Resolution parameter to pass to the .buffer() method to convert circles to polygons. A higher value results in a more accurate representation of the circle, but also in a more complex polygon and more computation.

  • kwargs (Any) – Additional keyword arguments to pass to xrspatial.zonal_stats().
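The ratio described under fractions can be worked through with a small, library-free sketch. The `Box` class and the `contribution` function below are hypothetical helpers introduced only for illustration (they are not part of spatialdata), and axis-aligned rectangles stand in for arbitrary shapes. As a simplification, regions that merely touch along a boundary are treated as not overlapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for a shape (hypothetical helper)."""

    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        # Clamp at zero so that an "inverted" box (no overlap) has area 0.
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)

    def intersection(self, other: "Box") -> "Box":
        return Box(
            max(self.xmin, other.xmin),
            max(self.ymin, other.ymin),
            min(self.xmax, other.xmax),
            min(self.ymax, other.ymax),
        )


def contribution(value: float, value_region: Box, by_region: Box, fractions: bool) -> float:
    """Amount that one region in `values` contributes to one region in `by`."""
    overlap = value_region.intersection(by_region).area
    if overlap == 0.0:
        return 0.0  # the two regions do not overlap (simplification, see above)
    if not fractions:
        return value  # value taken as it is
    # value scaled by: area of the intersection / area of the region in `values`
    return value * overlap / value_region.area


cell = Box(0, 0, 2, 2)    # region in `values`, area 4, carrying the value 10
region = Box(1, 0, 3, 2)  # region in `by`, covering half of `cell`

whole = contribution(10.0, cell, region, fractions=False)
scaled = contribution(10.0, cell, region, fractions=True)
print(whole, scaled)
```

Here half of `cell` lies inside `region`, so with fractions = True only half of the value is attributed to it, while with fractions = False the full value is.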

Return type:

SpatialData

Returns:

Returns a SpatialData object with the by shapes as SpatialElement and a table with the aggregated values annotating the shapes.

If value_key refers to a categorical variable, the table in the SpatialData object has shape (by.shape[0], <n categories>).
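To make the categorical case concrete, the following library-free sketch counts toy points per region and per category, giving one row per region and one column per category. The data, the rectangular regions and the `contains` helper are all hypothetical; this only mimics the shape of the result and is not how spatialdata computes it.

```python
from collections import Counter

# Hypothetical toy data: (x, y, category) for each point.
points = [
    (0.5, 0.5, "a"),
    (1.5, 0.5, "b"),
    (0.2, 0.8, "a"),
    (2.5, 2.5, "b"),
    (2.1, 2.9, "b"),
]

# Hypothetical regions as (xmin, ymin, xmax, ymax) rectangles.
regions = {"r0": (0, 0, 2, 2), "r1": (2, 2, 3, 3)}

categories = sorted({category for _, _, category in points})


def contains(box, x, y):
    """Return True if the point (x, y) lies inside the rectangle `box`."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x < xmax and ymin <= y < ymax


# One row per region, one column per category.
table = []
for box in regions.values():
    counts = Counter(cat for x, y, cat in points if contains(box, x, y))
    table.append([counts[cat] for cat in categories])

shape = (len(table), len(categories))
print(categories, table, shape)
```

With two regions and two categories the resulting table has shape (2, 2), matching (number of regions, number of categories).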

Notes

This function returns a SpatialData object, so to access the aggregated table you can use the table attribute.

The shapes in the returned SpatialData object are a reference to the original ones. If you want them to be a different object you can do a deepcopy manually (this loads the data into memory), or you can save the SpatialData object to disk and reload it (this keeps the data lazily represented).

When aggregating points by shapes, the current implementation loads all the points into memory and thus could lead to large memory usage. The GitHub issue scverse/spatialdata#210 keeps track of the changes required to address this behavior.
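A minimal sketch of the calling pattern in which both elements are passed by name, using only parameters from the signature above. The wrapper `aggregate_by_name` is a hypothetical helper, not part of spatialdata, and the element and column names in the usage comment are placeholders that depend on your data. The import sits inside the function so that the definition can be read and loaded without the package installed.

```python
def aggregate_by_name(sdata, values_name, by_name, value_key=None, agg_func="sum"):
    """Aggregate the element `values_name` by the regions in `by_name`.

    Both elements are looked up by name in the same SpatialData object, so
    `values` and `by` are strings and `values_sdata` / `by_sdata` are set.
    """
    # Lazy import: spatialdata is only needed when the function is called.
    from spatialdata import aggregate

    return aggregate(
        values=values_name,
        by=by_name,
        values_sdata=sdata,
        by_sdata=sdata,
        value_key=value_key,
        agg_func=agg_func,
        fractions=False,  # must stay False when the values are points
    )


# Usage, assuming `sdata` holds a points element and a shapes element
# (the names below are placeholders):
# result = aggregate_by_name(sdata, "my_points", "my_shapes",
#                            value_key="genes", agg_func="count")
```

The returned object is a SpatialData object, so the aggregated table is accessed from it as described in the notes above.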