spatialdata.aggregate
- spatialdata.aggregate(values, by, values_sdata=None, by_sdata=None, value_key=None, agg_func='sum', target_coordinate_system='global', fractions=False, region_key='region', instance_key='instance_id', deepcopy=True, **kwargs)
Aggregate values by given region.
- Parameters:
  - values_sdata (Optional[SpatialData] (default: None)) – SpatialData object containing the values to aggregate: if None, values must be a SpatialElement; if not None, values must be a string.
  - values (DataFrame | GeoDataFrame | SpatialImage | MultiscaleSpatialImage | str) – The values to aggregate: if values_sdata is None, must be a SpatialElement; otherwise must be a string specifying the name of the SpatialElement in values_sdata.
  - by_sdata (Optional[SpatialData] (default: None)) – Regions to aggregate by: if None, by must be a SpatialElement; if not None, by must be a string.
  - by (GeoDataFrame | SpatialImage | MultiscaleSpatialImage | str) – The regions to aggregate by: if by_sdata is None, must be a SpatialElement; otherwise must be a string specifying the name of the SpatialElement in by_sdata.
  - value_key (Union[list[str], str, None] (default: None)) – Name (or list of names) of the columns containing the values to aggregate; can refer to either numerical or categorical values. If the values are categorical, value_key can't be a list. The key can be:
    - the name of one or more columns in the dataframe (Dask DataFrame for points or GeoDataFrame for shapes);
    - the name of one or more obs columns in the associated AnnData table (for shapes and labels);
    - the name of one or more var entries, referring to columns of the X matrix in the table (for shapes and labels).
    If nothing is passed here, it defaults to the equivalent of a column of ones. Defaults to FEATURE_KEY for points (if present).
  - agg_func (str | list[str] (default: 'sum')) – Aggregation function to apply over point values, e.g. "mean", "sum", "count". Passed to pandas.DataFrame.groupby.agg() or to xrspatial.zonal_stats() according to the type of values.
  - target_coordinate_system (str (default: 'global')) – Coordinate system to transform to before aggregating.
  - fractions (bool (default: False)) – Adjusts for partial area overlap between regions in values and by. More precisely: when a region in by partially overlaps a region in values, this setting specifies whether the value to aggregate is taken as is (fractions = False) or is multiplied by the ratio "area of the intersection between the two regions" / "area of the region in values" (fractions = True). Additional details:
    - the default is fractions = False;
    - when aggregating points this parameter must be left at False, as points have no area (otherwise a table of zeros would be obtained);
    - for categorical values, "count" and "sum" are equivalent when fractions = False; when fractions = True they differ: "count" would give meaningless results and is therefore not allowed, while "sum" actually sums the values of the intersecting regions and should be used instead;
    - aggregating categorical values with agg_func = "mean" is not allowed, as it gives meaningless results.
  - region_key (str (default: 'region')) – Name that will be given to the new region column in the returned aggregated table.
  - instance_key (str (default: 'instance_id')) – Name that will be given to the new instance id column in the returned aggregated table.
  - deepcopy (bool (default: True)) – Whether to deepcopy the shapes in the returned SpatialData object. If the shapes are large (e.g. large multiscale labels), you may consider disabling the deepcopy to use a lazy Dask representation.
  - kwargs (Any) – Additional keyword arguments to pass to xrspatial.zonal_stats().
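The effect of the fractions parameter can be illustrated with a small sketch in plain Python. It does not call spatialdata; it only reproduces the ratio described for fractions ("area of the intersection" / "area of the region in values") on axis-aligned rectangles. The Rect class and the intersection_area and aggregate_sum helpers are hypothetical names introduced for this illustration and are not part of the library.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon (hypothetical helper)."""

    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


def intersection_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles; 0.0 if they do not overlap."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(width, 0.0) * max(height, 0.0)


def aggregate_sum(values, by, fractions=False):
    """Sum the values of all regions intersecting each region in `by`.

    values: list of (Rect, number) pairs; by: list of Rect.
    With fractions=True each value is scaled by
    intersection area / area of the region in `values`.
    """
    totals = []
    for region in by:
        total = 0.0
        for rect, value in values:
            overlap = intersection_area(rect, region)
            if overlap <= 0:
                continue  # no intersection, the value does not contribute
            weight = overlap / rect.area if fractions else 1.0
            total += weight * value
        totals.append(total)
    return totals


# Two value regions, each half covered by the single `by` region.
values = [(Rect(0, 0, 2, 2), 10.0), (Rect(3, 0, 5, 2), 4.0)]
by = [Rect(1, 0, 4, 2)]

unweighted = aggregate_sum(values, by, fractions=False)  # values taken as they are
weighted = aggregate_sum(values, by, fractions=True)     # values scaled by overlap ratio
```

In this toy setup each value region has half of its area inside the by region, so the weighted sum is half of the unweighted one.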
- Return type: SpatialData
- Returns:
  A SpatialData object with the by shapes as SpatialElement and a table with the aggregated values annotating the shapes. If value_key refers to a categorical variable, the table in the SpatialData object has shape (by.shape[0], <n categories>).
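To make the categorical table shape concrete, here is a minimal sketch in plain Python, assuming point-like values counted per region. It does not use spatialdata; the points, regions, and contains names are hypothetical and only mimic how one row per region in by and one column per category could arise.

```python
from collections import Counter

# Hypothetical points: (x, y, category), e.g. a gene name per transcript.
points = [
    (0.5, 0.5, "geneA"),
    (0.8, 0.2, "geneB"),
    (0.1, 0.9, "geneA"),
    (1.5, 0.5, "geneB"),
    (2.5, 0.5, "geneA"),  # falls outside every region and is ignored
]

# Hypothetical regions to aggregate by: name -> (xmin, ymin, xmax, ymax).
regions = {"left": (0, 0, 1, 1), "right": (1, 0, 2, 1)}

# One column per category, in a fixed order.
categories = sorted({category for _, _, category in points})


def contains(box, x, y):
    """Return True if (x, y) lies inside the half-open box."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x < xmax and ymin <= y < ymax


# One row per region, one column per category.
table = []
for name, box in regions.items():
    counts = Counter(c for x, y, c in points if contains(box, x, y))
    table.append([counts.get(c, 0) for c in categories])

n_rows, n_cols = len(table), len(categories)
```

The resulting table has as many rows as there are regions and as many columns as there are categories, matching the (by.shape[0], <n categories>) shape stated above.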
Notes
This function returns a SpatialData object, so to access the aggregated table you can use the table attribute.
The shapes in the returned SpatialData object are a reference to the original ones. If you want them to be a different object, you can do a deepcopy manually (this loads the data into memory), or you can save the SpatialData object to disk and reload it (this keeps the data lazily represented).
When aggregating points by shapes, the current implementation loads all the points into memory and thus could lead to large memory usage. GitHub issue scverse/spatialdata#210 tracks the changes required to address this behavior.