snapatac2.tl.hdbscan
- snapatac2.tl.hdbscan(adata, min_cluster_size=5, min_samples=None, cluster_selection_epsilon=0.0, alpha=1.0, cluster_selection_method='eom', random_state=0, use_rep='X_spectral', key_added='hdbscan', **kwargs)
Cluster cells into subgroups using the HDBSCAN algorithm.
- Parameters:
  - adata (AnnData) – The annotated data matrix.
  - min_cluster_size (int) – The minimum size of clusters; single linkage splits that contain fewer points than this will be considered points “falling out” of a cluster rather than a cluster splitting into two new clusters.
  - min_samples (Optional[int]) – The number of samples in a neighbourhood for a point to be considered a core point.
  - cluster_selection_epsilon (float) – A distance threshold. Clusters below this value will be merged.
  - alpha (float) – A distance scaling parameter as used in robust single linkage.
  - cluster_selection_method (str) – The method used to select clusters from the condensed tree. The standard approach for HDBSCAN* is to use an Excess of Mass algorithm to find the most persistent clusters. Alternatively, you can select the clusters at the leaves of the tree, which provides the most fine-grained and homogeneous clusters. Options are: “eom” or “leaf”.
  - random_state (int) – Change the initialization of the optimization.
  - use_rep (str) – Which data in adata.obsm to use for clustering. Default is “X_spectral”.
  - key_added (str) – adata.obs key under which to add the cluster labels.
- Return type:
- Returns:
  Adds fields to adata:
  - adata.obs[key_added] – Array of dim (number of samples) that stores the subgroup id ('0', '1', …) for each cell.
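
Example: a minimal usage sketch, not taken from the library's own documentation. The wrapper name `cluster_with_hdbscan` is hypothetical, and the sketch assumes that `adata.obsm["X_spectral"]` has already been computed and that `snapatac2` and its HDBSCAN dependency are installed. Only parameters from the signature above are passed.

```python
from collections import Counter


def cluster_with_hdbscan(adata, min_cluster_size=10, key_added="hdbscan"):
    """Hypothetical helper: run snapatac2.tl.hdbscan and tally cluster sizes."""
    # Imported inside the function so this module can be loaded
    # even on machines where snapatac2 is not installed.
    import snapatac2 as snap

    # Assumption: adata.obsm["X_spectral"] already holds an embedding.
    snap.tl.hdbscan(
        adata,
        min_cluster_size=min_cluster_size,
        cluster_selection_method="eom",  # or "leaf" for finer-grained clusters
        use_rep="X_spectral",
        key_added=key_added,
    )

    # The labels are written to adata.obs[key_added] in place,
    # so they are read back from adata rather than from a return value.
    return Counter(adata.obs[key_added])
```

Calling `cluster_with_hdbscan(adata)` returns a mapping from subgroup id to the number of cells carrying that id. Raising `min_cluster_size` generally yields fewer, larger clusters.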