snapatac2.tl.hdbscan

snapatac2.tl.hdbscan(adata, min_cluster_size=5, min_samples=None, cluster_selection_epsilon=0.0, alpha=1.0, cluster_selection_method='eom', random_state=0, use_rep='X_spectral', key_added='hdbscan', **kwargs)

Cluster cells into subgroups using the HDBSCAN algorithm.

Parameters:
  • adata (AnnData) – The annotated data matrix.

  • min_cluster_size (int) – The minimum size of clusters; single linkage splits that contain fewer points than this will be considered points “falling out” of a cluster rather than a cluster splitting into two new clusters.

  • min_samples (int | None) – The number of samples in a neighbourhood for a point to be considered a core point.

  • cluster_selection_epsilon (float) – A distance threshold. Clusters below this value will be merged.

  • alpha (float) – A distance scaling parameter as used in robust single linkage.

  • cluster_selection_method (str) – The method used to select clusters from the condensed tree. The standard approach for HDBSCAN* is to use an Excess of Mass algorithm to find the most persistent clusters. Alternatively, select the clusters at the leaves of the tree, which gives the most fine-grained and homogeneous clusters. Options are: “eom” or “leaf”.

  • random_state (int) – Random seed. Changing it changes the initialization of the optimization.

  • use_rep (str) – Which data in adata.obsm to use for clustering. Default is “X_spectral”.

  • key_added (str) – adata.obs key under which to add the cluster labels.
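To make min_samples and alpha more concrete, the following is a minimal pure-Python sketch of the mutual reachability distance that HDBSCAN-style clustering is built on. It is an illustration under stated assumptions, not SnapATAC2's implementation: the helper names core_distances and mutual_reachability are hypothetical, the core distance is taken as the distance to the min_samples-th nearest neighbour excluding the point itself (implementations differ on whether the point counts), and alpha is assumed to divide the raw distance before the comparison.

```python
import math


def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest neighbour.

    Assumption: the point itself is not counted as a neighbour.
    """
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        # dists[0] is the zero distance from p to itself, so index
        # min_samples is the min_samples-th other point.
        cores.append(dists[min_samples])
    return cores


def mutual_reachability(points, min_samples, alpha=1.0):
    """Pairwise max(core(a), core(b), d(a, b) / alpha); zero on the diagonal."""
    cores = core_distances(points, min_samples)
    n = len(points)
    return [
        [
            0.0
            if i == j
            else max(cores[i], cores[j], math.dist(points[i], points[j]) / alpha)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy data: three points close together and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
cores = core_distances(points, min_samples=2)
mr = mutual_reachability(points, min_samples=2)
mr_alpha2 = mutual_reachability(points, min_samples=2, alpha=2.0)

print(cores)      # core distance per point
print(mr[0][1])   # two dense points: dominated by the larger core distance
print(mr[0][3])   # dense point vs. isolated point
```

In this sketch a larger min_samples inflates core distances, which pushes sparse points further from everything else and makes them more likely to end up outside any cluster.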

Return type:

None

Returns:

  • Nothing is returned. The function adds the following field to adata:

  • adata.obs[key_added] – Array with one entry per cell that stores the subgroup ID ('0', '1', …) of that cell.
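Once the labels are stored, a common next step is to tabulate cluster sizes. The sketch below uses a hypothetical list of string labels standing in for list(adata.obs["hdbscan"]), so that it runs without SnapATAC2 installed. It also assumes that cells not assigned to any cluster carry the label '-1', which is the convention of the hdbscan Python library; this page does not state how such cells are labelled, so check your own output before relying on it.

```python
from collections import Counter

# Hypothetical labels, standing in for list(adata.obs["hdbscan"]).
labels = ["0", "0", "1", "-1", "1", "0", "2", "-1"]

# Assumption: unassigned (noise) cells are labelled "-1".
NOISE_LABEL = "-1"


def cluster_sizes(labels, noise_label=NOISE_LABEL):
    """Return ({cluster id: number of cells}, number of noise cells)."""
    counts = Counter(labels)
    n_noise = counts.pop(noise_label, 0)
    # Sort by numeric cluster id so that "10" comes after "2".
    ordered = dict(sorted(counts.items(), key=lambda kv: int(kv[0])))
    return ordered, n_noise


sizes, n_noise = cluster_sizes(labels)
print(sizes)
print(n_noise)
```

Because the IDs are strings, sorting them numerically rather than lexicographically keeps the table in the expected order when there are ten or more clusters.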