snapatac2.tl.dbscan

snapatac2.tl.dbscan(adata, eps=0.5, min_samples=5, leaf_size=30, n_jobs=None, use_rep='X_spectral', key_added='dbscan')

Cluster cells into subgroups using the DBSCAN algorithm.

Parameters:
  • adata (AnnData) – The annotated data matrix.

  • eps (float) – The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function.

  • min_samples (int) – The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself.

  • leaf_size (int) – Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

  • n_jobs (Optional[int]) – The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • use_rep (str) – Key in adata.obsm that holds the representation to cluster on. Default is 'X_spectral'.

  • key_added (str) – adata.obs key under which to add the cluster labels.

Return type:

None

Returns:

  • Nothing is returned; adata is modified in place and gains the following field:

  • adata.obs[key_added] – Array with one entry per cell that stores the subgroup id ('0', '1', …) for each cell.
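
To make the roles of eps and min_samples concrete, here is a minimal pure-Python sketch of the DBSCAN idea. It is an illustration only, not the code this function runs: the helper name dbscan_sketch, the toy coordinates, the Euclidean metric, and the use of -1 for noise points are all assumptions made for the example. It shows that a point counts itself toward min_samples, and that eps limits the distance between neighbors, not the overall extent of a cluster.

```python
import math

NOISE = -1  # assumption: noise marker chosen for this sketch only


def dbscan_sketch(points, eps=0.5, min_samples=5):
    """Toy DBSCAN over a list of coordinate tuples (Euclidean distance)."""
    n = len(points)
    # A neighborhood includes the point itself, since its distance is 0 <= eps.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]

    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if labels[i] != NOISE or not is_core[i]:
            continue
        # Grow a new cluster outward from this unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not extend it
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


# Two tight groups plus one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0),
]
print(dbscan_sketch(points, eps=0.5, min_samples=3))
# [0, 0, 0, 1, 1, 1, -1]

# A chain with 0.4 spacing forms one cluster even though its ends are 2.0 apart:
# eps bounds neighbor distance, not the distances of points within a cluster.
chain = [(0.4 * k, 0.0) for k in range(6)]
print(dbscan_sketch(chain, eps=0.5, min_samples=2))
# [0, 0, 0, 0, 0, 0]
```

With an AnnData object, the equivalent call would be snapatac2.tl.dbscan(adata, eps=0.5, min_samples=5), after which the labels are read from adata.obs['dbscan'] (or from whatever key_added was set to). Because a good eps depends on the scale of the representation in adata.obsm[use_rep], it is usually worth trying several values and checking how many subgroups result.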