snapatac2.tl.kmeans

snapatac2.tl.kmeans(adata, n_clusters, n_iterations=-1, random_state=0, use_rep='X_spectral', key_added='kmeans', inplace=True)

Cluster cells into subgroups using the K-means algorithm, a classical algorithm in data mining.

Parameters:
  • adata (AnnData | AnnDataSet | ndarray) – The annotated data matrix.

  • n_clusters (int) – Number of clusters to return.

  • n_iterations (int) – Number of iterations of the K-means algorithm to perform. Positive values above 2 set the total number of iterations; -1 runs the algorithm until it converges.

  • random_state (int) – Seed for the random initialization. Change it to start the optimization from a different initialization.

  • use_rep (str) – Key in adata.obsm of the representation to use for clustering. Default is 'X_spectral'.

  • key_added (str) – adata.obs key under which to add the cluster labels.

  • inplace (bool) – Whether to store the result in adata (True) or return it (False).
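To make the roles of n_clusters, n_iterations, and random_state concrete, here is a minimal pure-Python sketch of Lloyd's K-means iteration. It is an illustration only, not SnapATAC2's implementation: the function name kmeans_sketch, the choice of initial centers by sampling data points, and the stopping rule (assignments no longer change) are all assumptions made for this example.

```python
import random


def kmeans_sketch(points, n_clusters, n_iterations=-1, random_state=0):
    """Toy Lloyd's algorithm (hypothetical helper, not SnapATAC2's code)."""
    if n_iterations == 0:
        raise ValueError("n_iterations must be positive, or negative to run until convergence")
    rng = random.Random(random_state)         # the seed fixes the initialization
    centers = rng.sample(points, n_clusters)  # assumption: initial centers are data points
    labels = None
    iteration = 0
    # A negative n_iterations means "iterate until the assignments stop changing".
    while n_iterations < 0 or iteration < n_iterations:
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance; ties go to the lower cluster index).
        new_labels = [
            min(
                range(n_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])),
            )
            for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
        iteration += 1
    # Labels are returned as strings, mirroring the '0', '1', ... ids described on this page.
    return [str(lab) for lab in labels]


# Toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
labels = kmeans_sketch(points, n_clusters=2)
```

With the same random_state the sketch returns the same labels on every run; changing the seed may permute the cluster ids or, on harder data, lead to a different local solution.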

Return type:

ndarray | None

Returns:

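  • If inplace=True, returns None and adds the following fields to adata; otherwise returns the cluster labels as an ndarray.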

  • adata.obs[key_added] – Array with one entry per cell that stores the cluster id ('0', '1', …) of that cell.

  • adata.uns['kmeans']['params'] – A dict with the values for the parameters n_clusters, random_state, and n_iterations.
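A typical call might look like the sketch below. It assumes SnapATAC2 is installed and that adata.obsm['X_spectral'] has already been computed (for example with snapatac2.tl.spectral). The helper names cluster_cells and cluster_sizes are hypothetical and exist only for this example.

```python
from collections import Counter


def cluster_cells(adata, n_clusters, key_added="kmeans"):
    """Run K-means on the spectral embedding and return the per-cell labels."""
    import snapatac2 as snap  # imported lazily; requires SnapATAC2 to be installed

    # Assumption: adata.obsm["X_spectral"] already exists.
    snap.tl.kmeans(
        adata,
        n_clusters=n_clusters,
        use_rep="X_spectral",
        key_added=key_added,
    )
    # With inplace=True (the default) the labels are stored in adata.obs[key_added].
    return list(adata.obs[key_added])


def cluster_sizes(labels):
    """Count cells per cluster, ordered by numeric cluster id."""
    counts = Counter(labels)
    return {label: counts[label] for label in sorted(counts, key=int)}


# Illustrative labels only (not real output); ids are strings such as '0', '1', ...
sizes = cluster_sizes(["1", "0", "1", "10", "0", "1"])
```

Because the cluster ids are stored as strings, the sketch sorts them with key=int so that '10' comes after '2' rather than after '1'.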