snapatac2.tl.spectral

snapatac2.tl.spectral(adata, n_comps=50, features='selected', random_state=0, sample_size=None, chunk_size=20000, distance_metric='jaccard', feature_weights='idf', inplace=True)

Compute Laplacian Eigenmaps of chromatin accessibility profiles.

Convert the chromatin accessibility profiles of cells into lower-dimensional representations using the spectrum of the normalized graph Laplacian defined by the pairwise similarity between cells.
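For reference, one standard form of the normalized graph Laplacian is sketched below. The symbols $W$ (the $N \times N$ matrix of pairwise similarities between cells), $D$ (its diagonal degree matrix), and $L_{\mathrm{sym}}$ are notation introduced here for illustration; this page does not state which normalization the implementation uses, so treat this as background rather than a description of the library's internals.

$$
D_{ii} = \sum_{j} W_{ij}, \qquad L_{\mathrm{sym}} = I - D^{-1/2} \, W \, D^{-1/2}
$$

In Laplacian Eigenmaps, the embedding coordinates are taken from the eigenvectors of $L_{\mathrm{sym}}$ with the smallest nontrivial eigenvalues, which are the same vectors as the leading eigenvectors of $D^{-1/2} W D^{-1/2}$.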

Note

The space complexity of this function is $O(N^2)$, where $N$ is the smaller of the total number of cells and sample_size. The memory usage in bytes is $N^2 \times 8 \times 2$. For example, by this formula $N = 10{,}000$ requires $1.6 \times 10^9$ bytes, roughly 1.5 GiB. When sample_size is set, the Nyström algorithm is used to approximate the embedding. For large datasets, set sample_size to reduce memory usage.
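To budget memory before running the function, the formula in the note can be evaluated directly. The helper below is a minimal sketch, not part of snapatac2: the name `estimate_spectral_memory_bytes` is hypothetical, and rounding a fractional sample_size to the nearest integer is an assumption.

```python
def estimate_spectral_memory_bytes(n_cells, sample_size=None):
    """Estimate memory from the documented formula N^2 * 8 * 2 bytes.

    N is the smaller of the total cell count and the sample size.
    A float sample_size is treated as a fraction of the cells; rounding
    it to the nearest integer is an assumption made for this sketch.
    """
    if sample_size is None:
        n = n_cells
    elif isinstance(sample_size, float):
        n = round(n_cells * sample_size)
    else:
        n = sample_size
    n = min(n_cells, n)
    return n * n * 8 * 2


# Compare a full run with two subsampled runs on a larger dataset.
for n_cells, sample_size in [(10_000, None), (200_000, 10_000), (200_000, 0.1)]:
    gib = estimate_spectral_memory_bytes(n_cells, sample_size) / 1024**3
    print(f"cells={n_cells:>7} sample_size={sample_size!s:>6} -> {gib:.2f} GiB")
```

Because the cost grows quadratically in $N$, halving sample_size cuts the estimate to a quarter.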

Parameters
  • adata (AnnData | AnnDataSet) – AnnData or AnnDataSet object.

  • n_comps (int) – Number of dimensions to keep.

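  • features (UnionType[str, ndarray, None]) – Features used to compute the embedding. An array is interpreted as a Boolean index mask: True means that the feature is kept, False means that it is removed. A string (the default is 'selected') names a column in adata.var that holds such a mask. None uses all features.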

  • random_state (int) – Seed for the random number generator.

  • sample_size (Union[int, float, None]) – Sample size used in the Nyström method. Either an integer giving the number of cells to sample or a float between 0 and 1 giving the fraction of cells to sample.

  • chunk_size (int) – Chunk size used in the Nyström method.

  • distance_metric (str) – Distance metric, either “jaccard” or “cosine”.

  • feature_weights (UnionType[str, ndarray, None]) – How to weight features. The default, “idf”, applies inverse document frequency (IDF) weighting.

  • inplace (bool) – Whether to store the result in adata instead of returning it.
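To make the features mask and the “jaccard” metric concrete, here is a toy pure-Python sketch. It is not the library's implementation: the function names and data are hypothetical, each cell is represented as a set of accessible feature indices, and feature_weights is ignored.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of two sets of accessible feature indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_jaccard(cells, feature_mask=None):
    """Build the cell-by-cell similarity matrix.

    feature_mask, if given, is a list of booleans; features marked
    False are dropped before similarities are computed.
    """
    if feature_mask is not None:
        kept = {i for i, keep in enumerate(feature_mask) if keep}
        cells = [cell & kept for cell in cells]
    return [[jaccard_similarity(a, b) for b in cells] for a in cells]


# Toy data: three cells over five features (hypothetical values).
cells = [{0, 1, 2}, {1, 2, 3}, {4}]
mask = [True, True, True, True, False]  # drop feature 4
similarity = pairwise_jaccard(cells, mask)
```

The resulting matrix has one row and one column per cell, which is why memory grows quadratically with the number of cells.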

Returns

If inplace=True, the spectral embedding is stored in adata.obsm["X_spectral"] and the eigenvalues in adata.uns["spectral_eigenvalue"], and nothing is returned. Otherwise, the result is returned as NumPy arrays.

Return type

tuple[np.ndarray, np.ndarray] | None
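A typical in-place call might look like the sketch below. The wrapper `embed_cells` is a hypothetical helper, not part of snapatac2; it uses only the parameters and storage keys documented above, and it assumes adata has already been prepared so that the default features='selected' can be resolved.

```python
def embed_cells(adata, n_comps=30, sample_size=None):
    """Run the spectral embedding in place and return the stored results.

    Hypothetical convenience wrapper. Assumes features were selected
    beforehand so the default features='selected' is valid.
    """
    # Imported lazily so this helper can be defined without the package installed.
    import snapatac2 as snap

    snap.tl.spectral(adata, n_comps=n_comps, sample_size=sample_size, inplace=True)

    # With inplace=True the results are read back from the documented keys.
    return adata.obsm["X_spectral"], adata.uns["spectral_eigenvalue"]


# Example (requires snapatac2 and a prepared AnnData object):
# embedding, eigenvalues = embed_cells(adata, n_comps=30, sample_size=0.1)
```

Passing a fractional sample_size such as 0.1 switches to the Nyström approximation, which keeps memory bounded on large datasets.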