snapatac2.tl.spectral

snapatac2.tl.spectral(adata, n_comps=30, features='selected', random_state=0, sample_size=None, sample_method='random', chunk_size=20000, distance_metric='cosine', weighted_by_sd=True, feature_weights=None, inplace=True)

Perform dimension reduction using Laplacian Eigenmaps.

Convert the cell-by-feature count matrix into a lower-dimensional representation using the spectrum of the normalized graph Laplacian defined by pairwise similarity between cells. When distance_metric is “cosine”, this function uses a matrix-free spectral embedding algorithm whose cost scales linearly with the number of cells. For other similarity metrics, time and space complexity scale quadratically with the number of cells.
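To make the idea concrete, here is a minimal dense sketch of Laplacian Eigenmaps with cosine similarity in NumPy. It is an illustration only, not the library's matrix-free implementation: the function name spectral_embedding_dense and the toy matrix are hypothetical, feature weighting is omitted, and it assumes every cell has a positive degree. It builds the full cell-by-cell similarity matrix, so it scales quadratically rather than linearly.

```python
import numpy as np

def spectral_embedding_dense(X, n_comps=2):
    """Toy dense Laplacian Eigenmaps with cosine similarity (illustration only)."""
    X = np.asarray(X, dtype=float)
    # L2-normalise rows so that the dot product equals cosine similarity.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, 0.0)              # ignore self-similarity
    d = S.sum(axis=1)                     # degree of each cell
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Normalised similarity D^{-1/2} S D^{-1/2}. Its largest eigenvalues
    # correspond to the smallest eigenvalues of the normalised Laplacian
    # I - D^{-1/2} S D^{-1/2}.
    A = S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(A)      # eigh returns ascending order
    order = np.argsort(evals)[::-1]       # sort descending
    evals, evecs = evals[order], evecs[:, order]
    # Drop the trivial leading eigenvector (eigenvalue 1).
    return evals[1:n_comps + 1], evecs[:, 1:n_comps + 1]

# Hypothetical toy data: 6 cells x 6 features, two groups of three cells.
X = np.array([
    [5, 4, 6, 0, 1, 0],
    [6, 5, 4, 1, 0, 0],
    [4, 6, 5, 0, 0, 1],
    [0, 1, 0, 5, 6, 4],
    [1, 0, 0, 4, 5, 6],
    [0, 0, 1, 6, 4, 5],
])
evals, emb = spectral_embedding_dense(X, n_comps=1)
print(emb[:, 0])
```

In this toy example the first three cells receive one sign in the leading component and the last three receive the opposite sign, so a single component already separates the two groups.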

Note

Choosing an appropriate number of components matters for downstream analyses such as clustering, because uninformative or irrelevant components can degrade the results. By default, this function weights every eigenvector by the square root of its eigenvalue instead of applying a hard cutoff. This generally gives satisfactory results without manually specifying the number of components. For some datasets, deviating from the default may work better. In that case, disable the weighting by setting weighted_by_sd=False and then manually choose the number of components to use.
This function may not always return the exact number of eigenvectors requested. The embedding is computed from the eigendecomposition of the normalized graph Laplacian, whose eigenvalues should all be non-negative. However, the eigensolver used, scipy.sparse.linalg.eigsh, can be inaccurate for small eigenvalues and occasionally returns negative eigenvalues at the lower end of the spectrum. When weighted_by_sd=True (the default), a post-processing step removes these erroneous eigenvalues. This typically has minimal impact, because the affected eigenvalues are very small.
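The weighting and clean-up described in this note can be sketched as follows. This is a hedged illustration of the idea, not the library's code: the helper name weight_by_sqrt_eigenvalue is hypothetical, and the exact filtering rule used internally may differ from the simple "keep strictly positive eigenvalues" rule assumed here.

```python
import numpy as np

def weight_by_sqrt_eigenvalue(evals, evecs):
    """Drop non-positive eigenvalues, then scale each eigenvector by sqrt(eigenvalue).

    Hypothetical helper; the positivity filter is an assumption.
    """
    evals = np.asarray(evals, dtype=float)
    evecs = np.asarray(evecs, dtype=float)
    keep = evals > 0                       # discard numerically negative eigenvalues
    evals, evecs = evals[keep], evecs[:, keep]
    # Broadcasting scales column j by sqrt(evals[j]).
    return evals, evecs * np.sqrt(evals)

# Three requested components, one of which came back slightly negative.
raw_evals = np.array([0.9, 0.4, -1e-9])
raw_evecs = np.ones((4, 3))
kept_evals, weighted = weight_by_sqrt_eigenvalue(raw_evals, raw_evecs)
print(weighted.shape)
```

Here only two of the three requested components survive the filter, which illustrates why the output can have fewer columns than n_comps.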

Parameters:

adata (AnnData | AnnDataSet) – AnnData or AnnDataSet object.
n_comps (int) – Number of dimensions to keep. The result is insensitive to this parameter when weighted_by_sd is set to True, as long as it is large enough, e.g. 30.
features (str | ndarray | None) – Boolean index mask, or the name of a column in adata.var that holds such a mask (the default is 'selected'). True means that the feature is kept. False means the feature is removed. If features=None, all features are used.
random_state (int) – Seed for the random number generator.
sample_size (int | float | None) – Approximate the embedding using the Nyström algorithm by selecting a subset of cells. It can be either an integer indicating the number of cells to sample or a real value from 0 to 1 indicating the fraction of cells to sample. When sample_size is None, the full matrix is used. Use this only when the number of cells is too large, e.g. > 10,000,000, or when distance_metric is “jaccard”.
chunk_size (int) – Chunk size used in the Nyström method.
distance_metric (Literal['jaccard', 'cosine']) – Distance metric, either “jaccard” or “cosine”. When “cosine” is used, the matrix-free spectral embedding algorithm is used.
weighted_by_sd (bool) – Whether to weight the resulting eigenvectors by the square root of their eigenvalues. This is turned on by default. When it is on, manually selecting the number of components is usually not necessary.
feature_weights (list[float] | None) – Feature weights used in the distance metric. If None, the inverse document frequency (IDF) is used.
inplace (bool) – Whether to store the result in the anndata object.
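The two accepted forms of sample_size (an absolute count or a fraction) can be read as in the sketch below. The helper resolve_sample_size is hypothetical and only mirrors the parameter description above. How the library rounds fractions or handles out-of-range values is not stated here, so the truncation and the error handling are assumptions.

```python
def resolve_sample_size(n_cells, sample_size):
    """Turn a sample_size argument into a number of cells to sample.

    Hypothetical helper: None means all cells, an int is a cell count,
    and a float in (0, 1] is a fraction of the cells.
    """
    if sample_size is None:
        return n_cells
    if isinstance(sample_size, int):
        if sample_size <= 0:
            raise ValueError("sample_size must be positive")
        # Never request more cells than are available (assumption).
        return min(sample_size, n_cells)
    if 0.0 < sample_size <= 1.0:
        # Truncate towards zero (assumption about rounding).
        return int(n_cells * sample_size)
    raise ValueError("a fractional sample_size must lie in (0, 1]")

print(resolve_sample_size(1000, None))   # all cells
print(resolve_sample_size(1000, 200))    # absolute count
print(resolve_sample_size(1000, 0.25))   # fraction of cells
```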

Returns:

If inplace=True, the spectral embedding is stored in adata.obsm["X_spectral"] and the eigenvalues in adata.uns["spectral_eigenvalue"]. Otherwise, the results are returned as NumPy arrays.

Return type:

tuple[np.ndarray, np.ndarray] | None