snapatac2.pp.scrublet
- snapatac2.pp.scrublet(adata, features='selected', n_comps=15, sim_doublet_ratio=2.0, expected_doublet_rate=0.1, n_neighbors=None, use_approx_neighbors=False, random_state=0, inplace=True, n_jobs=8, verbose=True)
Compute the probability that each cell is a doublet using the scrublet algorithm.
This function identifies doublets by simulating doublets from randomly paired chromatin accessibility profiles of individual cells. The simulated doublets are then embedded alongside the original cells using the spectral embedding algorithm in this package. A k-nearest-neighbor classifier is trained to distinguish the simulated doublets from the authentic cells. This trained classifier produces a “doublet score” for each cell. The doublet scores are then converted into probabilities using a Gaussian mixture model.
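The first steps described above can be illustrated with a minimal NumPy sketch. It is not the package's implementation: the helper names `simulate_doublets` and `knn_doublet_fraction` are hypothetical, summing the two paired profiles is an assumed way of combining them, plain Euclidean distance on the raw features stands in for the spectral embedding, and the raw neighbor fraction stands in for the classifier's doublet score. The Gaussian mixture step is omitted. The default `k` follows the documented `n_neighbors` rule, taking `n_cells` to be the number of observed cells, which is also an assumption.

```python
import math

import numpy as np


def simulate_doublets(counts, sim_doublet_ratio=2.0, seed=0):
    """Build synthetic doublets by summing randomly chosen pairs of rows.

    Summing is an assumption made for illustration; the documentation only
    says that profiles are randomly paired.
    """
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]
    n_sim = int(round(sim_doublet_ratio * n_cells))
    # Each row of `pairs` holds the indices of the two parent cells.
    pairs = rng.integers(0, n_cells, size=(n_sim, 2))
    return counts[pairs[:, 0]] + counts[pairs[:, 1]]


def knn_doublet_fraction(observed, simulated, k=None):
    """Fraction of each observed cell's k nearest neighbors that are simulated."""
    n_obs = observed.shape[0]
    if k is None:
        # Documented default for n_neighbors: round(0.5 * sqrt(n_cells)).
        k = max(1, round(0.5 * math.sqrt(n_obs)))
    combined = np.vstack([observed, simulated]).astype(float)
    is_sim = np.concatenate(
        [np.zeros(n_obs, dtype=bool), np.ones(simulated.shape[0], dtype=bool)]
    )
    # Squared Euclidean distance from every observed cell to every point.
    diff = observed[:, None, :].astype(float) - combined[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    # A cell must not count as its own neighbor.
    dist[np.arange(n_obs), np.arange(n_obs)] = np.inf
    nearest = np.argsort(dist, axis=1)[:, :k]
    return is_sim[nearest].mean(axis=1)
```

Cells whose neighborhoods are dominated by simulated doublets receive values near 1. The pairwise distance array here grows with the square of the number of cells, so this form is only suitable for small toy inputs.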
- Parameters:
  - adata (AnnData | list[AnnData]) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. adata can also be a list of AnnData objects. In this case, the function will be applied to each AnnData object in parallel.
  - features (str | ndarray | None) – Boolean index mask, where True means that the feature is kept, and False means the feature is removed.
  - n_comps (int) – Number of components. 15 is usually sufficient. The algorithm is not sensitive to this parameter.
  - sim_doublet_ratio (float) – Number of doublets to simulate relative to the number of observed cells.
  - expected_doublet_rate (float) – Expected doublet rate.
  - n_neighbors (Optional[int]) – Number of neighbors used to construct the KNN graph of observed cells and simulated doublets. If None, this is set to round(0.5 * sqrt(n_cells)).
  - use_approx_neighbors – Whether to use approximate search.
  - random_state (int) – Random state.
  - inplace (bool) – Whether to update the AnnData object in place.
  - n_jobs (int) – Number of jobs to run in parallel.
  - verbose (bool) – Whether to print progress messages.
- Returns:
  - if inplace = True, it updates adata with the following fields:
    - adata.obs["doublet_probability"]: probability of being a doublet
    - adata.obs["doublet_score"]: doublet score
  - if
- Return type:
tuple[np.ndarray, np.ndarray] | None
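A typical in-place call might look like the sketch below. The wrapper `flag_doublets` and its `threshold` cutoff are hypothetical and chosen only for illustration. The `snapatac2.pp.scrublet` call and the `adata.obs["doublet_probability"]` field are taken from the signature and return description above. It assumes that column can be converted with `np.asarray`, and that any preparation implied by `features='selected'` has already been done on `adata`.

```python
import numpy as np


def flag_doublets(adata, threshold=0.5):
    """Run scrublet in place and return a boolean mask of likely doublets.

    `threshold` is a hypothetical cutoff, not a value recommended by the
    documentation.
    """
    # Imported lazily so that defining this helper does not require the package.
    import snapatac2 as snap

    # With inplace=True the results are written into adata.obs.
    snap.pp.scrublet(adata, inplace=True)
    prob = np.asarray(adata.obs["doublet_probability"])
    return prob > threshold
```

The returned mask has one entry per cell and can be used to subset the data before downstream analysis.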