snapatac2.pp.scrublet

snapatac2.pp.scrublet(adata, features='selected', n_comps=15, sim_doublet_ratio=2.0, expected_doublet_rate=0.1, n_neighbors=None, use_approx_neighbors=True, random_state=0, inplace=True, n_jobs=8, verbose=True)

Compute the probability that each cell is a doublet using the Scrublet algorithm.

Parameters:
  • adata (AnnData | list[AnnData]) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. adata can also be a list of AnnData objects. In this case, the function will be applied to each AnnData object in parallel.

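  • features (UnionType[str, ndarray, None]) – Boolean index mask, where True means that the feature is kept, and False means the feature is removed. If a string is given, it is taken as the name of a column in adata.var holding such a mask (default 'selected'). If None, all features are used.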

  • n_comps (int) – Number of principal components (PCs) to use.

  • sim_doublet_ratio (float) – Number of doublets to simulate relative to the number of observed cells.

  • expected_doublet_rate (float) – Expected fraction of cells that are doublets, e.g. 0.1 for 10%.

  • n_neighbors (Optional[int]) – Number of neighbors used to construct the KNN graph of observed cells and simulated doublets. If None, this is set to round(0.5 * sqrt(n_cells)).

  • use_approx_neighbors (bool) – Whether to use approximate nearest-neighbor search.

  • random_state (int) – Seed for the random number generator, so that results are reproducible.

  • inplace (bool) – Whether to update the AnnData object in place.

  • n_jobs (int) – Number of jobs to run in parallel.

  • verbose (bool) – Whether to print progress messages.
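The two size-related defaults above can be worked through by hand. The sketch below is not part of snapatac2; the helper names are hypothetical, and it assumes that n_cells is simply the cell count passed in and that the simulated-doublet count is truncated to an integer, neither of which is stated in the parameter descriptions.

```python
import math


def default_n_neighbors(n_cells: int) -> int:
    """Formula quoted in the n_neighbors description: round(0.5 * sqrt(n_cells))."""
    return round(0.5 * math.sqrt(n_cells))


def n_simulated_doublets(n_observed: int, sim_doublet_ratio: float = 2.0) -> int:
    """Doublets simulated relative to observed cells.

    Truncating to int is an assumption made for this illustration.
    """
    return int(n_observed * sim_doublet_ratio)


n_obs = 10_000
print(default_n_neighbors(n_obs))   # 50
print(n_simulated_doublets(n_obs))  # 20000
```

With the default sim_doublet_ratio of 2.0, twice as many doublets are simulated as there are observed cells, so the KNN graph is built over roughly three times the number of observed cells.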

Returns:

If inplace=True, returns None and updates adata with the following fields:
  • adata.obs["doublet_probability"]: probability of being a doublet

  • adata.obs["doublet_score"]: doublet score

If inplace=False, the results are returned as a tuple of two arrays instead.

Return type:

tuple[np.ndarray, np.ndarray] | None
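A minimal usage sketch, assuming adata is an AnnData object that already carries a 'selected' feature mask. The wrapper name likely_doublets and the 0.5 cutoff are hypothetical choices for illustration, not library defaults documented here.

```python
def likely_doublets(adata, threshold=0.5):
    """Run scrublet in place and return positions of cells above `threshold`.

    `threshold` is an illustrative cutoff, not a value taken from the docs.
    """
    # Imported lazily so the sketch can be read without the package installed.
    import snapatac2 as snap

    # Keyword arguments mirror the documented signature and its defaults.
    snap.pp.scrublet(
        adata,
        features="selected",
        expected_doublet_rate=0.1,
        inplace=True,
    )

    # With inplace=True the results are stored in adata.obs.
    probs = list(adata.obs["doublet_probability"])
    return [i for i, p in enumerate(probs) if p > threshold]
```

The returned positions index into the rows of adata. The raw scores remain available in adata.obs["doublet_score"] if a score-based cutoff is preferred.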