snapatac2.pp.select_features

snapatac2.pp.select_features(adata, n_features=500000, filter_lower_quantile=0.005, filter_upper_quantile=0.005, whitelist=None, blacklist=None, max_iter=1, inplace=True, n_jobs=8)

Perform feature selection.

Note

This function does not perform the actual subsetting. The feature mask is used by various functions to generate submatrices on the fly.
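To illustrate what "generating submatrices on the fly" means, here is a minimal NumPy sketch. The array `counts` and the mask `selected` are made-up stand-ins for the stored count matrix and for a feature mask such as the one kept in .var['selected']; this is not SnapATAC2 code.

```python
import numpy as np

# Toy count matrix: 4 cells (rows) x 6 regions (columns).
counts = np.array([
    [1, 0, 3, 0, 2, 0],
    [0, 0, 1, 0, 4, 1],
    [2, 0, 0, 0, 1, 0],
    [1, 0, 2, 0, 3, 0],
])

# Hypothetical feature mask, analogous to a boolean 'selected' column.
selected = np.array([True, False, True, False, True, False])

# The stored matrix keeps all columns; the reduced view is built only when needed.
submatrix = counts[:, selected]

print(counts.shape)     # (4, 6)
print(submatrix.shape)  # (4, 3)
```

The full matrix keeps its original shape, so a different mask can be applied later without reloading or recomputing the data.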

Parameters:
  • adata (AnnData | AnnDataSet | list[AnnData]) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. adata can also be a list of AnnData objects. In this case, the function will be applied to each AnnData object in parallel.

  • n_features (int) – Number of features to keep. Note that the final number of features may be smaller than this number if there are not enough features that pass the filtering criteria.

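  • filter_lower_quantile (float) – Fraction of features at the low end of the feature count distribution to filter out. For example, the default of 0.005 filters out the bottom 0.5% of features by count.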

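  • filter_upper_quantile (float) – Fraction of features at the high end of the feature count distribution to filter out. For example, the default of 0.005 filters out the top 0.5% of features by count.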

  • whitelist (Optional[Path]) – A user-provided BED file containing genome-wide whitelist regions. Non-zero features listed here will be kept regardless of the other filtering criteria. If a feature is present in both the whitelist and the blacklist, it will be kept.

  • blacklist (Optional[Path]) – A user-provided BED file containing genome-wide blacklist regions. Features that overlap these regions will be removed.

  • inplace (bool) – Whether to perform the computation in place or return the result.

  • n_jobs (int) – Number of parallel jobs to use when adata is a list.
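The way these parameters interact can be sketched in plain NumPy. This is an illustration under stated assumptions, not SnapATAC2's actual implementation: it assumes features are ranked by total count, that the quantile trimming is applied to non-zero features, and that the highest-count survivors are kept. The function `sketch_select_features` and its boolean `whitelist_mask` / `blacklist_mask` arguments are hypothetical; the real function takes BED file paths instead.

```python
import numpy as np

def sketch_select_features(feature_counts, n_features,
                           lower_quantile=0.005, upper_quantile=0.005,
                           whitelist_mask=None, blacklist_mask=None):
    """Illustrative only: return a boolean mask where True means 'kept'."""
    counts = np.asarray(feature_counts)
    nonzero = counts > 0

    # Candidates: non-zero features that are not blacklisted.
    candidate = nonzero.copy()
    if blacklist_mask is not None:
        candidate &= ~np.asarray(blacklist_mask, dtype=bool)

    # Rank candidates by count (ascending) and trim both tails.
    order = np.argsort(counts, kind="stable")
    order = order[candidate[order]]
    n = order.size
    n_lower = int(lower_quantile * n)
    n_upper = int(upper_quantile * n)
    trimmed = order[n_lower:n - n_upper]

    # Assumption: keep the n_features survivors with the highest counts.
    top = trimmed[::-1][:n_features]
    mask = np.zeros(counts.size, dtype=bool)
    mask[top] = True

    # Whitelisted non-zero features are kept regardless of other criteria,
    # including the blacklist.
    if whitelist_mask is not None:
        mask |= np.asarray(whitelist_mask, dtype=bool) & nonzero
    return mask

# Toy per-feature counts; quantiles are exaggerated so the effect is visible.
feature_counts = [0, 5, 1, 9, 3, 100, 7, 2, 4, 6]
mask = sketch_select_features(feature_counts, n_features=3,
                              lower_quantile=0.2, upper_quantile=0.2)
print(np.flatnonzero(mask).tolist())  # [3, 6, 9]

# Feature 5 is both whitelisted and blacklisted, so it is kept.
wl = np.zeros(10, dtype=bool); wl[[0, 5]] = True
bl = np.zeros(10, dtype=bool); bl[[3, 5]] = True
mask_lists = sketch_select_features(feature_counts, n_features=3,
                                    lower_quantile=0.2, upper_quantile=0.2,
                                    whitelist_mask=wl, blacklist_mask=bl)
```

In the second call, feature 0 is whitelisted but has a zero count, so it is not kept. Because whitelisted features are added after the top-n cut in this sketch, the result can contain more than `n_features` entries; whether the library behaves the same way is not stated above.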

Returns:

If inplace = False, return a boolean index mask over the features, where True means the feature is kept and False means it is removed. Otherwise, return None and store the mask in .var['selected'].

Return type:

np.ndarray | None