snapatac2.pp.select_features
- snapatac2.pp.select_features(adata, n_features=500000, filter_lower_quantile=0.005, filter_upper_quantile=0.005, whitelist=None, blacklist=None, max_iter=1, inplace=True, n_jobs=8)
Perform feature selection.
Note
This function does not perform the actual subsetting. The feature mask is used by various functions to generate submatrices on the fly.
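To make the note concrete, here is a minimal NumPy sketch of how a boolean feature mask can produce a submatrix on demand while the full matrix stays untouched. The matrix `X` and the mask `selected` are toy values invented for illustration. This is not how SnapATAC2 stores or slices its data internally.

```python
import numpy as np

# Toy cell-by-feature count matrix: 3 cells (rows) x 5 regions (columns).
X = np.array([
    [1, 0, 3, 0, 2],
    [0, 0, 1, 4, 2],
    [2, 0, 0, 1, 5],
])

# Hypothetical boolean mask, analogous to the one kept in .var['selected'].
selected = np.array([True, False, True, False, True])

# The full matrix is left as-is; the submatrix is built only when needed.
X_sub = X[:, selected]

print(X.shape)      # (3, 5)
print(X_sub.shape)  # (3, 3)
```

Because the mask is stored separately, downstream steps can apply it lazily, and the selection can be changed without recomputing or copying the original data.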
- Parameters:
  - adata (AnnData | AnnDataSet | list[AnnData]) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. adata can also be a list of AnnData objects. In this case, the function will be applied to each AnnData object in parallel.
  - n_features (int) – Number of features to keep. Note that the final number of features may be smaller than this number if there are not enough features that pass the filtering criteria.
  - filter_lower_quantile (float) – Lower quantile of the feature count distribution to filter out.
  - filter_upper_quantile (float) – Upper quantile of the feature count distribution to filter out.
  - whitelist (Optional[Path]) – A user-provided BED file containing genome-wide whitelist regions. Non-zero features listed here will be kept regardless of the other filtering criteria. If a feature is present in both the whitelist and the blacklist, it will be kept.
  - blacklist (Optional[Path]) – A user-provided BED file containing genome-wide blacklist regions. Features that overlap these regions will be removed.
  - inplace (bool) – Perform the computation in place or return the result.
  - n_jobs (int) – Number of parallel jobs to use when adata is a list.
- Returns:
  If inplace = False, return a boolean index mask for filtering, where True means that the feature is kept and False means that the feature is removed. Otherwise, store this index mask in .var['selected'].
- Return type:
  np.ndarray | None
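The interplay of n_features, the two quantile filters, and the returned boolean mask can be sketched in plain NumPy. The helper `select_features_sketch` below is hypothetical and is not SnapATAC2's implementation. It assumes that features are ranked by total count, that zero-count features are dropped first, and that the most-covered survivors are kept. It ignores whitelist, blacklist, and max_iter entirely.

```python
import numpy as np

def select_features_sketch(counts, n_features, lower_q=0.005, upper_q=0.005):
    """Illustrative stand-in (not SnapATAC2's code): return a boolean keep-mask."""
    counts = np.asarray(counts)
    # Rank non-zero features from lowest to highest count.
    order = np.argsort(counts, kind="stable")
    order = order[counts[order] > 0]
    n = order.size
    # Trim the requested fraction from each tail of the count distribution.
    n_low = int(lower_q * n)
    n_high = int(upper_q * n)
    kept = order[n_low:n - n_high]
    # Among the survivors, keep at most n_features with the highest counts.
    kept = kept[::-1][:n_features]
    mask = np.zeros(counts.size, dtype=bool)
    mask[kept] = True
    return mask

# Toy per-feature counts: one empty feature and one extreme outlier.
counts = np.array([0, 5, 1, 9, 3, 7, 100, 2, 4, 6, 8])
mask = select_features_sketch(counts, n_features=4, lower_q=0.1, upper_q=0.1)
print(mask.sum())    # 4
print(counts[mask])  # [9 7 6 8]

# Asking for more features than pass the filters yields fewer than requested.
print(select_features_sketch(counts, 100, 0.1, 0.1).sum())  # 8
```

The last call mirrors the caveat on n_features: only eight features survive the filters, so the mask marks eight features even though 100 were requested.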