snapatac2.pp.filter_cells

snapatac2.pp.filter_cells(data, min_counts=1000, min_tsse=5.0, max_counts=None, max_tsse=None, inplace=True, n_jobs=8)

Filter cell outliers based on counts and TSS enrichment scores. For instance, only keep cells with at least min_counts counts and a TSS enrichment score of at least min_tsse. This removes measurement outliers, i.e. “unreliable” observations.

Parameters:

data (AnnData | list[AnnData]) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. data can also be a list of AnnData objects. In this case, the function will be applied to each AnnData object in parallel.
min_counts (int | None) – Minimum number of counts required for a cell to pass filtering.
min_tsse (float | None) – Minimum TSS enrichment score required for a cell to pass filtering.
max_counts (Optional[int]) – Maximum number of counts allowed for a cell to pass filtering.
max_tsse (Optional[float]) – Maximum TSS enrichment score allowed for a cell to pass filtering.
inplace (bool) – Whether to subset the data in place or return the result.
n_jobs (int) – Number of parallel jobs to use when data is a list.
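
The four threshold parameters can be read as lower and upper bounds that a cell must satisfy together, with None disabling a bound. The pure-Python sketch below illustrates that reading only. The helper `passes_filter` and the sample values are hypothetical, and treating the bounds as inclusive is an assumption rather than a statement about the library's implementation.

```python
from typing import Optional, Sequence


def passes_filter(
    counts: Sequence[int],
    tsse: Sequence[float],
    min_counts: Optional[int] = 1000,
    min_tsse: Optional[float] = 5.0,
    max_counts: Optional[int] = None,
    max_tsse: Optional[float] = None,
) -> list[bool]:
    """Hypothetical helper: True where a cell satisfies every active bound."""

    def in_range(value, lower, upper):
        # A bound set to None is treated as "no limit" (assumption).
        if lower is not None and value < lower:
            return False
        if upper is not None and value > upper:
            return False
        return True

    return [
        in_range(c, min_counts, max_counts) and in_range(t, min_tsse, max_tsse)
        for c, t in zip(counts, tsse)
    ]


# Made-up per-cell values, for illustration only.
counts = [500, 1200, 8000, 60000]
tsse = [7.0, 4.0, 12.0, 9.0]
mask = passes_filter(counts, tsse, max_counts=50000)
print(mask)
```

In this sketch the first cell fails on counts, the second on TSS enrichment, and the fourth on the upper count bound, so only the third cell is kept.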

Returns:

If inplace = True, directly subsets the data matrix. Otherwise, returns a boolean index mask, where True means the cell is kept and False means the cell is removed.

Return type:

np.ndarray | None
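
When inplace is False, the caller applies the returned mask. A minimal sketch of that step with plain Python lists follows. The barcodes and mask values are made up, and with a real AnnData object the same boolean mask would typically be used for row indexing instead.

```python
# Hypothetical barcodes and a mask of the kind described under Returns
# (True = cell is kept, False = cell is removed).
barcodes = ["AAAC", "AAAG", "AACT", "AAGT"]
mask = [False, True, True, False]

kept = [bc for bc, keep in zip(barcodes, mask) if keep]
removed = [bc for bc, keep in zip(barcodes, mask) if not keep]

print(f"kept {len(kept)} of {len(barcodes)} cells: {kept}")
```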