snapatac2.metrics.frip

snapatac2.metrics.frip(adata, regions, *, normalized=True, count_as_insertion=False, inplace=True, n_jobs=8)

Add fraction of reads in peaks (FRiP) to the AnnData object.

import_data must be run before calling this function.

Parameters:
  • adata (AnnData | list[AnnData]) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. adata can also be a list of AnnData objects, in which case the function is applied to each AnnData object in parallel.

  • regions (dict[str, Path | list[str]]) – A dictionary containing the peak sets to compute FRiP. The keys are peak set names and the values are either a bed file name or a list of strings representing genomic regions. For example, {"promoter_frac": "promoter.bed", "enhancer_frac": ["chr1:100-200", "chr2:300-400"]}.

  • normalized (bool) – Whether to normalize the counts by the total number of fragments. If False, the raw number of fragments in peaks will be returned.

  • count_as_insertion (bool) – Whether to count transposition events instead of fragments. Transposition events are located at both ends of fragments.

  • inplace (bool) – Whether to add the results to adata.obs or return them as a dictionary.

  • n_jobs (int) – Number of jobs to run in parallel when adata is a list. If n_jobs=-1, all CPUs will be used.
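
To make the effect of normalized and count_as_insertion concrete, here is a minimal pure-Python sketch for a single cell. It is not the snapatac2 implementation. The helper names parse_region and frip_sketch are hypothetical, and the half-open coordinate convention, the "any overlap" rule, and the choice of denominator when counting insertions are assumptions made for illustration only.

```python
def parse_region(region):
    """Parse a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def frip_sketch(fragments, regions, normalized=True, count_as_insertion=False):
    """Illustrative FRiP for one cell (hypothetical helper, not library code).

    fragments: list of (chrom, start, end), assumed half-open [start, end).
    regions:   list of 'chrom:start-end' strings, also assumed half-open.
    """
    peaks = [parse_region(r) for r in regions]

    def overlaps(chrom, start, end):
        # True if the interval shares at least one base with any peak.
        return any(c == chrom and start < e and s < end for c, s, e in peaks)

    if count_as_insertion:
        # Assumption: each fragment contributes two 1-bp events, one per end.
        events = []
        for chrom, start, end in fragments:
            events.append((chrom, start, start + 1))
            events.append((chrom, end - 1, end))
    else:
        events = list(fragments)

    in_peaks = sum(overlaps(*event) for event in events)
    if not normalized:
        return in_peaks  # raw count of events in peaks
    # Assumption: the denominator is the total number of counted events.
    return in_peaks / len(events) if events else 0.0


fragments = [("chr1", 90, 150), ("chr1", 250, 300), ("chr2", 350, 450), ("chr3", 10, 20)]
regions = ["chr1:100-200", "chr2:300-400"]

print(frip_sketch(fragments, regions))                           # 0.5
print(frip_sketch(fragments, regions, normalized=False))         # 2
print(frip_sketch(fragments, regions, count_as_insertion=True))  # 0.25
```

In this toy input, two of the four fragments overlap a peak, but only two of the eight fragment ends fall inside one, which is why the insertion-based fraction is lower.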

Returns:

If inplace=True, the results are added directly to adata.obs and nothing is returned. Otherwise, a dictionary containing the results is returned (or a list of dictionaries when adata is a list).

Return type:

dict[str, list[float]] | list[dict[str, list[float]]] | None
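
When inplace=False, the result for a single AnnData has the shape dict[str, list[float]]. The sketch below shows one way such a dictionary might be consumed. The values are hard-coded for illustration rather than produced by a real call, the helper cells_passing is hypothetical, and the assumption that the list order matches the order of cell barcodes is not confirmed by this page.

```python
# Hard-coded illustration of the documented return shape dict[str, list[float]].
result = {"peaks_frac": [0.715930, 0.697364, 0.713615, 0.678428, 0.724910]}

# Assumption: values are in the same order as the cell barcodes.
barcodes = [
    "AAACTGCAGACTCGGA-1",
    "AAAGATGCACCTATTT-1",
    "AAAGATGCAGATACAA-1",
    "AAAGGGCTCGCTCTAC-1",
    "AAATGAGAGTCCCGCA-1",
]


def cells_passing(result, barcodes, key, min_frip):
    """Return barcodes whose value under `key` is at least `min_frip`."""
    return [bc for bc, value in zip(barcodes, result[key]) if value >= min_frip]


# The 0.7 cutoff is arbitrary and chosen only for this example.
print(cells_passing(result, barcodes, "peaks_frac", 0.7))
```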

Examples

>>> import snapatac2 as snap
>>> data = snap.pp.import_data(snap.datasets.pbmc500(downsample=True), chrom_sizes=snap.genome.hg38, sorted_by_barcode=False)
>>> snap.metrics.frip(data, {"peaks_frac": snap.datasets.cre_HEA()})
>>> print(data.obs['peaks_frac'].head())
AAACTGCAGACTCGGA-1    0.715930
AAAGATGCACCTATTT-1    0.697364
AAAGATGCAGATACAA-1    0.713615
AAAGGGCTCGCTCTAC-1    0.678428
AAATGAGAGTCCCGCA-1    0.724910
Name: peaks_frac, dtype: float64