snapatac2.tl.macs3

snapatac2.tl.macs3(adata, *, groupby=None, qvalue=0.05, call_broad_peaks=False, broad_cutoff=0.1, replicate=None, replicate_qvalue=None, max_frag_size=None, selections=None, nolambda=False, shift=-100, extsize=200, min_len=None, blacklist=None, key_added='macs3', tempdir=None, inplace=True, n_jobs=8)

Call peaks using MACS3.

Parameters:
  • adata (AnnData | AnnDataSet) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions.

  • groupby (str | list[str] | None) – Group the cells before peak calling. If a str, groups are obtained from .obs[groupby]. If None, peaks will be called for all cells.

  • qvalue (float) – q-value cutoff used by MACS3 when calling peaks.

  • call_broad_peaks (bool) – If True, MACS3 will call broad peaks. Broad peak calling uses two cutoffs: qvalue for narrower, stronger peaks and broad_cutoff for broader, weaker regions. The strong peaks are then nested within the weak regions to give a more detailed peak landscape. To picture the “nested” structure, think of a gene in which exon-like regions (strong peaks) are embedded in intron- and UTR-like regions (weak peaks). If you only want to call broader peaks and are not interested in the nested structure, use a weaker (larger) qvalue cutoff instead of enabling call_broad_peaks.

  • broad_cutoff (float) – q-value cutoff used by MACS3 for the broader, weaker regions when calling broad peaks. Only used when call_broad_peaks=True.

  • replicate (str | list[str] | None) – Replicate information. If provided, reproducible peaks will be called for each group.

  • replicate_qvalue (float | None) – qvalue cutoff used in MACS3 for calling peaks in replicates. This parameter is only used when replicate is provided. Typically this parameter is used to call peaks in replicates with a more lenient cutoff. If not provided, qvalue will be used.

  • max_frag_size (int | None) – Maximum fragment size. If provided, fragments larger than max_frag_size will not be used in peak calling. In ATAC-seq data, this is used to remove fragments that do not come from nucleosome-free regions. You can use frag_size_distr to choose a proper value for this parameter.

  • selections (set[str] | None) – Call peaks for the selected groups only.

  • nolambda (bool) – If True, MACS3 will use the background lambda as the local lambda. This means MACS3 will not account for local bias at candidate peak regions.

  • shift (int) – The shift size in base pairs, as in the MACS3 --shift option.

  • extsize (int) – The extension size in base pairs, as in the MACS3 --extsize option.

  • min_len (int | None) – The minimum length of a called peak. If None, it is set to extsize.

  • blacklist (Path | None) – Path to the blacklist file in BED format. If provided, regions in the blacklist will be removed.

  • key_added (str) – .uns key under which to add the peak information.

  • tempdir (Path | None) – If provided, a temporary directory will be created inside this directory. Otherwise, it will be created in the system's default temporary directory.

  • inplace (bool) – Whether to store the result in place.

  • n_jobs (int) – Number of processes to use for peak calling.
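The call_broad_peaks option combines two q-value cutoffs. The toy sketch below, in plain Python with made-up per-bin q-values, illustrates only the nesting idea: runs of bins passing qvalue are treated as strong peaks, runs passing broad_cutoff as weak regions, and weak regions that contain no strong peak are discarded. It is a simplification for intuition, not MACS3's actual broad-peak algorithm, and the helpers runs_below and nested_peaks are hypothetical.

```python
# Toy illustration only: NOT MACS3's broad-peak algorithm, just a sketch of how
# two q-value cutoffs can produce nested strong/weak regions.
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bin indices


def runs_below(qvals: List[float], cutoff: float) -> List[Interval]:
    """Return maximal runs of consecutive bins whose q-value is <= cutoff."""
    runs: List[Interval] = []
    start: Optional[int] = None
    for i, q in enumerate(qvals):
        if q <= cutoff and start is None:
            start = i
        elif q > cutoff and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(qvals)))
    return runs


def nested_peaks(qvals: List[float], qvalue: float = 0.05, broad_cutoff: float = 0.1):
    """Pair each weak (broad) region with the strong (narrow) peaks inside it.

    Weak regions without any strong peak are dropped in this toy model.
    """
    strong = runs_below(qvals, qvalue)
    weak = runs_below(qvals, broad_cutoff)
    result = []
    for ws, we in weak:
        inner = [(s, e) for s, e in strong if ws <= s and e <= we]
        if inner:
            result.append(((ws, we), inner))
    return result


qvals = [0.5, 0.08, 0.01, 0.02, 0.09, 0.5, 0.07, 0.08, 0.5]
print(nested_peaks(qvals))
# [((1, 5), [(2, 4)])]
```

Bins 6 and 7 pass broad_cutoff but contain no strong peak, so they are not reported in this sketch.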
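To make the max_frag_size rule concrete, here is a minimal sketch assuming fragments are plain (chrom, start, end) tuples with half-open coordinates, so that fragment size is end - start. The function filter_by_size is hypothetical and only mirrors the documented rule that fragments larger than max_frag_size are excluded; it is not SnapATAC2's internal code.

```python
from typing import List, Optional, Tuple

# Hypothetical fragment representation: (chrom, start, end), half-open like BED.
Fragment = Tuple[str, int, int]


def filter_by_size(fragments: List[Fragment], max_frag_size: Optional[int]) -> List[Fragment]:
    """Drop fragments longer than max_frag_size; keep everything if it is None."""
    if max_frag_size is None:
        return list(fragments)
    return [f for f in fragments if f[2] - f[1] <= max_frag_size]


frags = [("chr1", 100, 180), ("chr1", 500, 700), ("chr2", 1000, 1120)]
print(filter_by_size(frags, 150))
# [('chr1', 100, 180), ('chr2', 1000, 1120)]
```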
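The nolambda option changes which expected rate a candidate region is tested against. The sketch below assumes a simplified MACS-style model: the local lambda is the maximum of a genome-wide background rate and a few local-window estimates (the numbers are invented), and significance is a Poisson upper-tail probability. With nolambda=True only the background rate is used, so locally elevated background no longer makes the test stricter. The helpers poisson_sf and local_lambda are hypothetical.

```python
import math
from typing import Iterable


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def local_lambda(bg: float, local_estimates: Iterable[float], nolambda: bool) -> float:
    """Expected count used to test a candidate region (simplified model).

    nolambda=True: use only the background rate.
    nolambda=False: use the largest of the background and local estimates.
    """
    if nolambda:
        return bg
    return max([bg, *local_estimates])


bg, local = 2.0, [3.0, 5.0]  # invented background and local-window rates
observed = 10                # invented tag count in a candidate region
p_no = poisson_sf(observed, local_lambda(bg, local, nolambda=True))
p_local = poisson_sf(observed, local_lambda(bg, local, nolambda=False))
# Here p_no < p_local: ignoring local bias makes the region look more significant.
```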
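The MACS documentation describes --shift as moving 5' cutting ends toward the 3' direction (negative values move them the other way) before each tag is extended by extsize bases toward 3'. Assuming each tag is a single cut site, the sketch below works through that arithmetic. It shows that the defaults shift=-100 and extsize=200 give a 200-bp window centered on the cut site on either strand. It ignores off-by-one details of how MACS3 stores minus-strand positions, and tag_window is a hypothetical helper.

```python
from typing import Tuple


def tag_window(pos: int, strand: str, shift: int = -100, extsize: int = 200) -> Tuple[int, int]:
    """Half-open [start, end) window covered by one tag after shift and extension.

    A positive shift moves the tag toward its 3' end (right on '+', left on '-');
    the tag is then extended by extsize bases, also toward 3'.
    """
    if strand == "+":
        start = pos + shift  # move right for positive shift
        return (start, start + extsize)  # extend rightward
    if strand == "-":
        end = pos - shift  # move left for positive shift
        return (end - extsize, end)  # extend leftward
    raise ValueError("strand must be '+' or '-'")


print(tag_window(1000, "+"))  # (900, 1100)
print(tag_window(1000, "-"))  # (900, 1100)
```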
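The blacklist parameter takes a BED file. One plausible way to picture the filter, assuming called peaks that overlap any blacklisted interval are discarded, is the plain-Python sketch below. This page does not say whether SnapATAC2 removes peaks, fragments, or both, and the helpers parse_bed, overlaps and remove_blacklisted are hypothetical. BED intervals are treated as half-open.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse the first three columns of BED lines, skipping blank/header lines."""
    regions: List[Region] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def remove_blacklisted(peaks: List[Region], blacklist: List[Region]) -> List[Region]:
    """Keep only peaks that overlap no blacklisted region (simple O(n*m) scan)."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


blacklist = parse_bed(["# example header\n", "chr1\t1000\t2000\n"])
peaks = [("chr1", 500, 1000), ("chr1", 1500, 1600), ("chr2", 1000, 2000)]
print(remove_blacklisted(peaks, blacklist))
# [('chr1', 500, 1000), ('chr2', 1000, 2000)]
```

For genome-scale inputs an interval tree or sorted sweep would replace the quadratic scan; the loop here keeps the overlap rule easy to read.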

Returns:

If inplace=True, the result is stored in adata.uns[key_added]. Otherwise, the peaks are returned as a dict of polars.DataFrame objects.

Return type:

dict[str, polars.DataFrame] | None

See also

merge_peaks