snapatac2.tl.macs3

snapatac2.tl.macs3(adata, groupby, *, qvalue=0.05, replicate=None, replicate_qvalue=None, max_frag_size=None, selections=None, nolambda=False, shift=-100, extsize=200, min_len=None, blacklist=None, key_added='macs3', tempdir=None, inplace=True, n_jobs=8)

Call peaks using MACS3.

Parameters:
  • adata (AnnData | AnnDataSet) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions.

  • groupby (str | list[str]) – Group the cells before peak calling. If a str, groups are obtained from .obs[groupby].

  • qvalue (float) – qvalue cutoff used in MACS3.

  • replicate (Union[str, list[str], None]) – Replicate information. If provided, reproducible peaks will be called for each group.

  • replicate_qvalue (Optional[float]) – qvalue cutoff used in MACS3 for calling peaks in replicates. This parameter is only used when replicate is provided. Typically this parameter is used to call peaks in replicates with a more lenient cutoff. If not provided, qvalue will be used.

  • max_frag_size (Optional[int]) – Maximum fragment size. If provided, fragments larger than max_frag_size will not be used in peak calling. This is used in ATAC-seq data to remove fragments that are not from nucleosome-free regions. You can use frag_size_distr to choose a proper value for this parameter.

  • selections (Optional[set[str]]) – Call peaks for the selected groups only.

  • nolambda (bool) – If True, MACS3 will use the background lambda as the local lambda, which means it will not consider the local bias at candidate peak regions.

  • shift (int) – The shift size in MACS.

  • extsize (int) – The extension size in MACS.

  • min_len (Optional[int]) – The minimum length of a called peak. If None, it is set to extsize.

  • blacklist (Optional[Path]) – Path to the blacklist file in BED format. If provided, regions in the blacklist will be removed.

  • key_added (str) – .uns key under which to add the peak information.

  • tempdir (Optional[Path]) – If provided, the temporary directory will be created inside this directory. Otherwise, it will be created in the system default temporary directory.

  • inplace (bool) – Whether to store the result in place (in adata.uns[key_added]) instead of returning it.

  • n_jobs (int) – Number of processes to use for peak calling.
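
The defaults shift=-100 and extsize=200 can be read as a smoothing window around each cut site. The sketch below is a minimal illustration, assuming these two parameters carry the same meaning as the --shift and --extsize options of MACS (move the 5' end by shift, then extend it by extsize in the 5' to 3' direction). The function name smoothing_window and the example coordinate are hypothetical and are not part of snapatac2.

```python
def smoothing_window(cut_site, strand="+", shift=-100, extsize=200):
    """Return the (start, end) interval one cut site adds to the pileup.

    Hypothetical helper: it mirrors the assumed MACS semantics of
    shifting the 5' end and then extending it toward the 3' end.
    """
    if strand == "+":
        # On the plus strand, 5' -> 3' means increasing coordinates.
        start = cut_site + shift
        end = start + extsize
    else:
        # On the minus strand, 5' -> 3' means decreasing coordinates.
        end = cut_site - shift
        start = end - extsize
    return start, end


# With the defaults, the window is 200 bp wide and centered on the cut site.
window = smoothing_window(1000)
print(window)
```

Under this reading, choosing shift = -extsize / 2 keeps the window centered on the cut site on either strand, which is why the two defaults are paired.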

Returns:

If inplace=True, the result is stored in adata.uns[key_added]. Otherwise, the result is returned as a dictionary of dataframes.

Return type:

dict[str, 'polars.DataFrame'] | None
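
Because the location of the result depends on inplace, a small wrapper can make downstream code indifferent to that choice. The following is a sketch rather than an official usage pattern: call_peaks and collect_peaks are hypothetical helper names, the stand-in object exists only so the logic can be exercised without real data, and the dictionary keys are assumed (not confirmed by this page) to be group names.

```python
from types import SimpleNamespace


def collect_peaks(adata, result, key_added="macs3"):
    """Return the peak dictionary whether macs3 ran in place or not."""
    # inplace=True: macs3 returns None and the dict is in adata.uns[key_added].
    # inplace=False: the dict is the return value itself.
    return adata.uns[key_added] if result is None else result


def call_peaks(adata, groupby, *, inplace=True, key_added="macs3", **kwargs):
    """Hypothetical wrapper around snapatac2.tl.macs3."""
    # Imported lazily so this module can be loaded without snapatac2 installed.
    import snapatac2 as snap

    result = snap.tl.macs3(
        adata, groupby, inplace=inplace, key_added=key_added, **kwargs
    )
    return collect_peaks(adata, result, key_added)


# Stand-in with a dict-like .uns, used only to exercise collect_peaks.
# The keys and values here are placeholders, not real output.
fake_adata = SimpleNamespace(uns={"macs3": {"group_a": "peaks table"}})
stored = collect_peaks(fake_adata, None)
returned = collect_peaks(fake_adata, {"group_b": "peaks table"})
```

With real data, a call such as call_peaks(adata, "cell_type", replicate="sample") would then yield the same kind of dictionary for either value of inplace; the column names "cell_type" and "sample" are illustrative.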

See also

merge_peaks