snapatac2.pp.make_peak_matrix

snapatac2.pp.make_peak_matrix(adata, *, use_rep=None, inplace=False, file=None, backend='hdf5', peak_file=None, chunk_size=500, use_x=False, min_frag_size=None, max_frag_size=None, counting_strategy='paired-insertion', value_type='target', summary_type='sum')

Generate cell by peak count matrix.

This function will generate a cell by peak count matrix and store it in a new .h5ad file.

import_fragments must be run first in order to use this function.

Parameters:
  • adata (AnnData | AnnDataSet) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions.

  • use_rep (str | list[str] | None) – Key used to read peak information from .uns[use_rep]. The peaks can also be provided directly as a list of strings: ["chr1:1-100", "chr2:2-200"].

  • inplace (bool) – Whether to add the peak matrix to the AnnData object or return a new AnnData object.

  • file (Path | None) – File name of the output h5ad file used to store the result. If provided, result will be saved to a backed AnnData, otherwise an in-memory AnnData is used. This has no effect when inplace=True.

  • backend (Literal['hdf5']) – The backend to use for storing the result. If None, the default backend will be used.

  • peak_file (Path | None) – Bed file containing the peaks. If provided, peak information will be read from this file.

  • chunk_size (int) – Chunk size

  • use_x (bool) – If True, use the matrix stored in .X as raw counts. Otherwise the .obsm['insertion'] is used.

  • min_frag_size (int | None) – Minimum fragment size to include.

  • max_frag_size (int | None) – Maximum fragment size to include.

  • counting_strategy (Literal['fragment', 'insertion', 'paired-insertion']) – The strategy to compute feature counts. It must be one of the following: “fragment”, “insertion”, or “paired-insertion”. “fragment” means the feature counts are assigned based on the number of fragments that overlap with a region of interest. “insertion” means the feature counts are assigned based on the number of insertions that overlap with a region of interest. “paired-insertion” is similar to “insertion”, but it counts the insertions only once if both insertions of a fragment fall within the same region of interest [Miao24]. Note that this parameter has no effect if the input consists of single-end reads.

  • value_type (Literal['target', 'total', 'fraction']) – The type of value to use from .obsm['_values'], only available when data is imported using import_values. It must be one of the following: “target”, “total”, or “fraction”. “target” means the value is the number of records with positive measurements, e.g., number of methylated bases. “total” means the value is the total number of measurements, e.g., methylated bases plus unmethylated bases. “fraction” means the value is the fraction of the records that are positive, e.g., the fraction of methylated bases.

  • summary_type (Literal['sum', 'mean']) – The type of summary to use when multiple values are found in a peak. This parameter is only used when .obsm['_values'] exists, which is created by import_values. It must be one of the following: “sum” or “mean”.
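
The interaction between value_type and summary_type can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper name summarize_values, the (target, total) record layout, and the choice to compute the fraction per record before summarizing are all assumptions made for illustration.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical record layout: (target, total) for one measured position,
# e.g. (methylated bases, methylated + unmethylated bases).
Record = Tuple[int, int]


def summarize_values(
    records: List[Record],
    value_type: str = "target",
    summary_type: str = "sum",
) -> float:
    """Summarize all records that fall into a single feature region."""
    if value_type == "target":
        values = [float(target) for target, _ in records]
    elif value_type == "total":
        values = [float(total) for _, total in records]
    elif value_type == "fraction":
        # Assumption: the fraction is taken per record, then summarized.
        values = [target / total for target, total in records]
    else:
        raise ValueError(f"unknown value_type: {value_type!r}")

    if summary_type == "sum":
        return sum(values)
    if summary_type == "mean":
        return mean(values)
    raise ValueError(f"unknown summary_type: {summary_type!r}")


# Three records that fall into the same region.
records = [(3, 4), (1, 4), (2, 2)]

print(summarize_values(records, "target", "sum"))     # 6.0
print(summarize_values(records, "total", "sum"))      # 10.0
print(summarize_values(records, "fraction", "mean"))
```

With these toy records, "target" with "sum" adds up the positive measurements, "total" with "sum" adds up all measurements, and "fraction" with "mean" averages the per-record fractions.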

Returns:

An annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to peaks. If file=None, an in-memory AnnData will be returned, otherwise a backed AnnData is returned.

Return type:

AnnData | ad.AnnData | None

Examples

>>> import snapatac2 as snap
>>> data = snap.pp.import_fragments(snap.datasets.pbmc500(downsample=True), chrom_sizes=snap.genome.hg38, sorted_by_barcode=False)
>>> peak_mat = snap.pp.make_peak_matrix(data, peak_file=snap.datasets.cre_HEA())
>>> print(peak_mat)
AnnData object with n_obs × n_vars = 585 × 1154611
    obs: 'n_fragment', 'frac_dup', 'frac_mito'
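
The call above relies on the default counting_strategy='paired-insertion'. The following minimal sketch shows how the three strategies described under Parameters can give different counts for the same peak. It is plain Python rather than SnapATAC2 code; the helper count_in_region is hypothetical, and it assumes half-open BED-style intervals with the two insertion sites at the first and last base of each fragment.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open, BED-style


def count_in_region(
    fragments: List[Interval],
    region: Interval,
    strategy: str = "paired-insertion",
) -> int:
    """Count signal in one region under a given counting strategy."""
    r_start, r_end = region
    count = 0
    for f_start, f_end in fragments:
        # Assumption: insertions sit at the first and last base of the fragment.
        left_in = r_start <= f_start < r_end
        right_in = r_start <= f_end - 1 < r_end
        if strategy == "fragment":
            # One count for every fragment that overlaps the region at all.
            count += int(f_start < r_end and f_end > r_start)
        elif strategy == "insertion":
            # Each insertion inside the region counts separately.
            count += int(left_in) + int(right_in)
        elif strategy == "paired-insertion":
            # Like "insertion", but a pair inside the same region counts once.
            count += int(left_in or right_in)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return count


peak = (100, 200)
fragments = [
    (120, 180),  # both insertions inside the peak
    (110, 160),  # both insertions inside the peak
    (50, 150),   # only the right insertion inside
    (50, 250),   # spans the peak, neither insertion inside
    (300, 400),  # no overlap
]

for strategy in ("fragment", "insertion", "paired-insertion"):
    print(strategy, count_in_region(fragments, peak, strategy))
# fragment 4
# insertion 5
# paired-insertion 3
```

The fragment spanning the whole peak contributes only under "fragment", while the two fragments lying entirely inside the peak contribute two counts each under "insertion" but one each under "paired-insertion".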