snapatac2.pp.add_tile_matrix

snapatac2.pp.add_tile_matrix(adata, *, bin_size=500, inplace=True, chunk_size=500, exclude_chroms=['chrM', 'chrY', 'M', 'Y'], min_frag_size=None, max_frag_size=None, count_frag_as_reads=True, file=None, backend='hdf5', n_jobs=8)

Generate cell by bin count matrix.

This function generates a cell by bin count matrix and adds it to the AnnData object.

import_data must be run first in order to use this function.

Parameters:
  • adata (AnnData | list[AnnData]) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. adata can also be a list of AnnData objects when inplace=True, in which case the function is applied to each AnnData object in parallel.

  • bin_size (int) – The size, in base pairs, of the consecutive genomic regions (bins) used to record the counts.

  • inplace (bool) – Whether to add the tile matrix to the AnnData object or return a new AnnData object.

  • chunk_size (int) – Increasing the chunk_size speeds up I/O but uses more memory.

  • exclude_chroms (list[str] | str | None) – A list of chromosomes to exclude.

  • min_frag_size (Optional[int]) – Minimum fragment size to include.

  • max_frag_size (Optional[int]) – Maximum fragment size to include.

  • count_frag_as_reads (bool) – Whether to count fragments as reads. If True, each fragment is converted to two points representing both ends of the fragment.

  • file (Optional[Path]) – File name of the output file used to store the result. If provided, the result is saved to a backed AnnData; otherwise an in-memory AnnData is used. This has no effect when inplace=True.

  • backend (Literal['hdf5']) – The backend to use for storing the result. If None, the default backend will be used.

  • n_jobs (int) – Number of jobs to run in parallel when adata is a list. If n_jobs=-1, all CPUs will be used.
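
As a rough illustration of how bin_size, exclude_chroms, min_frag_size, max_frag_size and count_frag_as_reads could interact, here is a minimal pure-Python sketch for a single cell. The helper tile_counts is hypothetical and is not SnapATAC2's implementation. It assumes fragments are half-open (chrom, start, end) intervals, that the size filters are inclusive, and that with count_frag_as_reads=False a fragment is counted once in every bin it overlaps. All three points are assumptions made for this example.

```python
from collections import Counter

def tile_counts(fragments, bin_size=500,
                exclude_chroms=("chrM", "chrY", "M", "Y"),
                min_frag_size=None, max_frag_size=None,
                count_frag_as_reads=True):
    """Toy per-cell tile counter (hypothetical helper, not library code)."""
    if isinstance(exclude_chroms, str):
        exclude_chroms = [exclude_chroms]
    excluded = set(exclude_chroms or ())
    counts = Counter()
    for chrom, start, end in fragments:
        if chrom in excluded:
            continue
        size = end - start
        # Assumed inclusive size filters.
        if min_frag_size is not None and size < min_frag_size:
            continue
        if max_frag_size is not None and size > max_frag_size:
            continue
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size  # end is exclusive (assumption)
        if count_frag_as_reads:
            # Each fragment becomes two points, one per fragment end.
            counts[(chrom, first_bin)] += 1
            counts[(chrom, last_bin)] += 1
        else:
            # Assumed behaviour: one count per overlapped bin.
            for b in range(first_bin, last_bin + 1):
                counts[(chrom, b)] += 1
    return dict(counts)

fragments = [("chr1", 100, 300), ("chr1", 450, 1100), ("chrM", 10, 200)]

print(tile_counts(fragments))
# {('chr1', 0): 3, ('chr1', 2): 1}
print(tile_counts(fragments, count_frag_as_reads=False))
# {('chr1', 0): 2, ('chr1', 1): 1, ('chr1', 2): 1}
print(tile_counts(fragments, max_frag_size=300))
# {('chr1', 0): 2}
```

In the first call, the short fragment contributes both of its ends to bin 0, the long fragment contributes one end to bin 0 and one to bin 2, and the chrM fragment is dropped by the default exclude_chroms.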

Returns:

An annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to bins. If file=None, an in-memory AnnData will be returned, otherwise a backed AnnData is returned. When inplace=True, the tile matrix is added to adata and nothing is returned.

Return type:

AnnData | ad.AnnData | None

Examples

>>> import snapatac2 as snap
>>> data = snap.pp.import_data(snap.datasets.pbmc500(downsample=True), chrom_sizes=snap.genome.hg38, sorted_by_barcode=False)
>>> snap.pp.add_tile_matrix(data, bin_size=500)
>>> print(data)
AnnData object with n_obs × n_vars = 585 × 6062095
    obs: 'n_fragment', 'frac_dup', 'frac_mito'
    uns: 'reference_sequences'
    obsm: 'fragment_paired'
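
The large n_vars in the output above reflects the number of 500 bp bins across the genome. The sketch below shows one way such a count could arise. It assumes each non-excluded chromosome is tiled independently and that a trailing partial bin is kept. The helper n_tiles and the toy chromosome sizes are hypothetical and are not taken from SnapATAC2 or hg38.

```python
def n_tiles(chrom_sizes, bin_size=500,
            exclude_chroms=("chrM", "chrY", "M", "Y")):
    """Count bins per chromosome, rounding up (hypothetical helper)."""
    excluded = set(exclude_chroms or ())
    return sum(
        -(-size // bin_size)  # ceiling division
        for chrom, size in chrom_sizes.items()
        if chrom not in excluded
    )

# Toy sizes for illustration only.
toy_sizes = {"chr1": 1200, "chr2": 1000, "chrM": 300}

print(n_tiles(toy_sizes))                       # 5  (3 bins + 2 bins)
print(n_tiles(toy_sizes, exclude_chroms=None))  # 6  (chrM adds 1 bin)
```

Under these assumptions, halving bin_size roughly doubles n_vars, which is worth keeping in mind when choosing a resolution for large genomes.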