snapatac2.pp.add_tile_matrix
- snapatac2.pp.add_tile_matrix(adata, *, bin_size=500, inplace=True, chunk_size=500, exclude_chroms=['chrM', 'chrY', 'M', 'Y'], min_frag_size=None, max_frag_size=None, counting_strategy='paired-insertion', file=None, backend='hdf5', n_jobs=8)
Generate cell by bin count matrix.
This function generates a cell by bin count matrix and adds it to the AnnData object. import_data must be run first in order to use this function.
- Parameters:
adata (AnnData | list[AnnData]) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. adata could also be a list of AnnData objects when inplace=True. In this case, the function will be applied to each AnnData object in parallel.
bin_size (int) – The size of consecutive genomic regions used to record the counts.
inplace (bool) – Whether to add the tile matrix to the AnnData object or return a new AnnData object.
chunk_size (int) – Increasing the chunk_size speeds up I/O but uses more memory.
exclude_chroms (list[str] | str | None) – A list of chromosomes to exclude.
min_frag_size (Optional[int]) – Minimum fragment size to include.
max_frag_size (Optional[int]) – Maximum fragment size to include.
counting_strategy (Literal['fragment', 'insertion', 'paired-insertion']) – The strategy to compute feature counts. It must be one of the following: “fragment”, “insertion”, or “paired-insertion”. “fragment” means the feature counts are assigned based on the number of fragments that overlap with a region of interest. “insertion” means the feature counts are assigned based on the number of insertions that overlap with a region of interest. “paired-insertion” is similar to “insertion”, but it counts the insertions only once if both insertions of a fragment fall within the same region of interest [Miao24]. Note that this parameter has no effect if the input consists of single-end reads.
file (Optional[Path]) – File name of the output file used to store the result. If provided, the result will be saved to a backed AnnData, otherwise an in-memory AnnData is used. This has no effect when inplace=True.
backend (Literal['hdf5']) – The backend to use for storing the result. If None, the default backend will be used.
n_jobs (int) – Number of jobs to run in parallel when adata is a list. If n_jobs=-1, all CPUs will be used.
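The three counting_strategy options described above can be compared on a toy example. The following is a minimal pure-Python sketch, not the library's implementation: the helper count_in_bin is hypothetical, fragments are assumed to be half-open (start, end) intervals, and the two insertions of a fragment are assumed to sit at start and end - 1.

```python
def count_in_bin(fragments, bin_start, bin_end, strategy="paired-insertion"):
    """Count signal in the half-open bin [bin_start, bin_end).

    Hypothetical helper for illustration only; it does not call snapatac2.
    Assumes each fragment is a half-open (start, end) interval whose two
    insertions are located at `start` and `end - 1`.
    """
    total = 0
    for start, end in fragments:
        left_in = bin_start <= start < bin_end
        right_in = bin_start <= end - 1 < bin_end
        if strategy == "fragment":
            # One count for every fragment that overlaps the bin at all.
            total += int(start < bin_end and end > bin_start)
        elif strategy == "insertion":
            # One count per insertion that falls inside the bin.
            total += int(left_in) + int(right_in)
        elif strategy == "paired-insertion":
            # Like "insertion", but a pair landing in the same bin counts once.
            total += int(left_in or right_in)
        else:
            raise ValueError(f"unknown counting strategy: {strategy!r}")
    return total


fragments = [
    (600, 800),    # both insertions inside the bin
    (550, 650),    # both insertions inside the bin
    (900, 1100),   # only the left insertion inside the bin
    (300, 1200),   # spans the bin, neither insertion inside
    (1500, 1600),  # no overlap with the bin
]
counts = {
    s: count_in_bin(fragments, 500, 1000, s)
    for s in ("fragment", "insertion", "paired-insertion")
}
print(counts)
```

In this toy bin, a fragment that spans the bin without ending inside it contributes only under "fragment", while a fragment with both ends inside the bin contributes two counts under "insertion" and one under "paired-insertion".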
- Returns:
An annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to bins. If file=None, an in-memory AnnData will be returned, otherwise a backed AnnData is returned. When inplace=True, the matrix is added to the input object and nothing is returned.
- Return type:
AnnData | ad.AnnData | None
Examples
>>> import snapatac2 as snap
>>> data = snap.pp.import_data(snap.datasets.pbmc500(downsample=True), chrom_sizes=snap.genome.hg38, sorted_by_barcode=False)
>>> snap.pp.add_tile_matrix(data, bin_size=500)
>>> print(data)
AnnData object with n_obs × n_vars = 585 × 6062095
    obs: 'n_fragment', 'frac_dup', 'frac_mito'
    uns: 'reference_sequences'
    obsm: 'fragment_paired'
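The number of columns (n_vars) depends on bin_size and on which chromosomes are excluded. The sketch below estimates it from a table of chromosome sizes. It is an illustration under stated assumptions rather than the library's code: the helper count_tiles and the toy chromosome sizes are hypothetical, and it assumes that a trailing partial bin at the end of each chromosome is kept as its own bin.

```python
def count_tiles(chrom_sizes, bin_size=500, exclude_chroms=("chrM", "chrY", "M", "Y")):
    """Estimate how many fixed-size bins tile the given chromosomes.

    Hypothetical helper for illustration only; it does not call snapatac2.
    Assumes the last, possibly shorter, bin of each chromosome is kept.
    """
    if exclude_chroms is None:
        excluded = set()
    elif isinstance(exclude_chroms, str):
        excluded = {exclude_chroms}
    else:
        excluded = set(exclude_chroms)
    total = 0
    for chrom, length in chrom_sizes.items():
        if chrom in excluded:
            continue
        # Ceiling division: a partial bin at the chromosome end still counts.
        total += -(-length // bin_size)
    return total


# Toy chromosome sizes (made up for this example, not a real genome).
toy_sizes = {"chr1": 1200, "chr2": 1000, "chrM": 400}
print(count_tiles(toy_sizes, bin_size=500))
print(count_tiles(toy_sizes, bin_size=500, exclude_chroms=None))
```

Halving bin_size roughly doubles the number of columns, which is worth keeping in mind when choosing between an in-memory result and a backed file.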