snapatac2.ex.export_coverage#

snapatac2.ex.export_coverage(adata, groupby, selections=None, bin_size=10, blacklist=None, normalization='RPKM', ignore_for_norm=None, min_frag_length=None, max_frag_length=2000, smooth_length=None, out_dir='./', prefix='', suffix='.bw', output_format=None, compression=None, compression_level=None, tempdir=None, n_jobs=8)[source]#

Export and save coverage in a bedgraph or bigwig format file.

This function generates coverage tracks (bigWig or bedGraph) for each group of cells defined in groupby. The coverage is calculated as the number of reads per bin, where bins are short consecutive counting windows of a defined size. For paired-end data, the reads are extended to the fragment length and the coverage is calculated as the number of fragments per bin. There are several options for normalization. The default is RPKM, which normalizes by the total number of reads and the length of the region. The normalization can be disabled by setting normalization=None.

Parameters:
  • adata (AnnData | AnnDataSet) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions.

  • groupby (str | list[str]) – Group the cells. If a str, groups are obtained from .obs[groupby].

  • selections (Optional[list[str]]) – Export only the selected groups.

  • bin_size (int) – Size of the bins, in bases, for the output of the bigwig/bedgraph file.

  • blacklist (Optional[Path]) – A BED file containing the blacklisted regions.

  • normalization (Optional[Literal['RPKM', 'CPM']]) – Normalization method. If None, no normalization is performed.

  • ignore_for_norm (Optional[list[str]]) – A list of chromosomes to ignore for normalization.

  • min_frag_length (Optional[int]) – Minimum fragment length to be included in the computation.

  • max_frag_length (int | None) – Maximum fragment length to be included in the computation.

  • smooth_length (Optional[int]) – Length of the smoothing window for the output of the bigwig/bedgraph file. For example, if the bin_size is set to 20 and the smooth_length is set to 3, then, for each bin, the average of the bin and its left and right neighbors is considered (the total of 60 bp).

  • out_dir (Path) – Directory for saving the outputs.

  • prefix (str) – Text added to the output file name.

  • suffix (str) – Text added to the output file name.

  • output_format (Optional[Literal['bedgraph', 'bigwig']]) – Output format. If None, it is inferred from the suffix.

  • compression (Optional[Literal['gzip', 'zstandard']]) – Compression type. If None, it is inferred from the suffix.

  • compression_level (Optional[int]) – Compression level. 1-9 for gzip, 1-22 for zstandard. If None, it is set to 6 for gzip and 3 for zstandard.

  • n_jobs (int) – Number of threads to use. If <= 0, use all available threads.

Returns:

A dictionary contains (groupname, filename) pairs. The file names are formatted as {prefix}{groupname}{suffix}.

Return type:

dict[str, str]

See also

export_fragments

Examples

>>> import snapatac2 as snap
>>> data = snap.read(snap.datasets.pbmc5k(type="annotated_h5ad"), backed='r')
>>> snap.ex.export_coverage(data, groupby='cell_type', suffix='.bedgraph.zst')
{'cDC': './cDC.bedgraph.zst',
 'Memory B': './Memory B.bedgraph.zst',
 'CD4 Naive': './CD4 Naive.bedgraph.zst',
 'pDC': './pDC.bedgraph.zst',
 'CD8 Naive': './CD8 Naive.bedgraph.zst',
 'CD8 Memory': './CD8 Memory.bedgraph.zst',
 'CD14 Mono': './CD14 Mono.bedgraph.zst',
 'Naive B': './Naive B.bedgraph.zst',
 'NK': './NK.bedgraph.zst',
 'CD4 Memory': './CD4 Memory.bedgraph.zst',
 'CD16 Mono': './CD16 Mono.bedgraph.zst',
 'MAIT': './MAIT.bedgraph.zst'}
>>> snap.ex.export_coverage(data, groupby='cell_type', suffix='.bw')
{'Naive B': './Naive B.bw',
 'CD4 Memory': './CD4 Memory.bw',
 'CD16 Mono': './CD16 Mono.bw',
 'CD8 Naive': './CD8 Naive.bw',
 'pDC': './pDC.bw',
 'CD8 Memory': './CD8 Memory.bw',
 'NK': './NK.bw',
 'Memory B': './Memory B.bw',
 'CD14 Mono': './CD14 Mono.bw',
 'MAIT': './MAIT.bw',
 'CD4 Naive': './CD4 Naive.bw',
 'cDC': './cDC.bw'}