snapatac2.ex.export_coverage#

snapatac2.ex.export_coverage(adata, groupby, selections=None, bin_size=10, blacklist=None, normalization='RPKM', include_for_norm=None, exclude_for_norm=None, min_frag_length=None, max_frag_length=2000, counting_strategy='fragment', smooth_base=None, out_dir='./', prefix='', suffix='.bw', output_format=None, compression=None, compression_level=None, tempdir=None, n_jobs=8)[source]#

Export and save coverage in a bedgraph or bigwig format file.

This function first divides cells into groups based on the groupby parameter. It then independently generates the genome-wide coverage track (bigWig or bedGraph) for each group of cells. The coverage is calculated as the number of reads per bin, where bins are short consecutive counting windows of a defined size. For paired-end data, the reads are extended to the fragment length and the coverage is calculated as the number of fragments per bin. There are several options for normalization. The default is RPKM, which normalizes by the total number of reads and the length of the region. The normalization can be disabled by setting normalization=None.

../../_images/func%2Bexport_coverage.svg
Parameters:
  • adata (AnnData | AnnDataSet) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions.

  • groupby (str | list[str]) – Group the cells. If a str, groups are obtained from .obs[groupby].

  • selections (list[str] | None) – Export only the selected groups.

  • bin_size (int) – Size of the bins, in bases, for the output of the bigwig/bedgraph file.

  • blacklist (Path | None) – A BED file containing the blacklisted regions.

  • normalization (Optional[Literal['RPKM', 'CPM', 'BPM']]) – Normalization method. If None, no normalization is performed. Options: - RPKM (per bin) = #reads per bin / (#mapped_reads (in millions) * bin length (kb)). - CPM (per bin) = #reads per bin / #mapped_reads (in millions). - BPM (per bin) = #reads per bin / sum of all reads per bin (in millions).

  • include_for_norm (list[str] | Path) – A list of string (e.g., [“chr1:1-100”, “chr2:2-200”]) or a BED file containing the genomic loci to include for normalization. If specified, only the reads that overlap with these loci will be used for normalization. A typical use case is to include only the promoter regions for the normalization.

  • exclude_for_norm (list[str] | Path) – A list of string (e.g., [“chr1:1-100”, “chr2:2-200”]) or a BED file containing the genomic loci to exclude for normalization. If specified, the reads that overlap with these loci will be excluded from normalization. If a read overlaps with both include_for_norm and exclude_for_norm, it will be excluded.

  • min_frag_length (int | None) – Minimum fragment length to be included in the computation.

  • max_frag_length (int | None) – Maximum fragment length to be included in the computation.

  • counting_strategy (Literal['fragment', 'insertion', 'paired-insertion']) – The strategy to compute feature counts. It must be one of the following: “fragment”, “insertion”, or “paired-insertion”. “fragment” means the feature counts are assigned based on the number of fragments that overlap with a region of interest. “insertion” means the feature counts are assigned based on the number of insertions that overlap with a region of interest. “paired-insertion” is similar to “insertion”, but it only counts the insertions once if the pair of insertions of a fragment are both within the same region of interest [Miao24]. Note that this parameter has no effect if input are single-end reads.

  • smooth_base (int | None) – Length of the smoothing window in bases for the output of the bigwig/bedgraph file.

  • out_dir (Path) – Directory for saving the outputs.

  • prefix (str) – Text added to the output file name.

  • suffix (str) – Text added to the output file name.

  • output_format (Optional[Literal['bedgraph', 'bigwig']]) – Output format. If None, it is inferred from the suffix.

  • compression (Optional[Literal['gzip', 'zstandard']]) – Compression type. If None, it is inferred from the suffix.

  • compression_level (int | None) – Compression level. 1-9 for gzip, 1-22 for zstandard. If None, it is set to 6 for gzip and 3 for zstandard.

  • n_jobs (int) – Number of threads to use. If <= 0, use all available threads.

Returns:

A dictionary contains (groupname, filename) pairs. The file names are formatted as {prefix}{groupname}{suffix}.

Return type:

dict[str, str]

See also

export_fragments

Examples

>>> import snapatac2 as snap
>>> data = snap.read(snap.datasets.pbmc5k(type="annotated_h5ad"), backed='r')
>>> snap.ex.export_coverage(data, groupby='cell_type', suffix='.bedgraph.zst')
{'cDC': './cDC.bedgraph.zst',
 'Memory B': './Memory B.bedgraph.zst',
 'CD4 Naive': './CD4 Naive.bedgraph.zst',
 'pDC': './pDC.bedgraph.zst',
 'CD8 Naive': './CD8 Naive.bedgraph.zst',
 'CD8 Memory': './CD8 Memory.bedgraph.zst',
 'CD14 Mono': './CD14 Mono.bedgraph.zst',
 'Naive B': './Naive B.bedgraph.zst',
 'NK': './NK.bedgraph.zst',
 'CD4 Memory': './CD4 Memory.bedgraph.zst',
 'CD16 Mono': './CD16 Mono.bedgraph.zst',
 'MAIT': './MAIT.bedgraph.zst'}
>>> snap.ex.export_coverage(data, groupby='cell_type', suffix='.bw')
{'Naive B': './Naive B.bw',
 'CD4 Memory': './CD4 Memory.bw',
 'CD16 Mono': './CD16 Mono.bw',
 'CD8 Naive': './CD8 Naive.bw',
 'pDC': './pDC.bw',
 'CD8 Memory': './CD8 Memory.bw',
 'NK': './NK.bw',
 'Memory B': './Memory B.bw',
 'CD14 Mono': './CD14 Mono.bw',
 'MAIT': './MAIT.bw',
 'CD4 Naive': './CD4 Naive.bw',
 'cDC': './cDC.bw'}