Datasets

These functions download public datasets and auxiliary data used by the SnapATAC2 package.

Note

You can change the data cache directory by setting the SNAP_DATA_DIR environment variable.
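
As a minimal sketch, the cache directory can be redirected from Python before the package is imported. The directory name below is hypothetical, and setting the variable before the import is an assumption about when the package reads it; exporting SNAP_DATA_DIR in the shell beforehand should have the same effect.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical cache location; substitute any writable directory.
cache_dir = Path(tempfile.gettempdir()) / "snapatac2_data"
cache_dir.mkdir(parents=True, exist_ok=True)

# Set the variable before importing snapatac2, on the assumption that the
# package resolves its cache directory when the datasets are first requested.
os.environ["SNAP_DATA_DIR"] = str(cache_dir)

print(os.environ["SNAP_DATA_DIR"])
```

Any dataset function called after this point would then be expected to store its downloads under that directory.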

Genome

genome.Genome(chrom_sizes, annotation_filename)

genome.GRCh37

genome.GRCh38

genome.GRCm38

genome.GRCm39

genome.hg19

genome.hg38

genome.mm10

genome.mm39
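
The sketch below shows one way a predefined genome might be inspected. It assumes that each Genome object exposes its chrom_sizes constructor argument as an attribute mapping chromosome names to lengths; that attribute name is inferred from the signature above, not confirmed here. The helper summarize_chrom_sizes is hypothetical and works on any such mapping.

```python
from typing import Mapping


def summarize_chrom_sizes(chrom_sizes: Mapping[str, int]) -> dict:
    """Return the chromosome count, total length, and longest chromosome."""
    longest = max(chrom_sizes, key=lambda name: chrom_sizes[name])
    return {
        "n_chroms": len(chrom_sizes),
        "total_bp": sum(chrom_sizes.values()),
        "longest": longest,
    }


# Toy input so the helper can be checked without SnapATAC2 installed.
toy = {"chr1": 1000, "chr2": 800, "chrM": 16}
print(summarize_chrom_sizes(toy))

try:
    import snapatac2 as snap

    # Assumption: the Genome object stores chrom_sizes as an attribute.
    print(summarize_chrom_sizes(snap.genome.hg38.chrom_sizes))
except (ImportError, AttributeError):
    print("snapatac2 is unavailable or exposes chrom_sizes differently; skipped")
```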

Motifs

datasets.cis_bp([unique])

A list of transcription factor motifs curated by the CIS-BP database.

datasets.Meuleman_2020()

A list of transcription factor motifs curated from [Meuleman20].

Raw data

datasets.pbmc500([downsampled])

scATAC-seq dataset of 500 PBMCs from 10x Genomics.

datasets.pbmc5k([type])

scATAC-seq dataset of 5k PBMCs from 10x Genomics.

datasets.pbmc10k_multiome([modality, type])

Single-cell multiome dataset of 10k PBMCs from 10x Genomics.

datasets.colon()

scATAC-seq datasets of five colon transverse samples from [Zhang21].

datasets.cre_HEA()

Curated cis-regulatory elements from [Zhang21].
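
The dataset functions listed above can be called directly, for example datasets.pbmc500(downsampled=True). The following sketch wraps them in a small dispatcher that validates the name before importing the package. The names fetch_raw_dataset and RAW_DATASETS are hypothetical, and the return value is whatever the underlying function returns, assumed here to be the location of the downloaded data.

```python
from typing import Any

# Dataset function names listed in this section.
RAW_DATASETS = ("pbmc500", "pbmc5k", "pbmc10k_multiome", "colon", "cre_HEA")


def fetch_raw_dataset(name: str, **kwargs: Any) -> Any:
    """Call snapatac2.datasets.<name>(**kwargs) after validating the name."""
    if name not in RAW_DATASETS:
        raise ValueError(
            f"unknown dataset {name!r}; expected one of {RAW_DATASETS}"
        )
    # Deferred import: validation works even when SnapATAC2 is not installed,
    # and nothing is downloaded until a valid dataset is requested.
    import snapatac2 as snap

    return getattr(snap.datasets, name)(**kwargs)
```

A call such as fetch_raw_dataset("pbmc500", downsampled=True) would trigger a download on first use, so it is left out of the block itself.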