Datasets

These functions facilitate the download of public datasets and auxiliary data used in the SnapATAC2 package.

Note

You can change the data cache directory by setting the SNAP_DATA_DIR environment variable.
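A minimal sketch of setting the variable from Python follows. The cache path is a hypothetical location chosen for illustration, and setting the variable before importing the package is an assumption about when SnapATAC2 reads it, not documented behavior.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical cache location; any writable directory would do.
cache_dir = Path(tempfile.gettempdir()) / "snapatac2_cache"
cache_dir.mkdir(parents=True, exist_ok=True)

# Set the variable before importing snapatac2, on the assumption that
# the package looks it up when its dataset helpers are first used.
os.environ["SNAP_DATA_DIR"] = str(cache_dir)
```

Exporting the variable in the shell before starting Python should have the same effect.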

Genome

genome.Genome(chrom_sizes, annotation_filename)

genome.GRCh37

genome.GRCh38

genome.GRCm38

genome.GRCm39

genome.hg19

genome.hg38

genome.mm10

genome.mm39
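As a rough sketch of how these genome objects are typically consumed, the helper below passes `genome.hg38` to `pp.import_data` to supply chromosome sizes. It assumes a recent SnapATAC2 release in which `import_data` accepts a `Genome` object through its `chrom_sizes` argument. `import_with_hg38` and its `fragment_file` parameter are hypothetical names used only for illustration.

```python
def import_with_hg38(fragment_file):
    """Import a fragment file using hg38 chromosome sizes (illustrative helper)."""
    # Imported lazily so the sketch can be defined without SnapATAC2 installed.
    import snapatac2 as snap

    # Assumption: chrom_sizes accepts a Genome object in the installed version.
    return snap.pp.import_data(
        fragment_file,
        chrom_sizes=snap.genome.hg38,
        sorted_by_barcode=False,
    )
```

For mouse data, one of the mouse genomes listed above (for example `genome.mm10`) would be passed instead.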

Motifs

datasets.cis_bp([unique])

Motifs from CIS-BP database.

datasets.Meuleman_2020()

Motifs from Meuleman et al., 2020.

Raw data

datasets.pbmc500([downsampled])

500 PBMCs from 10x Genomics.

datasets.pbmc5k([type])

5k PBMCs from 10x Genomics.

datasets.pbmc10k_multiome([modality, type])

10k PBMCs from 10x Genomics.

datasets.colon()

Five transverse colon samples from Zhang et al., 2021.

datasets.cre_HEA()

cis-regulatory elements from Zhang et al., 2021.
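The raw-data helpers listed in this section can be called roughly as sketched below. The `downsampled` keyword comes from the signature shown for `datasets.pbmc500`. The assumptions are that the call downloads the file on first use and returns the local path of the cached copy. `fetch_pbmc500_fragments` is a hypothetical wrapper name.

```python
def fetch_pbmc500_fragments():
    """Return the location of the PBMC 500 fragment file (illustrative helper)."""
    # Imported lazily so the sketch can be defined without SnapATAC2 installed.
    import snapatac2 as snap

    # Assumption: the file is downloaded on first use and the path to the
    # cached copy is returned on this and later calls.
    return snap.datasets.pbmc500(downsampled=True)
```

Calling the helper requires network access the first time. Later calls should reuse the copy in the cache directory.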