snapatac2.pp.import_contacts

snapatac2.pp.import_contacts(contact_file, *, file=None, genome=None, chrom_size=None, sorted_by_barcode=True, bin_size=500000, chunk_size=200, tempdir=None, backend='hdf5')

Import chromatin contacts.

Parameters:
  • contact_file (Path) – File name of the contact file.

  • file (Optional[Path]) – File name of the output h5ad file used to store the result. If provided, the result is saved to a backed AnnData; otherwise an in-memory AnnData is used.

  • genome (Optional[Genome]) – A Genome object, providing gene annotation and chromosome sizes. If not set, chrom_size must be provided. genome has lower priority than chrom_size.

  • chrom_size (Optional[dict[str, int]]) – A dictionary containing chromosome sizes, for example, {"chr1": 2393, "chr2": 2344, ...}. This is required if genome is not set. Setting chrom_size will override the chrom_size from the genome parameter.

  • sorted_by_barcode (bool) – Whether the contact file has been sorted by cell barcodes. If sorted_by_barcode == True, this function uses a small, fixed amount of memory. If sorted_by_barcode == False and low_memory == False, all data will be kept in memory. See low_memory for more details.

  • bin_size (int) – The size of consecutive genomic regions used to record the counts.

  • chunk_size (int) – Increasing the chunk_size speeds up I/O but uses more memory.

  • tempdir (Optional[Path]) – Location to store temporary files. If None, the system temporary directory is used.

  • backend (Literal['hdf5']) – The storage backend for the result. 'hdf5' is the only accepted value.
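
Example: the parameters above can be combined as in the following minimal sketch, which builds the chrom_size dictionary from a two-column "name, length" text file and passes it to import_contacts. The helper names read_chrom_sizes and import_contacts_with_sizes, and the assumption that chromosome sizes are stored in a whitespace-separated text file, are illustrative and not part of SnapATAC2. Only keyword arguments listed in the signature above are used.

```python
from pathlib import Path


def read_chrom_sizes(path):
    """Parse a whitespace-separated 'name length' file into a dict.

    Hypothetical helper: blank lines and lines starting with '#' are skipped.
    """
    sizes = {}
    for line in Path(path).read_text().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name, length = line.split()[:2]
        sizes[name] = int(length)
    return sizes


def import_contacts_with_sizes(contact_file, chrom_sizes_file, out_file=None):
    """Hypothetical wrapper around snapatac2.pp.import_contacts.

    With out_file=None the result is held in memory; with a path it is
    stored in a backed h5ad file, as described for the `file` parameter.
    """
    import snapatac2 as snap  # imported lazily so the helper above works without it

    return snap.pp.import_contacts(
        contact_file,
        file=out_file,
        chrom_size=read_chrom_sizes(chrom_sizes_file),
        sorted_by_barcode=True,   # assumes the input is sorted by cell barcode
        bin_size=500000,          # the documented default
    )
```

Passing chrom_size explicitly avoids the need for a Genome object; if both are given, chrom_size takes priority, as noted above.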

Returns:

An annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to regions. If file=None, an in-memory AnnData is returned; otherwise a backed AnnData is returned.
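
Example: to give a feel for what "consecutive genomic regions" of bin_size look like, the sketch below tiles a single chromosome into fixed-width, half-open intervals, truncating the last one at the chromosome end. The function tile_chromosome is hypothetical, and the tiling scheme is an assumption for illustration only; it does not describe how import_contacts indexes its columns internally.

```python
def tile_chromosome(length, bin_size=500000):
    """Return half-open (start, end) bins covering [0, length).

    Illustrative only: the final bin is shortened to end at `length`.
    """
    return [
        (start, min(start + bin_size, length))
        for start in range(0, length, bin_size)
    ]


# A toy 1.2 Mb chromosome split into 500 kb bins.
bins = tile_chromosome(1_200_000, bin_size=500_000)
print(bins)  # [(0, 500000), (500000, 1000000), (1000000, 1200000)]
```

A smaller bin_size yields more, finer regions per chromosome and therefore a larger matrix.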

Return type:

AnnData | ad.AnnData