snapatac2.pp.make_gene_matrix
snapatac2.pp.make_gene_matrix(adata, gene_anno, *, inplace=False, file=None, backend='hdf5', chunk_size=500, use_x=False, id_type='gene', transcript_name_key='transcript_name', transcript_id_key='transcript_id', gene_name_key='gene_name', gene_id_key='gene_id', min_frag_size=None, max_frag_size=None, count_frag_as_reads=True)
Generate a cell-by-gene activity matrix.
The matrix is built by counting the Tn5 insertions in gene body regions. Unless inplace=True, a new AnnData object is created; if file is given, the result is stored in that file.
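Counting insertions in gene bodies can be illustrated with a small pure-Python sketch. It is not SnapATAC2's implementation. The input layouts (insertions as `(barcode, chrom, pos)` tuples, gene bodies as `(name, chrom, start, end)` tuples with 0-based half-open coordinates) and the helper `count_gene_body_insertions` are hypothetical. In this sketch an insertion lying inside overlapping genes is counted once for each gene, which may differ from what the library does.

```python
from collections import defaultdict


def count_gene_body_insertions(insertions, genes):
    """Build a cell-by-gene count matrix (hypothetical helper, not the library API).

    insertions: iterable of (barcode, chrom, pos) Tn5 insertion sites.
    genes: list of (name, chrom, start, end) gene bodies, half-open [start, end).
    Returns (barcodes, gene_names, matrix), where matrix[i][j] is the number of
    insertions from cell i that fall inside gene j.
    """
    gene_names = [g[0] for g in genes]
    counts = defaultdict(lambda: [0] * len(genes))
    for barcode, chrom, pos in insertions:
        row = counts[barcode]  # creates an all-zero row the first time a cell is seen
        for j, (_, g_chrom, start, end) in enumerate(genes):
            # Assumption: an insertion inside overlapping genes counts toward each gene.
            if g_chrom == chrom and start <= pos < end:
                row[j] += 1
    barcodes = sorted(counts)
    matrix = [counts[b] for b in barcodes]
    return barcodes, gene_names, matrix


genes = [("GENE_A", "chr1", 100, 200), ("GENE_B", "chr1", 150, 300)]
insertions = [("AAAC", "chr1", 120), ("AAAC", "chr1", 160), ("TTTG", "chr2", 160)]
barcodes, names, mat = count_gene_body_insertions(insertions, genes)
# barcodes == ["AAAC", "TTTG"], mat == [[2, 1], [0, 0]]
```

The linear scan over genes keeps the sketch short; for genome-scale data an interval index (for example sorted gene starts searched with `bisect`, or an interval tree) would be the usual choice.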
import_data must be run first in order to use this function.
- Parameters:
  - adata (AnnData | AnnDataSet) – The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to regions.
  - gene_anno (Genome | Path) – Either a Genome object or the path of a gene annotation file in GFF or GTF format.
  - inplace (bool) – Whether to add the gene matrix to the AnnData object (True) or return a new AnnData object (False).
  - file (Optional[Path]) – File name of the h5ad file used to store the result. This has no effect when inplace=True.
  - backend (Optional[Literal['hdf5']]) – The backend to use for storing the result. If None, the default backend is used.
  - chunk_size (int) – Chunk size.
  - use_x (bool) – If True, use the matrix stored in .X to compute the gene activity; otherwise .obsm['insertion'] is used.
  - id_type (Literal['gene', 'transcript']) – Whether to compute activity per gene ("gene") or per transcript ("transcript").
  - transcript_name_key (str) – The key of the transcript name in the gene annotation file.
  - transcript_id_key (str) – The key of the transcript ID in the gene annotation file.
  - gene_name_key (str) – The key of the gene name in the gene annotation file.
  - gene_id_key (str) – The key of the gene ID in the gene annotation file.
  - min_frag_size (Optional[int]) – Minimum fragment size to include.
  - max_frag_size (Optional[int]) – Maximum fragment size to include.
  - count_frag_as_reads (bool) – Whether to count fragments as reads. If True, each fragment is converted to two points representing both ends of the fragment.
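The combined effect of min_frag_size, max_frag_size and count_frag_as_reads=True can be sketched as below. This is an illustration under assumptions, not the library's code: fragments are taken as BED-style half-open `(chrom, start, end)` intervals, fragment size is `end - start`, both size bounds are treated as inclusive, and the two ends of a fragment are taken as `start` and `end - 1`. The helper `fragments_to_insertions` is hypothetical.

```python
def fragments_to_insertions(fragments, min_frag_size=None, max_frag_size=None):
    """Yield (chrom, pos) points from (chrom, start, end) fragments.

    Hypothetical helper: fragment size is end - start, the size bounds are
    inclusive, and each kept fragment yields its two ends, start and end - 1.
    """
    for chrom, start, end in fragments:
        size = end - start
        if min_frag_size is not None and size < min_frag_size:
            continue  # shorter than the minimum: excluded
        if max_frag_size is not None and size > max_frag_size:
            continue  # longer than the maximum: excluded
        # One fragment becomes two points, one per fragment end.
        yield (chrom, start)
        yield (chrom, end - 1)


frags = [("chr1", 100, 150), ("chr1", 200, 1200)]  # sizes 50 and 1000
points = list(fragments_to_insertions(frags, max_frag_size=500))
# points == [("chr1", 100), ("chr1", 149)]
```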
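The *_key parameters name attributes in the ninth (attribute) column of the annotation file. For a GTF line, where attributes are written as `key "value";`, extracting the gene ID and name with those keys might look like the hypothetical helpers below. They handle only GTF attribute syntax (not GFF3's `key=value` form), are not SnapATAC2's parser, and convert GTF's 1-based inclusive coordinates to 0-based half-open ones purely for illustration.

```python
def parse_gtf_attributes(attr_field):
    """Parse a GTF attribute column such as 'gene_id "X"; gene_name "Y";'."""
    attrs = {}
    for item in attr_field.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # skips the empty piece after the trailing ';'
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def gene_record(gtf_line, gene_name_key="gene_name", gene_id_key="gene_id"):
    """Return (chrom, start, end, gene_id, gene_name) for a 'gene' feature line.

    Hypothetical helper; returns None for lines that are not 'gene' features.
    GTF coordinates are 1-based inclusive and are converted to 0-based half-open.
    """
    cols = gtf_line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != "gene":
        return None
    attrs = parse_gtf_attributes(cols[8])
    start = int(cols[3]) - 1
    end = int(cols[4])
    return (cols[0], start, end, attrs.get(gene_id_key), attrs.get(gene_name_key))


# Synthetic example line; the IDs and names are made up.
line = 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "G0001"; gene_name "GENE_A";'
record = gene_record(line)
# record == ("chr1", 99, 200, "G0001", "GENE_A")
```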
- Returns:
  An annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes. If file=None, an in-memory AnnData is returned; otherwise a backed AnnData is returned.
Examples
>>> import snapatac2 as snap
>>> data = snap.pp.import_data(snap.datasets.pbmc500(downsample=True), chrom_sizes=snap.genome.hg38, sorted_by_barcode=False)
>>> gene_mat = snap.pp.make_gene_matrix(data, gene_anno=snap.genome.hg38)
>>> print(gene_mat)
AnnData object with n_obs × n_vars = 585 × 60606
    obs: 'n_fragment', 'frac_dup', 'frac_mito'