Pigeon classify and filter output
Classification File
The classify and filter tools output a txt file containing isoform annotation information. The output from classify and filter will have the extensions _classification.txt or _filtered.classification.txt, respectively. Both of these outputs follow the SQANTI3 classification file convention, with the exception of two added columns.
Column | Description |
---|---|
fl_assoc | FL count associated with the isoform including PCR duplicates. |
cell_barcodes | Comma-separated list of unique cell barcode IDs associated with the isoform. |
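As a minimal sketch of how the two added columns might be read, the snippet below parses a classification file and converts `fl_assoc` to an integer and `cell_barcodes` to a list. It assumes the file is tab-delimited with a header row, as SQANTI3 classification files are, and that an empty or `NA` value means no barcodes. The function name `read_classification`, the `NA` handling, and the two-row excerpt are illustrative assumptions, not part of pigeon.

```python
import csv
import io


def read_classification(handle):
    """Yield one dict per isoform with the two added columns parsed."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # fl_assoc is a count; keep None if the field is not a plain integer.
        raw_count = row.get("fl_assoc", "")
        row["fl_assoc"] = int(raw_count) if raw_count.isdigit() else None
        # cell_barcodes is a comma-separated list; treating "NA" as empty
        # is an assumption of this sketch.
        raw_barcodes = row.get("cell_barcodes") or ""
        row["cell_barcodes"] = [
            bc for bc in raw_barcodes.split(",") if bc and bc != "NA"
        ]
        yield row


# Hypothetical excerpt showing only the columns used here; a real file
# carries the full set of SQANTI3 columns as well.
example = (
    "isoform\tfl_assoc\tcell_barcodes\n"
    "PB.1.1\t3\tAAACCTGAGACATAAC,ACCGTAAAGAAGATTC\n"
    "PB.1.2\t1\tNA\n"
)
for record in read_classification(io.StringIO(example)):
    print(record["isoform"], record["fl_assoc"], len(record["cell_barcodes"]))
```

In practice the same function could be given an open file handle for a `_classification.txt` file instead of the in-memory excerpt.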
Sample specific classification output
The output from classify and filter can contain fields that are separated by sample and labeled with the sample name. This output is expected when the flnc_count.txt file from isoseq collapse is passed to classify using --flnc.
Column | Description |
---|---|
FL.<Sample> | FL count for the isoform by sample according to the *.flnc_count.txt file. |
FL_TPM.<Sample> | Transcripts per million of the FL count for the isoform by sample. |
FL_TPM.<Sample>_log10 | Log10 of the transcripts per million for the isoform by sample. |
fl_assoc | FL count associated with the isoform across all samples in input. |
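The relationship between the per-sample columns can be sketched as follows, assuming the usual transcripts-per-million definition (each isoform's FL count divided by the sample's total FL count, scaled by one million). This document does not spell out pigeon's exact computation, so the formula, the function names `fl_tpm` and `log10_tpm`, the handling of zero counts, and the example counts are all assumptions for illustration.

```python
import math


def fl_tpm(fl_counts):
    """Scale per-isoform FL counts for one sample to transcripts per million."""
    total = sum(fl_counts.values())
    if total == 0:
        # No reads in this sample; report zero for every isoform.
        return {isoform: 0.0 for isoform in fl_counts}
    return {
        isoform: count * 1_000_000 / total
        for isoform, count in fl_counts.items()
    }


def log10_tpm(tpm):
    """Take log10 of each TPM value; zero is left as None in this sketch."""
    return {
        isoform: (math.log10(value) if value > 0 else None)
        for isoform, value in tpm.items()
    }


# Hypothetical FL.<Sample> values for three isoforms in a single sample.
counts = {"PB.1.1": 30, "PB.1.2": 10, "PB.2.1": 60}
tpm = fl_tpm(counts)
for isoform, value in log10_tpm(tpm).items():
    print(isoform, counts[isoform], tpm[isoform], value)
```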
Junction File
The classify tool outputs a txt file containing every junction for each isoform (_junctions.txt) following the SQANTI3 junction file convention.
Filtered Reasons File
The filter tool outputs a txt file containing the reasons an isoform was filtered.
Reasons an isoform can be filtered:
Reason | Description |
---|---|
IntraPriming | The primer annealed to a downstream A-rich region. |
Mono-Exonic | The isoform contains a single exon. |
RTSwitching | The isoform shows an artifact of reverse transcriptase template switching. |
LowCoverage/Non-Canonical | The isoform has low sample coverage at splice sites that are not in the known “canonical” set of splice sites. |
Example:
# classification: sample_classification.txt
# isoform: ????????
# intrapriming cutoff: 0.6
# min_cov cutoff: 3
filtered_isoform,filter
PB.1.1,IntraPriming
PB.1.6,LowCoverage/Non-Canonical
PB.1.7,IntraPriming
PB.5.1,LowCoverage/Non-Canonical
PB.6.1,LowCoverage/Non-Canonical
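The example above can be read with a few lines of Python. This is a sketch that assumes the layout shown: leading `#` lines holding `key: value` parameters, followed by a comma-separated table with a `filtered_isoform,filter` header. The function name `parse_filter_reasons` is hypothetical.

```python
import csv
from collections import Counter


def parse_filter_reasons(text):
    """Split a filter reasons file into its parameters and (isoform, reason) pairs."""
    params = {}
    table_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Header lines look like "# key: value".
            key, _, value = line.lstrip("# ").partition(":")
            params[key.strip()] = value.strip()
        elif line.strip():
            table_lines.append(line)
    reader = csv.DictReader(table_lines)
    reasons = [(row["filtered_isoform"], row["filter"]) for row in reader]
    return params, reasons


# The example file from this section, reproduced verbatim.
example = """\
# classification: sample_classification.txt
# isoform: ????????
# intrapriming cutoff: 0.6
# min_cov cutoff: 3
filtered_isoform,filter
PB.1.1,IntraPriming
PB.1.6,LowCoverage/Non-Canonical
PB.1.7,IntraPriming
PB.5.1,LowCoverage/Non-Canonical
PB.6.1,LowCoverage/Non-Canonical
"""

params, reasons = parse_filter_reasons(example)
tally = Counter(reason for _, reason in reasons)
print(params["intrapriming cutoff"], params["min_cov cutoff"])
print(tally.most_common())
```

Tallying the reasons this way gives a quick view of which filter removed the most isoforms in a run.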
Pigeon report output
The report tool outputs a txt file containing the read count and the number of unique genes found in a subsampled number of reads.
Pigeon make-seurat output
The make-seurat tool outputs the files required to run tertiary analysis with Seurat and other tertiary analysis tools. Output is provided at both the isoform level and the gene level; the gene-level output contains the sum of all isoforms associated with a particular gene.
Files output:
<output_dir>/annotated.info.csv
<output_dir>/annotated-prefilter.info.csv
<output_dir>/info.csv
<output_dir>/genes_seurat/barcodes.tsv
<output_dir>/genes_seurat/genes.tsv
<output_dir>/genes_seurat/matrix.mtx
<output_dir>/isoforms_seurat/barcodes.tsv
<output_dir>/isoforms_seurat/genes.tsv
<output_dir>/isoforms_seurat/matrix.mtx
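Each of the two directories holds a `barcodes.tsv`, `genes.tsv`, and `matrix.mtx` triplet. The sketch below reads such a directory using only the Python standard library. It assumes that the files are uncompressed, that `matrix.mtx` is a Matrix Market coordinate file, and that rows correspond to lines of `genes.tsv` and columns to lines of `barcodes.tsv`, as in the common 10x-style layout. None of this is confirmed by this document, and the function names and the tiny demo directory are hypothetical.

```python
import os
import tempfile


def read_rows(path):
    """Return the non-empty lines of a TSV file, split on tabs."""
    with open(path) as handle:
        return [line.rstrip("\n").split("\t") for line in handle if line.strip()]


def load_matrix_dir(directory):
    """Load barcodes.tsv, genes.tsv, and matrix.mtx from one output directory."""
    barcodes = [row[0] for row in read_rows(os.path.join(directory, "barcodes.tsv"))]
    features = read_rows(os.path.join(directory, "genes.tsv"))
    with open(os.path.join(directory, "matrix.mtx")) as handle:
        # Lines starting with '%' are the Matrix Market banner and comments.
        body = [
            line.split()
            for line in handle
            if line.strip() and not line.startswith("%")
        ]
    n_rows, n_cols, n_entries = (int(x) for x in body[0])
    # Entries are 1-based (row, column, value) triplets; store them 0-based.
    counts = {(int(i) - 1, int(j) - 1): float(v) for i, j, v in body[1:]}
    if len(counts) != n_entries:
        raise ValueError("matrix.mtx entry count does not match its size line")
    return features, barcodes, (n_rows, n_cols), counts


# Tiny made-up directory: one feature, two barcodes, two non-zero counts.
with tempfile.TemporaryDirectory() as tmp:
    files = {
        "barcodes.tsv": "AAACCTGAGACATAAC\nACCGTAAAGAAGATTC\n",
        "genes.tsv": "STRN\tSTRN\n",
        "matrix.mtx": (
            "%%MatrixMarket matrix coordinate integer general\n"
            "1 2 2\n"
            "1 1 4\n"
            "1 2 1\n"
        ),
    }
    for name, text in files.items():
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write(text)
    features, barcodes, shape, counts = load_matrix_dir(tmp)

for (row, col), value in sorted(counts.items()):
    print(features[row][0], barcodes[col], value)
```

A dictionary of non-zero entries keeps the sketch dependency-free; for real datasets a sparse matrix library would be the more practical choice.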
info.csv contains the following information:
id UMI UMIrev BC BCrev length count
molecule/0 CCGCTCTCCT AGGAGAGCGG AAACCTGAGACATAAC GTTATGTCTCAGGTTT 3718 1
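In the sample row above, UMIrev and BCrev appear to be the reverse complements of UMI and BC. The document does not state this explicitly, so the short sketch below simply checks that reading against the values shown; the helper `reverse_complement` is illustrative.

```python
# Translation table for DNA bases; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Values copied from the info.csv sample row.
umi, umi_rev = "CCGCTCTCCT", "AGGAGAGCGG"
bc, bc_rev = "AAACCTGAGACATAAC", "GTTATGTCTCAGGTTT"

print(reverse_complement(umi) == umi_rev)
print(reverse_complement(bc) == bc_rev)
```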
annotated.info.csv and annotated-prefilter.info.csv contain additional information from pigeon classify and pigeon filter.
id pbid length transcript gene category ontarget ORFgroup UMI UMIrev BC BCrev pass_pigeon_filter
molecule/1856656 PB.10002.69 1201 ENST00000263918.9 STRN incomplete-splice_match NA NA GCATTACTGT ACAGTAATGC ACCGTAAAGAAGATTC GAATCTTCTTTACGGT PASS