Pigeon classify and filter output

Classification File

The classify and filter tools each output a text file containing isoform annotation information. The classify output has the extension _classification.txt and the filter output has the extension _filtered.classification.txt. Both files follow the SQANTI3 classification file convention, with two added columns.

Column          Description
fl_assoc        FL count associated with the isoform, including PCR duplicates.
cell_barcodes   Comma-separated list of unique cell barcode IDs associated with the isoform.
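
As a minimal sketch of reading these two columns, the snippet below parses a small in-memory table. It assumes the classification file is tab-delimited and that the isoform identifier column is named `isoform`, as in SQANTI3 classification files; the rows and barcodes are invented for illustration.

```python
import csv
import io

# Hypothetical two-row excerpt; real files carry many more SQANTI3 columns.
SAMPLE = (
    "isoform\tfl_assoc\tcell_barcodes\n"
    "PB.1.1\t3\tAAACCTGAGACATAAC,ACCGTAAAGAAGATTC\n"
    "PB.1.2\t1\tAAACCTGAGACATAAC\n"
)


def read_barcodes(handle):
    """Map each isoform to (fl_assoc, set of unique cell barcodes)."""
    result = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # Split the comma-separated list and drop empty entries.
        barcodes = {bc for bc in row["cell_barcodes"].split(",") if bc}
        result[row["isoform"]] = (int(row["fl_assoc"]), barcodes)
    return result


per_isoform = read_barcodes(io.StringIO(SAMPLE))
for isoform, (fl_assoc, barcodes) in per_isoform.items():
    print(isoform, fl_assoc, len(barcodes))
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.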

Sample specific classification output

The output from classify and filter can contain fields that are separated by sample and labeled with the sample name. This output is expected when the flnc_count.txt file from isoseq collapse is passed to classify with --flnc.

Column                  Description
FL.<Sample>             FL count for the isoform by sample according to the *.flnc_count.txt file.
FL_TPM.<Sample>         Transcripts per million of the FL count for the isoform by sample.
FL_TPM.<Sample>_log10   Log10 of the transcripts per million for the isoform by sample.
fl_assoc                FL count associated with the isoform across all samples in input.
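
The relationship between the FL, FL_TPM, and log10 columns can be sketched as follows. This assumes the usual transcripts-per-million definition (isoform count divided by the sample total, scaled to one million); whether pigeon applies exactly this normalization is an assumption, and the counts are invented.

```python
import math


def fl_tpm(counts):
    """Scale per-isoform FL counts for one sample to transcripts per million."""
    total = sum(counts.values())
    if total == 0:
        return {isoform: 0.0 for isoform in counts}
    return {isoform: count * 1_000_000 / total for isoform, count in counts.items()}


# Hypothetical FL counts for a single sample.
sample_counts = {"PB.1.1": 6, "PB.1.2": 3, "PB.2.1": 1}

tpm = fl_tpm(sample_counts)
# log10 is undefined for zero, so isoforms with no reads are skipped here.
tpm_log10 = {isoform: math.log10(value) for isoform, value in tpm.items() if value > 0}

for isoform in sample_counts:
    print(isoform, tpm[isoform], tpm_log10.get(isoform))
```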

Junction File

The classify tool outputs a txt file containing every junction for each isoform (_junctions.txt) following the SQANTI3 junction file convention.

Filtered Reasons File

The filter tool outputs a txt file containing the reasons an isoform was filtered.

Reasons an isoform can be filtered:

Reason                      Description
IntraPriming                The primer annealed to an A-rich region downstream of the isoform.
Mono-Exonic                 The isoform contains a single exon.
RTSwitching                 The isoform shows an artifact of reverse transcriptase template switching.
LowCoverage/Non-Canonical   Sample coverage is low for splice sites that are not in the known “canonical” set of splice sites.

Example:

# classification: sample_classification.txt
# isoform: ????????
# intrapriming cutoff: 0.6
# min_cov cutoff: 3
filtered_isoform,filter
PB.1.1,IntraPriming
PB.1.6,LowCoverage/Non-Canonical
PB.1.7,IntraPriming
PB.5.1,LowCoverage/Non-Canonical
PB.6.1,LowCoverage/Non-Canonical
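
A file in this layout can be read with a short parser, sketched below. It assumes that lines beginning with `#` are `key: value` parameter lines and that the remaining lines are comma-separated with a header row, as in the example above; the function name is illustrative only.

```python
import csv
from collections import Counter

# The example from this section, reproduced verbatim.
EXAMPLE = """# classification: sample_classification.txt
# isoform: ????????
# intrapriming cutoff: 0.6
# min_cov cutoff: 3
filtered_isoform,filter
PB.1.1,IntraPriming
PB.1.6,LowCoverage/Non-Canonical
PB.1.7,IntraPriming
PB.5.1,LowCoverage/Non-Canonical
PB.6.1,LowCoverage/Non-Canonical
"""


def parse_filter_reasons(text):
    """Return (parameters from '#' lines, list of row dicts)."""
    params = {}
    data_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            params[key.strip()] = value.strip()
        elif line.strip():
            data_lines.append(line)
    rows = list(csv.DictReader(data_lines))
    return params, rows


params, rows = parse_filter_reasons(EXAMPLE)
reason_counts = Counter(row["filter"] for row in rows)
print(params["min_cov cutoff"], dict(reason_counts))
```

Tallying by reason gives a quick view of which filter removed the most isoforms.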

Pigeon report output

The report tool outputs a txt file containing the read count and number of unique genes found in a subsampled number of reads.

Pigeon make-seurat output

The make-seurat tool outputs the files required to run tertiary analysis with Seurat. Output is provided at both the isoform level and the gene level; the gene-level output contains the sum of all isoforms associated with a particular gene.
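
The gene-level summation can be illustrated with a small sketch. This is not the tool's implementation: the isoform-to-gene mapping, the barcodes, and the counts are all hypothetical, and the sketch only shows how per-isoform counts for a cell add up to a per-gene count.

```python
from collections import defaultdict

# Hypothetical counts keyed by (isoform, cell barcode).
isoform_counts = {
    ("PB.1.1", "AAACCTGAGACATAAC"): 2,
    ("PB.1.2", "AAACCTGAGACATAAC"): 1,
    ("PB.1.2", "ACCGTAAAGAAGATTC"): 4,
    ("PB.2.1", "ACCGTAAAGAAGATTC"): 3,
}

# Hypothetical isoform-to-gene assignment.
isoform_to_gene = {"PB.1.1": "GENE_A", "PB.1.2": "GENE_A", "PB.2.1": "GENE_B"}


def sum_by_gene(counts, mapping):
    """Sum isoform-level counts into gene-level counts per cell barcode."""
    gene_counts = defaultdict(int)
    for (isoform, barcode), count in counts.items():
        gene_counts[(mapping[isoform], barcode)] += count
    return dict(gene_counts)


gene_counts = sum_by_gene(isoform_counts, isoform_to_gene)
for key in sorted(gene_counts):
    print(key, gene_counts[key])
```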

Files output:

<output_dir>/annotated.info.csv
<output_dir>/annotated-prefilter.info.csv
<output_dir>/info.csv
<output_dir>/genes_seurat/barcodes.tsv
<output_dir>/genes_seurat/genes.tsv
<output_dir>/genes_seurat/matrix.mtx
<output_dir>/isoforms_seurat/barcodes.tsv
<output_dir>/isoforms_seurat/genes.tsv
<output_dir>/isoforms_seurat/matrix.mtx

info.csv contains the following information:

id	UMI	UMIrev	BC	BCrev	length	count
molecule/0	CCGCTCTCCT	AGGAGAGCGG	AAACCTGAGACATAAC	GTTATGTCTCAGGTTT	3718	1
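
In the example row, UMIrev and BCrev are the reverse complements of UMI and BC. The sketch below checks that relationship for the row shown above; that it holds for every row, and that the file is tab-delimited despite its .csv name, are assumptions based on this example alone.

```python
import csv
import io

# Header and row copied from the example above, joined with tabs.
INFO = (
    "\t".join(["id", "UMI", "UMIrev", "BC", "BCrev", "length", "count"]) + "\n"
    + "\t".join([
        "molecule/0", "CCGCTCTCCT", "AGGAGAGCGG",
        "AAACCTGAGACATAAC", "GTTATGTCTCAGGTTT", "3718", "1",
    ]) + "\n"
)

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


records = list(csv.DictReader(io.StringIO(INFO), delimiter="\t"))
for record in records:
    umi_ok = reverse_complement(record["UMI"]) == record["UMIrev"]
    bc_ok = reverse_complement(record["BC"]) == record["BCrev"]
    print(record["id"], umi_ok, bc_ok)
```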

annotated.info.csv and annotated-prefilter.info.csv contain additional information from pigeon classify and pigeon filter.

id	pbid	length	transcript	gene	category	ontarget	ORFgroup	UMI	UMIrev	BC	BCrev	pass_pigeon_filter
molecule/1856656	PB.10002.69	1201	ENST00000263918.9	STRN	incomplete-splice_match	NA	NA	GCATTACTGT	ACAGTAATGC	ACCGTAAAGAAGATTC	GAATCTTCTTTACGGT	PASS
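
One possible use of the pass_pigeon_filter column is to keep only molecules marked PASS, sketched below. The first data row is the example above; the second row is an invented placeholder, since the values used for non-passing molecules are not shown in this document. Tab-delimited input is again an assumption.

```python
import csv
import io

COLUMNS = [
    "id", "pbid", "length", "transcript", "gene", "category", "ontarget",
    "ORFgroup", "UMI", "UMIrev", "BC", "BCrev", "pass_pigeon_filter",
]
ROWS = [
    # Row from the example above.
    ["molecule/1856656", "PB.10002.69", "1201", "ENST00000263918.9", "STRN",
     "incomplete-splice_match", "NA", "NA", "GCATTACTGT", "ACAGTAATGC",
     "ACCGTAAAGAAGATTC", "GAATCTTCTTTACGGT", "PASS"],
    # Hypothetical row; the non-PASS value is a placeholder, not a real label.
    ["molecule/hypothetical", "PB.0.0", "100", "NA", "NA", "NA", "NA", "NA",
     "AAAAAAAAAA", "TTTTTTTTTT", "AAAAAAAAAAAAAAAA", "TTTTTTTTTTTTTTTT",
     "NOT_PASS_PLACEHOLDER"],
]
TABLE = "\n".join("\t".join(fields) for fields in [COLUMNS] + ROWS) + "\n"


def passing_molecules(handle):
    """Return the rows whose pass_pigeon_filter value is exactly PASS."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row["pass_pigeon_filter"] == "PASS"]


kept = passing_molecules(io.StringIO(TABLE))
for row in kept:
    print(row["id"], row["pbid"], row["gene"], row["category"])
```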
