
CLI Workflow

Map reads to a reference genome

Map reads using pbmm2.

pbmm2 align --preset ISOSEQ --sort <input.bam> <ref.fa> <mapped.bam>

Collapse into unique isoforms

Collapse redundant transcripts into unique isoforms, based on their exonic structure, using isoseq3 collapse.

Single-cell IsoSeq:

isoseq3 collapse <mapped.bam> <collapsed.gff>

Bulk IsoSeq:

isoseq3 collapse --do-not-collapse-extra-5exons <mapped.bam> <collapsed.gff>

Sort input transcript GFF

Sort the transcript GFF file produced by isoseq3 collapse.

pigeon sort <collapsed.gff> -o sorted.gff
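
The mapping, collapse, and sort steps above can be chained in a small wrapper. The following is a minimal sketch rather than an official script: the `run` and `prep_isoforms` helpers, the `DRY_RUN` variable, the `single-cell`/`bulk` mode labels, and the `<prefix>.*` output names are all assumptions introduced for illustration. By default it only prints the commands so they can be reviewed first.

```shell
# Print a command; execute it only when DRY_RUN=0 (hypothetical helper).
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# prep_isoforms MODE INPUT_BAM REF_FA PREFIX
# MODE is "single-cell" or "bulk" (labels chosen for this sketch).
prep_isoforms() (
    [ $# -eq 4 ] || { echo "usage: prep_isoforms MODE INPUT_BAM REF_FA PREFIX" >&2; exit 2; }
    mode=$1 input_bam=$2 ref_fa=$3 prefix=$4

    # Pick the collapse option before running anything, so a bad mode fails early.
    case $mode in
        single-cell) extra= ;;
        bulk)        extra=--do-not-collapse-extra-5exons ;;
        *) echo "unknown mode: $mode (expected single-cell or bulk)" >&2; exit 2 ;;
    esac

    run pbmm2 align --preset ISOSEQ --sort "$input_bam" "$ref_fa" "$prefix.mapped.bam" || exit 1
    # $extra is deliberately unquoted: it expands to nothing when empty.
    run isoseq3 collapse $extra "$prefix.mapped.bam" "$prefix.collapsed.gff" || exit 1
    run pigeon sort "$prefix.collapsed.gff" -o "$prefix.sorted.gff"
)
```

For example, `prep_isoforms bulk input.bam ref.fa sample` prints the three commands, and `DRY_RUN=0 prep_isoforms bulk input.bam ref.fa sample` runs them, stopping at the first failure.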

Index the reference files

Index the genome annotation and, optionally, the CAGE peak and intropolis files before classification.

pigeon index <gencode.annotation.gtf>
pigeon index <cage.bed>
pigeon index <intropolis.tsv>
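
Because the CAGE peak and intropolis files are optional, a wrapper can index whichever reference files are actually present. This is a minimal sketch: the `index_refs` function and the `PIGEON` override variable are hypothetical names introduced here and are not part of pigeon.

```shell
# index_refs ANNOTATION_GTF [OPTIONAL_FILE...]
# Indexes the required annotation, then each optional file that exists.
index_refs() (
    [ $# -ge 1 ] || { echo "usage: index_refs ANNOTATION_GTF [OPTIONAL_FILE...]" >&2; exit 2; }
    gtf=$1
    shift

    # The annotation is required, so a missing file is an error.
    [ -f "$gtf" ] || { echo "missing annotation: $gtf" >&2; exit 1; }
    ${PIGEON:-pigeon} index "$gtf" || exit 1

    # Optional inputs are indexed when present and skipped otherwise.
    for f in "$@"; do
        if [ -f "$f" ]; then
            ${PIGEON:-pigeon} index "$f" || exit 1
        else
            echo "skipping optional file (not found): $f" >&2
        fi
    done
)
```

A call such as `index_refs gencode.annotation.gtf cage.bed intropolis.tsv` indexes the annotation and any of the other two files that exist.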

Classify isoforms

Classify isoforms into categories using the base required input.

pigeon classify <sorted.gff> <annotations.gtf> <reference.fa>

Alternatively, classify isoforms using supplemental reference information. See the pigeon input documentation for details.

pigeon classify <sorted.gff> <annotations.gtf> <reference.fa> --fl abundance.txt --cage-peak refTSS.bed --poly-a polyA.list
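
Since each supplemental input is optional, the command line can be assembled from whichever files are available. The sketch below assumes three hypothetical environment variables (`FL_COUNTS`, `CAGE_PEAKS`, `POLYA_MOTIFS`) and a hypothetical `PIGEON` override; it uses only the `--fl`, `--cage-peak`, and `--poly-a` options shown above.

```shell
# classify_isoforms SORTED_GFF ANNOTATION_GTF REFERENCE_FA
# Appends an option only when its (hypothetical) variable is set and non-empty.
classify_isoforms() (
    [ $# -eq 3 ] || { echo "usage: classify_isoforms SORTED_GFF ANNOTATION_GTF REFERENCE_FA" >&2; exit 2; }

    # Rebuild the positional parameters as the pigeon argument list.
    set -- classify "$1" "$2" "$3"
    if [ -n "${FL_COUNTS:-}" ];    then set -- "$@" --fl "$FL_COUNTS"; fi
    if [ -n "${CAGE_PEAKS:-}" ];   then set -- "$@" --cage-peak "$CAGE_PEAKS"; fi
    if [ -n "${POLYA_MOTIFS:-}" ]; then set -- "$@" --poly-a "$POLYA_MOTIFS"; fi

    ${PIGEON:-pigeon} "$@"
)
```

With no variables set this runs the base command; with `FL_COUNTS=abundance.txt` exported, `--fl abundance.txt` is appended. Keeping the arguments in the positional list means paths containing spaces are passed through intact.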

Filter isoforms

Filter isoforms from the classification output.

pigeon filter <classification.txt>

To generate a filtered GFF as well, also provide the GFF that was used as input to pigeon classify.

pigeon filter <classification.txt> --isoforms <sorted.gff>

The expected output consists of:

*.filtered_lite_classification.txt
*.filtered_lite_junctions.txt
*.filtered_lite_reasons.txt
*.sorted.filtered_lite.gff (only if --isoforms is used)
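
A quick check that the listed files were produced can catch a failed or partial run. The following sketch matches files by suffix only, since the prefix depends on the input name; `check_filter_outputs` and its `yes`/`no` second argument are hypothetical names for this example.

```shell
# check_filter_outputs [DIR] [yes|no]
# Pass "yes" as the second argument when pigeon filter was run with --isoforms.
check_filter_outputs() (
    dir=${1:-.}
    want_gff=${2:-no}

    set -- filtered_lite_classification.txt \
           filtered_lite_junctions.txt \
           filtered_lite_reasons.txt
    if [ "$want_gff" = yes ]; then
        set -- "$@" filtered_lite.gff
    fi

    rc=0
    for suffix in "$@"; do
        found=no
        # An unmatched glob stays literal, so test that the path really exists.
        for f in "$dir"/*."$suffix"; do
            if [ -e "$f" ]; then found=yes; fi
        done
        if [ "$found" = no ]; then
            echo "missing: $dir/*.$suffix" >&2
            rc=1
        fi
    done
    exit "$rc"
)
```

The function returns a non-zero status and names each missing pattern on standard error, so it can gate the next step of a script.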

Report gene saturation

Gene saturation is estimated by subsampling the classification output and counting the unique genes at each subsample size.

pigeon report <classification.filtered_lite_classification.txt> <saturation.txt>
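
To illustrate the idea of a saturation curve, the sketch below shuffles the rows of a classification file once and reports the cumulative number of unique genes at increasing sample sizes. It is not how pigeon report is implemented: it samples rows without weighting by read counts, and it assumes a tab-separated file with a header column named `associated_gene`. The function name, the `SEED` variable, and the default step are hypothetical.

```shell
# saturation_sketch CLASSIFICATION_TXT [STEP]
# Prints "<rows sampled><TAB><unique genes>" every STEP rows and at the end.
saturation_sketch() (
    classification=$1
    step=${2:-1000}
    [ "$step" -gt 0 ] 2>/dev/null || { echo "STEP must be a positive integer" >&2; exit 2; }

    awk -F '\t' -v step="$step" -v seed="${SEED:-42}" '
        NR == 1 {
            # Locate the gene column by header name (assumed: associated_gene).
            for (i = 1; i <= NF; i++) if ($i == "associated_gene") col = i
            if (!col) exit 1
            next
        }
        { gene[++n] = $col }
        END {
            if (!col) exit 1
            srand(seed)
            # Fisher-Yates shuffle: any prefix is then a random subsample.
            for (i = n; i > 1; i--) {
                j = int(rand() * i) + 1
                tmp = gene[i]; gene[i] = gene[j]; gene[j] = tmp
            }
            for (i = 1; i <= n; i++) {
                if (!(gene[i] in seen)) { seen[gene[i]] = 1; uniq++ }
                if (i % step == 0 || i == n) print i "\t" uniq
            }
        }
    ' "$classification" || {
        echo "could not read $classification or found no associated_gene column" >&2
        exit 1
    }
)
```

A curve that flattens as the sample size grows suggests that additional data would reveal few new genes.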

Make Seurat-compatible input

Generate output files that are compatible with the Seurat package for downstream analysis.

pigeon make-seurat --dedup <dedup.fasta> --group <collapse.group.txt> -d <output_dir> <classification.filtered_lite_classification.txt>

The dedup.fasta file is obtained after running isoseq3 groupdedup or isoseq3 dedup. The collapse.group.txt file is obtained after running isoseq3 collapse.

The output will consist of:

<output_dir>/annotated.info.csv
<output_dir>/info.csv
<output_dir>/genes_seurat/barcodes.tsv
<output_dir>/genes_seurat/genes.tsv
<output_dir>/genes_seurat/matrix.mtx
<output_dir>/isoforms_seurat/barcodes.tsv
<output_dir>/isoforms_seurat/genes.tsv
<output_dir>/isoforms_seurat/matrix.mtx
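
Because make-seurat depends on files from several earlier steps, it can help to confirm they exist before running it. The following is a sketch of such a pre-flight check; `make_seurat_checked` and the `PIGEON` override variable are hypothetical names, and the hints simply restate which step produces each input.

```shell
# make_seurat_checked DEDUP_FASTA COLLAPSE_GROUP_TXT OUTPUT_DIR CLASSIFICATION_TXT
make_seurat_checked() (
    [ $# -eq 4 ] || {
        echo "usage: make_seurat_checked DEDUP_FASTA COLLAPSE_GROUP_TXT OUTPUT_DIR CLASSIFICATION_TXT" >&2
        exit 2
    }
    dedup=$1 group=$2 out_dir=$3 classification=$4

    # Report every missing input at once, with the step that produces it.
    rc=0
    [ -f "$dedup" ] || { echo "missing $dedup (from isoseq3 groupdedup or isoseq3 dedup)" >&2; rc=1; }
    [ -f "$group" ] || { echo "missing $group (from isoseq3 collapse)" >&2; rc=1; }
    [ -f "$classification" ] || { echo "missing $classification (from pigeon filter)" >&2; rc=1; }
    [ "$rc" -eq 0 ] || exit "$rc"

    ${PIGEON:-pigeon} make-seurat --dedup "$dedup" --group "$group" -d "$out_dir" "$classification"
)
```

If any input is missing, the function lists all of them and returns without invoking pigeon.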
