Link Search Menu Expand Document

CLI Workflow

The low-level workflow explained via CLI calls. All necessary dependencies are installed via bioconda.

Step 1 - Input

CLR data from Sequel / Sequel II / Sequel IIe

For each SMRT cell a movieX.subreads.bam is needed for processing.

Each sequencing run is processed by ccs to generate one representative circular consensus sequence (CCS) for each ZMW. It is advised to use the latest CCS version 4.2.0 or newer. ccs can be installed with conda install pbccs.

$ ccs movieX.subreads.bam movieX.ccs.bam --min-rq 0.9

You can easily parallelize ccs generation by chunking, please follow this how-to.

CCS data from Sequel IIe

If on-instrument CCS was performed, you can use the reads.bam or hifi_reads.bam as input.

The hifi_reads.bam contains only HiFi reads, with predicted accuracy ≥Q20. No additional filtering is required.

The reads.bam contains one representative sequence per productive ZMW, irrespective of quality and passes. Do not forget to use isoseq3 refine --min-rq 0.9 in step 3!

Step 2 - Primer removal and demultiplexing

Removal of primers and identification of barcodes is performed using lima, which can be installed with
conda install lima and offers a specialized --isoseq mode. Even in the case that your sample is not barcoded, primer removal is performed by lima. If there are more than two sequences in your primer.fasta file or better said more than one pair of 5’ and 3’ primers, please use lima with --peek-guess to remove spurious false positive signal. More information about how to name input primer(+barcode) sequences in this lima Iso-Seq FAQ.

$ lima movieX.ccs.bam barcoded_primers.fasta movieX.fl.bam --isoseq --peek-guess

Example 1: Following is the primer.fasta for the Clontech SMARTer and NEB cDNA library prep, which are the officially recommended protocols:

>NEB_5p
GCAATGAAGTCGCAGGGTTGGG
>Clontech_5p
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>NEB_Clontech_3p
GTACTCTGCGTTGATACCACTGCTT

Example 2: Following are examples for barcoded primers using a 16bp barcode followed by Clontech primer:

>primer_5p
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>brain_3p
CGCACTCTGATATGTGGTACTCTGCGTTGATACCACTGCTT
>liver_3p
CTCACAGTCTGTGTGTGTACTCTGCGTTGATACCACTGCTT

Lima will remove unwanted combinations and orient sequences to 5’ → 3’ orientation.

Output files will be called according to their primer pair. Example for single sample libraries:

movieX.fl.NEB_5p--NEB_Clontech_3p.bam

If your library contains multiple samples, execute the following workflow for each primer pair:

movieX.fl.primer_5p--brain_3p.bam
movieX.fl.primer_5p--liver_3p.bam

Step 3 - Refine

Your data now contains full-length reads, but still needs to be refined by:

Input
The input file for refine is one demultiplexed CCS file with full-length reads and the primer fasta file:

  • <movie.primer--pair>.fl.bam or <movie.primer--pair>.fl.consensusreadset.xml
  • primers.fasta

Output
The following output files of refine contain full-length non-concatemer reads:

  • <movie>.flnc.bam
  • <movie>.flnc.transcriptset.xml

Actual command to refine:

$ isoseq refine movieX.NEB_5p--NEB_Clontech_3p.fl.bam primers.fasta movieX.flnc.bam

If your sample has poly(A) tails, use --require-polya. This filters for FL reads that have a poly(A) tail with at least 20 base pairs (--min-polya-length) and removes identified tail:

$ isoseq refine movieX.NEB_5p--NEB_Clontech_3p.fl.bam movieX.flnc.bam --require-polya

Attention!
If your workflow input is reads.bam, use --min-rq 0.9

Step 3b - Merge SMRT Cells

If you used more than one SMRT cells, list all of your <movie>.flnc.bam in one flnc.fofn, a file of filenames:

$ ls movie*.flnc.bam movie*.flnc.bam movie*.flnc.bam > flnc.fofn

Step 4 - Clustering

Compared to previous IsoSeq approaches, IsoSeq v3 performs a single clustering technique. Due to the nature of the algorithm, it can’t be efficiently parallelized. It is advised to give this step as many coresas possible. The individual steps of cluster are as following:

  • Clustering using hierarchical n*log(n) alignment and iterative cluster merging
  • Polished POA sequence generation, using a QV guided consensus approach

Input The input file for cluster is one FLNC file:

  • <movie>.flnc.bam or flnc.fofn

Output The following output files of cluster contain polished isoforms:

  • <prefix>.bam
  • <prefix>.hq.fasta.gz with predicted accuracy ≥ 0.99
  • <prefix>.lq.fasta.gz with predicted accuracy < 0.99
  • <prefix>.bam.pbi
  • <prefix>.transcriptset.xml

Example invocation:

$ isoseq cluster flnc.fofn clustered.bam --verbose --use-qvs

THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.