Real-world example

Single sample

This example walks through an end-to-end, command-line-only workflow that goes from HiFi reads to transcripts. The input is a 1% subsample of an Alzheimer dataset, with HiFi reads generated by CCS v4.2:

# Download the pre-computed HiFi reads
$ wget https://downloads.pacbcloud.com/public/dataset/IsoSeq_sandbox/2020_Alzheimer8M_subset/alz.1perc.ccs.bam

$ cat primers.fasta
>NEB_5p
GCAATGAAGTCGCAGGGTTGGGG
>Clontech_5p
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>NEB_Clontech_3p
GTACTCTGCGTTGATACCACTGCTT
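
If `primers.fasta` does not exist yet, it can be created from the listing above. The following is a minimal sketch, assuming a POSIX shell with `grep` available; the record and character checks are illustrative additions and not part of lima or isoseq.

```shell
# Write the three primer sequences shown above to primers.fasta.
cat > primers.fasta <<'EOF'
>NEB_5p
GCAATGAAGTCGCAGGGTTGGGG
>Clontech_5p
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>NEB_Clontech_3p
GTACTCTGCGTTGATACCACTGCTT
EOF

# Count FASTA records (header lines start with '>').
n_records=$(grep -c '^>' primers.fasta)
echo "primer records: ${n_records}"

# Count sequence lines that contain anything other than A, C, G, or T.
# grep -c exits non-zero when nothing matches, hence the '|| true'.
n_bad=$(grep -v '^>' primers.fasta | grep -c '[^ACGT]' || true)
echo "sequence lines with non-ACGT characters: ${n_bad}"
```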

$ lima --version
lima 1.11.0 (commit v1.11.0)

# Primer removal
$ lima alz.1perc.ccs.bam primers.fasta alz.fl.bam --isoseq --peek-guess

$ ls alz.fl*
alz.fl.json         alz.fl.lima.summary
alz.fl.lima.clips   alz.fl.NEB_5p--NEB_Clontech_3p.bam
alz.fl.lima.counts  alz.fl.NEB_5p--NEB_Clontech_3p.bam.pbi
alz.fl.lima.guess   alz.fl.NEB_5p--NEB_Clontech_3p.subreadset.xml
alz.fl.lima.report

# Refine to full-length non-concatemer (FLNC) reads
$ isoseq refine alz.fl.NEB_5p--NEB_Clontech_3p.bam primers.fasta alz.flnc.bam

$ ls alz.flnc.*
alz.flnc.bam                   alz.flnc.filter_summary.json
alz.flnc.bam.pbi               alz.flnc.report.csv
alz.flnc.consensusreadset.xml

# Cluster FLNC reads into transcripts
$ isoseq cluster alz.flnc.bam clustered.bam --verbose --use-qvs
Read BAM                 : (37648) 1s 235ms
Convert to reads         : 589ms 797us
Sort Reads               : 8ms 409us
Aligning Linear          : 23s 63ms
Read to clusters         : 861ms 287us
Aligning Linear          : 20s 279ms
Merge by mapping         : 7s 242ms
Consensus                : 4s 663ms
Merge by mapping         : 980ms 742us
Consensus                : 103ms 913us
Write output             : 1s 799ms

$ ls clustered*
clustered.bam                 clustered.hq.fasta.gz
clustered.bam.pbi             clustered.lq.bam
clustered.cluster             clustered.lq.bam.pbi
clustered.cluster_report.csv  clustered.lq.fasta.gz
clustered.hq.bam              clustered.transcriptset.xml
clustered.hq.bam.pbi
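
A quick way to see how many sequences ended up in the gzipped FASTA outputs is to count header lines. This is a minimal sketch, assuming `gzip` and `grep` are installed; `count_fasta_gz` is a hypothetical helper name, not an isoseq command.

```shell
# count_fasta_gz is a hypothetical helper, not part of isoseq.
# It prints the number of records (lines starting with '>') in a gzipped FASTA.
count_fasta_gz() {
    gzip -cd "$1" | grep -c '^>'
}

# Report counts only for output files present in the current directory.
for f in clustered.hq.fasta.gz clustered.lq.fasta.gz; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fasta_gz "$f") records"
    fi
done
```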

Multiplexed samples

# Download HiFi reads
$ wget https://downloads.pacbcloud.com/public/dataset/IsoSeq_sandbox/2020_MultiplexIsoSeq_toy/m54363_190223_194117.ccs.bam

# Download barcoded primers
$ wget https://downloads.pacbcloud.com/public/dataset/IsoSeq_sandbox/2020_MultiplexIsoSeq_toy/NEB_barcode16.fasta

# Demux and primer removal
$ lima m54363_190223_194117.ccs.bam NEB_barcode16.fasta fl.bam --isoseq --peek-guess

# Combine inputs
$ ls fl.bc1001_5p--bc1001_3p.bam fl.bc1002_5p--bc1002_3p.bam > all.fofn
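
The resulting `all.fofn` is a plain-text file with one BAM path per line. Before handing it to the next step, it can help to confirm that every listed file exists and is non-empty. The following is a minimal sketch, assuming a POSIX shell; `check_fofn` is a hypothetical helper and not part of lima or isoseq.

```shell
# check_fofn is a hypothetical helper, not part of lima or isoseq.
# It prints the number of listed paths that are missing or empty,
# and names each offending path on stderr.
check_fofn() {
    fofn="$1"
    missing=0
    while IFS= read -r path; do
        # Skip blank lines.
        [ -z "$path" ] && continue
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$fofn"
    echo "$missing"
}

# Run the check only if the file list from the step above is present.
if [ -f all.fofn ]; then
    echo "problem entries in all.fofn: $(check_fofn all.fofn)"
fi
```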

# Remove poly(A) tails and concatemers
$ isoseq refine all.fofn NEB_barcode16.fasta flnc.bam --require-polya

# Cluster FLNC reads into transcripts
$ isoseq cluster flnc.bam clustered.bam --use-qvs --verbose
