Real-world example
Single sample
This is an example of an end-to-end, command-line-only workflow that goes from HiFi reads to transcripts. The input is an Alzheimer's dataset subsampled to 1%. You can download the HiFi reads, which were generated with CCS v4.2:
# Download the pre-computed HiFi reads
$ wget https://downloads.pacbcloud.com/public/dataset/IsoSeq_sandbox/2020_Alzheimer8M_subset/alz.1perc.ccs.bam
$ cat primers.fasta
>NEB_5p
GCAATGAAGTCGCAGGGTTGGGG
>Clontech_5p
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>NEB_Clontech_3p
GTACTCTGCGTTGATACCACTGCTT
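The listing above shows the contents of primers.fasta but not how the file was created. One way to do it, as a minimal sketch, is a here-document. The sequences are copied from the listing above, and the final `grep` is only a sanity check that all three primer records were written.

```shell
# Write the three primer records shown above into primers.fasta.
cat > primers.fasta <<'EOF'
>NEB_5p
GCAATGAAGTCGCAGGGTTGGGG
>Clontech_5p
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>NEB_Clontech_3p
GTACTCTGCGTTGATACCACTGCTT
EOF

# Sanity check: count the FASTA header lines (one per primer); prints 3.
grep -c '^>' primers.fasta
```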
$ lima --version
lima 1.11.0 (commit v1.11.0)
# Primer removal and demultiplexing
$ lima alz.1perc.ccs.bam primers.fasta alz.fl.bam --isoseq --peek-guess
$ ls alz.fl*
alz.fl.json alz.fl.lima.summary
alz.fl.lima.clips alz.fl.NEB_5p--NEB_Clontech_3p.bam
alz.fl.lima.counts alz.fl.NEB_5p--NEB_Clontech_3p.bam.pbi
alz.fl.lima.guess alz.fl.NEB_5p--NEB_Clontech_3p.subreadset.xml
alz.fl.lima.report
# Refine full-length reads into full-length non-concatemer (FLNC) reads
$ isoseq refine alz.fl.NEB_5p--NEB_Clontech_3p.bam primers.fasta alz.flnc.bam
$ ls alz.flnc.*
alz.flnc.bam alz.flnc.filter_summary.json
alz.flnc.bam.pbi alz.flnc.report.csv
alz.flnc.consensusreadset.xml
# Cluster FLNC reads into transcripts
$ isoseq cluster alz.flnc.bam clustered.bam --verbose --use-qvs
Read BAM : (37648) 1s 235ms
Convert to reads : 589ms 797us
Sort Reads : 8ms 409us
Aligning Linear : 23s 63ms
Read to clusters : 861ms 287us
Aligning Linear : 20s 279ms
Merge by mapping : 7s 242ms
Consensus : 4s 663ms
Merge by mapping : 980ms 742us
Consensus : 103ms 913us
Write output : 1s 799ms
$ ls clustered*
clustered.bam clustered.hq.fasta.gz
clustered.bam.pbi clustered.lq.bam
clustered.cluster clustered.lq.bam.pbi
clustered.cluster_report.csv clustered.lq.fasta.gz
clustered.hq.bam clustered.transcriptset.xml
clustered.hq.bam.pbi
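A quick way to see how many transcripts the cluster step produced is to count the records in the gzipped FASTA outputs listed above. The sketch below assumes only standard `gzip` and `grep`; `count_fasta_records` is a hypothetical helper name, not part of isoseq.

```shell
# Hypothetical helper (not part of isoseq): count the records in a
# gzipped FASTA file by counting its header lines.
count_fasta_records() {
    gzip -dc "$1" | grep -c '^>'
}

# Report the high-quality and low-quality transcript counts, skipping
# any file that is not present in the current directory.
for f in clustered.hq.fasta.gz clustered.lq.fasta.gz; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_fasta_records "$f")"
    fi
done
```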
Multiplexed samples
# Download HiFi reads
$ wget https://downloads.pacbcloud.com/public/dataset/IsoSeq_sandbox/2020_MultiplexIsoSeq_toy/m54363_190223_194117.ccs.bam
# Download barcoded primers
$ wget https://downloads.pacbcloud.com/public/dataset/IsoSeq_sandbox/2020_MultiplexIsoSeq_toy/NEB_barcode16.fasta
# Demux and primer removal
$ lima m54363_190223_194117.ccs.bam NEB_barcode16.fasta fl.bam --isoseq --peek-guess
# Combine inputs
$ ls fl.bc1001_5p--bc1001_3p.bam fl.bc1002_5p--bc1002_3p.bam > all.fofn
# Remove poly(A) tails and concatemers
$ isoseq refine all.fofn NEB_barcode16.fasta flnc.bam --require-polya
$ isoseq cluster flnc.bam clustered.bam --use-qvs --verbose
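The "Combine inputs" step above names the two barcode BAM files explicitly. With more barcodes, the file list could instead be built from a glob. This is a sketch only: `make_fofn` is a hypothetical helper, and the pattern assumes the `<prefix>.<barcode>_5p--<barcode>_3p.bam` naming seen in the lima output above. The glob does not check that the 5' and 3' barcodes match, so review the resulting list before refining.

```shell
# Hypothetical helper: write every BAM named <prefix>.*_5p--*_3p.bam
# into a file of file names (FOFN), one path per line.
# Returns non-zero if no matching BAM was found.
make_fofn() {
    prefix=$1
    out=$2
    : > "$out"                      # start from an empty list
    for bam in "$prefix".*_5p--*_3p.bam; do
        [ -e "$bam" ] || continue   # skip the literal pattern when nothing matches
        printf '%s\n' "$bam" >> "$out"
    done
    [ -s "$out" ]                   # succeed only if at least one path was written
}

# Usage, assuming lima was run with the output name fl.bam as above:
#   make_fofn fl all.fofn
#   isoseq refine all.fofn NEB_barcode16.fasta flnc.bam --require-polya
```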