Preface

This is a notebook that will go through some examples of basecalling, alignment, methylation and assembly for nanopore sequencing reads.

We will go through this tutorial during the session, but please feel free to go through the tutorial at your own pace.

Getting started

The virtual box is provided by Vagrant. It’s a modification of Ubuntu 16 with miniconda3, R and docker installed along with some non-conda installations such as albacore and nanopolish. We will use docker to use any other apps. The enironment has been configured so that sudo is not required when using docker.

Console

  • Windows
  • Git Bash
  • MacOS/Ubuntu
  • Terminal

VM Installation

After installing vagrant, use one of the following options and subsequent commands to load the virtual machine.

Option 1:Downloading the VM - preferred

The VM has been uploaded to the Vagrant cloud, we can download it using the following commands.

# Create directory
mkdir -p ~/Vagrant/nanopore_nz
cd ~/Vagrant/nanopore_nz

# Initilise vagrant configuration file
vagrant init alexiswl/nanoporeVM
# If you get the error 'vagrant command not found', try logging out and back in again.

# Download the  vagrant box
vagrant up
# This will redirect to cloudstor. The file is large please be patient.
# SSH into the vagrant box.
vagrant ssh

Option 2: Importing the VM via USB.

In the somewhat likely event that the Wifi router cannot handle 20 users each downloading a 4 Gb file at once, three USB sticks have been loaded with the VM which can be added manually.

# Create the VM directory
mkdir -p ~/Vagrant/nanopore_dunedin/  

# Change into the VM directory
cd ~/Vagrant/nanopore_dunedin/

# Move across the Vagrant box from the USB onto your computer.
# You can use your file explorer to drop and drag into your preferred location

# Add the vagrant box
vagrant box add alexiswl/nanoporeVM /localpath/to/vagrant-box.box

# Generate the relevant Vagrant file
vagrant init alexiswl/nanoporeVM
# If you cannot find the vagrant command, try logging out and back in again*
# SSH into the vagrant box.
vagrant ssh

# You should now be user vagrant@xenial_ont
# xenial is the 'codename' for Ubuntu 16.

Jupyter

For those that are fans of having a documented version of their analysis, Jupyter is installed on the VM, however you will need to configure the vagrant file accordingly. See this link for more details.

The data directory.

The vagrant machine has a directory called /vagrant.
This is bound with the directory ~/Vagrant/nanopore_nz on the host machine. Any datasets we download today will be in this directory.

A note on owncloud. If a directory is downloaded from owncloud, it will be downloaded automatically as a zip file.

Basecalling

Download the DCM E cosli data set. The data was run on the standard ligation 1D kit with R 9.4 chemistry. The DNA has been PCR amplified.

The commands to do are here.

# Create the directory /vagrant/data/ecoli
mkdir -p /vagrant/data/ecoli_dcm_dam
# Write the zip file to /tmp
TEMP_LAMBDA=`mktemp /tmp/ecoli.XXXX.zip`
curl -O ${TEMP_LAMBDA} https://cloudstor.aarnet.edu.au/plus/s/YF69Ppmh5xTg3LQ/download
# Unzip download and move to /vagrant/data/lambda
unzip ${TEMP_LAMBDA}
mv ${TEMP_LAMBDA%.zip} /vagrant/data/ecoli

We will now create a basecalling bash script in /vagrant/data/lambda/basecalling_script If you are not comfortable in vim / nano or another terminal please open up Atom or Notepad++ and create the file in the appropriate folder. If you are on a Windows machine, ensure that the carriage return is off - CRLF to LF (Bottom left hand corner of the IDE).

Click here to download the basecalling shell script template.

Click here to view a solution to the basecalling script.

The log files

Plotting the log files in R. The read_fast5_basecaller places a tidy dataframe in the output.
We can use this to:
1. Plot the yield data over time.
2. Plot the ratio of pass and failed reads.
3. Generate a histogram of the read lengths generated.
Bonus: 4. Use the average quality column to create a plot of the average quality over the read-length.

Click here to view the plotting in R template.

Click here to view a solution to the R plots.

Nanoplot QC.

If learning the tidyverse was not your cup of tea. We get nanoplot to do this for us. Nanoplot is a simple tool to grab a few metrics post-basecalling.

Click here to view the nanoplot template

Click here to view a solution to the nanoplot command

Data trimming

Trimming the read data with porechop. Extracting the logs.

Alignment.

There are a few options for alignment, but the most popular and current aligner for nanopore sequencing is minimap2. minimap2 was created by the author of bwa-mem and samtools.

Analysing the alignment files.

Download this python script into your vagrant directory. Using the arguments x and y, we’re going to run this on our bam files to generate two tab delimited text files for each bam file. .error.tsv and .supp.tsv.

The error.tsv file has the following columns: + QueryName - the fastq identifier + EditDistance. + AlignmentLength.

The supp.tsv file has the following columns, where P stands for Primary and S - supplementary: + QueryName + P_ReferenceStart + P_ReferenceEnd + P_IsReverse + P_AlignmentLength + P_AlignmentStart + P_AlignmentEnd + P_MappingQuality + S_ReferenceStart + S_ReferenceEnd + S_IsReverse + S_AlignmentLength + S_AlignmentStart + S_AlignmentEnd + S_Mapping Quality

Let’s head back to the tidyverse and analyse the alignment files.

Click here to view the alignment template

Click here to view a solution to the alignment plots.

Assembly.

Download the plasmid dataset from here < ## We’ll run this through miniasm and racon.