Module: Mycobacterium tuberculosis NGS made easy: data analysis step-by-step

Webinars and hands-on tutorials on several aspects related to NGS in Tuberculosis

Introduction

There were unusually high rates of TB cases in your country this year. To characterize the underlying bacterial factors driving the epidemic, isolates have been sent for whole-genome sequencing. Doctors and public health authorities have requested information to guide their decisions. This course demonstrates how you can use NGS to answer several questions relevant to patient and public health system management, such as:

Are there cases of drug resistant bacteria?
Is there transmission of drug resistance?
Is there evidence of de novo emergence of resistance?
Are there multiple infections per patient?
Do we have on-going transmission?

We hope that by the end of the training sessions you will be able to answer these questions on your own!

Part 1 - Overview of NGS technologies & TB specific NGS solutions

Webinar: Overview of NGS technologies & TB specific NGS solutions

This webinar will introduce different sequencing technologies and which is best suited to which kind of problem.

Title:	Webinar: Overview of NGS technologies & TB specific NGS solutions
Description:	This webinar will introduce different sequencing technologies and which is best suited to which kind of problem.
Length:	1h10m
Video	link
Author(s):	Andrea Cabibbe
Feedback	link

Speaker

Andrea Cabibbe

IRCCS Ospedale San Raffaele

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Webinar: Implementation of NGS for TB- WHO documents and other considerations

This webinar will summarize the recommendations and considerations available from the WHO documents on the use of NGS for TB

Title:	Webinar: Implementation of NGS for TB- WHO documents and other considerations
Description:	This webinar will summarize the recommendations and considerations available from the WHO documents on the use of NGS for TB
Length:	1h
Video	link
Author(s):	Andrea Cabibbe
Feedback	link

Speaker

Andrea Cabibbe

IRCCS Ospedale San Raffaele

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

FAQs

For a list of frequently asked questions, please refer to this document

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Part 2 - Mapping and Variant calling of short MTBC reads

The 20 strains isolated in your country have been sequenced with Illumina technology to obtain whole-genome sequences. In this part of the workshop you will learn how to analyse those sequences.

In a typical bioinformatics pipeline you would store your sequences on a computer server where all the necessary software is installed. This would usually be a server running the Linux operating system, which is the most efficient way to run bioinformatics pipelines (more on this in Part 5). You will be running your analysis on a Linux server through Galaxy, but instead of typing commands directly on the server, you will execute operations through the Galaxy graphical interface. This gives you access to a Linux server and lets you run workflows without knowing Linux. Importantly, for training purposes it also lets you focus on understanding what is being done in each step without having to understand the programming behind it. That said, working directly on a Linux cluster always gives you more flexibility, but if you don’t have access to one, Galaxy is a very good alternative for data analysis.

You will need to understand how to use Galaxy to run all the hands-on tutorials, so it is highly recommended that you follow the next webinar and hands-on session on Galaxy. The good thing is that once you know how it works, you can use it to run your own analyses with your own data.

Session 1: Learning Galaxy

A Very Short Introduction to Galaxy

Lecture & Demo Video

Description:	This video will introduce the Galaxy data analysis platform, and give a short demo on how to use it.
Length:	10 minutes
Captions:	Anton Nekrutenko
Created:	15 February 2021
Materials:	Related Slides: Version in Video | Latest Version
Support:	Slack: #galaxy-intro Slack Invite Gitter Galaxy Help Forum

Go to the video page to see all versions.

Speaker

Anton Nekrutenko

Penn State University

Galaxy 101

Tutorial Video

Description:	This tutorial will introduce you to Galaxy. You will familiarize yourself with tools, workflows and histories. You will need these skills in the sessions that follow.
Length:	13 minutes
Captions:	Saskia Hiltemann
Created:	21 July 2021
Materials:	Tutorial: Version in Video | Latest Version FAQs History on GalaxyEU
Support:	Slack: #galaxy-intro Slack Invite Gitter Galaxy Help Forum
Supported Servers:	Known Working: GalaxyTrakr, UseGalaxy.be, UseGalaxy.eu ⭐️, UseGalaxy.fr, UseGalaxy.no. Possibly Working: Galaxy@AuBi
Learning Objectives:	Familiarize yourself with the basics of Galaxy; Learn how to obtain data from external sources; Learn how to run tools; Learn how histories work; Learn how to create a workflow; Learn how to share your work

Go to the video page to see all versions.

Speaker

Anton Nekrutenko

Penn State University

Session 2: Mapping and Variant calling of short MTBC reads - hands-on

Once you have received your sequences, the first step is to assess the quality of the sequencing. Once we are sure that the sequencing worked well, we typically compare our sequencing results to a reference genome (a re-sequencing approach) using a bioinformatics procedure usually called mapping. Next, we identify the genomic variants in our sequences with respect to the reference genome, a procedure called variant calling. Once we are confident in the variants we have identified, we are usually interested in determining which genes and pathways they belong to or, for instance, whether they are likely to disrupt protein function. This procedure is called annotation. Once we have gone through each of these steps, we are ready to analyse drug resistance patterns, draw phylogenetic relationships or identify transmission clusters of M. tuberculosis.
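
As a rough illustration of the variant calling step described above, the following Python sketch compares per-position base counts (as would be available after mapping reads) against a reference and reports positions where a non-reference base is frequent enough. Everything in it is a toy assumption: the reference string, the counts, the function name `call_variants` and the thresholds `min_depth` and `min_freq` are hypothetical and are not recommended values. Real analyses rely on dedicated mapping and variant calling tools, such as those used in the hands-on tutorial.

```python
from collections import Counter


def call_variants(reference, pileup, min_depth=10, min_freq=0.9):
    """Return (position, ref_base, alt_base, frequency) for each called variant.

    reference: reference sequence as a string
    pileup: list with one Counter of observed bases per reference position
    min_depth, min_freq: illustrative thresholds, not recommended values
    """
    variants = []
    for pos, (ref_base, counts) in enumerate(zip(reference, pileup), start=1):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a confident call
        # Most frequent base that differs from the reference, if any.
        alt_base, alt_count = max(
            ((base, n) for base, n in counts.items() if base != ref_base),
            key=lambda item: item[1],
            default=(None, 0),
        )
        freq = alt_count / depth
        if alt_base is not None and freq >= min_freq:
            variants.append((pos, ref_base, alt_base, freq))
    return variants


# Hypothetical toy data: a 4-base reference and the bases seen at each position.
reference = "ACGT"
pileup = [
    Counter({"A": 30}),           # matches the reference
    Counter({"T": 29, "C": 1}),   # almost all reads support T
    Counter({"G": 20, "A": 10}),  # mixed position, below min_freq
    Counter({"C": 3}),            # too shallow to call
]
print(call_variants(reference, pileup))
```

With the default thresholds only position 2 is reported. Lowering `min_freq` would also report the mixed position 3, which is one simple way to think about the distinction between fixed and variable SNPs discussed in the webinar.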

You are now ready to perform bioinformatic analysis in Galaxy. Before we start, we would like you to watch a short video on how Illumina sequencing works. Following that video, we have prepared a webinar on mapping and variant calling of Illumina reads applied to MTBC. After watching it you will hopefully know, among other things, how a reference genome is chosen, why we typically ignore some regions of MTBC genomes, and what the difference between a fixed and a variable SNP is and why we care about it.

Illumina sequencing principle video

Title:	Illumina sequencing principle video
Length:	5m
Video	link

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Webinar: Mapping and Variant calling

Main bioinformatics steps involved in mapping and variant calling from Illumina short reads applied to MTBC

Title:	Webinar: Mapping and Variant calling
Description:	Main bioinformatics steps involved in mapping and variant calling from Illumina short reads applied to MTBC
Length:	45m
Video	link
Author(s):	Daniela Brites
Feedback	link

Speaker

Daniela Brites

Swiss Tropical and Public Health Institute

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

FAQs

For a list of frequently asked questions, please refer to this document

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Hands-on: M. tuberculosis Variant Analysis

M. tuberculosis Variant Analysis


This is a self-study session. Please work through the materials on your own, and ask the instructors for help if you get stuck or have any questions!

Tutorial:	M. tuberculosis Variant Analysis
Description:	Variation in the genome of M. tuberculosis (Mtb) is associated with changes in phenotype, for example drug resistance and virulence. It is also useful for outbreak investigation as the single nucleotide polymorphisms (SNPs) in a sample can be used to build a phylogeny.
Material:	GTN Tutorial
Support:	Slack: #variant-analysis_tb-variant-analysis Slack Invite Gitter Galaxy Help Forum

Go to the video page to see all versions.

Instructor

The Galaxy Training Network

Part 3 - Evolutionary epidemiology: using phylogenetics to understand DR emergence and Mtb transmission

We are now ready to analyse drug resistance patterns, draw phylogenetic relationships and identify recent transmission among the isolates sampled in our population. Before delving into the analysis of the genomes, we would like to share some notions that are important for inferring direct transmission and for interpreting drug resistance patterns.

Webinar: Drug resistance prediction

Principles of drug resistance detection from genomic data

Title:	Webinar: Drug resistance prediction
Description:	Principles of drug resistance detection from genomic data
Length:	20m
Video	link
Author(s):	Galo Adrián Goig
Feedback	link

Speaker

Galo Adrián Goig

Swiss Tropical and Public Health Institute

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Webinar: "Phylogenetic" mutations

This video will introduce a special type of mutation to take into account when studying drug resistance patterns

Title:	Webinar: "Phylogenetic" mutations
Description:	This video will introduce a special type of mutation to take into account when studying drug resistance patterns
Length:	15m
Video	link
Author(s):	Galo Adrián Goig
Feedback	link

Speaker

Galo Adrián Goig

Swiss Tropical and Public Health Institute

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Webinar: The concept of clustering

Main aspects of clustering analysis to infer transmission in MTBC

Title:	Webinar: The concept of clustering
Description:	Main aspects of clustering analysis to infer transmission in MTBC
Length:	15m
Video	link
Author(s):	Galo Adrián Goig

Speaker

Galo Adrián Goig

Swiss Tropical and Public Health Institute

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Webinar: Genetic distance thresholds

Clustering as an approximation to infer transmission

Title:	Webinar: Genetic distance thresholds
Description:	Clustering as an approximation to infer transmission
Length:	15m
Video	link
Author(s):	Galo Adrián Goig
Feedback	link

Speaker

Galo Adrián Goig

Swiss Tropical and Public Health Institute

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.
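
The idea of clustering isolates by a genetic distance threshold, introduced in the webinars above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the hands-on tutorial: the isolate names, the toy aligned sequences, the function names `snp_distance` and `cluster_isolates` and the threshold of 1 SNP are all hypothetical. Isolates are placed in the same cluster when they are connected by a chain of pairwise distances at or below the threshold.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions at which two aligned sequences of equal length differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def cluster_isolates(alignment, threshold):
    """Group isolates linked by chains of pairwise distances <= threshold."""
    parent = {name: name for name in alignment}

    def find(name):
        # Follow parent links up to the representative of the group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    clusters = {}
    for name in alignment:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy alignment of variable positions only.
alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",  # 1 SNP from isolate_1
    "isolate_3": "ACGTACGGAT",  # 1 SNP from isolate_2, 2 from isolate_1
    "isolate_4": "TTGTACCTGA",  # distant from all others
}
print(cluster_isolates(alignment, threshold=1))
```

Note that isolate_1 and isolate_3 differ by 2 SNPs, yet they end up in the same cluster because both are within the threshold of isolate_2. This chaining behaviour is a property of this particular linking rule and is worth keeping in mind when interpreting clusters as an approximation of transmission.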

Hands-on: Identifying tuberculosis transmission links: from SNPs to transmission clusters

Identifying tuberculosis transmission links: from SNPs to transmission clusters

This is a self-study session. Please work through the materials on your own, and ask the instructors for help if you get stuck or have any questions!

Tutorial:	Identifying tuberculosis transmission links: from SNPs to transmission clusters
Description:	Learning how to do clustering analysis and interpret drug resistance patterns
Material:	GTN Tutorial
Support:	Slack: #evolution_mtb_transmission Slack Invite Gitter Galaxy Help Forum

Go to the video page to see all versions.

Instructor

The Galaxy Training Network

Introduction to phylogenetics

Recommended for those wanting to learn more about phylogenetics

Title:	Introduction to phylogenetics
Description:	Recommended for those wanting to learn more about phylogenetics
Length:	1h
Tutorial:	link
Author(s):	EMBL-EBI

Speaker

EMBL-EBI

EMBL-EBI

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Hands-on: Tree thinking for tuberculosis evolution and epidemiology

Tree thinking for tuberculosis evolution and epidemiology

This is a self-study session. Please work through the materials on your own, and ask the instructors for help if you get stuck or have any questions!

Tutorial:	Tree thinking for tuberculosis evolution and epidemiology
Description:	Main principles of phylogenetic inference, tree interpretation
Material:	GTN Tutorial
Support:	Slack: #evolution_mtb_phylogeny Slack Invite Gitter Galaxy Help Forum

Go to the video page to see all versions.

Instructor

The Galaxy Training Network

FAQs

For a list of frequently asked questions, please refer to this document

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Part 4 - Webtools dedicated to MTBC bioinformatics

The use of whole-genome sequencing (WGS) for antibiotic resistance prediction and routine typing of bacterial isolates has increased substantially in recent years. To date, a multitude of solutions for analyzing WGS data of the Mycobacterium tuberculosis complex (MTBC) have been developed. In this fourth part, we introduce some freely available webtools and open source pipelines designed to analyze MTBC sequence data, and we provide some examples of how these tools work and how to interpret the results.

Webinar: Web tools for analysis of MTBC sequenced data

Introduction to the most common web tools for fast identification of bacterial species from raw sequencing reads

Title:	Webinar: Web tools for analysis of MTBC sequenced data
Description:	Introduction to the most common web tools for fast identification of bacterial species from raw sequencing reads
Length:	50m
Video	link
Author(s):	Arash Ghodousi
Feedback	link

Speaker

Arash Ghodousi

IRCCS Ospedale San Raffaele & Università Vita-Salute San Raffaele

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Webinar: Introduction to the MTBseq pipeline

Introduction to the MTBseq pipeline, an automated pipeline for mapping, variant calling and detection of resistance-mediating and phylogenetic variants from whole-genome sequence data of MTBC

Title:	Webinar: Introduction to the MTBseq pipeline
Description:	Introduction to the MTBseq pipeline, an automated pipeline for mapping, variant calling and detection of resistance-mediating and phylogenetic variants from whole-genome sequence data of MTBC
Length:	30m
Video	link
Author(s):	Arash Ghodousi

Speaker

Arash Ghodousi

IRCCS Ospedale San Raffaele & Università Vita-Salute San Raffaele

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

FAQs

For frequently asked questions, please see this document

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Part 5 - Be a bioinformatician in the jungle

In Parts 2 and 3 of this training you learned how to use Galaxy to analyse your own data. Establishing your own workflows in Galaxy allows you to combine different tools and build your own pipeline without having to know how to program. If you are less interested in having your own pipeline, webtools for WGS analysis can be very useful, as we have shown in Part 4.

However, in this last part of the training we would like to convey what it would take if you wanted to work on Linux via the command line. These webinars introduce the Linux operating system, how to perform basic tasks using the Unix shell, and how to install and run pipelines on the command line. You will learn the power of the Unix shell in performing complex tasks, often with just a few keystrokes or lines of code. In fact, the Unix shell helps users automate repetitive tasks and easily combine smaller tasks into larger, more powerful workflows (i.e. pipelines). Use of the shell is fundamental to a wide range of advanced computing tasks, including high-performance computing. Which approach to choose (Galaxy workflows, webtools or native Linux) depends on your needs, your interests and the computing resources you have available.

Webinar: Introduction to Linux

Introduction to Linux OS: installation and usage

Title:	Webinar: Introduction to Linux
Description:	Introduction to Linux OS: installation and usage
Length:	35m
Video	link
Author(s):	Andrea Spitaleri

Speaker

Andrea Spitaleri

IRCCS Ospedale San Raffaele & Università Vita-Salute San Raffaele

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Webinar: How to run programs (Python, Docker, Nextflow)

Learning how to install and use programs to analyze data

Title:	Webinar: How to run programs (Python, Docker, Nextflow)
Description:	Learning how to install and use programs to analyze data
Length:	35m
Video	link
Author(s):	Andrea Spitaleri

Speaker

Andrea Spitaleri

IRCCS Ospedale San Raffaele & Università Vita-Salute San Raffaele

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Webinar: Demo on how to run the Linux command line

Demo video on how to use the shell commands

Title:	Webinar: Demo on how to run the Linux command line
Description:	Demo video on how to use the shell commands
Length:	20m
Video	link
Author(s):	Andrea Spitaleri

Speaker

Andrea Spitaleri

IRCCS Ospedale San Raffaele & Università Vita-Salute San Raffaele

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.

Hands-on: The Unix Shell

Recommended tutorial from Software Carpentry for those wanting to learn Linux.

Title:	Hands-on: The Unix Shell
Description:	Recommended tutorial from Software Carpentry for those wanting to learn Linux.
Length:	4h
Tutorial:	link
Author(s):	The Carpentries

Speaker

The Carpentries

The Carpentries

Note: This is a training session outside of the GTN. Please contact the authors if you have questions.