What are we doing?
Scientific datasets are being generated at an explosive rate, driven by recent technological advances in genomic sequencing. As a result, the life sciences have become increasingly computational, and bioinformatics has taken on a central role in research studies. However, basic computational skills and data stewardship are rarely taught in life science educational programmes (Attwood et al., 2019), leaving a skills gap among many of the researchers tasked with analysing these large datasets. Acquiring a minimum level of computational skills helps researchers communicate and interact more effectively and improves critical thinking about research results (Tan et al., 2009; Welch et al., 2014).
Needs of the Community
The main challenge with this explosion of scientific datasets is neither the required storage space nor the computational resources, but rather the general lack of researchers trained to manipulate and analyse these data. The need for such training cannot be overstated: while the majority of researchers (>95%) work or plan to work with large datasets, most (>65%) possess only minimal bioinformatics skills and are not comfortable with statistical analyses (Larcombe et al., 2017; Williams and Teal, 2017). The resulting demand for training currently far exceeds supply (Attwood et al., 2017). In a recent survey from the European Molecular Biology Laboratory (EMBL), over 60% of biologists expressed the need for more training, while only 5% called for additional computing power.
The Gallantries project aims to increase bioinformatics and core data analysis skills in the field of life sciences. In order to provide these skills as early as possible, this project focuses on Master and PhD-level students. Bioinformatics is a rapidly evolving field, and the tools and concepts taught in degree-length education programmes become outdated quickly. Therefore, we will create a modular curriculum consisting of interrelated but independent modules covering the latest developments in the field. These modules can be integrated into existing Master and PhD-programmes, either combined or individually. They will also be suitable for stand-alone use in workshops for researchers at different stages of their careers who have already obtained their PhD. This answers the high demand for supplemental education of later-career researchers (Via et al., 2019).
To reach the project’s overall aim of increasing bioinformatics and core data analysis skills in the field of life sciences, the Gallantries project has four project-specific objectives:
- To develop a set of four training modules covering fundamental computational and data analysis skills and competencies based around real-world scientific data analysis applied through the field of genomics (IO1-IO4).
- To deliver these training modules (IO1-IO4) in workshops and lessons via live-streaming sessions to multiple geographically distant satellite classrooms across Europe, applying a hybrid training methodology to ensure scalability and reusability.
- To develop a train-the-trainer programme and build a community of instructors to ensure sustainability (IO5).
- To effectively engage stakeholders and disseminate project results to ensure uptake and sustainability.
Transnational collaboration and coordination are critical to the success of this project, as they allow us to make optimal use of the combined expertise of the EU-wide community of instructors for the development of the curricula. This project builds upon two existing international training initiatives. We will leverage these communities for dissemination, peer review of lessons, and long-term maintenance.
The target groups of the intellectual outputs are Master and PhD-students of the educational programmes of the partner and associated partner institutes, later-career researchers, and the teachers in these programmes. Furthermore, we will coordinate closely with the educational directors of the participating institutions in order to implement the Gallantries outputs into the Master and PhD-programmes directly involved. As the project progresses, we will use conference participation and other dissemination channels to expand the Gallantries communities.
This project aims to provide a set of core skills and competencies essential to meet the data analysis needs in this “-omics” era (Tan 2009; Welsch 2019; Attwood 2019). The students will gain a) basic essential knowledge in the specific domains of computer science and statistics that intersect with modern biology, b) expertise in communicating and representing biological knowledge and processes in statistical and computing terms and concepts, c) the ability to use bio-computational tools and techniques for the acquisition, interpretation, analysis, modelling, and visualisation of biological data, d) proficiency in the handling of biological data and information in databases for deriving biological insight and knowledge discovery, and e) critical thinking and problem-solving skills in quantitative aspects of biology.
Novel training modules (IO1-4)
We will develop four novel training modules, each centred around a set of bioinformatics skills and competencies as outlined above. The modules will teach not only abstract analysis skills, but also the practical application of these skills to different scientific domains in which genomics is at the core.
Introduction to data analysis and management, statistics, and coding.
Aimed at novices, this module will teach basic skills in data handling and analysis, statistics, and coding. These skills will be applied to basic genomics analyses. Genomics is the study of the genetic material of organisms (DNA, together with its RNA and protein products), and has a wide range of applications in domains such as biomedical research, environmental studies and agriculture. While the analyses involved in these different domains may vary considerably, they share a common core of data and file formats and of analysis steps.
Large-scale data analysis, and introduction to visualisation and data modelling.
Research studies often involve the analysis of hundreds to thousands of samples. First, this module will focus on scaling up analyses from a single sample to large cohorts. These skills will be applied to the biomedical domain, with a focus on microbiome analyses: the study of the genetic characteristics of communities of microorganisms and their involvement in human health and disease. Second, this module will cover scaling up analyses in terms of data complexity. This will be illustrated using cancer analyses: cancer is a disease of the genome, resulting in data that are highly complex to analyse and interpret.
Data stewardship, federation, standardisation, integration, and collaboration
Going beyond a single research study, generated data must be made interoperable so that results can be combined across studies. This involves standardisation of data formats, accessibility of data in federated databases, and collaboration. These competencies will be applied to the domain of genome annotation: the identification of genetic characteristics and their functions. This involves large groups of scientists integrating data from multiple (often heterogeneous) data sources.
Data analysis for evidence and hypothesis generation and knowledge discovery.
Gaining novel scientific insights from these research analyses requires visualisation, modelling, and statistical analysis. This module will focus on the skills required for knowledge acquisition from large-scale genetic studies. This will be applied to biodiversity: the interplay of different organisms within an entire ecosystem, based on the genomic characteristics of each organism.
Together, these four modules cover genomics as it is applied to a wide range of life science domains, from biomedical applications to the exploration of the world around us (e.g. agriculture and ecology).
Train-the-Trainer (TtT) and mentoring programme (IO5) and Community Building.
In order to ensure the long-term sustainability of this project, we will develop a TtT programme and build a community of (new) instructors. Instructors will learn how to deliver the modules using the hybrid training approach. They will also learn how to update existing materials and create novel modules reflecting the changes in the rapidly evolving field of genomics data analysis and bioinformatics. The TtT programme will consist of: a) a TtT module, delivered as a 2-3 day workshop aimed at instructors, b) a user-friendly central framework for lesson development and organisation, c) an instructor handbook consisting of a set of guidelines describing how to prepare for, deliver, and follow up on such hybrid trainings, and d) a system of communication channels, online events, and mentorships for instructors, in order to foster a sense of community.
Novel approach to curriculum
The partners will design and implement a training structure that combines the evidence-based teaching practices promoted by the Carpentries with the technology and e-infrastructure offered by Galaxy. This training structure will offer the fundamental skills of data handling, data manipulation, and basic coding with the widely used programming languages R and Python. These fundamental skills will be taught within scientific domains (genome annotation, biodiversity and microbiome analysis), in order to directly showcase the practical application of the new skills.
Novel approach to infrastructure
The partners will apply innovative approaches to e-infrastructure for training. A key issue in traditional training activities involving computational analysis is the heterogeneity of the available computers and software dependencies. The Gallantries will rely on the Galaxy platform, which removes these technical restrictions, as participants only need a web browser. Additionally, Galaxy already embeds frameworks often employed in coding training, e.g. RStudio and Jupyter. This enables the adaptation of existing materials based on these frameworks, and gives this project the opportunity to test and further adapt them in order to ensure scalability for use in hybrid workshops.
Novel approach to lesson delivery
The partners will refine the hybrid training delivery method previously piloted in the context of the Mozilla Science mini-grant. This method connects multiple satellite locations through online training delivery. It enhances the inclusiveness of higher education and reduces the overall carbon footprint by minimising or fully eliminating travel. Finally, by connecting multiple satellite locations, a single workshop reaches a larger audience and therefore has a significantly higher impact.
Open science and education principles
The partners commit to the Open Science principles. As with the Carpentries and the Galaxy Training Network (GTN), all outputs will be openly licensed and available online for self-study. Additionally, all lessons will be developed collaboratively, inviting the wider scientific community to offer contributions, comments, and general thoughts throughout the process.
Scalability to other scientific disciplines
Given the Gallantries' modular approach and infrastructure, the training can also be transferred to other scientific disciplines. The modules will cover highly transferable skills and competencies required within Life Sciences and Bioinformatics, such as data management, illustrated through application to specific domains.
We have high ethical standards, including:
- Education: Educate researchers about high-throughput sequencing (HTS) data analysis, reproducibility, and open science
- Transparency: Emphasise transparency and the sharing of resources, materials, knowledge and experiences
- Open science: Promote citizen science and decentralized access to science
- Modesty: Know you don’t know everything
- Community: Carefully listen to any concerns and questions and respond honestly
- Respect: Respect humans and all living systems
- Responsibility: Recognize the complexity and dynamics of life science and research and our responsibility towards them
You are very welcome to join the community: come and chat with us on Gitter.
Please note that it’s very important to us that we maintain a positive and supportive environment for everyone who wants to participate. When you join us, we ask that you follow our code of conduct in all interactions, both online and offline.