Research Module (M.Sc.), Master's Thesis or Bachelor's Thesis: Development of software components for a database of microstructural mutations

Status: open
Supervisors: Ben Stöver, Kai Müller
Accepting institution: Evolution and Biodiversity of Plants
Institute for Evolution and Biodiversity
WWU Münster
Hüfferstraße 1
48149 Münster



Homology assessment is essential for multiple sequence alignment (MSA) for phylogenetic purposes, but available MSA algorithms tend to align by sequence similarity or shared structure rather than by homology (Morrison 2015). MSAs used in phylogenetics represent hypotheses about the outcome of scenarios of molecular evolution. Because suitable algorithms are lacking, many researchers in phylogenetics and evolutionary biology manually edit the results of automated MSAs to improve homology assessment (Morrison 2009). To increase reproducibility and throughput, it would be desirable to have MSA algorithms that align by homology instead of by mere sequence similarity or shared structure or function.

One important step towards better homology assessment in automated MSAs is to model patterns of molecular evolution that are more complex than the uniformly distributed point mutations, insertions and deletions modeled by the majority of current MSA algorithms.


The aim of the thesis is to construct a database that contains information on the location and frequency of microstructural mutations (MSMs). We define MSMs as patterns of molecular evolution that are more complex than simple indels or point mutations, such as:
  • Duplications of tandem repeat periods
  • Inversions
  • Transpositions
  • Mutations associated with inverted repeats (hairpins)
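
To illustrate what searching a single sequence for two of these MSM types involves, the following is a minimal sketch only: it is not the group's existing software, and the function names, parameters and thresholds (period range, stem length, loop length) are hypothetical choices made for this example. It naively scans for exact tandem repeats and for inverted repeats (a stem followed, after a short loop, by its reverse complement, i.e. a potential hairpin). Inversions and transpositions are not covered, since detecting them requires comparing sequences within an alignment.

```python
# Naive scanners for two MSM patterns in one DNA sequence.
# Hypothetical helpers for illustration; assumes uppercase A/C/G/T input.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tandem_repeats(seq, min_period=2, max_period=6, min_copies=2):
    """Return (start, period, copies) for exact, adjacent copies of a unit.

    Non-overlapping scan per period: once a run is found, scanning resumes
    after its last copy.
    """
    hits = []
    n = len(seq)
    for period in range(min_period, max_period + 1):
        start = 0
        while start + period <= n:
            unit = seq[start:start + period]
            copies = 1
            # Count how many further copies of the unit directly follow it.
            while seq[start + copies * period:start + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((start, period, copies))
                start += copies * period
            else:
                start += 1
    return hits


def find_inverted_repeats(seq, stem=4, min_loop=3, max_loop=8):
    """Return (start, loop_length) for potential hairpins.

    A hit means seq[start:start + stem] is followed, after a loop of
    loop_length bases, by its reverse complement.
    """
    hits = []
    for start in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[start:start + stem])
        for loop in range(min_loop, max_loop + 1):
            right = start + stem + loop
            if right + stem > len(seq):
                break
            if seq[right:right + stem] == target:
                hits.append((start, loop))
    return hits
```

For example, `find_inverted_repeats("AAGACCTTTGGTCAA")` reports the stem `GACC` at position 2, paired with `GGTC` after a three-base loop. A real search would need to tolerate mismatches and work on MSAs rather than single sequences, which is part of what the thesis is meant to address.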
Based on available software components, an application shall be developed that searches DNA sequence data and MSAs of DNA sequences for the MSMs described above and stores the results in a database. This will make it possible to determine how frequent each type of MSM is in which taxa and how relevant MSMs are for phylogenetic MSA. Additionally, hypotheses on currently unknown types of MSMs (e.g. tandem repeat extensions associated with hairpins or other secondary structures) shall be tested using the generated data.
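
The question of which MSMs occur how often in which taxa maps naturally onto a grouped count over stored occurrences. The sketch below uses Python's built-in `sqlite3` module; the table layout, column names and MSM type labels are hypothetical assumptions for illustration, not the schema of the planned database.

```python
import sqlite3

# Hypothetical minimal schema: one row per detected MSM occurrence.
SCHEMA = """
CREATE TABLE IF NOT EXISTS msm (
    id         INTEGER PRIMARY KEY,
    taxon      TEXT NOT NULL,
    seq_id     TEXT NOT NULL,    -- identifier of the sequence or alignment
    msm_type   TEXT NOT NULL,    -- e.g. 'tandem_repeat', 'hairpin'
    start_pos  INTEGER NOT NULL, -- 0-based start position
    msm_length INTEGER NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn, taxon, seq_id, msm_type, start_pos, msm_length):
    """Insert one MSM occurrence; 'with conn' commits the transaction."""
    with conn:
        conn.execute(
            "INSERT INTO msm (taxon, seq_id, msm_type, start_pos, msm_length) "
            "VALUES (?, ?, ?, ?, ?)",
            (taxon, seq_id, msm_type, start_pos, msm_length),
        )


def frequency_by_taxon(conn):
    """Count MSM occurrences per (taxon, msm_type)."""
    return conn.execute(
        "SELECT taxon, msm_type, COUNT(*) FROM msm "
        "GROUP BY taxon, msm_type ORDER BY taxon, msm_type"
    ).fetchall()
```

Usage with made-up example rows: after `store(conn, "Taxon A", "seq1", "tandem_repeat", 1, 12)` and a few similar calls, `frequency_by_taxon(conn)` returns one `(taxon, msm_type, count)` tuple per combination. Keeping one row per occurrence, rather than storing only counts, would also preserve the positions needed to assess relevance for MSA.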

[Figure: screenshot of the current version of the group's MSM software]

The figure above shows a screenshot of the current version of software developed in our group that can locate some MSMs in a single sequence. Extending this software to achieve the goals described above is one objective of the thesis.

What we offer
  • Individualized supervision and advanced training in bioinformatics methods and software development.
  • Co-authorship of a journal publication, depending on the progress made.
What we expect
  • Interest in working in bioinformatics and software development.
  • Knowledge of a programming language (object orientation/Java would be ideal, but is not required).
Further information and contact

If you are interested in working on this topic in your bachelor's or master's thesis or in a master's research module, please contact Ben Stöver. (Working on other bioinformatics topics related to our research and software is also possible.)