Bachelor's thesis: Extension of the metadata model of the phylogenetic tree editor TreeGraph 2

Status completed on Sep 30, 2017
Student Brech, Phoebe
Supervisors Müller, Kai
Kemena, Carsten
Stöver, Ben
Evaluating institution Evolution und Biodiversität der Pflanzen
Institut für Evolution und Biodiversität
WWU Münster
Hüfferstraße 1
48149 Münster
Germany

Summary

Why this thesis?

Accessibility and reusability of data are of fundamental importance in bioinformatics and in science in general. Making biological data available in public databases is one step, but because the amount of available data is growing ever faster, it is equally important to describe the data and the relations within it in a standardized, machine-readable form so that it can be reused efficiently in automated bioinformatics analyses. Such metadata would ideally be provided by the same scientists who produce the data, so standards and software tools that assist them with this are necessary.

In the field of phylogenetics, the NeXML format is a recent standard that allows taxon lists, multiple sequence alignments or phylogenetic trees to be annotated with metadata using the Resource Description Framework (RDF). RDF is a generally accepted standard of the World Wide Web Consortium (W3C) for formulating logical expressions about any type of data using externally defined predicates, and it fulfills the requirements described above. (See the example below for details.) Our phylogenetic tree editor TreeGraph 2, on the other hand, can attach metadata to tree nodes and branches but currently uses freely definable string keys to link such data. Advanced support for RDF annotations and NeXML in TreeGraph 2 would enable biologists who generate data to easily attach the standardized annotations necessary for efficient data reuse.

Objectives

The aim of the thesis is to extend the current metadata model of TreeGraph 2 to support RDF-based, externally defined predicates and to make full use of the possibilities offered by NeXML. Users should be able to attach metadata to tree nodes, branches and the whole tree using RDF predicates as an alternative to, or in combination with, the freely definable string keys used in the current version of the tree editor.

One or more of the following subtopics can be part of the thesis:
  • Developing a data model that combines the currently used freely definable string keys to attach data with the RDF model and its externally defined predicates.
  • Implementing the new data model.
  • Refactoring of the GUI elements to let the user work efficiently with the new metadata model.
  • Replacing XTG with NeXML as TreeGraph’s native file format.
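
As a rough illustration of the first subtopic, the following Python sketch shows one way a combined model could look: a metadata key is either a free string key or an RDF predicate, and both can be attached to the same tree element. All class and method names, as well as the namespace URI, are hypothetical and chosen for illustration only; they do not describe the actual TreeGraph 2 implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass(frozen=True)
class StringKey:
    """Freely definable key, as in the current metadata model."""
    name: str


@dataclass(frozen=True)
class RDFPredicate:
    """Externally defined predicate: namespace URI plus local name."""
    namespace_uri: str
    local_name: str

    @property
    def uri(self) -> str:
        # Full predicate URI, e.g. "http://example.org/phyl#support".
        return self.namespace_uri + self.local_name


# A metadata key is either of the two kinds (hypothetical design).
MetadataKey = Union[StringKey, RDFPredicate]


@dataclass
class Annotatable:
    """Anything that can carry metadata: a node, a branch or the whole tree."""
    metadata: Dict[MetadataKey, object] = field(default_factory=dict)

    def annotate(self, key: MetadataKey, value: object) -> None:
        self.metadata[key] = value

    def lookup(self, key: MetadataKey) -> object:
        return self.metadata.get(key)


# Placeholder namespace, NOT a real ontology URI.
PHYL = "http://example.org/phyl#"

branch = Annotatable()
branch.annotate(StringKey("support1"), 83)           # free string key
branch.annotate(RDFPredicate(PHYL, "support"), 83)   # RDF predicate
```

Because both key types are hashable value objects, they can live side by side in one dictionary, which is what allows string keys and RDF predicates to be used in combination.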

Example

The following tree contains bootstrap support values on its internal branches.

Example tree


In the representation as a Newick string, the support values are stored as internal node names.
(((A,B)99,C)83,(D,E)89)98;
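
To make concrete what a program sees when it reads this string, here is a minimal Python sketch that extracts the internal node names. The function name `internal_labels` is hypothetical, and the regular expression is a simplification that assumes unquoted labels without branch lengths; it is not a full Newick parser.

```python
import re


def internal_labels(newick: str) -> list:
    """Return the labels that directly follow a closing parenthesis."""
    # An internal node name is whatever follows ")" up to the next
    # structural character. Quoted labels are not handled here.
    return re.findall(r"\)([^(),;:]+)", newick)


LABELS = internal_labels("(((A,B)99,C)83,(D,E)89)98;")
# LABELS == ['99', '83', '89', '98'], plain strings without any type or meaning
```

The result is a list of strings; nothing in the format says that they are numbers, let alone bootstrap values.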
In the Newick representation, no information on the meaning of the values is given. It is not even defined that the node names are support values at all, so it would be difficult or impossible to process the information and its meaning automatically. The following NeXML representation of the tree contains additional information, linked using RDF predicates, that explains the meaning of the values. This makes automatic processing and data reuse possible.
<tree id="tree" xsi:type="nex:FloatTree">
  <!-- Declaration of the type of support value. -->
  <meta rel="phyl:supportValueType">
    <meta property="phyl:type">bootstrap</meta>
    <meta rel="phyl:software">
      <meta property="phyl:name">RAxML</meta>
      <meta property="phyl:version">8.2.4</meta>
    </meta>
    <meta property="phyl:valueName">support1</meta>
  </meta>
  ...
  <edge id="edge1" source="node1" target="node2">
    <!-- Attachment of a support value to a tree branch -->
    <meta rel="phyl:support">
      <meta property="phyl:supportName">support1</meta>
      <meta property="phyl:value" datatype="xsd:integer">83</meta>
    </meta>
  </edge>
  <edge id="edge2" source="node2" target="node3">
    <!-- Attachment of a support value to a tree branch -->
    <meta rel="phyl:support">
      <meta property="phyl:supportName">support1</meta>
      <meta property="phyl:value" datatype="xsd:integer">99</meta>
    </meta>
  </edge>
  ...
</tree>
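
The following Python sketch suggests how such annotations could be read automatically, using only the standard library. It works on a shortened copy of the fragment: the software block and the elided parts ("...") are left out, and an `xmlns:xsi` declaration is added so that the fragment parses on its own (in a complete NeXML file the namespaces would be declared on the root element). The function names `child_text` and `read_support` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the fragment; xmlns:xsi is added only for this sketch.
FRAGMENT = """
<tree id="tree" xsi:type="nex:FloatTree"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <meta rel="phyl:supportValueType">
    <meta property="phyl:type">bootstrap</meta>
    <meta property="phyl:valueName">support1</meta>
  </meta>
  <edge id="edge1" source="node1" target="node2">
    <meta rel="phyl:support">
      <meta property="phyl:supportName">support1</meta>
      <meta property="phyl:value" datatype="xsd:integer">83</meta>
    </meta>
  </edge>
  <edge id="edge2" source="node2" target="node3">
    <meta rel="phyl:support">
      <meta property="phyl:supportName">support1</meta>
      <meta property="phyl:value" datatype="xsd:integer">99</meta>
    </meta>
  </edge>
</tree>
"""


def child_text(parent, prop):
    """Text of the direct <meta> child with the given property, or None."""
    return parent.findtext(f"meta[@property='{prop}']")


def read_support(xml_text):
    """Return (edge id, support type, value) for every annotated edge."""
    root = ET.fromstring(xml_text)

    # Map each declared value name to its type, e.g. support1 -> bootstrap.
    types = {}
    for decl in root.findall("meta[@rel='phyl:supportValueType']"):
        types[child_text(decl, "phyl:valueName")] = child_text(decl, "phyl:type")

    rows = []
    for edge in root.findall("edge"):
        for support in edge.findall("meta[@rel='phyl:support']"):
            name = child_text(support, "phyl:supportName")
            value_elem = support.find("meta[@property='phyl:value']")
            value = value_elem.text
            # Convert only when the datatype is declared as an integer.
            if value_elem.get("datatype") == "xsd:integer":
                value = int(value)
            rows.append((edge.get("id"), types.get(name), value))
    return rows


SUPPORT = read_support(FRAGMENT)
# SUPPORT == [('edge1', 'bootstrap', 83), ('edge2', 'bootstrap', 99)]
```

Unlike the Newick case, the program obtains typed values together with the statement that they are bootstrap supports, without relying on any convention outside the file.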

Screenshot of TreeGraph 2


Screenshot of the current version of TreeGraph 2.

Further information