Tuesday, September 19, 2017

Arguments from authority, and the Cladistic Ghost, in historical linguistics

Arguments from authority play an important role in our daily lives and our societies. In political discussions, we often point to the opinion of trusted authorities if we do not know enough about the matter at hand. In medicine, favorable opinions by respected authorities function as one of four levels of evidence (admittedly, the lowest) to judge the strength of a medicament. In advertising, the (at times doubtful) authority of celebrities is used to convince us that a certain product will change our lives.

Arguments from authority are useful, since they allow us to have an opinion without fully understanding it. Given the ever-increasing complexity of the world in which we live, we could not do without them. We need to build on the opinions and conclusions of others in order to construct our personal little realm of convictions and insights. This is specifically important for scientific research, since it is based on a huge network of trust in the correctness of previous studies which no single researcher could check in a lifetime.

Arguments from authority are, however, also dangerous if we blindly trust them without critical evaluation. To err is human, and there is no guarantee that the analysis of our favorite authorities is always error proof. For example, famous linguists, such as Ferdinand de Saussure (1857-1913) or Antoine Meillet (1866-1936), revolutionized the field of historical linguistics, and their theories had a huge impact on the way we compare languages today. Nevertheless, this does not mean that they were right in all their theories and analyses, and we should never trust any theory or methodological principle only because it was proposed by Meillet or Saussure.

Since people tend to avoid asking why their authority came to a certain conclusion, arguments from authority can be easily abused. In the extreme, this may culminate in totalitarian societies, or societies ruled by religious fanaticism. To a smaller degree, we can also find this totalitarian attitude in science, where researchers may end up blindly trusting the theory of a certain authority without further critically investigating it.

The comparative method

The authority in this context does not necessarily need to be a real person; it can also be a theory or a certain methodology. The financial crisis of 2008 can be taken as an example of a methodology, namely classical "economic forecasting", that turned out to be trusted much more than it deserved. In historical linguistics, we have a similar quasi-religious attitude towards our traditional comparative method (see Weiss 2014 for an overview), which we use in order to compare languages. This "method" is in fact no method at all, but rather a large collection of techniques by which linguists have been comparing and reconstructing languages during the past 200 years. These include the detection of cognate or "homologous" words across languages, and the inference of regular sound correspondence patterns (which I discussed in a blog post from October last year), but also the reconstruction of sounds and words of ancestral languages not attested in written records, and the inference of the phylogeny of a given language family.

In all of these matters, the comparative method enjoys a quasi-religious authority in historical linguistics. Saying that they do not follow the comparative method in their work is among the worst things you can say to historical linguists. It hurts. We are conditioned from when we were small to feel this pain. This is all the more surprising, given that scholars rarely agree on the specifics of the methodology, as one can see from the table below, where I compare the key tasks that different authors attribute to the "method" in the literature. I think one can easily see that there is not much of an overlap, nor a pattern.

Varying accounts on the "comparative methods" in the linguistic literature

It is difficult to tell how this attitude evolved. The foundations of the comparative method go back to the early work of scholars in the 19th century, who managed to demonstrate the genealogical relationship of the Indo-European languages. Already in these early times, we can find hints regarding the "methodology" of "comparative grammar" (see for example Atkinson 1875), but judging from the literature I have read, it seems that it was not before the early 20th century that people began to introduce the techniques for historical language comparison as a methodological framework.

How this framework became the framework for language comparison, although it was never really established as such, is even less clear to me. At some point the linguistic world (which was always characterized by aggressive battles among colleagues, which were fought in the open in numerous publications) decided that the numerous techniques for historical language comparison which turned out to be the most successful ones up to that point are a specific method, and that this specific method was so extremely well established that no alternative approach could ever compete with it.

Biologists, who have experienced drastic methodological changes during the last decades, may wonder how scientists could believe that any practice, theory, or method is everlasting, untouchable and infallible. In fact, the comparative method in historical linguistics is always changing, since it is a label rather than a true framework with fixed rules. Our insights into various aspects of language change are constantly increasing, and as a result, the way we practice the comparative method is also improving. We thus keep using the same label, but the product we sell is different from the one we sold decades ago. Historical linguists are, however, very conservative regarding the authorities they trust, and our field was always very skeptical regarding any new methodologies which were proposed.

Morris Swadesh (1909-1967), for example, proposed a quantitative approach to infer divergence dates of language pairs (Swadesh 1950 and later), which was refuted right after he proposed it (Hoijer 1956, Bergsland and Vogt 1962). Swadesh's idea to assume constant rates of lexical change was surely problematic, but his general idea of looking at lexical change from the perspective of a fixed set of meanings was very creative at that time, and it has given rise to many interesting investigations (see, among others, Haspelmath and Tadmor 2009). As a result of this early refutation, quantitative work was largely disregarded in the following decades. Not many people paid any attention to David Sankoff's (1969) PhD thesis, in which he tried to develop improved models of lexical change in order to infer language phylogenies, which is probably the reason why Sankoff later turned to biology, where his work received the appreciation it deserved.

Shared innovations

Since the beginning of the second millennium, quantitative studies have enjoyed a new popularity in historical linguistics, as can be seen in the numerous papers that have been devoted to automatically inferred phylogenies (see Gray and Atkinson 2003 and passim). The field has begun to accept these methods as additional tools to provide an understanding of how our languages evolved into their current shape. But scholars tend to contrast these new techniques sharply with the "classical approaches", namely the different modules of the comparative method. Many scholars also still assume that the only valid technique by which phylogenies (be it trees or networks) can be inferred is to identify shared innovations in the languages under investigation (Donohue et al. 2012, François 2014).

The idea of shared innovations was first proposed by Brugmann (1884), and has its direct counterpart in Hennig's (1950) framework of cladistics. In a later book by Brugmann, we find the following passage on shared innovations (or synapomorphies in Hennig's terminology):
The only thing that can shed light on the relation among the individual language branches [...] are the specific correspondences between two or more of them, the innovations, by which each time certain language branches have advanced in comparison with other branches in their development. (Brugmann 1967[1886]:24, my translation)
Unfortunately, not many people seem to have read Brugmann's original text in full. Brugmann says that subgrouping requires the identification of shared innovative traits (as opposed to shared retentions), but he remains skeptical about whether this can be done in a satisfying way, since we often do not know whether certain traits developed independently, were borrowed at later stages, or are simply being misidentified as being "shared". Brugmann's proposed solution to this is to claim that shared, potentially innovative traits, should be numerous enough to reduce the possibility of chance.

While biology has long since abandoned the cladistic idea, turning instead to quantitative (mostly stochastic) approaches in phylogenetic reconstruction, linguists are surprisingly stubborn in this regard. It is beyond question that those uniquely shared traits among languages that are unlikely to have evolved by chance or language contact are good proxies for subgrouping. But they are often very hard to identify, and this is probably also the reason why our understanding of the phylogeny of the Indo-European language family has not improved much during the past 100 years. In situations where we lack any striking evidence, quantitative approaches may as well be used to infer potentially innovated traits. If we do a better job in listing these cases (current software, which was designed by biologists, is not really helpful in logging all decisions and inferences that were made by the algorithms), we could profit a lot from turning to computer-assisted frameworks in which experts thoroughly evaluate the inferences made by the automatic approaches in order to generate new hypotheses and improve our understanding of our languages' past.

A further problem with cladistics is that scholars often use the term shared innovation for inferences, while the cladistic toolkit and the reason why Brugmann and Hennig thought that shared innovations are needed for subgrouping rests on the assumption that one knows the true evolutionary history (De Laet 2005: 85). Since the true evolutionary history is a tree in the cladistic sense, an innovation can only be identified if one knows the tree. This means, however, that one cannot use the innovations to infer the tree (if it has to be known in advance). What scholars thus mean when talking about shared innovations in linguistics are potentially shared innovations, that is, characters that are diagnostic of subgrouping.


Given how quickly science evolves and how non-permanent our knowledge and our methodologies are, I would never claim that the new quantitative approaches are the only way to deal with trees or networks in historical linguistics. The last word on this debate has not yet been spoken, and while I see many points critically, there are also many points for concrete improvement (List 2016). But I see very clearly that our tendency as historical linguists to take the comparative method as the only authoritative way to arrive at a valid subgrouping is not leading us anywhere.

Do computational approaches really switch off the light which illuminates classical historical linguistics?

In a recent review, Stefan Georg, an expert on Altaic languages, writes that the recent computational approaches to phylogenetic reconstruction in historical linguistics "switch out the light which has illuminated Indo-European linguistics for generations (by switching on some computers)", and that they "reduce this discipline to the pre-modern guesswork stage [...] in the belief that all that processing power can replace the available knowledge about these languages [...] and will produce ‘results’ which are worth the paper they are printed on" (Georg 2017: 372, footnote). It seems to me that, if a discipline has been enlightened too much by its blind trust in authorities, it is not the worst idea to switch off the light once in a while.

  • Anttila, R. (1972): An introduction to historical and comparative linguistics. Macmillan: New York.
  • Atkinson, R. (1875): Comparative grammar of the Dravidian languages. Hermathena 2.3. 60-106.
  • Bergsland, K. and H. Vogt (1962): On the validity of glottochronology. Current Anthropology 3.2. 115-153.
  • Brugmann, K. (1884): Zur Frage nach den Verwandtschaftsverhältnissen der indogermanischen Sprachen [Questions regarding the closer relationship of the Indo-European languages]. Internationale Zeitschrift für allgemeine Sprachwissenschaft 1. 228-256.
  • Bußmann, H. (2002): Lexikon der Sprachwissenschaft. Kröner: Stuttgart.
  • De Laet, J. (2005): Parsimony and the problem of inapplicables in sequence data. In: Albert, V. (ed.): Parsimony, phylogeny, and genomics. Oxford University Press: Oxford. 81-116.
  • Donohue, M., T. Denham, and S. Oppenheimer (2012): New methodologies for historical linguistics? Calibrating a lexicon-based methodology for diffusion vs. subgrouping. Diachronica 29.4. 505–522.
  • Fleischhauer, J. (2009): A Phylogenetic Interpretation of the Comparative Method. Journal of Language Relationship 2. 115-138.
  • Fox, A. (1995): Linguistic reconstruction. An introduction to theory and method. Oxford University Press: Oxford.
  • François, A. (2014): Trees, waves and linkages: models of language diversification. In: Bowern, C. and B. Evans (eds.): The Routledge handbook of historical linguistics. Routledge: New York. 161-189.
  • Georg, S. (2017): The Role of Paradigmatic Morphology in Historical, Areal and Genealogical Linguistics. Journal of Language Contact 10. 353-381.
  • Glück, H. (2000): Metzler-Lexikon Sprache. Metzler: Stuttgart.
  • Gray, R. and Q. Atkinson (2003): Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426.6965. 435-439.
  • Harrison, S. (2003): On the limits of the comparative method. In: Joseph, B. and R. Janda (eds.): The handbook of historical linguistics. Blackwell: Malden and Oxford and Melbourne and Berlin. 213-243.
  • Haspelmath, M. and U. Tadmor (2009): The Loanword Typology project and the World Loanword Database. In: Haspelmath, M. and U. Tadmor (eds.): Loanwords in the world’s languages. de Gruyter: Berlin and New York. 1-34.
  • Hennig, W. (1950): Grundzüge einer Theorie der phylogenetischen Systematik. Deutscher Zentralverlag: Berlin.
  • Hoenigswald, H. (1960): Phonetic similarity in internal reconstruction. Language 36.2. 191-192.
  • Hoijer, H. (1956): Lexicostatistics. A critique. Language 32.1. 49-60.
  • Jarceva, V. (1990): . Sovetskaja Enciklopedija: Moscow.
  • Klimov, G. (1990): Osnovy lingvističeskoj komparativistiki [Foundations of comparative linguistics]. Nauka: Moscow.
  • Lehmann, W. (1969): Einführung in die historische Linguistik. Carl Winter:
  • List, J.-M. (2016): Beyond cognacy: Historical relations between words and their implication for phylogenetic reconstruction. Journal of Language Evolution 1.2. 119-136.
  • Makaev, E. (1977): Obščaja teorija sravnitel’nogo jazykoznanija [Common theory of comparative linguistics]. Nauka: Moscow.
  • Matthews, P. (1997): Oxford concise dictionary of linguistics. Oxford University Press: Oxford.
  • Rankin, R. (2003): The comparative method. In: Joseph, B. and R. Janda (eds.): The handbook of historical linguistics. Blackwell: Malden and Oxford and Melbourne and Berlin.
  • Sankoff, D. (1969): Historical linguistics as stochastic process. PhD thesis. McGill University: Montreal.
  • Weiss, M. (2014): The comparative method. In: Bowern, C. and B. Evans (eds.): The Routledge handbook of historical linguistics. Routledge: New York. 127-145.

Monday, September 11, 2017

A network of political parties competing for the 2017 Bundestag

Many elections now have some sort of online black box that allows you to see which political party or candidate has the highest overlap with your own personal political opinions. This is intended to help voters with their decisions. However, the black boxes usually lack any documentation regarding how different the viewpoints of the competing parties / candidates are. Exploratory data analysis via Neighbour-nets may be of some use in these cases.

As a European Union citizen (of German and Swedish nationality) I am entitled to live and work in any EU country. I currently live in France, but I cannot vote for the parliament (Assemblée nationale) and government (M. Le Président) that affects my daily life, and decides on the taxes, etc, that I have to pay. However, I’m still eligible to vote in Germany (in theory; in practice it is a bit more complex).

The next election (Bundestagswahl) is closing in for the national parliament of the Federal Republic of Germany, the Bundestag (equivalent to the lower house of other bicameral legislatures). To help the voters, a new Wahl-O-Mat (described below) has been launched by the Federal Institute of Political Education (Bundeszentrale für politische Bildung, BPB). This is a fun thing to participate in, even if you have already made up your mind about who to vote for.

Each election year, the BPB develops and sends out a questionnaire with theses (83 this year) to all of the political parties that will compete in the election. The parties can answer with ‘agree’, ‘no opinion / neutral’, or ‘don’t agree’ for each thesis. The 38 most controversially discussed political questions have been included in the Wahl-O-Mat, and you can also answer them for yourself. As a final step, you can choose eight of the political parties competing for the Bundestag, and the online black box will show you an agreement percentage between you and them in the form of a bar-chart diagram.

But as a phylogeneticist / data-analyst, I am naturally sceptical when it comes to mere percentages and bar charts. Furthermore, I would like to know how similar the parties’ opinions are to each other, to start with. An overview is provided, with all of the answers from the parties, but it is difficult to compare these across pages (each page of the PDF lists four parties, in the same order as on the selection page). The Wahl-O-Mat informs you that a high fit of your answers with more than one party does not necessarily indicate a closeness between the parties — you may, after all, be agreeing with them on different theses.

This means that the percentage of agreement between me and the political parties would provide a similarity measure, which I can use to compare the political parties with each other. But how discriminatory are my percentages of agreement (from the larger perspective)?

A network analysis

There are 33 parties competing for seats in the forthcoming Bundestag; one did not respond. Another one, the Party for Health Research (PfHR, a one-topic party), answered all 38 questions with 'neutral'. However, the makers of the Wahl-O-Mat still had to include it; and since that party provided no opinion on any of the questions, I scored 50% agreement with them (since I answered every question with 'yes' or 'no'). This is more than with the Liberal Party (because we actually disagree on half of the 38 questions). This is a flaw in the Wahl-O-Mat. If you say 'yes' (or 'no') to a thesis that the party has no opinion on, then it is counted as one point, while two points are awarded for a direct match. However, it does not work the other way around: having no opinion on any question brings up a window telling you that your preference cannot be properly evaluated.
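The scoring rule just described can be sketched in a few lines of Python. This is only a reading of the two rules stated in the text (two points for a direct match, one point when the voter takes a side and the party is neutral), not the Wahl-O-Mat's actual code; the function name and the answer labels are made up for illustration, and voter answers of 'neutral' are deliberately not modelled.

```python
def agreement_percent(voter, party):
    """Percentage agreement between a voter and a party.

    Assumptions (not taken from the Wahl-O-Mat source code):
    - answers are the strings 'yes', 'no' or 'neutral';
    - the voter answers every thesis with 'yes' or 'no';
    - a direct match scores 2 points, a neutral party scores 1 point,
      a direct disagreement scores 0 points.
    """
    if len(voter) != len(party):
        raise ValueError("both answer lists must have the same length")
    points = 0
    for v, p in zip(voter, party):
        if v not in ("yes", "no"):
            raise ValueError("this sketch only handles yes/no voter answers")
        if v == p:
            points += 2      # direct match
        elif p == "neutral":
            points += 1      # party has no opinion on this thesis
    return 100.0 * points / (2 * len(voter))


# A voter with 38 yes/no answers against a party that is neutral throughout
voter = ["yes", "no"] * 19
all_neutral = ["neutral"] * 38
print(agreement_percent(voter, all_neutral))
```

Under these assumptions, a party that answers 'neutral' throughout always lands at exactly half of the maximum score, whatever the voter thinks, which is the flaw noted above.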

Because of this, I determined my position relative to the political parties using a neighbour-net. The primary character matrix is binary, where 0 = ‘no’, 1 = ‘yes’ and ‘?’ stands for no opinion (neutral), compared using simple (Hamming) pairwise distances. So, if two parties disagree on all of the theses, their pairwise distance will be 1. If there is no disagreement, the pairwise distance will be 0. Since the PfHR has provided no opinion, I left it out (i.e. its pairwise distances are undefined).
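For readers who want to reproduce such a distance matrix by hand, here is a minimal Python sketch. It assumes that theses on which either party answered '?' are simply skipped for that pair (pairwise deletion); the post does not say how the software treated '?', so this is an assumption, and the party labels A, B and C are hypothetical.

```python
from itertools import combinations


def hamming_distance(a, b, missing="?"):
    """Proportion of differing answers among theses both parties answered.

    Returns None when the two profiles share no answered thesis, which is
    the case for any party that answered '?' throughout.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not compared:
        return None  # distance is undefined
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(profiles):
    """Pairwise distances for a dict mapping party name -> answer string."""
    return {
        (p, q): hamming_distance(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


# Hypothetical toy profiles over four theses
profiles = {"A": "0101", "B": "1010", "C": "01?1"}
for pair, dist in distance_matrix(profiles).items():
    print(pair, dist)
```

A matrix of this kind is what a neighbour-net or a heat map would then be computed from.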

Fig. 1 Neighbour-net of German political parties competing in the 2017 election (not including me). Parties of the far-left and far-right are bracketed, for political orientation. Parties with a high chance to get into the next Bundestag (passing the 5% threshold) are in bold. [See also this analysis by The Political Compass, for comparison].

The resulting network (Figure 1) is quite fitting: the traditional perception of parties (left-wing versus right-wing) is well captured. Parties, like the ÖDP (green and conservative), that do not fit into the classic left-right scheme are placed in an isolated position.

The graph reveals a (not very surprising) closeness between the two largest German political parties, the original Volksparteien (all-people parties): the CDU/CSU (centre-right, the party of the current Chancellor) and the SPD (centre-left). The SPD is the current (and potentially future) junior partner of the CDU/CSU, its main competitor. According to the graph, an alternative, more natural, junior partner of the CDU/CSU would be the (neo-)liberal party, the FDP.

The parties of the far-right are placed at the end of a pronounced network stem; that is, they are the ones that deviate most from the consensus shared by all of the other parties. They are (still) substantially closer to the centre-right parties than to those from the (extreme) left. However, the edge lengths show that, for example, a hypothetical CDU/CSU–AfD coalition (the AfD is the only right-wing party with a high chance to pass the 5% threshold) would have to join two parties with many conflicting viewpoints. That is, regarding their answers to the 38 questions, in general the CSU appears to be much closer to the AfD than to its sister party, the CDU.

Regarding the political left, the graph depicts its long-known political-structure problem: there are many parties, some with very unique viewpoints (producing longer terminal network edges); but overall there is little difference between them. The most distinct parties in this cluster are the Green Party (Die Grünen) and the Humanist Party (Die Humanisten), a microparty promoting humanism (see also Fig. 2).

Any formal inference is bound by its analysis rules, which may represent the primary signal suboptimally. The neighbour-net is a planar graph, but profiles of political parties may require more than two dimensions to do a good job. So let's take a look at the underlying distance matrix using a ‘heat map’ (Figure 2).

Fig. 2 Heat-map based on the same distance matrix as used for inferring the neighbour-net in Fig. 1. Note the general similarity of left-leaning parties and their distinctness to the right-leaning parties.

We can see that the Left Party (Die Linke) and the Bündnis Grundeinkommen (BGE), a single-topic party founded to promote a basic income without conditions, don’t disagree on any point, and that the declining Pirate Party (flagged as social-liberal on Wikipedia) has turned sharply left. The Party for Animal Protection (Tierschutzpartei) and the Party of Vegetarians and Vegans (V3) should discuss a merger; whereas the Alliance for Animal Protection (Tierschutzallianz) is their more conservative counterpart, being much closer to e.g. the CDU/CSU.

We can also see that the party with the highest agreement with the SPD is still the Greens (Die Grünen). Furthermore, although the FDP and the Pirate Party have little in common, the Humanist Party (Die Humanisten) may be a good alternative when you’re undecided between the other two. [Well, it would be, if in Germany each vote counted the same, but the 5% threshold invalidates all votes cast for parties not passing the threshold.] The most unique party, regarding its set of answers and the resulting pairwise distances, is a right-wing microparty (see the network above) supporting direct democracy (Volksabstimmung).

Applications such as the Wahl-O-Mat are put up for many elections, and when documented in the way done by the German Federal Institute of Political Education, provide a nice opportunity to assess, using networks, how close the competing parties (officially) are.

PS. For our German readers who are as yet undecided: the primary character matrix (NEXUS-formatted) and related files can be found here.

Tuesday, September 5, 2017

SPECTRE: a suite of phylogenetic tools for reticulate evolution

Recently, the Earlham Institute, in the UK, released a set of software tools that are of relevance to this blog — SPECTRE. These tools are described in a forthcoming paper:
Sarah Bastkowski, Daniel Mapleson, Andreas Spillner, Taoyang Wu, Monika Balvočiūte and Vincent Moulton (2017) SPECTRE: a Suite of PhylogEnetiC Tools for Reticulate Evolution.

This is a toolkit rather than a simple-to-use program, meaning that the various analyses exist as separate entities that can be combined in any way you like. More importantly, new analyses can be added easily, by those who want to write them, which is not the case for more commonly used programs like SplitsTree. This way, the analyses can also be incorporated into processing pipelines, rather than only being used interactively.

Apart from the usual access to data files (including Nexus, Phylip, Newick, Emboss and FastA formats), the following network analyses are currently available:
NeighborNet, NetMake, QNet, SuperQ, FlatNJ, NetME
The program also outputs the networks, of course. Here is an example of the SPECTRE equivalent of a NeighborNet analysis from a recent blog post (where the network was produced by SplitsTree, and then colored by me).

Running the program(s) is relatively straightforward, once you get things installed. Installation packages are available for OSX, Windows and Linux.

Sadly, for me installation was tricky, because SPECTRE requires Java v.8, which is unfortunately not available for OSX 10.6 (which runs on most of my computers). Even getting Java v.8 installed on the one computer I have with a later version of OSX was not easy, because installing a Java Runtime Environment (the JRE download file) from Oracle does not update the Java -version symlinks or add Java to the software path — for this I had to install the full Java Development Kit (the JDK download file). Sometimes, I hate computers!

Tuesday, August 29, 2017

More non-treelike data forced into trees: a glimpse into the dinosaurs

Plant morphological data sets including fossil taxa can be riddled with incompatible data patterns (e.g. see my first post), and this can be a bit mind-blowing when it comes to tracing evolution over time. So, let’s move on to something potentially more simple: extinct groups of animals.

Until a time-machine is invented, phylogenetic hypotheses for groups such as the many extinct lineages of dinosaurs will have to be based on morphological data sets. Dinosaur fossils are nowhere near as frequent as plant fossils (often isolated organs); but when a complete or partial skeleton is found, this specimen allows scoring more characters than is possible for even a higher-level composite plant taxon. For instance, the largest (character-wise) plant data matrices, using composite taxa and operating at the level of genera and above, including fossils, have a little over 100 characters, whereas dinosaur matrices like the one used by Tschopp, Mateus & Benson (2015) can have several hundreds of characters.

Classification of dinosaurs tries to apply the principles of ‘cladistics’ (see also http://tolweb.org/Dinosauria), a classification system established by Hennig (1950). Cladistic classification – Hennig did not propose any inference framework – aims to identify exclusively shared derived traits (synapomorphies), and consequently groups of taxa (originally species) that share an inclusive common origin, Hennig's “monophyla”. [In contrast to Haeckel’s (1866) concept of monophyletic groups, which just assumed a common origin, but did not require inclusiveness.] For some reasons, which seem to have no scientific basis, but can be understood in a historical context (Felsenstein 2001, 2004: chapter 10), cladistics has been synonymised with parsimony analysis, one of the optimality criteria to infer one-dimensional graphs reflecting a series of dichotomous splits (phylogenetic trees). A basic assumption of cladistic studies is that a clade in a parsimony-inferred tree equals a monophylum (which is not necessarily the case, see e.g. Scotland & Steel 2015 for binary data).

In palaeontology (and systematic biology to some degree) it is common not to show a phylogram, a phylogenetic tree with branch-lengths, but a cladogram. These cladograms rarely depict the optimised (or one of the equally optimal) tree(s), but instead show the strict consensus tree of the equally parsimonious trees that were found (the potentially most-parsimonious trees, MPTs). This is also the case for the study by Tschopp et al., used here as an example of the generally non-treelike data used in studies dealing with extinct groups of animals.

David provided a list of questions for exploratory data analysis (EDA), which can (and should) be asked when trying to infer phylogenies based on morphological data. I will look at some of them here.

First question: Are the data tree-like?

The data matrix of Tschopp et al. is impressive (much like the paper itself, with its 298 pages). The authors scored 477 characters (243 new) for (a final set of) 81 “operational taxonomic units” (OTUs). The OTUs are typically specimens in the case of the ingroup, and include several outgroup species for rooting the phylogenetic tree. There are lots of gaps in the matrix (65% missing data), which relates to the inclusion of poorly known fossil specimens, which the authors tried to classify using parsimony inference and pairwise distances. The authors note (p. 163): “Given the low consistency index (CI) and thus high number of homoplasies in the dataset, an additional analysis with the same settings was conducted using implied weighting (iw).” In addition to signal ambiguity related to general homoplasy and ontogeny, the authors note character overlap effects and deformation (pp. 166ff). So, there are quite a few different sources of incompatible, non-treelike signal.

With equal weighting and including all 81 OTUs, the authors ended up with 60,000 equally parsimonious trees (possibly more; this was the maximum number limited by computational constraints). This produced a strict consensus (SC) tree with just 12 nodes, in which “all ingroup specimens formed one large polytomy”. The ‘implied weighting’ led to a slightly more resolved SC tree. ‘Implied weighting’ is an a posteriori means to downweight characters conflicting with the inferred tree. The authors further identified some (4, 8, or 15) OTUs accounting for most of the “instability”. A posteriori filtering of these putative rogue taxa led to SC trees that were much better resolved (Fig. 1).

Fig. 1 The six strict consensus trees shown by Tschopp et al. The red crosses indicate the OTUs that were pruned from the MPT tree sample to increase the resolution of the SC tree. For the first tree, I added the information on the fraction of missing data (blue dots).

Both tree-like and non-treelike data can collapse strict consensus trees, but the large number of MPTs can be a first indication that the data are not tree-like. The MPT samples inferred by Tschopp et al. are not included in the documentation (following the current standard; see also data uploaded to TreeBase). Using the quick-analysis option in PAUP* (random heuristic search, 100 replicates, CHUCK-options set), I found 3,000 equally parsimonious trees, which are only slightly worse (1983 steps) than the 60,000 MPTs (1979 steps reported) combined in Tschopp et al.’s unweighted cladogram.

Using the consensus network approach (Holland & Moulton 2003) for summarising the parsimony-tree sample (no cut-off value), we can get a first impression of the signal in the matrix (Fig. 2). The data allow for a great number of topological alternatives — they are generally not tree-like. Only a few relationships are unambiguous in this collection. The fan-like topological features (composed typically of low-dimensional boxes) relate to: (a) jumping OTUs (rogue taxa), (b) uncertainty regarding relationships between related OTUs consistently found in the same subtree, and (c) the exact composition of the subtrees. In contrast to the strict consensus tree, the network visualises the tree-unlikeliness of the data expressed in the MPT collection, revealing extremely ‘rogue’-ish OTUs (e.g. Diplodocus_YPM_1922) and OTUs with indiscriminate signal (e.g. FMNH_P25112), and also allows us to qualify the ‘rogueness’ of all other OTUs.
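The difference between a strict consensus tree and a consensus network can be illustrated with a toy sketch. Each tree is reduced to its set of non-trivial splits (bipartitions), the strict consensus keeps only splits found in every tree, and the consensus network keeps every split above a frequency cut-off. The five taxa A to E, the three trees, and all function names are hypothetical and are not taken from Tschopp et al.'s data.

```python
from collections import Counter

TAXA = frozenset("ABCDE")  # hypothetical toy taxon set

def canonical(side):
    """Represent a split by the side that does not contain taxon 'A'."""
    side = frozenset(side)
    return TAXA - side if "A" in side else side

def split_counts(trees):
    """Count in how many trees each split occurs."""
    return Counter(split for tree in trees for split in tree)

def strict_consensus(trees):
    """Splits present in every tree of the sample."""
    return {s for s, n in split_counts(trees).items() if n == len(trees)}

def consensus_network(trees, min_fraction=0.0):
    """Splits present in more than min_fraction of the trees."""
    return {s for s, n in split_counts(trees).items()
            if n / len(trees) > min_fraction}

# Three equally good toy trees: A+B is stable, the other taxa move around.
trees = [
    {canonical("AB"), canonical("DE")},
    {canonical("AB"), canonical("CE")},
    {canonical("AB"), canonical("CD")},
]

strict = strict_consensus(trees)
network = consensus_network(trees)
```

In this toy case the strict consensus retains a single split (A+B versus the rest), whereas the network with no cut-off keeps all four splits and so shows which alternatives compete.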

Fig. 2 Strict consensus network (all edge-lengths set to 1) of 3000 equally parsimonious trees, inferred from Tschopp et al.'s matrix. This graph is the network equivalent of the commonly seen strict consensus cladograms (Fig. 1). Note that the tree sample is slightly suboptimal and likely incomplete.

One pre-inference measure for tree-likeness is the Delta Value (DV) introduced by Holland et al. (2002); see e.g. Auch et al. (2006) and Göker & Grimm (2008) for applications. The matrix DV is 0.47, which is very high, even for a morphological matrix. The individual DVs (iDV) range between 0.417 and 0.577, which means that no set of OTUs provides a tree-like signal. The complete data are not tree-like, hence the failure to find unambiguous relationships, even when a comprehensive tree search and ‘implied weighting’ are used (see Tschopp et al. 2015). Extreme iDV (> 0.55) correlate with (relatively) high proportions of missing data (75–98%, i.e. 10–119 defined characters; Fig. 3), indicating that missing data are a problem for inferences and the calculation of the pairwise distance matrix.
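For readers unfamiliar with Delta Values, here is a minimal sketch of how they can be computed from a pairwise distance matrix, following my reading of Holland et al. (2002): for each quartet of taxa the three possible sums of pairwise distances are sorted, and the quartet delta is the gap between the two largest sums divided by the gap between the largest and the smallest. The matrix DV is the mean over all quartets, and the individual DV of a taxon is the mean over the quartets that include it. The function names and the two four-taxon distance matrices are illustrative assumptions, not the software or data used for the figures.

```python
from itertools import combinations

def quartet_delta(d, x, y, u, v):
    """Delta of one quartet: 0 = perfectly tree-like, 1 = maximally box-like."""
    m1, m2, m3 = sorted(
        [d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]],
        reverse=True,
    )
    return 0.0 if m1 == m3 else (m1 - m2) / (m1 - m3)

def delta_values(d):
    """Return (matrix DV, list of individual DVs) for a square distance matrix."""
    n = len(d)
    per_taxon = [[] for _ in range(n)]
    all_deltas = []
    for quartet in combinations(range(n), 4):
        delta = quartet_delta(d, *quartet)
        all_deltas.append(delta)
        for taxon in quartet:
            per_taxon[taxon].append(delta)
    matrix_dv = sum(all_deltas) / len(all_deltas)
    individual = [sum(v) / len(v) for v in per_taxon]
    return matrix_dv, individual

# Toy distances that fit the tree ((0,1),(2,3)) exactly.
tree_like = [[0, 2, 3, 3], [2, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]
# Toy distances along a four-membered cycle 0-1-2-3-0 (a pure "box").
box_like = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]

tree_dv, tree_idv = delta_values(tree_like)
box_dv, box_idv = delta_values(box_like)
```

The first toy matrix yields a DV of 0 and the second a DV of 1, which brackets the range within which the values reported above (0.47 for the full matrix) have to be read.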

Fig. 3 XY-plot showing the individual Delta Values (a measure for treelike signal) in relation to the proportion of missing data. The green "comfort zone" indicates iDVs favorable for tree-inference (based on personal experience).

Subsequent question: Why are the data not tree-like?

In his post, David listed four possible reasons for non-tree-like data:
  (a) uninformative data: a “bush”,
  (b) weakly tree-like data: a “tree obscured by vines”,
  (c) data containing several strongly incompatible relationships: a “structured network”,
  (d) confusing or random data: a “spider-web”.
Lacking branch-lengths, the MPT consensus network above provides no information regarding (a), and limited information regarding (b) and (c). Only (d) can be excluded as a main source of non-tree-like signal for the dinosaur data: higher-than-3-dimensional boxes are rare.

Fig. 4 Bootstrap (BS) consensus network based on 10,000 BS (pseudo)replicates. Trivial splits in grey, splits without strong alternatives in blue, conflicting splits (always two alternatives) in red. Splits found in less than 20% of the BS replicates are not shown, and edge lengths are proportional to the split frequencies.

Figure 4 shows the bootstrap support network based on 10,000 parsimony bootstrap pseudoreplicates (generated following Müller 2005). Some terminal sister relationships seen in the original, taxon-reduced, unweighted or weighted SC trees rely on quite robust, unconflicted signal; a few others are supported by only a small fraction of the characters, but all competing alternatives by even fewer (blue edges in the graph). Thus, it is a “Maybe” for (a) (see also Fig. 5), and a “Yes” for (b) (compare Figs 2 and 4). The character suites of many OTUs provide no robust signal to place them; their position in the set of trees is based on the signal of relatively (large matrix!) few characters, or is the result of branching artefacts as we force non-treelike data into a tree. The robust signal for some terminal clades may be obscured by ambiguous signal of potential additional members of the clade, or OTUs similar to only part of a clade (the “vines”).
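The principle behind the bootstrap pseudoreplicates can be sketched in a few lines: characters (matrix columns) are drawn with replacement until the pseudoreplicate has as many characters as the original matrix, and a tree is then inferred from each pseudoreplicate. The sketch below only covers the resampling step, with a made-up three-OTU matrix and a hypothetical function name; it does not reproduce the search settings of Müller (2005).

```python
import random

def bootstrap_replicate(matrix, rng):
    """Resample characters (columns) with replacement, keeping all OTUs."""
    n_chars = len(next(iter(matrix.values())))
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {otu: "".join(row[c] for c in columns) for otu, row in matrix.items()}

# Hypothetical toy matrix: three OTUs, five binary characters, '?' = missing.
matrix = {
    "OTU_1": "0011?",
    "OTU_2": "01?10",
    "OTU_3": "11010",
}

rng = random.Random(42)  # fixed seed so that the sketch is repeatable
replicates = [bootstrap_replicate(matrix, rng) for _ in range(100)]
```

Splits that depend on only a handful of characters will be missing from many pseudoreplicates, which is why they show up with low frequencies in a support network such as Fig. 4.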

We can also observe some pronounced 2-dimensional boxes: here the signal from the data matrix has no preference for a single alternative, but indicates two competing alternatives (red edges in the graph), i.e. also a possible “Yes” for (c). In the case of morphological data, reticulate signals do not necessarily indicate reticulation in an evolutionary sense. They can be triggered by two (more or less related) lineages evolving into the same morphospace, or the co-existence of ancestral and derived forms (see also this post). No spider-web-like portions (high-dimensional boxes) are seen (and are also largely missing from the MPT consensus network in Fig. 2), so we can exclude chaotic signal as reason (d) for the tree-unlikeliness of the data.

Fig. 5 Neighbour-net splits graph based on pairwise (Hamming) distances computed with PAUP* using the Tschopp et al. matrix.

Figure 5 shows the unfiltered, simple (Hamming) distance-based neighbour-net (NNet) for the same matrix. Mirroring the high matrix DV and iDVs, the NNet has only a few tree-like portions, but nevertheless reflects a high diversity: terminal edges are long, and pairwise distances range between 0 (no difference in data-covered characters) and 1 (all characters are different). Some OTUs are placed close to or in the boxy centre of the graph or the root trunks of terminal groups. Such a placement is either indicative of ancestry (see my earlier post), which is a special case of reason (c), or a lack of discriminative signal, i.e. reason (a) for non-treelike data. Here, it appears to be mostly the latter: the iDV are high, and the highest iDV relate to high proportions of missing data (more than 75%).

High proportions of missing data do not necessarily result in high DV (here 75% missing data equals c. 150 defined characters, which could be more than enough to place a taxon). But quite a few OTUs have zero pairwise distances to a set of diverse OTUs that are not closely related. In total, 74 of the 81 OTUs show a zero-distance to at least one other OTU, with Diplodocus YPM 1922 (98% missing data) being the most extremely non-distinct OTU: it has a zero-distance to 66 OTUs, including one outgroup taxon. Such a pattern is impossible from an evolutionary point of view (even an ancestor cannot be identical to all of its offspring once they have diversified), and is a missing data artefact. The NNet resolves this data insufficiency by placing the highly ambiguous OTUs in the centre of the graph, whereas parsimony (or other tree inference) deals with this effectively unsolvable problem by providing some, many, or all theoretically possible placements of the problematic OTU (the OTU turns ‘rogue’) as equally optimal (large fans in Fig. 2) but without support (Fig. 4).
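How such zero-distances arise can be shown with a toy sketch. It assumes that the pairwise (Hamming) distance is the proportion of differing characters among those scored in both OTUs, i.e. that characters missing in either OTU are simply skipped; the three OTUs and their scores are invented for illustration.

```python
MISSING = "?"

def mean_character_distance(a, b):
    """Proportion of differing characters among those scored in both OTUs."""
    shared = [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]
    if not shared:
        return None  # no overlapping characters: the distance is undefined
    return sum(x != y for x, y in shared) / len(shared)

# Hypothetical toy OTUs: two well-scored and clearly different,
# one scored for a single character only.
otus = {
    "complete_1": "00110",
    "complete_2": "11010",
    "sparse":     "???1?",
}

d_complete = mean_character_distance(otus["complete_1"], otus["complete_2"])
d_sparse_1 = mean_character_distance(otus["sparse"], otus["complete_1"])
d_sparse_2 = mean_character_distance(otus["sparse"], otus["complete_2"])
```

The sparse OTU is at distance 0 from both well-scored OTUs, although these differ from each other in three of five characters. With a single shared character, the distance says nothing about where the sparse OTU belongs.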

There are two options to infer phylogenetic trees, or to test alternative evolutionary hypotheses using Tschopp et al.’s matrix with its tree-unlike data.
  1. One is to reduce the taxon set to those OTUs with less than 50% of missing data, to produce a backbone tree or network (matrix DV = 0.28; iDV range between 0.219–0.352; Fig. 6). Then evaluate the position (or possible positions) of each other OTU within this backbone (using ‘+1 OTU’ neighbour-nets, parsimony-optimisation or algorithms such as the evolutionary placement algorithm implemented in RAxML; Berger & Stamatakis 2010; Berger, Krompass & Stamatakis 2011). Then finalise with group-restricted taxon and character subsets to study within-group relationships.
  2. The other is to cut the matrix into pieces and taxon sets with good data overlap. Then assess the correlation between these submatrices (e.g. using Pearson’s correlation coefficient) and their tree-likeness (using Delta Values). Then use consensus networks and/or supernetworks to investigate potential incongruences, and to summarise topological alternatives.
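The first step of option 1 amounts to a simple filter on the proportion of missing data per OTU. A minimal sketch, assuming the matrix is held as a dictionary of character strings with '?' for missing data (the OTU names, scores and function names are hypothetical):

```python
def missing_fraction(row, missing="?"):
    """Proportion of characters that are not scored for one OTU."""
    return sum(c == missing for c in row) / len(row)

def split_backbone(matrix, max_missing=0.5):
    """Separate well-scored backbone OTUs from those to be placed afterwards."""
    backbone = {otu: row for otu, row in matrix.items()
                if missing_fraction(row) < max_missing}
    to_place = {otu: row for otu, row in matrix.items() if otu not in backbone}
    return backbone, to_place

# Hypothetical toy matrix with eight characters per OTU.
matrix = {
    "well_scored":  "0110?101",
    "half_scored":  "01??1???",
    "barely_known": "???????1",
}

backbone, to_place = split_backbone(matrix)
```

Each OTU in `to_place` would then be added to the backbone one at a time, so that its possible positions can be judged without it disturbing the placement of the others.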

Fig. 6 Neighbour-net (NNet) for a taxon-reduced set, only including OTUs with more than 50% of defined characters. These data result in a single most-parsimonious tree, which is largely congruent with the main splits in the NNet (blue), except for three poorly supported branches (red). Numbers indicate neighbour-joining and parsimony bootstrap support for branches in the MPT and corresponding edges in the NNet and their alternatives.

Palaeontologists: Please stop using strict consensus trees, and start with EDA

To fill the deeper parts of the Tree of Life with life, we cannot get around morphological data and phylogenetic inferences based on these data. Most of Earth’s diversity is extinct, so their molecular data are (largely) lost to science. But no matter whether we work with extinct plants or animals, or with matrices containing many or few morphological characters, we should keep a close eye on the primary signals in those matrices. Are the data tree-like? Are there rogue taxa, and how/why do they affect the inferences? How discriminatory are the data regarding competing alternative hypotheses? Does taxon and character sampling matter? Networks (planar or n-dimensional) can help to: (1) assess the potential of the data for tree inference, and (2) discuss the putative monophyly of groups and their alternatives.

The signal from morphological data matrices is complex, and the data are rarely tree-like. Irrespective of whether one wants to stick with parsimony or not, tree-based and support consensus networks should by now have long replaced the strict (or majority-rule) consensus trees in “cladistic” or general-phylogenetic studies dealing with extinct groups of organisms.

A posteriori methods to filter or down-weight characters not fitting the inferred tree(s) ignore the fact that morphological differentiation typically cannot be explained by a single tree (leaving aside that total evidence and DNA-constrained analyses demonstrate that morphological evolution is not parsimonious at all). There are too many sources of signal incompatible with the true tree.

In the light of ambiguous and potentially biased signals (outlined and discussed by Tschopp et al. 2015 for their data), the focus of cladistic or other phylogenetic studies that aim to fill the Tree of Life with extinct branches cannot be to infer a clean(ed) tree. Instead, the focus should be on exploring the signals in the data and assessing their capacity to exclude or support evolutionary scenarios. A well understood topological uncertainty is always better than a poorly supported clade.

Regarding the Tree of Life, we should start representing uncertainty as-is (i.e. showing the currently competing alternatives), and reserve polytomies for cases where we really have no idea at all. Also, we should place potential ancestors (ancestral forms) where they belong: at the root nodes of their descendant lineages (the forms derived from them).


Auch AF, Henz SR, Holland BR, Göker M. (2006) Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences. BMC Bioinformatics 7:350.

Berger SA, Krompass D, Stamatakis A. (2011) Performance, accuracy, and web server for evolutionary placement of short sequence reads under Maximum Likelihood. Systematic Biology 60:291–302.

Berger SA, Stamatakis A. (2010) Accuracy of morphology-based phylogenetic fossil placement under Maximum Likelihood. IEEE/ACS International Conference on Computer Systems and Applications (AICCSA). Hammamet: IEEE. p. 1-9.

Felsenstein J. (2001) The troubled growth of statistical phylogenetics. Systematic Biology 50:465–467.

Felsenstein J. (2004) Inferring phylogenies. Sunderland, MA, U.S.A.: Sinauer Associates Inc.

Göker M, Grimm GW. (2008) General functions to transform associate data to host data, and their use in phylogenetic inference from sequences with intra-individual variability. BMC Evolutionary Biology 8:86.

Haeckel E. (1866) Generelle Morphologie der Organismen. Berlin: Georg Reiner.

Hennig W. (1950) Grundzüge einer Theorie der phylogenetischen Systematik. Berlin: Dt. Zentralverlag.

Holland B, Moulton V. (2003) Consensus networks: A method for visualising incompatibilities in collections of trees. In: Benson G, and Page R, eds. Algorithms in Bioinformatics: Third International Workshop, WABI, Budapest, Hungary Proceedings. Berlin, Heidelberg, Stuttgart: Springer Verlag, p. 165–176.

Holland BR, Huber KT, Dress A, Moulton V. (2002) Delta Plots: A tool for analyzing phylogenetic distance data. Molecular Biology and Evolution 19:2051-2059.

Müller KF. (2005) The efficiency of different search strategies for estimating parsimony, jackknife, bootstrap, and Bremer support. BMC Evolutionary Biology 5:58.

Scotland RW, Steel M. (2015) Circumstances in which parsimony but not compatibility will be provably misleading. Systematic Biology 64:492–504. [preprint]

Tschopp E, Mateus O, Benson RBJ. (2015) A specimen-level phylogenetic analysis and taxonomic revision of Diplodocidae (Dinosauria, Sauropoda). PeerJ 3:e857.

Post-script: Why distance-based approaches?

Distance-based approaches may still be rejected by hard-core cladists as “unphylogenetic” or “phenetic” (again, see Felsenstein 2004 for the historical reasons, and why this is wrong), particularly when acting as anonymous reviewers of palaeontological papers. But the simple fact is: a character matrix that does not allow inference of a pairwise distance matrix with at least some tree-like signal should not be used to infer phylogenetic trees (no matter which optimality criterion is used).

A perfect character matrix, i.e. a matrix in which each dichotomy is subsequently followed by one or several strictly synapomorphic changes will, of course, result in a single MPT. But it will also provide a simple (Hamming) mean distance matrix allowing us to infer a neighbour-joining tree fulfilling the least-squares or minimum evolution optimality criteria, and this will be identical to the MPT and a corresponding NNet without any box-like portions. It will also be the most probable topology that can be inferred using maximum likelihood or Bayesian inference.
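This can be checked on a toy example. The sketch below builds a perfect binary matrix for the four-taxon tree ((A,B),(C,D)), with one character per terminal branch plus one synapomorphy for A+B, and tests whether the Hamming distances satisfy the four-point condition (the two largest of the three pairwise-distance sums are equal), which is what makes a distance matrix exactly tree-like. Adding a single homoplasious character shared by A and C breaks the condition. Taxa, characters and function names are made up for illustration.

```python
def hamming(a, b):
    """Number of characters in which two OTUs differ."""
    return sum(x != y for x, y in zip(a, b))

def four_point_holds(m, w, x, y, z):
    """True if the two largest sums of pairwise distances are equal."""
    sums = sorted([
        hamming(m[w], m[x]) + hamming(m[y], m[z]),
        hamming(m[w], m[y]) + hamming(m[x], m[z]),
        hamming(m[w], m[z]) + hamming(m[x], m[y]),
    ])
    return sums[1] == sums[2]

# Perfect matrix for ((A,B),(C,D)): four autapomorphies, one synapomorphy of A+B.
perfect = {
    "A": "10001",
    "B": "01001",
    "C": "00100",
    "D": "00010",
}

# Same matrix plus one homoplasious character shared by A and C.
with_homoplasy = {
    "A": "100011",
    "B": "010010",
    "C": "001001",
    "D": "000100",
}

perfect_is_treelike = four_point_holds(perfect, "A", "B", "C", "D")
homoplasy_is_treelike = four_point_holds(with_homoplasy, "A", "B", "C", "D")
```

For the perfect matrix the three sums are 4, 6 and 6, so the distances fit the tree exactly; with the extra character they become 6, 6 and 8, and a neighbour-net would show a box instead.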

When different tree inference methods come to substantially different results for morphological matrices, the signal from the primary matrix is likely not to be tree-like, and internal conflict then needs to be explored. The more tree-like the matrix is, the less it will be affected by methodological differences (e.g. Fig. 6; the only branches of the MPT not fitting the preferred splits in the NNet have low support, and compete with equally low supported splits seen in the NNet that receive high support from NJ-bootstrapping).

Distance-based analyses are much faster than parsimony, maximum likelihood, and Bayesian inferences; and they are not restricted to inferring phylogenetic trees. Within the same time that I need to perform a comprehensive tree and branch support analysis, I can generate hundreds of NNets using different taxon and character subsets of my matrix, and thus explore its many signals. One can employ different distance measures to deal with continuous or ordered categorical data, and then directly see the effect on the reconstruction. Eventually, one may find a subset that provides the most tree-like signal, which will be the best possible basis for the final tree-inference (in case an evolutionary tree is what is wanted) and branch support analysis.

Tuesday, August 22, 2017

Unattested character states

In an earlier post from January 2016, I argued that it is important to account for directional processes when modeling language history through character-state evolution. In previous papers (List 2016; Chacon and List 2015), I tried to show that this can be easily done with asymmetric step matrices in a parsimony framework. Only later did I realize that this is nothing new for biologists who work on morphological characters, thus supporting David's claim that we should not compare linguistic characters with the genotype, but with the phenotype (Morrison 2014). Early this year, a colleague introduced me to Mk-models in phylogenetics, which were first introduced by Lewis (2001) and allow analysis of multi-state characters in a likelihood framework.

What surprised me is that Mk-models seem to outperform parsimony frameworks, despite being much simpler than the elaborate step-matrices defined for morphological characters (Wright and Hillis 2014). Today, I read that a recent paper by Wright et al. (2016) even shows how asymmetric transition rates can be handled in likelihood frameworks.
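To give an impression of how simple the basic Mk-model is, here is a minimal sketch of its transition probabilities, as I understand the model: a character has k states, and a change from any state to any particular other state happens at the same rate. The function name and the parameter `rate` (the rate of change towards one particular other state) are my own labels, and other parameterisations scale the rate differently.

```python
import math

def mk_transition_prob(k, rate, t, same_state):
    """Probability of ending in the same or in one given other state after time t.

    k: number of character states; rate: rate of change to each single
    other state; t: branch length (time).
    """
    decay = math.exp(-k * rate * t)
    if same_state:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k

# Example: a three-state character on a short and on a very long branch.
p_stay_short = mk_transition_prob(3, 0.1, 1.0, same_state=True)
p_move_short = mk_transition_prob(3, 0.1, 1.0, same_state=False)
p_stay_long = mk_transition_prob(3, 0.1, 1000.0, same_state=True)
```

On very long branches all states become equally probable (1/k), which is the likelihood counterpart of saturation. Nothing in this symmetric formulation encodes direction, which is what the asymmetric extension would have to add.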

Being by no means an expert in phylogenetic analyses, especially not in likelihood frameworks, I tend to have a hard time understanding what is actually being modeled. However, if I correctly understand the gist of the Wright et al. paper, it seems that we are slowly approaching a situation in which more complex scenarios of lexical character evolution in linguistics no longer need to rely on parsimony frameworks.

But, unfortunately, we are not there yet; and it is even questionable whether we will ever be. The reason is that all multi-state models that have been proposed so far only handle transitions between attested characters: unattested characters can neither be included in the analyses nor can they be inferred.

I have pointed to this problem in some previous blogposts, the last one published in June, where I mentioned Ferdinand de Saussure (1857-1913), who postulated two unattested consonantal sounds for Indo-European (Saussure 1879), of which one was later found to have survived in Hittite, a language that was deciphered and shown to be Indo-European only about 30 years later (Lehmann 1992: 33).

The fact that it is possible to use our traditional methods to infer unattested sounds from circumstantial evidence, but not to include our knowledge about them into phylogenetic analyses, is a huge drawback. Potentially even greater are the situations where even our traditional methods do not allow us to infer unattested data. Think, for example, of a word that was once present in some language but was later completely lost. Given the ephemeral nature of human language, we have no way to know this, but we know very well that it easily happens when just thinking of some terms used for old technology, like walkman or soon even iPod, which the younger generations have never heard about.

Colleagues with whom I have discussed my concerns in this regard are often more optimistic than I am, saying that even if the methods cannot handle unattested characters they could still find the major signal, and thus tell us at least the general tendency as to how a language family evolved. However, for classical linguists, who can infer quite a lot using the laborious methods that still need to be applied manually, it leaves a sour taste if they are told that the analysis deliberately ignored crucial aspects of the processes and phenomena they understand very well. For example, if we detect that some intelligence test is right in about 80% of all cases, we would also abstain from using it to judge who we allow to take up their studies at university.

I also think that it is not a satisfying solution for the analysis of morphological data in biology. It is quite likely that some ancient species had certain traits that later evolved into the traits we observe, but that are themselves no longer attested anywhere, either in fossils or in the genes. I also wonder how well phylogenetic frameworks generally account for the fact that the evidence we are left with may reflect much less than what was once there.

In Chacon and List (2015), we circumvent the problem by adding ancestral but unattested sounds to the step matrices in our parsimony analysis. This is of course not entirely satisfactory, as it adds a heavy bias to the analysis of sound change, which no longer tests for all possible solutions but only for the ones we fed into the algorithm. For sound change, it may be possible to substantially expand the character space by adding sounds attested across the world's languages, and then having the algorithms select the most probable transitions. But given that we still barely know anything about general transition probabilities of sound change, and that databases like Phoible (Moran et al. 2014) list more than 2,000 different sounds for a bit more than 2,000 languages, it seems like a Sisyphean challenge to tackle this problem consistently.
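The idea can be sketched with the Sankoff algorithm for weighted parsimony, in which each state of an ancestral node is scored by the cheapest way of deriving all descendants from it. The sketch is a toy and not the implementation used in Chacon and List (2015): the three sounds, the tree, and the step costs are invented for illustration, with p unattested in the "languages" at the tips but cheap as a source of both f and b.

```python
import math

STATES = ["p", "f", "b"]

# Hypothetical asymmetric step matrix: STEP[parent][child] = cost of the change.
# p > f and p > b are cheap; every other change is expensive.
STEP = {
    "p": {"p": 0, "f": 1, "b": 1},
    "f": {"p": 3, "f": 0, "b": 3},
    "b": {"p": 3, "f": 3, "b": 0},
}

def sankoff(node):
    """Return the minimal cost of the subtree for each state at this node.

    A leaf is a string (the attested sound); an inner node is a tuple of children.
    """
    if isinstance(node, str):
        return {s: (0 if s == node else math.inf) for s in STATES}
    child_costs = [sankoff(child) for child in node]
    return {
        s: sum(min(STEP[s][t] + costs[t] for t in STATES) for costs in child_costs)
        for s in STATES
    }

# Four attested reflexes in two subgroups; "p" itself is attested nowhere.
tree = (("f", "b"), ("f", "b"))
root_costs = sankoff(tree)
best_state = min(root_costs, key=root_costs.get)
```

With these costs the unattested p is the cheapest ancestral sound (4 steps against 6 for f or b). It could only be reconstructed because we put it into the state list beforehand, which is exactly the bias described above.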

What can we do in the meantime? Not very much, it seems. But we can still try to improve our methods in baby steps, trying to get a better understanding of the major and minor processes in linguistic and biological evolution; and not forgetting that, although I was only talking about phylogenetic tree reconstruction, in the end we also want to have all of this done in network approaches.

  • Chacon, T. and J.-M. List (2015) Improved computational models of sound change shed light on the history of the Tukanoan languages. Journal of Language Relationship 13: 177-204.
  • Lehmann, W. (1992) Historical linguistics. An Introduction. Routledge: London.
  • Lewis, P. (2001) A likelihood approach to estimating phylogeny from discrete morphological character data. Systematic Biology 50: 913-925.
  • List, J.-M. (2016) Beyond cognacy: Historical relations between words and their implication for phylogenetic reconstruction. Journal of Language Evolution 1: 119-136.
  • Moran, S., D. McCloy, and R. Wright (eds) (2014) PHOIBLE Online. Max Planck Institute for Evolutionary Anthropology: Leipzig.
  • Morrison, D.A. (2014) Are phylogenetic patterns the same in anthropology and biology? bioRxiv.
  • Saussure, F. (1879) Mémoire sur le système primitif des voyelles dans les langues indo-européennes. Teubner: Leipzig.
  • Wright, A. and D. Hillis (2014) Bayesian analysis using a simple likelihood model outperforms parsimony for estimation of phylogeny from discrete morphological data. PLoS ONE 9.10. e109210.
  • Wright, A., G. Lloyd, and D. Hillis (2016) Modeling character change heterogeneity in phylogenetic analyses of morphology through the use of priors. Systematic Biology 65: 602-611.

Tuesday, August 15, 2017

Is reticulation as important in rice as in wheat?

I have previously discussed the use of phylogenetic networks to study the Complex hybridizations in wheat, due to the very reticulate evolutionary history. It seems that the situation for the other major world food source, rice, also requires network analysis, although this time introgression is the biological source of reticulation, rather than hybridization.

Jae Young Choi, Adrian E. Platts, Dorian Q. Fuller, Yue-Ie Hsing, Rod A. Wing, and Michael D. Purugganan (2017) The rice paradox: multiple origins but single domestication in Asian rice. Molecular Biology & Evolution 34: 969-979.

The authors note:
The Asian rice Oryza sativa is the world’s most important food crop, and is a staple for more than one-third of the world’s population. Oryza sativa is genetically differentiated into several groups, the main ones being japonica and indica, which have been considered as subspecies / subpopulations with distinct morphological and physiological characteristics

The origin of domesticated Asian rice has been a contentious topic, with conflicting evidence for either single or multiple domestication of this key crop species. We examined the evolutionary history of domesticated rice by analyzing de novo assembled genomes from domesticated rice and its wild progenitors. Our results indicate multiple origins, where each domesticated rice subpopulation (japonica, indica, and aus) arose separately from progenitor O. rufipogon and / or O. nivara.

We also show that there is significant gene flow from japonica to both indica (c. 17%) and aus (c. 15%), which led to the transfer of domestication alleles from early-domesticated japonica to proto-indica and proto-aus populations. Our results provide support for a model in which different rice subspecies had separate origins, but that de novo domestication occurred only once, in O. sativa ssp. japonica, and introgressive hybridization from early japonica to proto-indica and proto-aus led to domesticated indica and aus rice.
Similar reticulation histories have, of course, been reported for most domesticated organisms (see Are phylogenetic trees useful for domesticated organisms?), including dogs, cattle, horses, sheep, grapes, etc.

Tuesday, August 8, 2017

Where to retire - a network analysis

I am an elderly man, and it is getting towards time to retire. But where?

I could retire back in Australia; but, as Thomas Wolfe said: "You can't go home again." I could retire in Sweden, but the tax authorities are likely to then take 25% of my pension, which I need to be living on, instead. So, where to go?

This is a question that has occupied the minds of many people, for themselves as well as others; and so, inevitably, you will find web sites on the matter. For example, Live and Invest Overseas has a Retire Overseas Index, recommending particular places, which it updates annually; and International Living has a similar Annual Global Retirement Index.

To help me in my decision, let's look at the International Living data, The World’s Best Places to Retire in 2017. This site provides a rating (out of 100) of ten important characteristics, for 24 countries that might be of interest to retirees:
  • Benefits & discounts
  • Buying & renting
  • Climate
  • Cost of living
  • Entertainment & amenities
  • Fitting in
  • Health care
  • Healthy lifestyle
  • Infrastructure
  • Visas & residence
For 2017, the individual scores vary from 57-100, with "Benefits & discounts" and "Cost of living" varying the most between countries, and "Fitting in" and "Health care" varying the least.

The ten scores for each country can be averaged, to provide a rank ordering of the 24 countries. These average scores vary from 73.3 to 90.9, as shown in the first graph.

There is little to choose between the first three countries in terms of their average score (Ecuador, Mexico, Panama), nor between the next three (Colombia, Costa Rica, Malaysia). But this does not make these countries intrinsically equal. After all, both Panama and Ecuador handsomely outdo Mexico on "Benefits & discounts", while Mexico does better on "Cost of living". I need an analysis that takes into account which characteristics differ between the countries.

This is where a network analysis comes in handy, as a tool for exploratory data analysis. As usual in this blog, I have calculated the Manhattan distance pairwise between the countries; and I am displaying this in the next figure using a NeighborNet network. Countries that have similar retirement characteristics are near each other in the network; and the further apart they are in the network then the more different are their characteristics.
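The calculation behind this step is simple, as the following sketch shows. The three countries and their scores are invented (they are not the International Living ratings) and were chosen so that all three have the same average score.

```python
from itertools import combinations

def mean_score(scores):
    """Average rating of one country across all characteristics."""
    return sum(scores) / len(scores)

def manhattan(a, b):
    """Sum of absolute differences across all characteristics."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical ratings for three characteristics per country.
ratings = {
    "Country A": [90, 70, 80],
    "Country B": [70, 90, 80],
    "Country C": [88, 72, 80],
}

averages = {name: mean_score(scores) for name, scores in ratings.items()}
distances = {
    (a, b): manhattan(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}
```

All three toy countries average 80, yet A and C are nearly identical (distance 4) while B is far from both (40 and 36). The pairwise distances keep this information, and the network displays it.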

The countries are color-coded by geography, which shows that their actual location has little effect on the Retirement Index. However, the European countries are gathered at the bottom-left, without any representative from Asia. The six top-ranked countries are all clustered in the bottom-right of the network.

Next to this top-rank cluster come Portugal and Spain on one hand, and Nicaragua on the other. These three countries have similar Retirement Scores, but they are separated in the network because Nicaragua scores poorly on "Infrastructure" and "Health care", but better than Europe on "Cost of living", "Buying & renting" and "Healthy lifestyle".

Spain does better than Portugal on "Entertainment & amenities"!

All in all, Portugal looks like a good bet to me. The Live and Invest Overseas site lists individual places to retire, not just countries, and for the past three years it has recommended the Algarve region in Portugal as the top location.

Importantly, the Portuguese also won't tax my pension (Pension i Portugal ger skattefria miljoner), although the Swedish government is not happy about this, of course (Skattefrihet ska stoppas: Portugal till förhandlingsbordet).

Tuesday, August 1, 2017

Stacking neighbour-nets: a real-world example

In my last post, I outlined two ideas about how stacking neighbour-nets can assist in tracing evolutionary change over time, using a theoretical example. In this post, I will show how this could work using a (tricky) real-world example: a morphological matrix including a high proportion of fossil taxa and a good deal of (strongly) homoplasious characters (Bomfleur, Grimm & McLoughlin 2017).

Stacking can be valuable when both fossil and extant taxa are included in the study. The idea of stacking is to construct networks for each time slice, rather than creating one giant network that tries to encompass everything. Adjacent time-slice networks can then be directly compared, which should reveal the evolutionary changes that occurred between those two times. The final phylogeny can then be constructed from this information, including all of the extant taxa and fossils together.

I regard our work as quite innovative for a palaeobotanical/-phylogenetic systematic study, as it generated a taxon-dense dataset down to species (sometimes individual specimens) as ‘operational taxonomic units’ (OTUs). Our goal was to provide a unifying classification for extant and fossil Osmundales (royal ferns) rhizomes. The primary purpose is hence not to infer a phylogenetic tree but to assist in describing and placing new-found rhizome fossils in the phylogeny. The placement workflow (see this tutorial) combines a polytomous key (using conserved, lineage-diagnostic traits) with neighbour-nets that use different taxon sets. We discussed odd placements in the splits graphs, and matrix signal quality (robustness) from differential branch support, as estimated by non-parametric bootstrapping (least-squares, maximum likelihood, maximum parsimony).

Sources of incompatible data patterns in real-world data

The main problem with real-world data when it comes to inferring phylogenetic relationships, i.e. estimating the true phylogeny, is incompatible data patterns. For molecular matrices, the two main sources of signals that will be incompatible with the true phylogeny are back-mutations and model-bias. For instance, there is usually a higher probability for transitions than for transversions; and for coding gene regions, the 3rd codon position can become over-saturated and thus stochastically distributed, providing little phylogenetic signal. By adapting the model in a probabilistic environment, we can (try to) counter such biases during inference.

In the case of morphological (or other non-molecular) traits, incompatible signals arise from:
  1. homoplasious characters – traits that evolve convergently or in parallel, which are frequently included in such matrices;
  2. epigenetic effects – morphological traits not, or not fully, controlled by the genetic composition of the organism; and
  3. pseudo-homologies – traits that are seemingly the same but are the endpoint of different evolutionary pathways.
Inferring a tree reflecting the true phylogeny from such a matrix may be very difficult or even impossible. For a perfect probabilistic approach, we would need to establish character-wise probabilities for change, which requires that a lineage has a modern-day diversity fairly matching that in the past.

Fossils add further sources of signals incompatible with the true phylogeny, such as: preservation artefacts and misinterpretations (false homologies); uncertainty linked to heterochrony; and, last but not least, ‘temporal’ convergences, i.e. the parallel or convergent evolution of the same (or similar) trait in an ancient sister or unrelated lineage of a modern (or much younger) lineage.

For all of these aspects, the royal fern rhizomes provide a nice example (i.e. a bad-case scenario). Only a few of the 45 scored traits that can be observed in fossil material are conserved within the modern lineages and their extant representatives, and hence are of high diagnostic value for assigning fossils to one of these lineages. Many other rhizome features are variable within extant members of the now six genera (some even within a species), and increasingly so looking back into the past.

The royal ferns became arborescent several times, as reflected by convergent adaptations in rhizome anatomy — highly complex stele architectures are found from the Permian onwards in (morpho)species that differ in all relatively stable, lineage-diagnostic traits. The most complex modern-day rhizomes have anatomies that appear to be less derived than those of some of their ancient counterparts. Nonetheless, the rhizomes, scored for 129/130 OTUs (fossil species, partly referring to individual specimens) in our matrix (click here for an annotated version for use with Mesquite), reflect a substantial past diversity and cover more than 250 million years of evolution.

Basic data situation

The all-inclusive neighbour-net (Fig. 1; see here for a fully annotated version) captures aspects of similarity patterns related to phylogenetic relationships, but does not clearly resolve the known (modern) or putative (extinct) genera within the core group Osmundoideae, for example. Overall branch-support is generally low for any alternative (details can be found here), independent of the optimality criterion used. [For our systematic treatment, we used data subsets to generate a series of networks including only members of the same (putative) lineage, which were increasingly proficient at sorting the OTUs.]

The main problems are: (i) the differentiation between less-derived rhizome anatomies of the Osmundoideae found in the likely paraphyletic extinct genus Millerocaulis (pink in Fig. 1) and the modern genus Claytosmunda (magenta, paraphyletic with one survivor); and (ii) the distinctness and superficial similarity of two arborescent lineages, the genus Osmundacaulis (red) and the extinct (Permian to Jurassic) family Guaireaceae (greenish). They differ in all stable, lineage-diagnostic characters but share highly dissected steles. Phylogenetic trees "resolve" this conflict by creating an artificial clade (e.g. the parsimony cladogram by Wang et al. 2014). The neighbour-net (Fig. 1) places Osmundacaulis between the Guaireaceae and the Osmundoideae, the subfamily of Osmundaceae including the surviving modern genera.

Fig. 1. Neighbour-net based on a morphological distance matrix of 122 OTUs representing Permian to extant Osmundales and their putative relatives, the Grammatopteridales (black).

Stacking procedure one: identifying closest relatives in subsequent time-slices

Signal ambiguity (from homoplastic characters and the related resolution issue) also affects the time-wise networks to some degree. Figures 2–4 show the network-per-time-slice stacks. Each neighbour-net includes only the OTUs from one stratigraphic period (Permian, Triassic, Jurassic, Cretaceous, Paleogene + Neogene) and the modern-day survivors. For simplicity, links are only established for the closest potential relative in the subsequent or preceding time-slice; and only shown when the mean morphological distance (MD) does not exceed 0.25. The colouring of the dots reflects the systematic affinity of the taxon as established by Bomfleur et al. and shown in Fig. 1.
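The linking rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to draw Figs 2–4: the function name `closest_links`, the toy OTU names and the toy distances are all hypothetical, and the threshold is read as "MD ≤ 0.25".

```python
def closest_links(dist, older, younger, max_md=0.25):
    """Link each OTU to its closest match in the adjacent time-slice.

    dist    -- dict mapping frozenset({a, b}) to a pairwise distance
    older   -- OTU names in the older time-slice
    younger -- OTU names in the younger time-slice
    Returns {(old_otu, young_otu): distance} for links with distance <= max_md.
    """
    if not older or not younger:
        return {}

    def d(a, b):
        return dist[frozenset((a, b))]

    links = {}
    # Every older OTU looks forward to its closest younger OTU ...
    for a in older:
        b = min(younger, key=lambda y: d(a, y))
        if d(a, b) <= max_md:
            links[(a, b)] = d(a, b)
    # ... and every younger OTU looks back to its closest older OTU.
    for b in younger:
        a = min(older, key=lambda o: d(o, b))
        if d(a, b) <= max_md:
            links[(a, b)] = d(a, b)
    return links


# Hypothetical toy data: two Triassic and two Jurassic OTUs.
triassic = ["T1", "T2"]
jurassic = ["J1", "J2"]
toy_dist = {
    frozenset(("T1", "J1")): 0.10,
    frozenset(("T1", "J2")): 0.30,
    frozenset(("T2", "J1")): 0.20,
    frozenset(("T2", "J2")): 0.40,
}
links = closest_links(toy_dist, triassic, jurassic)
```

In this toy case J2 stays unlinked, because its closest older match (T1, at 0.30) lies above the threshold, while J1 attracts links from both Triassic OTUs.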

A major taxonomic turnover characterises the transition from the (late) Permian to the Triassic (Fig. 2). The most primitive (rhizome-wise) Osmundales, the Thamnopterioideae (brown) become extinct, and are completely replaced by the Osmundoideae, their modern counterparts. The only representative of the Permian diversity remaining in the Triassic appears to be Millerocaulis (?Palaeosmunda) stipabonnetiorum, and this may provide a good taxon for rooting the Triassic phylogeny. However, it is also one of the worst-preserved and most poorly described taxa. To some degree, its similarity with both lineages of Permian Osmundaceae (Thamnopterioideae and Palaeosmunda) may hint that the distances are under-estimated, since traits that would otherwise increase the distances could not be scored.

Fig. 2. Taxon-reduced neighbour-nets, including only species from the same time-slice (as labelled). Inter-time-slice links indicate the morphologically closest match in the preceding or following time-slice for each species (in case of pairwise distances < 0.25)

The Jurassic graph (in Fig. 2) highlights a decrease in overall diversity, despite the much higher numbers of OTUs. The links can help to establish relationships between congeners of both time-slices; but for Osmundastrum (today represented by a single, genetically and morphologically derived species) a more pronounced evolutionary shift is indicated: the Triassic putative member is linked to Jurassic Millerocaulis species (a paraphyletic Osmundoideae genus defined by the absence of a trait found in all extant genera), which are relatively close to the first unambiguous Osmundastrum. We also find that the three Jurassic newcomers have little relation to the Triassic basis (Fig. 2).

The linking of the Jurassic and Cretaceous time-slices highlights (Fig. 3) a general weakness of the approach using this matrix: more poorly preserved, incompletely described fossils included in the matrix (Cretaceous Millerocaulis) attract most links from the Jurassic Osmundoideae because their distances are under-estimated.
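Why incompletely scored fossils attract links can be illustrated with a small sketch. It assumes, purely for illustration, that the morphological distance is a mean character difference computed over the characters scored in both OTUs; the distance actually used for the matrix may be defined differently, and the character strings below are made up.

```python
def mean_morph_distance(a, b, missing="?"):
    """Proportion of differing characters among those scored in both OTUs.

    a, b -- equal-length strings of character states; `missing` marks unscored cells.
    Returns None when the two OTUs share no scored character.
    """
    compared = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not compared:
        return None
    return sum(x != y for x, y in compared) / len(compared)


# Hypothetical character strings (10 binary characters).
well_preserved = "0110100110"
same_taxon_fully_scored = "0100110010"   # differs in 3 of 10 characters
same_taxon_poorly_scored = "01?01?0010"  # two of the differing characters unscored

full = mean_morph_distance(well_preserved, same_taxon_fully_scored)
partial = mean_morph_distance(well_preserved, same_taxon_poorly_scored)
```

Here the fully scored comparison gives 0.3, whereas the same fossil with two diagnostic characters missing appears much closer (0.125), so it would win more "closest relative" links than it deserves.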

Fig. 3. As above, but linking the Jurassic and subsequent Cretaceous neighbour-nets. Note the decreasing diversity but clear signals for Osmundacaulis (red) in contrast to the group of modern Osmundoideae (purplish). Plenasium (light blue) is a modern arborescent genus with complex and highly dissected steles and generally derived rhizomes.

The two Osmundastrum, which are probably part of the same evolutionary lineage, are not linked (see Bomfleur, Grimm & McLoughlin 2015 for the reasons). Two modern lineages with more or strongly derived rhizomes appear in the Cretaceous, the Todinae and Plenasium.

In the case of the Todinae, the Jurassic links are partly ambiguous, with one Cretaceous OTU linked to Jurassic Claytosmunda (part of the Todinae’s sister clade according to molecular data), and the other to some relatively distinct Millerocaulis. The problem here is that the Todinae may have diverged earlier (Bomfleur, Grimm & McLoughlin 2015; Grimm et al. 2015), but their rhizome fossils have so far not been found (or lack the diagnostic characters of the lineage). Gaps in the fossil record can hinder establishing meaningful links. The links are, however, to a group of Millerocaulis that are closer to coeval Claytosmunda – which show a rhizome anatomy that may be closest to that of the common ancestor of all modern-day royal ferns – than to their congeners. In the case of Plenasium, the genus with the most-derived rhizomes of all modern Osmundaceae, the closest older relative is part of the same subgroup of Millerocaulis. These potentially false links may reflect that some Millerocaulis show derived character suites, which are typically also found in one or another modern Osmundaceae genus (similarity due to convergence).

The closer we get to the modern-day situation, the more interpretable the links become (Fig. 4). Lineages with distinct and derived rhizome anatomies such as Osmundastrum and Plenasium are linked across time-slices. Cross-generic links from Cretaceous Millerocaulis to Paleogene-Neogene Osmunda to modern-day Claytosmunda relate directly to higher numbers of shared, possibly primitive characters in the connected taxa; these links can again be informative for rooting the graphs. Substantially weaker links (mean morphological distances > 0.1 between time-slices) are found for distantly related pairings (Cretaceous and extant Todinae with Paleogene-Neogene Osmundastrum and Claytosmunda).

Fig. 4. As above, but for Cretaceous to modern-day.

Stacking procedure two: graphs including taxa of two subsequent time-slices

Figures 5 and 6 show the two-adjacent-time-slices-per-graph stacks. Interpretation of these figures is more straightforward: one just compares the placement of the connecting taxa (Triassic and Jurassic in Fig. 5; Paleogene and Neogene in Fig. 6). The resolution issue regarding the relationship between Millerocaulis and genera representing the modern lineage (Claytosmunda, Osmundastrum, Plenasium, Leptopteris, Todea) is obvious: the Triassic Millerocaulis are clustered in the Permo-Triassic graph, but are placed apart within the spider-web-like portion in the Triassic-Jurassic graph (Fig. 5). This could mean that several lineages of Millerocaulis diversified in the Jurassic, all of which have their roots in the Triassic. Some of the emerging Millerocaulis groups remain coherent in the Jurassic-Cretaceous graph (and can include Cretaceous species), but their position relative to each other can change. In contrast, for Osmundacaulis the Cretaceous newcomers simply fit into the existing organisation.
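The bookkeeping behind this second stacking procedure is simple enough to sketch. The snippet below only assembles the taxon sets for each two-slice graph and lists the taxa that connect neighbouring graphs; the function names and the OTU labels are hypothetical, and the splits graphs themselves would still be inferred separately for each taxon set.

```python
def two_slice_graphs(slices):
    """Combine each pair of adjacent time-slices into one taxon set.

    slices -- list of (slice_name, otu_names), ordered from oldest to youngest
    Returns a list of (label, set_of_otus), one entry per adjacent pair.
    """
    graphs = []
    for (name_a, otus_a), (name_b, otus_b) in zip(slices, slices[1:]):
        graphs.append((f"{name_a}-{name_b}", set(otus_a) | set(otus_b)))
    return graphs


def connecting_taxa(graphs):
    """Taxa shared by each pair of neighbouring graphs (used to draw the connections)."""
    return [sorted(a & b) for (_, a), (_, b) in zip(graphs, graphs[1:])]


# Hypothetical toy data, oldest slice first.
slices = [
    ("Permian", ["P1", "P2"]),
    ("Triassic", ["T1", "T2"]),
    ("Jurassic", ["J1"]),
    ("Cretaceous", ["C1"]),
]
graphs = two_slice_graphs(slices)
shared = connecting_taxa(graphs)
```

With this toy input, the Permian-Triassic and Triassic-Jurassic graphs are connected by the Triassic OTUs, and the Triassic-Jurassic and Jurassic-Cretaceous graphs by the Jurassic OTU.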

Fig. 5. Stack of neighbour-nets comprising species of two subsequent time-slices, covering the time from the Permian to the Cretaceous. Connections relate to Triassic (lower half) or Jurassic (upper half) species that are included in two subsequent splits graphs.

The transition from the Cretaceous to the modern-day situation (Fig. 6) fairly reflects what could be inferred by mapping morphological characters onto the molecular tree. The placement of Osmunda species in the graphs reflects evolutionary change towards the modern-day species, whereas stasis can be assumed for Osmundastrum, and a loss of diversity for Claytosmunda. According to the structures of the graphs, the modern-day Plenasium (subgenus Plenasium) replaced the more diverse (and partly more derived) Cretaceous-Paleogene Plenasium (subgenus Aurealcaulis); but the genus is absent from the Neogene, so there are no connections between the ‘65–5 Ma’ and ‘last 25 Ma’ graphs.

Fig. 6. As above, but covering the time from the Cretaceous to now. Connections refer to Paleogene (lower half) and Neogene (upper half) species.

Now that it’s done, what can be said?

Establishing similarity links across time-slices can be tedious or even misleading, especially with increasing numbers of taxa and increasing complexity of the signals in the matrix (Figs 2–3). The process is more time-consuming and the result (Figs 2–4) is graphically more challenging than the alternative stacking procedure (Figs 5–6).

With most real-world data, it may be difficult to get a set of links between time slices that reflects the true phylogeny, as it did in my earlier theoretical example. Nonetheless, the procedure can help to identify potential relatives (ancestors, descendants, sister lineages) of groups that are restricted to a single time slice, or highlight the lack of potential or favourable candidates.

However, in general, joining the taxa from two subsequent time-slices in one graph, and connecting these graphs by the shared taxa, seems to be a more feasible and straightforward approach. Once a matrix is compiled, the distance calculation and splits-graph inference are a matter of minutes, and it takes less than half an hour to produce a first graphical output using the graphical functions in SplitsTree and software to graphically stack the exported SVG or EPS files (further beautification may take a day). Taxa with odd signals (with ambiguous affinity) will be placed accordingly in the nets and eventually move around in the two containing graphs (Fig. 5), and the amount of evolutionary change across time may be directly visible (Fig. 6).
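For readers who compute their distances outside SplitsTree, a distance matrix can be handed over as a NEXUS file. The following is a minimal sketch, assuming the common NEXUS layout with a taxa block and a full-square distances block; the function name `to_nexus_distances` and the toy labels are hypothetical, and the exact block options your SplitsTree version expects should be checked against its manual.

```python
def to_nexus_distances(labels, matrix):
    """Format a full, symmetric distance matrix as a NEXUS string.

    labels -- list of OTU names (quoted in the output, so spaces are allowed)
    matrix -- list of rows, matrix[i][j] = distance between labels[i] and labels[j]
    """
    n = len(labels)
    lines = ["#NEXUS", "BEGIN taxa;", f"DIMENSIONS ntax={n};", "TAXLABELS"]
    lines += [f"'{name}'" for name in labels]
    lines += [
        ";",
        "END;",
        "BEGIN distances;",
        f"DIMENSIONS ntax={n};",
        "FORMAT labels=left diagonal triangle=both;",
        "MATRIX",
    ]
    for name, row in zip(labels, matrix):
        lines.append(f"'{name}' " + " ".join(f"{value:.4f}" for value in row))
    lines += [";", "END;"]
    return "\n".join(lines) + "\n"


# Hypothetical toy matrix for three OTUs.
toy_labels = ["A", "B", "C"]
toy_matrix = [
    [0.0, 0.1, 0.3],
    [0.1, 0.0, 0.2],
    [0.3, 0.2, 0.0],
]
nexus_text = to_nexus_distances(toy_labels, toy_matrix)
```

Writing one such file per time-slice pair keeps the per-graph inference and the later stacking of exported figures independent of each other.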

Additional links for readers interested in details

Figure illustrating the history of taxonomic systems for Osmundales.
— An archive including all analysis files generated in the course of the original study is hosted at the Dryad Digital Repository.
— Further annotated versions of the figures shown in this post and the used analysis files have been published under a CC-BY licence: Grimm G. (2017) Osmundales diverstity through time: stacking networks. figshare. https://doi.org/10.6084/m9.figshare.5255014.v1.


Bomfleur B, Grimm GW, McLoughlin S (2015) Osmunda pulchella sp. nov. from the Jurassic of Sweden—reconciling molecular and fossil evidence in the phylogeny of modern royal ferns (Osmundaceae). BMC Evolutionary Biology 15: 126.

Bomfleur B, Grimm GW, McLoughlin S (2017) The fossil Osmundales (Royal Ferns)—a phylogenetic network analysis, revised taxonomy, and evolutionary classification of anatomically preserved trunks and rhizomes. PeerJ 5: e3433.

Grimm GW, Kapli P, Bomfleur B, McLoughlin S, Renner SS (2015) Using more than the oldest fossils: Dating Osmundaceae with the fossilized birth-death process. Systematic Biology 64: 396-405.

Huson DH, Bryant D (2006) Application of phylogenetic networks in evolutionary studies. Molecular Biology & Evolution 23: 254-267.

Maddison WP, Maddison DR (2001 onwards) Mesquite: a modular system for evolutionary analysis.

Wang S-J, Hilton J, He X-Y, Seyfullah LJ, Shao L (2014) The anatomically preserved Zhongmingella gen. nov. from the Upper Permian of China: evaluating the early evolution and phylogeny of the Osmundales. Journal of Systematic Palaeontology 1: 1-22.