Contributions to Zoology, 84 (2) – 2015
Valentin Rineau; Anaïs Grand; René Zaragüeta; Michel Laurin: Experimental systematics: sensitivity of cladistic methods to polarization and character ordering schemes
Material and methods


Simulated character sets

Reference topologies. Data were simulated on three topologies, each comprising 20 ingroup OTUs and one outgroup OTU, under a total of 18 branch-length settings (Fig. 5). The first reference topology is fully pectinate (i.e., fully asymmetrical), the second is fully symmetrical (balanced) for the ingroup, and the third is an equiprobable (randomly generated) tree that is intermediate in asymmetry between the other two (Fig. 5A-F). In this paper, branch lengths represent evolutionary time, but because characters were simulated under Brownian motion (see below), branch lengths are also proportional to the expected character variance (Felsenstein, 1985).
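The phrase "intermediate in asymmetry" can be made concrete with a tree-balance statistic. The text does not name one; the Colless index used below (the sum, over internal nodes, of the absolute difference between the leaf counts of the two daughter subtrees) is our illustrative choice, not a measure taken from this study. The minimal sketch below assumes binary trees stored as nested 2-tuples and considers the 20-taxon ingroup only; the helper names `pectinate`, `balanced` and `colless` are hypothetical.

```python
def pectinate(labels):
    """Fully asymmetrical (caterpillar) tree as nested 2-tuples; leaves are strings."""
    tree = labels[-1]
    for label in reversed(labels[:-1]):
        tree = (label, tree)
    return tree


def balanced(labels):
    """As symmetrical as possible: split the leaf set in half recursively."""
    if len(labels) == 1:
        return labels[0]
    mid = len(labels) // 2
    return (balanced(labels[:mid]), balanced(labels[mid:]))


def colless(tree):
    """Return (number of leaves, Colless index) for a binary tree of nested 2-tuples."""
    if not isinstance(tree, tuple):
        return 1, 0
    n_left, c_left = colless(tree[0])
    n_right, c_right = colless(tree[1])
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


ingroup = [f"t{i}" for i in range(1, 21)]  # 20 ingroup OTUs, as in the simulations
for name, tree in [("pectinate", pectinate(ingroup)), ("balanced", balanced(ingroup))]:
    print(name, colless(tree))  # pectinate (20, 171), then balanced (20, 8)
```

The pectinate ingroup attains the maximum possible value for 20 leaves, (n - 1)(n - 2)/2 = 171, whereas the balanced ingroup scores 8; a randomly generated 20-taxon tree would be expected to lie between these extremes.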

Reference trees. Three outgroup branch lengths were used on each reference topology, yielding nine ultrametric reference trees (Fig. 5A-C). Nine further branch-length settings, differing in both the ingroup and the outgroup, were specified on the equiprobable tree to represent taxa of various geological ages, yielding nine additional non-ultrametric reference trees (Fig. 5D-G). For six of these nine trees, all terminal and internal ingroup branch lengths were set to one, and the outgroup branch was set to zero, one tenth, three tenths, five tenths or six tenths of the tree depth, or to the full tree depth (Fig. 5D). The remaining three trees (Fig. 5E-G) were generated by modifying the ingroup branch lengths of the equiprobable tree while keeping the outgroup branch at zero length (i.e., the outgroup is an actual ancestor).

Simulated matrices. From each of the nine reference trees illustrated in Fig. 5A-C, 100 matrices of 100 characters with 10 states each were simulated (i.e., a total of 900 matrices). Similarly, 100 matrices of 100 characters with 10 states each were simulated for each of the nine paleontological trees illustrated in Fig. 5D-G (i.e., a total of 900 paleontological matrices). We thus simulated a total of 1800 matrices. The matrices were generated with Mesquite and the scripts in Supplementary Online Material 1 (S1); the simulated data were discretized using Excel spreadsheets (S2 for parsimony; S3 for 3ta) and compiled into S4, which contains the matrices in parsimony format.


Fig. 5. Trees used for our simulations: A, pectinate; B, symmetric; C, equiprobable; D, equiprobable with all ingroup branch lengths set to 1; E-G, equiprobable with a steadily increasing ratio of internal to external branch lengths. Each color represents a specific outgroup branch length, expressed as a proportion of total tree depth (A-C: blue, 1; green, 1/2; red, 1/4; D: dark blue, 1; light blue, 2/3; green, 1/2; yellow, 1/3; orange, 1/10; red, 0). Trees A-C have a neontological ingroup; trees D-G are paleontological trees (with diachronous tips).

Character coding. The characters were simulated under continuous Brownian motion in Mesquite (Maddison and Maddison, 2014) to represent data that are inherently ordered as morphoclines (such as size or shape characters). We used this evolutionary model because it is one of the simplest and most widely used models for studying the evolution of continuous phenotypic characters; for example, phylogenetic independent contrasts (Felsenstein, 1985) and squared-change parsimony (Maddison, 1991) assume it. The simulated continuous values were then discretized into 10 equal intervals representing character states, to simulate morphoclines, following the simple procedure described in Laurin and Germain (2011). Because Brownian motion has no trend, the resulting distribution of values is Gaussian, without natural gaps; thus, gap coding cannot be used, and the limits between states are arbitrary.

The primitive condition is determined by the outgroup criterion. The variable outgroup branch lengths allow us to assess the influence of polarization errors, whereas the variable ingroup branch lengths allow us to assess the impact of the geological age of ingroup taxa on tree resolution (presumably by altering the support of the clades subtended by the various branches), thus enabling a comparison of paleontological and neontological datasets. Each of the 100 characters in each of the 1800 matrices was coded in three different ways, corresponding to unordered parsimony, ordered parsimony (with linear character states) and 3ta.
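The simulation and discretization steps can be illustrated with a short Python sketch. It is not the Mesquite scripts or Excel spreadsheets used in this study: the toy tree and all function names are hypothetical, and we assume that "10 equal intervals" means equal-width bins spanning the minimum-to-maximum range of the tip values of each character. The step-cost function shows the standard difference between ordered (linear) and unordered parsimony; 3ta coding is not shown.

```python
import random
import statistics

# Hypothetical toy tree (not one of the trees in Fig. 5): parent -> [(child, branch length)].
# Every tip lies at depth 1.0, so the tree is ultrametric.
TREE = {
    "root": [("out", 1.0), ("n1", 0.5)],
    "n1": [("A", 0.5), ("n2", 0.25)],
    "n2": [("B", 0.25), ("C", 0.25)],
}


def tip_names(tree):
    """Terminal taxa are nodes that appear as children but have no children."""
    children = {child for kids in tree.values() for child, _ in kids}
    return sorted(children - set(tree))


def simulate_bm(tree, root="root", rate=1.0, rng=None):
    """Draw one continuous character value for every node under Brownian motion."""
    rng = rng or random.Random()
    values = {root: 0.0}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child, length in tree.get(parent, []):
            # The increment along a branch is normal with variance rate * length.
            values[child] = values[parent] + rng.gauss(0.0, (rate * length) ** 0.5)
            stack.append(child)
    return values


def discretize(values, n_states=10):
    """Assign each value to one of n_states equal-width intervals (states 0..n_states-1)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {name: 0 for name in values}
    width = (hi - lo) / n_states
    # min() keeps the largest value in the top state rather than an extra state.
    return {name: min(int((v - lo) / width), n_states - 1) for name, v in values.items()}


def step_cost(a, b, ordered):
    """Steps between two states: |a - b| if ordered (linear morphocline), else 0 or 1."""
    return abs(a - b) if ordered else int(a != b)


values = simulate_bm(TREE, rng=random.Random(7))
tips = {name: values[name] for name in tip_names(TREE)}
states = discretize(tips)
print(states)

# Across many replicates, the variance at a tip approaches its root-to-tip path length (1.0 here).
rng = random.Random(1)
replicates = [simulate_bm(TREE, rng=rng)["C"] for _ in range(20000)]
print(round(statistics.pvariance(replicates), 2))
```

The replicate check reflects the point made under Reference topologies: under Brownian motion, the expected variance at a tip is proportional to the summed branch lengths from the root. Under ordered coding, a change from state 2 to state 7 costs five steps; under unordered coding it costs one.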