Contributions to Zoology, 71 (1/3) (2002)

The limitations of ontogenetic data in phylogenetic analyses

Stefan Koenemann, Frederick R. Schram

Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Mauritskade 57, 1092 AD Amsterdam, The Netherlands

Keywords: Ontogeny, heterochrony, event pairs, vertebrate development, sequence data, phylogenetic methodology, parsimony, neighbor joining.


The analysis of consecutive ontogenetic stages, or events, introduces a new class of data to phylogenetic systematics that are distinctly different from traditional morphological characters and molecular sequence data. Ontogenetic event sequences are distinguished by varying degrees of both a collective and linear type of dependence and, therefore, violate the criterion of character independence. We applied different methods of phylogenetic reconstruction to ontogenetic data including maximum parsimony and distance (cluster) analyses. Two different data sets were investigated: (1) four simulated ontogenies with defined phylogenies of six hypothetical taxa, and (2) a set of “real” data comprising sequences of 29 ontogenetic events from 11 vertebrate taxa. We confirm that heterochronic event sequences do contain a phylogenetic signal. However, based on our results we argue that maximum parsimony is a biased method to analyze such developmental sequence data. Ontogenetic events require a special analytical algorithm that would not neglect instances of chronological (horizontal) dependence of this type of data. One coding method, “event-pairing”, appeared to fulfill this requirement in the vertebrate analyses. However, to accurately analyze ontogenetic sequence data, a more sophisticated coding method and algorithm are needed, for example, measuring distances of dependent events.


One significant gateway that the era of electronic data processing opened for systematics was the application of molecular data to phylogenetic reconstructions. After the analytical potential of molecular sequence data went through an initial period of euphoric overestimation in the eighties and early nineties, the methodological framework has undergone a phase of critical review and refinement (e.g., see Telford, this volume). However, while re-evaluations of morphological versus molecular data assessment continue, a special class of genes has entered the stage. The liaison of developmental genetics and systematics, now commonly referred to as “evo-devo”, successfully explores new sources of data for the re-evaluation of proposed relationships within and among animal groups. For example, the comparison of Hox gene expression patterns in arthropods convincingly overturned some long-standing views concerning the assumed homology of particular segments and appendages (Akam 1995; Telford and Thomas 1998; Damen et al. 1998; Browne and Patel 2000; Schram and Koenemann 2001). Although Hox gene data are still restricted to comparable expression patterns within a small selection of taxa, they have become a powerful source of data to resolve conflicting or poorly known phylogenies. In this context, we would like to evaluate whether heterochronic events also contain a detectable phylogenetic signal. For example, if chronological changes in the deployment of morphological structures or organs in embryos are subject to heritable (shared) developmental constraints, it should be possible to reveal a pattern of descent.

Certainly, the conceptual link between phylogeny and development is not new to science. Unlike the sequencing of homeotic genes, the study of ontogenetic “events”, i.e., the formation of organs and morphological structures during embryogenesis, has a long history. Detailed descriptions of embryonic stages, especially for vertebrates, date back to the 19th century, providing an abundant array of data for a relatively broad range of taxa. Yet, the phylogenetic importance of ontogeny probably has an equally long record of dispute, of which the rejection of Haeckel’s idea of recapitulation represents a prominent example. Another, still ongoing controversy concerns the Hourglass Model of conserved developmental stages, the so-called “phylotypic stage”, within major taxa (see Wheeler 1990, and Smith 2001 for a historical and critical review). The conflicting conceptions indicate that pattern and process of ontogeny in a phylogenetic context are not yet fully understood. Such an understanding, however, is a central prerequisite for the development of analytical tools. We will show that there are additional, intrinsic factors that complicate rigorous analyses of ontogenetic data.

Unlike “classical” morphological characters that define adult anatomy, ontogenetic structures have a transient nature. Ontogeny is a dynamic developmental process distinguished by structural transformations (events) rather than fixed character states (Alberch 1985). Some of the early attempts to phylogenetically analyze ontogenetic data treated whole transformation series of individual taxa as single characters (see Klompen and O’Connor 1989, and references therein). While the practical application of this approach may be limited to comparatively small data sets, its innovative merit is the recognition of ontogenetic characters as a sequential type of data, or “sequence combinations” (Mabee and Humphries 1993). This idea was independently taken a step further by several workers in the late nineties, who recognized the importance of the relative timing of ontogenetic characters and introduced a new method of encoding ontogenetic data as “sequence units” or “event pairs” (Smith 1996, 1997; Velhagen 1997; Mabee and Trendler 1996). In the primary literature, the observed appearance of a morphological structure or organ during embryogenesis is given either in measures of absolute time, e.g., minutes, hours, or days, or in successive stages. However, because absolute time units vary for different taxa, quantitative comparisons of ontogenetic events are difficult to make. The above studies evaded this dilemma by relating the occurrence of individual events to every other event within the developmental sequence of an embryo. For example, in a particular taxon, structure I occurs before structure II, structure II occurs before structure III, etc. (Table 1). The resulting event-paired data (Table 2), or “event sequences”, can subsequently be translated into binary coding and analyzed using cladistic software programs (Velhagen 1997; Jeffrey et al. 2002a, 2002b). Guralnick and Lindberg (2001), in their parsimony analysis of invertebrate cell lineage data, showed that useful ontogenetic event sequences are not necessarily restricted to relatively well-developed vertebrate embryos.


Table 1: Ranked events – Matrix of time sequences given for a series of ontogenetic events. Each time represents the first observed occurrence of a morphological structure or organ during embryogenesis relative to other events (ranked events).

            Ontogenetic events
            I         II        III       IV
Taxon 1     time 1    time 1    time 2    time 3
Taxon 2     time 1    time 2    time 2    time 3
Taxon 3     time 1    time 2    time 2    time 3
Taxon 4     time 2    time 1    time 3    time 4

Several workers who investigated event pairs addressed the non-independence of ontogenetic data (Jeffrey et al. 2002a; Nunn and Smith 1998; Smith 1997). However, we think that event sequencing introduces an entirely new class of data to phylogenetic systematics that are distinctly different from morphological characters and molecular sequence data. Many (if not most) ontogenetic events are characterized by both a collective and linear type of dependence and thus violate the criterion of independence. Therefore, maximum parsimony cannot be the appropriate method to analyze ontogenetic data. To test this assumption we empirically applied different parsimony and distance methods to simulated ontogenetic event sequences of a selection of model taxa. For each simulated data set, two matrices were built based on two alternative coding methods: (1) events ordered by relative occurrence (ranked events), and (2) event-pairing. This setup allowed us to compare individual results with a predictable phylogeny and test the effectiveness of the methods investigated. In addition, the experimental design was employed to analyze a set of “real” data comprising 11 vertebrate taxa, with sequences of 29 ontogenetic events (from Jeffrey et al. 2002a). Based on our results we argue that effective analysis of ontogenetic sequence data will require a special analytical algorithm.


Table 2: Event pairing – Within the ontogenetic sequences of the four taxa in Table 1, the relative occurrence of each individual event is compared with all other events, and coded as follows: state 0: events xi and xj occur simultaneously (at the same time); state 1: event xi occurs before event xj; state 2: event xi occurs after event xj. For example, in the first three data columns, event I is compared to events II-IV; in the fourth and fifth columns, event II is compared to events III-IV, etc.

            Ontogenetic events
            I/II    I/III   I/IV    II/III  II/IV   III/IV
Taxon 1     0       1       1       1       1       1
Taxon 2     1       1       1       0       1       1
Taxon 3     1       1       1       0       1       1
Taxon 4     2       1       1       1       1       1


Analyses of simulated ontogenetic sequences

To test the effectiveness of different methods of phylogenetic reconstruction we simulated the ontogenies of six model taxa A-F (Appendix A). For each taxon, sequences of nine ontogenetic events were defined to construct a predictable phylogeny based on the assumption that heterochronic changes of homologous structures follow a detectable pattern of descent in closely related taxa. For example, the event “first occurrence of limb buds” in an ancestral taxon may be gradually accelerated in a series of successors. The phylogeny of taxa A-F was defined as follows:

The event sequence of taxon D was designated as the ancestral sequence (out-group) and ordered from the earliest to the latest event (see tables of ranked event data in Appendix A). Subsequently, a dichotomous branching pattern was constructed from the designated ancestor D to two designated descendant clades, [C, B, A] and [E, F]. For example, taxon C is the descendant of ancestor D, taxon B the descendant of C, and taxon A the descendant of B. Correspondingly, a second lineage of descent was determined from ancestor D to E to F. This setup allowed us to conveniently compare a distinct, dichotomous branching pattern with topologies obtained by the methods of reconstruction investigated. We will refer to this defined phylogeny of taxa A-F as the “expected tree” in the following text.

The dichotomous branching pattern described above was replicated four times. For each simulation, we mapped different series of heterochronic shifts on the expected tree (Figs. 1.1, 2.1, 3.1, 4.1), always starting from the ancestor D towards the two descendant clades. In doing so, different event distributions could be allotted to each simulation, while the basic expected topology was maintained. For simulations 1 and 2, heterochronic shifts were assigned in small, gradual steps, e.g., an event that occurs at time 5 in the ancestral sequence can only shift to sequential position 4 or 6 in the descendant. In contrast, events in simulations 3 and 4 were allowed to shift to any position within a sequence, for example, an event occurring at time 5 may shift to position 9.

To be able to compare the lengths of trees obtained by PAUP with those of the expected trees, the chronological shift of an event was treated as two steps because it involves two positional changes within a sequence. For example, the events IV and V occur at relative times 4 and 5 in an ancestral taxon, respectively. In the closest descendant, event V develops before event IV, at time 4, while event IV is now scored at time 5 (see tables of ranked event data in Appendix A). However, from a developmental point of view, we could, of course, alternatively argue that just one event was accelerated or retarded.
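One way to read this counting rule is sketched below: the number of steps on a branch is taken as the number of events whose ranked position differs between ancestor and descendant. The function name `shift_steps` is hypothetical, and the ranks of events I-III are filler values added only to complete the example from the text.

```python
def shift_steps(ancestor, descendant):
    """Count the events whose ranked position differs between two sequences."""
    return sum(1 for event in ancestor if ancestor[event] != descendant[event])

# Events IV and V follow the example in the text; I-III are illustrative filler.
ancestor = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}
descendant = {"I": 1, "II": 2, "III": 3, "IV": 5, "V": 4}

steps = shift_steps(ancestor, descendant)
print(steps)  # the swap of IV and V is scored as two steps
```

Under this reading, the length of an expected tree would be the sum of `shift_steps` over all of its branches.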

Each simulation was analyzed using the phylogenetic methods described below.

Ontogenetic data of vertebrates

In addition to the simulated analyses, we analyzed “real” ontogenetic data of 11 vertebrate taxa from a recent study (Jeffrey et al. 2002a). To avoid the occurrence of question marks for unknown data in our experimental design, we selected 29 out of the 41 ontogenetic events investigated by Jeffrey et al. (see Appendices B and C).

Coding methods

Ranked events

For each ontogenetic sequence, the times of occurrence of events were encoded as integer values. This coding method represents the relative chronological order of events during embryogenesis, for example, “time 1” encodes the observed occurrence of the first event within a sequence, “time 2” the second event, etc. (Table 1). In the following text, we will refer to this coding method as ranked events.
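The ranking step can be sketched as follows. This is a minimal illustration, assuming that events observed at the same time share a rank (as in Table 1); the function name `rank_events` and the observation times in hours are hypothetical and not taken from the data sets analyzed here.

```python
def rank_events(times):
    """Map observed times (in any unit) to integer ranks starting at 1.

    Events observed at the same time receive the same rank.
    """
    order = {t: r for r, t in enumerate(sorted(set(times.values())), start=1)}
    return {event: order[t] for event, t in times.items()}

# Hypothetical observation times in hours (illustrative values only).
hours = {"I": 10.0, "II": 10.0, "III": 26.5, "IV": 48.0}
ranks = rank_events(hours)
print(ranks)  # {'I': 1, 'II': 1, 'III': 2, 'IV': 3}
```

The resulting ranks correspond to the pattern shown for Taxon 1 in Table 1.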


Event-pairing

Instead of simply ranking ontogenetic events in the relative order of occurrences, it is also possible to apply an alternative coding method, event-pairing (Smith 1996, 1997; Velhagen 1997; Jeffrey et al. 2002a). In this method, the relative occurrence of each individual event within the ontogenetic sequence of a taxon is compared with all other events, and coded as follows (Table 2):

State 0: Events xi and xj occur simultaneously (at the same time)

State 1: Event xi occurs before event xj

State 2: Event xi occurs after event xj

The ranked event matrices of the four simulated ontogenies and the vertebrates were translated into event-pair matrices (Appendices A and D) and analyzed using the same set of parsimony and distance methods.
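The translation from ranked events to event pairs can be sketched in a few lines. The example below applies the three states defined above to the ranked values of Table 1; the names `RANKED`, `PAIRED`, and `event_pairs` are ours and hypothetical, and the sketch is not the procedure used to build the appendices.

```python
from itertools import combinations

# Ranked events of Table 1 (columns: events I, II, III, IV).
RANKED = {
    "Taxon 1": [1, 1, 2, 3],
    "Taxon 2": [1, 2, 2, 3],
    "Taxon 3": [1, 2, 2, 3],
    "Taxon 4": [2, 1, 3, 4],
}

def event_pairs(ranks):
    """Code every pair of events (xi, xj), i < j, as state 0, 1, or 2."""
    codes = []
    for i, j in combinations(range(len(ranks)), 2):
        if ranks[i] == ranks[j]:
            codes.append(0)   # simultaneous
        elif ranks[i] < ranks[j]:
            codes.append(1)   # xi occurs before xj
        else:
            codes.append(2)   # xi occurs after xj
    return codes

PAIRED = {taxon: event_pairs(ranks) for taxon, ranks in RANKED.items()}
print(PAIRED["Taxon 4"])  # [2, 1, 1, 1, 1, 1]
```

The pairs are generated in the column order described for Table 2 (I with II-IV, then II with III-IV, etc.). Because n events yield n(n-1)/2 pairs, the nine simulated events give 36 event-pair characters and the 29 vertebrate events give 406.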

Methods of phylogenetic reconstruction

All analyses were conducted using PAUP 4.0b6. For each data set, alternative runs were conducted using altered parsimony and distance settings (see below).

Because PAUP offers only two distance measures and limited options for clustering methods, the software package CALCDIST V0.1 was employed to combine a larger selection of distance measures with various clustering algorithms. CALCDIST analyses were applied to both the vertebrate data and the simulated data sets.


For the parsimony analyses, each data set was analyzed by ‘exhaustive search’, with delayed transformation (DELTRAN) as character optimization. All events were left unordered and equally weighted, topological constraints were not enforced, and the ‘MulTrees’ option was in effect.
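To illustrate what is counted for unordered, equally weighted characters, the sketch below scores the length of one fixed tree with Fitch's algorithm. This is a generic textbook procedure, not PAUP's implementation; the names `fitch_char` and `tree_length`, the nested-tuple tree format, and the two example topologies are assumptions made for this example. The matrix holds the event-pair codes that follow from Table 1.

```python
# Event-pair codes derived from the ranked events of Table 1.
MATRIX = {
    "Taxon 1": [0, 1, 1, 1, 1, 1],
    "Taxon 2": [1, 1, 1, 0, 1, 1],
    "Taxon 3": [1, 1, 1, 0, 1, 1],
    "Taxon 4": [2, 1, 1, 1, 1, 1],
}

def fitch_char(node, states):
    """Return (state set, steps) for one character on the subtree at node."""
    if isinstance(node, str):                 # leaf: observed state
        return {states[node]}, 0
    left_set, left_steps = fitch_char(node[0], states)
    right_set, right_steps = fitch_char(node[1], states)
    common = left_set & right_set
    if common:                                # no change needed at this node
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, matrix):
    """Sum the Fitch steps over all characters (unordered, equal weights)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_char(tree, {taxon: row[c] for taxon, row in matrix.items()})[1]
        for c in range(n_chars)
    )

# Two hypothetical topologies written as nested tuples.
tree_a = (("Taxon 2", "Taxon 3"), ("Taxon 1", "Taxon 4"))
tree_b = (("Taxon 1", "Taxon 2"), ("Taxon 3", "Taxon 4"))
print(tree_length(tree_a, MATRIX), tree_length(tree_b, MATRIX))  # 3 4
```

An exhaustive search evaluates every possible topology in this way and retains the shortest tree or trees.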

Distance (cluster) analyses

In PAUP, vertebrate and simulated data were investigated choosing each of the two distance measures available: total and mean character differences. Similarly, the linkage or clustering algorithms Unweighted Pair-Group Method using Arithmetic Averages (UPGMA) and neighbor-joining were alternatively employed for tree reconstructions. Alternative distance settings were applied to each cluster analysis, e.g., “minimum evolution” and “weighted/unweighted least squares” as objective functions.
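The two distance measures can be sketched as follows, assuming the usual definitions: the total character difference is the number of characters in which two taxa differ, and the mean character difference is that number divided by the number of characters compared. The function names are ours, and the example uses the ranked events of Table 1 rather than the data sets analyzed in this study.

```python
from itertools import combinations

# Ranked events of Table 1 (columns: events I, II, III, IV).
RANKED = {
    "Taxon 1": [1, 1, 2, 3],
    "Taxon 2": [1, 2, 2, 3],
    "Taxon 3": [1, 2, 2, 3],
    "Taxon 4": [2, 1, 3, 4],
}

def total_character_difference(a, b):
    """Number of characters in which two taxa differ."""
    return sum(x != y for x, y in zip(a, b))

def mean_character_difference(a, b):
    """Total character difference divided by the number of characters."""
    return total_character_difference(a, b) / len(a)

# Pairwise distance matrix, stored as {(taxon a, taxon b): distance}.
distances = {
    (a, b): total_character_difference(RANKED[a], RANKED[b])
    for a, b in combinations(RANKED, 2)
}
print(distances[("Taxon 1", "Taxon 4")])  # 3
```

A clustering algorithm such as UPGMA or neighbor-joining then builds a tree from a pairwise matrix of this kind.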


Only those alternative program settings and methods are described herein that produced incongruent results. For example, all trees obtained by “minimum evolution” and “weighted/unweighted least squares” were identical. Similarly, results obtained using designated ancestral sequences were fully compatible.

Simulation 1

The expected tree of the first simulated data set has a length of 14 steps (Fig. 1.1).


Fig. 1. Analyses of simulation 1. [1] Expected tree based on defined phylogeny, with changing events mapped onto branches; length 14 steps. Ontogenetic events represented by Roman numerals, relative times of occurrences by Arabic numerals. [2] Parsimony analysis of ranked event data; one tree found, length 14 steps. [3] UPGMA analysis of ranked event data. [4] Neighbor-joining analysis of ranked event data. [5] Parsimony analysis of event-paired data; one tree found. [6] Neighbor-joining analysis of event-paired data. Branch lengths of UPGMA and neighbor-joining trees in Arabic numerals.

Ranked events. – The parsimony and neighbor-joining analyses yielded trees with identical topologies, both of which are consistent with the expected tree (Fig. 1.2+4). The phylogeny obtained by UPGMA is incompatible with the expected tree (Fig. 1.3): It renders the taxa [C, D, E, F] as a monophylum.

Event-pairing. – The single tree obtained by the parsimony analysis fails to resolve the relationships of all taxa but A and B (Fig. 1.5). The neighbor-joining tree shows a monophyletic clade comprising taxa [E, B, A], which deviates from the expected branching pattern (Fig. 1.6).

Simulation 2

The expected tree has a length of 12 steps (Fig. 2.1).


Fig. 2. Analyses of simulation 2. [2, 5] The results of the parsimony analyses are strict consensus trees calculated from two trees, respectively. See Fig. 1 for legends.

Ranked events. – The parsimony analysis yielded two trees of the same length (12 steps). However, the strict consensus tree of these two trees is completely unresolved (Fig. 2.2). The tree obtained by UPGMA shows F as a basal sister group to the clades [E, D, C] and [A, B] (Fig. 2.3). The topology of the neighbor-joining tree is identical with the expected tree (Fig. 2.4).

Event-pairing. – Similar to the ranked-event analyses, the parsimony consensus tree remains completely unresolved (Fig. 2.5), while the neighbor-joining analysis produced a tree congruent with the expected tree (Fig. 2.6).

Simulation 3

The expected tree has a length of 20 steps (Fig. 3.1).


Fig. 3. Analyses of simulation 3. [2, 5] Both parsimony analyses produced one tree, respectively. See Fig. 1 for legends.

Ranked events. – In conflict with the expected tree, all three analyses rendered A as a sister group to B and C, and yielded a larger clade composed of [F, A, B, C] (Fig. 3.2-4).

Event-pairing. – Both parsimony and neighbor-joining analyses yielded F as a sister group to an unresolved clade with [A, B, C] (Fig. 3.5+6).

Simulation 4

The expected tree has 25 steps (Fig. 4.1).


Fig. 4. Analyses of simulation 4. [2, 5] As in simulation 2, both parsimony analyses yielded strict consensus trees calculated from two trees, respectively. See Fig. 1 for legends.

Ranked events. – All three analyses produced different inconsistencies with the expected tree. The parsimony analysis rendered a monophyletic clade composed of E, F and C (Fig. 4.2). The UPGMA tree yielded [D, B, A] and [C, F, E] as two sister clades (Fig. 4.3). In the neighbor-joining analysis, the taxa [B, C, F, E] form a large monophyletic clade (Fig. 4.4).

Event-pairing. – Both parsimony and neighbor-joining produced branching patterns inconsistent with the expected tree (Fig. 4.5+6).

Vertebrate data

Ranked events. – The parsimony analysis failed to retain any of the possible, well-established monophyla: amniotes, artiodactyls, mammals, birds, and amphibians are rendered as para- or polyphyletic groups (Fig. 5.1). The trees resulting from both neighbor-joining and UPGMA are identical and yield mammals as a paraphylum, and birds as a polyphylum. Unlike in the parsimony tree, however, the amniotes form a large monophyletic clade (Fig. 5.2).


Fig. 5. PAUP analyses of ontogenetic vertebrate data. Coding method “ranked events” applied to 29 ontogenetic events. [1]: Single tree obtained by maximum parsimony analysis. Default out-group: newt; CI = 0.90, RI = 0.55; RC = 0.49; length = 205. [2]: Tree obtained by neighbor-joining analysis. Default out-group: newt; distance measure: total character distance.

Event-pairing. – Parsimony and neighbor-joining trees feature identical topologies, with birds, artiodactyls, mammals and amniotes as monophyla (Fig. 6.1+2). The UPGMA analysis produced a monophyletic amniote clade. In this tree, the mammals are rendered as a series of paraphyletic taxa, with deer and pig as a basal sister group and rat as a terminal sister group to the Diapsida (lizard + birds; tree not shown).


Fig. 6. PAUP analyses of ontogenetic vertebrate data. Coding method “event-pairing” applied to 29 ontogenetic events. [1]: Single tree obtained by maximum parsimony analysis. Default out-group: newt; CI = 0.70, RI = 0.56; RC = 0.39; length = 467. [2]: Tree obtained by neighbor-joining analysis. Default out-group: newt; distance measure: total character distance. Note that the basal branching patterns of both trees are not incongruent: The basal polytomy of frog and newt in the neighbor-joining tree versus paraphyletic taxa in the parsimony analysis can be affected by different rooting options.

Distance measures

Alternative distance measures and clustering (linkage) algorithms applied to all data sets in CALCDIST generally produced results in agreement with the PAUP neighbor-joining analyses. Only in one instance did a neighbor-joining analysis based on Euclidean distance yield a branching pattern more congruent with the expected tree: In simulation 3, the relationship of E and F is correctly resolved, while A still remains a sister group to B and C (Fig. 7.2).


Fig. 7. Comparison of different distance measures. Neighbor-joining analyses of ranked event data of simulation 3. [1] Total character difference; tree calculated in PAUP. [2] Euclidean distance; tree calculated in CALCDIST. Branch lengths in Arabic numerals.
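The difference between the two distance measures compared in Fig. 7 can be illustrated with a small sketch, assuming the usual definition of Euclidean distance on ranked event vectors (how CALCDIST computes it internally is not documented here). The three sequences are hypothetical: `near` swaps two adjacent events of `base`, whereas `far` swaps the first and the last event, as permitted in simulations 3 and 4.

```python
import math

def total_character_difference(a, b):
    """Number of positions in which two ranked sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Square root of the summed squared rank differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical ranked sequences of four events.
base = [1, 2, 3, 4]
near = [2, 1, 3, 4]   # two adjacent events swapped
far = [4, 2, 3, 1]    # first and last event swapped

for other in (near, far):
    print(total_character_difference(base, other),
          round(euclidean_distance(base, other), 3))
```

Both comparisons give a total character difference of 2, whereas the Euclidean distance grows with the size of the shift (about 1.414 versus 4.243). A measure of this kind therefore retains information on how far an event has moved, which the total character difference discards.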


Conflicting results

A comparison of the vertebrate trees generated in this study with generally accepted amniote relationships identifies event-pairing as the more accurate coding method (Fig. 6), whereas coding based on ranked events generates unlikely branching patterns (Fig. 5). Of the two methods of phylogenetic reconstruction, neighbor-joining performed better than parsimony in the ranked event analyses (Fig. 5).

The results of simulations 1 and 2 agree with those of the vertebrate data in featuring neighbor-joining as the more reliable method. However, unlike in the vertebrate analyses, the ranked event matrices out-performed the event-paired data (Figs. 1.2+4, 2.4+6). The reconstructions obtained for simulations 3 and 4 did not accord with the expected tree (Figs. 3, 4).

Summarizing the results of both vertebrate and simulated data sets we can state that

· UPGMA produced incompatible results in all analyses.

· In the vertebrate analyses, both parsimony and neighbor-joining rendered identical, acceptable phylogenies when applied to event pairs; neighbor-joining performed slightly better in the ranked event analyses.

· In simulations 1 and 2, neighbor-joining yielded three accurate reconstructions, while parsimony rendered only one congruent phylogeny; the ranked event data produced three correct reconstructions, event-pairing only one.

How can the inconsistency of these results regarding coding methods, i.e., ranked events vs. event-pairing, be explained?

Taxon and event sampling

To be able to interpret the results of this study we need to understand the characteristics of the data investigated. There are two important differences that distinguish the vertebrate data set from the simulated sequences. First, the vertebrate matrix is composed of almost twice as many taxa (11 versus 6), and second, it features more than three times as many events (29 versus 9) as the simulated data sets. The methods investigated failed to detect correct relationships for simulations 3 and 4, in which heterochronic shifts were allowed to occupy any position within a sequence. This design, applied to only nine available events within a sequence, is subject to a higher level of ambiguity. It may, for example, represent a limited, patchy selection of events from a more extensive ontogenetic transformation series. For a less ambiguous phylogenetic signal, more events would be needed. Therefore, the rule of thumb that the accuracy of phylogenetic reconstruction increases with the addition of taxa and/or characters applies to ontogenetic sequence data as it does to morphological and molecular data.

The properties of ontogenetic data

However, there are additional factors that hamper phylogenetic analyses of ontogenetic data. Some of these limitations violate basic assumptions postulated for cladistic methodology.

A basic criterion of phylogenetic systematics is the concept of homology. An analysis that aims to investigate phylogenetic relationships within a group of taxa relies primarily on homologous characters that were derived from a common ancestor and are shared among its descendants. A second assumption concerns the independence of characters. A character analysis can include selections of morphological features, but also anatomical, physiological and/or behavioral traits, each of which is assumed to have evolved independently from any other character. The independence of homologized units is also postulated for molecular data. We assume in principle that each molecular position within a sequence has evolved independently.

In our study, we presume that compared ontogenetic events are homologous. For example, the induction of a lens placode involves the same embryonic tissues in each taxon analyzed. However, we cannot claim a priori independence for individual events (Alberch 1985; Schlosser 2001). For each developing structure or organ we can observe series of consecutive stages. We cannot exclude that the development of certain structures is affected by the same ontogenetic pathways. In these cases, we may have to accept a varying degree of collective dependence among several ontogenetic events (Table 3). For example, there is a cascade of genes that control the development of limbs during arthropod embryogenesis. In this cascading network, the gene engrailed typically regulates the compartmentalization of the arthropod body into parasegments during an early developmental phase. In addition, engrailed also influences the expression of hedgehog, which in turn induces the expression of two other genes, decapentaplegic and wingless, in the anterior dorsal and ventral compartments, respectively. Finally, decapentaplegic and wingless trigger the expression of Distalless, a gene controlling the outgrowth of limb buds (Schram and Koenemann 2001). However, in the limb bud, Distalless is not expressed in the absence of hedgehog. Since many developmental genes have multiple functions, it is conceivable that several structures may be affected by slight changes in complex, cascading genetic networks, especially during early embryogenesis.


Table 3: Two cases of chronological (horizontal) dependence of ontogenetic data shown in a 2-dimensional matrix. In a collective dependence (thick line box), events II-VI are affected by the same developmental constraints, e.g., genetic cascades or inductions. In a linear dependence (thin line box), event IV can only appear after the development of event III is completed, e.g., events III and IV represent two successive stages of the same structure or organ. dE = ontogenetically dependent distances.

Moreover, two individual events can be part of different, consecutive developmental stages of a single structure or organ, e.g., the single or paired rudiments of the endocardial anlage have to appear before the first aortic arch can be formed. In case of the genetic cascade during arthropod embryogenesis, the formation of (para-) segments is most likely a prerequisite for the development of limbs on a particular segment. In these cases, we can speak of a linear dependence of ontogenetic events. Alberch (1985) defined this type of dependence as “causal series of events” (which he regarded as a constrained sequence) opposed to “temporal series of events” that lack causal correlations.

Both collective and linear dependence are causal. In both instances, we have a chronological (horizontal) correlation that is characterized by the distance dE of two ontogenetically dependent events (Table 3). Ontogenetic event sequences are series of morphological transitions. The special properties of these data distinguish them as a class apart from both morphological characters and molecular sequences. Ontogenetic sequences constitute a “hybrid” type of data showing properties of both morphological and molecular data. In addition to a 2-dimensional character/taxon matrix, we have to consider a third dimension, time, for ontogenetic events to take chronological dependence into account. For example, a group of events with assumed collective dependence can show characteristic slopes and peaks when plotted in a 3-dimensional diagram (Fig. 8). The subtle chronological variations of these curves are likely to carry information of phylogenetic relevance. Parsimony neglects the distances between collectively dependent events. Data affected by horizontal dependence need special treatment in analyses of a classical character/taxon matrix.


Fig. 8. Four taxa of vertebrate analysis compared in 3-dimensional chart with logarithmic time scale. Data sets taken from ranked events matrix (Appendix B). From front to back: frog, chicken, monkey, man. The ontogenetic data are grouped according to developmental units (cf. Appendix C). For example, the selection of events concerning the cardiovascular development (Car; indicated by arrows) may be subjected to a collective dependence. For each of the four taxa, the cardiovascular events form characteristic curves with varying slopes and peaks (event distances), which probably contain valuable phylogenetic information neglected by parsimony.
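As a rough illustration of such event distances, the sketch below takes dE as the difference in rank between two events within one taxon. This definition is only an assumption made for the example, and treating events II and IV of Table 1 as dependent is likewise hypothetical; the function name `event_distance` is ours.

```python
# Ranked events of Table 1.
RANKED = {
    "Taxon 1": {"I": 1, "II": 1, "III": 2, "IV": 3},
    "Taxon 2": {"I": 1, "II": 2, "III": 2, "IV": 3},
    "Taxon 3": {"I": 1, "II": 2, "III": 2, "IV": 3},
    "Taxon 4": {"I": 2, "II": 1, "III": 3, "IV": 4},
}

def event_distance(ranks, first, second):
    """Rank distance from event `first` to event `second` in one taxon."""
    return ranks[second] - ranks[first]

def pair_state(ranks, first, second):
    """Event-pair state: 0 simultaneous, 1 first before second, 2 after."""
    if ranks[first] == ranks[second]:
        return 0
    return 1 if ranks[first] < ranks[second] else 2

# Events II and IV are assumed to be dependent for this illustration.
d_e = {taxon: event_distance(r, "II", "IV") for taxon, r in RANKED.items()}
states = {taxon: pair_state(r, "II", "IV") for taxon, r in RANKED.items()}
print(d_e)     # {'Taxon 1': 2, 'Taxon 2': 1, 'Taxon 3': 1, 'Taxon 4': 3}
print(states)  # state 1 in all four taxa
```

In this example the event-pair code is identical in all four taxa, while the distances differ among them. Quantitative information of this kind is what a distance-based treatment of dependent events could retain.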

Suggestions and solutions

Phylogenetic systematics is interested in qualitative and quantitative changes of homologous units. In the special case of ontogenetic sequence data, these units could be different rates of development, for example, measured as chronological event distances. In other words, we are analyzing heterochrony in a phylogenetic framework.

Based on our results we argue that phylogenetic analyses of ontogenetic sequence data require a special methodological approach (see also Alberch 1985). Parsimony is an inadequate and biased method to analyze this type of data because it neglects, and even explicitly excludes, horizontal character dependence. Cluster analyses, e.g., neighbor-joining, evaluate complete sequences of each pair of taxa based on distance measures. Although cluster analyses of sequential divergences do constitute a horizontal approach, the currently available distance measures and linkage algorithms are not sufficient to accurately evaluate the complexity of ontogenetic dependence.

The analyses of vertebrate ontogenies show that event-pairing out-performs simple event ranking. Since event-pairing particularly considers chronological order (event II occurs before event III, etc.) this coding method meets, at least in part, our proposed requirements. However, as stated for cluster analyses, this procedure does not embrace the complex network of developmental constraints. Event-pairing simplifies ranked data sets. It does not discriminate between dependent and independent events and important quantitative information provided by chronological distances of individual dependent events is disregarded. Therefore, we think that an optimal approach should be based on ranked events rather than on event-paired data. For reliable phylogenetic reconstructions, it will also be critically important to identify irreversible events among dependent ontogenetic characters. The solution to an adequate analytical method is more likely to be derived from specially adjusted distance measures and clustering algorithms, for example, with the optional designation of ancestral sequences and irreversible event shifts. Moreover, an optimized algorithm for ontogenetic sequences should ideally be able to analyze invertebrate data as well, which may feature entirely different developmental patterns, e.g., molting stages of arthropods.


Akam M. 1995. Hox genes and the evolution of diverse body plans. Phil. Trans. R. Soc. Lond. B 349: 313-319.

Alberch P. 1985. Problems with the interpretation of developmental sequences. Syst. Zool. 34: 46-58.

Browne WE, Patel NH. 2000. Molecular genetics of crustacean feeding appendage development and diversification. Seminars in Cell and Developmental Biology 11: 427-435.

Damen WD, Hausdorf M, Seyfarth E-A, Tautz D. 1998. A conserved mode of head segmentation in arthropods revealed by the expression pattern of Hox genes in spider. Proc. Nat. Acad. Sci. USA 95: 10665-10670.

Guralnick RP, Lindberg DR. 2001. Reconnecting cell and animal lineages: what do cell lineages tell us about the evolution and development of Spiralia. Evolution 55: 1501-1519.

Jeffrey JE, Bininda-Emonds ORP, Coates MI, Richardson MK. 2002a. Analyzing evolutionary patterns in amniote embryonic development. Evol. Development 4: 292-302.

Jeffrey JE, Richardson MK, Coates MI, Bininda-Emonds ORP. 2002b. Analyzing developmental sequences within a phylogenetic framework. Syst. Biol. 51: 478-491.

Klompen JSH, O’Connor BM. 1989. Ontogenetic patterns and phylogenetic analysis in Acari. In: André HM and Lions J-C, eds. The concept of stase and the ontogeny of arthropods. AGAR Publishers, Wavre, Belgium: 91-103.

Mabee PM, Humphries J. 1993. Coding polymorphic data: examples from allozymes and ontogeny. Syst. Biol. 42: 166-181.

Mabee PM, Trendler TA. 1996. Development of the cranium and paired fins in Betta splendens (Teleostei: Percomorpha): intraspecific variation and interspecific comparisons. J. Morph. 227: 249-287.

Nunn CL, Smith KK. 1998. Statistical analyses of developmental sequences: The craniofacial region in marsupial and placental mammals. Amer. Nat. 152: 82-101.

Schlosser G. 2001. Using heterochrony plots to detect the dissociated coevolution of characters. J. Exp. Zool. (Mol.Dev. Evol.) 291: 282-304.

Schram FR, Koenemann S. 2001. Developmental genetics and arthropod evolution: part I, on legs. Evol. Development 3: 343-354.

Smith KK. 1996. Integration of craniofacial structures during development in mammals. Am. Zool. 36: 70-79.

Smith KK. 1997. Comparative patterns of craniofacial development in eutherian and metatherian mammals. Evolution 51: 1663-1678.

Smith KK. 2001. Heterochrony revisited: the evolution of developmental sequences. Biol. J. Linn. Soc. 73: 169-186.

Telford MJ, Thomas RH. 1998. Expression of homeobox genes shows chelicerate arthropods retain their deutocerebral segment. Proc. Nat. Acad. Sci. USA 95: 10671-10675.

Velhagen WA. 1997. Analyzing developmental sequences using sequence units. Syst. Biol. 46: 204-210.

Wheeler QD. 1990. Ontogeny and character phylogeny. Cladistics 6: 225-268.


We would like to thank Michael K. Richardson, Jonathan E. Jeffrey and Olaf R.P. Bininda-Emonds from Leiden University for allowing us to use a compilation of vertebrate developmental data prior to publication, and for critical discussions during the course of this study. This is publication number 20 of the Dutch national research program in evolution and development, grant no. 805.33.430.



Appendix A

Matrices of simulated data sets. The ontogenetic event sequence of taxon D is designated as the ancestral sequence and is ordered from earliest to latest event for more convenient comparison. Shaded cells indicate first occurrences of heterochronic shifts. Note that the acceleration of event II in taxon C could alternatively be interpreted as retardation of event I.


Simulation 1: Ranked events

Simulation 1: Event-pairs calculated from ranked events

A 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 2

B 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1

C 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

D 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

E 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1

F 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


Simulation 2: Ranked events

Simulation 2: Event-pairs calculated from ranked events

A 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1

B 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1

C 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

D 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

E 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2

F 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2


Simulation 3: Ranked events

Simulation 3: Event-pairs calculated from ranked events

A 2 2 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2

B 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2

C 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2

D 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

E 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1

F 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2


Simulation 4: Ranked events

Simulation 4: Event-pairs calculated from ranked events

A 2 2 2 1 1 1 1 2 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 2 1 2 2

B 2 2 2 1 1 1 1 2 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 2 2 2

C 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 2 1 1 1 1 2

D 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

E 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 2 1 1 1 1 1

F 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 2 1 1 1 1 2
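As a reading aid for the event-pair matrices above, the following sketch counts, for each taxon of Simulation 1, how many event-pairs differ from the ancestral sequence of taxon D. The rows are copied from the Simulation 1 event-pair matrix; the counting itself is only an illustrative assumption and is not one of the analyses performed in this study.

```python
# Event-pair rows of Simulation 1, copied from the matrix in Appendix A.
SIM1 = {
    "A": "1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 2",
    "B": "1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1",
    "C": "2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1",
    "D": "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1",
    "E": "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1",
    "F": "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1",
}

def parse(row):
    """Turn a whitespace-separated matrix row into a list of integers."""
    return [int(x) for x in row.split()]

def hamming(a, b):
    """Number of event-pair positions at which two coded rows differ."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of event-pairs")
    return sum(x != y for x, y in zip(a, b))

def shifts_from_ancestor(matrix, ancestor="D"):
    """Count event-pairs that differ from the designated ancestral row."""
    ref = parse(matrix[ancestor])
    return {taxon: hamming(parse(row), ref) for taxon, row in matrix.items()}

if __name__ == "__main__":
    for taxon, n in shifts_from_ancestor(SIM1).items():
        print(taxon, n)
```

The same `hamming` function can be applied to any two rows to obtain a simple pairwise distance, with the caveat stated in the main text that such a measure ignores dependence among events.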


Appendix B


Matrix of vertebrate developmental events ranked according to relative chronological occurrence (ranked events) based on corrected stages investigated by Jeffrey et al. (2002a). See Appendix C for a detailed listing of abbreviated developmental events. Newt = Triturus vulgaris (Smooth or Common Newt); Frog = Xenopus laevis (African Clawed Toad); Lizard = Lacerta agilis (Sand Lizard); Chicken = Gallus gallus; Lapwing = Vanellus vanellus (Lapwing); Budgerigar = Melopsittacus undulatus; Rat = Rattus norvegicus (Brown Rat); Pig = Sus scrofa domestica (Domestic Pig); Deer = Capreolus capreolus (Roe Deer); Monkey = Tarsius spectrum (Spectral Tarsier); Human = Homo sapiens. Data courtesy of Michael K. Richardson and Jonathan E. Jeffrey.


Appendix C


Developmental events used in analyses of vertebrate data. Abbreviations: Axi = Axial; Car = Cardiovascular; Int = Intestinal; Kid = Kidney; Lim = Limb; Neu = Neural; Olf = Olfactory; Opt = Optic; Oti = Otic; Pha = Pharyngeal. Data courtesy of Jonathan E. Jeffrey et al. (2002a).

Abbreviated events    Description of events
Axi A                 1st somite
Car B                 Endocardial tubes start to fuse
Car C                 Heart looping
Car F                 Endocardial cushions of atrioventricular canal
Car H                 Trabeculae carneae in ventricles
Int A                 Anterior intestinal portal begins as diverticulum (or archenteron reaches head fold)
Int B                 Liver diverticulum
Int C                 Dorsal pancreas as diverticulum
Int F                 Ventral pancreas anlage(n)
Kid A                 Mesonephric duct anlagen
Kid B                 Paramesonephric duct anlagen
Kid C                 Mesonephric ducts open into cloaca
Lim A                 Forelimb bud
Neu A                 Neural folds begin to fuse
Olf A                 Nasal placodes appear as ectodermal thickenings
Olf B                 Nasal placodes depressed (formation of olfactory pit)
Opt A                 Optic vesicle as lateral evagination from neural tube
Opt B                 Lens placode
Oti C                 Otocyst closed, but still connected with surface ectoderm
Oti D                 Otocyst detached from ectoderm
Oti E                 Endolymphatic appendages
Pha A                 2nd visceral pouch contacts ectoderm (formation of hyoid arch)
Pha B                 Thyroid anlage
Pha C                 3rd visceral pouch contacts ectoderm (formation of 1st branchial arch)
Pha D                 Hypophysis anlage
Pha E                 Lung buds as distinct paired evaginations


Appendix D

Matrix of event-pairs calculated from ontogenetic vertebrate data in Appendix B. See Methods and Tables 1 and 2 for detailed description of coding technique.