Contributions to Zoology, 68 (1): 3-18 (1998)

Arne Ø. Mooers; Dolph Schluter: Fitting macroevolutionary models to phylogenies: an example using vertebrate body sizes


Data collection

The first data set comprised a group of species-level molecular phylogenies of vertebrates obtained from the literature. Candidate phylogenies had to meet three criteria. (1) They had to include at least N − 1 of the N known species of the ingroup ("complete phylogenies", sensu Mooers, 1995). This requirement greatly restricts the pool of available trees, but it is necessary when considering the speciational model of morphological evolution, in which morphological change is concentrated at speciation events: a missing species means a missing speciation event. We consider the effects of nonrandom extinction in the Discussion. (2) The publication had to include data sufficient to reconstruct the authors' tree using their algorithm. Most phylogenies were reconstructed from distance data (genetic distances from allozymes (Nei, 1978; Rogers, 1972), p values based on RFLP data (Nei & Li, 1979), or DNA-DNA hybridization distances (see Sibley & Ahlquist, 1990)) or from aligned gene sequence data under commonly used models of base substitution (cf. PHYLIP 3.5c; Felsenstein, 1995). We reconstructed each phylogeny using PHYLIP (Felsenstein, 1993; 1995) and algorithms that assume rate constancy (cf. Hey, 1992). (3) There had to be species-level size data for all the species in the clade. These size data were taken as point estimates, with no attempt to assign species-level variances to them: the maximum likelihood program adapted to perform the analysis (see below) does not allow for variance at the tips (though this information could, in theory, be incorporated), and most species-level weight data are not reported with estimates of variance. The weight data were treated as a species-specific trait, as in most comparative analyses (Harvey & Pagel, 1991). Where possible, the mean of male and female weights was taken; otherwise, data for the two sexes were pooled. In one case (Ursidae), female body weight was considered a better trait to model than male weight because of the large intraspecific variation in male body size.
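Criterion (2) relies on tree-building algorithms that assume rate constancy, which is what turns a distance matrix into an ultrametric tree with branch lengths proportional to time. As a minimal sketch, the code below implements UPGMA, one familiar clock-assuming method; it is offered only as an illustration, not as the specific PHYLIP program or settings used in this study. The function name `upgma` and the three-taxon distances are hypothetical.

```python
from itertools import combinations


def upgma(dist):
    """Cluster taxa by UPGMA, which assumes a constant rate of change.

    dist: dict mapping frozenset({a, b}) -> pairwise distance between taxa.
    Returns (root_label, heights), where heights maps each cluster label to
    the height of its node (half the distance at which it was merged).
    """
    d = dict(dist)
    # Active clusters and their tip counts (used to weight averaged distances).
    size = {t: 1 for pair in d for t in pair}
    heights = {t: 0.0 for t in size}  # tips sit at the present (height 0)
    while len(size) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(sorted(size), 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        heights[merged] = d[frozenset((a, b))] / 2.0
        na, nb = size.pop(a), size.pop(b)
        # Distance from the new cluster: tip-count-weighted mean of its parts.
        for k in size:
            d[frozenset((merged, k))] = (
                na * d[frozenset((a, k))] + nb * d[frozenset((b, k))]
            ) / (na + nb)
        size[merged] = na + nb
    (root,) = size
    return root, heights


# Hypothetical three-taxon distance matrix.
D = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.6,
    frozenset(("B", "C")): 0.6,
}
tree, heights = upgma(D)
print(tree, heights[tree])  # prints: ((A,B),C) 0.3
```

Because every tip sits at height 0 and every internal node at half its merge distance, root-to-tip paths all have the same length, which is the ultrametric property the clock assumption provides.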
For the Plethodon and Desmognathus salamanders, snout-vent lengths were transformed to relative weights by assuming a constant allometry among species. For the baleen whales, the marine turtles, and the kodkod (a small cat of the ocelot lineage), species-specific allometric relationships were used to estimate body size (see Table I for references). All body weights were log-transformed prior to analysis, so that we studied changes in proportional rather than absolute body size. Twenty-one trees from the literature met the criteria for inclusion. The clades ranged from three to thirteen species and comprised ten groups of birds, six of mammals, three of reptiles, and two of amphibians. We restricted ourselves to molecular phylogenies. We do not consider them inherently superior, but only molecular phylogenies allow us to assign tentative branch lengths to the resulting trees, under the assumption of a molecular clock.
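The two transformations in the paragraph above can be sketched briefly. Under a constant allometry W = a·L^b shared by all species, log W = log a + b·log L, so dropping the common constant log a yields relative log weights that are comparable as differences. The log transform itself makes equal differences correspond to equal ratios, i.e. proportional change. The exponent b = 3 (isometric scaling) and the example lengths and weights are assumptions for illustration only; the exponent actually used for the salamanders is not given in this section.

```python
import math

# Hypothetical allometric exponent: b = 3.0 corresponds to isometric scaling
# (weight proportional to length cubed). This value is an assumption.
B_EXPONENT = 3.0


def relative_log_weight(svl_mm, b=B_EXPONENT):
    """Relative log body weight from snout-vent length under W = a * L**b.

    With a and b shared by all species, log W = log a + b * log L; the
    constant log a is dropped, so results are meaningful only as differences.
    """
    return b * math.log(svl_mm)


def log_weight(weight_g):
    """Log-transform a weight so that equal differences mean equal ratios."""
    return math.log(weight_g)


# Hypothetical snout-vent lengths (mm) for two salamander species.
diff = relative_log_weight(60.0) - relative_log_weight(30.0)
print(round(diff, 4))  # prints: 2.0794 (that is, 3 * ln 2)
```

A doubling of weight is the same step on the log scale whether it goes from 10 g to 20 g or from 100 g to 200 g, which is why a Brownian process on log weights describes proportional rather than absolute change.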


Table I. Maximum likelihood fits for the macroevolution of vertebrate body size under a Brownian motion process and four models.

In addition to body mass, we recorded the age of each group (estimated as the time of the earliest split), the number of species in the group, and the class of molecular data (allozymes, restriction fragments, or DNA sequences). Ages were estimated from a combination of fossil dates, biogeographic information, and molecular calibrations taken from the original papers. For allozyme frequency data, Rogers' D distances (Rogers, 1972) were converted to Nei's D distances (Nei, 1978) before tree construction, using an empirical calibration supplied by N. Grabovac (pers. comm.). Although necessarily crude, this conversion allowed ultrametric trees to be constructed for these data.
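For readers unfamiliar with the two allozyme distances, the sketch below computes both from allele frequencies. Rogers' D is the mean over loci of sqrt(0.5 · Σ(x_i − y_i)²), and Nei's standard distance is −ln(J_XY / sqrt(J_X · J_Y)), with the J terms averaged over loci. This is the large-sample form; the small-sample correction of Nei (1978) is omitted, and the empirical Rogers-to-Nei calibration used in the study is unpublished and is not reproduced here. The function names and frequencies are hypothetical.

```python
import math


def rogers_d(pop_x, pop_y):
    """Rogers' (1972) distance: mean over loci of sqrt(0.5 * sum (x_i - y_i)^2).

    pop_x, pop_y: lists of loci; each locus is a list of allele frequencies
    given in the same allele order for both populations.
    """
    per_locus = [
        math.sqrt(0.5 * sum((x - y) ** 2 for x, y in zip(lx, ly)))
        for lx, ly in zip(pop_x, pop_y)
    ]
    return sum(per_locus) / len(per_locus)


def nei_d(pop_x, pop_y):
    """Nei's standard distance, -ln(Jxy / sqrt(Jx * Jy)), J averaged over loci.

    Large-sample form only: the small-sample correction of Nei (1978) is omitted.
    """
    n = len(pop_x)
    jx = sum(sum(x * x for x in lx) for lx in pop_x) / n
    jy = sum(sum(y * y for y in ly) for ly in pop_y) / n
    jxy = sum(sum(x * y for x, y in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)) / n
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical single-locus, two-allele frequencies for two populations.
x = [[1.0, 0.0]]
y = [[0.5, 0.5]]
print(rogers_d(x, y), round(nei_d(x, y), 4))  # prints: 0.5 0.3466
```

Rogers' D is bounded between 0 and 1, whereas Nei's D is unbounded and, under its model assumptions, grows roughly linearly with divergence time, which is one reason to prefer it when a clock-based tree is wanted.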

The second data set is the higher-level phylogeny for two clades of birds, the Ciconiiformes and the Passeriformes (Sibley & Ahlquist, 1990). The two bird clades are fairly large (28 and 31 tips, respectively), but no raw data are available for reconstructing the trees ourselves. We restricted ourselves to the same level in the tree as Nee et al. (1992), roughly the family level, where we can be fairly confident of a complete tree (no missing lineages). The UPGMA tapestry was used, with branch lengths (in ΔT50H units; Sibley & Ahlquist, 1990) taken to be linearly related to time. Representative body sizes of the taxa were estimated by hierarchical weighting, so that speciose taxa do not bias the estimate (Harvey & Mace, 1982), using weights from Blackburn & Gaston (1994). Species were first averaged within genera, then genera within tribes, tribes within subfamilies, and subfamilies within families. Under the gradual scenario, the family-level representative body sizes were placed at the highest split within each family, following the conventions of Mooers et al. (1994). This means that body-size estimates from families that radiated soon after the 10 ΔT50H unit cutoff sit on short terminal branches, while estimates from late-radiating families sit at the ends of longer branches. This allows more time for change in groups whose family-level estimates sample less elapsed time. The procedure conforms with the underlying Brownian motion process (see below) and does not bias the results towards preferring one model over another.
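The hierarchical weighting described above can be sketched as a bottom-up recursion over a nested taxonomy: each genus mean is taken over its species, each tribe mean over its genera, and so on, so that a speciose genus counts only once at the next level. The nested-dictionary layout, the function name and the log-weight values are hypothetical; whether raw or log-transformed weights were averaged for this data set is not stated in this section, and log values are used here only as an assumption.

```python
from statistics import mean


def hierarchical_mean(node):
    """Average a nested taxonomy bottom-up so each subgroup counts once.

    node: either a list of species values (one genus) or a dict mapping
    subgroup names to nested nodes (tribes, subfamilies, a family, ...).
    """
    if isinstance(node, dict):
        # Each child taxon contributes its own mean, regardless of its size.
        return mean(hierarchical_mean(child) for child in node.values())
    return mean(node)


# Hypothetical family: Tribe1 holds a speciose genus and a monotypic genus.
family = {
    "Tribe1": {"GenusA": [2.0, 2.0, 2.0, 2.0], "GenusB": [4.0]},
    "Tribe2": {"GenusC": [6.0]},
}
all_species = [2.0, 2.0, 2.0, 2.0, 4.0, 6.0]
print(hierarchical_mean(family), mean(all_species))  # prints: 4.5 3.0
```

The simple species mean (3.0) is pulled towards the four-species GenusA, while the hierarchical mean (4.5) gives Tribe1 and Tribe2 equal weight, which is the bias the weighting scheme is meant to remove.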