To refer to this article use this url: http://contributionstozoology.nl/vol68/nr01/a01

# Methods

#### The models

next section

Underlying all four models is the assumption that changes in body size can be modeled by a continuous random walk (Brownian motion). Under this process, change is continuous, reversals in direction are frequent, the expected rate of change is the same at all stages and in all lineages, and evolution is unbounded (Felsenstein, 1985). Different theories of biological evolution assume different relationships between the amount of time that passes and the amount of evolutionary change that occurs. The gradual model (Fig. 1A) is the only one that conforms precisely to Brownian motion: the squared differences in body size between any two species should be proportional to the time separating them. For example, closelyrelated species should have similar sizes. This is the pattern expected under conditions of evolution by genetic drift (Lande, 1976; Lynch, 1990). However, natural selection. can lead to a similar outcome if selection pressures vary unpredictably (Felsenstein, 1981).

The other three models are made to correspond to a Brownian motion process by adjusting relative branch lengths on the phylogenetic tree, essentially manipulating time. Under the speciational model (Fig. 1B), the squared differences between speciesspecific body size should be proportional to the number of speciation events that separate them, irrespective of the time elapsed. Under this view (for examples, see Rohlf et al., 1990; Harvey & Purvis, 1991: box 2), change in trait values occurs rapidly at, or shortly after, speciation, and stasis follows until the next speciation event. This pattern is associated with the view that speciation is somehow necessary for change to occur (Eldredge & Gould, 1972; Gould & Eldredge, 1993) and may arise for many reasons – for example, if speciation is associated with niche shifts, which may happen commonly in adaptive radiations. This model is a common default setting for trees used in comparative studies when there is no branch length information.

Under the third model (the pitchfork model, Fig. 1C) squared differences are unrelated to time or speciation events. Closelyrelated species are no more likely to be similar in size than any two species picked at random. We can test it using the Brownian motion process by setting all internode distances to zero, and setting all terminal branches to unit length, creating a star, or pitchfork phylogeny. Success of this scenario would suggest that phylogenetic history is of no importance to the evolution of body size. Another interpretation of this model is that estimated topology used is very wrong: removing any internal structure (making a pitchfork)might then actually be a better representation of the true phylogeny than that used in the other models.

The final model tested (the free model, Fig. 1D) is qualitatively different from the other three, and can be viewed as the null model. Here the topology of the tree is kept fixed, but we allow each branch length to vary freely until the most likely set of branch lengths is found, given the body sizes of all the species in the clade. This set of branch lengths produces the best fit to the Brownian motion process, effectively allowing body size to evolve at any number of rates. The length of each branch is the best descriptor of how size differences among species actually accumulated in that interval. A significant improvement under this model would suggest that the simpler models presented do not reflect the true pattern of bodysize macroevolution. A drawback of this fourth model is that it is too unconstrained: it is hard to imagine that every branch (internode) requires a brandnew parameter governing change. Thus the degrees of freedom (number of parameters to estimate) are inflated, making simpler models harder to reject.

##### Data collection

The first data set comprised a group of specieslevel molecular phylogenies of vertebrates obtained from the literature. Candidate phylogenies had to meet three criteria: (1) They had to include at least N1 of the N known species of the ingroup (“complete phylogenies”, sensu Mooers, 1995). This requirement greatly restricts the pool of available trees but is necessary when considering the speciational model of morphological evolution, where morphological change is concentrated at speciation events. We consider the effects of nonrandom extinction in the discussion. (2) Data sufficient to reconstruct the author’s tree using their algorithm had to be included. Most phylogenies were reconstructed from distance data (genetic distances from allozymes (Nei, 1978; Rogers, 1972), pvalues based on RFLP data (Nei & Li, 1979), DNA-DNA hybridization distances (see Sibley & Ahlquist, 1990)), or aligned gene sequence data and commonly used models of base substitution (cf. PHYLIP 3.5c; Felsenstein, 1995). We reconstructed each phylogeny using PHYLIP (Felsenstein, 1993; 1995) and algorithms that assume rate constancy (cf. Hey, 1992). (3) There had to be specieslevel size data for all the species in the clade. These size data were taken as point estimates, and there was no attempt to assign specieslevel variances to the estimate. The maximum likelihood program adapted to perform the analysis (see below) does not allow for variance at the tips (though this information could, in theory, be incorporated) and most specieslevel weight data are not reported with estimates of variance. The weight data were considered a speciesspecific trait, as in most comparative analyses (Harvey & Pagel, 1991). Where possible, the mean of male and female weights was taken; otherwise sexes were pooled. In one case (Ursidae), female body weight was considered a better trait to model than male weight because of the large amount of intraspecific variation in male body size. For the *Plethodon*
and *Desmognathus*
salamanders snoutvent lengths were transformed to relative weights by assuming a constant allometry among species. For the baleen whales, marine turtles, and the kodkod (an ocelot) speciesspecific allometric relationships were used (see Table I for references) to estimate body size. All body weights were logarithmically transformed prior to analysis, such that we studied changes in proportional rather than absolute body size. Twentyone trees from the literature met the criteria for inclusion. The clades ranged from three to thirteen species, including ten groups of birds, six of mammals, three of reptiles, and two of amphibians. We restricted ourselves to molecular phylogenies. We do not feel they are inherently superior, but only molecular phylogenies allow us to assign tentative branch lengths to the resulting trees, using the assumption of the molecular clock.

Table I. Maximum likelihood fits for the macroevolution of vertebrate body size under a Brownian motion process and four models. |

In addition to body mass, we recorded the age of the group, estimated as the time of the earliest split, the number of species in the group, and the class of molecular data (allozymes, restriction fragments or DNA sequences). The ages were estimates, and were made using a combination of fossil dates, biogeographic information, and molecular calibrations taken from the original papers. For allozyme frequency data, Roger’s D (Rogers, 1972). distances were converted to Nei’s D (Nei, 1978) distances using an empirical calibration supplied by N. Grabovac (pers. comm.) before tree construction. While necessarily crude, this allowed ultrametric trees to be constructed for these data.

The second data set is the higher level phylogeny for two clades of birds (the Ciconiiformes and the Passeriformes, Sibley & Ahlquist, 1990). The two bird clades are fairly large (with 28 and 31 tips, respectively), but no raw data are available to reconstruct the trees for ourselves. We constrained ourselves to the same level in the tree as Nee et al. (1992) , roughly the family level, where we can be fairly confident of a complete tree (no missing lineages). The UPGMA tapestry was used, with the branch lengths (in ΔT50 H units; Sibley & Ahlquist, 1990) taken to be linearly related to time. Estimates for representative body sizes of the taxa were reconstructed by hierarchical weighting such that speciose taxa do not bias the estimate (Harvey & Mace, 1982), using weights from Blackburn & Gaston (1994) . Species were first averaged within genera, and then genera were averaged within tribes, tribes within subfamilies and subfamilies within families. Under the gradual scenario, the familylevel representative body sizes were placed at the highest split within the family, following the conventions of Mooers et al. (1994). This means that bodysize estimates made from families that radiated soon after the 10 ΔT50 H unit cutoff will be found on short terminal branches, while estimates from lateradiating families will be found on the ends of longer branches. This allows more time for change in groups whose familylevel estimates sample less elapsed time. This procedure conforms with the underlying Brownian motion process (see below) and does not bias the results towards preferring one model over another.

##### Analysis

The goal of the analysis was to determine, for each clade, which model of character change best described the evolution of speciesspecific body size. To do this we calculated the goodness of fit of each model to the body sizes of the taxa at the tips of the tree. Goodness of fit was measured as the logarithm of the likelihood of each model given the data (body sizes) under the Brownian motion process of character change. Better fits between the data and a model for bodysize evolution will have higher log(likelihood) scores.

The fits of alternative models were compared using the differences in their corresponding log(likelihoods). The first three models tested have the same number of estimated parameters. For the gradual model, there is only one rate. For the speciational and pitchfork models, we assume that the traits evolve at different rates on different branches (e.g. at rate = 0 for the zero length branches of the pitchfork model and at different rates for the unit length branches). However, because we set the branch lengths *a priori *(ignoring the actual inferred time between nodes), only one rate parameter is estimated. The other rates are proportional to it: we could estimate the actual rate for any branch on the tree by dividing the single estimated rate by the branch length taken from the gradual tree. This rate is uninteresting, however: for the speciational model, it would be the average of the fast rate at speciation and zero for the duration of that lineage’s existence. Given that these three models are equally parameterized, any two are deemed significantly different when the difference in their log(likelihoods) is 2.0, corresponding to approximately a sevenfold difference in their likelihoods (Edwards, 1992: 180ff). Risch (1992) would prefer a difference of 3.0 before discriminating among models (corresponding to a 20-fold difference in likelihoods).

The fourth model is qualitatively different. The species’ body sizes are used to optimize the branch lengths, which are unconstrained by *a priori *hypotheses, and so the model has many more parameters than the first three. The number of extra parameters is 2N4 for a tree having N species (the number of branches in an unrooted tree –1). This allows us to judge whether the free model is significantly better than any of the alternatives by comparing twice the improvement in the log(likelihood) to a chisquare distribution with 2N4 degrees of freedom (see Goldman, 1993).

Given that the actual topology and number of tips affects the fit of any given data set to a tree, we cannot compare the values of the same model for different clades; for example we cannot state that the gradual model for one group of birds fits *n *times better than it does for some other group (Goldman, 1993). However, individual trees are independent and the log(likelihoods) for a given model may be summed across the full data set of twentyone trees to yield an overall measure of the goodness of fit of each model. Because the differences between the fits of different models are relevant, but not their absolute value, we scaled the log(likelihoods) so that the worstfit model for any data set was designated zero.

All analyses were performed on a modified version of the CONTML program of PHYLIP (version 3.0, Felsenstein, 1993). The modification allowed CONTML to return a log(likelihood) for a data set that was independent of the scale of measurement of body size and total tree length. The modification is available from the authors upon request. The Brownian motion model does not require directionality (i.e., rootedness) in the tree, and in order to use CONTML, the rooted trees in Fig. 1 were made unrooted before analysis. This does not bias the results in favour of any specific model.