Mode of evolution and protein abundance in seminal fluid
Several previous studies have identified the most abundant proteins in the seminal fluid of domestic animals, and these data allow hypotheses about the evolution of the corresponding genes to be tested. Our recent study, which showed a particularly high diversity of the seminal fluid proteome between species, suggested that this diversity was potentially associated with attributes of male reproductive physiology (Druart et al., 2013). We found no correlation between the abundance of a protein in seminal fluid and the presence of positive selection in the gene encoding it in the same species (tested by multivariate phylogenetic pairwise comparisons). This negative result should be viewed with caution because of the low power of our test, itself resulting from the low number of genes and taxa and from the limited variability of the relevant characters in our dataset. Nevertheless, our results do not lend any support to the hypothesis that the two characters are positively correlated. The apparent absence of correlation between the predominance of a protein in the seminal fluid of a species and its evolution under positive selection, which is confirmed by visual inspection of the data (Figs 5-6), is compatible with the ‘translational robustness hypothesis’ proposed previously (Drummond et al., 2005). According to this hypothesis, highly expressed genes evolve slowly because their sequences are under selection to avoid protein misfolding.
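The link between a small number of comparisons and low power can be illustrated with a minimal sketch. It assumes a plain two-sided sign test on independent pairs, which is only a stand-in: the exact statistic behind our multivariate pairwise comparisons is not restated here, and the function names and pair counts are hypothetical.

```python
from math import comb


def sign_test_p_value(n_positive: int, n_pairs: int) -> float:
    """Exact two-sided sign test: probability, under a 50/50 null, of a split
    at least as uneven as n_positive versus (n_pairs - n_positive)."""
    k = max(n_positive, n_pairs - n_positive)
    upper_tail = sum(comb(n_pairs, i) for i in range(k, n_pairs + 1)) / 2 ** n_pairs
    return min(1.0, 2 * upper_tail)


def smallest_attainable_p(n_pairs: int) -> float:
    """Best case: every informative pair points in the same direction."""
    return sign_test_p_value(n_pairs, n_pairs)


# Hypothetical numbers of informative pairs (not taken from our dataset).
for n in (3, 4, 5, 6, 8):
    print(n, smallest_attainable_p(n))
```

Under this assumption, fewer than six informative pairs cannot produce a two-sided p-value below 0.05 even when every pair agrees, which is one way in which few genes and taxa translate into low power.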
Diversification of proteins in seminal fluid
The present work suggests that the high diversity of proteins present in the seminal fluid of mammals is associated with a species-specific evolutionary pattern of the corresponding genes, marked by fairly frequent pseudogenisation, high expression diversity, and positive selection. Pseudogenisation has previously been demonstrated for TGM4 and semenogelin genes in some ape species (Jensen-Seaman and Li, 2003). We have also previously shown that TGM4 has been lost in cattle, horse, dog, and likely several other mammalian species (Tian, Pascal, Fouchécourt, et al., 2009) and that the ortholog of porcine Sal1 and Major allergen Equine C1 Precursor has been lost in human as well as in the Neanderthal genome (Meslin et al., 2011). It is difficult to determine whether the pseudogenisation rate of 0.00048 events/lineage/gene/Ma is especially high because such rates have seldom been reported in the literature. We previously reported, from a different set of 69 genes and a different taxonomic sample, rates ranging from 0 (in teleosts) to 0.016 (in eutherians) (Meslin et al., 2012). To be meaningfully compared with our rates, the latter value has to be converted into a rate per gene, which gives about 0.00023 events/lineage/gene/Ma for eutherians. Given that our sample is also composed of eutherians, the 20 genes studied here appear to have undergone pseudogenisation at roughly twice the average rate of the 69 genes studied previously (Meslin et al., 2012).
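The rate comparison above can be retraced with a few lines of arithmetic. This is a sketch that assumes the eutherian value of 0.016 is a rate summed over the 69 genes, so that dividing by 69 gives the per-gene rate; that reading is inferred from the figures quoted in the text, and the variable names are illustrative.

```python
# Values quoted in the text.
rate_this_study = 0.00048     # events/lineage/gene/Ma, 20 seminal fluid genes
rate_previous_total = 0.016   # events/lineage/Ma in eutherians (Meslin et al., 2012)
n_genes_previous = 69         # genes in the previous dataset

# Assumption: the previous rate is summed over all 69 genes.
rate_previous_per_gene = rate_previous_total / n_genes_previous
fold_difference = rate_this_study / rate_previous_per_gene

print(f"previous per-gene rate: {rate_previous_per_gene:.5f} events/lineage/gene/Ma")
print(f"fold difference: {fold_difference:.2f}")
```

The fold difference only compares averages; it says nothing about how pseudogenisation events are distributed among individual genes in either dataset.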
A few examples illustrate how this diversity appeared. The gene encoding KLK2, a kallikrein expressed in the prostate in humans, was previously shown to be lost in several primates (Gorilla gorilla Savage 1847 and Papio anubis Lesson 1827; Marques et al., 2012) and to be under positive selection in others, as were two other genes encoding the enzymes ACPP and TGM4 (Clark and Swanson, 2005). In the present study, we found that KLK2 has been lost in cattle, horse, and mouse (i.e. we found traces of pseudogenes), probably independently because close relatives of these taxa retain this gene. The situation of the TGM4 gene is different. It is present in birds, squamates, platypus, several primates, and at least three rodents, but is absent in all sampled laurasiatherians. Thus, it may have been lost before the diversification of Laurasiatheria. Interestingly, KLK1, a paralog of KLK2 and a major protein of equine seminal fluid (where it is named KLK1E2), is under positive selection in cattle, horse, and mouse as well as in human. KLK1 seems to be mainly expressed in the kidney, the pancreas, and the salivary glands in the mouse (http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=123107&MAXEST=94), so further investigation is needed to confirm its presence in the seminal fluid of species other than horse. Nevertheless, this suggests that, at least in horse, KLK1 may replace KLK2 for an important biological function in the seminal fluid.
Position and function of amino acids under positive selection
Our data suggest that amino acids under positive selection are more often exposed to the solvent than expected by chance. However, this result should be viewed with caution because it is based on a relatively low number of amino acids and holds for only some of the sampled proteins; for the others, we could not reject the null hypothesis. Some studies have reached similar conclusions on an ad hoc basis, but without addressing this issue at a large scale. For instance, amino acids under positive selection that are likely involved in species-specific recognition of the lipopolysaccharide of Gram-negative bacteria were identified in a Toll-like receptor (Fornůsková et al., 2013). In the particular case of our medium-scale study, more data on the position of amino acids and on the associated 3D structures will need to be gathered to reach firm conclusions on this point. Our new results on the position of amino acids suggest that positive selection preferentially affected amino acids involved in interactions with partners rather than in other functions of the proteins, as most such amino acids are located at the surface rather than in the vicinity of a possible enzymatic pocket.
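One way to ask whether selected sites are exposed more often than expected by chance is a one-sided hypergeometric test on a single protein. The sketch below is an illustration under that assumption, not a restatement of the test used in this study; the function name and the residue counts are hypothetical.

```python
from math import comb


def exposure_enrichment_p(k_exposed_selected: int, n_selected: int,
                          n_exposed: int, n_total: int) -> float:
    """P(X >= k) when n_selected residues are drawn without replacement from a
    protein of n_total residues, n_exposed of which are solvent-exposed."""
    denominator = comb(n_total, n_selected)
    upper = min(n_selected, n_exposed)
    numerator = sum(
        comb(n_exposed, i) * comb(n_total - n_exposed, n_selected - i)
        for i in range(k_exposed_selected, upper + 1)
    )
    return numerator / denominator


# Hypothetical protein: 300 residues, 120 exposed, and 10 positively
# selected sites of which 8 are exposed. These counts are invented.
p_value = exposure_enrichment_p(8, 10, 120, 300)
print(p_value)
```

With only a handful of selected sites per protein, such a test can reject the null hypothesis only when nearly all of those sites are exposed, which is consistent with the caution expressed above.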
Finally, we have confirmed here that MFGE8 was under positive selection in the dog and in the human/chimpanzee clade. This protein binds to the zona pellucida of unfertilised (but not fertilised) oocytes, and both the recombinant protein and a specific antibody raised against MFGE8 competitively inhibit sperm-egg interaction. For this protein, positioning the positively selected amino acids on a 3D model was particularly informative. In particular, ten sites under positive selection (92R, 94T, 149L, 152H, 214T, 259L, 279V, 281G, 285N, 312S) are located within or in the vicinity of the three β-hairpin loops, also called ‘spikes’, which allow interaction with phospholipids and between membranes (Rodrigues et al., 2013). Interestingly, among the whole family of F5/8 type C domains, there is a particularly high variability in the domain interfaces. The position of the amino acids under positive selection on the same platform, formed by the alignment of the spikes, thus supports the hypothesis that both F5/8 type C domains participate in a species-dependent function of MFGE8. More generally, and as observed in our previous work on the evolution of genes encoding Odorant Binding Proteins and proteins involved in gamete fertilisation, amino acids under positive selection are almost always located at the surface of the proteins rather than in the vicinity of the enzymatic pocket or another functional domain (Meslin et al., 2011, 2012). This suggests that this evolution is driven by species-dependent interactions with partners, as described for example for the positively selected sites on the surface glycoprotein (G) of infectious hematopoietic necrosis virus (LaPatra et al., 2008).