Phylogenetic and syntenic analyses
Twenty-one proteins were chosen for the analysis on the basis of their predominance (at least about 20 times the average concentration of seminal plasma proteins) in at least one of seven domestic mammal species. Abundance was assessed from the mass spectrometry identification of seminal plasma proteins previously performed in these seven species (Druart et al., 2013). In this study, seminal plasma proteins were separated by SDS-PAGE and imaged after Coomassie Blue staining (Fig. 1 Supplemental data). Because of its moderate sensitivity, this stain is commonly used to detect highly abundant proteins, and it also allows protein quantification because staining intensity is positively correlated with protein amount. The main bands observed after SDS-PAGE and Coomassie staining were then subjected to mass spectrometry (MS) to identify their protein content. Each band contains several proteins; the one with the highest number of MS spectra was selected as the major protein of the band (major proteins being those with at least approximately 20 times the average protein concentration). Proteins identified by 1) high-intensity staining after SDS-PAGE and Coomassie staining and 2) a predominant number of MS spectra were therefore considered quantitatively major components of the seminal plasma. Finally, RNAse10 and MFGE8 were included in the analysis because they are specific markers of epididymal maturation in ungulates (Castella et al., 2004; Belleannée et al., 2011). Because of these selection criteria, not all proteins considered by Druart et al. (2013) are considered here.
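The two selection rules described above (the protein with the most MS spectra in each band, and an abundance of at least about 20 times the average) can be illustrated with a minimal Python sketch. The band labels, protein names, spectral counts and function names are hypothetical and serve only as an illustration; they are not data or code from Druart et al. (2013).

```python
from statistics import mean

# Hypothetical spectral counts per gel band (protein -> number of MS spectra).
# These labels and numbers are illustrative assumptions, not study data.
bands = {
    "band_1": {"PROT_A": 120, "PROT_B": 15, "PROT_C": 4},
    "band_2": {"PROT_D": 60, "PROT_A": 58},
}


def major_protein(spectra_by_protein):
    """Return the protein with the highest number of MS spectra in one band."""
    return max(spectra_by_protein, key=spectra_by_protein.get)


def is_predominant(abundance, all_abundances, fold=20):
    """Return True if `abundance` is at least `fold` times the mean abundance."""
    return abundance >= fold * mean(all_abundances)


# One major protein per band.
majors = {band: major_protein(counts) for band, counts in bands.items()}
```

The threshold is exposed as the `fold` parameter because the text gives it only as "about 20 times".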
This study sampled the genomes of nine fully sequenced placental mammal species (Bos taurus Linnaeus, 1758, Canis lupus familiaris Linnaeus, 1758, Equus caballus Linnaeus, 1758, Homo sapiens Linnaeus, 1758, Mus musculus Linnaeus, 1758, Oryctolagus cuniculus Linnaeus, 1758, Pan troglodytes Blumenbach, 1775, Rattus norvegicus Berkenhout, 1769 and Sus scrofa Linnaeus, 1758). We used the January 2013 release of Ensembl (http://jan2013.archive.ensembl.org/index.html) and the following genome assemblies: human (GRCh37), chimpanzee (CHIMP2.1.4), mouse (GRCm38), rat (Rnor_5.0), rabbit (oryCun2), dog (CanFam3.1), pig (Sscrofa10.2), horse (EquCab2), and cattle (UMD3.1). We chose these fully sequenced species because complete genomes make it possible to identify pseudogenes and to test the hypothesis of gene loss.
For each identified gene, the corresponding Ensembl protein ID was retrieved from the Ensembl database and submitted to the PhyleasProg web server v2.3 (http://phyleasprog.inra.fr/) (Busset et al., 2011). All reconstructed phylogenetic trees were carefully examined before the selective pressure results were interpreted and, where necessary, corrected by synteny analysis as previously described (Tian, Pascal, Fouchecourt et al., 2009), so that calculations were performed on correct orthologs.
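As an illustration of how a set of candidate orthologs might be checked before such an analysis, the sketch below maps Ensembl protein stable IDs to species through their standard prefixes and reports the species with no representative (a possible gene loss or annotation gap to follow up by synteny). It is a minimal sketch and not part of PhyleasProg; the function names and the example IDs used with it are hypothetical.

```python
import re

# Ensembl protein stable-ID prefixes for the nine species considered here.
PREFIX_TO_SPECIES = {
    "ENSP": "Homo sapiens",
    "ENSPTRP": "Pan troglodytes",
    "ENSMUSP": "Mus musculus",
    "ENSRNOP": "Rattus norvegicus",
    "ENSOCUP": "Oryctolagus cuniculus",
    "ENSCAFP": "Canis lupus familiaris",
    "ENSSSCP": "Sus scrofa",
    "ENSECAP": "Equus caballus",
    "ENSBTAP": "Bos taurus",
}

# "ENS", an optional three-letter species code, "P", then 11 digits.
STABLE_ID = re.compile(r"^(ENS(?:[A-Z]{3})?P)\d{11}$")


def species_of(protein_id):
    """Return the species for an Ensembl protein ID, or None if unrecognised."""
    match = STABLE_ID.match(protein_id)
    if match is None:
        return None
    return PREFIX_TO_SPECIES.get(match.group(1))


def missing_species(protein_ids):
    """Return, sorted, the species that have no protein in `protein_ids`."""
    found = {species_of(pid) for pid in protein_ids}
    return sorted(set(PREFIX_TO_SPECIES.values()) - found)
```

A non-empty result from `missing_species` only flags a species to examine; it does not by itself demonstrate gene loss.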
For the comparative analyses of the relationship between protein abundance and intensity of positive selection, and of evolutionary rates, we built a reference phylogeny of the nine species on which this paper focuses. The reference phylogeny (S1, 2) generally follows the topology and divergence times of Murphy and Eizirik (2009), except for fairly recent divergences, such as those within artiodactyls (Hassanin et al., 2012), hominids (Vignaud et al., 2002), and murines (Rowe et al., 2008).
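A reference tree of this kind is usually passed to comparative software in Newick format. The sketch below encodes only a topology for the nine species and converts it to a Newick string. It makes two assumptions that are not taken from the paper: branch lengths (divergence times) are omitted because they are not given in this section, and the relationships among dog, horse and the artiodactyls are left as an unresolved polytomy, whereas the authors' tree (S1, 2) may resolve them.

```python
# Topology only, as nested tuples; no divergence times are encoded.
# The three-way split among dog, horse and (pig, cattle) is an assumption
# of this sketch, not the authors' reference phylogeny.
REFERENCE_TOPOLOGY = (
    (
        ("Homo_sapiens", "Pan_troglodytes"),
        (("Mus_musculus", "Rattus_norvegicus"), "Oryctolagus_cuniculus"),
    ),
    ("Canis_lupus_familiaris", "Equus_caballus", ("Sus_scrofa", "Bos_taurus")),
)


def to_newick(node):
    """Serialise a nested-tuple tree (leaves are strings) without the final ';'."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def leaves(node):
    """Return the leaf names of a nested-tuple tree, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


newick = to_newick(REFERENCE_TOPOLOGY) + ";"
```

Divergence times could be added by attaching a length to each node, for example as `name:length` in the Newick string, once the values from the cited sources are available.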