Institutions: UMR 9189 CRIStAL, bonsai team and UMR 8198 Evo-Eco-Paleo (EEP) at CNRS/Lille University
Project title: Algorithms, bioinformatics and evolution for paleoproteomics
Project funding: Paleo-Info-Evo (CNRS 80|PRIME project)
Supervision: Hélène Touzet (DR CNRS, bioinformatics), Céline Poux (Associate Professor, University of Lille, evolution)
Email contact: helene.touzet|at|univ-lille.fr, celine.poux|at|univ-lille.fr
Start date and duration: October 2023 for 3 years.
We are pleased to announce a PhD fellowship for a highly motivated, enthusiastic and independent person with a strong interest in the development of bioinformatics algorithms to improve the analysis of proteomic data in paleontological studies. Background knowledge in all or some of the following fields is required: evolutionary biology, phylogenetics, sequence analysis and Python programming.
Project description
In recent years, the analysis of ancient biological samples has changed our understanding of the evolution of life on Earth, renewing the approaches previously used in paleontology, which were based on the study of fossils or carbon-14 dating. At the forefront of these new molecular techniques is paleogenomics (the sequencing of ancient DNA), although DNA degrades relatively quickly. More recently, paleoproteomics via ZooArchaeology by Mass Spectrometry (ZooMS) has made it possible to identify morphologically ambiguous or unidentifiable bone fragments from bone assemblages. Identification of bones with ZooMS relies on the analysis of a target protein, such as collagen, which is abundant in bone fragments. The collagen present in the samples is digested, and the masses of the resulting peptides, measured by mass spectrometry, give indirect information on the amino acid sequence of the protein studied. To exploit these data, the community works with marker peptides, which serve as a kind of molecular barcode for taxonomic assignment. However, the use of these marker peptides suffers from two limitations: it remains manual, and it neglects the evolutionary dimension of the data. There is therefore a real need to formalize and automate the methods in order to obtain robust and reproducible assignments, even on a large scale. This raises multiple questions:
- How can the marker peptide approach be generalized to combinations of marker peptides or consensus marker peptides, so as to take full advantage of the phylogenetic signal contained in the data?
- How can marker peptides be inferred at different taxonomic levels?
- How can the phylogenetic signal contained in the target protein and its peptides be measured?
- How can ancestral protein sequences be reconstructed from spectra and contemporary sequences, in order to enrich contemporary data sets?
The methods developed will combine sequence algorithmic approaches and a probabilistic framework using protein sequence evolution models to reconstruct phylogenetic trees and ancestral sequences. The expected results are twofold: to develop a toolbox for data analysis, and to propose a methodological framework for an informed use of marker peptides in ZooMS.
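To give candidates a concrete idea of the marker peptide principle described above, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the method the project will develop: the two sequences and the taxon names are invented toy data, the function names are hypothetical, trypsin is modeled with the usual simplified rule (cleavage after K or R, except before P), and post-translational modifications such as proline hydroxylation, which matter for real collagen, are ignored.

```python
# Toy illustration of marker-peptide matching (all data below are invented).

# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the residue sum
PROTON = 1.007276   # mass of the proton for a singly charged ion


def tryptic_digest(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mz(peptide):
    """Monoisotopic m/z of the singly protonated, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def score_taxa(observed_peaks, markers, tol=0.1):
    """Rank taxa by the number of marker masses found in the peak list."""
    scores = {}
    for taxon, marker_mzs in markers.items():
        scores[taxon] = sum(
            any(abs(peak - mz) <= tol for peak in observed_peaks)
            for mz in marker_mzs
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical reference sequences for two made-up taxa.
    references = {"taxon_A": "GLPGAKGEPGPR", "taxon_B": "GLPGVKGEPGAR"}
    markers = {
        taxon: [peptide_mz(p) for p in tryptic_digest(seq)]
        for taxon, seq in references.items()
    }
    # Simulated peak list: taxon_A masses with a small measurement error.
    peaks = [mz + 0.01 for mz in markers["taxon_A"]]
    print(score_taxa(peaks, markers))
```

Counting matched masses in this way treats every marker independently and ignores how the taxa are related, which is precisely the limitation that the phylogeny-aware methods of the project aim to address.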
Setting and requirements
The project is funded by the CNRS 80|PRIME initiative and will be developed as an inter-institutional and interdisciplinary collaboration between UMR CRIStAL and UMR EEP of the CNRS and the University of Lille. The project is also carried out in close collaboration with Fabrice Bray (MSAP), who is in charge of the ZooMS platform in Lille, and Patrick Auguste (palaeontologist, EEP). Master's students who are graduating over the summer are welcome to apply. More information on studying at Lille University can be found on the Lille University webpage: https://www.univ-lille.fr/home/international-student.
Profile of the candidate
- Master’s degree in a relevant field: bioinformatics and/or evolution (sequence analysis, Python programming, phylogenetics)
- Eagerness to acquire new skills and knowledge in proteomics, evolution and/or bioinformatics, depending on the candidate's background
- Ability to work in an interdisciplinary and collaborative environment (independence, reliability, integrity)
- Ability to write clear scientific reports and disseminate results
- Good non-academic qualities (e.g. maturity, open-mindedness, respectfulness)
Interested?
To apply for this position, please send the following information to both of the email addresses indicated above: a complete CV, including grades obtained during the Master's program; a letter of motivation that also briefly outlines past research accomplishments and future goals; and the name and contact information of a previous project supervisor (bachelor's or master's thesis). Informal inquiries regarding this vacancy can also be sent to these two email addresses.
Bibliographical references
- Age estimates for hominin fossils and the onset of the Upper Palaeolithic at Denisova Cave. Nature, 2019
- Species identification of ancient Lithuanian fish remains using collagen fingerprinting. Journal of Archaeological Science, 2018
- Distinguishing African bovids using Zooarchaeology by Mass Spectrometry (ZooMS): New peptide markers and insights into Iron Age economies in Zambia. PLOS ONE, 2021
- Extinct species identification from late middle Pleistocene and earlier Upper Pleistocene bone fragments and tools not recognizable from their osteomorphological study by an enhanced proteomics protocol. Archaeometry, 2022
- compareMS2 2.0: An Improved Software for Comparing Tandem Mass Spectrometry Datasets. Journal of Proteomics, 2023
- Semi-supervised machine learning for automated species identification by collagen peptide mass fingerprinting. BMC Bioinformatics, 2018
- An experimental phylogeny to benchmark ancestral sequence reconstruction. Nature Communications, 2016
- Beyond mass spectrometry, the next step in proteomics. Science Advances, 2020