A quest for lncRNAs coding for functional micropeptides

  • What it is

    Mobility experience with a research focus

  • Who it’s for

    Master's students carrying out their final research project; PhD sandwich students; postdocs

Department

Biology (Unit of Cell, Developmental and Molecular Biology)

Main research activities/topics/projects

Long noncoding RNAs (lncRNAs) represent the majority of RNA transcripts, yet, in contrast to mRNAs, their functions are still only partly understood. LncRNAs are known to regulate gene expression and to shape the architecture of the nucleus, but the full range of their functions is still being explored. Their noncoding nature is also hard to establish with certainty, as they could potentially encode rare, tissue-specific peptides that are difficult to detect and validate in the lab. Existing bioinformatic tools for predicting coding potential rely on empirical, user-defined parameters and have shown limited predictive power. This project aims to build a novel bioinformatic prediction tool, based on probability theory, that identifies lncRNAs which actually code for peptides, and then to validate its predictions by mining proteomic databases for the corresponding peptides.

The project will be carried out at the Unit of Cell and Developmental Biology, Department of Biology, University of Pisa, Strada Statale dell’Abetone Brennero 4, 56123 Pisa, Italy, or remotely. It is part of a collaborative effort between the University of Pisa, QMUL and Kingston College. The selected student may have the opportunity to spend 6 months abroad. For enquiries or further information, please contact Dr Cerase.

If you are interested, you may wish to read these articles describing the main coding-potential prediction tools currently used in the lab:

1. Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, Gao G. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007 Jul;35(Web Server issue):W345-9. doi: 10.1093/nar/gkm391. PMID: 17631615; PMCID: PMC1933232.

2. Lin MF, Jungreis I, Kellis M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics. 2011 Jul 1;27(13):i275-82. doi: 10.1093/bioinformatics/btr209. PMID: 21685081; PMCID: PMC3117341.

3. Wang L, Park HJ, Dasari S, Wang S, Kocher JP, Li W. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013 Apr 1;41(6):e74. doi: 10.1093/nar/gkt006. Epub 2013 Jan 17. PMID: 23335781; PMCID: PMC3616698.

Special entry requirements

The project requires some experience in programming (e.g. C, R, Python), familiarity with mathematical modelling and, of course, a lot of curiosity.

Duration in months (min-max)

Master's research: 12-18

PhD sandwich: 2-12

Postdoc: 24-36

Contacts

Main scientific contact person

Dr Andrea Cerase

+39 3381552580


Other scientific contact person

Irene Perotti
