Biological Data: Analysis, Visualisation and Prediction by Geoff Barton – Professor of Bioinformatics, College of Life Sciences, University of Dundee

Abstract: Modern biological research hinges on technologies that can generate very large and complex datasets. For example, recent advances in DNA sequencing technologies have led to global collections in the multi-petabyte range that are doubling every five months. These data must be organised in a form that allows interpretation by a very large and diverse user community, whose interests range from human health and disease, through crop and animal breeding, to the understanding of ecosystems. In this talk I will first give an overview of core molecular biology concepts and some of the different types of data that are currently collected. I will then focus on work from my group on the visualisation and analysis of sequence alignment data, before turning to examples of the prediction of properties and features from biological data.

Biography: Prof. Geoff Barton, Professor of Bioinformatics, College of Life Sciences, University of Dundee, UK

Geoff Barton did his first degree in Biochemistry at the University of Manchester. He then carried out Ph.D. research supervised by Mike Sternberg in the Department of Crystallography, Birkbeck College, University of London, before spending two years as an ICRF Fellow working with Chris Rawlings at the Imperial Cancer Research Fund Laboratories in London. In 1989 he was awarded a Royal Society University Research Fellowship to set up his own group in the Laboratory of Molecular Biophysics, University of Oxford. From April 1995 until October 1997, Geoff was also Head of Genome Informatics at the Wellcome Trust Centre for Human Genetics, University of Oxford. From 1 October 1997 to July 2001, Geoff was a Research and Development Team Leader at the EMBL European Bioinformatics Institute (EBI), Cambridge. From 1 January 1998 to July 2001, he was also head of the European Macromolecular Structure Database at the EBI, which is now known as the Protein Data Bank in Europe (PDBe).
Geoff has been Professor of Bioinformatics at the University of Dundee, College of Life Sciences since 2001 and is co-director of the Post-Genomics Centre and Head of the Data Analysis Group. He has more than 20 years’ experience in bioinformatics research and has published 103 refereed papers which have attracted over 8,400 citations. His research centres on developing computational techniques for biological sequence and structure analysis as well as applying those techniques to systems of interest in collaboration with experimentalists. Since 2008 he has expanded his research to the analysis of novel small RNAs [1-4] and the application of deep sequencing techniques to a number of problems [5-8]. Recently, he has established a strong BBSRC-funded collaboration with Dr Gordon Simpson to perform genome-wide studies by deep sequencing of alternative polyadenylation and non-coding RNAs in the plant Arabidopsis thaliana.

The majority of Geoff’s tools are distributed as downloadable program packages, web-accessible systems or databases, and are widely used by the community. For example, the ALSCRIPT program [9] for visualising multiple alignments as PostScript graphics has been cited 1,028 times, while the Jalview multiple alignment visualisation, editing and analysis workbench is installed on more than 20,000 computers worldwide and is mentioned on over 100,000 web pages. The two papers describing Jalview have attracted over 1,100 citations, with the most recent [10] identified by ISI as a “hot paper”. Jalview is used by many websites, including major databases such as Pfam/Rfam and the EBI services. Geoff’s group have contributed a number of techniques for protein structure prediction, of which the JPred server [11, 12] is the best known, running up to 95,000 predictions monthly for scientists in 140 countries. His group also developed the TarO sequence analysis pipeline [13], which is aimed at target selection and optimisation for structural proteomics. TarO takes a sequence, runs a comprehensive range of analysis and database search programs, and then combines the results into a set of ranked tables and an annotated alignment that can be viewed in Jalview. TarO ranks results by applying techniques developed by his group for predicting a protein’s crystallisability [14-16]. Recently, Geoff’s group has developed novel techniques for the prediction of protein-protein interactions [17, 18] and the first predictor of protein nucleolar localisation signals [19], which was a “feature” paper in the journal Nucleic Acids Research.

Geoff is a Fellow of the Society of Biology and an honorary Fellow of the James Hutton Institute.

References

1. Ono, M., M.S. Scott, K. Yamada, F. Avolio, G.J. Barton, and A.I. Lamond, Identification of human miRNA precursors that resemble box C/D snoRNAs. Nucleic Acids Res, 2011. 39(9): p. 3879-91.
2. Ono, M., K. Yamada, F. Avolio, M.S. Scott, S. van Koningsbruggen, G.J. Barton, and A.I. Lamond, Analysis of human small nucleolar RNAs (snoRNA) and the development of snoRNA modulator of gene expression vectors. Mol Biol Cell, 2010. 21(9): p. 1569-84.
3. Scott, M.S., F. Avolio, M. Ono, A.I. Lamond, and G.J. Barton, Human miRNA precursors with box H/ACA snoRNA features. PLoS Comput Biol, 2009. 5(9): p. e1000507.
4. Cole, C., A. Sobala, C. Lu, S.R. Thatcher, A. Bowman, J.W. Brown, P.J. Green, G.J. Barton, and G. Hutvagner, Filtering of deep sequencing data reveals the existence of abundant Dicer-dependent small RNAs derived from tRNAs. RNA, 2009. 15(12): p. 2147-60.
5. Gkikopoulos, T., V. Singh, K. Tsui, S. Awad, M.J. Renshaw, P. Scholfield, G.J. Barton, C. Nislow, T.U. Tanaka, and T. Owen-Hughes, The SWI/SNF complex acts to constrain distribution of the centromeric histone variant Cse4. EMBO J, 2011. 30(10): p. 1919-27.
6. Gkikopoulos, T., P. Schofield, V. Singh, M. Pinskaya, J. Mellor, M. Smolle, J.L. Workman, G.J. Barton, and T. Owen-Hughes, A Role for Snf2-Related Nucleosome-Spacing Enzymes in Genome-Wide Nucleosome Organization. Science, 2011. 333(6050): p. 1758-1760.
7. van Koningsbruggen, S., M. Gierlinski, P. Schofield, D. Martin, G.J. Barton, Y. Ariyurek, J.T. den Dunnen, and A.I. Lamond, High-resolution whole-genome sequencing reveals that specific chromatin domains from most human chromosomes associate with nucleoli. Mol Biol Cell, 2010. 21(21): p. 3735-48.
8. Remenyi, J., C.J. Hunter, C. Cole, H. Ando, S. Impey, C.E. Monk, K.J. Martin, G.J. Barton, G. Hutvagner, and J.S. Arthur, Regulation of the miR-212/132 locus by MSK1 and CREB in response to neurotrophins. Biochem J, 2010. 428(2): p. 281-91.
9. Barton, G.J., ALSCRIPT: a tool to format multiple sequence alignments. Protein Eng, 1993. 6(1): p. 37-40.
10. Waterhouse, A.M., J.B. Procter, D.M. Martin, M. Clamp, and G.J. Barton, Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics, 2009. 25(9): p. 1189-91.
11. Cole, C., J.D. Barber, and G.J. Barton, The Jpred 3 secondary structure prediction server. Nucleic Acids Res, 2008. 36(Web Server issue): p. W197-201.
12. Cuff, J.A. and G.J. Barton, Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins, 2000. 40(3): p. 502-11.
13. Overton, I.M., C.A. van Niekerk, L.G. Carter, A. Dawson, D.M. Martin, S. Cameron, S.A. McMahon, M.F. White, W.N. Hunter, J.H. Naismith, and G.J. Barton, TarO: a target optimisation system for structural biology. Nucleic Acids Res, 2008. 36(Web Server issue): p. W190-6.
14. Overton, I.M., C.A. van Niekerk, and G.J. Barton, XANNpred: neural nets that predict the propensity of a protein to yield diffraction-quality crystals. Proteins, 2011. 79(4): p. 1027-33.
15. Overton, I.M., G. Padovani, M.A. Girolami, and G.J. Barton, ParCrys: a Parzen window density estimation approach to protein crystallization propensity prediction. Bioinformatics, 2008. 24(7): p. 901-7.
16. Overton, I.M. and G.J. Barton, A normalised scale for structural genomics target ranking: the OB-Score. FEBS Lett, 2006. 580(16): p. 4005-9.
17. McDowall, M.D., M.S. Scott, and G.J. Barton, PIPs: human protein-protein interaction prediction database. Nucleic Acids Res, 2009. 37(Database issue): p. D651-6.
18. Scott, M.S. and G.J. Barton, Probabilistic prediction and ranking of human protein-protein interactions. BMC Bioinformatics, 2007. 8: p. 239.
19. Scott, M.S., F.M. Boisvert, M.D. McDowall, A.I. Lamond, and G.J. Barton, Characterization and prediction of protein nucleolar localization sequences. Nucleic Acids Res, 2010. 38(21): p. 7388-99.

Event details

  • When: 14th November 2011 14:00 - 15:00
  • Where: Phys Theatre C
  • Series: CS Colloquia Series
  • Format: Colloquium