Tag Archive | computational biology

See you soon, Computational Oncogenomics lab!

Núria López-Bigas started her lab on Computational Oncogenomics at the GRIB, within the PRBB, ten years ago. After a very successful decade, we are sad to see her leaving. We wish her all the best in her lab’s new adventure, and we hope the very fruitful interactions she has started with the different groups at the park will continue to prosper.

In her last post on her blog, Núria says thanks to the GRIB, the UPF, the PRBB community and the PRBB Intervals programme… We want to say, thanks to you Núria, for the great research you have done and for being such an open, collaborative and supportive person, both within the scientific community at the park and with outreach events for the general public! You will be missed. Good luck and see you soon!

PhD theses from the Computational Oncogenomics group at the PRBB during the last 10 years

In-silico selection of targeted anti-cancer therapies

The Biomedical Genomics group led by Núria López-Bigas at Pompeu Fabra University has recently published a paper in Cancer Cell describing the landscape of targeted anti-cancer therapeutic opportunities across a cohort of patients with twenty-eight of the most prevalent cancers. They first looked for all the driver mutations (the mutations that ‘cause’ the cancer) in each individual cancer, then collected information on all the existing therapeutic agents that target those mutations and finally, by combining both datasets, identified targeted anti-cancer drugs that could potentially benefit each patient. You can read more about this paper on their blog post.

Coinciding with the publication of that paper, the lab has crafted a new IntOGen interface which presents the results of this analysis. You can see it and learn more about it here.

Improving the prediction of cancer causing mutations

Cancer is generally caused by a combination of specific mutations, called drivers. But cancer cells contain many other mutations that are not the cause of the cancer, but rather a consequence (passenger mutations). In addition, high-throughput genome projects are identifying a huge number of somatic variants. Which ones are cancer-causing? How can we find the needle in the haystack?

A new computational method recently published in Genome Medicine by the research group led by Núria López-Bigas at the GRIB (UPF-IMIM) can help. Called transformed Functional Impact Score for Cancer (transFIC), it improves the assessment of the functional impact of tumor nonsynonymous single nucleotide variants (nsSNVs) by taking into account the baseline tolerance of genes to functional variants.

Other methods predicting the functional impact of cancer-causing somatic variants employ evolutionary information to assess the likely impact of an amino acid change on the structure or function of the altered protein. However, according to the authors, the ultimate effect of this amino acid change on the functioning of a cell depends on other factors as well, such as the particular role played by the altered protein in the cellular machinery. The more critical that role is, the less tolerant the protein will be to an amino acid change.

Their new method takes this feature into consideration, and has been shown to outperform previous ones. They distribute their new tool as a Perl script that users can download and run locally, and they have set up a web server which can be queried to obtain the transFIC of somatic cancer nsSNVs.
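The idea of a baseline-tolerance transformation can be illustrated with a short Python sketch. This is not the authors' Perl script. It assumes, for illustration only, that the transformation is a z-score of a variant's raw functional impact score against the scores of presumably tolerated variants in functionally similar genes. The function name, the two baseline groups and all the numbers are hypothetical.

```python
from statistics import mean, stdev


def transform_score(raw_score, baseline_scores):
    """Rescale a raw functional impact score against a baseline distribution.

    baseline_scores holds raw scores of presumably tolerated variants seen in
    genes of similar function (a hypothetical input for this sketch).
    """
    mu = mean(baseline_scores)
    sigma = stdev(baseline_scores)
    if sigma == 0:
        raise ValueError("baseline scores have zero spread")
    # Assumed transformation: distance from the baseline mean, in units of
    # the baseline standard deviation.
    return (raw_score - mu) / sigma


# Hypothetical baselines for two groups of genes.
tolerant_group = [2.0, 3.0, 4.0, 3.0, 3.0]    # often carry high-scoring variants
intolerant_group = [0.5, 1.0, 1.5, 1.0, 1.0]  # rarely carry them

same_raw_score = 3.0
print(transform_score(same_raw_score, tolerant_group))
print(transform_score(same_raw_score, intolerant_group))
```

With these made-up numbers the same raw score is unremarkable in the tolerant group but stands far above the baseline in the intolerant group, which is the intuition behind taking the role of the protein into account.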


Gonzalez-Perez A, Deu-Pons J, Lopez-Bigas N. Improving the prediction of the functional impact of cancer mutations by baseline tolerance transformation. Genome Med. 2012 Nov 26;4(11):89

Finding the genes underlying complex genetic diseases

Complex genetic disorders often involve multiple proteins interacting with each other, and pinpointing which of them are actually important for the disease is still challenging. Many computational approaches exploiting interaction network topology have been successfully applied to prioritize which individual genes may be involved in diseases, based on their proximity to known disease genes in the network.

In a paper published in PLoS One, Baldo Oliva, head of the Structural bioinformatics group at the GRIB (UPF-IMIM), and Emre Guney have presented GUILD (Genes Underlying Inheritance Linked Disorders), a new genome-wide network-based prioritization framework. GUILD includes four novel algorithms that use protein-protein interaction data to predict gene-phenotype associations at genome-wide scale, and the authors have shown that they are comparable to, or outperform, several state-of-the-art approaches of the same kind.
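To give a feel for what prioritizing genes by their proximity to known disease genes means, here is a minimal Python sketch. It is not one of the four GUILD algorithms: it simply ranks candidate genes by their shortest-path distance to the nearest known disease gene in an interaction network. The function name, the gene names and the toy network are all hypothetical.

```python
from collections import deque


def rank_by_proximity(interactions, seeds):
    """Rank non-seed genes by distance to the nearest known disease gene."""
    # Build an undirected adjacency map from the list of interacting pairs.
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Multi-source breadth-first search: every seed starts at distance 0.
    distance = {s: 0 for s in seeds if s in neighbours}
    queue = deque(distance)
    while queue:
        gene = queue.popleft()
        for other in neighbours[gene]:
            if other not in distance:
                distance[other] = distance[gene] + 1
                queue.append(other)

    # Closest candidates first; ties are broken alphabetically.
    # Genes that cannot be reached from any seed are left out.
    candidates = [g for g in distance if g not in seeds]
    return sorted(candidates, key=lambda g: (distance[g], g))


# Toy network with made-up gene names; S1 and S2 play the known disease genes.
toy_network = [("S1", "A"), ("A", "B"), ("B", "C"), ("S2", "C"), ("C", "D")]
print(rank_by_proximity(toy_network, {"S1", "S2"}))
```

Real prioritization methods go beyond plain distance, for example by weighting interactions or combining several paths, but the ranking step follows the same principle.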

Alzheimer’s disease-associated top-scored proteins and their interactions

As a proof of principle, the authors have used GUILD to investigate top-ranking genes in Alzheimer’s disease (AD), diabetes and AIDS using disease-gene associations from various sources.

GUILD is freely available for download at http://sbi.imim.es/GUILD.php


Guney E, Oliva B. Exploiting Protein-Protein Interaction Networks for Genome-Wide Disease-Gene Prioritization. PLoS One. 2012;7(9):e43557

“I’m working at what I’d always dreamed of” – Manuel Pastor, researcher on drug design

An interview published in Ellipse, the monthly magazine of the PRBB.


Manuel Pastor, 45 and from Madrid, studied pharmacy at the University of Alcalá de Henares (Madrid), and after doing his PhD in the organic chemistry department went to Perugia in Italy for his postdoc. A self-taught computer expert who is passionate about reading and the cinema, Pastor fell in love with medicines when he was little. Years later he has realised his dream as head of the research group for computer-aided drug design at the GRIB (IMIM-UPF).

When did you hear the call to science? 
I’ve been passionate about medicines since I was small. When I was 5 my brothers used to read Spiderman comics and I remember clearly that I was intrigued by the hero, a scientist, who created compounds that could reverse the effects of the bad guys’ poisons. I wondered how these compounds worked in the body. Straight away I started to say that I wanted to work in research. My friends just laughed and asked me if my parents were rich. “No? Then forget it!” I come from a humble family, but with a bit of luck and dedication I’ve turned out OK.

So you owe your vocation to Spiderman? 
I think it is a fascinating comic. Today there are so few cases where the hero is a student, an intellectual and committed person, with values.

How was your postdoc in Italy? 
It was 1994, when the Internet was just starting to become known and used. Everything was still to be done. When I arrived in Italy I installed the first browser on the lab computers; they hadn’t even realised that they could access information and databases via the Internet!

Are you not a pharmacist then? 
Yes, but I learnt to program during my doctorate. In fact, toward the end of my thesis I created one of the first computer networks in Spain – and when I say created I mean I physically joined the cables from one computer to another! After the postdoc in Italy I went back to Madrid for a year and a half, but it was a bad time, much worse than now, and there was no work anywhere. I went back to Perugia where I was offered a job in a scientific software company (Multivariate Infometric Analysis). I worked there for three years as head scientist.

And finally you came back… 
It was 1999 and this time Ferran Sanz contacted me because they needed biostatistics teachers who also fitted in with the research areas of the IMIM. It was mostly the teaching that attracted me; I’ve always liked it. For me giving classes is not secondary, it is important.

What differences have you found between the worlds of academia and industry? 
They have different objectives and methods. I think the important thing is to find out what is good about each and try to apply it. My group is right at the interface between these two worlds.

And what are the best bits of each world? 
From the company world I like the clear objectives and practical results. I don’t have anything against basic research, but I want to do research that impacts the world I live in. In academia there is more creative freedom and more contact with young people, researchers just starting out. This is especially enriching because having to explain things lots of times means that in the end you understand them better yourself!

The next international conference on computational molecular biology, in April in Barcelona

RECOMB 2012, the 16th Annual International Conference on Research in Computational Molecular Biology, will take place in Barcelona on April 21-24, 2012. It is being organised by Roderic Guigó, from the CRG. Check out this video where he presents the meeting.

The meeting will focus on the computational challenges arising from the extraordinary developments in high throughput technologies. You can check the updates on the speakers and the program on the conference website.

As the organisers point out, the meeting overlaps with the feast day of Sant Jordi (Saint George), the patron saint of Catalonia, on April 23, one of the most important civic holidays in the country. The city fills with the smell of red roses, and books are sold on every corner.

So, make sure you don’t miss this opportunity to enjoy the best mix of science and culture. Register now for RECOMB 2012!

Protein coding genes exhibit low splicing variability within populations


Despite all having the same DNA content, each cell is different. The phenotypic differences observed between cells depend on the differences in the RNA transcript content of the cell. And this variability of transcript abundance is the result of gene expression variability, which has been studied for many years and is usually measured using DNA arrays, but also of alternative splicing variability. Indeed, changes in splicing ratios, even without changes in overall gene expression, can have important phenotypic effects. However, little is known about the variability of alternative splicing amongst individuals and populations.
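The point that splicing ratios can change while overall gene expression stays the same can be made concrete with a small Python sketch. It assumes that a splicing ratio is the share of a gene's total transcript abundance contributed by each isoform. The function name, the isoform names and the abundance values are hypothetical.

```python
def splicing_ratios(isoform_abundances):
    """Return each isoform's share of the gene's total transcript abundance."""
    total = sum(isoform_abundances.values())
    if total == 0:
        raise ValueError("gene is not expressed, ratios are undefined")
    return {name: value / total for name, value in isoform_abundances.items()}


# Made-up abundances of two isoforms of one gene in two individuals.
individual_a = {"isoform_1": 80.0, "isoform_2": 20.0}
individual_b = {"isoform_1": 50.0, "isoform_2": 50.0}

# Overall gene expression is identical in both individuals...
print(sum(individual_a.values()), sum(individual_b.values()))
# ...but the splicing ratios differ.
print(splicing_ratios(individual_a))
print(splicing_ratios(individual_b))
```

A method that only looks at total gene expression would see no difference between these two individuals, which is why splicing variability has to be measured separately.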

Taking advantage of the popular use of RNA-seq (or “Whole Transcriptome Shotgun Sequencing”), a technique that sequences cDNA in order to get information about a sample’s RNA content, a team of researchers at the CRG have recently published in Genome Research a statistical methodology to measure variability in splicing ratios between different conditions. They have applied this methodology to estimates of transcript abundances obtained from RNA-seq experiments in lymphoblastoid cells from Caucasian and Yoruban (Nigerian) individuals.

Their results show that protein coding genes exhibit low splicing variability within populations, with many genes exhibiting constant ratios across individuals. Genes involved in the regulation of splicing showed lower expression variability than the average, while transcripts with RNA binding functions, such as long non coding RNAs, showed higher expression variability. The authors also found that up to 10% of the studied protein coding genes exhibit population-specific splicing ratios and that variability in splicing is uncommon without variability in transcription.

Although they acknowledge the limitations of their work (e.g. RNA-seq is still very new and not completely understood, and the data on which they base their analysis come from the first and only human RNA-seq studies published so far), the authors conclude that “given the low variability in the expression of protein coding genes, phenotypic differences between individuals in human populations are unlikely to be due to the turning on and off of entire sets of genes, nor to dramatic changes in their expression levels, but rather to modulated changes in transcript abundances”.

The researchers, led by Roderic Guigó, present in the same paper a new methodology to estimate the relative contribution of gene expression and splicing variability to the overall transcript variability. They estimated that about 60% of the total variability observed in the abundance of transcript isoforms can be explained by variability in transcription, and that a large fraction of the remaining variability is likely to result from variability in splicing.

Guigó, last author of this paper, has recently received an ERC Advanced Grant, the most prestigious grant given to scientific projects in Europe, in the category of Physical Sciences and Engineering. The €2 million awarded over five years will allow his team to study RNA using massively parallel sequencing techniques.

Gonzalez-Porta M, Calvo M, Sammeth M, Guigo R. Estimation of alternative splicing variability in human populations. Genome Res. 2011 Nov 23.

XIth Spanish Symposium on Bioinformatics 2012

If you are interested in Computational Biology, don’t miss this upcoming conference at the PRBB! It will take place on January 23-25, 2012, and the deadline for registration is November 30, 2011.

Follow it on Twitter!


CRG Symposium “Computational Biology of Molecular sequences”, part 3

On the second day of the conference, some more interesting talks at the “Computational Biology of Molecular sequences” X CRG Symposium taking place at the PRBB Conference Hall. I will focus on one talk of each of the sessions (genome regulation, RNA analysis and genome annotation), although all were very interesting!

Ron Shamir (Tel Aviv University) presented Amadeus, a software platform for genome-scale detection of known and novel motifs in DNA sequences, and explained some of the findings they have made with it. He also presented his new book “Bioinformatics for biologists”, which will surely be very useful for many biologists drowning in today’s sea of data and tools for analysing it.

Anna Tramontano (Sapienza University), the only woman among the 20 invited speakers and a very well-known figure in the protein world, gave her first RNA talk ever, as she put it. She talked about a new mechanism for controlling gene expression: a long ncRNA which contains 2 miRNAs within its sequence, and which competes with those miRNAs for binding to their target genes.

Tim Hubbard (Sanger Institute) gave so much information in 45 minutes that it was hard to keep track of it all. He started with the catch-22 of reference genomes: we want them to be complete, but we don’t want them to change. The proposed solution is to keep the reference genome and to release patches with ‘novel’ information or with corrections (the ‘fix’ patches) whenever more information becomes available. This means that alignment algorithms will need to be aware of patches, he warned the audience.

He then moved on to the cost of sequencing a human genome (5000 pounds as of October 2011) and said that the cost drops tenfold every 2-4 years! With these ever-falling costs, he said, there has been quite a lot of movement in the UK regarding future policies on genomic medicine. The main question is: what is the health-economic value of having all this information, of sequencing the whole population? Nobody knows yet, but according to Hubbard, one day the cost of sequencing will fall far enough, and the usefulness of the information will grow enough, for the two to meet and make it viable.
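As a back-of-the-envelope illustration of what a tenfold drop every 2-4 years implies, the Python sketch below projects the quoted 5000 pounds forward. It assumes a smooth exponential decline, which is a simplification added here and not something claimed in the talk. The function name and the four-year horizon are hypothetical.

```python
def projected_cost(start_cost, years, tenfold_period):
    """Cost after `years`, assuming it falls tenfold every `tenfold_period` years."""
    return start_cost * 10 ** (-years / tenfold_period)


start = 5000.0  # pounds, the October 2011 figure quoted in the talk

# Four years later, under the fast (2-year) and slow (4-year) scenarios.
fast = projected_cost(start, years=4, tenfold_period=2)
slow = projected_cost(start, years=4, tenfold_period=4)
print(round(fast, 2), round(slow, 2))
```

Under these assumptions the cost four years on would be somewhere between about 50 and 500 pounds, which shows how wide the quoted range really is.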

Finally, he presented the ITFoM (IT Future of Medicine) project, one of the six funded by the Future and Emerging Technologies (FET) flagship programme of the EU, which has the goal of “encouraging visionary, ‘mission-oriented’ research with the potential to deliver breakthroughs in information technology with major benefits for European society and industry”. The ITFoM project is expected to run for at least 10 years and will receive funding of up to €100 million per year. Considering what they aim to do (integrating all available biological data to construct computational models of the biological processes that occur in every individual human), they will certainly need that much money. Just consider this fact: to cover all the ‘cancer genomes’ appearing every day, we would need to sequence a new genome every 2 seconds!

So, that was my own pick of the day. Of course, much more happened at the meeting. You can find summaries of all talks and much more at the Symposium’s website http://2011symposium.crg.es/

And if you are interested in Computational Biology, don’t miss two upcoming events also at the PRBB:

Report by Maruxa Martinez, Scientific Editor at the PRBB

CRG Symposium “Computational Biology of Molecular sequences”, part 2

In the early afternoon, genome evolution was still the focus of the talks at the X CRG Annual Symposium. Why do we care about reconstructing ancestral genomes? Apart from the fact that it’s difficult and fun (reasons enough for most computer scientists), according to Mathieu Blanchette (McGill University) it can help us to study the mechanisms of genome evolution and to identify functional regions, such as transcription factor binding sites (TFBS). Blanchette also showed how to turn multiple sequence realignment into a game with Phylo http://phylo.cs.mcgill.ca/

Moving on to genome regulation, Philipp Bucher (Swiss Institute of Bioinformatics) explained the workings of ChIP-seq technology and talked about computational promoter analysis in the era of ultra-high-throughput sequencing. And to finish off the quite long day, Alfonso Valencia (CNIO) stood in for Gene Myers (Janelia Farm), who regretfully had to cancel at the last minute. Valencia talked about open questions in the protein universe, such as: what proteins are in a cell? What protein complexes exist? How did protein families originate? And are we using the right tools and approaches to analyze pathways? Some may think that we know the answers to some of them, but Valencia showed otherwise…

More tomorrow! You can also check the rest of the program here.


Report by Maruxa Martinez, Scientific Editor at the PRBB
