Tag Archive | bioinformatics

T-Coffee Reloaded

Cedric Notredame, a group leader at the CRG, tells us in his “Slow bioinformatics blog” the personal and interesting story behind the development of T-Coffee, a method for multiple sequence alignment that he developed during his PhD and which is now widely used.



“For those who have no clue what T-Coffee does, it is a multiple sequence aligner. It means that it takes a bunch of biological sequences – typically proteins – that have evolved from a common ancestor by accumulating mutations, insertions and deletions…”

If you want to know the real story behind T-Coffee’s success, read Notredame’s blog here!




A tutorial on Burrows-Wheeler indexing methods

Guillaume Filion’s latest post is aimed at those wanting to understand the details of how the Burrows–Wheeler transform (an algorithm used in data compression) works. It may be of particular interest to genomics researchers working on alignments since, Filion says, Burrows–Wheeler indexing is used to perform the seeding step of the DNA alignment problem, and it is exceptionally well adapted to indexing the human genome.
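As a small taste of what the post covers, here is a minimal sketch of the transform in Python. This is not Filion’s code, and it is a naive O(n² log n) construction — real genome indexers build the transform via suffix arrays — but it shows the core idea: sort all rotations of the text and keep the last column.

```python
def bwt(text, terminator="$"):
    """Naive Burrows-Wheeler transform: append a terminator,
    sort all rotations of the string, return the last column.
    Fine for short strings; genome-scale indexers use
    suffix-array construction instead."""
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last_column, terminator="$"):
    """Invert the transform with the classic textbook method:
    repeatedly prepend the last column and re-sort, until each
    row is a full rotation; the row ending in the terminator
    is the original string."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    return next(row for row in table if row.endswith(terminator))[:-1]
```

For example, `bwt("banana")` gives `"annb$aa"` — notice how the transform groups identical characters together, which is what makes it useful for both compression and indexing.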

For those of you who are not afraid of the small mathematical details, you can read this “The Grand Locus” post here.

Get it right, keep it clean, make a record

Keeping detailed records of your research and making the right decisions when analysing your data is easier said than done. Yet, despite its importance, researchers often receive no formal training in these and other issues key to scientific integrity.

The PRBB Good Scientific Practice Working Group – formed by members of all the centres at the park, including myself – ran a survey at the PRBB last year in which improper record keeping was the most relevant (mis)behaviour identified by scientists at the park, with over 40% of the 521 respondents saying they had “sometimes or often” noticed it. Several surveys (Martinson BC et al. Scientists behaving badly. Nature 2005; 435:737-8) from around the world show this is not unusual – so the group decided to tackle this seemingly general problem in its first action campaign since it was created at the end of 2014.


A series of activities were organized for the week starting on the 25th of January.

The BIG QUIZ was a series of questions about data recording and management that invited scientists to discuss amongst themselves in the restaurant or the lifts, and to record their opinions via the Good Scientific Practice website.



The questions were posted on Twitter as well as on posters around the building throughout the week. More than 285 people visited the website during that time, with between 70 and 120 replies to each question.




Rather than aiming to answer those questions – if anything, at opening new ones – three special workshops were held during the week. These were aimed at slightly different audiences, as a way of catering for the great variety of science that takes place at the PRBB and the different needs of each field.


“Keeping the data record straight in the lab” – aimed at people working in wet labs – had Lola Mulero from the CMRB explaining to the audience her centre’s system for keeping track of the more than 100 experiments they handle in parallel. This was followed by an open discussion on the dos and don’ts of a good lab notebook, and the seminar ended with a look to the future: a final talk on the CRG’s pilot experiment with electronic notebooks such as OneNote.

“In silico data tsunami: will you survive?” was the evocative title of the second workshop. It was led by Cedric Notredame from the CRG, who set the scene for the following discussions on reproducibility, traceability and sharing of computational data with a statement (“Science is about being able to measure something in a reproducible way”), a question (what to do with the growing amount of unused – but potentially useful for others – data we are producing?) and a reference to the recent “research parasites” controversy. Three short talks followed, on the importance of metadata, how to ensure your experiments are reproducible, and the specific challenges of creating software for clinical applications. At the end of the workshop, group discussions took place on several open questions, and ideas were put together with Ivo Gut, director of the CNAG, as the host.

The last workshop, “Managing data in human research”, gave some tips on how to create and maintain reliable and secure databases of human data, and tackled the issues of privacy, anonymisation and data protection before going on to the second, interactive part. This consisted of three case studies that made the audience think twice about the issues at hand when designing a study, or the huge problem they could face if their data manager left without warning – and, just the tip of the iceberg, the challenge of finding the final version of a document/analysis/experiment amongst files called “final”, “finalv2”, “superfinal”, “final_draft”, “final_MM”…

All three workshops were well attended, with over 60 people at each, and the feedback from the attendees was positive. You can see the presentations from all the seminars here.

The aim was achieved: to raise awareness of the intricacies and difficulties of proper record keeping and data management, and to discuss possible solutions with colleagues.

And the next challenge was set for the PRBB Good Scientific Practice working group. Watch this space for more upcoming activities!



Why do bioinformatics? By Guillaume Filion.

Guillaume Filion, a researcher at the CRG, reflects on why, despite all the problematic issues with bioinformatics (the variety of incompatible file formats and the difficulty in replicating results, amongst others), it still matters.

Read the full post in his blog “The Grand Locus”.


In-silico selection of targeted anti-cancer therapies

The Biomedical Genomics group led by Núria López-Bigas at Pompeu Fabra University has recently published a paper in Cancer Cell describing the landscape of anti-cancer targeted therapeutic opportunities across a cohort of patients with twenty-eight of the most prevalent cancers. They first looked for all the driver mutations (mutations that ’cause’ the cancer) in each individual cancer, then collected information on all the existing therapeutic agents that target those mutations, and finally, combining both datasets, came up with the anti-cancer targeted drugs that could potentially benefit each patient. You can read more about this paper on their blog post.
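The matching logic described above – drivers per patient, agents per target, joined to suggest candidate therapies – can be sketched with toy data. All the patient, gene and drug names below are made up for illustration; this is not the group’s actual code, data or method, just the shape of the intersection step.

```python
# Toy data: driver mutations found per patient, and the gene
# targets of known therapeutic agents (all names hypothetical).
patient_drivers = {
    "patient_1": {"GENE_A", "GENE_B"},
    "patient_2": {"GENE_C"},
}
agent_targets = {
    "drug_X": {"GENE_A"},
    "drug_Y": {"GENE_C", "GENE_D"},
}

def candidate_therapies(drivers, agents):
    """For each patient, return the agents whose targets overlap
    that patient's driver mutations."""
    return {
        patient: sorted(
            drug for drug, targets in agents.items() if targets & genes
        )
        for patient, genes in drivers.items()
    }
```

With this toy input, patient_1 would be matched to drug_X (it targets one of their drivers) and patient_2 to drug_Y.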

Coinciding with the publication of that paper, the lab has crafted a new IntOGen interface which presents the results of this analysis. You can see it and learn more about it here.

About Linux, Freedom and Science

Today we recover the post “Why Linux is awesome” by CRG researcher Guillaume Filion in his blog “The Grand Locus“. He explains his personal experience with this operating system, what he has learned by using Linux and why, in his own words, “it has made me a better scientist”.

Curious? Read the full post! We’ll tell you the take-home message: “Following my experience of using Linux, I believe that freedom and openness lead to knowledge and competence“.

Your very own cancer avatar

Fátima Al-Shahrour, from the CNIO in Madrid, came last week to the PRBB to give a talk entitled “Bioinformatics challenges for personalized medicine”. She explained what they do at her Translational Bioinformatics Unit in the Clinical Research Programme. And what they do is both exciting and promising.

They start with a biopsy of a tumour from a cancer patient who has relapsed after some initial treatment – they concentrate mostly on pancreatic cancer, but in principle it would work with any. From this sample, they derive cell lines, but also – and they are quite unique in this – they generate a personalised xenograft. That is, they implant the human tumour in an immunocompromised mouse, creating an ‘avatar’ of the patient. After passing the tumour from one mouse to another (they use about 60 mice per patient), they extract it and analyse it by exome sequencing (sometimes adding gene expression data, etc.). They then have about 8 weeks to find, using bioinformatics, druggable targets, which they then test on the avatar. The drugs that work on the mouse are then given to the patient.

The advantages of this system are many and obvious: not only can the in vivo model be used to validate the hypotheses generated by the genetic analysis, but we basically have a personalised cancer model for each patient in which we can try as many drugs as we want. It can be cryopreserved, so we have unlimited access to the sample. And since cancer is not yet a disease we can cure – patients must keep watching out for possible relapses, metastases or resistance to treatment – keeping the mouse in parallel with the patient can help predict how the patient will react: whether they will develop resistance to the drug, which other mutations might appear, and so on.

But there are several disadvantages, too. One is hinted at in Fátima’s talk title: the bioinformatic analysis of the tumours to find which mutations are important (drivers) in the disease and which of them have drugs that target them is challenging, not least because an individual cancer genome can carry hundreds to thousands of mutations.

Perhaps the biggest barrier is that, at the moment, making these avatars is inefficient, very expensive and slow. And since the patients who would benefit from this technology are already in a very bad clinical condition, many of them don’t live long enough to enjoy those benefits. But there are some successful cases, and Fátima mentioned a couple. In one, a man with pancreatic cancer who was treated with mitomycin after all the tests in his avatar survived more than 5 years, when he had been given 1 year at most.

So there is hope in the field of personalised medicine, even though it is still not standard practice and probably won’t be in the near future. And, as someone in the audience mentioned, in an ideal future we might even have personalised prevention, according to our genetic makeup. Wouldn’t that be great?

A report by Maruxa Martinez, Scientific Editor at the PRBB

Finding the genes underlying complex genetic diseases

Complex genetic disorders often involve multiple proteins interacting with each other, and pinpointing which of them are actually important for the disease is still challenging. Many computational approaches exploiting interaction network topology have been successfully applied to prioritize which individual genes may be involved in diseases, based on their proximity to known disease genes in the network.

In a paper published in PLoS One, Baldo Oliva, head of the Structural Bioinformatics group at the GRIB (UPF-IMIM), and Emre Guney have presented GUILD (Genes Underlying Inheritance Linked Disorders), a new genome-wide network-based prioritization framework. GUILD includes four novel algorithms that use protein-protein interaction data to predict gene-phenotype associations at a genome-wide scale, and the authors have shown that these algorithms match or outperform several known state-of-the-art approaches.

Alzheimer’s disease-associated top-scored proteins and their interactions

As a proof of principle, the authors have used GUILD to investigate top-ranking genes in Alzheimer’s disease (AD), diabetes and AIDS using disease-gene associations from various sources.

GUILD is freely available for download at http://sbi.imim.es/GUILD.php
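The general idea behind network-based prioritization – scoring candidate genes by their proximity to known disease genes in an interaction network – can be sketched in a few lines. This is only a toy illustration of the proximity principle, not GUILD’s four algorithms (which are considerably more sophisticated), and the gene names in the example are placeholders.

```python
from collections import deque

def proximity_scores(edges, seed_genes):
    """Toy network-based prioritization: score each gene as
    1 / (1 + d), where d is its shortest-path distance to the
    nearest known disease ('seed') gene in the interaction
    network. Uses a multi-source BFS from the seeds."""
    graph = {}
    for a, b in edges:  # build an undirected adjacency map
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    dist = {g: 0 for g in seed_genes if g in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return {gene: 1.0 / (1.0 + d) for gene, d in dist.items()}
```

A seed gene scores 1.0, a direct interactor 0.5, and scores decay with distance – candidate genes near known disease genes rise to the top of the ranking.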


Guney E, Oliva B. Exploiting Protein-Protein Interaction Networks for Genome-Wide Disease-Gene Prioritization. PLoS One. 2012;7(9):e43557

“Personalised medicine and Big Pharma need bioinformatics”


David Searls retired three years ago from his position as Senior Vice President of Bioinformatics at GlaxoSmithKline. Since then, this computer scientist, who spent 16 years in academia and 19 years in industry, has gone back to his theoretical studies on the linguistic analysis of biological sequences. He was invited to the PRBB and talked to us about drugs and computers.

This interview was published in Ellipse, the monthly magazine of the PRBB.

What part does bioinformatics have in drug development? 

It is an essential step along the way. Since the human genome project and the arrival of high-throughput technologies, not only drug discovery but all of biology has become an information science. It is very data-intensive, and you need computers to analyse that data.

How is the industry crisis affecting the pharmaceutical companies? 

The industry is indeed in great difficulty at the moment, as costs are increasing while the number of new drugs is going down. One way the large pharmaceutical companies are adapting is by starting to drop some of their therapeutic areas. Fundamentally, R&D is becoming smaller, due to the merging of companies and the reduction of costs. They are also depending more on in-licensing, i.e. buying drugs at different stages of development from smaller biotech companies or from universities. This way the ideas, the basic science and the early testing are done by smaller companies, while Big Pharma does only the last stage, the clinical trials, which is what it is best at. Basically, a more spread-out economic model is being created.

Can bioinformatics help? 

Yes, it can. One of the reasons why the cost of developing drugs is so high is that many of the molecules studied as potential drugs aren’t effective, or have undesired side effects. Better use of the information that predicts interactions between molecules can prevent early failure, since the side effects are usually due to interactions between the drug and proteins other than the target.

Another way bioinformatics can help is in drug repositioning, which is taking a drug that has been approved for one disease, and looking for other uses for it. Bioinformatics helps us find other protein interactions of a specific drug target, and predict which processes that target might be involved in, as well as potential effects. The advantage is that we already have data on the safety of the drug, which is one of the most costly procedures.

What will be the role of bioinformatics in personalised medicine? 

It is already helping to classify diseases via the analysis of transcriptomics, i.e. which genes are activated in each tissue. This allows us to find subtypes of an apparently homogeneous tumour that are susceptible to different drugs. We can then check the expression pattern of the patients to decide which treatment is best for them. Also, personalised medicine won’t be one drug for one individual, but a combination of drugs for each individual. Again, bioinformatics will help with the prediction of which combinations will be more useful.

“I’m working at what I’d always dreamed of” – Manuel Pastor, researcher on drug design

An interview published in Ellipse, the monthly magazine of the PRBB.


Manuel Pastor, 45 and from Madrid, studied pharmacy at the University of Alcalá de Henares (Madrid) and, after doing his PhD in the organic chemistry department, went to Perugia in Italy for his postdoc. A self-taught computer expert who is passionate about reading and cinema, Pastor fell in love with medicines when he was little. Years later, he has realised his dream as head of the research group for computer-aided drug design at the GRIB (IMIM-UPF).

When did you hear the call to science? 
I’ve been passionate about medicines since I was small. When I was 5 my brothers used to read Spiderman comics and I remember clearly that I was intrigued by the hero, a scientist, who created compounds that could reverse the effects of the bad guys’ poisons. I wondered how these compounds worked in the body. Straight away I started to say that I wanted to work in research. My friends just laughed and asked me if my parents were rich. “No? Then forget it!” I come from a humble family, but with a bit of luck and dedication I’ve turned out OK.

So you owe your vocation to Spiderman? 
I think it is a fascinating comic. Today there are so few cases where the hero is a student, an intellectual and committed person, with values.

How was your postdoc in Italy? 
It was 1994, when the Internet was just starting to become known and used. There was everything still to do. When I arrived in Italy I installed the first browser on the lab computers; they hadn’t even realised that they could access information and databases via the Internet!

Are you not a pharmacist then? 
Yes, but I learnt to program during my doctorate. In fact, toward the end of my thesis I created one of the first computer networks in Spain – and when I say created, I mean I physically joined the cables from one computer to another! After the postdoc in Italy I went back to Madrid for a year and a half, but it was a bad time, much worse than now, and there was no work anywhere. I went back to Perugia, where I was offered a job in a scientific software company (Multivariate Infometric Analysis). I worked there for three years as head scientist.

And finally you came back… 
It was 1999, and this time Ferran Sanz contacted me because they needed biostatistics teachers who also fitted in with the research areas of the IMIM. It was mostly the teaching that attracted me; I’ve always liked it. For me, giving classes is not secondary, it is important.

What differences have you found between the worlds of academia and industry? 
They have different objectives and methods. I think the important thing is to identify the good things in each and try to apply them. My group sits right at the interface between these two worlds.

And what are the best bits of each world? 
From the company world I like the clear objectives and practical results. I don’t have anything against basic research, but I want to do research that impacts the world I live in. In academia there is more creative freedom and more contact with young people, researchers just starting out. This is especially enriching because having to explain things lots of times means that in the end you understand them better yourself!
