Marc A. Marti-Renom is interested in three-dimensional structures. After eight years in the US dedicated to the world of proteins, the biophysicist returned to his native country, first Valencia and then Barcelona, to specialise in RNA and DNA folding. In 2006 he set up his own group, which today is divided between the CNAG, where there are ten people, and the CRG, where there are two. “We do the experimental part, the sample preparation, here in the CRG, and the sequencing and analysis happens in the CNAG”, he explains. For his research he requires a large sequencing and computing capacity, which he can get at the CNAG, the second-most important sequencing analysis centre in Europe. “We are fortunate to be in one of the best places in the world to do these studies,” he says proudly.
Proteins with clinical application
Proteins caught his attention while he was doing his PhD, and in 2004, when he was at the University of California (UCSF), he collaborated in the creation of the “Tropical Disease Initiative,” a drug-discovery initiative linking people from both academia and companies to try to reposition drugs for neglected diseases such as malaria and tuberculosis. “The idea was to make it all open source so everything we found was published directly to the web and couldn’t be patented”, says Marti-Renom.
The Structural Genomics group was a major player in one of the first cases in which genome sequencing was used at the clinical level. “There was a patient with tuberculosis and a high resistance to antibiotics. We sequenced samples from the patient and found out he was infected by two different strains, and one of them was mutated. When we made models of the protein structure resulting from this mutation we saw how it was affecting the function”, explains the scientist. According to Marti-Renom, in a few years not only will everyone have their genome sequenced, but it will happen several times. “When someone develops a disease like cancer we will sequence them again to see what has changed and why”, he predicts.
Beyond proteins: RNA and DNA
Proteins, the cell’s building blocks, are not the be-all and end-all of life. Since the 1960s we have known that RNA has essential functions other than converting the information in DNA into proteins. But of its three-dimensional structure very little is known, and in the end, the function occurs in 3D. For this reason the group is developing computational tools to incorporate experimental data and make structural predictions.
The most recent biological component to enter the ‘3D world’ was the genome. In this case, too, little is known about how it folds in space. The group of Marti-Renom, along with three other groups at the CRG (Miguel Beato, Guillaume Filion and Thomas Graf), is carrying out the 4DGenome project, which has a budget of 12.2 million euros, in order to understand the structure of the genome and how it changes over time. “We know the genome sequence very well, thanks to molecular biology and the big genome projects. We also understand the chromosomal macrostructure, thanks to advances in microscopy; but we can’t see the middle ground, the step between the tangled skein and the well-defined chromosome”, says the head of the group. In 2006 they began using Chromosome Conformation Capture (3C) data to develop software that allows you to view the entire genome at high resolution, a kind of ‘molecular microscope’. With this and other technologies, like Hi-C, and using computational algorithms, they have been able to observe how different regions of the same chromosome tend to interact with each other. They have also seen that the 3D ‘photo’ of a moment when, for example, there is high gene expression may be very different from one where expression is low. “Without this three-dimensional information it is much more difficult to characterise how the genome works”, concludes the researcher.
According to Gene Myers, (near) perfect genome assembly is within reach for any organism of your choice.
Time will tell if he’s right, but as an influential bioinformatician who has made key contributions to sequence comparison tools such as BLAST, whole-genome shotgun sequencing and genome assembly, one would think he knows what he’s talking about!
In a talk at the PRBB auditorium today, he explained to a mixed audience of biologists and computer scientists how, after a few years dedicated to other issues (mostly image analysis), he was now coming back to sequencing with great excitement. The reason: PacBio RSII. This sequencing device is able to produce very long reads (of more than 10,000bp!) and has a couple of other characteristics that can potentially make full assembly possible: although error rates are high (10-15%) they are random, unlike other techniques, which tend to make the same errors every time. Sampling is also random. This randomness and the length of the reads mean that, with enough sequencing coverage, you can always get the right sequences.
So now all we need, Myers says – apart from waiting for the cost of the PacBio to go down, which he promised will happen soon (4x in one year) – is to build an efficient assembler. He talked about what he and some colleagues have been doing in that direction. The main element is a ‘scrubber’ to clean and edit the reads while removing as little data as possible. His point was that, even though people have been focusing on the assembly, the real problem is the data: contaminants, chimeras, excessive error rates and so on. So he presented his personal ‘data cleaner’, DAscrub, soon to be released.
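The argument that random errors wash out with coverage can be illustrated with a small calculation. The sketch below is only a toy model, not Myers's assembler: it assumes substitution-only errors that are independent between reads and a simple majority vote at each position, whereas real long-read errors also include insertions and deletions. The function name is made up for this illustration.

```python
from math import comb


def majority_error_probability(coverage: int, error_rate: float) -> float:
    """Probability that at least half of the reads covering one position are wrong.

    Toy assumption: each read is wrong at this position independently with
    probability `error_rate`. If fewer than half of the reads are wrong, a
    majority vote recovers the true base, so this value is an upper bound on
    the per-position consensus error under that assumption.
    """
    threshold = (coverage + 1) // 2  # smallest count that is at least half
    return sum(
        comb(coverage, k) * error_rate**k * (1 - error_rate) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


if __name__ == "__main__":
    # 15% is the upper end of the error range quoted in the talk.
    for coverage in (1, 11, 31, 51):
        p = majority_error_probability(coverage, 0.15)
        print(f"coverage {coverage:>2}: P(majority wrong) <= {p:.3g}")
```

Under these assumptions the bound shrinks quickly as coverage grows, which is the intuition behind the claim; systematic errors would not average out in this way.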
You can read more details about his recent work on this on his blog.
In the meantime, his advice to the world – stop the 10,000 genomes project right away and wait a couple of years to have better sequences!
Wouter de Laat was one of the developers of 4C, a technique widely used to detect DNA interactions between different regions within or between chromosomes. He came from the Hubrecht Institute in Utrecht, The Netherlands, to give a talk at the PRBB today, invited by Guillaume Filion, from the CRG.
The room was packed, with more than 70 researchers ready to learn about how much function is actually within the genome structure. We learned about ‘gene kissing’ – or how genes that are functionally related but far apart on a chromosome come close together during transcription. Interestingly, when de Laat and colleagues inhibited transcription, these interactions (kisses) did not change. The same happened when transcription was increased; and even when they forced mono-allelic expression (silencing just one of the two alleles for a specific gene) and checked by allele-specific 4C, they saw that the contacts with the rest of the chromosome still had not changed.
He used a good metaphor to explain how this 3D localisation in the nucleus takes place: each gene in a chromosome is like a “dog-on-a-leash” – the gene goes wherever the chromosome goes (in space), as the dog does with its owner, although once in that location, a gene is ‘free’ to interact with whichever partners it wants – to choose which tree it wants to pee on, so to speak. However, there are some genes (mostly highly repetitive regions, such as rRNA genes or centromeres) which are able to decide their preferred location and actually bring the rest of the chromosome along: these would be the Pit Bulls amongst the genes.
He also talked about other work in his lab, mostly comparing the 3D spatial organization of differentiated cells versus embryonic cells (both ES and iPS cells), and showed that differentiated cells are more spatially defined than pluripotent cells.
De Laat also talked about other uses of 4C; at the end of his talk he mentioned how he is taking this technique further and using it in diagnostics. Indeed, he has co-founded a company called Cergentis that uses 4C to identify rearranged DNA regions.
A report by Maruxa Martinez, Scientific Editor at the PRBB
Cancer is generally caused by a combination of many specific mutations, called drivers. But cancer cells contain many other mutations that are not the cause of the cancer, but rather a consequence (passenger mutations). Also, high-throughput genome projects are identifying a huge number of somatic variants. Which ones are cancer-causing? How to distinguish the needle in the haystack?
A new computational method recently published in Genome Medicine by the research group led by Núria López-Bigas at the GRIB (UPF-IMIM) can help. Called transformed Functional Impact Score for Cancer (transFIC), it improves the assessment of the functional impact of tumor nonsynonymous single nucleotide variants (nsSNVs) by taking into account the baseline tolerance of genes to functional variants.
Other methods predicting the functional impact of cancer-causing somatic variants employ evolutionary information to assess the likely impact of an amino acid change on the structure or function of the altered protein. However, according to the authors, the ultimate effect of this amino acid change on the functioning of a cell depends on other factors as well, such as the particular role played by the altered protein in the cellular machinery. The more critical that role is, the less tolerant the protein will be to an amino acid change.
Their new method takes this feature into consideration, and has been shown to outperform previous ones. They distribute their new tool as a Perl script that users can download and use locally, and they have set up a web server which can be queried to obtain the transFIC of somatic cancer nsSNVs.
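To give a feel for what "taking baseline tolerance into account" could look like, here is a minimal sketch, not the published Perl script. It assumes that the raw functional impact score of a variant is standardised against the scores of variants normally tolerated in comparable genes; the function name and all the numbers are invented for the illustration.

```python
from statistics import mean, pstdev


def baseline_transformed_score(raw_score: float, baseline_scores: list[float]) -> float:
    """Standardise a raw impact score against a gene group's baseline.

    `baseline_scores` stands for the impact scores of variants that are
    tolerated in genes of a similar kind (hypothetical data here). The result
    says how far the raw score lies above that baseline, in standard deviations.
    """
    spread = pstdev(baseline_scores)
    if spread == 0:
        raise ValueError("baseline scores must not all be identical")
    return (raw_score - mean(baseline_scores)) / spread


# Invented baselines: a tolerant gene group routinely carries high-scoring
# variants, while an intolerant (critical) group only carries low-scoring ones.
tolerant_baseline = [1.0, 2.0, 3.0]
intolerant_baseline = [0.0, 0.5, 1.0]

same_raw_score = 2.0
in_tolerant_gene = baseline_transformed_score(same_raw_score, tolerant_baseline)
in_intolerant_gene = baseline_transformed_score(same_raw_score, intolerant_baseline)
print(in_tolerant_gene, in_intolerant_gene)
```

In this toy setting the same raw score is unremarkable in the tolerant group but stands out in the intolerant one, which matches the idea that a change matters more in a protein with a critical role.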
Gonzalez-Perez A, Deu-Pons J, Lopez-Bigas N. Improving the prediction of the functional impact of cancer mutations by baseline tolerance transformation. Genome Med. 2012 Nov 26;4(11):89
An interview published in Ellipse, the monthly magazine of the PRBB.
Mar Albà is a biologist who has moved from the lab to the computer and the analysis of the genome. After five years in England, she joined the UPF with a Ramón y Cajal contract, and she has been an ICREA Research Professor since 2005. Currently she coordinates the Evolutionary Genomics group at the GRIB (IMIM/UPF) and the subject ‘Principles of Genome Bioinformatics’ in the Master’s in Bioinformatics at the UPF. A few months ago she added motherhood to those tasks.
What memories do you have from your PhD?
It was a good experience, but I did see that I was not made for the laboratory but for more theoretical research.
How did you decide to do bioinformatics?
It was somewhat by chance. When I arrived at University College London in 1997, I didn’t know where to direct my career. I joined a Master’s degree in bioinformatics and molecular modelling, and it was decisive.
What fascinates you most about your research?
Trying to figure out how organisms evolve using the traces present in the DNA sequence. Understanding how our genes have originated and how, during evolution, certain sequences come to have an important role that natural selection is responsible for preserving. We do this indirectly by comparing the genomes of different species and trying to infer what may have happened on the way.
What have been the highlights of your career?
The studies I did in London in the late 90s on the evolution of repetitive sequences in the laboratory of John Hancock, the first to use data from the complete genome of yeast. Also the research on the origin and evolution of genes that have recently appeared, which I have done in collaboration with José Castresana and Macarena Toll-Riera, indicating that these genes have an evolutionary plasticity that will be lost over time.
What are the differences in the way of doing research in London?
There weren’t big differences in the quality of research, but it was a more open, more American system, where the merits of the person are what counts, and not their origin or who they know. In fact, many group leaders were foreigners. This surprised me a lot because when I did my PhD in Barcelona, there weren’t even any foreign researchers. Things are changing now with centres like the PRBB, the CNIO or the Parc Cientific, which try to adopt a different philosophy in recruiting and which, being new, don’t suffer from certain inertia.
Is informatics a male area?
Yes, but so are other sciences. In fact, I think the working world is designed for people with few family responsibilities, who traditionally have been men. We must also take into account the instability of the research career and the continuity you need in a system where assessment is done through the production of publications and attendance at conferences. That is difficult to manage if you have kids.
How can it change?
Perhaps when more women are in decision-making positions, since they have a broader vision. And it’s not just a question of children, but also other aspects of a person’s life, such as caring for the elderly.
What advice would you give to junior researchers?
Do not be discouraged. At times when you have doubts about your research, remember that it is a privilege to live off what you love.
What would you be if you were not a scientist?
I never thought I would do anything other than research. I never had a plan B.
The colours of science
Science, in its day-to-day form, presents itself full of colours, as many as a painter’s palette and with the rainbow’s range of tonalities. Single nucleotide polymorphisms (SNPs) are the most common variations in the human genome. These small modifications are very useful in medical research on complex diseases and in developing new drugs. SNPs change little between generations, a fact that allows us to follow evolutionary processes in studies of population genetics. They are also used in some genetic tests, such as paternity tests or forensic analyses.
The use of SNP arrays, seen in the image, allows the analysis of up to 1 million SNPs in a single reaction. This system generates an impressive amount of data from less than one microgram of DNA; an amount of data that years ago no researcher ever dreamed of having so quickly.
This image was published in Ellipse, the PRBB monthly newspaper.
With more than 1,400 people working at the PRBB, the movement of researchers coming and going is constant.
One of the most recent acquisitions is Eduard Sabidó, who has just arrived to be the new head of the CRG/UPF Proteomics Unit. Eduard is coming from the Swiss Federal Institute of Technology of Zurich (ETHZ) and will be leading this core facility which offers service to the whole park and beyond.
A new young group leader has also joined the CRG recently. The French molecular biologist Guillaume Filion (who, as we mentioned in an earlier post, is currently looking for a postdoc) was last at The Netherlands Cancer Institute, in Amsterdam, where he did a postdoc for three years. His research group on Genome Architecture is focused on understanding the ‘regulatory genome’ – that is, the largest part of the genome, which does not code for proteins. We hope to be posting some more news on his research soon!
And while some come, others go… Hernán López-Schier and his group will sadly be leaving the Cell and developmental biology programme of the CRG in March. After nearly 6 years at the CRG, the Sensory Cell Biology and Organogenesis group is moving north to Munich. Hernán will become director of the Department of Sensory Biology & Organogenesis at the IDG – Helmholtz Zentrum München. There, the group will continue their research on the acquisition and maintenance of sensory-organ function, using the zebrafish as a model organism. Sabrina Desbordes, currently in this group, is also moving to the same institute to start her own group as a Junior Group Leader. Good luck to both of them!
Bejerano explained that there are only about 20,000 human genes, but more than 1,000,000 genome ‘switches’, short DNA regions which control which genes are expressed and at what levels. And the so-called ‘gene deserts’, areas of the genome with very few genes in them, are actually very rich in these conserved non-coding elements which act as cis-regulators of gene expression. So, as he said, perhaps more than gene deserts they should be called ‘regulatory jungles’.
Just because of the sheer number of these regulatory units, their importance should not be overlooked – if there are so many they must be doing something. And their key role in gene regulation is more obvious each day.
But in the context of evolution and the appearance of new, beneficial mutations, their role takes on a new dimension that I, for one, hadn’t thought much about. Let me use Bejerano’s example, the Sall1 gene, a key gene during development that is expressed in limb, brain and neural tube. There are three enhancers for this gene, one for each of these locations. If a mutation appears in the gene itself, it will cause problems in all the regions where the gene is expressed, and therefore that mutation might be lethal and won’t be successful (evolutionarily speaking). But if the mutation affects only one of the enhancers, it would have an effect on just one of the regions where the gene is expressed. Such a mutation would have more chances of being passed on to the next generation. Therefore, beneficial mutations are more likely to appear in enhancers (highly conserved non-coding regions) than in genes. Enhancers would then play an essential role in evolution.
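The Sall1 reasoning can be written down as a tiny toy model. It is only a sketch of the argument as reported here: the enhancer names are hypothetical labels, and the mapping of one enhancer to one tissue is the simplification used in the example above.

```python
# Hypothetical labels for the three Sall1 enhancers, each driving expression
# in one of the tissues mentioned in the example.
SALL1_ENHANCERS = {
    "limb_enhancer": "limb",
    "brain_enhancer": "brain",
    "neural_tube_enhancer": "neural tube",
}


def affected_tissues(mutated_element: str) -> set[str]:
    """Return the tissues whose Sall1 expression a mutation would disturb.

    A mutation in the gene itself ("gene") touches every tissue where the
    gene is expressed; a mutation in one enhancer touches only its own tissue.
    """
    if mutated_element == "gene":
        return set(SALL1_ENHANCERS.values())
    return {SALL1_ENHANCERS[mutated_element]}


print(sorted(affected_tissues("gene")))
print(sorted(affected_tissues("limb_enhancer")))
```

The narrower footprint of the enhancer mutation is what, in this argument, gives it a better chance of surviving selection.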
Report by Maruxa Martinez, Scientific Editor at the PRBB