New Scientist: The ever deepening mystery of the human genome

The more we learn about the secrets of the human genome, the less we seem to know about what all that DNA actually does. But there are some clues...
      
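   Editorial: "Don't junk the 'junk DNA' just yet"
   "Heart disorder: 99 per cent probability, early fatal potential. Life expectancy: 30.2 years."
   At birth, the time and cause of Vincent's death were already known. His inferior genes meant that the best job he could hope for was as a cleaner, rather than realising his ambition of becoming an astronaut.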
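   This is the opening scene of the science-fiction film Gattaca, set in a future in which a person's whole life is thought to be determined by their genes. Gattaca was released in 1997, while the Human Genome Project was under way, and its plot reflected what many people believed at the time: that we would soon be able to predict an individual's future from their genes. "There was this belief that we could answer huge amounts of things just by studying genes and gene variants," says geneticist Tim Spector of King's College London, who was involved in the project.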
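   Yet today that prospect seems more distant than ever. After the genome was sequenced, another major project was launched to try to understand which bits of the genome do what. The results, released this week, show that our genome is far more complex and mysterious than biologists imagined just a decade ago.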
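   Back in the 1960s, a beautifully simple picture emerged. Our DNA consisted of recipes for proteins. The double helix could be unzipped so that RNA copies of these recipes could be made and sent to the protein-making factories in cells. But by the 1970s it had become clear that only a tiny proportion of our DNA codes for proteins, and we now know the figure is just 1.2 per cent. What does all the rest do? Some assumed it must have a function; others suggested it was mostly junk. "At least 90 per cent of our genomic DNA is 'junk' or 'garbage' of various sorts," the geneticist Susumu Ohno wrote in 1972.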
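   Ohno knew, though, that some of the DNA that does not code for proteins still plays a vital role. For instance, the process of making RNA copies of genes - transcription - involves clusters of proteins binding to specific sequences near the genes. These proteins, called transcription factors, control the activity of genes by either boosting or blocking transcription, so the sequences to which they bind are known as regulatory DNA or switches.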
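   So how much DNA acts as a switch, or has some other function? To provide an overall picture of which parts of the genome do what, the Encyclopedia of DNA Elements (ENCODE) project was set up in 2003. It involves many teams around the world using a variety of techniques. The results of a pilot study looking at just 1 per cent of the genome were released in 2007. This week, the results of its study of the entire genome were released, with the publication of more than 30 papers in Nature and other journals.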
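   Among other things, ENCODE looked for switches that control gene activity. The researchers did this by taking known transcription factors and seeing which stretches of DNA these proteins bound to. So far they have found 4 million sites, covering 8.5 per cent of the genome, which is far more than anyone expected.
   Even this is likely to be a gross underestimate of the true number, because ENCODE has not yet looked at every cell type or every known transcription factor. "When we extrapolate up, it's more like 18 or 19 per cent," says Ewan Birney of the European Bioinformatics Institute in Cambridge, UK, who is coordinating the data analysis for ENCODE. "We see way more switches than we were expecting, and nearly every part of the genome is close to a switch."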
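   But, and it is a big but, these findings do not show whether the switches actually do anything useful. Many of them may have played a role in the past, for instance, but are now "disconnected".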
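   The other big surprise is that these regulatory regions are widely dispersed throughout the genome, with many lying in the middle of long stretches between genes that were thought to be barren wastelands. More than 95 per cent of the genome may lie within 10,000 base pairs of a switch. "It means that nearly all of the genome is in play for doing something, or if you change it maybe it would have an effect on something somewhere," Birney says. The way these switches work is also turning out to be vastly more complicated than thought. One ENCODE study found that individual switches interact with many genes. What's more, most genes are influenced by numerous switches at the same time. "Almost every gene we look at is physically touching other pieces of DNA, and it's never just one, it tends to be five, eight, 10 sites, and each site in turn has RNAs on it, proteins on it, histones on it," says team member Job Dekker of the University of Massachusetts Medical School in Worcester.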
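   This might help to explain one of biology's biggest puzzles: the mystery of the "missing heritability". We know there is a large genetic component to traits and diseases such as height and diabetes, but the genetic variants found so far typically account for only a tiny percentage of this heritability.
   The missing inheritance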
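   The assumption has been that genetic variants work in isolation, so their effects are additive: if you have one variant that increases the risk of, say, heart disease by 5 per cent and another that increases it by 10 per cent, your overall risk is up by 15 per cent. But Dekker's discovery suggests that the effects of some variants can multiply: these variants may have a small effect on their own, but a much bigger effect if a person has certain other variants too.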
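The difference between the two assumptions can be made concrete with a little arithmetic. The sketch below is only an illustration: it treats each figure as a relative increase over a baseline risk of 1.0, and the function names and the twenty-variant case are hypothetical, not taken from ENCODE or from Dekker's work. Real interactions between variants need not follow either simple rule.

```python
def additive_relative_risk(increases):
    """Combined relative risk if the variants' increases simply add up."""
    return 1.0 + sum(increases)


def multiplicative_relative_risk(increases):
    """Combined relative risk if each variant scales the risk set by the others."""
    risk = 1.0
    for increase in increases:
        risk *= 1.0 + increase
    return risk


two_variants = [0.05, 0.10]    # the 5 and 10 per cent example from the text
many_variants = [0.10] * 20    # hypothetical: twenty variants of 10 per cent each

for label, variants in [("two", two_variants), ("twenty", many_variants)]:
    print(label,
          round(additive_relative_risk(variants), 3),
          round(multiplicative_relative_risk(variants), 3))
```

With only two variants the two models barely differ. With many variants the multiplicative total pulls well ahead of the additive one, which is one way small individual effects could add up to more than a simple sum would suggest.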
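   "I firmly believe that much of the missing heritability is due to complex interactions between multiple genes, multiple non-coding variants and multiple environmental factors," says Jason Moore of the Geisel School of Medicine at Dartmouth in Hanover, New Hampshire. "The reason we've missed a lot of the heritability for complex diseases could be because we've ignored the complexity of the interactions that we know exist in biology."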
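   So up to 20 per cent of the genome may consist of regulatory switches, working or otherwise. What about the rest? ENCODE tried to address this by mapping what proportion of the genome is involved in some kind of biochemical event, which might suggest how much of it is in daily use. The results suggest that up to 80 per cent of the genome is active, with much of it being transcribed into RNA.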
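   This RNA does not carry the codes for making proteins, so what is it for? We know that there are many different kinds of functional RNA, much of it involved in regulating gene activity, such as microRNAs. What's more, some non-coding RNA is turning out to perform other unexpected jobs.
   "They can work like taxi drivers to deliver proteins around the genome, but they can also tether one part of the genome to another and act as a bridge," says Kevin Morris of the Scripps Research Institute in La Jolla, California. Others act as decoys, reducing protein output by soaking up coding RNAs.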
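   But so far all the RNAs with known functions do not begin to add up to 80 per cent of the genome. One explanation is that most RNA transcripts are useless, being mere "noise" generated by overzealous enzymes that do not know when to stop transcribing DNA into RNA. It is like getting a cat to kill the mice in your house and not being able to stop it killing the birds in the neighbourhood too.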
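   "Transcription of non-coding DNA does not automatically indicate function," says Ryan Gregory of the University of Guelph in Ontario, Canada, who studies genome evolution. "I don't think ENCODE will show that the majority of the 98 per cent of non-coding DNA in the human genome is functional for regulation. It would be astonishing if it took so much to regulate a mere 20,000 genes."
   Only a handful of biologists think most non-coding RNAs will turn out to have an important role, the most vocal being John Mattick of the University of Queensland in Brisbane, Australia. "It's now up to the proponents of the noise theory to explain why there's so much of the genome that's showing functional signatures," says Mattick.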
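   But doing something is not the same as doing something useful, and there are good reasons to think that most of our DNA does not play a vital role. For starters, at conception we all have dozens of new mutations. Most of us have one to five mutations that adversely affect gene function in our protein-coding DNA alone, says Joseph Nadeau of the Institute for Systems Biology in Seattle, Washington.
   If most of our DNA were vital, populations would acquire harmful mutations faster than they lose them through the death of embryos and children carrying many damaging mutations. "If the fraction of the genome that's functional increases, the question is: how do we tolerate that? Why aren't we dead many times over?" asks Nadeau.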
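   One way to get a sense of the importance of a given bit of DNA is to look at whether it can accumulate mutations without consequence, or whether it remains unchanged in a population because natural selection eliminates individuals with mutations. "Something like seven out of ten nucleotide changes in [protein] coding sequences get kicked out because they're deleterious, but nine out of ten changes in non-coding sequence don't get kicked out," says Chris Ponting of the University of Oxford. "That's telling us something about the importance of coding changes versus non-coding changes."
   Birney says ENCODE has looked at the transcribed RNA that is specific to primates and found that some of it seems to be under selective pressure in humans.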
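   Another way to assess the importance of a given bit of DNA is to delete it and see what happens. This obviously cannot be done in people, but in mice huge chunks of non-coding DNA that appeared to be functional have been deleted without any obvious effect. Then again, it is also possible to delete many protein-coding genes from organisms such as yeast without any obvious effect.
   One explanation is that in cosseted lab conditions, organisms can manage without DNA that is essential for survival in more challenging environments. Another is that there is a lot of redundancy in the genome. Although you would expect mutations to eliminate redundancy, there may be some circumstances in which it can be maintained.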
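   "[Redundancy] could be what makes the system robust," says Dekker. "If any given piece of DNA is only part of the context, deleting it may have very limited effects. I think it may explain why we can tolerate huge variation between individuals yet we're all still walking down the street."
   A third reason to think most of our DNA is not vital is that although genome size varies enormously between species, there is very little correlation between the complexity of animals and the size of their genomes. There is no obvious reason why the marbled lungfish needs around 40 times as much DNA as we do, and nearly 400 times as much as the green pufferfish.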
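   Because they can
   Gregory has found that some kinds of animals, such as metamorphosing amphibians whose cells need to divide rapidly at times, have smaller genomes than other animals. This suggests that organisms tend to accumulate DNA until its size becomes detrimental, rather like the way people in a big house tend to fill the attic with junk whereas those in small houses have to keep throwing things out.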
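   So we are left with something of a mystery. Although several lines of evidence suggest that most of our DNA is far from essential, ENCODE's results suggest that most regions of our genome do do something. One answer could be that most of these regions do nothing of any great consequence. "They may still have effects. They may change someone's facial anatomy," says Ponting. "They may have very small effects which evolution isn't acting upon."
   Birney thinks some of these regions do matter, though. "We don't yet have a definitive answer to how much of it is important, but we've discovered a lot more things that could be important than anybody had ever suspected," he says. "People often say it's the protein coding regions, plus a bit more. It's not a bit more, it's a lot more."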
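   When ENCODE's work started, Birney himself was highly sceptical about the role of non-protein-coding RNAs in genome function. He even bet Mattick a case of vintage champagne that less than 20 per cent of non-coding RNAs would turn out to be useful. "I am definitely closer to losing this bet," he says.
   It could be a long time before either of them gets the champagne, though. Working out which of the millions of regions ENCODE has identified as functional are actually important is an immense task that could take many decades. Ultimately, the only way to prove that a particular region is vital is to show that variants in this region have some effect on people, which is far from easy. In some cases, though, the evidence already exists: the positions of significant genetic variants identified by studies looking for associations between genetic variants and diseases often coincide with ENCODE regions. "Most of the time we have something that's either bang on top of those regions identified by [these] studies, or really close by," says Birney.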
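   This is already giving us clues about the causes of diseases. For example, some variants associated with Crohn's disease are in switches that ENCODE found to be active in immune cells called T-helper cells.
   It seems, though, that the more we learn about the genome, the less we know. "Our genome knows how to make a human, but I think it is hubris to think that that recipe book would be simple and well laid out," says Birney. "We are one of the most complicated things that we know about, and indeed it does look very complicated."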
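   We are certainly a long way from the complete understanding of the genome portrayed in Gattaca. In fact, we may never achieve it. "It may well be too complex," says Moore.
   Part of this complexity comes from the many ways in which our genome interacts with the environment. "I think DNA will become more predictive, but the flip side of this research is understanding how much of complex traits are due to environmental effects, or free will - things that we can change," says Birney. "DNA is not destiny."
   To some extent, even the writers of Gattaca would agree. Despite his genetic shortcomings, Vincent refuses to accept his genetic fate and ultimately achieves his goal of leaving Earth.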

Linda Geddes, New Scientist reporter

The ever deepening mystery of the human genome

05 September 2012 by Linda Geddes


The more we learn about the secrets of the human genome, the less we seem to know about what all that DNA actually does. But there are some clues.

(Image: Laguna Design/SPL/Getty)


Editorial: "Don't junk the 'junk DNA' just yet"

"HEART disorder: 99 per cent probability, early fatal potential. Life expectancy: 30.2 years."

At birth, the time and cause of Vincent's death were already known. His inferior genes meant that the best job he could hope to get was as a cleaner, rather than realising his ambition of becoming an astronaut.

Thus begins the film Gattaca, set in a future when a person's potential is thought to be determined by their genes. Gattaca was released in 1997, in the middle of the Human Genome Project, and its plot reflected what many believed at the time: we'd soon be able to predict all kinds of things about people based on their genes. "There was this belief that we could answer huge amounts of things just by studying genes and gene variants," says geneticist Tim Spector of King's College London, who was involved in the project.

Yet today, this prospect seems more distant than ever. After the genome was sequenced, another major project was launched to try to understand which bits of the genome do what. The results, released this week, reveal that our genome is far more complex and mysterious than biologists imagined just a decade ago.

Back in the 1960s, a beautifully simple picture emerged. Our DNA consisted of recipes for proteins. The double helix could be unzipped to allow RNA copies of these recipes to be made and sent to the protein-making factories in cells. But by the 1970s, it had become clear that only a tiny proportion of our DNA codes for proteins - just 1.2 per cent, we now know. What about all the rest? Some assumed it must do something, others suggested it was mostly junk. "At least 90 per cent of our genomic DNA is 'junk' or 'garbage' of various sorts," the geneticist Susumu Ohno wrote in 1972.

Ohno knew, though, that some of the DNA that didn't code for proteins still played a vital role. For instance, the process of making RNA copies of genes - transcription - involves clusters of proteins binding to specific sequences near the genes. These proteins - called transcription factors - control the activity of genes by either boosting or blocking transcription, so the sequences to which they bind are known as regulatory DNA or switches.

So how much DNA acts as a switch, or has some other function? To provide an overall picture of which parts of the genome do what, the Encyclopedia of DNA Elements (ENCODE) project was set up in 2003. It involves many teams around the world using a variety of techniques. The results of a pilot study looking at just 1 per cent of the genome were released in 2007. This week, the results of its study of the entire genome were released, with the publication of more than 30 papers in Nature and other journals.

Among other things, ENCODE looked for switches that control gene activity. The researchers did this by taking known transcription factors and seeing which bits of DNA these proteins bound to. So far, they have found 4 million sites, covering 8.5 per cent of the genome - far more than anyone expected.

Even this is likely to be a gross underestimate of the true number, because ENCODE hasn't yet looked at every cell type, or every known transcription factor. "When we extrapolate up, it's more like 18 or 19 per cent," says Ewan Birney of the European Bioinformatics Institute in Cambridge, UK, who is coordinating the data analysis for ENCODE. "We see way more switches than we were expecting, and nearly every part of the genome is close to a switch."

But - and it is a big but - these findings do not show whether these switches actually do anything useful. Many of them may have played a role in the past, for instance, but are now "disconnected".

The other big surprise is that these regulatory regions are widely dispersed throughout the genome, with many lying in the middle of long stretches between genes that were thought to be barren wastelands. More than 95 per cent of the genome may lie within 10,000 base pairs of a switch. "It means that nearly all of the genome is in play for doing something, or if you change it maybe it would have an effect on something somewhere," Birney says.
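A figure like "within 10,000 base pairs of a switch" is, in effect, a coverage calculation: widen every switch site by a window on each side, merge the overlapping windows, and divide the covered length by the genome length. The sketch below shows that idea on toy numbers. The function name, the coordinates and the assumption that each switch is a single position are all hypothetical; this is not ENCODE's actual pipeline or data.

```python
def fraction_near_switch(switch_positions, genome_length, window):
    """Fraction of positions 0..genome_length-1 lying within `window` of any switch."""
    covered = 0
    current_start = current_end = None
    for pos in sorted(switch_positions):
        # Window around this switch, clipped to the ends of the genome.
        start = max(0, pos - window)
        end = min(genome_length - 1, pos + window)
        if current_end is None:
            current_start, current_end = start, end
        elif start <= current_end + 1:
            # Overlaps or touches the previous window: extend it.
            current_end = max(current_end, end)
        else:
            # Gap before this window: close the previous one and start anew.
            covered += current_end - current_start + 1
            current_start, current_end = start, end
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered / genome_length


# Toy example: a 100-position "genome" with switches at 10 and 50, window of 5.
print(fraction_near_switch([10, 50], 100, 5))
```

Because the windows are merged before counting, densely packed switches are not double counted, so the result depends on how evenly the sites are spread as well as on how many there are.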

The way in which these switches work is also turning out to be vastly more complicated than thought. One ENCODE study found that individual switches interact with many genes. What's more, most genes are being influenced by numerous switches at the same time. "Almost every gene we look at is physically touching other pieces of DNA, and it's never just one, it tends to be five, eight, 10 sites, and each site in turn has RNAs on it, proteins on it, histones on it," says team member Job Dekker of the University of Massachusetts Medical School in Worcester.

This might help to explain one of biology's biggest puzzles: the mystery of the "missing heritability". We know there's a big genetic component to traits and diseases such as height and diabetes, but the genetic variants found so far typically account for only a tiny percentage of this heritability.

The missing inheritance

The assumption has been that genetic variants work in isolation, so their effects are additive: if you've got one variant that increases the risk of, say, heart disease by 5 per cent and another that increases it by 10 per cent, your overall risk is up by 15 per cent. But Dekker's discovery suggests that the effects of some variants can multiply: these variants may have a small effect on their own, but a much bigger effect if a person has certain other variants too.

"I firmly believe that much of the missing heritability is due to complex interactions between multiple genes, multiple non-coding variants and multiple environmental factors," says Jason Moore of the Geisel School of Medicine at Dartmouth in Hanover, New Hampshire. "The reason we've missed a lot of the heritability for complex diseases could be because we've ignored the complexity of the interactions that we know exist in biology."

So up to 20 per cent of the genome may consist of regulatory switches, working or otherwise. What about the rest? ENCODE tried to address this by mapping what proportion of the genome is involved in some kind of biochemical event, which might suggest how much of it is in daily use. The results suggest that up to 80 per cent of the genome is active, with much of it being transcribed into RNAs.

This RNA is not carrying the codes for making proteins, so what is it for? We know that there are lots of different kinds of functional RNAs, many of which are involved in regulating gene activity, such as microRNAs. What's more, some non-coding RNA is turning out to perform other unexpected jobs.

"They can work like taxi drivers to deliver proteins around the genome, but they can also tether one part of the genome to another and act as a bridge," says Kevin Morris of the Scripps Research Institute in La Jolla, California. Yet others act as decoys, reducing protein output by soaking up coding RNAs.

But so far all the RNAs with known functions do not begin to add up to 80 per cent of the genome. One explanation is that most RNA transcripts are useless, being mere "noise" generated by overzealous enzymes that don't know when to stop transcribing DNA into RNA. It's like getting a cat to kill mice in your house, and not being able to stop it killing birds in the neighbourhood too.

"Transcription of non-coding DNA does not automatically indicate function," says Ryan Gregory of the University of Guelph in Ontario, Canada, who studies genome evolution. "I don't think ENCODE will show that the majority of the 98 per cent of non-coding DNA in the human genome is functional for regulation. It would be astonishing if it took so much to regulate a mere 20,000 genes."

Only a handful of biologists, the most vocal being John Mattick of the University of Queensland in Brisbane, Australia, think most non-coding RNAs will turn out to have an important role. "It's now up to the proponents of the noise theory to explain why there's so much of the genome that's showing functional signatures," says Mattick.

But doing something is not the same as doing something useful, and there are good reasons to think that most of our DNA does not play a vital role. For starters, at conception we all have dozens of new mutations. Most of us have one to five mutations that adversely affect gene function in our protein-coding DNA alone, says Joseph Nadeau of the Institute for Systems Biology in Seattle, Washington.

If most of our DNA were vital, populations would acquire harmful mutations faster than they lose them through the death of embryos and children with lots of nasty mutations. "If the fraction of the genome that's functional increases, the question is: how do we tolerate that? Why aren't we dead many times over?" asks Nadeau.

One way to get a sense of the importance of a given bit of DNA is to look at whether it can accumulate mutations without consequence or whether it remains unchanged in a population because natural selection eliminates any individuals with mutations. "Something like seven out of ten nucleotide changes in [protein] coding sequences get kicked out because they're deleterious, but nine out of ten changes in non-coding sequence don't get kicked out," says Chris Ponting of the University of Oxford. "That's telling us something about the importance of coding changes versus non-coding changes."

Birney agrees, although he says ENCODE has looked at the transcribed RNA that's specific to primates and found that some of it seems to be under selective pressure in humans.

Another way to assess the importance of a given bit of DNA is to delete it to see what happens. This obviously can't be done in people, but in mice huge chunks of non-coding DNA that appeared to be functional have been deleted without any obvious effect. Then again, it is possible to delete many protein-coding genes from organisms such as yeast without any obvious effect, too.

One explanation is that in cosseted lab conditions, organisms can manage without DNA that is essential for survival in more challenging environments. Another is that there is a lot of redundancy in the genome. Although you would expect mutations to eliminate redundancy, there may be some circumstances in which it can be maintained.

"[Redundancy] could be what makes the system robust," says Dekker. "If any given piece of DNA is only part of the context, deleting it may have very limited effects. I think it may explain why we can tolerate huge variation between individuals yet we're all still walking down the street."

A third reason to think most of our DNA isn't vital is that although there is enormous variation in genome size between species, there is very little correlation between the complexity of animals and their genome size. There is no obvious reason why the marbled lungfish needs around 40 times as much DNA as we do, and nearly 400 times as much as the green pufferfish, for instance.

Because they can

Gregory has found that some kinds of animals, such as metamorphosing amphibians whose cells need to divide rapidly at times, have smaller genomes than other animals. This suggests that organisms tend to accumulate DNA until its size becomes detrimental, rather like the way people in a big house tend to fill the attic with junk whereas those in small houses have to keep throwing things out.

So we are left with something of a mystery. Although several lines of evidence suggest that most of our DNA is far from essential, ENCODE's results suggest that most regions of our genome do do something. One answer could be that most of these regions do not do anything of any great consequence. "They may still have effects. They may change someone's facial anatomy," says Ponting. "They may have very small effects which evolution isn't acting upon."

Birney thinks some of these regions do matter, though. "We don't yet have a definitive answer to how much of it is important, but we've discovered a lot more things that could be important than anybody had ever suspected," he says. "People often say it's the protein coding regions, plus a bit more. It's not a bit more, it's a lot more."

When ENCODE's work started, Birney himself was also highly sceptical about the role of non-protein-coding RNAs in genome function. He even bet Mattick a case of vintage champagne that less than 20 per cent of non-coding RNAs would turn out to be useful. "I am definitely closer to losing this bet," he says.

It could be a long time before one of them gets the champagne, though. Working out which of the millions of regions ENCODE has identified as functional are actually important is an immense task that could take many decades. Ultimately, the only way to prove that a particular region is vital is to show that variants in this region have some effect on people, which is far from easy. In some cases, though, the evidence already exists: the positions of significant genetic variants identified by studies looking for associations between genetic variants and diseases often coincide with ENCODE regions. "Most of the time we have something that's either bang on top of those regions identified by [these] studies, or really close by," says Birney.

This is already giving us clues about the causes of diseases. For example, some variants associated with Crohn's disease are in switches that ENCODE found are active in immune cells called T-helper cells.

It seems, though, that the more we learn about the genome, the less we know. "Our genome knows how to make a human, but I think it is hubris to think that that recipe book would be simple and well laid out," says Birney. "We are one of the most complicated things that we know about, and indeed it does look very complicated."

We're certainly a long way from the complete understanding of the genome portrayed in Gattaca. In fact, we may never achieve it. "It may well be too complex," says Moore.

Part of this complexity comes from the many ways in which our genome interacts with the environment. "I think DNA will become more predictive, but the flip side of this research is understanding how much of complex traits are due to environmental effects, or free will - things that we can change," says Birney. "DNA is not destiny."

To some extent, even the writers of Gattaca agreed. (Spoiler alert.) Despite Vincent's genetic inadequacies, he refused to accept his genetic fate and ultimately achieved his goal of leaving Earth.

Linda Geddes is a reporter for New Scientist
