Machine learning predicts circadian gene expression from DNA sequence features

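Circadian rhythms are key to maintaining human, plant and animal health. Understanding the biological clock can help improve plant growth and yields, and can open new avenues for tackling human diseases. In a recent study based on artificial intelligence and machine learning, researchers used newly generated datasets, published temporal datasets and the Arabidopsis thaliana genome to predict the regulation and expression patterns of complex temporal circadian genes in Arabidopsis, advancing understanding of how the biological clock works. The team classified circadian genes using DNA sequence features newly derived from public genomic resources, which eases downstream application of the method because it requires no experimental work or prior knowledge. DNA sequence features were also used to distinguish the temporal phase of transcript expression, revealing hidden sub-classes within the circadian class. Finally, the circadian time was predicted from a single transcriptomic timepoint, identifying the marker transcripts that contribute most to an accurate prediction. The technique is not limited to plants and can also be used to study the link between clock dysregulation and diseases such as depression and cancer. The findings were published on 10 August in the Proceedings of the National Academy of Sciences (PNAS). (Source: Earlham Institute, PNAS)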

Unlocking the AI algorithm ‘black box’: new machine learning technology to find out what makes plants and humans tick

Inner 24-hour cycles, or circadian rhythms, are key to maintaining human, plant and animal health, and studying them could provide valuable insight into how broken clocks impact health.

Circadian rhythms, such as the sleep-wake cycle, are innate to most living organisms and critical to life on Earth. The word circadian originates from the Latin phrase ‘circa diem’, which means ‘around a day’.

Biologically, the circadian clock temporally orchestrates physiology, biochemistry, and metabolism across the 24-hour day-night cycle. This is why being out of kilter can affect our fitness levels, our health, or our ability to survive. For example, experiencing jet lag is a chronobiological problem - our body clocks are out of sync because the normal external cues such as light or temperature have changed.

The circadian clock isn’t unique to humans. In plants, an accurate clock helps to regulate flowering and is crucial to synchronising metabolism and physiology with the rising and setting sun. Understanding circadian rhythms can help to improve plant growth and yields, not to mention revealing new avenues for tackling human diseases.

Beyond plants

For this latest research, the team applied machine learning (ML) to predict complex temporal circadian gene expression patterns in the model plant Arabidopsis thaliana. Using newly generated datasets, published temporal datasets, and Arabidopsis genomes, the scientists trained ML models to make predictions about circadian gene regulation and expression patterns.

Featured in the journal PNAS, the work demonstrates the power of AI and ML-based approaches to enable more cost-effective analysis and deeper insight into the function of the circadian clock and its regulation. These approaches are redefining how scientists use public data and generate testable hypotheses to understand gene expression control in plants and humans.

Lead author Dr Laura-Jayne Gardiner of IBM Research Europe (formerly at the Earlham Institute, where the research was carried out) said: “Essentially, our inner rhythm is driven by a circadian clock, which is a biochemical oscillator synchronised with solar time or the position of the sun in the sky. In most living things, including animals, plants, fungi and even cyanobacteria, internally synchronised circadian clocks make it possible for an organism to anticipate daily environmental changes corresponding with the day-night cycle and adjust its biology and behavior accordingly.”

Detecting circadian rhythms

Prof Anthony Hall, Group Leader at the Earlham Institute, said: “Genes involved in the circadian clock typically show an oscillation between off-on state rhythmic patterns throughout a 24-hour period. This pattern is called circadian rhythmicity.

“Detecting circadian rhythmicity with existing methods is challenging as it requires using sequencing technologies to generate long, high-resolution, time-series transcriptome datasets to measure gene expression throughout the day. Not only is this expensive, it is also time-consuming for laboratory scientists. Consequently, our knowledge to date of how genes are controlled and regulated in a circadian clock is limited.”
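To make the idea of an oscillation over a 24-hour period concrete, here is a minimal sketch of a cosinor-style fit, a common textbook way to score rhythmicity in a time series. It is not the method used in the study. The function name `cosinor_fit`, the 4-hour sampling grid and the synthetic expression values are all assumptions made for illustration.

```python
import math

PERIOD_HOURS = 24.0


def cosinor_fit(times, values, period=PERIOD_HOURS):
    """Fit y = mesor + a*cos(w*t) + b*sin(w*t) to a time series.

    Assumes evenly spaced samples that cover a whole number of periods
    (with more than two samples per period); under that assumption the
    least-squares solution reduces to the simple sums below.
    """
    n = len(values)
    w = 2.0 * math.pi / period
    mesor = sum(values) / n  # rhythm-adjusted mean level
    a = 2.0 / n * sum(v * math.cos(w * t) for t, v in zip(times, values))
    b = 2.0 / n * sum(v * math.sin(w * t) for t, v in zip(times, values))
    amplitude = math.hypot(a, b)               # strength of the 24-hour rhythm
    peak_hour = (math.atan2(b, a) / w) % period  # time of peak expression
    return mesor, amplitude, peak_hour


# Synthetic data (illustrative only): samples every 4 hours over two days.
times = list(range(0, 48, 4))
w = 2.0 * math.pi / PERIOD_HOURS
rhythmic = [10.0 + 3.0 * math.cos(w * (t - 8.0)) for t in times]  # peaks at hour 8
flat = [10.0 for _ in times]                                      # no rhythm

print(cosinor_fit(times, rhythmic))
print(cosinor_fit(times, flat))
```

A gene with a clear daily rhythm gives a large fitted amplitude relative to its mean level, while a non-rhythmic gene gives an amplitude near zero. The fit needs samples spread across the day, which is the cost the quote above describes.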

The AI- and ML-based technology was initially applied to the model plant Arabidopsis, and then progressed to testing other complex or temporal gene expression patterns, other Arabidopsis ecotypes, and other species. Furthermore, the team have adapted the ML approach for wheat to show that the methods allow accurate analysis of key food crops.

Arabidopsis thaliana is a popular model organism in plant biology and genetics. The first plant to have its genome sequenced, it has been used to understand the molecular biology and genetics of many plant traits, including circadian regulation.

“Our ML models classify circadian expression patterns using iteratively lower numbers of transcriptomic timepoints, which is an improvement in accuracy compared to the existing state-of-the-art models,” explained Prof Hall.

“We developed a ML model which generates a proxy gene set to predict the circadian time (phase) from a single transcriptomic sampling time point in the day. There are thousands of public transcriptomic datasets and by comparing this predicted time with the experimental time, we can identify specific genes or conditions that alter the clock function. Therefore increasing our understanding of the mechanism and function of the clock.”
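As a rough illustration of predicting the time of day from a single sample and comparing it with the recorded time, the sketch below uses a simple vector average over marker genes. It stands in for the study's ML model, which the text does not describe in detail. It assumes cosine-shaped, mean-centred expression and marker genes whose peak times are spread evenly around the clock. The names `estimate_time` and `clock_offset` and all values are hypothetical.

```python
import math

PERIOD_HOURS = 24.0


def estimate_time(expression, peak_hours, period=PERIOD_HOURS):
    """Estimate the sampling time from one snapshot of marker-gene expression.

    expression: mean-centred expression of each marker gene in the sample
    peak_hours: assumed hour of peak expression for each marker gene
    Each gene "votes" for its own peak hour, weighted by its expression.
    """
    w = 2.0 * math.pi / period
    x = sum(e * math.cos(w * p) for e, p in zip(expression, peak_hours))
    y = sum(e * math.sin(w * p) for e, p in zip(expression, peak_hours))
    return (math.atan2(y, x) / w) % period


def clock_offset(predicted, experimental, period=PERIOD_HOURS):
    """Signed circular difference in hours, in the range [-12, 12)."""
    return (predicted - experimental + period / 2.0) % period - period / 2.0


# Hypothetical marker genes peaking every 4 hours, sampled at hour 10.
peaks = [0.0, 4.0, 8.0, 12.0, 16.0, 20.0]
w = 2.0 * math.pi / PERIOD_HOURS
snapshot = [math.cos(w * (10.0 - p)) for p in peaks]

predicted = estimate_time(snapshot, peaks)
print(predicted, clock_offset(predicted, 10.0))
```

The circular difference matters because hour 23 and hour 1 are two hours apart, not 22. A sample whose offset is consistently far from zero would be a candidate for a condition or genotype that shifts the clock.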

“We re-defined the field by developing ML models to distinguish circadian transcripts that don’t use transcriptomic timepoint information, but instead DNA sequence features generated from public genomic resources. Therefore, allowing us to predict the circadian regulation of genes simply by analysing the genome DNA sequence.”

The researchers based their study on the theory that a major mechanism of gene expression control, circadian or otherwise, is the binding of transcription factors (and other factors) to regulatory DNA sequences.

Transcription factors are vital molecules that can control gene expression - directing when, where and to what degree genes are expressed. They bind to specific sequences of DNA and control the transcription of DNA into mRNA.
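If regulation depends on short DNA sequences that factors bind to, one simple way to turn a regulatory region into numeric features is to count its short subsequences (k-mers). The sketch below shows that idea only. The text does not say which sequence features the team used, so the k-mer choice, the function name `kmer_features`, the made-up sequence and the 6-letter pattern are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence, k=6):
    """Count every DNA k-mer in a sequence, returned in a fixed order.

    The fixed vocabulary means every gene yields a vector of the same
    length, which is what a classifier needs as input.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    return {kmer: counts.get(kmer, 0) for kmer in vocabulary}


# Made-up regulatory sequence, for illustration only.
promoter = "AAGCACGTGTTCACGTGAA"
features = kmer_features(promoter, k=6)

print(len(features))        # size of the feature vector
print(features["CACGTG"])   # how often this 6-letter pattern occurs
```

Feature vectors like this can be computed from a genome sequence alone, with no expression measurements.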

Explainable AI

Dr Gardiner adds: “Our ML models and their application in crops, where circadian rhythms are critical to maintaining healthy growth and development, could lead to increased yields as agricultural scientists and farmers begin to use the model to understand the inner rhythms of the plants they grow and harvest.

“However, the technology we developed goes beyond the scope of plants. We are now looking at different species to investigate the circadian clock and its link to disease in humans, for example, where the dysregulation of the circadian clock has been associated with a range of diseases from depression to cancer.”

Dr Gardiner is clear about the value of ML and AI in gaining deeper insights into circadian regulation: “What makes our models more informative is our usage of explainable AI algorithms,” she explains. “We wanted to use the interpretation of our ML models to illuminate what’s inside the ‘black box’, so that we can better understand the predictions they make.

“We used local model explanations that are transcript specific to rank DNA sequence features, which provide a detailed profile of the potential circadian regulatory mechanisms for each transcript. Using the local explanation derived from ranked DNA sequence features allows us to distinguish the temporal phase of transcript expression and, in doing so, reveal hidden sub-classes within the circadian class. E.g., whether a transcript is likely to show its peak expression in the morning, afternoon, evening or night.”
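The idea of a transcript-specific ranking of features can be sketched with the simplest possible case: a linear model, where each feature's contribution to one prediction is its weight multiplied by its value for that transcript. The text does not name the model or the explanation algorithm the team used, so the linear model, the function name `local_explanation`, the feature names and all numbers here are assumptions.

```python
def local_explanation(weights, feature_values):
    """Rank features by their contribution to one transcript's score.

    For a linear model the contribution of a feature is weight * value,
    so the ranking differs from transcript to transcript even though the
    weights are shared by all of them.
    """
    contributions = {
        name: weights[name] * value for name, value in feature_values.items()
    }
    # Largest absolute contribution first; the sign shows the direction.
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical model weights and one transcript's DNA sequence feature counts.
weights = {"motif_A": 1.5, "motif_B": 2.0, "motif_C": -0.5}
transcript = {"motif_A": 2, "motif_B": 0, "motif_C": 3}

for name, contribution in local_explanation(weights, transcript):
    print(name, contribution)
```

Here `motif_B` has the largest weight in the model overall but ranks last for this transcript because the transcript does not contain it. That difference between a global and a local view is what makes per-transcript profiles informative.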



