011 Education: A universal grammar rule shared by 37 of the world's languages
All languages have evolved to have this in common
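Figure 1: All languages have evolved the same rule!
(Translator's note: When learning a foreign language, the key is to speak and write boldly. As long as a sentence carries a clear, simple message, others will understand it; beyond that, the rest of the grammar matters much less.)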
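Chinese, English, German, Spanish, and other widely spoken languages look very different from one another. But scientists have recently published a study of 37 such languages in PNAS (a leading American scientific journal), and it finds that whatever language you speak, its grammar pushes you to make yourself understood with the least effort. In other words, languages all favor efficiency.
The study therefore points to a simple rule: the words of each sentence are arranged and ordered so as to minimize the total dependency length (DL).
Figure 2: The distance between each pair of related words is shown as a number; the sum of these distances in a sentence should be as small as possible.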
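In Figure 2, sentences A and B contain the same words, as do sentences C and D. But the total dependency length (DL) is 6 for A, 7 for B, 14 for C, and 20 for D. By the DL-minimization principle, we should therefore prefer A and C over B and D, because 6 < 7 and 14 < 20.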
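Figure 2 shows why we prefer to say "John threw out the trash" rather than "John threw the trash out." The key words here, the subject, verb, and object, are John, threw, and trash, and they are linked by direct dependencies. These three words alone do not say enough, so we also insert out and the: out depends on threw, and the depends on trash. Adjacent words have a dependency length of 1. But threw and trash are separated by out and the, so the dependency length between them is 3. Adding up the lengths of all four dependencies (1 + 1 + 3 + 1) gives a total of 6. In the alternative wording, "John threw the trash out," threw and out are now separated, with a dependency length of 3; although the length between threw and trash drops to 2, the total still rises to 7 (1 + 2 + 1 + 3). The first wording is therefore slightly easier to process. In more complex sentences, such as C and D in the figure, the totals can differ by as much as 14 versus 20, so wording C should be preferred over wording D.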
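The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not the software used in the study: the function name and the arc lists are illustrative. The arcs for the short pair follow the description above. The long pair is assumed to be the "old trash sitting in the kitchen" sentences from the English original (with John as the subject), and its arcs are an assumed conventional parse, chosen because it reproduces the totals 14 and 20 given for Figure 2. The second "the" is written `the2` so that every token is unique.

```python
def total_dependency_length(words, arcs):
    """Sum of |position(head) - position(dependent)| over all arcs."""
    if len(set(words)) != len(words):
        raise ValueError("tokens must be unique; rename repeats, e.g. 'the2'")
    position = {word: index for index, word in enumerate(words, start=1)}
    return sum(abs(position[head] - position[dep]) for head, dep in arcs)


# (head, dependent) pairs for sentences A and B, as described in the text.
ARCS_SHORT = [
    ("threw", "John"), ("threw", "out"), ("threw", "trash"), ("trash", "the"),
]

# Assumed parse for sentences C and D (not spelled out in the article).
ARCS_LONG = ARCS_SHORT + [
    ("trash", "old"), ("trash", "sitting"), ("sitting", "in"),
    ("in", "kitchen"), ("kitchen", "the2"),
]

SENTENCES = {
    "A": ("John threw out the trash", ARCS_SHORT),
    "B": ("John threw the trash out", ARCS_SHORT),
    "C": ("John threw out the old trash sitting in the2 kitchen", ARCS_LONG),
    "D": ("John threw the old trash sitting in the2 kitchen out", ARCS_LONG),
}

totals = {
    label: total_dependency_length(text.split(), arcs)
    for label, (text, arcs) in SENTENCES.items()
}

for label, total in totals.items():
    print(label, total)
```

Under these assumptions the script prints 6, 7, 14, and 20 for A through D, matching the figure. Moving "out" next to "threw" is what saves the most: in D that single arc spans 8 positions.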
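Languages differ in the order of subject, verb, and object (S for subject, V for verb, O for object): English and French are SVO, German and Japanese are SOV, and Arabic and Hebrew are VSO (no well-documented language puts the object first). Even so, dependency length minimization (DLM) is a basic principle in the evolution of all of these languages. The degree of minimization does vary. Earlier studies found only weak evidence of minimization in German, and in the new study Italian, Indonesian, and Irish minimize the most, whereas Japanese, Korean, and Turkish minimize much less. In general, SOV languages minimize less, possibly because many of them use "case marking" to distinguish the subject from the object. In English, for example, "John kisses Mary" and "Mary kisses John" rely on word order to show who kisses whom. In Japanese, one can say the equivalent of "John Mary kiss," because the case marking makes it clear who kisses whom. English still keeps some vestiges of case from its origins in the Germanic Old English: we say "He threw the ball to her" rather than "He threw the ball to she," and the form her, rather than she, marks the word as the object rather than the subject.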
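The dependency length minimization (DLM) principle makes language easier to understand and avoids wasting the brain's energy and memory resources; this is a basic principle of language evolution.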
--------------------------------------------------
Original news article: Science | DOI: 10.1126/science.aac8959
----------------------------English study------------------------------
Original English text: All languages have evolved to have this in common
Have you ever wondered why you say “The boy is playing Frisbee with his dog” instead of “The boy dog his is Frisbee playing with”? You may be trying to give your brain a break, according to a new study. An analysis of 37 widely varying tongues finds that, despite the apparent great differences among them, they share what might be a universal feature of human language: All of them have evolved to make communication as efficient as possible.
Earth is a veritable Tower of Babel: Up to 7000 languages are still spoken across the globe, belonging to roughly 150 language families. And they vary widely in the way they put sentences together. For example, the three major building blocks of a sentence, subject (S), verb (V), and object (O), can come in three different orders. English and French are SVO languages, whereas German and Japanese are SOV languages; a much smaller number, such as Arabic and Hebrew, use the VSO order. (No well-documented languages start sentences or clauses with the object, although some linguists have jokingly suggested that Klingon might do so.)
Yet despite these different ways of structuring sentences, previous studies of a limited number of languages have shown that they tend to limit the distance between words that depend on each other for their meaning. Such “dependency” is key if sentences are to make sense.
For example, in the sentence “Jane threw out the trash,” the word “Jane” is dependent on “threw”—it modifies the verb by telling us who was doing the throwing, just as we need “trash” to know what was thrown, and “out” to know where the trash went. Although “threw” and “trash” are three words away from each other, we can still understand the sentence easily.
But we might have more trouble understanding a sentence like “Jane threw the old trash sitting in the kitchen out,” because now “threw” and “trash” are four words apart and “threw” and “out” are eight words apart. We can shorten those dependency distances, and make the sentence clearer, by changing it to read “Jane threw out the old trash sitting in the kitchen.”
Sentences A and B are the same length and use the same words, as do C and D. But the dependency lengths of the second sentence in each pair are longer. R. FUTRELL ET AL., PNAS (2015)
These observations had led some linguists to hypothesize that all of the world’s languages reduce the distance between dependent words, something called dependency length minimization (DLM). Yet the most comprehensive previous studies of this trend only covered seven languages. Although most of them did show at least some evidence for DLM, the support for it in German was weak. That finding raised doubts about whether DLM really was a universal feature of human language.
To try to resolve the question, a team led by Richard Futrell, a linguist at the Massachusetts Institute of Technology in Cambridge, analyzed 37 languages from 10 different language families to see how much they minimized dependency lengths over what would be expected by chance. In addition to major languages such as English, German, French, and Spanish, the database also included ancient Greek, Arabic, Basque, Tamil, and Telugu, one of India’s classical languages. For most of the languages, the researchers used written prose from newspapers, novels, and blogs, although for ancient Greek and Latin they relied on poetry. They crunched thousands of sentences using software designed to measure dependency lengths.
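The idea of comparing observed dependency lengths with "what would be expected by chance" can be illustrated on the short example sentence. The article does not say how the researchers built their chance baseline, so the sketch below assumes the simplest possible one: every ordering of the same five words is treated as equally likely. The function names and the dependency arcs (including the arc from "trash" to "the") are illustrative assumptions, not taken from the study's software.

```python
from itertools import permutations


def total_dependency_length(words, arcs):
    """Sum of |position(head) - position(dependent)| over all arcs."""
    position = {word: index for index, word in enumerate(words)}
    return sum(abs(position[head] - position[dep]) for head, dep in arcs)


def chance_baseline(words, arcs):
    """Mean total dependency length over every ordering of the words.

    Assumption: all word orders are equally likely. This is a deliberate
    simplification and only practical for very short sentences.
    """
    lengths = [total_dependency_length(order, arcs) for order in permutations(words)]
    return sum(lengths) / len(lengths)


# (head, dependent) pairs for the example sentence; an assumed parse.
ARCS = [("threw", "Jane"), ("threw", "out"), ("threw", "trash"), ("trash", "the")]
SENTENCE = "Jane threw out the trash".split()

observed = total_dependency_length(SENTENCE, ARCS)
baseline = chance_baseline(SENTENCE, ARCS)

print("observed:", observed)
print("chance baseline:", baseline)
```

With these assumptions the observed total is 6 and the average over all 120 orderings is 8.0, so the natural word order is shorter than chance. A study over thousands of real sentences would need parsed corpora and a more careful baseline; this brute-force version only shows the shape of the comparison.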
The results, published online today in the Proceedings of the National Academy of Sciences, demonstrate that all 37 languages, including German, minimize dependency lengths to degrees greater than expected by chance. Nevertheless, the team found wide variations in the extent of DLM. Thus Italian, Indonesian, and Irish showed high degrees of minimization, whereas Japanese, Korean, and Turkish showed much less. In general, SOV languages like German tend to have longer dependency lengths, although this is not a hard-and-fast rule.
Just why these variations exist is a topic for future research, the authors say. But they point out that German and many other SOV languages employ a linguistic device called “case marking,” a modification of key words in a sentence that makes it easier to distinguish the subject from the object. For example, whereas English speakers must say either “John kisses Mary” or “Mary kisses John” to know who is kissing whom, in Japanese one can say “John Mary kiss” because the case marking will make it clear. (English, an SVO language that generally does not use case markings, nevertheless has some vestiges of it from its origins in the Germanic Old English: We say “He threw the ball to her” rather than “He threw the ball to she” to make it absolutely clear who is the subject and who is the object.)
Limiting dependency length is advantageous, Futrell says, because convoluted sentences require more memory processing—and thus more energy—for both listeners and speakers who are trying to understand and be understood. Thus it makes sense that short dependency lengths became a universal feature in human language. “As language users, we have a choice of many ways of expressing ourselves,” Futrell says. “What languages don’t do is force you” into inefficient and energy-wasting use of memory stores.
The new work is a “major advance” because “it shows that DLM is a property of languages in general,” says David Temperley, a cognitive scientist at the University of Rochester (U of R) in New York. Nevertheless, he stops short of concluding that it is a “universal” or “hard-wired” feature of language, rather than a strategy that humans have developed over time to make themselves better understood. Florian Jaeger, a psycholinguist also at U of R, agrees. Jaeger says that the current paper, along with other recent research, shows that although “the bias towards efficiency is a strong factor in explaining” common features of the world’s languages, “finding a potentially universal pattern does not necessarily” mean that it is “genetically encoded.”