MIT software program decodes “lost” languages
From Nat Geo:
A new computer program has quickly deciphered a written language last used in Biblical times—possibly opening the door to “resurrecting” ancient texts that are no longer understood, scientists announced last week.
Created by a team at the Massachusetts Institute of Technology, the program automatically translates written Ugaritic, which consists of dots and wedge-shaped stylus marks on clay tablets. The script was last used around 1200 B.C. in western Syria.
Written examples of this “lost language” were discovered by archaeologists excavating the port city of Ugarit in the late 1920s. It took until 1932 for language specialists to decode the writing. Since then, the script has helped shed light on ancient Israelite culture and Biblical texts.
Using no more computing power than that of a high-end laptop, the new program compared symbol and word frequencies and patterns in Ugaritic with those of a known language, in this case, the closely related Hebrew.
Through repeated analysis, the program linked letters and words to map nearly all Ugaritic symbols to their Hebrew equivalents in a matter of hours.
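To make the frequency-comparison idea concrete, here is a minimal Python sketch. It is not the MIT team's program, which the article describes only in outline. It uses plain frequency-rank matching as a simple stand-in: rank the symbols of each text by how often they occur, then pair them off in order. The function names (rank_by_frequency, guess_mapping, decode) and the toy texts are hypothetical, chosen only for illustration.

```python
from collections import Counter


def rank_by_frequency(text):
    """Return the symbols of text, most frequent first (ties broken alphabetically)."""
    counts = Counter(ch for ch in text if not ch.isspace())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [symbol for symbol, _ in ordered]


def guess_mapping(unknown_text, known_text):
    """Pair symbols of the unknown script with known symbols of the same frequency rank."""
    return dict(zip(rank_by_frequency(unknown_text), rank_by_frequency(known_text)))


def decode(text, mapping):
    """Rewrite text using the guessed mapping; unmapped characters pass through unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


# Toy data (an assumption for illustration): "unknown" is "known" with a->x, b->y, c->z.
known = "abba cab abba baa"
unknown = "xyyx zxy xyyx yxx"

mapping = guess_mapping(unknown, known)
print(mapping)
print(decode(unknown, mapping))
```

On real texts a single pass like this would mislabel many symbols, since two related languages rarely share exact frequency ranks. That is presumably why the article stresses repeated analysis of both letters and words rather than a one-shot comparison.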
The program also correctly identified Ugaritic and Hebrew words with shared roots 60 percent of the time. Words share a root when they descend from the same source, such as the French homme and Spanish hombre, both of which come from the Latin word for “man.”
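Once a symbol mapping exists, looking for words with shared roots can be sketched as a lookup problem: rewrite each unknown word in the known alphabet, then search a word list for a close match. The Python example below is an illustration under stated assumptions, not the researchers' method. It substitutes simple string similarity from the standard difflib module for whatever statistical model the team used, and the find_cognates function, the 0.7 cutoff, and the toy word lists are all hypothetical.

```python
from difflib import get_close_matches


def find_cognates(unknown_words, known_lexicon, mapping, cutoff=0.7):
    """Map each unknown word to its closest known word, if any is similar enough.

    mapping is a symbol-to-symbol dictionary; cutoff is the minimum
    similarity ratio (0 to 1) required to accept a candidate.
    """
    pairs = {}
    for word in unknown_words:
        # Rewrite the word in the known alphabet, symbol by symbol.
        transliterated = "".join(mapping.get(ch, ch) for ch in word)
        matches = get_close_matches(transliterated, known_lexicon, n=1, cutoff=cutoff)
        if matches:
            pairs[word] = matches[0]
    return pairs


# Toy data (assumptions for illustration only).
mapping = {"x": "a", "y": "b", "z": "c"}
unknown_words = ["xyyx", "zxyx", "zzz"]
known_lexicon = ["abba", "cabal", "dodo"]

print(find_cognates(unknown_words, known_lexicon, mapping))
```

A word with no sufficiently similar candidate, such as "zzz" here, is simply left unmatched, which mirrors the fact that not every word in one language has a counterpart in the other.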
Led by computer science professor Regina Barzilay, the team may be the first to show that a computational approach to dead scripts can be effective, despite claims that machines lack the necessary intuition.
“Traditionally, decipherment has been viewed as a sort of scholarly detective game, and computers weren’t thought to be of much use,” Barzilay said.
“Our aim is to bring to bear the full power of modern machine learning and statistics to this problem.”
Not Always a “Rosetta Stone”
The next step should be to see whether the program can help crack the handful of ancient scripts that remain largely incomprehensible.
Etruscan, for example, is a script that was used in northern and central Italy around 700 B.C. but was displaced by Latin by about A.D. 100. Few written examples of Etruscan survive, and the language has no known relatives, so it continues to baffle archaeologists.
“In the case [of Ugaritic], you’re dealing with a small and simple writing system, and there are closely related languages,” noted Richard Sproat, an Oregon Health and Science University computational linguist who was not involved in the new work.
“It’s not always going to be the case that there are closely related languages that one can use” for Rosetta Stone-like comparisons.
But study leader Barzilay thinks the decoding program can overcome this hurdle by scanning multiple languages at once and taking contextual information into account—improvements that could uncover unexpected similarities or links to known languages.