Andrew Robinson claimed that decipherment requires a combination of synthesis, logic and intuition that is beyond the reach of artificial intelligence.
Regina Barzilay, an associate professor in MIT’s Computer Science and Artificial Intelligence Lab, Ben Snyder, a grad student in her lab, and the University of Southern California’s Kevin Knight took that claim personally. At the Annual Meeting of the Association for Computational Linguistics in Sweden next month, they will present a paper on a new computer system that, in a matter of hours, deciphered much of the ancient Semitic language Ugaritic. [MIT News Release]
The mathematics is interesting and worthy of pursuit, but if one needs to use a grammar, I can hardly call it a decipherment: “. . . manual morphological segmentation was carried out with the guidance of a standard Ugaritic grammar (Schniedewind and Hunt, 2007).” It is true that, technically, decipherment only involves assigning phonemic and/or lexical values to a set of glyphs; construction of a grammar is part of interpretation. But I never thought one would have the luxury of a grammar to help in decipherment (cart and horse and all that). I’m sure Virolleaud, Bauer and Dhorme could have completed the decipherment of Ugaritic even faster if they had only had a grammar.
Like I said, this is abnormally interesting, but it isn’t what I would call decipherment. At the very best, Snyder, Barzilay and Knight’s method may be a tool that, in some cases, may be helpful. It also doesn’t hurt that they used a corpus with “7,386 unique word types.” The corpus has grown significantly over the years, so that’s a whole lot more than Virolleaud, Bauer and Dhorme had when they deciphered Ugaritic the old-fashioned way.
Here’s what Andrew Robinson wrote in an email (I’m not sure who it was to, but the MIT News Release quotes from it):
“If the authors believe that their approach will eventually lead to the computerised ‘automatic’ decipherment of currently undeciphered scripts,” he writes in an e-mail, “then I am afraid I am not at all persuaded by their paper.” The researchers’ approach, he says, presupposes that the language to be deciphered has an alphabet that can be mapped onto the alphabet of a known language — “which is almost certainly not the case with any of the important remaining undeciphered scripts.”
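To see what Robinson is objecting to, here is a deliberately toy sketch (my own illustration, not the authors’ statistical model) of the alphabet-mapping assumption: treat each unknown glyph as standing for exactly one letter of a known cognate language, then search for the one-to-one mapping that turns the most unknown words into words of the known lexicon. The glyphs, words, and lexicon below are all made up for illustration.

```python
# Toy illustration of one-to-one alphabet mapping -- NOT the Snyder/Barzilay/
# Knight model, just the assumption Robinson criticizes: every unknown glyph
# corresponds to a single letter of a known cognate language.
from itertools import permutations

# Hypothetical data: glyphs '1', '2', '3' encode some permutation of 'a', 'b', 'c'.
unknown_text = ["132", "31", "12"]          # "words" in the unknown script
known_lexicon = {"abc", "acb", "ca", "ab"}  # cognate word list in the known language

glyphs = "123"
letters = "abc"

def score(mapping):
    # Count how many decoded words appear in the known lexicon.
    table = str.maketrans(dict(zip(glyphs, mapping)))
    return sum(word.translate(table) in known_lexicon for word in unknown_text)

# Exhaustive search over all one-to-one mappings (feasible only for toy alphabets).
best = max(permutations(letters), key=score)
table = str.maketrans(dict(zip(glyphs, best)))
decoded = [word.translate(table) for word in unknown_text]
print(decoded)  # every toy word decodes to a lexicon entry
```

If no such one-to-one correspondence exists — as Robinson argues is the case for the important remaining undeciphered scripts — the search space itself is wrong, no matter how clever the statistics over it.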
Read the Snyder, Barzilay and Knight paper, “A Statistical Model for Lost Language Decipherment,” and see what you think.
Assuming a cognate relationship with Biblical Hebrew, it is interesting that Snyder, Barzilay and Knight were able to identify 29 of the 30 Ugaritic letters. That’s quite good. If they had tried Arabic as the best-fit cognate language, would they have gotten 100%? And would they have gotten better than 60% cognate word identification?