July 6, 2009
Epigraphy By The Numbers
This post is really a follow up on yesterday's effort. You might want to take a look at that one to see what this one is all about.
I was able to get hold of Michail Panagopoulos, et al, "Automatic Writer Identification of Ancient Greek Inscriptions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 8, pp. 1404-1414, July 2009 and even read through it.
Since this post is on archaeology and epigraphy, I thought an image from the paper would be in order.

Don't you just love ancient studies? In case this isn't enough for you, the paper has additional details of this nature with some narrative wrapped around them. Scattered among the equations and narrative are a few images of Greek letters with lines and symbols drawn on them.
The goal of the Panagopoulos, et al. study is to associate Greek inscriptions from the same hand without also associating inscriptions from different hands. To oversimplify greatly, the authors combine character recognition and manipulation techniques with statistical noise-cancellation filtering to find an ideal, "Platonic," representation for each of several letters on each inscription under study. They then compare these "Platonic" representations across 24 different inscriptions. The test was blind, and in a couple of cases what they thought were individual inscriptions were actually known fragments of the same original inscription. Their results agreed 100% with those of epigraphers using standard epigraphic methods. I do feel that before we start replacing epigraphers with computers, researchers should run a much larger number of well-controlled tests. The authors think the same thing.
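For the programming-minded, the "average the instances, then compare the averages" idea can be sketched as a toy in Python. This is emphatically not the authors' algorithm. It assumes each letter instance has already been reduced to a list of (x, y) contour points, aligned and resampled to the same length, and the function names and the threshold are hypothetical, invented only for illustration.

```python
import math

def ideal_form(instances):
    """Pointwise mean of several aligned contours.

    Each contour is a list of (x, y) tuples; all contours are assumed
    (hypothetically) to be aligned and of equal length.
    """
    n = len(instances)
    return [
        (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
        for pts in zip(*instances)
    ]

def shape_distance(a, b):
    """Mean Euclidean distance between corresponding points of two contours."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def same_hand(instances_a, instances_b, threshold):
    """Crude decision rule: two inscriptions match if their ideal forms are close.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return shape_distance(ideal_form(instances_a), ideal_form(instances_b)) < threshold
```

Averaging is only the simplest possible stand-in for the paper's noise filtering. The point of the toy is that the comparison happens between the idealized forms, not between individual letters.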
Here are a few observations on the paper, including how it may speak to yesterday's wild speculations.
First, my concern about the number and frequency of repeated individual glyphs was correct. The authors worked with six individual letters, each letter having up to ten instantiations on every inscription or inscription fragment. With fewer letters and instantiations, results become increasingly unreliable from a statistical viewpoint and, I'd guess, from a practical viewpoint also. Unless the method would work at the individual wedge level, which I kind of doubt, applying even a modified approach to Akkadian texts would require fairly large tablets, each with a lot of text. The need to find tablets with several signs that each repeat ten or more times could limit the use of the method, however modified.
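A back-of-the-envelope illustration of why the instance count matters, using the textbook standard error of a mean and not anything taken from the paper: if individual letter measurements scatter with standard deviation sigma, the average of n independent instances scatters with sigma / sqrt(n). The function name and the unit sigma below are assumptions for illustration only.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean of n independent measurements."""
    return sigma / math.sqrt(n)

# Scatter of the averaged letter shape, in units of the single-instance scatter.
for n in (1, 2, 5, 10):
    print(n, round(standard_error(1.0, n), 3))
# prints:
# 1 1.0
# 2 0.707
# 5 0.447
# 10 0.316
```

With ten instances the averaged form is roughly three times steadier than any single letter. With two it has barely improved, which is the situation on a small tablet or a short fragment.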
Second, if there is any hope of this working on Ugaritic or even Akkadian tablets, some algorithm modifications are certainly required. For one thing, researchers would need to introduce some kind of flattening algorithm on the front end to compensate for tablet surface curvature. The algorithm for contour extraction would likely need a major overhaul for two reasons. First, I don't think (but I'm not sure) that the algorithm would work with non-overlapping wedges. What is the contour shape of Ugaritic l, for example? On the one hand, is there a single contour shape for this and several other Ugaritic letters? Perhaps the method would exclude such letters as not diagnostic. On the other hand, how would the algorithm model wedge spacing? Is wedge spacing important? Second, the gradually sloping sides of wedges might make them harder to model than letters carved in stone. Exactly where the wedges begin is often not very clear, and depth of impression may be a bigger issue with stylus in clay than it is with chisel in stone. Perhaps a three-dimensional model is required. But there are a few issues of the same nature in engraved inscriptions that the authors were able to work through, so this last concern may not be a showstopper.
Third, if the method, upon further testing and with proper modifications, could be made to work with tablet fragments or papyrus fragments large enough to have a statistically meaningful number of repeated glyphs, then it does appear that it might help in identifying joins among these larger fragments. Here is why I think this is so:
The epigraphist/archaeologist confirmed that he fully agrees with the classification. He also revealed to the other authors that E15 and E8 were fragments of the same inscription, as well as E12 and E17.[p. 1413, emphasis added]
I do worry that, as currently implemented, there might be a few mechanical hand operations in need of automation before this is ready for primetime. But I didn't see anything that couldn't be automated. A computer does all the really hard parts now. There's no other way to do them.
The authors say this about further research,
Consequently, further research will include application of the method to more inscriptions and writers, development of other methods, and application of the methodology to modern writer identification. In any case, we do hope that application of the method and the developed system will help those who study historical documents since it can contribute to inscription dating and understanding of the evolution of the alphabet symbols, as well as to document authentication.
It looks to me like they need to reset their chronological vector in the opposite direction. Why not start with, oh, say, Late Bronze Age tablets from Ugarit and move forward until they have enough confidence in their system to identify those modern writers? Generating a false positive for a modern forgery is just too risky without several millennia of practice.
And there were those who may have questioned my citing a math textbook among my impactful five.
Update: July 6, 2009
Fixed a minor error
Posted by Duane Smith at July 6, 2009 9:41 AM | Read more on Archaeology |
Comments
And now for a real brain-tripper, try applying this technique to Mayan script.
Posted by: Glen Gordon at July 7, 2009 10:04 AM
I'm sure you intend "Platonic" rather than "plutonic", yes?
Posted by: Kevin P. Edgecomb at July 7, 2009 1:35 PM
Kevin,
Of course, I fixed it. Thanks. I really do need an intention checker.
Posted by: Duane at July 7, 2009 2:15 PM
Though now I'll have to figure out a use for "plutonic". It's a great word.
Posted by: Kevin P. Edgecomb at July 9, 2009 10:38 PM
Hate to take the wind out of their sails but...
Scholars in Medieval studies have been working with computers since the mid-1990's on writer identification programs.
Members of the IGS have been working for years on the computerized handwriting ID problem -- their bi-annual conference is next week... lots of new stuff, too. (Yeah, I'm a member.)
For that matter, I've been using the computer for writer ID since 1991. That was the first thing I thought of when I got that first Logitech handheld scanner. I developed a number of techniques that will distinguish individual hands on an MS. Published on it, too. In fact, I also use computer manipulations on Greek inscriptions.
As you noted, Duane, they create a Platonic form on which to do their manipulations. I am quite familiar with Tracy's work. There are trees in that forest that he misses by using only measurements to re-unite fragments.
Then, there is a huge difference in the level of complexity in working with ligatured handwriting, cursives, and Greek in all caps and in stoichedon writing.
They are going to see if their program can be applied to modern handwritten texts? Maybe they should check in with what has been already done before they find themselves traveling up paths already long discarded as dead ends. Have they ever read any of Srihari's articles?
You know, articles with titles such as: "A system for handwriting matching and recognition," in the Proceedings of the 2003 Symposium on Document Images. There is an entire institute devoted to the handwriting recognition problem. And what the hell do they think the FBI's Forensic Institute has been doing ever since the judge's comments in Daubert vs. Dow. Twiddling their thumbs?
I think what really annoys me is their obvious ignorance of what has already been done. Just look at that title in the AJA: "The Study of Hands on Greek Inscriptions: The Need for a Digital Approach."
Hello? Only now you are finding out that computers are useful for writer ID programs? And that goes for the AJA, too!
Duane, if you are seriously interested in finding a way to work on Ugaritic with the computer, read up on what is already available. Perhaps an already existing program will do it for you. I must warn you that the bib is enormous. I do not know if the problem of a curved surface has been addressed, but crumpled surfaces have been.
Posted by: rochelle at July 12, 2009 7:55 AM
Hi Rochelle,
I thought this might draw a comment from you. Thanks. I suppose I should have been more critical. The completely predictable result of not knowing the literature is that one doesn't know the literature. I always worry about that when I post on something that I know little about. Heck, I even worry when I post on something I know a lot about. I did this post for two reasons. First, even at the time I wrote the post there was a lot of nonsense about the use of the technique on inscriptions with only a very few glyphs. I wanted to point out the statistical basis of the method. This motivation obviously failed because even more nonsense appeared after my post. Second, I take almost every opportunity to make fun of the math phobics among us.
No, I'm not really interested in applying such a method to Ugaritic texts. I just wanted to reflect on a few of the difficulties one might encounter. Now that doesn't mean that I wouldn't like someone else to try it.
Posted by: Duane at July 12, 2009 8:52 AM