Somewhere within the Vatican exists the Vatican Secret Archives, whose 53 miles of shelving contain more than 600 collections of account books, official acts, papal correspondence, and other historical documents. Though its holdings date back to the eighth century, it has in the past few weeks come to worldwide attention. That attention has brought about all manner of jokes about the plot of Dan Brown’s next novel, but also important news about the technology of manuscript digitization. It seems a project to get the contents of the Vatican Secret Archives digitized and online has made great progress in cracking a problem that once seemed impossibly difficult: turning handwriting into computer-searchable text.
In Codice Ratio, the project in question, is “developing a full-fledged system to automatically transcribe the contents of the manuscripts,” one that uses not the standard method of optical character recognition (OCR), which looks for the spaces between words, but a new approach that can handle connected cursive and calligraphic letters. Their method, in the lingo of the field, “is to govern imprecise character segmentation by considering that correct segments are those that give rise to a sequence of characters that more likely compose a Latin word. We have designed a principled solution that relies on convolutional neural networks and statistical language models.”
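To make the idea in that quote a little more concrete, here is a toy Python sketch, not the project's code. It assumes the segmenter has already proposed several ways to cut the same stretch of ink, each with per-letter probabilities from some character classifier (the numbers below are invented stand-ins for what a convolutional network would output), and it swaps in a simple add-one-smoothed character bigram model, trained on a handful of hypothetical Latin words, for whatever statistical language model In Codice Ratio actually uses. The function names are made up for illustration. The reading that wins is the one with the best combined visual and linguistic score.

```python
import math
from collections import Counter

def train_bigram_model(words):
    """Count character bigrams, using ^ and $ to mark word start and end."""
    bigrams, unigrams = Counter(), Counter()
    alphabet = {"^", "$"}
    for word in words:
        chars = ["^"] + list(word) + ["$"]
        alphabet.update(chars)
        for a, b in zip(chars, chars[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams, len(alphabet)

def lm_log_prob(word, model):
    """Add-one-smoothed log probability of a word under the bigram model."""
    bigrams, unigrams, vocab_size = model
    chars = ["^"] + list(word) + ["$"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(chars, chars[1:])
    )

def reading(segmentation):
    """Join the letter hypotheses of one segmentation into a string."""
    return "".join(char for char, _ in segmentation)

def best_segmentation(candidates, model, lm_weight=1.0):
    """Pick the candidate with the highest visual + weighted language score.

    Each candidate is a list of (letter, classifier_probability) pairs
    (hypothetical stand-ins for a real classifier's output).
    """
    def score(segmentation):
        visual = sum(math.log(p) for _, p in segmentation)
        return visual + lm_weight * lm_log_prob(reading(segmentation), model)
    return max(candidates, key=score)

# Hypothetical training words and invented classifier probabilities.
model = train_bigram_model(["anno", "domini", "in", "nomine", "manu"])
candidates = [
    [("a", 0.9), ("n", 0.5), ("n", 0.5), ("o", 0.9)],
    [("a", 0.9), ("i", 0.7), ("u", 0.7), ("i", 0.7), ("o", 0.9)],
]
print(reading(best_segmentation(candidates, model)))                 # anno
print(reading(best_segmentation(candidates, model, lm_weight=0.0)))  # aiuio
```

With `lm_weight=0.0` the sketch trusts the classifier alone and picks the visually tidier but meaningless "aiuio"; with the language model switched on, "anno" wins. Weighing what the letters look like against what a Latin word is likely to be is the gist of the approach the researchers describe.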
This is a job, in other words, for artificial intelligence, but in partnership with human intelligence, drawn from a seldom-tapped source that the scientists behind In Codice Ratio have harnessed: high-school students. Their special OCR software, writes the Atlantic’s Sam Kean, works by “dividing each word into a series of vertical and horizontal bands and looking for local minimums—the thinner portions, where there’s less ink (or really, fewer pixels). The software then carves the letters at these joints.” But the software “needs to know which groups of chunks represent real letters and which are bogus,” and so “the team recruited students at 24 schools in Italy to build the project’s memory banks,” manually separating the letters the system had properly recognized from those over which it had stumbled.
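Kean's description of the carving step resembles a classic technique often called a projection profile. The Python sketch below illustrates it under simple assumptions and is not In Codice Ratio's software: the word image is already binarized into a grid of 0s (paper) and 1s (ink), only the vertical bands (columns) are examined, and every local minimum in the per-column ink count is treated as a cut. The helper names and the toy image are hypothetical.

```python
def column_ink_profile(image):
    """Count ink pixels (1s) in each column of a binarized word image."""
    return [sum(column) for column in zip(*image)]

def cut_points(profile):
    """Return the column at the centre of each local minimum of the profile.

    A flat run of equal values counts as one minimum if both of its
    neighbours carry more ink. The first and last columns are never cuts.
    """
    cuts = []
    n = len(profile)
    x = 1
    while x < n - 1:
        # Extend over a flat run of columns with the same ink count.
        end = x
        while end + 1 < n and profile[end + 1] == profile[x]:
            end += 1
        if end + 1 < n and profile[x - 1] > profile[x] < profile[end + 1]:
            cuts.append((x + end) // 2)
        x = end + 1
    return cuts

def chunks(width, cuts):
    """Turn cut columns into (start, end) column ranges, end exclusive."""
    bounds = [0] + cuts + [width]
    return list(zip(bounds, bounds[1:]))

# A toy 4x9 "word": three blobs joined by thin connecting strokes.
image = [
    [1, 1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 1, 1, 0, 1, 0],
]
profile = column_ink_profile(image)  # [2, 4, 4, 1, 4, 4, 1, 4, 3]
cuts = cut_points(profile)           # [3, 6]
print(chunks(len(profile), cuts))    # [(0, 3), (3, 6), (6, 9)]
```

Real cursive rarely splits this neatly: a letter like m has thin joins inside it, so carving at every minimum tends to over-segment, which is why the software then needs examples of which groups of chunks form real letters.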
And so the students became the system’s “teachers,” improving its ability to extract the content of handwriting, and not just any handwriting but vast quantities of archaic handwriting, with every click they made. The encouraging results so far suggest that it probably won’t be long before large portions of the Vatican Secret Archives (which, contrary to its awkwardly translated name, is so far from secret that it even has its own official website) finally become easy to browse, search, copy, paste, and analyze. In the fullness of time, the archives may prove a fruitful resource indeed for writers of Catholicism-centric thrillers like Brown, who, after all, has already gone public with his enthusiasm for manuscript digitization.
Based in Seoul, Colin Marshall writes and broadcasts on cities and culture. His projects include the book The Stateless City: a Walk through 21st-Century Los Angeles and the video series The City in Cinema. Follow him on Twitter at @colinmarshall or on Facebook.