Germanic Lexicon Project
By Sean Crist
My goal since the late 1990s has been to provide 100% coverage of the lexicons of the early Germanic languages in an integrated electronic format that programmers can readily work with. The three texts I consider the highest priority are:
- Fick, Falk, and Torp's Wortschatz der Germanischen Spracheinheit (1909)
- Bosworth and Toller's An Anglo-Saxon Dictionary (1898) and supplement (1921)
- Cleasby and Vigfusson's An Icelandic-English Dictionary (1874)
This plan gradually developed from the late 1990s through 2001. By summer 2001, work was underway on Fick/Falk/Torp and on Bosworth/Toller.
I've been working on the texts in the order given. I finished the major corrections on Fick/Falk/Torp in 2003 and have now turned my full attention to Bosworth/Toller and Cleasby/Vigfusson.
Mid-1990s: I conceived of a free, comprehensive online database of the lexicons of the early Germanic languages.
August 1998: As an exploratory effort, I scanned the glossary to Wright's Gothic Grammar and performed OCR and correction on it. This was a short text, but many of the semi-automated correction and validation techniques were worked out at this stage.
July 1999: For the second text, I started on the glossary from Bright's Anglo-Saxon Grammar. OCR gave such degraded results that I decided the text needed to be hand-typed. I asked for volunteers on the ANSAXNET mailing list to each type a few pages; I snail-mailed photocopied pages to the volunteers, and they emailed the typed text back to me. The project was a quick success: volunteers covered the whole text in short order. I realized that volunteer effort on the Internet could make a lot of things happen. On the other hand, I also realized that getting volunteers to follow any kind of standard on anything is a real difficulty.
October 1999: I researched data entry options for Bosworth/Toller, as well as data formats such as TEI.
January 2000: I submitted a proposal to the National Science Foundation to fund the correction of Fick/Falk/Torp and of Clark Hall. In May, I got word that the proposal had not been funded, although the comments were positive and seemed to encourage re-applying.
October 2000: I scanned Fick/Falk/Torp.
March 2001: I performed OCR on Fick/Falk/Torp and started the global corrections.
Around this same time, I became aware of Cleasby/Vigfusson. I had originally planned to digitize Zoëga's student dictionary of Old Icelandic, but as it became clear that Cleasby/Vigfusson is the much more comprehensive work, my priority shifted to Cleasby/Vigfusson.
Summer 2001: My colleagues and I obtained a Joel Dean grant to fund the scanning of Bosworth/Toller. Jason Burton finished the scanning and preliminary OCR that summer.
Summer 2001: I started hand corrections of Fick/Falk/Torp.
July 2001: I submitted a second proposal to the National Science Foundation, this time to fund the correction of Fick/Falk/Torp and Bosworth/Toller. In January 2002, I got word that the proposal had not been funded, although the comments were once again positive.
Summer 2002: My colleagues and I obtained a Joel Dean grant to fund a student to write special software to make thousands of automated corrections in the uncorrected text of Bosworth/Toller. B. Dan Fairchild did this work.
January 2003: I submitted a third proposal to the National Science Foundation to fund the correction of Bosworth/Toller and Cleasby/Vigfusson (CHECK). In June, I got word that the project had been turned down for the third time. The comments were once again positive, but at this point I gave up on the NSF.
January 2003: I submitted a proposal to the American Scandinavian Foundation to fund the scanning (but not OCR or correction) of Cleasby/Vigfusson. The funding was approved in March.
Spring 2003: Rebecca Kuipers scanned Cleasby/Vigfusson, post-processed the images (in preparation for their inclusion in the PDF files for the web-based correction system), and corrected the page-header text.
April 2003: I finished hand-correction of Fick/Falk/Torp.
June 2003: I submitted a proposal to the National Endowment for the Humanities to fund the correction of Bosworth/Toller and Cleasby/Vigfusson. In March 2004, I got word that the project had not been funded, although the comments were once again positive, as they had been on all three occasions with the NSF.
Summer 2003: Margaret Hoyt post-processed the scanned images for pages b0400-b1000 of Bosworth/Toller, and I post-processed the rest. This was a necessary part of creating the PDF files for the web-based correction system. Grace Mrowicki and Michael O'Keefe corrected some pages.
Summer 2003: I performed OCR on Cleasby/Vigfusson. Michael O'Keefe did a great many global corrections on the text, and hand-corrected the first dozen pages.
September 2003: I wrote a preliminary DTD for Fick/Falk/Torp and converted the document to a preliminary valid XML form.
October 2003: I wrote a web-based search engine for Fick/Falk/Torp, Cleasby/Vigfusson, and Bosworth/Toller.
August 2004: After working on the programming project on and off for a long time, followed by a summer of more intensive work, I finally released the web-based volunteer correction system to the public.