Germanic Lexicon Project
Filenames

The following is the project-internal standard for filenames. If you want to submit a scanned text to this collection, it would be very helpful if your filenames followed these guidelines.

Anatomy of a filename

Suppose that page xii of a book is scanned into PNG format. This file would be named a0012.png. Here is the system:


The first character is a letter showing which section of the book the page comes from:

Letter a is for Roman-numeraled introduction pages (i, ii, iii, iv...).

Letter b is for the main body of the book (1, 2, 3, 4...).

Letters c and d are for the introduction and main body of volume II, if there is one.

The next four characters are the page number from the book:

Roman numerals are converted to Arabic numerals (so xii would be written 12).

The page number is padded with initial zeroes to make it exactly four digits long.

The extension shows the file format. The project uses .png, .tiff, .html, .pdf, and .txt formats for individual pages.
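
The naming rules above can be sketched in a few lines of Python. This is only an illustration, not a tool the project provides: the function names roman_to_int and page_filename are made up for this example, and the code assumes the page number arrives as a string that is either Arabic digits or a well-formed lowercase or uppercase Roman numeral.

```python
def roman_to_int(numeral: str) -> int:
    """Convert a well-formed Roman numeral such as 'xii' to an integer."""
    values = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}
    total = 0
    largest_seen = 0
    # Read right to left: a symbol smaller than one already seen is subtracted
    # (the i in iv), otherwise it is added.
    for ch in reversed(numeral.lower()):
        value = values[ch]
        if value < largest_seen:
            total -= value
        else:
            total += value
            largest_seen = value
    return total


def page_filename(section: str, page: str, extension: str) -> str:
    """Build a name like a0012.png from a section letter, page, and extension.

    Hypothetical helper: section is one of a, b, c, d as described above.
    """
    number = int(page) if page.isdigit() else roman_to_int(page)
    # :04d pads the number with initial zeroes to exactly four digits.
    return f"{section}{number:04d}.{extension}"


print(page_filename("a", "xii", "png"))   # a0012.png
print(page_filename("b", "1033", "pdf"))  # b1033.pdf
```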



Why pad the page number with zeroes?

The order you expected      What the computer usually       Adding initial zeroes
(numeric order):            gives you instead               forces the correct order:
                            (alphabetical order):

1                           1                               0001
2                           10                              0002
3                           100                             0003
9                           1000                            0009
10                          101                             0010
11                          11                              0011
18                          18                              0018
20                          2                               0020
21                          20                              0021
99                          21                              0099
100                         3                               0100
101                         9                               0101
1000                        99                              1000

The rationale for this whole numbering scheme is this: the computer should list the pages in the correct order, exactly as they appear in the original paper book. This makes it MUCH easier to manage and use the online files.

Many texts could get by with three digits (000-999), but a few require four. It is convenient for automated processing if the number of digits is uniform across texts, so we go ahead and use four digits for all texts.
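
The effect of padding is easy to check for yourself. The short Python sketch below sorts the same page numbers as text, first unpadded and then padded to four digits; the variable names are chosen only for this illustration.

```python
pages = [1, 2, 3, 9, 10, 11, 18, 20, 21, 99, 100, 101, 1000]

# Sorting the numbers as plain text gives the "alphabetical" order.
unpadded = sorted(str(p) for p in pages)
print(unpadded)
# ['1', '10', '100', '1000', '101', '11', '18', '2', '20', '21', '3', '9', '99']

# Padding to four digits first makes text order agree with numeric order.
padded = sorted(str(p).zfill(4) for p in pages)
print(padded)
# ['0001', '0002', '0003', '0009', '0010', '0011', '0018', '0020', '0021',
#  '0099', '0100', '0101', '1000']
```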

Roman numerals are converted to Arabic numerals, once again to force the correct order. Sorting Roman numerals alphabetically would give the wrong order (for example, ix would come before v).
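
As a quick illustration of the problem with Roman numerals, here is what happens when the first ten are sorted as text in Python. The list is just sample data for this sketch.

```python
numerals = ["i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"]

# Text sorting compares letter by letter, so ix lands before v.
as_sorted = sorted(numerals)
print(as_sorted)
# ['i', 'ii', 'iii', 'iv', 'ix', 'v', 'vi', 'vii', 'viii', 'x']
```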



Indicating which text a page belongs to

Some files have two extra letters at the beginning to show which text they belong to. For example, bt_b1033.pdf is page 1033 of Bosworth/Toller, first volume. At this writing, the only codes in use are bt (Bosworth/Toller), cv (Cleasby/Vigfusson), and tp (Fick/Falk/Torp).
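
For automated processing, a filename can be split back into its parts with a regular expression. The sketch below is an assumption-laden illustration rather than project code: the name parse_filename is hypothetical, and the pattern assumes that a text code is always two lowercase letters followed by an underscore, which is inferred from the single example bt_b1033.pdf above.

```python
import re

# Optional two-letter text code and underscore (assumed form), then a section
# letter a-d, a four-digit page number, and one of the project's extensions.
FILENAME_PATTERN = re.compile(
    r"^(?:(?P<text>[a-z]{2})_)?"
    r"(?P<section>[a-d])"
    r"(?P<page>\d{4})"
    r"\.(?P<ext>png|tiff|html|pdf|txt)$"
)


def parse_filename(name: str):
    """Return the parts of a page filename as a dict, or None if it does not fit."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    return {
        "text": match.group("text"),        # None when there is no text code
        "section": match.group("section"),
        "page": int(match.group("page")),   # int() drops the padding zeroes
        "extension": match.group("ext"),
    }


print(parse_filename("bt_b1033.pdf"))
# {'text': 'bt', 'section': 'b', 'page': 1033, 'extension': 'pdf'}
print(parse_filename("a0012.png"))
# {'text': None, 'section': 'a', 'page': 12, 'extension': 'png'}
```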

