John Mamoun
How to create an e-text efficiently or automatically is an interesting
logistical problem. Here is my procedure, which I recently used to
make an e-text in about a week, with maybe 6 man-hours of work on my
part:
I take the book and use an X-Acto blade to cut out all of the pages.
I then feed the pages into an HP 4C scanner with an automatic document
feeder attachment that I got from eBay for $200. I feed it up to 50
pages at a time, and it automatically scans them in.
I work the scanner using software called scan2000, from
www.informatik.com (30-day shareware trial period, $50 to register).
This program automatically works with the scanner to save each image
as a CCITT4 standard format TIFF file. Most importantly, it
automatically numbers each page, starting with an initial value you
specify (typically 001.tif) and increasing the number of the file name
by an increment you specify (typically by 2 pages, since you scan
double-sided pages; you scan the odds first, then flip the pages over
and scan the evens, but you want the page numbers in order, right?). So
the scanner outputs, say, 001.tif, 003.tif, 005.tif, etc., then you
flip the pages over and re-feed them into the scanner; the even pages
are saved as 002.tif, 004.tif, etc., after you tell the program to
begin the first of the even page files with 002.tif.
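
The numbering scheme above can be sketched in a few lines of Python. This
is only an illustration of the start-value-plus-increment idea, not how
scan2000 itself works; the function name page_filenames and its
parameters are made up for the example, and it assumes the second pass
feeds the pages in the same front-to-back order as the first.

```python
def page_filenames(start, step, count, width=3, ext="tif"):
    """Return `count` file names, beginning at `start` and rising by `step`.

    Hypothetical helper: mimics a scanner program that numbers each saved
    image from a chosen start value with a chosen increment.
    """
    return [f"{start + i * step:0{width}d}.{ext}" for i in range(count)]


# First pass: one side of every sheet (the odd-numbered pages).
odds = page_filenames(start=1, step=2, count=3)

# Second pass, after flipping the stack: the even-numbered pages.
# Assumption: the pages go through in the same order as the first pass.
evens = page_filenames(start=2, step=2, count=3)

print(odds)   # ['001.tif', '003.tif', '005.tif']
print(evens)  # ['002.tif', '004.tif', '006.tif']

# Sorted by name, the two passes interleave into book order.
print(sorted(odds + evens))
```

Because the names are zero-padded to the same width, a plain alphabetical
sort puts the files in page order.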
So now I have a bunch of consecutively numbered CCITT4 TIFF files. At
this point, I could use a freeware program called cc42 (search for it
at www.pdfzone.com) to combine all of the sequentially numbered CCITT4
TIFF files into a single PDF file with the pages in order.
Or, if making e-texts, not PDF files, I OCR the pages and save the
results as matching text files like 001.txt, 002.txt, etc. I also use
Paint Shop Pro (shareware 30 day trial) to batch-convert the TIFF files
into GIF file format. I can then upload the GIF files and the correspondingly
numbered text files to the Distributed Proofreaders page
(http://texts01.archive.org/dp/) to have them rapidly proofread by
numerous proofreaders, who finish the task at a rate of 50-100 pages a
day per book, very roughly speaking. When they are done, I download
the proofread pages combined into a single text file. The upload
function on the DP site is tedious, requiring one to upload each file
one-by-one, but I spoke to the webmaster recently, and he said there
are, with special arrangements, ways to FTP them or even mail them to
him on CD.
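
Since every GIF needs a correspondingly numbered text file, it can help
to check the pairs before uploading. The following is a minimal sketch
of such a check, assuming all the files sit in one folder and are named
like 001.gif and 001.txt; the function name unmatched_pages is invented
for this example and has nothing to do with the DP site itself.

```python
from pathlib import Path


def unmatched_pages(folder):
    """Return (images_without_text, texts_without_image) as sorted lists.

    Hypothetical helper: compares the base names of the .gif and .txt
    files in `folder` and reports any page that lacks its partner.
    """
    folder = Path(folder)
    images = {p.stem for p in folder.glob("*.gif")}
    texts = {p.stem for p in folder.glob("*.txt")}
    return sorted(images - texts), sorted(texts - images)


# Example use (the folder name "scans" is a placeholder):
#   missing_text, missing_image = unmatched_pages("scans")
#   print("GIFs with no text file:", missing_text)
#   print("Text files with no GIF:", missing_image)
```

Two empty lists mean every page image has its text file and the batch
is ready to go.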
Now, hard returns. It was once a grave problem to fix hard returns so
that the text wrapped at 65 characters per line. Then I got a
freeware program called Clipcase at www.shareware.com. With Clipcase,
you select a body of text (about 20 pages or so; any more, and the
program crashes) in your word processor, copy the text to the
clipboard, then load up Clipcase, paste the text into the Clipcase
window, then process the text.
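
For anyone who would rather script this kind of hard-return cleanup,
here is a rough sketch in Python. It is not how Clipcase works
internally (I do not know that); it simply assumes paragraphs are
separated by blank lines, joins the lines inside each paragraph, and
then re-wraps everything at 65 characters with the standard textwrap
module. The function names are made up for the example.

```python
import re
import textwrap


def unwrap_paragraphs(text):
    """Return a list of paragraphs with their internal line breaks removed.

    Assumption: a blank line (possibly containing spaces) marks the
    boundary between two paragraphs.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # split() with no arguments collapses every run of whitespace,
    # including hard returns, so each paragraph becomes a single line.
    return [" ".join(p.split()) for p in paragraphs]


def rewrap(text, width=65):
    """Re-wrap each paragraph so that no line is longer than `width`."""
    wrapped = [textwrap.fill(p, width=width) for p in unwrap_paragraphs(text)]
    return "\n\n".join(wrapped)


sample = "This line was\nbroken by the OCR.\n\nA second paragraph\nfollows here."
print(rewrap(sample))
```

Working on a whole file at once this way sidesteps the 20-page limit,
since nothing has to pass through the clipboard.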
When Clipcase processes the text, all of the hard carriage returns within it
are eliminated, EXCEPT for returns between paragraphs. Then, you
select the text, copy it, and paste it into any word processor to
process it. I use Microsoft Word. After pasting all of the text into
it, I select all of the text, choose Courier New font, 10 point size,
and set the margins at 5.5 inches. With this setup, when the text is