Torah puzzle

Running

Programs

Each of my programs, if called with no (or too few) parameters, tells you which parameters it needs and terminates immediately. For example:

>domain
usage: domain infile outfile
>prepare
usage: prepare word infile outfile

Here > is the prompt of the operating system.
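The usage behaviour described above comes down to a simple parameter-count check. The Python sketch below only illustrates it and is not the source of these programs. The table `USAGE` and the function `check_args` are hypothetical names; the parameter names are copied from the usage lines shown.

```python
# Hypothetical table: program name -> names of its required parameters,
# copied from the usage lines shown above.
USAGE = {
    "domain": ["infile", "outfile"],
    "prepare": ["word", "infile", "outfile"],
}


def check_args(prog, params):
    """Return the usage line if `params` (the parameters after the program
    name) are too few for `prog`; return None if there are enough."""
    needed = USAGE[prog]
    if len(params) < len(needed):
        return "usage: " + prog + " " + " ".join(needed)
    return None
```

A program would call `check_args` with `sys.argv[1:]`, print the returned line if it is not None, and stop.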

Steps of WRR94 algorithm

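I divide the algorithm of the computation reported in [WRR94] into 6 steps, one program per step. They gradually transform Data 0 (the source) into Data 6 (the result):

prepare: Data 0 → Data 1
search: Data 1 → Data 2
sort_per: Data 2 → Data 3
domain: Data 3 → Data 4
distance: Data 4 → Data 5 (uses the Data 4 files of both words)
correct: Data 5 → Data 6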

For a detailed description of the steps and the data formats, see my Steps of the computation page.

The corrected distance between two Hebrew words, say, "WHYQDC" (Zedekia) and "HYNTM" (Matanya), can be computed by the following sequence of program calls:

prepare 'CDQYHW' data0 data1a
search 2 data1a data2a
sort_per data2a data3a
domain data3a data4a

prepare 'MTNYH' data0 data1b
search 2 data1b data2b
sort_per data2b data3b
domain data3b data4b

distance data4a data4b data5
correct data5 data6
General forms of the calls (separate lines, not a sequence):
prepare word infile outfile
search mode infile outfile
sort_per infile outfile
domain infile outfile
distance infile1 infile2 outfile
correct infile outfile
Thus, for example, the file data3b is created by the second run of sort_per and used by the second run of domain (and after that it may be deleted).

The source file data0 is the text of Bereishit (Genesis), either in the book format or in the table-of-letters format.

The mode parameter of search sets the range of perturbations; use mode=2.

The word parameter of prepare is a Hebrew word written from left to right. Hebrew letters must be represented in the same way as in data0. The Michigan-Claremont transliteration can cause trouble: some Hebrew letters are represented by the special characters ) ( + $, which your operating system may misinterpret. Usually this is avoided by enclosing the word in apostrophes or in double quotes, 'CDQYHW' or "CDQYHW", depending on the system.
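As a small illustration of these two points, the Python sketch below takes a word as displayed right to left on this page (for example "WHYQDC"), reverses it into the left-to-right form passed to prepare ('CDQYHW'), and quotes it for a POSIX shell with the standard `shlex.quote`. The function name `prepare_arg` is hypothetical. The sketch covers POSIX shells only; the DOS/Windows command line has different quoting rules. On a POSIX shell, apostrophes are the safer choice, because `$` is still expanded inside double quotes (`"$$RHMH"` would have `$$` replaced by the shell's process ID).

```python
import shlex


def prepare_arg(displayed_rtl):
    """Reverse a word displayed right to left (e.g. "WHYQDC") into the
    left-to-right form expected by prepare, then quote it for a POSIX shell.

    shlex.quote returns strings of "safe" characters unchanged and wraps
    anything containing characters such as ) ( $ in apostrophes.
    """
    left_to_right = displayed_rtl[::-1]
    return shlex.quote(left_to_right)
```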

If you run the programs frequently, you will probably find it easier to use script (batch) files, pipes, files on a virtual disk, and so on, depending on your inclinations and those of your computer.
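A script of this kind can be generated instead of typed by hand. The Python sketch below builds the sequence of calls shown earlier for Zedekia and Matanya, with the same intermediate file names, and renders it as lines of a POSIX shell script. The functions `distance_commands` and `as_script` are hypothetical helpers, and the defaults (`data0`, mode 2) come from this page.

```python
import shlex


def distance_commands(word_a, word_b, source="data0", mode=2):
    """Build the program calls that compute the corrected distance between
    two words, following the sequence given on this page.

    Words are given left to right, as prepare expects. Returns a list of
    argument lists, one per program call.
    """
    calls = []
    for tag, word in (("a", word_a), ("b", word_b)):
        calls += [
            ["prepare", word, source, "data1" + tag],
            ["search", str(mode), "data1" + tag, "data2" + tag],
            ["sort_per", "data2" + tag, "data3" + tag],
            ["domain", "data3" + tag, "data4" + tag],
        ]
    calls += [
        ["distance", "data4a", "data4b", "data5"],
        ["correct", "data5", "data6"],
    ]
    return calls


def as_script(calls):
    """Render the calls as lines of a POSIX shell script, quoting where needed."""
    return "\n".join(" ".join(shlex.quote(arg) for arg in call) for call in calls)
```

To run the calls directly instead of writing a script, each argument list can be passed to `subprocess.run(call, check=True)`. Because no shell is involved then, the word needs no quoting at all.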

Text reformatting programs

A number of formats are used when a Hebrew book is written as a file.

Formats may be combined with various Hebrew code tables. For international exchange on the Internet, we represent Hebrew letters by uppercase Latin letters (and a few special characters). However, if your computer supports Hebrew, you will probably prefer the corresponding Hebrew code table for local processing.

I provide the following programs:

Their calls (separate lines, not a sequence):
import infile outfile
squeese infile outfile
init infile outfile
translit from to infile outfile
Before running translit you need two simple one-line files. One of them, michigan.alp, is provided; it contains the Hebrew alphabet in the Michigan-Claremont transliteration:

[ ) B G D H W Z X + Y K L M N S ( P C Q R $ T ]
The other one-line file, hebrew.alp, should contain the Hebrew alphabet as it appears on your computer. Make it yourself. Keep the form of michigan.alp, especially the order of the letters and the square brackets; spaces do not matter. When michigan.alp and hebrew.alp are ready, run the transliteration:

translit michigan.alp hebrew.alp data0 data0.heb
and get data0.heb, the Book in Hebrew. Enjoy!
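The idea behind translit, replacing the i-th letter of one alphabet by the i-th letter of the other, can be sketched in a few lines of Python with `str.maketrans`. This is a sketch under assumptions, not the program itself. The helpers `read_alphabet` and `make_translit` are hypothetical. Each letter is assumed to be a single character. The sketch works on Python strings: a file in a local Hebrew code table would first have to be decoded with the matching codec (for example `cp862` or `iso8859_8`).

```python
def read_alphabet(line):
    """Parse a one-line alphabet such as "[ ) B G ... T ]".

    Square brackets and spaces are dropped; every remaining character is
    taken to be one letter (assumes one character per letter).
    """
    inner = line.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("alphabet must be enclosed in square brackets")
    return [ch for ch in inner[1:-1] if not ch.isspace()]


def make_translit(from_line, to_line):
    """Build a str.translate table mapping letter i of one alphabet to
    letter i of the other; characters outside the alphabet stay unchanged."""
    src, dst = read_alphabet(from_line), read_alphabet(to_line)
    if len(src) != len(dst):
        raise ValueError("alphabets differ in length")
    return str.maketrans("".join(src), "".join(dst))
```

For example, with michigan.alp and a hebrew.alp written in Unicode Hebrew letters, `"BR)$YT".translate(make_translit(mich, heb))` gives the first word of Bereishit in Hebrew letters.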

Another program, reverse, reverses an alphabet. For example, running

reverse michigan.alp michigan.inv
you get a one-line file michigan.inv containing

[ T $ R Q C P ( S N M L K Y + X Z W H D G B ) ]
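What reverse does to a one-line alphabet can be expressed as a short Python function. This is a hypothetical sketch, not the program's source. It assumes the output uses single spaces between letters, as in the example above.

```python
def reverse_alphabet(line):
    """Reverse a one-line alphabet "[ a b ... z ]" into "[ z ... b a ]"."""
    inner = line.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("alphabet must be enclosed in square brackets")
    letters = [ch for ch in inner[1:-1] if not ch.isspace()]
    return "[ " + " ".join(reversed(letters)) + " ]"
```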

Letter frequencies and skip restrictions

Before searching for ELS's matching a given word, the WRR94 algorithm estimates the expected number of matchings (see Step 1-2 for details); this estimate is part of the search program. The following programs do the same more explicitly.

Calculating letter frequencies:

frequen infile outfile
If infile is the text of Bereishit (either in the book format, or as a table of letters), you get the following outfile:
[ $ : 3574 ]
[ ( : 2823 ]
[ ) : 7634 ]
[ + : 308 ]
[ B : 4332 ]
[ C : 1091 ]
[ D : 1848 ]
   . . .
[ X : 1844 ]
[ Y : 9035 ]
[ Z : 428 ]
It means that the letter "Shin" (transliterated as $) appears 3574 times in Bereishit, ..., and the letter "Zain" (transliterated as Z) appears 428 times.
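The counting done by frequen can be sketched in Python with `collections.Counter`. This is an illustration under assumptions, not the program itself. The names `ALPHABET`, `frequencies` and `format_frequencies` are hypothetical. The sketch assumes that only the 22 Michigan-Claremont letters are counted and that everything else in the book format (spaces, line breaks, and so on) is skipped. Ordering the output by character code reproduces the order shown above, where $ ( ) + come before B.

```python
from collections import Counter

# Michigan-Claremont letters, as listed in michigan.alp.
ALPHABET = set(")BGDHWZX+YKLMNS(PCQR$T")


def frequencies(text):
    """Count the letters of the alphabet in text; other characters are
    skipped (an assumption about what frequen ignores)."""
    return Counter(ch for ch in text if ch in ALPHABET)


def format_frequencies(counts):
    """Format counts like the frequen outfile: one "[ letter : count ]" per
    line, ordered by character code, so $ ( ) + come before the letters."""
    return "\n".join(f"[ {ch} : {counts[ch]} ]" for ch in sorted(counts))
```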

Ordering frequencies alphabetically:

freq_alp alphabet infile outfile
Here alphabet is a one-line file similar (or identical) to michigan.alp (see translit in the section "Text reformatting programs" above), and infile is the output of frequen. The outfile looks as follows:
[ )  7634 ]
[ B  4332 ]
   . . .
[ $  3574 ]
[ T  4152 ]
A letter of zero frequency (if any) appears in the outfile, even though it is absent from the infile.
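The reordering done by freq_alp, including the zero-frequency case just mentioned, can be sketched in Python as follows. The helpers `parse_frequencies` and `order_by_alphabet` are hypothetical, and the two-space output layout is copied from the example above.

```python
def parse_frequencies(lines):
    """Parse lines like "[ $ : 3574 ]" into a dict {letter: count};
    lines of any other shape are ignored."""
    counts = {}
    for line in lines:
        parts = line.strip().strip("[]").split()
        if len(parts) == 3 and parts[1] == ":":
            counts[parts[0]] = int(parts[2])
    return counts


def order_by_alphabet(alphabet_line, counts):
    """Return lines "[ letter  count ]" in the order of the alphabet, giving
    0 to letters missing from counts (letters of zero frequency)."""
    letters = [ch for ch in alphabet_line.strip()[1:-1] if not ch.isspace()]
    return [f"[ {ch}  {counts.get(ch, 0)} ]" for ch in letters]
```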

Computing skip restrictions:

restrict frequencies words outfile
Here frequencies is the output of frequen or freq_alp (either fits), and words is a list of words like this:
[ MHRB)YBR ]
[ YB)RH ]
 . . .
[ $$RHMH ]
[ HML$YBR ]
The words may be written from right to left or from left to right; the result is the same, since the computation is symmetric. The outfile looks as follows:
[ MHRB)YBR         11154 unrestricted  0.7 ]
[ YB)RH               22  restricted  10.2 ]
            . . .
[ $$RHMH            1016  restricted  10.0 ]
[ HML$YBR          13012 unrestricted  7.8 ]
The first number is the maximal skip; the second is the expected number of matchings. For more explanation, see my ELS's for Table 2 page. I provide two versions of the restrict program: one fits exactly the corresponding part of the "ELS1" program by Y. Rosenberg; the other is corrected according to my remarks (see Step 1-2). The results of the two versions are close.
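The outfile of restrict suggests the idea behind it. For each word, the maximal skip grows until the expected number of matchings reaches about 10 ("restricted"), or stops at the largest skip the text allows ("unrestricted").

The Python sketch below is a simplified reconstruction under my own assumptions. It is not the author's restrict and not Rosenberg's ELS1. The assumptions are:

- Letters are independent, with probabilities equal to their relative frequencies.
- An ELS of k letters with skip d can start at L - (k-1)d positions in a text of L letters.
- Skips of both signs with 2 ≤ |d| ≤ D are counted.
- The target of 10 is inferred from the values near 10 in the "restricted" lines.

Its numbers need not coincide with the table above. The product of letter probabilities does not depend on letter order, so reversing a word leaves the result unchanged, in line with the symmetry noted above. The function name `restrict` and its parameters are hypothetical.

```python
from math import prod


def restrict(freqs, word, target=10.0, both_signs=True):
    """Sketch: choose the maximal skip D for `word`.

    freqs: {letter: count} for the whole text (e.g. parsed frequen output).
    Returns (D, expected number of matchings, restricted?).
    """
    n = sum(freqs.values())                  # text length L
    k = len(word)
    if k < 2:
        raise ValueError("word must have at least 2 letters")
    # Probability that k given positions spell the word (independence assumed).
    p = prod(freqs.get(ch, 0) / n for ch in word)
    sign_factor = 2 if both_signs else 1     # skips +d and -d
    max_skip = (n - 1) // (k - 1)            # longest skip that fits in the text
    expected = 0.0
    for d in range(2, max_skip + 1):
        # Expected matchings with skip d: start positions times probability.
        expected += sign_factor * (n - (k - 1) * d) * p
        if expected >= target:
            return d, expected, True         # skip restricted to d
    return max_skip, expected, False         # target never reached
```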
