>domain
usage: domain infile outfile
>prepare
usage: prepare word infile outfile
Here > is the prompt of the operating system.
I divide the algorithm of the computation reported in [WRR94] into 6 steps, which gradually transform Data 0 (the source) into Data 6 (the result):
prepare
search
sort_per
domain
distance
correct
The corrected distance between two Hebrew words, say, "WHYQDC" (Zedekia) and "HYNTM" (Matanya), can be computed by the following sequence of program calls:
prepare 'CDQYHW' data0 data1a
search 2 data1a data2a
sort_per data2a data3a
domain data3a data4a
prepare 'MTNYH' data0 data1b
search 2 data1b data2b
sort_per data2b data3b
domain data3b data4b
distance data4a data4b data5
correct data5 data6
General forms of the calls (separate lines, not a sequence):
prepare word infile outfile
search mode infile outfile
sort_per infile outfile
domain infile outfile
distance infile1 infile2 outfile
correct infile outfile
Thus, for example, the file data3b is created by the second run of sort_per and used by the second run of domain (and after that it may be deleted).
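The sequence of calls above can also be generated by a small script. The following Python sketch only builds the ten command lines; it does not run them. The function names pipeline_commands and as_shell_lines are hypothetical helpers introduced here for illustration, and the intermediate file names simply follow the data1a ... data6 pattern used in the example.

```python
import shlex


def pipeline_commands(word_a, word_b, source="data0", mode=2):
    """Return the ten program calls as argument lists; nothing is executed.

    The file names follow the pattern of the example (data1a ... data6).
    This helper is an illustration, not part of the distributed programs.
    """
    commands = []
    for word, tag in ((word_a, "a"), (word_b, "b")):
        commands += [
            ["prepare", word, source, f"data1{tag}"],
            ["search", str(mode), f"data1{tag}", f"data2{tag}"],
            ["sort_per", f"data2{tag}", f"data3{tag}"],
            ["domain", f"data3{tag}", f"data4{tag}"],
        ]
    commands.append(["distance", "data4a", "data4b", "data5"])
    commands.append(["correct", "data5", "data6"])
    return commands


def as_shell_lines(commands):
    """Render each call as one POSIX shell line, quoting where needed."""
    return [shlex.join(argv) for argv in commands]  # shlex.join needs Python 3.8+


if __name__ == "__main__":
    for line in as_shell_lines(pipeline_commands("CDQYHW", "MTNYH")):
        print(line)
```

If the programs are installed and reachable by the operating system, each argument list could be passed to subprocess.run(argv, check=True) in order; that part is left out here because it depends on the local setup.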
The source file data0 is the text of Bereishit
(Genesis) in the book format or the
table of letters format.
The mode parameter of search sets
the range of perturbations; use mode=2.
The word parameter of prepare is a Hebrew word, written from left to right. Hebrew letters must be represented in the same way as in data0. The Michigan-Claremont transliteration can cause trouble: some Hebrew letters are represented by the special characters ) ( + $, which the command interpreter may treat specially. Quote the word as 'CDQYHW' or "CDQYHW", depending on the system.
If you run the programs frequently, you will probably find it easier to use script (batch) files, pipes, files on a virtual disk, and so on, depending on your preferences and your computer.
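When the calls are generated by a script, the word parameter of prepare needs two small preparations: it must be written from left to right (the example above passes CDQYHW for the word quoted in the prose as "WHYQDC"), and characters such as ) ( $ must be protected from the command interpreter. A minimal Python sketch, assuming a POSIX-style shell; the two function names are hypothetical and other systems may need different quoting:

```python
import shlex


def to_left_to_right(word_in_hebrew_order):
    """Reverse a transliterated word so it can be passed to prepare."""
    return word_in_hebrew_order[::-1]


def quoted_for_posix_shell(word):
    """Quote the word only if it contains characters special to a POSIX shell."""
    return shlex.quote(word)


if __name__ == "__main__":
    for word in ("WHYQDC", "HYNTM"):
        print("prepare", quoted_for_posix_shell(to_left_to_right(word)), "data0 data1")
```

shlex.quote leaves a word made only of plain letters unquoted and wraps anything containing ) ( or $ in single quotes.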
Text reformatting programs
A number of formats are used when a Hebrew book is written as a file.
I provide the following programs:
import : from McKay's format to the book format;
squeese : from the book format to a table of numbered letters;
init : from Rosenberg's format to a table of numbered letters;
translit : transliterates a Hebrew text in the book format, or a table of numbered letters.
import infile outfile
squeese infile outfile
init infile outfile
translit from to infile outfile
Before running translit you need two simple one-line files. One of them, michigan.alp, contains the Hebrew alphabet in the Michigan-Claremont transliteration:
[ ) B G D H W Z X + Y K L M N S ( P C Q R $ T ]
The other one-line file, hebrew.alp, should contain the Hebrew alphabet as it appears on your computer. Make it yourself. Keep the form of michigan.alp, especially the order of letters and the square brackets; spaces do not matter. When michigan.alp and hebrew.alp are ready, run the transliteration:
translit michigan.alp hebrew.alp data0 data0.heb
and get data0.heb, the Book in Hebrew. Enjoy!
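The idea behind translit can be illustrated by a short Python sketch. It is not the actual program: it assumes that every letter in an alphabet file is a single character, it builds an example target alphabet from the 22 non-final Hebrew letters of Unicode (an assumption made here, since hebrew.alp depends on your computer), and it copies every character outside the alphabet unchanged.

```python
MICHIGAN = "[ ) B G D H W Z X + Y K L M N S ( P C Q R $ T ]"

# Example target alphabet (assumption): Unicode Hebrew letters without final forms.
FINAL_FORMS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}
HEBREW_UNICODE = "[ " + " ".join(
    chr(code) for code in range(0x05D0, 0x05EB) if code not in FINAL_FORMS
) + " ]"


def parse_alphabet(line):
    """Parse a one-line alphabet '[ ... ]' into a list of letters; spaces are ignored."""
    body = line.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("alphabet must be enclosed in square brackets")
    return [ch for ch in body[1:-1] if not ch.isspace()]


def transliterate(text, from_alphabet, to_alphabet):
    """Replace each letter of from_alphabet by the letter at the same position in to_alphabet."""
    src = parse_alphabet(from_alphabet)
    dst = parse_alphabet(to_alphabet)
    if len(src) != len(dst):
        raise ValueError("alphabets must contain the same number of letters")
    table = dict(zip(src, dst))
    # Characters that are not letters of the source alphabet pass through unchanged.
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("BR)$YT", MICHIGAN, HEBREW_UNICODE))
```

Because the mapping is purely positional, the order of letters in the two alphabet lines is what matters, which is why the form of michigan.alp has to be kept.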
Another program, reverse, reverses an alphabet. For example, doing
reverse michigan.alp michigan.inv
you get a one-line file michigan.inv containing
[ T $ R Q C P ( S N M L K Y + X Z W H D G B ) ]
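The effect of reverse on a one-line alphabet can be reproduced in a few lines of Python. This is only a sketch of the transformation, assuming one character per letter; the function name reverse_alphabet is hypothetical.

```python
def reverse_alphabet(line):
    """Return the one-line alphabet with its letters in reverse order."""
    body = line.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("alphabet must be enclosed in square brackets")
    letters = [ch for ch in body[1:-1] if not ch.isspace()]
    return "[ " + " ".join(reversed(letters)) + " ]"


if __name__ == "__main__":
    print(reverse_alphabet("[ ) B G D H W Z X + Y K L M N S ( P C Q R $ T ]"))
```

Applying the function twice gives back the original alphabet line.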
Before searching for ELS's matching a given word, the WRR94 algorithm estimates the expected number of matchings (see Step 1-2 for details); this estimate is a part of the search program. The following programs do the same more explicitly.
Calculating letter frequencies:
frequen infile outfile
If infile is the text of Bereishit (either in the book format, or as a table of letters), you get the following outfile:
[ $ : 3574 ] [ ( : 2823 ] [ ) : 7634 ] [ + : 308 ] [ B : 4332 ] [ C : 1091 ] [ D : 1848 ] . . . [ X : 1844 ] [ Y : 9035 ] [ Z : 428 ]
It means that the letter "Shin" (transliterated as $) appears 3574 times in Bereishit, ..., and the letter "Zain" (transliterated as Z) appears 428 times.
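Counting letter frequencies in this way can be sketched in Python as follows. The sketch is not the frequen program itself: it assumes that the set of letters is given explicitly (here the Michigan-Claremont letters) so that digits, spaces and other characters of the input are ignored, and it orders the entries by character code, which is consistent with the order of the output shown above.

```python
from collections import Counter

# The 22 letters of the transliteration; everything else in the text is skipped.
MICHIGAN_LETTERS = set(")BGDHWZX+YKLMNS(PCQR$T")


def letter_frequencies(text, letters=MICHIGAN_LETTERS):
    """Count how many times each letter of `letters` occurs in `text`."""
    return Counter(ch for ch in text if ch in letters)


def format_frequencies(counts):
    """Format counts as '[ letter : count ]' entries, ordered by character code."""
    return " ".join(f"[ {ch} : {n} ]" for ch, n in sorted(counts.items()))


if __name__ == "__main__":
    print(format_frequencies(letter_frequencies("BR)$YT BR)")))
```

Letters that never occur are simply absent from the result, as in the output of frequen.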
Ordering frequencies alphabetically:
freq_alp alphabet infile outfile
Here alphabet is a one-line file similar (or identical) to michigan.alp (see translit in the section "Text reformatting programs" above), and infile is the output of frequen. The outfile looks as follows:
[ ) 7634 ] [ B 4332 ] . . . [ $ 3574 ] [ T 4152 ]
A letter of zero frequency (if any) will appear in the outfile, though it is absent in the infile.
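The reordering done by freq_alp, including the zero entries, can be sketched in Python. The sketch assumes that the input consists of entries of the form [ letter : count ] as shown for frequen, with one character per letter; the function names are hypothetical.

```python
import re

# One entry of the frequen output, e.g. "[ $ : 3574 ]".
ENTRY = re.compile(r"\[\s*(\S)\s*:\s*(\d+)\s*\]")


def parse_frequencies(text):
    """Read '[ letter : count ]' entries into a dictionary."""
    return {letter: int(count) for letter, count in ENTRY.findall(text)}


def order_by_alphabet(alphabet_line, frequencies_text):
    """List the counts in the order of the alphabet; missing letters get 0."""
    body = alphabet_line.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("alphabet must be enclosed in square brackets")
    letters = [ch for ch in body[1:-1] if not ch.isspace()]
    counts = parse_frequencies(frequencies_text)
    return " ".join(f"[ {ch} {counts.get(ch, 0)} ]" for ch in letters)


if __name__ == "__main__":
    print(order_by_alphabet("[ ) B G ]", "[ $ : 3574 ] [ ) : 7634 ] [ B : 4332 ]"))
```

A letter of the alphabet that is absent from the input appears with the count 0, and a letter of the input that is absent from the alphabet is dropped.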
Computing skip restrictions:
restrict frequencies words outfile
Here frequencies is the output of frequen or freq_alp (both fit), and words is a list of words like this:
[ MHRB)YBR ] [ YB)RH ] . . . [ $$RHMH ] [ HML$YBR ]
The words may be written from right to left or from left to right; it makes no difference, since the computation is symmetric. The outfile looks as follows:
[ MHRB)YBR 11154 unrestricted 0.7 ]
[ YB)RH 22 restricted 10.2 ]
. . .
[ $$RHMH 1016 restricted 10.0 ]
[ HML$YBR 13012 unrestricted 7.8 ]
The first number is the maximal skip, the second number is the expected number of matchings. For more explanation, see my ELS's for Table 2 page. I provide two versions of the restrict program: one version exactly matches the corresponding part of the "ELS1" program by Y. Rosenberg; the other version is corrected according to my remarks (see Step 1-2). The results are close enough.