>domain
usage: domain infile outfile
>prepare
usage: prepare word infile outfile
Here > is the prompt of the operating system.
I divide the algorithm of the computation reported in [WRR94] into six steps, which gradually transform Data 0 (the source) into Data 6 (the result):
prepare
search
sort_per
domain
distance
correct
The corrected distance between two Hebrew words, say, "WHYQDC" (Zedekia) and "HYNTM" (Matanya), can be computed by the following sequence of program calls:
prepare 'CDQYHW' data0 data1a
search 2 data1a data2a
sort_per data2a data3a
domain data3a data4a
prepare 'MTNYH' data0 data1b
search 2 data1b data2b
sort_per data2b data3b
domain data3b data4b
distance data4a data4b data5
correct data5 data6
General forms of the calls (separate lines, not a sequence):
prepare word infile outfile
search mode infile outfile
sort_per infile outfile
domain infile outfile
distance infile1 infile2 outfile
correct infile outfile
Thus, for example, the file data3b is created by the second run of sort_per and used by the second run of domain (after which it may be deleted).
The source file data0 is the text of Bereishit (Genesis) in the book format or the table of letters format.
The mode parameter of search sets the range of perturbations; use mode=2.
The word parameter of prepare is a Hebrew word, written from left to right. Hebrew letters must be represented in the same way as in data0.
The Michigan-Claremont transliteration can cause trouble: some Hebrew letters are represented by the special characters ) ( + $, so the word may need to be quoted as 'CDQYHW' or "CDQYHW", depending on the system.
If you run the programs frequently, you can probably make it easier by using script (batch) files, pipes, files on a virtual disk, and so on, depending on your preferences and your computer.
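One way to script the sequence of calls is sketched below in Python. It only assembles the ten command lines given earlier for a pair of words and, optionally, runs them in order. The function names and the a/b suffix parameter are assumptions of this sketch, and it presumes the six programs can be found on the PATH.

```python
import subprocess


def branch_commands(word, suffix, source="data0", mode=2):
    """Commands for one word: prepare, search, sort_per, domain."""
    d1, d2, d3, d4 = (f"data{i}{suffix}" for i in range(1, 5))
    return [
        ["prepare", word, source, d1],
        ["search", str(mode), d1, d2],
        ["sort_per", d2, d3],
        ["domain", d3, d4],
    ]


def pipeline_commands(word_a, word_b):
    """All ten calls that turn data0 into data6 for two words."""
    return (
        branch_commands(word_a, "a")
        + branch_commands(word_b, "b")
        + [
            ["distance", "data4a", "data4b", "data5"],
            ["correct", "data5", "data6"],
        ]
    )


def run_pipeline(word_a, word_b):
    """Run the calls in order (assumes the programs are on the PATH)."""
    for argv in pipeline_commands(word_a, word_b):
        # An argument list bypasses the shell, so ) ( + $ need no quoting.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the command lines instead of running them.
    for argv in pipeline_commands("CDQYHW", "MTNYH"):
        print(" ".join(argv))
```

Because the arguments are passed as a list rather than through a shell, the special characters of the transliteration do not have to be quoted here.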
Text reformatting programs
Several formats are used when a Hebrew book is written as a file.
I provide the following programs:
import: from McKay's format to the book format;
squeese: from the book format to a table of numbered letters;
init: from Rosenberg's format to a table of numbered letters;
translit: transliterates a Hebrew text in the book format, or a table of numbered letters.
import infile outfile
squeese infile outfile
init infile outfile
translit from to infile outfile
Before running translit you need two simple one-line files. One of them, michigan.alp (available for download), contains the Hebrew alphabet in the Michigan-Claremont transliteration:
[ ) B G D H W Z X + Y K L M N S ( P C Q R $ T ]
The other one-line file, hebrew.alp, should contain the Hebrew alphabet as it appears on your computer. Make it yourself. Keep the form of michigan.alp, especially the order of letters and the square brackets; spaces do not matter. When michigan.alp and hebrew.alp are ready, run the transliteration:
translit michigan.alp hebrew.alp data0 data0.heb
and get data0.heb, the Book in Hebrew. Enjoy!
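The position-by-position mapping between two alphabet files can be illustrated with a short Python sketch. It is not the translit program itself: the function names are hypothetical, each letter is assumed to be a single character, and the Hebrew alphabet is assumed to be the 22 Unicode Hebrew letters without final forms.

```python
def read_alphabet(line):
    """Parse a one-line alphabet such as '[ ) B G ... T ]' into a list."""
    body = line.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("alphabet must be enclosed in square brackets")
    # Spaces do not matter, so drop all whitespace between the brackets.
    return [ch for ch in body[1:-1] if not ch.isspace()]


def transliterate(text, source_line, target_line):
    """Replace each source letter by the target letter at the same position."""
    source = read_alphabet(source_line)
    target = read_alphabet(target_line)
    if len(source) != len(target):
        raise ValueError("alphabets must have the same number of letters")
    table = str.maketrans(dict(zip(source, target)))
    # Characters outside the source alphabet (spaces, digits) are unchanged.
    return text.translate(table)


MICHIGAN = "[ ) B G D H W Z X + Y K L M N S ( P C Q R $ T ]"

# Assumed content of hebrew.alp: Unicode alef..tav, final forms excluded.
FINAL_FORMS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}
HEBREW = "[ " + " ".join(
    chr(c) for c in range(0x05D0, 0x05EB) if c not in FINAL_FORMS
) + " ]"

if __name__ == "__main__":
    print(transliterate("CDQYHW", MICHIGAN, HEBREW))
```

The order of letters in the two alphabets is what defines the mapping, which is why the form of michigan.alp has to be kept.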
Another program, reverse, reverses an alphabet. For example, running
reverse michigan.alp michigan.inv
you get a one-line file michigan.inv containing
[ T $ R Q C P ( S N M L K Y + X Z W H D G B ) ]
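The effect of reverse on a one-line alphabet can be mimicked in a few lines of Python. This is only an illustration with a hypothetical function name, assuming single-character letters inside square brackets.

```python
def reverse_alphabet(line):
    """Reverse the letters of a one-line alphabet, keeping the brackets."""
    body = line.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("alphabet must be enclosed in square brackets")
    # Whitespace is insignificant, so collect the letters one by one.
    letters = [ch for ch in body[1:-1] if not ch.isspace()]
    return "[ " + " ".join(reversed(letters)) + " ]"


if __name__ == "__main__":
    print(reverse_alphabet("[ ) B G D H W Z X + Y K L M N S ( P C Q R $ T ]"))
```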
Before searching for ELSs matching a given word, the WRR94 algorithm estimates the expected number of matchings (see Step 1-2 for details); this estimate is part of the search program. The following programs do the same more explicitly.
Calculating letter frequencies:
frequen infile outfile
If infile is the text of Bereishit (either in the book format, or as a table of letters), you get the following outfile:
[ $ : 3574 ]
[ ( : 2823 ]
[ ) : 7634 ]
[ + : 308 ]
[ B : 4332 ]
[ C : 1091 ]
[ D : 1848 ]
. . .
[ X : 1844 ]
[ Y : 9035 ]
[ Z : 428 ]
It means that the letter "Shin" (transliterated as $) appears 3574 times in Bereishit, ..., and the letter "Zain" (transliterated as Z) appears 428 times.
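The counting itself is easy to sketch in Python. The sketch below is not the frequen program: the function names are hypothetical, only the 22 Michigan-Claremont letters are counted (so digits and spaces of a table of letters are ignored), and the entries are sorted by character code, which is the order the sample output above appears to follow.

```python
from collections import Counter

# The 22 letters of the Michigan-Claremont transliteration.
MICHIGAN_LETTERS = set(")BGDHWZX+YKLMNS(PCQR$T")


def letter_frequencies(text, alphabet=MICHIGAN_LETTERS):
    """Count the alphabet letters in text; return (letter, count) pairs."""
    counts = Counter(ch for ch in text if ch in alphabet)
    # Sorting the pairs orders them by character code: $ ( ) + B C D ...
    return sorted(counts.items())


def format_frequencies(pairs):
    """Render the pairs as '[ letter : count ]', one entry per line."""
    return "\n".join(f"[ {letter} : {count} ]" for letter, count in pairs)


if __name__ == "__main__":
    sample = "BR)$YT BR) )LHYM"
    print(format_frequencies(letter_frequencies(sample)))
```

Letters that never occur are simply missing from the result, as in the real output file.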
Ordering frequencies alphabetically:
freq_alp alphabet infile outfile
Here alphabet is a one-line file similar (or identical) to michigan.alp (see translit in the section "Text reformatting programs" above), and infile is the output of frequen. The outfile looks as follows:
[ ) 7634 ]
[ B 4332 ]
. . .
[ $ 3574 ]
[ T 4152 ]
A letter of zero frequency (if any) will appear in the outfile, though it is absent in the infile.
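The reordering, including the zero entries for letters missing from the input, can be sketched in Python as follows. The function names and the regular expression are assumptions of this sketch, based only on the file layouts shown above.

```python
import re

# Matches entries such as "[ $ : 3574 ]" (layout assumed from the sample).
ENTRY = re.compile(r"\[\s*(\S)\s*:\s*(\d+)\s*\]")


def parse_frequencies(text):
    """Read '[ letter : count ]' entries into a dictionary."""
    return {letter: int(count) for letter, count in ENTRY.findall(text)}


def order_by_alphabet(alphabet_line, freq_text):
    """List (letter, count) pairs in alphabet order; missing letters get 0."""
    body = alphabet_line.strip()[1:-1]  # drop the square brackets
    letters = [ch for ch in body if not ch.isspace()]
    counts = parse_frequencies(freq_text)
    return [(letter, counts.get(letter, 0)) for letter in letters]


def format_ordered(pairs):
    """Render the pairs as '[ letter count ]', one entry per line."""
    return "\n".join(f"[ {letter} {count} ]" for letter, count in pairs)


if __name__ == "__main__":
    freq = "[ $ : 3574 ] [ ) : 7634 ] [ B : 4332 ]"
    print(format_ordered(order_by_alphabet("[ ) B G $ ]", freq)))
```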
Computing skip restrictions:
restrict frequencies words outfile
Here frequencies is the output of frequen or freq_alp (both fit), and words is a list of words like this:
[ MHRB)YBR ]
[ YB)RH ]
. . .
[ $$RHMH ]
[ HML$YBR ]
The words may be written from right to left or from left to right; the result is the same, since the computation is symmetric. The outfile looks as follows:
[ MHRB)YBR 11154 unrestricted 0.7 ]
[ YB)RH 22 restricted 10.2 ]
. . .
[ $$RHMH 1016 restricted 10.0 ]
[ HML$YBR 13012 unrestricted 7.8 ]
The first number is the maximal skip, the second number is the expected number of matchings. For more explanation, see my page "ELS's for Table 2". I provide two versions of the restrict program: one matches exactly the corresponding part of the "ELS1" program by Y. Rosenberg; the other is corrected according to my remarks (see Step 1-2). The two versions give close results.
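The idea of a skip restriction can be illustrated with a simplified Python model. The exact formulas are those of Step 1-2 and are not reproduced on this page, so everything below is an assumption: letters are treated as independent with the given frequencies, skips from 1 up to the maximal skip are counted in both directions, and the skip is restricted as soon as the expected number of matchings reaches a target of 10. The sketch is not expected to reproduce the numbers of either version of restrict.

```python
from math import prod


def skip_restriction(word, counts, target=10.0):
    """Return (maximal skip, status, expected matchings) in a simplified model.

    counts maps each letter to its number of occurrences in the text.
    This is an assumed model, not the formula used by restrict.
    """
    total = sum(counts.values())
    k = len(word)
    if k < 2:
        raise ValueError("word must have at least two letters")
    # Probability that k independent random letters spell the word.
    p = prod(counts.get(ch, 0) / total for ch in word)
    # Largest skip for which an ELS of k letters still fits in the text.
    largest = (total - 1) // (k - 1)
    expected = 0.0
    for d in range(1, largest + 1):
        # total - (k-1)*d start positions; factor 2 for both directions.
        expected += 2 * p * (total - (k - 1) * d)
        if expected >= target:
            return d, "restricted", expected
    return largest, "unrestricted", expected


if __name__ == "__main__":
    toy_counts = {"A": 50, "B": 50}
    print(skip_restriction("AB", toy_counts, target=100.0))
```

In this model the result does not change when the word is reversed, since only the letter frequencies and the word length enter the computation.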