Torah puzzle

Step 1-2

Searching for the word

According to the file "Data 1",

[abcdef]
[ .....ffe.f fe.e.dea.. efa..e..f. d..e...de. .....d....   <--      1 ]
[ dedf.f.ded .de.....df .d.ed..... .....de... f.f.fe.d..   <--     51 ]
[ df...e.d.f .f.e.d..de ...b.df.f. d..f.e...d e....df.f.   <--    101 ]
[ .dfb...fd. c.dedf...d edfe.d...c ....f.fd.f ...de....c   <--    151 ]
[ .de....df. d...d..d.. db..dedf.d .e.f...dc. ded.de....   <--    201 ]
[ .dc....... ..d.e.d.f. dc........ ..d.e.d..b .df.dc.e..   <--    251 ]
[ ...dfd...f d.c.dedf.. .dedf.d... dc...de... .cdf..dedf   <--    301 ]
[ cdf..dedfe ..dee...fb ...fc....d ..e.....d. effcd.de..   <--    351 ]
    .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
the first letter "a" of the given word matches the 28th letter of the Book, while the second letter "b" of the word matches the 127th letter of the Book. (Positions are counted from right to left within each line, as the arrows indicate; the number after the arrow is the position of the rightmost letter.) We may form an equidistant letter sequence (ELS): (28, 127, 226, 325, ...) having a constant skip, 99 (28+99=127; 127+99=226; 226+99=325; ...). Look at these positions in "Data 1": (28:"a"; 127:"b"; 226:"d"; 325:"."; ...). The ELS does not spell the word "abcdef", which is clear already at the third term (226:"d"). So we try the next opportunity: "b" occurs at position 197, which gives the skip 197-28=169 and the third position 197+169=366, marked by a dot (which means "irrelevant"), not by "c". We try the next opportunity, and so on...
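
The hand search described above can be sketched in a few lines of Python. This is only an illustration, not the code of the actual program: the function name find_els and its parameters are hypothetical, and instead of walking over the positions of the second letter it simply tries every skip of absolute value 2 or more in both directions, which visits the same candidates.

```python
def find_els(text, word, max_skip=None):
    """Return all (start, skip) pairs, 1-based, where `word` occurs as an ELS.

    Assumption of this sketch: skips d and -d are tried for every d >= 2,
    up to `max_skip` (or up to the largest skip that fits in the text).
    """
    n, length = len(word), len(text)
    if n < 2:
        return []
    limit = (length - 1) // (n - 1)      # largest skip that can still fit
    if max_skip is not None:
        limit = min(limit, max_skip)
    hits = []
    for start in range(1, length + 1):
        if text[start - 1] != word[0]:   # the first letter must match
            continue
        for d in range(2, limit + 1):
            for skip in (d, -d):
                last = start + (n - 1) * skip
                if not 1 <= last <= length:
                    continue             # the sequence would leave the text
                if all(text[start - 1 + k * skip] == word[k] for k in range(n)):
                    hits.append((start, skip))
    # Most "noteworthy" first: smallest absolute skip.
    return sorted(hits, key=lambda h: (abs(h[1]), h[0]))


toy = "xaxxbxxcxx"                       # "a", "b", "c" at positions 2, 5, 8
print(find_els(toy, "abc"))              # forward ELS with skip 3
print(find_els(toy, "cba"))              # the same letters read backwards
```

On a text of 78064 letters this plain loop is slow in Python; it is meant to show the logic, not to replace the program.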

A computer does the long unskilled labour for us and reveals the desired ELS: start=28052, skip=7752.
28052:a; 35804:b; 43556:c; 51308:d; 59060:e; 66812:f.
The corresponding lines of "Data 1" follow:

[ ....fb.df. .dff.cdffd ......e... ..dfd.c... ......c.a.   <--  28051 ]
[ d..f..f... df.....d.f ....fd.... ......fd.. .....fb.c.   <--  35801 ]
[ f.f...f.f. ...af.dedf b..b...d.e a..df.c.d. .d..cef...   <--  43551 ]
[ ....d..d.. c.dbf..... .cd...c.d. ...de..f.. ..dff.....   <--  51301 ]
[ df.e.e.d.. .d.f...... d.a....... .e..efec.. ef....f...   <--  59051 ]
[ ...d...e.d .a..f.b... e.ffb..f.f .d.df...f. d....dfc..   <--  66801 ]

To make it spectacular, imagine that the text is divided into rows of 7752 letters each (rather than 50 letters each):

                                                  |
--....fb.df..dff.cdffd......e.....dfd.c.........c.a.-----------
----d..f..f...df.....d.f....fd..........fd.......fb.c.---------
------f.f...f.f....af.dedfb..b...d.ea..df.c.d..d..cef...-------
--------....d..d..c.dbf......cd...c.d....de..f....dff.....-----
----------df.e.e.d...d.f......d.a........e..efec..ef....f...---
------------...d...e.d.a..f.b...e.ffb..f.f.d.df...f.d....dfc..-
                                                  |

The ELS "abcdef" becomes visible as a column! Here is the corresponding part of "Bereishit":

                                                  |
--)BL)WDXYWKLYWWMQYWWYR(NL)MHRB)B$YWYLQBT(M$R$)BQ(CR-----------
----Y)RW)RWRM)YWMKT)MYNWXL$TWYT)MT)N$MT)WYL)MT)B(WDMQX---------
------WTWXP$WTWBRN)CWLYHYWD)MD)M$Y)HCRPYWBQ(YLMYR$QHWNBL-------
--------L)R$YM)YKBQ(YDW(KM$)RQY)LBQ(YKM$MYHL)WLRM)YWWT)KRB-----
----------YWRHSHTYBBMYRWS)R$)MYRCMKLMLR$)HP)HWHQ$MHWMLXNWRTP---
------------T)MYLK)HMYRCMLWMDBLMHLWWDBLWLWMY$YWMXLWMY$RM)YWQP)-
                                                  |

The vertical word "WHYQDC" (Zedekia) is clearly visible.
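
Why the ELS turns into a column can be checked with a small sketch. If the text is cut into rows of width equal to the skip, every ELS position falls into the same column. The helper names column_of and els_window are hypothetical, and the sketch prints left to right, whereas the tables on this page run right to left.

```python
def column_of(pos, width):
    """0-based column of the 1-based position `pos` in rows of `width` letters."""
    return (pos - 1) % width


def els_window(text, start, skip, n, context):
    """One row per ELS letter: `context` letters on each side of the letter.

    Positions outside the text are shown as "-".
    """
    rows = []
    for k in range(n):
        pos = start + k * skip
        row = ""
        for p in range(pos - context, pos + context + 1):
            row += text[p - 1] if 1 <= p <= len(text) else "-"
        rows.append(row)
    return rows


# All six positions of the ELS (start=28052, skip=7752) share one column.
print({column_of(28052 + k * 7752, 7752) for k in range(6)})

# Toy text: the middle column of the window spells the word.
for row in els_window("xaxxbxxcxx", start=2, skip=3, n=3, context=1):
    print(row)
```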

The above ELS (start=28052, skip=7752) is not the only ELS spelling out the given word "abcdef" (that is, "WHYQDC"); however, it is the most "noteworthy" in the sense that it has the minimal skip. There is exactly one more ELS: (start=65634, skip=-12308); it is less noteworthy, since the absolute value 12308 of its skip is greater than 7752. And what about its minus sign? It means the opposite direction, from the end of the Book toward the beginning: (65634:"a"; 53326:"b"; 41018:"c"; 28710:"d"; 16402:"e"; 4094:"f").

[ d.f...f.dd .ea.....c. ffbd...d.. e..f..f.f. ..b.f...b.   <--   4051 ]
[ .a...fe.b. ...fe..... .f.b.....a df....aa.. ..ded...e.   <--  16401 ]
[ e...d..... ..feb.eac. ...f....e. ...e...... d...df..a.   <--  28701 ]
[ .........e .d..ed...e .d.ae...f. .fc....e.. d........d   <--  41001 ]
[ d...f..a.. d...e..f.. ....b....d e...a....e ...fee..fe   <--  53301 ]
[ ..f......b ...fd.a... e....d..fb d.fef..d.. .a..e.....   <--  65601 ]

The situation changes dramatically when we turn to another word, "HYNTM" (Matanya). Its most noteworthy ELS is (start=24185, skip=-2).

                  | | |  | |
[ ..a....b.e .d..ea.b.c ddced..b.. .eee.d..a. cd..b.....   <--  24151 ]
Its small skip contrasts with the minimal skip (7752) for the word "WHYQDC" (Zedekia). The next ELS's for "HYNTM" (Matanya) are: (start=75436, skip=3), (start=12198, skip=5), and others.
    |  |  |   |  |
[ a.e..d..c. .b..a...ae db..a.a.d. .ad..d...c ..a.e..d..   <--  75401 ]

    |
[ ..a...c... .dba.eb..d .a.eb..d.. ..eb..dcd. eb..d...eb   <--  12151 ]
[ .ea..e...e ....e...e. ..c.d.adc. c.e....ded .dc.c.eb..   <--  12201 ]
                                     |    |     |    |

There are two simple reasons for the contrast. First, the word "HYNTM" is shorter than "WHYQDC". Second, the letters "C", "D", "Q" are rather rare. The letter "C" (tsade) matches only 1091 letters of the Book out of 78064, that is, 1.4%. Here are the frequencies of all letters in "Bereishit":

  K     Y     +     X     Z     W     H     D     G     B     )
2774  9035   308  1844   428  8448  6283  1848   577  4332  7634
3.6% 11.6%  0.4%  2.4%  0.5% 10.8%  8.1%  2.4%  0.7%  5.5%  9.8%

  T     $     R     Q     C     P     (     S     N     M     L 
4152  3574  4793  1301  1091  1203  2823   446  3785  6110  5275
5.3%  4.6%  6.1%  1.7%  1.4%  1.5%  3.6%  0.6%  4.8%  7.8%  6.8%

If we choose one letter of the Book at random, it is "M" with probability 0.078. If we choose 5 letters at random, independently, they spell "HYNTM" with probability p = 0.078 * 0.053 * 0.048 * 0.116 * 0.081 = 0.0000019 = 1.9 * 10^(-6), which is about 1 / 530,000. That is, choosing 5,300,000 random five-letter samples, we may expect to get "HYNTM" about 10 times. For "WHYQDC" the probability is much smaller: p = 0.014 * 0.024 * 0.017 * 0.116 * 0.081 * 0.108 = 0.0000000057 = 5.7 * 10^(-9), which is about 1 / 180,000,000.
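
The probabilities can be recomputed from the letter counts in the table. The sketch below is an illustration only; the names COUNTS and word_probability are hypothetical. It uses the exact counts rather than the rounded percentages, so its results differ slightly in the last digit from the products written out above, but they should land close to the quoted 1 / 530,000 and 1 / 180,000,000.

```python
# Letter counts of "Bereishit", copied from the table above.
COUNTS = {
    ")": 7634, "B": 4332, "G": 577, "D": 1848, "H": 6283, "W": 8448,
    "Z": 428, "X": 1844, "+": 308, "Y": 9035, "K": 2774,
    "L": 5275, "M": 6110, "N": 3785, "S": 446, "(": 2823, "P": 1203,
    "C": 1091, "Q": 1301, "R": 4793, "$": 3574, "T": 4152,
}
TOTAL = sum(COUNTS.values())             # number of letters in the Book


def word_probability(word, counts=COUNTS):
    """Probability that len(word) independent random letters spell `word`."""
    total = sum(counts.values())
    p = 1.0
    for letter in word:
        p *= counts[letter] / total
    return p


for w in ("HYNTM", "WHYQDC"):
    p = word_probability(w)
    print(w, p, "about 1 /", round(1 / p))
```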

Equidistant letter sequences (ELS's) are not random samples, but still, it is instructive to compare the above probabilities with the number of ELS's. Consider all five-letter ELS's of skip 2 (whether or not they match the word). The first ELS is (start=1, skip=2: 1, 3, 5, 7, 9). The last ELS is (start=78056, skip=2: 78056, 78058, 78060, 78062, 78064). There are 78064 - (5-1)*2 = 78056 such ELS's. In general, there are 78064 - (n-1)d n-letter ELS's having skip d.
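
The counting rule is easy to check by brute force on a short text. In this sketch the function names count_els and count_els_brute are hypothetical; the first applies the formula L - (n-1)d, the second simply tries every start position.

```python
def count_els(length, n, d):
    """Number of n-letter ELS's with skip d > 0 in a text of `length` letters."""
    return max(0, length - (n - 1) * d)


def count_els_brute(length, n, d):
    """Same number, found by testing every 1-based start position."""
    return sum(1 for start in range(1, length + 1)
               if start + (n - 1) * d <= length)


print(count_els(78064, 5, 2))            # five-letter ELS's of skip 2
print(count_els(78064, 5, 3))            # five-letter ELS's of skip 3

# The formula and the brute-force count agree on small cases.
for n in range(2, 7):
    for d in range(1, 12):
        assert count_els(40, n, d) == count_els_brute(40, n, d)
```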

Compare the number of five-letter skip-2 ELS's, 78,056, with the probability 1 / 530,000 of matching "HYNTM". No match is the most probable result. Adding skip (-2) we get 2 * 78,056 = 156,112 ELS's; still not enough. We happen to know the truth, though: there is a match, (start=24185, skip=-2). However, the algorithm estimates probable skips before starting the search. Let us follow the logic of the algorithm.

By including larger skips, we increase our chance. Skips 2 and 3 give (78064-4*2) + (78064-4*3) = 78,056 + 78,052 = 156,108 ELS's. In general, the number of ELS's for all skips from 2 to D is a sum of (D-1) terms. The first term is (78064-2(n-1)), the last term is (78064-D(n-1)); the mean skip is d=(2+D)/2 and the mean term is (78064-(n-1)(2+D)/2), thus the sum is equal to (D-1)(78064-(n-1)(2+D)/2). Adding negative skips, we finally get N = 2(D-1)(78064-(n-1)(2+D)/2) ELS's of n letters, having skips (-D)...(-2), 2...D.

The algorithm determines D such that N = 10/p, where p is the probability calculated as above; that is, Np = 10, so about 10 matches are expected. The needed D is found by solving the quadratic equation 2(D-1)(L-(n-1)(2+D)/2) = 10/p, where L=78064. The solution

    D = ( L - (n-1)/2 - sqrt( (L-1.5(n-1))^2 - 10(n-1)/p ) ) / (n-1)

need not be an integer, and we round it to the nearest integer, since the calculation is only a crude approximation anyway: we only hope that the actual number of matches will not be too far from 10. In fact, the program "ELS1" of Yoav Rosenberg computes

    D = 2 + ( L - (n-1)/2 - sqrt( (L-0.5(n-1))^2 - 10(n-1)/p ) ) / (n-1);

note the additional number 2 and the coefficient 0.5 instead of 1.5. Maybe it is a mistake, but a rather harmless one. I keep his formula in my version of the program.

Returning to the word "HYNTM" (Matanya), we substitute n=5 and p=1/530,000, which gives D=36 (while the solution of the quadratic equation gives D=34.98). In fact, the number of ELS's having skips (-36)...(-2), 2...36 is 5,459,160, which is reasonably close to 10 * 530,000 (excluding skips (-36) and 36 we get 5,303,320 ELS's).
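
The figures for "HYNTM" can be reproduced with a short sketch. The function names total_els, d_quadratic and d_els1 are hypothetical; the formulas are the ones quoted in the text (the exact solution of the quadratic equation, and the variant attributed to the program "ELS1"), not code taken from that program.

```python
import math

L = 78064                                # number of letters in the Book


def total_els(D, n, length=L):
    """N = 2(D-1)(L-(n-1)(2+D)/2), kept in integers."""
    return (D - 1) * (2 * length - (n - 1) * (2 + D))


def d_quadratic(n, p, length=L, expected=10):
    """Smaller root of 2(D-1)(L-(n-1)(2+D)/2) = expected/p, or None."""
    m = n - 1
    disc = (length - 1.5 * m) ** 2 - expected * m / p
    if disc < 0:
        return None                      # even all skips give too few ELS's
    return (length - m / 2 - math.sqrt(disc)) / m


def d_els1(n, p, length=L, expected=10):
    """Variant quoted in the text: coefficient 0.5 and an extra 2."""
    m = n - 1
    disc = (length - 0.5 * m) ** 2 - expected * m / p
    if disc < 0:
        return None
    return 2 + (length - m / 2 - math.sqrt(disc)) / m


p = 1 / 530_000                          # probability of "HYNTM"
print(d_quadratic(5, p))                 # close to 34.98
print(round(d_els1(5, p)))               # rounds to 36
print(total_els(36, 5), total_els(35, 5))
```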

The similar calculation for the word "WHYQDC" (Zedekia) does not work: the number under the square root is negative. It means that the expected number of matches is less than 10 even if all skips are allowed. The only restriction on the skip d is L - (n-1)d > 0, thus the maximal skip is the integral part 15,612 of the number L/(n-1) = 78064/5 = 15612.8; however, the algorithm adds 2, obtaining 15,614. The number of ELS's is N = 2(D-1) (L - (n-1)(2+D)/2) = 2*15611*(78064-5*15614/2) = 1,218,563,438 (here D = 15,612); large as it is, it is still less than 10/p = 10 * 180,000,000. The expected number of matches Np = 1,218,563,438 / 180,000,000 = 6.8 is less than 10. (We know that the actual number of matches is only 2.)
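
The same arithmetic for "WHYQDC" can be followed step by step. This is again only a sketch with hypothetical variable names; it takes p = 1 / 180,000,000 from the text and uses D = 15612 in the formula for N, as the arithmetic in the paragraph above does.

```python
L = 78064                                # number of letters in the Book
n = 6                                    # letters in "WHYQDC"
p = 1 / 180_000_000                      # probability quoted in the text
m = n - 1

# The number under the square root is negative, so D cannot be solved for.
disc = (L - 1.5 * m) ** 2 - 10 * m / p
print(disc < 0)

# Largest skip d with L - (n-1)d > 0.
max_skip = L // m
print(max_skip)

# N = 2(D-1)(L-(n-1)(2+D)/2) with D = max_skip, kept in integers.
N = (max_skip - 1) * (2 * L - m * (2 + max_skip))
print(N)

# Expected number of matches; it stays below 10.
print(N * p)
```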

Step 1-2 Program code