All Sound Same

Discussion of challenges you have already solved
Post Reply
gfoot
Posts: 269
Joined: Wed Sep 05, 2007 11:34 pm
Location: Brighton, UK

All Sound Same

Post by gfoot »

I just noticed that MagneticMonopole solved this over a week ago - well done!

Did you do your own analysis, or did you find some software that already does it?
MagneticMonopole
Posts: 26
Joined: Fri Nov 07, 2008 3:19 pm

Post by MagneticMonopole »

Hi gfoot,

I guessed from the challenge name that this was a homophone cipher. I found some way of clustering the homophones, i.e. finding the numbers that denote the same letter.

After that, it was a matter of simply feeding the now "normal" substitutiion cipher into a substitution cipher solver (SCBsolver) that uses frequency tables of 4-letter-groups.

Of course, the results from SCBsolver were not perfect (as my clustering contained several errors), but the text got readable enough (about 3 or 4 letters were wrong) to allow manual deciphering.

BTW: I can no longer access the challenge itself - says I am not qualified.

Best regards,
mm
User avatar
zjorzzzey
Posts: 11
Joined: Fri Oct 30, 2009 7:31 pm
Location: NL

Post by zjorzzzey »

Kinda followed the same strategy..

First i translated the homophonic cipher into a regular substitution cipher (by just using the solver I made for the warmup).

Then I manually solved the regular substitution cipher, which took most of the time btw :wink:
User avatar
Hippo
Posts: 339
Joined: Sat Feb 01, 2014 12:05 am
Location: Praha 5

Post by Hippo »

I have used beam search with rather big window (120000). Actually the main problem was badly trained language model. I have rechecked that window (30000) would be enough.
(BTW: The computation is still in progress, the 2nd half was much faster finished by "hand" ... of course not translation, but just key finishing)

I have computed some kind of optimal order of replacement. I have used language model based on upto 5grams.

According the plain text, I agree the methods used in warmup could be helpfull here.
But generally I can imagine choosing encodding probabilities such that this method would fail.

The method working very efficiently using just bigrams was not helpfull here.

P.S.: I have rerun it with window 3000 and it looks well ... it would probably finish (successfully) faster than remaining runs.

The window 300 run quickly and gave wrong solution but roughly half the words were correct. So manual correction would be possible, and for native speaker trivial.

It would be nice to invent some error recovery for the narrow window in the search.

...
Yes 3000 window made perfect deciphering (clearly separated from the 2nd best "solution" with zj flipped) while 30000 is on 33th letter and 120000 on 30th now.

Running the search twice with 300 window failed on one letter making about 4 wrong words.
(In second try ... I want to run it more times, and it actually made perfect deciphering already on 2nd run).

No recovery helped with window of size 100.

Size 260 recovered on 3rd round. As well as 220. 180 was very far from solution.

(Seems small error recovery is possible, but you cannot hope for recovery what you would not done by hand.)

I have tried window 100000 on the about 1/8 of the cipher ... and it failed.
Now I am experimenting with 1/2 of the cipher. And window 1000 seems would suffice (perfect deciphering in 2 rounds ... 2nd tries improving keys by replacing one letter (instead of setting the letter)).

Something more than 1/4 of the cipher probably fails with window 1000.
... OK I am on 4th round and it still slowly improves and now I see some of the correct words ...
... and I got perfect deciphering on round 8 ;).

And now it worked with first about 140 words (window 1000, round 9)

.... I have not recomputed the "optimal" extension order for the smaller ciphertexts.

Except the model, I have problems with implementation of huge window ... I have chosen TreeSet to represent it and I have recycled the nodes which did not fit to the window ... (I have cut next generation during it's build ... and I have processed elements in natural ordering so a lot of candidates were just evaluated and binned.)

...trying first 124 words and window 2000 ... done in 19 rounds :) ... first words of the solution appeared at the middle of the 13th round

and 120 starting words looks as unbeateble ... 2000, 3000 windows failed 30000 and 120000 in progress, but it does not seem to differ too much.

Looks like my implementation of beam search with recovery was not able to solve 120 first words with any size of window. The problem was in small diversification of keys in the window.
So I have invented trick ... to store only such keys that all already processed keys in hamming distance at least 2 have bigger eval. ... with this diversification window size 3000 was enough to solve the cipher in 5 rounds (just the found key was in hamming distance 1 of perfect deciphering).

So I am going to experiment with even shorter cipher :).

Oh I have not mentioned that I am computing optimal order of replacements again.

88 of 115 words correctly deciphered by 13 rounds with window 3000 (and hamming distance > 2) :) (trivial to finish by hand).

What didn't work for me (may be due to bad bigram statistics) was method from "Efficient Cryptanalysis of Homophonic Substitution Ciphers"... Amrpali Dhavare, Richard M Low, Mark Stamp. (I think I have speeded it a bit, but it quickly finished with no success).

My solution was based on Improved Decipherment of Homophonic Ciphers
Malte Nuhn, Julian Schamper, Hermann Ney (I had to think about good language model, and I have added error recovery to solve shorter ciphers (I did not limit the size of groups the way they did...), but the method was good enough for the challenge).

110 first words correctly decipherd by a lot of rounds (perfect at round 73, easily readable around 47) with window 3000.
... going to experiment with genetics and yet smaller ciphers.

And even 100 words were solvable with genetics in help (except n used in place of j what given better total evaluation).

I got the same kind of solution for 90 words and window size 16000 using 9 rounds.
Windows 3000 and 8000 failed on other hills. I should improve the genetics a bit.

OK I have changed the genetics such that top square root of population has sex with partner in "(0,1) random square" position of population. ... and now window 4000 got to the perfect solution in 7 rounds, while window 2000 seems walks around other hill (even after 16 rounds). ... both on 90 first words.

I have trained English by some part of English Gigaword corpus instead of the 3 "novels" used before and used window 5000 on 80 starting words ... and perfect deciphering appeared in 6th round.

Originally trained English with window 4000 did perfect deciphering at 12th round and with window 12000 on 7th.

On 70 words the window 5000 walked around the hill, so I have added "addaptive window change" as another experiment. ... perfect deciphering on 7th round with window size 8000 (round 6 ended with just jz switched). The experiment starting with window 4000 got wrong hill at round 6 and even 2 size doubling of the window did not forced finding another hill. Let us see what happens ... (starting with bigger window was more successfull) ... (re)starting with bigger window looks better than extending window after climbing wrong hill.

And seems 60 words are unbeatable even with fixed window 100000. ... confirmed ... and it's end of my experiments on Sounds Same ;).
Post Reply