Not Smaller Yet

Discussion of challenges you have already solved
Post Reply
tails
Posts: 191
Joined: Tue Jun 10, 2008 7:51 pm
Location: Tokyo

Not Smaller Yet

Post by tails »

I have a question. How should we use the value 34 on decoding?
User avatar
adum
Posts: 392
Joined: Thu Apr 19, 2007 12:49 pm
Contact:

Post by adum »

i guess it's not totally necessary in BWT but it serves as a hint. i'm not an expert here by any means...
papa
Posts: 13
Joined: Wed Oct 29, 2008 7:51 am
Location: Germany

Post by papa »

I suppose, 34 defines the beginning of the original text. Since the BWT ist rotation invariant (abcde has the same bwt-result as e.g. bcdea or deabc), you cannot otherwise know where the text starts in the backtransformed sequence.
But I thought that this should be the position of the last (or the first) character of the original text in the BWT-Sequence, which is 17 (resp. 20)?
tails
Posts: 191
Joined: Tue Jun 10, 2008 7:51 pm
Location: Tokyo

Post by tails »

Yes, that's exactly the same as my thought. If the value were either 17 or 20, I would not have the question.
User avatar
adum
Posts: 392
Joined: Thu Apr 19, 2007 12:49 pm
Contact:

Post by adum »

was the title of the challenge a hint for you guys? were you already familiar with BWT?
papa
Posts: 13
Joined: Wed Oct 29, 2008 7:51 am
Location: Germany

Post by papa »

BWT is the first step in the compression algorithm of bzip2. It sorts the letters such that those appearing in a similar surrounding come together. E.g. a 'q' is always followed by a 'u', so the q's will appear only within a certain region of the BWT-sequence. Therefore this sequence is easier to compress with Move-To-Front- and Huffman-Coding, which are the next steps in bzip2 and make the sequence shorter actually.
By the way, I did pattern matching on BWT-sequences in my diploma thesis, that's why I found the solution to this challenge at first glance (in contrast to some other crypto challenges).
joki
Posts: 1
Joined: Sun Nov 16, 2008 12:13 pm

Post by joki »

About the meaning of the number: in this case it denotes which character of the original text appears first in the encoded text. In the given text, the rotation starting with " Burr..." is lexically smallest, so the 'e' of "the" in front of it appears first in the transformed output, and this is character #34 of the original text. To the decoder, this means that after the text has been reconstructed by iterating the "sorting permutation" starting with the first character, the correct rotation will be obtained after moving the last 34 characters to the front of the string.

Note that this is quite similar in spirit, yet opposite in effect to the number expected by tails and papa. In that case the decoder would obtain the correct rotation by applying the "sorting permutation" repeatedly, starting with the specified character of the transformed string.

This is the first challenge I have created and I'm sorry if the number caused much confusion... I wasn't sure whether to include it in the puzzle because there is more than one equally useful way to convey the missing information to the decoder and this puzzle is most likely just as solvable without it (modulo rotation).

Cheers and happy puzzling,

Joachim
tails
Posts: 191
Joined: Tue Jun 10, 2008 7:51 pm
Location: Tokyo

Post by tails »

adum wrote:was the title of the challenge a hint for you guys?
Yes, it was a big hint for me. I was like this:
(1) frequency analysis -> hmm, it seems a transposition cipher.
(2) "Not Smaller Yet" -> going to be smaller?
(1) + (2) -> Aha!
adum wrote:were you already familiar with BWT?
No, I didn't even know the word BWT. But I know the very basic things about how bzip2 works. It's called as "block sort" in Japan, so this time I googled it (actually in Japanese) and found the Wikipedia article.
User avatar
m!nus
Posts: 202
Joined: Sat Jul 28, 2007 6:49 pm
Location: Germany

Post by m!nus »

I recognized it even without the challenge title, but the title made it even more clear.
I know BWT because I once tried to implement DEFLATE in PHP (didn't make it but that's why i know about it). For the challenge I did BWT in PHP, not really hard, the number was very confusing though.
Quite nice challenge, maybe I should take a (closer) look at MTF und Huffman again aswell.
man
Posts: 5
Joined: Fri Mar 20, 2009 1:50 pm

Post by man »

'Not smaller yet' immediately reminded me of BWT.
I know that because I read some articles about compression some time ago.

It seems that you can only solve this one if you already knew something about bz2.
Very easy for us, but very hard for the others.
rmplpmpl
Posts: 113
Joined: Sun Oct 26, 2008 10:38 am
Location: Germany

Post by rmplpmpl »

I didn't know about this before, though the challenge's name was a hint that I would be looking for a algorithm which is performed before compressing data.
Nevertheless, it took me ages to learn about BWT (though I used to know Huffman encoding)
AMindForeverVoyaging
Forum Admin
Posts: 496
Joined: Sat May 28, 2011 9:14 am
Location: Germany

Post by AMindForeverVoyaging »

To be honest, I found the (34) rather confusing than helping. Also, it took me ages to find out which sort of algorithm is to be applied here.
Post Reply