
155/165; 157/167

Posted: Sun Nov 15, 2015 1:24 am
by Hippo
I am still very far from getting good recognition ...

I cannot distinguish these two pairs even myself ...

I must say I was very surprised that there are differences in how the numbers are pronounced.

So having read RFC 2083, RFC 1950 and RFC 1951, the filter residue is a good starting point ... I have implemented my own inflater to improve error recovery ...
I have already decoded the Huffman tree that is used for constructing the main Huffman trees,
but I still have not constructed the first Huffman tree itself ... (still with manual voice recognition, with the file split by silence into smaller files).
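
For reference, the core of that step is the canonical code assignment from RFC 1951, section 3.2.2 (the code-length code has to be decoded before the literal/length and distance trees can be built). A minimal sketch in Java, not the actual code from the post:

Code: Select all

/** Canonical Huffman code assignment as in RFC 1951, section 3.2.2. */
class CanonicalHuffman {

    /** codeLengths[sym] = bit length of symbol sym (0 = unused).
     *  Returns codes[sym] = canonical code, to be read MSB first. */
    static int[] assignCodes(int[] codeLengths) {
        int maxBits = 0;
        for (int len : codeLengths) maxBits = Math.max(maxBits, len);

        // Count how many symbols use each code length.
        int[] blCount = new int[maxBits + 1];
        for (int len : codeLengths) if (len > 0) blCount[len]++;

        // Smallest code for each length (the RFC 1951 algorithm).
        int[] nextCode = new int[maxBits + 1];
        int code = 0;
        for (int bits = 1; bits <= maxBits; bits++) {
            code = (code + blCount[bits - 1]) << 1;
            nextCode[bits] = code;
        }

        // Symbols of the same length get consecutive codes, in symbol order.
        int[] codes = new int[codeLengths.length];
        for (int sym = 0; sym < codeLengths.length; sym++)
            if (codeLengths[sym] > 0) codes[sym] = nextCode[codeLengths[sym]]++;
        return codes;
    }
}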

OK, thinking about the surprising values in the decoded lengths and comparing with Drifter's, I arrived at the following header:
Drifter wrote: Edited:

Code: Select all

89504E470D0A1A0A0000000D4948445200000078
0000005A0802000000FC6205B800000001735247
4200AECE1CE9000000097048597300000B130000
0B1301009A9C180000000774494D4507D8071403
171AA545EB8F0000001974455874436F6D6D656E
74004372656174656420776974682047494D5057
810E17000020004944415478DA7CBC69AC6DD971
1E56C35A6BEF7D867BEE7CEF9BA7EED7CD26BB39
8B4D89438BA6461AB063119600590A2843919040
91292B13E0244E2028B163393F12C1B013409194
C0F290C888048B14253111459AA2D8249B3DF075
F77BDD6F7E773CF78C7B586B55557EDC27010992
OK, now it generates the Huffman tree using all the prefixes.
(With low absolute values having shorter encodings than bigger ones, and end-of-block having the maximal length ... so at least it looks reasonable.)

And now to automate the recognition process ...

Posted: Tue May 31, 2016 10:53 am
by Hippo
I have implemented metric trees to find the nearest stored point ... unfortunately it's not easy to find a good metric (when you try to allow time shifts, you usually end up with something that is not a metric ... so the trees can misclassify even the elements contained in them).
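
For anyone curious what such a metric tree can look like, here is a minimal vantage-point tree sketch in Java (the names and structure are my own guesses, not the code from the post). Note how the pruning step relies on the triangle inequality: with a shift-tolerant pseudo-distance that is not a true metric, the search may legitimately skip the branch containing the real nearest neighbour, which is exactly the misclassification effect described above.

Code: Select all

import java.util.*;
import java.util.function.BiFunction;

/** Minimal vantage-point tree (sketch, assumed names). */
class VpTree<T> {
    private final BiFunction<T, T, Double> dist;
    private Node root;
    private T best;
    private double tau;

    private class Node {
        T vantage;
        double mu;          // median distance from the vantage point
        Node inside, outside;
    }

    VpTree(List<T> points, BiFunction<T, T, Double> dist) {
        this.dist = dist;
        this.root = build(new ArrayList<>(points));
    }

    private Node build(List<T> pts) {
        if (pts.isEmpty()) return null;
        Node n = new Node();
        n.vantage = pts.remove(pts.size() - 1);          // pick a vantage point
        if (pts.isEmpty()) return n;
        double[] d = new double[pts.size()];
        for (int i = 0; i < pts.size(); i++) d[i] = dist.apply(n.vantage, pts.get(i));
        double[] sorted = d.clone();
        Arrays.sort(sorted);
        n.mu = sorted[sorted.length / 2];                // split radius = median distance
        List<T> in = new ArrayList<>(), out = new ArrayList<>();
        for (int i = 0; i < pts.size(); i++) (d[i] <= n.mu ? in : out).add(pts.get(i));
        n.inside = build(in);
        n.outside = build(out);
        return n;
    }

    /** Returns the stored point closest to q (correct only for true metrics). */
    T nearest(T q) {
        best = null;
        tau = Double.POSITIVE_INFINITY;
        search(root, q);
        return best;
    }

    private void search(Node n, T q) {
        if (n == null) return;
        double d = dist.apply(q, n.vantage);
        if (d < tau) { tau = d; best = n.vantage; }
        Node first = (d <= n.mu) ? n.inside : n.outside;
        Node second = (d <= n.mu) ? n.outside : n.inside;
        search(first, q);
        if (Math.abs(d - n.mu) < tau) search(second, q); // pruning assumes the triangle inequality
    }
}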

Actually I use some FFTs (overlapping windows) as coordinates (I take differences of the absolute values on the separate coordinates). ... In fact I use a metric tree for prefixes, a metric tree for suffixes, and a bunch of metric trees for the "middle of the chunk", corresponding to the classified ends. ... It seems this improved the classification with a small training set.
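
A rough sketch of this kind of feature extraction and distance, assuming a naive DFT over overlapping windows (the window size, hop and the lack of a window function are my simplifications, not the actual parameters):

Code: Select all

import java.util.ArrayList;
import java.util.List;

/** Overlapping-window magnitude spectra plus a coordinate-wise
 *  difference of absolute values (sketch). */
class SpectralFeatures {

    /** Magnitude spectrum of each overlapping window (naive DFT, real input). */
    static List<double[]> features(double[] samples, int window, int hop) {
        List<double[]> out = new ArrayList<>();
        for (int start = 0; start + window <= samples.length; start += hop) {
            double[] mag = new double[window / 2];
            for (int k = 0; k < mag.length; k++) {
                double re = 0, im = 0;
                for (int n = 0; n < window; n++) {
                    double angle = -2 * Math.PI * k * n / window;
                    re += samples[start + n] * Math.cos(angle);
                    im += samples[start + n] * Math.sin(angle);
                }
                mag[k] = Math.hypot(re, im);             // absolute value of bin k
            }
            out.add(mag);
        }
        return out;
    }

    /** Sum of |a - b| over the shared coordinates of two chunks. */
    static double distance(List<double[]> a, List<double[]> b) {
        int windows = Math.min(a.size(), b.size());
        double sum = 0;
        for (int w = 0; w < windows; w++)
            for (int k = 0; k < a.get(w).length; k++)
                sum += Math.abs(a.get(w)[k] - b.get(w)[k]);
        return sum;
    }
}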

I am slowly increasing the number of classified chunks. Some of them are in the "known" subdirectory, the others in the "sorted" and "sorted_else" directories. Each chunk has its number encoded in its name; those classified in agreement with their names go to sorted, the others to sorted_else. Both sorted and sorted_else have subdirectories 000, 001, ..., 255.

So I choose, say, 4096 unclassified chunks, run the classifier trained on the "known" subdirectory, and the chunks get distributed into the sorted_else subdirectories. Then I run "play all wavs in directory", and when everything sounds right, I rename all the chunks to incorporate the directory number in the name. Otherwise I move and rename the wrongly classified sounds manually until everything in the directory sounds right ...
Then I choose representatives of the wrongly classified chunks and (after careful listening) add them to "known". Next I take the whole sorted_else subtree and reclassify it. Now the well-classified chunks go to sorted, and I have to choose new candidates to move to "known" ... the loop continues as long as chunks keep being put into sorted_else.
When there is no file left in sorted_else, I reclassify everything in sorted to see whether some of it goes to sorted_else again ... introducing new "knowns".
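
A small sketch of the sorting part of this loop, assuming the true byte can be parsed from the chunk's file name; the classifier and the name parser are placeholders here, not the actual ones:

Code: Select all

import java.io.IOException;
import java.nio.file.*;
import java.util.function.Function;

/** Distributes chunks into sorted/NNN or sorted_else/NNN depending on whether
 *  the classifier agrees with the byte encoded in the file name (sketch). */
class ChunkSorter {

    static void sortChunks(Path unclassified, Path sorted, Path sortedElse,
                           Function<Path, Integer> classify,      // placeholder classifier
                           Function<Path, Integer> byteFromName)  // placeholder name parser
            throws IOException {
        try (DirectoryStream<Path> chunks = Files.newDirectoryStream(unclassified, "*.wav")) {
            for (Path chunk : chunks) {
                int predicted = classify.apply(chunk);
                int expected = byteFromName.apply(chunk);
                Path base = (predicted == expected) ? sorted : sortedElse;
                Path dir = base.resolve(String.format("%03d", predicted));
                Files.createDirectories(dir);
                Files.move(chunk, dir.resolve(chunk.getFileName()));
            }
        }
    }
}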

Often I find that my classification by listening was wrong and I should rename the chunk.
(The sound quality is poor, and for a non-native speaker it's really difficult to classify some sounds ... for example a sound where I hear just "D2" after a long enough silence.)

I am afraid that even after I finish the chunk classification, the error correction needed to produce a readable image will still be a big challenge.

Posted: Thu Jun 02, 2016 12:06 am
by Napoleon
I thought this challenge was pretty straightforward.

There are many speech recognition software solutions out there that can solve this challenge. The only thing is to look out for errors. Luckily, those are also pretty straightforward to find.

Here's what I suggest:
  • Slow the audio down ... maybe by 30% (a rough sketch of one way to do this follows below).
  • Split the audio into smaller files with just one number each, say "52".
  • Write a small program that plays the corresponding byte and shows the speech-recognition byte simultaneously.
Those 3 tips should have you solve the challenge in no time :)
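
For the first tip, one quick-and-dirty approach (a sketch with placeholder file names) is to rewrite the WAV with a lower declared sample rate, so players render the same samples about 30% slower (the pitch drops too, which is fine for recognizing digits):

Code: Select all

import javax.sound.sampled.*;
import java.io.File;

/** Writes a copy of the input WAV whose declared rate is 70% of the original,
 *  so it plays back roughly 30% slower (sketch). */
class SlowDown {
    public static void main(String[] args) throws Exception {
        AudioInputStream in = AudioSystem.getAudioInputStream(new File("challenge.wav"));
        AudioFormat f = in.getFormat();
        AudioFormat slower = new AudioFormat(f.getEncoding(),
                f.getSampleRate() * 0.7f, f.getSampleSizeInBits(), f.getChannels(),
                f.getFrameSize(), f.getFrameRate() * 0.7f, f.isBigEndian());
        // Same frames, lower declared rate -> slower (and lower-pitched) playback.
        AudioInputStream out = new AudioInputStream(in, slower, in.getFrameLength());
        AudioSystem.write(out, AudioFileFormat.Type.WAVE, new File("challenge_slow.wav"));
        out.close();
        in.close();
    }
}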

Posted: Thu Jun 02, 2016 6:17 am
by Hippo
Wow, it looks like you know what you are talking about :).
Hmm, 3 recent solvers ... so it seems the error recovery stage need not be that difficult.
I have finally found a way to play sounds in Java and then release the source files so they can be moved ... and now the classification is comfortable. Even so, I would still "train" the automated recognition ... .
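
One way to do that in Java (a sketch, not the code from the post) is to read the chunk fully into memory first, so the file handle is released and the file can be moved right after playback:

Code: Select all

import javax.sound.sampled.*;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.nio.file.Files;

/** Plays a WAV chunk without keeping the file locked (sketch). */
class PlayChunk {
    static void play(File wav) throws Exception {
        byte[] data = Files.readAllBytes(wav.toPath());   // file handle released here
        try (AudioInputStream ais =
                     AudioSystem.getAudioInputStream(new ByteArrayInputStream(data))) {
            Clip clip = AudioSystem.getClip();
            clip.open(ais);
            clip.start();
            clip.drain();                                 // block until playback finishes
            clip.close();
        }
    }
}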

Posted: Thu Jun 02, 2016 11:39 pm
by Napoleon
I'd love to see your program, once you've solved it :D

Posted: Fri Jun 03, 2016 10:25 am
by Hippo
Napoleon wrote:I'd love to see your program, once you've solved it :D
OK, but it is code in progress ... with the parts not needed at a given stage commented out ... and containing dead ends. And I am still not sure what the error recovery for the deflate stream will look like ... .

Posted: Sun Jun 05, 2016 3:22 pm
by TheBigBoss
The SpeechRecognitionEngine class from the .NET Framework is so great! The first number, "137", from the slowed-down audio file is recognized as "100 and Curtis Robbins". The first few seconds are recognized as "100 and Curtice Robbins taste of the 87 wants cartoon cats 26 at least". I am having so much fun with .NET... :D

Posted: Tue Jun 07, 2016 2:42 pm
by AMindForeverVoyaging
TheBigBoss wrote:I have so much fun with .NET...
Hehe. Perhaps I should try it out too.

Nah, I'm sticking with C++ and Java. ;-)

Posted: Sun Jun 12, 2016 6:29 am
by Hippo
So I have all the chunks classified (and verified at least once).
3 lines of the picture look reasonable ... just 3 lines ... so the error recovery process starts :(.

I have corrected a bug which rarely computed log(0) ... why doesn't Java complain instead of silently continuing with -infinity :(. Maybe I will need fewer representatives of some of the numbers now.
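
For reference, Math.log(0.0) in Java really does return Double.NEGATIVE_INFINITY without throwing anything, so it propagates silently through sums of log-scores. A tiny illustration with a floor value (the constant is an arbitrary choice for the sketch):

Code: Select all

/** Guard against silent -Infinity from log(0) (sketch). */
class SafeLog {
    static final double FLOOR = -1e9;                     // arbitrary floor for zero probabilities

    static double safeLog(double x) {
        return x > 0 ? Math.log(x) : FLOOR;               // Math.log(0) == -Infinity, no exception
    }

    public static void main(String[] args) {
        System.out.println(Math.log(0.0));                // -Infinity
        System.out.println(safeLog(0.0));                 // -1.0E9
    }
}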

Posted: Sun Jun 12, 2016 12:08 pm
by AMindForeverVoyaging
Hippo wrote:So I have all the chunks classified (and verified at least once).
3 lines of the picture look reasonable ... just 3 lines ... so the error recovery process starts :(.
It would have made more sense to use an image format which allows for a margin of error. Especially since this challenge opens up very early on the map, but does not have a warm-up to it.

An easier version of this could have been a good warm-up indeed.

Posted: Wed Jun 15, 2016 3:42 pm
by Napoleon
AMindForeverVoyaging wrote: It would have made more sense to use an image format which allows for a margin of error. Especially since this challenge opens up very early on the map, but does not have a warm-up to it.

An easier version of this could have been a good warm-up indeed.
Agreed

Posted: Thu Jun 16, 2016 2:07 pm
by Hippo
Hippo wrote:So I have all the chunks classified (and verified at least once).
3 lines of the picture look reasonable ... just 3 lines ... so the error recovery process starts :(.

I have corrected a bug which rarely computed log(0) ... why doesn't Java complain instead of silently continuing with -infinity :(. Maybe I will need fewer representatives of some of the numbers now.
Hmm, the first line is probably OK, as all the colors are above 0x80 and a lot of pixels are 0xffffff.
Unfortunately, I probably already have problems on the 2nd line, where thanks to the Paeth filtering there is no confirmation that it's OK. I changed 13x to 15x, which surprisingly changed 3 bits in a Huffman code of length 4 in both cases, so only 1 byte in the stream. It resulted in about 10 color changes on the 2nd and 3rd lines. I surely still have bugs up to the third line, as there is a surprising color at the left of that line and an overflow to 0 in its right part.
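
For reference, the Paeth predictor as defined in RFC 2083, section 6.6 (sketched in Java): each reconstructed byte depends on its left, upper and upper-left neighbours, which is why a single wrong byte smears into the rest of the scanline and into the lines below it.

Code: Select all

/** Paeth predictor and reconstruction, per RFC 2083, section 6.6. */
class Paeth {
    /** a = left, b = above, c = upper-left (raw bytes, 0..255). */
    static int predictor(int a, int b, int c) {
        int p = a + b - c;                    // initial estimate
        int pa = Math.abs(p - a);
        int pb = Math.abs(p - b);
        int pc = Math.abs(p - c);
        if (pa <= pb && pa <= pc) return a;   // ties resolved in the order a, b, c
        if (pb <= pc) return b;
        return c;
    }

    /** Reconstructs one byte of a Paeth-filtered scanline. */
    static int reconstruct(int filtered, int left, int above, int upperLeft) {
        return (filtered + predictor(left, above, upperLeft)) & 0xFF;
    }
}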

The fifth line starts with a wrong filter code ... and the rest of the picture is a random mosaic (I could force Paeth filtering, but that would probably just generate another random image).

Wow, I have corrected 4 more bytes (one in the opening) and I can read two of the words depicted in the message. There are still some wrong pixels at the top which, thanks to Paeth, caused just a local bias.

The changed byte resulted in one byte less in the byte stream, so the differences were applied to shifted colors ...
making a third of the image almost readable. The overflows in well-separated colors are not such a big problem for readability.

Ignoring the data and forcing Paeth mode for each line makes the picture much more readable ...