
'Say It' [Misc]

Posted: Mon Dec 29, 2008 4:19 pm
by Drifter
  • Aggravation: *Curses the creator of this challenge*
  • Pity: *I hope your friend has a good calling plan*
  • Denial: *Your friend must be a computer, b/c he read continuously*
  • Self-Pity: *Do we have to correct mistakes made while reading?*
  • Fear: *Hopes your friend isn't dyslexic*
The original file is 4 hours, 2 minutes, 7 seconds (14527 seconds), or, at 70% play speed for transcription and safety, 5 hours, 45 minutes, 53 seconds (20753 seconds). Add two child labors again for transcription safety and integrity; and the resultant

While the premise is relatively simple, compression from speech to hex looks too painful. At 40 bytes in the first 25 seconds, the final file size works out to about 23 KB.
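A quick check of those figures, as a rough sketch. It assumes the speaking rate of the first 25 seconds holds for the whole recording, which is only a guess.

```python
# Durations and byte counts are taken from the post; the constant rate is an assumption.
duration_s = 4 * 3600 + 2 * 60 + 7      # 4 h 2 min 7 s
slowed_s = round(duration_s / 0.70)     # same audio played at 70% speed
bytes_per_second = 40 / 25              # 40 bytes heard in the first 25 seconds
estimated_bytes = duration_s * bytes_per_second


def hms(seconds: int) -> str:
    """Format a number of seconds as h:mm:ss."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


print(duration_s, hms(slowed_s), round(estimated_bytes))  # 14527 5:45:53 23243
```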

Anyone care to recommend a speech recognition engine (preferably open source)?

Before I realized that it was not 4:02 minutes but hours, I transcribed the start. The first 240 bytes are:

Code:

89504E470D0A1A0A0000000D4948445200000078
0000005A0802000000FC6205B800000001735247
4200AECE1CE9000000097048597300000B130000
0B1301009A9C180000000774494D4507D8071403
171AA545EB8F0000001974455874436F6D6D656E
74004372656174656420776974682047494D5057
810E17000020004944415478DA7CBC69AC6DD971
1E56C35A6BEF7D867BEE7CEF9BA7EED7CD26BB39
8B4D89438B8B461AB06311A000590A2843919040
91292B13E0244E2028B163433F12C1B013409194
C0F290C888048B1425311145A4A2D8249B3DF075
F77BDD6F7E773CF78C7B586B55557EDC1F010992
This block is about 1% of the unchecked raw file.
Feel free to edit if you feel this reveals too much...

They say a picture is worth a thousand words; this one is worth ~12 thousand...
! and I only need one.
- Another automaton trudges senselessly away.

Posted: Mon Dec 29, 2008 4:41 pm
by MagneticMonopole
Hi Drifter,

and, as you will without doubt have noticed, the PNG format is exceptionally unforgiving. A single wrong byte will not result in an ill-shaded pixel, but an almost completely corrupted file - thanks to the mandatory use of compression and checksums in PNG. :D
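To see which part of a transcription is damaged, one option is to walk the PNG chunks and recompute each CRC-32. The sketch below relies only on the standard chunk layout (4-byte length, 4-byte type, data, 4-byte CRC over type plus data). The 1x1 image is built on the spot purely to exercise the checker and has nothing to do with the challenge picture; `make_chunk` and `check_chunks` are illustrative names.

```python
import struct
import zlib

PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")


def make_chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def check_chunks(png: bytes):
    """Yield (chunk type, offset, crc_ok) for every complete chunk in png."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos = 8
    while pos + 12 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        end = pos + 12 + length
        if end > len(png):
            break  # truncated chunk, nothing to verify
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (stored,) = struct.unpack(">I", png[end - 4:end])
        computed = zlib.crc32(kind + data) & 0xFFFFFFFF
        yield kind.decode("latin-1"), pos, stored == computed
        pos = end


# Made-up 1x1 RGB test image (not the challenge image).
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
idat = zlib.compress(b"\x00\xff\x00\x00")  # filter byte + one red pixel
good = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))
bad = bytearray(good)
bad[20] ^= 0x01  # flip one bit inside the IHDR data

print(list(check_chunks(good)))
print(list(check_chunks(bytes(bad))))
```

A chunk that fails its CRC tells you roughly where to listen again. A wrong digit inside a length field will throw off every chunk after it, so the first failure is the one to trust.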

Posted: Mon Dec 29, 2008 4:55 pm
by efe
@Drifter: There are at least 5 errors in your 240 bytes.
I compared it to my result and listened again to the differing bytes.
I also found an error in my byte sequence. :)

Sometimes it's hard to distinguish between 5 and 6 (e.g. 62 or 52).
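Comparing two transcriptions like this is easy to script. A minimal sketch, assuming both hex dumps are aligned (no dropped or doubled digits) and that the reader speaks at a roughly even pace; the sample strings and function names are made up for illustration.

```python
def differing_bytes(mine: str, theirs: str) -> list[tuple[int, str, str]]:
    """Return (byte offset, my byte, their byte) for every mismatch in two hex dumps."""
    a = "".join(mine.split()).upper()
    b = "".join(theirs.split()).upper()
    diffs = []
    for i in range(0, min(len(a), len(b)), 2):
        if a[i:i + 2] != b[i:i + 2]:
            diffs.append((i // 2, a[i:i + 2], b[i:i + 2]))
    return diffs


def offset_to_seconds(byte_offset: int, total_bytes: int, total_seconds: float) -> float:
    """Rough position in the recording, assuming an even speaking rate."""
    return byte_offset / total_bytes * total_seconds


# Hypothetical snippets that differ in one 5/6 digit.
print(differing_bytes("89504E47 0D0A1A0A 00000062",
                      "89504E47 0D0A1A0A 00000052"))
```

Each reported offset can then be turned into an approximate timestamp with `offset_to_seconds` to find the spot to listen to again.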

Posted: Mon Dec 29, 2008 8:19 pm
by gfoot
Checksums work in your favour if the input data could be corrupted.
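One way to put that to use, sketched under strong assumptions: a chunk contains exactly one misheard digit, the only confusion is between 5 and 6, and the stored CRC itself was heard correctly. The function name and the sample chunk body are made up for illustration.

```python
import zlib

CONFUSABLE = {"5": "6", "6": "5"}  # assumption: only 5 and 6 get mixed up


def repair_single_digit(body_hex: str, stored_crc: int):
    """Swap one confusable hex digit at a time until the CRC-32 matches.

    body_hex is the chunk type plus chunk data as hex. Returns the corrected
    hex string, or None if no single swap makes the checksum match.
    """
    if zlib.crc32(bytes.fromhex(body_hex)) == stored_crc:
        return body_hex
    for i, digit in enumerate(body_hex):
        if digit in CONFUSABLE:
            candidate = body_hex[:i] + CONFUSABLE[digit] + body_hex[i + 1:]
            if zlib.crc32(bytes.fromhex(candidate)) == stored_crc:
                return candidate
    return None


# Made-up example: a small text chunk body with one digit misheard.
correct = (b"tEXt" + b"Comment\x00Created with GIMP").hex().upper()
crc = zlib.crc32(bytes.fromhex(correct))
misheard = correct[:3] + "6" + correct[4:]  # the "5" at index 3 heard as "6"
print(repair_single_digit(misheard, crc) == correct)
```

This only scales to short chunks or a handful of errors. For a large IDAT chunk with many wrong digits, the number of combinations grows too fast to brute-force.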

Posted: Tue Dec 30, 2008 3:54 pm
by MichaBln
Hi,

I'm getting approximately 95% right ... still my PNG is corrupt.
My way of approaching the whole thing works pretty well ... I made kind of my own learning speech-recognition software. Unfortunately I'm really not good at audio fingerprinting and such; I tried using my own algorithms for the recognition by analysing characteristics of each number.
Still, that 5/6 problem, as mentioned, is really sick ... I don't know if it's a good idea, but I'm going to try using FFT and analyzing the frequencies themselves.
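A minimal sketch of the FFT step, using only the standard library. The test tone is synthetic and merely stands in for a slice of the recording. A real recognizer would have to compare whole spectra per spoken digit rather than a single peak, so this shows the transform, not a working classifier. A library routine such as `numpy.fft.rfft` would be much faster on real audio.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest bin in the lower half of the spectrum."""
    spectrum = fft(frame)
    half = len(frame) // 2
    peak = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak * sample_rate / len(frame)


# Synthetic 1000 Hz tone, sampled at 8000 Hz (values chosen for illustration).
rate, n = 8000, 1024
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(n)]
print(dominant_frequency(tone, rate))
```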

I've got the feeling I'm close (and so seem to be others ...) but, as mentioned, 99.9% won't do the job.

Anyways ... nice challenge.

Michael

Posted: Thu Apr 01, 2010 4:49 am
by helly0d
Can I ask if it is really true that, of the 18 hours of the track, just 6 are the picture, because the other 12 hours are just 2 repetitions of the picture?
And am I close if I get a 4 KB PNG filled with black?

Posted: Mon May 10, 2010 5:37 am
by Arkondi
helly0d wrote:Can I ask if it is really true that, of the 18 hours of the track, just 6 are the picture, because the other 12 hours are just 2 repetitions of the picture?
And am I close if I get a 4 KB PNG filled with black?

There are no repetitions. The file you will get has 22536 bytes, and it is not filled with black.

Posted: Mon May 17, 2010 3:20 pm
by Masti6
This is one of the most frustrating challenges yet.
Any speech-to-text software (other than YouTube?) out there that you could share?

Posted: Wed Dec 14, 2011 12:21 pm
by wolf may cry
Anyone still working on this challenge?
I've tried some speech-to-text software, but it doesn't work at all.
Do I have to write a special speech-to-text software for this challenge? Because I have no idea how it works... :cry:

Thanks

Posted: Wed Dec 14, 2011 2:24 pm
by AMindForeverVoyaging
There really should be a warm-up challenge for this one, introducing you to speech processing/voice recognition.

The "solving rate" for this challenge is less than one person per year. If that is not a clear indicator that the difficulty level is way overdone (for a challenge that stands on its own and has no warm-up), then I don't know what is.

Posted: Mon Jan 30, 2012 9:41 am
by aurora
I am trying speech-to-text now, but because of the bad quality, my hopes that this will work are not very high. I still have another idea to solve it, but that sure will mean "some" work in coding ...

Posted: Mon Jan 30, 2012 1:31 pm
by aurora
Could someone who solved this problem verify that it's indeed 22536 bytes? I currently have 25442 to work with. (I gave up on speech-to-text, btw, and am trying a different approach.)

Posted: Tue Jan 31, 2012 2:47 pm
by AMindForeverVoyaging
aurora wrote:could someone who solved this problem verify, that it's indeed 22536 bytes?
Well that figure is from Arkondi, who *is* one of the (very few) people who solved it.

Posted: Wed Feb 01, 2012 12:28 pm
by aurora
AMindForeverVoyaging wrote:
aurora wrote:could someone who solved this problem verify, that it's indeed 22536 bytes?
Well that figure is from Arkondi, who *is* one of the (very few) people who solved it.
Errr ... right. I think I could have figured this out myself, sorry. I probably had the wrong tool anyway, and I can confirm now that with the right tool you can indeed get 22536 bytes out of this input. The toughest part is still ahead, though :( ...

Posted: Wed Oct 30, 2013 10:34 pm
by Konk
Like many others, I get about 2% errors in my speech-detected files, so I corrected some hours manually...
Since my PNG looks very corrupt and is 2/3 black, I would like to ask if anybody with the solution would have a look at my numbers. If anybody has a text file, it's just a simple file compare. Currently I am pretty sure that the first 8336 numbers are correct. In the PNG only the first 5 lines are nice; the rest is corrupted pixels and black.
So if anybody would be willing to have a quick look at my numbers, maybe you can drop me a message. I would appreciate this very much.
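A symptom like "a few good lines, then garbage" can be narrowed down a little by feeding the compressed image data to zlib in small pieces and noting where it first complains. This is a sketch only: the function names are made up, `rows_decoded` assumes a non-interlaced 8-bit RGB image, and zlib often notices a wrong byte some way after the actual error (sometimes only at the final checksum), so the reported offset is an upper bound rather than the exact spot.

```python
import zlib


def decodable_prefix(stream: bytes, step: int = 16):
    """Feed a zlib stream piece by piece.

    Returns (offset of the piece where zlib first raised an error,
    number of bytes decoded up to that point). If no error is raised,
    the offset equals len(stream).
    """
    decoder = zlib.decompressobj()
    decoded = 0
    for start in range(0, len(stream), step):
        try:
            decoded += len(decoder.decompress(stream[start:start + step]))
        except zlib.error:
            return start, decoded
    return len(stream), decoded


def rows_decoded(decoded_bytes: int, width: int, bytes_per_pixel: int = 3) -> int:
    """Whole scanlines covered, counting one filter byte per row."""
    return decoded_bytes // (1 + width * bytes_per_pixel)


# Made-up data standing in for the concatenated IDAT payload.
raw = bytes(range(256)) * 8
stream = zlib.compress(raw)
print(decodable_prefix(stream))
```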