Page 1 of 1

Hot 'n Tropic Climate

Posted: Tue Sep 06, 2016 6:58 am
by eulerscheZahl
AMindForeverVoyaging asked me to create this topic.

I've seen this type of compression before, as I answer questions in a different forum from time to time.
Teebee was so kind to help with the frequency table.

My pari/gp solution:

Code: Select all

{
	freq = [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966, 0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074];
	n =  0.8579089046659475886936755259078 * 100;
	for(i=1,25,
		index = 1;
		s = 0;
		while(s + freq[index] < n,
			s += freq[index];
			index += 1);
		remaining = freq[index];
		n -= s;
		n = n / remaining * 100;
		printf("%c", 64 + index)
	)
}

Posted: Tue Sep 06, 2016 1:31 pm
by Hippo

Code: Select all

from fractions import *

freq = [
          ['A',8167,0],['B',1492,0],['C',2782,0],['D',4253,0],['E',12702,0],['F',2228,0],['G',2015,0],
          ['H',6094,0],['I',6966,0],['J',153,0],['K',772,0],['L',4025,0],['M',2406,0],['N',6749,0],
          ['O',7507,0],['P',1929,0],['Q',95,0],['R',5987,0],['S',6327,0],['T',9056,0],['U',2758,0],
          ['V',978,0],['W',2360,0],['X',150,0],['Y',1974,0],['Z',74,0]
       ]
sum = 0
for i in xrange(len(freq)):
   freq[i][2] = sum
   sum = sum+freq[i][1]
print sum
sum+=1

message = Fraction(8579089046659475886936755259078,10000000000000000000000000000000)
for i in xrange(25):
   for j in xrange(len(freq)):
       if message < Fraction(freq[j][2],sum):
           print freq[j-1][0]
           message -= Fraction(freq[j-1][2],sum)
           message /= Fraction(freq[j-1][1],sum)
           break
the sum+=1 seems to be bug in encodding

seems one off would make it more readable ...

Posted: Wed Sep 07, 2016 1:54 am
by stubbscroll
I've had an idea how to solve this for a while, but didn't have the correct letter frequencies until now. C code using GMP (with precision of 1000 digits to be on the safe side):

Code: Select all

#include <stdio.h>
#include <gmp.h>

double freq[27]={
	0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
	0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
	0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
	0.00978, 0.02360, 0.00150, 0.01974, 0.00075, 0.0
};
double cumul[27];

int main() {
	mpf_t tall,t,f[27],c[27];
	int i,j;
	char s[100];
	cumul[0]=0;
	mpf_set_default_prec(1000);
	for(i=1;i<27;i++) cumul[i]=cumul[i-1]+freq[i-1];
	for(i=0;i<27;i++) {
		sprintf(s,"%.5f",freq[i]);
		mpf_init_set_str(f[i],s,10);
		sprintf(s,"%.5f",cumul[i]);
		mpf_init_set_str(c[i],s,10);
	}
	mpf_init_set_str(tall,"0.8579089046659475886936755259078",10);
	mpf_init(t);
	for(i=0;i<25;i++) {
		for(j=0;j<26;j++) if(mpf_cmp(tall,c[j+1])<0) {
			putchar('a'+j);
			mpf_sub(tall,tall,c[j]);
			mpf_div(tall,tall,f[j]);
			break;
		}
	}
	return 0;
}

Posted: Sun Sep 11, 2016 11:18 am
by AMindForeverVoyaging
Ah, so arithmetic encoding it was. An interesting concept. Too bad that it makes for an inherently flawed challenge if finding the original assignment for symbols and frequencies can be ambiguous, as is the case here.

Posted: Mon Sep 12, 2016 12:36 pm
by Hippo
AMindForeverVoyaging wrote:Ah, so arithmetic encoding it was. An interesting concept. Too bad that it makes for an inherently flawed challenge if finding the original assignment for symbols and frequencies can be ambiguous, as is the case here.
Yep, I could imagine the encodding based on bigrams, but there would be much more small decisions where the implementation details would make it unbreakable ... This way the challenge was rather fine, especially with W% note.