A Repeat of Pi

Discussion of challenges you have already solved
megabreit
Posts: 141
Joined: Sat Jan 03, 2009 3:33 pm

A Repeat of Pi

Post by megabreit »

How did you do that?
My solution in perl uses a sliding window and takes just a few sec to display the right solution (even if it continues to run further but don't display any more possible results)

Code: Select all

#!/usr/bin/perl
$string="141592....58151" # the 1mio pi string
$strlen=length($string);
$sub_len=5;	# start with 5
my %match;
while ($sub_len <= ($strlen/2)) {
	undef %match;
	for ($idx=0; $idx<$strlen-$sub_len; $idx++) {
		$str1=substr($string,$idx,$sub_len);
		if ( not defined $match{$str1} )
		{
			$match{$str1}=1;
		} else {
			print "$str1\n";
			last;
		}
	}
	$sub_len+=1;
}
MerickOWA
Posts: 182
Joined: Mon Apr 07, 2008 5:54 pm
Location: HkRkoz al KuwaiT 2019 HaCkEr 101

Post by MerickOWA »

I did the same sort of thing only in reverse, starting with the largest string and going to the smallest.

There seems like there should be a much faster way to do this with a hash of sets of consecutive numbers from pi or something like that.

I didn't worry about it too much because the simple search just took a little while to run and found the answer.
megabreit
Posts: 141
Joined: Sat Jan 03, 2009 3:33 pm

Post by megabreit »

I'm not quite sure if there is really a substantially faster way because you need
to create all the hash elements first. While doing so you loose time for calculation/comparison.
Hm.. you could start with a pre-built hash and not with a 1mio character string...

But as you say: If the solution is calculated in a convenient period of time, it's OK.
On the other side: We seem to be all geeks and to care about those nasty little details :lol:
gfoot
Posts: 269
Joined: Wed Sep 05, 2007 11:34 pm
Location: Brighton, UK

Post by gfoot »

I hashed strings of length N to position within PI, found matches, and brute-force searched for the longest subsequent sequence. It was very fast in the end, but initially terribly slow - I had to use quite a large N value (lots of memory) to get the fast results. That was surprising because I expected the large N value to make the initial indexing step too slow, and the memory use to make the later searching steps slow too... but I was wrong.

I don't remember the exact hash string length I used in the end though, just that it was bigger than I had expected.
User avatar
teebee
Posts: 89
Joined: Mon Nov 10, 2008 3:21 pm
Location: Germany

Post by teebee »

I used the sample program lrstext.c of the libstree library (http://www.icir.org/christian/libstree/) which computes the longest repeated substring of a string using a suffix tree.
theStack
Posts: 72
Joined: Sun Nov 02, 2008 12:46 am

Post by theStack »

Probably the slowest solution ever (took a few hours to compute), pure brute-force approach without any optimization in python:

Code: Select all

#!/usr/bin/env python
fd = open("pimillion", "r")
pi = fd.readline()
fd.close()

for seqlen in xrange(1, 500001):
    print "len: ", seqlen
    for index in xrange(0, len(pi)-seqlen):
        sequence = pi[index:index+seqlen]
        if sequence in pi[index+1:]:
            print "startindex", index, "sequence:", sequence
            break
Maybe it would have been faster if I used a special method for searching substrings instead of the "in" operator.
gfoot
Posts: 269
Joined: Wed Sep 05, 2007 11:34 pm
Location: Brighton, UK

Post by gfoot »

I found my code - I knew it must have been there somewhere...

Tidying a few things up, and timing it for index lengths 5 through 9, it executes in about 12 seconds for length 5, and about 5.5 seconds for length 9. I didn't go higher, as it already needs about 256MB of memory to store the index (according to top, anyway) and the host only had 512MB. It also wasn't getting much faster any more.

This is all with garbage collection turned off during the critical phases - that saves a lot of time, if you don't run out of memory in the process. It seemed faster to do an explicit collect after the indexing phase though.

Code: Select all

import gc

def make_index(data,indexlen):
    map = {}
    for y in xrange(len(data)+1-indexlen):
        p = data[y:y+indexlen]
        if p not in map:
            map[p] = []
        map[p].append(y)
    return map

def find_longest(data,map):
    longestpos = None
    longestlen = 1
    for starts in map.itervalues():
        for (p1,x) in enumerate(starts):
            for y in starts[p1+1:]:
                l = longestlen
                while y+l < len(data) and data[x:x+l] == data[y:y+l]:
                    l += 1
                if l > longestlen:
                    longestpos = x
                    longestlen = l
    return data[longestpos:longestpos+longestlen]


fp = open("pi.txt")
data = fp.read()
fp.close()

data = ''.join([x for x in data if x.isdigit()])

gc.disable()
gc.collect()

map = make_index(data,7)
gc.collect()
longest = find_longest(data,map)

print longest
blablaSTX
Posts: 11
Joined: Mon Aug 17, 2009 8:12 pm
Contact:

Post by blablaSTX »

I took successive 8-digit sequences as hashkey a set of already encountered positions for that 8digit sequence. When encountering an already hashed sequence, look if it is acrually longer and remrember the longest.
Takes 4 seconds to find the answer in unoptimized code.

Code: Select all

_98_arepeatOfPI
    |sequenceHash digits oldPositions l maxLength k p1 p2
     maxPosition1 maxPosition2|

    maxLength := -1.
    digits := self digitsOfPI.

    sequenceHash := Dictionary new.

    "/ take 8-digit seuences as key
    1 to:digits size-8 do:[:index |
        k := Integer readFrom:(digits copyFrom:index to:index+7).
        oldPositions := sequenceHash at:k ifAbsent:nil.

        "/ to avoid a million sets, store either single position or set of positions
        "/ most combinations only occur once
        oldPositions isNil ifTrue:[
            sequenceHash at:k put:index.
        ] ifFalse:[
            oldPositions isInteger ifTrue:[
                oldPositions := Set with:oldPositions.
            ].
            oldPositions do:[:eachOldPosition |
                l := 8.
                p1 := eachOldPosition+8.
                p2 := index+8.
                [ (digits at:p1) = (digits at:p2) ] whileTrue:[
                    l := l + 1.
                    p1 := p1 + 1.
                    p2 := p2 + 1.
                ].
                l > maxLength ifTrue:[
                    maxLength := l.
                    maxPosition1 := eachOldPosition.
                    maxPosition2 := index.
                ].
            ].
            oldPositions add:index.
            sequenceHash at:k put:oldPositions.
        ].
    ].
    Transcript 
        show:('longest repeated sequence: %1 at %2 and %3'
                bindWith:(digits copyFrom:maxPosition1 to:maxPosition1 + maxLength - 1)
                with:maxPosition1
                with:maxPosition2).

    "
     self _98_arepeatOfPI
    "
markobr
Posts: 17
Joined: Thu May 20, 2010 4:09 pm
Location: Tübingen
Contact:

Post by markobr »

Not really fast but quite perlish with substr and =~ :

Code: Select all

$min_len = 2;
STELLE:
for ($stelle = 0; $stelle < 1000000; ++$stelle) {
	LEN:
	for ($len = $min_len; $len < (1000000 - $stelle); ++$len) {
		$folge = substr($pi,$stelle,$len);
		$rest = substr($pi,$stelle+$len);
		if ($rest =~ /$folge/) {
			$max = $folge;
			$min_len = $len;
			print STDERR "$len ab $stelle: $folge\n";
		}
		else {
			next STELLE;
		}
	}
}
User avatar
SinistraD
Posts: 89
Joined: Sun Aug 16, 2009 8:39 am
Location: find me
Contact:

Post by SinistraD »

The text of this challenge is really ambiguous, at first I was searching (with great success and many time) for the longest repeated sequence like 111111.

For the solution I used a map to search for the d length "word" with the maximum appearance. Every time I've ran the program changed the value of d until the result was 1. The previous result was the good one.

Code: Select all

import operator

f = open("D:\\tmp.txt")
s = f.read(10000000)

dict = {}
d = 13

for i in range(0, len(s) - d - 1):
    a = s[i:i + d]
    if dict.has_key(str(a)):
        dict[str(a)] += 1
    else:
        dict[str(a)] = 1

max_seq = max(dict.iteritems(), key=operator.itemgetter(1))[0]

print(max_seq)
print(dict[max_seq])
DamaTeq
Posts: 8
Joined: Tue Feb 01, 2011 7:40 pm

Post by DamaTeq »

I also used a "sliding window" for the first time using a specific length. Those sequences that found are stored in the list. Afterwards i just go through the list and check if those strings (+x) are also found. when not - it gets removed from the list.

The most time is used for making the index (> 5 minutes), the rest is fast. So, when i look at the other solutions above and read how long they take for the answer, than i can see a clearly disadvantage of c# - its f*cking slow ;)

Code: Select all

static void Main(string[] args)
        {
            String pi = "";
            using (StreamReader sr = new StreamReader(File.OpenRead("pi.txt")))
            {
                pi = sr.ReadToEnd().Substring(2);
            }

            int startingLength = 10;
            List<int> foundIndexes = makeIndexList(pi, startingLength);
            String seq = FindLongestRepeatedSequence(pi, startingLength+1, foundIndexes);
            
            Console.WriteLine("The answer is: " + seq);
            Console.WriteLine("Ready ...");
            Console.ReadLine();
        }
        private static String FindLongestRepeatedSequence(String pi, int length,  List<int> foundIndexes)
        {
            while (true)
            {
                for (int i = foundIndexes.Count - 1; i >= 0; i--)
                {
                    int index = foundIndexes[i];
                    if (index + length >= pi.Length) return pi.Substring(index, length-1);

                    String search = pi.Substring(index, length);
                    String remaining = pi.Substring(length + index);

                    if (remaining.IndexOf(search) == -1)
                    {
                        if (foundIndexes.Count == 1) return search;
                        foundIndexes.RemoveAt(i);
                    }
                }
                length++;
            }
        }
        private static List<int> makeIndexList(String pi, int length)
        {
            List<int> foundIndexes = new List<int>();

            Parallel.ForEach(Enumerable.Range(0, pi.Length - length -1), x =>
            {
                if (pi.LastIndexOf(pi.Substring(x, length)) != x)
                    foundIndexes.Add(x);
            });
            return foundIndexes;
        }
moose
Posts: 67
Joined: Fri Jul 16, 2010 7:32 pm

Post by moose »

Here is my bit of python. I got the solution after some minutes (15? 30? I don't know as I ate cake with my father and my grandmother ;-) ... reminds me of xkcd ):

Code: Select all

import string, psyco

psyco.full()

def readpi(digits=1000000,filename = '/home/moose/Desktop/challenges/pi.txt'):
    f = open(filename)
    strDigits = f.read()
    strDigits = strDigits.replace("\n", "")
    return strDigits[0:digits]

digits = 1000000
maxLength = 0
start = 0
piDigits = str(readpi(digits))
sequence = ""
print("calculated %i digits of pi" % digits)

while start < digits:
    length = maxLength + 1
    while string.count(piDigits, piDigits[start:start+length]) >= 2 and (start+length<digits):
        length += 1
    length -= 1
    if length > maxLength:
        maxLength = length
        sequence = piDigits[start:start+maxLength]
        print("Sequence-Length\t: %i" % maxLength)
        print("Sequence\t: %s" % sequence)
        print("Start\t\t: %i" % start)
        print("")
    start += 1

print("Length of longest repeated sequence in the %i digits of pi: %i" % (digits, maxLength))
print("Sequence: %s" % sequence)
longxie
Posts: 4
Joined: Fri Jul 15, 2011 7:53 am

Post by longxie »

I just google it, and get the answer at http://boards.straightdope.com/sdmb/arc ... 39467.html
zafarrancho
Posts: 2
Joined: Thu Aug 18, 2011 2:08 pm

Post by zafarrancho »

I did it with zsh:

# cat pi_10e6digits.txt| sed 's;.;&\n;g' > pi_10e6_newlines.txt
# for i in {1..12} ; cat pi_10e6_newlines.txt| tail -n +$i > listpi.$i.txt
# paste listpi.{1..12}.txt| sed 's;\s;;g' | sort -n | uniq -c | grep -v " 1 "

computing time was about 10 seconds
megabreit
Posts: 141
Joined: Sat Jan 03, 2009 3:33 pm

Post by megabreit »

I like your zsh version... going back to the roots... (even though it's nearly not zsh specific).
I often forget that the paste command exists. It's very useful here, nice algorithm.
The "real" execution time is at least 2 min, since you can't know before, that the solution is 12 chars long... but still impressive :-)
Post Reply