In Ten numbers on a blackboard, someone asks about the largest number you can compute by reducing a set of numbers. I was surprised to see that someone spent quite a bit of time to brute force it and that their Python solution was so slow. So many of the things I write about in the “Benchmarking” and “Profiling” chapters come into play in this conversation.
The particular problem is to reduce a list of numbers by taking them off of the list and replacing them with the single value a*b + a + b
. Expressed in Perl, I write:
my @array = 1 .. 10; my $sum = 0; foreach ( @array ) { $sum = $sum * $_ + $sum + $_; } print "Sum is $sum\n";
The answer for that sequence is 39,916,799.
There’s a bit of a trick here by starting with $sum
at 0
. The first loop gets me back to the first item in the list. It’s a nice trick in programming to include a degenerate case and even nicer that Perl allows and even expects me to start with an undefined value. I could just as well start with the first element directly by shifting it into $sum
:
my @array = 1 .. 10; my $sum = shift @array; foreach ( @array ) { $sum = $sum * $_ + $sum + $_; } print "Sum is $sum\n";
Still, the question remains. Would I get the same answer no matter the order of the numbers in the sequence? I can generate all of the permutations of the list, something explained in the perlfaq4’s How do I permute N elements of a list? as well as StackOverflow’s How can I generate all permutations of an array in Perl?.
For this article, I’ll use Algorithm::Permute. It’s a bit of an odd interface because its permute
function takes a code block where I get the permutation in an array variable of the same name:
use Algorithm::Permute ('permute'); my @array = 1 .. 10; permute { print "@array\n"; } @array;
Careful here! There are 10! permutations of the order of 10 distinct thingys. That’s 3,628,8000 lines of output. Let’s start with something a bit smaller. I’ll start with two items:
use Algorithm::Permute ('permute'); @ARGV = 1 unless @ARGV; my @array = 1 .. $ARGV[0]; permute { print "S = [@array] "; my $sum = $array[0]; foreach ( 1 .. $#array ) { $sum = $sum * $array[$_] + $sum + $array[$_]; } print "sum is $sum\n"; } @array;
Here are a few runs where I see the sum is the same each time (more on that in a moment):
% permute 2 S = [1 2] sum is 5 S = [2 1] sum is 5
% permute 3 S = [1 2 3] sum is 23 S = [1 3 2] sum is 23 S = [3 1 2] sum is 23 S = [2 1 3] sum is 23 S = [2 3 1] sum is 23 S = [3 2 1] sum is 23
% permute 4 S = [1 2 3 4] sum is 119 S = [1 2 4 3] sum is 119 S = [1 4 2 3] sum is 119 S = [4 1 2 3] sum is 119 S = [1 3 2 4] sum is 119 S = [1 3 4 2] sum is 119 S = [1 4 3 2] sum is 119 S = [4 1 3 2] sum is 119 S = [3 1 2 4] sum is 119 S = [3 1 4 2] sum is 119 S = [3 4 1 2] sum is 119 S = [4 3 1 2] sum is 119 S = [2 1 3 4] sum is 119 S = [2 1 4 3] sum is 119 S = [2 4 1 3] sum is 119 S = [4 2 1 3] sum is 119 S = [2 3 1 4] sum is 119 S = [2 3 4 1] sum is 119 S = [2 4 3 1] sum is 119 S = [4 2 3 1] sum is 119 S = [3 2 1 4] sum is 119 S = [3 2 4 1] sum is 119 S = [3 4 2 1] sum is 119 S = [4 3 2 1] sum is 119
Rather than output every sum, I modify the program to find a sum that’s not the same. If I can find one that’s different I know that I might have to brute-force the problem to find the maximum:
use v5.10; use Algorithm::Permute ('permute'); @ARGV = 1 unless @ARGV; my @array = 1 .. $ARGV[0]; permute { my $sum = $array[0]; foreach ( 1 .. $#array ) { $sum = $sum * $array[$_] + $sum + $array[$_]; } state $last_sum = $sum; say "Sum for [@array] is special: $sum" unless $last_sum eq $sum; } @array;
I like this trick with state
where I initialize it with the first value of $sum
, but with this code I don’t see any output. Instead, I can use a hash to count the sums that I compute. If I accumulate more than one sum, I do more work to find the minimum and maximum values:
use v5.10; use Algorithm::Permute ('permute'); use List::Util qw(min max); @ARGV = 1 unless @ARGV; my @array = 1 .. $ARGV[0]; my %Sums; permute { my $sum = $array[0]; foreach ( 1 .. $#array ) { $sum = $sum * $array[$_] + $sum + $array[$_]; } $Sums{$sum}++; } @array; if( 1 < values %Sums ) { print "There is more than one sum for n=$ARGV[0]!"; my $min = min( keys %Sums ); my $max = max( keys %Sums ); say "Min: $min Max: $max"; } else { say "There is only one sum for n=$ARGV[0]" . (keys %Sums)[0]; }
Now for the brute-force. This is the part that caught my eye about imallett's Python solution, which he says:
For what it's worth, after reading the answers I decided to see how long a brute force program would take. My solution takes 0.43 seconds on the analogous problem of length 7, 12.3 seconds for length 8, and 442.1 seconds for length 9. I didn't let it run for length 10.
There are so many odd things about that statement? Twelve seconds? Doesn't that seem odd for 7! (=5040) operations? Remember in Mastering Perl that I strongly emphasized that we can only compare times on the same setups. But, even without trying very hard, my solution is three orders of magnitude faster:
% time perl permute 7 There is only one sum: 40319 real 0m0.033s user 0m0.027s sys 0m0.005s
One of imallett's problems is his use of recursion, a popular technique that's great for high-level languages that can compile it into something that isn't recursive. This isn't a problem that needs it. Mark Jason Dominus talks about this quite a bit in Higher-Order Perl too.
Here's imallett Python solution, which I find hard to unravel:
#!/usr/bin/python import time def rem_comb(i,j, l): l2 = [] for index in range(len(l)): if index == i or index == j: continue l2.append(l[index]) a,b = l[i],l[j] l2.append(a*b + a + b) return l2 def experiment(l): if len(l) == 1: return l[0] else: best = (0, None,None) for i in range(len(l)): for j in range(i+1,len(l),1): value = experiment(rem_comb(i,j, l)) if value > best[0]: best = (value, i,j) return best[0] for length in range(1,10+1,1): t0 = time.time() value = experiment(list(range(1,length+1,1))) t1 = time.time() print("Length % 2d best %d (time %f)"%(length,value,t1-t0))
For N=7, it takes less than a second, but then it blows up at N=8 on my machine. These are roughly the same times he reports, but this is awful performance:
Length 1 best 1 (time 0.000006) Length 2 best 5 (time 0.000015) Length 3 best 23 (time 0.000043) Length 4 best 119 (time 0.000223) Length 5 best 719 (time 0.002201) Length 6 best 5039 (time 0.032876) Length 7 best 40319 (time 0.674944) Length 8 best 362879 (time 18.632982) Length 9 best 3628799 (time 683.095185)
I also strongly emphasized that in these sorts of issues, it's the algorithm that's usually the problem, not the language or the interpreter. I haven't written Python in a long time (and now I remember why), but this solution is much faster:
#!/usr/bin/python import time from itertools import permutations def add_it_up(p): sum = 0; for i in p: sum = sum * i + sum + i return sum def experiment(l): sum = 1; last_sum = 0 for p in permutations( range( 1, l+1 ) ): sum = add_it_up( p ) if last_sum > 0 and sum != last_sum: print( p ) print( "For %d, found two sums: %d %d" % ( l, sum, last_sum ) ) last_sum = sum return last_sum for length in range(1, 10+1, 1): t0 = time.time() value = experiment( length ) t1 = time.time() print( "Length % 2d best %s (time %f)" % (length,value,t1-t0) )
This leads to the advice I gave in Mastering Perl. Better algorithms are better than optimization. The accepted answer for this puzzle shows that for 1 .. N, the sum is N! - 1. More importantly, it shows that since all of the operations are commutative, the order doesn't matter so I don't have to compute every sum.
Now I divert from the question, which already has its answer. Now I merely want to make the sums to see how fast I can do it. I already show in the "Profiling" chapter how to compute factorials while saving the result from previous computations (you can see all of the book's programs on the Downloads page:
#!/usr/bin/perl # factorial_iterate_bignum_memo.pl use bignum; { my @Memo = (1); sub factorial { my $number = shift; return unless int( $number ) == $number; return $Memo[$number] if $Memo[$number]; foreach ( @Memo .. $number ) { $Memo[$_] = $Memo[$_ - 1] * $_; } $Memo[ $number ]; } } { print "Enter a number> "; chomp( my $number = <STDIN> ); exit unless defined $number; print factorial( $number ), "\n"; redo; }
As I continued to brute-force the problem, I would have to redo quite a bit of work to compute factorials I already know the answer to, but there's no reason to do all that extra work. I modify the factorial program a little (and add some v5.10 features):
use v5.10; sub factorial { my $number = shift; state $Memo = [1]; return unless int( $number ) == $number; return $Memo->[$number] if $Memo->[$number]; foreach ( @$Memo .. $number ) { $Memo->[$_] = $Memo->[$_ - 1] * $_; } $Memo->[ $number ]; } foreach ( 1 .. $ARGV[0] ) { say "N = $_, sum is ", factorial( $_ ) - 1; }
Now I know the sums virtually instantly:
% time perl factorial 100 N = 1, sum is 0 N = 2, sum is 1 N = 3, sum is 5 N = 4, sum is 23 N = 5, sum is 119 N = 6, sum is 719 N = 7, sum is 5039 N = 8, sum is 40319 N = 9, sum is 362879 N = 10, sum is 3628799 ... N = 99, sum is 9.33262154439441e+155 N = 100, sum is 9.33262154439441e+157 real 0m0.007s user 0m0.003s sys 0m0.003s
Suppose now, that I did have to check every sum because the operations weren't commutative and the order mattered. How would I make everything faster? I'd do the same thing I did with the factorial example. I would save the result of previous operations, even across different-sized inputs. First, I'll comment out the meat (at line 15) and count the number of calls for each input length:
use v5.10; use Algorithm::Permute; my $Memo; my %Calls; sub make_key { join "\000", @_ } sub in_order { my $key = make_key( @_ ); $Calls{ scalar @_ }++; if( exists $Memo->{$key} ) { #return $Memo->{$key}; } if( @_ == 2 ) { return $Memo->{$key} = $_[0]*$_[1] + $_[0] + $_[1]; } elsif( @_ == 1 ) { return $_[0] } $Memo->{$key} = in_order( in_order( @_[0 .. $#_-2] ), in_order( @_[-2, -1] ) ); } N: foreach my $n ( 2 .. $ARGV[0] ) { my $p = Algorithm::Permute->new( [ 1 .. $n ], $n ); my $last_sum; PERMUTE: while( my @res = $p->next ) { my $key = make_key( @_ ); $Memo{ $key } = in_order( @res ); if( defined $last_sum and $Memo{ $key } != $last_sum ) { say "For (@res), sum is different [$last_sum != $Memo{ $key }]"; next N; } $last_sum = $Memo{ $key }; } say "N = $n: sum is $last_sum"; } say Dumper( \%Calls ); use Data::Dumper;
This is still pretty slow, but much faster than the recursive solution in Python:
% time perl memo 10 N = 2: sum is 5 N = 3: sum is 23 N = 4: sum is 119 N = 5: sum is 719 N = 6: sum is 5039 N = 7: sum is 40319 N = 8: sum is 362879 N = 9: sum is 3628799 N = 10: sum is 39916799 $VAR1 = { '3' => 368046, '1' => 368046, '6' => 3669840, '5' => 368040, '10' => 3628800, '8' => 3669120, '9' => 362880, '4' => 3669864, '2' => 35878886, '7' => 367920 }; real 2m53.379s user 2m51.615s sys 0m1.697s
Now, I'll uncomment that line and try again. I cut the time in half, roughly, and see there are fewer calls. I strongly emphasize this in the Mastering Perl too. If I think I know the situation, I should make a change to ensure what I understand to happen actually does:
% time perl memo 10 N = 2: sum is 5 N = 3: sum is 23 N = 4: sum is 119 N = 5: sum is 719 N = 6: sum is 5039 N = 7: sum is 40319 N = 8: sum is 362879 N = 9: sum is 3628799 N = 10: sum is 39916799 $VAR1 = { '8' => 3669120, '2' => 12323810, '6' => 1815120, '5' => 181560, '1' => 504, '9' => 362880, '3' => 15126, '7' => 367920, '10' => 3628800, '4' => 151224 }; real 1m27.931s user 1m26.666s sys 0m1.251s