Coastal Vectors

Project Euler: 10

Problem 10 of Project Euler is to find the sum of all primes under 2,000,000.

This isn’t a hard problem in most languages. I have recently been going through and re-solving problems in various languages for the fun of it. Just for fun, I decided to solve some problems using Bash scripting. Bash has all the constructs of a real programming language, however, the main draw back for problems like this is that it’s slow. VERY slow. Well, lets see what we can do to speed this problem up.

Brute Force

The brute force approach is pretty simple. Loop through every number, n, less than 2,000,000, and try dividing it by every number from 2 to n-1. If it is divisible, break out of the loop and try the next number.

sum=2
for (( i=3; i<2000000; i++ )); do
    prime=1
    for (( d=2; d<$i; d++ )); do
    if [ $(($i % $d)) -eq 0 ]; then
        prime=0
        break;
    fi
    done
    if [ $prime -eq 1 ]; then
    sum=$(($sum + $i))
    fi
done
echo $sum

This solution has no optimization, but will still yield a result in a relatively small time-frame for most languages. This would take incredibly long to process primes smaller than 2,000,000. For a benchmark, I have run this through primes less than 2,000, and it took 6.430 seconds. We need to optimize.

Optimize

There are a lot of simple changes that we can do to get our compute times down. For instance, if we only loop through odd numbers, we have half as many to process.

Another classic technique to optimize a prime test is to check for factors less than or equal to the square root of the number. The reason this works is that if a factor of our number, n, is larger than the square root of n, we will find the other factor has to be smaller than the square root of n, and will thus be detected.

sum=2
for (( i=3; i<2000; i+=2 )); do
    prime=1
    for (( d=2; d*d<=$i; d++ )); do
    if [ $(($i % $d)) -eq 0 ]; then
        prime=0
        break;
    fi
    done
    if [ $prime -eq 1 ]; then
    sum=$(($sum + $i))
    fi
done
echo $sum

These two changes alone bring our compute time down to .319 seconds for primes under 2,000! I still wouldn’t recommend trying computing all primes under 2,000,000. I tried, and killed it after 3 days without a result!

Sieve of Eratosthenes

Prime sieves are a super fast way of computing prime numbers without having to do any division. This method is attributed to Eratosthenes of Cyrene, a Greek mathematician living about 200 BC. (You can read more about the history on Wikipedia).

Basically, you have an array of every number you want to check. Start by making each element of the array equal to 1, indicating it is a prime. Go through the array, starting at two. Whenever you hit a value that is 1, mark all the multiples of that value as 0’s. So, we look at element 2, which is prime because it’s value is 1. element whose index is a multiple of 2 should then get marked as 0, since that cannot be prime. You then advance to the next element, and repeat until you’ve visited every number in the array.

limit=2000
primes[$limit]=0
primeList[0]=2
primeIndex=0
sum=0
for (( i=2; i<$limit; i++ )); do
    if [[ ${primes[$i]} -eq \0 ]]; then
        sum=$(($sum+$i))
        primeList[$primeIndex]=$i
        primeIndex=$(($primeIndex+1))
        for (( n=$i; n<$limit; n+=$i )); do
            primes[$n]=1
        done
    fi
done

echo $sum

Implementing this method definitely improves results for our benchmark of primes less than 2,000; .186 seconds! However, this approach does not scale up to the desired 2,000,000 because Bash arrays are super slow as they get larger. Even though this approach worked great in our benchmark, it will still take upwards of 3 days to complete for all the primes under 2,000,000.

Combination of Methods

By using a sieve method combined with our optimized method, I was able to finally get the correct answer in 34 minutes using a Bash script.

I used a sieve to generate the primes between 1 and sqrt(2000000). This was fast since the array wasn’t nearly as big. I then used the array of primes that I generated to check every odd number between 2 and 2,000,000.

Another shortcut to reduce computation is to make a second array containing the squares of each prime. This means that we don’t need to perform multiplication in each loop when we are checking for primality.

limit=2000000
sum=0

# Generate primes up to sqrt of 2000000 using sieve
primeLimit=$(echo "scale=0;sqrt($limit*2)" | bc)
primes[$primeLimit]=0
primeList[0]=2
primeListSquared[0]=4
primeIndex=0
for (( i=2; i<$primeLimit; i++ )); do
    if [[ ${primes[$i]} -eq \0 ]]; then
        sum=$(($sum+$i))
        primeList[$primeIndex]=$i
        primeListSquared[$primeIndex]=$i*$i
        primeIndex=$(($primeIndex+1))
        for (( n=$i; n<$primeLimit; n+=$i )); do
            primes[$n]=1
        done
    fi
done

# Make sure primeLimit is an odd number
[[ $(($primeLimit % 2)) -eq 0 ]] && primeLimit=$(($primeLimit + 1))

# Use known primes from above to check each remaining number for primality
for (( i=$primeLimit; i<$limit; i+=2 )); do
    prime=1
    for (( p=0; ${primeListSquared[$p]}<=$i; p++ )); do
        if [[ $(($i % ${primeList[$p]})) -eq 0 ]]; then
            prime=0
            break
        fi
    done
    # Record!
    if [[ $prime -eq 1 ]]; then
        sum=$(($sum+$i))
    fi
done

echo $sum

34 minutes. Finally, a “reasonable” result (by some definitions of reasonable).

Multi-Processing

The last step I took was by splitting up the task of finding primes into multiple processes. This allowed all 8 of my cores to work on the solution. I moved my prime search method into a function that took a start and stopping point as parameters, and then echoed the sum for that portion back to a while loop that added them all together.

limit=2000000

# Generate primes up to sqrt of limit
primeLimit=$(echo "scale=0;sqrt($limit*2)" | bc)
primes[$primeLimit]=0
primeList[0]=2
primeListSquared[0]=4
primeIndex=0
for (( i=2; i<$primeLimit; i++ )); do
    if [[ ${primes[$i]} -eq \0 ]]; then
        primeList[$primeIndex]=$i
        primeListSquared[$primeIndex]=$i*$i
        primeIndex=$(($primeIndex+1))
        for (( n=$i; n<$primeLimit; n+=$i )); do
            primes[$n]=1
        done
    fi
done

function checkRange() {
    start=$1
    end=$2
    rangeSum=0

    # Make sure range starts at an odd number
    [[ $start -lt 2 ]] && start=3 && rangeSum=2
    [[ $(($start % 2)) -eq 0 ]] && start=$(($start + 1))

    # Use known primes to check all remaining numbers for primality
    for (( i=$start; i<$end; i+=2 )); do
        prime=1
        for (( p=0; ${primeListSquared[$p]}<=$i; p++ )); do
            if [[ $(($i % ${primeList[$p]})) -eq 0 ]]; then
                prime=0
                break
            fi
        done
        # Record!
        if [[ $prime -eq 1 ]]; then
            rangeSum=$(($rangeSum+$i))
        fi
    done
    echo $rangeSum
}

# Split up task into chunks, and check with multiple processes
sum=0
range=$(($limit / 50))
for (( start=0; start<$limit; start+=$range )); do
    checkRange $start $(($start+$range)) &
done | while read result; do
    sum=$(($sum+$result))
    echo $sum
done | tail -1

This method brought the compute time down to 3 minutes 22.075 seconds.

Conclusion

Project Euler gives the guideline that if your solution takes more than a minute, you’re probably doing it wrong. For this particular task though, I think 3 minutes is rather successful given the constraints of Bash. It was a fun project to try to optimize such a relatively simple algorithm within the constraints of this little scripting language.

If you can find any way to speed up the results, please let me know!

Visit the solution on my GitHub page.

Project Euler 357

I recently completed Project Euler 357 after working on it here and there over the course of a couple of days. I decided to do this one out of order because it seemed like a fairly straight foward problem.

Consider the divisors of 30: 1,2,3,5,6,10,15,30.
It can be seen that for every divisor d of 30, d+30/d is prime.

Find the sum of all positive integers n not exceeding 100 000 000
such that for every divisor d of n, d+n/d is prime.

This shouldn’t be hard. Just find the divisors of every number, and see if n+1 is prime. Heck, I already have an isPrime() function written! The brute force approach will be easy to code.

I started off with code resembing this:

bool isPrimeGen(int n, bool *primes)
{
    for (int d=1; d<=n; d++)
        if (n % d == 0)
            if (primes[d+n/d] != true)
                return false;
    return true;
}
int main()
{
    int limit = 100000000;
    unsigned long long sum = 1;
    for (int i=2; i<limit; i++)
        if (isPrimeGen(i))
            sum += i;
    printf("%d", sum)
}

Well, this worked… But it took far too long. It took over 4 hours to complete. Obviously this doesn’t fit in with the guidlines that solutions should take less than a minute to complete.

I was able to make a single optimization that got the time down from 4 hours to 12 minutes. A single, tiny change. (It’s actually a similar optimization I made to my prime function when I first started doing these Euler problems…) Instead of looking for every divisor up to the size of n, our number, I put in a square root n, sqrt(n), as the limit for divisor checking.

Another optimization was to make use of looking for patterns in the qualifying numbers.

n is always even
n is always 1 less than a prime number
n/2+2 is always a prime number

Filtering down the potential numbers by using these patterns further reduced the time, but it was still taking longer than I would have liked. I decided to take drastic measures… I pre-computed a ton of primes using a Seive of Eratosthenes.

I made a giant boolean array where any number is marked as a prime or not. This made looking up whether or not primes are numbers super trivial. I’m not sure this is the most elegant solution, but it seemed simple and fast.

// Generate primes using sieve
bool *primes = malloc(limit*sizeof(bool));
for (int i=2; i<limit; i++)
    primes[i] = true;
primes[0] = false;
primes[1] = false;
for (int i=2; i<limit; i++)
    if (primes[i] == true)
    {
        int n = 2;
        while (n*i < limit)
            primes[i*n++] = false;
    }

So, this generates an array, and I pre-load the value for each number as ‘true’ (excepting for 0 and 1 of course). I then go through sequentially, and if I hit a number that is ‘true,’ or prime, every multiple of that number gets marked as ‘false,’ or not prime, until we reach the limit of the array. This is a super fast method for calculating an obscene number of primes very quickly because it doesn’t require any division.

With this pre-loaded table of primes, the code was then refactored to make use of this table rather than my standard isPrime() function.

Putting all these changes together allowed the problem to be solved in under 5 seconds, which is just a tad better than my original 4 hours.

Of course, I won’t reveal the answer here. You’ll need to figure that out on your own. You can, however, check out my full solution on my GitHub account.

Mandelbrot

Math can be beautiful. One of the mathematical things that has always intrigued me has been the visualization of the Mandelbrot set. I never tried to generate it myself until just recently. The only understanding I had previously of the Mandelbrot set was vague. I looked for some explanation on how to do the computation, but I couldn’t find an adequately succinct explanation. My brother helped me take the Mandelbrot equation and massage it into an understandable format that I could then plug into a program to generate my own image of the Mandelbrot set.

Generating the Mandelbrot set is straight forward once you understand what the Mandelbrot set really is. The fractal depicts a grid of coordinates along the x and y axis that either fall into the set or not. The formula gets computed using the x and y position for the pixel in question, and then the resulting number get put back into the equation. The resulting number either gets larger or smaller. If it passes a certain threshold, then we decide it is not a part of the set, and move on. The resulting image is a bit map of numbers that are either part of the Mandelbrot set, or not. The fun gradients and colors, with which we are all familiar, represent how many times the calculation had to be performed before the number passed the threshold.

The Mandelbrot equation is as follows, where z_(n+1) is the value for which we are solving, (z_n)² is our accumulated value so far, and C is our complex coordinate.

z_(n+1) = (z_n)² + C

We are just focusing on the right side of the equation for now, because the left contains the values that we want to solve for. Let’s expand the right side of the equation with complex coordinates so we can simplify the answer.

= (x′ + y′ × i)² + x + y × i

The next step is to expand the accumulated value. We can use our old friend FOIL.

= x′² + 2(x′ × y′ × i) + (y′ × i)² + x + y × i

The i² simplifies down to -1, reducing the number of imaginary numbers to worry about.

= x′² + 2(x′ × y′ × i) – y′² + x + y × i

Next we will move all the components with an imaginary number (i) to the right.

= (x′² + x – y′² )+( 2(x′ × y′ × i) + y × i)

Now we can the simplify by isolating the i.

= (x′² + x – y′²)+(2 × x′ × y′ + y) × i

Now the right side of the equation resembles a complex coordinate, which happens to be what we are solving for on the left.

x′′ + y′′ × i = …

So, each element of the left hand side of the equation can be solved with its counterpart element on the right.

x′′ = x′² + x – y²

y′′ = 2 × x′ × y′ + y

i = i

Since the i is equal to i, and remains independent from the rest of the variables, we can just ignore it.

Now, all we need to do is plug x and y to generate our new x′′ and y′′, and then keep plugging in the new values until the numbers go out of bounds, or we reach our threshold.

With the equation in hand, I started out by prototyping the program in Python, and outputting a Portable PixelMap (PPM) file because it is a dead-simple file format. When I first ran the program, it generated this image:

I was really disappointed that I didn’t get a meaningful image the first time I ran it. I verified the equation I was using, and the basic logic of the code, but it still came out like this. I kicked up the threshold a little bit to see what would happen, and suddenly I realized what the problem was. I was only rendering the upper left quadrant of the image. I finagled the bounds of the image, and ended up with the recognizable shape here:

This certainly looked better, but something was still off. In all the Mandelbrot images I had seen, the different threshold levels looked smooth and curved, not jagged like mine were looking. It turned out I had misunderstood how to check if the calculation was out-of-bounds. This code is the culprit:

if (abs(xT) + abs(yT)) > 10:

This is the conditional statement that decides whether the number has fallen out-of-bounds. The key is to check if the number has left the circular area that contains the entire Mandelbrot set, which has a radius of 2. This would mean that the enclosing circle is something like x′′² + y′′² = 2². The corrected out-of-bounds check looks like this:

if ((xT**2) + (yT**2)) > 4:

With that change, the images started coming out beautifully. With that, I started adding in command line options to set the bounds of the image, the depth, the gradient colors, etc. in order to easily render custom images of the set.

Now that I did it all in Python, I decided I should do it in C. Rendering 1024 pixel square image in Python takes about 15 seconds. To compare, rendering the same image in C takes .1 seconds; substantially faster!

The following code from my function, mandel(), calculates whether any given point is in the Mandelbrot set. The function takes an x and a y coordinate, and the threshold level which I call “depth”. It returns a -1 if the number is in the set, or it will return an integer representing the number of iterations it took for the point to become out-of-bounds.

int mandel(double x, double y, int depth)
{
    double xP=0, yP=0; // Prime Values
    double xT=0, yT=0; // Temporary Values
    int i=0;
    for (i=0; i<depth; i++)
    {
        xT = (pow(xP, 2)) + x - (pow(yP, 2));
        yT = 2 * xP * yP + y;
        if (pow(fabs(xT),2) + pow(fabs(yT),2) > 4)
            return i;
        xP = xT;
        yP = yT;
    }
    return -1;
}

The current code for both the Python and C versions are available from my GitHub page.