Square roots are often used in computer to calculate distances (from Pythagorus). In graphics (e.g. gaming) the inverse square root is used to normalise 3-D vectors (i.e. make sure that between the r,g,b channels, you aren't outputting more than 1 unit of light). Unlike the distance calculation, the inverse sqrt() doesn't have to be all that accurate. There is a fast inverse sqrt() for graphics, see Origin of Quake3's fast inverse square root (http://www.beyond3d.com/content/articles/8/).
There are manuscripts from 1650BC referring to Egyptian methods of extracting square roots (see wiki square root. http://en.wikipedia.org/wiki/Square_root#History).
There is a sqrt() in the python math module. It calls the C math libraries, which call the hardware sqrt() function on your computer's floating point unit (fpu). The sqrt() then is being done in special hardware. You aren't going to get any faster than that. Despite the ready availability of a fast accurate sqrt(), we're going to use one of the simpler sqrt() methods, the Babylonian method, an algorithm well suited to hand calculation.
First we need to know the precision (number of significant figures) that python uses to represent real numbers. (What is the precision for integers [145] ?) Remember that a 32 bit computer can manipulate 232 integers (you do remember the value of 232? - it's about 10 decimal digits). We'll learn more about Real Numbers later, but for the moment we won't go far wrong if we assume that a 32 bit computer is capable of representing 232 real numbers. While there is only one integer between 0..2, there can be an infinite number of reals 0..2 and a computer can't hope to represent all of them. The computer finds the nearest one it can represent, whether it's correct or not, and say "it's near enough". If we go to 64-bits, we'll get a more accurate representation, but it still won't be exact.
Here's some code to show that floating point numbers longer than 12 digits are truncated i.e. any following numbers are garbage.
>>> for x in range (1,15): ... print x, 1+10**-x ... 1 1.1 2 1.01 3 1.001 4 1.0001 5 1.00001 6 1.000001 7 1.0000001 8 1.00000001 9 1.000000001 10 1.0000000001 11 1.00000000001 12 1.0 13 1.0 14 1.0 |
101212 is 2? [146] This is not enough accuracy for a 64 bit number and is too much for a 32 bit number. We should suspect that we've goofed. The above truncation turns out to be due to limitations of the formatting. print uses str() for outputting, which only outputs 12 digits after the decimal point. If we increase the number of digits displayed, you can get about 15 significant figures before getting garbage. (Note: we are finding the machine's epsilon).
>>> for x in range (1,20): ... print "%2d, %10.40f" %( x, 1+10**-x) ... 1, 1.1000000000000000888178419700125232338905 2, 1.0100000000000000088817841970012523233891 3, 1.0009999999999998898658759571844711899757 4, 1.0000999999999999889865875957184471189976 5, 1.0000100000000000655120402370812371373177 6, 1.0000009999999999177333620536956004798412 7, 1.0000001000000000583867176828789524734020 8, 1.0000000099999999392252902907785028219223 9, 1.0000000010000000827403709990903735160828 10, 1.0000000001000000082740370999090373516083 11, 1.0000000000100000008274037099909037351608 12, 1.0000000000010000889005823410116136074066 13, 1.0000000000000999200722162640886381268501 14, 1.0000000000000099920072216264088638126850 15, 1.0000000000000011102230246251565404236317 16, 1.0000000000000000000000000000000000000000 17, 1.0000000000000000000000000000000000000000 18, 1.0000000000000000000000000000000000000000 19, 1.0000000000000000000000000000000000000000 |
of course we hoping to get
1, 1.1000000000000000000000000000000000000000 2, 1.0100000000000000000000000000000000000000 3, 1.0010000000000000000000000000000000000000 4, 1.0001000000000000000000000000000000000000 5, 1.0000100000000000000000000000000000000000 6, 1.0000010000000000000000000000000000000000 7, 1.0000001000000000000000000000000000000000 8, 1.0000000100000000000000000000000000000000 9, 1.0000000010000000000000000000000000000000 10, 1.0000000001000000000000000000000000000000 11, 1.0000000000100000000000000000000000000000 12, 1.0000000000010000000000000000000000000000 13, 1.0000000000001000000000000000000000000000 14, 1.0000000000000100000000000000000000000000 15, 1.0000000000000010000000000000000000000000 16, 1.0000000000000001000000000000000000000000 17, 1.0000000000000000100000000000000000000000 18, 1.0000000000000000010000000000000000000000 19, 1.0000000000000000001000000000000000000000 |
Any calculation of reals using python, will only be accurate to about the 15th place. 15 decimal places requires 50 bits (here "l" is log)
# echo "15*l(10)/l(2)" | bc -l 49.82892142331043521840 |
The IEEE double precision (64 bit) representation of real numbers uses 52 bits (close enough to the 50 bits we see above) to represent the mantissa, 11 bits for the exponent and 1 bit for the sign - see IEEE 754-1985 (http://en.wikipedia.org/wiki/IEEE_floating_point_standard#Double-precision_64_bit). It looks like python is using double precision (64 bit) to represent real numbers.
You start with your number>1, and an estimate of the square root (Many algorithms require you to give it starting value(s). Often these can be almost anything.) The square root is going to be somewhere between the number and 1, so make the estimate the arithmetic mean (the average).
estimate = (number + 1.0)/2.0 |
You use this estimate to get a better estimate
new_estimate = (estimate + number/estimate)/2.0 |
How does this work? Let's find the square root of 9.
number = 9 estimate = (9+1)/2.0=5.0 new_estimate = (5 + 9/5))/2.0 = 3.4 |
We started with estimate=5.0 and get estimate=3.4, which is closer to the answer. The estimate gets closer to the known answer with each iteration (the algorithm is said to converge). When we eventually get to our answer (at least within the precision of a real), there will be no change when we plug in the estimate.
estimate = 3 new_estimate = (3 + 9/3))/2.0 = 3 |
We keep iterating till the difference between the estimate and then new_estimate is acceptable. (for the proper way to do real comparisons see Lahey compiler) We could use division to test if they are close enough; this will be slow, but will work no matter what size the number is. We could use substraction, which is fast, but the remainder will be large for large numbers and small for small numbers. Ideally we'd like to get the maximum precision possible (double precision), but for the exercise, we'll substract and use a relatively large difference to terminate the algorithm.
Before we write the code that iterates, we need to initialise some numbers
Write some code that does these 4 steps and prints out the value of number and the estimate. Here's my code [147] .
We are about to enter a loop. We don't know how many iterations are going to be needed, but we do have a test to show that we're done. Should we use a for loop or a while loop [148] ?
At the top of the while loop we test if we should enter the loop again. What's the test that we've found the square root and how do we use it in the while line? Here's my code [149]
Assume that the loop is entered (i.e. we don't have the square root yet): Before you entered the loop, when you initialised a few variables, you generated an estimate of the square root and from it a better estimate. You should use exactly the same steps to inside the while loop, to generate a better value of estimate. Heres what happens inside the loop:
Write code to do this, and inside the loop, print out the value for estimate. Here's my code [150] .
Note | |
---|---|
In a for loop, the code for the calculations is all inside the loop. In a while loop, the code for the calculations must also be ahead of the loop (where it runs once), so that the first time the conditional test in the while statement is run, all values will be known. |
Where to have the print() statement: For the final code the print() statement will be commented out, so this decision isn't all that important. You can print out values at any step in the while loop and everyone reading your code will know what you've done. The location given is slightly better, as you're printing out the value that caused the loop to be executed. After the loop exits, the last value of new_estimate will be printed by a statement in the following code. If you'd printed after the calculation of new_estimate, then the value for the first iteration of the loop would not be printed.
The only modification left is to print out the final value for the square root [151] .
The order of an algorithm e.g. O(n), O(nlogn) describes the rate at which algorithm's execution time increases with the amount of data that it processes. The calculation of the square root has no data (that can be varied), so we can't use this measure. Instead the measure of worth is the increase in running time needed to increase accuracy by one decimal digit (i.e. a factor of 10).
With a print() statement in the loop, you can see the number of iterations before the answer converges (arrives within the acceptable limit).
On Paul Hsieh's Square Root page (http://www.azillionmonkeys.com/qed/sqroot.html) you'll find that you only need 6 iterations to get 532 places (a 64 bit double precision real number) i.e. every step gets you 9 bits, a factor of 29=512, closer to your answer. This is pretty fast.
Note | |
---|---|
End Lesson 20 |
At this stage we have working code for the Babylonian sqrt(). The algorithm converges quickly (only 6 iterations are required to calculate to an accuracy of 53 bits). However saying that it "converges quickly" doesn't quantify anything (it's a phrase from marketing). You need to be able to say how quickly, using a measurement that is meaningful to just about anybody (e.g. your grandma), not just computer programmers. An obvious first test is to compare the Babylonian sqrt() to the best sqrt() you have (the math module sqrt()). If it's better, we'll do more tests, otherwise we'll just drop it.
To do speed tests on code, we need to be able to measure the execution time of a program. Python has a timer that measures wall clock time in seconds since 1 Jan 1970 (the start of unix time). (see Time access and conversions http://docs.python.org/lib/module-time.html). time() is a (apparently 32-bit) float. On some systems the resolution is only 1sec (programs that run for less than 1 sec will appear to run for 0 time). If the computer is otherwise idle, the wall clock time and the execution time of your program are nearly the same. Here's some code to measure time.
from time import time n = 1000000 start = time() #do something n times finish = time() print "execution time per iteration = %10.10f", (finish-start)/n |
A single calculation of a square root is a little fast for most timers. We usually do a large number of iterations in a loop and then divide by the number of iterations. This brings its own problems: we have to figure out how much time is taken up by the looping code, but we can handle that too (see below). To time your square root code, first make your Babylonian sqrt() into a function. Copy your Babylonian sqrt code to a new file square_root_compare.py
You now have a function. From here on, this function is just a block of code, that you'll do timing runs on. You won't be changing anything inside of it; it's just a black box.
You need a main() to call the function. Write a two-line main() to test your function - assign the variable number the value 1000, and then call babylonian_sqrt() with the parameter number. Test that your code runs.
To do timing runs on the function, main() needs to generate a whole lot different numbers to feed to your the function. Why don't you just feed the same number over and over? Compilers and interpreters are supposed to be smart. If they see you do the same operation over and over on the same data, the compiler/interpreter will recognise what's going on and will just return the same answer, without going through the bother of calculating it over and over. You don't know if your python interpreter is this smart (how would you find out [152] ), but you should expect that one day you'll bump into a smart compiler/interpreter, and your timing runs will be meaningless. You may as well start writing your benchmarking code correctly from the start.
One way of producing a whole lot of different numbers is to use all the numbers produced by range(1,largest_number), where largest_number is say 10,000.
You now have a block of code in main() that calls the function largest_number of times. You now want to time that block
largest number 1000 time/iteration 0.0000341 |
Here's my code [154] . This outputs the time required to calculate the square roots of all the numbers in the range 1..1000. This piece of timing code is a self contained block. You aren't going to change it (except for the value of largest_number)
The average time for the sqrt() calculation depends on the number of times you looped (it shouldn't, but it does). You'll find that the time/iteration depends on whether you did it 10,100 or 1,000,000 times. You need to find a range of largest_number for which the time/iteration is the fastest availalbe and relatively constant. To handle this, you need to call the timing loop with a range of values of largest_number. To do this, put the timing loop above, inside another loop, which feeds different values of largest_number into the timing loop. Putting one loop inside another is called "nesting loops".
To do this, you'll need to use a list. Go to the section explaining lists and then return here.
Create a list named largest_numbers[] holding the values 1,10,100....1000000. Read these values out, using a loop (for or while?) assigning the values one at a time to largest_number. The timing loop, being nested inside the new outer loop, will now have to be indented one step to allow it to be parsed correctly. Here's what the nested loops look like
largest_numbers=[1,10,100,1000,10000,100000,1000000] #list of values for outer loop for largest_number in largest_numbers: #new outer loop for number in range(0,largest_number): #original loop babylonian_sqrt(number) |
Here's my code [155] .
Here's my output
# ./square_root_compare.py largest number 1 time/iteration 0.0001909 largest number 10 time/iteration 0.0000348 largest number 100 time/iteration 0.0000295 largest number 1000 time/iteration 0.0000341 largest number 10000 time/iteration 0.0000398 largest number 100000 time/iteration 0.0000474 largest number 1000000 time/iteration 0.0000512 |
Look to see if there's a range of values for largest_number where time for doing a sqrt is independant of the loop size (it should be the fastest time). For largest_number having a value of 10 or more, the Babylonian square root function takes 30-50usec on my 1GHz machine. One one student's machine (MacOS), the time/interation kept dropping with in the range values for largest_number above, so I sent him off to do runs with increasing values of largest_number. Another student's machine (WinXP) reported 0 time for the first 3 runs, and presumably has a time() function with a resolution of only 1 sec.
Note | |
---|---|
End Lesson 21 |
Next we want to compare the speed of the Babylonian sqrt() to the built-in sqrt() (in the math module).
Here's the new calling code in main() [156] .
Calling range() and setting up the looping consumes cpu cycles, so we need to subtract that time from our results. Code up a 3rd timing loop that doesn't do anything, so later you can subtract that time out. Here's a loop that does nothing (you can't have an empty line, the parser gets confused).
for number in range(0,largest_number): pass |
Here's my code [157] . Here's the output
# ./square_root_compare.py babylonian_sqrt: largest number 1 time/iteration 0.0001969337 library_sqrt: largest number 1 time/iteration 0.0000259876 empty_loop: largest number 1 time/iteration 0.0000119209 babylonian_sqrt: largest number 10 time/iteration 0.0000346184 library_sqrt: largest number 10 time/iteration 0.0000043154 empty_loop: largest number 10 time/iteration 0.0000017881 babylonian_sqrt: largest number 100 time/iteration 0.0000295210 library_sqrt: largest number 100 time/iteration 0.0000031996 empty_loop: largest number 100 time/iteration 0.0000008512 babylonian_sqrt: largest number 1000 time/iteration 0.0000341501 library_sqrt: largest number 1000 time/iteration 0.0000030880 empty_loop: largest number 1000 time/iteration 0.0000007720 babylonian_sqrt: largest number 10000 time/iteration 0.0000398650 library_sqrt: largest number 10000 time/iteration 0.0000031442 empty_loop: largest number 10000 time/iteration 0.0000007799 babylonian_sqrt: largest number 100000 time/iteration 0.0000459829 library_sqrt: largest number 100000 time/iteration 0.0000033149 empty_loop: largest number 100000 time/iteration 0.0000009894 babylonian_sqrt: largest number 1000000 time/iteration 0.0000511363 library_sqrt: largest number 1000000 time/iteration 0.0000033719 empty_loop: largest number 1000000 time/iteration 0.0000010112 |
The time for the empty loop is 0.7-1.0usec. Here's the timing results, after subtracting the empty loop time.
Table 1. square root (time, usec): Babylonian and math library code
Babylonian | math library |
---|---|
30-50 | 2 |
The library sqrt() is 15-25 times faster than the Babylonian sqrt(). Even though the babylonian_sqrt() only needs 5 iterations to get the answer, the library routine gets the answer in the time of 1/3-1/5th of a loop.
Python being an interpretted language will be slower than a compiled language. As well, python is object oriented, making it slower yet. C is one of the languages of choice when speed is required. Here's the same program as above (comparing the Babylonian and math library sqrt()) written in C (it uses the low resolution timers). With your current programming knowledge, you should be able to read this code and know what it's doing. [158] . Here's the output
time/iteration: babylonian 10000000 0.00000122000000000000 time/iteration: library 10000000 0.00000010800000000000 time/iteration: empty 10000000 0.00000000700000000000 time/iteration: babylonian 100000000 0.00000135340000000000 time/iteration: library 100000000 0.00000010790000000000 time/iteration: empty 100000000 0.00000000680000000000 time/iteration: babylonian 1000000000 0.00000147792000000000 time/iteration: library 1000000000 0.00000010824000000000 time/iteration: empty 1000000000 0.00000000691000000000 |
Here's the comparison of the timing results, subtracting 7nsec for the empty loop (compared to 700nsec for the empty loop in python).
Table 2. square root (time, usec): Babylonian and math library code
Language | Babylonian | math library |
---|---|---|
python | 30-50 | 2 |
C | 1.3 | 0.11 |
ratio time Python/C | 25-40 | 20 |
C is 20-40 times faster than python, at least for doing sqrt(). So why does anyone code in python? If speed is your main requirement you don't. However it takes longer to write C than it does to write python and people who don't program a lot, can write working python without too much trouble, but may not be able to get a C program to work. If you only require a few square roots and it takes 5 mins to write it in python and 10 mins to write it in C, then you do it in python. If you need to do 1012 square roots, then you do it in C.
If your code only takes 1 min to run and you're only going to run it a couple of times, then you really don't care if it takes 10 times longer. However if you're going to be running the 1 minute program hundreds of times, or your program will take a week to run, then a factor of 10 in speed is important.
Note | |
---|---|
End Lesson 22 |
You've just completed a piece of code and have a nice story to tell. For the rest of your life, whether you're a coder or something else, you're going to have to sell what you've done to someone else. Here you have a piece of code.
It's simple to tell other coders of your work. You just walk them through the code. They'll just go "yup, yup... yup. Nice story." and go back to their work. A little harder is a technically trained person, who isn't neccessarily a python coder.
Here's what you do.
You give an introduction.
This will give enough background information so that people will know why you did the work. In this case you wanted to code up the Babylonian square root, to see how it worked, and then you tested its speed compared to the fastest available square root, the math library square root.
You need to give people time to adjust from what they were doing before they came into the room, so you can add a few things that don't require much brainwork; e.g. you could tell them where Babylon is (50miles south of Baghdad), what else Babylon is famous for (besides the square root algorithm) - the hanging gardens of Babylon, one of the 7 wonders of the ancient world; and the Laws of Hammurabi.
You give your talk.
In this case you will explain what the code does from an overall point of view, then explain each piece. You can explain the code in the order you wrote it, which makes it simple to understand
This is usually phrased as
It's a conceptually simple story with a conclusion that everyone will agree on. A presentation on this piece of code shouldn't take more than 5-10mins.
Note | |
---|---|
End Lesson 23 |