For example, if the problem is to find a match sock in a pile of similar socks that matches a given sock, then an algorithm is to take every sock in turn and see if it matches the given sock, and if so stop. This algorithm is called a linear search.
Exercise: get two packs of cards an shuffle each one. Take one card from the first pack and search through the other pack to find the matching card. Repeat this until you get a feel for how it performs.
If the problem is putting 14 socks away in matching pairs then one algorithm is to (1) clear a place to layout the unmatched socks, (2) for each sock in turn, take it to each unmatched sock and if they match put them both away as a pair. But, if the new sock matches none of them unmatched socks, add it to the unmatched socks. I call this the "SockSort" algorithm.
Exercise. Get a pack of card and extract the Ace,2,3, .. 7 of the hearts and clubs. Shuffle them together. Now run the SockSort algorithm to pair them up by rank: Ace-Ace, 2-2, ... .
If you had an array of 7 numbered boxes you could sort the cards faster than sorting the socks. Perhaps I should put number tags on my socks?
So, changing what we are given, changes the problem, and so changes the best algorithm to solve it.
Here is another example. Problem: to find a person's name in a telephone directory. Algorithm: use your finger to split the phone book in half. Pick the half that contains the name. Repeatedly, split the piece of the phone book that contains the name in half, ... This is a binary search algorithm.
Here is another example using the same data as the previous one: Find my neighbor's phone number. Algorithm: look up my address using binary search, then read the whole directory looking for the address that is closest to mine.
If we change the given data, the problem of my neighbor's phone number has a much faster algorithm. All we need is copy of the census listing of people by street and it is easy to find my neighbors name and so phone number.
A problem best expressed in terms of
Why should I learn about algorithms?
Bjarne Stroustroup, the developer of C++, has written a very comprehensive and deep introduction to the standard C++ library as part of his classic C++ book. This book is in the CSUSB library.
As a general rule: a practical programmer starts searching the library
for their language for known algorithms.
Is there an easy way to grasp a new algorithm?
No... unless the algorithm is simple or the author very good at writing.
I find that a deck of playing cards is very helpful for sorting and searching algorithms.
A pencil and paper is useful for doing a dry run of an algorithm. With a group, chalk and chalk board is very helpful.
Programming a computer tends to take too long but is good for gathering statistics on the performance of algorithms.
When there are loops it is well worth looking for things that the body
of the loop does not change. They are called invariants. If an invariant
is true before the loop, then it will be true afterward as well. You can
often figure out precisely what an algorithm does by noting what it does not
change...
Are there any common problems that have algorithmic solutions?
The two classic problems are called Searching and Sorting. In searching we are
given a collection of objects and need to find one that passes some test. For
example: finding the student whose student Id number ends "1234". In sorting,
we are given a collection and need to reorganize it according to some rule. An
example: placing a grade roster in order of the last four digits of student
identifier,
Finding the root of an equation: is this a search or a sort?
You can't find things in your library so you decide to place the books in a particular order: is this a search or a sort?
Optimization problems are another class of problems that have attracted a lot of
attention. Here we have a large number of possibilities and want to pick the
best. For example: what dress is the most attractive in a given price range?
What is the shortest route home from the campus? What is the smallest amount of
work needed to get a B in this class?
What types of algorithm are there?
ALGORITHM to find the integer lo that is just below the square root of an integer n (√(n)). (
ALGORITHM Given a positive number a and error epsilon calculate the square root of a. (
Exercise. Code & test the above.
Can every problem be solved by an algorithm?
NO! Take CSCI546 to see what problems are unsolvable and why this is so.
As a quick example: No algorithm can exist for finding the bugs in any given
program.
Optimization problems often prove difficult to solve: consider finding the
highest point in Seattle without a map in dense fog...
How are algorithms expressed?
First they were expressed in natural language: Arabic, Latin, English, etc.
In the 1950s we used flow charts. These where reborn as Activity Diagrams in the Unified Modeling Language in the 2000s.
Boehm and Jacopini proved in the 1960's that all algorithms can constructed using three structures
Here is a sample of structured English:
ALGORITHM Sock_Sort.
clear space for unmatched socks
FOR each sock in the pile,
FOR each unmatched sock UNTIL end or a match is found
IF the unmatched sock matches the sock in your hand THEN
form a pair and put it in the sock drawer
END IF
END FOR
IF sock in hand is unmatched THEN
put it down with the unmatched socks
END IF
END FOR
END ALGORITHM
Where do algorithms appear in objects and classes?
Algorithms often appear inside the methods in a class. However some
algorithms are best expressed in terms of interacting objects. So,
a method may be defined in terms of a traditional algorithm or as
a system of collaborating objects.
You need an algorithm any time there is no simple sequence of steps to code the solution to a problem inside a function. It is wise to either write at an algorithm or use the UML to sketch out the messages passing between the objects.
Algorithms can be encapsulated inside objects. If you
create an inheritance hierarchy, you can organize a set of
objects each knowing a different algorithm (method). You can
then let the program choose its algorithm at run time.
When should I write an algorithm?
Algorithms are used in all upper division computer science classes.
I write my algorithm, in my program, as a sequence of comments.
I then add code that carries out each step -- under the comment
that describes the step informally.
How do you write algorithms?
First there is no algorithm for writing algorithms! So here are some hints.
Check out the text books in the Data Structures and Algorithms classes in the upper division of our degree programs.
John Mongan and Noah Suojanen have written "Programming Interviews exposed: Secrets to landing your next job" (Wiley 2000, ISBN 0-471-38356-2). Most chapters are about problem solving and the well known algorithms involved.
Donald Knuth 's multi-volume "Art of Computer Programming" founded the study of algorithms and how to code and analyze them. It remains an incredible resource for computer professionals. All three volumes are in our library.
Jon Bentley 's two books of "Programming Pearls" are lighter than Knuth's work but have lots of good examples and advice for programmers and good discussion of algorithms. They are in the library.
G H Gonnet and R. Baeza-Yates wrote a very comprehensive "Handbook of Algorithms and Data Structures in Pascal and C" (Addison-Wesley 1991) which still my favorite resource for detailed analysis of known algorithms. There is a copy in my office.... along with other resources.
The Association for Computing Machinery (ACM)
started publishing algorithms in a special
supplement (CALGO) in the 1960s. These are algorithms involving numbers. So
they tend to be written in FORTRAN.
Are there any algorithms on the web?
Yes -- lots. The Wikipedia, alone, has dozens of articles on
particular algorithms, on the theory of algorithms, and on classes of algorithms.
How are algorithms turned into code?
Step by step you translate each line of the algorithm into your target language.
Ideally each line in the algorithm becomes two or three lines of code. I like
to leave the steps in my algorithm visible in my code as comments.
How are algorithms rated?
You know that movies are rated by using stars or "thumbs-up". Rating an
algorithm is more complex, more scientific, and more mathematical.
In fact, we label algorithms with a formula, like this, for example:
or in short
This means: for very large values of n the search can not take more than a linear time (plus some small ignored terms) to run.
On the other hand we can show:
The above formulas tell us that if we make n large enough then binary search will be faster than linear search.
As another example, a linear search is in O(n) and the Sock-Sort (above) is in O(n squared). This means that the linear search is better than the sock_search. But in what way? This takes a little explanation.
Originally (in the 40's through to the mid-60's) we worked out the average number of steps to solve a problem. This tended to be correlated with the time. However we found (Circa 1970) two problems with this measure. First the average depended on how the data is distributed. For example, a linear search is fast if we are likely to find the target at the start of the search rather than at the end. Worse, certain sorting algorithms work quickly on average but sometimes are very slow. For example the beginner's bubble sort is fast when most of the data is in sequence and quick sort can be horribly slow on the same data.... but for other re-orderings of the same data quick sort is much better than bubble sort. So before you can work out an average you have to know the frequencies with which different data appear. Typically we don't know this distribution. Second, even when we know the distribution of input data, it is often hard to calculate the averages.
These days a computer scientist judges an algorithm by its worst case behavior, We are pessimistic, with good reason. Many of us have implemented an algorithm with a good average performance on random data and had users complain because their data is not random but in the worst case possible. This happened, for example, to Jim Bentley of AT&T [Programming Pearls above]. In other words we choose the algorithm that gives the best guarantee of speed, not the best average performance. It is also easier to figure out the worst case performance than the average -- the math is easier.
The second complication for comparing algorithms is how much data to consider. Nearly all algorithms have different times depending on the amount of data. Typically the performance on small amounts of data is more erratic than for large amounts of data. Further, small amounts of data don't make a big delays that the users will notice. So, it is best to consider large amounts of data. Luckily, mathematicians have a tool kit for working with larger numbers: asymptotic analysis. This is the calculus of what functions look like for large values of their arguments. This also simplifies the calculations because we only need to look at the higher order terms in the formula.
Finally, to get a standard "measure" of an algorithm, independent of its hardware, we need to eliminate the speed of the processor from our comparison. We do this by removing all constant factors out of our formula to give the algorithm its rating.
We talk about the order of an algorithm and use a big letter O to symbolize
the rating.
To summarize: To work out the order of an algorithm, calculate
For example, when I do a sock-search, with n socks, in the worst case I would have to layout n/2 unmatched socks before I found my first match, and finding the match I would have to look at all n/2 socks. Then I'd have to look at the remaining n/2 -1 socks to match the next one, and then n/2-2, ... So the total number of "looks" would be
or
or
Simplifying by ignoring the n/4 term and the constant (1/8) we get
Here is listing of typical ratings from best to worst...:
Table
| Name | Formula | Note |
|---|---|---|
| Constant | O(1) | |
| Logarithmic | O(log n) | Good search algorithm |
| Linear | O(n) | Bad Search algorithms |
| n log n | O(n * log n) | Good sort algorithms |
| n squared | O(n^2) | Simple sort algorithms |
| Cubic | O(n^3) | Normal matrix multiplication |
| Polynomial | O(n^p) | good solutions to bad problems |
| Exponential | O(2^n) | Not a good solution to a bad problem |
| Factorial | O(n!) | Checking all permutations of n objects. |
Here is a picture graphing some typical Os:
Notice how the worst functions may start out less than the better ones, but that they always end up being bigger.
Most algorithms for simple problems have a worst case times that are a power of n times a power of log n. The lower the powers, the better the algorithm is.
There exist thousands of problems where the best solutions we have found, however, are exponential -- O(2^n)! Why we have failed to improve on this is one of the big puzzles of Computer Science.
Helpful Page on BigO
Please read and study this page:
[ 000957.html ]
What is this Big-O Notation?
A formula like O(n^2.7) names a large family of similar functions that are
smaller than the formula for very large n (if we ignore the scale). n and
n*n are both in O(n^2.7). n^3 and exp(n) are not in O(n^2.7). Now, a clever
divide and conquer matrix multiplication algorithm is O(n^2.7) and so better
than the simple O(n^3) one for large matrices.
Big_O (asymptotic) formula are simpler and easier to work with than the original formula because we can ignore so much: instead of 2^n +n^2.7+123.4n we write 2^n or exponential.
Timing formula are expressed in terms of the size n of the data. To simplify the formulas we remove all constant factors: 123*n*n is replaced by n*n. We also ignore the lower order terms: n*n + n + 5 becomes n*n. So
To be very precise and formal, here is the classic text book definition:
This means that to show that f is in O(g) then you have to find a constant multiplier c and an number n0 so that for n bigger than n0, f(n) is less than or equal to c*g(n).
For example 123n is in O(n^2) because for all n> 123, 123n <= n^2. So, by choosing n0=123 and c=1 we have 123n <= 1 * n^2.
We say f and g are asymptotically equivalent if and only if f(n) is in O(g(n)) and g(n) is in O(f(n)).
There is another way to look at the ordering of these functions. We can look at the limit L of the ratio of the functions for large values:
In CSCI202 I expect you to take these facts on trust. Proofs will follow in the upper division courses.
Exercise: Reduce each formula to one the classic "big_O"s listed.
Answers (randomized)
Is there an efficient algorithm for every problem?
Probably not.
There are everyday problems that force you to find needles in haystacks. Problems like this force a computer to do a lot of work to solve them. You need to learn to spot these, and try to avoid them if possible.
We normally consider polynomial algorithms as efficient and non-polynomial
ones as hard. Computer scientists have discovered a large class of problems
that don't seem to have polynomial solutions, even though we can check the
correctness of the answer efficiently. A common example is to find the
shortest route that visits every city in a country in the shortest time.
This is the famous Traveling Salesperson's Problem. Warning: you can waste
a lot of time trying to find an efficient solution to this problem.
What are the important algorithms of Computer Science
. . . . . . . . . ( end of section What are the important algorithms of Computer Science) <<Contents | End>>
Review
Go back to the start of this document and look at the list of questions ...
try to write down, from memory, a short, personal, answer to each one?
Acknowledgments
Dr. Botting wants to acknowledge the excellent
help and advice given by Dr. Zemoudeh
on this document. Whatever errors remain are Dr. Botting's mistakes.