.Open Notes on Pointers These notes summarize just about everything you need to know about pointers in CS202. . See Also .See http://en.wikipedia.org/wiki/Pointers (Wikipedia) .See http://cslibrary.stanford.edu/104/ and .See ./03binky.cpp (Pointer fun with Binky) .See http://www.codeproject.com/tips/lovelypointers.asp (Only for experts who wantto know about MicroSoft C++ stuff). . Pointer as dog trained to point at the bird etc you are hunting .See http://www.pointers.uk.com/images/haysi-fantazee.jpg Computer pointers also tell your program where to find things... but they point out a place in memory not a bird in the brush. . A Problem that needs Pointers Construct a family tree storing the people, their marriages, their children, their brothers and sisters, etc. We need a way to link together related people.... to be able to find a person's mother, father, son, sister, ... .As_is dick---father--> frank ---father--->john .As_is || |married .As_is frank ---mother--->chrystabel .As_is |married .As_is dick---mother--> meg ---mother--->sarah---father--->caleb .As_is || |married .As_is meg ---father--->richard .As_is |brother-sister .As_is kath Notice that I used names ("dick", "meg", and "frank") in the diagram above. In computer memory there are no names! Instead we have addresses. We place the data in memory and must remember where we put it. So we must put `addresses` into memory as well. The addresses are special numbers. We call them `pointers`. We connect different objects by storing one object's address (a number) as part of the other object. We, therefore, start by reviewing the ideas of address and contents in computer storage.. . Introduction Pointers are hard to understand because they are part of the machine's way of doing business. They are not designed to be easy for humans. They are part of the way the CPU and Memory works. You need, therefore, to learn to imagine the memory of the computer, and how it works. You should think of computer memory or RAM a gigantic array of bytes: .List byte byte byte ... .Close.List The bytes are numbered. The number of a byte is called its address. Each address is a number that identifies a particular piece of memory. .Set Think of a phone number identifying your pager... Think of a street number identifying a house... Think of an Array of items with subscripts! .Close.Set You have already learned that a variable is an identifier that is associated (tied, bound, linked) to a piece of storage of the right size to hold the variable's values. In a running program the memory is divided into three pieces: .Set The code of the programming (Fixed) The Stack (variable, last in - first out) The Heap (see below) .Close.Set When you declare a variable like this: .As_is int remainder; the program interprets this by (1)reserving a piece of the stack and (2) using the address of of this piece of storage in place of the variable `remainder`. At the end of the block in which `remainder` was declare the storage is "popped" off the Stack ready for recycling for other temporary storage. This is a local variable. The Heap is random access storage and available for data that is not on the stack. It is not created and deleted automatically. You must explicitly reserve a piece of it. You must explicitly access it. You must make sure your program remembers where it is. You must explicitly recycle it. The heap is often highly disorganized. The parts of the heap are not bound to a fixed variable. The storage has no name. It has an address instead. We `store the address` in a special variables so that we can find the data again. .Set Think of putting a phone number on a piece of paper... Think of Directory inquiries! Think of noting a street number in an address book Think of an index variable of an Array! .Close.Set . Pointers In C and C++ there is a special kind of variable called a pointer. A pointer variable is a variable that has some storage of its own.... often on the stack. But a pointer also `stores another address`. This address is where the "real" data can be found. So a pointer has two addresses - one where it is and one where it points. It forms a link between two addresses. .Set Think of opening an address book, looking up a number and then calling the number... Think of directory inquiries: It has a number of its own and it lets you find other numbers. Imagine phoning the CS dept and asking them for my office phone number. .Close.Set You need to think of pointers as an indirect way of getting to data. Like a phone number points to a person. Like an address identifies a house, and so people inside the house. Pointers are like a finger pointing at something and saying "That thing, there ---->". Do not confuse the finger with the thing it is pointing at! For example the amount of storage needed for an address is the same whatever data is stored in that address. Think of two twins at the zoo: one points at an elephant, and the other other at an ant. The twins are the same size, but the ant and the elephant are different sizes! . C++ Pointers If you declare a pointer variable called `p` then it points at a thing written .As_is *p So in C/C++ we declare a pointer to an object of type T by writing: .As_is T *p; It says: If you put an asterisk in front of `p` you will get a T. After the above statement you've got a `dangling pointer` because it hasn't been assigned to a reliable address yet. You have three choices for safely getting an address: .Box Assign the address of a variable t of type T: .As_is p = & t; Copy an existing and valid/alive/nondangling pointer: .As_is p = another_pointer; Grab some new storage off the heap of unused memory: .As_is p = new T; .Close.Box Think about this program: .As_is #include .As_is main() .As_is { .As_is int *p; .As_is int i=12; .As_is p= & i; .As_is *p = 17; .As_is cout << "p is " << p << " and i is " << i; .As_is } If you compile it and run it then it prints out two values. The address in `p` is printed in Hexadecimal format ( "0x...") and `i` is printed and should equal 17, because changing `*p` is changing `i`. In C++ you get a new piece of heap storage of type T by: .As_is new T This is an expression whose value is an $address! However, this storage has not been given a value yet. If T is an elementary data type then you will have to initialise the new storage: .As_is int* pi = new int; *pi=42; Note: it is an error to write `pi=42;` which makes `pi` point at byte 42 in RAM! If you repeatedly grab new storage then your program will run out of memory. We call this a `memory leak`. When you assign an existing address (using `&` for example) then you don't have to worry. It is the calls to `new` that give storage that needs tidying up. Learn to assoicate every `new` with a `delete`: .Box Grab some new storage off the heap of unused memory: .As_is p = new T; .As_is ... Return the storage to the heap of unused memory: .As_is ... .As_is delete p; .Close.Box Many years ago, in a country far away from here, a friend wrote the software used to reconcile checks every night.... the software ran on minicomputers all England. It worked perfectly for three weeks and then started to fail mysteriously.... all over England. The machines started to report they had run out of memory. It was a small memory leak. My friend was very embarassed by this. It is wise to return unused storage to the heap when you have finished with it. Storage you can't access any more is called `garbage`. Losing track of such storage causes a `memory leak`. Collecting and returning useless memory is called .See Garbage Collection , and there is more on this below. Think about this program: .As_is #include .As_is main() .As_is { .As_is int *p; .As_is p=new int; .As_is *p = 17; .As_is cout << "p is " << p << " and *p is " << *p; .As_is delete p; .As_is } If you compile it and run it then it prints out two values. The address in `p` is printed in Hexadecimal format ( "0x...") and the contents (*p) is printed in decimal and should equal 17. . Pointers and Constructors If we have a class C then `new C` will use C's default constructor to initialize the contents of the new storage. The expression `new C(arguments)` calls a special constructor to initialize the object and returns the address of the object. Notice that a constructor can `also` grab heap storage. If class C has a constructor `C::C(...){.....new....}` then it grabs storage off the heap. You should also write a `destructor` with header `~C()` to delete the storage grabbed inside `C(...)`. . C++ Notation if `*p` is an object (or struct) then part of the thing it points at is written: .As_is p->name_of_part If `p` points at an object with a member function then .As_is p->name_of_function(arguments) calls the function and applies it to `*p`. . Objects vs Pointer at objects .Table .Row declare an object in class: Class_Name object_variable; .Row address of object: & object_variable .Row .Row declare pointer: Class_Name *pointer_variable; .Row .Row make default dynamic object: pointer_variable = new Class_Name; .Row make object on the heap: pointer_variable = new Class_Name(arguments); .Row .Row address of object: pointer_variable .Row The object pointed at: *pointer_variable .Row part of object object_variable.name_of_part .Row part of object pointer_variable->name_of_part .Row .Row Do something to object object_variable.name_of_function(arguments) .Row Do something to object pointer_variable->name_of_function(arguments) .Row .Row Store value in object object_variable = new_value; .Row Store value in object *pointer_variable = new_value; .Row .Row Move pointer pointer_variable = another_pointer_or_address .Close.Table Notice an object_variable is stuck to its object, but a pointer_variable can point at different objects at different times. Pointers get tricky because two or more pointers can point at the same thing! In the following program the change to *q also changes *p. But changing *q does not change q. The pointer is not the object that it points at: .As_is #include .As_is main() .As_is { .As_is int *p; .As_is p=new int; .As_is *p = 42; .As_is cout << "p is " << p << " and *p is " << *p; .As_is int *q; .As_is q = p; .As_is cout << "q is " << q << " and *q is " << *q; .As_is *q = 17; .As_is cout << "q is " << q << " and *q is " << *q; .As_is cout << "p is " << p << " and *p is " << *p; .As_is delete p; .As_is } Here is a picture of the storage just as the output is being produced: .Image pq.gif Example p[x--]-->x[42]<--q[--x] Notice: I've used `x` as a symbol for the unknown address created by .As_is p=new int; Doing this is a good way to trace programs that contain pointers. Notice I only have to delete `one` piece of new storage. I chose to do this via `delete p;`. Since `p` and `q` point at the same `int` it doesn't matter which I use.... but I mustn't do both. After deleting `p`, both `p` and `q` can not be derefenced: `*p` and `*q` can crash the program. . Addresses of Variables .Net Declare a variable of type T like this: .As_is T v; Then v is a variable and has type T. The address of variable v is written with the ampersand character (shifted 7) .As_is &v Then &v has type T* and can point at v. It can not be made to point any where else either! .As_is T * p = &v; p is a variable of type T* and can point at objects of type T. It is initially pointing at v -- because it is equal to `&v`. So *p has the same value (initially) as v has. Always, *(&v) is the same as v. Asterisks(*) cancel out ampersands(&). .Close.Net The following may help: .Box Memory is like a gigantic array of byte-sized boxes. Each box is numbered. Imagine a large number of labeled boxes... all the same size, with labels stuck on them... The labels are numbers 0,1,2,3,... The numbers are called the addresses. The boxes are memory locations. Inside each box is its contents. If you don't put something in a box you will find some junk left behind by the previous user... The compiler handles a normal declaration of a variable by writing the name of the variable on an unused boxes label, underneath the number. When the variable goes out of scope the name is erased and the box can be reused for something else. A Pointer is a box that has a number in it. This number must be the address(on a label) of another box. When the compiler sees the declaration of a pointer variable it again finds an unused box and writes the name of the pointer on it. When the program exits the block the variable's name is erased again. A program can put an address into a box used for a pointer. Then the number in the pointer's box is the number on another box. It is said to point at the other box. The '*' operator in an expression returns follows the pointer from its box to the address of stored in the pointers box. If p is the name on box number 12 for example and box 12 has the number 123 inside it (12 outside, 123 inside) then `*p` is whatever is in box 123 and `p` is 123. .Close.Box It helps to draw many pictures! .Box A pointer is actually the index of a very large array of data. It points to one piece of RAM: Random Access Memory, or Primary memory. Suppose we declare p by .As_is Type * p; then p is an index to a gigantic array RAM[0..???] of Type items and *p is short for RAM[p]. So until you make p point at something old or something new: .As_is p = & old_thing; or .As_is p = new Type; you dare not use *p or RAM[p]. .Close.Box . NULL In one thing a pointer is not an index in RAM. There is a special value that a pointer can have that is not an address of an item in RAM. It is symbolized by .As_is NULL in C and C++. It is called the the Null-pointer. It is used to indicate that there is NO data linked in at that point (think of a loose phone chord). If a pointer `p` is set to NULL then it is saying that `p` points at no data. It is said that madness destroys those who follow the NULL pointer. In other words if `p`==NULL then `*p` will crash the program. . Pointers and Vectors It is quite common to have a vector full of pointers. for example if: .As_is vector pv; then you can add new int locations to the vector: .As_is pv.push_back(new int); .As_is pv.push_back(new int); and do things with them: .As_is *(pv[0])=42; .As_is *(pv[1])=32; It is also possible to have a variable that points at a vector! .As_is vector *vp; The above declares a pointer variable, but it is a dangling pointer until we assign it to the address of an existing vector: .As_is vp = & v; or a new and empty vector of ints: .As_is vp = new vector; Here you can add `int`s to the vector that vp points at: .As_is (*vp).push_back(42); .As_is (*vp).push_back(32); or in shorthand .As_is vp->push_back(42); .As_is vp->push_back(32); There are two examples of these two techniques in .See http://cse.csusb.edu/dick/examples/pv.cpp and .See http://cse.csusb.edu/dick/examples/vp.cpp can you figure out which uses a pointer to a vector, and which uses a vector of pointers? . Pointers and Arrays First, you can have arrays that have pointers in them. Here is a common example: .As_is char * argv[] Each `argv[i]` is the address of a character. Or in short hand each `argv[i]` is a `char*`. Second, sneakily, each C++ array name is a `constant pointer`! In C/C++ all the items in an array are the same size and they are placed in memory like this: .Table Address Contents .Row name+0 name[0] .Row name+1 name[1] .Row name+2 name[2] .Row ... .Row name+length-1 name[length-1] .Close.Table Indeed for all arrays *(name+subscript) == a[subscript] In other words we can add integers to pointers to get another pointer. This is called .Key pointer arithmetic and is rather powerful. If you use the name of an array without any index then you get the address of the data, not the data itself. This means you can attach a pointer to the first item in the array like this: .As_is Item * pointer = array_name; Now, if `p` points at an item in an array then `p+1` points at the next item in the array (if it exists). C/C++ is designed so that we can use pointers to rapidly scan the items in an array: .As_is for(Item *p = array_name; p is OK; p++) do something with *p; Notice that, if your pointer goes outside the array then lizards can come out of your nose! . Data Structures with Objects With simple data like int's and doubles you can store them in a vector or array quite safely. If you have more complex data -- objects in classes or structs -- then it may be best to use arrays and vectors of pointers to the objects rather than the objects themselves. The explanation will have to wait until we cover polymorphism! .As_is vector <*Classname> vpc; .As_is (*Classname)[Size] apc; . Common and Subtle errors with Pointers There are two big problems with pointers .List (following a bad pointer): Following a dangling or NULL pointer. (memory leak): Creating undeleted garbage. (running out of heap): THis is rare with good programs and is usually the reult of a meory leak in bad programs. .Close.List The first error is usually signalled by the program going illegal -- accessing a restricted piece of memory, or the operating system crashing. The second is subtle and tends to cause problems after people have started to use the system. A memory leak occurs when you accidently create garbage and don't collect it... it then ceases be useful to the computer and ultimately you can run out of memory. For example -- my friend Tony worked on the computers that the Banks in England used to clear checks. They formed a nationwide network. He had a memory leak if one word (4 bytes) each time a chack was cleared. The network ran perfectl;y for three weeks and then machines started to report that they had no more memory -- embarrassment and panic ensured... Here is a way to work out if an operation on a pointer is safe, risky, or unsafe. One reason C++ and C programs are unreliable is the shear complexity of these rules. Pointer_states::=following .Net A pointer is in one of four states: $dead, $alive, $null, or out_of_scope. It is only when a pointer is alive that you can use it safely. The classic use of a pointer is to start dead, become alive, be used for various purposes, and then to be deleted and become dead again before the program removes it when it goes out of scope. (dead): Pointers are dead when declared and become $alive when you assign an address, or a new location or an $alive pointer. It is safe (indeed wise) to assign a NULL to a dead pointer -- it becomes a null pointer. It is safe to leave the scope (at a "}" in the program) if the pointer is dead. It is unsafe to delete or follow(dereference) a dead pointer. (alive): Alive pointers are pointing at a real safe location. You can safely follow them (derefence them). You can delete them -- and they become $dead pointers -- however there is a risk that `other` pointers will suddenly become $dead if they refer to the same storage! Assigning a new value to a pointer is risky because it can create garbage -- by losing track of the old address which then becomes a memory-leak. If you assign NULL they become NULL pointers. Assigning a dead pointer is unsafe and makes the pointer $dead. If you exit the scope of a pointer that is alive then you risk losing track of it's memory and creating garbage and a possible memory leak. (null): Null pointers can not be followed safely -- they go nowhere. They can not be deleted safely -- there is nothing to delete. But you can assign an alive pointer (or a new piece of storage) safely and they become $alive. If you assign a dead pointer they also become $dead, but this is rather a stupid thing to do! It is safe to let a null pointer go out of scope. .Close.Net Note: In some of my sample programs in this page I have some memory leaks that are safe -- you can safely exit a program with live pointers because the operating system clears the memory up, garbage and all. . Hints I have some rules about pointers: dicks_rules::=following, .Net K.I.S.S.: Keep pointers as simple as you possibly can. Encapsulate pointers in thoroughly tested, reusable classes that implement well know data structures. Don't reinvent the Wheel. In C++ use .See The Standard Library Draw lots of diagrams of the RAM. Walk through the program with a table of addresses and data, step by step. Document your data structures using the UML. Take a Data Structures Class (CS330). Take out the trash: Every `new` needs a `delete`. Never follow a dead or null pointer. .Close.Net . Pointers in the UML A pointer should be shown in a UML daigram as an association with an arrow-head. For example .Image uml.pointer.png [Example class with a pointer] Older books put and open diamond at the other end ( <>-----> ). This is is called an aggregation. It looks good but is not necessary. The commonest special links in a UML class diagram are .Image links.png [ USES, ISA, HAS, KNOWS ] Other associations are a simple line connecting two classes and should be given a descriptive name. . The Standard Library Standard C++ comes with a large library of powerful templates designed to make programming a faster and less error prone process, without slowing down the speed of the program. The library is called STL after its old name "Standard Template Library." It is used by including files with an names like this: list, stack, vector,... .See http://cse.csusb.edu/dick/cs202/stl.html You have already met `vector` and `string` from this library. .Open Dynamically Allocated Arrays . A Problem that needs Dynamic Arrays Input and store a line of users input. You have a virtual memory system with therefore a GByte of storage.... but you don't want to declare an array with 1000MBytes of storage when the user input 5 characters! The creation of vectors as part of the Standard Library for C++ hides the need for storage that appears to grow and shrink to fit the current size of the problem. . Arrays on the Heap The storage that a pointer points at is usually explicitly created and destroyed as the program runs. The size of this can be determined after the program starts running. Such dynamic data has an address that is determined after the program starts running... and must be stored in a special variable. This called a pointer variable. A pointer variable stores an address and this address contains the data. For example a dynamic array is set up like this: .Box declaration: Class_Name *pointer_variable; Get storage pointer_variable = new Class_Name[Size]; (Size can be any expression) .Close.Box It is operated on like this: .Box Put data in pointer_variable[place] = value; Operate on pointer_variable[place] . function_name(arguments); first Item pointer_variable[0] next Item pointer_variable[1] ... last Item pointer_variable[Size-1] .Close.Box You should delete a dynamically allocated array like this .Box Free storage delete [] pointer_variable; .Close.Box .Close Dynamically Allocated Arrays . Garbage Collection Nobody wants to take out the trash! In programming `garbage` is the storage that you have asked for (in C++ using `new`) and yet no longer can use because no pointer points at it any more. C++ doesn't tidy this up for you so a porogram can run out of RAM purely because it hasn't taken tidied up as it goes. There is a command that removes the storage that a pointer points at: .As_is delete p; Afterwards `p` may or may not have changed.... but the storage `p` refered to is now available for other parts of the program to use. It is wise to also clear the pointer `p` at the same time: .As_is delete p; p=NULL; Notice that if data is allocated with `new[...]` then it must be free'd with `delete [] ....;`. Do not forget the `[]`! Some of the subtlest bugs occur because of bad garbage collection. If you forget to delete storage when it is no longer needed, you have a memory leak. In time your program will break. You can buy tools that make sure that you don't have any memory leaks (Purify). .Set Incorrect rule: Each pointer must be deleted. Correct rule: Each `new` that is executed must be matched by a `delete'. .Close.Set Allocated storage, that can not be used again is called `garbage`. Only garbage can be safely deleted. If there is any pointer that is directly or indirectly attached to some allocated storage than it might get used.... so it is not garbage. Finding the garbage is called garbage collection. If any pointer points at a piece of storage, you must `not` delete the storage. When several links have been made to a node it can not be deleted until all the links are broken. Safe garbage collection in C/C++ is hard work... but is automatic in most other languages: LISP, Java, SmallTalk, Ada,.... In the old days Garbage collection was triggered off when memory ran short. These days we have "On-the-fly garbage collection". Classic War Story .Box In the early days of Artificial Intelligence the Department of Defense (DoD) sponsored an experimental system that would answer any question put to it about the armed forces of the USA. The input would be in English and a large collection of data would be searched for relevant information. If the answer was found it would be printed. Other wise the information would be assembled into a complex data structure as the software searched for an answer to the question. They used LISP 1.5 to do the programming. When the day came for testing a 5 star general marched up to the console and typed in the question: .As_is What is the function of the General Staff of the United states army. Sadly the question was not on file. The machine started to try and work out the answer. after a while its data structures had filled the memory. The General was not pleased when the computer answered his question by printing: .As_is Garbage Collection. .Close.Box . A Design Pattern for Simple Dynamic Data .Net Hide pointers inside a class. Write setters and getters for the data, not the pointers. Allocate storage in constructors for that class. Delete storage in the destructor for that class. .Close.Net .Close Notes on Pointers . See Also Notes on linked data structures: .As_is http://cse.csusb.edu/dick/cs202/linked.html .See http://cse.csusb.edu/dick/cs202/linked.html If you think you understand pointers, please see .See http://c2.com/cgi/wiki?ThreeStarProgrammers which discusses the pros and cons of using: .As_is char***ppc; Have fun... . Syntax and Semantics of pointers in C++ .Net declaration::statement= $type $asterisk $variable. .As_is int * pi; .As_is char * pc; .As_is Person * pperson; variable::=identifier. type::=`name of a struct, class, enum, typedef, or built-in data types` asterisk::="*". get_value_pointed_at_expression::= $asterisk $expression, .Key dereference the pointer. This means finding the contents of the address in p. .As_is * pi; .As_is * pc; .As_is * pperson; .Table Address Contents .Row p *p .Close.Table Take note of how an asterisk works. It converts something of type T* into something of type T. If p is a T* then *p is a T! If t is a variable of type T then &t is a constant of type T*. So: .Set *&t == t .Close.Set new_element_expression::expression= new $type. Gets a piece of memory of the right size and returns the address. new_array_expression::expression= new $type[ `expression` ]. For all pointers p and integers i, p[i]::=*(p+i). deletion_statement::statement= delete $variable. indirect_reference_to_field::expression= $variable -> field. For all pointers to structures p and fields f, p->f ::= (*p).f. .Close.Net . Glossary constructor::=a member function with the same name as the class that is called automatically when a variable is declared of that class. garbage::=storage allocated but no longer accessable after a pointer has been moved. destructor::=a member function whose name starts with a tilde and ends with the name of the class that is called automatically when the object goes out of scope.