Computer pointers also tell your program where to find things... but they point out a place in memory not a bird in the brush.
We need a way to link together related people.... to be able to find a person's mother, father, son, sister, ...
dick---father--> frank ---father--->john
|| |married
frank ---mother--->chrystabel
|married
dick---mother--> meg ---mother--->sarah---father--->caleb
|| |married
meg ---father--->richard
|brother-sister
kath
Notice that I used names ("dick", "meg", and "frank") in the diagram above. In computer memory there are no names! Instead we have addresses. We place the data in memory and must remember where we put it. So we must put addresses into memory as well.
The addresses are special numbers. We call them pointers. We connect different objects by storing one object's address (a number) as part of the other object. We, therefore, start by reviewing the ideas of address and contents in computer storage.
You should think of computer memory, or RAM, as a gigantic array of bytes:
You have already learned that a variable is an identifier that is associated (tied, bound, linked) to a piece of storage of the right size to hold the variable's values.
In a running program the memory is divided into three pieces:
When you declare a variable like this:
int remainder;
the program interprets this by (1) reserving a piece of the stack and (2) using the address of this piece of storage in place of the variable remainder. At the end of the block in which remainder was declared the storage is "popped" off the Stack, ready for recycling for other temporary storage.
This is a local variable.
The Heap is random access storage and available for data that is not on the stack. It is not created and deleted automatically. You must explicitly reserve a piece of it. You must explicitly access it. You must make sure your program remembers where it is. You must explicitly recycle it. The heap is often highly disorganized.
The parts of the heap are not bound to a fixed variable. The storage has no name. It has an address instead. We store the address in a special variable so that we can find the data again.
You need to think of pointers as an indirect way of getting to data. Like a phone number points to a person. Like an address identifies a house, and so people inside the house. Pointers are like a finger pointing at something and saying "That thing, there ---->".
Do not confuse the finger with the thing it is pointing at!
For example, the amount of storage needed for an address is the same whatever data is stored at that address. Think of two twins at the zoo: one points at an elephant, and the other at an ant. The twins are the same size, but the ant and the elephant are different sizes!
*p
So in C/C++ we declare a pointer to an object of type T by writing:
T *p;
It says: If you put an asterisk in front of p you will get a T.
After the above statement you've got a dangling pointer because it hasn't been assigned to a reliable address yet. You have three choices for safely getting an address:
p = & t;
p = another_pointer;
p = new T;
#include <iostream>
using namespace std;
int main()
{
    int *p;
    int i=12;
    p= & i;     // p now holds the address of i
    *p = 17;    // stores 17 in i, by way of p
    cout << "p is " << p << " and i is " << i << endl;
    return 0;
}
If you compile it and run it then it prints out two values. The address in p is printed in hexadecimal format ("0x...") and i is printed and should equal 17, because changing *p is changing i.
In C++ you get a new piece of heap storage of type T by:
new T
This is an expression whose value is an address! However, this storage has not been given a value yet. If T is an elementary data type then you will have to initialise the new storage:
int* pi = new int; *pi=42;
Note: it is an error to write pi=42; because that tries to make pi point at byte 42 in RAM instead of storing 42 in the new int, and a C++ compiler will reject it.
If you repeatedly grab new storage without giving it back then your program will eventually run out of memory. We call this a memory leak. When you assign an existing address (using & for example) then you don't have to worry. It is the calls to new that give storage that needs tidying up. Learn to associate every new with a delete:
p = new T;
...
...
delete p;
It is wise to return unused storage to the heap when you have finished with it. Storage you can't access any more is called garbage. Losing track of such storage causes a memory leak. Collecting and returning useless memory is called [ Garbage Collection ] , and there is more on this below.
Think about this program:
#include <iostream>
using namespace std;
int main()
{
    int *p;
    p=new int;  // grab an int from the heap
    *p = 17;
    cout << "p is " << p << " and *p is " << *p << endl;
    delete p;   // return the int to the heap
    return 0;
}
If you compile it and run it then it prints out two values. The address in p is printed in hexadecimal format ("0x...") and the contents (*p) is printed in decimal and should equal 17.
Notice that a constructor can also grab heap storage. If class C has a constructor C::C(...){.....new....} then it grabs storage off the heap. You should also write a destructor with header ~C() to delete the storage grabbed inside C(...).
p->name_of_part
If p points at an object with a member function then
p->name_of_function(arguments)
calls the function and applies it to *p.
Purpose | Syntax |
---|---|
Declare an object in class | Class_Name object_variable; |
Address of object | & object_variable |
Declare pointer | Class_Name *pointer_variable; |
Make default dynamic object | pointer_variable = new Class_Name; |
Make object on the heap | pointer_variable = new Class_Name(arguments); |
Address of object | pointer_variable |
The object pointed at | *pointer_variable |
Part of object | object_variable.name_of_part |
Part of object | pointer_variable->name_of_part |
Do something to object | object_variable.name_of_function(arguments) |
Do something to object | pointer_variable->name_of_function(arguments) |
Store value in object | object_variable = new_value; |
Store value in object | *pointer_variable = new_value; |
Move pointer | pointer_variable = another_pointer_or_address |
Pointers get tricky because two or more pointers can point at the same thing! In the following program the change to *q also changes *p. But changing *q does not change q. The pointer is not the object that it points at:
#include <iostream>
using namespace std;
int main()
{
    int *p;
    p=new int;
    *p = 42;
    cout << "p is " << p << " and *p is " << *p << endl;
    int *q;
    q = p;      // q now holds the same address as p
    cout << "q is " << q << " and *q is " << *q << endl;
    *q = 17;    // changes the one int that both p and q point at
    cout << "q is " << q << " and *q is " << *q << endl;
    cout << "p is " << p << " and *p is " << *p << endl;
    delete p;
    return 0;
}
Here is a picture of the storage just as the output is being produced:
Notice: I've used x as a symbol for the unknown address created by
p=new int;
Doing this is a good way to trace programs that contain pointers.
Notice I only have to delete one piece of new storage. I chose to do this via delete p;. Since p and q point at the same int it doesn't matter which I use.... but I mustn't do both. After deleting p, neither p nor q can be dereferenced: using *p or *q can crash the program.
T v;
&v
T * p = &v;
The following may help:
The numbers are called the addresses. The boxes are memory locations. Inside each box is its contents. If you don't put something in a box you will find some junk left behind by the previous user...
The compiler handles a normal declaration of a variable by writing the name of the variable on an unused box's label, underneath the number. When the variable goes out of scope the name is erased and the box can be reused for something else.
A Pointer is a box that has a number in it. This number must be the address (on a label) of another box. When the compiler sees the declaration of a pointer variable it again finds an unused box and writes the name of the pointer on it. When the program exits the block the variable's name is erased again.
A program can put an address into a box used for a pointer. Then the number in the pointer's box is the number on another box. It is said to point at the other box.
The '*' operator in an expression follows the pointer from its box to the box whose address is stored in the pointer's box. If p is the name on box number 12 for example and box 12 has the number 123 inside it (12 outside, 123 inside) then *p is whatever is in box 123 and p is 123.
Type * p;
then
So until you make p point at something old or something new:
p = & old_thing;
or
p = new Type;
you dare not use *p or RAM[p].
NULL
in C and C++. It is called the Null-pointer. It is used to indicate that there is NO data linked in at that point (think of a loose phone cord).
If a pointer p is set to NULL then it is saying that p points at no data.
It is said that madness destroys those who follow the NULL pointer. In other words if p==NULL then using *p has undefined behavior and will usually crash the program.
vector<int*> pv;
then you can add new int locations to the vector:
pv.push_back(new int);
pv.push_back(new int);
and do things with them:
*(pv[0])=42;
*(pv[1])=32;
It is also possible to have a variable that points at a vector!
vector<int> *vp;
The above declares a pointer variable, but it is a dangling pointer until we assign it to the address of an existing vector:
vp = & v;
or a new and empty vector of ints:
vp = new vector<int>;
Here you can add ints to the vector that vp points at:
(*vp).push_back(42);
(*vp).push_back(32);
or in shorthand
vp->push_back(42);
vp->push_back(32);
There are examples of these two techniques in [ pv.cpp ] and [ vp.cpp ]. Can you figure out which uses a pointer to a vector, and which uses a vector of pointers?
char * argv[]
Each argv[i] is the address of a character. Or, in shorthand, each argv[i] is a char*.
Second, sneakily, each C++ array name acts as a constant pointer to the first item in the array!
In C/C++ all the items in an array are the same size and they are placed in memory like this:
Address | Contents |
---|---|
name+0 | name[0] |
name+1 | name[1] |
name+2 | name[2] |
... | |
name+length-1 | name[length-1] |
In other words we can add integers to pointers to get another pointer. This is called pointer arithmetic and is rather powerful.
If you use the name of an array without any index then you get the address of the data, not the data itself. This means you can attach a pointer to the first item in the array like this:
Item * pointer = array_name;
Now, if p points at an item in an array then p+1 points at the next item in the array (if it exists). C/C++ is designed so that we can use pointers to rapidly scan the items in an array:
for(Item *p = array_name; p is OK; p++) do something with *p;
Notice that, if your pointer goes outside the array then lizards can come out of your nose!
vector <*Classname> vpc;
(*Classname)[Size] apc;
The first error is usually signalled by the program going illegal -- accessing a restricted piece of memory, or the operating system crashing.
The second is subtle and tends to cause problems after people have started to use the system. A memory leak occurs when you accidentally create garbage and don't collect it... it then ceases to be useful to the computer and ultimately you can run out of memory.
For example -- my friend Tony worked on the computers that the Banks in England used to clear checks. They formed a nationwide network. He had a memory leak of one word (4 bytes) each time a check was cleared. The network ran perfectly for three weeks and then machines started to report that they had no more memory -- embarrassment and panic ensued...
Here is a way to work out if an operation on a pointer is safe, risky, or unsafe. One reason C++ and C programs are unreliable is the sheer complexity of these rules.
It is only when a pointer is alive that you can use it safely.
The classic use of a pointer is to start dead, become alive, be used for various purposes, and then to be deleted and become dead again before the program removes it when it goes out of scope.
(dead): Pointers are dead when declared and become alive when you
assign an address, or a new location or an alive pointer. It is safe (indeed wise) to
assign a NULL to a dead pointer -- it becomes a null pointer. It is
safe to leave the scope (at a "}" in the program) if the pointer
is dead. It is unsafe to delete or follow(dereference) a dead pointer.
(alive): Alive pointers are pointing at a real safe location. You can
safely follow them (dereference them). You can delete them -- and they
become dead pointers -- however there is a risk that
other pointers will suddenly become dead if they refer to the
same storage! Assigning a new value to a pointer is risky because it
can create garbage -- by losing track of the old address which then
becomes a memory-leak. If you assign NULL they become NULL pointers.
Assigning a dead pointer is unsafe and makes the pointer dead. If
you exit the scope of a pointer that is alive then you risk losing
track of its memory and creating garbage and a possible memory leak.
(null): Null pointers can not be followed safely -- they go nowhere.
Deleting a null pointer is safe in C++ but does nothing -- there is
nothing to delete. You can assign an alive pointer (or a new piece of
storage) safely and they become alive. If you assign a dead pointer they
also become dead, but this is rather a stupid thing to do! It is safe to
let a null pointer go out of scope.
Note: In some of my sample programs in this page I have some memory leaks that are safe -- you can safely exit a program with live pointers because the operating system clears the memory up, garbage and all.
Older books put an open diamond at the other end ( <>-----> ). This is called an aggregation. It looks good but is not necessary.
The commonest special links in a UML class diagram are
Other associations are a simple line connecting two classes and should be given a descriptive name.
The creation of vectors as part of the Standard Library for C++ hides the need for storage that appears to grow and shrink to fit the current size of the problem.
A pointer variable stores an address and this address contains the data. For example a dynamic array is set up like this:
delete p;
Afterwards p may or may not have changed.... but the storage p referred to is now available for other parts of the program to use. It is wise to also clear the pointer p at the same time:
delete p; p=NULL;
Notice that if data is allocated with new[...] then it must be freed with delete [] ....;. Do not forget the []!
Some of the subtlest bugs occur because of bad garbage collection. If you forget to delete storage when it is no longer needed, you have a memory leak. In time your program will break. You can buy tools that make sure that you don't have any memory leaks (Purify).
Allocated storage that can not be used again is called garbage. Only garbage can be safely deleted. If there is any pointer that is directly or indirectly attached to some allocated storage then it might get used.... so it is not garbage. Finding the garbage is called garbage collection. If any pointer points at a piece of storage, you must not delete the storage. When several links have been made to a node it can not be deleted until all the links are broken. Safe garbage collection in C/C++ is hard work... but is automatic in most other languages: LISP, Java, Smalltalk, Ada,.... In the old days garbage collection was triggered off when memory ran short. These days we have "On-the-fly garbage collection".
Classic War Story
When the day came for testing a 5 star general marched up to the console and typed in the question:
What is the function of the General Staff of the United States Army?
Sadly the question was not on file. The machine started to try and work out the answer. After a while its data structures had filled the memory.
The General was not pleased when the computer answered his question by printing:
Garbage Collection.
http://cse.csusb.edu/dick/cs202/linked.html[ linked.html ]
If you think you understand pointers, please see [ wiki?ThreeStarProgrammers ] which discusses the pros and cons of using:
char***ppc;
Have fun...
int * pi;
char * pc;
Person * pperson;
* pi;
* pc;
* pperson;
Address | Contents |
---|---|
p | *p |