Sunday, January 26, 2014

Pointers in C/C++

Before understanding pointers, it is necessary to understand variables.

Variable declaration
To store any value (for example, the integer value 102) in memory, you need a variable. You need to declare the variable before using it. The declaration tells the compiler what kind of value you want to store. It also tells the compiler that some space in memory has to be allocated for the variable you declare.
Example of variable declaration:

int num;
char ch;

In this example, two variables are declared: num and ch. The type specifier int tells the compiler that num will store an integer, and char tells it that ch will store a character.

Scope of variables
Variables have scope: global or local. A global variable has the same lifespan as the program, i.e. it remains valid for the whole lifetime of the program. The memory allocated for a local variable is valid only for the duration of the call to the function it is defined in.

int aNum;
void func(); // declare func here so that main() can call it
int main()
{
  func();
  func();
  return 0;
}
void func()
{
  int x = 100;
}

In this example, aNum is a global variable. It exists for the lifetime of the program. The variable declared in func() (int x) remains valid only while func is executing. Also note that func is called twice from the main function. The variable x is allocated separately for each call. The x allocated in the first call is destroyed when that call finishes, and in the next call x is allocated again.
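Here is a small sketch of that difference. The names call_count and bump are made up for this illustration; they are not part of the example above. The global keeps its value between calls, while the local starts from 100 again every time.

```c
int call_count = 0;      /* global: one copy that lives as long as the program */

/* Hypothetical helper: increments a fresh local and the shared global. */
int bump(void)
{
    int x = 100;         /* local: created and initialised again on every call */
    x++;                 /* this change is lost when the function returns */
    call_count++;        /* this change survives, because call_count is global */
    return x;
}
```

Calling bump() twice returns 101 both times, but call_count ends up as 2.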


Value of variables
Any variable you declare in C/C++ has some value. You assign a value to the variable before using it. If you do not assign any value to a local variable, it holds some random (junk) value. (Global variables are the exception: they start out as zero.) You should never rely on a junk value. Always assign a value to a variable before reading it.

In summary:

  • You have to declare a variable before using it
  • Variables have values. If you do not explicitly assign a value to a local variable, it contains a random junk value
  • There is some memory allocated for each variable you declare in your program
  • Each variable has some address in memory
  • Variables have scope: global or local


Pointer

A pointer is a variable like those discussed above, but its value has a special meaning: it holds the address of some other variable. Whenever you use a pointer, there are two concepts involved:

  • Pointer variable
  • Pointed memory (whose address is the same as the value of the pointer)

[Diagram: pointer and pointed memory]
The value of a pointer variable is the address of the pointed memory. You can assign any value of the pointed type to the pointed memory. The size of a pointer variable is fixed, but the size of the data at the pointed memory varies with its type.

Declaration of pointers

int* ptr;
char* cptr;

Pointers are declared as type_name * pointer_name; The type_name specifies the type of the pointed memory. In this example, ptr will point to integer data. That means ptr is a pointer and its value is the address of memory which contains an integer. Similarly, cptr will point to character data.


Assigning values to pointer

There are several ways to assign a value to a pointer.

Assign with address of another variable

int* ptr; // declare a pointer variable
int x = 100; // declare an integer variable and assign it the value 100
ptr = &x; // assign the address of variable x to the pointer
// Notice that the & operator is used to get the address of a variable

[Diagram: pointer pointing to a variable]

In this example, the pointer ptr is pointing to the variable x, i.e. the value of ptr is the address of x. The address of x will vary depending on where it is allocated in memory. The diagram uses 0xABC just for simplicity.

Assign with value of other pointer

int* ptr;
int* ptr2;
int x = 100;
ptr = &x; // assign the address of variable x to ptr
ptr2 = ptr; // ptr2 gets the value of ptr, which is the address of variable x


In this example, we first assign the address of x to the pointer ptr and then assign the value of ptr to ptr2. After that assignment, both ptr and ptr2 point to the variable x.
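One way to convince yourself that both pointers hold the same address is to compare them. This is only a sketch; the function name both_point_to_x is invented for the illustration.

```c
/* Hypothetical check: returns 1 if ptr and ptr2 hold the same address
   (the address of x), and 0 otherwise. */
int both_point_to_x(void)
{
    int x = 100;
    int *ptr = &x;       /* ptr holds the address of x */
    int *ptr2 = ptr;     /* ptr2 gets a copy of that address */

    /* Comparing two pointers with == compares the addresses they hold. */
    return (ptr == ptr2) && (ptr2 == &x);
}
```

Copying a pointer copies only the address. There is still a single x in memory.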

Assign with memory allocated from heap

int* ptr;
ptr = malloc(sizeof(int)); // malloc (declared in <stdlib.h>) allocates memory from the heap
// and returns the address of the allocated memory


In this example, a new block of memory is allocated from the heap and its address is assigned to ptr. We will discuss heap memory in more detail later in the post.
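malloc can fail, in which case it returns NULL instead of an address. A slightly more careful version of the allocation might look like the sketch below. The helper name make_int is an assumption made for this example, not a standard function.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate one int on the heap and store value in it.
   Returns NULL if the allocation fails. The caller must free() the result. */
int *make_int(int value)
{
    int *ptr = malloc(sizeof *ptr);  /* room for exactly one int */
    if (ptr == NULL) {
        return NULL;                 /* no memory available */
    }
    *ptr = value;                    /* store the value in the new memory */
    return ptr;
}
```

A caller would write int* p = make_int(42);, check p against NULL, use it, and later call free(p).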


Using pointers

Pointers can be used in any expression in two ways:

  • Use the pointer simply by its name. As a pointer is a variable, using it by its name uses its value, i.e. an address.
  • Use it as *pointerName: this uses the value at the pointed memory

int x = 100;
int* ptr;
ptr = &x;
printf("%p", (void*)ptr); // This will print an address (address of x)
printf("%d", *ptr); // This will print the value at the pointed memory, i.e. the value of x, i.e. 100.
// Notice that the * operator is used to access the data at the pointed memory. Do not confuse
// it with the pointer declaration. In a declaration, * indicates that the variable is a
// pointer; when the pointer is used in an expression,
// * gets the value at the pointed memory location


Changing the value at pointed memory
You can change the value at the pointed memory using the pointer variable. For example:

int x = 100;
int* ptr = &x;
*ptr = 200; // This changes the value at pointed address to 200

In the last line of this example, we change the value at the pointed memory. Since ptr is pointing to x, this changes the value of x. So the value of x becomes 200 after the last line executes.
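This is the reason pointers are often passed to functions: the function can change variables that belong to its caller. The classic illustration is a swap function. The name swap_ints is made up for this sketch.

```c
/* Hypothetical helper: exchange the values stored at the two addresses. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;   /* read the value at the first address */
    *a = *b;        /* overwrite it with the value at the second address */
    *b = tmp;       /* store the saved value at the second address */
}
```

With int x = 1, y = 2; the call swap_ints(&x, &y); leaves x equal to 2 and y equal to 1.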


Unassigned Pointer
What will happen if you do not assign any value to a pointer and try to access the pointed memory? A pointer is a variable, so, as discussed at the beginning of this post, it contains some random value if it is not assigned. That means the pointer points to some random address. If you try to access the pointed memory, then:

  • You may get an access violation. Not all memory addresses are accessible to a program. Please note that you can get the value of the pointer (the address) in this case; the problem is accessing the value at the pointed memory
  • You may get/set a value at some random memory in your program. This will result in memory corruption.


Null Pointer
If you do not assign any value to a pointer, it points to some random memory. But there are cases where you create a pointer variable and have no address to assign yet, or you will assign one later. In such cases, set the pointer to NULL. NULL is a special value that indicates the pointer does not point to anything. In your program, you can check whether a pointer is NULL or not. This is the proper way to handle a pointer whose value is not set yet.

int* ptr = NULL;
if (ptr == NULL) {
  // you can check if the pointer is set to NULL then do something
}

If you try to access the value at the pointed memory through a null pointer, the behavior is undefined. On most systems the program crashes with an access violation.
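A common pattern is to check for NULL before touching the pointed memory. The following is only a sketch; value_or is a hypothetical name chosen for the example.

```c
#include <stddef.h>

/* Hypothetical helper: return the pointed value, or fallback when the
   pointer is NULL and there is nothing to read. */
int value_or(const int *ptr, int fallback)
{
    if (ptr == NULL) {
        return fallback;   /* never dereference a null pointer */
    }
    return *ptr;
}
```

Note that this check only protects against NULL. It cannot detect a pointer that holds a random or stale address.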


Pointer to a pointer
A pointer can point to the memory of any variable, so it can also point to another pointer. In this case, the pointed memory is itself a pointer.

int x = 100;
int* ptr = &x;
int** ptr2 = &ptr; // Pointer to a pointer. Declared using **
// Using ptr2
// ptr2 will give the value of ptr2 i.e. the address of ptr
// *ptr2 will give the value at the memory pointed to by ptr2 i.e. the value of ptr i.e. the address of x
// **ptr2 is equivalent to *ptr i.e. the value of x
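A typical use of a pointer to a pointer is a function that needs to change where the caller's pointer points. The sketch below uses an invented function name, retarget.

```c
/* Hypothetical helper: make the caller's pointer point to new_target.
   pp is the address of the caller's pointer, so *pp is that pointer. */
void retarget(int **pp, int *new_target)
{
    *pp = new_target;
}
```

With int a = 1, b = 2; and int* p = &a; the call retarget(&p, &b); leaves p pointing to b, so *p reads 2.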


Accessing array using pointer

int main()
{
  int arr[4]; // create an array with four elements
  int* ptr = arr; // create a pointer ptr and point to the array's first element
  *ptr = 10; // set the pointed memory (first element of the array) to 10
  ptr++; // change the pointer to point to the next element in the array
  *ptr = 20; // change the pointed memory (second element in array) to 20
}


You can set the value of a pointer to the address of an array. In this example, int* ptr = arr; assigns the address of the array 'arr' to ptr. After this statement executes, ptr points to the first element of the array. The ++ on ptr makes the pointer point to the next element in the array.

Please note that the ++ operator behaves differently on a pointer than on an integer type. On an integer, ++ increments the value by one, but on a pointer it increments the address by the size of the pointed type. In this example, ptr is a pointer to int (declared as int* ptr). On a system where the size of int is 4 bytes (the usual case on 32-bit systems), ++ on ptr increments its value by 4. That is why the ++ operator makes the pointer point to the next element in the array. On such a system, arr[4] occupies 16 bytes. The first element starts at offset 0 of those 16 bytes, the second element starts at offset 4, and so on, so increasing the address by four moves the pointer to the next element of the array.
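The step size can be measured rather than assumed. In the sketch below, step_in_bytes reports how far ptr + 1 is from ptr in bytes, and sum_ints walks an array with ++. Both function names are invented for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: distance in bytes between an int pointer and the
   next element. Casting to char* makes the subtraction count bytes. */
size_t step_in_bytes(void)
{
    int arr[4] = {0};
    int *ptr = arr;
    int *next = ptr + 1;
    return (size_t)((char *)next - (char *)ptr);
}

/* Hypothetical helper: add up count ints, moving the pointer with ++. */
int sum_ints(const int *ptr, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += *ptr;   /* read the current element */
        ptr++;           /* advance by one element, not by one byte */
    }
    return total;
}
```

step_in_bytes() always equals sizeof(int), whatever that is on your system, which is the point: the compiler scales the increment by the pointed type for you.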

In the previous example, we assigned the address of an array to the pointer and manipulated the elements of the array. We can also allocate some memory from the heap, treat the allocated memory as an array, and manipulate it in the same way.

int main()
{
  int* arr = malloc(4*sizeof(int)); // allocate memory from the heap to hold
  // four contiguous integers. You can think of it as a dynamically allocated array
  int* ptr = arr; // keep arr unchanged so that the memory can be freed later
  *ptr = 10; // set value of first element of dynamically allocated array
  ptr++; // point to next element
  *ptr = 20; // set value of second element of dynamically allocated array
  free(arr); // release the memory using the address that malloc returned
}


Heap memory
Each program has a pool of memory called heap memory. Whenever you need some memory, you can allocate a part of the heap and use it. When you are done with the allocated memory, you can free it. It then becomes unallocated and can later be reused by another part of the program. The benefit of heap memory is that it is allocated on demand. Like a global variable, heap memory can be accessed from anywhere in the program, as long as you have its address. In contrast, global variables are allocated when the program starts and remain valid till the end of the program. You cannot free global variables.
Example of using heap memory:

int* getMem()
{
  int* ptr = malloc(sizeof(int)); // allocate memory (from the heap) big enough for one integer
  return ptr;
}
int main()
{
  int* p = getMem();
  *p = 100; // Use the allocated memory
  free(p); // Now free the allocated memory
  return 0;
}


Dangling pointer

int main()
{
  int* ptr = malloc(sizeof(int)); // allocate memory from the heap and store its address in ptr
  *ptr = 100; // set the value at the pointed memory
  free(ptr); // free the pointed memory. Now ptr does not point to valid memory
  *ptr = 200; // try to set the value at the pointed memory. This is an ERROR: you are setting
  // a value in memory which has already been freed
}

In this example, the free(ptr) call makes the memory allocator deallocate the memory pointed to by ptr. If you access the same memory after freeing it, the result is unexpected behavior. You will probably corrupt some memory. After the memory is freed, the pointer points to invalid memory (since it has been deallocated). This type of pointer is called a dangling pointer. You can set the pointer to some valid memory again to make it non-dangling. But never use a dangling pointer.
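One habit that helps is to set a pointer to NULL right after freeing it, so that a later NULL check catches the mistake. The sketch below wraps that habit in a helper. The name release_int is hypothetical, and the helper assumes that pp itself is a valid address.

```c
#include <stdlib.h>

/* Hypothetical helper: free the int that *pp points to, then set the
   caller's pointer to NULL so that it no longer dangles.
   Assumes pp itself is not NULL. */
void release_int(int **pp)
{
    free(*pp);     /* free(NULL) is allowed and does nothing */
    *pp = NULL;    /* the caller's pointer is now clearly "not set" */
}
```

This only resets the one pointer you pass in. Any other copies of the same address elsewhere in the program are still dangling.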


Returning address of local variable

int* func()
{
  int x = 100;
  int* ptr = &x;
  return ptr; // return the address of x. x becomes invalid after the function
  // returns, so the address returned by this function is invalid
  // as soon as the function returns
}
int main()
{
  int* ptr = func();
  *ptr = 100; // Try to use invalid pointer!!
}

In this example, the function func returns the address of a local variable. The local variable becomes invalid when the function returns, so you cannot manipulate it afterwards. The main() function tries to set the value at the pointed memory (which is invalid at that point), which results in undefined behavior.
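There are two common ways to avoid this problem, sketched below with invented function names. Either the caller provides the memory and the function fills it in, or the function returns heap memory, which stays valid until it is freed.

```c
#include <stdlib.h>

/* Option 1 (hypothetical name): the caller owns the int. The function
   only writes through the address it was given. */
void fill_value(int *out)
{
    *out = 100;
}

/* Option 2 (hypothetical name): return heap memory. It outlives the
   function call, and the caller must free() it. Returns NULL on failure. */
int *new_value(void)
{
    int *ptr = malloc(sizeof *ptr);
    if (ptr != NULL) {
        *ptr = 100;
    }
    return ptr;
}
```

The first option needs no cleanup. The second hands the caller the responsibility to free the memory.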


Memory leak

void func()
{
  int* ptr = malloc(sizeof(int)); // allocate memory
  *ptr = 100; // set the value at the allocated memory
}
int main()
{
  func();
}

In this example, func() allocates some memory, uses it, and then returns without freeing it. Once the function has returned, we have lost the address of the allocated memory. The address was stored in a local pointer variable (ptr), and that local variable is destroyed when the function returns. After that, we have no way to access the allocated memory. This situation is called a memory leak. In general: if you have allocated some memory, will never use it again, and do not free it, you have a memory leak.
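The fix is to free the memory before the last pointer to it goes away. A minimal sketch, with the hypothetical name use_and_free:

```c
#include <stdlib.h>

/* Hypothetical leak-free variant: allocate, use, and free the memory
   inside the same function. Returns -1 if the allocation fails. */
int use_and_free(void)
{
    int *ptr = malloc(sizeof *ptr);
    if (ptr == NULL) {
        return -1;
    }
    *ptr = 100;           /* use the allocated memory */
    int result = *ptr;    /* copy out what we need before freeing */
    free(ptr);            /* give the memory back to the heap */
    return result;
}
```

If the memory has to outlive the function, return the pointer to the caller instead and free it there.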


Pointer example in linked list
Pointers are heavily used in almost all data structures. Here is a simple example of how they are used in a linked list. First of all, declare a structure to represent a node of the linked list:

struct Node {
  int data;
  struct Node* next;
};

The Node struct represents a node of the linked list. It contains a data member of type int. The data member can be of any type, depending on your requirements. The second member is a pointer to Node. This pointer can point to an object of type Node, which lets one node of the linked list point to another node.

A linked list contains a series of nodes arranged in a linear fashion: each node points to the next, and the last one points to NULL. The NULL pointer indicates that there is no node after it. Typically, the memory for all nodes of a linked list is dynamically allocated from heap memory. We also need a pointer that points to the first node of the linked list. Let's declare that pointer.

struct Node* first = NULL;

Here, first is a pointer, so it holds only the address of memory which contains data of type Node. Declaring it like this allocates only the memory for storing the address of a linked list node. At this point no memory is allocated for the node itself. Now create the first node of the linked list.

first = malloc(sizeof(struct Node));
first->data = 1; // Notice the -> operator. If a pointer is pointing to a structure object
// and you want to access members of the structure using the pointer, then the -> operator is used.
first->next = NULL;

After these statements execute, the first pointer points to dynamically allocated memory. That memory stores a node of the linked list. After allocating it, we assigned the data part of the node and set the next pointer to NULL to indicate that there is no node after this one. Let's create another node in the linked list.

struct Node* second = malloc(sizeof(struct Node));
second->data = 2;
second->next = NULL;
first->next = second;

After these lines execute, memory is allocated for the second node of the list. We set its data and set its next pointer to NULL to indicate there is no node after it. Then we set the next pointer of the first node to the address of the second node.

After creating two nodes, the list looks like this: first points to the node holding 1, its next pointer points to the node holding 2, and the next pointer of that node is NULL.
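Once a list has more than a couple of nodes, you usually walk it with a loop instead of naming every node. The sketch below repeats the Node structure so that it compiles on its own. The helpers push_front, list_length and free_list are hypothetical names chosen for this example.

```c
#include <stdlib.h>

struct Node {
    int data;
    struct Node *next;
};

/* Hypothetical helper: put a new node in front of head and return the new
   first node. If malloc fails, the list is returned unchanged. */
struct Node *push_front(struct Node *head, int data)
{
    struct Node *node = malloc(sizeof *node);
    if (node == NULL) {
        return head;
    }
    node->data = data;
    node->next = head;   /* the old first node becomes the second node */
    return node;
}

/* Hypothetical helper: count the nodes by following next until NULL. */
int list_length(const struct Node *node)
{
    int count = 0;
    while (node != NULL) {
        count++;
        node = node->next;
    }
    return count;
}

/* Hypothetical helper: free every node. The next pointer is read before
   the node is freed, so freed memory is never accessed. */
void free_list(struct Node *node)
{
    while (node != NULL) {
        struct Node *next = node->next;
        free(node);
        node = next;
    }
}
```

Calling push_front with 2 and then with 1 on an empty list (a NULL pointer) builds the same two-node list as above, and free_list releases both nodes so that nothing leaks.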


Conclusion

This blog post covered (in short) various aspects of pointers with examples and diagrams. Feel free to comment if you have any questions.


Diagrams in this post were quickly drawn using Lekh Diagram. Lekh Diagram is a sketch recognition diagramming app for Android and iOS

