Of Pointers and Men (2)
This post is an automatic translation from French. You can read the original version here.
Here we go again with a new chapter in our exploration of pointers! Now that you understand what a pointer actually is, I hope the theory scares you a lot less… Yes, all that fuss for so little!
Yet the pointer, despite its simplicity, is a fundamental concept of C. In my humble opinion, it is truly one of the aspects that make the language so close to the machine and so fascinating! So I suggest we continue our journey by looking at how pointers are used with functions, through a small example.
A seemingly simple function…
Let’s imagine we write a small program that calls a function displaying an integer, which we will call dis_moi_des_mots_doux:
#include <stdio.h>
void dis_moi_des_mots_doux( int ) ;
int
main()
{
int A ;
A = 69 ;
dis_moi_des_mots_doux( A ) ;
return 0 ;
}
void
dis_moi_des_mots_doux( int n )
{
printf("Oh ... un %d ! Comme c'est gentil !!!\n", n ) ;
}
We can verify that our function works perfectly:
$gcc -o thefunk main.c
$./thefunk
Oh ... un 69 ! Comme c'est gentil !!!
The value 69 is correctly passed to the dis_moi_des_mots_doux function, which displays it. So far so good!
Emboldened by this success, we decide to add the remise_a_zero function which, as you may have guessed, should reset the passed variable to zero. Most beginners will write a function that looks like this:
#include <stdio.h>
void dis_moi_des_mots_doux( int ) ;
void remise_a_zero(int) ;
int
main()
{
int A ;
A = 69 ;
dis_moi_des_mots_doux( A ) ; // Display A
remise_a_zero( A ) ; // Reset A to zero
dis_moi_des_mots_doux( A ) ; // Display A again
return 0 ;
}
void
dis_moi_des_mots_doux( int n )
{
printf("Oh ... un %d ! Comme c'est gentil !!!\n", n ) ;
}
void
remise_a_zero( int k )
{
k = 0 ;
}
Unfortunately, when we test this function, our disappointment is immense:
$gcc -o thefunk_v2 main_v2.c
$./thefunk_v2
Oh ... un 69 ! Comme c'est gentil !!!
Oh ... un 69 ! Comme c'est gentil !!!
A’s value has not changed.
And this is completely normal! A itself is never passed when the function is called: only a copy of its value is sent to the function.
To explain what is happening, I offer you two ways to look at it: graphically, and in assembly. You will see that it is very intuitive.
When you call a function, the following operations are performed, in order:
- The expression in parentheses is evaluated (i.e., its numerical value is computed)
- Execution jumps to the function’s code
- A local variable is created (here n in the dis_moi_des_mots_doux function)
- This variable is initialized with the value computed in the first step.
Graphically, for the first function call, the execution looks like this:
The problem, as you might suspect, is with the call to the remise_a_zero function, which tries to modify A.
Here are the next steps of our program’s execution:
Indeed, since only a copy of the value contained in A is passed to remise_a_zero, the function sets its own local variable k to zero, and A remains unchanged. It simply cannot work this way!
Let’s now look at “the real thing” and dive into assembly. If this step scares you, you can skip it without any problem: meet us at the next section, a bit further down!
Let’s compile, disassemble, and view our program’s code:
$gcc -g -o thefunk_v2 main_v2.c -static
$objdump -S ./thefunk_v2
(If you don’t have the necessary tools, you can get this assembly code here.)
[...]
int
main()
{
40174d: 55 push %rbp
40174e: 48 89 e5 mov %rsp,%rbp
401751: 48 83 ec 10 sub $0x10,%rsp
int A ;
A = 69 ;
401755: c7 45 fc 45 00 00 00 movl $0x45,-0x4(%rbp)
dis_moi_des_mots_doux( A ) ; // Display A
40175c: 8b 45 fc mov -0x4(%rbp),%eax
40175f: 89 c7 mov %eax,%edi
401761: e8 1b 00 00 00 call 401781 <dis_moi_des_mots_doux>
remise_a_zero( A ) ; // Reset A to zero
401766: 8b 45 fc mov -0x4(%rbp),%eax
401769: 89 c7 mov %eax,%edi
40176b: e8 35 00 00 00 call 4017a5 <remise_a_zero>
dis_moi_des_mots_doux( A ) ; // Display A again
401770: 8b 45 fc mov -0x4(%rbp),%eax
401773: 89 c7 mov %eax,%edi
401775: e8 07 00 00 00 call 401781 <dis_moi_des_mots_doux>
return 0 ;
40177a: b8 00 00 00 00 mov $0x0,%eax
}
40177f: c9 leave
401780: c3 ret
0000000000401781 <dis_moi_des_mots_doux>:
void
dis_moi_des_mots_doux( int n )
{
401781: 55 push %rbp
401782: 48 89 e5 mov %rsp,%rbp
401785: 48 83 ec 10 sub $0x10,%rsp
401789: 89 7d fc mov %edi,-0x4(%rbp)
printf("Oh ... un %d ! Comme c'est gentil !!!\n", n ) ;
40178c: 8b 45 fc mov -0x4(%rbp),%eax
40178f: 89 c6 mov %eax,%esi
401791: 48 8d 3d 70 f8 07 00 lea 0x7f870(%rip),%rdi # 481008 <_IO_stdin_used+0x8>
401798: b8 00 00 00 00 mov $0x0,%eax
40179d: e8 5e 84 00 00 call 409c00 <_IO_printf>
}
4017a2: 90 nop
4017a3: c9 leave
4017a4: c3 ret
00000000004017a5 <remise_a_zero>:
void
remise_a_zero( int k )
{
4017a5: 55 push %rbp
4017a6: 48 89 e5 mov %rsp,%rbp
4017a9: 89 7d fc mov %edi,-0x4(%rbp)
k = 0 ;
4017ac: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
}
4017b3: 90 nop
4017b4: 5d pop %rbp
4017b5: c3 ret
[...]
Since A is a local variable, it is created on the stack. This is done by growing the stack with the following instruction:
401751: 48 83 ec 10 sub $0x10,%rsp
The stack grows downward, so subtracting 16 from %rsp (the stack pointer) reserves 16 bytes. This is more than enough to store an int (4 bytes); the extra room keeps %rsp aligned on 16 bytes, as the x86-64 System V ABI requires before each call instruction.
The value 69 (0x45 in hexadecimal) is then stored in A, which is at address %rbp-4 (thus on the stack, more precisely in main()’s frame):
401755: c7 45 fc 45 00 00 00 movl $0x45,-0x4(%rbp)
Let’s now see how main() calls the remise_a_zero function:
401766: 8b 45 fc mov -0x4(%rbp),%eax
401769: 89 c7 mov %eax,%edi
40176b: e8 35 00 00 00 call 4017a5 <remise_a_zero>
A’s value is first loaded from the stack into the %eax register. It is then copied into the %edi register, and execution “jumps” to address 4017a5, where the remise_a_zero function is located.
This detour through %eax then %edi may seem surprising: why not put A’s value directly into %edi? Simply because, without optimization, the compiler first evaluates the expression in parentheses (into %eax), and only then sets up the call.
Thus, you could have things like:
remise_a_zero( 2*A+3 ) ;
In this case, the expression (2*A+3) would first need to be evaluated, and only then would remise_a_zero be called. The compiler therefore generates code in two steps:
- Evaluation of the expression in %eax (trivial here, since the expression is just A)
- Call of the function (it is the x86-64 System V calling convention, not the C standard, that specifies that the first integer argument goes in %edi)
In any case, neither A nor its memory location is passed to the function: only a copy of its value!
Pointers and functions
Things are a bit clearer now: the remise_a_zero function does not work because no variable is “passed”, only a copy of its value.
This is where pointers come into play: instead of passing A’s value, we will pass its address, so that the called function knows where to write. And to manipulate addresses, we use the new type of variable we saw last time: the pointer!
Let’s go ahead and modify main() to pass A’s address to our function. We simply use the & operator we saw last time:
int
main()
{
int A ;
A = 69 ;
dis_moi_des_mots_doux( A ) ;
remise_a_zero( &A ) ; // <= The magic is here!!!
dis_moi_des_mots_doux( A ) ;
return 0 ;
}
Since we are no longer passing an integer but an address of an integer, we need to adapt remise_a_zero:
void
remise_a_zero( int* k)
{
*k = 0 ; // Don't forget the * operator here!!!
}
The local variable k is now a pointer, which will therefore contain an address.
To write a zero at this address, we use the * operator as seen last time. Once again, this means:
- I read k
- k contains an address
- I write 0 at that address
The complete code looks like this:
#include <stdio.h>
void dis_moi_des_mots_doux( int ) ;
void remise_a_zero(int*) ;
int
main()
{
int A ;
A = 69 ;
dis_moi_des_mots_doux( A ) ;
remise_a_zero( &A ) ;
dis_moi_des_mots_doux( A ) ;
return 0 ;
}
void
dis_moi_des_mots_doux( int n )
{
printf("Oh ... un %d ! Comme c'est gentil !!!\n", n ) ;
}
void
remise_a_zero( int* k )
{
*k = 0 ;
}
What? You want another little drawing? Really?
But come ooooon… It takes forever to makeeeee!!!!!!!!!!!!!!
Fine, since I like you, here it is:
And it should work! Let’s test it right away:
$gcc -o thefunk_v3 main_v3.c
$./thefunk_v3
Oh ... un 69 ! Comme c'est gentil !!!
Oh ... un 0 ! Comme c'est gentil !!!
I think we can call this a victory!
Once again, let’s look at the assembly code. As before, feel free to skip to the next section :)
[...]
int
main()
{
40174d: 55 push %rbp
40174e: 48 89 e5 mov %rsp,%rbp
401751: 48 83 ec 10 sub $0x10,%rsp
401755: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
40175c: 00 00
40175e: 48 89 45 f8 mov %rax,-0x8(%rbp)
401762: 31 c0 xor %eax,%eax
int A ;
A = 69 ;
401764: c7 45 f4 45 00 00 00 movl $0x45,-0xc(%rbp)
dis_moi_des_mots_doux( A ) ;
40176b: 8b 45 f4 mov -0xc(%rbp),%eax
40176e: 89 c7 mov %eax,%edi
401770: e8 31 00 00 00 call 4017a6 <dis_moi_des_mots_doux>
remise_a_zero( &A ) ;
401775: 48 8d 45 f4 lea -0xc(%rbp),%rax
401779: 48 89 c7 mov %rax,%rdi
40177c: e8 49 00 00 00 call 4017ca <remise_a_zero>
dis_moi_des_mots_doux( A ) ;
401781: 8b 45 f4 mov -0xc(%rbp),%eax
401784: 89 c7 mov %eax,%edi
401786: e8 1b 00 00 00 call 4017a6 <dis_moi_des_mots_doux>
return 0 ;
40178b: b8 00 00 00 00 mov $0x0,%eax
}
401790: 48 8b 55 f8 mov -0x8(%rbp),%rdx
401794: 64 48 2b 14 25 28 00 sub %fs:0x28,%rdx
40179b: 00 00
40179d: 74 05 je 4017a4 <main+0x57>
40179f: e8 dc 62 04 00 call 447a80 <__stack_chk_fail>
4017a4: c9 leave
4017a5: c3 ret
[...]
void
remise_a_zero( int* k )
{
4017ca: 55 push %rbp
4017cb: 48 89 e5 mov %rsp,%rbp
4017ce: 48 89 7d f8 mov %rdi,-0x8(%rbp)
*k = 0 ;
4017d2: 48 8b 45 f8 mov -0x8(%rbp),%rax
4017d6: c7 00 00 00 00 00 movl $0x0,(%rax)
}
4017dc: 90 nop
4017dd: 5d pop %rbp
4017de: c3 ret
4017df: 90 nop
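[...]
While the creation of A has not changed much (A is still on the stack, now at %rbp-0xc), the stack now also contains a canary: a value read from %fs:0x28, saved at %rbp-0x8 and checked before main() returns. It most likely appears because main() now takes A’s address, which GCC’s -fstack-protector-strong option (enabled by default on some Linux distributions, Ubuntu for instance) treats as a reason to protect the function. This is not today’s topic, so we will disregard it.
The assignment of the value 69 to A is performed exactly as before, at A’s new offset:
401764: c7 45 f4 45 00 00 00 movl $0x45,-0xc(%rbp)
The call to the remise_a_zero function, however, has changed significantly: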
remise_a_zero( &A ) ;
401775: 48 8d 45 f4 lea -0xc(%rbp),%rax
401779: 48 89 c7 mov %rax,%rdi
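40177c: e8 49 00 00 00 call 4017ca <remise_a_zero>
We no longer use the mov instruction, but the lea (“load effective address”) instruction: instead of reading the content of %rbp-0xc, it computes that address itself and stores it in %rax. This address is then copied into %rdi (the full 64-bit register this time, since an address takes 8 bytes) before the call.
Inside remise_a_zero, the address received in %rdi is stored in k (at %rbp-0x8), reloaded into %rax, and movl $0x0,(%rax) writes 0 at the address held in %rax, that is, into A itself. No more trickery: everything happens exactly as expected!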
Final words
This chapter is already quite long, so we will postpone some of the points I wanted to cover here to a future installment. To conclude, I would just like to highlight how clear C can be when it comes to functions.
If you see a function declared like this:
int fonction_mystere ( int A, double* B, int *C ) ;
You know that after a call to this function, the variable passed as the first argument cannot have been modified by it, while the variables whose addresses are passed as the next two arguments could have been! This allows you to “compartmentalize” your variables and keep some control, even when using functions from obscure libraries.
On the other hand, you will need to be very careful not to use invalid addresses – it is very easy to shoot yourself in the foot! Typedefs, in particular, can hide pointers from your keen eye.
See you soon, and don’t hesitate to send me your questions!
Rancune.


