CoffeeBeforeArch.github.io

Compilation with GCC

Many programmers colloquially refer to the process of generating an executable from high-level source code as compilation, and refer to g++ as a compiler. In reality, g++ is a compiler driver that invokes different subcomponents of GCC (the GNU Compiler Collection) that work together to generate an executable. These include a preprocessor, compiler, assembler, and linker. In this blog post, we’ll be going over each of these subcomponents for GCC, and analyzing how our source code is transformed by each step.

A “Hello, world!” Program

The program we’ll be analyzing in this post is one that prints Hello, world! to the screen. Here is the exact code we’ll be using:

// A simple "Hello, world!" program written in C++
// By: Nick from CoffeeBeforeArch

#include <iostream>

int main() {
  // Print to the screen
  std::cout << "Hello, world!\n";
  
  // Return that the program completed successfully
  return 0;
}

To generate an executable, we can use the compiler driver g++ with the following command:

g++ hello_world.cpp -o hello_world

We can execute the program using ./hello_world, which will cause "Hello, world!" to be printed to the screen.

cba@cba:~/forked_repos/CoffeeBeforeArch.github.io/_posts$ ./hello_world 
Hello, world!

Now that we’ve looked at how to generate an executable, let’s look at all the intermediate steps.

Preprocessing

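The first step for generating an executable with GCC is preprocessing. In the preprocessing stage, g++ invokes cpp, the C Preprocessor. The preprocessor handles directives such as #include, #define, and conditional blocks (#if/#ifdef), and strips out comments.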

For our simple Hello, world! program, the preprocessor will take care of the line #include <iostream>, and replace it with the code from the header files for the stream-based I/O library.

We can do preprocessing in two ways. The first is by directly invoking cpp:

cpp hello_world.cpp -o hello_world.ii

We can also use the compiler driver g++ with the -E flag to specify we want to stop after preprocessing:

g++ -E hello_world.cpp -o hello_world.ii

The resulting output file (hello_world.ii) contains over 30,000 lines of code! Let’s take a look at some of the relevant parts. First, we have our original main function:

# 6 "hello_world.cpp"
int main() {

  std::cout << "Hello, world!\n";


  return 0;
}

The only major difference is that our comments have been stripped out.

The remaining ~30,000 lines are what replaced the #include <iostream> directive in our original program. Within these lines, we can find the declaration of std::cout:

namespace std __attribute__ ((__visibility__ ("default")))
{

# 60 "/usr/include/c++/8/iostream" 3
  extern istream cin;
  extern ostream cout;
  extern ostream cerr;
  extern ostream clog;


  extern wistream wcin;
  extern wostream wcout;
  extern wostream wcerr;
  extern wostream wclog;




  static ios_base::Init __ioinit;


}

Within namespace std, we see extern ostream cout;. This line specifies that the definition for our cout ostream object exists someplace else, and will be taken care of in later stages of compilation.

Compilation

Now that the preprocessor has fetched the code we needed from the iostream header files, we can move on to compilation. Compilation is the process of translating our high-level source into assembly. The GCC compiler for C++ is cc1plus.

While we can invoke cc1plus ourselves, it is not advised. Instead, we will pass our preprocessed output (hello_world.ii) to g++, and use the -S flag to say we want to stop after compilation.

g++ -S hello_world.ii -o hello_world.s

Let’s take a look at a few different parts of the assembly to see what’s going on. First, we can find our string "Hello, world!\n" in a section called .rodata:

	.section	.rodata
	.type	_ZStL19piecewise_construct, @object
	.size	_ZStL19piecewise_construct, 1
_ZStL19piecewise_construct:
	.zero	1
	.local	_ZStL8__ioinit
	.comm	_ZStL8__ioinit,1,1
.LC0:
	.string	"Hello, world!\n"

The .rodata section is for read-only data, and our string can be found under the label .LC0.

We can also find our main function:

main:
	endbr64
	pushq	%rbp
	movq	%rsp, %rbp
	leaq	.LC0(%rip), %rsi
	leaq	_ZSt4cout(%rip), %rdi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@PLT
	movl	$0, %eax
	popq	%rbp
	ret

Ignoring the prologue and epilogue that manage the stack, the first thing our program does is load the address of our string from the label .LC0 using a load effective address (lea) instruction:

	leaq	.LC0(%rip), %rsi

We also see that we load the address of cout, and call a long mangled name that includes basic_ostream inside it:

	leaq	_ZSt4cout(%rip), %rdi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@PLT

This is really just a call to operator<<, with cout and the address of our string as its two arguments.

The final thing our program does is our return 0; statement. It does this by setting the %eax register to 0 using movl, then returning using ret:

	movl	$0, %eax
	popq	%rbp
	ret

Note popq is just part of managing the stack.

Assembly

Our instructions must be translated into machine code by an assembler before our processor can understand them. The assembler for GCC is as (The GNU Assembler).

We can run the GNU Assembler directly by invoking as:

as hello_world.s -o hello_world.o

We can also use g++ with the -c flag to say we want to stop after assembly:

g++ -c hello_world.s -o hello_world.o

The result is typically referred to as object code or an object file (with the .o extension).

We can translate our object code back to human-readable assembly using the utility objdump. For example, we can run the following command that tells objdump to disassemble our object file with -d, and remove the C++ name mangling using -C:

objdump -d -C hello_world.o

By default, this will only dump .text sections, which contain the instructions of our program.

Let’s start by looking at our main function:

hello_world.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
   0:	f3 0f 1e fa          	endbr64 
   4:	55                   	push   %rbp
   5:	48 89 e5             	mov    %rsp,%rbp
   8:	48 8d 35 00 00 00 00 	lea    0x0(%rip),%rsi        # f <main+0xf>
   f:	48 8d 3d 00 00 00 00 	lea    0x0(%rip),%rdi        # 16 <main+0x16>
  16:	e8 00 00 00 00       	callq  1b <main+0x1b>
  1b:	b8 00 00 00 00       	mov    $0x0,%eax
  20:	5d                   	pop    %rbp
  21:	c3                   	retq   

We now have our instructions and their hex encoding side-by-side. One important thing to note is that many of our addresses, including those for our "Hello, world!\n" string and cout have been replaced by placeholders. To understand why, we need to briefly talk about the symbol table.

The symbol table is a map between names (like cout) and information related to those names that is used by the linker. If we dump the symbol table for our object code using nm -C hello_world.o, we get the following output (note, -C gets rid of the C++ name mangling):

                 U __cxa_atexit
                 U __dso_handle
                 U _GLOBAL_OFFSET_TABLE_
000000000000006f t _GLOBAL__sub_I_main
0000000000000000 T main
0000000000000022 t __static_initialization_and_destruction_0(int, int)
                 U std::ios_base::Init::Init()
                 U std::ios_base::Init::~Init()
                 U std::cout
0000000000000000 r std::piecewise_construct
0000000000000000 b std::__ioinit
                 U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)

For each symbol, we get an address and the symbol type. Many of our symbols are of the U type. This means the symbols are undefined, and will need to be resolved in the next phase (linking). Other symbols are of the T or t type, meaning they live in the text (code) section of the object file. Type r symbols are from the read-only data section (like our string), and b symbols are in the BSS data section (zero/uninitialized data). In general, an uppercase letter marks a global (external) symbol and a lowercase letter marks a local one.

Linking

The final step of generating an executable is linking. This is where our object files get linked together, and our placeholder addresses get replaced with their final values. The linker for GCC is ld. However, GCC will use the collect2 utility which is a wrapper around ld.

We will finish linking our program using g++. By default, g++ will link against things like the C++ standard library implementation, libstdc++.so. Here is the final command we’ll use:

g++ hello_world.o -o hello_world

This gives us a fully formed executable! Let’s dump the symbol table to see how things have changed after linking:

0000000000004010 B __bss_start
0000000000004150 b completed.0
                 U __cxa_atexit@@GLIBC_2.2.5
                 w __cxa_finalize@@GLIBC_2.2.5
0000000000004000 D __data_start
0000000000004000 W data_start
00000000000010d0 t deregister_tm_clones
0000000000001140 t __do_global_dtors_aux
0000000000003d98 d __do_global_dtors_aux_fini_array_entry
0000000000004008 D __dso_handle
0000000000003da0 d _DYNAMIC
0000000000004010 D _edata
0000000000004158 B _end
0000000000001298 T _fini
0000000000001180 t frame_dummy
0000000000003d88 d __frame_dummy_init_array_entry
00000000000021ac r __FRAME_END__
0000000000003fa0 d _GLOBAL_OFFSET_TABLE_
00000000000011f8 t _GLOBAL__sub_I_main
                 w __gmon_start__
0000000000002014 r __GNU_EH_FRAME_HDR
0000000000001000 t _init
0000000000003d98 d __init_array_end
0000000000003d88 d __init_array_start
0000000000002000 R _IO_stdin_used
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000001290 T __libc_csu_fini
0000000000001220 T __libc_csu_init
                 U __libc_start_main@@GLIBC_2.2.5
0000000000001189 T main
0000000000001100 t register_tm_clones
00000000000010a0 T _start
0000000000004010 D __TMC_END__
00000000000011ab t __static_initialization_and_destruction_0(int, int)
                 U std::ios_base::Init::Init()@@GLIBCXX_3.4
                 U std::ios_base::Init::~Init()@@GLIBCXX_3.4
0000000000004040 B std::cout@@GLIBCXX_3.4
0000000000002004 r std::piecewise_construct
0000000000004151 b std::__ioinit
                 U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)@@GLIBCXX_3.4

We now see that our symbol table has grown quite a bit, and symbols like cout that were undefined now have final addresses. Let’s use objdump to see how our assembly has changed:

0000000000001189 <main>:
    1189:	f3 0f 1e fa          	endbr64 
    118d:	55                   	push   %rbp
    118e:	48 89 e5             	mov    %rsp,%rbp
    1191:	48 8d 35 6d 0e 00 00 	lea    0xe6d(%rip),%rsi        # 2005 <std::piecewise_construct+0x1>
    1198:	48 8d 3d a1 2e 00 00 	lea    0x2ea1(%rip),%rdi        # 4040 <std::cout@@GLIBCXX_3.4>
    119f:	e8 dc fe ff ff       	callq  1080 <std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)@plt>
    11a4:	b8 00 00 00 00       	mov    $0x0,%eax
    11a9:	5d                   	pop    %rbp
    11aa:	c3                   	retq   

Inside our main function, we see that our placeholder addresses for our string and cout have been replaced by real values.

The address 4040 in the comment at the end of the lea instruction for std::cout corresponds to the same address we dumped from the symbol table:

0000000000004040 B std::cout@@GLIBCXX_3.4

We can also dump the contents of the .rodata section from the executable (for example, with objdump -s -j .rodata hello_world):

hello_world:     file format elf64-x86-64

Contents of section .rodata:
 2000 01000200 0048656c 6c6f2c20 776f726c  .....Hello, worl
 2010 64210a00                             d!..            

Note, the address 2005 in the comment at the end of the lea instruction for our string corresponds to the start of our string in the .rodata section (2000 + 5). This address is calculated in the comment with <std::piecewise_construct+0x1>, where std::piecewise_construct is located at 2004, as seen in the symbol table:

0000000000002004 r std::piecewise_construct

Additional Notes

Generating Intermediate Files

If you want g++ to save the result from all the intermediate steps of compilation, you can pass it the -save-temps flag:

g++ hello_world.cpp -o hello_world -save-temps

This will generate the preprocessed output hello_world.ii, compiled code hello_world.s, object code hello_world.o, and executable hello_world.

Verbose Output from g++

You can also get the commands that the compiler driver g++ used to compile your application by specifying the -v option. Here’s the output from my machine running g++ hello_world.cpp -o hello_world -v:

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.0.1 20200412 (experimental) (GCC) 
COLLECT_GCC_OPTIONS='-o' 'hello_world' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /usr/local/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/cc1plus -quiet -v -imultiarch x86_64-linux-gnu -D_GNU_SOURCE hello_world.cpp -quiet -dumpbase hello_world.cpp -mtune=generic -march=x86-64 -auxbase hello_world -version -o /tmp/cc9kgEvV.s
GNU C++14 (GCC) version 10.0.1 20200412 (experimental) (x86_64-pc-linux-gnu)
	compiled by GNU C version 10.0.1 20200412 (experimental), GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/../../../../include/c++/10.0.1
 /usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/../../../../include/c++/10.0.1/x86_64-pc-linux-gnu
 /usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/../../../../include/c++/10.0.1/backward
 /usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/include
 /usr/local/include
 /usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
GNU C++14 (GCC) version 10.0.1 20200412 (experimental) (x86_64-pc-linux-gnu)
	compiled by GNU C version 10.0.1 20200412 (experimental), GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 46de3d8a6d088bac0288614969e77d24
COLLECT_GCC_OPTIONS='-o' 'hello_world' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 as -v --64 -o /tmp/ccnpWHbO.o /tmp/cc9kgEvV.s
GNU assembler version 2.30 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.30
COMPILER_PATH=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/:/usr/local/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/:/usr/local/libexec/gcc/x86_64-pc-linux-gnu/:/usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/:/usr/local/lib/gcc/x86_64-pc-linux-gnu/
LIBRARY_PATH=/usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/:/usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/../../../../lib64/:/lib/x86_64-linux-gnu/:/lib/../lib64/:/usr/lib/x86_64-linux-gnu/:/usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-o' 'hello_world' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /usr/local/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/collect2 -plugin /usr/local/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/liblto_plugin.so -plugin-opt=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/lto-wrapper -plugin-opt=-fresolution=/tmp/ccLOxCSG.res -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o hello_world /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/crtbegin.o -L/usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1 -L/usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/../../.. /tmp/ccnpWHbO.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/local/lib/gcc/x86_64-pc-linux-gnu/10.0.1/crtend.o /usr/lib/x86_64-linux-gnu/crtn.o
COLLECT_GCC_OPTIONS='-o' 'hello_world' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64'

Shared Libraries

One detail we did not cover in this example is shared libraries. Check out this blog post on shared libraries and dynamic loading for more details.

Final Thoughts

Understanding the individual components of compilation can be useful when debugging complex compilation errors and trying to speed up the build time of large applications. It’s also a great place to start if you want to start working on compilers yourself.

Thanks for reading,

–Nick