Getting started with linking code
In the previous post "Getting started with compiling code" I mentioned that after you've preprocessed (with e.g. gcc -E hello.c > hellop.c), compiled (gcc -S hellop.c) and assembled (as hellop.s) your source code into machine code, you need to link it together with any other machine code that it depends on.
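To recap, the journey from source code to an object file ready for linking looks roughly like this (I'm adding -o hello.o to the as command here so that the object file gets the name used throughout this post - the exact flags may differ slightly from what you used before):
gcc -E hello.c > hellop.c
gcc -S hellop.c
as hellop.s -o hello.o
# hello.o now contains machine code, but can't run until it's linked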
Linking against preexisting libraries
Declaring functions
Let's again look at the sample code from the previous post.
#include "stdio.h"
int main() {
printf("Hello, world!\n");
}
As I explained in the post, #include "stdio.h" is a preprocessor directive that tells the compiler to copy all contents from "stdio.h" (in e.g. /usr/include) and replace the #include line with them. But "stdio.h" is just a header file, meaning (generally) that it provides a bunch of declarations without definitions. For example, in my system's "/usr/include/stdio.h" there's the following line:
extern int printf (const char *__restrict __format, ...);
This declares the function printf, essentially making a promise that the end program will have access to a function called printf that takes a char pointer (followed, optionally, by more arguments - that's what the ... means) and returns an integer. So, by adding #include "stdio.h" (which results in the preprocessor copying in the printf declaration), we're telling the compiler (at the gcc -S step) that it's fine for code in "hello.c" to make printf calls.
Note that the header file doesn't define printf's behavior. Unlike our "hello.c" file's main function, "stdio.h" has no printf function body enclosed in curly braces. Even though "stdio.h" is used to make the promise that printf will be available in the end program, actually fulfilling that promise is handled separately. After assembling your own code into hello.o, you need to use a linker to link that machine code with other machine code that does define printf's behavior.
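You can actually see this unfulfilled promise recorded in the object file. nm (one of the tools linked in the further reading section) lists a file's symbols, marking undefined ones with U. A rough sketch of what it might print - note that gcc sometimes rewrites a simple printf call like ours into a call to puts, so the exact symbol can vary:
nm -u hello.o
# U puts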
Linking with ld
Just like with compilers, there are different linkers. The default one on Linux is ld (short for "load" or perhaps "link editor", according to different sources discussed in this SO thread). Using it directly for system functions like printf can be somewhat daunting, but here's the full CLI command that works on my machine:
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /usr/lib/x86_64-linux-gnu/crtn.o hello.o -o hello -lc
The part about selecting a particular dynamic linker and the extra ".o" files (e.g. "crt1.o") are details that we won't talk about here, but you can find more information in this SO thread and this LinuxQuestions thread.
What's more relevant to us is that we specify hello.o, -o hello and -lc. hello.o tells ld to statically link our assembled machine code into the end program, i.e. to put our machine code in the executable. -o hello says to call our program/executable "hello", without any file extension. -lc says to also link our program to a "c" library. This means that ld looks through a default list of directories, and any directories specified by you (which we'll get to later), trying to find a "libc.so" or "libc.a" file. If we were to add -lfoo, ld would additionally search for "libfoo.so" and "libfoo.a" files - that's just how the naming convention programmed into ld works.
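As a side note, if you're curious about which file a -l flag actually resolves to on your system, you can ask gcc, which searches essentially the same default directories (the path below is just what it might look like on a Debian/Ubuntu-style system):
gcc -print-file-name=libc.so
# /usr/lib/x86_64-linux-gnu/libc.so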
Static and dynamic libraries
"*.a" files are "archive files" which are statically linked, just like our own "hello.o" machine code. In fact, "*.a" files are little more than bundles of "*.o" files. When we link in ".a" files, the relevant code is copied directly to our executable - even if you were to delete all ".a" and ".o" files that you used when creating an executable like "hello", you would still be able to run it since the code had already been copied. "*.so" files are dynamically linked, meaning their machine code isn't actually copied to the executable. Instead, linking against an executable like "hello" means that whenever you run "hello", the dynamic linker will, similar to ld
, look through a set of default and user-defined directories to dynamically grab the code when the process is loaded into memory. This means that if you don't have the required ".so" file on hand, like if you delete "libc.so" after linking (doing this is a very bad idea), your program will fail to run.
On Linux the default dynamic linker is ld-linux.so, still often referred to by the older name ld.so. The /lib64/ld-linux-x86-64.so.2 we specified in the ld command's -dynamic-linker argument is a specific version of ld-linux.so (more info is available in the ld.so man page).
Static linking means your executable doesn't depend on any .so files to run, while with dynamic linking you might have to worry about potentially missing .so files. On the other hand, since you're copying all the code, static linking sometimes results in huge executable files and potentially many duplicates of the same machine code stored in different files, while dynamic linking means more lightweight files. When you pass a flag like -lc to ld, dynamic libraries (a.k.a. shared objects; e.g. "/usr/lib/x86_64-linux-gnu/libc.so") are by default picked over static libraries ("/usr/lib/x86_64-linux-gnu/libc.a"), unless you explicitly request static linking.
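If you do want everything copied into the executable, gcc accepts a -static flag. A quick sketch of the trade-off, assuming a static libc is installed on your system (exact sizes vary, but the statically linked file is typically much larger):
gcc hello.c -o hello_dynamic
gcc -static hello.c -o hello_static
ls -l hello_dynamic hello_static
ldd hello_static
# not a dynamic executable (wording varies between ldd versions)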
All this means that after running the previous ld command, we've got a "hello" executable that has its own copy of the machine code from hello.o (and the other .o files, like crt1.o), plus dependencies on ld-linux.so itself and the dynamic library libc.so. When I run the file, this triggers ld-linux.so to be loaded, which in turn searches for libc.so and loads it into memory, making the printf function available for use.
Inspecting executables
There are various tools for inspecting executables on Linux. One of the most useful ones when it comes to dynamic linking is ldd (man page), which lists the dynamic libraries (shared objects) that the executable depends on:
ldd hello
# linux-vdso.so.1 (0x00007ffce5196000)
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007cdb70600000)
# /lib64/ld-linux-x86-64.so.2 (0x00007cdb70a24000)
As expected, there is a dependency on "ld-linux-x86-64.so.2" and "libc.so.6" (technically "libc.so" is a linker script that resolves to "libc.so.6" - if you're interested, the details are discussed in this SO thread). The "linux-vdso.so.1" dependency is a bit special: it's a virtual shared object that the kernel maps into every process, so it can safely be ignored here, as explained in the vdso man page.
ldd lists all shared objects that an executable depends on, directly or indirectly. Say your program depends on "libfoo.so", and "libfoo.so" itself depends on "libqux.so". ldd would list both of these, which is often what you want. However, if you only want to list direct dependencies like "libfoo.so" (and exclude ld-linux.so and linux-vdso.so), you can use readelf --dynamic to list the file's dynamic section, where shared object dependencies are represented by NEEDED lines:
readelf -d hello
# ---
# 0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
# ---
(I got this from this SO thread)
Using the normal shortcut
It's very useful to understand the linking stage in depth, as this really helps when troubleshooting linking issues. However, just like with the as command for converting from assembly to machine code, you rarely run ld directly. Instead, ld is used behind the scenes for you when you run gcc commands, which is why we get the same executable just by running:
gcc hello.c -o hello
Here, preprocessing, compilation, assembling and linking are all done with a single command. Also note that gcc handles commonly required ld flags for you, like specifying that crt1.o should be linked in.
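If you're curious about exactly which ld flags gcc fills in for you, passing -v makes it print the commands it runs behind the scenes. The output is long and system-specific, but it ends with the linker invocation (via a wrapper called collect2):
gcc -v hello.c -o hello
# ---
# (lots of output, ending with a collect2/ld command that lists crt*.o files, -lc and the dynamic linker)
# ---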
Creating your own libraries
Once you understand what libraries are and how to use them, creating your own is relatively straightforward.
Static libraries
Let's say we have the following files in the same directory:
// badadd.c
int bad_add(int val1, int val2) {
    return val1 + val2 + 7;
}
// badmultiply.c
int bad_multiply(int val1, int val2) {
    return val1 * val2 * 2;
}
We want to put bad_add and bad_multiply inside a static archive (".a" file) so that users can call both functions after linking against just one file. This is done by first creating one object file of machine code for each source code file, then jamming them into a single archive with the GNU utility program ar (man page):
gcc -c badadd.c
gcc -c badmultiply.c
ar rcs libbadmath.a badadd.o badmultiply.o
You can read in the man page about what the r, c and s flags to ar indicate.
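To double-check that the archive really is just a bundle of our two object files, you can list its members:
ar t libbadmath.a
# badadd.o
# badmultiply.o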
With the library file in place (it will be used during linking), we also need a header file that declares the library's interface, for use when compiling code that calls our library's functions.
// badmath.h
int bad_add(int val1, int val2);
int bad_multiply(int val1, int val2);
We also want another file with a simple main function that will use our badmath library.
// domath.c
#include "stdio.h"
#include "badmath.h"
int main() {
printf("1 + 2 = %d\n", bad_add(1, 2));
printf("1 * 2 = %d\n", bad_multiply(1, 2));
}
When we compile "domath.c", we need to tell the preprocessor, through gcc
, to search in the current directory for files to include ("."; where the "badmath.h" file is located). This is done with -I
, include, flags. We also need to tell ld
, again through gcc
, to look in the current directory, specified with an -L
flag, and that we want to link against "libbadmath" with -lbadmath
.
gcc -I. domath.c -L. -lbadmath -o domath
# alternatively, with separate steps:
# gcc -I. -c domath.c
# ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /usr/lib/x86_64-linux-gnu/crtn.o domath.o -o domath -lc -L. -lbadmath
Note that the order of arguments matters. You should generally put dependents (like "domath.c") toward the left, and dependencies (libraries the dependents require) to their right. More details are available in this SO thread answer.
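To illustrate (roughly - the exact error text depends on your toolchain): if you put -lbadmath before domath.c, ld scans the library before anything has asked for its symbols, discards it, and then can't resolve the calls later, giving output along these lines:
gcc -I. -L. -lbadmath domath.c -o domath
# /usr/bin/ld: ... undefined reference to `bad_add'
# /usr/bin/ld: ... undefined reference to `bad_multiply'
# collect2: error: ld returned 1 exit status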
Dynamic libraries
The process for creating a dynamic library is very similar, except that you need to supply the flag -fPIC (Position Independent Code) when producing the machine code, and instead of ar you run another gcc command to jam the object files together into a library:
gcc -fPIC -c badadd.c
gcc -fPIC -c badmultiply.c
gcc -shared badadd.o badmultiply.o -o libbadmath.so
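If you want to check what the library exports, nm -D (mentioned in the further reading section) lists its dynamic symbols; our two functions should show up marked with T, something like this (the addresses will differ):
nm -D libbadmath.so | grep ' T '
# 0000000000001109 T bad_add
# 0000000000001120 T bad_multiply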
When we compile the program using the shared object, we do exactly the same as when using a static library.
gcc -I. domath.c -L. -lbadmath -o domath
However, if we run ldd on "domath" we can see that the dynamic linker, ld-linux.so, is unable to find "libbadmath.so":
ldd domath
# linux-vdso.so.1 (0x00007ffefc1ab000)
# libbadmath.so => not found
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000075f9d8000000)
# /lib64/ld-linux-x86-64.so.2 (0x000075f9d8413000)
This is expected, since the random directory we happen to be in isn't one of the directories that ld-linux.so has been preprogrammed to search. You can solve this by relinking domath with an -rpath flag to ld, using the magical $ORIGIN variable, which refers to the executable's location (as described in this SO thread). However, what if we want to specify this through the gcc command, like we did with -L and -l? gcc doesn't have shorthand flags for specifying these ld arguments, but there is the generic gcc flag -Wl,<option> which allows us to pass arbitrary options on to the linker:
gcc -I. domath.c -L. -lbadmath -Wl,-rpath='$ORIGIN' -o domath
This tells ld to configure the executable's information so that the runtime linker ld-linux.so will look for libraries in the same directory that the executable itself is located in. Now we get:
ldd domath
# linux-vdso.so.1 (0x00007ffdcdbb9000)
# libbadmath.so => /home/datalowe/Documents/ex/libbadmath.so (0x000079303e862000)
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000079303e600000)
# /lib64/ld-linux-x86-64.so.2 (0x000079303e86e000)
And we can successfully run the program:
./domath
# 1 + 2 = 10
# 1 * 2 = 4
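Tying back to the earlier point about dynamic libraries: since the machine code was never copied into "domath", moving or deleting "libbadmath.so" breaks the program (the exact error wording may vary slightly):
mv libbadmath.so libbadmath.so.bak
./domath
# ./domath: error while loading shared libraries: libbadmath.so: cannot open shared object file: No such file or directory
mv libbadmath.so.bak libbadmath.so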
Further reading & watching
- CS50 2016 C lecture - same as previous post
- IBM article on dynamic Linux libraries - this article explains more details about how dynamic libraries are structured, and dynamic loading which I didn't touch upon at all
- ld.so man page
- ldd man page
- nm man page - nm is another GNU utility that lists symbols (variables, functions...) used in files and can give more granular information about dependencies - e.g. nm -gu domath lists symbols that domath itself doesn't define and needs to bring in from shared objects, while nm -D libbadmath.so can be used to see symbols (marked with T) exported by the badmath library
- otool man page - on MacOS, otool -L may be used to list shared library (including .dylib) dependencies, similar to using ldd on Linux
- Microsoft dumpbin docs - dumpbin /EXPORTS *.dll provides information about symbols exported by a Windows DLL file, similar to nm -D on Linux
- SO thread: "Why does the order in which libraries are linked sometimes cause errors in GCC?"
- SO thread: "Where do executables look for shared objects at runtime?"