Copyright © 2009 Joseph Mack
v20091207, released under GPL-v3.
Abstract
Class lessons for a group of 9th graders who have taken my introductory programming course. The class only assumes an understanding of procedural structured programming. When learning a new language, students don't need to learn structured programming again, just the syntax in the new language for constructs they already know. Class members will be expected to look up C syntax for printing to the screen, loops and conditionals.
Material/images from this webpage may be used, as long as credit is given to the author, and the url of this webpage is included as a reference.
The lessons weren't given in the order the material is presented. The path through this material wasn't as clear to me as it was for the python class. Consequently after giving a lesson, I would realise that I'd omitted some background material. In this case, I'd add the material to an earlier section and go over it in the next class.
A large impediment to my learning when I was younger ('50s-60's) was the unavailability of information. (Who knew how a waveguide worked?) If no-one around you had the answer, your only alternative was the library, a long trip, with a guaranteed long search time and little guarantee of success. I gave up on finding answers to a large range of questions. Now with the internet (and search engines and wikipedia), I'm continually amazed to accidentally find that the answer has long since been available on the internet and that 40yrs ago, I'd pushed the question to the back of my mind as unknowable.
The biggest advance in learning since I was a kid, is the ready availability of information. After someone had explained an object to me, I learned C++ from Wikipedia (50%) and from my coworkers (50%). A friend of mine did a semester course on statistics, to find that everything he was studying, was carefully and well explained on Wikipedia by people who had put some time into their explanations.
With Wikipedia and the internet being the greatest step forward in education in the last 50yrs, I was disappointed to find that my son's school won't let him use Wikipedia for his homework. My son's school is 50yrs behind. Well, I use it, my coworkers use it for their jobs. In this C course and my python course I refer to Wikipedia extensively. I expect the internet and Wikipedia will be helpful in this course and in many other branches of learning.
The standard reference for C is "The C Programming Language" by Kernighan and Ritchie. This is known as "K&R". It's only 200 pages long, a sensible length to describe a language (I wish that other language manuals were so short). It's short, not because it's terse (in fact it has plenty of examples). It's short because C is a small language.
The material here is run on Linux kernels 2.4.x and 2.6.x, depending where I am at the time. Output will be (slightly) different depending on the kernel, compiler, and utility versions you have. It's possible that the class material will have the output from a command from one machine, and the next table looking at a file will be from another machine. Don't be alarmed if your output is a little different to mine. The code examples should all work the same though.
The hello world program, first seen in C, has become the archetypal first program for all languages. In C, most programs are derived from several files, so you will need a separate directory for each class exercise. Do this exercise in a directory class_code/hello_world/.
Type in this program with your editor (or swipe with your mouse).
//hello_world.c
//Dennis Ritchie dennis@att.com

#include <stdio.h>

int main(){
        printf ("hello world!\n");
        return (0);
}
//-------------------
There is a lot in this program and we will be spending some time on it. Compile this code with the instruction
gcc hello_world.c
then look in your directory for new files. The new file, a.out, is the default output from gcc. (The program configure, which automates compiles and which we will come to later, tests the compiler and with gcc reports that the default output file name is a.out.) A reasonable person would have expected gcc to output a file called hello_world, but we'll come to that later.
Run your executable
hello_world# ./a.out
hello world!
Test the return code
# echo $?
0
All code is expected to inform the OS of its exit status by returning an int; did it run and exit cleanly, did it exit because of a problem? The int 0 is the value for success. The code here is telling the OS that it ran (and exited) cleanly, even if it didn't (we didn't test for problems). Any non 0 value is usually a failure of some sort; some will be benign, and can be ignored, while others require attention.
Comment out the return statement, run the code again and look at the exit status.
hello_world# gcc hello_world.c
hello_world# ./a.out
hello world!
hello_world# echo $?
13
What does the 13 indicate? Unless you have the source code and know a bit of C, you won't have a clue. With no return statement in main(), this compiler leaves behind the return value of the last function called, printf(), which is telling you that printf() successfully wrote 13 characters (what are they? [1] ). The language doesn't promise this behaviour; a compiler following the C99 standard returns 0 when main() runs off its end.
Note: One of the students found that printf stands for formatted print. It must have been a long time since there was an unformatted print as I've never seen it, and it wasn't in K&R v1.
When letting code out into the world of unsuspecting users, you must have a sensible exit code and document it in the man page(s); users will quite reasonably be distressed to see non 0 return codes, and will suspect that their answers are invalid. For a non-trivial piece of code, issuing a return code of 0 no matter what happens (like I just did here) is not acceptable. For a trivial piece of code like this, printf() is not a function that's expected to fail, so you wouldn't inspect its return value. You could in principle test whether printf() returned a negative value (its signal that the write failed) and then have main() return a non 0 value. But no-one would bother with this for printf().
Instead of compiling the source code straight through to the executable, do this and look for new files.
gcc -c hello_world.c
You'll have a new file hello_world.o. The .o file (pronounced "dot o") is an object file. A .o file holds the machine code (the output of the assembler) that will be loaded into memory and executed when the program runs. If you don't know what sort of file you have, you can do this.
hello_world #: file hello_world.o
hello_world.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
Note: try file on hello_world.c, a.out.
What is file telling us?
The ELF information can be read with readelf. readelf -a gives pages of info. Instead try
hello_world: # readelf -s hello_world.o

Symbol table '.symtab' contains 10 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello_world.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 SECTION LOCAL  DEFAULT    5
     6: 00000000     0 SECTION LOCAL  DEFAULT    7
     7: 00000000     0 SECTION LOCAL  DEFAULT    6
     8: 00000000    23 FUNC    GLOBAL DEFAULT    1 main
     9: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND puts
The object file contains the function main() and a function puts() (put string) of undefined (UND) location. The compiler has been smart and has replaced the call to printf() by a call to the string output function puts(). (Compilers are required and expected to be smart.)
Notice the small file size of the object file hello_world.o compared to the executable a.out.
hello_world# ls -alF
.
.
-rwxr-xr-x 1 root root 8311 Oct 18 13:32 a.out*
-rw-r--r-- 1 root root  126 Oct 18 13:32 hello_world.c
-rw-r--r-- 1 root root  832 Oct 18 14:05 hello_world.o
What's in a.out that isn't in hello_world.o?
Note: In python and perl, a library file is called a module file.
The executable doesn't do its own printing. Common functions such as outputting to a screen, reading and writing to files, sending packets over the network, are contained in a giant library libc.
printf() formats output. If the output was a real number (e.g. 3.14159), then printf() would determine whether the output was to be in scientific notation or regular, how many significant figures are to be displayed, whether leading 0's are displayed, whether a leading + is displayed for positive numbers. The assembled output (now a string) is sent to libc. In the case of displaying a string, libc doesn't do a whole lot, it calls puts(), which sends the string to the OS saying "print this at the location of the cursor". The OS knows how to handle hardware and prints the string on the hardware.
As you learn to use printf(), the syntax will probably appear clunky to you (it does to me). This may reflect the difficulty in formatting computer output for humans, as no-one (including me) has come up with a better scheme, at least if the languages that post-date C are any indication. The C scheme looks brilliant, when you compare it to C++'s. It would appear that the C++ designers said "We can't think of anything better, but we've got to have something different to C - it doesn't matter what - anything, or people will think we're idiots", so they came up with something worse.
The object code usually contains calls to libraries. The required code is found (a process called "resolving") and included in the executable by a process called linking. In the case of hello_world.o, the code for puts() must be found. gcc knows whether it's linking or compiling by the filetypes (and flags - more about them later) you send it. gcc looks through libc for puts() and includes that piece of code in the output file. Here is the link process producing a.out.
hello_world# gcc hello_world.o
hello_world# ls -alFrt
.
.
-rw-r--r-- 1 root root  126 Oct 18 13:32 hello_world.c
-rw-r--r-- 1 root root  832 Oct 18 14:05 hello_world.o
-rwxr-xr-x 1 root root 8311 Oct 18 15:44 a.out*
When I say code is "included", what does that mean? libc (and all libraries) are available in two forms: the static library /usr/lib/libc.a (.a, for archive, being the original extension for a library) and the dynamic library /lib/libc.so (.so being shared object).
In the old days, by default, the code for puts() would be copied from the static library libc.a and spliced into the executable. This is called static linking (static, because the library code is part of the executable file and when loaded into memory to be executed, stays fixed in its location, relative to the object code). With static linking, if there are 1000 executables on the disk, all of which need the same library function, then there will be 1000 copies of that library written on the disk. Similarly if there are multiple programs running, each of which use the same library (quite likely in the case of printf()), then there will be multiple copies of the library code residing in memory. Static linking is not a good use of disk space or memory.
Along came dynamic linking (which I think was devised by Sun). In dynamic linking, the executable has the name of the library and the name of the function it needs, but no code for the function. Now no executables on disk or in memory have the code for printf(). When the executable executes printf(), the part of the executable that would have had the printf() code has instead a notice to ask the OS for the code. The OS loads the library file into memory, in some location completely independent of the location of the executable, and hands the address of printf() within the library to the executable. Execution passes to the library (via a jump through a table of addresses filled in at load time) and then, when done, returns to the executable. It is likely that an often used library like libc.so will be in memory continuously. This type of linking is called dynamic linking, as the library routine is only loaded when it's needed.
For some time now (about a decade), gcc defaults to dynamic linking. If the dynamic library doesn't exist, then gcc will look for a static library. We didn't tell gcc to link libc; if there are any unresolved references (computer speak for functions for which code can't be found in the object files), then gcc assumes they're in libc. libc is the default library.
How do we know what dynamic libraries our executable needs?
hello_world# ldd a.out
        libc.so.6 => /lib/libc.so.6 (0x40017000)
        /lib/ld-linux.so.2 (0x40000000)
This tells us that a.out needs libc.so and ld-linux.so (which links the dynamic libraries at run time). If a shared library is missing, ldd will tell us that too.
What calls does our executable make? We're expecting to see main() and puts(). Here are the names and addresses of the symbols in our executable.
hello_world# nm -o a.out
a.out:080494b8 D _DYNAMIC
a.out:08049584 D _GLOBAL_OFFSET_TABLE_
a.out:0804848c R _IO_stdin_used
a.out:         w _Jv_RegisterClasses
a.out:080494a8 d __CTOR_END__
a.out:080494a4 d __CTOR_LIST__
a.out:080494b0 D __DTOR_END__
a.out:080494ac d __DTOR_LIST__
a.out:080484a0 r __FRAME_END__
a.out:080494b4 d __JCR_END__
a.out:080494b4 d __JCR_LIST__
a.out:080495a0 A __bss_start
a.out:08049598 D __data_start
a.out:08048440 t __do_global_ctors_aux
a.out:08048310 t __do_global_dtors_aux
a.out:0804959c D __dso_handle
a.out:080494a4 A __fini_array_end
a.out:080494a4 A __fini_array_start
a.out:         w __gmon_start__
a.out:080494a4 A __init_array_end
a.out:080494a4 A __init_array_start
a.out:08048400 T __libc_csu_fini
a.out:080483b0 T __libc_csu_init
a.out:         U __libc_start_main@@GLIBC_2.0
a.out:080495a0 A _edata
a.out:080495a8 A _end
a.out:0804846c T _fini
a.out:08048488 R _fp_hw
a.out:08048278 T _init
a.out:080482c0 T _start
a.out:080482e4 t call_gmon_start
a.out:080495a0 b completed.5900
a.out:08049598 W data_start
a.out:080495a4 b dtor_idx.5902
a.out:08048370 t frame_dummy
a.out:08048394 T main
a.out:         U puts@@GLIBC_2.0
Symbols starting with _ or __ (i.e. one or two underscores) aren't kernel calls; they belong to the C runtime startup code and libc, which start and stop the program around main(). Where is our puts()? It's hard to see in all of this. If you missed it you can do this
hello_world# nm -o a.out | grep puts
a.out:         U puts@@GLIBC_2.0
The U (undefined) tells the loader (the library that puts the executable in memory and then tells it to start executing) that the code for puts() isn't in the executable and has to be found before execution starts. If the code for puts() is not found, the loader will abort with an error.
The function main() has a "T", indicating that the code is in the "text" section (the part of the file that holds the executable code).
What if we wanted to statically link in printf()/puts()?
hello_world# gcc hello_world.o --static -lc
hello_world# ls -alFrt
.
.
-rw-r--r-- 1 root root     832 Oct 18 14:05 hello_world.o
-rwxr-xr-x 1 root root 2406986 Oct 18 17:26 a.out*
hello_world# ./a.out
hello world!
Notice the larger size of the statically linked a.out compared to the dynamically linked version. How much of libc.a did we link in?
hello_world# ls -al /usr/lib/libc.a
-rw-r--r-- 1 root root 16541964 Dec 18 2005 /usr/lib/libc.a
About 1/7th of libc.a. It's unlikely that puts() is 15% of libc.a. Let's see what's been included in our executable that might (not) be needed for puts().
Conclusion: While compilers are smart, linkers aren't smart at all. When code is linked statically, you'll get a large executable with lots of code you don't need. Linkers don't necessarily have to be smart. The current expectation is that you will be using dynamic libraries, and if you only have one copy in memory, then you have to include it all (you never know when a piece of code will need a function), so why bother developing strategies for optimising static libraries? For embedded code (code that runs on microprocessors that don't have kbd, monitor, disks, and sit in a router, refrigerator or car door and never talk to humans), you use smaller versions of libc (e.g. uClibc) with only a subset of libc's functions, or you build your own libc.a with only the functions you need.
When do you link statically and when do you link dynamically? It depends on how much control you have over the target machine.
Microsoft executables are usually dynamically linked (with DLLs - dynamically linked libraries), as Microsoft guarantees that WinXP will be the same WinXP no matter where it is. The same is true for commercial unices (of which Solaris is about the only one left). The same is not true for Linux, which can have a range of kernel versions and packages installed. If you're shipping dynamically linked binaries for Linux users, you have to know the distribution and version on the target machine.
There is an added factor preventing you from using statically linked code; statically linked programs can be huge. Disk space, at least for files the size of executables, is cheap. However downloading a large binary will try the patience of some people. Downloading firefox as a 15MB dynamically linked executable is easier than downloading the 120MB statically linked version.
One place where code absolutely must run is at boot time. The directory /sbin has statically linked executables (except for the dynamic links to libc and ld-linux, which can be expected to be present when /sbin is mounted). Executables in the directory /bin have dynamically linked libraries.
Conclusion: if it absolutely has to work, use a statically linked executable. If you want to ship a smaller file, use dynamically linked executables. If you're compiling from source code and you control the machine, use a dynamically linked executable.
At my job, I do speed tests on cluster machines all the time (particularly following hardware changes). I need executables that are guaranteed to be the same on each machine for every test run, no matter what new libraries have been installed since I did my last run. All these test executables are statically linked. (You don't have to statically link all the libraries, just the ones that admins are upgrading all the time.)
It's not obvious why gcc gives a.out as the default name for the executable. Our hello_world comes from only one object file. Producing an executable from one object file is the exception in C. Often 100's of object files contribute to the executable (main.o, foo.o, bar.o.... baz.o). The default name for the executable could then be the name of the first object file in the link list (but then your executable would always be called main). For whatever reason, you have to tell gcc the name of the executable (the -o option usually sets the name of the output file).
hello_world# gcc -o hello_world hello_world.c
hello_world# ls -alFrt
.
.
-rw-r--r-- 1 root root  126 Oct 18 13:32 hello_world.c
-rwxr-xr-x 1 root root 8311 Oct 18 17:36 a.out*
-rwxr-xr-x 1 root root 8311 Oct 18 19:05 hello_world*
drwxr-xr-x 2 root root 4096 Oct 18 19:05 ./
hello_world# ./hello_world
hello world!
Notice that a.out and hello_world have the same size and are identical, as seen by two file comparing utilities (both print nothing when the files match).
hello_world# diff a.out hello_world
hello_world# cmp a.out hello_world
If you're coming from python, you are used to user code, which doesn't allow you much access to the system. C is designed to write kernels and C programmers want to know what's going on in the silicon and iron below them. A lot more detail is available to the C programmer than for the python programmer. In this class you'll be only doing user level coding, but you can still take advantage of all these tools. You will learn these tools by repetition. The ones that you don't remember by repetition, you can look up. You don't need to remember how to do any of this, but you do need to know what sort of things you can do with C.
Let's look again at the hello_world code.
//hello_world.c
//Dennis Ritchie dennis@att.com

#include <stdio.h>

int main(){
        printf ("hello world!\n");
        return (0);
}
//------------------------
/*
This is my comment paragraph.
*/

/* A whole bunch of code you want to comment out
/*
This is my comment paragraph.
*/
code....
*/

//A whole bunch of code you want to comment out
// /*
//This is my comment paragraph.
// */
//code....
stdio.h is called a header file and by convention in C is given the extension .h. Despite the different extension, a header file is C code and is compilable just like any other C code (although it usually has no executable statements and therefore won't produce an executable object file).
The include statement has a couple of parts
gcc -I/usr/my_include/ -o hello_world hello_world.c
Note: there is no ";" at the end of a preprocessor directive (a line starting with "#").
Why would you want to #include a file? Include files contain the declaration of a function (or procedure or variable). The declaration tells the compiler the types for the arguments and the return value. This allows the compiler to create the right amount of space for storing these values and for type checking when the code for the function is linked. Remember the declaration of a function is different from the definition.
- The declaration is the type (and number) of arguments and the type of the return value.
- The definition is the code that implements the declaration.
In interpreted languages, the definition of a function also has its declaration. In compiled languages, it's convenient to separate the declaration from the definition, and to make the declaration available at compile time, to pieces of code, that will be later linked to the definition.
Implicit in this convention is the requirement that the declaration be independent of the definition. This is so that at some later time, the definition can be reimplemented and relinked to the code, without having to change any other code. The reimplementer doesn't even have to know what the calling code is doing; they only have to know the parameters and what's returned (and what has to be done with the parameters to get the return value).
stdio.h contains the declarations for the standard I/O functions. In our case we need the declaration of printf(). Since we're only outputting a string, the compiler substitutes puts() for printf(). Let's look at the declarations of printf() and puts(). You can do this by grep'ing /usr/include/stdio.h. Here are my results (after a bit of editing).
hello_world: # grep printf /usr/include/stdio.h
extern int printf (__const char *__restrict __format, ...);
hello_world: # grep puts /usr/include/stdio.h
extern int puts (__const char *__s);
Not so long ago, the declarations of functions were in terms of primitive data types that everyone recognised (e.g. int, char, string, real...). The data types here, if you follow the chain of declarations, eventually reach a primitive data type, but the chain is long and so is the number of files you have to traverse. You will meet the parameter types later (maybe). For the moment, note that both functions return an int; for printf() it's the number of characters written. extern says that the name (in this case a function) is defined somewhere else and is global, i.e. it can be used by every file/function in your code.
main() is the place where execution starts. (Remember how in python, execution starts at the top of the file and you have to do fancy things to stop python from executing code before the point you want it to start. In C, the code starts where you tell it to start.) Here main() returns an int and takes no parameters.
main() must always return an int (or the compiler will complain), this being the return code on exit.
A code block is something that has scope. Anything declared inside a code block will not exist after execution leaves that code block.
In python, code blocks are marked by the same number of tabs. In C (and a few other languages), code blocks are marked with {}, pronounced (among other things) "squigglies" or "squiggles" (they're also called "braces").
There are several styles of indenting (see indent style http://en.wikipedia.org/wiki/Indent_style) to mark code blocks. I will show two here. (The C compiler doesn't care about indenting, tabbing or any white space; you can have the whole program on one line if you like. Only humans care about indenting.)
int main()
{
        printf ("hello world!\n");
}

int main() {
        printf ("hello world!\n");
}
When you're starting, you often make mistakes nesting braces and the first style (where braces line up) makes it easy to check nesting. The 2nd style is used extensively in Linux and I've managed to get used to it. There is no rational reason why one style is better than another, and there have been no attempts to even test whether one is better than another. You will find that otherwise rational people have strong opinions on the matter and handle debates on the relative merits as a religious issue. (e.g. for classes you will be required to use the method favoured by your teacher or get 0/10 for each assignment. For your job, you will be required to use the style favoured by your team leader.) The compiler doesn't care which style you use and if it doesn't care, then I don't care either. To make it easy for others to read, you should use the same style throughout a project, although for a class, there's no harm in experimenting around. Flipping a coin would be as good a method of choosing as any. I will be using the 2nd style as I've got used to it.
The "\n" is a single character, in this case a newline (line feed); a carriage return is "\r". The "\" tells the compiler to interpret the next character as a special character. For other special characters see bash escape sequences (http://tldp.org/HOWTO/Bash-Prompt-HOWTO/bash-prompt-escape-sequences.html).
End Lesson 1
Remember the include file? Let's see what happens to it. The C preprocessor cpp (see C preprocessor http://en.wikipedia.org/wiki/C_preprocessor) does several things to input files, including including include files. The preprocessor is automatically invoked by gcc. If you suspect that any of the things you've asked the preprocessor to handle aren't quite right, you'll need to look at the output of the preprocessor. You can see the result of the preprocessor by doing either of these equivalent operations
hello_world: # cpp hello_world.c > hello_world_preprocessed.c
hello_world: # gcc -E hello_world.c > hello_world_preprocessed.c  //stop compilation after preprocessing
hello_world# dir
.
.
-rw-r--r-- 1 root root   126 Oct 18 13:32 hello_world.c
-rw-r--r-- 1 root root 18412 Oct 20 12:33 hello_world_preprocessed.c
hello_world# ls -al /usr/include/stdio.h
-rw-r--r-- 1 root root 27936 Dec 18 2005 /usr/include/stdio.h
Note: I've chosen the extension .c for the preprocessed file. gcc makes assumptions about the filetype based on (among other things) the extension. If you use some other extension be prepared for some other behaviour from gcc; it may no longer think that the file is C code.
Notice that the size of hello_world_preprocessed.c is smaller than the sum of the size of stdio.h+hello_world.c. The reason for this is that stdio.h itself includes other files and uses conditional compilation directed by #ifdef statements to add or exclude lines of code, depending on flags and other information you've sent to the preprocessor (we haven't sent any information and we're getting the default behaviour). You should expect that the code included in your .c file has little direct resemblance to the included file. On my machine, the first piece of code from stdio.h recognisable in the output file is the definition of size_t.
The file produced by the preprocessor is logically equivalent, as far as compilation is concerned, to the original source file.
hello_world# gcc -c hello_world_preprocessed.c
hello_world# dir
.
.
-rw-r--r-- 1 root root   126 Oct 18 13:32 hello_world.c
-rw-r--r-- 1 root root 18412 Oct 20 12:33 hello_world_preprocessed.c
-rw-r--r-- 1 root root  1060 Oct 20 13:10 hello_world_preprocessed.o
hello_world# gcc hello_world_preprocessed.o
hello_world# dir
.
.
-rw-r--r-- 1 root root   126 Oct 18 13:32 hello_world.c
-rw-r--r-- 1 root root 18412 Oct 20 12:33 hello_world_preprocessed.c
-rw-r--r-- 1 root root  1060 Oct 20 13:10 hello_world_preprocessed.o
-rwxr-xr-x 1 root root  8726 Oct 20 13:14 a.out*
hello_world# ./a.out
hello world!
Here are some trivial examples of functions used to illustrate declarations and show the requirement for include files.
Note: A function returns a value. The name "procedure" is not used in C (or C++). In other languages, a procedure is a function that doesn't return a value (it might print some output). The name "subroutines" is not used in C (subroutines only exist in FORTRAN and BASIC). In C everything is a function. You need to understand this important conceptual framework in case you run into someone who thinks nomenclature is really important and you call a piece of code a procedure (or horrors, a subroutine). (These are the same people who get excited about indenting style and they'll think you're some programming weenie).
Go to a new directory called my_function.
Note: I needed a name for the directory, project and files which won't collide with any system files. I found too many files on my machine with names like function and project, so I called this project my_function, a rather dumb name I'll admit, but it will do.
Swipe this code with your mouse
//my_function.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

#include <stdio.h>

void print_int(int x){
        printf ("%d \n", x);
}

int main() {
        int i = 10;
        print_int(i);
        return (0);
}
//-------------------
compile and run it (and check the exit code, just for the hell of it).
my_function# gcc -o my_function my_function.c
my_function# dir
.
.
-rw-r--r-- 1 root root  213 Oct 20 14:06 my_function.c
-rwxr-xr-x 1 root root 8376 Oct 20 14:12 my_function*
my_function# ./my_function
10
my_function# echo $?
0
print_int() is of type void i.e. it returns nothing (printf can be assumed not to fail). print_int() has one parameter, an int.
Note: You can look up the syntax of printf on your own.
There are many good reasons to split code into smaller chunks. Code development is often done by multiple people, each working on their own piece of the problem. For organisation, once you have a piece of code working, you want to put it off to one side and not look at it again. For reasons like this, code is usually split into files with one function or a bunch of related functions.
Now make two files
//main.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

int main() {
        int i = 10;
        print_int(i);
        return (0);
}
//-main.c------------------
and
//print_int.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

#include <stdio.h>

void print_int(int x){
        printf ("%d \n", x);
}
//-print_int.c--------------------
We have split the original file into two files, each with one function. We will now produce the two .o files and then link them to produce the executable. First print_int.c
my_function# gcc -c print_int.c
my_function# dir
.
.
-rw-r--r-- 1 root root 183 Oct 20 14:32 print_int.c
-rw-r--r-- 1 root root 832 Oct 20 14:37 print_int.o
Let's look at print_int.o.
my_function# nm -o print_int.o
print_int.o:00000000 T print_int
print_int.o:         U printf
The location of printf() is unknown (no surprise, linking will handle this) and the code for print_int() is in the text (the "T").
Now handle main.c
my_function# gcc -c main.c
my_function# dir
.
.
-rw-r--r-- 1 root root 164 Oct 20 14:34 main.c
-rw-r--r-- 1 root root 756 Oct 20 14:39 main.o
and looking at main.o
my_function# nm -o main.o
main.o:00000000 T main
main.o:         U print_int
we see that the location of the code for print_int() is unknown (will be handled on linking), while the code for main() is in the text.
We can link the two .o files and produce the executable.
my_function# gcc -o my_function main.o print_int.o
my_function# ./my_function
10
One of the students found that you can compile to the executable directly from multiple source code files.
my_function# gcc -o my_function main.c print_int.c
my_function# dir
total 84
-rw-r--r-- 1 root root  221 Oct 20 16:49 main.c
-rw-r--r-- 1 root root  219 Oct 20 16:52 print_int.c
-rw-r--r-- 1 root root  214 Oct 21 12:17 my_function.h
-rwxr-xr-x 1 root root 6436 Oct 24 15:53 my_function*
I didn't know you could do this. I've never compiled up a bunch of source files in one go.
End Lesson 2
This is all fine and dandy, except that the above code should not have compiled. Why not? Look in main.c: the function print_int() is not declared and the compiler has no idea what it is. Being helpful, the compiler assumes that any function it hasn't seen declared returns an int (just as old-style C assumes int for a variable, parameter or return value whose type is left out). This is sometimes (but only sometimes) what you want. In this case, we aren't looking at the return value from print_int(), and the assumption that the function returns an int doesn't get us into any trouble.
Let's try the compile again, this time writing all warnings
    my_function# gcc -Wall -c main.c
    main.c: In function 'main':
    main.c:8: warning: implicit declaration of function 'print_int'
Note | |
---|---|
You should always compile with the warning flag -Wall. |
In the original one-file version of my_function.c, the function print_int() was declared above the code for main() (which calls print_int()). When the compiler, working inside main(), found the call to print_int(), it had already seen the declaration of print_int() and was able to use the correct return type for print_int().
Let's make sure that the correct return type is used for print_int(). Here's the new version of main.c. The fix is to declare print_int() before the compiler looks inside main().
    //main.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    void print_int(int);

    int main() {

            int i = 10;
            print_int(i);
            return (0);
    }
    //-------------------
Now we can compile without getting any warnings.
    my_function# gcc -Wall -c main.c
    my_function# gcc -Wall -o my_function main.o print_int.o
    my_function# ./my_function
    10
Note | |
---|---|
We didn't need to recompile print_int.c. The .o file is fine, but to be sure that there were no problems, we could recompile with -Wall. |
In the original one-file version of my_function.c, let's see what would have happened had we reversed the order in which we defined main() and print_int(). Would this code have any problems? Swipe this code
    //my_function_reversed.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    #include <stdio.h>

    int main() {

            int i = 10;
            print_int(i);
            return (0);
    }

    void print_int(int x){
            printf ("%d \n", x);
    }
    //-my_function_reversed.c--------------------------
and compile it with -Wall
    my_function# gcc -Wall -c my_function_reversed.c
    my_function_reversed.c: In function 'main':
    my_function_reversed.c:10: warning: implicit declaration of function 'print_int'
    my_function_reversed.c: At top level:
    my_function_reversed.c:14: warning: conflicting types for 'print_int'
    my_function_reversed.c:10: note: previous implicit declaration of 'print_int' was here
Note | |
---|---|
This time you get a warning, even without using the -Wall flag (you do get fewer warnings). In both cases, you get warnings, not errors, and the compiler will produce a .o file (with the wrong declaration of print_int()). |
In line 10 the compiler assumed, since print_int() wasn't declared, that it returned an int. When the compiler reached the declaration (and definition) of print_int() in line 14, it found a conflicting declaration for print_int().
What's the fix [2] ?
For small projects, you can avoid the problem of having to declare variables and functions ahead of time, by judiciously choosing the order in which you define them. For a sufficiently large project, you can't do this and maintain sanity. It's best to adopt the large project convention at the start and concentrate on writing your code rather than spend your time figuring out why code randomly stops working for no reason at all (some function started returning an int or it started to require an int parameter.)
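Here is a minimal sketch of the large project convention, assuming two made-up functions (square() and sum_of_squares() are not part of the lesson code). Every function is declared at the top of the file, so the definitions can follow in any order and the compiler never has to guess a return type.

```c
//prototypes_first.c
//illustrates declaring every function before any function is defined
//(square() and sum_of_squares() are made-up names for this sketch)

//declarations (prototypes): the compiler now knows the parameter
//types and return type of both functions
int sum_of_squares(int a, int b);
int square(int x);

//sum_of_squares() calls square(), which is defined further down.
//This is fine, because square() was declared above.
int sum_of_squares(int a, int b){
        return square(a) + square(b);
}

int square(int x){
        return x * x;
}
```

Swapping the order of the two definitions makes no difference to the compiler; only the declarations at the top have to come first.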
This leads to your first important lesson in C: C is an unsafe language; just about any code will compile and run, thus giving you endless opportunities to shoot yourself in the foot. Even worse, half the time you won't know that you shot yourself in the foot (like happened here) and instead the hapless user will wind up shooting themselves in the foot, but will have a hard time figuring out what's gone wrong.
Note | |
---|---|
C is designed to write kernels. Kernel code must be fast (the fastest code on the machine). Fast code doesn't have time to check whether it just did something sensible. Kernel code is written by smart people who don't make mistakes, or have written enough code that they've already made every kind of mistake possible. C was never designed to write application code. Application code should have checking. If you want safe application code you should choose something like Ada (it's used for Air Traffic Control, space craft and medical equipment) or the much maligned Pascal. Pascal did service for a decade or two as the introductory programming language that anyone (including your Grandma) could learn. Unfortunately Pascal never recovered from this role and is mainly remembered with horror or as a joke by people who had no interest in learning programming, but who were forced to learn it, because "it was good for them". C became popular amongst CS students, possibly because they knew that Unix was written in C, and so naturally everyone wanted to write in the language the kernel was written in. As well C is fast, and if you write bloated code, no-one will notice the bloat (you wish). And C just about always compiles and runs, no matter how bad and buggy your code is, so businesses knew they could always ship their C code. |
It's easy to write buggy code in C (it's almost impossible to do anything else). There are no seat belts or collapsible steering columns in C. I will show you more examples of this fabulous property of C later and I've no doubt you will come across many of your own.
Note | |
---|---|
make is one of the many tools/utilities associated with programming. Learning to use these tools doesn't require understanding in the sense of understanding maths. It's more like learning to use a drill. You need to know what it does and how to use it. If the history of computing was rerun from the start, we may find that a different tool set was developed. But what we've got is what we've got. Some tools aren't great and some are great. No matter what, it's better to use these tools than not use them. It's likely there is plenty of room for more tools. No-one knows what tools we still need, but whenever a new one arrives, we all slap our foreheads and say "that's so obvious, why didn't we think of this before?" You also need to know the lingo (nomenclature), not because it's good, but because you'll need to talk to people about what you're doing. If you expect or are asked for help, you will need to talk the same language. I have been one of the main respondents on an internet project's (the Linux Virtual Server) help list for about 10yrs. It drives me nuts to get a request from someone who hasn't read the HOWTO I wrote and who is using the wrong nomenclature and I have no idea what they're talking about. In the past I replied, trying to figure out their problem, usually finding that they were beyond help; it turns out that a person, who can't state their problem, doesn't understand it well enough to understand the answer. As a result, if the person can't state their problem in their first posting, I usually don't bother replying. Learn the lingo, not because it's good, but because you need it to communicate with others. If you arrive using the terminology correctly, people won't have to spend the first 10 minutes figuring out if you know what you're talking about. You can argue whether dog is the best word for the family of animals, but if you want to talk about dogs, you have to use the word dog. |
You now have 2 files, main.c, print_int.c. If you modify one of them, which files do you have to recompile, before relinking? For a small project, you could just recompile everything, but for any real project, after making a change to your code, you want to recompile the minimum number of files, so you can test your changes. You need to figure out the dependencies, i.e. which files need to be recreated if a file is changed. The utility make reads a file Makefile in the current directory containing the dependency information, and only (re)compiles the files affected by the change.
Writing a Makefile is a bread and butter operation for a programmer, and in this section you're going to become an expert writer of Makefiles. The syntax of Makefiles is sufficiently arcane that you usually don't write new Makefiles from scratch; you grab one that looks close enough and modify it. You get a Makefile from some project already on your computer, off the internet or from the guy in the next cube to you (a process known semi-humorously as "coding by theft".) In this case you only ever write one Makefile, your first one, and you rapidly forget how they work. Still you can be expected to produce one on demand for a new project and you can't say "You're kidding! I haven't written a Makefile from scratch in years." You have to say "no problem", download the make manual (no-one expects you to remember the Makefile syntax) and get onto it.
From here on in, you can expect to write (or modify) a Makefile for almost every piece of code you write.
To figure out what needs to be recompiled, you make up a dependency tree. Here's what the dependency tree looks like for our 2 files
first
then
This is a somewhat complicated way of saying
The process of compiling a set of source files into an executable is called building (you did the "build", or you "built it") and make is one of the utilities used for builds.
You can script building, and early builds used scripts (there was nothing else). (Some packages from the pre-unix days still use scripts, their authors updating the code, but resolutely determined to eschew Makefiles and all of the new fangled tools. Instead you wade through pages of instructions on how to build the package, which would have taken longer to write than any Makefile.)
Then someone devised a utility make, which handles the dependencies for us. It's more limited in its capabilities than a scripting language; however if you want to build executables, it has all you need. make can use either form of the dependency tree above. The "somewhat more complicated" form above is a little trickier to set up, but is the only way to go for large projects.
Here are the dependencies, written as a tree.
    main.c      print_int.c
      |              |
      |              |
    main.o      print_int.o
       \            /
        \          /
        my_function
In this tree my_function is called the root, and the .c files are called leaves. The tree can just as easily be drawn with the root at the top and the leaves at the bottom like this (or drawn sideways - expect a tree to be drawn in any orientation).
        my_function
        /          \
       /            \
    main.o      print_int.o
      |              |
      |              |
    main.c      print_int.c
The concept of doing something only once (e.g. including a header file, see later) when you're told to do it many times is called idempotency.
about idempotency:
If you execute the instruction "open the door" several times, only the first execution changes the state of the door. After the first time, you will see that the door is open and not do anything. Opening the door is an idempotent instruction. If instead you executed the instruction "drink a glass of water" several times, then you would keep drinking water. Drinking a glass of water is not an idempotent instruction. In the computer world, you have to know whether your command needs to be (or is) idempotent.
If you go to the bank, where you have a balance of $100 and you deposit $10, the bank's computers read the balance of your account and the amount of your deposit and then they record the command "make the balance of Joe's account $110" in a file of instructions, which are run serially later (at night, so they don't have to give you credit for the extra money till the next day; serially so that only one operation is being done on each account at a time). They don't say "add $10 to the balance". Why not? Computers are unreliable and you have to plan for a crash. Let's say the computer crashed in the middle of updating your account and as a result the computer has no record of whether the transaction was completed. The bank has to run the instructions again from the last successful transaction. If the instruction issued was "add $10 to Joe's account", no-one can tell from your balance whether the transaction succeeded. Should they add an extra $10 just in case? However if the instruction is "make the balance of Joe's account $110", the bank can run the instruction again, knowing that the balance in your account will be correct.
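The bank example can be sketched in C. This is only an illustration: set_balance(), add_to_balance() and replay() are made-up names, and the balance is a plain int rather than anything a real bank would use. replay() runs the same instruction several times, as the bank would have to after a crash.

```c
//idempotent_deposit.c
//compares an idempotent instruction with a non-idempotent one
//(the function names are made up for this sketch)

//idempotent: running it twice leaves the same balance as running it once
int set_balance(int balance, int new_balance){
        (void)balance;          //the old balance is deliberately ignored
        return new_balance;
}

//not idempotent: every run changes the balance again
int add_to_balance(int balance, int amount){
        return balance + amount;
}

//replay an instruction "runs" times, as the bank would after a crash
//if it could not tell whether the first run had completed
int replay(int (*instruction)(int, int), int balance, int arg, int runs){
        int i;

        for (i = 0; i < runs; i++){
                balance = instruction(balance, arg);
        }
        return balance;
}
```

Starting from 100, replay(set_balance, 100, 110, 2) gives 110 however many times it runs, while replay(add_to_balance, 100, 10, 2) gives 120: the second run has paid the deposit in twice.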
Some unix commands are idempotent e.g. ifconfig. You can issue ifconfig eth0 192.168.1.1 all day and the computer will accept each command, giving a return code of 0 and the IP on eth0 will stay 192.168.1.1.
However if you issue rm filename twice, and on the 2nd time the file is no longer there, then rm will return an error (there was no file to erase). Unfortunately the authors of many unix commands assume that the commands are being given by people, who know what to do with the result. It would be better if the authors wrote their code assuming the results are being monitored by a computer. Requiring a human to monitor a computer is a waste of a human. Computers are only doing simple things and can be monitored more simply by other computers. (Not so long ago, most computers didn't have keyboards or monitors.) It's also a waste of a computer, having to wait for a human to type on the keyboard. In the case of rm, a computer monitoring the results of the command will get an error code and have to figure out what to do (is the disk dead....?). Clearly when you erase a file you expect it to be gone. If it's already gone, you don't want an error. Fortunately rm has an option -f to tell it not to complain if the file doesn't exist. Not all commands are fixable in this way and for these commands, before issuing the command, you have to test the state of the machine to see if the command will give an erroneous error (e.g. if the file isn't there, don't issue the command to erase it).
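The same idea can be written as a C function. This is a sketch, not how rm is implemented: remove_quietly() is a made-up name, and it assumes a POSIX system, where remove() sets errno to ENOENT when the file doesn't exist.

```c
//remove_quietly.c
//an idempotent file delete, in the spirit of rm -f
//(remove_quietly() is a made-up name for this sketch)

#include <errno.h>
#include <stdio.h>

//returns 0 if the file is gone when the function returns,
//whether or not it existed beforehand; returns -1 on any other error
int remove_quietly(const char *path){
        if (remove(path) == 0){
                return 0;       //the file existed and was deleted
        }
        if (errno == ENOENT){
                return 0;       //no such file: nothing to do, not an error
        }
        return -1;              //some other failure, e.g. permission denied
}
```

A program (rather than a person) calling remove_quietly() twice on the same file gets 0 both times, so it doesn't have to work out whether the second "failure" matters.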
The command vi is not idempotent: you may want multiple editing sessions in multiple windows. The command firefox is idempotent. So that only one process writes to history (and a few other things), only one firefox is allowed to run at a time. You can open multiple windows or tabs in any firefox session. If you start up another firefox process, the firefox that's running opens another window.
The make command is idempotent.
According to make there are prerequisites, targets and commands. If a prerequisite is newer than the target, then the command is run to remake the target. In the diagrams above
The syntax of a make rule is
    target: prerequisite(s)
    	command
The target is separated from the prerequisites by ":". This type of rule, written into the body of the Makefile, is called an explicit rule. You will come to implicit rules shortly.
Note | |
---|---|
There is a "\t" before the command (i.e. the white space cannot be blanks). |
Note | |
---|---|
Rules used to have to be separated by a blank line, but no longer need be. (The Makefile will be easier to read if rules are separated by a blank line). |
Here's part of our dependency tree, rephrased in make speak.
target: main.o.
prerequisite(s): main.c.
command: If any of the prerequisites are newer than the target, then remake the target, by running the command: gcc -Wall -c main.c
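The test make applies to a rule can be sketched in C. The names is_out_of_date() and needs_remake() are made up, the sketch handles only one prerequisite, and it compares modification times in whole seconds using the POSIX call stat(). It is an illustration of the idea, not make's actual code.

```c
//needs_remake.c
//the test make applies to one rule: is the target missing,
//or older than its prerequisite?
//(is_out_of_date() and needs_remake() are made-up names for this sketch)

#include <sys/stat.h>
#include <time.h>

//compare two modification times (seconds since the epoch)
int is_out_of_date(time_t target_mtime, time_t prereq_mtime){
        return prereq_mtime > target_mtime;
}

//returns 1 if the command should be run, 0 if the target is up to date,
//-1 if the prerequisite itself is missing (in this sketch there is no
//rule that could make it, so this is treated as an error)
int needs_remake(const char *target, const char *prereq){
        struct stat t, p;

        if (stat(prereq, &p) != 0){
                return -1;
        }
        if (stat(target, &t) != 0){
                return 1;       //no target yet: it must be made
        }
        return is_out_of_date(t.st_mtime, p.st_mtime);
}
```

With the files from this lesson, needs_remake("main.o", "main.c") would return 1 just after you edit main.c, and 0 again once main.o has been rebuilt.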
make looks for its dependency tree in a file Makefile in the current directory. Here's the file Makefile that will handle the above dependency tree.
    #Makefile
    #Joseph Mack (C) 2009, released under GPL.v3
    #demonstrates make for the my_function project

    main.o: main.c
    	gcc -Wall -c main.c
    #----------------------
Warning | |
---|---|
If you swipe this with your mouse, it won't work. But swipe it with your mouse anyhow; you need to recognise the error message. |
Now run make.
    my_function# make
    Makefile:6: *** missing separator (did you mean TAB instead of 8 spaces?).  Stop.
The error, at least nowadays, is informative (it used not to be that way). make requires tabs ("\t") before a command. The command must be indented by a tab. When you swiped the code with the mouse, the OS replaced the tab in the original code with 8 spaces (why this happens is another problem) and your copy of the file now has 8 spaces on the left of the command rather than the original tab. We've all suffered decades of cryptic errors from the early versions of make when 1 out of 100 tabs had been accidentally replaced with 8 spaces (or worse, 5 spaces and a tab) which you can't see by eye. When python arrived with code blocks delimited by indentation (tabs or blanks), most of us threw up our hands in horror saying "haven't we learnt this lesson already".
The story (from The UNIX-HATERS Handbook - good reading for those wanting some perspective on Unix) is that the person who wrote the first make used tabs as delimiters. This was just a minor mistake, quite fixable. You have to choose something for your delimiter for the first iteration and a tab isn't all that bad. You're too busy creating your brainchild to see minor blemishes like this, but that's what friends are for - they let you know that they can't tell the difference between a bunch of blanks and a tab. Then came the big mistake. The author realised that there were now 8 people using make and it was TOO LATE to fix. Of course now millions of people are using make every day, and all cursing when their Makefile which looks fine on the screen fails. make has been fixed to give an informative error message. Why on earth then can't they fix the tab problem and make the delimiter any number of blanks?
Use your editor to replace the blanks, at the start of the line with the command, with a tab. You need to make the target main.o out of date. You can do this by deleting it or by making the prerequisite main.c newer than the target. You change the date on files with touch. Here touch gives the file the current date/time.
    my_function# dir
    -rw-r--r-- 1 root root  214 Oct 20 15:50 main.c
    my_function# touch main.c
    my_function# dir
    -rw-r--r-- 1 root root  214 Oct 21 12:17 main.c
Now with main.c being newer, main.o is out of date and needs to be remade. Rerun make.
    my_function# make
    gcc -Wall -c main.c

and look for the new main.o. Run make again: make will tell you that main.o is up to date (doesn't need to be remade) and make won't run the command. (What sort of command is make that running it multiple times won't do anything more than running it once [3] ?) If you want to remake the files now, you'll have to touch a prerequisite or delete the target (try both actions).
Comment out the rule (two lines) in Makefile that makes main.o. Add another rule to your Makefile to produce the target print_int.o. Run make again - you will get a new print_int.o. Then comment out the rule to make print_int.o and write a rule to make the target my_function (and run it). Here's the resulting Makefile [4] . (Remember, if you swipe this off the screen, you'll have to do something to it before it will work.)
Now we need to get these 3 rules to work as a tree. make has to be presented with the final target (in this case my_function) (in tree speak, the root of the tree), then look at the root's prerequisites (main.o, print_int.o). If these in turn have prerequisites, then they must be checked, until you get to the leaves on the tree (in this case main.c, print_int.c). Then make works back towards the root, (re)making targets on the way.
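The walk described above (out to the leaves, then back towards the root, remaking targets on the way) can be sketched as a small recursive function. Everything here is made up for illustration: struct node, remake(), and the small integers standing in for file timestamps. Real make reads the tree from the Makefile and the timestamps from the disk.

```c
//walk_tree.c
//a sketch of how make walks a dependency tree: visit the prerequisites
//first (towards the leaves), then remake targets on the way back to the root
//(struct node, remake() and the integer "timestamps" are made up)

#include <stddef.h>

#define MAX_PREREQS 4

struct node {
        const char *name;
        int mtime;      //pretend timestamp; 0 means the file doesn't exist
        int remade;     //set to 1 when the command would have been run
        struct node *prereqs[MAX_PREREQS];      //unused slots are NULL
};

//returns the node's timestamp after any remaking; "now" is the pretend
//timestamp given to anything that gets remade
int remake(struct node *n, int now){
        int i, newest = 0, is_leaf = 1;

        for (i = 0; i < MAX_PREREQS && n->prereqs[i] != NULL; i++){
                int t = remake(n->prereqs[i], now);     //prerequisites first
                if (t > newest){
                        newest = t;
                }
                is_leaf = 0;
        }
        //leaves (source files) are never remade; other targets are remade
        //if they are missing or older than their newest prerequisite
        if (!is_leaf && (n->mtime == 0 || newest > n->mtime)){
                n->mtime = now;
                n->remade = 1;
        }
        return n->mtime;
}
```

If main.c is newer than main.o, calling remake() on the my_function node marks main.o and then my_function as remade and leaves print_int.o alone. Calling it a second time marks nothing.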
Here's the new version of your Makefile.
    #Makefile
    #Joseph Mack (C) 2009, released under GPL.v3
    #demonstrates make for the my_function project

    all: my_function

    main.o: main.c
    	gcc -Wall -c main.c

    print_int.o: print_int.c
    	gcc -Wall -c print_int.c

    my_function: print_int.o main.o
    	gcc -Wall -o my_function print_int.o main.o
    #----------------------
If you give make no arguments (i.e. you just type make at the prompt) it will make only the first target, here all. make looks to the prerequisites for all, in this case my_function, and then looks for prerequisites for my_function, working its way back to the source files. When the end of the chain is reached, the commands are run (the targets are remade), producing the targets, which become updated prerequisites for targets in other rules.
With the files now all up to date, if you want to run other commands, first you will have to delete a target or touch a prerequisite. You can make targets explicitly; e.g. make print_int.o or make main.o. You can also run make all, although this is the same as running make (at least for this Makefile, since all is the first target).
The target all is not a file, although it behaves as if it were. There is no file all, so the target all is always out of date (it will always be remade). With all always out of date, make will check to see if the prerequisite my_function needs to be remade. It's possible you could have a file all in your directory. If all were newer than the prerequisite my_function, then the target would not be remade. To handle this possibility, you tell make that all is not a file, but a phony target (i.e. all is a label). Here's Makefile now.
    #Makefile
    #Joseph Mack (C) 2009, released under GPL.v3
    #demonstrates make for the my_function project

    .PHONY: all

    all: my_function

    main.o: main.c
    	gcc -Wall -c main.c

    print_int.o: print_int.c
    	gcc -Wall -c print_int.c

    my_function: print_int.o main.o
    	gcc -Wall -o my_function print_int.o main.o
    #----------------------
You want to be able to delete all your targets (i.e. everything except the leaves), so you can do a complete compile from scratch. This is called clean'ing your directory tree. Here's the Makefile with a clean target. You run make for a specific target, e.g. the target clean, with make clean. (By tradition all: is at the top of the Makefile, while clean: is at or near the bottom.)
    #Makefile
    #Joseph Mack (C) 2009, released under GPL.v3
    #demonstrates make for the my_function project

    .PHONY: all clean

    all: my_function

    main.o: main.c
    	gcc -Wall -c main.c

    print_int.o: print_int.c
    	gcc -Wall -c print_int.c

    my_function: print_int.o main.o
    	gcc -Wall -o my_function print_int.o main.o

    clean:
    	rm -f *.o my_function
    #----------------------
Note | |
---|---|
If any of the files you want to remove are absent, rm will complain that it can't delete them (because they aren't there to be deleted). The -f (force) option tells rm to be silent, if any of the files to be removed are absent. (rm is not idempotent, but rm -f is.) |
Now run make clean, check that the expected files have gone and then run make to remake the (target) files. Now we can demonstrate how .PHONY works. With all files up to date, remove clean from the .PHONY list, then create a file clean in the current directory that is newer than all the files in there now (how do you do that [5] ?) then run make clean again. make tells you that the target clean is up to date; you don't want this. Instead you want make to ignore the file clean. The directive .PHONY handles this. Add clean back to the .PHONY list and show that make clean now deletes the expected files even in the presence of the file clean.
Note | |
---|---|
Just don't have files with names all, clean in your development directories, OK? |
Other common .PHONY targets are install and test/check. (The commands for install install the compiled code.)
You've probably been editing in one window and running make in another window, or else you've exited from your editing session to run make. You can run commands from within your editor. This is a process called shelling out. In vi, after saving your file, hit "ESC" then ":". You have a ":" prompt on the bottom line of the screen. You can now invoke a shell process with "!your_shell_command" e.g. !make. When make returns (exits), you'll be back at the same spot in your editor as you were before shelling out. This only is useful for short compiles, when you don't expect (many) errors. If you get a page full of errors and have to roll the screen back with the mouse, then you'll need to go to a real window to handle this. If the compile takes minutes, you still want your editor, so it's best to fire up make in another window, so you can keep editing in one window, while make runs in the other (and you hope that the compile succeeds).
If we change Makefile, should we recompile? In general Makefile should be a prerequisite.
If you make Makefile a prerequisite, then a trivial change to the Makefile will trigger a complete recompile. make can't tell whether you've made trivial or substantial changes to the Makefile, possibly rearranging the directories and libraries produced. A complete recompile in this case is just life, I'm afraid. In the case of the .PHONY target install, a change in the install directory, after you've finished the compile, will trigger a complete recompile. For an overnight or multiday compile, this is a real pain. You can handle this by making sure that you've decided on the install directory before you compile, or you can have the part of the Makefile that does the install, included from the Makefile, and not have the include file as a prerequisite.
To make Makefile a prerequisite:
    #Makefile
    #Joseph Mack (C) 2009, released under GPL.v3
    #demonstrates make for the my_function project

    .PHONY: all clean

    all: my_function

    main.o: main.c Makefile
    	gcc -Wall -c main.c

    print_int.o: print_int.c Makefile
    	gcc -Wall -c print_int.c

    my_function: print_int.o main.o Makefile
    	gcc -Wall -o my_function print_int.o main.o

    clean:
    	rm -f *.o my_function
    #----------------------
Why couldn't we say "it should work, we added Makefile as a prerequisite to each of the rules"?
"It should work" only flies with people and politicians. What differentiates an engineer from everyone else is that an engineer tests that it works. Sure it should work, but do you know that it does work? You shouldn't pay even 1¢ for something that "should work". Are you going to trust your life to "it should work"? No way. You always verify. You only trust your life (or your money) to "it does work". It costs very little to do these tests; 100s or millions of people may wind up depending on what you've done. You have to test it.
Makefiles can be used for any multistep process. I edit the web pages for this class with vi producing an xml file. The xml is converted to an html by xsltproc using a Makefile. If the conversion succeeds, the files are copied to my usb flash drive.
Why didn't we use Makefiles for the python class? In python, after you exit the editor, you already have the executable (the .py file). In C (or xml, or many other languages), there is another step to convert an ascii file to the final file. In general in unix, the editing step is separate from the step which produces the file(s) you want. This allows you to use the same editor for any process.
Note | |
---|---|
Nomenclature: compile is the process of turning ascii source files into an executable. It's also the name of any step in this process. The step of turning object files into an executable has a special name - linking, but it's also part of the compile too. The nomenclature is quite loose here; compile can mean any step or all of it. |
On inspecting our Makefile, you'll notice that the same command is run on print_int.c and main.c. If you had to compile 1000 source files, you wouldn't want to make 1000 separate rules, identical except for the filenames. As a computer programmer, you should be thinking about writing a function to do this. You shouldn't be surprised to learn that make knows how to compile 1000 source files in a directory.
Modify your current Makefile, commenting out the rules that make the object files.
    #Makefile
    #Joseph Mack (C) 2009, released under GPL.v3
    #demonstrates make for the my_function project

    .PHONY: all clean

    #all: my_function

    #main.o: main.c Makefile
    #	gcc -Wall -c main.c

    #print_int.o: print_int.c Makefile
    #	gcc -Wall -c print_int.c

    my_function: print_int.o main.o Makefile
    	gcc -Wall -o my_function print_int.o main.o

    clean:
    	rm -f *.o my_function
    #-Makefile---------------------
Note there's only one (explicit) rule in this Makefile (ignoring the clean rule). Run make clean so we can start from scratch. Run the new Makefile (with the command make, which makes the first target it finds; my_function).
    my_function# make
    cc -c -o print_int.o print_int.c
    cc -c -o main.o main.c
    gcc -Wall -o my_function print_int.o main.o
Before I tell you what happened: What is cc?
In the early days, there was only one C compiler on a machine. It was supplied by the machine's manufacturer, it was proprietary, it only ran on that type of machine, and it was called cc. Being proprietary, you couldn't guarantee that your code would compile on another machine, with its different proprietary compiler (standards committees partially handled this). Later, independent suppliers produced compilers with a different name, so you could fire up whichever compiler you wanted. The GNU C compiler is called gcc and it runs on a wide range of machines. Using gcc you can now expect that if your code compiles on one machine, it will compile on any machine. Programs (like make) needing to call a C compiler now didn't know the compiler's name. The convention adopted was that every machine still had to have an executable cc (the C Compiler), but now it's a link to the user's preferred compiler. On my machine gcc is the C compiler and cc is a link to it.
With the inroads from independent compiler providers, computer manufacturers became less interested in supplying compilers (after all, compiler writing is a lot of work, and users were becoming disaffected with proprietary compilers that wouldn't compile their code). Manufacturers, realising that users weren't necessarily using their compiler, gave their compiler a different name too. Sun, which had a good compiler, used to bundle their compiler free with their machines. Then they let the world know that they didn't want anyone to use their compiler anymore, by charging a fortune for it. The world readily followed Sun's lead and adopted gcc. It didn't take long before the world realised they didn't need Sun's hardware either and Sun was bought out by Oracle.
Here's what happened (in order) when you ran make:
cc -c -o file.o file.c |
gcc -Wall -o my_function print_int.o main.o |
The implicit rule for turning the source code into the object file, we've used before and discarded. Why [7] ? Implicit rules are flexible and can be modified to add warning flags.
The implicit rule for compiling C files to object files is this
    %.o : %.c
    	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
Explanation:
Note | |
---|---|
differences: with assignment, the "=" can have blanks on each side; the syntax of variables is $(CC), rather than ${CC}. |
Note | |
---|---|
The implicit rule has the output file (-o $@) at the end of the command. I've always had the output file as the first filename in the command line. The compiler doesn't care where the output file is in the command line; the compiler just looks for the -o option to find the name of the output file. |
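To see what the implicit rule does with a file name, here is a sketch in C that fills in the command for one source file. expand_implicit_rule() is a made-up name and is not part of make; the sketch leaves out $(CPPFLAGS) and only handles names ending in ".c". The part of the name matched by % is called the stem in the make manual.

```c
//implicit_rule.c
//a sketch of what make does with the implicit rule: from a source
//file name it works out the target name and fills in the command
//(expand_implicit_rule() is a made-up name, not part of make)

#include <stdio.h>
#include <string.h>

//src must end in ".c"; writes e.g. "gcc -c -Wall main.c -o main.o" to cmd
//returns 0 on success, -1 if src isn't a .c file or cmd is too small
int expand_implicit_rule(const char *cc, const char *cflags,
                const char *src, char *cmd, size_t cmd_size){
        size_t len = strlen(src);
        int stem_len, n;

        if (len < 3 || strcmp(src + len - 2, ".c") != 0){
                return -1;
        }
        stem_len = (int)(len - 2);      //the part matched by % in %.c

        //$(CC) -c $(CFLAGS) $< -o $@   with $< = src and $@ = stem.o
        n = snprintf(cmd, cmd_size, "%s -c %s %s -o %.*s.o",
                        cc, cflags, src, stem_len, src);
        if (n < 0 || (size_t)n >= cmd_size){
                return -1;
        }
        return 0;
}
```

Given "gcc", "-Wall" and "main.c", the function writes the command gcc -c -Wall main.c -o main.o into cmd. make does the equivalent for every .o file it needs and can't find an explicit rule for.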
To include -Wall in the implicit rule, we need to set the variable $(CFLAGS). Here's our new Makefile.
    #Makefile
    #Joseph Mack (C) 2009, released under GPL.v3
    #demonstrates make for the my_function project

    WARN = -Wall
    CFLAGS = $(WARN)

    .PHONY: all clean

    #all: my_function

    #main.o: main.c my_function.h Makefile
    #	gcc -Wall -c main.c

    #print_int.o: print_int.c my_function.h Makefile
    #	gcc -Wall -c print_int.c

    my_function: print_int.o main.o Makefile
    	gcc -Wall -o my_function print_int.o main.o

    clean:
    	rm -f *.o my_function
    #-Makefile---------------------
Note | |
---|---|
For the variables, I could have written
CFLAGS = -Wall |
in one step, but $(CFLAGS) can be quite long and you want to be able to turn parts of it on and off, without risking an editing blunder in a long string. If you want to turn off warnings, the definition of $(WARN) can be commented out.
Note | |
---|---|
Variable names (e.g. $(WARN) ) are traditionally uppercase. Presumably this is a hangover from the early days of computing, when all text was uppercase. (As to why this is so, see Eric Raymond's "Hacker's Dictionary" - hint: uppercase was chosen by management, not by the technical people.) There is no reason for the variables still to be uppercase. Possibly one day, some well known programmer will change to the more readable lowercase, and everyone will follow suit. Years later people will wonder why anyone ever used uppercase. Till then, since people are expecting uppercase for Makefile variables, I'm going to continue to use it. |
Here's the build.
    my_function# make
    cc -Wall -c -o print_int.o print_int.c
    cc -Wall -c -o main.o main.c
    gcc -Wall -o my_function print_int.o main.o
    my_function# dir
    -rw-r--r-- 1 root root 256 Oct 29 12:18 main.c
    -rw-r--r-- 1 root root 462 Oct 29 14:11 Makefile
    -rw-r--r-- 1 root root 408 Oct 29 14:12 print_int.c
    -rw-r--r-- 1 root root 832 Oct 29 14:12 print_int.o
    -rw-r--r-- 1 root root 756 Oct 29 14:12 main.o
    -rwxr-xr-x 1 root root 6376 Oct 29 14:12 my_function*
    my_function# ./my_function
    10
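If you want to check that $(CFLAGS) really does reach the implicit rule, without compiling anything, a dry run will show you. The following is only a sketch: it assumes GNU make, and the scratch directory and the file demo.c are names I've made up for the test. The -n flag tells make to print the commands it would run, rather than run them.

```shell
#!/bin/sh
# Sketch only: demo.c and the scratch directory are made-up names.
set -e
dir=$(mktemp -d)
cd "$dir"

: > demo.c    # an empty source file is enough for a dry run

# the same two variable assignments as in the lesson's Makefile
printf '%s\n' 'WARN = -Wall' 'CFLAGS = $(WARN)' > Makefile

# -n: print the commands make would run, without running them
out=$(make -n demo.o)
echo "$out"

cd /
rm -rf "$dir"
```

The printed command should contain -Wall, even though this Makefile has no rule at all for demo.o.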
Let's remove commented lines, add a few variables, and add back the target all. (The variable names are arbitrary, but traditional.)

    #Makefile
    #Joseph Mack (C) 2009, released under GPL.v3
    #demonstrates make for the my_function project

    OBS = print_int.o main.o
    EXEC = my_function
    MAKEFILE = Makefile
    CC = gcc
    WARN = -Wall
    CFLAGS = $(WARN)

    .PHONY: all clean

    all: $(EXEC)

    $(EXEC): $(OBS) $(MAKEFILE)
    	$(CC) $(CFLAGS) -o $@ $(OBS)

    clean:
    	rm -f $(OBS) $(EXEC)

    #-Makefile---------------------
Here's the build
    my_function# make clean
    rm -f *.o my_function
    my_function# make
    gcc -Wall -c -o print_int.o print_int.c
    gcc -Wall -c -o main.o main.c
    gcc -Wall -o my_function print_int.o main.o
    my_function# ./my_function
    10
Is this Makefile obscure or what? What have we done?
The prerequisites are "$(OBS) $(MAKEFILE)". The two automatic variables which parse the prerequisites are "$<": the first prerequisite and "$^": all prerequisites ("$?" is only those prerequisites that are newer than the target). There is no automatic variable for "all but the last prerequisite" or "the prerequisites which are object files". It seems that the authors of make don't expect Makefile to be a prerequisite of $(EXEC). Are the authors being sloppy in assuming that Makefile won't be a prerequisite, or lazy in not coding up a new automatic variable or is there something I'm missing? I can't figure another way to make $(MAKEFILE) a prerequisite of $(EXEC) (making it a prerequisite of all: doesn't do it). I assume the authors are being sloppy.
Originally we'd made Makefile a prerequisite for each object file. Now Makefile is a prerequisite only for the executable and not for the object files. As well, we no longer have targets for the individual objects. The only targets in the Makefile are all, clean and $(EXEC) (invoked with make my_function). It seems like we've lost some safety trying to shoehorn our Makefile into the standard Makefile format. However we have gained a shorter (but less comprehensible) Makefile. We'll fix all this shortly (except the comprehensibility part).
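To see what the automatic variables expand to, you can have a rule echo them instead of compiling. This sketch assumes GNU make; the target name demo is made up, and the two .o files are empty stand-ins created with touch. It also tries GNU make's filter function, which isn't used in these lessons, as one possible way of picking only the object files out of the prerequisite list.

```shell
#!/bin/sh
# Sketch only: "demo" is a made-up target; the .o files are empty stand-ins.
set -e
dir=$(mktemp -d)
cd "$dir"

# write the Makefile with printf so that the recipe lines start with a real tab
{
  printf '%s\n'   'OBS = print_int.o main.o'
  printf '%s\n'   'demo: $(OBS) Makefile'
  printf '\t%s\n' '@echo "first:   $<"'
  printf '\t%s\n' '@echo "all:     $^"'
  printf '\t%s\n' '@echo "objects: $(filter %.o,$^)"'
} > Makefile

touch print_int.o main.o   # the prerequisites only have to exist

out=$(make demo)
echo "$out"

cd /
rm -rf "$dir"
```

If this behaves on your machine as I expect, $(filter %.o,$^) could stand in for $(OBS) in the link command, but check it before relying on it.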
If you wanted to adapt this Makefile for a new project, what would you have to change? [8] You should expect that by the time you write your next Makefile, that you'll have to look up the meanings of the automatic variables.
End Lesson 4
Makefile no longer has rules with the targets main.o and print_int.o. Run make clean. What do you think will happen if you do make main.o or make print_int.o? Explain what you see. Here's my explanation [9] .
You need to know how make works. You also need to know what happens when make doesn't work. More often than you'd like, when compiling large packages you've downloaded from the internet, a build failure will give you one of make's obscure error messages. You need to be prepared for these.
What about make foo.o
    my_function# make foo.o
    make: *** No rule to make target `foo.o'. Stop.

Telling make to make foo.o clearly is ridiculous, no? It isn't ridiculous. Despite what make says, there is an (implicit) rule to make foo.o. You make it from foo.c. The rule isn't missing. What's missing is the prerequisite foo.c. We can fix that.

    my_function# rm foo*
    my_function# touch foo.c
    my_function# make foo.o
    gcc -Wall -c -o foo.o foo.c
    my_function# dir
    -rw-r--r-- 1 root root 0 Oct 31 00:35 foo.c
    -rw-r--r-- 1 root root 619 Oct 31 00:35 foo.o
    my_function# nm -o foo.o
    my_function# make foo
    gcc -Wall foo.c -o foo
    /usr/lib/gcc/i686-pc-linux-gnu/4.4.1/../../../crt1.o(.text+0x18): In function `_start':
    : undefined reference to `main'
    collect2: ld returned 1 exit status
    make: *** [foo] Error 1
See there is a rule for making foo.o (the file is in the directory for everyone to see). nm doesn't find any symbols in it, which is not surprising, since foo.c is empty. Why did the linking of foo.o to foo fail [10] ?
Changing foo.c like this fixes the failure to produce foo

    my_function# cat foo.c
    int main(){return 0;}
    my_function# make foo
    gcc -Wall -c -o foo.o foo.c
    gcc foo.o -o foo
    my_function# ./foo
    my_function#

What's missing from the last command (the line gcc foo.o -o foo) [11] ?
In big compiles, there can be files with no explicit targets and you can get all sorts of errors that you have no idea where they came from.
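You can reproduce the "No rule to make target" message safely in an empty scratch directory. This is a sketch assuming GNU make; the directory is a throwaway, and -n (dry run) is used so that no compiler is needed. The first call should fail because foo.c doesn't exist; the second should print the compile command from the built-in rule.

```shell
#!/bin/sh
# Sketch only: runs in a throwaway directory, with no Makefile at all.
set -e
dir=$(mktemp -d)
cd "$dir"

# 1. no foo.c yet: the built-in rule has nothing to work from
status=0
err=$(make -n foo.o 2>&1) || status=$?
echo "exit status $status: $err"

# 2. supply the prerequisite; now the same built-in rule applies
: > foo.c
out=$(make -n foo.o)
echo "$out"

cd /
rm -rf "$dir"
```

Nothing about the rules changed between the two calls; only the prerequisite appeared.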
You can write your own explicit rules which are modified implicit rule(s), called pattern rules. Here's the template
    %.o: %.c
    	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@

The "%" is a wildcard. It says that any x.o target has a prerequisite x.c. If the target: prerequisite line had been %a.o: %a.c then this rule would have been used to make xa.o from xa.c, but make would have to look for another rule to compile xb.c. A pattern rule overrides an implicit rule with the same target: prerequisite pair.
Pattern rules traditionally are added at the bottom of the Makefile (below clean). What pattern rule would you write to make Makefile a prerequisite when compiling a .c file to an object file? Here's the Makefile [12] . For good measure, I've added $(LDFLAGS) to handle the problem that -Wall isn't in the implicit rule for executable targets.
You now have an explicit (pattern) rule for any .o file. To show that Makefile is a prerequisite for each object file, try these commands: make clean, make, make clean, make print_int.o, make print_int.o, touch Makefile, make print_int.o to show that you can make specific .o files and that Makefile is a prerequisite for the .o files.
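Here's one way to watch the Makefile prerequisite at work without needing a compiler. It's a sketch under several assumptions: GNU make, a throwaway directory, an empty print_int.c, and a cut-down Makefile containing only the pattern rule. make -t touches the target instead of running the command (standing in for a successful compile), and make -n shows what make would do next.

```shell
#!/bin/sh
# Sketch only: a cut-down Makefile with just the pattern rule.
set -e
dir=$(mktemp -d)
cd "$dir"

: > print_int.c
{
  printf '%s\n'   'CFLAGS = -Wall'
  printf '%s\n'   '%.o: %.c Makefile'
  printf '\t%s\n' '$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@'
} > Makefile

make -t print_int.o > /dev/null   # -t: touch the target, don't compile
fresh=$(make -n print_int.o)      # print_int.o should now be up to date
echo "$fresh"

sleep 1                           # make sure the new timestamp is later
touch Makefile
stale=$(make -n print_int.o)      # the compile command should be back
echo "$stale"

cd /
rm -rf "$dir"
```

The second dry run should show the compile command again, although print_int.c hasn't changed; only Makefile is newer than print_int.o.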
For any target, you can have as many commands as you like. Each command is run in its own shell, one line after another. There is no state carried over between lines. Each line is a world unto itself.
Note | |
---|---|
You can have multiple commands on one line, separated by ";". In this case since the same shell is executing all the commands on the line, state is maintained between commands. |
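The difference between separate lines and one line with ";" is easy to test with cd. This is a sketch assuming GNU make; the target names separate and together are made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: "separate" and "together" are made-up target names.
set -e
dir=$(mktemp -d)
cd "$dir"

{
  printf '%s\n'   '.PHONY: separate together'
  printf '%s\n'   'separate:'
  printf '\t%s\n' '@cd /'
  printf '\t%s\n' '@pwd'
  printf '%s\n'   'together:'
  printf '\t%s\n' '@cd / ; pwd'
} > Makefile

separate=$(make separate)   # the cd is forgotten before pwd runs
together=$(make together)   # cd and pwd share one shell
echo "separate lines: $separate"
echo "one line:       $together"

cd /
rm -rf "$dir"
```

With separate lines, pwd should still report the scratch directory; on one line it should report /.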
There is no iterating and conditionals are hard to write. However this is not a problem for compiling; you're just running a series of commands.
Because you can have multiple commands in a rule, the Makefile can handle backup.
Hard disks are guaranteed to fail. Any file on only one disk cannot be regarded as saved. If the compile succeeds you want the good files copied to a backup directory, on some other disk (not on the current disk). I use a usb flash disk to hold the files backed up with make. Later I copy the files from the flash drive to a disk on another machine, so I have 3 copies of all files.
One useful feature of make is that if any command fails (i.e. a compile returns a non-0 return code), then make exits. In this case the only new files produced are ones from compiles that succeed. The backup commands won't be fed any files that didn't compile.
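You can watch make stop at the first failing command with a toy rule. In this sketch (assuming GNU make) the command false stands in for a compile that fails, and the two echo commands stand in for the compile and the backup steps.

```shell
#!/bin/sh
# Sketch only: "false" stands in for a failed compile.
set -e
dir=$(mktemp -d)
cd "$dir"

{
  printf '%s\n'   '.PHONY: all'
  printf '%s\n'   'all:'
  printf '\t%s\n' '@echo compiling'
  printf '\t%s\n' '@false'
  printf '\t%s\n' '@echo backing up'
} > Makefile

status=0
out=$(make 2>/dev/null) || status=$?
echo "make printed: $out"
echo "make exit status: $status"

cd /
rm -rf "$dir"
```

The line "backing up" should never appear, and make should return a non-zero exit status.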
To do backup
Note | |
---|---|
Make the variable $(BACKUP) a directory appropriate for your setup (it's on a flash disk in mine). |
    #Makefile
    #Joseph Mack (C) 2009, released under GPL.v3
    #demonstrates make for the my_function project

    #--------------------------
    #user defined variables

    #project files
    OBS = print_int.o main.o
    EXEC = my_function

    #backup
    #at work, my usb disk is mounted on /pny/home/
    #at home, my usb disk is mounted on /pny/work/
    #when I change locations, I can copy the files with
    #cd ~/project_directory
    #cp -pauv /pny/$where_i_am/`pwd`/ .
    BACKUP_TOP_DIR = /pny/work

    #--------------------------
    #utilities
    CP = cp -pauv
    RM = rm -f
    PWD = `pwd`

    #backup
    BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

    #make specific variables
    MAKEFILE = Makefile
    CC = gcc

    #FLAGS
    WARN = -Wall
    CFLAGS = $(WARN)
    LDFLAGS = -Wall

    #-----------------------------
    .PHONY: all clean

    all: $(EXEC)

    $(EXEC): $(OBS) $(MAKEFILE)
    	$(CC) $(LDFLAGS) -o $@ $(OBS)

    %.o: %.c $(MAKEFILE)
    	$(CP) $< $<.sav
    	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
    	$(CP) $< $<.sav2
    	$(CP) $< $(BACKUP)

    clean:
    	$(RM) $(OBS) $(EXEC)

    #-Makefile---------------------
I've divided the Makefile into sections and labelled them. This will help anyone wanting to modify the file. In such files there usually is a section at the very top labelled user defined variables; variables that the user will need to change if the file is adapted to another project. Changes here won't crash the make. Changes to other variables in the other sections should only be made if you know what you're doing. (The variable $(BACKUP) needs $(PWD) otherwise it would be in the "user defined variables" section.)
You don't need to backup $(EXEC), only the source files. You can recreate $(EXEC) anytime by running make.
End Lesson 5
library wikipedia (http://en.wikipedia.org/wiki/Library_(computing)), static libraries (http://www.linux.org/docs/ldp/howto/Program-Library-HOWTO/static-libraries.html), shared libraries (http://www.linux.org/docs/ldp/howto/Program-Library-HOWTO/shared-libraries.html).
In a large project, it's possible that some of the functions you've written will be useful to others. In this case you put these functions into a library, which is installed (usually) in /usr/lib (you have to be root to do this - why [13] ?) and you write a man page for the library. Even if the functions aren't useful to anyone else, it's possible that a set of code will survive relatively unchanged in future versions of your code, and for organisational purposes (your sanity), you separate it out into a library tree. It's possible that you'll write functions into your libraries that you won't need for your executable - you may write a set of functions for completeness. If you're using someone else's library, it's very likely that you won't use all their functions.
A static library file will have a name like libmy_function.a - this has no version information in the name.
A dynamic library will have a name like libmy_function-2.0.1.so or libmy_function.so.2.0.1, containing version information. The dynamic library will have a chain of links (set up by ldconfig), possibly back to libmy_function.so e.g.
    my_function# ls -alFrt /usr/lib | grep my_function
    -rwxr-xr-x 1 root root 322223 Sep 26 11:56 libmy_function.so.2.0.1*
    lrwxrwxrwx 1 root root 17 Sep 26 11:56 libmy_function.so.2 -> libmy_function.so.2.0.1*
    lrwxrwxrwx 1 root root 17 Sep 26 11:56 libmy_function.so -> libmy_function.so.2.0.1*
    -rw-r--r-- 1 root root 408464 Sep 26 11:56 libmy_function.a
Note | ||
---|---|---|
Although we're staying with the same two source files (main.c, print_int.c) and Makefile, we're going to make enough changes, that examples from previous sections will no longer work. Make a new directory my_function.static, copy all your files into it and continue work there. You can do this with
The copy command creates the target directory and then copies the files. In large builds, files are built in their own directory, each with their own Makefile (the directory can have 1..any number of source files - whatever you need to organise your code). A Makefile in the top directory will direct building of code in the subdirectories, by cd'ing to the subdirectories and running make in each of them. |
Let's make a library, containing a function not called by your executable (or called by anything in fact, it's just going to be an unused function). Add void hello_world(void) which just prints "hello world!\n", to print_int.c. Here's the new print_int.c [14] .
Run make and check that you just get the usual output "10" from ./my_function. Now make your library with
    ar rcs libmy_function.a print_int.o
Note | |
---|---|
Some commands have a "-" before their flags, some don't. You can have as many .o files on the command line as you like. |
The standard name for a library starts with lib. The archiver, ar, puts the functions from the .o file(s) into the static library, (usually) in alphabetical order, along with a symbol table. There's not much difference between print_int.o and libmy_function.a. (The two files don't look much different to readelf -s either.)
    my_function# dir
    -rw-r--r-- 1 root root 932 Oct 25 15:46 print_int.o
    -rw-r--r-- 1 root root 1094 Oct 25 15:52 libmy_function.a
    my_function# nm -o libmy_function.a
    libmy_function.a:print_int.o:0000001c T hello_world
    libmy_function.a:print_int.o:00000000 T print_int
    libmy_function.a:print_int.o: U printf
    libmy_function.a:print_int.o: U puts
    my_function# nm -o print_int.o
    print_int.o:0000001c T hello_world
    print_int.o:00000000 T print_int
    print_int.o: U printf
    print_int.o: U puts
Now we need to link the library to the executable. I'm giving the executable a new name my_function.static, to differentiate it from the executable you've previously produced.
Note | |
---|---|
Nowadays, the default compile produces the dynamically linked executable. However, some executables (ones that just have to run, like files that run at boot time) are needed as statically linked executables. Some builds produce both forms of the executable, with names like my_function.static and my_function.dynamic (there's no traditional name for the two differently linked forms of the executable, but obvious names are used). At install time, you choose the one to install and the installed executable will have the standard name, in this case my_function. |
From man ld: the linker looks for the library in /lib and /usr/lib. If the library is not found, the linker aborts. To get the linker to look elsewhere for the library file, you can
For linking a library, you can use the abbreviation for the library name. -lmy_function can be used instead of libmy_function.a or libmy_function.so.
From man ld: -lmy_function looks first for libmy_function.so, then if not present looks for libmy_function.a. Here since there is no libmy_function.so, gcc links against libmy_function.a.
Here are some example ways of linking the static library. All are logically equivalent.
    my_function# gcc -o my_function.static main.o ./libmy_function.a

    my_function# echo $PWD
    /src/www.austintek.com/c_class/class_code/functions
    my_function# gcc -L$PWD -o my_function.static main.o libmy_function.a

    my_function# gcc -L$PWD -o my_function.static main.o -lmy_function

    my_function# gcc -L. -o my_function.static main.o -lmy_function
Here's the output.
    my_function# dir
    -rw-r--r-- 1 root root 1094 Oct 25 15:52 libmy_function.a
    -rw-r--r-- 1 root root 1028 Oct 25 17:18 main.o
    -rwxr-xr-x 1 root root 6548 Oct 26 20:32 my_function.static*
    my_function# ./my_function.static
    10

    my_function# ldd my_function.static
    linux-gate.so.1 => (0xb80ba000)
    libc.so.6 => /lib/libc.so.6 (0xb7f6d000)
    /lib/ld-linux.so.2 (0xb80bb000)
main() didn't ask for and doesn't need the function hello_world(). Is it in the executable?
    my_function# nm -o my_function.static
    my_function:0804953c D _DYNAMIC
    my_function:08049608 D _GLOBAL_OFFSET_TABLE_
    my_function:0804850c R _IO_stdin_used
    my_function: w _Jv_RegisterClasses
    my_function:0804952c d __CTOR_END__
    my_function:08049528 d __CTOR_LIST__
    my_function:08049534 D __DTOR_END__
    my_function:08049530 d __DTOR_LIST__
    my_function:08048524 r __FRAME_END__
    my_function:08049538 d __JCR_END__
    my_function:08049538 d __JCR_LIST__
    my_function:08049628 A __bss_start
    my_function:08049620 D __data_start
    my_function:080484c0 t __do_global_ctors_aux
    my_function:08048350 t __do_global_dtors_aux
    my_function:08049624 D __dso_handle
    my_function:08049528 A __fini_array_end
    my_function:08049528 A __fini_array_start
    my_function: w __gmon_start__
    my_function:08049528 A __init_array_end
    my_function:08049528 A __init_array_start
    my_function:08048480 T __libc_csu_fini
    my_function:08048430 T __libc_csu_init
    my_function: U __libc_start_main@@GLIBC_2.0
    my_function:08049628 A _edata
    my_function:08049630 A _end
    my_function:080484ec T _fini
    my_function:08048508 R _fp_hw
    my_function:0804829c T _init
    my_function:08048300 T _start
    my_function:08048324 t call_gmon_start
    my_function:08049628 b completed.5900
    my_function:08049620 W data_start
    my_function:0804962c b dtor_idx.5902
    my_function:080483b0 t frame_dummy
    my_function:08048414 T hello_world
    my_function:080483d4 T main
    my_function:080483f8 T print_int
    my_function: U printf@@GLIBC_2.0
    my_function: U puts@@GLIBC_2.0
Yes. If you link against some huge static library, with 1000's of functions and you only need one of them, the executable could have all the functions in the library linked into it.
Note | |
---|---|
for an explanation of "could" see granularity of object files. The explanation will be simpler to work through, when we've written a Makefile to do the compiling for us. |
Making a static library is not a complicated process - instead of linking all the object files in one step to main(), you do it in two steps - put the object files into a big file (the library), and then link the library to main().
Does make have an implicit rule for making static libraries? You can look in the manual or you can ask the computer. I asked the computer first. See if you can figure out the likely target and prerequisite(s) (if you find an implicit rule, make will know the command). Here's what I did [15] and then, to make sure I hadn't missed anything, I looked in the manual.
There are no implicit rules for making static (or dynamic) libraries. Modify Makefile to make the static library with an explicit rule. I needed
Here's my Makefile [16] . Check (use nm) that you get a static library with the functions print_int(), hello_world().
Update the Makefile to make the statically linked executable $(EXEC_STATIC) and add it as a prerequisite to all.
Note | |
---|---|
logically this executable is identical to the one which is made by linking in all the object files; as it turns out, on my machine the two files are the same size, but differ in only one char, at least according to cmp. |
Make sure you link against the static library in the current directory and not one in /usr/lib that you didn't know about (or had forgotten about and which could be older than the one you're trying to link against right now). Update the rule for the target clean:. Here's my Makefile [17] .
Check your static executable for the functions print_int(), hello_world().
End Lesson 6
Note | |
---|---|
I delayed presenting this material, so that you would have a Makefile to help with the compiles. |
Make a new directory my_function.static.linking, copy all files from my_function.static and work in the new directory.
In a previous section (static library), we found that an uncalled library function was bound into the executable.
static libraries are collections of object files:
To the linker, the library looks like a collection of object files, rather than a collection of functions. In computer speak, we say that the linker has granularity at the file level (here file==object file). On linking, if an executable needs a particular function (from an object file, contained in the library), then all functions from that object file, whether or not they are called, are linked. If no functions from an object file are needed, then that object file is not linked.
To show that static libraries are collections of object files, split print_int.c into two files print_int.c, print_int_2.c as is done here (you can duplicate the files and comment out lines to make the files below). Now print_int() and hello_world() are in separate .o files.
    //print_int.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    #include <stdio.h>

    //prints the integer passed to it
    void print_int(int j) {
    	printf ("j=%d \n", j);
    }
    //-print_int.c--------------------

    //print_int_2.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    #include <stdio.h>

    //not called by main(); only here to see whether it gets linked
    void hello_world(void) {
    	printf ("hello world!\n");
    }
    //-print_int_2.c--------------------

Add the new object file to the various $(OBS) variables in your Makefile and rerun the compile. (Make sure that the command which constructs the library file adds both object files, not just print_int.o.) Run nm on the static library and the statically linked executable and look for hello_world(). Here's the library, showing that hello_world() is in it.

    my_function# nm -o libmy_function.a
    libmy_function.a:print_int.o:00000000 T print_int
    libmy_function.a:print_int.o: U printf
    libmy_function.a:print_int_2.o:00000000 T hello_world
    libmy_function.a:print_int_2.o: U puts
Here's an abbreviated output from looking at the executable, showing that hello_world() was not linked.
    my_function# nm -o my_function.static
    .
    .
    my_function.static:080483e4 T main
    my_function.static:08048420 T print_int
    my_function.static: U printf@@GLIBC_2.0
    my_function.static: U sleep@@GLIBC_2.0

You want to organise your source files so that related functions end up in the same .o file, and unrelated functions end up in separate .o files. (The definition of "related" then would be "functions that the linker regards as inseparable when putting them into the executable".) Here hello_world() and print_int() can be treated as unrelated functions. You'll later see (static functions) an example of functions which need to be in the same .o file.
How can we tell which functions are in the same .o file? Look at the output of nm -o libmy_function.a above. On the left of each line is the name of the library and the object file that contributed that function. You can see that hello_world() came from print_int_2.o and that print_int() came from print_int.o.
Running readelf (with almost any option) displays the functions (and variables) that came with an object file.
    my_function# readelf -c libmy_function.a | more
    Index of archive libmy_function.a: (3 entries, 0x18 bytes in the symbol table)
    Binary print_int.o contains:
    print_int
    Binary print_int_2.o contains:
    hello_world
In a later section (mess) we will see the linker handling object files in which the same function name is used in different object files.
I said earlier that functions are stored usually in alphabetical order in libraries. Are they stored in alphabetical order within the constituent .o files, or globally in alphabetical order? We can't tell with our example here, which has only one function in each of the two *.o files. You could write a library with a whole lot of *.o files with different functions or you could find out a faster way. How else would you find out [18] ?
What determines the order in which the object files are put into the library? (I don't know - if you change the order of linking of print_int.o, print_int_2.o, they're still included in the same order in the library file.)
Why are (static) library files collections of object files, rather than collections of functions? I don't know. Perhaps it's so that you can have static functions (functions that only can be seen within an object file). We'll learn about static functions later.
End Lesson 7
Return to the directory my_function.static.
In this section, you will modify your Makefile to install the library (i.e. copy the library to one of the standard locations, usually /usr/lib). Other builds, needing this library, will look for it among the standard locations. Install the executable (also called the binary) (i.e. copy the binary to one of the standard locations, for user utilities this usually is /usr/bin).
The default target (usually all) compiles all the code. Traditionally, installation is not one of the default actions of a Makefile. After the compile finishes and the user is happy with the results, the user runs make install as a separate action.
Is the target install a prerequisite for the target all [19] ?
To enable your Makefile to do the install, add
Installation of the library and the executable are separate events. Update your Makefile so that the user can choose to install these separately if they want.
You'll need new .PHONY targets install_lib, install_bin and install. Assume that the default install is to install both the binary and the library (what then are the prerequisites for install?). Put these install targets above clean: with install the first in the sequence of install targets (this is for readability - the user will look for the root rule of the install tree first).
Since the install targets are .PHONY (always out of date), the commands for the install targets will always run. What's the word that describes a command which gives the same result whether it's run once or many times [20] ? Since you can't control the idempotency of a .PHONY rule, write a command for the rule that will only do its work once, even if invoked many times.
The executable, as produced by the Makefile, has the name my_function.static. Make sure the installed version has the name my_function.
You will need to be root to do the installs. Check that clean actually cleans. After cleaning, check that make install also does the compile before installing.
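If you'd like to try the install targets without being root, you can point them at a scratch directory. The sketch below is one possible layout, not my Makefile: it assumes GNU make and GNU cp (for the -u flag, which copies only when the source is newer than the destination), DESTDIR = root is a made-up staging directory, and two small text files stand in for the real library and executable.

```shell
#!/bin/sh
# Sketch only: installs stand-in files into a scratch tree instead of /usr.
set -e
dir=$(mktemp -d)
cd "$dir"

mkdir -p root/usr/lib root/usr/bin
echo lib > libmy_function.a        # stand-ins for the real build products
echo bin > my_function.static

{
  printf '%s\n'   'DESTDIR = root'
  printf '%s\n'   '.PHONY: install install_lib install_bin'
  printf '%s\n'   'install: install_lib install_bin'
  printf '%s\n'   'install_lib: libmy_function.a'
  printf '\t%s\n' '@cp -puv libmy_function.a $(DESTDIR)/usr/lib/'
  printf '%s\n'   'install_bin: my_function.static'
  printf '\t%s\n' '@cp -puv my_function.static $(DESTDIR)/usr/bin/my_function'
} > Makefile

first=$(make install)     # cp -v reports the two copies
second=$(make install)    # the commands run again, but cp -u finds nothing to do
installed=$(ls root/usr/bin)
echo "first run:  $first"
echo "second run: $second"

cd /
rm -rf "$dir"
```

The commands run on both invocations (the targets are .PHONY), but the second run should copy nothing, and the installed binary has the name my_function.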
End Lesson 8
I've never seen an uninstall in Linux, although I know of no reason why you can't have one. If people want a clean system (one with old programs removed), everyone seems to reinstall from scratch. You can't accumulate any useful programs that way - distros only have the basic programs to boot your computer and to put up a flashy window manager.
To uninstall, your Makefile has to do the following
Write an uninstall rule for your Makefile. Is the command for the uninstall rule idempotent? Here's my uninstall rule [21] . You could write a rule to uninstall the library and executable separately, but I don't think it's worth the bother. You will need to add the target uninstall to what line [22] ? If there was a traditional place for the target uninstall I expect it should go just below the rule(s) for install.
Here's my Makefile [23] .
The target clean removes object files, libraries and executables. If you download a package from the internet, usually you will need to set up Makefiles particular for your machine (build static as well as the dynamic libraries?; where to install?). configure is often used to do this setup. The files that you've set up, to compile the package, are not removed by make clean in case you want to rerun the compile. To remove these setup files, you use the target distclean (distribution clean, i.e. restore the directory to the state it was when you downloaded it).
We don't have setup files, but we do have the *.sav,*.sav2 files. If you want to remove them, you could use the distclean target (distclean should have clean as a prerequisite). Here's my distclean rule [24] . Check that distclean runs the command(s) for clean.
Note | |
---|---|
People often keep copies of files they need for a short time as filename.ext.sav. This is in case your current idea doesn't work out and you need to revert to earlier code. Because of this convention, it would be better to only erase $(SOURCE_FILES).sav[2], rather than using a wild card, but we don't have a list of the source files in the Makefile. Just don't keep copies you want of Makefile as Makefile.sav. |
Just in case you've lost track, here are our source files [25] .
On building outside the source tree.
Our build is done in the source directory. Gradually, starting about the year 2000, builds have changed over to building outside the source tree. In this case the source files will be in their directory tree, and the object files, libraries, and executables will be in a separate tree, with a similar (or identical) tree structure to the source directories. Now you can expect a set of downloaded source files, to be able to build outside the source tree if you want (you do). This advance is possible mostly by the use of configure which automatically sets up (writes) the Makefiles according to options you pass to it.
Although I haven't seen an explanation or reason for the new method, I can imagine that these factors are involved
- You aren't working in the source tree, so a keyboard mistake can't erase or change a file.
- A mistake in the Makefile can't write or erase anything in the source tree
- When you're finished, you can be sure that all your build files are gone - you just erase your build directory.
- You can do multiple simultaneous builds from the same source tree (I've never done this).
Building outside the source tree is safer and is now the current best practice.
Shortly you will be building dynamic libraries. These are produced from *.o files with the same names as the *.o files that build the static library. However the two lots of *.o files are different (the dynamic *.o files are built with the -fPIC flag) and the Makefile can't differentiate them. One solution to this problem is to build the two libraries in separate directories. Since it's regarded as good practice to build outside the source tree, let's build the static library in its own directory. Later we will do the same thing for the dynamic library.
Copy the files in my_function.static to a new directory my_function-2.0, which will now have Makefile, main.c, print_int.c. Copy Makefile to a subdirectory static and continue to work there.
In build speak, for the static build, we have the top directory and the build directory (which is called static). Other directories are possible e.g. man for man pages and src for source files. (Currently the source files are in the top directory, but we'll move them to src later.) When you download a package from the internet, the top directory will be the name of the package and a version number (here my_function-2.0). Inside the top directory will be various directories and a Makefile(s) (or configure files).
After cd'ing to the build directory static, your source files are in the directory above (top directory). Modify the Makefile to produce its output files in the current directory (the build directory). (You could just run make and see what doesn't work.) Here's a hint if you're stuck [26] . Here are the changes to my Makefile [27] . Here's the whole Makefile [28] .
If the Makefile doesn't do what you expect, or you don't understand what it's doing, you can use the -d (debug) option, which shows the steps make is taking as it works through the Makefile. Since you're likely to be overwhelmed with the amount of output, you can redirect the output to a file and review the file with your editor. Run this command.
    my_function-2.0# make -d > foo.out
Note | |
---|---|
I output to files with an obvious temporary name e.g."foo.out". If I find a file with "foo" in the name, I know I can delete it, without first having to look at it. |
The output shows make attempting to make Makefile. First make checks pattern rules and then implicit rules. Why is make trying to make Makefile [29] ? This behaviour of make is not a bug; make is supposed to go up its dependency tree looking for targets for its prerequisites. We've set up a rule for the target *.o files, but we haven't set up a rule for the target Makefile. The behaviour we're seeing is make doing its best to be make. Tell make that it doesn't have to make Makefile. Here are two ways I found [30] .
I assume the .PHONY method is the one you're supposed to use. (One caveat: make regards a .PHONY target as always out of date, so a real target with a .PHONY prerequisite is remade on every run of make; a rule with an empty command, e.g. "Makefile: ;", doesn't have this side effect.) Fix your Makefile to not make the prerequisite Makefile.
Rerun make -d >foo.out to see if make is now running as expected. Here's my output.
. . (correctly handles Makefile) . Considering target file `all'. File `all' does not exist. Considering target file `libmy_function.a'. File `libmy_function.a' does not exist. Considering target file `print_int.o'. File `print_int.o' does not exist. Looking for an implicit rule for `print_int.o'. Trying pattern rule with stem `print_int'. Trying implicit prerequisite `../print_int.c'. Trying rule prerequisite `Makefile'. Found an implicit rule for `print_int.o'. Considering target file `../print_int.c'. Looking for an implicit rule for `../print_int.c'. . . (looking for ways to make ../print_int.c) . |
We see that make correctly handles the prerequisite Makefile (make does nothing), and next correctly handles the pattern rule for print_int.o, finding the prerequisite ../print_int.c. Then make looks for ways to make the target ../print_int.c, going through the pattern and implicit rules. make doesn't have to make ../print_int.c any more than it had to make Makefile. Find a way to stop make from making ../print_int.c. Here's my method [31] . Fix your Makefile so that it doesn't treat ../*.c files as targets.
Here's my .PHONY rule [32] . This will stop remaking any ../*.c file, not only your source ../*.c files. This is fine in this case, but sometimes a .c is produced from another file (using yacc). So you might want to remake a ../*.c file. Here's a better version of the Makefile (suggested by one of the students):

    SOURCES = ../print_int.c ../main.c
    .
    .
    .PHONY: ... $(MAKEFILE) $(SOURCES)
Here's the output of make -d (seen by redirecting the output to a file with make -d >foo.out) [33] . You can see a 1-to-1 mapping between the expected actions of the Makefile rules and the debug output.
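The debug output is long, so one way to check the 1-to-1 mapping is to filter it down to the lines where make considers a target and announces its decision. The following is only a sketch: the function name make_decisions and the sample file are made up for illustration, and the patterns assume the message wording shown in the debug output in this lesson.

```shell
#!/bin/bash
# Hypothetical helper: keep only the lines of `make -d` output that show
# which targets make considered and what it decided to do about them.
make_decisions() {
    grep -E 'Considering target file|Must remake target|No need to remake target' "$1"
}

# A small sample in the wording used by the debug output in this lesson,
# so the filter can be tried without running make.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Considering target file `all'.
File `all' does not exist.
Considering target file `static_build'.
File `static_build' does not exist.
Finished prerequisites of target file `static_build'.
Must remake target `static_build'.
EOF

decisions=$(make_decisions "$sample")
echo "$decisions"
rm -f "$sample"
```

On a real run you would use it as make -d > foo.out followed by make_decisions foo.out.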
Note | |
---|---|
I rarely see Makefiles where the prerequisites *.c, Makefile are handled. Most Makefiles let make exhaust its implicit rules, before going on to the next prerequisite. There's no real penalty for this, as the compile steps always take longer than running the Makefile, so speeding up the make process doesn't help much. However if you want to debug your Makefile, it's easier if you don't have to wade through the output of make working its way through unnecessary rules. I'm just telling you the right way to do things. You can do it the way "other people" do it if you want. However while "other people" continue to blow up Space Shuttles, dump greenhouse gases into the atmosphere and routinely manage large software projects to be overbudget, late and buggy, you should be selective about the "other people" you choose to emulate. You saw in erroneous errors how implicit rules can make anything (there foo.o). You don't want that happening here. It will be reassuring, when you're reviewing a code package with 10^6 lines of code, to know that all prerequisites in the Makefiles are handled by explicit rules. |
While writing and testing the code, you will be working in the build directory my_function-2.0/static (and with the Makefile in the build directory). When you put the code out for anyone to use, they will only run the Makefile in the top directory my_function-2.0. You now have to change the Makefile in the top directory for users to use. The job of the top directory Makefile, is to run the Makefiles in the various subdirectories (we only have static so far). The user is only interested in running the complete compile and the install (and possibly clean). Thus the upper directory Makefile doesn't need all the targets of the build directory Makefile; it only needs the targets all, install, uninstall, clean, distclean. If the user wants any finer control, they can go back to the Makefile in the subdirectory.
Some of the targets for the Makefile in the top directory will now be handled by the Makefile in the build directory. Edit the relevant commands in the top directory Makefile to call the corresponding target in the build directory Makefile. Here's a start.
    all: static_build
    static_build:
    	cd static; make; cd -
With this modification, from the top directory, check that the build runs. Since you don't have a revised clean target in the top directory Makefile yet, cd to the build directory and run make clean.
Demonstrate this problem in class, and find the solution for homework.
Go back to the top directory and edit the two rules above, to change static_build to static. Now run make. It tells you "Nothing to be done for 'all'". It would appear that all files are made, but you just ran make clean and when you check, there are no object files in the build directory. What's happened (it's another helpful error message from make) [34] ? What's the fix [35] ?
Let's look at this problem some more. Here's the debug output before the fix [36] . The debug output shows that make thinks the target static is a file rather than an internal label.
Why does make choose to do nothing ("no need to remake target `static'") with the target file static? If you look in the gnu make documentation (http://www.gnu.org/software/make/manual/make.html), you find that remake is what commands do. If the prerequisite(s) are newer than the target file, then the command(s) run to remake the target. The only reason not to remake the target file is if it's newer than the prerequisite(s). In this case there are no prerequisites. We conclude then that a target file is always newer than no prerequisite (i.e. a target file which has no prerequisites is always up to date).
What happens when we run the fixed version? Here's the before and after
    before:
    Considering target file `all'.
    File `all' does not exist.
    Considering target file `static'.
    Finished prerequisites of target file `static'.
    No need to remake target `static'.

    after:
    Considering target file `all'.
    File `all' does not exist.
    Considering target file `static_build'.
    File `static_build' does not exist.
    Finished prerequisites of target file `static_build'.
    Must remake target `static_build'.
    mkdir -p static
In the fixed (static_build) case, the target static_build does not exist (it's not a file; it's a .PHONY target), and there are no prerequisites. The next action is that the target is remade. From this we conclude that a target that doesn't exist is always out of date, even if there are no prerequisites.
Conclusions: (1) a target file which has no prerequisites is always up to date; (2) a target that doesn't exist is always out of date, even if it has no prerequisites.
You can probably rationalise that these two rules are required if the role of make is to create files that don't exist in one situation, or in another situation to do an action every time, no matter what dates are on files. (I'm sure there are similar rules with sets: is a NULL set newer or older than a set that exists?).
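The two rules can be modelled in a few lines of bash. This is a sketch of the decision only, not how make is implemented: the function name needs_remake is made up, it ignores .PHONY, and it compares timestamps with the shell's -nt (newer than) test.

```shell
#!/bin/bash
# Hypothetical model of make's decision for one target.
# usage: needs_remake TARGET [PREREQUISITE...]
# Returns 0 (true) if the target would be remade, 1 otherwise.
needs_remake() {
    target=$1
    shift
    # Rule: a target that doesn't exist is always out of date.
    [ -e "$target" ] || return 0
    # A prerequisite newer than the target forces a remake.
    for prereq in "$@"; do
        [ "$prereq" -nt "$target" ] && return 0
    done
    # Rule: an existing target with no newer prerequisite is up to date.
    return 1
}

workdir=$(mktemp -d)
mkdir "$workdir/static"                    # a directory with the target's name
touch -t 200901010000 "$workdir/old.o"     # older file
touch -t 200912070000 "$workdir/new.c"     # newer file

if needs_remake "$workdir/static"; then r1=remake; else r1=up-to-date; fi
if needs_remake "$workdir/static_build"; then r2=remake; else r2=up-to-date; fi
if needs_remake "$workdir/old.o" "$workdir/new.c"; then r3=remake; else r3=up-to-date; fi
echo "static: $r1, static_build: $r2, old.o: $r3"
rm -rf "$workdir"
```

The first two cases are the "before" and "after" of the debug output: static exists and has no prerequisites, so nothing is done; static_build doesn't exist, so it is remade.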
Note | |
---|---|
I wouldn't make a lot of effort to commit these rules to memory. You need to be able to write a Makefile, which will usually mean just hacking your last Makefile. You also need to understand enough to modify it. These two rules aren't written down anywhere; they're a part of the .PHONY directive's job to get make to work. If you understand what a Makefile has to do, then when you make a mistake, you should be able to fix it quickly enough. |
For the commands in the previous section, why didn't you write the following?
    static_build:
    	cd static
    	make
    	cd -
Change your Makefile to run the commands on separate lines (as above) and see what happens.
What went wrong? It turns out that each line (there are 3 lines in the box above) of commands, is run in a separate invocation of bash. There is no memory or maintenance of state between lines. Thus in the box above, a bash shell would be invoked just to run cd static. After cd'ing, the bash shell will exit back to the parent make process, which is running in the directory of the Makefile. Then a new bash shell is invoked to execute make and you're stuck in a loop. To see that you don't change directories with the cd command, try this (pwd is print working directory).
    static_build:
    	cd static
    	pwd
    	make
    	cd -
Here's the output
    my_function-2.0# make
    mkdir -p static
    cd static
    pwd
    my_function-2.0         #this is the output of pwd
    make
The output from pwd is the directory you started in my_function-2.0. You are not in the static directory, even though the previous command was cd static.
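You can see the same effect without make. The sketch below imitates the recipe by handing each "line" to its own shell with bash -c (make itself uses /bin/sh unless told otherwise; the lesson assumes that is bash). The temporary directory is there only so the example doesn't touch your project.

```shell
#!/bin/bash
# Imitate what make does with a multi-line recipe: each line goes to its
# own shell, so the cd on the first line is forgotten by the second.
start=$(pwd)
workdir=$(mktemp -d)
mkdir "$workdir/static"
cd "$workdir"

# Separate shells, one per recipe line (what make does).
bash -c 'cd static'
separate=$(bash -c 'pwd')

# One shell for the whole line (what "cd static; pwd" does).
together=$(bash -c 'cd static; pwd')

echo "separate shells: $separate"
echo "single shell:    $together"
cd "$start"
rm -rf "$workdir"
```

The first pwd reports the directory you started in; only the second, run in the same shell as the cd, reports static.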
The correct version of the make command line is
    cd static; make; cd -
It has cd - as the last bash command. Why is it there? After cd - finishes, the bash process (running the make command line) exits. (So it would not have mattered what directory you'd cd'd to.) The process running then is the parent (make) process and it's running in the original directory in which make was invoked (no matter what directory you'd cd'd to).
Note | |
---|---|
make can't cd. Only bash can cd. When any make command exits, the make process will always be in the directory in which it was invoked. |
The bash command cd -, at the end of the command line, isn't necessary. However it's a good idea. The code is written this way so it can be transplanted (swiped with your mouse) without side effects. Code that changes state (e.g. cd into a directory) should reverse those changes before exiting, so that the calling process is returned to the original ground/environment/directory where it was working before it made the call. When editing, it's nice to be able to grab a make command and not have to worry whether it's an amputated set of commands that only work inside make. (Remember - debugging is harder than writing the code in the first place. You write code that's safe under all circumstances, even at the cost of running an extra command now and again.)
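Another common shell idiom for the same goal (not the one used in these lessons) is to do the cd inside a subshell, so there's nothing to undo, and to join the commands with && rather than ;. With ; the second command still runs when the cd fails, which in a top directory Makefile would mean running make in the wrong directory. A sketch, using pwd and echo as stand-ins for make:

```shell
#!/bin/bash
# Run a command in another directory without disturbing the caller.
# Both $(...) and plain (...) run their contents in a subshell, so the
# cd is forgotten when the subshell exits.
start=$(pwd)
workdir=$(mktemp -d)
mkdir "$workdir/static"

inside=$( cd "$workdir/static" && pwd )    # pwd runs inside static
after=$(pwd)                               # the caller never moved

# With &&, the command is skipped when the cd fails.
if ( cd "$workdir/missing" 2>/dev/null && echo "ran anyway" ); then
    missing_result="ran"
else
    missing_result="skipped"
fi

echo "inside: $inside"
echo "after:  $after"
echo "missing directory: $missing_result"
rm -rf "$workdir"
```

GNU make also has a -C option (make -C static) which changes directory before reading the Makefile there; whether to use it is a matter of taste.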
End Lesson 10
Since we've got an infinite loop, let's see how it works. In the top Makefile, change the static_build rule to this (run mkdir static to create the static directory, and comment out lines so that it looks like this).
    static_build:
    	cd static
    	make
When you first invoke make, there will be only one make process running, called the parent (make) process. After executing the 2nd command line above, a new (make) process, called the child process will be invoked. The parent (make) process will still be running, but it isn't doing anything; it's waiting for the child process to return. The child process will return after its subprocess(es) have returned. Since the OS has no idea when the child processes will finish, the parent is put to sleep (the parent is still in memory, but not doing anything; if nothing else happens, in about 5 mins it will be swapped out - i.e. the process image will be written to disk). When the child process reaches the make command, a 3rd make process will be started and the 2nd make process will be put to sleep.
In this code, there's no way for the subprocesses to return, as the current child process (the last in the chain) will continue to create a new last child process. The number of processes will grow linearly (1,2..n) with time, until the computer runs out of resources e.g. memory, or the limit is reached for the number of processes that the user can run at once. Here are the limits on my machine (here unlimited means "all the machine has").

    c_class# ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    file size               (blocks, -f) unlimited
    max locked memory       (kbytes, -l) unlimited
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 7168
    virtual memory          (kbytes, -v) unlimited
A process which invokes itself in a way that the child process never returns is called a fork bomb.
How does the fork bomb process get killed? When a process is invoked, it requests resources e.g. memory. If a fork bomb is running, then eventually the machine will run out of memory, and the OS will deny the request for memory. The process has the choice of handling the denied request (there isn't a whole lot it can do, except wait, which doesn't get you much), or doing nothing about the denied request (probably the default for most unix programs). If the process does nothing, without memory, it can't continue and it will be killed by the OS. In our setup here, when the child is killed, the child will return (breaking the non-returning child process which keeps the loop going) and the whole fork bomb chain will exit from the bottom up, and the machine will recover.
The utility killall kills processes by name (kill uses the PID - the process identification number, which you see in the output of ps). However killall needs the PIDs to work and searches the process table for the PIDs for each process name. For a large number of processes, the search will take longer than the time for a make process to spawn another child process. When killall exits, it will have killed all of the make processes that were running when it was invoked. However it's likely that the last child process (the very bottom of the chain) will spawn another child between when killall finds its list of PIDs, and when killall kills what was the last child when it ran. In this case a new chain of makes will have started, that killall didn't know about. Eventually, with enough repetitions of killall, killall will find the last child and the fork bomb will exit.
If you're filling up the machine's memory with a fork bomb, then other processes running (like from other users), are likely to find their requests for resources denied too. The OS will kill any process that can't start for lack of resources, whether it's the fork bomb or a program being run by another user. As well, as memory fills up and resources are taken, other processes will run slowly. You might find the machine doesn't respond to the keyboard anymore.
We're going to run the fork bomb and watch it. You don't want to do this on a machine that anyone else is using. You also want to save all your files before starting it. Open another window and run top. Show only the user's processes (hit 'u' and then enter your username). You will watch the number of make processes here.
Another parameter of interest displayed on top is the load average (displayed at the top right). The load average is the average number of processes that are running or waiting to run, taken over the last 1, 5 and 15 minutes (on Linux, processes in uninterruptible sleep, e.g. waiting for disk, are counted too). You should see this number increase as the fork bomb forks subprocesses. Top also displays the amount of memory and swap space available. The machine won't slow down much till you fill memory and start writing to swap.
Open another window and line up the command
    my_function-2.0# killall make
but don't hit <cr> (yet). If you get in a tight spot and want to get out of the fork bomb, you'll move the mouse to this window and hit <cr>.
Open up another window, in case you want to be able to do anything else (the machine may not be able to launch a new window while the fork bomb is running). To keep track of the number of make processes, you can repeatedly (use the up arrow) run
    my_function-2.0# ps -elf | grep make | wc -l
to count the number of make processes.
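The count from that pipeline is a little high: it includes the grep process itself (its command line contains "make") and any other process whose ps line happens to contain the string. That's fine for watching the number grow. If you want an exact count, one approach is to list only the command names and count the lines that are exactly make. A sketch (the function name count_make is made up; the sample input stands in for real ps output):

```shell
#!/bin/bash
# Count lines that are exactly "make" (one command name per line on stdin).
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_make() {
    grep -c -x 'make' || true
}

# Sample input standing in for the output of: ps -e -o comm=
sample_count=$(printf 'bash\nmake\ngrep\nmake\nmakewhatis\n' | count_make)
none_count=$(printf 'bash\ntop\n' | count_make)
echo "make processes in sample: $sample_count"
echo "make processes when none are running: $none_count"
```

On a live machine the usage would be ps -e -o comm= | count_make.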
When you start your fork bomb, it could take seconds to minutes for the machine to slow to a crawl. When the mouse starts to get slow, you won't have too much time before the mouse is useless. If you like, you can let the machine go to the bitter end to see if the OS can save it or you can kill it with the prepared kill command ready in its window. On the machine we tried in class, the OS killed the chain without our intervention. Now start your fork bomb.
This isn't a particularly aggressive fork bomb; each iteration doesn't use much in the way of resources and it will take a while to cause any problems. There is only one chain, and once killall has killed the last child in the chain, the whole chain will exit from the bottom.
Try this variation
    static_build:
    	cd static
    	make &
    	make
The & tells bash to launch the make process in background and for bash to return, allowing make to move to the next command line (where another make process is invoked). The load average increases much faster with this variation. Why? Each parent make process launches two child processes. The number of processes with each iteration is 2^0, 2^1, 2^2 ... 2^n. This is an exponential (rather than linear) fork bomb.
Expect this fork bomb to bring your machine to its knees. After only 10 iterations, there will be 2^10 ≈ 10^3 child processes, each one of which is capable of starting a bifurcating chain. If killall misses even one of these processes, the chain will keep propagating. On the test machine, when we tried this in class, the OS kept killing make processes, presumably whenever they exceeded one of the ulimits. killall wasn't capable of killing off the branching chain and we had to recover the machine by power cycling it. It would have been better to have another window, owned by root, with the command init 0 lined up and ready to hit <cr> when we moved the mouse into that window.
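To get a feel for the difference between the two fork bombs without running either, here's the arithmetic as a small loop. It's only a count, under the simplifying assumption that every process gets to spawn its children once per generation.

```shell
#!/bin/bash
# Compare the two fork bombs: the single chain adds one process per
# generation; the "make & make" variation doubles the newest generation.
generations=10
linear=1          # processes in the single chain
newest=1          # processes in the newest generation of the branching version
gen=0
while [ "$gen" -lt "$generations" ]; do
    linear=$((linear + 1))
    newest=$((newest * 2))
    gen=$((gen + 1))
done
echo "after $generations generations: chain=$linear, newest branching generation=$newest"
```

The chain has grown by ten processes, while the newest generation of the branching version alone has 1024.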
At the end of this lesson, a student pointed me to Core War (http://en.wikipedia.org/wiki/Core_War).
End Lesson 11
Change your Makefile so that the expected targets for the top directory Makefile all work. Check with make -d to show that your Makefile isn't running through implicit rules for prerequisites or targets that don't exist. Here's my top directory Makefile so far [37] .
When you download a package, the build directories will not exist and the Makefile(s) will either be in the top directory or will be created by some script in the top directory. We'll use the top Makefile to make the build directories.
In the top directory, move static/Makefile to Makefile.static (i.e. Makefile.static is in the top directory). Now remove the directory static (it should be empty). In Makefile, change the command for the target static_build to create the directory static and to copy Makefile.static to static/Makefile.
Note | |
---|---|
mkdir is not idempotent (if the directory already exists, rather than shutting up and allowing the next command to act, mkdir complains and issues an error return code, thus forcing make to exit). Add this line to the utility section at the top of your Makefile. |
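The debug output earlier in this section shows mkdir -p static being run. The -p option is the standard way to get a mkdir that doesn't complain when the directory is already there. A sketch of the difference, using a temporary directory so that nothing in your project is touched:

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Plain mkdir: the second call fails because the directory exists.
mkdir "$workdir/static"
if mkdir "$workdir/static" 2>/dev/null; then plain=ok; else plain=failed; fi

# mkdir -p: succeeds whether or not the directory exists.
if mkdir -p "$workdir/static"; then with_p=ok; else with_p=failed; fi

echo "plain mkdir on existing directory: $plain"
echo "mkdir -p on existing directory:    $with_p"
rm -rf "$workdir"
```

In a Makefile, the non-zero return code from the plain mkdir is what stops make.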
Here's the modification to the top Makefile [38] . Update as many other rules as you can using the commands for the static_build rule as a template.
In the Makefile in the top directory, why doesn't the target static_build have a prerequisite, while the target install does [39] ?
End Lesson 9
The rules work if we do them in the standard work order (all, clean, distclean). However a user should be able to make any target in any order and have the Makefile do something sensible.
What happens if you run make distclean then make clean? What's the problem [40] ?
We have to fix the clean rule to handle the non-existence of build directories. make has no tests for the existence of directories, so we use bash to handle it. Here's a stand-alone bash script that tests for the existence of a directory.

    #! /bin/bash
    # test_build_subdirectories.sh
    if [ -d "static" ]      #test if "static" is a directory
    then
            cd static
            echo "directory exists"
            cd -
    else
            echo "directory doesn't exist"
    fi
Create this file with your editor (type it in by hand, rather than swiping it with your mouse, so you get practice at writing scripts). Next make it executable and then run it.
    my_function-2.0# chmod 755 !$
    chmod 755 test_build_subdirectories.sh
    my_function-2.0# ./!$
    ./test_build_subdirectories.sh
    directory doesn't exist
    my_function-2.0#
When this script is run from the command line, a bash shell is invoked (the first line of the script) and then the subsequent bash commands are fed to this (single) shell. In contrast with make, a separate bash shell is invoked for each command (i.e. for each line; there is no memory or state carried over between make command lines). Thus each make command has to be a whole bash script. In bash, you run multiple commands on one line by putting a ";" between commands. Here's the one line version of the script above
    if [ -d "static" ]; then echo "directory exists"; cd static; make clean; cd -; else echo "directory doesn't exist"; fi
Swipe this line with your mouse and run it from the command line to show that it's the same as the multiline version. The whole line runs from your current bash shell (i.e. from a single invocation of the shell).
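If you find yourself writing the same test for several build directories, the pattern can be wrapped in a function. This is a sketch for trying at the command line, not something to paste into the Makefile as is; the name run_in_dir is made up, and echo stands in for make clean.

```shell
#!/bin/bash
# Hypothetical helper: run a command inside a directory only if it exists.
# usage: run_in_dir DIR COMMAND [ARGS...]
run_in_dir() {
    dir=$1
    shift
    if [ -d "$dir" ]; then
        ( cd "$dir" && "$@" )       # subshell: the caller's directory is untouched
    else
        echo "skipping $dir: not a directory"
    fi
}

workdir=$(mktemp -d)
mkdir "$workdir/static"
present=$(run_in_dir "$workdir/static" echo "cleaning")
absent=$(run_in_dir "$workdir/dynamic" echo "cleaning")
echo "$present"
echo "$absent"
rm -rf "$workdir"
```

Inside a Makefile rule the test still has to be written on one command line, as in the one line version above.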
Modify the commands in the clean rule to first test for the existence of the build subdirectories (the echo commands in the demo script aren't needed). Check that you can run make distclean; make clean multiple times.
In the top directory Makefile, add a command to the target distclean to remove the directory static and its contents. Here are the modified rules in Makefile [41] .
If you used the commands from the template given for the static_build rule above, then the uninstall rule in your top Makefile may look like this.
    cd $(STATIC_BUILD_DIR); make uninstall; cd -
After running make distclean, run make uninstall to remove the installed files as well. What caused the failure [42] ? What's the fix? This is where the linear and continuous behaviour of math breaks down and computing starts to look like the lawyer's or IRS form maker's world. You could
The last option is the least messy and easiest to maintain.
Here's my uninstall rule in the top Makefile.
    uninstall:
    	make -f Makefile.static uninstall

Here's my top Makefile [43] .
Man pages are written in a formatting language called troff (http://en.wikipedia.org/wiki/Troff) (pronounced T-roff for typesetting roff - if you want to know about roff look at the wiki entry).
Note | |
---|---|
on my machine it's groff (i.e. gnu roff) |
Note | |
---|---|
In Unix, files are produced in two steps
|
The display of man pages uses less as the pager (the program formatting the output for your screen). Not too long ago, man pages were shown in an 80 column fixed width font format and assumed a DEC terminal, usually a vt220, for display. In the early days of computing, all the terminals were made by the now defunct DEC and if you wanted to format the contents of your file, your text had to contain DEC terminal commands (called "escape sequences") to turn on/off bold, start new paragraphs, tab etc.
Note | |
---|---|
You can see the type of terminal you're using by issuing the command echo $TERM. On Linux, it will probably be xterm (or linux), the X-window emulation of a DEC vt220. |
Note | |
---|---|
In the history of computing, DEC is well regarded by technical people. In an era when IBM owned the mainframe market and treated their customers ruthlessly, DEC was run by technical people, for technical people. DEC opened up the market for mini computers by telling everyone that they shouldn't spend all their money on a mainframe, when a mini computer would do just as well. (The classic entry into a well established market, is to come in at the bottom with a cheap product, and eat the dominant player out from underneath). Consequently places without a lot of money (e.g. Universities) bought DEC computers (e.g. the PDP-8), and most CS students cut their teeth on DEC hardware. Because of the availability of an idle machine, Unix was developed on a PDP-11. (It's possible otherwise that Unix would not have been written.) A mainframe would have been too expensive to let two researchers work with for a year. DEC developed virtual memory. Over a decade or two, the work world was filled with people familiar with DEC hardware. DEC foundered when control was wrested from the technical people by the managers, who then brought in a previously unneeded layer of their buddies, middle level managers (all ex-IBM'ers, the ones IBM was only too glad to be rid of). These outsiders, directed the company to non-technical targets that only middle level managers know about (going to meetings). Over a decade, DEC was pushed inexorably to its death. By comparison, technical people were ambivalent about Sun, although it produced good hardware and software. Scott McNealy, Sun's CEO is famous for his regard for users and the general public, by saying in 1999 "You have zero privacy anyway. Get over it." (quoted from Scott McNealy http://en.wikipedia.org/wiki/Scott_McNealy) Microsoft is regarded by technical people, as the evil empire, for its business practices. IBM has funded a lot of Linux development, going some way to reforming its image from the 1960s. 
Still it appears there are large divisions in IBM, with some people wanting to stay the old way, and other people wanting to assist open source software (it saves IBM development costs, and IBM only wants to deliver services, not code). If DEC had survived till today, it's not obvious how it would be regarded by technical people. Technical people have flocked to the GPL world and it's likely that DEC would have been abandoned by them. However technical people don't count for much in the world of business, otherwise Microsoft would have gone a long time ago. |
Although troff was used to produce documents and theses in the early days of Unix, it's only used for man pages now. No-one learns troff anymore either and if you want to write a man page, you grab a sample man page and edit it using the monkey-typewrite-sonnet principle, and using man to view your file (this is called "coding by theft").
Note | |
---|---|
If you look in /usr/man you will see about 10 man page directories man1..mann, one for each section of the man pages. Info about the contents of each section of man pages can be seen at man page (http://en.wikipedia.org/wiki/Man_page). man1 contains the man pages for user applications. |
Pick some short man page in /usr/man/man1 and copy it to man/my_function.1, i.e. to a new directory in your top directory.
Note | |
---|---|
There can be a directory doc in the top directory. This is usually for documentation in ascii format, while the man directory is reserved for troff formatted man pages. |
Note | |
---|---|
Man pages can be compressed, e.g. /usr/man/man1/foo.1.gz, to save disk space (from back in the days when disks were expensive), and when viewed, are decompressed on the fly. (On a relatively slow machine, see how long it takes to load a large compressed man page, e.g. man bash. Most of the time taken to display the page is spent decompressing the file.) If you only have compressed man pages, copy a short one to your project's man directory and uncompress it with gunzip. |
Let's say the original file was foo.1. First, in the file, change all occurrences of foo to my_function and FOO to MY_FUNCTION, then look at the output with man ./my_function. Then (one step at a time, in case you mess up the troff formatting commands, and using another window to view your updates with man ./my_function) change
You can guess at the meaning of the formatting commands in the man page if you like. As an exercise, see if you can produce the my_function.1 man page without trying to figure out what they're doing.
Here's my my_function.1 [44] .
In Makefile in the top directory, update the targets install, uninstall to handle the man page (you'll need another directory in the #directories section). Check that man my_function displays your my_function man page.
While static linking is conceptually a simple process, dynamic linking requires support from utilities and the OS at link and at run time.
Note | |
---|---|
Dynamic library files are also called shared objects (or shared libraries), and have the extension .so. It seems that the name shared is more common now. Possibly dynamic is an older name going out of favour, which was originally used to contrast with the name static. Everyone will understand either term dynamic or shared, when used to describe libraries. |
As previously noted, both dynamic libraries and static libraries are built from object files with the same name, but having different formats (the object files used for the dynamic libraries have been produced with the -fPIC option). It's difficult to use the same Makefile to produce the two different libraries in the same directory. In fact it's hard to do it even with two different Makefiles in the same directory, as make can't tell if the .o files are for the dynamic or the static library. Usually you produce the two libraries in separate build directories, using the same source files in a source directory.
cd to the directory my_function-2.0. You will construct the dynamic library in the top directory. Then in the next section you will move to the directory my_function-2.0/dynamic and write a Makefile to make the library.
The dynamically linked executable is linked against a particular version of the dynamic library (i.e. it can't use a different version of the library). The version required is called the soname (shared object name). The soname is derived from the full version information. The version information is usually 3 numbers; major version number, minor version number and another number (which has several names).
For class discussion purposes, let's say that we're writing library version number 2.0.1, and that all versions 2.0.x will be compatible, but that we know that version 2.1 (whenever we get around to writing it) will be incompatible with version 2.0. In this case the soname we give the library file is libmy_function.so.2.0. By declaring libmy_function.so.2.0 to be the soname, the author (that's us) is saying that we guarantee v2.0 to be incompatible with 2.1...2.n (and we'll use a different soname to differentiate them), and we're guaranteeing that all versions 2.0.x will be compatible. The executable then will be linked against the soname and not the library filename libmy_function.so.2.0.1. The soname is put into the dynamic library file libmy_function.so.2.0.1 by the linker. Later we copy the dynamic library file libmy_function.so.2.0.1 to /usr/lib and ldconfig will add a link from libmy_function.so.2.0 to libmy_function.so.2.0.1, making the new library file accessible by the soname.
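The naming scheme can be tried out with nothing more than shell string handling and a symbolic link. This is a sketch of the names only: the temporary directory stands in for /usr/lib, an empty file stands in for the library, and ln -s imitates by hand the link that ldconfig would add.

```shell
#!/bin/bash
# Derive the names used in this lesson from the full version number.
version=2.0.1
real_name=libmy_function.so.$version        # the library file itself
soname=libmy_function.so.${version%.*}      # drop the last number, leaving 2.0

# Imitate the link ldconfig adds: soname -> real file.
libdir=$(mktemp -d)                         # stand-in for /usr/lib
touch "$libdir/$real_name"
ln -s "$real_name" "$libdir/$soname"
link_target=$(readlink "$libdir/$soname")

echo "soname:    $soname"
echo "points to: $link_target"
rm -rf "$libdir"
```

When version 2.0.2 is installed, only the link has to change; executables linked against the soname libmy_function.so.2.0 pick up the new file without being relinked.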
Note | |
---|---|
If instead the author(s) had used libmy_function.so.2 as the soname, then ldconfig would have installed a link from libmy_function.so.2. By giving the library the soname of only the major version number, the authors are guaranteeing that versions 2.1...2.n will all be callable as v2. In this case the executable will be linked against the soname of libmy_function.so.2 and not v2.0. |
Rarely (nowadays) will the soname have no version requirement. i.e. the soname will not be libmy_function.so. To have libmy_function.so as the soname means that the author guarantees that all future versions of the library will be completely compatible with the first version. Programmers change their APIs all the time and would never promise that the calls to their functions will stay the same.
Note | |
---|---|
Programmers could keep their code backwards compatible by adding new functions whenever they needed a new functionality, rather than changing their current function. Thus if a new functionality for print_int() was needed, then a new function print_int_2() might be written, which itself would call the original print_int() for the original functionality and the new code for the new functionality. I expect Microsoft has to do it that way, to maintain backwards compatibility (people reasonably expect their programs from W95 days to install on WinXP). In Linux, the programmers don't want to maintain the old code and don't try to maintain backwards compatibility (or in some cases of sheer bastardry, deliberately break code to force people to upgrade). If you're the programmer, then you know enough about your code for this not to be a problem. If you're an end user, with more important things to do with your life than fix programs which don't run anymore, you're expected to install the latest version of some distro or recompile all your programs. Many of these programs don't know about the new libraries and won't compile. This is a real pain for users and a nightmare for vendors trying to maintain Linux applications. To handle this, vendors specify that their code is only supported on Distro_X, version_Y. With an infinitude of Linux distros, you can't expect that two arbitrary programs you want to use, program X and program Y, will both run on your installation. The fragmentation of Linux is similar to the Unix Wars which led vendors to abandon commercial Unices, to flock to the uniform and maintained APIs from Windows. I often hear comments from the Linux world, where people expectantly ask at the beginning of each year "will this year be the year that Linux conquers the desktop?"
These people hopefully look for signs from above, like the penetration of Firefox into the user base, and plot the time till 50% of users have it on their desktop and proclaim with great certainty, that at some definite time in future (say 3 years), Linux will be victorious on the desktop. I've watched all this for 15yrs and no-one has learned the lesson of the Unix Wars: vendors won't use your platform if you make it incompatible with their code. You know what? the answer is "no". Linux is not even going after the desktop. Linux is for hardcore programmers only, for people who can handle a system that's constantly broken. Linux is not for the general public. Of course the reply from the Linux community is that they don't care: they're making the best code possible. Well OK, but you can't have it both ways; you can't have "the best code in the world" and expect people to use it too. To help handle this, I have kept all my libraries back since my first Linux install, I never scrub my old installation and instead upgrade by compiling where possible. I never upgrade a working program, no matter what the author's website says about how much better the latest version is. I wait till the program is thoroughly broken and then I load the new version onto a test machine in case the new version breaks something. Even then sometimes it takes a few weeks to find that something has broken and I can no longer remember what I did or which thing might have broken my machine and I can't find the broken code to back out the problem. After too many of these I have to install code from a distro. |
The program used to assemble the machine's dynamic library information is ldconfig. This is run at boot time and following the install of any dynamic library. ldconfig looks at dynamic libraries in the directories listed in /etc/ld.so.conf. Here's my ld.so.conf (go look at yours)
    my_function-2.0# more /etc/ld.so.conf
    /usr/local/lib
    /usr/X11R6/lib
    /usr/local/BerkeleyDB.3.2/lib
    /usr/local/BerkeleyDB.4.1/lib
    /usr/local/BerkeleyDB.4.2/lib
(the directories /lib and /usr/lib are implicitly in this list).
ldconfig does the following
    my_function# dir /etc/ld.so*
    -rw-r--r-- 1 root root    158 Apr 26  2006 /etc/ld.so.conf
    -rw-r--r-- 1 root root 105111 Oct 16 20:00 /etc/ld.so.cache
Let's make a shared library. We need object file(s) in a different (relocatable) format to the static library and we use the flag -fPIC (Position Independent Code).
    my_function-2.0# gcc -Wall -c -o print_int_static.o print_int.c    #static
    my_function-2.0# gcc -fPIC -Wall -c -o print_int.o print_int.c     #dynamic
    my_function-2.0# ls -alFrt
    -rw-r--r-- 1 root root  350 Oct 25 15:46 print_int.c
    -rw-r--r-- 1 root root  932 Oct 25 23:24 print_int_static.o
    -rw-r--r-- 1 root root 1236 Oct 25 23:24 print_int.o
    my_function-2.0# nm -o print_int.o
    print_int.o:         U _GLOBAL_OFFSET_TABLE_
    print_int.o:00000000 T __i686.get_pc_thunk.bx
    print_int.o:0000002d T hello_world
    print_int.o:00000000 T print_int
    print_int.o:         U printf
    print_int.o:         U puts
There are two new symbols that are not in the statically compiled print_int_static.o; they are __i686.get_pc_thunk.bx and _GLOBAL_OFFSET_TABLE_.
Note | |
---|---|
For more on relocatable code, GOT (the global offset table) and get_pc_thunk.bx see Introduction to Position Independent Code (http://www.gentoo.org/proj/en/hardened/pic-guide.xml) and Howto locate and fix .text relocations (http://www.gentoo.org/proj/en/hardened/pic-fix-guide.xml) |
Remember we're up to version 2.0.1 of this library. Next we need an soname (shared object name). We've decided that the library will be visible as version 2.0, so the soname is libmy_function.so.2.0. Tell gcc to create the shared object.
    the format of the command is
    gcc -shared -Wl,-soname,your_soname -o library_name file_list library_list

    my_function-2.0# gcc -shared -Wl,-soname,libmy_function.so.2.0 -o libmy_function.so.2.0.1 print_int.o
    my_function-2.0# ls -alFrt
    -rw-r--r-- 1 root root 1236 Oct 25 23:24 print_int.o
    -rwxr-xr-x 1 root root 6063 Oct 25 23:30 libmy_function.so.2.0.1*
Note | |
---|---|
Here library_list is the dynamic library(s) that the dynamic library itself needs; (in our case only -lc which gcc knows about by default). You could explicitly add -lc at the end of the command line, but the linker always goes to libc.so for unresolved externals (external function names), so the library_list is not needed in this case. (We'll learn about externals later.) |
The dynamic library is larger than the .o file or the .a file. I'll let you explore the .so file with nm at your leisure. The shared object needs to call libc.so (for puts()) and looks a lot like the dynamically linked executable a.out, which also called libc.so.
When you're compiling, you haven't done the install yet, so the dynamic library isn't in /usr/lib (one of the usual places for the linker to look for a library). To produce the dynamic executable, you link against the library you've just built (in the current directory). You haven't run ldconfig yet, so you don't have the link from a file with the soname to your library either. Can you guess how to generate a dynamically linked executable from here (i.e. linking to a dynamic library in the current directory)? Is the executable you produced a real dynamically linked executable? How would you know [45] ?
Yes, the executable is dependent on libmy_function.so.2.0 (the soname, and not on the filename). Notice
    my_function-2.0# ldd my_function.dynamic
            linux-gate.so.1 =>  (0xb7fdd000)
            libmy_function.so.2.0 => not found
            libc.so.6 => /lib/libc.so.6 (0xb7e72000)
            /lib/ld-linux.so.2 (0xb7fde000)

    my_function-2.0# ./my_function.dynamic
    ./my_function.dynamic: error while loading shared libraries: libmy_function.so.2.0: cannot open shared object file: No such file or directory
The version number requirement is written into the executable at link time.
Note | |
---|---|
Why isn't the version information that's part of the filename good enough [46] ? |
Before we can use this library, we have to copy it to a place that ldconfig knows about and run ldconfig (who do you need to be to do this?).
    my_function-2.0# cp libmy_function.so.2.0.1 /usr/lib
    my_function-2.0# ldconfig -v | grep libmy_function
            libmy_function.so.2.0 -> libmy_function.so.2.0.1
    my_function-2.0# ls -alFrt /usr/lib | grep libmy_function
    lrwxrwxrwx 1 root root   23 Oct 25 16:43 libmy_function.so.2.0 -> libmy_function.so.2.0.1*
    -rwxr-xr-x 1 root root 6063 Oct 25 17:10 libmy_function.so.2.0.1*
Why did we run ldconfig after the install [47] ?
link:
There is filename linking and linking by the compiler.
link is an overloaded term in computing. (overloaded: having different meanings depending on context. In our day-to-day world, a person's name is overloaded - I accept any of these parameters when passed to my your_name(); "Joe", "Joe Mack", "Joseph Mack", my SSN#.)
- in compiling: linking describes assembling object files into an executable or library. The library itself is then linked with files to produce an executable.
- with filenames: a link to a file allows the target file to be called by another, usually simpler name. This is commonly used with files that have their version number(s) as part of their name. A link from libc.so to libc.so.6 allows libc.so.6 to be called libc.so.
Although we've already linked our executable to the dynamic library in the current directory, we now can link to the library in /usr/lib. Just so that we won't have to wonder if we've linked to the wrong file (the libraries in the current directory), move the libraries in your current directory out of the way.
Note | |
---|---|
Joe: I've rearranged the order of this class since I first wrote this. You won't have libmy_function.a in the directory. However for the exercise, copy the static library file from static. |
Run this command line bash instruction (use tab to fill in filenames as you construct the command).
    my_function-2.0# for file in libmy_function.so.2.0.1 libmy_function.a; do mv $file $file.sav;done
    my_function-2.0# dir
    -rw-r--r-- 1 root root 1094 Oct 25 15:52 libmy_function.a.sav
    -rwxr-xr-x 1 root root 6063 Oct 25 23:30 libmy_function.so.2.0.1.sav*
The command you just ran is a one line version of this
    for file in libmy_function.so.2.0.1 libmy_function.a
    do
        mv $file $file.sav
    done
Note | |
---|---|
You should learn to use bash at the command line, or you will be stuck with the same caveman interface that most everyone else is forced to use, where you point at a picture (icon) and grunt (click your mouse). (Thanks to Eben Moglen for the caveman metaphor.) |
Here's one way to link the executable
    my_function-2.0# gcc -o my_function.dynamic main.o /usr/lib/libmy_function.so.2.0
    my_function-2.0# ldd ./my_function.dynamic
            libmy_function.so.2.0 => /usr/lib/libmy_function.so.2.0 (0x40017000)
            libc.so.6 => /lib/libc.so.6 (0x40019000)
            /lib/ld-linux.so.2 (0x40000000)
    my_function-2.0# ls -alFrt
    -rw-r--r-- 1 root root 1094 Oct 25 15:52 libmy_function.a.sav
    -rwxr-xr-x 1 root root 8572 Oct 25 15:59 my_function*
    -rw-r--r-- 1 root root  932 Oct 25 23:24 print_int_static.o
    -rw-r--r-- 1 root root 1236 Oct 25 23:24 print_int.o
    -rwxr-xr-x 1 root root 6063 Oct 25 23:30 libmy_function.so.2.0.1.sav*
    -rwxr-xr-x 1 root root 8569 Oct 25 23:31 my_function.dynamic*
    my_function-2.0# ./my_function.dynamic
    10
Note | |||
---|---|---|---|
I should not have had to explicitly designate the location of the dynamic library
for linking as I did above (I had to link against /usr/lib/libmy_function.so.2.0,
the flag -L/usr/lib didn't work either).
gcc should know to look in /lib and /usr/lib.
This was a 2.4.x kernel machine.
On another 2.6.x machine I added the link from libmy_function.so
|
If you're like me and you upgrade programs when you need to (rather than installing a new machine from scratch every year or two), you'll wind up with many different versions of your dynamic libraries and you will have executables linked against the whole range. Because of incompatibilities between the different versions of your dynamic libraries, you have to keep all your old copies of the dynamic libraries. This can be quite a lot of disk space. I have a dozen different versions of libc, going back to libc.so.4 from 1996. There is no point in trying to recompile my executables against the current version of libc.so; it would take 100's of hours and most old programs won't compile against the newer versions of the libraries - the calls are all different.
The alternative is to reinstall a distro from scratch, but if you're going to have to do that to get a working machine, you may as well run Windows.
Currently you are working in, and your source files are in, my_function-2.0.
Let's construct a Makefile for the dynamic libraries. Go to my_function-2.0 and make a directory there called dynamic, in which to build your dynamic library. Copy my_function-2.0/static/Makefile to my_function-2.0/dynamic. Follow the instructions below to modify it to make the dynamic library. Use your editor to do the following 's/static/dynamic/g', 's/STATIC/DYNAMIC/g'. Then we need the following
To uninstall, your Makefile has to do the following
Warning | |
---|---|
You should not uninstall a dynamic library if any installed executable links to it. The executable won't run any more. This uninstall does not detect this situation. You could use ldd to inspect all executables in $PATH and not proceed (or give a warning) before deleting an affected dynamic library. This would not be difficult to write, but would take a while to run. It might be better for the uninstall not to uninstall the dynamic library. |
Here's my dynamic/Makefile [48] . What's the "&" doing in the command ldconfig & [49] ?
Check with make -d that make isn't looking for implicit rules.
Copy dynamic/Makefile to Makefile.dynamic (do NOT overwrite your top Makefile). Modify the Makefile in the top directory to run the dynamic as well as the static build. Make sure the installed binary is the dynamically linked version (and not the statically linked). How do you do this and how do you check your result [50] ? Here's my top Makefile [51] . We're going to be looking at header files next, and then we'll return to libraries later in the section on storage.
The header file specifies the return and parameter types for each function and the type for each variable. This is part of (or may be all of) the API - the application programmer's interface - what a programmer needs to know to re-implement or modify functions in the code. Rather than leaving the declarations in the body of the code (where you can't easily find them), you expose the API by moving the declarations to header files.
Note | |
---|---|
Any function should be reimplementable from the contents of the header file (which specifies the return and parameter types for each function and the type for each variable) and the documentation that goes at the top of the function (which will appear in the man page). The reimplementer shouldn't need your source files to reimplement the functionality of your code. For maintenance, code has to be modular. In the distant future, it may be decided that a function needs to be recoded - maybe the current version is found to be too slow, or you have new hardware that can take advantage of some new feature. In this case, it should be possible to recode a function, without changing any other code in the package, and without even needing to know what the package does. |
Note | |
---|---|
for a tutorial on the preprocessor see C preprocessor http://www.cprogramming.com/tutorial/cpreprocessor.html. |
Note | |
---|---|
This section is ostensibly about including header files. However most of the section is about maintaining scope (deciding which information should be visible where) for the code that you include. You will see that most code can only be included once. |
Run make distclean in my_function-2.0 and copy the files to my_function.header and continue work there.
Here's our main.c again.
    //main.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    void print_int(int);

    int main() {
        int i = 10;
        print_int(i);
        return (0);
    }
    //-main.c-------------------
The declaration void print_int(int) in main.c doesn't generate any code. It just tells the compiler about print_int() (the type of the return value, and the type and number of parameters), so that the compiler can do something sensible when it finds the call to print_int() inside main().
What if you wind up writing 100 files, all of which call print_int() and then you want to change print_int() to return an int rather than void? You've got a lot of editing to do, with the possibility of making mistakes. The solution is to include a header file. The traditional extension for header files (and the one gcc is expecting) is .h. Here is your first attempt at a header file
    //my_function.h
    //Joseph Mack (C) 2009, released under GPL.v3
    //header file for my_function project

    void print_int(int);
    void hello_world(void);
    //-my_function.h------------------
You want the declaration of print_int() in the header file, visible in both .c files. To do this you use an #include statement in each of main.c and print_int.c, to include this header file. Here's my main.c
    //main.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    #include "my_function.h"

    int main() {
        int i = 10;
        print_int(i);
        return (0);
    }
    //-main.c-------------------
and my print_int.c
    //print_int.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    #include <stdio.h>
    #include "my_function.h"

    void print_int(int x) {
        printf ("%d \n", x);
    }

    void hello_world(){
        printf ("hello world!\n");
    }
    //-print_int.c--------------------
Your source code is logically identical to the previous form, in which the declarations were in the body of the source file(s). To check that the new form is logically identical, you need to compile the new code. The Makefile in the top directory is set up only to call Makefiles in the subdirectories. So instead, how would you compile from the command line, to check that your executable behaved as before [52] ?
The Makefile in the top directory isn't completely useless. Use it (unedited) to compile main.o [53] . Why did this work [54] ?
To see what happened on inclusion of the header file, look at the preprocessor output for the input main.c (run this yourself in case the line numbers in your version of the code are different).
    my_function.header# gcc -E main.c >preprocessor.out
    my_function.header# cat !$
    cat preprocessor.out
    # 1 "main.c"
    # 1 "<built-in>"
    # 1 "<command-line>"
    # 1 "main.c"
    # 1 "my_function.h" 1
    void print_int(int);
    void hello_world(void);
    # 7 "main.c" 2
    int main() {
        int i = 10;
        print_int(i);
        return (0);
    }
The output shows the preprocessor reading main.c and finding the directive telling it to include my_function.h. Then the preprocessor continues reading at line 7 of main.c.
Note | |
---|---|
Some lines of the input file(s) do not produce output (e.g. comments and blank lines). |
Note | ||
---|---|---|
Output lines like # 7 "main.c" 2 are called linemarkers (and being valid C code, they are acceptable in the input C code file, although I don't know why you'd want to do that). (For more info on linemarkers and preprocessor output see Preprocessor Output http://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html). This particular linemarker says that the line of code came from line 7 of main.c and was added to the output after returning from another file (the "2" flag at the end of the linemarker). Linemarkers are used (among other things) to put information in the executable for the debugger. The debugger when stepping through the code can tell you, for any instruction it's executing, the line number in the original source file. Other code in the debugger can display the source code in that line. |
The output of the preprocessor is sent to the compiler to produce the *.o file. You could look at the output of the preprocessor with print_int.c as the input file, except that the output has the several 100 lines of stdio.h in it. First you'll have to wade through all of stdio.h before finding the text from my_function.h. We'll tackle this shortly.
Note | |
---|---|
Generally you should include your project's header file(s) in all your source files (i.e. in this case, in both .c files). (In a later section we'll write header files that are restricted to one source file.) The header file here is only needed in main.c, but it does no harm in print_int.c. You're mainly including your header in all source files for maintenance - having one header file for your project may be simpler. If one big header file isn't helpful, then by all means split it up. As your project gets bigger, you might find that the header file starts to include definitions needed by print_int.c. At least you should make sure that the header file is compatible with all your source files. |
In the section just above, you built the executable in the top directory just to try it out. In the final version, you want to use the build Makefile to compile the code. This requires the Makefile(s) to know about headers. Now copy Makefile.static to static/Makefile and change to the static subdirectory to continue your work. (You are now working in the build directory with the source (*.c, *.h) files in the directory above, the top directory.)
Note | |
---|---|
The source file(s) and the accompanying include file(s) are always together in the same directory. There is no incl or header directory under the top directory for include files, like there is for man pages |
In the build Makefile, we've already handled the location of the *.c file, by changing the pattern rule to
%.o: ../%.c $(MAKEFILE) |
The first change to static/Makefile should be
%.o: ../%.c $(MAKEFILE) $(HEADERS) |
For this additional prerequisite to work, since you're in the build directory, $(HEADERS) must have the form ../file_1.h ../file_2.h. Add this to the variables section of your Makefile.
    #header files
    HEADERS = ../$(EXEC).h
The ../%.c in the pattern rule locates the *.c files, but no such thing exists for the header files. The problem remaining is for the preprocessor to locate the header file(s). When the preprocessor sees "my_function.h", does it look in the build directory or the top directory? To find out, we need the documentation. (I asked the computer first, but the answer wasn't as clear as the one in the documentation.)
C Preprocessor: Header Files; Search Path http://gcc.gnu.org/onlinedocs/gcc-3.0.1/cpp_2.html#SEC8.
GCC looks for headers requested with #include "file" first in the directory containing the current file, then in the same places it would have looked for a header requested with angle brackets. For example, if `/usr/include/sys/stat.h' contains #include "types.h", GCC looks for `types.h' first in `/usr/include/sys', then in its usual search path.
The answer is that the preprocessor looks for the header file in the same directory as the source file containing the #include statement, i.e. in the top directory.
Here's the output from the preprocessor when run from the build directory.
    static# gcc -E ../main.c >preprocessor.out
    static# cat !$
    cat preprocessor.out
    # 1 "../main.c"
    # 1 "<built-in>"
    # 1 "<command-line>"
    # 1 "../main.c"
    # 1 "../my_function.h" 1
    void print_int(int);
    void hello_world(void);
    # 7 "../main.c" 2
    int main() {
        int i = 10;
        print_int(i);
        return (0);
    }
The preprocessor looks for and finds the include file ../my_function.h in the directory above the build directory, the same directory as the source file (and in the directory where everyone traditionally puts it).
So if we hadn't thought too hard and just let the Makefiles run, we would have got a compile. (We did need to make $(HEADERS) a prerequisite for most rules, but even if we hadn't done that, we would still have got a compile.) However now we know that/why it works. Modify your Makefiles so that $(HEADERS) is a prerequisite for relevant rules and check that you can do the normal make operations. Check with make -d that make isn't running unnecessary implicit rules.
Your static/Makefile is OK. Copy it to the top directory as Makefile.static.
Now modify the top Makefile to handle the modified build Makefile. Is the variable $(HEADERS) required in the top Makefile to do the build [55] ?
If someone is writing code that links to your library, they must know the declarations of the variables and functions in the library. If you install a library, then you must also install the header file(s), which have the declarations of the variables and functions in that library.
You don't install the header file for non-library parts of the executable; in our case, this would only be main(). If you're installing only an executable (and no libraries), then you don't install the header file(s), only the man pages for the executable.
The header file is installed in /usr/include. If the project is big enough and there are a sufficient number of header files, they will be installed in /usr/include/package_name. To install the header files, add this line to the variables section of your Makefile(s) and modify the install target.
    #directories
    INSTALL_INCL_DIR = /usr/include
For the target uninstall make sure your command will work if $(HEADERS) contains multiple files.
See include guards (http://en.wikipedia.org/wiki/Include_guard).
Run make distclean in my_function.header and copy your files to a new working directory my_function.include_guards. We're going to be making a new set of files here. Use Makefile.static as your Makefile (and remove the other Makefiles). You are now building in the top (and only) directory. Change the location of files in Makefile to be the current directory. You could just compile a straight executable, without building a static library if you like, but since the Makefile is set up to build a library, it's probably just as easy to leave it this way.
Your current header file works fine for small projects. In a large project, a source file may have many header files, any one of which can include other header files. Many of the include files may be written by other people and you'll have no control over them. Possibly there will be multiple calls to include a particular header file. The compiler has no problem seeing the same declaration many times. Try this modification to your include file and recompile your executable. There will be no errors or warnings.
    //my_function.h
    //Joseph Mack (C) 2009, released under GPL.v3
    //header file for function project

    void print_int(int);
    void print_int(int);
    //-my_function.h------------------
Change one of the declarations of print_int() to return int. You've already seen the consequences of this mismatch. Here's my_function.h [56] and here's my output [57] . You see the same error message for each .c file that includes this header file.
While the compiler can see the same declaration many times without problems, the definition is another matter. The compiler can only see the definition once (without giving an error). With multiple includes, gcc could find the same definition for a variable (or function) several times. The name One Definition Rule (http://en.wikipedia.org/wiki/One_Definition_Rule), (also called ODR), comes from the definition of the C++ language. The C standard doesn't use the name, but the same idea applies in C: a function can only be defined once.
Note | |
---|---|
Why can the compiler handle the same declaration multiple times, but not multiple definitions? I don't know: possibly it's easy to check that the declarations are the same - it's just a list of function name, parameter types and return types. The definition could be long and presumably the compiler writers decided not to bother checking if each definition is the same. gcc could be smart and ignore subsequent identical definitions, but instead it aborts. (Why gcc has to abort in this case, but allow linking of files containing a version of print_int(), which returns int and another which returns void I don't know.) |
Let's see how in a big project we could violate the ODR. We're going to make a chain of files, where each file includes the next. In computer speak, the series of files have the relationship child, father, grandfather... (from which we'll derive their names). The first step will have the child calling the grandfather (the father isn't involved yet).
Here we write the grandfather file my_function_grandfather.h.
Note | |
---|---|
You can define (as well as declare) things in a header file too (remember - a header file is just another piece of C code). This is done for complicated variables and small utility functions. You wouldn't normally define print_int() in a header file (it normally would be in a .c file), but we're going to do it here for didactic purposes. |
So far we've used the header file for declarations only. Nothing is (yet) defined in header files, so shift the definition of print_int() (swipe some code) from print_int.c to my_function_grandfather.h.
    //my_function_grandfather.h
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    #include <stdio.h>

    void print_int(int x) {
        printf ("%d \n", x);
    }
    //-my_function_grandfather.h----------------
my_function_grandfather.h now has the definition of print_int().
Now mv print_int.c to my_function_child.c and modify it like this.
    //my_function_child.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    #include "my_function_grandfather.h"
    //-my_function_child.c--------------------
my_function_child.c takes the role of print_int.c, in that it holds the definition of print_int(), albeit through the inclusion of my_function_grandfather.h. (Change your Makefile to use my_function_child.c as a source file.)
Note | |
---|---|
presumably in a real project, my_function_child.c would still have other functions, but right now, there's not much in there. |
Here's what my_function_child.c looks like to the preprocessor (omitting the 850 or so lines of <stdio.h>).
    # 1 "my_function_child.c"
    # 1 "<built-in>"
    # 1 "<command-line>"
    # 1 "my_function_child.c"
    # 1 "my_function_grandfather.h" 1
    .
    . (850 or so lines omitted)
    .
    # 834 "/usr/include/stdio.h" 3 4
    # 7 "my_function_grandfather.h" 2
    void print_int(int x) {
        printf ("%d \n", x);
    }
    # 8 "my_function_child.c" 2
As far as the compiler is concerned, my_function_child.c is just #include <stdio.h> followed by the three lines of print_int().
We need a modified main.c
    //main.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    void print_int(int);

    int main() {
        int i = 10;
        print_int(i);
        return (0);
    }
    //-main.c-------------------
The new main.c doesn't #include any header files, but instead declares print_int() in the body of the code.
Note | |
---|---|
We'll see the reason why this declaration was chosen for this exercise in the section on storage. There we'll see that extern print_int() would be a better instruction. |
We've moved the definition of print_int() from the .c file to the .h file. Here are the logical connections amongst the source files.
    my_function_grandfather.h
    void print_int(int) {
        //code
    }
        |
        |
        |
        |
    my_function_child.c
    #include "my_function_grandfather.h"
                              /
                             /
    main.c                  /
                           /
    void print_int(int)   /
                         /
    int main(){
        int i = 10;
        print_int(i);
    }
On seeing the call in main() to print_int(), the linker looks for the definition of print_int() amongst the various *.o files, and finds that my_function_child.o has print_int() (which was included from my_function_grandfather.h).
Logically nothing has changed. Show that your new code compiles and runs just as before.
    my_function# gcc -Wall -o my_function main.c my_function_child.c
    my_function# dir
    -rw-r--r-- 1 root root  332 Oct 27 19:52 my_function_grandfather.h
    -rw-r--r-- 1 root root  256 Oct 28 15:16 main.c
    -rw-r--r-- 1 root root  354 Oct 28 15:17 my_function_child.c
    -rwxr-xr-x 1 root root 6384 Oct 28 15:17 my_function*
    my_function# ./my_function
    10
Now let's include a header file twice. Here's my_function_father.h, which is going to be the project's main header file. It includes my_function_grandfather.h, a special header file whose only purpose is to hold the definition of print_int().
    //my_function_father.h
    //Joseph Mack (C) 2009, released under GPL.v3
    //header file for function project

    #include "my_function_grandfather.h"

    void print_int(int);
    //-my_function_father.h------------------
Note | |
---|---|
Notice the declaration of print_int() in my_function_father.h. You don't need it, since my_function_father.h already gets the same declaration from my_function_grandfather.h, but I'm putting it in, since we're following the principle that all variables/functions should be declared in the main header file. |
Here's the new include tree.
    my_function_grandfather.h
    void print_int(int) {
        //code
    }
    |    \
    |     \
    |      my_function_father.h
    |
    |      #include "my_function_grandfather.h"
    |     /
    |    /
    my_function_child.c
    #include "my_function_grandfather.h"
    #include "my_function_father.h"
                              /
                             /
    main.c                  /
                           /
    void print_int(int)   /
                         /
    int main(){
        int i = 10;
        print_int(i);
    }
How many times is the definition of print_int() included in my_function_child.c [58] ?
Now do your compile
    my_function# gcc -Wall -o my_function main.c my_function_child.c
    In file included from my_function_father.h:10,
                     from my_function_child.c:8:
    my_function_grandfather.h:11: error: redefinition of 'print_int'
    my_function_grandfather.h:11: note: previous definition of 'print_int' was here
Note: the error message refers to a previous definition at the same line number. This is because the compiler read the same file (my_function_grandfather.h) twice.
Here's the output of the preprocessor (with the includes from stdio.h edited out).
    my_function.include_guards# gcc -E my_function_child.c >preprocessor.out
    (edit preprocessor.out to remove includes from stdio.h)
    my_function.include_guards# cat preprocessor.out
    # 1 "my_function_child.c"
    # 1 "<built-in>"
    # 1 "<command-line>"
    # 1 "my_function_child.c"
    # 1 "my_function_grandfather.h" 1
    .
    . 740 lines of stdio.h
    .
    # 912 "/usr/include/stdio.h" 3 4
    # 7 "my_function_grandfather.h" 2
    void print_int(int x) {
        printf ("%d \n", x);
    }
    # 7 "my_function_child.c" 2
    # 1 "my_function_father.h" 1
    # 1 "my_function_grandfather.h" 1
    void print_int(int x) {
        printf ("%d \n", x);
    }
    # 7 "my_function_father.h" 2
    void print_int(int);
    # 8 "my_function_child.c" 2
You can see the definition of print_int() is included twice.
The cure is to stop the definition being included more than once, even if you tell the code to include the definition many times. (You remember the name of the principle of only doing something once...) The construct used is called an include guard and is implemented with preprocessor conditional operators.
Here is my_function_grandfather.h with include guards.
    //my_function_grandfather.h
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    #include <stdio.h>

    #ifndef MY_FUNCTION_GRANDFATHER_H
    #define MY_FUNCTION_GRANDFATHER_H

    void print_int(int x) {
        printf ("%d \n", x);
    }

    #endif /*MY_FUNCTION_GRANDFATHER_H*/
    //-my_function_grandfather.h----------------
What are the lines
    #ifndef MY_FUNCTION_GRANDFATHER_H
    #define MY_FUNCTION_GRANDFATHER_H
    .
    .
    #endif /*MY_FUNCTION_GRANDFATHER_H*/
doing in the header file? First we have to understand the preprocessor conditional directives.
What does defined mean?
The compiler uses information from outside the source code to direct its actions. A single set of source code files is expected to be able to run on different machine architectures, possibly using different languages for I/O, possibly using a range of graphics packages to display on the screen, possibly having a sound card (or not having one), and this information on the hardware in the machine is passed at compile time to the compiler by compile line flags and defines.
Note Why don't we have separate source code for each target machine? It's a maintenance nightmare. If you have different source files for each target machine, and you need to change something in one implementation of a particular file, then likely you need to do it in all implementations of it. No-one can do this without mistakes and while maintaining their sanity. Writing a single piece of code that runs on multiple platforms isn't easy either, but hardware changes much faster than software and you must write platform independent code, if you're to be a success as a programmer. Hardware only survives for 3-5 yrs, but you should plan for good software to last 50yrs (long beyond when you'll be around to maintain it).
The hope is that once someone finds out how to compile for some new architecture, the information is put in a header file and hopefully, the inclusion of this header file will allow your code to compile on the new architecture (or at least will make porting to the new architecture a lot easier).
In practice, there is little difference between a flag and a define (apart from how they're written).
-Wall is a command line flag telling the compiler to write all warnings.
(defined variables are UPPERCASE by convention - they're global.) The line
    #define MY_VARIABLE
tells the preprocessor to put MY_VARIABLE in its table of defined names. (Written like this, with nothing after the name, the replacement text is empty; the compile line form -DMY_VARIABLE gives it the value 1.)
Note The convention in C is that 1 is true (actually the convention is that anything but 0 is true) and 0 is false. Unlike many other languages, in C there is no TRUE and no FALSE. In application languages, the thinking is that the language's representation of TRUE and of FALSE is no business of the programmer. However C programmers want to know what their machine is doing and they want 1 and 0 and not true and false.
    #undef MY_VARIABLE
tells the preprocessor to remove MY_VARIABLE from the table (a name that isn't in the table counts as 0 in an #if test).
You can test the state of a variable with #ifdef and #ifndef.
Let's see how these directives stop the header file from being included more than once. Instructions with "#" in the first column are preprocessor directives. The preprocessor first sees this pair of directives: #ifxxx and the matching #endif.
    #ifndef MY_FUNCTION_GRANDFATHER_H
    .
    .
    #endif /* MY_FUNCTION_GRANDFATHER_H */
The label MY_FUNCTION_GRANDFATHER_H is called a preprocessor macro and could be any string, but by tradition, it is the capitalised version of the header filename, with . replaced by _. If MY_FUNCTION_GRANDFATHER_H is not defined, then the preprocessor keeps reading the file. If MY_FUNCTION_GRANDFATHER_H is defined, then the preprocessor skips to the #endif directive, ignoring the content of the file. The include guards only allow the preprocessor to include a header file once when preprocessing an input file.
If the macro MY_FUNCTION_GRANDFATHER_H is not defined, the preprocessor reads into the file, and what does it see?
#define MY_FUNCTION_GRANDFATHER_H |
Now MY_FUNCTION_GRANDFATHER_H is defined, and the preprocessor keeps reading the header file. However, next time the preprocessor sees this header file, MY_FUNCTION_GRANDFATHER_H will be defined and the preprocessor will skip to the #endif ignoring the contents of the file.
Note | |
---|---|
The comment in the line #endif /* MY_FUNCTION_GRANDFATHER_H */ is so that you can associate an #ifdef with its ending #endif. There can be multiple #ifdef/#endif pairs and they could be at different ends of long files. |
You use include guards on all header files, whether or not you think they're needed. You never know, in the future someone will be using your code and will include a header file twice. Possibly only declarations will be harmlessly duplicated, but if someone adds a definition, then the compile will fail.
You didn't use my_function.h for the include guard demonstration, but you will be using it in the next section. Add include guards to the three header files my_function.h, my_function_father.h, my_function_grandfather.h. Here are my header files [59] .
Show that you can make your executable without error. You don't really need a Makefile for this (compile from the command line). If you want a Makefile, you have to handle the following
Here's my Makefile [60] .
Here's the output of the preprocessor running on my_function_child.c (stdio.h output has been removed).
    # 1 "my_function_child.c"
    # 1 "<built-in>"
    # 1 "<command-line>"
    # 1 "my_function_child.c"
    # 1 "my_function_grandfather.h" 1
    # 912 "/usr/include/stdio.h" 3 4
    # 7 "my_function_grandfather.h" 2
    void print_int(int x) {
            printf ("%d \n", x);
    }
    # 7 "my_function_child.c" 2
    # 1 "my_function_father.h" 1
    # 10 "my_function_father.h"
    # 1 "my_function_grandfather.h" 1
    # 11 "my_function_father.h" 2
    void print_int(int);
    # 8 "my_function_child.c" 2
print_int() is included when my_function_child.c includes my_function_grandfather.h. However when my_function_child.c includes my_function_father.h, which includes my_function_grandfather.h, nothing from my_function_grandfather.h is included (neither stdio.h nor print_int()).
Note | |
---|---|
Building an executable/library requires the preprocessor to read many files, the output of which becomes a *.o file. By using include guards, each *.o file will include definitions only once. What happens if two or more *.o files, each compiled from their source files, with headers that have include guards, include the same definition? This can happen (the two compiles are independent), but the linker will give an error on seeing the same definitions in separate *.o files. We'll address this in a later section. |
Remember I said that it was hard to comment out blocks of code in C and to nest comments? The preprocessor can remove blocks of code, so the compiler won't see them. Here I've put a new line into a piece of our code.
    //print_int.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    #include <stdio.h>
    #include "my_function.h"

    extern void print_int(int x) {
            //extra line added here
            printf ("hello from print_int \n");
            printf ("%d \n", x);
    }
    //-print_int.c--------------------
Recompile your code to show that the extra print statement is working. Let's say we want to block out that code so it doesn't get compiled in.
    //print_int.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    #include <stdio.h>
    #include "my_function.h"

    extern void print_int(int x) {
    #if 0
            //extra line added here
            printf ("hello from print_int \n");
    #endif
            printf ("%d \n", x);
    }
    //-print_int.c--------------------
The argument to the preprocessor directive #if is false, so code up to the #endif statement is not included. Run the modified code to show that the extra printf() statement is not run.
Note | |
---|---|
It used to be that the "#" had to be in the first column. This made nested preprocessor directives (all starting in the first column, rather than progressively indenting them), impossible to read (and mistake prone). It seems sometime in the last 20yrs or so, that the "#" just has to be the first non-white space character in the line. Now you can nest preprocessor directives and be able to read them too. |
What simple change would include the printf() statement back in the code [61] ? When you're done messing with the code, you remove the pair of preprocessor directives.
This is the preferred method for temporarily removing instructions from your code.
End Lesson 3
In application languages like Python, you have little or no control over how the data and functions are stored (it isn't your business, the compiler and OS will handle that). Since you've chosen to write in C, you, the programmer, determine where and how everything is stored in memory at run time, and you need to understand how memory is used. You can always accept the compiler's defaults, but sometimes the code won't do what you want.
There are two components of storage
There is a 1-to-1 mapping of storage types and hardware.
To program in C, you need to know how the hardware works.
Modern OSs are all virtual memory machines. This means that the slow harddisk is combined with the fast RAM, to produce a large amount of memory. Active processes are kept in RAM as much as possible, while processes that aren't running (are waiting on resources, or waiting on some event like a key stroke or packet from the network) are swapped to disk.
Swapping a job out frees up memory; if you have a Firefox window open, but haven't looked at it for 5mins, then the OS will swap it out, allowing other processes to use the memory. Access time to RAM is about 1nsec, while access time to a harddisk is about 10msec. Thus any process using disk, rather than RAM, as storage, is running about 10^7 times slower (10msec/1nsec); your 1GHz clock CPU computer is running like a 100Hz computer. If a running job runs out of memory (e.g. compiling gcc, for which the link steps require about 500M of memory) and has to swap, then the process now using the disk as memory, is effectively dead in the water.
It gets even worse than this; there's thrashing. In multitasking OSs, each process gets a time slice (usually about 1/50sec). If there's not enough memory to run the running processes, then the OS will just be bringing back swapped out processes, just to swap them out again at the end of the time slice. In this case the machine is getting nowhere and is spending all its time reading and writing to disk. The only cure is to kill enough processes that there is enough memory for the running process(es).
You need enough RAM to fit the expected number of running processes. Swap is only for idle processes.
The technology used for RAM has changed many times over the last 70yrs. Currently capacitors store bits. These capacitors are leaky (they lose their charge within milliseconds) and their contents have to be refreshed every few tens of milliseconds.
In early machines, RAM was divided into 4 physically separate segments named
In unix (and probably any virtual memory machine), there is only a single linear piece of memory (the 32 bit, or 64 bit memory length address space, for which the RAM and disk serve as physical storage). When a process is launched, the OS will assign addresses for its storage areas. These storage areas aren't necessarily contiguous; they could be anywhere in the address space. However each storage area is linear and contiguous. In a virtual memory machine, the process doesn't know the actual physical addresses involved; it only gets offsets from addresses that the OS knows about. If a process is swapped out and brought back again, it will almost certainly be to a different physical address, but the process won't know that; the only addresses the process knows about are offsets and they haven't changed.
see stack.
Back when I grew up, in cafeterias people ate from porcelain plates dispensed from a stack. The stack was a vertical cylinder wide enough to fit a plate, with a spring (and piston-like top plate) inside to hold a tower of plates. (this is the only image I could find plate cart http://t2.gstatic.com/images?q=tbn:wEvK_k1aP7d5WM:http://huffcomm.com/huffebn/art_8.12_platecart.gif). Three or four plates would show above counter height, so you could see the plates. The plates being curved usually nested just fine and the three or four plates poking out didn't fall over. Sometimes above-counter guides would hold the plates from moving sideways. If the spring constant (force needed to contract the spring a certain distance) was the same as the weight of the plate for the plate thickness, then adding or removing a plate from the top of the stack, left the new top plate at the same place. Thus people in a cafeteria line would always be picking a plate from the same height in the stack. (I notice in cafeterias that still have porcelain plates, that stacks are still being used.)
A stack is a LIFO (last in, first out) data structure (see LIFO http://en.wikipedia.org/wiki/LIFO_(computing)). By comparison, a queue (e.g. for buying tickets) is a FIFO (first in, first out) data structure. Other examples of a LIFO mentioned in the wiki pages are a pile of papers on an in-tray and a tightly packed elevator (lift), where the last people to get in have to get out to let others out.
Computer processes have a memory segment called the stack. All other memory in a computer is accessed randomly. In contrast, the (computer) stack, like the cafeteria stack, is only accessed from the end (called the top, as in "what's on the top of the stack?"). Unlike the cafeteria stack, where the top is always at the same place, the start of the computer stack is fixed (it's the start of the stack segment). The top of the stack (on Intel 80x86 CPUs) moves and is tracked by the SP (stack pointer register). If you push a number onto the stack, the SP moves one place further from the start of the stack; if you pop the stack, the SP moves one place back. (In the examples here, the SP is shown as a count that increments on a push and decrements on a pop. On real 80x86 hardware the stack grows towards lower addresses, so a push actually decreases the address held in the SP.)
Stacks are designed to handle recursive operations and RPN type operations. As it turns out, most of what we think of as the computer's work (the calculations) is done on the stack. In fact, much of what the computer does is move data from one place to another; display characters, receive keystrokes, move data to and from the disk or over the network, move data between memory and the registers, keep track of the processes and whether they need resources or to be swapped in and out. This data is stored in regular (non-stack) memory.
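There are two stack operations: push, which puts a value on the top of the stack, and pop, which removes the value from the top of the stack.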
There are variations on the two commands, depending how many words you want to push or pop (a long int - 64 bits - will require two push/pops on a 32 bit machine; the instructions might be called pushl/popl).
Note | |
---|---|
There are other stack operations, but they appear (to me) to be the result of creeping featuritis, and we'll ignore them. |
At compile time, what the stack will do (be used for) is unknown, so the size of the stack cannot be specified in the code (the OS specifies the stack size; the actual value can be seen with ulimit -a).
If at runtime, you pop an empty stack, you'll get a stack underflow error; if you push to a full stack, you get a stack overflow error.
We haven't talked much about errors, but whenever we've done so, it's been in the context of a return code from a just executed instruction. Errors communicated through return codes are the ones that can be handled at run time (i.e. in principle, it's possible to write code into your program, that will handle the situation; maybe not well, but at least the error can be addressed). These errors are declared in /usr/include/asm/errno.h (you'll get to know this file; have a look at it on your machine and introduce yourself).
Example error conditions (error name, error number, and C style comment) are
ENOENT 2 /* No such file or directory */ |
Possible action: prompt the user for another filename
ENOMEM 12 /* Out of memory */ |
If the machine is out of memory, it's possible that in a few minutes the processes using the memory won't be using it any more. Possible action: wait and try again; after several tries, give up and exit. This error is usually not handled, since if the machine is out of memory, then not much can be done. In this case, the process is killed by the OS.
EACCES 13 /* Permission denied */ |
This happens when accessing directories (reading, writing, executing). Possible action: send error message to screen and let user handle it.
ETXTBSY 26 /* Text file busy */ |
The error occurs when a file is locked. If a file happens to be an executing program, it's locked and can't be deleted. Possible action: inform the user that they can't do the operation right now.
As you'll see shortly, when a compiler constructs a stack, it is impossible to get a stack underflow. (You can get a stack underflow from your own assembly code.) Such an error would be a compiler bug and can't be handled at run time. Stack overflows are possible with deeply recursive algorithms (later we'll get the recursive calculation of the Fibonacci series to overflow the stack). When the stack overflows, it runs into adjacent memory areas (where it will crash some other process) and the error can't be handled at run time either. In both cases the stack error (declared in /usr/include/ps/ierrors.h)
#define e_stackoverflow (-16) #define e_stackunderflow (-17) |
is passed to the OS and the process is killed.
The operations used on a stack are described by postfix notation. There are 3 types of notation for binary operation (in arithmetic and logic) and the names come from the position of the operator. In the following examples, the result is 7.
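infix (operator between the operands): 3 + 4
prefix (operator before the operands): + 3 4
postfix (operator after the operands): 3 4 +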
Note | |
---|---|
I went to Sydney University, just a few miles down the road from the University of New South Wales (UNSW), where Hamblin worked. I was even on the staff of UNSW for a year. I never once heard Hamblin's name mentioned and only found out about him from the wiki pages about stacks when I was preparing this class. Similarly, it was 3 more decades before I found out about John Lions (http://en.wikipedia.org/wiki/John_Lions) (also at UNSW), who first made the documentation for Unix available and was the first to teach Unix. Australia does not celebrate academic excellence, only sporting excellence. |
Postfix notation is more commonly known as RPN - Reverse Polish Notation http://en.wikipedia.org/wiki/Reverse_Polish_notation. In postfix notation, mathematical operations are described this way
3 4 + |
The machine doing the calculation moves from left to right. Here's what it will do (SP is the stack pointer register; AX and BX are registers used for mathematical operations; op AX,BX has AX as the destination of op with BX as the source)
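data | op | SP | stack | AX | BX
---|---|---|---|---|---
- | - | 0 | - | - | -
3 | push | 1 | 3 | - | -
4 | push | 2 | 4 3 | - | -
+ | (pop AX | 1 | 3 | 4 | -
  | pop BX | 0 | - | 4 | 3
  | add AX,BX | 0 | - | 7 | 3
  | push AX) | 1 | 7 | 7 | 3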
The four lines in "()" would be written as add AX,BX in the assembly code. I've just written the expansion of the add command as it would be implemented.
Early programmable calculators gave the users direct access to the stack, and required the user to enter the data/operations in RPN. Pushing the responsibility for recoding the problem into RPN onto the user, greatly simplified the design of these calculators (making them cheap to produce).
Note | |
---|---|
The manufacturers of these calculators were externalising costs, i.e. shifting the cost of programming the calculator to the user. The users knew what they were up for when buying these calculators (hmm, I can buy an RPN calculator for $10, or a fully programmable one for $800; mostly I'm balancing my checkbook and I only need a 4 function calculator - I'll buy the $10 calculator) so there was nothing underhand about externalising the costs in this case. It allowed the manufacturers to accumulate money and experience so that 20yrs later, they'd be producing fully programmable personal computers. Usually externalising costs is associated with more venal actions, such as polluting or making risky bets on the Stock Exchange (knowing that the Govt will bail you out if you get into trouble), in which the people who incur the costs (the public) have not agreed to do so and often are not aware that they are bearing the costs. |
Change to another work directory - storage/auto.
Variables of the type auto (automatic) are put on the stack. A variable declared inside a pair of squigglies (i.e. '{}' ) will be automatically made auto by the compiler. Although you can use the storage type keyword auto in your code, you can only use it where the compiler will put it anyhow (i.e. without your help), so you don't ever need to use the storage type keyword auto (i.e. you don't have to do anything to make a variable auto or to put a variable on the stack).
Here is some code with an auto variable int i.
    //main.c
    //Joseph Mack (C) 2009, released under GPL.v3

    int j = 0;

    int main() {
            auto int i = 10;
            return (0);
    }
    //-main.c-------------------
This code would normally be written without the auto storage specifier. The following code is identical in all respects to the code above.
    //main.c
    //Joseph Mack (C) 2009, released under GPL.v3

    int j = 0;

    int main() {
            int i = 10;
            return (0);
    }
    //-main.c-------------------
Compile up either of these pieces of code (gcc -o main main.c will do) and look at the executable with nm -on (the "n" sorts by addresses). Here's the relevant part of the output.
    auto# nm -on main
    .
    .
    main:08048374 T main
    .
    .
    main:080495c0 A __bss_start
    .
    main:080495c8 B j
    main:080495cc A _end
For the moment, ignore the storage type for j and let the compiler handle it. Notice that there is storage for j but not for i.
Let's look at the assembler version of this code (thanks to Bjorn Chambless' Linux Assembly Hello World Tutorial http://web.cecs.pdx.edu/~bjorn/CS200/linux_tutorial/ and to Running nasm and gdb http://www.csee.umbc.edu/help/nasm/nasm.shtml - note you can't get debugging info from nasm). Run these commands to see that you get a program the same as the one produced directly from the C file (at least to nm -on)
    gcc -S main.c                       #produces the assembler version of the source code, main.s
    as -a --gstabs -o main.o main.s     #-a output to screen. as is the assembler
                                        #--gstabs include debugging info
    ld -m elf_i386 /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtn.o main.o -lc -o main.from_assembler
                                        #add crt files (common run time files, to setup and break down program)
Note | |
---|---|
You can find out about crt[1in].o here (http://gcc.gnu.org/ml/gcc/2000-10/msg00622.html). They're files so gcc can be semi-independent of glibc. |
Write a Makefile that outputs a file main.from_C when produced directly by gcc from main.c, and main.from_assembler for the file produced by the assembler via main.s
FIXME - this needs to be rewritten and moved to the storage section
Here's an example of how/when to use a static function and what you'll see if you get it wrong.
Note | |
---|---|
The error you're about to see can happen when you're compiling large packages, which use code already on your machine. Two packages are included, which define the same variable/function. As can happen, these will be the same or different versions of the same code, (possibly with different definitions) and the compiler will abort. Figuring out what to do here is a real pain. You have to comment out the function in one package and then restore it later (in case it's needed again). With obfuscated code, often the compiler will die on a line that doesn't have the variable in it at all - it was substituted with something else by the preprocessor. |
Fix print_int.c so that it calls the father, grandfather the same way my_function_child.c did (you could mv print_int.c to print_int.c.orig and then copy my_function_child.c to print_int.c - fix the internal names of the file if you do this).
Our main.c doesn't use an include file to declare print_int(), instead declaring it in the code. Change this to use the declaration that comes in with my_function_grandfather.h. Since the include guards are working, you can include the bottom header file my_function.h: comment out the declaration in main.c and include my_function.h instead. Here's my main.c [62] .
Compile
    my_function# gcc -Wall -o my_function main.c my_function_child.c
    /tmp/cc2gslzg.o(.text+0x0): In function `print_int':
    : multiple definition of `print_int'
    /tmp/ccobq838.o(.text+0x0): first defined here
    collect2: ld returned 1 exit status
This wasn't terribly helpful. Turn on the -g (debugging) flag, which leaves more symbol information in the .o file. (I've edited the output a bit for clarity.)
    pip:/src/www.austintek.com/c_class/class_code/my_function# gcc -Wall -g -c my_function_child.c
    pip:/src/www.austintek.com/c_class/class_code/my_function# gcc -Wall -g -c main.c
    pip:/src/www.austintek.com/c_class/class_code/my_function# gcc -Wall -o my_function main.o my_function_child.o
    my_function_child.o(.text+0x0): In function `print_int':
    my_function/my_function_grandfather.h:8: multiple definition of `print_int'
    main.o(.text+0x0):my_function/my_function_grandfather.h:8: first defined here
    collect2: ld returned 1 exit status
or
    my_function# gcc -Wall -g -o my_function main.c my_function_child.c
    /tmp/cczGn0tD.o(.text+0x0): In function `print_int':
    my_function/my_function_grandfather.h:8: multiple definition of `print_int'
    /tmp/ccvpq3p7.o(.text+0x0):my_function/my_function_grandfather.h:8: first defined here
    collect2: ld returned 1 exit status
Note | |
---|---|
gcc does its compiling in /tmp using temporary filenames (here ccvpq3p7.o) to hold intermediate results. These temporary filenames are guaranteed to be unique, allowing multiple people to run multiple compiles at the same time (all using /tmp for intermediate results). |
The compiler is twice seeing the definition of print_int() at my_function_grandfather.h:8. It doesn't tell you how the definition got to be included twice; you have to figure that out. Now we turn on the include guards; they're only needed in my_function_grandfather.h.
about linkage class:
.o files are collections of functions. On linking (into a library file), the functions are grouped according to their original .o file. Although we haven't talked about this yet, as part of the linking, you tell the linker whether functions compiled into one object file are allowed to call functions that have been compiled into another object file. The default (which we've been using), is to allow all functions to access all other functions. As a result, with all functions being visible to all other functions, the functions appear to have no memory of the object file from which they originally came. In computer speak you are setting the scope of a function; i.e. you are setting the scope of the function to be global or setting the function to be only visible within the object file. In our case all functions have been visible globally.
To set the scope of a function, you set the linkage class by declaring, in the original source file, the function (or variable) to be either
- extern: in which case it can be seen (e.g., read, written, called) by code in other .o files (these functions/variables are global or public)
- static: in which case it can't be seen by functions in other .o files (these functions are private to code in that .o file).
Note The terms public, global or private denote the scope and can be applied to code in any language. The directives extern and static are the C syntax (i.e. the implementation) used to declare a variable or function to have public (or global, or private) scope. Since there is a one-to-one mapping in C from extern to public, people use either term when describing a function.
If you don't declare one or the other, the default is extern. In the header file above, if the declaration of print_int() had been
    void print_int(int);
then the linker would have made print_int() an extern function. In the past, programmers always left off the extern and rarely got into trouble (C code always runs and compiles, remember). Now people are cleaning up their act and it's regarded good practice to explicitly declare your functions and variables to be one or the other i.e. static or extern.
Note Most code you'll download doesn't use extern and lets the compiler handle it. We're only in the early stages of this changeover.
The extern behaviour is the one you've come to expect from python: if you write a function, then it's accessible from everywhere in your program. If I'd never told you about linkage classes and you didn't know about the word extern, then the current my_function program (which doesn't have the string extern anywhere), would do what you expect: i.e. main() is allowed to use print_int().
If a function is declared extern, then the name of the function is visible to all other code. If the function is declared static, then the name of that function is only visible to the functions in that .o file.
Why would we want a function to be static? It's so as not to pollute the global name space. Let's say two of you were writing a library for a camera and each of you were going to produce a single .o file (a total of two .o files), to be combined into a single library file. One of you is writing the camera controls and one of you is writing the image processing. By declaring a function to be static, you would be sure that the other person's code wouldn't know about that function. Say the camera control person wanted a function rotate() to rotate the image on the viewer by 90° (landscape to portrait say). There's no reason the image processing code will ever want to rotate the image on the viewing screen, so the camera control person would declare their function static rotate(). That way they don't have to ask the image processing person if they happen to have a function called rotate(). The image processing person will want a function to export an image rotated from portrait to landscape, and they'll probably call their function rotate(). Since the code that does the camera controls won't be messing with already stored images, the image processing person knows that only their code will call this rotate() and so they declare the function static rotate(). Now there are two rotate() functions in the library and there is no namespace collision. The code in each object file only knows about its own rotate().
A bit later, we'll see examples of extern and static, which I hope will differentiate the two classes. In the meantime, just add extern to declarations.
In a previous section (granularity of object files), we saw how a library was seen by the linker primarily as a bunch of object files, and secondarily as a bunch of functions.
Here is the modified main.c, which has the preprocessor directive to include the header file.
    //main.c
    //Joseph Mack (C) 2009, released under GPL.v3
    //illustrates function calls

    //void print_int(int);
    #include "my_function.h"

    int main() {
            int i = 10;
            print_int(i);
            return (0);
    }
    //-main.c-------------------
Note | |
---|---|
my_function.h is included with " " marks, to show that it's in the current directory, rather than with < > marks, which would indicate that it's in /usr/include. |
Show that main.c file compiles without warnings (what's the instruction [63] ?)
Include the header file in print_int.c as well (although it's not needed). Put the #include directive just below the line which includes stdio.h. Here's my code and my compile [64] .
Note | |
---|---|
I'd like to thank members of the Triangle Linux User's Group (TriLUG) for help and advice here. |
Copy all your files to a new directory my_function.shared.global_variables and continue work there.
Another aside about C:
C is a language designed by smart programmers, for smart programmers. It's easy to write. Compare assignment in C with assignment in Pascal and Ada
    C:            foo = 0;
    Pascal, Ada:  foo := 0;
The "are you sure ':' " in Pascal and Ada is to let everyone know that the language was written by people who don't code, and who think that coders can't handle assignment without a committee's help.
Since C is written by programmers, if there's something wrong with C, it's conceivable that it will get fixed (although it may take decades). However if there's something wrong with a language written by a committee, it's unlikely that they'll acknowledge a problem, much less fix it.
C is an assembly language written to look like a higher level language. If the machine can do it, then C is capable of issuing the instruction, even if in the coding sense, it isn't a good idea. So if it can be done, it can be done in C. Most languages have preconceived ideas of what is OK and what is not OK to do. These ideas are necessary if you have other constraints on your code e.g. safety, accountability, reliability. However you can't write a new language in these languages and you can't explore new ideas about computing with them. For this you need C.
Rarely does C step in and save you from shooting yourself in the foot. Here we look at a case when C does save you. (I don't know why the compiler writers decided to intervene here and not in other places.)
Note | |
---|---|
In applications oriented languages (e.g. Pascal, Java, Ada), the programmer is divorced from the hardware, and needn't know there is a computer executing the code: it could be some mathematically perfect black box or virtual machine for all they care; the machine might be under your desk, or in some room on the other side of the world and the terminal in front of you connects transparently to it; you don't know the OS or the hardware on your machine and you shouldn't have to know. You should be able to move your code to another platform and have it run the same as on the old platform. With C, you're programming as close to the hardware as you can get (without going to assembly code). With C you need some good idea of what your code is doing at the hardware level to know what your code will do - otherwise it won't always be doing what you think. You won't have to deal with the problem described here for a while. However it's an important programming principle, so I want you to know about it when it first comes up. |
With static libraries, every executable has its own copy of the library.
If the executable is using a dynamic library, there is only one copy (image) of the library in memory. When the first executable is loaded that needs the dynamic library, then the dynamic library is loaded also. When the last executable using the library exits, then that dynamic library is removed from memory. All executables using the dynamic library are given its location. When the executable needs to run the code in the dynamic library, the program counters and flags are set to the location of the dynamic library, and execution continues into the library. On return from the library, the original counters and flags are restored and execution resumes in the executable's code.
Several people could run a copy of the same executable, i.e. multiple copies of the executable (e.g. vi) would be running. All copies would use the one (in memory) copy of the dynamic library file. Alternately, different executables could have linked to the same dynamic library (e.g. graphics programs linked against the X libraries), and they could be running at the same time, all using the single copy of the dynamic library file.
Note | |
---|---|
The situation I'm about to describe here, isn't a problem for processes using a shared library and we'll see why. It is a problem for threads (which all belong to one process). Here, for reference, we analyse the case where there is no problem. Later we'll look at the situation with threads, where there is a problem. |
Let's see what could happen if you stored information in a library. Here's a problem piece of code; a counter in the library maintains state (the counter j keeps track of the number of times a function has been accessed). Note that print_int() isn't passed a parameter in this example.
//print_int.c
//Joseph Mack (C) 2009, released under GPL.v3

#include <stdio.h>

//problem declaration/definition
int j = 0;

void print_int(void) {
	//new line of code exhibiting the problem
	++j;
	printf ("j=%d \n", j);
}
//-print_int.c--------------------

//main.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

void print_int(void);

int main() {
	print_int();
	print_int();
	return (0);
}
//-main.c-------------------

/my_function# ./my_function.dynamic
j=1
j=2
#Run the code again. There is no persistent storage of j.
/my_function# ./my_function.dynamic
j=1
j=2
Note | |
---|---|
The value of j is reset to 0 for each invocation of my_function. This means that the dynamic library is unloaded when the executable exits and is reloaded on the next invocation of the binary. |
Note | |
---|---|
This next para is what you would expect from the code. However it's not what happens in the case of shared libraries and we'll see why. |
From looking at the code, j is given a piece of storage (memory) in the library code and j is initialised when the dynamic library is first loaded. The library code maintains state; i.e. the value of j is preserved between calls to print_int(). If your process is the only one using print_int() then the counter j will record the number of times you call print_int(). If some other process is calling the function at the same time, then unknown to you, the counter will be incremented by the other user's process as well as by your process.
You can show this isn't true by simultaneously running two copies of my_function. Since the runtime for my_function is short and no human is going to be able to interleave execution of two programs, you slow the program down a bit. Modify main() to force the code to sleep() for 5 seconds (i.e. to be swapped out).
#include <unistd.h>	//declares sleep()

void print_int(void);

int main() {
	print_int();
	sleep (5);
	print_int();
	return (0);
}
The sleep() gives you enough time to run another copy of my_function in another window, thus having two my_function executables running at the same time. If j was in shared storage then this is what you'd expect
my_function 1 starts j=0
my_function 1 ++j j=1
my_function 1 sleep
my_function 2 starts, j is already initialised
my_function 2 ++j j=2
my_function 2 sleep
my_function 1 wakens
my_function 1 ++j j=3
my_function 1 exits
my_function 2 wakens
my_function 2 ++j j=4
my_function 2 exits
If the scenario I've described were true, then the counters displayed by my_function when it runs would be different when other code is using the library. Code up the sleep() modified main() and install your shared library. Run two copies of my_function (in two windows) and show that they run independently.
Although you don't know it yet, the compiler writers have stepped in and saved you. Before I tell you what they did, let's see how you'd handle it if the compiler writers hadn't stepped in.
The cure is to declare/define all storage in your calling code (the code for which there is only one execution thread). Here's how you'd do it - declare the counter (called k here) in main.c. Note that print_int() has a parameter again, but it's a different one than in our earlier examples.
//print_int.c
//Joseph Mack (C) 2009, released under GPL.v3

#include <stdio.h>

void print_int(int j) {
	//new code without the problem
	++j;
	printf ("j=%d \n", j);
}
//-print_int.c--------------------

//main.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

void print_int(int);

int k = 0;

int main() {
	print_int(k);
	print_int(k);
	return (0);
}
//-main.c-------------------

my_function# ./my_function.dynamic
j=1
j=1
Why did the code print j=1 twice, rather than j=1,j=2 [65] ? Write code to increment j with each call to print_int(). Here's my code [66] and here's my output [67] .
The principle demonstrated in this last piece of code is that storage must be local to the process using it. In particular, multiple processes can't be allowed to write to a shared variable unless you understand the possible side effects. The danger is not realising that other processes are writing to the variable.
What if the shared global variable is read only? Is it going to cause any problems [68] ? Let's add a constant dozen. Here's the new print_int().
//print_int.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

#include <stdio.h>

int j = 0;
const int dozen = 12;

void print_int(void) {
	++j;
	printf ("j=%d dozen=%d \n", j, dozen);
}
//-print_int.c--------------------
The qualifier const tells the compiler that you aren't going to change dozen. For the code we've written, the qualifier makes no difference. Just for fun, add this line to print_int() just before the printf() statement and try to compile it.
dozen += 1; |
We'll see how the qualifier const does its job shortly.
How did the compiler writers handle the original problem? They fixed the problem the same way we did. They saw the global variable in the library code and instead of assigning storage in the library's memory space, they put that piece of storage into the calling executable's memory space.
                     before                  after

libmy_function.so    int j;
                     print_int(){            print_int(){
                         ++j;                    ++j;
                     }                       }

main.o                                       int j;
                     main(){                 main(){
                         print_int();            print_int();
                     }                       }
Add the flag -g to $(CFLAGS): append $(DEBUG) to the definition of CFLAGS and add this line to your Makefile.
DEBUG = -g |
This flag adds information in the code for the debugger. A debugger is an essential tool in a programmer's armamentarium. You run it to figure out why code is running the way it does. Usually you use a debugger when the code is not doing what you want. Here it's doing what we want; we want to see how it's doing what it's doing. A debugger is a program that runs your program, instruction by instruction, and which allows you to see registers and variable values at each step.
Note | |
---|---|
Geeze: I don't know if this view comes from the dimness of time, but I liked the debugger that came with DeSmet C, the C compiler I used in my DOS days. It was simple to use and displayed variables and registers in multiple formats (as an integer, pointer, float ...). gdb doesn't seem so easy to use. Admittedly C on a multiprocessing platform is a more complicated setup than C on a 4 segment uniprocessor OS. |
As you do with all attempts to figure out what's going on, let's remove extraneous code. Here's print_int().
//print_int.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

int j = 0;
const int dozen = 12;

void print_int(void) {
	++j;
}
//-print_int.c--------------------
Compile your code and install the shared library.
Let's look at the library with nm using the -n option (which lists variables by address rather than alphabetically). The output has been edited to remove extraneous lines.
my_function.shared.global_variables# nm -n libmy_function.so.2.0.1
0000076c T print_int
000007d8 T _fini
000007f4 R dozen
.
.
00001924 A _GLOBAL_OFFSET_TABLE_
00001948 A __bss_start
00001950 B j
00001954 A _end
The addresses are offsets from the beginning of the library. There are two blocks of code:
The TEXT (T) for print_int() starts at 0x76c, with the read only (R) variable dozen stored after the end of print_int() at 0x7f4. Although the definitions of j and dozen were consecutive statements in the source code, j is stored in a different location, 0x1950 (in BSS).
How big (how many bytes) is j [69] ?
Here's what the code looks like when loaded into memory (and run under gdb). Comments (starting with #) have been added.
my_function.shared.global_variables# gdb my_function.dynamic
GNU gdb 6.4
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) break 6 #stop at line 6 in the source code
Breakpoint 1 at 0x80484c4: file main.c, line 6.
(gdb) info break #list the break points
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x080484c4 in main at main.c:6
(gdb) l
1       //main.c
2       //Joseph Mack (C) 2009, released under GPL.v3
3
4       //illustrates function calls
5
6       void print_int(void);
7
8       int main() {
9               print_int();
10              print_int();
(gdb) r # run (start executing)
Starting program: /src/www.austintek.com/c_class/class_code/my_function.shared.global_variables/my_function.dynamic

Breakpoint 1, main () at main.c:8 # execution stops after the breakpoint
8       int main() {
(gdb) disp dozen # display dozen every time the debugger stops
1: dozen = 12
(gdb) disp j # display j every time the debugger stops
2: j = 0
(gdb) n # execute the next instruction (the call to print_int())
9               print_int();
3: dozen = 12
2: j = 0
1: dozen = 12
(gdb) n # execute the 2nd print_int()
10              print_int();
3: dozen = 12
2: j = 1
1: dozen = 12
(gdb) n # execute the return()
11              return (0);
3: dozen = 12
2: j = 2
1: dozen = 12
(gdb) info address main # show the address of main()
Symbol "main" is a function at address 0x80484c4.
(gdb) info address print_int # show the address of print_int()
Symbol "print_int" is a function at address 0xb808a76c.
(gdb) info address dozen # show the address of dozen
Symbol "dozen" is static storage at address 0xb808a7f4.
(gdb) disp dozen
4: dozen = 12
(gdb) x 0xb808a7f4 # show the value at the address of dozen (ie the value of dozen)
0xb808a7f4 <dozen>: 0x0000000c
(gdb) info address j # do the same for j
Symbol "j" is static storage at address 0xb808b950.
(gdb) disp j
5: j = 2
(gdb) x 0xb808b950
0xb808b950 <j>: 0x00000002
Remember we don't have the problem of j being shared in the case of the static library (why not [70] ?).
Copy files to the directory my_function.static.global_variables and modify your Makefile to make the static version of my_function
/my_function.static.global_variables# nm -n my_function.static
         w _Jv_RegisterClasses
         w __gmon_start__
         U __libc_start_main@@GLIBC_2.0
0804827c T _init
080482e0 T _start
08048310 t __do_global_dtors_aux
08048370 t frame_dummy
08048394 T main
080483b0 T print_int
080483d0 T __libc_csu_fini
080483e0 T __libc_csu_init
0804843a T __i686.get_pc_thunk.bx
08048440 t __do_global_ctors_aux
0804846c T _fini
08048488 R _fp_hw
0804848c R _IO_stdin_used
08048490 R dozen
080494b0 D __data_start
080494b0 A __init_array_end
080494b0 A __init_array_start
080494b0 W data_start
080494b4 D __dso_handle
08049508 r __FRAME_END__
0804950c D _DYNAMIC
080495d4 d __CTOR_LIST__
080495d8 d __CTOR_END__
080495dc d __DTOR_LIST__
080495e0 D __DTOR_END__
080495e4 d __JCR_END__
080495e4 d __JCR_LIST__
080495e8 D _GLOBAL_OFFSET_TABLE_
08049600 A __bss_start
08049600 A _edata
08049600 b completed.5863
08049604 b dtor_idx.5865
08049608 B j
0804960c A _end
The code used here had a global variable j. All code, whether you're aware of it or not, uses global variables. An example global variable is errno (error number). Shared libraries (along with all other code) store error information in errno. Error information about the running of the library code needs to be stored in a global variable, and it will be in a global location like the one used by j.
Since I brought up errno, here's an introduction to it. We'll return to errno later.
About errno
Unix (and C) were produced in an era when error handling wasn't something that people thought about much. It was much too early in computer evolution to handle errors. People were still working on getting computers to run for long enough to do useful work. No-one cared much if a computer crashed - they'd been doing that for decades and one more crash was no big deal. If the file you wanted wasn't there (because you'd forgotten to ask the operator to load the tape spool with that file), or you ran out of memory, then that was just too bad. Since your job couldn't run, the only choice available to the OS, was to kill your job and let the next job run.
The error handling system in Unix/C is based on return codes and errno (see Errors: errno in UNIX programs http://www.ibm.com/developerworks/aix/library/au-errnovariable/). Each system call or library function has a return code; some of these return codes indicate an error. If there's an error, you look in the global variable errno to see the last error issued; then you have to handle it. You don't have to look at the return codes, and even if you do, you don't have to look at errno, and even if you look at errno, you may not be able to do anything about the error anyhow. Looking at return codes requires a lot of work and since every call has a return code, if you look at them all, you'll never get around to writing the code you started out to write. So in general people ignore return codes and don't look at errno.
A newer system, exception handling, is used in the newer languages e.g. Java, C++. There, error handling still requires writing a lot of code, but the basis of the system is built in and you only have to write the code on top. It's still a lot of work, but it's better than errno.
Unix and C are not likely to be rewritten with exception handling, so you'll have to learn the errno system.
This version of the "hello world!" program includes much of the material covered so far. For a project name, use my_message (e.g. the statically linked library will be libmy_message.a). You will be revisiting this exercise after you've learned about storage types, so do this exercise in the directory my_message.header
The required functions are
print_success():
This prints the message "success".
print_error():
This prints the message "error".
parse_message(int)
parse_message() takes an int parameter. If the parameter has the value 0, then the function calls print_success(), else the function calls print_error(). (You will need to look up C code for a conditional.)
Note | |
---|---|
In a normal situation, you would print directly from parse_message(). The extra steps of printing from print_success(), print_error() are for the exercise only. |
main():
main() is in its own source file main.c, which stores a global int with possible values of 0/1. Although you haven't covered this yet, a global variable is declared (and in this case defined) at the top of the source file, i.e. before any functions are defined. main() calls parse_message() passing the int as a parameter.
You need to produce static and dynamic versions of the library file, which contains the functions parse_message(), print_success(), print_error().
Where a part of the problem is not specified, you are free to do what you want. Document the purpose of each source file, including describing the parameters passed to each function and the value returned.
Since you're producing both static and dynamic libraries, build them in their own directories.
Write Makefiles to produce the libraries and the executable, install the appropriate files (you can fake a man page from previous exercises), uninstall, clean and distclean the directories.
Here's my answer [71] .
The code in the previous problem didn't have storage types declared for functions and variables. The defaults were fine as it turned out, but we can do a little better; some of the functions/variables need to be declared extern or static.
Update the code. Here are my changes [72] .
[1]
The 12 characters "hello world!", plus the newline "\n".
[2]
Add the declaration of print_int() at the top of the file
//my_function_reversed.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

#include <stdio.h>

void print_int(int);

int main() {
	int i = 10;
	print_int(i);
	return (0);
}

void print_int(int x){
	printf ("%d \n", x);
}
//-my_function_reversed.c--------------------------
Here's the compile
my_function# gcc -Wall -c my_function_reversed.c
my_function# gcc -Wall -o my_function_reversed my_function_reversed.c
my_function# ./my_function_reversed
10
[3]
idempotent
[4]
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_function project

#main.o: main.c
#	gcc -Wall -c main.c

#print_int.o: print_int.c
#	gcc -Wall -c print_int.c

my_function: print_int.o main.o
	gcc -Wall -o my_function print_int.o main.o
#----------------------
[5]
my_function# touch clean |
[6]
For each of the 3 targets main.o, print_int.o, my_function run these commands
[7]
No warnings are issued, allowing you to remain ignorant of problems in code that should never be let out into the world.
[8]
$(OBS), $(EXEC).
[9]
my_function# make main.o
gcc -Wall -c -o main.o main.c
my_function# make print_int.o
gcc -Wall -c -o print_int.o print_int.c
The implicit rules are executed. They use $(CC) and $(CFLAGS) giving us the command(s) we see here.
[10]
There was no function main(). All C programs have to have a main() so they know where to start executing.
[11]
-Wall. We'll fix that too - with $(LDFLAGS), in the next section.
[12]
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_function project

OBS = print_int.o main.o
EXEC = my_function
MAKEFILE = Makefile
CC = gcc
WARN = -Wall
CFLAGS = $(WARN)
LDFLAGS = -Wall

.PHONY: all clean

all: $(EXEC)

$(EXEC): $(OBS) $(MAKEFILE)
	gcc $(LDFLAGS) -o $@ $(OBS)

clean:
	rm -f $(OBS) $(EXEC)

%.o: %.c $(MAKEFILE)
	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
#-Makefile---------------------
[13]
only root can write in /usr
[14]
//print_int.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

#include <stdio.h>

void print_int(int x) {
	printf ("%d \n", x);
}

void hello_world() {
	printf ("hello world!\n");
}
//-print_int.c--------------------
[15]
I assumed that the rule would be of the form
%.a: %.o
	command
or
lib%.a: %.o
	command
If I guessed wrong, then I wouldn't find the rule at the command line and I'd have to find it in the manual.
my_function# make print_int.a
make: *** No rule to make target `print_int.a'. Stop.
my_function# make libprint_int.a
make: *** No rule to make target `libprint_int.a'. Stop.
[16]
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_function project

#--------------------------
#user defined variables

#project files
OBS = print_int.o main.o
EXEC = my_function
LIB_OBS = print_int.o
STATIC_LIB = lib$(EXEC).a

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/work

#--------------------------
#utilities
CP = cp -pauv
RM = rm -f
PWD = `pwd`

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
CFLAGS = $(WARN)
LDFLAGS = -Wall

#-----------------------------
.PHONY: all clean install_lib

all: $(EXEC) $(STATIC_LIB) $(EXEC_STATIC)

$(EXEC): $(OBS) $(MAKEFILE)
	$(CC) $(LDFLAGS) -o $@ $(OBS)

$(STATIC_LIB): $(LIB_OBS) $(MAKEFILE)
	$(AR) rcs $@ $(LIB_OBS)

%.o: %.c $(MAKEFILE)
	$(CP) $< $<.sav
	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
	$(CP) $< $<.sav2
	$(CP) $< $(BACKUP)

clean:
	$(RM) $(OBS) $(EXEC) $(STATIC_LIB)
#-Makefile---------------------
[17]
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_function project

#--------------------------
#user defined variables

#project files
OBS = print_int.o main.o
EXEC = my_function
EXEC_STATIC = my_function.static
LIB_OBS = print_int.o
STATIC_LIB = lib$(EXEC).a

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/work

#--------------------------
#utilities
CP = cp -pauv
RM = rm -f
PWD = `pwd`

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
CFLAGS = $(WARN)
LDFLAGS = -Wall

#-----------------------------
.PHONY: all clean install_lib

all: $(EXEC) $(STATIC_LIB) $(EXEC_STATIC)

$(EXEC): $(OBS) $(MAKEFILE)
	$(CC) $(LDFLAGS) -o $@ $(OBS)

$(EXEC_STATIC): main.o $(STATIC_LIB) $(MAKEFILE)
	$(CC) $(LDFLAGS) -o $@ $< ./$(STATIC_LIB)

$(STATIC_LIB): $(LIB_OBS) $(MAKEFILE)
	$(AR) rcs $@ $(LIB_OBS)

%.o: %.c $(MAKEFILE)
	$(CP) $< $<.sav
	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
	$(CP) $< $<.sav2
	$(CP) $< $(BACKUP)

install_lib: $(STATIC_LIB)
	$(CP) $< $(INSTALL_LIB_DIR)

clean:
	$(RM) $(OBS) $(EXEC) $(STATIC_LIB) $(EXEC_STATIC)
#-Makefile---------------------
[18]
Ask the computer. Look for a relatively short .a in /usr/lib and inspect it with nm.
[19]
no
[20]
idempotent
[21]
uninstall:
	$(RM) $(INSTALL_BIN_DIR)/$(EXEC)
	$(RM) $(INSTALL_LIB_DIR)/$(STATIC_LIB)
[22]
.PHONY
[23]
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_function project

#--------------------------
#user defined variables

#project files
LIB_OBS = print_int.o
OBS = $(LIB_OBS) main.o
EXEC = my_function
EXEC_STATIC = my_function.static
STATIC_LIB = lib$(EXEC).a

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/work

#--------------------------
#utilities
CP = cp -pauv
RM = rm -f
PWD = `pwd`

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
CFLAGS = $(WARN)
LDFLAGS = -Wall

#directories
INSTALL_LIB_DIR = /usr/lib
INSTALL_BIN_DIR = /usr/bin

#-----------------------------
.PHONY: all clean install_lib install_bin install

all: $(EXEC_STATIC) $(STATIC_LIB)

$(EXEC_STATIC): main.o $(STATIC_LIB) $(MAKEFILE)
	$(CC) $(LDFLAGS) -o $@ $< ./$(STATIC_LIB)

$(STATIC_LIB): $(LIB_OBS) $(MAKEFILE)
	$(AR) rcs $@ $(LIB_OBS)

%.o: %.c $(MAKEFILE)
	$(CP) $< $<.sav
	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
	$(CP) $< $<.sav2
	$(CP) $< $(BACKUP)

install: install_bin install_lib

install_lib: $(STATIC_LIB) $(MAKEFILE)
	$(CP) $< $(INSTALL_LIB_DIR)

install_bin: $(EXEC_STATIC) $(MAKEFILE)
	$(CP) $< $(INSTALL_BIN_DIR)/$(EXEC)

uninstall:
	$(RM) $(INSTALL_BIN_DIR)/$(EXEC)
	$(RM) $(INSTALL_LIB_DIR)/$(STATIC_LIB)

clean:
	$(RM) $(OBS) $(EXEC_STATIC) $(STATIC_LIB)
#-Makefile---------------------
[24]
distclean: clean
	$(RM) *.sav *.sav2
[25]
//main.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

void print_int(int);

int main() {
	int i = 10;
	print_int(i);
	return (0);
}
//-main.c-------------------

//print_int.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

#include <stdio.h>

void print_int(int x) {
	printf ("%d \n", x);
}

void hello_world() {
	printf ("hello world!\n");
}
//-print_int.c--------------------
[26]
Only the source files have moved with respect to the Makefile; as before, the output files will be produced in the same directory as Makefile. Change the pattern rule to look for the source files in their new location.
[28]
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_function project

#--------------------------
#user defined variables

#project files
OBS = print_int.o main.o
EXEC = my_function
EXEC_STATIC = my_function.static
LIB_OBS = print_int.o

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/work

#--------------------------
#utilities
CP = cp -pauv
RM = rm -f
PWD = `pwd`

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
FPIC = -fPIC
CFLAGS = $(WARN) $(FPIC)
LDFLAGS = -Wall

#STATIC LIB info
STATIC_LIB = lib$(EXEC).a

#directories
INSTALL_LIB_DIR = /usr/lib
INSTALL_BIN_DIR = /usr/bin

#-----------------------------
.PHONY: all clean distclean install install_static_lib install_static_bin uninstall

all: $(STATIC_LIB) $(EXEC_STATIC)
#all: $(EXEC)

$(EXEC): $(OBS) $(MAKEFILE)
	$(CC) $(LDFLAGS) -o $@ $(OBS)

$(EXEC_STATIC): main.o $(STATIC_LIB) $(MAKEFILE)
	$(CC) $(LDFLAGS) -o $@ $< ./$(STATIC_LIB)

$(STATIC_LIB): $(LIB_OBS) $(MAKEFILE)
	$(AR) rcs $@ $(LIB_OBS)

%.o: ../%.c $(MAKEFILE)
	$(CP) $< $<.sav
	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
	$(CP) $< $<.sav2
	$(CP) $< $(BACKUP)

install: install_static_lib

install_static_lib: $(STATIC_LIB) $(MAKEFILE)
	$(CP) $< $(INSTALL_LIB_DIR)

#fixed: the prerequisite was misspelled $(MAKFILE)
install_static_bin: $(EXEC_STATIC) $(MAKEFILE)
	$(CP) $< $(INSTALL_BIN_DIR)/$(EXEC)

uninstall:
	$(RM) $(INSTALL_BIN_DIR)/$(EXEC)
	$(RM) $(INSTALL_LIB_DIR)/$(STATIC_LIB)

clean:
	$(RM) $(OBS) $(EXEC) $(EXEC_STATIC) $(STATIC_LIB)

distclean: clean
	$(RM) *.sav *.sav2
#-Makefile---------------------
[29]
Makefile is a prerequisite for some of the targets. When make finds a prerequisite, it next looks for a rule which has that prerequisite as a target. There are no explicit rules to make Makefile, so make next looks at the pattern rules and then at its list of implicit rules. When the list of rules is exhausted, make gives up and declares that it's finished with Makefile. (This is not quite the same as declaring Makefile to be up to date, but it's close enough for our purposes.)
[30]
add a rule for Makefile that does nothing. This works
$(MAKEFILE): |
[31]
You would like a wildcard way of handling this. If it's not obvious how to use a wildcard, you can try this first
.PHONY: ... ../print_int.c |
On finding that the filename works, you can try a wildcard
.PHONY: ... ../*.c |
which also works.
[32]
.PHONY: ... $(MAKEFILE) ../*.c |
[33]
GNU Make 3.81
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

This program built for i686-pc-linux-gnu
Reading makefiles...
Reading makefile `Makefile'...
Updating makefiles....
Considering target file `Makefile'.
File `Makefile' does not exist.
Finished prerequisites of target file `Makefile'.
Must remake target `Makefile'.
Successfully remade target file `Makefile'.
Updating goal targets....
Considering target file `all'.
File `all' does not exist.
Considering target file `libmy_function.a'.
File `libmy_function.a' does not exist.
Considering target file `print_int.o'.
File `print_int.o' does not exist.
Looking for an implicit rule for `print_int.o'.
Trying pattern rule with stem `print_int'.
Trying implicit prerequisite `../print_int.c'.
Trying rule prerequisite `Makefile'.
Found an implicit rule for `print_int.o'.
Considering target file `../print_int.c'.
File `../print_int.c' does not exist.
Finished prerequisites of target file `../print_int.c'.
Must remake target `../print_int.c'.
Successfully remade target file `../print_int.c'.
Pruning file `Makefile'.
Finished prerequisites of target file `print_int.o'.
Must remake target `print_int.o'.
cp -pauv ../print_int.c ../print_int.c.sav
Putting child 0x0807a740 (print_int.o) PID 28427 on the chain.
Live child 0x0807a740 (print_int.o) PID 28427
Reaping winning child 0x0807a740 PID 28427
gcc -c -Wall ../print_int.c -o print_int.o
Live child 0x0807a740 (print_int.o) PID 28428
Reaping winning child 0x0807a740 PID 28428
cp -pauv ../print_int.c ../print_int.c.sav2
Live child 0x0807a740 (print_int.o) PID 28431
Reaping winning child 0x0807a740 PID 28431
cp -pauv ../print_int.c /pny/work/`pwd`
Live child 0x0807a740 (print_int.o) PID 28432
Reaping winning child 0x0807a740 PID 28432
Removing child 0x0807a740 PID 28432 from chain.
Successfully remade target file `print_int.o'.
Pruning file `Makefile'.
Finished prerequisites of target file `libmy_function.a'.
Must remake target `libmy_function.a'.
ar rcs libmy_function.a print_int.o
Putting child 0x0807a768 (libmy_function.a) PID 28434 on the chain.
Live child 0x0807a768 (libmy_function.a) PID 28434
Reaping winning child 0x0807a768 PID 28434
Removing child 0x0807a768 PID 28434 from chain.
Successfully remade target file `libmy_function.a'.
Considering target file `my_function.static'.
File `my_function.static' does not exist.
Considering target file `main.o'.
File `main.o' does not exist.
Looking for an implicit rule for `main.o'.
Trying pattern rule with stem `main'.
Trying implicit prerequisite `../main.c'.
Trying rule prerequisite `Makefile'.
Found an implicit rule for `main.o'.
Considering target file `../main.c'.
File `../main.c' does not exist.
Finished prerequisites of target file `../main.c'.
Must remake target `../main.c'.
Successfully remade target file `../main.c'.
Pruning file `Makefile'.
Finished prerequisites of target file `main.o'.
Must remake target `main.o'.
cp -pauv ../main.c ../main.c.sav
Putting child 0x0807ade0 (main.o) PID 28435 on the chain.
Live child 0x0807ade0 (main.o) PID 28435
Reaping winning child 0x0807ade0 PID 28435
gcc -c -Wall ../main.c -o main.o
Live child 0x0807ade0 (main.o) PID 28436
Reaping winning child 0x0807ade0 PID 28436
cp -pauv ../main.c ../main.c.sav2
Live child 0x0807ade0 (main.o) PID 28439
Reaping winning child 0x0807ade0 PID 28439
cp -pauv ../main.c /pny/work/`pwd`
Live child 0x0807ade0 (main.o) PID 28440
Reaping winning child 0x0807ade0 PID 28440
Removing child 0x0807ade0 PID 28440 from chain.
Successfully remade target file `main.o'.
Pruning file `libmy_function.a'.
Pruning file `Makefile'.
Finished prerequisites of target file `my_function.static'.
Must remake target `my_function.static'.
gcc -Wall -o my_function.static main.o ./libmy_function.a
Putting child 0x0807ad28 (my_function.static) PID 28442 on the chain.
Live child 0x0807ad28 (my_function.static) PID 28442
Reaping winning child 0x0807ad28 PID 28442
Removing child 0x0807ad28 PID 28442 from chain.
Successfully remade target file `my_function.static'.
Finished prerequisites of target file `all'.
Must remake target `all'.
Successfully remade target file `all'.
[34]
There's a file (actually directory) static in the current directory.
[35]
You can use some other name e.g. static_build or you can make static a .PHONY target. I'd say it was safer to use another name (there's no point in inviting trouble by using directory names for targets).
[36]
GNU Make 3.81
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This program built for i686-pc-linux-gnu
Reading makefiles...
Reading makefile `Makefile'...
Updating makefiles....
Considering target file `Makefile'.
File `Makefile' does not exist.
Finished prerequisites of target file `Makefile'.
Must remake target `Makefile'.
Successfully remade target file `Makefile'.
Updating goal targets....
Considering target file `all'.
File `all' does not exist.
Considering target file `static'.
Finished prerequisites of target file `static'.
No need to remake target `static'.
Finished prerequisites of target file `all'.
Must remake target `all'.
Successfully remade target file `all'.
make: Nothing to be done for `all'.
[37]
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_function project

#--------------------------
#user defined variables

#project files
OBS = print_int.o main.o
EXEC = my_function
EXEC_STATIC = my_function.static
LIB_OBS = print_int.o

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/work

#directories
STATIC_BUILD_DIR = static

#--------------------------
#utilities
CP = cp -pauv
RM = rm -f
PWD = `pwd`

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
FPIC = -fPIC
CFLAGS = $(WARN) $(FPIC)
LDFLAGS = -Wall

#STATIC LIB info
STATIC_LIB = lib$(EXEC).a

#directories
INSTALL_LIB_DIR = /usr/lib
INSTALL_BIN_DIR = /usr/bin

#-----------------------------
.PHONY: all clean distclean install static_build

all: static_build

static_build:
	cd $(STATIC_BUILD_DIR); make; cd -

install: static_build
	cd $(STATIC_BUILD_DIR); make install; cd -

uninstall:
	cd $(STATIC_BUILD_DIR); make uninstall; cd -

clean:
	cd $(STATIC_BUILD_DIR); make clean; cd -

distclean: clean
	$(RM) -r $(STATIC_BUILD_DIR)
	$(RM) *.sav *.sav2

#-Makefile---------------------
[38]
static_build:
	$(MKDIR) static
	$(CP) Makefile.static static/Makefile
	cd $(STATIC_BUILD_DIR); make; cd -
[39]
Both targets need a prerequisite, either in the Makefile in the top directory or in the Makefile in the build directory. Where you put it is a practical, rather than theoretical, matter. Clearly, having the prerequisite in the top Makefile would save the trip to the build directory to find out that nothing needs to be done.
[40]
The command in the clean rule fails because there is no build directory to cd into.
[41]
distclean: clean
	$(RM) *.sav *.sav2
	$(RM) -r static
[42]
There were no build directories for the command in the uninstall rule to cd into.
[43]
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_function project

#--------------------------
#user defined variables

#project files
PROJECT = my_function
LIB_OBS = print_int.o
OBS = $(LIB_OBS) main.o
EXEC = $(PROJECT)
EXEC_STATIC = $(EXEC).static

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/work

#directories
STATIC_BUILD_DIR = static

#--------------------------
#utilities
CP = cp -pauv
MKDIR = mkdir -p
RM = rm -f
PWD = `pwd`

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
FPIC = -fPIC
CFLAGS = $(WARN) $(FPIC)
LDFLAGS = -Wall

#STATIC LIB info
STATIC_LIB = lib$(EXEC).a

#directories
INSTALL_LIB_DIR = /usr/lib
INSTALL_BIN_DIR = /usr/bin
INSTALL_MAN_DIR = /usr/man/man1

#-----------------------------
.PHONY: all static_build clean distclean install uninstall

#all: static_build
#
#static_build:
#	cd $(STATIC_BUILD_DIR); make; cd -

all: static_build dynamic_build

static_build:
	$(MKDIR) static
	$(CP) Makefile.static static/Makefile
	cd $(STATIC_BUILD_DIR); make; cd -

install: static_build dynamic_build
	cd $(STATIC_BUILD_DIR); make install; cd -
	cd $(DYNAMIC_BUILD_DIR); make install; cd -
	$(CP) man/$(EXEC).1 $(INSTALL_MAN_DIR)

uninstall:
#cd $(STATIC_BUILD_DIR); make uninstall; cd -
#cd $(DYNAMIC_BUILD_DIR); make uninstall; cd -
	make -f Makefile.static uninstall
	make -f Makefile.dynamic uninstall
	$(RM) $(INSTALL_MAN_DIR)/$(EXEC).1

clean:
	if [ -d "$(STATIC_BUILD_DIR)" ]; then echo "exists"; cd $(STATIC_BUILD_DIR); make clean; cd -; else echo "no static"; fi
	if [ -d "$(DYNAMIC_BUILD_DIR)" ]; then echo "exists"; cd $(DYNAMIC_BUILD_DIR); make clean; cd -; else echo "no dynamic"; fi
#cd $(STATIC_BUILD_DIR); make clean; cd -
#cd $(DYNAMIC_BUILD_DIR); make clean; cd -

distclean: clean
	$(RM) *.sav *.sav2
	$(RM) -r static dynamic

#-Makefile---------------------
[44]
'\" t
.\" Title: my_function
.\" Author: Joseph Mack
.\" Generator:
.\" Date: 11/19/2009
.\" Manual:
.\" Source:
.\" Language: English
.\"
.TH "MY_FUNCTION" "1" "11/19/2009" "[FIXME: source]" "[FIXME: manual]"
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
my_function \- displays "10"
.SH "SYNOPSIS"
.HP \w'\fBmy_function\fR\ 'u
\fBmy_function\fR (has no options)
.SH "DESCRIPTION"
.PP
\fBmy_function\fR - a class exercise\&.
.SS "Options"
.PP
there are no options
.RE
.SH "SEE ALSO"
.PP
no other files
[45]
my_function-2.0# gcc -o my_function.dynamic main.o ./libmy_function.so.2.0.1
my_function-2.0# dir
-rw-r--r-- 1 root root  256 Oct 29 12:18 main.c
-rw-r--r-- 1 root root  402 Nov  1 23:57 print_int.c
-rw-r--r-- 1 root root 1236 Nov  2 12:54 print_int.o
-rw-r--r-- 1 root root 1032 Nov  2 12:55 main.o
-rwxr-xr-x 1 root root 6183 Nov  2 12:55 libmy_function.so.2.0.1*
-rwxr-xr-x 1 root root 6538 Nov  2 12:58 my_function.dynamic*
my_function-2.0# ldd my_function.dynamic
        linux-gate.so.1 => (0xb806b000)
        libmy_function.so.2.0 => not found
        libc.so.6 => /lib/libc.so.6 (0xb7f00000)
        /lib/ld-linux.so.2 (0xb806c000)
[46]
The executable needs the soname, here libmy_function.so.2.0. The soname functionality can be provided by any file libmy_function.so.2.0.x. The executable doesn't need to know the name of the file providing the soname.
[47]
To update the machine's list of dynamic libraries (which file provides each soname), kept in /etc/ld.so.cache.
[48]
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_function project

#--------------------------
#user defined variables

#project files
OBS = print_int.o main.o
EXEC = my_function
EXEC_DYNAMIC = my_function.dynamic
LIB_OBS = print_int.o

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/work

#--------------------------
#utilities
CP = cp -pauv
RM = rm -f
PWD = `pwd`

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
FPIC = -fPIC
CFLAGS = $(WARN) $(FPIC)
LDFLAGS = -Wall

#DYNAMIC LIB info
LIB_MAJOR = 2
LIB_MINOR = 0
LIB_EXTRA = 1
DYNAMIC_LIB = lib$(EXEC).so.$(LIB_MAJOR).$(LIB_MINOR).$(LIB_EXTRA)
SONAME = lib$(EXEC).so.$(LIB_MAJOR).$(LIB_MINOR)

#directories
INSTALL_LIB_DIR = /usr/lib
INSTALL_BIN_DIR = /usr/bin

#-----------------------------
.PHONY: all clean distclean install install_dynamic_lib install_dynamic_bin uninstall $(MAKEFILE) ../*.c

$(EXEC): $(OBS) $(MAKEFILE)
	$(CC) $(LDFLAGS) -o $@ $(OBS)

$(EXEC_DYNAMIC): main.o $(DYNAMIC_LIB) $(MAKEFILE)
	$(CC) $(LDFLAGS) -o $@ $< ./$(DYNAMIC_LIB)

$(DYNAMIC_LIB): $(LIB_OBS) $(MAKEFILE)
	$(CC) -shared -Wl,-soname,$(SONAME) -o $(DYNAMIC_LIB) $(LIB_OBS)

%.o: ../%.c $(MAKEFILE)
	$(CP) $< $<.sav
	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
	$(CP) $< $<.sav2
	$(CP) $< $(BACKUP)

install: install_dynamic_lib install_dynamic_bin

install_dynamic_lib: $(DYNAMIC_LIB) $(MAKEFILE)
	$(CP) $< $(INSTALL_LIB_DIR)
	ldconfig &

install_dynamic_bin: $(EXEC_DYNAMIC) $(MAKEFILE)
	$(CP) $< $(INSTALL_BIN_DIR)/$(EXEC)

uninstall:
	$(RM) $(INSTALL_BIN_DIR)/$(EXEC)
	$(RM) $(INSTALL_LIB_DIR)/lib$(EXEC).so.$(LIB_MAJOR).$(LIB_MINOR).$(LIB_EXTRA)
	$(RM) $(INSTALL_LIB_DIR)/$(SONAME)

clean:
	$(RM) $(OBS) $(EXEC) $(EXEC_DYNAMIC) $(DYNAMIC_LIB)

distclean: clean
	$(RM) *.sav *.sav2

#-Makefile---------------------
[49]
& is shell syntax (it works in any POSIX shell, not only bash). It says to run the command ldconfig in the background. ldconfig takes a while to run (it has to search all the files in the directories listed in /etc/ld.so.conf), and since you don't need the results of this command to finish your script, it's quite OK to let ldconfig go off and do its work by itself. So the & tells the shell to launch ldconfig and, instead of waiting for ldconfig to return, to carry on with the next command. Meanwhile ldconfig continues to run and will return in its own good time. Putting jobs whose results you don't need immediately into the background allows for faster execution of your Makefile. You can only do this if nothing further down in the script depends on ldconfig having finished.
[50]
Comment out the line in Makefile.static that installs the static binary. Then show that the installed binary requires the dynamic library libmy_function.so.2.0
my_function-2.0# ldd /usr/bin/my_function
        libmy_function.so.2.0 => /usr/lib/libmy_function.so.2.0 (0x40017000)
        libc.so.6 => /lib/libc.so.6 (0x40019000)
        /lib/ld-linux.so.2 (0x40000000)
[51]
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_function project

#--------------------------
#user defined variables

#project files
OBS = print_int.o main.o
EXEC = my_function
EXEC_STATIC = my_function.static
LIB_OBS = print_int.o

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/work

#directories
STATIC_BUILD_DIR = static
DYNAMIC_BUILD_DIR = dynamic

#--------------------------
#utilities
CP = cp -pauv
MKDIR = mkdir -p
RM = rm -f
PWD = `pwd`

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
FPIC = -fPIC
CFLAGS = $(WARN) $(FPIC)
LDFLAGS = -Wall

#STATIC LIB info
STATIC_LIB = lib$(EXEC).a

#directories
INSTALL_LIB_DIR = /usr/lib
INSTALL_BIN_DIR = /usr/bin
INSTALL_MAN_DIR = /usr/man/man1

#-----------------------------
.PHONY: all static_build dynamic_build clean distclean install uninstall

#all: static_build
#
#static_build:
#	cd $(STATIC_BUILD_DIR); make; cd -

all: static_build dynamic_build

static_build:
	$(MKDIR) static
	$(CP) Makefile.static static/Makefile
	cd $(STATIC_BUILD_DIR); make; cd -

dynamic_build:
	$(MKDIR) dynamic
	$(CP) Makefile.dynamic dynamic/Makefile
	cd $(DYNAMIC_BUILD_DIR); make; cd -

install: static_build dynamic_build
	cd $(STATIC_BUILD_DIR); make install; cd -
	cd $(DYNAMIC_BUILD_DIR); make install; cd -
	$(CP) man/$(EXEC).1 $(INSTALL_MAN_DIR)

uninstall:
	cd $(STATIC_BUILD_DIR); make uninstall; cd -
	cd $(DYNAMIC_BUILD_DIR); make uninstall; cd -
	$(RM) $(INSTALL_MAN_DIR)/$(EXEC).1

clean:
	cd $(STATIC_BUILD_DIR); make clean; cd -
	cd $(DYNAMIC_BUILD_DIR); make clean; cd -

distclean: clean
	$(RM) *.sav *.sav2
	$(RM) -r static dynamic

#-Makefile---------------------
[52]
my_function.header# gcc -o my_function print_int.c main.c
my_function.header# ./my_function
10
[53]
my_function.header# make main.o
gcc -Wall -fPIC -c -o main.o main.c
[54]
make used the implicit rule for producing a *.o file from a *.c file, and used $(CFLAGS) already declared in the Makefile.
[55]
No. You do need to modify the install and uninstall rules, but we'll handle that in the next section.
[56]
//my_function.h
//Joseph Mack (C) 2009, released under GPL.v3
//header file for function project

void print_int(int);
int print_int(int);

//-my_function.h------------------
[57]
my_function# gcc -Wall -o my_function main.c print_int.c
In file included from main.c:9:
my_function.h:11: error: conflicting types for 'print_int'
my_function.h:10: note: previous declaration of 'print_int' was here
In file included from print_int.c:9:
my_function.h:11: error: conflicting types for 'print_int'
my_function.h:10: note: previous declaration of 'print_int' was here
[58]
Twice. Both my_function_grandfather.h and my_function_father.h contain the definition of print_int(). my_function_father.h contains the definition by including my_function_grandfather.h.
[59]
//my_function.h
//Joseph Mack (C) 2009, released under GPL.v3
//header file for function project

#ifndef MY_FUNCTION_H
#define MY_FUNCTION_H

extern void print_int(int);
//#include "my_function.h"

#endif /* MY_FUNCTION_H */

//-my_function.h------------------

//my_function_father.h
//Joseph Mack (C) 2009, released under GPL.v3
//header file for function project

#ifndef MY_FUNCTION_FATHER_H
#define MY_FUNCTION_FATHER_H

#include "my_function_grandfather.h"
void print_int(int);

#endif /* MY_FUNCTION_FATHER_H */

//-my_function_father.h------------------

//my_function_grandfather.h
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

#include <stdio.h>

#ifndef MY_FUNCTION_GRANDFATHER_H
#define MY_FUNCTION_GRANDFATHER_H

void print_int(int x) {
	printf ("%d \n", x);
}

#endif /*MY_FUNCTION_GRANDFATHER_H*/

//-my_function_grandfather.h----------------
[60]
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_function project

#--------------------------
#user defined variables

#project files
PROJECT = my_function
LIB_OBS = my_function_child.o
OBS = $(LIB_OBS) main.o
EXEC = $(PROJECT)
EXEC_STATIC = $(EXEC).static
SOURCES = my_function_child.c main.c

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/home

#--------------------------
#utilities
CP = cp -pauv
RM = rm -f
PWD = `pwd`

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
#FPIC = -fPIC
#INCL = -I../
CFLAGS = $(WARN) $(FPIC) $(INCL)
LDFLAGS = -Wall

#STATIC LIB info
STATIC_LIB = lib$(EXEC).a

#include files
HEADERS = $(EXEC).h $(EXEC)_father.h $(EXEC)_grandfather.h

#directories
INSTALL_LIB_DIR = /usr/lib
INSTALL_BIN_DIR = /usr/bin

#-----------------------------
.PHONY: all clean distclean install install_static_lib install_static_bin uninstall $(MAKEFILE) $(SOURCES) $(HEADERS)

all: $(STATIC_LIB) $(EXEC_STATIC)
#all: $(EXEC)

$(EXEC): $(OBS) $(MAKEFILE) $(HEADERS)
	$(CC) $(LDFLAGS) -o $@ $(OBS)

$(EXEC_STATIC): main.o $(STATIC_LIB) $(MAKEFILE) $(HEADERS)
	$(CC) $(LDFLAGS) -o $@ $< ./$(STATIC_LIB)

$(STATIC_LIB): $(LIB_OBS) $(MAKEFILE) $(HEADERS)
	$(AR) rcs $@ $(LIB_OBS)

%.o: %.c $(MAKEFILE) $(HEADERS)
	$(CP) $< $<.sav
	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
	$(CP) $< $<.sav2
	$(CP) $< $(BACKUP)

install: install_static_lib

install_static_lib: $(STATIC_LIB) $(MAKEFILE)
	$(CP) $< $(INSTALL_LIB_DIR)

install_static_bin: $(EXEC_STATIC) $(MAKEFILE)
	$(CP) $< $(INSTALL_BIN_DIR)/$(EXEC)

uninstall:
	$(RM) $(INSTALL_BIN_DIR)/$(EXEC)
	$(RM) $(INSTALL_LIB_DIR)/$(STATIC_LIB)

clean:
	$(RM) $(OBS) $(EXEC) $(EXEC_STATIC) $(STATIC_LIB)

distclean: clean
	$(RM) *.sav *.sav2

#-Makefile---------------------
[61]
//print_int.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

#include <stdio.h>
#include "my_function.h"

extern void print_int(int x) {
#if 1
	//extra line added here
	printf ("hello from print_int \n");
#endif
	printf ("%d \n", x);
}

//-print_int.c--------------------
[62]
[63]
gcc -Wall -c main.c
[64]
//print_int.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

#include <stdio.h>
#include "my_function.h"

void print_int(int x){
	printf ("%d \n", x);
}

//-print_int.c--------------------
the compile
my_function# gcc -Wall -c print_int.c
my_function# gcc -Wall -c main.c
my_function# gcc -Wall -o my_function main.o print_int.o
my_function# ./my_function
10
[65]
The incremented value of j was not returned to main(). Both calls to print_int() used j=0. For j to work as a counter, you would change print_int() to return the incremented value of j.
[66]
//print_int.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

#include <stdio.h>

int print_int(int j ) {
	++j;
	printf ("j=%d \n", j);
	return j;
}

//-print_int.c--------------------

//main.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls

int print_int(int);
int k = 0;

int main() {
	k = print_int(k);
	k = print_int(k);
	return (0);
}

//-main.c-------------------
[67]
my_function# ./my_function.dynamic
j=1
j=2
[68]
No. All processes can read the shared global variable, and as long as they don't change it, the global variable will be the same to all processes that access it.
[69]
0x1954-0x1950=4 bytes i.e. j is a 32 bit integer.
[70]
With a static library, the executable has its own copy of the library.
[71]
//main.c
//Joseph Mack (C) 2009, released under GPL.v3
//detects if error or success condition and calls functions to output message
//variables
//	return_code: 0 if success, 1 if error

#include "my_message.h"

int return_code = 0;

int main(void) {
	parse_message(return_code);
	return (0);
}

//-main.c-------------------

//print_message.c
//Joseph Mack (C) 2009, released under GPL.v3
//illustrates function calls
//parameters: int
//	has value 0 or 1
//	if 0 call void print_success(void)
//	if 1 call void print_error(void)

#include <stdio.h>
#include "my_message.h"
#include "print_message.h"

void parse_message(int x) {
	if (x == 0) {
		print_success();
	} else {
		print_error();
	} //x
}

void print_success(void) {
	printf ("success\n");
}

void print_error(void) {
	printf ("error\n");
}

//-print_message.c--------------------

//my_message.h
//Joseph Mack (C) 2009, released under GPL.v3
//header file for my_message project

#ifndef MY_MESSAGE_H
#define MY_MESSAGE_H

void parse_message(int);

#endif /* MY_MESSAGE_H */

//-my_message.h------------------

//print_message.h
//Joseph Mack (C) 2009, released under GPL.v3
//header file for my_message project

#ifndef PRINT_MESSAGE_H
#define PRINT_MESSAGE_H

void parse_message(int);
void print_success(void);
void print_error(void);

#endif /* PRINT_MESSAGE_H */

//-print_message.h------------------
#Makefile
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_message project

#--------------------------
#user defined variables

#project files
PROJECT = my_message
LIB_OBS = print_message.o
OBS = $(LIB_OBS) main.o
EXEC = $(PROJECT)
EXEC_STATIC = $(EXEC).static
EXEC_DYNAMIC = $(EXEC).dynamic

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/work

#--------------------------
#utilities
CP = cp -pauv
RM = rm -f
PWD = `pwd`
MKDIR = mkdir -p

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
FPIC = -fPIC
CFLAGS = $(WARN) $(FPIC)
LDFLAGS = -Wall

#files
HEADERS = my_message.h print_message.h

#directories
INSTALL_INCL_DIR = /usr/include/
DYNAMIC_BUILD_DIR = dynamic
STATIC_BUILD_DIR = static
INSTALL_MAN_DIR = /usr/man/man1

#-----------------------------
.PHONY: all clean distclean install uninstall

all: static_build dynamic_build

static_build:
	$(MKDIR) $(STATIC_BUILD_DIR)
	cp Makefile.static $(STATIC_BUILD_DIR)/$(MAKEFILE)
	cd $(STATIC_BUILD_DIR); make; cd -

dynamic_build:
	$(MKDIR) $(DYNAMIC_BUILD_DIR)
	cp Makefile.dynamic $(DYNAMIC_BUILD_DIR)/$(MAKEFILE)
	cd $(DYNAMIC_BUILD_DIR); make; cd -

install: static_build dynamic_build
	cd $(STATIC_BUILD_DIR); make install; cd -
	cd $(DYNAMIC_BUILD_DIR); make install; cd -
	$(CP) man/$(EXEC).1 $(INSTALL_MAN_DIR)
	$(CP) $(HEADERS) $(INSTALL_INCL_DIR)

uninstall:
	make -f Makefile.static uninstall
	make -f Makefile.dynamic uninstall
	cd $(INSTALL_INCL_DIR); rm $(HEADERS); cd -
	$(RM) $(INSTALL_MAN_DIR)/$(EXEC).1

clean:
	if [ -d "$(STATIC_BUILD_DIR)" ]; then cd $(STATIC_BUILD_DIR); make clean; cd -; fi
	if [ -d "$(DYNAMIC_BUILD_DIR)" ]; then cd $(DYNAMIC_BUILD_DIR); make clean; cd -; fi

distclean: clean
	$(RM) -r static dynamic
	$(RM) *.sav *.sav2

#-Makefile---------------------
#Makefile.static
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_message project

#--------------------------
#user defined variables

#project files
PROJECT = my_message
LIB_OBS = print_message.o
OBS = $(LIB_OBS) main.o
EXEC = $(PROJECT)
EXEC_STATIC = $(EXEC).static
SOURCES = ../print_message.c ../main.c

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/home

#--------------------------
#utilities
CP = cp -pauv
RM = rm -f
PWD = `pwd`

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
#FPIC = -fPIC
CFLAGS = $(WARN) $(FPIC)
LDFLAGS = -Wall

#STATIC LIB info
STATIC_LIB = lib$(EXEC).a

#headers
HEADERS = ../my_message.h ../print_message.h

#directories
INSTALL_LIB_DIR = /usr/lib
INSTALL_BIN_DIR = /usr/bin

#-----------------------------
.PHONY: all clean distclean install_lib install_bin $(MAKEFILE) $(SOURCES) $(HEADERS)

#all: $(STATIC_LIB) $(EXEC_STATIC) $(DYNAMIC_LIB) $(EXEC_DYNAMIC)
all: $(STATIC_LIB) $(EXEC_STATIC)

$(EXEC_STATIC): main.o $(STATIC_LIB) $(MAKEFILE) $(HEADERS)
	$(CC) $(LDFLAGS) -o $@ $< ./$(STATIC_LIB)

$(STATIC_LIB): $(LIB_OBS) $(MAKEFILE) $(HEADERS)
	$(AR) rcs $@ $(LIB_OBS)

%.o: ../%.c $(MAKEFILE) $(HEADERS)
	$(CP) $< $<.sav
	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
	$(CP) $< $<.sav2
	$(CP) $< $(BACKUP)

install: install_static_lib

install_static_lib: $(STATIC_LIB) $(MAKEFILE) $(HEADERS)
	$(CP) $< $(INSTALL_LIB_DIR)

uninstall:
	$(RM) $(INSTALL_BIN_DIR)/$(EXEC)
	$(RM) $(INSTALL_LIB_DIR)/$(STATIC_LIB)

clean:
	$(RM) $(OBS) $(EXEC_STATIC) $(STATIC_LIB)

distclean: clean
	$(RM) *.sav *.sav2

#-Makefile.static---------------------
#Makefile.dynamic
#Joseph Mack (C) 2009, released under GPL.v3
#demonstrates make for the my_message project

#--------------------------
#user defined variables

#project files
PROJECT = my_message
EXEC = $(PROJECT)
EXEC_STATIC = $(EXEC).static
EXEC_DYNAMIC = $(EXEC).dynamic
LIB_OBS = print_message.o
OBS = $(LIB_OBS) main.o
SOURCES = ../print_message.c ../main.c

#backup
#at work, my usb disk is mounted on /pny/home/
#at home, my usb disk is mounted on /pny/work/
#when I change locations, I can copy the files with
#cd ~/project_directory
#cp -pauv /pny/$where_i_am/`pwd`/ .
BACKUP_TOP_DIR = /pny/home

#--------------------------
#utilities
CP = cp -pauv
RM = rm -f
PWD = `pwd`

#backup
BACKUP = $(BACKUP_TOP_DIR)/$(PWD)

#make specific variables
MAKEFILE = Makefile
CC = gcc
AR = ar

#FLAGS
WARN = -Wall
FPIC = -fPIC
CFLAGS = $(WARN) $(FPIC)
LDFLAGS = -Wall

#DYNAMIC LIB info
LIB_MAJOR = 2
LIB_MINOR = 0
LIB_EXTRA = 1
DYNAMIC_LIB = lib$(EXEC).so.$(LIB_MAJOR).$(LIB_MINOR).$(LIB_EXTRA)
SONAME = lib$(EXEC).so.$(LIB_MAJOR).$(LIB_MINOR)

#headers
HEADERS = ../my_message.h ../print_message.h

#directories
INSTALL_LIB_DIR = /usr/lib
INSTALL_BIN_DIR = /usr/bin

#-----------------------------
.PHONY: all clean distclean install install_lib install_bin uninstall

#all: $(STATIC_LIB) $(EXEC_STATIC) $(DYNAMIC_LIB) $(EXEC_DYNAMIC)
all: $(DYNAMIC_LIB) $(EXEC_DYNAMIC)

$(EXEC_DYNAMIC): main.o $(DYNAMIC_LIB) $(MAKEFILE) $(HEADERS)
	$(CC) $(LDFLAGS) -o $@ $< ./$(DYNAMIC_LIB)

$(DYNAMIC_LIB): $(LIB_OBS) $(MAKEFILE) $(HEADERS)
	$(CC) -shared -Wl,-soname,$(SONAME) -o $(DYNAMIC_LIB) $(LIB_OBS)

%.o: ../%.c $(MAKEFILE) $(HEADERS)
	$(CP) $< $<.sav
	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
	$(CP) $< $<.sav2
	$(CP) $< $(BACKUP)

install: install_dynamic_lib install_dynamic_bin

install_dynamic_lib: $(DYNAMIC_LIB) $(MAKEFILE) $(HEADERS)
	$(CP) $< $(INSTALL_LIB_DIR)
	ldconfig

install_dynamic_bin: $(EXEC_DYNAMIC) $(MAKEFILE) $(HEADERS)
	$(CP) $< $(INSTALL_BIN_DIR)/$(EXEC)

uninstall:
	$(RM) $(INSTALL_BIN_DIR)/$(EXEC)
	$(RM) $(INSTALL_LIB_DIR)/lib$(EXEC).so.$(LIB_MAJOR).$(LIB_MINOR).$(LIB_EXTRA)
	$(RM) $(INSTALL_LIB_DIR)/$(SONAME)

clean:
	$(RM) $(OBS) $(EXEC_DYNAMIC) $(DYNAMIC_LIB)

distclean: clean
	$(RM) *.sav *.sav2

#-Makefile.dynamic---------------------
[72]
If a function is only called by other functions in the same translation unit, it should be declared static. parse_message() is declared in print_message.h, but is defined in another file (print_message.c), so in print_message.h it should be declared extern. The build will find the definition eventually even if you leave out the storage class specifier (function declarations are extern by default), but the code is cleaner if you tell the reader and the compiler that the definition isn't in this file.
//print_message.c
.
.
static void print_success(void) {
	printf ("success\n");
}

static void print_error(void) {
	printf ("error\n");
}

//-print_message.c--------------------

//print_message.h
//Joseph Mack (C) 2009, released under GPL.v3
//header file for my_message project

#ifndef PRINT_MESSAGE_H
#define PRINT_MESSAGE_H

extern void parse_message(int);
static void print_success(void);
static void print_error(void);

#endif /* PRINT_MESSAGE_H */

//-print_message.h------------------