
3CS4-06: Linux and Shell Programming Unit V

1. GCC (GNU Compiler Collection)

1.1 A Brief History and Introduction to GCC


The original GNU C Compiler (GCC) was developed by Richard Stallman, the founder of the GNU Project.
Richard Stallman founded the GNU Project in 1984 to create a complete Unix-like operating system as
free software, to promote freedom and cooperation among computer users and programmers.
GCC, which formerly stood for "GNU C Compiler", has grown over time to support many languages such as C (gcc),
C++ (g++), Objective-C, Objective-C++, Java (gcj), Fortran (gfortran), Ada (gnat), Go (gccgo), OpenMP,
Cilk Plus, and OpenACC. It is now referred to as the "GNU Compiler Collection". The mother site for GCC
is http://gcc.gnu.org/. The current version is GCC 12.2, released on 2022-08-19.
GCC is a key component of so-called "GNU Toolchain", for developing applications and writing
operating systems. The GNU Toolchain includes:

1. GNU Compiler Collection (GCC): a compiler suite that supports many languages, such as C/C++
and Objective-C/C++.

2. GNU Make: an automation tool for compiling and building applications.

3. GNU Binutils: a suite of binary utility tools, including linker and assembler.

4. GNU Debugger (GDB).

5. GNU Autotools: A build system including Autoconf, Autoheader, Automake and Libtool.

6. GNU Bison: a parser generator (similar to lex and yacc).

GCC is portable and runs on many operating platforms. GCC (and the GNU Toolchain) is currently available
on all Unix-like systems. It has also been ported to Windows (by Cygwin, MinGW and MinGW-W64). GCC can also
act as a cross-compiler, producing executables for a different platform.
C++ Standard Support
GCC supports different dialects of C++, corresponding to the multiple published ISO standards. Which
standard it implements can be selected using the -std= command-line option.
 C++98
 C++11
 C++14
 C++17
 C++20
 C++23

1.2 Installing GCC on Linux


A. Command to install GCC and Development Tools on Ubuntu

Open a command line terminal and install the C compiler by installing the development package build-essential:

$ sudo apt update


$ sudo apt install build-essential

B. Command to install GCC and Development Tools on a CentOS / RHEL 7 server


Open a command line terminal and type the following yum command as root user:
# yum group install "Development Tools"
OR
$ sudo yum group install "Development Tools"

C. Install GCC Compiler on Mac OS


Open a Terminal, and enter "gcc --version". If gcc is not installed, the system will prompt you to
install gcc.
$ gcc --version
......
Target: x86_64-apple-darwin14.5.0 // 64-bit target codes
Thread model: posix
OR with Xcode

Step #1: Install Xcode on Apple Mac OS X

First, make sure Xcode is installed. If it is not, visit the App Store and install Xcode.

Step #2: Install gcc/LLVM compiler on OS X

Once installed, open Xcode and visit:

Xcode menu > Preferences > Downloads > choose "Command line tools" > Click "Install" button:

1.3 Post Installation


i. Versions
You can display the version of GCC via the --version option:
$ gcc --version
gcc (GCC) 6.4.0

$ g++ --version
g++ (GCC) 6.4.0
More details can be obtained via the -v option, for example:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/6.4.0/lto-wrapper.exe
Target: x86_64-pc-cygwin
Configured with: ......


Thread model: posix


gcc version 6.4.0 (GCC)

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/6.4.0/lto-wrapper.exe
Target: x86_64-pc-cygwin
Configured with: ......
Thread model: posix
gcc version 6.4.0 (GCC)
ii. Help
You can get the help manual via the --help option. For example,
$ gcc --help

iii. Man Pages


You can read the GCC manual pages (or man pages) via the man utility:
$ man gcc

// or

$ man g++

// Press space key for next page, or 'q' to quit.

Reading man pages under CMD or Bash shell can be difficult. You could generate a text file via:
$ man gcc | col -b > gcc.txt

The col utility is needed to strip the backspace. (For Cygwin, it is available in "Utils", "util-linux"
package.)
Alternatively, you could look for the man pages online, e.g., http://linux.die.net/man/1/gcc.
The GCC man pages are kept under "/usr/share/man/man1".
$ whereis gcc

gcc: /usr/bin/gcc.exe /usr/lib/gcc /usr/share/man/man1/gcc.1.gz

1.4 Getting Started


The GNU C and C++ compilers are called gcc and g++, respectively.
i. Compile/Link a Simple C Program - hello.c
Below is the Hello-world C program hello.c:
// hello.c
#include <stdio.h>

int main()
{
    printf("Hello, world!\n");
    return 0;
}
To compile the hello.c:
$ gcc hello.c

// Compile and link source file hello.c into executable "a.out"


The default output executable is called "a.out" (Linux/Unixes and Mac OS X).
ii. To run the program:
// (Unixes / Mac OS X) In Bash Shell - include the current path (./)
$ chmod a+x a.out
$ ./a.out
Notes for Linux and Bash Shell:
 In Bash shell, the default PATH does not include the current working directory. Hence, you need
to include the current path (./) in the command. (Windows includes the current directory in the
PATH automatically, whereas Unixes do not - you need to include the current directory explicitly
in the PATH.)
 You also need to include the file extension, if any, i.e., "./a.out".
 In Unixes, the output file could be "a.out" or simply "a". Furthermore, you need to
assign executable file-mode (x) to the executable file "a.out", via command "chmod a+x filename"
(add executable file-mode "+x" to all users "a+x").
iii. To specify the output filename, use -o option:
// (Unixes / Mac OS X) In Bash shell
$ gcc -o hello hello.c
$ chmod a+x hello
$ ./hello
NOTE for Linux:
 In Linux, we typically omit the .exe file extension (meant for Windows only), and simply name
the output executable hello (via the command "gcc -o hello hello.c").
 You need to assign executable file mode via command "chmod a+x hello".
iv. Compile/Link a Simple C++ Program - hello.cpp
// hello.cpp
#include <iostream>
using namespace std;

int main()
{
    cout << "Hello, world!" << endl;
    return 0;
}
You need to use g++ to compile a C++ program, as follows. We use the -o option to specify the output
filename.
// (Unixes / Mac OS X) In Bash shell

$ g++ -o hello hello.cpp

$ chmod a+x hello

$ ./hello


v. More GCC Compiler Options


A few commonly-used GCC compiler options are:
$ g++ -Wall -g -o Hello.exe Hello.cpp

 -o: specifies the output executable filename.


 -Wall: prints "all" warning messages.
 -g: generates additional symbolic debugging information for use with the gdb debugger.
vi. Compile and Link Separately
The above command compiles the source file into an object file and links it with other object files and system
libraries into an executable in one step. You may separate the compile and link steps as follows:
// Compile-only with -c option

> g++ -c -Wall -g Hello.cpp

// Link object file(s) into an executable

> g++ -g -o Hello.exe Hello.o

The options are:


 -c: Compile into object file "Hello.o". By default, the object file has the same name as the source
file with extension ".o" (there is no need to specify the -o option). No linking with other object files
or libraries is performed.
 Linking is performed when the input files are object files ".o" (instead of source files ".cpp" or ".c").
GCC uses a separate linker program (called ld.exe) to perform the linking.
vii. Compile and Link Multiple Source Files
Suppose that your program has two source files: file1.cpp, file2.cpp. You could compile all of them in
a single command:
$ g++ -o myprog.exe file1.cpp file2.cpp

However, we usually compile each of the source files separately into an object file, and link them together
at a later stage. In this case, changes in one file do not require re-compilation of the other files.
$ g++ -c file1.cpp

$ g++ -c file2.cpp

$ g++ -o myprog.exe file1.o file2.o

Compile into a Shared Library


To compile and link a C/C++ program into a shared library (".dll" in Windows, ".so" in Unixes), use the
-shared option. Read "Java Native Interface" for an example.
1.5 GCC Compilation Process
GCC compiles a C/C++ program into an executable in 4 steps. For example,
"gcc -o hello.exe hello.c" is carried out as follows:
1. Pre-processing: via the GNU C Preprocessor (cpp.exe), which includes the headers (#include)
and expands the macros (#define).
$ cpp hello.c > hello.i

The resultant intermediate file "hello.i" contains the expanded source code.
2. Compilation: The compiler compiles the pre-processed source code into assembly code for a
specific processor.


$ gcc -S hello.i

The -S option tells GCC to produce assembly code instead of object code. The resultant assembly
file is "hello.s".
3. Assembly: The assembler (as.exe) converts the assembly code into machine code in the object
file "hello.o".
$ as -o hello.o hello.s

4. Linking: Finally, the linker (ld.exe) links the object code with the library code to produce an
executable file "hello.exe".
$ ld -o hello.exe hello.o ...libraries...

Verbose Mode (-v)


You can see the detailed compilation process by enabling -v (verbose) option. For example,
$ gcc -v -o hello.exe hello.c

Defining Macro (-D)


You can use the -Dname option to define a macro, or -Dname=value to define a macro with a value.
The value should be enclosed in double quotes if it contains spaces.
1.6 Headers (.h), Static Libraries (.lib, .a) and Shared Libraries (.dll, .so)

Static Library vs. Shared Library


A library is a collection of pre-compiled object files that can be linked into your programs via the linker.
Examples are the system functions such as printf() and sqrt().
There are two types of external libraries: static library and shared library.
1. A static library has file extension of ".a" (archive file) in Unixes or ".lib" (library) in Windows.
When your program is linked against a static library, the machine code of external functions used
in your program is copied into the executable. A static library can be created via
the archive program "ar.exe".
2. A shared library has file extension of ".so" (shared objects) in Unixes or ".dll" (dynamic link
library) in Windows. When your program is linked against a shared library, only a small table is
created in the executable. Before the executable starts running, the operating system loads the
machine code needed for the external functions - a process known as dynamic linking. Dynamic
linking makes executable files smaller and saves disk space, because one copy of a library can
be shared between multiple programs. Furthermore, most operating systems allow one copy of
a shared library in memory to be used by all running programs, thus saving memory. The shared
library code can be upgraded without the need to recompile your program.
Because of the advantage of dynamic linking, GCC, by default, links to the shared library if it is
available.
You can list the contents of a library via "nm filename".
Searching for Header Files and Libraries (-I, -L and -l)
 When compiling the program, the compiler needs the header files to compile the source codes;
the linker needs the libraries to resolve external references from other object files or libraries.
The compiler and linker will not find the headers/libraries unless you set the appropriate
options, which is not obvious for a first-time user. For each of the headers used in your source
(via #include directives), the compiler searches the so-called include-paths for these headers.
The include-paths are specified via -Idir option (or environment variable CPATH). Since the
header's filename is known (e.g., iostream.h, stdio.h), the compiler only needs the directories.
 The linker searches the so-called library-paths for libraries needed to link the program into an
executable. The library-path is specified via -Ldir option (uppercase 'L' followed by the
directory path) (or environment variable LIBRARY_PATH). In addition, you also have to
specify the library name. In Unixes, the library libxxx.a is specified via -lxxx option (lowercase
letter 'l', without the prefix "lib" and ".a" extension). In Windows, provide the full name such
as -lxxx.lib. The linker needs to know both the directories as well as the library names. Hence,
two options need to be specified.
Default Include-paths, Library-paths and Libraries
Try listing the default include-paths in your system used by the "GNU C Preprocessor" via "cpp -v":
$ cpp -v

......

#include "..." search starts here:

#include <...> search starts here:

/usr/lib/gcc/x86_64-pc-cygwin/6.4.0/include
/usr/include
/usr/lib/gcc/x86_64-pc-cygwin/6.4.0/../../../../lib/../include/w32api

Try running the compilation in verbose mode (-v) to study the library-paths (-L) and libraries (-l) used
in your system:
> gcc -v -o hello.exe hello.c

......

-L/usr/lib/gcc/x86_64-pc-cygwin/6.4.0

-L/usr/x86_64-pc-cygwin/lib

-L/usr/lib

-L/lib

-lgcc_s // libgcc_s.a

-lgcc // libgcc.a


-lcygwin // libcygwin.a

-ladvapi32 // libadvapi32.a

-lshell32 // libshell32.a

-luser32 // libuser32.a

-lkernel32 // libkernel32.a

1.7 GCC Environment Variables

GCC uses the following environment variables:


 PATH: For searching the executables and run-time shared libraries (.dll, .so).
 CPATH: For searching the include-paths for headers. It is searched after paths specified in
-I<dir> options. C_INCLUDE_PATH and CPLUS_INCLUDE_PATH can be used to specify include-paths
for C and C++ headers respectively.
 LIBRARY_PATH: For searching library-paths for link libraries. It is searched after paths
specified in -L<dir> options.

1.8 Debug C Program using gdb in 6 Simple Steps


Write a sample C program with errors for debugging purposes
To learn C program debugging, let us create the following C program that calculates and prints the
factorial of a number. However, this C program contains some errors, placed there for our debugging purposes.

$ vim factorial.c
#include <stdio.h>

int main()
{
    int i, num, j;
    printf("Enter the number: ");
    scanf("%d", &num);

    for (i = 1; i < num; i++)
        j = j * i;

    printf("The factorial of %d is %d\n", num, j);
}
$ gcc factorial.c

$ ./a.out
Enter the number: 3
The factorial of 3 is 12548672
Let us debug it while reviewing the most useful commands in gdb.
Step 1. Compile the C program with debugging option -g
Compile your C program with -g option. This allows the compiler to collect the debugging
information.

$ gcc -g factorial.c
Note: The above command creates a.out file which will be used for debugging as shown below.
Step 2. Launch gdb
Launch the C debugger (gdb) as shown below.
$ gdb a.out
Step 3. Set up a break point inside C program
Syntax:
break line_number
Other formats:
 break [file_name]:line_number
 break [file_name]:func_name
Places a break point in the C program where you suspect errors. While executing the program, the
debugger will stop at the break point and give you a prompt to debug.
So before starting up the program, let us place the following break point in our program.
break 10
Breakpoint 1 at 0x804846f: file factorial.c, line 10.

Step 4. Execute the C program in gdb debugger


run [args]
You can start running the program using the run command in the gdb debugger. You can also give
command line arguments to the program via run args. The example program we used here does not
require any command line arguments, so let us type run and start the program execution.
run
Starting program: /home/sathiyamoorthy/Debugging/c/a.out
Once you executed the C program, it would execute until the first break point, and give you the
prompt for debugging.
Breakpoint 1, main () at factorial.c:10
10 j=j*i;
You can use various gdb commands to debug the C program as explained in the sections below.

Step 5. Printing the variable values inside gdb debugger


Syntax: print {variable}

Examples:
print i
print j
print num

(gdb) p i
$1 = 1
(gdb) p j
$2 = 3042592
(gdb) p num
$3 = 3
(gdb)
As you can see above, in factorial.c we have not initialized the variable j. So it contains a garbage
value, resulting in big numbers as factorial values.
Fix this issue by initializing the variable j with 1, then compile the C program and execute it again.
Even after this fix there seems to be some problem in the factorial.c program, as it still gives a wrong
factorial value.
So, place the break point at line 10, and continue as explained in the next section.

Step 6. Continue, stepping over and in – gdb commands


There are three kinds of gdb operations you can choose from when the program stops at a break point:
continuing until the next break point, stepping in, or stepping over the next program lines.
 c or continue: the debugger will continue executing until the next break point.
 n or next: the debugger will execute the next line as a single instruction.
 s or step: same as next, but does not treat a function call as a single instruction; instead, it goes
into the function and executes it line by line.
By continuing or stepping through, you could have found that the issue is that we have not used
<= in the 'for loop' condition. So changing < to <= will solve the issue.

gdb command shortcuts


Use following shortcuts for most of the frequent gdb operations.

 l – list
 p – print
 c – continue
 s – step
 ENTER: pressing enter key would execute the previously executed command again.

Miscellaneous gdb commands


 l command: Use the gdb command l or list to print the source code in debug mode. Use l line-number
to view a specific line, or l function-name to view a specific function.
 bt: backtrace – Print a backtrace of all stack frames, or the innermost COUNT frames.
 help: View help for a particular gdb topic, via help TOPICNAME.
 quit: Exit from the gdb debugger.
1.9 Software Package


Linux-like operating systems offer a centralized mechanism for finding and installing software. Software
is usually distributed in the form of packages, kept in repositories. Working with packages is known as
package management. Packages provide the core components of an operating system, along with shared
libraries, applications, services, and documentation.
A package management system does much more than one-time installation of software. It also provides
tools for upgrading already-installed packages. Package repositories help to ensure that code has been
vetted for use on your system, and that the installed versions of software have been approved by
developers and package maintainers.
When configuring servers or development environments, it’s often necessary to look beyond official
repositories. Packages in the stable release of a distribution may be out of date, especially where new or
rapidly-changing software is concerned. Nevertheless, package management is a vital skill for system
administrators and developers, and the wealth of packaged software for major distributions is a
tremendous resource.
This guide is intended as a quick reference for the fundamentals of finding, installing, and upgrading
packages on a variety of distributions, and should help you translate that knowledge between systems.
Package Management Systems: A Brief Overview

Most package systems are built around collections of package files. A package file is usually an archive
which contains compiled applications and other resources used by the software, along with installation
scripts. Packages also contain valuable metadata, including their dependencies, a list of other packages
required to install and run them.
While their functionality and benefits are broadly similar, packaging formats and tools vary by platform:
 For Debian / Ubuntu: .deb packages installed by apt and dpkg
 For Rocky / Fedora / RHEL: .rpm packages installed by dnf (formerly yum)
 For FreeBSD: .txz packages installed by pkg
 In Debian and systems based on it, like Ubuntu, Linux Mint, and Raspbian, the package format
is the .deb file. apt, the Advanced Packaging Tool, provides commands used for most common
operations: Searching repositories, installing collections of packages and their dependencies,
and managing upgrades. apt commands operate as a front-end to the lower-level dpkg utility,
which handles the installation of individual .deb files on the local system, and is sometimes
invoked directly.
Recent releases of most Debian-derived distributions include a single apt command, which
offers a concise and unified interface to common operations that have traditionally been handled
by the more-specific apt-get and apt-cache.
 Rocky Linux, Fedora, and other members of the Red Hat family use RPM files. These used to
use a package manager called yum. In recent versions of Fedora and its derivatives, yum has
been supplanted by dnf, a modernized fork which retains most of yum’s interface.
 FreeBSD’s binary package system is administered with the pkg command. FreeBSD also offers
the Ports Collection, a local directory structure and tools which allow the user to fetch, compile,
and install packages directly from source using Makefiles. It’s usually much more convenient
to use pkg, but occasionally a pre-compiled package is unavailable, or you may need to change
compile-time options.
i. Update Package Lists

Most systems keep a local database of the packages available from remote repositories. It’s best to
update this database before installing or upgrading packages. As a partial exception to this
pattern, dnf will check for updates before performing some operations, but you can ask at any time
whether updates are available.


 For Debian / Ubuntu: sudo apt update


 For Rocky / Fedora / RHEL: dnf check-update
 For FreeBSD Packages: sudo pkg update
 For FreeBSD Ports: sudo portsnap fetch update
ii. Upgrade Installed Packages

Making sure that all of the installed software on a machine stays up to date would be an enormous
undertaking without a package system. You would have to track upstream changes and security alerts
for hundreds of different packages. While a package manager doesn’t solve every problem you’ll
encounter when upgrading software, it does enable you to maintain most system components with a
few commands.
On FreeBSD, upgrading installed ports can introduce breaking changes or require manual configuration
steps. It’s best to read /usr/ports/UPDATING before upgrading with portmaster.
 For Debian / Ubuntu: sudo apt upgrade
 For Rocky / Fedora / RHEL: sudo dnf upgrade
 For FreeBSD Packages: sudo pkg upgrade
iii. Find a Package

Most distributions offer a graphical or menu-driven front end to package collections. These can be a
good way to browse by category and discover new software. Often, however, the quickest and most
effective way to locate a package is to search with command-line tools.
 For Debian / Ubuntu: apt search search_string
 For Rocky / Fedora / RHEL: dnf search search_string
 For FreeBSD Packages: pkg search search_string
Note: On Rocky, Fedora, or RHEL, you can search package titles and descriptions together by using dnf
search all. On FreeBSD, you can search descriptions by using pkg search -D.
iv. View Info about a Specific Package

When deciding what to install, it’s often helpful to read detailed descriptions of packages. Along with
human-readable text, these often include metadata like version numbers and a list of the package’s
dependencies.
 For Debian / Ubuntu: apt show package
 For Rocky / Fedora / RHEL: dnf info package
 For FreeBSD Packages: pkg info package
 For FreeBSD Ports: cd /usr/ports/category/port && cat pkg-descr
v. Install a Package from Repositories

Once you know the name of a package, you can usually install it and its dependencies with a single
command. In general, you can supply multiple packages to install at once by listing them all.
 For Debian / Ubuntu: sudo apt install package
 For Rocky / Fedora / RHEL: sudo dnf install package
 For FreeBSD Packages: sudo pkg install package
vi. Install a Package from the Local Filesystem

Sometimes, even though software isn’t officially packaged for a given operating system, a developer or
vendor will offer package files for download. You can usually retrieve these with your web browser, or
via curl on the command line. Once a package is on the target system, it can often be installed with a
single command.

On Debian-derived systems, dpkg handles individual package files. If a package has unmet
dependencies, gdebi can often be used to retrieve them from official repositories.
On Rocky Linux, Fedora, or RHEL, dnf is used to install individual files, and will also handle
needed dependencies.
 For Debian / Ubuntu: sudo dpkg -i package.deb
 For Rocky / Fedora / RHEL: sudo dnf install package.rpm
 For FreeBSD Packages: sudo pkg add package.txz
vii. Remove One or More Installed Packages

Since a package manager knows what files are provided by a given package, it can usually remove them
cleanly from a system if the software is no longer needed.
 For Debian / Ubuntu: sudo apt remove package
 For Rocky / Fedora / RHEL: sudo dnf erase package
 For FreeBSD Packages: sudo pkg delete package
viii. Get Help

In addition to web-based documentation, keep in mind that Unix manual pages (usually referred to
as man pages) are available for most commands from the shell. To read a page, use man:
$ man page
In man, you can navigate with the arrow keys. Press / to search for text within the page, and q to quit.
 For Debian / Ubuntu: man apt
 For Rocky / Fedora / RHEL: man dnf
 For FreeBSD Packages: man pkg
 For FreeBSD Ports: man ports
Source code management

Source code management (SCM) is used to track modifications to a source code repository. SCM tracks
a running history of changes to a code base and helps resolve conflicts when merging updates from
multiple contributors. SCM is also synonymous with Version control.

As software projects grow in lines of code and contributor head count, the costs of communication
overhead and management complexity also grow. SCM is a critical tool to alleviate the organizational
strain of growing development costs.

The importance of source code management tools

When multiple developers are working within a shared codebase, it is common to make edits to a
shared piece of code. Separate developers may be working on seemingly isolated features; however,
these features may use a shared code module. Therefore Developer 1, working on Feature 1, could
make some edits and find out later that Developer 2, working on Feature 2, has made conflicting edits.

Before the adoption of SCM this was a nightmare scenario. Developers would edit text files directly
and move them around to remote locations using FTP or other protocols. Developer 1 would make edits
and Developer 2 would unknowingly save over Developer 1’s work and wipe out the changes. SCM’s
role as a protection mechanism against this specific scenario is known as Version Control.

SCM brought version control safeguards to prevent loss of work due to conflict overwriting. These
safeguards work by tracking changes from each individual developer, identifying areas of conflict,
and preventing overwrites. SCM then communicates these points of conflict back to the developers
so that they can safely review and address them.

This foundational conflict prevention mechanism has the side effect of providing passive
communication for the development team. The team can then monitor and discuss the work in progress
that the SCM is monitoring. The SCM tracks an entire history of changes to the code base. This allows
developers to examine and review edits that may have introduced bugs or regressions.

The benefits of source code management

In addition to version control, SCM provides a suite of other helpful features to make collaborative code
development a more user-friendly experience. Once SCM has started tracking all the changes to a project
over time, a detailed historical record of the project's life is created. This historical record can then be
used to 'undo' changes to the codebase. The SCM can instantly revert the codebase back to a previous
point in time. This is extremely valuable for preventing regressions on updates and undoing mistakes.

The SCM archive of every change over a project's lifetime provides valuable record keeping for a
project's release notes. A clean and maintained SCM history log can be used interchangeably as release notes.
as release notes. This offers insight and transparency into the progress of a project that can be shared
with end users or non-development teams.

SCM will reduce a team's communication overhead and increase release velocity. Without SCM,
development is slower because contributors have to take extra effort to plan a non-overlapping sequence
of development for release. With SCM, developers can work independently on separate branches of feature
development, eventually merging them together.

Overall, SCM is a huge aid to engineering teams that lowers development costs by allowing
engineering resources to execute more efficiently. SCM is a must-have in the modern age of software
development. Professional teams use version control, and your team should too.
Source code management best practices
Commit often
Commits are cheap and easy to make. They should be made frequently to capture updates to a code
base. Each commit is a snapshot that the codebase can be reverted to if needed. Frequent commits give
many opportunities to revert or undo work. A group of commits can be combined into a single commit
using a rebase to clarify the development log.
Ensure you're working from latest version
SCM enables rapid updates from multiple developers. It's easy for a local copy of the codebase to fall
behind the global copy. Make sure to git pull or fetch the latest code before making updates. This will
help avoid conflicts at merge time.
Make detailed notes
Each commit has a corresponding log entry. At the time of commit creation, this log entry is populated
with a message. It is important to leave descriptive, explanatory commit log messages that explain the
“why” and “what” of the commit's content. These log messages become the canonical history of the
project's development and leave a trail for future contributors to review.
Review changes before committing

SCM’s offer a ‘staging area’. The staging area can be used to collect a group of edits before writing
them to a commit. The staging area can be used to manage and review changes before creating the
commit snapshot. Utilizing the staging area in this manner provides a buffer area to help refine the
contents of the commit.
Use Branches
Branching is a powerful SCM mechanism that allows developers to create a separate line of
development. Branches should be used frequently as they are quick and inexpensive. Branches enable
multiple developers to work in parallel on separate lines of development. These lines of development
are generally different product features. When development is complete on a branch it is then merged
into the main line of development.
Agree on a Workflow
By default, SCMs offer very free-form methods of contribution, so it is important that teams establish
shared patterns of collaboration. SCM workflows establish patterns and processes for merging branches.
If a team doesn't agree on a shared workflow, it can lead to inefficient communication overhead when it
comes time to merge branches.
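These practices can be sketched as a short Git session (assuming Git as the SCM; the repository, file, and branch names here are illustrative, and the script builds a throwaway repository so it is self-contained):

```shell
# Set up a throwaway demo repository so the sketch can actually run.
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q
git config user.email you@example.com && git config user.name "You"
echo 'print("v1")' > report.py
git add report.py && git commit -qm "Initial import"

# Work on an isolated feature branch rather than the main line.
git checkout -qb feature/report-export
echo 'print("csv export")' >> report.py

# Review changes in the staging area before committing, then commit
# with a message explaining the "why" and "what".
git add report.py
git diff --staged --stat
git commit -qm "Add CSV export to report module"

# When the feature is complete, merge it back per the agreed workflow.
git checkout -q -
git merge -q feature/report-export
git log --oneline
```

Here git checkout - returns to the previously checked-out branch; in a real team setting, the merge would normally go through the agreed workflow (for example, a reviewed merge request) rather than a direct local merge.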
Revision Control System (RCS)
A revision control system (RCS) is an application capable of storing, logging, identifying, and merging
information related to the revision of software, application documentation, papers or forms.
Most revision control systems store this information with the help of a differential utility for documents.
A revision control system is an essential tool for an organization with multi-developer tasks or projects,
as it is capable of identifying issues and bugs and of retrieving an earlier working version of an
application or document whenever required.
Most revision control systems run as independent standalone applications. There are two types of
revision control systems: centralized and decentralized. Some applications like spreadsheets and word
processors have built-in revision control mechanisms. Designers and developers at times use revision
control for maintaining the documentation along with the configuration files for their developments.
High-quality documentation and products are possible with the proper use of revision control systems.
A revision control system has the following features:

 For all documents and document types, up-to-date history can be made available.
 It is a simple system and does not require other repository systems.
 For every document maintained, check-ins and check-outs can be done.
 It has the ability to retrieve and revert to an old version of the document. This is extremely
helpful in case of accidental deletions.
 In a streamlined manner, side features and bugs can be identified and fixed using the system.
Troubleshooting is also made easier.
 Its tag system helps in differentiating between alpha, beta or release versions for different
documents or applications.
 Collaboration becomes easier in a multi-person application development project.

CVS - Concurrent Versions System

CVS is a version control system, an important component of Source Configuration Management
(SCM). Using it, you can record the history of source files and documents. It fills a similar role to the
free software RCS, PRCS, and Aegis packages.


CVS is a production quality system in wide use around the world, including many free software projects.
While CVS stores individual file history in the same format as RCS, it offers the following significant
advantages over RCS:

 It can run scripts which you can supply to log CVS operations or enforce site-specific policies.
 Client/server CVS enables developers scattered by geography or slow modems to function as
a single team. The version history is stored on a single central server and the client machines
have a copy of all the files that the developers are working on. Therefore, the network between
the client and the server must be up to perform CVS operations (such as checkins or updates)
but need not be up to edit or manipulate the current versions of the files. Clients can perform
all the same operations which are available locally.
 In cases where several developers or teams want to each maintain their own version of the files,
because of geography and/or policy, CVS's vendor branches can import a version from another
team (even if they don't use CVS), and then CVS can merge the changes from the vendor branch
with the latest files if that is what is desired.
 Unreserved checkouts, allowing more than one developer to work on the same files at the
same time.
 CVS provides a flexible modules database that provides a symbolic mapping of names to
components of a larger software distribution. It applies names to collections of directories and
files. A single command can manipulate the entire collection.
 CVS servers run on most unix variants, and clients for Windows NT/95, OS/2 and VMS are
also available. CVS will also operate in what is sometimes called server mode against local
repositories on Windows 95/NT.

CVS, and the older RCS, offer version control (or revision control), the practice of maintaining
information about a project's evolution so that prior versions may be retrieved, changes tracked, and,
most importantly, the efforts of a team of developers coordinated.

Basic Concepts

RCS (Revision Control System) works within a single directory. To accommodate large projects using
a hierarchy of several directories, CVS creates two new concepts called the repository and the sandbox.
The repository (also called an archive) is the centralized storage area, managed by the version control
system and the repository administrator, which stores the projects' files. The repository contains
information required to reconstruct historical versions of the files in a project. An administrator sets up
and controls the repository using the procedures and commands.

A sandbox (also called a working directory) contains copies of versions of files from the repository.
New development occurs in sandboxes, and any number of sandboxes may be created from a single
repository. The sandboxes are independent of one another and may contain files from different stages
of the development of the same project. Users set up and control sandboxes using the procedures and
commands found in "CVS User Reference”.

In a typical interaction with the version control system, a developer checks out the most current code
from the repository, makes changes, tests the results, and then commits those changes back to the
repository when they are deemed satisfactory.

Locking and Merging

Some systems, including RCS, use a locking model to coordinate the efforts of multiple developers by
serializing file modifications. Before making changes to a file, a developer must not only obtain a copy
of it, but must also request and obtain a lock on it from the system. This lock serves to prevent (really,
dissuade) multiple developers from working on the same file at the same time. When the changes are
committed, the developer unlocks the file, permitting other developers to gain access to it. The locking


model is pessimistic: it assumes that conflicts must be avoided. Serialization of file modifications
through locks prevents conflicts. But it is cumbersome to have to lock files for editing when bug-
hunting. Often, developers will circumvent the lock mechanism to keep working, which is an invitation
to trouble. Unlike RCS and SCCS, CVS uses a merging model which allows everyone to have access
to the files at all times and supports concurrent development. The merging model is optimistic: it
assumes that conflicts are not common and that when they do occur, it usually isn't difficult to resolve
them. CVS is capable of operating under a locking model via the -L and -l options to the admin
command. Also, CVS has special commands (edit and watch) for those who want additional
development coordination support. CVS uses locks internally to prevent corruption when multiple
people are accessing the repository simultaneously, but this is different from the user-visible locks of
the locking model discussed here.

Conflicts and Merging

In the event that two developers commit changes to the same version of a file, CVS automatically defers
the commit of the second committer's file. The second developer then issues the cvs update command,
which merges the first developer's changes into the local file. In many cases, the changes will be in
different areas of the file, and the merge is successful. However, if both developers have made changes
to the same area of the file, the second to commit will have to resolve the conflict. This involves
examination of the problematic area(s) of the file and selection among the multiple versions or making
changes that resolve the conflict. CVS only detects textual conflicts, but conflict resolution is concerned
with keeping the project as a whole logically consistent. Therefore, conflict resolution sometimes
involves changing files other than the one about which CVS complained. For example, if one developer
adds a parameter to a function definition, it may be necessary for all the calls to that function to be
modified to pass the additional parameter. This is a logical conflict, so its detection and resolution is
the job of the developers (with support from tools like compilers and debuggers); CVS won't notice the
problem. In any merge situation, whether or not there was a conflict, the second developer to commit
will often want to retest the resulting version of the project because it has changed since the original
commit. Once it passes, the developer will need to recommit the file.
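When cvs update cannot merge automatically, it marks the overlapping region of the working file with conflict markers, roughly like this (the file contents and revision number here are illustrative):

```text
<<<<<<< driver.c
    exit(nerr == 0 ? 0 : 1);
=======
    exit(!!nerr);
>>>>>>> 1.6
```

The text between <<<<<<< and ======= is the local change; the text between ======= and >>>>>>> is the newly merged revision from the repository. The developer edits the file to keep the correct version (or a combination of both), removes the markers, and then commits.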

Tagging

CVS tracks file versions by revision number, which can be used to retrieve a particular revision from
the repository. In addition, it is possible to create symbolic tags so that a group of files (or an entire
project) can be referred to by a single identifier even when the revision numbers of the files are not the
same (which is most often the case). This capability is often used to keep track of released versions or
other important project milestones.

For example, the symbolic tag hello-1_0 might refer to revision number 1.3 of hello.c and revision
number 1.1 of Makefile (symbolic tags are created with the tag and rtag commands).

Branching

The simplest form of development is linear, in which there is a succession of revisions to a file, and
each derived from the prior revision. Many projects can get by with a completely linear development
process, but larger projects (as measured by number of files, number of developers, and/or the size of
the user community) often run into maintenance issues that require additional capabilities. Sometimes,
it is desirable to do some speculative development while the main line of development continues
uninterrupted. Other times, bugs in the currently released version must be fixed while work on the next
version is underway. In both of these cases, the solution is to create a branch (fork) from an appropriate
point in the development of the project. If at a future point some or all of the changes on the branch are
needed back on the main line of development (or elsewhere), they can be merged in (joined). Branches
are forked with the tag -b command; they are joined with the update -j command.


AWK
The awk command is fundamentally a scripting language and a powerful text manipulation tool in
Linux. It is named after its creators Alfred Aho, Peter Weinberger, and Brian Kernighan. Awk is
popular because of its ability to process text (strings) as easily as numbers.
It scans a sequence of input lines, or records, one by one, searching for lines that match the pattern.
When a match is found, an action can be performed. It is a pattern-action language.

Input to awk can come from files, redirection and pipes or directly from standard input.

Terminology

Let’s get on to some basic terms before we dive into the tutorial. This will make it easier for you to
understand the concept better.

1. Records: awk perceives each line as a record.

 RS is used to mention record separators. By default, RS is set to newline.


 NR is the variable that tracks the record number. Its value is the number of the record currently being processed. NR
can be assumed to be the line number in the default scenario.
2. Fields: Each record is split into fields. That means each line is broken into fields.

 FS is the field separator. By default FS is set to whitespace. That means each word is a field.
 NF is the Number of Fields in a particular record.
Fields are numbered as:
 $0 for the whole line.
 $1 for the first field.
 $2 for the second field.
 $n for the nth field.
 $NF for the last field.
 $(NF-1) for the second-to-last field (note the parentheses: plain $NF-1 would subtract 1 from the value of the last field).
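A quick way to check the field variables on a sample line (the input text here is arbitrary):

```shell
# NF is 4 for this line, so $(NF-1) is the third field.
echo "alpha beta gamma delta" | awk '{ print NF, $1, $(NF-1), $NF }'
# → 4 alpha gamma delta
```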
Standard format of awk

The standard format of the awk command is:

$ awk 'BEGIN{/instructions/} /pattern/ {ACTIONS} END{/instructions/}' file_name


 The pattern-action pair is to be enclosed within single quotes (').
 BEGIN and END are optional and are used for mentioning actions to be performed before and after
processing the input.
 The pattern represents the condition that if fulfilled leads to execution of the action
 The action specifies the precise set of commands to be performed when there is a successful match.
 file_name is to be specified if the input is coming from a file.
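A minimal sketch showing all three sections together (the input lines here are made up):

```shell
printf 'one 1\ntwo 2\nthree 3\n' | awk '
BEGIN { print "processing starts" }  # runs once, before any input
$2 > 1 { print $1 }                  # pattern-action pair, per record
END    { print "processing done" }   # runs once, after all input
'
# → processing starts
#   two
#   three
#   processing done
```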
Basic usage of the awk command

awk can be used to print a message to the terminal based on some pattern in the text. If you run the awk
command without any pattern and just a single print action, awk prints the message every time you
hit enter. This happens because awk expects input from standard input.

$ awk '{print "This is how awk command is used for printing"}'

Processing input from the command line using awk

We saw in the previous example that if no input-source is mentioned then awk simply takes input from
the command line.

Input under awk is seen as a collection of records and each record is further a collection of fields. We
can use this to process input in real-time.

$ awk '$3=="linux" {print "That is amazing!", $1}'

This code looks for the pattern where the third word in the line is 'linux'. When a match is found, it
prints the message, referencing the first field of the same line. Before moving forward, let's create a
text file for use as input.

This can be done using the cat command in Linux.


The text of the file is:

First 200
Second 300
Third 150
Fourth 300
Fifth 250
Sixth 500
Seventh 100
Eight 50
Ninth 70
Tenth 270
These could be the dues in rupees for different customers named First, Second, and so on.
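The same file can be created non-interactively with a here-document (the name rec.txt matches the later examples):

```shell
cat > rec.txt <<'EOF'
First 200
Second 300
Third 150
Fourth 300
Fifth 250
Sixth 500
Seventh 100
Eight 50
Ninth 70
Tenth 270
EOF
```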

Printing from a file using fields

Input from a file can be printed using awk. We can refer to different fields to print the output in a fancy
manner.
$ awk '{print $1, "owes", $2}' rec.txt

$1 and $2 refer to fields one and two respectively, which in our input file are the first and second
words of each line. We haven't mentioned any pattern in this command, so awk runs the action on
every record: when no pattern is given, awk matches every line.
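Piping a two-line sample in shows the effect without needing the file:

```shell
printf 'First 200\nSecond 300\n' | awk '{ print $1, "owes", $2 }'
# → First owes 200
#   Second owes 300
```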

Playing with awk separators

There are three types of separators in awk.

 OFS: output field separator


 FS: field separator
 RS: record separator
1. Output field separator (OFS)

You can notice that by default the print command separates the output fields with a space. This can be
changed by setting OFS.
$ awk 'OFS=" owes " {print $1,$2}' rec.txt


The same output is achieved as in the previous case: the default output field separator has been changed
from whitespace to ” owes “. This, however, is not the best way to change the OFS. All the separators
should be changed in the BEGIN section of the awk command.
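Setting OFS in the BEGIN section, as recommended, looks like this (shown with an inline two-line sample):

```shell
printf 'First 200\nSecond 300\n' | awk 'BEGIN { OFS=" owes " } { print $1, $2 }'
# → First owes 200
#   Second owes 300
```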

2. Field Separator (FS)

The field separator can be changed by changing the value of FS. By default, FS is set to whitespace.
We created another file with the following data, where the name and the amount are separated by '-'.

First-200
Second-300
Third-150
Fourth-300
Fifth-250
Sixth-500
Seventh-100
Eight-50
Ninth-70
Tenth-270
$ awk 'FS="-" {print $1}' rec-sep.txt

You can notice that the first line of the output is wrong: for the first record, awk was not able to
separate the fields. This is because the statement that changes the field separator is evaluated once per
record, after that record has already been read and split. So the first record, First-200, is read and split
with the field separator still set to whitespace.

Correct way:
$ awk 'BEGIN {FS="-"} {print $1}' rec_1.txt


Now we get the correct output. The first record has been separated successfully. Any statement placed
in the BEGIN section runs before processing the input. BEGIN section is most often used to print a
message before the processing of input.
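The difference between the two placements is easy to reproduce with an inline sample:

```shell
# FS assigned in the pattern: the change comes too late for line 1,
# which was already split using the default whitespace separator.
printf 'First-200\nSecond-300\n' | awk 'FS="-" { print $1 }'
# → First-200
#   Second

# FS set in BEGIN: every record, including the first, splits on "-".
printf 'First-200\nSecond-300\n' | awk 'BEGIN { FS="-" } { print $1 }'
# → First
#   Second
```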

3. Record separator (RS)

The third type of separator is the record separator. By default, the record separator is set to newline;
it can be changed by changing the value of RS. Changing RS is useful when the input is a CSV
(comma-separated values) file.

For example if the input is:

First-200,Second-300,Third-150,Fourth-300,Fifth-250,Sixth-500,Seventh-100,Eight-50,Ninth-70,Tenth-270

This is the same input as above but in a comma-separated format.

We can process such a file by changing the RS field.

$ awk 'BEGIN {FS="-"; RS=","; OFS=" owes Rs. "} {print $1,$2}' rec_2.txt
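The same pipeline can be checked with a short inline sample:

```shell
printf 'First-200,Second-300' | awk 'BEGIN { FS="-"; RS=","; OFS=" owes Rs. " } { print $1, $2 }'
# → First owes Rs. 200
#   Second owes Rs. 300
```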

Boolean operations in awk

Boolean expressions can be used as patterns, and different field values can be used to carry out
comparisons, so awk works like an if-then construct. In our data, we can find customers with dues of more than Rs. 200.

$ awk '$2>200 {print $1, "owes Rs.",$2}' rec.txt


This gives us the list by comparing the second field of each record with 200 and printing when the
condition is true.
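Inline, with a three-customer sample, only the record whose second field exceeds 200 is printed (First's 200 is not greater than 200, so it is excluded):

```shell
printf 'First 200\nSecond 300\nThird 150\n' | awk '$2 > 200 { print $1, "owes Rs.", $2 }'
# → Second owes Rs. 300
```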

Matching string literals using the awk command

Since awk works with fields, we can use this to our benefit. Running the ls -l command gives a list of
all the files in the current directory with additional information.

The awk command can be used along with ls -l to find out which files were last modified in the month
of May. $6 is the field showing the month of last modification. We can match this field against the
string 'May'.

$ ls -l | awk '$6=="May" {print $9}'

User-defined variables in awk

To perform additional operations, variables can be defined in awk. For example, to calculate the total
dues of the people owing more than Rs. 200, we can define a sum variable.

$ awk 'BEGIN {sum=0} $2>200 {sum=sum+$2; print $1} END{print sum}' rec.txt


The sum variable is initialized in the BEGIN section, updated in the action section, and printed in the
END section. The action section would be used only if the condition mentioned in the pattern section
is true. Since the pattern is checked for each line, the structure works as a loop with an update being
performed each time the condition is met.
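The same structure with an inline sample, so the running total is easy to verify by hand:

```shell
printf 'First 200\nSecond 300\nThird 150\nFourth 300\n' | awk '
BEGIN { sum = 0 }                  # initialise before any input
$2 > 200 { sum += $2; print $1 }   # runs only when the pattern matches
END { print "total:", sum }        # report after all input
'
# → Second
#   Fourth
#   total: 600
```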

Counting with the awk command

The awk command can also be used to count the number of lines, the number of words, and even the
number of characters. Let’s start with counting the number of lines with the awk command.

Count the number of lines

The number of lines can be printed by printing out the NR variable in the END section. NR is used to
store the current record number. Since the END section is accessed after all the records are processed,
NR in the END section would contain the total number of records.
$ awk 'END { print NR }' rec.txt

Count number of words

To get the number of words, NF can be used. NF is the number of fields in each record, so totalling
NF over all the records gives the number of words. In the command below, c counts the words: for
each line, the number of fields in that line is added to c, and printing c in the END section gives the
total number of words.
$ awk 'BEGIN {c=0} {c=c+NF} END{print c}' rec.txt

Count number of characters

The number of characters in each line can be obtained using awk's built-in length function. $0 holds
the entire record, so length($0) gives the number of characters in that record.
$ awk '{ print "number of characters in line", NR, "=", length($0) }' rec.txt
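The three counts can also be combined into a single wc-like one-liner; note that, like length($0) above, the character count here does not include the newline at the end of each line:

```shell
printf 'one two\nthree four five\n' | awk '{ words += NF; chars += length($0) } END { print NR, words, chars }'
# → 2 5 22
```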

