Know What Your Linker Knows

I just had a very frustrating experience getting my new project to link to Bullet Physics.

Whenever something frustrates me this much, I usually learn something. The solution is usually obvious once I find it, but it takes a lot of trial and error to get there.

The more of these tedious lessons I learn, the more wisdom I gain to avoid them in the future (hopefully).

I hope that you, reading this, can learn from my mistake and avoid the frustration.

The Problem

My task was to solve the following linker error:

Link spargus_vehicle_prototype 
src/Main.o: In function `btDantzigSolver::solveMLCP(btMatrixX<float> const&, btVectorX<float> const&, btVectorX<float>&, btVectorX<float> const&, btVectorX<float> const&, btAlignedObjectArray<int> const&, int, bool)':
Main.cpp:(.text._ZN15btDantzigSolver9solveMLCPERK9btMatrixXIfERK9btVectorXIfERS5_S7_S7_RK20btAlignedObjectArrayIiEib[_ZN15btDantzigSolver9solveMLCPERK9btMatrixXIfERK9btVectorXIfERS5_S7_S7_RK20btAlignedObjectArrayIiEib]+0x5ff): 
undefined reference to `btSolveDantzigLCP(int, float*, float*, float*, float*, int, float*, float*, int*, btDantzigScratchMemory&)'

It seemed simple enough at first: just find the Bullet library I was missing.

I tried several techniques for tracking down every shared object the Bullet build produced, to make sure I was linking everything that was compiled, but I still got the error. I knew I was successfully linking most of Bullet because other structures I was using worked fine.

At this point, it's very important to know that you are actually running the link command you think you are running. I confirmed this via the very helpful verbose output Jam prints when there are errors. This wasn't the problem.

I eventually resorted to copying the exact link call (via make VERBOSE=1) which was being used to link the working Bullet example project. I knew that example used the code my project was tripping up on, so I thought it would solve the problem. Still no dice!

The Solution

Almost everything in the standard compilation pipeline generates files. On Linux, those are .o (object), .so (shared library), and .a (static library) files.

These files seem opaque, but there are tools which can tell you what their contents are. This makes sense because if they were completely opaque, there wouldn't be any way for the linker to associate one piece of text (a function call) with another (a definition). Those .o files are all the linker knows about your program.

objdump allows you to see what functions are defined in a .o file (it can do much more than that as well).

I found the file where the missing function was defined, then found that file's .o. I ran objdump to see which symbols were defined in that .o:

objdump -t -C btDantzigLCP.o

-t displays the symbol table; -C decodes C++ mangled function names.

There's my missing function:

0000000000004b10 g     F .text  0000000000001431 btSolveDantzigLCP(int, double*, double*, double*, double*, int, double*, double*, int*, btDantzigScratchMemory&)

Let's compare that to the linker error:

undefined reference to `btSolveDantzigLCP(int, float*, float*, float*, float*, int, float*, float*, int*, btDantzigScratchMemory&)'

The linker cannot find the function my code is asking for because its parameters do not match!

Let's look at the definition of that function:

bool btSolveDantzigLCP(int n, btScalar *A, btScalar *x, btScalar *b,
                       btScalar *outer_w, int nub, btScalar *lo, btScalar *hi, 
                       int *findex, btDantzigScratchMemory &scratchMem)

That btScalar type must be typedef'd to something different in the library than in my program. Let's look at its definition:

//The btScalar type abstracts floating point numbers, to easily switch between double and single floating point precision.
#if defined(BT_USE_DOUBLE_PRECISION)
    typedef double btScalar;
    //this number could be bigger in double precision
    #define BT_LARGE_FLOAT 1e30
#else
    typedef float btScalar;
    //keep BT_LARGE_FLOAT*BT_LARGE_FLOAT < FLT_MAX
    #define BT_LARGE_FLOAT 1e18f
#endif

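I took a look at the compilation defines for the working example project, and sure enough, -DBT_USE_DOUBLE_PRECISION was defined explicitly. The Bullet libraries had been built with double precision, while my code was compiled without the define and so asked for the float version. I added the define to my compile commands, cleaned (there's a whole other lesson about cleaning…), and it linked!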

I should've known to try it sooner, but it usually takes some beating into my head to learn a lesson.

Another example

I used a similar solution previously. That time, I was at work debugging a problem where the game's behavior seemed impossible. The game was complaining about a missing string, but based on the SVN version, that should not have been possible.

I ended up opening the executable itself in Emacs and searching for the missing string in the string table. Sure enough, the executable did not have the string, so it couldn't have my modification. This was perplexing because the game's SVN version said it should have that commit.

I discovered that the game's builder (an automated machine I do not have access to) had local modifications to the file. No wonder the string was missing! This was very surprising to my boss and me, especially because it must've had those modifications for a week or two.

My boss reverted the modifications and all was right in the world again. We still don't know who made those modifications and then left them there.

Lesson learned

Don't use guess-and-check style methods to stumble your way from one linker error to another (or in my case, bash my head against the same one).

Know what data your linker is actually working with!

An Aside: Bullet Impressions

Other than this -DBT_USE_DOUBLE_PRECISION issue, getting Bullet set up was quite easy. I definitely appreciate libraries which compile and build with a single command. I didn't even have to run any dependency installation scripts.

Additionally, the Example Browser worked out of the box and gave me some motivation to continue with the integration.

Written by Macoy Madson.