Byte Alignment and Ordering
Realtime systems consist of multiple processors communicating with each other via messages. For message communication to work correctly, the message formats must be defined unambiguously. In many systems this is achieved simply by defining C/C++ structures to implement the message format. Using C/C++ structures is a simple approach, but it has its own pitfalls. Different processors and compilers might lay out the same structure differently, making the interface definition incompatible. There are two reasons for these incompatibilities: byte alignment restrictions, which lead the compiler to insert pad bytes, and byte ordering.
The compiler padding is illustrated in the following example. Here a char is assumed to be one byte, a short is two bytes and a long is four bytes.
    char pad1;                    // Pad to start the long word at a 4 byte boundary
    long message_length;
    char version;
    char pad2;                    // Pad to start a short at a 2 byte boundary
    short destination_processor;
    char pad3[4];                 // Pad to align the complete structure to a 16 byte boundary
};
In the above example, the compiler has added pad bytes to enforce the byte alignment rules of the target processor. If the above message structure were used with a different compiler/microprocessor combination, the pads inserted by that compiler might be different. Two applications using the same structure definition header file might therefore be incompatible with each other. It is thus good practice to insert pad bytes explicitly in all C structures that are shared in an interface between machines that differ in compiler, microprocessor, or both.
Single byte numbers can be aligned at any address.
Two byte numbers should be aligned to a two byte boundary.
Four byte numbers should be aligned to a four byte boundary.
Structures between 1 and 4 bytes of data should be padded so that the total structure is 4 bytes.
Structures between 5 and 8 bytes of data should be padded so that the total structure is 8 bytes.
Structures between 9 and 16 bytes of data should be padded so that the total structure is 16 bytes.
Structures greater than 16 bytes should be padded to a 16 byte boundary.
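One way to make sure explicit pad bytes really pin down the layout is to have the compiler check it. The sketch below is only an illustration: the structure name WireHeader and its field names are hypothetical, and fixed-width types from <stdint.h> are assumed in place of short and long so that the sizes do not depend on the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message header with every pad byte spelled out. */
struct WireHeader {
    uint16_t opcode;        /* offset 0 */
    uint8_t  flags;         /* offset 2 */
    uint8_t  pad1;          /* offset 3: aligns length to a 4 byte boundary */
    uint32_t length;        /* offset 4 */
    uint8_t  version;       /* offset 8 */
    uint8_t  pad2;          /* offset 9: aligns destination to a 2 byte boundary */
    uint16_t destination;   /* offset 10 */
    uint8_t  pad3[4];       /* offset 12: pads the total size to 16 bytes */
};

/* Compilation fails if the compiler lays the structure out differently. */
static_assert(offsetof(struct WireHeader, length) == 4, "length offset");
static_assert(offsetof(struct WireHeader, destination) == 10, "destination offset");
static_assert(sizeof(struct WireHeader) == 16, "header size");
```

With checks like these in the shared header file, a compiler that would pad the structure differently rejects the build instead of silently producing an incompatible message.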
Byte Ordering
Microprocessors support big-endian and little-endian byte ordering. Big-endian is an order in which the "big end" (most significant byte) is stored first (at the lowest address). Little-endian is an order in which the "little end" (least significant byte) is stored first. The table below shows the representation of the hexadecimal number 0x0AC0FFEE on a big-endian and a little-endian machine. The contents of memory locations 0x1000 to 0x1003 are shown.
Address        0x1000  0x1001  0x1002  0x1003
Big-endian     0x0A    0xC0    0xFF    0xEE
Little-endian  0xEE    0xFF    0xC0    0x0A
The situation is actually quite similar to that of Lilliputians in Gulliver's Travels. Lilliputians were divided into two groups based on the end from which the egg should be broken. The big-endians preferred to break their eggs from the larger end. The little-endians broke their eggs from the smaller end.
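The memory layout of 0x0AC0FFEE can be inspected on whatever machine the code runs on. The following is a minimal sketch; the function names are made up for illustration, and a 32-bit value is taken from <stdint.h> rather than relying on the size of long.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the in-memory bytes of value into out[0..3], lowest address first. */
static void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 if this host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    unsigned char b[4];
    bytes_in_memory(0x0AC0FFEEu, b);
    return b[0] == 0xEE;
}
```

On a little-endian host the four bytes come out as EE FF C0 0A, on a big-endian host as 0A C0 FF EE, matching the two rows of the table.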
Conversion Routines
Routines to convert between big-endian and little-endian formats are quite straightforward. The routines shown below convert in both directions, i.e. big-endian to little-endian and back.
Big-endian to Little-endian conversion and back
/* Both routines assume a 2 byte short and a 4 byte long. */
short convert_short(short in)
{
    short out;
    char *p_in = (char *) &in;
    char *p_out = (char *) &out;
    p_out[0] = p_in[1];
    p_out[1] = p_in[0];
    return out;
}

long convert_long(long in)
{
    long out;
    char *p_in = (char *) &in;
    char *p_out = (char *) &out;
    p_out[0] = p_in[3];
    p_out[1] = p_in[2];
    p_out[2] = p_in[1];
    p_out[3] = p_in[0];
    return out;
}
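The routines above depend on short being two bytes and long being four, which does not hold on every platform (long is eight bytes on many 64-bit systems). A possible variant, sketched here with the hypothetical names swap16 and swap32, uses the fixed-width types from <stdint.h> and shifts instead of pointer casts:

```c
#include <assert.h>
#include <stdint.h>

/* Reverses the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Reverses the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

As with the original routines, applying either function twice returns the starting value, so the same function converts in both directions.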
Memory Alignment
"Memory alignment" is a term which refers to where a multi-byte variable is stored in a computer with a 16-bit or larger data path to memory. In the following discussion, we assume that we have a computer with a 16-bit (2 byte) wide path to memory (such as an 8086, 80286, or 80386sx). In such a computer, the first few memory locations can be pictured as a table with two bytes per row, where each box represents a byte of memory and the numbers are the memory addresses: bytes 0 and 1 share the first row, bytes 2 and 3 the second, bytes 4 and 5 the third, and so on.
Each row in this figure represents the 16 bits that can be transferred from the memory to the CPU when data is requested, or to the memory when data is saved. The way that the hardware is set up, any request to memory can involve only one row in the table. It's easy to see that if the CPU wants to get the byte stored at address 3, it's no problem -- it simply brings in the 16 bits that contain bytes 2 and 3, and extracts the correct byte. The same is true for any single byte, stored anywhere in memory. However, when a two-byte word is requested, the story changes a bit. The address of a word is the address of the first byte in the word, so the word at address 4 is really made up of two bytes: byte 4 and byte 5 (the darkened box in the picture). In this example, it's again no problem to read this word from memory -- the CPU simply requests the 16 bits starting at address 4.
The problem comes when the program requests the word stored at address 5 (or any other odd address). In this case, the desired word is made up of bytes 5 and 6, and since they aren't in the same row in our picture, we can't simply request that word from memory. The CPU will translate such a request from a program into two memory requests: one for the word made up of bytes 4 and 5, and one for the word made up of bytes 6 and 7. The CPU can then extract bytes 5 and 6 from these two words, and put these two bytes together in order to create the word of data that was requested.
Notice that this entire discussion could be repeated with larger data sizes -- if we were using a computer with a 32-bit wide data path to memory, then we could look at the problem of accessing a 32-bit variable that spanned across two 32-bit pieces of memory.
With that in mind, we next define what is meant by an "aligned access" and an "unaligned access." The first example that we considered, where the word at address 4 was requested, is an example of an aligned access, whereas the second access, at address 5, is an example of an unaligned access. More generally, if you are accessing an n-byte piece of data, the beginning address must be a multiple of n in order for the access to be aligned. All other accesses are unaligned accesses. Since we were talking about 2-byte pieces of data (words), aligned accesses are those that are at even addresses. On a 32-bit computer, all 32-bit data must be stored at an address that is a multiple of 4 in order to be aligned. Notice that as a special case of this definition, accessing one byte of data is always an aligned access.
Alignment is important because, as explained above, an unaligned read access can take twice as long as an aligned access. In fact, on many of the newer processors, such as the Sun SPARC and the DEC Alpha, there is no hardware to handle unaligned accesses!
This means that such accesses must be handled by software, which can be over 10 times slower than an aligned access. Furthermore, when you declare a word variable in assembly language, the assembler puts that word at the next available address, making absolutely no distinction between aligned and unaligned data.
Fortunately there is an easy way to fix this. To make sure that data is aligned, there is a special assembler directive, ALIGN. To use it, simply place "ALIGN n" on a line immediately before the data you want aligned, where n is the size of the data you want aligned. The ALIGN directive will add enough "padding" so that the following data begins at a multiple of n. The following examples show data segment declarations for variables of different sizes. In the first example, the multi-byte data may or may not be aligned, so you may suffer a performance penalty. In the second example, the ALIGN directive is used to guarantee that the multi-byte variables are aligned.
Possibly Unaligned
        .DATA
BVAR    DB      4
NROWS   DW      25
NCOLS   DW      80
CRET    DB      13
LADDR   DD      ?
Alignment Guaranteed
        .DATA
BVAR    DB      4
        ALIGN   2
NROWS   DW      25
NCOLS   DW      80
CRET    DB      13
        ALIGN   4
LADDR   DD      ?
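For comparison, C11 offers a similar request through alignas from <stdalign.h>. The sketch below is not a translation of the assembler listing, only an illustration of the same idea; the variable names are borrowed from the listing, while buffer and misalignment are hypothetical additions.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* alignas(n) asks the compiler for n-byte alignment, much like ALIGN n. */
static alignas(2)  uint16_t nrows = 25;
static alignas(4)  uint32_t laddr;
static alignas(16) unsigned char buffer[32];   /* stricter than the default */

/* Returns the remainder of an object's address modulo n (0 means aligned). */
static uintptr_t misalignment(const void *p, uintptr_t n)
{
    return (uintptr_t)p % n;
}
```

A C compiler normally aligns nrows and laddr this way without being asked, so the explicit request only changes anything for cases like buffer, where the requested alignment is stricter than the type's natural one.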
What we've seen in this section is that as far as efficiency goes, not all words are equal. While high-level languages usually take care of alignment issues for the programmer, assembly language provides no such "help" unless it is explicitly asked for by the programmer. Therefore, alignment needs to be kept in mind when writing assembly language programs.
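The row picture used in this section can be turned into a small calculation: how many rows of memory does a given access touch? This is a sketch under the simplifying assumption that every access costs one memory request per row; the function name rows_touched is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of rows touched by an access of `size` bytes starting at `addr`,
   for a path to memory that is `bus_bytes` wide. */
static unsigned rows_touched(uint32_t addr, uint32_t size, uint32_t bus_bytes)
{
    assert(size >= 1 && bus_bytes >= 1);
    uint32_t first_row = addr / bus_bytes;
    uint32_t last_row  = (addr + size - 1) / bus_bytes;
    return (unsigned)(last_row - first_row + 1);
}
```

With a 2-byte path, rows_touched(4, 2, 2) is 1 for the aligned word at address 4, while rows_touched(5, 2, 2) is 2 for the unaligned word at address 5, which is the doubling of memory requests described above.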
Byte Ordering
Another issue that comes up when accessing a multi-byte piece of data is called "byte ordering" (or sometimes called the "byte sex" of a machine). In particular, consider a 16-bit word of data, 1234h, that is stored in memory. This word consists of two bytes: 12h, called the "Most Significant Byte" (or MSB), and 34h, called the "Least Significant Byte" (or LSB). If this word were to be stored at address 4, then as we have just seen the bytes will be stored in memory locations 4 and 5. The question is, which byte goes where? There are two options for a 16-bit word:
Little Endian: On a little endian machine, the LSB (the "little end") is stored in the first byte. So in the example above, 34h would be stored in byte 4 and 12h would be stored in byte 5. The Intel x86 family of processors that are used in PC compatible computers are all little endian, as is the VAX line of computers built by Digital Equipment Corporation.
Big Endian: On a big endian machine, the MSB is stored first. In the example above, this translates into 12h being stored in location 4 and 34h being stored in location 5. The Motorola 68k processors used in Apple Macintosh computers are big endian, as are IBM mainframes and Sun SPARC processors.
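The two options can be sketched as a pair of store routines that write the word 1234h byte by byte. The function names are hypothetical, and the two-element array stands in for memory locations 4 and 5.

```c
#include <assert.h>
#include <stdint.h>

/* Little endian: least significant byte goes to the lower address. */
static void store_le16(uint16_t v, unsigned char out[2])
{
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)(v >> 8);
}

/* Big endian: most significant byte goes to the lower address. */
static void store_be16(uint16_t v, unsigned char out[2])
{
    out[0] = (unsigned char)(v >> 8);
    out[1] = (unsigned char)(v & 0xFF);
}

/* Reads two bytes back, treating the first one as the most significant. */
static uint16_t load_be16(const unsigned char in[2])
{
    return (uint16_t)((in[0] << 8) | in[1]);
}
```

Storing 1234h with store_le16 leaves 34h in the first byte and 12h in the second; reading those bytes back with load_be16 yields 3412h, which is what happens when a big endian reader is handed little endian data.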
Unlike alignment, which can be an issue even if you are writing a program to run on only one machine, byte ordering questions don't commonly arise unless you are writing programs that share data between different types of computers. For example, consider a program that saves a large array of integers to disk. If this program were run on a PC compatible computer, then the integer 1234h would be stored on disk as the byte 34h followed by the byte 12h, since the computer is little endian. If someone with a Macintosh tried to read in this data file, they would read the bytes in the order saved by the PC compatible, but the Macintosh program would now interpret 34h as the MSB, since the Macintosh is big endian. So the integer value that should have been loaded as 1234h=4660d is now seen as 3412h=13330d, which is quite a difference!
So how are problems with byte ordering fixed? There are basically three common ways of dealing with byte ordering problems:
1. Define a "universal byte ordering" for all transmission and storage. The idea here is that it doesn't matter what order the data is stored in the memory while a program is running -- it only matters when the data is transferred to another computer by using either a file or some sort of transmission method. For such transfers we can define a standard byte order, say big endian, and computers that don't use this order will have to convert data into their own byte ordering before using it. Certain data that is transmitted on the Internet uses this strategy, and uses a standard ordering that is called "network byte ordering".
2. Require data files to indicate which byte ordering is used in that file, and require all software to be able to handle both byte orderings. The disadvantage of this method is that software must be more complex in order to deal with both possible byte orderings. The big advantage is that if a certain data file is used most often on one particular kind of computer then it can use the natural byte ordering of that computer. The TIFF image format is an example of this solution to the byte ordering problem.
3. Use special hardware that can handle both byte orderings. Many of the newer processors (such as the PowerPC and the DEC Alpha) can be set into either little endian or big endian mode, so they can deal with data no matter how it is stored! This method has clear advantages, and it seems to make the byte ordering problem disappear -- however, it almost makes the problem worse by creating no clear distinction about which ordering should be "standard," and so software still has to deal with finding out which byte ordering is used by particular data.
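The second approach, where the data announces its own byte ordering, can be sketched as follows. TIFF files start with the two bytes "II" for little endian or "MM" for big endian; everything else here (the enum, the function names, and the idea of decoding a single 32-bit value) is a simplified illustration rather than a TIFF reader.

```c
#include <assert.h>
#include <stdint.h>

/* Byte order announced by a two-byte marker, TIFF style. */
enum byte_order { ORDER_UNKNOWN, ORDER_LITTLE, ORDER_BIG };

static enum byte_order order_from_marker(const unsigned char m[2])
{
    if (m[0] == 'I' && m[1] == 'I') return ORDER_LITTLE;
    if (m[0] == 'M' && m[1] == 'M') return ORDER_BIG;
    return ORDER_UNKNOWN;
}

/* Decodes a 32-bit value from four bytes stored in the given order.
   The result does not depend on the byte order of the host. */
static uint32_t read_u32(const unsigned char b[4], enum byte_order order)
{
    assert(order != ORDER_UNKNOWN);
    if (order == ORDER_LITTLE)
        return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
               ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

Because the value is assembled with shifts rather than by reinterpreting memory, the same reader works unchanged on little endian and big endian hosts, at the cost of handling both cases in software.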
To summarize, the byte ordering problem is another issue that arises from multi-byte data, and is most important when dealing with data that must be shared between various types of machines. What made this problem acute when this was written is that the two most popular machines, which should be able to share data, used different byte orderings: PC compatibles are little endian, and the Macintosh computers of that time, built on Motorola 68k and PowerPC processors, were big endian.
Although data structure alignment is a fundamental issue for all modern computers, many computer languages and computer language implementations handle data alignment automatically. Certain C and C++ implementations and assembly language allow at least partial control of data structure padding, which may be useful in certain special circumstances.
Definitions
A memory address a is said to be n-byte aligned when n is a power of two and a is a multiple of n bytes. In this context a byte is the smallest unit of memory access, i.e. each memory address specifies a different byte. An n-byte aligned address has at least log2(n) least-significant zero bits when expressed in binary.
A memory access is said to be aligned when the datum being accessed is n bytes long and the datum address is n-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that by definition byte memory accesses are always aligned.
A memory pointer that refers to primitive data that is n bytes long is said to be aligned if it is only allowed to contain addresses that are n-byte aligned, otherwise it is said to be unaligned. A memory pointer that refers to a data aggregate (a data structure or array) is aligned if (and only if) each primitive datum in the aggregate is aligned.
Note that the definitions above assume that each primitive datum is a power of two bytes long. When this is not the case (as with 80-bit floating-point on x86) the context influences the conditions where the datum is considered aligned or not.
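The definition of an n-byte aligned address translates directly into a bit test. The following is a minimal sketch with hypothetical function names; it relies on n being a power of two, as the definition requires.

```c
#include <assert.h>
#include <stdint.h>

/* True when n is a power of two (1, 2, 4, 8, ...). */
static int is_power_of_two(uintptr_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* True when addr is n-byte aligned, i.e. its low log2(n) bits are all zero. */
static int is_aligned(uintptr_t addr, uintptr_t n)
{
    assert(is_power_of_two(n));
    return (addr & (n - 1)) == 0;
}
```

For n = 1 the mask n - 1 is zero, so every address passes the test, which matches the remark that byte accesses are always aligned.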
Problems
A computer accesses memory a single memory word at a time. As long as the memory word size is at least as large as the largest primitive data type supported by the computer, aligned accesses will always access a single memory word. This may not be true for misaligned data accesses.
If the highest and lowest bytes in a datum are not within the same memory word the computer must split the datum access into multiple memory accesses. This requires a lot of complex circuitry to generate the memory accesses and coordinate them. To handle the case where the memory words are in different memory pages the processor must either verify that both pages are present before executing the instruction or be able to handle a TLB miss or a page fault on any memory access during the instruction execution.
When a single memory word is accessed the operation is atomic, i.e. the whole memory word is read or written at once and other devices must wait until the read or write operation completes before they can access it. This may not be true for unaligned accesses to multiple memory words, e.g. the first word might be read by one device, both words written by another device and then the second word read by the first device so that the value read is neither the original value nor the updated value. Although such failures are rare, they can be very difficult to identify.
Architectures
RISC
Most RISC processors will generate an alignment fault when a load or store instruction accesses a misaligned address. This allows the operating system to emulate the misaligned access using other instructions. For example, the alignment fault handler might use byte loads or stores (which are always aligned) to emulate a larger load or store instruction.
Some architectures like MIPS have special unaligned load and store instructions. One unaligned load instruction gets the bytes from the memory word with the lowest byte address and another gets the bytes from the memory word with the highest byte address. Similarly, store-high and store-low instructions store the appropriate bytes in the higher and lower memory words respectively.
The DEC Alpha architecture has a two-step approach to unaligned loads and stores. The first step is to load the upper and lower memory words into separate registers. The second step is to extract or modify the memory words using special low/high instructions similar to the MIPS instructions. An unaligned store is completed by storing the modified memory words back to memory. The reason for this complexity is that the original Alpha architecture could only read or write 32-bit or 64-bit values. This proved to be a severe limitation that often led to code bloat and poor performance. Later Alpha processors added byte and double-byte load and store instructions. Because these instructions are larger and slower than the normal memory load and store instructions they should only be used when necessary. Most C and C++ compilers have an unaligned attribute that can be applied to pointers that need the unaligned instructions.
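The byte-load emulation mentioned above can be sketched in portable C. This is an illustration of the idea, not the code of any real fault handler; the function names are hypothetical, and the first routine assumes the value is stored in little-endian order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Loads a 32-bit value from any address using four byte loads, which are
   always aligned. The bytes are combined in little-endian order here. */
static uint32_t load_le32_bytewise(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Loads a 32-bit value in host byte order. memcpy leaves it to the compiler
   to choose an instruction sequence that is safe for an unaligned address. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

In application code the memcpy form is the usual way to read a possibly misaligned value without relying on fault emulation or compiler-specific attributes.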
x86 and x64
While the x86 architecture originally did not require aligned memory access and still works without it, many SSE2 instructions on x86 and x64 CPUs do require their 128-bit memory operands to be 16-byte aligned (for example MOVDQA, as opposed to the unaligned variant MOVDQU), and there can be substantial performance advantages from using aligned data on these architectures.
Compatibility
The advantage of supporting unaligned access is that compilers are easier to write because they do not need to align memory; the cost is slower access. One way to increase performance in RISC processors, which are designed to maximize raw performance, is to require data to be loaded or stored on a word boundary. So although memory is commonly addressed in 8-bit bytes, loading a 32-bit integer or a 64-bit floating point number would be required to start on a 64-bit boundary on a 64-bit machine. The processor could flag a fault if it were asked to load a number that was not on such a boundary, but this would result in a slower call to a routine that would need to figure out which word or words contained the data and extract the equivalent value.
Computing padding
The following formula provides the number of padding bytes required to align the start of a data structure:
padding = (align - (offset mod align)) mod align
For example, the padding to add to offset 0x59d for a structure aligned to every 4 bytes is 3. The structure will then start at 0x5a0, which is a multiple of 4.
Alternatively, when the alignment is a power of two, the following formula provides the new offset (where & is a bitwise AND and ~ a bitwise NOT):
new offset = align + ((offset - 1) & ~(align - 1))
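Both formulas can be written out as small C functions. This is a sketch with hypothetical function names; size_t is assumed for offsets so that the arithmetic is unsigned.

```c
#include <assert.h>
#include <stddef.h>

/* padding = (align - (offset mod align)) mod align */
static size_t padding_for(size_t offset, size_t align)
{
    assert(align != 0);
    return (align - (offset % align)) % align;
}

/* new offset = align + ((offset - 1) & ~(align - 1))
   Valid only when align is a power of two. For offset 0 the unsigned
   subtraction wraps around and the sum wraps back to 0. */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return align + ((offset - 1) & ~(align - 1));
}
```

For the example in the text, padding_for(0x59d, 4) is 3 and align_up(0x59d, 4) is 0x5a0; an offset that is already aligned, such as 0x5a0, is returned unchanged.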
Data structure members are stored sequentially in a memory so that in the structure below the member Data1 will always precede Data2 and Data2 will always precede Data3:
struct MyData
{
    short Data1;
    short Data2;
    short Data3;
};
If the type "short" is stored in two bytes of memory then each member of the data structure depicted above would be 2-byte aligned. Data1 would be at offset 0, Data2 at offset 2 and Data3 at offset 4. The size of this structure would be 6 bytes.
The type of each member of the structure usually has a default alignment, meaning that it will, unless otherwise requested by the programmer, be aligned on a pre-determined boundary. The following typical alignments are valid for compilers from Microsoft, Borland, and GNU when compiling for x86:
A char (one byte) will be 1-byte aligned.
A short (two bytes) will be 2-byte aligned.
An int (four bytes) will be 4-byte aligned.
A float (four bytes) will be 4-byte aligned.
A double (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux.
Here is a structure with members of various types, totaling 8 bytes before compilation:
struct MixedData
{
    char Data1;
    short Data2;
    int Data3;
    char Data4;
};
After compilation the data structure will be supplemented with padding bytes to ensure a proper alignment for each of its members:
struct MixedData  /* after compilation */
{
    char Data1;
    char Padding0[1]; /* For the following 'short' to be aligned on a 2 byte boundary */
    short Data2;
    int Data3;
    char Data4;
    char Padding1[3];
};
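The padding a particular compiler inserts can be measured with offsetof and sizeof instead of being assumed. The sketch below uses the MixedData structure from the text; the two helper functions are hypothetical, and the values 1 and 3 are what one would expect on a target with a 2-byte short and a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>

struct MixedData {
    char  Data1;
    short Data2;
    int   Data3;
    char  Data4;
};

/* Bytes of padding the compiler inserted directly before Data2. */
static size_t padding_before_data2(void)
{
    return offsetof(struct MixedData, Data2)
         - (offsetof(struct MixedData, Data1) + sizeof(char));
}

/* Bytes of padding the compiler appended after the last member. */
static size_t tail_padding(void)
{
    return sizeof(struct MixedData)
         - (offsetof(struct MixedData, Data4) + sizeof(char));
}
```

On other targets the numbers may differ, which is exactly why measuring them is safer than hard-coding them.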
The compiled size of the structure is now 12 bytes. It is important to note that the last member is followed by the number of padding bytes required to make the total size of the structure a multiple of the alignment of its largest member. In this case 3 bytes are added after the last member to pad the structure to a multiple of the size of a long word (4 bytes).
It is possible to change the alignment of structures to reduce the memory they require (or to conform to an existing format) by changing the compiler's alignment (or packing) of structure members. Requesting that the MixedData structure above be aligned to a one byte boundary will have the compiler discard the predetermined alignment of the members, and no padding bytes will be inserted.
While there is no standard way of defining the alignment of structure members, some compilers use #pragma directives to specify packing inside source files. Here is an example:
#pragma pack(push)  /* push current alignment to stack */
#pragma pack(1)     /* set alignment to 1 byte boundary */
struct MyPackedData
{
    char Data1;
    long Data2;
    char Data3;
};
#pragma pack(pop)   /* restore original alignment from stack */
This structure would have a compiled size of 6 bytes on a system where a long is four bytes. The above directives are available in compilers from Microsoft, Borland, GNU and many others.
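The effect of packing can be checked by declaring the same members twice, once with default alignment and once packed. This sketch uses hypothetical structure names and assumes int32_t in place of long so that the member is four bytes regardless of platform; it also assumes a compiler that accepts #pragma pack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct Unpacked {
    char    Data1;
    int32_t Data2;   /* preceded by 3 pad bytes */
    char    Data3;   /* followed by 3 pad bytes */
};

#pragma pack(push)
#pragma pack(1)
struct Packed {
    char    Data1;
    int32_t Data2;   /* starts at offset 1, so it is misaligned */
    char    Data3;
};
#pragma pack(pop)

/* The packed layout has no padding at all: 1 + 4 + 1 bytes. */
static_assert(sizeof(struct Packed) == 6, "packed size");
```

The packed structure saves six bytes compared with the default layout on typical targets, but Data2 is then misaligned, so every access to it pays the costs described in the earlier sections.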
See also:
Stride of an array
Type punning