Byte Alignment and Ordering

Byte alignment and ordering can cause incompatibility in message formats between processors. 1) Compilers add padding bytes to structures to enforce alignment rules of the target processor, but different compilers may add different padding. 2) Processors use either big-endian or little-endian byte ordering, storing multi-byte values differently in memory. Conversion routines are needed when transferring between processors with different ordering.

Uploaded by rodriguesvasco

Realtime systems often consist of multiple processors communicating with each other via messages. For message communication to work correctly, the message formats must be defined unambiguously. In many systems this is achieved simply by defining C/C++ structures to implement the message format. Using C/C++ structures is a simple approach, but it has its own pitfalls: different processors and compilers might lay out the same structure differently, making the two sides of the interface incompatible. There are two reasons for these incompatibilities:

Byte Alignment Restrictions
Byte Ordering

Byte Alignment Restrictions

Many 16-bit and 32-bit processors do not allow words and long words to be stored at arbitrary offsets. For example, the Motorola 68000 does not allow a 16-bit word to be stored at an odd address; attempting to write a 16-bit number at an odd address results in an exception.

Why Restrict Byte Alignment?

32-bit microprocessors typically organize memory as shown below. Memory is accessed by performing 32-bit bus cycles, and these can only be performed at addresses that are divisible by 4 (32-bit microprocessors do not use address lines A1 and A0 to address memory). The reasons for not permitting misaligned long word reads and writes are not difficult to see. For example, an aligned long word X occupies bytes X0, X1, X2 and X3 of a single row, so the microprocessor can read the complete long word in a single bus cycle. If the same microprocessor now attempts to access a long word at address 0x100D, it has to read bytes Y0, Y1, Y2 and Y3. This read cannot be performed in a single 32-bit bus cycle: the microprocessor has to issue two separate reads, at addresses 0x100C and 0x1010, to obtain the complete long word. Thus it takes twice as long to read a misaligned long word.
Address   Byte 0   Byte 1   Byte 2   Byte 3
0x1000
0x1004    X0       X1       X2       X3
0x1008
0x100C             Y0       Y1       Y2
0x1010    Y3
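The bus-cycle argument above can be made concrete with a small calculation. This is a minimal sketch, assuming a hypothetical memory system that transfers exactly one aligned 4-byte row per bus cycle; `bus_cycles` and `BUS_BYTES` are illustrative names, not part of any real API.

```c
#include <stdint.h>

/* Width of one memory row in bytes (assumed: a 32-bit data bus). */
#define BUS_BYTES 4u

/* Number of bus cycles needed to access `size` bytes (size >= 1)
   starting at `addr`, assuming each cycle transfers one aligned row. */
unsigned bus_cycles(uint32_t addr, uint32_t size)
{
    uint32_t first_row = addr / BUS_BYTES;            /* row holding the first byte */
    uint32_t last_row = (addr + size - 1) / BUS_BYTES; /* row holding the last byte  */
    return (unsigned)(last_row - first_row + 1);
}
```

With these assumptions, `bus_cycles(0x1004, 4)` gives 1 for the aligned long word X, while `bus_cycles(0x100D, 4)` gives 2 for Y, matching the two reads at 0x100C and 0x1010.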

Compiler Byte Padding

Compilers have to follow the byte alignment restrictions defined by the target microprocessors. This means that compilers have to add pad bytes into user defined structures so that the structure does not violate any restrictions imposed by the target microprocessor.

The compiler padding is illustrated in the following example. Here a char is assumed to be one byte, a short is two bytes and a long is four bytes.

User Defined Structure

struct Message
{
    short opcode;
    char subfield;
    long message_length;
    char version;
    short destination_processor;
};

Actual Structure Definition Used By the Compiler

struct Message
{
    short opcode;
    char subfield;
    char pad1;                   // Pad to start the long word at a 4 byte boundary
    long message_length;
    char version;
    char pad2;                   // Pad to start a short at a 2 byte boundary
    short destination_processor;
    char pad3[4];                // Pad to align the complete structure to a 16 byte boundary
};

In the above example, the compiler has added pad bytes to enforce the byte alignment rules of the target processor. If the same message structure were compiled by a different compiler or for a different microprocessor, the pads inserted might be different, so two applications built from the same structure definition header file could be incompatible with each other. It is therefore good practice to insert pad bytes explicitly in all C structures that are shared in an interface between machines that differ in compiler and/or microprocessor.
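One way to follow this advice is sketched below: the Message structure with every pad byte written out, plus C11 compile-time checks that fail the build if a compiler lays it out differently. This is an illustration under stated assumptions, not the article's own code: it swaps `short`/`long` for the fixed-width `int16_t`/`int32_t` so the field sizes do not depend on the data model (on 64-bit Linux, `long` is 8 bytes).

```c
#include <stddef.h>
#include <stdint.h>

/* Message layout with every pad byte explicit. Fixed-width types are an
   assumption made here so that field sizes match on every compiler. */
struct Message
{
    int16_t opcode;                /* offset 0  */
    char    subfield;              /* offset 2  */
    char    pad1;                  /* offset 3: long word starts at 4 */
    int32_t message_length;        /* offset 4  */
    char    version;               /* offset 8  */
    char    pad2;                  /* offset 9: short starts at 10 */
    int16_t destination_processor; /* offset 10 */
    char    pad3[4];               /* offset 12: total size 16 */
};

/* Compile-time checks: any compiler that disagrees with the agreed
   interface layout stops the build instead of corrupting messages. */
_Static_assert(offsetof(struct Message, subfield) == 2, "subfield offset");
_Static_assert(offsetof(struct Message, message_length) == 4, "message_length offset");
_Static_assert(offsetof(struct Message, version) == 8, "version offset");
_Static_assert(offsetof(struct Message, destination_processor) == 10, "destination_processor offset");
_Static_assert(sizeof(struct Message) == 16, "struct Message size");
```

Because the members already satisfy every alignment rule, a conforming compiler has no reason to insert further padding, so the same header yields the same 16-byte layout on both ends of the link.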

General Byte Alignment Rules

The following byte padding rules will generally work with most 32-bit processors. Consult your compiler and microprocessor manuals to see if you can relax any of these rules.

Single byte numbers can be aligned at any address.
Two byte numbers should be aligned to a two byte boundary.
Four byte numbers should be aligned to a four byte boundary.
Structures with between 1 and 4 bytes of data should be padded so that the total structure is 4 bytes.
Structures with between 5 and 8 bytes of data should be padded so that the total structure is 8 bytes.
Structures with between 9 and 16 bytes of data should be padded so that the total structure is 16 bytes.
Structures with more than 16 bytes of data should be padded to a 16 byte boundary.
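The structure-size rules above reduce to a short function. This is a sketch of those rules only; `padded_struct_size` is a hypothetical helper, and real compilers usually apply different rules (typically rounding up to the largest member alignment).

```c
#include <stddef.h>

/* Total structure size after padding, per the general rules listed:
   1-4 bytes of data -> 4, 5-8 -> 8, 9-16 -> 16, larger -> next multiple of 16.
   `data_bytes` is the sum of member sizes, including any member pads. */
size_t padded_struct_size(size_t data_bytes)
{
    if (data_bytes <= 4)
        return 4;
    if (data_bytes <= 8)
        return 8;
    if (data_bytes <= 16)
        return 16;
    return (data_bytes + 15) / 16 * 16;  /* round up to a 16-byte boundary */
}
```

For the Message example, the members plus pad1 and pad2 hold 12 bytes of data, so `padded_struct_size(12)` returns 16, which accounts for the 4-byte pad3.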

Structure Alignment for Efficiency

Sometimes array indexing efficiency can also determine the pad bytes in a structure. Compilers index into an array by multiplying the index by the size of the structure and adding the result to the array base address. Since this operation involves a multiply, indexing into arrays can be expensive. Array indexing can be sped up considerably by making sure that the structure size is a power of 2: the compiler can then replace the multiply with a simple shift.
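The shift-for-multiply substitution can be sketched as two equivalent address calculations. The `Record` type and both helper names are hypothetical; the sketch assumes a 16-byte record (16 = 1 << 4), which the static assertion enforces.

```c
#include <stdint.h>

/* A record padded to 16 bytes so its size is a power of two. */
struct Record
{
    uint32_t id;
    uint32_t flags;
    uint32_t value;
    uint32_t reserved;   /* pad to make the size a power of two */
};

_Static_assert(sizeof(struct Record) == 16, "shift version assumes a 16-byte record");

/* General form: base + index * size (needs a multiply). */
uintptr_t element_addr_mul(uintptr_t base, uintptr_t index)
{
    return base + index * sizeof(struct Record);
}

/* Power-of-two form: the multiply by 16 becomes a left shift by 4. */
uintptr_t element_addr_shift(uintptr_t base, uintptr_t index)
{
    return base + (index << 4);
}
```

Modern optimizing compilers typically perform this strength reduction on their own when the size is a compile-time constant power of two, so the benefit comes from choosing the size, not from writing the shift by hand.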

Byte Ordering
Microprocessors support big-endian and little-endian byte ordering. Big-endian is an order in which the "big end" (most significant byte) is stored first (at the lowest address). Little-endian is an order in which the "little end" (least significant byte) is stored first. The table below shows the representation of the hexadecimal number 0x0AC0FFEE on a big-endian and a little-endian machine. The contents of memory locations 0x1000 to 0x1003 are shown.
Address         0x1000   0x1001   0x1002   0x1003
Big-endian      0x0A     0xC0     0xFF     0xEE
Little-endian   0xEE     0xFF     0xC0     0x0A
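Which row of the table a given host follows can be checked at run time by looking at the bytes of 0x0AC0FFEE in memory. A minimal sketch; `memory_bytes` and `host_is_little_endian` are illustrative names, and it assumes the host is either purely big- or purely little-endian.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory representation of `value` (4 bytes) into `out`,
   i.e. the contents of four consecutive addresses starting at &value. */
void memory_bytes(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    unsigned char bytes[4];
    memory_bytes(0x0AC0FFEEu, bytes);
    return bytes[0] == 0xEE;   /* LSB stored at the lowest address */
}
```

On an x86 PC the bytes come out as 0xEE 0xFF 0xC0 0x0A, the little-endian row of the table.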

Why Different Byte Ordering?

This is a difficult question. There is no logical reason why different microprocessor vendors decided to use different ordering schemes. Most of the reasons are historical. For example, Intel processors have traditionally been little-endian. Motorola processors have always been big-endian.

The situation is actually quite similar to that of Lilliputians in Gulliver's Travels. Lilliputians were divided into two groups based on the end from which the egg should be broken. The big-endians preferred to break their eggs from the larger end. The little-endians broke their eggs from the smaller end.

Conversion Routines
Routines to convert between big-endian and little-endian formats are quite straightforward. The routines shown below convert in both directions, i.e. from big-endian to little-endian and back.
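Big-endian to Little-endian conversion and back

#include <stdint.h>

/* Fixed-width types guarantee 2 and 4 bytes; a plain long is 8 bytes on
   LP64 systems, where a 4-byte swap would leave half the result unset. */

/* Swap the two bytes of a 16-bit value (works in both directions). */
uint16_t convert_short(uint16_t in)
{
    uint16_t out;
    unsigned char *p_in = (unsigned char *) &in;
    unsigned char *p_out = (unsigned char *) &out;
    p_out[0] = p_in[1];
    p_out[1] = p_in[0];
    return out;
}

/* Reverse the four bytes of a 32-bit value (works in both directions). */
uint32_t convert_long(uint32_t in)
{
    uint32_t out;
    unsigned char *p_in = (unsigned char *) &in;
    unsigned char *p_out = (unsigned char *) &out;
    p_out[0] = p_in[3];
    p_out[1] = p_in[2];
    p_out[2] = p_in[1];
    p_out[3] = p_in[0];
    return out;
}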
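An alternative to swapping through char pointers is to swap with shifts and masks, which operates on the value itself rather than its memory image. This is a sketch of that different technique; `swap16` and `swap32` are illustrative names.

```c
#include <stdint.h>

/* Swap the bytes of a 16-bit value using shifts. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));  /* cast drops bits above 16 */
}

/* Reverse the bytes of a 32-bit value using masks and shifts. */
uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}
```

For example, `swap32(0x0AC0FFEE)` yields 0xEEFFC00A, and applying it twice gives back the original value. GCC and Clang also offer `__builtin_bswap16` and `__builtin_bswap32`, which usually compile to a single instruction.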

Alignment and Ordering Issues

This handout will explain alignment and byte-ordering issues -- topics which are ignored by the book but can make an important impact on the way programs run.

Memory Alignment
"Memory alignment" is a term that refers to where a multi-byte variable is stored in a computer with a 16-bit or larger data path to memory. In the following discussion, we assume a computer with a 16-bit (2-byte) wide path to memory (such as an 8086, 80286, or 80386sx). In such a computer, the following picture represents the first few memory locations, where each box represents a byte of memory and the numbers are the memory addresses:

+---+---+
| 0 | 1 |
+---+---+
| 2 | 3 |
+---+---+
| 4 | 5 |
+---+---+
| 6 | 7 |
+---+---+

Each row in this figure represents the 16 bits that can be transferred from the memory to the CPU when data is requested, or to the memory when data is saved. The way the hardware is set up, any request to memory can involve only one row in the table. If the CPU wants the byte stored at address 3, there is no problem: it simply brings in the 16 bits that contain bytes 2 and 3 and extracts the correct byte. The same is true for any single byte, stored anywhere in memory. However, when a two-byte word is requested, the story changes a bit. The address of a word is the address of the first byte in the word, so the word at address 4 is really made up of two bytes: byte 4 and byte 5. In this example, it's again no problem to read this word from memory: the CPU simply requests the 16 bits starting at address 4.

The problem comes when the program requests the word stored at address 5 (or any other odd address). In this case, the desired word is made up of bytes 5 and 6, and since they aren't in the same row in our picture, we can't simply request that word from memory. The CPU translates such a request into two memory requests: one for the word made up of bytes 4 and 5, and one for the word made up of bytes 6 and 7. The CPU then extracts bytes 5 and 6 from these two words and puts them together to create the requested word. This entire discussion could be repeated with larger data sizes: on a computer with a 32-bit wide data path to memory, we could look at the problem of accessing a 32-bit variable that spans two 32-bit pieces of memory.

With that in mind, we next define what is meant by an "aligned access" and an "unaligned access." The first example we considered, where the word at address 4 was requested, is an aligned access, whereas the second is an unaligned access. More generally, if you are accessing an n-byte piece of data, the beginning address must be a multiple of n for the access to be aligned; all other accesses are unaligned. Since we were talking about 2-byte pieces of data (words), aligned accesses are those at even addresses. On a 32-bit computer, 32-bit data must be stored at an address that is a multiple of 4 to be aligned. As a special case of this definition, accessing one byte of data is always an aligned access.

Alignment is important because, as explained above, an unaligned read access can take twice as long as an aligned access. In fact, on many of the newer processors, such as the Sun SPARC and the DEC Alpha, there is no hardware to handle unaligned accesses! Such accesses must then be handled by software, which can be over 10 times slower than an aligned access.

Furthermore, when you declare a word variable in assembly language, the assembler puts that word at the next available address, making absolutely no distinction between aligned and unaligned data. Fortunately there is an easy way to fix this: the special assembler directive ALIGN. To make sure a piece of data is aligned, place "ALIGN n" on a line immediately before the data, replacing n with the size of that data. The ALIGN directive adds enough padding so that the following data begins at a multiple of n. The following example shows data segment declarations for variables of different sizes. In the first listing, the multi-byte data may or may not be aligned, so you may suffer a performance penalty. In the second listing, the ALIGN directive is used to guarantee that the multi-byte variables are aligned.

Possibly Unaligned
        .DATA
BVAR    DB  4
NROWS   DW  25
NCOLS   DW  80
CRET    DB  13
LADDR   DD  ?

Alignment Guaranteed
        .DATA
BVAR    DB  4
        ALIGN 2
NROWS   DW  25
NCOLS   DW  80
CRET    DB  13
        ALIGN 4
LADDR   DD  ?
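The effect of ALIGN on the assembler's location counter can be simulated in C. This sketch assumes the data segment itself starts at offset 0 (that is, on an aligned boundary) and that DB, DW and DD reserve 1, 2 and 4 bytes; `align_up` and `layout` are hypothetical helpers, not assembler features.

```c
#include <stddef.h>

/* Round `offset` up to the next multiple of `n`: what ALIGN n does
   to the location counter. */
size_t align_up(size_t offset, size_t n)
{
    return (offset + n - 1) / n * n;
}

/* Offsets of BVAR, NROWS, NCOLS, CRET and LADDR for the two listings;
   use_align = 0 gives the first listing, 1 the second. */
void layout(int use_align, size_t offsets[5])
{
    size_t pc = 0;                         /* location counter */
    offsets[0] = pc; pc += 1;              /* BVAR  DB 4  */
    if (use_align) pc = align_up(pc, 2);   /* ALIGN 2     */
    offsets[1] = pc; pc += 2;              /* NROWS DW 25 */
    offsets[2] = pc; pc += 2;              /* NCOLS DW 80 */
    offsets[3] = pc; pc += 1;              /* CRET  DB 13 */
    if (use_align) pc = align_up(pc, 4);   /* ALIGN 4     */
    offsets[4] = pc;                       /* LADDR DD ?  */
}
```

Under these assumptions, the first listing places NROWS at 1, NCOLS at 3 and LADDR at 6 (all unaligned), while the second places them at 2, 4 and 8, at a cost of two padding bytes.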

What we've seen in this section is that as far as efficiency goes, all words are not equal. While high-level languages usually take care of alignment issues for the programmer, assembly language provides no such "help" unless it is explicitly asked for by the programmer. Therefore, alignment needs to be kept in mind when writing assembly language programs.

Byte Ordering
Another issue that comes up when accessing a multi-byte piece of data is called "byte ordering" (or sometimes called the "byte sex" of a machine). In particular, consider a 16-bit word of data, 1234h, that is stored in memory. This word consists of two bytes: 12h, called the "Most Significant Byte" (or MSB), and 34h, called the "Least Significant Byte" (or LSB). If this word were to be stored at address 4, then as we have just seen the bytes will be stored in memory locations 4 and 5. The question is, which byte goes where? There are two options for a 16-bit word:

Little Endian: On a little endian machine, the LSB (the "little end") is stored in the first byte. So in the example above, 34h would be stored in byte 4 and 12h would be stored in byte 5. The Intel x86 family of processors used in PC compatible computers is little endian, as is the VAX line of computers built by Digital Equipment Corporation.

Big Endian: On a big endian machine, the MSB is stored first. In the example above, this translates into 12h being stored in location 4 and 34h being stored in location 5. The Motorola 68k processors used in Apple Macintosh computers are big endian, as are IBM mainframes and Sun SPARC processors.

Unlike alignment, which can be an issue even if you are writing a program to run on only one machine, byte ordering questions don't commonly arise unless you are writing programs that share data between different types of computers. For example, consider a program that saves a large array of integers to disk. If this program were run on a PC compatible computer, then the integer 1234h would be stored on disk as the byte 34h followed by the byte 12h, since the computer is little endian. If someone with a Macintosh tried to read in this data file, they would read the bytes in the order saved by the PC compatible, but the Macintosh program would now interpret 34h as the MSB, since the Macintosh is big endian. So the integer value that should have been loaded as 1234h = 4660d is now seen as 3412h = 13330d, which is quite a difference! So how are problems with byte ordering fixed? There are three common ways of dealing with byte ordering problems:

1. Define a "universal byte ordering" for all transmission and storage. The idea here is that it doesn't matter what order the data is stored in the memory while a program is running; it only matters when the data is transferred to another computer by using either a file or some sort of transmission method. For such transfers we can define a standard byte order, say big endian, and computers that don't use this order will have to convert data into their own byte ordering before using it. Certain data that is transmitted on the Internet uses this strategy, and uses a standard ordering that is called "network byte ordering".

2. Require data files to indicate which byte ordering is used in that file, and require all software to be able to handle both byte orderings. The disadvantage of this method is that software must be more complex in order to deal with both possible byte orderings. The big advantage is that if a certain data file is used most often on one particular kind of computer, then it can use the natural byte ordering of that computer. The TIFF image format is an example of this solution to the byte ordering problem.

3. Special hardware that can handle both byte orderings. Many of the newer processors (such as the Power PC and the DEC Alpha) can be set into either little endian or big endian mode, so they can deal with data no matter how it is stored! This method has clear advantages, and it seems to make the byte ordering problem disappear. However, it almost makes the problem worse by creating no clear distinction about which ordering should be "standard," and so software still has to find out which byte ordering is used by particular data.
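The first two strategies can be sketched in a few lines. Strategy 1 uses the POSIX `htonl`/`ntohl` functions from `<arpa/inet.h>` (on Windows they come from `<winsock2.h>`). For strategy 2, the sketch relies on the fact that a TIFF file begins with "II" (little endian) or "MM" (big endian) followed by the 16-bit number 42; the helper names `put_u32_network`, `get_u32_network`, `tiff_is_little_endian` and `get_u16` are hypothetical.

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Strategy 1: always store 32-bit values in network (big-endian) order. */
void put_u32_network(unsigned char out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(out, &net, sizeof net);
}

uint32_t get_u32_network(const unsigned char in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}

/* Strategy 2: the file announces its order. Returns 1 for "II"
   (little endian), 0 for "MM" (big endian), -1 if unrecognized. */
int tiff_is_little_endian(const unsigned char header[2])
{
    if (header[0] == 'I' && header[1] == 'I')
        return 1;
    if (header[0] == 'M' && header[1] == 'M')
        return 0;
    return -1;
}

/* Read a 16-bit field in whichever order the file announced. */
uint16_t get_u16(const unsigned char *p, int little_endian)
{
    return little_endian ? (uint16_t)(p[0] | (p[1] << 8))
                         : (uint16_t)((p[0] << 8) | p[1]);
}
```

The same `get_u16` also reproduces the PC/Macintosh mix-up from the earlier example: the bytes 34h 12h read with `little_endian = 1` give 1234h (4660), and with `little_endian = 0` give 3412h (13330).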

To summarize, the byte ordering problem is another issue that arises from multi-byte data, and is most important when dealing with data that must be shared between various types of machines. What made this problem acute when this handout was written is that the two most popular machines, which should be able to share data, used different byte orderings: PC compatibles were little endian, and Macintosh computers were big endian.

Data structure alignment

Data structure alignment is the way data is arranged and accessed in computer memory. It consists of two separate but related issues: data alignment and data structure padding. When modern computers read from or write to a memory address, they do this in word-sized chunks (e.g. 4-byte chunks on a 32-bit system). Data alignment means putting the data at a memory offset equal to some multiple of the word size, which increases the system's performance due to how the CPU handles memory. To align the data, it may be necessary to insert some meaningless bytes between the end of the last data structure and the start of the next, which is data structure padding. For example, when the computer's word size is 4 bytes, the data to be read should be at a memory offset which is some multiple of 4. When this is not the case, e.g. the data starts at the 14th byte instead of the 16th byte, then the computer has to read two 4-byte chunks and do some calculation before the requested data has been read, or it may generate an alignment fault. Even though the previous data structure ends at the 14th byte, the next data structure should start at the 16th byte. Two padding bytes are inserted between the two data structures to align the next data structure to the 16th byte.

Although data structure alignment is a fundamental issue for all modern computers, many computer languages and computer language implementations handle data alignment automatically. Certain C and C++ implementations and assembly language allow at least partial control of data structure padding, which may be useful in certain special circumstances.

Definitions
A memory address a is said to be n-byte aligned when n is a power of two and a is a multiple of n bytes. In this context a byte is the smallest unit of memory access, i.e. each memory address specifies a different byte. An n-byte aligned address has log2(n) least-significant zeros when expressed in binary. A memory access is said to be aligned when the datum being accessed is n bytes long and the datum address is n-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that by definition byte memory accesses are always aligned. A memory pointer that refers to primitive data that is n bytes long is said to be aligned if it is only allowed to contain addresses that are n-byte aligned; otherwise it is said to be unaligned. A memory pointer that refers to a data aggregate (a data structure or array) is aligned if (and only if) each primitive datum in the aggregate is aligned. Note that the definitions above assume that each primitive datum is a power of two bytes long. When this is not the case (as with 80-bit floating-point on x86) the context influences the conditions under which the datum is considered aligned or not.

Problems
A computer accesses memory a single memory word at a time. As long as the memory word size is at least as large as the largest primitive data type supported by the computer, aligned accesses will always access a single memory word. This may not be true for misaligned data accesses. If the highest and lowest bytes in a datum are not within the same memory word the computer must split the datum access into multiple memory accesses. This requires a lot of complex circuitry to generate the memory accesses and coordinate them. To handle the case where the memory words are in different memory pages the processor must either verify that both pages are present before executing the instruction or be able to handle a TLB miss or a page fault on any memory access during the instruction execution. When a single memory word is accessed the operation is atomic, i.e. the whole memory word is read or written at once and other devices must wait until the read or write operation completes before they can access it. This may not be true for unaligned accesses to multiple memory words, e.g. the first word might be read by one device, both words written by another device and then the second word read by the first device so that the value read is neither the original value nor the updated value. Although such failures are rare, they can be very difficult to identify.

Architectures

RISC
Most RISC processors will generate an alignment fault when a load or store instruction accesses a misaligned address. This allows the operating system to emulate the misaligned access using other instructions. For example, the alignment fault handler might use byte loads or stores (which are always aligned) to emulate a larger load or store instruction. Some architectures, like MIPS, have special unaligned load and store instructions. One unaligned load instruction gets the bytes from the memory word with the lowest byte address and another gets the bytes from the memory word with the highest byte address. Similarly, store-high and store-low instructions store the appropriate bytes in the higher and lower memory words respectively. The DEC Alpha architecture has a two-step approach to unaligned loads and stores. The first step is to load the upper and lower memory words into separate registers. The second step is to extract or modify the memory words using special low/high instructions similar to the MIPS instructions. An unaligned store is completed by storing the modified memory words back to memory. The reason for this complexity is that the original Alpha architecture could only read or write 32-bit or 64-bit values. This proved to be a severe limitation that often led to code bloat and poor performance; later Alpha processors added byte and double-byte load and store instructions. Because the unaligned access sequences are larger and slower than normal memory loads and stores, they should only be used when necessary. Some C and C++ compilers provide an unaligned attribute (for example, Microsoft's __unaligned qualifier) that can be applied to pointers that need the unaligned instructions.
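The byte-load emulation described above can be sketched in C. `load32_le_bytewise` builds a little-endian 32-bit value from four byte loads, roughly what an alignment-fault handler does; `load32_native` shows the common portable C idiom of `memcpy` into an aligned local, which compilers typically turn into a plain load where the hardware permits. Both names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value at any address using only
   byte loads, which are always aligned. */
uint32_t load32_le_bytewise(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Read a 32-bit value in the host's native byte order from a possibly
   misaligned address by copying it into an aligned local variable. */
uint32_t load32_native(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Dereferencing a misaligned `uint32_t *` directly is undefined behavior in C, and can fault on strict-alignment hardware, so one of these forms is preferable when reading fields out of a raw byte buffer.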

x86 and x64

While the x86 architecture originally did not require aligned memory access and still works without it, some SSE and SSE2 instructions on x86 and x64 CPUs (for example MOVAPS and MOVDQA) require their 128-bit memory operands to be 16-byte aligned, and there can be substantial performance advantages from using aligned data on these architectures.

Compatibility
The advantage of supporting unaligned access is that it is easier to write compilers that do not need to align memory, at the cost of slower access. One way to increase performance in RISC processors, which are designed to maximize raw performance, is to require data to be loaded or stored on a word boundary. So although memory is commonly addressed by 8-bit bytes, loading a 32-bit integer or 64-bit floating-point number would be required to start on a 64-bit boundary on a 64-bit machine. The processor could flag a fault if it were asked to load a number that was not on such a boundary, but this would result in a slower call to a routine that would need to figure out which word or words contained the data and extract the equivalent value.

Data Structure Padding

Although the compiler (or interpreter) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition, the data structure as a whole may be padded with a final unnamed member. This allows each member of an array of structures to be properly aligned. Padding is only inserted when a structure member is followed by a member with a larger alignment requirement, or at the end of the structure. By changing the ordering of members in a structure, it is possible to change the amount of padding required to maintain alignment. For example, if members are sorted by ascending or descending alignment requirements, a minimal amount of padding is required. The minimal amount of padding required is always less than the largest alignment in the structure. Computing the maximum amount of padding required is more complicated, but it is always less than the sum of the alignment requirements for all members minus twice the sum of the alignment requirements for the least aligned half of the structure members. Although C and C++ do not allow the compiler to reorder structure members to save space, other languages might. It is also possible to tell most C and C++ compilers to "pack" the members of a structure to a certain level of alignment, e.g. "pack(2)" means align data members larger than a byte to a two-byte boundary so that any padding members are at most one byte long. One use for such "packed" structures is to conserve memory. For example, a structure containing a single byte and a four-byte integer would require three additional bytes of padding. A large array of such structures would use 37.5% less memory if they are packed (5 bytes instead of 8 per element), although accessing each structure might take longer. This compromise may be considered a form of space-time tradeoff.
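The effect of member ordering can be seen by declaring the same three members in two orders. A sketch assuming a typical ABI where `int32_t` is 4-byte aligned (true for mainstream x86, x86-64 and ARM compilers); the struct names are hypothetical.

```c
#include <stdint.h>

/* char, int32, char: 3 pad bytes after `a` and 3 at the end. */
struct Unordered
{
    char    a;   /* offset 0, followed by 3 pad bytes */
    int32_t b;   /* offset 4 */
    char    c;   /* offset 8, followed by 3 trailing pad bytes */
};

/* Same members sorted by descending alignment requirement. */
struct Sorted
{
    int32_t b;   /* offset 0 */
    char    a;   /* offset 4 */
    char    c;   /* offset 5, followed by 2 trailing pad bytes */
};
```

On such compilers `sizeof(struct Unordered)` is 12 and `sizeof(struct Sorted)` is 8; the 2 remaining pad bytes are less than the largest alignment (4), as the minimal-padding bound states.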
Although packed structures are most frequently used to conserve memory space, they may also be used to format a data structure for transmission using a standard protocol. In this usage, however, care must also be taken to ensure that the values of the struct members are stored with the endianness required by the protocol (often network byte order), which may differ from the native endianness of the host machine.

Computing padding
The following formula provides the number of padding bytes required to align the start of a data structure:
padding = (align - (offset mod align)) mod align

For example, the padding to add to offset 0x59d for a structure aligned to every 4 bytes is 3. The structure will then start at 0x5a0, which is a multiple of 4. Or alternatively when the alignment is a power of two, the following formula provides the new offset (where & is a bitwise AND and ~ a bitwise NOT):
new offset = align + ((offset - 1) & ~(align - 1))
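Both formulas translate directly into C. In this sketch the helper names are hypothetical; the second function requires a power-of-two `align`, and unsigned wrap-around makes offset 0 map back to 0.

```c
#include <stdint.h>

/* padding = (align - (offset mod align)) mod align */
uintptr_t padding_for(uintptr_t offset, uintptr_t align)
{
    return (align - (offset % align)) % align;
}

/* new offset = align + ((offset - 1) & ~(align - 1)), power-of-two align.
   For offset 0, (offset - 1) wraps to all ones and the sum wraps back to 0. */
uintptr_t aligned_offset(uintptr_t offset, uintptr_t align)
{
    return align + ((offset - 1) & ~(align - 1));
}
```

For the example in the text, `padding_for(0x59d, 4)` is 3 and `aligned_offset(0x59d, 4)` is 0x5a0; an offset that is already aligned, such as 0x5a0, gets 0 padding and keeps its value.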

Typical alignment of C structs on x86

Data structure members are stored sequentially in memory, so that in the structure below the member Data1 will always precede Data2 and Data2 will always precede Data3:

struct MyData
{
    short Data1;
    short Data2;
    short Data3;
};

If the type "short" is stored in two bytes of memory, then each member of the data structure depicted above would be 2-byte aligned. Data1 would be at offset 0, Data2 at offset 2 and Data3 at offset 4. The size of this structure would be 6 bytes. The type of each member of the structure usually has a default alignment, meaning that it will, unless otherwise requested by the programmer, be aligned on a pre-determined boundary. The following typical alignments are valid for compilers from Microsoft, Borland, and GNU when compiling for x86:

A char (one byte) will be 1-byte aligned.
A short (two bytes) will be 2-byte aligned.
An int (four bytes) will be 4-byte aligned.
A float (four bytes) will be 4-byte aligned.
A double (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on (32-bit x86) Linux.

Here is a structure with members of various types, totaling 8 bytes before compilation:
struct MixedData
{
    char Data1;
    short Data2;
    int Data3;
    char Data4;
};

After compilation the data structure will be supplemented with padding bytes to ensure a proper alignment for each of its members:

struct MixedData  /* after compilation */
{
    char Data1;
    char Padding0[1];  /* For the following 'short' to be aligned on a 2 byte boundary */
    short Data2;
    int Data3;
    char Data4;
    char Padding1[3];
};
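The padded layout of MixedData can be checked with the standard `offsetof` macro and C11 `_Alignof`. This sketch assumes the x86 alignments listed earlier (2-byte short, 4-byte int); the array and constant names are illustrative.

```c
#include <stddef.h>

struct MixedData
{
    char  Data1;
    short Data2;
    int   Data3;
    char  Data4;
};

/* Offsets the compiler actually chose for each member. */
static const size_t mixed_offsets[] = {
    offsetof(struct MixedData, Data1),
    offsetof(struct MixedData, Data2),
    offsetof(struct MixedData, Data3),
    offsetof(struct MixedData, Data4),
};

/* Trailing padding: total size minus the end of the last member. */
static const size_t mixed_tail_padding =
    sizeof(struct MixedData) - (offsetof(struct MixedData, Data4) + sizeof(char));
```

With those alignments the offsets come out as 0, 2, 4 and 8, the structure alignment is 4, and the 3 trailing bytes bring the size to 12.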

The compiled size of the structure is now 12 bytes. Note that the last member is followed by enough padding to make the size of the whole structure a multiple of the largest member alignment (here the 4-byte int). In this case 3 bytes are added after the last member to pad the structure to a multiple of a long word. It is possible to change the alignment of structures to reduce the memory they require (or to conform to an existing format) by changing the compiler's alignment (or packing) of structure members. Requesting that the MixedData structure above be aligned to a one byte boundary will have the compiler discard the predetermined alignment of the members, and no padding bytes will be inserted. While there is no standard way of defining the alignment of structure members, some compilers use #pragma directives to specify packing inside source files. Here is an example:
#pragma pack(push)  /* push current alignment to stack */
#pragma pack(1)     /* set alignment to 1 byte boundary */

struct MyPackedData
{
    char Data1;
    long Data2;
    char Data3;
};

#pragma pack(pop)   /* restore original alignment from stack */

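This structure would have a compiled size of 6 bytes, assuming a 4-byte long (on 64-bit Linux and other LP64 systems long is 8 bytes, and the size would be 10). The above directives are available in compilers from Microsoft, Borland, GNU and many others.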
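A variant that produces 6 bytes on every common data model replaces `long` with `int32_t` and uses the combined `#pragma pack(push, 1)` form, which Microsoft, GNU and Clang compilers accept. This is a sketch under those assumptions; `get_data2` is a hypothetical accessor.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)   /* save current packing and set 1-byte alignment */
struct MyPackedData
{
    char    Data1;
    int32_t Data2;      /* 4 bytes everywhere, unlike long */
    char    Data3;
};
#pragma pack(pop)       /* restore the previous packing */

_Static_assert(sizeof(struct MyPackedData) == 6, "no padding expected");

/* Accessing the member by name is fine: the compiler knows it may be
   misaligned. Converting &d->Data2 to a plain int32_t * is not safe. */
int32_t get_data2(const struct MyPackedData *d)
{
    return d->Data2;
}
```

Here Data2 sits at offset 1, which is exactly the kind of misaligned member the earlier sections warn about, so packed structures trade memory for slower (or emulated) access on some processors.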
