
The strict aliasing rule: type punning goes bad

Floating point numbers are usually stored in memory in IEEE754 format. I would like to examine the individual bytes that make one up. The most natural thought would be to just cast a pointer:

float a = 1.0;
printf("float: %f; IEEE754: %x\n", a, *(unsigned int*)&a);

The result is:

float: 1.000000; IEEE754: 3f800000

The answer is correct. However, there’s a problem with it: it violates the strict aliasing rule. The strict aliasing rule governs “aliasing”, that is, through which expression types we are allowed to access a stored value. According to the rule, for a given value, only a few types of pointers may be used to access it. In general, apart from pointers of the same type as the value, only pointers to the following types may be dereferenced to access it (C only): a qualified version of the value’s type (such as const), the signed or unsigned counterpart of that type, a struct or union that includes one of these types among its members, and a character type.

See examples:

int a = 1;

unsigned int* b = (unsigned int*)&a; // OK
signed int* c = (signed int*)&a; // OK
const unsigned int* d = (const unsigned int*)&a; // OK
char* e = (char*) &a; // OK
float* f = (float*) &a; // NOT OK

// Overlaying a struct on a buffer: NOT OK!
uint64_t* buffer;
SomeStruct* m = (SomeStruct*)(buffer);

This means that in general, type punning, that is, using one type to access an object of another type, is not allowed. Now you can see very clearly that the print-the-float program violates this rule. But why this restriction? Won’t this just hurt the program’s expressiveness and flexibility? I mean, it worked just fine, and the compiler didn’t emit any error or warning at all with gcc -Wall -Wextra -std=c11.

Well, if there were no rule restricting aliasing, then optimization would be harder for compilers. If any pointer could possibly point at any other data structure, then the compiler would have to find all operations that could modify the structure before applying optimizations such as reordering operations or removing operations altogether. With the strict aliasing rule in place, the compiler only needs to consider pointers of the allowed types. So assuming the programmer follows this rule allows the compiler to optimize code more easily.
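As a rough illustration of the kind of optimization this enables, consider the sketch below. The function clear_and_count and its parameters are made up for this example. Because buf points at float and count points at int, a compiler following the rule is allowed to assume that the stores to buf[i] never change *count, so it may keep the counter in a register and write it back once instead of reloading it on every iteration.

```c
#include <stddef.h>

/* Hypothetical helper: zero a float buffer and count the elements in *count. */
void clear_and_count(float *buf, size_t n, int *count)
{
    for (size_t i = 0; i < n; i++) {
        buf[i] = 0.0f;  /* store through a float*                          */
        *count += 1;    /* int object: assumed untouched by the line above */
    }
}
```

Whether a given compiler actually performs this transformation depends on the version and flags; the rule only gives it permission to.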

Here’s an example to show how bad things can happen if the compiler operates under the rule while you don’t.

float f(int* a, float* b) {
   *b = 1.0f;
   *a = 0;
   return *b;
}


float x = 0;
float ans = f((int*)&x, &x);   

What do you expect ans to be? 0, right? Wrong! If you compile it with -O0, then the result is indeed 0. However, if you compile it with -O3, then the result is 1.0. You can see that the generated code simply returned 1:

f:
        movss   .LC0(%rip), %xmm0
        movss   %xmm0, (%rsi)
        movl    $0, (%rdi)
        ret
.LC0:
        .long   1065353216

Here 1065353216 is 0x3f800000, which is 1.0 in IEEE754 format. This shows what undefined behavior means in practice: violating the rule can produce inconsistent results. In this case, the compiler assumed that int* a and float* b cannot refer to the same object, so the store *a = 0 cannot have changed *b. Reloading *b for the return value is therefore deemed redundant, and the function returns the constant 1.0 directly, even though the store to *a is still performed. You could try the flag -Wno-strict-aliasing, but GCC 9.2.1 apparently did not detect this violation under -O3 and still returned 1.0 with that flag. That is expected, because -W flags only control warnings and do not change code generation. The flag that tells the compiler not to make the assumption is -fno-strict-aliasing.

Of course, no one writes code like the example above. However, such problems can happen if a function violates the rule and the violation is obscured by many layers of calls. Here is a more plausible example. Also, sometimes the rule could cause de-optimization, as shown here.

So, now that we know the rule, how do we handle operations such as printing the underlying representation of a float? Here are a few possible methods:

  1. Use char*. As the rule implies, any object may be accessed through a pointer to a character type, and the compiler makes no assumption that the pointed-to value really is a char. So the code could be written as:

     float a = 1.0;
     unsigned char* b = (unsigned char *) &a; // unsigned char matches %02hhx
     for(int i = 3; i >= 0; i--) { // most significant byte first on little endian
         printf("%02hhx", *(b+i));
     }
    

    A few possible ways this could go wrong: the machine might not be little endian, so the byte order might need to change; char might not be 8 bits, though I seriously doubt that you can find such a system that’s still in use. Also, float is not guaranteed by the C or C++ standards to be 32 bits wide, and I have seen quite a few embedded systems that had a 16 bit float.

  2. Use memcpy. Since memcpy takes void * and copies the raw bytes of the object, there are no type-punning problems. Also, the compiler could quite possibly optimize it into a register move.
  3. Use a union. A union is a structure to store data and provide multiple representations of the same data. The code:

     float a = 1.0;
     union {
         float f;
         uint32_t u;
     } float_int;
     float_int.f = a;
     printf("%x", float_int.u);
    

    However, this is only valid in C. In C++, reading a union member other than the one most recently written is undefined behavior.
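The memcpy method has no code above, so here is a minimal sketch of it. The helper name float_bits is made up for this example, and the static_assert spells out the assumption that float is 32 bits wide, which the C standard does not guarantee.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the bytes of a float as a 32-bit integer. */
uint32_t float_bits(float f)
{
    /* Assumption: float is exactly 32 bits wide on this platform. */
    static_assert(sizeof(float) == sizeof(uint32_t),
                  "this sketch assumes a 32-bit float");
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* copies bytes; no aliasing involved */
    return u;
}
```

On a platform with IEEE754 floats, float_bits(1.0f) should give 0x3f800000, the same value as the pointer cast at the top of the post, but without breaking the rule.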

For the specific problem of printing a floating point number, however, there is a more direct way:

float a = 1.0;
printf("%a", a);

This prints 0x1p+0, which is 1.0 in hexadecimal floating point format. Getting the full hex representation from this is a matter of combining three fields: the sign bit, the exponent with the IEEE754 bias added, and the fraction digits after the point.
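To make that reconstruction concrete, here is a small sketch for a 32-bit IEEE754 float. The helper bits_from_hex_float is made up for this example, and it assumes a normal (not zero, subnormal, infinite or NaN) value that %a printed with a leading digit of 1, as in 0x1.4p+1. The caller passes the hex digits after the point padded on the right with zeros to six digits, so 0x1.4p+1 becomes 0x400000.

```c
#include <stdint.h>

/* Hypothetical helper: assemble the binary32 bit pattern of a normal number
   from the pieces of its %a output, [-]0x1.HHHHHHp[+-]E.
   hex_fraction: digits after the point, zero-padded to six hex digits. */
uint32_t bits_from_hex_float(int negative, int exponent, uint32_t hex_fraction)
{
    uint32_t sign = negative ? 1u : 0u;
    uint32_t biased = (uint32_t)(exponent + 127);         /* bias of binary32 */
    uint32_t fraction = (hex_fraction >> 1) & 0x7fffffu;  /* 24 bits -> 23 bits */
    return (sign << 31) | (biased << 23) | fraction;
}
```

For 0x1p+0 this is bits_from_hex_float(0, 0, 0x000000): the sign is 0, the biased exponent is 127 (0x7f), and the fraction is 0, which gives 0x3f800000.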

Further Reading