I have an Intel processor in my PC, running in 64-bit mode (x86_64), where the registers are 64 bits wide. If I use the register keyword, or enable some optimization flags, the variable tends to be placed in a register. But if I assign a value above 32 bits, the compiler complains, i.e. my int is not 64 bits. Why does this happen? Shouldn't it be 64 bits if the variable is placed in a register and I never take its memory address? That is, it isn't even placed on the stack.
#include <stdio.h>

int main(void)
{
    register int x = 4294967296;
    return 0;
}
Compile:
gcc example.c -o example -Wall -Wextra
Output:
warning: overflow in implicit constant conversion [-Woverflow]
 register int x = 4294967296;
Answer
The C register keyword does nothing to override the width of the C type you chose. int is a 32-bit type in all 32- and 64-bit x86 ABIs, including x86-64 System V and Windows x64. (long is also 32-bit on Windows x64, but 64-bit on Linux / Mac / everything else x86-64; see footnote 1.)
A register int is still an int, subject to all the limits of INT_MAX, INT_MIN, and signed overflow being undefined behaviour. It doesn't turn your C source into a portable assembly language.
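For illustration, a minimal sketch of the same point (assuming a typical x86-64 system where <stdint.h> and <inttypes.h> are available): the constant is out of range for int no matter where the variable lives, and the fix is to ask for a wider type explicitly.

#include <stdint.h>
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* INT_MAX is 2147483647 on every 32- and 64-bit x86 ABI, so
       4294967296 does not fit in an int, register or not. */
    printf("INT_MAX = %d\n", INT_MAX);

    /* If you need a 64-bit value, pick a 64-bit type: */
    long long a = 4294967296LL;          /* at least 64 bits per ISO C        */
    int64_t   b = INT64_C(4294967296);   /* exactly 64 bits, where provided   */

    printf("a = %lld, b = %" PRId64 "\n", a, b);
    return 0;
}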
Using register just tells the compiler to stop you from taking the variable's address, so even in debug mode (with minimal optimization) a very naive compiler can keep the variable in (the low half of) a register without running into any surprises later in the function. (Of course modern compilers don't need this help normally, but for some compilers register actually does have an effect in debug mode.)
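As a quick check of that constraint (a sketch; the exact wording of the diagnostic varies by compiler), a program that takes the address of a register variable is rejected:

#include <stdio.h>

int main(void)
{
    register int x = 4;
    int *p = &x;    /* error: address of register variable 'x' requested */
    printf("%d\n", *p);
    return 0;
}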
If a register has 64 bits, why should the convention be that int is 32 bits?
int is 32-bit because nobody wants an array of int to become 64-bit on a 64-bit machine, with twice the cache footprint.
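A rough illustration (assuming a typical x86-64 implementation where int is 4 bytes): the same element count costs twice the memory, and therefore twice the cache, if the element type is 64-bit.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int     a[1000];    /* 4000 bytes with a 4-byte int    */
    int64_t b[1000];    /* 8000 bytes: twice the footprint */

    printf("%zu vs %zu bytes\n", sizeof a, sizeof b);
    return 0;
}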
A very few C implementations have int = int64_t (for example on some Cray machines, I think I've read), but it's extremely rare even outside of x86-64, where 32-bit is the "natural" and most efficient operand-size for machine code. Even DEC Alpha (which was aggressively 64-bit and designed from scratch for 64-bit) I think still used 32-bit int.
Making int 32-bit when growing from 16 to 32-bit machines made sense, because 16-bit is "too small" sometimes. (But remember that ISO C only guarantees that int is at least 16 bits. If you need more than that, in a truly portable program you'd better use long or int_least32_t.)
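For example (a sketch of the portable spelling, assuming only what ISO C itself guarantees), when a value might not fit in 16 bits you'd write:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    long          x = 100000L;   /* long is guaranteed to be at least 32 bits */
    int_least32_t y = 100000;    /* always provided by <stdint.h>             */

    printf("%ld %ld\n", x, (long)y);
    return 0;
}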
But 32 bits is "big enough" for most programs, and 64-bit machines always have fast 32-bit integers, so int stayed 32-bit when moving from 32 to 64-bit machines.
On some machines, 16-bit integers aren't very well supported, e.g. implementing the wrapping to 16 bits with uint16_t on MIPS would require extra AND-immediate instructions. So making int a 16-bit type would have been a poor choice there.
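To make the wrapping requirement concrete (a sketch; whether an extra AND is actually emitted depends on the compiler and the surrounding code): unsigned narrow types must wrap modulo 2^16, so the implementation has to keep the stored value truncated to 16 bits.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t x = 65535;
    x = x + 1;                      /* must wrap to 0: the result is
                                       truncated back to 16 bits      */
    printf("%u\n", (unsigned)x);    /* prints 0 */
    return 0;
}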
On x86 you could just use 16-bit operand size, and use movzx instead of mov when copying, but it's "normal" for int to be 32-bit on 32-bit machines, so x86 32-bit ABIs all chose that.
When ISAs were extended from 32 to 64-bit, there was zero performance reason to make int wider, unlike the 16->32 case. (Also, in that case short stayed 16-bit, so there was a typename for both 16 and 32-bit integers even before C99 stdint.h existed.)
On x86-64, the default operand-size is still 32-bit; mov rax, rcx takes an extra prefix byte (REX.W) vs. mov eax, ecx, so 32-bit is slightly more efficient. Also, 64-bit multiply was slightly slower on some CPUs, and 64-bit division is significantly slower than 32-bit even on current Intel CPUs. See also: The advantages of using 32-bit registers/instructions in x86-64.
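As a rough sketch of the code-size point (the copy32/copy64 names are made up for illustration, and the comments assume typical gcc/clang -O2 output on x86-64; exact instruction choice can differ), the 32-bit copy avoids the REX.W prefix that the 64-bit copy needs:

#include <stdint.h>

/* Typically compiles to "mov eax, edi" (2 bytes: 89 F8). */
uint32_t copy32(uint32_t x) { return x; }

/* Typically compiles to "mov rax, rdi" (3 bytes: 48 89 F8),
   the extra 0x48 byte being the REX.W prefix. */
uint64_t copy64(uint64_t x) { return x; }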
Also, compilers need a primitive type for int32_t, if they want to provide the optional int32_t at all. (The fixed-width 2's complement types are optional, unlike int_least32_t and so on, which aren't guaranteed to be 2's complement or free of padding.)
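A strictly portable program can test for the optional type before using it, something like this sketch:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
#ifdef INT32_MAX                   /* defined only if int32_t exists */
    int32_t v = 123;
    printf("int32_t available: %ld\n", (long)v);
#else
    int_least32_t v = 123;         /* always provided by <stdint.h>  */
    printf("falling back to int_least32_t: %ld\n", (long)v);
#endif
    return 0;
}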
Compilers with 16-bit short and 64-bit int could have an implementation-specific type name like __int32 that they use as the typedef for int32_t / uint32_t, so this argument isn't a total showstopper. But it would be weird.
When growing from 16 to 32, it made sense to change int to be wider than the ISO C minimum, because you still have short as a name for 16-bit. (This argument isn't super great, because you do have long as a name for 32-bit integers on 32-bit systems.)
But when growing to 64-bit, you want some type to be a 32-bit integer type. (And long can't be narrower than int.) char / short / int / long (or long long) covers all 4 possible operand-sizes. int32_t isn't guaranteed to be available on all systems, so expecting everyone to use that if they want 32-bit signed integers is not a viable option for portable code.
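Putting that together (a sketch; the numbers in the comments assume a typical x86-64 System V implementation such as Linux or macOS, and only the relative ordering is guaranteed by ISO C):

#include <stdio.h>

int main(void)
{
    printf("char      : %zu\n", sizeof(char));       /* 1 */
    printf("short     : %zu\n", sizeof(short));      /* 2 */
    printf("int       : %zu\n", sizeof(int));        /* 4 */
    printf("long      : %zu\n", sizeof(long));       /* 8 here, 4 on Windows x64 */
    printf("long long : %zu\n", sizeof(long long));  /* 8 */
    return 0;
}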
Footnote 1:
You could argue either way whether it's better for long to be a 32 or 64-bit type. Microsoft's choice of keeping it 32-bit meant that struct layouts using long might not change between 32 and 64-bit code (but they would if they included pointers).
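For instance (a sketch with a made-up struct, assuming the usual alignment rules), a struct like the one below keeps its 32-bit layout under Windows x64's LLP64 model, but grows under x86-64 System V's LP64 model because long becomes 8 bytes:

#include <stdio.h>

struct record {
    long id;      /* 4 bytes on Windows x64, 8 on x86-64 System V */
    int  count;   /* 4 bytes on both                              */
};

int main(void)
{
    /* 8 bytes on Windows x64 (same as in 32-bit code),
       16 bytes on x86-64 System V (8 + 4 + 4 padding).  */
    printf("sizeof(struct record) = %zu\n", sizeof(struct record));
    return 0;
}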
ISO C requires long to be at least a 32-bit type. (Actually they define it in terms of the min and max values that can be represented, but elsewhere they do require that integer types are binary integers with optional padding.)
Anyway, some code uses long because it needs a 32-bit type but doesn't need 64; in many cases more bits isn't better, they're just not needed.
Within a single ABI like x86-64 System V, it's semi-convenient to always have long be the same width as a pointer, but since portable code always needs to use unsigned long long or uint64_t or uint_least64_t or uintptr_t depending on the use-case, it might be a mistake for x86-64 System V to have chosen 64-bit long.
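In other words, portable code picks the type by what it actually needs rather than by assuming long's width; a sketch (assuming uint64_t and uintptr_t are provided, as they are on all mainstream x86-64 implementations):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int x = 0;

    uint64_t  big  = 0x100000000u;    /* "I need at least 64 value bits"       */
    uintptr_t addr = (uintptr_t)&x;   /* "I need to hold a pointer's bits"     */

    printf("%llu %llu\n",
           (unsigned long long)big,
           (unsigned long long)addr);
    return 0;
}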
OTOH, wider types for locals can sometimes save instructions by avoiding sign-extension when indexing a pointer, but the fact that signed overflow is undefined behaviour often lets compilers widen int in the asm when convenient.
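A typical case is a loop counter used as an array index; a sketch with a made-up sum() function (the comment assumes common gcc/clang -O2 behaviour on x86-64, which isn't guaranteed):

/* Because signed overflow is undefined, the compiler may assume i never
   wraps, so it can keep i in a full 64-bit register and index a[i]
   without re-sign-extending on every iteration. With an unsigned 32-bit
   index, wrap-around is defined behaviour, which constrains how the
   compiler may transform the indexing. */
long sum(const long *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}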