Unless I'm missing something, `(void)x` also has undefined behavior
if x is uninitialized,
though it's very likely to do nothing in practice.
The behavior [of int a = a;] is undefined. In C11 and later
(N1570 6.3.2.1p2):
Except when [...] an lvalue that does not have array type is
converted to the value stored in the designated object (and is
no longer an lvalue); this is called lvalue conversion.
[...]
If the lvalue designates an object of automatic storage
duration that could have been declared with the register
storage class (never had its address taken), and that object
is uninitialized (not declared with an initializer and no
assignment to it has been performed prior to use), the
behavior is undefined.
Long digression follows.
The "could have been declared with the register storage class"
seems quite odd. And in fact it is quite odd.
It's tempting to assume that `int n = n;` did not have undefined
behavior prior to C11, or that accessing an automatic object whose
address has not been taken does not have undefined behavior even
in C11 or later, but it's not that simple.
In C90, the non-normative Annex G (renamed to Annex J in later
editions) says:
The behavior in the following circumstances is undefined:
[...]
- The value of an uninitialized object that has automatic storage
duration is used before a value is assigned (6.5.7).
6.5.7 discusses initialization, and says that "If an object that
has automatic storage duration is not initialized explicitly, its
value is indeterminate", and C90's definition of "undefined behavior" explicitly refers to use of indeterminately valued objects, though
it's not 100% clear that using an indeterminate value *always*
has undefined behavior.
So in C90, `int n = n;` explicitly had undefined behavior, even if
all possible bit representations for an object of type int correspond
to valid values (C90 didn't mention "trap representations").
C99 added a definition for "indeterminate value": "either an
unspecified value or a trap representation", and drops the mention
of indeterminate values in the definition of "undefined behavior".
It dropped the reference to uninitialized objects in Annex G/J.
I believe that in C99, `int n = n;` is well defined *if* int
has no trap representations, or if the representation stored in
the memory occupied by n happens not to be a trap representation.
If int has trap representations, and that memory happens to contain
such a representation, the behavior is undefined.
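As a rough illustration of the "no trap representations" condition (this is not from the standard or from the thread), here is a sketch that checks two things: that int has no padding bits, and that INT_MIN is below -INT_MAX. When both hold, every bit pattern of an int represents a value. The function name is made up, and the check is deliberately conservative: it reports 0 whenever it cannot establish the property.

```c
#include <limits.h>

/* Hypothetical helper, not part of any standard library.
   Returns 1 if every bit pattern of int is a valid value on this
   implementation, 0 if that cannot be established. */
int int_has_no_trap_representations(void)
{
    int value_bits = 0;

    /* Count the value bits of int by shifting INT_MAX down to zero. */
    for (int v = INT_MAX; v != 0; v >>= 1) {
        value_bits++;
    }

    /* Value bits plus one sign bit must fill the whole object;
       anything left over would be padding bits. */
    int no_padding = (value_bits + 1) == (int)(sizeof(int) * CHAR_BIT);

    /* With no padding, the pattern "sign bit set, value bits zero" is
       an ordinary value only if INT_MIN is one below -INT_MAX. */
    int full_range = INT_MIN < -INT_MAX;

    return no_padding && full_range;
}
```

Even where this returns 1, the C11 wording quoted above still makes `int n = n;` undefined when n's address is never taken.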
I found a discussion in comp.std.c from 2023, subject "Does reading
an uninitialized object have undefined behavior?".
The discontinued IA-64/Itanium processor had something called
"NaT", "Not a Thing". NaT representations exist only in CPU
registers, not in memory. (Imagine an extra bit for each register
indicating whether the register contains a "thing".) A NaT allows
for representations that act like C trap representations (called
non-value representations in C23) even for types with no trap
representations (for example where all 2**N possible representations correspond to valid values) -- but again, only in CPU registers.
https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm
So the "could have been declared with the register storage class"
wording was added in C11 specifically to cater to the IA-64. This
change would have been superfluous in C90, where the behavior was
undefined anyway, but is a semantically significant change between
C99 and C11. (If some future CPU has something like NaT that can
be stored in memory, the wording might need to be updated yet again.)
My takeaway is that if it requires this much research to determine
whether accessing the value of an uninitialized object has undefined
behavior (in which circumstances and which edition of the standard),
I'll just avoid doing so altogether. I'll initialize objects
when they're defined whenever practical. If it's not practical
for some reason, I won't initialize it with some dummy value; I'll
leave it uninitialized so the compiler has a chance to warn me if
I accidentally use it before assigning a value to it.
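A minimal sketch of that habit, with made-up names (`classify`, `label`). The variable has no natural initial value, so it gets no dummy initializer, and every path assigns it before the single read.

```c
/* Hypothetical example function, not from the thread. */
const char *classify(int n)
{
    const char *label;   /* deliberately no dummy initializer */

    if (n < 0) {
        label = "negative";
    } else if (n == 0) {
        label = "zero";
    } else {
        label = "positive";
    }

    /* Every branch above assigns label, so this read is well defined.
       If a branch were forgotten, a compiler would have a chance to
       diagnose the uninitialized use; initializing label to NULL
       "just in case" would take that chance away. */
    return label;
}
```

Whether a given compiler actually warns about a missed branch depends on the compiler, the options, and the optimization level.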
David Brown <david.brown@hesbynett.no> writes:
[...]
As far as I understand it (and I hope to be corrected if I am wrong),
Your hope is about to be fulfilled.
"int a = a;" is not undefined behaviour as long as the implementation
does not have trap values for "int". It simply leaves "a" as an
unspecified value - just like "int a;" does. Thus it is not in any
way "worse" than "int a;" as far as C semantics are concerned. Any
difference is a matter of implementation - and the usual
implementation effect is to disable "not initialised" warnings.
The behavior is undefined. In C11 and later (N1570 6.3.2.1p2):
Except when [...] an lvalue that does not have array type is
converted to the value stored in the designated object (and is no
longer an lvalue); this is called lvalue conversion.
[...]
If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never had
its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior
to use), the behavior is undefined.
It is in much the same category as "(void) x;", which is an idiom for
skipping an "unused variable" or "unused parameter" warning.
Unless I'm missing something, `(void)x` also has undefined behavior
if x is uninitialized, though it's very likely to do nothing in
practice.
Long digression follows.
The "could have been declared with the register storage class" seems
quite odd. And in fact it is quite odd.
It's tempting to assume that `int n = n;` did not have undefined
behavior prior to C11, or that accessing an automatic object whose
address has not been taken does not have undefined behavior even
in C11 or later, but it's not that simple.
In C90, the non-normative Annex G (renamed to Annex J in later
editions) says:
The behavior in the following circumstances is undefined:
[...]
- The value of an uninitialized object that has automatic storage
duration is used before a value is assigned (6.5.7).
6.5.7 discusses initialization, and says that "If an object that
has automatic storage duration is not initialized explicitly, its
value is indeterminate", and C90's definition of "undefined behavior" explicitly refers to use of indeterminately valued objects, though
it's not 100% clear that using an indeterminate value *always*
has undefined behavior.
So in C90, `int n = n;` explicitly had undefined behavior, even if
all possible bit representations for an object of type int correspond
to valid values (C90 didn't mention "trap representations").
C99 added a definition for "indeterminate value": "either an
unspecified value or a trap representation", and drops the mention
of indeterminate values in the definition of "undefined behavior".
It dropped the reference to uninitialized objects in Annex G/J.
I believe that in C99, `int n = n;` is well defined *if* int
has no trap representations, or if the representation stored in
the memory occupied by n happens not to be a trap representation.
If int has trap representations, and that memory happens to contain
such a representation, the behavior is undefined.
I found a discussion in comp.std.c from 2023, subject "Does reading
an uninitialized object have undefined behavior?".
The discontinued IA-64/Itanium processor had something called
"NaT", "Not a Thing". NaT representations exist only in CPU
registers, not in memory. (Imagine an extra bit for each register
indicating whether the register contains a "thing".) A NaT allows
for representations that act like C trap representations (called
non-value representations in C23) even for types with no trap
representations (for example where all 2**N possible representations correspond to valid values) -- but again, only in CPU registers.
https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm
So the "could have been declared with the register storage class"
wording was added in C11 specifically to cater to the IA-64. This
change would have been superfluous in C90, where the behavior was
undefined anyway, but is a semantically significant change between
C99 and C11. (If some future CPU has something like NaT that can
be stored in memory, the wording might need to be updated yet again.)
My takeaway is that if it requires this much research to determine
whether accessing the value of an uninitialized object has undefined
behavior (in which circumstances and which edition of the standard),
I'll just avoid doing so altogether. I'll initialize objects
when they're defined whenever practical. If it's not practical
for some reason, I won't initialize it with some dummy value; I'll
leave it uninitialized so the compiler has a chance to warn me if
I accidentally use it before assigning a value to it.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
The "could have been declared with the register storage class"
seems quite odd. And in fact it is quite odd.
I don't have the same reaction. The point of this phrase is that
undefined behavior occurs only for variables that don't have
their address taken. The phrase used describes that nicely.
Any questions related to "registerness" can be ignored, because
'register' in C really has nothing to do with hardware registers,
despite the name.
DR 338 is explicitly motivated by an IA-64 feature that applies only to
CPU registers. An object whose address is taken can't be stored (only)
in a register, so it can't have a NaT representation.
The phrase used is "could have been declared with register storage class (never had its address taken)". Surely "never had its address taken"
would have been clear enough if CPU registers weren't a big part of the motivation.
David Brown <david.brown@hesbynett.no> writes:
On 20/03/2025 11:20, Keith Thompson wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
The "could have been declared with the register storage class"
seems quite odd. And in fact it is quite odd.
I don't have the same reaction. The point of this phrase is that
undefined behavior occurs only for variables that don't have
their address taken. The phrase used describes that nicely.
Any questions related to "registerness" can be ignored, because
'register' in C really has nothing to do with hardware registers,
despite the name.
DR 338 is explicitly motivated by an IA-64 feature that applies only to
CPU registers. An object whose address is taken can't be stored (only)
in a register, so it can't have a NaT representation.
The phrase used is "could have been declared with register storage
class (never had its address taken)". Surely "never had its address taken"
would have been clear enough if CPU registers weren't a big part of the
motivation.
I too think the phrasing is a bit odd.
Just because a variable's address is taken, does not mean it cannot be
put in a cpu register by the compiler. If the variable is not
accessed in a way that actually requires putting it in memory, then
the compiler can put it in a cpu register (or otherwise optimise it).
So simply taking the address of a variable on IA-64 does not mean it
cannot be in a register, and thus does not necessarily mean it cannot
be NaT. Taking the address of a variable means the variable cannot be
declared "register", but it does not mean it cannot be /in/ a
register.
Sure, any variable that's stored in memory can be mirrored by holding
its value in a register.
int n = 42; // Assume n is assigned a memory address
printf("n+1=%d n+2=%d\n", n+1, n+2);
A compiler could plausibly store the value of n in a register before computing n+1, and then reuse the register value to compute n+2.
My understanding is that IA-64 NaT (Not a Thing) representations
exist only for registers, and the NaT bit should be cleared when
a value is stored in the register.
The odd wording in the standard allows an IA-64 C compiler to
take advantage of NaT representations for their intended purpose.
It might impose some minor constraints on what machine code can be
generated, but *most* of the cases where a NaT could be accessed
are undefined behavior in C.
It seems very strange to me that this is UB:
int foo1(void) {
int x;
return x;
}
while this is not:
int foo2(void) {
int x;
int * p = &x;
return x;
}
(Unfortunately, godbolt.org doesn't seem to have a gcc IA-64 compiler
in its list.)
It strikes me that it would have been far simpler for the standard
simply to say that using the value of an uninitialised and unassigned
variable is undefined behaviour.
In C90, it was. C99 changed that, making the behavior defined if the representation is not a trap representation.
For C99, a conforming IA-64 C compiler would have had to go out of its
way to avoid accessing NaT representations. For example, if you wrote
{
int n;
n;
}
the most straightforward IA-64 code would store n in a register and
not initialize it, resulting in a trap when the register is read.
A compiler might have to generate code to store an arbitrary value
in the register to avoid the trap.
I'm undecided on whether reading the value of an uninitialized
automatic object *should* be undefined behavior, but given that
it isn't, the C11 committee made the smallest possible change to
cater to IA-64 semantics.
David Brown <david.brown@hesbynett.no> writes:
I see that, but I believe it would be much simpler and clearer if
attempting to read an uninitialised and unassigned local variable were
undefined behaviour in every case.
I probably agree (I haven't given it all that much thought), but the committee made a specific decision between C90 and C99 to say that
reading an uninitialized automatic object is *not* undefined behavior.
I don't know why they did that (though, all else being equal, reducing
the number of instances of undefined behavior is a good thing), but
reversing that decision for this one issue is not something they decided
to do.
Alternatively, it could have said that the value is unspecified in
every case. Then on the IA-64, the compiler would have to ensure that
registers do not have their NaT bit set even if they are not
initialised - this would not be a difficult task. Enabling use of the
NaT bit for detection of bugs could then be a compiler option if
implementations wanted to provide that feature.
The whole point of the NaT bit is to detect accesses to uninitialized
values. Requiring the compiler to arbitrarily clear that bit
doesn't strike me as a good idea.
I dislike the way that wording was added to the standard specifically
to cater to one specific CPU (which happens to have been discontinued
later). I would have been happier with a more general solution.
I think that making accessing the value of an uninitialized automatic
object UB would have been much cleaner, and it would have allowed for sensible use of NaT by IA-64 compilers. But without knowing *why*
the committee removed that UB between C90 and C99, I'm hesitant to
say it was a mistake.
Meanwhile, I will in effect assume that accessing uninitialized objects
is UB, i.e., I'll carefully avoid doing so.
David Brown <david.brown@hesbynett.no> writes:
[...]I believe it would be much simpler and clearer if attempting
to read an uninitialised and unassigned local variable were
undefined behaviour in every case.
I probably agree (I haven't given it all that much thought), but
the committee made a specific decision between C90 and C99 to say
that reading an uninitialized automatic object is *not* undefined
behavior. I don't know why they did that (though, all else
being equal, reducing the number of instances of undefined
behavior is a good thing), but reversing that decision for this
one issue is not something they decided to do.
Alternatively, it could have said that the value is unspecified
in every case. Then on the IA-64, the compiler would have to
ensure that registers do not have their NaT bit set even if they
are not initialised - this would not be a difficult task.
Enabling use of the NaT bit for detection of bugs could then be a
compiler option if implementations wanted to provide that
feature.
The whole point of the NaT bit is to detect accesses to
uninitialized values. Requiring the compiler to arbitrarily clear
that bit doesn't strike me as a good idea.
I dislike the way that wording was added to the standard
specifically to cater to one specific CPU (which happens to have
been discontinued later). I would have been happier with a more
general solution. I think that making accessing the value of an
uninitialized automatic object UB would have been much cleaner,
and it would have allowed for sensible use of NaT by IA-64
compilers.
But without knowing *why* the committee removed that UB between
C90 and C99, I'm hesitant to say it was a mistake.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[...]
An addle-brained view. Anyone who thinks that should be forcibly
removed from any activity involving software development.
Be less rude.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
David Brown <david.brown@hesbynett.no> writes:
[...]I believe it would be much simpler and clearer if attempting
to read an uninitialised and unassigned local variable were
undefined behaviour in every case.
I probably agree (I haven't given it all that much thought), but
the committee made a specific decision between C90 and C99 to say
that reading an uninitialized automatic object is *not* undefined
behavior. I don't know why they did that (though, all else
being equal, reducing the number of instances of undefined
behavior is a good thing), but reversing that decision for this
one issue is not something they decided to do.
Your description of what was done is wrong. It is still the case in
C99 that trying to access an uninitialized object is undefined
behavior, at least potentially, except for accesses using a type
that either is a character type or has no trap representations (and
all types other than unsigned char may have trap representations,
depending on the implementation). A statement like
int a = a;
may still be given a warning as potential undefined behavior, even
in C99.
I had already mentioned that distinction earlier in the thread.
The mistake is thinking that UB for uninitialized access was
removed in C99. It wasn't. Narrowed, yes; removed, no.
Acknowledged.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[how to indicate a variable not being used is okay]
[some quoted text rearranged]
Unless I'm missing something, `(void)x` also has undefined behavior
if x is uninitialized,
Right. Using (void)&x is better.
I'm not convinced -- and it's far less idiomatic.
I don't think
I've ever seen (void)&x in code, and if I did I'd wonder what the
author's intent was.
(void)x is a common idiom for hinting to the compiler that it
doesn't need to complain about x being unused. (void)&x doesn't
tell the compiler that the *value* of x is used. I'm not sure how
much difference that makes.
Even with (void)x and/or (void)&x, a compiler *could* still warn
about x being unused, or about the programmer's use of an ugly font.
though it's very likely to do nothing in practice.
Unless x is volatile qualified, in which case there must be an access
to x in the generated code.
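For reference, a sketch of the `(void)x` idiom in the setting where the uninitialized-read question never comes up: an unused function parameter, which always holds a value by the time the body runs. The names `on_event` and `user_data` are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical callback whose signature is imposed by some API,
   so the second parameter has to exist even though it is unused. */
int on_event(int code, void *user_data)
{
    /* Evaluate user_data and discard the result. A parameter is
       always initialized by the call, so this is well defined.
       It is only a hint; a compiler is still free to warn. */
    (void)user_data;

    return code * 2;
}
```

Calling `on_event(21, NULL)` returns 42; the cast has no visible effect because `user_data` is not volatile qualified.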
The behavior [of int a = a;] is undefined. In C11 and later
(N1570 6.3.2.1p2):
Except when [...] an lvalue that does not have array type is
converted to the value stored in the designated object (and is
no longer an lvalue); this is called lvalue conversion.
[...]
If the lvalue designates an object of automatic storage
duration that could have been declared with the register
storage class (never had its address taken), and that object
is uninitialized (not declared with an initializer and no
assignment to it has been performed prior to use), the
behavior is undefined.
Long digression follows.
The "could have been declared with the register storage class"
seems quite odd. And in fact it is quite odd.
I don't have the same reaction. The point of this phrase is that
undefined behavior occurs only for variables that don't have
their address taken. The phrase used describes that nicely.
Any questions related to "registerness" can be ignored, because
'register' in C really has nothing to do with hardware registers,
despite the name.
DR 338 is explicitly motivated by an IA-64 feature that applies only to
CPU registers. An object whose address is taken can't be stored (only)
in a register, so it can't have a NaT representation.
The phrase used is "could have been declared with register storage class (never had its address taken)". Surely "never had its address taken"
would have been clear enough if CPU registers weren't a big part of the motivation.
https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm
So the "could have been declared with the register storage class"
wording was added in C11 specifically to cater to the IA-64. This
change would have been superfluous in C90, where the behavior was
undefined anyway, but is a semantically significant change between
C99 and C11. (If some future CPU has something like NaT that can
be stored in memory, the wording might need to be updated yet again.)
My takeaway is that if it requires this much research to determine
whether accessing the value of an uninitialized object has undefined
behavior (in which circumstances and which edition of the standard),
I'll just avoid doing so altogether. I'll initialize objects
when they're defined whenever practical. If it's not practical
for some reason, I won't initialize it with some dummy value; I'll
leave it uninitialized so the compiler has a chance to warn me if
I accidentally use it before assigning a value to it.
I think you are overthinking the question. In cases where it's
important to give an initial value to a variable, and can be done
so at the point of its declaration, use an initializer; otherwise
don't.
My overthinking led me to essentially the same conclusion, so I don't
see the problem. And I also found it to be an interesting exploration
of how certain aspects of the C standard have evolved over time.
We don't have to read several different C standards, or
even only one, to reach that conclusion.
No, but we do have to read one or more C standards to counter an
argument that `int a = a;` is well defined.
If someone wants to know
exactly which border cases are safe and which cases are not, then
reading the relevant version(s) of the C standard is needed, but
in most situations it isn't. It's important for the C standard to
be precise about what it prescribes, but as far as initialization
goes it's easy to write code that doesn't need that level of
detail. Compiler writers need to know such things; in the
particular case of when and where to initialize, most developers
don't.
Most developers don't read this newsgroup.