So, even if we had compile-time checks for bugs, C compilers and the
standard are not prepared to handle the implications to make C a safer language.
On Thu, 8 Feb 2024 01:01:56 -0300, Thiago Adams wrote:
So, even if we had compile-time checks for bugs, C compilers and the
standard are not prepared to handle the implications to make C a safer
language.
Do you want C to turn into Java? In Java, rules about reachability are
built into the language. And these rules are simplistic ones, based on
the state of the art back in the 1990s, not taking account of
improvements in compiler technology since then. For example, in the
following code, the uninitialized declaration of “Result” is a compile-time error, even though a human looking at the code can figure
out that there is no way it will be left uninitialized at the point of reference:
Let's say C compilers can detect all sorts of bugs at compile time.
How would C compilers report that? As an error or a warning?
Let's use this sample:
int main() {
int a = 1;
a = a / 0;
}
GCC says:
warning: division by zero is undefined [-Wdivision-by-zero]
5 | a = a / 0;
| ^ ~
If GCC or any other compiler reported this as an error, C programmers would likely complain. Am I right?
So, even if we had compile-time checks for bugs, C compilers and the
standard are not prepared to handle the implications to make C a safer language.
From my point of view, we need an error, not a warning.
But we also
need a way to ignore the error in case the programmer wants to see what
happens with a division by zero, for instance. (Please note that this
topic IS NOT about this specific warning; it is just a sample.)
Warnings work more or less like this. The problem with warnings is that
they are not standardized - the "name/number" and the way to
disable/enable them.
So this is the first problem we need to solve in C to make the language safer. We need a mechanism.
I also believe we can have a standard profile for warnings that will
change the compiler behaviour; for instance, making undefined behaviour
an error.
The C standard is complacent about the lack of error messages. Sometimes
it says "a message is encouraged...". This shifts responsibility away from
the language. But when someone says "C is dangerous," the language as a
whole is blamed, regardless of whether you are using MISRA, for instance.
Thus, not only are the mechanics of the language unprepared; the standard
is also not prepared to assume the responsibility of being a source of
guidance and safety.
Thiago Adams <thiago.adams@gmail.com> writes:
Let's say C compilers can detect all sorts of bugs at compile time.
How would C compilers report that? As an error or a warning?
Let's use this sample:
int main() {
int a = 1;
a = a / 0;
}
GCC says:
warning: division by zero is undefined [-Wdivision-by-zero]
5 | a = a / 0;
| ^ ~
If GCC or any other compiler reported this as an error, C
programmers would likely complain. Am I right?
Someone will always complain, but a conforming compiler can report this
as a fatal error.
Division by zero has undefined behavior. The standard's definition of
undefined behavior includes this note:
NOTE
Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during
translation or program execution in a documented manner
characteristic of the environment (with or without the issuance
of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
Though it's not quite that simple. Rejecting the program if the
compiler can't prove that the division will be executed would IMHO
be non-conforming. Code that's never executed has no behavior, so it
doesn't have undefined behavior.
But of course any compiler can reject anything it likes in
non-conforming mode. See for example "gcc -Werror".
But even ignoring that, a culture of paying very close attention to
non-fatal warnings could go a long way towards making C safer (assuming compilers are clever enough to issue good warnings).
From my point of view, we need an error, not a warning. But we also
need a way to ignore the error in case the programmer wants to see
what happens, with a division by zero, for instance. (Please note that
this topic IS NOT about this specific warning; it is just a sample.)
Warnings work more or less like this. The problem with warnings is
that they are not standardized - the "name/number" and the way to disable/enable them.
So this is the first problem we need to solve in C to make the
language safer. We need a mechanism.
I also believe we can have a standard profile for warnings that will
change the compiler behaviour; for instance, making undefined
behaviour an error.
Em 2/8/2024 5:17 AM, David Brown escreveu:
...
I think it would be better if compilers were stricter in their default
modes, even if their stricter compliant modes have to reduce the
severity of some warnings. People should be allowed to write "weird"
code that looks very much like an error - but perhaps it is those
people who should need extra flags or other effort, not those that
write "normal" code and want to catch their bugs. (I know this can't
be done for tools like gcc, because it could cause problems with
existing code.)
Yes, this is my point.
But I believe we need a standard mechanism, and this is the first step
towards safety in C.
Consider this code.
int main(void)
{
int a = 1, b = 2;
#ifdef _MSC_VER
#pragma warning( push )
#pragma warning( disable : 4706 )
#else
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wparentheses"
#endif
if (a = b){}
#ifdef _MSC_VER
#pragma warning( pop )
#else
#pragma GCC diagnostic pop
#endif
}
This code wants to use a = b inside the if condition.
The code shows how to disable the warning in GCC and MSVC.
If we had in the standard a number for the warning, and a mechanism for
disabling it, then we could have something like
int main(void)
{
int a = 1, b = 2;
if (a = b) [[disable:4706]]
{
}
}
Maybe
int main(void)
{
int a = 1, b = 2;
if (a = b) _ignore("I want to assign here...")
{
}
}
That is applied to any warning on that specific line.
The advantage is that no warning ID is necessary.
Let's say C compilers can detect all sorts of bugs at compile time.
How would C compilers report that? As an error or a warning?
Let's use this sample:
int main() {
int a = 1;
a = a / 0;
}
GCC says:
warning: division by zero is undefined [-Wdivision-by-zero]
5 | a = a / 0;
| ^ ~
If GCC or any other compiler reported this as an error, C programmers would likely complain. Am I right?
So, even if we had compile-time checks for bugs, C compilers and the
standard are not prepared to handle the implications to make C a safer language.
From my point of view, we need an error, not a warning. But we also
need a way to ignore the error in case the programmer wants to see what
happens with a division by zero, for instance. (Please note that this
topic IS NOT about this specific warning; it is just a sample.)
Warnings work more or less like this. The problem with warnings is that
they are not standardized - the "name/number" and the way to
disable/enable them.
So this is the first problem we need to solve in C to make the language safer. We need a mechanism.
I also believe we can have a standard profile for warnings that will
change the compiler behaviour; for instance, making undefined behaviour
an error.
The C standard is complacent about the lack of error messages. Sometimes
it says "a message is encouraged...". This shifts responsibility away from
the language. But when someone says "C is dangerous," the language as a
whole is blamed, regardless of whether you are using MISRA, for instance.
Thus, not only are the mechanics of the language unprepared; the standard
is also not prepared to assume the responsibility of being a source of
guidance and safety.
One more:
struct X {double d;};
struct Y {int i;};
int main()
{
struct X x;
struct Y *p;
p = &x;
}
GCC says:
warning: incompatible pointer types assigning to 'struct Y *' from
'struct X *' [-Wincompatible-pointer-types]
7 | p = &x;
| ^ ~~
For this one a C++ compiler gives:
<source>:7:6: error: cannot convert 'X*' to 'Y*' in assignment
7 | p = &x;
So this may also be related to the "spirit of C". I like the idea that
the programmer can do whatever they want.
Consider the question
"Is C language safe?"
The answer will be: well, the language itself is very vague; it depends
on the compiler you use.
Em 2/8/2024 12:25 PM, David Brown escreveu:
I was having a look at the C# specification. It uses pragmas.
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/preprocessor-directives#nullable-context
Sample
#pragma warning disable 414, CS3021
#pragma warning restore CS3021
It also has warning numbers; I am not sure whether other C# compilers
use the same warning IDs.
David Brown <david.brown@hesbynett.no> writes:
On 08/02/2024 05:59, Keith Thompson wrote:
Thiago Adams <thiago.adams@gmail.com> writes:
Let's say C compilers can detect all sorts of bugs at compile time.
How would C compilers report that? As an error or a warning?
Let's use this sample:
int main() {
int a = 1;
a = a / 0;
}
GCC says:
warning: division by zero is undefined [-Wdivision-by-zero]
5 | a = a / 0;
| ^ ~
If GCC or any other compiler reported this as an error, C programmers
would likely complain. Am I right?
Someone will always complain, but a conforming compiler can report this
as a fatal error.
I'm not /entirely/ convinced. Such code is only undefined behaviour
at run-time, I believe. A compiler could reject the code (give a
fatal error) if it is sure that this will be reached when the code is
run. But can it be sure of that, even if it is in "main()" ?
Freestanding implementations don't need to run "main()" (not all my
programs have had a "main()" function), and the freestanding/hosted
implementation choice is a matter of the implementation, not just the
compiler.
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
I was assuming a hosted implementation -- and the compiler knows whether
its implementation is hosted or freestanding.
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is by
definition always invoked, and you can generally prove that at least
some code is always reached from that. However it is the halting
problem, and you can never prove for all cases, even if the code must
be reached or not reached regardless of runtime inputs.
On 09/02/2024 07:19, David Brown wrote:
On 08/02/2024 17:14, Malcolm McLean wrote:
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is by
definition always invoked, and you can generally prove that at least
some code is always reached from that. However it is the halting
problem, and you can never prove for all cases, even if the code must
be reached or not reached regardless of runtime inputs.
It is not "the halting problem". What you are trying to say is that
it is undecidable or not a computable problem in the general case.
The "is this code reached?" problem is the halting problem with the
trivial and unimportant difference that the code in question does not
have to be "exit()".
On 09/02/2024 08:33, David Brown wrote:
On 09/02/2024 08:46, Malcolm McLean wrote:
On 09/02/2024 07:19, David Brown wrote:
On 08/02/2024 17:14, Malcolm McLean wrote:
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is
by definition always invoked, and you can generally prove that at
least some code is always reached from that. However it is the
halting problem, and you can never prove for all cases, even if the
code must be reached or not reached regardless of runtime inputs.
It is not "the halting problem". What you are trying to say is
that it is undecidable or not a computable problem in the general case.
The "is this code reached?" problem is the halting problem with the
trivial and unimportant difference that the code in question does not
have to be "exit()".
No, it is not.
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other.
But that does not make them the same problem.
And even if they /were/ the same thing, writing "this is undecidable"
or "this is infeasible to compute" is clear and to the point. Writing
"this is the halting problem" is name-dropping a computer science
theory in order to look smart - and like most such attempts, is more
smart-arse than smart.
Well I've been accused of wasting my English degree, and so now I'm
going to accuse you of wasting your mathematics-related degree.
On 09/02/2024 08:46, Malcolm McLean wrote:
On 09/02/2024 07:19, David Brown wrote:
On 08/02/2024 17:14, Malcolm McLean wrote:
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is by
definition always invoked, and you can generally prove that at least
some code is always reached from that. However it is the halting
problem, and you can never prove for all cases, even if the code must
be reached or not reached regardless of runtime inputs.
It is not "the halting problem". What you are trying to say is that it
is undecidable or not a computable problem in the general case.
The "is this code reached?" problem is the halting problem with the
trivial and unimportant difference that the code in question does not
have to be "exit()".
No, it is not.
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other.
But that does not make them the same problem.
On 09/02/2024 10:24, Ben Bacarisse wrote:
David Brown <david.brown@hesbynett.no> writes:
On 09/02/2024 08:46, Malcolm McLean wrote:
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other.
But that does not make them the same problem.
Sure.  But it's not the halting problem for another reason as well.
In the formal models (that have halting problems and so on), it's "the
computation" that is the problem instance. That is, the code and all
the data are presented for an answer. A C compiler has to decide on
reachability without knowing the input.
For example, is this code "undefined":
int c, freq[128];
while ((c = getchar()) != EOF) freq[c]++;
? Maybe it was knocked up as a quick way to collect stats on base64
encoded data streams. If so, it's fine (at least in terms of UB).
Perhaps until you read an input that has 2**31 or more of the same
character.
David Brown <david.brown@hesbynett.no> writes:
On 09/02/2024 08:46, Malcolm McLean wrote:
The two problems can be shown to be equivalently "hard" - that is, if you
could find a solution to one, it would let you solve the other. But that
does not make them the same problem.
Sure. But it's not the halting problem for another reason as well.
In the formal models (that have halting problems and so on), it's "the computation" that is the problem instance. That is, the code and all
the data are presented for an answer. A C compiler has to decide on reachability without knowing the input.
For example, is this code "undefined":
int c, freq[128];
while ((c = getchar()) != EOF) freq[c]++;
? Maybe it was knocked up as a quick way to collect stats on base64
encoded data streams. If so, it's fine (at least in terms of UB).
On 09/02/2024 10:24, Ben Bacarisse wrote:
David Brown <david.brown@hesbynett.no> writes:
On 09/02/2024 08:46, Malcolm McLean wrote:
On 09/02/2024 07:19, David Brown wrote:
On 08/02/2024 17:14, Malcolm McLean wrote:
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is by
definition always invoked, and you can generally prove that at least
some code is always reached from that. However it is the halting
problem, and you can never prove for all cases, even if the code must
be reached or not reached regardless of runtime inputs.
It is not "the halting problem". What you are trying to say is that it
is undecidable or not a computable problem in the general case.
The "is this code reached?" problem is the halting problem with the
trivial and unimportant difference that the code in question does not
have to be "exit()".
No, it is not.
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other. But
that does not make them the same problem.
Sure.  But it's not the halting problem for another reason as well.
In the formal models (that have halting problems and so on), it's "the
computation" that is the problem instance. That is, the code and all
the data are presented for an answer. A C compiler has to decide on
reachability without knowing the input.
For example, is this code "undefined":
int c, freq[128];
while ((c = getchar()) != EOF) freq[c]++;
? Maybe it was knocked up as a quick way to collect stats on base64
encoded data streams. If so, it's fine (at least in terms of UB).
Yes, but I specified "regardless of runtime inputs".
If a signed integer overflows the behaviour is undefined, so you also
have to prove that the input stream is short. And of course you also
forgot to initialise to zero.
On 09/02/2024 10:24, Ben Bacarisse wrote:
David Brown <david.brown@hesbynett.no> writes:
On 09/02/2024 08:46, Malcolm McLean wrote:
On 09/02/2024 07:19, David Brown wrote:
On 08/02/2024 17:14, Malcolm McLean wrote:
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is by
definition always invoked, and you can generally prove that at least
some code is always reached from that. However it is the halting
problem, and you can never prove for all cases, even if the code must
be reached or not reached regardless of runtime inputs.
It is not "the halting problem". What you are trying to say is
that it
is undecidable or not a computable problem in the general case.
The "is this code reached?" problem is the halting problem with the
trivial and unimportant difference that the code in question does not
have to be "exit()".
No, it is not.
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other. But
that does not make them the same problem.
Sure.  But it's not the halting problem for another reason as well.
In the formal models (that have halting problems and so on), it's "the
computation" that is the problem instance. That is, the code and all
the data are presented for an answer. A C compiler has to decide on
reachability without knowing the input.
For example, is this code "undefined":
int c, freq[128];
while ((c = getchar()) != EOF) freq[c]++;
? Maybe it was knocked up as a quick way to collect stats on base64
encoded data streams. If so, it's fine (at least in terms of UB).
Yes, but I specified "regardless of runtime inputs". A bit downthread,
you can be forgiven for having missed that. But DB less so, especially
when I'm accused of being the one who doesn't read things properly. And
even when I agree there might be some truth in that, and explain why, I
am then accused of misrepresenting what an Oxford English degree is like.
If you change the halting problem such that some of the symbols on the
tape are allowed to have unknown values then I don't think you are
changing it in any mathematically very interesting way so it is still
"trivial", but if you attempt a halt decider it will substantially change
your programming approach, and so it is no longer "unimportant".
bart <bc@freeuk.com> writes:
On 09/02/2024 10:24, Ben Bacarisse wrote:
David Brown <david.brown@hesbynett.no> writes:
On 09/02/2024 08:46, Malcolm McLean wrote:
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other.
But that does not make them the same problem.
Sure.  But it's not the halting problem for another reason as well.
In the formal models (that have halting problems and so on), it's "the
computation" that is the problem instance. That is, the code and all
the data are presented for an answer. A C compiler has to decide on
reachability without knowing the input.
For example, is this code "undefined":
int c, freq[128];
while ((c = getchar()) != EOF) freq[c]++;
? Maybe it was knocked up as a quick way to collect stats on base64
encoded data streams. If so, it's fine (at least in terms of UB).
Perhaps until you read an input that has 2**31 or more of the same
character.
Yes, that's the point. Any UB is dependent on the input. It's not a
static property of the code.
David Brown <david.brown@hesbynett.no> writes:
[...]
But I am not sure that the /compiler/ knows that it is compiling for a
hosted or freestanding implementation. The same gcc can be used for
Linux hosted user code and a freestanding Linux kernel.
[...]
A conforming compiler must predefine the macro __STDC_HOSTED__ to either
0 or 1 (since C99).
This is something which has long been of fascination to me: how
exactly do you get a C compiler to actually fail a program with a
hard error when there is obviously something wrong, while not also
failing on completely harmless matters.
Thiago Adams <thiago.adams@gmail.com> writes:
Let's say C compilers can detect all sorts of bugs at compile time.
How would C compilers report that? As an error or a warning?
Let's use this sample:
int main() {
int a = 1;
a = a / 0;
}
GCC says:
warning: division by zero is undefined [-Wdivision-by-zero]
5 | a = a / 0;
| ^ ~
If GCC or any other compiler reported this as an error, C
programmers would likely complain. Am I right?
Someone will always complain, but a conforming compiler can report this
as a fatal error.
Division by zero has undefined behavior. The standard's definition of
undefined behavior includes this note:
NOTE
Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during
translation or program execution in a documented manner
characteristic of the environment (with or without the issuance
of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
Though it's not quite that simple. Rejecting the program if the
compiler can't prove that the division will be executed would IMHO
be non-conforming. Code that's never executed has no behavior, so
it doesn't have undefined behavior.
bart <bc@freeuk.com> writes:
[...]
This is something which has long been of fascination to me: how
exactly do you get a C compiler to actually fail a program with a
hard error when there is obviously something wrong, while not also
failing on completely harmless matters.
I think the answer is obvious: unless and until you find someone
who works on a C compiler and who has exactly the same sense that
you do of "when there is obviously something wrong" and of what
conditions fall under the heading of "completely harmless matters",
and also the same sense that you do of how a C compiler should
behave in those cases, you won't get exactly what you want unless
you do it yourself.
On 10/02/2024 01:59, Tim Rentsch wrote:
bart <bc@freeuk.com> writes:
[...]
This is something which has long been of fascination to me: how
exactly do you get a C compiler to actually fail a program with a
hard error when there is obviously something wrong, while not also
failing on completely harmless matters.
I think the answer is obvious: unless and until you find someone
who works on a C compiler and who has exactly the same sense that
you do of "when there is obviously something wrong" and of what
conditions fall under the heading of "completely harmless matters",
and also the same sense that you do of how a C compiler should
behave in those cases, you won't get exactly what you want unless
you do it yourself.
Take this function:
void F() {
F();
F(1);
F(1, 2.0);
F(1, 2.0, "3");
F(1, 2.0, "3", F);
}
Even if /one/ of those calls is correct, the other four can't
possibly be correct as well.
Is there anyone here who doesn't think there is something obviously wrong?
How about this one:
#include <stdio.h>
int main(void) {
int a;
L1:
printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
On 2024-02-10, bart <bc@freeuk.com> wrote:
#include <stdio.h>
int main(void) {
int a;
L1:
printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Unused warnings exist because they help catch bugs.
double distance(double x, double y)
{
return sqrt(x*x + x*x);
}
The diagnostic will not catch all bugs of this type, since just one use is enough to silence it, but catching something is better than nothing.
Removing unused cruft also helps to keep the code clean. Stray material sometimes gets left behind after refactoring, or careless copy paste.
Unused identifiers are a "code smell".
Sometimes something must be left unused. It's good to be explicit about
that: to have some indication that it's deliberately unused.
When I implemented unused warnings in my Lisp compiler, I found a bug right away.
https://www.kylheku.com/cgit/txr/commit/?id=5ee2cd3b2304287c010237e03be4d181412e066f
In this diff hunk against in the assembler:
@@ -217,9 +218,9 @@
(q me.(cur-pos)))
(inc c)
me.(set-pos p)
- (format t "~,5d: ~,08X ~a\n" (trunc p 4) me.(get-word) dis-txt)
+ (format stream "~,5d: ~,08X ~a\n" (trunc p 4) me.(get-word) dis-txt)
(while (< (inc p 4) q)
- (format t "~,5d: ~,08X\n" (trunc p 4) me.(get-word)))
+ (format stream "~,5d: ~,08X\n" (trunc p 4) me.(get-word)))
me.(set-pos q)
(set p q)))
c))
The format function was given argument t, a nickname for standard output, so this code ignored the stream parameter and always sent output to standard output.
With the unused warnings, it got diagnosed.
On Sat, 10 Feb 2024 20:22:52 +0000, bart <bc@freeuk.com> wrote in <uq8lus$3dceu$1@dont-email.me>:
How about this one:
#include <stdio.h>
int main(void) {
int a;
L1:
printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Yet you promoted warnings to errors, just to find a way to make
it fail. :(
("Is there anyone here who thinks that" bart's continuous
complaining about options to gcc deserves any merit?)
On 10/02/2024 21:49, Kaz Kylheku wrote:
On 2024-02-10, bart <bc@freeuk.com> wrote:
#include <stdio.h>
int main(void) {
int a;
L1:
printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Unused warnings exist because they help catch bugs.
double distance(double x, double y)
{
return sqrt(x*x + x*x);
}
The diagnostic will not catch all bugs of this type, since just one use is
enough to silence it, but catching something is better than nothing.
Removing unused cruft also helps to keep the code clean. Stray material
sometimes gets left behind after refactoring, or careless copy paste.
Unused identifiers are a "code smell".
This is a different kind of analysis. IMV it doesn't belong in routine
compilation; it's something you do periodically, or when you're stuck
for ideas.
In your example, maybe you did want x*x*2, and the 'y' is either a
parameter no longer needed, or not yet needed, or temporarily not used.
So it is not 'obviously wrong', and by itself, not using a parameter is harmless.
I'm looking for a result from a compiler which is either Pass or Fail,
not Maybe.
With the unused warnings, it got diagnosed.
So you use the linty options when you're stuck with a bug, as I suggested.
[...] I'm curious why there is resistance to conditionals written
like this:
if( 1 == a)
...that is to say, with the constant first.
I've done that in C and Perl. Are the aesthetics so
bad that I should quit that? Isn't it safer to write
it that way, so that a dropped "=" is pointed out on
compilation?
On 10/02/2024 01:59, Tim Rentsch wrote:
bart <bc@freeuk.com> writes:
[...]
This is something which has long been of fascination to me: how
exactly do you get a C compiler to actually fail a program with a
hard error when there is obviously something wrong, while not also
failing on completely harmless matters.
I think the answer is obvious: unless and until you find someone
who works on a C compiler and who has exactly the same sense that
you do of "when there is obviously something wrong" and of what
conditions fall under the heading of "completely harmless matters",
and also the same sense that you do of how a C compiler should
behave in those cases, you won't get exactly what you want unless
you do it yourself.
Take this function:
void F() {
F();
F(1);
F(1, 2.0);
F(1, 2.0, "3");
F(1, 2.0, "3", F);
}
Even if /one/ of those calls is correct, the other four can't
possibly be correct as well.
Is there anyone here who doesn't think there is something obviously wrong?
How about this one:
#include <stdio.h>
int main(void) {
int a;
L1:
printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[snip discussion of a program that divides by zero]
An implementation can refuse to translate the program, but not
because undefined behavior occurs. The undefined behavior here
happens only when the program is executed, but just compiling the
program doesn't do that. No execution, no undefined behavior.
Still the program may be rejected, because it is not strictly
conforming (by virtue of having output depend on the undefined
behavior if the program is ever run).
Just to be clear, would you say that a conforming hosted implementation
may reject this program:
#include <limits.h>
#include <stdio.h>
int main(void) {
    printf("INT_MAX = %d\n", INT_MAX);
}
solely because it's not strictly conforming?
On 11/02/2024 00:40, vallor wrote:
("Is there anyone here who thinks that" bart's continuous
complaining about options to gcc deserve any merit?)
The compiler should be invoked with
gcc foo.c
Em 2/10/2024 6:49 PM, Kaz Kylheku escreveu:
On 2024-02-10, bart <bc@freeuk.com> wrote:
#include <stdio.h>
int main(void) {
    int a;
L1:
    printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Unused warnings exist because they help catch bugs.
double distance(double x, double y)
{
    return sqrt(x*x + x*x);
}
The unused warning is a good example to explain my point of view.
I want a "warning profile" inside the compiler to do an "automatic code review".
The criteria are not only to complain about UB etc.; the criteria are the
same ones used by humans (the context of the program, how critical it is,
etc.) to approve or reject code.
On 2024-02-11, bart <bc@freeuk.com> wrote:
This is a different kind of analysis. IMV it doesn't belong in a routine
compilation, just something you do periodically, or when you're stuck
for ideas.
Periodically translates to never. If there are some situations you don't
want in the code, the best thing is to intercept any change which
introduces such, and not allow it to be merged.
In your example, maybe you did want x*x*2, and the 'y' is either a
parameter no longer needed, or not yet needed, or temporarily not used.
If we take a correct program and add an unused variable to it, it
doesn't break. Everyone knows that. That isn't the point.
So it is not 'obviously wrong', and by itself, not using a parameter is
harmless.
While it's not obviously wrong, it's not obviously right either.
Moreover, it is a hard fact that the parameter y is not used.
Granted, not every truth about a program is equally useful, but
experience shows that reporting unused identifiers pays off. My own experience and experiences of others. That's why such diagnostics are implemented in compilers.
I'm looking for a result from a compiler which is either Pass or Fail,
not Maybe.
Things are fortunately not going to revert to the 1982 state of the
art, though.
The job of the compiler is not only to translate the code or report
failure, but to unearth noteworthy facts about the code and remark on
them.
With the unused warnings, it got diagnosed.
So you use the linty options when you're stuck with a bug, as I suggested.
The point is, I would likely not have found that bug to this day without
the diagnostic. You want to be informed /before/ the bug is identified
in the field.
vallor <vallor@cultnix.org> writes:
[...]
Regarding the topic, I'm curious why there is resistance
to conditionals written like this:
if( 1 == a)
...that is to say, with the constant first.
I've done that in C and Perl. Are the aesthetics so
bad that I should quit that? Isn't it safer to write
it that way, so that a dropped "=" is pointed out on
compilation?
I personally find "Yoda conditions" jarring.
I'm aware that `1 == a` and `a == 1` are equivalent, but I read them differently. The latter asks something about a, namely whether it's
equal to 1. The former asks something about 1, which is inevitably a
silly question; I already know all about 1. I find that when reading
such code, I have to pause for a moment (perhaps a fraction of a second)
and mentally reverse the condition to understand it.
Note that I'm describing, not defending, the way I react to it.
Writing "=" rather than "==" is a sufficiently rare mistake, and likely
to be caught quickly because most compilers warn about it, that it's
just not worth scrambling the code to avoid it.
If you've internalized the commutativity of "==" so well that seeing
`1 == a` rather than `a == 1` doesn't bother you, that's fine.
But consider that some people reading your code are likely to have
reactions similar to mine.
On 10/02/2024 21:49, Kaz Kylheku wrote:
On 2024-02-10, bart <bc@freeuk.com> wrote:
#include <stdio.h>
int main(void) {
    int a;
L1:
    printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Unused warnings exist because they help catch bugs.
double distance(double x, double y)
{
    return sqrt(x*x + x*x);
}
The diagnostic will not catch all bugs of this type, since just one use
is enough to silence it, but catching something is better than nothing.
Removing unused cruft also helps to keep the code clean. Stray material
sometimes gets left behind after refactoring, or careless copy paste.
Unused identifiers are a "code smell".
This is a different kind of analysis. IMV it doesn't belong in a routine compilation, just something you do periodically, or when you're stuck
for ideas.
In your example, maybe you did want x*x*2, and the 'y' is either a
parameter no longer needed, or not yet needed, or temporarily not used.
So it is not 'obviously wrong', and by itself, not using a parameter is harmless.
I'm looking for a result from a compiler which is either Pass or Fail,
not Maybe.
On 11/02/2024 11:01, Malcolm McLean wrote:
On 11/02/2024 00:40, vallor wrote:
("Is there anyone here who thinks that" bart's continuous
complaining about options to gcc deserve any merit?)
The compiler should be invoked with
gcc foo.c
As a first stab, I'd use:
gcc -std=c11 -pedantic -W -Wall -Wextra -Werror foo.c
and try very hard to fix any/all warnings.
On 11/02/2024 11:31, Richard Harnden wrote:
On 11/02/2024 11:01, Malcolm McLean wrote:
On 11/02/2024 00:40, vallor wrote:
("Is there anyone here who thinks that" bart's continuous
complaining about options to gcc deserve any merit?)
The compiler should be invoked with
gcc foo.c
As a first stab, I'd use:
gcc -std=c11 -pedantic -W -Wall -Wextra -Werror foo.c
and try very hard to fix any/all warnings.
That sounds like an incredibly slow and painful way to code.
During development, you are adding, modifying, commenting, uncommenting
and tearing down code all the time.
C already requires you to dot all the Is and cross all the Ts because of
its syntax and type needs. Why make the job even harder?
On 11/02/2024 02:46, Kaz Kylheku wrote:
On 2024-02-11, bart <bc@freeuk.com> wrote:
Granted, not every truth about a program is equally useful, but
experience shows that reporting unused identifiers pays off. My own
experience and experiences of others. That's why such diagnostics are
implemented in compilers.
Take these declarations at file-scope:
typedef int A;
static int B;
int C;
typedef long long int int64_t; // visible via stdint.h
They are not used anywhere in this translation unit. gcc will report B
being unused, but not the others.
'C' might be used in other translation units; I don't know if the linker
will pick that up, or maybe that info is not known to it.
A and int64_t can't be reported because the declarations for them may be inside a header (as is the case for int64_t) used by other modules where
they /are/ used.
But if not, they could also indicate errors. (Maybe there is also
'typedef float D', and some variable should have been type A not D.)
So potentially useful information that you say is important, but can't
be or isn't done by a compiler.
(This where whole-program compilers like the ones I do come into their
own.)
I'm looking for a result from a compiler which is either Pass or Fail,
not Maybe.
Things are fortunately not going to revert to the 1982 state of the
art, though.
The job of the compiler is not only to translate the code or report
failure, but to unearth noteworthy facts about the code and remark on
them.
That's rubbish. People are quite happy to use endless scripting
languages where the bytecode compiler does exactly that: translate
source to linear bytecode in a no-nonsense fashion.
On 11/02/2024 12:38, bart wrote:
On 11/02/2024 11:31, Richard Harnden wrote:
As a first stab, I'd use:
gcc -std=c11 -pedantic -W -Wall -Wextra -Werror foo.c
and try very hard to fix any/all warnings.
That sounds like an incredibly slow and painful way to code.
During development, you are adding, modifying, commenting,
uncommenting and tearing down code all the time.
C already requires you to dot all the Is and cross all the Ts because
of its syntax and type needs. Why make the job even harder?
Fixing things early /is/ easier. YMMobviouslyV.
On 11/02/2024 02:46, Kaz Kylheku wrote:
On 2024-02-11, bart <bc@freeuk.com> wrote:
This is a different kind of analysis. IMV it doesn't belong in a routine
compilation, just something you do periodically, or when you're stuck
for ideas.
Periodically translates to never. If there are some situations you don't
want in the code, the best thing is to intercept any change which
introduces such, and not allow it to be merged.
I have an option in one of my compilers called '-unused'. It displays a
list of unused parameters, local and global variables.
I should use it more often than I do. But in any case, it is a
by-product of an internal check where no storage is allocated for
variables, and no spilling is done for parameters.
The first unused parameter it reports on one app, is where the function
is part of a suite of functions that need to share the same set of parameters. Not all functions will use all parameters.
Most unused non-parameters are left-overs from endless modifications. (Temporary debugging variables are usually written in capitals so are
easy to spot.)
In your example, maybe you did want x*x*2, and the 'y' is either a
parameter no longer needed, or not yet needed, or temporarily not used.
If we take a correct program and add an unused variable to it, it
doesn't break. Everyone knows that. That isn't the point.
So it is not 'obviously wrong', and by itself, not using a parameter is
harmless.
While it's not obviously wrong, it's not obviously right either.
Moreover, it is a hard fact that the parameter y is not used.
Granted, not every truth about a program is equally useful, but
experience shows that reporting unused identifiers pays off. My own
experience and experiences of others. That's why such diagnostics are
implemented in compilers.
Take these declarations at file-scope:
typedef int A;
static int B;
int C;
typedef long long int int64_t; // visible via stdint.h
They are not used anywhere in this translation unit. gcc will report B
being unused, but not the others.
'C' might be used in other translation units; I don't know if the linker
will pick that up, or maybe that info is not known to it.
Things are fortunately not going to revert to the 1982 state of the
art, though.
The job of the compiler is not only to translate the code or report
failure, but to unearth noteworthy facts about the code and remark on
them.
That's rubbish.
People are quite happy to use endless scripting
languages where the bytecode compiler does exactly that: translate
source to linear bytecode in a no-nonsense fashion.
Those are people who want a fast or even instant turnaround.
Some of us want to treat languages that target native code in the same
way; like scripting languages, but with the benefit of strict
type-checking and faster code!
With the unused warnings, it got diagnosed.
So you use the linty options when you're stuck with a bug, as I suggested.
The point is, I would likely not have found that bug to this day without
the diagnostic. You want to be informed /before/ the bug is identified
in the field.
So you run that option (like -unused in my product), /before/ it gets to
the field.
I can't routinely use -unused for the 100s of compiles I might do in one
day, even if the source was up-to-date that morning with zero unused
vars, because I will be compiling part-finished or temporarily
commented-out code all the time. Eg. there might be an empty function body.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[snip discussion of a program that divides by zero]
An implementation can refuse to translate the program, but not
because undefined behavior occurs. The undefined behavior here
happens only when the program is executed, but just compiling the
program doesn't do that. No execution, no undefined behavior.
Still the program may be rejected, because it is not strictly
conforming (by virtue of having output depend on the undefined
behavior if the program is ever run).
Just to be clear, would you say that a conforming hosted
implementation may reject this program:
#include <limits.h>
#include <stdio.h>
int main(void) {
    printf("INT_MAX = %d\n", INT_MAX);
}
solely because it's not strictly conforming?
My understanding of the C standard is that a hosted implementation
may choose not to accept the above program and still be conforming,
because this program is not strictly conforming. (Please assume
subsequent remarks always refer to implementations that are both
hosted and conforming.)
Also, assuming we have ruled out cases involving #error, a
conforming implementation may choose not to accept a given program
if and only if the program is not strictly conforming. Being
strictly conforming is the only criterion that matters (again
assuming there is no #error) in deciding whether an implementation
may choose not to accept the program in question.
I'm guessing that what you mean by "may reject" is the same as what
I mean by "may choose not to accept". I'd like to know if you think
that's right, or if you think there is some difference between the
two. (My intention is that the two phrases have the same meaning.)
Does the above adequately address the question you want answered?
I'm not sure. As I recall, I gave up on trying to understand what
you think "accept" means.
N1570 5.1.2.3p6:
A program that is correct in all other aspects, operating on
correct data, containing unspecified behavior shall be a
correct program and act in accordance with 5.1.2.3.
Does that not apply to the program above? How can it do so if it's
rejected (or not "accepted")?
The same paragraph says that "A *conforming hosted implementation*
shall accept any strictly conforming program". Are you reading
that as implying that *only* strictly conforming programs must be
accepted?
As a practical matter, an implementation that accepts *only*
strictly conforming programs would be very nearly useless. I
don't see anything in the standard that says a program can be
rejected purely because it's not strictly conforming, and I don't
believe that was the intent.
Em 2/11/2024 9:06 AM, David Brown escreveu:
The best that can be done, is what is done today - compilers have lots
of warnings that can be enabled or disabled individually. Some are
considered important enough and universal enough that they are enabled
by default. There will be a group of warnings (gcc -Wall) that the
compiler developers feel are useful to a solid majority of developers
without having too many false positives on things the developers
consider good code. And there will be an additional group of warnings
(gcc -Wall -Wextra) as a starting point for developers who want
stricter code rules, and who will usually then have explicit flags for
fine-grained control of their particular requirements.
I think it is possible to have the following: a way to specify a set of
warnings/errors (it could be a string, for instance), and to make some
warnings in this set standard.
And beyond that, there are a variety of niche checking tools for
particular cases, and large (and often expensive) code quality and
static checking tool suites for more advanced checks.
Yes, I agree we can have tools, and each tool can solve the problem.
But my point in having something standardized is that we can have
"standardized safety" and a "standardized mechanism to control static
analysis tools".
The same assumptions you have in one compiler you can have in another.
We can compare this approach with C++, for instance: when in C++ we have
an error and in C a warning, that means the error is part of the C++
language and works the same way in any compiler.
The other advantage is not having each tool with its own annotations.
Today GCC has some annotations, MSVC has SAL for instance etc.
vallor <vallor@cultnix.org> writes:
[...] I'm curious why there is resistance to conditionals written
like this:
if( 1 == a)
...that is to say, with the constant first.
I've done that in C and Perl. Are the aesthetics so
bad that I should quit that? Isn't it safer to write
it that way, so that a dropped "=" is pointed out on
compilation?
Do you know the phrase too clever by half? It describes
this coding practice. A partial solution at best, and
what's worse it comes with a cost for both writers and
readers of the code. It's easier and more effective just
to use -Wparentheses, which doesn't muck up the code and
can be turned on and off easily. There are better ways
for developers to spend their time than trying to take
advantage of clever but tricky schemes that don't help
very much and are done more thoroughly and more reliably
by using pre-existing automated tools. Too much buck,
not nearly enough bang.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[snip discussion of a program that divides by zero]
An implementation can refuse to translate the program, but not
because undefined behavior occurs. The undefined behavior here
happens only when the program is executed, but just compiling the
program doesn't do that. No execution, no undefined behavior.
Still the program may be rejected, because it is not strictly
conforming (by virtue of having output depend on the undefined
behavior if the program is ever run).
Just to be clear, would you say that a conforming hosted
implementation may reject this program:
#include <limits.h>
#include <stdio.h>
int main(void) {
    printf("INT_MAX = %d\n", INT_MAX);
}
solely because it's not strictly conforming?
My understanding of the C standard is that a hosted implementation
may choose not to accept the above program and still be conforming,
because this program is not strictly conforming. (Please assume
subsequent remarks always refer to implementations that are both
hosted and conforming.)
Also, assuming we have ruled out cases involving #error, a
conforming implementation may choose not to accept a given program
if and only if the program is not strictly conforming. Being
strictly conforming is the only criterion that matters (again
assuming there is no #error) in deciding whether an implementation
may choose not to accept the program in question.
I'm guessing that what you mean by "may reject" is the same as what
I mean by "may choose not to accept". I'd like to know if you think
that's right, or if you think there is some difference between the
two. (My intention is that the two phrases have the same meaning.)
Does the above adequately address the question you want answered?
I'm not sure. As I recall, I gave up on trying to understand what
you think "accept" means.
N1570 5.1.2.3p6:
[CORRECTION: 4p3]
A program that is correct in all other aspects, operating on
correct data, containing unspecified behavior shall be a
correct program and act in accordance with 5.1.2.3.
Does that not apply to the program above? How can it do so if it's
rejected (or not "accepted")?
The same paragraph says that "A *conforming hosted implementation*
shall accept any strictly conforming program". Are you reading
that as implying that *only* strictly conforming programs must be
accepted?
As a practical matter, an implementation that accepts *only*
strictly conforming programs would be very nearly useless. I
don't see anything in the standard that says a program can be
rejected purely because it's not strictly conforming, and I don't
believe that was the intent.
My understanding of the C standard is that 'shall accept' is
meant in the sense of 'shall use its best efforts to complete
translation phases 1 through 8 successfully and produce an
executable'.
That sounds reasonable. I wish the standard actually defined
"accept".
Where you say "5.1.2.3p6:" I expect you mean "4p3".
Yes.
Where you say "the same paragraph" I expect you mean "4p6".
Yes.
The word "reject" does not appear in the C standard. In my own
writing I am trying henceforth to use "accept" exclusively and
not use "reject". For the sake of discussion I can take "reject"
to mean the logical complement of "accept", which is to say a
program is either accepted or rejected, never both and never
neither. Does that last sentence match your own usage?
Yes, "reject" means "not accept". There might be some nuance that
that definition misses, so I'll try to avoid using the word "reject"
in this discussion.
The C standard has only one place where a statement is made about
accepting a program, saying in 4p6 that implementations shall
accept any strictly conforming program; no other paragraph in the
standard mentions accepting a program. Given that, it's hard for
me to understand how someone could read the standard as saying
anything other than that a program must be accepted if it is
strictly conforming, but if the program is not strictly conforming
then there is no requirement that it be accepted. In short form, a
program must be accepted if and only if it is strictly conforming.
Does that summary mean something different than your phrase "*only*
strictly conforming programs must be accepted"? My understanding
of the C standard is that strictly conforming programs must be
accepted, but implementations are not required to accept any
program that is not strictly conforming.
Certainly a conforming implementation must accept any strictly
conforming program (insert handwaving about capacity limits).
I can understand how one might read that requirement as implying
that an implementation need not accept any program that is not
strictly conforming. I don't read it that way.
In response to your question about 4p3, the short answer is that
any non-strictly-conforming program that an implementation chooses
not to accept is not correct in all other aspects, so 4p3 does not
apply. If you want to talk about that further we should split that
off into a separate thread, because 4p3 has nothing to do with
program acceptance.
I say it does. Under 4p3, the above program (that prints the value
of INT_MAX) is a "correct program", so it must "act in accordance
with 5.1.2.3". It cannot do so unless it is first accepted.
You're saying that the correctness of a program can depend on
whether an implementation chooses to accept it. I disagree.
An implementation that does not accept the above program is not
conforming because the implementation violates 4p3.