So, even if we had compile-time checks for bugs, C compilers and the
standard are not prepared to handle the implications to make C a safer language.
On Thu, 8 Feb 2024 01:01:56 -0300, Thiago Adams wrote:
So, even if we had compile-time checks for bugs, C compilers and the
standard are not prepared to handle the implications to make C a safer
language.
Do you want C to turn into Java? In Java, rules about reachability are
built into the language. And these rules are simplistic ones, based on
the state of the art back in the 1990s, not taking account of
improvements in compiler technology since then. For example, in the
following code, the uninitialized declaration of “Result” is a compile-time error, even though a human looking at the code can figure
out that there is no way it will be left uninitialized at the point of reference:
Let's say C compilers can detect all sorts of bugs at compile time.
How would C compilers report that? As an error or a warning?
Let's use this sample:
int main() {
int a = 1;
a = a / 0;
}
GCC says:
warning: division by zero is undefined [-Wdivision-by-zero]
5 | a = a / 0;
| ^ ~
If GCC or any other compiler reported this as an error, C programmers would likely complain. Am I right?
So, even if we had compile-time checks for bugs, C compilers and the
standard are not prepared to handle the implications to make C a safer language.
From my point of view, we need an error, not a warning.
But we also
need a way to ignore the error in case the programmer wants to see what
happens with a division by zero, for instance. (Please note that this
topic IS NOT about this specific warning; it is just a sample.)
Warnings work more or less like this. The problem with warnings is that
they are not standardized - the "name/number" and the way to
disable/enable them.
So this is the first problem we need to solve in C to make the language safer. We need a mechanism.
I also believe we can have a standard profile for warnings that will
change the compiler behaviour; for instance, making undefined behaviour
an error.
The C standard is complacent about the lack of error messages. Sometimes
it says "a message is encouraged...". This shifts responsibility away from
the language. But when someone says "C is dangerous," the language as a
whole is blamed, regardless of whether you are using MISRA, for instance.
Thus, not only are the mechanics of the language unprepared; the standard
is also not prepared to assume the responsibility of being a source of
guidance and safety.
Thiago Adams <thiago.adams@gmail.com> writes:
Let's say C compilers can detect all sorts of bugs at compile time.
How would C compilers report that? As an error or a warning?
Let's use this sample:
int main() {
int a = 1;
a = a / 0;
}
GCC says:
warning: division by zero is undefined [-Wdivision-by-zero]
5 | a = a / 0;
| ^ ~
If GCC or any other compiler reported this as an error, C
programmers would likely complain. Am I right?
Someone will always complain, but a conforming compiler can report this
as a fatal error.
Division by zero has undefined behavior. The standard's definition of
undefined behavior includes this note:
NOTE
Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during
translation or program execution in a documented manner
characteristic of the environment (with or without the issuance
of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
Though it's not quite that simple. Rejecting the program if the
compiler can't prove that the division will be executed would IMHO
be non-conforming. Code that's never executed has no behavior, so it
doesn't have undefined behavior.
But of course any compiler can reject anything it likes in
non-conforming mode. See for example "gcc -Werror".
But even ignoring that, a culture of paying very close attention to
non-fatal warnings could go a long way towards making C safer (assuming compilers are clever enough to issue good warnings).
From my point of view, we need an error, not a warning. But we also
need a way to ignore the error in case the programmer wants to see
what happens, with a division by zero, for instance. (Please note that
this topic IS NOT about this specific warning; it is just a sample.)
Warnings work more or less like this. The problem with warnings is
that they are not standardized - the "name/number" and the way to disable/enable them.
So this is the first problem we need to solve in C to make the
language safer. We need a mechanism.
I also believe we can have a standard profile for warnings that will
change the compiler behaviour; for instance, making undefined
behaviour an error.
Em 2/8/2024 5:17 AM, David Brown escreveu:
...
I think it would be better if compilers were stricter in their default
modes, even if their stricter compliant modes have to reduce the
severity of some warnings. People should be allowed to write "weird"
code that looks very much like an error - but perhaps it is those
people who should need extra flags or other effort, not those that
write "normal" code and want to catch their bugs. (I know this can't
be done for tools like gcc, because it could cause problems with
existing code.)
Yes, this is my point.
But I believe we need a standard mechanism, and this is the first step
towards safety in C.
Consider this code.
int main(void)
{
int a = 1, b = 2;
#ifdef _MSC_VER
#pragma warning( push )
#pragma warning( disable : 4706 )
#else
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wparentheses"
#endif
if (a = b){}
#ifdef _MSC_VER
#pragma warning( pop )
#else
#pragma GCC diagnostic pop
#endif
}
This code wants to use a = b inside the if condition.
The code shows how to disable the warning in GCC and MSVC.
If we had in the standard a number for the warning, and a mechanism for
disabling it, then we could have something like
int main(void)
{
int a = 1, b = 2;
if (a = b) [[disable:4706]]
{
}
}
Maybe
int main(void)
{
int a = 1, b = 2;
if (a = b) _ignore("I want to assign here...")
{
}
}
That is applied to any warning on that specific line.
The advantage is that no warning ID is necessary.
Let's say C compilers can detect all sorts of bugs at compile time.
How would C compilers report that? As an error or a warning?
Let's use this sample:
int main() {
int a = 1;
a = a / 0;
}
GCC says:
warning: division by zero is undefined [-Wdivision-by-zero]
5 | a = a / 0;
| ^ ~
If GCC or any other compiler reported this as an error, C programmers would likely complain. Am I right?
So, even if we had compile-time checks for bugs, C compilers and the
standard are not prepared to handle the implications to make C a safer language.
From my point of view, we need an error, not a warning. But we also
need a way to ignore the error in case the programmer wants to see what
happens with a division by zero, for instance. (Please note that this
topic IS NOT about this specific warning; it is just a sample.)
Warnings work more or less like this. The problem with warnings is that
they are not standardized - the "name/number" and the way to
disable/enable them.
So this is the first problem we need to solve in C to make the language safer. We need a mechanism.
I also believe we can have a standard profile for warnings that will
change the compiler behaviour; for instance, making undefined behaviour
an error.
The C standard is complacent about the lack of error messages. Sometimes
it says "a message is encouraged...". This shifts responsibility away from
the language. But when someone says "C is dangerous," the language as a
whole is blamed, regardless of whether you are using MISRA, for instance.
Thus, not only are the mechanics of the language unprepared; the standard
is also not prepared to assume the responsibility of being a source of
guidance and safety.
One more:
struct X {double d;};
struct Y {int i;};
int main()
{
struct X x;
struct Y *p;
p = &x;
}
GCC says:
warning: incompatible pointer types assigning to 'struct Y *' from
'struct X *' [-Wincompatible-pointer-types]
7 | p = &x;
| ^ ~~
For this one a C++ compiler gives:
<source>:7:6: error: cannot convert 'X*' to 'Y*' in assignment
7 | p = &x;
So this may also be related to the "spirit of C". I like the idea that
the programmer can do whatever they want.
Consider the question
"Is C language safe?"
The answer will be: well, the language itself is very vague; it depends
on the compiler you use.
Em 2/8/2024 12:25 PM, David Brown escreveu:
I was having a look at the C# specification. It uses pragmas.
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/preprocessor-directives#nullable-context
Sample
#pragma warning disable 414, CS3021
#pragma warning restore CS3021
It also has warning numbers; I am not sure whether other C# compilers
use the same warning IDs.
David Brown <david.brown@hesbynett.no> writes:
On 08/02/2024 05:59, Keith Thompson wrote:
Thiago Adams <thiago.adams@gmail.com> writes:
Let's say C compilers can detect all sorts of bugs at compile time.
How would C compilers report that? As an error or a warning?
Let's use this sample:
int main() {
int a = 1;
a = a / 0;
}
GCC says:
warning: division by zero is undefined [-Wdivision-by-zero]
5 | a = a / 0;
| ^ ~
If GCC or any other compiler reported this as an error, C programmers
would likely complain. Am I right?
Someone will always complain, but a conforming compiler can report this
as a fatal error.
I'm not /entirely/ convinced. Such code is only undefined behaviour
at run-time, I believe. A compiler could reject the code (give a
fatal error) if it is sure that this will be reached when the code is
run. But can it be sure of that, even if it is in "main()" ?
Freestanding implementations don't need to run "main()" (not all my
programs have had a "main()" function), and the freestanding/hosted
implementation choice is a matter of the implementation, not just the
compiler.
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
I was assuming a hosted implementation -- and the compiler knows whether
its implementation is hosted or freestanding.
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is by
definition always invoked, and you can generally prove that at least
some code is always reached from that. However it is the halting
problem, and you can never prove for all cases, even if the code must
be reached or not reached regardless of runtime inputs.
On 09/02/2024 07:19, David Brown wrote:
On 08/02/2024 17:14, Malcolm McLean wrote:
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is by
definition always invoked, and you can generally prove that at least
some code is always reached from that. However it is the halting
problem, and you can never prove for all cases, even if the code must
be reached or not reached regardless of runtime inputs.
It is not "the halting problem". What you are trying to say is that
it is undecidable or not a computable problem in the general case.
The "is this code reached?" problem is the halting problem with the
trivial and unimportant difference that the code in question does not
have to be "exit()".
On 09/02/2024 08:33, David Brown wrote:
On 09/02/2024 08:46, Malcolm McLean wrote:
On 09/02/2024 07:19, David Brown wrote:
On 08/02/2024 17:14, Malcolm McLean wrote:
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is
by definition always invoked, and you can generally prove that at
least some code is always reached from that. However it is the
halting problem, and you can never prove for all cases, even if the
code must be reached or not reached regardless of runtime inputs.
It is not "the halting problem". What you are trying to say is
that it is undecidable or not a computable problem in the general case.
The "is this code reached?" problem is the halting problem with the
trivial and unimportant difference that the code in question does not
have to be "exit()".
No, it is not.
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other.
But that does not make them the same problem.
And even if they /were/ the same thing, writing "this is undecidable"
or "this is infeasible to compute" is clear and to the point. Writing
"this is the halting problem" is name-dropping a computer science
theory in order to look smart - and like most such attempts, is more
smart-arse than smart.
Well I've been accused of wasting my English degree, and so now I'm
going to accuse you of wasting your mathematics-related degree.
On 09/02/2024 08:46, Malcolm McLean wrote:
On 09/02/2024 07:19, David Brown wrote:
On 08/02/2024 17:14, Malcolm McLean wrote:
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is by
definition always invoked, and you can generally prove that at least
some code is always reached from that. However it is the halting
problem, and you can never prove for all cases, even if the code must
be reached or not reached regardless of runtime inputs.
It is not "the halting problem". What you are trying to say is that it
is undecidable or not a computable problem in the general case.
The "is this code reached?" problem is the halting problem with the
trivial and unimportant difference that the code in question does not
have to be "exit()".
No, it is not.
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other.
But that does not make them the same problem.
On 09/02/2024 10:24, Ben Bacarisse wrote:
David Brown <david.brown@hesbynett.no> writes:
On 09/02/2024 08:46, Malcolm McLean wrote:
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other.
But that does not make them the same problem.
Sure.  But it's not the halting problem for another reason as well.
In the formal models (that have halting problems and so on), it's "the
computation" that is the problem instance. That is, the code and all
the data are presented for an answer. A C compiler has to decide on
reachability without knowing the input.
For example, is this code "undefined":
int c, freq[128];
while ((c = getchar()) != EOF) freq[c]++;
? Maybe it was knocked up as a quick way to collect stats on base64
encoded data streams. If so, it's fine (at least in terms of UB).
Perhaps until you read an input that has 2**31 or more of the same
character.
David Brown <david.brown@hesbynett.no> writes:
On 09/02/2024 08:46, Malcolm McLean wrote:
The two problems can be shown to be equivalently "hard" - that is, if you
could find a solution to one, it would let you solve the other. But that
does not make them the same problem.
Sure. But it's not the halting problem for another reason as well.
In the formal models (that have halting problems and so on), it's "the computation" that is the problem instance. That is, the code and all
the data are presented for an answer. A C compiler has to decide on reachability without knowing the input.
For example, is this code "undefined":
int c, freq[128];
while ((c = getchar()) != EOF) freq[c]++;
? Maybe it was knocked up as a quick way to collect stats on base64
encoded data streams. If so, it's fine (at least in terms of UB).
On 09/02/2024 10:24, Ben Bacarisse wrote:
David Brown <david.brown@hesbynett.no> writes:
On 09/02/2024 08:46, Malcolm McLean wrote:
On 09/02/2024 07:19, David Brown wrote:
On 08/02/2024 17:14, Malcolm McLean wrote:
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is by
definition always invoked, and you can generally prove that at least
some code is always reached from that. However it is the halting
problem, and you can never prove for all cases, even if the code must
be reached or not reached regardless of runtime inputs.
It is not "the halting problem". What you are trying to say is that it
is undecidable or not a computable problem in the general case.
The "is this code reached?" problem is the halting problem with the
trivial and unimportant difference that the code in question does not
have to be "exit()".
No, it is not.
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other. But
that does not make them the same problem.
Sure.  But it's not the halting problem for another reason as well.
In the formal models (that have halting problems and so on), it's "the
computation" that is the problem instance. That is, the code and all
the data are presented for an answer. A C compiler has to decide on
reachability without knowing the input.
For example, is this code "undefined":
int c, freq[128];
while ((c = getchar()) != EOF) freq[c]++;
? Maybe it was knocked up as a quick way to collect stats on base64
encoded data streams. If so, it's fine (at least in terms of UB).
Yes, but I specified "regardless of runtime inputs".
If a signed integer overflows the behaviour is undefined, so you also
have to prove that the input stream is short. And of course you also
forgot to initialise to zero.
On 09/02/2024 10:24, Ben Bacarisse wrote:
David Brown <david.brown@hesbynett.no> writes:
On 09/02/2024 08:46, Malcolm McLean wrote:
On 09/02/2024 07:19, David Brown wrote:
On 08/02/2024 17:14, Malcolm McLean wrote:
On 08/02/2024 16:04, Keith Thompson wrote:
In a freestanding implementation, main() *might* be just another
function. In that case a compiler can't prove that the code will be
invoked.
Well sometimes it can. The boot routine or program entry point is by
definition always invoked, and you can generally prove that at least
some code is always reached from that. However it is the halting
problem, and you can never prove for all cases, even if the code must
be reached or not reached regardless of runtime inputs.
It is not "the halting problem". What you are trying to say is
that it
is undecidable or not a computable problem in the general case.
The "is this code reached?" problem is the halting problem with the
trivial and unimportant difference that the code in question does not
have to be "exit()".
No, it is not.
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other. But
that does not make them the same problem.
Sure.  But it's not the halting problem for another reason as well.
In the formal models (that have halting problems and so on), it's "the
computation" that is the problem instance. That is, the code and all
the data are presented for an answer. A C compiler has to decide on
reachability without knowing the input.
For example, is this code "undefined":
int c, freq[128];
while ((c = getchar()) != EOF) freq[c]++;
? Maybe it was knocked up as a quick way to collect stats on base64
encoded data streams. If so, it's fine (at least in terms of UB).
Yes, but I specified "regardless of runtime inputs". A bit downthread,
you can be forgiven for having missed that. But DB less so, especially
when I'm accused of being the one who doesn't read things properly. And
even when I agree there might be some truth in that, and explain why, I
am then accused of misrepresenting what an Oxford English degree is like.
If you change the halting problem such that some of the symbols on the
tape are allowed to have unknown values then I don't think you are
changing it in any mathematically very interesting way so it is still
"trivial", but if you attempt a halt decider it will substantially change
your programming approach, and so it is no longer "unimportant".
bart <bc@freeuk.com> writes:
On 09/02/2024 10:24, Ben Bacarisse wrote:
David Brown <david.brown@hesbynett.no> writes:
On 09/02/2024 08:46, Malcolm McLean wrote:
The two problems can be shown to be equivalently "hard" - that is, if
you could find a solution to one, it would let you solve the other.
But that does not make them the same problem.
Sure.  But it's not the halting problem for another reason as well.
In the formal models (that have halting problems and so on), it's "the
computation" that is the problem instance. That is, the code and all
the data are presented for an answer. A C compiler has to decide on
reachability without knowing the input.
For example, is this code "undefined":
int c, freq[128];
while ((c = getchar()) != EOF) freq[c]++;
? Maybe it was knocked up as a quick way to collect stats on base64
encoded data streams. If so, it's fine (at least in terms of UB).
Perhaps until you read an input that has 2**31 or more of the same
character.
Yes, that's the point. Any UB is dependent on the input. It's not a
static property of the code.
David Brown <david.brown@hesbynett.no> writes:
[...]
But I am not sure that the /compiler/ knows that it is compiling for a
hosted or freestanding implementation. The same gcc can be used for
Linux hosted user code and a freestanding Linux kernel.
[...]
A conforming compiler must predefine the macro __STDC_HOSTED__ to either
0 or 1 (since C99).
This is something which has long been of fascination to me: how
exactly do you get a C compiler to actually fail a program with a
hard error when there is obviously something wrong, while not also
failing on completely harmless matters.
Thiago Adams <thiago.adams@gmail.com> writes:
Let's say C compilers can detect all sorts of bugs at compile time.
How would C compilers report that? As an error or a warning?
Let's use this sample:
int main() {
int a = 1;
a = a / 0;
}
GCC says:
warning: division by zero is undefined [-Wdivision-by-zero]
5 | a = a / 0;
| ^ ~
If GCC or any other compiler reported this as an error, C
programmers would likely complain. Am I right?
Someone will always complain, but a conforming compiler can report this
as a fatal error.
Division by zero has undefined behavior. The standard's definition of
undefined behavior includes this note:
NOTE
Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during
translation or program execution in a documented manner
characteristic of the environment (with or without the issuance
of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
Though it's not quite that simple. Rejecting the program if the
compiler can't prove that the division will be executed would IMHO
be non-conforming. Code that's never executed has no behavior, so
it doesn't have undefined behavior.
bart <bc@freeuk.com> writes:
[...]
This is something which has long been of fascination to me: how
exactly do you get a C compiler to actually fail a program with a
hard error when there is obviously something wrong, while not also
failing on completely harmless matters.
I think the answer is obvious: unless and until you find someone
who works on a C compiler and who has exactly the same sense that
you do of "when there is obviously something wrong" and of what
conditions fall under the heading of "completely harmless matters",
and also the same sense that you do of how a C compiler should
behave in those cases, you won't get exactly what you want unless
you do it yourself.
On 10/02/2024 01:59, Tim Rentsch wrote:
bart <bc@freeuk.com> writes:
[...]
This is something which has long been of fascination to me: how
exactly do you get a C compiler to actually fail a program with a
hard error when there is obviously something wrong, while not also
failing on completely harmless matters.
I think the answer is obvious: unless and until you find someone
who works on a C compiler and who has exactly the same sense that
you do of "when there is obviously something wrong" and of what
conditions fall under the heading of "completely harmless matters",
and also the same sense that you do of how a C compiler should
behave in those cases, you won't get exactly what you want unless
you do it yourself.
Take this function:
void F() {
F();
F(1);
F(1, 2.0);
F(1, 2.0, "3");
F(1, 2.0, "3", F);
}
Even if /one/ of those calls is correct, the other four can't
possibly be correct as well.
Is there anyone here who doesn't think there is something obviously wrong?
How about this one:
#include <stdio.h>
int main(void) {
int a;
L1:
printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
On 2024-02-10, bart <bc@freeuk.com> wrote:
#include <stdio.h>
int main(void) {
int a;
L1:
printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Unused warnings exist because they help catch bugs.
double distance(double x, double y)
{
return sqrt(x*x + x*x);
}
The diagnostic will not catch all bugs of this type, since just one use is enough to silence it, but catching something is better than nothing.
Removing unused cruft also helps to keep the code clean. Stray material sometimes gets left behind after refactoring, or careless copy paste.
Unused identifiers are a "code smell".
Sometimes something must be left unused. It's good to be explicit about
that: to have some indication that it's deliberately unused.
When I implemented unused warnings in my Lisp compiler, I found a bug right away.
https://www.kylheku.com/cgit/txr/commit/?id=5ee2cd3b2304287c010237e03be4d181412e066f
In this diff hunk against in the assembler:
@@ -217,9 +218,9 @@
(q me.(cur-pos)))
(inc c)
me.(set-pos p)
- (format t "~,5d: ~,08X ~a\n" (trunc p 4) me.(get-word) dis-txt)
+ (format stream "~,5d: ~,08X ~a\n" (trunc p 4) me.(get-word) dis-txt)
(while (< (inc p 4) q)
- (format t "~,5d: ~,08X\n" (trunc p 4) me.(get-word)))
+ (format stream "~,5d: ~,08X\n" (trunc p 4) me.(get-word)))
me.(set-pos q)
(set p q)))
c))
The format function was given argument t, a nickname for standard output, so this code ignored the stream parameter and always sent output to standard output.
With the unused warnings, it got diagnosed.
On Sat, 10 Feb 2024 20:22:52 +0000, bart <bc@freeuk.com> wrote in <uq8lus$3dceu$1@dont-email.me>:
How about this one:
#include <stdio.h>
int main(void) {
int a;
L1:
printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Yet you promoted warnings to errors, just to find a way to make
it fail. :(
("Is there anyone here who thinks that" bart's continuous
complaining about options to gcc deserves any merit?)
On 10/02/2024 21:49, Kaz Kylheku wrote:
On 2024-02-10, bart <bc@freeuk.com> wrote:
#include <stdio.h>
int main(void) {
int a;
L1:
printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Unused warnings exist because they help catch bugs.
double distance(double x, double y)
{
return sqrt(x*x + x*x);
}
The diagnostic will not catch all bugs of this type, since just one use is
enough to silence it, but catching something is better than nothing.
Removing unused cruft also helps to keep the code clean. Stray material
sometimes gets left behind after refactoring, or careless copy paste.
Unused identifiers are a "code smell".
This is a different kind of analysis. IMV it doesn't belong in routine
compilation; it's something you do periodically, or when you're stuck
for ideas.
In your example, maybe you did want x*x*2, and the 'y' is either a
parameter no longer needed, or not yet needed, or temporarily not used.
So it is not 'obviously wrong', and by itself, not using a parameter is harmless.
I'm looking for a result from a compiler which is either Pass or Fail,
not Maybe.
With the unused warnings, it got diagnosed.
So you use the linty options when you're stuck with a bug, as I suggested.
[...] I'm curious why there is resistance to conditionals written
like this:
if( 1 == a)
...that is to say, with the constant first.
I've done that in C and Perl. Are the aesthetics so
bad that I should quit that? Isn't it safer to write
it that way, so that a dropped "=" is pointed out on
compilation?
On 10/02/2024 01:59, Tim Rentsch wrote:
bart <bc@freeuk.com> writes:
[...]
This is something which has long been of fascination to me: how
exactly do you get a C compiler to actually fail a program with a
hard error when there is obviously something wrong, while not also
failing on completely harmless matters.
I think the answer is obvious: unless and until you find someone
who works on a C compiler and who has exactly the same sense that
you do of "when there is obviously something wrong" and of what
conditions fall under the heading of "completely harmless matters",
and also the same sense that you do of how a C compiler should
behave in those cases, you won't get exactly what you want unless
you do it yourself.
Take this function:
void F() {
F();
F(1);
F(1, 2.0);
F(1, 2.0, "3");
F(1, 2.0, "3", F);
}
Even if /one/ of those calls is correct, the other four can't
possibly be correct as well.
Is there anyone here who doesn't think there is something obviously wrong?
How about this one:
#include <stdio.h>
int main(void) {
int a;
L1:
printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[snip discussion of a program that divides by zero]
An implementation can refuse to translate the program, but not
because undefined behavior occurs. The undefined behavior here
happens only when the program is executed, but just compiling the
program doesn't do that. No execution, no undefined behavior.
Still the program may be rejected, because it is not strictly
conforming (by virtue of having output depend on the undefined
behavior if the program is ever run).
Just to be clear, would you say that a conforming hosted implementation
may reject this program:
#include <limits.h>
#include <stdio.h>
int main(void) {
    printf("INT_MAX = %d\n", INT_MAX);
}
solely because it's not strictly conforming?
On 11/02/2024 00:40, vallor wrote:
("Is there anyone here who thinks that" bart's continuous
complaining about options to gcc deserve any merit?)
The compiler should be invoked with
gcc foo.c
Em 2/10/2024 6:49 PM, Kaz Kylheku escreveu:
On 2024-02-10, bart <bc@freeuk.com> wrote:
#include <stdio.h>
int main(void) {
    int a;
L1:
    printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Unused warnings exist because they help catch bugs.
double distance(double x, double y)
{
    return sqrt(x*x + x*x);
}
The unused warning is a good example to explain my point of view.
I want a "warning profile" inside the compiler to do an "automatic code review".
The criteria are not only to complain about UB etc.; the criteria are the
same ones used by humans (the context of the program, how critical it is,
etc.) to approve or reject code.
On 2024-02-11, bart <bc@freeuk.com> wrote:
This is a different kind of analysis. IMV it doesn't belong in a routine
compilation, just something you do periodically, or when you're stuck
for ideas.
Periodically translates to never. If there are some situations you don't
want in the code, the best thing is to intercept any change which
introduces such, and not allow it to be merged.
In your example, maybe you did want x*x*2, and the 'y' is either a
parameter no longer needed, or not yet needed, or temporarily not used.
If we take a correct program and add an unused variable to it, it
doesn't break. Everyone knows that. That isn't the point.
So it is not 'obviously wrong', and by itself, not using a parameter is
harmless.
While it's not obviously wrong, it's not obviously right either.
Moreover, it is a hard fact that the parameter y is not used.
Granted, not every truth about a program is equally useful, but
experience shows that reporting unused identifiers pays off. My own experience and experiences of others. That's why such diagnostics are implemented in compilers.
I'm looking for a result from a compiler which is either Pass or Fail,
not Maybe.
Things are fortunately not going to revert to the 1982 state of the
art, though.
The job of the compiler is not only to translate the code or report
failure, but to unearth noteworthy facts about the code and remark on
them.
With the unused warnings, it got diagnosed.
So you use the linty options when you're stuck with a bug, as I suggested.
The point is, I would likely not have found that bug to this day without
the diagnostic. You want to be informed /before/ the bug is identified
in the field.
vallor <vallor@cultnix.org> writes:
[...]
Regarding the topic, I'm curious why there is resistance
to conditionals written like this:
if( 1 == a)
...that is to say, with the constant first.
I've done that in C and Perl. Are the aesthetics so
bad that I should quit that? Isn't it safer to write
it that way, so that a dropped "=" is pointed out on
compilation?
I personally find "Yoda conditions" jarring.
I'm aware that `1 == a` and `a == 1` are equivalent, but I read them differently. The latter asks something about a, namely whether it's
equal to 1. The former asks something about 1, which is inevitably a
silly question; I already know all about 1. I find that when reading
such code, I have to pause for a moment (perhaps a fraction of a second)
and mentally reverse the condition to understand it.
Note that I'm describing, not defending, the way I react to it.
Writing "=" rather than "==" is a sufficiently rare mistake, and likely
to be caught quickly because most compilers warn about it, that it's
just not worth scrambling the code to avoid it.
If you've internalized the commutativity of "==" so well that seeing
`1 == a` rather than `a == 1` doesn't bother you, that's fine.
But consider that some people reading your code are likely to have
reactions similar to mine.
On 10/02/2024 21:49, Kaz Kylheku wrote:
On 2024-02-10, bart <bc@freeuk.com> wrote:
#include <stdio.h>
int main(void) {
    int a;
L1:
    printf("Hello, World!\n");
}
Ramp up the warnings and a compiler will tell you about unused 'a' and
'L1'. Use -Werror and the compilation will fail.
Is there anyone here who thinks that running this program with those
unused identifiers is not completely harmless?
Unused warnings exist because they help catch bugs.
double distance(double x, double y)
{
    return sqrt(x*x + x*x);
}
The diagnostic will not catch all bugs of this type, since just one use
is enough to silence it, but catching something is better than nothing.
Removing unused cruft also helps to keep the code clean. Stray material
sometimes gets left behind after refactoring, or careless copy paste.
Unused identifiers are a "code smell".
This is a different kind of analysis. IMV it doesn't belong in a routine compilation, just something you do periodically, or when you're stuck
for ideas.
In your example, maybe you did want x*x*2, and the 'y' is either a
parameter no longer needed, or not yet needed, or temporarily not used.
So it is not 'obviously wrong', and by itself, not using a parameter is harmless.
I'm looking for a result from a compiler which is either Pass or Fail,
not Maybe.
On 11/02/2024 11:01, Malcolm McLean wrote:
On 11/02/2024 00:40, vallor wrote:
("Is there anyone here who thinks that" bart's continuous
complaining about options to gcc deserve any merit?)
The compiler should be invoked with
gcc foo.c
As a first stab, I'd use:
gcc -std=c11 -pedantic -W -Wall -Wextra -Werror foo.c
and try very hard to fix any/all warnings.
On 11/02/2024 11:31, Richard Harnden wrote:
On 11/02/2024 11:01, Malcolm McLean wrote:
On 11/02/2024 00:40, vallor wrote:
("Is there anyone here who thinks that" bart's continuous
complaining about options to gcc deserve any merit?)
The compiler should be invoked with
gcc foo.c
As a first stab, I'd use:
gcc -std=c11 -pedantic -W -Wall -Wextra -Werror foo.c
and try very hard to fix any/all warnings.
That sounds like an incredibly slow and painful way to code.
During development, you are adding, modifying, commenting, uncommenting
and tearing down code all the time.
C already requires you to dot all the Is and cross all the Ts because of
its syntax and type needs. Why make the job even harder?
On 11/02/2024 02:46, Kaz Kylheku wrote:
On 2024-02-11, bart <bc@freeuk.com> wrote:
Granted, not every truth about a program is equally useful, but
experience shows that reporting unused identifiers pays off. My own
experience and experiences of others. That's why such diagnostics are
implemented in compilers.
Take these declarations at file-scope:
typedef int A;
static int B;
int C;
typedef long long int int64_t; // visible via stdint.h
They are not used anywhere in this translation unit. gcc will report B
being unused, but not the others.
'C' might be used in other translation units; I don't know if the linker
will pick that up, or maybe that info is not known to it.
A and int64_t can't be reported because the declarations for them may be inside a header (as is the case for int64_t) used by other modules where
they /are/ used.
But if not, they could also indicate errors. (Maybe there is also
'typedef float D', and some variable should have been type A not D.)
So potentially useful information that you say is important, but can't
be or isn't done by a compiler.
(This where whole-program compilers like the ones I do come into their
own.)
I'm looking for a result from a compiler which is either Pass or Fail,
not Maybe.
Things are fortunately not going to revert to the 1982 state of the
art, though.
The job of the compiler is not only to translate the code or report
failure, but to unearth noteworthy facts about the code and remark on
them.
That's rubbish. People are quite happy to use endless scripting
languages where the bytecode compiler does exactly that: translate
source to linear bytecode in a no-nonsense fashion.
On 11/02/2024 12:38, bart wrote:
On 11/02/2024 11:31, Richard Harnden wrote:
As a first stab, I'd use:
gcc -std=c11 -pedantic -W -Wall -Wextra -Werror foo.c
and try very hard to fix any/all warnings.
That sounds like an incredibly slow and painful way to code.
During development, you are adding, modifying, commenting,
uncommenting and tearing down code all the time.
C already requires you to dot all the Is and cross all the Ts because
of its syntax and type needs. Why make the job even harder?
Fixing things early /is/ easier. YMMobviouslyV.
On 11/02/2024 02:46, Kaz Kylheku wrote:
On 2024-02-11, bart <bc@freeuk.com> wrote:
This is a different kind of analysis. IMV it doesn't belong in a routine
compilation, just something you do periodically, or when you're stuck
for ideas.
Periodically translates to never. If there are some situations you don't
want in the code, the best thing is to intercept any change which
introduces such, and not allow it to be merged.
I have an option in one of my compilers called '-unused'. It displays a
list of unused parameters, local and global variables.
I should use it more often than I do. But in any case, it is a
by-product of an internal check where no storage is allocated for
variables, and no spilling is done for parameters.
The first unused parameter it reports on one app, is where the function
is part of a suite of functions that need to share the same set of parameters. Not all functions will use all parameters.
Most unused non-parameters are left-overs from endless modifications. (Temporary debugging variables are usually written in capitals so are
easy to spot.)
In your example, maybe you did want x*x*2, and the 'y' is either a
parameter no longer needed, or not yet needed, or temporarily not used.
If we take a correct program and add an unused variable to it, it
doesn't break. Everyone knows that. That isn't the point.
So it is not 'obviously wrong', and by itself, not using a parameter is
harmless.
While it's not obviously wrong, it's not obviously right either.
Moreover, it is a hard fact that the parameter y is not used.
Granted, not every truth about a program is equally useful, but
experience shows that reporting unused identifiers pays off. My own
experience and experiences of others. That's why such diagnostics are
implemented in compilers.
Take these declarations at file-scope:
typedef int A;
static int B;
int C;
typedef long long int int64_t; // visible via stdint.h
They are not used anywhere in this translation unit. gcc will report B
being unused, but not the others.
'C' might be used in other translation units; I don't know if the linker
will pick that up, or maybe that info is not known to it.
Things are fortunately not going to revert to the 1982 state of the
art, though.
The job of the compiler is not only to translate the code or report
failure, but to unearth noteworthy facts about the code and remark on
them.
That's rubbish.
People are quite happy to use endless scripting
languages where the bytecode compiler does exactly that: translate
source to linear bytecode in a no-nonsense fashion.
Those are people who want a fast or even instant turnaround.
Some of us want to treat languages that target native code in the same
way; like scripting languages, but with the benefit of strict
type-checking and faster code!
With the unused warnings, it got diagnosed.
So you use the linty options when you're stuck with a bug, as I suggested.
The point is, I would likely not have found that bug to this day without
the diagnostic. You want to be informed /before/ the bug is identified
in the field.
So you run that option (like -unused in my product), /before/ it gets to
the field.
I can't routinely use -unused for the 100s of compiles I might do in one
day, even if the source was up-to-date that morning with zero unused
vars, because I will be compiling part-finished or temporarily
commented-out code all the time. Eg. there might be an empty function body.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[snip discussion of a program that divides by zero]
An implementation can refuse to translate the program, but not
because undefined behavior occurs. The undefined behavior here
happens only when the program is executed, but just compiling the
program doesn't do that. No execution, no undefined behavior.
Still the program may be rejected, because it is not strictly
conforming (by virtue of having output depend on the undefined
behavior if the program is ever run).
Just to be clear, would you say that a conforming hosted
implementation may reject this program:
#include <limits.h>
#include <stdio.h>
int main(void) {
    printf("INT_MAX = %d\n", INT_MAX);
}
solely because it's not strictly conforming?
My understanding of the C standard is that a hosted implementation
may choose not to accept the above program and still be conforming,
because this program is not strictly conforming. (Please assume
subsequent remarks always refer to implementations that are both
hosted and conforming.)
Also, assuming we have ruled out cases involving #error, a
conforming implementation may choose not to accept a given program
if and only if the program is not strictly conforming. Being
strictly conforming is the only criterion that matters (again
assuming there is no #error) in deciding whether an implementation
may choose not to accept the program in question.
I'm guessing that what you mean by "may reject" is the same as what
I mean by "may choose not to accept". I'd like to know if you think
that's right, or if you think there is some difference between the
two. (My intention is that the two phrases have the same meaning.)
Does the above adequately address the question you want answered?
I'm not sure. As I recall, I gave up on trying to understand what
you think "accept" means.
N1570 5.1.2.3p6:
A program that is correct in all other aspects, operating on
correct data, containing unspecified behavior shall be a
correct program and act in accordance with 5.1.2.3.
Does that not apply to the program above? How can it do so if it's
rejected (or not "accepted")?
The same paragraph says that "A *conforming hosted implementation*
shall accept any strictly conforming program". Are you reading
that as implying that *only* strictly conforming programs must be
accepted?
As a practical matter, an implementation that accepts *only*
strictly conforming programs would be very nearly useless. I
don't see anything in the standard that says a program can be
rejected purely because it's not strictly conforming, and I don't
believe that was the intent.
Em 2/11/2024 9:06 AM, David Brown escreveu:
The best that can be done, is what is done today - compilers have lots
of warnings that can be enabled or disabled individually. Some are
considered important enough and universal enough that they are enabled
by default. There will be a group of warnings (gcc -Wall) that the
compiler developers feel are useful to a solid majority of developers
without having too many false positives on things the developers
consider good code. And there will be an additional group of warnings
(gcc -Wall -Wextra) as a starting point for developers who want
stricter code rules, and who will usually then have explicit flags for
fine-grained control of their particular requirements.
I think it is possible to have the following: a way to specify a set of
warnings/errors (it could be a string, for instance), and to make some
warnings in this set standard.
And beyond that, there are a variety of niche checking tools for
particular cases, and large (and often expensive) code quality and
static checking tool suites for more advanced checks.
Yes, I agree we can have tools, and each tool can solve the problem.
But my point in having something standardized is that we can have
"standardized safety" and a "standardized mechanism to control static
analysis tools".
The same assumptions you have in one compiler you can have in another.
We can compare this approach with C++, for instance: when in C++ we have
an error and in C a warning, that means the error is part of the C++
language and works the same way in any compiler.
The other advantage is not having each tool with its own annotations.
Today GCC has some annotations, MSVC has SAL for instance etc.
vallor <vallor@cultnix.org> writes:
[...] I'm curious why there is resistance to conditionals written
like this:
if( 1 == a)
...that is to say, with the constant first.
I've done that in C and Perl. Are the aesthetics so
bad that I should quit that? Isn't it safer to write
it that way, so that a dropped "=" is pointed out on
compilation?
Do you know the phrase too clever by half? It describes
this coding practice. A partial solution at best, and
what's worse it comes with a cost for both writers and
readers of the code. It's easier and more effective just
to use -Wparentheses, which doesn't muck up the code and
can be turned on and off easily. There are better ways
for developers to spend their time than trying to take
advantage of clever but tricky schemes that don't help
very much and are done more thoroughly and more reliably
by using pre-existing automated tools. Too much buck,
not nearly enough bang.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[snip discussion of a program that divides by zero]
An implementation can refuse to translate the program, but not
because undefined behavior occurs. The undefined behavior here
happens only when the program is executed, but just compiling the
program doesn't do that. No execution, no undefined behavior.
Still the program may be rejected, because it is not strictly
conforming (by virtue of having output depend on the undefined
behavior if the program is ever run).
Just to be clear, would you say that a conforming hosted
implementation may reject this program:
#include <limits.h>
#include <stdio.h>
int main(void) {
    printf("INT_MAX = %d\n", INT_MAX);
}
solely because it's not strictly conforming?
My understanding of the C standard is that a hosted implementation
may choose not to accept the above program and still be conforming,
because this program is not strictly conforming. (Please assume
subsequent remarks always refer to implementations that are both
hosted and conforming.)
Also, assuming we have ruled out cases involving #error, a
conforming implementation may choose not to accept a given program
if and only if the program is not strictly conforming. Being
strictly conforming is the only criterion that matters (again
assuming there is no #error) in deciding whether an implementation
may choose not to accept the program in question.
I'm guessing that what you mean by "may reject" is the same as what
I mean by "may choose not to accept". I'd like to know if you think
that's right, or if you think there is some difference between the
two. (My intention is that the two phrases have the same meaning.)
Does the above adequately address the question you want answered?
I'm not sure. As I recall, I gave up on trying to understand what
you think "accept" means.
N1570 5.1.2.3p6:
[CORRECTION: 4p3]
A program that is correct in all other aspects, operating on
correct data, containing unspecified behavior shall be a
correct program and act in accordance with 5.1.2.3.
Does that not apply to the program above? How can it do so if it's
rejected (or not "accepted")?
The same paragraph says that "A *conforming hosted implementation*
shall accept any strictly conforming program". Are you reading
that as implying that *only* strictly conforming programs must be
accepted?
As a practical matter, an implementation that accepts *only*
strictly conforming programs would be very nearly useless. I
don't see anything in the standard that says a program can be
rejected purely because it's not strictly conforming, and I don't
believe that was the intent.
My understanding of the C standard is that 'shall accept' is
meant in the sense of 'shall use its best efforts to complete
translation phases 1 through 8 successfully and produce an
executable'.
That sounds reasonable. I wish the standard actually defined
"accept".
Where you say "5.1.2.3p6:" I expect you mean "4p3".
Yes.
Where you say "the same paragraph" I expect you mean "4p6".
Yes.
The word "reject" does not appear in the C standard. In my own
writing I am trying henceforth to use "accept" exclusively and
not use "reject". For the sake of discussion I can take "reject"
to mean the logical complement of "accept", which is to say a
program is either accepted or rejected, never both and never
neither. Does that last sentence match your own usage?
Yes, "reject" means "not accept". There might be some nuance that
that definition misses, so I'll try to avoid using the word "reject"
in this discussion.
The C standard has only one place where a statement is made about
accepting a program, saying in 4p6 that implementations shall
accept any strictly conforming program; no other paragraph in the
standard mentions accepting a program. Given that, it's hard for
me to understand how someone could read the standard as saying
anything other than that a program must be accepted if it is
strictly conforming, but if the program is not strictly conforming
then there is no requirement that it be accepted. In short form, a
program must be accepted if and only if it is strictly conforming.
Does that summary mean something different than your phrase "*only*
strictly conforming programs must be accepted"? My understanding
of the C standard is that strictly conforming programs must be
accepted, but implementations are not required to accept any
program that is not strictly conforming.
Certainly a conforming implementation must accept any strictly
conforming program (insert handwaving about capacity limits).
I can understand how one might read that requirement as implying
that an implementation need not accept any program that is not
strictly conforming. I don't read it that way.
In response to your question about 4p3, the short answer is that
any non-strictly-conforming program that an implementation chooses
not to accept is not correct in all other aspects, so 4p3 does not
apply. If you want to talk about that further we should split that
off into a separate thread, because 4p3 has nothing to do with
program acceptance.
I say it does. Under 4p3, the above program (that prints the value
of INT_MAX) is a "correct program", so it must "act in accordance
with 5.1.2.3". It cannot do so unless it is first accepted.
You're saying that the correctness of a program can depend on
whether an implementation chooses to accept it. I disagree.
An implementation that does not accept the above program is not
conforming because the implementation violates 4p3.