This might not be a strictly C question, but it definitely concerns all
C programmers.
Earlier today, I fixed an out-of-bounds write bug. An obvious issue:
static int *socks[0xffff];
void update_my_socks(int *sock, int val) {
socks[val & 0xffff] = sock;
}
While the presented issue is common knowledge for anyone familiar with
C, *locating* the bug was challenging. The program did not crash at the moment of the out-of-bounds write but much later - somewhere entirely different, in a different object file that maintained a static pointer
for tracking a position in a linked list. To my surprise, the pointer
was randomly reset to NULL about once a week, causing a segfault.
Tracing this back to an unrelated out-of-bounds write elsewhere in the
code was tedious, to say the least.
This raises a question: how can such corruptions be detected sooner? Protected mode prevents interference between programs but doesn’t
safeguard a program from corrupting itself. Is there a way to enforce
memory protection between module files of the same program? After all,
static objects shouldn't be accessible outside their compilation unit.
How would you approach this?
On 11.06.25 15:32, Mateusz Viste wrote:
This might not be a strictly C question, but it definitely concerns all
C programmers.
Earlier today, I fixed an out-of-bounds write bug. An obvious issue:
static int *socks[0xffff];
void update_my_socks(int *sock, int val) {
socks[val & 0xffff] = sock;
}
While the presented issue is common knowledge for anyone familiar with
C, *locating* the bug was challenging. The program did not crash at the
moment of the out-of-bounds write but much later - somewhere entirely
different, in a different object file that maintained a static pointer
for tracking a position in a linked list. To my surprise, the pointer
was randomly reset to NULL about once a week, causing a segfault.
Tracing this back to an unrelated out-of-bounds write elsewhere in the
code was tedious, to say the least.
The traditional method to ensure that a program or a part of a program
does not do what it must not do is testing. In this case the tester
must modify the code so that the array socks is a part of a larger
data structure and call update_my_socks with different values for
val, including the critical values -1, 0, 0xfffe, and 0xffff.
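A rough sketch of such a test (the wrapper struct and the copy of the
function are invented purely for illustration):

#include <assert.h>
#include <stddef.h>

/* The array under test is embedded between two canary regions so that
   an out-of-bounds store lands somewhere detectable.  Being static,
   the whole struct starts out zero-initialized. */
static struct {
    int *low[16];
    int *socks[0xffff];
    int *high[16];
} t;

static void update_my_socks(int *sock, int val) {
    t.socks[val & 0xffff] = sock;          /* the expression under test */
}

int main(void) {
    static int dummy;
    const int critical[] = { -1, 0, 0xfffe, 0xffff };

    for (size_t i = 0; i < sizeof critical / sizeof critical[0]; i++) {
        update_my_socks(&dummy, critical[i]);
        for (int j = 0; j < 16; j++)       /* canaries must stay NULL */
            assert(t.low[j] == NULL && t.high[j] == NULL);
    }
    return 0;
}

The assertion fires for val == -1 and val == 0xffff, which is exactly
the bug.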
There is a proposed extension for the RISC-V ISA called CHERI that
offers the kind of fine-grained memory protection that could fit your
purpose here.
Second note: you chose to wrap indices around to handle possible out-of-bounds accesses. That may or may not be a good idea depending
on the exact context. You may alternatively want to do nothing if val
is out of bounds.
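For instance (just a sketch of that alternative):

static int *socks[0xffff];

void update_my_socks(int *sock, int val) {
    if (val < 0 || val >= (int)(sizeof socks / sizeof socks[0]))
        return;                   /* silently ignore out-of-range indices */
    socks[val] = sock;
}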
For debugging problems like this with gdb, you can put a data
breakpoint on the pointer that is your known symptom. Set it to stop
when something writes 0 to it - then you can see where you are in
code when that happens. Of course, that will be a real pain if it
only happens once a week.
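Something along these lines, roughly ("list_cursor" is just a stand-in
for whatever the symptom variable is actually called):

(gdb) watch -l list_cursor
(gdb) condition $bpnum list_cursor == 0
(gdb) continue

The condition keeps it quiet until something actually writes a null
pointer to that location.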
On Wed, 11 Jun 2025 17:19 Opus wrote:
There is a proposed extension for the RISC-V ISA called CHERI that
offers the kind of fine-grained memory protection that could fit your
purpose here.
CHERI was indeed one of the first links that google offered when I
tried looking for an existing solution. But as you noted, it's not
available on "normal" hardware, and sadly google wasn't able to propose
any more "real-world" alternatives.
This led me to wonder how I could accelerate such crashes to simplify
debugging. In large programs, unnoticed memory corruption becomes more
probable. One strategy is to break the program into modular parts that
communicate via IPC so programs would be protected from each other
thanks to the wonders of protected mode. However, this approach
sacrifices the efficiency and simplicity of function calls. A more
elegant solution would be to leverage the MMU to isolate the memory of
each compilation unit, triggering a segfault when a unit accesses
memory outside its scope. Unfortunately, such technology does not seem
to exist yet - at least not in the Linux world (which is my target
platform).
On Wed, 11 Jun 2025 17:14 David Brown wrote:
For debugging problems like this with gdb, you can put a data
breakpoint on the pointer that is your known symptom. Set it to stop
when something writes 0 to it - then you can see where you are in
code when that happens. Of course, that will be a real pain if it
only happens once a week.
The idea is good, but as you observed it is hard to apply in a
production situation when the issue happens like three times a month.
In fact, a breakpoint would even be overkill - I'd be perfectly happy
with the program crashing when said variable changes. Like a
runtime-setup assertion that constantly checks the state of the
variable. Sadly, I'm not aware of such a mechanism either. :)
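The closest thing I can imagine improvising is a periodic check driven
by a timer signal - only a rough sketch with made-up names, and it
merely samples the variable once a second (assuming NULL is never a
legitimate value for it), so it narrows the window rather than trapping
the exact write:

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static int node;                      /* dummy list node                 */
int *volatile list_cursor = &node;    /* hypothetical symptom variable   */

static void check_cursor(int sig) {
    (void)sig;
    if (list_cursor == NULL)          /* the "cannot happen" state       */
        abort();                      /* die within a second of the stomp */
    alarm(1);                         /* re-arm the check                */
}

void install_cursor_watch(void) {
    signal(SIGALRM, check_cursor);
    alarm(1);
}

int main(void) {
    install_cursor_watch();
    pause();                          /* first check passes              */
    list_cursor = NULL;               /* simulate the stray write        */
    pause();                          /* next check finds NULL and aborts */
    return 0;
}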
On 12/06/2025 14:31, Mateusz Viste wrote:
On Wed, 11 Jun 2025 17:14 David Brown wrote:
For debugging problems like this with gdb, you can put a data
breakpoint on the pointer that is your known symptom. Set it to stop
when something writes 0 to it - then you can see where you are in
code when that happens. Of course, that will be a real pain if it
only happens once a week.
The idea is good, but as you observed it is hard to apply in a
production situation when the issue happens like three times a month.
In fact, a breakpoint would even be overkill - I'd be perfectly happy
with the program crashing when said variable changes. Like a
runtime-setup assertion that constantly checks the state of the
variable. Sadly, I'm not aware of such a mechanism either. :)
A data breakpoint is triggered when the data item is written (or read,
depending on the settings). I have only used these on embedded systems,
and don't know about their support in x86 hardware (assuming that is
your target). But the point is that the breakpoint would be hit in the
buggy code with the buffer overrun, rather than in the correct code that
used the pointer that got stomped on.
Data breakpoints are not perfect either - you will also get a hit when
legitimate code changes the same address, and you have to have filtering
to skip the legitimate changes.
Josef Möllers <josef@invalid.invalid> writes:
On 11.06.25 15:32, Mateusz Viste wrote:
This might not be a strictly C question, but it definitely
concerns all C programmers.
Earlier today, I fixed an out-of-bounds write bug. An obvious
issue:
static int *socks[0xffff];
void update_my_socks(int *sock, int val) {
socks[val & 0xffff] = sock;
}
While the presented issue is common knowledge for anyone familiar
with C, *locating* the bug was challenging. The program did not
crash at the moment of the out-of-bounds write but much later -
somewhere entirely different, in a different object file that
maintained a static pointer for tracking a position in a linked
list. To my surprise, the pointer was randomly reset to NULL about
once a week, causing a segfault. Tracing this back to an unrelated
out-of-bounds write elsewhere in the code was tedious, to say the
least.
valgrind.
Thank you all for your thoughtful responses. You rightly identified
that the problem is essentially an out-of-bounds access - a symptom of
deeper code quality issues. The bug in question managed to pass unit
tests, peer review, functional tests, and it didn’t trigger any
warnings from GCC or clang, even with the strict -Weverything flag I
enforce across my teams. This underscores a fundamental truth: every
software has bugs, and some, like this one, are notoriously difficult
to locate. The bug caused a segfault about once every 10 days,
manifesting in an unrelated part of the code and sometimes days after
the out-of-bounds write occurred.
This led me to wonder how I could accelerate such crashes to simplify debugging.
Therefore I love bounds-checking C++ containers with MSVC (debug
builds) and with the libstdc++ runtime (enabled via macro). (...)
Debug builds are usually much slower, and with C++ even more so, since
simple things like a container access via the []-operator become a
separate function call while debugging. With iterator debugging it is
slower still. But this price is worth the advantage that you can easily
find bounds problems with C++.
Probably too slow. If I were in Mateusz's situation, I would try AddressSanitizer.
Never tried it myself, but it looks like a better fit for this
particular, relatively simple case of buffer overrun.
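For reference, enabling it is usually just a matter of building with
something like (the usual flags, as far as I know; exact options may
vary):

  cc -g -fsanitize=address -fno-omit-frame-pointer -o prog prog.c

As far as I understand it also places redzones around globals, so an
out-of-bounds store to a static array like socks should be reported
right at the faulting write.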
Below is a proof-of-concept program that works in GNU/Linux. For
rapidity of prototyping, I have assumed a page size of 4096; this is
not right for all systems.
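A minimal sketch of the idea (names invented; this is only an
illustration, assuming Linux and a 4096-byte page size): get the
storage from mmap(), place the array so it ends exactly at a page
boundary, and write-protect the page that follows, so the off-by-one
store faults at the moment it happens.

#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE   4096UL
#define NSOCKS 0xffff

static int **socks;                       /* points into the mapping below */

static void setup_socks(void)
{
    size_t len   = NSOCKS * sizeof(int *);
    size_t pages = (len + PAGE - 1) / PAGE;          /* data pages        */
    char  *base  = mmap(NULL, (pages + 1) * PAGE,    /* plus one sentinel */
                        PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED) { perror("mmap"); exit(1); }

    /* The last page becomes a write-protected sentinel. */
    if (mprotect(base + pages * PAGE, PAGE, PROT_NONE) != 0) {
        perror("mprotect"); exit(1);
    }

    /* Shift the array so its end coincides with the sentinel page. */
    socks = (int **)(base + pages * PAGE - len);
}

void update_my_socks(int *sock, int val)
{
    socks[val & 0xffff] = sock;           /* the original buggy expression */
}

int main(void)
{
    static int dummy;

    setup_socks();
    update_my_socks(&dummy, 42);          /* in bounds: fine                */
    update_my_socks(&dummy, 0xffff);      /* lands on the sentinel: SIGSEGV */
    return 0;
}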
The code in question shows several classic error patterns. In no
particular order:
* buffer overflow
* off-by-one error
* bitwise operator with signed operand
* using & to effect what is really a modulo operation
I acknowledge that this response isn't exactly an answer to the
original question. It does illustrate though a kind of thinking
that can be useful when trying to track down hard-to-find bugs.
However, this strategy assumes you already know there's some
instruction that writes to the array at an out-of-bounds position.
I think the situation of the original post is different. His program
crashed infrequently, very infrequently, and he didn't know anything
about the cause. I think it was a very big effort to link the crash
to the array (in another source module) and to the out-of-bound
access of the array.
- There is no readily available mechanism for this today on x86
On Fri, 13 Jun 2025 08:00 Bonita Montero wrote:
Therefore I love bounds-checking C++ containers with MSVC (debug
builds) and with the libstdc++ runtime (enabled via macro). (...)
Debug builds are usually much slower, and with C++ even more so, since
simple things like a container access via the []-operator become a
separate function call while debugging. With iterator debugging it is
slower still. But this price is worth the advantage that you can easily
find bounds problems with C++.
Sounds similar to Pixar's "Electric Fence" that Kaz mentioned earlier: https://linux.die.net/man/3/efence
Depending on the performance impact this may or may not be a viable
solution to debug a rare production issue, but still nice to know it
exists.
On Thu, 12 Jun 2025 18:59 Kaz Kylheku wrote:
Below is a proof-of-concept program that works in GNU/Linux. For
rapidity of prototyping, I have assumed a page size of 4096; this is
not right for all systems.
This is very cool! A variation of the classic "sentinel-guarded
memory" concept, where sentinels are write-protected rather than
requiring runtime checks against some magic signature.
Another potential strategy would be to safeguard the static array
itself, or any other data storage for that matter, immediately after the legitimate code has finished using it. Then unprotect it only when
needed again. While this might not be a good performer for
high-frequency operations, it could be an interesting practice
for memory regions that are rarely modified.
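Roughly, something like this (a sketch; assumes Linux, gcc or clang, a
4096-byte page, an array padded to a whole number of pages, and error
checking omitted):

#include <sys/mman.h>

#define PAGE 4096
/* 0x10000 entries (not 0xffff) so the object is an exact multiple of
   the page size; the alignment attribute is a gcc/clang extension, and
   Linux allows mprotect() on any page-aligned region of the process. */
static int *socks[0x10000] __attribute__((aligned(PAGE)));

static void socks_protect(void)   { mprotect(socks, sizeof socks, PROT_READ); }
static void socks_unprotect(void) { mprotect(socks, sizeof socks, PROT_READ | PROT_WRITE); }

int main(void)
{
    static int dummy;

    socks_unprotect();
    socks[42] = &dummy;      /* legitimate update while writable         */
    socks_protect();

    socks[43] = &dummy;      /* any later stray write faults immediately */
    return 0;
}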
On Fri, 13 Jun 2025 09:21 pozz wrote:
However this strategy assumes you already know there's some
instruction that write to the array at an out-of-bound position.
Yes, though I see Kaz's idea is to proactively protect all memory used
by the program. It's an interesting concept, though not particularly practical.
wij <wyniijj5@gmail.com> writes:
On Fri, 2025-06-13 at 08:03 +0200, Bonita Montero wrote:
On 12.06.2025 at 15:05, Tim Rentsch wrote:
void update_my_socks(int *sock, int val) {
const unsigned N = sizeof socks / sizeof socks[0];
socks[val % N] = sock;
}
For someone who uses bounds-checked containers in C++ every day
this really looks archaic.
Really? What are they?
Feel free to discuss that in comp.lang.c++.
On Thu, 12 Jun 2025 06:05 Tim Rentsch wrote:
The code in question shows several classic error patterns. In no
particular order:
* buffer overflow
* off-by-one error
I'd consider that one item, since one leads to another.
* using & to effect what is really a modulo operation
You think of it as modulo, I think of it as "bits trimming".
Essentially the same operation, but different viewpoints, I guess.
I acknowledge that this response isn't exactly an answer to the
original question. It does illustrate though a kind of thinking
that can be useful when trying to track down hard-to-find bugs.
Thank you for your insightful remarks. I completely agree - the best
way to debug a program is to avoid the need for debugging in the first
place. :-) But working with a large, 15-year-old codebase that has
seen contributions from dozens of programmers makes things a bit
non-ideal sometimes.
On 13/06/2025 14:56, Michael S wrote:
Practice has proved many times that it can be done, but
only by a very good team. Your team is not good enough.
Sound advice. If you can't stand the heat, get out of the
kitchen. Go and drive a cab or something, and leave programming
to the grown-ups.
A significant part of x86 installed base (all Intel Core CPUs starting
from gen 6 up to gen 9 and their Xeon contemporaries) has an extension
named Intel MPX that was invented exactly for that purpose. But it didn't
work particularly well. Compiler people never liked it, but despite
that it was supported by several generations of gcc and probably by
clang as well.
The proper solution to your problem is to stop using a memory-unsafe
language for complex application programming. It's not that successful
use of unsafe languages for complex application programming is
impossible. Practice has proved many times that it can be done, but
only by a very good team. Your team is not good enough.
It isn't wrong to think of bitwise-and as masking-in (or possibly
masking-out) of certain bits, but it still isn't a modulo. A modulo
operation is what is desired;
I think you have misunderstood the point of my comments. In some
cases one is confronted with a symptom that defies one's best
efforts to diagnose what is causing the symptom. Looking for known
classes of errors is another arrow in the quiver of techniques for
discovering what is causing the observed behavior.
On 13.06.2025 15:56, Michael S wrote:
A significant part of x86 installed base (all Intel Core CPUs starting
from gen 6 up to gen 9 and their Xeon contemporaries) has an extension
named Intel MPX that was invented exactly for that purpose. But it didn't
work particularly well. Compiler people never liked it, but despite
that it was supported by several generations of gcc and probably by
clang as well.
This does not really sound like something "readily available", unless you
are suggesting that I migrate to a Linux kernel from 10 years ago, switch
to gcc 5.0 and use outdated hardware.
The proper solution to your problem is to stop using a memory-unsafe
language for complex application programming. It's not that successful
use of unsafe languages for complex application programming is
impossible. Practice has proved many times that it can be done, but
only by a very good team. Your team is not good enough.
Just to clarify: I didn’t post here seeking help with a simple out-of-bounds
issue, nor was I here to vent. I’ve been wrangling C code in complex, high-performance systems for over a decade - I’m managing just fine. Code improvement is a continual, non-negotiable process in our line of work, but fires happen occasionally nonetheless. While fixing the issue, I started wondering about how faults like this could be located faster, that is assuming they do slip into production - because in spite of the testing process, some faults will inevitably get to customers.
A crash that happens closer to the source of the problem (same compilation unit) would significantly ease the debugging effort. I figured it was a
topic worth sharing, in the spirit of sparking some constructive
discussions.
IIUC in your example the array was global, so the compiler knew its
bound and in principle could generate bounds checks. But I am not
aware of a C compiler which actually generates such checks.
That said, detecting out-of-bounds array access is no panacea. Memory corruption can arise from various sources, such as dangling pointers or poorly managed pointer arithmetic.
Hence why I was looking in the direction
of the MMU. All compilation units of a program share the same set of TLBs.
I figured there might perhaps be a way to isolate a given compilation unit
in different TLBs, effectively sandboxing its memory, then make this unit communicate with the rest of the program via shm when shared memory
accesses are needed.
Mateusz Viste <mateusz@not.gonna.tell> wrote:
That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.
AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.
Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem. Like in
Kaz Kylheku <643-408-1753@kylheku.com> wrote:...
You're not saying anything here other than that you like the p[i]
/notation/ better than *(p + i), and &p[i] better than p + i.
The indexing notation at least has a chance of being automatically
checked (in cases where the compiler/checker knows the array size). With
arbitrary user-written pointer arithmetic there is no hope of automatic
checking.
On 2025-06-15, Waldek Hebisch <antispam@fricas.org> wrote:
Mateusz Viste <mateusz@not.gonna.tell> wrote:
That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.
AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.
LOL, you heard it here.
Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem. Like in
Array indexing *is* pointer arithmetic.
Are you not aware of this equivalence?
(E1)[(E2)] <---> *((E1) + (E2))
In fact, let's draw the commutative diagram
(E1)[(E2)]  <--->  *((E1) + (E2))
    ^                    ^
    |                    |
    |                    |
    v                    v
(E2)[(E1)]  <--->  *((E2) + (E1))
You're not saying anything here other than that you like the p[i]
/notation/ better than *(p + i), and &p[i] better than p + i.
This might not be a strictly C question, but it definitely concerns all
C programmers.
Earlier today, I fixed an out-of-bounds write bug. An obvious issue:
static int *socks[0xffff];
void update_my_socks(int *sock, int val) {
socks[val & 0xffff] = sock;
}
<snip>
Imagine an alternate universe in which array declarations took the
form (borrowed from Unisys ALGOL):
array_name[lower_bound : upper_bound]
Mateusz Viste <mateusz@not.gonna.tell> wrote:
That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.
AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.
Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem.
Sure. Or some people prefer to single-step with a debugger. Such
people can make their lives a little easier by surrounding the
buffer with sentinel soldiers, setting the sentinel soldiers to a
magic number, and putting a watch on them both - the buffer high
soldier and the buffer low soldier.
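In C that might look something like this (only a sketch; the struct
keeps the soldiers adjacent to the buffer, which separately declared
objects would not guarantee):

#include <assert.h>

#define MAGIC 0xDEADBEEFu

static struct {
    unsigned low;                 /* buffer low soldier  */
    int *socks[0xffff];           /* the guarded buffer  */
    unsigned high;                /* buffer high soldier */
} g = { MAGIC, {0}, MAGIC };

/* Sprinkle this through the code, or evaluate it from the debugger. */
#define CHECK_SOCKS() assert(g.low == MAGIC && g.high == MAGIC)

int main(void)
{
    static int dummy;

    g.socks[0xffff] = &dummy;     /* the off-by-one store from the thread */
    CHECK_SOCKS();                /* fails: the high soldier got overwritten */
    return 0;
}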
On Thu, 12 Jun 2025 19:15:26 +0100, Richard Heathfield wrote:
Sure. Or some people prefer to single-step with a debugger. Such
people can make their lives a little easier by surrounding the
buffer with sentinel soldiers, setting the sentinel soldiers to a
magic number, and putting a watch on them both - the buffer high
soldier and the buffer low soldier.
I think that with an out-of-bounds access the write often lands just
past one of the two limits of the array... but there are also cases
where the bounds look fine and memory outside the array gets written
anyway, somewhere else entirely.
antispam@fricas.org (Waldek Hebisch) writes:
Mateusz Viste <mateusz@not.gonna.tell> wrote:
That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.
AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.
This assertion is in effect a No True Scotsman statement.
Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem.
Even if there is no direct manipulation ("pointer arithmetic") of
pointer variables, access can be checked only if array bounds
information is available, and in many cases it isn't. The reason is
(among other things) C doesn't have array parameters; what it does
have instead is pointer parameters. At the point in the code when
an "array" access is to be done, the information needed to check
that an index value is in bounds just isn't available. The culprit
here is not explicit pointer arithmetic, but lacking the information
needed to do a bounds check. That lack is inherent in how the C
language works with respect to arrays and pointer conversion.
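A two-function illustration of that point:

/* Inside f(), 'p' is just an int *; spelling the parameter 'int p[10]'
   would mean exactly the same thing, so there is no bound to check
   against. */
void f(int *p, int n) {
    p[n] = 0;             /* nothing here ties n to the caller's array size */
}

void g(void) {
    int a[10];
    f(a, 10);             /* out of bounds, but undetectable from f() alone */
}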
On 14.06.2025 01:31, Tim Rentsch wrote:
It isn't wrong to think of bitwise-and as masking-in (or possibly
masking-out) of certain bits, but it still isn't a modulo. A
modulo operation is what is desired;
By "different viewpoints," I meant that while you approach the
problem by applying a modulo operation to the index so it fits the
array size, I tend to think in terms of ensuring the index
correctly maps to a location within an n-bit address space.
Naturally, the array should accommodate the maximum possible index
for the given address space, and that's where the original code
fell short. And you're absolutely right that hardcoded values are
problematic; the size of the array should have been tied to the
n-bit address-space expectation.