This might not be a strictly C question, but it definitely concerns all
C programmers.
Earlier today, I fixed an out-of-bounds write bug. An obvious issue:
static int *socks[0xffff];
void update_my_socks(int *sock, int val) {
socks[val & 0xffff] = sock;
}
While the presented issue is common knowledge for anyone familiar with
C, *locating* the bug was challenging. The program did not crash at the moment of the out-of-bounds write but much later - somewhere entirely different, in a different object file that maintained a static pointer
for tracking a position in a linked list. To my surprise, the pointer
was randomly reset to NULL about once a week, causing a segfault.
Tracing this back to an unrelated out-of-bounds write elsewhere in the
code was tedious, to say the least.
This raises a question: how can such corruptions be detected sooner? Protected mode prevents interference between programs but doesn’t
safeguard a program from corrupting itself. Is there a way to enforce
memory protection between module files of the same program? After all,
static objects shouldn't be accessible outside their compilation unit.
How would you approach this?
On 11.06.25 15:32, Mateusz Viste wrote:
This might not be a strictly C question, but it definitely concerns all
C programmers.
Earlier today, I fixed an out-of-bounds write bug. An obvious issue:
static int *socks[0xffff];
void update_my_socks(int *sock, int val) {
socks[val & 0xffff] = sock;
}
While the presented issue is common knowledge for anyone familiar with
C, *locating* the bug was challenging. The program did not crash at the
moment of the out-of-bounds write but much later - somewhere entirely
different, in a different object file that maintained a static pointer
for tracking a position in a linked list. To my surprise, the pointer
was randomly reset to NULL about once a week, causing a segfault.
Tracing this back to an unrelated out-of-bounds write elsewhere in the
code was tedious, to say the least.
The traditional method to ensure that a program or a part of a program
does not do what it must not do is testing. In this case the tester
must modify the code so that the array socks is a part of a larger
data structure and call update_my_socks with different values for
val, including the critical values -1, 0, 0xfffe, and 0xffff.
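A rough sketch of such a test (the wrapper struct and the copy of the
function are invented purely for illustration):

#include <assert.h>
#include <stddef.h>

/* The array under test is embedded between two canary regions so that
   an out-of-bounds store lands somewhere detectable.  Being static,
   the whole struct starts out zero-initialized. */
static struct {
    int *low[16];
    int *socks[0xffff];
    int *high[16];
} t;

static void update_my_socks(int *sock, int val) {
    t.socks[val & 0xffff] = sock;          /* the expression under test */
}

int main(void) {
    static int dummy;
    const int critical[] = { -1, 0, 0xfffe, 0xffff };

    for (size_t i = 0; i < sizeof critical / sizeof critical[0]; i++) {
        update_my_socks(&dummy, critical[i]);
        for (int j = 0; j < 16; j++)       /* canaries must stay NULL */
            assert(t.low[j] == NULL && t.high[j] == NULL);
    }
    return 0;
}

The assertion fires for val == -1 and val == 0xffff, which is exactly
the bug.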
There is a proposed extension for the RISC-V ISA called CHERI that
offers the kind of fine-grained memory protection that could fit your
purpose here.
Second note: you chose to wrap indices around to handle possible out-of-bounds accesses. That may or may not be a good idea depending
on the exact context. You may alternatively want to do nothing if val
is out of bounds.
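For instance (just a sketch of that alternative):

static int *socks[0xffff];

void update_my_socks(int *sock, int val) {
    if (val < 0 || val >= (int)(sizeof socks / sizeof socks[0]))
        return;                   /* silently ignore out-of-range indices */
    socks[val] = sock;
}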
For debugging problems like this with gdb, you can put a data
breakpoint on the pointer that is your known symptom. Set it to stop
when something writes 0 to it - then you can see where you are in
code when that happens. Of course, that will be a real pain if it
only happens once a week.
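Something along these lines, roughly ("list_cursor" is just a stand-in
for whatever the symptom variable is actually called):

(gdb) watch -l list_cursor
(gdb) condition $bpnum list_cursor == 0
(gdb) continue

The condition keeps it quiet until something actually writes a null
pointer to that location.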
On Wed, 11 Jun 2025 17:19 Opus wrote:
There is a proposed extension for the RISC-V ISA called CHERI that
offers the kind of fine-grained memory protection that could fit your
purpose here.
CHERI was indeed one of the first links that google offered when I
tried looking for an existing solution. But as you noted, it's not
available on "normal" hardware, and sadly google wasn't able to propose
any more "real-world" alternatives.
This led me to wonder how I could accelerate such crashes to simplify
debugging. In large programs, unnoticed memory corruption becomes more
probable. One strategy is to break the program into modular parts that
communicate via IPC so programs would be protected from each other
thanks to the wonders of protected mode. However, this approach
sacrifices the efficiency and simplicity of function calls. A more
elegant solution would be to leverage the MMU to isolate the memory of
each compilation unit, triggering a segfault when a unit accesses
memory outside its scope. Unfortunately, such technology does not seem
to exist yet - at least not in the Linux world (which is my target
platform).
On Wed, 11 Jun 2025 17:14 David Brown wrote:
For debugging problems like this with gdb, you can put a data
breakpoint on the pointer that is your known symptom. Set it to stop
when something writes 0 to it - then you can see where you are in
code when that happens. Of course, that will be a real pain if it
only happens once a week.
The idea is good, but as you observed it is hard to apply in a
production situation when the issue happens like three times a month.
In fact, a breakpoint would even be overkill - I'd be perfectly happy
with the program crashing when said variable changes. Like a
runtime-setup assertion that constantly checks the state of the
variable. Sadly, I'm not aware of such a mechanism either. :)
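The closest thing I can imagine improvising is a periodic check driven
by a timer signal - only a rough sketch with made-up names, and it
merely samples the variable once a second (assuming NULL is never a
legitimate value for it), so it narrows the window rather than trapping
the exact write:

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static int node;                      /* dummy list node                 */
int *volatile list_cursor = &node;    /* hypothetical symptom variable   */

static void check_cursor(int sig) {
    (void)sig;
    if (list_cursor == NULL)          /* the "cannot happen" state       */
        abort();                      /* die within a second of the stomp */
    alarm(1);                         /* re-arm the check                */
}

void install_cursor_watch(void) {
    signal(SIGALRM, check_cursor);
    alarm(1);
}

int main(void) {
    install_cursor_watch();
    pause();                          /* first check passes              */
    list_cursor = NULL;               /* simulate the stray write        */
    pause();                          /* next check finds NULL and aborts */
    return 0;
}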
On 12/06/2025 14:31, Mateusz Viste wrote:
On Wed, 11 Jun 2025 17:14 David Brown wrote:
For debugging problems like this with gdb, you can put a data
breakpoint on the pointer that is your known symptom. Set it to stop
when something writes 0 to it - then you can see where you are in
code when that happens. Of course, that will be a real pain if it
only happens once a week.
The idea is good, but as you observed it is hard to apply in a
production situation when the issue happens like three times a month.
In fact, a breakpoint would even be overkill - I'd be perfectly happy
with the program crashing when said variable changes. Like a
runtime-setup assertion that constantly checks the state of the
variable. Sadly, I'm not aware of such a mechanism either. :)
A data breakpoint is triggered when the data item is written (or read,
depending on the settings). I have only used these on embedded systems,
and don't know about their support in x86 hardware (assuming that is
your target). But the point is that the breakpoint would be hit in the
buggy code with the buffer overrun, rather than in the correct code that
used the pointer that got stomped on.
Data breakpoints are not perfect either - you will also get a hit when
legitimate code changes the same address, and you have to have filtering
to skip the legitimate changes.
Josef Möllers <josef@invalid.invalid> writes:
On 11.06.25 15:32, Mateusz Viste wrote:
This might not be a strictly C question, but it definitely
concerns all C programmers.
Earlier today, I fixed an out-of-bounds write bug. An obvious
issue:
static int *socks[0xffff];
void update_my_socks(int *sock, int val) {
socks[val & 0xffff] = sock;
}
While the presented issue is common knowledge for anyone familiar
with C, *locating* the bug was challenging. The program did not
crash at the moment of the out-of-bounds write but much later -
somewhere entirely different, in a different object file that
maintained a static pointer for tracking a position in a linked
list. To my surprise, the pointer was randomly reset to NULL about
once a week, causing a segfault. Tracing this back to an unrelated
out-of-bounds write elsewhere in the code was tedious, to say the
least.
valgrind.
Thank you all for your thoughtful responses. You rightly identified
that the problem is essentially an out-of-bounds access - a symptom of
deeper code quality issues. The bug in question managed to pass unit
tests, peer review, functional tests, and it didn’t trigger any
warnings from GCC or clang, even with the strict -Weverything flag I
enforce across my teams. This underscores a fundamental truth: every
software has bugs, and some, like this one, are notoriously difficult
to locate. The bug caused a segfault about once every 10 days,
manifesting in an unrelated part of the code and sometimes days after
the out-of-bounds write occurred.
This led me to wonder how I could accelerate such crashes to simplify debugging.
Therefore I love bounds-checking C++ containers with MSVC (debug
builds) and with the libstdc++ runtime (enabled via macro). (...)
Debug builds are usually much slower, and with C++ even more so, since
simple things like a container access via the []-operator become a
separate function call while debugging. With iterator debugging it is
slower still. But this price is worth the advantage that you can easily
find bounds problems with C++.
Probably too slow. If I were in Mateusz's situation, I would try AddressSanitizer.
Never tried it myself, but it looks like a better fit for this
particular, relatively simple case of buffer overrun.
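For reference, enabling it is usually just a matter of building with
something like (the usual flags, as far as I know; exact options may
vary):

  cc -g -fsanitize=address -fno-omit-frame-pointer -o prog prog.c

As far as I understand it also places redzones around globals, so an
out-of-bounds store to a static array like socks should be reported
right at the faulting write.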
Below is a proof-of-concept program that works in GNU/Linux. For
rapidity of prototyping, I have assumed a page size of 4096; this is
not right for all systems.
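A minimal sketch of the idea (names invented; this is only an
illustration, assuming Linux and a 4096-byte page size): get the
storage from mmap(), place the array so it ends exactly at a page
boundary, and write-protect the page that follows, so the off-by-one
store faults at the moment it happens.

#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE   4096UL
#define NSOCKS 0xffff

static int **socks;                       /* points into the mapping below */

static void setup_socks(void)
{
    size_t len   = NSOCKS * sizeof(int *);
    size_t pages = (len + PAGE - 1) / PAGE;          /* data pages        */
    char  *base  = mmap(NULL, (pages + 1) * PAGE,    /* plus one sentinel */
                        PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED) { perror("mmap"); exit(1); }

    /* The last page becomes a write-protected sentinel. */
    if (mprotect(base + pages * PAGE, PAGE, PROT_NONE) != 0) {
        perror("mprotect"); exit(1);
    }

    /* Shift the array so its end coincides with the sentinel page. */
    socks = (int **)(base + pages * PAGE - len);
}

void update_my_socks(int *sock, int val)
{
    socks[val & 0xffff] = sock;           /* the original buggy expression */
}

int main(void)
{
    static int dummy;

    setup_socks();
    update_my_socks(&dummy, 42);          /* in bounds: fine                */
    update_my_socks(&dummy, 0xffff);      /* lands on the sentinel: SIGSEGV */
    return 0;
}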
The code in question shows several classic error patterns. In no
particular order:
* buffer overflow
* off-by-one error
* bitwise operator with signed operand
* using & to effect what is really a modulo operation
I acknowledge that this response isn't exactly an answer to the
original question. It does illustrate though a kind of thinking
that can be useful when trying to track down hard-to-find bugs.
However, this strategy assumes you already know there's some
instruction that writes to the array at an out-of-bounds position.
I think the situation of the original post is different. His program
crashed infrequently, very infrequently, and he didn't know anything
about the cause. I think it was a very big effort to link the crash
to the array (in another source module) and to the out-of-bound
access of the array.
- There is no readily available mechanism for this today on x86
On Fri, 13 Jun 2025 08:00 Bonita Montero wrote:
Therefore I love bounds-checking C++ containers with MSVC (debug
builds) and with the libstdc++ runtime (enabled via macro). (...)
Debug builds are usually much slower, and with C++ even more so, since
simple things like a container access via the []-operator become a
separate function call while debugging. With iterator debugging it is
slower still. But this price is worth the advantage that you can easily
find bounds problems with C++.
Sounds similar to Pixar's "Electric Fence" that Kaz mentioned earlier: https://linux.die.net/man/3/efence
Depending on the performance impact this may or may not be a viable
solution to debug a rare production issue, but still nice to know it
exists.
On Thu, 12 Jun 2025 18:59 Kaz Kylheku wrote:
Below is a proof-of-concept program that works in GNU/Linux. For
rapidity of prototyping, I have assumed a page size of 4096; this is
not right for all systems.
This is very cool! A variation of the classic "sentinel-guarded
memory" concept, where sentinels are write-protected rather than
requiring runtime checks against some magic signature.
Another potential strategy would be to safeguard the static array
itself, or any other data storage for that matter, immediately after the legitimate code has finished using it. Then unprotect it only when
needed again. While this might not be a good performer for
high-frequency operations, it could be an interesting practice
for memory regions that are rarely modified.
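Roughly, something like this (a sketch; assumes Linux, gcc or clang, a
4096-byte page, an array padded to a whole number of pages, and error
checking omitted):

#include <sys/mman.h>

#define PAGE 4096
/* 0x10000 entries (not 0xffff) so the object is an exact multiple of
   the page size; the alignment attribute is a gcc/clang extension, and
   Linux allows mprotect() on any page-aligned region of the process. */
static int *socks[0x10000] __attribute__((aligned(PAGE)));

static void socks_protect(void)   { mprotect(socks, sizeof socks, PROT_READ); }
static void socks_unprotect(void) { mprotect(socks, sizeof socks, PROT_READ | PROT_WRITE); }

int main(void)
{
    static int dummy;

    socks_unprotect();
    socks[42] = &dummy;      /* legitimate update while writable         */
    socks_protect();

    socks[43] = &dummy;      /* any later stray write faults immediately */
    return 0;
}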
On Fri, 13 Jun 2025 09:21 pozz wrote:
However this strategy assumes you already know there's some
instruction that write to the array at an out-of-bound position.
Yes, though I see Kaz's idea is to proactively protect all memory used
by the program. It's an interesting concept, though not particularly practical.
wij <wyniijj5@gmail.com> writes:
On Fri, 2025-06-13 at 08:03 +0200, Bonita Montero wrote:
On 12.06.2025 at 15:05, Tim Rentsch wrote:
void update_my_socks(int *sock, int val) {
const unsigned N = sizeof socks / sizeof socks[0];
socks[val % N] = sock;
}
For someone who uses bounds-checked containers in C++ every day
this really looks archaic.
Really? What are they?
Feel free to discuss that in comp.lang.c++.
On Thu, 12 Jun 2025 06:05 Tim Rentsch wrote:
The code in question shows several classic error patterns. In no
particular order:
* buffer overflow
* off-by-one error
I'd consider that one item, since one leads to another.
* using & to effect what is really a modulo operation
You think of it as modulo, I think of it as "bits trimming".
Essentially the same operation, but different viewpoints, I guess.
I acknowledge that this response isn't exactly an answer to the
original question. It does illustrate though a kind of thinking
that can be useful when trying to track down hard-to-find bugs.
Thank you for your insightful remarks. I completely agree - the best
way to debug a program is to avoid the need for debugging in the first
place. :-) But working with a large, 15-year-old codebase that has
seen contributions from dozens of programmers makes things a bit
non-ideal sometimes.
On 13/06/2025 14:56, Michael S wrote:
Practice has proved many times that it can be done, but
only by a very good team. Your team is not good enough.
Sound advice. If you can't stand the heat, get out of the
kitchen. Go and drive a cab or something, and leave programming
to the grown-ups.
A significant part of x86 installed base (all Intel Core CPUs starting
from gen 6 up to gen 9 and their Xeon contemporaries) has an extension
named Intel MPX that was invented exactly for that purpose. But it didn't
work particularly well. Compiler people never liked it, but despite
that it was supported by several generations of gcc and probably by
clang as well.
The proper solution to your problem is to stop using a memory-unsafe
language for complex application programming. It's not that successful
use of unsafe languages for complex application programming is
impossible. Practice has proved many times that it can be done, but
only by a very good team. Your team is not good enough.
It isn't wrong to think of bitwise-and as masking-in (or possibly
masking-out) of certain bits, but it still isn't a modulo. A modulo
operation is what is desired;
I think you have misunderstood the point of my comments. In some
cases one is confronted with a symptom that defies one's best
efforts to diagnose what is causing the symptom. Looking for known
classes of errors is another arrow in the quiver of techniques for
discovering what is causing the observed behavior.
On 13.06.2025 15:56, Michael S wrote:
A significant part of x86 installed base (all Intel Core CPUs starting
from gen 6 up to gen 9 and their Xeon contemporaries) has an extension
named Intel MPX that was invented exactly for that purpose. But it didn't
work particularly well. Compiler people never liked it, but despite
that it was supported by several generations of gcc and probably by
clang as well.
This does not really sound like something "readily available", unless you
are suggesting that I migrate to a Linux kernel from 10 years ago, switch
to gcc 5.0 and use outdated hardware.
The proper solution to your problem is to stop using a memory-unsafe
language for complex application programming. It's not that successful
use of unsafe languages for complex application programming is
impossible. Practice has proved many times that it can be done, but
only by a very good team. Your team is not good enough.
Just to clarify: I didn’t post here seeking help with a simple out-of-bounds
issue, nor was I here to vent. I’ve been wrangling C code in complex, high-performance systems for over a decade - I’m managing just fine. Code improvement is a continual, non-negotiable process in our line of work, but fires happen occasionally nonetheless. While fixing the issue, I started wondering about how faults like this could be located faster, that is assuming they do slip into production - because in spite of the testing process, some faults will inevitably get to customers.
A crash that happens closer to the source of the problem (same compilation unit) would significantly ease the debugging effort. I figured it was a
topic worth sharing, in the spirit of sparking some constructive
discussions.
IIUC in your example the array was global, so the compiler knew its
bound and in principle could generate bounds checks. But I am not
aware of a C compiler which actually generates such checks.
That said, detecting out-of-bounds array access is no panacea. Memory corruption can arise from various sources, such as dangling pointers or poorly managed pointer arithmetic.
Hence why I was looking in the direction
of the MMU. All compilation units of a program share the same set of TLBs.
I figured there might perhaps be a way to isolate a given compilation unit
in different TLBs, effectively sandboxing its memory, then make this unit communicate with the rest of the program via shm when shared memory
accesses are needed.
Mateusz Viste <mateusz@not.gonna.tell> wrote:
That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.
AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.
Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem. Like in
Kaz Kylheku <643-408-1753@kylheku.com> wrote:...
You're not saying anything here other than that you like the p[i]
/notation/ better than *(p + i), and &p[i] better than p + i.
The indexing notation at least has a chance of being automatically
checked (in cases where the compiler/checker knows the array size). With
arbitrary user-written pointer arithmetic there is no hope of automatic
checking.
On 2025-06-15, Waldek Hebisch <antispam@fricas.org> wrote:
Mateusz Viste <mateusz@not.gonna.tell> wrote:
That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.
AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.
LOL, you heard it here.
Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem. Like in
Array indexing *is* pointer arithmetic.
Are you not aware of this equivalence?
(E1)[(E2)] <---> *((E1) + (E2))
In fact, let's draw the commutative diagram
(E1)[(E2)]  <--->  *((E1) + (E2))
    ^                    ^
    |                    |
    |                    |
    v                    v
(E2)[(E1)]  <--->  *((E2) + (E1))
You're not saying anything here other than that you like the p[i]
/notation/ better than *(p + i), and &p[i] better than p + i.
This might not be a strictly C question, but it definitely concerns all
C programmers.
Earlier today, I fixed an out-of-bounds write bug. An obvious issue:
static int *socks[0xffff];
void update_my_socks(int *sock, int val) {
socks[val & 0xffff] = sock;
}
<snip>
Imagine an alternate universe in which array declarations took the
form (borrowed from Unisys ALGOL):
array_name[lower_bound : upper_bound]
Mateusz Viste <mateusz@not.gonna.tell> wrote:
That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.
AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.
Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem.
Sure. Or some people prefer to single-step with a debugger. Such
people can make their lives a little easier by surrounding the
buffer with sentinel soldiers, setting the sentinel soldiers to a
magic number, and putting a watch on them both - the buffer high
soldier and the buffer low soldier.
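In C that might look something like this (only a sketch; the struct
keeps the soldiers adjacent to the buffer, which separately declared
objects would not guarantee):

#include <assert.h>

#define MAGIC 0xDEADBEEFu

static struct {
    unsigned low;                 /* buffer low soldier  */
    int *socks[0xffff];           /* the guarded buffer  */
    unsigned high;                /* buffer high soldier */
} g = { MAGIC, {0}, MAGIC };

/* Sprinkle this through the code, or evaluate it from the debugger. */
#define CHECK_SOCKS() assert(g.low == MAGIC && g.high == MAGIC)

int main(void)
{
    static int dummy;

    g.socks[0xffff] = &dummy;     /* the off-by-one store from the thread */
    CHECK_SOCKS();                /* fails: the high soldier got overwritten */
    return 0;
}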
On Thu, 12 Jun 2025 19:15:26 +0100, Richard Heathfield wrote:
Sure. Or some people prefer to single-step with a debugger. Such
people can make their lives a little easier by surrounding the
buffer with sentinel soldiers, setting the sentinel soldiers to a
magic number, and putting a watch on them both - the buffer high
soldier and the buffer low soldier.
I think that with an out-of-bounds access the write often lands just
past one of the two limits of the array... but there are also cases
where the bounds look fine and memory outside the array gets written
anyway, somewhere else entirely.
antispam@fricas.org (Waldek Hebisch) writes:
Mateusz Viste <mateusz@not.gonna.tell> wrote:
That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.
AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.
This assertion is in effect a No True Scotsman statement.
Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem.
Even if there is no direct manipulation ("pointer arithmetic") of
pointer variables, access can be checked only if array bounds
information is available, and in many cases it isn't. The reason is
(among other things) C doesn't have array parameters; what it does
have instead is pointer parameters. At the point in the code when
an "array" access is to be done, the information needed to check
that an index value is in bounds just isn't available. The culprit
here is not explicit pointer arithmetic, but lacking the information
needed to do a bounds check. That lack is inherent in how the C
language works with respect to arrays and pointer conversion.
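A two-function illustration of that point:

/* Inside f(), 'p' is just an int *; spelling the parameter 'int p[10]'
   would mean exactly the same thing, so there is no bound to check
   against. */
void f(int *p, int n) {
    p[n] = 0;             /* nothing here ties n to the caller's array size */
}

void g(void) {
    int a[10];
    f(a, 10);             /* out of bounds, but undetectable from f() alone */
}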
On 14.06.2025 01:31, Tim Rentsch wrote:
It isn't wrong to think of bitwise-and as masking-in (or possibly
masking-out) of certain bits, but it still isn't a modulo. A
modulo operation is what is desired;
By "different viewpoints," I meant that while you approach the
problem by applying a modulo operation to the index so it fits the
array size, I tend to think in terms of ensuring the index
correctly maps to a location within an n-bit address space.
Naturally, the array should accommodate the maximum possible index
for the given address space, and that's where the original code
fell short. And you're absolutely right that hardcoded values are
problematic; the size of the array should have been tied to the
n-bit address-space expectation.