Forum: >>> Magnum BBS <<<

Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C

From John R Levine@21:1/5 to All on Fri May 9 12:27:11 2025

Automated tools translate C to Rust but produce lousy Rust code because of
C's loose pointer semantics. They use an LLM to improve it somewhat.

Abstract
There has been a growing interest in translating C code to Rust due to
Rust's robust memory and thread safety guarantees. Tools such as C2RUST
enable syntax-guided transpilation from C to semantically equivalent Rust
code. However, the resulting Rust programs often rely heavily on unsafe
constructs--particularly raw pointers--which undermines Rust's safety
guarantees. This paper aims to improve the memory safety of Rust programs
generated by C2RUST by eliminating raw pointers. Specifically, we propose
a peephole raw pointer rewriting technique that lifts raw pointers in
individual functions to appropriate Rust data structures. Technically,
PR2 employs decision-tree-based prompting to guide the pointer lifting
process. Additionally, it leverages code change analysis to guide the
repair of errors introduced during rewriting, effectively addressing
errors encountered during compilation and test case execution. We
implement PR2 as a prototype and evaluate it using gpt-4o-mini on 28
real-world C projects. The results show that PR2 successfully eliminates
13.22% of local raw pointers across these projects, significantly
enhancing the safety of the translated Rust code. On average, PR2
completes the transformation of a project in 5.44 hours, at an average
cost of $1.46.
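
To make "lifting raw pointers" concrete, here is a minimal hand-written
sketch. It is not PR2 output, and the names sum_raw and sum_slice are
invented for illustration. The first function is in the style a C-to-Rust
transpiler might emit for a C loop over an int pointer; the second
expresses the same computation with a slice, which carries its own length.

```rust
/// Transpiler-style translation of `int sum(const int *p, size_t n)`.
/// The caller must guarantee that `p` points to at least `n` readable ints.
unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < n {
        // Raw pointer arithmetic and dereference: nothing here knows the
        // real size of the allocation, so no bounds check is possible.
        total += unsafe { *p.add(i) };
        i += 1;
    }
    total
}

/// The same function after the pointer/length pair is lifted to a slice.
fn sum_slice(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    // Safety: `data` really does hold `data.len()` elements.
    let a = unsafe { sum_raw(data.as_ptr(), data.len()) };
    let b = sum_slice(&data);
    println!("{a} {b}");
}
```

The safe version needs no caller-side promise: the slice type makes the
length part of the argument.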

https://arxiv.org/abs/2505.04852

Regards,
John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Derek@21:1/5 to All on Tue May 13 21:30:43 2025

All,

> Automated tools translate C to Rust but produce lousy Rust code because of
> C's loose pointer semantics. They use an LLM to improve it somewhat.

Developers could always stay with C and switch on all the
pointer+array bounds checking that GCC/LLVM have been supporting for
some years (30 in the case of gcc).

I have been trying to find out how many products written in Rust
actually ship with the checking still switched on.

Way back when, most products written in Pascal used to ship with the
checking switched off, so that customers did not see the strange
errors+program termination.

I suspect that the same is happening with Rust. If so, how does using
Rust make the code safer than using C without any checking switched
on?


From arnold@freefriends.org@21:1/5 to derek-nospam@shape-of-code.com on Wed May 14 08:21:51 2025

In article <25-05-005@comp.compilers>,
Derek <derek-nospam@shape-of-code.com> wrote:

> I suspect that the same is happening with Rust. If so, how does using
> Rust make the code safer than using C without any checking switched
> on?

Rust catches many problems at compile time. I am not at all a Rust
expert, or even a novice, but I don't think Rust does runtime
bounds checking, since it relies on compiler analysis instead.


From Kaz Kylheku@21:1/5 to arnold@freefriends.org on Wed May 14 20:01:47 2025

On 2025-05-14, arnold@freefriends.org <arnold@freefriends.org> wrote:

In article <25-05-005@comp.compilers>,
Derek <derek-nospam@shape-of-code.com> wrote:

I suspect that the same is happening with Rust. If so, how does using
Rust make the code safer than using C without any checking switched
on?

Rust catches many problems at compile time. I am not at all a Rust
expert, or even a novice, but I don't think Rust does runtime
bounds checking, since it relies on compiler analysis instead.

How would it be safe if you could write a Rust program that asks the
user to input a random decimal number, and then uses it as an index to
access an array, without any check?

The compiler will eliminate bounds checks at compile time if it can
infer they are unnecessary; e.g. a loop sets up a dummy variable to step
over the correct range, and does not mess with it otherwise.
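
A minimal sketch of both cases in Rust; the function names are invented
for illustration. element_at takes an index typed by the user and cannot
reach memory outside the slice, because get returns None for an
out-of-range index. sum_indexed is the loop shape described above;
whether the optimizer actually removes its bounds checks depends on the
compiler version and optimization level.

```rust
/// Parse a user-supplied decimal index and look it up.
/// Returns None if the text is not a valid index or is out of range.
fn element_at(a: &[i32], input: &str) -> Option<i32> {
    let i: usize = input.trim().parse().ok()?;
    a.get(i).copied()
}

/// Index loop whose variable only ever steps over 0..a.len().
fn sum_indexed(a: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..a.len() {
        // Semantically bounds-checked on every access; the check is a
        // candidate for removal because `i < a.len()` always holds here.
        total += a[i];
    }
    total
}

fn main() {
    let a = [10, 20, 30];
    println!("{:?}", element_at(&a, "1"));
    println!("{:?}", element_at(&a, "99"));
    println!("{}", sum_indexed(&a));
}
```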

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


From anton@mips.complang.tuwien.ac.at@21:1/5 to Kaz Kylheku on Thu May 15 07:48:12 2025

Kaz Kylheku <643-408-1753@kylheku.com> writes:

> On 2025-05-14, arnold@freefriends.org <arnold@freefriends.org> wrote:
>
>> [Rust] relies on compiler analysis instead.
>
> How would it be safe if you could write a Rust program that asks the
> user to input a random decimal number, and then uses it as an index to
> access an array, without any check?

I don't know if Rust does it this way, but it could reject a program
that does a[i] if it cannot prove that i is an allowed index for a.
For your example, a program like this would be rejected:

  input i
  print a[i]

(using what little I remember from BASIC syntax because I don't know
the Rust syntax :-). If you want the compiler to accept it, you could
write

  input i
  if i < length[a] then
    print a[i]
  else
    print "index out of range"
  endif
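
For comparison, a sketch of both programs in Rust (the function names
are invented here). Safe Rust does not reject the first form: indexing
with a[i] compiles, and an out-of-range index causes a panic at run time
rather than an out-of-bounds read.

```rust
use std::panic;

/// Mirrors the second program: test the index explicitly.
fn print_checked(a: &[i32], i: usize) -> String {
    if i < a.len() {
        a[i].to_string()
    } else {
        String::from("index out of range")
    }
}

/// Mirrors the first program: index without a test.
/// This compiles; an out-of-range index panics at run time.
fn print_unchecked(a: &[i32], i: usize) -> String {
    a[i].to_string()
}

fn main() {
    let a = vec![10, 20, 30];
    println!("{}", print_checked(&a, 1));
    println!("{}", print_checked(&a, 7));
    // catch_unwind is used only to show that the failure is a panic,
    // not undefined behavior.
    let outcome = panic::catch_unwind(|| print_unchecked(&a, 7));
    println!("unchecked access panicked: {}", outcome.is_err());
}
```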

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[I believe that Rust does runtime checks unless it can prove at compile
time that they're not needed. It has a fancy exception system to catch
access violations. -John]


From Christopher F Clark@21:1/5 to All on Fri May 16 02:26:57 2025

There has been some debate here about how Rust is "safer". Having
written a little bit of Rust, I can explain that a little.

The main point of Rust's safety guarantees is "heap allocated" memory,
not array bounds checking, although I believe that references to arrays
are bounds checked and that it is more difficult to turn off array
bounds checking in Rust than in Pascal. It is not a compiler option. It
has to be done by declaring a module to be "unsafe", and then it is
obvious that that particular module is responsible for its own checking.
(I still don't know whether that applies to array bounds checking or
not, since I have written production code in Rust and have never written
an unsafe module; it was unnecessary to do so.) The safe code is
generally sufficiently expressive and performant that one doesn't need
(in many cases) to write "unsafe" code.

So, assuming that one is writing safe Rust, one gets checking, and gets
it with negligible performance impact. It did not impact the SQL engine
we wrote in Rust, and we benchmarked it to be certain.

But now, returning to the main point: Rust has a "different" model of
dealing with "heap allocated" memory. It is vaguely akin to Java's
garbage collection model, in that memory continues to exist as long as
there are potential references to that memory. It is the job of the
"borrow checker" to ensure that this can be proven true at compile time.
For me, the easiest way to think about it is that Rust treats "heap
memory" as if it were a stack, but it has coroutines, so an object's
lifetime can be extended beyond a simple stack.

Still, in any case, as with C ownership conventions, all objects in safe
Rust have an owner and exist as long as that owner says they do. You
cannot get a pointer to such an object except by "borrowing" it from
the owner. The borrow checker enforces that rule, and while you have a
"borrowed copy" of the object, the owner cannot get rid of it. Moreover,
the borrow checker makes sure that the code "borrowing" the object stops
borrowing it before the owner wants to get rid of it. You get a compile
time error if the borrow checker cannot prove that is true. In the
simplest cases, the creation of an object (and its deletion) is done
via scopes, thus making it all very stack-like.
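
A small sketch of the owner/borrower relationship, with a made-up Record
type. The commented-out line marks where the borrow checker would stop
the owner from getting rid of the object.

```rust
struct Record {
    name: String,
}

/// Borrows the record; the caller remains the owner.
fn name_len(r: &Record) -> usize {
    r.name.len()
}

fn main() {
    let owner = Record { name: String::from("peephole") };
    {
        let borrowed = &owner; // shared borrow starts here
        // drop(owner); // compile error if uncommented: `owner` is moved
        //              // while `borrowed` is still used below
        println!("{}", name_len(borrowed));
    } // the borrow ends with this scope
    drop(owner); // now the owner may give the object up
}
```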

Moreover, beyond simple cases, you need to decorate your object with
"lifetimes". That's one of the ways you can express nontrivial uses of
an object. Fortunately, lots of simple cases are covered and don't
explicitly need lifetimes, e.g. you use an object in a stack-like
fashion where you borrow it (and don't take a pointer to it that can be
leaked; pointers that cannot be leaked are generally ok). If you do take
a pointer that can be leaked, you will likely need lifetime annotations.
And how does the borrow checker assure that pointers cannot be leaked
(or at least how did it in the Rust compiler I used)? By requiring
ownership to be hierarchical, such that the owned object is a child of
the owning object (i.e. ownership is a tree, not a DAG). Thus, you
cannot create Rust objects that are general graphs and keep the borrow
checker happy. You can make stacks and queues and trees, but not general
graphs, not even DAGs, using the base mechanism.
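
A sketch of both ideas, using invented names. longer returns a reference
that may come from either argument, which is a case where the signature
needs an explicit lifetime. Node shows tree-shaped ownership: each node
owns its children, so no node can have two owners.

```rust
/// The result borrows from one of the inputs, so the signature ties the
/// returned reference to both of them with the lifetime `'a`.
fn longer<'a>(x: &'a str, y: &'a str) -> &'a str {
    if y.len() > x.len() { y } else { x }
}

/// Tree-shaped ownership: a node owns its children outright.
struct Node {
    value: i32,
    children: Vec<Node>,
}

/// Sum the values in a tree by borrowing it; no lifetime annotation is
/// needed because nothing borrowed is returned.
fn total(n: &Node) -> i32 {
    n.value + n.children.iter().map(total).sum::<i32>()
}

fn main() {
    println!("{}", longer("ab", "abc"));
    let tree = Node {
        value: 1,
        children: vec![
            Node { value: 2, children: vec![] },
            Node { value: 3, children: vec![] },
        ],
    };
    println!("{}", total(&tree));
}
```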

Of course, that's a pretty strict mechanism, so safe Rust has a solution
to it. It has reference counted pointers (i.e. ones that one can garbage
collect). Those let you make DAGs. When you "borrow" one of those, the
count is incremented; when you stop borrowing it, the count is
decremented; and when the count becomes zero, the object is freed. Not
my favorite garbage collecting scheme, but it is "safe".
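
A sketch with the standard library's Rc type, where the count goes up
when a handle is cloned with Rc::clone and down when a handle is
dropped. The Leaf and Branch types are invented for illustration; the
branch points at the same leaf twice, which makes the structure a DAG
rather than a tree.

```rust
use std::rc::Rc;

struct Leaf {
    label: String,
}

/// Both fields may point at the same leaf.
struct Branch {
    left: Rc<Leaf>,
    right: Rc<Leaf>,
}

/// Build one leaf that is shared by both sides of a branch.
fn shared_branch(label: &str) -> (Rc<Leaf>, Branch) {
    let leaf = Rc::new(Leaf { label: label.to_string() });
    let branch = Branch {
        left: Rc::clone(&leaf),
        right: Rc::clone(&leaf),
    };
    (leaf, branch)
}

fn main() {
    let (leaf, branch) = shared_branch("shared");
    println!("{} {}", branch.left.label, branch.right.label);
    println!("handles: {}", Rc::strong_count(&leaf)); // three handles
    drop(branch); // drops two of them
    println!("handles: {}", Rc::strong_count(&leaf)); // one handle left
}
```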

And if you want truly circular links, there are "weak references" in
addition. You cannot directly access an object through a weak reference.
You need to write code that promotes it to a strong reference to access
it, and that code performs the checking to be sure the object exists.
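
A sketch of the promotion step using std::rc::Weak; read_through is a
made-up helper name. upgrade returns an Option, so the "does the object
still exist" check cannot be skipped.

```rust
use std::rc::{Rc, Weak};

/// Try to read the string behind a weak reference.
/// Returns None once every strong reference has been dropped.
fn read_through(w: &Weak<String>) -> Option<String> {
    w.upgrade().map(|strong| strong.to_string())
}

fn main() {
    let strong = Rc::new(String::from("node"));
    let weak = Rc::downgrade(&strong);
    println!("{:?}", read_through(&weak)); // object still alive
    drop(strong);
    println!("{:?}", read_through(&weak)); // object gone
}
```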

This is not all of Rust's safety guarantees. Objects in Rust are also
immutable by default. You cannot just borrow an object and mutate it.
You must explicitly borrow a mutable copy from an owner (or borrower)
who themselves has a mutable copy. Moreover, while your code has a
mutable copy, it has an exclusive copy of the object; no one else can
get a copy from that owner. You can pass down to your children immutable
copies or your mutable copy. But, if I recall correctly, you cannot
mutate the object while they have "borrowed" it.
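
A sketch of those rules; append_total is an invented function. The
commented-out lines mark what the compiler would reject.

```rust
/// Needs an exclusive (mutable) borrow because it changes the vector.
fn append_total(v: &mut Vec<i32>) {
    let total: i32 = v.iter().sum();
    v.push(total);
}

fn main() {
    let fixed = vec![1, 2, 3];
    // fixed.push(4); // compile error: `fixed` is not declared `mut`

    let mut v = fixed.clone();
    let exclusive = &mut v;
    // let reader = &v; // compile error if uncommented: `v` is already
    //                  // mutably borrowed and `exclusive` is used below
    append_total(exclusive);

    let reader = &v; // fine: the exclusive borrow is no longer in use
    println!("{:?} {:?}", fixed, reader);
}
```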

All of this means that Rust code is written in a more "functional
programming style". You don't generally make an array and mutate it. You
make a new copy of the array with your changes. While that may seem
inefficient, there are many algorithms that work well in that regime.
Moreover, if the Rust compiler can determine that your code is safe, it
can eliminate making copies and do in-place modification.
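
A sketch of the copy-making style next to in-place mutation through an
exclusive borrow, which safe Rust also permits. Both function names are
invented for illustration.

```rust
/// Functional style: leave the input alone and build a new vector.
fn doubled(xs: &[i32]) -> Vec<i32> {
    xs.iter().map(|x| x * 2).collect()
}

/// In-place style: requires an exclusive borrow of the data.
fn double_in_place(xs: &mut [i32]) {
    for x in xs.iter_mut() {
        *x *= 2;
    }
}

fn main() {
    let original = vec![1, 2, 3];
    let copy = doubled(&original);
    println!("{:?} -> {:?}", original, copy);

    let mut data = vec![1, 2, 3];
    double_in_place(&mut data);
    println!("{:?}", data);
}
```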

In my opinion, this makes Rust code more challenging to write, but it
does live up to its goal of making the code "safer". You simply cannot
easily write "unsafe" code. The compiler simply refuses to compile it.
My guess is that's why only a small percentage of C code can be turned
into *safe* Rust. So many C idioms don't enforce the safe Rust rules.
They allow mutating objects in place. They allow passing pointers to
places that don't enforce the lifetime rules. They don't require
programmers to check that pointers to objects point to valid objects.
You cannot compile any of those things in a safe Rust module. It's not
just bounds checking. It's limiting programmers to code that the
compiler can prove is safe and not compiling anything the compiler
cannot prove is safe.

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA 01503 USA         twitter: @intel_chris
------------------------------------------------------------------------------


From George Neuner@21:1/5 to arnold@freefriends.org on Thu May 15 11:52:16 2025

On Wed, 14 May 2025 08:21:51 +0000, arnold@freefriends.org wrote:

In article <25-05-005@comp.compilers>,
Derek <derek-nospam@shape-of-code.com> wrote:

I suspect that the same is happening with Rust. If so, how does using
Rust make the code safer than using C without any checking switched
on?

Rust catches many problems at compile time. I am not at all a Rust
expert, or even a novice, but I don't think Rust does runtime
bounds checking, since it relies on compiler analysis instead.

Debug builds in Rust may do considerable runtime checking depending on
what the code is trying to do.

There is a small amount of checking done even in release builds. There
are always some things that can't be checked at compile time.
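
One concrete difference between build profiles, as far as the default
Cargo settings go: integer overflow on `+` panics in a debug build and
wraps in a release build, while slice indexing is bounds-checked in
both. The sketch below uses methods that behave the same in either
profile; the function names are invented.

```rust
/// Overflow is reported as None regardless of build profile.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

/// Overflow wraps around regardless of build profile.
fn add_wrapping(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

fn main() {
    // True in a default debug build, false in a default release build.
    println!("debug assertions enabled: {}", cfg!(debug_assertions));
    println!("{:?}", add_checked(250, 10));
    println!("{}", add_wrapping(250, 10));
}
```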


From cross@spitfire.i.gajendra.net@21:1/5 to arnold@freefriends.org on Fri May 16 15:42:33 2025

In article <25-05-006@comp.compilers>, <arnold@freefriends.org> wrote:

In article <25-05-005@comp.compilers>,
Derek <derek-nospam@shape-of-code.com> wrote:

I suspect that the same is happening with Rust. If so, how does using
Rust make the code safer than using C without any checking switched
on?

Rust catches many problems at compile time. I am not at all a Rust
expert, or even a novice, but I don't think Rust does runtime
bounds checking, since it relies on compiler analysis instead.

Other way 'round, mostly. Array bounds checking is performed at
runtime, but if the compiler can prove that the bounds check is
superfluous (trivial example: the index is the constant 0 for a
non-empty array) then it can elide the code that does the check.
Someone has put together a nice document demonstrating some of
the more useful techniques:

https://github.com/Shnatsel/bounds-check-cookbook/
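
Two small sketches of code shapes that give the compiler what it needs
to prove a check superfluous. The names are invented here and the code
is not taken from the linked document; whether a given check is actually
removed depends on the compiler version and optimization level.

```rust
const TABLE: [u32; 4] = [2, 3, 5, 7];

/// Constant index into a fixed-size array: the index is known to be in
/// range at compile time.
fn first() -> u32 {
    TABLE[0]
}

/// Re-slice once up front. The slicing is checked (it panics if
/// `n > a.len()`), and afterwards `i < n == a.len()` holds in the loop.
fn sum_prefix(a: &[u32], n: usize) -> u32 {
    let a = &a[..n];
    let mut total = 0;
    for i in 0..n {
        total += a[i];
    }
    total
}

fn main() {
    println!("{}", first());
    println!("{}", sum_prefix(&[1, 2, 3, 4], 3));
}
```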

- Dan C.


From Kaz Kylheku@21:1/5 to cross@spitfire.i.gajendra.net on Fri May 16 17:57:35 2025

On 2025-05-16, cross@spitfire.i.gajendra.net <cross@spitfire.i.gajendra.net> wrote:

In article <25-05-006@comp.compilers>, <arnold@freefriends.org> wrote:

In article <25-05-005@comp.compilers>,
Derek <derek-nospam@shape-of-code.com> wrote:

I suspect that the same is happening with Rust. If so, how does using >>>Rust make the code safer than using C without any checking switched
on?

Rust catches many problems at compile time. I am not at all a Rust
expert, or even a novice, but I don't think Rust does runtime
bounds checking, since it relies on compiler analysis instead.

Other way 'round, mostly. Array bounds checking is performed at
runtime, but if the compiler can prove that the bounds check is
superfluous (trivial example: the index is the constant 0 for a
non-empty array) then it can elide the code that does the check.

The logic doesn't even have to be specific to array bounds checking.

If we know that "i" is in the range 0 to 9, then "if (i < 10) S;"
is dead code, whether appearing literally that way in the source
code, or whether such a test is generated for an array access.
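
A sketch of that situation in Rust, with an invented function name. The
index is the result of `% 10`, so it is always in 0..=9; the explicit
test and the bounds check generated for NAMES[i] are the same comparison,
and an optimizer is free to fold both away.

```rust
const NAMES: [&str; 10] = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
];

/// Name of the last decimal digit of `x`.
fn last_digit_name(x: usize) -> &'static str {
    let i = x % 10; // i is always in the range 0 to 9
    if i < 10 {
        // Always taken; the array access below repeats the same test.
        NAMES[i]
    } else {
        "unreachable"
    }
}

fn main() {
    println!("{}", last_digit_name(1234));
}
```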

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

