The TRACTOR program aims to automate the translation of legacy C code to Rust. The goal is to achieve the same quality and style that a skilled Rust developer would produce, thereby eliminating the entire class of memory safety security vulnerabilities present in C programs. This program may involve novel combinations of software analysis, such as static analysis and dynamic analysis, and machine learning techniques like large language models.
Highlights from the forum thread:
There’s even a conspiracy theory that the Rust Foundation’s 501 organization type was chosen so it can conduct lobbying. The implication being that the Rust Foundation is behind government recommendations to move toward memory safe languages. (Big Borrow-Checker, if you will).
Assuming a worst case scenario, this could be the worst thing to happen to Rust’s image. We end up with billions of lines of rewritten Rust code that is full of soundness and logic bugs, and that no one understands.
DARPA funds some projects on a “there is an infinitesimal chance of success, but if you succeed, it’s a big deal” basis. Silent Talk is an example here - very unlikely to succeed, even at the beginning, but if you could hold a radio conversation without sound, that’d be a huge deal for special operations forces.
I’m gonna guess this is going to be a major pain to debug.
Translating entire codebases with LLMs? What could POSSIBLY go wrong?
I also don’t see how it would ever be possible to directly translate C to Rust. They’re so fundamentally different that things are bound to not work the same.
I don’t even understand how they are going to get around the memory security they are doing this translation for. Watch them have to break the security features of Rust just to make certain programs work.
I would expect that’s part of the point, if a C program can’t be converted to a language that doesn’t allow memory violations that probably indicates that there are execution pathways that result in memory violations.
What could go wrong with using human programmers to convert it?
If you’re going to insist on perfection for something like this then you’re probably never going to get anything done. Convert the program and then test and debug it just like you’d do with any newly written code. The idea is to make it easier to do that, not to make it so you don’t have to do it at all.
I’m gonna guess this is going to be a major ~~pain~~ profit to debug. Some “AI” grifters gonna be showering in that state paper.
I think this is an interesting idea. If they’re able to pull it off, I think it will cement the usefulness of LLMs. I have my doubts, but it’s worth trying. I’d imagine that the LLM is specially tuned to be more adept at this task. Your bog-standard GPT-4 or Claude will probably be unreliable.
Having built code converters for the same language to auto migrate to a later version of that language, I’m incredibly worried. We still had to manually verify everything.
I’m hopeful though that this does become the wave of the future. There’s some serious legacy shit out there that doesn’t have enough of a financial gain to revisit and rewrite.
Yeah, they’ll probably have to check everything. Though, I wonder if even just checking that everything is good to go would save time from manually re-writing it all. While it may not be a smashing success, it could still prove useful.
I dunno, I’m interested to see how this plays out.
I’ve tried multiple times to convert existing code or create new code using LLMs. The first attempts are OK, but once you start refining the prompts, they all go off the rails.
Most of the time, the generated code uses old or deprecated libraries or APIs. You point that out and they correct it. But a few iterations later, you’re refining something else and the old, deprecated calls come back. Once again, you point it out and it gets corrected.
Forget trying to correct it yourself by hand, because now it’s diverged from the LLM context. And this can happen in multiple places in the code. Rinse. Repeat.
At some point you just give up. Either it’s wrong or it will be wrong in different ways later. You have to read through every line to find strange, divergent errors. Over and over. It gets exhausting.
At the end, it feels like maybe you could have done it faster yourself, but the time has already been sunk.
Maybe it would be easier to translate to Ada? That is for C code that doesn’t make heavy use of malloc/free. The idea of Rust’s borrow checker as I understand it is to statically track the references to malloc’d memory to make sure that you never use-after-free or double-free. If your C code uses malloc in uncontrolled ways, then massaging it to satisfy a borrow checker sounds horribly difficult and you should either give up, or run it under a very managed environment like valgrind. If (as is typical of embedded code) it just does stuff with some fixed memory buffers and doesn’t do much runtime allocation, then there isn’t anything for a borrow checker to look after, so you can use a safe language (Ada) that doesn’t have borrow checking.
Disclaimer: I don’t use Rust at the moment. Someday. I do like Ada despite its verbosity, but it’s not that great at managing dynamic memory. It is starting to take on Rust influences to help with that.
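To make the borrow-checker comment above concrete, here is a minimal sketch of the ownership rule it describes. The function names `consume` and `demo` are hypothetical and chosen only for illustration. A heap buffer has exactly one owner, it is freed when that owner goes out of scope, and using it after ownership has moved is rejected at compile time.

```rust
/// Takes ownership of the buffer and reports its length.
/// Plays the role of a C function that ends by calling free() on its argument.
/// (Hypothetical helper, for illustration only.)
pub fn consume(buf: Vec<u8>) -> usize {
    buf.len()
} // `buf` is dropped (freed) here, exactly once.

/// Allocates a buffer, hands it off, and returns the length `consume` reported.
pub fn demo() -> usize {
    let buf = vec![0u8; 16]; // heap allocation, roughly malloc(16) plus zeroing
    let n = consume(buf); // ownership moves into `consume`

    // Uncommenting the next line is a compile-time error (E0382, use after move),
    // which is how a use-after-free is ruled out statically:
    // let first = buf[0];

    n
}
```

Calling `demo()` returns 16. The same rules also apply to references into stack or static buffers, so code with little runtime allocation still gets checked, although it usually has far fewer ownership questions to resolve.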
AFAIK you can get around this by using raw pointers / unsafe blocks in Rust, then have a human target those to rewrite it in a safe, structured way.
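As a rough sketch of what that last comment suggests, the two functions below show a literal, C-style translation that keeps a raw pointer inside `unsafe`, next to the kind of safe rewrite a human might produce afterwards. Both `sum_raw` and `sum_safe` are hypothetical names. This is an assumed illustration of the general workflow, not output from TRACTOR or any real translator.

```rust
/// Literal, C-style translation: a raw pointer plus a separate length.
/// (Hypothetical example, for illustration only.)
///
/// # Safety
/// `data` must point to at least `len` valid, initialized `i32` values.
pub unsafe fn sum_raw(data: *const i32, len: usize) -> i32 {
    let mut total = 0;
    for i in 0..len {
        // Pointer arithmetic and dereference, much as the C loop would do it.
        // Nothing here checks that `i` stays inside the allocation.
        total += unsafe { *data.add(i) };
    }
    total
}

/// Hand-written safe rewrite: the slice carries its own length,
/// so out-of-bounds access is no longer expressible in this function.
pub fn sum_safe(data: &[i32]) -> i32 {
    data.iter().sum()
}
```

For the array `[1, 2, 3, 4]`, both `unsafe { sum_raw(v.as_ptr(), v.len()) }` and `sum_safe(&v)` return 10. The difference is that the caller of `sum_raw` has to uphold the pointer and length contract manually, which is the part a reviewer would target when replacing `unsafe` code.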