I'm working on packaging Fil-C for Nix, as well as integrating Fil-C as a toolchain in Nix so you can build any Nix package with Fil-C.
https://github.com/mbrock/filnix
It's working. It builds tmux, nethack, coreutils, Perl, Tcl, Lua, SQLite, and a bunch of other stuff.
Binary cache on https://filc.cachix.org so you don't have to wait 40 minutes for the Clang fork to build.
If you have Nix with flakes on a 64-bit Linux computer, you can run
right now!
That is super cool, and I will probably start running it on at least a test box shortly.
How does python work? Of course I can just add filc.python to my system, but if I `python3 -m pip install whatever` will it just rebuild any C modules with the fil-c compiler?
Yeah, I was thinking Nix will probably be one of the first things that can easily adopt Fil-C, as it already packages in a way that allows different packages to be completely independent of each other, so Fil-C's ABI incompatibility doesn't matter. I assume other targets will mostly be enterprise distros, where the perf hit and source-compatibility issues are less of a concern and memory safety is absolutely critical.
Fil-C-compiled Flatpaks might be an interesting target as well for normal desktop users (e.g. running a browser).
I wonder if GPU graphics are possible in Fil-C land? Perhaps only if the whole Mesa stack is compiled with Fil-C as well, limiting GPU use to open drivers?
That's very exciting! Thank you!
Either Fil-C or a different implementation of the same idea seems essential to me. A great deal of software has been written in C, and without some way of running it, we lose access to that intellectual heritage. But pervasive security vulnerabilities mean that the traditional "YOLO" approach to C compilation is a bad idea for software that has to handle untrusted input, such as Web browsing or email.
Pizlo seems to have found an astonishingly cheap way to do the necessary pointer checking, which hopefully I will be able to understand after more study. (The part I'm still confused about is how InvisiCaps work with memcpy.)
tialaramex points out that we shouldn't expect C programmers to be excited about Fil-C. The point tialaramex mentions is "DWIM", like, accessing random memory and executing in constant time, but I think most C programmers won't be willing to take a 4× performance hit. After all, if they wanted to be using a slow language, they wouldn't be writing their code in C. But I think that's the wrong place to look for interest: Fil-C's target audience is users of C programs, not authors of C programs. We want the benefits of security and continued usage of existing working codebases, without having to pay the cost to rewrite everything in Rust or TypeScript or whatever. And for many of us, much of the time, the performance hit may be acceptable.
I like to share this every time there's a post about memory safe C:
Apple has a memory-safer C compiler/variant they use to compile their boot loaders:
https://saaramar.github.io/iBoot_firebloom/
That was my idea and I wrote a good chunk of the compiler and runtime.
(For those who didn't make the connection, pizlonator also wrote Fil-C.)
I don’t want to out myself or simply cast vague aspersions against which one cannot easily defend, so I’ll just say that was not the reality on the ground.
Sounds like you weren't there
Okay. I really don’t want to break HN rules, but it’s interesting to see that your time away hasn’t tempered your myopic ego one bit.
That breaks HN rules.
(But revealing your identity would not.)
You don't even need to reverse it. It's in the public clang, and I'm working on helping my team adopt it in some test cases.
And it's not just the bounds-checking that's great -- it makes a bunch of C anti-patterns much harder, and it makes you think a lot harder about pointer ownership and usage. Really a great addition to the language, and it's source-compatible with empty macro-definitions (with two exceptions).
> It's in the public clang
I think you’re thinking of something else
Interesting! How do you get started?
Yeah, fat pointers are definitely a viable approach, but a lot of the existing C code that is the main argument for Fil-C assumes it can put a pointer in a long. (Most of the C code that assumed you could put it in an int has been flushed out by now, but that was a big problem when the Alpha came out.) I'm guessing that the amount of existing C code in Apple's bootloader is minimal, maybe 1000 lines, not the billions of lines you can compile with Fil-C.
Couldn’t one just make long bigger, then, to make it match?
There's a lot of code that makes assumptions about the number of bytes in a long rather than diligently using sizeof ... remember, the whole point here is low quality code.
Maybe so; I haven't tried. Probably a lot less code depends on unsigned long wrapping at 2⁶⁴ than used to depend on unsigned int wrapping at 2¹⁶, and we got past that. But stability standards were lower then. Any code that runs on both 32-bit and 64-bit LP64 systems can't be too dependent on the exact sizeof long, and sizeof long already isn't sizeof int the way it was on 32-bit platforms.
It's going to break stuff one way or another.
You’re off by a few orders of magnitude. I’ll grant you, “what is the bootloader” becomes a very complex question. Even if you scope it to just “what is the code physically etched into the chip as the mask ROM” (secureROM), you’re talking hundreds of thousands of lines. If you’re talking about all the code that runs before the kernel starts executing, you’re talking hundreds of millions.
No, I was only talking about the pre-existing C code that wasn't written for the bootloader, which therefore might have incompatibilities with fat pointers you had to hunt down and fix.
Also I'm really skeptical about your "hundreds of millions" number, even if we're talking about all the code that runs before the kernel starts. How do you figure? The entire Linux kernel doesn't contain a hundred million lines of code, and that includes all the drivers for network cards, SCSI controllers, and multiport serial boards that nobody's made in 30 years, plus ports to Alpha, HP PA-RISC, Loongson, Motorola 68000, and another half-dozen architectures. All of that contains maybe 30 million lines. glibc is half a million. Firefox 140.4.0esr is 33 million. You're saying that the bootloader is six times the size of Firefox?
Are you really suggesting that tens of gigabytes of source code are compiled into the bootloader? That would make the bootloader at least a few hundred megabytes of executable code, probably gigabytes, wouldn't it?
A descendent of this is in clang as -fbounds-safety.
and the author of Fil-C worked on that!
Oh, somehow I missed that connection!
Also this is _de facto_ limited to userspace applications for the mainstream OSes, if my understanding is correct.
Reading the Fil-C website's "InvisiCaps by example" page, I see that "Laundering Integers As Pointers" is disallowed. This essentially disqualifies Fil-C for low-level work, which makes up a substantial part of C programs.
(int2ptr for MMIO/pre-allocated memory is in theory UB, in practice just fine as long as you don't otherwise break aliasing rules (and lifetime rules in C++) - as the compiler will fail to track provenance at least once).
But that isn't really what Fil-C is aimed at - the value is, as you implied, in hardening userspace applications.
It’s not so fundamental of a limitation.
Fil-C already allows memory mapped I/O in the form of mmap.
The only thing missing that is needed for kernel level MMIO is a way to forge a capability. I don’t allow that right now, but that’s mostly a policy decision. It also falls out from the fact that InvisiCaps optimize the lower by having it double as a pointer to the top of the capability. That’s also not fundamental; it’s an implementation choice.
It’s true that InvisiCaps will always disallow int to ptr casts, in the sense that you get a pointer with no capability. You’d want MMIO code to have some intrinsic like `zunsafe_forge_ptr` that clearly calls out what’s happening and then you’d use that wherever you define your memory mapped registers.
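A sketch of how such an intrinsic might be used for register definitions. Note that `zunsafe_forge_ptr` is the hypothetical intrinsic named above, not a real Fil-C API; it's stubbed here as a plain cast so the shape compiles outside Fil-C:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: zunsafe_forge_ptr is the intrinsic proposed above, not a
   real Fil-C API. Stubbed as an identity cast; in Fil-C it would mint a
   capability covering [addr, addr + size). */
static inline void *zunsafe_forge_ptr(uintptr_t addr, size_t size) {
  (void)size; /* a real intrinsic would use this to bound the capability */
  return (void *)addr;
}

/* Assumed register layout and base address, for illustration only. */
typedef struct {
  volatile uint32_t DATA;
  volatile uint32_t STATUS;
} UART_Type;

#define UART ((UART_Type *)zunsafe_forge_ptr(0x40001000u, sizeof(UART_Type)))
```

MMIO code would then read and write `UART->DATA` as usual, with the forged capability confining accesses to that register block.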
Can you "launder" pointers through integers just to do things like drop `const`? It's a very common pattern to have to drop qualifiers like `const` due to crappy APIs: `const foo *a = ...; foo *b = (foo *)(uintptr_t)a;`
Hopefully Pizlo will correct me if I get this wrong, but I don't think Fil-C's pointer tagging enforces constness, which isn't needed for C in any case. This C code compiles with no warnings and outputs "Howlong\n" with GCC 12.2.0-14 -ansi -pedantic -Wall -Wextra:
Somewhat to my surprise, it still compiles successfully with no warnings as C++ (renaming to deconst.cc and compiling with g++). I don't know C++ that well, since I've only been using it for 35 years, which isn't nearly long enough to learn the whole language unless you write a compiler for it.
Same results with Debian clang (and clang++) version 14.0.6 with the same options.
Of course, if you change c[] to *c, it will segfault. But it still compiles successfully without warnings.
Laundering your pointer through an integer is evidently not necessary.
Fil-C capabilities have a read-only bit for this purpose.
Interesting, so you really can enforce constness on references? Does that mean the above code will crash on Fil-C?
Did you compile with -Wcast-qual?
No, that does give a warning.
I'll preface this by saying my experience is with embedded, not kernel, but I can't imagine MMIO is significantly different.
There would still be ways to make it work with a more restricted intrinsic, if you didn't want to open up the ability for full pointer forging. At a high level, you're basically just saying "This struct exists at this constant physical address, and doesn't need initialisation". I could imagine a "#define UART zmmio_ptr(UART_Type, 0x1234)" - which perhaps requires a compile time constant address. Alternatively, it's not uncommon for embedded compilers to have a way to force a variable to a physical address, maybe you'd write something like "UART_Type UART __at(0x1234);". I believe this is technically already possible using sections, it's just a massive pain creating one section per struct for dozens and dozens.
Unfortunately the way existing code does it is pretty much always "#define UART ((UART_Type*)0x1234)". I feel like trying to identify this pattern is probably too risky a heuristic, so source code edits seem required to me.
I'm curious, what's your strategy for integrating the GC with low-level code? I've been thinking about trying to use it for Arduino development. Mostly as a thought experiment for now (as I'm playing with Rust on RP2040).
If it's userlevel code, then it just works.
It's a concurrent GC.
If I wanted to go to kernel, I'd probably get rid of the GC. I've tweeted about what Fil-C would look like without GC. Short version: use-after-free would not trap anymore, but you wouldn't be able to use it to break out of the capability system. Similar to CHERI without its capability GC.
Arduino is kinda both. You have full control over the execution flow, but then you have to actually exercise that full control. The only major wrinkle is hardware interrupts.
One interesting feature is that there might be some synergy there. The GC safepoints can be used to implement cooperative multitasking, with capabilities making it safe.
Check out this document to see how the Fil-C ports of Python and Perl and so on work:
https://github.com/mbrock/filnix/blob/main/ports/analysis.md
This is still within the userspace application realm but it's good to know that Fil-C does have explicit capability-preserving operations (`zxorptr`, `zretagptr`, etc) to do e.g. pointer tagging, and special support for mapping pointers to integer table indices and back (`zptrtable`, etc).
Yes, I think that's reasonable. I imagine you wouldn't have to extend Fil-C very much to sneak some memory-mapped I/O addresses into your program, but maybe having the garbage collector pause the program in the middle of an interrupt handler would have other bad effects. Like, if you were generating a video signal, you'd surely get display glitches.
SoftBound + CETS was one of the old attempts I liked:
https://people.cs.rutgers.edu/~santosh.nagarakatte/softbound...
CCured was another:
https://people.eecs.berkeley.edu/~necula/Papers/ccured_popl0...
Good to see you back! I hope you're doing okay; I know you ran into some real trouble a couple of years ago.
No discussion, but just on the front page last week (31 points) https://news.ycombinator.com/item?id=45655519
Previous discussion:
2025 Safepoints and Fil-C (87 points, 1 month ago, 44 comments) https://news.ycombinator.com/item?id=45258029
2025 Fil's Unbelievable Garbage Collector (603 points, 2 months ago, 281 comments) https://news.ycombinator.com/item?id=45133938
2024 The Fil-C Manifesto: Garbage In, Memory Safety Out (13 points, 17 comments) https://news.ycombinator.com/item?id=39449500
Extraordinary project. I had several questions which I believe I have answered for myself (pizlonator please correct if wrong):
1. How do we prevent loading a bogus lower through misaligned store or load?
Answer: Misaligned pointer load/stores are trapped; this is simply not allowed.
2. How are pointer stores through a pointer implemented (e.g. `*(char **)p = s`) - does the runtime have to check if *p is "flight" or "heap" to know where to store the lower?
Answer: no. Flight (i.e. local) pointers whose address is taken are not literally implemented as two adjacent words; rather the call frame is allocated with the same object layout as a heap object. The flight pointer is its "intval" and its paired "lower" is at the same offset in the "aux" allocation (presumably also allocated as part of the frame?).
3. How are use-after-return errors prevented? Say I store a local pointer in a global variable and then return. Later, I call a new function which overwrites the original frame - can't I get a bogus `lower` this way?
Answer: no. Call frames are allocated by the GC, not the usual C stack. The global reference will keep the call frame alive.
That leads to the following program, which definitely should not work, and yet does. ~Amazing~ Unbelievable:

    #include <stdio.h>

    char *bottles[100];

    __attribute__((noinline))
    void beer(int count) {
      char buf[64];
      sprintf(buf, "%d bottles of beer on the wall", count);
      bottles[count] = buf;
    }

    int main(void) {
      for (int i = 0; i < 100; i++) beer(i);
      for (int i = 99; i >= 0; i--) puts(bottles[i]);
    }
Hmmm ... there's a danger here that people will test their programs compiled with Fil-C and think that they are safe to compile with a "normal" compiler. I would hope for an option to flag any undefined behavior.
Great effort, but I find the whole idea somewhat flawed. If you need speed, you can't use this C implementation, because it's several times slower. If speed isn't important, why not just use a memory-safe language? And if both are important, why not use Rust?
Recompiling existing software written in C with Fil-C also isn't a great idea, since some modifications are likely needed, at least to fix the bugs that Fil-C surfaces. And after those bugs are fixed, why continue using Fil-C?
Most software works flawlessly without modifications on Fil-C, the performance isn't *that* bad, and there are applications where security is more important than performance (for example, military applications).
Somewhat related, safe C++ proposal is not being continued
https://news.ycombinator.com/item?id=45234460
Posted multiple times; x86-only, last time I checked.
Yeah because I’m limiting my test matrix.
There’s nothing about how Fil-C is designed that constrains it to x86_64. It doesn’t strongly rely on x86’s memory model. It doesn’t strongly rely on 64-bit.
I’m focusing on one OS and arch until I have more contributors and so more bandwidth to track bugs across a wider set of platforms.
Hey Filip – while we're talking about memory architecture, have you started looking at ARM's EMTE extension (e.g. https://security.apple.com/blog/memory-integrity-enforcement...)? Could it eventually replace InvisiCaps?
No. MTE is probabilistic. Fil-C is deterministic.
All the more reason to make it portable. I wonder if this can be implemented via LLVM?
It is implemented via LLVM.
TLDR: 4x slowdown in the normal case
> the performance overhead of this approach for most programs makes them run about four times more slowly
> TLDR: 4x slowdown in the normal case
4x slower isn't the normal case. 4x is at the upper end of the overheads you'll see.
That's good to know!
C is immensely powerful, portable and probably as fast as you can go without hand-coding in the architecture-specific assembly. Most of the world's information systems (our cyberstructure) rely directly or indirectly on C. And don't get me wrong, I'm a great enthusiast of the idea of sticking to memory-safe languages like Rust from now on.
The hard truth is that we will live with legacy C code, period. Pizlo's heroic effort bridges the gap, so to speak: it effectively sandboxes userspace C in a way that inherently adds memory safety to legacy code. Only a few corner cases genuinely can't afford any slow-down vis-à-vis unsafe C, and the great majority of code across every industry would benefit far more from the reduced attack surface.
Every time I read C and memory safety, I just think Golang. Especially for user space
Go programs are not fully memory-safe if they use multiple threads, due to possible data races on fat pointers: https://news.ycombinator.com/item?id=44672003