From my personal experience, buffered and unbuffered writers are different enough that I think it’s a bit of a mistake to make them indistinguishable to the type system. An unbuffered writer sends the data out of process immediately. A buffered writer usually doesn’t, so sleeping after a write (or just doing something else and not writing more for a while) will delay the write indefinitely. An unbuffered write does not do this.
This means that plenty of algorithms are correct with unbuffered writers and are incorrect with buffered writers. I’ve been bitten by this and diagnosed bugs caused by this multiple times.
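To make the failure mode concrete, here is a minimal Python sketch of the classic symptom (assuming stdout ends up block-buffered, e.g. piped to a file rather than attached to a terminal):

    import sys
    import time

    # Heartbeat loop: correct with an unbuffered writer, silently broken with
    # a buffered one -- the dots sit in the buffer while the process sleeps.
    for _ in range(10):
        sys.stdout.write(".")
        # sys.stdout.flush()  # needed as soon as the writer is buffered
        time.sleep(1)

The algorithm is unchanged; only the writer's buffering behaviour decides whether it works.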
Meanwhile an unbuffered writer has abysmal performance if you write a byte at a time.
I’d rather see an interface (trait, abstract class, whatever the language calls it) for a generic writer, with appropriate warnings that you probably don’t want to use it unless you take specific action to address its shortcomings, and subtypes for buffered and unbuffered writers.
And there could be a conditional buffering wrapper that temporarily adds buffering to a generic writer and is zero-cost if applied to an already buffered writer. A language with enforced borrowing semantics like Rust could make this very hard to misuse. But even Python could do it decently well, e.g.:
    w: MaybeBufferedByteWriter
    with io.LocalBuffer(w) as bufwriter:
        ...  # do stuff with bufwriter
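For illustration, a minimal sketch of such a wrapper using Python's real io classes (io.LocalBuffer is hypothetical; local_buffer below is my stand-in for it):

    import io
    from contextlib import contextmanager

    @contextmanager
    def local_buffer(w, size=io.DEFAULT_BUFFER_SIZE):
        """Temporarily buffer a raw writer; pass an already-buffered one through."""
        if isinstance(w, io.BufferedIOBase):
            # Already buffered: yield it unchanged -- the "zero-cost" path.
            yield w
        else:
            buf = io.BufferedWriter(w, buffer_size=size)
            try:
                yield buf
            finally:
                buf.flush()
                buf.detach()  # release the raw writer without closing it

Rust's borrow checker could additionally guarantee that w isn't written through directly while the wrapper is alive; Python can only enforce that by convention.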
It's an interesting choice, but every writer now needs to handle:
1) vectored i/o (array of arrays, lots of fun for cache lines; see the gather-write sketch after this list)
2) buffering
3) a splat optimization for compression? (skipped over in this post, but mentioned in an earlier one)
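For reference, item 1 at the syscall level is a gather write: the kernel receives several separate buffers in one call instead of one concatenated copy (POSIX-only sketch; the function name is mine):

    import os

    def write_frames(fd: int, frames: list[bytes]) -> int:
        # writev(2): one syscall for many buffers, no concatenation copy.
        return os.writev(fd, frames)

    # e.g. write_frames(1, [b"header ", b"payload\n"])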
I'm skeptical here, but I guess we'll see whether adding this overhead to all I/O is a win. Devirtualization helps _sometimes_, but once you've got larger systems it's entirely possible you have sync and async I/O in the same optimization space and lose out on optimization opportunities.
In practice, I/O stacks tend to consist of a lot of composition, and in many cases, leak a lot of abstractions. Buffering is one part, corking/backpressure is another (neither of which is handled here, but I might be mistaken). In some cases, you've got meaningful framing on streams that needs to be maintained (or decorated with metadata).
If it works out, I suppose this will be a new I/O paradigm. In fairness, nobody has _really_ solved I/O yet, so maybe a brave new swing is what we need.
I agree with the last point about the lack of composition here.
While it's true that writers need to be aware of buffering to make use of fancy syscalls, implementing that should be an option, but not a requirement.
Naively, this would mean implementing one of two APIs in an interface, which ruins the direct-call performance. So I see why the choice was made, but I still hope for something better.
It's probably not possible with Zig's current capabilities, but I would ideally like to see a solution that:
- Allows implementations to know at comptime what the interface actually implements and optimize for that (is buffering supported? Can you get access to the buffer in place for zero copy?).
- For the generic version (which is in the vtable), choose one of the methods and wrap it (at comptime); a rough sketch of this "decide once" idea follows below.
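Zig's comptime could express this directly; as a rough dynamic-language analogue (all names illustrative), the capability probe can at least run once per type instead of once per call:

    class WriterBase:
        # Probe capabilities when a subclass is defined, not on every write:
        # bind the best available strategy as `write_best` once, up front.
        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            if hasattr(cls, "write_vectored"):
                cls.write_best = cls.write_vectored
            else:
                cls.write_best = cls.write_plain

    class PlainWriter(WriterBase):
        def write_plain(self, data: bytes) -> int:
            return len(data)  # stand-in for a real sink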
There are so many directions to take Zig in (more types? more metaprogramming? closer to the metal?), so it's always interesting to see new developments!
> While it's true that writers need to be aware of buffering to make use of fancy syscalls, implementing that should be an option, but not a requirement.
Buffering is implemented and handled in the vtable struct itself; the writers (implementations of the interface) don't actually have to know or care about it, other than passing through the user-provided buffer when initializing the vtable.
If you don't want buffering, you can pass a zero-length buffer upon creation, and it'll get optimized out. This optimization doesn't require devirtualization because the buffering happens before any virtual function calls.
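A toy Python model of that arrangement (not the actual Zig API) may make it clearer: the buffer lives in the interface struct, and only the drain hook is "virtual":

    class Writer:
        def __init__(self, buffer: bytearray, drain):
            self.buf = buffer    # user-provided; may be zero-length
            self.end = 0
            self.drain = drain   # implementation-specific vtable entry

        def write(self, data: bytes) -> None:
            if self.end + len(data) <= len(self.buf):
                # Fast path, no virtual call. With a zero-length buffer this
                # branch never fires and every write goes straight to drain.
                self.buf[self.end:self.end + len(data)] = data
                self.end += len(data)
            else:
                self.flush()
                self.drain(data)

        def flush(self) -> None:
            if self.end:
                self.drain(bytes(self.buf[:self.end]))
                self.end = 0

With len(buf) == 0 the fast-path branch is trivially dead, which is the sense in which the buffering gets optimized out.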
I wonder if making this change will improve the design of buffering across I/O implementations, since buffering now needs consideration up front rather than being treated as a feature bolted on the side?
It’s a good sacrifice if the redesign, whilst more complicated, avoids an oversimplified abstraction that would end up restricting optimisation opportunities.