zh3 14 hours ago

Seared into my soul is the experience of porting a Linux pipe-based application to Windows, thinking it's all POSIX and, since it's all in memory, the performance would be more or less the same. The performance was hideous, even after we found that having pipes waiting for a connection more or less ground Windows to a halt.

Some years later this got revisited due to needing to use the same thing under C# on Win10, and while it was better, it was still a major embarrassment how big the performance gap was.

  • dataflow 3 hours ago

    > The performance was hideous, even after we found that having pipes waiting for a connection more or less ground windows to a halt.

    When you say the performance was hideous, are you referring to I/O after the pipe is already connected/open, or before? The former would be surprising, but the latter not - opening and closing a ton of pipes is not something you'd expect an OS to be optimized for - and it would be somewhat surprising if your use case requires the latter.

    • zh3 2 hours ago

      Literally just having spare listening sockets, ready for incoming connections (and obv. not busy-waiting on them). Just reducing them to the number actually in use was the biggest speed-up - it was like Windows was busy-waiting internally for new connections (it wasn't a huge number either, something like 8 or 12).

  • SoftTalker 12 hours ago

    Well, POSIX only defines behavior, not performance. Every platform and OS will have its own performance idiosyncrasies.

    • klysm 11 hours ago

      How on earth would POSIX define performance of something like pipes?

      • SoftTalker 11 hours ago

        I was addressing "it's all posix and given it's all in memory the performance will be more or less the same."

        Not claiming that POSIX should or could attempt to address performance.

      • pjmlp 4 hours ago

        By using Big O notation, or deadlines as RTOS APIs do - two possible ways to express performance in a standard.

  • andrewmcwatters 12 hours ago

    Did you find that you needed interprocess communication to fill the gap?

    • spacechild1 5 hours ago

      pipes are a form of interprocess communication :) I guess you meant shared memory?

      • andrewmcwatters 5 hours ago

        Yes. Yeah, you're right. Sockets could also be used, but I guess when I think of IPC, I generally think of shared memory.

johnisgood 14 hours ago

FWIW there are readv() / writev(), splice(), sendfile(), funopen(), and io_buffer() as well.

splice() is great when transferring data between pipes and UNIX sockets with zero-copy, but it is Linux-only.

splice() is the fastest and most efficient way to transfer data through pipes (on Linux), especially for large volumes. It bypasses memory allocations in userspace (as opposed to read(v)/write(v)): there is no extra buffer management logic, no memcpy(), and no iovec traversal.
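
A rough sketch of what that looks like (assuming pipe_rd is the read end of a pipe and sock_fd a connected socket; error handling trimmed):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    /* Move data from the pipe into the socket without it ever
       entering a userspace buffer. */
    ssize_t pipe_to_socket(int pipe_rd, int sock_fd)
    {
        ssize_t total = 0;
        for (;;) {
            ssize_t n = splice(pipe_rd, NULL, sock_fd, NULL,
                               65536, SPLICE_F_MOVE | SPLICE_F_MORE);
            if (n == 0)              /* writer closed the pipe */
                break;
            if (n < 0)               /* real code: handle EINTR/EAGAIN */
                return -1;
            total += n;
        }
        return total;
    }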

Sadly on BSDs, for pipes, readv() / writev() is the most performant way to achieve the same if I am not mistaken. Please correct me if I am wrong.
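
For reference, a vectored write to a pipe would look something like this (plain POSIX, so it should behave the same on the BSDs; fd is assumed to be the pipe's write end):

    #include <string.h>
    #include <sys/uio.h>

    /* Write a header and a payload to the pipe in a single syscall,
       without first copying them into one contiguous buffer. */
    ssize_t write_frame(int fd, const char *hdr,
                        const char *payload, size_t len)
    {
        struct iovec iov[2];
        iov[0].iov_base = (void *)hdr;
        iov[0].iov_len  = strlen(hdr);
        iov[1].iov_base = (void *)payload;
        iov[1].iov_len  = len;
        return writev(fd, iov, 2);   /* real code: handle short writes */
    }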

At any rate, this is a great article.

  • messe 13 hours ago

    > sendfile() is file-to-socket (zero-copy as well), and has very high performance as well, for both Linux and BSDs. It only supports file-to-socket, however, and well, to stay relevant, sendmsg() can't be used with pipes in the general case, it is for UNIX domain sockets, INET sockets, and other socket types.

    On Linux, sendfile supports more than just file to socket, as it's implemented using splice. I've used it for file-to-block-device in the past.
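
    Roughly like this, if I remember right (since Linux 2.6.33 out_fd no longer has to be a socket; error handling omitted):

        #include <sys/sendfile.h>
        #include <sys/stat.h>

        /* Copy a whole regular file to another fd (block device, file,
           socket, ...) without bouncing the data through userspace. */
        int copy_with_sendfile(int in_fd, int out_fd)
        {
            struct stat st;
            if (fstat(in_fd, &st) < 0)
                return -1;

            off_t off = 0;
            while (off < st.st_size) {
                ssize_t n = sendfile(out_fd, in_fd, &off, st.st_size - off);
                if (n <= 0)
                    return -1;       /* real code: retry on EINTR */
            }
            return 0;
        }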

    • johnisgood 12 hours ago

      On BSDs probably not, as they don't have splice, but that is good to know. I wonder if on BSDs it really is readv() and writev() that are the fastest way to achieve the same thing as has been done in the article. Maybe I am missing something. I would like to be corrected.

      • messe 12 hours ago

        AFAIK, neither OpenBSD nor NetBSD has sendfile. On FreeBSD, I think you're correct regarding it being file-to-socket only.

        • zambal 12 hours ago

          Indeed, if I'm not mistaken, Netflix at least used to use (and commit kernel patches to) FreeBSD on its content servers because of its superior sendfile performance.

  • wavesquid 6 hours ago

    > splice() is the fastest and most efficient way to transfer data through pipes (on Linux), especially for large volumes. It bypasses memory allocations in userspace (as opposed to read(v)/write(v)), there is no extra buffer management logic, there is no memcpy() or iovec traversal.

    Proper use of io_uring should finally have it beat or at least matched.
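
    Something along these lines with liburing, if I have the API right - IORING_OP_SPLICE keeps the copy in the kernel but lets you batch submissions (single-shot sketch, error handling omitted):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <liburing.h>

        /* Queue one kernel-side splice from the pipe into the socket and
           wait for it; real use would keep many SQEs in flight. */
        int uring_splice_once(struct io_uring *ring, int pipe_rd, int sock_fd)
        {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            io_uring_prep_splice(sqe, pipe_rd, -1, sock_fd, -1,
                                 65536, SPLICE_F_MOVE);
            io_uring_submit(ring);

            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(ring, &cqe);
            int res = cqe->res;              /* bytes moved, or -errno */
            io_uring_cqe_seen(ring, cqe);
            return res;
        }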

  • tedunangst 9 hours ago

    Shared memory, like shm_open and fd passing, would be even faster and fully portable.
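
    Setting that up is short; a minimal sketch of the creating side (the name "/scratch" is a placeholder, and the fd then goes to the peer over a unix socket with SCM_RIGHTS):

        #include <fcntl.h>
        #include <sys/mman.h>
        #include <unistd.h>

        /* Create a shared region; the returned fd can be passed to the
           other process (SCM_RIGHTS), which mmaps the same pages. */
        void *make_shared_region(size_t size, int *fd_out)
        {
            int fd = shm_open("/scratch", O_CREAT | O_EXCL | O_RDWR, 0600);
            if (fd < 0)
                return NULL;
            shm_unlink("/scratch");          /* keep it alive via the fd only */
            if (ftruncate(fd, size) < 0)
                return NULL;
            void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                return NULL;
            *fd_out = fd;
            return p;
        }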

gigatexal 15 hours ago

This is such a dope article. I love that it comes up from time to time.

lukeh 11 hours ago

Does modern Linux have anything close to Doors? I’ve an embedded application where two processes exchange small amounts of data which are latency sensitive, and I’m wondering if there’s anything better than AF_UNIX.

  • the8472 9 hours ago

    Shared memory provides the lowest latency, but you still need to deal with task wakeup, which is usually done via futexes. Google was working on a FUTEX_SWAP call for Linux which would have allowed direct handover from one task to another; not sure what happened to that.
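
    The wakeup half usually looks something like this (Linux raw syscall, since libc doesn't wrap futex; flag is assumed to live inside the shared mapping):

        #include <linux/futex.h>
        #include <stdatomic.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        /* Consumer: sleep in the kernel until *flag leaves 0. */
        static void wait_for_data(atomic_int *flag)
        {
            while (atomic_load(flag) == 0)
                syscall(SYS_futex, flag, FUTEX_WAIT, 0, NULL, NULL, 0);
        }

        /* Producer: publish the data, then wake one waiter. */
        static void signal_data(atomic_int *flag)
        {
            atomic_store(flag, 1);
            syscall(SYS_futex, flag, FUTEX_WAKE, 1, NULL, NULL, 0);
        }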

    • Galanwe 2 hours ago

      If you really want low latency, then you should be OK to trade power/CPU for it, and you can just spin instead of being woken up.
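
      i.e. the consumer just busy-polls the shared flag (x86 here; burns a core but skips the whole wakeup path):

          #include <immintrin.h>
          #include <stdatomic.h>

          /* Spin until the producer flips the flag; _mm_pause() is only a
             CPU hint, not a sleep, so the wakeup is near-instant. */
          static void spin_until_data(atomic_int *flag)
          {
              while (atomic_load_explicit(flag, memory_order_acquire) == 0)
                  _mm_pause();
          }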

  • mort96 9 hours ago

    Would be helpful to know what your problem is with AF_UNIX at the moment. Is it lacking in features you want? Is it higher latency than you'd want? Is the server/client socket API style not appropriate for your use-case?

    • lukeh 9 hours ago

      Well, it’s probably fine, but it’s an audio application where metering (not audio) is delivered from a control plane process to a UI process. Lower latency is better, but I haven’t measured it.

aeonik 18 hours ago

I feel bad that this doesn't have any comments; the article was really great.

I'd like to use splice more, but the end of the article talked about the security implications and some ABI breakage.

I'm curious to know whether the long-term plan is to keep splice around.

I'd also be curious how hard it would be to patch the default pipe to always use splice for performance improvements.