Huge Binaries

(fzakaria.com)

127 points | by todsacerdoti 9 hours ago

11 comments

yjftsjthsd-h 7 hours ago
> I had observed binaries beyond 25GiB, including debug symbols. How is this possible? These companies prefer to statically build their services to speed up startup and simplify deployment. Statically including all code in some of the world’s largest codebases is a recipe for massive binaries.
I am very sympathetic to wanting nice static binaries that can be shipped around as a single artifact[0], but... surely at some point we have to ask if it's worth it? If nothing else, that feels like a little bit of a code smell; surely if your actual executable code doesn't even fit in 2GB it's time to ask if that's really one binary's worth of code or if you're actually staring at like... a dozen applications that deserve to be separate? Or get over it the other way and accept that sometimes the single artifact you ship is a tarball / OCI image / EROFS image for systemd[1] to mount+run / self-extracting archive[2] / ...
[0] Seriously, one of my background projects right now is trying to figure out if it's really that hard to make fat ELF binaries.
[1] https://systemd.io/PORTABLE_SERVICES/
[2] https://justine.lol/ape.html > "PKZIP Executables Make Pretty Good Containers"
[-]
- jmmv 6 hours ago
  This is something that always bothered me while I was working at Google too: we had an amazing compute and storage infrastructure that kept getting crazier and crazier over the years (in terms of performance, scalability and redundancy) but everything in operations felt slow because of the massive size of binaries. Running a command line binary? Slow. Building a binary for deployment? Slow. Deploying a binary? Slow.
  The answer to an ever-increasing size of binaries was always "let's make the infrastructure scale up!" instead of "let's... not do this crazy thing maybe?". By the time I left, there were some new initiatives towards the latter and the feeling that "maybe we should have put limits much earlier" but retrofitting limits into the existing bloat was going to be exceedingly difficult.
  [-]
  - joatmon-snoo 3 hours ago
    There's a lot of tooling built on static binaries:
    - google-wide profiling: the core C++ team can collect data on how much of fleet CPU % is spent in absl::flat_hash_map re-bucketing (you can find papers on this publicly)
    - crashdump telemetry
    - dapper stack trace -> codesearch
    Borg literally had to pin the bash version because letting the bash version float caused bugs. I can't imagine how much harder debugging L7 proxy issues would be if I had to follow a .so rabbit hole.
    I can believe shrinking binary size would solve a lot of problems, and I can imagine ways to solve the .so versioning problem, but for every problem you mention I can name multiple other probable causes (eg was startup time really execvp time, or was it networked deps like FFs).
    [-]
    - Filligree 1 hour ago
      There’s no way my proxy binary actually requires 25GB of code, or even the 3GB it is. Sounds to me like the answer is a tree shaker.
      [-]
      - Sesse__ 1 hour ago
        Google implemented the C++ equivalent of a tree shaker in their build system around 2009.
- jcelerier 1 hour ago
  What's wild to me is not using -gsplit-dwarf to have separate debug info and "normal-sized" binaries
- shevy-java 1 hour ago
  > https://systemd.io/PORTABLE_SERVICES/
  Systemd and portable?
- forrestthewoods 5 hours ago
  If you have 25gb of executables then I don’t think it matters if that’s one binary executable or a hundred. Something has gone horribly horribly wrong.
  I don’t think I’ve ever seen a 4gb binary yet. I have seen instances where a PDB file hit 4gb and that caused problems. Debug symbols getting that large is totally plausible. I’m ok with that at least.
  [-]
  - wolfi1 3 hours ago
    I did, it was a Spring Boot fat jar with a NLP, I had to deploy it to the biggest instance AWS could offer, the costs were enormous
  - throwawaymobule 2 hours ago
    A few ps3 games I've seen had 4GB or more binaries.
    This was a problem because code signing meant it needed to be completely replaced by updates.
    [-]
    - swiftcoder 29 minutes ago
      > A few ps3 games I've seen had 4GB or more binaries.
      Is this because they are embedding assets into the binary? I find it hard to believe anyone was carrying around enough code to fill 4GB in the PS3 era...
10000truths 4 hours ago
Debug symbol size shouldn't be influencing relocation jump distances - debug info has its own ELF section.
Regardless of whether you're FAANG or not, nothing you're running should require an executable with a 2 GB large .text section. If you're bumping into that limit, then your build process likely lacks dead code elimination in the linking step. You should be using LTO for release builds. Even the traditional solution (compile your object files with -ffunction-sections and link with --gc-sections) does a good job of culling dead code at function-level granularity.
[-]
- saagarjha 3 hours ago
  Google Chrome ships as a 500 MB binary on my machine, so if you're embedding a web browser, that's how much you need minimum. Now tack on whatever else your application needs and it's easy to see how you can go past 2 GB if you're not careful. (To be clear, I am not making a moral judgment here, I am just saying it's possible to do. Whether it should happen is a different question.)
  [-]
  - throwawaymobule 1 hour ago
    Do you have some special setup?
    Chromium is in the hundred and something MB range on mine last I looked. Might expand to more on install.
    [-]
    - saagarjha 1 hour ago
      I just checked Google Chrome Framework on my Mac, it was a little over 400 MB. Although now that I think about it it's probably a universal binary so you can cut that in half?
- yablak 3 hours ago
  FAANGs we're deeply involved in designing LTO. See, e.g.,
  https://research.google/pubs/thinlto-scalable-and-incrementa...
  And other refs.
  And yet...
  [-]
  - jeffbee 1 hour ago
    Google also uses identical code folding. It's a pretty silly idea that a shop that big doesn't know about the compiler flags.
yablak 3 hours ago
> We would like to keep our small code-model. What other strategies can we pursue?
Move all the hot BBs near each other, right?
Facebook's solution: https://github.com/llvm/llvm-project/blob/main/bolt%2FREADME...
Google's:
https://lists.llvm.org/pipermail/llvm-dev/2019-September/135...
meisel 57 minutes ago
> Responses to my publication submissions often claimed such problems did not exist
I see this often even in communities of software engineers, where people who are unaware of certain limitations at scale will announce that the research is unnecessary
stncls 6 hours ago
> The simplest solution however is to use -mcmodel=large which changes all the relative CALL instructions to absolute JMP.
Makes sense, but in the assembly output just after, there is not a single JMP instruction. Instead, CALL <immediate> is replaced with putting the address in a 64-bit register, then CALL <register>, which makes even more sense. But why mention the JMP thing then? Is it a mistake or am I missing something? (I know some calls are replaced by JMP, but that's done regardless of -mcmodel=large)
[-]
- dwattttt 2 hours ago
  I would assume loose language, referring to a CALL as a JMP. However of the two reasons given to dislike the large code model, register pressure isn't relevant to that particular snippet.
  It's performing a call, ABIs define registers that are not preserved over calls; writing the destination to one of those won't affect register pressure.
wyldfire 1 hour ago
> What other strategies can we pursue?
You can use thunks/trampolines. lld can make them for some architectures, presumably also for x86_64. Though I don't know why it didn't in your case.
But, like the large code model it can be expensive to add trampolines, both in icache performance and just execution if a trampoline is in a particularly hot path.
[-]
- setheron 7 minutes ago
  In many ways that is what the PLT is also.
  This is what my next post will explore. I ran into some issues with the GOT that I'll have to explore solutions for.
  I'm writing this for myself mostly. The whole idea for code models when you have thunks feels unnecessary.
doubletwoyou 8 hours ago
25 GiB for a single binary sounds horrifying
at some point surely some dynamic linking is warranted
[-]
- nneonneo 7 hours ago
  To be fair, this is with debug symbols. Debug builds of Chrome were in the 5GB range several years ago; no doubt that’s increased since then. I can remember my poor laptop literally running out of RAM during the linking phase due to the sheer size of the object files being linked.
  Why are debug symbols so big? For C++, they’ll include detailed type information for every instantiation of every type everywhere in your program, including the types of every field (recursively), method signatures, etc. etc., along with the types and locations of local variables in every method (updated on every spill and move), line number data, etc. etc. for every specialization of every function. This produces a lot of data even for “moderate”-sized projects.
  Worse: for C++, you don’t win much through dynamic linking because dynamically linking C++ libraries sucks so hard. Templates defined in header files can’t easily be put in shared libraries; ABI variations mean that dynamic libraries generally have to be updated in sync; and duplication across modules is bound to happen (thanks to inlined functions and templates). A single “stuck” or outdated .so might completely break a deployment too, which is a much worse situation than deploying a single binary (either you get a new version or an old one, not a broken service).
  [-]
  - yjftsjthsd-h 7 hours ago
    Can't debug symbols be shipped as separate files?
    [-]
    - bregma 3 hours ago
      The problem is that when a final binary is linked everything goes into it. Then, after the link step, all the debug information gets stripped out into the separate symbols file. That means at some point during the build the target binary file will contain everything. I can not, for example, build clang in debug mode on my work machine because I have only 32 GB of memory and the OOM killer comes out during the final link phase.
      Of course, separate binaries files make no difference at runtime since only the LOAD segments get loaded (by either the kernel or the dynamic loader, depending). The size of a binary on disk has little to do with the size of a binary in memory.
      [-]
      - jcelerier 1 hour ago
        > The problem is that when a final binary is linked everything goes into it
        I don't think that's the case on Linux, when using -gsplit-dwarf the debug info is put in separate files at the object file level, they are never linked into binaries.
    - yablak 3 hours ago
      Yes, but it can be more of a pain keeping track of pairs. In production though, this is what's done. And given a fault, the debug binary can be found in a database and used to gdb the issue given the core. You do have to limit certain online optimizations in order to have useful tracebacks.
      This also requires careful tracking of prod builds and their symbol files... A kind of symbol db.
  - tempay 7 hours ago
    I’ve seen LLVM dependent builds hit well over 30GB. At that point it started breaking several package managers.
  - 01HNNWZ0MV43FF 7 hours ago
    I've hit the same thing in Rust, probably for the same reasons.
    Isn't the simple solution to use detached debug files?
    I think Windows and Linux both support them. That's how phones like Android and iOS get useful crash reports out of small binaries, they just upload the stack trace and some service like Sentry translates that back into source line numbers. (It's easy to do manually too)
    I'm surprised the author didn't mention it first. A 25 GB exe might be 1 GB of code and 24 GB of debug crud.
    [-]
    - nicoburns 7 minutes ago
      > Isn't the simple solution to use detached debug files?
      It should be. But the tooling for this kind of thing (anything to do with executable formats including debug info and also things like linking and cross-compilation) is generally pretty bad.
    - dwattttt 3 hours ago
      > I think Windows and Linux both support them.
      Detached debug files has been the default (only?) option in MS's compiler since at least the 90s.
      I'm not sure at what point it became hip to do that around Linux.
- 0xbadcafebee 7 hours ago
  To be fair, they worked at Google, their engineering decisions are not normal. They might just decide that 25 GiB binaries are worth a 0.25% speedup at start time, potentially resulting in tens of millions of dollars' worth of difference. Nobody should do things the way Google does, but it's interesting to think about.
- flohofwoe 4 hours ago
  The overall size wouldn't get smaller just because it is dynamically linked, on the contrary (because DLLs are a dead code elimination barrier). 25 GB is insane either way, something must have gone horribly wrong very early in the development process (also why, even ship with debug information included, that doesn't make sense in the first place).
shevy-java 1 hour ago
25GB seems excessive, but I keep on having the basic compile toolchain as statically compiled executables. It simply works better when things go awry.
a_t48 7 hours ago
I've seen terrible, terrible binary sizes with Eigen + debug symbols, due to how Eigen lazy evaluation works (I think). Every math expression ends up as a new template instantiation.
[-]
- forrestthewoods 5 hours ago
  Eigen is one of the worst libraries when it comes to both exe size and compile times. <shudder>
gerikson 8 hours ago
The HN de-sensationalize algo for submission titles needs tweaking. Original title is simply "Huge Binaries".
[-]
- acosmism 6 hours ago
  agreed. Binaries is a bit too sensational for my taste. this can be further optimized.
  [-]
  - fuzzfactor 26 minutes ago
    "Files So Big They Might As Well Be Trinaries".
  - binaryturtle 5 hours ago
    "Bins"? :)
    [-]
    - bayindirh 4 hours ago
      01.
      Why not?
      [-]
      - DHRicoF 3 hours ago
        False