How NASA Built Artemis II’s Fault-Tolerant Computer

(cacm.acm.org)

68 points | by speckx 10 hours ago

6 comments

  • y1n0 7 minutes ago
    NASA didn't build this; Lockheed Martin and their subcontractors did. Articles and headlines like this make people think that NASA does a lot more than it actually does. This is like a CEO claiming credit for everything a company does.
  • dmk 1 hour ago
    The quote from the CMU guy about modern Agile and DevOps approaches challenging architectural discipline is a nice way of saying most of us have completely forgotten how to build deterministic systems. Time-triggered Ethernet with strict frame scheduling feels like it's from a parallel universe compared to how we ship software now.
    • ramraj07 1 hour ago
      I take the opposite message from that line: out-of-touch teams working on something so over budget, so overdue, and so bureaucratic, with such an insanely poor history of success, and they talk as if they have cured cancer.

      This is the equivalent of AltaVista touting how amazing their custom server racks are when Google just starts up on a rack of naked motherboards and eats their lunch and then the world.

      Let's at least wait till the capsule comes back safely before touting how much better they are than "DevOps" teams running websites, apparently a comparison that's somehow relevant here to stoke egos.

      • danhon 55 minutes ago
        You mean like this?

        "With limited funds, Google founders Larry Page and Sergey Brin initially deployed this system of inexpensive, interconnected PCs to process many thousands of search requests per second from Google users. This hardware system reflected the Google search algorithm itself, which is based on tolerating multiple computer failures and optimizing around them. This production server was one of about thirty such racks in the first Google data center. Even though many of the installed PCs never worked and were difficult to repair, these racks provided Google with its first large-scale computing system and allowed the company to grow quickly and at minimal cost."

        https://blog.codinghorror.com/building-a-computer-the-google...

      • bluegatty 15 minutes ago
        No, space is just hard.

        Everything is bespoke.

        You need 10x cost to get every extra '9' in reliability and manned flight needs a lot of nines.

        People died on the Apollo missions.

        It just costs that much.

        • arduanika 4 minutes ago
          Please, this is Hacker News. Nothing else is hard outside of our generic software jobs, and we could totally solve any other industry in an afternoon.
          • geerlingguy 0 minutes ago
            I mean I can just replace Dropbox with a shell script.
      • HNisCIS 8 minutes ago
        What would you suggest? Vibe coding a React app that runs on a Mac mini to control trajectory? What happens when that Mac mini gets hit with an SEU (single-event upset) or even an SEGR (single-event gate rupture)? Guess everyone just dies?
      • simoncion 53 minutes ago
        > ...they talk as if they have cured cancer.

        I'd chalk that up to the author of the article writing for a relatively nontechnical audience and asking for quotes at that level.

  • jbritton 27 minutes ago
    I wonder how often problems happen that the redundancy solves. Is radiation actually flipping bits, and at what frequency? Can a solar flare cause all the computers to go haywire?
  • object-a 38 minutes ago
    How big of a challenge are hardware faults and radiation for orbital data centers? It seems like you’d eat a lot of capacity if you need 4x redundancy for everything.
    • totetsu 35 minutes ago
      They don't go into it here, but I thought that NASA also used something like 250nm chips in space for radiation resistance. Are there even any radiation-resistant GPUs out there?
      • pclmulqdq 33 minutes ago
        Absolutely not, although the latest fabs with rad-tolerant processors are at ~20 nm. There are FDSOI processes in that generation that I assume can be made radiation-tolerant.
      • linzhangrun 24 minutes ago
        It seems not; radiation resistance primarily relies on using older manufacturing processes, including for military equipment, and then applying a shielded casing or hardware redundancy with error correction similar to ECC.
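
To make the redundancy idea in this sub-thread concrete, here is a minimal sketch of bitwise majority voting across three copies of a value (classic triple modular redundancy). It is a generic illustration only, not the Artemis II or Orion design, and the names `majority_vote` and `flip_bit` are hypothetical helpers invented for this example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Return, for each bit position, the value held by at least two of three copies."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Simulate a single-event upset by inverting one bit of a word."""
    return word ^ (1 << bit)


good = 0b1011_0010

# One corrupted copy is outvoted by the two intact ones.
assert majority_vote(good, flip_bit(good, 5), good) == good

# Two copies hit in different bit positions are still corrected,
# because the vote is taken independently per bit.
assert majority_vote(flip_bit(good, 1), flip_bit(good, 2), good) == good

# Two copies hit in the same bit position defeat the voter.
assert majority_vote(flip_bit(good, 3), flip_bit(good, 3), good) != good
```

The last case is why voting alone is not enough: it masks independent faults, so the capacity cost of extra copies buys protection only as long as the copies do not fail in the same way at the same time.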
  • starkparker 9 hours ago
    Headline needs its how-dectomy reverted to make sense
    • arduanika 6 minutes ago
      (Off-topic:) Great word. Is that the usual word for it? Totally apt, and it should be the standard.