Clojure: Transducers

(clojure.org)

85 points | by tosh 2 days ago

10 comments

  • drob518 1 hour ago
    Transducers work even better with a Clojure library called Injest. It has macros similar to the standard Clojure threading macros except Injest’s macros will recognize when you’re using transducers and automatically compose them correctly. You can even mix and match transducers and non-transducer functions and Injest will do its best to optimize the sequence of operations. And wait, there’s more! Injest has a parallelizing macro that will use transducers with the Clojure reducers library for simple and easy use of all your cores. Get it here: https://github.com/johnmn3/injest

    Note: I’m not the author of Injest, just a satisfied programmer.

  • adityaathalye 1 hour ago
    May I offer a little code riff slicing FizzBuzz using transducers, as one would do in practice, in real code (as in not a screening interview round).

    Demo One: Computation and Output format pulled apart

      (def natural-nums (rest (range)))
    
      (def fizz-buzz-xform
        (comp (map basic-buzz)
              (take 100))) ;; early termination
    
      (transduce fizz-buzz-xform ;; calculate each step
                 conj ;; and use this output method
                 []   ;; to pour output into this data structure
                 natural-nums)
    
      (transduce fizz-buzz-xform ;; calculate each step
                 str ;; and use this output method
                 ""  ;; to catenate output into this string
                 natural-nums) ;; given this input
    
      (defn suffix-comma [s] (str s ","))
    
      (transduce (comp fizz-buzz-xform
                       (map suffix-comma)) ;; calculate each step
                 str ;; and use this output method
                 ""  ;; to catenate output into this string
                 natural-nums) ;; given this input
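
    The snippets above call basic-buzz without defining it. A minimal stand-in, assuming the conventional FizzBuzz rules (the linked post may well define it differently), could look like this:

```clojure
;; Hypothetical stand-in for basic-buzz, which the demo uses but does not define.
;; Assumes the conventional rules: multiples of 15 -> "FizzBuzz",
;; of 3 -> "Fizz", of 5 -> "Buzz", anything else -> the number itself.
(defn basic-buzz [n]
  (cond
    (zero? (mod n 15)) "FizzBuzz"
    (zero? (mod n 3))  "Fizz"
    (zero? (mod n 5))  "Buzz"
    :else              n))

;; Same shape as Demo One, shortened to 5 items to keep the result readable.
(def first-five
  (transduce (comp (map basic-buzz) (take 5))
             conj
             []
             (rest (range))))
;; first-five is [1 2 "Fizz" 4 "Buzz"]
```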
    
    Demos two and three for your further entertainment are here: https://www.evalapply.org/posts/n-ways-to-fizzbuzz-in-clojur...

    (edit: fix formatting, and kill dangling paren)

  • bjoli 2 hours ago
    I made srfi-171 [0], transducers for scheme. If you have any questions about them in general I can probably answer them. My version is pretty similar to the clojure version judging by the talks Rich Hickey gave on them.

    I know a lot of people find them confusing.

    0: https://srfi.schemers.org/srfi-171/srfi-171.html

  • talkingtab 42 minutes ago
    When I first read about transducers I was wowed. For example, if I want to walk all the files on my computer and find the duplicate photos in the whole file system, transducers provide a conveyor belt approach. Whether there are savings in terms of memory or anything else, maybe. But the big win for me was to think about the problem as pipes instead of loops. And then if you could add conditionals and branches it is even easier to think about. At least I find it so.
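
    A rough Clojure sketch of that conveyor belt, under loud assumptions: the file records are hypothetical in-memory maps with made-up :path and :checksum keys, and photo?, photo-xform and group-paths are names invented for this illustration. A real version would walk the disk (for example with file-seq) and hash file contents.

```clojure
(require '[clojure.string :as str])

;; Hypothetical input: in a real program these records would come from
;; walking the disk and hashing each file's bytes.
(def files
  [{:path "/a/cat.jpg"      :checksum "abc"}
   {:path "/a/notes.txt"    :checksum "zzz"}
   {:path "/b/cat-copy.jpg" :checksum "abc"}
   {:path "/b/dog.png"      :checksum "def"}])

(defn photo? [{:keys [path]}]
  (boolean (some #(str/ends-with? path %) [".jpg" ".png"])))

;; The "belt": each stage sees one file at a time.
(def photo-xform
  (comp (filter photo?)
        (map (juxt :checksum :path))))

;; The reducing function at the end of the belt groups paths by checksum.
(defn group-paths
  ([] {})                                   ; init
  ([acc] acc)                               ; completion
  ([acc [checksum path]]                    ; step
   (update acc checksum (fnil conj []) path)))

;; Keep only checksums that were seen more than once.
(def duplicates
  (into {}
        (filter (fn [[_ paths]] (> (count paths) 1)))
        (transduce photo-xform group-paths files)))
;; duplicates is {"abc" ["/a/cat.jpg" "/b/cat-copy.jpg"]}
```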

    I tried to implement transducers in JavaScript using yield and generators and that worked. That was before async/await, but now you can just `await readdir("/")`. I'm unclear as to whether transducers offer significant advantages over async/await?

    [[Note: I have a personal grudge against Java and since Clojure requires Java I just find myself unable to go down that road]]

    • justinhj 4 minutes ago
      You could always try ClojureScript
  • pjmlp 57 minutes ago
    Nowadays you can make use of some transducers ideas via gatherers in Java, however it isn't as straightforward as in plain Clojure.
  • thih9 1 hour ago
    • adityaathalye 1 hour ago
      I'd reckon most of Clojure is from ten years ago. Excellent backward compatibility, you see :) cf. https://hopl4.sigplan.org/details/hopl-4-papers/9/A-History-...
    • whalesalad 1 hour ago
      It's a blessing and a curse that zero innovation has occurred in the Clojure space since 2016. Pretty sure the only big things have been clojure.spec becoming more mainstream and the introduction of deps.edn to supplant lein, although I am still partial to lein.
      • seancorfield 17 minutes ago
        Clojure 1.9: Spec.

        Clojure 1.10: datafy/nav + tap> which has spawned a whole new set of tooling for exploring data.

        Clojure 1.11: portable math (clojure.math, which also works on ClojureScript).

        Clojure 1.12: huge improvements in Java interop.

        And, yes, the new CLI and deps.edn, and tools.build to support "builds as programs".

        • whalesalad 0 minutes ago
          Things have surely happened and the language has improved, but would you consider any of this to be innovative?
  • eduction 1 hour ago
    The key insight behind transducers is that a ton of performance is lost not to bad algorithms or slow interpreters but to copying things around needlessly in memory, specifically through intermediate collections.

    While the mechanics of transducers are interesting the bottom line is they allow you to fuse functions and basic conditional logic together in such a way that you transform a collection exactly once instead of n times, meaning new allocation happens only once. Once you start using them you begin to see intermediate collections everywhere.

    Of course, in any language you can theoretically do everything in one hyperoptimized loop; transducers get you this loop without much of a compromise on keeping your program broken into simple, composable parts where intent is very clear. In fact your code ends up looking nearly identical (especially once you learn about eductions… cough).
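
    For readers who have not met the eductions mentioned above, a small sketch with made-up data: clojure.core/eduction bundles one or more transducers with a source collection, and the steps only run when something reduces or iterates the result.

```clojure
;; eduction pairs transducers with a source collection. No work happens
;; here; the steps run each time the eduction is reduced or iterated.
(def odd-incs
  (eduction (filter odd?) (map inc) (range 10)))

;; One pass, with no intermediate sequence between filter and map.
(def total (reduce + 0 odd-incs))    ; 2 + 4 + 6 + 8 + 10 = 30

;; Reducing it again re-runs the pipeline from scratch.
(def as-vector (into [] odd-incs))   ; [2 4 6 8 10]
```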

    • fud101 1 hour ago
      These sound wild in terms of promise but I never understood them in a practical way.
      • moomin 1 hour ago
        They're not really that interesting. They're "reduce transformers". So, take a reduction operation, turn it into an object, define a way to convert one reduction operation into another and you're basically done. 99% of the time they're basically mapcat.

        The real thing to learn is how to express things in terms of reduce. Once you've understood that, just take a look at e.g. the map and filter transducers and it should be pretty obvious. But it doesn't work until you've grasped the fundamentals.
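
        To make the "reduce transformer" idea concrete, here is a minimal sketch. my-map and my-filter are illustrative names, simplified versions of what clojure.core's one-argument map and filter return, not the library source.

```clojure
;; A reducing function has the shape (fn [result input] -> result).
;; map and filter can both be written as plain reduce calls:
(def incs  (reduce (fn [acc x] (conj acc (inc x))) [] (range 6)))
(def evens (reduce (fn [acc x] (if (even? x) (conj acc x) acc)) [] (range 6)))

;; A transducer factors out the conj: it takes a reducing function rf
;; and returns a new reducing function that wraps it.
(defn my-map [f]
  (fn [rf]
    (fn
      ([] (rf))                                  ; init
      ([result] (rf result))                     ; completion
      ([result input] (rf result (f input))))))  ; step

(defn my-filter [pred]
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input] (if (pred input) (rf result input) result)))))

;; Composed, they transform conj itself; transduce then runs one reduce.
(def even-incs
  (transduce (comp (my-filter even?) (my-map inc)) conj [] (range 6)))
;; even-incs is [1 3 5]
```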

      • eduction 32 minutes ago
        The canonical example is rewriting a non-transducing set of collection transformations like

            (->> posts
                 (map with-user)
                 (filter authorized?)
                 (map with-friends)
                 (into []))
        
        That’s five collections, this is two, using transducers:

            (into []
                  (comp
                    (map with-user)
                    (filter authorized?)
                    (map with-friends))
                  posts)
        
        A transducer is returned by comp, and each item within comp is itself a transducer. You can see how the flow reads in the same order as with the ->> threading macro.

        Here map, for example, is called with one argument, which means it returns a transducer. In the first example it also receives a second argument, the coll posts (threaded in by ->>), so it runs over that and returns a new coll.

        The composed transducer returned by comp is passed to into as the second of three arguments. In three argument form, into applies the transducer to each item in coll, the third argument. In two argument form, as in the first example, it just puts coll into the first argument (also a coll).
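
        A runnable version of the example above, as a sketch only: with-user, authorized?, with-friends and the posts data are hypothetical stand-ins, since the comment does not define them.

```clojure
;; Hypothetical stand-ins for the functions named in the example.
(defn with-user [post] (assoc post :user (str "user-" (:author-id post))))
(defn authorized? [post] (not (:private? post)))
(defn with-friends [post] (assoc post :friends []))

(def posts
  [{:id 1 :author-id 7}
   {:id 2 :author-id 8 :private? true}
   {:id 3 :author-id 9}])

;; One-argument map/filter return transducers; comp chains them
;; so that items flow through them top to bottom.
(def xform
  (comp (map with-user)
        (filter authorized?)
        (map with-friends)))

;; Sequence version: each step returns a new seq.
(def via-seqs
  (->> posts
       (map with-user)
       (filter authorized?)
       (map with-friends)
       (into [])))

;; Three-argument into: one pass over posts, straight into the vector.
(def via-xform (into [] xform posts))

;; Two-argument into just pours one collection into another.
(def poured (into [] posts))
```

        Both routes produce the same vector; only the number of intermediate collections differs.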

        • kccqzy 18 minutes ago
          That does not sound like a good example. The two-argument form of `map` already returns a lazy sequence. Same for `filter`. I thought lazy sequences are already supposed to get rid of the performance problem of materializing the entire collection. So
          • eduction 12 minutes ago
            Lazy sequences reduce the size of intermediate collections but they “chunk” - you get 32 items at a time, multiply that by however many transformations you have and obviously by the size of the items.
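
            A small experiment showing the chunking, as a sketch with throwaway counter names: it counts how often the mapping function runs when only the first element is needed.

```clojure
;; Count how many times the mapping function actually runs.
(def lazy-calls (atom 0))
(def xform-calls (atom 0))

;; Lazy version: asking for the first element of a mapped range
;; realizes a whole chunk of the underlying chunked seq.
(first (map (fn [x] (swap! lazy-calls inc) x) (range 100)))

;; Transducer version: (take 1) stops the reduction after one item.
(transduce (comp (map (fn [x] (swap! xform-calls inc) x))
                 (take 1))
           conj
           []
           (range 100))

;; @lazy-calls is 32, @xform-calls is 1
```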

            There are some additional inefficiencies in terms of context capturing at each lazy transformation point. The problem gets worse outside of a tidy immediate set of transformations like you’ll see in any example.

            This article gives a good overview of the inefficiencies, search on “thunk” for tldr. https://clojure-goes-fast.com/blog/clojures-deadly-sin/ That said, I don’t agree with its near condemnation of the whole lazy pattern. Laziness is quite useful: we can complain about it because we have it, and it would suck if we didn’t.

            • eduction 0 minutes ago
              This, by the way, is why the lead example in the original linked post on clojure.org is very much like mine.
  • instig007 18 minutes ago
    You get this for free in Haskell, and you also save on not having to remember useless terminology for something that has no application on its own outside Foldables anyway.
    • Maxatar 9 minutes ago
      >...you also save on not having to remember useless terminology...

      It may be true in this particular case, but in my admittedly brief experience using Haskell you absolutely end up having to remember a hell of a lot of useless terminology for incredibly trivial things.

  • mannycalavera42 2 hours ago
    transducers and async flow are :chefkiss