Alice is impatient

(brooker.co.za)

36 points | by birdculture 3 hours ago

5 comments

  • trb 1 hour ago
    Considering metrics other than p99 for user impact is unwise. Every user will at some point experience a <1% request. It's not as if half of all users only send requests that fall under your median latency; some of their requests will hit your worst case.

    By focusing on the tail and optimizing worst cases you help users more than by improving your median latency.
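
A rough sketch of the arithmetic behind this point, assuming each request independently has a 1% chance of being slower than p99. Independence is an assumption (real tail latency is often correlated), and `prob_hits_tail` is a hypothetical helper name:

```python
def prob_hits_tail(n_requests: int, tail_fraction: float = 0.01) -> float:
    """Chance that at least one of n independent requests lands in the tail.

    Assumes requests are independent, which real traffic may not be.
    """
    return 1.0 - (1.0 - tail_fraction) ** n_requests


if __name__ == "__main__":
    # Print the probability for a few per-user request counts.
    for n in (1, 10, 100, 500):
        print(n, round(prob_hits_tail(n), 3))
```

Under that assumption, a user who sends 100 requests has roughly a 63% chance of seeing at least one request slower than p99.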

  • rustybolt 1 hour ago
    This article contains very little substance. Show me the math!
    • AgentOrange1234 1 hour ago
      Yes, I found this very hard to follow. I appreciate expressing ideas in math like E_a[X] as much as the next guy, but there is no definition or even description of what the heck E or E_a or Var(X) even mean, so how is anyone supposed to understand the reasoning here? All I get from this is a claim that experienced latency is different from the mean, which sounds important, but I still have no intuition as to why this is. Which is sad, because Brooker's blog is often deeply amazing.
      • NightMKoder 40 minutes ago
        This is standard statistics terminology - E(X) is https://en.wikipedia.org/wiki/Expected_value . E_a is presumably Alice's perceived expected value. Var(X) is https://en.wikipedia.org/wiki/Variance . The law of large numbers says the arithmetic average of observations converges to E(X) with enough samples.

        I'm pretty sure what the author is saying is:

        E(X) := \sum_t(t * P(X = t)) is the definition

        Another important note is P(X^2 = t^2) = P(X = t): latencies are non-negative, so squaring maps each value t to a distinct t^2 and the probabilities carry over unchanged.

        E_a(X) is a bit sloppy, but consider X_a aka Alice's latency "experience" distribution. The argument is:

        P(X_a = t) = t * P(X = t) / \sum_u(u * P(X = u)) - i.e. scale the probability up by t but make it sum to 1.

        Then

        E(X_a) = \sum_t(t * P(X_a = t)) = \sum_t(t * t * P(X = t)) / \sum_u(u * P(X = u))

        aka

        E(X^2) / E(X)

        Then (from wikipedia)

        Var(X) = E(X^2) - (E(X))^2

        And we get

        E(X_a) = (Var(X) + (E(X))^2) / E(X) = E(X) + Var(X) / E(X)
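
The derivation above can be checked exactly on a small discrete distribution. This is only a sketch: the two-point latency distribution and the helper names (`mean`, `variance`, `time_weighted`) are made up for illustration, and `Fraction` is used so the identity holds without floating-point error.

```python
from fractions import Fraction


def mean(dist):
    """E(X) for a discrete distribution given as {value: probability}."""
    return sum(t * p for t, p in dist.items())


def variance(dist):
    """Var(X) = E((X - E(X))^2)."""
    m = mean(dist)
    return sum((t - m) ** 2 * p for t, p in dist.items())


def time_weighted(dist):
    """Scale each probability by t and renormalise: the X_a distribution."""
    m = mean(dist)
    return {t: t * p / m for t, p in dist.items()}


# Hypothetical latencies: 90% of requests take 1 unit, 10% take 10 units.
dist = {Fraction(1): Fraction(9, 10), Fraction(10): Fraction(1, 10)}

experienced = mean(time_weighted(dist))
predicted = mean(dist) + variance(dist) / mean(dist)

if __name__ == "__main__":
    print(float(mean(dist)), float(experienced), float(predicted))
```

For this made-up distribution the plain mean is 1.9, while both `experienced` and `predicted` come out to 109/19, about 5.74.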

  • zaik 1 hour ago
    Is the formula for E_a[X] trivial? I don't see it immediately...
  • perching_aix 1 hour ago
    I've grown to dislike the typical tail measurements completely. What I usually look at these days is what share of unique users experience an "unacceptable experience" over a measurement period instead.

    I find it much more informative and visceral, to the extent that p99 now boggles my mind. Two nines would be dreadful as an availability figure, yet for UX it's treated very differently. My measurements corroborate exactly that: good UX requires the same many-nines reliability as, e.g., data centers, not one or two.

    I wonder if p90 and p99 are to blame for the shoddy services we have, in a way. It's pretty hard to argue for fixing something when it's presented as only going wrong 0.5% or less of the time, after all. Even if at scale that means most of your users are experiencing it weekly.
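
A minimal sketch of the per-user metric described in this comment, next to the request-level share it is contrasted with. The record shape `(user_id, latency_ms)`, the threshold, and the function names are all assumptions made for illustration:

```python
from collections import defaultdict


def slow_request_share(requests, threshold_ms):
    """Fraction of individual requests slower than the threshold."""
    requests = list(requests)
    if not requests:
        return 0.0
    slow = sum(1 for _, latency in requests if latency > threshold_ms)
    return slow / len(requests)


def affected_user_share(requests, threshold_ms):
    """Fraction of unique users with at least one request over the threshold."""
    worst = defaultdict(float)  # worst latency seen per user
    for user, latency in requests:
        worst[user] = max(worst[user], latency)
    if not worst:
        return 0.0
    return sum(1 for w in worst.values() if w > threshold_ms) / len(worst)


# Made-up sample: 4 users, 5 requests each, 2 slow requests on different users.
sample = [(u, 100.0) for u in ("a", "b", "c", "d") for _ in range(5)]
sample[0] = ("a", 2000.0)
sample[5] = ("b", 2000.0)

if __name__ == "__main__":
    print(slow_request_share(sample, 1000.0), affected_user_share(sample, 1000.0))
```

In this made-up sample only 10% of requests are slow, yet 50% of users saw a slow request during the period.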