Inference cost at scale with napkin math

(injuly.in)

37 points | by gmays 4 days ago

2 comments

  • BadBadJellyBean 3 minutes ago
    I'd like to see a bit of the running costs inside the napkin math. Power, cooling, maintenance, rent, etc. are probably significant factors as well.
  • smalltorch 1 hour ago
    >This largely depends on whether you own or rent your hardware. At $40,000 per B200, your lifetime cost per user is 40_000/num_users. In the 100% duty cycle case (worst for cost), that's $6k per user. Realistically, serving 300 users per GPU you'll spend a lifetime cost of about $133 per user, plus the datacenter/upkeep bill. If you rent the GPU, the cost is more straightforward. At an hourly rate of $4, your hourly cost per user is 4/num_users. For num_users=300 you get an hourly rate of about $0.013 per user, or $9.36 per month.

    This leads me to believe you can buy a GPU but leave it at a data center?

    Do people do this? I don't understand. Or are you equating the upkeep bill with electricity on premises?
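
For anyone who wants to check the quoted arithmetic, here is a rough sketch in Python. The function names are made up for illustration, and the $4 hourly rate and the 30-day, 24-hours-a-day month are assumptions inferred from the quoted 4/num_users and $9.36 figures, not something the article is confirmed to state. Upkeep (power, cooling, rent) is left out, as in the quote.

```python
# Napkin math from the quoted passage. All names here are hypothetical.

HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, GPU rented around the clock


def owned_cost_per_user(gpu_price_usd: float, num_users: int) -> float:
    """Lifetime hardware cost per user when you buy the GPU (upkeep excluded)."""
    return gpu_price_usd / num_users


def rented_hourly_cost_per_user(hourly_rate_usd: float, num_users: int) -> float:
    """Hourly cost per user when you rent the GPU."""
    return hourly_rate_usd / num_users


owned = owned_cost_per_user(40_000, 300)        # about $133 per user
hourly = rented_hourly_cost_per_user(4, 300)    # about $0.013 per user per hour

# The quote rounds the hourly figure to $0.013 before scaling to a month,
# which gives $9.36; without that rounding the monthly figure is $9.60.
monthly_rounded = round(hourly, 3) * HOURS_PER_MONTH
monthly_exact = hourly * HOURS_PER_MONTH

print(f"owned:  ${owned:.2f} per user (lifetime, hardware only)")
print(f"rented: ${hourly:.4f} per user per hour")
print(f"rented: ${monthly_rounded:.2f} per month (rounded rate), "
      f"${monthly_exact:.2f} per month (unrounded)")
```

The gap between $9.36 and $9.60 comes only from rounding the hourly rate early, so either number is fine for napkin purposes.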