Inference cost at scale with napkin math

(injuly.in)

37 points | by gmays 4 days ago

2 comments

BadBadJellyBean 3 minutes ago
I'd like to see a bit of the running costs inside the napkin math. Power, cooling, maintenance, rent, etc. are probably significant factors as well.
smalltorch 1 hour ago
>This largely depends on whether you own or rent your hardware. At $40,000 per B200, your lifetime cost per user is 40_000/num_users. In the 100% duty cycle case (worst for cost), that's 6k$ per user. Realistically, serving 300 users per GPU you'll spend a lifetime cost of about $133 per user, plus the datacenter/upkeep bill. If you rent the GPU, the cost is more straightforward. At an hourly rate of $4, your hourly cost per user is 4/num_users. For num_users=300 you get an hourly rate of about $0.013 per user, or $9.36 per month.
This leads me to believe you can buy a GPU but still house it at a data center?
Do people do this? I don't understand. Or are you equating the upkeep bill to electricity for running it on premises?
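For anyone who wants to replay the quoted arithmetic, here is a rough sketch. It takes the rental rate as $4/hour (what the 4/num_users expression implies) and assumes a 720-hour month (30 days), which the quote does not state. The function and constant names are made up for illustration. Under those assumptions the unrounded monthly figure comes out to $9.60; the quoted $9.36 is what you get if you round the hourly cost to $0.013 first.

```python
# Napkin math from the quoted passage. All names here are illustrative.
GPU_PRICE_USD = 40_000      # quoted purchase price per B200
RENT_PER_HOUR_USD = 4       # implied by "4/num_users" in the quote
NUM_USERS = 300             # quoted "realistic" users per GPU
HOURS_PER_MONTH = 720       # assumption: 30 days * 24 hours


def owned_cost_per_user(gpu_price_usd: float, num_users: int) -> float:
    """Lifetime hardware cost per user, excluding datacenter/upkeep."""
    return gpu_price_usd / num_users


def rented_cost_per_user(hourly_rate_usd: float, num_users: int,
                         hours_per_month: int = HOURS_PER_MONTH):
    """Return (hourly, monthly) rental cost per user."""
    hourly = hourly_rate_usd / num_users
    return hourly, hourly * hours_per_month


owned = owned_cost_per_user(GPU_PRICE_USD, NUM_USERS)
hourly, monthly = rented_cost_per_user(RENT_PER_HOUR_USD, NUM_USERS)
monthly_rounded_first = round(hourly, 3) * HOURS_PER_MONTH

print(f"owned, lifetime per user: ${owned:.2f}")
print(f"rented, hourly per user:  ${hourly:.4f}")
print(f"rented, monthly per user: ${monthly:.2f}")
print(f"monthly if hourly is rounded to 3 places first: ${monthly_rounded_first:.2f}")
```

None of this includes power, cooling, or colocation fees, so it only covers the hardware side of the bill.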
- __s 1 hour ago
  You can, people do. https://www.linkedin.com/posts/activity-7409593739138060288-...
  - smalltorch 1 hour ago
    So what's the cost separating them from placing this box on their own premises?
    Network throughput?
    - namibj 58 minutes ago
      Plus power and cooling.