noneabove1182@sh.itjust.works to LocalLLaMA@sh.itjust.works · English · 1 year ago

TensorRT-LLM evaluation of the new H200 GPU achieves 11,819 tokens/s on Llama2-13B (github.com)
The H200 is up to 1.9x faster than the H100. This performance gain is enabled by the H200's larger, faster HBM3e memory. https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform