noneabove1182@sh.itjust.works to LocalLLaMA@sh.itjust.works · English · 1 year ago

TensorRT-LLM evaluation of the new H200 GPU achieves 11,819 tokens/s on Llama2-13B (github.com)
The H200 is up to 1.9x faster than the H100. This performance gain is enabled by the H200's larger, faster HBM3e memory. https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform