Releases: 88
Frequency: 1 week 3 days
Last Release
Stars: 13.8K
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
Version Date
Stability
Stability is determined by the version string and may be inaccurate.
v0.13.0 Stable
v0.12.0 Stable
github/v0.12.0 Unknown
v0.11.0 Stable
v0.10.0 Stable
v0.9.0 Stable
v0.8.0 Stable
gemma Unknown
v0.7.1 Stable
v0.7.0 Stable
v0.6.1 Stable
v0.6.0 Stable
v0.5.0 Stable
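
The note above says stability is inferred from the version string alone. The page does not state the exact rule, so the following is only a minimal sketch of one heuristic that reproduces the labels in this list: a tag that is a plain `vMAJOR.MINOR.PATCH` string is treated as stable, and anything else is reported as unknown. The function name `classify_stability` and the regular expression are assumptions made for illustration, not the tracker's implementation.

```python
import re

# Assumed rule (not taken from the tracker): an optional "v" followed by
# MAJOR.MINOR.PATCH, with nothing before or after, counts as a stable release.
_PLAIN_VERSION = re.compile(r"v?\d+\.\d+\.\d+")


def classify_stability(tag: str) -> str:
    """Return "Stable" for plain version tags, otherwise "Unknown"."""
    if _PLAIN_VERSION.fullmatch(tag.strip()):
        return "Stable"
    # Prefixed tags ("github/v0.12.0") and named tags ("gemma") fall through.
    return "Unknown"


if __name__ == "__main__":
    for tag in ["v0.13.0", "github/v0.12.0", "gemma", "v0.7.1"]:
        print(tag, classify_stability(tag))
```

A purely string-based rule like this cannot tell whether a release is actually production-ready, which is consistent with the caveat that the label may be inaccurate.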