NVIDIA Inference Breakthrough Makes Conversational AI Smarter, More Interactive From Cloud to Edge
NVIDIA today launched TensorRT™ 8, the eighth generation of the company’s AI inference software, which cuts inference time in half for language queries, enabling developers to build the world’s best-performing search engines, ad recommendations and chatbots and offer them from the cloud to the edge.
TensorRT 8’s optimizations deliver record-setting speed for language applications, running BERT-Large, one of the world’s most widely used transformer-based models, in 1.2 milliseconds. In the past, companies had to shrink their models to keep inference fast enough, which made them significantly less accurate. Now, with TensorRT 8, companies can double or triple their model size to achieve dramatic improvements in accuracy.