Performance Optimization
Dec 20, 2024
Accelerating GPU Analytics Using RAPIDS and Ray
RAPIDS is a suite of open-source GPU-accelerated data science and AI libraries that are well supported for scale-out with distributed engines like Spark and...
4 MIN READ
Dec 05, 2024
Unified Virtual Memory Supercharges pandas with RAPIDS cuDF
cuDF-pandas, introduced in a previous post, is a GPU-accelerated library that speeds up pandas, delivering significant performance improvements of up to 50x...
5 MIN READ
Oct 03, 2024
Event: NVIDIA cuOpt at INFORMS 2024
Join NVIDIA cuOpt engineers at INFORMS 2024 on October 22-23 to learn how to revolutionize accelerated computing.
1 MIN READ
Sep 24, 2024
Accelerating Leaderboard-Topping ASR Models 10x with NVIDIA NeMo
NVIDIA NeMo has consistently developed automatic speech recognition (ASR) models that set the benchmark in the industry, particularly those topping the Hugging...
13 MIN READ
Sep 11, 2024
Constant Time Launch for Straight-Line CUDA Graphs and Other Performance Enhancements
CUDA Graphs are a way to define and batch GPU operations as a graph rather than a sequence of stream launches. A CUDA Graph groups a set of CUDA kernels and...
8 MIN READ
Aug 08, 2024
Improving GPU Performance by Reducing Instruction Cache Misses
GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large number of compute resources, called streaming...
11 MIN READ
Jul 18, 2024
Accelerating Vector Search: NVIDIA cuVS IVF-PQ Part 2, Performance Tuning
In the first part of the series, we presented an overview of the IVF-PQ algorithm and explained how it builds on top of the IVF-Flat algorithm, using the...
14 MIN READ
Jul 18, 2024
Accelerating Vector Search: NVIDIA cuVS IVF-PQ Part 1, Deep Dive
In this post, we continue the series on accelerating vector search using NVIDIA cuVS. Our previous post in the series introduced IVF-Flat, a fast algorithm for...
14 MIN READ
Jul 16, 2024
Building an AI Agent for Supply Chain Optimization with NVIDIA NIM and cuOpt
Enterprises face significant challenges in making supply chain decisions that maximize profits while adapting quickly to dynamic changes. Optimal supply chain...
8 MIN READ
Jul 08, 2024
Deploy Multilingual LLMs with NVIDIA NIM
Multilingual large language models (LLMs) are increasingly important for enterprises operating in today's globalized business landscape. As businesses expand...
9 MIN READ
May 10, 2024
Dynamic Control Flow in CUDA Graphs with Conditional Nodes
CUDA Graphs can provide a significant performance increase, as the driver is able to optimize execution using the complete description of tasks and...
7 MIN READ
Mar 12, 2024
Calculating Video Quality Using NVIDIA GPUs and VMAF-CUDA
Video quality metrics are used to evaluate the fidelity of video content. They provide a consistent quantitative measurement to assess the performance of the...
14 MIN READ
Feb 21, 2024
Limiting CPU Threads for Better Game Performance
Many PC games are designed around an eight-core console with an assumption that their software threading system ‘just works’ on all PCs, especially...
6 MIN READ
Jan 16, 2024
Robust Scene Text Detection and Recognition: Inference Optimization
In this post, we delve deeper into the inference optimization process to improve the performance and efficiency of our machine learning models during the...
9 MIN READ
Jan 16, 2024
Robust Scene Text Detection and Recognition: Implementation
To make scene text detection and recognition work on irregular text or for specific use cases, you must have full control of your model so that you can do...
6 MIN READ
Jan 16, 2024
Robust Scene Text Detection and Recognition: Introduction
Identifying and recognizing text in natural scenes and images has become important for use cases like video caption text recognition, detecting signboards...
8 MIN READ