Пакет: libvkfft-dev (1.2.26 ds1-1)

Връзки за libvkfft-dev

Ресурси за Debian:

Изтегляне на пакет-източник vkfft.

Отговорници:

Външни препратки:

Начална страница [github.com]

Подобни пакети:

Vulkan/CUDA/HIP/OpenCL Fast Fourier Transform library

VkFFT is an efficient GPU-accelerated multidimensional Fast Fourier Transform library for Vulkan/CUDA/HIP/OpenCL projects. VkFFT aims to provide the community with an open-source alternative to Nvidia's cuFFT library while achieving better performance. VkFFT is written in C language and supports Vulkan, CUDA, HIP and OpenCL as backends.

   1D/2D/3D systems

   Forward and inverse directions of FFT

   Support for big FFT dimension sizes. Current limits: C2C or even

C2R/R2C - (2^32, 2^32, 2^32). Odd C2R/R2C - (2^12, 2^32, 2^32). R2R - (2^12, 2^12, 2^12). Depends on the amount of shared memory on the device. (will be increased later).

   Radix-2/3/4/5/7/8/11/13 FFT. Sequences using radix 3, 5, 7, 11 and

13 have comparable performance to that of powers of 2.

   Bluestein's FFT algorithm for all other sequences. Full coverage

of C2C range, single upload (2^12, 2^12, 2^12) for R2C/C2R/R2R. Optimized to have as few memory transfers as possible by using zero padding and merged convolution support of VkFFT

   Single, double and half precision support. Double precision uses

CPU-generated LUT tables. Half precision still does all computations in single and only uses half precision to store data.

   All transformations are performed in-place with no performance

loss. Out-of-place transforms are supported by selecting different input/output buffers.

   No additional transposition uploads. Note: Data can be reshuffled

after the Four Step FFT algorithm with an additional buffer (for big sequences). Doesn't matter for convolutions - they return to the input ordering (saves memory).

   Complex to complex (C2C), real to complex (R2C), complex to real

(C2R) transformations and real to real (R2R) Discrete Cosine Transformations of types I, II, III and IV. R2R, R2C and C2R are optimized to run up to 2x times faster than C2C and take 2x less memory

   1x1, 2x2, 3x3 convolutions with symmetric or nonsymmetric kernel

(no register overutilization)

   Native zero padding to model open systems (up to 2x faster than

simply padding input array with zeros). Can specify the range of sequences filled with zeros and the direction where zero padding is applied (read or write stage)

   WHDCN layout - data is stored in the following order (sorted by

increase in strides): the width, the height, the depth, the coordinate (the number of feature maps), the batch number

   Multiple feature/batch convolutions - one input, multiple kernels

   Multiple input/output/temporary buffer split. Allows using data

split between different memory allocations and mitigates 4GB single allocation limit.

   Works on Nvidia, AMD and Intel GPUs. And Raspberry Pi 4 GPU.

   Works on Windows, Linux and macOS

   VkFFT supports Vulkan, CUDA, HIP and OpenCL as backend to cover

wide range of APIs

   Header-only library with Vulkan interface, which allows appending

VkFFT directly to user's command buffer. Kernels are compiled at run-time

Етикети: Software Development: Библиотеки, Role: Development Library

Изтегляне на libvkfft-dev

Изтегляне за всички налични архитектури
Архитектура	Големина на пакета	Големина след инсталиране	Файлове
all	80,5 кБ	1 566,0 кБ	[списък на файловете]