Releases · oneapi-src/oneCCL

What's New:

Optimizations on key-value store support to scale up to 3000 nodes
New APIs for Allgather, Broadcast and group API calls
Performance optimizations for Allgather, Allreduce, and Reduce-scatter for scaleup and scaleout
Performance Optimizations for CPU single node
Optimizations to reuse Level Zero events.
Change of the default mechanism for IPC exchange to pidfd

What's New:

Bug fixes

What's New:

Optimizations to limit the memory consumed by oneCCL
Optimizations to limit the number of file descriptors kept open by oneCCL.
Aligned in-place support for the Allgatherv and Reduce-scatter collectives with the behavior of NCCL.
In particular, the Allgatherv collective is in place when:
send_buff == recv_buff + rank_offset, where rank_offset = sum(recv_counts[i]) for all i < rank.
Reduce-scatter is in place when recv_buff == send_buff + rank * recv_count.
When the environment variable CCL_WORKER_AFFINITY is used, oneCCL requires the length of the list to equal the number of workers.
Bug fixes.
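
The in-place conditions and the CCL_WORKER_AFFINITY length rule described above can be sketched as small standalone checks. This is an illustrative sketch only, not oneCCL code: the helper names are hypothetical, the offsets are assumed to be counted in elements rather than bytes, and the affinity value is assumed to be an explicit comma-separated core list.

```cpp
#include <cstddef>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helpers for illustration; none of these are oneCCL APIs.
// Assumption: buffer offsets are counted in elements of type T, not bytes.

// Allgatherv is in place when send_buf == recv_buf + rank_offset,
// where rank_offset = sum(recv_counts[i]) for all i < rank.
// Precondition: rank < recv_counts.size().
template <typename T>
bool allgatherv_in_place(const T* send_buf, const T* recv_buf,
                         const std::vector<std::size_t>& recv_counts,
                         std::size_t rank) {
    const std::size_t rank_offset = std::accumulate(
        recv_counts.begin(),
        recv_counts.begin() + static_cast<std::ptrdiff_t>(rank),
        std::size_t{0});
    return send_buf == recv_buf + rank_offset;
}

// Reduce-scatter is in place when recv_buf == send_buf + rank * recv_count.
template <typename T>
bool reduce_scatter_in_place(const T* send_buf, const T* recv_buf,
                             std::size_t recv_count, std::size_t rank) {
    return recv_buf == send_buf + rank * recv_count;
}

// Counts entries in a comma-separated affinity list such as "3,4,5,6".
// Assumption: the value is an explicit core list; other forms are not handled.
std::size_t count_affinity_entries(const std::string& list) {
    std::size_t n = 0;
    std::stringstream ss(list);
    std::string item;
    while (std::getline(ss, item, ',')) {
        ++n;
    }
    return n;
}

// True when the list length equals the number of workers.
bool affinity_matches_workers(const std::string& list, std::size_t workers) {
    return count_affinity_entries(list) == workers;
}
```

For example, with recv_counts = {2, 3, 5}, rank 1 is in place for Allgatherv only if its send buffer starts two elements into the receive buffer, and an affinity list of "3,4,5,6" is valid only with four workers.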

What's New:

Scaleup performance improvements across all message sizes for Allreduce, Allgather, and Reduce-scatter.
These optimizations also cover the small message sizes that appear in inference applications.
Scaleout performance improvements for Allreduce, Reduce, Allgather, and Reduce-scatter.
Optimized memory usage of oneCCL.
Support for PMIx 4.2.6.
Bug fixes.

Removals

oneCCL 2021.12 removes support for PMIx 4.2.2

This update provides bug fixes to maintain driver compatibility for Intel® Data Center GPU Max Series.

This update addresses stability issues with distributed Training and Inference workloads on Intel® Data Center GPU Max Series.

Added blocking point-to-point communication operations for send and receive.
Performance optimizations for Reduce-scatter.
Improved profiling with Intel® Instrumentation and Tracing Technology (ITT) profiling level.

Improved scaling efficiency of the scaleup algorithms for Reduce-scatter.
Optimized performance of oneCCL scaleup collectives by using the new embedded Data Streaming Accelerator in 4th Generation Intel® Xeon® Scalable Processors (formerly code-named Sapphire Rapids).

• Optimizations across the board including improved scaling efficiency of the Scaleup algorithms for Alltoall and Allgather
• Add collective selection for scaleout algorithm for device (GPU) buffers

• Provides optimized performance for Intel® Data Center GPU Max Series utilizing oneCCL.
• Enables support for Allreduce, Allgather, Reduce, and Alltoall connectivity for GPUs on the same node.


Releases: oneapi-src/oneCCL

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.14

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.13.1

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.13

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.12

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.11.2

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.11.1

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.11

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.10

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.9

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.8