
Fusion wgpu compilation cache #1069

Merged
merged 14 commits into main from refactor/fusion-wgpu
Dec 18, 2023

Conversation

nathanielsimard
Member

@nathanielsimard nathanielsimard commented Dec 14, 2023

Changes

Add a cache step so that the compilation of fused kernels is only executed once; after that, the kernel ID is reused without any overhead.
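The caching step can be sketched roughly as follows. This is a minimal illustration, not the actual burn-wgpu API: `KernelCache`, `CompiledKernel`, and `get_or_compile` are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real burn-wgpu types.
type KernelId = String;

#[derive(Debug, Clone, PartialEq)]
struct CompiledKernel {
    source: String,
}

#[derive(Default)]
struct KernelCache {
    kernels: HashMap<KernelId, CompiledKernel>,
}

impl KernelCache {
    /// Return the cached kernel for `id`, running the compilation
    /// closure only on the first lookup for that ID.
    fn get_or_compile<F>(&mut self, id: KernelId, compile: F) -> &CompiledKernel
    where
        F: FnOnce() -> CompiledKernel,
    {
        self.kernels.entry(id).or_insert_with(compile)
    }
}
```

Subsequent lookups with the same kernel ID skip compilation entirely, which is where the "no overhead" claim comes from.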

Refactor the way we create kernels with fusion into a DSL, so that we can use it to create WGSL shaders easily, not just when using fusion. I migrated the unary kernels to the new approach; following PRs will migrate more kernels and remove a lot of code.
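A minimal sketch of what such a codegen DSL could look like, assuming a simplified `Operator` enum where each variant renders one line of WGSL (names are illustrative, not the real burn-wgpu types):

```rust
// Hypothetical mini-DSL: each operator knows how to render itself as WGSL.
enum Operator {
    Exp { input: String, out: String },
    Add { lhs: String, rhs: String, out: String },
}

impl Operator {
    fn to_wgsl(&self) -> String {
        match self {
            Operator::Exp { input, out } => format!("let {out} = exp({input});"),
            Operator::Add { lhs, rhs, out } => format!("let {out} = {lhs} + {rhs};"),
        }
    }
}

/// Render the body of an element-wise kernel from a list of operators.
fn render_body(ops: &[Operator]) -> String {
    ops.iter()
        .map(|op| op.to_wgsl())
        .collect::<Vec<_>>()
        .join("\n")
}
```

The same operator list can then serve both fusion (built at runtime from a stream of tensor ops) and hand-written kernels such as the migrated unary ones.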

Drop

I removed the clamp max and min functions in Fusion as well as in WGPU, since we need to reduce the operations that can be fused to their basic building blocks. In short, clamp will be expressed with basic operations and fused anyway, so we don't need dedicated kernels; keeping them would actually break the fusion stream, hurting performance as well as adding complexity.
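The reasoning can be illustrated with scalar code: clamp_min and clamp_max are just compositions of the basic min/max element-wise operations, so a fusion stream that handles min and max covers them for free (a sketch with `f32` scalars for clarity; the real operations are element-wise over tensors):

```rust
// clamp_min/clamp_max expressed through basic ops the fuser already knows.
fn clamp_min(x: f32, min: f32) -> f32 {
    // max is a basic element-wise op, so it fuses with its neighbors.
    x.max(min)
}

fn clamp_max(x: f32, max: f32) -> f32 {
    x.min(max)
}

/// Full clamp is just the composition of the two.
fn clamp(x: f32, min: f32, max: f32) -> f32 {
    clamp_min(clamp_max(x, max), min)
}
```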

@nathanielsimard nathanielsimard changed the title Refactor/fusion wgpu Fusion wgpu compilation cache Dec 14, 2023

codecov bot commented Dec 14, 2023

Codecov Report

Attention: 31 lines in your changes are missing coverage. Please review.

Comparison is base (d9f93d3) 85.55% compared to head (0e1a5e6) 85.66%.
Report is 12 commits behind head on main.

Files Patch % Lines
burn-wgpu/src/fusion/elemwise/builder.rs 10.00% 18 Missing ⚠️
burn-wgpu/src/codegen/operator.rs 77.27% 5 Missing ⚠️
burn-wgpu/src/codegen/shader.rs 20.00% 4 Missing ⚠️
burn-wgpu/src/element.rs 66.66% 2 Missing ⚠️
burn-wgpu/src/fusion/cache.rs 91.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1069      +/-   ##
==========================================
+ Coverage   85.55%   85.66%   +0.11%
==========================================
  Files         508      509       +1
  Lines       53910    54126     +216
==========================================
+ Hits        46122    46368     +246
  Misses       7788     7758      -30

☔ View full report in Codecov by Sentry.

Member

@louisfd louisfd left a comment


Awesome, please look at my comments

impl ElemWiseKernelCodegen<CompilationPhase> {
/// Compile the kernel into a [compute shader](ComputeShader).
pub fn compile(self) -> ComputeShader {
let mut inputs = Vec::with_capacity(self.input_bindings.len());

inputs and outputs can be directly self.input_bindings and self.output_bindings
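The suggestion amounts to moving the existing vectors instead of copying their elements into freshly allocated ones. A minimal sketch with simplified, hypothetical types (the real struct has more fields):

```rust
// Simplified stand-in for the codegen struct; bindings are just strings here.
struct ElemWiseKernelCodegen {
    input_bindings: Vec<String>,
    output_bindings: Vec<String>,
}

impl ElemWiseKernelCodegen {
    /// `compile` consumes `self`, so the binding vectors can be moved
    /// out directly rather than rebuilt element by element.
    fn compile(self) -> (Vec<String>, Vec<String>) {
        (self.input_bindings, self.output_bindings)
    }
}
```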

pub enum Visibility {
Read,
ReadWrite,
}

-#[derive(Debug, Clone, PartialEq, Eq, Copy)]
+#[derive(Debug, Clone, Hash, PartialEq, Eq, Copy)]
pub enum Elem {
F32,
#[allow(dead_code)]

check

items: HashSet<String>,
}

pub enum CachedComputeShader {

Cachable

fn source(&self) -> SourceTemplate {
match self {
CachedComputeShader::Cached(_) => {
panic!("NoSource compute shader should only be used by a higher level cache.")

NoSource

inputs,
outputs,
locals,
operators: self.operators.clone(),
scalars_f32: self.scalars_f32,
device: self.device.clone(),
cache: KernelCache::default(),

Is it useless?

@@ -1,21 +0,0 @@
@group(0)

Don't forget to delete safe tanh file

Member

@louisfd louisfd left a comment


Perfect

@louisfd louisfd merged commit b5c49c5 into main Dec 18, 2023
15 checks passed
@louisfd louisfd deleted the refactor/fusion-wgpu branch December 18, 2023 17:16
2 participants