Based on the work of Keenan Crane, this is an implementation of Conformal curvature flow using spin transformations.
- Eigen (header only, version 3.3.7)
- IGL (header only)
- qmake (version 5)
- CUDA (version 9.0 or greater)
- Thrust (shipped with CUDA)
- CUSP (header only)
- Google Test
- Google Benchmark
The following environment variables are required for compilation:
CUDA_PATH : Needs to point to the base directory of your cuda lib and includes CUDA_ARCH : Your device compute capability (e.g. 30, 35, ... 50, 52 ...) HOST_COMPILER : Your local g compiler (compatible with cuda compiles, g 4.8.5)
> git submodule update --init --recursive
> qmake
> make -j
[1] K. Crane, U. Pinkall, and P. Schröder, “Spin transformations of discrete surfaces,” ACM Transactions on Graphics, vol. 30, no. 4, p. 1, Jul. 2011.
[2] K. Crane, U. Pinkall, and P. Schröder, “Robust fairing via conformal curvature flow,” ACM Transactions on Graphics, vol. 32, no. 4, p. 1, Jul. 2013.
- The library contains a host and a device implementation, separated by namspaces.
- The host implementation is header only and was designed to feel like an extension of the popular IGL library (it follows their style guide too.)
- The device implementation requires compiling into a shared library and has been R-pathed into the tests and benchmarks for convenience.
- The project comes with 2 host samples, and one device sample. The first host, and only device sample is a demo implementation of willmore flow. The second host sample named matrix_bake, will export all matrices required by the tests and benchmarks in matrix market format.
- The project comes with a full set of host and device tests and benchmarks.
- Currently all subdirectories dump an executable in their directory.
- The host side follows the same style as IGL using single capital letters for parameter names of common data, and snake case for function names.
- The device side uses a similar style but due to the increased number of parameters in functions, I decided to provide full parameter names.
- Both use view types where ever possible to avoid needless copies, but also to avoid enforcing specific data types on the user.
##NOTE
- Spin positions test is failing for the DEVICE side implementation as it finds an equivalent but different solution to the linear system
- Before testing and benchmarking you must place the matrices and models folders in the test directory
This project was developed as a final year programming assignment, with the goal of reimplementing a sequential algorithm on the GPU, and benchmarking the performance gains. I chose to implement conformal willmore flow using spin tranformations as nature of a geometry processing project lends itself to data oriented design. I have outlined some of the more noteworthy parts of the project below.
Adjacency Matrix Indices
One of the main aspects of the project was to assemble sparse matrices in parallel. One possible approach would be to compute lists of triplets, which could then be sorted by row and column index, before being reduced. The draw back of this approach is of course using two sorts. The two sparse matrices required for this project both share the same sparsity pattern (laplacian), and we know that, all entries in the matrix will only be written to a maximum of twice (except the diagonals), so it makes sense to try and use atomics here as the number of collisions will be very low. To facilitate this, each thread working on the matrix would require an index to write out it's value, and this function provides that.
The goal is to produce the matrix column and row indices, as well as diagonal indices (useful for later), and entry indices for parallel assembly. The first two are essentially the vertex-vertex adjacency lists with added diagonals. The adjacency lists can be found by creating a list of all possible half edges and then using unique to remove duplicates. Once we've obtained them, we can find the locations where our diagonal entries should be inserted by comparing each row index against each column index and writing a one if, it is less. If we follow this with a reduce, we've obtained the amount of rows per column before the diagonal, so to finish we can add an ascending list and the cumulative valence of each row (prior to diagonals being inserted) to these reduced values and voi-la.
row_index: 0 0 0 1 1 2 2 3 3 3
col_index: 1 2 3 0 3 0 3 0 1 2
--------------------------------
compared : 0 0 0 1 0 1 0 1 1 1
--------------------------------
index : 0 1 2 3
reduced : 0 1 1 3
--------------------------------
valence : 3 2 2 3
cumval : 0 3 5 7 10
sequence : 0 1 2 3
(reduced cumval sequence =)
diagonals: 0 5 8 13
Handy optimizations are to pack the comparisson and reduce into one kernel using a thrust transform iterator on the way in, and then pack the final sequence and valence add in to the end using a transform output iterator, leaving us with one single kernel launch.
Given the diagonals, it is simple to produce the row and column indices by copying across the vertex-vertex adjacency lists, but making sure to skip our diagonals. Then in a second pass, the diagonals indices are copied to using an ascending sequence. In practice this can all be done with thrust and permutation iterators.
row_index: 0 0 0 1 1 2 2 3 3 3
col_index: 1 2 3 0 3 0 3 0 1 2
diagonals: 0 5 8 13
--------------------------------------
1st pass : - 0 0 0 1 - 1 2 - 2 3 3 3 -
2nd pass : 0 0 0 0 1 1 1 2 2 2 3 3 3 3
As for the regular entry indices, we use a parallel binary search method on each in each face (same order as the threads in the sparse matrix assembly kernels.) Given an edge A -> B we find the upper bound and lower bounds as the cumulative valence of vertex A, and one past A. We then perform a binary search within these bounds for B. Unfortunately this can yeild unbalanced work-loads for threads in the kernel, however it need only be computed once and executes quick enough for this purpose, especially when combined with the gains in the sparse matrix assembly.
row_index : 0 0 0 1 1 2 2 3 3 3
col_index : 1 2 3 0 3 0 3 0 1 2
cumval : 0 3 5 7 10
------------------------------
f0 : 0 1 3
lower_bound : 0 3 7
upper_bound : 3 5 10
b-search : 0 4 7
Cotangent Laplacian
The cotangent laplacian is a very useful discrete representation of a surfaces mean curvature, used heavily for this algorithm. It will be recomputed at every iteration, and so it was essential that the assembly be quick. Initially I began by launching 6 threads per face, meaning 2 threads per face edge, which directly corresponds to the amount of writes into the matrix required. This allowed for one atomic add per thread, and hence a fairly quick assembly, however the uneven number of threads per block, and the fact that threads working on the same face could be in different warps limited my ability to combat bank conflicts. My original solution was to instead have 8 threads per face, with 2 dummy threads doing no work, but aligning the rest. However I then realised that once the alignment was in place, the entire computation could be performed using only interwarp communcation rather than block-wide. This would mean I could get rid of shared memory and potential back conflicts, and instead use warp shuffling to share data between my threads. At this point it seemed wasteful to have 2 dead threads, so I settled for 4 threads per face, with one dead thread. Each alive thread would write to global memory twice, once in the upper triangular part and once in the lower. The thread communication is fairly simple, each thread loads in a vertex positon (1-per thread), and then using a warp shuffle, we subtract the neighbours vertex value to produce 3 edges. With these edges I first compute the face area, and then the final cotangent laplacian value. Writing these value back to global memory is simple since we have computed the entry indices in the previous step.
if lane < 3:
read vertex_pos
shuffle (0 1 2 -) -> (1 2 0 1)
shuffle down subtract (1-2 2-0 0-1 1-2)
shuffle donw neighbor_edge
inv_area <- 0.5 * rsqrt(cross(edge, neighbor_edge))
result <- dot(edge, neigbor_edge) * inv_area
In the real implementation I don't just kill off the 4th thread, instead I use it as temporary storage to make shuffle downs easier. The main kernel leaves the diagonals untouched, as they are easier computed in a second pass, we need simply reduce each column and write the negation of the sum.
Intrinsic Dirac
The intrinsic dirac operator is the core of the algorithm and luckily has the same sparsity pattern as the cotangent laplacian, so we can reuse the row and column indices, as well as the diagonal and entry indices for computation. The resulting computation is more complex, involving quaternion hammilton products, and is not entirely symmetrical, however the resolution wasn't too bad. Another issue I faced (unresolved) is that we can't take advantage of vectorized writes to global memory when using atomics as they're only implemented for 4 byte primitives. The final diagonal computation was also more challenging to implement, involving more variables than the simple reduce for a cotangent laplacian. To optimize this I first noted that all contributions were coming from adjacent triangles, and so could probably be modelled using a reduce. In particular, the contribution would come from the case where we use the same vertex for both sides of our edge, which after a subtraction will result in zero, so we can therefor assume that all diagonals will be purely real quaternions with a zero imaginary vector. The real part can be computed as the dot product of the two edges stemming from our vertex, plus the regular real component computed from only our change in mean curvature. After all this we can model the diagonals as a reduce by vertex triangle adjacency, where at each adjacency pair we transform the two edges containing our vertex into the resulting diagonal.
Tri <- [v0, v1, v2]
find two edges:
e0 <- v1-v0
e1 <- v2-v0
result <- (rho^2 * area / 9) dot(e0, e1) / (4 * area)
Solving the sparse Linear systems
This project involved solving two sparse linear systems. The first called Similarity transforms, produces a quaternion per vertex that best describes it's conformal deformation, using the intrinsic dirac matrix as a left hand side. The second called Spin positions, produces new vertex positions that best fit the edges calculated using the similarity transforms. This was probably the most challenging and important stage of the project. These two processes are the slowest host functions as well, meaning it was crucial to speed them up. I ended up implementing two methods for both, one using a direct solve via the cuSolverSp and cuSparse API, and the other using an iterative conjugate gradient solver via the CUSP API.
Interestingly the conjugate gradient solve out performs a direct cholesky decomposition drastically for the similarity transforms, and beats the host version for some of the smaller meshes. The direct solve wins for larger meshes in the Spin positions step, where the iterative solve performs poorly.
I ran out of time before I was able to implement the mean curvature calculation and projection on the device side, and would add this next. I would also work on moving duplicate code out of the sparse matrix assembly kernels which would allow the shuffle methods to be used by other functions, such as face area calculations. Finally I would like to have used streams more as there is definitely potential to overlap computation in this project.
name | iterations | real_time | cpu_time | time_unit |
---|---|---|---|---|
HOST_cotangent_laplacian_cube | 182627 | 3852.71 | 3852.37 | ns |
HOST_cotangent_laplacian_spot | 90 | 7.68602e 06 | 7.65249e 06 | ns |
HOST_cotangent_laplacian_bunny | 35 | 2.03471e 07 | 2e 07 | ns |
HOST_cotangent_laplacian_tyra | 8 | 7.18162e 07 | 6.8106e 07 | ns |
HOST_divergent_edges_cube | 471392 | 1477.79 | 1477.58 | ns |
HOST_divergent_edges_spot | 100 | 5.24034e 06 | 5.2395e 06 | ns |
HOST_divergent_edges_bunny | 50 | 1.39971e 07 | 1.39957e 07 | ns |
HOST_divergent_edges_tyra | 12 | 5.04251e 07 | 5.04111e 07 | ns |
HOST_face_area_cube | 4344953 | 164.346 | 164.308 | ns |
HOST_face_area_spot | 2074 | 339835 | 339751 | ns |
HOST_face_area_bunny | 723 | 974836 | 974671 | ns |
HOST_face_area_tyra | 226 | 3.1432e 06 | 3.14241e 06 | ns |
HOST_intrinsic_dirac_cube | 44546 | 15991 | 15988.7 | ns |
HOST_intrinsic_dirac_spot | 8 | 7.53803e 07 | 7.53403e 07 | ns |
HOST_intrinsic_dirac_bunny | 1 | 6.25203e 08 | 6.25151e 08 | ns |
HOST_intrinsic_dirac_tyra | 1 | 1.83794e 09 | 1.83772e 09 | ns |
HOST_mean_curvature_cube | 2044611 | 365.646 | 365.569 | ns |
HOST_mean_curvature_spot | 1228 | 559729 | 559669 | ns |
HOST_mean_curvature_bunny | 623 | 1.14741e 06 | 1.14732e 06 | ns |
HOST_mean_curvature_tyra | 162 | 4.46348e 06 | 4.463e 06 | ns |
HOST_orthonormalize_cube | 912301 | 768.27 | 768.2 | ns |
HOST_orthonormalize_spot | 5326 | 135622 | 135611 | ns |
HOST_orthonormalize_bunny | 2243 | 310662 | 310637 | ns |
HOST_orthonormalize_tyra | 749 | 934909 | 934832 | ns |
HOST_project_basis_cube | 1746431 | 397.31 | 397.274 | ns |
HOST_project_basis_spot | 11059 | 62670 | 62664.8 | ns |
HOST_project_basis_bunny | 4811 | 142988 | 142976 | ns |
HOST_project_basis_tyra | 1644 | 419267 | 419232 | ns |
HOST_quaternion_matrix_cube | 64695 | 10676.5 | 10675.6 | ns |
HOST_quaternion_matrix_spot | 24 | 2.96319e 07 | 2.96293e 07 | ns |
HOST_quaternion_matrix_bunny | 12 | 5.74237e 07 | 5.7419e 07 | ns |
HOST_quaternion_matrix_tyra | 3 | 2.02537e 08 | 2.02519e 08 | ns |
HOST_similarity_xform_cube | 29636 | 23613.9 | 23611.8 | ns |
HOST_similarity_xform_spot | 1 | 6.35867e 08 | 6.35806e 08 | ns |
HOST_similarity_xform_bunny | 1 | 4.63868e 09 | 4.63829e 09 | ns |
HOST_similarity_xform_tyra | 1 | 8.65426e 09 | 8.65345e 09 | ns |
HOST_spin_positions_cube | 38234 | 18642.5 | 18641 | ns |
HOST_spin_positions_spot | 1 | 6.29275e 08 | 6.29218e 08 | ns |
HOST_spin_positions_bunny | 1 | 4.65188e 09 | 4.65144e 09 | ns |
HOST_spin_positions_tyra | 1 | 8.61795e 09 | 8.61715e 09 | ns |
HOST_vertex_mass_cube | 3144441 | 224.885 | 224.867 | ns |
HOST_vertex_mass_spot | 1726 | 412928 | 412894 | ns |
HOST_vertex_mass_bunny | 575 | 1.21601e 06 | 1.21591e 06 | ns |
HOST_vertex_mass_tyra | 180 | 3.92145e 06 | 3.9211e 06 | ns |
HOST_vertex_normals_cube | 398109 | 1786.48 | 1786.32 | ns |
HOST_vertex_normals_spot | 370 | 4.03304e 06 | 2.02852e 06 | ns |
HOST_vertex_normals_bunny | 262 | 5.30947e 06 | 2.79652e 06 | ns |
HOST_vertex_normals_tyra | 118 | 1.05496e 07 | 6.1032e 06 | ns |
HOST_vertex_triangle_adjacency_cube | 209312 | 2846.6 | 2811.03 | ns |
HOST_vertex_triangle_adjacency_spot | 1168 | 696248 | 674157 | ns |
HOST_vertex_triangle_adjacency_bunny | 308 | 2.49694e 06 | 2.46715e 06 | ns |
HOST_vertex_triangle_adjacency_tyra | 86 | 8.45339e 06 | 8.39191e 06 | ns |
HOST_vertex_vertex_adjacency_cube | 372804 | 1905.26 | 1901.45 | ns |
HOST_vertex_vertex_adjacency_spot | 172 | 4.13094e 06 | 4.12159e 06 | ns |
HOST_vertex_vertex_adjacency_bunny | 55 | 1.28761e 07 | 1.28478e 07 | ns |
HOST_vertex_vertex_adjacency_tyra | 16 | 4.38669e 07 | 4.37587e 07 | ns |
name | iterations | real_time | cpu_time | time_unit |
---|---|---|---|---|
HOST_cotangent_laplacian_cube | 271031 | 2576.69 | 2574.4 | ns |
HOST_cotangent_laplacian_spot | 103 | 6.9254e 06 | 6.71974e 06 | ns |
HOST_cotangent_laplacian_bunny | 42 | 1.6869e 07 | 1.62396e 07 | ns |
HOST_cotangent_laplacian_tyra | 12 | 5.43377e 07 | 5.20702e 07 | ns |
HOST_divergent_edges_cube | 261742 | 2713.79 | 2711.13 | ns |
HOST_divergent_edges_spot | 90 | 7.78507e 06 | 7.77556e 06 | ns |
HOST_divergent_edges_bunny | 34 | 2.00418e 07 | 2.00014e 07 | ns |
HOST_divergent_edges_tyra | 7 | 9.34462e 07 | 9.31885e 07 | ns |
HOST_face_area_cube | 6017563 | 113.593 | 113.481 | ns |
HOST_face_area_spot | 2523 | 280010 | 279535 | ns |
HOST_face_area_bunny | 878 | 805238 | 803422 | ns |
HOST_face_area_tyra | 270 | 2.583e 06 | 2.57878e 06 | ns |
HOST_intrinsic_dirac_cube | 47654 | 13423.8 | 13411.1 | ns |
HOST_intrinsic_dirac_spot | 10 | 6.21724e 07 | 6.20564e 07 | ns |
HOST_intrinsic_dirac_bunny | 1 | 5.33196e 08 | 5.32015e 08 | ns |
HOST_intrinsic_dirac_tyra | 1 | 1.6005e 09 | 1.59639e 09 | ns |
HOST_mean_curvature_cube | 3933690 | 178.399 | 178.214 | ns |
HOST_mean_curvature_spot | 1458 | 479014 | 478333 | ns |
HOST_mean_curvature_bunny | 756 | 978688 | 976697 | ns |
HOST_mean_curvature_tyra | 158 | 4.39168e 06 | 4.38173e 06 | ns |
HOST_orthonormalize_cube | 2125151 | 330.327 | 329.96 | ns |
HOST_orthonormalize_spot | 8477 | 83593.6 | 83457.2 | ns |
HOST_orthonormalize_bunny | 3731 | 189577 | 189290 | ns |
HOST_orthonormalize_tyra | 1255 | 565450 | 564393 | ns |
HOST_project_basis_cube | 5539554 | 128.969 | 128.826 | ns |
HOST_project_basis_spot | 18190 | 40110.3 | 40037.7 | ns |
HOST_project_basis_bunny | 8605 | 84039.3 | 83884.8 | ns |
HOST_project_basis_tyra | 2904 | 233260 | 232939 | ns |
HOST_quaternion_matrix_cube | 78228 | 8942.42 | 8933.33 | ns |
HOST_quaternion_matrix_spot | 29 | 2.40185e 07 | 2.39839e 07 | ns |
HOST_quaternion_matrix_bunny | 15 | 4.68857e 07 | 4.68207e 07 | ns |
HOST_quaternion_matrix_tyra | 4 | 1.63473e 08 | 1.63205e 08 | ns |
HOST_similarity_xform_cube | 38645 | 18438.3 | 18414 | ns |
HOST_similarity_xform_spot | 1 | 7.42056e 08 | 7.38969e 08 | ns |
HOST_similarity_xform_bunny | 1 | 4.0992e 09 | 4.08846e 09 | ns |
HOST_similarity_xform_tyra | 1 | 7.66838e 09 | 7.6493e 09 | ns |
HOST_spin_positions_cube | 49689 | 13943.9 | 13925.5 | ns |
HOST_spin_positions_spot | 1 | 5.45467e 08 | 5.44281e 08 | ns |
HOST_spin_positions_bunny | 1 | 4.34899e 09 | 4.33687e 09 | ns |
HOST_spin_positions_tyra | 1 | 7.72303e 09 | 7.69932e 09 | ns |
HOST_vertex_mass_cube | 4696537 | 150.733 | 150.515 | ns |
HOST_vertex_mass_spot | 1984 | 345572 | 344981 | ns |
HOST_vertex_mass_bunny | 730 | 979333 | 977439 | ns |
HOST_vertex_mass_tyra | 222 | 3.39383e 06 | 3.38649e 06 | ns |
HOST_vertex_normals_cube | 603713 | 1159.48 | 1158.22 | ns |
HOST_vertex_normals_spot | 1018 | 5.08585e 06 | 1.03785e 06 | ns |
HOST_vertex_normals_bunny | 415 | 8.53295e 06 | 1.69286e 06 | ns |
HOST_vertex_normals_tyra | 147 | 2.08555e 07 | 5.71142e 06 | ns |
HOST_vertex_triangle_adjacency_cube | 450150 | 1691.17 | 1687.71 | ns |
HOST_vertex_triangle_adjacency_spot | 957 | 672810 | 640440 | ns |
HOST_vertex_triangle_adjacency_bunny | 366 | 1.85957e 06 | 1.8288e 06 | ns |
HOST_vertex_triangle_adjacency_tyra | 102 | 6.66418e 06 | 6.64907e 06 | ns |
HOST_vertex_vertex_adjacency_cube | 454305 | 1478.19 | 1476.08 | ns |
HOST_vertex_vertex_adjacency_spot | 183 | 3.80413e 06 | 3.79766e 06 | ns |
HOST_vertex_vertex_adjacency_bunny | 62 | 1.09962e 07 | 1.09678e 07 | ns |
HOST_vertex_vertex_adjacency_tyra | 19 | 3.6735e 07 | 3.66314e 07 | ns |
name | iterations | real_time | cpu_time | time_unit |
---|---|---|---|---|
DEVICE_adjacency_matrix_indices_cube | 1859 | 341886 | 340272 | ns |
DEVICE_adjacency_matrix_indices_spot | 952 | 815516 | 811983 | ns |
DEVICE_adjacency_matrix_indices_bunny | 648 | 1043000 | 1040070 | ns |
DEVICE_adjacency_matrix_indices_tyra | 303 | 2308040 | 2305360 | ns |
DEVICE_cotangent_laplacian_cube | 2636 | 266425 | 264868 | ns |
DEVICE_cotangent_laplacian_spot | 1822 | 389558 | 387782 | ns |
DEVICE_cotangent_laplacian_bunny | 1180 | 592784 | 590895 | ns |
DEVICE_cotangent_laplacian_tyra | 250 | 2780910 | 2779790 | ns |
DEVICE_divergent_edges_cube | 2755 | 250017 | 248657 | ns |
DEVICE_divergent_edges_spot | 1829 | 388904 | 386151 | ns |
DEVICE_divergent_edges_bunny | 1227 | 573293 | 571326 | ns |
DEVICE_divergent_edges_tyra | 416 | 1661450 | 1661310 | ns |
DEVICE_face_area_cube | 174640 | 4013.78 | 4013.42 | ns |
DEVICE_face_area_spot | 39931 | 18190.5 | 18189 | ns |
DEVICE_face_area_bunny | 10000 | 70404.2 | 70398.4 | ns |
DEVICE_face_area_tyra | 10000 | 228716 | 228697 | ns |
DEVICE_intrinsic_dirac_cube | 11973 | 55211.1 | 55206.5 | ns |
DEVICE_intrinsic_dirac_spot | 605 | 1157520 | 1156930 | ns |
DEVICE_intrinsic_dirac_bunny | 265 | 2778810 | 2767580 | ns |
DEVICE_intrinsic_dirac_tyra | 58 | 11694600 | 11693600 | ns |
DEVICE_quaternion_matrix_cube | 64005 | 10361 | 10360.2 | ns |
DEVICE_quaternion_matrix_spot | 1619 | 432756 | 432711 | ns |
DEVICE_quaternion_matrix_bunny | 825 | 844616 | 844546 | ns |
DEVICE_quaternion_matrix_tyra | 278 | 2.40125e 06 | 2.40105e 06 | ns |
DEVICE_similarity_xform_direct_cube | 410 | 1.65968e 06 | 1.65722e 06 | ns |
DEVICE_similarity_xform_direct_spot | 1 | 3.72199e 09 | 3.72164e 09 | ns |
DEVICE_similarity_xform_direct_bunny | 1 | 1.48061e 10 | 1.48048e 10 | ns |
DEVICE_similarity_xform_direct_tyra | 1 | 9.21995e 10 | 9.2191e 10 | ns |
DEVICE_similarity_xform_iterative_cube | 354 | 1.97964e 06 | 1.97947e 06 | ns |
DEVICE_similarity_xform_iterative_spot | 1 | 8.37923e 08 | 8.35415e 08 | ns |
DEVICE_similarity_xform_iterative_bunny | 1 | 1.60284e 09 | 1.59807e 09 | ns |
DEVICE_similarity_xform_iterative_tyra | 1 | 4.12793e 09 | 4.12634e 09 | ns |
DEVICE_spin_positions_direct_spot | 1 | 3.68252e 09 | 3.68218e 09 | ns |
DEVICE_spin_positions_direct_bunny | 1 | 1.4813e 10 | 1.48117e 10 | ns |
DEVICE_spin_positions_direct_tyra | 1 | 9.26087e 10 | 9.25993e 10 | ns |
DEVICE_spin_positions_iterative_spot | 1 | 3.20449e 09 | 3.19223e 09 | ns |
DEVICE_spin_positions_iterative_bunny | 1 | 2.43237e 10 | 2.42893e 10 | ns |
DEVICE_spin_positions_iterative_tyra | 1 | 4.5125e 10 | 4.5104e 10 | ns |
DEVICE_vertex_mass_cube | 1850 | 382962 | 381300 | ns |
DEVICE_vertex_mass_spot | 774 | 918686 | 914777 | ns |
DEVICE_vertex_mass_bunny | 715 | 985748 | 981242 | ns |
DEVICE_vertex_mass_tyra | 550 | 1.27652e 06 | 1.27083e 06 | ns |
DEVICE_vertex_triangle_adjacency_cube | 7564 | 83150.4 | 83142.5 | ns |
DEVICE_vertex_triangle_adjacency_spot | 975 | 645653 | 644782 | ns |
DEVICE_vertex_triangle_adjacency_bunny | 687 | 951790 | 950354 | ns |
DEVICE_vertex_triangle_adjacency_tyra | 341 | 2.00665e 06 | 2.00445e 06 | ns |
DEVICE_vertex_vertex_adjacency_cube | 1754 | 402834 | 401607 | ns |
DEVICE_vertex_vertex_adjacency_spot | 328 | 2.11945e 06 | 2.11526e 06 | ns |
DEVICE_vertex_vertex_adjacency_bunny | 205 | 3.42353e 06 | 3.41836e 06 | ns |
DEVICE_vertex_vertex_adjacency_tyra | 78 | 8.39362e 06 | 8.38729e 06 | ns |
name | iterations | real_time | cpu_time | time_unit |
---|---|---|---|---|
DEVICE_adjacency_matrix_indices_cube | 7884 | 79085 | 78952.7 | ns |
DEVICE_adjacency_matrix_indices_spot | 3765 | 186365 | 186057 | ns |
DEVICE_adjacency_matrix_indices_bunny | 1981 | 354997 | 354429 | ns |
DEVICE_adjacency_matrix_indices_tyra | 698 | 1.00454e 06 | 1.00289e 06 | ns |
DEVICE_cotangent_laplacian_cube | 19028 | 36727.9 | 36673.4 | ns |
DEVICE_cotangent_laplacian_spot | 7318 | 95927 | 95779 | ns |
DEVICE_cotangent_laplacian_bunny | 3788 | 185237 | 184952 | ns |
DEVICE_cotangent_laplacian_tyra | 639 | 1.09654e 06 | 1.09491e 06 | ns |
DEVICE_divergent_edges_cube | 22758 | 31024.4 | 30975 | ns |
DEVICE_divergent_edges_spot | 5095 | 137633 | 137418 | ns |
DEVICE_divergent_edges_bunny | 2589 | 270829 | 270412 | ns |
DEVICE_divergent_edges_tyra | 714 | 882609 | 881253 | ns |
DEVICE_face_area_cube | 242575 | 2905.48 | 2902.24 | ns |
DEVICE_face_area_spot | 43658 | 17166.9 | 17149 | ns |
DEVICE_face_area_bunny | 10000 | 59671.9 | 59607.6 | ns |
DEVICE_face_area_tyra | 10000 | 167694 | 167522 | ns |
DEVICE_intrinsic_dirac_cube | 12720 | 48998.3 | 48921.5 | ns |
DEVICE_intrinsic_dirac_spot | 1224 | 571206 | 570304 | ns |
DEVICE_intrinsic_dirac_bunny | 546 | 1.28079e 06 | 1.27895e 06 | ns |
DEVICE_intrinsic_dirac_tyra | 115 | 5.41181e 06 | 5.40575e 06 | ns |
DEVICE_quaternion_matrix_cube | 73965 | 8841.74 | 8831.27 | ns |
DEVICE_quaternion_matrix_spot | 2955 | 237677 | 237398 | ns |
DEVICE_quaternion_matrix_bunny | 1530 | 458183 | 457665 | ns |
DEVICE_quaternion_matrix_tyra | 540 | 1.29391e 06 | 1.29254e 06 | ns |
DEVICE_similarity_xform_direct_cube | 585 | 1.18767e 06 | 1.18497e 06 | ns |
DEVICE_similarity_xform_direct_spot | 1 | 8.58772e 08 | 8.57566e 08 | ns |
DEVICE_similarity_xform_direct_bunny | 1 | 2.73471e 09 | 2.7312e 09 | ns |
DEVICE_similarity_xform_direct_tyra | 1 | 7.14028e 09 | 7.13095e 09 | ns |
DEVICE_similarity_xform_iterative_cube | 422 | 1.63953e 06 | 1.63654e 06 | ns |
DEVICE_similarity_xform_iterative_spot | 3 | 2.18979e 08 | 2.18621e 08 | ns |
DEVICE_similarity_xform_iterative_bunny | 1 | 7.19855e 08 | 7.17361e 08 | ns |
DEVICE_similarity_xform_iterative_tyra | 1 | 2.08458e 09 | 2.07955e 09 | ns |
DEVICE_spin_positions_direct_spot | 1 | 8.20441e 08 | 8.1929e 08 | ns |
DEVICE_spin_positions_direct_bunny | 1 | 2.71461e 09 | 2.71119e 09 | ns |
DEVICE_spin_positions_direct_tyra | 1 | 6.88368e 09 | 6.87474e 09 | ns |
DEVICE_spin_positions_iterative_spot | 1 | 1.39873e 09 | 1.39645e 09 | ns |
DEVICE_spin_positions_iterative_bunny | 1 | 1.09265e 10 | 1.08924e 10 | ns |
DEVICE_spin_positions_iterative_tyra | 1 | 2.23644e 10 | 2.23161e 10 | ns |
DEVICE_vertex_mass_cube | 9017 | 72878.7 | 72766.8 | ns |
DEVICE_vertex_mass_spot | 2724 | 256301 | 255283 | ns |
DEVICE_vertex_mass_bunny | 2391 | 294982 | 293866 | ns |
DEVICE_vertex_mass_tyra | 1370 | 512385 | 510719 | ns |
DEVICE_vertex_triangle_adjacency_cube | 9364 | 68021.9 | 67925 | ns |
DEVICE_vertex_triangle_adjacency_spot | 1339 | 470203 | 468639 | ns |
DEVICE_vertex_triangle_adjacency_bunny | 1007 | 647694 | 645818 | ns |
DEVICE_vertex_triangle_adjacency_tyra | 508 | 1.35423e 06 | 1.35101e 06 | ns |
DEVICE_vertex_vertex_adjacency_cube | 4683 | 146918 | 146687 | ns |
DEVICE_vertex_vertex_adjacency_spot | 501 | 1.36479e 06 | 1.36066e 06 | ns |
DEVICE_vertex_vertex_adjacency_bunny | 338 | 2.0472e 06 | 2.04122e 06 | ns |
DEVICE_vertex_vertex_adjacency_tyra | 132 | 4.88977e 06 | 4.88117e 06 | ns |