Stars
llama3 implementation one matrix multiplication at a time
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Run compilers interactively from your web browser and interact with the assembly
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
A tool based on Excalidraw to create stop motion animations and slides.
helper scripts for vivado and vivado_hls build with cmake.
Examples shown as part of the tutorial "Productive parallel programming on FPGA with high-level synthesis".
a cheat-sheet for mathematical notation in code form
FPGA SoC Linux Device Tree Overlay FPGA Manager U-Boot&Linux Kernel&Debian11 Images (for Xilinx:Zynq Ultrascale MPSoC)
Example for ZynqMP-FPGA-XRT(Xilinx RunTime for ZynqMP-FPGA-Linux)
Tool for updating the contents of BlockRAMs found in Xilinx 7 series bitstreams.
Scalable systolic array-based matrix-matrix multiplication implemented in Vivado HLS for Xilinx FPGAs.
A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems"
A collection of out-of-tree LLVM passes for teaching and learning
Intro to Creative Coding workshop with p5.js and Tone.js
A high-level performance analysis tool for FPGA-based accelerators
A step-by-step blog for LLVM (v9.0.0 or v11.0.0) beginners, with detailed documents and comments. It records how I learned LLVM and used it to complete a full project for FPGA High-Level Synthesis.
ehacinom / Saga
Forked from wadetgong/Saga
Saga is a mobile app that lets users team up and compete in local scavenger hunts comprised of challenging riddles, augmented reality games, and geolocation puzzles. It's like Escape Room, but for …