```bash
# Python 3.6/3.7 needed

# CPU only. No GPU/CUDA needed. (Linux, OS X and Windows)
python3 -m pip install taichi-nightly

# With GPU (CUDA 10.0) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-0

# With GPU (CUDA 10.1) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-1
```
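After installing one of the wheels, a quick stdlib-only way to confirm the package is visible to your interpreter is shown below (an illustrative check, not part of the Taichi docs):

```python
import importlib.util

def taichi_available() -> bool:
    # find_spec() locates the package without importing it, so this works
    # even if import-time initialization would fail (e.g. missing CUDA).
    return importlib.util.find_spec("taichi") is not None

print(taichi_available())
```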
- (SIGGRAPH Asia 2019) High-Performance Computation on Sparse Data Structures [Video] [BibTex]
- by Yuanming Hu, Tzu-Mao Li, Luke Anderson, Jonathan Ragan-Kelley, and Frédo Durand
- (ICLR 2020) Differentiable Programming for Physical Simulation [Video] [BibTex] [Code]
- by Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Frédo Durand
- (Done) Fully implement the LLVM backend to replace the legacy source-to-source C++/CUDA backends (by Dec 2019)
  - The only missing features compared to the old source-to-source backends:
    - Vectorization on CPUs. Since most users who want performance are on GPUs (CUDA), this has low priority.
    - Automatic shared memory utilization. Postponed until Feb/March 2020.
- (Done) Redesign & reimplement the (GPU) memory allocator (by the end of Jan 2020)
- (WIP) Tune the performance of the LLVM backend to match that of the legacy source-to-source backends (hopefully by mid-Feb 2020; current progress: setting up/tuning the final benchmarks)
- (Feb 14, 2020) v0.5.0 released with a new Apple Metal GPU backend for Mac OS X users! (by Ye Kuang [k-ye])
  - Just initialize your program with `ti.init(..., arch=ti.metal)` and run Taichi on your Mac GPUs!
  - A few takeaways if you do want to use the Metal backend:
    - For now, the Metal backend only supports `dense` SNodes and 32-bit data types. It doesn't support `ti.random()` or `print()`.
    - Pre-2015 models may encounter undefined behavior under certain conditions (e.g. read-after-write). According to our tests, the memory order on a single GPU thread can become inconsistent on these models.
    - The `[]` operator in Python is slow in the current implementation. If you need to do a large number of reads, consider dumping all the data to a `numpy` array via `to_numpy()` as a workaround. For writes, consider first generating the data into a `numpy` array, then copying it to the Taichi variables as a whole.
    - Do NOT expect a performance boost yet; we are still profiling and tuning the new backend. (So far we have only seen a big performance improvement on a 2015 13-inch MBP.)
- (Feb 12, 2020) v0.4.6 released.
  - (For compiler developers) An error will be raised when `TAICHI_REPO_DIR` is not a valid path (by Yubin Peng [archibate])
  - Fixed a CUDA backend deadlock bug
  - Added test selectors `ti.require()` and `ti.archs_excluding()` (by Ye Kuang [k-ye])
  - `ti.init(**kwargs)` now takes a parameter `debug=True/False`, which turns on debug mode if true
    - ... or use `TI_DEBUG=1` to turn on debug mode non-intrusively
  - Fixed `ti.profiler_clear`
  - Added `GUI.line(begin, end, color, radius)` and `ti.rgb_to_hex`
  - Renamed `ti.trace` (matrix trace) to `ti.tr`. `ti.trace` is now for logging with `ti.TRACE` level
  - Fixed the return value of `ti test_cpp` (thanks to Ye Kuang [k-ye])
  - Raised the default logging level to `ti.INFO` instead of trace to make the world quieter
  - General performance/compatibility improvements
  - Doc updated
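As a rough illustration of the new `ti.rgb_to_hex` helper, here is a hypothetical pure-Python equivalent (assuming, as in Taichi's GUI, float channels in [0, 1] packed into a `0xRRGGBB` integer; this is a sketch, not the actual implementation):

```python
def rgb_to_hex(rgb):
    # Clamp each float channel to [0, 1], scale to 0..255, and pack
    # the three bytes into a single 0xRRGGBB integer.
    def to_byte(x):
        return max(0, min(255, int(x * 255)))
    r, g, b = (to_byte(c) for c in rgb)
    return (r << 16) | (g << 8) | b

print(hex(rgb_to_hex((1.0, 0.0, 0.0))))  # 0xff0000
```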
- (Feb 6, 2020) v0.4.5 released.
  - `ti.init(arch=..., print_ir=..., default_fp=..., default_ip=...)` now supported. `ti.cfg.xxx` is deprecated
  - Immediate data layout specification supported after `ti.init`. No need to wrap data layout definitions with `@ti.layout` anymore (unless you intend to do so)
  - `ti.is_active`, `ti.deactivate`, `SNode.deactivate_all` supported in the new LLVM x64/CUDA backend. [Example]
  - Experimental Windows non-UTF-8 path fix (by Yubin Peng [archibate])
  - `ti.global_var` (which duplicates `ti.var`) is removed
  - `ti.Matrix.rotation2d(angle)` added
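For reference, `ti.Matrix.rotation2d(angle)` builds the standard 2D rotation matrix; the underlying math can be sketched in plain Python like this (a hypothetical stand-in for illustration, not Taichi's implementation):

```python
import math

def rotation2d(angle):
    # Standard counter-clockwise 2D rotation matrix:
    # [[cos a, -sin a],
    #  [sin a,  cos a]]
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def apply(m, v):
    # Multiply a 2x2 matrix by a 2-vector.
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

# Rotating (1, 0) by 90 degrees gives (0, 1) up to floating-point error.
print(apply(rotation2d(math.pi / 2), [1.0, 0.0]))
```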
- (Feb 5, 2020) v0.4.4 released.
  - For developers: ffi-navigator support [doc] (by masahi)
  - Fixed `f64` precision support of `sin` and `cos` on CUDA backends (by Kenneth Lozes [KLozes])
  - Make `Profiler` print the arch name in its title (by Ye Kuang [k-ye])
  - Tons of invisible contributions by Ye Kuang [k-ye] for the WIP Metal backend
  - `Profiler` now works on CPU devices. To enable, set `ti.cfg.enable_profiler = True`. Call `ti.profiler_print()` to print kernel running times
  - General performance improvements
- (Feb 3, 2020) v0.4.3 released.
  - `GUI.circles` is now 2.4x faster
  - General performance improvements
- (Feb 2, 2020) v0.4.2 released.
  - GUI framerates are now more stable
  - Optimized OffloadedRangeFor with const bounds. Light computation programs such as `mpm88.py` are 30% faster on CUDA due to reduced kernel launches
  - Optimized CPU parallel range-for performance
- (Jan 31, 2020) v0.4.1 released.
  - Fixed an autodiff bug introduced in v0.3.24. Please update if you are using Taichi differentiable programming.
  - Updated `Dockerfile` (by Shenghang Tsai [jackalcooper])
  - `pbf2d.py` visualization performance boosted (by Ye Kuang [k-ye])
  - Fixed `GlobalTemporaryStmt` codegen
- (Jan 30, 2020) v0.4.0 released.
  - Memory allocator redesigned
  - Struct-fors over pure dense data structures are demoted into range-fors, which are faster since no element list generation is needed
  - Python 3.5 support is dropped. Please use Python 3.6 (pip) / 3.7 (pip) / 3.8 (Windows: pip; OS X & Linux: build from source) (by Chujie Zeng [Psycho7])
  - `ti.deactivate` now supported on sparse data structures
  - `GUI.circles` (batched circle drawing) performance improved by 30x
  - Minor bug fixes (by Yubin Peng [archibate], Ye Kuang [k-ye])
  - Doc updated
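To illustrate the struct-for demotion above: with a fully dense layout, every cell is guaranteed active, so iteration reduces to plain nested range loops and no per-element activity list has to be built. A toy sketch of the idea (illustrative only, not Taichi's actual codegen):

```python
def dense_struct_for(shape):
    # For a dense field, the "struct-for" degenerates into a range-for:
    # every index can be enumerated directly, instead of first generating
    # a list of active elements as sparse layouts require.
    n, m = shape
    for i in range(n):
        for j in range(m):
            yield (i, j)

print(sum(1 for _ in dense_struct_for((4, 8))))  # 32
```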