1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317
|
# Hacking on the Futhark Compiler
The Futhark compiler is a significant body of code with a not entirely
straightforward design. The main source of documentation is the
Haddock comments in the source code itself, including [this general
overview of the compiler
architecture](https://hackage.haskell.org/package/futhark-0.24.3/docs/Futhark.html).
To build the compiler, you need a recent version of
[GHC](http://ghc.haskell.org/), which can be installed via
[ghcup](https://www.haskell.org/ghcup/). Alternatively, if you
[install Nix](https://nixos.org/download.html#download-nix) then you
can run `nix-shell` to get a shell environment in which all necessary
tools are installed.
After that, run `make docs` to generate internal compiler
documentation in HTML format. The last few lines of output will tell
you the name of an `index.html` file which you should then open. Go
to the documentation for the module named `Futhark`, which contains an
introduction to the compiler architecture.
For contributing code, see the [Haskell style guide](STYLE.md).
If you feel that the documentation is incomplete, or something lacks
an explanation, then feel free to [report it as an
issue](https://github.com/diku-dk/futhark/issues). Documentation bugs
are bugs too.
## Building
We include a `Makefile` with the following targets.
* `make build` (or just `make`) builds the compiler.
* `make install` builds the compiler and copies the resulting binaries
to `$HOME/.local/bin`, or `$PREFIX/bin` if the `PREFIX` environment
variable is set.
* `make docs` builds internal compiler documentation. For the user
documentation, see the `docs/` subdirectory.
* `make check` style-checks all code. Requires [GNU
Parallel](https://www.gnu.org/software/parallel/).
* `make check-commit` style-checks all code staged for a commit.
Requires [GNU Parallel](https://www.gnu.org/software/parallel/).
You can also use `cabal` directly if you are familiar with it. In
particular, `cabal run futhark -- args...` is useful for running the
Futhark compiler with the provided args.
### Enabling profiling
Asking GHC to generate profiling information is useful not just for
the obvious purpose of gathering profiling information, but also so
that stack traces become more informative. Run
make configure-profile
to turn on profiling. This setting will be stored in the file
`cabal.project.local` and all future builds will be with profiling
information. Note that the compiler runs significantly slower this
way.
To produce a profiling report when running the compiler, add ` RTS -p`
to the *end* command line.
See also [the chapter on
profiling](https://downloads.haskell.org/ghc/latest/docs/users_guide/profiling.html)
in the GHC User's Guide.
Note that GHCs code generator is sometimes slightly buggy in its
handling of profiled code. If you encounter a compiler crash with an
error message like "PAP object entered", then this is a GHC bug.
### Debugging compiler crashes
By default, Haskell does not produce very good stack traces. If you
compile with `make configure-profile` as mentioned above, you can pass
` RTS -xc` to the Futhark compiler in order to get better stack
traces. You will see that you actually get *multiple* stack traces,
as the Haskell runtime system will print a stack trace for every
signal it receives, and several of these occur early, when the program
is read from disk. Also, the *final* stack trace is often some
diagnostic artifact. Usually the second-to-last stack trace is what
you are looking for.
## Testing
### Only internal compilation
This command tests compilation *without* compiling the generated C
code, which speeds up testing for internal compiler errors:
futhark test -C tests --pass-compiler-option=--library
### Running only a single unit test
cabal run unit -- -p '/reshape . fix . iota 3d/'
The argument to `-p` is the name of the test that fails, as reported
by `cabal test`. You may have to scroll through the output a bit to
find it.
## Debugging Internal Type Errors
The Futhark compiler uses a typed core language, and the type checker
is run after every pass. If a given pass produces a program with
inconsistent typing, the compiler will report an error and
abort. While not every compiler bug will manifest itself as a core
language type error (unfortunately), many will. To write the erroneous
core program to `filename` in case of type error, pass `-vfilename` to
the compiler. This will also enable verbose output, so you can tell
which pass fails. The `-v` option is also useful when the compiler
itself crashes, as you can at least tell where in the pipeline it got
to.
## Checking Generated Code
Hacking on the compiler will often involve inspecting the quality of
the generated code. The recommended way to do this is to use
`futhark c` or `futhark opencl` to compile a Futhark program to an
executable. These backends insert various forms of instrumentation
that can be enabled by passing run-time options to the generated
executable.
- As a first resort, use `-t` option to use the built-in runtime
measurements. A nice trick is to pass `-t /dev/stderr`, while
redirecting standard output to `/dev/null`. This will print the
runtime on the screen, but not the execution result.
- Optionally use `-r` to ask for several runs, e.g. `-r 10`. If
combined with `-t`, this will cause several runtimes to be printed
(one per line).
- Pass `-D` to have the program print information on allocation and
deallocation of memory.
- (`futhark opencl` and `futhark cuda` only) Use the `-D` option to
enable synchronous execution. `clFinish()` or the CUDA equivalent
will be called after most OpenCL operations, and a running log of
kernel invocations will be printed. At the end of execution, the
program prints a table summarising all kernels and their total
runtime and average runtime.
## Using `futhark dev`
For debugging specific compiler passes, the `futhark dev` subcommand
allows you to tailor your own compilation pipeline using command line
options. It is also useful for seeing what the AST looks like after
specific passes.
### `FUTHARK_COMPILER_DEBUGGING` environment variable
You can set the level of debug verbosity via the environment variable
`FUTHARK_COMPILER_DEBUGGING`. It has the following effects:
- `FUTHARK_COMPILER_DEBUGGING=1`:
The frontend prints internal names. (This may affect code
generation in some cases, so turn it off when actually
generating code.)
Tools that talk to server-mode executables will print the messages
sent back and forth on the standard error stream.
- `FUTHARK_COMPILER_DEBUGGING=2`:
All of the effects of `FUTHARK_COMPILER_DEBUGGING=1`.
The frontend prints explicit type annotations.
## Running compiler pipelines
You can run the various compiler passes in whatever order you wish.
There are also various shorthands for running entire standard pipelines:
- `--gpu`: pipeline used for GPU backends (stopping just before adding
memory information).
- `--gpu-mem`: pipeline used for GPU backends, with memory
information. This will show the IR that is passed to ImpGen.
- `--seq`: pipeline used for sequential backends (stopping just before
adding memory information).
- `--seq-mem`: pipeline used for sequential backends, with memory
information. This will show the IR that is passed to ImpGen.
- `--mc`: pipeline used for multicore backends (stopping just before
adding memory information).
- `--mc-mem`: pipeline used for multicore backends, with memory
information. This will show the IR that is passed to ImpGen.
By default, `futhark dev` will print the resulting IR. You can switch to
a different *action* with one of the following options:
- `--compile-imp-seq`: generate sequential ImpCode and print it.
- `--compile-imp-gpu`: generate GPU ImpCode and print it.
- `--compile-imp-multicore`: generate multicore ImpCode and
print it.
You must use the appropriate pipeline as well (e.g. `--gpu-mem` for
`--compile-imp-gpu`).
You can also use e.g. `--backend=c` to run the same code generation
and compilation as `futhark c`. This is useful for experimenting with
other compiler pipelines, but still producing an executable or
library.
## When you are about to have a bad day
When using the `cuda` backend, you can use the `--dump-ptx` runtime
option to dump PTX, a kind of high-level assembly for NVIDIA GPUs,
corresponding to the GPU kernels. This can be used to investigate why
the generated code isn\'t running as fast as you expect (not fun), or
even whether NVIDIAs compiler is miscompiling something (extremely not
fun). With the OpenCL backend, `--dump-opencl-binary` does the same
thing.
On AMD platforms, `--dump-opencl-binary` tends to produce an actual
binary of some kind, and it is pretty tricky to obtain a debugger for it
(they are available and open source, but the documentation and
installation instructions are terrible). Instead, AMDs OpenCL kernel
compiler accepts a `-save-temps=foo` build option, which will make it
write certain intermediate files, prefixed with `foo`. In particular, it
will write an `.s` file that contains what appears to be HSA assembly
(at least when using ROCm). If you find yourself having to do do this,
then you are definitely going to have a bad day, and probably evening
and night as well.
## Minimising programs
Sometimes you have a program that produces the wrong results rather
than crashing the compiler. These are some of the most difficult bugs
to handle. If the result is at least deterministic and you have some
way of compiling the program that does work (either an older version
or a different backend), then the following procedure is useful for
reducing the program as much as possible. Suppose that we are trying
to debug a miscompilation for the `opencl` backend where the `c`
backend works, the failing program is `prog.fut`, and the input data
is `prog.in`. Write the following script `test.sh`:
```
set -x
set -e
futhark c prog.fut -o prog-c
futhark opencl prog.fut -o prog-opencl
cat prog.in | ./prog-c -b > output-c
cat prog.in | ./prog-opencl -b > output-opencl
futhark datacmp output-c output-opencl
```
This compares the results obtained from running the program with the
two compilers. You can now (manually) start removing parts of
`prog.fut` while regularly rerunning `test.sh` to verify that it still
fails. In particular, you can easily remove program return values,
which is not the case if you are comparing against a fixed expected
output. Eventually you will have a hopefully small program that
produces different results with the two compilers, and you can look in
detail at the IR to figure out what goes wrong.
## Graphs of internal data structures
Some passes can prettyprint internal representations in
[GraphViz](https://graphviz.org/) format. For example, to see the
fusion graph (prior to fusion), do
$ futhark dev -e --inline-aggr -e foo.fut --fusion-graph > foo.dot
and then to render `foo.dot` as `foo.dot.pdf` with GraphViz:
$ dot foo.dot -Tpdf -O
## Using Oclgrind
[Oclgrind](https://github.com/jrprice/oclgrind) is an OpenCL simulator
similar to Valgrind that can help find memory and synchronisation
errors. It runs code somewhat slowly, but it allows testing of OpenCL
code on systems that are not otherwise capable of executing OpenCL.
It is very easy to run a program in Oclgrind:
oclgrind ./foo
For use in `futhark test`, we have [a wrapper
script](tools/oclgrindgrunner.sh) that returns with a nonzero exit
code if Oclgrind detects a memory error. You use it as follows:
futhark test foo.fut --backend=opencl --runner=tools/oclgrindrunner.sh
Some versions of Oclgrind have an unfortunate habit of [generating
code they don't know how to
execute](https://github.com/jrprice/Oclgrind/issues/204). To work
around this, disable optimisations in the OpenCL compiler:
futhark test foo.fut --backend=opencl --runner=tools/oclgrindrunner.sh --pass-option=--build-option=-O0
## Using `futhark script`
The `futhark script` command is a handy way to run (server-mode)
executables with arbitrary input, while also seeing logging output in
real time. This is particularly useful for programs whose benchmarking
input are complicated FutharkScript expressions.
If you have a program `infinite.fut` containing
```Futhark
entry main n = iterate 1000000000 (map ( 1)) (iota n)
```
then you can run
```
$ futhark script -D infinite.fut 'main 10i64'
```
to run it with debug prints. You can also use `-L` instead of `-D` to
just enable logging. The `main 10i64` can be an arbitrary FutharkScript
expression.
The above will compile `infinite.fut` using the `c` backend before
running it. Pass a `--backend` option to `futhark script` to use a
different backend, or pass an already compiled program instead of a
`.fut` file (e.g., `infinite`).
See the manpages for `futhark script` and `futhark literate` for more
information.
|