"An introduction to ParallelAccelerator.jl" on the Julia blog

composition.al 2016-03-22

This month, I have a guest post, “An introduction to ParallelAccelerator.jl”, appearing on the Julia blog!

ParallelAccelerator is a Julia package for high-performance, high-level array-style programming that my group at Intel Labs released recently. It provides a macro called @acc that Julia programmers can use to annotate functions that are written in array style.

Under the hood, it’s a compiler that intercepts the usual Julia JIT compilation process and compiles those @acc-annotated functions to fast, parallel native code. This compiler is written entirely in Julia, except for a small runtime component written in C. In my post, I give examples of how to use @acc, show performance results for those examples, and touch on some aspects of the compiler internals.

My post weighs in at nearly 5,000 words, which makes it the longest post that has ever appeared on the Julia blog:

$ wc -w *.md | sort -rn | head
   40513 total
    4827 2016-03-01-parallelaccelerator.md
    3467 2013-05-10-callback.md
    3116 2013-05-23-graphical-user-interfaces-part1.md
    2582 2013-05-23-graphical-user-interfaces-part2.md
    2456 2012-03-11-shelling-out-sucks.md
    2350 2016-02-01-iteration.md
    2315 2013-04-08-put-this-in-your-pipe.md
    2267 2015-10-21-biojulia-sequence-analysis.md
    2154 2013-09-04-fast-numeric.md

I was worried about that, but somehow, the Julia team was okay with it! If it’s too long for you, though, here’s an impressively concise summary from @JuliaFeeds:

The new @acc macro modifies Julia's Abstract Syntax Tree to speed up implicitly parallel code in Base library func's https://t.co/5605QTmd0K
— Julia Feeds (@JuliaFeeds) March 2, 2016

Thanks to the Julia team, my colleagues at Intel Labs, and other readers for making many suggestions that helped improve the post (and stopped it from getting even longer)!

There’s also an interesting discussion here, in which Jiahao Chen points out that the oft-recommended (and arguably idiomatic) way to write Julia is, in fact, not in array style, but rather in devectorized¹ style, with explicit loops. Because of that, it would make sense to also compare the performance of @acc-annotated programs with devectorized Julia versions, and that’s something that we plan to work on soon.
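To make the two styles concrete, here is a minimal sketch in plain Julia. It is not taken from the guest post: the function names `axpy_array` and `axpy_loops` are made up for illustration, and the dot-broadcast syntax assumes a reasonably recent Julia version. The first function is the kind of array-style code one would annotate with @acc; the second is the explicit-loop version that a fair comparison would also need to measure.

```julia
# Array style: the computation is expressed as whole-array operations,
# with no explicit loop. (Hypothetical example function.)
function axpy_array(a::Float64, x::Vector{Float64}, y::Vector{Float64})
    return a .* x .+ y
end

# Explicit-loop style: the same computation, written element by element.
# (Hypothetical example function.)
function axpy_loops(a::Float64, x::Vector{Float64}, y::Vector{Float64})
    out = similar(x)              # uninitialized array with x's shape and element type
    for i in eachindex(x)
        out[i] = a * x[i] + y[i]
    end
    return out
end

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

# Both styles compute the same result; only the way it is written differs.
println(axpy_array(2.0, x, y) == axpy_loops(2.0, x, y))
```

Neither function depends on ParallelAccelerator, so the sketch runs on its own; it only illustrates the difference in programming idiom and says nothing about relative performance.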

¹ In my post, in order to avoid any confusion with the vectorization that compilers do, I don’t use the words “vectorized” or “devectorized”. As programming idioms, though (as opposed to compiler optimizations), “vectorized” and “devectorized” are what I mean when I say “array-style programming” and “programming with explicit loops”, respectively.