Note: recent changes in the way Julia represents functions and closures require a redesign of this package, which will be published when the 0.5 IR stabilises a bit more.
ISPC.jl uses the Intel ISPC compiler to compile fragments
of ISPC-C or Julia code to vector code (eg. SSE/AVX on Intel CPUs) at runtime.
It is similar to the standard @simd
macro but supports control statements and a
number of math functions.
Speedups of 2x-8x compared to plain Julia code can be achieved (single core).
ispc
(must be found in $PATH)libtool
or g++
(must be found in $PATH)llvm_dis
for looking at ISPC llvm assembly.Work in progress! The high-level interface is fairly functional already, but bugs and the odd missing math function should be expected. More documentation to come.
See the example notebook for an overview.
The high-level interface translates Julia code that has been annotated with a set of macros into code compiled by ISPC on-the-fly. Example:
using ISPC
@inline function mandel(c_re, c_im, count)
z_re = c_re
z_im = c_im
i = 0
while i < count
if (z_re * z_re + z_im * z_im > 4.0f0)
break
end
new_re = z_re*z_re - z_im*z_im
new_im = 2.0f0 * z_re * z_im
z_re = c_re + new_re
z_im = c_im + new_im
i += 1
end
return i
end
@ispc function mandelbrot_ispc(x0, y0, x1, y1, output, max_iters)
height, width = size(output)
dx = (x1 - x0) / width
dy = (y1 - y0) / height
@kernel(`--target=avx1-i32x8`) do
for i = 1:width
@foreach(1:height) do j
x = x0 + i * dx
y = y0 + j * dy
output[j,i] = mandel(x, y, max_iters)
end
end
end
output
end;
Supported ISPC constructs:
@foreach
@foreach(:active)
(untested)@unmasked
(untested)@coherent
(untested)The high-level interface is meant to be used with numerical kernels only. Not all of Julia's syntax and types are supported:
Arrays can only be indexed with integers, providing either a single index (linear indexing) or one index per dimension (multi-dimensional arrays). Indexing must yield an array element, not a sub-array.
All outer variables ("kernel arguments") must be primitive types or arrays of primitive types.
Simple composite types like UnitRange
are supported inside kernels
(eg. in for
loops) but not as kernel arguments at the moment.
Only functions that have a direct translation to ISPC are supported
inside kernels. User-defined functions may be used if they are declared
@inline
. This restriction may be lifted in the future.
Inner functions, exceptions, task switching, I/O etc. are not supported.
ISPC task-level constructs are not yet supported.
The low-level interface allows you to load and call fragments of ISPC code:
using ISPC
# A basic ISPC kernel:
code = """
export void simple(uniform float vin[], uniform float vout[],
uniform int count) {
foreach (index = 0 ... count) {
float v = vin[index];
if (v < 0.5)
v = v * v;
else
v = sqrt(v);
vout[index] = v;
}
}
"""
# Compile the code and get a function pointer to our kernel:
lib = load_ispc(code, `--target=avx1-i32x8`)
fptr = Libdl.dlsym(lib, "simple")
# Call the kernel:
vin = rand(Float32, 1000);
vout = zeros(Float32, 1000);
ccall(fptr, Void, (Ref{Float32}, Ref{Float32}, UInt64), vin, vout, length(vout))
@kernel
macro creates a lambda function containing the kernel code.code_lowered()
or code_typed()
is run on the main @ispc
function to get
information about closure variables captured by the kernel lambda. These become
kernel arguments.@generated
kernel_call
function.kernel_call
is called for the first time, type inference is run on the
kernel fragment. This gives us types for local variables and inlines functions
that can be inlined.goto
s back into if
and while
statements (ISPC does not support varying goto
s)ispc
, loaded with Libdl
and called with ccall
.11/03/2015
over 1 year ago
86 commits