Literature
You should also look at the literature folder of the main repository.
[[TOC]]
Repa is "REgular, shape-polymorphic, Parallel Arrays" and is documented in three research papers (see below).
- "Regular, shape-polymorphic, parallel arrays in Haskell" (ICFP 2010): introduces the Repa system
- "Efficient Parallel Stencil Convolution in Haskell" (Haskell Symposium 2011): extends Repa with stencil operations
- "Guiding Parallel Array Fusion with Indexed Types" (Haskell Symposium 2012): allows the library user to select between several different array representations
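As a minimal taste of what this looks like in code, here is a sketch using the Repa 3 API from the 2012 paper (assuming the repa package; U is the unboxed manifest representation and computeP forces a delayed array in parallel):

```haskell
import Data.Array.Repa as R

-- Square every element of a 1-D unboxed array, evaluated in parallel.
squareAll :: Array U DIM1 Double -> IO (Array U DIM1 Double)
squareAll arr = computeP (R.map (^ (2 :: Int)) arr)

main :: IO ()
main = do
  let xs = fromListUnboxed (Z :. (10 :: Int)) [0 .. 9 :: Double]
  ys <- squareAll xs
  print (toList ys)
```

The representation indices in the types (U for unboxed manifest, D for delayed) are exactly what the third paper is about.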
This article documents push-arrays: http://dl.acm.org/citation.cfm?id=2103740
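To see what the push-array idea amounts to, here is a minimal Haskell encoding (a sketch of the general concept only, not the embedded-GPU version from the cited paper): a push array is a computation that is handed a write callback and pushes its elements through it, which makes operations like append cheap because no intermediate array is built.

```haskell
-- A push array describes how to write its elements, given a write callback.
newtype Push a = Push { run :: (Int -> a -> IO ()) -> IO () }

-- A list viewed as a push array: push each element at its index.
fromList :: [a] -> Push a
fromList xs = Push (\write -> mapM_ (uncurry write) (zip [0 ..] xs))

-- Append without copying: the second array's writes are offset by len1.
append :: Int -> Push a -> Push a -> Push a
append len1 (Push p) (Push q) =
  Push (\write -> p write >> q (\i a -> write (i + len1) a))

-- Demo: print the writes performed by the appended array.
main :: IO ()
main = run (append 2 (fromList [1, 2 :: Int]) (fromList [3]))
           (\i a -> putStrLn (show i ++ " := " ++ show a))
```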
EmbArBB is a thin wrapper around Intel's ArBB that exposes a small DSL, but it still involves a lot of clutter. Not that promising in itself, but ArBB might be worth a look.
HArBB is an ArBB back-end for Accelerate, though it does not support all of Accelerate's features, and general folds are implemented efficiently only for certain operators (e.g. addition, multiplication and xor), not for arbitrary lambda expressions.
- Meta-Par
- hmatrix
Look at sections 4.2-4.4 of the Copperhead tech report for considerations and references on mapping nested data parallelism to CUDA. In this context the tech report cites:
[1] N. Bell and M. Garland. Implementing sparse matrix-vector multiplication on throughput-oriented processors. In SC '09: Proc. Conference on High Performance Computing Networking, Storage and Analysis, pages 1-11. ACM, 2009.
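The standard trick behind such mappings (and behind the CSR sparse matrix-vector kernels in [1]) is to flatten a nested array into a flat value array plus a segment descriptor. A hedged Haskell sketch of a segmented sum over that representation (list-based for clarity, nothing like an actual CUDA kernel):

```haskell
-- The nested array [[1,2,3],[4,5]] is flattened into values [1,2,3,4,5]
-- and segment lengths [3,2]; a segmented reduction yields one result
-- per original subarray.
segmentedSum :: Num a => [Int] -> [a] -> [a]
segmentedSum []       _  = []
segmentedSum (n : ns) xs = sum seg : segmentedSum ns rest
  where (seg, rest) = splitAt n xs

main :: IO ()
main = print (segmentedSum [3, 2] [1 .. 5 :: Int])  -- [6,9]
```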
- "Implementation of a portable nested data-parallel language" (http://dl.acm.org/citation.cfm?id=155343) has a non-nested linefit example in figure 7.
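For reference, the same computation written with only flat data-parallel operations (maps, zips and reductions); a Haskell sketch in the spirit of that figure, since the paper's own version is NESL code:

```haskell
-- Least-squares fit of y = m*x + b, using only elementwise operations
-- and reductions, i.e. no nested parallelism.
linefit :: [Double] -> [Double] -> (Double, Double)
linefit xs ys = (m, b)
  where
    n   = fromIntegral (length xs)
    xm  = sum xs / n
    ym  = sum ys / n
    dxs = map (subtract xm) xs
    dys = map (subtract ym) ys
    m   = sum (zipWith (*) dxs dys) / sum (map (^ (2 :: Int)) dxs)
    b   = ym - m * xm

main :: IO ()
main = print (linefit [0, 1, 2, 3] [1, 3, 5, 7])  -- expect (2.0,1.0)
```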
- Intel Array Building Blocks
- Microsoft Accelerator
- Acceleware
- Copperhead (GPU programming in Python)
- Brook and BrookGPU
- Merge
A system for C++ that compiles to both CUDA and Intel TBB. The main contribution is to adaptively select how much of a computation is scheduled for the CPU and how much for the GPU, based on the input size N. They do this by making training runs with different input sizes for both the CPU and the GPU version and fitting linear functions to the measurements (x = input size, y = running time). Given a concrete problem instance of size N, the optimal division of labour can then be computed from these two functions (see the sketch after the notes below).
Other notes:
- Has a method of dividing any program into two parts that can be executed in parallel (one part for the CPU, another for the GPU) such that the results can be combined; this method is not described in the paper.
- Performs stream fusion
- Interfaces with CUBLAS for efficient versions of matrix multiplication etc.
- Analyzes memory requirements of programs before GPU code-generation and divides GPU programs further if the required memory is not available on the GPU. The individual smaller programs are then executed in serial and their results are combined.
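As a sketch of the balancing step described above (assuming the two parts run concurrently and the fitted models are linear; splitWork is a hypothetical helper, not code from the paper): both parts finish at the same time when aC*x + bC = aG*(N - x) + bG, i.e. x = (aG*N + bG - bC) / (aC + aG).

```haskell
-- Linear cost model fitted from training runs: time n = slope * n + intercept.
data Model = Model { slope :: Double, intercept :: Double }

-- Split work of size n between CPU and GPU so both finish together,
-- clamping the CPU share to [0, n]. Hypothetical helper, not from the paper.
splitWork :: Model -> Model -> Double -> (Double, Double)
splitWork cpu gpu n = (x, n - x)
  where
    x = max 0 (min n ((slope gpu * n + intercept gpu - intercept cpu)
                      / (slope cpu + slope gpu)))

main :: IO ()
main = print (splitWork (Model 2 1) (Model 1 4) 30)
  -- (11.0,19.0): the CPU is slower per element here, so it gets less work.
```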
- OpenCL specification
- CUDA by Example
- NVIDIA OpenCL programming guide
- Rolf's FAMØS article
- Michael & Joachim's thesis and paper
- SPJ Financial Contracts article
- Longstaff and Schwartz
- Coursera course on computational finance: https://class.coursera.org/compfinance-2012-001/class/index
- Eric Couffignal's dissertation http://eprints.maths.ox.ac.uk/927/1/eric_couffignals.pdf
- High-Performance Quasi-Monte Carlo Financial Simulation: FPGA vs. GPP vs. GPU
- Syntactic
- Deconstraining DSLs
- Embedded interpreters by Nick Benton
- Ken's master's dissertation
- Chalmers PFP course: http://www.cse.chalmers.se/edu/course/pfp/index.html
Funny side note: PFP is an abbreviation for both "Parallel Functional Programming" and "Probabilistic Functional Programming".