Monday, May 28, 2012

The first GPU programming in Paraiso

Now let's move on to the GPU. We need only a few modifications to turn the code generator that uses C++ into that that uses CUDA.
$ diff HelloWorld/Generator.hs HelloGPU/Generator.hs
40c40,42
<   { Native.directory = "./dist/"
---
>   { Native.directory = "./dist/" ,
>     Native.language  = Native.CUDA,
>     Native.cudaGridSize = (32,1)
The following Setup specifies that we want to generate CUDA code, and also specifies the CUDA grid dimensions. (The following code will be executed with dimensions <<<32,1>>>.)
mySetup :: Native.Setup Vec2 Int
mySetup =
  (Native.defaultSetup $ Vec :~ 10 :~ 20)
  { Native.directory = "./dist/" ,
    Native.language  = Native.CUDA,
    Native.cudaGridSize = (32,1)
  }
The generated code has been tested to produce the same result as the C++ version using nvcc 4.1 (with its built-in thrust) on TSUBAME 2.0 at Tokyo Institute of Technology.

Wednesday, May 23, 2012

API for Paraiso-generated codes

Our HelloWorld sample generates C++ code under dist folder. First have a look at the header file:

class TableMaker{
private: std::vector  om_s0_table;
private: int om_s1_total;
private: std::vector  om_m0_5;
public:  TableMaker () : om_s0_table(om_memory_size()),
  om_m0_5(om_memory_size()) {

}

Here we declare and allocate the memory for the static variables. Next comes the accessors for the OM size:

public: int om_size ()  {
return 200;
}
public: int om_size_0 ()  {
return 10;
}
public: int om_size_1 ()  {
return 20;
}

Note that all the auxiliary functions that Paraiso generates have name prefix om_. Users should give their variables and kernels unique names not prefixed with om_ to avoid name collisions.

The next series of size functions, om_memory_size(), returns the actual size of memory allocated. If the translated OM contains shift instructions, Paraiso allocates margin region for communications, and om_memory_size() will be greater than om_size().

public: int om_memory_size ()  {
return 200;
}
public: int om_memory_size_0 ()  {
return 10;
}
public: int om_memory_size_1 ()  {
return 20;
}

And here's the margin functions.

public: int om_lower_margin_0 ()  {
return 0;
}
public: int om_lower_margin_1 ()  {
return 0;
}
public: int om_upper_margin_0 ()  {
return 0;
}
public: int om_upper_margin_1 ()  {
return 0;
}

You get a nice element-wise accessor for each static array variable of the OM. There are accessors for scalar variables, too.

public: int & table (int i0, int i1)  {
return (om_s0_table)[((om_lower_margin_0()) + (i0)) + 
  ((om_memory_size_0()) * ((om_lower_margin_1()) + (i1)))];
}
public: int & total ()  {
return om_s1_total;
}

You also get a direct access to the raw data structure of each Array although, what the term "raw" means depends on the implementation detail.

public: std::vector  & table ()  {
return om_s0_table;
}

Finally, there are class methods that correspond to OM kernels.

public: void create () ;
};

Tuesday, May 22, 2012

Writing your first Orthotope Machine Program

Paraiso is language embedded in Haskell, specialized in the domain of explicit solvers of partial differential equations on uniform mesh. The computation of the domain is represented as the programs of the Orthotope Machine(OM). The Orthotope Machine is design to capture all of the parallelism inherent in the original problem. It is free from the order the elements of the array is stored and accessed. The dependencies between the instructions of the machine are clearly specified and independent instructions can be issued in any order or in parallel. Paraiso is the framework built around this OM; it can translate mathematical descriptions of the PDE-solving algorithm to OM programs; it can convert OM programs to native programs in C++ and CUDA; it can automatically tune them using evolutionary computation. Please refer to Paraiso paper for more details.

But in one short sentence, all you need to do is to create a value of type OM and pass it to the code generator.

Dive into the code

Our first sample code examples/HelloWorld/Generator.hs includes everything you need to tell Paraiso to let him generate code for you. Let's look inside it!

The first few lines are something like #include <stdio.h>.

#!/usr/bin/env runhaskell
{-# LANGUAGE NoImplicitPrelude, OverloadedStrings #-}
{-# OPTIONS -Wall #-}

import           Data.Dynamic
import           Data.Tensor.TypeLevel
import qualified Data.Text.IO as T
import           Language.Paraiso.Annotation (Annotation)
import           Language.Paraiso.Generator (generateIO)
import qualified Language.Paraiso.Generator.Native as Native
import           Language.Paraiso.Name
import           Language.Paraiso.OM
import           Language.Paraiso.OM.Builder
import           Language.Paraiso.OM.DynValue as DVal
import           Language.Paraiso.OM.PrettyPrint (prettyPrintA1)
import           Language.Paraiso.OM.Realm 
import qualified Language.Paraiso.OM.Reduce as Reduce
import           Language.Paraiso.Optimization
import           NumericPrelude
import           System.Process (system)

The first part of the sample defines a few names for the code we are going to generate. The type Name, defined in the module Language.Paraiso.Name, is just a newtype for Text defined in Data.Text module. We newtype it to distinguish the identifiers from the rest of the text.

-- the names we use
table, total, create, tableMaker :: Name
table = mkName "table"
total = mkName "total"
create = mkName "create"
tableMaker = mkName "TableMaker"

First, let's look at main routine. The first line creates an output folder. The second line is for pretty-printing the data-flow graphs of our first Orthotope Machine, namely myOM. This is just for your interest; it's not a necessary step for the code generation. The third line is the most important; it combines the myOM with mySetup and generates the C++ code.

main :: IO ()
main = do
  _ <- system "mkdir -p output"
  T.writeFile "output/OM.txt" $ prettyPrintA1 $ myOM
  _ <- generateIO mySetup myOM
  return () 

The data type Setup, defined in Language.Paraiso.Generator.Native, is responsible for all kinds of data needed at native-code generation timing but not needed in defining the abstract Orthotope Machine. Here, we specify the size of the actual OM to be generated, and the folder for the generated library. The default target language for code generation is C++.

mySetup :: Native.Setup Vec2 Int
mySetup = 
  (Native.defaultSetup $ Vec :~ 10 :~ 20)
  { Native.directory = "./dist/" 
  }

Next, let's look at the definition of myOM. To define an OM, pass four arguments, namely its name, the list of its global annotations, the list of its static variables, the list of its kernels, to function makeOM.

myOM :: OM Vec2 Int Annotation
myOM = optimize O3 $
  makeOM tableMaker [] myVars myKernels

myOM has two static variables. One is the array for the multiplication table; the other is for calculating the total of the table. They are Array andScalar Realm, respectively.

myVars :: [Named DynValue]
myVars = [Named table $ DynValue Array (typeOf (undefined::Int)),
          Named total $ DynValue Scalar (typeOf (undefined::Int))]

myOM has one kernel, called create. A kernel of an OM is a bunch of computation that is performed at once.

myKernels :: [Named (Builder Vec2 Int Annotation ())]
myKernels = [Named create createBuilder]

Finally, you can make OM kernels by using Builder Monad combinators. Here, let's make a two-dimensional OM kernel that loads x and y indices and multiply them. Let's also calculate the grand-sum of the table.

createBuilder :: Builder Vec2 Int Annotation ()
createBuilder = do 
  x <- bind $ loadIndex (undefined::Int) (Axis 0) 
  y <- bind $ loadIndex (undefined::Int) (Axis 1) 
  z <- bind $ x*y
  store table z
  store total $ reduce Reduce.Sum z

Thursday, May 17, 2012

Introduction to Computational Fluid Dynamics in Paraiso

Now the first paper on Paraiso is accepted. This is good, but like other just-born languages, there's a lot of refinements to be done, and nobody can use it without detailed documentation (even myself!).

To provide introduction to Paraiso syntax and its applications, I decided to post a series of articles that introduces computational fluid dynamics (CFD), in Paraiso.

For that end, we will be using the new donut branch and Paraiso 0.3.0.0 . Please refer to the Haddock and the Paraiso language website for further details.

The new branch
~$ git clone -b donut https://github.com/nushio3/Paraiso.git
Cloning into 'Paraiso'...
remote: Counting objects: 4743, done.
remote: Compressing objects: 100% (1815/1815), done.
remote: Total 4743 (delta 2885), reused 4637 (delta 2779)
Receiving objects: 100% (4743/4743), 2.65 MiB | 411 KiB/s, done.
Resolving deltas: 100% (2885/2885), done.
~$ cd Paraiso/
Paraiso$ git branch
* donut
Paraiso$

This branch already introduces a few changes from the published version, so be careful when reading the paper that
  •  Realms are more conventionally called Array and Scalar, rather than being Local and Global
  • C++ class generated by Paraiso has different interface.
Now, you should be able to install the latest Paraiso, as follows.
Paraiso$ cabal install
 (...a few minutes...) 
Registering Paraiso-0.3.0.0... 
Paraiso$ 

Hello World
Bad news is that, all the examples found in the first paper on Paraiso are now deprecated. However, you can find a new genetarion of examples here:
Paraiso$ cd examples/HelloWorld/
HelloWorld$ cat Generator.hs

Here is a Haskell code that uses Paraiso and generates a very simple library, in C++:
#!/usr/bin/env runhaskell
{-# LANGUAGE NoImplicitPrelude, OverloadedStrings #-}
{-# OPTIONS -Wall #-}

import           Data.Dynamic
import           Data.Tensor.TypeLevel
import qualified Data.Text.IO as T
import           Language.Paraiso.Annotation (Annotation)
import           Language.Paraiso.Generator (generateIO)
import qualified Language.Paraiso.Generator.Native as Native
import           Language.Paraiso.Name
import           Language.Paraiso.OM
import           Language.Paraiso.OM.Builder
import           Language.Paraiso.OM.DynValue as DVal
import           Language.Paraiso.OM.PrettyPrint (prettyPrintA1)
import           Language.Paraiso.OM.Realm 
import qualified Language.Paraiso.OM.Reduce as Reduce
import           Language.Paraiso.Optimization
import           NumericPrelude
import           System.Process (system)

-- the names we use
table, total, create, tableMaker :: Name
table = mkName "table"
total = mkName "total"
create = mkName "create"
tableMaker = mkName "TableMaker"


main :: IO ()
main = do
  _ <- system "mkdir -p output"
  T.writeFile "output/OM.txt" $ prettyPrintA1 $ myOM
  _ <- generateIO mySetup myOM
  return ()

mySetup :: Native.Setup Vec2 Int
mySetup = 
  (Native.defaultSetup $ Vec :~ 10 :~ 20)
  { Native.directory = "./dist/" 
  }

myOM :: OM Vec2 Int Annotation
myOM = optimize O3 $
  makeOM tableMaker [] myVars myKernels

myVars :: [Named DynValue]
myVars = [Named table $ DynValue Array (typeOf (undefined::Int)),
          Named total $ DynValue Scalar (typeOf (undefined::Int))]

myKernels :: [Named (Builder Vec2 Int Annotation ())]
myKernels = [Named create createBuilder]

createBuilder :: Builder Vec2 Int Annotation ()
createBuilder = do 
  x <- bind $ loadIndex (undefined::Int) (Axis 0) 
  y <- bind $ loadIndex (undefined::Int) (Axis 1) 
  z <- bind $ x*y
  store table z
  store total $ reduce Reduce.Sum z


Now let it generate the code and use it!
HelloWorld$ make
runhaskell Generator.hs
g++ main.cpp dist/TableMaker.cpp -o main.out

main.out uses the generated library and prints out the nice multiplication table for us.
HelloWorld$ ./main.out 
   0   0   0   0   0   0   0   0   0   0
   0   1   2   3   4   5   6   7   8   9
   0   2   4   6   8  10  12  14  16  18
   0   3   6   9  12  15  18  21  24  27
   0   4   8  12  16  20  24  28  32  36
   0   5  10  15  20  25  30  35  40  45
   0   6  12  18  24  30  36  42  48  54
   0   7  14  21  28  35  42  49  56  63
   0   8  16  24  32  40  48  56  64  72
   0   9  18  27  36  45  54  63  72  81
   0  10  20  30  40  50  60  70  80  90
   0  11  22  33  44  55  66  77  88  99
   0  12  24  36  48  60  72  84  96 108
   0  13  26  39  52  65  78  91 104 117
   0  14  28  42  56  70  84  98 112 126
   0  15  30  45  60  75  90 105 120 135
   0  16  32  48  64  80  96 112 128 144
   0  17  34  51  68  85 102 119 136 153
   0  18  36  54  72  90 108 126 144 162
   0  19  38  57  76  95 114 133 152 171
total: 8550 
HelloWorld$