Design of CINN/DSL


This module is a simple DSL defined in the CINN project. The DSL aims to represent the overall computation in a hardware-independent way.



All the mutable elements in CINN are Objects.

Shared

A Shared object is a container that carries its own reference count, similar to std::shared_ptr.

A Shared object can be passed as a pointer; the consumer object should then store it in a local Shared member variable.
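The passing convention above can be illustrated with a minimal sketch of an intrusive reference-counted handle. This is not CINN's actual implementation: the `Object`, `Shared`, `Node`, and `Consumer` definitions below are hypothetical and only assume that the count lives inside the object itself.

```cpp
#include <cassert>
#include <utility>

// Hypothetical base class: every object carries its own reference count.
class Object {
 public:
  virtual ~Object() = default;
  int ref_count() const { return ref_count_; }

 private:
  template <typename T>
  friend class Shared;
  int ref_count_ = 0;
};

// Hypothetical handle: adjusts the count stored inside the object and
// deletes the object when the count drops to zero.
template <typename T>
class Shared {
 public:
  Shared() = default;
  explicit Shared(T* p) : p_(p) { Inc(); }
  Shared(const Shared& other) : p_(other.p_) { Inc(); }
  Shared(Shared&& other) noexcept : p_(other.p_) { other.p_ = nullptr; }
  Shared& operator=(Shared other) {  // copy-and-swap
    std::swap(p_, other.p_);
    return *this;
  }
  ~Shared() { Dec(); }

  T* get() const { return p_; }
  T* operator->() const {
    assert(p_ != nullptr);
    return p_;
  }

 private:
  void Inc() {
    if (p_) ++p_->ref_count_;
  }
  void Dec() {
    if (p_ && --p_->ref_count_ == 0) delete p_;
  }
  T* p_ = nullptr;
};

// Illustrative object type.
struct Node : Object {
  int value = 0;
};

// The consumer receives a raw pointer and keeps the object alive by
// storing it in a local Shared member.
class Consumer {
 public:
  explicit Consumer(Node* n) : node_(n) {}
  int value() const { return node_->value; }

 private:
  Shared<Node> node_;
};
```

Because the count is stored in the object rather than in a separate control block, a raw pointer can be wrapped in a new `Shared` at any point without creating a second, conflicting count.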


A Tensor is an input node or a temporary output node.

Every Compute outputs a Tensor, and a Tensor can be sliced.


A PlaceHolder is a special tensor that represents an input slot.

PlaceHolder<float> A("A", {M, N});
PlaceHolder<float> B("B", {M, N});


An Operation is an operation on tensors. Operations include

  • placeholder

  • compute

  • bound inference

Tensor C = Compute({M, N} /*output shape*/, [&](Var i, Var j) {
  Var k;  // reduction axis
  return ReduceSum(A(i, k) * B(k, j), {k});
});

Bound inference

A PlaceHolder must define a shape, from which the bounds of its indices are inferred.

Var M(Int(32));
Var N(Int(32));

PlaceHolder<float> A({M, N});

Var i,j;
Expr tmp = A[i][j] + 1;  // inferred bounds: i in [0, M), j in [0, N)

To simplify the implementation, we use ISL to generate code for basic snippets.
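As an illustration of how inferred bounds might be handed to ISL, the sketch below builds an iteration domain in ISL's textual set notation from a tensor name, its iterators, and its symbolic extents. The helper `IterationDomain` is hypothetical and not part of CINN or ISL; it only formats a string, and it assumes a tensor of rank one or higher whose extents are all symbolic parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: formats the iteration domain of a tensor as an
// ISL-style parametric set, e.g. "[M] -> { A[i] : 0 <= i < M }".
std::string IterationDomain(const std::string& tensor,
                            const std::vector<std::string>& iters,
                            const std::vector<std::string>& extents) {
  assert(iters.size() == extents.size());
  std::string params, indices, bounds;
  for (std::size_t d = 0; d < iters.size(); ++d) {
    const std::string sep = d ? ", " : "";
    params += sep + extents[d];   // symbolic extents become set parameters
    indices += sep + iters[d];    // iterators become the tuple dimensions
    bounds += (d ? " and " : "") +
              ("0 <= " + iters[d] + " < " + extents[d]);
  }
  return "[" + params + "] -> { " + tensor + "[" + indices + "] : " +
         bounds + " }";
}
```

For the PlaceHolder above, `IterationDomain("A", {"i", "j"}, {"M", "N"})` yields `[M, N] -> { A[i, j] : 0 <= i < M and 0 <= j < N }`.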


The schedule will

  1. determine the order of computation by topologically sorting the computational graph composed of tensors.

  2. transform the computations.

Order schedule

  1. Topologically sort the tensors.

  2. For each tensor, generate the code it needs.
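The ordering step can be sketched with Kahn's algorithm over a tensor dependency graph. This is an assumption about how the sort could be done, not a description of CINN's scheduler; the `DepGraph` type and `TopoOrder` function are hypothetical, and tensors are identified by name only.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical graph: maps each tensor name to the tensors it reads.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Kahn's algorithm: returns the tensors ordered so that every tensor
// appears after all of its inputs.
std::vector<std::string> TopoOrder(const DepGraph& deps) {
  std::map<std::string, int> pending;                     // unmet inputs
  std::map<std::string, std::vector<std::string>> users;  // input -> readers
  for (const auto& [tensor, inputs] : deps) {
    pending[tensor] += 0;  // make sure the tensor is registered
    for (const auto& in : inputs) {
      pending[in] += 0;    // inputs without an entry are graph sources
      ++pending[tensor];
      users[in].push_back(tensor);
    }
  }

  std::queue<std::string> ready;
  for (const auto& [tensor, n] : pending)
    if (n == 0) ready.push(tensor);

  std::vector<std::string> order;
  while (!ready.empty()) {
    const std::string t = ready.front();
    ready.pop();
    order.push_back(t);  // code for t would be generated at this point
    for (const auto& u : users[t])
      if (--pending[u] == 0) ready.push(u);
  }
  assert(order.size() == pending.size() && "dependency cycle");
  return order;
}
```

With `C` reading `A` and `B`, and `D` reading `C`, the inputs come first, then `C`, then `D`, which is the order in which code would be generated.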

Some examples

A matrix multiplication

// Declare some iterator variables.
Var i, j, k;
Placeholder<float> A({M, K}), B({K, N});

Tensor C = Compute({M, N} /*output shape*/,
        [&](Var i, Var j) {
            // Reduce over k: C(i, j) = sum_k A(i, k) * B(k, j).
            return ReduceSum(A(i, k) * B(k, j), k);
        }, "C");
Tensor D = Compute({M, N}, [&](Var i, Var j) {
  return Map(C(i, j) + 1);
});

Schedule s = CreateSchedule(C);
auto func = Build(s, {A, B, C}, target, "matmul");

func(a, b, c);
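For reference, the computation described by `C` and `D` above corresponds to the plain loop nests below. This is a hand-written sketch of the intended semantics, not code generated by CINN; the `Matrix` alias and both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// C(i, j) = sum over k of A(i, k) * B(k, j), with A of shape {M, K}
// and B of shape {K, N}.
Matrix MatmulReference(const Matrix& a, const Matrix& b) {
  const std::size_t m = a.size();
  const std::size_t k_extent = b.size();
  const std::size_t n = b.empty() ? 0 : b[0].size();
  Matrix c(m, std::vector<float>(n, 0.0f));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < k_extent; ++k)  // the reduction axis
        c[i][j] += a[i][k] * b[k][j];
  return c;
}

// D(i, j) = C(i, j) + 1, an element-wise map over C.
Matrix AddOneReference(const Matrix& c) {
  Matrix d = c;
  for (auto& row : d)
    for (auto& value : row) value += 1.0f;
  return d;
}
```

A reference like this is useful for checking the output of a scheduled kernel on small inputs.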