Projects

DGL

DGL provides fast and memory-efficient message-passing primitives for training graph neural networks, and scales to giant graphs via multi-GPU acceleration and distributed training infrastructure.

GitHub repository
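
As a quick illustration of these message-passing primitives, here is a minimal sketch (assuming a recent DGL release with the PyTorch backend) that builds a small graph and runs one round of copy-and-sum message passing:

```python
import dgl
import dgl.function as fn
import torch

# Build a small directed graph from source/destination edge lists.
src = torch.tensor([0, 0, 1, 2])
dst = torch.tensor([1, 2, 2, 3])
g = dgl.graph((src, dst), num_nodes=4)

# Attach a random node feature of dimension 8.
g.ndata['h'] = torch.randn(4, 8)

# One round of message passing: copy each source node's feature onto
# its outgoing edges, then sum the incoming messages at each destination.
g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h_agg'))

print(g.ndata['h_agg'].shape)  # torch.Size([4, 8])
```

When both the message and reduce functions are DGL built-ins, as here, DGL can fuse them into a single sparse kernel (essentially an SpMM) instead of materializing per-edge messages, which is one reason the primitives are memory-efficient.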

Optimization of Sparse N-D Tensor Kernels

Tensors are used to represent high-dimensional data. For example, the attributes of an email conversation (subject, author, and time) can be represented by a 3-way tensor. Real-world sparse tensors are extremely large, highly sparse, and often follow a power-law nonzero distribution. Canonical Polyadic Decomposition (CPD) is one of the most common tensor factorization techniques, applicable to both dense and sparse tensors, and MTTKRP (matricized tensor times Khatri-Rao product), a key kernel, is a common bottleneck of CPD. We propose multiple new data structures to address 1) the fundamental difference between the parallel structure of threads on GPUs versus multicore CPUs, which requires load-balancing at two levels (between warps in a thread block and between thread blocks in a grid), and 2) the diversity of nonzero distribution patterns within a sparse tensor, where different representations are beneficial for ultra-sparse versus moderately sparse regions.

  • GitHub repository
  • Paper at SC'19
  • Paper at IPDPS'19
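
To make the kernel concrete, here is a minimal NumPy sketch of a reference mode-0 MTTKRP over a COO-format sparse 3-way tensor; the function and variable names are illustrative, and this sequential version does not reflect the GPU data structures proposed in the papers above:

```python
import numpy as np

def mttkrp_coo(coords, vals, B, C, num_rows):
    """Reference mode-0 MTTKRP for a sparse 3-way tensor in COO form.

    coords : (nnz, 3) integer array of (i, j, k) indices
    vals   : (nnz,) array of nonzero values
    B, C   : dense factor matrices of shape (J, R) and (K, R)
    Returns M of shape (num_rows, R), where
    M[i, :] = sum over nonzeros of X[i, j, k] * (B[j, :] * C[k, :]).
    """
    R = B.shape[1]
    M = np.zeros((num_rows, R))
    i, j, k = coords[:, 0], coords[:, 1], coords[:, 2]
    # Hadamard product of the B and C rows selected by each nonzero,
    # scaled by the nonzero value.
    contrib = vals[:, None] * B[j] * C[k]
    # Scatter-add each contribution into its output row.
    np.add.at(M, i, contrib)
    return M

# Tiny example: a 3x4x5 tensor with three nonzeros, rank R = 2.
coords = np.array([[0, 1, 2], [2, 0, 4], [2, 3, 1]])
vals = np.array([1.0, 2.0, -1.5])
B, C = np.random.rand(4, 2), np.random.rand(5, 2)
M = mttkrp_coo(coords, vals, B, C, num_rows=3)
print(M.shape)  # (3, 2)
```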

Optimization of Sparse 2D Matrix Kernels

Sparse Matrix-Vector (SpMV), Sparse Matrix-Multivector (SpMM), and Sampled Dense-Dense Matrix Multiplication (SDDMM) products are key kernels in computational science and data science. While GPUs offer significantly higher peak performance and memory bandwidth than multicore CPUs, achieving high performance for sparse computations on GPUs is very challenging. We present an in-depth analysis and develop a new sparse-matrix representation and computation approach that achieves high data-movement efficiency and effective GPU parallelization.

  • Paper at PPoPP'19
  • Paper at HiPC'18
  • More related papers can be found in the Publications section
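
To make the three kernels concrete, here is a minimal SciPy/NumPy sketch of reference CPU versions of SpMV, SpMM, and SDDMM on a small random CSR matrix; it only illustrates what each kernel computes and does not reflect the GPU representations developed in these papers:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# A small random sparse matrix in CSR format.
A = sp.random(6, 8, density=0.3, format='csr', random_state=0)

# SpMV: sparse matrix times a dense vector.
x = rng.standard_normal(8)
y = A @ x                          # shape (6,)

# SpMM: sparse matrix times a dense multi-vector (tall dense matrix).
X = rng.standard_normal((8, 4))
Y = A @ X                          # shape (6, 4)

# SDDMM: sampled dense-dense matrix multiplication. Compute U @ V.T
# only at the nonzero positions of A and scale by A's values.
U = rng.standard_normal((6, 4))
V = rng.standard_normal((8, 4))
A_coo = A.tocoo()
rows, cols = A_coo.row, A_coo.col
sampled = np.einsum('ij,ij->i', U[rows], V[cols])   # one dense dot per nonzero
S = sp.csr_matrix((A_coo.data * sampled, (rows, cols)), shape=A.shape)
```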