A Survey on Matrix Multiplication for GPU
Keywords:
GPU, NVIDIA CUDA, Shared Memory, Tiling, Matrix Multiplication, SIMD
Abstract
Nowadays, sequential processing is not sufficient for large-scale data
computation in computer science and technology. The need for high-performance
computation is ever growing, even though certain problem sets remain within the domain of
high-performance computing, with applications such as weather forecasting, quantum physics, and
climate research. Within the commercial area of computation, NVIDIA has an architectural
framework (NVIDIA CUDA) to harness the power of GPUs, which were previously used only
for graphics applications such as 3D games but are now also used for certain types of
high-performance computation. In this paper, we take a critical look at different techniques
for matrix multiplication. We implement matrix multiplication using
several techniques and compare the results on the basis of execution time
to determine which technique is the most efficient for our problem set (matrix operations on
matrices of size n).