Many problems in mathematics and computer science come down to matrix multiplication where the matrix entries are integers considered modulo a prime p. Quite often p is 2, but other primes are also interesting.
A great deal of effort has been put into optimizing this calculation on multicore CPUs. On our 64 core server, for instance, dense 1 million x 1 million matrices over the integers mod 2 can be multiplied in less than a day.
A recent project has produced good results for the case of p=2. A further project could explore other primes, or possibly look at Gaussian elimination.
This project aims to explore whether competitive performance can be obtained using a modern GPU with CUDA or OpenCL.
Experimental software and results. Optionally a software library.