The OpenDwarfs project provides a benchmark suite consisting of different computation/communication idioms, i.e., dwarfs, for state-of-the-art multicore and GPU systems. The first instantiation of OpenDwarfs has been realized in OpenCL.
mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. By efficiently utilizing distributed computational resources through database fragmentation, query segmentation, intelligent scheduling, and parallel I/O, mpiBLAST improves NCBI BLAST performance by several orders of magnitude while scaling to hundreds of processors.
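As a rough illustration of how database fragmentation and query segmentation combine, the sketch below splits a query list and a database into chunks and pairs every query segment with every database fragment to form independent work units. It is a minimal sketch, not mpiBLAST's actual code: the function names (`split_evenly`, `plan_work`), the toy sequence identifiers, and the round-robin assignment (a stand-in for mpiBLAST's intelligent scheduling) are all assumptions made for illustration.

```python
from itertools import product


def split_evenly(items, parts):
    """Split a list into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def plan_work(queries, database, n_query_segments, n_db_fragments, n_workers):
    """Build (query segment, database fragment) work units and assign them to workers.

    Hypothetical helper: round-robin assignment is an assumption standing in
    for a real scheduler.
    """
    segments = split_evenly(queries, n_query_segments)    # query segmentation
    fragments = split_evenly(database, n_db_fragments)    # database fragmentation
    # Every segment must be searched against every fragment.
    tasks = list(product(range(len(segments)), range(len(fragments))))
    assignment = {worker: [] for worker in range(n_workers)}
    for i, task in enumerate(tasks):
        assignment[i % n_workers].append(task)
    return segments, fragments, assignment


# Toy input: identifiers stand in for real sequences.
queries = ["q0", "q1", "q2", "q3"]
database = ["s0", "s1", "s2", "s3", "s4", "s5"]
segments, fragments, assignment = plan_work(queries, database, 2, 3, 2)
for worker, tasks in assignment.items():
    print(worker, tasks)
```

Each task is a pair of indices into `segments` and `fragments`, so a worker only needs to load its own fragment of the database, which is the point of fragmenting it in the first place.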
The CU2CL translator project seeks to deliver platform portability of CUDA codes via robust automated translation to OpenCL.
SeqInCloud (short for “sequencing in the clouds”), combined with the Microsoft cloud computing platform and infrastructure, provides a portable cloud solution for next-generation sequence analysis. It optimizes data management, such as data partitioning and data transfer, to deliver better performance and more efficient use of cloud resources.
cuBLASTP is an efficient fine-grained BLASTP implementation for the GPU using CUDA. The implementation includes research contributions, such as (1) memory-access reordering to reorder hits from column-major order to diagonal-major order, (2) position-based indexing to map a hit with a packed data structure to a bin, (3) aggressive hit filtering to eliminate hits beyond the threshold distance along the diagonal, (4) diagonal-based parallelism and hit-based parallelism for ungapped extension to extend sequences with different lengths in databases, and (5) hierarchical buffering to reduce memory-access overhead for the core data structures.
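The reordering and filtering ideas in (1) and (3) can be pictured with a small sequential sketch. It is written in plain Python rather than CUDA and is not cuBLASTP's implementation: the function names, the `(query_pos, subject_pos)` hit representation, and the exact filter rule (a hit survives only if another hit lies on the same diagonal within `threshold` positions) are assumptions chosen to illustrate the general idea.

```python
from collections import defaultdict


def bin_hits_by_diagonal(hits):
    """Reorder hits into diagonal-major order.

    Each hit is a (query_pos, subject_pos) pair; its diagonal is
    subject_pos - query_pos. Returns {diagonal: sorted subject positions},
    with diagonals in ascending order.
    """
    bins = defaultdict(list)
    for query_pos, subject_pos in hits:
        bins[subject_pos - query_pos].append(subject_pos)
    return {diag: sorted(positions) for diag, positions in sorted(bins.items())}


def filter_hits(bins, threshold):
    """Drop hits with no neighbour on the same diagonal within `threshold`.

    Assumed rule for illustration only; the real filter may differ.
    """
    kept = {}
    for diag, positions in bins.items():
        survivors = []
        for i, pos in enumerate(positions):
            near_prev = i > 0 and pos - positions[i - 1] <= threshold
            near_next = i + 1 < len(positions) and positions[i + 1] - pos <= threshold
            if near_prev or near_next:
                survivors.append(pos)
        if survivors:
            kept[diag] = survivors
    return kept


# Hits listed in column-major order, i.e. sorted by subject position.
hits = [(0, 0), (1, 1), (5, 2), (0, 3), (9, 9), (40, 40)]
bins = bin_hits_by_diagonal(hits)
kept = filter_hits(bins, threshold=10)
print(bins)
print(kept)
```

Grouping by diagonal places the hits that a later extension step would examine together next to each other, which is the property that memory-access reordering aims for on the GPU.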
Heterogeneity is becoming a fact of life in HPC, largely driven by demands for increased parallelism and power efficiency over what traditional CPUs can provide. However, extracting the full performance of heterogeneous systems is non-trivial and requires architecture expertise. Retrofitting existing codes for heterogeneity is tedious and error-prone, architecture experts are in short supply, and accelerators are moving targets. Therefore, a single API for transparently executing optimized code on accelerators with minimal intervention is needed for scientific productivity.