Abstract (EN):
Dynamic partitioning is a promising technique where computations are transparently moved from a General Purpose Processor (GPP) to a coprocessor during application execution. To be effective, the mapping of computations to the coprocessor needs to consider aggressive optimizations. One of the mapping optimizations is loop pipelining, a technique extensively studied and known to allow substantial performance improvements. This paper describes a technique for pipelining Megablocks, a type of runtime loop developed for dynamic partitioning. The technique transforms the body of Megablocks into an acyclic dataflow graph which can be fully pipelined and is based on the atomic execution of loop iterations. For a set of 9 benchmarks without memory operations, we generated pipelined hardware versions of the loops and estimate that the presented loop pipelining technique increases the average speedup of non-pipelined coprocessor accelerated designs from 1.6x to 2.2x. For a larger set of 61 benchmarks which include memory operations, the technique achieves a speedup increase from 2.5x to 5.6x. ©2012 IEEE.
Language:
English
Type (Professor's evaluation):
Scientific