Publication

Fast Heuristic-Based GPU Compiler Sequence Specialization

Title
Fast Heuristic-Based GPU Compiler Sequence Specialization
Type
Article in International Conference Proceedings Book
Year
2019-12-31
Authors
Ricardo Nobre
(Author)
FEUP
Luís Reis
(Author)
FEUP
Conference proceedings International
Pages: 494-505
International European Conference on Parallel and Distributed Computing (Euro-Par)
Turin, ITALY, AUG 27-28, 2018
Indexing
ISI Web of Knowledge: 0 citations
Scopus: 0 citations
Other information
Authenticus ID: P-00Q-2NK
Abstract (EN): Iterative compilation focused on specialized phase orders (i.e., custom selections of compiler passes and orderings for each program or function) can significantly improve the performance of compiled code. However, phase ordering specialization typically needs to deal with a large solution space. A previous approach, evaluated by targeting an x86 CPU, mitigates this issue by first using a training phase on reference codes to produce a small set of high-quality reusable phase orders. This approach then uses these phase orders to compile new codes, without any code analysis. In this paper, we evaluate the viability of using this approach to optimize the GPU execution performance of OpenCL kernels. In addition, we propose and evaluate the use of a heuristic to further reduce the number of evaluated phase orders, by comparing the speedups of the resulting binaries with those of the training phase for each phase order. This information is used to predict which untested phase order is most likely to produce good results (e.g., highest speedup). We performed our measurements using the PolyBench/GPU OpenCL benchmark suite on an NVIDIA Pascal GPU. Without heuristics, we can achieve a geomean execution speedup of 1.64x, using cross-validation, with 5 non-standard phase orders. With the heuristic, we can achieve the same speedup with only 3 non-standard phase orders. This is close to the geomean speedup achieved in our iterative compilation experiments exploring thousands of phase orders. Given the significant reduction in exploration time and other advantages of this approach, we believe that it is suitable for a wide range of compiler users concerned with performance.
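The abstract does not spell out the heuristic, so the following is only a minimal sketch of one plausible reading: speedups already measured on a new kernel are compared with the training-phase speedups of each reference kernel, and the most similar reference kernel is used to predict the best untested phase order. The function names, the nearest-neighbour rule, the log-space distance, and the sample numbers are all assumptions made for illustration, not the authors' published method or data.

```python
import math


def geomean(values):
    """Geometric mean of a sequence of positive speedups."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def next_phase_order(training, observed):
    """Pick the untested phase order predicted to give the highest speedup.

    training: {kernel: {phase_order: speedup}} recorded in the training phase.
    observed: {phase_order: speedup} measured so far on the new kernel.
    Returns None once every phase order has been tested.
    """
    all_orders = next(iter(training.values())).keys()
    untested = [o for o in all_orders if o not in observed]
    if not untested:
        return None

    if not observed:
        # Nothing measured yet: start with the best order on average.
        return max(untested,
                   key=lambda o: geomean([t[o] for t in training.values()]))

    def distance(ref):
        # Compare in log space so 2x and 0.5x are equally far from 1x.
        return sum((math.log(ref[o]) - math.log(s)) ** 2
                   for o, s in observed.items())

    # Assumed rule: trust the training kernel that behaved most similarly.
    nearest = min(training.values(), key=distance)
    return max(untested, key=lambda o: nearest[o])


# Made-up speedups for two reference kernels and three phase orders.
training = {
    "gemm": {"A": 1.8, "B": 1.1, "C": 0.8},
    "atax": {"A": 1.0, "B": 1.5, "C": 2.0},
}
first = next_phase_order(training, {})            # no measurements yet
second = next_phase_order(training, {"A": 1.05})  # after testing order "A"
```

In this toy run the new kernel's response to phase order "A" resembles that of "atax", so the sketch tries the order that worked best for "atax" next. A geomean such as the 1.64x reported above would be obtained by applying `geomean` to the best per-kernel speedups.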
Language: English
Type (Professor's evaluation): Scientific
No. of pages: 12
Related Publications

Of the same authors

Compiler Phase Ordering as an Orthogonal Approach for Reducing Energy Consumption (2018)
Technical Report
Ricardo Nobre; Luís Reis; João M. P. Cardoso