Predicting Compiler Optimization Performance for High-Performance Computing Applications

Date

2005-08-30

Abstract

Developers of high-performance computing applications often spend a large amount of time tuning their applications. Despite advances in compilers and compiler optimization techniques, tuning is still largely manual and requires much trial and error. One reason is that many compiler optimizations do not provide a performance gain in all cases. Complicating the problem further, an optimization may help performance in some cases but hurt it in other cases within the same application. Worse, an optimization may help when the application runs with one input set but hurt when the same application runs with a different input set. The central question of this work is whether machine learning techniques can be used to automate compiler optimization selection. Artificial Neural Networks (ANN) and Decision Trees (DT) are modeled, trained, and used to predict whether loop unrolling should be applied to loops of serial programs. Simple loop characteristics such as iteration count, nesting level, and body size are collected and used as input to the ANN or DT. A very simple microbenchmark is used to train the ANN, which is then used to predict the benefit of loop unrolling across different NAS (serial version) benchmarks. We find that an ANN trained using the microbenchmark correctly predicts whether loop unrolling is beneficial in 62% of the cases. For the BT benchmark, it predicts correctly in 82% of the cases. Furthermore, we find that benchmarks such as FT, which perform poorly when tested with the ANN trained on the microbenchmark, yield accurate results in 69% of the cases when tested using an ANN trained with loops from other NAS benchmarks. Decision trees used to classify loops from the NAS benchmarks (as benefiting from loop unrolling or not) were found to have an accuracy of 79.54%. A DT built using the microbenchmark correctly classified NAS loops 53% of the time. Although the results show promise, we believe that, to accurately automate compiler optimization selection, more complex loops may need to be modeled in the microbenchmark and many other factors may need to be taken into account in characterizing each loop nest.
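
To make the classification setup concrete, the following is a minimal sketch of a one-level decision tree (a decision stump) over the three loop characteristics named in the abstract. It is not the thesis's model: the names `LoopFeatures`, `train_stump`, `predict`, and `accuracy` are hypothetical, the training samples are invented for illustration, and measuring body size as a statement count is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopFeatures:
    """Loop characteristics listed in the abstract (units are assumptions)."""
    iteration_count: int
    nesting_level: int
    body_size: int  # assumed here to be a statement count


FEATURES = ("iteration_count", "nesting_level", "body_size")


def train_stump(samples):
    """Return the (feature, threshold, unroll_if_above) split with fewest training errors."""
    best = None
    for name in FEATURES:
        # Candidate thresholds are the feature values seen in the training set.
        for threshold in sorted({getattr(f, name) for f, _ in samples}):
            for unroll_if_above in (True, False):
                errors = sum(
                    ((getattr(f, name) > threshold) == unroll_if_above) != label
                    for f, label in samples
                )
                if best is None or errors < best[0]:
                    best = (errors, name, threshold, unroll_if_above)
    _, name, threshold, unroll_if_above = best
    return name, threshold, unroll_if_above


def predict(stump, loop):
    """Predict whether unrolling is expected to help the given loop."""
    name, threshold, unroll_if_above = stump
    return (getattr(loop, name) > threshold) == unroll_if_above


def accuracy(stump, samples):
    """Fraction of samples whose label the stump predicts correctly."""
    return sum(predict(stump, f) == label for f, label in samples) / len(samples)


# Invented samples: (features, did unrolling help?). Not data from the thesis.
TRAINING = [
    (LoopFeatures(1000, 1, 4), True),
    (LoopFeatures(5000, 2, 6), True),
    (LoopFeatures(800, 1, 60), False),
    (LoopFeatures(2000, 3, 80), False),
]

stump = train_stump(TRAINING)
```

On these invented samples the stump splits on `body_size`, which only reflects how the toy data was constructed. A full decision tree would apply the same split search recursively to each resulting subset, and the accuracy figures in the abstract come from measured benchmark loops, not from a toy set like this one.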

Keywords

Machine Learning, Neural Networks, High-Performance Computing Applications, Performance tuning, Compiler Optimization

Degree

MS

Discipline

Computer Engineering
