C++ Profiling with gprof
What is gprof?
gprof is a GNU profiler that helps analyze where a program spends most of its execution time. It provides function call counts and execution time details.
Installing gprof
$ sudo apt-get install binutils
Compiling a C++ Program for Profiling
To use gprof, compile your program with the -pg flag:
$ g++ -pg -o test test.cpp
Example C++ Program
Create a file test.cpp with the following code:
#include <iostream>

void heavyComputation() {
    volatile long long sum = 0;
    for (long long i = 0; i < 500000000; ++i) {
        sum += i; // Simple but expensive loop
    }
}

void lightComputation() {
    volatile int sum = 0;
    for (int i = 0; i < 100000; ++i) {
        sum += i; // Lighter loop
    }
}

int main() {
    heavyComputation(); // Call heavy function once
    for (int i = 0; i < 1000; ++i) {
        lightComputation(); // Call light function many times
    }
    return 0;
}
Running and Profiling the Program
- Compile the program:
$ g++ -pg -o test test.cpp
- Execute the program to generate gmon.out:
$ ./test
- Analyze the profiling data:
$ gprof test gmon.out > profile.txt
$ cat profile.txt
Understanding the Output (in the profile.txt file)
- Flat Profile: Shows execution time spent in each function.
- Call Graph: Displays function call relationships and their execution time.
The profiling results show that heavyComputation() takes significantly more execution time than lightComputation(), even though lightComputation() is called 1000 times. The flat profile also reports the percentage of total time spent in each function, which helps identify bottlenecks. For the test program above, the flat profile looks like this (exact numbers vary by machine and compiler):
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 85.71      0.24     0.24        1   240.00   240.00  heavyComputation()
 14.29      0.28     0.04     1000     0.04     0.04  lightComputation()
  0.00      0.28     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
Note: The __static_initialization_and_destruction_0(int, int) function is generated by the compiler during the static initialization and destruction phases of a program, particularly for global or static variables.
Best Practices for Using gprof
- Use -O2 rather than -O3. -O3 inlines more aggressively, and an inlined function no longer appears as its own entry in the profile, which reduces profiling accuracy.
- Profile with realistic input data to get meaningful results.
- Optimize the slowest functions first based on the profiling report.
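If a function of interest disappears from the profile because the optimizer inlined it, one option is to mark it as non-inlinable. The sketch below assumes GCC or Clang, which both accept the __attribute__((noinline)) extension; sumUpTo is a hypothetical function used only for illustration.

```cpp
#include <cassert>

// Assumption: GCC or Clang. The noinline attribute asks the compiler to
// keep this function as a real call, so it can keep its own entry in the
// gprof flat profile even when optimizations are enabled.
__attribute__((noinline)) long long sumUpTo(long long n) {
    assert(n >= 0);
    volatile long long sum = 0;  // volatile keeps the loop from being removed
    for (long long i = 0; i < n; ++i) {
        sum = sum + i;  // explicit read and write of the volatile variable
    }
    return sum;
}
```

Compiled with -pg and called from main, sumUpTo should then be reported under its own name. The attribute changes how the code is generated, so it is best limited to the functions being measured.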
Conclusion
gprof is a powerful tool for detecting performance bottlenecks in C++ programs. By identifying expensive functions, developers can make targeted optimizations.
Why Use volatile in the above program?
Compiler May Remove the Loops
When compiling a program, the compiler applies optimizations to make the code run faster. One such optimization is dead code elimination, where the compiler removes code that does not affect the program's observable behavior.
For example, consider this function:
void heavyComputation() {
    long long sum = 0;
    for (long long i = 0; i < 500000000; ++i) {
        sum += i;
    }
}
- The compiler notices that sum is never used outside the function.
- Since the result is discarded, the compiler may completely remove the loop.
- This means heavyComputation() might do nothing at runtime, which ruins our profiling experiment.
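The points above hinge on the result being unused. As a minimal sketch of the opposite case, the hypothetical function sumWithResult below returns the sum so a caller can observe it. This only illustrates why the loop is no longer dead code; an optimizing compiler may still simplify such a loop into a direct calculation, so it does not guarantee that the loop runs.

```cpp
#include <cassert>

// Variant of the loop that returns its result instead of discarding it.
// Because a caller can use the value, the computation is no longer dead
// code, although the compiler may still simplify how it is computed.
long long sumWithResult(long long n) {
    assert(n >= 0);
    long long sum = 0;
    for (long long i = 0; i < n; ++i) {
        sum += i;
    }
    return sum;  // the caller can observe this value
}
```

Printing the returned value from main, for example with std::cout, makes the result part of the program's observable behavior.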
How Does volatile Help?
Declaring a variable as volatile tells the compiler:
"This variable might change in ways you cannot predict, so do not optimize it away."
For example:
void heavyComputation() {
    volatile long long sum = 0; // Mark sum as volatile
    for (long long i = 0; i < 500000000; ++i) {
        sum += i;
    }
}
- Now, even if sum is never used, the compiler must perform the loop.
- The volatile keyword prevents the compiler from assuming that sum is unimportant.
- This ensures that the loop actually runs during profiling.
Does volatile Affect Performance?
Yes. How much depends on the code and the optimization level.
- Without volatile, the compiler can optimize the loop aggressively, for example by keeping sum in a register.
- With volatile, every read and write to sum is guaranteed to happen exactly as written, preventing some optimizations.
However, this cost is worth it for benchmarking, because it ensures that the loops are not removed.
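To see the effect on a particular machine, the two variants can be timed side by side. The sketch below is one way to do that with std::chrono; sumPlain, sumVolatile, and timeMs are hypothetical names introduced for this example, and the measured times depend on the compiler, the flags, and the hardware.

```cpp
#include <cassert>
#include <chrono>

// Plain accumulator: the compiler is free to keep sum in a register
// or simplify the loop.
long long sumPlain(long long n) {
    assert(n >= 0);
    long long sum = 0;
    for (long long i = 0; i < n; ++i) {
        sum += i;
    }
    return sum;
}

// Volatile accumulator: every iteration must read and write sum.
long long sumVolatile(long long n) {
    assert(n >= 0);
    volatile long long sum = 0;
    for (long long i = 0; i < n; ++i) {
        sum = sum + i;
    }
    return sum;
}

// Hypothetical helper: runs fn(n), stores its result in result, and
// returns the elapsed wall-clock time in milliseconds.
double timeMs(long long (*fn)(long long), long long n, long long& result) {
    const auto start = std::chrono::steady_clock::now();
    result = fn(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeMs(sumPlain, n, r) and timeMs(sumVolatile, n, r) with the same n from main and printing both times gives a rough comparison. Both functions return the same sum; only the time taken differs.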