Jidong Xiao ad71d0519f re-organize
2025-02-26 12:16:53 -05:00


C++ Profiling with gprof

What is gprof?

gprof is the GNU profiler: it reports where a program spends its execution time, including per-function call counts and timings. It is distributed as part of GNU binutils.

Installing gprof

$ sudo apt-get install binutils

Compiling a C++ Program for Profiling

To use gprof, compile your program with the -pg flag:

$ g++ -pg -o test test.cpp

Example C++ Program

Create a file test.cpp with the following code:

#include <iostream>

void heavyComputation() {
    volatile long long sum = 0;
    for (long long i = 0; i < 500000000; ++i) {
        sum += i;  // Simple but expensive loop
    }
}

void lightComputation() {
    volatile int sum = 0;
    for (int i = 0; i < 100000; ++i) {
        sum += i;  // Lighter loop
    }
}

int main() {
    heavyComputation();  // Call heavy function once
    for (int i = 0; i < 1000; ++i) {
        lightComputation();  // Call light function many times
    }
    return 0;
}

Running and Profiling the Program

  1. Compile the program:
    $ g++ -pg -o test test.cpp
    
  2. Execute the program to generate gmon.out:
    $ ./test
    
  3. Analyze the profiling data:
    $ gprof test gmon.out > profile.txt
    $ cat profile.txt
    

Understanding the Output (in the profile.txt file)

  • Flat Profile: Shows execution time spent in each function.
  • Call Graph: Displays function call relationships and their execution time.

The flat profile also reports the percentage of total time spent in each function, which helps identify bottlenecks. For the test program above, the flat profile looks like this:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 85.71      0.24     0.24        1   240.00   240.00  heavyComputation()
 14.29      0.28     0.04     1000     0.04     0.04  lightComputation()
  0.00      0.28     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)

As can be seen, the profiling results show that heavyComputation() takes significantly more execution time than lightComputation(), even though lightComputation() is called 1000 times.

Note: The __static_initialization_and_destruction_0(int, int) function is generated by the compiler during the static initialization and destruction phases of a program, particularly for global or static variables.

Best Practices for Using gprof

  • Profile with the optimization level you actually ship (e.g., -O2); be aware that inlining at any optimization level can fold functions into their callers and distort gprof's per-function attribution (GCC's -fno-inline disables it).
  • Profile with realistic input data to get meaningful results.
  • Optimize the slowest functions first based on the profiling report.

Conclusion

gprof is a powerful tool for detecting performance bottlenecks in C++ programs. By identifying expensive functions, developers can make targeted optimizations.

Why Use volatile in the Above Program?

Compiler May Remove the Loops

When compiling a program, the compiler applies optimizations to make the code run faster. One such optimization is dead code elimination, where the compiler removes code that does not affect the program's observable behavior.

For example, consider this function:

void heavyComputation() {
    long long sum = 0;
    for (long long i = 0; i < 500000000; ++i) {
        sum += i;
    }
}
  • The compiler notices that sum is never used outside the function.
  • Since the result is discarded, the compiler may completely remove the loop.
  • This means heavyComputation() might do nothing at runtime, which ruins our profiling experiment.

How Does volatile Help?

Declaring a variable as volatile tells the compiler:

"This variable might change in ways you cannot predict, so do not optimize it away."

For example:

void heavyComputation() {
    volatile long long sum = 0;  // Mark sum as volatile
    for (long long i = 0; i < 500000000; ++i) {
        sum += i;
    }
}
  • Now, even if sum is never used, the compiler must perform the loop.
  • The volatile keyword prevents the compiler from assuming that sum is unimportant.
  • This ensures that the loop actually runs during profiling.

Does volatile Affect Performance?

Yes, but only slightly.

  • Without volatile, the compiler can optimize the loop aggressively.
  • With volatile, every read and write to sum is guaranteed to happen exactly as written, preventing some optimizations.

However, this small cost is worth it for benchmarking, because it ensures that the loops are not removed.