diff options
Diffstat (limited to 'binutils-1.9/gprof.texinfo')
| -rw-r--r-- | binutils-1.9/gprof.texinfo | 943 |
1 files changed, 943 insertions, 0 deletions
diff --git a/binutils-1.9/gprof.texinfo b/binutils-1.9/gprof.texinfo new file mode 100644 index 0000000..a4161ed --- /dev/null +++ b/binutils-1.9/gprof.texinfo @@ -0,0 +1,943 @@ +\input texinfo @c -*-texinfo-*- + +@setfilename gprof.info +@settitle gprof +@setchapternewpage odd +@ifinfo +This file documents the gprof profiler of the GNU system. + +Copyright (C) 1988 Free Software Foundation, Inc. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +@ignore +Permission is granted to process this file through Tex and print the +results, provided the printed document carries copying permission +notice identical to this one except for the removal of this paragraph +(this paragraph not being relevant to the printed manual). + +@end ignore +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided that the entire +resulting derived work is distributed under the terms of a permission +notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the above conditions for modified versions. +@end ifinfo + +@iftex +@finalout +@end iftex + +@titlepage +@sp 11 +@center @titlefont{gprof} +@sp 1 +@center The GNU Profiler +@sp 2 +@center Jay Fenlason and Richard Stallman +@sp 1 +@center @today +@sp 3 +This manual describes the GNU profiler, @code{gprof}, and how you can use +it to determine which parts of a program are taking most of the execution +time. We assume that you know how to write, compile, and execute programs. +GNU @code{gprof} was written by Jay Fenlason. +@page +@vskip 0pt plus 1filll +Copyright @copyright{} 1988 Free Software Foundation, Inc. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +@ignore +Permission is granted to process this file through Tex and print the +results, provided the printed document carries copying permission +notice identical to this one except for the removal of this paragraph +(this paragraph not being relevant to the printed manual). + +@end ignore +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided that the entire +resulting derived work is distributed under the terms of a permission +notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the same conditions as for modified versions. + +@end titlepage + +@ifinfo +@node Top, Why, Top, (dir) +@ichapter Profiling a Program: Where Does It Spend Its Time? + +This manual describes the GNU profiler @code{gprof}, and how you can use it +to determine which parts of a program are taking most of the execution +time. We assume that you know how to write, compile, and execute programs. +GNU @code{gprof} was written by Jay Fenlason. + +@menu +* Why:: What profiling means, and why it is useful. +* Compiling:: How to compile your program for profiling. +* Executing:: How to execute your program to generate the + profile data file @file{gmon.out}. +* Analyzing:: How to run @code{gprof}, and how to specify + options for it. + +* Flat Profile:: The flat profile shows how much time was spent + executing directly in each function. +* Call Graph:: The call graph shows which functions called which + others, and how much time each function used + when its subroutine calls are included. + +* Implementation:: How the profile data is recorded and written. +* Sampling Error:: Statistical margins of error. + How to accumulate data from several runs + to make it more accurate. + +* Assumptions:: Some of @code{gprof}'s measurements are based + on assumptions about your program + that could be very wrong. + +* Incompatibilities:: (between GNU @code{gprof} and Unix @code{gprof}.) +@end menu +@end ifinfo + +@node Why, Compiling, Top, Top +@chapter Why Profile + +Profiling allows you to learn where your program spent its time and which +functions called which other functions while it was executing. This +information can show you which pieces of your program are slower than you +expected, and might be candidates for rewriting to make your program +execute faster. It can also tell you which functions are being called more +or less often than you expected. This may help you spot bugs that had +otherwise been unnoticed. + +Since the profiler uses information collected during the actual execution +of your program, it can be used on programs that are too large or too +complex to analyze by reading the source. However, how your program is run +will affect the information that shows up in the profile data. If you +don't use some feature of your program while it is being profiled, no +profile information will be generated for that feature. + +Profiling has several steps: + +@itemize @bullet +@item +You must compile and link your program with profiling enabled. +@xref{Compiling}. + +@item +You must execute your program to generate a profile data file. +@xref{Executing}. + +@item +You must run @code{gprof} to analyze the profile data. +@xref{Analyzing}. +@end itemize + +The next three chapters explain these steps in greater detail. + +The result of the analysis is a file containing two tables, the +@dfn{flat profile} and the @dfn{call graph} (plus blurbs which briefly +explain the contents of these tables). + +The flat profile shows how much time your program spent in each function, +and how many times that function was called. If you simply want to know +which functions burn most of the cycles, it is stated concisely here. +@xref{Flat Profile}. + +The call graph shows, for each function, which functions called it, which +other functions it called, and how many times. There is also an estimate +of how much time was spent in the subroutines of each function. This can +suggest places where you might try to eliminate function calls that use a +lot of time. @xref{Call Graph}. + +@node Compiling, Executing, Why, Top +@chapter Compiling a Program for Profiling + +The first step in generating profile information for your program is +to compile and link it with profiling enabled. + +To compile a source file for profiling, specify the @samp{-pg} option when +you run the compiler. (This is in addition to the options you normally +use.) + +To link the program for profiling, if you use a compiler such as @code{cc} +to do the linking, simply specify @samp{-pg} in addition to your usual +options. The same option, @samp{-pg}, alters either compilation or linking +to do what is necessary for profiling. Here are examples: + +@example +cc -g -c myprog.c utils.c -pg +cc -o myprog myprog.o utils.o -pg +@end example + +The @samp{-pg} option also works with a command that both compiles and links: + +@example +cc -o myprog myprog.c utils.c -g -pg +@end example + +If you run the linker @code{ld} directly instead of through a compiler such +as @code{cc}, you must specify the profiling startup file +@file{/lib/gcrt0.o} as the first input file instead of the usual startup +file @file{/lib/crt0.o}. In addition, you would probably want to specify +the profiling C library, @file{/usr/lib/libc_p.a}, by writing @samp{-lc_p} +instead of the usual @samp{-lc}. This is not absolutely necessary, but doing +this gives you number-of-calls information for standard library functions such +as @code{read} and @code{open}. For example: + +@example +ld -o myprog /lib/gcrt0.o myprog.o utils.o -lc_p +@end example + +If you compile only some of the modules of the program with @samp{-pg}, you +can still profile the program, but you won't get complete information about +the modules that were compiled without @samp{-pg}. The only information +you get for the functions in those modules is the total time spent in them; +there is no record of how many times they were called, or from where. This +will not affect the flat profile (except that the @code{calls} field for +the functions will be blank), but will greatly reduce the usefulness of the +call graph. + +So far GNU @code{gprof} has been tested only with C programs, but it ought +to work with any language in which programs are compiled and linked to form +executable files. If it does not, please let us know. + +@node Executing, Analyzing, Compiling, Top +@chapter Executing the Program to Generate Profile Data + +Once the program is compiled for profiling, you must run it in order to +generate the information that @code{gprof} needs. Simply run the program +as usual, using the normal arguments, file names, etc. The program should +run normally, producing the same output as usual. It will, however, run +somewhat slower than normal because of the time spent collecting and the +writing the profile data. + +The way you run the program---the arguments and input that you give +it---may have a dramatic effect on what the profile information shows. The +profile data will describe the parts of the program that were activated for +the particular input you use. For example, if the first command you give +to your program is to quit, the profile data will show the time used in +initialization and in cleanup, but not much else. + +You program will write the profile data into a file called @file{gmon.out} +just before exiting. If there is already a file called @file{gmon.out}, +its contents are overwritten. There is currently no way to tell the +program to write the profile data under a different name, but you can rename +the file afterward if you are concerned that it may be overwritten. + +In order to write the @file{gmon.out} file properly, your program must exit +normally: by returning from @code{main} or by calling @code{exit}. Calling +the low-level function @code{_exit} does not write the profile data, and +neither does abnormal termination due to an unhandled signal. + +The @file{gmon.out} file is written in the program's @emph{current working +directory} at the time it exits. This means that if your program calls +@code{chdir}, the @file{gmon.out} file will be left in the last directory +your program @code{chdir}'d to. If you don't have permission to write in +this directory, the file is not written. You may get a confusing error +message if this happens. (We have not yet replaced the part of Unix +responsible for this; when we do, we will make the error message +comprehensible.) + +@node Analyzing, Flat Profile, Executing, Top +@chapter Analyzing the Profile Data: @code{gprof} Command Summary + +After you have a profile data file @file{gmon.out}, you can run @code{gprof} +to interpret the information in it. The @code{gprof} program prints a +flat profile and a call graph on standard output. Typically you would +redirect the output of @code{gprof} into a file with @samp{>}. + +You run @code{gprof} like this: + +@example +@code{gprof} @var{options} [@var{executable-file} [@var{profile-data-files}@dots{}]] [> @var{outfile}] +@end example + +@noindent +Here square-brackets indicate optional arguments. + +If you omit the executable file name, the file @file{a.out} is used. If +you give no profile data file name, the file @file{gmon.out} is used. If +any file is not in the proper format, or if the profile data file does not +appear to belong to the executable file, an error message is printed. + +You can give more than one profile data file by entering all their names +after the executable file name; then the statistics in all the data files +are summed together. + +The following options may be used to selectively include or exclude +functions in the output: + +@table @code +@item -a +The @code{-a} option causes @code{gprof} to ignore static (private) +functions. (These are functions whose names are not listed as global, +and which are not visible outside the file/function/block where they +were defined.) Time spent in these functions, calls to/from them, +etc, will all be attributed to the function that was loaded directly +before it in the executable file. This is compatible with Unix +@code{gprof}, but a bad idea. This option affects both the flat +profile and the call graph. + +@item -e @var{function_name} +The @code{-e @var{function}} option tells @code{gprof} to not print +information about the function (and its children@dots{}) in the call +graph. The function will still be listed as a child of any functions +that call it, but its index number will be shown as @samp{[not +printed]}. + +@item -E @var{function_name} +The @code{-E @var{function}} option works like the @code{-e} option, +but time spent in the function (and children who were not called from +anywhere else), will not be used to compute the percentages-of-time +for the call graph. + +@item -f @var{function_name} +The @code{-f @var{function}} option causes @code{gprof} to limit the +call graph to the function and its children (and their +children@dots{}). + +@item -F @var{function_name} +The @code{-F @var{function}} option works like the @code{-f} option, +but only time spent in the function and its children (and their +children@dots{}) will be used to determine total-time and +percentages-of-time for the call graph. + +@item -z +If you give the @code{-z} option, @code{gprof} will mention all +functions in the flat profile, even those that were never called, and +that had no time spent in them. +@end table + +The order of these options does not matter. + +Note that only one function can be specified with each @code{-e}, +@code{-E}, @code{-f} or @code{-F} option. To specify more than one +function, use multiple options. For example, this command: + +@example +gprof -e boring -f foo -f bar myprogram > gprof.output +@end example + +@noindent +lists in the call graph all functions that were reached from either +@code{foo} or @code{bar} and were not reachable from @code{boring}. + +There are two other useful @code{gprof} options: + +@table @code +@item -b +If the @code{-b} option is given, @code{gprof} doesn't print the +verbose blurbs that try to explain the meaning of all of the fields in +the tables. This is useful if you intend to print out the output, or +are tired of seeing the blurbs. + +@item -s +The @code{-s} option causes @code{gprof} to summarize the information +in the profile data files it read in, and write out a profile data +file called @file{gmon.sum}, which contains all the information from +the profile data files that @code{gprof} read in. The file @file{gmon.sum} +may be one of the specified input files; the effect of this is to +merge the data in the other input files into @file{gmon.sum}. +@xref{Sampling Error}. + +Eventually you can run @code{gprof} again without @samp{-s} to analyze the +cumulative data in the file @file{gmon.sum}. +@end table + +@node Flat Profile, Call Graph, Analyzing, Top +@chapter How to Understand the Flat Profile +@cindex flat profile + +The @dfn{flat profile} shows the total amount of time your program +spent executing each function. Unless the @samp{-z} option is given, +functions with no apparent time spent in them, and no apparent calls +to them, are not mentioned. Note that if a function was not compiled +for profiling, and didn't run long enough to show up on the program +counter histogram, it will be indistinguishable from a function that +was never called. +@c ??? + +Here is a sample flat profile for a small program: + +@example +Each sample counts as 0.01 seconds. + +% time seconds cumsec calls function + 79.17 0.19 0.19 6 a + 16.67 0.04 0.23 1 main + 4.17 0.01 0.24 mcount + 0.00 0 0.24 1 profil +@end example + +@noindent +The functions are sorted by decreasing run-time spent in them. The +functions @code{mcount} and @code{profil} are part of the profiling +aparatus and appear in every flat profile; their time gives a measure of +the amount of overhead due to profiling. (These internal functions are +omitted from the call graph.) + +The sampling period estimates the margin of error in each of the time +figures. A time figure that is not much larger than this is not reliable. +In this example, the @code{seconds} field for @code{mcount} might well be 0 +or 0.02 in another run. @xref{Sampling Error}, for a complete discussion. + +Here is what the fields in each line mean: + +@table @code +@item % time +This is the percentage of the total execution time your program spent +in this function. These should all add up to 100%. + +@item seconds +This is the total number of seconds the computer spent executing the +user code of this function. + +@item cumsec +This is the cumulative total number of seconds the computer spent +executing this functions, plus the time spent in all the functions +above this one in this table. + +@item calls +This is the total number of times the function was called. If the +function was never called, or the number of times it was called cannot +be determined (probably because the function was not compiled with +profiling enabled), the @dfn{calls} field is blank. + +@item function +This is the name of the function. +@end table + +@node Call Graph, Implementation, Flat Profile, Top +@chapter How to Read the Call Graph + +@cindex call graph +The @dfn{call graph} shows how much time was spent in each function +and its children. From this information, you can find functions that, +while they themselves may not have used much time, called other +functions that did use unusual amounts of time. + +Here is a sample call from a small program. This call came from the +same @code{gprof} run as the flat profile example in the previous +chapter. + +@example +index % time self children called name + <spontaneous> +[1] 100.00 0 0.23 0 start [1] + 0.04 0.19 1/1 main [2] +---------------------------------------- + 0.04 0.19 1/1 start [1] +[2] 100.00 0.04 0.19 1 main [2] + 0.19 0 1/1 a [3] +---------------------------------------- + 0.19 0 1/1 main [2] +[3] 82.61 0.19 0 1+5 a [3] +---------------------------------------- +@end example + +The lines full of dashes divide this table into @dfn{entries}, one for each +function. Each entry has one or more lines. + +In each entry, the primary line is the one that starts with an index number +in square brackets. The end of this line says which function the entry is +for. The preceding lines in the entry describe the callers of this +function and the following lines describe its subroutines (also called +@dfn{children} when we speak of the call graph). + +The entries are sorted by time spent in the function and its subroutines. + +The internal profiling functions @code{mcount} and @code{profil} +(@pxref{Flat Profile}) are never mentioned in the call graph. + +@menu +* Primary:: Details of the primary line's contents. +* Callers:: Details of caller-lines' contents. +* Subroutines:: Details of subroutine-lines' contents. +* Cycles:: When there are cycles of recursion, + such as @code{a} calls @code{b} calls @code{a}@dots{} +@end menu + +@node Primary, Callers, Call Graph, Call Graph +@section The Primary Line + +The @dfn{primary line} in a call graph entry is the line that +describes the function which the entry is about and gives the overall +statistics for this function. + +For reference, we repeat the primary line from the entry for function +@code{a} in our main example, together with the heading line that shows the +names of the fields: + +@example +index % time self children called name +@dots{} +[3] 82.61 0.19 0 1+5 a [3] +@end example + +Here is what the fields in the primary line mean: + +@table @code +@item index +Entries are numbered with consecutive integers. Each function +therefore has an index number, which appears at the beginning of its +primary line. + +Each cross-reference to a function, as a caller or subroutine of +another, gives its index number as well as its name. The index number +guides you if you wish to look for the entry for that function. + +@item % time +This is the percentage of the total time that was spent in this +function, including time spent in subroutines called from this +function. + +The time spent in this function is counted again for the callers of +this function. Therefore, adding up these percentages is meaningless. + +@item self +This is the total amount of time spent in this function. This +should be identical to the number printed in the @code{seconds} field +for this function in the flat profile. + +@item children +This is the total amount of time spent in the subroutine calls made by +this function. This should be equal to the sum of all the @code{self} +and @code{children} entries of the children listed directly below this +function. + +@item called +This is the number of times the function was called. + +If the function called itself recursively, there are two numbers, +separated by a @samp{+}. The first number counts non-recursive calls, +and the second counts recursive calls. + +In the example above, the function @code{a} called itself five times, +and was called once from @code{main}. + +@item name +This is the name of the current function. The index number is +repeated after it. + +If the function is part of a cycle of recursion, the cycle number is +printed between the function's name and the index number +(@pxref{Cycles}). For example, if function @code{gnurr} is part of +cycle number one, and has index number twelve, its primary line would +be end like this: + +@example +gnurr <cycle 1> [12] +@end example +@end table + +@node Callers, Subroutines, Primary, Call Graph +@section Lines for a Function's Callers + +A function's entry has a line for each function it was called by. +These lines' fields correspond to the fields of the primary line, but +their meanings are different because of the difference in context. + +For reference, we repeat two lines from the entry for the function +@code{a}, the primary line and one caller-line preceding it, together +with the heading line that shows the names of the fields: + +@example +index % time self children called name +@dots{} + 0.19 0 1/1 main [2] +[3] 82.61 0.19 0 1+5 a [3] +@end example + +Here are the meanings of the fields in the caller-line for @code{a} +called from @code{main}: + +@table @code +@item self +An estimate of the amount of time spent in @code{a} itself when it was +called from @code{main}. + +@item children +An estimate of the amount of time spent in @code{a}'s subroutines when +@code{a} was called from @code{main}. + +The sum of the @code{self} and @code{children} fields is an estimate +of the amount of time spent within calls to @code{a} from @code{main}. + +@item called +Two numbers: the number of times @code{a} was called from @code{main}, +followed by the total number of nonrecursive calls to @code{a} from +all its callers. + +@item name and index number +The name of the caller of @code{a} to which this line applies, +followed by the caller's index number. + +Not all functions have entries in the call graph; some +options to @code{gprof} request the omission of certain functions. +When a caller has no entry of its own, it still has caller-lines +in the entries of the functions it calls. Since this caller +has no index number, the string @samp{[not printed]} is used +instead of one. + +If the caller is part of a recursion cycle, the cycle number is +printed between the name and the index number. +@end table + +If the identity of the callers of a function cannot be determined, a +dummy caller-line is printed which has @samp{<spontaneous>} as the +``caller's name'' and all other fields blank. This can happen for +signal handlers. +@c What if some calls have determinable callers' names but not all? + +@node Subroutines, Cycles, Callers, Call Graph +@section Lines for a Function's Subroutines + +A function's entry has a line for each of its subroutines---in other +words, a line for each other function that it called. These lines' +fields correspond to the fields of the primary line, but their meanings +are different because of the difference in context. + +For reference, we repeat two lines from the entry for the function +@code{main}, the primary line and a line for a subroutine, together +with the heading line that shows the names of the fields: + +@example +index % time self children called name +@dots{} +[2] 100.00 0.04 0.19 1 main [2] + 0.19 0 1/1 a [3] +@end example + +Here are the meanings of the fields in the subroutine-line for @code{main} +calling @code{a}: + +@table @code +@item self +An estimate of the amount of time spent directly within @code{a} +when @code{a} was called from @code{main}. + +@item children +An estimate of the amount of time spent in subroutines of @code{a} +when @code{a} was called from @code{main}. + +The sum of the @code{self} and @code{children} fields is an estimate +of the total time spent in calls to @code{a} from @code{main}. + +@item called +Two numbers, the number of calls to @code{a} from @code{main} +followed by the total number of nonrecursive calls to @code{a}. + +@item name +The name of the subroutine of @code{a} to which this line applies, +followed by the subroutine's index number. If the subroutine is +a function omitted from the call graph, it has no index number, +so @samp{[not printed]} appears instead. + +If the caller is part of a recursion cycle, the cycle number is +printed between the name and the index number. +@end table + +@node Cycles,, Subroutines, Call Graph +@section How Mutually Recursive Functions Are Described +@cindex cycle +@cindex recursion cycle + +The graph may be complicated by the presence of @dfn{cycles of +recursion} in the call graph. A cycle exists if a function calls +another function that (directly or indirectly) calls (or appears to +call) the original function. For example: if @code{a} calls @code{b}, +and @code{b} calls @code{a}, then @code{a} and @code{b} form a cycle. + +Whenever there are call-paths both ways between a pair of functions, they +belong to the same cycle. If @code{a} and @code{b} call each other and +@code{b} and @code{c} call each other, all three make one cycle. Note that +even if @code{b} only calls @code{a} if it was not called from @code{a}, +@code{gprof} cannot determine this, so @code{a} and @code{b} are still +considered a cycle. + +The cycles are numbered with consecutive integers. When a function +belongs to a cycle, each time the function name appears in the call graph +it is followed by @samp{<cycle @var{number}>}. + +The reason cycles matter is that they make the time values in the call +graph paradoxical. The ``time spent in children'' of @code{a} should +include the time spent in its subroutine @code{b} and in @code{b}'s +subroutines---but one of @code{b}'s subroutines is @code{a}! How much of +@code{a}'s time should be included in the children of @code{a}, when +@code{a} is indirectly recursive? + +The way @code{gprof} resolves this paradox is by creating a single entry +for the cycle as a whole. The primary line of this entry describes the +total time spent directly in the functions of the cycle. The +``subroutines'' of the cycle are the individual functions of the cycle, and +all other functions that were called directly by them. The ``callers'' of +the cycle are the functions, outside the cycle, that called functions in +the cycle. + +Here is a portion of the call graph which shows a cycle containing +functions @code{a} and @code{b}. The cycle was entered by a call to +@code{a} from @code{main}; both @code{a} and @code{b} called @code{c}.@refill + +@example +index % time self children called name +---------------------------------------- + 1.77 0 1/1 main [2] +[3] 91.71 1.77 0 1+5 <cycle 1 as a whole> [3] + 1.02 0 3 b <cycle 1> [4] + 0.75 0 2 a <cycle 1> [5] +---------------------------------------- + 3 a <cycle 1> [5] +[4] 52.85 1.02 0 0 b <cycle 1> [4] + 2 a <cycle 1> [5] + 0 0 3/6 c [6] +---------------------------------------- + 1.77 0 1/1 main [2] + 2 b <cycle 1> [4] +[5] 38.86 0.75 0 1 a <cycle 1> [5] + 3 b <cycle 1> [4] + 0 0 3/6 c [6] +---------------------------------------- +@end example + +@noindent +(The entire call graph for this program contains in addition an entry for +@code{main}, which calls @code{a}, and an entry for @code{c}, with callers +@code{a} and @code{b}.) + +@example +index % time self children called name + <spontaneous> +[1] 100.00 0 1.93 0 start [1] + 0.16 1.77 1/1 main [2] +---------------------------------------- + 0.16 1.77 1/1 start [1] +[2] 100.00 0.16 1.77 1 main [2] + 1.77 0 1/1 a <cycle 1> [5] +---------------------------------------- + 1.77 0 1/1 main [2] +[3] 91.71 1.77 0 1+5 <cycle 1 as a whole> [3] + 1.02 0 3 b <cycle 1> [4] + 0.75 0 2 a <cycle 1> [5] + 0 0 6/6 c [6] +---------------------------------------- + 3 a <cycle 1> [5] +[4] 52.85 1.02 0 0 b <cycle 1> [4] + 2 a <cycle 1> [5] + 0 0 3/6 c [6] +---------------------------------------- + 1.77 0 1/1 main [2] + 2 b <cycle 1> [4] +[5] 38.86 0.75 0 1 a <cycle 1> [5] + 3 b <cycle 1> [4] + 0 0 3/6 c [6] +---------------------------------------- + 0 0 3/6 b <cycle 1> [4] + 0 0 3/6 a <cycle 1> [5] +[6] 0.00 0 0 6 c [6] +---------------------------------------- +@end example + +The @code{self} field of the cycle's primary line is the total time +spent in all the functions of the cycle. It equals the sum of the +@code{self} fields for the individual functions in the cycle, found +in the entry in the subroutine lines for these functions. + +The @code{children} fields of the cycle's primary line and subroutine lines +count only subroutines outside the cycle. Even though @code{a} calls +@code{b}, the time spent in those calls to @code{b} is not counted in +@code{a}'s @code{children} time. Thus, we do not encounter the problem of +what to do when the time in those calls to @code{b} includes indirect +recursive calls back to @code{a}. + +The @code{children} field of a caller-line in the cycle's entry estimates +the amount of time spent @emph{in the whole cycle}, and its other +subroutines, on the times when that caller called a function in the cycle. + +The @code{calls} field in the primary line for the cycle has two numbers: +first, the number of times functions in the cycle were called by functions +outside the cycle; second, the number of times they were called by +functions in the cycle (including times when a function in the cycle calls +itself). This is a generalization of the usual split into nonrecursive and +recursive calls. + +The @code{calls} field of a subroutine-line for a cycle member in the +cycle's entry says how many time that function was called from functions in +the cycle. The total of all these is the second number in the primary line's +@code{calls} field. + +In the individual entry for a function in a cycle, the other functions in +the same cycle can appear as subroutines and as callers. These lines show +how many times each function in the cycle called or was called from each other +function in the cycle. The @code{self} and @code{children} fields in these +lines are blank because of the difficulty of defining meanings for them +when recursion is going on. + +@node Implementation, Sampling Error, Call Graph, Top +@chapter Implementation of Profiling + +Profiling works by changing how every function in your program is compiled +so that when it is called, it will stash away some information about where +it was called from. From this, the profiler can figure out what function +called it, and can count how many times it was called. This change is made +by the compiler when your program is compiled with the @samp{-pg} option. + +Profiling also involves watching your program as it runs, and keeping a +histogram of where the program counter happens to be every now and then. +Typically the program counter is looked at around 100 times per second of +run time, but the exact frequency may vary from system to system. + +A special startup routine allocates memory for the histogram and sets up a +clock signal handler to make entries in it. Use of this special startup +routine is one of the effects of using @samp{cc -pg} to link. The startup +file also includes an @code{exit} function which is responsible for writing +the file @file{gmon.out}. + +Number-of-calls information for library routines is collected by using a +special version of the C library. The programs in it are the same as in +the usual C library, but they were compiled with @samp{-pg}. If you link +your program with @samp{cc -pg}, it automatically uses the profiling +version of the library. + +The output from @code{gprof} gives no indication of parts of your program that +are limited by I/O or swapping bandwidth. This is because samples of the +program counter are taken at fixed intervals of run time. Therefore, the +time measurements in @code{gprof} output say nothing about time that your +program was not running. For example, a part of the program that creates +so much data that it cannot all fit in physical memory at once may run very +slowly due to thrashing, but @code{gprof} will say it uses little time. On +the other hand, sampling by run time has the advantage that the amount of +load due to other users won't directly affect the output you get. + +@node Sampling Error, Assumptions, Implementation, Top +@chapter Statistical Inaccuracy of @code{gprof} Output + +The run-time figures that @code{gprof} gives you are based on a sampling +process, so they are subject to statistical inaccuracy. If a function runs +only a small amount of time, so that on the average the sampling process +ought to catch that function in the act only once, there is a pretty good +chance it will actually find that function zero times, or twice. + +By contrast, the number-of-calls figures are derived by counting, not +sampling. They are completely accurate and will not vary from run to run +if your program is deterministic. + +The @dfn{sampling period} that is printed at the beginning of the flat +profile says how often samples are taken. The rule of thumb is that a +run-time figure is accurate if it is considerably bigger than the sampling +period. + +The actual amount of error is usually more than one sampling period. In +fact, if a value is @var{n} times the sampling period, the @emph{expected} +error in it is the square-root of @var{n} sampling periods. If the +sampling period is 0.01 seconds and @code{foo}'s run-time is 1 second, the +expected error in @code{foo}'s run-time is 0.1 seconds. It is likely to +vary this much @emph{on the average} from one profiling run to the next. +(@emph{Sometimes} it will vary more.) + +This does not mean that a small run-time figure is devoid of information. +If the program's @emph{total} run-time is large, a small run-time for one +function does tell you that that function used an insignificant fraction of +the whole program's time. Usually this means it is not worth optimizing. + +One way to get more accuracy is to give your program more (but similar) +input data so it will take longer. Another way is to combine the data from +several runs, using the @samp{-s} option of @code{gprof}. Here is how: + +@enumerate +@item +Run your program once. + +@item +Issue the command @samp{mv gmon.out gmon.sum}. + +@item +Run your program again, the same as before. + +@item +Merge the new data in @file{gmon.out} into @file{gmon.sum} with this command: + +@example +gprof -s @var{executable-file} gmon.out gmon.sum +@end example + +@item +Repeat the last two steps as often as you wish. + +@item +Analyze the cumulative data using this command: + +@example +gprof @var{executable-file} gmon.sum > @var{output-file} +@end example +@end enumerate + +@node Assumptions, Incompatibilities, Sampling Error, Top +@chapter Estimating @code{children} Times Uses an Assumption + +Some of the figures in the call graph are estimates---for example, the +@code{children} time values and all the the time figures in caller and +subroutine lines. + +There is no direct information about these measurements in the profile +data itself. Instead, @code{gprof} estimates them by making an assumption +about your program that might or might not be true. + +The assumption made is that the average time spent in each call to any +function @code{foo} is not correlated with who called @code{foo}. If +@code{foo} used 5 seconds in all, and 2/5 of the calls to @code{foo} came +from @code{a}, then @code{foo} contributes 2 seconds to @code{a}'s +@code{children} time, by assumption. + +This assumption is usually true enough, but for some programs it is far +from true. Suppose that @code{foo} returns very quickly when its argument +is zero; suppose that @code{a} always passes zero as an argument, while +other callers of @code{foo} pass other arguments. In this program, all the +time spent in @code{foo} is in the calls from callers other than @code{a}. +But @code{gprof} has no way of knowing this; it will blindly and +incorrectly charge 2 seconds of time in @code{foo} to the children of +@code{a}. + +We hope some day to put more complete data into @file{gmon.out}, so that +this assumption is no longer needed, if we can figure out how. For the +nonce, the estimated figures are usually more useful than misleading. + +@node Incompatibilities, , Assumptions, Top +@chapter Incompatibilities with Unix @code{gprof} + +GNU @code{gprof} and Berkeley Unix @code{gprof} use the same data file +@file{gmon.out}, and provide essentially the same information. But there a +few differences.@refill + +GNU @code{gprof} does not support the @samp{-c} option which prints a +static call graph based on reading the machine language of your +program. We think that program cross-references ought to be based on +the source files, which can be analyzed in a machine-independent +fashion.@refill + +For a recursive function, Unix @code{gprof} lists the function as a parent +and as a child, with a @code{calls} field that lists the number of +recursive calls. GNU @code{gprof} omits these lines and puts the number of +recursive calls in the primary line. + +When a function is suppressed from the call graph with @samp{-e}, GNU +@code{gprof} still lists it as a subroutine of functions that call it. + +The function names printed in GNU @code{gprof} output do not include +the leading underscores that are added internally to the front of all +C identifiers on many operating systems. + +The blurbs, field widths, and output formats are different. GNU +@code{gprof} prints blurbs after the tables, so that you can see the +tables without skipping the blurbs. + +@contents +@bye |
