When the duration of a method call is measured, there are two different possibilities to measure it:
Most likely you'll be interested in the wall clock time, that is the duration between the entry and the exit of a method as measured with a clock. For the profiling agent this is a straightforward measurement. While it might seem at first glance that measuring times should not have any significant overhead, this is not so if you need a high resolution measurement. Operating systems offer different timers with different performance overheads.
For example, on Microsoft Windows, the standard timer with a granularity of 10 milliseconds is very fast, because the operating system "caches" the current time. However, the duration of method calls can be as low as a few nanoseconds, so a high resolution timer is needed. A high resolution timer works directly with a special hardware device and carries a noticeable performance overhead. In JProfiler, CPU recording is disabled by default. If you compare the duration of the startup sequence of an application server with and without CPU recording, you will notice the difference.
Wall clock time is measured separately for each thread. In CPU views where the thread selection
includes multiple threads, the displayed times can be larger than the total execution time of
the application. If you have 10 parallel threads of the same class MyThreadClass
whose run()
method takes 1 second and "All threads" is selected in the call tree,
the MyThreadClass.run()
node in the call tree will display 10 seconds, even though
only one second has passed.
Since the CPU might be handling many threads with different priorities, the wall clock time is not the time the CPU has actually spent in that method. The scheduler of the operating system can interrupt the execution of a method multiple times and perform other tasks. The real time that was spent in the method by the CPU is called the CPU time. In extreme cases, the CPU time and the wall clock time can differ by a large factor, especially if the executing thread has a low priority.
The standard time measurement in JProfiler is wall clock time. If you wish to see the CPU time in the CPU views, you can change the measurement type in the profiling settings. The problem with CPU time measurements is that most operating systems provide this information with the granularity of the standard timer - high resolution measurements would carry too much overhead. This means the CPU times are only statistically valid for methods that have a CPU time bigger than the typical granularity of 10 milliseconds.
The notion of time measurement must be refined further, since not all times are equally interesting. Imagine a server application with a pool of threads waiting to perform a task. Most of the time would then be spent in the method that keeps the threads waiting while the actual task will only get a small part of the overall time and will be hard to spot. The necessary refinement is done with the concept of thread status. There are 4 different thread statuses in JProfiler:
Object.wait()
and the current thread
will only become runnable again when some other thread calls Object.notify()
on the same object.
When looking for performance bottlenecks, you're mostly interested in the "Runnable" thread state although it's always a good idea to have a look at the "Net I/O" and "Blocking" thread states in order to check if the network or synchronization issues are reducing the performance of your application.
Nodes in the call tree (methods, classes, packages or Java EE components, depending on the selected aggregation level) are sorted by total time. This is the sum of all execution times of this node on the particular call path as given by the ancestor nodes. Only threads in the current thread selection are considered and only measurements with the currently selected thread status are shown.
Optionally, the call tree offers the possibility to show the inherent time of a node. The inherent time is defined as the total time of a method minus the time of its child nodes. Since child nodes can only be in unfiltered classes, calls into filtered classes go into the inherent time. If you change your method call recording filters, the inherent times in the call tree can change.
While the call tree view shows all call stacks in your application, the hot spots view shows the methods that take most of the time. Each method can potentially be called through many different call stacks, so the invocation counts in the call tree and the hot spots view do not have to match. The hot spots view shows the inherent time rather than the total time. In addition, the hot spots view offers the option to include calls to filtered classes into the inherent time. Please see the article on hot spots and filters for a thorough discussion of this topic.
When you open a hot spot node, you see a reverse call tree. However, the times that are displayed in those backtraces do not have the same meaning as those in the call tree, since they do not express a time measurement for the corresponding node. Rather, the time displayed at each node indicates how much time that particular call tree contributes to the hot spot. If there is only one backtrace, you will see the hot spot time at each node.
The times that are shown for nodes (methods, classes, packages or Java EE components, depending on the selected aggregation level) in the call graph are the same as those in the hot spots view. The times that are associated with the incoming arrows are the same as those in the first level of the hot spot backtrace, since they show all calling nodes and the cumulated duration of their calls. The time on the outgoing arrows is a measurement that cannot be found in the call tree. It shows the cumulated duration of calls from this node, while the call tree shows the cumulated duration of calls from the current call stack.