Behind The Scenes - How Profiling Actually Works



1. Introduction

Although you do not need to know the internals of profiling to profile your application successfully, this knowledge can help you interpret the data that JProfiler produces, set up application servers and remote applications for profiling with more confidence, and analyze problems with profiling in general. You might also just be curious to know what's going on under the hood.

2. Time, space and thread profilers

If you've been profiling C applications, you might know the distinction between time and space profilers. A "time profiler" measures the execution paths of your application on the method level whereas a "space profiler" gives you insight into the development of the heap, such as which methods allocate most memory. Recently, more and more applications are multi-threaded and thread profilers have been developed to analyze thread synchronization issues.

Most of these traditional profilers are "post-mortem" profilers where the profiling wrapper or profiling agent code writes out a data file when the profiled application exits. For an interactive profiler, it makes sense to compare and correlate data from all three domains, so JProfiler combines time, space and thread profilers in a single application.

3. How profilers collect data

A profiler must have some means to collect the data it displays. Profiling data can come from an interface in the execution environment or it can be generated by instrumenting the code of the application.

One of the most basic and common profilers, the Unix shell command time, acts as a wrapper to the profiled executable and retrieves post-mortem information about the process from the kernel. Profilers for native applications on Microsoft Windows can attach to running applications and use the available debug information to calculate their profiling data. These are examples of interfaces in the execution environment where the binary of your application is not modified by the profiler.

The gprof Unix profiler (part of Unix since 4.2BSD in 1983) can be hooked into the compilation process by specifying an additional argument to the compiler (-pg). In this way, profiling code is added to your application. When the application exits, a data file is written to disk that contains call trees and execution times to be viewed with the gprof application. gprof is an example of a profiler that instruments your application.

JProfiler takes a mixed approach. It uses the profiling interface of the JVM and instruments classes at load time for tasks where the profiling interface of the JVM doesn't provide any data or adequate performance.
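To make the idea of instrumentation more concrete, here is a minimal hand-written sketch of the kind of entry/exit bookkeeping that an instrumenting profiler adds around a method. The names CallRecorder, enter, exit and InstrumentedExample are hypothetical and are not taken from JProfiler. A real profiler would inject equivalent calls into the bytecode at load time rather than into the source, and would have to be thread-safe, which this sketch is not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping class (an assumption for illustration only).
// Not thread-safe: a real profiling agent needs per-thread data structures.
final class CallRecorder {
    private static final Map<String, Long> callCounts = new HashMap<>();
    private static final Map<String, Long> totalNanos = new HashMap<>();

    // Called at method entry: counts the call and returns the start time.
    static long enter(String method) {
        callCounts.merge(method, 1L, Long::sum);
        return System.nanoTime();
    }

    // Called at method exit: adds the elapsed time to the method's total.
    static void exit(String method, long startNanos) {
        totalNanos.merge(method, System.nanoTime() - startNanos, Long::sum);
    }

    static long calls(String method) {
        return callCounts.getOrDefault(method, 0L);
    }

    static long nanos(String method) {
        return totalNanos.getOrDefault(method, 0L);
    }
}

final class InstrumentedExample {
    // What a method might look like after instrumentation: the original
    // body is wrapped so that the exit hook also runs when it throws.
    static int work(int n) {
        long start = CallRecorder.enter("InstrumentedExample.work");
        try {
            int sum = 0;
            for (int i = 1; i <= n; i++) {
                sum += i;
            }
            return sum;
        } finally {
            CallRecorder.exit("InstrumentedExample.work", start);
        }
    }

    public static void main(String[] args) {
        work(100);
        work(100);
        System.out.println("calls: " + CallRecorder.calls("InstrumentedExample.work"));
        System.out.println("nanos: " + CallRecorder.nanos("InstrumentedExample.work"));
    }
}
```

The sketch also hints at why instrumentation has a cost: every call of an instrumented method pays for two extra calls and two map updates, which is why limiting the set of instrumented classes matters.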

4. The profiling interface of the JVM

The profiling interface of the JVM is intended for profiling agents that are written in C or C++. If you open the include directory in your JDK, you will see a number of files with the extension .h. Those are the header files that tell a C/C++ library about the interface that is offered by the JVM. The basis for all communication between a native library and the JVM is the Java Native Interface (JNI), defined in jni.h.

The JNI allows Java code to call methods in the native library and vice versa. From Java code, you can use the System.load() call to load a native library into the same memory space. When you call a method whose declaration contains the "native" modifier, such as public native String getName();, the JVM searches the loaded native libraries for a matching function. The required name pattern of the corresponding C function contains the package, the class and the method of the declaration in Java code. JNI also defines how Java data types are represented in a C/C++ library. When the native C function is called, it gets a "JNI environment" interface as an additional parameter. With this environment interface, it can call Java methods, convert between C and Java data types, and perform other JVM-specific operations such as creating Java threads and synchronizing on a Java monitor.
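The name pattern mentioned above can be sketched in a few lines of Java. The helper below builds the short JNI function name from a fully qualified class name and a method name, following the escaping rules of the JNI specification ("_" for the package separator, "_1" for an underscore, "_2" for ";", "_3" for "[", and "_0xxxx" for other characters). The class JniNames and the example class com.example.Person are assumptions made for illustration. The sketch ignores the longer name variant that the JVM uses for overloaded native methods.

```java
// Sketch of the JNI short-name convention for native functions.
final class JniNames {

    // Escapes one name component according to the JNI naming rules.
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean alphanumeric = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alphanumeric) {
                sb.append(c);
            } else if (c == '.' || c == '/') {
                sb.append('_');            // package separator
            } else if (c == '_') {
                sb.append("_1");           // literal underscore
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else {
                // Any other character: _0 followed by four lowercase hex digits.
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Builds the C function name that the JVM looks up for a native method.
    static String nativeFunctionName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        // com.example.Person is a hypothetical class declaring
        // "public native String getName();".
        System.out.println(nativeFunctionName("com.example.Person", "getName"));
    }
}
```

For the hypothetical class above, the native library would have to export a C function named Java_com_example_Person_getName that takes the JNI environment pointer and the receiver object as its first two parameters.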

Until Java 1.5, Sun offered an ad-hoc profiling interface for tool vendors, the Java Virtual Machine Profiling Interface (JVMPI). The JVMPI was not standardized and its behavior varied considerably across different JVMs. In addition, the JVMPI was not able to run with modern garbage collectors and had problems when profiling very large heaps. With Java 1.5, the JVM Tool Interface (JVMTI) was added to the Java platform to overcome these problems. JProfiler supports both JVMPI and JVMTI. The interfaces are defined in jvmpi.h and jvmti.h. They utilize the JNI for communication with the JVM, but provide an additional interface to configure profiling options. JVMPI and JVMTI are event-based systems. The profiling agent library can register handler functions for different events. It can then enable or disable selected events.

Disabling events is important for reducing the overhead of the profiler. For example, in JProfiler, object allocation recording is switched off by default. When you switch on allocation recording in the GUI, the profiling agent tells the JVMPI/JVMTI interface that the event for object allocations should be enabled. If a lot of objects are created, this can produce a considerable overhead, both in the JVM itself as well as in the profiling agent that has to perform bookkeeping operations for each event. During the startup phase of an application server, a lot of objects are created that you're most likely not interested in. Consequently, it's a good idea to leave object allocation recording switched off during that time. It increases the performance of the profiled application and reduces clutter in the generated data. The same goes for the measurement of method calls, called "CPU profiling" in JProfiler.
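The register/enable/disable pattern can be modelled in plain Java. The following is only a conceptual sketch: EventBus, Kind and the method names are hypothetical, and the real interface is a C API (in JVMTI, handlers are registered with SetEventCallbacks and switched on or off with SetEventNotificationMode). It shows why a disabled event is cheap: the handler with its bookkeeping is never invoked.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Conceptual model of an event-based profiling interface (hypothetical names).
final class EventBus {
    enum Kind { OBJECT_ALLOC, METHOD_ENTRY }

    private final Map<Kind, Consumer<String>> handlers = new EnumMap<>(Kind.class);
    private final Set<Kind> enabled = EnumSet.noneOf(Kind.class);

    // The agent registers one handler per event kind, usually once at startup.
    void register(Kind kind, Consumer<String> handler) {
        handlers.put(kind, handler);
    }

    // Events are disabled by default and can be toggled at any time.
    void setEnabled(Kind kind, boolean on) {
        if (on) {
            enabled.add(kind);
        } else {
            enabled.remove(kind);
        }
    }

    // Called by the (simulated) runtime whenever something happens.
    void fire(Kind kind, String detail) {
        if (!enabled.contains(kind)) {
            return; // disabled: no handler call, no bookkeeping overhead
        }
        Consumer<String> handler = handlers.get(kind);
        if (handler != null) {
            handler.accept(detail);
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        bus.register(Kind.OBJECT_ALLOC, d -> System.out.println("allocated " + d));

        bus.fire(Kind.OBJECT_ALLOC, "java.lang.String");   // ignored, still disabled
        bus.setEnabled(Kind.OBJECT_ALLOC, true);           // "allocation recording on"
        bus.fire(Kind.OBJECT_ALLOC, "java.util.ArrayList"); // delivered to the handler
    }
}
```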

The JVMPI/JVMTI interface offers the following types of event:

Some information, such as the references between objects and the data in objects, is not available from the events that the JVMPI/JVMTI fires. To get exhaustive information on all objects on the heap, the profiling agent can trigger a "heap dump". This command is invoked when you take a snapshot in the heap walker.

The heap dump is performed differently for JVMPI and JVMTI. The JVMPI packs all the objects on the heap and the references between them into a single byte array and passes it to the profiling agent. That byte array is then parsed by the profiler and converted to an internal representation. Naturally, the memory requirements of this operation are huge: first, the heap is essentially duplicated in the byte array, then the profiling agent must parse it and translate it to data structures. To reduce the peak memory requirement, JProfiler saves the byte array to a temporary file on disk, releases the array and parses the contents of the temporary file. When profiling an application that maxes out the available physical memory, taking a heap dump can crash the JVM, simply because not enough physical memory is available to allocate the huge required regions of memory.

With JVMTI (Java >= 1.5), the situation is much improved: JProfiler can enumerate all existing references in the heap and build up its own data structures.

5. How the profiling agent is activated

Unlike a JNI library that you load and invoke from Java code, the profiling agent has to be activated at the very beginning of the JVM startup. This is achieved by adding the special JVM parameters

        -Xrunjprofiler

for Java <=1.4.2 (JVMPI) or

        -agentlib:jprofilerti

for Java >=1.5.0 (JVMTI) to the java command line. The -Xrun or -agentlib: part tells the JVM that a JVMPI/JVMTI profiling agent should be loaded, and the remaining characters of the parameter constitute the name of the native library. The canonical name of a native library depends on the platform. For a base name of jprofiler, the library name is jprofiler.dll on Microsoft Windows, libjprofiler.so on Linux and Unix, and libjprofiler.dylib on Mac OS X.
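If you want to check which file name your own JVM expects for a given base name, the standard method System.mapLibraryName performs this mapping. The small class below is just a convenience wrapper written for this illustration. The result depends on the platform and JVM version you run it on, so no output is shown here.

```java
// Prints the platform-specific file name for a native library base name.
final class LibraryNames {

    // Thin wrapper around the standard library-name mapping of the JVM.
    static String fileNameFor(String baseName) {
        return System.mapLibraryName(baseName);
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("jprofiler"));
        System.out.println(fileNameFor("jprofilerti"));
    }
}
```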

Parameters can be passed to the native profiling library by appending a colon for the JVMPI or an equal sign for the JVMTI to the profiling interface VM parameter and placing the parameter string behind it. If you pass -Xrunjprofiler:port=10000 or -agentlib:jprofilerti=port=10000 on the Java command line, the parameter port=10000 will be passed to the profiling agent.
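The way such a VM parameter breaks down into a library name and an option string can be sketched as follows. The class AgentArgument is hypothetical and only mirrors the rule described above (a colon after -Xrun names, the first equal sign after -agentlib: names). It is not how the JVM itself is implemented.

```java
// Splits a profiling VM parameter into library base name and option string.
final class AgentArgument {
    final String library;
    final String options; // empty string if no options were given

    private AgentArgument(String library, String options) {
        this.library = library;
        this.options = options;
    }

    static AgentArgument parse(String vmArg) {
        String rest;
        char separator;
        if (vmArg.startsWith("-agentlib:")) {
            rest = vmArg.substring("-agentlib:".length());
            separator = '=';   // JVMTI style
        } else if (vmArg.startsWith("-Xrun")) {
            rest = vmArg.substring("-Xrun".length());
            separator = ':';   // JVMPI style
        } else {
            throw new IllegalArgumentException("not an agent parameter: " + vmArg);
        }
        // Only the first separator counts; the options may contain more of them.
        int i = rest.indexOf(separator);
        return i < 0
                ? new AgentArgument(rest, "")
                : new AgentArgument(rest.substring(0, i), rest.substring(i + 1));
    }

    public static void main(String[] args) {
        AgentArgument a = parse("-agentlib:jprofilerti=port=10000");
        System.out.println(a.library + " / " + a.options);
    }
}
```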

If the JVM cannot load the specified native library, it quits with an error message. If it succeeds in loading the library, it calls a special function in the library to give the profiling agent a chance to initialize itself.

6. Profiling agent and profiling GUI

Unlike basic profilers that collect data and write out a data file to disk, advanced profilers can display the profiling data at runtime. Although it would be possible to start the GUI directly from the profiling agent, it would be a bad idea to do so, since the profiled process would be disturbed by the secondary application and remote profiling would not be possible. Because of this, the JProfiler GUI is started separately and runs in a separate JVM. The communication between the profiling agent and the GUI is via a TCP/IP network socket. This is also the case if you start applications in JProfiler that are configured as "local" sessions.

In order to profile successfully, it's important to choose the right profiling parameters, especially the filters that limit the extent of the recorded call tree. Since this information is required at startup, the profiling agent stops the JVM and waits for a connection from the GUI where these parameters are configured. Once the connection has been established, the profiled application is allowed to start up.
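The "wait for the GUI before starting" behavior can be illustrated with ordinary Java sockets. This is a simplified sketch under stated assumptions: the one-line text format, the setting filter=com.example.* and all class and method names are invented for the example, and the real agent is native code with its own protocol. The point is only that accept() blocks, so nothing after it runs until a client has connected and sent its settings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class StartupHandshake {

    // "Agent" side: blocks until a client connects, then reads one settings line.
    static String awaitSettings(ServerSocket server) throws IOException {
        try (Socket gui = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(gui.getInputStream(), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    // "GUI" side: connects to the agent and sends the settings line.
    static void sendSettings(int port, String settings) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(settings);
        }
    }

    // Runs both sides in one process on a free loopback port.
    static String roundTrip(String settings) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            int port = server.getLocalPort();
            Thread gui = new Thread(() -> {
                try {
                    sendSettings(port, settings);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            gui.start();
            String received = awaitSettings(server); // startup is held back here
            gui.join();
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String settings = roundTrip("filter=com.example.*");
        System.out.println("received '" + settings + "', application may start now");
    }
}
```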

The recorded profiling data resides in the internal data structures of the profiling agent. Only a small part of the recorded data is actually transferred to the GUI. For example, if you open the call tree or the back-traces in the hotspots views, only the next few levels are transferred from the agent to the GUI. If the entire call tree were transferred to the GUI, potentially large amounts of data would have to be transmitted through the socket. This would make the profiled process slower and remote profiling between different computers would not be feasible. In essence, you could say that the profiling agent keeps a database of the recorded profiling data while the GUI is a client that sends user-initiated queries to the database.
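The database analogy can be made concrete with a small sketch. The class below keeps a complete call tree on the "agent" side and answers queries that return only a limited number of levels below a given node, which is all that would need to travel over the socket. CallTreeStore, record and query are hypothetical names, and the indented-string result format is an assumption chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Agent-side store for a call tree that is queried level by level.
final class CallTreeStore {

    static final class Node {
        final String method;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String method) {
            this.method = method;
        }
    }

    private final Node root = new Node("<root>");

    // Records one observed call stack, outermost frame first.
    void record(String... stack) {
        Node node = root;
        for (String method : stack) {
            node = node.children.computeIfAbsent(method, Node::new);
        }
    }

    // Returns at most `depth` levels below the node addressed by `path`,
    // one line per node, indented by two spaces per level.
    List<String> query(List<String> path, int depth) {
        Node node = root;
        for (String method : path) {
            node = node.children.get(method);
            if (node == null) {
                return new ArrayList<>(); // unknown path: nothing to send
            }
        }
        List<String> result = new ArrayList<>();
        collect(node, depth, "", result);
        return result;
    }

    private static void collect(Node node, int depth, String indent, List<String> out) {
        if (depth == 0) {
            return; // deeper levels stay in the agent until requested
        }
        for (Node child : node.children.values()) {
            out.add(indent + child.method);
            collect(child, depth - 1, indent + "  ", out);
        }
    }
}
```

A GUI built on such a store would first ask for the top levels and send a further query with the path of a node only when the user expands it.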