Dripping Faucets and Memory Leaks
All Java developers will need to deal with memory leaks at some point in their careers. There are already numerous blog posts on the internet about developers' experiences; I figured I would add another, one that recounts the trials and tribulations of my latest ordeal with a leaky faucet.
We encountered a situation where a long-running application server infrastructure suddenly started to experience severe reliability issues. The application servers were crashing numerous times a day. The application server was Oracle's old OC4J J2EE container, but that is really irrelevant in this case; the application change involved would have caused issues on any application server platform.
Initial Pipe Tightening
This particular matter was made a little more convoluted by the fact that there were actually multiple applications running in the same container. One of them was a historically memory-hungry application, which sent our infrastructure team down an errant path. They were watching the process memory size and had concluded that we had reached a volume where the heap would eventually be exhausted and crash the container. They started brainstorming about how they could lower the memory requirements on the server, and they introduced two changes in an attempt to alleviate the situation: 1) they increased the max heap size by another 256MB, and 2) they reduced the session timeout from 20 minutes to 5 minutes. You can easily deduce the rationale behind the first change. The thought behind the second was that the environment we were dealing with was a service platform and no state was being maintained in the application session.

Well, these changes just made matters worse. The first change simply increased CPU consumption as the garbage collector (GC) started working a lot harder, and the session clean-up processing just added more load. It obviously missed the target, and the pipes were still leaking.
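For the record, the two tweaks amounted to something like the following. The heap values are illustrative only (I am not reproducing our real start script here), and the web.xml fragment is the standard servlet way to set the session timeout in minutes:

-Xmx1280m   (previously -Xmx1024m; example values only)

<session-config>
    <session-timeout>5</session-timeout>
</session-config>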
Call the Plumber
The tools available in the base JDK and the free tooling available on the web have really improved over the years. It has certainly become easier to start diagnosing issues, but it is still always difficult to find and patch the exact leak point.

Step 1: Water Test
The first step taken was to get jconsole (available in the base JDK since Java 5) up and running. Once connected with jconsole, you can start to review all sorts of trending information regarding memory utilization, class loading, threads, and CPU utilization.
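If you have never wired jconsole up to a remote server, the usual (if insecure) approach is to expose a JMX port on the target JVM and point jconsole at it. The port number below is just a placeholder, and in a real environment you would want authentication and SSL turned on:

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9004
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

jconsole <host>:9004

For a JVM running on your own machine under the same user, jconsole <pid> will usually attach directly (on Java 5 you may still need -Dcom.sun.management.jmxremote on the target process).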
After reviewing the graphs for a while, I started to focus on the memory tab. It became apparent pretty quickly that, yes, the heap was growing close to the max but would recover nicely once the GC ran. However, I noticed that the non-heap "permgen" memory would just grow and grow and grow. Once it neared or hit the max permgen, configured as 256MB on our application servers, the server would crash.
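If you would rather capture that same trend in a log than sit and watch the jconsole graphs, the java.lang.management API exposes the permgen pool directly. This is only a sketch of the idea (the class name and polling interval are mine, and the pool name varies by collector); note that it reports on the JVM it runs in, so it would need to be embedded in the server or pointed at it over JMX to be useful here:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PermGenWatcher {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                // The permanent generation pool is named e.g. "PS Perm Gen" or "CMS Perm Gen"
                if (pool.getName().contains("Perm Gen")) {
                    MemoryUsage usage = pool.getUsage();
                    System.out.printf("%s: used=%dMB max=%dMB%n", pool.getName(),
                            usage.getUsed() / (1024 * 1024), usage.getMax() / (1024 * 1024));
                }
            }
            Thread.sleep(30000); // poll every 30 seconds
        }
    }
}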
Step 2: Flush the Dirty Water
At this point I started to suspect that there was some issue going on with class loading and unloading; in other words, I was suspicious of a classloader leak. I first tried to use jmap (another JDK tool) to dump the heap and permgen statistics, but every time I tried to connect to the JVM process it would crash the VM. Therefore I fell back to jconsole and used the JMX option to dump the heap. Look at the image below to determine the location of the option. The first argument specifies the name of the file to be created, and the second argument indicates whether you want a garbage collection to be run before the heap dump is created.
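The JMX option in question is the dumpHeap operation on the HotSpot diagnostic MBean, and you can invoke the same thing programmatically if jconsole is not handy. This is a sketch against the com.sun.management API (HotSpot-specific, Java 6+); the class and file names are mine:

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    public static void dump(String fileName, boolean liveObjectsOnly) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                server, "com.sun.management:type=HotSpotDiagnostic", HotSpotDiagnosticMXBean.class);
        // Same two arguments as in jconsole: the output file name, and whether to dump live objects only
        bean.dumpHeap(fileName, liveObjectsOnly);
    }

    public static void main(String[] args) throws Exception {
        dump("heap.hprof", true);
    }
}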
Step 3: Send off to the Lab
I then used the Eclipse Memory Analyzer (MAT) to start looking at the heap dump. Upon opening the dump, MAT prompts you to ask whether you would like to run the memory leak report. You can see a snippet of the results produced by running that report in the output below. The top suspect highlighted in the report reinforced the possibility that we were actually experiencing a classloader leak. Note: the MAT tool is very powerful; it is worth spending some time with the available tutorials to figure out everything you can do within it.
Step 4: Pressure Test
Next was to spend some time watching what was actually going on with class loading to see if I could pinpoint the problematic spot in the code. To do that, I simply turned on class loading tracing on the JVM; in this case, the JVM used to run the application server. The trace information is written to stdout. Below are the two JVM options that were added to the start script:
-XX:+TraceClassLoading
-XX:+TraceClassUnloading
After the server was restarted, we watched the activity in the production environment and a clear pattern emerged. Classes were being generated by JAXB, and the same derived classes were being loaded over and over again; it also appeared that these classes were never being unloaded. The entries in the log looked something like the line below, with actual package, class, and method names in place of the placeholders.
[Loaded <package_name>.<class_name>$JaxbAccessorM ...<method_name>]
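Because the container's stdout fills up quickly once tracing is on, a quick way to spot repeat offenders is to count duplicate load entries. Something along these lines works, assuming stdout is being captured to a file (the log file name is a placeholder):

grep "\[Loaded " server_stdout.log | sort | uniq -c | sort -rn | head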
Step 5: Evaluation
With a bit of reading and searching the web, I found numerous mentions of issues with JAXB, either bugs that caused memory leaks or misuse of JAXBContext.newInstance(). The latter essentially means that every call to newInstance() creates a new JAXBContext on the existing classloader and can result in numerous copies of the generated classes being loaded.
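To make that failure mode concrete, here is a sketch of the kind of code that produces it. The Order type and method names are hypothetical stand-ins for our actual service classes:

import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
class Order {              // hypothetical payload type
    public String id;
}

public class LeakyMarshaller {
    // Problematic pattern: a brand-new JAXBContext is built on every call, so JAXB keeps
    // generating and loading fresh accessor classes into permgen for the life of the classloader.
    public String toXml(Order order) throws JAXBException {
        JAXBContext context = JAXBContext.newInstance(Order.class);
        StringWriter writer = new StringWriter();
        context.createMarshaller().marshal(order, writer);
        return writer.toString();
    }
}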
Step 6: Resolution
Evaluating the code at hand, we were able to pinpoint a case where a new JAXBContext was being created with each call to a specific method. This aligned perfectly with the duplicate class loading messages we were seeing in stdout with class load tracing on. The code was modified to reuse the same JAXBContext; you could accomplish this with a static initializer, a singleton, or injection of a JAXBContext, as sketched below.

Once modified, the crashes subsided!
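For completeness, here is a sketch of the static-initializer flavor of the fix, using the same hypothetical names as above. The JAXBContext is thread-safe and built once; Marshaller instances are not thread-safe, so they are still created per call:

import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;

public class OrderMarshaller {
    // Built exactly once per classloader and reused for every call.
    private static final JAXBContext CONTEXT;

    static {
        try {
            CONTEXT = JAXBContext.newInstance(Order.class);
        } catch (JAXBException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public String toXml(Order order) throws JAXBException {
        // Cheap to create, but not shareable across threads
        Marshaller marshaller = CONTEXT.createMarshaller();
        StringWriter writer = new StringWriter();
        marshaller.marshal(order, writer);
        return writer.toString();
    }
}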