The incredible #zip -u

The customer called… according to him, a functionality that was working fine, mysteriously stopped working and it is up to you to investigate the problem. Just a reminder that you have those 17 items in your backlog that were supposed to be finished yesterday and this new incident is high priority, the phone won’t stop ringing, several business users are calling, the manager is coming towards your desk, the walls are closing in… you just found the component, look at that? The company was outsourcing the project so It was developed by a bunch of<insert random nationality here> back in 2009, that’s just great! In summary: Just another happy day in the IT world.

Ok, so here’s one of those posts where I share a walk-through on how to troubleshoot a problem with a Java Web Application. Let us assume that you already mapped the issue on the client-side, opened your browser’s Web Debugger (e.g., Firefox’s Firebug or Chrome’s Dev tools), read the Javascript code and found out that something in the back-end is messed up and it is not producing the expected result.

When you move on to the server-side, bear in mind that, in order to understand what the application is doing, you will have to READ CODE. Unfortunately, it is common to find situations where comprehensive up-to-date documentation is not available.

Let’s say that a class called “ChaoticUtils” shows up in some stack trace in the Application Server’s logs, how can we find the class and investigate further? Assuming that you don’t have access to the source, or, even worse, you can’t trust the release management process! In such cases it’s best to just get the JAR and decompile the class (you will be 100% sure you are looking at the exact same code that is being executed there). However, as you know, in the Java world it is common to have lots and lots of JAR packages that can be used for extensions/dependencies. In order to find the JAR package of a specific class, you can use this tool: “Whereis.jar“. Here’s how to use it:

$ java -jar whereis.jar CaothicUtils /opt/company/webapps/

Here’s the output generated by this tool:

/opt/company/webapps/ext_ear/APP-INF/lib::ext-links.jar::com/company/xxx/utils/CaothicUtils.class

Now let’s decompile this class to see what is going on. I recommend this tool called “JD Decompiler“, just load the .jar package into the program and it will decompile all the classes. Here’s a screenshot of its interface:

ext-links-jarIn case the root cause can’t be identified straight away just by looking at the code, you can introduce additional logging lines, e.g., System.out.println will print messages to the standard output (stdout) of the Application Server. So the objective would be to change the class or a JSP that is stored within a WAR or a JAR package to add these additional log messages, and, to speed things up, we are going to do that without having to unpackage, change the files and recreate the package.

So, without further ado, I introduce you to the incredible ‘zip -u‘!

For the ones that have some experience with Linux, the zip command is something very common, but I would like to focus on a specific parameter of this command: -u.

-u Replace (update) an existing entry in the zip archive only if it has been modified more recently than the version already in the zip archive. For example:
zip -u stuff *

Imagine that, in order to debug a JSP or a class, we would have to insert some lines of code to see if the variables were being populated accordingly, we certainly don’t want to unpackage every .war or .jar to insert a modified JSP or class during the troubleshooting.

For situations like that, you can use “zip -u”.

Of course… using the decompiled code to perform Remote Debugging is also an option, you can just set some breakpoints in your IDE and visualize the values of each variable through the DEBUG interface, here are the JVM arguments that you can use in order to do that:

-Djava.compiler\=NONE -Xdebug -Xnoagent -Xrunjdwp\:transport\=dt_socket,server\=y,address\=6065,suspend\=n

Anyway, in order to update, let’s say, a JSP inside a .war package we need to create the same folder structure outside the war and place a copy of the JSP inside this structure, for example:

chaotic.war
|– scripts
|– common
.  |– jsps
.    |– links
.      |– example.jsp
|– WEB-INF

If you are in the same directory as the ‘chaotic.war’ file, use the command “mkdir -p jsps/links” to create the folder structure and then copy the JSP to the ‘links’ folder. After you modify the JSP using your favorite editor (e.g., Emacs or vi) you just need to run the following command:

$ zip -u chaotic.war jsps/links/example.jsp

You should see something like this:

updating: jsps/links/example.jsp (deflated 66%)

** Bear in mind that it is necessary to redeploy the .ear package if you are updating a .jar inside ‘APP-INF/lib’ or the .war if you are updating a JSP or a .jar package, this is done through the Application Server, that’s how the changes take effect.

You can follow the same approach for classes inside a .jar package. If there is no Logger configured in the class and you are adding the good ol’ System.out.println() all over the code, you can discover what is the standard output (stdout) of the messages that are being generated by this class by following this approach, first run this command:

$ ps fuxa | grep java

Then identify the PID of your Application Server:

someuser 11058  3.6 10.1 2607008 1660612 ?   Sl   May13 225:40  \_ java  -Dprogram.name=run-default.sh -server -Xms1536m -Xmx1536m -XX:MaxPermSize=192M -XX:PermSize=192M -XX:+UseParallelGC -Xloggc:gc.log -XX:+PrintGCTimeStamps  -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:-TraceClassUnloading -Djava.net.preferIPv4Stack=true -Djava.endorsed.dirs=/opt/company/jboss-4.2.2.GA /lib/endorsed -classpath /opt/company/jboss-4.2.2.GA/bin/run.jar  org.jboss.Main -c default -b 0.0.0.0

Now, check the File Descriptors associated with this Process ID, by doing that we can identify which logs this process is writing to:

$ ls -la /proc/11058/fd | grep .log
l-wx------  1 someuser somegroup 64 May 17 14:44 1 ->  /opt/company/jboss-4.2.2.GA/bin/scripts/stdout_stderr_201105130805.log

Or you can just search for the text fragment you added to the log lines, assuming that you added a line that produces “###linkVar###” in the stdout, you can run:

$ find /opt/company -type f -print | tr '\n' '00' | xargs -0 grep -i ####linkVar####

And with that, we come to the end of another post.

I hope “zip -u” makes your life easier, if you have any cool tips for troubleshooting or productivity, please share with us in the comments section below.

Cheers!

Devops: Buzzword or the catalyst to fight conformity?

I have been meaning to write about this for quite some time now because this is the kind of stuff that should be chewing on every techie’s ear lately. Let me summarize the concept of DevOps from the point of view of a typical old-school manager (it’s funnier this way):

“ANARCHY! Developers jumping out of their cubicles and bashing into the server room bringing chaos and instauring pandemonium within the company”.

Now, here’s what it really means:

 “To bring Development and Operations together to build and deliver software more effectively and efficiently”.

This is cool but I want to take this post beyond the main aspects of DevOps, a good Release & Deployment process is definitely a  subject for an extensive discussion but the essence of it, the restlessness, that’s the point I want to touch today.

We love technology, we love to experiments with the “new toys”, either hardware or software (in my case, specifically, it’s definitely software due to budget issues), but I sincerely believe that the majority doesn’t want to assimilate any of these latest libraries/middlewares/APIs/Frameworks/methodologies/egregores frivolously, there’s value behind these tools, otherwise we wouldn’t have the hype around them and all the companies (or independent entities) behind such technologies wouldn’t be succeeding as they are. Now here comes the challenge: how do you introduce these changes to your project? It helps if you are the Senior Developer, it’s even more helpful if you are the Team Lead, but what about mere mortals, developers that are fighting on the trenches on a daily basis, or even enthusiasts that are labelled as “Systems Engineer” or “Support Analyst” (Yeah, I’m including myself in this category) that just don’t have a voice to break paradigms, some of them will give up and comply, another group will leave the company and there are those that will turn the apparently irreversible mess into something better.

I will present the archetypes that I’ve defined for each one of these developers (or IT Professionals in general):

The first group that gives up can be classified as “Furniture that writes code” – They are the guys that come to work everyday to do what they’re told, never bring anything new to the table, wait until 5 PM so they can go home and wait for Death to pay them a visit.

WinstonChair2

There’s the second group that I call “The Prodigious Tourists” – These guys (and girls) are geniuses, they carry a bias against mainstream stuff like Java or .NET, always leaning towards trending stuff, most of them would write a “Hello World” and start spreading the word about the new “silver bullet” that is out on the market, everything that you use is legacy technology for them, their skills are just as good as their ability to keep whining about all the company problems without presenting any tangible idea to solve them. They will, in most cases, leave the company to work for some cool startup where the receptionist is dressed as a Pokemon, then, as its product/service catalogue evolves, this company hires a consultant, things start getting too bureaucratic and they will pack their bag and move on to the next one.

nodejs

And then we have “The Mavericks” –  Office pariahs, people in the coffee room laugh at them because of their crazy ideas, they want to improve things, naive day dreamers that should not be near a server, they will struggle with their limited network access & awareness of office politics to enhance processes leaving a trace of rejected Proofs of Concept along the way.

cody

Maybe my interpretation of the latter is a little bit hyperbolic, but this one brings us closer to the profile of someone that needs to be involved in your company’s DevOps initiative, or any other cultural- change initiative for that matter. The restlessness should go beyond DevOps, the term was coined and gained notoriety to tackle a specific (and critical) problem: deliver software; So did “Agile” and “Extreme Programming” that came before it, but what about other inefficient processes that you have identified within the company? Why do you need 5 tickets to copy a file to that Websphere node? Why does your security request takes 3 weeks to be processed? Why Developers are not committing their Stored Procedures into version-control? Every company has similar issues and it’s easy to ignore them despite the pain and over-bureaucracy that they bring to your project, you can say that the problem lies in another department and, therefore, it’s out of your scope or that you don’t have a voice, no political power whatsoever, to raise a flag about these problems so you can’t do anything about it, these are all valid points as long as you wait for the right moment to strike and don’t let this inconformity flame be squelched, the worst excuse that I can imagine is the classic “That’s the way things are done around here”:

5monkeys1ladder

 

I heard about Hudson (proprietary father of Jenkins) before the “Continuous Integration” revolution, the little DTSTTCPW programs that were being used for Unit Testing arose way before the “Agile Manifesto”, but the methodology only becomes evangelizable when these cool buzzwords start flying around, which is definitely beneficial because the manager likes whatever he reads on trending magazines.

dilbert

That’s why DevOps is so cool, it gives you an opportunity to play with the new toys and, most important, to fix processes, the road to build and deliver software has so many aspects that present many opportunities to enhance and/or eliminate many things. Now you can finally share your opinions and ideas, you can externalize all your frustration.

office_space_printer

That’s it, if your team has a lot of messy processes and you are worried about how you should approach DevOps, there’s a brilliant talk by John Esser entitled “Creating a Culture for Continuous Delivery” that gives you 8 lessons to start breaking the paradigms with your company, I believe it’s an amazing place to start. You can read about the tools, install Jenkins on your machine, code a bunch of automation scripts but in the end, the company culture will present itself as the most challenging obstacle. Good luck!

Nashorn and the JVM Monitoring Challenge – PART 2

Hello and welcome to the second part of the “Nashorn and the JVM Monitoring Challenge” series, we will continue our quest to unveil what kind of chaotic things we will see once the JVM starts processing the bytecode that is executing untyped dynamic languages.

Let’s start with some good news: You don’t have to follow the OpenJDK Build instructions to start playing with Nashorn anymore, it was finally integrated to an early build of the Oracle’s JDK 8, I haven’t tried myself (Still playing in Ubuntu 12 with OpenJDK 8) but it should be there.

So, let’s get started, the agenda for today is going to be:

  1. Running a test script with the Nashorn Javascript engine
  2. Understanding Invokedynamic (It should be helpful to dive more into what we are going to see later)
  3. Monitor the JVM while running a Nashorn application

Running a test script

The Nashorn engine can be loaded in a Java class and then, with the instance of the ScriptEngine object, use the eval() method to execute Javascript code. Just write and compile the following class:

import javax.script.*;
public class EvalFile {
 public static void main(String[] args) throws Exception {
     // create a script engine manager
     ScriptEngineManager factory = new ScriptEngineManager();
     // create JavaScript engine
     ScriptEngine engine = factory.getEngineByName("nashorn");
     // evaluate JavaScript code from given file - specified by first argument
     engine.eval(new java.io.FileReader(args[0]));
 }
}

Once you have your ‘EvalFile’ class ready, create a dummy .js file just to give it a try, write something like:

print('Hello World');

Then you can execute this script like this:

java -cp nashorn.jar:. EvalFile dummy.js

In order to speed things up, I’ve created an alias to invoke the ‘jjs’ command-line tool so I don’t have to use this EvalFile class.

$alias jjs='/var/jdk8/openjdk8/nashorn/bin/jjs'

now, the .js file can be executed like this:

jjs dummy.js

Moving on, my test script here will only be used to explore how we can keep track of the chain of function calls and the number/size of objects that are being allocated into the JVM’s heap, if you really want to experience the power of Nashorn, you can refer to the official ‘Java Scripting Programmer’s Guide‘.

This test comprises two files:

— Model.js —

function Person(name, address, phone) {
    this.name = name;
    this.address = address;
    this.phone = phone;
    this.sayHello = function() {
        return "Hi, my name is " + this.name;
    }
}

— testNashorn.js —

#!/usr/bin/jjs
#
var Thread = java.lang.Thread;
load("./Model.js");
print("Welcome to testNashorn.js");
var people = [];
for(var i=0;i<100;i++){
    var p = new Person("Marcelo","Caucaia Street 17","+353 086 5555555");
    people.push(p);
}
//Lots of objects were allocated into memory at this point 
Thread.sleep(30000);
var myFunction = function(){
    var text = people[0].sayHello();
    print(text);
};
myFunction();

The first one (Model.js) is where I’m declaring my Person “Class”, 3 attributes and 1 meth.. sorry, function, the second file (testNashorn.js) is the actual program, it first transforms the Thread Java class into a Javascript variable, then loads the Model.js, i.e., runs the Javascript code inside that file and prepares our Person() function/constructor, it declares an empty objects array and enters a loop instruction that is going to create 100 objects in memory, after the ‘for’ loop, as I will need a second to trigger my script to generate a heap dump, I decided to add a ‘Thread.sleep(30000)’ there (30 secs is more than enough), once the program awakes, it declares a function (myFunction) that is going to print the value returned by the ‘sayHello()’ function from the first object stored in the ‘people’ array, this function is then called right afterwards.

Now, we can run the program:

jjs testNashorn.js

The output should be something like this:

Welcome to testNashorn.js
[30 sec pause]
Hi, my name is Marcelo

That’s it, we have our test script, let’s move on to the second topic.

Understanding invokedynamic

Ok, we already know that the Nashorn Javascript engine is 100% compliant with ECMA-262 5.1 and it is fully implemented with the new “invokedynamic” bytecode instruction, therefore, is faster and more compliant than Mozilla’s Rhino, but what’s invokedynamic?

Executing Java code is not the JVM’s solely purpose, every Java code is compiled into bytecode and this is the piece that gets consumed and processed by the JVM, if a programming language can be compiled to bytecode than its instructions can be interpreted by the JVM (e.g., Closure or Scala). The bytecode is an efficient simplified form of non-human-readable code that is executed closer to machine-level instructions, i.e., better performance.

The JVM has approximately 200 “opcodes” to perform invocation of instructions, handle access to fields and control objects and arrays. The following table presents the types of invocation bytecode operations that were available before JDK version 7:

Opcode

Usage

Invokestatic

For static methods

Invokevirtual

For non-private instance methods

Invokespecial

For private instance

Invokeinterface

For the receiver that implements the interface

A simple invocation to a method starts from a given “Call Site”, which is from where the request is initiated; it is assembled with the name of the method, the signature (access level, modifiers and return type) and the arguments that are processed by this method, the JVM will process this Call Site information and go through a set of operations: It is going to look for that method’s code within memory (Lookup), check if the types involved in the operations match (Type Checking), invokes the actual code (Branch) and then caches the location of that method so, if it is going to be needed again soon, the JVM already knows that memory address and speeds up the process (Cache).

normalCall

The new “Invokedynamic” bytecode operation allows the JVM to customize how the resources for the Call Site are assembled (dynamically) and also perform a different set of operations within the JVM so the field or method can be accessed (invoked). Instead of the regular Call Site, it integrates bytecode (invokedynamic operation with name and signature) with a bootstrap method, this is the component that will connect the Call Site with the “Method Handle”, once the handle finds the correct way of making this invocation occurs, the JVM will optimize the operation and the invokedynamic bytecode will be attached to the “Target Method” to avoid processing all these steps again. In a scenario where a scripting language that is running within the JVM needs to access a specific function, it is going to initiate the process by providing the bootstrap method with the invokedynamic instructions (name of the function followed by arguments and the return type), the JVM will look for the function within a Method Table (list of functions that are not associated to any object or class) based on the arguments that are defined at runtime (Lookup), once it finds the function, it will perform some language-specific type checking (Type Checking) and then it will finish the bootstrap process connecting the Call Site with the Method Handle so it can be executed (Branch), this connection is performed only once but Call Sites can be connected to new Method Handles.

dynamicCall

If you want a more in depth explanation, I strongly recommend this blog post here:

http://niklasschlimm.blogspot.ie/2012/02/java-7-complete-invokedynamic-example.html

Monitoring the JVM while running a Nashorn application

We have reached the last part today’s post, it’s time to diagnose the Rhinoceros (or… at least, try).

diagnoseNashorn

Let’s start with some Thread Dumps: if we run our testNashorn.jjs and take a few thread dumps (using the instructions documented in our previous post), this is what we get once we load them into our Thread Dump analyzer:

ThreadDumpNashorn

That’s it, say goodbye to the good-old readable stack trace. On this first analysis we reinforced once more how this change of paradigm will affect the way Java Performance Analysis and Troubleshooting is done nowadays, the chain of execution presented on the stack trace of the “Main” thread resembles bytecode instructions, the best clue to easily identify what initiated each set of instructions is the name of the Javascript file that is declared in the “jdk.nashorn.internal.scripts.Script” class (it can be found at the bottom of the stack trace), there are some familiar things there like the JVM native and internal threads but the rest got pretty cryptic for me.

So, what’s our alternative? As far as I know, there isn’t any. We can only use some arguments to run the program and get some debug data that is supposed to give us some clues as to where the calls are coming from, but it is not very clear. I believe that, if we grok the concepts behind ‘invokedynamic’, we can use the “–print-code” Nashorn argument and produce some debugging output that can be interpreted based on the dynamic calls that are generated by the Nashorn engine:

--print-code

We can also get more verbose results with the following command:

jjs -Dnashorn.debug=true -Dnashorn.methodhandles.debug.stacktrace=true --log=codegen:info testNashorn.js

What if the program hangs during the execution of a particular function? Under the development phase is pretty easy to just attach a debugger and step through the function calls but what if we are troubleshooting something in the production environment? I’m not sure if we have a valid alternative to that, if you have any ideas, please share on the comment section below.

What about Heap Dumps? Let’s see what we get when we take a Heap Dump during the execution of our testNashorn.js script:

HeapDumpNashorn

Yep, it also got a little weird here. Since we don’t have packages and typed classes, there’s no way to easily track down where are our “Classes” and the number/size of objects associated to them, I did some investigation and found this “jdk.nashorn.internal.scripts.JO” object that apparently serves as a wrapper to the objects created through Javascript functions, the downside is that it doesn’t separate the objects based on its “Class” (at least I didn’t find any parameter that pointed me anywhere near the “jdk.nashorn.internal.scripts.Script$Person” object), so if you have 100 instances of ‘Person’ and 100 instances of ‘Car’, they will be mingled in this sea of ‘JO’ instances (I haven’t tested other object forms yet, e.g., Object Literals; not sure if we would see something different). So, how do we easily keep track of the size of objects created from a given function()? Well, we could rely on the format of the attributes and play with OQL (Object Query Language) and isolate a given set of objects to determine how much space they are taking up, but that’s just messy. Currently, there are a few DEBUG parameters documented in “$OPENJDK8_HOME/nashorn/docs/DEVELOPER_README“, some of them are quite interesting and might provide the answers we need, e.g.:

print(Debug.map(p));
print(Debug.dumpCounters());

Now, to conclude this post, I leave you with a message from Jim Laskey, this is one of the replies that he sent me when I was questioning his team about these concerns:

“The next round of development will be focusing on tools, so what you are trying to accomplish will get easier. You have an opportunity to provide us guidance on this…. Stack crawls will get better once we start working on debugging APIs.”.

So there you have it, if you are interested in contribute to these debug APIs I hope this post can provide some guidance and/or raise awareness on the difficulties that we might face in a near future where we will be troubleshooting Nashorn-based enterprise applications.

Good Nashorning everyone! 😀

Nashorn and the JVM monitoring challenge – Part 1

 
Before we start Nashorning, I would like to contextualize what I’m trying to do here, I was curious about what kind of data we would see while monitoring a JVM executing the Nashorn Javascript engine, considering the deep technical details behind these untyped dynamic chaotic scripting languages, we should see something interesting, right? So, moved by curiosity, I’ve downloaded the OpenJDK 8 to start playing with Nashorn and dig more into that. I will share the results of my research but first, let’s talk about how JVM monitoring is done today.
 
 

Why do we monitor the JVM? Two reasons: Performance & Benchmarking.

 
Performance: What if your application is slow? Hanging? Is it crashing all of a sudden? Is there some specific event that freezes the entire system for a moment? Yeah.. you must find the root cause of this bottleneck, but how? Don’t worry, I will present a few tools/commands and share a bunch of vague conjectures to leave you even more confused some cool info to get you started on this amazing world of the JVM Monitoring, are you psyched?
 
Benchmarking: That’s when you take note of a given set of statistics that were generated while monitoring the system was being monitored and then you can use this report to compare how things change over time, i.e., after new patches, releases, etc. It’s an interesting way to avoid being caught with your pants down, if the system is changing over time you need to check if the resources you have will handle the work load.
 
I want to cover 3 things today:
1) Thread Dumps
2) Heap Dumps
3) Profilers
 
Cool, now… let’s see some code, shall we?
 

Thread Dump analysis

 
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
public class TestRaceCondition {
    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            new MyFileReader().start();
        }
    }
}
class MyFileReader extends Thread {
    public static int num = 0;

    public void run() {
        try {
            BufferedInputStream reader = new BufferedInputStream(
            new FileInputStream("/home/themarcelor/someFile.txt"));
            readThisFile(reader);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
    public static synchronized void readThisFile(BufferedInputStream reader){
        try {
            while (reader.available() > 0) {
                System.out.print((char) reader.read());
            }
           System.out.println(num++);
           Thread.sleep(20000);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
 
This code is going to create 10 threads and read the contents of “somFile.txt”, you can easily create a dummy file somewhere in your machine and adapt the code to read its lines, now.. let it create a few threads and then let’s take some Thread Dumps.
 
challenge Questions: Why are we doing this? What is a Thread Dump?

 

Answer: We are doing this to illustrate how a multi-threading program can, sometimes, face a situation where a given resource will be ‘locked’ by another activity or it won’t be available due to external factors, e.g., a number of connections were borrowed from a JDBC Connection Pool and the database can’t accept any more connections, there will be several threads/transactions waiting on the queue to be processed (just like our threads that will be waiting to read ‘someFile.txt’), that might slow down some activities in your system.

 

A Thread Dump is like a picture of whatever is going on inside the JVM, it’s a full report of all the Java objects/classes that are being executed at that specific moment, i.e., a bunch of stack traces containing the state of the threads along with information about locks.

 
Now, let’s create a small script to generate Thread Dumps:
 
#!/bin/sh
SLEEP=$1
PID=$2
while : ; do
  date
  kill -3 $PID
  sleep $SLEEP
done
 
This can be executed like this:
 
./takeThreadDumps.sh 5 4321
 
It is going to trigger the SIGQUIT signal against the Process Identifier (PID) 4321 every 5 seconds, it’s interesting to take multiple thread dumps because it help us analyze what happened within a given time-frame, we can identify what has been running for the last 10, 15 or 20 seconds, if some ExecuteThread: ’16’ that is processing some DB activity (e.g., oracle.jdbc.driver.T4CXAResource.doRollback (T4CXAResource.java:648) ) is showing up in 2 Thread Dumps that were taken in the interval of 10 seconds, something strange is going on.. you better call the DBA.
 
Yes, that’s for linux users, if you are a Windows user ( 😦 ) then you can run the Java command through your DOS prompt and hit [Ctrl] + [Break] to manually generate Thread Dumps, but it’s best to just use a command-line tool called ‘jstack’, it’s OS-independent, e.g.:
 
$JAVA_HOME/bin/jstack [PID] > td.txt
 
If you open one of these Thread Dump files, you are going to see something like this:
 


2013-05-22 12:30:43
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.7-b01 mixed mode):
“Attach Listener” daemon prio=10 tid=0x00007fddec001000 nid=0x491d waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
“DestroyJavaVM” prio=10 tid=0x00007fde04009000 nid=0x48dc waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
“Thread-5” prio=10 tid=0x00007fde04221000 nid=0x48ef waiting on condition [0x00007fdde36e6000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at MyFileReader.readThisFile(TestRaceCondition.java:40)
– locked <0x00000000db9c5f20> (a java.lang.Class for MyFileReader)
at MyFileReader.run(TestRaceCondition.java:28)

 
It’s perfectly possible to analyze the file without any tools, or even write some code to ‘squeeze’ the long running threads throughout a given set of Thread Dumps files, however, some tools can speed up the process by organizing the data to make it easier for our eyes, for that, I recommend a too called ThreadLogic. Here’s a screenshot of its interface loaded with a Thread Dump from our ‘TestRaceCondition’ program:
 
Threadlogic
 
So, as you can see, it categorizes the threads to facilitate the analysis, e.g., a useful command there can be found under “tools > Find long running threads…”, I won’t dive too much on its features as it’s pretty intuitive.
 
Now, about the Thread Dump Analysis: If you are a Java programmer, all this stuff will be processed by your brain naturally as you already understand Java Threads concepts, such as: synchronized, ‘Wait() & notify()’, sleep(), etc. If you are not a programmer, well.. I personally believe that every IT professional should strive to master two pillars: Code and Infrastructure, I know many programmers that hate network, server stuff and many systems engineers that hate programming, this has to stop. Anyway, let’s not digress on this philosophical matter,  if you are not a programmer and want some rules of thumb I only have one for you: Look at the monitors.
 
Imagine the JVM monitor as a supervisor of a construction site, 3 guys need to dig a ditch but they only have one shovel, so they need to take turns (synchronized), the first one grabs (locks/acquire the monitor) the shovel (resource) and the supervisor makes sure that they don’t make a mess and try to do the same thing at the same time (corrupt resource), when the worker gets tired he drops the shovel (release) and call the next guy (notify) that was waiting (wait, duh), some of them can take a nap (sleep) and some of them might wish to keep digging until they finish their part of the job and then allow someone else to take over (join). I hope you got the gist, here’s an image to illustrate this pathetic analogy (yeah.. that thing on the right-side it’s a monitor).
 
Monitor
 
Now, based on these concepts that you learned here, check the Tread Dumps again and share your impressions in the comments section, even better, share your personal experiences with Thread Dump analysis, I bet everyone would find this interesting.
 

Heap Dump analysis

 
Heap Dumps are ‘snapshots’ of heap usage, i.e., tables of data indicating how many classes/objects are loaded, how much memory the classes/objects are using and what references they have between each other, these Heap Dump files store information in a very specific format and can only be visualized with Heap Dump Analysis tools, e.g., “Eclipse Memory Analyzer” or “IBM HeapAnalyzer”.
 
This post is getting quite lenghty so I’m glad I have a bunch of other posts ready to cover this section:
 
 

Profilers

 

Here is where we do the real monitoring, the recording and analysis of the JVM’s internal data is done through a set of tools called ‘profilers’, Profiling is a mechanism whereby one of these tools connect to the JVM through the Java Virtual Machine Tool Interface (JVMTI) to retrieve information about cool JVM stuff, like Threads  and Memory (heap) usage.

It also allows a system administrator to send commands to Managed Beans through Java Management Extensions (JMX) methods.

Ok, let’s get down to business, run your java program with the following JVM arguments:

 

java -Dcom.sun.management.jmxremote.port = 9080 -Dcom.sun.management.jmxremote.authenticate = false -Dcom.sun.management.jmxremote.ssl = false TestRaceCondition

 

Now you should pick your Profiler, there are many implementations that work with JVMTI, here are a few options:

 
JRockit is my favorite (which should be replaced by Java Mission Control soon), the interface is very practical and it’s full of features (Czech out the video!), Visual VM is simpler but has a cool GC log viewer plugin, I’m not a big fan of jConsole but it’s shipped with older versions of the JDK so it might come in handy, Eclipse TPTP gives that feeling of working within the same space, which is good but you will need to download too many things to start using it (it’s pretty cool though), in summary: they’re all awesome! It depends on your mood.
 
Ok, that’s it for today, in the next part of the “Nashorn and the JVM monitoring challenge” series, we will learn more about ‘invokedynamic’ and the bizarre results that we get when we try to monitor/troubleshoot Nashorn-based applications.
 
Cheers!

Out Of Memory Errors and Heap Dumps

OMG! an Out Of Memory error! terror

Exception in thread "ThisIsM-YThread-0" java.lang.OutOfMemoryError: Java heap space.

This error just showed up in your application logs? Run to the hills!

In times like this it’s common to hear someone saying:

“Well, we can just add more memory to the server…”.

Ideally we need to investigate what really happened. Adding more RAM to a server or reconfigure the -Xmx (JVM argument that defines the maximum space of the Heap) it’s not the silver bullet here. See, per instance this basin full of water:

bacia-com-agua1

Imagine that, at the bottom of this basin, there’s a hole to drain the water, it keeps filling until a certain level e gets drained in an ideal pace, everything ok so far? Ok, imagine the same scenario where the water stream is so big that it’s filling the space faster than the hole can drain, eventually this basin is going to overflow. Therefore, considering this scenario we have to ask ourselves: Does increasing the size of the basin solve our problem?

The water is going to overflow regardless, the ‘OutOfMemoryError’ is going to happen anyway because you might have humongous objects in your heap or many many long living objects that are being constantly created.(http://en.wikipedia.org/wiki/Memory_leak).

So, you can generate a Heap Dump to analyze these objects, here are a few methods to generate Heap Dumps:

1) -XX:+HeapDumpOnOutOfMemoryError

2) -XX:+HeapDumpOnCtrlBreak                                                              (Windows)

3) jmap -dump:format=b,file=HeapDump.hprof <pid>                (JVM’s PID)

*The HeapDump will be generated in the 'working directory'
(e.g., In Tomcat, the HeapDump is generated in the /bin folder)

Here’s a good resource that gathers all the best approaches to generate Heap Dumps:

http://wiki.sdn.sap.com/wiki/pages/viewpage.action?pageId=33456

How do I read a Heap Dump?

You are going to need a tool that can interpret the Heap Dump, I recommend IBM Heap Analyzer and the Eclipse Memory Analyzer. If you already generated your Heap Dump file in the .hprof format so it is ready to be analyzed.

IBM Heap Analyzer: http://www.alphaworks.ibm.com/tech/heapanalyzer

1) Run the ‘ha398.jar’ file with the following command:

java -jar ha398.jar

If the Heap Dump is a large file, you might need to allocate more memory to the tool:

java -Xmx1800m -jar ha398.jar

If you have a decent machine, eventually you will gonna get through this screen here:

heap_analysis_1

2) After that, we will see a summary with Heap details:

heap_analysis_2

3) The analysis will take different paths depending on the scenario, if you are looking for a specific objects, few instances that are taking up too much space or large number of objects that are gradually occupying most of the Heap (i.e., Leak Suspect), the tool provides different ways of visualizing the objects (Object Types, instances and associations):

heap_analysis_3

The ‘Eclipse Memory Analyzer’ works in a similar way, but beyond the cool graphs, it can focus the analysis by automatically identifying the leak suspects providing a report that starts from the Threads that were allocating these objects when the Heap Dump was taken:

heap_analysis_4

Remember that the leak suspect routines might lead to wrong conclusions, don’t trust in any automatic analysis, understand how your application works e verify what is being expected and what is not, don’t assume that you can open any Heap Dump and point out the culprit, understanding the low-level functionalities is surely helpful.

Another advantage of Eclipse Memory Analyzer it’s the interface, check out this ‘tree view’:

heap_analysis_5

 
 

Garbage Collection logs

Minor Collection e Full GCs

When the Garbage Collection occurs in the Young Generation, we have a minor collection, when it occurs on the Old Generation we have a Full GC (Stop the world, M&S), here is a representation of a minor collection taken from the GC logs:
 
17767.137: [GC [PSYoungGen: 1052672K->175413K(1228096K)] 3100706K->2233257K(4035264K), 0.4126550 secs] [Times: user=0.78 sys=0.02, real=0.42 secs]
 
You see? It is just as simple as this:

hieroglifo

Let’s try to understand what each value means:

  • 17767.137 = Elapsed time in seconds since the JVM was started.
  • [PSYoungGen: 1052672K->175413K(1228096K)] = Space taken up on the Young Generation BEFORE the GC -> Space that was taken up on the Young Generation AFTER the GC
  • 3100706K->2233257K(4035264K) = Space taken up on the entire heap BEFORE the GC – > Space taken up on the entire heap AFTER the GC (You should not consider the Permanent Generation)
  • 0.4126550 secs = Time of GC operation
  • [Times: user=0.78 sys=0.02, real=0.42 secs] = refer to the table below: 


Real is wall clock time – time from start to finish of the call. This is all elapsed time including time slices used by other processes and time the process spends blocked (for example if it is waiting for I/O to complete).
User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only actual CPU time used in executing the process. Other processes and time the process spends blocked do not count towards this figure.
Sys is the amount of CPU time spent in the kernel within the process. This means executing CPU time spent in system calls within the kernel, as opposed to library code, which is still running in user-space. Like ‘user’, this is only CPU time used by the process.

*When in doubt, just read User+Sys.

Isn’t that amazing? Now you can see a log entry like this one and interpret what is going on (to generate these log entries use the following argument: -verbose:gc, or, if you want to see this on a specific file, -Xloggc:file), you can start creating graphics showing what is happening with your JVM in a given moment, like this one:

  minor_gc

This way we can confirm that 10 mb of objects were promoted to the Old generation, do you agree?

Here is an example of how the Full GC is represented in the logs:

 
17684.691: [Full GC [PSYoungGen: 175417K->0K(1228096K)] [ParOldGen: 2715481K->2019446K(2807168K)] 2890899K->2019446K(4035264K) [PSPermGen: 118571K->118548K(163840K)], 6.1196070 secs] [Times: user=11.34 sys=0.02, real=6.12 secs]
 
You can use tools to visualize this kind of activities, for example, the GC Viewer:

gc_viewer

Again, why this is so interesting? Because, with this knowledge, you can solve performance problems and even tune your JVM, the most common issues (that involve memory) are caused by:

  1. Number and size of objects.
  2. Average lifetime of an object
  3. Size of the Heap.

 

That’s it for now. See you on the next post.