Tuesday, July 24, 2012


Last week I got a mail from my lead "Hey !! we need all the sql queries that are executed when we make any api call", I was like, really???That much of copy paste I'll have to do(poor me). Anyway I had all the rest calls and the queries for them in JSON format, all I had to do was to write one awk script. But with awk too I knew I will have to do lots of formatting so I searched online if there is any command line JSON parser or any utility so that i don't have to do it manually and then I found one cool tool "JSAWK"(AWK with javascript power).

Its pretty cool and if you know awk its pretty easy to use. I am not an expert, I am just sharing what I learnt and what I did.

Lets take a simple JSON data example :
{ "A" :[{ "B":"b","C":"c"},{"D":"d","E":"e"}]}

above is one simple json, now i want to extract each key value pair, here is a valid jsawk command.
Supposing above json is in file json.txt

cat json.txt | jsawk 'return this.A'
response := [{"B":"b","C":"c"},{"D":"d","E":"e"}]

Here we are getting an array as response what if I want to get elements of array??Let's try.

cat json.txt | jsawk 'return this.A[0]'
response := {"B":"b","C":"c"}

You can even use |(pipes) with JSAWK
cat json.txt | jsawk 'return this.A[0]' | jsawk 'return this.B'
response := b

What JSAWK does is it uses javascript to filter JSON arrays in results array and print them to stdout.
Since it gives you an array as result, traversing and fetching data becomes too easy. These are ways to access the elements now what if we want to modify our json, make some selection on elements?? Can we do that??

YES!!!!We have couple of options with jsawk to help us, they are below :

Options                              Usage
-n                                        Suppress the printing of result set

-q                                        Filter Json through JSONQuery query

-v <name=value>                Set global value name to value

-f                                          load and run specified javascript file before processing the JSON

-a <script>                           Run the specified snippet of javascript after processing JSON input.

-b<script>                            Run the specified snippet of javascript before processing JSON input.

JSAWK supports spidermonkey Javascript interpreter and hence all the powers of it plus it has some other methods available. For more info check this link 


Next time you have to parse some JSON, you know what you can use.

Thursday, July 5, 2012

JAVA Garbage Collection and JVM Arguments

This post is about various JVM arguments available to tune Garbage Collection. Its not an exhaustive list but contains almost all important parameters you will ever need to tune your application. Once we are done with the arguments, we will see how to analyze the gc logs.

 I will take what I want !!

  We can choose which garbage collector we want for our application. Following are the available
  1. -XX:+UseSerialGC                         -Use serial GC
  2. -XX:+UseParallelGC                      -Use Parallel GC
  3. -XX:+UseParallelOldGC                -Use Parallel Compacting GC
  4. -XX:+UseConcMarkSweepGC      -Use Concurrent Mark Sweep(CMS) 

  Whats Going On??

  Choosing the garbage collector is one thing but question is what its doing behind??Let's hear it from   
  the collector himself. We can enable printing of details of garbage collection every time it happens
  using following parameters.

  1. -XX:+PrintGC                                   -Print basic details every time garbage collection happens
  2. -XX:+PrintGCDetails                        -Print more detailed information for each gc.
  3. -XX:+PrintGCTimeStamps              -Prints a timestamp at start of each gc.
  4. -Xloggc: <filepath>                             -Print the details in given log file.

   Distribute(Be Generous but don't waste)

   Okay!!! So we have selected our collector, we have enabled the logs but we haven't done anything
   about the memory yet, heart of the GC. Lets tune it. 
  1. -Xmsn : You can specify the initial heap size to be allocated. n is number of bytes eg -Xms512m  (m for MB)
  2. -Xmxn: The maximum size of heap in bytes eg. -Xmx1024m
  3. -XX:MinHeapFreeRatio : Minimum percentage of free space of total heap size. Suppose minimum is fixed to 20%, when percentage of free space in any generation reaches 20% then size of generation is expanded to maintain the ratio.  eg -XX:MinHeapFreeRatio=40
  4. -XX:MaxHeapFreeRatio : Maximum percentage of free space of total size. If in any generation free space exceeds this value then heap is shrunk to maintain it.
  5. -XX:NewSize : Default initial size of new young generation eg. -XX:NewSize=128m
  6. -XX:NewRatio : Ratio between young and old generation eg -XX:NewRatio=3
  7. -XX:SurvivorRatio :  Ratio between survivor space and eden space -XX:newSurvivorRatio=7
  8. -XX:MaxPermSize : Maximum size of permanent generation.

Other than these parameters there are some which are specific to type of collectors, we are not talking about them right now. Now next thing is how and when we setup these parameters. Answer is before starting your application you need to set these parameters.

On Unix machines you can use:
        export JAVA_OPTS="<argumrnt> <value>, .... "
        eg :  export JAVA_OPTS="-Xms1024m -Xmx2048m -XX:NewSize=256m"

Usually every java application, servers have start script, there also you can put the JAVA_OPTS values.

!!!!!!!!!!!!!!!!!!!!!!!!!!!ENOUGH THEORY, LETS DO SOME PRACTICAL !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

If you have any java application, you can use it. I am using apache geronimo app server which is hosting a sample daytrader application(developed by IBM, later donated to apache. Used for benchmarking of geronimo app server). I am generating load through apache jmeter. I used following 
JAVA_OPTS to start my app server. 

JAVA_OPTS ="-Xms128m -Xmx256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc=
/home/administrator/gclog.log -XX:MaxPermSize=128m"

I generated a small load on server, this was logged in the gclog file.

19.542: [GC [PSYoungGen: 36486K->5213K(53632K)] 44251K->15877K(141056K), 0.0113620 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] 
20.802: [GC [PSYoungGen: 53597K->3797K(65728K)] 64261K->17705K(153152K), 0.0182770 secs] [Times: user=0.04 sys=0.00, real=0.02 secs] 
22.633: [GC [PSYoungGen: 61589K->1652K(79360K)] 75497K->19223K(166784K), 0.0090320 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] 
26.096: [GC [PSYoungGen: 73268K->3566K(75200K)] 90839K->22617K(162624K), 0.0160600 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] 

The format of log is :
<Time_stamp_since_start_of_server> : [GC [<Collector> : <starting_occupancy_young_gen> -> <ending_occupancy_young_gen>(max_young_generation_size)]<starting_heap_occupancy> -> <ending_heap_occupancy> <time taken>

NOTE : The above log was generated for JDK1.6 and the format changes with every version.

So taking the first line from log we can say that Using Parallel scavenge young generation garbage collection happened after 19.542 seconds from starting of the server, initially 36 MB was being used after gc only 5 Mb is being used in young generation, while initially 44MB was being used in old generation but after GC only 15MB is used and the GC took 0.01 sec to complete.

So now we know how to select collector, what all params are there and how can we see what actually is 
going on behind the scene. Stay tuned for next post where we will look at different profiling tools available and try to learn what should be the values of parameters and how can we find the optimal values for them.