Thursday, June 14, 2012

JAVA Garbage Collection Algorithms and JVM Tuning

This post is about the way garbage collector works in Java and how can we tune it for any special need of our application. First part is all about the algorithms and the way GC works. Second part is about the different tuning parameters JVM provides us to tune garbage collector(stay tuned). I have tried to create some scenarios to better understand the tuning of GC(Garbage Collector, I will use GC from now onwards).

What is a Garbage Collector and why we need it??

Those having a c/c++ background must have used malloc/free or new/delete operators to allocate and deallocate memory. In java deallocation is done automatically, programmer doesn't needs to care about the deallocation. So GC is someone who does it for programmers in background(Say thanks to GC :P ).

A GC is responsible for :

  • Allocating memory
  • Make sure that objects which are still referenced remain in memory
  • All the objects which are no longer in use are removed and memory is freed. 

Since now we know what a GC, Question is how it works??So lets dig into the GC algorithms, but first
lets see what are the choices GC have to chose an algo.

Options With GC Algorithms 

  • Parallel Vs Serial
          A garbage collector algorithm can be parallel or serial. In former case if a system has multiple  
          cores the collection work is divided and ran on different core. In case of serial, even if machine
          has multiple cores, only one core will be used for collection.

  • Concurrent Vs Stop the World
         As the name suggests in case of stop the world, the application is stopped while collection while 
         in case of concurrent algorithm collection is done with application running. Stop the world case
         is easy but some applications may have a no pause requirement. In case of concurrent, collection   
         happens on objects which may get updated while collecting, hence they add some extra
         overheads and require a bigger heap size.
  • Compacting Vs Non-Compacting
          Once garbage collection is done, GC may or may not compact the memory. Moving the live 
          objects to one end of memory creates a free pool of memory at other end. It's easy and fast to
          allocate memory with one free end. Non compacting GC algorithm are fast but it causes memory
          fragmentation and slow down the allocation.

Generational Collection and JAVA HotSpot JVM

As of J2SE 5 update 6, in JVM there are in total four garbage collectors and all of them use generational collection technique. In generational collection, memory is divided into different generations, that is separate pools holding objects of different ages. Most widely  used implementation has two generations young and old .  Young generation collection happens frequently and contains most of the unreferenced objects.  Objects which survive few collection cycles are aged and moved to old generation. Old generation is typically larger than young generation and takes significantly large time to fill. So collection is infrequent but it takes lots of time.

Hotspot Generations

Memory in java hotspot jvm is divided into 3 generations a young generation, old generation and permanent generation. As name suggests young generation contains young objects, old generation contains objects which survived two or three young collection cycle and large objects which got allocated directly in old generation, permanent generation contains holds objects that jvm finds convenient to have GC manage, such as objects describing classes and methods.

The young generation consists of three areas one eden space and two survivor spaces. Most objects are allocated in eden space, while one of the survivor space contains objects who have survived one collection cycle and have been given another chance to die and get collected before they age enough to be moved to old generation, one of the survivor space is empty all the time.

Hotspot Collectors

Lets talk about four garbage collectors in hotspot jvm.

  1. Serial Collector
          Using serial collector both young generation and old generation collection happens in stop the 
          world fashion ie the application execution is halted while performing the collection.
          Young generation collection using serial collector
          In case of young generation the live objects in eden space is copied to one of the empty survivor  
          space (To in image), if the object size is large then those are tenured and directly copied to old 
          space. Objects in occupied survivor space that are still young are also copied to the other survivor
          space while objects which are relatively old are moved to old space. If the To survivor space is 
          filled while eden and occupied survivor space still contain some objects, then they are moved 
          straight to the old space.  Once the copy is done the objects which are in eden and from space are
          collected (such objects are marked with red cross). Once collection is over eden and from space 
         are empty and to space contains all live objects, at this point of time the survivor spaces swap 

         Old Generation collection using serial collector
         Serial collector uses mark sweep and compact  algorithm to collect old and permanent generations        
         In mark phase the collector identifies which objects are live. In next phase the collector performs 
         sliding compaction, moving the live objects to one side thus creating one free memory pool. This 
         helps in increasing the allocation speed of new objects as we can use the bump the pointer  
         method for allocation.

        Serial collector is by default used for any application on non-server type machines. On other
        machines serial garbage collector can be chosen by using -XX:+UseSerialGC command line 

    2.  Parallel Collector  

         Parallel collector is a parallel version of serial collector which takes advantage of multiple cpus
         and large memory available on today's server class machines.  

         Young generation collection using Parallel Collector
         Young generation collection algorithm is parallel version of serial collection. Its still works in 
          stop the world fashion but the garbage collection happens in parallel with  reduced overheads
          to increase the throughput of application.

         Old Generation collection using Parallel Collector
         Parallel collector uses the same serial mark, sweep and compact algorithm as serial collector for
         old and permanent generations.

        Parallel collectors are used on server type machines and applications which have no low pause
        time constraints. Parallel collector can be explicitly requested by using -XX:+UseParallelGC 
        command line option.

    3.  Parallel Compacting Collector
        The difference between parallel collector and parallel compacting collector is that it uses new
        algorithm for old generation collection     

        Young generation collection using Parallel Compacting Collector
        It uses the same algorithm as parallel collector for young generation collection.

        Old Generation collection using Parallel Compacting Collector
        It uses a new algorithm which works in three phases in stop the world, mostly parallel and sliding
        compaction manner. In first phase the generation is divided into regions. In marking phase the 
        objects are divided between the garbage collector threads and are marked in parallel, as the object 
        is identified live, the region data in which the object is in, is updated. 

        Due to previous collections it happens that generations have high density of live objects in left side
        and low density in right side. Compaction of left side is not worth it because of very small memory
        to be recovered. So in summary phase first thing it does is to find a dividing point in region so that
        from right side of it enough memory can be recovered. The summary phase next calculates the 
        address and size of first byte of live data for each compacted region. 
        In compaction phase garbage collection threads use the summary data to identify regions to be
        filled and then the threads copy data independently.

        Parallel compacting collector can be explicitly requested by using -XX:+UseParallelOldGC

   4.  Concurrent Mark Sweep(CMS) Collector

         For many applications the overall throughput is not that important as fast response time. Young
         generation collection doesn't takes much time but old generation does which causes high
         response time. To overcome this problem JVM has CMS collector.

         Young generation collection using CMS Collector
         CMS works same as parallel collector for young generation.

          Old Generation collection using CMS Collector
          Collection cycle for CMS starts with a short pause called initial mark phase which identifies
          the live objects directly accessible from code. Next is the concurrent mark phase, which marks
          all the objects in this set. Since marking is happening in parallel with application so it might
          happen that all the objects are not marked. So there is one more application pause called remark.
          In remark phase all objects which were modified during mark phase are revisited. Marking is
          finalized. Since this pause is long multiple threads are used to increase efficiency. At the
          end of remark phase concurrent sweep reclaims all the garbage.

         CMS Collector is only collector which is non compacting. Unlike other collectors CMS doesn't
          starts when permanent generation is filled rather it starts much before so that it can complete
          early. CMS collector starts based on time statistics regarding previous collection times and
          time it takes to fill old generations. CMS collector will also start collecting if occupancy of
          old generation exceeds initiating occupancy. The value of initiating occupancy can be set by
          command line option -XX:CMSInitiatingOccupancyFraction=n default value is 68.

         CMS collector can be explicitly requested by using -XX:+UseConcMarkSweepGC.