Saturday, November 24, 2012

Load Balancers - 2

The last post was about using NGINX as the load balancer; this post is about using the Apache HTTP Server as the load balancer. Let's get started with Apache (my favorite).


APACHE HTTP Server

The Apache HTTP Server needs no introduction; it's like the backbone of the WWW. According to Wikipedia:

The Apache HTTP Server, commonly referred to as Apache (/əˈpætʃiː/ ə-PA-chee), is a web server software notable for playing a key role in the initial growth of the World Wide Web.[3] In 2009 it became the first web server software to surpass the 100 million website milestone.

Now that the introduction is out of the way, let's install and configure it.

Installation:

You can download the Apache HTTP Server from the Apache download site: http://httpd.apache.org/
On Linux (Debian/Ubuntu) you can install the package using:

sudo apt-get install apache2

Configure:

Apache provides modules for using it as a load balancer, but they are not enabled by default, so the first step is to enable the proxy and load-balancer modules. Let's enable them (after restarting in step 2, we can verify they are loaded, as shown below):

1. Enable Modules


  • sudo a2enmod proxy
  • sudo a2enmod proxy_balancer
  • sudo a2enmod proxy_connect
  • sudo a2enmod proxy_http


2. Restart apache 

  • sudo /etc/init.d/apache2 restart
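
If you want to double-check that the modules actually got enabled, you can list the loaded modules after the restart (just a sanity check, not a required step):

sudo apache2ctl -M | grep proxy
# should list proxy_module, proxy_balancer_module, proxy_connect_module and proxy_http_module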

3. Now we need to configure a virtual host. Let's take the example from the last post, where we had two app servers, 192.168.1.2 and 192.168.1.3, and a load balancer machine, 192.168.1.1.
We will direct the load from the load balancer to the app servers. Create a new file /etc/apache2/sites-enabled/my_load_balancer and enter:


Listen 80
NameVirtualHost 192.168.1.1:80
<VirtualHost 192.168.1.1:80>
        ProxyRequests off
        ServerName 192.168.1.1
        ProxyPreserveHost On

        # Pool of backend app servers; loadfactor controls each member's share of the load
        <Proxy balancer://my_app_servers>
                BalancerMember http://192.168.1.2:80 loadfactor=1
                BalancerMember http://192.168.1.3:80 loadfactor=2
                #Order deny,allow
                Allow from all
                # Distribute requests by request count
                ProxySet lbmethod=byrequests
        </Proxy>

        # Web interface for monitoring and managing the balancer
        <Location /balancer-manager>
                SetHandler balancer-manager
                Order deny,allow
                Allow from all
        </Location>
        # Serve /balancer-manager locally, send everything else to the pool
        ProxyPass /balancer-manager !
        ProxyPass / balancer://my_app_servers/
</VirtualHost>

Here we are creating a virtual host listening on port 80 that forwards all requests (except /balancer-manager) to the balancer pool.
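
Not part of the official steps, just a habit I picked up: run a quick syntax check so a typo in the new file doesn't take the server down on restart.

sudo apache2ctl configtest
# prints "Syntax OK" when the vhost file parses cleanly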


4. Restart apache.


  • sudo /etc/init.d/apache2 restart

Note: you need to comment out the default NameVirtualHost directive in /etc/apache2/ports.conf if you are configuring your load balancer to listen on port 80.

Discuss:

As you can see, we list the backend servers using BalancerMember, control how much load goes to each member with loadfactor, and pick the algorithm used to distribute the load with lbmethod. These settings are the bare minimum needed to start your load balancer; the sketch below shows a couple of other commonly used options.
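
For example, mod_proxy_balancer can also balance on traffic volume and pin a client to the backend that first served it. This is only a sketch rather than part of my setup: the JSESSIONID cookie name is an assumption that depends on your application, and sticky sessions also need the backend to append the route name to the session id (for example Tomcat's jvmRoute), as covered in the documentation linked below.

        <Proxy balancer://my_app_servers>
                # route tags each member so a sticky session can be routed back to it
                BalancerMember http://192.168.1.2:80 loadfactor=1 route=node1
                BalancerMember http://192.168.1.3:80 loadfactor=2 route=node2
                Allow from all
                # bytraffic balances on bytes transferred instead of request count;
                # stickysession keeps a client on the member that first served it
                ProxySet lbmethod=bytraffic stickysession=JSESSIONID
        </Proxy>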

If you want to know all the options available with the proxy modules, please check http://httpd.apache.org/docs/2.2/mod/ as there are many options and it's not possible to discuss them all here.

Problems with Apache:

The only problem with Apache is that its performance starts degrading as you increase the load. In my case I had a decent load and it was performing well at under 5000 requests an hour.



Thursday, November 15, 2012

Load Balancers

"Scalability" First time I heard this word, I never thought one day it's gonna haunt me so much that I will have few sleepless nights.  It all started when I was asked to do a horizontal scale testing of our backend system. But don't worry I won't lecture you on scalability test rather I want to share few new things that I learned while doing it (interesting ones).

So the scenario was something like this:
Our webapp (a supply chain system) is distributed across 12 VMs (test environment), and I had to see how the system behaves if we add one more similar setup and put load balancers on top to distribute the load. Can it handle more load? Can it scale? I said let's do it; the only problem was that I had no clue which load balancer to use, how a load balancer works, or which one would be best for me.

I googled and found three names that people use as load balancers: Apache web server, Nginx and HAProxy. I did some research on these three and tried them one by one. This post is all about the pros and cons of these three software load balancers (I am not benchmarking any of them here, just sharing how to configure and use them and what problems I faced).

So let's start with the easiest one to configure, and a really good load balancer: Nginx.

NGINX

Nginx is a web server and a reverse proxy server for the HTTP, SMTP, IMAP and POP3 protocols, and it can also work as a software load balancer. Nginx is really fast when it comes to serving static content and can scale up to around 10,000 requests/sec. What makes it so fast is its event-driven architecture: it doesn't have an Apache-style process or thread model, and because of this it has a very small memory footprint.


Installation

Linux: sudo apt-get install nginx

Configure

Suppose you have two backend servers, 192.168.1.2 and 192.168.1.3, and you have installed nginx on 192.168.1.1.
Create a file /etc/nginx/sites-enabled/myloadbalancer.cfg with the following contents:

upstream myservers {
    server 192.168.1.2;
    server 192.168.1.3;
}

server {
    listen 80;
    server_name localhost;
    access_log /var/log/nginx/access.log;

    location / {
        # Forward requests to the upstream pool defined above
        proxy_pass http://myservers;
        # Pass the original Host header on to the backend
        proxy_set_header Host $host;
    }
}
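
After saving the file, it is worth checking the configuration syntax and reloading nginx; the commands below assume the standard Ubuntu package layout.

sudo nginx -t
# prints a confirmation if the configuration syntax is ok
sudo /etc/init.d/nginx reload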

And you are done. One important thing: if your application needs the hostname, you will have to set the Host header explicitly (I needed it, and it took me two hours to figure out why our application suddenly started throwing bad hostname exceptions).
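
For completeness, here is how the location block might look with a couple of extra headers that are commonly forwarded; the X-Real-IP and X-Forwarded-For lines are optional additions for illustration, not something my setup needed.

    location / {
        proxy_pass http://myservers;
        # forward the original Host header so the app sees the right hostname
        proxy_set_header Host $host;
        # let the backend see the real client address instead of the load balancer's
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }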

Problems with Nginx

After configuring the load balancer I was happy, everything looked fine; the only problem was that some particular REST calls started failing, which was unexpected. After two days of debugging I finally found that some of the headers our application was setting before making its calls were missing. Nginx was stripping off all the headers starting with X_. I googled and found that, because of some security measures, nginx strips off certain types of headers. So if your application needs headers starting with X_, or any header whose name has an _ in it (it converts _ to -), then nginx is probably not a good idea for load balancing. There is a patch which prevents the _ (underscore) to - (dash) conversion, but in my case the headers were simply being stripped off, so it didn't help me (the underscores_in_headers sketch below might be worth a try in your setup, though).
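
If you still want to give nginx a shot, it has an underscores_in_headers directive that tells it to accept client request headers containing underscores instead of dropping them. I can't promise it covers every case, since the behaviour depends on the nginx version and on where the headers are added, so treat this as a sketch to experiment with:

server {
    listen 80;
    server_name localhost;
    # accept request headers containing underscores instead of silently dropping them
    underscores_in_headers on;

    location / {
        proxy_pass http://myservers;
        proxy_set_header Host $host;
    }
}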

And my three days of work went down the drain because I needed those headers, and it forced me to move from nginx to Apache, the next easiest one to configure. Let's configure Apache in the next post.