When servers are loaded with hundreds of virtual users, the application's response slows down. This slowness can be caused by a variety of factors. If we want to improve the speed of the app, we need to know exactly which factor is responsible. An internet application has web servers, app servers, database servers, load balancers, proxies, firewalls, the network and so on along the chain. A single bad component can pull down the response of the whole app. To identify the exact place of slowness, we must rely on performance counters.
Imagine a total health check. Our whole body undergoes many tests. Weight, heart rate, blood pressure, cholesterol levels, sugar levels, RBC count, WBC count, treadmill test results etc. are all taken and analyzed by the doctor. Each part of our body is an object, and each object has many measurements based on tests, which are expressed as numbers. Once all these details are expressed as numbers, it is easy to compare them and isolate problem areas. Now, treat each object in the entire web application chain as a health component and measure it. That means we must monitor server health!
Each machine or device has CPU, memory, disk and a network connection to transfer data in and out. If we measure these objects on every machine, we will be in a better position to analyze the results. Most load testing tools provide a performance counter collection component along with the tool. There are other independent tools available in the market as well. Usually we start collecting these performance counters 10 minutes before the test run, keep collecting while the test is in progress, and continue until 10 minutes after the test is complete. The data is usually collected every 5 or 10 seconds.
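To get a feel for how much data such a schedule produces, here is a minimal Python sketch that works out the collection window and the number of samples per counter for one run. The function name collection_window and its parameters are made up for this illustration; they do not come from any particular load testing tool.

```python
from datetime import datetime, timedelta


def collection_window(test_start, test_end, pad_minutes=10, interval_seconds=5):
    """Return (collect_from, collect_until, sample_count) for one test run.

    pad_minutes is the extra time collected before and after the test,
    interval_seconds is the gap between two samples (hypothetical defaults
    taken from the figures mentioned in the text).
    """
    if test_end <= test_start:
        raise ValueError("test_end must be after test_start")
    pad = timedelta(minutes=pad_minutes)
    collect_from = test_start - pad
    collect_until = test_end + pad
    total_seconds = int((collect_until - collect_from).total_seconds())
    # One sample at the very start, plus one per full interval after it.
    sample_count = total_seconds // interval_seconds + 1
    return collect_from, collect_until, sample_count
```

For a one hour test sampled every 5 seconds, this gives an 80 minute window and 961 samples for each counter on each machine, which is why the collection is normally left to a tool.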
What to measure in CPU? There are hundreds of items that can be monitored; the lists here cover only the vital counters. % CPU usage, % CPU used by the system, % CPU used by user applications, number of processes waiting in the queue for the CPU.
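As an illustration of where the CPU percentages come from, here is a minimal sketch that assumes a Linux server, where the first line of /proc/stat holds cumulative CPU ticks (user, nice, system, idle, iowait, irq, softirq, steal). Other operating systems expose the same idea through different interfaces. The function names are our own, and counting iowait as "not busy" is one common convention, not the only one.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counts.

    Assumes a Linux kernel that reports at least the first eight fields.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    if len(fields) < 1 + len(names):
        raise ValueError("cpu line has fewer fields than expected")
    return dict(zip(names, (int(v) for v in fields[1:1 + len(names)])))


def cpu_percentages(before, after):
    """Turn two cumulative readings into percentages for the interval between them."""
    delta = {name: after[name] - before[name] for name in before}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("no CPU ticks elapsed between the two readings")
    # Idle and iowait ticks are treated as time the CPU was not busy.
    busy = total - delta["idle"] - delta["iowait"]
    return {
        "cpu_used_pct": 100.0 * busy / total,
        "user_pct": 100.0 * (delta["user"] + delta["nice"]) / total,
        "system_pct": 100.0 * (delta["system"] + delta["irq"] + delta["softirq"]) / total,
    }
```

The two readings would be taken one sampling interval apart, for example by reading the first line of /proc/stat every 5 seconds and passing each consecutive pair to cpu_percentages.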
What to measure in memory? % memory in use, page faults, swap ratio, cache hits.
What to measure in disk? Number of disk reads/sec, number of disk writes/sec, read/write errors, disk queue length.
What to measure in network? Number of packets sent, number of packets received, packet errors, available bandwidth, TCP retransmissions, network queue length.
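Many disk and network counters are typically exposed by the operating system as running totals (reads completed so far, packets sent so far), so a "per second" figure has to be derived from two consecutive readings. The sketch below shows that arithmetic under this assumption; per_second_rates is a name invented for this example.

```python
def per_second_rates(samples):
    """Convert cumulative counter readings into per-second rates.

    samples is a list of (timestamp_in_seconds, cumulative_count) pairs,
    ordered by time. The result has one rate per pair of neighbouring samples.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((c1 - c0) / elapsed)
    return rates
```

For example, readings of 1000, 1500 and 1750 disk reads taken at 0, 5 and 10 seconds give rates of 100 and 50 reads/sec for the two intervals. The same function works for disk writes and for packets sent or received.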
The above counters must be collected from all servers that are part of the application environment. Over and above these, a lot of specific counters are available, and they must also be collected after consulting with the respective system/server admins.
Apart from hardware related counters, we must also collect software related counters. For example, if we use Apache Tomcat, we must collect a few counters such as number of active sessions, number of active connections, cache hit ratio, memory used by the web server, pages cached etc. When we use an RDBMS, we must collect counters such as number of active connections to the db, index usage percentage, number of waited locks, number of nowait locks, number of open tables, reads per second, writes per second, number of open cursors etc.
This means we need to collect hundreds of such measurements for every run and analyze them after the run. If these are not collected, we cannot isolate where the problem exists. We will see how to identify the exact bottleneck in the next post.
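Before any bottleneck analysis, the raw samples usually need to be boiled down to a few numbers per server and per counter. Here is a minimal sketch of that first step, assuming the samples have already been exported as (server, counter, value) records. The record layout and the names web01, db01 and cpu_used_pct are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize(samples):
    """Group samples by (server, counter) and compute average, peak and count.

    samples is an iterable of (server, counter, value) tuples.
    """
    grouped = defaultdict(list)
    for server, counter, value in samples:
        grouped[(server, counter)].append(value)
    return {
        key: {"avg": mean(values), "peak": max(values), "count": len(values)}
        for key, values in grouped.items()
    }
```

With such a summary, the counters of every machine in the chain can be compared side by side, in the same way the doctor compares the numbers from a health check.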
For a high-end load testing tool, visit http://www.floodgates.co.in.
For free video lessons on load testing, visit http://www.openmentor.net.