Tuesday, March 11, 2014

Performance Testing - Analysis - Part 2

Performance bottleneck analysis has a lot in common with crime investigation! We start with some messy area and then investigate everything around it. Once we zero in on a set of pages that have slowed down, the next step is to identify the time intervals at which the slowdown happened. All load-testing tools provide a response time vs. elapsed time graph. We need to look at each slow page's response time, one by one, to find those intervals.

See this picture.

From the above graph we can note that, until the 50th minute, the response time was OK; at that time around 85-90 users were running. From that point onward, we see performance degradation. So we have one clue: from the 50th minute, speed problems appear. It is not enough to stop with this alone. Usually more than one symptom is required to prove a point. Let us take the hits to the server. See this picture.
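If the tool can export the response time vs. elapsed time data, the same reading of the graph can be roughly automated. The sketch below is only an illustration: the function name, the 10-minute baseline, the 2x threshold and the sample data are all assumptions made for this example, not values from the test run described here.

```python
from statistics import median

def find_degradation_start(samples, baseline_minutes=10, factor=2.0):
    """Return the first elapsed minute whose response time exceeds
    `factor` times the median of the first `baseline_minutes` minutes.
    `samples` is a list of (elapsed_minute, response_time_seconds).
    Returns None if no such minute exists."""
    samples = sorted(samples)
    # Baseline: median response time while the system is lightly loaded.
    baseline = median(rt for minute, rt in samples if minute < baseline_minutes)
    for minute, rt in samples:
        if minute >= baseline_minutes and rt > factor * baseline:
            return minute
    return None

# Made-up data: steady 1.0 s until minute 50, then 3.0 s and climbing.
samples = [(m, 1.0 if m < 50 else 3.0 + 0.1 * (m - 50)) for m in range(80)]
print(find_degradation_start(samples))  # prints 50
```

A single spike would also trigger this check, so in practice it is safer to confirm the result against the graph before trusting it.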


Up to the 50th minute, more and more hits were going to the server. From that point, the hit count came down. So two things point in the same direction. Now it is time for us to analyze what happened around the 50th minute. The dev team or the server maintenance team must start looking at all the server logs (web server log, app server log, DB log) to see what happened at the 50th minute from the start of the run. Look for errors and exceptions. We cannot say in advance what we will find, but I am sure we will get some strange messages coming out of the application or web server at that point in time.
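Searching the logs around a known minute can be scripted. The sketch below assumes a hypothetical log format in which each entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp; real web server, app server and DB logs each have their own formats, so the parsing line will need adjusting. The function name, keywords and sample lines are made up for illustration.

```python
from datetime import datetime, timedelta

def suspicious_lines(lines, run_start, minute=50, window=5,
                     keywords=("ERROR", "Exception", "FATAL")):
    """Return log lines containing any keyword that were written within
    `window` minutes of `minute` minutes after the start of the run."""
    centre = run_start + timedelta(minutes=minute)
    lo = centre - timedelta(minutes=window)
    hi = centre + timedelta(minutes=window)
    found = []
    for line in lines:
        try:
            # Assumed format: timestamp in the first 19 characters.
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # e.g. stack-trace continuation lines with no timestamp
        if lo <= stamp <= hi and any(k in line for k in keywords):
            found.append(line.rstrip("\n"))
    return found

# Hypothetical run start and log lines.
run_start = datetime(2014, 3, 11, 10, 0, 0)
log = [
    "2014-03-11 10:20:00 INFO request served",
    "2014-03-11 10:49:30 ERROR connection pool exhausted",
    "    at com.example.Pool.acquire(Pool.java:42)",
    "2014-03-11 10:51:10 WARN slow query",
    "2014-03-11 11:10:00 ERROR unrelated late failure",
]
for entry in suspicious_lines(log, run_start):
    print(entry)
```

With these sample lines only the 10:49:30 entry is printed: the later ERROR falls outside the 45th to 55th minute window, and the WARN line matches no keyword.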

If you look at the first and second graphs together, you can see one more thing. Beyond 50 users, after the 30th minute or so, the hits did not increase even though more users were getting into the system. Then the hits started coming down. So the issue might have started even before the 50th minute.
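Reading the two graphs together amounts to looking for the first interval in which users went up but hits did not. A minimal sketch of that check follows; the function name and the numbers in `series` are invented for illustration and only loosely mimic the shape described above.

```python
def hits_plateau_start(series):
    """`series` is a list of (minute, active_users, hits_per_minute),
    sorted by minute. Return the minute that starts the first interval
    in which users increased but hits did not, or None if hits always
    kept pace with users."""
    for (m0, u0, h0), (m1, u1, h1) in zip(series, series[1:]):
        if u1 > u0 and h1 <= h0:
            return m0
    return None

# Made-up samples: hits grow with users until minute 30, then stall and fall.
series = [
    (0, 10, 100), (10, 25, 250), (20, 40, 400), (30, 50, 500),
    (40, 70, 500), (50, 90, 480), (60, 100, 300),
]
print(hits_plateau_start(series))  # prints 30
```

A stall found this way is a hint about where to start reading the logs, not proof of the cause.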

Interesting, isn't it? We will continue in the next post.
