Tuesday, August 5, 2008

Testing in Chaos

At Delver we work in Blitzkrieg mode: we work hard, develop fast and improve our asskicking search engine almost on a daily basis. Working like this is very exciting, and I can sometimes feel the imaginary wind in my face, but from a QA perspective I have come to realize that we have a problem testing efficiently in this development environment. Here's a list of the problems and questions I'm dealing with:
  • Sometimes the builds we get in QA should never have left dev. They contain critical bugs which render the build untestable.
  • I'm not always sure how thoroughly we should test each build. Thorough testing of a new build currently takes ~20 man hours for the "core", previously tested features. That's not including new features.
  • I don't know when I'll get a new build. The longer the build is delayed, the more time I need to test it, because more features are added.
  • How much testing should be performed when configuration changes are made in Production?
  • If I test a build thoroughly in my lab, do I have to test it just as thoroughly when it is uploaded to Production? Where should I focus my testing efforts? Lab? Production? Both?
With all these problems I sometimes feel I'm losing control of the process and that I react intuitively instead of in a planned manner. I want to test in the most efficient manner without changing the R&D flow. I want the best coverage for the minimal effort, and I want to invest resources where they are needed, i.e. in new features and risky areas, and reduce the amount of routine testing performed by my team. Here's how I'm going to tackle these problems:

  • Develop a set of Sanity tests which cover all aspects of our application without going too deep into any of them. For example: make sure the Search feature works, but don't test special characters or long strings.
    • These Sanity tests should be performed by developers before handing a new build to QA.
    • These Sanity tests should be performed by the Operations team when they change a hardware configuration and want to make sure they didn't break anything.
    • We will automate these tests and show Operations and dev how to run them.

  • We have a rather impressive set of Regression tests, most of them automatic. I'll divide them into two levels:
    • Level I (Core): The Core of the Regression tests. Deeper than Sanity tests, faster than Full Regression. All automatic tests are Core, but not all Core tests are automatic. When do we run Core tests?
      • When a build is not a Production candidate, or
      • When only minor changes were made since the last build
    • Level II (Full): All Regression tests. When do we run Full Regression?
      • When a build is a Production candidate, and
      • Major changes/features are implemented in this build

To summarize, here is the procedure we'll use for new builds and configuration changes at Delver:
  • Every new build should pass a Sanity test by dev before it is ready for QA.
  • When Operations changes a configuration or installs minor fixes which did not pass QA, they should also run the Sanity tests.
  • QA will run Core tests on new builds unless the new build is both a Production candidate and contains major changes since the last build. In that case, Full Regression tests will be performed.
  • New features will be tested thoroughly on the build in which they become available. In later builds they will be tested as part of either Core or Full Regression tests.
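
The rule for which regression level QA runs is simple enough to write down as code. What follows is only a rough sketch of the procedure above, not something we actually run: the names `Level` and `qa_regression_level` are made up for illustration, and deciding what counts as a "major change" is still a human judgment call that gets passed in as a flag.

```python
from enum import Enum


class Level(Enum):
    """Regression levels described in this post (names are illustrative)."""
    CORE = "core"
    FULL = "full"


def qa_regression_level(production_candidate: bool, major_changes: bool) -> Level:
    """Pick the regression level QA runs on a new build.

    Full Regression only when the build is a Production candidate AND
    major changes were made since the last build; Core otherwise.
    """
    if production_candidate and major_changes:
        return Level.FULL
    return Level.CORE
```

For example, `qa_regression_level(True, False)` gives `Level.CORE`: a Production candidate with only minor changes still gets Core tests, not Full Regression.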

I'm still not sure how many resources I should invest in testing in Production versus the Lab, but that's a matter for another post. Sometime in 2009, probably.

Sunday, March 16, 2008

Performance testing mistakes (I've made)

I've been testing for Performance for eight years now, more or less. When I started testing, I didn't even know I was supposed to test for "Performance"; I was simply the-dude-who-knew-VUGen. Neither my manager nor I cared much about performance testing methodologies: I'd simply record a script, run it with multiple users and report my findings. This is a very bad way to approach performance tests, as I found out in several heated QA-dev meetings. The worst part was seeing my credibility deteriorate and my hard-earned findings doubted. Over time I managed to pinpoint my mistakes and win my credibility back, but the process was long and painful. In this post I'll discuss some of the biggest performance methodology mistakes I have committed over the years.

1. Configuration problems. This one was the hardest on me. You run a test, you find what you think is a valuable performance issue, you call some developers and have them investigate your system, only to find out that you ran out of disk space. Or the debug level was too high. Or the memory wasn't properly configured.

Always verify your system before a test. Read all the Read Me's, even the ones no other QA tester reads. Same goes for Best Practices. In one of the companies I worked for, we found a DB configuration document no one even knew existed, and we configured our system properly. Prepare a checklist and verify it before each test. Misconfiguration will bite you. Hard.
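
A checklist like this is easy to automate. Here is a minimal sketch of what I mean, using only the Python standard library. The check names, the 10 GB disk threshold and the idea that "DEBUG" and "TRACE" are the noisy log levels are all assumptions for the sake of the example; your system will have its own list.

```python
import shutil
from typing import Callable


def enough_disk_space(path: str = ".", min_free_gb: float = 10.0) -> bool:
    """True if the volume holding `path` has at least `min_free_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= min_free_gb


def log_level_is_quiet(configured_level: str) -> bool:
    """True if the configured log level is not a verbose one (assumed names)."""
    return configured_level.upper() not in {"DEBUG", "TRACE"}


def run_checklist(checks: dict[str, Callable[[], bool]]) -> list[str]:
    """Run every check and return the names of the ones that failed.

    A check that raises an exception counts as failed, so a broken
    check can never silently pass.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


if __name__ == "__main__":
    failures = run_checklist({
        "disk space": lambda: enough_disk_space(".", min_free_gb=10.0),
        "log level": lambda: log_level_is_quiet("INFO"),  # value read by hand here
    })
    print("Do not start the test, failed checks:" if failures else "OK to test", failures)
```

If the returned list is not empty, don't start the load test. Fix the environment first.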

2. Mixing conclusions with findings. I love monitors. I disperse them generously across the applications and servers and monitor all relevant processes (and some which aren't). At the end of the day, I sometimes have 100-200 monitors per server. While this is not a problem in itself, it became one when I didn't know what to report.

When asked to present my conclusions, I used to report anything which showed a trend: Working Sets increasing, Context Switch peaks, thread counts changing. The graphs looked impressive and I felt like I had a lot of "meat" in my reports. The problems started when people started asking: "well, what does it mean?", "isn't it normal?" or even "is it good or bad?"

In many cases, I had no idea. I figured that other R&D personnel knew what those stats meant even when I didn't. Well, they didn't either.

Many trends are cyclic by nature (resource consumption increases and then decreases), some reach a flat plateau after increasing and some rise and fall like saw teeth. In most cases, monitors of hardware usage only give you hints of the problem and rarely the problem proper. There are some exceptions to this rule: high CPU consumption over a prolonged period of time is bad in most cases, for example.

The findings are important, as they help you know your application better and establish its baseline. If you're already familiar with a specific monitor (I/O of process x, for example), following its trend can help you detect when an application is misbehaving, but in most cases monitor results are best left out of your conclusions section and kept in the findings baseline. Report only things which are definitely bad and which you're sure about.
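
To make the CPU exception concrete: a single spike is a finding, while a long unbroken stretch above some limit is a candidate conclusion. Here is a small sketch of that distinction. The function name, the threshold and the run length are all assumptions you'd have to tune per system; nothing here comes from a real monitoring tool.

```python
from typing import Iterable


def sustained_above(samples: Iterable[float], threshold: float, min_run: int) -> bool:
    """True if `samples` stays strictly above `threshold` for at least
    `min_run` consecutive samples. Isolated peaks do not count."""
    run = 0
    for value in samples:
        # Extend the current run, or reset it as soon as a sample drops back.
        run = run + 1 if value > threshold else 0
        if run >= min_run:
            return True
    return False
```

With CPU samples of `[95, 20, 96, 30]`, a threshold of 90 and `min_run=2`, this returns `False`: two peaks, but nothing prolonged. Those stay in the findings.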

3. Fighting your previous war. There's a saying about the army that it's always preparing for the previous war instead of the next one. While it's always fun to make jokes about the army, most people handle the current project much the way they handled the previous one. I once had my engineers spend a week trying to locate memory leaks which weren't there, only because the previous product I tested suffered from memory leaks. Another time, it took me a while to adjust to the fact that the load on the system I tested was not caused by user activity, but by the amount of data flowing into the system. Bye-bye 100 concurrent users load test.

Performance problems on different systems are caused by different factors and affect different components. Study the system you test, learn what hurts it and how it expresses its pain. Design your tests after you are familiar with the system, and ask yourself whether you're testing what you're testing because you're used to it or because it's relevant to the current product. Fight the current war, not the previous one. Or you can join the army if you like the uniform, though.

A subsection of this issue would be to test what's easy for you to test. Nice try, but no. Don't.

4. Convoluted reports (no one reads). I'm not intimidated by putting together a 200-page summary document; I actually take a perverse pleasure in it. It took me some time to understand that no one really reads them if you simply throw the reports in their faces (or in their inboxes, for that matter). Having a 10 MB, 200-page monster uncoil before your eyes is a scary sight when you're unprepared, and the natural reaction of most people I sent the report to was to close it and "read it later".

So I changed the format of my reporting and now my reports get the attention they deserve (most of the time, anyway). R&D managers are like ADHD children: they have a short attention span and they like bright and shiny things, so in order to get their attention I compose a mail they can easily digest. The secret is to put nice colorful graphs and simple explanations summarizing your most important conclusions in the mail you send to the relevant people. No more than 2-3 issues. Put the rest of the report online somewhere and attach a link to it. Most people will ignore it, so you'd better make sure that executive summary counts. Colorful graphs, simple explanations and no more than 2-3 conclusions. Easy.

Monday, March 10, 2008

What's QA?

I had a not-too-bright manager once, who told me that the job of QA was to ensure the quality of the product. I told him that I can't do that: I can ensure the quality of the miniatures I paint, the salad I cut or, to some extent, the manners of my daughters. I can't ensure the quality of code I don't write, it's that simple. Ensuring the quality of the software is not my job.

I often see QA engineers who feel personally offended that software is released with (some) open bugs, or that their bugs are re-categorized as enhancements. I always tell them that being offended by the way our bugs are treated is not our job.

In the organization where I currently work, I encounter dev personnel who design and run tests. I was surprised and a little offended at first, but I soon realized there's no reason for that. Hell, even testing is not my job (though I'd better be good at doing it).

So what's my job, actually? It's not eating these donuts.

At Delver, we have a whiteboard in the meeting room. My job is to go to that board once every few weeks and write the product's status. I give a smiley to a component which is ready for production, a so-so face to a component which still has some bugs to fix before it's ready and a frowning face to a component with critical issues. That's my job as a QA manager. Go to the board and paint some faces with a dry erase marker.

Because the role of QA is not to ensure the quality of the software, not to fight for bugs and not even to test the product (yes, I'm being a wise ass here). The role of QA in an organization is to be able to reflect the quality of the software. In order to do that we have to test or help the dev personnel test. We must be efficient, resourceful and clever in our tests. We must know the requirements and understand how the users use the software. But at the end of the day, our job is to reflect the product's status.

Are there critical bugs you feel are ignored? Publish them. Make sure everyone who should know about them, knows. Should you care about known defects the software is released with? Of course, you're part of the team. Is it your job to stop the version? No. Your job is to make sure that the people who make the decision about shipping it know everything they should know.

Sunday, March 9, 2008

Here we go

I have been testing software for roughly eight years. I have learned a lot and I still do. I started this blog to help others, talk about the stuff I like and rant about the stuff I dislike. I believe I'll focus on Performance testing, which I love, and QA in Startups, which is the challenge I'm currently facing. I intend to keep this blog mostly professional, but I can't help it if I mention my family. My daughters are simply too cute, I apologize in advance.