Most of the time I write longer, 'essay'-style articles, but Isaac and I have sometimes had small ideas we wanted to discuss that didn't feel big enough to post on their own. So I'm trying this out: a series of short ideas that might be valuable but are not too detailed. Please feel free to comment on any of these shorts or on the idea of these smaller, less essay-style posts. If you are really excited about a topic and ask interesting questions, I might try to follow it up with another essay-style post.
Code Coverage
Starting with a quote:
Recently my employer Rapita Systems released a tool demo in the form of a modified game of Tetris. Unlike "normal" Tetris, the goal is not to get a high score by clearing blocks, but rather to get a high code coverage score. To get the perfect score, you have to cause every part of the game's source code to execute. When a statement or a function executes during a test, we say it is "covered" by that test. - http://blog.jwhitham.org/2014/10/its-hard-to-test-software-even-simple.html
The interesting thing here is the idea of linking manual testing to code coverage. While there are many different forms of coverage, and coverage has its limits, I think this is an interesting way of using it. In particular, it could become even more interesting if it were integrated into a larger exploratory model. Have I at least touched all the code changes since the last build? Am I exploring the right areas? Does my coverage line up with the unit test coverage, and between the two, what are we missing? This sort of tool would be useful for those kinds of queries. Granted, you wouldn't know if you covered all likely scenarios, much less achieve complete testing (which is impossible), but more knowledge in this case feels better than not knowing. At the very least, this knowledge allows action, whereas plain code coverage from unit tests as a metric isn't often used in an actionable way.
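To make the idea concrete, here is a minimal sketch of recording what a manual session actually touched, using nothing but Python's standard library. The function names (`rotate`, `drop`, `pause`) are hypothetical stand-ins for a game's code, and a real tool would track lines rather than just function entries, but the principle is the same: trace while the tester plays, then diff against what exists.

```python
import sys

called = set()

def session_tracer(frame, event, arg):
    # Record every function entered while the manual session runs.
    if event == "call":
        called.add(frame.f_code.co_name)
    return None  # no line-level tracing needed for this sketch

# Hypothetical "game" code under manual test.
def rotate():
    return "rotated"

def drop():
    return "dropped"

def pause():
    return "paused"

# Simulated manual session: the tester rotated and dropped, but never paused.
sys.settrace(session_tracer)
rotate()
drop()
sys.settrace(None)

# The exploratory-testing gap: code the session never reached.
untested = {"rotate", "drop", "pause"} - called
print(sorted(untested))  # → ['pause']
```

A real implementation would more likely sit on top of a coverage tool's data files and compare them against the changed files in the last build, but even this toy version shows how the "what did my session miss?" question becomes answerable.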
I wonder if anyone has done this sort of testing? Did it work? If you've tried this, please post a comment!
Mario’s Minus World
Do you recall playing Super Mario Brothers on the Nintendo Entertainment System ("NES")? Those of you who do will have more appreciation for this short, but I will try to make it clear to all. Super Mario Bros is the game that made Nintendo's well-known series of Mario games famous. In it you play a character who travels through a series of 2D levels full of obstacles, including bricks and turtles that bite, all to save a princess. What is interesting is that fans have in fact cataloged a
large list of bugs for a game that came out in 1985. Even more interesting, I recall trying to recreate one of these bugs back in my childhood, and it is perhaps the most famous bug in gaming history. It's known as the
minus world bug. The funny thing is that in some of these cases, if a tester had found these bugs and they had been fixed, the tester would have removed value rather than added it, at least for most customers. I am not saying that we as testers should ignore bugs, but rather that one man's bug can in some cases be another man's feature.
How Little We Read
I try not to talk much about the blog in a blog post (as it is rather meta) or post stats, but I do actually find them interesting. To some degree they give me insight into what other testers care about. My most read blog post was about the
Software Test World Cup, with second place going to my book review of
Exploratory Software Testing. The STWC post got roughly 750 hits and the EST review got roughly 450. Almost everyone has heard of
Lessons Learned in Software Testing (sometimes called "the blue test book"). It is a masterpiece written by Cem Kaner, James Bach and Bret Pettichord back in 2002. I just happened upon a stat that made me sad about how little we actually read books. According to
Bret Pettichord, "Lessons Learned in Software Testing...has sold 28,000 copies (2/09)". 28,000 copies?!? In 7 years?!? While the following assertion is not fully accurate and perhaps not fair, that means there are roughly 30k actively involved testers who consider the context-driven approach. That means my most-read blog post reached roughly 3% of those testers. Yes, the years are off; yes, I don't know if those purchased books were read, or if they were bought by companies and read by multiple people. Lots of unknowns. Still, that surprised me. So, to the few testers I do reach: when was the last time you read a test-related book? When are you going to go read another? Are books dead?
Metrics: People Are Complicated
Metrics are useful little buggers. Humans like them. I've been listening to
Alan and Brent in the podcast AB Testing, and they have some firm opinions on how it is important to measure users who don't know they are (or how they are) being measured. I also just read Jeff Atwood's post about how
little we read when we do read (see my above short). Part of that appears to be that those who want to contribute are so excited to get involved (or, in a more pessimistic view, to spew out their ideology) that they fail to actually read what was written. In Jeff Atwood's article, he points to a page that now only exists in the
Internet Archive, but it had an interesting little quote. For some context, the post was about a site meant to create a community of forum-style users, using points to encourage those users to write about various topics.
Members without any pre-existing friends on the site had little chance to earn points unless they literally campaigned for them in the comments, encouraging point whoring. Members with lots of friends on the site sat in unimpeachable positions on the scoreboards, encouraging elitism. People became stressed out that they were not earning enough points, and became frustrated because they had no direct control over their scores.
How is it that a metric, even a metric as meaningless as a score, stressed someone out? Alan and Brent also talked about gamers buying games just to get Xbox gamer points, spending real money to earn points that don't matter. Can that happen with more 'invisible metrics' or 'opaque metrics'? When I try to help my grandmother deal with Netflix over the phone, the fact that they are running
300 AB tests doesn't help. What she sees and what I see sometimes varies, to my frustration. Maybe it is the AB testing, or maybe it is just a language and training barrier (e.g., is that flat square with text in it a button, just styling, a flyout, or a drop-down?).
Worse yet, for those who measure, these AB tests don't explain why one variant is preferred over another. Instead, that is a story we have to develop afterwards to explain the numbers. In fact, I just told you several stories about how metrics are misused, but those stories were at least in part told by numbers. On more theoretical grounds, let us consider a scenario. Suppose only expert-level mobile users liked scenario B, while tablet and computer users preferred A. Assuming you have enough data, you have to ask, 'does your data even show that?' Knowing the device is easy in comparison, but which devices count as
tablets? How do you know someone is an expert? Worse yet, what if two people share an account; which one is the expert? Even if you provide sub-accounts (as Netflix does), not everyone uses them, or uses them consistently. I'm not saying to ignore the metrics, just know that statistics are at best a proxy for the user's experience.