Monday, December 15, 2014

More on No Testers

I sometimes look to see who is reading my articles, and if an incoming link looks interesting, I try to find the blog it came from, maybe even comment on it.  I particularly wanted to see if anyone had comments on my latest blog post about No Testers.  Well, I did that today.  I went to look at an article written in Russian and found a comment by Maxim Shulga saying (translated by Google):

Detailed and very adequate answer to the same question to my http: //about98percentdone.blog ... The only pity is that uncomfortable read: white on black. Burns my eyes :)

Now normally I would just reply to such a comment by filling out the form or using my gmail credentials and writing back.  I might say something silly about how I know not everyone likes the white-on-black style, as Jeff Atwood pointed out years ago.

While Google Translate did translate the article, it translated neither the comments nor the form to reply to a comment.  Using some developer tools, I was able to establish the comments came from Disqus.  I am now going to chronicle my efforts to get an account and demonstrate how usability might have benefited from a tester, not to mention a bug tracking system.  Before I go on, I want to say I have no business relationship with Disqus or any other discussion or forum/blog related software outside of this blog.  I am not trying to pick on them; it just happens I was trying to get something done and their product was blocking me.  They also happen to have no testers, or at least no one with a label of "QA" or "Tester" or anything outside of 'engineering'.  This just happens to be an interesting example which relates to my previous post.  I did not go looking for a company without testers; I only discovered they have none after I started writing this post.  I have contacted them about the issues noted here, and none of the issues relate directly to security or should go unpublished for ethics reasons.

What I Found


Item 1:

Disqus did not give me English versions of their UI for replying, even though my browser should be requesting English.  I tried several browsers, including IE, which I confirmed was set to en-US.  I am not sure whether any sort of metrics could have told them they should be looking at the browser's requested language rather than some user/blog setting.  To me this is the sort of choice that happens when you don't think long and hard about localization.  But perhaps they know and plan on changing it, or maybe they think this is the right choice.
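
As an illustration of the sort of fix I am imagining, here is a toy sketch in Python of honoring the browser's language request.  The function, names, and fallback behavior are all my own assumptions, not Disqus's actual code:

```python
# A toy sketch (not Disqus's real code) of honoring the browser's
# Accept-Language header before falling back to a site-level setting.
def pick_locale(accept_language, supported, site_default="en-US"):
    """Parse a header like 'ru,en-US;q=0.8' and return the
    highest-weighted locale the site supports."""
    candidates = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        lang, _, q = piece.partition(";q=")
        try:
            weight = float(q) if q else 1.0
        except ValueError:
            weight = 0.0
        candidates.append((weight, lang.strip()))
    # Highest-weighted language the site actually supports wins.
    for _, lang in sorted(candidates, reverse=True):
        if lang in supported:
            return lang
    return site_default

print(pick_locale("ru,en-US;q=0.8", {"en-US", "ru"}))   # -> ru
print(pick_locale("en-US,en;q=0.8", {"en-US", "ru"}))   # -> en-US
```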

Item 2 & 3:

So I go to https://disqus.com/ and click sign up.  I am asked for an email AND a username.  I don't know what the username is used for in this context, and there is no useful description or even an icon to click.  So I enter an email, the username JCD, and a password, and get told "Username already exists."  So I try JC-D.  Nope, "Letters and numbers only please."  Okay, what about JÇD?  The Ç falls outside ASCII, but it is plainly a letter; still, I just get "Letters and numbers only please."  In fact, all sorts of things are not letters or numbers in their view which I would consider letters.  I didn't bother with much Unicode, but I imagine it would have the same sorts of issues.  Maybe this is intentional, but their error is not meaningful to me.  Worse yet, it excludes many people with names that are not just alpha characters.  My name has a hyphen in it, thus the JC-D.  The hyphen was rejected, which keeps me from using my real name.  Granted, the math models would say I don't count, as few people have hyphenated names.  Maybe I shouldn't care, maybe the name doesn't matter, but the username's usage is not clearly explained.  However, not including Markus Gärtner because his name has a non-English character seems really wrong.
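
To show what I suspect (and only suspect) is going on, here is a hypothetical sketch of an ASCII-only rule next to a more inclusive one.  The validation logic is invented for illustration, not taken from Disqus:

```python
# A hypothetical sketch of the difference between an ASCII-only rule
# (which rejects JC-D and JÇD) and a more inclusive one.  This is an
# illustration of the idea, not Disqus's actual validation code.
import re
import unicodedata

ascii_only = re.compile(r"^[A-Za-z0-9]+$")

def is_letterlike(username):
    """Accept any Unicode letter or digit, plus interior hyphens."""
    if not username or username.startswith("-") or username.endswith("-"):
        return False
    return all(ch == "-" or unicodedata.category(ch)[0] in ("L", "N")
               for ch in username)

for name in ["JCD", "JC-D", "JÇD", "Gärtner"]:
    print(name, bool(ascii_only.match(name)), is_letterlike(name))
# Only JCD passes the ASCII-only rule, even though every character
# in the other three names is a letter or a hyphen.
```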

Item 4:

I thought for a moment to use their gmail integration.  I have used it for Khan Academy to log in and it worked fine.  So I tried, but it looked like they wanted me to have a Google+ account, which I intentionally don't have.  I am a bit odd, wanting both privacy and a voice.  I don't like giving away personal details and being tracked, even if I have professional opinions that I wish to voice.  My professional and personal lives mix, but only a little.  So that was a no-go for me.  Worse yet, I got a warning saying that OpenID2, the method Disqus uses to sign up, will no longer be supported early next year (April, I believe).  Clearly they have some updating to do.

Item 5:

I considered using a fake email address, but their terms of service were not on their sign up page.  In fact, most of the links from their home page go away on the sign up page.  Maybe that is intentional.  Maybe it was A/B tested.  If so, awesome, but for me it was less than optimal.  I admit to not being the main use case, and perhaps that is one problem with testers.  We are not equal to users.

Item 6:

I go to report these issues, and the best I can find is a contact-support form.  I'm not saying support is a bad place to start, but the input field gives me about three sentences before showing a scroll bar.  If I wanted to tweet the error, that might work, but I had detailed points to give.  Only someone who is concerned with the customer would notice this, but I, a potential customer, did.

Analysis of Why the Issues Exist


I suppose the question is how we capture this sort of data, and whether we care.  Maybe annoying your users is okay when you are a free product.  Maybe alienating users is fine when your metrics show few users try to use Unicode.  Perhaps that is what data scientists are useful for: deciding which problems matter.  Maybe a functional tester would have noticed these issues and brought them up.  Having the customer deal with the problems until you figure out whether something is a good idea is not an uncommon model, particularly if you own the market.  But keep in mind, I never did get to make that comment.  Speaking in psychological terms, even with professional distancing, I will have a more negative view of their product, and it will take effort on their part to turn me around.  Even if they magically changed it all tomorrow, a potential customer like me might be long gone.  Perhaps with a billion users, it doesn't matter.

While I personally don't feel this way, maybe the Disqus team is not 'the right team', which is the argument for hiring 'the right team' that I have heard from the no-tester camp.  If they were, I'd not be writing this post with so many issues.  I think the language used by the no-tester side is unclear about what the 'right team' is, and perhaps about what they think testers do.  I am not sure that a mythical right team, with or without testers, will ever produce bug-free code, but there are certainly good and bad team mixtures.  I also feel it is rather difficult to evaluate their team dynamics having never met any of them.

Perhaps they are 'the right team', and I am just the wrong customer?  That is the other half of this particular no-tester argument: that testers are not like customers, so use customers instead.  As a customer who thinks like a tester, maybe I'm not representative?  Then again, if I pull out my heuristics, I can compare this to other products that don't require you to sign up at all to write a comment.  That may not be a complete defense of comparing myself to a real customer, but it certainly gives credence to these issues.

Finally, it could be they do have QA/testers who were renamed to some other title.  That just made it harder for me to figure out whether they have testing and whether what was built was what was intended.  These are design choices, but it is unclear if there was anyone questioning them.  Without someone in that role, the 'get it done' mentality comes into play, at least in some organizations.  Perhaps that happened here.

I am sure the reality of Disqus is way more complex than I have presented, but I am an outsider.  I welcome any feedback from the company and will update this accordingly.  I was not looking for this example case; I was not even planning on posting any more this year.  It just showed up and I thought it was interesting.  I'd love to hear from those who feel no testers is an appropriate choice and how they would interpret this.

To Maxim Shulga, I am sorry you don't like my black background with white text.  I will take your view under advisement if I ever try to re-theme this blog.  I hope at least the content is useful.  And next time, just leave a comment on my blog... this reply to your comment took way too long to write. :)

Friday, December 12, 2014

Thinking about No Testers

I have been attempting to hear out the various viewpoints of those who subscribe to the idea that Agile should move towards removing the role of tester.  I have yet to see anyone actually suggest eliminating testing completely, if that is even possible.  So let us unbox this idea and examine it, with as little bias towards keeping my own job as possible.  Here are the rough set of arguments I have seen and heard, in TL;DR (too long, didn't read) form:
  • Limited deploys (to say .01% of the users and deploy to more over time) with data scientists backing it will allow us to get things done faster with more quality.
    • Monitoring makes up for not having testers.
  • Hiring the right team and best developers will make up for the lack of testers.
    • Writing code that "just works" is possible without testing, given loose enough constraints (which leads to the next argument).
  • Since complete testing is impossible, bench testing + unit testing + ... is enough.
  • Quality cannot be tested in.  That is the design/code.
    • Testers make developers lazy.
  • It is just a rename to change the culture so that testers are treated as equals.
    • Testers should be under a dev/team manager, not in a silo.
  • It is a minor role change where testers now work in a more embedded fashion.
  • We hire too many testers who add inefficiencies as we have to wait <N time> before deploying.
    • We only hire testers to do a limited set of activities like security testing.
  • Testers do so many things the name/profession does not mean much of anything.
  • With the web we can deploy every <N time unit>, so if there are bugs we can patch them quickly.
  • As a customer I see those testers as an expensive waste.

That is sure a lot of different reasons, and I'm sure it is not an exhaustive list.  Now I shall enumerate the rough responses I have seen:

  • A little genuine anger and hurt that 'we' are unwanted.
  • A little genuine disgust that 'we' are going through the 80's again.
  • Denial ('this will never work')
  • It can work for some, but probably not all.  (This follows a very CDT 'it depends' point of view)
    • Not everything can easily be instrumented/limited deployed.  For example, I have heard Xbox 360 games cost ~10+k to do an update.
    • In some software, having fewer bugs is more important than the cost of testing.
  • This is why we need a standard (I never heard this one, but it is an obvious response)
  • Concern about the question of testing as a role vs task.
  • Different customers have different needs they want to fulfill.
  • Good teams create defects too.
  • We need science to back up these claims.
  • Testing should move towards automation to help eliminate the bottlenecks that started this concern.
  • It is very difficult to catch your own mistaken assumptions.
    • If you need someone else to help catch mistakes, why not a specialist?

While I don't have a full set of links for where I have heard these back-and-forth viewpoints, I think it is an interesting subject to consider.  I am going to go through my bulleted list top down, starting with the "no testing" side of the world.  I believe I have seriously considered limited deployment and letting the customer be the tester in a previous post.  In that same post I noted that not all teams can make up for a lack of testers, so I think that argument has been addressed as well.

Complete Testing is Impossible & Quality Cannot be Tested in


The premise is true: complete testing is in fact impossible, and I agree with that.  The conclusion that hiring testers is therefore not needed is the part I find interesting.  That the quality of code and design cannot be tested in is also an interesting statement.  The idea that testers make developers lazy feels unsubstantiated, with possible correlation-causation issues (and I have seen no studies on the subject), so I am going to leave that part alone for now.

I write software outside of work, mostly open source; I have five projects to my name.  Most were projects in which I was looking to solve a problem I had and wanted to give the result to the world.  In more than half of them I wrote no formal tests, not even unit tests, and to be honest the code quality in some of them is questionable.  There are bugs I know how to work around in some cases, or bugs that never matter to me, and thus the quality for me is good enough.  I have had ~10k downloads across my various projects and very few bugs have ever been filed.  Either most people used them and then deleted them, or the quality was good enough for a free app.  I hired no testers, and as the sole dev I was the only one testing the app.  I think this is a +1 for the no-tester group: if you work on an open source app for yourself, no tester is (maybe) required.  Even CDT would agree with this, as it depends; context matters.  I am sure other contexts exist, even if I have not seen them.  What I have not seen is specifics around which contexts might require no testers, with the exception of Microsoft/Google sized companies.

Now can you test code quality in?  If you are about to design the wrong thing and a tester brings that up, did the tester 'test' code quality in?  Well, he didn't test any software, the no-tester side would say.  Sure, but he did test assumptions, which is often part of testing.  What about the feature the tester notes the competition has that you don't, discovered while comparing competitor product quality?  What if I suppose the dev knows exactly what to develop; can the tester add value?  Well, I know I have made code suggestions that have gotten a developer thinking.  However, I could have just as easily been an XP developer sitting beside him as he coded rather than a tester.  Well, what about TDD?  Is that testing quality in?  If I write tests first, I have to think about my design.  That is testing quality in, but again that is just the developer.  What if the tester and the developer co-design the TDD, or the tester recommends additional tests during development?  Again, this could be a developer, but a tester is often thinking about the problem differently.  So while I think the statement is false as written, if I look at its spirit, I think it might be true in some cases.  There are some really great developers who do think like testers, and they could do it.  However, that is only some developers.  There will always be junior developers, and there are always other mindset factors in play.
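
For those unfamiliar with TDD, here is a tiny, made-up example of what 'tests first' forces you to decide; the function and its rules are hypothetical:

```python
# A tiny, hypothetical test-first example: writing these tests before
# any implementation forces decisions about the function's name, its
# arguments, and how it reports failure.  The function and its rules
# are invented for illustration.
import unittest

def parse_username(raw):
    """Return a normalized username, or raise ValueError.
    (Written only after the tests below existed.)"""
    cleaned = raw.strip()
    if not cleaned:
        raise ValueError("username must not be empty")
    return cleaned.lower()

class TestParseUsername(unittest.TestCase):
    def test_normalizes_case_and_whitespace(self):
        self.assertEqual(parse_username("  JCD "), "jcd")

    def test_rejects_blank_input(self):
        with self.assertRaises(ValueError):
            parse_username("   ")

if __name__ == "__main__":
    unittest.main()
```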

What is in a Name & Quick Deploys?


What's Montague? it is nor hand, nor foot,
Nor arm, nor face, nor any other part
Belonging to a man. O, be some other name!
What's in a name? that which we call a rose
By any other name would smell as sweet;
- Juliet, Romeo and Juliet
Does the fact that we are in an in-group called 'test', with some seeing developers as outsiders, matter?  Vice versa is probably true too.  In fact, the other day Isaac and I had a small debate about correcting a tester on some minor minutiae of deep mathematical and development theory.  To me, perhaps a developer who was neck deep in building new languages would find such a correction helpful, but a tester who was doing some exploration around the subject didn't need it.  In fact, such a correction might make things more confusing to anyone who didn't have a deep understanding of the subject.  I myself am guilty of the insider/outsider mentality.  We all categorize things and use those categories to make choices.  Perhaps renaming us all as human beings would make things better.  If an organization is particularly combative, this might be a useful change.  That being said, there is also a group identity for people who see themselves as testers or QA.  There is a specific field of study around it, and that also adds value.  Also, by having a sub-group, the tension, even if minor, might help people grow.

This is a mushy, non-business-oriented view of the world, and perhaps those who are just interested in how fast things go today would prefer avoiding the conflict in order to get things done.  The fear of doing this, from a QA point of view, comes from the 1980s, when defects were a big deal.  Renaming people feels like going back to that era.  The argument back is that this isn't the 80s; we can ship software almost instantly.  The counter-argument is that some software can't just be shipped with broken code.  Work in many different areas must be 'mostly right' the first time.  Also, this grand new world people want to move into has not been vetted by time.  Wait a while and see if it is the panacea it is believed to be.  I recognize the value in renaming positions in order to improve team cohesion and believe it comes from a genuine desire to improve.  Just keep in mind what is lost and the risks in doing so.

I should note, I have seen some of this implemented before in organizations I have worked in.  When testers worked under a single development manager, that manager would often ignore the advice given by the "QA" person.  Perhaps a full rename would have fixed things, but often development wants to move forward with the new shiny thing.  Having no testers is perhaps a new shiny, so development would be on board.  The people who are more aware of the risk of change are the QA and operations people.  This is a generalization, unsubstantiated by anything but my observation, but it feels like QA is calling this a risk, and because it threatens our jobs we are told we are biased and so can be ignored.  Psychology and politics are complex things, and anything I write will not adequately cover this subject.  I don't object to people trying this; maybe it will work.  I still see it as a risk.  I hope some people take it and report back their observations.

Perhaps the final argument about us is that we are many different roles tied to one name.  This is a complexity that came up at WHOSE.  We talked less about testing than you might expect: communication and math and project management and more.  We had a list of about 200 skills a tester might need.  We are a Swiss Army knife.  We are mortar for the projects being developed, and because every project is different, we have to fill in different cracks for each project.  We do that well in part because we happen to see many different areas of a system.  We have to worry about security, the customers who use it, the code behind it, how time works, the business, and lots more.  That is ignoring the testing pieces, like what to test, when to test, and how to justify the bugs you find.  Maybe we could remove the justification piece, but sometimes the justification matters even to the developer, because you don't know if a bug is worth fixing or which fix to choose.  I think we can't help being many different things, because what we do is complicated.

Customers Don't Want QA?


While that is false in regard to me personally (I get excited when I see QA in the credits of video games), perhaps it is different for others.  I have written elsewhere how users of mobile devices are accepting more defects.  Perhaps that is true in general.  We know there is such a thing as too much QA.  To go to the absurd: when QA bankrupts the company, it is clearly too much QA.  We also know there is too little QA, when the company goes bankrupt from quality issues.  Regulatory constraints that shut a company down for lack of QA would also fall in that bucket.  Those are the edge cases, and like a typical tester, I like starting at boundary cases.  Customers clearly want some form of quality, and as long as testers help enhance quality (e.g., value to someone who matters), testers are part of that equation.  Are they the only way to achieve quality?  Maybe not, but the alternatives involve either re-labeling (see above) or giving the task to others, be they the developers, the customers, or someone else.  Some customers like the cheaper products with less quality, and that market does exist.  Having looked at the free games on mobile devices, my view is that their quality is not nearly as good.  Then again, I want testers.  Maybe I'm biased.  Even if I were a developer for a company, I would like to have testers.

Should the ideas of no testers be explored?  I have no objection, but I would say that it should be done with caution and be carefully examined.  Which makes the companies who try it...well, testers.

Brief Comment on the Community


I have heard a lot of good responses, albeit a limited number.  I have also seen some anger and frustration.  Sometimes silly things are said in the heat of the moment, likely from both sides.  We also have some cultural elements that are jagged-edged.  These can create an atmosphere of conflict, which can be good, but often just causes shouting without much communication.  One of the problems is most people don't have the time or energy to write long-winded, detailed posts like this.  We instead use Twitter and small slogans like #NoTesting.  I personally think Twitter as a method for communicating is bad for the community, but I acknowledge it exists.  How can we improve the venue for these sorts of debates?  How can we improve our communication?  I don't pretend to have an answer, but maybe the community does.

UPDATE: I wrote a second piece around this topic.

Monday, December 1, 2014

Shorts: Ideas described in fewer words; Code Coverage, Metrics, Reading and More

Most of the time I write more essay-style articles, but Isaac and I have sometimes had small ideas we wanted to discuss that didn't feel big enough to post on their own.  So I'm trying this out, with a series of small, short ideas that might be valuable but are not too detailed.  Please feel free to comment on any of these shorts or on the idea of these smaller, less essay-style posts.  If you are really excited about a topic and ask interesting questions, I might try to follow it up with another essay-style post.

Code Coverage


Starting with a quote:
Recently my employer Rapita Systems released a tool demo in the form of a modified game of Tetris. Unlike "normal" Tetris, the goal is not to get a high score by clearing blocks, but rather to get a high code coverage score. To get the perfect score, you have to cause every part of the game's source code to execute. When a statement or a function executes during a test, we say it is "covered" by that test. - http://blog.jwhitham.org/2014/10/its-hard-to-test-software-even-simple.html

The interesting thing here is the idea of linking manual testing to code coverage.  While there are lots of different forms of coverage, and coverage has limits, I think it is an interesting way of looking at it.  In particular, it might be interesting if it were ever integrated into a larger exploratory model.  Have I at least touched all the code changes since the last build?  Am I exploring the right areas?  Does my coverage line up with the unit test coverage, and between the two, what are we missing?  This sort of tool would be interesting for those queries.  Granted, you wouldn't know if you covered all likely scenarios, much less did complete testing (which is impossible), but more knowledge in this case feels better than not knowing.  At the very least, this knowledge allows action, whereas plain code coverage from unit tests as a metric isn't often used in an actionable way.
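
As a sketch of how this might look in practice, using Python's coverage.py library (which is real, though the application names below are made up and this is not the tool from the quote):

```python
# A hypothetical sketch of collecting coverage while a human explores
# the application by hand, then reporting which lines the session
# touched.  coverage.Coverage is the real coverage.py API, but
# my_app and run_main_loop are made-up names for illustration.
import coverage

cov = coverage.Coverage(data_file=".coverage.manual")
cov.start()

import my_app            # hypothetical application under test
my_app.run_main_loop()   # tester drives the app manually here

cov.stop()
cov.save()
cov.report()  # compare this against the unit-test coverage report
```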

I wonder if anyone has done this sort of testing?  Did it work?  If you've tried this, please post a comment!

Mario’s Minus World


Do you recall playing Super Mario Brothers on the Nintendo Entertainment System ("NES")?  Those of you who do will have more appreciation for this short, but I will try to make it clear to all.  Super Mario Bros. is the game that made Nintendo's well-known series of Mario games famous.  In it you are a character who has to travel through a series of 2D levels with obstacles including bricks and turtles that bite, all to save a princess.  What is interesting is that fans have cataloged a large list of bugs for a game that came out in 1985.  Even more interesting, I recall trying to recreate one of those bugs back in my childhood, perhaps the most famous bug in gaming history: the minus world bug.  The funny thing is, in some of these cases, if a tester had found these bugs and they had been fixed, the tester would have tested value out rather than adding value in, at least for most customers.  I am not saying that we as testers should ignore bugs, but rather that one man's bug can in some cases be another man's feature.

How Little We Read


I try not to talk much about the blog in a blog post (as it is rather meta) or post stats, but I do actually find them interesting.  They give me some insight into what other testers care about.  My most read blog post was about the Software Testing World Cup, with second place going to my book review of Exploratory Software Testing.  The STWC post got roughly 750 hits and EST got roughly 450 hits.  Almost everyone has heard of Lessons Learned in Software Testing (sometimes called "the blue test book").  It is a masterpiece by Cem Kaner, James Bach and Bret Pettichord from 2002.  I just happened upon a stat that made me sad about how little we actually read books.  According to Bret Pettichord, "Lessons Learned in Software Testing...has sold 28,000 copies (2/09)".  28,000 copies?!?  In 7 years?!?  While the following assertion is not fully accurate and perhaps not fair, that means there are roughly 30k actively involved testers who consider the context-driven approach.  That means my most-read blog post reached roughly 3% of those testers (750 ÷ 28,000 ≈ 2.7%).  Yes, the years are off; yes, I don't know if those purchased books were read, or if they were bought by companies and read by multiple people.  Lots of unknowns.  Still, that surprised me.  So, to the few testers I do reach: when was the last time you read a test-related book?  When are you going to read another?  Are books dead?

Metrics: People Are Complicated

Metrics are useful little buggers.  Humans like them.  I've been listening to Alan and Brent on the podcast AB Testing, and they have some firm opinions on how important it is to measure users who don't know they are (or how they are) being measured.  I also just read Jeff Atwood's post about how little we read when we do read (see my short above).  Part of that appears to be that those who want to contribute are so excited to get involved (or, in a pessimistic view, to spew out their ideology) that they fail to actually read what was written.  In Jeff Atwood's article, he points to a page that now exists only in the Internet Archive, but which had an interesting little quote.  For some context, the post was about a site meant to create a community of forum-style users, using points to encourage those users to write about various topics.

Members without any pre-existing friends on the site had little chance to earn points unless they literally campaigned for them in the comments, encouraging point whoring. Members with lots of friends on the site sat in unimpeachable positions on the scoreboards, encouraging elitism. People became stressed out that they were not earning enough points, and became frustrated because they had no direct control over their scores.
How is it that a metric, even a metric as meaningless as a score, stressed someone out?  Alan and Brent also talked about gamers buying games just to get Xbox gamer points, spending real money to earn points that don't matter.  Can that happen with more 'invisible' or 'opaque' metrics?  When I try to help my grandmother deal with Netflix over the phone, the fact that they are running some 300 A/B tests gets in the way.  What she sees and what I see sometimes differ, to my frustration.  Maybe it is the A/B testing, or maybe it is just a language and training barrier (e.g., is that flat square with text in it a button, or just styling, or a flyout, or a drop-down?).

Worse yet, for those who measure, these A/B tests don't explain why one variant is preferred over another.  Instead, that is a story we have to develop afterwards to explain the numbers.  In fact, I just told you several stories about how metrics are misused, but those stories were at least in part told by numbers.  On more theoretical grounds, let us consider a scenario: did only expert-level mobile users like variant B, while tablet and desktop users preferred variant A?  Assuming you have enough data, you have to ask, 'does your data even show that?'  Knowing the device is easy in comparison, but which devices count as tablets?  How do you know someone is an expert?  Worse yet, what if two people share an account; which one is the expert?  Even if you provide sub-accounts (as Netflix does), not everyone uses them, and not consistently.  I'm not saying to ignore the metrics, just know that statistics are at best a proxy for the user's experience.
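
To make the segmentation worry concrete, here is a toy example with entirely made-up numbers, showing how a variant can win every segment yet lose in aggregate:

```python
# A toy illustration with made-up numbers: variant B wins in every
# device segment, yet variant A wins overall (Simpson's paradox).
# One reason the numbers alone don't tell you *why* a variant won.
results = {
    # segment: {variant: (conversions, users)}
    "mobile":  {"A": (20, 100),  "B": (150, 500)},
    "desktop": {"A": (270, 500), "B": (60, 100)},
}

for segment, variants in results.items():
    for variant, (conv, users) in sorted(variants.items()):
        print(f"{segment:8s} {variant}: {conv / users:.0%}")

for variant in ("A", "B"):
    conv = sum(results[s][variant][0] for s in results)
    users = sum(results[s][variant][1] for s in results)
    print(f"overall  {variant}: {conv / users:.0%}")
# mobile: A 20%, B 30%   desktop: A 54%, B 60%   overall: A 48%, B 35%
```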