Sunday, May 18, 2014

Software Test World Cup 2014 - Part 1

[Part 2 is now posted which includes the scores we received and an analysis.]

I recently participated in the Software Testing World Cup 2014. Just to get it out of the way: our group did place in the semifinals, but did not win. However, that is of less importance to me. "Wait," you say, "what is the point of a contest if not to win?" Well, for me, I was more interested in getting feedback, learning from the experience and seeing what the idea was. Since this is our blog, not my resume, I'd rather talk about what we did than sell myself. What I am going to try to capture is the experience and outcome of the event.

Before talking about the event itself, I should be clear: a large part of the event happened before it even started. We were asked to prepare to interact with the customer, test the software, write bugs and come out with a report on what was found. We were also told the judging would be based on being on mission, quality of the bugs, quality of the test report, accuracy, non-functional testing and interaction with the judges.

Knowing that, I did a few things. First, I assumed it was likely to be web based, as most apps don't work on all platforms, so I started to enumerate what sorts of non-functional tests we could do. This turned out to be wrong, but it is better to be over-prepared than not, in my view. Then I looked into how to write such a report, as I don't do that sort of consultant-oriented paperwork; I have conversations with the developers and work closely with the product owner. Once I had an idea for the format, I created a list of questions we would want to ask. Some of them were perhaps excessive, but again, I assumed more was better: in the moment I could trim the list based on how the customer answered, whereas I might fail to think of a question at all unless I had spent time thinking about it beforehand. Ironically this felt a little heavy (for a group of judges composed of lots of pro-CDT people), but again, these sorts of reports are heavy in my view.

At this point we had generically pre-processed the report and the interaction with the judges. On the day of the contest we were told that the product was visual screen capture software, so I did some research before the start to compare feature sets. I captured some possible tests we might do and documented those.

While I have built my own screen recording tools, and even automation tools, I have rarely gotten the chance to test such a massively well-known support tool as SnagIt. We didn't know it was SnagIt until half an hour before the competition actually started. With half an hour to go, we started to organize our thoughts and take a look around the tool. I created a few product-oriented questions and found some obvious bugs. One that really concerned me was that the order page link did not work. That turned out to be important later. I wrote notes down, but didn't file bugs as the contest had not started. I was just trying to understand the product.

We were told we could start asking questions, and I started peppering the judge with them, but tried to keep them slow enough to know whether I needed a follow-up question or not. The judge's ability to respond to feedback was a little poor. For example, I wrote a question regarding screen capture in Firefox, which involves a plugin, something that seemed poorly documented. I neglected to say it was for Mac, as the plugin is Mac-only. They got confused by this, and I posted a clarification, but they were not looking at the chat log. I don't mind the video chat, but that became frustrating for me.

We as a group decided two people would be in Mac land, one would be in a Windows VM and I would be between my personal Win 8 box and my Mac. We split up the functionality as well and started to test. We did a load test using static fuzzing for a recording (e.g. something that wouldn't compress well), and since my Win 8 box has very few resources, I did a load test where I recorded video while compiling. I know lots of other testing was done, but those were some of the more interesting tests that I can recall (this was all written weeks after the contest was over).
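
To illustrate why static makes a good load input, here is a minimal Python sketch. It is not what we ran during the contest; the frame size and function names are made up for this example, and zlib is only a stand-in for whatever codec a real recorder uses. It simply compares how well a flat gray frame and a frame of random noise compress.

```python
import os
import zlib

# Assumption for illustration: one uncompressed 640x480 RGB frame.
FRAME_BYTES = 640 * 480 * 3


def static_frame(size=FRAME_BYTES):
    """Return a frame of random noise, similar to TV static."""
    return os.urandom(size)


def flat_frame(size=FRAME_BYTES, value=0x80):
    """Return a frame filled with a single gray value."""
    return bytes([value]) * size


def compression_ratio(frame):
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(frame)) / len(frame)


if __name__ == "__main__":
    print(f"flat:   {compression_ratio(flat_frame()):.4f}")
    print(f"static: {compression_ratio(static_frame()):.4f}")
```

The flat frame shrinks to a tiny fraction of its size, while the noise frame barely shrinks at all, which is why recording static should force a recorder to do close to its worst-case amount of work.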

As a toolsmith who specialized in Windows, I appreciate how hard it is to build complex tools. However, when the product owner said that they used the tool to capture screenshots of bugs in the tool, it seemed funny, as we wrote up several bugs around that. We found that claim to be incorrect, but I wonder if the judges heard or recalled that claim, since I didn't create a strong paper trail around it. Also, because the judges in general don't have conversations with people afterwards, it is hard to know what someone is thinking. I get that writing has to be clear, but how much context does one normally capture? Without knowing the culture, that might have been a throwaway line from the product owner or it might have been serious. It makes me appreciate consultants in some ways, even if I do think they are sometimes more about conning and insulting than providing useful data.

It was genuinely neat to try to test something I would not have tested otherwise and doubt I will ever personally work on. It was fun. We found a lot of bugs. With one hour left I asked for permission to load test the download link, as that seemed part of the system and non-functional. We couldn't get a response. The judges were joking and talking about their experiences. Isaac tweeted two of them and was ignored. The judges were filling the dead space that should have been silent with conversation, which I think was a detriment to the experience, but on the other hand, maybe that is part of the contest. Simulating that annoying coworker who keeps talking and talking in one long unbroken sentence moving from topic to topic... Yeah, Wayne! Quit that.

While finalizing our report, the judges sent a response to one of my early findings about the order link being broken saying that was expected.  I had put in a post 2+ hours earlier about that bug and the judges had not answered.  I didn't find out they had responded until after the contest was over.  We were given a chance to amend our report because of this, which I appreciate, but I also think it might have changed our testing strategy.  When you find one bug, you often look for clusters of bugs, which is what I did.  Obviously life is not fair and I don't expect the first year of a contest to run perfectly smoothly.

At this point we know our placing but not our scores. The contest was broken up by judge, with points and scores attached. We got bonus points for helping one user, or at least that is what we were told. I wonder if we will get them, or if there will be fear that giving out said scores might cause disputes like "you forgot to give points for X". It would suck if our placement went up or down at this point. But we shall see. I will post a second blog entry based upon the score information we are given.


  1. Hi JCD,

    Thanks for writing this up! I was a judge of the Oceania comp, and it has been very interesting to read posts from the other side of the fence.

    In particular I like your comments about the context in a competition and having CDT judges. Maybe there'll be a slightly different format in November when contestants and judges are colocated, I don't know. For now though, it's safe to say that all participants were in the same boat, and the competition may be more fair than real life in that respect.

    The North America comp was the first one of this World Cup event. No doubt there were some teething issues...

    Hopefully my blog post will help to explain things a bit more from the judges' perspective, in terms of what was going on during the Oceania competition. For example:
    - The YouTube stream is recorded, so long periods of silence wouldn't have been great from that perspective. All comments (verbal and written) can be viewed again on YouTube if you're interested.
    - In our comp some of the judges were online at 3AM and 11PM, so I think we were partially talking to keep ourselves awake at the end there. I assumed most participants were tuning us out and getting on with their work at that stage :)
    - Just like the participants, the judges were aiming to keep on top of many different communication streams. Twitter may have been seen as the lower priority?
    - YouTube comments are difficult to follow, particularly replies to comments. I don't know what the answer is, because posting replies as new comments would have also been very difficult to follow during a live event.
    - Although I don't think it was intentional (you never know..) it does mimic real-life when product owners aren't as familiar with technical details as the team are (like FF plugins for Mac), or when they make off-the-cuff remarks which may not be entirely accurate.
    - Judges are humans too, and product owner/judge comments can be considered a heuristic oracle. A fallible source of truth, as you discovered :)

    I haven't seen your bug reports or test report, but you mention limited experience with test reporting. One tip I have in general for test reporting is to include any limitations, assumptions and workarounds. Eg:
    - You were faced with limitation of not getting timely responses to questions x, y and z.
    - You made an assumption that you were not allowed to perform stress testing in the absence of being given explicit permission, due to impact this could have for users in production, etc.
    - Workaround is a recommendation that product owner seeks further testing in areas a, b and c, using tool 'blah'. This tool is recommended for reasons d, e and f.
    - and so on...

    From your advance preparation for the competition, the fact that you took the time to write this post, and the observations that you've made here, I'd say that you're exactly the calibre of tester that the organisers were hoping to attract with this competition. I get the feeling that the experience has left a bad taste in your mouth, and I sincerely hope that's not the case. It would be great to see you participate again if there's another world cup event next year, and perhaps even apply to be a judge :)


    1. Hi Kim,

      To address your last issue first: just as when we find bugs and report them to the developers, I wanted to capture the issues I saw in the contest in order to improve it. Well, other than yelling at the judges' images on the YouTube channel because I couldn't get a response. :) Really I don’t have any problems with the contest as a whole; I just think there are kinks to be worked out.

      To address your bullet points (some points are combined):

      I understand that it is a YouTube video, but for the judges to go back over the speakers would be very difficult, as you never know when real content is occurring vs. just some guys talking. I should know, as I reviewed parts of the video. From a contest perspective it is, in my opinion, a bad idea. As for keeping awake, that is a fair point; human fatigue was probably an issue, and related to multiple issues we were seeing.

      As for communication, I used the YouTube channel while Isaac used Twitter. I think fatigue is a more likely reason for the failure. Honestly, they also all seemed so intent on their conversation that I don’t think they were paying enough attention towards the end.

      I agree that the post/reply in YouTube was poor. In our case though, we had a post that took 2.5 hours to get a response. In my eyes, that is far too long. I do appreciate that the judges did go back over the comments, but a response has to be timely for a contestant to notice it. Co-location will certainly help for next time.

      Yes, my failure to write a full and complete sentence was a failure. I am used to having conversations, so to be fair, written real-time many-to-many communication is not something I do often. I am sure the same is true for the product owners and judges. When I write a report I try to include what went wrong on both sides. That is what we, as a group, learn from.

      I want perfect oracles! Until then I guess I will have to accept POs/Judges. :) The fact that most of the judges didn’t even have the SUT installed, however, does strike me as less than prepared, at least from a judging perspective. It sounds like you had the SUT installed, which is an improvement, and hopefully a lesson learned from the first region’s contest.

      I will post our bug report when (if) we get a judge’s review with it. Otherwise the report on its own makes less sense, since we don’t know if it would be rated well or poorly. I might also include a bug or two in my next post just as brief examples. Assuming I still have access to the bug DB.

      Thanks for the interesting and informative comment!

      - JCD

      P.S. I noticed you had noted that the test report needed to be submitted when the contest ended. Interestingly enough, I was the one who asked the judges (Matt and Maik) how long we had before the contest started, as the rules at that time had not been made clear. Again, I hope to have improved the process rather than detracted from it.

  2. This comment has been removed by a blog administrator.

    1. Could you clarify your question? I talk about a lot of different ways to find software issues, and different types of issues. In particular I specialize in the automated parts of testing, but I appreciate both automation and purely manual testing. Are you looking for techniques for finding bugs using manual test techniques or for finding bugs using an automated approach? While I can talk about manual testing techniques meant to find bugs, I think Cem Kaner's BBST is a better place to start. He has entire classes around designing your tests for various needs as well as a good course on reviewing black box testing.

      Regarding automation, without knowing what type of bugs you are looking for as well as the type of software under test, I can only give general advice. Automation generally does not specialize in finding new bugs, as most automation written is regression oriented and not built around new functionality. If you want to find lots of bugs with automation, you might look at fuzz testing, load testing and security testing to find non-regression bugs.
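
      As a rough idea of what fuzz testing can look like, here is a minimal mutation-fuzzer sketch in Python. Everything in it is hypothetical: `parse_record` is a toy system under test with a deliberately planted bug, and `mutate`, `fuzz` and `SEED_RECORD` are names invented for this example, not part of any real tool.

```python
import random

# Hypothetical valid input: [length][payload...][checksum of payload mod 256].
SEED_RECORD = bytes([3]) + b"abc" + bytes([sum(b"abc") % 256])


def parse_record(data: bytes) -> bytes:
    """Toy system under test with a planted bug for the fuzzer to find."""
    length = data[0]
    payload = data[1:1 + length]
    checksum = data[1 + length]  # bug: trusts the length byte, may index past the end
    if sum(payload) % 256 != checksum:
        raise ValueError("bad checksum")  # clean rejection, not a crash
    return payload


def mutate(data: bytes, rng: random.Random, max_flips: int = 4) -> bytes:
    """Return a copy of non-empty data with a few random bytes overwritten."""
    buf = bytearray(data)
    for _ in range(rng.randint(1, max_flips)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(target, seed_input: bytes, runs: int = 1000, seed: int = 0):
    """Feed mutated inputs to target; collect inputs that crash it unexpectedly."""
    rng = random.Random(seed)  # fixed seed so a failing run can be reproduced
    crashes = []
    for _ in range(runs):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except ValueError:
            continue  # the target rejected the input cleanly
        except Exception as exc:
            crashes.append((candidate, exc))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record, SEED_RECORD)
    print(f"{len(found)} crashing inputs found")
```

      The point of the sketch is the split between expected rejections (here `ValueError`) and everything else: a regression suite checks known behavior, while a fuzzer like this only asks whether unexpected input makes the target fall over.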

      Please provide me more details in your question and I will try to write a more detailed reply.