Tuesday, May 20, 2014

Software Test World Cup 2014 - Part 2

I wrote recently about the Software Test World Cup. In the last post I said I would post our score card and an analysis if one was possible. First I will give you the raw data, with some "Avg" columns removed. I have not intentionally edited the content other than anonymizing the judges, format-shifting from Excel, and removing extra columns involving averages and totals. Oh, and I marked spelling mistakes with [sic], which I just learned should be written with brackets, not parentheses.

Judge | Importance of Bugs Filed | Quality of Bug Reports | Non-Functional Bugs Filed | Writing/Quality of Test Report | Accuracy of Test Report | BONUS: Teamwork/Judge Interaction (0-10)
A | 16 | 16 | 9 | 14 | 14 | 1
B | 15 | 15 | 11 | 13 | 13 | 1
C | 12 | 11 | 10 | 14 | 12 | 1

NOTES:
A: Bonus: Test Report/Bugs made me want to engage the customer. Several Usability issues.
B: I found that the report did not flow well. I know many teams are expected to give ship/no ship decisions, this [sic] iritates me, i promise not to let it affect my [sic] juding.
C: I like the test report. It gives practical examples, what could have been [sic] testeed for different aspects, e.g. Load, but the ship decision and the "major" bugs seem not to fit imo. bug spread is okay, they tried to consider disability issues (red/green vs. [sic] colorbling ID326)

First of all, I thank the judges not only for making numeric judgments, which is super hard, but also for spending the time to write some comments. I appreciate that. However, looking at the judges' comments, they seem confusing. For example, C says they liked the test report, but their score for it was no higher than the other judges'. They complain that the "major" bugs seem not to fit, bugs such as crashes and the inability to order the product. Granted, I don't have the full list of bugs we found, but I wonder what priority they were looking for. That is left to the reader's imagination.

Judges A and B were kinder score-wise. Judge A gives us a bonus, but not the bonus we were promised by Matt Heusser, the product owner, for answering a question in the YouTube channel. Certainly the bonuses were not used how I imagined. I thought our conversations with Matt and the product owner would be part of that bonus, but it appears the judges ignored this. I did find it interesting that Matt said usability would be part of non-functional [no citation, I'm not rewatching 3 hours of video]. We also asked for permission to do some load testing on the Snagit site but never got a response from the product owner, so we chose not to because of the legal and ethical implications. It seems like that was not considered in the scoring, but maybe I am wrong. With Snagit as the tool to test, there isn't a lot of non-functional testing to do. Judge B, on the other hand, was harsh in comments but gave relatively good scores. I agree with Judge B that providing a ship/no-ship decision is annoying, but that is what a conversation is for. We didn't get to have one of those, so we did the best we could.

It is interesting how diverse the judges' opinions were and also how middle-of-the-road most of their scores were (all items but the bonus were out of 20). Considering how highly we placed, I am guessing either the judges were never impressed, or scores ran low across the board and a 10/20 really is more like a 15/20, relatively speaking. Finally, I promised our test report. Sadly I can't upload the file to Blogspot, so instead I am going to post it below, with some attempts to deal with formatting.

Functional Test Report

By: JCD, Isaac Howard, Wayne Earl, and KRB


Do not ship

Major Issues:

Undo does not always work, sometimes undoing the wrong thing. We found a good number of bugs, including multiple crashes; some seem more likely to be seen in the field than others. Bug 241 showed that webcam capture crashed on one particular Mac. When no webcam exists and you attempt to take a camera capture, Snagit closes. The Order Now, Tutorial, Get More Stamps, and New Output buttons all went to a 404 page. With Preferences open in Windows 7 and 8, the application refuses to take screenshots. There are about 20 priority 1-3 bugs, which suggests the application isn't finished.

What Did Work:

We did a performance test on a small Windows 8 machine with 4 GB of RAM and a 1.8 GHz CPU. It succeeded in capturing video at a viewable quality. The editor worked well in most cases, as did basic usage. The mobile integration worked.


We earned bonus points per Matt for giving advice on how to edit video (Jeremy Cd). We asked multiple times in the YouTube channel if we could load test the system's web site but never got a response from the judges/Matt. We chose not to 'hack' the system due to legal and ethical issues.

Limits of Testing:

- No automation was generated.
- The state of unit testing is unknown (customers don't know about unit tests).
- We did not even come close to hitting all the menu items.
    o We only have a rough set of tests on Windows. Most of our testing was on Macs.
- We didn't have the technical expertise in the system to capture logs.
- We didn't have the time to capture the before and after.
- We have limited experience with the SUT.
- We only tested the configurations provided. Other configurations of the system were ignored.
- We could not have a conversation with the product owner on critical bugs after he left.
- Driver/hardware testing is limited. We mostly have OS X Mavericks.

How Testing Was Planned:


As we had been informed that this is a highly hardware-dependent product (video and screen capture), and a commodity one, we decided that looking at the website of the product might be as important as the product itself. Since a user cannot determine the differences in quality between products without testing them, the website becomes a primary concern, particularly since we don't know whether they support the types of computers we have, which might force us to test the site. Finally, if the product has a price, making sure you can't break the security and get to the download system for free is important.


In discussing load testing, we considered recording multiple YouTube videos all playing at once, which would stress the hardware and give the recorder a great deal of variation to capture. It might also add a lot of audio channels to record, if required. A TV-static screen will also push the compression algorithm, as static cannot be compressed well, so we might test with that too. Finally, we would attempt to use a slower system, with little RAM and a slow hard disk, under heavy CPU and disk usage, to see if the recording fails.
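The TV-static idea rests on a simple fact: random noise has near-maximal entropy, so compressors can barely shrink it, while a flat screen compresses almost completely. A quick illustration (not from the contest; the frame size is made up, and zlib here just stands in for whatever codec the recorder actually uses):

```python
# Sketch: why TV static stresses a screen recorder's compression.
import os
import zlib

FRAME_BYTES = 640 * 480  # one hypothetical 8-bit grayscale frame

static_frame = os.urandom(FRAME_BYTES)  # TV static: random pixels
flat_frame = bytes(FRAME_BYTES)         # solid black screen (all zeros)

static_ratio = len(zlib.compress(static_frame)) / FRAME_BYTES
flat_ratio = len(zlib.compress(flat_frame)) / FRAME_BYTES

print(f"static compresses to {static_ratio:.0%} of original")  # ~100%
print(f"flat   compresses to {flat_ratio:.2%} of original")    # well under 1%
```

A recorder that keeps up with a flat desktop can still fall behind on static, because every frame costs close to its full uncompressed size in CPU and disk bandwidth.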


We will attempt to get our relatively few mobile devices to load the system if possible and do some basic usage.


If it is not easy to do the basics of screen/video capture, then users may ask for their money back or go find a different product.

- Feature comparison:

• Save a file with a good and a bad file name.
• Record and Capture on app basis, full screen, area
• Two monitors vs One Monitor
• Editing if supported
• Sound if supported
• Pan and zoom if supported
• Arrows, Text, Captions, etc.
• Transitions
• Competitive Intel:
    o http://www.techsmith.com/tutorial-camtasia-8.html
    o http://www.techsmith.com/jing-features.html
    o http://www.telestream.net/screenflow/features.htm
• Long time record (If possible)
• Upload Tools
• Formats supported
• Merging/Dividing recordings
• Tagging / describing the videos other than just the file name
• How easy is it to take the output and use it with another system (e.g. I want to quickly use screen shots and videos from this tool and add them to a bug I created in JIRA)
• Can you add your own voice to the recording? Like narrate what you are doing or what you expect via the microphone on the device you are using
• Can you turn this ability off so you don't hear Isaac swearing at the system or can you remove the swearing track after the fact?

- Who are the stakeholders for this testing?
- What are the requirements? What is the minimum viable feature set?
- What is the goal of this testing (Important bugs, release decision, support costs, lawsuits, etc.)?
- Can you please give 3 major user scenarios?
- What is the typical user like? Advanced? Beginner?
- What are the top 3 things the stakeholders care about, such as usability and security?
- Can you give an example of the #1 competitor?
- What sorts of problems do you expect to see?
- Does the SUT require multi-system support? What systems need support (e.g. mobile, PC, Mac)? Are there any special features limited to certain browsers/OSes? How many versions back are supported?
- What is the purpose of the product? Do you have a vision or mission statement?
- Can you describe the performance profile?
- Is there any documentation that we should review?
- What languages need to be supported?
- Does the application call home (external servers)? Are debug logs available to us? Even if not, is anything sensitive recorded there? Are they stored in a secure location, either locally or externally?
- Do we know what sorts of networks our typical user will use to download the app with? Do we know what the average patch size is?
- Are there other possible configurations that might need to be tested?
- How does the product make money? What is the business case rather than the customer case?
- Should we/can we do any white box testing?
- Can we get access to a developer to ask questions regarding the internals of the system, code coverage, etc.?

Sunday, May 18, 2014

Software Test World Cup 2014 - Part 1

[Part 2 is now posted which includes the scores we received and an analysis.]

I recently participated in the Software Testing World Cup 2014. Just to get it out of the way: our group did place in the semifinals, but did not win. However, that is of less importance to me. Wait, you say, what is the point of a contest if not to win? Well, for me, I was more interested in getting feedback, learning from the experience, and seeing what the idea was. Since this is our blog, not my resume, I'd rather talk about what we did than sell myself. What I am going to try to capture is the experience and outcome of the event.

In writing about the event, I should be clear that a large part of it happened before the event even started. We were asked to prepare to interact with the customer, test the software, write bugs, and come out with a report on what was found. We were also told the judging would be on: being on mission, quality of the bugs, quality of the test report, accuracy, non-functional testing, and interaction with the judges.

Knowing that, I did a few things. First, I assumed the product was likely to be web based, as most apps don't work on all platforms, so I started to enumerate what sorts of non-functional tests we could do. This turned out to be wrong, but in my view it is better to be over-prepared than under-prepared. Then I looked into how to write such a report, as I don't normally do that sort of consultant-oriented paperwork; I have conversations with the developers and work closely with the product owner. Once I had an idea for the format, I created a list of questions we would want to ask. Some were perhaps excessive, but again, I assumed more was better: in the moment I could trim the list based on how the customer answered, but a question I had failed to think of could not be 'included' unless I had spent time thinking about it beforehand. Ironically this felt a little heavy (for a group of judges composed of lots of pro-CDT people), but again, these sorts of reports are heavy in my view.

At this point we had generically pre-processed the report and the interaction with the judges. We were told on the day of the event that it was visual screen capture software, so I did some research before the start of the contest to compare feature sets. I captured some possible tests we might do and documented those.

While I have built my own screen recording tools, and even automation tools, I have rarely gotten the chance to test a massively well-known support tool like SnagIt. We didn't know it was SnagIt until half an hour before the competition actually started. With half an hour to go, we started to organize our thoughts and take a look around the tool. I created a few product-oriented questions and found some obvious bugs. One that really concerned me was that the order page link did not work. That turned out to be important later. I wrote notes down but didn't file bugs, as the contest had not started; I was just trying to understand the product.

When we were told we could start asking questions, I started peppering the judges with questions, but tried to pace them slowly enough to know whether I needed a follow-up question. The judges' ability to respond to feedback was a little poor. For example, I wrote a question regarding screen capture in Firefox, which involves a plugin, something that seemed poorly documented. I missed saying it was for Mac, as the plugin is Mac-only. They got confused by this, and I updated my question, but they were not looking at the chat log. I don't mind the video chat, but that became frustrating for me.

We as a group decided two people would be in Mac land, one would be in a Windows VM, and I would split my time between my personal Windows 8 box and my Mac. We split up the functionality as well and started to test. We did a load test using static fuzzing for a recording (e.g. something that wouldn't compress well), and since my Windows 8 box has very few resources, I did a load test where I recorded video while compiling. I know lots of other testing was done, but those were some of the more interesting tests that I can recall (this was all written weeks after the contest was over).

As a toolsmith who specialized in Windows, I appreciate how hard it is to build complex tools. However, when the product owner said that they used the tool to capture screenshots of bugs in the tool, it seemed funny, as we wrote up several bugs around exactly that. We found that claim to be incorrect, but I wonder if the judges heard or recalled it, since I didn't create a strong paper trail around it. Also, because the judges in general don't have conversations with people afterwards, it is hard to know what someone is thinking. I get that writing has to be clear, but how much context does one normally capture? Without knowing the culture, that might have been a throwaway line from the product owner, or it might have been serious. It makes me appreciate consultants in some ways, even if I do think they sometimes are more about conning and insulting than providing useful data.

It was genuinely neat to try to test something I would not have tested otherwise and doubt I will ever personally work on. It was fun. We found a lot of bugs. With one hour left I asked for permission to load test the download link, as that seemed part of the system and non-functional. We couldn't get a response. The judges were joking and talking about their experiences. Isaac tweeted at two of them and was ignored. The judges were filling the dead space that should have been silent with conversation, which I think was a detriment to the experience; but on the other hand, maybe that is part of the contest. Simulating that annoying coworker who keeps talking and talking in one long unbroken sentence, moving from topic to topic... Yeah, Wayne! Quit that.

While we were finalizing our report, the judges sent a response to one of my early findings about the order link being broken, saying it was expected. I had put in a post about that bug 2+ hours earlier and the judges had not answered. I didn't find out they had responded until after the contest was over. We were given a chance to amend our report because of this, which I appreciate, but I also think an earlier answer might have changed our testing strategy. When you find one bug, you often look for clusters of bugs, which is what I did. Obviously life is not fair, and I don't expect the first year of a contest to run perfectly smoothly.

At this point we know our placing but not our scores. The contest was broken up by judge, with points and scores attached. We got bonus points for helping one user, or at least that is what we were told. I wonder if we will get the score details, or if there will be fear that releasing them might cause disputes like "you forgot to give points for X." It would suck if our placement went up or down at this point. But we shall see. I will post a second blog entry based upon the score information we are given.