About 98 Percent Done: November 2013

Monday, November 25, 2013

Context

So recently my children's school underwent a renovation over the summer. They changed the carpet from institutional (a dreary mud-grey) to fun (black and white stripes with occasional blocks of bright colors). They painted all the cubby hangers the kids had (used to be varnished particle board, now it's bright elementary colors). They moved / changed 10 of the 12 teachers around to different grades. The front office got a complete remodel, including windows where walls used to be.
My daughter came home the other day and said, "Dad, they made some big changes at school, and I'm not sure I like them." So in my practicing socratic style I asked her why she didn't like the new changes. "Well, they halved the salad bar, and put the silverware on the old half of the salad bar, so there aren't as many vegetables as there used to be. Can I start taking lunch to school with more vegetables?"
In her world, carpet, wall hangings, windows and teachers in grades didn't matter. What really mattered to her was how many vegetables were served at lunch.

And that's what most people think about when they think of your software. They don't care how it was envisioned, backlogged, developed, tested or deployed. They care about how it directly affects them in their attempt to use your product. Think about it from there next time.

Monday, November 18, 2013

Word of the Week: Oracle

Oracle Test Definitions

Thanks to:
Isaac Howard and Wayne J. Earl who had a great deal to do with the editing and formulation of this article.

Like my previous word of the week on Heuristics and Algorithms, this is a complicated one. According to wiki,

An oracle is a mechanism used by software testers and software engineers for determining whether a test has passed or failed.

According to the wiki citation, this comes from BBST, which has some thoughts about what an Oracle is or isn't. Specifically it talks a lot about Oracle Heuristics, an interesting combination that Bach roughly states as a way to get the right answer some of the time. I don't feel I have a problem with that, but then we go back into the BBST class and things get confusing. On Slide 92 of the 2010 BBST course, it says,

How can we know whether a program has passed or failed a test? Oracles are heuristics

Slide 94 says:

An oracle is a reference program. If you give the same inputs to the software under test and the oracle, you can tell whether the software under test passed by comparing its results to the oracle's.

Which later goes on to say that the definition is wrong. The slides does so because of the claim that Oracles are heuristics.

Classic Problems With The Definitions

But how can this be so if Oracles know all? They are the truth tellers. Well the problem in software is that Oracles are not absolute like in the stories. They give you an answer, but the answer might be wrong.

For example, you might test Excel and compare it to a calculator. You take the calculator and enter 2.1 * 1 getting back the value 2. Now perhaps the calculator is setup to provide integers, but when you compare it to Excel's output, you find that Excel gives back 2.1. This appears to be a failure in Excel, but in reality it is a configuration issue. The heuristic is in assuming your Oracle is right. This might of course be a false assumption, or it might be only right in some circumstances. Interestingly, one of the creators of BBST, Cem Kaner, has revised the definition of Oracle slightly in a posting about Oracles and automation,

...a software testing oracle is a tool that helps you decide whether the program passed your test.

Partial Oracle

While I don't want to get too far off track, I do want to note that Partial Oracles do exist, and from what I can tell, they are Oracles that tell you if the answer is even reasonable. For example, for two positive integers, you can say that when you add them together, you will get a larger number than either of the separate digits. 1+1=2. Thus 1<2. 3+4=7. Thus 4<7. In both cases the larger number is always smaller than the sum of the two numbers, for ANY two numbers.

New Questions

Let me chart out the idea of an Oracle:

Test: Tester runs test. Example: Login with valid user.

Result: Login takes 5 seconds and goes to a internal page.

Request for Data: Make a request out to the Oracle; Was 5 seconds too long?

Process: Oracle considers the answer.
Data: Oracle generates the answer: Yes, 5 seconds is too long.

Compare: Verify if Test's Result are acceptable.
Output: Test's Results are not acceptable.
React: Tester reacts to the result. Maybe they fail the test. Maybe they...

Now lets get picky. Is the monitor (the display) which is beaming information to you an Oracle? It shows the results you requested and is a tool. While I noted the parts that are the Oracle, who is this Oracle? If the Oracle is your mind, then what makes this different from testing?

My colleague Isaac noted that the requirements of a test in most definitions includes making observations and comparing them to expectations. For example, Elisabeth Hendrickson said,

Testing is a process of gathering information by making observations and comparing them to expectations.

Does this make an Oracle simply part of a test? Even to be able to come up with the question seems to indicate you might suspect an answer. Is this too long? Well, in asking that, one assumes you have a built in answer in your head. Perhaps you are wrong, but that is part of what an Oracle can be.

Alternatively, maybe an Oracle is an external source, thus it has to be outside of the "Tester". If that is the case, then can the Oracle be the System Under Test? Imagine testing using two browsers at the same time doing the above test and the login time has a large difference between browsers. Is the Oracle the browser or the SUT?

Lets take a different approach. Lets say you take screenshots after 4 seconds from logging in using an automated process. You compare the screenshot to the previous version of the result, pixel by pixel. If the pixels compare differently, the automation reports back a failure. Where is the Oracle? The request for data was to get a previous edition of the SUT in image form. No processing occurred, so the Oracle can't be the processing, but perhaps the image itself is the Oracle. Or is the Oracle the past edition of the site and the data? Continuing into this, the automation pulls out each pixel (is that the Oracle?) then compares them. But wait a minute... someone wrote the automation. That someone thought to ask the question about comparing the images. Are they the Oracle? Since the author is the Tester (checker, whatever) the first time, capturing the first set of images, they saw what was captured and thus became a future Oracle.

Even if the Oracle is always an external source, is it always a tool? Is a spec a tool? Is another person (say a manager) a tool? No, not that sort of tool. Is a list of valid zip codes a tool or just data?

In case you are wondering, many of the examples given are real with only slight changes to simplify.

How Others Understand Oracles - BBST

Perhaps you feel this detailed "What is it?" questioning is pedantic or irrelevant. Perhaps we all know what an Oracle is and my attempt to define it is just making a mess. In order to address that question, I am going to do something a little different from what I have done in the past. I'm going to open up my BBST experience a little, as I answered the question Kaner wrote about and then talk a little about it. To be clear, this answer has the majority of the content pulled out, as it could be used as an exam question in the future:

Imagine writing a program that allows you to feed commands and data to Microsoft Excel and to Open Office Calc. You have this program complete, and it’s working. You have been asked to test a new version of Calc and told to automate all of your testing. What oracles would you use and what types of information would you expect to get from each?

1. Excel (Windows) – Verify that the most common version of Excel works like Calc does. Are the calculations the same? Do the formulas work the same? If you save a Calc file can Excel open it? What about visa-versa?

a. I’m looking to see if they have similar calculations and similar functionality.

<snippet>

7. Stop watch – Does Calc perform the tasks at the same rough rate as Excel, Google docs? Is it a lot faster or a lot slower?

a. I’m looking to see if we are consistent with current user expectations of speed.

The responses I got were interesting in my peer reviews. Please note I removed all names and rewrote the wording per the rules in BBST. One person noted that item 7 would be categorized under performance and that item 1 was looking at consistency with comparable products. Multiple reviewers felt I was looking at consistency, a heuristic Bach created. What I find odd about that, is the need to label the Oracle when the Oracle (in my view at the time) was the tool, not the heuristic, therefore citing the heuristic of comparable products was not part of the question. I got a complaint that I was not testing Excel or Google but Calc, yet the call of the question is about how I would use those Oracles. One fair issue was, I should have noted I could have compared the old version to the new version using a stop watch, which I had missed. However, I had cited Old Calc in my full document, so I think that was a relatively minor issue.

Since Oracles are tools, how can I not be implicitly testing Excel? I kept hearing people say I should name my Oracles, yet to me I was naming them very clearly. I got into several debates about if something like Self Verifying Data is in fact an Oracle (even though the article clearly has its own opinion on that)! It seemed like everyone wanted to label the heuristic the Oracle, probably because of the "Heuristic Oracle" label in BBST. While I did feel BBST failed to make clear what an Oracle is, it did make me think about Oracles a lot more.

Wrapping Up

I'm sorry if that felt a little ranty, but by talking about this subject, I want you to also think about what you see as an Oracle. Oddly, Kaner himself cite's Doug Hoffman with items which I did not consider an Oracle (such as Self Verifying Data) when I started writing this article. I think Kaner's own work defends his view point, as he doesn't appear to use his own rule (his definition) to the letter but rather the similarity of the items, a method humans appear to use frequently.

Truth be told, I'm not so sure that Oracle should be a word we in the industry should be using at all. Isaac does not seem to believe in Oracles anymore, and appears to feel the definition is broken as it really cannot be separated from test. To me, I see that many people seem to use it and perhaps it can have value if we shrink down the role of the word. So let me wrap this up with my attempt to patch the definition into something that is usable.

Oracle: One or more imperfect external sources (often tools) that provide data or are data to assist in determining the success or failure of a test.

For a not exactly correct but useful working definition, an Oracle is the value that is the expected result.

What do you think a Oracle is? Are we missing critical details or other expert views? Feel free to write us a note in the comments!

Thursday, November 7, 2013

@Testing with [Reflections] Part II

If you haven't already, I suggest you read about reflections before reading too deeply into the topic of annotations. In case I failed to convince you or in case it didn't totally sink in, reflections as I see it is a way for code to 'think' about code. In this case we are only considering the "reflection" piece and not the "eval" piece that I spoke about previously. Reflections supports some pretty fascinating ideas and can be used in multiple different areas with different ways of doing so. Annotations (Java) or Attributes (C#) in comparison would be code having data about code which works hand in hand with reflections.

xUnit

Lets start with one of the most common examples people in test would be exposed to:

[Test] //Java: @Test()
public void TestSomeFeature() { /*... */}

First to be clear, the attribute (or commented out annotation) is on the first line. For clarities sake, for the rest of the article I'm going to say "annotation" to mean either attribute or annotation. It defines that the method is in fact a "Test" which can be run by xUnit (NUnit, JUnit, TestNG).

The annotation does not in fact change anything to do with the method. In fact, in theory xUnit could run ANY and ALL methods, including private methods in an entire binary blob (jar, dll, etc.) using reflections. In order to filter out the framework and helper methods, they use annotations, requiring them to be attached so they know which methods will run. This by itself is a very useful feature.

The Past Anew

Looking back at my example from my previous post, you can now see a few subtle changes which I have marked NEW:

class HighScore {
 String playerName;
 int score;
 @Exclude()//NEW
 Date created;
 int placement;
 String gameName;
 @CreateWith(Class = LevelNamer.class)//NEW
 String levelName;
 //...Who knows what else might belong here.
}

Again, the processing function could be subtly modified to support this change. The change is again marked NEW:

function testVariableSetup(Object class) {
for each variable in class.Variables {
 if(variable.containsAnnotation(Exclude.class)) //NEW
  continue;//NEW don't process the variable.
 if(variable.containsAnnotation(CreateWith.class)) { //NEW
  variable.value = CreateNewInstance(variable.getAnnotation(CreateWith).Class).Value();//NEW ; Create New Instance must return back the interface.
 }//NEW
 if(variable.type == String) then variable.value = RandomString();
 if(variable.type == Number) then variable.value = RandomNumber();
 if(variable.type == Date) then variable.value = RandomDate();
}

So what this code demonstrates is the ability to exclude fields in any given class by attaching some meta data to it, which any functionality can look at but doesn't have to. In the high score class we marked the created date variable as something we didn't want to set, maybe because the constructor sets the date and we want to check that first. The second thing we did was we set a class to create the levelName. The levelName might have a custom requirement that it follow a certain format. Having a random String would not do for this, so we created an annotation that takes in a class which will generate the value.

Now we could have a different custom annotation for each and every custom value type, but that would defeat the purpose of make this as generic as possible. Instead, we use a defined pattern which could apply to several different variables. For example, gameName also had to follow a pattern, but it was different from the levelName pattern. You could create another class called gameNamer and as long as it followed the same design (had a method called "Value()" that returned a string), you could just use the CreateWith(Class=X) annotation and they would act the same. This means you would not need to add another case in the testVariableSetup method or even change it. In Java and C# the mechanism for this is a common ancestor which can be either an interface or an abstract class. That is to say, they both inherit from the same abstract class or implement the same interface. For the sake of completeness and to help make this make sense, I have included an updated pseudo code examples below:

class HighScore {
 String playerName;
 int score;
 @Exclude()//NEW
 Date created;
 int placement;
 @CreateWith(Class = GameNamer.class)//NEW
 String gameName;
 @CreateWith(Class = LevelNamer.class)//NEW
 String levelName;
 //...Who knows what else might belong here.
}
// All NEW below:

class ICreateWith { String Value(); }
class GameNamer implements ICreateWith { String Value() { return "Game # " + RandomNumber(); }
class LevelNamer implements ICreateWith { String Value() { return "Level # " + RandomNumber(); }

//In some class
class some { ICreateWith CreateNewInstance(Class class) { return (ICreateWith)class.new(); } }

TestNG - Complex but powerful

One last example that is a little more of a real life example that is common in the testing world. Although I think the code might be a little too complex to get into here, I want to talk about a real life design and how it works in general. TestNG uses annotations with an interesting twist. Say you have a "Group", a label saying that this is part of a set of tests. Well, perhaps you have a Group called "smoke" for the smoke tests you want to run separate from all the others. TestNG might support filtering, but between TestNG and Maven and ... you decide you want to do determine the filtering of tests at run time using a flag somewhere (database, environment variable, wherever) that says "run smoke only". During run time, TestNG calls an event saying "I'm about to run test X, here is all the annotation data about it. Would you like to change any of it?" At this point you can read the Group information about the test. If your flag says smoke only, you can then check the groups the test has. If the Group list does not have smoke in it, you set the test to enabled=false, changing the annotation's data at run time. TestNG calls this Annotation Transformations. I call it cool.

The weird part is that you are modifying data at run time annotation data that is hard coded at compile time. That is to say, annotation values cannot be actually changed at runtime*, but a copy of the instance of them can be. That is what TestNG actually changes from what I can tell.

If you are reading this and saying, this topic is rather confusing, don't feel too badly. I know it is confusing. The TestNG part in particular is a bit mind bending. And to be clear, I don't see myself as an expert. There are way more complex ideas out there that just amaze me.

* This is from what I can tell. Perhaps there are reflective properties to let you do this. You can however override annotations through inheritance, but that is a more complex piece.

Friday, November 1, 2013

How do you know when you are right?

I have been thinking about this problem, the problem of correctness (which is what I mean by "right") for a good number of years, but never formally. This post was written a bit on a lark, so forgive the lack of intellectual rigour in the post. So what sorts of strategies do we have at are disposal:

Faith - Assume another has all the knowledge and take it as The Truth(tm) without a good reason for confidence.

Example: My fortune cookie said ... and while it has never been right, it will be this time.

Personal History - Using your personal knowledge to judge the correctness.

Example: You've known your smart friend for years and he typically doesn't lead you astray, so I trust this to be true.
Example: My teacher said the earth is round.

History - If it ain't broken, don't fix it.

Example: My grandfather worked as a miner, my father worked as a miner, therefore I should work as a miner.

Senses - Observe and deduce.

Example: I see a black sky every night, so the sky must be black everywhere during the night.

Logic - Attempt to determine rightness by applying a set of rules.

Example: If I am floating, I am not on Earth.

Scientific Method - Test it and test it until you believe you have a more accurate world view.

Example: 5 / 6 times I tried to login with this username/password, it succeeded. Therefore, this username / password must be correct.

Probability - Attempt to assign probabilities that a given item is right. This might be via the scientific method, your senses, etc.

Example: If I am floating, I have a 60% chance of being in space, a 39% chance of being on the vomit comet and a 1% chance of something outside of my experience.

Research - Use the "Faith" principle, but take multiple sources. Intentionally look for counter arguments. Intentionally look for consensus.

According to Lou Adler and Venkat Rao, you should ask an interviewee to tell a stories about their success. Others, like Nick Corcodilos note that different questions are useful for different requirements.

Random / Sounds Good- Choose a value at random or seemingly random and claim it is right. This might be a intentional lie or it could be the first thing that 'popped' into your head.

Example: Q: How many wheels does a typical truck have? A: 16. Q: How did you get the value? Well I knew a little, but I basically guessed.
Example: Q: Are you a better than average driver? A: Yes. (80-90% of the American population says yes, thus some are lying or 'randomly' choosing yes because it sounds good).

Assume - Sometimes we just assume something is the truth without knowledge. Cultural ideas follow this often.

Example: Barber shops are where you get your hair cut. (Not always; some are also brothels for example.)

Null - I refuse to answer the question or I don't know.

Example: Q: What is the smallest atomic unit in the universe? A: I don't know. (Even shades of this like, smaller than a truck or smaller than an atom could fall into this category).

I am wondering, what other ways do people "assign" the idea of "right" (as in correct) to a statement. Do you find that you use multiple of these, and which of these strategies are the most helpful? Care to add to the list?

One last question I have in mind with this topic that I think Scott Adams hit reasonably well with BOCTAOE (But Of Course There Are Obvious Exceptions). How much rigour do we need in any given statement to be clear? Does the audience matter? Does the % accuracy matter? "Login works" might be right, but not right for all customers (or under all loads). If I had to mention all exclusions of rightness in all my statements, I might end up needing a EULA just to let someone hear me (or read my works). Last but not least, is our inability to recall the exact truth a just plain human flaw, making the concept of being 'right' really null in and of itself. Should right be really "within an order of magnitude of the 'truth'"?

I'm not sure I know the answers to these questions, but I think there is a high probability that I have an opinion which maybe within an order of magnitude of the truth, but for now I'm going to say I don't know.