Showing posts with label Automation. Show all posts

Wednesday, July 15, 2015

My Many Years in Testing and Automation Development

I have heard the question expressed in many different ways.  "How do I get into QA?"  "How do I progress my career?"  "How do I level up my skills?"  "You read whitepapers?!?"  "What type of automation should I write?"  I have written various articles around this broad range of topics, but I wanted to take a slightly different tack on one of the implied questions: how do I continue down the path from where I am to become a better tester, and which skills should I spend time improving?  Instead of trying to explore the topic in a general sense, I am going to go through a few interesting experiences I have had in both "Testing" and "Automation Development".  While I will be talking about automation, I do not mean to imply that is where you should spend your time learning, just that I will discuss how I made the choice.  I will try to concentrate on interesting stories and lessons learned.

I have been writing code for nearly half my life, but I started working in test automation development somewhere between 2004 and 2005.  I started testing professionally in 2002, although I had been writing and testing my own code before that, and I also did some unpaid testing for others before 2002.  I will try to describe the companies and context a little, but I will not use company names, to protect the innocent.  But first, a small introduction to explain how early on I became interested in QA.

When I started my life in QA, I was roughly 6 years old and curious about a clock in a video game called "The Last Ninja" for the NES.  It kept a timer of how long you had been playing the game, showing hours, minutes and seconds.  There was a place with a stream and a park bench where no bad guys were.  It was perfect for leaving my ninja to sit and just enjoy the scenery.  In only 99 hours, 59 minutes and 59 seconds I could find out what happened when the clock ran out.  I knew Sonic would run off the screen after 3 minutes, so I conducted some smaller tests first.  In those smaller tests, I discovered that, unlike Sonic, The Last Ninja did not seem to mind waiting for short periods of time, so it seemed easy enough to test the longer case.  The only problem was I didn't want to lose my Nintendo for a week while I waited for the result.  So when I went on vacation, I left my system running.  When I came back, what I found was "0A" on the clock and a frozen game.  Now, many years later, I know 0A is hexadecimal for 10, likely meaning the clock had rolled over into hex and then the game had crashed.  I didn't understand the result, but I was curious about it.  I was later grounded for leaving my system on for that long.  You've been warned, kids!

Years later, I was looking to get a job out of high school.  I had been programming text adventures for some years, but had never had a 'real' programming job.  I enjoyed programming and the experience of creating, but still didn't feel like I knew that much.  Still, I kept at it.  I had worked at the Post Office for 3 weeks during a Christmas rush, but had little other professional work under my belt.  I was going to school and studying Computer Science, but that was going to take years and I needed money sooner.  So I started looking for a computer job.  I interviewed once, but didn't know what I was doing and did not get the job.  College and high school don't seriously prepare you for interviewing.  In my second job interview, I took a computerized test and passed all but one question (I tried typing in a search where they expected me to use the dropdown...  No, I'm not still bitter.).

I got the job and started working as a managed contractor for an overly large company that worked on specialized hardware.  I was a tester who did everything from competitive intel to stress testing to environment testing to configuration testing.  However, I had no training and really no idea that testing was a general skill set.  I incidentally picked up a few simple QA skills, but nearly all of the learning I did in my first year was around social skills.  Perhaps the most important lesson was that apologizing is important, doubly so when you are a manager and have made a mistake.  I moved to a more complex set of testing, but it was script based.  I learned more about networking and security testing.  I almost got a job on a team that was inspired by James Bach's work around exploratory testing (in one of the places where James and his brother did some of the development of the idea of exploratory testing), but I missed out to someone with a lot more experience.

Then, in a sad set of events, a former lead who had changed shifts passed away suddenly.  I took his place, an awkward thing for sure.  I learned about this idea that you could 'automate' tests.  I began to apply my existing programming skills to test various UI elements.  It worked well in some cases, but we struggled to handle descriptions (a QTP term for a way of identifying elements in a UI system).  A co-worker of mine had more difficulty picking up the programming, as his specialty was hardware, not software.  He came into work more than once complaining of nightmares about programming.  My coworker did eventually get better at automation, but that was never his strength.  (Aside: He's now a big-wig manager, likely making way more than me!)

I bounced around from place to place, with a rough total of nine different jobs (not all of which were title changes).  One thing I notice looking back is how I was always trying to go meta: not just writing good automation but improving the entire process.  In a large company, that can be very difficult.  I ultimately had a conflict with one of the managers and felt the need to leave.  Before I left, I was told to quit trying to improve the organization (by providing useful tools) and instead work on personal projects!  I did, and learned SQL and web development while looking for a job.  I also had multiple open source projects under my belt by this point.

If a big company didn't work, I reasoned that a smaller company would be better!  I applied to one company and studied up.  Rather than applying to many companies, I reasoned it was better to be prepared; compared with how poorly I had done years earlier, this approach seemed to work reasonably well.  I got a job offer.  I tried to negotiate for more money, but was declined.  I turned the job down, but was later called back and given a slightly higher offer.  I concluded that, because of the recession, a much bigger offer was not going to happen.

I took a job at a 50 person company, and within 2 weeks of my joining, the QA manager had quit.  This left the QA group answering to the VP of Engineering, and it remained that way for nearly 2 years.  I worked longer hours at this company than anywhere else.  I lost another co-worker, one I was training, and I may have been the last person he talked to.  He was hit by a bus.  I learned that the 'hit by a bus' scenario is real, and that the loss of business knowledge matters far less than the emotional impact of the loss.  I learned what the term 'death march' meant as I worked 80+ hour weeks.  I learned how a recession can affect you even when your field is in demand.  I learned the dangers of saying that you won't ship with any [known] bugs.  I learned that working more hours did not equal more productivity.  I wrote nearly 1300 automated UI tests.  I found the most bugs at the company, in part because of the automation I wrote, and since all bugs must be fixed, it must logically be concluded I found the most important ones.  (Yes, that is sarcasm.)  Yet I was not promoted nor paid more for nearly the entire 3 years I was there.  Only after 50% turnover and the firing of upper management did things change even a little, and by then I was too tired to care.  I left.  In case you want more details, Isaac detailed his own personal journey at this same company.

Now we are getting into more recent events.  To be perfectly honest, I don't plan on saying anything more about the various company cultures because of how it might affect me.  However, I still learned social and technical lessons.

I learned that 'manual' testers could appreciate automation and automation results.  For the first time, I got to work with other people whose coding skills equaled or bettered my own as we developed automation.  I got to learn how to play technical leader, even though I was not promoted into a leadership position and had no authority over others.  While I had dealt with databases and done a little database testing before, I got to do development in which nearly all the code was around database usage, including communication between database systems.  For the first time I had to deal with integrating multiple teams' automation efforts, written in different languages.  By this point in my career I had written test code in C, Java, JavaScript, SQL (various flavors), C#, J#, VB.NET and VBScript.

On the social front, I got to work with both one of the best teams and one of the most difficult teams I ever worked with.  In dealing with the difficult team, I got to see how personality clashes work and how to work around them.  To be fair, I have my own personality quirks too, but when these hit extremes it is imperative that you figure out how to be a positive force for the team rather than yet another point of contention.  Perhaps one of the biggest lessons I learned was that it is important to have a solid social connection with, and understanding of, an individual before giving a completely honest evaluation of their skill set.  In fact, I took the lesson so much to heart that I try to be more careful with my day-to-day speech as a whole.

Finally, I want to describe a little of my attempts at professional growth.  I spent a good amount of time embracing various avenues for learning.  I took BBST, and while I only took the class for the first BBST course, I watched all the videos and read all the material.  I have written on this blog for a few years now.  I have spent time giving talks at various venues.  I have written letters, written for other professional venues, tested non-work apps, and taken serious amounts of time trying to understand the systems under which work is performed.

So to sum this all up, I think I have just two simple suggestions for you:

Learn

Making mistakes is fine.  Just learn from them.  Learn how you learn.

Don't just do work

We are all on a journey, and at some point it will stop.  Making an impact is important.  But don't make work the only impact in your life.

Tuesday, July 7, 2015

Autonomation: Old is New Again; Toyota and Lean

I came upon the word Autonomation recently and felt it was interesting enough to bring up.  The concept comes from Toyota's production system, which was later popularized in the West as Lean Manufacturing.  The primary goal of Lean Manufacturing is to eliminate waste and thus improve production and quality.  Autonomation is also referred to as jidoka in Toyota's TPS.  Autonomation, or jidoka, is one of the pillars of Lean Manufacturing, meant to trap failures at the point they occur rather than let them produce defective results.  The earliest example Toyota notes is from 1924.

In 1896, Sakichi Toyoda invented Japan's first self-powered loom called the "Toyoda Power Loom." Subsequently, he incorporated numerous revolutionary inventions into his looms, including the weft-breakage automatic stopping device (which automatically stopped the loom when a thread breakage was detected), the warp supply device and the automatic shuttle changer. Then, in 1924, Sakichi invented the world's first automatic loom, called the "Type-G Toyoda Automatic Loom (with non-stop shuttle-change motion)" which could change shuttles without stopping operation. The Toyota term "jido" is applied to a machine with a built-in device for making judgments, whereas the regular Japanese term "jido" (automation) is simply applied to a machine that moves on its own. Jidoka refers to "automation with a human touch," as opposed to a machine that simply moves under the monitoring and supervision of an operator. Since the loom stopped when a problem arose, no defective products were produced. This meant that a single operator could be put in charge of numerous looms, resulting in a tremendous improvement in productivity. - http://www.toyota-global.com/company/vision_philosophy/toyota_production_system/jidoka.html


The term Autonomation feels similar to a term I coined previously, Manumation.  In the wiki article on Autonomation, I found it interesting that Shigeo Shingo claimed there were 23 stages between fully manual and fully automated processes.  Unfortunately, there is no citation for where that claim was made, and while I saw others repeat it, no one had any citation or data on the stages that I could find.  In my mind, Autonomation is just one form of Manumation.  However, it is also an attitude.  You don't have to try to fully automate something on your first attempt at creating automation.  The idea is that you set up your automation knowing it will fail, that it will need humans, but that it doesn't bother the humans until it fails, and that the failure is easily traceable and fixable.  It also means attempting to fail quickly rather than generating a bunch of waste work.
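To make the attitude concrete, here is a minimal sketch of a jidoka-style runner.  All the names here (JidokaRunner, addStep, run) are mine, invented for illustration: each step either passes or "stops the line," at which point no further steps execute and the failing step's name is surfaced so a human can trace and fix it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of autonomation: run steps in order, stop at the
// first failure, and report which step failed so a human can take over.
public class JidokaRunner {
    private final Map<String, Supplier<Boolean>> steps = new LinkedHashMap<>();

    public JidokaRunner addStep(String name, Supplier<Boolean> step) {
        steps.put(name, step);
        return this;
    }

    // Returns the name of the first failing step, or null if all passed.
    // Stopping early is the point: no waste work happens after a defect.
    public String run() {
        for (Map.Entry<String, Supplier<Boolean>> e : steps.entrySet()) {
            if (!e.getValue().get()) {
                return e.getKey(); // stop the line; bother the human now
            }
        }
        return null;
    }
}
```

The human is only bothered when `run()` returns a non-null step name, mirroring the loom that stops itself only when a thread breaks.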

Ultimately, automation of any sort is meant to help people.  If it helps you get work done, even if it requires a human touch, it is worth considering.  What sort of autonomation have you done?

Thursday, April 9, 2015

The Future of Load Testing

First of all, I won't pretend I actually know what the future of Load Testing will look like, but I want to describe some of the different ideas I have seen and tried.  Some of these things I have not seen or heard of from anyone else, certainly not on the internet.  So hopefully these will expand your thinking around Load Testing.

What is the Purpose of Load Testing?


Functional testing is designed to exercise the functionality of the system.  Frequently people talk about using Selenium or QTP to exercise a particular piece of functionality, often in the UI.  The Test Pyramid suggests these sorts of tests should try to hit below the UI level for various reasons.  No matter whether you test an API, a UI, a console app or some other hook below the UI, if your concern is the Testing Pyramid, it is likely you are trying to test functionality.  You are often interested in large sets of behaviors and the ways the system responds.  When you do Load Testing, in most cases, few of those broad tests are as important.  You are not trying to see if the system will handle all the corner cases, and often Load Tests don't check anything besides a response code.  The raison d'être of a Load Test has less to do with how the system functions and more to do with how the system reacts to many different inputs occurring in a relatively short time.  Granted, some Load Tests are less about the number of inputs and more about the style of the input (e.g. large files) or other types of constraints (e.g. less RAM).  To sum it up in a general statement, Wikipedia describes it this way:
Load testing is the process of putting demand on a system or device and measuring its response. - http://en.wikipedia.org/wiki/Load_testing
However, the majority of Load Tests simply want to understand how many inputs a system can handle given a certain profile of inputs.
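That general definition can be sketched in a few lines: put demand on the system, measure the response.  This is not any particular tool's API; `sendRequest` is a hypothetical stand-in for whatever real call (HTTP, API, queue) your system takes, and the code simply records per-request latency for successful responses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Sketch of "putting demand on a system and measuring its response":
// fire N requests, record the latency of each one that returned 200.
public class SimpleLoadProfile {
    public static List<Long> measure(Supplier<Integer> sendRequest, int requests) {
        List<Long> latenciesNanos = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            long start = System.nanoTime();
            int status = sendRequest.get();  // many load tests check little more than this
            long elapsed = System.nanoTime() - start;
            if (status == 200) {
                latenciesNanos.add(elapsed);
            }
        }
        return latenciesNanos;
    }

    // A crude percentile over the recorded latencies (0.0 to 1.0).
    public static long percentile(List<Long> latencies, double p) {
        List<Long> sorted = new ArrayList<>(latencies);
        Collections.sort(sorted);
        int idx = (int) Math.min(sorted.size() - 1, Math.round(p * (sorted.size() - 1)));
        return sorted.get(idx);
    }
}
```

Everything beyond this skeleton (threads, profiles, instrumentation) is where the real tools and the real arguments live.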

All of that sounds rather abstract, but if you go read Microsoft's very handy guide on types of performance tests, you will see that the underlying purpose of this sort of testing varies.  They use terms like load test, performance test and stress test with different meanings.  I think all of that is really useful and valid.  I, however, am going to use Load Test (capitalized) to cover any number of different purposes.  Instead, what I want to look at is the ways we can organize our testing to make for a better long-term experience.  These ideas could be applied to many of the various purposes of Load Testing, so assuming you understand your Load Testing goals, you can tailor them to your organization.

35-50% of the Internet Traffic


The first idea I think worth exploring is the Netflix model of Load Testing.  Trying to maintain a production-like QA system is silly for Netflix, because that would be like having a second copy of the internet for QA.  In fact, if you were like Netflix, you would then need an insane number of additional systems to generate load anything like your customers do...  or you could just have production traffic mirrored between the two systems.  A second prod-like environment is of course not going to work, so they came up with a radical set of strategies, and I think this sums it up nicely:

We have found that the best defense against major unexpected failures is to fail often. By frequently causing failures, we force our services to be built in a way that is more resilient.  - http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
The best method they found was basically to attack production and see how it responds.  That still leaves the question of how they specifically test that their code will handle new loads.  While they do some Load Testing, their biggest defenses are the scaled-down production traffic mirrors they run, the ability to roll back to old code, and the fact that AWS lets you spin up new instances at any time.  In effect, they turn the problem on its head, but this is really only helpful if you can use AWS to grow quickly and your traffic is relatively uniform.  Also, at Netflix scale, you can hire a lot of engineering talent to build this up.


The Limits of Load Test Systems


When leveraging user traffic doesn't work, you have to start looking for other options.  Using something like JMeter is interesting.  I call out JMeter because that is what Netflix uses above and beyond their production traffic mirror.  A tool like JMeter records traffic from a proxy and feeds it into a script (I suppose you could hand-code it if you are crazy).  Then you edit the script and parameterize it.  You run the script over and over again using multiple threads to try to create load.  These tools might instrument the systems under test, or you might have to do that yourself.  In either case, the data gathered is output and left for some poor soul to try to understand.  Having been in this position several times, let me say that it truly is difficult to understand these results, in particular because the scripts came from proxy record-playback.  Just as it is a bad idea to use record-playback in automating your functional tests, I think it is a bad idea with these tools.

At one of my former companies, there was one specialist whose only job was to deal with these proxy-recorded scripts.  They are a mess, and I'm not going to pretend I know how to fix them.  However, I do have a few ideas, and all of them involve your already-created friends, the functional tests.

Functional Tests == Load Tests: With or Without UI

When you have a functional test, you might have a complex setup and teardown.  However, the test itself is often fairly simple.  There are two major ways to create functional tests: one is through the UI, and the other is to hit just below the UI, perhaps at an API level.  So logically you can do two things to create load.  One is to scale up your UI tests.  I have seen this done and know of others who have tried it.  It is a fairly big engineering feat to create a Load Test using Selenium, but I know it can be done.  Be warned, this can be very expensive, as it requires one OS per handful of threads you want to run, plus overhead for Selenium hub nodes.  The other option is to use your existing API tests and create a Load Test on top of them.  You might have to simplify the data creation, and you might have to remove the validation if it takes too long, but this is a very easy method of Load Testing the system.  I have personally built several Load Tests around this idea.
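The API-test route can be sketched very simply: take the same test body your functional suite already runs and submit it to a thread pool many times.  This is an illustrative sketch, not code from any real suite; `apiTest` stands in for whatever your existing API test calls, and the heavy validation is assumed to have been stripped out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of reusing a functional API test as a load test: the same test
// body is run concurrently many times, counting only pass/fail.
public class FunctionalAsLoad {
    public static int runUnderLoad(Callable<Boolean> apiTest, int threads, int iterations)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger passes = new AtomicInteger();
        for (int i = 0; i < iterations; i++) {
            pool.submit(() -> {
                try {
                    // under load we skip heavy validation; a boolean is enough
                    if (apiTest.call()) passes.incrementAndGet();
                } catch (Exception e) {
                    // a failure under load is data too; only passes are counted
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return passes.get();
    }
}
```

Because the load test calls the same code as the functional test, any fix to the functional test flows into the load test for free, which is the whole argument of this section.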

Now, we have talked for years about how functional tests should be run in a CI environment.  If you are writing your Load Tests like you write your functional tests, using the same systems, then why not run your Load Tests nightly?  Obviously there are some questions you want to ask up front, like whether someone will be alerted because of it, or what impact it has on your functional tests.  Another question is what sort of load you want.  If you need to actively watch to make sure the system stays up, a nightly Load Test would not make much sense.  On the other hand, what if you did a small load for a short period of time?  You could capture the resulting data and plot it on a chart.  As you gathered more nightly data, you would build a rough understanding of what to expect.  Now you aren't running a Load Test once a sprint, with little idea of what changes might cause impact, using specialized scripts that take a lot of effort to maintain.  Instead, you have a trend line and will notice changes.  This won't tell you when the system will fall over, or some other data points, but it does give you a change detector.  Furthermore, when you fix your functional test, the fix automatically goes into the Load Test.
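The "change detector" idea can be sketched in a couple of lines: keep each night's average response time and flag tonight's run when it drifts well above the history.  This is my own toy illustration, and the 1.5x threshold is an arbitrary placeholder, not a recommendation.

```java
import java.util.List;

// Sketch of the nightly trend line: compare tonight's average response
// time against the historical mean and flag large drifts.
public class NightlyTrend {
    public static boolean looksLikeRegression(List<Double> historyMillis, double tonightMillis) {
        double mean = historyMillis.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(tonightMillis);
        // a change detector, not a pass/fail verdict; threshold is arbitrary
        return tonightMillis > mean * 1.5;
    }
}
```

A real version would plot the trend and probably use something smarter than a fixed multiplier, but even this crude check turns nightly data into a signal.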

The next piece is that you can run multiple 'threads' at once, not all of which are load related.  If your Load Test can't do validation, you can run some functional tests while the Load Test runs to see if the system still appears to function.  Since this is a job in your CI manager, it should be easy to kick off.  You can even test manually while your Load Test runs.

Eventually, you might realize that your functional and load tests only vary in how much load there is and how complex the setup/validation is.  You might realize that, like the Load Test, you can 'metrickitize' the functional tests so that you notice when a functional test starts taking longer.

Load Test?  What is that?

If you follow my description, my efforts are to make the code base easier to maintain and to gather as much data as possible.  This ultimately makes your Load Testing efforts and your functional efforts look very similar.  They live in the same code base and call the same functions; effectively, the concepts merge.  The only differences are in the design, setup, cleanup and how heavy the validation is.  I do think there is value in having different words to describe the intent, but merging the code lets you get more done with fewer resources.  One of the biggest advantages I have personally seen is that I actually understand what the test does, whereas when I was using tools like JMeter, I often had no idea how it worked.  I have learned a ton about HTTP and HTTPS just by building my own tools.  Not everyone has time for that, and I think ultimately we will want tooling to make this easier.  However, I am not sure the cost of having a different tech stack and code base is worth the value the current tools provide, so you might have to make your own for now.

If you have found your Load Testing tools are working well today, then feel free to ignore this.  I know not everyone has the same needs we have.  I know the tools we have today do serve some purposes, but my experience hints that they are frequently as much trouble as the value they add.

"In the Year 2000"

- Conan O'Brien, et al.

In trying to predict the future, it is really difficult to say what will or will not happen.  Conan O'Brien has been predicting what will happen in the year 2000 for more than 15 years, but unlike him, I have not had the benefit of seeing the future.  With that said, I suspect that in the future we will see more machine-learning-style systems that take our Load Test data and create load profiles based upon real-time data.  Such a system would adaptively adjust based upon metrics in the systems under test, and would also detect what changes happened and what code appears to have caused a particular slowdown.  It will correlate this and help find what is causing systems to fail.

I also suspect that we will have a better set of load profiles that can push or break systems.  Load Test systems of the future might go through those profiles on a daily or per-build basis and inform you when a build or day has uncharacteristic results, again based upon some form of machine learning.  It will start looking a lot more like functional tests, which you only examine when something strange occurs or when you are auditing your tests.

Obviously all of this takes a fair amount of work and effort to produce.  We presently don't have the tooling to do this, and while bits and pieces have been worked on, I have heard of no one actually doing this.

What sorts of things would you like to see in future load testing systems?  What areas have you struggled with?  I have found very little experiential data around load testing.  I'd love to hear from others on this topic, even if you just dump a link to your blog post.

Friday, June 27, 2014

My Current Test Framework: Testing large datasets

I recently wrote about how I felt few people talk about their framework design in any detail.  I feel this is a shame and should be corrected as soon as possible.  Unfortunately, most companies don't allow software, including test automation, to be released to the public.  So most of the code we see is from consultants, with companies occasionally okaying something.  In my case, I did something like a clean-room implementation of my code.  It is much simplified, does not demonstrate anything I did for my company, and does not directly reference them.  It is open source and free to use.  Without further delay, here is the link: https://github.com/jc-d/Demos.

It is in Java and was intended as part of a demo for a 2 hour presentation.  Because of the complexity of the system, I'll write some notes about it here.  I used IntelliJ to develop this code and recommend using it to view the code.  It uses Maven for the libraries, including TestNG, which you can run from IntelliJ.  Many of the concepts could be translated to C# with little difficulty.

So what does it do?  It demonstrates a few different, simple examples of reflection, and then a build-up of methods for generating test data in a reflective and possibly smarter fashion (depending on context).  I'm sure you're sick of hearing about reflection from me, so I'll try to make this my last talk on it for a while, unless I come up with something new and clever.

As a brief aside, Isaac claims that while this is a valiant attempt at a written code walkthrough, it really needs to be a video or audio recording.  Perhaps so, but I don't want to devote the time unless people want it or will find it useful.  As I have my doubts, I'm going to let the text stand and see if anyone requests a video.  If someone does, maybe I'll put some time into it.  Maybe. :)

Now on to the code...

First, the simple examples, which I do use in my framework, though not in as simple a form as this.  The two simple examples are DebugData and ExampleOfList, both under test/java/SimpleReflections.  DebugData shows how you can use reflection to print out a simple object, one level down.  It 'toStrings' each field in the object given.  Obviously, if you wanted sub-fields, that would take more complex code, but this is often useful.  ExampleOfList takes a list of strings, runs a method on each item in the list, and returns the modified list.  Obviously this could be any command, but for simplicity of the demo I limited it to methods that do not take arguments.
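To give a flavor of the DebugData idea without reproducing the repo's code, here is a minimal sketch of my own: reflect over an object's declared fields and 'toString' each one, a single level down.  The class and field names here are invented for the example.

```java
import java.lang.reflect.Field;

// Sketch of the DebugData idea: dump each field of an object, one level down.
public class DebugDump {
    public static String dump(Object o) throws IllegalAccessException {
        StringBuilder sb = new StringBuilder();
        for (Field f : o.getClass().getDeclaredFields()) {
            f.setAccessible(true);  // reach private fields too
            sb.append(f.getName()).append("=").append(f.get(o)).append("; ");
        }
        return sb.toString();
    }

    // Tiny sample object for demonstration.
    static class Point {
        int x = 1;
        String label = "a";
    }
}
```

Calling `dump(new Point())` yields a line containing each field name and value; sub-objects would print via their own `toString`, which is exactly the one-level-down limitation the text mentions.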

Now all the rest of the code is around different methods for generating data.  I will briefly describe each of them and if you want to you can review the code. 

The HardcodedNaiveApproach is where all the values are hard coded as quoted strings, e.g. x.setValue("Hard coded");.  This is a fine method for 1-3 tests, but if you need more, you probably don't want to copy and paste that data.  It is hard to maintain, so you might move to the HardcodedSmarterApproach.  This method uses functions to return objects with static data, so you can follow the DRY principle.  However, the data is the same each time.  So you add some random value, maybe appended to the end.  The problem is: what are your equivalence class values?  For example, do you want to generate all Unicode characters?  What about the error and 'null' characters?  Are negative numbers as valid as positive numbers?  If not, then your methods are less DRY than you might want, as you will need different methods for each boundary, if that matters.  Also, you are writing setters for each value, which might fail when a new property is added.  We haven't even talked about validation yet, which would require custom validators based upon the success/failure criteria of the functions you write.  That is to say, if you generate a negative number and the operation should fail for that, not only does your generator have to handle it, but your validator does as well.  What to do?
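For concreteness, here is my own sketch of the "hardcoded smarter" pattern described above, not the repo's actual class: one function owns the canonical valid test object (DRY), with a random suffix so repeated runs don't collide.  Customer and its fields are invented for the example.

```java
import java.util.UUID;

// Sketch of the HardcodedSmarterApproach: a single builder function owns
// the canonical valid data, with a random suffix so values differ per run.
public class TestDataBuilders {
    static class Customer {
        String name;
        int age;
    }

    public static Customer validCustomer() {
        Customer c = new Customer();
        c.name = "Valid Name " + UUID.randomUUID().toString().substring(0, 8);
        c.age = 30;  // one boundary only; negative/zero ages need their own builders
        return c;
    }
}
```

Notice the weakness the paragraph calls out: every boundary (empty name, negative age, Unicode name) wants its own builder, and every new property means touching every builder.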

Perhaps reflection could help solve these problems?  The ReflectiveNaiveApproach instead uses the typing system to determine what to generate for each field in a given class.  An integer field gets a random integer and a string field gets a random string.  We know the field name and class type, so we could add if statements for each field/type, but that brings back the same maintenance burden for new properties that we had with the hard coded approaches.  If we didn't do that, we could still handle new properties, assuming we knew how to set the type, but the values might not fit the rules of the business logic, and we have no way to know if they should work or not.  For fuzz testing this is alright, but not for functional testing.  Are there any solutions?  Maybe.
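The type-driven idea can be sketched in a few lines.  Again, this is my own illustration rather than the repo's ReflectiveNaiveApproach class: walk the fields, and fill each one based on its type alone, with no per-field business rules, which is exactly why the text says this suits fuzzing more than functional testing.

```java
import java.lang.reflect.Field;
import java.util.Random;

// Sketch of the ReflectiveNaiveApproach: fill fields by *type* only.
public class NaiveFiller {
    private static final Random RND = new Random();

    public static <T> T fill(T target) throws IllegalAccessException {
        for (Field f : target.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            if (f.getType() == int.class) {
                f.setInt(target, RND.nextInt());
            } else if (f.getType() == String.class) {
                f.set(target, "s" + RND.nextInt(1_000_000));
            }
            // other types would get their own branches
        }
        return target;
    }

    // Tiny sample object for demonstration.
    static class Sample {
        int count;
        String text;
    }
}
```

New properties get filled automatically as long as their type has a branch, but nothing guarantees the generated values respect business rules, which is the gap the annotated generators below are meant to close.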

The final answer I currently have is the ReflectiveSmarterApproach.  In effect, when you need to generate lots of different data for lots of different fields, you need custom generators per class of fields.  What is needed is an annotation on each field telling it what to generate.  An example can be found in the Address class.  Here is a partial example:

public class Address {
 @FieldData(dataGenerators = AverageSizedStringGenerator.class)
 private String name;
 @FieldData(dataGenerators = AddressGenerator.class)
 private String address1;
 //...
}

Now let us look at an example generator:


public class AddressGenerator extends GenericGenerator {
 @Override
 public List<DynamicData> generateFields() {
  List<DynamicData> fields = new ArrayList<DynamicData>();
  fields.add(new DynamicData(RandomString.randomAddress1(), "Address", DynamicDataMetaData.PositiveTest));
  fields.add(new DynamicData("", "Empty",
   new DynamicDataMetaData[] {DynamicDataMetaData.NegativeTest, DynamicDataMetaData.EmptyValue}).
   setErrorClass(InvalidDataError.class));

  return fields;
 }
}


This generator generates a random address as well as an empty address.  One of these addresses is valid, while the empty address is marked as a negative test.

Through the power of reflection you can do something like this:


List<DynamicDataMetaData> exclude = new ArrayList<DynamicDataMetaData>();
exclude.add(DynamicDataMetaData.NegativeTest);
ReflectiveData<Address> shippingAddress = new CreateInstanceOfData<Address>().setObject(new Address(), exclude);


The exclude piece is where you might filter out generating certain values.  Say you want to do only positive tests (as in, expected to succeed); you might filter out the negative tests (those that expect to fail to complete the task and possibly cause an error).  The third line generates an object from all the properties that have the attached annotation and values.  Now, this does not handle new fields automatically, but it could certainly be designed to error out if it found any un-annotated fields (it is not at present designed to do this), and if you embed the code in your production code, it would be more obvious to the developer that they need to add a generator.

Now how do we pick which value to test with, when we could use either the empty address or a real one?  At present the framework picks randomly because, according to James Bach, random selection takes only roughly 2x the runs to get coverage equal to pairwise testing.  Since we know all about the reason a particular value was generated (what error it should cause, etc.), we can decide at run time how the validation should occur.  The example validation is probably a bit complex, but I was running out of time and got a bit slap-dash on that part.  One issue with this method is that it is hard to know what your coverage is.  You can serialize the generated objects for later examination and even build statistical models around what you generated if needed.
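As a rough illustration of the random-pick-with-exclusion idea, here is a minimal sketch; all the names here are hypothetical stand-ins, not the framework's actual API:

```java
import java.util.*;

public class RandomPickSketch {
    enum Meta { POSITIVE, NEGATIVE, EMPTY }

    // One candidate value for a field, tagged with metadata about its intent.
    record Candidate(String value, EnumSet<Meta> tags) {}

    // Drop every candidate carrying an excluded tag, then pick one survivor at random.
    static String pick(List<Candidate> candidates, Set<Meta> exclude, Random rng) {
        List<Candidate> allowed = candidates.stream()
                .filter(c -> Collections.disjoint(c.tags(), exclude))
                .toList();
        return allowed.get(rng.nextInt(allowed.size())).value();
    }

    public static void main(String[] args) {
        List<Candidate> address1 = List.of(
                new Candidate("123 Main St", EnumSet.of(Meta.POSITIVE)),
                new Candidate("", EnumSet.of(Meta.NEGATIVE, Meta.EMPTY)));
        // Excluding negatives leaves only the valid address, whatever the seed.
        System.out.println(pick(address1, EnumSet.of(Meta.NEGATIVE), new Random()));
    }
}
```

Because the metadata travels with each value, the same exclude list works for every field in every class.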

Summary

Obviously this is a somewhat heavy framework for generating, say, 20 test data values.  But when you have a much larger search space, one that approaches infinite, it is a really valuable tool.  I have generated as many as 50 properties/fields about 400,000 times in a 24-hour period; that is to say, roughly 400,000 generated tests.  I have found bugs that, even with our generator, would only be seen 1 in 40,000 runs and would probably never have been found in manual testing (but would likely be seen in production).  The version I use at work has more than a year's worth of development and research behind it, supporting a lot more complexity than exists in this example, but I also don't think it could easily be adapted elsewhere, as it was built around our particular problems.

This simple version can be made to support other environments with relatively little code modification.  It takes much of the research and ideas I had and implements them in a simpler, more flexible fashion.  You should easily be able to hook up your own class, create annotations and generators, and have tests being generated within a day (once you understand how).  On the other hand, it might take a little longer to figure out how to do the validation, as that can be tricky.

One problem I have with what I have generated is that there is no word or phrase to describe it.  In some sense it is designed to create exploratory data.  In another sense it is a little like model-driven testing, in that it generates data, has an understanding of what state it should go to, and has a method to validate that it went to the correct state; however, it doesn't traverse multiple states and isn't designed like a traditional MDT system.  Data Driven Testing describes a method for testing using static data from a source like a csv or database; while similar, this creates dynamic tests that no tester may have imagined.  Like combinatorics, this creates combinations of values, but unlike pairwise testing, the goal isn't just generating the combinations (which can be impossible or impractical to enumerate) but to generate almost innumerable values and pick a few to test with, while enforcing organization of your test data.  This method also encourages ideas like random values, while combinatorics is designed around a more static set of values.  Yes, you can make combinatorial ideas work with non-static sets, but it requires more abstraction (e.g., create a combination of Alpha, Alpha-numeric, ... and this set of payment methods, now use the string type to choose which generator you use) and complexity.  Finally, combinatoric methods can have difficulties when you have too many variables, depending on implementation.  This is a strange hybrid of multiple different techniques.  I suppose that means it is up to me to try to name it.  Let's call it "JCD's awesome code"... or rather: Reflective Test Data Model Generation.

I would say don't expect any major changes/additions to the design unless I start hearing people using it and needing support.  That being said I love feedback, both positive and negative.

While researching for this article I came across this which is cool, but I found no good place to cite it.  So here is a random freebie: http://en.wikipedia.org/wiki/Curse_of_dimensionality

Thursday, June 19, 2014

What is the Highest Level of Skill in Automation?

Thanks to Robert Sabourin for generating this topic.  Rob asked me, roughly, 'What in your opinion is the highest level of skill in automation?'  He asked me this in the airport after WHOSE had ended, while we waited for our planes.  It gave me pause in considering the skills I have learned, and it helped generate this post.

Let me make clear a few possible issues and assumptions regarding what the highest level of skill is in automation.  First of all, I think that there is an assumption of pure hierarchy, which may not exist.  That is to say, there might not be a 'top' skill at all or the top skill might vary by context.  So I really am mostly speaking from a personal level and with my own personal set of automation problems I have faced.  When I answered Rob's question in person, I neglected to add that stipulation.  The other possible concern is that the answer I give is overloaded, and so I will have to work on describing the details after I give the short answer.  Without making you wait, here is my rough answer: Reflections.

What are reflections?

In speaking of reflections, you might assume I am speaking of the technology, and for good reason.  I have spoken on them many times in this blog.  However, that is just a technical trick, albeit a useful one. I am not talking about that trick, even if the comp-science term 'reflections' is part of the answer.  In speaking of reflections, I mean something much broader.

The famous "Thinker" sitting on his rock, just pondering, is much closer to what I had in mind.  But you might say, "Wait, isn't that human thinking?  Isn't that critical thinking or introspection?"  Yes, yes it is.  What I mean by reflections is the art of making a computer think.  While a computer's intelligence is not exactly human intelligence, the closer we come to bridging that vast gulf, the closer we are to generating better automation.

Most people might argue that requires someone with in-depth knowledge of artificial intelligence, or at least a degree in computer science, or a development-oriented background.  Perhaps that is the logical conclusion we will ultimately see in the automation field, but I don't think either in-depth development knowledge or AI expertise is required for now.  You don't need to go to that level to start understanding this concept.

Instead, I think you need to start thinking about automation the way you think about writing tests.  In some ways this relates to test design.  Why can't the automation ask, "What am I missing?"  Why can't my automation tell me the most likely reason a failure occurred*?  Why can't the automation work around failures*?  Or at the very least, ignore some failures so it isn't blocked by the first issue it runs into*?

* I've done some work around these, so don't say they are impossible.

Now that I have walked around the definition, let me define the reflections in context of this article.

Reflections:  Developing new ideas based upon what is already known.

An example

A good example for the need for reflections is the brilliant talk given by Vishal Chowdhary, in which he notes that in translations (and searches, etc), you can't know what the correct answer is.  You have no Oracle to determine if the results are correct.  Many words could be chosen for a translation and it is hard to predict which ones are the 'best'.  Since computer language translations are adaptive, you can't just write "Assert.Equals(translation, expectedWord)" with hardcoded values.  Since these values are dynamic, the best you can do is to use a "degree of closeness".  You see, they couldn't predict how the translation service would work because it has dynamic data and the world changes quickly, including new words, proper titles, et cetera.

So how do you test with this?  Well you can look at the rate of change between translations.  You can translate a sentence, translate it back and record how close it was to the original sentence.  Now track how close it is over time, with different code and data changes.  You could take translation string lengths and see how they vary over time and note when large deviations occur. There are lots of methods to validate a translation, but most of them require the code to reflect on past results, known sentences and the likes.  The automation 'thinks' about its past, and on that basis judges the current results.
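As a hedged sketch of that "degree of closeness" idea, a round trip could be scored with a normalized edit distance; the scoring and threshold here are my own assumptions, not the translation team's actual method:

```java
public class RoundTripCloseness {
    // Classic Levenshtein edit distance between two strings.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return d[a.length()][b.length()];
    }

    // 1.0 means an identical round trip; a score drifting down over builds flags a regression.
    static double closeness(String original, String roundTripped) {
        int max = Math.max(original.length(), roundTripped.length());
        return max == 0 ? 1.0 : 1.0 - (double) editDistance(original, roundTripped) / max;
    }

    public static void main(String[] args) {
        // A real test would translate out and back via the service; this input is canned.
        double score = closeness("the quick brown fox", "a quick brown fox");
        System.out.printf("closeness = %.2f%n", score);
        // Assert against a tracked baseline rather than exact equality.
    }
}
```

The point is that the assertion compares against history ("is this as close as it used to be?"), not against a hardcoded expected string.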

That is not to say all automation must be reflective.  For example, you could hard code a sentence with "Bill Clinton" in it and check that the service did not in fact translate his name.  You could translate a number and check that it didn't change the value.  You might translate a web page and check something not related to the translation, such as layout.

Not just the code

In reading my blog you might assume that, because I specialize in automation, I think reflection is a code-oriented activity.  I do think that, but I also think it applies more broadly.  When I write a test, I should be reflecting on that activity.  That is to say, I should be thinking "Is that really the best design?", "Should I be copying and pasting?", "Should I really be automating this?", etc.  By always having part of my brain reflecting on the code, I too write better code.  Hopefully, between my writing better code and my code doing better testing using reflections, we do better testing overall.  This also applies to testing in general, with considerations like "That doesn't look like the rest of the UI." or "I don't recall that button being there in the last build."

I have only scratched the surface of this topic and made it apply specifically to automation/testing, but I think it applies to life too.  For a broader look at this topic I would highly recommend Steve Yegge's blog post Gödel-Escher-Blog.  It will make you smarter.  Then next time you go do some automation, reflect upon these ideas. :)  And if you are feeling really adventurous, please leave a comment about your reflections on this article here.

Thursday, January 16, 2014

Why can't anyone talk about frameworks?

In writing for WHOSE, I was dismayed at the total lack of valuable information regarding automation frameworks and how to develop them.  I could find some work on the frameworks with names (data-driven, model-driven and keyword-driven), but almost nothing on how to design a framework.  I get that few people can claim to have written 5-10 frameworks like I have, but why are we stuck with only these 3 types of frameworks?

Let me define my terms a little (I feel like a word of the week might show up sometime soon for this).  An architecture is a concept, the boxes you write on a board that are connected by lines, the UML diagram or the concepts locked in someone's head.  Architecture never exists outside of the stuff of designs and isn't tied to anything, like a particular tool.  Frameworks on the other hand have real stuff behind them.  They have code, they do things.  They still aren't the tests, but they are the pieces that assist the test and are called by the test.  A test results datastore is framework, a file reading utility is framework, but the test along with its steps is not part of the framework.

Now let me talk about a few framework ideas I have had for the past 10 years.  Some of them are old and some are relatively recent.  I am going to pull from some of my presentations of old, but the ideas have at least been useful for one framework of mine, if not more.

Magic-Words


I'm sure I'm not the first one to come to this realization, but I have found no records of other automation engineers speaking of this before me.  I have heard the term DSL (Domain Specific Language) which I think is generally too tied to Keyword-driven testing, but a close and reasonable label.  The concept is to use the compiler and auto complete to assist in your writing of the framework.  Some people like the keyword driven frameworks, but in my past experience, they don't give compile time checking nor do they help you via auto complete.  So I write code using a few magic words.  Example: Test.Steps.*, UI.Page.*, DBTest.Data, etc.  These few words are all organizational and allow for a new user to 'discover' the functionality of the automation.  It also forces your automation to separate out the testing from the framework.  A simple example of that can be given:

@Test()
public void aTestOfGoogleSearch() {
 Test.Browser.OpenBrowser("www.google.com");
 Test.Steps.GoogleHome.Search("test");
 Test.Steps.GoogleSearch.VerifySearch("test");
}

//Example of how Test might work in C#, in Java it would have to be a method.
public class TestBase { //All tests inherit this
  private TestFramework test = new TestFramework();
  public TestFramework Test { get { return test; } }
}

Clearly the steps are somewhere else while the test is local to what you can see.  The "Test.*" provides access to all the functionality and is the key to discoverability.

Reflection-Oriented Data Generation


I have spoken of reflections a lot, and I think they are a wonderful tool for solving data-generation style problems.  Using annotations/attributes to tell each piece of data how to generate, what expectations there are (success, failure with exception X, etc.), filtering the values you allow to generate, and then picking a value and testing with it is great.  I have a talk later this year where I will go in depth on the subject, and I hope to have a solid code example to show.  I will certainly post that up when I have it, but for now I will hold off.

...

Okay, fine, I'll give you a little preview of what it would look like (using Java):

public class Address {

 @FieldData(classes=NameGenerator.class)
 private String Name;
 @FieldData(classes=StateGenerator.class)
 private String State;
 //...

}
public class NameGenerator {

  public List<Data> Generate() {
   List<Data> d = new ArrayList<Data>();
   d.add(new Data("Joe", TestDetails.Positive));
   d.add(new Data(RandomString.Unicode(10), new TestDetails[] {TestDetails.Unicode, TestDetails.Negative}));//Assume we don't support Unicode, shame on us.
   //TODO More test data to be added
   return d;
  }

}

Details


Why is it that we as engineers, who love the details, fail to talk about them?  I get that we have time limits and I don't want to write a book for every blog post, but rarely do I see anyone outside of James McCaffrey and sometimes Doug Hoffman talk about the details.  Even if you don't have a framework or a huge set of code, why not talk about your minor innovations?  I come up with new and awesome ideas once in a while, but I come up with lots of little innovations all the time.

Let me give one example and maybe that will get your brain thinking.  Maybe you'll write a little blog on the idea and even link to it in the comments.  I once helped write a framework piece with my awesome co-author, Jeremy Reeder, to figure out the most likely reason a test would fail.  How?

Well, we took all the attributes we knew, mostly via reflections of the test, and put them into a big bag.  We knew all the words used in the test name, all the parameters passed in, the failures in the test, etc.  We would look at all the failing tests and see which ones had similar attributes.  Then we looked at the passing tests to see which pieces of evidence could 'disprove' the likeliness of a cause.

For example, say 10 tests failed: all 10 involving a Brazilian page, 7 of those touching checkout, and 5 of those ordering an item.  Since the Brazilian language is common to every failure, we would suspect it first.  However, if we also had passing tests involving Brazilian, that cause seems less likely, so we would then check whether the checkout failures had any passing counterparts.  If none had, we would say there was a good chance that checkout was broken and notify manual testers to investigate that part of the system first.  It worked really well and diagnosed a lot of bugs quickly.

I do admit I am skipping some of the details in this example, like we did consider variables in concert, like Brazilian tests that involved checkout might be considered together rather than just as separate variables, but I hope this is enough that if you wanted to you could build your own solution.
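A toy version of that triage logic might look like the following; the attribute names and the frequency-based ranking are simplified assumptions, not the framework Jeremy and I actually built:

```java
import java.util.*;

public class FailureTriage {
    // Rank attributes common to failing tests, after discarding any attribute
    // that also appears in a passing test (a passing test "exonerates" it).
    static List<String> likelyCauses(List<Set<String>> failing, List<Set<String>> passing) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> attrs : failing)
            for (String a : attrs) counts.merge(a, 1, Integer::sum);

        Set<String> exonerated = new HashSet<>();
        for (Set<String> attrs : passing) exonerated.addAll(attrs);

        return counts.entrySet().stream()
                .filter(e -> !exonerated.contains(e.getKey()))
                .sorted((x, y) -> y.getValue() - x.getValue())
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        List<Set<String>> failing = List.of(
                Set.of("brazilian", "checkout", "order"),
                Set.of("brazilian", "checkout"),
                Set.of("brazilian", "login"));
        List<Set<String>> passing = List.of(Set.of("brazilian", "search"));
        // "brazilian" is in a passing test, so "checkout" tops the suspect list.
        System.out.println(likelyCauses(failing, passing));
    }
}
```

A real system would weigh combinations of attributes together, as noted above, but even this single-variable version points investigators in a useful direction.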

Now your turn.  Talk about your framework triumphs.  Blog about them and, if you want, put a link in the comments.

Monday, November 18, 2013

Word of the Week: Oracle

Oracle Test Definitions

Thanks to:
Isaac Howard and Wayne J. Earl who had a great deal to do with the editing and formulation of this article.
Like my previous word of the week on Heuristics and Algorithms, this is a complicated one. According to wiki,
An oracle is a mechanism used by software testers and software engineers for determining whether a test has passed or failed.
According to the wiki citation, this comes from BBST, which has some thoughts about what an Oracle is or isn't.  Specifically it talks a lot about Oracle Heuristics, an interesting combination that Bach roughly states as a way to get the right answer some of the time.  I don't feel I have a problem with that, but then we go back into the BBST class and things get confusing.  On Slide 92 of the 2010 BBST course, it says,
How can we know whether a program has passed or failed a test?  Oracles are heuristics 
Slide 94 says:
An oracle is a reference program. If you give the same inputs to the software under test and the oracle, you can tell whether the software under test passed by comparing its results to the oracle's. 
The course later goes on to say that this definition is wrong.  The slides do so because of the claim that Oracles are heuristics.

Classic Problems With The Definitions

But how can this be so if Oracles know all?  They are the truth tellers.  Well the problem in software is that Oracles are not absolute like in the stories.  They give you an answer, but the answer might be wrong.

For example, you might test Excel and compare it to a calculator.  You take the calculator and enter 2.1 * 1, getting back the value 2.  Now perhaps the calculator is set up to provide integers, but when you compare it to Excel's output, you find that Excel gives back 2.1.  This appears to be a failure in Excel, but in reality it is a configuration issue.  The heuristic is in assuming your Oracle is right.  This might of course be a false assumption, or it might be right only in some circumstances.  Interestingly, one of the creators of BBST, Cem Kaner, has revised the definition of Oracle slightly in a posting about Oracles and automation,
...a software testing oracle is a tool that helps you decide whether the program passed your test.

Partial Oracle


While I don't want to get too far off track, I do want to note that Partial Oracles do exist, and from what I can tell, they are Oracles that tell you whether an answer is even reasonable.  For example, for two positive integers, you can say that when you add them together, you will get a number larger than either of the separate addends.  1+1=2, and 1<2.  3+4=7, and 4<7.  The larger addend is always smaller than the sum, for ANY two positive integers.
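In code, such a partial oracle is tiny; this sketch only rejects impossible sums, and by design it cannot confirm a correct one:

```java
public class PartialSumOracle {
    // Partial oracle for a + b with positive integers: the reported sum must
    // exceed both addends. It can reject nonsense, never certify correctness.
    static boolean plausibleSum(int a, int b, int reported) {
        if (a <= 0 || b <= 0) throw new IllegalArgumentException("positive ints only");
        return reported > a && reported > b;
    }

    public static void main(String[] args) {
        System.out.println(plausibleSum(3, 4, 7)); // true: 7 exceeds both addends
        System.out.println(plausibleSum(3, 4, 9)); // true: wrong, but the partial check can't tell
        System.out.println(plausibleSum(3, 4, 2)); // false: impossible, caught
    }
}
```

The middle case is exactly what makes it "partial": an answer can pass the check and still be wrong.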

New Questions


Let me chart out the idea of an Oracle:
  1. Test: Tester runs test.  Example: Login with valid user.
    1. Result: Login takes 5 seconds and goes to an internal page.
  2. Request for Data: Make a request out to the Oracle; Was 5 seconds too long?
    1. Process: Oracle considers the answer.
    2. Data: Oracle generates the answer: Yes, 5 seconds is too long.
  3. Compare: Verify if Test's Result are acceptable.
  4. Output: Test's Results are not acceptable.
  5. React: Tester reacts to the result.  Maybe they fail the test.  Maybe they...
Now let's get picky.  Is the monitor (the display) beaming information to you an Oracle?  It shows the results you requested and is a tool.  While I noted the parts that are the Oracle, who is this Oracle?  If the Oracle is your mind, then what makes this different from testing?

My colleague Isaac noted that most definitions of a test include making observations and comparing them to expectations.  For example, Elisabeth Hendrickson said,
Testing is a process of gathering information by making observations and comparing them to expectations.
Does this make an Oracle simply part of a test?  Even being able to come up with the question seems to indicate you might suspect an answer.  Is this too long?  Well, in asking that, one assumes you have a built-in answer in your head.  Perhaps you are wrong, but that is part of what an Oracle can be.

Alternatively, maybe an Oracle is an external source, thus it has to be outside of the "Tester".  If that is the case, then can the Oracle be the System Under Test?  Imagine testing using two browsers at the same time doing the above test and the login time has a large difference between browsers.  Is the Oracle the browser or the SUT?

Let's take a different approach.  Let's say an automated process takes a screenshot 4 seconds after logging in.  You compare the screenshot to the previous version's result, pixel by pixel.  If the pixels differ, the automation reports a failure.  Where is the Oracle?  The request for data was to get a previous edition of the SUT in image form.  No processing occurred, so the Oracle can't be the processing, but perhaps the image itself is the Oracle.  Or is the Oracle the past edition of the site and its data?  Continuing on, the automation pulls out each pixel (is that the Oracle?) and then compares them.  But wait a minute... someone wrote the automation.  That someone thought to ask the question about comparing the images.  Are they the Oracle?  Since the author was the Tester (checker, whatever) the first time, capturing the first set of images, they saw what was captured and thus became a future Oracle.

Even if the Oracle is always an external source, is it always a tool?  Is a spec a tool?  Is another person (say a manager) a tool?  No, not that sort of tool.  Is a list of valid zip codes a tool or just data?

In case you are wondering, many of the examples given are real with only slight changes to simplify.

How Others Understand Oracles - BBST


Perhaps you feel this detailed "What is it?" questioning is pedantic or irrelevant.  Perhaps we all know what an Oracle is and my attempt to define it is just making a mess.  In order to address that question, I am going to do something a little different from what I have done in the past.  I'm going to open up my BBST experience a little, as I answered the question Kaner wrote about and then talk a little about it. To be clear, this answer has the majority of the content pulled out, as it could be used as an exam question in the future:

Imagine writing a program that allows you to feed commands and data to Microsoft Excel and to Open Office Calc. You have this program complete, and it’s working. You have been asked to test a new version of Calc and told to automate all of your testing. What oracles would you use and what types of information would you expect to get from each?

1. Excel (Windows) – Verify that the most common version of Excel works like Calc does. Are the calculations the same? Do the formulas work the same? If you save a Calc file, can Excel open it? What about vice-versa?
a. I’m looking to see if they have similar calculations and similar functionality.
<snippet>

7. Stop watch – Does Calc perform the tasks at the same rough rate as Excel, Google docs? Is it a lot faster or a lot slower?
a. I’m looking to see if we are consistent with current user expectations of speed.
The responses I got in my peer reviews were interesting.  Please note I removed all names and rewrote the wording per the rules in BBST.  One person noted that item 7 would be categorized under performance and that item 1 was looking at consistency with comparable products.  Multiple reviewers felt I was looking at consistency, a heuristic Bach created.  What I find odd about that is the need to label the Oracle when the Oracle (in my view at the time) was the tool, not the heuristic; citing the heuristic of comparable products was not part of the question.  I got a complaint that I was not testing Excel or Google but Calc, yet the call of the question is about how I would use those Oracles.  One fair issue was that I should have noted I could have compared the old version to the new version using a stop watch, which I had missed.  However, I had cited Old Calc in my full document, so I think that was a relatively minor issue.

Since Oracles are tools, how can I not be implicitly testing Excel?  I kept hearing people say I should name my Oracles, yet to me I was naming them very clearly.  I got into several debates about whether something like Self Verifying Data is in fact an Oracle (even though the article clearly has its own opinion on that)!  It seemed like everyone wanted to label the heuristic the Oracle, probably because of the "Heuristic Oracle" label in BBST.  While I did feel BBST failed to make clear what an Oracle is, it did make me think about Oracles a lot more.

Wrapping Up


I'm sorry if that felt a little ranty, but in talking about this subject, I want you to also think about what you see as an Oracle.  Oddly, Kaner himself cites Doug Hoffman with items I did not consider Oracles (such as Self Verifying Data) when I started writing this article.  I think Kaner's own work defends his viewpoint, as he doesn't appear to apply his own rule (his definition) to the letter but rather the similarity of the items, a method humans appear to use frequently.

Truth be told, I'm not so sure that Oracle should be a word we in the industry use at all.  Isaac does not seem to believe in Oracles anymore, and appears to feel the definition is broken, as it really cannot be separated from the test.  To me, many people do seem to use it, and perhaps it can have value if we shrink down the role of the word.  So let me wrap this up with my attempt to patch the definition into something usable.

Oracle: One or more imperfect external sources (often tools) that provide data or are data to assist in determining the success or failure of a test.

For a not exactly correct but useful working definition, an Oracle is the value that is the expected result.

What do you think an Oracle is?  Are we missing critical details or other expert views?  Feel free to write us a note in the comments!

Thursday, November 7, 2013

@Testing with [Reflections] Part II

If you haven't already, I suggest you read about reflections before reading too deeply into the topic of annotations.  In case I failed to convince you, or in case it didn't totally sink in: reflections, as I see it, is a way for code to 'think' about code.  Here we are only considering the "reflection" piece and not the "eval" piece I spoke about previously.  Reflections supports some pretty fascinating ideas and can be applied in many different areas in different ways.  Annotations (Java) or Attributes (C#), by comparison, are code carrying data about code, which works hand in hand with reflections.

xUnit

Lets start with one of the most common examples people in test would be exposed to:

[Test] //Java: @Test()
public void TestSomeFeature() { /*... */}

First, to be clear, the attribute (or commented-out annotation) is on the first line.  For clarity's sake, for the rest of the article I'm going to say "annotation" to mean either attribute or annotation.  It declares that the method is in fact a "Test" which can be run by xUnit (NUnit, JUnit, TestNG).

The annotation does not in fact change anything about the method.  In theory, xUnit could run ANY and ALL methods, including private methods, in an entire binary blob (jar, dll, etc.) using reflections.  In order to filter out the framework and helper methods, the frameworks use annotations, requiring them to be attached so they know which methods to run.  This by itself is a very useful feature.
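A stripped-down sketch of that discovery mechanism (not any real xUnit implementation) might look like this:

```java
import java.lang.annotation.*;
import java.lang.reflect.Method;

public class MiniRunner {
    // A homemade stand-in for xUnit's @Test/[Test] marker.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface MiniTest {}

    public static class SampleTests {
        @MiniTest public void testOne() { System.out.println("ran testOne"); }
        public void helper() { System.out.println("helper is never picked up"); }
    }

    // Reflect over every declared method, but invoke only the annotated ones.
    static int runAll(Class<?> suite) {
        int ran = 0;
        try {
            Object instance = suite.getDeclaredConstructor().newInstance();
            for (Method m : suite.getDeclaredMethods()) {
                if (m.isAnnotationPresent(MiniTest.class)) { // the annotation is the filter
                    m.invoke(instance);
                    ran++;
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
        return ran;
    }

    public static void main(String[] args) {
        System.out.println("ran " + runAll(SampleTests.class) + " test(s)");
    }
}
```

Real runners add setup/teardown, reporting, and exception handling, but the annotation-as-filter core is the same.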

The Past Anew

Looking back at my example from my previous post, you can now see a few subtle changes which I have marked NEW:


class HighScore {
 String playerName;
 int score;
 @Exclude()//NEW
 Date created;
 int placement;
 String gameName;
 @CreateWith(Class = LevelNamer.class)//NEW
 String levelName;
 //...Who knows what else might belong here.
}

Again, the processing function could be subtly modified to support this change. The change is again marked NEW:

function testVariableSetup(Object class) {
for each variable in class.Variables {
 if(variable.containsAnnotation(Exclude.class)) //NEW
  continue;//NEW don't process the variable.
 if(variable.containsAnnotation(CreateWith.class)) { //NEW
  variable.value = CreateNewInstance(variable.getAnnotation(CreateWith).Class).Value();//NEW ; Create New Instance must return back the interface.
 }//NEW
 if(variable.type == String) then variable.value = RandomString();
 if(variable.type == Number) then variable.value = RandomNumber();
 if(variable.type == Date) then variable.value = RandomDate();
 }
}

So what this code demonstrates is the ability to exclude fields in any given class by attaching some meta data to it, which any functionality can look at but doesn't have to. In the high score class we marked the created date variable as something we didn't want to set, maybe because the constructor sets the date and we want to check that first. The second thing we did was we set a class to create the levelName. The levelName might have a custom requirement that it follow a certain format. Having a random String would not do for this, so we created an annotation that takes in a class which will generate the value.

Now we could have a different custom annotation for each and every custom value type, but that would defeat the purpose of making this as generic as possible. Instead, we use a defined pattern which can apply to several different variables. For example, gameName also had to follow a pattern, but it was different from the levelName pattern. You could create another class called GameNamer, and as long as it followed the same design (had a method called "Value()" that returned a string), you could just use the CreateWith(Class=X) annotation and they would act the same. This means you would not need to add another case in the testVariableSetup method or even change it. In Java and C# the mechanism for this is a common ancestor, which can be either an interface or an abstract class. That is to say, they both inherit from the same abstract class or implement the same interface. For the sake of completeness, and to help make this make sense, I have included an updated pseudo-code example below:

class HighScore {
 String playerName;
 int score;
 @Exclude()//NEW
 Date created;
 int placement;
 @CreateWith(Class = GameNamer.class)//NEW
 String gameName;
 @CreateWith(Class = LevelNamer.class)//NEW
 String levelName;
 //...Who knows what else might belong here.
}
// All NEW below:

interface ICreateWith { String Value(); }
class GameNamer implements ICreateWith { public String Value() { return "Game # " + RandomNumber(); } }
class LevelNamer implements ICreateWith { public String Value() { return "Level # " + RandomNumber(); } }

//In some class
class some { ICreateWith CreateNewInstance(Class class) { return (ICreateWith)class.new(); } }

TestNG - Complex but powerful

One last example, a little closer to real life, that is common in the testing world. Although the code might be a little too complex to get into here, I want to talk about a real-life design and how it works in general. TestNG uses annotations with an interesting twist. Say you have a "Group", a label saying that a test is part of a set of tests. Perhaps you have a Group called "smoke" for the smoke tests you want to run separately from all the others. TestNG might support filtering, but between TestNG and Maven and ... you decide you want to determine the filtering of tests at run time using a flag somewhere (database, environment variable, wherever) that says "run smoke only". At run time, TestNG fires an event saying, "I'm about to run test X; here is all the annotation data about it. Would you like to change any of it?" At this point you can read the Group information about the test. If your flag says smoke only, you check the groups the test has. If the Group list does not include smoke, you set the test to enabled=false, changing the annotation's data at run time. TestNG calls this Annotation Transformations. I call it cool.

The weird part is that you are modifying, at run time, annotation data that is hard coded at compile time.  That is to say, annotation values cannot actually be changed at runtime*, but a copy of the instance of them can be.  That is what TestNG actually changes, from what I can tell.
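TestNG's real API for this is more involved (an annotation transformer you register with the runner), so here is only a hand-rolled stand-in in plain Java that shows the copy-and-mutate idea; the @Test annotation and TestInfo holder below are my own toy versions, not TestNG's:

```java
import java.lang.annotation.*;
import java.lang.reflect.Method;
import java.util.Arrays;

public class TransformSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Test { String[] groups() default {}; boolean enabled() default true; }

    // Mutable copy of the annotation's data -- the annotation itself never changes.
    static class TestInfo {
        String[] groups; boolean enabled;
        TestInfo(Test t) { groups = t.groups(); enabled = t.enabled(); }
    }

    static class Tests {
        @Test(groups = {"smoke"}) public void login() {}
        @Test(groups = {"regression"}) public void fullReport() {}
    }

    public static void main(String[] args) {
        boolean smokeOnly = true; // imagine this flag came from an env var or database
        for (Method m : Tests.class.getDeclaredMethods()) {
            Test t = m.getAnnotation(Test.class);
            if (t == null) continue;
            TestInfo info = new TestInfo(t);
            if (smokeOnly && !Arrays.asList(info.groups).contains("smoke"))
                info.enabled = false; // "transform" the copy at run time
            System.out.println(m.getName() + " enabled=" + info.enabled);
        }
    }
}
```

The runner then honors the copy, which is exactly why the hard-coded annotation can appear to change without actually changing.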

If you are reading this and saying this topic is rather confusing, don't feel too bad. I know it is confusing. The TestNG part in particular is a bit mind bending.  And to be clear, I don't see myself as an expert.  There are way more complex ideas out there that just amaze me.

* This is from what I can tell.  Perhaps there are reflective properties to let you do this.  You can however override annotations through inheritance, but that is a more complex piece.

Monday, October 21, 2013

The case for an Automation Developer

Disclaimers: As is always true, context matters. Your organization or needs may vary. This is only based upon my experience in the hardware, B2B, ecommerce and financial industries. Given the number of types of businesses and situations, I assume you can either translate this to your situation or see how it doesn't translate to your situation.

Automation


Automation within this context is long living, long term testing somewhere between vertical integration testing (e.g., Unit testing including all layers) and system testing (including load testing).  These activities include some or all of the following activities:
  • Writing database queries to get system data and validate results.
  • Writing if statements to add logic about things like the results, or changing the activities based upon the environment.
  • Creating complex analysis of results such as reporting those to an external system, rerunning failed tests, assigning like reasons for failure, etc.
  • Capturing data about the system state when a failure occurs, such as introspection of the logs to detect what happened in the system.
  • Providing feedback to the manual testers or build engineers in regards to the stability of the build, areas to investigate manually, etc.
  • Documenting the state of automation, including what has been automated and what hasn't been.
  • Creating complex datasets to test many variations of the system, investigating the internals of the system to see what areas can or should be automated.
  • Figuring out what should and shouldn't be automated.
Developer

Developer within this context is the person who can design complex systems.  They need to have a strong grasp on the current technology sets and be able to speak to other developers at roughly the same level.  They need to be able to take very rough high level ideas and translate them into working code.  They should be able to do or speak to some or all of the following activities:
  • Design
  • Understand OOP
    • Organization
  • Database
    • Design
    • Query
  • Refactor
  • Debug
  • Reflections
Automation Developer

You will notice that the two lists are somewhat similar in nature.  I tried to make the first set feel more operational and the second set to be a little more skills based, but in order to do those operations, you really have to have the skills of a developer.  In my experience, you need at least one developer-like person on a team of automators.  If you want automation to work, you have to have someone who can treat the automation as a software development project.  That also of course assumes your automation is in fact a software development project.  Some people only need 10 automated tests, or record-playback is good enough for their work.  For those sorts of organizations, a 'manual' tester (that is to say, a person who has little programming knowledge) is fine for those sorts of needs.

Automation Failures

I know of many stories of automation failure.  Many of the reasons revolve around expectations, leadership and communication.  As that is an issue everywhere, I don't want to consider those in too much depth, other than to say a person who doesn't understand software development will have a hard time clearly stating what they can or can't do.

Technical reasons for failure involve things as simple as choosing the wrong tool to building the wrong infrastructure.  For example, if you are trying to build an automated test framework, have you got an organized structure defining the different elements and sub-elements?  These might be called "categories" and "pages", with multiple pages in a category and multiple web elements in a page.  How you organize the data is important.  Do you save the elements as variables, call getters or embed that in actions in the page?  Do actions in the page return other pages or is the flow more open?  What happens when the flow is changed based upon the user type?  Do you verify that the data made it into the database or just check the screen?  Are those verifications in the page layer or in a database layer?  Organization matters, and sticking to that organization or refactoring it as need be is a skill set most testers don't have initially.  This isn't the only technical development skill most testers lack, but I think it illustrates the idea. Maybe they can learn it, but if you have a team for automation, that team needs a leader.
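As a sketch of the kind of organization those questions are probing at, here is a tiny page-object layout where actions return the page the user lands on, so tests read as flows. All of the names are hypothetical, and the actual browser-driving code is elided:

```java
class HomePage {
    String title() { return "Home"; }
}

class LoginPage {
    // A successful login navigates to the home page, so the action returns it.
    HomePage loginAs(String user, String password) {
        // ...drive the UI here (e.g., via a browser-automation tool)...
        return new HomePage();
    }
}

public class PageObjectSketch {
    public static void main(String[] args) {
        // The test reads as a flow: start on login, end on home.
        HomePage home = new LoginPage().loginAs("tester", "secret");
        System.out.println(home.title());
    }
}
```

Whether actions should return pages like this, or the flow should stay more open, is exactly the sort of design decision an automation developer has to make and then enforce.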

Real Failure

These sorts of problems I talk about aren't new (Elisabeth Hendrickson from 1998) which is why I hesitate to enumerate the problems with much more detail.  The question is how have we handled such failures as a community?  Like I said, Elisabeth Hendrickson said in 1998 (1998! Seriously!):
Treat test automation like program development: design the architecture, use source control, and look for ways to create reusable functions.
 So if we knew this 15 years ago, then why have we as a community failed to do so?  I have seen suggestions that we should separate the activities into two camps, checking vs testing, with checking being a tool to assist in testing, but not actually testing.  This assumes that automation purely assists because it doesn't have the ability to exercise judgment.  This may be insightful in trying to denote roles, but it doesn't really tell you much about who should do the automating.  CDT doesn't help much either; it really only notes that it depends on external factors.

When automation fails, or at least seems to have limited value, who can suggest what we should do?  My assertion is that testers typically don't know enough about code to evaluate the situation other than to say "Your software is broken" (as that is what testers do for a living).  That developers tend not to want to test is typically noted when talking about developers doing testing.  Furthermore, what developer ever intentionally writes a bug (that is to say, we are often blind to our own bugs)?

A Solution?

I want to be clear, this is only one solution; there may be others, which is why the subheading starts with "A".  That being said, I think a mixed approach is reasonable.  What you want is a developer-like person leading the approach, doing the design and enforcing the code reviews.  They 'lead' the project's framework while the testers 'lead' the actual automated tests.  This allows for the best of both worlds.  The Automation Developer mostly treats the code as a software development project while the testers do what they do best, develop tests.  Furthermore, the testers then have buy-in on the project and they know what actually is tested.

Thoughts?

Wednesday, October 16, 2013

Reflections


I have recently been reading over some of Steve Yegge's old posts and they reminded me of a theme I wanted to cover.  There is an idea we call meta-cognition, which testers often use to defocus and focus, to occasionally come back up for air and look for what we might have missed.  It is an important part of our awareness.  We try to literally figure out what we don't know and transfer that into coherent questions or comments.  Part of what we do is reflect on the past, using a learned sub-conscious routine, and attempt to gather data.

In the same way, programming too has ways of doing this, in some cases and in some frames of reference.  This is the subject I wish to visit and consider in a few different ways.  In some languages this is called reflections, which uses a reification of typing to introspect on the code.  Other languages allow other styles of the same concept and call them 'eval' statements.  No matter the name, the basic idea is brilliant.  Some computer languages literally can consider things in a meta sense intelligently.

Reflections

So let's consider an example. Here is the class under consideration:
class HighScore {
 String playerName;
 int score;
 Date created;
 int placement;
 String gameName;
 String levelName;
 //...Who knows what else might belong here.
}

First, done poorly in pseudo code, here is a way to inject test variables for HighScore:
function testVariableSetup(HighScore highScore) {
highScore.playerName = RandomString();
highScore.gameName = RandomString();
highScore.levelName = RandomString();
highScore.score = RandomNumber();
highScore.created = RandomDate();
//... I got tired.
}

Now here is a more ideal version:
function testVariableSetup(Object class) {
 for each variable in class.Variables {
  if(variable.type == String) then variable.value = RandomString();
  if(variable.type == Number) then variable.value = RandomNumber();
  if(variable.type == Date) then variable.value = RandomDate();
 }
}

Now what happens when you add a new variable to your class?  For that matter, what happens when you have 2 or more classes you need to do this in?  The second version can be applied to anything that has Strings, Dates and Numbers.  Perhaps we are missing some types, like Booleans, but it doesn't take too much effort to cover the majority of the simple types.  Once you have that, you only have to pass in a generic object and it will magically set all fields.  Perhaps you want filtering, but that too is just another feature in the method.
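Here is what that more ideal version can look like in real Java; the random helpers are simplified stand-ins of my own making:

```java
import java.lang.reflect.Field;
import java.util.Date;
import java.util.Random;

public class TestVariableSetupDemo {
    static final Random RAND = new Random();
    static String RandomString() { return "str" + RAND.nextInt(10000); }
    static int RandomNumber() { return RAND.nextInt(10000); }
    static Date RandomDate() { return new Date(Math.abs(RAND.nextInt()) * 1000L); }

    static class HighScore {
        String playerName; int score; Date created; int placement;
        String gameName; String levelName;
    }

    // Works for any class: walk the fields and fill each one based on its type.
    static void testVariableSetup(Object target) throws IllegalAccessException {
        for (Field f : target.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            if (f.getType() == String.class) f.set(target, RandomString());
            else if (f.getType() == int.class) f.setInt(target, RandomNumber());
            else if (f.getType() == Date.class) f.set(target, RandomDate());
        }
    }

    public static void main(String[] args) throws Exception {
        HighScore hs = new HighScore();
        testVariableSetup(hs);
        System.out.println(hs.playerName != null && hs.created != null && hs.levelName != null);
    }
}
```

Adding a seventh field to HighScore, or passing in a completely different class, requires no change to testVariableSetup at all.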

The cool thing is, this can also be used to get all the fields without knowing what the fields are. In fact, this one is so simple, I am going to show a real life example, done in Java:

//import java.lang.reflect.Field;
//import java.util.*;

 public static List<String> getFieldNames(Object object) {
  List<String> names = new ArrayList<String>();
  for(Field f : object.getClass().getDeclaredFields()) {
   f.setAccessible(true);
   names.add(f.getName());
  }
  return names;
 }

 public static Object getFieldValue(String fieldName, Object object) {
  try{
   Field f = object.getClass().getDeclaredField(fieldName);
   f.setAccessible(true);
   return f.get(object);
  }catch (Throwable t) {
   throw new Error(t);
  }
 }

 public static Map<String, Object> getFields(Object object) {
  HashMap<String, Object> map = new HashMap<String, Object>();
  for(String item : getFieldNames(object)) {
   map.put(item, getFieldValue(item, object));
  }
  return map;
 }

Let's first define the term "Field."  This is a Java term for a variable, be it public or private.  In this case, there is code to get all the field names, get any field value and get a map of field names to values. This allows you to write really quick debug strings by simply reflecting any object automatically and spitting out name/value pairs. Furthermore, you could make it filter out private variables, filter variables by name, or get properties rather than fields.  Obviously this can be rather powerful.
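The debug-string idea can be shown in a compact, standalone form; the map logic below is the same as the helpers above, inlined so the example compiles on its own (field values are hypothetical):

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class DebugDump {
    // Reflect any object into name/value pairs -- a one-line debug string.
    static Map<String, Object> getFields(Object object) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (Field f : object.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            try { map.put(f.getName(), f.get(object)); }
            catch (IllegalAccessException e) { throw new Error(e); }
        }
        return map;
    }

    static class HighScore { String playerName = "AAA"; int score = 9000; }

    public static void main(String[] args) {
        // No toString() needed on HighScore; reflection does the work.
        System.out.println(getFields(new HighScore()));
    }
}
```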


Eval

Let me give one other consideration of how reflection-like capabilities can work.  Consider the eval statement, a method of loading in code dynamically.  First, starting with a very simple JavaScript example, let me show you what eval can do:

  var x = 10;
  alert( eval('(x + 2) * 5'));

This would display an alert with the value 60. In fact, an eval can execute any amount of code, including very complex logic. This means you can generate logic using strings rather than hard code it.

While I believe the eval statement is rather slow (in some cases), it can be useful for generating code dynamically.  I'm not going to write out an exact example for this, but I want to give you an idea of a problem:

for(int a = 0; a!=10; a++) {
  for(int b = 0; b!=10; b++) {
    //for ... {
      test(X[a][b][...]);
    //}
  }
}

First of all, I do know you could use recursion to deal with this problem, but that is actually hard to get right, hard to follow and hard to debug. If you were in a production environment, maybe that would be the way to go for performance reasons, but for testing, performance is often not as critical. Now imagine if you had something that generated dynamic strings? I will again attempt to pseudo code an example:

CreateOpeningsFor(nVariables, startValue, endValue) {
  String opening = "for(a{0} = {1}; a{0}!={2}; a{0}++) {";  
  String open = "";
  for(int i = 0; i!=nVariables; i++) {
    open = open + "\n" + String.Format(opening, i, startValue, endValue);
  }
  return open;
}


eval(CreateOpeningsFor(5, 0, 10) + CreateFunctionFor("test", "X", 5) + CreateCloseFor(5));
//TODO write CreateFunctionFor, CreateCloseFor...  
//Should look like this: String function = "{0}({1}{2});" String closing = "}";

While I skipped some details, it should be obvious to someone who programs that this can be completed. Is this a great method? Well, it does the same thing as the hard coded method, yet it is dynamically built, thus is easily changed. You can log the created function even and place it in code if you get worried about performance. Then if you need to change it, change the generator instead of the code. I don't believe this solves all problems, but it is a useful way of thinking about code. Another tool in the tool belt.
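Filling in those skipped details, here is a runnable Java sketch of the generator. Since Java has no eval, it logs the generated source instead, matching the log-and-place approach just described; the variable names a0, a1, ... and the test(X[...]) call are carried over from the pseudo code:

```java
public class LoopGenerator {
    // Build the source text for N nested for-loops.
    static String createOpenings(int nVariables, int start, int end) {
        StringBuilder open = new StringBuilder();
        for (int i = 0; i != nVariables; i++)
            open.append(String.format("for(int a%1$d = %2$d; a%1$d != %3$d; a%1$d++) {\n",
                                      i, start, end));
        return open.toString();
    }

    static String createClosings(int nVariables) {
        StringBuilder close = new StringBuilder();
        for (int i = 0; i != nVariables; i++) close.append('}');
        return close.append('\n').toString();
    }

    public static void main(String[] args) {
        // Log the generated source; in an eval-capable language you would run it.
        System.out.print(createOpenings(2, 0, 10)
                + "  test(X[a0][a1]);\n"
                + createClosings(2));
    }
}
```

Changing the depth from 2 to 5 is a one-argument change to the generator rather than three more hand-written loops.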

The next question you might ask is, do I in fact use these techniques? Yes, I find these tools invaluable for some problem sets. I have testing code that automatically reflects and generates lots of different values and applies them to various fields or variables. I have used it in debugging. I have used it to create easily filterable dictionaries so I could get all variables with names like X. It turns the inflexible type system into a helpful one. I have even used it in creating a simple report database system which used reflections to create insert/update statements where the code field names are the column names in the database. Is it right for every environment? No, of course not, but be warned, as a tool it can feel a little like the proverbial hammer that makes everything look like nails. You just have to keep in mind that it is often a good ad-hoc tool, but not always a production-worthy tool without significant design consideration. That being said, after a few uses and misuses, most good automators can learn when to use it and when not to.

Another reasonable question is: what other techniques are useful and similar to this? One closely related tool would be regular expressions, a tool I don't wish to go in depth on, as I feel there is a ton of material on it already. The other useful technique is known as annotations or attributes. These are used to define metadata about a field, method or class. As there are a lot more details to go over, I will try to write a second post on this topic in combination with reflections, as they are powerful together.

Wednesday, October 2, 2013

Refactoring: Improving The Design Of Existing Code

Having read many books on the basics of programming, building up queries and the basics of design, I have found that almost none really talk about how to deal with designing code outside abstract forms. Most books present design at a high level, possibly where a UML diagram is shown, revolving around most of the OOP connections. Some talk over how to use design patterns or create data structures of various types. All seem to be under the illusion that we as programmers can actually apply abstract concepts into real every day practical methodologies. Almost always they use things like animals or shapes or other "simple" (but not practical) OOP examples. They are always well designed, and seem relatively simple. In spite of that, I think I have learned how to do proper design to some degree by years of trial and error.

Refactoring, on the other hand, starts out with a highly simple, yet more real world, movie system, and then slowly but surely unwraps the design over 50 pages. It is one of the most wonderful sets of “this code works, but let us make it better” I have ever seen, and it is dead simple in implementation. The book starts with a few classes, including a method to render to a console. Then, blow by blow, the book shows differences via a bolding of each change. The book talks about design choices, like how having a rendering method that contains what most would call “business logic” prevents you from easily having multiple methods of rendering (i.e., console and HTML) without duplicate code. They also make a somewhat convincing argument against temporary variables, although I am not 100% convinced. Their reasoning is that it is harder to refactor with temp variables, but sometimes temp variables (in my opinion) can provide clarity, not to mention they tend to show up in debugger watch windows. To be fair, later on the author notes that temporary variables can be added back in for clarity, but it is not emphasized nearly as much.

As I continued through the book, a second and important point was brought up. Refactoring requires reasonable amounts of unit testing in order to ensure that you do not accidentally break the system by the redesign. The argument continues that all coding falls into two categories: one is creating new functionality and the other is refactoring. When creating the initial code, you need testing around it, which connects into TDD. The point is to be careful with refactors because they can cause instability, and to work slowly in the refactoring. They talk about constantly hitting the compile button just to be sure you have not screwed something up.

Sometimes, it is comforting to know that I am not the only one who runs into some types of trouble. One of my favorite statements in the book is in chapter 7, where the author notes that after 10+ years of OOP, he still gets the placement of responsibilities in classes wrong. This to me is perhaps the entire key to refactoring. The point is, we as humans are flawed, and even if we are all flawed in unique ways, it is something we have in common. This book, in my opinion, is not ultimately about how to correctly design, but how to deal with the mess that you will inevitably make. This is so important I think it bears repeating; this book says that no one is smart enough to build the correct software the first time.  Instead, it says that you must bravely push on with your current understanding and then, when you finally do have something of a concept of what you are doing, go back and refactor the work that you just created. That is hard to put into practice, since you just got it working, and now you have to go tear it up and build again. The beauty of the refactor is that you don’t have to go it alone; you have techniques that allow you to make wise choices on what to change, and the changes should have no effect on the final product.

One final thing I think is worth mentioning. The way the book is laid out, you can easily re-visit any given refactoring technique that you didn’t “get” the first time, as it is fairly rationally structured, grouping refactors together but keeping each refactor individualized. It makes me wonder how many times they refactored the book before they felt they had it right?

Friday, September 27, 2013

Word of the Week: Manumatic

Before I go into the depths of what I mean, I should first talk about how manumatic is usually defined.  According to the dictionary, manumatic is a type of semi-automatic shifter used for vehicles.  This is not what I am talking about, even though it shares some of the same flavor.  What I am talking about is semi-automated testing (or is that semi-manual checking?).  Some testers like the term tool-assisted testing and I can imagine a half dozen other terms like tool driven testing.  Whatever you want to call it, I tend to call it a manumatic process or manumatic test.

The idea is that you have a manual process that is cumbersome or difficult to do.  However, either some part of the test is hard to automate or the validation of the results requires human interpretation.  There are many different forms this can come in, and my attempt to define it may be missing some corner cases (feel free to QA me in the comments), but allow me to give some examples.

At a previous company I worked for, I had to find a way to validate that thousands of pages did not change in 'unexpected' ways, but unexpected was not exactly defined.  Unexpected included JavaScript errors, pictures that did not load, poorly rendering HTML and the like.  QA had no way of knowing that anything had in fact changed, so we had to look at the entire set every time, and these changes were done primarily in production to a set of pages even the person who made the change may not have known about.  How do you test this?  Well, you could go through every page every day and hope you notice any subtle changes that occur.  You could use layout bug detectors, such as the famous Fighting Layout Bugs (which is awesome, by the way), but that doesn't catch nearly all errors and certainly not subtle content changes.

We used a sort of custom screenshot comparison with the ability to shut off certain html elements in order to hide things like date/time displays.  We did use some custom layout bug detectors and did some small checking, but primarily the screenshots were our tool of choice.  Once the screenshots were done, we would manually look at the screenshots and determine which changes were acceptable and which were not.  This is a manumatic test, as the automation does do some testing, but a "pass" meant nothing changed (in so far as the screenshots were concerned), and finding a diff or change in the layout didn't always mean "fail".  We threw away the "test results", only keeping the screenshots.
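The heart of a screenshot comparison like that can be sketched in a few lines: count the pixels that differ between a baseline capture and a new one. This is only a minimal illustration of the idea; the real tool also masked volatile HTML elements (dates, times) before capturing, and a human still judged whether any diff was a bug:

```java
import java.awt.image.BufferedImage;

public class ScreenshotDiff {
    // Count differing pixels between a baseline and a current capture.
    static long diffPixels(BufferedImage baseline, BufferedImage current) {
        long diffs = 0;
        for (int y = 0; y < baseline.getHeight(); y++)
            for (int x = 0; x < baseline.getWidth(); x++)
                if (baseline.getRGB(x, y) != current.getRGB(x, y)) diffs++;
        return diffs;
    }

    public static void main(String[] args) {
        // Two in-memory "screenshots" with one pixel changed between them.
        BufferedImage a = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        BufferedImage b = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        b.setRGB(3, 3, 0xFFFFFF);
        System.out.println(diffPixels(a, b));
    }
}
```

A nonzero count here is not a "fail" -- it is a flag for a person to look at, which is exactly what makes the process manumatic rather than automated.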

In manual testing, we often need new logins.  It requires multiple SQL calls and lots of data to create a new login, not to mention some verifications that other bits are created.  It is rather hard to do, but we wrote automation to do it.  So with a few edits, an automated 'test' was created that allows a user to fill in the few bits of data that usually matter and lets the automation safely create a user.  Since we have to maintain the automation anyway, this means every tester need not keep the script on their own box and fight updates as the system changes.  This is a manumatic process.

Let me give one more example.  We had a page that interacted with the database based upon certain preset conditions.  In order to validate the preset conditions, we need to do lots of different queries, each of which was subtly connected to other tables.  Writing queries and context switching was a pain, so we wrote up a program to do the queries and print out easy to read HTML.  This is a manumatic process.

I honestly don't care what you call it; I just want to blur the lines between automated testing and manual testing, as I don't think they are as clear as some people make them out to be.