Tuesday, 8 October 2013

Developing TDD style by example

(this post is adapted from my work-in-progress open source TDD tutorial)

(Note that I use a term "Statement" instead of "test" and "Specification" instead of "Test Suite" in this post)

Throughout this blog, you have seen me using quite extensively a utility class called Any. The time has come to explain a little bit more carefully what principles lie under this technique and tool. I'll also use this technique as a case study to show you how one develops a style of Test-Driven Development.

A style?

Yep. Why am I wasting your time writing about style instead of giving you the hardcore technical details? The answer is simple. Before I started writing this tutorial, I read four or five books solely on TDD and maybe two others that contain chapters on TDD. All of this sums up to about two or three thousands of paper pages, plus numerous posts on many blogs. And you know what I noticed? No two authors use exactly the same sets of techniques for test-driving their code! I mean, sometimes, when you look techniques they're suggesting, the suggestions from two authorities contradict each other. As each authority has their followers, it's not uncommon to observe and take part in discussions about whether this or that technique is better than a competing one or which one leads to trouble in the long run.

I did this, too. I also tried to understand how come people praise techniques I KNEW were wrong and led to disaster. Then, Finally, I got it. I understood that it's not a "technique A vs. technique B" debate. There are certain sets of techniques that work together and choosing one technique leaves us with issues we have to resolve by adopting other techniques. This is how a style is created.

Developing a style starts with underlying set of principles. These principles lead us to adopt our first technique, which makes us adopt another one and, ultimately, a coherent style emerges. Using Constrained Non-Determinism as an example, I'll try to show you how part of a style gets derived from a technique that is derived from a principle.

Principle: Tests As Specification

As I already stressed, I strongly believe that unit tests constitute an executable specification. Thus, they should not only pass input values to an object and assert on the output, they should also convey to their reader the rules according to which objects and functions work. The following oversimplified example shows a Statement where it's not explicitly stated what's the relationship between input and output:

[Fact] public void 
ShouldCreateBackupFileNameContainingPassedHostName()
{
  //GIVEN
  var fileNamePattern = new BackupFileNamePattern();
  
  //WHEN
  var name = fileNamePattern.ApplyTo("MY_HOST_NAME");
  
  //THEN
  Assert.Equal("backup_MY_HOST_NAME.zip");
}

Although the relationship can be guessed quite easily, remember it's just an example. Also, seeing code like that makes me ask questions like: is the "backup_" prefix always applied? What if I actually pass "backup_" instead of "MY_HOST_NAME"? Will the name be "backup_backup_.zip", or "backup_.zip"? Also, is this object responsible for any validation of passed string?

This makes me invent a first technique to provide my Statements with better support for the principle I believe in.

First technique: Anonymous Input

I can wrap the actual value "MY_HOST_NAME" with a method and give it a name that better documents the constraints imposed on it by the specified functionality. In our case, we can pass whatever string we want (the object is not responsible for input validation), so we'll name our method AnyString():

[Fact] public void 
ShouldCreateBackupFileNameContainingPassedHostName()
{
  //GIVEN
  var hostName = AnyString();
  var fileNamePattern = new BackupFileNamePattern();
  
  //WHEN
  var name = fileNamePattern.ApplyTo(hostName);
  
  //THEN
  Assert.Equal("backup_MY_HOST_NAME.zip");
}

public string AnyString()
{
  return "MY_HOST_NAME";
}

By using anonymous input, we provide a better documentation of the input value. Here, I wrote AnyString(), but of course, there can be a situation where I use more constrained data, e.g. AnyAlphaNumericString() when I need a string that does not contain any characters other than letters and digits. Note that this technique is applicable only when the particular value of the variable is not important, but rather its "trait". Taking authorization as an example, when a certain behavior occurs only when the input value is Users.Admin, there's no sense making it anonymous. On the other hand, for a behavior that occurs for all values other than Users.Admin, it makes sense to use a method like AnyUserOtherThan(Users.Admin) or even AnyNonAdminUser().

Now that the Statement itself is freed from the knowledge of the concrete value of hostName variable, the concrete value of "backup_MY_HOST_NAME.zip" looks kinda weird. There's no clear indication of the kind of relationship between input and output and whether there is any at all (one may reason whether the output is always the same string or maybe it depends on the string length). It is unclear which part is added by the production code and which part depends on the input we pass to the method. This leads us to another technique.

Second Technique: Derived Values

To better document the relationship between input and output, we have to simply derive the expected value we assert on from the input value. Here's the same Statement with the assertion changed:

[Fact] public void 
ShouldCreateBackupFileNameContainingPassedHostName()
{
  //GIVEN
  var hostName = AnyString();
  var fileNamePattern = new BackupFileNamePattern();
  
  //WHEN
  var name = fileNamePattern.ApplyTo(hostName);
  
  //THEN
  Assert.Equal(string.Format("backup_{0}.zip", hostName);
}
public string AnyString()
{
  return "MY_HOST_NAME";
}

This looks more like a part of specification, because we're documenting the format of the backup file name and show which part of the format is variable and which part is fixed. This is something you'd probably find documented in a paper specification for the application you're writing - it would probably contain a sentence saying: "The format of a backup file should be backup_H.zip, where H is the current local host name".

Derived values are about defining expected output in terms of the input that was passed to provide a clear indication on what is the "transformation" of the input required of the specified production code.

Third technique: Distinct Generated Values

Let's assume that some time after our initial version is shipped, we're asked to make the backup feature applied locally per user only for this user's data. As the customer doesn't want to confuse files from different users, we're asked to add name of the user doing backup to the backup file name. Thus, the new format is "backup_H_U.zip", where H is still the host name and U is the user name. Our Statement for the pattern must change as well to include this information. Of course, we're trying to use the anonymous input again as a proven technique and we end up with:

[Fact] public void 
ShouldCreateBackupFileNameContainingPassedHostNameAndUserName()
{
  //GIVEN
  var hostName = AnyString();
  var userName = AnyString();
  var fileNamePattern = new BackupFileNamePattern();
  
  //WHEN
  var name = fileNamePattern.ApplyTo(hostName, userName);
  
  //THEN
  Assert.Equal(string.Format(
    "backup_{0}_{1}.zip", hostName, userName);
}

public string AnyString()
{
  return "MY_HOST_NAME";
}

Now, we can clearly see that there is something wrong with this Statement. AnyString() is used twice and each time it returns the same value, which means that evaluating the Statement does not give us any guarantee, that both values are applied and that they're applied in the correct places. For example, the Statement will be evaluated to true when user name is used instead of host name in specified production code. This means that if we still want to use the anonymous input effectively, we have to make the two values distinct, e.g. like this:

[Fact] public void 
ShouldCreateBackupFileNameContainingPassedHostNameAndUserName()
{
  //GIVEN
  var hostName = AnyString();
  var userName = AnyString2();
  var fileNamePattern = new BackupFileNamePattern();
  
  //WHEN
  var name = fileNamePattern.ApplyTo(hostName, userName);
  
  //THEN
  Assert.Equal(string.Format(
    "backup_{0}_{1}.zip", hostName, userName);
}

public string AnyString()
{
  return "MY_HOST_NAME";
}

public string AnyString2()
{
  return "MY_USER_NAME";
}

We solved the problem (for now) by introducing another helper method. However, this, as you can see, isn't a very scalable solution. Thus, let's try to reduce the amount of helper methods for string generation to one and make it return a different value each time:

[Fact] public void 
ShouldCreateBackupFileNameContainingPassedHostNameAndUserName()
{
  //GIVEN
  var hostName = AnyString();
  var userName = AnyString();
  var fileNamePattern = new BackupFileNamePattern();
  
  //WHEN
  var name = fileNamePattern.ApplyTo(hostName, userName);
  
  //THEN
  Assert.Equal(string.Format(
    "backup_{0}_{1}.zip", hostName, userName);
}

public string AnyString()
{
  return Guid.NewGuid.ToString();
}

This time, we're not returning an understandable string, but rather a guid, which gives us the fairly strong guarantee of generating distinct value each time. The string not being understandable (contrary to something like "MY_HOST_NAME") may leave you worried that maybe we're losing something, but hey, didn't we say AnyString()?

Distinct generated values means that each time we need a value of a particular type, we get something different (if possible) than the last time and each value is generated automatically using some kind of heuristics.

Fourth technique: Constant Specification

Let us consider another modification that we're requested to make - this time, the backup file name needs to contain version number of our application as well. Remembering that we want to use Derived Values, we won't hardcode the version number into our Statement. Instead, we're going to use a constant that's already defined somewhere else in the application (this way we also avoid duplication of this version number across the application):

[Fact] public void 
ShouldCreateBackupFileNameContainingPassedHostNameAndUserNameAndVersion()
{
  //GIVEN
  var hostName = AnyString();
  var userName = AnyString();
  var fileNamePattern = new BackupFileNamePattern();
  
  //WHEN
  var name = fileNamePattern.ApplyTo(hostName, userName);
  
  //THEN
  Assert.Equal(string.Format(
    "backup_{0}_{1}_{2}.zip", 
    hostName, userName, Version.Number);
}

public string AnyString()
{
  return Guid.NewGuid.ToString();
}

Note that I didn't use the literal constant value, but rather, the value inside the Version.Number constant. This allows us to use derived value, but leaves us a little worried about whether the value of the constant is correct - after all, we're using it for creation of our expected value, but it's a part of production code - i.e. is something that should be specified itself!

To keep everyone happy, we write a single Statement just for the constant to specify what the value should be:

[Fact] public void 
ShouldContainNumberEqualTo1_0()
{
  Assert.Equal("1.0", Version.Number);
}

By doing so, we make the value in the production code just echo what's in our executable Specification, which we can fully trust.

Summary of the example

In this example, I tried to show you how a style can evolve from the principles you value when doing TDD. I did so for two reasons:

  1. To introduce to you a set of techniques I personally use and recommend and to do it in a fluent and logical way.
  2. To help you better communicate with people that are using different styles. Instead of just throwing "you're doing it wrong" at them, try to understand their principles and how their techniques of choice support those principles.

Now, let's take a quick summary of all the techniques introduced in example:

Anonymous Input
moving the output out of the Statement code and hide it behind a method that to emphasize the constrain on the data used rather than what's its value
Derived Values
defining expected output in terms of the input in order to document the relationship between input and output
Distinct Generated Values
When using Anonymous Input, generate a distinct value each time (in case of types that have very few values, like boolean, try at least not to generate the same value twice in a row) in order to make the Statement more reliable.
Constant Specification
Write a separate Statement for a constant and use the constant instead of its literal value in all other Statements to create a Derived Value.

Constrained non-determinism

When we combine anonymous input together with distinct generated values, we get something that's called Constrained Non-Determinism. This is a term coined by Mark Seemann and basically means three things:

  1. Values are anonymous i.e. we don't know the actual value we're using
  2. The values are generated in as distinct as possible sequence (which means that, whenever possible, no two values generated one after another hold the same value)
  3. The non-determinism in generation of the values is constrained, which means that the algorithms for generating values are carefully picked in order to provide values that are not special in any way (e.g. when generating integers, we don't allow generating '0' as it's usually a special-case-value)) and that are not "evil" (e.g. for integers, we generate small positive values first and go with bigger numbers only when we run out of those small ones).

There are multiple ways to implement constrained non-determinism. Mark Seemann himself invented the AutoFixture library for C# that's freely available to download by anyone. Here's a shortest possible snippet to generate an anonymous integer using AutoFixture:

Fixture fixture = new Fixture();
var anonymousInteger = fixture.Create<int>();

I, after Amir Kolsky and Scott Bain, like to use Any class. Any takes a slightly different approach than AutoFixture (although it shamelessly uses AutoFixture internally and actually obscures its impressive abilities). My implementation of Any class is available to download as well.

Summary

That was a long ride, wasn't it? I hope that this post gave you some understanding of how different TDD styles came into existence and why I use some of the techniques I do (and how these techniques are not just a series of random choices). In the next posts, I'll try to introduce some more techniques to help you grow a bag of neat tricks - a coherent style.