Monday, 26 November 2012

Explaining TDD to high school students

Some time ago, I took on a challenge to try and prepare a short introduction to TDD that would make a good presentation for high school students (let's say 45 minutes). As TDD is hard to grasp and get used to even for experienced programmers, I thought this should be really basic, just enough to put the point across.

I've pretty much prepared the flow of the presentation. Today I'd like to publish a draft of it in the hope of getting some constructive feedback and making it even better.

Oh, by the way, I put content associated with each slide below it. Have fun!

Here we go!

Let's start with something obvious. In every activity, it's good to have a goal. Whether you're a race car driver, a painter, a golf player, or even a tourist lying on the beach to get some sun and rest, you tend to have a goal. Without a goal, it's hard to determine things such as: when am I finished? How do I know that I failed? What can cause me to break off the activity? How often do I need to repeat the activity? There are other questions that can be derived from "having a goal".

We, software engineers, also need to have goals when writing software.

Let's take an example: a student thinking about writing an application "from scratch". His goal may be to create an application that looks and works like the Thunderbird e-mail client.

And so, he sets out on a journey to make his dream come true. However, the goal is a distant one, so reaching it may be very difficult. This is because there's nothing along the road to tell him whether he's actually closer to or further away from the goal. It's almost like embarking on a journey with your eyes closed. In the software engineering world, we talk about a project having a high risk associated with it.

However, there's an easy way out of this inconvenience.

If we can split the path into several shorter ones, by choosing a subset of functionality to deliver first, we can arrive at the goal faster and with lower risk. It's like stating "ok, first I need to get to the Statue Of Liberty, and from there...".

Using this approach, we can slowly but reliably arrive...

...at the full goal. This ability is, of course, influenced by another factor - whether we can avoid mistakenly going back to the starting point. In other words, if our journey is composed of three points: A, B and C, we want to be sure that from B, we're going to C, not back to A. In order to do this, we need some kind of "landmarks". In software engineering terms, we talk about "a way to confirm existing functionality and avoid introducing regression".

Thus, it makes sense to draw two conclusions.

The first one is that wherever we go and whatever software we write, we cannot take it "in one bite" - we have to split the overall goal into shorter ones to get feedback earlier.

Also, we want to be sure whether we really reached the goal, or whether it's just our imagination. We'd like to be able to tell whether all the goals are achieved, and if not, we'd like to know which goal we must fulfill next in order to get the work done.

Also, we don't want to come back and re-add functionality we already put in. When we reach one goal, we want to get right to the next one, and so on, until the application is finished. If we can't completely prevent breaking what already works when adding new functionality, we want to at least know instantly when it happens, so we can address it right away, while it's still little trouble to do so.

Thus, we need the goals to be easily and reliably verifiable, so that when we first arrive at a goal, we can be sure that we really did. Later, we'd also like to re-verify these goals easily, to make sure we didn't lose them accidentally while trying to achieve further goals.

These were the conclusions upon which TDD is built. Now let's see how it looks in practice, taking on a naive example.

Let's imagine that we're trying to write a function that will raise a number to a given power. Also, let's keep the problem at the primary school level, because that's approximately where I left my math skills :-).

Anyway, we know what the signature would look like - we take one number and another one, and raise the first to the power of the second. Now we need to state some goals that will help us arrive at the target.

We can come up with a few examples of how a properly implemented power function should behave. Each of these examples describes a more general rule. Such examples are called "Key Examples". The first one tells us about the special case where we take anything and raise it to the power of 0. The second one describes the special case where we take anything and raise it to the power of 1. The third one illustrates the general rule that when we raise something to the power of N, we multiply it by itself N times.
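
To give you an idea (this is my sketch, not part of the slides - the Pow() name and signature are assumptions), the key examples could be written down like this:

// Assumed signature: int Pow(int number, int power)
//
// Key examples:
//   Pow(3, 0) == 1    - anything raised to the power of 0 gives 1
//   Pow(3, 1) == 3    - anything raised to the power of 1 gives the number itself
//   Pow(2, 3) == 8    - i.e. 2 * 2 * 2: power of N means multiplying N times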

A set of key examples together with a more general description forms a specification. Long gone are the times when the best way to write a specification was to develop an abstract model of it. Nowadays, we want each rule implemented by the system to be illustrated by a concrete example. That's because...

...it's very easy to translate a specification made up this way into code, making it executable. This way, we achieve the verifiability we desire so much - if the specification is written as code that actually invokes the developed logic, we can reliably verify whether we really achieved the specified goals. Also, we can re-verify it later in no time, since code executes blazingly fast compared to a human that would need to read the specification and compare it with the actual code.

Ok, so let's take the first statement of this specification and try to put it in context.

In order to write and run specifications, we use special tools that provide some infrastructure upon which we can build. Such tools will automate many tasks for us, so we just need to write what we want and the tool will take care of gathering the specifications, executing them and reporting the result. For this example, we're gonna use a framework that's called XUnit.Net.

XUnit.Net allows us to state "facts" about the developed system by creating methods, marking them with the [Fact] attribute and stating inside them the expected behavior of the developed system. If the result is in sync with what we expect, the tool will mark such an example as "passed" (usually using a green color). If the result is not in sync, the tool will mark the example as "failed" (usually using a red color). Also, the specification needs a name. We try to name it after a description of the general rule illustrated by the example, so that we can easily come back to it later and read it as "living documentation".

Now that we have prepared the necessary infrastructure, let's try adding some content to the example.

First, we state our assumptions. We often think about examples in executable specifications in terms of three sections: "Given" (assumptions), "When" (action), "Then" (desired result). We treat it as a general template for a behavior description. In this case, our assumption is that we have any number, which happens to be 3 (but the name indicates that it could be any other number as well). We understand the code we just wrote as "given any number".

Now for the action, or "When". In our case, the action that we want to describe is raising something to the power of 0. Note that we use 0 explicitly and don't give it any name. This is to stress that the described behavior takes place only when we use 0 here. This part should be understood as "when we raise it to the power of 0".

And now, the desired result, or "Then". Here, we state what should happen when the behavior is in place - in other words, what will make the example "passed" (green). In our case, we say "Then the result should be equal to 1". If this is not true, the example will be marked as "failed" (red).
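
Assembled together, the example could look more or less like this (my reconstruction of the slide - the Pow() method itself is an assumption):

[Fact]
public void ShouldGive1WhenAnyNumberIsRaisedToThePowerOf0()
{
  // Given any number
  var anyNumber = 3;

  // When we raise it to the power of 0
  var result = Pow(anyNumber, 0);

  // Then the result should be equal to 1
  Assert.Equal(1, result);
}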

Ok, let's quickly recap by trying to read the whole example. It reads like this:

Given any number
When I raise it to the power of 0
Then the result should be equal to 1

Our goal is to make this example "pass" (green) - when it happens, the tool will display a message like the one on the slide. Note that the goal fulfills the criteria that we defined earlier. It is:

  • short
  • incremental - covers a well-defined part of the functionality
  • verifiable - we can compile it, run it and in a second, we'll get a response on whether this goal is achieved or not.

By the way, this goal is so far unfulfilled. We don't have any code to even get past the compilation stage...

So let's add some. Note that we're deliberately putting in an implementation that will make this example show up as "failed" (displayed in red). This is to make sure that the goal is well-stated. One can, by mistake, write an example that will always "pass", and we want to protect ourselves from this kind of error. Thus, we make the first implementation as naive as possible, just to compile it and watch it fail to fulfill our current specification.
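
Such a deliberately naive implementation might look like this (a sketch, assuming the Pow() signature from before):

public static int Pow(int number, int power)
{
  return 0; // naive on purpose - enough to compile, but it should fail the example
}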

The example we have created seems to state the goal well. As the system does not work the way this example describes, it shows "failure" (in red), meaning that the goal is not achieved yet. Our task is to achieve it.

Thankfully, this simple goal could be achieved by changing one character in the original implementation. This is just enough implementation to put the desired behavior in place. Done.
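
In the sketch above, that one-character change would be:

public static int Pow(int number, int power)
{
  return 1; // satisfies every example we have so far - which is just this one
}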

"Wait a minute" - you may say - "this isn't a proper implementation of power algorithm! It's cheating!". And you may give some examples where the current implementation won't work...

...like 2 to the power of 2. If you really said this, all of this would actually be correct, except for the "cheating" part :-).

That's because the TDD process consists of small cycles, in which we provide the simplest implementation possible and expand it as we expand the specification with new examples.

This process is usually called "Red - Green - Refactor" and consists of three stages:

  1. Red - named after the color that the tool for running executable specifications shows you when the goal stated with an example is not achieved. We saw this when we made our Pow() method return 0 instead of the expected 1.
  2. Green - named after the color that the tool shows you when the goal stated with an example is achieved by the current implementation. We saw this when we put in the correct implementation for the "anything to the power of 0" scenario.
  3. Refactor - after achieving the goal and making sure no previous goals were lost, it's a good moment to take a step back and look at the current design. Maybe the behaviors added on a "one by one" basis can be generalized? Maybe something else can be cleaned up? In our case, no refactoring was needed, since there was hardly any design; however, in real-life scenarios, this is a crucial step.

When we finish the whole cycle, we take on another goal and repeat the cycle until we run out of goals. That's the core of the TDD process.

Now's the time for a confession - you thought this presentation was about Test-Driven Development, but until now, I didn't even mention the word "test" - I was only talking about goals, specifications and examples. So where are the tests?

Ok, here's the thing: we use the examples to state our goals and verify their achievement up to the moment when the logic is in place. After this happens, we don't throw out these examples - they take on the role of micro-level tests that ensure all behaviors persist when we add more logic. These "tests" are a by-product of the TDD process.

The example we went through just now was a simplified case of a single function. As you surely know, real-world projects, especially object-oriented ones, are not like this. They consist of a web of objects, collaborating to achieve a task.

Most of these objects know about other objects and use them. In other words, objects depend on other objects. Some say that object orientation is all about managing dependencies. How does TDD fit in here?

Using examples, we can specify how an object should interact with its collaborators, even those that do not exist yet. "What?" - you may ask - "how am I supposed to create an example of an object that uses something which does not exist?". True, it's impossible per se, but there are ways to deal with that.

The trick is to develop fake objects that look like the real ones on the surface, but underneath allow us to set up their behavior, so that the specified object "thinks" different things are happening in the system. It's like taking over all the media in real life and broadcasting fake news about an earthquake in New York - people in other countries, who know about the current situation in New York only from the media, will believe the lies and act as if they were real. Here, we want to do the same thing - deceive an object about what happens in its surroundings, so we can write examples of how it should behave.

In order to do it, we can use polymorphism. Let's take a look at two examples of such fake objects.

Sometimes, we want to state that the specified object should communicate something to its collaborators. Let's say we have a drive full of music and we want to show an example where our object makes an attempt to play the music. If we used the real drive, whether the music plays or not could depend on many different conditions (like file privileges, format, whether there are actually any files on the drive etc.) which are out of the scope of this example. To make things easier, we use a fake object, cheating our specified one into thinking that it's dealing with a real drive.
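
Such a hand-made fake could look more or less like this (a sketch - the IDrive interface and all the names here are my assumptions, not something from the slides):

public interface IDrive
{
  void PlayMusic();
}

// A fake that merely records the fact that an attempt to play was made
public class FakeDrive : IDrive
{
  public bool PlayAttempted { get; private set; }

  public void PlayMusic()
  {
    PlayAttempted = true;
  }
}

An example would then hand the fake to the specified object, trigger the behavior and assert that PlayAttempted is true.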

The second type of fake allows us to set up a value returned by a method. Using this kind of object, we can cheat the users of the drive into thinking that it's read-only. If we used a real drive, we would probably have to invoke some complex logic to set it in this state. With a fake, we can just pre-program an answer that will be issued when the question gets asked.
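
Again as a standalone sketch (same disclaimer - the names are made up for illustration):

public interface IDrive
{
  bool IsReadOnly();
}

// A fake with a pre-programmed answer
public class FakeReadOnlyDrive : IDrive
{
  public bool IsReadOnly()
  {
    return true; // the canned answer - no real drive state is involved
  }
}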

The End

That's all I have for now, I'd be grateful for any feedback on this draft. Bye!

Tuesday, 20 November 2012

Reusing code in Test Data Builders in C++

Test Data Builders

Sometimes, when writing unit or acceptance tests, it's a good idea to use a Test Data Builder. For example, let's take a network frame that has two fields - one for the source, one for the destination. A builder for such a frame could look like this:

class FrameBuilder
{
protected:
  std::string _source;
  std::string _destination;
public:
  FrameBuilder& source(const std::string& newSource)
  {
    _source = newSource;
    return *this;
  }

  FrameBuilder& destination(const std::string& newDestination)
  {
    _destination = newDestination;
    return *this;
  }

  Frame build()
  {
    Frame frame;
    frame.source = _source;
    frame.destination = _destination;
    return frame;
  }
};

and it can be used like this:

auto frame = FrameBuilder().source("A").destination("B").build();

The issue with Test Data Builder method reuse

The pattern is fairly easy, but things get complicated when we have a whole family of frames, each sharing the same set of fields. If we wanted to write a separate builder for each frame, we'd end up duplicating a lot of code. So another idea is inheritance. However, the naive approach gets us into some trouble. Let's see it in action:

class FrameBuilder
{
protected:
  std::string _source;
  std::string _destination;
public:
  FrameBuilder& source(const std::string& newSource)
  {
    _source = newSource;
    return *this;
  }

  FrameBuilder& destination(const std::string& newDestination)
  {
    _destination = newDestination;
    return *this;
  }

  virtual Frame* build() = 0;
};

class AuthorizationFrameBuilder : public FrameBuilder
{
private:
  std::string _password;
public:
  AuthorizationFrameBuilder& password(const std::string& newPassword)
  {
    _password = newPassword;
    return *this;
  }

  Frame* build()
  {
    auto authorizationFrame = new AuthorizationFrame();
    authorizationFrame->source = _source;
    authorizationFrame->destination = _destination;
    authorizationFrame->password = _password;
    return authorizationFrame;
  }
};

Note that there are two difficulties with this approach:

  1. We need the build() method to return a pointer, or we'll never be able to use methods from FrameBuilder in the chain (because each of the methods from FrameBuilder returns a reference to FrameBuilder, which only knows how to create frames, not how to create authorization frames). So, we need polymorphism to be able to write chains like:
    AuthorizationFrameBuilder().password("a").source("b").build()
  2. Because FrameBuilder calls return a reference to FrameBuilder, not to AuthorizationFrameBuilder, we cannot use calls from the latter after calls from the former. E.g. we cannot make a chain like this:
    AuthorizationFrameBuilder().source("b").password("a").build()
    This is because the source() method returns a FrameBuilder, which doesn't include a method called password() at all. Such chains end in compile errors.

Templates to the rescue!

Fortunately, there's a solution for this. Templates! Yes, they can help us here, but in order to do this, we have to use the Curiously Recurring Template Pattern. This way, we'll force the FrameBuilder methods to return a reference to its subclass - this will allow us to mix methods from FrameBuilder and AuthorizationFrameBuilder in any order in a chain, so a chain like AuthorizationFrameBuilder().source("b").password("a").build() will finally compile.

Here's an example code for the solution:

template<typename T> class FrameBuilder
{
protected:
  std::string _source;
  std::string _destination;
public:
  T& source(const std::string& newSource)
  {
    _source = newSource;
    return *(static_cast<T*>(this));
  }

  T& destination(const std::string& newDestination)
  {
    _destination = newDestination;
    return *(static_cast<T*>(this));
  }
};

class AuthorizationFrameBuilder 
: public FrameBuilder<AuthorizationFrameBuilder>
{
private:
  std::string _password;
public:
  AuthorizationFrameBuilder& password(const std::string& password)
  {
    _password = password;
    return *this;
  }

  AuthorizationFrame build()
  {
    AuthorizationFrame frame;
    frame.source = _source;
    frame.destination = _destination;
    frame.password = _password;
    return frame;
  }
};

Note that in FrameBuilder, the this pointer is cast to the template type, which happens to be the subclass on which the methods are actually called (a static_cast is enough here, since we know the concrete type). This cast is identical in every method of FrameBuilder, so it can be extracted into a separate method like this:

  T& thisInstance()
  {
    return *(static_cast<T*>(this));
  }

  T& source(const std::string& newSource)
  {
    _source = newSource;
    return thisInstance();
  }

Summary

This solution makes it easy to reuse any number of methods in any number of different builders, so it's a real treasure when we've got many data structures that happen to share some common fields.

That's all for today - if you'd like to, please use the comments section to share your solution to this problem for other programming languages.

Bye!

Tuesday, 13 November 2012

Don't use setup and teardown, or I will...

...write a blog post.

There - I did it. I told you I would!

This time, I'm going to share some of the reasons why I tend not to use the Setup and Teardown mechanism at all.

First, a disclaimer - my point here is NOT that Setup and Teardown lead to inevitable catastrophe. My point is that Setup and Teardown are so misunderstood and so easy to abuse that I'd rather not use them at all. There are some other reasons why I prefer not having Setups and Teardowns even when they are used properly, but I'll save those for another post. This time, I'd like to focus only on the ways this mechanism is most often abused.

What's Setup and Teardown?

As everyone knows, a Setup method is a special kind of method that is executed by unit testing tools before each unit test in a suite. Such methods are commonly used to set the stage before a unit test begins. Analogously, a Teardown method is one that is always run after the unit test's execution, and is usually used to perform cleanup after the test finishes. So, given this code (the example uses NUnit):

[SetUp]
public void Setup()
{
  Console.WriteLine("SETUP");
}

[TearDown]
public void Teardown()
{
  Console.WriteLine("TEARDOWN");
}

[Test]
public void Test2()
{
  Console.WriteLine("TEST");
}

[Test]
public void Test1()
{
  Console.WriteLine("TEST");
}

... when it is run by a unit testing tool, it produces the following output:

SETUP
TEST
TEARDOWN
SETUP
TEST
TEARDOWN

While keeping the common logic for "setting the stage" and "cleaning up" in one place in a test suite looks like adhering to DRY, avoiding duplication etc., there are certain dangers in using this kind of mechanism, some of which I'd like to list below.

Issues with Setup and Teardown

Because we're talking mostly about C#, we're mainly going to examine Setup, because Teardown is seldom used for unit tests in such languages. By the way, the examples provided are here to explain what kind of abuse I have in mind. I tried to keep them simple - this way they're more understandable, but they do not show how bad it can get with real code in a real project - you'll have to take my word for it :-). Let's go!

Joint Fixtures

Imagine that someone has a set of unit tests revolving around the same setup (let's call it "A"):

[SetUp]
public void SetUp()
{
  emptyList = new List<int>(); //A
}

[Test] //A
public void ShouldHaveElementCountOf0AfterBeingCreated() 
{
  Assert.AreEqual(0, emptyList.Count());
}

[Test] //A
public void ShouldBeEmptyAfterBeingCreated()
{
  Assert.True(emptyList.IsEmpty());
}

One day, another set of unit tests must be added for the same class, requiring the setup to be handled a little bit differently (let's call it "B"). What this person might do is just add the necessary setup beside the first one:

[SetUp]
public void SetUp()
{
  emptyList = new List<int>(); //A
  listWithElement = new List<int>() { anyNumber }; //B
}

[Test] //A
public void ShouldHaveElementCountOf0AfterBeingCreated() 
{
  Assert.AreEqual(0, emptyList.Count());
}

[Test] //A
public void ShouldBeEmptyAfterBeingCreated()
{
  Assert.True(emptyList.IsEmpty());
}

[Test] //B
public void ShouldNotContainElementRemovedFromIt()
{
  listWithElement.Remove(anyNumber);
  Assert.False(listWithElement.Contains(anyNumber));
}

[Test] //B
public void ShouldIncrementElementCountEachTimeTheSameElementIsAdded()
{
  var previousCount = listWithElement.Count();
  listWithElement.Add(anyNumber);
  Assert.AreEqual(previousCount + 1, listWithElement.Count());
}

The downside is that another person striving to understand one unit test will have to read through both the related setup and the unrelated one. And in real-life scenarios, such orthogonal setups tend to be longer than in this toy example.

Of course, this may be fixed by separating this class into two - each having its own setup (see the sketch below). There are, however, two issues with this: one is that this is almost never done by novices, and the second is that it usually complicates navigation through the unit tests.
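
Such a split could look more or less like this (a sketch - the class names are mine, and anyNumber is a field, as in the snippets above):

[TestFixture]
public class ListCreationSpecification
{
  private List<int> emptyList;

  [SetUp]
  public void SetUp()
  {
    emptyList = new List<int>(); //A
  }

  // ...tests marked //A go here...
}

[TestFixture]
public class ListModificationSpecification
{
  private List<int> listWithElement;

  [SetUp]
  public void SetUp()
  {
    listWithElement = new List<int>() { anyNumber }; //B
  }

  // ...tests marked //B go here...
}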

Locally Corrected Fixtures

Another temptation a novice may face comes when, one day, they need an object that's slightly different from the one already set up in the Setup. The temptation is, instead of creating a new object configured separately, to undo some of the changes made by the Setup just for this single test. Let's see a simplified example of a situation where one needs a radio set up slightly differently each time:

[SetUp]
public void Setup()
{
  radio = new HandRadio();
  radio.SetFrequency(100);
  radio.TurnOn();
  radio.SetSecureModeTo(true);
}

[Test]
public void ShouldDecryptReceivedContentWhenInSecureMode()
{
  var decryptedContent = radio.Receive(encryptedContent);
  Assert.AreEqual("I love my wife", decryptedContent);
}

[Test]
public void ShouldThrowExceptionWhenTryingToSendWhileItIsNotTurnedOn()
{
  radio.TurnOff(); //undo turning on!!

  Assert.Throws<Exception>(() =>
    radio.Send(Any.String())
  );
}

[Test]
public void ShouldTransferDataLiterallyWhenReceivingInNonSecureMode()
{
  radio.SetSecureModeTo(false); //undo secure mode setting!!

  var inputSignal = Any.String();
  var receivedSignal = radio.Receive(inputSignal);

  Assert.AreEqual(inputSignal, receivedSignal);
}
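
The cure I'd usually suggest is to let each test build exactly the radio it needs, so that the whole relevant state is visible in the test itself. A sketch (mine, not from the original code):

[Test]
public void ShouldTransferDataLiterallyWhenReceivingInNonSecureMode()
{
  var radio = new HandRadio();
  radio.SetFrequency(100);
  radio.TurnOn();
  radio.SetSecureModeTo(false); // an explicit choice instead of undoing the Setup

  var inputSignal = Any.String();
  var receivedSignal = radio.Receive(inputSignal);

  Assert.AreEqual(inputSignal, receivedSignal);
}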

Overloaded fixtures

Let's stick with the hand radio example from the previous section. Consider the following unit test:

[Test]
public void ShouldAllowGettingFrequencyThatWasSet()
{
  radio.SetFrequency(220);
  Assert.AreEqual(220, radio.GetFrequency());
}

Ok, looks good - we specify that we should be able to read the frequency that we set on the radio, but... note that the radio is turned on in the Setup, so this is actually getting the frequency from a radio that's turned on. What if it's turned off? Does it work the same? In the real world, some radios are analog and let you adjust the frequency manually with a knob. On the other hand, some radios are digital - you can set their parameters only after you turn them on and they become active. Which case is it this time? We don't know.

Ok, I actually lied a little. In fact, if someone was doing TDD, we can assume that if the scenario were different when the radio is off, another unit test would exist to document that. But in order to determine that, we have to: 1) look through all the unit tests written for this class, and 2) really, really believe that the author(s) developed all the features test-first. An alternative is to look at code coverage or at the code itself. However, all these ways of dealing with the issue require some examination on our part. In such a case, the Setup gets in the way of understanding the minimum set of activities required for a functionality to work as described.
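
Without the Setup, the same unit test could make these activities explicit (a sketch - whether TurnOn() is actually required here is exactly the kind of thing such a test would reveal):

[Test]
public void ShouldAllowGettingFrequencyThatWasSet()
{
  var radio = new HandRadio();
  radio.TurnOn(); // now it's explicit that we specify a turned-on radio

  radio.SetFrequency(220);

  Assert.AreEqual(220, radio.GetFrequency());
}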

Scope hiding

It's quite common to see unit tests grow too big in scope as the design gets worse and worse and it's harder and harder to separate different bits and pieces for unit testing. Many people think that Setup and Teardown are THE feature to deal with this. I'm sorry, they are not. If you have issues with your design, the right way to go is to SOLVE them, not to HIDE them. Unfortunately, this kind of Setup abuse tends to grow into multi-level inherited Setups (do class names like "TestBase" ring a bell?), where no one really knows anymore what's going on.

Evading already used values

Consider the following example of a user registration process that may take place in various situations (e.g. creating sessions, a presidential election etc.):

[SetUp]
public void Setup()
{
  registration = new UserRegistration();
  registration.PerformFor("user1");
  registration.PerformFor("user2");
}

[Test]
public void ShouldBeActiveForRegisteredUsers()
{
  Assert.True(registration.IsActiveFor("user1"));
}

[Test]
public void ShouldBeInactiveForUnregisteredUsers()
{
  //cannot be "user1" or "user2" or it fails!
  Assert.False(registration.IsActiveFor("unregisteredUser"));
}

Why "unregisteredUser" was chosen as a value for unregistered user? That's because someone writing a unit test wanted to evade the values used in Setup. While that's not as bad when reading those unit tests, it's a pain when writing new ones - in order to avoid conflict, you have to always look back at what users are already registered and either use the same values or deliberately pick different ones. Thus, writing every new test begins with trying to understand what the Setup already does and trying to get around that. What's worse, when putting another value in the Setup, we have to go through all existing unit tests to make sure that we pick a value different that already used in any of them. This hurts maintainability.

Why do I tell teams to avoid Setup and Teardown

As Roy Osherove once told me:

I don't use my frameworks to teach me design.

and while it was said in a different context, I can paraphrase it as: avoiding a feature instead of learning how not to misuse it is not something that will teach you good design and TDD.

Why then do I insist that people new to unit testing hold back from using features such as Setup and Teardown? How is it helpful?

In my opinion, holding back from using Setup and Teardown exposes us to a diagnostic pain - when we feel the pain and cannot work around it, we have to learn to avoid its cause. It's never sufficient to say "don't use Setup and Teardown" or to put in any other rule (e.g. "no getters") without following up on the experience of applying this rule. When the team gets into difficulties, there has to be someone to explain what the root cause is. Only then do such rules make sense. If not for this, the team would just find another workaround, e.g. with helper methods (which are better than Setup and Teardown in that every time you get to choose which ones to use and which not, but can also be abused - see the sketch below). Thus, when I say "no Setup/Teardown" to a team, I always wait for a reaction like "you're nuts!" - through such experiences, I'm preparing them to acknowledge that TDD is something different from what they initially thought (which is usually something like "execute a lot of production code" and "get high coverage").
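
For the record, a helper-method version of the earlier radio example could look like this (my sketch, reusing names from the earlier snippets - the point is that each test opts in explicitly, right where it matters):

[Test]
public void ShouldDecryptReceivedContentWhenInSecureMode()
{
  var radio = CreateTurnedOnSecureRadio();

  var decryptedContent = radio.Receive(encryptedContent);

  Assert.AreEqual("I love my wife", decryptedContent);
}

private static HandRadio CreateTurnedOnSecureRadio()
{
  var radio = new HandRadio();
  radio.SetFrequency(100);
  radio.TurnOn();
  radio.SetSecureModeTo(true);
  return radio;
}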

How about experienced developers? Do they need to adhere to the same rule? Well, let me share my experience. I once told myself - if I ever run into a situation where it makes perfect sense to use Setup and Teardown, I will not hold myself back. Guess what - since then, I have used it only once (because one technique of dependency injection required it) and I regretted it later. Other than that, I never really felt the need to use this feature, since I was doing even better without it. There are some other advantages (like keeping each unit test a "one-part story") that make me never want to go back. Where I work, we've got over a thousand unit tests and no Setup or Teardown at all.

There are actually more smells and difficulties associated with Setup and Teardown (like impeding certain refactorings) and I could also write something more about Setup and Teardown vs helper methods, but everything has to have an end, doesn't it? Till the next time!