Tuesday, 23 October 2018

Sandro Mancuso & Uncle Bob comparative design episode 6 notes

Notes from episode 1
Notes from episode 2
Notes from episode 3/4/5

The episode 6 was the first episode where Uncle Bob's approach is presented. This approach is much less familiar to me than Sandro's so I had a lot of observations about both uncle Bob's testing and his design approach. Not everything is clear to me but more, I hope, will unveil in the next episodes.
  1. As described in the episode summary, he starts with UML diagram to identify a domain model consisting of abstractions and relationship between them. Also, Bob identifies a set of use cases on a whiteboard first.
  2. Bob starts his first test with a test file for a use case (e.g. for "create user" use case, he creates a file called CreateUserTest) and creates a first test called "nothing".
  3. The "nothing" test quickly transforms into creation test (testing that a use case object can be created) 
  4. The second test is for the use case logic and is called "canCreateUser". The use case class is called CreateUser - a verb, not a noun, which I am fine with. I don't think all classes names should be nouns and I consider this one a good example.
  5. When writing this test, Uncle Bob introduces a DTO and calls it CreateUserRequest. Also, he puts all the data as public fields, to stress that this is a data structure and not a class with behaviors.
  6. As a refactoring step, Uncle Bob moves all but assertions to the setUp() method, creating essentially what is called "testcase class per fixture" test organization scheme.
  7. A surprising step - Uncle Bob does not inject repository to the use case. Rather, he makes a class called Context and places the repository in a mutable public static field of this class. In each test, he assigns his own in-memory repository to this field. Then the production class accesses this global variable to get the repository. This is, AFAIK, the Ambient Context pattern. The rationale for using it is that Uncle Bob doesn't like passing the reference to repository around. My comments on that choice:
    1. Personally, I don't think the reference would be "passed around" - had Uncle Bob used a factory for the use cases, he could inject the repository from composition root into the factory and from there into the use case and that's it. Not sure I would call it "passing around". 
    2. Another point would be that probably not every unit testing framework would let him get away with this. For example in xUnit.net, where tests are ran in parallel with each other, I assume he would see some nasty non-deterministic behavior.
    3. By creating the global mutable state, Uncle Bob creates a persistent fixture, which requires a manual teardown (either at the beginning or at the end of a test). I believe him he has enough discipline to keep it torn down at all times, but in team setup, I can see two programmers, each adding new tests and new fields to the Context, each anaware of what the other does and then they merge and it turns out that the changes made by programmer "A" do not include tearing down the things that programmer "B" added and vice versa. In other words, I believe it requires an immense amount of discipline and communication to keep this stuff clean at all times.
    4. This approach, as Sandro suggested, makes dependencies implicit. Uncle Bob argues that although it does make the dependency implicit, it does not hide it. Having looked at several definitions of what "implicit" means (I'm not a native speaker), from my perspective I would call it "hiding" anyway.
  8. When testing whether he can get the user that was put inside the repository by the use case, Uncle Bob makes a test fail by changing the in-memory repository to return a clone of the user instead of the original reference that was put inside the repository. Here I don't know what to think of it. This is because I am not sure how Uncle bob treats the InMemoryRepository class. If it's treated as a fake, a part of the test, then this is a legit TDD step, because by modifying the repository, Bob modifies the test fixture, i.e. he modifies the test. On the other hand, if the in-memory repository is a test fixture, then how did it end up in the org.openchat.repositories production package? This is the part that confuses me.
  9. An interesting part is that Bob tries to write the controller logic without a passing test and only goes to write one when he doesn't know what to write next in the controller logic. Bob doesn't want to mock the framework types (I wouldn't want to as well) and searches for another way, but this will be resolved in episode 7.

Wednesday, 26 September 2018

Sandro Mancuso & Uncle Bob comparative design episode 3/4/5 notes

Notes from episode 1
Notes from episode 2
Notes from episode 6

Here are my notes from Comparative Design episode 3,4 and 5 by Sandro Mancuso and Uncle Bob. There were less new things for me in these episodes, more of applying the approach from previous episodes on other use cases (so the notes are going to be short on this one) . Still, I enjoyed it and consider the watching time well-spent. As with previous posts, I will only list stuff that was new or controversial to me as that's what I use the notes for. There's lots of great stuff in these videos that I don't write about because I know them from womewhere else. Also, as these are notes, they are a bit unsolicited and messy - get over it.
  1. I learned IntelliJ IDEA's shortcut ALT+SHIFT+F10 for launching tests from current class. I didn't know that - thanks Sandro & Uncle Bob!
  2. A nice comment from Sandro: In outside-in approach, each test represents a design decision
  3. When doing validation whether a post contains inappropriate words, Sandro delegates the validation (at least the condition) to a separate collaborator and calls it language service instead of "validator" - nice, I never really liked the name "validator". After that, he extracts the condition together with decision to throw exception to a separate method called validate(). Typically, I call methods that throw when a condition is not met assert...() by analogy to how they are called in xUnit frameworks.
  4. When implementing the "adding posts" requirement, Sandro leaves the repository empty since there is no requirement for reading from the repository yet. I like that - there was no point inventing a speculative reading feature for the class only to see it turn out different in the future when driven by a real use case. On the other hand, however, this might be a singnal that the acceptance tests do not revolve around use cases. Is "add a post and receive OK" really a valid customer use case? What use is adding a post if the post cannot be read? Thus, I would probably have a test where I would add a post and then read it, because from the user point of view, only then is the API useful at all. Well, there may be a case where someone writes their own GUI client and wants to integrate with each method as fast as possible (thus having POST only would be beneficial to them as it would allow them check their client compatibility), but I didn't find anything to confirm this is the assumed scenario here. Also, if I only could do one of read/write in an iteration, I would pick reading first as it would already allow showing the customer how posts look like on the GUI.
  5. When testing the language service (a piece of code that rejects inappropriate words), Sandro picks "elephant" for a bad value (because "elephant" is one of the inappropriate words), but uses "OK text" for a good value. I would probably pick some text that's closer to the boundary, e.g. "elephan" to show that the whole word has to match to reject it. Several seconds later, Sandro picks some more examples of bad values, some of which I wouldn't think of (e.g. all uppercase ELEPHANT). Sandro's comment on this: "I don't like baby steps". I can understand that - baby steps are typically for exploratory stuff and this is a CRUD app so no surprises are expected here.
  6. When testing with APIs that accept collections, Sandro uses a single-element collection. I typically prefer 3 elements although I must say that this time Sandro's preference is closer to boundary value testing - interesting :-).
  7. Another interesting observation is that Sandro does sorting of posts (to have the latest post first) in the repository but filtering of logged-in user (when showing all users) in the GUI. I have to admit (and Sandro does as well) these are hard decisions to make and typically an assumption made on these things gets validates as more code appears.
  8. The rest of the episodes 4 and 5 are more or less about the same stuff, so I see no point taking too much notes from them. The only interesting fact is that when Sandro registers a following (a relationship saying that one user follows the other) - and, while there is some validation whether the following already exists, there is no validation whether the users exist at all. I wonder why - does Sandro trust that only the UI will use the API and hence no one will send IDs that don't exist?
And that's it. This concludes my notes on Sandro's part (of course, after Bob's part there will be some dicsussion between both of them). As a final note on this part of episodes, I'd like to comment on the example problem that is used in comparative design case study. It's about writing a backend API for a twitter-like app. The best part about the chosen problem is that it is a realistic one and allows showing TDD with randomness, I/O and some other stuff that usually gets left behind in simple examples, end to end from controller to cache (without a real database though). The worst part of it is that it lacks a significant amount of logic around the use cases. Basically, it's almost a plain CRUD app and if we dropped the layering from the design, each use case would take around 10-15 lines of code to implement. And now we come to my greatest source of confusion. Sandro deciding to introduce the layering means he clearly wanted to show how to design something maintainable. On the other hand, some of the choices he made looked to me like cutting corners because this is a very simple app. No thread safety, no authorization, no authentication. Also, unfortunately, no elaborate requirements like "when you create a post and this is a 10th post created during the last second, we will consider you are a bot and disallow you from posting for 10 minuts". The resulting domain model turned out very anemic and consisting mainly of DTOs. I'd rather see less use cases but more elaborate because interactions between entities and domain concepts is where GOOS authors wrote this style really shines. Still, I enjoyed the episodes and I think this part is something that every senior coder should watch as even without an elaborate domain, it contains lots of great insight, tips, tricks and discussion. I look forward to Uncle Bob's episodes and the concluding discussion!

Tuesday, 11 September 2018

Sandro Mancuso & Uncle Bob comparative design episode 2 notes

Notes from episode 1
Notes from episode 3/4/5
Notes from episode 6

After notes from episode 1, I promised some notes after episode 2 and here they are.

As with the previous post (please read the disclaimer that's there if you didn't), I am focusing on the stuff I disagree with, would do in another way or consider controversial. This is because that's what I learn from the most, not because I want to prove to anyone that I am smarter or something. There was also lots of great stuff that I just happen to agree with, so I won't write a lot about that. Also, remember that the episodes show code in evolution, so it may be that some choices Sandro makes are temporary or made with knowledge that no further development will be done in specific code areas.

General Remarks

Stuff I forgot to write down from the first episode and is also in the second.
  1. I noticed that Sandro mixes Mockito with BddMockito, which seems kind of strange to me. Typically what I see is that people prefer one namespace or the other, but Sandro takes stubbing syntax from BddMockito (given(...).willReturn(...)) and verification syntax from plain Mockito (verify(...)). I wonder what he does when he works with his team - whether they all use this convention or someone gives up and agrees to use something different.
  2. Sandro doesn't use a TODO list anywhere in the example. I think one of the reasons is that his UnsupportedException occurences work as TODO marks. The other may be that this is a prepared exercise so there's little exploration and question marks to be investigated. When doing prepared presentations, I also very rarely use a TODO list, so I don't see any issue with that.
  3. He uses TDD together with programming by intention. This is something I've never done before, I will try to practice it a bit in the coming weeks.
Now for the main notes.

 Users API

  1. The beginning of the episode is about finishing the user API, where a UserService instance should create a user. Sandro moves the randomness of generating user ids (UUIDs) to a separate class, called IdGenerator and says he will not test it as it is not deterministic. While I agree that it isn't deterministic, I think some of the properties of the ID generator may be. Personally, I think it still is a material for a property-based test. Though there's (almost non-existent) chance of generating a UUID that was once used, I think it is reasonable to expect from the algorithm that no two UUIDs generated in a sequence are the same. Sure, that would lead to a discussion on "why two, not three" etc., but I would still write a small test around that property, especially that the return value of the generator is String, not UUID, so there's nothing in the signature that forces one to use UUIDs in the implementation.
  2. The next test for the UserService is about throwing an exception when a user that we're trying to create is already in the repository. Sandro uses JUnit's expected exception annotation: @Test(expected = UserNameAlreadyInUseException.class). Not sure why he prefers it to AssertJ's built-in assertThatThrownBy(). While this is a recent feature, I think it was already around when the video was recorded. Typically when I read people writing on asserting exceptions, the annotation-oriented solutions are considered the inferior ones to specialized assertions.
  3. In the same test, Sandro wonders whether to return an optional or to create a query method named "isUsernameTaken()". Only if that method returns true does he go on and create the user. I think I can see why he doesn't use an optional - because that would not scale - he would need to establish an implicit assumption that every Optional.empty() returned from repository means that user name is already taken, which could be a bit of stretch. Alternatively, he could throw the exception straight from the repository, but that might mean pushing some of the application logic into what can essentially end as adapter (but currently Sandro puts his repository together with domain logic). On the other hand, I'm wondering whether his solution would hold in a more real-life environment of race conditions and multiple threads accessing the repository. Sandro seems to assume no such requirement for this system since if there was one, the access to the repository would need to be synchronized. Then, he would probably need to solve the dilemma of where to place the synchronization construct. I wonder what would be his answer.
  4. Another thing I am wondering about is whether Sandro considers the repository he created a domain construct or an adapter in the hexagonal architecture sense. He puts it into domain logic for now, but actually how he thinks of it can influence what kind of logic he may be inclined to push inside and what kind of logic he would like to keep away from it. Especially that Sandro says this is going to be replaced by a database at some point.
  5. An interesting fact is that he defers the technology choice for the persistence mechanism. Not sure if he will implement it or not. This is the place where in normal project I would like to see something like a story map to be sure where I am, since the external API description in Swagger does not say anything about persistence, so as a behavioral specification, it is incomplete. If this is the only specification we have, then what should we derive our need for persistence from?
  6. When testing the isUserTaken() behavior in the repository, Sandro writes two tests in a single method - one for true (name is taken), one for false (name is not taken). For me these are two behaviors and I would like to write them separately. When I look at it, the behavior looks to me like a good candidate for a parameterized test, although I must say that JUnit 4 does not make writing parameterized test methods easy. At the very least, if I really felt this is closer to a single behavior (just that each assertion shows one side of the coin), I would employ the soft assertions feature of AssertJ to not stop on the first assertion failure.
  7. There is a nice discussion between Sandro and Bob on why there isn't a getUser() method in the repository that would allow checking whether the user is in the repository or not. I agree with Sandro that adding such method would be premature as no existing use case needs it yet and I think something like this would constrain my internal implementation. 

Login API

  1. I noticed that Sandro doesn't use named constants in tests. Even though the production code uses the library-defined OK_200 constant (as funny as this name might be), the test uses plain 200. Not sure what the rationale is as it is never given, so I am going to try and guess that Sandro does not trust the constants and wants the tests independent of them because if tests used the named constants as the production code does, the values of the constants would essentially fall out of scope of a test, so if the value of OK_200 changed at one point, there would be no test to detect that. Personally, I prefer a different way of including constants in the test scope. I use the named constants everywhere in the tests and write an additional small test that says assertThat(OK_200).isEqualTo(200) writing such test takes me literally half a minute and the dilemma is solved for me. Now if the value of the constant changes, I have a single test to maintain. 
  2. Just as Sandro created UserApi class, he creates LoginApi. I think not putting both registration and login methods in the same class is a clever move because by doing that, Sandro establishes vetrtical architecture slicing and has a green field for his new feature in a fresh vertical slice.
  3. Interestingly, Sandro goes straight to the repository from the LoginAPI, not introducing any kind of LoginService as he did in case of the UserAPI -> UserService->UserRepository. The rationale is that the login API is so simple that in this case Sandro is willing to do an exception and move away from what he considers a redundant abstraction layer. I think that doing that is a nice demonstration of benefits of vertical slicing, since he can make this decision for login API without affecting the user API. This is one of the reasons why I like vertical slicing so much. If he did put both registration and login methods in the same API class, it would look confusing that for one feature, the API goes to the service which goes to the repository and for another the same API class goes straight to the repository. That being said, I don't understand in what way is the login API simpler than the user API - both are plain "DTO for DTO" cases with "else return error" conditions so I see no significant factor that would lead me to taking a different approach in each case.
  4. I have some (temporary?) concerns with the UserCredentials class - it seems to be a a DTO and a value object in one. While I don't have much against DTOs implementing comparison operations for convenience, the class later seems to get more responsibilities that make it even more similar to a value object (i.e. a method boolean matches(User user)). Maybe this is a beginning of evolution that I will observe further in the next episodes?
  5. In my notes form the last episode, I noticed that Sandro established a relationship between test values where such relationship is not required. He did the exact contrary in the login API test. Here, he created input UserCredentials and did not make data relationship between it and the User object set to be returned by a mock. He only established such relationship between User and its JSON representation, using the User instance as a reference. This is exactly how I like to handle such cases. This way it is clearly shown that it's not the login API's business to say what kind of values will the User object hold - this responsibility is pushed onto the UserRepository class.
  6. Sandro has two identical methods called jsonContaining() but chooses not to remove the duplication. He gives a nice explanation that he is still not sure whether the two occurences of this method will change for the same or different reasons. Also, I can only guess that he is more careful with refactorings that come across vertical slices, but this is only my assumption. I would maybe leave a TODO here if I believed this conflict can be resolved in a foreseeable future. If I believed otherwise, I would probably eliminate this duplication because it would probably be easier for me to later copy-paste than find all the ocurrences of a piece of code.
  7. In the controller test, Sandro has the following login() method:

    String login(Request request, Response response) {
      UserCredentials credentials = credentialsFrom(request);
      Optional<User> user = userRepository.userFor(credentials);
      if(user.isPresent()) {
        response.status(OK_200);
        response.type("application/json");
        return jsonFor(user.get());
      }
      response.status(NOT_FOUND_404);
      return "Invalid credentials";
    }

    One thing that bothers be is that in the test for the case where a user is NOT present, there is no assertion on the content type of the response, which means that if I move the line response.type("application/json") above the if statement, all the tests are going to compile and pass but the response is going to be different. I assume that from Sandro's perspective, this doesn't matter as he has only one client application and that application doesn't care (or maybe acceptance tests check that?). Typically in what I do for a living, I am told to be strict about what I offer through my API and generous about what I accept. This is because where I work, the API is the product and I don't know all the client applications up-front.
  8. In the same method, a choice between response.status(OK_200) and response.status(NOT_FOUND_400) is what I call a situation of exclusive choice. In other words, this is not like an "add" method that can be called multiple times, but only one call to the status() method makes sense and it must be exclusively chosen. In such situation, I typically use Mockito's VerifyNoMoreInteractions() to make sure only one choice is made. In this case, however, there are several calls to the response object so VerifyNoMoreInteractions() would be too coarse-grained because it works for the whole mock, not for a single method. The code can be refactored (I will show how in a second) to make the use of VerifyNoMoreInteractions easier or something like this could work:

    verify(response).status(OK_200);
    verify(response, never()).status(intThat(status -> !status.equals(OK_200)));

    This could be also refactored to something like:

    verifyStatusSetExclusivelyTo(OK_200, response);

    The whole rationale for me doing it is that presence of two different calls to response.status() in one method move a potential error of calling them both at some point to the Inherently possible (can be made impossible) group from the inherently impossible (can be made possible) group.
  9. Sandro refactors the login() method to use two more methods and ternary operator (condition ? a : b) to transform the code into something like this:

    String login(Request request, Response response) {
      UserCredentials credentials = credentialsFrom(request);
      Optional<User> user = userRepository.userFor(credentials);
      return user.isPresent()
      ? prepareOkResponse(response, user)
      
    : prepareErrorResponse(response);}

    Bob and Sandro briefly discuss this choice, Sandro commenting that he likes ternary operator and that the real problem with ternary operators is when you chain them. Personally, though I have nothing against ternary operator as such I see one more issue with the use of ternary operator in this case and the refactoring that was done here. To demonstrate it on a simpler example, I find the following use of ternary operator confusing as well:

    int a = getA();
    int b = getB();
    int c = getC();
    int y = (a = b) == 3 ? c = 1 : 4;
    The part that confuses me in the code above (besides the cryptic variable names :-)) is that there are side-effects in ternary operator - two assignment operators which, in C-like languages, happen to return value as well. A ternary operator turns "if" from statement to an expression and typically when I see someone make expression out of a statement like this, it is to stress that the whole code in scope of the operator is just a composition of pure expressions or pure functions. This is where I think ternary operator is even clearer in conveying intention than the if statement. But this is not what happens in the login() method as both prepareOkResponse() and prepareErrorResponse() are impure functions with side effects, similarly to the assignment operator I presented above. For example, the prepareErrorResponse() is defined as:

    private String prepareErrorResponse(Response response) {
      response.status(NOT_FOUND_404); //side effect
      return "Invalid Credentials";
    }

    When looking at the login() method and the ternary operator, I wouldn't expect to find side effect in a function like this, so this is not my favorite solution. What would I do instead? Probably either I would not do the refactoring or (in more complex use case) I would refactor further to something like custom result observer or collecting parameter, which would on the other hand cost me at least one or two new classes (and would probably require moving the login() method logic to a service class from the controller). The code could then look like this (note that the Response type is changed to my own custom class):

    void login(Request request, MyOwnResponse response) {
      UserCredentials credentials = credentialsFrom(request);
      Optional<User> user = userRepository.userFor(credentials);
      if(user.isPresent()) {
        response.foundCredentialsMatching(user.get());
      } else {
        response.InvalidCredentials();
        }
    }

    This would have the added benefit that I could now use VerifyNoMoreInteractions on the response object as now there is only one exclusive call to this object in each case and the situation of exclusive choice becomes clearer.

    Note that in my example, the login() method is void. This would be a result of pushing the responsibility of holding the return value to the MyOwnResponse class. Someone that calls the login() method (in my case - a controller) could then do something like this:

    loginService.login(request, response);
    return response.text();
  10. Sandro claims his acceptance tests only test the sunny day scenarios because of test pyramid, so he pushes all the error conditions into the unit tests. Personally, I typically don't have as dramatic dilemma as I write bigger systems where there are three or four levels of tests. And while I would not test error conditions in the end-to-end tests, I would not leave them for unit tests alone. I would have maybe one test at the lowest test level where I still have real network and HTTP, one test for each error category in tests that test the whole component logic in isolation from external resources and leave the rest to unit tests. My experience is that errors and validation is something that sometimes can be more tricky than the sunny day logic. Moreover, I remember writing some code where error detection and reporting is the only logic there is (e.g. assertion libraries) so (even though the openchat app is not like that) I personally believe that validation and error reporting is often underrated part of the system, especially when it is a big system.
  11. An interesting observation is that, while Sandro tends to prefer vertical architecture slicing, his package structure does not seem reflect it (yet?). For example, he has domain.posts package where I would expect posts.domain with posts, the vertical feature, as the main package differentiator. That may be a temporary choice so I think I would need to watch more and see for myself.
And there it is. I suspect I will not be doing as lengthy notes onward as there will be less things I will see for the first time, but I can be wrong :-).

Thursday, 6 September 2018

Sandro Mancuso & Uncle Bob comparative design episode 1 notes

Update: Sandro commented on my notes in a twitter thread.

Notes from episode 2
Notes from episode 3/4/5
Notes from episode 6

I just finished watching episode 1 of Uncle Bob and Sandro Mancuso's London vs Chicago Comparative Design Case Study and enjoyed it. I really recommend this case study to anyone who wants to learn more about TDD.

This episode 1 was about Sandro beginning writing a twitter-like application and the first tests. Sandro claims to use the London Style TDD school, which is heavily based on mocking and outside-in approach. The school originated (AFAIK) from the Growing Object Oriented Software Guided By Tests book. Because of that and because I prefer this approach to design as well, I thought it could be interesting to write down, even if only for the "future me", the notes on the differences I think I can see between what Sandro did and what GOOS would do and what I would do. Of course, this is only based on episode 1, so some things may change during the next episodes.

Disclaimer

  1. This is a set of notes, not a full blown article, so its structure might be a bit too dense and unsolicited. I welcome discussion and requests to clarify stuff. If you didn't see the episode, there is a high chance some of what I write will not make sense to you.
  2. This post is not a critique of Sandro's style. I am not claiming that I am better in TDD than Sandro or better in this particular style of TDD. While I may sometimes sound as if I find some ideas silly, it's just for showing this stuff from the perspective of my biases and default choices (this is why sometimes I will describe my feelings, not only rational judgement). I am by no means suggesting that my choices are/would be better. I am sharing this because I think that sharing thoughts (and feelings) leads to interesting discussions. If you find me being disrespectful, please point it out so that I can correct it.
  3. While I point some of the differences I think I see between GOOS and Sandro's TDD, it's not that I do EVERYthing like in GOOS. I am not including the differences between my TDD and GOOS TDD because I figured nobody would be interested in that :-).
  4. I am not an authority on what is and is not compliant with GOOS style. I just write my subjective opinion based on my observations that may be wrong. If you see me missing the point, please point it out so that I can correct it and learn from the exercise.

Differences between Sandro's TDD and GOOS:

  1. GOOS worked example started with a single acceptance test, then some code was written (without unit tests) just to make this acceptance test pass, resulting with a walking skeleton. The next step would be to refactor the existing code based only on the acceptance test and then introduce some first abstractions and place to insert mocks for future behaviors. Sandro does two things differently: 
    1. He has all the acceptance tests written up-front (Update: this was explained by Sandro as being an input condition for the whole exercise)
    2. He drives even the very first feature with unit tests already.
  2. GOOS advises writing the acceptance tests in a domain language. For that, it recommends building a testing DSL that isolate a test from the implementation details like delivery mechanism. Sandro uses plain RestAssured library calls without (almost) any abstractions on top of it. This may be because Sandro's tests are not really end-to-end, but they are tests for HTTP contracts between the front-end and the back-end (Update: this was explained by Sandro as being an input condition for the whole exercise). Also, Sandro calls his tests "integration tests", which is a term that is used for a different kind of tests in GOOS, but that might not be an issue since, as I believe, the "integration" is more about the goal of a test, not about a level or kind.
  3. GOOS advises usage of strict mocking framework (JMock), while Sandro uses a more-loose-by-default mocking framework (Mockito). Usage of strict mocking framework by GOOS was not merely a choice dictated by the availability of tools at the time the book was written. On the GOOS mailing list, there is a comment by Steve Freeman that he considers strictness-by-default part of his style (cannot find it now though).
  4. GOOS advises mocking interfaces, while Sandro up to now has only mocked classes. As above, the choice to introduce and mock interfaces in GOOS was not dictated by the limitation of tools at that time. It was a deliberate choice to think of interfaces as roles that objects play and GOOS authors claim they are aggressive in refactoring interfaces to make them smaller and more abstract.

Now time for the differences between Sandro's TDD and my TDD:


  1. Sandro starts all his tests from a name and then an assertion. As this is one of my ways to start a failing tests, it's not the only one that I use. In my book, I describe 4 different approaches to starting with a failing test, starting from assertion being one of them.
  2. Sandro uses underscores in test names and lower camel case in production code method names. It's just a matter of preference, but I find that confusing - I find it better to stick to a single convention, which has the added value of simplifying the configuration of static analysis tools that enforce naming convention and takes one dilemma away. Still, this is not a big deal, just a matter of style.
  3. Sandro tends to prefer realistic test data even where this is not needed or required. In the first test, where he passes data for user registration, he chooses a realistic name, (semi)realistic password and the ID is generated as UUID and converted to string. This is not needed at all, because the object under test does not handle any validation and, as we find out later, the only class that knows that the user ID is actually a UUID is the IdGenerator - in the rest of the cases, any string would do. Personally I use dummy data wherever validating this data is not the responsibility of the object under test because I find that by doing it, I describe the contract of my class with its collaborators more precisely. Whenever I see a specific/realistic data, I interpret it as if it was required by the tested object. Also, I am confused about one more thing - if I use another ID generator or change the implementation of existing one, should I change the "realistic data" in the tests where it doesn't matter?
  4. Also, I wouldn't establish relationship between test data fragments where it is clearly not required. For example, Sandro creates a registration object with "name", "password" and "about" fields and sets up a mock of a service object to return a User object with the same three values. Personally, I don't see why. The responsibility for establishing relationship between input and output values is clearly pushed to another class that is mocked (a service class), so in my mind, this is leaking responsibilities of another class to tests of class that clearly does not care. And as I understand GOOS compositional approach to design and its principle of context independence, a class and its tests should not assume in what context the class is used.
  5. Sandro moves parts of the code to the setup method and parts of the data to fields of class with tests, arguing that this is cleaner. I wrote a blog post long ago explaining why I don't like this idea and why I find such tests harder to read and maintain with the "Testcase class per class" organization approach.. Looking at the video, the setup method is called "initialise", which I find a meaningless name and usually read it as "bag for everything I don't want to look at". Thus I consider it a smell and I also feel it takes away a bit of the pressure from tests to improve the design. When I see some logic that I strongly feel could be moved to a setup method and fields, this leads me to searching for a way to improve the design. I find setup methods hiding these issues for me. I understand that most probably, Sandro has a way of compensating for all the issues I described and maybe even making them irrelevant.
  6. The method of UserAPI class that Sandro test-drives clearly violates the principle of command-query separation (it is a command that returns a value as well). This is often what frameworks require us to do, unfortunately. Sandro chooses to push this violation further into the application/domain layer (specifically - the UserService class). Personally, I tend to cut away from this kind of violation on controller level using a Collecting Parameter pattern and then I only deal with commands, which I find makes the rest of my tests easier.
  7. In the controller test, Sandro includes parsing in the scope. I would probably push the parsing responsibility to a collaborator. I can see that Sandro can succeed with his approach with a simple, non-nested JSON with three fields, but I wonder if his approach would be the same if it was a bigger structure with several nesting levels, optional sections etc.

Sunday, 2 July 2017

What's happening with my book and future plans

Hi, I haven't posted in a while, so a quick update:

  • I am currently focusing on my book, that's why the updates on this blog are extremely rare.
  • I have some nice topics to write blog posts about, but I'm putting them on hold for a bit. The steady increase pace of my book's content (it's already over 250 pages A4!) should make it up for you.
  • Speaking about book's content, I recently finished an overhaul of part 1, which took me a long time and lots of effort.
  • I'm currently fixing some bugs submitted on the book's github page for some portions of part 2
  • When I finish this, I plan on continuing part 2, review the value objects chapters and probably move TDD with mocks to part 3.
  • After that, I want to write something about hexagonal architecture (which will probably be part 4) and TDDing on a level of a whole component, using mentioned hexagonal approach (maybe part 5?).

There is still a lot of work ahead of me, so thank you all for your support and I hope you like the new book content. I not, well, you can still contact me with your thoughts and opinions!

Saturday, 13 February 2016

Is test isolation always a good thing?

TDD is not only about unit tests. Steve Freeman and Nat Pryce argue in their book that a TDD cycle is kickstarted using an end-to-end automated test. These tests exercise a system as a black box. This often implies difficulties and issues that well written unit tests don’t have. On of such fields of difficulty is test fixture management, especially fixture teardown.

Gerard Meszaros in his book titled XUnit test patterns provides an extensive discussion of different fixture management strategies and their implications. Using Meszaros’ terminology, what I have seen many end-to-end tests using is a combination of:

  • Persistent shared immutable fixture - this is the data that’s shared by many tests - usually some initial environment setup and plumbing, uploading licenses and so on. The “immutable” part of the fixture name means that no test is allowed to modify any data that’s part of this fixture. The reason is that each test relies on this data being intact. If any test modifies the fixture, then other tests’ assumptions are broken and we cannot expect them to work reliably.
  • Persistent fresh fixture - this is the data that’s needed for a single test. This may be, for example, adding a specific employee required by the test scenario to a database. As database is a persistent storage, this data must be cleaned up either at the end of the same test that created it or at the beginning of any new test (there are some ways to achieve that safely, but that’s a topic for another post). The cleanup is needed to avoid test interactions - many tests may add an employee with the same ID number, but with other data being different. Thus we need to prepare a clean bed for each test so that leftover data from previous tests doesn’t get in the way (by the way, there are some other ways of dealing with test interactions than cleaning up - although they’re quite difficult to apply).
  • Transient fresh fixture - this part of data is created inside a test and removed without our intervention when the test finishes running, usually by garbage collector or other memory management mechanism. The simplest example of such fixture would be an integer variable defined in a test method - we don’t have to worry about it impacting the other tests, because its lifetime is automatically associated with the lifetime of a test.

After reading XUnit test patterns, my conclusion was obvious - life would be beautiful if we could only use transient fresh fixture. When writing unit tests for classes or small clusters of classes, this kind of fixture is the only one we ever need to use. Things change, on the other hand, when e.g. our tests require installation of the application and a database - in such situation, using only transient fixture would probably mean that each test should perform a fresh installation of an operating system. This would take ages, so we use the other fixture types as a compromise. We accept the problems they bring and accept the fact that they can introduce interactions between tests (and thus, instability) and trade these things for shorter test execution time.

I met with an opinion that this is not actually bad. What’s more, the opinion states, this can be seen as an advantage, because when a single instance of an application is exercised by many different tests, it resembles closer the way the user exercises the system when it’s deployed and, by working longer with an existing instance of a system, it may uncover some defects that would otherwise not become visible.

I have several issues with this point of view on fixture management. In order to explain why is that, let’s recap the differences between how user exercises a system and how automated end-to-end tests do it.

What are the differences between usage by automated tests and by user?

There are three pretty big differences that I can think of:

An automated test has a scope, i.e. it has a beginning

When defining a test, we usually define the context in which a test should be executed, a series of steps and an expected outcome. For example, we might say:

Given there are two employees: Ken and Tom //the context
When I choose to remove Ken's data         //step 1
And list all my employees                  //step 2
Then I should only see Tom's data          //expected outcome

Here, the explicitly stated context is that we have two employees - if there is only one, executing the two steps would most probably create different results. Of course, there is also an implicit context that’s not written here, for example that the application is installed and turned on and probably that I’m logged into the application. The test relies on both kinds of context to be the same for every execution. In other words, in order to bring stable outcomes, a test requires a context to be defined. Users usually work in the context they are currently given and are able to adapt to it. For example, I want to create an e-mail account with an address "buziaczek@mymail.com", but this address already exists. No problem, I’ll just create "buziaczek-1984@mymail.com" and I’m ready to go. Most of the time, an automated test doesn’t make such decisions (although we can do something to bypass such situations, e.g. generating a portion of the e-mail address randomly, but note that it’s still not making a decision).

Thus, although it may happen for a user to follow a strict sequence of steps in a strictly defined context, this is not what their work with the application is about. If it was, we could probably automate these sequences in the software itself.

Anyway, a demand for strictly defined context for each automated test means that this context would usually have to be cleaned after or before each test (Meszaros writes about other ways, like generating distinct test data for each test, but I didn’t see them in practice all that often). This is somehow contradictory to wanting to achieve testing “as the user would” by executing tests in partially the same context (i.e. on the same running or installed application instance). The contradiction for me is that we seem to value test isolation, except that we don’t.

An automated test has a scope, i.e. it has an end

A user uses a product in a continuous manner, usually taking many, many steps during each session with an application. On the other hand, when reading on good practices for automated testing, one can spot that such tutorials hardly ever advise writing long, automated scenarios. It is more common to see several test cases for several different behaviors or variants of a single behavior rather than a single test case exercising all of these conditions. So, it looks like we deliberately try to run away from testing the system the way a user would in automated tests. Why is that?

This is for several reasons. One of them is that most of the time, a failed assertion ends an automated test, so the longer the test, the harder it is to get feedback from the later parts of that test, because there are more things that can fail along the way. Two other reasons that I’d like to talk about are: splitting work into manageable chunks and defect localization.

Splitting into manageable chunks means that we prefer shorter scenarios to longer ones because it is easier to maintain and modify each scenario independently when it’s smaller. This is because building up context becomes increasingly harder to grasp, understand and manage. Users, on the other hand, don’t need to manage scenarios - they dynamically choose their goals and steps based on the current situation.

Defect localization means that when a test fails, I need to search for the failure root cause only among what’s in the scope of this test. When a scope of a test is smaller, I have fewer places to search for such root cause. This is also one of the reasons why we set descriptive names for tests, i.e. we’d rather say “the user should be able to delete an employee” than “the user should be able to play with the system” - because in the first case, a failure points us closer to the root cause. The usual thing I do when I see an automated test failing is to try and run it in isolation (i.e. not as part of a suite) and in clean environment. This is because it gives me a better feedback on whether the failure is caused by something that’s in the scope. So, the broader the scope of a test, the lower defect localization.

In his wonderful book, Lean-Agile Acceptance Test-Driven Development, Ken Pugh outlines three conditions we strive to achieve for automated tests:

  1. A test should fail for a well-defined reason.
  2. No other test should fail for the same reason.
  3. A test should not fail for any other reason.

And, while Ken admits this is often very hard (or impossible) to achieve, this is the ideal we should be targeting. Managing a persistent fixture introduces more complexity and possible root causes of test failures, driving us away from these three conditions.

An automated test has a scope, i.e. it has a purpose

As I said above, a test should fail for a well-defined reason. This well-defined reason for failure is a purpose of the test. In other words, every test failure should relate to its purpose. This purpose is usually conveyed in test name, for example “An added employee should appear on an employee report”. Note that very rarely do we see tests like “previous tests runs should not have led the application into an illegal or crashing state”. This is because the tests aren’t usually written to check whether an application crashes if we do some random things with it. Sharing parts of the fixture between tests doesn’t help them achieving their purpose.

I also said that it is probably impossible to get rid of all the things that are beyond the purpose of a test but still can impact the outcome of the test. For example, if a test loads anything from a disk, it can fail because of (unlikely but possible) disk drivers failure. These things are outside the scope of a test, but still influence its result.

The point is, the more things exist outside the scope of an automated test that can affect its outcome, the more the test becomes incompetent at testing what’s inside its scope. Do we want the tests to be incompetent in fulfilling their purpose? I hope not. Thus, my preference is to maximize the competency of a test by reducing to a minimum other things that can impact its outcome. This is how I believe I get the most value out of automated tests. This is in line with Meszaros’ write-up on the goals of test automation, where he stresses the ease of maintainability as one of such goals.

Automated tests teardown is often very error-prone and adds a lot of instability to the test suite, thus making it less competent in doing the thing it written for. For some time, For test execution time-related reasons, I’m afraid we’ll have to live with sharing some parts of persistent fixtures, however, If I could get rid of the necessity of teardowns in tests without sacrificing their run time, I’d do so in a blink of an eye.

Summary

My opinion on test isolation is that the more the better. For me, saying that partial lack of isolation is better (because it would uncover some defects only a user otherwise could) is really taking the biggest disadvantage of automated black-box tests and pretending it’s an advantage. It is relying on undeliberate potential effect of this disadvantage. Creating a test approach targeted towards uncovering such defects (i.e. relying on deliberate effort to uncover them instead of hoping they will come up from other tests) would in my opinion give a better return of investment.