Is it always a good idea to split long methods into smaller ones? An experiment.

There's a long discussion on Reddit named Is it OK to split long functions and methods into smaller ones even though they won't be called by anything else?. Some people in that discussion hold an opinion that it's not always the best idea to split a long method into smaller methods. Well, of course, you can refactor code in many different ways, including ways that read worse than the original long method. But I think that in most cases it makes for better readability to split long methods into smaller ones.

So, I want to do an experiment and try to split a long method that's difficult to split into smaller methods. I know that it's easy to extract methods when the code uses several levels of abstraction or a low level of abstraction. In such a case it's easy to achieve better readability by extracting some methods. So, the hardest long method I can think of would use a high level of abstraction, from which it wouldn't be easy to go to a level higher.

Such a method can be found in Rails' ActiveRecord::Persistence module. It's the #touch method, and it deals with timestamp attributes that need to be updated with current time, scopes, primary keys, locks and SQL UPDATE. In other words, the level of abstraction is high enough and it's the level that deals with database-level concepts. I expect it won't be that easy to go to a higher level of abstraction here.

Introducing the #touch method

At 42 lines, #touch is quite long and requires some explaining. Arguments #touch accepts are names (meaning timestamp attributes to update) and time (the time to set timestamps to). Then we decline to work on non-persisted records (lines 2-7). Then we set up the default time value to current time and merge standard timestamp attributes and passed timestamp attributes (names) into attributes array (lines 9-11).

Then goes the most interesting part, when there are some attributes to update (i.e. attributes isn't empty). We instantiate changes hash to pass to #update_all later, and fill it up with attribute keys and time values, and set timestamp attributes on the record to time (lines 14-19). Then we setup a scope we'll be using to match this very record that we #touch, using primary key (lines 21-22). Then we deal with the case when locking is on, updating both record, changes hash and the scope (lines 24-28).

And then clear #changed, so that after we've updated/touched the record, the stuff we've updated in db is no longer marked as changed (line 30). Then we run #update_all. If we've failed to find the record, because of locking issues, we raise a StaleObjectError (lines 33-35). And at last, we setup a special flag telling Rails whether we've actually changed data in the db.

Take 1: an easy way to split

Now, an easy way to split a method is to see how it's structured, and split at the seams. I'd expect a junior developer to do just that. And that's what I did:

As you can see, I just replaced more verbose code with explanations. The same level of abstractions is used. The main problem here is whether it's still clear what's going on. I've shown this code to a person and he says it's still clear.

However, this is really my 3rd take on refactoring #touch. The first two refactoring were pretty bad. It took me awhile to understand all the intricacies of this method.

Fed up working on bad code? Here's a way out!

For people that that want to stop suffering from bad code I’ve made a FREE course

Hidden concepts

There are hidden concepts in the original code that the reader has to figure out by themselves. I describe them beneath the code (no need to read the code below without explanations).

So, the hidden concepts are:

  1. changes and record attributes are updated with the same information at the same time (lines 4-7 and 15). It follows that the record attributes and changes hold the same information afterwards.
  2. scope is used to match the record #touch is called on (lines 9-10 and 14).
  3. Scope extension to include locking information has to happen before #increment_lock is called.

Also, there is understanding of why we need #clear_attribute_changes (on the line 18) after we're done with adding locking information to changes. In fact, that statement used to be placed before the code that deals with locking, and was fixed later. The bug was that locking column was updated, but not cleared off #changed. Of course, now the tests prevent from regressions, but it'd be great to understand all the intricacies easily.

I believe that if we could put all the code that deals with updating record attributes into one place, and place #clear_attribute_changes afterwards, it'll be a bit clearer. This brings us to another refactoring take:

Take 2: split by activity

In this take I've clearly separated scope building and updating of record attributes and changes hash:

I think the line 2 is as good as it gets, but the line 5 could be named better. The level of abstraction used is essentially the same as the original code, dealing with records and scopes.

As for revealing of the hidden concepts, we have:

  1. Clear indication that the record attributes and changes are updated with the same data (see the line 5).
  2. It's clear that scope is meant to match the record #touch is called on.
  3. It's not clear why scope should be prepared before changing record, at least not in a prominent way. I considered naming scope preparing method #prepare_scope_to_match_this_unchanged_record, but _unchanged_ would just add cognitive overhead. It's not every day that you're moving things around. And, match this record hints that values used for that should be such that the record can be found, i.e. unchanged.

Take 3: split by the domain concepts

The first two takes haven't really deviated from the existing code structure, the first take especially. But there's another way to look at it, from the domain perspective. You could say it's thinking out of the box.

What does the code tell us about domain? Do we see domain concepts manifesting in the code?

If I look at #touch with that in mind, I can't help noticing that domain logic for #touch should be:

  1. Updating the record attributes with new time. Standard timestamp attributes and passed timestamp attributes are updated.
  2. Saving those attributes in db.

That's the level of abstraction that should be used, when looking from the domain context perspective:

So, here we update the standard and passed timestamp attributes on the record and tell #touch_columns to save them in db.

I like this take the most. There's no word about locking as it doesn't really belong in the domain logic, and we don't know how the attributes are going to get saved, all we care about is that they get saved.

The name #touch_columns is not very good. In Rails, there is #update_attributes that #saves the record, and there is #update_columns that just executes SQL and doesn't call any callbacks. #touch_columns is in-between, skipping validations, but calling callbacks. I just don't know Rails enough to come up with a better name. But otherwise, it's a good take.

Conclusion

I've selected code that I considered to be hard to split into smaller methods. It had high level of abstraction, which increased the difficulty level.

So, Is it always a good idea to split a long method into smaller ones? The conclusion I've made is that it depends on your refactoring skills and tenacity. I could have stopped after my first two attempts (take 1 is actually my 3rd attempt) and declared it impossible to split #touch, because the first two attempts were bad.

All three takes described in this post are fitting as a replacement of the original code. Take 1 is better than the original code because it allows to understand what's going on faster, and provides the same level of understanding of the implementation. Take 2 is better than the original code because it highlights hidden concepts that the reader would have to take more time to get otherwise. And finally, take 3 is better than the original code because it uses domain-level concepts, and that makes #touch much easier to reason about.

I would also like to note that doing just extract methods as the original question seems to imply, is very limiting. Refactoring is much more than just extracting methods.

Hire me to help your team ship features faster.

Fed up working on bad code? Here's a way out!

For people that that want to stop suffering from bad code I’ve made a FREE course

Web Analytics