There's a long discussion on Reddit named Is it OK to split long functions and methods into smaller ones even though they won't be called by anything else?. Some people in that discussion hold an opinion that it's not always the best idea to split a long method into smaller methods. Well, of course, you can refactor code in many different ways, including ways that read worse than the original long method. But I think that in most cases it makes for better readability to split long methods into smaller ones.
So, I want to do an experiment and try to split a long method that's difficult to split into smaller methods. I know that it's easy to extract methods when the code uses several levels of abstraction or a low level of abstraction. In such a case it's easy to achieve better readability by extracting some methods. So, the hardest long method I can think of would use a high level of abstraction, from which it wouldn't be easy to go to a level higher.
Such a method can be found in Rails' ActiveRecord::Persistence
module. It's the #touch
method, and it deals with timestamp attributes that need to be updated with current time, scopes, primary keys, locks and SQL UPDATE. In other words, the level of abstraction is high enough and it's the level that deals with database-level concepts. I expect it won't be that easy to go to a higher level of abstraction here.
Introducing the #touch method
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
def touch(*names, time: nil) unless persisted? raise ActiveRecordError, <<-MSG.squish cannot touch on a new or destroyed record object. Consider using persisted?, new_record?, or destroyed? before touching MSG end time ||= current_time_from_proper_timezone attributes = timestamp_attributes_for_update_in_model attributes.concat(names) unless attributes.empty? changes = {} attributes.each do |column| column = column.to_s changes[column] = write_attribute(column, time) end primary_key = self.class.primary_key scope = self.class.unscoped.where(primary_key => _read_attribute(primary_key)) if locking_enabled? locking_column = self.class.locking_column scope = scope.where(locking_column => _read_attribute(locking_column)) changes[locking_column] = increment_lock end clear_attribute_changes(changes.keys) result = scope.update_all(changes) == 1 if !result && locking_enabled? raise ActiveRecord::StaleObjectError.new(self, "touch") end @_trigger_update_callback = result result else true end end |
At 42 lines, #touch
is quite long and requires some explaining. Arguments #touch
accepts are names
(meaning timestamp attributes to update) and time
(the time to set timestamps to). Then we decline to work on non-persisted records (lines 2-7). Then we set up the default time
value to current time and merge standard timestamp attributes and passed timestamp attributes (names
) into attributes
array (lines 9-11).
Then goes the most interesting part, when there are some attributes
to update (i.e. attributes
isn't empty). We instantiate changes
hash to pass to #update_all
later, and fill it up with attribute
keys and time
values, and set timestamp attributes on the record to time
(lines 14-19). Then we setup a scope we'll be using to match this very record that we #touch
, using primary key (lines 21-22). Then we deal with the case when locking is on, updating both record, changes
hash and the scope (lines 24-28).
And then clear #changed
, so that after we've updated/touched the record, the stuff we've updated in db is no longer marked as changed (line 30). Then we run #update_all
. If we've failed to find the record, because of locking issues, we raise a StaleObjectError
(lines 33-35). And at last, we setup a special flag telling Rails whether we've actually changed data in the db.
Take 1: an easy way to split
Now, an easy way to split a method is to see how it's structured, and split at the seams. I'd expect a junior developer to do just that. And that's what I did:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
unless attributes.empty? changes = {} update_record_and_changes_with_time(attributes, time, changes) scope = scope_by_primary_key if locking_enabled? scope = extend_scope_to_match_locking_column_value(scope) update_record_and_changes_with_lock_increment(changes) end clear_attribute_changes(changes.keys) result = scope.update_all(changes) == 1 if !result && locking_enabled? raise ActiveRecord::StaleObjectError.new(self, "touch") end @_trigger_update_callback = result result else true end |
As you can see, I just replaced more verbose code with explanations. The same level of abstractions is used. The main problem here is whether it's still clear what's going on. I've shown this code to a person and he says it's still clear.
However, this is really my 3rd take on refactoring #touch
. The first two refactoring were pretty bad. It took me awhile to understand all the intricacies of this method.
Hidden concepts
There are hidden concepts in the original code that the reader has to figure out by themselves. I describe them beneath the code (no need to read the code below without explanations).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
unless attributes.empty? changes = {} attributes.each do |column| column = column.to_s changes[column] = write_attribute(column, time) end primary_key = self.class.primary_key scope = self.class.unscoped.where(primary_key => _read_attribute(primary_key)) if locking_enabled? locking_column = self.class.locking_column scope = scope.where(locking_column => _read_attribute(locking_column)) changes[locking_column] = increment_lock end clear_attribute_changes(changes.keys) result = scope.update_all(changes) == 1 if !result && locking_enabled? raise ActiveRecord::StaleObjectError.new(self, "touch") end @_trigger_update_callback = result result else true end |
So, the hidden concepts are:
changes
and record attributes are updated with the same information at the same time (lines 4-7 and 15). It follows that the record attributes andchanges
hold the same information afterwards.scope
is used to match the record#touch
is called on (lines 9-10 and 14).- Scope extension to include locking information has to happen before
#increment_lock
is called.
Also, there is understanding of why we need #clear_attribute_changes
(on the line 18) after we're done with adding locking information to changes
. In fact, that statement used to be placed before the code that deals with locking, and was fixed later. The bug was that locking column was updated, but not cleared off #changed
. Of course, now the tests prevent from regressions, but it'd be great to understand all the intricacies easily.
I believe that if we could put all the code that deals with updating record attributes into one place, and place #clear_attribute_changes
afterwards, it'll be a bit clearer. This brings us to another refactoring take:
Take 2: split by activity
In this take I've clearly separated scope
building and updating of record attributes and changes
hash:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
unless attributes.empty? scope = prepare_scope_to_match_this_record changes = {} update_record_and_changes_with_same_data(attributes, time, changes) result = scope.update_all(changes) == 1 if !result && locking_enabled? raise ActiveRecord::StaleObjectError.new(self, "touch") end @_trigger_update_callback = result result else true end private def update_record_and_changes_with_same_data(attributes, time, changes) attributes.each do |column| column = column.to_s changes[column] = write_attribute(column, time) end if locking_enabled? changes[self.class.locking_column] = increment_lock end clear_attribute_changes(changes.keys) end |
I think the line 2 is as good as it gets, but the line 5 could be named better. The level of abstraction used is essentially the same as the original code, dealing with records and scopes.
As for revealing of the hidden concepts, we have:
- Clear indication that the record attributes and
changes
are updated with the same data (see the line 5). - It's clear that
scope
is meant to match the record#touch
is called on. - It's not clear why
scope
should be prepared before changing record, at least not in a prominent way. I considered naming scope preparing method#prepare_scope_to_match_this_unchanged_record
, but_unchanged_
would just add cognitive overhead. It's not every day that you're moving things around. And, match this record hints that values used for that should be such that the record can be found, i.e. unchanged.
Take 3: split by the domain concepts
The first two takes haven't really deviated from the existing code structure, the first take especially. But there's another way to look at it, from the domain perspective. You could say it's thinking out of the box.
What does the code tell us about domain? Do we see domain concepts manifesting in the code?
If I look at #touch
with that in mind, I can't help noticing that domain logic for #touch
should be:
- Updating the record attributes with new
time
. Standard timestamp attributes and passed timestamp attributes are updated. - Saving those attributes in db.
That's the level of abstraction that should be used, when looking from the domain context perspective:
1 2 3 4 5 6 7 8 9 10 11 |
unless attributes.empty? attributes.each do |column| column = column.to_s write_attribute(column, time) end touch_columns(*attributes) else true end |
So, here we update the standard and passed timestamp attributes on the record and tell #touch_columns
to save them in db.
I like this take the most. There's no word about locking as it doesn't really belong in the domain logic, and we don't know how the attributes are going to get saved, all we care about is that they get saved.
The name #touch_columns
is not very good. In Rails, there is #update_attributes
that #save
s the record, and there is #update_columns
that just executes SQL and doesn't call any callbacks. #touch_columns
is in-between, skipping validations, but calling callbacks. I just don't know Rails enough to come up with a better name. But otherwise, it's a good take.
Conclusion
I've selected code that I considered to be hard to split into smaller methods. It had high level of abstraction, which increased the difficulty level.
So, Is it always a good idea to split a long method into smaller ones? The conclusion I've made is that it depends on your refactoring skills and tenacity. I could have stopped after my first two attempts (take 1 is actually my 3rd attempt) and declared it impossible to split #touch
, because the first two attempts were bad.
All three takes described in this post are fitting as a replacement of the original code. Take 1 is better than the original code because it allows to understand what's going on faster, and provides the same level of understanding of the implementation. Take 2 is better than the original code because it highlights hidden concepts that the reader would have to take more time to get otherwise. And finally, take 3 is better than the original code because it uses domain-level concepts, and that makes #touch
much easier to reason about.
I would also like to note that doing just extract methods as the original question seems to imply, is very limiting. Refactoring is much more than just extracting methods.