metaprogramming - Ruby clarity

acts_as_list refactoring part 3

Dmitry — Fri, 02 Jun 2017 01:07:13 +0000

I refactor acts_as_list Ruby gem again: watch as I choose better names, strip unnecessary variables, work with some ActiveRecord internals and make code intent clearer. In this refactoring adventure I'm going to focus on just one 11-line method, and surprisingly, there's a lot of things that can be improved in just one method.

You don't need to read part 2 and part 1 to understand this article.

acts_as_list is a Rails gem. It allows you to treat Rails model records as part of an ordered list and offers methods like #move_to_bottom and #move_higher.

Step 1: a hairy method using #send

.update_all_with_touch method caught my attention as it's a somewhat long (11 lines) and hairy method. This method executes passed SQL, as Rails' #update_all does, but also updates standard timestamps like updated_at.

define_singleton_method :update_all_with_touch do |updates|                
  record = new                                                             
  attrs = record.send(:timestamp_attributes_for_update_in_model)           
  now = record.send(:current_time_from_proper_timezone)                    

  attrs.each do |attr|                                                     
    updates << ", #{connection.quote_column_name(attr)} = #{connection.quote(connection.quoted_date(now))}"
  end                                                                      

  update_all(updates)                                                      
end

Let's have a look. On the line 2 (see above ↑) it creates a new model instance (acts_as_list is supposed to extend ActiveRecord models, so naturally new would create one). Then it sends two messages to the created model instance record via #send. The reason it uses #send is because both those methods are private, so it can't just say record.timestamp_attributes_for_update_in_model. Now, this is some cognitive load, because every time I read these lines I can't help think of why #send has to be used. But I'll get to it later, let's look at the rest of the method now.

One the lines 6-10 (see above ↑), SQL is built and appended to updates argument, modifying it. Each of the timestamp_attributes_for_update_in_model is updated with current time. And after the SQL was built, it's executed with Rails' standard #update_all.

So, this method does two things - build SQL and execute it. And most of the method is taken up by building SQL.

Is anything wrong with this method? For my taste, it's too hairy, and it goes into too much detail about details of building SQL. So, the first thing I want to do is to go to a higher level of abstraction on building SQL:

define_singleton_method :update_all_with_touch do |updates|
  update_all(updates << touch_record_sql)
end

private

define_singleton_method :touch_record_sql do
  record = new
  attrs = record.send(:timestamp_attributes_for_update_in_model)
  now = record.send(:current_time_from_proper_timezone)

  updates = ""
  attrs.each do |attr|
    updates << ", #{connection.quote_column_name(attr)} = #{connection.quote(connection.quoted_date(now))}"
  end

  updates
end

The result on the line 2 (see above ↑) allows us to grasp what's going on much faster. updates is being appended with SQL of touch_record_sql. It took me more than two pomodoro to figure out a decent name for the method. I've tried many, including update_standard_timestamps_to_current_time_sql (if only it wasn't that long). I prefer touch_record_sql because it uses a well known term touch, which is inherited from Unix touch(1) command and Rails' #touch. Touch means update appropriate timestamps.

Step 1.1: a misnomer

I've copied the code from above for easier reference:

define_singleton_method :update_all_with_touch do |updates|
  update_all(updates << touch_record_sql)
end

private

define_singleton_method :touch_record_sql do
  record = new
  attrs = record.send(:timestamp_attributes_for_update_in_model)
  now = record.send(:current_time_from_proper_timezone)

  updates = ""
  attrs.each do |attr|
    updates << ", #{connection.quote_column_name(attr)} = #{connection.quote(connection.quoted_date(now))}"
  end

  updates
end

On the lines 12-17 (see above ↑), updates variable we inherited from .update_all_with_touch method, doesn't explain what's going on well enough. It may mean updates we want to do to db records, but it's far from being obvious. It's not a bad name, but I prefer sql, to be in tune with the method's name touch_record_sql:

define_singleton_method :touch_record_sql do
  record = new
  attrs = record.send(:timestamp_attributes_for_update_in_model)
  now = record.send(:current_time_from_proper_timezone)

  sql = ""
  attrs.each do |attr|
    sql << ", #{connection.quote_column_name(attr)} = #{connection.quote(connection.quoted_date(now))}"
  end

  sql
end

Step 1.2: #each that collects data into a variable

One the lines 6-11 (see above ↑) #each loops over timestamp attributes and collects SQL fragments into sql variable. It's a typical misuse of #each and it could be replaced with #map(...).join(", ") if we didn't need the leading ,. In this case, #each can be replaced with #inject:

define_singleton_method :touch_record_sql do
  record = new
  attrs = record.send(:timestamp_attributes_for_update_in_model)
  now = record.send(:current_time_from_proper_timezone)

  attrs.inject("") do |sql, attr|
    sql << ", #{connection.quote_column_name(attr)} = #{connection.quote(connection.quoted_date(now))}"
  end
end

Step 1.3: using #send to execute private methods

As I mentioned previously, #send is used here to run private methods on a model instance (see the lines 3-4 above ↑). And, it incurs cognitive load, because you have to wonder why #send is used here. So, I chose to move this code to an instance method:

define_singleton_method :touch_record_sql do
  new.touch_record_sql
end

...

define_method :touch_record_sql do
  connection = self.class.connection
  attrs = timestamp_attributes_for_update_in_model
  now = current_time_from_proper_timezone

  attrs.inject("") do |sql, attr|
    sql << ", #{connection.quote_column_name(attr)} = #{connection.quote(connection.quoted_date(now))}"
  end
end

The line 2 (see above ↑) raises a question though: Why do we need to create model instance to build SQL?. Oh well, it's not perfect.

On the line 8 (see above ↑) I had to add connection variable, so even though record = new is no longer needed, we've not reduced the number of lines. But #send is gone! (see the lines 9-10 above ↑).

Step 1.4: a redundant variable

On the line 9 (see above ↑) there's a variable attrs that is used on the line 12 only. And, since we no longer have #send, i.e. the right side of the variable assignment isn't hairy, we can just do away with it. After all, what new does attrs tells us that timestamp_attributes_for_update_in_model does not?

define_method :touch_record_sql do
  connection = self.class.connection
  now = current_time_from_proper_timezone

  timestamp_attributes_for_update_in_model.inject("") do |sql, attr|
    sql << ", #{connection.quote_column_name(attr)} = #{connection.quote(connection.quoted_date(now))}"
  end
end

Step 1.5: another redundant variable

On the line 3 (see above ↑), there's now variable that is only used in one place, at the line 6. It could be said that there's a performance benefit to keeping current_time_from_proper_timezone call out of the #inject loop, but it also could be said that it's premature optimisation.

It looks like a stalemate, but thankfully, there's another angle we can use here - readability. From readability perspective, having now out of the loop clarifies that the now value doesn't depend on the loop.

But on the line 6 there's also some code that doesn't depend on the loop - #{connection.quote(connection.quoted_date(now))}, and it's hard to reason about two parts of the value that doesn't depend on the loop. So, I'm still going to inline now, and see how it goes:

define_method :touch_record_sql do
  connection = self.class.connection

  timestamp_attributes_for_update_in_model.inject("") do |sql, attr|
    sql << ", #{connection.quote_column_name(attr)} = #{connection.quote(connection.quoted_date(current_time_from_proper_timezone))}"
  end
end

Step 1.6: expression that doesn't depend on loop

Have to say, I like less lines here, but the right value in the SQL assignment doesn't depend on loop values, so it should be moved out of the loop for clarity:

define_method :touch_record_sql do
  connection = self.class.connection
  quoted_now = connection.quote(connection.quoted_date(
    current_time_from_proper_timezone))

  timestamp_attributes_for_update_in_model.inject("") do |sql, attr|
    sql << ", #{connection.quote_column_name(attr)} = #{quoted_now}"
  end
end

Step 1.7: a hairy assignment

At this point, when looking at the lines 3-4 (see above ↑) I'm asking myself why not extract quoted_now into a method. I did just that:

define_method :touch_record_sql do
  connection = self.class.connection
  quoted_now = quoted_current_time_from_proper_timezone

  timestamp_attributes_for_update_in_model.inject("") do |sql, attr|
    sql << ", #{connection.quote_column_name(attr)} = #{quoted_now}"
  end
end

private

def quoted_current_time_from_proper_timezone
  self.class.connection.quote(self.class.connection.quoted_date(
    current_time_from_proper_timezone))
end

Step 1.8: an unclear moment

On the lines 3-7 (see above ↑) we can see that quoted_now is used only on the line 6, and a question arises Why not inline it and live happily ever after?. We already discussed that quoted_now must be the same for all its uses within SQL, but I've failed to encode this knowledge into words. So, I'm going to use a variable name that clearly explains that - cached_quoted_now:

define_method :touch_record_sql do
  connection = self.class.connection
  cached_quoted_now = quoted_current_time_from_proper_timezone

  timestamp_attributes_for_update_in_model.inject("") do |sql, attr|
    sql << ", #{connection.quote_column_name(attr)} = #{cached_quoted_now}"
  end
end

I quite like the result. Have to say, I first got the idea of using a different name for quoted_now when I imagined that the extracted method would be called #quoted_now, so I had to invent a new name for the variable as quoted_now = self.quoted_now would suck.

Step 1.9: another redundant variable

On the line 2 (see above ↑) you can see a remnant of the beginning of this refactoring, connection variable. Now that it's used only once on the line 6, it can be inlined. So, typing self.class.connection sucks, so why not have #connection method? I'd say that it goes agains the Single Responsibility Principle, but convenience trumps it in this case!

define_method :touch_record_sql do
  cached_quoted_now = quoted_current_time_from_proper_timezone

  timestamp_attributes_for_update_in_model.inject("") do |sql, attr|
    sql << ", #{connection.quote_column_name(attr)} = #{cached_quoted_now}"
  end
end

private

delegate :connection, to: self

Not that bad.

Step 1.10: #inject isn't the best way

I got a suggestion from reader Alex Piechowski that #map and #join can be used here, instead of #inject:

define_method :touch_record_sql do
  cached_quoted_now = quoted_current_time_from_proper_timezone

  timestamp_attributes_for_update_in_model.map do |attr|
    ", #{connection.quote_column_name(attr)} = #{cached_quoted_now}"
  end.join
end

Thank you, Alex!

Afterword

I planned on doing more stuff in this refactoring, but it's quite a lot as it is. And most impressive to me, it all came from refactoring a single method. Just how much can you draw from a single method? Turns out, quite a lot.

Happy hacking!

P.S. my PR was accepted by acts_as_list project!

The post acts_as_list refactoring part 3 first appeared on Ruby clarity.

CreateSend refactoring part 2

Dmitry — Tue, 24 May 2016 01:02:06 +0000

Today I'm continuing to refactor a library called CreateSend. The first part is here.

In part 1 I've finished all class methods of Base classs and now I'm going to refactor instance methods.

Step 1: initialize method

initialize (see above ↑) chooses not to use named arguments, and treats method arguments as an array of many arguments. And yet, it only processes one argument from the whole args array.

It's misleading to accept arguments and throw them away. It will require a look into source code to find out why some passed arguments caused no change in behaviour. On the other hand, if we state that initialize only accepts one argument, Ruby will complain if we pass any other parameters. Easier to debug. Thus:

This change required changing subclasses of Base, and they all pass in auth argument.

Step 2: auth method

auth is probably short for authenticate, and if so, it's misleading. No authentication is happening here, just an assignment. It can be replaced with an attr_accessor. It could be even replaced with nothing (meaning, @auth_details could be enough to have, and instance variables don't need to be declared), but I don't know enough, maybe it's part of public API. Thus:

Step 3: refresh_token method

The conditional on the lines 3-7 (see above ↑) is overly complex, using many nots and an unnecessary has_key? (line 4). has_key? :refresh_token is redundant because we later check @auth_details[:refresh_token] value. So, if the key isn't present, the conditinal evaluates to false. If we don't check for key, value of :refresh_token would be nil, leading to the same false. And, if key is present, value check will determine true or false. Thus:

I'd use a variable for @auth_details[:refresh_token] result, to avoid querying hash twice, but I couldn't think of a good name for it, as refresh_token is already taken by method name.

Step 4: API wrapper methods

These methods (see above ↑) wrap access to JSON API. I've included two methods, but there are more of them. I'm going to skip them as I don't see how to improve them.

Step 5: get, put, post and delete methods

All these methods (see above ↑) are almost the same, and can be reduced to something like cs_method :get, :post, ... using metaprogramming. Like this:

Step 6: add_auth_details_to_options

add_auth_details_to_options is used in step 5 methods to "add auth details to options", at the moment not clear, why it's options and not args. Here it is:

It does look overwhelming at the first glance. But I have no intention of being overwhelmed by it. I'm going to simplify it step by step.

The first thing I see is that at line 18 (see above ↑), args is returned unchanged, if there's no @auth_details present. This is typical guard clause case.

Step 6.1: options

At lines 4-7 (see above ↑) we see that so called options are expected as the 2nd element of args array, and we use it if present. It's actually quite hard to reason about this code because if args[1] is present and options get assigned it, and it's nil, we might get an exception later on, if @auth_details has certain data in it and it tries to use nil as a Hash. I want to simplify it!

The lines 8-16 (see above ↑) add stuff to options. And the line 17 (see above ↑) assigns options back as the second element of args.

It's way too complicated. According to Single Responsibility Principle, a method should have one responsibility only. For this method it means adding stuff to options. Here's the result:

None of that nasty business with args[1] is present anymore, much simpler! And look how all the args[1] business is in one place here:

It is the same code (lines 3-7 ↑), same functionality, but it's all in one place! Being in one place means there's no switching of contexts required, which means it's easier to read.

Step 6.2: putting stuff into options

Let's continue with improving add_auth_details_to_options. I'm placing the same code here again, so it's easier to compare with the code I'll have refactored:

Lines 4-6 (see above ↑) are pretty straightforward, just adding an entry into options if :access_token is present in @auth_details. For some reason the code is really careful to put even nil values of @auth_details[:access_token] into authorization header (if it checked for value instead of key presence, it'd not put the authorization header in at all).

Lines 7-12 (see above ↑) are similar, but line 8 can be merged into line 7. Guess, it's the most boring refactoring in this article:

Now, add_auth_details_to_options looks neat and tidy.

Step 6.3: back to define_cs_method

So, we have a variable number of args at line 2 (see above ↑), but why? It turns out, line 10 calls method name on Base class, meaning, it'll execute Base.get, Base.post, etc. Those are methods from HTTParty, and HTTParty uses .get(*args, &block)-like API, so it's understandable that CreateSend also uses it. Thus, my plan to introduce named arguments is foiled and I have to find another way to improve the code.

From what I know about args (from looking at HTTParty examples), the 1st argument is path and the 2nd argument is options. Lines 3-7 (see above ↑) can be made to better explain arguments they're dealing with:

The code at lines 3-4 (see above ↑) isn't equivalent to the original code. The original code would fail if nil was passed as options (because we wouldn't assign options to be {}). But whether that behaviour was intentional or accidental, I have no idea. So, I rely on the fact that tests still pass.

That's all that's of interest left in the Base class. Hope you enjoyed it.