PaginatedResource refactoring

Today I'm going to refactor a class from Digital Ocean Droplet Kit, a library to control droplets (that's what Digital Ocean calls virtual machines).

The class is called PaginatedResource and you can find the original source here. The idea behind the class is that it fetches elements from an external source on demand, and you just call #each and don't worry about fetching.

How PaginatedResource works

Take a look:

	module DropletKit
	class PaginatedResource
	include Enumerable

	PER_PAGE = 20

	attr_reader :action, :resource, :collection
	attr_accessor :total

	def initialize(action, resource, *args)
	@current_page = 0
	@total = nil
	@action = action
	@resource = resource
	@collection = []
	@args = args
	@options = args.last.kind_of?(Hash) ? args.last : {}
	end

view raw paginated_resource orig1.rb hosted with ❤ by GitHub

Elements are fetched from external source one page at a time, and on line 5, PER_PAGE constant defines default number of elements per page. When there's a need, a page full of elements will be fetched from external source.

@current_page tells us number of the last fetched page.
@total tells us how many elements are there altogether at the external source.
@collection is an array that holds already fetched elements.
#initialize can be given a hash of options as the last argument, and the only option it supports is :per_page, overriding PER_PAGE constant.

Step 1: meaningful names

Based on the meanings above, I will do the following renames:

@current_page -> @last_fetched_page. current_page explains that it's current page number, but what does it mean? We have external source and already fetched elements, and to which does current_page refer is not clear at this point. last_fetched_page on the other hand, explains right away that it refers to external element source.
@total -> @total_remote_elements. On the first glance, totally not clear what total represents. Total of what? total_remote_elements conveys that it's about "remote" elements. Knowing that this class is about fetching elements from external resource should help understand "remote".
@collection -> @fetched_elements. I feel it's by far the best rename. Especially because it's part of public interface (from outside, collection looks as another way of getting elements, instead of using #each).

I have also added comments, to explain what the class does:

	module DropletKit
	# PaginatedResource provides an Enumerable interface to external resource,
	# fetching elements as needed.
	#
	# #each is able to start at specified index, i.e. #each(5) will start yielding
	# elements starting at 6th element.
	class PaginatedResource
	include Enumerable

	PER_PAGE = 20

	attr_reader :action, :resource, :fetched_elements
	attr_accessor :total_remote_elements

	def initialize(action, resource, *args)
	@last_fetched_page = 0
	@total_remote_elements = nil
	@action = action
	@resource = resource
	@fetched_elements = []
	@args = args
	@options = args.last.kind_of?(Hash) ? args.last : {}
	end

view raw paginated_resource step1.rb hosted with ❤ by GitHub

Step 2: tidy up initialize

#initialize is somewhat haphazard, and I'd like to change a few things:
* Move simple argument assignments to the top, so they are brain-dead easy to skim over.
* Remove @total_remote_elements = nil because all unassigned attributes are nil by default, there's no need to assign them.
* Bundle fetch-related attributes together.

I considered bundling @options with the top group of assignments, because it's still assigned from arguments, but then, the top group wouldn't be so easy to read.

Here are my changes:

	def initialize(action, resource, *args)
	@action = action
	@resource = resource
	@args = args

	@options = args.last.kind_of?(Hash) ? args.last : {}
	@last_fetched_page = 0
	@fetched_elements = []
	end

view raw paginated_resource step2.rb hosted with ❤ by GitHub

Step 3: disable write-level access to internals

As you may have noticed, there's attr_accessor :total_remote_elements. Assigning total_remote_elements from outside doesn't make much sense because:

If not all elements are needed, then Enumerable#first(n) can be used to get first n elements.
If total_remote_elements was set from outside to a bigger number than number of remote elements, that would probably cause an error when fetching non-existing elements. Not very useful.

So, I've removed it, and tests still pass. It only was used internally by PaginatedResource.

Step 4: #each method

Take a look:

	def each(start = 0)
	# Start off with the first page if we have no idea of anything yet
	fetch_next_page if @total_remote_elements.nil?

	return to_enum(:each, start) unless block_given?
	Array(@fetched_elements[start..-1]).each do \|element\|
	yield(element)
	end

	unless last?
	start = [@fetched_elements.size, start].max
	fetch_next_page
	each(start, &Proc.new)
	end

	self
	end

view raw paginated_resource #each orig.rb hosted with ❤ by GitHub

Step 4a

The first thing #each does (on line 3) is fetch next page if @total_remote_elements is nil. When I first read it, it wasn't clear why @total_remote_elements of nil causes a fetch, and the comment didn't help much. As I read more code I understood that @total_remote_elements gets assigned on the first fetch, so, if it's nil, it means that nothing was fetched yet and we fetch the first page for setting up stuff. And that's what I want to convey on line 3:

	def each(start = 0)
	# Start off with the first page if we have no idea of anything yet
	fetch_next_page if nothing_fetched_yet?

	return to_enum(:each, start) unless block_given?
	Array(@fetched_elements[start..-1]).each do \|element\|
	yield(element)
	end

	unless last?
	start = [@fetched_elements.size, start].max
	fetch_next_page
	each(start, &Proc.new)
	end

	self
	end

	def nothing_fetched_yet?
	@total_remote_elements.nil?
	end

view raw paginated_resource #each 4a.rb hosted with ❤ by GitHub

Step 4b

On line 5 (see the code above ↑) we return an Enumerator, if block wasn't provided. I feel that lines 3 and 6-8 belong together, as they do the actual fetching and yielding work, and to_enum between them just gets in the way. So, I move enumerator creation to the top:

	def each(start = 0)
	return to_enum(:each, start) unless block_given?

	# Start off with the first page if we have no idea of anything yet
	fetch_next_page if nothing_fetched_yet?
	Array(@fetched_elements[start..-1]).each do \|element\|
	yield(element)
	end

	unless last?
	start = [@fetched_elements.size, start].max
	fetch_next_page
	each(start, &Proc.new)
	end

	self
	end

view raw paginated_resource #each 4b.rb hosted with ❤ by GitHub

Step 4c

On line 6 (see the code above ↑), we yield already fetched elements. If start is beyond what was fetched, we'd get nil as the result of @fetched_elements[start..-1], so Array() converts nil to [].
On lines 10-14, if there are more pages to fetch, we update start to omit yielding already yielded elements, fetch next page and recursively call each.

So, altogether, new elements are fetched on demand and yielded.

I'd like to change the abstraction here from pages (last?) to elements (more_elements_to_fetch?) as it's easier to understand and easier to calculate. The only place I see the page abstraction useful is in retrieving new pages. But for calculating whether we can fetch more elements, it's overkill. Here are my changes (line 10):

	def each(start = 0)
	return to_enum(:each, start) unless block_given?

	# Start off with the first page if we have no idea of anything yet
	fetch_next_page if nothing_fetched_yet?
	Array(@fetched_elements[start..-1]).each do \|element\|
	yield(element)
	end

	if more_elements_to_fetch?
	start = [@fetched_elements.size, start].max
	fetch_next_page
	each(start, &Proc.new)
	end

	self
	end

	def more_elements_to_fetch?
	@total_remote_elements > @fetched_elements.size
	end

view raw paginated_resource #each 4c.rb hosted with ❤ by GitHub

Step 4d

On line 11 (see the code above ↑) we update start to omit yielding already yielded elements (lines 6-8 take care of yielding whatever was fetched before). It's a bit hard though to get that meaning from the code. So, I tried to explain it better (lines 11-12):

	def each(start = 0)
	return to_enum(:each, start) unless block_given?

	# Start off with the first page if we have no idea of anything yet
	fetch_next_page if nothing_fetched_yet?
	Array(@fetched_elements[start..-1]).each do \|element\|
	yield(element)
	end

	if more_elements_to_fetch?
	# Ensure we omit from yielding already yielded elements
	start = after_fetched_elements unless start > after_fetched_elements
	fetch_next_page
	each(start, &Proc.new)
	end

	self
	end

	def after_fetched_elements
	@fetched_elements.size
	end

view raw paginated_resource #each 4d.rb hosted with ❤ by GitHub

Step 4e

On line 14 (see the code above ↑), #each is called recursively, passing Proc.new as block. I had to look it up, and apparently, Proc.new translates to the current passed block. But recursion isn't needed here and each recursive call does some extra work on lines 2-8, which are only really needed for the first #each call. So, I replaced recursion with a loop:

	def each(start = 0, &block)
	return to_enum(:each, start) unless block_given?

	# Start off with the first page if we have no idea of anything yet
	fetch_next_page if nothing_fetched_yet?
	yield_fetched_elements(start, &block)

	while more_elements_to_fetch?
	# Ensure we omit from yielding already yielded elements
	start = after_fetched_elements unless start > after_fetched_elements
	fetch_next_page
	yield_fetched_elements(start, &block)
	end

	self
	end

	def yield_fetched_elements(start)
	Array(@fetched_elements[start..-1]).each do \|element\|
	yield(element)
	end
	end

view raw paginated_resource #each 4e.rb hosted with ❤ by GitHub

Step 5

Next is #total_pages method:

	def total_pages
	return nil if nothing_fetched_yet?

	(@total_remote_elements.to_f / per_page.to_f).ceil
	end

view raw paginated_resource step5 orig.rb hosted with ❤ by GitHub

Not sure why #total_pages is part of public interface, perhaps because of tests referencing it. The only thing I've changed here is replacing return nil with just return. There's no need to specify nil because return without argument will produce nil. Here it is:

	def total_pages
	return if nothing_fetched_yet?

	(@total_remote_elements.to_f / per_page.to_f).ceil
	end

view raw paginated_resource step 5 after.rb hosted with ❤ by GitHub

Step 6

Next is #== method. It compares PaginatedResource with objects, responding to #[]:

	def ==(other)
	each_with_index.each.all? {\|object, index\| object == other[index] }
	end

view raw paginated_resource step 6 orig.rb hosted with ❤ by GitHub

each is redundant, so I removed it:

	def ==(other)
	each_with_index.all? { \|object, index\| object == other[index] }
	end

view raw paginated_resource step 6 after.rb hosted with ❤ by GitHub

Step 7

Next is #retrieve method (see the code below ↓). It fetches a page of elements from the resource we got passed in #initialize. Then it adds newly fetched elements to @fetched_elements and, on the first retrieve only, it sets @total_remote_elements:

	def retrieve(page, per_page = self.per_page)
	invoker = ResourceKit::ActionInvoker.new(action, resource, *@args)
	invoker.options[:per_page] \|\|= per_page
	invoker.options[:page] = page

	@fetched_elements += invoker.handle_response

	if @total_remote_elements.nil?
	meta = MetaInformation.extract_single(invoker.response.body, :read)
	@total_remote_elements = meta.total.to_i
	end
	end

view raw paginated_resource step 7 orig.rb hosted with ❤ by GitHub

Step 7a

On line 6 (see the code above ↑), += is used to add newly fetched elements to @fetched_elements. It translates to call Array#+, and that means that a new array is created every time elements are retrieved. A more efficient way is to use Array#concat, which adds elements to the existing array.

	def retrieve(page, per_page = self.per_page)
	invoker = ResourceKit::ActionInvoker.new(action, resource, *@args)
	invoker.options[:per_page] \|\|= per_page
	invoker.options[:page] = page

	@fetched_elements.concat(invoker.handle_response)

	if @total_remote_elements.nil?
	meta = MetaInformation.extract_single(invoker.response.body, :read)
	@total_remote_elements = meta.total.to_i
	end
	end

view raw paginated_resource step 7a.rb hosted with ❤ by GitHub

Step 7b

The last change I want to make (see lines 8-11 above ↑) is to replace if @total_remote_elements.nil? then assign @total_remote_elements with @total_remote_elements ||=. I think it makes clear that @total_remote_elements is assigned here, and you can stop reading right away if you're not interested in that.

	def retrieve(page, per_page = self.per_page)
	invoker = ResourceKit::ActionInvoker.new(action, resource, *@args)
	invoker.options[:per_page] \|\|= per_page
	invoker.options[:page] = page

	@fetched_elements.concat(invoker.handle_response)

	@total_remote_elements \|\|= begin
	meta = MetaInformation.extract_single(invoker.response.body, :read)
	meta.total.to_i
	end
	end

view raw paginated_resource step 7b.rb hosted with ❤ by GitHub

End result

Here are all the changes put together:

	module DropletKit
	# PaginatedResource provides an Enumerable interface to external resource,
	# fetching elements as needed.
	#
	# #each is able to start at specified index, i.e. #each(5) will start yielding
	# elements starting at 6th element.
	class PaginatedResource
	include Enumerable

	PER_PAGE = 20

	attr_reader :action, :resource, :fetched_elements

	def initialize(action, resource, *args)
	@action = action
	@resource = resource
	@args = args

	@options = args.last.kind_of?(Hash) ? args.last : {}
	@last_fetched_page = 0
	@fetched_elements = []
	end

	def per_page
	@options[:per_page] \|\| PER_PAGE
	end

	def each(start = 0, &block)
	return to_enum(:each, start) unless block_given?

	# Start off with the first page if we have no idea of anything yet
	fetch_next_page if nothing_fetched_yet?
	yield_fetched_elements(start, &block)

	while more_elements_to_fetch?
	# Ensure we omit from yielding already yielded elements
	start = after_fetched_elements unless start > after_fetched_elements
	fetch_next_page
	yield_fetched_elements(start, &block)
	end

	self
	end

	def total_pages
	return if nothing_fetched_yet?

	(@total_remote_elements.to_f / per_page.to_f).ceil
	end

	def ==(other)
	each_with_index.all? { \|object, index\| object == other[index] }
	end

	private

	def after_fetched_elements
	@fetched_elements.size
	end

	def fetch_next_page
	@last_fetched_page += 1
	retrieve(@last_fetched_page)
	end

	def more_elements_to_fetch?
	@total_remote_elements > @fetched_elements.size
	end

	def nothing_fetched_yet?
	@total_remote_elements.nil?
	end

	def retrieve(page, per_page = self.per_page)
	invoker = ResourceKit::ActionInvoker.new(action, resource, *@args)
	invoker.options[:per_page] \|\|= per_page
	invoker.options[:page] = page

	@fetched_elements.concat(invoker.handle_response)

	@total_remote_elements \|\|= begin
	meta = MetaInformation.extract_single(invoker.response.body, :read)
	meta.total.to_i
	end
	end

	def yield_fetched_elements(start)
	Array(@fetched_elements[start..-1]).each do \|element\|
	yield(element)
	end
	end
	end
	end

view raw paginated_resource end result.rb hosted with ❤ by GitHub

Happy hacking!

PaginatedResource refactoring

How PaginatedResource works

Step 1: meaningful names

Step 2: tidy up initialize

Step 3: disable write-level access to internals

Step 4: #each method

Step 4a

Step 4b

Step 4c

Step 4d

Step 4e

Step 5

Step 6

Step 7

Step 7a

Step 7b

End result

Fed up working on bad code? Here's a way out!

For people that that want to stop suffering from bad code I’ve made a FREE course

Dmitry

Ninefold CLI refactoring

CreateSend refactoring part 1

PaginatedResource refactoring

How PaginatedResource works

Step 1: meaningful names

Step 2: tidy up initialize

Step 3: disable write-level access to internals

Step 4: #each method

Step 4a

Step 4b

Step 4c

Step 4d

Step 4e

Step 5

Step 6

Step 7

Step 7a

Step 7b

End result

Fed up working on bad code? Here's a way out!

For people that that want to stop suffering from bad code I’ve made a FREE course

Dmitry

Post Navigation

Ninefold CLI refactoring

CreateSend refactoring part 1