Sunday, September 15, 2013

Data Munging Kata in Ruby

Back in 2007, Dave Thomas posted a series of katas that have kept their appeal. They range from algorithm implementations to thinking exercises. The Data Munging kata has a particularly real-world flair because it combines data munging with algorithm reuse.

In short, the kata asks you to parse two data files and, in each, find the record with the smallest spread (difference) between two fields: the day with the smallest temperature spread in the weather data, and the team with the smallest difference between goals for and goals against in the football league table. You first solve the first data set without reuse in mind. Then you solve the second data set and extract the minimum-spread algorithm the two solutions share. Nice!
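Part one, the reuse-free pass over the weather data, could look something like the minimal sketch below. It assumes a simplified whitespace-separated layout of day, maximum temperature and minimum temperature, and that some values may carry a trailing `*` marker. The sample text and the method name smallest_temperature_spread are hypothetical and are not taken from the author's repository.

```ruby
# Part one, written without reuse in mind: find the day with the
# smallest gap between its maximum and minimum temperature.
# WEATHER is a small made-up sample, not the kata's actual data file.
WEATHER = <<~DATA
  Dy MxT MnT
   1  88  59
   2  79  63
   3  77  55
   4  86  32*
DATA

def smallest_temperature_spread(text)
  best_day = nil
  best_spread = nil

  text.each_line do |line|
    day, max, min = line.split
    # Skip the header (or anything else) that does not start with a day number.
    next unless day.to_s.match?(/\A\d+\z/)

    # String#to_i reads only the leading digits, so "32*" becomes 32.
    spread = max.to_i - min.to_i
    if best_spread.nil? || spread < best_spread
      best_day = day.to_i
      best_spread = spread
    end
  end

  best_day
end

puts smallest_temperature_spread(WEATHER) # prints 2
```

Everything here, from the column positions to the comparison loop, is specific to one file. That tight coupling is exactly what part two is designed to expose.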

I was pleased with my solution. I pass my SpreadCalc class a data structure along with a min_method and a max_method, which are the names of the methods the class calls on each record to compute its spread. This design lets the class work with any data set whose records respond to those methods. It also leaves room to extend the class to other calculations over the minimums and maximums in a dataset.

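class SpreadCalc
  # Required keyword arguments (Ruby 2.1+). The original
  # `spread_data: spread_data` form is a circular argument reference.
  def initialize(spread_data:, max_method:, min_method:)
    @spread_data = spread_data
    @max_method = max_method
    @min_method = min_method
  end

  # Returns the first record whose spread between the two fields is smallest.
  def minimum
    min_index = 0
    min_spread = calc_spread(spread_data[min_index])

    spread_data.each_with_index do |data, i|
      spread = calc_spread(data)
      if spread < min_spread
        min_index = i
        min_spread = spread
      end
    end

    spread_data[min_index]
  end

  private

  attr_reader :spread_data, :max_method, :min_method

  # calc_spread was not shown in the original listing; this reconstruction
  # asks each record for both values by method name and takes the absolute
  # difference, so "goals for" may be smaller than "goals against".
  def calc_spread(data)
    (data.public_send(max_method) - data.public_send(min_method)).abs
  end
end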
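For comparison, here is a more compact sketch of the same design. It uses Enumerable#min_by instead of manual index tracking and includes a usage example across both kinds of data. The class name CompactSpreadCalc, the Day and Team structs, and all values are hypothetical. Taking the absolute difference is an assumption, made so that goals for versus goals against compares sensibly whichever side is larger.

```ruby
# A compact variant of the SpreadCalc idea: the caller names the two
# methods to query, and min_by picks the record with the smallest spread.
class CompactSpreadCalc
  def initialize(spread_data:, max_method:, min_method:)
    @spread_data = spread_data
    @max_method = max_method
    @min_method = min_method
  end

  def minimum
    @spread_data.min_by { |data| spread(data) }
  end

  private

  # Assumption: the spread is the absolute difference of the two fields.
  def spread(data)
    (data.public_send(@max_method) - data.public_send(@min_method)).abs
  end
end

# Hypothetical record types with made-up values.
Day  = Struct.new(:day, :max_temp, :min_temp)
Team = Struct.new(:name, :goals_for, :goals_against)

days  = [Day.new(1, 88, 59), Day.new(2, 79, 63), Day.new(3, 77, 55)]
teams = [Team.new("Alpha", 79, 36), Team.new("Beta", 40, 45), Team.new("Gamma", 55, 60)]

weather  = CompactSpreadCalc.new(spread_data: days, max_method: :max_temp, min_method: :min_temp)
football = CompactSpreadCalc.new(spread_data: teams, max_method: :goals_for, min_method: :goals_against)

puts weather.minimum.day   # prints 2
puts football.minimum.name # prints Beta
```

Like a strict `<` comparison in a loop, min_by keeps the first of any tied records (Beta and Gamma both differ by 5 here). On an empty collection it simply returns nil.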

I was also able to extract a common algorithm for parsing the data files, even though they were in slightly different formats. I hadn't planned to, but I was already in the mindset of extracting common functionality, so the duplication was quickly obvious. I suppose that's the point of the kata! You can find the full source for my implementation on GitHub.
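A shared parsing step could look something like the minimal sketch below. It assumes each format can be described by a regular expression with named captures, plus a block that turns a match into a record. Lines that don't match, such as headers and separators, are skipped. The helper parse_records, the structs, and the sample text (loosely modeled on the two kata files) are hypothetical. This is not the code from the GitHub repository mentioned above.

```ruby
# Hypothetical shared parser: keep only lines matching the format's regex
# and let the caller's block build a record from each MatchData.
def parse_records(text, pattern)
  records = []
  text.each_line do |line|
    match = pattern.match(line)
    records << yield(match) if match
  end
  records
end

Day  = Struct.new(:day, :max_temp, :min_temp)
Team = Struct.new(:name, :goals_for, :goals_against)

# Made-up samples in two slightly different layouts.
WEATHER_TEXT = <<~DATA
  Dy MxT MnT
   1  88  59
   2  79* 63
   3  77  55
DATA

FOOTBALL_TEXT = <<~DATA
  Team       P  W  L  D  F     A   Pts
  1. Alpha   38 26 9  3  79 -  36  87
  2. Beta    38 24 5  9  67 -  30  77
  ----------------------------------
  3. Gamma   38 10 12 16 40 -  45  42
DATA

# Weather: day, max (optionally followed by "*"), min.
days = parse_records(WEATHER_TEXT, /\A\s*(?<day>\d+)\s+(?<max>\d+)\*?\s+(?<min>\d+)/) do |m|
  Day.new(m[:day].to_i, m[:max].to_i, m[:min].to_i)
end

# Football: "N. Name", four numeric columns, then "F - A".
teams = parse_records(FOOTBALL_TEXT, /\A\s*\d+\.\s+(?<name>\S+)\s+(?:\d+\s+){4}(?<gf>\d+)\s+-\s+(?<ga>\d+)/) do |m|
  Team.new(m[:name], m[:gf].to_i, m[:ga].to_i)
end

p days.map(&:day)    # prints [1, 2, 3]
p teams.map(&:name)  # prints ["Alpha", "Beta", "Gamma"]
```

Only the regex and the record-building block differ between the two files. The line-by-line loop, and the decision to skip everything else, is shared.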
