Monday, June 25, 2012

Simple Example of Dependency Injection with Rails

Today I came across a great opportunity to illustrate dependency injection in a simple context. I had a Rails partial that was duplicated across two subclasses. The partial was responsible for displaying options to create a new record from the data of the current record. It also offered two types of copy, shallow and deep. The shallow copy used a button to POST data, while the deep copy offered a form with some additional options. The only difference between the partials was the path to post data to. Let's see this in code.

#app/views/fun_event/_copy_options.html.erb
button_to(t("create_and_edit_shallow_copy"), fun_event_path(:from_event => @event.id, :return => true), :id => "shallow_copy_btn")

form_tag(fun_event_path(:return => true)) do
  #form code
end


#app/views/boring_event/_copy_options.html.erb
button_to(t("create_and_edit_shallow_copy"), boring_event_path(:from_event => @event.id, :return => true), :id => "shallow_copy_btn")

form_tag(boring_event_path(:return => true)) do
  #form code
end

The first, failed iteration

To remove the duplication, I passed a path option into the partial, replacing the specific route references with a generic one.

#app/views/fun_events/copy.html.erb
<%= render :partial => "events/copy_options", :event_path => fun_event_path %>


#app/views/boring_events/copy.html.erb
<%= render :partial => "events/copy_options", :event_path => boring_event_path %>

#app/views/events/_copy_options.html.erb
button_to(t("create_and_edit_shallow_copy"), event_path(:from_event => @event.id, :return => true), :id => "shallow_copy_btn")

form_tag(event_path(:return => true)) do
  #form code
end

Can you guess where this led?

undefined method `event_path' for ActionView::Base:0xd6acf18

Dude! Inject the dependency!

Obviously, the event_path value I was passing in was a string (an already-generated URL), not a method. I needed the method itself so I could pass it the appropriate arguments inside the partial to construct the URL I needed. Had there been only one call to the route helper, I would likely have just passed in the finished string in each context. But with two different calls, I was forced to think outside the box. Here's what I ended up with.

#app/views/fun_events/copy.html.erb
<%= render :partial => "events/copy_options", :event_path => method(:fun_event_path) %>


#app/views/boring_events/copy.html.erb
<%= render :partial => "events/copy_options", :event_path => method(:boring_event_path) %>

#app/views/events/_copy_options.html.erb
button_to(t("create_and_edit_shallow_copy"), event_path.call(:from_event => @event.id, :return => true), :id => "shallow_copy_btn")

form_tag(event_path.call(:return => true)) do
  #form code
end

The changes are really quite subtle, but we use Object's #method method to pass a reference to the method we want to call, and simply supply the arguments when needed. Mind == Blown
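
If this use of Object#method is new to you, here's a minimal standalone sketch outside of Rails (the fun_event_path helper below is a made-up stand-in for the real route helper) showing how it captures a callable reference that we can hand around and invoke later with whatever arguments we like:

# A hypothetical stand-in for the Rails route helper.
def fun_event_path(options = {})
  "/fun_events/new?" + options.map { |k, v| "#{k}=#{v}" }.join("&")
end

path_builder = method(:fun_event_path)  # => #<Method: Object#fun_event_path>
path_builder.call(:return => true)      # => "/fun_events/new?return=true"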

Tuesday, May 8, 2012

Inherit an Application by Rewriting the Test Suite

One of my first tasks at End Point was to inherit a production application from the lead developer who was no longer going to be involved. It was a fairly complex domain model and had passed through many developers' hands on a tight client budget. Adding to the challenge was the absence of any active development; it's difficult to "own" an application which you're not able to make changes to or work with users directly. Moreover, we had a short amount of time; the current developer was leaving in just 30 days. I needed to choose an effective strategy to understand and document the system on a budget.

Taking Responsibility

At the time I was reading Robert C. Martin's The Clean Coder, which makes a case for the importance of taking responsibility as a "Professional Software Developer". He defines responsibility for code in the broadest of terms.

Drawing from the Hippocratic oath may seem arrogant, but what better source is there? And, indeed, doesn't it make sense that the first responsibility, and first goal, of an aspiring professional is to use his or her powers for good?

From there he continues to expound in his declarative style about how to do no harm to the function and structure of the code. What struck me most about this was his conclusions about the necessity of testing. The only way to do no harm to function is to know your code works as expected. The only way to know your code works is with automated tests. The only way to do no harm to structure is by "flexing it" regularly.

The fundamental assumption underlying all software projects is that software is easy to change. If you violate this assumption by creating inflexible structures, then you undercut the economic model that the entire industry is based on.

In short: You must be able to make changes without exorbitant costs.

The only way to prove that your software is easy to change is to make easy changes to it. Always check in a module cleaner than when you checked it out. Always make some random act of kindness to the code whenever you see it.

Why do most developers fear to make continuous changes to their code? They are afraid they'll break it! Why are they afraid to break it? Because they don't have tests.

It all comes back to the tests. If you have an automated suite of tests that covers virtually 100% of the code, and if that suite of tests can be executed quickly on a whim, then you simply will not be afraid to change the code.

Test Suite for the Triple Win

Fortunately, there was a fairly large test suite in place for the application, but as is common with budget-constrained projects, the tests didn't track the code. There were hundreds of unit tests, but they weren't even executable at first. After just a few hours of cleaning out tests for classes which no longer existed, I found about half of the 500 unit tests passed. As I worked through repairing the tests, I was learning the business rules, classes, and domain of the application, all without touching "production" code (win). These tests were the documentation that future developers could use to understand the expected behavior of the system (double win). While rebuilding the tests, I got to document bugs, deprecation warnings, performance issues, and general code quality issues (triple win).

By the end of my 30 day transition, I had 500+ passing unit tests that were more complete and flexible than before. Additionally, I added 100+ integration tests which allowed me to exercise the application at a higher level. Not only was I taking responsibility for the code, I was documenting important issues for the client and myself. This helped the client feel I had done my job transitioning responsibilities. That trust leaves the door open to further development, which means a better system over the long haul.

Tuesday, May 1, 2012

Profile Ruby with ruby-prof and KCachegrind

This week I was asked to isolate some serious performance problems in a Rails application. I went down quite a few paths to determine how to best isolate the issue. In this post I want to document what tools worked most quickly to help find offending code.

Benchmarks

Before any work begins on speeding things up, we need to set a performance baseline so we can know whether we are improving things, and by how much. This is done with Ruby's Benchmark class and some of Rails' benchmarking helpers.

The Rails guides would have you set up performance tests, but I found this cumbersome on the Rails 2.3.5 application I was dealing with. Initial attempts to set them up were unfruitful, taking time away from the task at hand. In my case, the process of setting up the test environment to reflect the production environment was prohibitively expensive, but if you can automate the benchmarks, do it. If not, use the logs to measure your performance, and keep track in a spreadsheet. Whether you benchmark manually or automatically, you'll want to keep some kind of log of the results, with notes about what changed in each iteration.

Isolating the Problem

As always, start with your logs. In Rails, you get some basic performance information for free. Profiling slows down runtime considerably, so by reviewing the logs first you can make a first cut at what needs to be profiled and keep already long profile runs shorter. For example, instead of profiling an entire controller method, reading the logs might show you that it's just a particular partial which is rendering slowly.

Taking a baseline benchmark

Once you've got a sense of where the pain is, it's easy to get a benchmark for that slow code as a baseline.

module SlowModule
  def slow_method
    benchmark "SlowModule#slow_method" do
      #my slow code
    end
  end
end

Look to your log files to see the results. If for some reason you're outside your Rails environment, you can use Ruby's Benchmark class directly.

require 'benchmark'
elapsed = Benchmark.realtime do
  #slow code
end
puts "#{(elapsed * 1000).round} ms"

This will tell you the elapsed time in milliseconds and give you a precise measurement to compare against.

Profiling with ruby-prof

First, set up ruby-prof. Once it's installed, you can add these kinds of blocks around your code.

require 'ruby-prof'

module SlowModule
  def slow_method
    benchmark "SlowModule#slow_method" do
      RubyProf.start
      # your slow code here
      results = RubyProf.stop
      File.open("#{RAILS_ROOT}/tmp/slow_module_slow_method_#{Time.now.to_i}", 'w') do |file|
        RubyProf::CallTreePrinter.new(results).print(file)
      end
    end
  end
end

Keep in mind that profiling code will really slow things down. Make sure to collect your baseline both with profiling and without, so you're making an apples-to-apples comparison.

By default ruby-prof measures process time, which is the time used by a process between any two moments. It is unaffected by other processes concurrently running on the system. You can review the ruby-prof README for other types of measurements including memory usage, object allocations and garbage collection time.
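
For example, switching the measurement is a one-line change before starting the profiler. This is only a sketch, and it assumes your installed ruby-prof version exposes the RubyProf::MEMORY constant; check its README for the modes it actually supports:

require 'ruby-prof'

# Measure memory usage instead of process time; requires a suitably patched Ruby.
RubyProf.measure_mode = RubyProf::MEMORY

RubyProf.start
# your slow code here
results = RubyProf.stop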

If you choose to measure any of these options, make sure your Ruby installation has the tools a profiler needs to collect data. Please see the Rails guides for guidance on compiling and patching Ruby.

Interpreting the Data with KCachegrind

At this point you should have a general sense of what code is slow, having reviewed the logs. You've got a benchmark log set up with baseline measurements to compare against. If you're going to benchmark while you're profiling, make sure your baseline includes the profiling code; it will be much slower! Remember, we want an apples-to-apples comparison. You're ready to start profiling and identifying the root source of the performance problems.

After manually or automatically running your troubled code with the profiling block above, you can open up the output from ruby-prof and quickly find it not to be human friendly. Fortunately, KCachegrind turns that mess into something very useful. I found that my Ubuntu installation had a package for it already built, so installation was a breeze. Hopefully things are as easy for you. Next, simply open your result files in KCachegrind and start reviewing the results.

The image above shows what's called a "call graph", with the percentages representing the relative amount of time each method uses for the duration of the profile run. The CacheableTree#children method calls Array#collect and takes up more than 90% of the runtime. The subsequent child calls are relatively modest in proportion. It's clear we can't modify Array#collect, so let's look at CacheableTree#children.

module CacheableTree
  def children(element = @root_element)
    full_set.collect { |node|
      node if node.parent_id == element.id
    }.compact
  end
end

Defined elsewhere, full_set is an array of Ruby objects. This is a common performance pitfall in Rails; collecting data by looping through arrays works well with a small data set, but quickly becomes painful with a large one. It turned out that in this case full_set had 4200+ elements. Worse yet, the children method was being called recursively on each of them. Yikes!

At this point I had to decide how to optimize. I could go for broke, break the API completely, and try to clean up the mess, or I could see whether I could collect the data more quickly some other way. I looked at how full_set was defined and found I could modify that query to return just the subset of elements I needed rather easily.

module CacheableTree
  def children(element = @root_element)
    FormElement.find_by_sql(...) #the details aren't important
  end
end

By collecting the data directly via a SQL call, I was able to cut my benchmark by about 20%. Not bad for a single line change! Let's see what the next profile told us.

The above is another view of the profile KCachegrind provides. It's essentially the same information, but in table format. There were a few indicators that my optimization was helpful:

  • The total process_time cost had dropped
  • The amount of time spent in each function seemed to be better distributed - I didn't have a single method soaking up all the process time
  • Most of the time was spent in code that wasn't mine!

Although we still saw 66% of the time spent in the children method, we could also see that 61% of the time was spent in ActiveRecord::Base. Effectively, I had pushed the 'slowness' down the stack, which tends to mean better performance. Of course, there were LOTS of database calls being made. Perhaps some caching could help reduce the number of calls.

module CacheableTree
  def children(element = @root_element)
    @children ||= {}
    @children[element] ||= FormElement.find_by_sql(...) #the details aren't important
  end
end

This is called memoization and lets us reuse this expensive method's results within the same page load. This change took another 10% off the clock against the baseline. Yay!

Knowing When to Stop

Performance optimization can be really fun, especially once all the infrastructure is in place. However, unless you have an unlimited budget and time, you have to know when to stop. For a few lines of code changed, the client would see a ~30% performance improvement. It was up to them to decide how much further to take it.

If allowed, my next step would be to make use of the application's existing dependence on Redis and add the Redis-Cacheable gem. It allows you to marshal Ruby objects in and out of a Redis server. The application already makes extensive use of caching, and this page was no exception, but when the user modified the page in a way that expired the cache, we would hit this expensive method again, unnecessarily. Based on the call graph above, we could eliminate another ~66% of the call time, and perhaps, by pre-warming this cache, minimize the chances of the user experiencing the pain of slow browsing!
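
I haven't written that code yet, but here's a rough sketch of the idea using the plain redis gem and Marshal directly (I'm not reproducing Redis-Cacheable's actual API from memory; children_sql_for is a hypothetical stand-in for the query used above, and the key naming is just a placeholder):

require 'redis'

module CacheableTree
  REDIS = Redis.new

  def children(element = @root_element)
    key = "cacheable_tree:children:#{element.id}"
    if (cached = REDIS.get(key))
      Marshal.load(cached)
    else
      records = FormElement.find_by_sql(children_sql_for(element)) # same query as above
      REDIS.set(key, Marshal.dump(records))
      records
    end
  end
end

A warm-up task run right after the cache-expiring action could call children for the common root elements ahead of time, so the user never pays the cost during a page load.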

Friday, April 20, 2012

Deconstructing an OO Blog Design in Ruby 1.9

I've become interested in Avdi Grimm's new book Objects on Rails; however, I found the code to be terse. Avdi is an expert Rubyist and he makes extensive use of Ruby 1.9 with minimal explanation. In all fairness, he lobbies you to buy Peter Cooper's Ruby 1.9 Walkthrough. Instead of purchasing the videos, I wanted to try to deconstruct the code myself.

In his first chapter featuring code, Mr. Grimm creates a Blog and Post class. For those of you who remember the original Rails blog demo, the two couldn't look more different.

Blog#post_source

In an effort to encourage Rails developers to think about relationships between classes beyond ActiveRecord::Relation, he creates his own interface for defining how a Blog should interact with a "post source".

class Blog
  # ...
  attr_writer :post_source
   
  private
  def post_source
    @post_source ||= Post.public_method(:new)
  end
end

The code above defines the Blog class and makes post_source= available via the attr_writer method. Additionally, it defines the attribute reader as a private method. The idea is that a private method can be changed without breaking the class's API: if we decide we want a new default Post source, we can change it safely.

The magic of this code is in defining the post source as a reference to a method, in this case Post.public_method(:new). The #public_method method is defined by Ruby's Object class and is similar to the #method method. In short, it gives us a way of not directly calling Post.new, but instead referring to the method that's responsible for creating new posts. This is logical if you remember that the name of this method is #post_source.
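
Here's a small standalone illustration of what #public_method hands back, and how it differs from #method when visibility comes into play (Widget and its build method are hypothetical, just for demonstration):

class Widget
  def self.build              # a public factory method
    new
  end
  private_class_method :new   # hide .new so the difference is visible
end

builder = Widget.public_method(:build)  # => #<Method: Widget.build>
builder.call                            # => #<Widget:0x...>

Widget.method(:new)          # still works; #method ignores visibility
Widget.public_method(:new)   # raises NameError, because :new is private here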

Now let's look how he puts post_source into action.

class Blog
  # ...
  def new_post
    post_source.call.tap do |p|
      p.blog = self
    end
  end
  # ...
end

During my first reading, it wasn't clear at all what was going on here, but if we remember that post_source is responsible for returning the method we need to call, we know that post_source.call is equivalent to Post.new. For the sake of clarity while learning, let's substitute it with something more readable so we can understand how tap is being employed.

class Blog
  # ...
  def new_post
    Post.new.tap do |p|
      p.blog = self
    end
  end
end

The tap method is available to all Ruby objects and yields the receiver to a block, then returns that same receiver. Per the docs, "the primary purpose of this method is to 'tap into' a method chain, in order to perform operations on intermediate results within the chain". For some examples on using tap see MenTaLguY's post on Eavesdropping on Expressions. As he says in his post, "you can insert your [code] just about anywhere without disturbing the flow of data". Neat.
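
A tiny example (the values are made up) shows the shape of it; the tap call lets us peek at an intermediate result without interrupting the chain:

[1, 2, 3, 4, 5].
  select { |n| n.odd? }.
  tap    { |odds| puts "odd numbers: #{odds.inspect}" }. # eavesdrop mid-chain
  map    { |n| n * 10 }
# prints "odd numbers: [1, 3, 5]" and the chain returns [10, 30, 50]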

In this case, it's being used to tap into the process of creating a new blog post and define the blog to which that post belongs. Because tap returns the object it modifies, #new_post returns the post now assigned to the blog.

Bringing it All Together

Avdi's approach may seem cumbersome at first, and it is compared to "the Rails way." But in general, that's the whole point of Objects on Rails: to challenge you to see beyond a generic solution to a problem (in this case, defining relationships between classes) so you can build more flexible solutions. Let's see some interesting things we might be able to do with this more flexible Blog class. We can imagine this same Blog class handling posts from all sorts of different sources. Let's see if we can get creative.

class EmailPost < ActionMailer::Base
  def receive(message)
    @blog = Blog.find_by_owner_email(message.from)
    @blog.post_source = EmailPost.public_method(:new)
    # assumes new_post is extended to forward arguments on to the post source
    @email_post = @blog.new_post(:subject => message.subject, :body => message.body)
    @email_post.publish
  end
end

With this little snippet, we're able to use the Blog class to process a different sort of post. We simply let the blog know the method to call when we want a new post and pass along the arguments we'd expect. Let's see if we can think of something else that's creative.

require 'feedzirra'
# execute regularly with a cronjob call like: curl -d "blog_id=1&url=http://somefeed.com" http://myblog.com/feed_poster

class FeedPostersController < ApplicationController
  def create
    @feed = Feedzirra::Feed.fetch_and_parse(params[:url])
    @blog = Blog.find(params[:blog_id])
    @blog.post_source = FeedPost.public_method(:new)
    @feed.entries.each do |entry|
      @blog.new_post(entry)
    end
  end
end

We could imagine the FeedPost.new method being the equivalent of a retweet for your blog using an RSS feed! Try having the Blog class do this with an ActiveRecord association! It seems to me the Blog class would need to get a bit more complex to support all these post sources, which makes post_source.call.tap look pretty good!

Tuesday, April 17, 2012

Monitoring cronjob exit codes with Nagios

If you're like me, you've got cronjobs that make email noise if there is an error. While email based alerts are better than nothing, it'd be best to integrate this kind of monitoring into Nagios. This article will break down how to monitor the exit codes from cronjobs with Nagios.

Tweaking our cronjob

The monitoring plugin depends on being able to read some sort of log output file which includes an exit code. The plugin also assumes that the log will be truncated with every run. Here's an example of a cronjob entry which meets those requirements:

rsync source dest > /var/log/important_rsync_job.log 2>&1; echo "Exit code: $?" >> /var/log/important_rsync_job.log

So let's break down a couple of the more interesting points in this command:

  • 2>&1 sends the stderr output to stdout so it can be captured in our log file
  • Notice the single > which will truncate the log every time it is run
  • $? returns the exit code of the last run command
  • Notice the double >> which will append to the log file our exit code

Setting up the Nagios plugin

The check_exit_code plugin is available on GitHub, and couldn't be easier to set up. Simply specify the log file to monitor and the frequency with which it should be updated. Here's the usage statement:

Check log output has been modified within t minutes and contains "Exit code: 0".
        If "Exit code" is not found, assume command is running.
        Check assumes the log is truncated after each command is run.
         
        --help      shows this message
        --version   shows version information
 
        -f          path to log file
        -t          Time in minutes which a log file can be unmodified before raising CRITICAL alert

The check makes sure the log file has been updated within t minutes because we want to check that our cronjob is not only running successfully, but running regularly. Perhaps this should be an optional parameter, or this should be called check_cronjob_exit_code, but for right now, it's getting the job done and cutting back on email noise.

Sunday, March 18, 2012

Check JSON responses with Nagios

As developers' love affair with JSON continues to grow, so does the need to monitor successful JSON output. I wanted a Nagios plugin which would do a few things:

  • Confirm the content-type of the response header was "application/json"
  • Decode the response to verify it is parsable JSON
  • Optionally, verify the JSON response against a data file

Verify content of JSON response

For the most part, Perl's LWP::UserAgent class makes short work of the first requirement. Using $response->header("content-type") the plugin is able to check the content-type easily. Next up, we use the JSON module's decode function to see if we can successfully decode $response->content.

Optionally, we can give the plugin an absolute path to a file containing a Perl hash, which is iterated through in an attempt to find corresponding key/value pairs in the decoded JSON response. For each key/value pair in the hash it doesn't find in the JSON response, it will append the expected and actual results to the output string, exiting with a critical status. Currently there's no way to check that a key/value does not appear in the response, but feel free to make a pull request against check_json on my GitHub page.

Thursday, March 15, 2012

Check HTTP redirects with Nagios

Oftentimes there are critical page redirects on a site that you may want to monitor. It can be as simple as making sure your checkout page redirects from HTTP to HTTPS. Or perhaps you have valuable old URLs which Google has been indexing, and you want to make sure those redirects remain in place to preserve your PageRank. Whatever your reason for checking HTTP redirects with Nagios, you'll find there are a few scripts available, but none (that I found) which are able to follow more than one redirect. For example, let's suppose we have a redirect chain that looks like this:

http://myshop.com/cart >> http://www.myshop.com/cart >> https://www.myshop.com/cart

Following multiple redirects

In my travels, I found check_http_redirect on Nagios Exchange. It was a well designed plugin, written by Eugene Kovalenja in 2009 and licensed under GPLv2. After experimenting with the plugin, I found it was unable to traverse multiple redirects. Fortunately, Perl's LWP::UserAgent class provides a nifty little option called max_redirect. By revising Eugene's work, I've exposed additional command arguments that help control how many redirects to follow. Here's a summary of usage:

-U          URL to retrieve (http or https)
        -R          URL that must be equal to Header Location Redirect URL
        -t          Timeout in seconds to wait for the URL to load. If the page fails to load,
                    check_http_redirect will exit with UNKNOWN state (default 60)
        -c          Depth of redirects to follow (default 10)
        -v          Print redirect chain

If check_http_redirect is unable to find any redirects to follow, or any of the redirects results in a 4xx or 5xx status code, the plugin will report a critical state and the nature of the problem. Additionally, if the number of redirects exceeds the depth specified in the command arguments, it will notify you of this and exit with an unknown state. An OK status will be returned only if the redirects result in a successful response from a URL which is a regex match against the value specified in the -R argument.

The updated check_http_redirect plugin is available on my GitHub page along with several other Nagios plugins I'll write about in the coming weeks. Pull requests welcome, and thank you to Eugene for his original work on this plugin.