Tuesday, December 27, 2011

Importing Comments into Disqus using Rails

It seems everything is going to the cloud, even comment systems for blogs. Disqus is a platform for offloading the ever growing feature set users expect from commenting systems. Their website boasts over a million sites using their platform and offers a robust feature set and good performance. But before you can drink the Kool-Aid, you've got to get your data into their system.

If you're using one of the common blog platforms such as WordPress or Blogger, there are fairly direct routes Disqus makes available for automatically importing your existing comment content. For those with an unsupported platform or a hand-rolled blog, you are left with exporting your comments into XML using WordPress's WXR standard.

Disqus leaves a lot up to the exporter, providing only one page in there knowledge base for using what they describe as a Custom XML Import Format. In my experience the import error messages were cryptic and my email support request is still unanswered 5 days later. (Ok, so it was Christmas weekend!)

So let's get into the nitty gritty details. First, the sample code provided in this article is based on Rails 3.0.x, but should work with Rails 3.1.x as well. Rails 2.x would work just as well by modifying the way the Rails environment is booted in the first lines. I chose to create a script to dump the output to standard output which could be piped in to a file for upload. Let's see some of the setup work.

Setting up a Rails script

I choose to place the script in the RAILS_ROOT/scripts directory and named it wxr_export.rb. This would allow me to call the script with the Rails 2.x style syntax (ahh, the nostalgia):

1
script/wxr_export.rb > comments.xml

This would fire up the full Rails enviornment, execute our Ruby code, and pipe the standard output to a file called comments.xml. Pretty straightforward, but it's not that often Rails developers think about creating these kind of scripts, so it's worth discussing to see the setup mechanics.

1
2
3
#!/usr/bin/env ruby
require File.expand_path('../../config/boot', __FILE__)
require File.expand_path('../../config/environment', __FILE__)

I think the first line is best explained by this excerpt from Ruby Programming:

First, we use the env command in the shebang line to search for the ruby executable in your PATH and execute it. This way, you will not need to change the shebang line on all your Ruby scripts if you move them to a computer with Ruby installed a different directory.

The next two lines are essentially asking the script to boot the correct Rails environment (development, testing, production). It's worth briefly offering an explanation of the syntax of these two somewhat cryptic lines. File#expand_path converts a pathname to an absolute pathname. If passed only the first string, it would use the current working path to evaluate, but since we pass __FILE__ we are asking it to use the current file's path as the starting point.

The config/boot.rb file is well documented in the Rails guides which explains that boot.rb defines the location of your Gemfile, hooks up Bundler, which adds the dependencies of the application (including Rails) to the load path, making them available for the application to load.

The config/enviornment.rb file is also well documented and effectively loads the Rails packages you've specified, such as ActiveModel, ActiveSupport, etc...

Exporting WXR content

Having finally loaded our Rails enviornment in a way we can use it, we are ready to actually build the XML we need. First, let's setup our XML and the gerenal format we'll use to popualate our file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# script/wxr_export.rb
 
xml = Builder::XmlMarkup.new(:target => STDOUT, :indent => 2)
 
xml.instruct! :xml, :version=>"1.0", :encoding=>"UTF-8"
 
xml.rss 'version' => "2.0",
        'xmlns:content' => "http://purl.org/rss/1.0/modules/content/",
        'xmlns:dsq' => "http://www.disqus.com/",
        'xmlns:dc' => "http://purl.org/dc/elements/1.1/",
        'xmlns:wp' => "http://wordpress.org/export/1.0/" do
  
  xml.channel do
    Articles.all.each do |article|
      if should_be_exported?(article)
        xml.item do
 
          #Article XML goes here
 
   article.comments.each do |comment|
       
     #Comments XML goes here
 
   end #article.comments.each
 end   #xml.item
      end     #if should_be_exported?
    end       #Articles.all.each
  end         #xml.channel
end           #xml.rss

This is the general form for the WXR format as described by Disqus's knowledge base article. Note that you need to nest the comments inside each specific Article's XML. I found that I needed to filter some of my output so I added a helper function called should_be_exported? which can be defined at the top of the script. This would allow you to exclude Articles without comments, or whatever criteria you might find helpful.

With our basic format in place, let's look at the syntax for exporting the Article fields. Keep in mind that the fields you'll want to pull from in your system will likely be different, but the intention is the same.

Inside the Article XML block

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# script/wxr_export.rb
 
# Inside the Article XML block
 
xml.title article.title
 
xml.link create_url_for(article)
 
xml.content(:encoded) { |x| x << "<!--[CDATA[" + article.body + "]]-->" }
 
xml.dsq(:thread_identifier) { |x| x << article.id }
 
xml.wp(:post_date_gmt) { |x| x << article.created_at.utc.to_formatted_s(:db) }
 
xml.wp(:comment_status) { |x| x << "open" } #all comments open

Let's look at each of these fields one by one:

  • xml.title: This is pretty straight forward, just the plain text tile of the blog article.
  • xml.link: Disqus can use URLs for determining which comments to display on your page, so it asks you to provide a URL associated with this article. I found that for this particular app, it would be easier to write another helper function to generate the URLs then using the Rails routes. If you wish to use the Rails routes (and I suggest you do), then I suggest checking out this excellent post for using routes outside of views.
  • xml.content(:encoded): The purpose of this field is clear, but the syntax is not. Hope this saves you some time and headache!
  • xml.dsq(:thread_identifier): The other way Disqus can identify your article is by a unique identifier. This is strongly recommended over the use of a URL. We'll just use your unique identifier in the database.
  • xml.wp(:post_date_gmt): The thing to keep in mind here is that we need the date in a very particular format. It needs to be in YYYY-MM-DD HH:MM:SS 24-hour format and adjusted to GMT which typically implies UTC. Rails 3 makes this very easy for us, bless their hearts.
  • xml.wp(:comment_status): This app wanted to leave all comments open. You may have different requirements so consider adding a helper function.

Inside the Comment XML block

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
article.comments.each do |comment|
   
  xml.wp(:comment) do
 
    xml.wp(:comment_id) { |x| x << comment.id }
 
    xml.wp(:comment_author) do |x|
      if comment.user.present? && comment.user.name.present?
        x << comment.user.name
      else
 x << ""
      end
    end
                   
    xml.wp(:comment_author_email) do |x|
      if comment.user.present? && comment.user.email.present?
        x << comment.user.email
      else
        x << ""
      end
    end
 
    xml.wp(:comment_author_url) do |x|
      if comment.user.present? && comment.user.url.present?
        x << comment.user.url
      else
        x << ""
      end
    end
 
    xml.wp(:comment_author_IP) { |x| x << "255.255.255.255" }
 
    xml.wp(:comment_date_gmt) { |x| x << comment.created_at.utc.to_formatted_s(:db) }
 
    xml.wp(:comment_content) { |x| x << "<!--[CDATA[" + comment.body + "]]-->" }
 
    xml.wp(:comment_approved) { |x| x << 1 } #approve all comments
 
    xml.wp(:comment_parent) { |x| x << 0 }
 
  end #xml.wp(:comment)
end #article.comments.each

Again, let's inspect this one field at a time:

  • xml.wp(:comment_id): Straightforward, a simple unique identifier for the comment.
  • xml.wp(:comment_author): Because some commentors may not have a user associated with them, I added some extra checks to make sure the author's user and name were present. I'm sure there's a way to shorten the number of lines used, but I was going for readability here. I'm not certain it was necessary to include the blank string, but after some of the trouble I had importing, I wanted to minimize the chance of strange XML syntax issues.
  • xml.wp(:comment_author_email): More of the same safe guards of having empty data.
  • xml.wp(:comment_author_url): More of the same safe guards of having empty data.
  • xml.wp(:comment_author_IP): We were not collecting user IP data, so I put in some bogus data which Disqus did not seem to mind.
  • xml.wp(:comment_date_gmt): See xml.wp(:post_date_gmt) above for comments about date/time format.
  • xml.wp(:comment_content): See xml.content(:encoded) above for comments about encoding content.
  • xml.wp(:comment_approved): Two options here, 0 or 1. Typically you'd want to automatically approve your existing comments, unless of course you wanted to give a moderator a huge backlog of work.
  • xml.wp(:comment_parent): This little field turned out to be the cause of a lot of trouble for me. In the comments on Disqus's XML example, it says parent id (match up with wp:comment_id), so initially, I just put in the comment's ID in this field. This returned the very unhelpful error * url * URL is required to which I still have my unanswered supprot email in to Disqus. By trial error, I found that by just setting the comment_parent to zero, I could successfully upload my comment content. If you are using threaded comments, I suspect this field will be of more importance to you then it was to me. When I hear from Disqus, I will update this article with more information.

Tuesday, December 13, 2011

Performing Bulk Edits in Rails: Part 2

This is the second article in the series on how to perform a bulk edit in Rails. Let's recap our user's story from Part 1.

  • User makes a selection of records and clicks "Bulk Edit" button
  • User works with the same form they would use for a regular edit, plus
    • check boxes are added by each attribute to allow the user to indicate this variable should be affected by the bulk edit
    • only attributes which are the same among selected records should be populated in the form

Part 1 addressed the first part of our user story. Now that we have our user's selection, we need to create an interface to allow them to select attributes affected by the bulk edit. Let's start with the form we'll use to POST our input.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# app/controllers/bulk_edits_controller.rb
 
def new
  @foos = Foo.find(params[:stored_file_ids]) #params collected by work done in Part 1
  @foo = Foo.new
end
 
 
# app/views/bulk_edit/new.html.erb
 
<%= form_for @foo, :url => "/bulk_edits" do |f| %>
  <% @foos.each do |foo| %>
    <%= hidden_field_tag "foo_ids[]", foo.id %>
  <% end %>
  <%= render "foos/form", :f => f %>
  <%= f.submit %>
<% end %>

Let's first look at how we formed our form_for tag. Although this is a form for a Foo object, we don't want to POST to foos_controller#create so we add :url => "/bulk_edits" which will POST to the bulk_edits_controller#create. Additionally, we need to send along the foo_ids we eventually want to bulk update. Finally, we don't want to re-create the form we already have for Foo. By modifying one master form, we'll make long term maintenance easier. Now that we've got our form posting to the right place, let's see what modifications will need to make to our standard form to allow the user to highlight attributes they want to modify.

1
2
3
4
5
# app/views/foos/_form.html.erb
 
<%= check_box_tag "bulk_edit[]", :bar %>
<%= f.label :bar %>
<%= f.text_field :bar %>


Bulk edit check boxes appear in front of field names to let users know which fields will be modified.

We've added another check_box_tag to the form to record which attributes the user will select for bulk updating. However, we only want to display this when we're doing a bulk edit. Let's tweak this a bit further.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# app/views/foos/_form.html.erb
 
<%= bulk_edit_tag :bar %>
<%= f.label :bar %>
<%= f.text_field :bar %>
 
# app/helpers/foos_helper.rb
 
def bulk_edit_tag(attr)
  check_box_tag("bulk_edit[]", attr) if bulk_edit?
end
 
def bulk_edit?
  params[:controller] == "bulk_edits"
end

With these modifications to the form in place, the user can now specify which fields are eligible for bulk editing. Now we need the logic to determine how to populate the bar attribute based on the user's selection. This way, the user will see that an attribute is the same across all selected items. Let's revise our bulk edit controller.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# app/controllers/bulk_edit_controller.rb
 
def new
  @foos = Foo.find(params[:foo_ids])
  matching_attributes = Foo.matching_attributes_from(@foos)
  @foo = Foo.new(matching_attributes)
end
 
 
# app/models/foo.rb
 
def self.matching_attributes_from(foos)
 
  matching = {}
  attriubtes_to_match = Foo.new.attribute_names  #see <a href="http://api.rubyonrails.org/classes/ActiveRecord/Base.html#method-c-attribute_names">attribute_names</a> for more details
 
  foos.each do |foo|
 
    attributes_to_match.each do |attribute|
 
      value = foo.__send__(attribute)  #see <a href="http://apidock.com/ruby/Object/__send__">send</a>, invokes the method identified by symbol, use underscore version to avoid namespace issues
 
      if matching[attribute].nil?
        matching[attribute] = value  #assume it's a match
 
      elsif matching[attribute] != value
        matching[attribute] = "" #on the first mismatch, empty the value, but don't make it nil
 
      end
 
    end
 
  end
end


Only fields which are the same across all selected records will be populated. Other fields will be left blank by default.


With Foo#matching_attributes_for generating a hash of matching attributes, the form will only populate fields which match across all of the user's selected items. With our form in place, the last step is to actually perform the bulk edit.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# app/controllers/bulk_edits_controller.rb
def create
  if params.has_key? :bulk_edit
 
    foos = Foo.find(params[:foo_ids])
    foos.each do |foo|
 
        eligible_params = {}
        params[:bulk_edit].each do |eligible_attr|
 
            #create hash of eligible_attributes and the user's value
            eligible_params.merge! { eligible_attr => params[:foo][eligible_attr] }
 
        end
 
        #update each record, but only with eligible attributes
        foo.update_attributes(eligible_params)
 
    end
  end
end

We've now completed the entire user story. Users are able to use check boxes to identify which attributes should be bulk updated. They also get to see which attributes match across their selection. Things are, of course, always more involved with a real production application. Keep in mind this example does not make good use of mass assignment protection using attr_accessible and forcing an empty whitelist of attributes by using config.active_record.whitelist_attributes = true. This is a best practice that should be implemented anytime you need sever-side validation of your forms.

Additionally, there may be cases where you want to perform bulk edits of more complex attributes, such as nested attributes. Consider appending your additional attributes to the Foo.new.attribute_names array and then tweaking the eligible attributes logic. Also consider implementing a maximum number of records which are able to be bulk edited at a time; wouldn't want your server to time out. Good luck!

Monday, December 5, 2011

Working with Constants in Ruby

Ruby is designed to put complete power into the programmer's hands and with great power comes great responsibility! This includes the responsibility for freezing constants. Here's an example of what someone might THINK is happening by default with a constant.

1
2
3
4
5
6
7
8
class Foo
  DEFAULTS = [:a, :b]
end
 
#irb
default = Foo::DEFAULTS
default << :c
Foo::DEFAULTS #=> [:a, :b, :c]  WHOOPS!

As you can see, assigning a new variable from a constant lets you modify what you thought was a constant! Needless to say, such an assumption would be very difficult to track down in a real application. Let's see how we might improve on this design. First, let's freeze our constant.

1
2
3
4
5
6
7
class Foo
  DEFAULTS = [:a, :b].freeze
end
 
#irb
default = Foo::DEFAULTS
default << :c #=>  ERROR can't modify frozen array

Now we'll get very specific feedback about offending code. The question is how can we use our constant now as a starting point for array, and still be able to modify it later? Let's look at some more code.

1
2
3
Foo::DEFAULTS.frozen? #=> true
Foo::DEFAULTS.clone.frozen? #=> true, this was my first guess, but it turns out we need...
Foo::DEFAULTS.dup.frozen? #=> false

It's worth reading the docs on clone and dup to understand there difference, but in short, clone replicates the internal state of the object while dup creates a new instance of the object. There was one more question I needed to answer; what would happen when I wanted to append another frozen array to a non-frozen array? Let's look to the code again!

1
2
3
default = Foo::DEFAULTS.dup  #not frozen
new_default = default + [:c].frozen
new_default.frozen? # false

So it seems that the initial state of the object carries the frozen state, allowing you to append frozen arrays without having to dup them. The moral of the story here is don't make assumptions about Ruby! One of the best ways to challenge your assumptions is with unit tests.