
Blog Move 1: Getting WordPress data to Ruby using XML

Step 1 in the “Moving my blog” process is “Extract the current site’s data into a manageable format”.

Frankly, that’s easy! WordPress has a feature that exports the site’s content to a single XML file containing all the published Categories, Tags, Posts, Pages and Comments. To do this (WordPress v2.9.2) click Tools > Export and save the file. In previous versions of the software I believe it’s under the Manage menu.


I’m aware I could import the data directly from the WordPress database (to wherever it goes in the end) but let’s imagine we can’t. Anyway, database access would be tediously slow and inefficient to test against and implement.

A quick google for “import wordpress xml ruby” threw up nothing helpful so I turned to the Ruby XML libraries. John Nunemaker “feverishly posts everything he learns” at railstips.org and has two articles of use here:

The latter compares three Ruby XML libraries (REXML, Hpricot and libxml-ruby) on speed, ease of use and how nice their names are to say. I’ll save you the pleasure of reading the article (if you like) and copy and paste John’s summary:

“Libxml is blisteringly fast, [but] Hpricot has cooler name, REXML and Hpricot both feel easier to use out of the box”

And there you go. Hpricot it is!

Now to get the data into Ruby. After a quick glance at the railstips article and the RDocs I put together this code as a starting point:


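require 'hpricot'

# "wordpress-export.xml" is a placeholder name for the file saved from Tools > Export.
doc = Hpricot.XML(File.read("wordpress-export.xml"))

cats_hierarchy = {}
(doc/"wp:category").each do |category|
    cat_name = category.at("wp:category_nicename").innerHTML
    cat_parent = category.at("wp:category_parent").innerHTML

    if cats_hierarchy.include? cat_parent
        # The parent has already been seen, so record this category as its child.
        cats_hierarchy[cat_parent] = cat_name
    else
        # A top-level category, or one whose parent has not been seen yet.
        cats_hierarchy[cat_name] = []
    end
end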

cats = cats_hierarchy.to_a.flatten

That gives me two easy-to-use Ruby objects, each containing all of my category data: a hash which preserves the hierarchy of the structure, and a flat array of all the names.


?> cats = cats_hierarchy.to_a.flatten.uniq
=> ["route66", nil, "rails", "american-2008", "reciprocal-affection", "hope-for-the-future", "code", "blog", "review-blog", "rant", "brands", "projects", "yab_shop", "textpattern", "meaningful-labor", "giants", "accessibility", "root", "charity-project", "apple", "xhtml", "america-2006-route-66", "ruby", "learning", "america-2007", "uncategorized", "iphone", "america-2008"]

?> cats_hierarchy
=> {"route66"=>nil, "rails"=>nil, "american-2008"=>nil, "reciprocal-affection"=>nil, "hope-for-the-future"=>nil, "code"=>nil, "blog"=>"review-blog", "rant"=>nil, "brands"=>nil, "projects"=>nil, "yab_shop"=>nil, "textpattern"=>nil, "meaningful-labor"=>nil, "giants"=>nil, "accessibility"=>nil, "root"=>nil, "charity-project"=>nil, "apple"=>nil, "xhtml"=>nil, "america-2006-route-66"=>nil, "ruby"=>nil, "learning"=>nil, "america-2007"=>nil, "uncategorized"=>nil, "iphone"=>nil, "america-2008"=>nil}
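One thing to watch in the loop above: a parent with several children only keeps the last one, because each child overwrites the previous value. Here is a minimal sketch of a variant that collects every child in an array. It is not the code I ran, and it rests on a few assumptions: it uses REXML from the standard library instead of Hpricot so it runs without extra gems, the sample XML is hand-written to look like the export's category entries, the namespace URI is what I believe the export declares, and category_hierarchy is a made-up helper name.

```ruby
require "rexml/document"

# Assumption: a tiny hand-written sample shaped like the category entries
# in a WordPress export. A real export contains far more than this.
sample_xml = <<~XML
  <rss version="2.0" xmlns:wp="http://wordpress.org/export/1.0/">
    <channel>
      <wp:category><wp:category_nicename>blog</wp:category_nicename><wp:category_parent></wp:category_parent></wp:category>
      <wp:category><wp:category_nicename>review-blog</wp:category_nicename><wp:category_parent>blog</wp:category_parent></wp:category>
      <wp:category><wp:category_nicename>ruby</wp:category_nicename><wp:category_parent>code</wp:category_parent></wp:category>
      <wp:category><wp:category_nicename>code</wp:category_nicename><wp:category_parent></wp:category_parent></wp:category>
      <wp:category><wp:category_nicename>rant</wp:category_nicename><wp:category_parent>blog</wp:category_parent></wp:category>
    </channel>
  </rss>
XML

# Hypothetical helper: returns a hash mapping every category name to an
# array of its direct children (an empty array when it has none).
def category_hierarchy(xml)
  # Assumption: the namespace URI declared by the export file.
  wp = { "wp" => "http://wordpress.org/export/1.0/" }
  doc = REXML::Document.new(xml)
  hierarchy = {}

  REXML::XPath.each(doc, "//wp:category", wp) do |category|
    # An empty element has nil text, so to_s turns it into "".
    name   = REXML::XPath.first(category, "wp:category_nicename", wp).text.to_s
    parent = REXML::XPath.first(category, "wp:category_parent", wp).text.to_s

    # Every category gets a key, whether or not it has children.
    hierarchy[name] ||= []
    # Append to the parent's list, creating it if the parent comes later in the file.
    (hierarchy[parent] ||= []) << name unless parent.empty?
  end

  hierarchy
end

hierarchy = category_hierarchy(sample_xml)
names = hierarchy.keys # flat list of every category name, no nils to clean up
```

Because every category is a key in its own right, the flat list is just the hash's keys, and a child that appears in the file before its parent (like "ruby" and "code" in the sample) still ends up in the right place.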

And so we have a starting point for getting this WordPress exported XML data into a Ruby application.

More soon.