Managing nil in JSON and Hashes

Introduction 

While developing PullReview (our automated code review tool for Ruby), we interact a lot with external services (like GitHub) whose APIs mostly return JSON. Nice as that is, it led us to try several ways of managing nil values at different levels.

Sample

Let’s say we ask GitHub for information about a specific commit, and want to extract its user name and repository name, in order to be able to recreate the ‘user/repo’ identifier.

The JSON may look like:

{
    "repository": {
        "owner": {
            "name": "acme"
        },
        "name": "dynamite",
        "date": "2013-07-21"
    }
}

Based on this sample, we want a method to return “acme/dynamite”. Looks easy:

def user_repo(payload)
 repository = payload['repository']['name']
 owner = payload['repository']['owner']['name']
 "#{owner}/#{repository}"
end

Done!

Nil happens

This works great until our first call returns an empty (nil) repository. With nothing to index into, our code fails with a dramatic:

/home/martin/code/test/nilnil/test.rb:7:in `user_repo': undefined method `[]' for nil:NilClass (NoMethodError)
from /home/martin/code/test/nilnil/test.rb:76:in `<main>'

Even worse, each of the elements can actually be nil, i.e., we can have a repository with no name, a repository with no owner, or with an owner with no name.
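
To see these cases side by side, here is a small sketch that runs the first version of user_repo against trimmed-down variants of the sample document. The payload variants and the SAMPLES/RESULTS names are made up for illustration; only Ruby’s json standard library is assumed.

```ruby
require 'json'

# First version of user_repo from above, without any nil handling.
def user_repo(payload)
  repository = payload['repository']['name']
  owner = payload['repository']['owner']['name']
  "#{owner}/#{repository}"
end

# Hypothetical payload variants, each missing one piece of the sample.
SAMPLES = {
  complete:      '{"repository": {"owner": {"name": "acme"}, "name": "dynamite"}}',
  no_repo_name:  '{"repository": {"owner": {"name": "acme"}}}',
  no_owner:      '{"repository": {"name": "dynamite"}}',
  no_repository: '{}'
}

# Run user_repo on each variant and record either the result or the error class.
RESULTS = SAMPLES.each_with_object({}) do |(label, json), results|
  begin
    results[label] = user_repo(JSON.parse(json))
  rescue NoMethodError => e
    results[label] = e.class
  end
end

RESULTS.each { |label, outcome| puts "#{label}: #{outcome.inspect}" }
```

Note that a missing repository name does not raise at all: nil interpolates to an empty string, so that variant quietly yields "acme/", which is arguably worse than the exception raised by the last two variants.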

Reference: Guard clauses

Time for some guard clauses:

def user_repo(payload)
 return nil unless payload['repository']
 repository = payload['repository']['name']
 return nil unless repository
 #
 return nil unless payload['repository']['owner']
 owner = payload['repository']['owner']['name']
 return nil unless owner
 #
 "#{owner}/#{repository}"
end

This works, but looks quite long for such a basic operation, so let’s look for alternatives.

Alternative #1: Chaining ands

The first option is to simply add ‘and’ conditions in order to chain the ‘ifs’:

def user_repo(payload)
 return nil unless payload && payload['repository'] && payload['repository']['name']
 return nil unless payload['repository']['owner'] && payload['repository']['owner']['name']
 #
 repository = payload['repository']['name']
 owner = payload['repository']['owner']['name']
 #
 "#{owner}/#{repository}"
end

Looks a tad better, but the first line is already 93 characters long, more than what can be comfortably shown on GitHub or inside a terminal, and this is only a three-level-deep model.

While GitHub does a nice job of limiting the depth of its API (mainly by using well-defined endpoints), this is by no means a hard limit.

Alternative #2: Improving chains with try

Try is ActiveSupport’s answer to the “maybe I have a value, maybe not” problem. It lets you call any method with any set of parameters on any object, without raising a NoMethodError when that object is nil:

name = user.try(:name) # will return nil if user is nil

Calls to try can be chained, so we can rewrite our sample:

def user_repo(payload)
  repository = payload['repository'].try(:[],'name')
  owner = payload['repository'].try(:[],'owner').try(:[],'name')
  #
  return nil unless owner && repository
  #
  "#{owner}/#{repository}"
end

Try makes the intention quite clear (we try to get a value, being unsure that there is one), but loses readability when parameters are involved, as in my case. Try requires active_support, but the gem has been nicely split so that you can import only the part you need (when outside of Rails).
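
To make the principle concrete, here is a deliberately simplified stand-in for what try does on a nil receiver. It is not ActiveSupport’s implementation, and maybe_call is a hypothetical name chosen for this illustration; the real try is a method called on the object itself and handles more cases.

```ruby
# Simplified stand-in for the idea behind try; NOT ActiveSupport's actual code.
# maybe_call is a hypothetical helper name used only for this illustration.
def maybe_call(object, method_name, *args)
  return nil if object.nil?               # nothing to call the method on
  object.public_send(method_name, *args)  # otherwise forward the call
end

# Sample payload with a repository name but no owner.
payload = { 'repository' => { 'name' => 'dynamite' } }

repository = maybe_call(payload['repository'], :[], 'name')
owner = maybe_call(maybe_call(payload['repository'], :[], 'owner'), :[], 'name')

p repository # "dynamite"
p owner      # nil, because there is no 'owner' key to descend into
```

The nested call on the owner line shows why passing :[] and its argument around gets hard to read as the depth grows.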

Alternative #3: Improving chains with andand

andand is an implementation of the Maybe monad in Ruby. I would have preferred the word “maybe” as its name, but the result is quite nice:

def user_repo(payload)
  repository = payload['repository'].andand['name']
  owner = payload['repository'].andand['owner'].andand['name']
  #
  return nil unless owner && repository
  #
  "#{owner}/#{repository}"
end

No need for the awkward try(:[], 'owner') construct, which helps the code stay clean (while retaining exactly the same structure and principle).

Alternative #4: JSON Path

This is something JSON-specific, but large hashes are often the result of JSON documents, so it is quite interesting. JSON Path is to JSON what XPath is to XML: a query language to transform and extract data. The jsonpath gem provides an implementation in Ruby:

require 'jsonpath'

def user_repo(payload)
  repo_path = JsonPath.new 'repository.name'
  repository = repo_path.on(payload).first
  #
  owner_path = JsonPath.new 'repository.owner.name'
  owner = owner_path.on(payload).first
  #
  return nil unless owner && repository
  "#{owner}/#{repository}"
end

No win on method length, as each path needs to be defined, then applied. Calling ‘on’ immediately after ‘new’ would not improve readability here.

I definitely miss a construct that would allow me to write something like:

payload.query('repository.name').first

That would be handy when I do not need to store or reuse the query. It is not possible for now, and would require “monkey-patching” String, which I’m not that confident doing (remember that there is no JSON object involved here: before parsing it is a String, after parsing it is a Hash).
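
Short of monkey-patching, a plain helper method gets close to that one-liner. The sketch below is not JSON Path: dotted_query is a hypothetical name, and it only understands dot-separated keys on nested Hashes, which is all this example needs.

```ruby
require 'json'

# Hypothetical helper: walk a nested Hash along a dot-separated path.
# Returns nil as soon as a step is missing or is not a Hash.
def dotted_query(payload, path)
  path.split('.').reduce(payload) do |node, key|
    return nil unless node.is_a?(Hash)
    node[key]
  end
end

def user_repo(payload)
  repository = dotted_query(payload, 'repository.name')
  owner = dotted_query(payload, 'repository.owner.name')
  return nil unless owner && repository
  "#{owner}/#{repository}"
end

payload = JSON.parse('{"repository": {"owner": {"name": "acme"}, "name": "dynamite"}}')
puts user_repo(payload) # prints acme/dynamite
```

Going one level deeper only lengthens the path string. On Ruby 2.3 and later, the built-in Hash#dig covers similar ground without any helper, for example payload.dig('repository', 'owner', 'name').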

Now, JSON Path has two main perks: it can do much more than just selecting values by name (it is a fully featured query language) and, perhaps even more importantly, it copes nicely as the depth increases (if we had a first_name and last_name under name, you would just need to add those strings; the code complexity would stay exactly the same).

Conclusion

With all of this said, what would be a good choice? I think I’ll choose two options, depending on the complexity of the situation:

  • andand is a clean and well-thought-out solution, and solves the generic problem quite satisfactorily

  • JSON Path is more specific, but can be really clean when the documents become larger, as it is the most resistant to complexity. As a query language, it also makes it easier to build dynamic queries, should the need arise (apart from being able to do much more, of course).





5 thoughts on “Managing nil in JSON and Hashes”

  1. Yannick

    I would totally avoid #try.
    Andand seems cleaner, but it’s an external dependency.
    My personal choice here would be the fetch method for working with hashes.

    repository = payload['repository'].try(:[],'name')
    owner = payload['repository'].try(:[],'owner').try(:[],'name')

    would become something like

    # The block syntax here avoids instantiating a new {} if there is a repository.
    # Not a big impact here, but if the default value implies computation, it helps.
    repository_hash = payload.fetch('repository') { {} }
    repository = repository_hash.fetch('name')
    owner = repository_hash.fetch('owner', {}).fetch('name')

    or, if your leaves may be nil,

    repository_hash = payload.fetch('repository') { {} }
    repository = repository_hash['name']
    owner = repository_hash.fetch('owner', {})['name']

    1. Martin Post author

      I agree on avoiding try. I’m probably less averse to external dependencies than you are (I don’t want to pile them up, but small and well-defined libraries are no hindrance to me), but the argument is valid (ceteris paribus, fewer dependencies is better).

      While your solution is technically sound, I do dislike its syntax, which to me is cumbersome and impedes the readability of the code.

      If the choice is “no dependencies” vs “more readable code”, I’ll tend towards the latter in most situations (like this one).

  2. Martin

    I would think about extending Hash to do the job for me.


    Hash.class_eval do
      def deep(*keys)
        value = self

        keys.each do |key|
          break unless value
          value = value[key]
        end

        value
      end
    end

    h = { foo: { bar: 'hello' } }

    h.deep # => {:foo=>{:bar=>"hello"}}
    h.deep(:foo) # => {:bar=>"hello"}
    h.deep(:foo, :bar) # => "hello"
    h.deep(:baz) # => nil

    1. Martin Post author

      Hi Martin,
      Thanks for the idea & implementation. I try not to extend core classes too much, as this can have a far-reaching impact. Your case is a new method, so it’s probably “benign”. The result is very similar to (albeit a bit more compact than) the “andand” solution. Why would you prefer one over the other?

      Martin

      1. Martin

        Fetching values from nested hashes is a common task, therefore I would prefer the shortest and most readable solution. Your version 1 is way too verbose. 2, 3) Having multiple `andand` or `try` calls in one line makes the code hard to scan.


        payload['repository'].try(:[],'owner').try(:[],'name')
        payload['repository'].andand['owner'].andand['name']
        # vs.
        payload.deep('repository', 'owner', 'name')

        4) The JsonPath solution is very interesting. But even if I were going to use that one, I would wrap it in something that lets me write what you wished for:


        payload.query('repository.name').first
        # or just
        payload.query('repository.name')

