Toxic Elephant

Don't bury it in your back yard!

How many s-expression formats are there for Ruby?

Posted by matijs 04/11/2012 at 13h34

Once upon a time, there was only UnifiedRuby, a cleaned-up representation of the Ruby AST.

Now, what do we have?

  • RubyParser before version 3; this is the UnifiedRuby format:

    RubyParser.new.parse "foobar(1, 2, 3)"
    # => s(:call, nil, :foobar, s(:arglist, s(:lit, 1), s(:lit, 2), s(:lit, 3)))
    
  • RubyParser version 3:

    Ruby18Parser.new.parse "foobar(1, 2, 3)"
    # => s(:call, nil, :foobar, s(:lit, 1), s(:lit, 2), s(:lit, 3))
    
    Ruby19Parser.new.parse "foobar(1, 2, 3)"
    # => s(:call, nil, :foobar, s(:lit, 1), s(:lit, 2), s(:lit, 3))
    
  • Rubinius; this is basically the UnifiedRuby format, but using plain Arrays instead of Sexp objects:

      "foobar(1,2,3)".to_sexp
      # => [:call, nil, :foobar, [:arglist, [:lit, 1], [:lit, 2], [:lit, 3]]]
    
  • RipperRubyParser; a wrapper around Ripper producing UnifiedRuby:

      RipperRubyParser::Parser.new.parse "foobar(1,2,3)"
      # => s(:call, nil, :foobar, s(:arglist, s(:lit, 1), s(:lit, 2), s(:lit, 3)))
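
For this example, the only difference between the UnifiedRuby output and the RubyParser 3 output is the :arglist node wrapped around the call's arguments. As a rough sketch of that relationship, the snippet below inlines every :arglist into its parent. It works on plain Arrays (the Rubinius style) so that it runs without any parser gem installed; the helper name inline_arglists is made up for this illustration and is not part of any of these libraries.

```ruby
# Hypothetical helper, not part of RubyParser, Rubinius or RipperRubyParser:
# splice the children of every [:arglist, ...] node into the parent node.
def inline_arglists(node)
  return node unless node.is_a?(Array)

  node.flat_map do |child|
    converted = inline_arglists(child)
    if converted.is_a?(Array) && converted.first == :arglist
      converted.drop(1) # drop the :arglist tag, keep the arguments
    else
      [converted]       # keep every other element as it is
    end
  end
end

unified = [:call, nil, :foobar, [:arglist, [:lit, 1], [:lit, 2], [:lit, 3]]]
p inline_arglists(unified)
# prints [:call, nil, :foobar, [:lit, 1], [:lit, 2], [:lit, 3]]
```

A node that is itself an :arglist at the top level is left alone, since there is no parent to splice it into.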
    

How do these fare with new Ruby 1.9 syntax? Let’s try the new hash literal syntax. RubyParser before version 3 and Rubinius (even in 1.9 mode) can’t handle it, which leaves two:

  • RubyParser 3:

      Ruby19Parser.new.parse "{a: 1}"
      # => s(:hash, s(:lit, :a), s(:lit, 1))
    
  • RipperRubyParser:

      RipperRubyParser::Parser.new.parse "{a: 1}"
      # => s(:hash, s(:lit, :a), s(:lit, 1))
    

And what about stabby lambdas?

  • RubyParser 3:

      Ruby19Parser.new.parse "->{}"
      # => s(:iter, s(:call, nil, :lambda), 0, nil)
    
  • RipperRubyParser:

      RipperRubyParser::Parser.new.parse "->{}"
      # => s(:iter, s(:call, nil, :lambda, s(:arglist)),
      #      s(:masgn, s(:array)), s(:void_stmt))
    

That looks like a big difference, but this is just the degenerate case. When the lambda has some arguments and a body, the difference is minor:

  • RubyParser 3:

      Ruby19Parser.new.parse "->(a){foo}"
      # => s(:iter, s(:call, nil, :lambda),
      #      s(:lasgn, :a), s(:call, nil, :foo))
    
  • RipperRubyParser:

      RipperRubyParser::Parser.new.parse "->(a){foo}"
      # => s(:iter, s(:call, nil, :lambda, s(:arglist)),
      #      s(:lasgn, :a), s(:call, nil, :foo, s(:arglist)))
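
One way to see how minor the difference is: list the node types that show up in one tree but not in the other. The sketch below does that for the two results above, copied over as plain Arrays so that no parser gem is needed; node_types is a hypothetical helper written only for this comparison.

```ruby
# Hypothetical helper: collect the node type (first element) of every
# nested Array in the tree, in depth-first order.
def node_types(node)
  return [] unless node.is_a?(Array)

  [node.first] + node.drop(1).flat_map { |child| node_types(child) }
end

# The two results for "->(a){foo}", written as plain Arrays.
ruby_parser_3 = [:iter, [:call, nil, :lambda],
                 [:lasgn, :a], [:call, nil, :foo]]
ripper_ruby_parser = [:iter, [:call, nil, :lambda, [:arglist]],
                      [:lasgn, :a], [:call, nil, :foo, [:arglist]]]

p node_types(ripper_ruby_parser).uniq - node_types(ruby_parser_3).uniq
# prints [:arglist]
```

Running the same comparison on the two results for the empty lambda gives :arglist, :masgn, :array and :void_stmt, which is why that case looks so much worse.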
    

So, what’s the conclusion? For parsing Ruby 1.9 syntax, there are really only two options: RubyParser and RipperRubyParser. The latter stays closer to the UnifiedRuby format (it keeps the s(:arglist) wrapper around call arguments), but the difference is small.

RubyParser’s results are a little neater, so RipperRubyParser should probably conform to the same format. Reek can then be updated to use the cleaner format, and to use either library for parsing.
