Toxic Elephant

Don't bury it in your back yard!

Private Toolbox: An Anti-Pattern

Posted by matijs 10/04/2016 at 09h21

This is an anti-pattern that has bitten me several times.

Suppose you have an object hierarchy, with a superclass Animal, and several subclasses, Worm, Snake, Dog, Centipede. The superclass defines the abstract concept move, which is realized in the subclasses in different ways, i.e., by slithering or walking. Suppose that due to other considerations, it makes no sense to derive Worm and Snake from a SlitheringAnimal, nor Dog and Centipede from a WalkingAnimal. Yet, the implementation of Worm#move and Snake#move have a lot in common, as do Dog#move and Centipede#move.

One way to solve this is to provide methods walk and slither in the superclass that can be used by the subclasses that need them. Because it makes no sense for all animals to be able to walk and slither, these methods would need to be accessible only to subclasses (e.g., private in Ruby).

Thus, the superclass provides a toolbox of methods that can only be used by its subclasses to mix and match as they see fit: a Private Toolbox.
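A minimal sketch of the anti-pattern (the class names come from the example above; the method bodies are purely illustrative):

```ruby
# The Private Toolbox: the superclass hoards private helpers
# that only some of its subclasses actually use.
class Animal
  def move
    raise NotImplementedError
  end

  private

  # Every subclass inherits both helpers, needed or not.
  def walk
    "walking"
  end

  def slither
    "slithering"
  end
end

class Dog < Animal
  def move
    walk # uses one tool from the toolbox; slither is dead weight here
  end
end

class Snake < Animal
  def move
    slither
  end
end
```

Note that Dog still carries a private slither method it will never call.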

This may seem an attractive course of action, but in my experience, this becomes a terrible mess in practice.

Let’s examine what is wrong with this in more detail. I see four concrete problems:

  • It is not always clear at the point of method definition what a method’s purpose is.
  • Each subclass carries with it the baggage of extra private methods that neither it nor its subclasses actually use.
  • The superclass’ interface is effectively extended to its non-public methods.
  • New subclasses may need to share methods that are not available in the superclass.

The Animal superclass shouldn’t be responsible for the ability to walk and to slither. If we need more modes of movement, we may not always be able to add them to the superclass.

We could extract the modes of movement into separate helper classes, but in Ruby, it is more natural to create a module. Thus, there would be modules Walker and Slitherer, each included by the relevant subclasses of Animal. These modules could either define move directly, or define walk and slither. Because the methods added in the latter case would actually make sense for the including classes, there is less need to make them private: one could make an instance of Dog walk either by calling move, or by calling walk directly.
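A minimal sketch of this module-based alternative (same illustrative names and bodies as in the example above):

```ruby
# Each mode of movement lives in its own module, included
# only by the subclasses that need it.
module Walker
  def walk
    "walking"
  end

  def move
    walk
  end
end

module Slitherer
  def slither
    "slithering"
  end

  def move
    slither
  end
end

class Animal; end

class Dog < Animal
  include Walker
end

class Snake < Animal
  include Slitherer
end

dog = Dog.new
dog.move # walks, via Walker#move
dog.walk # walk is public here, so calling it directly is fine too
```

Dog no longer knows anything about slithering, and Walker can be reused outside the Animal hierarchy entirely.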

This solves all four of Private Toolbox’ problems:

  • The module names reveal the purpose of the defined methods.
  • Subclasses that do not need a particular module’s methods do not include it.
  • The implementor of Animal is free to change its private methods.
  • If a new mode of transportation is needed, no changes to Animal are needed. Instead, a new module can be created that provides the relevant functionality.


Try to avoid try

Posted by matijs 28/07/2015 at 10h52

Because of a pull request I was working on, I had cause to benchmark activesupport’s #try. Here’s the code:

require 'benchmark'
require 'active_support/core_ext/object/try'

class Bar
  def foo
  end
end

class Foo
end

bar = Bar.new
foo = Foo.new

n = 1000000
Benchmark.bmbm(15) do |x|
  x.report('straight') { n.times { bar.foo } }
  x.report('try - success') { n.times { bar.try(:foo) } }
  x.report('try - failure') { n.times { foo.try(:foo) } }
  x.report('try on nil') { n.times { nil.try(:foo) } }
end

Here is a sample run:

Rehearsal ---------------------------------------------------
straight          0.150000   0.000000   0.150000 (  0.147271)
try - success     0.760000   0.000000   0.760000 (  0.762529)
try - failure     0.410000   0.000000   0.410000 (  0.413914)
try on nil        0.210000   0.000000   0.210000 (  0.207706)
------------------------------------------ total: 1.530000sec

                      user     system      total        real
straight          0.140000   0.000000   0.140000 (  0.143235)
try - success     0.740000   0.000000   0.740000 (  0.742058)
try - failure     0.380000   0.000000   0.380000 (  0.379819)
try on nil        0.210000   0.000000   0.210000 (  0.207489)

Obviously, calling the method directly is much faster. I often see #try used defensively, without any reason warranted by the logic of the application. This makes the code harder to follow, and now this benchmark shows that this kind of cargo-culting can actually harm the performance of the application in the long run.

Some more odd things stand out:

  • Successful #try is slower than a failed #try plus a straight call. This is because #try actually does some checks and then calls #try!, which does one of the checks all over again.
  • Calling #try on nil is slower than calling a nearly identical empty method on foo. I don’t really have an explanation for this, but it may have something to do with the fact that nil is a special built-in class that may have different logic for method lookup.
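To see where the double check comes from, here is a simplified sketch of the two-layer structure (this is not ActiveSupport’s actual source, just the shape of it): #try guards with respond_to?, then delegates to #try!, whose public_send performs its own method lookup on top of that.

```ruby
module Tryable
  # The "safe" layer: check respond_to? first; return nil on failure.
  def try(*args)
    try!(*args) if args.empty? || respond_to?(args.first)
  end

  # The "strict" layer: public_send does its own method lookup,
  # so a successful #try effectively pays for the check twice.
  def try!(*args)
    public_send(*args)
  end
end

class Widget
  include Tryable

  def name
    "widget"
  end
end

Widget.new.try(:name)    # => "widget"
Widget.new.try(:missing) # => nil, instead of raising NoMethodError
```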

Bottom line: #try is pretty slow because it needs to do a lot of checking before actually calling the tried method. Try to avoid it if possible.


In Ruby, negation is a method

Posted by matijs 30/01/2014 at 06h16

These past few days, I’ve been busy updating RipperRubyParser to make it compatible with RubyParser 3. This morning, I discovered that one thing that was changed from RubyParser 2 is the parsing of negations.

Before, !foo was parsed like this:

s(:not, s(:call, nil, :foo))

Now, !foo is parsed like this:

s(:call, s(:call, nil, :foo), :!)

That looks a lot like a method call. Could it be that in fact, it is a method call? Let’s see.

foo = Object.new

def foo.!
  puts "Negating ..."
  false
end

!foo # prints: Negating ...
       # => false

Amazing. Negation is a method call! Does this mean I can make it return something insane?

def foo.!
  "Not really not"
end

!foo # => "Not really not"

Yes!

Now what can you do with this? At first sight, this may look pretty crazy and useless. But perhaps you could make a string class where !"true" == "false", or code to handle boolean values codified as strings in a database. This is just another area where Ruby is a lot more flexible than you might at first think.
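For instance, a toy wrapper for booleans stored as strings could look like this (the class name and behavior are purely illustrative):

```ruby
# A string-backed boolean where negation flips the string
# representation, mirroring !"true" == "false".
class StringBool
  def initialize(str)
    @str = str
  end

  # ! is just a method, so we can make it return a String.
  def !
    @str == "true" ? "false" : "true"
  end
end

!StringBool.new("true") # => "false"
```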


Some thoughts on Ruby's speed

Posted by matijs 02/03/2013 at 16h42

Yesterday, I read Alex Gaynor’s slides on dynamic language speed. It’s an interesting argument, but I’m not totally convinced.

At a high level, the argument seems to be as follows:

  • For a comparable algorithm, Ruby et al. do much more work behind the scenes than ‘fast’ languages such as C.
  • In particular, they do a lot of memory allocation.
  • Therefore, we should add tools to those languages that allow us to do memory allocation more efficiently.

Allocation

As an example, the creation of an array of squared integers is presented, where the runtime needs to dynamically allocate ever larger arrays to hold its result.

I can see the argument there, but I’m not convinced the solution is correct for the example given. In particular, how much time is spent allocating and re-allocating the array of object references, and how much is spent elsewhere?

require 'benchmark'

def doubles_unallocated(n)
  buf = []
  i = 1; j = 0
  while j < n do
    buf[i - 1] = i * i
    i += 1; j += 1
  end
  buf
end

def doubles_preallocated(n)
  buf = Array.new(n)
  i = 1; j = 0
  while j < n do
    buf[j] = i * i
    i += 1; j += 1
  end
  buf
end

n = 10000

Benchmark.bmbm do |x|
  x.report("unallocated") { n.times { doubles_unallocated 1000 } }
  x.report("preallocated") { n.times { doubles_preallocated 1000 } }
end

A typical result on Ruby 1.9.3 would be (I’m not showing the rehearsal section here):

                   user     system      total        real
unallocated    1.480000   0.020000   1.500000 (  1.493959)
preallocated   1.370000   0.030000   1.400000 (  1.404668)

Taking several runs into account, I conclude that pre-allocating the array shaves off 7 to 13 percent of the runtime.

Algorithms

The algorithms used above are obviously not idiomatic Ruby, but they have the advantage of allowing a comparison of allocated and unallocated arrays. Normally, I would probably use map with a range. Similarly, the Python code presented in the slides looks odd to me. I would have expected something using list comprehensions.

Here is the idiomatic Ruby implementation:

def doubles_map(n)
  (1..n).map { |i| i * i }
end

And here is the Ruby equivalent of the Python example given (using each versus for gives no difference in speed):

def doubles_each(n)
  buf = []
  (1..n).each do |i|
    buf << i * i
  end
  buf
end

For Ruby, doubles_map is the idiomatic implementation. It also has the advantage that a dedicated implementation of Range#map would be able to allocate the result array in one go, transparently avoiding the array-growing overhead. Using list comprehensions in Python would probably have the same advantage.

Unfortunately, doubles_map is also a lot slower than doubles_each, except on MRI 1.8. Here is the benchmark I ran with a fixed-array version thrown in for good measure:

ARR = (1..1000).to_a

def doubles_fixed_array
  ARR.map { |i| i * i }
end

n = 10000

Benchmark.bmbm do |x|
  x.report("using map") { n.times { doubles_map 1000 } }
  x.report("using each") { n.times { doubles_each 1000 } }
  x.report("using a fixed array") { n.times { doubles_fixed_array } }
end

Here is the result for MRI 1.9.3:

                          user     system      total        real
using map             1.220000   0.000000   1.220000 (  1.217607)
using each            1.070000   0.020000   1.090000 (  1.091302)
using a fixed array   0.940000   0.020000   0.960000 (  0.969685)

And for MRI 1.8.7:

                          user     system      total        real
using map             2.790000   0.000000   2.790000 (  2.795956)
using each            3.630000   0.010000   3.640000 (  3.644272)
using a fixed array   2.120000   0.010000   2.130000 (  2.133636)

A better Range#map

These benchmark results made me wonder how Range#map is implemented. Since it is written in Ruby, I checked the Rubinius implementation, and there at least the problem seems due to the lack of a dedicated Range#map implementation: The generic Enumerable#map is used, effectively turning it into doubles_each. I’m not entirely sure why it is actually slower, perhaps that’s due to the double yield involved and/or argument splatting.

Here are the results of the previous benchmark, run on Rubinius 2.0.0-rc1:

                          user     system      total        real
using map             1.312082   0.004000   1.316082 (  1.316717)
using each            0.996062   0.000000   0.996062 (  0.997418)
using a fixed array   0.720045   0.000000   0.720045 (  0.724615)

I made the following simple dedicated implementation for Range#map. It falls back to the default Enumerable#map for most cases, except for Fixnums:

class Range
  def map
    if block_given? && Fixnum === @begin
      first, last = @begin, @end
      last -= 1 if @excl
      size = last - first + 1

      out = Array.new size
      out_tuple = out.tuple

      i = first
      j = 0
      while i <= last
        out_tuple[j] = yield i
        i += 1
        j += 1
      end

      out
    else
      super
    end
  end
end

This makes doubles_map run slightly faster than the fixed array version:

                          user     system      total        real
using map             0.648040   0.000000   0.648040 (  0.650103)
using each            0.928058   0.000000   0.928058 (  0.930109)
using a fixed array   0.680042   0.004001   0.684043 (  0.686719)

Benchmarking Still Needed

While examining these options, I found that there are many ways to write this code, each subtly different, each with a different performance on different Rubies (the examples using while are the slowest on MRI 1.9.3, but the fastest on Rubinius). So, if you really want your code to be fast, you still have to benchmark, you still have to know which Ruby implementation your code will run on, and you still have to try all the options.


How many s-expression formats are there for Ruby?

Posted by matijs 04/11/2012 at 13h34

Once upon a time, there was only UnifiedRuby, a cleaned up representation of the Ruby AST.

Now, what do we have?

  • RubyParser before version 3; this is the UnifiedRuby format:

    RubyParser.new.parse "foobar(1, 2, 3)"
    # => s(:call, nil, :foobar, s(:arglist, s(:lit, 1), s(:lit, 2), s(:lit, 3)))
    
  • RubyParser version 3:

    Ruby18Parser.new.parse "foobar(1, 2, 3)"
    # => s(:call, nil, :foobar, s(:lit, 1), s(:lit, 2), s(:lit, 3))
    
    Ruby19Parser.new.parse "foobar(1, 2, 3)"
    # => s(:call, nil, :foobar, s(:lit, 1), s(:lit, 2), s(:lit, 3))
    
  • Rubinius; this is basically the UnifiedRuby format, but using Arrays.

      "foobar(1,2,3)".to_sexp
      # => [:call, nil, :foobar, [:arglist, [:lit, 1], [:lit, 2], [:lit, 3]]]
    
  • RipperRubyParser; a wrapper around Ripper producing UnifiedRuby:

      RipperRubyParser::Parser.new.parse "foobar(1,2,3)"
      # => s(:call, nil, :foobar, s(:arglist, s(:lit, 1), s(:lit, 2), s(:lit, 3)))
    

How do these fare with new Ruby 1.9 syntax? Let’s try hashes. RubyParser before version 3 and Rubinius (even in 1.9 mode) can’t handle this.

  • RubyParser 3:

      Ruby19Parser.new.parse "{a: 1}"
      # => s(:hash, s(:lit, :a), s(:lit, 1))
    
  • RipperRubyParser:

      RipperRubyParser::Parser.new.parse "{a: 1}"
      # => s(:hash, s(:lit, :a), s(:lit, 1))
    

And what about stabby lambdas?

  • RubyParser 3:

      Ruby19Parser.new.parse "->{}"
      # => s(:iter, s(:call, nil, :lambda), 0, nil)
    
  • RipperRubyParser:

      RipperRubyParser::Parser.new.parse "->{}"
      # => s(:iter, s(:call, nil, :lambda, s(:arglist)),
      #      s(:masgn, s(:array)), s(:void_stmt))
    

That looks like a big difference, but this is just the degenerate case. When the lambda has some arguments and a body, the difference is minor:

  • RubyParser 3:

      Ruby19Parser.new.parse "->(a){foo}"
      # => s(:iter, s(:call, nil, :lambda),
      #      s(:lasgn, :a), s(:call, nil, :foo))
    
  • RipperRubyParser:

      RipperRubyParser::Parser.new.parse "->(a){foo}"
      # => s(:iter, s(:call, nil, :lambda, s(:arglist)),
      #      s(:lasgn, :a), s(:call, nil, :foo, s(:arglist)))
    

So, what’s the conclusion? For parsing Ruby 1.9 syntax, there are really only two options: RubyParser and RipperRubyParser. The latter stays closer to the UnifiedRuby format, but the difference is small.

RubyParser’s results are a little neater, so RipperRubyParser should probably conform to the same format. Reek can then be updated to use the cleaner format, and use either library for parsing.
