Ruby Concurrency Explained
PHP code should be fast to load and not use too much memory.
Java code is slower to boot and to warm up, and it usually uses
quite a lot of memory. Finally, Java is a general-purpose
programming language not designed primarily for the internet.
Other programming languages, like Erlang and Scala, use a
third approach: the actor model. The actor model is something
of a mix of both solutions; the difference is that actors are
like threads which don't share the same memory context.
Communication between actors is done via exchanged
messages, ensuring that each actor handles its own state and
therefore avoiding corrupt data (two threads can modify the
same data at the same time, but an actor can't receive two
messages at the exact same time). We'll talk about that design
pattern later on, so don't worry if you are confused.
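To make the idea a bit more concrete, here is a minimal sketch of an actor-like object in plain Ruby. It is not a real actor library: `CounterActor` and its message names are made up for this example, and the "actor" is simply a thread that owns its state and reads messages from a thread-safe `Queue`, one at a time.

```ruby
require "thread"

# A toy actor: one thread that owns its state plus a mailbox.
# CounterActor and the message names are hypothetical.
class CounterActor
  def initialize
    @mailbox = Queue.new            # messages wait here until the actor is ready
    @count = 0                      # private state, touched only by the actor's thread
    @thread = Thread.new { run }
  end

  # Other threads never touch @count; they only drop messages in the mailbox.
  def send_message(message, reply_to = nil)
    @mailbox << [message, reply_to]
  end

  def stop
    send_message(:stop)
    @thread.join
  end

  private

  def run
    loop do
      message, reply_to = @mailbox.pop   # blocks until a message arrives
      case message
      when :increment then @count += 1
      when :get       then reply_to << @count
      when :stop      then break
      end
    end
  end
end

actor = CounterActor.new

# Four threads send messages at the same time; none of them shares the counter.
senders = 4.times.map do
  Thread.new { 100.times { actor.send_message(:increment) } }
end
senders.each(&:join)

reply = Queue.new
actor.send_message(:get, reply)
total = reply.pop
actor.stop
puts total
```

Because the mailbox hands over one message at a time, the counter is only ever modified by the actor's own thread, which is the property the actor model relies on.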
What about Ruby? Should Ruby developers use threads,
multiple processes, actors, something else? The answer is: yes!
Threads
Since version 1.9, Ruby has native threads (before that green
threads were used).
Green threads are threads that are scheduled by a runtime
library or virtual machine (VM) instead of natively by the
underlying operating system. Green threads emulate
multithreaded environments without relying on any native OS
capabilities, and they are managed in user space instead of
kernel space, enabling them to work in environments that do
not have native thread support.
So in theory, if we wanted to, we should be able to use
threads everywhere like most Java developers do. Well, that's
almost true. The problem is that Ruby, like Python, uses a Global
Interpreter Lock (aka GIL). The GIL is a locking mechanism that
is meant to protect your data integrity. The GIL only allows data
to be modified by one thread at a time and therefore doesn't let
threads corrupt data, but it also doesn't allow them to truly run
concurrently. That is why some people say that Ruby and
Python are not capable of (true) concurrency.
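For reference, here is a minimal sketch of what using threads looks like in Ruby. `sum_up_to` is a made-up helper for this example. On an interpreter with a GIL, only one of these threads runs Ruby code at any given moment, so CPU-bound work like this should not be expected to finish faster than running it sequentially; the code itself is the same either way.

```ruby
# sum_up_to is a hypothetical helper standing in for some CPU-bound work.
def sum_up_to(n)
  (1..n).reduce(0) { |acc, i| acc + i }
end

inputs = [10, 100, 1_000, 10_000]

# One thread per input.
threads = inputs.map { |n| Thread.new { sum_up_to(n) } }

# Thread#value waits for the thread to finish and returns the block's result.
results = threads.map(&:value)
p results
```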
However, these people often don't mention that the GIL makes
single-threaded programs faster, that multi-threaded programs
are much easier to develop since the data structures are safe,
and finally that a lot of C extensions are not thread safe and,
without the GIL, these C extensions don't behave properly.
These arguments don't convince everyone, and that's why you
will hear some people say you should look at another Ruby
implementation without a GIL, such as JRuby, Rubinius (hydra
branch) or MacRuby (Rubinius & MacRuby also offer other
concurrency approaches). If you are using an implementation
without a GIL, then using threads in Ruby has exactly the same
pros/cons as doing so in Java. However, it means that now
you have to deal with the nightmare of threads: making sure
your data is safe and your code doesn't deadlock, and checking
that your code, your libs, plugins and gems are thread safe.
Also, running too many threads might hurt performance because
your OS doesn't have enough resources to allocate and ends up
spending its time context switching. It's up to you to see if
it's worth it for your project.
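As a small illustration of what "making sure your data is safe" means in practice, here is a sketch that protects a shared counter with a `Mutex`. The variable names are made up for the example; the point is only that every read-modify-write of shared data goes through the same lock.

```ruby
require "thread"

counter = 0
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # Only one thread at a time may run this block,
      # so no increment can be lost.
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter  # 10000
```

The price is that threads now wait on each other, and with more than one lock you have to start thinking about lock ordering to avoid deadlocks.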
Actors/Fibers
Earlier we talked a bit about the actor model. Since Ruby 1.9,
developers have access to a new type of lightweight
thread called a Fiber. Fibers are not actors, and Ruby doesn't
have a native actor model implementation, but some people
wrote actor libs on top of fibers. A fiber is like a simplified
thread which isn't scheduled by the VM but by the programmer.
Fibers are like blocks which can be paused and resumed from
the outside or from within themselves. Fibers are faster and use
less memory than threads, as demonstrated in this blog post.
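Here is a minimal sketch of that pause/resume behavior, using only the core `Fiber` class. The symbols and log messages are made up for the example.

```ruby
log = []

fiber = Fiber.new do
  log << "step 1"
  Fiber.yield :paused_once     # pause from within; control goes back to the caller
  log << "step 2"
  Fiber.yield :paused_twice
  log << "step 3"
  :done                        # the block's last value is returned by the final resume
end

first = fiber.resume           # runs the fiber until its first Fiber.yield
log << "caller does other work"
second = fiber.resume          # continues right after the first Fiber.yield
third = fiber.resume           # runs to the end of the block

p [first, second, third]
puts log
```

Nothing runs in the fiber unless the caller resumes it, and the fiber decides for itself when to hand control back. That is the "scheduled by the programmer" part.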
However, because of the GIL, you still cannot truly run more
than one fiber at a time per thread, and if you want to use
multiple CPU cores, you will need to run fibers within more than
one thread. So how do fibers help with concurrency? The
answer is that they are part of a bigger solution. Fibers allow
developers to manually control the scheduling of concurrent
code but also to have the code within the fiber schedule
itself. That's pretty big because now you can wrap an incoming
web request in its own fiber and tell it to send a response back
when it's done doing its thing. In the meantime, you can move
on to the next incoming request. Whenever a request within a
fiber is done, it will automatically resume itself and be returned.
Sounds great, right? Well, the only problem is that if you are
doing any type of blocking IO in a fiber, the entire thread is
blocked and the other fibers aren't running. Blocking operations
are operations like database/memcached queries and HTTP
requests: basically things you are probably triggering from
your controllers. The good news is that the only problem to fix
now is to avoid blocking IO. Let's see how to do that.
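As a rough sketch of the general idea (not how any particular server or library does it), the example below runs two fake "requests" in fibers. Each one waits on a pipe that stands in for a slow backend. Instead of blocking, a fiber that finds no data pauses itself, and a tiny hand-rolled loop built on `IO.select` resumes it once its data has arrived. All the names (`read_without_blocking`, `req-1`, `req-2`) are made up for the example.

```ruby
# Try to read; if nothing is available yet, pause the current fiber
# and hand the IO object to whoever resumed us. Hypothetical helper.
def read_without_blocking(io)
  io.read_nonblock(1024)
rescue IO::WaitReadable
  Fiber.yield io
  retry                        # we were resumed: try the read again
end

# Pipes standing in for slow backends (database, HTTP call, ...).
backends = { "req-1" => IO.pipe, "req-2" => IO.pipe }
responses = []
waiting = {}                   # io => fiber paused on that io

backends.each do |name, (reader, _writer)|
  fiber = Fiber.new do
    responses << "#{name}: #{read_without_blocking(reader)}"
    nil                        # finished: nothing left to wait for
  end
  io = fiber.resume            # runs until the fiber would have blocked
  waiting[io] = fiber if io
end

# A tiny event loop: the backends answer in reverse order,
# and each fiber is resumed only when its own data is ready.
answer_order = ["req-2", "req-1"]
until waiting.empty?
  name = answer_order.shift
  backends[name][1].write("hello from #{name}") if name
  ready, = IO.select(waiting.keys)
  ready.each do |io|
    fiber = waiting.delete(io)
    next_io = fiber.resume
    waiting[next_io] = fiber if next_io
  end
end

backends.each_value { |reader, writer| reader.close; writer.close }
puts responses
```

The second request finishes first because its backend answered first, and at no point did one request's wait stop the other from making progress, all within a single thread.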
Conclusion
High concurrency with Ruby is doable and done by many.
However, it could be made easier. Ruby 1.9 gave us fibers, which
allow for more granular control over concurrency
scheduling; combined with non-blocking IO, high concurrency
can be achieved. There is also the easy solution of forking a
running process to multiply the processing power. However, the
real question behind this heated debate is: what is the future of
the Global Interpreter Lock in Ruby? Should we remove it to
improve concurrency at the cost of dealing with some new
major threading issues, unsafe C extensions, etc.? Alternative
Ruby implementers seem to believe so, but at the same time
Rails still ships with a default mutex lock only allowing requests
to be processed one at a time, the reason given being that a lot
of people using Rails don't write thread-safe code and a lot of
plugins are not thread safe. Is the future of concurrency
something more like libdispatch/GCD, where the threads are
handled by the kernel and the developer only deals with a
simpler/safer API?
Further reading:
Concurrency is a myth in Ruby
Ruby fibers vs Ruby threads
Multi-core, threads, passing messages
Threads suck
Non blocking Active Record and Rails
Scalable Ruby processing with EventMachine
Ruby concurrency with actors
Concurrency in MRI; threads
Ruby 1.9 adds fibers for lightweight concurrency
Threads in Ruby, enough already
Untangling Evented Code with Ruby Fibers
Elise Huard's RubyConf Concurrency talk slides