Monday, January 17, 2011

Writing Ruby Extensions in C - Part 11, Blocks and Callbacks

This is the eleventh in my series of posts about writing ruby extensions in C. The first post talked about the basic structure of a project, including how to set up the build. The second post talked about generating documentation. The third post talked about initializing the module and setting up classes. The fourth post talked about types and return values. The fifth post focused on creating and handling exceptions. The sixth post talked about ruby catch and throw blocks. The seventh post talked about dealing with numbers. The eighth post talked about strings. The ninth post focused on arrays. The tenth post looked at hashes. This post will talk about blocks and callbacks.

Blocks


Blocks [1] are a great idiom in ruby; they are essentially anonymous functions attached to a method call. As with many other things in ruby C extensions, they are fairly easy to deal with. There are a few functions to know about:
  • rb_block_given_p() - returns 1 if a block was given to the currently executing method, 0 otherwise
  • rb_yield(value) - yield a single value to the given block
  • rb_yield_values(n, ...) - yield n values to the given block; the first argument is the count of the values that follow

In ruby terms, "yield"ing sends a value from a method into the block attached to it. If you want to yield multiple values to a ruby block, you have two options: rb_yield() with an array (which ruby spreads across the block's parameters), or rb_yield_values(). They both work equally well, though rb_yield_values() with multiple values is a bit more idiomatic to ruby. The block can also hand a result back: the value of the last expression evaluated in the block becomes the return value of the rb_yield() or rb_yield_values() call. However, note that the block must not end with an explicit return. A return inside a block returns from the method in which the block was written, not from the block itself, so control never comes back normally to rb_yield() and the C code never sees the value (use "next value" to leave a block early with a result). Unfortunately this puts a bit of a burden on the consumers of the APIs, but it is coded into the ruby runtime [2]. The following example demonstrates all of these calls.

First let's look at the ruby code:


    obj.rb_yield_example {|single|
        puts "Single element is #{single}"
        "done"
    }

    obj.rb_yield_values_example {|first, second, third|
        puts "1st is #{first}, 2nd is #{second}, 3rd is #{third}"
        "done"
    }

Now let's look at the C code to implement the above:


    #include <stdio.h>
    #include <ruby.h>

    static VALUE example_rb_yield(VALUE self) {
        VALUE result;

        if (!rb_block_given_p())
            rb_raise(rb_eArgError, "Expected block");

        /* yield one value; the block's last expression comes back */
        result = rb_yield(rb_str_new2("hello"));

        fprintf(stderr, "Return value from block is %s\n",
                StringValueCStr(result));

        return Qnil;
    }

    static VALUE example_rb_yield_values(VALUE self) {
        VALUE result;

        if (!rb_block_given_p())
            rb_raise(rb_eArgError, "Expected block");

        /* the first argument is the number of values that follow */
        result = rb_yield_values(3, rb_str_new2("first"),
                                 rb_str_new2("second"),
                                 rb_str_new2("third"));

        fprintf(stderr, "Return value from block is %s\n",
                StringValueCStr(result));

        return Qnil;
    }

    /* in the extension's Init function, where c_obj is the class: */
    rb_define_method(c_obj, "rb_yield_example",
                     example_rb_yield, 0);
    rb_define_method(c_obj, "rb_yield_values_example",
                     example_rb_yield_values, 0);
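The 3 passed as the first argument to rb_yield_values() above is the number of values that follow it. A C variadic function has no way to discover how many arguments it was handed, so APIs like this one take the count up front, and a wrong count is a bug the compiler will not catch. The plain-C sketch below shows the same count-first convention; sum_values is a hypothetical helper for illustration only, not part of ruby's API.

```c
#include <stdarg.h>

/* Hypothetical helper, not part of ruby: like rb_yield_values(n, ...),
 * the first parameter says how many variadic arguments follow. */
static int sum_values(int n, ...) {
    va_list args;
    int total = 0;

    va_start(args, n);
    for (int i = 0; i < n; i++)
        total += va_arg(args, int); /* read exactly n ints, no more */
    va_end(args);

    return total;
}
```

Calling sum_values(3, 1, 2, 3) reads exactly three ints; passing a count that does not match the actual arguments is undefined behavior, which is why the count in rb_yield_values() must be right.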


Callbacks


Although blocks are idiomatic to ruby and should be used wherever possible, there are situations in which they do not work on their own. For instance, if a ruby method needs to be used as a callback for an asynchronous event, yielding to a block does not work, because a block can only be yielded to while the method call it is attached to is still running. (A block can be captured as a Proc object, for example with rb_block_proc() in C, which is really a variant of the first option below.) If it is necessary to call a particular ruby method from a C library asynchronous callback, there are 2 options:

  1. Procs (lambdas)
  2. Named Methods

Procs are more idiomatic to ruby, but as far as I can tell there aren't many advantages of Procs over named methods. I'll go through both of them after setting up the example.

Let's assume that the C library being wrapped requires callbacks for asynchronous events. In this case, the library is expecting a function pointer with a signature looking like:

int (*asynccallback)(int event, void *userdata);

(that is, the function must take an event and a void pointer in, and return an int result). Also assume that we have to register the callback with the library:

void register_async_callback(int (*cb)(int, void *), void *userdata);
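If you want to experiment with the extension code in the rest of this post without a real asynchronous library, a toy stand-in is easy to write. This is a minimal sketch under assumptions: register_async_callback() follows the signature above, while fire_async_event() and record_event() are hypothetical names standing in for "the library noticed an event" and for a user callback. The important part is that the library stores the void * unchanged and hands it back on every callback.

```c
#include <stddef.h>

/* Toy stand-in for the C library; not a real library API. */
static int (*saved_cb)(int, void *) = NULL;
static void *saved_userdata = NULL;

void register_async_callback(int (*cb)(int, void *), void *userdata) {
    saved_cb = cb;             /* remember the function pointer ...   */
    saved_userdata = userdata; /* ... and the opaque pointer, as-is   */
}

/* Hypothetical hook playing the role of "an event happened".
 * Returns the callback's result, or -1 if nothing is registered. */
int fire_async_event(int event) {
    if (saved_cb == NULL)
        return -1;
    return saved_cb(event, saved_userdata);
}

/* Example callback: userdata points at an int that receives the event. */
static int record_event(int event, void *userdata) {
    *(int *)userdata = event;
    return 0;
}
```

After register_async_callback(record_event, &seen), calling fire_async_event(42) stores 42 in seen. This stand-in invokes the callback synchronously on whatever thread calls fire_async_event(); a real library may use its own threads, which matters once ruby is involved.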

How would we go about calling a ruby method that the user writes when the library does the asynchronous callback?

Procs


With Procs, we would have the user of our ruby library create a Proc and pass it to the extension. An example ruby client:

    cb = Proc.new {|event, userdata|
        puts "event is #{event}, userdata is #{userdata}"
    }

    ruby_extension.register_async_proc(cb, "my user data")

Note that the body of the Proc can be any valid ruby; here we simply print out the arguments that were passed into the Proc.

In the extension, we would define a method called "register_async_proc" that takes 2 arguments: the Proc and the user data that we want passed through to the Proc. The extension C code would look something like:


 1) int internal_callback(int event, void *userdata) {
 2)     VALUE passthrough = (VALUE)userdata;
 3)     VALUE cb;
 4)     VALUE cbdata;
 5)
 6)     cb = rb_ary_entry(passthrough, 0);
 7)     cbdata = rb_ary_entry(passthrough, 1);
 8)
 9)     rb_funcall(cb, rb_intern("call"), 2, INT2NUM(event),
10)                cbdata);
11)
12)     return 0;
13) }
14)
15) VALUE ext_register(VALUE obj, VALUE cb, VALUE userdata) {
16)     VALUE passthrough;
17)
18)     if (!RTEST(rb_obj_is_kind_of(cb, rb_cProc)))
19)         rb_raise(rb_eTypeError, "Expected Proc callback");
20)
21)     passthrough = rb_ary_new();
22)     rb_ary_store(passthrough, 0, cb);
23)     rb_ary_store(passthrough, 1, userdata);
24)     rb_iv_set(obj, "@async_callback", passthrough); /* keep alive */
25)     register_async_callback(internal_callback, (void *)passthrough);
26)     return Qnil;
27) }
28)
29) rb_define_method(c_extension, "register_async_proc",
30)                  ext_register, 2);

The above is not a lot of code, but there is a lot going on, so let's step through it one line at a time, starting from the end. Line 29 defines our new method called register_async_proc, which will call the internal extension function ext_register (lines 15 to 27) with 2 arguments. Lines 18 and 19 inside of ext_register check that what the user actually passed us was a Proc. Lines 21 through 23 set up a new ruby array that contains both the callback that the user gave to us and any additional user data that they want passed into the Proc. Line 24 stores that array in an instance variable on the extension object (the name @async_callback is arbitrary). This matters because ruby's garbage collector cannot see the pointer that the C library holds; without a reference that ruby knows about, the array and the Proc inside it could be freed while the C library still expects to use them. Line 25 calls the C library function register_async_callback with our *internal* callback and the ruby array that we set up in lines 21 through 23, and line 26 returns nil to the ruby caller.

There are a couple of things to note about line 25. First, we cannot use the ruby Proc as the callback directly; the Proc has the wrong signature, and the C library has no idea how to marshal data so that ruby can understand it. Instead, we have the C library call an internal callback inside the extension; this internal callback marshals the data for the ruby callback and then invokes the ruby callback. Second, we pass the array that we created in lines 21 through 23 to the C library as the "opaque" callback data. It is imperative that the C library function provide a void * pointer for user data; otherwise this technique cannot work.

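After line 25, the asynchronous callback is set up. When an event happens in the C library, it will call the function given to it by register_async_callback. In our case, this callback is internal_callback, lines 1 through 13. The first thing that internal_callback does on line 2 is to cast the void * back to a VALUE so we can operate on it. In lines 6 and 7, the array that was created and registered earlier is pulled apart into separate pieces. Finally, line 9 calls the Proc that was originally registered by the user, passing the event that happened and the original user data. Two caveats apply. The C library must invoke the callback on a thread that holds ruby's global interpreter lock (for example, from inside a library call made by ruby code, or from an event loop driven from ruby); calling rb_funcall() from an arbitrary native thread is not safe. And if the Proc raises an exception, ruby unwinds straight through the C library's stack frames, so in real code the call on line 9 is usually wrapped with rb_protect().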
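Why the void * user data is indispensable can be seen without ruby at all. The plain-C sketch below mirrors internal_callback: a single trampoline function with exactly the signature the library demands, which recovers the real handler and its data from the opaque pointer, just as internal_callback recovers the Proc and the user data from the ruby array. The names passthrough, trampoline and record_handler are hypothetical and chosen for illustration.

```c
#include <stddef.h>

/* Plain-C analog of the passthrough array: a struct bundling the
 * "user" handler with the data to hand to it. */
typedef int (*user_handler)(int event, const char *data);

struct passthrough {
    user_handler handler; /* plays the role of the Proc      */
    const char *data;     /* plays the role of the user data */
};

/* The trampoline has the signature the library wants and recovers
 * everything else from the opaque void * pointer. */
static int trampoline(int event, void *userdata) {
    struct passthrough *pt = userdata;
    return pt->handler(event, pt->data);
}

/* Example handler that remembers what it was called with. */
static int last_event = -1;
static const char *last_data = NULL;

static int record_handler(int event, const char *data) {
    last_event = event;
    last_data = data;
    return 0;
}
```

Because each registration carries its own context pointer, one trampoline can serve any number of registrations; without the void * argument it would have no way to tell them apart.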

Named methods


Named method callbacks work very similarly to Proc callbacks, so I won't go to great lengths to describe them. I'll show the (very similar) example code and explain the differences from the Proc callback method.

First the ruby client code:

    def cb(event, userdata)
        puts "event is #{event}, userdata is #{userdata}"
    end

    ruby_extension.register_async_symbol(:cb, "my user data")

There are two important differences from the Proc code: the callback is a real method (defined with def), and it is passed into the extension call differently. We cannot just write cb, because ruby would then call the method cb (with no arguments, which raises an ArgumentError here) before calling register_async_symbol. Instead we pass the Symbol :cb, which names the callback method.

Now we look at the extension code:

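 1) int internal_callback(int event, void *userdata) {
 2)     VALUE passthrough = (VALUE)userdata;
 3)     VALUE cb;
 4)     VALUE cbdata;
 5)
 6)     cb = rb_ary_entry(passthrough, 0);
 7)     cbdata = rb_ary_entry(passthrough, 1);
 8)
 9)     rb_funcall(rb_cObject, rb_to_id(cb), 2, INT2NUM(event),
10)                cbdata);
11)
12)     return 0;
13) }
14)
15) VALUE ext_register(VALUE obj, VALUE cb, VALUE userdata) {
16)     VALUE passthrough;
17)
18)     if (rb_class_of(cb) != rb_cSymbol)
19)         rb_raise(rb_eTypeError, "Expected Symbol callback");
20)
21)     passthrough = rb_ary_new();
22)     rb_ary_store(passthrough, 0, cb);
23)     rb_ary_store(passthrough, 1, userdata);
24)     rb_iv_set(obj, "@async_callback", passthrough); /* keep alive */
25)     register_async_callback(internal_callback, (void *)passthrough);
26)     return Qnil;
27) }
28)
29) rb_define_method(c_extension, "register_async_symbol",
30)                  ext_register, 2);

The differences are minor. Line 29 defines this as "register_async_symbol" instead of "register_async_proc". Line 18 checks that the argument's class is rb_cSymbol instead of rb_cProc. Line 9 is where the biggest difference is: instead of invoking a Proc with its "call" method, we convert the Symbol to a method ID with rb_to_id() and call that method directly. A Symbol only names a method; it does not say which object to call it on. A top-level def like the one above defines a private method on Object, and rb_funcall() ignores method visibility, so any ordinary object can act as the receiver; here we use rb_cObject. A method defined inside some other class would require its receiver to be passed in alongside the Symbol. Because the method is looked up by name each time the event fires, redefining cb after registration changes what gets called. Finally, line 24 stores the array in an instance variable so that ruby's garbage collector, which cannot see the pointer held by the C library, does not free it, and line 26 returns nil to the ruby caller.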

[1] http://ruby-doc.org/docs/ProgrammingRuby/html/tut_containers.html
[2] http://stackoverflow.com/questions/1435743/why-does-explicit-return-make-a-difference-in-a-proc

3 comments:

  1. Could you provide an example where the async callback works?

    Asynchronous in C means either an event-loop (which means a hassle to program in), or threads. In the latter case, the thread that calls the C callback will not hold the ruby GIL and must not issue *ANY* rb_-functions.

    Am I forgetting something?

  2. There is an error in your code example. When you call rb_yield_values the first argument must be an integer that indicates the number of yield params. In your example result = rb_yield_values(3, rb_str_new2("first"), rb_str_new2("second"), rb_str_new2("third"));

    Replies
    1. You are absolutely correct. I've fixed the post now. Thanks for the update!
