Thursday, January 6, 2011

Writing Ruby Extensions in C - Part 3, Extension Initialization

This is the third in my series of posts about writing ruby extensions in C. The first post talked about the basic structure of a project, including how to set up building. The second post talked about generating documentation. The posts from here on out will focus on the C code. This post talks about initializing the module and setting up classes.

Initializing the module


There is a bit of magic involved with initially loading the extension module into ruby. Assuming the extension module is called "example", then the C code that implements the extension must have an initialization function that looks like:

 1) static VALUE m_example;
 2)
 3) void Init_example() {
 4)     m_example = rb_define_module("Example");
 5)     example_library_initialize();
 6) }

Line 1 sets up the variable that holds the reference to the module. Line 3 is a function that must be called "Init_<extension_name>", take no parameters, and return nothing. When the ruby interpreter encounters a line of code such as "require 'example'", it will call this initialization function to set things up. Line 4 actually defines the module for us and calls it "Example". Finally, line 5 does whatever initialization is necessary for the library that is being wrapped. In this case, it just calls the example_library_initialize() function.

Defining classes, constants, and methods


Once the module itself has been initialized, functions, classes, methods, and attributes can be added to it. The API calls for doing so are pretty easy to use:
  • rb_define_module_function(module, "function_name", implementation, number_of_args) - define function_name for module. Assuming the module is called "Example", functions like this can be invoked from ruby code like:

    out = Example::function_name

    The implementation should be a C function that takes number_of_args arguments (plus one leading parameter for the receiver) and returns a VALUE. See "Implementing methods" below for more explanation of implementation of methods in C.
  • rb_define_class_under(module, "class_name", super_class) - define a new class named "class_name" under the module. super_class can be one of the pre-defined types (rb_cObject, rb_cArray, etc) or a class that has been defined in this module.
  • rb_define_method(class, "method_name", implementation, number_of_args) - define a new method for class. The implementation should be a C function that takes number_of_args arguments (plus one leading parameter for the receiver) and returns a VALUE. See "Implementing methods" below for more explanation of implementation of methods in C.
  • rb_define_const(class, "CONST", value) - define a new constant for class with value. Assuming the module is called "Example" and the class is called "Class", these can be accessed in ruby code like:

    puts Example::Class::CONST

    The value can be any legal ruby type.
  • rb_define_attr(class, "attr_name", read, write) - define a new attribute for class called attr_name. The read and write parameters should each be 0 or 1, depending on whether you want a reader method and/or a writer method generated for this attribute, respectively.
  • rb_define_singleton_method(class, "method_name", implementation, number_of_args) - define a new singleton method for class. The implementation should be a C function that takes number_of_args arguments (plus one leading parameter for the receiver) and returns a VALUE. See "Implementing methods" below for more explanation of implementation of methods in C.
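
For orientation, the calls above correspond roughly to the pure-Ruby definitions below. This is only a comparison sketch, not part of the extension: the names function_name, Klass, attr_name, method_name, and singleton_name are placeholders, and the method bodies are invented so that there is something to call.

```ruby
# Pure-Ruby sketch of what the rb_define_* calls set up.
# All names and method bodies are illustrative placeholders.
module Example
  # rb_define_module_function(module, "function_name", impl, 0)
  # Like module_function, this makes the method callable on the module itself.
  def function_name
    :module_function_result
  end
  module_function :function_name

  # rb_define_class_under(module, "Klass", rb_cObject)
  class Klass < Object
    # rb_define_const(class, "CONST", INT2NUM(5))
    CONST = 5

    # rb_define_attr(class, "attr_name", 1, 1)
    attr_accessor :attr_name

    # rb_define_method(class, "method_name", impl, 1)
    def method_name(arg)
      arg
    end

    # rb_define_singleton_method(class, "singleton_name", impl, 0)
    def self.singleton_name
      :singleton_result
    end
  end
end

puts Example.function_name           # prints module_function_result
puts Example::Klass::CONST           # prints 5
puts Example::Klass.singleton_name   # prints singleton_result
```

Seeing the Ruby form next to each call can make it easier to decide which rb_define_* function a given piece of the wrapped library needs.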

Implementing methods


Using the above functions, it is pretty straightforward to define module functions, class methods, and singleton methods. There is a bit of work necessary to understand the C implementation of these methods. The first thing to realize is that the "number_of_args" passed as the last parameter of the rb_define_* call defines how many parameters the method will take. For no parameters, you would pass 0, for one parameter you would pass 1, etc. When you go to implement the method in C, your C function must take that many VALUE parameters, plus one leading parameter for the receiver, that is, the object the method was called on (for module functions and singleton methods, the module or class itself). This will be shown in the example below.

You can also pass -1, which tells ruby that the method takes a variable number of arguments. When you go to implement the method in C, the C function must take exactly 3 parameters: int argc, VALUE *argv, VALUE klass. The argc parameter is the number of arguments that were passed, the argv parameter is a C array holding those arguments, and the last parameter is the receiver (named klass here). To properly parse the arguments, rb_scan_args(argc, argv, "format", ...) should be called. A brief explanation of rb_scan_args is below; for more information, see the document at [1].

The first two arguments to rb_scan_args() are the argc and argv passed into the function. The third argument is a string that defines how many required and how many optional parameters the method takes. The remaining arguments are pointers to VALUEs in which to place the values of the arguments. For instance, to have 1 required and 2 optional parameters to the method, format should be "12" and 3 additional VALUE pointers should be passed to rb_scan_args(). To have no required and 1 optional parameter to the method, format should be "01" and 1 additional VALUE pointer should be passed to rb_scan_args(). Note that if fewer than the number of required parameters, or more than the total number of parameters, are passed to the method, an ArgumentError exception will be raised. Each optional argument is set to the value that was passed, if any, or to "nil".
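
The two format strings described above behave much like Ruby methods with default arguments. The sketch below is a pure-Ruby analogue, not a call into rb_scan_args itself; scan_12 and scan_01 are hypothetical names chosen to mirror the "12" and "01" formats.

```ruby
# Ruby analogue of rb_scan_args(argc, argv, "12", &a, &b, &c):
# one required argument, two optional ones that default to nil.
def scan_12(required, opt1 = nil, opt2 = nil)
  [required, opt1, opt2]
end

# Ruby analogue of the "01" format: zero required, one optional.
def scan_01(opt = nil)
  [opt]
end

p scan_12(:a)          # prints [:a, nil, nil]
p scan_12(:a, :b, :c)  # prints [:a, :b, :c]
p scan_01              # prints [nil]

begin
  scan_12()            # too few arguments for the "12" shape
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

In the C version the missing optional values arrive as Qnil, so the implementation can test them with NIL_P() before use.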

Let's take a look at an example to show all of this off:

 1) static VALUE m_example;
 2) static VALUE c_example;
 3)
 4) static VALUE mymethod(VALUE c, VALUE arg) {
 5)      fprintf(stderr, "Called mymethod with one argument\n");
 6)      return Qnil;
 7) }
 8)
 9) static VALUE myvariablemethod(int argc, VALUE *argv, VALUE c) {
10)      VALUE optional;
11)
12)      fprintf(stderr, "Called myvariablemethod with variable "
                         "arguments\n");
13)
14)      rb_scan_args(argc, argv, "01", &optional);
15)
16)      return Qnil;
17) }
18)
19) void Init_example() {
20)     m_example = rb_define_module("Example");
21)     c_example = rb_define_class_under(m_example, "Class",
                                          rb_cObject);
22)
23)     rb_define_attr(c_example, "my_readonly_attr", 1, 0);
24)     rb_define_attr(c_example, "my_readwrite_attr", 1, 1);
25)
26)     rb_define_const(c_example, "MYCONST", INT2NUM(5));
27)
28)     rb_define_method(c_example, "mymethod", mymethod, 1);
29)     rb_define_method(c_example, "myvariablemethod",
                         myvariablemethod, -1);
30) }

Lines 19 through 30 are the entry point for the extension. (The listing leaves out the #include <ruby.h> and #include <stdio.h> lines that a complete source file needs.) Line 20 defines and stores the module called "Example". Line 21 defines and stores the class "Class" under the module "Example". Line 23 defines a new read-only attribute for the class; this is equivalent to attr_reader in ruby code. It is read-only because the 3rd parameter is 1 and the 4th parameter is 0, meaning to generate a read method but no write method for this attribute. Line 24 defines a new read-write attribute for the class; this is equivalent to attr_accessor in ruby code. It is read-write because the 3rd parameter is 1 and the 4th parameter is 1, meaning to generate both read and write methods. Line 26 defines a new constant for the class called "MYCONST" with a value of 5; this can be accessed in ruby code via Example::Class::MYCONST. Line 28 defines a new method for "Example::Class" called "mymethod" that takes exactly one parameter. Line 29 defines a new method for "Example::Class" called "myvariablemethod" that takes a variable number of parameters.

Now that we have looked at the extension initialization, we can examine the implementation of the methods. Lines 4 through 7 implement the "mymethod" method; the first parameter, c, is the receiver (the Example::Class object the method was called on), and the second parameter is the required argument. Lines 9 through 17 implement the "myvariablemethod" method. As described earlier, this takes the number of arguments in argc, the argument array in argv, and the receiver in c. Line 14 uses rb_scan_args to declare zero required arguments and one optional argument. We pass the address of the VALUE "optional" to rb_scan_args(); if an argument is given, this will be filled in with the argument, otherwise it will be set to "nil".
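
From the Ruby side, the finished extension would be loaded with require 'example' and used like any other class. Since the compiled extension cannot be loaded here, the sketch below substitutes a pure-Ruby stand-in that is assumed to expose the same interface as the C code above; only the usage lines at the bottom are the point.

```ruby
# Pure-Ruby stand-in for the compiled extension (an assumption made so
# that this sketch runs on its own; the real code is the C listing).
module Example
  class Class
    MYCONST = 5
    attr_reader :my_readonly_attr
    attr_accessor :my_readwrite_attr

    def mymethod(arg)
      warn "Called mymethod with one argument"
      nil
    end

    def myvariablemethod(optional = nil)
      warn "Called myvariablemethod with variable arguments"
      nil
    end
  end
end

obj = Example::Class.new
obj.my_readwrite_attr = 42
puts obj.my_readwrite_attr                 # prints 42
puts Example::Class::MYCONST               # prints 5
p obj.my_readonly_attr                     # prints nil (never assigned)
puts obj.respond_to?(:my_readonly_attr=)   # prints false (no writer)
obj.mymethod(:anything)                    # returns nil
obj.myvariablemethod                       # optional argument omitted
obj.myvariablemethod(:given)               # optional argument supplied
```

Attributes created with rb_define_attr are backed by an instance variable of the same name, so the read-only attribute stays nil until something sets @my_readonly_attr.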

[1] http://www.oreillynet.com/ruby/blog/2007/04/c_extension_authors_use_rb_sca_1.html

Update: edited to make the examples readable
