Tuesday, January 4, 2011

Writing Ruby Extensions in C - Part 1, Project Setup

Earlier this year, I took over maintainership of the ruby-libvirt bindings[1]. While I had been contributing to the bindings on and off for the last couple of years, taking over maintainership has led me to learn about a whole range of issues deep inside ruby. Along the way, I've found that while there is information scattered around the internet about writing these bindings, comprehensive guides (with examples) seem to be lacking. This series of blog posts aims to be a guide for anyone interested in some of the finer details of writing ruby extensions in C. All of these notes apply to Ruby 1.8. In theory, most of this also applies to Ruby 1.9, but I have not personally tested it or done much with Ruby 1.9, so your mileage may vary.

This information is culled from various places around the internet, along with reading the ruby source code and banging my head against a wall until things worked. The most useful resources I have found, besides the ruby sources, are at [2] and [3].

This first post will talk about the general structure of a ruby extension project, including documentation and building. Further posts will talk about programming considerations, including defining classes and methods, memory management, etc.

(NOTE: actually writing ruby extensions by hand seems to be kind of passe nowadays. Apparently FFI[4] is all the rage. That being said, I still find this a useful exercise, if to nobody but myself)

Directory structure

The directory structure of a ruby project is flexible, though most of the ruby extensions that I have seen follow a very similar pattern. Usually the top-level of the project contains a directory listing that looks like:
COPYING
NEWS
Rakefile
README.rdoc
doc/
ext/
The COPYING file contains the license for the project. The NEWS file typically contains information about releases. The Rakefile defines rake targets for the project (see the section about Rakefiles for more information). The README.rdoc file contains the header information that will be used when generating the RDoc documentation; see the post about RDoc for more details. The doc subdirectory contains any additional documentation about the project, including the code for the website, example usage of the code, etc. The ext/ directory typically contains the C source code for the extension module, which can be in any number of files (though note the caveat in the RDoc post about automatically generating RDoc documentation from multiple C files). The ext/ directory also contains the extconf.rb file (see the extconf and mkmf section), which controls how to build the extension.
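If you are starting from scratch, the layout described above can be stubbed out with a few lines of ruby. This is only a sketch: the LAYOUT constant and the scaffold helper are names I made up for illustration, and it creates empty placeholder files in a temporary directory rather than a real project.

```ruby
require 'fileutils'
require 'tmpdir'

# Entries from the listing above, plus ext/extconf.rb.
# A trailing slash marks a directory. (Hypothetical constant name.)
LAYOUT = %w[COPYING NEWS Rakefile README.rdoc doc/ ext/ ext/extconf.rb].freeze

# Create an empty skeleton of the layout under +root+ and return the
# sorted list of paths that now exist, relative to +root+.
def scaffold(root, layout = LAYOUT)
  layout.each do |entry|
    path = File.join(root, entry)
    if entry.end_with?('/')
      FileUtils.mkdir_p(path)
    else
      FileUtils.mkdir_p(File.dirname(path))
      FileUtils.touch(path)
    end
  end
  Dir.chdir(root) { Dir.glob('**/*').sort }
end

# Try it in a throwaway directory so nothing real is touched.
Dir.mktmpdir do |root|
  puts scaffold(root)
end
```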

mkmf and extconf

extconf.rb and mkmf are the parts of the ruby extension build system that generate the header files and Makefile(s) needed to compile the C part of the program. Like the Rakefile, extconf.rb is run through ruby, so it has all of the power of ruby at its disposal. A file named extconf.rb is generally placed in the ext/ subdirectory of the project, and it requires mkmf to do all of the heavy lifting. An example extconf.rb looks like:

 1) require 'mkmf'
 2)
 3) RbConfig::MAKEFILE_CONFIG['CC'] = ENV['CC'] if ENV['CC']
 4)
 5) extension_name = 'example'
 6)
 7) unless pkg_config('library_to_link_to')
 8)     raise "library_to_link_to not found"
 9) end
10)
11) have_func('useful_function', 'library_to_link_to/lib.h')
12) have_type('useful_type', 'library_to_link_to/lib.h')
13)
14) create_header
15) create_makefile(extension_name)
Line 1 just pulls in the mkmf module, which is what does all of the hard work here.

Line 3 isn't strictly necessary, but gives the ability to easily use alternate compilers to build the extension. Since mkmf detects the compiler at Makefile creation time, this isn't very interesting until you consider static analysis tools, which tend to substitute the standard compiler with their own enhanced version. By having this line of code at the top, the extconf.rb is prepared to let these static analysis tools do their thing (and help improve your code).
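The override logic is simple enough to play with outside of a build. Here is a small sketch of the same rule (environment wins, otherwise use what ruby was configured with). The effective_compiler helper and the 'scan-cc' wrapper name are made up for illustration; only RbConfig::MAKEFILE_CONFIG comes from ruby itself.

```ruby
require 'rbconfig'

# Return the compiler a build would use: the CC environment variable
# when it is set, otherwise whatever ruby itself was configured with.
def effective_compiler(env = ENV, config = RbConfig::MAKEFILE_CONFIG)
  env['CC'] || config['CC']
end

# With the real environment: ruby's configured compiler, unless CC is set.
puts effective_compiler

# With a fake environment, as an analysis tool's wrapper might set it.
puts effective_compiler({ 'CC' => 'scan-cc' })
```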

Line 5 defines the extension name, which is used later.

Lines 7 through 9 do a pkg-config check to see if the library necessary to build this extension exists. Typically you will need to have the development package of the library you want to use installed, including the header files. If the library cannot be found, an exception will be raised and no Makefiles will be generated. Note that this is a required first step; all of the have_*() functions later on work by trying to compile and link a program with the function, type, or constant that you are looking for, so they need to know where to find the library to link against.

Line 11 uses the mkmf function have_func() to determine if the library installed on the build system has the function 'useful_function' defined in the header file 'library_to_link_to/lib.h'. If the function is found, then a macro named HAVE_ followed by the upper-cased function name (here, HAVE_USEFUL_FUNCTION) will be defined in extconf.h (which all of the C files in the project should #include).

Line 12 uses the mkmf function have_type() to determine if the library installed on the build system has the type 'useful_type' defined in the header file 'library_to_link_to/lib.h'. If the type is found, then a macro named HAVE_TYPE_ followed by the upper-cased type name (here, HAVE_TYPE_USEFUL_TYPE) will be defined in extconf.h.

Line 14 actually creates the header file extconf.h, based on the results from all of the previous have_*() functions. The extconf.h file should be #include'd by all of the C files in the project to gain access to the HAVE_* macros that extconf defines.
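To get a feel for which macro names to test for in the C code, the naming rule can be approximated in plain ruby. This is a sketch of the convention, not mkmf's actual code; cpp_identifier, macro_for_func and macro_for_type are helper names I invented, and mkmf handles some extra cases (pointer types, for example) that are ignored here.

```ruby
# Approximate the name mangling applied before a macro is emitted:
# upper-case the name and squeeze any run of characters outside
# A-Z, 0-9 and _ into a single underscore.
def cpp_identifier(name)
  name.strip.upcase.tr_s('^A-Z0-9_', '_')
end

# Macro name expected after a successful have_func() check.
def macro_for_func(name)
  "HAVE_#{cpp_identifier(name)}"
end

# Macro name expected after a successful have_type() check.
def macro_for_type(name)
  "HAVE_TYPE_#{cpp_identifier(name)}"
end

puts macro_for_func('useful_function')  # HAVE_USEFUL_FUNCTION
puts macro_for_type('useful_type')      # HAVE_TYPE_USEFUL_TYPE
```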

Line 15 creates the Makefile based on all of the previous information.

While the recommended way to invoke the extconf.rb is through the Rakefile (see the next section), you can also run it by hand to test it out. If the extconf.rb file is located in the recommended ext/ subdirectory, you can run:
$ cd ext
$ ruby extconf.rb
The mkmf commands should run, and if everything goes smoothly, the extconf.h and Makefile will be generated inside of the ext/ subdirectory. If things do not succeed, the output on stdout or in mkmf.log should help to debug the problem.
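When a check fails, the interesting part of mkmf.log is usually near the end. A small helper along these lines can print it; log_tail is a made-up name, and the 'ext/mkmf.log' path simply assumes the layout used in this post.

```ruby
# Return the last +count+ lines of a log file, or an empty array
# if the file does not exist.
def log_tail(path, count = 20)
  return [] unless File.exist?(path)
  File.readlines(path).last(count)
end

# Prints nothing if extconf.rb has not been run yet.
puts log_tail('ext/mkmf.log')
```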

Rakefile

Once the extconf is in place, the next step is to create a Rakefile. As the name suggests, Rakefiles are the ruby analog to Makefiles; they allow automation of arbitrary tasks with possible dependencies between them. They also only re-build pieces of the code that have changed since the last invocation. The main difference between Rakefiles and Makefiles is that Rakefiles are written in ruby, so you have the full power of ruby at your disposal.

With that said, let's take a look at a Rakefile. I'll preface this discussion by saying that I don't know all that much about Rakefiles, other than the bare minimum to get them working. There are additional resources out on the web to describe them in depth[5], so if you want to know more, please look there.

 1) require 'rake/clean'
 2)
 3) EXT_CONF = 'ext/extconf.rb'
 4) MAKEFILE = 'ext/Makefile'
 5) MODULE = 'ext/example.so'
 6) SRC = Dir.glob('ext/*.c')
 7) SRC << MAKEFILE
 8)
 9) CLEAN.include [ 'ext/*.o', 'ext/depend', MODULE ]
10) CLOBBER.include [ 'config.save', 'ext/mkmf.log', 'ext/extconf.h',
                      MAKEFILE ]
11)
12) file MAKEFILE => EXT_CONF do |t|
13)     Dir::chdir(File::dirname(EXT_CONF)) do
14)         unless sh "ruby #{File::basename(EXT_CONF)}"
15)             $stderr.puts "Failed to run extconf"
16)             break
17)         end
18)     end
19) end
20) file MODULE => SRC do |t|
21)     Dir::chdir(File::dirname(EXT_CONF)) do
22)         unless sh "make"
23)             $stderr.puts "make failed"
24)             break
25)         end
26)     end
27) end
28) desc "Build the native library"
29) task :build => MODULE
Line 1 brings in rake/clean, which defines the clean and clobber tasks that we care about. There are many more pre-defined rake tasks available; some of them will be described in further posts.

Lines 3 through 7 set up some ruby constants that we will use later on. The important point to note here is that we have the full power of ruby available to us, including doing directory globs, appending to arrays, etc.

Lines 9 and 10 set up the list of files that will get removed during the CLEAN and CLOBBER steps, respectively. 'rake clean' will clean out the development files listed in the CLEAN variable, and 'rake clobber' will clean out the files listed in both the CLEAN and CLOBBER variables.
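The relationship between the two steps (clobber removes everything clean does, plus more) can be mimicked in plain ruby. This is a toy model and not how rake implements it; the CLEAN_PATTERNS and CLOBBER_PATTERNS constants and the remove_matching, clean and clobber helpers are names invented for this sketch, and it only runs against a temporary directory.

```ruby
require 'fileutils'
require 'tmpdir'

# Same patterns as lines 9 and 10 of the Rakefile above.
CLEAN_PATTERNS   = ['ext/*.o', 'ext/depend', 'ext/example.so'].freeze
CLOBBER_PATTERNS = ['config.save', 'ext/mkmf.log', 'ext/extconf.h',
                    'ext/Makefile'].freeze

# Remove every file under +root+ matching one of +patterns+ and
# return the relative paths that were removed.
def remove_matching(root, patterns)
  Dir.chdir(root) do
    victims = patterns.flat_map { |pattern| Dir.glob(pattern) }.sort
    FileUtils.rm_f(victims)
    victims
  end
end

def clean(root)
  remove_matching(root, CLEAN_PATTERNS)
end

# clobber is a superset of clean: it removes the CLEAN files first.
def clobber(root)
  clean(root) + remove_matching(root, CLOBBER_PATTERNS)
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'ext'))
  %w[ext/example.c ext/example.o ext/Makefile].each do |f|
    FileUtils.touch(File.join(root, f))
  end
  p clean(root)    # ["ext/example.o"]
  p clobber(root)  # ["ext/Makefile"]
end
```

Note that the C source is never touched by either step; only generated files are listed in the patterns.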

Lines 12 through 29 are the meat of the build task. Lines 28 and 29 set up the start of the dependency chain; any time the rake target of "build" is entered, it depends on everything in MODULE (which is 'ext/example.so'). When rake encounters that, it goes looking for any other dependencies that MODULE may have. In this case, we've defined that MODULE depends on SRC, which is a list of all C files in ext/, plus the Makefile. Since the Makefile is going to be auto-generated by mkmf, we have another dependency between the Makefile and EXT_CONF (which is responsible for generating the makefile). At this point we've reached the end of our dependency chain, so the block at lines 13 through 18 is executed, which produces the Makefile. Once that is done rake goes back up the dependency chain and executes the block at lines 21 to 26, which actually does the build using make. At the end of all of this, the extension module should be properly built (assuming no compile errors, of course).
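The order in which rake walks that chain can be illustrated with a toy resolver. To be clear, this is not rake's implementation: it ignores file timestamps entirely (rake skips file tasks that are already up to date), and DEPS and run_order are names I made up. 'ext/example.c' stands in for whatever the glob on line 6 would find.

```ruby
# Toy dependency graph mirroring the Rakefile above:
# target => list of prerequisites.
DEPS = {
  'build'          => ['ext/example.so'],
  'ext/example.so' => ['ext/example.c', 'ext/Makefile'],
  'ext/Makefile'   => ['ext/extconf.rb']
}.freeze

# Visit prerequisites depth-first and return the targets that have a
# rule (i.e. appear as keys in +deps+) in the order they would run.
def run_order(target, deps = DEPS, done = [])
  deps.fetch(target, []).each { |pre| run_order(pre, deps, done) }
  done << target if deps.key?(target) && !done.include?(target)
  done
end

puts run_order('build')
# ext/Makefile
# ext/example.so
# build
```

The Makefile rule comes out first because it sits at the bottom of the chain, which matches the walk-through above.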

Gem

The ruby gem system aims to be a package manager for pieces of ruby code. While my personal opinion is that this system re-invents operating system package managers (poorly), gems are an integral part of the ruby experience. Gems can be easily built using a few Rakefile commands, and they are generally registered at http://rubygems.org. A few minor additions to the Rakefile are used to set up the task:

 1) require 'rake/gempackagetask'
 2)
 3) PKG_FILES = FileList[
 4)     "Rakefile", "COPYING", "README", "NEWS", "README.rdoc",
 5)     "ext/*.[ch]", "ext/MANIFEST", "ext/extconf.rb",
 6) ]
 7)
 8) SPEC = Gem::Specification.new do |s|
 9)     s.name = "example"
10)     s.version = "1.0"
11)     s.email = "list@example.com"
12)     s.homepage = "http://example.org/"
13)     s.summary = "C bindings"
14)     s.files = PKG_FILES
15)     s.required_ruby_version = '>= 1.8.1'
16)     s.extensions = "ext/extconf.rb"
17)     s.author = "List of Authors"
18)     s.rubyforge_project = "None"
19)     s.description = "C Bindings"
20) end
21)
22) Rake::GemPackageTask.new(SPEC) do |pkg|
23)     pkg.need_tar = true
24)     pkg.need_zip = true
25) end
Line 1 brings in the rake gempackagetask. Lines 3 through 6 define the files that we want included in the package; ruby globs can be used here. Lines 8 through 20 are the meat of the gem specification, and are pretty straightforward; just replace the fields with ones appropriate for your project. Finally, lines 22 through 25 define the task itself. To actually build the gem, you would now run:

$ rake gem
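A specification can also be built and inspected on its own, without rake, which is handy for checking the fields before packaging. The sketch below reuses the placeholder values from the example above and passes an array to extensions. As far as I can tell, newer rake releases no longer ship rake/gempackagetask and expect Gem::PackageTask from rubygems/package_task instead; the code here does not depend on either.

```ruby
require 'rubygems'

# Build a specification in memory; nothing is written to disk.
spec = Gem::Specification.new do |s|
  s.name       = 'example'
  s.version    = '1.0'
  s.summary    = 'C bindings'
  s.authors    = ['List of Authors']
  s.files      = ['Rakefile', 'ext/extconf.rb']
  s.extensions = ['ext/extconf.rb']
end

puts spec.full_name           # example-1.0
puts spec.extensions.inspect  # ["ext/extconf.rb"]
```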

[1] http://libvirt.org/ruby
[2] http://ruby-doc.org/core
[3] http://ruby-doc.org/docs/ProgrammingRuby/html/ext_ruby.html
[4] https://github.com/ffi/ffi
[5] http://jasonseifer.com/2010/04/06/rake-tutorial

Update: modified some of the examples to make sure the code wasn't cut-off
