Tuesday, October 4, 2011

Git bash prompts and tab completion

Someone recently asked me about my nifty bash command-prompt with git branch names. If I'm not in a git directory, then the bash prompt looks normal:

[clalance@localhost ~]$

However, as soon as I cd into any directory that is a git repository, my prompt changes:

[clalance@localhost oz (master)]$

If I'm in the middle of a rebase, my prompt looks like:

[clalance@localhost oz (master|REBASE-i)]$

There are many other prompt states, but that gives you a taste of what you get. All of this goodness comes from the git-completion file that is shipped along with the git sources. The canonical place for git-completion.sh is the upstream git source tree; you can see it here: http://repo.or.cz/w/git.git/blob/HEAD:/contrib/completion/git-completion.bash. Basically, you download that file, put it somewhere in your home directory (mine is at ~/.git-completion.sh), source it from your .bashrc, and then modify your PS1 to call the appropriate function (__git_ps1). The end of my .bashrc looks like:

source ~/.git-completion.sh
export PS1='[\u@\h \W$(__git_ps1 " (%s)")]\$ '

An additional benefit of sourcing .git-completion.sh is that you get tab completion of branch names, which is also a very useful feature.
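If you are curious how a prompt can know the branch at all, here is a minimal sketch of the idea. It assumes an ordinary repository where .git is a directory, and the function name my_git_branch is hypothetical: it is not part of git-completion, and __git_ps1 handles many more cases (rebases, merges, and so on) than this does.

```bash
# Hypothetical helper (not part of git-completion): print " (branch)" when
# the current directory is inside a git work tree, by walking up the
# directory tree until a .git/HEAD file is found.
my_git_branch() {
    local dir="$PWD" head
    while [ -n "$dir" ]; do
        if [ -f "$dir/.git/HEAD" ]; then
            head=$(<"$dir/.git/HEAD")
            case "$head" in
                "ref: refs/heads/"*)
                    # On a branch: HEAD holds a symbolic ref to it.
                    printf ' (%s)' "${head#ref: refs/heads/}" ;;
                *)
                    # Detached HEAD: HEAD holds a commit hash; show 7 chars.
                    printf ' (%.7s...)' "$head" ;;
            esac
            return 0
        fi
        dir="${dir%/*}"   # strip the last path component and try again
    done
}

# Same shape as the PS1 line above, with the hypothetical helper swapped in.
PS1='[\u@\h \W$(my_git_branch)]\$ '
```

Because the function prints nothing outside a repository, the prompt falls back to the normal form automatically.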

Friday, September 23, 2011

RPM dependency trees

Recently I wondered what the dependency tree for Aeolus looked like in Fedora. I knew we had a whole host of dependencies, but I thought it would be instructive to see it visually.

This has been mentioned in other blog posts in the past, but the basic procedure to do this on Fedora is:

# yum install rpmorphan graphviz
$ rpmdep -dot aeolus.dot aeolus-all
$ dot -Tsvg aeolus.dot -o aeolus.svg

The rpmorphan package provides the rpmdep binary, a perl script that runs through the RPM dependency information and outputs one digraph node per line. Then we use dot (part of the graphviz package) to take that digraph information and generate an image out of it. In the above example I made it generate an SVG, but you can have it output PNG, JPEG, PDF, etc. The full list of output formats dot supports is here: http://www.graphviz.org/doc/info/output.html
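dot does not care where its input comes from; any file in the DOT digraph format will do. As a sketch of what that format looks like, here is a hypothetical helper, pairs_to_dot (not part of rpmorphan or graphviz), that turns "package dependency" pairs into a digraph. rpmdep's own output may differ in naming and styling; the package names in the usage comment are placeholders.

```bash
# Hypothetical helper (not part of rpmorphan or graphviz): read
# "package dependency" pairs, one per line, on stdin and write a
# Graphviz DOT digraph with one edge per pair on stdout.
pairs_to_dot() {
    local from to
    printf 'digraph deps {\n'
    while read -r from to _; do
        # Skip blank or incomplete lines.
        [ -n "$from" ] && [ -n "$to" ] || continue
        printf '  "%s" -> "%s";\n' "$from" "$to"
    done
    printf '}\n'
}

# Usage (package names are placeholders):
#   printf 'aeolus-all pkg-a\npkg-a pkg-b\n' | pairs_to_dot > deps.dot
#   dot -Tsvg deps.dot -o deps.svg
```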

Thursday, September 15, 2011

Oz 0.7.0 release

I'm pleased to announce release 0.7.0 of Oz. Oz is a program for doing automated installation of guest operating systems with limited input from the user.

Release 0.7.0 is a bugfix and feature release for Oz. Some of the highlights between Oz 0.6.0 and 0.7.0 are:

  • Ability to use the "direct initrd injection" method to install Fedora/RHEL guests. This is an internal implementation detail, but can significantly speed up installs for Fedora or RHEL guests (thanks to Kashyap Chamarthy for the tip).

  • Support for Fedora-16 (thanks to Steve Dake for help in making this work)

  • Use the serial port to announce guest boot, rather than a network port. This means we no longer have to manipulate iptables, and it gets us one step closer to having Oz run as non-root.

  • (for developers) Rewritten unit tests in Python for speedier execution

  • (for developers) Additional methods in the TDL class to merge in external package lists (thanks to Ian McLeod)


A tarball of this release is available, as well as packages for Fedora-14, Fedora-15, and RHEL-6. Note that to install the RHEL-6 packages, you must be running RHEL-6.1 or later. Instructions on how to get and use Oz are available at http://aeolusproject.org/oz.html

If you have any questions or comments about Oz, please feel free to contact aeolus-devel@lists.fedorahosted.org or me (clalance@redhat.com) directly.

Thanks to everyone who contributed to this release through bug reports, patches, and suggestions for improvement.

Thursday, September 8, 2011

New required kickstart line in Fedora 16

Just a quick note for anyone looking at Fedora-16. From Fedora-16 forward, you need a new line in your kickstart that looks like:
part biosboot --fstype=biosboot --size=1

I'm honestly not sure exactly what this is needed for, but unattended kickstart installs will not start without it. There is a bit more information at https://fedoraproject.org/wiki/Anaconda/Kickstart
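If you maintain a pile of kickstart files, a small script can add the line for you. This is a minimal sketch, assuming the usual layout where commands come before the first %-section (such as %packages); add_biosboot is a hypothetical name, and it only checks for an existing "part biosboot" line, not for other partitioning setups.

```bash
# Hypothetical helper: add the Fedora 16 biosboot partition line to a
# kickstart file, placing it before the first %-section (e.g. %packages),
# or at the end if there is no such section. Does nothing if a
# "part biosboot" line already exists.
add_biosboot() {
    local ks="$1" tmp
    local line='part biosboot --fstype=biosboot --size=1'
    grep -Eq '^part[[:space:]]+biosboot' "$ks" && return 0
    tmp=$(mktemp) || return 1
    awk -v line="$line" '
        !done && /^%/ { print line; done = 1 }
        { print }
        END { if (!done) print line }
    ' "$ks" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back with cat so the original file keeps its permissions.
    cat "$tmp" > "$ks"
    rm -f "$tmp"
}
```

Running it twice on the same file is safe, since the grep check makes the second call a no-op.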

Tuesday, September 6, 2011

Services and systemd

I spent some time last week poking around systemd and trying to figure out how certain things work. I can't claim to be an expert yet, but I did uncover some things that I found to be very useful.

If you try to start a service on a machine with systemd (Fedora-15, for instance), it actually looks different than with a traditional SysV-style init.

SysV:

[root@localhost ~]# service mongod start
Starting mongod: [ OK ]
[root@localhost ~]# service mongod stop
Stopping mongod: [ OK ]
[root@localhost ~]# /etc/init.d/mongod start
Starting mongod: [ OK ]
[root@localhost ~]# /etc/init.d/mongod stop
Stopping mongod: [ OK ]
[root@localhost ~]#

Systemd:

[root@localhost ~]# service mongod start
Starting mongod (via systemctl): [ OK ]
[root@localhost ~]# service mongod stop
Stopping mongod (via systemctl): [ OK ]
[root@localhost ~]# /etc/init.d/mongod start
Starting mongod (via systemctl): [ OK ]
[root@localhost ~]# /etc/init.d/mongod stop
Stopping mongod (via systemctl): [ OK ]
[root@localhost ~]#

"(via systemctl)" is a small but important change to how services are launched. With SysV-style scripts, the scripts are executed more-or-less directly from the bash shell they are launched from (the "service" binary does a little more in terms of cleaning up the environment, but it still ends up exec'ing the script in the end).

With systemd this all changes. One of the first things nearly all initscripts do is source /etc/init.d/functions. On a systemd-enabled system, the very first thing that /etc/init.d/functions does is execute systemctl and then exit (ignoring the rest of the initscript). systemctl then puts a message on dbus asking for the service you specified to be started. systemd itself is listening on dbus; when it sees a message like this, it picks it up and acts on it. It first looks to see if there is a native systemd unit file for this service; if there is, it starts the service according to the native unit file and returns the status to systemctl, which returns it to service. If there is no native systemd unit file, it then looks in /etc/init.d for a legacy script. If one is found, it forks and execs that script, and returns the status to systemctl, which returns it to service.

This leads to one of the most visible issues with systemd: there is no output if the initscript fails. That is, if you are using a legacy-style initscript and are used to seeing certain output when something fails, it may not be shown anymore, since that output was consumed by systemd itself and not returned to systemctl.

One way to deal with this is to skip the redirect from service to systemctl. There are different ways to do this depending on whether you are using the "service" binary or executing the script directly. If you are using the service binary, it understands a new flag to skip redirection to systemd:
service --skip-redirect foo start
If you are directly executing the initscript, you need to pass an environment variable:
SYSTEMCTL_SKIP_REDIRECT=1 /etc/init.d/foo start
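The redirect decision described above boils down to a few checks. The following is a simplified, hypothetical model of that logic as a standalone function (should_redirect is an invented name, and its second argument stands in for however the real script detects systemd); it is not the actual contents of /etc/init.d/functions, which may differ in detail.

```bash
# Hypothetical, simplified model of the redirect decision; not the real
# /etc/init.d/functions code.
#   $1: the path the initscript was invoked as (what the script sees as $0)
#   $2: "yes" if the system was booted with systemd
# Returns 0 if the request should be handed to systemctl, 1 if the
# legacy script should keep running itself.
should_redirect() {
    local script="$1" systemd_running="$2"
    # The escape hatch described above: SYSTEMCTL_SKIP_REDIRECT=1
    if [ -n "${SYSTEMCTL_SKIP_REDIRECT:-}" ]; then
        return 1
    fi
    if [ "$systemd_running" != yes ]; then
        return 1
    fi
    case "$script" in
        /etc/init.d/*|/etc/rc.d/init.d/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   should_redirect /etc/init.d/mongod yes && echo "hand off to systemctl"
```

Modeling it this way makes the two escape routes easy to see: either the variable is set, or the script is not being run as an initscript under systemd.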

Monday, August 22, 2011

Oz 0.6.0 release

I'm pleased to announce release 0.6.0 of Oz. Oz is a program for doing automated installation of guest operating systems with limited input from the user.

Release 0.6.0 is a bugfix and feature release for Oz. Some of the highlights between Oz 0.5.0 and 0.6.0 are:

  • The ability to specify the destination for the ICICLE output from oz-install
    and oz-generate-icicle

  • pydoc class documentation for all internal Oz classes

  • Automatic detection of KVM or QEMU at runtime (this allows Oz to be used within virtual machines, although with a large performance hit)

  • Less scary warning messages in the debug output

  • Printing of the screenshot path when a build fails

  • Ability to run multiple Oz installs of the same OS at the same time

  • Support for OEL and ScientificLinux

  • Support for RHEL-5.7

  • Support for CentOS 6

  • Support for OpenSUSE arbitrary file injection and command execution

  • Ability to make the TDL (template) parsing enforce a root password

  • Rejection of localhost URLs for repositories (since they must be reachable from the guest operating system, localhost URLs make no sense)


Fedora-14, Fedora-15, and RHEL-6 packages are available for this release. Note that to install the RHEL-6 packages, you must be running RHEL-6.1 or later. Instructions on how to get and use Oz are available at http://aeolusproject.org/oz.html

If you have any questions or comments about Oz, please feel free to contact aeolus-devel@lists.fedorahosted.org or me (clalance@redhat.com) directly.

Thanks to everyone who contributed to this release through bug reports, patches, and suggestions for improvement.

Friday, July 29, 2011

Release of ruby-libvirt 0.4.0

I'm pleased to announce the release of ruby-libvirt 0.4.0. ruby-libvirt is a ruby wrapper around the libvirt API. Version 0.4.0 brings new APIs, more documentation, and bugfixes:

  • Updated Domain class, implementing dom.memory_parameters=,
    dom.memory_parameters, dom.updated?, dom.migrate2,
    dom.migrate_to_uri2, dom.migrate_set_max_speed,
    dom.qemu_monitor_command, dom.blkio_parameters,
    dom.blkio_parameters=, dom.state, dom.open_console, dom.screenshot,
    and dom.inject_nmi

  • Implementation of the Stream class, which covers the
    libvirt virStream APIs

  • Add the ability to build against non-system libvirt libraries

  • Updated Error object, which now includes the libvirt
    code, component and level of the error, as well as all of
    the error constants from libvirt.h

  • Updated Connect class, implementing conn.sys_info, conn.stream,
    conn.interface_change_begin, conn.interface_change_commit, and
    conn.interface_change_rollback

  • Updated StorageVol class, implementing vol.download and vol.upload

  • Various bugfixes


Version 0.4.0 is available from http://libvirt.org/ruby:

Tarball: http://libvirt.org/ruby/download/ruby-libvirt-0.4.0.tgz
Gem: http://libvirt.org/ruby/download/ruby-libvirt-0.4.0.gem

It is also available from rubygems.org; to get the latest version, run:

$ gem install ruby-libvirt

As usual, if you run into questions, problems, or bugs, please feel free to
mail me (clalance@redhat.com) and/or the libvirt mailing list.

Thursday, July 7, 2011

Oz is now in Fedora!

Thanks to the work by Pádraig Brady, Oz is now available in Fedora 14, 15, rawhide, and EPEL 6. It should be making its way out to the mirrors shortly. Problems with the Oz package in Fedora should be reported to Red Hat Bugzilla.

Thursday, June 30, 2011

Oz 0.5.0 release

I'm pleased to announce release 0.5.0 of Oz. Oz is a program for doing automated installation of guest operating systems with limited input from the user.

Release 0.5.0 is a bugfix and feature release for Oz. Some of the highlights between Oz 0.4.0 and 0.5.0 are:


  • Replace the icicle-nc binary with a shell script that tries various methods. Besides being more portable, this also allows us to convert Oz to a noarch RPM (thanks to Padraig Brady)

  • Support for Ubuntu 6.06

  • Support for md5sum/sha1sum/sha256sum checking of ISOs after download

  • New -x flag for oz-install to write the libvirt XML file to a user-specified location (thanks to Steve Dake)

  • Support for OpenSUSE customization

  • Support for F-15 customization (thanks to Steve Dake)

  • Support for running commands at the end of package installation (thanks to Steve Dake)



Fedora 14 and RHEL-6 packages are available for this release. Note that to install the RHEL-6 packages, you must be running RHEL-6.1 or later. Instructions on how to get and use Oz are available at http://aeolusproject.org/oz.html

If you have any questions or comments about Oz, please feel free to contact
aeolus-devel@lists.fedorahosted.org or me (clalance@redhat.com) directly.

Thursday, June 23, 2011

Release of libdeltacloud 0.9

I'm pleased to announce the release of libdeltacloud 0.9. libdeltacloud is a library for accessing the Deltacloud API from C or C++ programs. There are several important changes in this release:

  • Performance optimizations

  • Support for deltacloud driver section

  • Support for deltacloud buckets and blobs

  • Support for deltacloud loadbalancers

  • Full doxygen documentation of all of the functions

  • Other minor bugfixes


The release is available from: http://people.redhat.com/clalance/libdeltacloud/0.9

The source code is pushed to the git repository, which lives here: git://git.fedorahosted.org/deltacloud/libdeltacloud.git

As always, questions, comments, problems and patches are welcome!

Thursday, June 9, 2011

Oz 0.4.0 release

I'm pleased to announce release 0.4.0 of Oz. Oz is a program for doing automated installation of guest operating systems with limited input from the user.

Release 0.4.0 is a bugfix and feature release for Oz. Some of the highlights between Oz 0.3.0 and 0.4.0 are:

  • Automatic detection/use of cached JEOS images (the previous method required the user of the oz libraries to detect whether the image was cached or not)

  • Fixes to make Debian installs work for both x86_64 and i386

  • Better honor user-provided automatic installation files by not modifying them

  • Use pyparted to generate partitions on newly created disks (if necessary)

  • Add support for Fedora 15

  • Add the ability to supply an initial root/Administrator password to the diskimage

  • Abort installs after 5 minutes of disk inactivity

  • Use M2Crypto to generate SSH keys (if necessary)

  • Some pydoc documentation (more of this to come in the future)

  • Support for Ubuntu 11.04

  • Support for RHEL-6.1


More documentation on Oz (including links on how to obtain it) is available here: http://aeolusproject.org/oz.html

For this release, Fedora 14 and RHEL-6 packages are available. Note that I have stopped providing Fedora 13 packages, since that distribution is (or will be) out of support. Also note that to install the RHEL-6 packages, you must be running RHEL-6.1 or later.

If you have any questions or comments about Oz, please feel free to contact aeolus-devel@lists.fedorahosted.org or me (clalance@redhat.com) directly.

Tuesday, May 10, 2011

Aeolus 0.2.0 Released

(this is a copy-n-paste from a release announcement sent out by Justin Clift)

The Aeolus team are pleased to announce a significant new release of Aeolus, the multi-cloud deployment solution, sponsored by Red Hat.

This release, version 0.2.0, marks the introduction of several important
enhancements:

  • Added capability to import Amazon AMIs through web UI

  • Self-service users can now create templates and launch instances

  • Streamlined configuration data to provide immediately usable installation

  • Now uses the 2nd incarnation of Image Factory to build images

  • Added ability to upload existing images to newly added Cloud Providers

  • Added ability to create pools

  • Added ability to map Cloud Provider Realms to Cloud Engine Realms


And several minor fixes or functionality improvements:

  • Image Warehouse is now the canonical store for image-related data

  • Improved stability and error reporting of QMF communication with Image Factory

  • Web interface verified to work end-to-end with the testing driver (mock)

  • Added ability to search on Instance owner

  • Added end-to-end testing framework for Aeolus scripts

  • Bugfix: BZ #690467 - Realm: Deleting mapping of realm w/o selecting anything throws error


Aeolus is still some way from feature complete, so it is suitable for
testing with, but please don't use it with live data at this stage.

Installation is very straightforward, with instructions on the website:

http://www.aeolusproject.org/get_it.html

Packages are available for Fedora 14 x86_64, and RHEL 6.1 x86_64 (Release Candidate or later). Support for Fedora 13 has been discontinued with this release.

Upgrading from previous releases can be done using yum. Due to a recent reorganisation of the yum repositories, you need to point to a new repo first, as shown in the above instructions.

All testing and bug reports are hugely appreciated, directly contributing towards the quality of Aeolus releases and future growth.

Tuesday, April 19, 2011

Release of libdeltacloud 0.8

I'm pleased to announce the release of libdeltacloud version 0.8. libdeltacloud is a library for accessing the Deltacloud API from C or C++ programs. There are several important changes in this release:

  • The API for deltacloud_create_instance() has been changed, hopefully for the last time. The previous version of the API had separate parameters for every option we wanted to pass to the deltacloud server. This was confusing and required that we break the API every time we wanted to add a new parameter. The new version of the API expects an array of deltacloud_create_parameter structures that describe the options to pass to the create call. This is more intuitive, more flexible, and shouldn't require us to break the API again.

  • We no longer expect return data from POST calls to the deltacloud API. The deltacloud API does not guarantee any return data from a POST call (such as that used to create new instances). libdeltacloud was erroneously assuming it would get data back, which was causing problems with certain deltacloud backends. The new version doesn't expect any return data from POST, and instead requires the user of the library to make an additional GET call if they want additional information about an instance.

  • Other minor bugfixes


The release is available here.

The source code is available from: git://git.fedorahosted.org/deltacloud/libdeltacloud.git

As always, questions, comments, problems and patches are welcome!

Thursday, March 31, 2011

Release of Oz 0.3.0

I'm pleased to announce release 0.3.0 of Oz. Oz is a program for doing automated installation of guest operating systems with limited input from the user.

Release 0.3.0 is mostly a bugfix release for Oz. However, a few minor features are included as well. Some of the highlights between Oz 0.2.0 and 0.3.0 are:

  • Add the ability to specify the output directory for screenshots (the default
    is still CWD).

  • Add the ability to specify the bridge to use for installs (the default is still the first libvirt NAT bridge on the system).

  • Nicer error messages from the command-line tools when debugging is off.

  • Add the ability to install Debian 5 and 6.

  • Detect servers that can't possibly work for installing RedHat/Fedora OSs, and fail the install.

  • Allow installs with spaces in the name.

  • Add the ability to install OpenSUSE 11.4.


More documentation on Oz is available here: http://aeolusproject.org/oz.html

If you have any questions or comments about oz, please feel free to contact aeolus-devel@fedorahosted.org or me (clalance@redhat.com) directly.

Thursday, March 17, 2011

Release of Oz 0.2.0

I'm pleased to announce release 0.2.0 of Oz. Oz is a program for doing automated installation of guest operating systems with limited input from the user.

There are quite a few bugfixes, features, and updates to Oz between 0.1.0 and 0.2.0. Some of them are:

  • The ability to upload individual files into a guest

  • The ability to specify extra repositories from which to install packages on the guest

  • Updated documentation

  • More control over Oz's caching behavior. You can now specify whether you want Oz to cache the original ISO, the modified ISO, the resulting JEOS image, or any combination of the above

  • Support for installing Ubuntu 10.04 and 10.10 guests

  • Support for installing CentOS 3, 4, and 5 guests

  • Support for OpenSUSE 11.0

  • Faster ISO extraction/generation

  • A user-configurable timeout for installation


More documentation on Oz is available here: http://aeolusproject.org/oz.html

If you have any questions or comments about oz, please feel free to contact aeolus-devel@fedorahosted.org or me (clalance@redhat.com) directly.

Friday, March 11, 2011

Release of libdeltacloud 0.7.0

I just pushed out a new release of libdeltacloud v0.7. This version adds the ability to override specific portions of a hardware profile when launching an instance. For example, when launching a cloud instance on GoGrid, you have the choice of several pre-canned instance sizes. However, you can also optionally override the amount of memory while launching the instance, so you get something of a "custom" size. This new libdeltacloud release allows for custom memory, CPU, and storage sizes when launching instances.

Tuesday, February 8, 2011

FUDCon 2011 - Day 2

Day 2

The second day of FUDCon was much shorter than the first day. In particular, there were only presentations in the morning, and the afternoon was devoted to hackfests. I started out by going to a few talks.

Asterisk Hacks

This was a fun session for me to attend. I've always wanted to set up Asterisk at home, if only to play with it. Up until now, I haven't found the time, but I'm hoping to do so in the near future. This talk was about various out-of-the-way features of Asterisk.

Being a telephony system, Asterisk can do all of the normal stuff you associate with telephony: make and receive calls, do call queuing, voicemail, 3-way calling, conference calls, etc. However, there are a number of fun things that Asterisk can do.

The first one is the ability to shift the pitch of your voice. There is a PITCH_SHIFT setting in the Asterisk configuration files that allows you to change the pitch of the voice on the sending or receiving end. There don't seem to be too many practical uses, but it could be fun to be both the secretary and manager of your "company".

The next hack is the ability to avoid someone. Asterisk can do call routing based on the caller ID coming in. You can then have it automatically hang up, redirect to voice mail, or even redirect to another number. All very useful if you are trying to avoid an irate ex-girlfriend ;).

You can also use Asterisk to increase the volume on a connection automatically for certain phone numbers. This might be useful if you often talk to someone that is very soft-spoken.

Finally, Asterisk can be used to trigger calls based on calendar events. It currently has support for integrating with CalDAV, iCal, and Exchange calendars, and it can trigger a call when an event on one of those calendars is starting. This could be very useful for remembering meetings.

GIMP as a pro photo editing tool

I then went to the session on GIMP hosted by María (tatica). She was fun and engaging, and showed off a few tools including GIMP and a couple of others for editing photos. Unfortunately I was deep in the middle of hacking, so while I looked up from time to time, I did not fully listen. Sorry María!

Matahari

The last official talk I went to was about Matahari, which is a piece of software aiming to provide remote APIs for system management. The full slide deck is here; I'll just go over some highlights.

  • Matahari is aiming to provide cross-platform, generically useful APIs over remote interfaces
  • Initial targets are for machine/guest introspection and general OS management
  • Matahari is implemented as a QMF agent; uses the AMQP protocol underneath
  • Matahari does not provide policy; needs to be driven externally
  • Initial agents are for Host, Network, Services

This was the last session of the day. The rest of the day was devoted to hackfests. For my part, I spent a bunch of time with Marek Goldmann, the maintainer of BoxGrinder. BoxGrinder and my own project, Oz, share some common goals. The nice part, however, is that there isn't a lot of overlap, and we talked a lot about integrating Oz with BoxGrinder and how to speed up operating system installs. I also got quite a bit of hacking done, so I thought this was a useful exercise for me.

All in all, I thought that it was a useful and productive FUDCon. I would definitely go to another one to learn about the things going on in Fedora and meet some more intelligent, fun people.

FUDCon 2011 - Day 1

(sorry this is late; it just took me a couple of weeks to get my act together and get this written)

The weekend of Jan 29 and 30 I went to FUDCon 2011 in Tempe, Arizona. Thanks very much to Robyn Bergeron and all of the rest of the people who did the planning and logistics. This was my first FUDCon, and from my point of view things went off very smoothly.

As has been mentioned elsewhere, the location of the event was excellent. Since I was coming from the snowy northeast, it was great to get a break and have some nice warm weather. The fact that both hotels were in walking distance of the conference venue was also a big plus; I didn't have to rent a car or take a taxi or anything like that.

I arrived on Friday night and headed straight to the primary hotel where the welcome event and opensource.com birthday party were happening. It was nice to meet some people I had only spoken with on IRC, and to have a few drinks and pizza :).

Day 1

On Saturday morning the conference started. My introduction to a "barcamp"-style conference started when I showed up at the door to the conference and was enlisted to help bring some boxes down to the initial meeting room. People started filtering in, and after some logistics we headed straight into the "pitches" for the talks. Each person who wants to give a talk gives a 20-second synopsis of what they will talk about. Then each of the conference attendees gets to vote for the talks that they want to hear. I pitched my talk about "Cloud Management - Deltacloud, Aeolus, and friends", and after the voting it seemed like there was a lot of interest in the topic. While the votes were being tallied and the schedule was being finalized, Jared Smith (the Fedora Project Leader) gave his "State of Fedora" talk. After that we headed over to the main conference venue. My talk was scheduled for the end of the day on Saturday, so I was able to enjoy a few talks before giving mine.

Fedora on ARM

The first talk I attended was the "Fedora on ARM" talk. ARM is one of the world's most widely produced processors, and runs on everything from phones up to servers (in the near future). Why do we want to port Fedora to it? First, the OLPC XO 1.75 is using an ARM processor, and we would like to continue to be able to run Fedora on it. Second, several development boards are currently in production (GuruPlug, Panda board, etc), and running Fedora on those is a great way to get started with ARM development. Third, future tablets and netbooks will likely have ARM processors. And finally, ARM is looking to expand into the server market to compete with x86. Since this is an area that Linux has dominated, it would be good to get a head start here. Servers running ARM aren't quite available yet, but when they do become available they are projected to use 75% less power than equivalent x86 machines.

Previous efforts to port Fedora to ARM have fizzled out. This new effort has a koji build instance at arm.koji.fedoraproject.org, and any developer can submit builds there. It is hoped that this will encourage more participation in the process. The current effort basically has Fedora 13 running on ARM, though there are a couple of problems preventing a full "release". Right now the target platform is ARMv5tel, though they are looking to move to ARMv7 in the future. Now that Fedora 13 mostly works, getting Fedora 14 working is expected to be much easier. All of the patches that have been needed have been sent upstream, and in general things should move a lot faster. In theory, it should be easy to support both ARMv5 and ARMv7 in Fedora (as the instruction sets are compatible), but the reality of the situation is a bit different. ARMv5 does not require an FPU, so it often has to fall back to software floating point. The ARMv7 specification requires an FPU. Additionally, on ARMv7 some parameters can be passed in FP registers, which obviously will not work on ARMv5. While you can definitely compile for ARMv5 and then run on ARMv7, the performance hit is pretty dramatic. Therefore, there will be two sub-architectures for ARM, one targeting v5 and one targeting v7.

There have been a number of challenges getting Fedora building on ARM. On the hardware side, most of the boards that are available are fairly slow: 512MB of RAM, a single core, and slow I/O seem to be the norm. There are some new "Panda boards" that have 1GB of memory and 2 cores, but the Ethernet connection is slow (100Mbit), and the storage subsystem is slow. What this all means is that compiling on these machines takes a long time. On the software side, the challenges have mostly been around software that does not support ARM. For instance, OCaml in Fedora-13 does not support ARM (though this is fixed in Fedora-15). Other packages have had to be patched to deal with ARM alignment problems, compile problems, etc.

In terms of using Fedora on ARM, there are a few hurdles to overcome. The first is that for a random off-the-shelf ARM board, the bootloaders tend to be proprietary. That means that it is not always clear how to load a custom kernel onto the board. The u-boot project can help with many, but not all, boards. Even if you have figured out how to load a custom kernel, the Fedora ARM port currently does not provide a packaged kernel, so it is up to you to compile one and create a minimal root filesystem to get the system bootstrapped.

The last part of this session was a quick overview of OLPC. The new version of the OLPC (1.75) will be running an ARMv7 processor, and they want to continue to be able to run Fedora on them. At the same time, there are 2 million x86-based OLPC machines out in the wild, so they want to continue to run Fedora on those machines. In particular, keeping the base distribution small and keeping dependencies down will help on the old OLPCs which only have 256MB of memory.

Sheepdog

The next talk I attended was about Sheepdog, a distributed storage system for Qemu/KVM block storage. I had previously seen a talk about this at KVM Forum 2010, but I went in hoping to understand some more details.

What's the motivation for building Sheepdog? It has a few advantages over other distributed block storage systems. The first is that it is open source, unlike many of its competitors. The second is that hardware-based distributed block storage systems, like SANs, are very expensive and can be a single point of failure. The third is that other open-source distributed block storage systems (like Ceph and Lustre) are complex to set up and administer, and don't always have the expected performance characteristics.

Sheepdog is a fully symmetric, reliable, scalable, and well-performing distributed block storage system. There is no central node, hence no single point of failure. Instead, blocks are distributed around the machines in a clock-like fashion, so the loss of a single machine (or, in most cases, 2) does not cause data to be lost. Sheepdog is intimately tied into Qemu; it uses block layer hooks in Qemu to do its work. On the machine that is running the qemu/kvm process, there is a sheep server process that qemu talks to in order to read and write blocks. When it receives a request for a block that is not on the local node, the sheep server process reaches out to the sheep processes running on other machines and requests the block it is looking for. Because the data is broken up into 4MB chunks and distributed around the nodes, the performance should be comparable to other network-based storage systems (such as NFS), with much higher reliability.
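To make the 4MB-chunk, clock-like layout a bit more concrete, here is a toy sketch of the arithmetic. It assumes copies are placed on consecutive nodes around a ring using a plain modulo; Sheepdog itself hashes objects onto its ring, so real placement differs, and the function names are made up for illustration.

```bash
# Toy model only: Sheepdog hashes objects onto a ring of nodes; here a
# plain modulo stands in for the hash so the arithmetic is easy to follow.
OBJ_SIZE=$((4 * 1024 * 1024))   # 4MB chunks, as described in the talk

# object_for_offset BYTE_OFFSET: index of the 4MB chunk holding that byte.
object_for_offset() {
    echo $(( $1 / OBJ_SIZE ))
}

# nodes_for_object INDEX NODE_COUNT COPIES: nodes holding each copy,
# walking "clockwise" around the ring from the object's home node.
nodes_for_object() {
    local idx="$1" n="$2" copies="$3" i out=""
    for (( i = 0; i < copies; i++ )); do
        out+="$(( (idx + i) % n )) "
    done
    echo "${out% }"
}
```

For example, with 4 nodes and 3 copies, `nodes_for_object 5 4 3` prints `1 2 3`: any two of those nodes can fail and one copy of object 5 still survives, which is the property the talk described.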

Sheepdog currently uses the Corosync Engine to do cluster-like operations, which has advantages and disadvantages. The obvious advantage is that Corosync is a well-tested piece of cluster infrastructure that uses the well-known Totem ring protocol to synchronize nodes. The disadvantage is that Corosync is limited to around 48 nodes in total. Scaling to larger numbers of nodes will require developing a new protocol.

Cloud: The future of computing

I next went to Mike McGrath's talk on the future of cloud computing. The first part of his talk explained the advantages that businesses see in cloud computing (vs. traditional, hardware-hosted-on-premises computing): cost, scalability, and the ability to pay as you go. In this section he also talked about the differences between IaaS, PaaS, and SaaS. Infrastructure as a service (IaaS) is the idea of running virtual machines on hardware that you do not necessarily control. Amazon EC2 is the canonical example of IaaS. Platform as a service (PaaS) is the idea of running your application on a platform that is already prepared for you. Web hosting providers fall into this category. Finally, software as a service (SaaS) is the idea of being completely divorced from the maintenance of the system. Things like Google Docs fall into this category. He then demoed "Libra", an application that he is working on to quickly deploy PaaS applications.

The second part of his talk was about how cloud computing is set to replace traditional computing, and how ill-prepared free software is to deal with that new future. In particular, his points were that:

  1. HTML5 is set to replace traditional desktop toolkits with a rich interface
  2. The new model of cloud computing encourages consumption, not contribution
  3. Duplicating a cloud computing model becomes easy (given enough hardware), so there is little incentive for companies to open-source their software stack
  4. Current licenses are not prepared for cloud computing
  5. The open source crowd will have to settle for open standards, rather than open source

While I agree with him on some points, I don't believe that cloud computing is going to completely take over the computing landscape. Computing goes through cycles from thin clients to thick clients, and back again. While cloud computing will almost certainly be a part of the IT landscape for a long time to come, it will be just that: a part. Not every application is suitable for cloud computing. Not every user wants to outsource their concerns to a third party.

Pulp

The last session I went to before my own was about Pulp, which is a software repository management system. It can manage packages, errata, and distros, along with doing repository synchronization from a feed or custom repos.

The architecture is roughly as follows.


Essentially, there is a central pulp server that can be interfaced with via REST. It uses MongoDB to hold all of the metadata about packages, errata, etc., and can push requests (via AMQP) to consumer machines. The consumer machines run a small pulp daemon that continually monitors for changes, so if modifications are made outside of pulp, the main server can be notified. The current version of pulp supports modern Fedora along with RHEL-5 and RHEL-6.

When managing repositories with pulp, there are a number of operations available. Repositories can be created, deleted, or updated based on commands from the admin. The data in the repositories can be sync'ed immediately or on a fixed schedule, and the content can come from yum repositories, RHN feeds, or locally created repositories. It also supports global package search across all repos, which may be useful for finding out-of-date packages in the repositories.

The consumers need to subscribe (bind) to certain pulp repositories. Once they are subscribed, the repositories are available through yum, so the normal yum commands will work to install additional packages. The pulp server can do remote installation of packages through the pulp API if desired. As mentioned before, the daemon running on the consumer can keep track of packages that were installed locally, and update the pulp server with that information. Finally, any actions done on the consumer are extensively audited for debugging/security purposes.

What's the difference between Pulp and Spacewalk? Pulp is much lighter-weight than Spacewalk, so it can be integrated separately. The hope is to break Spacewalk up into smaller, single-purpose components that are more reusable (and hence have more users).

Some advanced features of pulp include:
  • Batch operations on groups of repositories and groups of consumers
  • LDAP integration
  • Repository Cloning, so you can start with the same repository but use it for multiple different purposes (think development vs. production)
  • A tagging system for consumers
  • Full REST API

The future roadmap for Pulp includes:
  • Maintenance Windows (do updates at particular dates/times)
  • Scheduled installation of packages on consumers
  • Support for external content delivery systems to reduce bandwidth
  • Clone filters to filter out what packages to clone
  • HA and Failover
  • Internationalization

Cloud Management
It was then time for my talk. I'm not a great presenter, so I was a bit nervous to start with, but I eventually got into a flow with my talk. I spoke about both Deltacloud and the newly launched Aeolus project, which uses Deltacloud underneath. The outline I used is available here, so I won't go into too much additional detail here. I will say that I had quite a few people come to my presentation, and I had some good questions, so I thought it was successful from that point of view. I'll record some of the questions that I remember below; they make a nice addendum to the material from the talk itself. Any mistakes are mine, since I am working off of my imperfect memory.

Deltacloud
Q: Why use the Deltacloud API over something like the EC2 API, which seems to be becoming an industry standard?
A: The Deltacloud API is very useful right now, since there is no common API between cloud providers. While some providers are indeed implementing the EC2 API, it is not ubiquitous. If it turns out in the future that every cloud that we care about implements the EC2 API, then Deltacloud could be viewed as superfluous. Until that point, however, we think that it has significant value in avoiding cloud lock-in.

Oz
Q: What about making Oz more configurable, to allow users to upload their own kickstarts?
A: This is a great question, and the answer is that yes, we should make it more configurable. In response to this question, I implemented uploadable autoinstallation files which went into the Oz 0.1.0 release.

Aeolus Conductor
Q: Looking at the full diagram of Aeolus, it is a pretty complicated structure. How are Fedora users expected to install and configure this?
A: This is a valid question, and the answer is not fully formed yet. In order to solve many of the problems that Aeolus is trying to solve, we think the complexity of the internal components is needed. That being said, we really need to make it simple to deploy a simple Aeolus Conductor. To that end, we have a set of scripts called "aeolus-configure" that can install and configure a simple setup. This certainly needs work, but it is one way in which we can make this more consumable to end-users.

After my talk, there was FUDPub, which was a lot of fun. Not much to mention here, other than the combination of drinks, bowling, pool, and friends was fantastic. I would highly recommend doing something similar at future FUDCons.

Friday, February 4, 2011

Oz version 0.1.0

I've just sent the following mail to aeolus-devel@lists.fedorahosted.org and cloud@lists.fedoraproject.org, announcing the first release of the Oz tool:

All,
I'm pleased to announce the first release of Oz, version 0.1.0. Oz is
a tool for doing automated installation of operating systems using the native
installation tools. While Oz is part of the Aeolus umbrella, and thus intended
to be used in a cloud management environment, it is also useful as a standalone
tool to do guest installation.

This first release focuses on providing the basic infrastructure to do guest
installs. There are 3 possible phases to a guest install (and all 3 are
optional):

1) Installation of a JEOS. Oz creates an automated installation file (a
kickstart, winnt.sif, preseed, etc), and then launches a KVM virtual machine
to do the installation. By default, a minimal JEOS is always installed to
reduce the likelihood of installation failure (though this behavior can be
overridden).

2) Customization. Once a guest is installed from step 1) (or otherwise
provided), the virtual machine is launched and a native tool is used to do
installation of additional packages and files. For instance, in the case of
modern Linux, ssh is used to upload files and run commands to install
additional packages.

3) Manifest generation. After the guest installation and customization are
done (if any), a package manifest (and some other metadata) can be extracted
from the image.

As mentioned, all 3 steps above are optional; you can start at step 1 and run
all the way to step 3 using oz, or you can install a guest using another
method and skip directly to step 2 or 3. Documentation on how to do these
sorts of things is on the Oz webpage at http://aeolusproject.org/oz.html.

At present, Oz can install the following types of guests:
Fedora: 7, 8, 9, 10, 11, 12, 13, 14
Fedora Core: 1, 2, 3, 4, 5, 6
RHEL 2.1: GOLD, U2, U3, U4, U5, U6
RHEL 3: GOLD, U1, U2, U3, U4, U5, U6, U7, U8, U9
RHEL 4: GOLD, U1, U2, U3, U4, U5, U6, U7, U8
RHEL 5: GOLD, U1, U2, U3, U4, U5, U6
RHEL 6: 0
Ubuntu: 6.10, 7.04, 7.10, 8.04.[1,2,3,4], 8.10, 9.04, 9.10
Windows: 2000, XP, 2003
RHL: 7.0, 7.1, 7.2, 7.3, 8, 9
OpenSUSE: 11.1, 11.2, 11.3

Note that not all guest types can do all 3 of the steps above. The Oz webpage
lists which backends can do which phases.

Instructions on how to download Oz are at
http://aeolusproject.org/oz-download.html

As usual, comments, questions, bug reports, and patches are welcome for Oz.

Friday, January 28, 2011

Headed off to FUDCon

In a couple of hours I'll be boarding a plane to Phoenix to attend my first FUDCon. Assuming my talk gets voted on, I'll be talking about Deltacloud and Aeolus, two of the cloud projects currently going on at Red Hat. I don't go to very many conferences, and I've never been to FUDCon before, so this should be an interesting experience.

If you are going, see you there!

Monday, January 24, 2011

Exiting a Python program

You'd think this would be a straightforward topic, wouldn't you? In general it is simple, but there is a gotcha.

When you want to exit a program written in python, the typical way to do it is to call sys.exit(status) like so:

import sys

sys.exit(0)

For simple programs, this works great; as far as the python code is concerned, control leaves the interpreter right at the sys.exit method call. If you look under the hood, though, sys.exit is a little more interesting. It does not immediately call the libc function exit(), but instead raises a SystemExit[1] exception. If nothing catches the exception, then the python interpreter catches it at the end, does not print a stack trace, and then calls exit.

There are situations in which you do not want to raise an exception to exit the program. The example I recently came across was debugging some of my python code in Oz[2]. Oz is careful to clean up after itself, which means that on many exceptions it catches the exception, does some cleanup steps, and then re-raises the exception. In general this is a good thing to do, but sometimes when debugging I want to totally skip the cleanup steps so I can examine what went wrong.

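Enter os._exit(). This function is a thin wrapper around the C function _exit() (not exit()), so it does not raise an exception; it leaves the program immediately, without running cleanup handlers such as finally blocks or atexit functions, and without flushing stdio buffers. You would use it like: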

import os

os._exit(1)
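The same distinction exists one level down, in C: exit() runs atexit() handlers and flushes stdio, while _exit() does neither. The POSIX-only sketch below (handler_runs() and cleanup_handler() are made-up names for illustration) forks a child, registers an atexit() handler in it, and reports whether that handler ran under each kind of exit.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe the child uses to report that its handler ran. */
static int report_fd = -1;

/* atexit() handler: runs on exit(), but NOT on _exit(). */
static void cleanup_handler(void)
{
    assert(report_fd >= 0);
    ssize_t ignored = write(report_fd, "x", 1);
    (void)ignored;
}

/* Returns 1 if the child's atexit handler ran, 0 if not, -1 on error. */
static int handler_runs(int use_underscore_exit)
{
    int fds[2];
    char buf;
    pid_t pid;
    ssize_t n;

    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(cleanup_handler);
        if (use_underscore_exit)
            _exit(0);   /* like os._exit(): skips atexit handlers and stdio flushing */
        exit(0);        /* runs atexit handlers, as the interpreter's final exit does */
    }
    /* parent */
    close(fds[1]);
    n = read(fds[0], &buf, 1);   /* 0 (EOF) if the child never wrote */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1;
}
```

handler_runs(0) should report that the handler ran and handler_runs(1) that it did not, which mirrors why os._exit() is handy for skipping cleanup while debugging.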

[1] http://docs.python.org/library/exceptions.html
[2] http://www.aeolusproject.org/oz.html

Friday, January 21, 2011

Deltacloud support in upstream Condor

One of the main tasks that the Aeolus Conductor has to do is to schedule large numbers of cloud instances, possibly across several different cloud providers. In previous projects we've rolled our own sort of scheduler, but instead of re-inventing something for the Conductor, we decided to re-use a known component.

In this case we re-used Condor, a well-known grid (and general-purpose) scheduler. There is good precedent for putting cloud-like functionality into condor; not too long ago, direct EC2 support was added to it.

We didn't want to use the existing condor EC2 infrastructure, however, since that ties you into a single cloud. Instead we decided to write new code to utilize the Deltacloud API, which will give us nice cross-cloud functionality.

I'm pleased to say that as of January 7, the code to manage cloud instances via deltacloud is in upstream condor. That means that this code will be part of the condor 7.6 release. It is a bit complicated to use, which is why we hide the whole thing behind the conductor. However, it is possible to use the deltacloud support by hand, which I will describe below.

In order to take full advantage of condor's matching and scheduling capabilities, our current solution relies on having provider classads and instance classads. The provider classads describe each of the cloud provider combinations, along with their values. For instance, if you had two AMIs on EC2 that you wanted to be able to use, you would generate two separate classads with all of the data the same except for the AMI id. A provider classad looks something like:

Name="provider_combination_1"
MyType="Machine"
Requirements=true
# Stuff needed to match:
hardwareprofile="large"
image="fedora13"
realm="1"
# Backend info to complete this job:
image_key="ami-ac3fc9c5"
hardwareprofile_key="m1.large"
realm_key="us-east-1a"
provider_url="http://localhost:3002/api"
username="ec2_username"
password="ec2_secret_access_key"
cloud_account_id="1"
keypair="mykeypair"

(pruned for brevity and to protect the innocent). There are a few things to note here. First, the "stuff needed to match" category holds generic attributes that the job classad will match against. That is, the instance classad should not need to know any specifics of the cloud provider backend, just that it wants to run Fedora 13 on a large instance type. Once a provider (or providers) matching those requirements has been found, the values in the "backend info to complete this job" section are substituted into the actual instance classad, and then the job is submitted.

The jobs themselves then look something like:
universe = grid
executable = job_name
notification = never
requirements = hardwareprofile == "large" && image == "fedora13"
grid_resource = deltacloud $$(provider_url)
DeltacloudUsername = $$(username)
DeltacloudPassword = $$(password)
DeltacloudImageId = $$(image_key)
DeltacloudHardwareProfile = $$(hardwareprofile_key)
DeltacloudKeyname = $$(keypair)
queue

There is a lot going on here, so let's break it down. The first three fields are necessary, but not that interesting; they just give the job a name, tell condor to use the grid universe, and tell condor not to send notifications. The interesting things start happening on the "requirements" line. This is the line that specifies what has to match in order for this job to run. In this case, we are saying that condor gridmanager should look through the provider ads looking for one that has a hardwareprofile that matches "large" and an image that matches "fedora13". These values match what was put in our provider ad above, so condor selects that provider ad. Once it selects that provider ad, it can fill in the rest of the job ad $$ substitutions and submit the job. For instance, in our job ad DeltacloudUsername = $$(username). Once our provider ad has been selected above, that $$(username) gets replaced with "ec2_username". The same happens for the rest of the $$ substitutions.
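The $$ substitution step can be pictured with a small C sketch. This is not Condor's implementation: struct attr, lookup(), and expand() are hypothetical, and a provider ad is reduced to a flat list of name/value strings. It only shows each $$(name) in a job ad line being replaced with the value from the matched provider ad.

```c
#include <assert.h>
#include <string.h>

/* One name/value attribute from a (greatly simplified) provider classad. */
struct attr {
    const char *name;
    const char *value;
};

/* Find the value for a name of length len; NULL if the ad lacks it. */
static const char *lookup(const struct attr *ad, size_t n,
                          const char *name, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(ad[i].name) == len && strncmp(ad[i].name, name, len) == 0)
            return ad[i].value;
    return NULL;
}

/*
 * Copy `in` to `out`, replacing each $$(name) with the value from the
 * matched provider ad. Returns 0 on success, or -1 if a name is unknown,
 * a $$( is unterminated, or `out` is too small.
 */
static int expand(const char *in, const struct attr *ad, size_t n,
                  char *out, size_t outlen)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    while (*in) {
        const char *piece;
        size_t len;

        if (strncmp(in, "$$(", 3) == 0) {
            const char *end = strchr(in + 3, ')');
            if (end == NULL)
                return -1;
            piece = lookup(ad, n, in + 3, (size_t)(end - (in + 3)));
            if (piece == NULL)
                return -1;
            len = strlen(piece);
            in = end + 1;
        } else {
            piece = in;          /* ordinary character: copy it as-is */
            len = 1;
            in++;
        }
        if (used + len + 1 > outlen)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

With an ad containing username="ec2_username", expanding "DeltacloudUsername = $$(username)" yields "DeltacloudUsername = ec2_username", the same substitution described in the paragraph above.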

Once condor has made the match, filled in the values and submitted the job, it will then contact the deltacloud core (from the grid_resource line) to submit the job. If all has gone well to this point, then the instance will actually be launched in the cloud. The instance can also be controlled via condor; if you "condor_rm" the job, then the instance in the cloud will be killed off.

Thursday, January 20, 2011

ruby-libvirt in EPEL

Ohad Levy has recently implemented support for launching VMs via libvirt in The Foreman. As part of that, he needed ruby-libvirt bindings available in RHEL 5 and 6, so I've now built EPEL packages for both of these. They are in the "updates-candidates" tag for the time being; to get direct access, the packages are here:

EPEL 5
EPEL 6

Tuesday, January 18, 2011

Writing Ruby Extensions in C - Part 12, Allocating memory

This is the twelfth in my series of posts about writing ruby extensions in C. The first post talked about the basic structure of a project, including how to set up building. The second post talked about generating documentation. The third post talked about initializing the module and setting up classes. The fourth post talked about types and return values. The fifth post focused on creating and handling exceptions. The sixth post talked about ruby catch and throw blocks. The seventh post talked about dealing with numbers. The eighth post talked about strings. The ninth post focused on arrays. The tenth post looked at hashes. The eleventh post explored blocks and callbacks. This post will look at allocating and freeing memory.

Allocating memory


When creating a new ruby object, memory will be automatically allocated from the garbage collector as needed.

If the ruby extension needs to allocate C-style memory, the basic malloc/realloc/calloc calls can be used. However, ruby provides counterparts that do the same work in a slightly better way: they first try to allocate memory, and if that fails, they invoke the garbage collector to free up some memory and try again. That way, if the program is low on memory, or the address space is fragmented because of the ruby memory allocator, these functions can succeed where plain malloc/realloc/calloc would fail:
  • ALLOC(type) - allocate a single object of the given type (the result is a type *)
  • ALLOC_N(type, num) - allocate an array of num objects of the given type
  • REALLOC_N(var, type, num) - resize var to hold num objects of the given type, assigning the new pointer back to var

It is important to use xfree() to free the memory allocated by these calls. In the nominal case there isn't much difference between regular free() and xfree(), but if ruby is built a certain way, xfree() does some additional internal accounting. In any case, there is no reason not to use xfree(), so it is recommended to always use xfree(). Thanks to SodaBrew for pointing this out in the comments.
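The allocate, collect, then retry strategy can be sketched in plain C. This is not ruby's implementation (ruby's version lives behind ruby_xmalloc and differs in detail); the collect_fn hook standing in for a garbage collector run, the injectable allocator, and the demo helpers are assumptions made so that the retry path can actually be exercised.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t size);   /* e.g. malloc */
typedef void (*collect_fn)(void);         /* hypothetical stand-in for a GC run */

/*
 * Allocate num * size bytes. If the first attempt fails, run the collector
 * once and retry. Returns NULL on overflow or if both attempts fail.
 */
static void *alloc_with_retry(size_t num, size_t size,
                              alloc_fn alloc, collect_fn collect)
{
    void *p;

    assert(alloc != NULL);
    if (size != 0 && num > SIZE_MAX / size)
        return NULL;                 /* num * size would overflow */
    p = alloc(num * size);
    if (p == NULL && collect != NULL) {
        collect();                   /* give the collector a chance to free memory */
        p = alloc(num * size);       /* ...then try exactly once more */
    }
    return p;
}

/* Demo helpers: an allocator that fails until a "collection" has run. */
static int collected = 0;

static void demo_collect(void)
{
    collected = 1;
}

static void *demo_alloc(size_t size)
{
    return collected ? malloc(size) : NULL;
}
```

Calling alloc_with_retry(4, sizeof(int), demo_alloc, demo_collect) fails once, runs demo_collect(), and then succeeds; with no collector hook it simply returns NULL, which is the plain-malloc behavior the paragraph above contrasts against.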

A simple example to demonstrate the use of these functions:

 1) struct mystruct {
 2)     int a;
 3)     char *b;
 4) };
 5)
 6) static VALUE implementation(VALUE a) {
 7)     struct mystruct *single;
 8)     struct mystruct *multiple;
 9)
10)     single = ALLOC(struct mystruct);
11)     xfree(single);
12)
13)     multiple = ALLOC_N(struct mystruct, 5);
14)
15)     REALLOC_N(multiple, struct mystruct, 10);
16)
17)     xfree(multiple);
18)
19)     return Qnil;
20) }

Lines 1 through 4 just define a simple structure containing an int and a char *. The implementation of a ruby method on lines 6 through 20 shows the use of the allocation functions. Line 10 allocates a single struct mystruct, which is freed on line 11. Line 13 allocates an array of 5 struct mystruct elements into the multiple pointer. Line 15 reallocates the multiple array to 10 elements. Notice that since REALLOC_N is a macro, it operates slightly differently than realloc(): it assigns the new pointer back to multiple itself, so there is no need to re-assign the pointer. Finally, line 17 frees the multiple pointer and line 19 returns successfully from the function.

Error handling and not leaking memory


Ruby is a garbage collected language, meaning that applications don't generally have to worry about freeing memory after it is used. This garbage collection extends into C extension modules, but only to a certain point. If you are writing a C extension to ruby, there are some places that you have to worry about keeping track of your pointers and freeing them up. To understand why, we need to dig a little into the memory allocation functions of ruby.

When you are writing pure ruby, and execute a line of code like:

x = ['a']

the ruby virtual machine causes some memory to come into existence to hold that array for you. The way that this memory is allocated is with rb_ary_new() (or one of its derivatives). The call chain looks like: rb_ary_new() -> rb_ary_new2() -> ary_new() -> NEWOBJ() -> rb_newobj(). Inside of rb_newobj(), no memory is actually allocated; instead, the new object is simply taken off of the list of free objects, and the freelist head is moved to the next object. If no object is available on this freelist, the garbage collector is run to try to reap some memory, and then the reclaimed memory is given to the new object. Because this memory comes from the freelist, it is tracked by (and can later be reaped by) the garbage collector.
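A toy C model of that freelist scheme may help. It is deliberately simplified and is not ruby's gc.c: the fixed four-slot heap, the marked flag standing in for a real mark phase, and all of the names are assumptions. What it shows is that creating an object just pops the head of a free list, and the collector only runs when that list is empty.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_SLOTS 4

/* A toy object slot; `next` links the free slots together. */
struct slot {
    int in_use;
    int marked;          /* set by the (toy) mark phase if still reachable */
    struct slot *next;
};

static struct slot heap[HEAP_SLOTS];
static struct slot *freelist = NULL;

/* Put every slot on the free list (program start). */
static void heap_init(void)
{
    freelist = NULL;
    for (int i = HEAP_SLOTS - 1; i >= 0; i--) {
        heap[i].in_use = 0;
        heap[i].marked = 0;
        heap[i].next = freelist;
        freelist = &heap[i];
    }
}

/* Toy sweep: every in-use slot that was not marked goes back on the list. */
static void gc_sweep(void)
{
    for (int i = 0; i < HEAP_SLOTS; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = 0;
            heap[i].next = freelist;
            freelist = &heap[i];
        }
        heap[i].marked = 0;
    }
}

/* Like rb_newobj(): pop the free-list head, running the GC only if it is empty. */
static struct slot *new_obj(void)
{
    struct slot *obj;

    if (freelist == NULL)
        gc_sweep();
    if (freelist == NULL)
        return NULL;       /* a real runtime would try to grow its heap here */
    obj = freelist;
    freelist = obj->next;
    assert(obj->in_use == 0);
    obj->in_use = 1;
    obj->next = NULL;
    return obj;
}
```

After four calls to new_obj() the free list is empty; if only the first object is marked as still referenced, the fifth call triggers gc_sweep(), which reclaims the other three slots and hands one of them back.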

When you allocate memory in C code using malloc (or one of its derivatives), no such thing happens. The memory is properly allocated, but it is not involved in any of the garbage collection schemes. This leads to 2 problems:
  1. Since malloc isn't involved in garbage collection, a malloc can fail earlier than it otherwise would due to address space fragmentation, because it never gets the chance to run the garbage collector and retry. This isn't generally a problem on 64-bit architectures, but it could crop up on 32-bit ones.
  2. If a ruby call in your extension module fails, it will raise an exception. In ruby, exceptions are implemented with a longjmp out of the extension code and into the ruby exception handling code. If you have allocated any memory with malloc and friends, you have now lost the pointers to that memory, so you have a memory leak (apparently this problem is much worse when dealing with C++; see [1]).

Problem 1) is partially solved by using ruby's built-in ALLOC and ALLOC_N macros and the ruby_xmalloc function. Problem 2) is much more insidious, and more difficult to handle. Luckily, it is not impossible to handle.

Assume you have the following code snippet:


 1) int *ids;
 2) VALUE result;
 3) int i;
 4)
 5) ids = ALLOC_N(int, 5);
 6) for (i = 0; i < 5; i++)
 7)     ids[i] = i;
 8)
 9) result = rb_ary_new2(5);
10)
11) for (i = 0; i < 5; i++)
12)     rb_ary_push(result, INT2NUM(ids[i]));
13)
14) xfree(ids);

(while this is a bit of a contrived example, it actually bears a lot of resemblance to this[2] code in ruby-libvirt)

What this code is trying to do is to create an array full of the values in the "ids" array. If there are no errors, then this code works absolutely fine and doesn't leak any memory (ids gets freed at line 14, and the ruby array will get reaped by the garbage collector eventually). However, if either rb_ary_new2() or rb_ary_push() fails in lines 9 or 12, then they will automatically longjmp to the ruby exception handler, completely skipping the xfree at line 14. This code has now leaked memory.

The way to fix this is to intercept ruby's normal longjmp-on-exception mechanism so that you can insert code of your own before the exception continues to propagate. The rb_protect() ruby call can be used to do exactly this. Unfortunately the interface is a bit clunky, but we have to do what we have to do.

rb_protect() takes 3 arguments: a pointer to a callback function that takes exactly one VALUE argument, the argument to pass to that callback function, and a pointer to an int in which rb_protect() stores a non-zero state if an exception was raised (0 otherwise). Because the callback function can only take one argument, typical usage is to create a "wrapper" callback around the function you actually want to call. The argument you pass in can be anything, so if you want to pass in multiple arguments, you can do so by passing in a pointer to a structure containing all of the data that you need (cast to a VALUE). An example should help clarify some of this:


 1) struct rb_ary_push_arg {
 2)     VALUE arr;
 3)     VALUE value;
 4) };
 5)
 6) static VALUE rb_ary_push_wrap(VALUE arg) {
 7)     struct rb_ary_push_arg *e = (struct rb_ary_push_arg *)arg;
 8)
 9)     return rb_ary_push(e->arr, e->value);
10) }
11)
12) int *ids;
13) VALUE result;
14) int i;
15) int exception = 0;
16) struct rb_ary_push_arg args;
17)
18) ids = ALLOC_N(int, 5);
19) for (i = 0; i < 5; i++)
20)     ids[i] = i;
21)
22) result = rb_ary_new2(5);
23)
24) for (i = 0; i < 5; i++) {
25)     args.arr = result;
26)     args.value = INT2NUM(ids[i]);
27)     rb_protect(rb_ary_push_wrap, (VALUE)&args, &exception);
28)     if (exception) {
29)         xfree(ids);
30)         rb_jump_tag(exception);
31)     }
32) }
33)
34) xfree(ids);

Now when we add entries to the ruby array, we are doing so through the rb_ary_push_wrap() function, called by rb_protect(). This means that if rb_ary_push() fails for any reason and throws an exception, control will be returned back to the code above at line 28, but with exception set to a non-zero number. We have a chance to clean up after ourselves, and then continue propagating the exception with rb_jump_tag(). Note that with the use of a proper structure, we can pass any number of arguments through to the wrapper function, so we can use this for all internal ruby functions. Notice that I did not wrap rb_ary_new2(), even though that can cause the same problem; I leave this as an exercise to the reader.
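The catch, clean up, and re-raise shape can also be modeled without ruby at all. Below is a self-contained C sketch built on setjmp/longjmp; toy_raise(), toy_protect(), and toy_jump_tag() are hypothetical stand-ins for rb_raise(), rb_protect(), and rb_jump_tag(), not ruby's implementation. It shows that the cleanup after the protected call runs even when the callee "raises", and that the exception still reaches the outer handler afterwards.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf *current_handler = NULL;   /* innermost active "rescue" */
static int pending_tag = 0;               /* which exception is in flight */

/* Like rb_raise(): unwind straight to the innermost handler. */
static void toy_raise(int tag)
{
    assert(current_handler != NULL && tag != 0);
    pending_tag = tag;
    longjmp(*current_handler, 1);
}

/* Like rb_protect(): call func(arg), catching any raise into *state. */
static long toy_protect(long (*func)(long), long arg, int *state)
{
    jmp_buf here;
    jmp_buf *saved = current_handler;
    volatile long result = 0;

    current_handler = &here;
    if (setjmp(here) == 0) {
        result = func(arg);
        *state = 0;
    } else {
        *state = pending_tag;   /* func (or something it called) raised */
        result = 0;
    }
    current_handler = saved;    /* restore the outer handler either way */
    return result;
}

/* Like rb_jump_tag(): continue propagating a caught exception outward. */
static void toy_jump_tag(int tag)
{
    toy_raise(tag);
}

/* A callee that "raises" on negative input, like a failing rb_ary_push(). */
static long may_fail(long x)
{
    if (x < 0)
        toy_raise(1);
    return x * 2;
}

static int frees = 0;   /* counts cleanups so the pattern is observable */

/* The pattern from the ruby example: free the buffer, then re-raise. */
static long use_buffer(long x)
{
    int *buf = malloc(5 * sizeof *buf);
    int state = 0;
    long r = toy_protect(may_fail, x, &state);

    if (state) {
        free(buf);
        frees++;
        toy_jump_tag(state);    /* does not return */
    }
    free(buf);
    frees++;
    return r;
}
```

Calling use_buffer() through an outer toy_protect() with a negative argument frees the buffer first and then delivers the non-zero state to the outer caller, just as line 29 and line 30 of the ruby example do with xfree() and rb_jump_tag().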

[1] http://www.thoughtsincomputation.com/posts/ruby-c-extensions-c-and-weird-crashing-on-rb_raise
[2] http://libvirt.org/git/?p=ruby-libvirt.git;a=blob;f=ext/libvirt/domain.c;h=eb4426252af635311e14e234a62780fbd4048f0b;hb=HEAD#l80

Monday, January 17, 2011

Aeolusproject.org

One of the projects I work on at Red Hat is cloud management. Our cloud projects rely on Deltacloud, a cross-cloud API, to do the real work on the cloud.

Due to some historical baggage, some of the other cloud projects were also under the Deltacloud name although they didn't directly deal with the API. In order to reduce confusion, we have renamed these projects under a new umbrella project, Aeolus. The website for this is http://www.aeolusproject.org. The main project under the Aeolus umbrella is the Conductor, a web-based UI for managing instances in clouds. There are also several other subprojects that help the Conductor do its job, including:

  • Oz (an automated guest installation system)
  • Image Factory (a project to do installation of guests along with transformation of them to particular cloud backends)
  • Image Warehouse (a project to move data from place to place depending on user-specified rules)
  • Audrey (a system for configuring cloud instances on-the-fly)

If you have interest in what Red Hat is doing in the cloud, please stop by the website and see what we have to offer. As always, we welcome comments, questions, constructive criticism, and contributions; please send such to the Aeolus mailing list.

Writing Ruby Extensions in C - Part 11, Blocks and Callbacks

This is the eleventh in my series of posts about writing ruby extensions in C. The first post talked about the basic structure of a project, including how to set up building. The second post talked about generating documentation. The third post talked about initializing the module and setting up classes. The fourth post talked about types and return values. The fifth post focused on creating and handling exceptions. The sixth post talked about ruby catch and throw blocks. The seventh post talked about dealing with numbers. The eighth post talked about strings. The ninth post focused on arrays. The tenth post looked at hashes. This post will talk about blocks and callbacks.

Blocks


Blocks [1] are a great idiom in ruby, roughly equivalent to anonymous functions attached to a method call. As with many other things in ruby C extensions, they are fairly easy to deal with. There are a few functions to know about:
  • rb_block_given_p() - returns 1 if a block was given to this ruby function, 0 otherwise
  • rb_yield(value) - yield a single value to the given block
  • rb_yield_values(n, ...) - yield n values to the given block

In ruby terms, "yield"ing sends a value from a method into its block. If you want to yield multiple values to a ruby block, you have two options: rb_yield() with an array or hash, and rb_yield_values(). They both work equally well, though rb_yield_values() with multiple values is a bit more idiomatic to ruby. It is also possible for the ruby block to return a result: the value of the last statement executed in the block is returned from the rb_yield() or rb_yield_values() call. However, note that the block cannot use return to hand back a value; a return inside a block exits the enclosing method instead, so the C code never sees the value (a block can use next to return a value early). Unfortunately this puts a bit of a burden on the consumers of the APIs, but it is coded into the ruby runtime[2]. The following example will demonstrate all of these calls.

First let's look at the ruby code:


 1) obj.rb_yield_example {|single|
 2)     puts "Single element is #{single}"
 3)     "done"
 4) }
 5)
 6) obj.rb_yield_values_example {|first, second, third|
 7)     puts "1st is #{first}, 2nd is #{second}, 3rd is #{third}"
 8)     "done"
 9) }

Now let's look at the C code to implement the above:


 1) static VALUE example_rb_yield(VALUE c) {
 2)     VALUE result;
 3)
 4)     if (!rb_block_given_p())
 5)         rb_raise(rb_eArgError, "Expected block");
 6)
 7)     result = rb_yield(rb_str_new2("hello"));
 8)
 9)     fprintf(stderr, "Return value from block is %s\n",
10)             StringValueCStr(result));
11)
12)     return Qnil;
13) }
14)
15) static VALUE example_rb_yield_values(VALUE c){
16)     VALUE result;
17)
18)     if (!rb_block_given_p())
19)         rb_raise(rb_eArgError, "Expected block");
20)
21)     result = rb_yield_values(3, rb_str_new2("first"),
22)                              rb_str_new2("second"),
23)                              rb_str_new2("third"));
24)
25)     fprintf(stderr, "Return value from block is %s\n",
26)             StringValueCStr(result));
27)
28)     return Qnil;
29) }
30)
31) rb_define_method(c_obj, "rb_yield_example",
32)                  example_rb_yield, 0);
33) rb_define_method(c_obj, "rb_yield_values_example",
34)                  example_rb_yield_values, 0);


Callbacks


Although blocks are idiomatic to ruby and should be used wherever possible, there are situations in which they do not work. For instance, if a ruby method needs to be used as a callback for an asynchronous event, blocks do not work; they are only active for the duration of the method call the block is attached to. If it is necessary to call a particular ruby method from a C library asynchronous callback, there are 2 options:

  1. Procs (lambdas)
  2. Named Methods

Procs are more idiomatic to ruby, but as far as I can tell there aren't many advantages to Procs over named methods. I'll go through both of them after setting up the example.

Let's assume that the C library being wrapped requires callbacks for asynchronous events. In this case, the library is expecting a function pointer with a signature looking like:

int (*asynccallback)(int event, void *userdata);

(that is, the function must take an event and a void pointer in, and return an int result). Also assume that we have to register the callback with the library:

void register_async_callback(int (*cb)(int, void *), void *userdata);

How would we go about calling a ruby method that the user writes when the library does the asynchronous callback?

Procs


With Procs, we would have the user of our ruby library create a Proc and pass it to the extension. An example ruby client:

cb = Proc.new {|event, userdata|
    puts "event is #{event}, userdata is #{userdata}"
}

ruby_extension.register_async_proc(cb, "my user data")

Note that the body of the Proc can be any valid ruby; here we simply print out the arguments that were passed into the Proc.

In the extension, we would define a method called "register_async_proc" that takes 2 arguments: the Proc and the user data that we want passed through to the Proc. The extension C code would look something like:


 1) static int internal_callback(int event, void *userdata) {
 2)     VALUE passthrough = (VALUE)userdata;
 3)     VALUE cb;
 4)     VALUE cbdata;
 5)
 6)     cb = rb_ary_entry(passthrough, 0);
 7)     cbdata = rb_ary_entry(passthrough, 1);
 8)
 9)     rb_funcall(cb, rb_intern("call"), 2, INT2NUM(event),
10)                cbdata);
11)
12)     return 0;
13) }
14)
15) static VALUE ext_register(VALUE obj, VALUE cb, VALUE userdata) {
16)     VALUE passthrough;
17)
18)     if (rb_class_of(cb) != rb_cProc)
19)         rb_raise(rb_eTypeError, "Expected Proc callback");
20)
21)     passthrough = rb_ary_new();
22)     rb_ary_store(passthrough, 0, cb);
23)     rb_ary_store(passthrough, 1, userdata);
24)     rb_iv_set(obj, "@async_cb", passthrough); /* keep alive for GC */
25)     register_async_callback(internal_callback, (void *)passthrough);
26)     return Qnil;
27) }
28)
29) rb_define_method(c_extension, "register_async_proc",
30)                  ext_register, 2);

The above is not a lot of code, but there is a lot going on, so let's step through it starting from the end. Line 29 defines our new method, register_async_proc, which calls the internal extension function ext_register (lines 15 to 27) with 2 arguments. Lines 18 and 19 inside of ext_register check that what the user actually passed us was a Proc. Lines 21 through 23 set up a new ruby array that contains both the callback that the user gave us and any additional user data that they want passed into the Proc. Line 25 calls the C library function register_async_callback with our *internal* callback and the ruby array that we set up in lines 21 through 23.

There are a few things to note about this. First, we cannot use the ruby Proc as the callback directly; the Proc has the wrong signature, and the C library has no idea how to marshal data so that ruby can understand it. Instead, we have the C library call an internal callback inside the extension; this internal callback marshals the data for the ruby callback and then invokes the ruby callback. Second, we pass the array that we created in lines 21 through 23 to the C library as the "opaque" callback data. It is imperative that the C library function provide a void * pointer for user data; otherwise this technique cannot work. Third, the ruby garbage collector cannot see a pointer that is stored only inside the C library, so something that ruby can see (for example an instance variable set with rb_iv_set()) must also hold on to the array, or it may be freed before the callback fires.

After line 25, the asynchronous callback is set up. When an event happens in the C library, it calls the function that was given to register_async_callback; in our case that is internal_callback, lines 1 through 13. The first thing internal_callback does, on line 2, is cast the void * back to a VALUE so we can operate on it. On lines 6 and 7, the array that was created and registered earlier is pulled apart into its separate pieces. Finally, line 9 calls the Proc that was originally registered by the user, passing in the event that happened and the original user data.
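The same trampoline shape can be tried out without ruby at all. The sketch below is plain C under stated assumptions: mock_register_async_callback() and mock_fire_event() are hypothetical stand-ins for the C library, struct passthrough plays the role of the two-element ruby array, and a plain function pointer plays the role of the Proc.

```c
#include <stddef.h>

/* ---- Mock asynchronous C library (hypothetical) ---- */
typedef int (*async_cb_t)(int event, void *userdata);

static async_cb_t registered_cb = NULL;
static void *registered_userdata = NULL;

/* Same shape as register_async_callback() in the text. */
static void mock_register_async_callback(async_cb_t cb, void *userdata) {
    registered_cb = cb;
    registered_userdata = userdata;
}

/* The "library" reports an event; returns -1 if nothing is registered. */
static int mock_fire_event(int event) {
    if (registered_cb == NULL)
        return -1;
    return registered_cb(event, registered_userdata);
}

/* ---- Extension side ---- */
/* Stands in for the ruby array [callback, user data]. */
struct passthrough {
    int (*handler)(int event, const char *data);  /* plays the Proc */
    const char *data;                             /* plays the user data */
};

/* Trampoline: has the signature the library wants, unpacks the
 * opaque pointer, then forwards to the real handler. */
static int internal_callback(int event, void *userdata) {
    struct passthrough *p = userdata;
    return p->handler(event, p->data);
}
```

The library only ever sees a function pointer with its own signature plus an opaque void *, so whatever the void * points at has to stay valid for as long as the callback is registered.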

Named methods


Named method callbacks work very similarly to Proc callbacks, so I won't go to great lengths to describe them. I'll show the (very similar) example code and explain how it differs from the Proc version.

First the ruby client code:

def cb(event, userdata)
    puts "event is #{event}, userdata is #{userdata}"
end

ruby_extension.register_async_symbol(:cb, "my user data")

There are two important differences from the Proc code: the callback is a real method (defined with def), and we pass it into the extension call differently. We cannot just use "cb", because ruby would then try to call the method cb before calling register_async_symbol. Instead we have to pass the Symbol that represents the callback method, :cb.

Now we look at the extension code:

 1) static int internal_callback(int event, void *userdata) {
 2)     VALUE passthrough = (VALUE)userdata;
 3)     VALUE cb;
 4)     VALUE cbdata;
 5)
 6)     cb = rb_ary_entry(passthrough, 0);
 7)     cbdata = rb_ary_entry(passthrough, 1);
 8)
 9)     rb_funcall(rb_class_of(cb), rb_to_id(cb), 2, INT2NUM(event),
10)                cbdata);
11)
12)     return 0;
13) }
14)
15) static VALUE ext_register(VALUE obj, VALUE cb, VALUE userdata) {
16)     VALUE passthrough;
17)
18)     if (rb_class_of(cb) != rb_cSymbol)
19)         rb_raise(rb_eTypeError, "Expected Symbol callback");
20)
21)     passthrough = rb_ary_new();
22)     rb_ary_store(passthrough, 0, cb);
23)     rb_ary_store(passthrough, 1, userdata);
24)     rb_iv_set(obj, "@async_cb", passthrough); /* keep alive for GC */
25)     register_async_callback(internal_callback, (void *)passthrough);
26)     return Qnil;
27) }
28)
29) rb_define_method(c_extension, "register_async_symbol",
30)                  ext_register, 2);

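The differences are minor. Line 29 defines this as "register_async_symbol" instead of "register_async_proc". Line 18 checks that the callback is a Symbol (rb_cSymbol) instead of a Proc (rb_cProc). Line 9 is where the biggest difference is: instead of using the "call" method to invoke the Proc, rb_to_id() turns the Symbol into a method ID and rb_funcall() invokes the method with that name. Be aware that the receiver on line 9, rb_class_of(cb), is the Symbol class itself, not the class in which the user defined the method. This works for a top-level method like cb because top-level methods become private methods of Object, and rb_funcall() ignores method visibility; a method defined inside some other class would need its real receiver stored in the passthrough array as well.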
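Resolving a callback by name at call time, which is what rb_to_id() plus rb_funcall() do on line 9, can be sketched in plain C with a lookup table. Everything here is hypothetical: method_table stands in for ruby's method lookup, the handlers stand in for user-defined methods, and returning -1 for an unknown name is only a loose analogue of ruby raising NoMethodError.

```c
#include <stddef.h>
#include <string.h>

typedef int (*method_fn)(int event, const char *data);

struct method_entry {
    const char *name;
    method_fn fn;
};

/* Hypothetical handlers standing in for user-defined ruby methods. */
static int on_cb(int event, const char *data) { (void)data; return event * 10; }
static int on_other(int event, const char *data) { (void)data; return -event; }

/* Hypothetical "method table": name -> function. */
static const struct method_entry method_table[] = {
    { "cb", on_cb },
    { "other", on_other },
};

/* Stands in for the [symbol, user data] array. */
struct named_passthrough {
    const char *method_name;
    const char *data;
};

static method_fn lookup_method(const char *name) {
    for (size_t i = 0; i < sizeof method_table / sizeof method_table[0]; i++)
        if (strcmp(method_table[i].name, name) == 0)
            return method_table[i].fn;
    return NULL;
}

/* Trampoline with the library's signature: resolve the name, then call. */
static int named_internal_callback(int event, void *userdata) {
    struct named_passthrough *p = userdata;
    method_fn fn = lookup_method(p->method_name);
    if (fn == NULL)
        return -1;  /* no such "method" */
    return fn(event, p->data);
}
```

Because the lookup happens when the event fires rather than at registration time, a name that does not resolve is only discovered then, which is also true of the Symbol version.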

[1] http://ruby-doc.org/docs/ProgrammingRuby/html/tut_containers.html
[2] http://stackoverflow.com/questions/1435743/why-does-explicit-return-make-a-difference-in-a-proc

Sunday, January 16, 2011

Writing Ruby Extensions in C - Part 10, Hashes

This is the tenth in my series of posts about writing ruby extensions in C. The first post talked about the basic structure of a project, including how to set up building. The second post talked about generating documentation. The third post talked about initializing the module and setting up classes. The fourth post talked about types and return values. The fifth post focused on creating and handling exceptions. The sixth post talked about ruby catch and throw blocks. The seventh post talked about dealing with numbers. The eighth post talked about strings. The ninth post focused on arrays. This post will look at hashes.

Hashes


The nice thing about hashes in ruby C extensions is that they act very much like the ruby hashes they represent. There are a few functions to know about:
  • rb_hash_new() - create a new ruby Hash
  • rb_hash_aset(hash, key, value) - set the hash key to value
  • rb_hash_aref(hash, key) - get the value for hash key
  • rb_hash_foreach(hash, callback, args) - call callback for each key,value pair in the hash. Callback must have a prototype of int (*cb)(VALUE key, VALUE val, VALUE in)
  • rb_hash_delete(hash, key) - remove key (and its value) from the hash

An example will demonstrate this:

 1) int do_print(VALUE key, VALUE val, VALUE in) {
 2)      fprintf(stderr, "Input data is %s\n", StringValueCStr(in));
 3)
 4)      fprintf(stderr, "Key %s=>Value %s\n", StringValueCStr(key),
 5)              StringValueCStr(val));
 6)
 7)      return ST_CONTINUE;
 8) }
 9)
10) VALUE result;
11) VALUE val;
12)
13) result = rb_hash_new();
14) // result is now {}
15) rb_hash_aset(result, rb_str_new2("mykey"),
16)              rb_str_new2("myvalue"));
17) // result is now {"mykey"=>"myvalue"}
18) rb_hash_aset(result, rb_str_new2("anotherkey"),
19)              rb_str_new2("anotherval"));
20) // result is now {"mykey"=>"myvalue",
21) //                "anotherkey"=>"anotherval"}
22) rb_hash_aset(result, rb_str_new2("mykey"),
23)              rb_str_new2("differentval"));
24) // result is now {"mykey"=>"differentval",
25) //                "anotherkey"=>"anotherval"}
26) val = rb_hash_aref(result, rb_str_new2("mykey"));
27) // result is now {"mykey"=>"differentval",
28) //                "anotherkey"=>"anotherval"},
29) // val is "differentval"
30) rb_hash_delete(result, rb_str_new2("mykey"));
31) // result is now {"anotherkey"=>"anotherval"}
32)
33) rb_hash_foreach(result, do_print, rb_str_new2("passthrough"));

Most of this is pretty straightforward. The most interesting part of this is line 33, where we perform an operation on all elements in the hash by utilizing a callback. This callback is defined on lines 1 through 8, and takes in the key, value, and the user data provided to the original rb_hash_foreach() call. The return code from the callback defines what happens to the processing of the rest of the hash. If the return value is ST_CONTINUE, then the rest of the hash is processed as normal. If the return value is ST_STOP, then no further processing of the hash is done. If the return value is ST_DELETE, then the current hash key is deleted from the hash and the rest of the hash is processed. If the return value is ST_CHECK, then the hash is checked to see if it has been modified during this operation. If so, processing of the hash stops.
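To make the return-code protocol concrete, here is a small plain-C model of a foreach loop that honours CONTINUE, STOP and DELETE the way they are described above. It is only a sketch: tiny_map, tiny_map_foreach() and the ITER_* names are hypothetical, this is not ruby's st.c, and ST_CHECK is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical codes playing the roles of ST_CONTINUE, ST_STOP, ST_DELETE. */
enum iter_result { ITER_CONTINUE, ITER_STOP, ITER_DELETE };

struct pair { const char *key; const char *val; };

struct tiny_map {
    struct pair items[16];
    size_t len;
};

typedef enum iter_result (*foreach_cb)(const char *key, const char *val,
                                       void *arg);

/* Visit entries in order; the callback's return value decides what happens
 * to the current entry and whether iteration goes on. */
static void tiny_map_foreach(struct tiny_map *m, foreach_cb cb, void *arg) {
    size_t i = 0;
    while (i < m->len) {
        enum iter_result r = cb(m->items[i].key, m->items[i].val, arg);
        if (r == ITER_STOP)
            return;                       /* no further processing */
        if (r == ITER_DELETE) {
            /* drop entry i by shifting the rest down; do not advance i */
            memmove(&m->items[i], &m->items[i + 1],
                    (m->len - i - 1) * sizeof m->items[0]);
            m->len--;
            continue;
        }
        i++;                              /* ITER_CONTINUE */
    }
}
```

The extra argument works like the third parameter of rb_hash_foreach(): the callback gets it on every call, which is how a counter or accumulator can be threaded through the loop.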

Update: Fixed up the example code to show on the screen.

Saturday, January 15, 2011

Writing Ruby Extensions in C - Part 9, Arrays

This is the ninth in my series of posts about writing ruby extensions in C. The first post talked about the basic structure of a project, including how to set up building. The second post talked about generating documentation. The third post talked about initializing the module and setting up classes. The fourth post talked about types and return values. The fifth post focused on creating and handling exceptions. The sixth post talked about ruby catch and throw blocks. The seventh post talked about dealing with numbers. The eighth post talked about strings. This post will focus on arrays.

Arrays


The nice thing about arrays in ruby C extensions is that they act very much like the ruby arrays they represent. There are a few functions to know about:
  • rb_ary_new() - create a new array with 0 elements. Elements can be added later using rb_ary_push(), rb_ary_store(), or rb_ary_unshift().
  • rb_ary_new2(size) - create a new, empty array with room preallocated for size elements (its length is still 0)
  • rb_ary_store(array, index, value) - put the ruby value into array at index. This can be used to create sparse arrays; intervening elements that have not yet had values assigned will be set to nil
  • rb_ary_push(array, value) - put value at the end of the array
  • rb_ary_unshift(array, value) - put value at the start of the array
  • rb_ary_pop(array) - pop the last element of array off and return it
  • rb_ary_shift(array) - remove the first element of array and return it
  • rb_ary_entry(array, index) - examine array element located at index without changing array
  • rb_ary_dup(array) - copy array and return the copy
  • rb_ary_to_s(array) - invoke the "to_s" method on the array. In ruby 1.8 this concatenates the array elements together without spacing, so is not generally useful; in ruby 1.9 and later, Array#to_s returns the same thing as Array#inspect
  • rb_ary_join(array, string_object) - create a string by converting each element of the array to a string separated by string_object. If string_object is Qnil, then no separator is used
  • rb_ary_reverse(array) - reverse the order of all of the elements in array
  • rb_ary_to_ary(ruby_object) - create an array out of any ruby object. If the object is already an array, a reference to the same object is returned. If the object supports the "to_ary" method, then "to_ary" is invoked on the object and the result is returned. If neither of the previous are true, then a new array with 1 element containing the object is returned

An example should make most of this clear:

VALUE result, elem, arr2, mystr;

result = rb_ary_new();
// result is now []
rb_ary_push(result, INT2FIX(1));
// result is now [1]
rb_ary_push(result, INT2FIX(2));
// result is now [1, 2]
rb_ary_unshift(result, INT2FIX(0));
// result is now [0, 1, 2]
rb_ary_store(result, 3, INT2FIX(3));
// result is now [0, 1, 2, 3]
rb_ary_store(result, 5, INT2FIX(5));
// result is now [0, 1, 2, 3, nil, 5]
elem = rb_ary_pop(result);
// result is now [0, 1, 2, 3, nil] and elem is 5
elem = rb_ary_shift(result);
// result is now [1, 2, 3, nil] and elem is 0
elem = rb_ary_entry(result, 0);
// result is now [1, 2, 3, nil] and elem is 1
arr2 = rb_ary_dup(result);
// result is now [1, 2, 3, nil] and arr2 is [1, 2, 3, nil]
mystr = rb_ary_to_s(result);
// result is now [1, 2, 3, nil] and mystr is "123" on ruby 1.8
// ("[1, 2, 3, nil]" on ruby 1.9 and later)
mystr = rb_ary_join(result, rb_str_new2("-"));
// result is now [1, 2, 3, nil] and mystr is "1-2-3-"
rb_ary_reverse(result);
// result is now [nil, 3, 2, 1]
rb_ary_shift(result);
// result is now [3, 2, 1]
result = rb_ary_to_ary(rb_str_new2("hello"));
// result is now ["hello"]
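The nil-filling behaviour of rb_ary_store() can be modelled in plain C as a growable array whose slots carry a nil flag. This is a hypothetical sketch (struct slot, struct sparse_ary and sparse_store() are made-up names); it only handles non-negative indexes and is not how ruby's array.c is written.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical slot: a nil flag plus an integer payload. */
struct slot { bool is_nil; long value; };

struct sparse_ary {
    struct slot *slots;
    size_t len, capa;
};

/* Store value at index, growing the array and filling any gap with nil.
 * Returns false if memory could not be allocated. */
static bool sparse_store(struct sparse_ary *a, size_t index, long value) {
    if (index >= a->capa) {
        size_t new_capa = a->capa ? a->capa : 4;
        while (new_capa <= index)
            new_capa *= 2;
        struct slot *p = realloc(a->slots, new_capa * sizeof *p);
        if (p == NULL)
            return false;
        a->slots = p;
        a->capa = new_capa;
    }
    /* intervening elements that were never assigned become nil */
    for (size_t i = a->len; i < index; i++)
        a->slots[i] = (struct slot){ true, 0 };
    a->slots[index] = (struct slot){ false, value };
    if (index >= a->len)
        a->len = index + 1;
    return true;
}
```

Storing at an index inside the current length just overwrites that slot and leaves the length alone, which matches how rb_ary_store() was used to replace elements above.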

Friday, January 14, 2011

Writing Ruby Extensions in C - Part 8, Strings

This is the eighth in my series of posts about writing ruby extensions in C. The first post talked about the basic structure of a project, including how to set up building. The second post talked about generating documentation. The third post talked about initializing the module and setting up classes. The fourth post talked about types and return values. The fifth post focused on creating and handling exceptions. The sixth post talked about ruby catch and throw blocks. The seventh post talked about dealing with numbers. This post will talk about strings.

Dealing with Strings


It is fairly easy to convert C-style strings to ruby string objects, and vice-versa. There are a few functions to know about:
  • rb_str_new(c_str, length) - take the char * c_str pointer and a length in, and return a ruby string object. Note that c_str does *not* have to be NULL terminated; this is one way to deal with binary data
  • rb_str_new2(c_str) - take the NULL terminated char * c_str pointer in, and return a ruby string object
  • rb_str_dup(ruby_string_object) - take ruby_string_object in and return a copy
  • rb_str_plus(string_object_1, string_object_2) - concatenate string_object_1 and string_object_2 and return the result without modifying either object
  • rb_str_times(string_object_1, fixnum_object) - return a new string containing string_object_1 repeated fixnum_object times
  • rb_str_substr(string_object, begin, length) - return the substring of string_object that starts at position begin and is length characters long. A negative begin counts back from the end of the string. If length is less than 0, then "nil" is returned. If begin is past the end of the string or before the beginning of the string, then "nil" is returned. Otherwise, the substring is returned, though it may be cut short if there are not enough characters left in the string
  • rb_str_cat(string_object, c_str, length) - take the char * c_str pointer and length in, and concatenate onto the end of string_object
  • rb_str_cat2(string_object, c_str) - take the NULL-terminated char *c_str pointer in, and concatenate onto the end of string_object
  • rb_str_append(string_object_1, string_object_2) - concatenate string_object_2 onto string_object_1
  • rb_str_concat(string_object, ruby_object) - concatenate ruby_object onto string_object. If ruby_object is a FIXNUM between 0 and 255, then it is first converted to a character before concatenation. Otherwise it behaves exactly the same as rb_str_append
  • StringValueCStr(ruby_object) - take ruby_object in, attempt to convert it to a String, and return the NULL terminated C-style char *
  • StringValue(ruby_object) - take ruby_object in and attempt to convert it to a String. Assuming this is successful, the C char * pointer for the string is available via the macro RSTRING_PTR(return_value) and the length of the string is available via the macro RSTRING_LEN(return_value). This is useful to retrieve binary data out of a String object

An example should make most of this clear:

VALUE result, str2, substr;

result = rb_str_new2("hello");
// result is now "hello"
str2 = rb_str_dup(result);
// result is now "hello", str2 is now "hello"
result = rb_str_plus(result, rb_str_new2(" there"));
// result is now "hello there"
result = rb_str_times(result, INT2FIX(2));
// result is now "hello therehello there"
substr = rb_str_substr(result, 0, 2);
// result is now "hello therehello there", substr is "he"
substr = rb_str_substr(result, -2, 2);
// result is now "hello therehello there", substr is "re"
substr = rb_str_substr(result, -2, 5);
// result is now "hello therehello there", substr is "re"
// (substring was cut short because the length goes past the end of the string)
substr = rb_str_substr(result, 0, -1);
// result is now "hello therehello there", substr is Qnil
// (length is negative)
substr = rb_str_substr(result, 23, 1);
// result is now "hello therehello there", substr is Qnil
// (requested start point after end of string)
substr = rb_str_substr(result, -23, 1);
// result is now "hello therehello there", substr is Qnil
// (requested start point before beginning of string)
rb_str_cat(result, "wow", 3);
// result is now "hello therehello therewow"
rb_str_cat2(result, "bob");
// result is now "hello therehello therewowbob"
rb_str_append(result, rb_str_new2("again"));
// result is now "hello therehello therewowbobagain"
rb_str_concat(result, INT2FIX(33));
// result is now "hello therehello therewowbobagain!"
fprintf(stderr, "Result is %s\n", StringValueCStr(result));
// "Result is hello therehello therewowbobagain!" is printed to stderr
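The begin/length rules for rb_str_substr() can be checked with a small plain-C model that computes the byte range ruby would return, or reports "nil". substr_range() is a hypothetical helper; it counts bytes rather than characters (so it ignores multibyte encodings), and it treats a begin equal to the string length as an empty match, in line with ruby returning "" for str[str.length, n].

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of the rb_str_substr() range rules. Returns false where ruby
 * would return nil; otherwise stores the start offset and length. */
static bool substr_range(size_t str_len, long beg, long len,
                         size_t *out_start, size_t *out_len) {
    long slen = (long)str_len;
    if (len < 0)
        return false;            /* negative length -> nil */
    if (beg < 0)
        beg += slen;             /* negative begin counts from the end */
    if (beg < 0 || beg > slen)
        return false;            /* before the beginning or past the end */
    if (len > slen - beg)
        len = slen - beg;        /* cut short at the end of the string */
    *out_start = (size_t)beg;
    *out_len = (size_t)len;
    return true;
}
```

Running the calls from the example through this model gives the same "he", "re", "re" and nil results shown in the comments.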

Update: modified the code to fit in the pre box.

Thursday, January 13, 2011

Writing Ruby Extensions in C - Part 7, Numbers

This is the seventh in my series of posts about writing ruby extensions in C. The first post talked about the basic structure of a project, including how to set up building. The second post talked about generating documentation. The third post talked about initializing the module and setting up classes. The fourth post talked about types and return values. The fifth post focused on creating and handling exceptions. The sixth post talked about ruby catch and throw blocks. This post will talk about numbers.

Dealing with numbers


Numbers are pretty easy to deal with in a ruby C extension. There are two possible types of Ruby integers: FIXNUMs and Bignums. FIXNUMs are very fast since they just use the native long type of the architecture. However, because one bit of the long is used to mark the value as a FIXNUM, the range of a FIXNUM is only one-half of the range of the native long type. If larger (or smaller) numbers need to be manipulated, Bignums are full-blown ruby objects that can represent integers of any size, at a performance cost. The ruby C extension API has support for converting native integer types to ruby FIXNUMs and Bignums and vice-versa. Some of the functions are:
  • INT2FIX(int) - take an int and convert it to a FIXNUM object (but see INT2NUM below)
  • LONG2FIX(long) - synonym for INT2FIX
  • CHR2FIX(char) - take a character (0x00-0xff) and convert it to a FIXNUM object
  • INT2NUM(int) - take an int and convert it to a FIXNUM object if it will fit; otherwise, convert to a Bignum object. Since this does the right thing in all circumstances, this should always be used in place of INT2FIX
  • LONG2NUM(long) - synonym for INT2NUM
  • UINT2NUM(unsigned int) - take an unsigned int and convert it to a FIXNUM object if it will fit; otherwise, convert to a Bignum object
  • ULONG2NUM(unsigned long int) - synonym for UINT2NUM
  • LL2NUM(long long) - take a long long int and convert it to a FIXNUM object if it will fit; otherwise, convert to a Bignum object
  • ULL2NUM(unsigned long long) - take an unsigned long long int and convert it to a FIXNUM object if it will fit; otherwise, convert to a Bignum object
  • OFFT2NUM(off_t) - take an off_t and convert it to a FIXNUM object if it will fit; otherwise, convert to a Bignum object
  • FIX2LONG(fixnum_object) - take a FIXNUM object and return the long representation (but see NUM2LONG below)
  • FIX2ULONG(fixnum_object) - take a FIXNUM object and return the unsigned long representation (but see NUM2ULONG below)
  • FIX2INT(fixnum_object) - take a FIXNUM object and return the int representation (but see NUM2INT below)
  • FIX2UINT(fixnum_object) - take a FIXNUM object and return the unsigned int representation (but see NUM2UINT below)
  • NUM2LONG(numeric_object) - take a FIXNUM or Bignum object in and return the long representation. Since this does the right thing in all circumstances, this should be used in favor of FIX2LONG
  • NUM2ULONG(numeric_object) - take a FIXNUM or Bignum object in and return the unsigned long representation. Since this does the right thing in all circumstances, this should be used in favor of FIX2ULONG
  • NUM2INT(numeric_object) - take a FIXNUM or Bignum object in and return the int representation. Since this does the right thing in all circumstances, this should be used in favor of FIX2INT
  • NUM2UINT(numeric_object) - take a FIXNUM or Bignum object in and return the unsigned int representation. Since this does the right thing in all circumstances, this should be used in favor of FIX2UINT
  • NUM2LL(numeric_object) - take a FIXNUM or Bignum object in and return the long long representation
  • NUM2ULL(numeric_object) - take a FIXNUM or Bignum object in and return the unsigned long long representation
  • NUM2OFFT(numeric_object) - take a FIXNUM or Bignum object in and return the off_t representation
  • NUM2DBL(numeric_object) - take a FIXNUM or Bignum object in and return the double representation
  • NUM2CHR(ruby_object) - take ruby_object in and return the char representation of the object. If ruby_object is a string, then the char of the first character in the string is returned. Otherwise, NUM2INT is run on the object and the result is returned

For this particular topic I'll omit a full example. There aren't really a lot of interesting things to show or odd corner cases that you need to deal with when working with numbers.
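Although a full example is not needed, a short plain-C sketch may help explain why the FIXNUM range is half that of a long. MRI stores a FIXNUM directly in the VALUE word, shifted left by one bit with the lowest bit set as a tag. The helpers below (sketch_long2fix(), sketch_fix2long(), sketch_fixable() and the SKETCH_* constants) are hypothetical stand-ins that mimic that idea; they are not ruby's actual macros.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for ruby's VALUE word. */
typedef unsigned long sketch_value;

/* One bit is spent on the tag, so only half the long range fits. */
#define SKETCH_FIXNUM_MAX (LONG_MAX >> 1)
#define SKETCH_FIXNUM_MIN (-(LONG_MAX >> 1) - 1)
#define SKETCH_TOP_BIT    (~(ULONG_MAX >> 1))

/* True if i survives the one-bit shift (the idea behind FIXABLE). */
static bool sketch_fixable(long i) {
    return i >= SKETCH_FIXNUM_MIN && i <= SKETCH_FIXNUM_MAX;
}

/* Encode: shift left one bit and set the low "this is a Fixnum" bit.
 * Only meaningful when sketch_fixable(i) is true. */
static sketch_value sketch_long2fix(long i) {
    return ((sketch_value)i << 1) | 1UL;
}

/* Decode: undo the shift, restoring the sign without relying on
 * implementation-defined right shifts of negative numbers. */
static long sketch_fix2long(sketch_value v) {
    if (v & SKETCH_TOP_BIT) {
        sketch_value w = -v;           /* w == 2*|i| - 1 */
        return -(long)(w >> 1) - 1;
    }
    return (long)(v >> 1);
}
```

A value outside the sketch_fixable() range would lose its top bit in the shift, which is the situation where INT2NUM and friends fall back to allocating a Bignum instead.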

Wednesday, January 12, 2011

Writing Ruby Extensions in C - Part 6, Catch/Throw

This is the sixth in my series of posts about writing ruby extensions in C. The first post talked about the basic structure of a project, including how to set up building. The second post talked about generating documentation. The third post talked about initializing the module and setting up classes. The fourth post talked about types and return values. The fifth post focused on creating and handling exceptions. This post will talk about ruby catch and throw blocks.

Catch/Throw

In ruby, raising exceptions is used to transfer control out of a block of code when something goes wrong. Ruby has a second mechanism for transferring control out of a block, called catch/throw. Any ruby block can be labelled via catch(), and then any line of code within that block can throw() to terminate the rest of the block. This also works with nested catch/throw blocks, so an inner throw can return all the way back out to an outer catch. Essentially, they are a fancy goto mechanism; see [1] for some examples. How can we catch and throw from within our C extension module? Like exceptions, we accomplish this through callbacks.

To set up a catch in a C extension, the rb_catch() function is used. rb_catch() takes 3 parameters: the name of the catch block, a pointer to the callback function to invoke in the context of the block, and data to be passed to the callback. The callback is called like a block function: it receives the catch tag as its first argument and the data given to rb_catch() as its second, and it returns a VALUE, which becomes the return value of rb_catch() unless a throw happens.

To return to a catch point in a C extension, the rb_throw() function is used. rb_throw() takes two parameters: the name of the catch block to return to, and the return value (which can be any valid ruby object, including Qnil). When rb_throw() is executed, control jumps from the point of the rb_throw() to the end of the matching rb_catch(), which returns the thrown value, and execution continues from there.

An example can demonstrate much of this. First let's look at the C code to implement an example catch/throw:

 1) static VALUE m_example;
 2)
 3) static VALUE catch_cb(VALUE tag, VALUE args) {
 4)     VALUE res = rb_yield(args);
 5)
 6)     if (TYPE(res) != T_FIXNUM)
 7)         rb_throw("catchpoint", Qnil);
 8)
 9)     return res;
10) }
11)
12) static VALUE example_method(VALUE klass) {
13)     VALUE res;
14)
15)     if (!rb_block_given_p())
16)         rb_raise(rb_eStandardError, "Expected a block");
17)
18)     res = rb_catch("catchpoint", catch_cb, rb_str_new2("val"));
19)     if (NIL_P(res))
20)         rb_raise(rb_eTypeError, "Block did not return a number");
21)
22)     return res;
23) }
24)
25) void Init_example() {
26)     m_example = rb_define_module("Example");
27)
28)     rb_define_module_function(m_example, "method",
29)                               example_method, 0);
30) }

Lines 25 through 30 set up the extension module, as described elsewhere.

Lines 12 through 23 implement the module function "method". Line 15 checks if a block is given; if not, an exception is raised on line 16. Line 18 sets up an rb_catch() named "catchpoint". The callback catch_cb() is then executed, receiving the tag and a new string of "val". Lines 3 through 10 implement the callback. Line 4 yields the string to the block initially passed into "method". Line 6 checks the return value from the block; if it is not a number, then line 7 does an rb_throw(), which abandons the rest of the callback and makes rb_catch() return the thrown value (Qnil here) straight away. Otherwise the callback returns the number on line 9, and that becomes the return value of rb_catch(). Note that the rb_throw() has to happen inside the callback: once rb_catch() has returned, "catchpoint" is no longer being caught, and throwing to it would just raise an exception. Back in "method", line 19 detects that the block was aborted by the throw and raises an exception on line 20; otherwise the number is returned on line 22. This particular sequence of calls is contrived, since the value returned from the block is just returned to the caller. Still, I think it is a good example of what can be done with rb_catch() and rb_throw().

Now let's look at some example ruby code that might utilize the above code:

require 'example'

# if the method were to be called like this, an exception would be
# raised since no block is given
# retval = Example::method

# if the method were to be called like this, an exception would be
# raised since the return value from the block is not a number
# retval = Example::method {|input|
#     "hello"
# }

# this works properly, since the return value is a number
retval = Example::method {|input|
    puts "Input is #{input}"
    6
}
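Conceptually, catch/throw is a non-local jump back to a labelled point. A rough plain-C analogy can be built from setjmp() and longjmp(); this is a hypothetical sketch for intuition only (sketch_catch(), sketch_throw() and struct catch_frame are made-up names), not a description of how rb_catch() is implemented.

```c
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* One active catch: its tag, where to jump back to, and the enclosing catch. */
struct catch_frame {
    const char *tag;
    jmp_buf env;
    struct catch_frame *prev;
};

static struct catch_frame *top_frame = NULL;
static int thrown_value;   /* static, so it survives the longjmp safely */

/* Run body(data). If body (or anything it calls) throws to tag, return
 * the thrown value instead of body's own return value. */
static int sketch_catch(const char *tag, int (*body)(int), int data) {
    struct catch_frame frame;
    int result;

    frame.tag = tag;
    frame.prev = top_frame;
    top_frame = &frame;
    if (setjmp(frame.env) == 0)
        result = body(data);         /* normal path */
    else
        result = thrown_value;       /* arrived here via sketch_throw() */
    top_frame = frame.prev;
    return result;
}

/* Abandon everything up to the innermost active catch with a matching tag. */
static void sketch_throw(const char *tag, int value) {
    struct catch_frame *f;
    for (f = top_frame; f != NULL; f = f->prev)
        if (strcmp(f->tag, tag) == 0)
            break;
    if (f == NULL)
        abort();   /* ruby raises an "uncaught throw" exception instead */
    thrown_value = value;
    longjmp(f->env, 1);
}
```

Walking the chain of frames is what lets an inner throw skip over intermediate catches and land at an outer one, as in the nested case described earlier.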

[1] http://ruby-doc.org/docs/ProgrammingRuby/html/tut_exceptions.html

Tuesday, January 11, 2011

Writing Ruby Extensions in C - Part 5, Exceptions

This is the fifth in my series of posts about writing ruby extensions in C. The first post talked about the basic structure of a project, including how to set up building. The second post talked about generating documentation. The third post talked about initializing the module and setting up classes. The fourth post talked about types and return values. This post will focus on creating and handling exceptions.

Exceptions

When a method implementation in a ruby C extension encounters an error, the typical response is to raise an exception (a value indicating error can also be returned, but that is not idiomatic). The exception to be raised can either be one of the built-in exception classes, or a custom defined exception class. The built-in exception classes are:
  • rb_eException
  • rb_eStandardError
  • rb_eSystemExit
  • rb_eInterrupt
  • rb_eSignal
  • rb_eFatal
  • rb_eArgError
  • rb_eEOFError
  • rb_eIndexError
  • rb_eStopIteration
  • rb_eRangeError
  • rb_eIOError
  • rb_eRuntimeError
  • rb_eSecurityError
  • rb_eSystemCallError
  • rb_eThreadError
  • rb_eTypeError
  • rb_eZeroDivError
  • rb_eNotImpError
  • rb_eNoMemError
  • rb_eNoMethodError
  • rb_eFloatDomainError
  • rb_eLocalJumpError
  • rb_eSysStackError
  • rb_eRegexpError
  • rb_eScriptError
  • rb_eNameError
  • rb_eSyntaxError
  • rb_eLoadError

Extension modules should usually define a custom exception class for errors related directly to the extension, and use one of the built-in exception classes for standard errors. The custom exception class should generally be a subclass of rb_eStandardError (so that a plain rescue clause catches it) or rb_eException, though if the module has special needs any of the built-in exception classes can be used. Example:

 1) static VALUE m_example;
 2) static VALUE e_ExampleError;
 3)
 4) static VALUE exception_impl(VALUE klass, VALUE input) {
 5)     if (TYPE(input) != T_FIXNUM)
 6)         rb_raise(rb_eTypeError, "invalid type for input");
 7)
 8)     if (NUM2INT(input) == -1)
 9)         rb_raise(e_ExampleError, "input was -1");
10)     return Qnil;
11) }
12)
13) void Init_example() {
14)     m_example = rb_define_module("Example");
15)
16)     e_ExampleError = rb_define_class_under(m_example, "Error",
17)                                            rb_eStandardError);
18)
19)     rb_define_module_function(m_example, "exception_example",
20)                               exception_impl, 1);
21) }

Line 14 sets up the extension module. Line 16 creates the custom exception class as a subclass of rb_eStandardError. Now if the extension module runs into a situation that it can't accept, it can raise e_ExampleError, which shows up in ruby as an exception of class Example::Error. Line 19 defines a module function that demonstrates the use of standard and custom exceptions. If Example::exception_example is called with an argument that is not a FIXNUM, it raises a TypeError exception on line 6 (side-note: Check_Type should really be used to do this type of checking, but for example purposes we omit that). If Example::exception_example is called with the number -1, then the custom exception Example::Error is raised on line 9. Otherwise, the method succeeds and Qnil is returned.

Raising exceptions

There are a few different ways to raise exceptions:
  • rb_raise(error_class, error_string, ...) - the main interface for raising exceptions. A new exception object of class type error_class is created and then raised, with the error message set to error_string (plus any printf-style arguments)
  • rb_fatal(error_string, ...) - a function for raising an exception of type rb_eFatal with the error message set to error_string (plus any printf-style arguments). After this call the entire ruby interpreter will exit, so extension modules typically should not use it
  • rb_bug(error_string, ...) - prints out the error string (plus any printf-style arguments) and then calls abort(). Since this call doesn't allocate an error object or do any of the other typical exception handling steps, it isn't technically a function to raise exceptions. This function should only be used when a bug in the interpreter is found, and as such, should not be used by extension modules
  • rb_sys_fail(error_string) - raises an exception based on errno. Ruby defines a separate class for each of the errno values (such as Errno::EAGAIN, Errno::EACCES, etc.), and this function will raise an exception of the type that corresponds to the current errno
  • rb_notimplement() - raises an exception of rb_eNotImpError. This is used when a particular function is implemented on one platform, but possibly not on other platforms that ruby supports
  • rb_exc_new2(error_class, error_string) - allocate a new exception object of type error_class, and set the error message to error_string. Note that rb_exc_new2() does not accept printf-style options, so the string will have to be fully-formed before passing it to rb_exc_new2()
  • rb_exc_raise(error_object) - a low-level interface to raise exceptions that have been allocated by rb_exc_new2()
  • rb_exc_fatal(error_object) - a low-level interface to raise a fatal exception that has been allocated by rb_exc_new2(). After this call the entire ruby interpreter will exit, so extension modules typically should not use it
The example below shows the use of rb_raise() and rb_exc_raise() (with rb_exc_new2() allocating the exception object), which are really the only two raising calls that extension modules should use.

 1) static VALUE m_example;
 2) static VALUE e_ExampleError;
 3)
 4) static VALUE example_method(VALUE klass, VALUE input) {
 5)     VALUE exception;
 6)
 7)     if (TYPE(input) != T_FIXNUM)
 8)         rb_raise(rb_eTypeError, "invalid type for input");
 9)
10)     if (NUM2INT(input) < 0) {
11)         exception = rb_exc_new2(e_ExampleError, "input was < 0");
12)         rb_iv_set(exception, "@additional_info",
13)                   rb_str_new2("additional information"));
14)         rb_exc_raise(exception);
15)     }
16)
17)     return Qnil;
18) }
19)
20) void Init_example(void) {
21)     m_example = rb_define_module("Example");
22)
23)     e_ExampleError = rb_define_class_under(m_example, "Error",
24)                                            rb_eStandardError);
25)     rb_define_attr(e_ExampleError, "additional_info", 1, 0);
26)
27)     rb_define_module_function(m_example, "method",
28)                               example_method, 1);
29) }
Lines 20 through 29 show the module initialization. Since this is described in more detail elsewhere, I'll only point out line 25, where a custom attribute for the error class e_ExampleError is defined. When an error occurs in the extension module, additional error information can be placed into that attribute, and any caller can look inside of the error object to retrieve that additional information.

Lines 4 through 18 implement an example method that takes exactly one input parameter. Line 7 checks whether the input value is a Fixnum (a small integer), and if not, an exception is raised with rb_raise() on line 8. Line 10 checks whether the number is less than 0. If it is, a new exception object of type e_ExampleError is allocated on line 11 with rb_exc_new2(), and the additional_info attribute of the object is set to "additional information" on lines 12 and 13. As with most other things in ruby, the value that additional_info is set to can be any valid ruby object. Line 14 then raises the exception. This is the real advantage of using rb_exc_new2() and rb_exc_raise() together: additional error information can be passed through to callers.

Handling exceptions

The other half of dealing with exceptions in an extension module is handling, in C code, exceptions raised by ruby functions. How is that done, since C has no raise/rescue mechanism? Through callbacks.

There are a few functions that can be used for handling exceptions:
  • rb_ensure(cb, cb_args, ensure, ensure_args) - Call function cb with cb_args. The callback must take in a single VALUE parameter and return VALUE. When cb() finishes, regardless of whether it completes successfully or raises an exception, call ensure with ensure_args. The ensure function must take in a single VALUE parameter and return VALUE
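  • rb_protect(cb, cb_args, state) - Call cb with cb_args. The callback must take in a single VALUE parameter and return VALUE. If cb() raises an exception, rb_protect() stops it from propagating, stores a non-zero state value in the int pointed to by state, and returns Qnil; if cb() completes normally, state is set to 0. It is then the responsibility of the caller to either deal with the exception (in Ruby 1.9 and later, the exception object itself can be retrieved with rb_errinfo()) or re-raise it with rb_jump_tag()
  • rb_jump_tag(state) - resume propagating the exception recorded in the state saved by rb_protect(). This does a longjmp(), so no code after this statement will be executed
  • rb_rescue(cb, cb_args, rescue, rescue_args) - Call function cb with cb_args. The callback must take in a single VALUE parameter and return VALUE. If cb() raises an exception of class StandardError (or any of its subclasses), rescue is called with rescue_args and the exception object. The rescue callback should take in two VALUE parameters and return VALUE. Exceptions that are not StandardErrors are not rescued; rb_rescue2() can be used to list the exception classes to rescue explicitly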
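Since C has no raise/rescue, MRI builds these helpers on setjmp()/longjmp(): a protected call sets up a jump target, and a raise jumps back to it. That is why each helper takes a callback. The following toy model in plain C sketches rb_protect() and rb_jump_tag() under that assumption; toy_protect(), toy_jump_tag(), toy_raise() and the integer tags are hypothetical stand-ins, and MRI's real implementation (its tag stack, exception objects, and so on) is considerably more involved.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy runtime: one jump target per protected region. */
typedef struct frame {
    jmp_buf buf;
    struct frame *prev;
} frame;

static frame *top_frame = NULL;
static int pending_tag = 0;  /* non-zero tag of the exception in flight */

/* Raise: record the tag and longjmp() to the innermost frame. */
static void toy_raise(int tag) {
    pending_tag = tag;
    if (top_frame == NULL) {
        fprintf(stderr, "uncaught exception, tag %d\n", tag);
        abort();
    }
    longjmp(top_frame->buf, 1);
}

/* Like rb_jump_tag(): resume propagating a saved state. Never returns. */
static void toy_jump_tag(int state) {
    toy_raise(state);
}

/* Like rb_protect(): run cb(arg). If it raises, store the non-zero tag
 * in *state and return 0 instead of letting the exception escape. */
static int toy_protect(int (*cb)(int), int arg, int *state) {
    frame f;

    f.prev = top_frame;
    top_frame = &f;
    if (setjmp(f.buf) == 0) {
        int result = cb(arg);
        top_frame = f.prev;
        *state = 0;
        return result;
    }
    top_frame = f.prev;   /* reached only via longjmp() */
    *state = pending_tag;
    return 0;
}

static int cleanups = 0;

/* A callback that "raises" tag 42 for negative input. */
static int square_nonneg(int x) {
    if (x < 0)
        toy_raise(42);
    return x * x;
}

/* The rb_protect() pattern: check state, clean up, then re-raise. */
static int reraise_after_cleanup(int x) {
    int state;
    int result = toy_protect(square_nonneg, x, &state);

    if (state) {
        cleanups++;            /* whatever cleanup is needed */
        toy_jump_tag(state);
    }
    return result;
}
```

Note that reraise_after_cleanup() never interprets the tag's value: it only checks whether it is non-zero, cleans up, and hands it back so an outer protected region sees the original exception.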

Another example should make some of this clear:

 1) static VALUE cb(VALUE args) {
 2)     if (TYPE(args) != T_FIXNUM)
 3)         rb_raise(rb_eTypeError, "expected a number");
 4)     return Qnil;
 5) }
 6)
 7) static VALUE ensure(VALUE args) {
 8)     fprintf(stderr, "Ensure value is %s\n",
 9)             StringValueCStr(args));
10)     return Qnil;
11) }
12)
13) static VALUE rescue(VALUE args, VALUE exception_object) {
14)     fprintf(stderr, "Rescue args %s, object classname %s\n",
15)             StringValueCStr(args),
16)             rb_obj_classname(exception_object));
17)     return Qnil;
18) }
19)
20) VALUE res;
21) int exception;
22)
23) res = rb_ensure(cb, INT2NUM(0), ensure, rb_str_new2("data"));
24) res = rb_ensure(cb, rb_str_new2("bad"), ensure,
25)                 rb_str_new2("data"));
26)
27) res = rb_protect(cb, INT2NUM(0), &exception);
28) res = rb_protect(cb, rb_str_new2("bad"), &exception);
29) if (exception) {
30)     fprintf(stderr, "Failed cb\n");
31)     rb_jump_tag(exception);
32) }
33)
34) res = rb_rescue(cb, INT2NUM(0), rescue, rb_str_new2("data"));
35) res = rb_rescue(cb, rb_str_new2("bad"), rescue,
36)                 rb_str_new2("data"));
Line 23 kicks off the action with a call to rb_ensure(). In this first rb_ensure, we pass a FIXNUM object to cb(), which means that no exception is raised. Because of the rb_ensure(), however, the ensure() callback on lines 7 through 11 is called anyway and does some printing.

Line 24 passes a String object to cb(), which causes cb() to raise an exception. Because of the rb_ensure(), the ensure() callback on lines 7 through 11 is still called and does some printing. Importantly, after ensure() returns the exception continues to propagate, so in reality none of the code after the rb_ensure() call on lines 24 and 25 would be executed (we'll ignore this fact for the sake of this example).

Line 27 uses rb_protect() to call cb(); since a FIXNUM object is passed, no exception is raised. Note that when the wrapped call does not raise an exception, rb_protect() sets exception to 0, so there is no need to initialize it beforehand.

Line 28 uses rb_protect() to call cb() with a String object, which causes an exception to be raised. Because rb_protect() is being used, control returns to the calling code at line 29, where it can check for the exception. Since an exception was raised, the "exception" integer holds a non-zero value, so the code can do whatever cleanup it needs and then propagate the exception further with rb_jump_tag() on line 31.
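The clean-up-then-rb_jump_tag() pattern is essentially what rb_ensure() packages up for you. The toy model below builds an ensure helper out of protect and jump-tag: run the callback under protection, always run the ensure callback, then re-raise if a state was saved. It is plain C with hypothetical names (toy_protect(), toy_ensure(), toy_raise() and integer tags); MRI implements rb_ensure() with its own internal machinery rather than literally calling rb_protect().

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy runtime: one jump target per protected region. */
typedef struct frame {
    jmp_buf buf;
    struct frame *prev;
} frame;

static frame *top_frame = NULL;
static int pending_tag = 0;

static void toy_raise(int tag) {
    pending_tag = tag;
    if (top_frame == NULL) {
        fprintf(stderr, "uncaught exception, tag %d\n", tag);
        abort();
    }
    longjmp(top_frame->buf, 1);
}

/* Like rb_protect(): a raise inside cb() becomes a non-zero *state. */
static int toy_protect(int (*cb)(int), int arg, int *state) {
    frame f;

    f.prev = top_frame;
    top_frame = &f;
    if (setjmp(f.buf) == 0) {
        int result = cb(arg);
        top_frame = f.prev;
        *state = 0;
        return result;
    }
    top_frame = f.prev;   /* reached only via longjmp() */
    *state = pending_tag;
    return 0;
}

/* Like rb_ensure(): ensure(ensure_arg) runs whether or not cb(arg)
 * raised; a saved exception then continues to propagate. */
static int toy_ensure(int (*cb)(int), int arg,
                      void (*ensure)(int), int ensure_arg) {
    int state;
    int result = toy_protect(cb, arg, &state);

    ensure(ensure_arg);
    if (state)
        toy_raise(state);   /* plays the role of rb_jump_tag() */
    return result;
}

static int ensure_calls = 0;
static int last_ensure_arg = 0;

static int square_nonneg(int x) {
    if (x < 0)
        toy_raise(42);
    return x * x;
}

static void note_ensure(int arg) {
    ensure_calls++;
    last_ensure_arg = arg;
}

/* Wrapper so toy_ensure() can itself be run under toy_protect(). */
static int ensure_demo(int x) {
    return toy_ensure(square_nonneg, x, note_ensure, 7);
}
```

With a negative input, note_ensure() still runs once, and the outer toy_protect() then observes the original tag, mirroring how rb_ensure() lets the exception escape after the ensure callback.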

Line 34 uses the rb_rescue() wrapper to call cb(). Since a FIXNUM object is passed to cb(), no exception is raised and no callbacks other than cb() are called.

Line 35 uses rb_rescue() to call cb() with a String object, which causes an exception to be raised and the rescue() callback to be executed. The rescue() callback on lines 13 through 18 takes two arguments: the rescue_args VALUE that was passed to rb_rescue(), and the exception object that was raised. Based on the exception object, the rescue() callback can choose to handle the exception (whatever it returns becomes the return value of rb_rescue()) or not (in which case it can re-raise it with rb_exc_raise()).
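Two details of rb_rescue() are easy to miss: it only rescues StandardError and its subclasses, so something like Interrupt still propagates past it, and a rescue callback that does not want an exception can simply raise it again. The toy model below sketches both behaviors in plain C. Everything in it is a hypothetical stand-in rather than a Ruby API: toy_rescue(), toy_protect(), toy_raise(), and the integer "classes" that replace real exception classes.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical exception "classes" for the toy model. */
enum {
    TOY_TYPE_ERROR = 1,   /* a StandardError subclass */
    TOY_FOO_ERROR = 2,    /* a StandardError subclass */
    TOY_INTERRUPT = 3     /* stands in for a non-StandardError */
};

typedef struct frame {
    jmp_buf buf;
    struct frame *prev;
} frame;

static frame *top_frame = NULL;
static int pending = 0;   /* class of the exception in flight */

static void toy_raise(int klass) {
    pending = klass;
    if (top_frame == NULL) {
        fprintf(stderr, "uncaught exception, class %d\n", klass);
        abort();
    }
    longjmp(top_frame->buf, 1);
}

/* Like rb_protect(): used here only to observe what escapes. */
static int toy_protect(int (*cb)(int), int arg, int *state) {
    frame f;

    f.prev = top_frame;
    top_frame = &f;
    if (setjmp(f.buf) == 0) {
        int result = cb(arg);
        top_frame = f.prev;
        *state = 0;
        return result;
    }
    top_frame = f.prev;
    *state = pending;
    return 0;
}

/* Like rb_rescue(): run cb(arg); if it raises a "standard" error,
 * return rescue(rescue_arg, exception) instead. Others propagate. */
static int toy_rescue(int (*cb)(int), int arg,
                      int (*rescue)(int, int), int rescue_arg) {
    frame f;

    f.prev = top_frame;
    top_frame = &f;
    if (setjmp(f.buf) == 0) {
        int result = cb(arg);
        top_frame = f.prev;
        return result;
    }
    top_frame = f.prev;
    if (pending == TOY_INTERRUPT)
        toy_raise(pending);          /* not a StandardError: let it go */
    return rescue(rescue_arg, pending);
}

static int body(int x) {
    if (x < 0)
        toy_raise(TOY_FOO_ERROR);
    if (x == 0)
        toy_raise(TOY_TYPE_ERROR);
    if (x > 1000)
        toy_raise(TOY_INTERRUPT);
    return x;
}

/* Handles only FooError; anything else is re-raised, much as a real
 * rescue callback could call rb_exc_raise(exception_object). */
static int rescue_foo(int rescue_arg, int exception) {
    if (exception == TOY_FOO_ERROR)
        return rescue_arg;
    toy_raise(exception);
    return 0;   /* not reached */
}

static int rescue_demo(int x) {
    return toy_rescue(body, x, rescue_foo, -1);
}
```

In the toy, a FooError is turned into the return value -1, a TypeError is re-raised by the callback, and the Interrupt stand-in never reaches the callback at all.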

Example

Before finishing this post, I'll leave you with another example. When writing ruby code, the full begin..rescue block goes something like:

begin
  ...
rescue FooException => e
  ...
rescue
  ...
else
  ...
ensure
  ...
end
How would we implement this in C?

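#include <ruby.h>
#include <stdio.h>

/* The "rescue FooException => e" clause */
static VALUE foo_exception_rescue(VALUE args) {
    fprintf(stderr, "foo_exception_rescue value is %s\n",
            StringValueCStr(args));
    return Qnil;
}

/* The bare "rescue" clause */
static VALUE other_exception_rescue(VALUE args) {
    fprintf(stderr, "other_exception_rescue value is %s\n",
            StringValueCStr(args));
    return Qnil;
}

/* Dispatch on the exception's class, like a chain of rescue clauses.
 * rb_obj_is_kind_of() also matches subclasses, as ruby's rescue does.
 * Assumes a FooException class has already been defined. */
static VALUE rescue(VALUE args, VALUE exception_object) {
    VALUE foo_class = rb_path2class("FooException");

    if (RTEST(rb_obj_is_kind_of(exception_object, foo_class)))
        return foo_exception_rescue(args);
    else
        return other_exception_rescue(args);
}

/* The code that goes in the "begin" section */
static VALUE body(VALUE args) {
    if (TYPE(args) != T_FIXNUM)
        rb_raise(rb_eTypeError, "expected a number");
    return Qnil;
}

/* begin + rescue: run body() with rescue() as the handler. body() must
 * be a separate function; passing cb itself to rb_rescue() would
 * recurse forever. */
static VALUE cb(VALUE args) {
    return rb_rescue(body, args, rescue, rb_str_new2("data"));
}

/* The "ensure" clause */
static VALUE ensure(VALUE args) {
    fprintf(stderr, "Ensure args %s\n", StringValueCStr(args));
    return Qnil;
}

VALUE res;

res = rb_ensure(cb, INT2NUM(0), ensure, rb_str_new2("data"));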
This example implements almost everything a ruby begin..rescue block can do. What it does not implement is the "else" clause; I have not yet come up with a good way to do that. If you think of a way to make this example work for the "else" clause, please leave a comment.