Tuesday, February 8, 2011

FUDCon 2011 - Day 1

(sorry this is late; it just took me a couple of weeks to get my act together and get this written)

The weekend of Jan 29 and 30 I went to FUDCon 2011 in Tempe, Arizona. Thanks very much to Robyn Bergeron and all of the rest of the people who did the planning and logistics. This was my first FUDCon, and from my point of view things went off very smoothly.

As has been mentioned elsewhere, the location of the event was excellent. Since I was coming from the snowy northeast, it was great to get a break and have some nice warm weather. The fact that both hotels were within walking distance of the conference venue was also a big plus; I didn't have to rent a car or take a taxi or anything like that.

I arrived on Friday night and headed straight to the primary hotel, where the welcome event and opensource.com birthday party were happening. It was nice to meet some people I had only spoken with on IRC, and to have a few drinks and some pizza :).

Day 1

On Saturday morning the conference started. My introduction to the "barcamp" style began when I showed up at the door and was immediately enlisted to help carry some boxes down to the initial meeting room. People started filtering in, and after some logistics we headed straight into the "pitches" for the talks. Each person who wants to give a talk gives a 20-second synopsis of what they will talk about, and then the conference attendees vote for the talks they want to hear. I pitched my talk, "Cloud Management - Deltacloud, Aeolus, and friends", and after the voting it seemed like there was a lot of interest in the topic. While the votes were being tallied and the schedule was being finalized, Jared Smith (the Fedora Project Leader) gave his "State of Fedora" talk. At that point we headed over to the main conference venue. My talk was scheduled for the end of the day on Saturday, so I was able to enjoy a few talks before giving mine.

Fedora on ARM

The first talk I attended was the "Fedora on ARM" talk. ARM is one of the world's most widely produced processor architectures, and it runs on everything from phones up to (in the near future) servers. Why do we want to port Fedora to it? First, the OLPC XO 1.75 is using an ARM processor, and we would like to continue to be able to run Fedora on it. Second, several development boards are currently in production (GuruPlug, PandaBoard, etc.), and running Fedora on those is a great way to get started with ARM development. Third, future tablets and netbooks will likely have ARM processors. And finally, ARM is looking to expand into the server market to compete with x86. Since this is an area that Linux has dominated, it would be good to get a head start here. Servers running ARM aren't quite available yet, but when they do arrive they are projected to use 75% less power than equivalent x86 machines.

Previous efforts to port Fedora to ARM have fizzled out. This new effort has a koji build instance at arm.koji.fedoraproject.org, and any developer can submit builds there; it is hoped that this will encourage more participation in the process. The current effort basically has Fedora 13 running on ARM, though there are a couple of problems preventing a full "release". Right now the target platform is ARMv5tel, though they are looking to move to ARMv7 in the future. Now that Fedora 13 mostly works, getting Fedora 14 working is expected to be much easier; all of the patches that have been needed have been sent upstream, and in general things should move a lot faster. In theory, it should be easy to support both ARMv5 and ARMv7 in Fedora (as the instruction sets are compatible), but the reality is a bit different. ARMv5 does not require an FPU, meaning the port often has to fall back to software floating point, while the ARMv7 specification requires an FPU. Additionally, on ARMv7 some parameters can be passed in floating-point registers, which obviously will not work on ARMv5. While you can definitely compile for ARMv5 and then run on ARMv7, the performance hit is pretty dramatic. Therefore, there will be two sub-architectures for ARM, one targeting v5 and one targeting v7.
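
As an aside, if you want to know whether a given ARM board actually has a hardware FPU, the kernel exposes the CPU feature flags. Here is a small Python sketch of checking for one; it assumes the Linux/ARM /proc/cpuinfo layout, where a "vfp" variant on the Features line indicates hardware floating point.

    # Minimal sketch: detect a hardware FPU on a Linux/ARM board by
    # looking for a "vfp" variant on the Features line of /proc/cpuinfo.
    # Assumes the standard Linux/ARM cpuinfo layout.
    def has_hw_fpu(cpuinfo="/proc/cpuinfo"):
        with open(cpuinfo) as f:
            for line in f:
                if line.lower().startswith("features"):
                    flags = line.split(":", 1)[1].split()
                    return any(flag.startswith("vfp") for flag in flags)
        return False

    if __name__ == "__main__":
        print("hardware FPU:", has_hw_fpu())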

There have been a number of challenges getting Fedora building on ARM. On the hardware side, most of the boards that are available are fairly slow; 512MB of RAM, a single core, and slow I/O seems to be the norm. There are some new PandaBoards that have 1GB of memory and 2 cores, but the ethernet connection is slow (100Mbit), as is the storage subsystem. What this all means is that compiling on these machines takes a long time. On the software side, the challenges have mostly been around software that does not support ARM. For instance, OCaml in Fedora 13 does not support ARM (though this is fixed in Fedora 15). Other packages have had to be patched to deal with ARM alignment problems, compile problems, etc.

In terms of using Fedora on ARM, there are a few hurdles to overcome. The first is that for a random off-the-shelf ARM board, the bootloaders tend to be proprietary, which means it is not always clear how to load a custom kernel onto the board. The u-boot project can help with many, but not all, boards. Even once you have figured out how to load a custom kernel, the Fedora ARM port currently does not provide a packaged kernel, so it is up to you to compile one and create a minimal root filesystem to get the system bootstrapped.

The last part of this session was a quick overview of OLPC. The new version of the OLPC (1.75) will be running an ARMv7 processor, and they want to continue to be able to run Fedora on it. At the same time, there are 2 million x86-based OLPC machines out in the wild, so they want to continue to run Fedora on those machines as well. In particular, keeping the base distribution small and keeping dependencies down will help on the older OLPCs, which only have 256MB of memory.

Sheepdog

The next talk I attended was about Sheepdog, a distributed storage system for Qemu/KVM block storage. I had previously seen a talk about this at KVM Forum 2010, but I went in hoping to understand some more details.

What's the motivation for building Sheepdog? It has a few advantages over other distributed block storage systems. The first is that it is open source, unlike many of its competitors. The second is that hardware-based distributed block storage systems, like SANs, are very expensive and can be a single point of failure. The third is that other open-source distributed block storage systems (like Ceph and Lustre) are complex to set up and administer, and don't always have the performance characteristics expected.

Sheepdog is a fully symmetric, reliable, scalable, and well-performing distributed block storage system. There is no central node, and hence no single point of failure. Instead, blocks are distributed around the machines in a clock-like (consistent-hashing) fashion, so the loss of a single machine (or, in most cases, two) does not cause data to be lost. Sheepdog is intimately tied into Qemu; it uses block layer hooks in Qemu to do its work. On the machine that is running the qemu/kvm process, there is a sheep server process that qemu talks to in order to read and write blocks. When a request arrives for a block that is not on the local node, the sheep server reaches out to the sheep processes running on other machines and requests the block it is looking for. Because the data is broken up into 4MB chunks and distributed around the nodes, the performance should be comparable to other network-based storage systems (such as NFS), with much higher reliability.
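
To make that "clock-like" placement a little more concrete, here is a minimal Python sketch of consistent hashing over 4MB chunks. This is purely my own illustration of the general technique, not Sheepdog's actual code; the node names and replica count are made up, and only the 4MB chunk size comes from the talk.

    # Illustrative consistent-hashing sketch (not Sheepdog's actual code).
    # Each 4MB chunk of a virtual disk hashes to a point on a ring, and
    # the next few nodes clockwise from that point hold its replicas.
    import hashlib
    from bisect import bisect

    CHUNK_SIZE = 4 * 1024 * 1024  # Sheepdog splits images into 4MB chunks

    def ring_position(key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, nodes, replicas=2):
            self.replicas = replicas
            self.ring = sorted((ring_position(n), n) for n in nodes)

        def nodes_for_chunk(self, image, offset):
            # Find the ring position of this chunk, then walk clockwise.
            pos = ring_position("%s:%d" % (image, offset // CHUNK_SIZE))
            start = bisect(self.ring, (pos,)) % len(self.ring)
            count = min(self.replicas, len(self.ring))
            return [self.ring[(start + i) % len(self.ring)][1]
                    for i in range(count)]

    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.nodes_for_chunk("vm1-disk0", 10 * 1024 * 1024))

The nice property of this scheme is that losing a node only reassigns the chunks that hashed near it, rather than reshuffling everything. And from qemu's point of view the placement is invisible; the guest just sees a normal block device.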

Sheepdog currently uses the Corosync cluster engine for cluster-membership operations, which has advantages and disadvantages. The obvious advantage is that Corosync is a well-tested piece of cluster infrastructure that uses a well-known totem ring protocol to do synchronization across nodes. The disadvantage is that Corosync is limited to around 48 total nodes, so moving to larger numbers of nodes will require developing a new protocol.

Cloud: The future of computing

I next went to Mike McGrath's talk on the future of cloud computing. The first part of his talk explained the advantages that businesses see in cloud computing (vs. traditional, hardware-hosted-on-premises computing): cost, scalability, and the ability to pay as you go. In this section he also talked about the differences between IaaS, PaaS, and SaaS. Infrastructure as a Service (IaaS) is the idea of running virtual machines on hardware that you do not necessarily control; Amazon EC2 is the canonical example. Platform as a Service (PaaS) is the idea of running your application on a platform that is already prepared for you; web hosting providers fall into this category. Finally, Software as a Service (SaaS) is the idea of being completely divorced from the maintenance of the system; things like Google Docs fall into this category. He then demoed "Libra", an application that he is working on to quickly deploy PaaS applications.

The second part of his talk was about how cloud computing is set to replace traditional computing, and how ill-prepared free software is to deal with that new future. In particular, his points were:

  1. HTML5 is set to replace traditional desktop toolkits with a rich interface
  2. The new model of cloud computing encourages consumption, not contribution
  3. Duplicating a cloud computing model becomes easy (given enough hardware), so there is little incentive for companies to open-source their software stack
  4. Current licenses are not prepared for cloud computing
  5. The open source crowd will have to settle for open standards, rather than open source

While I agree with him on some points, I don't believe that cloud computing is going to completely take over the computing landscape. Computing goes through cycles from thin clients to thick clients, and back again. While cloud computing will almost certainly be a part of the IT landscape for a long time to come, it will be just that: a part. Not every application is suitable for cloud computing. Not every user wants to outsource their concerns to a third party.

Pulp

The last session I went to before my own was about Pulp, which is a software repository management system. It can manage packages, errata, and distributions, along with synchronizing repositories from an external feed or hosting custom repositories.

The architecture looks like this:

Essentially, there is a central Pulp server that can be interfaced with via REST. It uses MongoDB to hold all of the metadata about packages, errata, etc., and can push requests (via AMQP) to consumer machines. The consumer machines run a small Pulp daemon that continually monitors for changes, so if modifications are made outside of Pulp, the main server can be notified. The current version of Pulp supports modern Fedora along with RHEL-5 and RHEL-6.

When managing repositories with Pulp, there are a number of operations available. Repositories can be created, deleted, or updated based on commands from the admin. The data in the repositories can be synced immediately or on a fixed schedule, and the content can come from yum repositories, RHN feeds, or locally created repositories. Pulp also supports global package search across all repos, which may be useful for finding out-of-date packages.
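
To make the admin workflow concrete, here is a hedged Python sketch of creating and syncing a repository over REST. The base URL, endpoint paths, and payload fields are my own guesses for illustration only; they are not taken from Pulp's documentation.

    # Hypothetical sketch of Pulp repository operations over REST. The
    # endpoint paths and payload fields are assumptions for illustration;
    # consult the real Pulp API documentation before relying on them.
    import json
    import urllib.request

    BASE = "https://pulp.example.com/pulp/api"  # hypothetical server

    def call(method, path, payload=None):
        data = json.dumps(payload).encode() if payload is not None else None
        req = urllib.request.Request(
            BASE + path, data=data, method=method,
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    # Create a repository that syncs from an upstream yum feed ...
    call("POST", "/repositories/", {
        "id": "f14-updates-x86_64",
        "name": "Fedora 14 updates",
        "feed": "yum:http://download.example.com/f14/updates/x86_64/",
    })

    # ... and kick off an immediate sync of it.
    call("POST", "/repositories/f14-updates-x86_64/sync/")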

The consumers need to subscribe (bind) to particular Pulp repositories. Once they are subscribed, the repositories are available through yum, so the normal yum commands work to install additional packages. The Pulp server can also do remote installation of packages through the Pulp API if desired. As mentioned before, the daemon running on the consumer keeps track of packages that were installed locally and updates the Pulp server with that information. Finally, any actions done on the consumer are extensively audited for debugging/security purposes.
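
Continuing the same hypothetical sketch, binding a consumer and pushing a package to it might look like this (again, the paths are illustrative assumptions, not Pulp's documented API):

    # Hypothetical continuation: bind a consumer to the repository, then
    # remotely install a package through the server. Reuses the call()
    # helper from the repository sketch above; paths are assumptions.
    call("POST", "/consumers/web01/bind/", {"repo_id": "f14-updates-x86_64"})
    call("POST", "/consumers/web01/installpackages/", {"packages": ["httpd"]})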

What's the difference between Pulp and Spacewalk? Pulp is a much lighter-weight version of Spacewalk, so it can be integrated into other projects separately. The hope is to break Spacewalk up into smaller, single-purpose components that are more reusable (and hence have more users).

Some advanced features of Pulp include:
  • Batch operations on groups of repositories and groups of consumers
  • LDAP integration
  • Repository Cloning, so you can start with the same repository but use it for multiple different purposes (think development vs. production)
  • A tagging system for consumers
  • Full REST API

The future roadmap for Pulp includes:
  • Maintenance Windows (do updates at particular dates/times)
  • Scheduled installation of packages on consumers
  • Support for external content delivery systems to reduce bandwidth
  • Clone filters to filter out what packages to clone
  • HA and Failover
  • Internationalization

Cloud Management

It was then time for my talk. I'm not a great presenter, so I was a bit nervous to start with, but I eventually got into a flow. I spoke about both Deltacloud and the newly launched Aeolus project, which uses Deltacloud underneath. The outline I used is available here, so I won't go into too much additional detail in this post. I will say that quite a few people came to my presentation and asked some good questions, so I thought it was successful from that point of view. I'll record some of the questions that I remember below; they make a nice addendum to the material from the talk itself. Any mistakes are mine, since I am working from my imperfect memory.

Deltacloud
Q: Why use the Deltacloud API over something like the EC2 API, which seems to be becoming an industry standard?
A: The Deltacloud API is very useful right now, since there is no common API between cloud providers. While some providers are indeed implementing the EC2 API, it is not ubiquitous. If it turns out in the future that every cloud we care about implements the EC2 API, then Deltacloud could be viewed as superfluous. Until that point, however, we think it has significant value in avoiding cloud lock-in.
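
To give a flavor of what "one API in front of many clouds" means, here is a small Python sketch of talking to a local Deltacloud server. The port, paths, and parameter names are from my memory of the API and should be treated as assumptions; the point is that the same client code works regardless of which back-end driver (EC2, RHEV-M, Rackspace, etc.) the server is running.

    # Sketch of driving a Deltacloud server from Python. The port, paths,
    # and parameter names are assumptions from memory; the point is that
    # the client code is identical no matter which cloud is behind it.
    import urllib.parse
    import urllib.request

    API = "http://localhost:3001/api"  # assumed default deltacloudd URL

    def get(path):
        req = urllib.request.Request(API + path,
                                     headers={"Accept": "application/xml"})
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode()

    # List the images the back-end cloud offers ...
    print(get("/images"))

    # ... then launch an instance from one (hypothetical image id).
    params = urllib.parse.urlencode({"image_id": "img-1234"}).encode()
    req = urllib.request.Request(API + "/instances", data=params,
                                 headers={"Accept": "application/xml"})
    print(urllib.request.urlopen(req).read().decode())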

Oz
Q: What about making Oz more configurable, to allow users to upload their own kickstarts?
A: This is a great question, and the answer is that yes, we should make it more configurable. In response to this question, I implemented uploadable autoinstallation files, which went into the Oz 0.1.0 release.
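
For the curious, invoking that from a script might look like the following sketch. The "-a" flag for passing a custom auto-installation file is my recollection and should be verified against the Oz documentation; the file names are made up.

    # Hedged sketch: building an image with oz-install using a
    # user-supplied kickstart. The "-a" flag and file names are
    # assumptions; check the Oz documentation for the real interface.
    import subprocess

    subprocess.run(
        ["oz-install",
         "-a", "my-custom.ks",  # the user's own kickstart
         "fedora13.tdl"],       # hypothetical template describing the guest
        check=True)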

Aeolus Conductor
Q: Looking at the full diagram of Aeolus, it is a pretty complicated structure. How are Fedora users expected to install and configure this?
A: This is a valid question, and the answer is not fully formed yet. In order to solve many of the problems that Aeolus is trying to solve, we think the complexity of the internal components is necessary. That being said, we really do need to make it easy to deploy a basic Aeolus Conductor. To that end, we have a set of scripts called "aeolus-configure" that can install and configure a simple setup. This certainly needs work, but it is one way in which we can make Aeolus more consumable for end users.

After my talk, there was FUDPub, which was a lot of fun. There's not much to mention here, other than that the combination of drinks, bowling, pool, and friends was fantastic. I would highly recommend doing something similar at future FUDCons.
