Monday, January 19, 2009

KVM performance tools

I've recently been working on tracking down some performance problems with KVM guests. There are a few tools available, but in this entry I'll stick to the two I've found most useful recently: kvm_stat and kvmtrace.

Let me start with kvm_stat, since that one is far easier to work with. kvm_stat is basically a Python script that periodically collects statistics from the kernel about what KVM is up to. Unfortunately, it is not packaged in the Fedora kvm package (I should probably file a bug about this), but the good news is that it is very easy to get. You just need to check out the kvm-userspace git repository (git clone git://), and the script is at the top level of the directory.

To get kvm_stat to actually do something useful, you first have to mount debugfs. In all likelihood, your distro kernel has this turned on, so all you really have to do is:
mount -t debugfs none /sys/kernel/debug
If you are going to do this often enough, it's probably a good idea to add that to your /etc/fstab.
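
Since kvm_stat is useless without debugfs, it can be handy to check for the mount before starting a long test run. The following is only a small sketch, not part of kvm_stat: the function names are made up for illustration, and it assumes the usual /proc/mounts layout of "device mountpoint fstype options ...".

```python
def debugfs_mount_points(mounts_text):
    """Return the mount point of every debugfs entry in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Assumed layout per line: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


def read_proc_mounts(path="/proc/mounts"):
    """Read the kernel's mount table as text (Linux only)."""
    with open(path) as f:
        return f.read()
```

Calling debugfs_mount_points(read_proc_mounts()) on the host should return a list containing /sys/kernel/debug once the mount command above has been run; an empty list means debugfs is not mounted yet.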

Once you have debugfs mounted, you can now use kvm_stat. If you just run "kvm_stat", it periodically outputs columns of data, sort of similar to vmstat. I've found this kind of hard to look at, though. So what I've been doing instead is using the -l flag of kvm_stat, and piping the output to a text file:
kvm_stat -l >& /tmp/output
The -l flag puts kvm_stat into logging mode, which is harder to read on a second-to-second basis, but easier to read in a spreadsheet later. After I've collected data for a while (mostly during the tests I care about), I use OpenOffice to read that data into a spreadsheet (hint: use "space" as a delimiter, and tell it to "Merge Delimiters"). Now, it's fairly easy to see what your guest has been doing, from the host's POV. Note that these are cumulative numbers; if you have multiple guests running, this is all of the data from all of the guests.
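
If you would rather skip the spreadsheet import settings, the log can be turned into CSV first. This is a rough sketch under some assumptions: that the log consists of whitespace-separated columns, that the first non-numeric line names the fields, and that any later non-numeric lines (repeated headers, stray error messages captured by ">&") can be ignored. The function names are my own, not anything kvm_stat provides.

```python
import csv
import io


def parse_kvm_stat_log(lines):
    """Split log lines into a header (list of names) and rows (dicts of ints)."""
    header = None
    rows = []
    for line in lines:
        fields = line.split()  # split() also merges runs of spaces
        if not fields:
            continue
        try:
            values = [int(f) for f in fields]
        except ValueError:
            # Non-numeric line: the first one is taken as the header,
            # later ones (repeated headers, error text) are skipped.
            if header is None:
                header = fields
            continue
        if header is not None and len(values) == len(header):
            rows.append(dict(zip(header, values)))
    return header, rows


def to_csv_text(header, rows):
    """Render the parsed rows as CSV text with one column per field."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

With the log from above, something like header, rows = parse_kvm_stat_log(open("/tmp/output")) followed by to_csv_text(header, rows) gives text that any spreadsheet can open without fiddling with delimiters.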

There are quite a few fields that kvm_stat outputs; I'll talk about the ones I think are relevant:
  • exits - I *think* this is a combined count of all of the VMEXITs that happened during this time period. Useful number to start with.
  • fpu_reload - The number of times a VMENTRY had to reload the FPU state (this only happens if your guest is using floating point)
  • halt_exit - This is the number of times that the guest exited due to calling "halt" (presumably because it had no work to do)
  • halt_wake - This is the number of times it was woken up from halt (it should be roughly equivalent to halt_exit)
  • host_state_reload - This is an interesting field. It counts the number of times we had to do a full reload of the host state (as opposed to the guest state). From what I can tell, this gets incremented mostly when a guest goes to read an MSR, or when we are first setting up MSRs.
  • insn_emulation - The number of instructions that the host emulated on behalf of the guest. Certain instructions (especially things like writes to MSRs, changes to page tables, etc.) are trapped by the host, checked for validity, and emulated.
  • io_exits - The number of times the guest exited because it was writing to an I/O port
  • irq_exits - The number of times the guest exited because an external irq fired
  • irq_injections - The number of IRQs "delivered" to the guest
  • mmio_exits - The number of times the guest exited for MMIO. Note that under KVM, MMIO is much slower than a normal I/O exit (inb, outb), so this can make a significant difference.
  • tlb_flush - The number of TLB flushes that the guest performed.
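
One way to make these counters easier to compare is to express a few of them as a share of the total exit count. The sketch below does that for a single row of data held in a dict. It rests on two assumptions: that "exits" really is the combined total (which, as noted above, I only think is the case), and that the column names match the ones listed here; they may differ depending on your kvm_stat version. The function name is hypothetical.

```python
def exit_breakdown(row, total_field="exits",
                   fields=("io_exits", "mmio_exits", "irq_exits", "halt_exit")):
    """Return each selected counter as a fraction of the total exit count."""
    total = row.get(total_field, 0)
    if total <= 0:
        # Nothing happened in this sample (or the total column is missing).
        return {}
    # Only report counters that are actually present in this row.
    return {name: row[name] / total for name in fields if name in row}
```

For example, a row with 200 exits of which 50 are io_exits gives a share of 0.25 for io_exits. A sample where mmio_exits dominates is usually the one worth looking at more closely, given how expensive MMIO exits are.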

The other tool I've started to use is kvmtrace. This tool does much the same job as kvm_stat, but at a much finer granularity. From the output, you can see not only that it did a VMEXIT, but also that it did a VMEXIT because of an APIC_ACCESS (or whatever). This can be powerful, but it also generates a lot more data to sift through.

Using this tool is a little more complicated than the kvm_stat one. Luckily, it is packaged in the Fedora kvm RPM, so that part we get for free. To run this beast, you'll want to do something like:
kvmtrace -D outdir -o myguest
This tells kvmtrace that you want all output files to go to "outdir", and to have them named "myguest.kvmtrace.?". You'll get one file for each CPU on the system. That last point is actually quite important; generally, your best bet is going to be to pin the guest to a particular CPU on the host, so that your results don't span multiple CPUs (and therefore multiple files). Now, this is the raw, binary data for each CPU on the system. You next need to convert that into something that a human can look at. For this job, there is kvmtrace_format. You can do all kinds of clever things with kvmtrace_format, but what I've found the easiest so far is to use the "default" format file (which generates all events), and then dump that out to a file. So, for instance, I ran:
kvmtrace_format user/formats <> myguest-kvmtrace.out
Note that user/formats is from kvm-userspace at git:// (again, it's not in the Fedora kvm package, which I should probably file a bug about). That ends up dumping all of the output to myguest-kvmtrace.out, which becomes a *huge* file. From here, I just did a bunch of processing with sed, grep, and awk to look for things that I care about.
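
As an alternative to a pile of grep invocations, a few lines of Python can tally several event names in one pass over the file. This is just a sketch of the idea: it does plain substring matching on each line and assumes nothing about the exact layout that kvmtrace_format produces, so the keywords you pass in have to match whatever text your format file emits. The function name is made up.

```python
from collections import Counter


def count_events(lines, keywords):
    """Count, for each keyword, how many lines contain it as a substring."""
    counts = Counter()
    for line in lines:
        for kw in keywords:
            if kw in line:
                counts[kw] += 1
    return counts
```

Something like count_events(open("myguest-kvmtrace.out"), ["VMEXIT", "APIC_ACCESS"]) reads the file line by line, so even a very large trace does not need to fit in memory.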
