Category Archives: Programming

General thoughts on programming

Using Mock to build Firefox on Linux

We currently have two different Linux build machine configurations. One does our 32-bit desktop and Android builds, the other does our 64-bit desktop builds. This isn’t very efficient because we have to segregate these two types of machine into their own pools. If we have a burst of mobile builds, we end up with a pool of overloaded 32-bit machines while our 64-bit machines sit idle. We also have one build environment in which all Linux-based builds need to be done. With boot2gecko and other projects coming online, we need to ensure that we are able to scale to serve these projects while allowing us to manage our build slaves in a sustainable way.

I have been working on a proof of concept for adapting Fedora’s Mock utility to build Firefox, Firefox for Mobile and other projects like boot2gecko in their own sandboxed build environment. Mock is what the Fedora project uses to build every single RPM included in their distribution. Mock separates our build host from our build environment, allows us to scale build hosts any way we desire, allows us to be more efficient and agile, improves security, improves the developer workflow and enables community engagement.

Mock’s Architecture Simplified

When Mock is used, yum and rpm populate the basic sandbox. Each time a command is run, Mock changes the root directory (chroot) to this sandboxed build environment. Mock then drops privileges as requested and runs the command in the sandbox. To speed up the creation of these sandboxes, Mock instructs yum to locally cache the rpm files it downloads. Mock also creates a tarball containing the sandbox after installing the base packages to further speed up sandbox creation. In all, it takes Mock approximately one minute on my local Fusion VM to initialize a build environment once the caches are warmed.

Build Environment Isolation

The host operating system is responsible for providing the kernel, the tools to drive Mock and Mock itself. Because each build environment is built on demand, the libraries that the host operating system provides aren’t important or used. This means that we can take security patches on the build host OS without risking our build environments. We are also able to bring up new fast hardware which may not support our build environment’s OS natively.  Mock uses the chroot system call. This call has been around for quite some time and its abilities and flaws are well known. Chroot works on virtual machines which means we can use Mock even if we decide to scale to virtualized hardware.

Efficiency and Agility

Having no build dependencies installed on the build hosts, other than Mock itself, allows us to use the same configuration on our test and build machines. The test machines could even benefit from having Mock by allowing us to run tests against Gnome 2 and Gnome 3 on the same pool of slaves.

A common scenario that I’ve dealt with is having to update a library that we link into Firefox. Let’s say we are using libfoo version 1 in Firefox 10. To support a developer landing a change to Firefox 11 that requires libfoo version 2, we currently need to figure out whether Firefox 10 can maintain binary compatibility when built with libfoo version 2, or figure out how to install version 2 in a separate location and have only Firefox 11 and newer use it. Using Mock, we would create a new build environment for Firefox 11 that only has libfoo version 2. Because libfoo version 1 is unavailable to Firefox 11 builds and libfoo version 2 is unavailable to Firefox 10 builds, we don’t risk depending on a new version of libfoo in Firefox 10 and we don’t risk silently falling back to libfoo version 1 in Firefox 11.
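
The libfoo scenario can be sketched in a few lines of Python. This is only an illustration: the table, the function names and the idea of keying environments by a minimum Firefox version are assumptions of mine, not how mock_mozilla stores its configuration.

```python
# Hypothetical data: minimum Firefox version -> packages installed in that sandbox.
ENVIRONMENTS = {
    10: {"libfoo": "1"},
    11: {"libfoo": "2"},
}


def environment_for(firefox_version):
    """Pick the environment with the highest minimum version <= firefox_version."""
    candidates = [minimum for minimum in ENVIRONMENTS if minimum <= firefox_version]
    if not candidates:
        raise ValueError("no build environment for Firefox %d" % firefox_version)
    return ENVIRONMENTS[max(candidates)]


def libfoo_version(firefox_version):
    """Each sandbox holds exactly one libfoo, so there is nothing to fall back to."""
    return environment_for(firefox_version)["libfoo"]
```

Because every environment lists its packages explicitly, a build can only ever see the one libfoo that its own sandbox installs.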

Mobile moves quickly, often needing a completely unique build environment. When we split each mobile project out to its own Mock build environment, we are able to set up new environments significantly quicker because we are able to limit the scope of changes and no longer risk breaking Firefox by changing anything related to mobile. We can also allow mobile projects to use much newer Fedora software in their build environment while still running the more stable CentOS on our build hosts.  This means we enable mobile developers to use modern software while avoiding rebuilding a slew of packages in an alternate location.

Security

I understand that chroots are not a perfect sandboxing tool. I think we need to make sure that we don’t make perfect the enemy of good. We currently have no real sandboxing on our build slaves. Mock will enable us to run all developer submitted commands in an unprivileged chroot that is rebuilt for every build. Mock allows us to keep our system software updated on the build host to lessen our exposure to security vulnerabilities.

A special case in our infrastructure is Try. Each Try build is currently able to modify the system in ways that we might not want it to.  As a result, we currently have a completely separate pool of machines which run our Try builds.  Mock allows us to sandbox all developer submitted code into a disposable sandbox. Because all of the developer submitted code is limited to the sandbox, we can start looking at merging the Try and production pools.  The failure mode here is that the developer is able to exploit a kernel bug to gain root permissions in their sandbox.  Once a user has root in the sandbox they would be able to chroot back out of the sandbox.  This will be important when we discuss merging our Try and production pools.

Correctness

Each build environment will install the exact set of packages required for the build. When a new dependency is required, we will explicitly add it to the list of dependencies. This means we won’t be surprised when Firefox starts depending on another library or tool. We are also able to easily move actively developed branches of Firefox forward without taking risky changes to the build environment of shipping versions of Firefox. This is increasingly important with the ESR (Extended Support Release). Using Mock, we can continue to build in the exact same environment we use today for the rest of the ESR release even though we are building a 10 month newer version of Firefox with different dependencies on the same build host. Having a single pool of slaves that can do all of our Linux-based builds without having to figure out alternate install locations for each dependency is a major scalability win in my book.

Developer workflow improvements

Upgrading a library on our build slaves is currently a pain for developers and releng alike. Setting up a local copy of our Linux build machines is a pain. Mock makes this easier by allowing developers to use their existing machines to exactly replicate our official build environments. Setting up a copy of our build environment on their machine would become trivially easy. Developers could test their code changes in an exact match of our production build environment. Furthermore, with a tool like mozharness, a developer would be able to test an entire build using the exact same bits we use in production with the exact same commands and flow control.

Mock allows us to engage the community more effectively. Community members are able to work on our build machine configuration without impediment and would have access to the exact tools and packages that we use to build Firefox and Mobile. When a community member finds or fixes a bug in our configuration or wants to upgrade a dependency, they are able to do so and submit patches for review. This is a lot easier than our current process of signing out a slave for them, adding them to our VPN, letting them do their work, then removing them from our VPN, reimaging the slave and setting it up again to be in production. Projects like SeaMonkey will be able to match official Firefox build environments with a lot less work required.

Conclusion

Mock is a very useful tool. Using it enables us to handle developer requests quicker, engage the community, improve developer workflow, improve security and reduce risk of breaking shipping versions of Firefox when moving development versions forward.  There are some changes that I’d like to make to the proof of concept before calling mock_mozilla complete, mainly how the driver script (mock_mozilla.py) parses the command line arguments and how build environment locking works.

If you’d like to take a look at the code for yourself, please feel free to check it out on github! https://github.com/jhford/mock_mozilla.

Demonstrations

Before you start running code in a Mock sandbox, you need to initialize it. Initializing cleans out the old contents and sets up the base packages specified in the sandbox config file.

~/software/mock_mozilla $ mock_mozilla -r mozilla-f15-x86_64 --init
INFO: mock_mozilla.py version 1.1.17 starting...
INFO: State Changed: init plugins
INFO: selinux disabled
INFO: State Changed: start
INFO: State Changed: lock buildroot
INFO: State Changed: clean
INFO: State Changed: unlock buildroot
INFO: State Changed: init
INFO: State Changed: lock buildroot
INFO: Mock Version: 1.1.17
INFO: Mock Version: 1.1.17
INFO: Mock Version: 1.1.17
INFO: calling preinit hooks
INFO: enabled root cache
INFO: State Changed: unpacking root cache
INFO: enabled yum cache
INFO: State Changed: cleaning yum metadata
INFO: enabled ccache
INFO: State Changed: running yum
INFO: State Changed: unlock buildroot
INFO: State Changed: end

Mock uses yum to install packages into the sandbox, but that doesn’t mean that we are limited to using yum and rpm to populate tools.

~/software/mock_mozilla $ mock_mozilla -r mozilla-f15-x86_64 --shell "zip --version"
INFO: mock_mozilla.py version 1.1.17 starting...
INFO: State Changed: init plugins
INFO: selinux disabled
INFO: State Changed: start
INFO: State Changed: lock buildroot
INFO: State Changed: shell
/bin/sh: zip: command not found
INFO: State Changed: unlock buildroot
~/software/mock_mozilla $ mock_mozilla -r mozilla-f15-x86_64 --install zip
INFO: mock_mozilla.py version 1.1.17 starting...
INFO: State Changed: init plugins
INFO: selinux disabled
INFO: State Changed: start
INFO: Mock Version: 1.1.17
INFO: Mock Version: 1.1.17
INFO: Mock Version: 1.1.17
INFO: State Changed: lock buildroot
INFO: installing package(s): zip

================================================================================
 Package        Arch              Version               Repository         Size
================================================================================
Installing:
 zip            x86_64            3.0-3.fc15            fedora            251 k

Transaction Summary
================================================================================
Install       1 Package(s)

Total size: 251 k
Installed size: 770 k

Installed:
  zip.x86_64 0:3.0-3.fc15                                                      

INFO: State Changed: unlock buildroot
INFO: State Changed: end
~/software/mock_mozilla $ mock_mozilla -r mozilla-f15-x86_64 --shell "zip --version"
INFO: mock_mozilla.py version 1.1.17 starting...
INFO: State Changed: init plugins
INFO: selinux disabled
INFO: State Changed: start
INFO: State Changed: lock buildroot
INFO: State Changed: shell
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
This is Zip 3.0 (July 5th 2008), by Info-ZIP.
Currently maintained by E. Gordon.  Please send bug reports to
the authors using the web page at www.info-zip.org; see README for details.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip,
as of above date; see http://www.info-zip.org/ for other sites.

Compiled with gcc 4.6.0 20110205 (Red Hat 4.6.0-0.6) for Unix (Linux ELF) on Feb  8 2011.

Zip special compilation options:
     USE_EF_UT_TIME       (store Universal Time)
     SYMLINK_SUPPORT      (symbolic links supported)
     LARGE_FILE_SUPPORT   (can read and write large files on file system)
     ZIP64_SUPPORT        (use Zip64 to store large files in archives)
     UNICODE_SUPPORT      (store and read UTF-8 Unicode paths)
     STORE_UNIX_UIDs_GIDs (store UID/GID sizes/values using new extra field)
     UIDGID_NOT_16BIT     (old Unix 16-bit UID/GID extra field not used)
     [encryption, version 2.91 of 05 Jan 2007] (modified for Zip 3)

Encryption notice:
     The encryption code of this program is not copyrighted and is
     put in the public domain.  It was originally written in Europe
     and, to the best of our knowledge, can be freely distributed
     in both source and object forms from any country, including
     the USA under License Exception TSU of the U.S. Export
     Administration Regulations (section 740.13(e)) of 6 June 2002.

Zip environment options:
             ZIP:  [none]
          ZIPOPT:  [none]
INFO: State Changed: unlock buildroot

Mock allows us to run developer submitted code as an unprivileged user to isolate their code and files from the rest of the machine.

~/software/mock_mozilla $ mock_mozilla -r mozilla-f15-x86_64 --shell "whoami"
INFO: mock_mozilla.py version 1.1.17 starting...
INFO: State Changed: init plugins
INFO: selinux disabled
INFO: State Changed: start
INFO: State Changed: lock buildroot
INFO: State Changed: shell
root
INFO: State Changed: unlock buildroot
~/software/mock_mozilla $ mock_mozilla -r mozilla-f15-x86_64 --unpriv --shell "whoami"
INFO: mock_mozilla.py version 1.1.17 starting...
INFO: State Changed: init plugins
INFO: selinux disabled
INFO: State Changed: start
INFO: State Changed: lock buildroot
INFO: State Changed: shell
mock_mozilla
INFO: State Changed: unlock buildroot

Putting all of the above together, I have written a proof of concept script and recorded a screencast of it building Firefox in a Mock sandbox. I’ve cut the middle bit out because this takes a while on my MacBook-based VM.
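
For readers who want to chain the commands from the demonstrations themselves, here is a rough Python sketch. It is not the proof of concept script mentioned above; the helper names and the three-step plan are my own assumptions, and only the flags that appear in the transcripts (-r, --init, --install, --shell, --unpriv) are used.

```python
import subprocess

MOCK = "mock_mozilla"  # driver name as it appears in the transcripts above


def init_command(root):
    """Command that wipes and re-creates the sandbox for the given config."""
    return [MOCK, "-r", root, "--init"]


def install_command(root, packages):
    """Command that installs extra packages into an initialized sandbox."""
    return [MOCK, "-r", root, "--install"] + list(packages)


def shell_command(root, command, unprivileged=True):
    """Command that runs a shell command in the sandbox, unprivileged by default."""
    argv = [MOCK, "-r", root]
    if unprivileged:
        argv.append("--unpriv")
    return argv + ["--shell", command]


def build_plan(root, packages, build_command):
    """Hypothetical sequence: fresh sandbox, add tools, run the build unprivileged."""
    return [
        init_command(root),
        install_command(root, packages),
        shell_command(root, build_command),
    ]


def run_plan(plan):
    """Run each step in order and stop at the first failure (needs mock_mozilla on PATH)."""
    for argv in plan:
        subprocess.check_call(argv)
```

Keeping the command construction separate from run_plan means the plan can be printed and reviewed on a machine that doesn’t have Mock installed.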

Writing a native rm program for Windows

Our Windows build machines use msys to emulate a posix environment. Msys is a great tool, providing a lot of common posix utilities, like cp, mv and rm. Sadly there are bugs in the posix emulation. For us, this manifests in the rm program being unable to delete certain files.

Bug 583129 is about using native Windows tools for file removals.  In that bug, we’ve looked at a combination of the rmdir and attrib Windows tools.  The problem is that rmdir doesn’t like to delete files if they have the read-only or system attribute.  To fix that, we need to run attrib to remove those attributes then rmdir to delete them.  Running rmdir is fast but attrib is very slow.

Last year, I spent a bit of time writing a native windows version of rm.  I am by no means a Windows developer, so I spent a couple hours learning the basics of the Windows API.  The API is quite different to what I’m used to, but the documentation seems to have been written quite well for my purposes.  I was able to get the basics working pretty quickly.  Yesterday, I decided to finish up the program by adding directory deletion, recursive deletes and a command line parser.

I present to you winrm. The code is available in my Mozilla user repo. This tool works similarly to the standard posix rm utility. For simplicity’s sake, only single character options are supported. These options can be joined, like “-rf”, or specified individually, like “-r -f”. The “--” option is also supported, signalling the program to treat all following arguments as files to be deleted. Because of how the option parser works, files are deleted in the reverse of the order in which they appear on the command line.

jhford@JHFORD-VM ~/mozilla/jhford-native-rm
$ touch a b c d e

jhford@JHFORD-VM ~/mozilla/jhford-native-rm
$ winrm -v -- a b c d e
deleting "e"
deleted "e"
deleting "d"
deleted "d"
deleting "c"
deleted "c"
deleting "b"
deleted "b"
deleting "a"
deleted "a"
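
The option handling described above can be approximated in Python. This is a sketch of the behaviour, not a port of winrm’s parser: the function name and the set of known flags are assumptions, and the reversal is done explicitly here whereas in winrm it falls out of how the parser works.

```python
KNOWN_OPTIONS = set("rfv")  # only the flags that appear in this post


def parse_args(args):
    """Split rm-style arguments into a set of single-character options and a file list."""
    options = set()
    files = []
    only_files = False
    for arg in args:
        if only_files or not arg.startswith("-") or arg == "-":
            # Plain argument, or anything at all once "--" has been seen.
            files.append(arg)
        elif arg == "--":
            only_files = True
        else:
            # "-rf" is treated the same as "-r -f".
            for flag in arg[1:]:
                if flag not in KNOWN_OPTIONS:
                    raise ValueError("unknown option: -%s" % flag)
                options.add(flag)
    # Mimic the described behaviour: delete in reverse command-line order.
    files.reverse()
    return options, files
```

With the arguments from the transcript, parse_args(["-v", "--", "a", "b", "c", "d", "e"]) yields the option set {"v"} and the files in the order e, d, c, b, a.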

Because my program is written directly against the Windows API and is a much simpler program, it is also much faster than the msys rm program. To test this, I timed deletion of a mozilla-central clone using both my rm and the msys rm. My program took 37s where the msys program took 113s.

If you know the Windows API and have the cycles, please let me know if you find any glaring errors with my program. If you want to test the program without having to build it, I’ve uploaded a copy here: winrm-0.1.

Using OpenVPN to tunnel all traffic through my home server

I want to be able to send all my internet traffic to the Linux machine I have running in my apartment and I am not a networking expert. My motivation for this post is threefold: document my process for future reference, share my info and see if people have suggestions for how to do this better. I am not going to go through every option, just what I did and what worked for me.

The first step was to figure out what I needed to do. I decided on using openvpn because I already use it for work and because it’s open source. I found the how-to document on the openvpn site to be really useful. I am using Fedora, so I skipped the section on installing openvpn from source and ran “sudo yum install openvpn”. My next step was to copy the PKI support files into a directory by running “cp -r /usr/share/openvpn/easy-rsa/2.0/* .”. I then followed the directions for generating the PKI.

For this to work you need an open port on your server. I used the openvpn standard of 1194. I tested that the port was open with netcat by running “nc -l 1194” on my server and “nc server.name 1194” on my laptop. Text typed in either terminal shows up in the other once you press Enter.
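
The same reachability check could be scripted. The sketch below is an assumption of mine rather than part of the original setup; like plain nc, it only exercises TCP, so it says nothing about whether a UDP listener is reachable. It still needs something listening on the server side, such as the “nc -l 1194” above.

```python
import socket


def tcp_port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name and connects; the with block closes it.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

From the laptop this would be called as tcp_port_is_open("server.name", 1194), where server.name is the same placeholder used in the text.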

At this point, I needed to set up the server configuration. I copied the sample config file to my directory by running “cp /usr/share/doc/openvpn-2.1.4/sample-config-files/server.conf server.conf”. I found that the sample server config file seemed to work great for me with the following changes:

diff -U0 sample-config-files/server.conf config/server.conf
--- sample-config-files/server.conf	2011-12-12 21:43:31.000000000 -0800
+++ config/server.conf	2011-12-12 22:16:46.000000000 -0800
@@ -196,0 +197,2 @@
+push "dhcp-option DNS 0.0.0.0"
+push "dhcp-option DNS 0.0.0.0"
@@ -204 +206 @@
-;client-to-client
+client-to-client

The first change pushes DNS servers to my client (fake ips, obviously) and the second change is to allow different clients to talk to each other. I am not sure how useful the inter-client link will end up being.

I am using the Viscosity client because that’s the only sane way to do this on OS X and Windows. Sending all traffic over the vpn link is the default behaviour for Network Manager (Linux). I started with the sample by running “cp /usr/share/doc/openvpn-2.1.4/sample-config-files/client.conf .”. My changes were pretty basic:

diff -U0 sample-client.conf client.conf
--- sample-client.conf	2011-12-12 22:43:11.000000000 -0800
+++ client.conf	2011-12-12 21:49:17.000000000 -0800
@@ -42 +42 @@
-remote my-server-1 1194
+remote server.name 1194
@@ -89,2 +89,2 @@
-cert client.crt
-key client.key
+cert laptop.crt
+key laptop.key

At this point, the client side configuration was ready to transfer, so I tarred up the needed files with:

mkdir ovpn-configs
cp keys/ca.crt keys/laptop.crt keys/laptop.key client.conf ovpn-configs/
tar jcf laptop-openvpn-config.tar.bz2 ovpn-configs

and used scp to transfer the files over to my laptop.

Once on my laptop, I untarred the files and imported the configuration into Viscosity. I did this by:

  • clicking on the Viscosity menu icon, then selecting preferences
  • clicking on the plus button with the down arrow, selecting “import connection”, then selecting “from file”
  • selecting the client.conf file from the tarball

Next, I configured all my traffic to go over vpn. I selected the “client” configuration from the list of configurations and pressed the “edit” button. In the sheet, I navigated to the “networking” tab and checked the box for “send all traffic over VPN connection”. My client side configuration was complete.

At this stage, I tested that my machine was able to connect to my openvpn server. I gathered the various files needed for the openvpn server into a single directory:

mkdir ~/openvpn-server/
cp keys/* ~/openvpn-server #lazy
cp server.conf ~/openvpn-server

and started the server with “cd ~/openvpn-server && sudo openvpn server.conf”. I connected to the server using Viscosity. The client connected properly, but I was unable to resolve anything over DNS or reach anything other than my openvpn server. Reading the openvpn howto suggested setting up a NAT. I did some searching and found a page with information on setting up the NAT. I did:

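# Run as root: let the kernel forward packets between interfaces
echo 1 > /proc/sys/net/ipv4/ip_forward
# Rewrite the source address of traffic leaving through eth0 (NAT)
/sbin/iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# Let replies to established connections back in to the VPN clients
/sbin/iptables -A FORWARD -i eth0 -o tun0 -m state --state RELATED,ESTABLISHED -j ACCEPT
# Let VPN clients (tun0) send traffic out through eth0
/sbin/iptables -A FORWARD -i tun0 -o eth0 -j ACCEPT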

At this point, everything worked! I ran traceroute, and the first hop was my vpn server’s vpn address (10.8.0.1). I also used some websites to check my public IP and it was showing as my server’s IP.

I hope this is useful to others. If I’ve done something really dumb, I’d appreciate any suggestions for how to do it better! I have left out information about how to start the openvpn service on boot. This isn’t really important to me right now but if I ever bother with it, I’ll update this blog post.

Screenshots on OS X timeouts

As of the mozilla-inbound merge this morning, any time automation.py based tests time out or crash on OS X, a screenshot will be base64 encoded and dumped into the test log. We’ve had this support for a while on Linux and I have matched the output format. In case you aren’t familiar with this, your logs will print out something that looks like:

8217 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_videocontrols.html | Test timed out.
args: ['/usr/sbin/screencapture', '-C', '-x', '-t', 'png', '/var/folders/Hs/HsDn6a9SG8idoIya6p9mtE+++TI/-Tmp-/mozilla-test-fail_k9Dpdz']
SCREENSHOT: data:image/png;base64,iVBORw0.....

If you want to see this image, copy everything from ‘data:image’, inclusive, to the end of line and paste it into your browser’s awesome bar.
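
If you would rather save the image to a file than paste it into the browser, something like the following Python sketch should work. The function name is hypothetical, and it assumes the line format shown above, with “SCREENSHOT: ” at the start of the line followed by a PNG data URI.

```python
import base64

MARKER = "SCREENSHOT: "
PREFIX = "data:image/png;base64,"


def extract_screenshot(log_text):
    """Return the PNG bytes from the first SCREENSHOT line, or None if there is none."""
    for line in log_text.splitlines():
        if line.startswith(MARKER):
            data_uri = line[len(MARKER):].strip()
            if data_uri.startswith(PREFIX):
                # Everything after the prefix is the base64 payload.
                return base64.b64decode(data_uri[len(PREFIX):])
    return None
```

The returned bytes can be written straight to a file opened in binary mode, for example open("screenshot.png", "wb").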

In case you want to see what this looks like in the wild, here is a sample log with a screenshot.

I am working on getting this enabled on Windows as well. My automation.py.in changes should easily support the win32 screenshot utility written by Ted in bug 414049.

Disabling PGO for the majority of Firefox Builds.

This project has been discussed in a dev.planning thread, with the work being tracked in bug 658313.  We are going to be turning off PGO for incremental builds on all branches[1] on Wednesday, October 5, 2011.  In the exceedingly rare chance that you don’t read every single bug comment and dev.planning thread post, here is what will change:

  • Builds triggered as part of a push will not have PGO enabled on any platform
  • We will be producing builds every four hours with PGO enabled on Windows and Linux for the following branches:
    • Mozilla-Inbound
    • Mozilla-Central
    • Mozilla-Aurora
    • Mozilla-Beta
  • All nightlies produced for platforms that we ship with PGO enabled (Linux, Windows) will have PGO on for the nightly build
  • Platforms that we ship with PGO enabled will have their PGO talos results report to the current graphserver branch.  This includes nightlies and the new 4-hourly builds
  • Platforms that we ship with PGO enabled will have their non-PGO talos results reported to a ‘-Non-PGO’ suffixed branch, e.g. Firefox-Non-PGO
  • Platforms that we do not ship with PGO enabled will report both per-push builds and nightlies  to the current graphserver branch
  • TBPL was modified in bug 670037 to make PGO builds special.  The deployment of these changes is being tracked in bug 691550.  If this TBPL change isn’t deployed before we start generating PGO builds, you might see duplicate build and test entries
  • Yes, we plan to teach try chooser how to optionally do PGO. For now, please include ‘mk_add_options MOZ_PGO=1’ in your PGO platform’s mozconfig-extra-$platform file if you wish to have PGO enabled for try. Your results will be on the Try branch in graphserver. This work is being tracked in bug 691673
  • Yes, we plan to optimize scheduling so that we only do a build if there has been a push in the previous four hours.  This might allow us to add PGO builds on more branches and is tracked in bug 691675
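
Two of the rules in the list above are simple enough to write down as code. The Python sketch below is illustrative only: the function names and signatures are my own assumptions and this is not the scheduler or graphserver code.

```python
from datetime import datetime, timedelta

PGO_INTERVAL = timedelta(hours=4)


def should_build_pgo(last_push, now, interval=PGO_INTERVAL):
    """True if a push landed within the interval that ends at `now`."""
    if last_push is None:
        return False
    return now - interval < last_push <= now


def talos_branch(branch, ships_pgo, is_pgo_build):
    """Name of the graphserver branch that a talos result reports to."""
    if ships_pgo and not is_pgo_build:
        # Non-PGO results from a PGO platform go to the suffixed branch.
        return branch + "-Non-PGO"
    return branch
```

The first function is the proposed scheduling optimization: skip the four-hourly PGO build when nothing was pushed in that window. The second mirrors the reporting rules for platforms that do and do not ship with PGO enabled.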

The motivation for this project is to get results to developers quicker.  It is felt that this reduction in PGO coverage is a safe optimization because there have been very few PGO related bugs found so far.

If you have any concerns, please contact me.  I am jhford in #build on irc.mozilla.org

[1] Well, all active development branches.  We are leaving PGO on for Win32 on branches older than Firefox 5, like mozilla-1.9.2