A *useful* IRC channel for discussions about Release Engineering and its systems

At some point, I am sure every code contributor has had a question about Release Engineering or the systems we run. The best place for those questions should be #build. Instead, the discussions end up happening in various other channels because of the barrage of alerts from nagios. To put some approximate numbers on it: in 5 months, we had 69,000 messages in #build. Of these, 42,000 (roughly 60%) were messages from or to nagios. This makes it very difficult to have a conversation in #build.

To fix this, we have created #buildduty. This channel is specifically for nagios alerts and buildduty queries. The work to switch nagios to point to #buildduty is being tracked by bug 700817. Once this is done, #build should become a useful collaboration point for all things Release Engineering.

Mac OS X 10.7 Lion testing machines now online

As of this morning, we have tests running on Mac OS X 10.7. These machines use the same hardware spec as our new Rev4 Mac OS X 10.6 machines. There are some consistent oranges as mentioned here. Sadly, the list of failing tests was moderated and never acted on. The list can be found here.

Until bug 700429 is resolved, all of these tests are hidden on all trees. Once that bug is resolved, we will begin to show the green tests. In the meantime, please feel free to check out how things look using the noignore and jobname parameters on TBPL. I have been using “https://tbpl.mozilla.org/?noignore=1&jobname=lion”.
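
If you want to build links like this for other job names, here is a minimal sketch. The helper name tbpl_url is made up for illustration; only the noignore and jobname parameters and the base URL come from the link above.

```python
from urllib.parse import urlencode

TBPL_BASE = "https://tbpl.mozilla.org/"


def tbpl_url(jobname=None, noignore=False):
    """Hypothetical helper: build a TBPL link from the two query parameters above."""
    params = []
    if noignore:
        # noignore=1 asks TBPL to show jobs that are currently hidden.
        params.append(("noignore", "1"))
    if jobname:
        # jobname filters the displayed jobs by name, e.g. "lion".
        params.append(("jobname", jobname))
    query = urlencode(params)
    return TBPL_BASE + ("?" + query if query else "")


print(tbpl_url(jobname="lion", noignore=True))
```

With jobname="lion" and noignore=True, this prints the same URL quoted above.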

(Screenshot: results from a standard Lion push)

Screenshots on OS X timeouts

As of the mozilla-inbound merge this morning, any time automation.py-based tests time out or crash on OS X, a screenshot will be base64 encoded and dumped into the test log. We’ve had this support on Linux for a while, and I have matched the output format. In case you aren’t familiar with this, your logs will print out something that looks like:

8217 ERROR TEST-UNEXPECTED-FAIL | /tests/toolkit/content/tests/widgets/test_videocontrols.html | Test timed out.
args: ['/usr/sbin/screencapture', '-C', '-x', '-t', 'png', '/var/folders/Hs/HsDn6a9SG8idoIya6p9mtE+++TI/-Tmp-/mozilla-test-fail_k9Dpdz']
SCREENSHOT: data:image/png;base64,iVBORw0.....

If you want to see this image, copy everything from ‘data:image’, inclusive, to the end of line and paste it into your browser’s awesome bar.
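
If you would rather save the image to a file than paste it into a browser, something along these lines should work. This is a sketch, not part of automation.py: the function names are made up, and it assumes the log contains the "SCREENSHOT: data:image/png;base64," prefix exactly as shown above, with the whole payload on one line.

```python
import base64
import re

# Assumed format, taken from the sample log line above:
#   SCREENSHOT: data:image/png;base64,<payload>
SCREENSHOT_RE = re.compile(r"SCREENSHOT: data:image/png;base64,([A-Za-z0-9+/=]+)")


def extract_screenshots(log_lines):
    """Return the decoded PNG bytes for every SCREENSHOT line found."""
    images = []
    for line in log_lines:
        match = SCREENSHOT_RE.search(line)
        if match:
            images.append(base64.b64decode(match.group(1)))
    return images


def save_screenshots(log_path, prefix="screenshot"):
    """Hypothetical helper: write each screenshot to <prefix>-<n>.png."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        images = extract_screenshots(log)
    paths = []
    for index, data in enumerate(images):
        path = "%s-%d.png" % (prefix, index)
        with open(path, "wb") as out:
            out.write(data)
        paths.append(path)
    return paths
```

Calling save_screenshots on a downloaded test log writes one PNG per screenshot line and returns the file names it created.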

In case you want to see what this looks like in the wild, here is a sample log with a screenshot.

I am working on getting this enabled on Windows as well. My automation.py.in changes should easily support the win32 screenshot utility written by Ted in bug 414049.

Disabling PGO for the majority of Firefox builds

This project has been discussed in a dev.planning thread, with the work being tracked in bug 658313. We are going to be turning off PGO for incremental builds on all branches[1] on Wednesday, October 5, 2011. On the exceedingly rare chance that you don’t read every single bug comment and dev.planning thread post, here is what will change:

  • Builds triggered as part of a push will not have PGO enabled on any platform
  • We will be producing builds every four hours with PGO enabled on Windows and Linux for the following branches:
    • Mozilla-Inbound
    • Mozilla-Central
    • Mozilla-Aurora
    • Mozilla-Beta
  • All nightlies for platforms that we ship with PGO enabled (Linux, Windows) will be built with PGO on
  • Platforms that we ship with PGO enabled will have their PGO talos results report to the current graphserver branch.  This includes nightlies and the new 4-hourly builds
  • Platforms that we ship with PGO enabled will have their non-PGO talos results report to a ‘-Non-PGO’ suffixed branch, e.g. Firefox-Non-PGO
  • Platforms that we do not ship with PGO enabled will report both per-push builds and nightlies  to the current graphserver branch
  • TBPL was modified in bug 670037 to make PGO builds special.  The deployment of these changes is being tracked in bug 691550.  If this TBPL change isn’t deployed before we start generating PGO builds, you might see duplicate build and test entries
  • Yes, we plan to teach try chooser how to optionally do PGO. For now, please include ‘mk_add_options MOZ_PGO=1’ in your PGO platform’s mozconfig-extra-$platform file if you wish to have PGO enabled for try. Your results will be on the Try branch in graphserver. This work is being tracked in bug 691673
  • Yes, we plan to optimize scheduling so that we only do a build if there has been a push in the previous four hours.  This might allow us to add PGO builds on more branches and is tracked in bug 691675
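
To make the graphserver reporting rules in the list above concrete, here is a small sketch. It is only an illustration of the rules as described, not the actual buildbot or graphserver code; the function and argument names are made up.

```python
NON_PGO_SUFFIX = "-Non-PGO"


def graph_branch(branch, platform_ships_pgo, build_is_pgo):
    """Return the graphserver branch that a talos result should report to.

    branch             -- current graphserver branch name, e.g. "Firefox"
    platform_ships_pgo -- True for platforms we ship with PGO (Linux, Windows)
    build_is_pgo       -- True for nightlies and the 4-hourly PGO builds
    """
    if platform_ships_pgo and not build_is_pgo:
        # Per-push (non-PGO) builds on PGO platforms go to the suffixed branch.
        return branch + NON_PGO_SUFFIX
    # PGO builds on PGO platforms, and every build on platforms we do not
    # ship with PGO, keep reporting to the current branch.
    return branch


print(graph_branch("Firefox", platform_ships_pgo=True, build_is_pgo=False))
print(graph_branch("Firefox", platform_ships_pgo=True, build_is_pgo=True))
print(graph_branch("Firefox", platform_ships_pgo=False, build_is_pgo=False))
```

The first call prints Firefox-Non-PGO and the other two print Firefox. In other words, the existing branch keeps tracking what we ship, and the per-push numbers on PGO platforms move to the new suffixed branch.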

The motivation for this project is to get results to developers more quickly. We believe this reduction in PGO coverage is a safe optimization because very few PGO-related bugs have been found so far.

If you have any concerns, please contact me. I am jhford in #build on irc.mozilla.org.

[1] Well, all active development branches.  We are leaving PGO on for Win32 on branches older than Firefox 5, like mozilla-1.9.2

New hardware for testing Mac OS X 10.6

A while ago, we decided that we needed to expand the pool of machines we test Firefox on. We are just about done getting the new 10.6 machines ready for production! The machines we will be enabling are MacMini4,1 spec. Internally, we are calling them ‘Rev4’ minis.

We won’t have all 80 of these minis ready for production right away, so we’ll start with ten. Once we have 80 Rev4 minis in production and every test suite is running reliably green on them, we will turn off the Rev3 10.6 minis so they can be moved to other pools. If there are suites that are orange, we will hide those results while we look into the failures.

To differentiate the talos results from these two machine specs, we are creating a new platform on the graphserver. When looking for data on graphserver, the platform drop-down will have two Mac OS X 10.6 options:

  • MacOSX 10.6.2 (rev3)
  • MacOSX 10.6 (rev4)

You should treat the ‘MacOSX 10.6.2 (rev3)’ results as authoritative for the time being. If you find any issues with the Rev4 machines that you feel should block us from turning off the Rev3 10.6 machines, please file a bug and have it block bug 683734.

Results should start showing up on TBPL some time after the changes are deployed on Wednesday, October 5, 2011.