Mozharness is moving into the forest

Since its beginnings, Mozharness has been living in its own world (repo). That's about to change. Next quarter we are going to be moving it in-tree.

what's Mozharness?

it's a configuration-driven script harness
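To make "configuration-driven" a little more concrete, here is a toy sketch of the idea: the script defines the actions it knows how to run, and the config decides which of them actually run and with what settings. This is not Mozharness's real API; `ToyScript`, its action names, and the config keys are all hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Dict, List


class ToyScript:
    """Toy harness (hypothetical): the config, not the code, picks the actions."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config
        self.log: List[str] = []
        # Every action this script knows about, in execution order.
        self.all_actions: Dict[str, Callable[[], None]] = {
            "clobber": self.clobber,
            "build": self.build,
            "run-tests": self.run_tests,
        }

    def clobber(self) -> None:
        self.log.append("clobbering " + self.config["work_dir"])

    def build(self) -> None:
        self.log.append("building in " + self.config["work_dir"])

    def run_tests(self) -> None:
        self.log.append("testing in " + self.config["work_dir"])

    def run(self) -> List[str]:
        # If the config lists no actions, run all of them.
        enabled = self.config.get("actions", list(self.all_actions))
        for name, action in self.all_actions.items():
            if name in enabled:
                action()
        return self.log


if __name__ == "__main__":
    # Same script, different behaviour, purely by changing the config.
    config = {"work_dir": "build", "actions": ["build", "run-tests"]}
    for line in ToyScript(config).run():
        print(line)
```

Swapping in a different config dict (say, one that only lists `clobber`) changes what the job does without touching the script itself.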

why in tree?

  1. First and foremost: transparency.
    • There is an overarching goal to give developers the keys to manage and stand up their own builds & tests (AKA self-serve). Having the automation step logic side by side with the compile and test step logic gives developers transparency and a sense of determinism, which leads to reason number 2.
  2. deterministic builds & tests
    • This is somewhat already in place thanks to Armen's work on pinning specific Mozharness revisions to in-tree revisions. However, the pins can fall behind the latest Mozharness revision, so we often end up landing multiple Mozharness changes at once against one in-tree revision.
  3. Mozharness automated build & test jobs are not just managed by Buildbot anymore. Taskcluster is starting to take the weight off Buildbot's hands and, because of its own behaviour, Mozharness is better suited in-tree.
  4. ateam is going to put effort this quarter into unifying how we run tests locally vs in automation. Having Mozharness in-tree should make this easier.

this sounds great. why wouldn't we want to do this?

There are downsides. It arguably puts extra strain on Release Engineering for managing infra health. Though issues will be more isolated, it becomes trickier to keep a high-level view of when and where Mozharness changes land.

In addition, there is going to be more friction for deployments. This is because a number of our Mozharness scripts are not directly related to continuous integration jobs: e.g. releases, vcs-sync, b2g bumper, and merge tasks.

why wasn't this done yester-year?

Mozharness now handles > 90% of our build and test jobs. Its internal components (config, script, and log logic) are starting to mature. However, this wasn't always the case.

When it was being developed and its uses were unknown, it made sense to develop it on the side and tie it closely to Buildbot deployments.

okay. I'm sold. can we just simply hg add mozharness?

Integrating Mozharness in-tree comes with a few challenges:

  1. chicken and egg issue

    • currently, for build jobs, Mozharness is in charge of managing version control of the tree itself. How can Mozharness check out a repo if it itself lives within that repo?
  2. test jobs don't require the src tree

    • test jobs only need a binary and a It doesn't make sense to keep a copy of our branches on each machine that runs tests. In line with that, putting mozharness inside also leads us back to a similar 'chicken and egg' issue.
  3. which branch and revisions do our release engineering scripts use?

  4. how do we handle releases?

  5. how do we not cause extra load on hg.m.o?

  6. what about integrating into Buildbot without interruption?

it's easy!

This shouldn't be too hard to solve. Here is a basic outline of my plan of action and road map for this goal:

  • land copy of mozharness on a project branch
  • add an end point on relengapi with the following logic
    1. endpoint will contain 'mozharness' and a '$REVISION'
    2. look in s3 for equivalent mozharness archive
    3. if not present: download a sub repo dir archive from hg.m.o, run tests, and push that archive to s3
    4. finally, return the url to the s3 archive
  • integrate the endpoint into buildbot
    • call endpoint before scheduling jobs
    • add builder step: download and unpack the archive on the slave
  • for machines that run mozharness based releng scripts
    • add manifest that points to 'known good s3 archive'
    • fix deploy model to listen for manifest changes and download/unpack Mozharness in a similar manner to builds+tests
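The endpoint logic in the outline above is essentially a get-or-create cache keyed on revision. Here is a minimal sketch of that flow, under loud assumptions: the function names, the key layout, and the URL prefix are all hypothetical, the "s3 bucket" is a plain dict, and the hg.m.o download and the test run are injected callables rather than real network calls. It is not relengapi code.

```python
from typing import Callable, Dict

# Hypothetical URL prefix for the archive store; not a real bucket.
URL_PREFIX = "https://archives.example.invalid/"


def archive_key(revision: str) -> str:
    """Storage key for the Mozharness archive of one in-tree revision."""
    return "mozharness/{}.tar.gz".format(revision)


def get_archive_url(
    revision: str,
    store: Dict[str, bytes],
    fetch_archive: Callable[[str], bytes],
    run_tests: Callable[[bytes], bool],
) -> str:
    """Return the archive URL for `revision`, creating the archive on first use.

    store         -- stand-in for the s3 bucket (key -> archive bytes)
    fetch_archive -- stand-in for downloading the sub repo dir archive from hg.m.o
    run_tests     -- stand-in for the test run; True means the archive is good
    """
    key = archive_key(revision)
    if key not in store:
        # Not in the store yet: download, test, and only then publish.
        archive = fetch_archive(revision)
        if not run_tests(archive):
            raise RuntimeError("tests failed for revision " + revision)
        store[key] = archive
    # Present (or just added): hand back the URL either way.
    return URL_PREFIX + key


if __name__ == "__main__":
    bucket: Dict[str, bytes] = {}
    fake_fetch = lambda rev: b"archive for " + rev.encode()
    always_pass = lambda archive: True
    print(get_archive_url("abc123", bucket, fake_fetch, always_pass))
```

In this sketch the second request for the same revision never calls `fetch_archive`, so the upstream repo is contacted at most once per revision, and an archive whose tests fail is never published.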

This is a loose outline of the integration strategy. What I like about this:

  1. no code change required within Mozharness' code
  2. there is very little code change within Buildbot
  3. allows Taskcluster to use Mozharness in whatever way it likes
  4. no chicken and egg problem as (in Buildbot world), Mozharness will exist before the tree exists on the slave
  5. no need to manage multiple repos and keep them in sync

I'm sure I am not taking into account many edge cases and I look forward to hitting those edges head on as I start this in Q2. Stay tuned for further developments.

One day, I'd like to see Mozharness (at least its internal parts) be made into isolated python packages installable by pip. However, that's another problem for another day.

Questions? Concerns? Ideas? Please comment here or in the tracking bug