Comeback post: using Chef search with chef-zero and multiple Vagrant machines

I haven’t posted in quite a while and after a couple of requests, I decided to make a comeback and continue sharing some knowledge here.

Over recent months, I've run into Chef cookbooks that use the search feature, such as the nrpe cookbook. Developing and testing a wrapper cookbook locally works fine… up to a certain point.

When using chef-zero provisioning in Vagrant, an in-memory Chef server is launched on the Vagrant instance. This becomes a problem when you want a multi-host Vagrant setup (for example a client-server situation, as in the case of the nrpe cookbook), because each host would have its own in-memory server.

Because of this, some people have been forced into fragile solutions, like standing up a secondary Chef server for testing or, even more dangerous, simply not testing the cookbooks locally.

There is no out-of-the-box support for this in Vagrant, but it can be achieved by cleverly using a couple of features. I’ll describe the key elements in the post, but feel free to contact me if you feel I missed something or if you have questions.

The first thing we need is the Vagrantfile:

Vagrant.configure("2") do |config|

  config.vm.define "chef" do |chef|
    chef.vm.provision "bootstrap", type: "shell", path: "", privileged: false
    chef.vm.provision "update", type: "shell", path: "", privileged: false
    chef.vm.network "private_network", ip: ""
    chef.vm.hostname = ""
  end

  config.vm.define "server" do |server|
    server.vm.network "private_network", ip: ""
    server.vm.provision "chef_client" do |chef|
      chef.chef_server_url = ""
      chef.validation_key_path = "dummy.pem"
      chef.validation_client_name = "dummy"
      chef.node_name = "nagios-server"
      chef.run_list = "role[nagios-server]"
      chef.environment = "dev"
      chef.delete_client = true
      chef.delete_node = true
    end
  end

  config.vm.define "client" do |client|
    client.vm.network "private_network", ip: ""
    client.vm.provision "chef_client" do |chef|
      chef.chef_server_url = ""
      chef.validation_key_path = "dummy.pem"
      chef.validation_client_name = "dummy"
      chef.node_name = "nagios-client"
      chef.run_list = "role[nagios-client]"
      chef.environment = "dev"
      chef.delete_client = true
      chef.delete_node = true
    end
  end
end

If you notice, we're not really using chef-zero on the client or server machines; rather, we're standing up a single chef-zero server in one Vagrant host that the other hosts talk to as if it were a full-fledged Chef server.

Here are the contents of the bootstrap script:


curl -sSL | gpg --import - 2>&1
curl -sSL | bash -s stable --ruby=2.2.1 2>&1
source /home/vagrant/.rvm/scripts/rvm
gem install librarian-chef

chef-zero --host --daemon
cd /vagrant
librarian-chef config tmp /tmp --global
librarian-chef install
knife upload /. --chef-repo-path /vagrant/ --server-url --key /vagrant/dummy.pem --user dummy

I’m using librarian-chef by choice but you can use something else like Berkshelf.

The last line of this script is the key. It’s using knife as a client to upload the cookbook to the server (including /cookbooks, data_bags and roles). Because chef-zero uses dummy authentication, we need to provide a dummy key.
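Since chef-zero doesn't actually verify it, the dummy key just needs to be a well-formed RSA private key that knife can sign requests with. One way to generate it, assuming openssl is available (the filename matches the Vagrantfile above):

```shell
# Generate a throwaway RSA key for chef-zero's dummy authentication.
# chef-zero never validates it, but knife needs a parseable key to
# sign its requests with. Create it next to the Vagrantfile.
openssl genrsa -out dummy.pem 2048
```

Any valid RSA key will do; committing this dummy key to the repo is harmless because it never protects anything.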

The second Vagrant provisioner on the chef server machine is the "update" script, used to update the server if there were any changes to the cookbook. Here are its contents:


cd /vagrant
knife upload /data_bags --chef-repo-path /vagrant/ --server-url --key /vagrant/dummy.pem --user dummy 2>&1
knife upload /roles --chef-repo-path /vagrant/ --server-url --key /vagrant/dummy.pem --user dummy 2>&1
knife upload /environments --chef-repo-path /vagrant/ --server-url --key /vagrant/dummy.pem --user dummy 2>&1
knife cookbook upload my-cookbook --cookbook-path /vagrant/cookbooks --server-url --key /vagrant/dummy.pem --user dummy 2>&1

Needless to say, you have to spin up and provision the machine running chef-zero (the "chef" box) first, before your server or client boxes.

Last but not least, up until recently, Vagrant would make the call to delete the chef node and client from within the host (rather than within the guest), so as of this writing, it’s likely that you’ll need a knife.rb file with the following contents alongside your Vagrantfile:

node_name                'dummy'
validation_client_name   'dummy'
validation_key           'dummy.pem'
chef_server_url          ''

But that’s about it! Now you have two vagrant machines who think they’re being provisioned by a real chef server when they’re actually being provisioned by a tiny chef-zero server.

A couple of things to keep in mind:

  • Depending on your version of Vagrant, you might get a conflict if you also use knife commands against your live (production) Chef server. If that's the case, just make sure you use the --config switch to point knife to your default configuration file (most likely under ~/.chef/knife.rb).
  • If you make any changes to the cookbook that need to be re-tested, make sure you update the chef server and re-provision the necessary boxes. The process varies depending on your cookbook management system (librarian-chef in this case), but it helps to have scripts you can quickly call from the command line.
  • In my experience, Vagrant sometimes didn't quite work correctly and would leave a zombie process behind after the initial provisioning, forcing me to kill it and start over. I never diagnosed the cause, but it didn't happen frequently.
  • Multi-VM vagrant setups can quickly consume your host resources so make sure you configure your VMs correctly so you don’t run out of memory!

Featured in national non-profit organization

This is the lowest-tech post I have ever done, but wanted to share it anyway.

A few days ago, a national non-profit organization I support featured me in their blog.

The interesting part is that this organization is the non-profit supported by important names in Silicon Valley like Mark Zuckerberg, Eric Schmidt and Bill Gates, among others (you can see the list of supporters here). This group is looking to help pass the immigration reform being discussed in the House of Representatives.

Here’s the link to the blog post

Feel free to contact me if you’re interested in knowing a little more about what you can do to support us.

Multi-branch builds in Jenkins

One of the coolest features of Atlassian Bamboo is its ability to automatically detect multiple branches of a repository and run a plan against each of them.


Unfortunately, Jenkins cannot do that, but with the help of a couple of plugins and some duct tape we can emulate the same behavior. Disclaimer: I don't know if this works for Hudson because I haven't tested it, but it's simple enough that I don't see why it shouldn't.

I spent a few days trying to figure out how to do this without modifying the source code of the existing plugins or writing my own, and I was able to do it. I also think my solution is generic enough that it will fit your needs without major issues (other than the cons that I’ll discuss towards the end of the post).

This tutorial (if you can call it that) assumes you have a few plugins that you probably already have if your Jenkins plans need such a level of sophistication that you’re reading this. If you find that something that is explained here is not provided by Jenkins out of the box, a simple plugin search should give you the answer.

Let me explain one of the main problems: the Mercurial plugin makes an effort to minimize its disk usage footprint, so it only attempts to clone or pull the minimum changeset graph that leads to the specified changeset. Because of this, when using the Poll SCM feature, it won't detect (or pull) changes in any branch other than the one specified in the SCM section of the job.

The way around this is to use the Repository Sharing feature.

[Screenshot: Screen Shot 2013-06-28 at 3.38.21 PM]


If you enable that option, Jenkins stores a local cache of every HG repository so fewer requests are made to the remote server. You don't need to know how the feature is implemented; for now it's enough to know that the entire repository is cached there, not just the branch you configured.

Ok, we’ve jumped one hurdle. Next step (assuming you have configured your job to poll the repo), how do you make Jenkins poll for all changes if it’s only polling one branch? Well, it’s not really just polling one branch. I know the SCM configuration has a field Branch, but it should really be ‘Revision’, because you can enter any arbitrary thing that would identify one changeset: branch name, SHA1, or tag name, including “tip” that in this case, will be your friend.

You know how the tag called “tip” is automatically moved to the latest changeset? If we have Jenkins poll for this tag, it will detect if any changesets have been added to the repository.

But we’re not out of the woods yet. Yes, Jenkins will detect changes but the Mercurial plugin will only pull the changes from whatever branch the “tip” tag would happen to be. This is where the gluing begins.

First of all, we need two jobs for each repository that you want to trigger. I called them <repository> and <repository>-all-branches. The <repository> job does the actual build/test/deployment, etc. that you want to do; <repository>-all-branches does the polling and triggers the real job. Of course, substitute <repository> with your repository name. I'll get into more detail further down. As I was writing the last lines of this post, I realized it may be possible to do this with a single job, but I haven't tried it; I'll probably write another post if I can make that work.

The triggering job will do the following steps:

  • Poll the repository for changes.
  • Figure out the branches that have changes and require a build.
  • Use the Jenkins REST API to call the real job.

Here’s another problem, Jenkins attempts to retrieve the incoming changes before you do anything else. So, the first plugin we need is pre-scm-buildstep, this allows us to execute build steps at the pre-scm phase. It is at this point that we execute this script, I’m deliberately not putting it here, to save space, but if you don’t want to read it, just know that it creates a file with the branches with incoming changes that require a build.

So, now we have that file, and we let Mercurial do its thing. The checkout itself is irrelevant at this point: we don't need the actual code, we just use the repository to detect changes and to query for branches with changes.

After this, we use this other script, this time as a main build phase step, to iterate over the branches captured in the pre-scm step and trigger an execution of the main job for each one of these branches. There are a few things the main job needs to have before we can do this. Think of it as an interface that the job has to implement (if you're an OO programmer).

First, it has to at least take the branch name as a parameter.

[Screenshot: Screen Shot 2013-06-28 at 4.42.17 PM]


And plug it into the SCM configuration.

[Screenshot: Screen Shot 2013-06-28 at 4.44.16 PM]


And finally, allow it to be triggered remotely, adding a token that other jobs will use to invoke it.

[Screenshot: Screen Shot 2013-06-28 at 4.46.06 PM]


With this, your target job is ready to be invoked remotely. Now let's go back to the triggering job. One thing needed for authentication purposes is an API token for the user that will be triggering the build. Since this is done in the context of a shell script, there is no knowledge of the user executing the script, so credentials can't be inherited, hence the need for explicit authentication. The script referenced earlier (this one) runs a curl command around line 6 (maybe a different line if it has been modified since I wrote this post). It's a simple command with three parameters.

  • -F $json -> It passes the json file created just above that line and submits the request as a Form POST.
  • -u $user:$APIToken -> The authentication credentials for the $user. These are parameters to the script and the APIToken should be the string that can be retrieved from http://your.jenkins.server/user/youruser/configure, under the Show API Token button. The $user parameter should be the one that you’re retrieving the token for.
  • $JENKINS_URL/job/$targetJob/build?token=$targetJobToken -> $JENKINS_URL is an environment variable and doesn't need to be passed as a parameter; $targetJob and $targetJobToken are script parameters and should match the settings you configured in the main job earlier. Some documentation says that you should use /buildWithParameters instead of /build, but /build worked for me.
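Putting those parameters together, the trigger loop might look like the sketch below (hypothetical: the variable values and the branches file are stand-ins for the script parameters described above, and the curl command is written to a file instead of executed, so you can inspect what would be sent):

```shell
#!/bin/sh
# Hypothetical trigger loop: invoke the target job once per branch recorded
# by the pre-scm step. Dry run: each curl command is written out, not run.
user="jenkins-bot"                             # stand-in credentials
APIToken="secret-token"
targetJob="myrepo"
targetJobToken="myrepo-token"
JENKINS_URL="${JENKINS_URL:-http://jenkins.example.com}"

printf 'default\nfeature-x\n' > branches.txt   # stand-in for the real file
: > commands.txt

while read -r branch; do
  json="{\"parameter\": [{\"name\": \"BRANCH\", \"value\": \"$branch\"}]}"
  echo curl -F "json=$json" -u "$user:$APIToken" \
    "$JENKINS_URL/job/$targetJob/build?token=$targetJobToken" >> commands.txt
done < branches.txt

cat commands.txt   # drop the 'echo' above to actually trigger the builds
```

The BRANCH parameter name is an assumption; it should match whatever parameter name you defined in the target job.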

Also, the last step in that script is to pull the repository so the job has a baseline for the next detected change.

This pretty much does the trick, but you can take a few extra steps to smooth the edges and make it maintainable.

If you use the Managed Script Plugin, you can keep the scripts in one central location and invoke them as a build step (and also as a pre-build step, since you already have the pre-scm-buildstep plugin). This helps if you're automating a large number of jobs and need to change a script: you change it in one place and it works seamlessly across the board. Additionally, if you need some logic before executing the managed scripts, another feature of this plugin can put a copy of a managed script in the workspace and expose its name as a variable, so you can reference it easily.

You can also use the Mask Passwords Plugin to mask the API token value in the console log, preventing others from seeing it and authenticating as that user. The plugin also lets you store the masked password (hidden as *'s) as a global variable so it's available at run time.

Finally, because this job can potentially be executed often, I would check the option to discard old builds and limit it to a smaller number that you’re comfortable with.

One caveat of this is that you effectively have two clones of the same repo, in two different Jenkins workspaces, but if you’re not concerned with disk space, this shouldn’t be a showstopper. As I’m writing this, I realize a different approach that could reduce the complexity to one job, so this downside could be eliminated.

Also, note that the user you’re using to authenticate the curl call, should have permissions (either global or project-based) required to run the job.

One last note, though it should be obvious: the scripts are shell scripts, so they won't work in a Windows environment, although I don't see why someone couldn't come up with an equivalent batch script in five minutes.

Did you notice how we did this without writing a line of code outside of the 5-6 lines in the shell scripts? I think it beats the hell out of writing your own plugin to implement this.

Wow, this was a long post, the longest I've written so far, but I hope it helped. If you have questions, my contact information is on the left of your screen (if I haven't changed the layout), so feel free to send them my way.

Do personal projects, run into new problems

I had written a lot of code in my life, but I had never contributed to any open source projects. Because of that, I didn't have any code to show to colleagues or potential employers; everything was proprietary.

I realized then that I had in fact written a lot of little scripts here and there, and I could easily put them in a GitHub or Bitbucket repository.

Then I decided I could start playing with a new technology and put all the code, configuration, metadata, etc. in a GitHub repository. Even if it was not generic enough to be reused by somebody else without a good amount of refactoring or reconfiguration, it would at the very least serve as documentation of my own work.

So I decided to work with Amazon Web Services and Puppet, two technologies that I had worked close to, but never hands-on.

So, in the process, I’ve found problems that I’d never encountered before. This is positive in two ways.

  1. Because it’s done on my own time, I’m not pressured to find just any hack to make it work. I can design a good, long-term solution.
  2. Whenever I work with these technologies in a professional project, I’ll be closer to the answer than I would’ve been if I had never used them.

So, go ahead and play with new toys!

Continuous Delivery

Jez Humble’s Continuous Delivery presentation from DevOps Days 2010. If you’re reading this blog you’ve probably seen this presentation (or some form of it) before, but some of us need a refresher from time to time.

Continuous Delivery.

Open-sourcing infrastructure

People contribute code to open source projects in many languages. Many of these projects are tools or modules that started out as internal development that became good enough (and generic enough) that it could be exposed as an open source project.

There are some tradeoffs to opening up your code to the world, such as getting free patches at the expense of having to spend time merging pull requests. But generally the number one benefit is ‘street cred’, both for the company and for the individual developers.

But the majority of open source projects are either the code for a specific product or some other executable piece of software (libraries, frameworks, plug-ins, etc.). But what about other parts of a stack that can also be codified? With the increased popularity of tools such as Chef and Puppet, a lot of companies manage their systems through a set of scripts using a DSL defined by the specific framework. This is what's known as infrastructure as code.

This is an exploratory thought, but what if we combine both ideas? Open up our Chef or Puppet implementations of common infrastructure. Enough organizations use a Node.js installation, an Apache server or a Mongo database to benefit from a generic implementation that can be fine-tuned or managed as the needs change. Yes, all these can be installed easily, but not necessarily in a consistent manner across the board. Versions or configurations can vary from one instance to the other.

GitHub already has many repositories with Puppet code, so this is not a new idea; it just needs a name. I would call it open-sourcing infrastructure.

Commented out code is dead code

“I’m uncommenting and fixing this code that I commented out a few days ago”, said no programmer ever.

Commenting out code is a nasty habit. I try to avoid it as much as I can; I try not to even use the syntax for comments. On numerous occasions, I've seen /*commented out code*/ that had been sitting there for months, judging from the source-control annotations.

What’s the number one reason for committing commented out code? “To keep it as reference in case we want to add it back in the future”. Ok, this is an argument I can rationalize and tolerate. But how about these two thoughts?

  1. If you’re adding back that logic/configuration within a few days, then it’s going to be fresh in your mind, no need to make your code your personal post-it; if after a couple of weeks, those lines are never added back, it probably means they’ll never be used.
  2. What is a VCS for if not keeping history? You can safely delete any lines you want and rest assured that you can recover them at any point in the future. In fact, I would take this a step further and delete all unused code, commented out or not. Don't keep anything you think you might use in the future; if you do need it later, you can dig it out.
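To make the "dig it out" point concrete, here's a self-contained sketch with git (a throwaway repository; file and function names are purely illustrative): delete a function outright, then recover it later with a pickaxe search through history.

```shell
# Build a throwaway repo, delete some code, then recover it from history.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "demo"

printf 'def helper():\n    return 42\n' > util.py
git add util.py && git commit -qm "add helper"

: > util.py                              # delete the code outright
git add util.py && git commit -qm "remove helper"

# Later: -S (pickaxe) finds commits that added or removed the string,
# newest first; grab the commit that introduced it and restore the file.
commit=$(git log -S helper --format=%H | tail -1)
git show "$commit:util.py" > recovered.py
grep -q 'return 42' recovered.py && echo "recovered"
```

If the history is intact, this prints "recovered": the deleted function came back without a single commented-out line ever sitting in the codebase.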

But the one reason that should never be acceptable for commenting out code is: "I'm commenting out these test cases so the build is green; I'll get back to them later." That defeats the purpose of automated testing and continuous integration.

If the code is dead, bury it; if your tests are broken, fix them.