
December 18, 2015

Day 18 - Deployments done the Delivery way

Written by: Christopher Webber (@cwebber)
Edited by: Ted Young (@jitterted)

This year has been all about Delivery for me, starting with getting https://www.chef.io deployed on stage at ChefConf using Delivery. It has been a blast moving services into Delivery and using Delivery to build new ones.

What Even is Delivery?

In the simplest of terms, Delivery is a tool for enabling continuous delivery. It has been shaped by many years of experience working with folks all across the industry in building their pipelines. For me, it is an opinionated build and deployment pipeline. Explaining why things are the way they are is a bit outside of the scope of this post. What follows is a brief overview of the way I view the world.

Phases and Stages

Delivery is made up of a set of stages: Verify, Build, Acceptance, Union, Rehearsal, Delivered. There are manual approval steps between the Verify and Build stages, and the Acceptance and Union stages. Each stage is made up of a series of phases where actual tasks are executed.

Below is a list of the stages, and the phases that they execute.

  • Verify: Before another developer reviews the code, verify it is worthy of being viewed by a human.
    • Unit: Unit test the code that you are deploying. For a cookbook, this is probably ChefSpec, for a Rails app, it might be RSpec or minitest.
    • Lint: This is a test of whether you are properly formatting your code and following best practices. For Ruby apps, you probably will want to run Rubocop.
    • Syntax: Is it parse-able? Just like we do a configtest before restarting nginx or apache, it is useful to do the same with our code.
  • Build: Now that code review is done and we have merged to master, let's build some artifacts (cookbooks, packages, images, etc.)
    • Unit: Same as before, but now on an integrated codebase (we merge the code to master during the manual approval step between Verify and Build).
    • Lint: Same as before, but now on an integrated codebase.
    • Syntax: Same as before, but now on an integrated codebase.
    • Quality: This is where you might fail a build if it doesn't have the right amount of code coverage, etc. You are looking to test that the code meets a quality standard of some sort.
    • Security: Test the code for security. In Rails, running Brakeman along with bundler-audit is a great place to start.
    • Build: Produce artifacts that we can promote through the process. This may be a cookbook, a software package, or even a system image.
  • Acceptance: Setup the artifact(s) in an environment where we can verify that they are ready to go to production. We have a manual step after this to give someone a chance to poke around and make sure things work well.
    • Provision: What this means for your environment may vary, but I usually use it to stand up the instances I am going to deploy onto and any other supporting pieces, such as ELBs, RDS Instances, Elasticache Instances, etc.
    • Deploy: In most cases, this is simply a matter of running the cookbook associated with the service.
    • Smoke: Does it work? For most web services, it is as simple as making sure you get a 200 OK from a healthcheck endpoint to prove, yup, it started. These tests should be super lightweight to provide fast feedback in case it fails, so we don't waste time doing Functional tests.
    • Functional: This is where we ensure it functions correctly. Whether that is testing a bunch of endpoints, running selenium scripts, or pointing metasploit at the instances, you want to validate that the system is functional.
  • Union: Do the upstream services still work? After we do a pass on the service we are deploying, we go and re-run the phases in the union stage for the projects that have declared a dependency on this project.
    • All phases are the same as in Acceptance.
  • Rehearsal: Ensure that we can do the deploy one more time cleanly.
    • All phases are the same as in Acceptance.
  • Delivered: Actually build out the "production" service.
    • All phases are the same as in Acceptance.

As you may have noticed, most of the phases are executed in more than one stage, allowing us to ensure that the state of the world is good. For example, all of the phases that run in Verify also run in Build to validate that things are still good. And in Acceptance, Union, Rehearsal, and Delivered, each stage runs the same set of phases to build each environment the same way.

Ship it!

So what does this actually look like? For most services, I see it break out into three pieces:

  1. The application: This is the actual service you are going to run. It is usually the base of the repo.
  2. The deploy cookbook: A cookbook that lives under cookbooks/ and is run on the node on which the service is running.
  3. The build cookbook: A cookbook that lives at .delivery/build-cookbook that handles all of the moving parts that make the service go.

Most of us are familiar with the first two. The application is the actual thing. If you are a Ruby shop, it is probably a Rails or Sinatra app. If you are a Java shop, it might be a Spring app. Whatever it is, it is the actual thing you are deploying. The deploy cookbook is the configuration management code that makes the node do the thing. If you are deploying a Rails app, it probably sets up nginx, adds some users, and spins the application up using Unicorn.
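
To make the deploy cookbook idea concrete, here is a rough sketch of the kind of recipe it might contain for a Rails app; the recipe, template, and service names are hypothetical and not taken from any of the services mentioned here:

# Hypothetical deploy recipe: nginx in front of a Unicorn-backed Rails app.
include_recipe 'nginx'   # assumes the community nginx cookbook, which declares service[nginx]

user 'deploy' do
  home '/home/deploy'
  shell '/bin/bash'
end

template '/etc/nginx/sites-enabled/myapp.conf' do
  source 'myapp-nginx.conf.erb'
  mode '0644'
  notifies :reload, 'service[nginx]'
end

# Unicorn is assumed to be wrapped in an init script named myapp-unicorn.
service 'myapp-unicorn' do
  action [:enable, :start]
end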

The Build Cookbook

I want to focus on the third item for a bit. The build cookbook is what Delivery, using the delivery-cli, actually runs. Each phase is represented by a recipe. So there are unit, lint, syntax, provision, etc., recipes in this cookbook. There is also a recipe called default, which is run as root, and runs at the beginning of each phase. Once that finishes, the actual phase recipe is executed with non-root privileges. The build cookbook provides the directed orchestration I have always wanted: I can stand up a database, run the data import, and then, only if that succeeds, start up the app instances. In the case of omnitruck, we make sure that the instances have everything they need, such as a load balancer and CDN service, before we deploy the code.
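
To make that concrete, here is a rough sketch of what a couple of phase recipes in a build cookbook might look like. It is illustrative only; the workspace attribute and healthcheck URL are assumptions, not code from the omnitruck build cookbook:

# Hypothetical .delivery/build-cookbook/recipes/lint.rb
execute 'run rubocop' do
  command 'rubocop .'
  cwd node['delivery']['workspace']['repo'] # workspace path assumed to be set by delivery-cli
end

# Hypothetical .delivery/build-cookbook/recipes/smoke.rb
ruby_block 'healthcheck' do
  block do
    require 'net/http'
    response = Net::HTTP.get_response(URI('http://acceptance.example.com/_status'))
    raise 'smoke test failed' unless response.code == '200'
  end
end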

The Shared Repo

Since all three pieces, the application, and build and deploy cookbooks, are all in one repo, changes to any aspect of the application can easily be found. The coolest thing is that we now tie all changes to the service to a single commit history. This means, if we make a change to the app and a corresponding config change is needed in nginx, we see it all together. It also means that all changes to the app are tracked in one place. Whether we are tracking that a new route was added to the app or that a new header was added via the load balancer, all of the changes are wrapped up neatly in a single log of commits.

Deploying Omnitruck

Chef runs a service called omnitruck that provides information about packages used by chef-client and other tools. The application follows the pattern I outlined above. You can visit the omnitruck repo and browse through the code. Here is a high level overview of what it looks like to ship omnitruck.

  1. The process starts with someone creating a change, automatically kicking off the verify stage. If it passes, we review the code and approve the change.
  2. From there it heads to build and acceptance. In the build stage, we get a set of deployable artifacts. For omnitruck, it is the deploy cookbook being published to the chef-server and the source code being neatly packaged into a tarball.
  3. The fun really begins in the acceptance stage where we start standing stuff up. We provision an ELB, a set of EC2 instances, some CDN config, and some DNS entries.
  4. Still in the acceptance stage, we next use the deploy cookbook, which comes from the chef-server, to deploy omnitruck to the EC2 instances. If the chef-client run completes successfully on the EC2 instances, we flush the cache on the CDN.
  5. Smoke and functional tests then run to ensure that we are good to go.
  6. In union, we do it all over again, except that we rerun the Union phase of each of the consumer projects, which are other projects that have declared a dependency on the service we are shipping. For example, while you can't see it in the omnitruck repo, there is a project called chef-web-ocfrontend which defines the nginx instances that support www.opscode.com. That service depends on omnitruck, so we verify that it didn't get broken in this process.
  7. From there, we move on to the rehearsal stage and the delivered stage which makes the project live.

Conclusion

As I watch Delivery mature, I am amazed at how awesome the workflow has become. While the product Delivery is closed source, the delivery-cli, which handles the actual running of code, is freely available for download.

December 1, 2015

Day 1 - Using Automation to build an OpenStack Cloud

Written by: JJ Asghar, @jjasghar

Edited by: Klynton Jessup, @klyntonj

I wrote this as a narrative around what I hope a typical engineer would experience trying to resolve an issue. This story, while fiction, is taken from personal experiences and inspired by what I hope would happen. I hope you enjoy this.

Like normal, I came to my stand-up unexcited for the normal grind. My boss Steve came in, sat down at the conference table, and put his notebook down. Yes, stand-up happened in a conference room and yes we actually sat at the conference room table, so, honestly, I never really understood why we called it a stand-up. I guess it was a “rebranding” of “Daily Status” or maybe it was a hold over from those days we tried to do “agile.” Who knows?

Anyway, Steve opened his notebook and looked around at my team. “So, we have a problem. The local development for our cookbooks is great but we need to start testing on multiple platforms. There’s a chance we might be spinning up a new application and it only runs on CentOS.” (We’re an Ubuntu shop.) There were some sighs and groans as we looked around at each other.

“Today y’all have your normal responsibilities but we need to think of a way to parallelize our cookbook development. I dunno if you’ve ever tried it but running kitchen test -p 2 on your laptops brings everything to a grinding halt. Let’s skip stand-up and spend the first half of the day doing some research and let’s try and come up with some ideas. We’ll get back together after lunch.” With that Steve closed his notebook, got up, and walked out of the room.

“Interesting,” I said under my breath. “I guess it’s time to start Googling.”

I walked back over to my laptop and typed in “parallelize cookbook development” in Google. Wow, there was nothing about test-kitchen on there, until I got to http://leopard.in.ua/2013/12/01/chef-and-tdd/ on the second page! It’s 2015, so this post from 2013 has to be out of date. Right? But didn’t Steve mention the -p in stand-up? I read through the page and found out that there was a kitchen test --parallel option, which must have been what he was talking about. Sweet. OK, he isn’t confused and just saying words again.

I hopped over to https://github.com/test-kitchen/ and noticed all the drivers for test-kitchen. There were a ton, ranging from AWS, to HyperV, to OpenStack, and Digital Ocean (DO). This is great, I can run test-kitchen on any cloud I want. I started to play around with the different options and settled on https://github.com/test-kitchen/kitchen-digitalocean as my test version. I’ve always enjoyed using Digital Ocean to do my development and it seemed reasonable. I opened my .kitchen.yml and looked at the configuration.

driver:
  name: vagrant
  network:
    - ["forwarded_port", {guest: 80, host: 8080, auto_correct: true}]
  customize:
    cpus: 4
    memory: 8096

I made the changes to the driver name and added in the Droplet size I wanted:

driver:
  name: digitalocean
driver_config:
  size: 8gb

And then installed the gem:

gem install kitchen-digitalocean
export DIGITALOCEAN_ACCESS_TOKEN="THIS_ISNT_MY_REAL_TOKEN"
export DIGITALOCEAN_SSH_KEY_IDS="8675, 30900"

Then I ran test-kitchen. Nice! It worked. This is great. I spun up both of my test boxes on Digital Ocean, one Ubuntu 14.04 and one CentOS 7 at the same time and verified them both. I got up and walked over to Steve’s office. “Hey Steve, I think I got an answer for you from stand-up,” I said.

“Oh yeah? That was fast,” he said looking up from his laptop. “What is it?”

“I hooked up test-kitchen to Digital Ocean and ran a test command on both systems at the same time,” I said, with confidence.

“Ha! That’s awesome. I had no idea you could use other drivers with test-kitchen, I thought it was all local development.”

“Yeah I learned that after looking at the GitHub test-kitchen org, turns out it was pretty easy to set up.”

“Hang on, Digital Ocean costs money though right?”

“Yeah, it’s only pennies though, and I charged my own personal account.”

“Ah well if it’s ‘only pennies’ I guess you won’t need to expense it.”

“Ouch, fine. I guess I deserve that. So you think it might be too expensive to run our test suite for every commit?”

“You read my mind. We need something like DO but local. How about OpenStack?”

“HAHAHAAHAHAAHAHAHAHAHAH” I actually started crying from the tears of laughter. “Yeah right, I don’t have enough time to set up an OpenStack cloud. OpenStack has only gotten worse since they started the project.”

Steve took a moment to let me compose myself, then said plainly, “You should give it a shot, I heard JJ, the OpenStack Chef guy, mention something called an ‘OpenStack-model-t.’ It’s a project Chef was working on to help people build basic OpenStack clouds. Any chance there’s a kitchen-openstack driver for it like there was for Digital Ocean?”

“Actually, yeah there is. Ah, OK I see where you’re going with this, I’ll report back what I find out about these two projects.” I turned around and walked out of his office.

I sat down at my laptop, opened up Chrome, and typed in: openstack-model-t. The first hit was: https://github.com/chef-partners/openstack-model-t. One of the first things that caught my eye was: “The Customer Can Have Any Color He Wants So Long As It’s Black”. Henry Ford. Funny, real funny Chef. I started looking around. It seemed that the cookbook was pretty straightforward. There was even a .kitchen.yml file in the repo, so I figured I’d check out the repo and give it a shot.

cd ~
mkdir openstack-stuff
cd openstack-stuff
git clone https://github.com/chef-partners/openstack-model-t.git
cd openstack-model-t
chef exec kitchen verify

My laptop fans started to spin and my MacBook Pro started to heat up; yep, I was building an OpenStack cloud on my laptop. After about 20 minutes, I came back to my laptop and I saw:

         Process "nova-scheduler"
           should be running
           user
             should eq "nova"
         Process "neutron-l3-agent"
           should be running
           user
             should eq "neutron"
         Process "neutron-dhcp-agent"
           should be running
           user
             should eq "neutron"
         Process "neutron-metadata-agent"
           should be running
           user
             should eq "neutron"
         Process "neutron-linuxbridge-agent"
           should be running
           user
             should eq "neutron"

       Finished in 1.52 seconds (files took 0.41367 seconds to load)
       67 examples, 0 failures

       Finished verifying <default-ubuntu-1404> (0m18.47s).
-----> Kitchen is finished. (16m24.13s)

Wow, a fully tested and verified All-in-One OpenStack cloud in one command! Let’s see if I can spin up a VM. Reading the README, it seems I have to run a script when I first ssh into the box to make sure I have an initial network and “CirrOS” image on the machine. OK, so be it.

chef exec kitchen login
sudo su -
bash build_neutron_networks.sh

Now the README tells me I can look at the URL in my web browser, with https://127.0.0.1:8443/horizon and I should see the OpenStack login. I opened up Chrome again and typed it in. Sweet, it worked. Using the demo username and password of mypass, I was able to login without a hitch. I clicked the Instance button on the left-hand side and then “Launch Instance”. Heavens to Betsy, it actually worked!

I looked around, I didn’t believe it. A successful OpenStack build out of the box? What kind of black magic was this? This is bonkers. I stood up from my laptop and took a walk. This opened up a huge opportunity for my team and I needed to clear my head.

As I walked around my office I saw one of our IT team members carrying a new laptop that he had provisioned for someone on another team. He was going the same direction I was, so I figured I’d ask, “Hey Billy, quick question what do you do with the old desktops that these laptops are going to replace?”

I guess I caught him off guard because he turned and looked at me with confused and concerned eyes, “Oh, hey, honestly, nothing. They have already depreciated in value so they just sit in the IT room until we get Goodwill to come ’round and we write them off as a donation. Nothing too exciting.”

“Interesting, any chance I can get…three of them, I’d like to try something out.”

“Sure, no problem, come ’round in a bit and I’ll get you what you need.”

“Awesome, see you then.” And I continued walking.

About twenty or thirty minutes later I walked up to the IT room and Billy was looking at Reddit. No real surprise there, 60% of IT work is done or is linked to from Reddit and yes you can quote me on that. “Hey Billy, I’m here, can you hook me up?”

“Sure, no problem, take what you need; just write it down on that clipboard.”

“Can’t I email it to you?”

“Meh, you could, but I’d prefer to see you write down what you take when you take it.”

“Fair enough,” I said, as I picked up one of the desktops. “Any chance you have a spare switch or two?”

“Probably, just be sure to put it down on that clipboard if you find one.”

“Cool, thanks again.”

“No problem”, he said as he turned back to looking at his laptop.

I pulled together what I needed and took it all back to my desk. The README for the model-t had a controller node that needed 3 NICs in it and the compute nodes only needed 2 if I wasn’t going to use a storage network. I thought about it for a few moments and realized that nope, I don’t need a storage network for running a test-kitchen OpenStack cloud, so 2 NICs would be fine.

I powered on the controller node; it had Windows 7 on it, which meant I had to re-image these machines. I downloaded Ubuntu 14.04, created a USB boot disk, and started the installation on the machines. I tried doing it in parallel but ended up confusing myself; so I did it serially. I rebooted the controller node and started the installation. I did a basic install, naming the machine Controller, original I know, and connected it to my lab network. I was able to ping 8.8.8.8 with only my management network plugged in, so I felt like I was making some progress. I repeated the process with the two Compute machines, naming them Compute1 and, you guessed it, Compute2, and then decided to break for lunch. Over lunch I talked to some of my coworkers about what I was doing and all-in-all there was a pretty positive response. I did get a couple giggles and shakes of their heads about OpenStack, but I expected that.

I walked back to my desk, looked at the beginning of what I was hoping to be a real OpenStack cloud and was pretty proud of my work. I remembered then that Steve wanted to have a sync up after lunch, so I headed over to the dreaded stand-up conference room. As I walked in Steve was just starting.

“So it seems we have a couple options on the table. Most of you figured out that you could run test-kitchen with different drivers giving you access to more compute resources. That’s good. Some of you didn’t come to me with anything so I have to assume you either got caught up with something else, or didn’t bother looking.”

After that inspiring talk, I spoke up, “Hey Steve, I’ve made some pretty impressive progress with the OpenStack-model-t cookbook, I’m going to go heads down on it for the rest of the day to see if I can get this done by EOB.”

Steve looked at me, smiled, then looked at the rest of the room and said, “See that’s what I want, someone to run with my assignments. OK, let’s get back to work and let’s see where we are at stand-up tomorrow.” We all started filing out of the conference room, all of us realizing at that moment how much of a waste of our time our sync meetings were.

I sat down in front of my laptop and started to think of what I had to do next. I put a small plan together:

  1. Get a hosted Chef instance, the 5 free nodes they give me ought to be enough to get this proof of concept built.
  2. Get the model-t cookbook up on the hosted instance.
  3. Figure out what I need in the run_list for each of the machines.
  4. Converge and see my cloud come to life.

First is always first, right? So I went to https://manage.chef.io and got myself an instance of hosted Chef. It was pretty easy: I just created a new org, baller-model-t, and then pulled down the getting_started kit. I did a chef exec knife status to confirm it was working and it was. Awesome, step 1 complete. Second, I uploaded the openstack-model-t cookbook to my hosted instance. It complained about dependencies because I had forgotten to do the Berkshelf stuff. So I did the following:

cd openstack-model-t
chef berks install
chef berks upload
cd ..
chef exec knife cookbook upload openstack-model-t -o .

Sweet, success. OK, from the look of the cookbook, the All-in-One setup was driven by the default.rb recipe. That’s good, that’s my controller node. Now the question is what’s in my compute node run_list? Well, it looks like compute_node.rb is the wrapper for just a compute node. Great, I’m almost done. Now to get the converge to happen. I still need to get these boxes to check into the hosted Chef instance, so I decided to use knife bootstrap to do it in one shot. These were the commands I used:

chef exec knife bootstrap controller -x ubuntu --sudo -r 'openstack-model-t::default'
chef exec knife bootstrap compute1 -x ubuntu --sudo -r 'openstack-model-t::compute_node'
chef exec knife bootstrap compute2 -x ubuntu --sudo -r 'openstack-model-t::compute_node'

I went to https://controller/horizon and I couldn’t believe what I saw. I logged in with admin/mypass and I saw 3 hypervisors and an empty OpenStack cloud ready to go. I ssh’d into the controller node, ran the bash build_neutron_networks.sh, and saw it come to life. (NOTE FROM MY FUTURE SELF: If I remember correctly, I had to change some of the floating-ip options around on my second build of this, but still, this was amazing.)

I spun up a CirrOS image, then a second, and a third. I ssh’d into each of them and made sure I could ping 8.8.8.8. It worked! I had a running multi-node, horizontally scalable cloud on my desk. I looked at the clock and it was only about an hour ‘til quitting time. I searched for some OpenStack cloud images and found Ubuntu’s and CentOS’ and injected them into my cloud. I leaned back for a moment and thought about what I had accomplished. I had to smile. As someone who has always been interested in but scared to try building out OpenStack, this seemed like a dream come true.

For a couple moments, I thought about going home, then I realized that I could finish the project off if I just got kitchen-openstack running. So I went through the same steps that I did with Digital Ocean with the kitchen-openstack driver. I ran my parallel tests and successfully spun up a test-kitchen run.

I couldn’t be more proud of my accomplishments for today. For the first time in a while I’m going to be excited for stand-up tomorrow when I can show this off to everyone.

December 23, 2014

Day 23 - The Importance of Pluralism, or The Danger of the Letter "S"

Written by: Mike Fiedler (@mikefiedler)
Edited by: Hugh Brown (@saintaardvark)

Prologue: A Concept

One aspect of Chef that’s confusing to people comes up when searching for nodes that have some attribute: just what is the difference between a node’s reported ‘role’ attribute and its ‘roles’ attribute? It seems like it could almost be taken for a typo – but underlying it are some very deep statements about pluralism, pluralization, and the differences between them.

One definition of the term ‘pluralism’ is “a condition or system in which two or more states, groups, principles, sources of authority, etc., coexist.” And while pluralism is common in descriptions of politics, religion and culture, it also has a place in computing: to describe situations in which many systems are in more than one desired state.

Once a desired state is determined, it’s enforced. But then time passes – days, minutes, seconds or even nanoseconds – and every moment has the potential to change the server’s actual state. Files are edited, hardware degrades, new data is pulled from external sources; anyone who has run a production service can attest to this.

Act I: Terms

Businesses commonly offer products. These products may be composed of multiple systems, where each system could be a collection of services, which run on any number of servers, which run on some number of hosts. Each host, in turn, provides another set of services to the server that makes up part of the system, which then makes up part of the product, which the business sells.

An example to illustrate: MyFace offers a social web site (the product), which may need a web portal, a user authentication system, index and search systems, long-term photo storage systems, and many more. The web portal system may need servers like Apache or Nginx, running on any number of instances. A given server-instance will need to use any number of host services, such as I/O, CPU, memory, and more.

So what we loosely have is: products => systems => services => servers => hosts => services. (Turtles, turtles, turtles.)

In Days of Yore, when a Company ran a ‘Web Site’, they may have had a single System, maybe some web content Service, made up of a web Server, a database Server (maybe even on the same host) - both consuming host services (CPU, Memory, Disk, Network) - to provide the Service the Company then sells, hopefully at a profit (right!?).

Back then, if you wanted to enact a change on the web and database at the same time (maybe release a new feature), it was relatively simple, as you could control both things in one place, at roughly the same time.

Intermission

In English, to pluralize something, we generally add a suffix of “s” to the word. For instance, to convey more than one instance, “instance” becomes “instances”, “server” becomes “servers”, “system” becomes “systems”, “turtle” becomes “turtles”.

We commonly use pluralization to describe the concept of a collection of similar items, like “apples”, “oranges”, “users”, “web pages”, “databases”, “servers”, “hosts”, “turtles”. I think you see the pattern.

This extends even into programming languages and idiomatic use in development frameworks. For example, a Rails application will typically pluralize the table name for a model named Apple to apples.

This emphasizes that the table in question does not store a singular Apple; rather, many Apple instances will be located in a table named apples.
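
A minimal illustration, assuming a standard ActiveRecord setup:

# ActiveRecord's naming convention: the Apple model maps to the "apples" table.
class Apple < ActiveRecord::Base
end

Apple.table_name   # => "apples"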

This is not pluralism, this is pluralization - don’t get them confused. Let’s move on to the next act.

Act II: Progress

We’ve evolved quite a bit since the Days of Yore. Now, a given business product can span hundreds or even thousands of systems of servers running on hosts all over the world.

As systems grow, it becomes more difficult to enact a desired change at a deterministic point in time across a fleet of servers and hosts.

In the realm of systems deployment, many solutions perform what has become known as “test-and-repair” operations: given a “map” of the desired state (which typically manifests as human-written, readable code), the tool tests the current state of a given host and performs “repair” operations to bring the host to the desired state - whether that means installing packages, writing files, or something else.

Each system calls this map something different - cfengine:policies, bcfg2:specifications, puppet:modules, chef:recipes, ansible:playbooks, and so on. While they don’t always map 1:1, they all have some sort of concept for ‘Things that are similar, but not the same.’ The hosts will have unique IP addresses and hostnames while sharing enough common features to be grouped under a term like “web heads.”
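
In Chef terms, a trivial “map” of desired state looks something like the sketch below; on each run, the declared resources are tested and only repaired if they have drifted:

# Installs the package only if it is absent, and rewrites the file only if its
# content or mode differ from what is declared.
package 'ntp'

file '/etc/motd' do
  content "This host is managed by Chef.\n"
  mode '0644'
end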

Act III: Change

In the previous sections, I laid the groundwork to understand one of the more subtle features in Chef. This feature may be available in other services, but I’ll describe the one I know.

Using Chef, there is a common deployment model where Chef Clients check in with a Chef Server to ask “What is the desired state I should have?” The Chef terminology is ‘a node asks the server for its run list’.

A run list can contain a list of recipes and/or roles. A recipe tells Chef how to accomplish a particular set of tasks, like installing a package or editing a file. A role is typically a collection of recipes, and maybe some role-specific metadata (‘attributes’ in Chef lingo).
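
For example, a role might be defined with Chef’s Ruby role DSL roughly like this; the names mirror the “webhead” example used below and are otherwise illustrative:

# roles/webhead.rb -- illustrative only
name "webhead"
description "Nodes that serve the web application"
run_list "recipe[base::packages]", "recipe[nginx]", "recipe[webapp]"
default_attributes "nginx" => { "worker_connections" => 1024 }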

The node may be in any state at this point. Chef will test for each desired state, and take action to enforce it: install this package, write that file, etc. The end result should either be “this node now conforms to the desired state” or “this node was unable to comply”.

When the node completes successfully, it will report back to Chef Server that “I am node ‘XYZZY’, and my roles are ‘base’ and ‘webhead’, my recipes are ‘base::packages’, ‘nginx’, ‘webapp’” along with a lot of node-specific metadata (IP addresses, CPU, Memory, Disk, and much more).

This information is then indexed and available for others to search for. A common use case we have is where a load balancing node will perform a search for all nodes holding the webhead role, and add these to the balancing list.

Pièce de résistance, or Searching for Servers

In a world where we continue to scale and deploy systems rapidly and repeatedly, we often choose to reduce the need for strong consistency amongst a cluster of hosts. This means we cannot expect to change all hosts at the precise same moment. Rather we opt for eventual consistency: either all my nodes will eventually be correct, or failures will occur and I’ll be notified that something is wrong.

This changes how we think about deployments and, more importantly, how we use our tools to find other nodes.

Using Chef’s search feature, a search like this:

webheads = search(:node, 'role:webheads')

will use the node index (a collection of node data) to look for nodes with the webheads role in the node’s run list - this will also return nodes that have not yet completed an initial Chef run and reported the complete run list back to Chef Server.

This means that my load balancer could find a node that is still mid-provisioning, and potentially begin to send traffic to a node that’s not ready to receive yet, based on the role assignment alone.

A better search, in this case might be:

webheads = search(:node, 'roles:webheads')

One letter, and all the difference.

This search now looks for an “expanded list” that the node has reported back. Any node with the role webheads that has completed a Chef run would be included. If the mandate is that only webhead nodes get the webhead role assigned to them, then I can safely use this search to include nodes that have completed their provisioning cycle.
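
Put to work in a load balancer recipe, that might look roughly like the following sketch; the template, service, and attribute names are hypothetical:

# Only nodes that have completed at least one Chef run are returned by the
# 'roles' search, so half-provisioned instances never enter the pool.
webheads = search(:node, 'roles:webheads')

template '/etc/haproxy/haproxy.cfg' do
  source 'haproxy.cfg.erb'
  mode '0644'
  variables(:members => webheads.map { |n| n['ipaddress'] })
  notifies :reload, 'service[haproxy]'
end

service 'haproxy' do
  supports :reload => true
  action [:enable, :start]
end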

Another way to use this search to our benefit is to search one axis and compare with another to find nodes that never completed provisioning:

badnodes = search(:node, 'role:webheads AND NOT roles:webheads')
# Or, with knife command line:
$ knife search node 'role:webheads AND NOT roles:webheads'

This will grab any nodes with an assignment but not a completion – very helpful when launching large numbers of nodes.

Note: This is not restricted to roles; this also applies to recipe/recipes. I’ve used roles here, as we use them heavily in our organization, but the same search patterns apply for using recipes directly in a run list.

Curtain

This little tidbit of role vs roles has proven time and again to be a confusing point when someone tries to pick up more of Chef’s searching abilities. But having both attributes describe the state of the node is helpful in determining what state the node is in and whether it should be included in some other node’s list (such as in the loadbalancer/webhead example from before).

Now, you may argue against the use of roles entirely, or the use of Chef Server and search, and use something else for service discovery. This is a valid argument - but be careful you’re not tethering a racehorse to a city carriage. If you don’t fully understand its abilities, someday it might run away on you.

Epilogue

A surgeon spends a lot of time learning how to use a sharpened bit of metal to fix the human body. While there are many instruments he or she will go on to master, the scalpel remains the fundamental tool, available when all else is gone.

While we don’t have the same risks involved as a surgeon, the tools we use can be more complex, and provide us with a large amount of power at our fingertips.

It behooves us to learn how they work, and when and how to use their features to provide better systems and services for our businesses.

Chef’s ability to discern between what a node has been told about itself, and what it reports about itself, can make all the difference when using Chef to accomplish complex deployment scenarios and maintain flexible infrastructure as code. This not only lets you accomplish fundamentals of service discovery and less hard-coded configurations, but lets you avoid the uncertainty of bringing in yet another outside tool.

On that note, Happy Holiday(s)!

December 21, 2014

Day 21 - Baking Delicious Resources with Chef

Written by: Jennifer Davis (@sigje)
Edited by: Nathen Harvey (@nathenharvey)

Growing up, every Christmas time included the sweet smells of fresh baked cookies. The kitchen would get incredibly messy as we prepped a wide assortment from carefully frosted sugar cookies to peanut butter cookies. Holiday tins would be packed to the brim to share with neighbors and visiting friends.

Sugar Cookies

My earliest memories of this tradition are of my grandmother showing me how to carefully imprint each peanut butter cookie with a crosshatch. We’d dip the fork into sugar to prevent the dough from sticking and then carefully press into the cookie dough. Carrying on the cookie tradition, I am introducing the concepts necessary to extend your Chef knowledge and bake up cookies using LWRPs.

To follow the walkthrough example as written you will need to have the Chef Development Kit (Chef DK), Vagrant, and Virtual Box installed (or use the Chef DK with a modified .kitchen.yml configuration to use a cloud compute provider such as Amazon).

Resource and Provider Review

Resources are the fundamental building blocks of Chef. There are many available resources included with Chef. Resources are declarative interfaces, meaning that we describe the state we want the resource to be in, rather than the steps required to reach that state. Resources have a type, name, one or more parameters, actions, and notifications.

Let’s take a look at one sample resource, Route.

route "NAME" do
  gateway "10.0.0.20"
  action :delete
end

The route resource describes the system routing table. The type of resource is route. The name of the resource is the string that follows the type. The route resource includes optional parameters of device, gateway, netmask, provider, and target. In this specific example, we are only declaring the gateway parameter. In the above example we are using the delete action and there are no notifications.

Each Chef resource includes one or more providers responsible for actually bringing the resource to the desired state. It is usually not necessary to select a provider when using the Chef-provided resources, Chef will select the best provider for the job at hand. We can look at the underlying Chef code to examine the provider. For example here is the Route provider code and rubydoc for the class.

While there are ready-made resources and providers, they may not be sufficient to meet our needs to programmatically describe our infrastructure with small clear recipes. We reach that point where we want to reduce repetition, reduce complexity, or improve readability. Chef gives us the ability to extend functionality with Definitions, Heavy Weight Resources and Providers (HWRP), and Light Weight Resources and Providers (LWRP).

Definitions are essentially recipe macros. They are stored within a definitions directory within a specific cookbook. They cannot receive notifications.
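
A definition might look like the following sketch (hypothetical names):

# definitions/web_vhost.rb -- a recipe macro that is expanded inline wherever
# it is called; the definition itself cannot receive notifications.
define :web_vhost, :port => 80 do
  template "/etc/nginx/sites-enabled/#{params[:name]}" do
    source "vhost.erb"
    variables(:port => params[:port])
  end
end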

HWRPs are pure ruby stored in the libraries directory within a specific cookbook. They cannot use core resources from the Chef DSL by default.

LWRPs, the main subject of this article, are a combination of Chef DSL and ruby. They are useful to abstract repeated patterns. They are parsed at runtime and compile into ruby classes.

LWRPs

Extending resources requires us to revisit the elements of a resource: type, name, parameters, actions, and notifications.

Idempotence and convergence must also be considered.

Idempotence means that the provider ensures that the state of a resource is only changed if a change is required to bring that resource into compliance with our desired state or policy.

Convergence means that the provider brings the current resource state closer to the desired resource state.

Resources have a type. The LWRP’s resource type is defined by the name of the file within the cookbook. This implicit name follows the formula of: cookbook_resource. If the default.rb file is used the new resource will be named cookbook.

File names should match for the LWRP’s resource and provider within the resources and providers directories. The chef generators will ensure that the files are created appropriately.

The resource and its available actions are described in the LWRP’s resource file.

The steps required to bring the piece of the system to the desired state are described in the LWRP’s provider file. Both idempotence and convergence must also be considered when writing the provider.

Resource DSL

The LWRP resource file defines the characteristics of the new resource we want to provide using the Chef Resource DSL. The Resource DSL has multiple methods: actions, attribute, and default_action.

Resources have a name. The Resource DSL allows us to tag a specific parameter as the name of the resource with :name_attribute.

Resources have actions. The Resource DSL uses the actions method to define a set of supported actions with a comma separated list of symbols. The Resource DSL uses the default_action method to define the action used when no action is specified in the recipe.

Note: It is recommended to always define a default_action.

Resources have parameters. The Resource DSL uses the attribute method to define a new parameter for the resource. We can provide a set of validation parameters associated with each parameter.

Let’s take a look at an example of a LWRP resource from existing cookbooks.

djbdns includes the djbdns_rr resource.

actions :add
default_action :add

attribute :fqdn,     :kind_of => String, :name_attribute => true
attribute :ip,       :kind_of => String, :required => true
attribute :type,     :kind_of => String, :default => "host"
attribute :cwd,      :kind_of => String

The rr resource as defined here will have one action: add, and 4 attributes: fqdn, ip, type, and cwd. The validation parameters for the attribute show that all of these attributes are expected to be of the String class. Additionally ip is the only required attribute when using this resource in our recipes.

Provider DSL

The LWRP provider file defines the “how” of our new resource using the Chef Provider DSL.

In order to ensure that our new resource functionality is idempotent and convergent we need the:

  • desired state of the resource
  • current state of the resource
  • end state of the resource after the run
Requirement      Chef DSL Provider Method
Desired State    new_resource
Current State    load_current_resource
End State        updated_by_last_action

Let’s take a look at an example of a LWRP provider from an existing cookbook to illustrate the Chef DSL provider methods.

djbdns includes the djbdns_rr provider.

action :add do
  type = new_resource.type
  fqdn = new_resource.fqdn
  ip = new_resource.ip
  cwd = new_resource.cwd ? new_resource.cwd : "#{node['djbdns']['tinydns_internal_dir']}/root"

  unless IO.readlines("#{cwd}/data").grep(/^[\.\+=]#{fqdn}:#{ip}/).length >= 1
    execute "./add-#{type} #{fqdn} #{ip}" do
      cwd cwd
      ignore_failure true
    end
    new_resource.updated_by_last_action(true)
  end
end
new_resource

new_resource returns an object that represents the desired state of the resource. We can access all attributes as methods of that object. This allows us to know programmatically our desired end state of the resource.

type = new_resource.type assigns the value of the type attribute of the new_resource object that is created when we use the rr resource in a recipe with a type parameter.

load_current_resource

load_current_resource is an empty method by default. We need to define this method such that it returns an object that represents the current state of the resource. This method is responsible for loading the current state of the resource into @current_resource.

In our example above we are not using load_current_resource.
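
If we did want one, a rough sketch for this provider might look like the following. This is hypothetical code, not part of the djbdns cookbook, and it assumes cwd has already been set on the resource:

def load_current_resource
  # Build an object of the same resource class to represent the state on disk.
  @current_resource = new_resource.class.new(new_resource.fqdn)
  data_file = "#{new_resource.cwd}/data"
  if ::File.exist?(data_file) &&
     ::IO.readlines(data_file).grep(/^[\.\+=]#{new_resource.fqdn}:#{new_resource.ip}/).any?
    @current_resource.ip(new_resource.ip)
  end
  @current_resource
end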

updated_by_last_action

updated_by_last_action notifies Chef that a change happened to converge our resource to its desired state.

As part of the unless block executing new_resource.updated_by_last_action(true) will notify Chef that a change happened to converge our resource.

Actions

In the provider, we need to define a method for each action declared in the LWRP resource file. This method should handle doing whatever is needed to configure the resource to be in the desired state.

We see that the one action defined is :add, which matches the actions declared in our LWRP resource file.

Cooking up a cookies_cookie resource

Preparing our kitchen

First, we need to set up our kitchen for some holiday baking! Test Kitchen is part of the suite of tools that come with the Chef DK. This omnibus package includes a lot of tools that can be used to personalize and optimize your workflow. For now, it’s back to the kitchen.

Kitchen Utensils

Note: On Windows you need to verify your PATH is set correctly to include the installed packages. See this article for guidance.

Download and install both Vagrant, and Virtual Box if you don’t already have them. You can also modify your .kitchen.yml to use AWS instead.

We’re going to create a “cookies” cookbook that will hold all of our cookie recipes. First we will use the chef cli to generate a cookbook that will use the default generator for our cookbooks. You can customize default cookbook creation for your own environments.

chef generate cookbook cookies
Compiling Cookbooks...
Recipe: code_generator::cookbook

followed by more output.

We’ll be working within our cookies cookbook so go ahead and switch into the cookbook’s directory.

$ cd cookies

By running chef generate cookbook we get a number of preconfigured items. One of these is a default Test Kitchen configuration file. We can examine our kitchen configuration by looking at the .kitchen.yml file:

$ cat .kitchen.yml

---
driver:
  name: vagrant

provisioner:
  name: chef_zero

platforms:
  - name: ubuntu-12.04
  - name: centos-6.5

suites:
  - name: default
    run_list:
      - recipe[cookies::default]
    attributes:

The driver section is the component that configures the behavior of Test Kitchen. In this case we will be using the kitchen-vagrant driver that comes with Chef DK. We could easily configure this to use AWS or any other cloud compute provisioner.

The provisioner is chef_zero which allows us to use most of the functionality of integrating with a Chef Server without any of the overhead of having to install and manage one.

The platforms define the operating systems that we want to test against. Today we will only work with the CentOS platform as defined in this file. You can delete or comment out the Ubuntu line.

The suites section is where we define what we want to test. This includes a run_list with the cookies::default recipe.

Next, we will spin up the CentOS instance.

Preheat Oven

Note: Test Kitchen will automatically download the vagrant box file if it’s not already available on your workstation. Make sure you’re connected to a sufficiently speedy network!

$ kitchen create

Let’s verify that our instance has been created.

$ kitchen list

➜  cookies git:(master) ✗ kitchen list
Instance             Driver   Provisioner  Last Action
default-centos-65    Vagrant  ChefZero     Created

This confirms that a local virtualized node has been created.

Let’s go ahead and converge our node which will install chef on the virtual node.

$ kitchen converge

Cookie LWRP prep

We need to create a LWRP resource and provider file and update our default recipe.

We create the LWRP base files using the chef cli included in the Chef DK. This will create the two files resources/cookie.rb and providers/cookie.rb

chef generate lwrp cookie

Let’s edit our cookie LWRP resource file and add a single supported action of create.

Edit the resources/cookie.rb file with the following content:

actions :create

Next edit our cookie LWRP provider file and define the supported create action. Our create method will log a message that includes the name of our new_resource to STDOUT.

Edit the providers/cookie.rb file with the following content:

use_inline_resources

action :create do
 log " My name is #{new_resource.name}"
end

Note: use_inline_resources was introduced in Chef version 11. This modifies how LWRP resources are handled to enable the inline evaluation of resources. This changes how notifications work, so read carefully before modifying LWRPs in use!

Note: The Chef Resource DSL method is actions (plural) even when only one action is declared; each action is then defined individually within the provider file.

We will now test out our new resource functionality by writing a recipe that uses it. Edit the cookies cookbook default recipe. The new resource follows the naming format of #{cookbookname}_#{resource}.

cookies_cookie "peanutbutter" do
   action :create
end

Converge the image again.

$ kitchen converge

Within the output:

Converging 1 resources
Recipe: cookies::default
  * cookies_cookie[peanutbutter] action create[2014-12-19T02:17:39+00:00] INFO: Processing cookies_cookie[peanutbutter] action create (cookies::default line 1)
 (up to date)
  * log[ My name is peanutbutter] action write[2014-12-19T02:17:39+00:00] INFO: Processing log[ My name is peanutbutter] action write (/tmp/kitchen/cache/cookbooks/cookies/providers/cookie.rb line 2)
[2014-12-19T02:17:39+00:00] INFO:  My name is peanutbutter

Our cookies_cookie resource is successfully logging a message!

Improving the Cookie LWRP

We want to improve our cookies_cookie resource. We are going to add some parameters. To determine the appropriate parameters of a LWRP resource we need to think about the components of the resource we want to modify.

Delicious delicious ingredients parameter

There are some basic common components of cookies. The essential components are fat, binder, sweetener, leavening agent, flour, and additions like chocolate chips or peanut butter. The fat provides flavor, texture, and spread of a cookie. The binder will help “glue” the ingredients together. The sweetener affects the color, flavor, texture, and tenderness of a cookie. The leavening agent adds air to our cookie, changing the texture and height of the cookie. The flour provides texture as well as the bulk of the cookie structure. All of the additional ingredients differentiate our cookie’s flavor.

A generic recipe would involve combining all the wet ingredients and dry ingredients separately and then blending them together adding the additional ingredients last. For now, we’ll lump all of our ingredients into a single parameter.

Other than ingredients, we need to know the temperature at which we are going to bake our cookies, and for how long.

When we add parameters to our LWRP resource, it will start with the keyword attribute, followed by an attribute name with zero or more validation parameters.

Edit the resources/cookie.rb file:

actions :create  

attribute :name, :name_attribute => true
attribute :bake_time
attribute :temperature
attribute :ingredients

We’ll update our recipe to incorporate these attributes.

cookies_cookie "peanutbutter" do
   bake_time 10
   temperature 350
   action :create
end

Using a Data Bag

While we could add the ingredients in a string or array, in this case we will separate them away from our code. One way to do this is with data bags.

We’ll use a data_bag to hold our cookie ingredients. Production data_bags normally exist outside of our cookbook within our organization policy_repo. We are developing and using chef_zero so we’ll include our data bag within our cookbook in the test/integration/data_bags directory.

To do this in our development environment we update our .kitchen.yml so that chef_zero finds our data_bags.

For testing our new resource functionality, add the following to the default suite section of your .kitchen.yml:

data_bags_path: "test/integration/data_bags"

At this point your .kitchen.yml should include the data_bags_path setting under the default suite. Next, create the directory for our data_bag:

$ mkdir -p test/integration/data_bags/cookies_ingredients

Create the peanutbutter item in our cookies_ingredients data_bag by creating a file named peanutbutter.json in the directory we just created:

{
  "id" : "peanutbutter",
  "ingredients" :
    [
      "1 cup peanut butter",
      "1 cup sugar",
      "1 egg"
    ]
}

We’ll update our recipe to actually use the cookies_ingredients data_bag:

search('cookies_ingredients', '*:*').each do |cookie_type|
  cookies_cookie cookie_type['id'] do
    ingredients cookie_type['ingredients']
    bake_time 10
    temperature 350
    action :create
  end
end

Now, we’ll update our LWRP resource to actually validate input parameters, and update our provider to create a file on our node, and use the attributes. We’ll also create an ‘eat’ action for our resource.

Edit the resources/cookie.rb file with the following content:

actions :create, :eat

attribute :name, :name_attribute => true
# bake time in minutes
attribute :bake_time, :kind_of => Integer
# temperature in F
attribute :temperature, :kind_of => Integer
attribute :ingredients, :kind_of => Array

We’ll update our provider so that we create a file on our node rather than just logging to STDOUT. We’ll use a template resource in our provider, so we will create the required template.

Create a template file:

$ chef generate template basic_recipe

Edit the templates/default/basic_recipe.erb to have the following content:

Recipe: <%= @name %> cookies

<% @ingredients.each do |ingredient| %>
<%= ingredient %>
<% end %>

Combine wet ingredients.
Combine dry ingredients.

Bake at <%= @temperature %>F for <%= @bake_time %> minutes.

Now we will update our cookie provider to use the template, and pass the attributes over to our template. We will also define our new eat action, that will delete the file we create with create.

Edit the providers/cookie.rb file with the following content:

use_inline_resources

action :create do

  template "/tmp/#{new_resource.name}" do
    source "basic_recipe.erb"
    mode "0644"
    variables(
      :ingredients => new_resource.ingredients,
      :bake_time   => new_resource.bake_time,
      :temperature => new_resource.temperature,
      :name        => new_resource.name,
    )
  end
end

action :eat do

  file "/tmp/#{new_resource.name}" do
    action :delete
  end

end

Try out our updated LWRP by converging your Test Kitchen.

kitchen converge

Let’s confirm the creation of our peanutbutter resource by logging into our node.

kitchen login

Our new file was created at /tmp/peanutbutter. Check it out:

[vagrant@default-centos-65 ~]$ cat /tmp/peanutbutter
Recipe: peanutbutter cookies

1 cup peanut butter
1 cup sugar
1 egg

Combine wet ingredients.
Combine dry ingredients.

Bake at 350F for 10 minutes.

Peanut Butter Cookie Time

Let’s try out our eat action. Update our recipe with

search("cookies_ingredients", "*:*").each do |cookie_type|
  cookies_cookie cookie_type['id'] do
    action :eat
  end
end

Converge our node, login and verify that the file doesn’t exist anymore.

$ kitchen converge
$ kitchen login
Last login: Fri Dec 19 05:45:23 2014 from 10.0.2.2
[vagrant@default-centos-65 ~]$ cat /tmp/peanutbutter
cat: /tmp/peanutbutter: No such file or directory

To add additional cookie types we can just create new data_bag items.

Cleaning up the kitchen

Messy Kitchen

Finally once we are done testing in our kitchen today, we can go ahead and clean up our virtualized instance with kitchen destroy.

kitchen destroy

Next Steps

We have successfully made up a batch of peanut butter cookies yet have barely scratched the surface of extending Chef with LWRPs. Check out Chapter 8 in Jon Cowie’s book Customizing Chef and Doug Ireton’s helpful 3-part article on creating LWRPs. You should examine and extend this example to use load_current_resource and updated_by_last_action. Try to figure out how to add why_run functionality. I look forward to seeing you share your LWRPs with the Chef community!

Feedback and suggestions are welcome [email protected].

Additional Resources

Thank you

Thank you to my awesome editors who helped me ensure that these cookies were tasty!

December 14, 2014

Day 14 - Using Chef Provisioning to Build Chef Server

Or, Yo Dawg, I heard you like Chef.

Written by: Joshua Timberman (@jtimberman)
Edited by: Paul Graydon (@twirrim)

This post is dedicated to Ezra Zygmuntowicz. Without Ezra, we wouldn’t have had Merb for the original Chef server, chef-solo, and maybe not even Chef itself. His contributions to the Ruby, Rails, and Chef communities are immense. Thanks, Ezra, RIP.

In this post, I will walk through a use case for Chef Provisioning used at Chef Software, Inc.: building a new Hosted Chef infrastructure with Chef Server 12 on Amazon EC2. This isn’t an in-depth how to guide, but I will illustrate the important components to discuss what is required to setup Chef Provisioning, with a real world example. Think of it as a whirlwind tour of Chef Provisioning and Chef Server 12.

Background

If you have used Chef for awhile, you may recall the wiki page “Bootstrap Chef RubyGems Installation” - the installation guide that uses cookbooks with chef-solo to install all the components required to run an open source Chef Server. This idea was a natural fit in the omnibus packages for Enterprise Chef (nee Private Chef) in the form of private-chef-ctl reconfigure: that command kicks off a chef-solo run that configures and starts all the Chef Server services.

It should be no surprise that at CHEF we build Hosted Chef using Chef. Yes, it’s turtles and yo-dawg jokes all the way down. As the CHEF CTO Adam described when talking about one Chef Server codebase, we want to bring our internal deployment and development practices in line with what we’re shipping to customers, and we want to unify our approach so we can provide better support.

Chef Server 12

As announced recently, Chef Server 12 is generally available. For purposes of the example discussed below, we’ll provision three machines: one backend, one frontend (with Chef Manage and Chef Reporting), and one running Chef Analytics. While Chef Server 12 has the capability to install add-ons, we have a special cookbook with a resource to manage the installation of “Chef Server Ingredients.” This is so we can also install the chef-server-core package used by both the API frontend nodes and the backend nodes.

Chef Provisioning

Chef Provisioning is a new capability for Chef, where users can define “machines” as Chef resources in recipes, and then converge those recipes on a node. This means that new machines are created using a variety of possible providers (AWS, OpenStack, or Docker, to name a few), and they can have recipes applied from other cookbooks available on the Chef Server.
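
A minimal provisioning recipe looks roughly like the sketch below; the driver string, key name, and recipe name are illustrative assumptions rather than exact code from the chef-server-cluster cookbook:

require 'chef/provisioning/aws_driver'

with_driver 'aws::us-west-2'

machine 'chef-server-backend' do
  recipe 'chef-server-cluster::bootstrap'
  machine_options :bootstrap_options => { :key_name => 'hc-metal-provisioner' }
  converge true
end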

Chef Provisioning “runs” on a provisioner node. This is often a local workstation, but it could be a specially designated node in a data center or cloud provider. It is simply a recipe run by chef-client (or chef-solo). When using chef-client, any Chef Server will do, including Hosted Chef. Of course, the idea here is we don’t have a Chef Server yet. In my examples in this post, I’ll use my OS X laptop as the provisioner, and Chef Zero as the server.

Assemble the Pieces

The cookbook that does the work using Chef Provisioning is chef-server-cluster. Note that this cookbook is under active development, and the code it contains may differ from the code in this post. As such, I’ll post relevant portions to show the use of Chef Provisioning, and the supporting local setup required to make it go. Refer to the README.md in the cookbook for the most recent information on how to use it.

Amazon Web Services EC2

The first thing we need is an AWS account for the EC2 instances. Once we have that, we need an IAM user that has privileges to manage EC2, and an SSH keypair to log into the instances. It is outside the scope of this post to provide details on how to assemble those pieces. However, once those are acquired, do the following:

Put the access key and secret access key configuration in ~/.aws/config. This is automatically used by chef-provisioning’s AWS provider. The SSH keys will be used in a data bag item (JSON) that is described later. You will then want to choose an AWS region to use. For the sake of example, my keypair is named hc-metal-provisioner in the us-west-2 region.
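
For reference, a minimal ~/.aws/config stanza might look like the following; the values here are placeholders, and with multiple accounts you would use one named section per account:

[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
region = us-west-2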

Chef Provisioning needs to know about the SSH keys in three places:

  1. In the .chef/knife.rb, the private_keys and public_keys configuration settings.
  2. In the machine_options that is used to configure the (AWS) driver so it can connect to the machine instances.
  3. In a recipe.

This is described in more detail below.

Chef Repository

We use a Chef Repository to store all the pieces and parts for the Hosted Chef infrastructure. For example purposes I’ll use a brand new repository. I’ll use ChefDK’s chef generate command:

% chef generate repo sysadvent-chef-cluster

This repository will have a Policyfile.rb, a .chef/knife.rb config file, and a couple of data bags. The latest implementation specifics can be found in the chef-server-cluster cookbook’s README.md.
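
Roughly, the repository ends up looking like this (a sketch; the exact contents are described below and in the cookbook README):

sysadvent-chef-cluster/
  .chef/knife.rb
  Policyfile.rb
  Policyfile.lock.json
  data_bags/
    secrets/
    chef_server/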

Chef Zero and Knife Config

As mentioned above, Chef Zero will be the Chef Server for this example, and it will run on a specific port (7799). I started it up in a separate terminal with:

% chef-zero -l debug -p 7799

The knife config file will serve two purposes. First, it will be used to load all the artifacts into Chef Zero. Second, it will provide essential configuration to use with chef-client. Let’s look at the required configuration.

This portion tells chef, knife, and chef-client to use the chef-zero instance started earlier.

chef_server_url 'http://localhost:7799'
node_name       'chef-provisioner'

In the next section, I’ll discuss the policyfile feature in more detail. These configuration settings tell chef-client to use policyfiles, and which deployment group the client should use.

use_policyfile   true
deployment_group 'sysadvent-demo-provisioner'

As mentioned above, these are the configuration options that tell Chef Provisioning where the keys are located. The key files must exist on the provisioning node somewhere.

First here’s the knife config:

private_keys     'hc-metal-provisioner' => '/tmp/ssh/id_rsa'
public_keys      'hc-metal-provisioner' => '/tmp/ssh/id_rsa.pub'

Then the recipe - this is from the current version of chef-server-cluster::setup-ssh-keys.

fog_key_pair node['chef-server-cluster']['aws']['machine_options']['bootstrap_options']['key_name'] do
  private_key_path '/tmp/ssh/id_rsa'
  public_key_path '/tmp/ssh/id_rsa.pub'
end

The attribute here is part of the driver options set using the with_machine_options method for Chef Provisioning in chef-server-cluster::setup-provisioner. For further reading about machine options, see Chef Provisioning configuration documentation. While the machine options will automatically use keys stored in ~/.chef/keys or ~/.ssh, we do this to avoid strange conflicts on local development systems used for test provisioning. An issue has been opened to revisit this.
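
For illustration, setting those driver options might look roughly like this - a sketch, not the actual chef-server-cluster::setup-provisioner recipe; the values mirror the provisioning output shown later:

require 'chef/provisioning'

with_driver 'fog:AWS'

# Machine options used by the AWS (fog) driver when creating instances.
with_machine_options(
  :bootstrap_options => {
    :key_name  => 'hc-metal-provisioner',
    :image_id  => 'ami-b99ed989',
    :flavor_id => 'm3.medium'
  },
  :ssh_username => 'ubuntu'
)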

Policyfile.rb

Beware, gentle reader! This is an experimental new feature that may well change. However, I wanted to try it out, as it made sense for the workflow when I was assembling this post. Read more about Policyfiles in the ChefDK repository. In particular, read the “Motivation and FAQ” section. Also, Chef (client) 12 is required, which is included in the ChefDK package I have installed on my provisioning system.

The general idea behind Policyfiles is to assemble a node’s run list as an artifact, including all the roles and recipes needed to fulfill its job in the infrastructure. Each Policyfile.rb contains at least the following:

  • name: the name of the policy
  • run_list: the run list for nodes that use this policy
  • default_source: the source where cookbooks should be downloaded (e.g., Supermarket)
  • cookbook: define the cookbooks required to fulfill this policy

As an example, here is the Policyfile.rb I’m using, at the toplevel of the repository:

name            'sysadvent-demo'
run_list        'chef-server-cluster::cluster-provision'
default_source  :community
cookbook        'chef-server-ingredient', '>= 0.0.0',
                :github => 'opscode-cookbooks/chef-server-ingredient'
cookbook        'chef-server-cluster', '>= 0.0.0',
                :github => 'opscode-cookbooks/chef-server-cluster'

Once the Policyfile.rb is written, it needs to be compiled to a lock file (Policyfile.lock.json) with chef install. Installing the policy does the following.

  • Build the policy
  • “Install” the cookbooks to the cookbook store (~/.chefdk/cache/cookbooks)
  • Write the lockfile

This doesn’t put the cookbooks (or the policy) on the Chef Server. We’ll do that in the upload section with chef push.

Data Bags

At CHEF, we prefer to move configurable data and secrets to data bags. For secrets, we generally use Chef Vault, though for the purpose of this example we’re going to skip that here. The chef-server-cluster cookbook has a few data bag items that are required before we can run Chef Client.

Under data_bags, I have these directories/files.

  • secrets/hc-metal-provisioner-chef-aws-us-west-2.json: the name hc-metal-provisioner-chef-aws-us-west-2 is an attribute in the chef-server-cluster::setup-ssh-keys recipe to load the correct item; the private and public SSH keys for the AWS keypair are written out to /tmp/ssh on the provisioner node
  • secrets/private-chef-secrets-_default.json: the complete set of secrets for the Chef Server systems, written to /etc/opscode/private-chef-secrets.json
  • chef_server/topology.json: the topology and configuration of the Chef Server. Currently this doesn’t do much, but it will be expanded in the future to inform /etc/opscode/chef-server.rb with more configuration options

See the chef-server-cluster cookbook README.md for the latest details about the data bag items required. Note: At this time, chef-vault is not used for secrets, but that will change in the future.

Upload the Repository

Now that we’ve assembled all the required components to converge the provisioner node and start up the Chef Server cluster, let’s get everything loaded on the Chef Server.

Ensure the policyfile is compiled and installed, then push it as the provisioner deployment group. The group name is combined with the policy name in the config that we saw earlier in knife.rb. The chef push command uploads the cookbooks, and also creates a data bag item that stores the policyfile’s rendered JSON.

% chef install
% chef push provisioner

Next, upload the data bags.

% knife upload data_bags

We can now use knife to confirm that everything we need is on the Chef Server:

% knife data bag list
chef_server
policyfiles
secrets
% knife cookbook list
apt                      11131342171167261.63923027125258247.235168191861173
chef-server-cluster      2285060862094129.64629594500995644.198889591798187
chef-server-ingredient   37684361341419357.41541897591682737.246865540583454
chef-vault               11505292086701548.4466613666701158.13536425383812

What’s with those crazy versions? That is what the policyfile feature does. The human-readable versions are no longer used; cookbook versions are locked using unique, automatically generated version strings, so based on the policy we know the precise cookbook dependency graph for any given policy. When Chef runs on the provisioner node, it will use the versions in its policy. When Chef runs on the machine instances, since they’re not using Policyfiles, it will use the latest version. In the future we’ll have policies for each of the nodes that are managed with Chef Provisioning.

Checkpoint

At this point, we have:

  • ChefDK installed on the local provisioning node (laptop) with Chef client version 12
  • AWS IAM user credentials in ~/.aws/config for managing EC2 instances
  • A running Chef Server using chef-zero on the local node
  • The chef-server-cluster cookbook and its dependencies
  • The data bag items required to use chef-server-cluster’s recipes, including the SSH keys Chef Provisioning will use to log into the EC2 instances
  • A knife.rb config file that points chef-client at the chef-zero server and tells it to use policyfiles

Chef Client

Finally, the moment (or several moments…) we have been waiting for! It’s time to run chef-client on the provisioning node.

% chef-client -c .chef/knife.rb

While that runs, let’s talk about what’s going on here.

Normally when chef-client runs, it reads configuration from /etc/chef/client.rb. As I mentioned, I’m using my laptop, which has its own run list and configuration, so I need to specify the knife.rb discussed earlier. This will use the chef-zero Chef Server running on port 7799, and the policyfile deployment group.

In the output, we’ll see Chef get its run list from the policy file, which looks like this:

resolving cookbooks for run list: ["chef-server-cluster::[email protected] (081e403)"]
Synchronizing Cookbooks:
  - chef-server-ingredient
  - chef-server-cluster
  - apt
  - chef-vault

The rest of the output should be familiar to Chef users, but let’s talk about some of the things Chef Provisioning is doing. First, the following resource is in the chef-server-cluster::cluster-provision recipe:

machine 'bootstrap-backend' do
  recipe 'chef-server-cluster::bootstrap'
  ohai_hints 'ec2' => '{}'
  action :converge
  converge true
end

The first system that we build in a Chef Server cluster is a backend node that “bootstraps” the data store that will be used by the other nodes. This includes the postgresql database, the RabbitMQ queues, etc. Here’s the output of Chef Provisioning creating this machine resource.

Recipe: chef-server-cluster::cluster-provision
  * machine[bootstrap-backend] action converge
    - creating machine bootstrap-backend on fog:AWS:862552916454:us-west-2
    -   key_name: "hc-metal-provisioner"
    -   image_id: "ami-b99ed989"
    -   flavor_id: "m3.medium"
    - machine bootstrap-backend created as i-14dec01b on fog:AWS:862552916454:us-west-2
    - Update tags for bootstrap-backend on fog:AWS:862552916454:us-west-2
    -   Add Name = "bootstrap-backend"
    -   Add BootstrapId = "http://localhost:7799/nodes/bootstrap-backend"
    -   Add BootstrapHost = "champagne.local"
    -   Add BootstrapUser = "jtimberman"
    - create node bootstrap-backend at http://localhost:7799
    -   add normal.tags = nil
    -   add normal.chef_provisioning = {"location"=>{"driver_url"=>"fog:AWS:XXXXXXXXXXXX:us-west-2", "driver_version"=>"0.11", "server_id"=>"i-14dec01b", "creator"=>"user/IAMUSERNAME, "allocated_at"=>1417385355, "key_name"=>"hc-metal-provisioner", "ssh_username"=>"ubuntu"}}
    -   update run_list from [] to ["recipe[chef-server-cluster::bootstrap]"]
    - waiting for bootstrap-backend (i-14dec01b on fog:AWS:XXXXXXXXXXXX:us-west-2) to be ready ...
    - bootstrap-backend is now ready
    - waiting for bootstrap-backend (i-14dec01b on fog:AWS:XXXXXXXXXXXX:us-west-2) to be connectable (transport up and running) ...
    - bootstrap-backend is now connectable
    - generate private key (2048 bits)
    - create directory /etc/chef on bootstrap-backend
    - write file /etc/chef/client.pem on bootstrap-backend
    - create client bootstrap-backend at clients
    -   add public_key = "-----BEGIN PUBLIC KEY-----\n..."
    - create directory /etc/chef/ohai/hints on bootstrap-backend
    - write file /etc/chef/ohai/hints/ec2.json on bootstrap-backend
    - write file /etc/chef/client.rb on bootstrap-backend
    - write file /tmp/chef-install.sh on bootstrap-backend
    - run 'bash -c ' bash /tmp/chef-install.sh'' on bootstrap-backend

From here, Chef Provisioning kicks off a chef-client run on the machine it just created. This install.sh script is the one that uses CHEF’s omnitruck service. It will install the current released version of Chef, which is 11.16.4 at the time of writing. Note that this is not version 12, so that’s another reason we can’t use Policyfiles on the machines. The chef-client run is started on the backend instance using the run list specified in the machine resource.

Starting Chef Client, version 11.16.4
 resolving cookbooks for run list: ["chef-server-cluster::bootstrap"]
 Synchronizing Cookbooks:
   - chef-server-cluster
   - chef-server-ingredient
   - chef-vault
   - apt

In the output, we see this recipe and resource:

Recipe: chef-server-cluster::default
  * chef_server_ingredient[chef-server-core] action reconfigure
    * execute[chef-server-core-reconfigure] action run
      - execute chef-server-ctl reconfigure

An “ingredient” is a Chef Server component, either the core package (above), or one of the Chef Server add-ons like Chef Manage or Chef Reporting. In normal installation instructions for each of the add-ons, their appropriate ctl reconfigure is run, which is all handled by the chef_server_ingredient resource. The reconfigure actually runs Chef Solo, so we’re running chef-solo in a chef-client run started inside a chef-client run.
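
As a sketch, using the resource for an add-on in a recipe reads something like this (the actions mirror the output shown later; other properties are omitted):

chef_server_ingredient 'opscode-manage' do
  action [:install, :reconfigure]
end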

The bootstrap-backend node generates some files that we need on other nodes. To make those available using Chef Provisioning, we use machine_file resources.

%w{ actions-source.json webui_priv.pem }.each do |analytics_file|
  machine_file "/etc/opscode-analytics/#{analytics_file}" do
    local_path "/tmp/stash/#{analytics_file}"
    machine 'bootstrap-backend'
    action :download
  end
end

machine_file '/etc/opscode/webui_pub.pem' do
  local_path '/tmp/stash/webui_pub.pem'
  machine 'bootstrap-backend'
  action :download
end

These are “stashed” on the local node - the provisioner. They’re used for the Chef Manage webui and the Chef Analytics node. When the recipe runs on the provisioner, we see this output:

  * machine_file[/etc/opscode-analytics/actions-source.json] action download
    - download file /etc/opscode-analytics/actions-source.json on bootstrap-backend to /tmp/stash/actions-source.json
  * machine_file[/etc/opscode-analytics/webui_priv.pem] action download
    - download file /etc/opscode-analytics/webui_priv.pem on bootstrap-backend to /tmp/stash/webui_priv.pem
  * machine_file[/etc/opscode/webui_pub.pem] action download
    - download file /etc/opscode/webui_pub.pem on bootstrap-backend to /tmp/stash/webui_pub.pem

They are uploaded to the frontend and analytics machines with the files resource attribute. Files are specified as a hash. The key is the target file to upload to the machine, and the value is the source file from the provisioning node.

machine 'frontend' do
  recipe 'chef-server-cluster::frontend'
  files(
        '/etc/opscode/webui_priv.pem' => '/tmp/stash/webui_priv.pem',
        '/etc/opscode/webui_pub.pem' => '/tmp/stash/webui_pub.pem'
       )
end

machine 'analytics' do
  recipe 'chef-server-cluster::analytics'
  files(
        '/etc/opscode-analytics/actions-source.json' => '/tmp/stash/actions-source.json',
        '/etc/opscode-analytics/webui_priv.pem' => '/tmp/stash/webui_priv.pem'
       )
end

Note: These files are transferred using SSH, so they’re not passed around in the clear.

The provisioner will converge the frontend next, followed by the analytics node. We’ll skip the bulk of the output since we saw it earlier with the backend.

  * machine[frontend] action converge
  ... SNIP
    - upload file /tmp/stash/webui_priv.pem to /etc/opscode/webui_priv.pem on frontend
    - upload file /tmp/stash/webui_pub.pem to /etc/opscode/webui_pub.pem on frontend

Here is where the files are uploaded to the frontend, so the webui will work (it’s an API client itself, like knife, or chef-client).

When the frontend runs chef-client, not only does it install the chef-server-core package and run chef-server-ctl reconfigure via the ingredient resource, it also gets the manage and reporting add-ons:

* chef_server_ingredient[opscode-manage] action install
  * package[opscode-manage] action install
    - install version 1.6.2-1 of package opscode-manage
* chef_server_ingredient[opscode-reporting] action install
   * package[opscode-reporting] action install
     - install version 1.2.1-1 of package opscode-reporting
Recipe: chef-server-cluster::frontend
  * chef_server_ingredient[opscode-manage] action reconfigure
    * execute[opscode-manage-reconfigure] action run
      - execute opscode-manage-ctl reconfigure
  * chef_server_ingredient[opscode-reporting] action reconfigure
    * execute[opscode-reporting-reconfigure] action run
      - execute opscode-reporting-ctl reconfigure

Similar to the frontend above, the analytics node will be created as an EC2 instance, and we’ll see the files uploaded:

    - upload file /tmp/stash/actions-source.json to /etc/opscode-analytics/actions-source.json on analytics
    - upload file /tmp/stash/webui_priv.pem to /etc/opscode-analytics/webui_priv.pem on analytics

Then, the analytics package is installed as an ingredient, and reconfigured:

* chef_server_ingredient[opscode-analytics] action install
* package[opscode-analytics] action install
  - install version 1.0.4-1 of package opscode-analytics
* chef_server_ingredient[opscode-analytics] action reconfigure
  * execute[opscode-analytics-reconfigure] action run
    - execute opscode-analytics-ctl reconfigure
...
Chef Client finished, 10/15 resources updated in 1108.3078 seconds

This will be the last thing in the chef-client run on the provisioner, so let’s take a look at what we have.

Results and Verification

We now have three nodes running as EC2 instances for the backend, frontend, and analytics systems in the Chef Server. We can view the node objects on our chef-zero server:

% knife node list
analytics
bootstrap-backend
chef-provisioner
frontend

We can use search:

% knife search node 'ec2:*' -r
3 items found

analytics:
  run_list: recipe[chef-server-cluster::analytics]

bootstrap-backend:
  run_list: recipe[chef-server-cluster::bootstrap]

frontend:
  run_list: recipe[chef-server-cluster::frontend]

% knife search node 'ec2:*' -a ipaddress
3 items found

analytics:
  ipaddress: 172.31.13.203

bootstrap-backend:
  ipaddress: 172.31.1.60

frontend:
  ipaddress: 172.31.1.120

If we navigate to the frontend IP, we can sign up using the Chef Server management console, then download a starter kit and use that to bootstrap new nodes against the freshly built Chef Server.

% unzip chef-starter.zip
Archive:  chef-starter.zip
...
  inflating: chef-repo/.chef/sysadvent-demo.pem
  inflating: chef-repo/.chef/sysadvent-demo-validator.pem
% cd chef-repo
% knife client list
sysadvent-demo-validator
% knife node create sysadvent-node1 -d
Created node[sysadvent-node1]
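
From here, bootstrapping a real node against the new Chef Server could look something like this (the IP address, SSH user, and node name are placeholders):

% knife bootstrap 203.0.113.10 -x ubuntu --sudo -N sysadvent-node2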

If we navigate to the analytics IP, we can sign in with the user we just created, and view the events from downloading the starter kit: the validator client key was regenerated, and the node was created.

Next Steps

For those following at home, this is now a fully functional Chef Server. It does have premium features (manage, reporting, analytics), but those are free up to 25 nodes. We can also destroy the cluster using the cleanup recipe, which can be applied by disabling use_policyfile in .chef/knife.rb and running chef-client with an override run list:

% grep policyfile .chef/knife.rb
# use_policyfile   true
% chef-client -c .chef/knife.rb -o chef-server-cluster::cluster-clean
Recipe: chef-server-cluster::cluster-clean
  * machine[analytics] action destroy
    - destroy machine analytics (i-5cdac453 at fog:AWS:XXXXXXXXXXXX:us-west-2)
    - delete node analytics at http://localhost:7799
    - delete client analytics at clients
  * machine[frontend] action destroy
    - destroy machine frontend (i-68dfc167 at fog:AWS:XXXXXXXXXXXX:us-west-2)
    - delete node frontend at http://localhost:7799
    - delete client frontend at clients
  * machine[bootstrap-backend] action destroy
    - destroy machine bootstrap-backend (i-14dec01b at fog:AWS:XXXXXXXXXXXXX:us-west-2)
    - delete node bootstrap-backend at http://localhost:7799
    - delete client bootstrap-backend at clients
  * directory[/tmp/ssh] action delete
    - delete existing directory /tmp/ssh
  * directory[/tmp/stash] action delete
    - delete existing directory /tmp/stash

As you can see, the Chef Provisioning capability is powerful, and gives us a lot of flexibility for running a Chef Server 12 cluster. Over time as we rebuild Hosted Chef with it, we’ll add more capability to the cookbook, including HA, scaled out frontends, and splitting up frontend services onto separate nodes.

December 19, 2013

Day 19 - Automating IAM Credentials with Ruby and Chef

Written by: Joshua Timberman (@jtimberman)
Edited by: Shaun Mouton (@sdmouton)

Chef, nee Opscode, has long used Amazon Web Services. In fact, the original iteration of "Hosted Enterprise Chef," "The Opscode Platform," was deployed entirely in EC2. In the time since, AWS has introduced many excellent features and libraries to work with them, including Identity and Access Management (IAM), and the AWS SDK. Especially relevant to our interests is the Ruby SDK, which is available as the aws-sdk RubyGem. Additionally, the operations team at Nordstrom has released a gem for managing encrypted data bags called chef-vault. In this post, I will describe how we use the AWS IAM feature, how we automate it with the aws-sdk gem, and store secrets securely using chef-vault.

Definitions

First, here are a few definitions and references for readers.
  • Hosted Enterprise Chef - Enterprise Chef as a hosted service.
  • AWS IAM - management system for authentication/authorization to Amazon Web Services resources such as EC2, S3, and others.
  • AWS SDK for Ruby - RubyGem providing Ruby classes for AWS services.
  • Encrypted Data Bags - Feature of Chef Server and Enterprise Chef that allows users to encrypt data content with a shared secret.
  • Chef Vault - RubyGem to encrypt data bags using public keys of nodes on a chef server.

How We Use AWS and IAM

We have used AWS for a long time, before the IAM feature existed. Originally with The Opscode Platform, we used EC2 to run all the instances. While we have moved our production systems to a dedicated hosting environment, we do have non-production services in EC2. We also have some external monitoring systems in EC2. Hosted Enterprise Chef uses S3 to store cookbook content. Those with an account can see this with knife cookbook show COOKBOOK VERSION, and note the URL for the files. We also use S3 for storing the packages from our omnibus build tool. The omnitruck metadata API service exposes this.

All these AWS resources - EC2 instances, S3 buckets - are distributed across a few different AWS accounts. Before IAM, there was no way to have data segregation because the account credentials were shared across the entire account. For (hopefully obvious) security reasons, we need to have the customer content separate from our non-production EC2 instances. Similarly, we need to have the metadata about the omnibus packages separate from the packages themselves. In order to manage all these different accounts and their credentials, which need to be automatically distributed to the systems that need them, we use IAM users, encrypted data bags, and Chef.

Unfortunately, using various accounts adds complexity in managing all this, but through the tooling I'm about to describe, it is a lot easier to manage now than it was in the past. We use a fairly simple data file format of JSON data, and a Ruby script that uses the AWS SDK RubyGem. I'll describe the parts of the JSON file, and then the script.

IAM Permissions

IAM allows customers to create separate groups, which are containers of users that have permissions to different AWS resources. Customers can manage these through the AWS console, or through the API. The API uses JSON documents to manage the policy statement of permissions the user has to AWS resources. Here's an example:
{
  "Statement": [
    {
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::an-s3-bucket",
        "arn:aws:s3:::an-s3-bucket/*"
      ]
    }
  ]
}
Granted to an IAM user, this will allow that user to perform all S3 actions to the bucket an-s3-bucket and all the files it contains. Without the /*, only operations against the bucket itself would be allowed. To set read-only permissions, use only the List and Get actions:
"Action": [
  "s3:List*",
  "s3:Get*"
]
Since this is JSON data, we can easily parse and manipulate this through the API. I'll cover that shortly.
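For instance, a small Ruby sketch (the file name is illustrative) that parses a policy document and tightens it to the read-only actions above:
require 'json'
policy = JSON.parse(IO.read('policy.json'))
policy['Statement'].each do |statement|
  statement['Action'] = ['s3:List*', 's3:Get*']
end
puts JSON.pretty_generate(policy)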

See the IAM policy documentation for more information.

Chef Vault

We use data bags to store secret credentials we want to configure through Chef recipes. In order to protect these secrets further, we encrypt the data bags, using chef-vault. As I have previously written about chef-vault in general, this section will describe what we're interested in from our automation perspective.

Chef vault itself is concerned with three things:
  1. The content to encrypt.
  2. The nodes that should have access (via a search query).
  3. The administrators (human users) who should have access.
"Access" means that those entities are allowed to decrypt the encrypted content. In the case of our IAM users, this is the AWS access key ID and the AWS secret access key, which will be the content to encrypt. The nodes will come from a search query to the Chef Server, which will be added as a field in the JSON document that will be used in a later section. Finally, the administrators will simply be the list of users from the Chef Server.

Data File Format

The script reads a JSON file, described here:
{
  "accounts": [
    "an-aws-account-name"
  ],
  "user": "secret-files",
  "group": "secret-files",
  "policy": {
    "Statement": [
      {
        "Action": "s3:*",
        "Effect": "Allow",
        "Resource": [
          "arn:aws:s3:::secret-files",
          "arn:aws:s3:::secret-files/*"
        ]
      }
    ]
  },
  "search_query": "role:secret-files-server"
}
This is an example of the JSON we use. The fields:
  • accounts: an array of AWS account names that have authentication credentials configured in ~/.aws/config - see my post about managing multiple AWS accounts
  • user: the IAM user to create.
  • group: the IAM group for the created user. We use a 1:1 user:group mapping.
  • policy: the IAM policy of permissions, with the action, the effect, and the AWS resources. See the IAM documentation for more information about this.
  • search_query: the Chef search query to perform to get the nodes that should have access to the resources. For example, this one will allow all nodes that have the Chef role secret-files-server in their expanded run list.
These JSON files can go anywhere; the script takes the file path as an argument.

Create IAM Script

Note: This script is cleaned up to save space and get to the meat of it. I'm planning to make it into a knife plugin but haven't gotten a round tuit yet.
require 'inifile'
require 'aws-sdk'
require 'json'
filename = ARGV[0]
dirname  = File.dirname(filename)
aws_data = JSON.parse(IO.read(filename))
aws_data['accounts'].each do |account|
  aws_creds = {}
  aws_access_keys = {}
  # load the aws config for the specified account
  IniFile.load("#{ENV['HOME']}/.aws/config")[account].map{|k,v| aws_creds[k.gsub(/aws_/,'')]=v}
  iam = AWS::IAM.new(aws_creds)
  # Create the group
  group = iam.groups.create(aws_data['group'])
  # Load policy from the JSON file
  policy = AWS::IAM::Policy.from_json(aws_data['policy'].to_json)
  group.policies[aws_data['group']] = policy
  # Create the user
  user = iam.users.create(aws_data['user'])
  # Add the user to the group
  user.groups.add(group)
  # Create the access keys
  access_keys = user.access_keys.create
  aws_access_keys['aws_access_key_id'] = access_keys.credentials.fetch(:access_key_id)
  aws_access_keys['aws_secret_access_key'] = access_keys.credentials.fetch(:secret_access_key)
  # Create the JSON content to encrypt w/ Chef Vault
  vault_file = File.open("#{File.dirname(__FILE__)}/../data_bags/vault/#{account}_#{aws_data['user']}_unencrypted.json", 'w')
  vault_file.puts JSON.pretty_generate(
    {
      'id' => "#{account}_#{aws_data['user']}",
      'data' => aws_access_keys,
      'search_query' => aws_data['search_query']
    }
  )
  vault_file.close
  # This would be loaded directly with Chef Vault if this were a knife plugin...
  puts <<-EOH
knife encrypt create vault #{account}_#{aws_data['user']} \\
  --search '#{aws_data['search_query']}' \\
  --mode client \\
  --json data_bags/vault/#{account}_#{aws_data['user']}_unencrypted.json \\
  --admins "`knife user list | paste -sd ',' -`"
  EOH
end
This is invoked with:
% ./create-iam.rb ./iam-json-data/filename.json
The script iterates over each of the AWS account credentials named in the accounts field of the JSON file, and loads the credentials from the ~/.aws/config file. Then, it uses the aws-sdk Ruby library to authenticate a connection to the AWS IAM API endpoint. This instance object, iam, then uses methods to work with the API to create the group, user, policy, etc. The policy comes from the JSON document as described above. It will create user access keys, and it writes these, along with some other metadata for Chef Vault, to a new JSON file that will be loaded and encrypted with the knife encrypt plugin.

As described, it will display a command to copy/paste. This is technical debt, as it was easier than directly working with the Chef Vault API at the time :).

Using Knife Encrypt

After running the script, we have an unencrypted JSON file in the Chef repository's data_bags/vault directory, named for the account and user created, e.g., data_bags/vault/an-aws-account-name_secret-files_unencrypted.json.
{
  "id": "secret-files",
  "data": {
    "aws_access_key_id": "the access key generated through the AWS API",
    "aws_secret_access_key": "the secret key generated through the AWS API"
  },
  "search_query": "roles:secret-files-server"
}
The knife encrypt command is from the plugin that Chef Vault provides. The create-iam.rb script outputs how to use it:
% knife encrypt create vault an-aws-account-name_secret-files \
  --search 'roles:secret-files-server' \
  --mode client \
  --json data_bags/vault/an-aws-account-name_secret-files_unencrypted.json \
  --admins "`knife user list | paste -sd ',' -`"

Results

After running the create-iam.rb script with the example data file and encrypting the unencrypted JSON output, we'll have the following:
  1. An IAM group in the AWS account named secret-files.
  2. An IAM user named secret-files added to the secret-files group.
  3. Permission for the secret-files user to perform any S3 operations
    on the secret-files bucket (and files it contains).
  4. A Chef Data Bag Item named an-aws-account-name_secret-files in the vault Bag, which will have encrypted contents.
  5. All nodes matching the search roles:secret-files-server will be present as clients in the item an-aws-account-name_secret-files_keys (in the vault bag).
  6. All users who exist on the Chef Server will be admins in the an-aws-account-name_secret-files_keys item.
To view AWS access key data, use the knife decrypt command.
% knife decrypt vault secret-files data --mode client
vault/an-aws-account-name_secret-files
    data: {"aws_access_key_id"=>"the key", "aws_secret_access_key"=>"the secret key"}
The way knife decrypt works is that you give it the field of encrypted data to decrypt, which is why the unencrypted JSON had a field named data - so we could use it to access any of the encrypted data we wanted. Similarly, we could use search_query instead of data to get the search query used, in case we wanted to update the access list of nodes.
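For example, mirroring the command above:
% knife decrypt vault secret-files search_query --mode client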

In a recipe, we use the chef-vault cookbook's chef_vault_item helper method to access the content:
require 'chef-vault'
aws = chef_vault_item('vault', 'an-aws-account_secret-files')['data']
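
A hypothetical follow-on - the path and file resource are placeholders, not part of the original recipes - showing how those values might then be written out on the node:
file '/etc/secret-files/aws-credentials.json' do
  content JSON.pretty_generate(
    'aws_access_key_id'     => aws['aws_access_key_id'],
    'aws_secret_access_key' => aws['aws_secret_access_key']
  )
  mode '0600'
end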

Conclusion

I wrote this script to automate the creation of a few dozen IAM users across several AWS accounts. Unsurprisingly, it took longer to test the recipe code and access to AWS resources across the various Chef recipes than it took to write the script and run it.

Hopefully this is useful for those who are using AWS and Chef, and were wondering how to manage IAM users. Since this is "done" I may or may not get around to releasing a knife plugin.