The Misconception of DevOps

DevOps is still a widely misunderstood term, and it is often used inaccurately. Even at the conference, the term was misused a surprising number of times. More effort clearly needs to go into clarifying what the term represents so that more people understand what it is they're looking for.

DevOps is a culture. It's the process of both developers (including QA, release, etc.) and operations (including networking, storage, infrastructure, etc.) coming together and working collaboratively towards common business goals. It's the idea that the application cannot run independently of the infrastructure and that there is a fundamental need to incorporate the automation of testing, QA, deployment, monitoring & capacity planning into all facets of the workflow. Every project that flows through the company needs to involve every part of the team to produce a cohesive and reliable product.

DevOps is not a position. It's not a department or a one-man show; it describes the teamwork that flows throughout a company that's doing DevOps properly. It's all-encompassing, and it requires investment at all levels of the corporation.

What a DevOps Company Looks Like

One session, presented by Greg Burton and Ori Rowlings from Orbitz (Case Study: How Shifting to a DevOps Culture Enabled Performance and Capacity Improvements), discussed how their company addressed a number of inefficiencies in its process through DevOps methodologies. From the session synopsis: "Starting from our vantage point in a traditional development team, we originally saw site capacity as a distant, Operations issue. As we adjusted our perspective and took ownership of site capacity targets, we began a year of close collaboration and delightful discoveries that improved capacity dramatically. We sympathized with the frustration of database operations engineers who intuitively knew that applications were inefficiently hitting the database. We sympathized with Site Operations engineers who intuitively knew that our lack of rigorous performance testing meant that each release could introduce new inefficiencies that would go undetected." The Orbitz team learned that to produce better workflows, they needed to take a holistic approach to projects & include the entire team from top to bottom. They soon realized that each role's contributions directly affected every other role and that everyone needed to work in harmony to truly understand the project end-to-end.

Adam Jacob from Chef reinforced this philosophy in his session (How to be Great at Operations), which spoke directly to the operations team and its role in the mechanism of DevOps. Operations often works closely with QA/testing and deployment engineering to improve the delivery pipeline. The goal is to improve the three "MTTs": Mean Time To Failure (MTTF), Mean Time To Diagnose (MTTD) and Mean Time To Repair/Recover (MTTR). When coupled closely with a QA/testing team that uses comprehensive automated tests and a deployment team that automates quick & reliable delivery of code into staging and production, operations teams can help improve the reliability & agility of the entire company workflow. Coupling tests and deployments with dynamic, reliable monitoring tools helps improve MTTF through a clearer understanding of MTTD. When problems are diagnosed quickly they can be acted upon, and with efficient testing and delivery mechanisms a short MTTR can be achieved.
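
To make the three "MTTs" concrete, here is a minimal Python sketch of how they might be computed from a list of incident records. This is not from the session itself: the Incident record, its field names and the mean_times function are hypothetical, and the definitions are assumptions (MTTD and MTTR are both measured from the start of the failure, and MTTF is the average uptime between one recovery and the next failure). Teams define these metrics in slightly different ways, so treat this as an illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative."""
    failed_at: datetime      # when the failure began
    diagnosed_at: datetime   # when the cause was identified
    recovered_at: datetime   # when service was restored


def _mean_delta(deltas):
    """Average a non-empty list of timedeltas, returning a timedelta."""
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def mean_times(incidents):
    """Return (MTTF, MTTD, MTTR) as timedeltas.

    Assumed definitions:
      MTTD = mean of (diagnosed_at - failed_at)
      MTTR = mean of (recovered_at - failed_at)
      MTTF = mean uptime between one recovery and the next failure;
             None when there are fewer than two incidents.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    ordered = sorted(incidents, key=lambda i: i.failed_at)
    mttd = _mean_delta([i.diagnosed_at - i.failed_at for i in ordered])
    mttr = _mean_delta([i.recovered_at - i.failed_at for i in ordered])
    uptimes = [nxt.failed_at - prev.recovered_at
               for prev, nxt in zip(ordered, ordered[1:])]
    mttf = _mean_delta(uptimes) if uptimes else None
    return mttf, mttd, mttr


# Made-up example data: two incidents, two days apart.
incidents = [
    Incident(datetime(2024, 1, 1, 0, 0),
             datetime(2024, 1, 1, 0, 20),
             datetime(2024, 1, 1, 1, 0)),
    Incident(datetime(2024, 1, 3, 1, 0),
             datetime(2024, 1, 3, 1, 10),
             datetime(2024, 1, 3, 1, 30)),
]
mttf, mttd, mttr = mean_times(incidents)
```

With this made-up data the system ran for 48 hours between the first recovery and the second failure (MTTF), diagnosis took 15 minutes on average (MTTD) and recovery took 45 minutes on average (MTTR). Under these definitions, improving means a longer MTTF and a shorter MTTD and MTTR.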

DevOps teams are encouraged to collaborate on postmortems so that entire workflows can be improved. Postmortems are intended to be held quickly (within 24 hours of an event - both positive and negative). They are meant to be succinct and without blame - focused solely on what happened, why and what can be acted upon to improve the system in the future. Incident commanders are also added to the team and can be members of existing teams. Their job is to coordinate resolution efforts by including appropriate experts, sending away those who aren't immediately needed so they can focus on other things (including their personal lives or other work) and communicating status reports to stakeholders (customers, management) so the people resolving the issues can remain focused.

The entire team needs to be in tune with the purpose of the business, its goals and who the customer is that the company is serving. This isn't just marketing's or management's responsibility, but the responsibility of everyone on the team. It's also important to realize that customers aren't just the people who pay for our product. Operations' customers include the paying customer, who requires a reliable, responsive application infrastructure. They also include the development team, as operations provides them with a stable and reliable infrastructure within which to work. QA, testing and deployment also require a solid infrastructure and thus are also customers of the operations team, and so on.

What Makes A Good Culture?

In one word, "trust." When each team member is trusted, they are happy. We've all been hired because we are good at what we do. We're all in various positions because someone believed that is where we could do the most good for the company. We all have a part to play (as described above) in making the company effective and profitable. And we all must trust each other to do well the job we've been hired for. Engaged employees came up in both Adam Jacob's session (linked above) and the session by Jennifer Davis, also from Chef (From Hero to Zero), in which she discussed the detriment of hero syndrome. Heroes fail to trust others and take undue responsibility upon themselves, leading to eventual burnout. They build up expectations that a busy company is only too eager to capitalize on, and they soon become overwhelmed. Their personal lives and personal health degrade to the point that they are no longer able to be effective (or worse). It is important for everyone in the company to trust each other's abilities and build up those who need help so they too can thrive. It is important for each of us to recognize problems of heroism and step in to share the load. Proper project management and planning should be done so that realistic expectations can be set and achieved.

Some additional ways to improve the team are consistent workflows, automated testing frameworks that build confidence, comprehensive (but concise) documentation, proper planning and the reduction of noise in monitoring (more on this in another blog post). Healthy, upbeat work environments where innovation and ideas are encouraged create an atmosphere in which satisfaction and contentment can thrive.

Overall, companies around the world are struggling to overcome the traditional silos that made up the corporation and to work towards a more inclusive and collaborative culture. Every company, from the very smallest to the very largest, deals with this in its own way. Silos are for dinosaurs. Figuring out what DevOps really means is just a step towards putting a name on the kind of workplace culture that I think we're all striving for.