Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.

Defining "Operations"

October 15th, 2012

I've recently been thinking about how organizations define "Ops", how it differs from "System Administration", whether or not "Infrastructure Architecture" is part of "Operations", whether or not "Ops" and "Operations" might be different, and of course how and when "Security" enters this whole mix. What made me reflect on this, aside from the obvious, was Christopher Brown's keynote from Velocity Europe 2012 about how the profession of Operations/System Architecture has moved from craft to trade and is on its way to becoming an actual "science":

(Insert standard blog phrase along the lines of: "Go watch it, I'll wait right here. Are you back? Good.")

Likewise interesting and thought-provoking was Anil Dash's recent blog entry, The Blue Collar Coder. Anil and Christopher both refer to different levels of sophistication, breadth and depth of knowledge, or expertise to help define how we regard a profession. Both suggest that we are at a point where the respective professions may change, and multiple "tiers" may evolve. All of this rings true to me for the profession of System Administration.

System Administration Education

The impression of being in a "blue collar" job (albeit in the decidedly white collar world of information technology) has long defined, if not haunted, many a System Administrator's self-image; many (happily begrudgingly) accept a defensive stance of being the undervalued superhero on whose shoulders the entire organization rests. The fact that there is no formal education and that seniority appears to be established by experience -- trial by fire as a (no, not service) rite of passage -- furthers the notion of System Administration as a "craft". Discussions surrounding just what, exactly, makes a System Administrator a "Senior System Administrator" come up regularly; see Tom Limoncelli's recent blog entry or Matt Simmons's wonderfully terse definition:

How do System Administrators learn? How do they advance in their profession? The Job Descriptions for System Administrators booklet published by USENIX/LISA (formerly SAGE) helps define what System Administrators on different levels of seniority should know, but does not describe how this knowledge is acquired. Given that we currently do not have any well-defined career path and only a few advanced degrees in the field, the only way appears to be learning the craft through an apprenticeship in order to become a journeyman and ultimately a "master of one's craft". As romantic as this notion seems, however, it dismisses what is, in my opinion, the obvious need for a formal education and a common body of knowledge. One reason we are still missing out here is that this is a very young profession (and perhaps, as Christopher Brown suggests, we are on the brink of developing into a science, which will lead us there); another, and not a negligible one, is that not everybody sees a need for a formal education. This is easy to understand, as many senior System Administrators have entered the field without any such requirements, and what's good for the goose is good for the gander and change is bad and all this modern humbug doesn't teach you anything in my time we had to carry the punch cards 10 miles through the snow, uphill, both ways, and we liked it.

Teaching System Administration is difficult precisely because the profession is not well defined, and so different people derive from their own experiences different priorities about how it should be taught and what novices should learn. Building a degree-granting program in the area of System Administration is, I imagine, not easy: core requirements need to be laid out, overlap with other programs identified, resolved and/or accepted, and a clear academic path defined to allow more complex matters to be taught relying on previously covered material.

"Operations" vs. System Administration

Note that all this time I have been talking about "System Administration", not about "Operations". Would a syllabus for a degree in "Operations" differ from one in "System Administration"? Are the two different? How senior can a System Administrator become and still be considered part of "Ops"? (We all want to pretend that we're architects, don't we?) In the industry -- what those of us in it refer to, unfortunately not jokingly, as the "real world" -- the distinction between different experience levels is hierarchical/organizational: somewhere in a company's corporate structure we find a subtree labeled "Operations". In some environments, depending on their size, this can entail anything and everything having to do with computer hardware (including "corporate IT" or "helpdesk" functionality); in others it is clearly divided by experience, impact and paygrade. Somewhat generalized, I draw the following distinctions:

Ops Hierarchy

That is, I roughly divide the field of "Operations" into "Architecture" and "Day to Day Operations", each of which can of course be further divided by specialty. Each layer requires more experience and expertise in an area, building on the knowledge and foundation of the lower layer(s). The larger an organization, the more likely it is to have distinct teams for specialized functions: in very large environments, you would be likely to find teams dedicated to the long-term planning of the content delivery system used on the edge of the network, to storage requirements throughout the company, and many others in addition to the ones I included in the graphic above.

On the flip side, the smaller the environment, the more tasks are performed by a single team. Generalizing, I've observed that medium-sized companies tend to have one group responsible for the "hands-on" tasks (System and Network Administration, including any hardware-related work, ye olde rackin'n'stackin', cable running, worrying about cooling and the like), with long-term planning being done at a "higher paygrade"; very small environments do not distinguish at all between the different layers, and frequently a very small team of people (sometimes an army of one) is responsible for all of this.

I've also found that the more "hands-on" a team is, the more likely it is to identify as "ops" (not "Operations"). I believe that this once again has to do with the perceived blue collar image: the more specialized or advanced a group becomes, the more likely its members are to refer to themselves as "Engineers" or eventually as "Architects" -- I have not yet come across a "Principal Net Ops" or "Senior Staff Sys Ops" title.

Of course, as has been pointed out by others before, it is curious that in all other disciplines outside of IT/Software, people can't just run around calling themselves "Engineer" or "Architect"; they need to follow a well-defined career path and undergo rigorous testing and licensing. It seems that we have tried to take the meaning and implications of the title -- seniority, skill, experience -- without the tedious prerequisites. Slick! As a result, the titles have become meaningless, as anybody and everybody can call themselves a "Systems Architect" or an "Infrastructure Engineer". In the end, I believe that the best term for the profession remains "System Administration", but that the organization inside a company should retain the title of "Operations", allowing it to accommodate the flexible culture that is a benefit of the lack of a formal education path and to hire skilled people at each level regardless of their previous job title.

What about "Web Operations" and "DevOps"?

Recently, I have observed a shift away from the term "Operations" towards "Web Operations". To me, this again seems a way to distinguish oneself from work perceived as not particularly intellectually challenging: "Ops" is regarded as grunt work, and somehow "Web Operations" (rarely: "Web Ops") sounds more important. The term has also become more closely associated with "DevOps", agile development and operational models, large scale deployments and all that jazz, but in my opinion it is a misnomer. Having "Web Operations" suggests that there are other kinds of "Operations" that might follow different principles, yet the practices (code or product deployment, application and system monitoring, capacity planning, incident response, post-mortem analysis, ...) and required background knowledge (operating system concepts, package management, version control, networking, kernel tuning, ...) would be no different if the product were not served via port 80.

Similarly, the term "DevOps" is at times used not (only) to signify a closer collaboration (or possibly the complete removal of any distinction) between developers and operations staff, but, I feel, to allow "Ops" to be more appreciated by management, to be more than "just ops", to break into the "white collar" developer world -- and to demand higher paychecks.

I'm still convinced that even in a world with a clearly defined "Operations" group as shown in the graphic above, we can achieve agile procedures that allow developers and operational staff to collaborate closely and successfully. The answer is not necessarily to muddle job descriptions, but to understand how best to utilize subject matter experts. Suppose a product were steered by a single Software Architect (as suggested long ago by Fred Brooks), who regularly consulted with representatives of Operations, as suggested by the Infrastructure Architect:

Product Development

Now, in this model the development process is of course iterative: the project team gets together regularly as progress is being made (and as requirements change). This model also primarily covers the initial design and development process from concept to prototype to release, not the ongoing maintenance, feature releases, bug fixes etc., all of which take up around 80% of the total cost of ownership of any given large scale project. Those tasks then fall into what is usually covered by the current DevOps mantra of developers deploying code and reacting to alerts on the one hand...

Software Development Cycle

...and on the other hand? I've found that encouraging developers to take over more of the "ops" tasks is common, but that the logical inverse -- "ops" developing the product and handling more than just minor bug fixes -- is not nearly as popular. The "Ops" side interpretation appears to mostly revolve around the religious use of automation, configuration management, logging and monitoring and not, as one might expect, around owning the software product and fixing bugs or developing features. But then again, at this point "DevOps" means lots of different things to lots of different people, and I'm sure some will suggest that perhaps I just didn't "get it" (they're probably right).

So wait... what about Security?

Oh sheesh, this is already much too long, but sure, here goes nothing: The model I gave above covers a number of aspects of "Operations", but does not mention "Security". In many organizations, the team in charge of security -- talk about a broad directive! -- sits outside of operations. Often, the security team functions as a blocker^Wapprover before a product is released into production. Well, more realistically, the product is released, the security team notified, and they come in and complain about this and that, wave their arms in the air and mumble something about shutting the whole thing down. This is a sad situation, not aided by the fact that external compliance requirements, typically the responsibility of the security team, are generally handled only in an ad-hoc, the-house-is-on-fire fashion via directives from, again, outside of Operations.

Of course this cannot work: as a result, developers and operational teams frequently view interactions with the Security team as entirely inconvenient and something to be avoided, lest those paranoid clowns make things complicated and less agile. My point of view is that in order for any system to succeed, it must have security considerations involved from the very beginning. That is, "security" cannot come from the outside, but must be part of the operations organization. However, I explicitly do not list security in my hierarchy above, because I believe that being aware of good security practices is but one of the many qualities of a good System Administrator. There is no separate layer; understanding security is important on each.

As Jacks and Jills of all Trades, System Administrators in small and medium organizations ought to cover security with as much emphasis and enthusiasm as they do performance or reliability. In larger environments, it is useful to include subject matter experts in the Operations organization, and so in my ideal environment, the project team put together for the development of a new product includes a security-conscious member just as it would include somebody with networking expertise etc. Frequently a single person can cover multiple areas, but each team should have a clear idea of whom to approach with questions relating to security.

Furthermore, I believe that the design and deployment of increasingly automated systems, which allow numerous crucial infrastructure components to exchange information and trigger actions, in particular requires a keen eye for infrastructure and systems security. Too often we take shortcuts because, not surprisingly, they are convenient. But just as it pays off to invest time and effort into developing a tool to accomplish a repeated task, so does it pay off to invest the initial energy to identify and overcome hindrances or annoyances resulting from security-related requirements. In this regard, we still have a long way to go to make "Sec" an equal peer to "Dev" and "Ops".

