How advanced is your organization’s information system management? Here’s a simple framework for thinking about your organization’s infrastructure management capabilities:
Level 0: We can’t get it to work.
Level 1: We got it working!
Level 2: We got it working and documented the process so that we can set up similar systems on a repeatable basis.
Level 3: We got it working, documented it thoroughly, and implemented the configuration in an automated system to create similar systems. The configuration is version-controlled, and we can track changes.
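To make Level 3 more concrete, here is a minimal Python sketch. It assumes the configuration is expressed as data (the kind of file you would keep under version control) and applied by a program instead of by hand. The names DESIRED_STATE and apply_config are hypothetical and do not come from any particular configuration-management tool. The property that matters is that applying the same configuration a second time produces no further changes, which is what makes the process repeatable.

```python
"""Hypothetical sketch of Level 3: desired state as data, applied idempotently."""

# Desired configuration. In practice this would live in a version-controlled
# file, so every change to it is tracked. Contents are illustrative only.
DESIRED_STATE = {
    "packages": {"nginx", "postgresql"},
    "services_enabled": {"nginx"},
}


def apply_config(current: dict, desired: dict) -> tuple:
    """Return (new_state, actions) needed to move `current` to `desired`.

    Items missing from a category are added and extra items are removed,
    so re-applying the same desired state yields an empty action list.
    """
    actions = []
    new_state = {key: set(values) for key, values in current.items()}
    for key, wanted in desired.items():
        have = new_state.setdefault(key, set())
        for item in sorted(wanted - have):
            actions.append(f"add {key}: {item}")
        for item in sorted(have - wanted):
            actions.append(f"remove {key}: {item}")
        new_state[key] = set(wanted)
    return new_state, actions


if __name__ == "__main__":
    fresh_system = {"packages": set(), "services_enabled": set()}
    state, actions = apply_config(fresh_system, DESIRED_STATE)
    print(actions)  # three "add" actions on the first run
    state, actions = apply_config(state, DESIRED_STATE)
    print(actions)  # [] on the second run: nothing left to change
```

A real system would execute each action against a machine and record the result; this sketch only computes the plan, which is enough to show why a declarative, version-controlled configuration can recreate similar systems reliably.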
Of course, these four levels are a little oversimplified. For example, there’s a state between Level 1 and Level 2 in which you got one system working and documented it, but haven’t tested the process by configuring a fresh instance from scratch. When you’re doing something for the first time, you tend to try a lot of things, and sometimes you can’t be sure which one actually solved the problem. The first draft of the documentation may not reflect the minimum necessary steps to get the system working. Ideally, you’d have a different person follow the procedure on a fresh system, to make sure the documentation contains the “necessary and sufficient” information.
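One way to check a fresh build against the original is to capture the relevant settings from both systems and compare them. The sketch below assumes those settings have already been collected into simple key/value dictionaries; how you collect them (package lists, config files, kernel parameters) is left open, and the function name and sample values are hypothetical. Any difference points either to a step the documentation omitted or to a leftover experiment on the original system that may not be necessary, and a person has to decide which.

```python
from typing import Dict, Optional, Tuple


def undocumented_differences(
    reference: Dict[str, str], fresh: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Compare settings from the hand-built reference system with settings
    from a fresh system built only by following the documentation.

    Returns {setting: (reference_value, fresh_value)} for every mismatch;
    None means the setting is absent on that system.
    """
    diffs = {}
    for key in sorted(reference.keys() | fresh.keys()):
        ref_value = reference.get(key)
        fresh_value = fresh.get(key)
        if ref_value != fresh_value:
            diffs[key] = (ref_value, fresh_value)
    return diffs


if __name__ == "__main__":
    # Hypothetical snapshots: the first system had a file-descriptor limit
    # raised by hand, and the written procedure never mentions it.
    reference = {"nginx": "1.24", "ulimit_nofile": "65536", "tz": "UTC"}
    fresh = {"nginx": "1.24", "tz": "UTC"}
    print(undocumented_differences(reference, fresh))
```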
Which level is right for your organization? It depends on what meets the organization’s needs at the time. A stable business that depends on a complex information system needs to operate its core systems at Level 3. However, that same company’s internal R&D group may operate at Level 0, 1, or 2. An early-stage startup that’s building an MVP (minimum viable product) and searching for product/market fit should probably operate at Level 1 or 2. When you have limited capital and are struggling to find product/market fit, getting to Level 3 too quickly might waste precious resources that could go into product development. However, the startup’s management needs to remember that the MVP will incur technical debt that will have to be paid off as the product matures.
Craig Finch & Mike Soule, Rootwork’s experienced infrastructure management consultants, would be happy to help you evaluate your organization’s capabilities and develop a plan to make sure that your information systems meet your needs.
The Open Compute Project (OCP) is valuable to enterprise IT professionals because it embodies the best practices of companies that operate hyperscale computing systems. It’s not often that a business is willing to share information about the practices that are key to its competitiveness. Economical, efficient, and scalable infrastructure is crucial to the success of companies such as Google, Amazon, and Facebook. By studying the Open Compute Project, you can learn about the best practices of computing at hyperscale, and determine which practices can be applied to improve your IT operations.
For years, hyperscale operators have been working directly with original design manufacturers (ODMs) to design and produce hardware that meets their unique needs. In 2012, Google claimed to be one of the largest hardware makers in the world, and had probably been in the server hardware business for years. In 2011, Facebook started the Open Compute Project in an effort to standardize the design of servers and infrastructure for a hyperscale environment. The OCP releases open-source hardware specifications that can be implemented by any ODM. Key design goals include minimizing initial cost and power consumption, and maximizing interoperability and standardization. The hardware is designed to be “vanity-free,” meaning that it does not incorporate any features that are specific to a particular manufacturer. These design goals have led to some interesting departures from industry conventions.
OCP servers are primarily intended to fit into the Open Rack (although 19” servers with OCP-compliant motherboards are now available). This rack has the same floor footprint as a standard 19” rack, but it is very different internally. The rack height is measured in “OpenU.” One OpenU is 48mm, while 1U in a 19” rack is 44.5mm. Three high-current, 12V DC power buses run down the back of the Open Rack. The rack is divided into three “power zones,” each with ten OpenU for system shelves. Each power zone has a 3-OpenU “power shelf” that supplies 4200W of power to the DC power buses in that zone. Two OpenU at the top of each rack are reserved for a network switch.
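The rack layout described above adds up as follows. This short Python sketch only does arithmetic on the figures given in the text; it assumes the three power zones plus the switch space make up the whole usable height, and it ignores any other reserved space or differences between Open Rack versions, so treat the results as illustrative rather than as values from the specification.

```python
# Figures taken from the description of the Open Rack above.
OPENU_MM = 48.0            # height of one OpenU
RACK_U_MM = 44.5           # height of one standard 19" rack unit
POWER_ZONES = 3
SHELF_OPENU_PER_ZONE = 10  # OpenU for system shelves in each zone
POWER_SHELF_OPENU = 3      # each zone's power shelf
SWITCH_OPENU = 2           # reserved at the top for a network switch
POWER_SHELF_WATTS = 4200   # supplied by each power shelf


def open_rack_budget() -> dict:
    """Total up height and power for the layout described in the text."""
    zone_openu = SHELF_OPENU_PER_ZONE + POWER_SHELF_OPENU    # 13 OpenU per zone
    total_openu = POWER_ZONES * zone_openu + SWITCH_OPENU    # 3 * 13 + 2 = 41
    return {
        "total_openu": total_openu,
        "system_shelf_openu": POWER_ZONES * SHELF_OPENU_PER_ZONE,  # 30
        "described_height_mm": total_openu * OPENU_MM,             # 41 * 48 = 1968
        "extra_mm_per_unit": OPENU_MM - RACK_U_MM,                 # 3.5 mm taller
        "rack_power_watts": POWER_ZONES * POWER_SHELF_WATTS,       # 12600 W
    }


if __name__ == "__main__":
    for name, value in open_rack_budget().items():
        print(f"{name}: {value}")
```

The extra 3.5mm per unit is small, but it gives each shelf a little more vertical room for airflow and components than a conventional 1U server gets.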
A typical OCP server is housed in a deep, narrow “system tray” that contains a motherboard with two CPUs, one hard drive, and fans. The rear of the tray has a power plug that fits into one of the DC power buses in the Open Rack. Three trays can fit side-by-side on a shelf that occupies one OpenU. Alternatively, OCP-compliant servers are available with four nodes in a 2-OpenU unit. OCP servers are capable of operating in an environment with a higher ambient temperature and higher humidity than a typical data center. This capability reduces the cooling requirements for the data center, increasing its energy efficiency and reducing operating costs. Facebook has also released open-source design specifications for its data centers through the Open Compute Project.
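Combining the two server formats above with the ten OpenU of shelf space and 4200W per power zone gives a rough density comparison. This Python sketch is a back-of-the-envelope calculation under stated assumptions: every OpenU of shelf space is filled with the same format, and the full 4200W of a zone is divided evenly among its nodes, ignoring the network switch, fans, and conversion losses. The ServerFormat class and rack_density function are hypothetical helpers written for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerFormat:
    """A server packaging option: how many nodes fit in how many OpenU."""
    name: str
    nodes: int
    height_openu: int


def rack_density(fmt: ServerFormat, shelf_openu_per_zone: int = 10,
                 zones: int = 3, zone_watts: int = 4200) -> dict:
    """Estimate nodes per rack and an upper bound on watts per node."""
    units_per_zone = shelf_openu_per_zone // fmt.height_openu
    nodes_per_zone = units_per_zone * fmt.nodes
    return {
        "nodes_per_rack": nodes_per_zone * zones,
        # Upper bound only: assumes the whole zone budget goes to nodes.
        "watts_per_node": zone_watts / nodes_per_zone,
    }


TRAY_SHELF = ServerFormat("three trays per 1-OpenU shelf", nodes=3, height_openu=1)
FOUR_NODE = ServerFormat("four nodes per 2-OpenU unit", nodes=4, height_openu=2)

if __name__ == "__main__":
    for fmt in (TRAY_SHELF, FOUR_NODE):
        print(fmt.name, rack_density(fmt))
```

Under these assumptions the three-tray shelf packs 90 nodes into a rack with roughly 140W available per node, while the four-node, 2-OpenU format fits 60 nodes with roughly 210W each, so the choice trades density against per-node power headroom.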
The Open Compute Project provides a rare inside look at a set of best practices for hyperscale computing. I encourage you to follow the links to the various OCP specifications, which are concise and easy to read. Of course, most of us don’t have the opportunity to build a hyperscale computing infrastructure from scratch. In future articles, I’ll go into more detail about specific aspects of the project and explain how particular OCP best practices can be applied in a typical enterprise setting.