Rootwork

Posted on May 21, 2019 by craig

Feedback Loops in Amazon Simple Email Service (SES) (Part 2)

In Part 1, I explained the general feedback mechanisms that are available in Amazon Simple Email Service (SES), and compared their strengths and weaknesses. Now I will explain the different configuration methods that can be used to select a feedback mechanism.

Domain-Level SNS Configuration

You can configure Simple Notification Service (SNS) feedback for an entire domain. In the SES console, select one of your verified domains and expand the Notifications section. There you can select an SNS topic for bounces, complaints, and deliveries. This is the most straightforward way to enable SNS feedback if you want to treat all email for a domain in exactly the same way.

Address-Specific SNS Configuration

In some cases, you should not treat all email in the domain in the same way. For example, you might have different software environments (dev, staging, production) that use the same domain, but send from different email addresses. Email bounces from Dev should go to developers, bounces from Staging to QA, and bounces from Production to the customer service team. You also might use the same domain for email marketing (not a great idea), and that feedback should go to the marketing team. You can configure each verified email address in SES with a different set of SNS topics.
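The domain-level and address-specific settings can also be applied from code, which keeps the per-environment routing in version control. This is a minimal sketch, assuming boto3’s SES (v1) client, whose `set_identity_notification_topic` call accepts either a verified domain or a verified email address as the `Identity`. The addresses, account ID, and topic names in `FEEDBACK_ROUTES` are hypothetical placeholders, and the client is passed in so that a stub can replace it for testing.

```python
"""Route SES feedback to different SNS topics per identity (sketch).

The identities and topic ARNs below are hypothetical examples.
"""
from typing import Dict, Protocol


class SesClient(Protocol):
    # Same shape as boto3.client("ses").set_identity_notification_topic
    def set_identity_notification_topic(self, **kwargs) -> dict: ...


# Hypothetical routing table: identity -> {notification type -> SNS topic ARN}.
# An identity can be a whole domain ("example.com") or a single verified address.
FEEDBACK_ROUTES: Dict[str, Dict[str, str]] = {
    "dev@example.com": {
        "Bounce": "arn:aws:sns:us-east-1:123456789012:dev-feedback",
        "Complaint": "arn:aws:sns:us-east-1:123456789012:dev-feedback",
    },
    "staging@example.com": {
        "Bounce": "arn:aws:sns:us-east-1:123456789012:qa-feedback",
        "Complaint": "arn:aws:sns:us-east-1:123456789012:qa-feedback",
    },
    "noreply@example.com": {
        "Bounce": "arn:aws:sns:us-east-1:123456789012:support-feedback",
        "Complaint": "arn:aws:sns:us-east-1:123456789012:support-feedback",
    },
}


def apply_feedback_routes(ses: SesClient, routes: Dict[str, Dict[str, str]]) -> int:
    """Attach each SNS topic to its identity; returns the number of API calls made."""
    calls = 0
    for identity, topics in routes.items():
        for notification_type, topic_arn in topics.items():
            ses.set_identity_notification_topic(
                Identity=identity,
                NotificationType=notification_type,  # "Bounce", "Complaint", or "Delivery"
                SnsTopic=topic_arn,
            )
            calls += 1
    return calls
```

With real credentials, the call would be `apply_feedback_routes(boto3.client("ses"), FEEDBACK_ROUTES)`. Adding a `"Delivery"` entry to a route would also forward delivery notifications for that identity.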

Configuration Sets

SES Configuration Sets are my least favorite option. However, configuration sets are the only way to send email statistics to CloudWatch or to S3 via Kinesis Firehose, so sometimes you are stuck with using them. The application of a Configuration Set to an email message is determined by setting a special header when submitting the email to SES:

X-SES-CONFIGURATION-SET: ConfigSet
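If the application composes its own messages, the header can be set there instead of in the SMTP server configuration. A minimal sketch using Python’s standard `email` package; the addresses are placeholders, and `ConfigSet` stands in for the name of your configuration set.

```python
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str,
                  config_set: str = "ConfigSet") -> EmailMessage:
    """Compose a message that tells SES which configuration set to apply."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # SES reads this header when the message is submitted and applies the named set.
    msg["X-SES-CONFIGURATION-SET"] = config_set
    msg.set_content(body)
    return msg
```

Sending it through the SES SMTP interface would look roughly like `smtplib.SMTP("email-smtp.us-east-1.amazonaws.com", 587)`, followed by `starttls()`, `login()` with your SES SMTP credentials, and `send_message(msg)`. The region in the hostname is an assumption.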

There are several problems with Configuration Sets:

Not transparent. In the SES interface, you can see the configuration sets, but you can’t see to which emails each set is being applied.
Too easy for a server admin to accidentally break the SMTP server configuration and disable monitoring.
A server admin has to get involved if the configuration set needs to be changed.
Configuring SMTP servers to set custom headers may be tricky.

I recommend configuring each domain or specific email addresses with SNS topics. Only use configuration sets for getting email statistics into a CloudWatch dashboard.

Posted on May 6, 2019 by craig

How to “Level Up” your Information System Management

How advanced is your organization’s information system management? Here’s a simple framework for thinking about your organization’s infrastructure management capabilities:

Level 0: We can’t get it to work.

Level 1: We got it working!

Level 2: We got it working and documented the process so that we can set up similar systems on a repeatable basis.

Level 3: We got it working, documented it thoroughly, and implemented the configuration in an automated system to create similar systems. The configuration is version-controlled, and we can track changes.

Of course, these four levels are a little over-simplified. For example, there’s a state between Level 1 and Level 2 in which you got one system working and documented it, but haven’t tested the process by configuring a fresh instance from scratch. When you’re doing something for the first time, you tend to try a lot of things, and sometimes you can’t be sure which one actually solved the problem. The first draft of the documentation may not reflect the minimum necessary steps to get the system working. Ideally, you’d have a different person follow the procedure on a fresh system, to make sure the documentation contains the “necessary and sufficient” information.

Which level is right for your organization? It depends on what meets the organization’s needs at the time. A stable business that depends upon a complex information system needs to operate its core systems at Level 3. However, that same company’s internal R&D may operate at Level 0, 1, or 2. An early-stage start-up that’s building an MVP (minimum viable product) and searching for product/market fit should probably operate at Level 1 or 2. When you have limited capital and are struggling to find product/market fit, getting to Level 3 too quickly might waste precious resources that could go into product development. However, the startup’s management needs to remember that the MVP will incur technical debt that will have to be paid off as the product matures.

Craig Finch & Mike Soule, Rootwork’s experienced infrastructure management consultants, would be happy to help you evaluate your organization’s capabilities and develop a plan to make sure that your information systems meet your needs.

Posted on April 26, 2019 by craig

Amazon Simple Email Service Feedback Loops

Amazon Simple Email Service (SES) is a lean, low-cost option for sending bulk email. However, the low price has a downside: SES lacks many features that are built into SendGrid and other email services. You have to build your own “features” by combining other AWS services. The figure below summarizes some useful integrations that provide monitoring and feedback loops.

Diagram showing how Amazon SES integrates with other services to monitor email sending, bounces, and abuse complaints.

CloudWatch

The dashboard you see in the SES console is very primitive, with only a short time series of data. CloudWatch allows you to create your own dashboard to monitor critical parameters, such as bounce and complaint rates. Watching these rates matters because SES will put you on probation or terminate your service if you are suspected of sending spam. CloudWatch also lets you set alarms that are triggered when a metric crosses a threshold.
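A CloudWatch alarm on the reputation metrics that SES publishes is one way to get that early warning. This is a sketch, assuming boto3’s CloudWatch client and the `AWS/SES` metrics `Reputation.BounceRate` and `Reputation.ComplaintRate`, reported as fractions. The threshold values, alarm names, and SNS topic are hypothetical; check the current SES documentation for the rates that actually trigger a review.

```python
"""Create CloudWatch alarms on SES reputation metrics (sketch)."""
from typing import Dict, List, Protocol


class CloudWatchClient(Protocol):
    # Same shape as boto3.client("cloudwatch").put_metric_alarm
    def put_metric_alarm(self, **kwargs) -> dict: ...


# Hypothetical thresholds, expressed as fractions (0.04 == 4%).
# Choose values below the rates at which SES reviews or pauses an account.
THRESHOLDS: Dict[str, float] = {
    "Reputation.BounceRate": 0.04,
    "Reputation.ComplaintRate": 0.0008,
}


def create_reputation_alarms(cw: CloudWatchClient, alarm_topic_arn: str,
                             thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Create one alarm per metric; returns the alarm names."""
    names: List[str] = []
    for metric, threshold in thresholds.items():
        name = f"ses-{metric.split('.')[-1]}-high"  # e.g. "ses-BounceRate-high"
        cw.put_metric_alarm(
            AlarmName=name,
            Namespace="AWS/SES",
            MetricName=metric,
            Statistic="Average",
            Period=3600,  # one-hour evaluation window, in seconds
            EvaluationPeriods=1,
            Threshold=threshold,
            ComparisonOperator="GreaterThanOrEqualToThreshold",
            AlarmActions=[alarm_topic_arn],  # e.g. an SNS topic that notifies the team
        )
        names.append(name)
    return names
```

With real credentials, you would pass `boto3.client("cloudwatch")` and the ARN of an alerting topic. The same metrics can also be added as widgets on a custom CloudWatch dashboard.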

Kinesis Firehose to S3

Kinesis Firehose is a “data bus.” It accepts data from various sources, transforms it (if necessary), and passes it on to other endpoints. In the context of SES, you can set up a Kinesis Firehose delivery stream to push email event notifications into Amazon S3 storage. Firehose automatically creates a date-and-hour folder hierarchy within S3 and stores the events as JSON files within it. This maze of folders makes the data rather hard to navigate by hand.
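When you do need to dig through that hierarchy, it helps to compute the prefix for the hour you care about and to parse each object defensively. A minimal sketch, assuming Firehose’s default UTC `YYYY/MM/DD/HH/` key prefix and that one S3 object may hold several JSON event records, either newline-delimited or concatenated back to back. Downloading the object (for example, with boto3) is left out.

```python
import json
from datetime import datetime, timezone
from typing import Iterator


def hourly_prefix(when: datetime, base: str = "") -> str:
    """Return the default Firehose key prefix for the hour containing `when` (UTC).

    Pass a timezone-aware datetime; naive values are treated as local time.
    """
    utc = when.astimezone(timezone.utc)
    return f"{base}{utc:%Y/%m/%d/%H}/"


def iter_events(text: str) -> Iterator[dict]:
    """Yield each JSON object in a Firehose-delivered S3 object.

    Works whether records are separated by newlines or simply concatenated.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace (including newlines) between records.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

For example, `hourly_prefix(datetime(2019, 4, 26, 13, tzinfo=timezone.utc))` gives `2019/04/26/13/`, which can be used as the `Prefix` when listing the bucket.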

Logging email events to S3 is not very useful, in my experience, but there might be use cases that I haven’t thought of. Maybe you could set up an AWS Lambda function to watch for abuse notifications that appear in a bucket and take some action, but if you are doing that, you might as well use SNS to send notifications directly to Lambda.

Simple Notification Service

Simple Notification Service, or SNS, is probably the most useful destination for SES data. An SES configuration set can send data directly to an SNS topic. Each topic can have one or more subscriptions, and each subscription delivers the data to an endpoint of one of the following types:

HTTP
HTTPS
Email
Email (JSON)
Amazon SQS
AWS Lambda
Platform application endpoint
SMS

Personally, I like the HTTPS endpoint the best. You build an API endpoint that receives email feedback, and SNS will POST JSON to your endpoint every time an email bounces or a complaint is registered. Then your application can take immediate action to flag the address and stop sending to it, which is good for your sending reputation.
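Here is a minimal sketch of the core of such an endpoint, using only the standard library. It assumes the standard SNS envelope, which is a JSON object whose `Message` field holds the SES notification as a JSON string. It suppresses hard (`Permanent`) bounces and all complaints. That policy and the function name are illustrative choices, not the only reasonable ones. The sketch does not verify the SNS message signature, which a production endpoint should do.

```python
"""Extract addresses to suppress from an SNS POST carrying SES feedback (sketch)."""
import json
from typing import List


def addresses_to_suppress(sns_body: str) -> List[str]:
    """Return recipient addresses that should no longer receive mail.

    Assumed policy: suppress Permanent bounces and every complaint; ignore
    transient bounces. SNS signature verification is NOT performed here.
    """
    envelope = json.loads(sns_body)
    if envelope.get("Type") != "Notification":
        # e.g. "SubscriptionConfirmation": fetch envelope["SubscribeURL"] once to confirm.
        return []
    # The SES payload is itself a JSON string inside the SNS envelope.
    message = json.loads(envelope["Message"])
    # Identity notifications use "notificationType"; configuration-set events use "eventType".
    kind = message.get("notificationType") or message.get("eventType")
    if kind == "Bounce" and message["bounce"].get("bounceType") == "Permanent":
        recipients = message["bounce"].get("bouncedRecipients", [])
    elif kind == "Complaint":
        recipients = message["complaint"].get("complainedRecipients", [])
    else:
        return []
    return [r["emailAddress"] for r in recipients]
```

SNS posts with a `text/plain` content type, so the web handler should pass the raw request body to `addresses_to_suppress`. It should then add the returned addresses to whatever suppression list the application keeps.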

Posted on November 18, 2018 by craig

Wearing the CISO Hat

As someone who works in the CIO/CTO role in smaller organizations, I often have to act as the CISO and sometimes even as the compliance officer.

Disadvantages

Because I am not a security specialist, taking on the role of CISO has a number of disadvantages for me. The most obvious problem is that I’m working outside my area of expertise, and potentially facing adversaries who are very skilled. I could be found liable if it can be shown that I was negligent in some way. Sometimes I have to spend a lot of time researching an area I’m not familiar with, which lowers my productivity. All of these factors can make my job more stressful. However, for startups and small businesses, sometimes even a virtual CISO is too expensive, and I have to find ways to make the best of the situation.

Another struggle, which most security leaders can probably relate to, is the difficulty of convincing the business to allocate scarce resources to security. If you invest developer time in a new product, it’s easy to see the investment pay off in revenue from the product. It’s not easy to see how investments in security pay off because you can’t account for the cost of a breach that never happened. When security is a separate team, the debate about allocating resources can take place once a year in the C-suite. When security is one of many roles on the technology team, the resource allocation debate happens in every weekly planning meeting, as I try to convince business leaders to allocate time to security or compliance instead of revenue-producing projects. Business leaders can get frustrated when they see the cost of security and compliance on a daily basis, but never see quantitative benefits.

Since I sometimes have to take responsibility for security, I have to find ways to make it work. I have learned to quickly recognize what I don’t know, and to be comfortable stating what I don’t know to my peers and clients. The business decision makers need to understand when I’m doing my best under the circumstances, and when better results could be obtained by allocating resources to bring in other expertise. I also need to acknowledge when I’m totally out of my depth and make sure that I get the resources I need. Despite the disadvantages of having one person be the CIO/CTO and CISO, the situation does create a unique opportunity.

Opportunities

As CIO/CTO and CISO, I have the opportunity to create a security culture throughout the technology team and lay a strong foundation for a future CISO. Most importantly, I can set an expectation that security and productivity are not mutually exclusive. A security team that operates in a “silo,” out of touch with the rest of the organization, can be more dangerous than useful. When the security team’s only goal is security, they can just say no to every request, and implement controls that impact the ability of everyone else to do their jobs effectively. When faced with the choice of following security rules or getting their job done, most employees find ways to work around security controls.

I saw one glaring example at a branch office of a large global corporation. For “security” reasons that nobody could explain (because all the IT and security decision makers were far away), all LAN traffic was actually routed over the WAN to the central IT office. As a result, file transfers on the LAN were incredibly slow, so the office was infested with unencrypted USB drives (which were forbidden by policy). The “secure” network design actually made the company less secure. Whenever I’m training employees on security and compliance, I emphasize that every process in the organization can be carried out in a way that is secure, compliant, and efficient. If they ever find themselves unable to carry out their duties effectively because of security or compliance controls, they should come talk to me, and we’ll find a better process.

Conclusions

It’s not ideal to have the same person fill the roles of CIO, CTO, and CISO, but sometimes business conditions make it unavoidable. It’s better to have someone be responsible for security than to have no one be responsible for security. It’s better to conduct an imperfect risk assessment than none at all, and it’s better to have weak plans or incomplete policies than to have none. In the worst-case scenario, honest efforts will play out far better than willful ignorance in front of a federal regulator or a jury. If you are the CIO or CTO and you aren’t able to hire a CISO or virtual CISO, don’t be afraid to start taking responsibility for security. Somebody has to do it!

Posted on March 20, 2016 by craig

Encrypt laptops to avoid HIPAA violations

Have you encrypted the hard drive or solid-state drive (SSD) on every laptop in your organization? If your business or nonprofit organization handles Protected Health Information (PHI), you should encrypt every laptop as soon as possible! Laptops that store PHI may be one of the greatest vulnerabilities in any organization. Just one stolen laptop can cost millions of dollars in legal fees and fines. Last week, two major settlements were announced that could have been prevented with proper laptop encryption. North Memorial Health Care of Minnesota agreed to pay $1.55 million, and Feinstein Institute for Medical Research agreed to pay $3.9 million to settle charges that they potentially violated the HIPAA Privacy and Security Rules. I will summarize both cases and explain the simple steps that you can take to protect your organization.

Posted on July 17, 2015 (updated September 13, 2018) by craig

Write-only bucket policy example for Amazon S3

Amazon S3 is widely used as a repository for backups. Backups are an important aspect of a resilient system, but they are also a potential security vulnerability. Unauthorized access to a database backup is still a PCI or HIPAA violation. The permissions on Amazon S3 should be configured to minimize access to critical backups. My strategy is to use IAM to create a backup user with no password (so it cannot sign in to the AWS console) and a single access key. The backup user’s access key is used to put backup files into S3. The S3 bucket policy is configured to allow “write-only” access for the backup user. The backups cannot be obtained, even if the backup user’s credentials are compromised.

It is fairly difficult to figure out how to create a “write only” bucket policy. The policy shown below is the “write only” policy that I use. It consists of two statements. BucketPermissions gives the user the ability to locate the bucket (necessary to do anything in the bucket) and list its contents (to verify that a backup was written). You may remove the s3:ListBucket action if true write-only access is desired. The statement called ObjectPermissions allows the user to create objects in the specified bucket and to set their ACLs.

{
  "Version": "2012-10-17",
  "Id": "YOUR_POLICY_ID_NUMBER",
  "Statement": [
    {
      "Sid": "BucketPermissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:user/USERNAME"
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::BUCKET_NAME"
    },
    {
      "Sid": "ObjectPermissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:user/USERNAME"
      },
      "Action": [
        "s3:PutObjectAcl",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::BUCKET_NAME/*"
    }
  ]
}
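It is easy to slip a read permission into a policy like this by accident, so a quick mechanical check before attaching it can help. Here is a minimal sketch using only the standard library. The forbidden action prefixes are an assumption that you may want to extend. The check also does not interpret partial wildcards such as `s3:Get*`, `NotAction` elements, or differences in letter case.

```python
import json
from typing import List

# Hypothetical list of action prefixes that would break "write-only" semantics.
FORBIDDEN_PREFIXES = ("s3:GetObject", "s3:DeleteObject", "s3:RestoreObject")


def read_or_delete_grants(policy_json: str) -> List[str]:
    """Return any Allow actions in a bucket policy that permit reading or deleting objects."""
    policy = json.loads(policy_json)
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement may be given without a list
        statements = [statements]
    problems = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            # Full wildcards grant everything, including reads and deletes.
            if action in ("*", "s3:*") or action.startswith(FORBIDDEN_PREFIXES):
                problems.append(action)
    return problems
```

Run against the policy above (with the placeholders filled in), the function should return an empty list. Any action it reports means the backup user could read or delete backups.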

S3 does one odd thing: this policy allows the user to verify that a particular object exists in S3, even if they don’t have permission to GET the object. For example, running the command:

s3cmd get s3://BUCKET_NAME/PATH/BACKUP_FILE_NAME.tgz

Produces the output:

s3://BUCKET_NAME/PATH/BACKUP_FILE_NAME.tgz -> ./BACKUP_FILE_NAME.tgz
ERROR: S3 error: Unknown error

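It appears that the file is being downloaded, but in fact only an empty file is created! The existence check works because the policy grants s3:ListBucket. With that permission, S3 answers a request for an object the user cannot read with 403 (Access Denied), but answers a request for a missing object with 404 (Not Found). The error itself therefore reveals whether the object exists. While it is generally a bad idea to allow unauthorized users to guess file names, it is not a real problem in this case, because the backup user’s credentials would have to be compromised even to confirm the existence of a file stored in S3.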


Posted on January 20, 2015 by craig

The Illustrated Guide to SSH Port Forwarding

SSH is a powerful tool for accessing remote systems. This guide will illustrate one of the more confusing and poorly documented capabilities of the ssh command on Linux: port forwarding. Port forwarding is a way to “tunnel” any TCP protocol through a secure, encrypted SSH connection. It can also be used to make network connections transparent to the applications that are using them. The diagram below shows a user with an application running on a local machine (Client), such as a laptop. The app needs to interact with a server hosted on a remote host (Protected), which is isolated behind a login node (Login). This situation may occur when a user wants to run a management or admin GUI for a database such as MySQL or MongoDB. In a production environment, a database server should never be exposed directly to the internet. Database connections on a private network are often unencrypted to maximize speed. SSH port forwarding can be used to connect the GUI to a database on a remote server. Forwarding is also used for running visualization applications on a GPU node that is located behind the login node on a high-performance computing cluster.
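The core of the Client, Login, Protected setup is a single `ssh -L` option. The sketch below only assembles the command line, so it can be inspected or logged before it runs. The hostnames `login.example.com` and `protected`, the user name `alice`, and MySQL’s default port 3306 are assumptions for illustration. Note that `protected` must be resolvable from the login node, not from the client.

```python
import shlex
from typing import List


def local_forward_command(login_host: str, user: str, target_host: str,
                          target_port: int, local_port: int) -> List[str]:
    """Build an ssh command that forwards localhost:local_port to target_host:target_port.

    Traffic is encrypted from Client to Login; Login then opens an ordinary TCP
    connection to target_host:target_port on the private network.
    """
    return [
        "ssh",
        "-N",  # run no remote command; just hold the tunnel open
        "-L", f"{local_port}:{target_host}:{target_port}",
        f"{user}@{login_host}",
    ]


cmd = local_forward_command("login.example.com", "alice", "protected", 3306, 3306)
print(shlex.join(cmd))  # prints: ssh -N -L 3306:protected:3306 alice@login.example.com
```

While that command is running (in a terminal, or via `subprocess.run(cmd)`), the database GUI on the client would connect to `127.0.0.1:3306` as if the database were local. If port 3306 is already in use on the client, choose a different `local_port`.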


Posted on October 19, 2014 by craig

Knowing about Open Compute can help you make better decisions about your infrastructure

The Open Compute Project (OCP) is valuable to enterprise IT professionals because it embodies the best practices of companies that operate hyperscale computing systems. It’s not often that a business is willing to share information about the practices that are key to their competitiveness. Economical, efficient, and scalable infrastructure is crucial to the success of companies such as Google, Amazon, and Facebook. By studying the Open Compute Project, you can learn about the best practices of computing at hyperscale, and determine which practices can be applied to improve your IT operations.

For years, hyperscale operators have been working directly with original design manufacturers (ODMs) to design and produce hardware that meets their unique needs. In 2012, Google claimed to be one of the largest hardware makers in the world, and had probably been in the server hardware business for years. In 2011, Facebook started the Open Compute Project in an effort to standardize the design of servers and infrastructure for a hyperscale environment. The OCP releases open-source hardware specifications that can be implemented by any ODM. Key design goals include minimizing initial cost and power consumption, and maximizing interoperability and standardization. The hardware is designed to be “vanity-free,” meaning that it does not incorporate any features that are specific to a particular manufacturer. These design goals have led to some interesting departures from industry conventions.

OCP servers are primarily intended to fit into the Open Rack (although 19” servers with OCP-compliant motherboards are now available). This rack has the same floor footprint as a standard 19” rack, but it is very different internally. The rack height is measured in “OpenU.” One OpenU is 48mm, while 1U in a 19” rack is 44.5mm. Three high-current, 12V DC power buses run down the back of the Open Rack. The rack is divided into three “power zones,” each with ten OpenU for system shelves. Each power zone has a 3-OpenU “power shelf” that supplies 4200W of power to the DC power buses in that zone. Two OpenU at the top of each rack are reserved for a network switch.

OCP triplet server

A typical OCP server is housed in a deep, narrow “system tray” that contains a motherboard with two CPUs, one hard drive, and fans. The rear of the tray has a power plug that fits into one of the DC power buses in the Open Rack. Three trays can fit side-by-side on a shelf that occupies one OpenU. Alternatively, OCP-compliant servers are available with four nodes in a 2-OpenU unit. OCP servers are capable of operating in an environment with a higher ambient temperature and higher humidity than a typical data center. This capability reduces the cooling requirements for the data center, increasing its energy efficiency and reducing operating costs. Facebook has also released open-source design specifications for its data centers through the Open Compute Project.
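If you combine the Open Rack and system tray figures above, you get a rough per-rack budget. This is back-of-the-envelope arithmetic only. It assumes that every system shelf holds three trays and that the power shelves’ rated output is the whole power budget. Real deployments will differ.

```latex
\begin{aligned}
\text{OpenU per rack} &= 3 \times (10 + 3) + 2 = 41\ \text{OpenU}
  \quad (41 \times 48\,\text{mm} = 1968\,\text{mm}) \\
\text{nodes per rack} &= 3\ \text{zones} \times 10\ \text{shelves} \times 3\ \text{trays} = 90 \\
\text{power per rack} &= 3 \times 4200\,\text{W} = 12\,600\,\text{W} \\
\text{power budget per node} &= \frac{4200\,\text{W}}{30\ \text{nodes}} = 140\,\text{W}
\end{aligned}
```

The 140 W figure is an average allowance per tray, covering two CPUs, a drive, and fans. It is not a measured draw.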

The Open Compute Project provides a rare inside look at a set of best practices for hyperscale computing. I encourage you to follow the links to the various OCP specifications, which are concise and easy to read. Of course, most of us don’t have the opportunity to build a hyperscale computing infrastructure from scratch. In future articles, I’ll go into more detail about specific aspects of the project and explain how specific best practices from the OCP can be applied in a typical enterprise setting.

Posted on October 14, 2014 by craig

Rootwork featured on Open Compute Project panel discussion

Rootwork consultant and co-founder Craig Finch will participate in a panel discussion about the Open Compute Project at the Data Center World conference, which takes place in Orlando, FL from October 19-22, 2014. The panel discussion (TRD-8) will take place from 4:30-5:30PM on Tuesday October 21 in Barrel Spring 1.

Posted on May 20, 2014 by craig

Craig Finch writes feature article for HPCwire

Craig Finch has written a feature article for HPCwire about how outdated system administration practices are holding back the growth of high performance computing.