Five Pillars of the AWS Well-Architected Framework
The AWS Well-Architected Framework helps cloud architects build secure, high-performing, resilient and efficient infrastructure for their applications and workloads.
The five pillars are Operational Excellence, Security, Reliability, Performance Efficiency and Cost Optimization.
Operational Excellence (automation)
- Automate processes and tools through config files.
- Identify metrics that align with business goals.
- Analyze metrics (statistics, custom queries/reports).
- Actions (trigger alarms, create dashboard, make data-driven product decisions).
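As a sketch of the metrics-to-actions flow above, the dict below has the shape of the keyword arguments boto3's CloudWatch `put_metric_alarm` call accepts; the alarm name, namespace, threshold and SNS topic ARN are all hypothetical values, not taken from these notes:

```python
# Sketch: a CloudWatch alarm that fires when p99 latency breaches a
# business-aligned threshold. In practice this dict would be passed as
# cloudwatch.put_metric_alarm(**alarm) via boto3.
alarm = {
    "AlarmName": "checkout-p99-latency-high",   # hypothetical alarm name
    "Namespace": "MyApp/Checkout",              # hypothetical custom metric namespace
    "MetricName": "Latency",
    "ExtendedStatistic": "p99",
    "Period": 60,                               # evaluate every 60 seconds
    "EvaluationPeriods": 5,                     # 5 consecutive breaches before alarming
    "Threshold": 500.0,                         # milliseconds, tied to a business goal
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:oncall"],  # hypothetical SNS topic
}
```

The alarm action closes the loop: the metric aligned with a business goal triggers a notification that drives a data-driven response.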
Security (zero trust)
Identity and Access Management (IAM)
The IAM permission model enforces access boundaries. Every principal should have only the minimal permissions necessary to accomplish its function.
There are three fundamental components to an IAM policy:
- Principals: specifies WHO permissions are given to (identity-based policy).
- Actions: specifies WHAT is being performed.
- Resources: specifies WHICH properties are being accessed (resource-based policy).
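As a sketch, the policy below wires these components together for an identity-based policy: the principal is implied by whichever identity the policy is attached to (resource-based policies instead name it explicitly in a Principal element), and the bucket ARN is hypothetical:

```python
import json

# Minimal identity-based IAM policy: grant read access to one S3 prefix.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",  # WHAT is being performed
            # WHICH properties are being accessed (hypothetical bucket):
            "Resource": "arn:aws:s3:::example-bucket/reports/*",
        }
    ],
}

document = json.dumps(policy, indent=2)  # the JSON document AWS actually stores
print(document)
```

Keeping the action and resource this narrow is the minimal-permissions principle in practice: the identity can read those objects and nothing else.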
Amazon Virtual Private Cloud (VPC) is a logically isolated virtual network in AWS that we can define and provision resources into. Here are some of the components that make up a VPC:
- Subnets: A range of IP addresses within our VPC.
- Route tables: A set of rules that determine where traffic is directed.
- Internet Gateway: A component that allows communication between resources inside our VPC and the internet.
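A minimal sketch of subnet planning, assuming a hypothetical 10.0.0.0/16 VPC CIDR carved into four /24 subnets with Python's standard ipaddress module:

```python
import ipaddress

# Hypothetical VPC CIDR block split into four /24 subnets, e.g. one
# public and one private subnet in each of two Availability Zones.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))[:4]

for name, net in zip(["public-a", "public-b", "private-a", "private-b"], subnets):
    print(name, net)  # each subnet is a non-overlapping slice of the VPC range
```

The route table attached to each subnet then decides whether its traffic can reach the Internet Gateway (public) or not (private).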
To safeguard traffic in our VPCs, we can:
- Divide our resources into public-facing resources and internal resources.
- Use a proxy service like the application load balancer to handle all internet-facing traffic (reduces the attack surface).
- Provision internal services like servers and databases inside internal subnets that are cut off from direct public internet access.
- Use the AWS Web Application Firewall (WAF) to further restrict traffic into our network.
Resource Level Security
Individual AWS resources have configurable network security controls. The most common control is known as a security group. We can use security groups to allow traffic only on specific ports and only from trusted resources.
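For example, here is an ingress rule in the shape boto3's `authorize_security_group_ingress` accepts, allowing HTTPS only from a trusted load balancer's security group instead of opening the port to the whole internet; the group ID is hypothetical:

```python
# Sketch of a security-group ingress rule: allow TCP 443 (HTTPS), and
# only from members of one trusted security group rather than 0.0.0.0/0.
ingress_rule = {
    "IpProtocol": "tcp",
    "FromPort": 443,
    "ToPort": 443,
    # Reference a trusted security group (hypothetical ID) instead of
    # an open CIDR range like {"CidrIp": "0.0.0.0/0"}:
    "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}],
}
```

Referencing another security group rather than a CIDR block means the rule automatically tracks the trusted resources, however their IPs change.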
Data encryption is the process of encoding information in such a way that it is unintelligible to any third party that does not possess the key necessary to decipher the data. Adopting a zero trust model for data means encrypting our data everywhere, both in transit and at rest.
- Encryption in transit involves encrypting the data as it travels between systems. For example, we can use the Application Load Balancer (ALB) to enforce a connection over HTTPS to our endpoints.
- Encryption at rest involves encrypting the data within systems. Most storage and database services integrate directly with the AWS Key Management Service (KMS), which lets us create Customer Managed Keys (CMKs) to encrypt our data.
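As a small client-side illustration of encryption in transit, Python's default TLS context requires certificate validation and hostname checking before any data is exchanged, the same guarantee an HTTPS-only ALB listener enforces on the server side:

```python
import ssl

# Python's default TLS context refuses unverified connections: the peer
# must present a valid certificate whose hostname matches.
ctx = ssl.create_default_context()

print(ctx.verify_mode == ssl.CERT_REQUIRED)  # certificates must be validated
print(ctx.check_hostname)                    # hostname must match the certificate
```

Pairing a context like this on the client with an HTTPS-only listener on the ALB ensures data is never readable on the wire.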
Reliability (blast radius)
- Use fault isolation zones to limit the blast radius. AWS has fault isolation zones at three levels: Resource and Request, Availability zone and Region.
- Resources and requests are partitioned by AWS services on a given dimension, like the resource ID. These partitions are called cells, which are designed to be independent and to contain failures inside themselves.
- AWS Availability Zones are physically separate facilities in distinct locations, protecting against failure from environmental hazards such as fires and floods.
- Regions are autonomous geographic areas, each composed of two or more Availability Zones.
- Use limits. Limits are constraints that can be applied to protect a service from excessive load. Soft limits can be increased by requesting an increase from AWS; hard limits cannot be increased. Monitor the limits of the services we're using and plan limit increases ahead of time to avoid service disruption.
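A minimal sketch of limit monitoring; the 80% threshold is a common rule of thumb for when to file a soft-limit increase, not an AWS-defined value:

```python
# Flag usage that approaches a soft limit so we can request an increase
# before the limit causes a service disruption.
def needs_limit_increase(usage: int, limit: int, threshold: float = 0.8) -> bool:
    """Return True once usage crosses the given fraction of the limit."""
    return usage >= limit * threshold

# e.g. 45 running instances against a soft limit of 50:
print(needs_limit_increase(45, 50))  # True: time to request an increase
print(needs_limit_increase(20, 50))  # False: plenty of headroom
```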
Performance Efficiency (cattle, not pets)
Selection is the ability to choose the service that most closely aligns with our workload. A typical workload requires selection across four main service categories in AWS: compute (services that process data, like virtual machines), storage (static storage of data, like an object store), database (organized storage of data, like relational databases) and network (how our data moves around, like a content delivery network).
- Implementing a workload on AWS involves selecting services across the compute, storage, database and network categories.
- Within each category, we must select the right type of service based on our use-case.
- Within each type, we can select the specific service based on our desired degree of management.
- Within each service, we can select the specific configuration based on the specific performance characteristics we want to achieve.
Vertical Scaling involves upgrading our underlying compute to a bigger instance type. For example, if we're running a t3.small instance, vertically scaling this instance might be upgrading it to a t3.large. Scaling vertically is simpler operationally but represents an availability risk and has lower limits.
Horizontal Scaling involves increasing the number of underlying instances. For example, if we're running a t3.small instance, horizontally scaling this instance would involve provisioning two additional t3.small instances. Scaling horizontally requires more overhead but comes with much better reliability and much higher limits.
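A small sketch comparing the two strategies for a fleet that needs roughly 8 GiB of memory (a t3.small has about 2 GiB, a t3.large about 8 GiB): vertical scaling hits the target with one instance, horizontal scaling with four, so losing one instance costs all capacity in the first case but only a quarter in the second.

```python
MEM_GIB = {"t3.small": 2, "t3.large": 8}  # approximate instance memory sizes

# Vertical: one bigger box. Operationally simple, single point of failure.
vertical = ["t3.large"]

# Horizontal: four smaller boxes. More overhead, graceful degradation.
horizontal = ["t3.small"] * 4

def total(fleet):
    return sum(MEM_GIB[i] for i in fleet)

print(total(vertical), total(horizontal))  # same capacity either way: 8 8

# Share of capacity lost if a single instance fails:
print(1 / len(vertical), 1 / len(horizontal))  # 1.0 vs 0.25
```

This fraction is the availability risk the notes mention: the vertical fleet is a single point of failure.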
Cost Optimization (pay as you go over one-time purchase)
Pay for use
AWS services have a pay-for-use model where we only pay for the capacity that we use. Four common ways to optimize our cloud spend when we pay for use are:
- Right sizing: Picking the right instance size and family.
- Serverless: Serverless services charge only for actual usage, with no cost for idle capacity. When our use case permits, choosing serverless can be the most cost-effective way of building our service.
- Reservations: Committing to paying for a certain amount of capacity in exchange for a significant discount.
- Spot instances: We can take advantage of unused capacity to run instances at discounts of up to 90%, but capacity providers can reclaim the capacity at any moment.
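A back-of-the-envelope sketch comparing monthly cost under these purchase options; the $0.10/hour on-demand rate and the 40% reservation discount are hypothetical figures, and 90% is the best-case spot discount mentioned above:

```python
# Rough monthly cost per instance under three purchase options.
HOURS_PER_MONTH = 730
on_demand_rate = 0.10  # hypothetical $/hour

on_demand = on_demand_rate * HOURS_PER_MONTH
reserved = on_demand * (1 - 0.40)   # assuming a ~40% reservation discount
spot = on_demand * (1 - 0.90)       # best-case 90% spot discount

print(round(on_demand, 2), round(reserved, 2), round(spot, 2))
```

The trade-off runs the other way: the cheaper the option, the more commitment (reservations) or interruption risk (spot) we accept.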
Cost Optimization Lifecycle
The cost optimization lifecycle is the continuous process of improving our cloud spend over time. It involves a three-step workflow:
- Review: Before optimizing our cloud spend, we need to first understand where it's coming from. AWS Cost Explorer can help us visualize and review our cloud spend over time.
- Track: Once we have an overview of our overall cloud spend, we can start grouping it along dimensions that we care about (using cost allocation tags). Common tag categories: App ID, Business Unit, Resource Owner, etc.
- Optimize: Optimize using "pay for use" techniques mentioned in the previous section.
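The Track step above can be sketched as grouping line items by a cost allocation tag, which is what Cost Explorer does when we group by a tag such as Business Unit; the services and amounts below are made up for illustration:

```python
from collections import defaultdict

# Hypothetical billing line items, each carrying a cost allocation tag.
line_items = [
    {"service": "ec2", "cost": 120.0, "tags": {"BusinessUnit": "payments"}},
    {"service": "s3", "cost": 30.0, "tags": {"BusinessUnit": "payments"}},
    {"service": "rds", "cost": 90.0, "tags": {"BusinessUnit": "analytics"}},
]

# Group spend along the dimension we care about.
by_unit = defaultdict(float)
for item in line_items:
    by_unit[item["tags"]["BusinessUnit"]] += item["cost"]

print(dict(by_unit))  # {'payments': 150.0, 'analytics': 90.0}
```

Once spend is attributed per business unit, the Optimize step can target the biggest contributors first.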