Day 1 of DevOps
Create a CloudWatch alarm that sends an email through an SNS notification when CPU utilization exceeds 70%.
Create a status check alarm that detects system and instance failures and sends an email through an SNS notification.
Three-way Solution:
· AWS Console
· AWS CLI
· Terraform
· AWS CloudWatch is a monitoring service to monitor AWS resources, as well as the applications that run on AWS.
(For additional information follow the official page)
· What is Amazon CloudWatch? — Amazon CloudWatch
· We can use CloudWatch to collect and track metrics, which are variables you can measure for your resources and applications.
EC2 Detailed Monitoring:
CloudWatch Custom Metrics:
EC2/Host Level Metrics that CloudWatch monitors by default consist of
· CPU
· Network
· Disk
Status Check
There are two types of status check:
System status check:
Monitors the AWS systems on which your instance runs. A failure here either requires AWS involvement to repair, or you can resolve it yourself by stopping and starting the instance (for EBS-backed instances), which in most cases migrates it to a new host. Examples of problems that can cause system status checks to fail:
✓ Loss of network connectivity
✓ Loss of system power
✓ Software issues on the physical host
✓ Hardware issues on the physical host that impact network reachability
Instance status check:
· Monitors the software and network configuration of an individual instance. It detects problems that require your involvement to repair, for example:
✓ Incorrect networking or start-up configuration
✓ Exhausted memory
✓ Corrupted filesystem
✓ Incompatible kernel
· Memory (RAM) utilization is not collected by default; it has to be published as a custom metric.
· By default, EC2 monitoring reports metrics at 5-minute intervals, but we can enable detailed monitoring (1-minute intervals) at extra cost.
Reference:
Amazon CloudWatch Pricing — Amazon Web Services (AWS)
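As a rough sketch of the two points above, detailed monitoring can be switched on with `aws ec2 monitor-instances`, and a memory reading can be published as a custom metric with `aws cloudwatch put-metric-data`. The `run` helper, the `DRY_RUN` variable, the `Custom/EC2` namespace, the `MemoryUtilization` metric name, and the instance ID are all assumptions made for illustration. By default the script only prints the commands.

```shell
# Sketch only: prints the AWS CLI commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Enable detailed (1-minute) monitoring for one instance.
enable_detailed_monitoring() {
    run aws ec2 monitor-instances --instance-ids "$1"
}

# Publish one memory-utilization data point (in percent) for an instance.
# "Custom/EC2" and "MemoryUtilization" are placeholder names, not AWS defaults.
put_memory_metric() {
    run aws cloudwatch put-metric-data \
        --namespace "Custom/EC2" \
        --metric-name MemoryUtilization \
        --dimensions "InstanceId=$1" \
        --unit Percent \
        --value "$2"
}

enable_detailed_monitoring i-1234567890abcdef0
put_memory_metric i-1234567890abcdef0 42.5
```

In practice the CloudWatch agent is the more common way to collect memory metrics; the manual `put-metric-data` call is shown only to make the idea of a custom metric concrete.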
P.S: CloudWatch can be used for on-premises servers too. We just need to install the SSM (Systems Manager) agent and the CloudWatch agent.
Scenario1:
We want to create a CloudWatch alarm that sends an email using SNS notification when CPU Utilization is more than 70%
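To make the alarm condition concrete, here is a deliberately simplified model of how such an alarm decides its state: it goes to ALARM only when the most recent N data points all exceed the threshold. The function name `would_alarm` is hypothetical, values are whole percentages, and real CloudWatch evaluation also handles missing data and "M out of N" rules, which this sketch ignores.

```shell
# would_alarm THRESHOLD PERIODS VALUE...
# Prints ALARM if the last PERIODS values are all above THRESHOLD, else OK.
# Simplified model for illustration; integers only.
would_alarm() {
    threshold=$1
    needed=$2
    shift 2
    streak=0
    for value in "$@"; do
        if [ "$value" -gt "$threshold" ]; then
            streak=$((streak + 1))   # another consecutive breach
        else
            streak=0                 # a healthy data point resets the count
        fi
    done
    if [ "$streak" -ge "$needed" ]; then
        echo ALARM
    else
        echo OK
    fi
}

would_alarm 70 2 50 75 80   # last two points breach: prints ALARM
would_alarm 70 2 75 80 60   # latest point is healthy: prints OK
```

This is why a single short CPU spike does not send an email when the alarm is configured with two evaluation periods.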
Solution1: Set up a CPU Usage Alarm using the AWS Management Console
Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
In the navigation pane, choose Alarms, Create Alarm.
Go to Metric → Select metric → EC2 → Per-Instance Metrics → CPUUtilization → Select metric
Define the alarm as follows:
* Type a unique name for the alarm (e.g. High CPU Utilization Alarm).
* Type a description of the alarm.
* Under Whenever, choose >= and type 70; for the number of consecutive periods, type 2. This specifies that the alarm is triggered when CPU usage is at or above 70% for two consecutive sampling periods.
* Under Additional settings, for Treat missing data as, choose bad (breaching threshold), since missing data points may indicate that the instance is down.
* Under Actions, for Whenever this alarm, choose State is ALARM. For Send notification to, select an existing SNS topic or create a new one.
* To create a new SNS topic, choose New list. For Send notification to, type a name for the SNS topic (e.g. High_CPU_Utilization_Threshold; topic names may contain only letters, digits, hyphens, and underscores), and for Email list, type a comma-separated list of email addresses to be notified when the alarm changes to the ALARM state.
* Each email address is sent a topic subscription confirmation email. You must confirm the subscription before notifications can be sent.
* Click Create Alarm.
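The SNS part of the steps above can also be done from the command line. The sketch below uses `aws sns create-topic` and `aws sns subscribe`; the `run` helper, the `DRY_RUN` variable, the topic name, the account ID in the ARN, and the email address are placeholders chosen for illustration. By default it only prints the commands.

```shell
# Sketch only: prints the AWS CLI commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a topic; when really executed, this prints the new topic's ARN.
create_topic() {
    run aws sns create-topic --name "$1" --query TopicArn --output text
}

# Subscribe an email address; the recipient must confirm via the email they receive.
subscribe_email() {
    run aws sns subscribe \
        --topic-arn "$1" \
        --protocol email \
        --notification-endpoint "$2"
}

TOPIC_NAME="high-cpu-utilization"
# Placeholder ARN; a real run would use the ARN returned by create_topic.
TOPIC_ARN="arn:aws:sns:us-east-1:111122223333:$TOPIC_NAME"

create_topic "$TOPIC_NAME"
subscribe_email "$TOPIC_ARN" "ops@example.com"
```

The topic ARN is what an alarm later references as its alarm action.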
Solution2: Set up a CPU Usage Alarm using the AWS CLI
· Create an alarm using the put-metric-alarm command (the instance ID and SNS topic ARN are example values):
aws cloudwatch put-metric-alarm \
  --alarm-name cpu-mon \
  --alarm-description "Alarm when CPU exceeds 70 percent" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 70 \
  --comparison-operator GreaterThanThreshold \
  --dimensions "Name=InstanceId,Value=i-12345678" \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:MyTopic \
  --unit Percent
· Using the command line, we can test the alarm by forcing an alarm state change with the set-alarm-state command.
· Change the alarm state from INSUFFICIENT_DATA to OK:
aws cloudwatch set-alarm-state --alarm-name "cpu-mon" --state-reason "initializing" --state-value OK
· Change the alarm state from OK to ALARM:
aws cloudwatch set-alarm-state --alarm-name "cpu-mon" --state-reason "initializing" --state-value ALARM
· Check whether you have received an email notification about the alarm.
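Besides waiting for the email, the state change can be checked from the CLI. This sketch assumes the alarm is named `cpu-mon`, as in the put-metric-alarm command, and uses `aws cloudwatch describe-alarms` and `aws cloudwatch describe-alarm-history`. The `run` helper and `DRY_RUN` variable are hypothetical; by default the commands are only printed.

```shell
# Sketch only: prints the AWS CLI commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the current state of one alarm (OK, ALARM, or INSUFFICIENT_DATA).
alarm_state() {
    run aws cloudwatch describe-alarms \
        --alarm-names "$1" \
        --query 'MetricAlarms[0].StateValue' \
        --output text
}

# Show the most recent state transitions of the alarm.
alarm_history() {
    run aws cloudwatch describe-alarm-history \
        --alarm-name "$1" \
        --history-item-type StateUpdate \
        --max-items 5
}

alarm_state cpu-mon
alarm_history cpu-mon
```

Keep in mind that a state forced with set-alarm-state lasts only until the next regular evaluation of the metric.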
Solution3: Set up a CPU Usage Alarm using Terraform
# cloudwatch.tf (Terraform 0.12+ syntax)
# aws_sns_topic.alarm and aws_instance.my_instance are assumed to be defined elsewhere in the configuration.
resource "aws_cloudwatch_metric_alarm" "cpu-utilization" {
  alarm_name          = "high-cpu-utilization-alarm"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 70 # matches the 70% target of this scenario
  alarm_description   = "This metric monitors ec2 cpu utilization"
  alarm_actions       = [aws_sns_topic.alarm.arn]

  dimensions = {
    InstanceId = aws_instance.my_instance.id
  }
}
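A Terraform configuration like the one above is typically rolled out with the standard init, validate, plan, and apply sequence. The sketch below wraps those commands in a hypothetical `terraform_workflow` function; the `run` helper, the `DRY_RUN` variable, and the plan file name `tfplan` are assumptions. By default it only prints the commands.

```shell
# Sketch only: prints the Terraform commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run from the directory that contains cloudwatch.tf.
# Each step runs only if the previous one succeeded.
terraform_workflow() {
    run terraform init &&
    run terraform validate &&
    run terraform plan -out=tfplan &&
    run terraform apply tfplan
}

terraform_workflow
```

Saving the plan to a file and applying that file means the changes that get applied are exactly the ones that were reviewed.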
GitHub link: jeeva0406/devops-learning (github.com)
Scenario2: Create a status check alarm to notify when an instance has failed a status check
Solution1: Creating a Status Check Alarm Using the AWS Console
1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
2. In the navigation pane, choose Instances.
3. Select the instance, choose the Status Checks tab, and choose Create Status Check Alarm.
* You can create a new SNS notification or use an existing one (I am using the one created in the earlier high CPU utilization example).
* In Whenever, select the status check that you want to be notified about: Status Check Failed (Any), Status Check Failed (Instance), or Status Check Failed (System).
* In For at least, set the number of periods you want to evaluate, and in consecutive periods, select the evaluation period duration before the alarm is triggered and an email is sent.
* In Name of alarm, replace the default name with another name for the alarm.
* Choose Create Alarm.
Solution2: Creating a Status Check Alarm Using the AWS CLI
· Use the put-metric-alarm command to create the alarm (the instance ID and SNS topic ARN are example values):
aws cloudwatch put-metric-alarm \
  --alarm-name StatusCheckFailed-Alarm-for-test-instance \
  --metric-name StatusCheckFailed \
  --namespace AWS/EC2 \
  --statistic Maximum \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --unit Count \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:us-west-2:111122223333:my-sns-topic
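To see what the alarm is watching, the current system and instance status checks can be queried directly with `aws ec2 describe-instance-status`. In this sketch the `run` helper, the `DRY_RUN` variable, and the instance ID are placeholders; by default the command is only printed.

```shell
# Sketch only: prints the AWS CLI command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print the system status and instance status of one instance.
# --include-all-instances also reports instances that are not running.
instance_status() {
    run aws ec2 describe-instance-status \
        --instance-ids "$1" \
        --include-all-instances \
        --query 'InstanceStatuses[0].[SystemStatus.Status,InstanceStatus.Status]' \
        --output text
}

instance_status i-1234567890abcdef0
```

When executed for real, the two values returned are the system status followed by the instance status.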
Solution3: Creating a Status Check Alarm Using Terraform
# Terraform 0.12+ syntax
# aws_sns_topic.alarm and aws_instance.my_instance are assumed to be defined elsewhere in the configuration.
resource "aws_cloudwatch_metric_alarm" "instance-health-check" {
  alarm_name          = "instance-health-check"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "StatusCheckFailed"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 1
  alarm_description   = "This metric monitors ec2 health status"
  alarm_actions       = [aws_sns_topic.alarm.arn]

  dimensions = {
    InstanceId = aws_instance.my_instance.id
  }
}
Different use cases using CloudWatch:
Further reading on CloudWatch:
Mastering AWS CloudWatch: A Step-by-Step Tutorial for Beginners (cto.ai)