Day 1 of DevOps

Jeeva-AWSLabsJourney
6 min read · Apr 4, 2023

Create a CloudWatch alarm that sends an email using SNS notification when CPU Utilization is more than 70%.

Creating a Status Check Alarm to check System and Instance failure and send an email using SNS notification

Three-way Solution:

· AWS Console

· AWS CLI

· Terraform

What is CloudWatch?

· AWS CloudWatch is a monitoring service to monitor AWS resources, as well as the applications that run on AWS.

(For additional information follow the official page)

· What is Amazon CloudWatch? — Amazon CloudWatch

· We can use CloudWatch to collect and track metrics, which are variables you can measure for your resources and applications.

EC2 Detailed Monitoring:

CloudWatch Custom Metrics:

EC2/Host Level Metrics that CloudWatch monitors by default consist of

· CPU

· Network

· Disk

Status Check

There are two types of status check:

System status check:

· Monitors the AWS systems on which your instance runs. A failure either requires AWS involvement to repair, or you can fix it yourself by stopping and starting the instance (for EBS-backed instances). Examples of problems that can cause system status checks to fail:

✓ Loss of network connectivity

✓ Loss of system power

✓ Software issues on the physical host

✓ Hardware issues on the physical host that impact network reachability

Instance status check:

· Monitors the software and network configuration of an individual instance. It detects problems that require your involvement to repair. Examples of problems that can cause instance status checks to fail:

✓ Incorrect networking or start-up configuration

✓ Exhausted memory

✓ Corrupted filesystem

✓ Incompatible kernel
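
To make the relationship between the two checks concrete, here is a small Python sketch. The metric names StatusCheckFailed_System, StatusCheckFailed_Instance and StatusCheckFailed are the ones EC2 publishes to CloudWatch; the function name and the dictionary layout are my own and exist only for illustration.

```python
def status_check_metrics(system_ok: bool, instance_ok: bool) -> dict:
    """Return the 0/1 values the three EC2 status check metrics would report.

    Hypothetical helper for illustration; it does not call AWS.
    """
    system_failed = 0 if system_ok else 1
    instance_failed = 0 if instance_ok else 1
    return {
        "StatusCheckFailed_System": system_failed,
        "StatusCheckFailed_Instance": instance_failed,
        # The combined metric reports 1 when either check has failed.
        "StatusCheckFailed": max(system_failed, instance_failed),
    }


if __name__ == "__main__":
    # Example: the host is healthy but the instance has a bad network config.
    print(status_check_metrics(system_ok=True, instance_ok=False))
```

An alarm on StatusCheckFailed therefore covers both failure types, while the two narrower metrics let you tell them apart.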

· Memory/RAM utilization is not among the default metrics; it has to be published as a custom metric.

· By default, EC2 monitoring uses 5-minute intervals, but we can always enable detailed monitoring (1-minute intervals, but that will cost you some extra $$$)
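
A quick bit of arithmetic shows what the two intervals mean in practice. This is only a sketch: the function names are mine, and the alarm delay ignores any lag in metric delivery, so treat it as a rough lower bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def datapoints_per_day(period_seconds: int) -> int:
    """Number of datapoints one metric produces per day at a given interval."""
    return SECONDS_PER_DAY // period_seconds


def seconds_until_alarm(period_seconds: int, evaluation_periods: int) -> int:
    """Rough minimum time a condition must persist before an alarm can fire."""
    return period_seconds * evaluation_periods


if __name__ == "__main__":
    print("basic (5 min):   ", datapoints_per_day(300), "datapoints/day")
    print("detailed (1 min):", datapoints_per_day(60), "datapoints/day")
    # With a 300-second period and 2 evaluation periods:
    print("alarm after at least", seconds_until_alarm(300, 2), "seconds")
```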

Reference:

Amazon CloudWatch Pricing — Amazon Web Services (AWS)

P.S: CloudWatch can be used on premises too. We just need to install the SSM (Systems Manager) Agent and the CloudWatch agent.

Scenario1:

We want to create a CloudWatch alarm that sends an email using SNS notification when CPU Utilization is more than 70%

Solution1: Setup a CPU Usage Alarm using the AWS Management Console

Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

In the navigation pane, choose Alarms, Create Alarm.

Go to Metric → Select metric → EC2 → Per-Instance Metrics → CPUUtilization → Select metric

Define the alarm as follows:
* Type a unique name for the alarm (e.g., High CPU Utilization Alarm)
* Type a description of the alarm
* Under Whenever, choose >= and type 70; in the "for" field, type 2. This specifies that the alarm is triggered if CPU usage is at or above 70% for two consecutive sampling periods
* Under Additional settings, for Treat missing data as, choose bad (breaching threshold), as missing data points may indicate that the instance is down
* Under Actions, for Whenever this alarm, choose State is ALARM. For Send notification to, select an existing SNS topic or create a new one
* To create a new SNS topic, choose New list. For Send notification to, type a name for the SNS topic (e.g., High CPU Utilization Threshold), and for Email list, type a comma-separated list of email addresses to be notified when the alarm changes to the ALARM state
* Each email address is sent a topic subscription confirmation email. You must confirm the subscription before notifications can be sent
* Click Create Alarm
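
To see how the settings above interact, here is a simplified Python model of the evaluation: threshold 70 with >=, two consecutive periods, and missing data treated as breaching. It is a sketch under those assumptions, not CloudWatch's actual algorithm (which has more cases, such as "M out of N" datapoints); the function name and the use of None for a missing datapoint are my own conventions.

```python
from typing import Optional, Sequence


def evaluate_alarm(
    datapoints: Sequence[Optional[float]],
    threshold: float = 70.0,
    evaluation_periods: int = 2,
    missing_is_breaching: bool = True,
) -> str:
    """Return "ALARM", "OK" or "INSUFFICIENT_DATA" for the latest periods.

    datapoints holds one average CPU value per period, oldest first;
    None stands for a period with no data.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"

    def breaching(value: Optional[float]) -> bool:
        if value is None:
            return missing_is_breaching
        return value >= threshold

    # Only the most recent `evaluation_periods` datapoints are considered.
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(breaching(v) for v in recent) else "OK"


if __name__ == "__main__":
    print(evaluate_alarm([50, 75, 80]))   # two recent periods at or above 70
    print(evaluate_alarm([75, 60]))       # latest period below 70
    print(evaluate_alarm([80, None]))     # missing data counted as breaching
```

A single spike does not trigger the alarm in this model; the condition has to hold for both of the most recent periods.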

Solution2: Setup CPU Usage Alarm using the AWS CLI

· Create an alarm using the put-metric-alarm command

aws cloudwatch put-metric-alarm \
  --alarm-name cpu-mon \
  --alarm-description "Alarm when CPU exceeds 70 percent" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 70 \
  --comparison-operator GreaterThanThreshold \
  --dimensions "Name=InstanceId,Value=i-12345678" \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:MyTopic \
  --unit Percent
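
If you prefer to script this from Python, the same alarm can be expressed with boto3's put_metric_alarm call. The sketch below assumes boto3 is installed and AWS credentials are configured; the helper names build_cpu_alarm_params and create_cpu_alarm are hypothetical, and the instance ID and topic ARN are the placeholder values from the command above.

```python
def build_cpu_alarm_params(instance_id: str, topic_arn: str) -> dict:
    """Build keyword arguments mirroring the CLI flags shown above."""
    return {
        "AlarmName": "cpu-mon",
        "AlarmDescription": "Alarm when CPU exceeds 70 percent",
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/EC2",
        "Statistic": "Average",
        "Period": 300,
        "Threshold": 70.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "EvaluationPeriods": 2,
        "AlarmActions": [topic_arn],
        "Unit": "Percent",
    }


def create_cpu_alarm(instance_id: str, topic_arn: str) -> None:
    """Create the alarm in AWS (needs boto3 and valid credentials)."""
    import boto3  # third-party; imported here so the builder works without it

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_cpu_alarm_params(instance_id, topic_arn))


# Example call (placeholder values, not executed here):
# create_cpu_alarm("i-12345678", "arn:aws:sns:us-east-1:111122223333:MyTopic")
```

Keeping the parameters in a plain dictionary makes it easy to review or unit-test them before anything is sent to AWS.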

· Using the command line, we can test the Alarm by forcing an alarm state change using a set-alarm-state command

· Change the alarm-state from INSUFFICIENT_DATA to OK

aws cloudwatch set-alarm-state --alarm-name "cpu-mon" --state-reason "initializing" --state-value OK

· Change the alarm-state from OK to ALARM

aws cloudwatch set-alarm-state --alarm-name "cpu-mon" --state-reason "initializing" --state-value ALARM

· Check if you have received an email notification about the alarm
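
Why does forcing the state produce an email? Alarm actions are tied to state changes, so moving the alarm into ALARM invokes the SNS action. The toy Python model below illustrates that idea only; the class and its attributes are hypothetical and nothing here talks to AWS.

```python
VALID_STATES = {"OK", "ALARM", "INSUFFICIENT_DATA"}


class AlarmSimulator:
    """Toy model: record a notification whenever the state changes to ALARM."""

    def __init__(self) -> None:
        self.state = "INSUFFICIENT_DATA"  # new alarms start here
        self.notifications = []

    def set_state(self, new_state: str, reason: str) -> bool:
        """Set the state; return True if it actually changed."""
        if new_state not in VALID_STATES:
            raise ValueError(f"unknown state: {new_state}")
        changed = new_state != self.state
        self.state = new_state
        if changed and new_state == "ALARM":
            # Stand-in for the SNS email sent by the alarm action.
            self.notifications.append(reason)
        return changed


if __name__ == "__main__":
    alarm = AlarmSimulator()
    alarm.set_state("OK", "initializing")
    alarm.set_state("ALARM", "initializing")
    print(alarm.state, alarm.notifications)
```

Keep in mind that a forced state is temporary: the real alarm returns to its actual state at the next evaluation.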

Solution3: Setup CPU Usage Alarm using Terraform

# cloudwatch.tf
resource "aws_cloudwatch_metric_alarm" "cpu-utilization" {
  alarm_name          = "high-cpu-utilization-alarm"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 70 # matches the 70% threshold of this scenario
  alarm_description   = "This metric monitors ec2 cpu utilization"
  alarm_actions       = [aws_sns_topic.alarm.arn]

  # Terraform 0.12+ expects dimensions as a map argument, not a block
  dimensions = {
    InstanceId = aws_instance.my_instance.id
  }
}

GITHUB link: jeeva0406/devops-learning (github.com)

Scenario2: Create a status check alarm to notify when an instance has failed a status check

Solution1: Creating a Status Check Alarm Using the AWS Console

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

2. In the navigation pane, choose Instances.

3. Select the instance, choose the Status Checks tab, and choose Create Status Check Alarm.

* You can create a new SNS notification or use an existing one (I am using the existing one created in the earlier high CPU utilization example)
* In Whenever, select the status check that you want to be notified about: Status Check Failed (Any), Status Check Failed (Instance) or Status Check Failed (System)
* In For at least, set the number of periods you want to evaluate, and in consecutive periods, select the evaluation period duration before triggering the alarm and sending an email
* In Name of alarm, replace the default name with another name for the alarm
* Choose Create Alarm

Solution2: To create a status check alarm via AWS CLI

· Use the put-metric-alarm command to create the alarm

aws cloudwatch put-metric-alarm \
  --alarm-name StatusCheckFailed-Alarm-for-test-instance \
  --metric-name StatusCheckFailed \
  --namespace AWS/EC2 \
  --statistic Maximum \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --unit Count \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:us-west-2:111122223333:my-sns-topic
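
The command above uses the Maximum statistic, and a little arithmetic shows why that matters for a 0/1 metric. The Python sketch below assumes one status check datapoint per minute within a 300-second period; the sample values and the function name are made up for illustration.

```python
def breaches(samples, statistic: str, threshold: float = 1.0) -> bool:
    """Check whether a period's aggregated value is >= threshold."""
    if statistic == "Maximum":
        value = max(samples)
    elif statistic == "Average":
        value = sum(samples) / len(samples)
    else:
        raise ValueError(f"unsupported statistic: {statistic}")
    return value >= threshold


if __name__ == "__main__":
    # Five one-minute samples: the check failed once during the period.
    period_samples = [0, 0, 1, 0, 0]
    print("Maximum:", breaches(period_samples, "Maximum"))  # max is 1
    print("Average:", breaches(period_samples, "Average"))  # average is 0.2
```

With Maximum, a single failed check inside the period counts as a breach; with Average and a threshold of 1, every sample in the period would have to fail.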

Solution3: To create a status check alarm via Terraform

resource "aws_cloudwatch_metric_alarm" "instance-health-check" {
  alarm_name          = "instance-health-check"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "StatusCheckFailed"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Maximum" # same statistic as the CLI example, so one failed check in the period counts
  threshold           = 1
  alarm_description   = "This metric monitors ec2 health status"
  alarm_actions       = [aws_sns_topic.alarm.arn]

  # Terraform 0.12+ expects dimensions as a map argument, not a block
  dimensions = {
    InstanceId = aws_instance.my_instance.id
  }
}

Different use cases using CloudWatch:

scenarios to practice

Further reading to master CloudWatch:

Mastering AWS CloudWatch: A Step-by-Step Tutorial for Beginners (cto.ai)

