Uncover Hidden AWS Costs

Photo by Jas Min on Unsplash


Guide to Spotting Idle AWS Resources

🧠 Rationale

On AWS, you will experiment a lot. That is one of the main advantages of cloud computing.

You might be leveling up your skills through training (maybe a certification path), testing something new, working on a Proof of Concept (PoC) for your products or side projects, automating a painful task, or experimenting with some new fancy 🤖 AI / ML / LLM things.

As you all know, Cloud Service Provider (CSP) pricing follows a consumption-based model, so everything left active will be billed at the end of the month. 💣

It is common, and expected, to forget to turn things off (I do it all the time) or to be unaware that resources are still up and running on your account, especially if you are a newcomer.

You have to manually check billing services like AWS Cost Explorer to verify that nothing is ramping up your bill. There are no billing alerts enabled by default; you must set them up yourself.

It is even more challenging because there are 32 regions (to date) across the AWS global footprint where your assets can reside.

In this blog post, we will share a methodology for the most common AWS services (the usual suspects) to identify wasted resources on your AWS account and ultimately:

  1. Save Money

  2. Reduce Attack Surface

  3. Lower Carbon Footprint

Let’s dive into the main culprit AWS services 🔍

🖥️ EC2 Instances

To find unused EC2 instances, you must rely on CloudWatch (CW) metrics. As a starting point, we can use the following metrics: CPUUtilization and NetworkPacketsIn / NetworkPacketsOut. Our assumption is that if CPU utilization stays very low AND there is no traffic in or out during a specific period, the instance is probably no longer used.

A great way to achieve this kind of monitoring is to combine alarms on multiple CW metrics with, for example, a composite alarm.
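As an illustration, here is a minimal AWS CLI sketch of that approach: two child alarms (very low CPU, almost no outbound packets) combined into a composite alarm that fires only when both conditions hold. The instance ID, thresholds, and alarm names are placeholders to adapt to your own workloads.

#!/bin/bash
# Sketch: flag an instance as idle only when BOTH CPU and outbound traffic
# stay near zero over the last day. Instance ID and thresholds are placeholders.
INSTANCE_ID="i-0123456789abcdef0"

# Child alarm 1: average CPU at or below 2% over a full day
aws cloudwatch put-metric-alarm \
  --alarm-name "idle-cpu-$INSTANCE_ID" \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value="$INSTANCE_ID" \
  --statistic Average --period 86400 --evaluation-periods 1 \
  --threshold 2 --comparison-operator LessThanOrEqualToThreshold

# Child alarm 2: no more than 1,000 packets sent over a full day (essentially no traffic)
aws cloudwatch put-metric-alarm \
  --alarm-name "idle-net-$INSTANCE_ID" \
  --namespace AWS/EC2 --metric-name NetworkPacketsOut \
  --dimensions Name=InstanceId,Value="$INSTANCE_ID" \
  --statistic Sum --period 86400 --evaluation-periods 1 \
  --threshold 1000 --comparison-operator LessThanOrEqualToThreshold

# Composite alarm: fires only when both child alarms are in ALARM state
aws cloudwatch put-composite-alarm \
  --alarm-name "idle-instance-$INSTANCE_ID" \
  --alarm-rule "ALARM(\"idle-cpu-$INSTANCE_ID\") AND ALARM(\"idle-net-$INSTANCE_ID\")"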

To spot underutilized instances, you can also review each instance manually in the EC2 console and check its Monitoring tab over, say, the last two weeks.
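If you prefer the CLI to clicking through the console, the same look-back can be pulled with get-metric-statistics. A minimal sketch over the last two weeks; the instance ID is a placeholder, and the GNU date syntax matches the other snippets in this post.

aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistics Average --period 86400 \
  --start-time "$(date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --output table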

There are also some paid options on AWS:

  • Trusted Advisor: Cost Optimization checks (Business or Enterprise Support plan required - at least $100/month)

  • AWS Compute Optimizer (See Pricing)

💽 RDS Instances

To spot idle RDS instances, use CloudWatch metrics to see active connections to the database.

We assume that a database without any connections for a week is a good suspect for no longer being used. Operational teams will need to confirm this assumption.

The corresponding CloudWatch metric is DatabaseConnections. You can either check the history of this metric through the API or look it up in the CloudWatch console.
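As a quick sketch of the API route (the DB instance identifier is a placeholder), you can pull a week of daily maximums and check that they all sit at 0:

aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=my-db-instance \
  --statistics Maximum --period 86400 \
  --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --output table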

To keep watching for this on your AWS account, you can set up a CloudWatch alarm with a threshold of 0 on this metric. Note that a single alarm can evaluate at most one day of data, so the alarm below flags a full day with zero connections; use the metric history to confirm the one-week assumption.

Here is a CloudFormation snippet to set up this alarm:

MyDBAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: "ZeroDatabaseConnections"
    AlarmDescription: "Trigger if DatabaseConnections stays at 0 for a full day"
    Namespace: "AWS/RDS"
    MetricName: "DatabaseConnections"
    Dimensions:
      - Name: "DBInstanceIdentifier"
        Value: "my-db-instance" # replace with your DB instance identifier
    Statistic: "Average"
    Period: 86400 # 1 day in seconds (the maximum evaluation window for an alarm)
    EvaluationPeriods: 1
    Threshold: 0
    ComparisonOperator: "LessThanOrEqualToThreshold"
    AlarmActions:
      - Ref: "MySNSTopic"

🪣 EBS Volumes

For EBS volumes, a volume can end up detached, i.e., in the "available" state, with no EC2 instance using it. That can be legitimate, but in most cases the volume is simply no longer needed: as a preventive measure, when you terminate an EC2 instance, attached volumes without the delete-on-termination flag are merely detached, not deleted.

The following command will help you to identify “available” EBS volumes for a given AWS Region.

aws ec2 describe-volumes --query "Volumes[?State=='available'].[VolumeId,Size]" --output table
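Because volumes can sit in any of the regions mentioned earlier, a small loop over every enabled region saves you from switching the console around. A minimal sketch:

for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
  echo "Region: $region"
  aws ec2 describe-volumes --region "$region" \
    --query "Volumes[?State=='available'].[VolumeId,Size]" --output table
done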

📸 EBS Snapshots

Sometimes you create a snapshot of a specific volume and keep it forever. But it costs you money. As a rule of thumb, we assume snapshots older than 90 days are no longer relevant.

Some AWS services also create a termination snapshot as a safety measure when you delete them, and that snapshot will sit there forever (and be billed forever).

Run the command below for a given AWS Region to identify these obsolete snapshots.

DATE_90_DAYS_AGO=$(date -u -d "90 days ago" +'%Y-%m-%dT%H:%M:%S')
aws ec2 describe-snapshots --owner-ids YOUR_AWS_ACCOUNT_ID --query "Snapshots[?StartTime<'$DATE_90_DAYS_AGO'].[SnapshotId,StartTime,VolumeId]" --output table

📒 CloudWatch LogGroups

CloudWatch log groups are also retained indefinitely by default. Over time, this leads to a substantial amount of wasted dollars on your AWS bill.

To identify CloudWatch log groups without an expiration (retention) policy, you can run the following AWS CLI command:

aws logs describe-log-groups --query "sort_by(logGroups[?retentionInDays == null], &storedBytes) | reverse(@) | [].[logGroupName, storedBytes]" --output table
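Once the owning team confirms a log group does not need to be kept forever, you can cap its retention. A minimal sketch, where the log group name and the 30-day retention are placeholder values to adjust:

aws logs put-retention-policy \
  --log-group-name "/aws/lambda/my-function" \
  --retention-in-days 30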

👷🏻‍♀️ IAM Principals

One way to extract the last usage of IAM principals is to rely on the credential report. If you are unfamiliar with this generated CSV file, it is a flat file listing all IAM users of the account with metadata about their credentials. It is generally used during access reviews.

# Generate and download the IAM credential report
aws iam generate-credential-report
aws iam get-credential-report --query 'Content' --output text | base64 -d > credential_report.csv
# Columns 1, 5 and 11: user, password_last_used, access_key_1_last_used_date
awk -F ',' '{print $1,$5,$11}' credential_report.csv | column -t

For IAM Roles, it's more challenging, and you will have to generate a report for each role.

#!/bin/bash

# Resolve the current account ID so the role ARNs can be built automatically
account_id=$(aws sts get-caller-identity --query 'Account' --output text)

# Get the list of all role names in the account
role_names=$(aws iam list-roles \
  --query 'Roles[].RoleName' \
  --output text)

# Loop through each role name to generate last accessed details
for role_name in $role_names; do
  echo "Generating last accessed details for role: $role_name"

  # Generate service last accessed details
  job_output=$(aws iam generate-service-last-accessed-details \
    --arn "arn:aws:iam::${account_id}:role/$role_name")

  # Extract JobId from the JSON output
  job_id=$(echo "$job_output" | jq -r '.JobId')

  # Poll until the report generation job is complete
  while [ "$(aws iam get-service-last-accessed-details --job-id "$job_id" \
      --query 'JobStatus' --output text)" = "IN_PROGRESS" ]; do
    sleep 5
  done

  # Retrieve and save the last accessed details using the JobId
  aws iam get-service-last-accessed-details \
    --job-id "$job_id" \
    > "last_accessed_details_${role_name}.json"

  echo "Last accessed details for role $role_name saved to last_accessed_details_${role_name}.json"
done
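Once the reports are generated, you can skim each JSON file to see when (if ever) a role last touched each service. A minimal sketch using jq, assuming a role named MyRole:

jq -r '.ServicesLastAccessed[] | [.ServiceName, (.LastAuthenticated // "never")] | @tsv' \
  last_accessed_details_MyRole.json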

I Don’t Have Time. Can This Be Automated?

Navigating the labyrinth of waste detection on your AWS account can be a time-consuming endeavor, especially when the landscape is ever-changing and geographically spread across multiple AWS regions. It’s not just a one-time task; it requires ongoing diligence to spot unused or underutilized assets systematically.

Moreover, the scope of potential wastage extends beyond the common AWS services we’ve discussed here. It seeps into specialized services such as Redshift Clusters, Glue Endpoints, SageMaker Apps, Notebooks, or Endpoints.

That’s where 💸 unusd.cloud comes into play.

Within minutes, you can onboard your AWS accounts onto our SaaS platform. You can then schedule regular scans across all accounts and regions, effectively automating the waste-detection process.

No more manual sifting through CloudWatch metrics or setting up ad-hoc alerts. Our solution does the heavy lifting for you, offering you peace of mind and more time to focus on strategic cloud initiatives.

You’ll receive a digest report through your communication channel of choice — be it Email, Microsoft Teams, or Slack. This ensures that your operational teams can continue to work within their existing Instant Messaging tools, seamlessly integrating waste management into their daily routines and discussions.