Splunk : How to Monitor Your AWS Workloads

August 25, 2021 at 04:01 pm EDT

By Johnathan Campos August 25, 2021

AWS is a comprehensive platform with more than 200 types of cloud services available globally. As organizations adopt these services, monitoring their performance can seem overwhelming.

Behind the scenes, most AWS workloads depend on a core set of services: EC2 (the compute service), EBS (block storage), and ELB (load balancing). For most organizations, these services form the foundation of their AWS deployments, so understanding how to monitor them is central to running successful workloads.

This post breaks down the key steps to monitoring your AWS services with Splunk Infrastructure Monitoring and discusses a few key infrastructure metrics for the major AWS services.

Want to skip the examples and see for yourself? Start a free trial of Splunk Observability Cloud instantly, no credit card required.

To get started, there are a few prerequisites you need to be aware of. When connecting AWS to Splunk Observability Cloud, you must have an access token for the organization you want to send data to. With a free trial account, an access token named Default has already been created for you. Otherwise, see our docs page on creating and managing organization access tokens.

Once your prerequisites are in order, log in to Splunk Observability Cloud and navigate to Data Setup. On the AWS Setup page, select New integration to open the AWS integration wizard. Click + Add Connection to configure an integration for one of your AWS accounts, then follow the four steps needed to create your connection. Our step-by-step process walks you through every step in detail, but you can always check out the docs page on connecting to AWS for more information.

Once connected, Splunk Infrastructure Monitoring will enumerate all of your AWS services. Navigate to the Infrastructure page and select Amazon Web Services to see all of your AWS resources in a single pane of glass. Below is an example of my deployed services.

From here, we can quickly dive into each of these services and inspect their metrics to better understand how they are performing. Let's drill down into some of these metrics.

EC2 Metrics

The EC2 compute service lets you run virtual machines in the AWS cloud. (A few bare-metal EC2 instance types are available, too.) If you host any kind of application or service in AWS, it likely runs on EC2. Even if you host it on a service like EKS (the AWS Kubernetes platform), in most cases it still runs on EC2 instances. Splunk Infrastructure Monitoring gives you an excellent overview of all your EC2 metrics by color-coding key metrics, and the Kubernetes Navigator does the same for your Kubernetes deployments. Below is an example of how your EC2 instances are shown and the color-coded filter options available to you.

You can also group EC2 instances by various options, such as region, state, OS type, and more. Here we have an example of a known instance with high CPU utilization (instance ID omitted). Grouping like this lets you identify problematic instances quickly.
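
The grouping view is a console feature, but the idea behind it is simple. The following is a minimal sketch of grouping by an attribute and picking the busiest member of each group. It assumes you already have an inventory of instances with their current CPU utilization. The `Instance` record, its field names, and the placeholder instance IDs are hypothetical; they are not a Splunk or AWS API.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Instance:
    # Hypothetical inventory record; field names are illustrative only.
    instance_id: str
    region: str
    state: str
    os_type: str
    cpu_percent: float

def group_by(instances, attribute):
    """Group instances by an attribute such as 'region', 'state', or 'os_type'."""
    groups = defaultdict(list)
    for inst in instances:
        groups[getattr(inst, attribute)].append(inst)
    return dict(groups)

def busiest_per_group(groups):
    """Return the instance with the highest CPU utilization in each group."""
    return {name: max(members, key=lambda i: i.cpu_percent)
            for name, members in groups.items()}

# Placeholder data, not real instances.
fleet = [
    Instance("i-example-a", "us-east-1", "running", "linux", 35.0),
    Instance("i-example-b", "us-east-1", "running", "linux", 92.5),
    Instance("i-example-c", "eu-west-1", "running", "windows", 12.0),
]

for region, inst in busiest_per_group(group_by(fleet, "region")).items():
    print(region, inst.instance_id, inst.cpu_percent)
```

Running it prints `us-east-1 i-example-b 92.5` and then `eu-west-1 i-example-c 12.0`. Passing `"state"` or `"os_type"` instead of `"region"` regroups the same data, much like switching the grouping option in the overview map.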

While there are many metrics to choose from, there are three key metrics to track for each EC2 instance.

- CPU Utilization: The total CPU units in use, expressed as a percentage of the total available. If this metric stays above about 80 percent for more than a brief period, investigate whether you need to increase the CPU capacity allocated to your workload. Alternatively, a problem in your application may be causing excessive CPU usage.
[Image showing current EC2 CPU utilization percentage.]
- DiskReadOps: The number of read operations the EC2 instance completed in a given period of time. When this metric deviates from its historical baseline, it could signify that something is wrong with the application running on the instance.
- DiskWriteOps: The number of write operations the EC2 instance completed in a given period of time. As with spikes in DiskReadOps, DiskWriteOps values that deviate from the norm could signal an application problem.
[Image showing current EC2 disk operations.]
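
In Splunk Infrastructure Monitoring you would normally express these checks as detectors. The underlying logic can still be sketched in a few lines of Python. This is a minimal illustration, not Splunk's alerting algorithm, and it makes three assumptions:
- CPU samples arrive once per minute.
- Five consecutive samples above 80 percent count as "more than a brief period". This is an arbitrary choice.
- A simple z-score against historical DiskReadOps or DiskWriteOps counts stands in for "deviates from the baseline".

```python
from statistics import mean, pstdev

CPU_THRESHOLD = 80.0  # percent, per the guidance above
SUSTAIN_POINTS = 5    # assumption: five consecutive 1-minute samples = "more than brief"

def sustained_high_cpu(samples, threshold=CPU_THRESHOLD, min_points=SUSTAIN_POINTS):
    """True if each of the most recent `min_points` CPU samples is above `threshold`."""
    if len(samples) < min_points:
        return False
    return all(s > threshold for s in samples[-min_points:])

def deviates_from_baseline(history, current, z_limit=3.0):
    """True if `current` is more than `z_limit` standard deviations from the
    historical mean. Intended for per-period counts such as DiskReadOps."""
    if len(history) < 2:
        return False  # not enough history to define a baseline
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu  # perfectly flat baseline: any change is a deviation
    return abs(current - mu) / sigma > z_limit

cpu = [40, 55, 85, 88, 91, 90, 86]
read_ops_history = [1000, 1040, 980, 1010, 990, 1020]
print(sustained_high_cpu(cpu))                         # True
print(deviates_from_baseline(read_ops_history, 2500))  # True
print(deviates_from_baseline(read_ops_history, 1030))  # False
```

A z-score works best when the history covers comparable periods. For workloads with strong daily cycles, a baseline built from the same hour on previous days is usually a better fit.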

EBS Metrics

EBS is Amazon's solution for workloads that require block-level storage, and EBS volumes are especially important as storage for EC2 instances. EBS monitoring with Splunk Infrastructure Monitoring follows the same workflow as EC2 monitoring. It starts with an overview map that color-codes all of your EBS volumes and lets you group them by common characteristics. If you identify a problematic volume in the overview map, you can select it to drill down and gather specific information about it. Here is a great example of the color-coded key metrics shown within the console.

This second example shows how we can easily find a problematic volume and drill down into the details.

To ensure the health and performance of your EBS volumes, keep an eye on these metrics:
- Volume State (aws_state): AWS reports the state of each EBS volume as one of the following: creating, available, in-use, deleting, deleted, or error. If a volume shows error, investigate it. Depending on the scenario, available (the volume is not attached to any instance), deleting, or deleted may also warrant a closer look.
[Image showing the current aws_state of an EBS volume.]
- Total IOPS: The total number of read and write operations in a set period of time. Values well above your normal baseline can indicate application bottlenecks or a poorly chosen storage type. Below is an example of a chart showing total IOPS for an EBS volume attached to an EC2 instance hosting a front-end microservice. The chart splits IOPS into read and write operations and updates every minute. This release of the microservice contains new static content, so higher VolumeReadOps were expected. The earlier, higher VolumeWriteOps were caused by an excessive logging level that wrote to the volume. The lower VolumeWriteOps after the release suggest it fixed this problem.
- Average Queue Length: Volume queue length is the number of I/O requests waiting to be completed. It is closely tied to latency: the time elapsed between sending an I/O to EBS and receiving EBS's acknowledgment that the read or write is complete. Persistently high queue length and latency on an EBS volume may indicate the need for a better-suited volume type, such as an SSD-backed volume.
[Image showing current average queue length of an EBS volume.]
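
The EBS guidance above can also be made concrete. The Python sketch below makes three assumptions:
- The state-to-action mapping is illustrative rather than prescribed by AWS or Splunk.
- The read and write counts are per-period totals like VolumeReadOps and VolumeWriteOps, so dividing by the period length in seconds gives operations per second.
- The queue-length estimate applies Little's law (average requests in flight ≈ throughput × average latency). This is a general queueing relationship, not how EBS computes VolumeQueueLength.

```python
# Volume states as listed above, mapped to a suggested follow-up.
# The mapping is an illustrative assumption; adjust it to your own scenarios.
STATE_ACTION = {
    "creating": "ok",
    "in-use": "ok",
    "available": "review",   # volume is not attached to any instance
    "deleting": "review",
    "deleted": "review",
    "error": "investigate",
}

def triage_state(state):
    """Map an EBS volume state to a suggested follow-up; unexpected values are 'unknown'."""
    return STATE_ACTION.get(state, "unknown")

def total_iops(read_ops, write_ops, period_seconds):
    """Turn per-period read and write operation counts into operations per second."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return (read_ops + write_ops) / period_seconds

def estimated_queue_length(iops, avg_latency_seconds):
    """Little's law approximation: average requests in flight = rate x average time in system."""
    return iops * avg_latency_seconds

iops = total_iops(read_ops=30_000, write_ops=6_000, period_seconds=60)
print(triage_state("error"))                # investigate
print(iops)                                 # 600.0
print(estimated_queue_length(iops, 0.005))  # 3.0
```

Rising latency at the same IOPS pushes the estimate up, which matches the point above that queue length and latency move together.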

ELB Metrics

ELB, AWS's load balancing service, offers several types of load balancers that distribute application traffic across EC2 instances. ELB monitoring with Splunk Infrastructure Monitoring follows the same workflow as EC2 and EBS monitoring. An overview map color-codes all of your Elastic Load Balancers by key metrics and lets you group them by common characteristics. If a problematic load balancer stands out in the overview map, you can select it to drill down and gather specific information about it, just as with EC2 instances and EBS volumes. Below is an example.

This second example shows how you can quickly drill down into a specific load balancer for detailed information (ELB ID omitted).

To ensure that ELB is properly distributing requests across the instances in your environment, be sure to monitor the following metrics:
- Request Count: The total number of requests that ELB handles in a set period. Request counts naturally vary as demand for your application ebbs and flows. However, sudden spikes or drops that are inconsistent with historical traffic patterns for that time of day or day of the week could signal a problem, such as users being unable to reach your application.
[Image showing total routed requests per minute for a given ELB.]
- Latency: The time it takes for one of your instances to start responding to a request from ELB. High latency could be a sign of problems such as a network issue or an under-provisioned EC2 instance struggling to handle all of its requests.
[Image showing the average latency of a given ELB.]
- Unhealthy Host Count: ELB performs health checks on instances and uses this metric to count those it deems unhealthy, meaning they are not ready to handle requests and may be down. Monitor this metric to ensure you always have enough healthy instances to handle application demand.
[Image showing the number of healthy and unhealthy hosts on a given ELB.]
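
Here is a minimal Python sketch of comparing request counts with historical traffic patterns for the same time of day and day of the week. It rests on the following assumptions:
- The samples are hypothetical (timestamp, request count) pairs.
- The baseline is the median of past counts from the same weekday and hour.
- The 50 percent tolerance is arbitrary.
- The healthy-host check takes the total number of registered hosts and the number you need to meet demand as inputs you supply. The sketch does not derive either value.

```python
from datetime import datetime
from statistics import median

def same_slot_history(samples, at):
    """Counts recorded on the same weekday and hour as `at`, strictly before it.
    `samples` is a list of (datetime, request_count) pairs."""
    return [count for ts, count in samples
            if ts < at and ts.weekday() == at.weekday() and ts.hour == at.hour]

def request_count_anomaly(history, current, tolerance=0.5):
    """True if `current` differs from the median of `history` by more than
    `tolerance` (0.5 = 50 percent). The tolerance is an illustrative assumption."""
    if not history:
        return False
    baseline = median(history)
    if baseline == 0:
        return current > 0
    return abs(current - baseline) / baseline > tolerance

def enough_healthy_hosts(total_hosts, unhealthy_host_count, required_healthy):
    """True if the hosts not counted as unhealthy still cover the required capacity."""
    return total_hosts - unhealthy_host_count >= required_healthy

samples = [
    (datetime(2021, 8, 2, 9), 12_000),   # Monday 09:00
    (datetime(2021, 8, 9, 9), 11_800),   # Monday 09:00
    (datetime(2021, 8, 16, 9), 12_500),  # Monday 09:00
    (datetime(2021, 8, 17, 9), 4_000),   # Tuesday: different slot, ignored
]
now = datetime(2021, 8, 23, 9)           # Monday 09:00
history = same_slot_history(samples, now)
print(history)                                         # [12000, 11800, 12500]
print(request_count_anomaly(history, 3_000))           # True
print(enough_healthy_hosts(6, 3, required_healthy=4))  # False
```

Using the median rather than the mean keeps a single unusual week, such as a past outage, from skewing the baseline.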

The metrics and services described here are just the tip of the iceberg for AWS service monitoring. Depending on your deployment, you may wish to track several other metrics for each service, and you may also want to keep an eye on cloud spend.

Cloud Spend

If you are not sure how much you are spending on AWS, Splunk Infrastructure Monitoring can also help. With the AWS Optimizer, you can quickly identify cost-saving opportunities. I recommend checking out our docs page on the AWS Optimizer and Greg's excellent blog post, How to Optimize Your Cloud Spend Using Observability, for examples of the AWS Optimizer in action.

So, wherever you are in your cloud journey, Splunk Infrastructure Monitoring can help you get a clear understanding of what's going on with your infrastructure. Want to learn more about how Splunk Observability Cloud works with AWS to bring you meaningful insights? Watch this video on monitoring AWS workloads with Splunk Infrastructure Monitoring and sign up for your free trial today.

Disclaimer

Splunk Inc. published this content on 25 August 2021 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 25 August 2021 20:00:04 UTC.
