Software reliability and resiliency don't just happen by moving your software to a modern stack or your workloads to the cloud. There is no 'Resiliency as a Service' you can connect to via an API that makes your service withstand chaotic situations. The fact is, reliability and resiliency must be rooted in the architecture of a distributed system. The path to being 'Architected for Resiliency' is long, but it clearly pays off in the long run, especially when outages occur, as I want to show you in this blog post.

This article was inspired by an email I received on Friday, June 11th, from Thomas Reisenbichler, Director of Autonomous Cloud Enablement. The subject line read: 'Success Story: Major Issue in single AWS Frankfurt Availability Zone!' The email walked through how Dynatrace self-monitoring notified us of the outage and how the problem was automatically remediated thanks to our platform's architecture. This meant there was no negative impact on our end users, Service Level Objectives (SLOs), or Service Level Agreements (SLAs). The last sentence of the email was what made me want to share this story publicly, as it's a testimonial to how modern software engineering and operations should make you feel. It read: 'Observing this issue, it was an honor to have the possibility to 'just' sit next to it and do a little bit of babysitting knowing that we are coping very well with this failure!'

What Thomas meant was that the dashboard showed how the Dynatrace architecture automatically redirected traffic to the remaining nodes that were not impacted by the issue, thanks to our multi-availability zone deployment, as you can see below:

Dynatrace self-monitoring showing how 'Architected for Resiliency' works in action: Fully automated, Zero Impact!

Ready to learn more? Then read on!

Fact #1: AWS EC2 outage properly documented

Let's start with some facts. There really was an outage, and AWS did a great job of notifying users about the service disruption via the AWS Service Health Dashboard. The following screenshot shows the problem reported on June 10th for the EC2 services in one of the Availability Zones in Frankfurt, Germany:

All major cloud service vendors provide great status updates and historical information about service status.

The problem started at 1:24 PM PDT, and services started to become available again about three hours later. The final status update came at 6:54 PM PDT with a very detailed description: a temperature rise initially caused the shutdown, and the fire suppression system then dispersed chemicals that prolonged the full recovery process.

Fact #2: No significant impact on Dynatrace Users

There are several ways Dynatrace monitors and alerts on the impact of service disruption. Let me start with the end-user impact.

Dynatrace provides both Real User Monitoring (RUM) and Synthetic Monitoring as part of our Digital Experience solution. Through the RUM data, Dynatrace's AI engine, Davis, detected that seven users were impacted by the outage when they tried to access the web interface. The number was this low because the automatic traffic redirect happened fast enough to keep the impact minimal. The screenshot below shows the opened problem ticket and the root-cause information:

Dynatrace detected that seven users had issues accessing the web UI and pointed to the root cause

Note to our Dynatrace users: This story triggered a feature request that will benefit every Dynatrace user in the future. The team wants to enrich the root-cause information in the Dynatrace problem ticket with external or third-party status details, such as the AWS Service Health status. This will eliminate the need to manually cross-check whether one of your third-party providers is currently experiencing an outage.
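
Until that feature lands, such a cross-check is easy to script yourself. The sketch below polls the public AWS status feed for EC2 in eu-central-1; the RSS URL format and the lack of authentication are assumptions based on how the AWS Service Health Dashboard worked at the time of writing, so verify the feed URL before relying on it in automation.

```python
# Minimal sketch: list recent events from the public AWS status feed for
# EC2 in eu-central-1 (Frankfurt). The RSS URL format is an assumption based
# on the AWS Service Health Dashboard; verify it before automating on it.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

FEED_URL = "https://status.aws.amazon.com/rss/ec2-eu-central-1.rss"  # assumed format

def recent_aws_events(feed_url: str = FEED_URL) -> list:
    """Return the title and publish date of every item currently in the feed."""
    with urlopen(feed_url, timeout=10) as response:
        tree = ET.parse(response)
    events = []
    for item in tree.getroot().iter("item"):
        events.append({
            "title": item.findtext("title", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return events

if __name__ == "__main__":
    for event in recent_aws_events():
        print(f"{event['published']}  {event['title']}")
```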

Fact #3: Minimal impact detected through synthetics

Besides real user analytics, we also use Dynatrace Synthetic Monitoring, which continuously validates successful logins to our SaaS tenants on each cluster. Those tests get executed from two locations (Paris and London) hosted by different cloud vendors (Azure & AWS).

During the outage, Dynatrace Synthetic detected only a single, very short connection timeout, as you can see below:

Dynatrace Synthetic detected a single login problem caused by a connection timeout over the course of the 3-hour outage

As a general best practice, synthetic tests are great for validating that your core use cases are always working as expected. In our case, that includes logging in to our SaaS tenants and exploring captured data. If those use cases don't work as expected, we want to get alerted.
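
To make the idea concrete, here is a minimal sketch of what such a login check boils down to conceptually. This is not how Dynatrace Synthetic is configured (that happens in the product or via its API); the endpoint, credentials, and timeout below are hypothetical placeholders.

```python
# Conceptual sketch of a synthetic login check; the URL, credentials, and
# timeout values are hypothetical placeholders, not a Dynatrace configuration.
import sys
import time

import requests

LOGIN_URL = "https://tenant.example.com/login"   # hypothetical endpoint
TIMEOUT_SECONDS = 10                             # fail fast on connection issues

def check_login(user: str, password: str) -> bool:
    """Attempt a login and report whether it succeeded within the timeout."""
    start = time.monotonic()
    try:
        response = requests.post(
            LOGIN_URL,
            data={"username": user, "password": password},
            timeout=TIMEOUT_SECONDS,
        )
        duration = time.monotonic() - start
        print(f"login returned {response.status_code} in {duration:.2f}s")
        return response.ok
    except requests.exceptions.RequestException as exc:
        # Connection timeouts like the one seen during the outage end up here.
        print(f"login check failed: {exc}")
        return False

if __name__ == "__main__":
    # In a real setup this runs on a schedule from multiple locations and
    # raises an alert instead of just exiting non-zero.
    sys.exit(0 if check_login("synthetic-user", "******") else 1)
```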

Tip: We see more of our users starting to shift left and GitOps-ify Dynatrace Synthetic. This means that synthetic tests are not just used in production but also in pre-production environments to validate environment stability, e.g., do I have a stable build in a QA or test environment or not? Thanks to our automation APIs and our open-source project Monaco (Monitoring as Code), the creation and updating of those synthetic tests are fully embedded into their GitOps automation: Dynatrace Synthetic test definitions are version-controlled in Git as YAML and get automatically rolled out as part of their delivery automation, e.g., via Jenkins, GitLab, Azure DevOps, or Keptn.
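
In practice, Monaco takes care of reading those versioned YAML files and calling the Dynatrace API for you. Purely to illustrate the pattern, here is a small sketch of what such a pipeline step does under the hood; the /api/v1/synthetic/monitors endpoint path and the payload structure are assumptions for illustration, so follow the Monaco and Dynatrace API documentation rather than this snippet for real rollouts.

```python
# Illustrative pipeline step: read synthetic monitor definitions that are
# version-controlled as YAML and push them to a Dynatrace environment.
# The endpoint path and payload shape are assumptions for illustration only;
# in practice, Monaco (Monitoring as Code) does this for you.
import os
from pathlib import Path

import requests
import yaml  # PyYAML

DT_ENVIRONMENT = os.environ["DT_ENVIRONMENT"]  # e.g. https://abc123.live.dynatrace.com
DT_API_TOKEN = os.environ["DT_API_TOKEN"]      # token with synthetic write scope

def apply_monitor_definitions(directory: str = "synthetic-monitors") -> None:
    """POST every YAML definition found in the given Git-tracked directory."""
    for path in sorted(Path(directory).glob("*.yaml")):
        definition = yaml.safe_load(path.read_text())
        response = requests.post(
            f"{DT_ENVIRONMENT}/api/v1/synthetic/monitors",  # assumed endpoint
            headers={"Authorization": f"Api-Token {DT_API_TOKEN}"},
            json=definition,
            timeout=30,
        )
        response.raise_for_status()
        print(f"applied {path.name}: HTTP {response.status_code}")

if __name__ == "__main__":
    apply_monitor_definitions()
```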

Fact #4: Multi-node, multi-availability zone deployment architecture

I already mentioned at the beginning of this blog that resiliency and reliability do not come for free; they must be part of your architecture. And that's true for Dynatrace as well. You can find a lot of information about the Dynatrace architecture online, both for our SaaS and Managed deployments.

The Dynatrace cluster architecture for SaaS and Managed follows high-availability and resiliency practices, allowing it to operate without problems even in chaotic situations

I wanted to highlight a couple of essential elements that are key to Dynatrace's resilience against a data center (i.e., AWS Availability Zone) outage:

  1. High availability due to multi-AZ Dynatrace cluster node deployments
  2. Rack-aware Cassandra deployments

Let's have a quick look at Dynatrace Smartscape to see how our cluster node services are truly distributed across multiple EC2 hosts in different Availability Zones:

The live dependency information not only visualizes the successful deployment but also fuels Dynatrace's anomaly and root cause detection

Health-based load balancing automatically redirects incoming traffic to healthy nodes, so when a host becomes unavailable, consumers of Dynatrace services (via the web UI or API) never experience any issues. This makes the deployment resilient even to full data center (e.g., Availability Zone) outages.
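
To illustrate the principle (and only the principle; this is not Dynatrace's actual load balancer), here is a small sketch of health-based routing: traffic is only ever handed to nodes whose health check currently passes, so a failing Availability Zone simply drops out of the rotation.

```python
# Conceptual sketch of health-based routing across Availability Zones.
# This illustrates the principle, not Dynatrace's implementation.
import random
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    zone: str
    healthy: bool = True  # in reality, updated by periodic health checks

NODES = [
    Node("node-1", "eu-central-1a"),
    Node("node-2", "eu-central-1b"),
    Node("node-3", "eu-central-1c"),
]

def pick_node(nodes: list) -> Node:
    """Route a request to any currently healthy node."""
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        raise RuntimeError("no healthy nodes available")
    return random.choice(healthy)

if __name__ == "__main__":
    # Simulate an Availability Zone outage: all nodes in 1a become unhealthy.
    for node in NODES:
        if node.zone == "eu-central-1a":
            node.healthy = False
    # Requests keep being served by the remaining zones.
    for _ in range(5):
        print(f"request routed to {pick_node(NODES).name}")
```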

As for Cassandra: for a 3-node Dynatrace SaaS cluster, we deploy 9 Cassandra nodes in a rack-aware configuration, where each rack maps to an AWS Availability Zone. If one zone goes down, traffic gets redirected to the remaining Cassandra nodes. The following chart shows the distribution of nodes before, during, and after the outage (a small configuration sketch follows the chart):

High availability and rack-aware Cassandra node deployment in action: 6 nodes were able to handle the load of the missing 3 without a problem
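
For readers who haven't set up rack awareness in Cassandra before, the sketch below shows the general mechanics under standard Cassandra conventions: the snitch maps each node to a data center and rack (here, an AWS region and Availability Zone), and a keyspace using NetworkTopologyStrategy spreads replicas across those racks. The node addresses, keyspace name, and replication factor are illustrative, not Dynatrace's actual configuration.

```python
# Illustrative rack-aware Cassandra setup (not Dynatrace's actual configuration).
#
# On each Cassandra node, the GossipingPropertyFileSnitch reads
# cassandra-rackdc.properties to learn its data center and rack, e.g.:
#     dc=eu-central-1
#     rack=eu-central-1a
#
# A keyspace using NetworkTopologyStrategy then spreads its replicas across
# racks, so losing one Availability Zone still leaves replicas in the others.
from cassandra.cluster import Cluster  # DataStax Python driver

# Contact points in three different Availability Zones (illustrative addresses).
cluster = Cluster(["10.0.1.10", "10.0.2.10", "10.0.3.10"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS monitoring_data
    WITH replication = {'class': 'NetworkTopologyStrategy', 'eu-central-1': 3}
""")

# With replication factor 3 and rack-aware placement, each of the three
# Availability Zones holds a replica; quorum reads and writes keep succeeding
# even if one zone disappears.
cluster.shutdown()
```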

The Dynatrace deployment also contains Elasticsearch and our ActiveGates. Thanks to our multi-node and multi-datacenter deployments, all these components provide the same high availability and resiliency. That's why the complete Dynatrace Software Intelligence Platform is 'Architected for Resiliency'.

Tip: Our Managed customers have the same high availability and resiliency features. For more information, check out our documentation on fault domain awareness, such as rack-aware Managed deployments.

Conclusion: Investing in resilient architecture is CONTINUOUS

This story proves that high availability and resiliency must be planned from the start when designing a distributed system. Built-in monitoring is the only way to validate that these systems work as designed, and alerting is the insurance that notifies you of corner cases so you can reduce the risk of negative end-user impact.

I was also reminded that resilient architecture is not a 'one-time investment'. It needs continuous attention and focus. At Dynatrace, we built our current architecture years ago, and to ensure it still withstands challenging situations, every new feature gets evaluated against non-functional requirements such as resiliency and performance. Our dynamic growth in engineering has also made us invest in continuous training for new and existing hires. And to give engineers feedback on the potential impact of code changes, we have an automated continuous performance environment that battle-tests new versions before admitting them to production.

Before saying goodbye, let me say thanks to our Dynatrace Engineering and everyone involved in designing and building such a resilient system architecture. I also want to say thank you to Thomas Reisenbichler for bringing this story to my attention, to Thomas Steinmaurer for giving me additional background information, and to Giulia Di Pietro for helping me finalize the blog post.

For more on Chaos Engineering & Observability, be sure to register for my upcoming webinar with Gremlin.
