Observability: It’s Not What You Think

May 04, 2021 at 04:41 pm EDT

By Greg Leffler May 04, 2021

What is Observability?

Observability is a mindset that lets you answer any question about your entire business by collecting and analyzing data. Ask other folks and you'll hear either the dry control-theory definition ('monitoring the internal state of a system by looking at its output') or the very technical one ('metrics, traces, and logs'). Both are correct, but Observability isn't a single thing you implement and then proudly declare 'now this system has Observability™.' Building Observability into your business lets you answer questions about the whole business, not just the systems it runs on.

What Kind of Questions?

Of course, Observability tools can answer basic questions like 'what happened in our app when this error count spiked?', but that barely scratches the surface of what Observability actually is. An Observability mindset lets you figure out why the error count spiked. If you know your app and all of its dependencies intimately, you might get that insight from a monitoring system, but as modern apps grow more complex, keeping their state in your head gets harder and harder. Business demands, feature launches, A/B tests, refactoring into microservices… all of these combine to create ever-increasing entropy, so knowing everything about your system unaided gets more difficult by the day.

Observability also lets you ask how (or if!) the errors actually affected the user experience. You can look at real user monitoring (RUM) data, purchase volume, general business metrics, marketing campaigns, customer support tickets, social media sentiment; the list goes on and on. This data turns an Observability system from something only a few people use into something the entire company can get insight from, and it lets you answer not just 'what,' but 'why' and 'how.' A true Observability suite lets you answer all of these:

'What made this break?'
'How effective was this ad?'
'Did this new front-end design drive purchases?'
'Did this service outage make our users angry?'

Why Should You Care?

Integrating this type of data into your system lets you discover, for example, that a marketing campaign sent to your best customers had a typo in its call-to-action URL and is sending them to a 404 page. Without this data in your Observability solution, sure, you can see that the 4xx rate increased, but you can't tell why, only that it went up. Imagine how much faster you could resolve the issue if, in addition to 'fe-server 4xx above threshold,' you also saw an event at the same moment reading 'marketing campaign whales-winback started.' You'd know not only what was wrong but also have a good guess at what caused it, plus a springboard for investigating the revenue impact, or negative goodwill, that the error caused.
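To make that correlation step concrete, here is a minimal sketch, assuming alerts and business events land in the same store as timestamped records. The `Event` type, the `events_near` helper, the 15-minute window, and the sample records (including the deploy entry) are hypothetical choices for illustration, not a Splunk API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """A timestamped record from any source: an alert, a deploy, a marketing send."""
    source: str
    description: str
    time: datetime


def events_near(alert: Event, events: list[Event], window: timedelta) -> list[Event]:
    """Return the other events that started within `window` before the alert fired."""
    return [
        e for e in events
        if e is not alert and alert.time - window <= e.time <= alert.time
    ]


# Hypothetical sample data sharing one store.
alert = Event("monitoring", "fe-server 4xx above threshold", datetime(2021, 5, 4, 14, 5))
events = [
    Event("marketing", "marketing campaign whales-winback started", datetime(2021, 5, 4, 14, 0)),
    Event("deploy", "checkout-service v2.3 rolled out", datetime(2021, 5, 4, 9, 30)),
    alert,
]

for e in events_near(alert, events, window=timedelta(minutes=15)):
    print(f"{e.time:%H:%M} [{e.source}] {e.description}")
```

Here only the campaign launch falls inside the window, so it surfaces next to the alert; the morning deploy does not. A real system would use richer matching than a fixed time window, but the point is that the join is only possible when both kinds of data live in one place.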

How is This Not Monitoring?

As I said, monitoring tells you something is wrong, but it doesn't tell you why. Monitoring setups can also only watch things you've already thought could be problematic (your 'known knowns'). If you didn't think to instrument the component in question in advance, you can't monitor it. Worse, if you then have a problem there and add monitoring to it, you still don't have historical data about how that component performed. Monitoring also demands attention before you even know what could go wrong: you have to instrument particular components and configure particular alerts for them, which takes time and is prone to errors.

Also, no matter how well instrumented your monitoring solution is, it still doesn't let you explore your business. Investigating 'unknown unknowns' isn't possible with a classic monitoring system, because the data you'd need simply doesn't exist. Business metrics are usually unsupported or poorly supported in traditional monitoring. Real-user data is almost never included in monitoring systems, which is absurd, because the entire point of what we do in web applications is delivering a user experience!
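One way to see the difference: if every request is retained as a raw event with its attributes, a question nobody planned for can still be answered from history. The sketch below assumes a hypothetical request log with `status`, `path`, and `utm_campaign` fields; the field names and sample records are invented for illustration.

```python
from collections import Counter

# Hypothetical raw request events, retained in full rather than pre-aggregated.
requests = [
    {"status": 200, "path": "/checkout", "utm_campaign": None},
    {"status": 404, "path": "/winbak", "utm_campaign": "whales-winback"},
    {"status": 404, "path": "/winbak", "utm_campaign": "whales-winback"},
    {"status": 200, "path": "/home", "utm_campaign": "spring-sale"},
    {"status": 404, "path": "/old-page", "utm_campaign": None},
]


def is_client_error(r: dict) -> bool:
    """True for any 4xx status code."""
    return 400 <= r["status"] < 500


# A pre-built monitor would only have tracked the overall 4xx count...
total_4xx = sum(1 for r in requests if is_client_error(r))

# ...but with raw events, a new question ("which campaign drives the 4xx errors?")
# can be answered retroactively, over data collected before anyone thought to ask.
errors_by_campaign = Counter(r["utm_campaign"] for r in requests if is_client_error(r))

print(total_4xx)
print(errors_by_campaign.most_common())
```

The aggregate count alone says only that 4xx errors rose; the breakdown, computed after the fact from retained events, points at the campaign.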

How Do the 'Three Pillars' Work Together?

Metrics, traces, and logs are the 'three pillars' of Observability. They are necessary, but not sufficient, for truly understanding Observability and gaining insight into your applications and business. Metrics tell you what's wrong. Traces tell you how it's wrong: which specific calls are failing, for example. Logs tell you why it's wrong, letting you dive into a particular metric or trace to understand the behavior you see. Collecting this data is the start of an Observability mindset, but only the start.
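A minimal sketch of that what/how/why workflow, assuming all three signals share a trace ID so they can be joined. The record shapes, service names, threshold, and values are hypothetical; in practice they would come from a metrics store, a tracing backend, and a log index.

```python
# Hypothetical signals for a checkout flow that calls a payments service.
metrics = {"checkout.error_rate": [0.01, 0.01, 0.12]}  # last three intervals

spans = [
    {"trace_id": "t1", "service": "checkout", "call": "GET /cart", "error": False},
    {"trace_id": "t2", "service": "payments", "call": "POST /charge", "error": True},
    {"trace_id": "t3", "service": "payments", "call": "POST /charge", "error": True},
]

logs = [
    {"trace_id": "t2", "message": "card processor timeout after 5000 ms"},
    {"trace_id": "t3", "message": "card processor timeout after 5000 ms"},
    {"trace_id": "t1", "message": "cart loaded"},
]

# 1. Metrics, WHAT is wrong: the latest error rate exceeds a chosen threshold.
THRESHOLD = 0.05  # hypothetical alerting threshold
alerting = metrics["checkout.error_rate"][-1] > THRESHOLD

# 2. Traces, HOW it is wrong: find which calls are failing.
failing = [s for s in spans if s["error"]]
failing_calls = {(s["service"], s["call"]) for s in failing}

# 3. Logs, WHY it is wrong: pull the log lines that belong to the failing traces.
failing_ids = {s["trace_id"] for s in failing}
reasons = sorted({entry["message"] for entry in logs if entry["trace_id"] in failing_ids})

print(alerting, failing_calls, reasons)
```

The shared trace ID is what lets each step narrow the next one; without it, the three pillars stay three separate piles of data.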

Why Do You Need Every Single Piece of Data?

A big problem with Observability, from a naive point of view, is that there's simply too much data to collect and retain all of it. 'You can't realistically store the amount of output generated by a modern service in one place,' goes the refrain. The solution most vendors propose is what they call sampling, and what I like to call 'throwing data away.' The data that gets thrown away might be your most critical customer's transaction. It might be the one particular use case that triggers some bizarre bug that crashes your database server. Worse, many vendors advertise this as a feature that will 'save you money.' I'll go into more detail about the hidden costs of sampling in a separate blog post.

In a classic control-theory world, where a bank of gauges monitors critical infrastructure, would you throw out 70% of the observations because '30% should be good enough'? Of course not, yet many vendors suggest this is simply what you must do with Observability because of the nature of the data. It isn't true: mature platforms can handle all the data about your business without throwing any of it away.
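The arithmetic behind that objection is simple, assuming uniform random (head-based) sampling in which each transaction is independently kept with probability p. Any one transaction is lost with probability 1 - p, and a failure that shows up in only k transactions vanishes entirely with probability (1 - p)^k. The helper name below is hypothetical.

```python
def prob_all_dropped(keep_rate: float, occurrences: int) -> float:
    """Probability that uniform random sampling discards every one of
    `occurrences` independent transactions, each kept with `keep_rate`."""
    if not 0.0 <= keep_rate <= 1.0:
        raise ValueError("keep_rate must be between 0 and 1")
    return (1.0 - keep_rate) ** occurrences


# Keeping 30% of traffic, as in the gauge analogy:
for k in (1, 2, 5, 10):
    print(f"failure seen in {k:>2} transaction(s): "
          f"{prob_all_dropped(0.30, k):.1%} chance it is never recorded")
```

With a 30% keep rate, a failure that hits a single transaction goes unrecorded 70% of the time, and even one that hits five transactions is missed about 16.8% of the time. Other sampling strategies change these numbers, but any strategy that discards data can discard the transaction you needed.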

Observability is Not a Practice, It's a Mindset

While this article has discussed some implementation details, Observability really isn't 'collect and store metrics, traces, and logs.' It's a mindset of asking 'what data should we collect that might help us answer any question we have about our business?' Observability is not merely application performance monitoring or infrastructure monitoring (though those are parts of it). It's about understanding the need to ingest everything: real-user experience metrics, marketing campaigns, seasonal changes in traffic, sick days taken by your warehouse team.

Observability is a mindset that requires a single source of truth for data about your business and your applications, one that everyone (developers, ops, product, the C-suite, etc.) uses. There are millions of data points that make up your business, and Observability is about capturing all of them in one system and then using them to answer questions that go beyond the technical app(s) your business runs.

To fully leverage Observability, you need a purpose-built streaming architecture that scales arbitrarily and gives you constant feedback on how your changes affect your users and your business. You need a system that bundles many tools into one common source of truth and provides insights across them.

You can experience this with a free trial of Splunk Observability Cloud and start getting insights today.

If you're not ready for the trial and want to learn more about getting started with Observability, check out our free Beginner's Guide to Observability.


Disclaimer

Splunk Inc. published this content on 04 May 2021 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 04 May 2021 20:40:05 UTC.