By Philipp Drieger July 22, 2021

We're excited to share that the Deep Learning Toolkit App for Splunk (DLTK) is now available in version 3.6 for Splunk Enterprise and Splunk Cloud. The latest release includes:

  • several bug fixes
  • library updates
  • drill-down links on the container management dashboard
  • four new algorithm examples that take advantage of the updated libraries available in the Golden Image CPU 3.6

Let's get started with the new operational overview dashboard, which was built with Splunk's brand-new Dashboard Studio functionality; I highly recommend checking it out. You can learn more about it in this recent Tech Talk, which you can watch on demand.

Automated Machine Learning

Building machine learning models often means iterating over your model to find the best set of parameters for a given dataset. Doing this manually takes quite some time, so you might want to think about ways to automate it. We discussed a grid-search-based approach in the blog post for the DLTK 3.4 release. Recently, my brilliant colleague Lukas worked with auto-sklearn, an automated machine learning library, and contributed a notebook and dashboard example showing how to apply this concept to a given dataset. Auto-sklearn automatically runs various machine learning algorithms over your dataset, builds models, scores them, tunes them further and finally provides you with the best model it found within a few user-defined constraints. However, this comes at the cost of compute, something to be aware of if you decide to apply it to your own datasets.
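
To make the idea concrete, here is a minimal sketch of the auto-sklearn workflow in Python. This is not the contributed DLTK notebook itself; the demo dataset and the time budgets below are assumptions chosen for illustration.

```python
# Minimal auto-sklearn sketch (assumptions: demo dataset and time budgets; not the DLTK notebook code)
import autosklearn.classification
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Constrain the overall search and the time per candidate model -
# this is where the compute cost is controlled.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,  # total seconds for the whole search
    per_run_time_limit=30,        # seconds per candidate model
)
automl.fit(X_train, y_train)

print(automl.sprint_statistics())  # summary of the automatic model building process
print("test accuracy:", accuracy_score(y_test, automl.predict(X_test)))
```

The important design choice is the time budget: the longer auto-sklearn is allowed to search, the more candidate pipelines it can evaluate, at a correspondingly higher compute cost.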

The following dashboard shows the results of a classification model that was automatically built on top of a given example dataset. Under the scoring results you can find the model summary which shows more details about the automatic model building process. This approach can be useful in two typical scenarios:

  1. You have already built a model, either with Splunk's Machine Learning Toolkit or some other techniques, and you want to know if it can be further improved.
  2. You have an unknown dataset and simply want a model built automatically, without spending time manually determining the right algorithms and their parameters.

Last but not least, please keep in mind that this approach helps you automate the modelling part; it still requires you to think about and really understand the business problem you want to solve and which datasets you actually use for modelling.

Robust Random Cut Forest

In the last blog post, for the DLTK 3.5 release, we discussed various new approaches for anomaly detection, especially in time series data. My colleague Greg recently wrote about cyclical statistical forecasts and anomalies. But of course there are even more useful techniques you may want to use. One of them is the so-called Robust Random Cut Forest, which is certainly not a tool for randomly cutting trees in your favourite forest nearby, but a tree-based modelling technique that tells you which data points in your dataset are the most anomalous ones. As you can see in the chart below, the red line marks the outliers found in a simple time series dataset. The typical cyclical behavior is automatically learnt to be normal, and the parts that show stronger deviations are flagged as outliers.
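
As a rough illustration of the idea, here is a streaming sketch using the open-source rrcf package. This is not necessarily the implementation shipped with the DLTK notebook; the synthetic sine-wave data, forest size and window length are assumptions.

```python
# Streaming Robust Random Cut Forest sketch with the open-source 'rrcf' package
# (assumptions: synthetic cyclical data, forest size and window length)
import numpy as np
import rrcf

n = 730
t = np.arange(n)
series = np.sin(2 * np.pi * t / 50) + 0.1 * np.random.randn(n)
series[300:320] += 4  # inject an anomalous segment into the cyclical signal

num_trees, tree_size = 40, 256
forest = [rrcf.RCTree() for _ in range(num_trees)]
avg_codisp = np.zeros(n)

for index, point in enumerate(series):
    for tree in forest:
        # Keep each tree at a fixed size by forgetting the oldest point
        if len(tree.leaves) > tree_size:
            tree.forget_point(index - tree_size)
        tree.insert_point(point, index=index)
        # Collusive displacement acts as the anomaly score for this point
        avg_codisp[index] += tree.codisp(index) / num_trees

print("most anomalous indices:", np.argsort(avg_codisp)[-5:])
```

Points with a high average collusive displacement are the ones the forest considers hardest to fit into the learnt structure, which is why the anomalous segment stands out against the normal cyclical pattern.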

Time Series Decomposition

Another useful preprocessing technique when working with time series data is to decompose it in order to analyze whether the data contains seasonality and trend. This technique is often referred to by the acronym STL (Seasonal-Trend decomposition using Loess) and is helpful for forecasting scenarios as well as for anomaly detection. The example below shows how the raw time series is decomposed into its seasonal and trend components plus the remaining residual, which can subsequently be used to detect anomalies, as shown in the panel at the bottom of the dashboard.
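
For reference, a minimal sketch of the same idea with the STL implementation in statsmodels is shown below. The file name, column names, seasonal period and the 3-sigma residual threshold are assumptions; the DLTK notebook may use a different implementation or parameters.

```python
# STL decomposition sketch with statsmodels (assumptions: file name, column names,
# hourly data with daily seasonality, and the 3-sigma residual threshold)
import pandas as pd
from statsmodels.tsa.seasonal import STL

df = pd.read_csv("cyclical_business_process.csv", parse_dates=["_time"], index_col="_time")

# period = number of observations per seasonal cycle (e.g. 24 for hourly data with daily seasonality)
result = STL(df["value"], period=24).fit()

decomposed = pd.DataFrame({
    "trend": result.trend,
    "seasonal": result.seasonal,
    "residual": result.resid,
})

# Flag points whose residual deviates strongly from the rest as candidate anomalies
threshold = 3 * decomposed["residual"].std()
anomalies = decomposed[decomposed["residual"].abs() > threshold]
print(anomalies.head())
```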

If you want to apply this technique directly on your stream of data, Splunk's Data Stream Processor has an online STL algorithm available as well.

Sentiment Analysis with spaCy

Analyzing sentiment in natural language text can be a very useful way to better understand customer interactions with business services, product reviews or even social media feeds. Despite the challenge of language ambiguity and the complexity of working with natural language data, libraries like spaCy provide an easy starting point for building these kinds of analytics. The following example shows how a basic sentiment analysis can easily be applied to any text data available in Splunk. The results, such as sentiment polarity or subjectivity, can then be used for further investigation, aggregations or trend analysis.
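
spaCy itself does not ship a polarity/subjectivity scorer, so a common pattern, and an assumption on my part here, is to pair it with TextBlob (or the spacytextblob extension). A minimal sketch of that idea, with made-up example texts:

```python
# Polarity/subjectivity sketch with TextBlob (assumptions: example texts;
# the DLTK notebook builds on spaCy and may wire the scoring up differently)
from textblob import TextBlob

reviews = [
    "The support team resolved my issue quickly, great service!",
    "The app keeps crashing and nobody responds to my tickets.",
]

for text in reviews:
    sentiment = TextBlob(text).sentiment
    # polarity: -1 (negative) .. +1 (positive); subjectivity: 0 (objective) .. 1 (subjective)
    print(f"{sentiment.polarity:+.2f}  {sentiment.subjectivity:.2f}  {text}")
```

In a DLTK setup, scores like these would typically be returned to Splunk as additional fields per event, so they can be aggregated or trended with standard SPL.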

If you want to apply sentiment analysis directly on a data stream with the help of Splunk's Data Stream Processor, you can check out the sentiment analysis functions there.

Finally, I want to give kudos to three contributors. A big shout out goes to Andreas Zientek from Zeppelin for his super useful contribution to the container management dashboard. Secondly, I want to thank my colleague Lukas Utz for all his work to make auto-sklearn easily accessible in DLTK. And another big thanks to Greg Ainslie-Malik who contributed all the other examples as a result of recent customer engagements.

We all hope that you can make good use of those new techniques and get even more value from your data in Splunk! If you have any questions, please engage with the community or reach out to us - we're happy to help with your MLTK or DLTK use cases!

Happy Splunking,

Philipp
