As the adoption of AI in the enterprise continues to expand at a rapid pace, AI training workflows are becoming more complex. Data scientists and data engineers often need to pull data from multiple data sources, and these sources aren't always compatible with each other. This presents a major challenge and causes many AI projects to underdeliver or even fail outright. It is now imperative that data scientists and data engineers have the tools to construct unified data pipelines that span different data sources, environments, platforms, and protocols. The latest version of the NetApp® Data Science Toolkit, version 1.1, enables data scientists and data engineers to directly trigger the movement of datasets, either on demand or as a step in an automated workflow. Here's a rundown of what's new in version 1.1.

Triggering a sync operation for a Cloud Sync relationship

You can now use the Data Science Toolkit to synchronize a NetApp Cloud Sync relationship that you previously created in your NetApp Cloud Central account. The Cloud Sync service can replicate data to and from various file and object storage systems. Use cases include the following (a short example of triggering a sync appears after the list):

  • Replicating new sensor data from the edge back to the core data center or to the cloud. You can use this data for artificial intelligence or machine learning (AI/ML) model training or retraining.
  • Replicating a newly trained or updated model from the core data center to the edge or to the cloud to be deployed as part of an inferencing application.
  • Replicating data from a Simple Storage Service (S3) data lake to a high-performance environment to use in training an AI/ML model.
  • Replicating data from a Hadoop data lake (through Hadoop NFS Gateway) to a high-performance environment to use in training an AI/ML model.
  • Saving a new version of a trained model to an S3 or Hadoop data lake for permanent storage.
  • Replicating NFS-accessible data from a legacy or third-party system of record to a high-performance environment for use in training an AI/ML model.
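If you import the toolkit as a Python library, triggering a sync operation can be a one-liner in a training or retraining script. The sketch below assumes the module and function names documented in the toolkit's README (ntap_dsutil, syncCloudSyncRelationship) and uses a placeholder relationship ID; confirm the exact signatures in the GitHub repository before relying on it.

    # Minimal sketch: trigger a sync for an existing Cloud Sync relationship.
    # Module, function, and parameter names are assumptions based on the
    # toolkit's documented conventions; the relationship ID is a placeholder.
    from ntap_dsutil import syncCloudSyncRelationship

    syncCloudSyncRelationship(
        relationshipID="5ed00996ca85650009a83db2",  # ID from Cloud Central/Cloud Manager
        waitUntilComplete=True,                     # block until the transfer finishes
        printOutput=True
    )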

Triggering a sync operation for an asynchronous Mirror or Vault relationship

Using the Data Science Toolkit, you can now synchronize an existing NetApp SnapMirror® relationship whose destination volume is on your storage system. SnapMirror volume replication technology quickly and efficiently replicates data between NetApp storage systems. For example, you can gather data from another storage system and replicate it to your own storage system for AI/ML model training or retraining.
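The same pattern applies when you use the toolkit as a Python library: pass the UUID of an existing SnapMirror relationship and, optionally, wait for the transfer to complete. The function and parameter names below are assumptions drawn from the toolkit's documented conventions, and the UUID is a placeholder; check the README for the exact signatures.

    # Minimal sketch: trigger a sync for an existing SnapMirror relationship
    # whose destination volume is on this storage system.
    # Names are assumptions; the UUID is a placeholder.
    from ntap_dsutil import syncSnapMirrorRelationship

    syncSnapMirrorRelationship(
        uuid="9e922e65-1cbb-4b55-b0b3-1d0e7a40c0c6",
        waitUntilComplete=True,
        printOutput=True
    )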

Pulling data from S3

The Data Science Toolkit now lets you pull one or more objects from an S3 bucket. When you pull multiple objects, the operation is multithreaded, so you'll get better performance than if you loop through objects and pull them one at a time. This S3 pull functionality is particularly useful when a NetApp ONTAP® system is your AI training environment, but you need to collect training data from an S3 object storage data lake, such as NetApp StorageGRID® or an S3-compliant object store in the cloud. One caveat: The multithreaded S3 pull functionality hasn't been tested at scale, so it might not be appropriate for extremely large datasets.
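For example, a training script might pull an entire key prefix from a bucket into a locally mounted ONTAP volume before kicking off a training run. The sketch below assumes function names and parameters along the lines of the toolkit's README (pullBucketFromS3, pullObjectFromS3); the bucket, prefix, and paths are placeholders.

    # Minimal sketch: pull objects from S3 into a locally mounted volume.
    # Function and parameter names are assumptions; values are placeholders.
    from ntap_dsutil import pullBucketFromS3, pullObjectFromS3

    # Pull every object under a prefix; the toolkit parallelizes the transfers
    pullBucketFromS3(
        s3Bucket="training-data",
        s3ObjectKeyPrefix="images/",
        localDirectory="/mnt/dataset/",
        printOutput=True
    )

    # Pull a single object
    pullObjectFromS3(
        s3Bucket="training-data",
        s3ObjectKey="labels/train.csv",
        localFile="/mnt/dataset/labels/train.csv",
        printOutput=True
    )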

Pushing data to S3

The toolkit also lets you push data to an S3 bucket. As with the S3 pull functionality, the S3 push functionality lets you push one file or multiple files; for multiple files, the operation is multithreaded. This S3 push functionality is useful for saving trained models and updated datasets in an S3 object storage data lake. The same caveat applies here: Because the multithreaded S3 push functionality hasn't been tested at scale, it might not be appropriate for extremely large datasets.
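A typical use is archiving a trained model directory at the end of a training run. The sketch below assumes function names and parameters along the lines of the toolkit's README (pushDirectoryToS3, pushFileToS3); the bucket and paths are placeholders.

    # Minimal sketch: push trained-model artifacts to an S3 bucket.
    # Function and parameter names are assumptions; values are placeholders.
    from ntap_dsutil import pushDirectoryToS3, pushFileToS3

    # Push a whole directory; transfers are parallelized across files
    pushDirectoryToS3(
        s3Bucket="model-archive",
        localDirectory="/mnt/models/resnet50-v3/",
        s3ObjectKeyPrefix="resnet50/v3/",
        printOutput=True
    )

    # Push a single file
    pushFileToS3(
        s3Bucket="model-archive",
        localFile="/mnt/models/resnet50-v3/model.pt",
        s3ObjectKey="resnet50/v3/model.pt",
        printOutput=True
    )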

New data volumes are now thin-provisioned by default

The toolkit also includes a minor enhancement to the 'create volume' operation. New volumes that you create are now thin-provisioned by default. A thin-provisioned volume is one for which storage space isn't reserved in advance. Instead, space is allocated dynamically as data is written, and free space is released back to the storage system when data in the volume is deleted. This approach helps you use your storage space more efficiently, because capacity isn't sitting reserved, and unused, in volumes that aren't full. If you prefer to fully allocate storage space for a volume at creation time, you can specify an optional parameter that makes the system guarantee sufficient space for the volume's full capacity.
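In practice, the difference shows up as a single optional argument when you create a volume from Python. The sketch below assumes a createVolume function and a space-guarantee parameter as described in the toolkit's README; names and values are placeholders that illustrate the default versus the fully allocated case.

    # Minimal sketch: create a new data volume.
    # Function and parameter names are assumptions; values are placeholders.
    from ntap_dsutil import createVolume

    # Default: thin-provisioned; space is consumed only as data is written
    createVolume(volumeName="project1_data", volumeSize="10TB", printOutput=True)

    # Optional: reserve the full capacity up front (thick provisioning)
    createVolume(volumeName="project1_scratch", volumeSize="10TB",
                 guaranteeSpace=True, printOutput=True)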

Learn more

For a full list of enhancements, fixes, and changes in version 1.1, refer to the release notes. To download the latest version of the NetApp Data Science Toolkit, visit the toolkit's GitHub repository.

