What's in a name? Quite a lot, actually. A product's name implies certain things: its functionality and its intended use. A name can be enlightening, or it can be confusing.

After extensive conversations with our customer community, we have decided to rename the NetApp Data Science Toolkit to NetApp DataOps Toolkit. We know that a name change can be disruptive, so we did not make this decision lightly. We believe that this name better reflects the function of the toolkit.

The name
Why the name NetApp® DataOps Toolkit? Although the toolkit was originally developed for data scientists and is still very much targeted toward data scientists, it is not a data science framework like TensorFlow or PyTorch. The name Data Science Toolkit could be misunderstood to imply that the product is similar to those frameworks. Instead, our toolkit simplifies data and storage operations for data scientists, data engineers, and developers. The DataOps Toolkit works with data science frameworks like TensorFlow and PyTorch, which simplify the training of deep learning (DL) models. The NetApp DataOps Toolkit, on the other hand, simplifies operations relating to the data that data science frameworks use to train models.
Accelerating AI workflows
For those of you who are new to the NetApp DataOps Toolkit, let's review the toolkit's capabilities. The toolkit has two key features that can greatly streamline AI workflows.

With the NetApp DataOps Toolkit, a data scientist can almost instantaneously create a space-efficient data volume that's an exact copy of an existing volume, even if the existing volume contains terabytes or even petabytes of data. Data scientists can quickly create clones of datasets that they can reformat, normalize, and manipulate, while preserving the original "gold-source" dataset. Under the hood, these operations use highly efficient and battle-tested NetApp cloning technology, but they can be performed by a data scientist without storage expertise. What used to take days or weeks (and the assistance of a storage administrator) now takes seconds.
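As a sketch of what this looks like in practice, a clone operation is a single CLI call. The command and flag names below are illustrative assumptions, not the authoritative syntax; consult the toolkit's GitHub repository for the exact interface.

```shell
# Hypothetical example: create a near-instant, space-efficient clone of a
# multi-terabyte "gold-source" dataset volume for experimentation.
# Command name, subcommands, and flags are assumptions based on the
# toolkit's CLI style; verify against the GitHub documentation.
netapp_dataops_cli.py clone volume \
    --name=experiment_a_dataset \
    --source-volume=gold_source_dataset
```

Because the clone is backed by NetApp cloning technology, it consumes almost no additional space until the data scientist modifies it, and the original volume is never touched.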


Data scientists can also save a space-efficient, read-only copy of an existing data volume. Based on the famed NetApp Snapshot™ technology, this functionality can be used to version datasets and implement dataset-to-model traceability. In regulated industries, traceability is a baseline requirement, and implementing it is extremely complicated with most other tools. With the NetApp DataOps Toolkit, it's quick and easy.
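A hedged sketch of the snapshot workflow: tagging a dataset volume with a named, read-only snapshot before a training run, so the model can later be traced back to the exact dataset state it was trained on. As above, the command and flag names are assumptions; the toolkit's GitHub repository has the authoritative syntax.

```shell
# Hypothetical example: save a read-only, space-efficient snapshot of a
# dataset volume to version it ahead of a training run.
# Command name and flags are assumptions; verify against the GitHub docs.
netapp_dataops_cli.py create snapshot \
    --volume=training_dataset \
    --name=dataset_v1_before_training
```

Recording the snapshot name alongside the resulting model's metadata is one simple way to implement the dataset-to-model traceability described above.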

The NetApp DataOps Toolkit comes in two flavors: one for Kubernetes-based environments, and one for traditional virtualized or bare-metal environments. Users can take advantage of the toolkit's capabilities in whatever type of environment they operate in.
Other use cases
Another reason that we decided to change the name of the toolkit is that its usefulness is not limited to data science use cases. Working together with our customers, we have been hard at work applying the toolkit to other use cases involving other types of users. What other use cases, you ask? Stay tuned to NetApp blogs to find out! We have another blog coming soon that focuses on some additional benefits of the toolkit.

The name may have changed, but the features and benefits haven't. With the NetApp DataOps Toolkit, data management is not an impediment to a fast, streamlined AI process. To learn more about the toolkit, visit its GitHub repository. To learn more about all of NetApp's AI solutions, visit www.netapp.com/ai.


Disclaimer

NetApp Inc. published this content on 02 August 2021 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 04 August 2021 09:45:02 UTC.