Cloudera : Apache Ozone – A High Performance Object Store for CDP Private Cloud

10/15/2021 | 08:42am EST

As organizations wrangle with the explosive growth in data volume they face today, the efficiency and scalability of storage become pivotal to operating a successful data platform that drives business insight and value. Apache Ozone is a distributed, scalable, high-performance object store, available with Cloudera Data Platform Private Cloud. CDP Private Cloud uses Ozone to separate storage from compute, which enables it to handle billions of objects on-premises, akin to Public Cloud deployments that benefit from the likes of S3. Ozone is also fully compatible with the S3 API*, establishing it as a future-proof solution and enabling CDP Hybrid Cloud to meet the growing demand for a hybrid data cloud.

Apache Ozone has added a new feature called File System Optimization ("FSO") in HDDS-2939. This feature has been merged upstream into the master branch and will be available in the next Ozone release. The FSO feature provides file system semantics (a hierarchical namespace) efficiently while retaining the inherent scalability of an object store. With FSO, Apache Ozone guarantees atomic directory operations, and renaming or deleting a directory is a simple metadata operation even if the directory has a large set of sub-paths (directories/files) within it. This gives Apache Ozone a significant performance advantage over other object stores in the data analytics ecosystem. Moreover, Ozone integrates seamlessly with Apache data analytics tools like Hive, Spark and Impala. In addition, operations such as Apache Hive drop-table queries, recursive directory deletion, and directory moves are now much faster and strongly consistent, leaving no partial results if a failure occurs.

Apache Ozone supports interoperability of the same data across use cases. For example, a user can ingest data into Apache Ozone using the FileSystem API, and the same data can be accessed via the Ozone S3 API*. This can improve the efficiency of a platform built on an on-premises object store.

Please refer to the Apache Ozone documentation for more details regarding Apache Ozone's atomicity guarantees.

In this blog post, we will look into benchmark test results measuring the performance of Apache Hadoop Teragen and a directory/file rename operation with Apache Ozone (native o3fs) vs. Ozone S3 API*. We enabled Apache Ozone's FSO feature for the benchmarking tests.

Job Committers:

Apache data analytics tools traditionally assume that rename and delete operations are strictly atomic. Most of them, including Apache Hive, Apache Impala, Apache Spark, and MapReduce (MR), write output to temporary locations and then rename it at the end of the job to make it publicly visible, so readers observe job output on an all-or-nothing basis. For example, the job committers of Hive and Impala require consistency of directory listing and atomicity of rename operations. Consequently, the performance of a query is directly affected by how quickly the intermediate rename operation completes. Below is a high-level view of Apache data analytics tools and their interactions with storage systems such as Apache HDFS, Apache Ozone, and S3-like object stores. Even though Ozone is an object store, it does not need any special output committers.
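The write-then-rename commit pattern described above can be sketched on a local file system. This is only an illustration under stated assumptions: the function name `write_job_output`, the `_temporary_` prefix, and the part-file names are hypothetical, and the sketch relies on the local file system (not Ozone or any Hadoop committer) making a directory rename atomic.

```python
import os
import tempfile
from typing import Mapping


def write_job_output(final_dir: str, parts: Mapping[str, str]) -> None:
    """Stage all part files in a temporary directory, then publish them
    with a single directory rename (the 'commit' step)."""
    parent = os.path.dirname(os.path.abspath(final_dir))
    # Stage next to the destination so the rename stays on one file system.
    staging = tempfile.mkdtemp(prefix="_temporary_", dir=parent)
    for name, body in parts.items():
        with open(os.path.join(staging, name), "w") as f:
            f.write(body)
    # Commit: readers see either no output directory or the complete one.
    os.rename(staging, final_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        out = os.path.join(root, "job_output")
        write_job_output(out, {"part-00000": "a\n", "part-00001": "b\n"})
        print(sorted(os.listdir(out)))
```

Because the only step that makes data visible is the final rename, the cost and atomicity of that rename determine both how quickly and how safely the job output appears.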

Performance comparison between Apache Ozone and S3 API*

  • Benchmarking Apache Ozone vs. S3 API* using Teragen:

We ran Apache Hadoop Teragen benchmark tests in a conventional Hadoop stack consisting of YARN and HDFS side by side with Apache Ozone. We used an Apache Hadoop S3A Filesystem connector to connect to the S3 API* and also used Hadoop's default file committer to commit work to S3.

The following measurements were obtained using Teragen for runs with data sizes of 1 GB, 10 GB and 100 GB, respectively. We performed multi-run testing (three runs) for each data size, and the performance numbers have been averaged, with a maximum deviation of ~10% between runs. The results show that native Ozone is faster than S3 object stores (e.g., S3 API*).
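As a rough illustration of how three runs can be summarized into an average and a maximum deviation, here is a minimal sketch. The post does not define "max deviation"; the code assumes it means the largest relative distance of any single run from the mean. The function name `summarize_runs` and the timings in the usage example are hypothetical and are not measurements from this benchmark.

```python
from statistics import mean


def summarize_runs(seconds):
    """Return (average, maximum relative deviation from the average).

    Assumption: deviation is measured relative to the mean of the runs.
    """
    avg = mean(seconds)
    max_dev = max(abs(s - avg) / avg for s in seconds)
    return avg, max_dev


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    avg, max_dev = summarize_runs([100.0, 95.0, 105.0])
    print(f"{avg:.1f} s, max deviation {max_dev:.0%}")
```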

  • File movement performance comparison:

We ran "hadoop mv" command tests on directories of 1 GB, 10 GB and 100 GB, respectively, stored in Apache Ozone and S3 API*. Each directory contained 30 uniformly sized files. Native Apache Ozone (o3fs) renamed the source directory to the destination directory in the same way as HDFS, unlike S3A (e.g., S3 API*), which copies each object and then deletes the original.
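The difference between the two rename strategies can be made concrete with a toy model. The two classes below are hypothetical and heavily simplified: `FlatKeyStore` treats a directory as a key prefix and renames by copy and delete, while `TreeStore` keeps a hierarchical namespace and renames by re-linking one entry (top-level directories only). Neither reflects how Ozone or S3 actually store metadata; the sketch only counts operations.

```python
class FlatKeyStore:
    """Flat namespace: a 'directory' is only a shared key prefix."""

    def __init__(self):
        self.objects = {}  # full key -> data

    def put(self, key, data):
        self.objects[key] = data

    def rename_dir(self, src, dst):
        """Copy then delete every object under the prefix.
        Returns the number of per-object operations performed."""
        ops = 0
        for key in [k for k in self.objects if k.startswith(src + "/")]:
            self.objects[dst + key[len(src):]] = self.objects[key]  # copy
            del self.objects[key]  # delete the original
            ops += 2
        return ops


class TreeStore:
    """Hierarchical namespace: a directory is a node holding its children."""

    def __init__(self):
        self.root = {}  # name -> nested dict (directory) or data (file)

    def put(self, path, data):
        *dirs, name = path.split("/")
        node = self.root
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = data

    def rename_dir(self, src, dst):
        """Re-link one top-level directory entry; children are untouched."""
        self.root[dst] = self.root.pop(src)
        return 1


if __name__ == "__main__":
    flat, tree = FlatKeyStore(), TreeStore()
    for i in range(30):  # 30 files, mirroring the test directory layout
        flat.put(f"src/file-{i}", b"x")
        tree.put(f"src/file-{i}", b"x")
    print(flat.rename_dir("src", "dst"))  # 60 operations (30 copies + 30 deletes)
    print(tree.rename_dir("src", "dst"))  # 1 operation
```

In the flat model the work grows with the number of objects (and, in a real store, with their size), and a failure midway leaves a partial result. In the tree model the rename is a single metadata update regardless of directory contents.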

The following chart shows that Ozone's performance for the move operation is of the same order as that of HDFS, while retaining the atomicity guarantee. We performed multi-run testing (three runs) for each directory size, and the performance numbers have been averaged, with a maximum deviation of ~10% between runs.

Test Environment Details:

The cluster setup consisted of 10 uniform physical nodes, each with 40-core Intel® Xeon® processors, 128 GB of RAM, 3 x 2 TB disks, 1 x 1 TB disk and a 10 Gb/s network, configured with 3 dedicated disks for data storage. The nodes ran CentOS 7 and Cloudera Runtime 7.5.1, which contains Hadoop 3.1.1, ZooKeeper 3.5.5 and Ozone built from the Apache master branch, version 1.1.0, GitHub commit hash 19ed79464ca9ed2210ca8ac47a4736fb67d8bd3e.

SSL/TLS was turned off, and the cluster ran in unsecure mode. High availability was enabled for the Apache Ozone service.

We used an Apache Hadoop S3A Filesystem connector to connect to the AWS S3 object store and also used Hadoop's default file committer to commit work to S3.

Conclusion

The benchmark results showed that Apache Ozone with the File System Optimization ("FSO") feature enabled was faster than an S3 API*-based object store, which makes it very attractive for high-performance, data-intensive workloads. With FSO, Ozone directory/file rename and delete operations are strongly consistent and deliver deterministic performance regardless of how many sub-paths (directories/files) the directory contains.

In short, Ozone with FSO helps users achieve the same atomicity guarantees as HDFS for job and task commits. This makes it natively integrated with Apache data analytics tools such as Hive, Spark and Impala, without the need for an S3Guard-like layer, while retaining its performance characteristics. Ozone in CDP Private Cloud provides out-of-the-box security integration with Apache Ranger and Apache Atlas. Furthermore, data stored in Ozone can be shared between use cases deployed as part of CDP as well as external third-party analytics, eliminating the need for data duplication, which in turn reduces risk and optimizes resource utilization.

Further Reading

Apache Ozone - Object Store Overview

Apache Ozone - Object Store Architecture

S3 API* - refers to the Amazon S3 implementation of the S3 API protocol.

Disclaimer

Cloudera Inc. published this content on 15 October 2021 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 15 October 2021 12:41:02 UTC.


© Publicnow 2021