Data Quality for Dummies

By Pat Reilly | July 20, 2021

In this five-part series, FactSet's Pat Reilly, Director of Analytics, examines the theme of data governance and distribution through the lenses of data sourcing, integration, quality, analysis, and distribution across internal and external clients. Combined, these provide asset managers and asset owners with an overview of the key elements to be considered when constructing an efficient data governance and distribution process.

Part three takes on the theme of data quality; the full series can be downloaded here.

Garbage In, Garbage Out

While the expression applies to so many aspects of life, it's particularly relevant to asset managers and asset owners who are working to integrate and understand ever-increasing amounts of data from multiple sources in continually evolving contexts and platforms. As an investment professional, it's no longer enough to have just the data required to complete a given task. Rather, the world has pivoted to new use cases for standard data and the application of new content sets in the hopes of squeezing alpha (or minimizing risk) across a book of business.

Failing to understand or account for data quality and accuracy after integration is the bugaboo of the industry. Quality and accuracy issues are a massive governance headache, on par with data security and access. Where there's smoke around governance, there's usually fire around quality and accuracy. Missed deadlines, bad data delivered to clients, compliance breaches: these challenges and many others are caused by poor data quality and accuracy.

In FactSet's survey of Quantitative Analysts, Data Scientists, and Chief Data Officers from 50 global institutional asset managers, quality and accuracy were far and away the leading issues, ranking in the top three concerns for 54% and 46% of respondents, respectively.

That raises the question: how do you effectively solve the data quality challenge at scale across different data elements? As with successfully integrating diverse datasets, the answer is rooted in a robust, transparent process. While proprietary data has similarities with third-party content, there are nuances to each.

I Want You…To Have Quality Holdings Data!

Starting with proprietary data, quality assurance is a process that overlays integration. The number of portfolios or composites being loaded, the number of securities across the book of business, and the ending market value are all key elements where monitoring can be automated to check for daily deltas above or below a certain threshold. Where a proprietary dataset also includes firm-provided measures like analyst sentiment, in-house ratings, or relative value measures, consistency checks around file format can be implemented. Ideally, the process is built to warn and proceed or to fail outright, depending on the mission-critical status of an element.
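As a minimal sketch of such a threshold check (the `LoadSnapshot` fields and the warn/fail cutoffs are illustrative assumptions, not FactSet functionality):

```python
from dataclasses import dataclass

# Hypothetical summary of a nightly holdings load; field names are illustrative.
@dataclass
class LoadSnapshot:
    portfolio_count: int
    security_count: int
    ending_market_value: float

def check_load_delta(today: LoadSnapshot, prior: LoadSnapshot,
                     warn_pct: float = 0.05, fail_pct: float = 0.20) -> str:
    """Compare today's load against the prior close and classify the result.

    Returns "ok", "warn" (proceed with a notification), or "fail" (halt the
    pipeline), mirroring the warn-and-proceed / fail-outright behavior
    described above. The 5% and 20% thresholds are example values only.
    """
    def pct_change(new: float, old: float) -> float:
        return abs(new - old) / old if old else float("inf")

    worst = max(
        pct_change(today.portfolio_count, prior.portfolio_count),
        pct_change(today.security_count, prior.security_count),
        pct_change(today.ending_market_value, prior.ending_market_value),
    )
    if worst >= fail_pct:
        return "fail"
    if worst >= warn_pct:
        return "warn"
    return "ok"
```

The key design choice is classifying each element by its mission-critical status up front, so the same delta logic can warn on one field and halt the load on another.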

Where holdings or composites are concerned, a typical next step involves the generation of analytics. This may be single-security fixed income analytics, group-level performance figures, or portfolio-level risk statistics. A critical but often overlooked step involves ascertaining the coverage level, typically in terms of percent of market value or number of securities. While the goal will typically start at 100%, an acceptable level of coverage may vary depending on end-user requirements. Understanding those requirements in advance is crucial to establishing a process. This process can be supplemented by automating the initiation of coverage requests for uncovered securities.
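A simple illustration of a market-value coverage check in Python (the holdings schema and the idea of a per-client coverage floor are assumptions for the example):

```python
def coverage_by_market_value(holdings: list[dict], covered_ids: set[str]) -> float:
    """Percent of portfolio market value carried by securities with analytics coverage.

    `holdings` is assumed to be rows like {"id": "XYZ", "market_value": 1000.0};
    `covered_ids` is the set of identifiers the analytics engine can process.
    """
    total = sum(h["market_value"] for h in holdings)
    covered = sum(h["market_value"] for h in holdings if h["id"] in covered_ids)
    return 100.0 * covered / total if total else 0.0

def uncovered_ids(holdings: list[dict], covered_ids: set[str]) -> list[str]:
    """Identifiers to feed an automated initiation-of-coverage request."""
    return [h["id"] for h in holdings if h["id"] not in covered_ids]
```

A portfolio falling below the agreed floor (say, 99% of market value) would then trigger the coverage request automatically rather than waiting for an analyst to notice.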

Once the coverage check has been completed, analytics can be derived. This is no small feat. Asset type conventions, client demands, and the composition of the aggregate book all pose challenges. Understanding the quality demands across analytics, returns and attribution, and grouping and partitioning is the first step of the process; it can be supported by both technology and human intervention.

Reconciliation and remediation of analytics are best handled via a dual exception approach that takes both daily deltas and outliers into account. Take effective duration as an example. As of the previous close, a generic corporate bond may have a duration of 4.0; as of today's close, that same bond may have a duration of 0.5. That is an extreme drop, but it is not necessarily wrong. With an exception-based check that flags any move greater than 10%, this security would be surfaced for further research. From there, human intervention could confirm the price used in the calculation and research whether security-specific details make the move valid.
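One way to express the dual exception approach in code (the 10% delta threshold comes from the example above; the z-score outlier test and its cutoff are illustrative choices, not a prescribed method):

```python
import statistics

def flag_exceptions(prior: dict[str, float], today: dict[str, float],
                    delta_pct: float = 0.10, z_cutoff: float = 3.0) -> set[str]:
    """Dual exception screen for an analytic such as effective duration.

    A security is flagged when its day-over-day change exceeds `delta_pct`
    (the bond moving from 4.0 to 0.5 would be caught here) or when today's
    value is an outlier relative to the cross-sectional distribution.
    Flagged names go to an analyst for confirmation, not automatic rejection.
    """
    flagged = set()
    # Check 1: daily delta versus the prior close.
    for sec_id, value in today.items():
        old = prior.get(sec_id)
        if old and abs(value - old) / abs(old) > delta_pct:
            flagged.add(sec_id)
    # Check 2: cross-sectional outliers (a simple z-score; a robust
    # median-based score would work equally well).
    values = list(today.values())
    if len(values) > 2:
        mean, stdev = statistics.mean(values), statistics.stdev(values)
        if stdev:
            for sec_id, value in today.items():
                if abs(value - mean) / stdev > z_cutoff:
                    flagged.add(sec_id)
    return flagged
```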

A similar approach applies to return calculations. Starting with a bottom-up decomposition of total return, it's easy to automate the review of price action, income, or a return of principal. For example, a flag should be raised when terms indicate a coupon occurred but there is no payment or concurrent drop in accruals. This is not a complicated breach, but it is the needle in the haystack that can take hours to find manually. Transactions can also be monitored in this fashion: the ending quantity held should equal the beginning quantity held plus purchases minus sales. Identifying a mismatch immediately improves governance overall and turns a reconciliation that may only occur monthly or quarterly into a nightly process.
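A sketch of these two automated screens, assuming hypothetical field names and tolerances that would in practice be set per desk:

```python
def position_reconciles(begin_qty: float, purchases: float,
                        sales: float, end_qty: float,
                        tolerance: float = 1e-6) -> bool:
    """Ending quantity should equal beginning quantity plus purchases minus sales."""
    return abs((begin_qty + purchases - sales) - end_qty) <= tolerance

def coupon_consistent(coupon_due: bool, cash_received: float,
                      accrual_drop: float) -> bool:
    """When terms indicate a coupon date, expect either a cash payment or a
    concurrent drop in accrued interest; a False here is the needle-in-the-
    haystack breach described above, surfaced nightly instead of quarterly."""
    return (not coupon_due) or cash_received > 0 or accrual_drop > 0
```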

The final piece of the puzzle concerns descriptive data such as sector or industry, rating, or analyst. At first glance this would seem to be an easy check, where securities that fall into an unassigned or other group can be sorted manually. However, consider a book in its entirety. There may be hundreds, if not thousands, of portfolios. Some may follow a GICS definition while others follow a benchmark classification; still others may follow a firm-specific definition. Where sector, industry, and sub-industry are all used, even more potential gaps exist. Identifying these upfront again pulls reconciliation forward and addresses holes in the underlying security master. Immediate recognition improves governance and enhances client service by reducing the time to client during a reporting cycle.
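For illustration, a nightly scan for classification gaps might look like the following, with the scheme field names ("gics_sector", "benchmark_sector", "firm_sector") as stand-ins for whatever the security master actually uses:

```python
# Catch-all values that indicate a hole in the security master.
UNASSIGNED = {None, "", "Unassigned", "Other"}

def classification_gaps(holdings: list[dict], scheme_field: str) -> list[str]:
    """Identifiers whose value under the given classification scheme is
    missing or falls into a catch-all bucket."""
    return [h["id"] for h in holdings if h.get(scheme_field) in UNASSIGNED]

def scan_book(portfolios: dict[str, list[dict]],
              scheme_by_portfolio: dict[str, str]) -> dict[str, list[str]]:
    """Run the gap check across the whole book, honoring each portfolio's
    own scheme (GICS, benchmark, or firm-specific)."""
    return {name: classification_gaps(holdings, scheme_by_portfolio[name])
            for name, holdings in portfolios.items()}
```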

It is not enough to merely surface these gaps in an error log. Increasing transparency about how gaps are identified and resolved creates a feedback loop. The ability to automate detection and notification, while also providing stakeholders with real-time access to process performance from integration to analytics generation to production, assures the quality of the data in question and ultimately enhances the end user's use of the data, regardless of their role in the firm.

Adventures in Market Data Quality Assurance

It would be irresponsible to walk through portfolio quality assurance best practices while ignoring market data such as benchmarks, fundamentals, or estimates. After all, so much of the portfolio construction process assumes a sound market data foundation.

Benchmarks tend to follow a process similar to portfolio holdings. Constituent coverage, universe consistency, and metadata completeness all come into play. The primary difference relative to a portfolio is the reconciliation to an official benchmark return. Performing that reconciliation over daily, month-to-date, and year-to-date periods with a small threshold for variance will suffice, allowing calculated returns to be matched to official benchmark returns either on or off platform.
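A compact illustration of that reconciliation, assuming returns are expressed as decimals and taking a one-basis-point tolerance as an example threshold:

```python
def reconcile_benchmark(calculated: dict[str, float],
                        official: dict[str, float],
                        tolerance_bps: float = 1.0) -> dict[str, bool]:
    """Compare calculated versus official benchmark returns over daily,
    month-to-date, and year-to-date windows within a basis-point tolerance."""
    results = {}
    for period in ("daily", "mtd", "ytd"):
        diff_bps = abs(calculated[period] - official[period]) * 10_000
        results[period] = diff_bps <= tolerance_bps
    return results

# Example: a 2bp year-to-date variance fails; daily and MTD pass.
print(reconcile_benchmark(
    {"daily": 0.0012, "mtd": 0.0150, "ytd": 0.0432},
    {"daily": 0.0012, "mtd": 0.0150, "ytd": 0.0430},
))  # {'daily': True, 'mtd': True, 'ytd': False}
```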

Other market data, such as fundamentals or estimates, follow a rules-based approach: the completeness of raw data, or of the inputs into a normalized metric, is tested before the data element is released downstream. Using FactSet's fundamental dataset as an example, over 1,500 automated checks are performed, with results verified by analysts before publication.
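The specific checks are proprietary, but a generic rules-based gate of this kind can be sketched as follows; the rule names and thresholds here are invented purely for illustration:

```python
from typing import Callable

# Each rule maps a fundamentals record to True (pass) or False (hold for review).
Rule = Callable[[dict], bool]

RULES: list[tuple[str, Rule]] = [
    ("revenue_present", lambda r: r.get("revenue") is not None),
    ("shares_positive", lambda r: (r.get("shares_outstanding") or 0) > 0),
    # Derived-metric sanity check: a normalized metric like net margin should
    # be computable and fall within a plausible range.
    ("net_margin_bounded",
     lambda r: r.get("revenue") not in (None, 0)
               and abs(r.get("net_income", 0) / r["revenue"]) < 10),
]

def failed_rules(record: dict) -> list[str]:
    """Names of rules the record fails; an empty list releases it downstream,
    while any failure routes the record to an analyst before publication."""
    return [name for name, rule in RULES if not rule(record)]
```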

Data integration and validation approaches like this seamlessly pave the way for further analysis.

The information contained in this article is not investment advice. FactSet does not endorse or recommend any investments and assumes no liability for any consequence relating directly or indirectly to any action or inaction taken based on the information contained in this article.
