Shaping the Cost Transparency for Expedia Group

Musings from our journey building a next-generation FinOps platform

'At Expedia Group, our mission is to power global travel for everyone, everywhere. Our multi-cloud FinOps platform helps ensure we build a tech optimised travel platform while improving our unit economics.'

Background

During our multi-year cloud migration journey, our tech and finance teams learned many lessons from expensive mistakes, and came out stronger and smarter. We also made our fair share of cloud optimisation attempts, which prompted us to find common patterns in cloud finance management and invest in building nimble solutions around them. In this blog, we'll talk about those patterns and the solutions we've envisioned to shape a comprehensive suite of platform components while building a cloud cost-aware engineering culture at Expedia Group.

Focus on Common Patterns

Before we could start building our own cost platform, we established a framework of common patterns in cloud finance management, drawing on our previous experience of retroactively optimizing cloud cost.

This framework helped us strategize our build vs buy decisions, compare and categorize available solutions, and check that the tools we planned to build or buy fit our goals in each of these cost sub-domains. We evaluated multiple open-source and paid solutions in each sub-domain against our use cases, identified the gaps in their capabilities and checked the feasibility of fitting them into our framework.

Building the Foundation

After carefully analyzing available solutions, we realized that no end-to-end platform on the market could cover our current and future needs, which included:

  1. Support for multiple billing data sources from IaaS (AWS, Azure, GCP etc.), PaaS (Qubole, Databricks, Kubernetes etc.) and SaaS (Datadog, Splunk etc.)
  2. Show/Charge-back of shared platforms (in-house & 3rd party)
  3. Customized cost attribution rules and org metadata
  4. Integration with our existing BI systems
  5. Ability to democratize the cost and usage data

This led us to architect our own cloud cost management platform in-house. Instead of taking a big bang approach, we decided to build it as Lego-like components that interplay with each other to achieve the broader vision. These components were built over time and were extensible, portable and nimble enough to adapt to continuously changing business needs.

At EG, we felt that cloud finance is a 'data problem' given the size of raw billing data. Hence the core of the platform is an ETL pipeline, which gives us the flexibility to build all kinds of integrations and land the data in our data lake, where it can be combined and enriched with other business data. As an org standard, we generate productized cost datasets (domain and sub-domain based) which act as a source of truth for further reporting and analytics while keeping the cost business rules in one place.
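To make the idea of a vendor-agnostic pipeline concrete, here is a minimal sketch of the normalization step: each billing source gets a small adapter that maps its rows onto one common record type. It is written in plain Python to stay self-contained, not in the Spark jobs the platform actually runs, and the schema, vendor names and field names (`CostRecord`, `vendor_a`, `unblended_cost` and so on) are all assumptions made for illustration rather than the real Expedia Group datasets.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class CostRecord:
    """Vendor-agnostic cost row (hypothetical schema, for illustration only)."""
    vendor: str
    service: str
    owner: str       # team or cost center resolved from tags/metadata
    usage_date: str  # ISO date, e.g. "2021-09-01"
    cost_usd: float


def from_vendor_a(row: dict) -> CostRecord:
    # Hypothetical field names for one vendor's billing export.
    return CostRecord(
        vendor="vendor_a",
        service=row["product"],
        owner=row.get("tag_team", "untagged"),
        usage_date=row["usage_start"][:10],  # keep only the date part
        cost_usd=float(row["unblended_cost"]),
    )


def from_vendor_b(row: dict) -> CostRecord:
    # Hypothetical field names for a second vendor with nested labels.
    return CostRecord(
        vendor="vendor_b",
        service=row["meter_category"],
        owner=row.get("labels", {}).get("team", "untagged"),
        usage_date=row["date"],
        cost_usd=float(row["cost"]),
    )


# Supporting a new billing source means registering one more adapter here.
NORMALIZERS: Dict[str, Callable[[dict], CostRecord]] = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}


def normalize(source: str, rows: Iterable[dict]) -> List[CostRecord]:
    """Map raw rows from one billing source onto the common schema."""
    adapter = NORMALIZERS[source]
    return [adapter(row) for row in rows]
```

Because everything downstream reads only the common record, attribution rules and reports can be written once instead of once per vendor. Rows without an owner tag fall back to "untagged" here, which is one simple way such a pipeline could surface tagging gaps.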

We gradually added integrations with our existing BI platforms for executive and self-service reporting. API, UI & ML components were added on a use-case basis, which proved effective and kept cost and maintenance low for us.

Cost Platform in Numbers

To justify the build vs buy decision, we used some fairly simple, common-sense techniques to keep the cost of the platform very low. We intentionally used and self-hosted open-source frameworks for building various components to keep them portable and avoid lock-in. Existing enterprise-licensed BI solutions were leveraged to avoid additional reporting/analytics cost.

Apart from self-service analytics, our team also provided some pre-built standard dashboards and templates to lower the barrier to entry and serve as a starting point. A community of key stakeholders, sub-org level cost focals and power users was formed to help guide new users and lower the support burden on the dev team. Here are some facts and figures:

  1. No. of Spark jobs - 50
  2. Raw and productized datasets + streams - 150
  3. IaaS vendors supported - 4
  4. PaaS vendors (1st & 3rd party) supported - 7
  5. Metadata sources supported - 5
  6. Avg. cost, usage & meta-data processed - ~1 Billion rows/month
  7. No. of visibility/governance/optimization dashboards - 50+
  8. Number of internal platform users - 1000+
  9. Operating cost of core platform (ETL, API & UI) - 20X less than common paid solutions

A very small (think 1-pizza) team of 'full-stack data engineers', primarily based out of India, has been able to build and maintain this platform over the last couple of years.

Learnings & Accomplishments

Building our own cloud finance platform led to a lot of learnings and discovery of further business use cases which can be solved in an Agile manner using the platform components. It enabled a flywheel effect for cost governance at EG which allows us to quickly iterate and improve our processes and tools for continuous optimization.

Here is a quick summary of other benefits we were able to achieve and our learnings from them:

  1. Building the data pipeline in a vendor- and brand-agnostic fashion upfront was our best decision to date. It enabled tagging governance, advanced & fine-grained analytics, optimization and standardized reporting.
  2. It also allowed us to do platform service allocations and provide visibility that was not available by default from native infrastructure billing data.
  3. The ability to support multiple metadata systems allowed multiple lines of business to adopt the solution and start tracking costs even without a central tagging standard.
  4. Fully loaded pricing along with cost allowed one-to-one comparison while making tech and architectural choices between vendors and service offerings.
  5. Integration with internal BI tools gave us the agility to move reporting between different tools and to combine existing business data with cost data for analytics such as cloud cost vs business transactions.
  6. More technical users were given programmatic access (SQL & API) to productized datasets to solve their own use cases (custom alerting/reporting).
  7. Decoupling the 'operational cost and prediction view' from the 'financial cost and forecast view' turned out to be another good decision in hindsight. It allowed engineering teams to be more focused on directional accuracy.
  8. The prediction UI allowed scaling the monthly prediction process to lower levels (up to 4 levels down) in a simple and intuitive manner.
  9. Gamification via a grading mechanism (actual vs prediction) in standard dashboards drove engineering leaders & teams to pay regular attention and be more accountable.
  10. Engineering leaders and teams started talking about cloud cost run rates in weekly pacers, empowered with pre-built or custom cost dashboards dedicated to their orgs.
  11. By the law of large numbers and quick turnaround for forecast correction at a low level, overall prediction accuracy improved significantly at the org & sub-org level.
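The grading and aggregation ideas in the last few items can be sketched roughly as follows. The post does not describe the actual grading formula, so the error metric, the letter grades and their thresholds below are assumptions chosen purely for illustration; the team names and figures in the usage example are made up as well.

```python
from typing import Dict, Tuple


def abs_pct_error(actual: float, predicted: float) -> float:
    """Absolute percentage error of a prediction against the actual cost."""
    if actual == 0:
        return 0.0 if predicted == 0 else float("inf")
    return abs(actual - predicted) / abs(actual)


def grade(actual: float, predicted: float) -> str:
    """Map prediction error to a letter grade (hypothetical thresholds)."""
    err = abs_pct_error(actual, predicted)
    if err <= 0.05:
        return "A"
    if err <= 0.10:
        return "B"
    if err <= 0.20:
        return "C"
    return "D"


def org_error(teams: Dict[str, Tuple[float, float]]) -> float:
    """Error of the org-level prediction, where both the actual and the
    predicted org totals are sums over teams of (actual, predicted)."""
    total_actual = sum(actual for actual, _ in teams.values())
    total_predicted = sum(predicted for _, predicted in teams.values())
    return abs_pct_error(total_actual, total_predicted)
```

As a usage example, take three teams with (actual, predicted) monthly costs of (100, 115), (200, 184) and (300, 310). Their individual errors are 15%, 8% and roughly 3%, but over- and under-predictions partly cancel when rolled up: the org totals are 600 actual vs 609 predicted, an error of 1.5%. That cancellation is one plausible reading of why predicting at a low level and aggregating improved accuracy at the org level.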

This has led to a culture of cost awareness and significant upside to Expedia Group with key accomplishments including:

  1. Helped lower the cloud cost significantly across EG.
  2. Improved yearly forecast accuracy to 95% (+80% YoY) for a major EG line of business (~1% variance for last 6 months).
  3. Cut down untagged/mistagged resources by 96% across EG.
  4. Brought down cloud cost forecasting cycle from yearly → quarterly → monthly level.
  5. $$$ savings by retiring a paid cloud finance management solution.
  6. Built an industry-matching multi-tenant platform chargeback model supporting a variety of stateless/stateful compute solutions.
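For readers unfamiliar with chargeback on shared platforms, the simplest form of such a model splits a platform's total cost across tenants in proportion to a usage metric. The sketch below shows only that basic idea; the function name, the usage metric and the tenant names are hypothetical, and the model we actually built handles more cases than this (for example, the differences between stateless and stateful workloads).

```python
from typing import Dict


def allocate_shared_cost(
    total_cost: float, usage_by_tenant: Dict[str, float]
) -> Dict[str, float]:
    """Split a shared platform's cost across tenants by share of usage.

    `usage_by_tenant` can hold any consistent usage metric, such as
    CPU-hours or GB stored (an assumption made for this sketch).
    """
    total_usage = sum(usage_by_tenant.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive to allocate cost")
    return {
        tenant: total_cost * usage / total_usage
        for tenant, usage in usage_by_tenant.items()
    }
```

For instance, a 1,000 dollar cluster bill with tenants using 30 and 10 CPU-hours would be charged back as 750 and 250 dollars respectively.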

Acknowledgements

In our platform journey, we had the pleasure of working with some brilliant individuals across geographies from other EG locations and teams. Multiple teams like dev platform, metadata, optimization engine, internal platform teams, finance, cloud infra & security closely partnered with us for integration and compounded value addition. Our Cloud Tech Optimization team, cost focals and senior leaders across EG responsible for their cost centers constantly derived value from our platform and provided feedback to plug the gaps. We also got a lot of support from our vendors and their technical account managers, who from time to time extended consulting and support to help bring this platform to reality.

We hope this blog serves as a reference point for other engineering teams who are contemplating a similar solution for their organizational needs. We're open to hearing thoughts and feedback from the FinOps community on our approach, sharing learnings and helping improve the overall ecosystem.

Disclaimer

Expedia Group Inc. published this content on 14 September 2021 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 14 September 2021 13:21:01 UTC.