When did you realize that making the transition was the right decision?

We had built our first two cells and had data flowing into them. During a critical holiday period, we had an unexpected traffic spike that caused us to reach capacity in one cell. It was the first time that we shifted traffic out of an unhealthy cell to a healthy cell in production. We were then able to scale up both cells concurrently with no impact on customers. That's when we knew cell architecture was really going to work for us, and we proceeded to accelerate our migration.

What advice do you have for other practitioners looking to transition to the cloud?

Use New Relic to instrument your applications and infrastructure so you can monitor, debug, and improve your entire stack. Define your SLOs (service level objectives), validate them with your customers to align on expectations, and configure your alerts to notify you according to your needs.
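As a rough illustration of what defining an SLO and alerting against it can mean in practice, the sketch below computes how much of an error budget is left for a request-based SLO. It does not use any New Relic API. The function names, the 99.9% target, the request counts, and the 25% alert threshold are all assumptions made up for the example.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # Number of failures the SLO tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def should_alert(remaining, threshold=0.25):
    """Hypothetical policy: alert once less than `threshold` of the budget is left."""
    return remaining < threshold


# Illustrative numbers only: 1,000,000 requests, 250 failures, 99.9% target.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"budget remaining: {remaining:.0%}, alert: {should_alert(remaining)}")
```

The threshold is the part to agree on with customers: a stricter target or a higher threshold makes the alert fire earlier.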

Establish a FinOps team early on to focus on cloud cost optimization. Tag your cloud assets upfront, define clear governance and processes for capacity management, and leverage autoscaling where possible.

When you move from data centers to the cloud, it changes your business model. You're moving from Capex (capital expenditures) to Opex (operating expenditures). If you do not understand how to manage capacity and costs, you can blow up your budget very quickly.

Our engineers are making decisions about costs on a daily basis. They're analyzing things like "What's the cost of my service? How can I implement autoscaling? How can I reduce resource usage? Where am I not as efficient as I could be in capacity management?"

What best practices do you recommend for cell architecture?

We want our cells to have a lifespan of 90 days or less. It's a challenge to build and decommission cells that frequently, but otherwise they get stale. With a stale cell, you get drift in your configuration and deployed services, you're not picking up security patches and OS upgrades, and you're not getting new functionality from your cloud provider. It's really critical to keep those cells fresh.
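The 90-day rule lends itself to a simple automated check. The following is a minimal sketch, not New Relic's actual tooling: the `Cell` record, its `built_on` field, and the cell names are hypothetical, and a real inventory would come from whatever system tracks cell builds.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_CELL_AGE = timedelta(days=90)  # lifespan target quoted in the interview


@dataclass(frozen=True)
class Cell:
    name: str       # hypothetical cell name
    built_on: date  # hypothetical field: the day the cell was built


def stale_cells(cells, today):
    """Return the names of cells older than MAX_CELL_AGE as of `today`."""
    return [c.name for c in cells if today - c.built_on > MAX_CELL_AGE]


# Made-up inventory for illustration.
cells = [
    Cell("amber-otter", date(2021, 6, 1)),
    Cell("brisk-heron", date(2021, 10, 1)),
]
print(stale_cells(cells, today=date(2021, 11, 18)))  # -> ['amber-otter']
```

A check like this could run on a schedule and feed the list of stale cells to whatever process rebuilds and decommissions them.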

We have a team dedicated to tooling and automation of cell builds and decommissions, and we have made tremendous improvements in the efficiency, consistency, and quality of cell builds. We are now building new types of cells and cells in different regions. We even have a random cell name generator and have seen some really interesting and funny cell names.

What is one lesson learned from the transition to AWS?

Don't plan for the happy path. Most people tend to be overly optimistic rather than planning for new technology, discoveries, unexpected work, or things going wrong. We planned for the data platform migration to take one year. By taking an iterative approach and building in time for unknowns, we were able to overcome challenges, make adjustments, and meet our goals.

Disclaimer

New Relic Inc. published this content on 18 November 2021 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 18 November 2021 17:12:03 UTC.