Three years ago, online travel agency Priceline started its cloud journey with a goal to create a more flexible and agile technology infrastructure, says CTO Marty Brodbeck.
That effort included modernizing applications following the 12-factor methodology, “moving them into Docker containers, and then streamlining that process by running them in Kubernetes on Google’s GKE Edge.”
At the same time, the organization was building out a real-time data infrastructure to provide insight into business performance and identify future trends.
CIO Contributing Editor Julia King sat down with Brodbeck at CIO’s recent Future of Cloud summit to discuss the challenges and successes of scaling cloud deployment, his focus on making developers’ work easier, and lessons learned along the way.
What follows are edited excerpts of that conversation.
On taking a developer-first approach:
We view the software development process as one of the most mission-critical business processes within the company. So, the more that we can make [developers’] lives easier, and increase their velocity, the more they are going to contribute to the overall goals of the company. And since we do a lot of A/B testing as a company, the frequency with which we can put features out onto our platform and test them is a critical priority for us.
One of the challenges that we have seen so far in our cloud transformation is since a lot of these technologies are so new, they do not necessarily provide the most robust developer experience.
[Another challenge] is a lot of the cloud development that we have been doing is made of 12-factor and Kubernetes. Yet a lot of the existing CI/CD pipelines that are out there currently are not necessarily Kubernetes or 12-factor native to begin with.
The culture of the company is highly collaborative. [W]e like to test, iterate, and deploy relatively quickly. And that is the same exact way in which we test tooling. We like to come up with a set of use cases, quickly test those out, figure out if they meet our needs, and then figure out a way to scale.
We do that across the entire organization. If an engineer has a really good idea, we want to be able to move quickly on that idea, test it out, make it more robust, and then if it really works, then scale it out across the entire organization.
On reviewing new cloud technology:
The way in which we look at any new technology is, first and foremost, what kind of operational efficiency and effectiveness are we going to get out of these technologies? What costs can we take out of the current way in which we are managing our infrastructure and software development?
[Then] we look at [the] value or incremental revenue [a new technology] is going to drive on our platform. Will this capability help us enable better customer experiences, which is going to drive further revenue and growth of our platform and a better experience for our customers?
The third is just all around operational efficiency or more qualitative metrics around a better work experience for our colleagues and employees.
Whenever we evaluate any kind of technology, a business case is built around one of those three buckets, or sometimes it is all three of them together—with a clear ROI on that investment and when we think we are going to make those business cases profitable for the company.
[As an example], our cloud business case that we built with Google was based on—first and foremost—taking costs out of our infrastructure. So, we put together a 3-year business case that sees us sunsetting all of our data centers by 2023.
The second clear business case was around the efficiency of our CI/CD pipeline: How many more net new features could we crank out of an investment in CI/CD tools for the company? How much automation could we build into our CI/CD pipeline that was going to make our developers more efficient?
On lessons learned along the way:
I think that the biggest lesson for us was to ensure that you have really good operational support and stability for running these platforms in the cloud.
And that [involves] a few key things:
Number one is having a very robust observability platform that monitors your cloud applications and you can look to where you have bugs and defects.
Two, that you have really good cost management controls in place and that you can get granular information on how your organization is using the cloud, with really good policies for governance.
Three, having a very robust site reliability engineering organization that can manage the deployments and management of your Kubernetes environment and scale.
I wish I knew all of what I know now, back when we started this. But the beauty is that we failed quickly in those areas and were able to pivot really quickly and get some really good capabilities in place that [have] allowed us to scale out our cloud deployment in a timely fashion.