Deploy Your Next Application to Google Kubernetes Engine (Cloud Next '19)

[MUSIC PLAYING] WILLIAM DENNISS: Hello, and welcome to our session “Deploy Your Next Application to Google Kubernetes Engine.” I’m William Denniss, a product manager on Google Kubernetes Engine, and later I’ll be joined by Marshall and Kat from Fair. In today’s session we’re going to be covering three broad topics. First, we’re going to talk about why you might want to use Kubernetes in the first place, and some of the benefits you can expect to realize if you do. Then I’m going to do a demo of deploying an app into Google Kubernetes Engine. And finally, we’re going to talk a little bit about how to introduce Kubernetes into your enterprise.

All right. So why should you use Kubernetes as your application platform? Well, first and foremost, Kubernetes is a container platform, so you inherit all the benefits of developing with containers. These include things like language and library flexibility: no longer are you limited to a particular language in your production environment, like a certain version of Java or a certain version of Ruby, and no longer do you have to put up with some three-year-old release of a library that for some reason is still shipping in your Linux distribution. With containers, the developer is in control of what they want to run and exactly how they want to configure it. You also get isolated dependencies, so you can run multiple containers on a single machine without them interfering with each other.

And production and development uniformity: this is one of my favorite features. Your production environment looks very, very close to your development environment. In fact, if you have a bug in production, you can typically download that container, run it on your developer machine, attach to, say, the staging endpoints, and actually debug the exact code that was running in production. Not only that, but you also get uniformity between the developers on your team. Gone are the days where you have to spend a whole day configuring a development machine with all the various tools that you need. You just install Docker and you can immediately be building these containers. You can even have someone on the team using a Windows machine, someone using a Linux machine, someone using a Mac, all developing and deploying Linux containers, or whatever operating system you want to use. And you get all this with really minimal overhead and complete resource isolation. In the past, you might have used virtual machines to get some of these benefits; by comparison, containers are very lightweight, both in execution and in the low overhead of the actual image size.

So back to Kubernetes. Kubernetes, then, is a container platform that manages those containers for you. It handles things like how to schedule and launch the containers on the machines. It will reboot any containers that crash. It can even continually monitor them with health checks, so that it won’t route traffic to something that’s hung, or to a replica of your application that, say, is missing its connection to the database. It performs a lot of operations like this, all with the goal of keeping your application up and running without too much help from you.

So Kubernetes is a workload-level abstraction. What I mean by that is you describe your workload, and it basically has the job of running that for you. So for example, let’s say you have a server that you want to run in a distributed way, or you have a database that has a disk attached that needs to follow it around,
or maybe you want to have a logging utility that runs on every single node, or maybe you have a movie that you want to render one frame at a time using the cheapest possible compute. These are all different workloads, and they are all things that you can represent in Kubernetes terms and have it handle for you.

And I believe that Kubernetes sits at the right level of abstraction. It’s not super low level, in the sense that you’re not having to deal with individual machines anymore, but it’s not too high level either, in the sense that it doesn’t interfere with what you’re trying to do with your application. It sits very cleanly in the middle. For me this contrasts with a traditional PaaS, which tends to merge into the application layer by imposing a lot of restrictions on you, and occasionally even modifying how the code actually runs. With Kubernetes, the layers between the infrastructure, the application platform, and the application itself are very, very clean. And I think that’s important, because when you’re choosing the application platform you want to use and you’re looking at abstraction layers, you want to prioritize making hard tasks possible over making simple tasks easier. So instead of focusing on the easiest type of application to deploy, which is typically, say, a stateless application, and making that task easier, Kubernetes focuses on making it possible for you to deploy a really wide range of far more complex deployments, things like machine learning batch tasks or a database that has state.
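(To make that concrete, here is a minimal sketch, not from the talk itself, of how those example workloads map onto built-in Kubernetes kinds. The per-node logging utility is shown in full as a DaemonSet; the name and image are hypothetical.)

```yaml
# Each workload mentioned above maps to a built-in Kubernetes kind:
#   distributed server            -> Deployment
#   database with a disk attached -> StatefulSet (+ PersistentVolumeClaim)
#   per-node logging utility      -> DaemonSet (shown below)
#   frame-by-frame movie render   -> Job (often on cheap/preemptible nodes)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent                # hypothetical name
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: agent
        image: gcr.io/my-project/log-agent:1.0   # hypothetical image
```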

And it can do all this while enabling a lot of really nice cost efficiencies as well. I mentioned before that containers have really low overhead. When you combine that property of containers with the power of Kubernetes to manage those containers for you, you can really pack a whole bunch of containers onto machines, and you can have a bunch of machines running at really high efficiency. And you can even do that in such a way that it doesn’t compromise the availability and reliability of your services. This is something I think is unique to Kubernetes, and one of the real big value-adds that Kubernetes brings over just raw containers.

Kubernetes also comes with a rich ecosystem and community. Let’s say you want to add some logging or monitoring service to your application. With Kubernetes you typically have a choice: you can either get something open source and run it yourself, or you can find a vendor that can manage and run it for you. I think this is really good because it lets you pick and choose what you want to focus on. Maybe in some areas you’d rather have someone else manage it for you, but in others you actually really want to understand how it works and you want to run it yourself. With Kubernetes, you always have both options, I find, with pretty much anything you want to do.

And, of course, Kubernetes itself is open source and, perhaps equally importantly, it’s really widely available. How widely? Well, if you look at the Certified Kubernetes program run by the CNCF, the Cloud Native Computing Foundation, you’ll see that there are currently 32 platforms that are certified, 50 distributions, and 14 installers. So you really have a wide range of choice of where you want to run Kubernetes. And, of course, Google Kubernetes Engine is a founding member of the Certified Kubernetes program.

Certainly, not all Kubernetes platforms are created equal. Kubernetes Engine was one of the very earliest to go to market, the earliest managed Kubernetes product. And we have a lot of really robust automation capabilities: things like cluster autoscaling, which can add compute resources when you need them and remove them in quiet periods, so you can save some money; node auto-repair, which can monitor various unhealthy conditions in nodes and automatically replace them; and node auto-upgrade, which allows you to stay up to date with the latest features and security patches without needing to lift a finger. Together, these three features and more really help automate a lot of the operations for you, and it basically means fewer 3:00 AM phone calls for you.
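(For reference, the automation features just described are switches you can enable at cluster creation. A hedged sketch, with an illustrative cluster name, zone, and node counts:)

```sh
# Create a GKE cluster with cluster autoscaling, node auto-repair,
# and node auto-upgrade enabled. All names and counts are illustrative.
gcloud container clusters create my-cluster \
    --zone us-central1-a \
    --num-nodes 3 \
    --enable-autoscaling --min-nodes 1 --max-nodes 10 \
    --enable-autorepair \
    --enable-autoupgrade
```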
And perhaps equally importantly, I think, the same team that brings you GKE actually contributes over 40% to Kubernetes itself. My team’s contributions are the yellow-orange line at the bottom there. And you can see that not only do we contribute 40%, but we’ve really sustained a long-term commitment to Kubernetes; this graph goes back nearly five years. What this means for you is that, if you hit some bug in Kubernetes or you’re having difficulty with something you just can’t figure out, I’m pretty sure there’s someone on my team who’s going to be able to help you with that problem, probably the person who actually wrote the feature you’re using.

And, of course, you get all this with the benefits of Google Cloud. All the benefits that you’re hearing about this week, such as our world-class fiber network, you inherit on Google Kubernetes Engine from Google Cloud. I really like this network. With it you’re basically getting the benefits of YouTube, in a way, right? Google built this thing out to handle YouTube, and your little app can run on the exact same fiber network. We’re even deploying undersea cables now. So, again, all the benefits of Google Cloud come with Kubernetes Engine.

And speaking of scale, GKE can definitely handle your scale. I think it’s really important that when you’re designing your application, you think about scalability from the start. Unless you’re creating some internal-only lunch-ordering app, I’m sure you want more users and more scale. But imagine if, at that very moment when everyone is beating down the door trying to get into your product, at that very moment of potential success, your application falls over. That’s just terrible; that’s just the worst thing. So I believe if you start with GKE, and as long as you follow some good scalability principles, we can be there for you, ready to handle your scale and set you up for success. A great example of such a success is Niantic, when they launched Pokémon GO. You might have heard of this app a year or two ago. When they launched, their first-day, first-week traffic was something on the order of 50 times their original projection, and 10 times their worst-case projection. We were still able to handle that.

And, as you may know, the launch was fairly successful.

So another thing about Kubernetes is that it doesn’t have a strongly opinionated development and deployment workflow. You might think that’s a burden, in the sense that you do have to set up your own deployment workflow. But, fortunately, using something like Cloud Build, you can get a deployment workflow up pretty quickly. And the real power comes in when you need to set up some business logic for your enterprise. This might be something like a vulnerability scanner that you want to run before every deploy, to ensure that no vulnerabilities get out into the wild. Or perhaps it’s some other admission control, where you just want to check some behavior and enforce something, to make sure your developers aren’t doing the wrong thing. All of these kinds of custom business logic are possible for you to set up in Kubernetes Engine with your own custom pipeline. And with that, I’d like to invite up Marshall and Kat from Fair, who have a lot to say about Kubernetes.

MARSHALL BREKKA: Hey, everyone. My name’s Marshall.

CATHERINE CAI: And I’m Kat. And we’re from Fair.

MARSHALL BREKKA: Cool. So if you haven’t heard of Fair before and you don’t know what it is, Fair is a new car ownership model. That sounds really markety, so what does that actually mean? If you download the app and sign up, you can search for a car that fits your needs and your budget. Once you have that car, you can give it back at any time, so there’s no time commitment. The entire thing is a mobile experience; you sign and pay for the car in the app. If you’re interested, check it out in the app store.

I’m sure a lot of us have gone through the experience of buying a car, and you’ve probably realized that it’s not really a fun experience. It’s pretty outdated, it’s disjointed, and there’s not really any automation to it. There’s haggling involved. You’ve got to get your own financing. You have to get insurance. And at the end, there’s a ton of physical paperwork to sign. Every single part of that process can be its own business, and usually is its own business. Fair decided that we were going to try to abstract all of that into a single experience. And that’s what we’ve done, but unfortunately that brings a lot of legal, financial, and operational complexities. So from the start, we said, maybe a service-oriented architecture is going to be a good fit for us as we try to build out a business that literally didn’t exist. There’s actually kind of a funny story with that: two weeks before we launched, we decided we were going to add insurance as a product, and we were able to do that because of the way we had structured our services. But with a lot of services comes complexity, and you need to manage that complexity somehow. And that’s where we chose Kubernetes. But how did we actually end up on Kubernetes?
We didn’t start there. The early Fair employees had experience at prior companies with PaaS products, and we knew them, so we started with those. And it was working, but it really only solved one part of the deployment problem: API servers, web servers, background-job kinds of things. It solved those. But our data science team had really sub-par infrastructure. They were coming from a prior company where their training was done on a fixed set of VMs and some in-office hardware, so it wasn’t flexible and it wasn’t scalable. The way they did their training is they had to SSH into a box, pull down their model code, run the training code, and then check back in on it later. So they were having to coordinate leasing all of that hardware among themselves in order to get their training done.

When they came to Fair, they said, OK, we’re not going to bring all that baggage with us; we’re going to try something new. And they were experimenting with ways of expressing their training using directed workflows. But they needed something to actually power that system; they needed some infrastructure there. That’s where I came in. I said, OK, you need this problem solved, let me go look around and find some things in the solution space. I was looking for something that was relatively simple to use and set up, and that could give them on-demand capacity, so that when they needed to submit their training jobs, we could scale up, and when they didn’t have anything to train, we could scale down and stop paying for that hardware. Pretty quickly we found Kubernetes. That was back in the Kubernetes 1.1, 1.2 days. It took a few weeks; we integrated it with their systems and got it working. And it was a huge success. It gave them all the flexibility they needed. And after a few more weeks, we realized that the same platform running their data science training could actually run all of our workloads: the API servers, background jobs, et cetera. So we cut everything over. And honestly, our devs were a little resistant,

because the PaaS products they had been using before were, honestly, simpler. But at the end of the day, the ability to have all of our workloads running on a single platform with a single set of tooling, that kind of operational simplicity, really won out. And we started pretty small and simple, with just a couple of services, but we’ve grown. We’re not a Google, but we’re running 40-some-odd various kinds of workloads, and Kubernetes has always been able to run every single one of them for us.

CATHERINE CAI: So we’ve been running Kubernetes for two years. What are some of the benefits we’ve actually seen? Well, workflow customization. William mentioned in his portion of the presentation that Kubernetes has really solid primitives for expressing different workload types, and we were able to leverage that to write a lot of tooling to automate and abstract our deployment workflow. So what does that mean? Well, we created standardized build and release pipelines that simplify how an application goes from source to production. Our developers can spin up an entirely new service and put it into production, and our platform team does not have to be involved in the process, which is amazing for someone like me, because I don’t want to spend all day helping a developer put their app into production, right? So how do we actually do this? Well, the pipeline assumes a lot of defaults, but we give developers the flexibility to override some of those defaults to fit their specific needs, because what a data scientist needs to deploy their app is very different from what a regular app developer needs, right? And my point really is that, even though our pipeline needs have changed over time (in two years Fair has grown a lot), Kubernetes has always been underneath, very flexible, and able to fit our needs.

The next benefit we got was that it forced us into very good operational patterns. We don’t have any apps that are pets or special snowflakes. And if you’re not familiar with the pets-versus-cattle analogy, I’ll spell it out very quickly. A pet is that one box you have in production that you have to give a lot of love, and if it goes down, you’re really screwed. Cattle is more the box that you can terminate, and there’s probably another one already running, or one will come back up very quickly, so you don’t have a huge problem if it’s down in prod. That’s the way we think of our applications. We run our applications to be as stateless as possible, and the reason for that is that Kubernetes can terminate your workload at any time and shift it onto another node. For example, your node might be suffering from something critical, like very high disk utilization or high CPU utilization, so Kubernetes might terminate your app and move it onto another node. So as a result, the way we think about our applications is stateless. We try to run multiple replicas of them. And most of our apps actually need to be resilient to running multiple versions at a time, because Kubernetes does rolling deployments.

One other really awesome benefit we got out of Kubernetes was standardized application visibility. With Kubernetes you can see all of your app logs and all of your app metrics in one place. And on top of that, you can see which apps are actually deployed to the cluster.
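(One mechanism for this kind of resilience, not named in the talk but a natural fit for it, is a PodDisruptionBudget, which tells Kubernetes how many replicas must stay up during voluntary disruptions such as node drains. A minimal sketch, with hypothetical names and labels:)

```yaml
# policy/v1 is current; at the time of this talk it was policy/v1beta1.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb            # hypothetical name
spec:
  minAvailable: 2                 # keep at least 2 replicas up during drains
  selector:
    matchLabels:
      app: api-server             # hypothetical label on the Deployment's pods
```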
So with all of that, we got some really cool hidden benefits out of Kubernetes as well. We migrated our entire workload to another cloud, onto GKE, without any trouble. Which brings me to my next point: on the long list of things you’re probably considering when you’re thinking about adopting Kubernetes is whether to go managed or unmanaged. We’re at a Google conference, so really that’s GKE or unmanaged, right? We actually started on AWS in 2016, and we didn’t have any managed options on AWS (EKS, which is Amazon’s equivalent of GKE, didn’t come out until late 2017). So we opted to run unmanaged, because that was the only choice. And we’ve been running it for two years, and we’ve learned a lot about Kubernetes, so we’ve developed a lot of really good in-house expertise. But also, after two years, we’ve realized it’s a full-time job to run your own machines: there are a lot of security patches, a lot of upgrades. And I know that a lot of people in ops want to go unmanaged because you don’t want to lose operational control. I’m just going to say, take it from us,

we’ve been doing this for two years, and when we migrated our workloads to GKE, we didn’t feel like we were trading off or giving up a lot of control. And more importantly, Kubernetes is such a great abstraction on top of cloud providers that we transitioned and moved all of our apps from AWS to GKE without having to make any application changes. But speaking of apps and deployments, William’s going to come back on stage, and he’s actually going to demo how to deploy your app into Kubernetes. [APPLAUSE]

WILLIAM DENNISS: All right. So before I get started with the demo, I just wanted to quickly cover a few key Kubernetes concepts, so that some of the stuff in the demo will make sense if you’re new to Kubernetes. We’ll start with containers. Kubernetes wraps containers up into what’s called a pod. A pod is, for Kubernetes, the smallest schedulable unit, meaning that the containers in it get deployed together. So maybe you’ll have just one container in a pod, or maybe you’ll have multiple, in the case where you have a sidecar container that adds logging or monitoring or something like that. So that’s a pod. Then we have a concept called nodes. A node is really just a fancy way of saying a machine, either a VM or bare metal. And the real workhorse in Kubernetes is a concept called a deployment. Kubernetes works on this operator pattern where you specify how many pods you want in the deployment. You say, I want two of pod A and one of pod B, and Kubernetes then seeks to drive the actual observed state of the cluster to match the desired state you specified. So that looks something like this. Say we have a system with two of pod A and one of pod B. If one of those nodes were to become unhealthy and disappear, the system would very quickly notice: wait a minute, there’s only one of pod A running now, and none of pod B. It’s going to reschedule those onto another node, at which time it will observe that the desired state has been met and everything is right in the world again. That all happens automatically; that’s one of the very nice things about Kubernetes. And that operator pattern applies to various other constructs as well. Lastly, we have a service. A service is just a group of identical pods that together provide a logical API service. All right. And with that, I’m going to go to the demo.
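(As a rough illustration of the pod concept just described, with hypothetical names and images, a pod with an application container plus a logging sidecar might look like this:)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: app                                  # the main application container
    image: gcr.io/my-project/my-app:1.0        # hypothetical image
  - name: log-sidecar                          # sidecar scheduled alongside it
    image: gcr.io/my-project/log-agent:1.0     # hypothetical image
```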
OK. So I was going to do a fairly boring hello-world app, but I decided on Friday to make something a little more interesting. I wrote a little server in Swift, and this server uses another little project at Google called Plus Codes to convert a geographic coordinate into a plus code. Anyway, that’s what the app’s going to do. I picked Swift because I don’t think Swift is used a whole lot in production yet, and I wanted to emphasize one of those points from earlier, that with containers you really get a lot of flexibility. I love Swift, but personally I’ve just not seen much of it in production. So if you do want to use Swift in production, I dare say that GKE is a good place to do it.

All right, so what does this little app do? It’s using a library called Swifter. I’m going to start a basic HTTP server and handle one endpoint. I’m going to parse a couple of parameters, convert a latitude and longitude coordinate into a plus code, and return that as JSON. Let’s take a quick look at what that code looks like. I’m going to build that container real quick, and since it’s cached, it should hopefully be fairly fast. All right. And let me run that locally. I’m going to forward port 8080 on my machine to port 80 in the container. Let me show you what that looks like. So this is the little endpoint, and that’s using my example data. To prove to you that this is real, let me pull up an actual latitude and longitude to throw in there. Let’s look at the Moscone Center. I’m just going to use Wikipedia for this; it’s a good source of geographic coordinates. Let me grab those and whack them in here. And this is the plus code for the Moscone Center. So if you have Google Maps, you can type this in and find where we are. Actually, one of the cool things about plus codes is that you can drop the first four digits if you’re in the same region. So anyway, that’s my Swift app running in a container, running locally.
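(The local build-and-run steps narrated above would look roughly like the following. The image name and the query parameter names are assumptions; the talk doesn’t show them.)

```sh
# Build the container image from the app's Dockerfile.
docker build -t pluscode-server .

# Run it locally, forwarding host port 8080 to port 80 in the container.
docker run --rm -p 8080:80 pluscode-server

# Exercise the endpoint with the Moscone Center's approximate coordinates.
curl "http://localhost:8080/?lat=37.7842&lng=-122.4016"
```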

Let’s get it up onto Kubernetes. So on GKE you can create a cluster with just a few clicks; I’ll show you what that looks like. You just tap the Create Cluster button, and we have a whole list of templates you can pick from, so pick the one that matches your workload best. For this demo, frankly speaking, “Your first cluster” would probably be quite sufficient. I can leave all of that as the default and click Create. Now, since that might take a couple of minutes to come up, I’m just going to use this one I created earlier, which is Next Demo. So let’s take a look at that cluster. I can click the Deploy button, and it’s going to ask me for a container image. So let’s get back to the console, and I’m going to tag that image. This uses a special tag format for Google Container Registry, which is gcr.io, then my project name, then my application name. I’m going to push that up. And, of course, it looks like my cache is gone, so we’re going to test how good the Wi-Fi connection is by uploading a gigabyte of data. While we’re waiting for that to happen, let me show you a couple of things here. In the GKE UI we show you the workloads, the services, and any applications you’re running, which I think is something that maybe isn’t available on other platforms yet. So, for example, you can go in and look at the services that are running. We’ll see how that upload’s going. All right, about halfway done. OK, so while we’re waiting for that to upload, let me cue up some changes. After I’ve deployed this, one thing I’m going to show you is how to do an update. The first update is a very small change to this little service: let’s add a status field, like this one. Let me queue that up while we’re waiting. All right, it looks like we are good. OK. So I’ll go back to that deployment I was creating, and I’m going to pick the image I just pushed to Google Container Registry. I can leave the command as the default, because it will pick up the command from the Dockerfile. I’ll give it a name, pick my cluster, which is going to be that demo cluster, and just hit Deploy. So this has now created a Kubernetes deployment, all via the UI. I can also take a look from the command line and see how that deployment is doing. I’ll do kubectl get pods; we’re basically doing the same thing, but on the command line, inspecting the state of those pods. And we can see that we already have three pods running with that container. So you can see just how fast that is.
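(For reference, the deployment created through the UI corresponds roughly to a manifest like this; the names and labels are illustrative, and the UI adds defaults of its own.)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pluscode-server            # illustrative name
spec:
  replicas: 3                      # matches the three pods seen above
  selector:
    matchLabels:
      app: pluscode-server
  template:
    metadata:
      labels:
        app: pluscode-server
    spec:
      containers:
      - name: pluscode-server
        image: gcr.io/my-project/pluscode-server:v1   # hypothetical gcr.io tag
        ports:
        - containerPort: 80
```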
Now, the next thing I need to do is expose this as a service. So I’m going to go Expose. I’ll leave that as port 80, because that’s my service’s port, pick a load balancer so we can access it from the internet, and click Expose. Now, that action will take about 30 seconds to complete, because it’s actually provisioning a Google anycast IP, announced from various points of presence around the world. So to save time, I’ve prepared one earlier. We’ll go to the Services tab here, and we can see the service I created earlier; let’s take a look at that. And so that is now the same application, but running in Google Kubernetes Engine. Let me pick another coordinate to try out, just so you know I’m not faking it. Let’s look at, say, the Golden Gate Bridge. Using Wikipedia again, I’ll grab that latitude and longitude coordinate. OK.
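(The Expose step, expressed as a manifest, would be roughly the following, again with illustrative names.)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pluscode-server            # illustrative name
spec:
  type: LoadBalancer               # provisions an external (anycast) IP on GKE
  selector:
    app: pluscode-server           # routes to the Deployment's pods
  ports:
  - port: 80                       # port the service listens on
    targetPort: 80                 # port the container listens on
```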

And I said I would also demonstrate an update, so let’s go back to the console. We’ll build the container again, and this time tag it as version two. Fortunately, this one should be quick to upload. Sorry, I need to actually build that. OK, build it first. And while we’re waiting for that, let me get back into our deployment view here, and I’ll show you how to update. I can just go into the deployment I created and use this Rolling Update action here. I’ll specify that same image tag I just created. And I can also customize just how quickly I want this thing to roll out. By default, when Kubernetes does a rolling update, it will replace the pods in the deployment one by one, and it will do it in such a way that there’s no downtime. If I really want this to go out quickly, I can just say, you know what, I want everything to go out at once. So we can try that here. OK, so now that the container’s built, let me push it up, and then I’ll hit the Update button. One really nice thing about Kubernetes is that, even if I clicked that Update button before the container was ready, it would handle that condition: it would try to pull the image, notice that it wasn’t present, and try again a few seconds later. That’s all part of the operator pattern I was describing, where Kubernetes is driving the desired state towards, sorry, driving the observed state towards your desired state. All right. And if I go back in here, I should now see a new version of the application running from just a few seconds ago. Hmm, the image name. OK, that would be the reason. And looking at the UI view here, we see that revision three of my deployment is currently in the ContainerCreating state, so very shortly it will be available for use. I can inspect the same thing on the console, and it looks like it might actually be running. So let’s get back to our example, refresh that, and you see we have that status field. So that’s how to deploy and then update your application in Google Kubernetes Engine.

I’m going to show one more thing real quick, and that is how to set up a Cloud Build continuous integration pipeline. Let’s go to Cloud Build, and let’s also open up the source repository. What I’m going to do is push my application up to a git repository in the cloud, and then use a continuous integration process to build that container. So first, I’ll create the git repo, and I’ll call it plus-code. I’ll go back to the app and make one more change, just so we can observe the change; let’s say the status is now “awesome.” And thank you for laughing. All right, let’s commit that. Well, actually, before I push, let me create a build trigger. So now I go to Cloud Build, into Triggers, and I add a trigger so that any time code is pushed to the repository I just created, it’s going to build the container image for that repository. I can leave this all as the default, and it will build my Docker image for me. So now let me push that up. And if we go to the history here, we’ll see that a build has just kicked off. Again, this is going to take a minute, but one nice thing is that I already know what the image name will be, so I can grab it and do the rolling update right now, even before it’s ready. So I’m just putting in the image name here from Cloud Build. In this case, I’m going to say I want a max unavailable of zero, so I don’t want Kubernetes to take down any of the old containers until the new one is ready. I will do a surge of three though, because I think there’s enough capacity to handle that. All right. So two things are happening here: Kubernetes is waiting for that container image to appear, and we’re waiting for that container to finish building.
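(The rollout settings used here, a max unavailable of zero with a surge of three, map to the Deployment’s update strategy. A sketch of just that fragment of the spec:)

```yaml
# Fragment of a Deployment spec; only the update strategy is shown.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never take old pods down before replacements are ready
      maxSurge: 3         # allow up to 3 extra pods during the rollout
```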
Looks like it is underway. All right.
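(While that build runs: the default trigger builds straight from the Dockerfile, but an equivalent explicit cloudbuild.yaml would be roughly this. The image path is illustrative; $PROJECT_ID and $COMMIT_SHA are standard Cloud Build substitutions.)

```yaml
# Roughly what the default Dockerfile-based trigger does: build the image
# and push it to Container Registry, tagged with the commit SHA.
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/pluscode-server:$COMMIT_SHA', '.']
images:
- 'gcr.io/$PROJECT_ID/pluscode-server:$COMMIT_SHA'
```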

Well, I think we have waited long enough for me to upload things. If you really want, you can grab that IP and check it out later, and you should see my deployment from Cloud Build. With that, I’ll go back to the slides, please. All right.

So one final word about plus codes: if you thought that was interesting technology, take a look at plus.codes. This is a little project that a bunch of Googlers work on in their 20% time; I think it’s pretty cool, and there’s also a video to check out. And I’ll be putting the code I used in the demo online later as well, at the link at the bottom there. It’s not available quite yet, but I’ll make sure it’s up by the end of the week.

So what are the next steps? Now that we’ve got our application deployed into Kubernetes via the UI, there are certainly a lot of things you might want to do before you’re ready to hit the go-to-prod button. The first to consider would be readiness and liveness checks. These are probes you can configure in Kubernetes, where Kubernetes continuously probes your container to ensure that it’s actually still running and ready to serve traffic. Then, I think you want to look at requests and limits. These specify the resources your container needs: requests relate to how much gets reserved when your container is scheduled, and limits are the maximum of those resources your container can use. When your limits are greater than your requests, your container can use any unutilized resources on that node. Thirdly, I’d recommend looking at configuration as code. You saw me deploying and updating things via the UI, which is a really great way to get started and learn Kubernetes. But eventually you want to get that configuration, download it into your version control, and treat it just like source code: have any configuration changes go through code review and whatever security process you have for your source code. Through that you also gain the ability to roll back. If you released a version by changing the configuration through git and then realized it was a mistake, you can just do a git revert, your configuration reverts, and Kubernetes updates accordingly. Finally, I’d recommend looking at autoscaling. GKE in particular has really good autoscaling technology; I demoed some of it last year at Next, and I’ve put the link up here if you’re interested to see it. You can set up a requests-per-second-based pod autoscaler, which is really cool stuff. It basically lets you say: for every 100 simultaneous requests per second that I’m getting, I want one pod backing that, so if I get 1,000, I want 10 pods. That’s something you can state in about three lines of configuration, and Google Kubernetes Engine will make it happen for you.
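(The first two of those next steps, probes and requests/limits, look roughly like this inside a pod spec. The endpoint path and the numbers are illustrative, not from the talk.)

```yaml
# Fragment of a Deployment's pod template: probes plus requests/limits.
containers:
- name: pluscode-server
  image: gcr.io/my-project/pluscode-server:v2   # hypothetical image
  ports:
  - containerPort: 80
  readinessProbe:              # traffic is only routed once this succeeds
    httpGet:
      path: /                  # illustrative endpoint
      port: 80
  livenessProbe:               # the container is restarted if this keeps failing
    httpGet:
      path: /
      port: 80
  resources:
    requests:                  # reserved at scheduling time
      cpu: 100m
      memory: 128Mi
    limits:                    # hard ceiling; burst room above requests
      cpu: 500m
      memory: 256Mi
```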
So with that, I’d like to invite Kat and Marshall back up on stage, and we’re going to have a bit of a conversation about rolling out Kubernetes in your enterprise. Kat and Marshall. Thanks. So one thing we realized as we were preparing for this talk is that your experience with Kubernetes is fairly similar to that of some other customers I’ve spoken to, particularly the way you actually rolled Kubernetes out in your organization. So I thought it might be interesting for the audience to hear how you introduced Kubernetes to your org.

CATHERINE CAI: Well, yeah. In terms of introducing Kubernetes into your organization and how quickly you should do it, I would say very slowly, like glacially slowly, especially if you have a lot of workloads already in production that aren’t on Kubernetes. We had the benefit of being able to start on Kubernetes, but I think for most companies, Kubernetes is a huge thing to bite into. So probably the best thing to do is to cut over non-critical, maybe new, apps into your Kubernetes cluster and start developing in-house expertise before you go whole hog into it.

MARSHALL BREKKA: Yeah. Also, we had basically a year to develop in private and, more importantly, fail a lot in private. That was part of why we were able to start with it. But as Kat said, start slowly with non-critical things, because it’s easy to make mistakes like that demo showed, where you roll out your change, you’ve got the image wrong, you didn’t set any safety, and suddenly your application is entirely down.

WILLIAM DENNISS: And so, readiness checks for that one, right?

MARSHALL BREKKA: Yeah. Readiness checks.

WILLIAM DENNISS: And I’m kind of curious to learn what average level of Kubernetes knowledge it involves. So, for example, does the typical application developer need to know about services and deployments and YAML files and things like that in their daily job?

MARSHALL BREKKA: Yeah. I think for us, most developers are at an observability level of knowledge. They still do some configuration at the deployment and service level, but it’s pretty standardized, and that’s about all they need to know. There’s a whole bunch of other abstractions that Kubernetes provides that they don’t really need to touch. We have some power users, obviously; our data science team, honestly, is on the further end of that power-user side. They’re doing a lot of interesting stuff, and they really are in the weeds. But for most people, it’s just: let the pipeline take care of it, and then use kubectl to see, what does that actually look like? What’s the state of your deployments?

WILLIAM DENNISS: Yeah. I remember when you said that, I really liked that setup where, if I’m a developer, I can kind of not use any of the Kubernetes machinery if I don’t want to and just focus on my container; or, if I really need to understand how things are working under the hood, I have that level of access as well. To me, at least, that contrasts with a PaaS, where you sort of can’t get access to that lower layer, right?

MARSHALL BREKKA: Yeah.

WILLIAM DENNISS: And so, we talked about failure. What tips do you have for making workloads more resilient?

CATHERINE CAI: We actually set anti-affinity policies, which basically means telling Kubernetes, hey, I want to schedule the pods of one deployment onto different nodes. This is one of those moments where it’s like, yes, you should fail privately and spend some time really learning Kubernetes, because this is something we weren’t aware of, and it became a war story in the sense that, whoops, we screwed this up. We were rolling over a cluster, bringing in new machines, and, this happens very rarely, but we had all of the pods of one deployment on one machine, and when that machine was drained and taken down, we took down that entire deployment and service.

WILLIAM DENNISS: So basically, with Kubernetes, if you don’t actually specify how you want your pods spread out, you could be very unlucky and have them all land on one node.

MARSHALL BREKKA: I mean, yeah, it’ll just do a best guess and try to fit them where it can, which is great in terms of filling up your VMs as efficiently as possible. But unless you give it those hints, it’s like, oh, I’ll put them all in one place.

WILLIAM DENNISS: And so since you set up the anti-affinity, how’s it working?

MARSHALL BREKKA: Anti-affinitively.
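(A minimal sketch of the anti-affinity Kat describes: this fragment of a Deployment’s pod template asks the scheduler to keep pods carrying the same label off the same node. The label is hypothetical.)

```yaml
# Fragment of a Deployment's pod template spec.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: api-server                    # hypothetical label on the replicas
      topologyKey: kubernetes.io/hostname    # "different node" = different hostname
```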
WILLIAM DENNISS: And I’m kind of interested in your journey, particularly in the sense that you started unmanaged. Did you want to elaborate any more on that?

MARSHALL BREKKA: Yeah, sure. I mean, obviously, as Kat said, we started on AWS, and there weren’t managed options for us there. But it wasn’t necessarily impossible for us to run it ourselves, so we said, cool, let’s go do it. It wasn’t really easy either, though. And speaking very frankly, it’s basically just the two of us who have that institutional expertise, and we’re not seeing benefits from running it ourselves; like she said, we’re not giving up control if we go with a managed offering. The real issue is keeping up with the pace of Kubernetes, because there are features coming in all the time, and we want to be able to take advantage of those features. But then it’s like, we have to do those Kubernetes version upgrades and roll-outs ourselves, and that’s just more burden that we have to take on.

CATHERINE CAI: And just one additional point to that: I was trying to test out Istio, and it probably took a week to get it functioning just right. We do have kind of a bespoke multi-cluster setup, so it took a little more time to tweak everything and make it work. But on GKE it’s a one-click deploy, so that was really cool to find out after the fact.

WILLIAM DENNISS: All right. And the last question from me, before we throw it open to you all: tell me about preemptibles. A preemptible, if you’re not aware, is a machine type that isn’t guaranteed to even be available, and can be preempted and reclaimed with like 30 seconds’ notice. But you mentioned in our prep that you’re actually using them, so I’m curious to learn more.

CATHERINE CAI: Yeah. For all of our non-production environments and our data science clusters, we actually use preemptibles. For data science, it’s a great advantage, because we’re spending cents on the dollar for those very, very beefy machine workloads. And then all of our staging and sandbox environments are using preemptibles.
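(For context, preemptible capacity on GKE is typically set up as a node pool. A hedged sketch, not a command from the talk, with illustrative names and sizes:)

```sh
# Add a preemptible node pool to an existing cluster for non-production work.
# Cluster and pool names, machine type, and node counts are illustrative.
gcloud container node-pools create preemptible-pool \
    --cluster my-cluster \
    --zone us-central1-a \
    --machine-type n1-highmem-8 \
    --preemptible \
    --enable-autoscaling --min-nodes 0 --max-nodes 10
```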

And there’s actually a really funny story associated with that, because preemptibles are called preemptibles for a reason. This happens very rarely, but in the two years we’ve been running with preemptibles in this setup, we once got a ping from a developer on a weekend: hey, I don’t see any pods in staging, is it down? And we’re all like, oh no, what’s going on? We jump in; you do kubectl get pods, and you see nothing. And you’re like, OK; kubectl get nodes, and you also see nothing. And you’re like, OK, cool, what’s going on? And it turns out that there were none of those instances left to bid on for the preemptibles. So we just changed VM types, and staging came back up in five minutes. But that is the risk you incur with preemptibles. So don’t do that in production; for staging it’s fine.

WILLIAM DENNISS: Yeah. I think any discussion of preemptibles has to come with that disclaimer: don’t do it in production. But it’s interesting, because you have non-production workloads that are still very important to the business, like machine learning even, that just don’t necessarily need [INAUDIBLE] completed by a certain time.

MARSHALL BREKKA: Yeah. And especially with those, it’s important to remember, as she said, once we had changed the instance type and our nodes were able to come back, Kubernetes was just like, oh, the state of the world is not the desired state, so I need to put it back into the desired state. So with the machine learning training they were doing, if the nodes just went away and then eventually came back, it’d be like, oh, there are still workloads I need to finish, and it would go and make them happen. So that was kind of a pseudo disaster-recovery scenario.

WILLIAM DENNISS: Yeah. I definitely think using preemptibles without Kubernetes would be really, really hard; you’d basically end up rebuilding Kubernetes, I think. So anyway, cool. Well, with that, that’s the end of our prepared content. But it looks like we have about three minutes if you have any questions. And there’s also been a Dory, so, I believe, if there are any Dory questions, we’ll have those now as well. But thank you, everyone, for joining us today. [APPLAUSE] [MUSIC PLAYING]