DjangoCon US 2017 – End-to-End Django on Kubernetes by Frank Wiles

(relaxing music) – Good afternoon, everybody. Thanks for having me. This may come as a surprise to you, but I am not Josh Berkus. But the more I thought about it, I realized that you could be easily confused, because Josh's name is still on the program, we both have beards, we both like Django and Postgres and Kubernetes. We even both have the same damn glasses. But there are some differences. Over here on the left we've got the average Postgres user's level of knowledge, then I know a little bit more, and then Josh knows a whole lot more. But about now you're probably asking yourself what the hell this has to do with the talk, and there's one final important difference that does pertain to it: my back works pretty okay. Josh's, not so much. Which is why I'm up here to talk to you about Django and Kubernetes. See, Josh managed to hurt his back making these lovely speaker gifts for all of the DjangoCon attendees. Josh is an excellent potter as well as an open source geek. Last week he hurt his back finishing these up, so he wasn't able to make it, and the DjangoCon team was like, "Hey, any chance you wanna give a talk on Tuesday?" So this will probably not be my most polished talk ever, but on to the topic that you are all here for: Kubernetes.

Kubernetes is arguably the best and most popular container orchestration system out there today. That is, if it doesn't get replaced in a year by something cooler and better, but right now this is what everybody is playing with, for the most part. Before we dive too deeply into Kubernetes we have to get into some terminology. So, the name, Kubernetes. What does it mean? Is it Greek for ship captain? Or is it that Google learned its lesson about naming things after Go? Or is it three, all of the above? If you picked three, you are correct. It is Greek for ship captain, and I do think they named it that because they had so much trouble with Go.

Today most everybody is looking at or moving towards containers in some form or fashion, and containers are great at being able to package up all of the dependencies for an app into a thing that we can then move around and share and use. But working with them on their own is not exactly the most user-friendly thing. You had to remember which ports am I using, what volumes do I mount where. That stuff gets complicated over time, and that's why container orchestration systems exist. Some examples, if you're not familiar: Docker Compose is a container orchestration system, and Docker Swarm, Mesos, AWS Container Service, and Kubernetes all orchestrate these containers that you wanna use together.

So what's container orchestration? Event loops are used by Kubernetes components to reconcile things between local machines and the desired cluster state. What does that mean to us? We basically tell Kubernetes, "This is how I would like the world to look," and then Kubernetes sits there and spins in a loop and tries to make that happen for you. It can't always do it, but it will continually keep trying to make it happen. The other term they use in the documentation is a control loop, which in robotics is kind of the main process that's sitting there going, "Am I standing up? I hit a wall. Do I need to back up? What's happening right now?" And that's basically what Kubernetes is constantly doing: is everything that's supposed to be running, running? Can everything talk to each other that's supposed to be talking to each other?

And I'm not gonna lie to you and say that Kubernetes is super easy to learn. We are definitely not going to learn it all in 40 minutes today. It's a big, complicated system; it's a bear. It's a big, scary bear, but my goal is to change your impression of it from this, to more of this. When I started playing with Kubernetes, the terminology is what tripped me up. There's a bunch of new terms and a whole bunch of new concepts that you don't tend to think about, things we've all done before, but the way they talk about them sometimes is confusing. So most of today is gonna be getting comfortable with what these different terms mean, and then we'll piece it all together and have a working Django app towards the end. These slides move slowly when there's a big image. Kubernetes has a concept of masters, and nodes, worker nodes.

The masters are where all the Kubernetes magic happens, the deciding of what should be running where, and the nodes are where your containers actually run. You can have differing numbers; they're not a one-to-one relationship. This would be a simple three-master cluster with three worker nodes. A more real-world scenario could be something like this: inside of an AWS region you have a master per availability zone, those three are clustered together, and then you have some number of worker nodes also in each availability zone.

Underneath, Kubernetes is really just an etcd cluster with a nice API over top of it. So when you say, "I want you to run these sets of containers," it puts that data into the etcd cluster, which gets replicated between all of the masters, and then the API and the scheduler, little daemons that run on these nodes, are constantly looking at that and saying, "Am I running what I should be running? Is there something out there that needs to be running that's not running, and where should we run it?"

One of the things that trips people up is authenticating to your Kubernetes cluster. Almost everything happens through this config file in your home directory, effectively known as the kube config. Authentication in Kubernetes is fairly pluggable, but not as pluggable and easy to use as, say, Django's authentication backends. Out of the box, you tend to have one user with a password and one set of SSL keys to talk to it, and you share that among multiple sysadmins; that's kind of the default configuration. It feels wrong and messy, and it is kind of wrong and messy, but it works. There are other systems you can authenticate against, Google for example, if you use Google Apps, so you can give certain people access to the cluster. There are other schemes that you can employ. We're not gonna get too far into that, but it is important that you have a kube config and that it is properly set up pointing at your cluster. You can have multiple of them so that you can switch between multiple clusters, choosing which one you're talking to at a given time.

So you can access your cluster by proxy. If you run kubectl proxy, it looks at that default kube config for the cluster that you're currently pointing at. I have five or six clusters in my config, so I can pick which one I'm gonna be playing with at any given moment, and if I run kubectl proxy I am then able to access the API of the actual masters. If your cluster is running the Kubernetes Dashboard you can then access it with this... Let me move this to a different... No, that's not going to let me play with my browser, is it? Well, I guess we're going to skip the dashboard portion of the evening. The dashboard is a really decent web interface to the cluster as a whole. You can see all of the various components, you can make edits to the config, you can see resource utilization across your nodes, which ones have high CPU or high memory usage, and you can get a nice dashboard view of the state of your cluster. All of the information for the dashboard comes from the API, which is the same exact information you get from the command line tools, and that you can also access from Python, or Go, or other languages.
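If you want to try this part yourself, the flow looks something like this; the context name below is made up, yours come from your own kube config:

    # list the clusters/contexts defined in ~/.kube/config
    kubectl config get-contexts

    # point kubectl at one of them (context name is hypothetical)
    kubectl config use-context revsys-production

    # proxy the masters' API to localhost (defaults to 127.0.0.1:8001)
    kubectl proxy
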
One of the ways that you keep things sane in a Kubernetes cluster is by using namespaces. A namespace in Kubernetes is a fence between containers. Pods in a namespace, containers in a namespace, can only easily access other containers in that same namespace, so you can use it as a light mechanism for multi-tenancy. You can also use namespaces as a light way to separate dev from stage from prod, all inside the same cluster. But it's not as hard and fast a wall as true multi-tenant separation; your containers could end up talking to each other. It's a balsa wood fence, not a brick wall.

In Kubernetes you define resources using YAML, so the little snippet at the top there is all the YAML you need to create a namespace called revsys-rocks. You create it in the cluster by just running kubectl apply -f and pointing it at the file you created.

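The namespace snippet from the slide isn't captured in the transcript, but it would look roughly like this; the file name is just whatever you choose:

    # namespace.yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: revsys-rocks

    kubectl apply -f namespace.yaml
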
That will come back and say, "I have created this namespace," or, if it was already there, "I configured this namespace." You can reapply these same files, because all you're doing is adding state to the system; if the state is the same, nothing changes, but if there's new state, actions get taken.

Deployments are a template of how you would like the world to look. So you say: I wanna have this Django app, I wanna run this particular container, I wanna use these environment variables, and I wanna run three copies of it. This is kind of hard to read, size-wise, but you'll see we just give it a kind of Deployment, and then we say we wanna have two replicas, and the template is this particular container, and we wanna open that container on port 80 inside the cluster. And just like with the namespace, we run the exact same apply command to actually put that into the cluster.

Services in Kubernetes are what we think of as a service. I have just created a Django app running in those couple of containers; now I need to tell the rest of the cluster that there is this web service out there, and I define that like this. We create a Service. Notice all of these are in the same namespace, revsys-rocks. I tend to use the same name for the namespace and the app and the service and everything, just to keep myself sane. You can call them different things; I know that one of my coworkers, Steven, would probably call this service http, or wsgi, where I would call it the name of the actual service that I think of it as, the website. There's no hard and fast rule here. All we're doing is saying: hey, for this service I'm gonna open up port 80, and it needs to go to container port 80.

Kubernetes has a concept called an ingress controller. So far, what we've done is available inside of our cluster, to other things running in the cluster, but it is not available to the outside world in any way, shape or form. Opening that port 80 did not open port 80 on an external IP address. So ingress controllers map outside world things to inside the cluster. Now, where you host this changes what kind of ingress controller you can use. If you're on AWS, it would use an ELB or an ALB as your ingress controller, and it manages what points where, when. You just say, I want one of these, it will go out and create one, and you start pointing DNS at the CNAME. You don't have to actually go configure it.

Which leads to a quick aside here. We use a controller called kube-lego, which handles everything about Let's Encrypt certificates. You install this into your cluster, and then in these YAML definitions you can use what are called annotations, which are basically just keys in the YAML that Kubernetes itself is not particularly looking for. A controller is just something that is listening to this API, looking for state changes, and taking some action. The default Kubernetes ones handle things like, ah, I need to be running this container over on this node, but you can create your own annotations and take other actions. So in this case somebody created a system to handle Let's Encrypt certificates. You say, hey, I want a Let's Encrypt certificate for this particular host name. It will go out and register it, it hijacks the .well-known location, handles all of the key management, stores the certificate inside the cluster, and presents it to the world as a Let's Encrypt SSL connection from then on. You literally have to do just a couple of lines of config.
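The deployment and service slides aren't reproduced in the transcript either; based on how he describes them, they would look something like this. This is a hedged sketch: the image name and labels are made up, and apps/v1beta1 is the Deployment API group from the Kubernetes 1.6/1.7 era this talk covers.

    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: revsys-rocks
      namespace: revsys-rocks
    spec:
      replicas: 2
      template:
        metadata:
          labels:
            app: revsys-rocks
        spec:
          containers:
          - name: revsys-rocks
            image: revsys/revsys-rocks:latest    # hypothetical image
            ports:
            - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: revsys-rocks
      namespace: revsys-rocks
    spec:
      selector:
        app: revsys-rocks
      ports:
      - port: 80
        targetPort: 80
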
definition, and you’ll see we have a similar kind of pattern Name, namespace Then there’s the rules there, and it’s a host The domain is actually revsys.rocks We don’t really talk about domains so don’t get confused That could be .com or .org or whatever But then we have a little part that I wanna highlight, and this is the kube-lego part We just have an annotation there saying, hey, I want tls-acme, and I wanna use an nginx ingress controller, and its host should be revsys.rocks, and I want you to store that as a secret named revsys-rocks-tls And I don’t have to do anything else When it comes up it gets the cert If it needs renewal it’ll renew, and I don’t have to deal with that And I don’t have to deal with that on a per-application basis, or even per-container basis I can throw a couple of Rails projects and a Go project

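Again, the slide itself isn't in the transcript; an ingress with the kube-lego annotations he describes would look roughly like this (a sketch assembled from his description):

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: revsys-rocks
      namespace: revsys-rocks
      annotations:
        kubernetes.io/tls-acme: "true"
        kubernetes.io/ingress.class: "nginx"
    spec:
      tls:
      - hosts:
        - revsys.rocks
        secretName: revsys-rocks-tls
      rules:
      - host: revsys.rocks
        http:
          paths:
          - path: /
            backend:
              serviceName: revsys-rocks
              servicePort: 80
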
One of the last pieces of terminology is a pod. When we create deployments, all the containers in the deployment form a pod. Pods are sets of containers that are deployed together on a host, so if you have a pod with four containers in it, all four of those are gonna be deployed on host A. If for some reason it can't deploy on host A, or host A dies, they will all be picked up and run together on host B. They are always a set together, like a pod of whales. This can be useful for lots of scenarios. In all the examples I have today we are only using one single container, so the difference doesn't become particularly apparent. But if you need additional containers that only talk to each other and not necessarily the outside world, this can be very efficient. You can do things like share a Unix socket on a host, which you couldn't otherwise do because you couldn't guarantee the containers would be running on the same host. So if you have a one-container Django app with one memcached instance, you could have them talk over a Unix domain socket instead of a TCP socket and get a little bit better performance, and you can only do that because you know they're always running on the same host.

So at a high-level view of Kubernetes: the masters run this API and store the cluster state, the nodes run pods, which provide services inside the cluster, and ingress controllers map the outside world to the inside world. Let's say we have three worker nodes, with some stuff running on host A and some stuff running on host B. If you shoot host A in the head, say AWS just terminates the instance, the other masters are gonna go: wait a second, the pods that were scheduled on host A are no longer running on a host, because we can no longer see host A. I need to schedule them somewhere, so where do they fit? Okay, host C is pretty empty, I'm going to run them over there. All of the pointers, all of the different proxy ports, the ingress controller, all that stuff gets changed over, and you're back up and running. So you can do things like upgrade your worker nodes from one AWS instance size to another and never have any of your stuff go down, and not have to change any of your configuration or IP addresses. It gets you away from pointing at IP addresses and temporary host names, and it makes things move a little more smoothly.
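The Django-plus-memcached pod sharing a Unix socket that he mentions above could be sketched like this; this is not from the talk, and the emptyDir volume, image names, and socket path are my assumptions:

    apiVersion: v1
    kind: Pod
    metadata:
      name: django-with-memcached        # hypothetical
      namespace: revsys-rocks
    spec:
      volumes:
      - name: memcached-socket
        emptyDir: {}                     # shared scratch space both containers mount
      containers:
      - name: django
        image: revsys/django-app:latest  # hypothetical image
        volumeMounts:
        - name: memcached-socket
          mountPath: /var/run/memcached
      - name: memcached
        image: memcached:1.4
        # -s tells memcached to listen on a Unix socket instead of TCP
        command: ["memcached", "-s", "/var/run/memcached/memcached.sock"]
        volumeMounts:
        - name: memcached-socket
          mountPath: /var/run/memcached
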
So how do you run Kubernetes in the real world? There are three different things you might interact with. One is called kops, K-O-P-S. It's a utility for spinning up kube clusters in AWS. It works really well, and it handles all the AWS-specific nature of Kubernetes. One of the hardest things about Kubernetes is getting a cluster started; it is not easy to turn on. It is really hard to kill once you've turned it on, and that's kind of its job, but getting it turned on is involved and prone to error, so people have created these wrapper utilities to make the process a little more turnkey for us mere mortals.

The other option, and this is the one I would encourage you to play with first if you have an interest in Kubernetes, is Google Container Engine, which is a hosted version of Kubernetes from Google. The reason I suggest it is that then you know you have a well-working, good-to-go Kubernetes cluster to play with. Any problems you're having are your misunderstanding of how Kubernetes works, or a configuration mistake, and not, perhaps, that you set up the cluster poorly. And then there's also minikube, which runs a single-node Kubernetes cluster on your laptop using Vagrant or VirtualBox and a Linux system on your laptop. That's a great way to play with Kubernetes in smaller development environments. You can use those same definitions of which containers to run and which services to expose on your laptop, and then use them in your production clusters.

One of the things you gotta be able to do with containers is configure them. We're all 12-factor apps now, so we've gotta be able to push configuration into these containers, and Kubernetes provides several different ways. Environment variables, of course: we can just define environment variables in the YAML, there towards the bottom. I'll highlight this a little bit. You can see we've just defined an environment variable name and put in a value, and that gets injected into the container's environment. That's great and all, but a lot of times we don't want to expose all of that in our configuration. We can also use what are called ConfigMaps, and these let us map sets of variable-like things, whole files, or entire directories of files into our pods. Maybe we don't wanna have to list every single environment variable in that deployment YAML; we can say, here's a ConfigMap of 25 environment variables, take these and apply them to this pod. You can pick which map goes to which container, and it just does it all for you. You can also do things like, I wanna use this nginx configuration file, put it here on disk, and it will grab it from the Kubernetes configuration store and plop it into the pod at runtime.

Kubernetes also has a concept of secrets. These are great. We could obviously put our database password and our API keys into those environment variables in our deployment YAML, but that means everybody gets to see them. Perhaps we don't want our developers to know those, and just the ops people should hold onto them, so Kubernetes lets you define secrets. Secrets are available, like most things, only inside the namespace they're defined in, so you can't share secrets across that fence. Unfortunately, secrets are not particularly secure. Right now Kubernetes stores them as base64-encoded text on the master, so they're not as secret as you might want. Now, to be fair, they are working towards real secrets, encrypted on the master, and this was just a stepping stone to getting there. But it does keep secrets that should not be on a node from getting to that node: until a pod that needs access to a secret is scheduled there, that secret won't exist on that node. So it does keep them off places where they have absolutely no business; it's just that once they're there, they're not particularly secret. So this is how you use secrets in an environment variable. We say: hey, I wanna have this database password environment variable, and get its value from the secret named revsys-projects-db-password, using the key in there named password. You could also, since this is just a set of hosts, run a Vault cluster in your Kubernetes cluster and get your secrets from Vault, or some other really truly secure secret storage.
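Pulling those pieces together, the plain environment variable and the secret-backed one he describes look roughly like this in YAML. Only the secret name and key come from the talk; the rest (the variable names and the example value) is a sketch:

    apiVersion: v1
    kind: Secret
    metadata:
      name: revsys-projects-db-password
      namespace: revsys-rocks
    type: Opaque
    data:
      password: c3VwZXItc2VjcmV0           # base64 of "super-secret", illustrative only

    # ...and in the deployment's container spec:
    env:
    - name: DJANGO_SETTINGS_MODULE
      value: "config.settings.production"  # a plain value, visible to anyone who can read the YAML
    - name: DATABASE_PASSWORD
      valueFrom:
        secretKeyRef:
          name: revsys-projects-db-password
          key: password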

Because we don't know where our containers are going to be running, centralized logging becomes terribly important for being able to figure out what's going on. If you use Google Container Engine, the logs from your cluster go straight into Google's logging tool, their Stackdriver logging system, and that works fine. We've had good luck with the EFK stack, which is Elasticsearch, Fluentd (or Fluent Bit, a smaller C version of the Fluentd daemon), and Kibana. But you've gotta have this, or you're not gonna be able to tell what's going on. I don't even know which host revsys-rocks is running on; I'd have to go dig and find out where it's running to even get onto the host to look at logs. So having centralized logging is important. And as part of that, we've lightly open sourced jslog4kube. We use it, and I don't know how useful it will be for you all, but it configures gunicorn and your Python apps to use JSON logging to stdout, and includes information that is Kubernetes-specific, like what the pod's IP address was inside the cluster, what host it was running on, what the name of the pod was, and what app it belongs to. That kind of Kubernetes-specific metadata gets added into the JSON that's emitted by your logging.
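This is not jslog4kube's actual configuration, which I won't guess at here, but a generic JSON-logs-to-stdout setup in Django settings, using the python-json-logger package, gives the flavor of the approach:

    # settings.py (sketch): emit JSON log lines on stdout so the cluster's
    # log collector (Fluentd/Fluent Bit, Stackdriver, etc.) can pick them up.
    LOGGING = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "json": {"()": "pythonjsonlogger.jsonlogger.JsonFormatter"},
        },
        "handlers": {
            "stdout": {
                "class": "logging.StreamHandler",
                "stream": "ext://sys.stdout",
                "formatter": "json",
            },
        },
        "root": {"handlers": ["stdout"], "level": "INFO"},
    }
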
there’s a couple of ways to handle it with Kubernetes The hard way is with persistent volumes This works, but it’s kind of hard to manage, and it’s kind of hard to wrap your brain around This is advanced Kubernetes here What you’re saying is I have this volume, I provide a certain amount of space, and then your apps claim, they make a persistent volume claim of how much space they need, and Kubernetes tries to match up the claims with the volumes as efficiently as it can, and then will mount those volumes on the hosts where those pods with the claims run And if those pods get evicted for some reason, or the host dies, it then remounts that volume to the new host where these things run And in a perfect world that’s exactly how it works, and it works that smoothly I have yet to experience that perfect world

The easier solution is to do off-cluster storage, and this is where I encourage people to start. All this is is the existing way you were already doing storage: you have a database server somewhere that all of your containers connect to, and you manage that database server as bare metal, or you use Amazon RDS, or something like that, for those kinds of persistent data stores. One of the things that Josh was gonna talk about is Patroni. Patroni is a templating system for highly-available Postgres. The idea is that you keep a master running in the cluster and replicas running in the cluster, and replicate your data from one to another, and as containers are killed off or nodes die, you keep that replication chain working between the nodes to the point where you don't lose data. I've heard good things about it, but I've never actually played with it, so I wasn't comfortable showing you how to do it having never done it myself. But I do wanna mention it in case you're interested in playing a little fast and loose with your data. And this slide is out of order, I'm sorry. The idea here is that your persistent data instance is just an instance outside of your actual blue Kubernetes cluster, but inside the same VPC so that the cluster can access it. It's not actually running on Kubernetes; it's just a bare metal node you manage with Ansible, or Puppet, or whatever you wanna use, or by hand.

One thing that I do not have a ton of experience with, but I know is useful, is Helm. Helm is a package management system for Kubernetes. You can think of it as templating those YAML blocks, but it's useful in more complicated scenarios. You can say, run me a Consul cluster, and I wanna have this many nodes, and it will figure out everything that needs to be applied to the Kubernetes API to give you an up-and-working Consul cluster, with the federation and the leader election, and handle all of that for you. So you can build these templatable systems to the point where I should be able to take your system and helm install it, and I just have it running and working on my cluster, and I shouldn't have to do anything else, other than maybe a little bit of secrets management.

Because Kubernetes is really just an API, we can use the API from Python. If you've got kubectl proxy running on your localhost, or you have a well-formed kube config file in your home directory, this is all you have to write to get a list of all the pods running in your cluster, and I'm just printing out the pod's IP address, the namespace, and the name of the pod. This is a generated API off the Swagger docs from Kubernetes, and it's kept up to date with releases, so you should always have full access to the API from Python. You do not have to build your tooling in Go unless you want to. What does that look like?
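The snippet from that slide isn't in the transcript, but with the official Python client it is roughly this, matching what he describes printing:

    from kubernetes import client, config

    # Reads ~/.kube/config, the same file kubectl uses
    config.load_kube_config()

    v1 = client.CoreV1Api()
    for pod in v1.list_pod_for_all_namespaces(watch=False).items:
        print(pod.status.pod_ip, pod.metadata.namespace, pod.metadata.name)
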
Here’s the output for that I ran this on our revsys production cluster and you can see various namespaces we’ve created there in the middle, and the various pod IP addresses, and then the names of the actual pods You’ll see that it takes the name that I gave them, like revsys-rocks, and then appends a uniqueness to it, and that’s that particular instance of that pod Every time a new pod comes up it gets its own unique name, and if it gets killed, a new one comes up So you can see a differentiation in the logs Even if it’s the same container, one got evicted and a new one got created, you’ll see that name change happen Everything in Kubernetes works with an operator or a controller, and why would you want to create your own? Like kubelego, you can create your own tooling that takes action when these things happen You add a little bit of annotation of your own, and you can watch the cluster using that little bit of Python and say, ah, I’m seeing a new pod come up that’s annotated “Frank needs to do something to it,” and I see that, and I can go take action Either inside the cluster or outside the cluster, whatever I want to have happen when that annotation comes up I can make happen Here’s some examples of operators you could build Pipe a message into Slack anytime somebody creates a new deployment When somebody’s launching a whole new thing, pop a note in Slack so we know that that happened Or maybe we wanna check any time pods come up and down,

You could watch your Django apps, look for the database connection information, and automatically back up any databases that are running and being used by your cluster, without having to go in and define each one. You can just say: oh, Frank's test system 47 just came up, it's annotated as backup equals true, so I'm gonna go back it up and put it in S3, and I have one centralized system for dealing with that. Just like we have centralized logging, we can have centralized control, because we've abstracted the whole concept of ops out into this API we can watch. Maybe you have really complicated rollout scenarios where, I dunno, six hours of collectstatic runs before things finally come up. Maybe you wanna keep downtime low by spinning up an entirely new service once it's all ready, spinning down the old one, and moving traffic over to the new one. You could orchestrate that with just a little bit of Python, even if it's something Kubernetes itself doesn't really support. Hopefully that is enough information to make you interested in Kubernetes, but I'm sure you probably have questions.

– [Audience Member] So how do you handle CPU? I have some service that's gonna need a lot of CPU, and I don't wanna run these two services on the same node because they're gonna conflict with each other, that kind of stuff.

– In the interest of fitting things on the slides and not melting your brain too much, I left out resource quotas, which are just items in that same YAML where you say: this takes this much memory, it has a soft limit of this and a hard limit of this. You can have that for CPU, memory, and storage, and Kubernetes will handle it much like any other kind of quota system. If it reaches its soft limit, information goes into the API: I'm at my soft limit. At its hard limit it actually kills the pod and then recreates it. So you can tag pods with how much resource they should use and where you wanna stop them if they grow beyond that. You can also target nodes. You may wanna have a cluster with some memory-heavy AWS instances and some CPU-heavy AWS instances, and you can say this pod needs to run on one of my memory-heavy instances, and these should only run on my CPU-heavy instances, and that's how you can do that. So it will stack pods, as best it can, onto those nodes based on the values you've given them. It will overflow if you don't put any resource values on them, like in my examples; it will just keep packing them into nodes, and you will eventually hit swap and things like that. But if it can't find a spot, because there aren't enough resources to place a pod, it will continually try to find a spot for it, and you will see information in the dashboard and in the logs saying, I can't run this pod, I don't have the resources to do it. You add another node to your cluster and it immediately puts it right there.
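What he's calling soft and hard limits map to resource requests and limits on the container spec, roughly like this; the numbers are arbitrary:

    containers:
    - name: revsys-rocks
      image: revsys/revsys-rocks:latest    # hypothetical
      resources:
        requests:          # what the scheduler packs nodes against
          cpu: 250m
          memory: 256Mi
        limits:            # the hard cap; exceeding the memory limit gets the container killed
          cpu: 500m
          memory: 512Mi
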
– [Audience Member] I'm totally new to Kubernetes. On the ingress point, does it come to the nodes, or to the pods?

– It comes to the ingress controller, and it's then proxied inside the cluster from there. You would think, oh, this is gonna add an extra hop and it's a pain. In practice, at a really, really huge scale it matters, but for your day-to-day use no one is ever gonna notice that that extra hop is in there. It's a little Go proxy, and it's super efficient.

– [Audience Member] So considering that the pods auto-replicate, I guess, the load, do I still need a load balancer in front of it or not?

– The ingress controller on the outside will be the load balancer. Traffic comes to the cluster, and it load balances from there. That handles all of the "where is this pod running," and it just shoots requests to where they need to go inside the cluster, and you don't have to think about it.

– [Audience Member] In thinking about your application, what are some ways to decide whether this solves more problems than it creates? At what point do you decide it's gonna solve more problems than it creates?

– It's a tough call, and that's like any other tooling. On one level I think it's easier, sometimes, to SSH into a box and install the thing and do it by hand, but that's not reproducible in any way. Every tool has its pros and cons. The thing I like about this one is that I'm gonna be doing stuff with containers, so I need some kind of container-based system for the most part. Ansible and Puppet and Chef and things like that are not container-focused, so I don't see them as good tools for solving things around containers that much.

Where you pick which one you use, or whether you use one at all, is hard. The thing I like about this one is that it really does free me up from thinking about the mundane things, like where is this going to run, what port is available to open on it, and how do I proxy from this port to that port. I don't have to think about any of those details, but it does give me the power to listen on the API for when things change and take some sort of action. I like it for that. If you have one app and you do a deploy once a week, this is probably overkill. If you're managing 50 microservices and you deploy 10 times a day, you probably are already building something like this, or using something like this, to manage all that.

– [Audience Member] Thanks very much for the introduction. I was wondering what the next step might be. How did you go about learning this? Did you have resources that you thought were particularly helpful, and could you recommend them?

– Kubernetes is a very fast-moving beast. I think I first started playing with it at like 1.2, and it's already at 1.7, and releases come out about every six months. They have a fairly nice process where things come out marked as alpha, then they're marked as beta, and then they eventually become stable. Once they get to beta, the YAML configuration for the most part doesn't change, and you can pretty much just pick it up and move it over. The alpha stuff is pretty alpha, and good luck if it works. So the documentation often lags behind the current version just a bit on the newer stuff, or the stuff that recently changed a lot. The documentation should be an amazingly great resource, and it is, as long as you keep in mind that if something came out in the last version the docs may be wrong, and if it just moved from alpha to beta the docs may be slightly off. But really it's mostly blog posts and tutorials: how did somebody else go about this, let me go look at their Kubernetes configuration, and then some playing around. There is no really great "here's the book on Kubernetes that solves all of your problems."

– [Man] Thank you, Frank. (applause)