DjangoCon EU 2013: Amjith Ramanujam – Thread Profiling in Python

Dziękuję, cześć Polska! Thank you, thank you. That was my talk in Polish, so I'll spend the next 30 minutes translating it into English. Please bear with me.

Recently I got a chance to write a thread profiler for work, and I was amazed by how ridiculously easy it was to do in Python. So I'm going to share the things I found during the creation of that thread profiler, and hopefully it'll help someone write their own, because you're wide awake at 3 a.m., you're kind of jet lagged, and you're wondering what to do. So let's start.

My name is Amjith, and I go by @amjithr on Twitter. You're welcome to follow me, although I rarely tweet, so I don't see much point. I love Python; it's my favorite language. I like it more than my native tongue, which by the way is not English, so if you find me saying some complicated word, it's not because I have a rich vocabulary, it's because I'm probably mispronouncing something. There's that to look forward to. I work for a company called New Relic. How many of you know what New Relic does? Oh, that's awesome, thank you. We're in the space called application performance management, which is just fancy talk for: you pay us money, we let you install our software, we show you pretty graphs, and we email you when your server is melting. That's pretty much what we do at New Relic.

Enough with the background; let's get started with the actual talk. Let's begin by answering a very simple question: what is profiling? Many of you might already know this; it's just the act of measuring the performance of your script. And how can we do this in Python? Many of you know the famous quote, practically the unofficial motto of Python, "batteries included": the standard library has everything you need, and if you're wondering how to fly, all you have to do is import antigravity and you're good to go. True to its word, the Python standard library does come with a cProfile module, and it's very simple to use. Say you have a script that you usually run by calling python lcm.py. To profile it, all you have to do now is run python -m cProfile lcm.py, and suddenly you get a bunch of information about your script: how long it took to run, how many function calls were made during that execution, what the different functions were, how many times each one was called, how long each took, and more along those lines. You can use this information to optimize your code and make it run a little bit faster.
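Roughly the invocation being described (lcm.py is just the example script name from the talk; the -s flag is an extra convenience for sorting the report):

    $ python lcm.py                             # the normal run
    $ python -m cProfile lcm.py                 # the same script, now profiled
    $ python -m cProfile -s cumulative lcm.py   # sort the report by cumulative time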
So cProfile is fantastic, but there is a slight problem: it's a CLI. Which I love, but in this case it's an issue, because the last time I ran it on a Django application it produced output about five thousand lines long, and my scrollback wasn't long enough to reach the top to get a screenshot. It was that big. If you run this on a really large project, you can sometimes find yourself with an overwhelming amount of data. In that case you have a better option: a GUI called RunSnakeRun, cleverly named, I think. It's a wxPython project: you write your profile data into a file, open it with RunSnakeRun, and it produces gorgeous square maps. Well, it produces square maps. Let me show you, and not just a screenshot but the actual application. Here you get the option to sort by the number of calls made, or alphabetically, or you can drill down into a specific function: you double-click, keep double-clicking, and drill all the way down to find out exactly which functions are taking the longest time, what their call graph looks like, and that sort of thing. Clearly, the bigger the box, the more time is spent in that particular function. So that GUI actually gets us pretty close to what we need to optimize our code.
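Roughly the workflow that implies (assuming RunSnakeRun is installed, e.g. via pip install RunSnakeRun, which provides a runsnake command):

    $ python -m cProfile -o lcm.prof lcm.py   # write the profile data to a file
    $ runsnake lcm.prof                       # open that file in the RunSnakeRun GUI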

But there is one small catch. If we run our code without the profiler turned on, in one of my examples it took about five seconds. With the profiler on, it took about ten seconds. That's a hundred percent overhead. You might be thinking that's not terrible; well, Picard kind of disagrees, because he wants to run it in production. If you're wondering who the hell runs a profiler in production: only the most successful company in the world. When you run a Google search and it tells you "we found three million pages in 0.16 seconds," you might think Google is just boasting. Which they are, kind of. But there's also a person sitting there watching that number, and if it goes above a certain threshold, there's an army of engineers who will work their butts off, or throw a ton of hardware at the problem, to bring it back down to a minimum level. You constantly have to profile your own results to find out when things are going wrong and how to fix them quickly. So running a profiler in production is a worthy cause, and we should be striving for it.

We've established that cProfile is too slow to run in production, and not running it in production makes Picard a sad panda. So what can we do? We can take a different approach, and most parents will agree with me here: targeted profiling, which is kind of like picking your favorite kid and sending her to college. Sorry, that's a terrible joke. Anyway, you pick the critical functions that you want to monitor, because they're business critical, and then you do a sort of hybrid profiling. The reason I call it hybrid profiling is that it's what we do at New Relic, and I'm literally giving away the secret sauce here. We target very specific web frameworks, like Django, Bottle, or Flask, plus a ton of others I haven't listed. These web frameworks have a definite recipe: you go to a URL, control gets routed to a view handler, the view handler talks to your models (your databases) to generate the information, and then you present that information to the user via templates. The critical functions in these frameworks are the view handlers, the model layer, and the template layer. If you can find a way to time those things automatically, you cover pretty much ninety percent of the use cases.

This is a screenshot of a New Relic transaction trace: for one specific web request, this is how control gets transferred, and we time only the top-level functions. We don't time every single function that gets called from your view handler; we time the view handler itself, plus the calls you make to SQL and so on. The reason I call this hybrid profiling is that in addition to timing those functions, we also capture the arguments passed into them, so we can reconstruct what request caused the slow request to happen. For example, I've expanded one of the SQL queries, so we can show you the SQL statement that got captured, which took about sixty percent of the response time.
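A sketch of the timing half of that idea (the timed decorator and the print destination are illustrative, not New Relic's actual API):

    import functools
    import time

    def timed(func):
        """Time one business-critical function and capture its arguments."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.time() - start
                # A real agent would send this to a collector instead of printing.
                print("%s args=%r kwargs=%r took %.1f ms"
                      % (func.__name__, args, kwargs, elapsed * 1000))
        return wrapper

    @timed
    def my_view(request):
        # ... hypothetical Django view handler body ...
        pass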
So this is a perfect example, but it's not always sunshine and ponies, because if you're only picking critical functions, you're obviously going to find some holes. If you write a view handler that calls fifty different sub-functions doing very custom work, we're going to lump everything into "application code," which is code for "we don't know what the hell you're doing." So this approach falls short in a couple of ways.

I know you're getting annoyed: this talk is about thread profilers, when is he going to get to it? I'm almost there. I'm going to introduce the thread profiler, or as I call it, the statistical thread profiler, using a simple analogy: a statistical thread profiler is like an overly attached girlfriend or boyfriend. Let me explain. Overly attached behavior falls into three categories: interrupt, inquire, and collate. Imagine getting a phone call or a text message every hour asking where you are, who you're spending time with, who you're hanging out with. If somebody took notes very diligently for a week or a month, they could pretty accurately tell how you spend your Mondays and Tuesdays, who your friends are, and what you do at what times. You might think this is overly attached behavior, but it is exactly what a statistical profiler does. Let me put it in a different view, with a very professional rendition. What I'm showing here is a single thread that's running a very long function.

If you interrupt it at regular intervals and ask that thread what it's doing, you can pretty easily tell which function is occupying the CPU at any given time. So it gives you a representation of where your time is spent. It does not tell you exactly how long a function took to run; it tells you how frequently a function is occupying the CPU. Even if a function is very short, if it gets called millions of times, the chances of it appearing whenever you interrupt the process are very high, and that is the function of interest that you want to be optimizing. That's the basic idea behind a statistical thread profiler. If we're going to translate this into Python, how do we do it? Let's look at each category in detail.

Interrupting. We can interrupt a running process in two ways: using Unix signals, or using a background thread. With Unix signals, it's just three or four lines of code in Python. You import the signal module and pick a specific signal to use — in this case I'm using ITIMER_VIRTUAL — and you're basically setting yourself an alarm. You attach a handler function, so any time that signal reaches your process, the handler runs and does some work. You set the alarm for 100 milliseconds, repeating, so it fires every 100 milliseconds. The pro is that it's very accurate, because it's a Unix timer: the kernel is going to interrupt your process. The obvious con — though it could still be considered a pro — is that it only works on Unix systems, not on Windows. And if you're running Python in an embedded environment like mod_wsgi, mod_wsgi is going to ignore your signals. Also, C extensions cannot be interrupted, unless the C extension is written to be well-behaved.
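Those three or four lines look roughly like this (a sketch; ITIMER_VIRTUAL counts the process's CPU time and delivers SIGVTALRM):

    import signal

    def handler(signum, frame):
        # `frame` is the stack frame that was executing when the signal
        # arrived -- this is where a profiler would take its sample.
        pass

    signal.signal(signal.SIGVTALRM, handler)           # SIGVTALRM pairs with ITIMER_VIRTUAL
    signal.setitimer(signal.ITIMER_VIRTUAL, 0.1, 0.1)  # fire after 100 ms, then every 100 ms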
The other method is using threads. Again, it's about three lines of code: you import the threading module, say which handler function should run in a background thread, and start the thread. In the handler you basically sleep for 100 milliseconds, so the thread wakes up every 100 milliseconds, does some very specific task, and goes back to sleep — an infinite loop waking up every 100 milliseconds. Very simple. The pros: it's cross-platform, threads work on Windows as well as Unix, and mod_wsgi will honor background threads running in your system. The con is that if your main function is very CPU-intensive, the chances of the background thread waking up every 100 milliseconds and grabbing the GIL are not very high, especially on a multi-core system: your CPU-intensive task won't give up the GIL, or it will give it up, but before the background thread actually grabs it, the main thread grabs it back and keeps running. If you want more details, watch David Beazley's talk on how the GIL works in Python; it's fantastic. So if you're trying to statistically profile NumPy or SciPy or something like that, you should probably go with signals, not background threads. And again, even with threads you cannot interrupt C extensions; they're like the gods of the Python world.

Now that you've found a way to annoy the hell out of your main function, what are you going to do every hundred milliseconds when it wakes up? The programming equivalent of asking "what are you doing?" and "who are you hanging out with?" is getting the stack trace of the thread you've just interrupted. That might sound complicated, but Python is a very easy language, and it's basically two lines of code: you import sys and call sys._current_frames(). The underscore part is very important: this is a private method, we're not supposed to use it, and it's not a public API that's guaranteed never to change. But it's there, so we're going to use it; if it changes, oh well. What it gives us is a dictionary where the key is a thread ID and the value is a frame object.
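Putting interrupt and inquire together, a sketch of the background-thread flavor:

    import sys
    import threading
    import time

    def sampler():
        while True:
            time.sleep(0.1)  # wake up every 100 ms
            # Snapshot the topmost frame of every running thread.
            # sys._current_frames() is private API, but it's what's there.
            for thread_id, frame in sys._current_frames().items():
                pass  # record the stack trace for this thread here

    t = threading.Thread(target=sampler)
    t.daemon = True  # don't let the profiling thread keep the process alive
    t.start()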

The frame object holds the information about the function that's actually running, so let's dive into it. The frame object is like a linked list, where each node represents a function in the stack trace. Each node is an object that has the function name, the file name, the line number, the arguments, globals and locals, and a bunch of other information. The ones we're interested in are the top three: the function name, the file name, and the line number; I'll get to why in just a second. How do we extract this out of the frame object? We can do it manually, walking the frames with a while loop or a for loop, or there's another built-in module that extracts the information for us: you import the traceback module, call traceback.extract_stack(), and give it the frame object. It returns a list of tuples, where each tuple has the file name, the line number, the function name, and the actual source line that was executing when the thread was interrupted. This is very nice; it gives you a list you can use right away.

Now that we've collected all this information, how are we going to represent it? How do we build a sensible call tree, or whatever it is we're going to build? Let's take a very simple approach first. Attempt number one uses Python's defaultdict. We uniquely identify a function — we already have what we need for this: the function name, the file name, and the line number — and form a tuple out of those. We build a dictionary using that tuple as the key and an integer as the value, and any time that key shows up, we increment the value. At the end of your script's execution, you get a dictionary whose keys are the functions and whose values are the call counts. This immediately identifies the most-called function, and it's extremely simple to implement: a print statement gets you the list of functions and their call counts. But it's obviously just a bunch of nodes without any context. In this particular example, even though function C was called five times, you don't know which parent function called it, and that's very important: you want to know in what context a function was called in order to optimize it for that specific scenario.
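A sketch of attempt number one, combining traceback.extract_stack() with a defaultdict:

    import collections
    import traceback

    call_counts = collections.defaultdict(int)

    def record(frame):
        # extract_stack() walks the frame's parent links and returns
        # (file_name, line_number, function_name, source_line) tuples.
        for file_name, line_number, function_name, _ in traceback.extract_stack(frame):
            call_counts[(function_name, file_name, line_number)] += 1

    # At the end of the run, print the most frequently sampled functions:
    # for key, count in sorted(call_counts.items(), key=lambda kv: -kv[1]):
    #     print(key, count)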
So let's try a slightly more complicated approach. Attempt number two is where we build a call tree out of this. On the left side are the stack traces; on the right side is the call tree. At the first time snapshot, when we first interrupt, say I get a stack trace with function A calling B calling C. The call tree right now is empty. Let's step through each of these nodes and build the call tree as we go: function A we're seeing for the first time, so we create a node and give it a call count of 1. For function B we make a child node under function A, and for function C we do the same thing. At the next time snapshot we already have an existing call tree, and I get a new stack trace. I see function A again, so I don't add a new node, I just increment the existing one; function B again, increment the existing node; function D is new, so I make a child node out of it, and now you have a call tree. Here's another example: function A calling function D directly. Function A is already there, so you increment it. Function D, even though it's already present in the call tree, doesn't get incremented there, because this time its parent was function A, so we create a new child node under A instead. This preserves the context — under what context was function D called — so that we can optimize it for that specific code path. That's the call tree method.
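A sketch of that merge step: each sampled stack gets folded into a tree of nodes, one node per function-plus-parent-chain combination:

    class CallNode(object):
        def __init__(self, key):
            self.key = key        # (function_name, file_name, line_number)
            self.count = 0
            self.children = {}    # key -> CallNode

    root = CallNode(None)

    def record_stack(stack):
        """Merge one sampled stack (outermost call first) into the tree."""
        node = root
        for key in stack:
            if key not in node.children:
                node.children[key] = CallNode(key)
            node = node.children[key]
            node.count += 1

Because the same function gets a separate node under each distinct parent, the context it was called in is preserved.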

The tough part with this is the visualization, because these call trees are dictionaries. You can either pretty-print them to stdout, or you can use one of the existing off-the-shelf tools that already do this. There are three tools available. One is statprof, which is in the Python Package Index; it uses signals and has a command-line interface that prints out everything it captures and analyzes, but we already saw the limitations of the command line, so I'll skip over it. Then there's Plop, a module that was created by Dropbox and open-sourced. It also uses Unix signals, but the visualization uses d3.js: it builds call graphs. Here's the actual UI. The first time I used it, I just played around with it for five minutes before actually looking at what it does. The size of each bubble is proportional to how frequently that function was sampled, so the biggest bubbles are where the time is going. It's a bit cluttered, but the information is there for us to look at.

I mentioned that I had to write a thread profiler at New Relic, and we released it. We decided to go with background threads, because we have to support all the web servers, and mod_wsgi is one we absolutely must support. So: threads, and we have our own GUI that we feed the information into. This is how it looks. It's basically the call tree: the WSGI layer calling into the view handler, a bunch of different methods, and actual Django framework code also shown in the tree. You'll notice on the left side we have ways to identify whether a thread is serving a web request or doing a background task, so we can bucket them into categories: you can choose to see only the call tree for background tasks, for example, versus web requests, which was a nice addition. That gives you an idea of how to write a thread profiler if you decide to do it yourself. And it's very cool, because it's very low overhead: all the testing we've done shows under three percent overhead if you run it all the time. It works on CPython, and it also works on PyPy — I tested it — but it does not work on Jython, because sys._current_frames() is not implemented there, so if you're using Jython you're out of luck. And if you're trying to introspect green threads, using greenlet or eventlet, you're also out of luck, because sys._current_frames() does not give you the stack traces for green threads. There are other methods you could use, but they have their own complications, so I won't go into them now; come talk to me afterwards if you still have questions. So that's how we get to run the profiler in production, which is very exciting.

Now for the grand finale: something we recently released at New Relic. We have the deterministic profiler — the transaction traces, which show you exactly how long a function took to run, but with holes, because we're not instrumenting everything — and we have the statistical profiler, which doesn't tell you how long a function took to run, but gives you a representation of how frequently it was called. We decided to combine the two, and we call it X-Ray sessions. You choose a specific web transaction — again, picking your favorite child — and you run an X-Ray session on it. It captures deterministic information for up to the first hundred of those web requests, and it also runs the thread profiler only for that web
request, in that context only. So any holes you find in the transaction trace, you can go to the thread profiler to see which function was the worst offender in that specific web request. You combine the two, and on top of that you get histograms, response times, and throughput. That's X-Ray sessions, which we just released. Thank you very much; that's the end of my talk.