Visualizing Cloud Bigtable Access Patterns at Twitter for Optimizing Analytics (Cloud Next '18)

[MUSIC PLAYING] MISHA BRUKMAN: Hello, everyone, and welcome to the session. Today we're going to talk about how to visualize Cloud Bigtable access patterns with visual analytics. And with us today we have Steve Niemitz from Twitter, and we have Wim de Pauw, an engineering manager at Google Cloud. My name is Misha Brukman, and I'm a product manager on Bigtable on Google Cloud. With that, Steve, take it away.

STEVE NIEMITZ: Thank you. All right. So hey, everybody. I'm Steve Niemitz. I'm a software engineer at Twitter. I'm the team lead for our revenue data analytics product. We basically focus on serving analytic queries, specifically for our advertiser analytics.

So let's talk about Twitter a little bit, just a little background for people unfamiliar with it. We have around 300-plus million monthly active users. Those users are generating around 100 million tweets per day. That turns into something like 300 petabytes of data that we have stored on HDFS. And we're a global company; we have points of presence on over five continents.

So specifically, now, let's talk about advertiser analytics. We have all those users interacting on the platform. Our advertiser analytics product is what we provide to our advertisers so they can get information on how their ads and campaigns are performing. They can see things like spend breakdown, impressions, clicks, all these metrics in real time, through UIs that we provide to them, and also APIs for our more advanced advertisers.

So the scale that we operate at here is pretty large. We have hundreds of thousands of active advertisers at any given time. Those advertisers are running lots of campaigns, and all those campaigns are generating hundreds of thousands of events per second. Those could be coming from client events, from web browsers, mobile apps, all of that. But even internal applications that we have, like our ad serving infrastructure, are generating events that go through our pipeline. So all in all, it's
around 20 terabytes a day of raw log data. And our pipelines will take that, compress it down, and aggregate it to about eight terabytes a year of data. So in all, our analytic data set size is somewhere on the order of 50 terabytes, but growing as we speak right now.

So let's talk about the requirements that we need to meet and the SLOs we try to achieve for this analytics service. The queries need to be fast. These things are powering interactive UIs; they're powering APIs that our advertisers are using. So we try to shoot for pretty tight SLAs. Our SLA is 4.5 seconds on pretty much all of these queries, which can be for things like "give me three years of data hourly," or "give me the last week of data aggregated together." We generally have around 5,000 QPS, which turns into around a gigabyte a second of data that we're processing at any given time just for query execution. And we obviously have a pretty high SLO on uptime: 99.97%.

So let's talk more about the queries we're serving here. Things like: give me all of the impressions that my promoted tweets have generated for the last seven days. Show me the spend that my campaigns have had over the last month, totaled up together. Give me the impressions by day for only app installs by iOS users. So we get really granular with some of these queries and can break them down and segment them many ways. And then other things like hourly spend broken down by geography: how much do people in Japan spend on tweets?
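As an illustration of the kind of query being described, here is a minimal, purely hypothetical sketch (the key layout, names, and numbers are stand-ins, not Twitter's actual schema or code): hourly counts stored under row keys that sort one advertiser's hours contiguously, rolled up into daily totals with a single range scan.

```python
# Toy in-memory stand-in for a Bigtable table: a sorted map of
# row key -> {column: value}. The key layout is a hypothetical
# illustration (not Twitter's actual schema): "<advertiser>#<hour>",
# zero-padded so one advertiser's hours sort contiguously.
def row_key(advertiser_id, ts_hour):
    return f"{advertiser_id}#{ts_hour:012d}"

def scan(table, start_key, end_key):
    # Bigtable serves lexicographic range scans over its single
    # row-key index; we emulate that over a plain dict.
    for key in sorted(table):
        if start_key <= key < end_key:
            yield key, table[key]

def impressions_by_day(table, advertiser_id, start_hour, end_hour):
    """Roll hourly impression counts up into daily totals with one range scan."""
    totals = {}
    for key, cols in scan(table, row_key(advertiser_id, start_hour),
                          row_key(advertiser_id, end_hour)):
        day = int(key.split("#")[1]) // 24
        totals[day] = totals.get(day, 0) + cols.get("impressions", 0)
    return totals

# 48 hours of synthetic data for one advertiser, 10 impressions per hour.
table = {row_key("adv42", h): {"impressions": 10} for h in range(48)}
print(impressions_by_day(table, "adv42", 0, 48))  # {0: 240, 1: 240}
```

The point of the sketch is only that time-range queries become a single contiguous scan when the row keys sort that way, which is what makes this query shape cheap at high QPS.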
All of these things can turn into problems when we try to scale off-the-shelf systems up. The dimensionality of this data is very high. We have at least 20 high-cardinality dimensions, things like geography, interests, and so on. And additionally, we have another 20 or so lower-cardinality dimensions that we need to query over. A lot of the systems we looked at weren't really designed for this level of QPS and latency; they were more for ad hoc internal analytics, like maybe a BA trying to do some investigation and just running a couple of queries per minute.

So that brings us to why we used Cloud Bigtable. We decided to use it as our analytic database and basically build our analytic product on top of it. Reasons we chose it: it's fully managed, which is awesome. We don't have to run any hardware, we don't have to run any software, and we don't have to worry about upgrading it, ever.

All of that's handled for us. It's very scalable, and it's very easy to scale. I can just go into the UI or use the API, add another 20 nodes, and scale up my cluster by 20%, and it just happens instantly. I don't have to worry about requisitioning hardware and that kind of thing. It's very high performance; it's basically linearly scalable to any size, assuming you have good key design, which we'll talk about later. And the actual schema and the design of the system make it really well suited for the high-performance time series analytics that we want to do. So with that, I'm going to hand it over to Misha to talk more about Bigtable.

MISHA BRUKMAN: Thank you, Steve. So let's talk about Google Cloud Bigtable and the feature set that it provides to enable Steve and his team to do analytics at scale. So what is Bigtable? Bigtable is a sparse, distributed, persistent, multidimensional sorted map. That's the description that you can see in the paper from 2006. So let's dive into the details of a bunch of these attributes and see how they work in practice.

Bigtable is a distributed key-multivalue store. It is sparse in the sense that unwritten cells don't take up any space, so you don't have to fill out every single cell in every row for every column family. It supports atomic single-row transactions, so you can modify any values across any of the column families. And it has a single index, which is the row key; there are no secondary indices for any of the columns, only the row key. And in this table, you can see how you would access the different cells by using the row key, column family, and column qualifier to uniquely identify them.

Cloud Bigtable supports the multidimensional map model as follows. For a tuple of values (the row, which is a string; a column, which is a combination of column family and an optional column qualifier; and an int64 timestamp), given all of those values, you
can retrieve the value. And the value is an uninterpreted array of bytes. So here you can see that you have a built-in time dimension that you can use to store time series. In fact, while most databases are two-dimensional, Bigtable is actually three-dimensional: the third dimension in Bigtable is time. Every cell has an arbitrary number of versions, and there is an int64 attached to each one of those cell versions. You can treat them as time, or you can treat them as an arbitrary version ID, so you could just give them incrementing numbers. You can store an arbitrary number of them, but you can also enable automatic garbage collection, either based on time or based on count. So you don't have to manually go in and clean up and delete the unnecessary values. You keep writing into Bigtable, you set up a policy, and Bigtable will asynchronously remove data after some time, or after you've reached a limit. For example, you say keep the last five values, and Bigtable will automatically garbage collect the rest asynchronously over time.

Let's talk about the request routing mechanism in Bigtable. Bigtable clients connect to a single endpoint, which is the load balancing proxy layer, and it then forwards the request to the Bigtable nodes that serve the data. In the following set of slides, I'm going to remove the load balancing proxy layer for clarity, but keep in mind that clients don't connect directly to the Bigtable nodes; the requests are routed through this API layer. So here you can see the clients are connecting to Bigtable. Bigtable doesn't actually store any data, which is kind of interesting for a database. Bigtable itself is stateless because it stores all the data in Colossus, our distributed file system. Here you can see that a Bigtable node manages data without holding onto the data in Colossus. Bigtable nodes have exclusive read-write access to sections of the data in a table to provide the atomicity of updates to a single row. So every row in Bigtable is
exclusively managed by at most one node. Nodes don't talk to each other, so there are no cross-row transactions, but this allows for very good scalability. Here, A, B, C, D, and E are sorted key ranges within a single table. As we mentioned earlier, the storage model sorts the keys in the table to allow for fast linear scans.

Now let's say we have a single node that is getting a lot of activity because there is a lot of popular data being accessed by a bunch of clients. What Bigtable can do is simply reassign the ownership of that data to a different node, and that is an online operation without any data motion. So regardless of the gigabytes or terabytes of data that you've just reassigned, this is an online operation without any data copying: the data stays exactly where it was in Colossus, and this is just a metadata update to record that there's a new owner of this data, and the requests flow automatically. The client is completely unaware of this, because its requests are automatically forwarded by the load balancing proxy layer that I mentioned earlier. So clients don't need to do any discovery or any management of whom they need to connect to. They have a single endpoint, they connect to that, and the requests are routed automatically, even when the data is rebalanced and reassigned at scale.

This ability to easily reassign the data, and the strong separation between processing and storage, enables the simple scalability that Steve mentioned earlier. What this means is that if we add another node to the cluster, as you can see here, we can easily shard and reassign the data to other nodes
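The reassignment just described can be sketched as pure metadata bookkeeping. This is an illustrative toy, not Bigtable's real implementation: ownership is a map from sorted key ranges to nodes, and "moving" a range changes only that map while the bytes stay put in Colossus.

```python
# Illustrative sketch only (not Bigtable's real implementation): tablet
# ownership is metadata mapping sorted key ranges to serving nodes,
# while the data itself stays in Colossus. "Moving" a range copies nothing.
class Cluster:
    def __init__(self, assignments):
        # assignments: {key_range: node}, e.g. the sorted ranges A..E
        self.assignments = dict(assignments)

    def node_for(self, key_range):
        return self.assignments[key_range]

    def reassign(self, key_range, new_node):
        # An online metadata update: no rows are copied or re-read.
        self.assignments[key_range] = new_node

cluster = Cluster({"A": "node1", "B": "node1", "C": "node2",
                   "D": "node3", "E": "node3"})
# node1 is hot, so hand range "B" to a freshly added node.
cluster.reassign("B", "node4")
print(cluster.node_for("B"))  # node4
```

In the real system the client never sees any of this; requests keep flowing through the same load-balanced endpoint, which routes them to whichever node currently owns the range.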

And so the cluster will scale in performance linearly with the number of nodes in the cluster. You don't have to do anything to enable this; you just add more nodes to the cluster, and you get additional throughput proportional to the number of nodes. Similarly, of course, you can shrink the cluster, and we will again resize and rebalance the assignments to fit that smaller cluster. What this means is that you get seamless scalability in Cloud Bigtable without any downtime. And as Steve alluded, you get linear scalability if you have good schema design. Now what does it mean to have a good schema design? We'll talk in a bit about how you can get good schema design, what it actually means for Bigtable, and what performs well.

But before we get to that, let's talk about some of the recent features that we've announced for Cloud Bigtable. Yesterday we announced that Cloud Bigtable now supports replication, and it is now generally available. There are no manual steps to enable replication or synchronization, no steps to repair data, and nothing to synchronize on your writes or deletes. This is automatically handled for you as soon as you add another cluster to an instance. Some of the use cases you might consider for this are high availability for serving, or workload separation for batch and serving use cases. As I mentioned, as of yesterday, it increases the SLA from 99.9% to 99.99%, with zero-touch failover for HA.

So with that, let's return to row key schema design and how it impacts performance. As Steve mentioned, good row key schema design is crucial to get linear scalability, because the row key schema guides the access patterns, and you want your access patterns to scale linearly to enable linear scaling of performance in your Bigtable cluster. If you have uneven access patterns, you'll end up with hotspots. If you're storing too much data
due to the row key schema design in a single row, you will also have trouble, because since we cannot split a single row across multiple nodes in Bigtable, you will have single servers trying to serve a single row that may be gigabytes in size. So in the end, you really want balanced access patterns and reasonably sized rows, and that will give you the linear scalability that you can have with Bigtable.

Traditional monitoring involves line charts, such as this one, where you can look at CPU utilization, latency, data stored, and so on. This graph, for example, shows average CPU utilization in a cluster. We also have CPU utilization for the hottest node in the cluster, which might hint at the possibility of hotspots. But what if you wanted to know more? What if you wanted to see what is actually happening under the covers, and how you can improve your row key schema design to improve performance? How would you know if you had large rows? How would you know that there's a MapReduce that may be affecting your serving latency? How would you debug if you really have high CPU, and find what the cause of that high CPU might be? With that, I want to hand it over to Wim, who will talk about a new feature for Cloud Bigtable: Key Visualizer. Wim?
WIM DE PAUW: Thank you, Misha. Good morning, everyone. My name is Wim de Pauw. I'm the engineering manager of the Visual Analytics team in New York, and I'm very proud to announce Key Visualizer for Cloud Bigtable. This is a novel approach to analyzing the behavior of your table. You saw the line chart before; typical line charts only show one dimension. We show read and write access patterns over both time and the keyspace. This will help you find and prevent hotspots, find rows with too much data, and see whether your key schema is balanced.

This is the heat map of Key Visualizer. It shows time horizontally, proceeding from left to right. This example shows one week of behavior of your Bigtable. The keyspace is shown vertically, with the lowest keys at the bottom and the highest keys at the top. Then there's the heat scale: it has colors from cold, in black, for zero, and then it goes through dark blue, red, and yellow, to glowing white-hot for extremely high values. In this example here, you can see a diurnal pattern. You see this lowest key range over here? Well, there are about seven hotspots here, so someone must have hammered that key range very heavily every night. Over there you see a cold area, because it's practically completely dark. And this horizontal line in the middle that is glowing white must be a key that has been used very, very heavily. In this example here, you see a diagonal line, which is probably a sequential scan from a MapReduce job. And then, in Key Visualizer, if you hover with the mouse over the screen, you'll see in the tooltip more details, like the actual value,

the precise time, and the start and end key of the region that you're hovering over. Around the heat map, you'll see, at the bottom and on the right, aggregate values in bar charts. Here at the bottom you see averages over the visible keyspace in 15-minute intervals; this is pretty much what you would see in your traditional line chart. On the right side, there's another bar chart that shows you averages over the visible time range, per key range. You see that spike over here? Remember, that was the bright white line of that hot key that we had in the middle.

Now, when we were observing Bigtable application developers at Google, we noticed that very often, although not always, they were creating their keys in a hierarchical way. They were using prefixes and concatenating these prefixes, a little bit like this example. This example uses slashes and dashes; it could be other symbols, too. And so we thought: in Key Visualizer, we're going to extract the prefix tree so that you can see your key schema in the heat map. The result of that will look something like this. Here you see an icicle chart. The leftmost tall, thin rectangle shows you the top prefix, and then as you go right, you see prefixes going down the hierarchy. Why is this useful?
Well, first of all, you can now see the key schema of the data that you designed, and hopefully you can recognize the data that you stored that way. The other thing is that we often observed that behavior is correlated with the key layout. In this example here, this is a very small key range, and you see that it corresponds to a series of key prefixes here. And it's a very distinct pattern: you see this on-and-off pattern over four days over here, all correlated with the key prefix.

Key Visualizer gives you 11 metrics. First of all, we give you ops, which is the sum of reads and writes. Then we give you three warning metrics that can point you to possibly bad performance; I'll talk about those later on. Then we have reads and writes. We have performance metrics like latency. And finally, we have the amount of data that is stored per row.

Let me show you now some common key access patterns. Over here you see an imbalanced keyspace. Again, remember, time goes horizontally from left to right. You see that about 20% of the top keys here were very heavily used (they're glowing yellow and red), whereas the 80% at the bottom are barely used; you can see that area is very cold. In this example, you see that your database was pretty much cold over the whole key spectrum, but then all of a sudden, at this time, you see a very sudden increase for a key range from here to here. When we look at this example, you see that everything is pretty much cold except for a few glowing white keys; that's an example of a couple of hotspots. And then finally, this is the same example that I pointed out in the previous slides: you see this diagonal line, which is probably a sequential read caused by a MapReduce job.

Now, Key Visualizer is not just about visualization. It's also about analysis, analysis on possibly trillions of metric values. Imagine you have a database with a trillion rows. If we want to show that behavior on your laptop, we have to
downsample and aggregate it in one way or another to get it to your laptop in less than a minute. So we have to aggregate and downsample all these performance metrics. On the other hand, we also analyze every single performance value that we collect in the system, and they're very detailed. The result of that is three metrics. The first one is the read pressure index. That is an opaque function of read CPU, read queue length (which is the number of read operations waiting to be executed), and latency, and that function can point you to possibly bad read performance. We have something similar for write behavior.

And then finally, we have large rows, which will point you to any rows that contain more than one gigabyte of data. Now, this is a massive amount of calculation and data that we analyze. But the good news is that this has absolutely no performance impact on your cluster, none at all. And it's all for free. This is an example of high read pressure; you see this glowing spot.

Now, some of you may think, well, this may be a little bit overwhelming: I have 11 metrics, I have all that data, where do I start? We made it easy for you. When you open Key Visualizer, it opens with ops; that's the default metric. What it shows you is just an overview of your activity. You see if there are any hot regions, any cold regions, any diurnal patterns, any linear scans. And then you'll see if there are any warnings in the top bar. If there aren't any warnings, that's good news; there's nothing to do, and you can just relax. If there are warnings, they will show up like this example over here, and you click on the button to see which keys were causing problems, and when.

I'm going to show you an example now. Imagine you open Key Visualizer in the ops view. You see an overview of your behavior; this is one week. You see what's hot and what's cold, and the diurnal patterns. And over there, which I've annotated with the red rectangle, you see this read pressure warning. It has three exclamation marks, so that means it's severe. I click on that button, and it brings me immediately to the read pressure index metric. You see this glowing white stripe over here? When I hover over it, I see in the tooltip the details of the problems. I've got a list of problems here, but the most severe one is the one at the top, which I've annotated with the red rectangle. And I'm going to zoom into that now. So what is this?
Well, you see it's glowing, so it's bad. First of all, this is a single key, and it's been stressed for two and a half hours, so that's pretty long. The other thing you can see is the exact name of the offending key, shown over here on the second line. And why do we think that it's bad? Well, look at the total read CPU: it's using 446 millinodes, which is roughly half a server's worth of CPU. Remember, this is just for one key. And then we also give the potential culprit, the suggestion of what was probably causing this. Well, we're dealing with a large row here. Look at this: it's 10.38 gigabytes. And you know that the recommended maximum row size in Bigtable is 256 megabytes, so this is way too much, of course. But imagine someone is hammering that key very hard, pulling out all that data, and each time they're pulling out 10 gigabytes. No wonder this is going to use a lot of CPU. See how easy that was: I open Key Visualizer, I look at the overview, click on the alert buttons, and it shows me the problem.

What we've shown you so far are views that show one metric, for example, ops. And you can imagine that you're looking at, say, a read hotspot, and you may want to know, well, am I also writing at this same time and place?
And you switch to writes. And then you want to see if there is any latency, and you switch to latency. This was a little bit cumbersome, so we thought we'd make it a little easier. This new feature, which we will probably launch a couple of weeks from now, is the Multiple Metrics view. Here in the middle, you see what we had before: when I hover with my cursor over an area, I see the values in the tooltip. But now, at the same time, I see all the metric values for each metric over there on the right side. This lets you compare and correlate metrics simultaneously in one view. In this example, you see that the reads, which are in purple, are somewhat lukewarm, whereas the writes are bright yellow. So we're doing not so much reading, but a lot of writing, in this example. And then there's the Expand All button over there. If I click on that one, it will bring up miniature views for each metric, showing the same window as the main view.

So here I had ops. Over there I had the number of bytes written. Over here I have the amount of data stored, and here the number of rows. If I hover with the cursor over the main view, the cursor will simultaneously move over here, and you see the corresponding metrics as well. If I zoom in and out, or pan in the main view, all of these views will move and zoom in a synchronized way. So that will be coming in a couple of weeks.

Let me summarize, to give you again a couple of the benefits that I've demonstrated so far. It helps you detect and prevent key access hotspots. It helps you find large rows of more than one gigabyte. There's no performance impact for you at all. And it's free. One use case that I did not illustrate here is how to optimize your key schema, and that's what Steve is now going to talk about, because he used Key Visualizer for that purpose.

STEVE NIEMITZ: Thank you. So as Wim introduced, now we're going to talk about how we used this feature while we were developing our analytics product, in order to tune and really optimize the design. One of the things we realized while we were building this is that the row key schema really matters a lot for how well your Bigtable will perform. Poor key schema design will end up with a very non-scalable Bigtable design. A few common things we realized we needed to do: we should tune our data to the query patterns that we were expecting, which kind of makes sense. So in building our row keys, we basically want the dimensions in our queries to match the row key from left to right. You take some dimensions, you put them in your row key in a serialized form, and you want to be able to match them left to right to get optimal data access in your queries. Also, you want your row keys to be well distributed across the whole keyspace; otherwise you're not going to have a very linearly scalable system, because only certain nodes will be serving a
majority of the data. And also, you want your queries to require the minimum number of scans, so you want to keep your commonly accessed data together.

So what we're going to do now is walk through an example, a kind of toy key schema that's a simplified version of what we use for our actual advertiser analytics. We'll run some simulated traffic on it, see what it looks like in Key Visualizer, and see how we improved the design. Here are some simple dimensions. Advertiser ID is just the ID we give any advertiser that signs up with us. Campaign ID is a campaign they're running; maybe it's an ad they're showing, something like that. Timestamp is the time the data we're querying corresponds to; generally we group things into an hour granularity, so the timestamp will be at the hour level. Engagement type is what the user did to engage with the ad: did they view it, did they click on it, did they install an application, something like that. And display location is where it was on the page: was it in their timeline, was it a search result, things like that.

So our first iteration was really simple: take all of these dimensions concatenated together from left to right, and that becomes our row key. So timestamp, advertiser ID, campaign ID, et cetera. There are some pros here. It's really easy to scan across all advertisers if you have a time range, and generally all of our queries will have a time range in them. You have good data locality, because all the data for an advertiser is grouped together. But now let's check out the cons. This is a view of what Key Visualizer looks like with this schema. This isn't great. Here, there's a big region of basically no traffic. Up here you can see there's a very pronounced line of hot keys, which corresponds to the current week of data. Most of the queries for our data that we have are querying
for the most recent data. That makes sense: people care more about what's happening now than what happened in the past. So this isn't great; this is not a really good design. If we had run this in production, we'd probably have had very hot nodes all the time, because whatever node was hosting this small key range in the middle here would be overloaded all the time.
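The hotspot described above can be reproduced with a small sketch. The key format, field widths, and numbers here are hypothetical stand-ins for the toy schema, not Twitter's real serialization; the point is only that a timestamp-first key funnels all current traffic into one prefix.

```python
# Hypothetical serialization of the toy schema's first iteration:
# timestamp first, then advertiser and campaign IDs (names and
# padding are illustrative, not Twitter's real format).
def row_key_v1(ts_hour, advertiser_id, campaign_id):
    # Zero-padding makes lexicographic order match numeric order.
    return f"{ts_hour:010d}#{advertiser_id:08d}#{campaign_id:08d}"

# Simulate the dominant traffic: everyone querying the current hour.
current_hour = 425000
keys = sorted(row_key_v1(current_hour, adv, 1) for adv in range(1, 1000))

# Every key shares the same timestamp prefix, so all of this traffic
# lands on one narrow slice of the sorted keyspace: the hot band that
# shows up as a bright line in the heat map.
prefixes = {k.split("#")[0] for k in keys}
print(len(prefixes))  # 1
```

Because Bigtable assigns contiguous key ranges to nodes, one shared prefix for the hottest data means one node absorbs nearly all of the load, no matter how many nodes the cluster has.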

So let's iterate. Let's take what we learned here and see what we can do. We learned that the timestamp was not a good thing to put in front, so let's swap the timestamp and the advertiser ID. Now the advertiser ID is at the beginning of my key, and the timestamp is within there. So it's still easy to scan for a time range within an advertiser. It's harder to query across all advertisers; however, in our use case, that's not really something we care about. Advertisers coming in are only looking at their own data; obviously we're not going to let them query everyone's data. So although we have a con here, it's not the end of the world. It doesn't fit a use case we have anyway.

So let's look at what happened. We got a little better: there's more distribution. However, we still have some hot keys, some hot ranges here. These most likely correspond to advertisers that are querying their data more frequently. Additionally, our advertiser ID keys might not be very well distributed across the entire keyspace. They're probably sequentially generated integers or something, so we'll end up with groupings: we probably generated a bunch of IDs here, and then here, and those are all groups of advertisers.

So let's iterate again. We've learned that when we put advertiser ID in front, things worked out a little better; however, we didn't get a good distribution on that advertiser ID keyspace. So instead, let's reverse the advertiser ID. The first digit is now the last digit, which will synthetically distribute the keys a lot better, because now, instead of the most significant digit, which generally doesn't change much, you have the least significant digit, which is always cycling if the IDs are sequentially generated. The pros and cons are basically the same now. There's not a big difference, other than that you can't scan advertiser IDs sequentially. But we don't care; that's not a use case anyway. So this is what we end up with. Everything
looks pretty good. We still have some rows here which probably correspond to more frequently accessed advertisers that are pulling their data more. But in general, the keyspace is really well distributed across basically the entire range. This is what we want it to look like, and this will lead to a really well-performing design.

The really important thing here is that because we were able to do this, it gave us confidence that when we took this product live, we wouldn't end up with horrible performance issues, and we wouldn't have to redesign it in a year because it wouldn't scale. It's also been really helpful for us when a production issue comes up, like why is our latency going up for certain advertisers? We can go right into Key Visualizer and see what's happening at the time. And now I'm going to hand it over to Misha.

MISHA BRUKMAN: So as you've seen, Steve at Twitter was able to use Bigtable at the scale that Twitter needs, to do visual analytics and serve their ads analytics. Bigtable provides a fully managed NoSQL database to serve these types of use cases, for time series and user analytics as well as others. It provides petabyte-scale storage, high throughput, and low latency for random access. And it provides the ability to seamlessly redistribute the data to handle increases in load, with seamless scalability while adjusting to the access patterns that you put on it. And you can learn more about it at the URL.

WIM DE PAUW: So Key Visualizer, which is in beta now for Cloud Bigtable, visualizes your key access patterns. It will help you find hotspots and large rows. And it's not just visualization: we also analyze trillions of metric values. It extracts the key prefix hierarchy if it's available. And it doesn't have any impact on the performance of your cluster. There's a URL here that you can go to for more information.

STEVE NIEMITZ: And then finally, some of the really key features that Bigtable and
Key Visualizer enabled for us at Twitter. It gave us a very fast time to market for bringing this new, improved advertiser analytics product up to scale. Using Key Visualizer gave us really good confidence that we would be able to scale up and end up with a scalably designed system. We were able to iterate on the row key design way faster than we would have if we had to infer it through other metrics, like P99 latency. And finally, it made it really easy to change the row key schema during development and just see what happened. It let us iterate really quickly, without going live in production and having to worry about the impact. So that's all we have. Thank you very much. [MUSIC PLAYING]