EARL 2015 – Joe Cheng – RStudio – Conference Keynote – Shiny

Well, good morning. This is going to be a very different talk from my last one. My name is Joe Cheng. I work at RStudio, I’m a software engineer, and for the last couple of years I’ve been working on Shiny, a web framework for... well, I’ll tell you what it is. I have a feeling that a lot of you might already be familiar with Shiny. Can I see a show of hands: who here has actually used Shiny before? Oh, good lord. Okay, so there are still some people out there who have not worked with it, so I’ll spend maybe a third of the talk talking about what Shiny is, and then I have some new stuff that I guarantee you haven’t seen before.

I always start talking about Shiny with the same four slides, and really it starts with the motivation. Why did we make Shiny? We wanted to take all the things that we really like about R, the state-of-the-art statistics, the amazing visualization, and the community and the packages around it, and address some things that we saw as lacking in R at the time. R is by and large a personal experience, not a shared one. You tend to work alone on your desktop computer or in your server session, and you’re gaining insights by looking at your own console and your own plots. We wanted to make it much easier to share your knowledge with other people. When you do have outputs from your R sessions, usually they’re in static formats like PNGs and PDFs. And a lot of modern visualization these days is happening in the browser; there are really cool JavaScript libraries that make it very easy to experiment with new forms of visualization. We wanted to address all of those, and Shiny was our answer.

Shiny is a package that lets you easily create your own interactive web applications around your R analysis. It is not intended to be a general-purpose web framework. It doesn’t replace Ruby on Rails; you wouldn’t build a web store in Shiny, probably. It is designed for
people who work with R to analyze data, to build cool interfaces on top of that. It does not require any web development skills: no HTML, CSS, or JavaScript. It was really important to us to be able to tell any R user, anyone with the ability to write a function and subset a data frame, that you are ready to write Shiny apps if you just look at these few pages of getting-started material. That being said, I’ve been a web developer since 1996, and the idea of working full-time on a framework that would hold me back and restrain me from doing whatever I wanted to do in the web browser is not tolerable to me. So Shiny is also designed to let you go hog-wild with web development skills if those are available to you. Shiny is designed to look good by default but be very customizable. It’s also designed to integrate with libraries like D3 and Leaflet and what have you. And it uses a reactive programming model, which I won’t get into today as part of this talk, but which to me really represents a dramatic leap forward in how you’re able to create user interfaces very concisely and robustly using R.

Okay, so in case you have never seen a Shiny app before... this is not going to work. Okay, good, I have local copies of most of my demos. This is about as simple as it gets for a Shiny application. This is your everyday iris data set that has been demoed to death. We’ve got a selector on the left for the x-axis and a selector for the y-axis, and as I change these you can see the plot immediately updates. I’m using the k-means clustering algorithm to make an assignment here, with the colors representing the cluster membership and the X representing the centroid for each cluster. As I change this cluster count, we can see what it looks like with different numbers of clusters. And all this is just these two files; this is the source code behind this application. So in total, maybe a couple dozen lines of code.
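An app like the one in this demo can be sketched roughly as follows. This is not the code from the slide: it is a single-file approximation rather than the two files shown, the helper name `cluster_columns` and the function `kmeans_app` are made up for illustration, and the choice to offer only the four numeric iris columns is an assumption. The Shiny calls are namespace-qualified so the file can be sourced even where Shiny is not installed; the app only starts in an interactive session.

```r
# Plain R analysis step: subset two columns and run k-means on them.
# (Hypothetical helper, not from the talk.)
cluster_columns <- function(data, xcol, ycol, k) {
  selected <- data[, c(xcol, ycol)]
  list(data = selected, fit = kmeans(selected, centers = k))
}

# Build the app object: two column selectors, a cluster count, one plot.
kmeans_app <- function() {
  numeric_cols <- names(iris)[1:4]  # assumption: skip the Species factor
  ui <- shiny::fluidPage(
    shiny::sidebarLayout(
      shiny::sidebarPanel(
        shiny::selectInput("xcol", "X variable", numeric_cols),
        shiny::selectInput("ycol", "Y variable", numeric_cols,
                           selected = numeric_cols[2]),
        shiny::numericInput("clusters", "Cluster count", 3, min = 1, max = 9)
      ),
      shiny::mainPanel(shiny::plotOutput("plot"))
    )
  )
  server <- function(input, output) {
    # Re-runs automatically whenever one of the three inputs changes.
    result <- shiny::reactive(
      cluster_columns(iris, input$xcol, input$ycol, input$clusters)
    )
    output$plot <- shiny::renderPlot({
      # Points colored by cluster, centroids drawn as X marks.
      plot(result()$data, col = result()$fit$cluster, pch = 20, cex = 2)
      points(result()$fit$centers, pch = 4, cex = 3, lwd = 3)
    })
  }
  shiny::shinyApp(ui, server)
}

if (interactive()) {
  shiny::runApp(kmeans_app())
}
```

Keeping the k-means step in an ordinary function is only a convenience for this sketch; it makes the analysis part easy to try from the console without starting the app.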

You can see here that the actual analysis part is just everyday R code. We’re doing some subsetting, we’re calling kmeans, and then we just do a normal plot and call points. So, pretty straightforward. The ui.R file indicates what the user interface should look like, and here again it’s a very literal translation of what you see on the screen, and again it’s only using R code. So it’s extremely simple to get started, and you can build things that before Shiny would have taken a little more effort.

So that’s a very simple example. We can get quite a bit more complex if you like. There should be a map of the United States behind here, like Google Maps, but since I don’t have internet here you can just see the outline from the data. This is a zoomable map that shows ZIP codes of the United States. The size of each data point represents the population; I can click one to see an exact number. The color is either red or blue, and the blue ZIP codes represent the most affluent, best-educated ZIP codes in the United States, the 95th percentile. You can see that they tend to cluster around cities. This is Seattle here, and the Northeast in particular, around New York and Washington, DC, is just saturated with these things. I can change what the representation on the map means. I can change the color so that, rather than a binary “is 95th percentile or not”, it shows a continuous gradient of that score that ranks their income and education. Down here I’ve got two plots: one is a histogram of that centile score, and the other is a scatterplot of income versus college. As I move around the map, you can see that those plots update, so I can zoom in to some particular part of the country and see how it’s doing versus other parts. There’s also a Data Explorer tab here that lets you look at the raw data that underlies that map. I can filter this down to a particular city or a particular state. I can combine these
filters so I can compare Seattle to Portland, and then maybe rank that by their score. This entire app is three hundred and fifty lines of R code, and again, no HTML was necessary.

Let me just show one more. This one’s completely not going to work; that’s a shame. Those two apps were used for exploration. Yeah, this is totally not going to work. What it should be doing is pulling live information from one of our RStudio servers. We run our own CRAN mirror, and it’s quite a popular one, and what this should be showing is a live feed of downloads that are being performed on the RStudio CRAN mirror right now. You would see a bubble chart filling up. The main thing to take away here is that Shiny apps don’t just need to be used for exploration. You can also have these kind of dashboard-type things that are designed to let you monitor live data that updates in the background.

So, in terms of what we’re doing and what we’re going to be doing in the future with Shiny, in case you haven’t been keeping up with us: one of the things we have added recently is brushing capabilities for base graphics and ggplot2. In case you have plots that you want to interact with directly, by drawing a selection directly on the plot rather than moving sliders and things like that off to the side, those are now available not just for new JavaScript-based visualizations but even for base graphics and ggplot2, and I’ll have a demo of that in a moment. Another thing that we’ve heard from Shiny users is that they’d like more tools for analyzing their applications’ performance and for fixing bugs, and that’s something

that we’re working on as well. We’ve also added a whole new set of options, for people who are very deep into JavaScript, for enhancing the way Shiny communicates between the browser and the server, with all sorts of hooks for you to customize that.

Okay, so that’s just a taste of Shiny. If this is something that you haven’t used before and are interested in getting started with, shiny.rstudio.com is our source for all things Shiny. I can’t... oh, I put in screenshots, so you can at least see. We’ve got a very detailed tutorial that can get you started from zero, then dozens of articles for diving into specific topics about Shiny, and a gallery with many, many examples, including some of the ones that I showed today.

Okay, so now for something totally different. Sorry, I just blanked for a second. So: Shiny Gadgets. As I was putting together demos last week for this talk, I realized that there’s a particular way of using Shiny that has been a little underutilized, I think, in the past, and we thought it was time to bring it to the fore. I want to explain what a Shiny Gadget is by first explaining what it’s not. I want to contrast Shiny Gadgets with Shiny apps. I took this quote from Roger Peng’s Coursera course, Developing Data Products. He says (and in this case Shiny apps are generally data products): “A data product is the production output from a statistical analysis. Data products automate complex analysis tasks or use technology to expand the utility of a data informed model, algorithm or inference.” For me the key words here are “production output”. You’ve done some analysis, you’ve come to some conclusions, and now you’re building something that can be an artifact for other people to engage with your conclusions, or to do some exploration on their own. But the bottom line is you’ve done your analysis, you’ve done your work, and now here’s an output. That’s not to say that it can’t lead to more work,
but still, this is something where you’ve done your analysis and now you have this object. Shiny Gadgets, in contrast, are interactive tools designed to be invoked straight from your R scripts or from the console, to assist with analysis tasks that are inconvenient to tackle with code alone. So the difference is that a Shiny app is like a report or a dashboard, and a Shiny Gadget is like a tool to help you actually do your analysis.

So what do I mean by that? Okay, I’ve got a ggplot here. Oh, sorry, this is meant to be hidden. This is just a regular ggplot; can you guys see that okay? It’s showing the airquality data set, and I’m plotting Ozone versus Temp. There are a couple of things that I’d like to be able to do here. plot1 is my ggplot object, just like we normally create them, and I’ve made a function called ggzoom. I’m going to run this, and unlike most Shiny apps it didn’t pop up in a separate browser window; it’s showing right in my Viewer pane here in RStudio, and we just brush the plot. I can make a selection here and choose to zoom in on a particular part, and it zooms in. I can hit Unzoom to go back to the original view. That’s not the slickest zooming you’ve ever seen in an interactive plot, but the cool thing here is that almost any ggplot I can come up with, I can pass to that ggzoom function and it will work that way. In this particular example it’s a very small data set, so it doesn’t really matter, but you can imagine much denser data sets where zooming in really helps you deal with overplotting issues and things like that.

Another thing we can do is use these tools to identify points. I can hover over some of these points, and you can see at the bottom that it’s now showing me the row number and the associated values for that row. And there’s another thing we can do here.

Let’s look at the diamonds data set. Here I’ve faceted it by cut, and what this ggbrush tool lets us do is examine some subset of this data. I can select, say, some of these “Very Good” ones, and it says I have 72 observations selected. If I hit Done, this actually gets returned to me as a data frame, right in my R console. I forgot to save it to a variable, so I can do that now with .Last.value, and now I can do whatever I can do with a data frame. So that’s one of the main differences, other than the fact that it’s showing up in RStudio and it’s designed to be invoked as a function: these gadgets can actually return values to the caller. So you can use them straight within your R scripts, or from the console interactively, as you’re trying to make sense of your data.

Going back to the iris data set, this example (I can make it bigger) shows three separate ggplots of the same data. I can make a selection in one of these plots and it will link to the other plots. I can make a selection in any of these plots and it shows in the others, including one-dimensional plots: I can just brush along the x-axis of a histogram and it’ll similarly work. Again, this is with any three ggplots you can come up with. Or we can do just two of them instead; if I run that same code, it shows up with two instead of three. Okay, I guess that’s just another example of the same thing.

One more example with ggplot: I’ve taken the cars data set and added some outliers, because there weren’t any. Now if I’m viewing this data, you can see that there are some cars here, some observations, that have a braking distance that’s negative, so clearly that’s wrong. If I select them, I can remove them. And this one, for some reason, I’ve decided is an outlier as well, so I can remove that. I can undo and reset, and again, when I’m happy with my results, I can hit Done.
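A brushing gadget that returns the selected rows to its caller might look roughly like the sketch below. This is not the ggbrush code from the demo. The function name `pick_points` is hypothetical, it draws a base-graphics scatterplot instead of accepting a ggplot, and it assumes a Shiny version that provides `runGadget()` and `brushedPoints()`. The Shiny calls are namespace-qualified, so defining the function does not require Shiny to be loaded.

```r
# Hypothetical gadget: brush points on a scatterplot, press Done,
# and get the brushed rows back as a data frame.
pick_points <- function(data, xvar, yvar) {
  ui <- shiny::fluidPage(
    shiny::plotOutput("plot", brush = "brush"),
    shiny::actionButton("done", "Done")
  )

  server <- function(input, output, session) {
    output$plot <- shiny::renderPlot({
      plot(data[[xvar]], data[[yvar]], xlab = xvar, ylab = yvar)
    })

    # When Done is clicked, compute the return value and stop the app.
    shiny::observeEvent(input$done, {
      selected <- shiny::brushedPoints(data, input$brush,
                                       xvar = xvar, yvar = yvar)
      shiny::stopApp(selected)  # becomes the return value of runGadget()
    })
  }

  # Runs in the RStudio Viewer pane when available, otherwise in a browser.
  shiny::runGadget(ui, server)
}

if (interactive()) {
  heavy_cars <- pick_points(mtcars, "wt", "mpg")
  str(heavy_cars)
}
```

Because the gadget is an ordinary function, the data frame and the column names arrive as arguments and the selection leaves as a return value, which is the difference from a deployed app that this part of the talk is pointing at.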
It comes back again as a data frame that I can then continue to use. These examples used ggplot just because I thought it would make a good demo, and also because I don’t have any particularly useful domain experience in statistics, so I limit myself to these kind of toy examples. But the point is that this mechanism is generally useful and can be used for all sorts of different purposes.

So, just to talk about some of the differences between Shiny apps and Shiny Gadgets. As I said before, apps represent the output of an analysis, while gadgets are used as you’re performing the analysis. With a Shiny app, you’re usually building it for end users, whether that’s collaborators or the general public or stakeholders in your business, whereas the audience for Shiny Gadgets is people actually using R interactively or writing R scripts. Shiny apps are deployed on servers, and Shiny Gadgets are invoked directly from R as functions. The way you write these is a little different as well. When you write a Shiny app, you do it in ui.R and server.R files; with Shiny Gadgets, you define them inline in a function, maybe in a package or something like that. For Shiny apps, it’s really just user input and external data that drive the outputs. In a Shiny Gadget you’ve got that plus function arguments, so whoever is invoking you can pass in data, or even pass in a callback function, or whatever you want, and that’s something that’s very difficult to do in a regular Shiny app. Shiny apps don’t return values (what would that even mean when it’s running on a server somewhere?), and a Shiny Gadget, because you’re invoking it as a function, can return values that you can then go do other things with. And this is what it looks like to write

one of these Shiny Gadgets. This is stuff that’s very new, so we don’t have it documented on our website or anything, but we will. If you have done some Shiny before, this probably looks familiar. We’ve got a ui variable where we’re declaring what the application is going to look like, we’ve got a server function that determines the behavior, what plots get drawn and where, and then we have a shinyApp call at the bottom to actually create the application. The things that are different from a Shiny app: this dialog page is customized to work well within that Viewer pane in RStudio. When things render in a small space like that, you tend to want to fill up the area rather than have a scrolling page, and the dialog page is a function I’ve written to help me do that; it will hopefully be in the next release of Shiny. Then you have to have this little chunk, an observeEvent on input$done, that tells R what to do when the Done button is clicked. Often we will calculate some kind of return value that makes sense for that particular gadget, and then calling stopApp is what causes the application to stop running and return the value to the caller. So anything you can return from an R function can be returned from a Shiny Gadget as well. Finally, when you make your Shiny app object, this little incantation at the bottom tells RStudio to run this gadget in the RStudio Viewer. If your app doesn’t make sense to run in the viewer, then you can leave this line out. This also works fine from bare R: if you’re not an RStudio fan it still works, it just launches in a browser instead of in a dedicated window.

I’ve touched on some of these use cases already with the demos that I’ve shown, but these are just the things that Winston, Hadley, and I thought of in a ten-minute conversation about what we could use gadgets for. So, scalable viewing: I mean, I’ve
definitely heard from a lot of users that we’re often working with data that we need to examine from a very high level to look for patterns, but then once we see an interesting area we want to zoom in and piece apart the individual observations. You can definitely imagine gadgets, for all different kinds of data in R, that would be really helpful for letting you do that at different levels. Subsetting, whether it’s for exclusion or just because you’re interested in a particular part of your data. Viewing high-dimensional data; linked brushing is always popular. And then these kind of iterative tasks that are just really frustrating to do on your own. For example, if you’re trying to tweak some parameter in a model to make sure you’re... I don’t know anything about statistics. If you’re tweaking a model for some reason and you want to iteratively look at the output, that can be a little bit annoying to do in a code-based environment. If there’s a Shiny Gadget for that particular model, then you can visualize it and interactively tweak it, and when you’re done you get parameters back. And then code generation; this requires a little bit more explanation, so I’ll show you.

On Saturday, the Obama administration released the data dump for this College Scorecard data set. Has anyone seen this over the weekend? Okay. One of the things the Obama administration has been talking about for a while is that they want to help college students make better decisions about what school to go to, not just based on some arbitrary ranking of how good a school is, but on things like: what’s the likelihood of students who graduate from this university or college actually paying back their student loans? Did it actually result in them having salaries that are higher than the average high school graduate’s? They were going to come out with a ranking, and after looking at the data I
think they decided, you know what, it’s really complicated, so here’s the data. They also made a web app that lets you explore this on your own criteria. I heard about this last night and decided it would make an interesting gadget, so let’s cross our fingers. The interesting thing about this data is that it’s several hundred megabytes, it covers the years 1996 through 2013, and it comes in a CSV file with 1,700 columns. So if you’re looking for some particular piece of

information, that’s a little bit of a needle in a haystack. A lot of public data sets we deal with are like this. If you look at census data, you have to take a course just to figure out how to use it; there’s just so much that’s part of it. For the College Scorecard I wrote a read function where you pass it the years that you want and the columns that you’re interested in. But how do you know what columns you’re interested in? So I wrote a gadget for that, for interactive reading. It’s showing you all the columns that are available; there are 168 pages of them. The column ID is on the left, which is what the column will be named after you import it, and on the right is a description. This is searchable, so I can say (this is a little zoomed in) give me the institution name, I want the latitude and longitude, and let’s take admission rate. Let’s say 2013 is fine; I can select a range here if I want, but let’s keep it to one year. Then if I want to, I can filter by school, so you start typing. I’m going to leave this blank, so I’m going to get all the schools. Once I’m happy with that I can hit Read Records. It’s going to take some time to parse the data, and then here’s my data set. I’ve got the data set here and I can save that, and it also shows me the code that it executed in order to get that data set, so I can copy that and put it in my script and away we go.

Now, I did this this morning, so it’s very simple, but you could definitely imagine that when I click on one of these rows it gives me a preview of the actual data, or if it’s a set of factors it will tell me what the levels are, or tell me how sparse the data is in that column for the years that I’ve selected. You can do any number of things. To do this without a gadget like this, you’re diving through PDFs where they describe what the columns are about, and then you’re copying and pasting column names.
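The code-generation idea, where the gadget hands back the call it ran so it can be pasted into a script, can be sketched in base R. Everything named here is an assumption for illustration: `read_scorecard` stands in for the speaker’s read function (whose real name and signature are not given), and the column IDs are only examples.

```r
# Build (but do not evaluate) the call that would read the chosen data.
# `read_scorecard` is a hypothetical stand-in for the real read function.
build_read_call <- function(years, columns) {
  call("read_scorecard", years = years, columns = columns)
}

# Turn a call object into one line of text that can be pasted into a script.
call_to_code <- function(cl) {
  paste(deparse(cl, width.cutoff = 500L), collapse = " ")
}

# What a gadget might do when the user presses its read button:
# remember the call, show it as text, and (separately) evaluate it.
cl <- build_read_call(2013L, c("INSTNM", "LATITUDE", "LONGITUDE"))
code <- call_to_code(cl)
cat(code, "\n")
```

Building a call object first and deparsing it afterwards, rather than pasting strings together, is one way to make sure the text shown to the user is exactly the call that was evaluated.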
If it’s not right, then I have to run it again. And again, this was just for one data set, just for the College Scorecard, and yet given the amount of time it took me to write it and how much time it will save for anyone who wants to use the College Scorecard data set, I think it’s well worth the investment in time. It’s not on GitHub yet; hopefully sometime later this week I’ll publish that.

Finally, I have one last demo. This is the mpg (miles per gallon) data set that comes with ggplot2. Let’s take a look at it. It basically has some cars that were reviewed in 1999, by I think Motor Trend, and then reviewed again in 2008 with their contemporary equivalents. It shows some attributes of the car and then the city and highway mileage; I think this is US-based. So, when I was first learning dplyr... how many people here have used dplyr? Okay, some of you. dplyr is really an amazing tool, and you can use it to really easily subset and manipulate your data. In this case, let’s say I want to grab the displacement, the year, and the highway mileage. Now I can run that and it shows me... sorry. So it looks like that, and I can, let’s say, use a ggvis visualization to show the displacement versus the highway mileage. As I was learning dplyr, I found it kind of difficult to visualize what was happening when I made these various calls. select is one; I can use mutate to factorize the year; and over and over again I’m selecting the code that I just wrote and running it, because I want to see what happened there. Or if something weird happens later in my code, then I have to go back and select some earlier chunk of code and run it to remind myself what my data looks like. So I wrote a little tool to

help with this. This is a pipeline of commands, where in the center I’m writing all the individual elements of my dplyr pipeline. I’m going to start out with my data, so mpg, and you can see on the right that’s the data that I’m looking at. Now I’ll move to the next stage, and when I do, the output of the previous stage appears on my left. Since this stage is blank, it’s just passing the data through, so on the right I have the same thing. But now if I want to select, I can easily see right in front of me what columns I have to choose from. I’m going to select (what did I say?) displacement, year, and highway, and I get immediate feedback that yes, indeed, that did what I wanted. I can remind myself what a previous stage looked like by just mousing over it, and as I move back and forth it shows me the difference. In this case, what did I want to do? I wanted to mutate the year to make it a factor; that has no visual difference. Then I’m going to go ahead and do my ggvis again, displacement versus highway. The points are fine, but I actually want to add some trend lines, so I’ll do layer_smooths, and then let’s add the points back. And since we can, let’s color by the year and color the trend line as well: stroke equals year. So, I can’t stroke the trend line because I haven’t grouped my data. Okay, and now we see that the trend lines are separated. Let’s say I want to adjust the smoothness of that trend line. Okay, so I can just play with this value until I see something that looks sensible. And now that I’m happy with my overall visualization, I can hit Done, and when I go back to RStudio, where my cursor was, it has now inserted the code that I wrote in that pipeline.

This is barely working; this is very, very much demoware. But I thought it’d be useful to get you guys thinking about this, so that as you think about your own packages and your own data you can think about cases where this might be useful.
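The core idea behind the pipeline tool, keeping every intermediate result so that the input and output of any stage can be compared, can be sketched without any user interface. The helper `run_stages` is hypothetical and not part of dplyr or of the demo; the example stages assume dplyr and ggplot2 (for the mpg data) are installed and are skipped otherwise.

```r
# Run a list of one-argument stage functions in order and keep every
# intermediate result. Element 1 is the input data, element i + 1 is the
# output of stage i. (Hypothetical helper, not from the talk.)
run_stages <- function(data, stages) {
  results <- vector("list", length(stages) + 1L)
  results[[1L]] <- data
  for (i in seq_along(stages)) {
    results[[i + 1L]] <- stages[[i]](results[[i]])
  }
  results
}

if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  stages <- list(
    function(d) dplyr::select(d, displ, year, hwy),    # keep three columns
    function(d) dplyr::mutate(d, year = factor(year))  # year becomes a factor
  )
  results <- run_stages(ggplot2::mpg, stages)
  names(results[[2L]])       # columns available after the select stage
  class(results[[3L]]$year)  # type of year after the mutate stage
}
```

With the intermediate results in a list, looking back at an earlier stage is just indexing, which is the step the speaker describes doing by hand by re-running earlier chunks of code.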
Then hopefully, in a few weeks or whatever, when we can actually work with the RStudio IDE team and get this all hooked up and working really nicely, that will set the stage for this to be a very useful technique.

I just wanted to give one quick answer to an objection that I’ve heard a couple of times about Shiny in particular, and that I know we’re going to hear about gadgets, which is: why should I care? Especially people who have been around for a very long time say, we had linked brushing and interactive graphics in the ’70s with Lisp machines, so why is this stuff considered cool now? There are new versions of these custom tools, too: there’s Tableau for simpler kinds of manipulation, but there’s also GGobi, and there are always new custom tools coming out for doing interactive graphics. So the reasons we think using Shiny to do interactive graphics is interesting, compared to these custom tools that have existed and continue to exist, are these. Number one, tight integration with R. Any kind of analysis that you can do with R, any kind of statistical methods or machine learning or whatever that you can do from R, you can use from your Shiny app. Second, installation and deployment has traditionally been really problematic for a lot of these tools. They’re often distributed either as an open-source tool that has a bunch of requirements you have to go hunt down, or as something that needs to be installed on every user’s machine. And really the most important thing, I think, is the third point: new utilities can be built by regular R users. You don’t have to be an expert like you would to build a custom tool like Tableau or GGobi. We designed Shiny for anyone to be able to build interactive tools of their own. In fact, not only is it easy to learn, but the
actual cost of writing these, in terms of lines of code and amount of time, is so dramatically lower than what it is to write custom utilities that now we think it’s worthwhile to make a custom utility for each new data type. So if you’re the author of a package that does some kind of time series analysis or something like that, we think it’s worth your while

to think about what kind of interactive things your users want to do just for your data type, and to create gadgets and apps around that data type. In fact, we think it’s cheap enough to even write throwaway utility apps for a single project. That College Scorecard gadget is one example, but you might have a project where you know you’re going to be working with this client and this data for six months, and it’s worth it to write tools just to help you work with the specific data that they have. So that’s all I have. Do we have time for maybe a question?

[Audience question, not picked up in the recording.]

Yeah, they’re kind of scattered around. I’ll have the slides available for download, I assume, and the links will be in the slides.

[Audience comment, mostly unintelligible in the recording: thanks for the exciting tools, followed by a remark about results “going back to the script”.]

Absolutely, and that’s something where we really need to think about how to do this right. There’s a reproducibility problem that gets introduced any time you talk about interactive tools, and I think in order to address that we really need to be careful about what we let users do versus what we enable them to do themselves. That had honestly not occurred to me when I made the ggplot demos, but it had occurred to me by the time I did the College Scorecard ones. So I think part of the work that we at RStudio have to do, before we’re ready to roll this out to everybody on our blog and tell them to go start making gadgets, is to define the best practices about what gets returned from your Done button. Are you returning a code snippet, are you returning parameters, or are you returning data? Or maybe you give the option to the user, or maybe you have some way to return all three: maybe you’re returning the data, but there’s an attribute on it for the code and an attribute on it for the parameters. Those are all things that we
have to figure out. I think we have the right people to figure it out, so we’ll get it to a nice place, I’m sure. One more question?

Shiny versus slidify? Slidify is a package for making presentations from R. I think a more direct comparison would be between slidify and ioslides, which is a package that comes with, or rather a theme that comes with, R Markdown. Honestly, I use Keynote for most of my presentations, because they don’t actually involve much statistical stuff. But I think part of the really great thing about being in the R community right at this point in time is that there’s just so much energy around combining R, and what’s good about R, with the web. One thing that I didn’t have time to mention today was htmlwidgets, which is a really great package that we’ve developed in collaboration with the author of slidify. So there are really a lot of different approaches that people are pursuing to combining R with interesting web-based and interactive stuff, and to me that’s great. We’re not too proud to steal great ideas wherever we see them or, because everything is open source, to throw in our efforts and help improve those other packages as well.

All right, thank you very much. Oh, sorry, one more hand? Okay, I’ll be available at the booth for the next couple of days. We’re right near the entrance; look for the RStudio table. So thank you very much.