Teaching Statistics with JMP is as Easy as 1, 2, 3 (USCOTS 2015)

Well, we might as well get started. I'll go over some basic stuff in the beginning for anyone who comes in late. But I want to start by introducing myself and, well, first, thanks for taking time out of your Friday night to be here. I was actually a little worried: 4:30 on a Friday at a conference, when all the people I know here were like, "Sorry, I'm going to be drinking beer on my porch." So thank you for being here.

My name is Julian. I'm an academic ambassador with JMP. I also teach statistics at the University of California, San Diego, an intro and an advanced class. How many people were in the session yesterday, the "1, 2, 3" one? Several of you, great. And the morning session with me? A couple of you as well, great. So you've heard a little bit of the basic stuff I wanted to talk about, but today is really a getting-started workshop. You've seen what JMP can do, or you've heard things about it, and you want to see how it works, how it feels.

Today was meant to be kind of a hands-on session. It doesn't have to be. I'm recording it, so I'll point you to the link later on; if you want to have your hands-on session later while watching it, that's fine too. But if you would like to do it now, go to jmp.com. There's a "Try JMP" link at the top, and if you scroll down you can click "Start your trial now." It's only a couple of little questions on a form, and then you can download. The download here is not terribly fast, so you may not get JMP until halfway through, but if you'd like to try it out, certainly feel free. If you have an older version of JMP, that's fine. JMP 12 is what's out now, and I'll certainly point out some new features, but don't feel like you have to have the newest to see what it's about.

All right, let me close out of here. Can you see the projector in the back? Not too small? All right. So let me tell you a little bit about the scope of what I want to cover today, really where I want us to go. I'm going to
certainly talk about JMP basics: how to get JMP. I've actually already talked about that; that's just getting the demo. At the end of this we also have some cards for JMP Student Edition. I'll talk about the distinguishing features of Student Edition and regular JMP in a second. We have a card so you can download it and have, I think it's a two-year trial. Curt, two years? Yeah, so it's two years for you to use it.

I'll talk about some basic use today. My background is actually as an experimental psychologist, so I think of things in terms of minds, and I'm going to talk about the mind of JMP, which is really just how the developers of JMP think about data. I'm going to talk about working with data, basic stuff: getting data in and how to import, certainly important because you almost certainly don't have files in JMP format, and also how to make a new table and some basic data manipulation.

Then I'm going to talk about some essential tools. These are features that are not the most advanced analyses you can do in JMP; they're the things that make your everyday analysis easier. Things like global and local data filters, for cutting up your data and looking at pieces of it; things like By variables, for looking at separations across variables in the same output; broadcasting commands; and column Recode. I'll talk about all of these.

Then we're going to get to the essential platforms of JMP. There's a lot that JMP can do; it's very, very deep as a product. As far as an intro or even an intermediate stats class goes, there's only a small subset of the platforms in JMP you'll actually need, and these are the ones that are in Student Edition as well. So, Graph Builder, really the basic graphing platform in JMP. There's Tabulate, for tabulation; sometimes we just need a table out of our data, not a graph, not even an analysis, we just need crosstabs, and JMP has a great facility for that. Then I'll talk about Distribution and
Fit Y by X. These are big platforms in JMP, really domains of inquiry. We'll talk about those two, and then Fit Model and Multivariate, a little more advanced but certainly something you might come across in an intermediate course.

And then finally, and actually I'm going to start with this, there are the resources we have. When you teach statistics, you don't want to teach software. At least I don't. I mean, I enjoy teaching JMP, I love JMP, but I want to teach statistics. I want to teach reasoning. I want my students to leave knowing something about the concepts. That's tough sometimes when you're teaching a new piece of software, so what we do on the academic team is try to make that a little simpler, and we have a number of different resources. The one I'm going to tell you about first is the Learning Library, really a wealth of resources to just drop into your course: guides and videos. I'll talk about some concept discovery modules; you saw some of these yesterday, and they are really great ways of teaching difficult concepts, things like sampling distributions, randomization, permutation. Let me also mention the webinars. We have a lot of webinars that we do live and have recorded, so you can point your students there or watch them yourself, everything from basic to advanced.

This is a great place to start. I want to actually look at jmp.com/learn, because after today you may want to learn more, and this isn't your only opportunity. So let me go over to this (take out that starting parenthesis). Now, the Learning Library is interesting: it's built around a structure of types of analyses. So if we talk about a domain, let's say basic inference for proportions and means, and we go to that section, it's not going to have every type of analysis JMP does in this area, but it's going to have the big ones. For instance, let's say you're teaching paired t-tests and confidence intervals, and you don't want to teach the software, you just want to teach the concepts. You're going to mention that it's really a one-sample test; you're going to talk about the standard deviation of the difference scores, all those things. You have your students click on this one-page guide, and it'll pull up the PDF. These are all downloadable (you can download them as a set, actually), and it points them to sample data and step-by-step instructions for how to do it in JMP. So you don't have to teach this part. You just teach them the concepts, and then you can assign the how-to-do-it-in-JMP part as homework. That's a little assignment.

If you have students who are not particularly interested in reading, and I have some of mine who are not, you can have them watch a video. These are all fairly short, two to five minutes, really meant to be a quick-and-dirty approach: how to do it in JMP, and a little bit about what the output means. They are certainly not teaching the concept. That's your job; you want to teach that the way you want to teach it. These are meant to just cover JMP, so it's really a set of resources that you can attach to any curriculum.

And the benefit here: I don't know how many of you use textbooks in class, several of you probably. You probably don't want to choose a textbook based on what software is in the textbook. At least I don't. I choose a textbook on how they teach statistics: do they cover sums of squares the way I like? So the nice thing about these is you can choose a textbook more or less indifferent to the software it focuses on. All of that is at jmp.com/learn.

We'll come back to the concept discovery modules and some of these other pieces in just a minute, because what I want to do now is give you the basic overview of JMP and why it works the way it does, and really why it's a product of SAS. That's actually a question we get a lot. How many people have never seen JMP before? For
science. Okay, so, never seen anything. That's fantastic. So let me tell you a little bit about JMP. We've actually been in JMP this whole time. This is a journal; you can build these in JMP as a way to teach classes or to teach things. I like them for outlines. But let's talk about why JMP is what it is. I'm going to pull up a data set, and first I just want to talk about the structure of JMP, the anatomy, I would say, before we talk about the way it thinks.

It's going to look pretty familiar. For those of you who have never seen it before, this is going to look similar to things you've seen. It's a spreadsheet, and we have rows and columns. The rows in this data set represent tables at a restaurant, and the columns represent different attributes of each table: the bill amount, the tip amount, whether they used a credit card or not. Typically, in most statistics software, your attributes are across columns and rows represent sampling units. The same thing is true in JMP.

It's a little special, though. There are pieces of JMP that you probably haven't seen before if you've never seen the software, things like the columns list here. These are the same columns that are across the top; if I select them here, it selects them across the top. But we have this list because there are some attributes of columns we always want to bring attention to, and they're the icons right here. Those are the modeling types of the variables, really the structure of the data that we're measuring. The way JMP does it, this little blue triangle means the column is set as continuous, so that's interval or ratio scale data, really numeric data. The little red histogram is nominal data, so data that's just categories. And we have ordinal as a modeling type as well, so if you have ordered categories you can set that. Now, this is special because, unlike a lot of statistical software, JMP is going to pay attention to the modeling type. If you think about it, the type of
data you have circumscribes the set of possible analyses you can run. So JMP, in its interface (and I'll come back to this in the mind of JMP), is smart about what it offers you based on the type of data you have. I think that's unique; not a lot of software does that. But it's good, because you don't have to have menus full of different analyses. JMP knows what to offer you based on the type of data you provide. All right, so that's the basic structure.

Now let's talk about the mind. Really there are two things you need to know about how JMP works, and then I think you'll be able to do pretty much everything in JMP. The first is what I just mentioned: modeling type matters. That is, whether the modeling type is categorical (nominal or ordinal) or quantitative (continuous) determines the types of things JMP shows you. The second thing you need to know is that JMP's interface progresses forward; that is, once you produce output, you dig deeper. When you produce output in JMP, it's not in a log, it's not in a text window. It's going to let you continue to progress.

Let me produce some output, and then I'll talk about why JMP works this way. I'm going to go to a big platform called Distribution. This is where you do all univariate, or one-variable, analyses. JMP asks me in this launch window what columns I want to use, so I'll take a couple: tip amount and credit card. Let me just drag tip amount in; I can click credit card and click it into Y. There are different ways to select columns. I'm going to click OK and we'll get some output.

It's pretty basic right now, but I want you to pay attention to a couple of things that may not be obvious when you first see it. First, on the left I have a proper histogram. I mean, this is a histogram of the data: I can change the scale, I can change the bins, but that's a histogram. On the right I only have a frequency distribution plot; it just displays the frequencies of the data.
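The behavior described here, numeric summaries for a continuous column and frequency counts for a nominal one, can be sketched outside JMP in a few lines of Python. This is only an illustration of the idea, not how JMP is implemented: the function name `summarize`, the modeling-type strings, and the sample tip and credit-card values are all assumptions made up for the example.

```python
from collections import Counter
from statistics import mean, median, stdev


def summarize(values, modeling_type):
    """Pick a summary that suits the column's modeling type (hypothetical helper)."""
    if modeling_type == "continuous":
        # Quantitative data: report numeric summary statistics.
        return {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "std_dev": stdev(values),
        }
    if modeling_type in ("nominal", "ordinal"):
        # Categorical data: report a count and a share for each level.
        total = len(values)
        return {
            level: {"count": count, "share": count / total}
            for level, count in Counter(values).items()
        }
    raise ValueError(f"unknown modeling type: {modeling_type!r}")


# Made-up values standing in for two columns of a restaurant tips table.
tip_amount = [2.0, 3.0, 4.0, 5.0, 16.0]
credit_card = ["Yes", "No", "Yes", "Yes", "No"]

tip_summary = summarize(tip_amount, "continuous")
card_summary = summarize(credit_card, "nominal")
print(tip_summary)
print(card_summary)
```

The same numbers coded as `"nominal"` would come back as counts per level rather than a mean, which is the reason the modeling type has to be set deliberately.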
JMP knew to make those two different plots because of the data type: one was set as nominal, on the right, and one was set as continuous, on the left. That affects the rest of the output too. Quantiles and summary statistics, that's what I get on the left, but on the right I got frequencies. JMP knows the difference; it knows what to do for these columns.

We'll come back to this, but there's something special about JMP, which is that everything is linked. I can click on the yeses and nos and see on the left where those tables are. You can even see in the background, in the data table itself, that those observations are connected. I can even take this weird one: there's a table where somebody tipped 16 dollars, maybe kind of strange. I can right-click and give it a little marker, because maybe we want to come back and look at it later. I can even exclude it right there if I want.

Principally, though, the main point I want to get across here is that the output depends on the type of data we have. What about that second one, the progressive interface? Let me expand this. The context-dependent output is one piece of this. The second piece is these red triangles, and you'll see them all over JMP. As you're learning JMP, click every red triangle. Get to know what's in every one of them, and know that they're going to change depending on the type of data you have. So this red triangle right here, for tip amount: look at the options I have. Things like testing a mean: I could test, in the one-sample case, whether the mean is different from something, like last week's. Or maybe I want a normal quantile plot; that makes sense in this context. Let me turn that on and see what happens. It just pops out right in the same window. So it's an interface. I'm not in a log-window kind of format here; I'm in a little interface to my data. I'll turn it back off.

Now, on the right-hand side, credit card: that's a nominal column. It's a different context. We're still univariate, we're in the Distribution
platform, so look at the options here. JMP knows to offer me different things. It's not offering me a normal quantile plot; it's offering me things like a mosaic plot, representing the shares, or testing proportions (it's called Test Probabilities here), a one-sample chi-square. So JMP is contextualizing not only the output but also the options it shows us to dig deeper. Every one of these red triangles does different things. This one down here lets me customize the summary statistics.

Once I produce output, say I test a mean here, so I go into a one-sample t-test. Let's say under the null hypothesis we've always had tips that are around four dollars, and we want to test whether this week they were different. That output gets produced in line. This is something kind of special about JMP: every statistic comes with a graphic. This is the distribution under the null hypothesis, the sampling distribution, with my observed sample mean and the extremes shaded, showing you the two-tailed p-value. And notice what happened: I got another red triangle, so I get another chance to dig deeper. I could do things like a p-value animation, where I can see what would happen if I specified a different hypothesis, or what if my sample size changed. Let's say I make it smaller: the distribution gets even wider, so a higher p-value, because I had a smaller sample size. So these red triangles let you dig deeper in a way that makes sense, with power animations or things like this. Always click a red triangle to see what's underneath.

I want to mention one more thing. At the very top, and you'll see this on most platforms, there's usually a red triangle that applies to the whole platform. These have options that apply to everything you're looking at. There's one section we'll come back to, scripts, so you can do scripting in JMP if you like saving scripts for further analyses. But there's an option here
in Distribution I think you'll probably want, which is called stacking. Let me ask this question before I do this: how many people like their histograms vertical? Very few of you, I think, are raising your hands. I think it's really neat, and I'll tell you why. If you have lots of output, say we're looking at every column at once, our screens are wider than they are tall, so it's actually nice; you can see a lot. But when we're teaching, this is an extra step for a student. Even though, I can tell you as a cognitive psychologist, this is more natural (higher means higher on the scale, whereas if we rotate them, more means farther to the right, which is a weird cognitive translation), we're so used to the other way. So I wanted to point out this option, called Stack. When you stack, everything gets rotated. That's probably more natural for a lot of you.

So if you like that better, if you hate the vertical, and I know some of you do, let's set a preference for it so that today we'll never see it as vertical. Let me show you where this is, because there's something about JMP I want to point out: these are little interfaces, so set JMP up to be the way you want it. I think this is so amazing. I have mine set up to do specific things, and what's fantastic is JMP works the way you think, and you can set this. So let's see how we do this. I'm going to go to Platforms, and under the Distribution platform there's a whole settings section here; you can turn on or off whatever you want. I'm going to check Stack, and I want you to see what happens. For the rest of today, or until I change it, when we launch Distribution and look at some columns, we're going to get it that way. It's the way that's more natural, I think, for most of you. Everything still works the same way, but we've set it to work the way we think. And if you're teaching and you like this better, set your preferences and then send your students your preferences file, and then their JMP works the way you think they should think. JMP doesn't force you into a style, which I think is really unique. Since these are little interfaces, you want them to work the way you think.

Those are the two things you need to know about JMP. You're experts now, right? Modeling type matters: when you have data that's set a certain way, that determines what JMP shows you. And the interface is progressive: it lets you dig deeper as you produce more.

Now, there's one downside to this, but let me back up and say why it works this way. The reason is that JMP was born about 25 years ago, around the time the Macintosh was really coming into prevalence. Older statistical software, legacy software, was born when computing was a server-and-client relationship. You had a computer that didn't have much horsepower, so you sent all your commands out to a server and the results bounced back to you in a log. That was the way statistics worked; that's the way it had to work. But JMP was born in 1989. I think the date was September 35th or something: they wanted it to come out in, I guess, quarter three, and they didn't quite make it, so instead they extended September. Fun JMP lore. Do you know what JMP stands for? John's Macintosh Project. That's part of the history. John Sall, one of the co-founders of SAS, decided that because the Mac had a mouse, this interactive interface to computing, why don't we do statistics that way? Why can't we make statistics an interface to data, where we can click stuff, where we can drag, where we can interact with and touch the data in a way that lets us connect to it at a deeper level? Why do we have this log and server-client relationship putting distance between us and our data? Why can't it be fast and interactive? So he
made JMP, and that's what it's been since. They didn't really have a name for it, so it was just John's Mac program, or John's Mac project. To this day it still honors John, who is still the chief developer after 25 years. He still programs in C++; I think Fit Model is his baby, which we'll come and talk about later. So that's the history of JMP, and that's why it works the way it does. It's built for a modern computing environment, so it works in a slightly different way: it has this progressive interface, and it can pay attention to things in your data set.

There's one downside to this. If you have a menu that has every analysis you can run, you scroll and scroll, you get to general linear model, you scroll down and you find the particular variant of each model. That's great; you can find analyses quickly. In JMP the menus aren't that long. There's Distribution, Fit Y by X, Matched Pairs, Tabulate, Fit Model; these top five items cover about 200 different analyses. But maybe you don't know where to look. This is something I want to point out, and it took me a while to find. I've known JMP about ten years, and it took me two years to find this, guys. It's called the Statistics Index. I'm in the Help menu now. Write this down if you're going to learn JMP: Statistics Index, under the Help menu. Here's why it's cool: this is every analysis JMP does, and you can type in a search and find it. So anything you're looking for, let's say Cauchy regression, you can run this example and it'll bring up the platform it lives in and show you the output. The Statistics Index is great because if you're looking for a very particular analysis you can find it and quickly get help. You can even see the script underlying it and all the details about it. That's the Statistics Index.

All right, so that's the mind of JMP, and we saw the anatomy of JMP as well. It's pretty straightforward. I want to talk about getting data in.
I mean, we've seen one data set, and I pulled that from the sample data. If you like the sample data, I'll show you where it is. It's really great: JMP comes with about 440 data sets. It's under Sample Data, in Help. Thank you. And it's nice because they're organized by the types of analyses they're good for. So if you're looking to demonstrate something, let's say Graph Builder, which I'll show you in a second, there are different data sets that are really great for different things. If you wanted to make a line chart, or show Napoleon's march (you all probably know this graphic, right?), that's made in Graph Builder. Or, and we'll come back to this one, this is so cool: this is Napoleon's march animated in JMP. There's his army splitting off, and then it's like, "Oh crap, winter," and now he's going back, and his army rejoins. The nice thing about the sample data index is you can click through; these are all saved scripts that are part of the data tables, and you can demonstrate things with them.

For our purposes, the ones I'm going to use today, if you want to try later while you're watching this recording, you can just go to the sample data directory and pick out the exact data sets. The one I'm using right now is Restaurant Tips.

We're going to come back in a little bit and talk about the teaching scripts; those of you in the earlier sessions probably saw them. Under Teaching Demonstrations there are lots of neat things. These are more the legacy demonstrations, little tiny applets that show certain things. This is one that shows a least-squares line, and you can move the points around and see how it affects the line. So cool. Teaching regression is amazing with JMP because all that interactivity really facilitates the concepts. Under the Interactive Teaching Modules are the more advanced ones, really full teaching modules that show concepts like the distribution of sample means and confidence intervals. You can find all of those under the sample data index.

All right, so that's sample data. How do we get regular data in? Nicely, JMP works like all your other software, like Excel, like Word: it's just File, Open. It's as simple as pointing yourself to the sample data directory, or any directory, and you can just import data. I'm going to go up a level, because I want to show you what it looks like when you're pulling in data that's not in JMP format. Obviously you probably don't have your data in JMP format yet, so let's look at a couple of examples. I just went up to Import Data, one level up from where you land when you first open a file, and let's open a text file. It's Big Class.txt. I want you to see what it's going to do. First, I'm showing you this on a Mac; on the PC it's going to look really similar. The only difference is that these three options, or four options I guess, are going to be right next to the Open button.

I want you to see what the options are. Open as Text, and I'll just do this, though it isn't what you'd want to select, actually opens the text of that file, just plain text output. This is nice sometimes if you want to see what's in the file before you try to import it. And from here you can just click Import as Data, and it'll guess and bring it in, so it's as easy as that. But I'm going to back up, because I want to show you the option I like when I'm bringing data in, and it's the one I always suggest to students too. So we go back to Big Class.txt, and the one I like is Data
Using Preview, because this opens the text import preview window. What this looks like, if I move it down: it makes a guess right away at your end-of-field delimiter (here we have tabs) and your character set. It usually does a pretty good job from the get-go, but occasionally you may have something else that delimits your rows or columns, so you may have commas or extra spaces, and you can set all of that here. You can also pull in subsets. This is nice from a big-data perspective: if you have a data set with maybe 10 million rows and you want to take a probabilistic sample of them, you could have every student take their own random sample of 250 thousand rows and bring it in. There's a probability per line you can set, or a number of lines per file. So this is a nice little feature if you want to bring in pieces. But for us, we're done; we can just hit Next and Import, and JMP will bring this in just fine.

We don't have column header names here, so we'd want to add those in. But look at what JMP did for the column types. Notice that anything that has a character in it got marked as nominal, which makes sense; you normally don't enter numbers as characters, "one, two, three" written out. Anything that has a number it brought in as continuous, which is usually correct. But does anyone have any data where ones and zeros represent males and females? A couple of you, right. So be careful with that, because JMP will think those are numeric. They are numbers, but they represent categories, and JMP will pay attention to that and give you different analyses if they're marked as continuous. "Can you change it?" Exactly, great question. Yep, just click right here and we can set it over.

For instance, a good example is in this Big Class data set. We often show this with age marked as ordinal, and let me show you why. I have it as ordinal, and these are height and weight. Let me go to a
different platform and show you. It's called Fit Y by X. This is a bivariate platform; the way JMP organizes these is univariate for Distribution, bivariate for Fit Y by X. There's a little legend here, and I love this legend. This is the modeling type of the X variable, whether it's continuous or categorical (nominal or ordinal), and this is the modeling type of the Y variable, continuous or categorical. The legend really shows what type of analysis the output will use based on the modeling types of the input variables. So if we take this column, which represents age (12, 13, 14), we have it marked as ordinal. I'll put that in as the X. Now let's take this column, the one that represents weight, and put that in as the Y. You maybe didn't see it, but it now says Oneway. It knows we're in this little square here, and the output is going to be a window that gives me options related to ANOVA or t-tests. In this case, this is where we can run that ANOVA, or compare each level to every other with Student's t-tests for each combination, or things like an unequal variances test. That all makes sense if you think of these numbers as categories; if they're just categories, then the numbers are just labels. Let me run one analysis. I'll do the Means/Anova. Put that to the side. Now let's go back to our data table.
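To make concrete what treating age as categories means for the arithmetic, here is a minimal Python sketch of the one-way ANOVA F statistic: the age values act only as group labels, and the test compares between-group to within-group variation. The function name `one_way_anova_f` and the weights are made up for illustration; they are not the Big Class data and this is not JMP's code.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict mapping group label -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of observations around their own group mean
    for values in groups.values():
        group_mean = mean(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical weights grouped by age; the ages are labels, not numbers.
weight_by_age = {
    12: [80.0, 90.0, 100.0],
    13: [95.0, 105.0, 115.0],
    14: [110.0, 120.0, 130.0],
}

f_stat, df_between, df_within = one_way_anova_f(weight_by_age)
print(f_stat, df_between, df_within)
```

Renaming the keys to "A", "B", and "C" would give the same F statistic, which is the sense in which the numbers are just labels. A regression of weight on age, by contrast, would use the numeric spacing between 12, 13, and 14.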

I'm going to click on this icon and set it to continuous, the way I think a lot of us think about age. Let's go back to Analyze, Fit Y by X. Just so you don't think I'm cheating, I'm going to click Recall; this puts back exactly the same specification I used before. Notice JMP now designates this as Bivariate, and that's because of the modeling types; it pays attention to them. The output may look pretty similar, but look at the options. JMP knows this is now the domain of regression. I can fit a line, or if I want I can fit a sixth-degree polynomial (who knows why anyone would do that), but again, it's all right there.

There's something I'll mention now, which is the pedagogical value of the organization of JMP. The mind of JMP is built around this idea of modeling type circumscribing the set of analyses that make sense, and I think that's nice. When I teach t-tests, comparing means, or Tukey HSD, I'm also teaching the nonparametric equivalents. I'm teaching Wilcoxon, right? Or my new favorite is Steel-Dwass, which is a nonparametric Tukey HSD. What's cool about JMP is, look how close those live: they're one click away in the same platform. I don't know how familiar any of you are with SPSS or Minitab, but those are distant in those menus, and that gives students the sense that they're different. Yes, they're different; analytically they're doing very different things, and their assumption bases are completely different. But they answer a very similar question, a question we would typically ask in the same context. So what I like about JMP is it reinforces this idea that these are all doing the same kind of thing, or at least that they make sense in the same context. The same is true on the right: if I'm fitting a line, this is also where the other things that model relationships between two continuous variables live. Seeing the connection between these things, I think, really helps
students fit it all together. But if you noticed, the output we got on the left-hand side was different from the output on the right, and that was just to do with the modeling type. And remember my point: click every red triangle when you're first learning JMP. Let's see what's under this one. Now that we've fit a line, it's giving us options that make sense. Here's something I would always recommend you tell your students to do (actually, I set this as a default for them): plot the residuals. As soon as they fit a line, they can't help but look at the residual bundle. They get residual by predicted, actual by predicted, residual by row, residual by X, and then a normal Q-Q plot of the residuals. It depends how far you go in an intro class; that may be a little overwhelming, so you may want to hold off on that. But if this is an advanced course and you want them to really diagnose their models, that's a great thing to have them turn on.

All right, so we went pretty far away from the menu item I was talking about, which was importing data. That was importing a text file. Just so you can see, JMP can import lots of different data types: certainly SPSS, Minitab, other data types, and of course text files. It has no problem with Excel spreadsheets. Let me open an Excel spreadsheet here. There's a beautiful interface for this; it's called the Excel Import Wizard. What I love about this is that a lot of us have these messy Excel files: I have twelve rows at the start that describe the file, I have all sorts of nonsense in the header. This lets you say where the data starts. You just put in what column and row, so you don't have to feed it a perfect Excel file; you can just tell it what part to pull in. Also, if you have an Excel file with multiple sheets (we actually do this a lot, so we'll have ten years of data, all with the exact same columns, and I want to bring it together as one table), you can check all the boxes for every
sheet, and it’ll just merge the columns that have the same names, and it’ll actually have an indicator for which sheet it came from. So it’s a great way to bring in data. Let’s import this, and you can see it brings it in right away, no problem.

There’s another feature, and this is too cool not to show, because in intro stats, the first week, I have students do this. It’s called Internet Open. This is probably one of the neatest ways to bring in data. So let’s say, and I know this isn’t a real story, let’s say we’re working with an ornithologist and we’re really interested in talking about state birds. So let me imagine, you know, we’re in class here, and I’m a student, and I want to bring in state birds. So I’m gonna go to the wiki, and you go to this list of state birds, and there’s a table on a website, right? And if you want to bring that into JMP, you could try to copy that. Here it’s gonna be unhappy copying it out of Safari. Let’s take the URL. I’m gonna go over to JMP and do Internet Open. I’m just gonna paste that URL, and when I open it, it’s gonna say, oh, here are the tables I found. Let’s just let all of them get opened. It’s gonna have some that aren’t right, but there it is, the last table: there’s my state birds table. Now this is kind of neat. JMP knows that there are images in this table, and so let’s tell JMP to load those images. What’s that gonna do? We have a data table; how can we work with images? Let me scroll across there, and so we have the images from the wiki.

And let me tell you, there’s nothing like bringing images into a table, and graphing with images, to get students excited. I don’t know why. There’s no reason it should; it’s nothing we ever really do. But there’s something about the mixed media and feeling like you’re connected to your data again, and that’s just something JMP can do. I think that’s amazing. And just being able to do Internet Open, you know, I always have this as a really early assignment: just find data online, and you’re gonna share it with the rest of the class in about 20 minutes. And they just go searching, usually something to do with sports, or they come up with some interesting questions, and they use Internet Open to bring it right into JMP.

Yes, it’s got to be an HTML table, or XML, or set up in the CSS. So it’ll parse most web pages. There are some places that put their data in really weird ways. Yeah, but that would be, I was actually chatting with somebody yesterday about this exact question. There isn’t an OCR scraper. I’ve been able to do something like that through integration with MATLAB. I didn’t even mention this yet, but with JMP you can connect to MATLAB or R to recruit some of their power, and that’s running through an OCR engine. But no, nothing built into JMP for that, which would be great, though, right? To take those old PDF printouts of people’s tables, yeah, and pull them in. So not entirely there, but I mean, that’s a great feature for that quick bring-data-in, get-people-excited. Yeah? Yeah, that’s a new feature in JMP 12. Yes, JMP 12 added it. It’s actually a special type of column, just called the expression column, and that allows you to have images, or actually any type of JMP scripting, built into the table. Yes, that was a new one.

All right, so bringing it in: pretty straightforward. Now I want to tell you about a couple essential features. And where are we now? 5:05. Am I really out of time? No, I started at 4:30. I talk a lot, I’m sorry, but I didn’t want to, like, go through everything, just the basics. Let
me hold off on some essential tools, though. I’ll come back to some of these as I talk about some of the other platforms, because I want to show you some of the essential platforms, the ones you would have to see, or will see, if you’re using this in intro stats. And actually, perhaps the first thing I’ll show you is what I do with my students the very first day. Well, after I get them to install JMP. We actually spend time in class doing this, because I think it’s so important that they don’t sit at home and try to do it, and don’t do it, and then a week goes by and they’re like, “How do I install JMP?” No: get it on the computers the first day and actually open up, sometimes this data set, sometimes other data sets, but just a data set that has something like states, something like counties, something geographic.

And what we do is we go to Graph Builder. Now, Graph Builder is on the Graph menu. It’s not built for analysis, though you can get statistics out of it. It’s built to show data, and it’s built to show data in a really great way that lets you really pore through different columns, drag things, interact with your data. It doesn’t stick you to a particular visualization. I think that’s great, because you have to see the graphic to know what tells the story of the data. Let me give you an example. So with this, what I’m gonna do, I might say, you know, take Region, and I want you to just drag that to the X. And so they drag it to the X and I say, what do you see? Well, they see just a bunch of dots. There’s actually no Y dimension here, so what JMP is doing is just jittering the points in the Y dimension so you can see them. Now, that may seem useless, but there are times when that actually conveys something about the number of observations. You’ll actually learn to love that, I think, that jitter plot right there. But let’s give it a Y dimension. So let’s give it a, well, we normally talk about SAT total, right, not verbal and math. So let’s actually take both these variables, and I want to show you something: in a
platform launch window, which is what this is, you can grab two of them, right-click, and you get a little menu. Let’s combine them, make a sum. So now we have a sum that we can use. It’s a temporary variable; if you like, you can save it to the data table by right-clicking, but for now let’s just use it here. I’ll drag it into Y. Now we still have our jittered points; that shows us a little bit about the spread. But look at the ribbon at the top. These are all the visuals that make sense. Remember, JMP knows what type of data we’re working with, so it’s going to give us things that make sense. It’s not telling us we can do a linear regression here. Instead it says, hey, you can do this thing. This is cool: this is a contour. And so this shows, let me drag the points on, you can actually drag another visual on top, so see what the contour is doing? It’s a folded distribution that’s showing us the spread of data, but it’s doing it as a nice little smooth curve. Let’s turn that off. Here’s one they often use: box plots. That’s not bad; see the median. Yeah, let’s keep that one open.

We have some other stuff in this data set, like Year. Now this is kind of cool. There are some zones here: Group X, Wrap, Group Y, and Overlay. Those are zones that break up what we’re already seeing. So let me take Year, and I’m gonna hover it; I’m not gonna drop it yet. I’m gonna hover over Group X. That’s an okay visual; it’s a little bit noisy for my taste, right? Let’s look at Group Y. That’s noisy and useless; like, perceptually, can you tell at all what’s going on? I can’t. But Wrap, Wrap is pretty neat. So Wrap shows, for each year, the little spread there between the different regions.

Now this is exactly the time when I say to my class: if we wanted to show year-to-year changes in the region, is this a good graph? Think about what that takes perceptually. I’d have to look from here, ’94, right here, to ’97, right here. That’s terrible perceptually; you need to have those right on top of each other. So I say, grab Year again, drop it in Overlay. Ah, now we have it all. I’ll click Done, actually, to make it a little bigger. But now we have these all next to each other, so we actually see within each of the different times. Right, they may say, okay, well, I would rather have some spaces there. So let’s go back. I’ll actually grab Region out and drop it in Group X, so now it actually adds spaces for us. Lots of flexibility with what you can do.

Now here’s actually the one I love. I tell them to start over, and I say, okay, well, often we’re really interested in geography, right? We have SAT scores across the country. Take that State variable, drag it down there to Map Shape. Now we get a map. Take that combined SAT total, drag it in the middle. There we go. So now we actually have the geography. And there’s something about, and I always make this point, you know, the juxtaposition between this table, where they get so little information, I mean, they see the data, but this is a big mess. I like to expand it full screen so they see it’s just a noise of overwhelming, you know, data points. They don’t get meaning from that. And then you click right over to this one and you say, can you make meaning of that? Of course you can. I mean, right away we can see where the highs and lows are, right? That tells a story without much effort. And I think that juxtaposition, the first day of the first week, makes people feel the value of data, that it’s worth their time. And it’s also play. There’s nothing more fun than just playing with these variables. We can take
another variable, like Year. Let’s drop that in the Wrap section. And then my students say to me, “But Julian,” you know, or “Dr. Julian,” they call me, “perceptually, isn’t that bad? I mean, we can’t see how California changes over time.” I say, yeah, that’s true. Let’s actually play with this. Right-click, Levels, and we’ll do one at a time. Now we can click through; there’s a little interactive window there. This is really cool anytime you have lots of levels of the variable.

Or, let me go back. I showed this yesterday; this is a new thing in JMP 12. Right-click, or click on the red triangle, and there’s the option to make this graph into a data table. Okay, what’s that going to do? Let’s run this. We get a data table, a little tiny cluster. Let’s drag the row heights out, and there we go. We made a data table of the different years, and we can scroll through. Really cool. And so we use this all the time. It seems like kind of a toy feature, but I do a lot of research consulting, and, you know, friends have 150 subjects in their data set, and it’s not a map for each person, but it’s some kind of regression, something, you know, individual. And for every individual subject they can actually put out a plot of some relationship, something within subject. And so showing the data becomes a lot easier. That’s just an option in Graph Builder.

And Graph Builder’s worth playing with. There’s actually an iPad app as well, so if you just have your students get their iPads out the first day and put some data in Dropbox, they can play with it too. The one on the desktop is so much better, though. Being able to click and drag and get at the data, I think it’s really, really fun. And you know, you get all the types of visuals that make sense. Drag in two quantitative variables and you have different options. So just like most things in JMP, it’s very sensitive to your data.

All right, now let me go back. There’s one more thing I want to show about Graph Builder, and this is just neat if you ever have data like this, data with latitude
and longitude, and this was new in JMP 11. These are San Francisco crime data, and this is only for, I think, like a week, so 9,000-something incidents. There’s a newer version of this data set; it has my car break-in in it, this was like last December, which was kind of fun. But anyway, let’s go to Graph Builder. How are we gonna make sense of these data? Well, we can do a number of things. First, and this is one that I really like, using that heat map. So let’s take Time of Day, and I’ll put that on the X, if I can grab it. There we go, Time of Day. Okay, we get a big box plot. That’s weird. Well, let’s take Day of Week and drop it in the Y. A bunch of box plots. Look at this little heat map icon. Click on the heat map. Look at this. So this is a time-of-day by day-of-week heat map of the number of incidents. There’s a lot of meaning right away there, right? We can see, you know, between 3 and 6 p.m. there’s a real hot spot, especially Friday; between 3 and 9 p.m. it’s really the highest counts. When do you want to be out in San Francisco? Between 3 and 6 a.m. Although a student of mine was very quick to point out this should really be scaled relative to the number of people on the streets, and I said, very good point, which is true. But you can do lots of things with this. I mean, we can do this by police district, and you put this as a Wrap. It’s actually a great way to see it. And so we can quickly see, well, okay, there’s a lot of stuff happening in the Southern region; the rest of it is pretty OK.

But what I really wanted to show you in this data set is this. So let me go down to latitude and longitude. I’m just gonna take them both and drop them in the middle. And if any of you work with maps, you’re gonna say, “But Julian, latitude should be on the y-axis,” and you’re right. I’m gonna right-click, go to Swap, and swap them. It’s actually a really useful thing: every time you put it in the wrong place, being able to swap is easy. You might say, what are we looking at here? There’s a big cloud of points. Let me right-click, go to Graph, then Background Map, and what it can do is connect to the Street Map Service and pull down street maps of this. And this is so neat. So let me go to Tools, and I’ll go to my zoom magnifier. Let’s just zoom in here. Then we get down to street-level detail for crime incidents.

And this is neat if you ever have data that has latitude and longitude. There was actually a talk yesterday where a friend was talking about a source that gives you, just like Zillow, data on houses, and they come with latitude and longitude. So you can have students download the housing data from their own home zip codes and plot them on a map. And because everything in JMP is linked, you can still select these points. We can just hover and see what’s going on. That was a vehicle theft. You know, some of these are fraud; lots of fraud in San Francisco for some reason. But so I think this is just a great way to get people excited about data. Seeing a map, for some reason, really does it.

Whenever you’re done with these, I clicked it before, but I’ll show you again, you can click Done, and this is something you can save out as a PDF. I haven’t shown this yet, but you can actually export these out as interactive HTML, even. So on the Mac it’s gonna be under Export; on the PC it’ll be under Save As. This is most useful in platforms like Distribution. Remember, Distribution had that really cool feature where, well, this is a weird one to show; let’s do a different distribution. I’ll do, let’s say, police
district, let’s do traffic incident or not, and day of week. So we get three little histograms here, but remember, they’re interactive. Remember, I can click on these and see them all. This is a great thing to export out. So if I go to Export, and on PC, again, it’s gonna be Save As, Interactive HTML with Data, this will actually encapsulate the data in an interactive HTML format that I can open in a web browser. If I can find it on my desktop, I’ll clean up by name here. Uh-huh, here it is. It opens in a web browser, still interactive. This is something you can share on a website; your students don’t even need JMP.

I actually have them do this for their final projects. I told this story earlier: you know, grades motivate students, absolutely, but fear of being embarrassed in front of their peers really motivates students. And so for their final projects, I had them do a little experiment in my honors class, and they had to put their data online and share it with friends. And they were totally willing to share it with friends, because these are little experiments they’d been working on, that they came up with. But one of the students told me afterwards, he said, “You know, knowing that my friends would be looking at my data online and thinking about what I was asking and what my questions were really made me work hard.” Because nothing motivates them like thinking their friends are gonna think they’re smart. So I think it’s a great way to motivate them. Plus their friends can click on the data, and so even if their friends don’t know statistics and don’t have JMP, they can interact. And there’s something about touching the data, interacting with it in that modality, that’s really powerful. So it’s a neat, neat thing to do. It must have taken our programmers so long to do this; this is just so cool, so I’m always excited about that. And also, for a practical reason: anyone who publishes, you put your data in an online appendix as interactive HTML, then people
who read your papers can actually go to your online appendix and click through. And it really makes publishers happy: the whole idea of sufficient statistics and, you know, showing your data. It’s a really nice way to show it off.

All right, so that was Graph Builder, a really nice place to start with students. Where am I on time? I’m especially loquacious today. 5:17? Cool, 15 minutes. I want to show you Tabulate really very quickly, then we’ll move on to Distribution and Fit Y by X. And I might as well just bring up the same data set that I’ve brought up before, Restaurant Tips, because this might be one where, you know, you just want some tabulation. You want to know what was the total, let’s say the total amount of bills, for Monday through Friday, or for different servers in your restaurant. And so let me go to Analyze, then Tabulate. In Tabulate, just like Graph Builder, there are drop zones for the different rows or columns you want to use. And I would say for this one: imagine the table you want, and then drag the rows into place. So I’m imagining a table where I have a column that just says total, or sum, for bills, and I have rows that are the different days of the week. All right, so my rows are going to be defined by day of week. So let me drag Day of Week over here; notice it gets highlighted. All right, the first thing it does is show the number of observations. So there are 20 observations in the data set for Monday, there were 13 for Tuesday. You can even click them; they’re interactive still. But remember, I wanted a total of bills, so let me take Bill Amount, and I drag it right here in the middle; there’s a little drop zone there. So I want to replace what’s happening for N. And actually, Sum is the default. It didn’t just read my mind; that’s actually what it does by default. If you want to change that, you can right-click, go to Statistics, and change it to whatever you want.

Maybe I wanted the average. Or maybe I want both, the mean and the total. So I can go down to Sum here and drag it in. Now watch what happens: if I hover over the middle and drop there, it replaces it. That’s what it means in JMP to hover over the middle. Let me undo. I’m gonna drag Sum again, and look at this: on the right-hand side there’s a little teeny drop zone there, or on the left-hand side. Those are the prepending and appending drop zones. So let me append to the end: Mean and Sum. That’s a pretty good table.

But let’s make it even more complicated. Maybe I want to break it up by day of week and server. And so I have Server over here; let’s take that over to Day of Week. Remember the little heuristic: dropping on top replaces; dropping to the right or left appends or prepends. Let’s prepend, and now we have server and day of week. Now maybe I think to myself, actually, I want the servers to be different columns; that’s the table I’m imagining. Let me grab Server. Remember, holding in the middle replaces; holding on top or bottom is now the appending and prepending. So there it is. There are some points missing. What’s going on there? They weren’t working then. Yeah, perfect. So even this table can give you insights into your data set. And I actually use this sometimes for quickly checking crossed relationships. So when you’re doing factorial ANOVA, sometimes you need to see, do I have enough observations in every condition? And you can make yourself a little table to check that. Once you’re done you can just click Done, or, and this is neat, the red triangle, remember, click every one of these: right now it’s a little JMP interactive table, but you can make it into its own proper data
table. Now, why is this useful? Let me give you one example, then we’ll move on. That SAT by year data set, remember, the one that had different years and we had lots of SAT scores? Let’s imagine we wanted to run a regression. We have eight rows for every state. We’re gonna multiply the degrees of freedom by eight, and we shouldn’t, right? Because these eight rows are not independent. So we need to tabulate; we need to average over the SAT scores from these eight years for every state if we want to run some regression. So let’s try this. I’m gonna go to Tabulate. Let’s imagine the table: each state will occupy one row, and for each state we want SAT total. Remember the trick: I’m going to grab them both, right-click, Combine, take the sum. Let’s take the sum and drop it at the top. Remember, we wanted to have a mean, so right-click, Statistics, make it a mean. And let’s say we wanted to do a regression with expenditure. Remember the trick, too: on top replaces; to the right or left appends and prepends. There we go. Final thing, remember that red triangle: make it a data table. Now we have an aggregated data table, so now we can actually do our regression. Let’s look at SAT scores by expenditure, and there we go. This is now averaged over year, so our degrees of freedom actually make sense.

Now, this should be a very concerning plot to you, by the way: SAT scores by expenditure. The states that spend more are actually getting lower SAT scores. It turns out there are some good reasons for this. States that require more people to take the SAT tend to get lower SAT scores, and those also tend to be states that spend more on education. The other states don’t require it, or use the ACT, not the SAT. But the point here is that Tabulate is not only a good table creation platform; it’s also good when you’re doing these aggregations, taking sums or averages over variables.

All right, so that’s Tabulate. Now, we already looked at Distribution. Distribution was that one-variable platform, really good whenever
you get univariate questions. In fact, my advice is, every time you get a new data set, let’s go back to Restaurant Tips, every time you get a new data set, the first thing you should do is go to Distribution. And I tell my classes: take every variable, put it into Y, click OK, and now go through variable by variable and make sure things look okay. It’s the best way to catch errors. So for instance, you may see, Bill Amount here, there’s a huge bill. You know, if you don’t like that, you can right-click and exclude it right away, and it’ll take it out of the data set. It actually puts a row state on it that excludes it here. So if I scroll down you can see, there it is: hidden and excluded. Hidden means it doesn’t show up in new analyses; excluded means it doesn’t get calculated in summary statistics, or statistics for that row. So I can go back and un-hide and un-exclude it, which brings it right back. Now, my point about this is to screen your data. It’s a really good idea to just scroll through it. It’s so easy, and it’s a great thing for students to get in the habit of, because they’ll actually see more about their data and the structure of the data, and really take advantage of that interactivity. You know, get them clicking on things and sort of exploring.

If you want to run one-sample tests, remember the red triangles. Test Probabilities, that’s the one-sample chi-square. If I want to do that here for Server: Test Probabilities, levels A, B, and C. Now there’s a trick. A third, a third, a third: that’s gonna be a lot of decimal places for me to type in if I want to type in one third. Instead, JMP will rescale. So if I do one, one, one, JMP knows, well, you can’t have three for probabilities. So hit Done, and it scales it; it’s a third, a third, a third. We get the Pearson and the likelihood ratio, so our goodness-of-fit chi-square is right there.

I showed testing a mean before; I’ll show it again quickly. Test Mean is under any continuous column in Distribution. And so here, under the null hypothesis, let’s imagine that tip percentage used to be 16 last week. And let’s say we also want the nonparametric; I actually have students check this all the time. The Wilcoxon signed-rank we can have produced right at the same time. We always get a graph with every statistic. Now, let me point out something. There’s the test statistic right here for the t-test; there’s the signed rank. Now, the two-tailed p-value in JMP: that’s probability greater than the absolute value of t. That’s the way it’s written. And these are the two one-tailed ones, the directional, so one-tailed positive and one-tailed negative. So the two-tailed is right here; that’s what I usually point my students to. The nice thing about showing this is you can actually have them, you know, look at the sampling distribution, at least asymptotically, and look at the shaded area, and that is the p-value, at least the two-tailed p-value. So it’s a nice way of driving home the relationship between that and the actual test statistic.

So that’s Distribution. That can take you pretty far; that’s the first couple weeks of an intro class. But Fit Y by X is really where you’re going to be going after that. As we saw, Fit Y by X has the contextual layout, so it will produce different analyses based on what you put in. And we tried this before, but I’m going to do it all at once. I’m gonna say, let’s predict bill amount based on credit card and the number of guests. Now, it may look like I’m running sort of an ANCOVA. I’m not. Fit Y by X will always do bivariate output. So what it’s gonna do is produce, in the same window, two separate outputs: one fitting bill amount against credit
card use, and one with bill amount against number of guests. And remember, JMP is contextualizing the output. So if we’re teaching the t-test, that’s gonna be in this context: a nominal column on the X and something continuous on the Y. So let’s look at these. Means/Anova/Pooled t: the pooled t-test is assuming equal variances. So let me run that output. Now, you’re gonna get a lot of stuff. I usually hide the Summary of Fit and hide the ANOVA table if I’m doing it this way. I keep the Means for Oneway Anova; that’s fine. And you get the t-test in the same format as we saw before: the difference, and we get the t ratio and the probability greater than the absolute value of t, which is the two-tailed p-value.

Now I want to go back, though, because there’s the other option, t Test, and this will synthesize the degrees of freedom, so it’s not assuming equal variances. And if you hover, it even shows you. So this shows, right, the t-test, only available with two levels, oh, it should say not assuming, there it is: assuming unequal variances. And so it’s nice to actually print them both, so you can show them the difference in t ratio. You can say, look, the difference is identical, but look at our standard error and look at our degrees of freedom. And so they can actually see the changes here.

Now, formally, you often want to test whether your variances are equal. And remember, JMP puts things that are related really close to each other, in the same context. Unequal Variances is right there. And so if you’re doing a t-test and you’re trying to decide, unequal variances or not, well, run the unequal variances test, and you get multiple outputs. So the F test, two-sided, you get only when you have two levels. The rest of these are really nice. I like the Brown-Forsythe; Levene’s is really good too. It doesn’t assume normality in testing unequal variances. It always seemed strange to me to have assumptions for the test of assumptions that you’re running. But with these, students can see there’s different ways
of conceptualizing that test, and the plot shows the standard deviation for the different groups. All right, that’s on the left. That takes you pretty far, too. We just did, you know, t-tests; that’s also where you do ANOVA.

On the right-hand side, that’s where you’re gonna be fitting your lines; that’s a continuous variable against a continuous variable. And typically when I start teaching this, you know, I look at the mean first. This is actually kind of neat: Fit Mean, and you actually get the sum of squares error about the mean. That’s the corrected sum of squares total. It’s just kind of neat if you ever get into more advanced regression, so you can see the deviations from the grand mean. But usually what you’re fitting is actually that line, so Fit Line. That will give you your Summary of Fit, your R-squares, all your details; your Analysis of Variance output, if you’re playing with the overall model. Typically we’re talking about parameter estimates, and so we’ll get them here: the estimate, the standard error, and the p-value.

I want to talk about something in general in JMP. When you have these tables, try right-clicking and go to Columns, because you’ll see some things that get hidden by default, that honestly most people don’t want to see until they really want to see them. Things like the variance inflation factor, things that relate to multicollinearity, or the design standard error. The one you may want would be the lower and upper 95% confidence intervals around that beta. So if you want those, I’ll just turn them on here: Columns, Lower 95%, and Columns, Upper 95%. Now again, if you always like those, go to Preferences and set it; make JMP work the way you want. I actually prefer to have standardized betas on, too, and so that’s giving you the standardized coefficient, as if you z-scored both of the variables.

All right, so that takes us through t-tests and linear regression. Now there’s one final combination I’ll show, which is what happens if we have something like day of week predicting server. That’s a contingency. Three minutes? Okay. So that’s a contingency; that’s when we have categorical by categorical. And I wanted to show this output because this is probably one of my favorite things in JMP, though it was weird the first time I saw it. What you’re looking at here is a mosaic. How many have seen a mosaic before? Okay, cool; not everyone’s seen it. So it’s showing three things all at once, which I think is really special. It’s showing the marginal distribution of both variables, so what proportion of the data have A, B, and C for server, and what proportion of the data have Monday through Friday, and it’s also showing the contingent distributions: what proportion are A, B, and C servers for each day.

So the marginal distribution of server, that’s pretty easy. That’s A, B, and C right there; I’m just selecting them, and that’s the share in this big fat bar. So most are B, some are A, and very few are C. Now, the marginal distribution of day of week is a little harder to see, but look at the spacing on the x-axis. Monday is kind of small, Tuesday is tiny, Wednesday is fat. All right, these are representative of the share: the area of this, or the width, is really representing the share of day of week. Internally is really the magic. On Monday we just had server B. On Tuesday we had A and C. On Wednesday it was, you know, A, B, and C. The internal geometry there, or the share, that doesn’t match the average: that’s evidence against independence. All right, so these are, in this case, pretty dependent. If I minimize the contingency table you can see your Pearson and likelihood ratio. But what we’re seeing is that the distribution of server depends on which day of week we’re in, and you can see that
visually. You know, for Wednesday, Thursday, Friday, they’re kind of the same; Monday and Tuesday look totally different from the average. Now again, on the red triangle, click every one of these, you have lots of things you can turn on. Measures of Association is pretty typical; you get some effect sizes from that. There are lots of other options there as well. But I think the contingency plot is a really neat way to look at those data.

All right, so I got like a third of the way through this. No, actually I got pretty far. So Fit Model, Multivariate: really neat, and there are some videos I’ll point you to for those. But as far as resources, I already talked about these guys: the learning library, the concept discovery modules. I hope some of you saw them yesterday; I pointed out a couple of them at the start of today. Those are, again, under the sample data directory under the Help menu, and they’re under the teaching scripts, either interactive teaching modules or the teaching demonstrations.

And like I said, I’ll be putting this video online. Let me show you where that is. If you go to the JMP Academic Community, so it’s community.jmp.com/academic, or just go to community.jmp.com and I’ll show you how to get to it: go to Other Communities, JMP Academic Community. And so we’re actually going to put it online here under the Academic Resources Center. We have a lot of things actually from yesterday. We have the links to, well, we have two webinar recordings, I think, and we have our exercises for USCOTS 2015, so I’ll put a link in here. So again, the way to find that is, you go to the main community, and then go to Other Communities, JMP Academic Community, and then on the right-hand side there’s a section for academic resources, right there. I’ll make sure you guys can find it. I would say look at the video to find the video, but that’s a little recursive, so we can’t do that.

Okay, so I’ll stop there and take some questions. I already told you guys I’ll stay longer. I know you’ve already
spent an hour listening to me, but, uh, yeah, let me take some questions real quick, and then for sure anyone can go if they need to. What’s one thing you can do with them? Oh yeah, absolutely. Yeah, so basically under the File Exchange here there’s plenty of things you can add in to JMP. There’s a lot of really neat things; some of them are games, some of them are advanced things that were done for industry. The add-ins that we support are the ones that are teaching modules. We have a randomization testing add-in, really neat. Lots of stuff here. JMP has a lot of extensibility, and I already mentioned you can actually connect to R or MATLAB, so you can do a lot of stuff. But some of my favorites are interactive binning, so binning things interactively, that’s kind of neat. So if you take something like bill amounts, you can actually define which sections, like, say, cut points at percentiles, and it’ll actually make, you know, percentile bins. Lots of cool things you can do. Yeah, and they’re all free. Yeah? Yeah. In synopsis, tell me what the differences

are? Sure. Um, in terms of scope, I mean, JMP will go very far. I always say it’s t-test through a neural network, so JMP is very extensive in breadth. I think probably the biggest difference would be the philosophy of the analysis. Minitab, SAS, SPSS, really I’d say most statistics software on the market was built at a time when we needed to have that server-client relationship, and so they’re built in that structure: you request an analysis, you get output. You know, even R, which was built that way, didn’t have to be, but that’s how it’s built: you produce an analysis and it’s flat. So JMP is very different. I think the biggest differentiating factor is that JMP is built around what I always like to just call interfaces. I mean, these are little windows to your data that let you connect to it and interact with it. As far as analytic scope, I mean, it’s really quite comparable. JMP goes really far if you want it to: classification and regression trees for data mining, neural nets, you know, all the time series you could ever want, multiple correspondence analysis, I mean, these are all in here. So I don’t think it’s a question of scope, certainly not for an intro or mid-level class, though JMP is one of these pieces of software that students can learn, that is friendly for introductory stats, but that they can take through their doctorate, and I think that’s actually special. You know, there’s very few pieces of software that I think cover the graphing and visualization as effectively, and sort of as interactively and beautifully, but also do the analysis, and very few that take them from the start to the finish. And, you know, I say that not representing JMP. I work for JMP because I love JMP so much. I spent most of my grad career teaching people JMP; they should have been paying me, I always tell them this, for years, but they weren’t. But, uh, yeah, I think it’s special because of just that feeling of connecting you to your data, and that’s really
the big difference. Yeah? Yeah, yeah. So, the student edition, actually maybe let him talk about that a little more; he knows more about the student edition packaging and pricing. Yeah, we do also have an option to rent JMP for six months. As you just mentioned, a lot of schools do the site license, because you actually have JMP Pro with that. Now, really, that goes beyond the scope of most introductory classes, except if you’re going crazy with them, but the benefit is, you know, researchers typically buy into this, and so it’s not something that your students have to shoulder the burden of. Usually most campuses, I think it was like 87, 80 percent, charge nothing for their students to get it. And the great thing about the license, and I just think it’s amazing SAS does this, is it’s at home for students as well, so they can put it on their home computers, as many copies as they want, which means they’re taking their data home with them and playing with it, and I think this is just the right way to do it, because if students have to go into a lab to work on data, that’s not the right mindset. I say play with it, and I’m very deliberate about this, because if you’re playing, you’re reconceptualizing what you’re doing as fun, and, you know, even if you’re tricking them into thinking it’s fun, that’s fine, as long as they’re playing and they’re exploring and they’re not stuck behind the math, and they’re thinking about the value of the data. And that’s something I think JMP always brought out in students more than I’ve seen other software bring out. You know, they see the graphs, they see how pretty it is, they can drag and drop things, so it’s so familiar that they just start playing. And, you know, I’m amazed: the first week of class they are mostly doing Graph Builder, and then we have a project where they bring in their own data and they talk for, you know, two to five minutes, just showing what they found, and people figured out all sorts of stuff I never showed them. You know, some people figured out how to
this is so neat, you can take multiple types of data and layer it, so you can make these big bubble plots with different variables, and so people come in with these really advanced interactive things. I’m like, how did you figure that out? They’re just, like, dragging things around. It is play. I think that’s great. I think that’s the right mentality for data. You know, what we do with data is very serious sometimes, but the approach to learning it needs to be, I think, more friendly, because once you’re behind all the values and numbers

and formulas, you know, that scares them away. It shouldn’t, because statistics is the best class to teach. You all know this, you’re here. So they should enjoy it, and I think most students would enjoy it if they had more play. Yeah? Right. Um, almost always I do it in little groups. I find that to be less threatening if they have to present; there’s nothing like getting up alone and talking about what you did that scares the crap out of them. But, uh, you know, in little groups, what I like to do is I’ll give them the same data set, and something rich enough, I mean, I don’t have a lot of the ones I use here, but rich enough where there’s lots of columns, lots of data that they can play with. And it’s like 15 minutes, in groups of three: find something interesting, find a way to display it, and be sure that you can convince us why you displayed the data the way you did. And that’s a really early-on one, getting them to get out of this mindset of, okay, I have these data, what’s the best visual, which is the wrong question always. It’s what visual tells your story. And so as soon as possible I get them into the mindset of, okay, well, what can I find in the data? So they explore, and they have 15 minutes, so in groups of three that’s not a huge amount of time to explore, but enough where they find something interesting, and then they put together a graphic, and then in basically five minutes they have to get up and give a little talk on it. That’s one little activity. Um, later on in the class, you know, I like to give them increasingly bad data, and by that I mean data that, you know, starts off pretty. This is pretty data. But for instance, I showed a little of this yesterday, you know, later on I want them to get data that has a couple of things that are wrong. And, you know, by the time they get these data sets they’ve heard me say, what’s the first thing you do? And they say, run Distribution. Like, what’s the first thing you did? Look at Distribution,
and so they know they’re gonna open Distribution, they’re gonna put all the columns in, and, I mean, I hope they find these things; they have in the past. You know, they get down to credit card and they say, oh, something’s wrong there, the no’s and the n’s, and then they get down to things like day of week, which is just a whole mess, right? And I think that’s great, because, you know, real data is much messier than that. If anyone has done a study or done a survey, you know that people come up with uniquely wrong ways of answering your questions. And, you know, this is a pretty trivial example, but I want them to see that. There’s, you know, great things in almost all software that help you clean it up, but JMP has some really cool ones now. I’ll just point this out, this is new in 12 if you haven’t seen it: under Columns, under Utilities, there’s a Recode facility. Now, Recode has always been in JMP, but JMP 12’s Recode does something really cool. So look at this, this is just so messed up, right? We have Fridays, we have spaces, we have all sorts of stuff. You can grab levels, you can right-click and combine them if you want, but this is something that JMP can figure out. So go to the red triangle, remember, click every red triangle, and go to Group Similar Values, and this is just magical. We can ignore all these things. The difference ratio says what proportion of this word, whatever word we’re talking about, can I change to make one word into another word, and a quarter means, okay, a quarter of the length of the word. We click OK. Look at this: it figured out that all those group together, except for Wednesday and the rest of the Weds, because they’re so far apart. So I’ll say, hey, group those, and now I’m done. I’ve recoded like 15 levels into the levels they should be. Now there’s a couple things in this menu. So, Done, In Place writes over the column. Don’t ever do that. What that’ll do is write over the column, which is really okay, but honestly you can never go back, and so if you don’t catch an error
and you’ve saved your file, and it’s a month later, you have no way of going back. So do one of the following: do New Column, or, and this is what I always tell students and I’d encourage you to do, is Formula Column. Watch what this does. It’s gonna write a column to the data set which has the recoded values, but the way it’s doing it is something special. So let me scroll down to the bottom. What it’s doing is recoding dynamically, so if I type in lowercase w-e-d, which was one of the errors, look what it did: it recoded it. It is online, or active, recoding. And the way it’s doing it, let me right-click and go to Formula, so columns as a whole can take on formulas: it wrote some conditional logic, so it wrote a Match function. If you double-click this, this is actually JSL, it’s part of the scripting language, but let’s just look at it graphically. It says match the day of week column, that’s why it’s italicized, and it says: if I find this, make it this; if I find this, make it this; if I find this, make it this. And so it’s online recoding, and this is great because this is your recoding schema too. You can show this to somebody and show them this is how you coded it. So yeah, as far as assignments go, going back, you know, increasingly bad data, I think that’s a great training for them and worthwhile. That’s one they could do in teams of two or three: give them a bad data set, they have to get it workable, and we’ll come back to temperance
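
A minimal sketch, in Python rather than JSL, of the two ideas in the Recode demo above: grouping levels whose spellings differ by no more than a quarter of the word length, and recoding into a new column through a lookup instead of overwriting the original. This is not JMP’s algorithm; the use of Levenshtein edit distance, the helper names (`edit_distance`, `build_recode_map`, `recode`), the trimming and lowercasing step, and the sample values are all assumptions made for illustration.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def normalize(value: str) -> str:
    """Ignore case and surrounding spaces when comparing (an assumption)."""
    return value.strip().lower()


def build_recode_map(values, max_ratio=0.25):
    """Map each distinct raw level to a representative level.

    Two levels are grouped when edit distance / longer word length
    is at most max_ratio (the "difference ratio" in this sketch).
    """
    counts = Counter(values)
    # Most frequent spellings first, so they become the representatives.
    levels = sorted(counts, key=lambda v: (-counts[v], v))
    representatives = []
    mapping = {}
    for level in levels:
        key = normalize(level)
        for rep in representatives:
            rep_key = normalize(rep)
            longest = max(len(key), len(rep_key), 1)
            if edit_distance(key, rep_key) / longest <= max_ratio:
                mapping[level] = rep
                break
        else:
            representatives.append(level)
            mapping[level] = level
    return mapping


def recode(value, mapping):
    """Match-style lookup: known levels are recoded, anything else passes through."""
    return mapping.get(value, value)


# Hypothetical messy day-of-week column.
raw_days = ["Friday", "friday", "Friday ", "Fryday",
            "Wednesday", "Wed", "wed", "Friday", "Wednesday"]

recode_map = build_recode_map(raw_days)
# New column; raw_days itself is left untouched.
recoded_days = [recode(v, recode_map) for v in raw_days]
print(sorted(set(recoded_days)))
```

With these sample values the Friday variants collapse into one level, while `Wed` and `Wednesday` stay apart because six of nine characters differ, which mirrors the manual grouping step in the demo. Merging them by hand would be one more entry in the mapping, for example `recode_map.update({"Wed": "Wednesday", "wed": "Wednesday"})`. Keeping the mapping as a separate object also means it can be shown to someone as the recoding schema.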