Analytics in Action | SAS Technology Connection from SAS Global Forum 2019

MODERATOR: Welcome to the SAS Global Forum Technology Connection Please welcome the Chief Operating Officer and Chief Technology Officer of SAS, Oliver Schabenberger OLIVER SCHABENBERGER: Good morning Good morning, and welcome to the Technology Connection at SAS Global Forum I’m Oliver Schabenberger, COO and CTO of SAS and your emcee for the morning And I want to reveal one big secret right up front– I do own a pair of jeans The theme of the conference is “Analytics In Action.” And during the next 90 minutes, we want to show you exactly that– how we solve important problems using data and analytics using SAS technology I speak to organizations and customers around the world, and many conversations have a common thread My industry is going through a transformation, digital transformation Physical assets, books, cars, computers, stores are turning into bits and bytes The world is drowning in data, and we’re not taking advantage of it We know we have to do something, and we know we have to get it right But we can’t find the talent to implement a data-driven business We do not know where and how to get started And now there is extreme hype around artificial intelligence and machine learning, the secret weapon in this fight But my organization is not wielding that sword yet Are we falling further behind? 
We do not want to add to the hype We do not want to add to the confusion Over the next 90 minutes, we want to make analytics, machine learning, and artificial intelligence real, bring it to life Yes, AI is overhyped But it’s also real and powerful Many technologies and methodologies are today swept under the AI umbrella, and that’s OK Someone quipped that, quote, “You only call it AI until it becomes useful Then you find another name for it.” Our current form of narrow artificial intelligence is data driven And that distinguishes this era of AI from the approaches in the past We try to create machine automation through handcrafted knowledge systems, expert systems where software developers pour our expertise into machine instructions And that works well for systems that are defined by clear rules When the task is to capture logic, it’s not to interact with a complex and dynamic world The incredible improvements we experienced in computer vision and natural language understanding in just the last decade are based on a different approach We worked for years on handcrafted models for object detection, facial recognition, natural language translation, and so on And despite honing those algorithms by the best of our species, that performance does not come close to what we can accomplish today with data-driven approaches, approaches that let algorithms discover patterns from data rather than coding logic The powerful message here is not that machines are taking over the world It is that we are learning that we can generate tremendous value by unlocking the information, patterns, and the behaviors that are captured in data, that we are understanding that this is a new era of machine automation governed by algorithms that are derived from data or that have shaped themselves iteratively And we are learning how to use this power at scale, how to apply it across an enterprise During the next 90 minutes, you will see analytics in action At the center of the technology 
connection this morning and throughout the conference are analytics, data, the SAS user, and not hype. I’ve been part of many changes and transformations in the analytics market and in our software. Our innovation is customer driven: we innovate to meet your needs, and we create tools and solutions that help you innovate. And for this to work, we have to communicate, work together,

talk to each other, learn from each other, exchange openly what works and what does not Your feedback is all important to us Here to recognize one of our SAS users with the Annual User Feedback Award is Annette Harris, Senior Vice President for Technical Support at SAS Annette? [APPLAUSE] ANNETTE HARRIS: Hello Thank you for being with us at SAS Global Forum 2019 The theme for this year’s conference is “Analytics In Action,” and our winner today is a perfect example of someone who is visionary in examining ways that artificial intelligence and machine learning can be used for demand forecasting and planning capabilities He provided input that resulted in the creation of a new demand capacity, Assisted Demand Planning, that uses machine learning to boost forecast of value added He has shared ways that his company is using SAS Forecast Server and SAS Demand-Driven Planning and Optimization He has also engaged with our product management at SAS to discuss functional requirements for the upcoming Demand Planning Solution on SAS Viya He also led the deployment of SAS Solution to 48 countries globally So on behalf of SAS, I am proud to present the 2019 SAS User Feedback Award to Dr Davis Wu of Nestle [APPLAUSE] DAVIS WU: Thank you very much Thank you, Annette I’m really glad my contribution adds value to SAS product developments In Nestle, SAS has become an important tool for everyday process in demand planning globally And some of the users and supporters are with me this morning In fact, the success attributes to the great teamwork in Nestle Today I would like to thank especially my sponsor Oliver Gleron, who is also with us today and will be in a panel discussion tomorrow Also thanks to the support from Nestle IT, Francois, Raghav, and from SAS, Jonathan Riches and many of your colleagues Thank you very much Thank you MODERATOR: Congratulations to the 2019 SAS User Feedback Award winner, Dr. 
Davis Wu OLIVER SCHABENBERGER: Thank you, Annette Thank you, Davis Earlier I mentioned customer driven innovation We want to innovate to meet your needs, and we want to empower you to innovate with our products Organizations are learning about the power of analytics, and we are learning about their needs for applications Working together, we can generate value for the organization, for its constituents, and customers It is especially rewarding when that collaboration has a positive effect on lives, affects them, maybe improves them, maybe even saves them Analytics is an opportunity and a necessity in the transformation of health care About two years ago, we partnered with the Amsterdam University Medical Center to use computer vision and predictive analytics to improve care for cancer patients Ladies and gentlemen, please join me in welcoming Dr. Geert Kazemier, Professor of Surgery and Director of Surgical Oncology at the Amsterdam University Medical Center Good morning, Geert How are you? 
GEERT KAZEMIER: Good. Good to be here. OLIVER SCHABENBERGER: Nice to see you. Geert, thank you so much for being with us. Thank you for the partnership and for being here at Global Forum and sharing with the audience the important work you’re doing at Amsterdam UMC. Tell us about the medical problem we’re trying to solve and the kind of patients we are trying to help. GEERT KAZEMIER: Oliver, in the project that we do together, the patients that we’re aiming to help have what we call colorectal liver metastases. So those patients have large bowel cancer, and the cancer has spread to the liver. Colorectal cancer is about the third most common type of cancer in the Western world, and those metastases occur in about half of the patients. So in half of the patients, the tumor does not stay in the large bowel but travels to the liver. OLIVER SCHABENBERGER: Well, you have to let that sink in for a moment. One of the most common cancers worldwide, and half of the patients experience liver metastases. What sort of treatments are prescribed for these patients?

GEERT KAZEMIER: We have several, but the best available treatment for these patients is surgical removal, resection of the tumor. That’s my daily work. Unfortunately, it may not be safe to do this resection initially because the tumor is too large, or you have too many tumors. And those patients can become resectable if we give them chemotherapy upfront. So we give them chemo first and then operate. And those are the patients that we are focusing on in the project. OLIVER SCHABENBERGER: So we’d like to focus on patients who might undergo therapy to shrink tumors in order to make them candidates for resection. Today, how do your physicians assess whether a patient might be responding to chemotherapy and is on the path to becoming resectable? GEERT KAZEMIER: Now, our radiologists do it. They use what we call the RECIST criteria. RECIST is actually an acronym that stands for Response Evaluation Criteria In Solid Tumors. And to evaluate those RECIST criteria, a radiologist selects two lesions in a patient’s image, as shown on the screen. And for each lesion, the radiologist manually measures its largest diameter in the slices before and after the chemo. If the sum of the diameters decreases by at least 30% after treatment, that’s a good thing. The tumor shrinks. The patient is classified as responding to the therapy. But if the sum of the diameters increases by 20% or more, the patient is progressing. The cancer is progressing, which is a bad thing. If it stays about the same, the patient is called stable. And this classification of a patient determines how we proceed with the treatment. OLIVER SCHABENBERGER: OK. So I’m putting on my data science hat for a minute here to frame the challenge that you’re facing. The selection of a treatment path depends on the classification of a patient as responding, stable, or progressing. The classification is made based on a rule-based system. The decision input is measurements made manually from medical images, actually one image. So it seems to me that 
the radiologists have to make some subjective decisions in following the RECIST guidelines, such as which lesions to look at and which image slices Also that RECIST criteria does not take into account all the details we could have available for modern scanners, like the 3D geometry of the lesions And finally maybe, maybe we could develop a better predictive approach model to predict patients respond better than summing just diameters of two lesions Am I on the right track with this? GEERT KAZEMIER: Yeah, you’re absolutely right I mean, up until now that was not possible It’s just the two lesions because more is too much work for them The manual process at this moment takes even more than 20 minutes per scan for a radiologist to do So we believe that those medical imaging analytics that you guys have at the SAS platform can provide alternative criteria that are indeed more objective, accurate, and automated OLIVER SCHABENBERGER: To make decisions based on data that are objective, optimized, can be applied to all the data because they scale, and can be carried out quickly and consistently– that sounds to me like a win-win situation Well, let’s see how we tackle this problem with analytics and how much progress we’ve made Please meet Fijoy Vadakkumpadan, Senior Staff Scientist in our Computer Vision Team [VIDEO PLAYBACK] – I was a very curious kid growing up, often tinkering with various electronic and mechanical devices at my parents’ house When I came across computers in my late teens, that opened up an entirely new world of things that I could build and fix, these new things being computer programs And I haven’t stopped coding ever since A personal experience that I had a few years ago changed the way I view my work In 2015, my wife and I were pregnant with identical twin girls Towards the end of the pregnancy, we were getting a detailed ultrasound imaging exam almost twice a week Because of all these exams, we discovered early on that one of the girls was not 
growing as fast as she should have So we decided to move forward with a planned C-section instead of waiting for natural delivery, which would have been unsafe at that point The C-section went well, and we now have two healthy and happy girls at home If it weren’t for medical image analytics, the outcome could have been very different And I’m very deeply touched by that experience The realization that I can– or my work can help make a similar impact on someone else’s life is very gratifying There’s no doubt that medical imaging has revolutionized medicine But at the same time, this revolution has brought new challenges to the clinic A radiologist typically has to look at thousands of images per day And this is where my team at SAS has stepped in to help

We have extended the SAS platform to process medical images SAS platform now provides an environment where users can build applications that convert medical image data to insights that can drive decision making My hope is that this work can help improve the lives of radiologists and associated health care professionals It may even help save a life one day [END PLAYBACK] OLIVER SCHABENBERGER: Good morning, Fijoy FIJOY VADAKKUMPADAN: Good morning, Oliver OLIVER SCHABENBERGER: Fijoy, the team has been working on extending the SAS platform for medical image processing, using it to develop applications that can help oncological teams like Geert’s What type of data have you received from Amsterdam UMC? FIJOY VADAKKUMPADAN: Oliver, we have received 3D CT images from Geert’s team, and also RECIST data for a number of patients The images are stored in DICOM format, which as you know is the most popular format used in the clinic Geert’s team has also provided contours of liver and lesions drawn by expert radiologists on each of these scans OLIVER SCHABENBERGER: Well, can we take a look? 
Let’s see what it looks like FIJOY VADAKKUMPADAN: Absolutely What you see on screen is a Python Jupyter notebook connected to SAS Viya The example that I’m going to show you is that of a female patient who was 73 years old at the time of her hospital visit Maybe some of you in the audience has a person like that near and dear to you in your lives On the screen are her data from multiple sources loaded, integrated, and processed all in SAS Viya This is a 3D visualization that I can interact with The image slices that you see on screen are three perpendicular slices from her CT scan Along with the slices, you also see the surface of her liver in transparent blue and the surfaces of her lesions in orange OLIVER SCHABENBERGER: Geert, Fijoy’s application can capture these highly detailed 3D geometries of the lesions and the liver from your data What are your thoughts on this when you see these images? GEERT KAZEMIER: Yeah, I’m very excited I mean, it’s amazing to see how far you guys came Those details– patient specific geometry is exactly the kind of information ignored by the current RECIST worldwide, actually I can’t wait– I have to be honest– to see the new criteria we can come up with and use those data FIJOY VADAKKUMPADAN: Sure, Geert The first criterion that we looked at was the total lesion volume in each of these scans We can compute quantities like that using a specialized action in SAS Viya now Let me run that action and show you the results What you see on the x-axis are 10 patient IDs And on the y-axis, we have total lesion volumes The blue bar shows the lesion volume before any therapy The orange bar shows the total lesion volume after therapy Now, therapy was continued for some patients And for those patients, the green bar shows the total lesion volume after continued therapy It’s clear from this plot that this volumetric captures the shrinkage of tumor that occurs in most cases during therapy OLIVER SCHABENBERGER: OK That sounds great It looks like 
we’re trending in the expected direction with the criteria. I assume that this total lesion volume is more accurate than just a RECIST diameter, because we’re working with a 3D volume. Do we have any quantitative evidence for how this might improve evaluation of the treatment response? FIJOY VADAKKUMPADAN: We do. What I did was to take each volume value and from that calculate the diameter of a sphere that has the same volume. Let’s call it the 3D diameter and look at the results for an example patient. On the screen are data from a 69-year-old male patient. On the left, you see his RECIST diameter going from 32 millimeters to 24 millimeters. That is about 25%; that is exactly a 25% reduction. Now, that didn’t quite meet the 30% threshold that RECIST requires to be considered responsive, so he was classified as stable by the radiologist. Now look at his 3D diameter. It goes from 33 millimeters to 23 millimeters, which is about a 30% reduction. If we use the same threshold of 30% for this new metric, he can be classified as responsive. GEERT KAZEMIER: And that’s very important. That actually can be life changing, because we know that patients who we call responsive can benefit from surgery. And patients that we call stable cannot.
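The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration of the rule as described on stage (at least a 30% decrease means responding, a 20% or greater increase means progressing, anything else is stable) together with the equal-volume sphere diameter, d = (6V/π)^(1/3). The function names are hypothetical and are not part of SAS Viya.

```python
import math


def equivalent_sphere_diameter(volume_mm3):
    """Diameter (mm) of a sphere whose volume equals the given lesion volume (mm^3)."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


def percent_change(before, after):
    """Signed percent change from 'before' to 'after' (negative means shrinkage)."""
    return 100.0 * (after - before) / before


def classify_response(before, after, shrink_threshold=-30.0, growth_threshold=20.0):
    """Apply the thresholds quoted in the talk to a size metric measured twice."""
    change = percent_change(before, after)
    if change <= shrink_threshold:
        return "responding"
    if change >= growth_threshold:
        return "progressing"
    return "stable"


# The example patient from the talk: the manual diameter versus the 3D diameter.
recist_label = classify_response(32.0, 24.0)        # 25% reduction
diameter_3d_label = classify_response(33.0, 23.0)   # roughly 30.3% reduction
print(recist_label, diameter_3d_label)
```

With the numbers quoted for the 69-year-old patient, the same 30% threshold gives a different label depending on which diameter is used, which is the point of the comparison.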

So this can save lives, since we know that chemo alone can never cure a patient. We most certainly need to investigate this new metric for the test. OLIVER SCHABENBERGER: So we have a new metric that is potentially more accurate than RECIST. But it’s still based on manual delineations of the tumor boundaries, which means that it doesn’t quite address all the limitations of RECIST in terms of the subjectivity of the work. Do you have anything that will address those particular limitations? FIJOY VADAKKUMPADAN: That’s a good point, Oliver. I want to show you preliminary results of applying the object detection capability of SAS Viya for response assessment. First I took the pre-processed data that I showed earlier and generated bounding boxes of lesions in all slices. Take a look. What you see on the screen are example slices with rectangles around tumors. Using these data, I trained a convolutional neural network based deep learning model in SAS Viya. Let me show you a plot that illustrates the training process. OLIVER SCHABENBERGER: So to make it clear, these are the bounding boxes determined by radiologists. FIJOY VADAKKUMPADAN: Yes. OLIVER SCHABENBERGER: Now we’re training a computer vision model on that data. FIJOY VADAKKUMPADAN: Based on that. The loss function here on the y-axis is the objective function that is minimized during training. You can see that it gradually decreases with the number of epochs on the x-axis, which is the number of passes through the training data, indicating convergence. This is what you want to see when you train a model. Now, let’s score this trained model on a set of test slices. OLIVER SCHABENBERGER: So now we’re looking at how well the model you trained performs. FIJOY VADAKKUMPADAN: While the model is running: it’s called TinyYOLOV2. It has nine convolutional layers and about 11 million parameters. It looks like the model has finished running. Let me scroll it up so you can see. What you see on the screen are results of automatic lesion detection 
performed in SAS Viya on some example slices. OLIVER SCHABENBERGER: This is very impressive. So we now have an AI model trained. What impact does such an automatic metric that we can derive from here have on teams like yours? How can this be deployed in the clinic? GEERT KAZEMIER: Yeah, first, such automation will save those radiologists a lot of time: as I explained to you earlier, 20 minutes per scan. This is very important, given that some of our radiologists spend about a third of their daily work on RECIST. OLIVER SCHABENBERGER: A third, every day. GEERT KAZEMIER: And I can share a secret with you, Oliver. They don’t consider these measuring tasks the most inspiring part of their job, as you can imagine. And secondly, it provides a more objective response assessment metric that will help us to treat patients consistently. I’m very, very, very impressed with the results. FIJOY VADAKKUMPADAN: We actually have a plot that shows the objective metric. What I did was to take these bounding boxes and then calculate a single lesion-size metric for each scan based on the side lengths of the bounding boxes. Let’s call this the YOLO diameter and look at the results for all patients. Again, on the x-axis, you see the patient IDs. On the y-axis, now we have the YOLO diameter. The colors have the same meaning as before. You can see that this new metric captures the shrinkage of tumor that occurs during therapy in most cases, just like the 3D volume metric that we looked at earlier. What you’ve just seen is a demonstration of the value proposition of Viya in medical image analytics, specifically its ability to support applications that can almost fully automatically go from raw images to objective metrics that may be used in the clinic. OLIVER SCHABENBERGER: That’s wonderful. Looking ahead, it seems to me that the new criteria we’re developing and deriving have applications beyond colorectal cancer and liver metastases. Where do you see 
applications outside of this? GEERT KAZEMIER: Yeah, most definitely First, the new criteria we’re deriving may be applicable to other solid tumors I mean, this is just a use case that we came up with– other tumors, like breast cancer, lung cancer And secondly, some of those new criteria by themselves, or in combination with other data– I could imagine genomic data, your DNA, or whatever– may help us to predict outcome of surgery and overall patient survival much better than we do now And such predictive analytics is extremely important to us We know that not all patients respond to surgery or chemotherapy equally well OLIVER SCHABENBERGER: Yeah, we’ve made some great progress here to develop more reliable and repeatable metrics for medical images It helps with automation, saving precious time of medical professionals, and when

we talk about artificial intelligence augmenting us, supporting us, making us better at what we do, this is exactly what we have in mind But we’ve really only scratched the surface of what’s possible Geert, I totally agree Predictive analytics based on combining better intelligence about medical images with other sources of data, genetic information, environmental information, is the next logical and important step And personalized medicine, reliably predicting what will happen to the patient rather than to an average patient– that should be our goal Fijoy, where can we find out more about medical image analytics on the SAS platform and the SAS partnership with Amsterdam UMC? FIJOY VADAKKUMPADAN: We have two breakout sessions on these topics, one presented by myself and Dr. Joost Huiskens, and another by Dr. Xindian Long Please check them out OLIVER SCHABENBERGER: Geert and Fijoy, thank you very much for being with us today and for the very important work that you do FIJOY VADAKKUMPADAN: Thank you, Oliver OLIVER SCHABENBERGER: Good job [APPLAUSE] Ladies and gentlemen, you just experienced the following– medical image processing in SAS Viya to improve estimates of tumor lesion size and volume, augmenting a clinician by applying a machine learning model, and the power of combining data sources in the service of predicting health outcomes In this demo, Fijoy worked with an artificial neural network to recognize tumor lesions on those images And while the algorithm allows us to process more images, extracting better information faster, such automation also raises important questions Are the algorithms reliable? Can they be trusted? Are they performing as expected and anticipated by their designers? Are they equally accurate for men and women? Are factors that matter accounted for? Are protected classes indeed protected? 
Saying that software works as coded has never been an acceptable answer All software works as coded In this era of machine learning and artificial intelligence, we must rethink our approach and ask whether algorithms work not as designed, but do they work as intended? A set of data is a snapshot of the world It does not tell us how the world works Take all the patient data in the world, and algorithms can find patterns and correlate conditions with outcomes But they cannot learn medicine The desire and need for transparent and fair decisions naturally leads us to questions about interpretability, explainability, and bias of algorithms None of this is, new but it is amplified today because of the speed and the scale with which we can automate human tasks and the new domains, as you have just seen, into which data automation has penetrated We rightly want to know how we fare when important decisions about our lives are arbitrated by technology that is outside of our control A poorly placed ad is much less consequential than a misdiagnosed disease a college admission denied or financial reputation harmed by misrepresenting a disadvantaged group Interpretability uses a mathematical understanding of the outputs of a machine learning model How does the model react to changes in the inputs, for example Explainability goes further than that It involves full verbal explanation of how a model functions, what parts of it were derived automatically, what parts were modified in post-processing, how does the model meet regulations, and so forth Here to discuss and demonstrate model interpretability and bias is Xin Hunt, software developer in the AI and Machine Learning R&D at SAS [VIDEO PLAYBACK] – The first time I really got interested in software was in college I was in an engineering degree, so we had some programming classes One of the classes was teaching compiled language It was really fun and really got me interested in software developing I think what I’m doing, what I’m 
building, is going to have a big impact on the future of machine learning because in order for the general public to accept certain tools, these kinds of models,

for the society to accept it, you have to understand it And also it’s really fun I like working with the people here We have a wonderful, dedicated, hardworking group of people who are super-smart And ever since I came here as an intern, I felt like it’s a great group to work with Everybody was so friendly and so smart And all our products are vigorously tested, so we know it’s going to be easy to use and robust and reliable One thing about SAS software is so much dedication innovation goes in there We have whole groups working on the cutting edge machine learning and AI algorithms It’s also, I think– SAS software is for everyone, from novice practitioners to data scientists, very senior data scientists, you can always find a platform that suits the best for you [END PLAYBACK] OLIVER SCHABENBERGER: Good morning, Xin Welcome to the stage Xin, this is your first Global Forum, right? XIN HUNT: Yes, very excited to be here OLIVER SCHABENBERGER: Way to start out Xin, many machine learning models and AI models we are building today are not easily understandable We cannot just look at their parameters and figure out what’s going on and make sense of it And it’s these type of models that we want to focus on right now Xin, how would interpretability help the radiologists, the clinicians, in the lesion detection application Geert and Fijoy just showed us? 
XIN HUNT: I’d love to tell you all about that. But before that, let’s take a step back and take a quick look at one of the difficulties detection algorithms tend to have. So if you look at the demo right here, you’ll see that for each of the lesions the model detects, it gives you a probability that the lesion actually exists there. So this means the algorithm has to set a threshold. And in the end, the model only shows you a bounding box if the probability is higher than that threshold. This is tricky to set. OLIVER SCHABENBERGER: So depending on what you set, we could have more false positives. Or we might miss a lesion that actually exists. How can we mitigate that risk? XIN HUNT: Yes. So let me show you an example first. Here in the middle, you see the ground truth labeled by the clinician. On the left and right, we intentionally set the threshold a little bit too high and a little bit too low. And you can see that in both cases, you’re met with either false positives or false negatives. OLIVER SCHABENBERGER: So what do we do about this? How do we set those thresholds? XIN HUNT: Exactly. OLIVER SCHABENBERGER: Or how do we explain how those images are detected? 
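The trade-off described here can be illustrated with a small Python sketch. The detections below are invented for illustration (each is a model probability paired with whether a clinician marked a real lesion there), and `count_errors` is a hypothetical helper, not a SAS Viya function. A box is shown only when its probability is higher than the threshold, as stated in the talk.

```python
def count_errors(detections, threshold):
    """Count false positives and false negatives for a given display threshold.

    detections: list of (probability, is_real_lesion) pairs.
    A candidate is shown only if its probability is higher than the threshold.
    """
    false_positives = sum(1 for p, real in detections if p > threshold and not real)
    false_negatives = sum(1 for p, real in detections if p <= threshold and real)
    return false_positives, false_negatives


# Made-up candidates: three real lesions and three spurious ones.
detections = [
    (0.95, True), (0.80, True), (0.55, True),
    (0.45, False), (0.30, False), (0.10, False),
]

too_low = count_errors(detections, 0.2)    # shows spurious boxes
balanced = count_errors(detections, 0.5)   # separates this toy data cleanly
too_high = count_errors(detections, 0.9)   # hides real lesions

for name, result in [("too low", too_low), ("balanced", balanced), ("too high", too_high)]:
    print(name, "-> false positives, false negatives:", result)
```

In real data the two groups overlap, so no single threshold removes both kinds of error at once.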
XIN HUNT: Right For these cases for medical applications, it’s extremely tricky because even a small number of mistakes is dangerous So what we really need is a clinician’s final decision and judgment So luckily for a good model, most of the mistakes are made right near the threshold I call those the marginal cases As you can see on the right, the marginal cases tend to have low contrast and irregular shapes Those are best recognized by a trained professional So what we want to do is have the model take a look at the images first, label those it’s confident about in green as lesions directly, and pass on those marginal cases to the clinician so they can make the final decision OLIVER SCHABENBERGER: It’s almost like giving the clinician a virtual assistant The model explains, or tries to explain, what it sees in the image XIN HUNT: Exactly It’s like an assistant– actually, let’s fire up our assistant here In this assistant here, we combine the capability of model interpretability and our natural language generation to generate a short report for the clinician OLIVER SCHABENBERGER: So you’re running Shapley method in SAS Viya XIN HUNT: Yes The Shapley method we’re running here, we actually call it HyperShap It’s a patent pending algorithm we developed here at SAS We patented this very scalable, accurate model agnostic explainer based on Shapley values, which gives you an idea how each variable– or in this case how

each pixel– contributes to the final decision made by the model. OLIVER SCHABENBERGER: And without those performance improvements, without that scalability, we would not actually be able to automate that virtual assistant that you’re showing us now. XIN HUNT: Right. OLIVER SCHABENBERGER: So the results are back. What do we see here? XIN HUNT: Let’s take a look. The report says, hey, I found two lesions here with high probabilities. So I labeled them directly in green on the left. There’s one more area on the top of the image labeled in orange, because I’m not super-sure about it. The red pixels in the explanations in that area show why the model thought there could be a lesion. OLIVER SCHABENBERGER: And the text, where does that come from? XIN HUNT: That is from the natural language generation tool I was talking about. It can be changed to fit any type of application we’re running. OLIVER SCHABENBERGER: So we want to reduce the workload of the clinicians by doing an initial pass with the model. But why does the clinician need to know what the model thinks? 
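To make the idea of per-input contributions concrete, here is a minimal sketch of exact Shapley values in Python. It is not HyperShap, whose details are not given in the talk; it is the textbook brute-force definition, which enumerates every subset of inputs and is therefore practical only for a handful of inputs. The toy model, the three "pixels", and the all-zero baseline are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each input of `model` at point x.

    Inputs missing from a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is only usable for very small n.
    """
    n = len(x)

    def value(present):
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                phi += weight * (value(present | {i}) - value(present))
        contributions.append(phi)
    return contributions


def toy_model(pixels):
    """Hypothetical score: one strong pixel plus an interaction of two others."""
    return 0.5 * pixels[0] + 0.25 * pixels[1] * pixels[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(toy_model, x, baseline)
print(phis)
```

The contributions add up to the difference between the model output at `x` and at the baseline, which is the property that lets a report attribute a detection score to individual pixels.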
XIN HUNT: There are a few reasons First of all, you see that the marginal cases we are passing on to the clinicians tend to have low contrast, and it’s hard for really anyone to see So if we can highlight here, where’s the red pixels, and show where the area really is, the clinicians can make a decision faster and more reliable It also– OLIVER SCHABENBERGER: Yeah, go ahead XIN HUNT: It also works as a feedback loop, where the explanations– if the model makes a mistake, the clinician can send the explanations back to the person who built the model, and it can potentially be used to figure out what went wrong and to further improve that model OLIVER SCHABENBERGER: That’s a very important point When we talk about augmentation, it’s not just the machine augmenting us It’s also us augmenting the machine It’s really augmenting both ways XIN HUNT: Yes OLIVER SCHABENBERGER: That’s exciting That’s wonderful So we have a model that now makes itself interpretable The computer vision model explains its eyes, both visually and in natural language Let’s shift gears a little bit And I’m going to take on a different persona now I’m in charge of college admissions at a university or in a county or state And I’m thinking about using machine learning– machine learning to gauge maybe a student’s propensity or aptitude for college And I’ve heard there’s some really cool machine learning stuff out there in AI And so I asked the data science team to come up with a model, which they did And they handed it to me They said, it’s a gradient boosting thingamajiggy It’s really, really cool I don’t know what that means So should I not deploy this model for real? Should I use this to score students and use this in college admissions? 
Xin, you’re my ethicist Thank god you’re here Tell me what I do with this model XIN HUNT: Sure So let’s first load the model and take a look So now we have the model The first thing we will want to see is what is in there, what variables are contributing to the decision process of that model So what we do is we run partial dependence to analyze all the potential variables that possibly would be used in the data set and take a look at their contribution OLIVER SCHABENBERGER: All right We’ve got a graph back What does that tell us? XIN HUNT: We see in the data set there are five relevant variables, including SAT score, the highest math class the student took in school, GPA, extracurricular activities, and high school ranking The analysis found that out of the five variables, four of them have significant contribution to the decision making process And this one variable, the high school rank variable, does not affect the model very much So it’s probably not being used by the model OLIVER SCHABENBERGER: Oh, and I see you used natural language generation to help me actually understand what that graph says That’s great So that makes sense to me I see the probability for college admission depends on your SAT score, goes up with an increasing SAT score That makes sense I feel more comfortable now about this model But I still don’t quite know how it works What would happen if I applied this model to the students? XIN HUNT: So one thing we will want to see is if the model is fair and unbiased, especially towards different groups of people So here we have, say, two counties, and we want to make sure that the model is behaving fairly to the students from them So what we run here– OLIVER SCHABENBERGER: So we have sort of an expectation how the model should behave And now we’re comparing the reality against the expectation XIN HUNT: Yes Here I ran two things On the left is the ICE plot, Individual Conditional

Expectation On the right is the partial dependence plot by county So on the left, each line is an individual, how their probability of admission would change if you changed their SAT score while holding everything else constant On the right-hand side is the group average So what we see here, there is actually a small discrepancy between the two groups OLIVER SCHABENBERGER: Well, I don’t know– what was that, Individual Conditional– I don’t know what that means, Individual Conditional Expectation But I can look at the plot on the right, and I’m not comfortable So if students have the same SAT score– say, 1,000– then if they live in County B or go to school in County B, they are less likely to get into college compared to a student in County A XIN HUNT: Yes, that’s what the explanation for the model is saying to us OLIVER SCHABENBERGER: I would not have expected that We’ll provide the same resources, we have the same quality teachers in the counties What could explain that difference? XIN HUNT: Well, since our models are trained on the data, usually we want to find out what was causing it from the data So the first step is to take a look at our data and see what’s different between those two groups in the data, and that will give us an idea of why the model predicts different probabilities for them Here I plot– on the left is the mean difference between the two counties, using County B as a baseline We have four dots, four different variables And we see, out of the four variables used by the model, three of them are pretty similar Their difference is close to zero And only one variable stands out It’s the highest math level County A students tend to have a higher math level than County B students OLIVER SCHABENBERGER: Oh, OK I see what’s driving this If you take higher math classes, then this is a contributing factor to increasing the probability that you get into college XIN HUNT: Right OLIVER SCHABENBERGER: But I would not have expected that, because I thought that the math 
levels we’re offering in the counties are similar XIN HUNT: Well, there are two possibilities One is the two counties are actually offering different educational programs In that case, you would want to change the model to include that county information so we don’t penalize students from County B by just being in a different county On the other hand, if the assumption– or we know that two counties are offering similar classes, students are taking them but we are seeing a difference in the data, then that means we could be collecting data that’s not representative of the student population OLIVER SCHABENBERGER: So now we’re starting to talk about the root cause of a model deviating from our expectation It could be the model is wrong where the model needs to be corrected, or the input data does not represent what we really had in mind And then should we correct the model, or should we correct the data? XIN HUNT: It depends on the assumption Here we assume that the data is bad because we assume the two counties are actually offering similar classes Students take them similarly too So we are seeing the distribution difference in the student taking classes Then we want to either recollect the data, or if that’s not feasible we balance the data OLIVER SCHABENBERGER: I don’t have funds to go out and collect data on all the students in all the counties now But I see that this is unexpected Distribution of the students in the highest math level should be the same Can we just focus on those students and add more samples for that? XIN HUNT: Yes, we can do that We can resample the data to increase the percentage of County B students with high math classes, so that the distribution between two counties are similar in the end OLIVER SCHABENBERGER: And we would have to retrain the model, then? 
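The ICE and per-county partial dependence comparison described in this exchange can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `predict_admission` is a hypothetical logistic stand-in for the gradient boosting model from the demo, and the four students, the coefficients, and the SAT grid are invented for the example.

```python
import math

# Hypothetical stand-in for the admissions model (the demo used gradient
# boosting). It scores a student from SAT and highest math level only.
def predict_admission(student):
    z = 0.004 * (student["sat"] - 1000) + 0.8 * (student["math_level"] - 3)
    return 1.0 / (1.0 + math.exp(-z))

def ice_curve(model, student, feature, grid):
    """One student's predictions as `feature` sweeps the grid, all else fixed."""
    return [model({**student, feature: value}) for value in grid]

def partial_dependence(model, students, feature, grid):
    """Average of the ICE curves: one mean prediction per grid value."""
    curves = [ice_curve(model, s, feature, grid) for s in students]
    return [sum(column) / len(column) for column in zip(*curves)]

# Invented records; County A students took higher math classes.
students = [
    {"county": "A", "sat": 1100, "math_level": 4},
    {"county": "A", "sat": 950, "math_level": 4},
    {"county": "B", "sat": 1100, "math_level": 3},
    {"county": "B", "sat": 950, "math_level": 2},
]
sat_grid = [800, 1000, 1200, 1400]

# Partial dependence on SAT, computed separately for each county.
pd_by_county = {
    county: partial_dependence(
        predict_admission,
        [s for s in students if s["county"] == county],
        "sat",
        sat_grid,
    )
    for county in ("A", "B")
}

# Gap between the county averages at the same SAT score of 1000.
gap_at_1000 = pd_by_county["A"][1] - pd_by_county["B"][1]
```

In this toy setup the gap at an SAT score of 1000 comes entirely from the difference in math level between the two groups, which mirrors the diagnosis in the conversation: the curves differ by county even though county is not an input to the model.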
XIN HUNT: Yes, we will have to retrain the model And we do that and plot out the partial dependence and ICE plot again On the left is the original plots we saw earlier And on the right is after the data balancing, the two counties’ differences are now very small And basically they’re not statistically significant OLIVER SCHABENBERGER: Xin, thank you very much for joining us and for demoing this morning XIN HUNT: Thank you OLIVER SCHABENBERGER: It was wonderful [APPLAUSE] Thank you Well, should we correct the model, or should we change the data? We just showed you how to identify and correct potential bias in a model I think there’s a very important message here– that this is not a task that’s left to the data scientist alone It requires agreements on policy, regulations, and a clear definition of what success looks like, as well as an understanding of the data we expect, what it should be representative of on the data that we have This is really a conversation for all of us Ladies and gentlemen, this segment you saw the following–

a complex computer vision model that makes itself interpretable, a patent pending enhancement to the popular Shapley method that makes that interpretability scalable, and how to examine and correct data in a model for possible bias Putting analytics into action invariably requires automation of data flows, data processing, and decisioning We are dealing with increasingly voluminous data, and automation allows us to scale data prep and data processing We are dealing with increasingly varied data, unstructured data from logs, transcripts, and voice recordings Automating natural language processing ensures that these data are not left behind And we are dealing with increasingly complex models Finding the best model and its best parameters and hyperparameters is really facilitated through automation And maybe most importantly, we are democratizing analytics, and allowing and enabling everyone to consume and to produce analytics The business analyst, the field engineer, police officers at headquarters and on the street should be able to produce and consume right-time insights Last night, during the opening session, we introduced you to New Hanover County in North Carolina, home of the city of Wilmington and ground zero for the opioid epidemic The extent of this epidemic comes into focus when you think about this statistic– 12% of the population of New Hanover County, one in eight, are abusing opioids This has huge impact on children With SAS Visual Investigator on Viya, the Department of Social Services can bring together disparate data sources from law enforcement, case management, 911 calls, and generate in near real time rule-based alerts when a child’s risk level has increased Now, let’s kick this up a notch What if– what if we could use the historical data to develop a machine learning model to predict a risk score for every child? 
And that score can accompany the alert and help the social worker prioritize visits and follow-ups How then could we automate the modeling and deployment steps and derive a model that we feel good about, a model that we trust? Here to put analytics into action are Susan Haller, Director of Advanced Analytics at R&D, and Dragos Coles, Senior Machine Developer at SAS– Machine Learning Developer [VIDEO PLAYBACK] – I have been at SAS for 20 years, over 20 years So I’ve spent half of my life here And what I find exciting is that every day I come through the door I’m happy to be here, and I’m excited about the new challenges that are presented to me, working with my colleagues to come up with creative ways to kind of solve those problems – I mean, work is one thing, right? Work is important It’s important you like to work But it’s probably just as important that you like the people that you work with – We have created a new product that allows you to build dynamic and automated machine learning models – If you want to do machine learning, a data scientist would go through multiple steps to be able to model and build that final model, right? 
We’re taking all that work and we’re hiding it behind one click – This particular project has been super exciting to me since day one If you think about it, we’re taking analytics and we’re making them accessible to everyone – You know, we talk to a lot of customers who, when you mention machine learning, they’re interested in it They’ve heard the terminology But they’re afraid of it, so they don’t know how to get started This is going to be an enabling technology for those users It’s rewarding when you work on something that will be a real application that somebody can use So I’m not talking about things that are just cool because they sound cool, but things that are cool because they can have an impact – At the end of the day, I hope that the work that I’m doing helps our customers do their job better and more efficiently, so make them more productive, enable them to answer more complex business problems, allow them to look in their data and find information that may help them make a difference [END PLAYBACK] [APPLAUSE] OLIVER SCHABENBERGER: Susan and Dragos, thank you for joining us today Before we start out, I want to point out that the technology you’re about to see is not yet in use by New Hanover County We are showing technology that will soon be available from SAS OK– Susan, your role is now the senior data scientist, and you’re guiding Dragos, a business analyst at the Department of Social Services

Dragos, you are about to be augmented by artificial intelligence and machine learning Good luck DRAGOS COLES: I’m excited SUSAN HALLER: Thank you, Oliver As you’ve just heard, we have been tasked with building a machine learning model to generate and assign a safety risk score to each of the kids who are being followed by the Department of Social Services As you can imagine, lots of people are interested in the field of machine learning, but not everybody knows how to get started in building such a model With that in mind, our team of data scientists has built a very simple and custom web application that the business analyst in the department, such as my colleague Dragos, can use to get started building a dynamic and automated model So we’re going to spend just a few minutes with you this morning walking through building that model using this custom application while at the same time walking you through each of the steps that we’re executing underneath the covers So Dragos, let’s get started DRAGOS COLES: OK So what do you want me to do, just fill in these parameters? SUSAN HALLER: That’s it DRAGOS COLES: That’s simple enough So assign a project name, select a data source SUSAN HALLER: So here, the data science team has gone ahead and identified a handful of tables that could be useful in this model exercise Considering what we’ve been asked to build, let’s go ahead and select the child safety data DRAGOS COLES: OK And what’s our goal here? 
SUSAN HALLER: Now that we have our data, we’re presented with a list of variables in that data And by goal, we’re simply asking for you to identify the variable that represents the goal or the outcome that we’re trying to project in this model DRAGOS COLES: OK So in this case, we’re going with their safety risk flag SUSAN HALLER: That’s right That’s it You have now provided all of the required information that I need for you to go ahead and start building a model All that’s left is for Dragos to click that Build Model button Behind that button is a very powerful tool coming from SAS that offers an API for dynamic automated model building DRAGOS COLES: OK So, I mean, this sounds really simple, but what is an API? SUSAN HALLER: Ah API– anyone can build their own custom application as we’ve seen here based on their business problem, while at the same time embedding and leveraging SAS’ machine learning capabilities DRAGOS COLES: Maybe that’s too easy I’ll just run it SUSAN HALLER: Let’s run it DRAGOS COLES: OK, so right now machine learning is running behind the scenes Does that include any data preparation steps? SUSAN HALLER: Of course Imagine, if you will, that this API is simply emulating what I as a data scientist would do if I had been tasked with building this model by hand So first I’m going to explore my data Are there any issues that I need to resolve? Second, I’m going to iterate through different data preparation techniques– transformations, imputation, things such as that And finally, I’m even going to automate the building of features for you DRAGOS COLES: OK As a data scientist, though, you have to consider different type of models when you want to build the best model, right? What’s available here? 
SUSAN HALLER: So the API is obviously going to consider a variety of different models, finding the best model type for your data It’s going to look at things like gradient boosting models, neural networks, random forests, to name a few DRAGOS COLES: OK, sounds good But one thing that I heard about data scientists working on projects like this is they go through this iterative process of data preparation, some feature engineering, and then more modeling Is that iterative process running behind the scenes? SUSAN HALLER: This is where the intelligent part of the automation comes into play So at each step along the way, the API is going to continually reassess It’s going to add steps to the model It’s going to remove things that are no longer necessary It may go back and revisit existing steps and make modifications to them And when the API is happy with the data preparation and the model that it’s built, it goes one step further and creates an ensemble model, trying to improve our overall model accuracy DRAGOS COLES: Wow, Susan I mean, it really sounds like what we have here is a data scientist behind the click, right? It’s kind of you behind a button SUSAN HALLER: I guess you can say that And in just a few short minutes, you can see here that as we walk through each of the steps that we’re running behind that API, Dragos has gone ahead and created a model that helps us predict that safety risk score DRAGOS COLES: OK Now, we got all this output from the API Since I’m new to this, let me see if I can understand what’s happening here If we look at the project summary on the top left side, it seems like we’re getting a summary of the project, but it seems a little bit like this text might be dynamic So it was telling us that our model is based on the KS statistic on the Test partition We have an accuracy rate of about 90%
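For readers unfamiliar with the two numbers in the project summary, here is a minimal sketch of how a KS statistic and an accuracy rate can be computed on a held-out test partition. It uses the common two-sample definition of KS for a binary target (the largest gap between the score distributions of events and non-events); whether the SAS API computes it in exactly this way is an assumption, and the scores and labels below are invented.

```python
def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of events (1) and non-events (0)."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    def cdf(values, threshold):
        # Fraction of values less than or equal to the threshold.
        return sum(v <= threshold for v in values) / len(values)

    # The gap can only change at observed scores, so those are the only thresholds to check.
    return max(abs(cdf(events, t) - cdf(non_events, t)) for t in set(scores))

def accuracy(scores, labels, cutoff=0.5):
    """Share of cases classified correctly when scores >= cutoff are flagged."""
    predictions = [1 if s >= cutoff else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Invented test-partition scores and true safety risk flags.
test_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
test_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

ks = ks_statistic(test_scores, test_labels)
acc = accuracy(test_scores, test_labels)
```

A KS value near 1 means the model's scores separate the flagged and unflagged cases almost completely, while a value near 0 means the two score distributions overlap. Unlike accuracy, it does not depend on choosing a cutoff.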

SUSAN HALLER: I’m glad you noticed that Worth mentioning, included in this automation process is natural language generation, where we’re dynamically building this text for you based on your model and your data DRAGOS COLES: OK If I look over to the right side, I see that our best model is a gradient boosting model with 10% misclassification On the bottom left, the most important variable plot seems that this is listing our predictive attributes, sorted by relative importance And looking at these attributes, I can understand some of them, because I know the data So we have school reports in the last 60 days We have the parental attachment score I can intuitively understand where these prefixes are coming from, like impute or transform This PC1 and PC3, I’m pretty sure those variables are not in the original data You know, I really wish I could see what happened behind the scenes so I can understand where these things are coming from SUSAN HALLER: You are in luck So if you will, go ahead and select that Open Pipeline link at the top of your application Now, when Dragos executed the API to build his dynamic model, he also created a new project in a SAS product called Visual Data Mining and Machine Learning Visual Data Mining and Machine Learning provides a very nice visual representation and editable representation of each of the steps of the model that was created for us DRAGOS COLES: OK So you’re saying that the process is transparent and now this project is editable? 
SUSAN HALLER: That’s exactly right And remember, dynamic as well– so data specific Had Dragos selected a different data source or even a different goal for that matter, this pipeline could look vastly different DRAGOS COLES: OK, let’s go through this pipeline a little bit It looks like the orange nodes are data pre-processing nodes So we see we have some transformations, we have Variable Selection, Imputation, Feature Extraction here I mean, this is fairly intuitive, just understanding the process SUSAN HALLER: And it’s these exact data preparation steps that resulted in those variables that Dragos inquired about just a minute ago in his variable importance listing The feature extraction node, for example, is running a principal component analysis And that principal component analysis is creating some new features for us that were labeled PC1, PC2, and PC3, and we found those as significant in our model DRAGOS COLES: OK Looking further down, we have our modeling nodes It looks like the green ones are the modeling nodes You mentioned that the project is editable, right? So if I select a node, now I get a property panel over there on the right side I can edit those properties? SUSAN HALLER: That’s exactly right So here we’re looking at the properties associated with the Gradient Boosting model But every node in our pipeline has a similar property listing Not only do you see the properties themselves, but you also see the optimal value for each property that was selected by the automation process So I, as a data scientist, if I wanted to come in here and start changing things, see if I could make some modifications, could easily do so For example, I might want to see if I could reduce the complexity of my gradient boosting model while at the same time retaining the same accuracy The optimization process selected 75 trees for the gradient boosting model Dragos, why don’t you go ahead and change it to 50? 
You see we can easily do this, he can rerun the node, and update the model DRAGOS COLES: So what if I want to add a new node in the project? Can I do that? SUSAN HALLER: Of course So just like you can edit the properties to update your model, you can also insert new steps And you can do that by dragging nodes from the tools palette that he has expanded here into any step within your pipeline So it’s a very editable process, also very flexible If you notice, there are two nodes listed on the palette that allow you to inject your own custom code That custom code can be SAS-based code, obviously, or it can be open source, if you want to include R or Python into your model DRAGOS COLES: OK I mean, we have a project that gave us a good model we’re happy with, right? So how are we going to give this model and put it in the hands of the consumer so they can start making a more informed decision? SUSAN HALLER: Excellent question Obviously we all know that building the model is only the first step in the process It’s just as important that we’re able to deploy this model and get the model into the hands of those who want to consume it So at this point, Dragos, let’s go ahead and leave the SAS Visual Data Mining and Machine Learning product and go back into your custom application You see a Deploy Model button embedded in this application Why don’t you go ahead and click that? DRAGOS COLES: OK Is this another one of those APIs you were talking about? SUSAN HALLER: Of course Just like we had a button that allowed us access to an API

for dynamic and automated model building, we have embedded a similar button here that surfaces another SAS API for one click model deployment DRAGOS COLES: I mean, Susan, I’m really excited about this In about 10 minutes, you showed me how to leverage machine learning behind the scenes with the click of a button I can open that project that gets created behind the scenes I can use it as a learning tool or as a prototyping tool, and then we deployed a model also fairly easily I feel really enabled Thank you SUSAN HALLER: I’m happy you’re excited about the API More importantly, that something like this will enable and empower Dragos and other data analysts in the department to continue building models such as this in the future And if you consider our specific use case, imagine now that when an agent in the field gets an alert that a child needs a follow-up visit, that alert is now augmented with a model-based risk score indicative of their safety DRAGOS COLES: Wow Awesome OLIVER SCHABENBERGER: (SINGING) Happy birthday Happy birthday to you Happy birthday, Dear Susan Happy birthday to you SUSAN HALLER: Thank you OLIVER SCHABENBERGER: Well done And happy birthday SUSAN HALLER: Thank you OLIVER SCHABENBERGER: Dragos That was amazing And DCSH County is quite advanced in its use of machine learning Of course, it’s a fictitious county named after Dragos Coles and Susan Haller, but there’s nothing fictitious about the application or the demo Susan and Dragos, thank you very much Ladies and gentlemen, you just experienced the following– automating the iterative construction of a complex machine learning model in 10 minutes by simply calling one API; transparency of the resulting model– you can examine, you can understand, you can modify; and deploying a final model just as easily by simply calling one API Digital transformation and analytics are not science projects While pilot projects and POCs are important to prove feasibility and ROI, the goal is to impact the 
organization positively by increasing revenue, lowering costs, raising safety, maybe by launching a new business And there are many barriers to success in data-driven initiatives, chief among them lack of talent, lack of data of the right quality and quantity, difficulty operationalizing analytics, taking it from the science project to operational excellence Susan and Dragos showed us how SAS helps overcome these barriers Automation of the model building process, automation of the model selection process through challenging existing models, automation of the data preparation and feature engineering steps, abstraction of steps that previously required deep programming expertise and deep analytic expertise, choosing your desired level of automation from an open API to a visual interface to programming interfaces We call this intelligent automation It is data led, dynamic, transparent, and you can look under the hood any time Automation does not mean to look away Automation does not mean you cannot intervene It is not the same as autonomy Analytics is not a science project, and it is not the domain of only statisticians and data scientists– not anymore Everyone can contribute, everyone can consume, everyone can produce We’ve just now developed and deployed predictive analytics For each case and child, we can predict a risk score Why have we not yet fully operationalized the model? How do we put it in the hands of the users? 
Please meet our next contestant, Sebastian Charrot, Senior Manager in our Scottish R&D team [VIDEO PLAYBACK] – I recently became a dad, so I don’t have much spare time But when I do I like to do a bit of art and drawing My dad was a cartoonist for a number of French newspapers, so as soon as I could hold a pen, I was trying to imitate him And there’s something quite satisfying about the emotion you get when you’re really deep in drawing It’s quite similar to the flow that you get when you’re solving a programming problem If I think back to when I first began the world of work

after graduating, I still remember the sense of deep satisfaction of knowing that I was working on a real product, solving real problems for real people Once you get a taste for that, it’s hard to give up So the bigger police forces currently raise around a million intelligence reports a year That’s a million trips back to the office to raise the information that they’ve gathered in the field That’s a lot of waste of time and effort and manpower Having Mobile Investigator means that you’re no longer desk bound to access the information or the capabilities that you need to do your job It means maximizing the time that you have in the field and allowing you to access all those rich and powerful capabilities on the go And it marks the first time that we’ll be surfacing the operational and investigative powers of Viya to users in the field So it’s a big step So we release a lot of software at SAS And it’s easy to fall into the mindset of thinking about your work in terms of the releases that you ship or the bugs you fix or the features that you implement But in reality, we’re not in the business of delivering features We’re in the business of solving problems for our customers I’m very fortunate to be in a position where I think I know the challenges that our customers face and actually have the power to do something about it I work with some of the most wickedly smart, terrifyingly capable, generous, and creative people, and it’s a real joy to be able to build great things with them [END PLAYBACK] [APPLAUSE] OLIVER SCHABENBERGER: Seb, welcome to the stage SEBASTIAN CHARROT: Thank you much OLIVER SCHABENBERGER: Seb, what are the applications of the model Dragos has just shown us? In the lab, we use machine learning to detect and flag children who are potentially at high risk Now that we have the data, what do we do with it? How do we make use of it in the field, put it in the hands of those who need it? 
SEBASTIAN CHARROT: Well, SAS has a powerful suite of tools which allow our users to triage alerts, manage their intelligence, and then coordinate any investigations that need to follow from those Until recently, however, access to those capabilities was limited to users sitting at their desks in the office or the station, which is why I’m really proud to announce that we recently launched SAS Mobile Investigator, a mobile application which surfaces the operational and investigative powers of SAS Viya to users in the field So if we pick up where Susan and Dragos left off and continue our scenario, let’s say that I’m a police officer working in the Child Protection Unit So it’s my job to liaise with social workers, visit certain at risk children, assess the situations, and then determine any necessary course of action that we need to take And how do I know who to visit? Well, using Susan and Dragos’ model, we can generate a number of tasks to visit the highest risk children and assign those tasks to myself and other officers in the field So why don’t we just jump in and see how it plays? OK, so on my home screen here, you see at the bottom right hand, I have Mobile Investigator installed So we’ll launch the app, and we’ll sign into the system Now, the first screen you’ll see here will be the Mobile Investigator homepage It’s your one-stop shop for all functionality in the system And on the banner, you’ll see I have a number of notifications Now, clicking this will take me to my prioritized task view So that’s a view of every task that’s been assigned to me in the system OLIVER SCHABENBERGER: So the model that Dragos and Susan developed is already running? It’s prioritized your tasks based on the risk score? 
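A prioritized task view of this kind amounts to filtering tasks by assignee and ordering them by the model's risk score. The sketch below is illustrative only: the `VisitTask` record, its field names, and the officer identifiers are hypothetical and are not taken from SAS Mobile Investigator.

```python
from dataclasses import dataclass

@dataclass
class VisitTask:
    # Hypothetical task record; field names are assumptions for illustration.
    child: str
    risk_score: float  # model output, higher means higher risk
    assigned_to: str

def prioritized_tasks(tasks, officer):
    """Tasks assigned to `officer`, highest risk score first."""
    mine = [t for t in tasks if t.assigned_to == officer]
    return sorted(mine, key=lambda t: t.risk_score, reverse=True)

# Invented tasks and scores.
tasks = [
    VisitTask("Child 1", 0.42, "officer_1"),
    VisitTask("Jack", 0.91, "officer_1"),
    VisitTask("Child 2", 0.77, "officer_2"),
    VisitTask("Child 3", 0.65, "officer_1"),
]

queue = prioritized_tasks(tasks, "officer_1")
```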
SEBASTIAN CHARROT: Absolutely So the highest risk is at the top So it looks like Jack is indeed the highest risk child on my list So we’ll click into his records and have a look So we have an address We even have it plotted on the map So how about we just go visit Jack? Now, there’s a button underneath my map here to navigate If I click that, it’ll take Jack’s address and then launch my external map app and show me a route together Now, that’s the first of many examples of Mobile Investigator tapping into native capabilities to streamline things for its users OLIVER SCHABENBERGER: Let’s pause on this for a second, just so we can appreciate this The only other way I could have previously accessed all this information is turn the car around, go back to the station, go to the desk, do some research, and then head back out again SEBASTIAN CHARROT: Absolutely OLIVER SCHABENBERGER: A lot of waste of time and effort, now eliminated by just placing that information right into the hands of the police officer or the child safety person SEBASTIAN CHARROT: Exactly Now, let’s say we’re heading there We jump into the car, and my partner is driving using those directions And while we’re en route, I want to do a bit of research to see what else we know about Jack So I can take a look here There’s some basic details, everything you’d expect He’s a nine-year-old boy I can see his family details, so I know that Pete and Jane are his parents And crucially, I can see the risk factors that have come to play to determine Jack’s high risk score So these are all things that we should maybe be looking at which explain our level of concern And I could really drill into those

and apprise myself of them if I wanted Now, as well as all this core information, I can also see any documents that have been uploaded and associated with this file So I can see a couple of prior social care visit reports, and we even have a photograph of Jack OLIVER SCHABENBERGER: Recognize Jack? That was this COO/CTO a few years ago SEBASTIAN CHARROT: So additionally, I can see that Jack’s file here forms part of a much larger network of information in our system So he’s actually related to other reports in our data And a couple of things jump out immediately Firstly, I see that Pete Marsh, his dad, is suspected to be in possession of an unlicensed firearm Now, that’s an officer safety concern And it’s going to change my approach to how I carry out my task OLIVER SCHABENBERGER: So this information you’re receiving in the field is now affecting, shaping how you approach the task SEBASTIAN CHARROT: Absolutely So I may choose to not go into the premises Or I may choose to bring backup But regardless, I’m aware and informed So having this information in the field ahead of time can save officer lives Now, secondly, I can see that Jane, his mom, was arrested only yesterday on a DUI Now, that’s timely and relevant information which I need to have access to and which is going to shape my overall evaluation of Jack’s situation And lastly, I see that Jack has been involved in a number of school incidents in the recent past which maybe I want to discuss with him and his family when I sit down OLIVER SCHABENBERGER: So you visit Jack You conduct an interview and an assessment While the information is fresh in your mind, what do you do? How can you record your findings? 
SEBASTIAN CHARROT: Yep There’s one last piece of research I want to do before we do that, and that’s a neighborhood search So we know how crucial the quality of a neighborhood is to the welfare of a child So what I want to do is click this top button to launch my neighborhood search It’ll take my current location It’ll search my immediate vicinity for any relevant intelligence or investigations or incidents that could be of interest to me So we’ll kick off that search And actually, when the results come back, I see there’s a fair amount of drug-related activity in the neighborhood So that’s also something that’s going to factor into my overall assessment So as you see, now it’s time to raise a new visit report So I’ll click a button to do that, and I can start filling this in to my heart’s content Jack was fine I’m always amazed when that works with a Scottish accent OLIVER SCHABENBERGER: Technology is amazing SEBASTIAN CHARROT: It’s amazing Now, I can really start fleshing this out with all the information I’ve gathered during the course of my visit And what you’ll notice is that Mobile Investigator is also capturing and adding in its own information to augment my report So the visit date has been set automatically I’ve been set as the reporter, as well as details of how to contact me– that’s not my real number– as well as the county that I was in– in fact, the exact location that I was in when I raised that report OLIVER SCHABENBERGER: Yep It’s about automating the obvious, time-consuming, and possibly the error-prone task Why spend time on that? 
SEBASTIAN CHARROT: Yeah And it provides crucial context for the report that I’m raising Now, if anything else comes to my attention, I could always just take a photo of it and upload that So I’ll take a photo of this terrific audience But that could just as easily be a picture of the neighborhood or drug paraphernalia or really anything that I deem to be of relevance So now all that information is in the system OLIVER SCHABENBERGER: Maybe it’s obvious, but I want to point out just how powerful this is The information in your report is now available using SAS Visual Investigator to everyone who has access to SAS VI at the station or through Mobile Investigator in the field Systems are updated in real time, not through an overnight batch job And with that new information available, we can do additional reporting We could even kick off that modeling pipeline Dragos and Susan developed a moment ago Because the data collected through our site visit might contain important information and insights that might change how we do the risk score calculations SEBASTIAN CHARROT: Exactly Now, imagine a world without Mobile Investigator I would have had to go back to the station to raise that report, and it might be one of a dozen that I have to raise every day Having this app means that I can make that information available to everyone as soon as we know it and not as soon as traffic or bureaucracy allows OLIVER SCHABENBERGER: Indeed For the first time we have placed the power of SAS Viya into the hands of operational users, allowing them access to data and analytics wherever they are The users can spend more time in the field, are better informed, better equipped, and can do their job more effectively Seb, great work Thank you very much SEBASTIAN CHARROT: Yes OLIVER SCHABENBERGER: I don’t want to imagine a world without Mobile Investigator [APPLAUSE] Ladies and gentlemen, you just experienced the following– a highly flexible application that can be customized for almost any use 
case; real-time interaction between back end and front line, analytics on the go; the blending of systems of records, systems of engagement, and systems of intelligence

You heard that term throughout the morning, the model We are building a model, testing a model, deploying a model Models are at the heart of analytics, at the heart of data science But they are no longer just narrowly defined statistical models, like a finite mixture or proportional hazard models Today, models are complex pipelines of data transformations, data reductions, with internal tournaments and ensembles of approaches The input is data, the output can be a report, a prediction, a recommendation, a classification, and so on How many models do you have in your organization? Susan and Dragos built one for us Do you have two, three, 400, 2,000? When you work with models, some of the major challenges are knowing whether they are still valid, how can I track their version, their vintage? Is this model superior to one developed in a different language with different libraries? How do I move the model from the sandbox into production? How do I deploy the model in a data stream in Hadoop inside a database or capture its end point with an API– I have models in SAS, R, and Python How do I manage them all? 
Model management is a key ingredient in making analytics real, in making analytics stick in operation With SAS Model Manager, you can control the versioning of models, compare them, test them, and publish them You can monitor their performance over time, challenge them, retrain them, and update them You can integrate open source models in your data science pipeline and govern them alongside SAS models Please look for presentations on Visual Data Mining and Machine Learning and Model Manager in the Quad, super demos, and in paper sessions This is the technology journey we took you on today The theme of the tech connection this morning was “Analytics in Action.” We use SAS technology to tackle problems in health care, child safety, fraud, and security intelligence Problems that can only be solved through data and analytic automation exist in many, many fields Here to discuss a domain that is near and dear to all of our hearts, the health of our planet, is John Gibson, Chairman for Energy Technology at Tudor, Pickering, Holt, and Company Welcome, John [APPLAUSE] JOHN GIBSON: How you doing, Oliver? OLIVER SCHABENBERGER: Hello, my friend JOHN GIBSON: Good to see you OLIVER SCHABENBERGER: Come on in John, thank you for being here at Global Forum We’re in Texas, the nerve center of the oil and gas industry And I admire your boots Do you admire my boots? 
You have deep roots in the oil and gas industry, and you’re an absolute expert in that field Share a little bit with the audience your background JOHN GIBSON: Well, Oliver, believe it or not, my first use of SAS was about 1988 at Chevron Research And so I’ve been a user then– don’t ask me to do anything now, though I couldn’t do a demo for you OLIVER SCHABENBERGER: We can automate this We can visualize it Visual program is very easy We’ll get you into Quad in front of lectern JOHN GIBSON: Well, my career after Chevron, I have had the opportunity to run two of the largest software companies in oil and gas So Landmark Graphics, which we sold to Halliburton Then left Halliburton and did Paradigm Geophysical, which is now Emerson E&P. And so I was CEO of both of those organizations and helped build those platforms, which really do a lot of computer vision and others for the subsurface there So I had a lot of work in the software technology area OLIVER SCHABENBERGER: John, probably the most important topic to the oil and gas industry– and I think to the world– is carbon, CO2, greenhouse gases All link quite closely to climate change How much carbon is being created on an annual basis? JOHN GIBSON: Well, on an annual basis, we’re at about 36 gigatons OLIVER SCHABENBERGER: 36– JOHN GIBSON: Gigatons– which you and I were talking about it It’s hard to visualize 36 gigatons So to try to create a mental image, if you’re familiar with Jerry’s World, the AT&T Stadium, if we took all of the air out of it and extracted the CO2, we’d get about 2.2 tons of CO2 So we only need 18 billion or so Jerry’s Worlds in order to extract the amount of CO2 we’re emitting each year above the carbon cycle OLIVER SCHABENBERGER: Billion, with a B JOHN GIBSON: B, billion OLIVER SCHABENBERGER: We always talk about technology and we focus on its urgencies, what it wants, the progress we made, that today’s better than the Stone Age because of technology
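[Editor's note: The stadium comparison above can be checked with a quick division. This is only a sketch using the two figures quoted on stage (36 gigatons per year and about 2.2 tons of CO2 per stadium of air), and it assumes both are stated in the same unit of tons. The variable names are illustrative. The result is on the order of 16 billion stadiums, the same order of magnitude as the speaker's "18 billion or so."]

```python
# Figures as quoted in the talk; assumed to be in the same unit of tons.
ANNUAL_EMISSIONS_TONS = 36e9   # "about 36 gigatons" per year
CO2_PER_STADIUM_TONS = 2.2     # "about 2.2 tons" of CO2 in one stadium of air

# How many stadium volumes of air would hold one year of emissions.
stadiums_needed = ANNUAL_EMISSIONS_TONS / CO2_PER_STADIUM_TONS

print(f"{stadiums_needed / 1e9:.1f} billion stadiums")  # prints "16.4 billion stadiums"
```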

And then there are these side effects, the unintended consequences of technology, like CO2 emissions, like greenhouse gases What is going to happen to the world if we do not address carbon dioxide? JOHN GIBSON: So, you sort of put me on the spot as an oil and gas guy And we’ve been on the spot for the last few years If we don’t address carbon dioxide as a hydrocarbon industry, we can’t sustain the hydrocarbon industry We’ll have to go to a different form of energy We can’t see CO2 levels grow from 410 to 450 without having a plan to begin to address them We now estimate it could be up to 300,000 years for the Earth to restore the carbon level if we just left it alone And so we’re going to have to make positive actions in order to actually reduce the levels as we’re growing them OLIVER SCHABENBERGER: You mentioned this amazing– this huge number, this mind-boggling number of annual output So if you know how much it is, why don’t we do something about it? Isn’t this just an attribution problem, who generates what? JOHN GIBSON: Well, it is I mean, I kind of follow politics on this, and it’s getting to be very political The Green New Deal– I won’t ask everybody to shout out if they’re for it or against it But directionally, that tells you where our country, where the sentiment of our government’s going And so as a result, you’re going to see regulations come We’ve got about 60 bills that are going to be introduced, 30 in the House of Representatives, 30 in the Senate In the absence of a strong EPA, we’re seeing congressional efforts in that So even as we’re speaking now, we’re very close to launching OCO-3, which is our newest carbon emission satellite here in the US And so it got no approval from the White House, but it got approval from Congress, and it will be going up shortly OLIVER SCHABENBERGER: So how does data and analytics play a role in all this? Who collects the data today? What do organizations need to know? 
JOHN GIBSON: Well, the regulation which is coming is really– most people are using greenhouse gas protocol And so on that greenhouse gas protocol, you report in scope one, scope two, scope three– which is what do you use directly, what do you use indirectly– so electricity generated that might come in in scope two– and then scope three would include business travel So if you’re sitting on a United Airlines flight coming here to the conference, what portion of the emissions from that should you be accounting back? Now, as it turns out, one company’s scope one is another company’s scope three And so you can see the hydrocarbon industry has Uber as scope three And then Uber has the hydrocarbon industry in scope one and producing it So just the sheer accounting and reporting of this is going to require some significant analytical models going forward OLIVER SCHABENBERGER: I read a fascinating article about the actual carbon footprint of some of the things we’re doing today There was a carbon footprint about streaming platforms, and we thought the carbon footprint was high when we all used vinyl on turntables But actually it turns out, the carbon footprint might be higher for the streaming music because of all the back end computing and the energy we have to generate to support that JOHN GIBSON: There’s no question there’s unintended consequences We tried to remove carbon, and we increase it We see that in Europe where an intent to be carbon neutral ends up increasing carbon because we end up having to outsource power to coal plants We’ve also seen an elimination of coal in the US, and we’ve seen coal consumption grow by 3% globally So we’ve underestimated the human element, which is that need for cheap energy in order to grow the quality of life in other countries And so consequently we’re doing the right thing here, and we’re getting the wrong outcome And so it’s a very complicated problem OLIVER SCHABENBERGER: Yeah Carbon accounting systems– so it’s rolling up all 
the contribution You mentioned scope one, two, three Where do you see SAS fitting into this urgent need to address carbon? JOHN GIBSON: Well, there’s no question that SAS I think could have a tremendous role And I’m hoping that the end of this session is the beginning of a new journey for SAS in climate accounting, because each company, if you’re one of the chief data scientists here or chief technology officer, you should be thinking about, how do you do scope one, how do you do scope two and scope three, and build a model? And then understand, as you turn those knobs, do you get the desired consequence or an unintended consequence? And how does that risk performance really get coordinated or communicated to a board of directors? You’re at the board level at SAS with these carbon models and how that’s going to create financial risk for organizations I hope next year I’m here and we have somebody actually doing a demo for you that’s really showing how they’ve done their climate model OLIVER SCHABENBERGER: How about you come back and drive that demo for us? JOHN GIBSON: Well, I’m not sure I’m the right guy OLIVER SCHABENBERGER: Something we have not mentioned much today is IOT and connectivity But I see opportunities for when everything is connected,
when devices are talking to each other, just as they report how much electricity they need, maybe they can start reporting without us having to know it their carbon footprint– you know, their scope one, two, three contributions– and we could roll it up JOHN GIBSON: There’s no question– OLIVER SCHABENBERGER: Use technology to address that problem with technology JOHN GIBSON: It has to be that way I mean, it can’t be a system where– if we take a look, there’s a quote on a slide that’ll tell you that KPMG, that 75% of global companies that are producing the majority of the revenue don’t have any statement on climate change In the US, 50% don’t have a statement on climate change Very few are doing greenhouse gas protocol reporting In the absence of data, we get no progress And I think that with SAS and with a data-driven activity associated with climate, we have a future Without it, we have a real problem that’s continuing to accrete if we put more and more CO2 in the air OLIVER SCHABENBERGER: Well, let’s work on the problem to secure the future John, thank you for sharing your insights this much JOHN GIBSON: Thank you so much I appreciate it so much Thank you OLIVER SCHABENBERGER: And John will be here presenting on Tuesday about predicting the unpredictable Technology is unstoppable It’s who we are and what we do– not just at SAS, as a species Technologies are all the inventions of the human mind, not just tools and gadgets– analytics The multidisciplinary effort to derive insight from data is technology And as such, it exhibits the same urgency as all technology It wants to reorganize It wants to become more distributed, abundant, and accessible What we have shown this morning is how these organizing principles manifest themselves, enabling insight and decisioning based on data by those without degrees in data science– jobs made easier, more productive, decisions made more reliably and faster, analytics that follows the data It becomes more distributed It’s supplied 
by the right person at the right place and time Analytics moves from science projects into operations, the hospital, the Department of Social Services, the field engineer, and police officer At SAS, we are on a mission– on a mission to remove barriers to producing and consuming analytics through visual interfaces at parity with programming interfaces, through open source integration, through APIs that make building and deploying models simpler, through automation of analytics, embedding analytics You can see this play out throughout this conference in talks, super-demos, and in the Quad Look for it– analytics in action, hidden in plain sight Enjoy the rest of the conference, and thank you very much [APPLAUSE]
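[Editor's note: The scope one, scope two, scope three roll-up that John Gibson describes in the carbon accounting discussion can be sketched in a few lines. This is a minimal illustration, not anything shown on stage: the record type, the function name, the activities, and all tonnage figures are hypothetical. The scope labels follow the usual reading of the greenhouse gas protocol, where scope one is direct emissions, scope two is purchased energy such as electricity, and scope three is other indirect emissions such as business travel.]

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EmissionRecord:
    """One reported emission line item (hypothetical structure)."""
    activity: str        # free-text description of the source
    scope: int           # 1 = direct, 2 = purchased energy, 3 = other indirect
    tonnes_co2e: float   # made-up illustrative quantity


def roll_up_by_scope(records):
    """Sum emissions per scope, rejecting anything outside scopes 1 to 3."""
    totals = defaultdict(float)
    for record in records:
        if record.scope not in (1, 2, 3):
            raise ValueError(f"unknown scope {record.scope!r} for {record.activity!r}")
        totals[record.scope] += record.tonnes_co2e
    return dict(totals)


# Illustrative data only; none of these numbers come from the talk.
records = [
    EmissionRecord("company vehicles", 1, 120.0),
    EmissionRecord("purchased electricity", 2, 80.0),
    EmissionRecord("business travel by air", 3, 45.5),
    EmissionRecord("on-site boiler", 1, 30.0),
]

totals = roll_up_by_scope(records)
print(totals)
```

[The same record could appear in another company's books under a different scope, which is the point made on stage that one company's scope one is another company's scope three. A real system would need to reconcile those overlaps rather than simply summing across companies.]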