30 minutes | May 4, 2021
Telmo Silva Talks ClicData
Telmo Silva created ClicData, an end-to-end SaaS BI platform, which, as he describes it, is the little guy coming up in the BI platform world. He talks about how his company started, where it's been, and where it's going with cutting-edge R&D. He also offers additional thoughts on the role of data in the business world today.
27 minutes | Apr 16, 2021
Pricing with Cactus Raazi
Keeping quality customers is the aim of nearly every healthy business. Cactus Raazi challenges the typical methods of doing this and suggests alternative data-focused pricing strategies that businesses will need to survive in the future.
26 minutes | Mar 25, 2021
AI Making Developers More Effective
Robin Purohit talks to us about how he and his company are creating AI tools to help developers be more effective. Learn what their approach is, how they're training their models, and where they're headed in the future.
24 minutes | Feb 27, 2021
Overcoming Cultural Hurdles in Tech
As the first Mexican woman to get a PhD in physics from Stanford, Debbie Berebichez has experienced what it takes to challenge norms and boundaries. She speaks about her personal experience overcoming cultural norms regarding women in STEM and her professional experience training companies in data science.

Debbie Berebichez: It's very important to encourage people to be evidence based. To see, okay, if you have a new idea for business, search for the metrics that are going to tell you that that idea may work or why it may not work, but set those parameters beforehand.

Ginette: I'm Ginette,

Curtis: and I'm Curtis,

Ginette: and you are listening to Data Crunch,

Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world.

Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics, training, and consulting company. Up until this month, Debbie Berebichez was Metis's chief data scientist. We talk with her about her journey into STEM and her perspectives on data science.

Debbie: Thank you for inviting me, Curtis. I love your Data Crunch Podcast, and it's an honor for me to be here. I have kind of a unique story in that I was born in Mexico City, and I grew up in a fairly conservative community that discouraged girls from pursuing careers in STEM, and specifically in science and math. And I was a very curious child. I asked a lot of questions, and I wanted to know things about the world and understand about the planets and how things work. Growing up in a community like that, you know, it was cherished, but at a certain point, when it came time to go to college, my teachers in school, as well as the counselors and my friends, including my parents, were like, "No, you probably should pick a more feminine career, something easier." My mother told me, be careful, don't tell boys that you like math 'cause you may end up not being able to get married, which almost happened. But it really was just a thing that I had to hide. And with that came a lot of insecurity, because I just thought, I'm not good enough in math and these topics, and I will never be able to do it. Until the day came to go to college, and my advisors in school had said, "You know what, why don't you study philosophy?" The funny thing is, during high school, when people were doing their crazy rascal things and whatnot, I was actually getting books from the library about obscure physicists, like Tycho Brahe, and reading about how they were locked up in a tower or an observatory, and I thought to myself, "You know, maybe I'll be like them. Maybe I'll be locked up in some tower, alone, and not be a very sociable person. However, I'll have my science and my observations with me, and that made me happy." And so I grew up like that. And so when I had to decide to go to college, someone said, "You know, philosophy also studies these people and what their ideas about life are, and they're very curious." So I said, "Okay, fine." That appeased everyone around me, and I started studying philosophy. Two years in, my hunger to know about the world and the universe was louder than ever, and it was not going to go away.
So I decided to apply behind everyone's back to schools in the US, because I had learned that in the US you can do a double major and study more than one topic, which I couldn't do in Mexico. But I was afraid because my parents couldn't pay for an American university when we were paying an eighth of that in Mexico City for a private university, so I didn't know if I could afford it. In the middle of the application process, I got a beautiful offer from Brandeis University, a small university in Massachusetts: a full scholarship that was offered to two international students per year to attend Brandeis. And I was so incredibly lucky and happy. I flew to Massachusetts. I had never seen the snow before; it was in the middle of the winter, 'cause I was a transfer student from Mexico. And here I am. I enrolled in my philosophy courses, and they didn't really know what to do with me, because I had already studied so much in Mexico, given that that was the only topic we were allowed to study. And so, my first semester, I had the courage to take a very generic course in astronomy, Astronomy 101. And I met a graduate student who was the assistant for the class. His name is Rupesh Ojah, and he came from India, and Rupesh and I became very good friends. We would walk around campus, and I would ask him all kinds of questions about the universe and planetary motion and the laws of physics. And he was the first person to really believe in me. And he said, "You know what, you're not the typical student that just has the thirst to get an A on the homework. You really care about this. You have so much passion." So one day we were walking in Harvard Square, and I told him, "Rupesh, I just don't want to die without trying. I don't want to die without trying to do physics." So he got up, and we called his advisor, who was the head of the physics department at Brandeis. We had a meeting, and he basically handed me a book on calculus in three dimensions called Div, Grad, and Curl, which was an alien language to me. And I had a problem: my scholarship was only for two years, and that's what I had left, just those two years. So fitting in a whole physics major when I was not confident and had very little math background was going to be a big challenge. And Rupesh and his advisor said, "Look, there's someone else who has done this in the past: Edward Witten. He's the father of string theory." I thought they were pulling my leg by comparing me to him. And they said, "Well, we'll let you skip through the first two years of the physics major if you're able to cram all of these topics for an exam in two months, at the end of the summer." So Rupesh decided to devote his entire summer to tutoring me and mentoring me. And it was incredible. And the reason why I'm sharing this story, Curtis, is because I always wanted to pay Rupesh for all that he did for me. And he said to me that when he was growing up in Darjeeling, in the mountains in India, there was this old man who used to climb up and teach him and his sisters math, English, and the tabla, a musical instrument.
And when Rupesh's family wanted to pay him, the old man refused and said, "No, the only way you could ever pay me back is if you do this with someone else in the world." And that's how my mission in life began: to encourage and inspire other people, especially minorities and women who, like myself, feel attracted to STEM but who, for some reason, feel that they cannot achieve their dreams. So after finishing Brandeis, I was accepted by the then-current Nobel Prize winner in physics, Steve Chu, at Stanford. And it was just incredible. This person who two years earlier knew very little, whose algebra was even rusty, was all of a sudden accepted to Stanford. So six years later I became the first Mexican woman to get a PhD in physics from Stanford. And that's when I realized that I had a responsibility to spread my message and help others. Since then, I did two postdocs in applied math and physics, at Columbia University and at NYU's Courant Institute. And I've just been working with numbers all my life. And I've also had this science communicator career on the side. So for example, right now I'm the chief data scientist at Metis, which is a data science training company. We do bootcamps, and we train corporations and businesses in data literacy, and in how to increase business insights through data literacy. And at the same time, on the side, I've been co-hosting a TV show for the Discovery Channel called "Outrageous Acts of Science" that helps me exercise explaining complex concepts in lay and entertaining ways.

Curtis: That's an inspiring story. I'm also interested in the transition you had from physics into data science. Getting to physics is amazing, and I think a lot of people would share your sentiments about some of these challenges that you had, but then there's also this point where you're now doing data science and data literacy, which no doubt your physics PhD helps with. But correct me if I'm wrong, it's probably a little bit different than if you had stayed doing research in physics. So how did that transition work?

Debbie: Absolutely. I completely agree with you. It was not easy; it was definitely a challenge. So the first thing is that physicists who couldn't find jobs, which were many, especially after the Cold War, when a lot of the physics departments in the US were shrinking, found their way to Wall Street. So when I was looking for a job 15 years ago, Wall Street had a very open policy where they would go to physics graduate students and interview everyone, and we just put on the hat and the title of quant. And that's what I did. I had applied for positions in academia, I wanted to be a professor at some point, but then I realized after my postdocs that it could be very isolating, and it wouldn't make me very happy. So I decided to give it a chance, and I became a quant. I worked in risk analysis, first for a hedge fund named AQR and then for Morgan Stanley Capital International, and I was building risk models and explaining them and selling them to hedge funds and banks.
And I realized that what I was doing was data science, but it was only a very narrow area of data science, which was time series and working with numbers, not with images and not with audio or video. But certainly we were building models and using data. I spent six years on Wall Street, and I was into some of the intellectual challenges of building the models, but just being interested in money and how the stock market is doing was not my thing. As a child I was very curious about the world, and I was still very curious then. So I decided to switch. By this time I was friends with Hilary Mason and others, and I realized there's a huge field that's kind of being born, there's a boom for data scientists, so let me give it a try. And I remember I attended Strata, the conference, at one point, and they recorded me on video, and I'm embarrassed to say that I said something like, "Well, there's nothing new in data science. Physicists and Wall Street people have been doing it for 50 years," and this and that. And then, boy, was I surprised when I actually ended up taking a data science course. It was a four-month evening course with General Assembly, to basically be able to translate my mathematical and computational skills into data science. And it was really difficult. I mean, in physics, for example, we do statistics, but nowhere near the depth that you need in order to do data science. Linear algebra and calculus and all those I knew by heart, of course, and they were advantages, but there were many, many other things, like working with images, or with words in NLP, that conceptually were very difficult for me to understand. And what accuracy was about in a model, and whatnot. And so after doing that, I got my first job in traditional data science at ThoughtWorks, which is a boutique consulting company that mostly does software, but they were starting their data science team. And from there, I realized that I wanted to combine data science with teaching, because I really missed that academic field where I could be part of creating a curriculum and really kind of effecting change in people's brains about how the world works and how we are inundated with data and what insights you can gain from it. And so that's when, at another Strata, my friend Cathy O'Neil and I were speaking, and I said, "I'm looking for another gig, but it has to be something really special where I can really have an impact." She introduced me to Jason Moss, who is the president and CEO of Metis, which is where I work now. And Jason hired me to sort of be his right hand. And since I joined Metis five years ago, I've created curriculum for universities, Dublin Business School, for example, as well as for the bootcamps and for corporations. I've been doing thought leadership, and I've managed the team of data scientists and instructors. So it's been an amazing ride.

Curtis: That's awesome. Now I'd like to dive into the subject matter there, the data literacy.
But before we do that, I’m just curious, because I imagine some other people may have this question that is how, how does it feel now doing more data science type work instead of physics being, being that maybe physics you could say was kind of the first thing that really interested you like about the world and this kind of stuff. And now you’re doing more data science. Is it as enjoyable to you? Is it more enjoyable or maybe there’s not a comparison there, but just people that maybe are making that transition. Debbie: Absolutely. Well, I, I’m going to be honest with you. I think I will always miss doing basic research because if you have that curiosity from a very young age, I don’t think it ever goes away. My husband is actually a physics professor, so we have endless discussions about physics and what are the new discoveries and whatnot. So I keep my mind in physics, somewhat. But on the other hand, I do see that people who become professors and stay in academia end up spending quite a bit of time applying for grants and doing a little bit of department politics. So, you know, it’s not as idealized or as pure as when we are undergrads and we just spend all the time in the lab figuring things out. And so from that perspective, I do think that my personality lends itself better for working in business. And so, you know, in data science is such a vast field that you can always find ways of contributing to different projects. You can increase the literacy of a hedge fund, for example, and then seeing the aha moment, and they’re like, “okay, I no longer have to use Excel for this. And I have a Python code that allows me to automate this instead of taking four hours, it takes four minutes. That’s really cool.” But also being able to help, like at Metis, we did something called Metis for Good, where we, I took all the alumni from Metis, and we have tons of amazing, amazing alumni from our bootcamps, and I wrote to them right after an earthquake that happened in Mexico about four years ago. And I said, you know, “we, I need to build a map in real time showing the data off of where, what things are needed, where.” And a bunch of alumni helped me, and we built it on, it was shared. I forgot how many hundreds of thousands of times during the earthquake and the fact that we were able to help was just incredible. I think, you know, MediSYS corporate training has had the opportunity to train businesses in sort of taking their insights to the next level and getting data literacy to spread across the company, which is my sort of my big thing that data should not stay in silos and only the technical people comprehending what’s going on, but it should be adopted by every single person, HR, you know, the, the chief executive officers. Everyone should have a stake in the insights that are being gained by data analysis. So that has allowed me to not discover things about the universe, but definitely discover things about the world and about how people behave and how, you know, different projects evolve. And that has been equally fascinating. Curtis: And let’s talk about that a little bit ’cause a lot of companies, as you know, are struggling with this, how do we increase the data literacy of our employees? And maybe more than that, how do we put data literacy to good use so that we can make better decisions and these kinds of things. How do you approach it? How do you think about it? What are some of the key points do you think? 
Debbie: Yeah, so I think what’s happening is that data science as a business term is maybe 10 years old or even 15 years old. And so a lot of companies have already, uh, the novelty has worn off, and everybody’s like, “okay, so what can we do with it? It sounded so promising, like the Holy Grail, like it was going to save my PNL and everything was going to get better after I hired all these data scientists, but nobody ever communicated to those stakeholders what data science actually is, what its limitations are, what type of people should be working on data science. What kinds of data are useful? What kinds of problems can they solve? And what. The company actually enacts some of the decisions that the insights would bring about. And, and so people in industry have been frustrated, I think, with these new sexiest job of the 21st century, like Harvard business review called it. And so I think because of that, we need to sort of go back a little bit and redefine what data science is by educating every single person that works in a company. And by this, I don’t mean that everyone’s going to have to become a programmer and a data scientist, not at all, but everyone should at least have access to the ideas, to the goals and to the analysis that’s driving the decisions that the company makes. So say for example, that HR wants to know why women after maternity leave, tend to not come back to their executive positions in that company, what can they do to support them? And they try different measures and programs and they have data about it. Well, if I only have a bunch of, you know, technology people in the IT department analyzing that data, we’re probably not going to get very deep insights because we have to bring all that data to the HR department and to the women in the company who are stakeholders, because this is about them. And so when you, you allow everyone to gain access to the data, the graphs, the charts, and you collect information and feedback, then everybody has a stake in what you’re trying to accomplish. And so you’re much more aligned with the success of the program or whatever you’re you’re implementing. And when we see companies that have done this, we see, you know, a hundred percent difference in our corporate training. On the other hand, we’ve seen companies where the chief executive office doesn’t even agree on why data science is important or data literacy, and they hire a, a vendor or somebody to train their, their technical team. And then at the end of it, the technical team is trained, but they have no idea why. And why they should use it. And so every, even though they made a big investment, the, the data more data literacy did not get them anywhere. And so that’s why it’s really important to explain it, to talk data, to brief data and to have it as part of the, sort of like the blood system. That’s like spreading blood everywhere in the, in the company the same way we want to spread the message of data. Curtis: It’s a tall order. And, um, I think there’s a lot. I mean, what you’re talking about it, if you, and obviously we can’t boil this down to one or two things, right. There’s so much here, but, but maybe if you had one piece of advice or two pieces of advice that you think are maybe the most important that someone could listening to this episode today, they could take it and, and put it into action and it would help them. What, what would you say? Debbie: I would say my first piece of advice is critical thinking. 
Richard Feynman, a very famous physicist, used to say that it's very easy for us to get fooled by what we see out there, but it's actually a lot easier to fool ourselves. So when we are working in business, or when we're analyzing what our newspaper said about politics or COVID or whatnot, we tend to bring our own biases. And it's very important to encourage people to be evidence-based. To see, okay, if you have a new idea for business, search for the factors or the measurements and metrics that are going to tell you that that idea may work or why it may not work, but set those parameters beforehand. Don't say, "Oh, my business idea was successful," because three more people adopted your app; you have to set the parameters of how many people should adopt the app for it to be successful beforehand. And so to educate people on critical thinking and evidence-based approaches, I think, is where I would start. And of course there are very practical examples of how to do that and how to teach people how to read charts. I actually teach a workshop on statistics and the art of deception, which is a very down-to-earth course, very little math involved, but I go through a lot of graphs that try to mislead the public by making the conclusions seem bigger than they actually are. So if you teach everybody in a company to read graphs and to question the data, who has the data, why, what for, et cetera, I think your employees will be much better prepared for dealing with company issues and projects.

Ginette: A huge thank you to Debbie Berebichez. Feel free to reach out to her on LinkedIn or Twitter. As always, check out our transcript and attributions at datacrunchcorp.com/podcast.

Attributions

Music: "Loopster" by Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License
http://creativecommons.org/licenses/by/3.0/

The post Overcoming Cultural Hurdles in Tech first appeared on Data Crunch Podcast.
25 minutes | Jan 30, 2021
Traffic Equilibrium and a PhD
Matthew Battifarano has always been interested in mobility and transportation. He talks about his fascination with traffic equilibrium, which touches autonomous vehicles, ride hailing, bike sharing, and more; what shaped it; and what led him to his PhD at the Mobility Data Analytics Center at Carnegie Mellon University.

Matthew Battifarano: This is something I'd never heard of before, before I started in the program. So a lot of the research I do now, and a lot of the research that's interested in modeling new technologies like ride hailing or autonomous vehicles or bike sharing, or all of these different components of mobility that we see in cities now, a lot of them use this concept of traffic equilibrium.

Ginette: I'm Ginette,

Curtis: and I'm Curtis,

Ginette: and you are listening to Data Crunch,

Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world.

Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics, training, and consulting company. If you want to become the type of tech talent we talk about on our show today, you'll need to master algorithms, machine learning concepts, computer science basics, and many other important concepts. Brilliant is a great place to start digging into these. The nice thing about Brilliant is that you can learn in bite-sized pieces at your own pace, and with a bit of consistent effort, you can tackle some really tough subjects. With 60+ courses that combine story-telling, code-writing, and interactive challenges, Brilliant helps develop the skills that are crucial to school, job interviews, and careers. Sign up for free and start learning by going to Brilliant.org slash Data Crunch, and the first 200 people that go to that link will get 20% off the annual premium subscription. Today we chat with Matthew Battifarano, former data scientist at Bridj and current PhD student at the Mobility Data Analytics Center at Carnegie Mellon University.

Matthew: I grew up in New York City, and that really shaped my view of transportation. I grew up really enjoying transportation. There's a great transit museum out in Brooklyn that I loved going to as a kid. You could go up in the driver's seat of a bus they had there, and all kinds of cool stuff. And that made a lot of sense to me, particularly growing up in that environment: that's how I got to school, that's how I went everywhere that I needed to go. So from a young age, growing up there, that just made sense to me. When I went to college, I didn't really think about transportation. I went to the University of Chicago, which doesn't have anything that's applied. They're very theory oriented, so there's no engineering, nothing that has a real application. I studied math there, did my undergrad there. At that point I was also really interested in computational neuroscience, which seems totally unrelated to what I'm doing now, and it largely is. But the one thing that really stood out to me about it, the reason why I was drawn to it, is that the whole field is taking a really complex biological system, the brain, which, as far as biological systems go, is a really hard one to crack open.

Curtis: Pretty much the top, right?

Matthew: Yeah, it's like at the top. And they were using mathematical models, they're using machine learning, they're using all of these techniques to break it down and try to understand it.
And there are ways in which that’s really successful that I was very interested in. And there are also ways in which that’s really hard, and mathematic models like don’t yet give us good insight. So that’s what I was oriented towards throughout college. I spent two years in a research lab in computational neuroscience and basically learned in those two years that I didn’t really want to do that. It was, there were parts that I really liked. And I figured out that the parts that I really liked were the parts where I was figuring out how to mathematically model some process that we were investigating. And there were a lot of other components that I, the domain itself wasn’t as interesting as I had hoped it would be. And it didn’t grab me in the way that I wanted it to. So I was sort of looking for a change, and I found this startup called Bridj, which was just beginning out in Boston at the time. And this was around 2014. And their whole idea was we have all this data about how people move around a city. It’s everywhere. It’s in, it’s in Yelp. It’s in Google Maps. It’s in all these, like there’s tons of different ways, especially with the rise of smartphones, ton of different ways. We can measure mobility in cities in a way that we could never before, how can we leverage this to design a better transit infrastructure? So our focus was on these what’s now called micro transit, which is using small passenger van, sort of maybe like 12 passengers to a van and routing this dynamically. It follows, uh, this was not a new idea. It follows in the tradition of, you know, a lot of Jitney services, which exists in the US and, and more commonly throughout the world. It was trying to take that model and bring in a data layer on top of it to try to make it work better and more efficiently. So this was actually before Uber pool showed up. So when Uber pool showed up, it was great because our whole company got a lot easier to explain. On the flip side, it sounded a lot like Uber Pool. We were just like, “Oh yeah, it’s like Uber pool. But you know, now we’re dealing with, with 12 passenger vans.” And so there was positives and negatives to that. Curtis: Interesting. Was that, was that a net positive? Would you say, did it give you, or was it sort of like, why don’t I just use Uber pool? Matthew: I think it was a net positive. We, we were a startup. So, and at that point on a pretty small scale. And so just getting people used to the idea of taking out their phones and sharing a ride in some fashion, there are ways in which our platforms differed, obviously because we, you know, we were trying sort of have a more bus-like experience where it was a larger aggregation, a little less flexible than a point to point, but it would offer sort of a middle price range. So the idea of using your smartphone to call transit was an idea that I think spreading that idea was helpful to us. It also, it’s on the flip side, it’s kind of hard to compete with Uber or Lyft or these big, you know. So I was there for, uh, about two and a half years. I got hired on as to, to help with their data teams so that what we called the science team. And we had two sort of complimentary goals. The first is how do we know what people want? How do we know where people are trying to go? How are they moving around the city? So that was really the first part of that pitch of the business is, you know, we’re leveraging data about how people move. 
So the first thing is, "Okay, let's find the data, let's figure out how we can extract these mobility patterns from it." And then the second part is using that information: "Can we design routes that are efficient or optimal in some sense?"

Curtis: So where did you source the data from? That's usually a really hard problem: where do you get the data, do you have to pay for it, all these kinds of things.

Matthew: I'm not sure how much I can say. The company itself ended up going under in 2017, but the remnants of the company were bought by an Australian company, and it exists down there. So I'm not sure exactly how much I'm allowed to say about the data sources, but I would say it was two things: a variety of subscription-based data sources, and also more one-off data purchasing. We tried to be really broad with this. The reason behind being broad is that each data source comes with a particular set of biases. And our ultimate goal was to figure out, one, what is the total mobility pattern going on in the city, which is sort of a latent variable because you can't directly observe it, and then, as a subset of that population, who is actually going to get on a vehicle in the next month, or over a short horizon, 'cause we were again trying to make our ridership numbers look good. So we were always trying to make decisions that would help those metrics along, because we were in the process of trying to get additional funding through much of my time there.

Curtis: Got it. Yep. And that's a common problem in startups, right? Trying to use analytics to prove that you're doing something good, that it's worthwhile, and to get more funding. So yeah, I get that.

Matthew: Yeah. And one thing that we struggled with is that the platform itself is a really nice data collection platform, in the same way that Uber and Lyft have a great data collection platform. They understand a lot about their own service and about the demand that utilizes their services. They also run into problems; of course, that data itself is biased. So there is even a tension there. But when you're a startup and you're just trying to introduce something to market, your data is almost useless, because, one, it's so small, and two, it's very concentrated and very biased in terms of where you've decided to go and how you're marketing it. It's very sensitive to these early-on business decisions, which are of course rapidly changing. So it's hard to interpret that data when it's at that early stage.

Curtis: For sure. Yeah. So how far did you get in this process before you moved on? I'm assuming your next step was academia.

Matthew: Yeah, we got pretty far. We had a roadmap that we were following to get toward this goal of a super dynamic and convenient bus network that you could take at all hours of the day. We had some milestones that we were aiming for, and we achieved a few of them. We didn't get all the way to where we wanted to be, but we made significant steps, particularly on the optimization side. The demand forecasting side is really hard, because no matter how good your methods are, if your data is not really there, then there's only so much that you can say. And we ran into that problem pretty early on.
The other thing that we realized is that even if you had a really good demand forecasting system, if you don't have the ability to act on that information, then it doesn't really matter how well you can predict demand, right? So we shifted our focus halfway through to really focus in on the optimization component. So, assuming that we have some idea of where demand is, how can we create an optimal set of routes, and how can we in particular optimize that on the fly? You're not optimizing a fixed route; you're optimizing a dynamic route that can change, which is a really interesting research question in its own right.

Curtis: Sure.

Matthew: So we shifted focus, and I spent a lot of my last year there working on that.

Curtis: That's interesting. Was the problem you ran into there more computational, like how do we quickly take in these variables, run something against a model, and then have something to use in a timeframe that is suitable? Or was it more the actual algorithms that you're trying to design to work on the problem?

Matthew: It was both, which was tricky to manage. The idea is, of course, if you're trying to do something dynamic, you are time limited, because you need to respond somewhat quickly.

Curtis: Sure.

Matthew: I think a lot of machine learning applications, even the demand forecasting example that we were focused on for the first half of my time there, can be done in a completely offline setting. You spend a lot of computational resources, you spend a lot of time, you come up with a model. Maybe that model gets updated in the background every so often, but it's only being applied in real time, and that's the easy part; it's being trained over longer cycles. When you're trying to do something like what we were trying to do, where you want something that's optimal based on the current situation, you have maybe two options. One, you figure out all the possibilities that can happen and work out in advance what's the best decision. For something like this problem, that really doesn't make a lot of sense, because there are so many things that can happen; the search space is enormous. The other is that you have some sort of online method where you are ingesting data and using that to figure out what your next move is. And so we had to figure out what methods would provide us a balance between those two, what methods make sense in that context, and then also, how do we make sure that we're able to come up with a solution in a time-constrained environment? So that was both a domain question and an algorithmic question. Ultimately, we had to face the reality that sometimes, for whatever reason, it just would not be able to finish a computation. And so there's a question in there of: what do you do? How does the application as a whole respond to failure? And that sits within a much larger question of application design in general: you want resilience under failure, different components failing. So that was a really interesting intersection between a traditionally software engineering focus, or really not even software engineering, kind of . . .
more like dev ops, where you’re considering the development and operation of a, of a software application and machine learning AI of what do we do when a component, when this particular component fails and how do we also at the same time minimize that or mitigate failure. Curtis: Right. So more user experience questions, right? What do we, what do we do to, to ensure that this is still useful or does something that, that can help out the user? It sounds like, um, that’s really interesting. And those kinds of problems often are, are the hairier ones. I find as I talk to people, although, you know, the modeling is not, not easy either. And I’m curious, and if you’re not at Liberty to say, that’s fine, but what kind of models did you end up using to solve these, these problems and infrastructure? Matthew: I can’t really speak to that specifically, but there’s a lot of research that exists to solve this sort of . . . This problem lives within a pretty well-known class of problems called the vehicle routing problem. And there are a ton of different variants of this problem that are aimed at solving different problems or aimed at different applications, rather. So a really sort of prototypical application of the v, of the vehicle routing problem or VRP is in logistics. So if you’re ups or FedEx, you have some depots where you have packages sitting, you’ve got a fleet of trucks, and you have a bunch of destinations. So the question is, how do you route these trucks to, to serve all of your, all of this package demand in the most efficient way possible? And there are a lot of different approaches to doing this. So, and again, depending on your operational constraints, some might be better than others. So there’s a ton of research out there on different methods that you can use. Curtis: So you didn’t have to develop anything from scratch. It sounds like there was some research you could build upon and sort of modify for your needs. Matthew: Yes. Curtis: Cool. Okay. That’s great. And so, so then how did this lead into you going back into academia? Matthew: Yeah, so about two years in, I, you know, I had been working on this demand forecasting. I had been working on this optimization engine also sort of on the side. I’ve been working with a small team that was really interested in drilling down on our operational metrics as well. So answering more short term, maybe more traditional data analytics questions about how the business was operating. This is looking at sort of more day to day metrics about business performance and, and how we might make small interventions, again, on like a day-to-day basis to improve the quality of the product. And I sort of had felt like I had gotten to a point where I could see myself in this position learning slowly on the job, but I saw sort of diminishing marginal returns on that. I was spending a lot of time implementing this optimization engine, and I was around and part of the discussion in terms of how to actually model it and how to develop it from a methodological standpoint. But I just didn’t have the, the background or the knowledge to really contribute in a fundamental way to that development. And that was something I realized I was really interested in being able to do. Curtis: Got it. Matthew: And everyone who was doing that, they had PhDs. So we had one that was, did their PhD in transportation, sort of more generally, we had another, that had their PhD in operations research and another that had their PhD in, in artificial intelligence. 
Matthew: So between all of those perspectives, we were able to come up with what I thought was a really cool approach to that problem, one that I would never have been able to think of, never been able to express. One thing that was really cool is that in the process of this, I found there were parts of the method that really fit into an intuition that I already had, but I would have had no way of getting from my intuition to an actual mathematical formulation or an algorithmic formulation. And from one perspective that was really cool, 'cause I was like, "Oh yeah, this is what I was looking for. I just couldn't express it." On the other hand, it was really frustrating, because I felt like, well, if I had been able to express it, if I had had that background, I really would have been able to contribute a lot more than I did. So I started looking for academic programs that melded this view of transportation and mathematical modeling. And in that regard, going back to computational neuroscience, that was a familiar desire for me: here we have this really complex system that no one really knows how it works. It's a lot of individual decisions being made by individual people, and you can't really measure it, even though we're surrounded by it all the time. It's a really complex system that's very important to understand, because it affects how we live every single day. And here we have some really interesting examples of how a mathematical modeling approach can really add to that understanding and can help us improve systems. So I started looking for programs, and in particular professors, who were taking this approach: trying to bring a mathematical modeling approach, in particular leveraging new sources of data that weren't available before, to understand and improve transportation and mobility, particularly in an urban setting.

Curtis: Now, is that a common topic? Was it hard to find somebody focused on that specifically? I've never heard about this particular niche application before, so I'm curious if it was really hard to find someone focused on that, or if there were some options.

Matthew: That's an interesting question. As I've talked to other people going through their grad school applications, one thing that I've heard, and definitely experienced, is that you have this idea of what you're interested in, but you don't necessarily have the vocabulary that the niche you actually want is using. So if you were starting your search for grad school or professors, you might type into Google something that resembles how you would describe the area you're interested in. And that might be correct, but it might be language that people don't use in that field or in that niche. So it's really hard to figure out. I'll give you a more concrete example. What I do now has a lot to do with a particular modeling method called traffic equilibrium.

Curtis: Okay.

Matthew: And very briefly, it basically answers this question, or models this phenomenon: if you have a bunch of people who are trying to use the road network, or whatever transportation network, to get from where they are to where they're going, how do they end up using this network?
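That question has a classic textbook formulation, Wardrop's user equilibrium: travelers spread out over routes until no one can switch routes and arrive faster. Here is a minimal worked sketch on a made-up two-route network; the linear congestion functions and every number are invented for illustration, and real models use much larger networks and nonlinear link costs.

```python
def equilibrium_split(total_flow, free_a, slope_a, free_b, slope_b):
    """User equilibrium on two parallel routes with linear congestion:
    travel time on a route = free-flow time + slope * flow on that route.
    At equilibrium both used routes have equal travel times, so solve
    free_a + slope_a * x = free_b + slope_b * (total_flow - x) for x."""
    x = (free_b + slope_b * total_flow - free_a) / (slope_a + slope_b)
    x = min(max(x, 0.0), total_flow)  # clamp in case one route goes unused
    return x, total_flow - x

# 10 (thousand) drivers; route A is short but congests quickly,
# route B is longer but has more capacity.
flow_a, flow_b = equilibrium_split(10, free_a=10, slope_a=2,
                                   free_b=20, slope_b=1)
print(flow_a, flow_b)                    # ~6.67 and ~3.33
print(10 + 2 * flow_a, 20 + flow_b)      # equal travel times: ~23.33 each
```

Notice that no driver can improve by switching: both routes take the same time, which is exactly the dependence on everyone else's decisions that Matthew describes next.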
Matthew: And when you think about it, the use of the network depends on how everyone else is using the network, right? Just think about when you're looking at Google Maps and trying to figure out the best way to get in your car from point A to point B. That's going to depend on traffic, which depends on all the other decisions that everyone else has made.

Curtis: Right.

Matthew: And so traffic equilibrium is this sort of economics-based model of how people make those decisions. This is something I'd never heard of before, before I started in the program. So a lot of the research I do now, and a lot of the research that's interested in modeling new technologies like ride hailing or autonomous vehicles or bike sharing, or all of these different components of mobility that we see in cities now, a lot of them use this concept of traffic equilibrium. And so now, when I look for other professors or labs that are doing similar work, I usually start from: how are people looking at this in terms of traffic equilibrium?

Curtis: Got it.

Matthew: But when I was going through the application process, I had no idea that that's what I should be typing into Google.

Curtis: Sure. Yeah. That's interesting. So it's a search problem, right? You kind of know what you're looking for, but you don't know, like you say, how to express it or what the vocabulary is behind it. So how did you find the program you're in now? How did you determine that that was the one you wanted to do?

Matthew: It was sort of an iterative process. I ended up a lot in Google Scholar, looking at papers to try to figure out what's going on. I started from a really basic understanding, then started finding which professors were doing something close to what I wanted, what their papers looked like, maybe finding one or two papers that felt closest to what I wanted to do, looking at who they were citing, and branching out from there to try to find who's asking what and how they're asking it, and then trying to build a fuller picture of the field. And even doing that, I wasn't super successful. I originally applied to CMU with a focus on sensor networks; there's a big focus at CMU on that. I'm in the civil engineering department, and we've got a lot of people who are really focused on structural health monitoring, which involves putting a bunch of sensors in places and measuring things about the structure or about its use. There's definitely an intersection between that and transportation when you're talking about data collection and putting sensors everywhere and understanding how a system is being used. And I sort of found CMU through that. It was only once I started digging further in that I found my current advisor, who has nothing to do with sensors but was very much in line with this perspective I was after: how do we combine transportation modeling and data and technology?

Ginette: A big thank you to Matthew Battifarano, and as always, head to datacrunch.com/podcast for our transcripts and attributions.
Attributions

Music: "Loopster" by Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License
http://creativecommons.org/licenses/by/3.0/

The post Traffic Equilibrium and a PhD first appeared on Data Crunch Podcast.
22 minutes | Dec 31, 2020
Machine Learning and Flight with Ian Cassidy
Ian Cassidy: When you did a PCA, a principal component analysis, it was beautiful. There was a red circle in the middle of the blue unpurchased data points. And there were the red purchased ones, and they were all clustered together. It was really interesting. And the machine learning model had a really good time trying to predict that the ones in that red cluster were the things that people were interested in purchasing.

Ginette: I'm Ginette,

Curtis: and I'm Curtis,

Ginette: and you are listening to Data Crunch,

Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world.

Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics, training, and consulting company. If you want to become the type of tech talent we talk about on our show today, you'll need to master algorithms, machine learning concepts, computer science basics, and many other important concepts. Brilliant is a great place to start digging into these. The nice thing about Brilliant is that you can learn in bite-sized pieces at your own pace, and with a bit of consistent effort, you can tackle some really tough subjects. With 60+ courses that combine story-telling, code-writing, and interactive challenges, Brilliant helps develop the skills that are crucial to school, job interviews, and careers. Sign up for free and start learning by going to Brilliant.org slash Data Crunch, and the first 200 people that go to that link will get 20% off the annual premium subscription. Now onto our show. We've waited to publish today's episode because Covid has taken a toll on the travel industry, and lots of things have changed since we recorded this episode, but there's good information in it, so we don't want to wait too long to publish it. Hopefully 2021 changes the travel industry's fortunes and this information becomes even more applicable. So today we chat with Ian Cassidy, former senior data scientist at Upside Business Travel.

Ian: I'm Ian Cassidy, and my interests are in the machine learning and optimization realm, since I have experience with that from my grad school days. A little bit about Upside: we are a travel management company. We offer a product that has no fees, 100% free. In fact, if you spend over a hundred thousand dollars booking travel on our website, we offer 3% cash back, as well as free customer service, 24/7, and no contracts. You sign up with us, and you get all of this as soon as you sign up. We are a one-stop shop to book and manage all of your travel in one place. We offer flights, hotels, and rental cars, and we also offer expense integration and reporting for companies looking to manage all of their travelers and their expenses.

Curtis: Right on. We talked before about the journey that your company has gone through to figure out how to best use data, how to target, and what really works with machine learning and things like this. So I'd love to just talk a little bit about that: where you guys started, how you made some decisions, what you learned along the way, and what you're up to from a data science perspective.

Ian: Yeah, sure. Like you mentioned, things have changed quite a bit at Upside. We started off as a B2C company where we were targeting what we were calling do-it-yourself travelers.
You did not have to be logged into our site in order to start doing a search and book flights or hotels. So that made it interesting from a data collection perspective. We had some unique IDs about who the people were that were doing the searching, but largely we didn't really know much about you when you were searching. So when we started, one of the main things that we were trying to improve upon was our sorting of inventory. That's a pretty hot topic in data science for all of e-commerce, so we started with, well, "How do we surface to the customer who's shopping the inventory that they are most likely to purchase?" So it was a propensity model. When we first started, the legacy sorting algorithms were largely based on heuristics. They knew that in the top few tiles of things that we were presenting to the customer, we want to present the cheapest option, maybe also the shortest-duration option if it was a flight; for hotels, maybe the closest option to where you wanted to stay; and then maybe also the option that gave Upside the most profit on the back end. And what you would see was often very confusing, especially with flights. The first option you might see might be a one-stop flight that left at 5:00 a.m. and had a four-hour layover, where you knew, this is DC to JFK, there's a shuttle that runs every hour and it's nonstop; why the heck are they showing me this one-stop? So we came in, and we looked at what this pool of customers was searching for and ultimately buying. And thankfully we were collecting all this data: all the shopping data, as well as what people were clicking on and what people were purchasing. So we trained some machine learning models right out of the box. We started looking at regression models and then moved into tree models, and obviously the tree models were performing better. And within the first two months of when I started at Upside, and this was back in September of 2017, we trained and productionalized some models that gave us a 25% increase in conversion rate, almost overnight. We were A/B testing the old models against the new ones, and pretty much within a week we had statistical significance that we were improving conversion rate. And obviously that leads to more sales and more profits. So that was great. We did that for our flight inventory as well as our hotel inventory. So that's kind of where we started.

Curtis: What the heuristics were missing there, if I understand right, was just some of the more nuanced things like you were saying, like, why would you give someone this flight that has this four-hour layover when you can just take the shuttle or whatever. The machine learning models don't really understand that, but by seeing what people typically buy, they can catch those types of nuances. Is that what you think contributed to this 25% conversion? Or are there other things . . .

Ian: Yeah, absolutely. I mean, it was really neat to have this kind of data set, because we treated this as a binary optimization problem, where we had inventory options that were purchased or not purchased.
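Upside's actual models and features aren't public, so what follows is only a heavily simplified sketch of the kind of propensity sorting Ian describes: train a tree-based classifier on purchased versus not-purchased options, then rank new search results by predicted purchase probability. The features, synthetic data, and label logic below are all invented for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy training set: one row per flight option shown in past searches.
# Invented features: [price_usd, num_stops, layover_hours, early_departure]
X = rng.uniform([50, 0, 0, 0], [800, 2, 6, 1], size=(5000, 4))
# Fake label: cheap, nonstop, daytime flights get purchased more often.
score = -0.004 * X[:, 0] - 1.5 * X[:, 1] - 0.3 * X[:, 2] - 1.0 * X[:, 3]
y = (score + rng.normal(0, 1, 5000)) > np.median(score)

model = GradientBoostingClassifier().fit(X, y)

# New search results, sorted descending by predicted purchase probability.
results = np.array([
    [320, 0, 0.0, 0.0],  # nonstop, mid-price
    [180, 1, 4.0, 1.0],  # cheap, but 5 a.m. with a long layover
    [450, 0, 0.0, 0.0],  # nonstop, pricey
])
prob = model.predict_proba(results)[:, 1]
for p, row in sorted(zip(prob, results.tolist()), reverse=True):
    print(f"p(purchase)={p:.2f}  flight={row}")
```

Sorting by the model's probability output is what lets the ranking pick up nuances, like the pointless one-stop on a shuttle route, that fixed heuristics miss.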
Ian: And when we came up with some basic features about the space, let's take flights, for instance, that was things like stops, price, layover, time of day that you were leaving, time of day that you were arriving. All these things you can convert into a very nice feature space: a lot of continuous features, also some discrete features. And when you did a PCA, a principal component analysis, it was beautiful. There was a red circle in the middle of the blue unpurchased data points, and there were the red purchased ones, and they were all clustered together. It was really interesting. And the machine learning model had a really good time, or a really easy time, figuring out how to predict that the ones in that red cluster were the things that people were interested in purchasing. And then we were obviously able to take the probabilities of purchase, based on the models that we were using, and sort just straight up by the probabilities.

Curtis: That's awesome. What was harder in that process? I'm curious, was it more sort of conceptualizing what needed to be done? Was it more actually building the models and training them and optimizing them? Or was it more the engineering side, making sure everything is orchestrated and works together in production, stuff like that?

Ian: Yeah, I mean, I think the biggest thing is always the productionization of these things: how to structure your code, where you're storing these models, where you're deploying them, how you're deploying them. I know there are now things like AWS SageMaker and tools that will help you turn machine learning models into a deployable API, but we were writing our own services to handle that. And really, we were just trying to figure out where to store these models. Eventually we got a little bit more complex, where we were ensembling models together. So the storage, as well as the feature engineering and transformation, just making sure that things are consistent. And what really helped with that is we had a really strong, really great engineering team that helped me think about unit tests and how to make sure that you're testing things from an engineering perspective, so that when you change code you get the expected results, because these machine learning models are pretty black boxy. If you transform a feature incorrectly, you have to have some way of capturing that.

Curtis: It sounds like you guys were doing this back before there was much tooling, and it's still hard, right? But there seems to be better tooling now to at least facilitate this process a little bit.

Ian: Yeah, I think so. We've used a couple of those tools. Like I said, we tried out SageMaker and a couple other things. For some of the models that are still in production, we're largely using services that we've built ourselves. So a lot of it is still custom work for us.

Curtis: Okay. Yeah. That's interesting. And then you guys figured all that out. That's like a big win for the business, 25%. That's awesome. Where'd it go from there?

Ian: Yeah.
Curtis: Okay, yeah, that's interesting. So you figured all that out, and that's a big win for the business, 25%. That's awesome. Where'd it go from there? Ian: Yeah. So from there, we ended up doing a bunch of A/B testing on more models as things went along and we were acquiring more customers. Sometimes we stumbled upon features that helped give a couple of percentage points of improvement, but nothing as significant as the early days, when we went from heuristics to a machine learning model. And since then, we've pivoted. We pivoted a little over a year ago to focus on B2C, building out the functionality and capabilities for a travel management company that would focus on B2C-type business. We do still have some sorting and pricing experiments running in our production environment and in the product, but we've transitioned to focusing a little more on B2C. Also, about nine months ago, we signed a huge partnership with a publicly traded travel company called Flight Center, to work with them and build a B2C platform that they can start using themselves. So things have changed quite a bit since we signed that partnership. Curtis: How would you say the data science work has shifted with that model shift? Ian: Yeah. One of the things Flight Center brings to the table that helps Upside (I talked about how our product is completely free to use and we offer this cashback) is that we are able to leverage Flight Center's airline contracts to help sell air travel and make money. If you think about it, very large companies that travel, like IBM or Microsoft or Amazon, spending a billion dollars a year on travel, are able to negotiate their own special corporate discounts with airlines like Delta and American. Similarly, other large companies use large travel management companies, like Amex Global Business Travel or BCD Travel; Flight Center is in that space as well. The TMCs are responsible for managing their clients' contracts and putting controls in place so that their employees book the options that save them the most money. But small companies do not have this luxury. Companies that only spend $100,000 or $200,000 a year on travel are either booking directly on the supplier website or using something like Kayak or Google Flights to search for their travel. So now that we're partnered with Flight Center, we have access to all of these airline contracts. As you can imagine, these contracts are written by lawyers: there's a lot of text, and there are a lot of rules. So one of the things we've been challenged with, the data science team here at Upside (and we also have a labs team, an experimental team of a couple of engineers that helps us build some production things and has helped with experimentation), is how do we programmatically extract the language in these contracts and match it back to flight inventory being returned by our shopping service, in order to present to our customers the things that save them the most money, because there are discounts in those contracts for the customers, and then we make commission on the back end if we sell certain flights. But the problem is there are hundreds, if not thousands, of rules per contract.
So one of the challenging but fun things we've been doing lately is around natural language processing: extracting the language from these contracts and building rules engines to encode them, to help our customers save money and help Upside also make money through these back-end commissions. Curtis: And how's that going? That sounds like a really big, hairy problem to solve. I'm curious: do you find that the current methods and tooling around NLP are sufficient? Are you making some assumptions in certain areas? I'm just curious how you're tackling that. Ian: Yeah. We've all kind of had to become experts in the flight industry. There's a lot of terminology, and it's a very complex thing. When you go to book a flight, there's just a lot of complexity there that, unless you're a flight travel nerd, you don't really think about. So we've had to level ourselves up and learn about the travel industry, and Flight Center has helped us with that, because for a lot of what's baked into this language, you have to be a subject matter expert to understand it and code it. So there isn't really a lot of NLP tooling out there, out of the box, that would help with something like that; largely we've been doing it custom. What we're doing now, trying to extract certain keywords and phrases in these lines of text, largely came out of a hackathon project a couple of months ago, where one of our engineers said, "Okay, I want to take some of this text, extract some keywords, and just do your basic one-hot encoding of all the words in the corpus of all these rules." We have a hand-labeled set of a couple hundred rules, and we actually got a nice clustering of certain rules that played well together: certain keywords meant that a rule was a certain classification. Before that, we were hand encoding these rules: our engineers were taking single lines of text and writing a function that would take a flight someone searched for and determine, "Yes, this rule set is satisfied by this flight, or it is not satisfied." Now we've started auto-encoding some of these rules, and a lot of that came out of this hackathon project. So that's been pretty interesting, but again, largely we're not really building models. It's a lot of regex, a lot of custom rules, a lot of decision tables, and there's a lot of engineering that goes into that. Again, props to our labs team for helping build this all out. From a data science perspective, what's been interesting is that we've been able to step in and think about how you test this, how you validate that what you're doing is correct. So we've been approaching it and helping those guys from that perspective: building validation sets and thinking about how we randomly sample from our set of rules to figure out how we're doing. What's the false positive rate? What's the false negative rate?
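As a rough illustration of that hackathon idea (one-hot encoding the words in each contract rule line and letting similar rules cluster together), a minimal sketch might look like the following. The rule texts are invented, and the transcript doesn't say which libraries the team used.

# Hedged sketch: one-hot encode contract rule lines and cluster them, so that
# rules sharing keywords (and, empirically, a classification) group together.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

rules = [
    "discount applies to refundable fares booked in class Y",
    "commission of 2 percent on transatlantic routes",
    "discount applies to nonrefundable fares booked in class M",
    "commission of 3 percent on domestic routes",
]  # invented examples; the real corpus is thousands of extracted lines

# binary=True gives presence/absence (one-hot) features per rule line
vectorizer = CountVectorizer(binary=True, stop_words="english")
X = vectorizer.fit_transform(rules)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
# Rules with shared keywords land in the same cluster, which is what makes
# mapping a cluster to a hand-written rule function ("auto-encoding") feasible.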
And you were asking a little bit about progress. We've encoded something like a couple hundred contracts right now. We're starting with the big three domestic airline carriers, so that's Delta, United, and American, and between the three of them there are something like 500 to 600 contracts. So we're making our way through those. We've built some really great tooling to handle these generalized rules and to figure out, when we see a line of text, is it something we haven't seen before, or is it something we can auto-encode and just map to a function we've written? Curtis: And can you give me some idea of the scale of this? 'Cause I find a lot of times people, especially from the business world, don't appreciate just how hard a lot of this still is to do. Can you give me a sense of the size of your data science operation, how many people work on this, and how difficult it is? Ian: Yeah. We started working on encoding these contracts and building a system to handle that probably six months ago. We have a team of three engineers full time, and then myself and another data scientist who have been serving as subject matter experts, both in terms of testing and validation and in terms of the airline industry: how to handle certain attributes of flights, tickets within flights, cabins, and all that kind of stuff. And yeah, it's been a huge effort. We have two custom, home-built services extracting these parameters and storing them in databases, and another service that's serving them up on the front end to the user within our product. So yeah, the scale: it's a lot of work. Curtis: Yeah, that sounds like it. There's so much we could dive into here, but one thing I wanted to make sure we touch on, 'cause I just thought it was so interesting, is that you have a tool that can help you understand if your flight is going to be delayed, right? And the way you approach that, some of the data you're using, Dark Sky, things like that: I just think that's a really interesting thing, so I'd love to dive into it a little bit. Ian: Yeah, sure. That's kind of been my pet project for the last year. At one point at Upside, we wanted to really focus on what we were calling proactive analytics: how could we anticipate issues and problems that a traveler might experience on a trip? Because when you're traveling for business, anything that can go wrong will go wrong. No one's a professional business traveler; it's just something you have to get through. And flight delays seemed like a pretty green space, and it still is. It's hard to predict when a flight is going to be delayed; there are a couple of companies out there that are doing it, or claim to do it. So what we did was this: the FAA has a database that's online, and the airlines actually have to publish the outcomes of all of their flights.
I think it's a routing thing: if they run a route N number of times, they have to publish the results of that route throughout the year to the FAA, and the FAA has this data going back, I think, 10 or 15 years. So there are a couple of gigabytes' worth of very clean data. It's a really great data set for anyone who's interested in data science to play around with, and I think even Kaggle has used it, not in a competition, but in one of their learn-how-to-be-a-data-scientist modules. One of the interesting things we found is that about 20% of flights are delayed at any time, and of those, about half are due to weather-related delays. So that's 10%: one in ten flights is delayed due to weather. At the time, about a year ago, a couple of my coworkers who were engineers were like, "Oh my God, Dark Sky. I have this app. It's amazing. It's super easy to use, and it's very accurate." And it turns out Dark Sky also has a great API that is fairly cheap; I think it's about a dollar per 10,000 calls. The API will give you both historical data and predictions (I think they go up to a week in advance), everything you would see in the Dark Sky app on your phone, through a single endpoint. It's a single GET request. So what I did was take the FAA data and merge it with the weather data for the weather-delayed flights, then go through the whole data pipeline: feature extraction, training a bunch of different models, hyperparameter tuning. And there was a lot of signal there. In terms of binary classification performance, I think we were getting upwards of 85 to 90% accuracy on a balanced data set, 50/50 delays and non-delays. That's much better than if you were to flip a weighted coin, knowing a priori that 20% of flights are delayed. So it was something we thought was super interesting and that could help with this proactive analytics initiative. What we ultimately ended up doing was building a website that is free and open to the public, labs.upside.com/delays. You can put your flight information in there up to three days prior to your flight, and it will give you a probability of delay: whether your flight is going to be on time, 30 minutes to an hour delayed, an hour to two hours delayed, or two-plus hours delayed. And I've been running a pipeline to retrain the models, to smooth out some of the seasonality, for over a year now. They get retrained every week on historical data that is plus or minus a week of the training date, and we swap out the models on a weekly basis. I use it for my trips; my wife is a consultant, and she uses it for her trips; and a bunch of people at Upside are still using it. We've found it's pretty useful, a pretty cool tool to use.
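A minimal sketch of that pipeline: join FAA on-time records with historical weather, downsample to a balanced 50/50 set as described, and train a binary classifier. The file names, column names, and model choice are all assumptions for illustration.

# Hedged sketch: flight-delay classifier trained on FAA on-time records
# joined with historical weather. All names here are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

flights = pd.read_csv("faa_on_time.csv")         # hypothetical export
weather = pd.read_csv("historical_weather.csv")  # e.g. pulled from a weather API
df = flights.merge(weather, on=["origin_airport", "date"])

# Downsample to roughly 50/50 delayed vs. on-time, as described above.
delayed = df[df["delayed"] == 1]
on_time = df[df["delayed"] == 0].sample(len(delayed), random_state=0)
balanced = pd.concat([delayed, on_time])

FEATURES = ["carrier_code", "dep_hour", "precip_intensity",
            "wind_speed", "visibility"]  # hypothetical column names
X = pd.get_dummies(balanced[FEATURES])
y = balanced["delayed"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
print("accuracy on held-out balanced data:", model.score(X_test, y_test))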
Ginette: A huge thanks to Ian Cassidy for being on our show. As always, head to datacrunchcorp.com/podcast for our transcript and attributions. Attributions: Music: "Loopster" by Kevin MacLeod (incompetech.com), licensed under Creative Commons: By Attribution 3.0 License, http://creativecommons.org/licenses/by/3.0/ Some of the biggest machine learning gains organizations see come when they initially implement ML models. That was the case for Upside Business Travel. Ian Cassidy, former senior data scientist for Upside, explains how machine learning has helped the company, specifically how it initially increased conversion rate by 25%. The post Machine Learning and Flight with Ian Cassidy first appeared on Data Crunch Podcast.
26 minutes | Dec 1, 2020
Implementing ML Algorithms with Ylan Kazi
Does it feel like your stakeholders aren't open to adopting your team's algorithms? Ylan Kazi shares his experience on how to conquer this type of problem. Ylan Kazi: And that is very important. But I think what we find, especially in larger organizations and in trying to implement these things across an enterprise, is that the relationships and the communication are really key. Without those, you can have the best algorithm in the world, but it will be impossible to implement it. Ginette: I'm Ginette, Curtis: and I'm Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics, training, and consulting company. If you want to become the type of tech leader we talk about on our show today, you'll need to master algorithms, machine learning concepts, computer science basics, and many other important topics. Brilliant is a great place to start digging into these subjects. You can learn at your own pace, whether that's brushing up on the basics of algebra, learning programming, or digging into cutting-edge topics like neural networks. Brilliant is a website and app that makes learning accessible and fun. Their approach is based on problem-solving and active learning. Their courses are laid out like a story and broken down into pieces so that you can tackle them a little bit at a time. Sign up for free and start learning by going to Brilliant.org slash Data Crunch, and the first 200 people that go to that link will get 20% off the annual premium subscription. Now on to our show. Today we chat with Ylan Kazi, VP of data science and machine learning for UnitedHealth Group. Ylan: I've been in healthcare analytics for most of my career, but I would say I really stumbled into data science and machine learning a few years ago. My background is actually in healthcare administration, and I was supposed to go the healthcare administrator route, but I ended up going into healthcare consulting. I started off doing healthcare consulting with electronic medical records and then transitioned into healthcare management consulting. From there, I worked at Target Pharmacy within their healthcare division for three years, and then I transitioned over to UnitedHealth Group. Originally, when I started on my team close to four years ago, it was mainly an advanced analytics team, pretty heavy into SAS and SQL. Then, once we started to see the power of predictive analytics, we really transitioned the team into more of a data science capacity. Curtis: And you said that there was a point where you saw the value of predictive analytics and you switched your team. I'm curious, because the transition points are interesting: what was it that made you say, "Hey, we should start doing this more"? Ylan: The biggest thing was we were giving insights to many of the business partners we work with. My team is embedded side-by-side with the business, and we were looking backwards, almost: looking into the past and giving those insights. But what that wasn't doing is driving future action. So really, the power of predictive analytics and machine learning, in our case, is being able to predict human behavior. And by doing that, we were finding that we could be a lot more proactive and really help out with some of these severe health conditions that our members have.
So that was the 'aha' moment that we had. Curtis: Got it. That's awesome. And now that you're doing some of these things, can you give us an overview of the kinds of things you're looking to predict, or are predicting, and the impact that has on people in your industry? Ylan: Sure. We focus on basically improving health outcomes for our members. What that means at a more tactical level is figuring out which members have chronic diseases, things like diabetes or heart disease or cholesterol issues. There are quite a few more, but those are the common ones that affect a lot of people, not only our members, but across the US and around the world. And these diseases are very costly over the long term. If you take somebody with diabetes: if it's not managed correctly, that person can end up going to the hospital; they can end up in a lot of bad circumstances. Curtis: Got it. That's really interesting. And what kinds of accuracies are you gunning for here? Oftentimes, if a model is even 60% accurate, it's better than guessing, but I'm assuming in the healthcare industry you need a higher threshold. What are you looking at there? Ylan: Sure. Depending on the model and the different disease states, we're looking at anywhere from 85 to 90%, which is pretty high when you think about it. You're trying to predict very complex diseases and severities, and really what the member is going to do and how they're going to respond to an intervention. So we try to prioritize having a very high level of accuracy. The other reason we do that is that many times we're working directly with different providers and clinicians (think doctors, nurses, pharmacists), and there really is a higher bar, a higher standard, because if a provider doesn't trust a model, they're not going to use it. So we also have to provide a pretty big level of detail to our providers and really try to educate them on how they can best utilize the predictive scoring. Curtis: And has there been a lot of pushback on that, people not wanting to trust the algorithm, or have people generally seen the value in it? Ylan: I'd say initially there was pushback. Luckily, we had created some pretty strong relationships, not only with our business stakeholders but also with our provider stakeholders. But there is always that hesitation, because the first question we always get is: okay, you predicted that the member was going to do this; why? When you're using traditional machine learning algorithms, they have a pretty high level of explainability, but when you start to use more complex ones, or things like neural networks, it's very challenging to have that level of explainability. That can really create mistrust, because people are a lot more willing to trust a human than a machine or an algorithm, and they're even more forgiving when a human makes mistakes than when an algorithm makes mistakes. So it's really about being as transparent as possible and showing the value of these predictions, not just for one member or a few hundred, but over millions of members. I think that was something we learned slowly, but we found that the more transparent we were, and the more we could partner with our providers, the more effective these models were. Curtis: Got it. That's interesting.
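On the explainability point: for tree-based models, per-prediction feature attributions are one common way to answer a clinician's "why?". A minimal sketch using the shap library follows; this is an illustration only, since the transcript doesn't name the tooling Ylan's team uses, and the data below is invented.

# Hedged sketch: SHAP values for a tree model, showing how much each feature
# pushed a member's predicted risk up or down. All data here is invented.
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

X = pd.DataFrame({
    "age":           [54, 67, 41, 72],
    "a1c":           [6.1, 8.9, 5.4, 9.5],
    "med_adherence": [0.9, 0.4, 0.95, 0.3],
})
y = [0, 1, 0, 1]  # illustrative label, e.g. hospitalized within a year

model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
# Each value is one feature's contribution to one member's prediction (the
# exact array layout depends on the shap version); this per-member "why?"
# is the kind of detail that helps build provider trust.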
And can you give me a concrete example, so we can understand on a personal level how these predictions affect someone in their life, how they can help them? Ylan: Sure. If we take a member that has diabetes, for instance: depending on the severity, this person is going to be on one or more medications. What we really can do is predict, over the course of a year or a few years, how this member is going to progress in their disease state. The more they can take their medications or see a physician, generally the better off they're going to be, and the better they're going to be able to control their diabetes. We'll find that there are a lot of different barriers to someone managing their diabetes, and we try to utilize some of that information within our modeling, to better inform our business stakeholders as well as our provider stakeholders. Curtis: Once you predict that someone needs an intervention, how have you found it works to actually help them get the intervention? Does the data play a part in that, or is the data just there to let the provider know that something needs to be done, and then they can handle it? I guess what I'm getting at is: is the data helpful in actually helping a patient take action, or is it more that the data informed someone who then tells the patient they should be doing something? Ylan: It's more around the informed piece. We don't want to go in the direction of using one of our predictions to solely drive any type of care; it's more of a tool that our stakeholders can use in addition to what they already have. At the end of the day, we want to inform, let's just say our physicians in this case, which members they need to focus on more, but we're really relying on that physician to use their clinical judgment and provide that care. So it's more of a partnership instead of a replacement. Curtis: So instead of going direct to the patients, it goes through the care providers, and they can help. That's really cool. I'm seeing that model in a lot of other places as well; it seems to work, and it seems like you're having success with it. Now we're going to stray a little bit away from the details of the models you're running. You've had lots of experience taking these models from an initial idea, a proof of concept, making them successful, and then actually implementing them in a company, and there are a lot of steps there. So I'd really love to dive into that process, to help our audience members understand: where are the pitfalls? How do I succeed with this? How do I make a whole project successful? Ylan: Sure. I think that many people in our industry, as they become data scientists, have such a strong focus on technical development: knowing the right languages, understanding the algorithms, having the math background, et cetera. And that is very important. But I think what we find, especially in larger organizations and in trying to implement these things across an enterprise, is that the relationships and the communication are really key. Without those, you can have the best algorithm in the world, but it will be impossible to implement it. And I think that was something that was very eye-opening for me.
And it's also something that, as I've talked to my peers within the healthcare industry, and even in the broader industry, is a common pain point. We have a great team, very talented; they create great models. But when we try to actually get those models implemented, and we want our company to use them, that's where many of them fail. From my standpoint, I learned pretty early on that I needed to do a better job of creating these relationships, maintaining them, and then also showing the value of these models. That's a huge piece: if it's not going to, in our case, improve patient outcomes, or if it's not going to have a positive ROI, it's going to be very difficult to convince people to use it. So that's a big part of it. The other thing that comes to mind is that many people are used to doing something that works really well, and it can be very challenging to convince them to try something new that could even outperform their current method; because their method is working so well, they can be very hesitant to transition to the new way. Curtis: How do we do that? You've had success now in walking this process. What advice would you give people who are trying to do this and maybe running into the same problem: people who want to implement something that works better, but their stakeholders are comfortable with what they have? How do you go about making that change? Ylan: The way that I started was by creating a business case: finding a business problem or a business challenge and figuring out whether machine learning could be applied to it. And in doing this, you don't just create one business case. You can identify 5, 10, 20 different business cases; start with a portfolio. Then, from there, determine with a critical eye which one or two business cases out of all of those would warrant machine learning, because, contrary to what I think many people outside the industry believe, it can't be applied to everything. It's really a very specific tool, and we want to find where it will provide the most value. I think it starts there. Then there's the relationship piece: bringing on whoever is going to either be a part of the solution or be affected by the solution, bringing them on very early, and getting their feedback. It's always fun to disrupt and innovate and be the person in charge of it, but we've all been on the other side, where somebody else is doing the disruption and the innovation, and it changes what we do in terms of our work or our role. So you've got to be very empathetic from that point of view. And then the other piece is bringing in the subject matter experts. With any of the modeling that we do, or that people do in other companies, the data scientists will eventually develop that subject matter expertise, but in our case, I can't expect them to be a doctor or a nurse. So it's very useful, and also very eye-opening, to include subject matter experts early on. Curtis: A couple of questions there. One is, you mentioned building out business cases. How do you go about identifying good business cases where machine learning can have an impact? Are there certain criteria or features you look for in the business that say: okay, yeah, this would be a really good machine learning problem to solve that could have a high impact?
I think, again, a lot of people may be trying to do this, right? The ideation phase of what we could even do with machine learning: what's a good case, what's a bad case? Are there things that help you do that? Ylan: Sure. The biggest way we do that, once we create our portfolio of business cases, is just prioritizing them: which business cases are going to help the most people, and which are going to have the greatest return on investment. Because if a model is going to help a handful of people and only add $20,000 in value, it probably doesn't make sense to spend three months on it. But if it's going to potentially help millions of people and create millions of dollars in value, that is something that piques our interest and where we'll go a little more in depth to really scope it out. Now, it depends on the industry you're in; obviously we're in healthcare, and we're all about our patients and our members, but I think that same type of prioritization can be applied in different types of organizations. I'm sure they'll be more revenue focused or profit driven, but that seems to be one of the best ways to do it. So that's normally how we start. The only other consideration that has come up a few times is when there's a business problem, and a lot of traditional analytics have been applied to it, but the problem is still there. That would indicate that the current solution is not working: could machine learning be applied to it and really help with the business challenge? Curtis: You also mentioned the importance of domain expertise, right? There are a couple of different ways people approach this that I've heard. One is they hire the data scientists and machine learning engineers that have that domain expertise already; or they hire people with the domain expertise who are analytically inclined and train them on the data science; or they just hire someone that's really good at data science and bring in subject matter experts to work with them. Have you had experience with both of those models, or do you have an opinion on which one works better? Maybe both of them work; what are your thoughts there? Ylan: I would say I've had more experience with having the subject matter experts, meaning the data scientist who has a healthcare background. I think the benefit is that when they first join your team, you're not having to onboard them on both your processes and healthcare knowledge, so they're able to onboard much quicker and really get into the details much quicker. I have had a few data scientists on my team come from outside of healthcare, and it's not necessarily a bad thing, but it's different: you're having to take them down two training paths, one on your processes and the other on healthcare knowledge. That model, though, can also be advantageous. Bringing in a data scientist who doesn't have the subject matter expertise means they're often coming from a beginner's view, so they can ask very simple questions that, if you've been in the industry a while, you wouldn't even think about asking.
So I would say that, in an ideal world, whatever industry you're in, you would have a few data scientists on your team that have experience in the industry, but at the same time at least a few who can bring a fresh worldview to your team and ask some of the simple questions that people in the industry would not have thought of. Curtis: I'm curious if there are any experiences that come to mind where someone like that was on your team and asked a question, and you thought, "Oh, that's out of left field," but then it sparked an idea. Is there a use case where that has happened? Ylan: I'm trying to think of the best one. I would say, more generally, thinking of a few of the different instances where it occurred, it's usually around speed. When you're doing machine learning in a healthcare organization, it's one of the most highly regulated industries in the world, and there are so many rules, regulations, and patient protections in place that have to be followed. What can occur is that things move very slowly, because people obviously want to stay compliant and make sure they're not running afoul of any rules and regulations, which is a good thing; that's the last thing you'd want to do. But that can almost go too far sometimes, especially when you're doing machine learning experiments or initial test cases. What I've found is that it's been nice to see, with my data scientists who have come from outside the industry, how fast they can move and how they can apply that even within a very heavily regulated environment. So from that standpoint, it's really around speed, and how to be very efficient even when you're given a lot of constraints and have to work across the enterprise. Curtis: Now, you mentioned something when we talked before that I wanted to touch on here and get your thoughts on. You mentioned that when you're trying to explain machine learning solutions, when you're trying to get input or buy-in from other leaders, there are two ends of the spectrum. Sometimes people think of AI as Skynet, which is going to come in and kill everyone; some people think of it as just some basic analytics. But really they should be thinking of it in the middle. Is there a way that you help non-technical business leaders understand what machine learning is and how it can help them? Ylan: Yeah, that's where I would say I spend a lot of my time, actually. If I think back to a few years ago and the evolution of my team, that's something I did not spend enough time on initially, and I really underestimated how long it would take. Because what you find, as you create relationships and maintain them with all of your different stakeholders, is that it's very easy, because I've been in the field for a while, to assume that people have the same level of knowledge that you do. And they really don't. Especially with our stakeholders who are not technical, it really starts by educating them and showing them: what is AI, what is machine learning, what can it do, and what can't it do?
Because they're getting bombarded by vendors and by marketing: oh, AI is going to cure everything. Or even the scarier stuff, like you mentioned about Skynet: it's going to take over humanity, and humans won't be here anymore. So it's the education piece. People use the term evangelism, data science evangelism, but it's showing the value while also being very rational about it and not being alarmist. I think that has helped immensely, because then your stakeholders are able to ask sometimes very basic questions, and they feel comfortable asking them because you've created a safe space. Many times people are afraid to ask a question because it's so basic and others won't think they're smart, but it's very important that they understand what machine learning is, how it can be used, and the fact that it's one of many tools. Curtis: And how do you keep up with that? The space moves so fast, and there's so much going on; new research is coming out almost every week. How do you, as a practitioner, keep on top of what machine learning can actually do, how it's moving forward, and how you can take advantage of these new developments? How do you stay on top of it all? Ylan: Really, the biggest way is just reading: reading about some of these new developments and looking at newer research papers as well. I find that can be very helpful. I do actually try to avoid reading about artificial intelligence or machine learning in the regular news, because many times it's either alarmist, which is not helpful, or it is not interpreted correctly. It's almost as if the reporter took a snippet and then tried to expand it without making it relevant. So that's always a challenge, getting bombarded with it, but definitely read the research papers. And then, with my team in general, all of us are trying to stay as updated as we can, given how fast the field is moving. So I also look to my team, and if there are new developments that could help us out, or just something people find interesting, we do knowledge shares across our team. Curtis: Are there certain places you go to look that you would recommend to people, like, yeah, this is a good source where you can find legitimate information? Ylan: One of the best ones I've found, which is a little more detailed and in depth, is called arXiv. Pretty much every day there are new papers being published on that site, and it's very easily searchable, so if you have a specific subtopic you're looking for, you can probably find it on there. Outside of that, in terms of the more major publications, even Nature every once in a while will have some articles around machine learning or AI. I'm trying to think if there are any others: National Geographic here and there. I can't think of any other major ones. I would say avoid any of the major newspapers, right? Curtis: You mentioned arXiv, which is a great resource, maybe a little more on the research-paper side, so it's fairly technical. Have you found value from looking at arXiv? Has it been more to expand your mind on what the technology is doing, or have you also been able to take certain papers and actually implement those models in things that you're doing? I'm just curious what value you and your team extract from there.
Ylan: I think the biggest value is testing and experimentation. For some of the papers written on arXiv, we're probably still three to five years out from actually implementing those solutions, but it does help to spur a higher level of creativity. And I think when you see something that somebody else has done, all of a sudden it's a lot less intimidating to try to implement it yourself, versus being the first mover and having all that uncertainty. So it can actually create more confidence in your team, to get more creative and really push the limits of innovation. I've found that to be one of the best parts of reading some of these papers and doing some of these experiments. Curtis: So we're coming up on time here. I want to leave you with the last words. If there's anything you feel we've missed, anything important you'd like to share with the audience, or even how to get in contact with you or your company, I'll let you take it. Ylan: I would say, just from the discussions that we've had: artificial intelligence and machine learning are really going to be impactful in the future. They're going to fundamentally change how we do business and how we work. They're going to change every single industry and really become infused in all industries. So it's a very impactful change, and it's going to be amazing to see what people create and how it's used for good. The flip side, though, is that any technology is amoral, right? It's really up to how people use it. Machine learning can also be used very unethically or very dangerously, and that's something we have to keep in mind. One of the things that has not been discussed enough in the entire industry is machine learning ethics, and I think this needs to become more front and center, because it provides a good framework for what we should and should not be doing with machine learning. Curtis: Agreed. There are some inroads there, but it is definitely not enough yet. Thank you so much for being here. This has been a really great episode, I think, and people will appreciate hearing your expertise. You've done a lot of interesting things. Ginette: A huge thanks to Ylan Kazi for being on our show. As always, head to datacrunchcorp.com/podcast for our transcript and attributions. Attributions: Music: "Loopster" by Kevin MacLeod (incompetech.com), licensed under Creative Commons: By Attribution 3.0 License, http://creativecommons.org/licenses/by/3.0/ The post Implementing ML Algorithms with Ylan Kazi first appeared on Data Crunch Podcast.
19 minutes | Oct 31, 2020
Hiring Top Tech Talent
Hiring top tech talent is hard, especially when these people are in high demand. So how can you build your tech team? HR and hiring experts Laura and Theo talk about their process. Laura Ianuly: I think the most important thing, the best advice I could give a hiring manager, is making sure that they understand what they're looking for: they've defined it, and they understand it. Secondly, being certain that they recognize it when they see it. And then thirdly, they act. Failure to do those three things well is going to give you an inferior recruitment process, and it's going to impede your ability to build the team as quickly as you need to. Ginette: I'm Ginette, Curtis: and I'm Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics, training, and consulting company. Ginette: If you want to become the type of tech talent we talk about on our show today, you'll need to master algorithms, machine learning concepts, computer science basics, and many other important concepts. Brilliant is a great place to start digging into these. The nice thing about Brilliant is that you can learn in bite-sized pieces at your own pace, and with a bit of consistent effort, you can tackle some really tough subjects. Their courses have storytelling, code-writing, and interactive challenges, which makes them entertaining, challenging, and educational. Sign up for free and start learning by going to Brilliant.org slash Data Crunch, and the first 200 people that go to that link will get 20% off the annual premium subscription. Ginette: Now onto today's show. Laura: I'm Laura Ianuly, the founder and CEO of Ianuly Talent Accelerators. We provide HR and recruitment strategy for venture-backed startup companies across ad tech, fintech, and health tech. We're based in Seattle, Washington, and in New York. Prior to founding this business, I was the global head of HR and recruiting at DoubleClick. I joined as employee 70 or 75, and when I left, we had over 1,500 employees and had gone public; at the time, it was the most successful IPO in New York. Curtis: That's awesome. And how about you, Theo? Theo: My name's Theo Ianuly. I'm the COO and CMO, based out of the Seattle office. Three years ago, Laura brought the opportunity to me because she was growing at a rate where she needed help to continue to reach new audiences and new verticals in the startup space. So in the past three years, we've grown, and we've started to work with startups not only in Seattle, but in New York and beyond. Curtis: That's awesome. And so you see a lot of startups. I'm assuming a lot of these startups are in the tech sector; they need data teams, they need people who can do machine learning and data science and all these kinds of things. And the word is that there's a big talent crunch with these kinds of people: they're hard to hire and hard to keep. You live in this space, so maybe you can give us a street view of what this looks like practically for companies and how they're navigating it. Laura: Absolutely. There is definitely a war for talent. I think anybody who's looking for a job feels it, because they're bombarded with calls from prospective employers needing to build their teams, or from recruiters, and they have, you know, the pick of the litter.
And I also think, from the client side, it's just a real pain in the neck to build their teams, because there's so much demand and so little supply of really great top talent out there. If you look today at what's happening in supply and demand, specifically for data science and machine learning talent: today there are 27,000-plus open data science roles in the US, and there are fewer than 10,000 data scientists looking for jobs. For machine learning/AI, in the US today there are over 50,000 openings, and only 3,600 people looking. So that's the problem we're in right now. Curtis: Sure. So, given that that's the landscape, and it's difficult to find these people, how do you approach it? How do you help people find and hire talent? Where do you start? Laura: A couple of different approaches. The first thing we will always do is go out and meet with our clients. And because we're working with startups, oftentimes we'll meet with the VCs to get a better understanding of why they made the investment in this idea, this product, this service. Every time we have a conversation with a candidate for one of our clients, it's an upsell. It's not as simple as saying, "I've got a data science job at Google" or "I've got a data science job at Oracle." Every single discussion is an upsell. So we're not just selling the product or the service; we also need to sell the management team, the executive team, and their track record of taking a business public or successfully championing an exit, whether it's an acquisition or going public. That's exactly how we start. We have deep-dive discussions in our kickoff calls with clients to find out exactly what our hiring managers are looking for. Data science, machine learning, AI: that's a function that's new and growing, and everybody wants to incorporate that technology into their product. But what's the product? Is it a B2C product? Is it a B2B product? Is it an e-commerce product? Is it direct to consumer? Making sure that the engineers and data scientists they're looking for have an understanding of the fundamental product and the industry it plays in is important. Curtis: Right, so you're finding domain knowledge is very important to the success of these people joining the teams. Is that correct? Laura: Exactly. That's exactly it. Curtis: Okay. So domain knowledge is one thing you look for. And when you are building out these teams, and I know everyone has different products and different industries, is there a typical structure you look for in starting to hire a data team? Meaning, would you go for someone who's a really good communicator and a good leader first, and then hire a data engineer under them, and so forth? How do you structure this well? Laura: When we're brought into a company, usually there's somebody already working in the data function. A lot of our CTOs and founders have technical backgrounds. So initially there's a point person: whether they're contract, full-time, a consultant, or an advisor, somebody's in that role. When you're building out a team, a lot of critical decisions get made. Are you going to hire your team offshore, or are you going to hire them here? Are you going to be able to compete for top talent within your marketplace?
Ninety-nine percent of the clients, I think, start from the top down: they look for a strong technologist who also knows how to build teams and distribute workloads. Curtis: Okay, so that's the first hire. And then what does the process look like? You've made the decision to make a hire here, and you're looking for these people. There are a lot of ways you could approach this: recruiters throwing stuff up on Indeed, looking at GitHub profiles, doing the interviews. What questions do you ask? Can you take us through the process of doing this well? Laura: Sure. The best hiring managers are the ones that make recruitment a key strategic initiative for their business. It can't be something that you try to do on the side; as we said earlier, it's just too competitive. So the right way to go about it is to set up a weekly touch-base meeting with our team. We are on the hook to deliver the top talent our team has recruited since last week's touch-base recruiting meeting. We pitch the top candidates to the hiring manager, and they start the interviewing process. The next week, we debrief them on the people they interviewed, and there's a constant force-ranking of those candidates. There's also a lot of self-discovery that goes on with the hiring managers; sometimes they might be pivoting, redefining, or asking us to look in a different area. So we continue to refine what we're looking for throughout these weekly touch-base meetings, so that we are targeted and honing in more on what the client's looking for. Once we identify candidates the hiring manager is interested in, they'll certainly want them to meet people on their team, to see if they fit and if they can collaborate; they want additional inputs. Sometimes these interviews involve a case study; sometimes they involve a technical test or exam. But more often than not, our clients are not looking only at technical skills: they're looking at a candidate's ability to program and code and contribute to a team of people doing the same activity. Curtis: Interesting. And how do they effectively judge that? It's sort of a mix of hard and soft skills. How do they figure out who's good at that? Laura: It's a process of elimination. There's a great diversity, in my opinion, among hiring managers in their level of experience with hiring. I think the most important thing, the best advice I could give a hiring manager, is making sure that they understand what they're looking for: they've defined it, and they understand it. Secondly, being certain that they recognize it when they see it. And then thirdly, they act. Failure to do those three things well is going to give you an inferior recruitment process, and it's going to impede your ability to build the team as quickly as you need to. Curtis: And maybe on the flip side, too, it's an interesting question: if you're someone who is looking for a job, obviously you have a lot of opportunity, but what matters most for them? Is it "I have a really good GitHub profile"? Is it "I've been to an accredited university"? Is it "I've taken these certifications online"? What's the most important thing they can bring to the table? Laura: Oh, I love that question. My team and I believe that we're like agents for the tech talent out there.
We stay very close to top tech talent, and we want to be the people they refer to over and over when they're conducting a job search. I mean, Michael Jordan wouldn't do a deal with Nike without an agent, and right now supply and demand has made these top tech people in the States like the Michael Jordans. So I think it's very important for these top technologists to have representation: representation that knows the client, knows what they're looking for, preferably has a history with the client, has placed other people into that company, and can reference what those new-hire experiences have been, and that can prep the candidate and debrief the candidate. Failing to do that is not going to set you up for success. These kinds of deals just don't roll over the finish line on their own; they need to be managed. And also, you've got to declare . . . if you're a candidate and you're going on an interview, and you know what you're looking for, then when you're in that interview with a client, declare that you're interested. Let them know this is something that you want to do. Let them know that you've interviewed with a bunch of other companies, but this is looking like the number one you're interested in. That's not inherently part of the personality of an engineer, who might be a little more cerebral and introverted, so we try to encourage them to do that. Curtis: Right, show your interest. I was reading an article the other day about how the university degree is dead for the tech space. Do you feel like that is starting to be more true, or is it still an important aspect? Theo: When we're in the process and out there looking for candidates, it really depends on the hiring managers and what attributes they're looking for. Sometimes we'll find candidates who have a wealth of startup experience, and that brings its own value to the table. Other times we'll have hiring managers that are very focused on the educational aspect of the candidate. And then we might have another client partner that is really focused on whether that candidate has experience with one of the big companies really pushing the envelope on machine learning and AI: is it Google? Is it Amazon? Is it Oracle or Microsoft? So it really depends on the hiring manager's focus. Curtis: So you see a mix still. So once you've gone through this whole process and hired some great people to work and help you in this space, how do you then retain them? Because one of the other problems I often hear is that people don't stay in jobs, 'cause, again, like you're saying, they have all of these recruiters constantly trying to pull them away to other opportunities. How do you keep talent at your company? Laura: It's a great question. All companies are in different modes with their tech: some are in fast-growth mode, and some are a little more in maintenance mode. The world we operate in is super fast-growth mode, right? Because these companies are trying to really get their product out and get a user base on board. And different technology candidates excel at different phases of the process. There are some people that are just amazing with companies that are going from, you know, a million to 25 million in revenues.
And if the company gets big, like over 200 people, some technologists just want to go right back into the early stage. So there are different stages of companies, and the stage we focus on is that earlier stage, so we're constantly looking for candidates that have experience at that stage. On the retention question you asked: what I get a lot is, if a technologist has worked at a company for two or three years and that company has really grown and been successful, we'll get a call saying, "Find me exactly what I'm in now, but three years ago. I want to do it again, 'cause that's a phase that I love." Curtis: So it even depends on the phase of the company; people are really good at certain things, maintaining or building and so on. Laura: Exactly. Curtis: That's awesome. I'd love to hear one or two concrete examples of companies you've actually worked with that have gone through this process, so we can see how it worked for them. Laura: So we had a company, 33Across, that we worked with. They were a technology publisher, a SaaS-based tool, on a massive growth trajectory, based here in New York. Their CTO was based in Sunnyvale, California, a super sharp guy with a lot of experience. We took the team from about six or seven to over 25 in less than a year. This is what I really respected about this hiring manager: he would meet with me quarterly to present the product roadmap, so that my team and I understood the contributions the people they hired through us were making, and the kinds of new skill sets they would need throughout different phases of the company. And it was successful. We had four full-time recruiters on my team looking for people. We were also able to convince them to consider hiring remote talent; that was a significant milestone and gave them the ability to grow faster than they would have otherwise. I don't know the details of exactly how, but this CTO had exceptional retention on his team. People were not leaving; people were constantly growing in their roles; people were rotating jobs on the tech team. It was just a very high-touch collaboration between us and the client. It was great; it was a good situation. Would you like me to share a story from the other side, from the candidate side? Curtis: Sure. Yeah, that'd be great. Laura: So there was a candidate, a number of years ago, that I hired into a CTO role for a startup company. We worked together within this company for about a year, and he did some great things. The company grew quickly, and we both left at about the same time, and he was looking to go back to a very early-stage company to start pulling together the roadmap and developing out the engineering team. So this individual, over the past five or six years, has not only been a candidate and then a hiring manager; he went back to being a candidate again, and just two months ago he became a hiring manager once more. It's interesting when you develop that relationship, because when he says, "I need somebody who's got some street smarts. I don't care about education, but they've got to have worked on a team successfully," I know him and know what he means, and that makes it a lot easier.
Ginette: Thank you to Laura and Theo for being on the show today, and as always, go to datacrunchcorp.com/podcast for our transcript and attributions.
24 minutes | Sep 30, 2020
Making Data Assets Profitable with VDC
21 minutes | Aug 27, 2020
Machine Learning with Max Sklar
31 minutes | Jul 30, 2020
Think Differently with Graph Databases
30 minutes | Jul 16, 2020
Data, Epidemiology, and Public Health
With recent events being what they are, epidemiology has come into the spotlight. What do epidemiologists do and how does data shape their everyday experience? Sitara and Mee-a from “Donuts and Data” fill us in. Ginette: I’m Ginette, Curtis: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. Many people are on the lookout for online math and science resources right now, particularly data and statistics courses, and whether you’re a student looking to get ahead, a professional brushing up on cutting-edge topics, or someone who just wants to use this time to understand the world better, you should check out Brilliant. Brilliant’s thought-provoking math, science, and computer science content helps guide you to mastery by taking complex concepts and breaking them up into bite-sized understandable chunks. You’ll start by having fun with their interactive explorations, and over time you’ll be amazed at what you can accomplish. Sign up for free and start learning by going to Brilliant.org slash Data Crunch, and also the first 200 people that go to that link will get 20% off the annual premium subscription. Now onto the show. Curtis: I’d like to welcome Sitara and Mee-a from the Instagram account Donuts and Data to talk to us today. I guess let’s just have you guys introduce yourselves, as opposed to me trying to introduce you, ’cause you know what you do better than I do. So maybe we just have some introductions. Sitara: So I’m Sitara, one half of Donuts and Data. I’m a PhD student in epidemiology at the University of Texas Health Science Center. I’m also a research assistant in a lab that I work in. Mee-a: And I’m Mee-a. I am an infectious disease epidemiologist that works in the public sector. I actually met Sitara through the lab that she’s currently working in. Curtis: Nice. And I’m excited to have you guys on. I just, I think epidemiology is a really interesting space, especially with what, you know, with what’s going on now with COVID. I think it’s more pertinent than it ever has been. Not that it ever hasn’t been pertinent, but maybe it’s more top of mind for people. So I’d love maybe just to have you guys level set with everybody, like what is epidemiology? There’s probably some confusion about what that is and maybe how you guys got into it. And then we can get into what your day to day is and, and what it’s all about. Sitara: So, epidemiology, I think everyone’s kind of understanding is studying patterns of disease in the, in the human population. And so in that sense, what Mee-a and I do is the same, but instead of studying infectious diseases or the natural science part of epidemiology, what I focus on is how human behavior contributes to those patterns of disease. So I look for patterns in data associated with things like demographics or just behaviors, diet, nutrition, and how that contributes to getting diseases. Mee-a: For me in the public sector, it’s going to be a lot of looking at incidence rates of infectious diseases. It . . . primarily with COVID-19 right now, and just different ways that we can try to possibly implement infection prevention measures.
So we are dealing a little bit more with, I don’t want to say the medical side of it because we aren’t clinicians, but we are dealing more with the medical side of, of the infectious disease than we are with, with the data, compared to when I was in academia, at least. Curtis: So take us through maybe the end goal, right? So what you guys are working on. You’re hoping to come out with, I think, some recommendations for people to, to take, maybe a better understanding of how the disease spreads, so we get in front of it. What does that look like? Mee-a: I always thought that epidemiology’s gold standard of what we try to achieve is probably smoking cessation. So, you know, at least growing up for me, I felt like cigarettes and smoking were very, very pervasive and widespread. And as we grew up and we started seeing more of these campaigns showing just how unhealthy smoking was and how much it can really, really be such a detriment to your health, it became a thing where now as adults, our generation looks down upon smoking. And so that’s something that I feel like epidemiology and public health in general has helped to implement that view. And so for the public sector of things, our ideal goal is to really implement infection prevention measures. So, in light of COVID-19, that would be making masking a normal thing, making sure social distancing is the new norm, making sure that we are washing our hands for the appropriate amount of time, making sure that when you do disinfect something that you’re disinfecting it properly. If we are in large congregate settings, that we’re trying to do everything that we can to make sure that we don’t create a hotbed of COVID cases. So that’s all the stuff that we’re trying to do right now. If everything goes correctly, ideally we would be getting to the point where we could either (1) control COVID or (2) completely eradicate it. So that’s, that would be our goal in the public sector. Sitara: And I think, going off of that, things like seatbelts were once seen as a radical change, but that was a public health measure. That was something where epidemiologists and people in the public health world, they looked at the data of car crashes and they decided that wearing a seatbelt was a safety measure that they could implement. And a lot of people were against it, but now that’s obviously the norm; it’s in every car. So I think similar to that, we hope that mask wearing becomes the norm and it becomes okay. And it’s not, it’s not scary. It’s not . . . there’s no . . . there shouldn’t be any stigma on wearing a mask. But in terms of academia, I think what we want is for people to be able to read our research and, and know that a lot of work went into it. And a lot of, you know, the scientific method; it’s evidence-based, and we’ve done these tests over and over again; this is real science. So I think in the end, we want people to read our research and take something away from it and, and be able to live a healthier lifestyle. Mee-a: The work that Sitara does in the academic field is what we build off in the public field. So we implement the measures that she proves in her research, if that makes sense. Curtis: Yeah, no, that’s awesome. And I’d like to maybe dig into that a little bit.
Sitara, can you talk to us, and maybe you can just pick one or, or however you want to go about it, but I’m curious, I’d like to give people a sense for how you approach a research problem like this, how you make sure it’s rigorous, how you go about collecting the data and analyzing it. All of that would be really interesting just to kind of hear from your perspective. Sitara: Yeah. So, okay. So for example, with COVID, we can talk about COVID. With one of the faculty in the lab that I work in, we had a question of, you know, what are the shelter-in-place policies doing to people’s behaviors? How is that affecting people’s behaviors? And we had these questions, like, are people working out more? Are they working out less? Are they eating more, are they eating less? And so we formulated a survey. We took, we didn’t write the questions. That’s important. We took the questions from previously validated surveys. So these are, these are questionnaires that have been validated by other scientists as good measures of asking these questions and getting the information that you want. And so we created this long survey that asks questions about physical activity, diet, drug use, sleep habits. And then we just disseminated it on the internet. We shared it on our social media. We shared it in emails to the faculty at school, to students at the school. And then we just asked everyone, you know, could you share this with your friends, your family? And in the end, we ended up getting, I think, over 4,000 responses. And so what we’re doing with that data is then . . . so the survey was on a data management website. We specifically used REDCap, and then that data was pulled from REDCap, downloaded into an Excel file, and plugged into a statistical software. I think we used Stata for this specific one, and Stata is what I most commonly use for data analysis. And then we just run tests on that data. So we do like t-tests, chi-square tests, cross-tabulations, regression. That’s the type of tests that we do to see if there’s any pattern in that data, to see if there’s any association. And then we take those results, and we write a manuscript, we write a paper: an introduction, a methods, results, conclusion, and then we try to publish that. And then once that’s published, we hope that people read it. We either hope that policymakers are reading it and they’re seeing, these are the effects of shelter-in-place policies, how can we change them to make them better? Or we hope that the public reads it, or, or that the news, the media catches on and, and writes an article: studies find that people are working out less during shelter-in-place policies. So that’s kind of, you know, in a, in a, like in a nutshell, what the process is of coming up with a question and then getting that data and publishing it. There’s so many different ways of doing it.
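To make the analysis step concrete, here is a minimal sketch, in Python, of the kinds of tests Sitara describes running on a survey export. It is an illustration only: she describes using Stata, and the file name, column names, and pandas/SciPy tooling below are assumptions for the example, not details from the study.

```python
# Sketch of the pipeline Sitara describes: survey responses exported from
# REDCap (here assumed as a CSV), then basic statistical tests.
# The file name and column names are hypothetical.
import pandas as pd
from scipy import stats

# Load the survey export. REDCap can export responses in tabular form.
df = pd.read_csv("covid_behavior_survey.csv")  # hypothetical file

# t-test: do weekly exercise hours differ before vs. during shelter-in-place?
t_stat, t_p = stats.ttest_rel(df["exercise_hrs_before"],
                              df["exercise_hrs_during"])
print(f"paired t-test: t={t_stat:.2f}, p={t_p:.4f}")

# Chi-square test on a cross-tabulation: is a reported change in diet
# associated with employment status?
crosstab = pd.crosstab(df["employment_status"], df["diet_changed"])
chi2, chi_p, dof, _ = stats.chi2_contingency(crosstab)
print(f"chi-square: chi2={chi2:.2f}, dof={dof}, p={chi_p:.4f}")
```

The equivalent t-test and chi-square cross-tabulation are one-line commands in Stata; the point is simply that each test asks whether a pattern in the responses is larger than what chance alone would produce.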
23 minutes | Jun 30, 2020
Vast ETL Efficiency Gain with Upsolver
27 minutes | May 30, 2020
Data Flexibility in Healthcare
Jason Kolaczkowski has worked in both a large-company data shop and in a company trying to help large companies fix their problems. He shares his perspective as senior director of healthcare analytics at NextHealth and former Kaiser employee on the importance of streamlining data definitions—and many other helpful insights.
28 minutes | Apr 23, 2020
Education and AI
For David Guralnick, education, AI, and cognitive psychology have always held possibility. With many years of experience in this niche, David runs a company that designs education programs, which employ AI and machine learning, for large companies, universities, and everything in between. David Guralnick: Somehow what’s happened in a lot of the uses of technology in education to this point is we’ve taken the mass education system that was there only to solve a scalability problem, not because it was the best educational method. So we’ve taken that and now we’ve scaled that even further online because it’s easy to do and easy to track. Ginette Methot: I’m Ginette, Curtis Seare: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. Curtis: First off, I’d like to thank everyone who has taken the Tableau fundamentals zombie course that we announced last episode. We’ve been getting a lot of great feedback from you. It’s fun to see how people are enjoying the course and finding that it’s fun and also clear and that it’s helping them learn the fundamentals of Tableau. The reason we made that course is because Tableau and data visualization are really important skills. They can help you get a better job; they can help you add value to your organization. And so we hope that the course is helping people out. Also, according to the feedback that we have received, we’ve made a couple of enhancements to the course, so there are now quizzes to test your knowledge. There are quick tips with each of the videos to help you go a little bit further than even what the videos teach. We’ve also included a way to earn badges and a certificate so that you can show off your skills to your employer or whoever. And we’ve also thrown in a couple other bonuses. One is our hundred-plus-page manual that we actually use to train at Fortune 500 companies, so that’ll have screenshots and tutorials and tips and tricks on the Tableau fundamentals. And we have also included a checklist and a cheat sheet, both of which we actually use internally in our consulting practice to help us do good work. One of them will help you know which kind of chart to use in any given scenario that you may encounter, whether that’s a bar chart or a scatter plot or any number of other more advanced charts. And the other is a checklist that you can run down and say, “do I have this, this, this, and this in my visualization before I take it to present to someone, to make sure that that’s going to be a good experience.” So hopefully all of that equals something that is really going to help you guys, and something also where you can learn Tableau and have fun doing it, saving the world from the zombie apocalypse. The price has risen a little bit since last time, but for our long-time listeners here, if you use the code “podcastzombie” without any spaces in the middle, that’ll go ahead and take 25% off the list price that is currently on the page. So hopefully more of you guys can take it and keep giving us feedback so we can keep improving it. And we would love to hear from you. Ginette: Now onto the show. Today we chat with David Guralnick, president and CEO of Kaleidoscope Learning. David: I’ve had a long-time interest in both education and technology going way, way back.
I was, I was lucky enough to go to an elementary school outside of Washington, D.C. called Green Acres School in Rockville, Maryland, which was very project based. So it was non-traditional education. You worked on projects, you worked collaboratively with people, your teachers’ role was almost as much advisor and mentor as traditional teacher. It wasn’t a person in front of the room talking at you, and you learn how to, you know, you really learn how to think creatively and pursue your own interests and learn by doing. And so all of that stayed with me as I got older, and I developed an interest in technology from a really young age. I had my first computer at 13, which was at a time when people did not normally have a computer at 13, and was interested then, through that, in how computers could learn and what artificial intelligence meant. And it was a field that was, was a bit of a mystery. I ended up, as I was finishing college, getting interested in the work of an artificial intelligence professor named Roger Schank, who was at Yale. And Roger was just at the time leaving Yale with some faculty to start an institute at Northwestern University that brought together cognitive psychology, computer science, AI, and education to apply artificial intelligence techniques to education. And so I did my PhD at that program and ended up being asked to focus particularly on business problems in the corporate world and work with some corporate clients through Accenture, which was then Andersen Consulting, and, ah, it’s kind of, you know, the work that continues to this day. Curtis: Yeah. That’s great. What, what year around were you doing your PhD, just so I get a . . . David: PhD for me was starting in ’89 and wrapping up in ’94. Late ’80s, early ’90s. Curtis: Before the AI wave hit everything, right? You guys were working on this stuff on the cutting edge, it sounds like. David: Yeah, absolutely. It was, it was, um, we were considered a cutting-edge lab. We were, you know, written up in the early days of Wired magazine and all that kind of stuff. And it was a really interesting place to be; it was a tremendous group of people. We had, I mean, some of them I still work with to this day. We had people who were excellent writers. We had people who were really cutting-edge thinkers in AI and in education and, and in cognitive psychology, which, the cognitive science side, sometimes gets left out, right? It’s, you know, how do you, how do you think and learn? How do you, how do you understand what you’re, you know, what you’re experiencing? And all of that goes into designing the experience. So yeah, it was a really fascinating place to be, and it built on a lot of the principles that, that I kind of believed in from my formative years, and it couldn’t have worked out any better. Curtis: Yeah. That’s awesome. Now, now you’ve seen this whole progression of, of AI, machine learning . . . What’s your perspective on that, since you’ve, you’ve lived this entire cycle now? David: Yeah, I’ve lived a, yeah, I’ve lived a few cycles. When, I mean, when I first started doing it, it was kind of, you know, the, uh, you know, almost, almost the dying days of, of AI at one point, right? Like we were doing really interesting things, I think, in applying it to education. But as a field, AI was considered, it was considered a failure. The years since my PhD were mostly what’s considered AI winter; you know, it really just didn’t live up to the high hopes.
We expected to be in a Jetsons-like world and we are not. What happened? And you know, now I’ve seen the Renaissance, and the Renaissance has been certainly interesting to see. There’s obviously a lot more computing power now, which has helped. There’s sort of a lot more public interest in and understanding of what AI could be. And some of that’s, you know, there’s probably more, more good than bad, though sometimes it’s a little scary. We also are in danger of being over-hyped once again. And I think that’s the thing that we, we look at. I mean, I’ll talk to people sometimes even about what’s possible, what kind of conversations online systems can have with people, and there’s usually an overstatement of, of what the reality is. And so I think that’s something to be cautious of as we move forward, and keep thinking about where AI techniques and machine learning, which, to me as a traditionalist, is a subset of AI, can fit in, and not, you know, not overstate and not necessarily feel like the goal has to be a fully functional human replacement. I don’t know that that’s a societal goal for a lot of reasons, but even in terms of technology, it’s not clear that that’s what we need. And in particular in the world of education, it’s not clear that that’s what we would want. Curtis: Right. Now, can we talk a little bit about cognitive psychology and the angle that, that that takes in your work? That’s not a topic we hit very often on this show, but I think it’s really interesting as it applies to the work you’re doing. David: Yeah, absolutely. I mean, to me, it’s always been a critical part of what we do. You’re not looking at just putting technology out there; you’re looking at technology that in some ways, on one side, might mirror some human thought processes. So that’s part of what we were doing back in my old research lab at Northwestern, thinking about how technology could, could reflect human thought processes. But then on the, on the end user side, so on the more practical side, we need to develop technology experiences that really do help people accomplish their goals, whether they’re educational goals or whether they’re otherwise. In order to do that, we need to have an understanding of how people think, how they learn, how they process information, how they acquire skills. Some of that borders on education research, but a lot of that is the cognitive side, and to me it really is all interdisciplinary, right? You . . .
13 minutes | Mar 31, 2020
Upskilling from Home
Many of us are stuck at home right now, due to the COVID-19 pandemic. There are pros and cons to this. We have less of a commute, more quality time with people in our households, and time to do little tasks we've been putting off. On the flip side, it can feel isolating, basic necessities are much more of a concern, and every day often feels the same. Today we talk about taking advantage of extra time by upskilling in economies that may suffer as a result of the pandemic.
24 minutes | Feb 29, 2020
How to Reduce Uncertainty in Early Stage Venture Funding
Early stage venture investing has little data to draw from to make good investing decisions. So how has Connetic Ventures successfully developed a data system to inform their investment decisions? We chat with Chris Hjelm about the process they've used to develop something that does just that.
20 minutes | Jan 29, 2020
Data in Healthcare with Ron Vianu
If you’ve ever tried to find a doctor in the United States, you likely know how hard it is to find one who’s the right fit—it takes quite a bit of research to find good information to make an informed choice. Wouldn’t it be nice to easily find a doctor who is the right fit for you? Using data, Covera Health aims to do just that in the radiology specialty. Ron Vianu: I think the tools are really improving year over year to a significant degree, but like anything else, the tools themselves are only as useful as how you apply them. You can have the most amazing tools that could understand very large datasets, but, you know, how you approach looking for solutions, I think, can dramatically impact whether you yield anything useful. Ginette Methot: I’m Ginette, Curtis Seare: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. If you’re a business leader listening to our podcast and would like to move 10 times faster and be 10 times smarter than your competitors, we’re running a webinar on February 13th where you can learn how to do this and more. Just go to datacrunchcorp.com/go to sign up today for free. If you’re a subject matter expert in your field, like our guest today, and you’re looking to understand data science and machine learning, brilliant.org is a great place to dig deeper. Their classes help you understand algorithms, machine learning concepts, computer science basics, and many other important concepts in data science and machine learning. The nice thing about brilliant.org is that you can learn in bite-sized pieces at your own pace. Their courses have storytelling, code writing, and interactive challenges, which makes them entertaining, challenging, and educational. Sign up for free and start learning by going to brilliant.org/DataCrunch, and also the first 200 people that go to that link will get 20% off the annual premium subscription. Today we chat with Ron Vianu, the CEO of Covera Health. Let’s get right to it. Curtis: What inspired you to get into what you’re doing, uh, to start Covera Health? Where did the idea come from and what drives you? So if we could start there and learn a little bit about you and the beginnings of Covera Health, that would be great. Ron: Sure. Uh, and I, I guess it’s important to state that, you know, I’m a problem solver by nature, and my entire professional career I’ve been a serial entrepreneur, building companies to solve very specific problems. And as it relates to Covera, the, the genesis of it was understanding that there were two problems in the market with respect to, uh, the healthcare space, which is where we’re focused, that were historically unsolved, and there were no efforts really to solve them in, from my perspective, a data-driven way. And that was around understanding quality of physicians in a way that is predictive of whether or not they’ll be successful with individual patients as they walk through their practice. And we’re focused on the world of radiology, which today is highly commoditized, and what that means is that there was a presumption that wherever you get an MRI or a CT study for some injury or illness, it doesn’t matter where you go. It’s more about convenience and price, perhaps.
Whereas what we understand, given our research and the, the various things that we’ve published since our beginning, is that, one, it’s like every other medical specialty: it’s highly variable. Two, since radiology supports all other medical specialties as a tool for diagnosis, for diagnostic purposes, any sort of variability within that specialty has a cascading effect on patients downstream. And so for us, the beginning was, is this something that is solvable through data? Could we understand, for an individual patient as they’re looking for medical care, what is the right physician for them that would yield the most accurate diagnosis related to their condition? Curtis: Got it. And I’m assuming you have some experience in the medical field. Have the companies you’ve started usually been in the medical field, and so you had insight into this issue, or where did that come from? Ron: Yeah, I mean, my background, I was a premed student actually, uh, in New York, and at the time, I felt like going to medical school really wouldn’t be solving problems, the way I saw, uh, the life of a physician. And so I decided that business was probably a better perspective to solve problems from. And ironically I ended up solving problems within healthcare my entire professional career. And so I have a fairly deep knowledge base, if you will, around clinical medicine for a lay person, and obviously a lot of experience around starting businesses and using data to solve problems. And so for me it’s an interesting combination of skills that allows me to tackle these things in a way that perhaps a physician or a business person, uh, independently wouldn’t be able to do. Curtis: And where did your expertise in data come from? You seem to approach things from a very data-driven perspective. Where did you get that from? Ron: I think that’s honestly something that one is innately born with, and then one finds the tools to help them explore that. And so in college I studied chemistry and philosophy, and I think part of it is because I was trying to approach different parts of the way my brain functioned. And so when I solve problems today, I try to solve them in a very data-driven manner, generally speaking. And so when I find tools like statistical modeling or AI and so on and so forth that can further enhance the approach that I would take in solving a problem, those tools are extraordinarily useful for me. But one could argue maybe others have this where you take a course and you’re like, ah, this is an interesting science and I could use this science. For me it was, how do I kind of expand the very way I generally function? Curtis: One of the things that we see is the tooling and the understanding around machine learning and analytical practices becoming better and better. As someone that didn’t study this, you know, computer science, this kind of stuff, have you found it accessible? Sort of easy to pick up and apply to problems? Ron: Right. So I guess two points I would make there. One, I’m, I’m not a data scientist per se in any traditional way. My background is comp sci in kind of an untraditional way, meaning both in college and pre-college I was programming. And so I have a little bit of that background even though I didn’t study it in a formal setting. But I think the tools are really, uh, improving year over year to, uh, to a significant degree.
But like anything else, the tools themselves are only as useful as how you apply them. And so I think, you know, you can have the most amazing tools that could understand very large datasets, but, you know, how you approach looking for solutions, I think, can dramatically impact whether you yield anything useful. Curtis: And do you have a specific approach that you take? Does it come naturally to you, or do you have some sort of framework or approach that you use to look at things and figure out how you, how you could solve it? Ron: Right. So I’m agnostic from a data science perspective with respect to the actual approach we’re taking, uh, meaning what tools are we going to be using. But technical details aside, you know, there are two different approaches one can take when one broadly thinks about data science and analytics. The big approach that I think has been very popular over the last, call it five, seven years, is around big data, as people call it: now that we have access to lots of data and we have access to all these interesting tools and algorithms that can analyze that data, what can we ultimately understand from that data? What patterns can emerge that perhaps we haven’t seen in the past? And I think that’s very productive and useful in many contexts. In healthcare, it’s very difficult to understand what data you’re looking at to begin with, and so you have very dirty datasets, and cleaning those up becomes half the challenge. And so for me, my approach with respect to healthcare data analytics has been more hypothesis driven rather than that big data approach. And what I mean by that is, if you speak to physicians around this thing called quality, which is what we’re trying to solve: how do you understand what physician is ideally suited for a particular patient in order to yield the best outcome? And so as we approach that problem, we work with many experts across the field, and we ask to understand their intuition around quality, what makes a good physician. And once we have a unified sense of what the experts think, then we start attacking the data in a way that explores those theories and sees if we can ultimately find some signal with respect to those theories, or rather correlations with respect to those theories. And so it’s, it’s a little bit of a different approach, much more hypothesis driven than big data-driven. Curtis: So instead of sifting through the data to find random signals and then seeing if those are useful for some application, you make some hypotheses, uh, bring domain knowledge, and then see if you can find some signals in the data that, that you have available. Is that accurate? Ron: That’s accurate. And, and I can give you a concrete . . .
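To illustrate the contrast Ron draws, here is a toy sketch of the hypothesis-driven step: encode one expert theory about physician quality and test it directly, rather than mining the dataset for arbitrary patterns. This is not Covera Health's actual method; the dataset, column names, and the specific hypothesis are invented for the example.

```python
# Toy sketch of hypothesis-driven analysis: start from an expert theory,
# then check whether the data supports it. Everything here is hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("radiologist_quality.csv")  # hypothetical dataset

# Hypothetical expert theory: subspecialty-trained radiologists have lower
# diagnostic error rates. Test that one claim directly.
trained = df[df["subspecialty_trained"]]["error_rate"]
untrained = df[~df["subspecialty_trained"]]["error_rate"]
t_stat, p_val = stats.ttest_ind(trained, untrained, equal_var=False)
print(f"Welch t-test: t={t_stat:.2f}, p={p_val:.4f}")

# For a continuous version of the theory: is more time in the subspecialty
# correlated with a lower error rate?
r, r_p = stats.pearsonr(df["years_in_subspecialty"], df["error_rate"])
print(f"Pearson r={r:.2f}, p={r_p:.4f}")
```

The design choice is the one Ron describes: the experts supply the candidate theories, and the data is used only to confirm or reject them, instead of letting an algorithm surface whatever correlations happen to exist in a messy dataset.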
30 minutes | Dec 18, 2019
Data Literacy with Ben Jones
We talk with Ben Jones, CEO of Data Literacy, who's on a mission to help everyone understand the language of data. He goes over some common data pitfalls, learning strategies, and unique stories about both epic failures and great successes using data in the real world. Ginette Methot: I’m Ginette, Curtis Seare: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. It’s becoming increasingly important in our world to be data literate and to understand the basics of AI and machine learning, and Brilliant.org is a great place to dig deeper into this and related topics. Their classes help you understand algorithms, machine learning concepts, computer science basics, and many other important concepts in data science and machine learning. The nice thing about Brilliant.org is that you can learn in bite-sized pieces at your own pace. Their courses have storytelling, code-writing, and interactive challenges, which makes them entertaining, challenging, and educational. Sign up for free and start learning by going to Brilliant.org/DataCrunch, and also the first 200 people that go to that link will get 20% off the annual premium subscription. Curtis: Ben Jones is here with me on the podcast today. This has been a couple months coming. Excited to have him on the show. He's well known in the data visualization community; he's done a lot of great work there. Uh, used to work for Tableau. Now he's off doing his own thing, has a company called Data Literacy, which is interesting. We're going to dig into that, and he also has a new book out called Avoiding Data Pitfalls. So all of this is really great stuff, and we're happy to have you here, Ben. Before we get going, just give yourself a brief introduction for anyone who may not know you, and we can go from there. Ben: Yeah, great. Thanks Curtis. You mentioned some of the highlights there. I, uh, worked for Tableau for about seven years running the Tableau Public platform, uh, in which time I wrote a book called Communicating Data with Tableau. And the fun thing was for me that launched kind of a teaching, um, mini side gig for me at the University of Washington, which really made me fall in love with this idea of just helping people get excited about working with data. Having that light bulb moment where they feel like they've got what it takes. And so that's what caused me to really want to leave Tableau and launch my own company, Data Literacy, at dataliteracy.com, which is where I help people, you know, as I say, learn the language of data, right? Whether that's reading charts and graphs, whether that's exploring data and communicating it to other people, through training programs for the public as well as working one on one with clients and such. So it's been an exciting year doing that. Also, other things about me: I live here in Seattle, I love it up here, and I go hiking and backpacking when I can, and I have three teenage boys all in high school. So that keeps me busy too. And it's been a fun week for me getting this book out and seeing it start to ship and seeing people get it. Curtis: Let's talk a little bit about that because the book, it sounds super interesting, right? Avoiding Data Pitfalls, and there are a lot of pitfalls that people fall into.
So I'm curious what you're seeing, why you decided to write the book, how difficult of a process it was, and then some of the insights that you have in there as well. Ben: Yeah, so I feel like the tools that are out there now are so powerful, way more so than when I was going to school in the 90s, and it's amazing what you can do with those tools. And I think it's also amazing how easy it is to mislead yourself. And so I started realizing that that's sometim...
23 minutes | Nov 20, 2019
Social Media and Machine Learning
How do you build a comprehensive view of a topic on social media? Jordan Breslauer would say you let a machine learning tool scan the social sphere and add information as conversations evolve, with help from humans in the loop. Ginette Methot: I’m Ginette, Curtis Seare: and I’m Curtis, Ginette: and you are listening to Data Crunch, Curtis: a podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: Data Crunch is produced by the Data Crunch Corporation, an analytics training and consulting company. Ginette: Many of you want to gain a deeper understanding of data science and machine learning, and Brilliant.org is a great place to dig deeper into these topics. Their classes help you understand algorithms, machine learning concepts, computer science basics, probability, computer memory, and many other important concepts in data science and machine learning. The nice thing about Brilliant.org is that you can learn in bite-sized pieces at your own pace. Their courses have storytelling, code-writing, and interactive challenges, which makes them entertaining, challenging, and educational. Sign up for free and start learning by going to Brilliant.org slash Data Crunch, and also the first 200 people that go to that link will get 20% off the annual premium subscription. Let’s get into our conversation with Jordan Breslauer, senior director of data analytics and customer success at Social Standards. Jordan: My name is Jordan Breslauer. I'm the senior director of data analytics and customer success at Social Standards. I've always been a data geek as it pertains to sports. I think of Moneyball. When I was younger, I always wanted to be kind of the next Billy Beane, and when I started working for sports franchises right after high school and in early college days, I just realized that, that type of work culture wasn't for me, but I was so, so into trying to answer questions with data that had no previously clear answer, you know? I loved answering subjective questions, like what makes the best player, or how do, how do I know who the best player is? And I thought what was always fun was to try and bring some sort of structured subjectivity to those sorts of questions through using data. And that's really what got me passionate about data in the first place. But then I just started to apply it to a number of different business questions that I always thought were quite interesting, which have a great deal of subjectivity. And that led me to Nielsen originally, where my main question that I was answering on a day-to-day basis was, what makes a great ad? Uh, what I found, though, is that advertising, at least as it pertains to TV, is really what brands were moving away from, and a lot of the real consumer analytics that people were looking for were trying to understand people in their natural environment, particularly on social media. And I hadn't seen any company that had done it well. Uh, and I happened to meet Social Standards during my time at Nielsen and was truly just blown away with this ability to essentially take a large input of conversations that people were having and bring some sort of structure to them, to actually be able to analyze them and understand what people were talking about as it pertained to different types of topics. And so I think that's really what brought me here: the fascination with this huge amount of data behind the things that people were talking about on social.
And the fact that it had some structure to it, which actually allowed for real analytics to be put behind it. Curtis: It's a hard thing to do though, right? You know, to answer this question of how do we extract real value or real insight from social media. And you'd mentioned historically, or up to this point, companies that are trying to do that have missed the mark.
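As a rough illustration of what "bringing structure" to raw conversations can mean at its very simplest, here is a toy tagger that buckets posts by topic so they can be counted and tracked over time. Social Standards' actual system uses machine learning with humans in the loop; the keyword matching, topics, and posts below are invented stand-ins for the idea, not their approach.

```python
# Toy illustration: turn unstructured social posts into per-topic counts
# that can be analyzed. Topics, keywords, and posts are all made up.
from collections import Counter

TOPIC_KEYWORDS = {
    "energy drinks": {"energy drink", "caffeine", "taurine"},
    "oat milk": {"oat milk", "oatmilk"},
}

posts = [
    "trying oat milk in my latte today",
    "this energy drink has way too much caffeine",
    "oatmilk > almond milk, fight me",
]

# Tag each post with every topic whose keywords appear in it.
counts = Counter()
for post in posts:
    text = post.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            counts[topic] += 1

print(counts)  # Counter({'oat milk': 2, 'energy drinks': 1})
```

A real system replaces the keyword sets with learned classifiers and lets human reviewers correct the model as new slang and products appear, which is the "add information as conversations evolve" part of the episode description.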