Data & Science with Glen Wright Colopy

89 Episodes

73 minutes | Aug 2, 2022
Keith O’Rourke | The Logic of Statistics
Dr. Keith O'Rourke talks about the logical reasoning behind statistical modeling. Topics include mathematical vs scientific reasoning, whether science has become too stats-focused, and vice versa.
Watch it on...
YouTube: https://youtu.be/FqE4ROHBKpY
Podbean: https://dataandsciencepodcast.podbean.com/e/keith-o-rourke-the-logic-of-statistics/
Topic List:
0:00 - The logic of statistics
0:30 - What is scientific statistics?
5:15 - The logic of statistics and C. S. Peirce
9:15 - Role of representation in statistics: explicit vs implicit
14:13 - Diagrammatic Reasoning
18:45 - Why is modeling counterfactual?
19:33 - How can statisticians become better scientists?
28:40 - Science is hard
31:24 - Computational approaches to learning
42:00 - Learning through metaphor
46:28 - Diagrammatic representations vs math
48:40 - Is science too statistics-focussed?
59:35 - Is statistics sufficiently science-focussed?
1:08:40 - Scientific Debate
#statistics #datascience #science
52 minutes | Jul 26, 2022
Jack Fitzsimons | Evil Models: Hiding Malware in Neural Networks
Did you know that it's possible to hide malware in neural networks? Actually, you can hide malware in many statistical models. This is the subject of two recently-published papers (aptly titled "EvilModel" & "EvilModel 2.0"). Dr. Jack Fitzsimons makes it easy to understand how this is done, using techniques that began long before computers.
Watch or listen on...
YouTube: https://youtu.be/QBnk8ogL8Nk
Podbean: https://dataandsciencepodcast.podbean.com/e/jack-fitzsimons-evil-models-hiding-malware-in-neural-networks/
81 minutes | Jul 18, 2022
Scott Cunningham | Causal Inference (The Mixtape)
Scott Cunningham (Baylor University) discusses the ideas of his book "Causal Inference: The Mixtape". Topics include trusting inference in the absence of counterfactuals and the challenges of applying scientific methods to social phenomena.
Watch it on...
YouTube: https://youtu.be/yNaCudDVTkY
Podbean: https://dataandsciencepodcast.podbean.com/e/scott-cunningham-causal-inference-the-mixtape/
0:00 - COMING UP...
0:35 - What makes it into the mixed tape?
7:10 - Coding to learn
11:15 - More people are expected to work with data & code
12:50 - Design vs program vs estimators
20:40 - Causation with zero correlation
27:00 - Optimization makes everything endogenous
28:45 - The hospital example
29:30 - Credible scientific discovery vs motivated discovery
39:55 - Different meanings of causality
43:30 - The impossible counterfactual
47:00 - Counterfactual nihilism
49:20 - Social experiments / Defund the police
53:35 - Skepticism about the science of social phenomena
1:05:20 - The Italian crime example
1:16:30 - Scientific debate
84 minutes | Jul 11, 2022
Eric Daza | Important Ideas in Causal Inference
YouTube: https://youtu.be/K5nsSMJVIT0
Andrew Gelman and Aki Vehtari wrote a paper titled "What are the most important statistical ideas of the past 50 years?". The first idea in the list is "counterfactual causal inference". Eric Daza (Evidation Health) walks us through the main ideas of the Gelman & Vehtari paper, drawing examples from several fields, including medical & healthcare statistics.
Topics
0:00 - Coming up...Correlation vs Causation
1:20 - Most important statistical ideas over the last 50 years
6:10 - Counterfactual Causal Inference
9:40 - Assumptions Change between Applied Domains
21:10 - Propensity Score Methods
25:15 - Transportability of Scientific Results
26:30 - People don't want generalizable results
32:00 - Generic Computation Algorithms
37:00 - Reweighting
43:57 - Matching Methods
58:20 - Medical Data is Higher Dimensional than we think.
1:00:15 - Is a Trial Population Representative?
1:10:35 - Causal Models in the Future
1:18:45 - Apostates Welcome
1:21:45 - Scientific Debate
34 minutes | May 10, 2022
Wenting Cheng & Weidong Zhang | Advances in Biotech/Biopharma
Wenting and Weidong discuss how the statistical challenges in the biopharm industry have proliferated with the unique demands of biotech and related life science industries.
69 minutes | May 10, 2022
Ruda Zhang | Gaussian Process Subspace Regression
Ruda Zhang (Duke University) walks us through "Gaussian Process Subspace Regression for Model Reduction" by Zhang, Mak, and Dunson. To keep the topic interesting for both the early career & advanced audience, we recap key points at a high level so that no one gets lost.
This episode involves a presentation, so you may prefer to watch the YouTube version here: https://youtu.be/IPtqUUG4XcY
Ruda's website: https://ruda.city/
The paper: https://arxiv.org/abs/2107.04668
83 minutes | Apr 14, 2022
Ruda Zhang | Math-Science Duality
Watch it on...
YouTube: https://youtu.be/GoDwen-RGZg
Podbean: https://dataandsciencepodcast.podbean.com/e/ruda-zhang-math-science-duality/
Statistics is thought to reside at the interface of science and mathematics. Ruda Zhang (Duke University) discusses the friction at this interface and the role that both mathematical formalism & observational/data-driven intuition play in scientific discovery. A great topic for anyone interested in statistics' role in scientific discovery.
#datascience #ai #science #mathematics
Topic List
00:00 COMING UP...
2:44 Ruda Zhang's compendium of cool ideas + a Gaussian process PSA
7:08 Is intuition undervalued in scientific research?
10:16 Mathematics vs observational science. Rigor vs intuition.
14:07 Intuition & discovery precede mathematical rigor
21:58 Mathematics vs empirical science & the complexity of induction
30:24 Abstract thinking & the cost/benefit of discovery
37:25 The efficient frontier / Pareto Front of knowledge
42:55 Pragmatism and competence
50:24 Math/science dualism
1:15:52 AI making scientific discoveries
1:19:15 Statistical & scientific debate
79 minutes | Apr 6, 2022
Simon Mak | Integrating Science into Stats Models
#statistics #science #ai
It’s a common dictum that statisticians need to incorporate domain knowledge into their modeling and the interpretation of their results. But how deeply can scientific principles be embedded into statistical models? Prof. Simon Mak (Duke University) is pushing this idea to the limit by integrating fundamental physics, physiology, and biology into both the models and model inference. This includes Simon’s joint work with Profs. David Dunson and Ruda Zhang (also of Duke University). Scientific reasoning AND stats. What more could we ask for? Enjoy!
Watch it on...
YouTube: https://youtu.be/bUbZO7R4z40
Podbean: https://dataandsciencepodcast.podbean.com/e/simon-mak-integrating-science-into-stats-models/
00:00 - COMING UP... Scientists & Statisticians
02:09 - Introduction - Integrating scientific knowledge into AI/ML
06:08 - How much domain knowledge is sufficient?
09:15 - Choosing which prior knowledge to integrate into a model
14:49 - Black box & gray box optimization
19:50 - Non-physics examples of integrating scientific theory into ML models
22:45 - Scientific principles & modeling at different scales
27:20 - Correlation is just one way of modeling linkage
36:37 - Conditional independence & different-fidelity experiments
39:40 - Innovation vs incorporation of known information in the model
42:52 - Aortic stenosis example
52:49 - Which mathematics can be used to represent scientific knowledge
57:09 - How to acquire scientific domain knowledge
1:02:45 - Complementary approaches to integrating science
1:06:48 - Gaussian process & integrating priors over functions
1:12:48 - A topic for statisticians and scientists to debate: science-based vs data-based learning.
Simon Mak's Webpage: https://sites.google.com/view/simonmak/home
76 minutes | Mar 16, 2022
Martin Goodson | Practical Data Science & The UK’s AI Roadmap
#ai #datascience #startups
Martin Goodson (Evolution AI) describes the key aspects of the UK's AI Roadmap & responses to the document by members of the Royal Statistical Society. In particular, Martin describes the disconnect between the priorities of AI startups and industry practitioners on one side, and government and academia on the other. Martin also outlines which skills early career data scientists should focus on while in school versus after entering the workforce.
Also available on...
YouTube: https://youtu.be/T9qRl6Hclhg
Topic List
0:00 COMING UP: Scientific culture & AI
1:25 The UK AI Roadmap
8:44 Who is a data science “practitioner”?
12:53 Data science in AI startups
20:36 Is there a disconnect between practitioners & academia?
25:09 Key skills for new data science graduates
32:03 Coding & production level data science
39:30 Learning the right data analysis skills at the course-level.
45:32 AI leadership
58:40 AI from academia & OpenSource initiatives
1:05:37 Large institutions' impact on the AI field
1:08:24 Back to the UK AI roadmap
1:12:16 Building an AI community
1:13:15 AI in our lifetime: Moonshots & realistic goals
1:14:31 Scientific debate
74 minutes | Mar 1, 2022
Jack Fitzsimons | Data Security, Privacy, & Artificial Intelligence
Dr. Jack Fitzsimons (Oblivious AI) gives a high-level introduction to the technologies that can either exploit or protect your data privacy. If you'd like to survey the landscape of data privacy-preserving technologies (from someone who's building the tech) this is a good place to start!
#datascience #privacy #ai
0:00 - Coming up...
3:24 - Introduction
6:20 - Data privacy and privacy enhancing technologies
13:00 - History of privacy enhancing technologies
19:54 - Differential privacy: Hiding the influence of a single data point
22:52 - Trading data utility for data privacy
38:32 - Tracking algorithms and how they decide user preferences
42:04 - Preserving privacy: Anonymizing data & VPNs
50:17 - Exploration vs Exploitation: Combining best of multiple domains to tackle problems
54:13 - Federated learning, input and output privacy of data
58:45 - Balancing data privacy vs data-driven personalization
1:05:50 - What should data scientists/statisticians debate?
70 minutes | Feb 22, 2022
Chris Tosh | The piranha problem in statistics
The piranha problem (too many large, independent effect sizes influence the same outcome) has received some attention on Andrew Gelman’s blog. But now it’s a paper! Chris Tosh (Memorial Sloan Kettering) talks about multiple views of the piranha problem and detecting the implausible scientific claims that are published. The butterfly effect makes an appearance. If you enjoyed the science-vs-pseudoscience topics, you’ll enjoy this one.
0:00 - Coming up in the episode
2:35 - What is the Piranha Problem?
19:54 - Confusing effect sizes
23:11 - The "words & walking speed" study
26:22 - Declaration of independent variables
30:58 - Piranha theorems for correlations
37:07 - Piranha theorems for linear regression
40:37 - Piranha theorems for mutual information
44:13 - Bounds on the independence of the covariates
46:12 - Applying the piranha theorem to real data
50:12 - Applying the piranha theorem across studies
54:05 - A Bayesian detour
1:00:12 - The butterfly effect & chaos
1:04:26 - Applying the piranha theorem to cancer research
64 minutes | Feb 9, 2022
Chris Holmes | AI, Digital Health, & The Alan Turing Institute
Chris Holmes is Professor of Biostatistics at the University of Oxford and Programme Director for Health and Medical Sciences at The Alan Turing Institute. Chris’ research interests include Bayesian nonparametrics (which is the right kind of nonparametrics), statistical machine learning, genomics, and genetic epidemiology.
0:00 - Intro
1:38 - Chris Holmes, Professor of Biostatistics at Oxford University
3:28 - UK Biobank & designing a valuable dataset
8:42 - Healthcare charities in the UK
11:16 - Digital Health: prioritizing research questions
19:55 - Bayes, nonparametrics, and Bayesian nonparametrics
23:30 - Model prediction is at the heart of Bayesian inference
28:00 - Prioritization in model building for biology
33:09 - Model constraints to generate valid inference
37:34 - Hypothesis driven science in statistical learning versus deep learning
43:30 - Developing models in genomics & clinical informatics
48:37 - Building stable, generalizable and robust models
52:41 - Important questions to think about
54:05 - Causal reasoning and clinical risk prediction
57:50 - What topic should the statistical community debate?
54 minutes | Feb 4, 2022
Philosophy of Data Science | Deborah Mayo | Revolutions, Reforms, and Severe Testing in Statistical Thinking
Philosophy of Data Science Series
Keynote with Deborah Mayo
Episode 1: Revolutions, Reforms, and Severe Testing in Statistical Thinking
In the first keynote of the Philosophy of Data Science Series we have a 2-part interview with Deborah Mayo (Virginia Tech). In the first part of our keynote with Deborah Mayo we cover...
- The role of scientific revolution and its implications for statistics and data scientists.
- The necessity of statistical reforms and why philosophy will play a role.
- The value of severe testing of scientific claims.
Watch it on...
YouTube: https://youtu.be/S4VAEShM3BU
Podbean:
You can join our mail list at: https://www.podofasclepius.com/mail-list
We're always happy to hear your feedback and ideas - just post it in the YouTube comment section to start a conversation.
Thank you for your time and support of the series!
Topics:
0:00 - Preface to First Keynote Interview
2:00 - Welcome Deborah Mayo!
5:05 - What is the Philosophy of Statistics?
8:15 - What does philosophy add to data science?
16:10 - Scientific revolution in statistics
20:10 - Statistical reforms
24:25 - Replication & hypothesis pre-specification
31:00 - Failure is severe testing
37:25 - Error statistics
48:00 - Scientific progress and closing remarks
77 minutes | Feb 1, 2022
Charlotte Deane | Bioinformatics, Deepmind’s AlphaFold 2, and Llamas
#datascience #ai
Charlotte Deane (Oxford University) talks about statistical approaches to bioinformatics, the evolution of Google Deepmind's AlphaFold 2 & its place in the protein informatics deep learning landscape. She also describes humanizing antibodies, and the increasing role of software engineers in statistical research groups. The topic of llamas, camels, and alpacas (and their unique place in proteomics research) makes a surprise visit.
[Note: This episode was originally published in January 2022, but the file contained a buffering error, which prevented the full interview from being played. This version, published Feb 1, 2022, contains the full interview.]
Topics
0:00 Intro / An important topic to debate
3:50 What is a protein? Why are proteins foundational?
13:32 Immunotherapies, humanizing antibodies, & creating scientific databases
16:04 Translating in silico research into immunotherapies
21:03 Nanobodies, camels, alpacas, & llamas.
25:05 Databases and data knowledge bases
33:21 Targeted therapies
39:45 Statistical modeling in proteomics
45:40 DeepMind AlphaFold's evolution
55:28 Software engineers in academic research groups
1:03:21 The adventure of science
1:07:42 Oxford Blues hockey & scientific debate
73 minutes | Dec 2, 2021
Eric Schwitzgebel | Consciousness, Zombies, & First Person Data | Philosophy of Data Science
The philosophical community continuously aims to reconcile differing views on first person data and the consciousness of the mind. Is it possible to live without consciousness? Can one conceive thoughts without matching images to them? In this episode, Eric Schwitzgebel of the University of California tries to dissect such topics and questions to help us better understand the philosophical world.    Keywords: philosophy, epistemic data, first person data, stimulus error, imageless thought, consciousness    
39 minutes | Nov 22, 2021
Starting a Statistics Consultancy | Janet Wittes
The following interview was a keynote fireside chat with Janet Wittes (Statistics Collaborative, Inc.) titled "Statisticians as Entrepreneurs". It was recorded for the BBSW 2021 Conference (Nov 3 - 5 in Foster City, CA).
References:
BBSW 2021 Conference: https://www.bbsw.org/bbsw2021
Topics:
0:00 Janet's background prior to founding Statistics Collaborative, Inc.
3:00 Janet's initial research interest as a consultant
4:10 Why did Janet start her own business as opposed to joining a company or university?
5:45 Who were Janet's first clients?
8:00 What did Janet want to instill in her company?
15:50 Earning enough money to hire people
18:55 Initial ratio of clients to employees
22:42 Janet's company's statistical tech stack
25:00 Different challenges at different stages of the company
27:28 Growing a company but not taking on every possible client or project
28:13 Statisticians as entrepreneurs
37:00 Choosing the right people
83 minutes | Nov 16, 2021
Philosophy of Data Science | Jingyi Jessica Li | Advancing Statistical Genomics
Watch it on YouTube or Podbean.
Jingyi Jessica Li (UCLA) describes common statistical pitfalls in genomic data analysis & the statistical reasoning required to correct these mistakes. Common themes throughout include:
- Hypothesis-driven science & critical scientific reasoning over data
- p-values and non-sensical null hypotheses/distributions
- the value of appearing statistically rigorous
- researchers cutting intellectual corners & digging themselves into local minima
Episode Topics
0:00 A major advancement in genomic data leads to new statistical techniques
2:15 Hypothesis-driven science & hypothesis-free data analysis
2:55 A ChIP Seq Example
8:00 Misformulation of sampling variability
16:55 A false analogy: the permutation test
19:03 Losing my p-value religion: the value of statistical packaging
24:30 The Clipper Framework for false discovery rate control
31:50 Non-parametric developments
37:55 Inferred covariates
46:00 PseudotimeDE: inferences of differential gene expression along cell pseudotime
47:10 Selective inference
49:25 What biological/physiological data will be incorporated in the future?
52:30 Statistics, computer science, data science, ML, biology
57:05 Machine learning and prediction
1:01:30 Sophisticated models vs sophisticated research
1:07:45 Peer review in science
1:13:05 Hypothesis-driven science vs cutting intellectual corners
1:18:12 What topic should the statistics community debate?
81 minutes | Nov 9, 2021
Mine Çetinkaya-Rundel | Advancing Open Access Data Science Education
#datascience #statistics #education
Mine Çetinkaya-Rundel (Duke University) describes the current and future states of statistics and data science education. Then she discusses the process of building open access learning material.
0:00 - Introduction
1:40 - Prioritizing topics in curricula
9:07 - Teaching with intent to test
11:22 - Statistics without computing
17:52 - What should be taught? How do we teach it?
19:07 - Computational thinking is valuable (to 31:45)
23:47 - Self reinforcing academics / positive feedback (to 31:45)
31:08 - Data science vs statistics (the computing angle)
37:55 - Statistical collaboration / technical collaboration
39:45 - Common language / imputation under ignorance
41:12 - Are some topics better for hands on or computational learning?
45:32 - Learning computation through visualization
52:40 - Video cut option before she gives an example
52:42 - Let them eat cake first.
56:08 - What is open source education? Open source vs open access.
59:36 - Advancing open source text books
1:03:55 - Economics of open source
1:07:55 - The open education ecosystem
1:12:17 - Modularizing & parallelizing learning topics
1:16:52 - Favorite dataset on OpenIntro.Org?
1:18:14 - What topic should the statistics community debate?
56 minutes | Sep 20, 2021
Jingyi Jessica Li | Statistical Hypothesis Testing vs Machine Learning Binary Classification
Jingyi Jessica Li (UCLA) discusses her paper "Statistical Hypothesis Testing versus Machine Learning Binary Classification". Jingyi noticed several high-impact cancer research papers using multiple hypothesis testing for binary classification problems. Concerned that these papers had no guarantee on their claimed false discovery rates, Jingyi wrote a perspective article about clarifying hypothesis testing and binary classification to scientists.
#datascience #science #statistics
0:00 - Intro
1:50 - Motivation for Jingyi's article
3:22 - Jingyi's four concepts under hypothesis testing and binary classification
8:15 - Restatement of concepts
12:25 - Emulating methods from other publications
13:10 - Classification vs hypothesis test: features vs instances
21:55 - Single vs multiple instances
23:55 - Correlations vs causation
24:30 - Jingyi’s Second and Third Guidelines
30:35 - Jingyi’s Fourth Guideline
36:15 - Jingyi’s Fifth Guideline
39:15 - Logistic regression: An inference method & a classification method
42:15 - Utility for students
44:25 - Navigating the multiple comparisons problem (again!)
51:25 - Right side, show bio-arxiv paper
52 minutes | Aug 30, 2021
Gualtiero Piccinini | What Are First-Person Data? | Philosophy of Data Science
First-person methods (and their associated data) have been scientifically and philosophically contentious. Are they pseudoscientific? Or simply pushing the bounds of scientific methodology? Obviously, I have no idea… so Prof. Gualtiero Piccinini (University of Missouri – St. Louis) provides a helpful introduction to the topic covering the key points of its history and the philosophical/scientific debate.
0:00 Why cover first-person methods & data?
2:26 First-person methods vs first-person data?
7:10 Are first-person data legitimate at all?
11:50 Phenomenology
13:26 First-person data is extracted from human behavior
18:25 Skepticism & arguments against first-person data
25:40 Psychophysics, introspectionists, behavioralists, cognitivists, and the origins of first-person data
35:20 Using new instruments & methods in science
46:00 Is this where the philosophers roam?
#datascience #statistics #science
© Stitcher 2023