UNC School of Data Science faculty Alex McAvoy, Santiago Olivella and Harlin Lee (photo by Jen Hughey/UNC-Chapel Hill)
Faculty at Carolina’s new School of Data Science and Society share their visions for teaching and research that have a real-world impact.
Interviews and story by Drew Guiteras
In conversations about his work, Alex McAvoy, assistant professor at the UNC School of Data Science and Society (SDSS), sometimes shares a joke that an academic mentor once told him.
A mathematical biologist walks up to a farmer and says, “Hey, if I can guess how many sheep you own, can I take one home with me?” The farmer says, “You’ll never get it right, so go ahead!”
The mathematical biologist runs calculations, makes his guess and it’s exactly right. So he picks up his prize and starts to walk away. The farmer says, “That was amazing! But if I can guess what your job is, will you give me my animal back?”
The mathematical biologist says, “That’s fair. What’s your guess?” The farmer says, “You’re a mathematical biologist, and I can tell because you just picked up my dog.”
McAvoy uses the joke to illustrate a point about using data science tools. “I think there is insight into how modeling can be a double-edged sword,” he said. “It can be extremely useful, but it requires abstracting away the ‘right’ amount of information, which can feel like an art as much as a science.”
McAvoy, who has a secondary appointment in the mathematics department in the UNC College of Arts and Sciences, was among the first faculty members recruited to SDSS in 2023, and his farmer joke aligns with the school’s emphasis on a “human-centric” approach to applications of data science that solves real-world problems.
SDSS has assembled a cohort of faculty in fields ranging from the humanities to social sciences to medicine in an effort to facilitate collaboration and help students across Carolina develop data science skills.
“It’s a misconception that data science only happens in certain domains,” said Santiago Olivella, associate professor of political science in the College of Arts and Sciences with a full joint appointment in SDSS. “The school has given me an opportunity to see data science applied to areas that I wasn’t fully aware of, like comparative literature. I think many researchers who use data science are asking similar questions and face similar analytical challenges, just in different contexts. So understanding that can really catalyze innovation.”
Harlin Lee, assistant professor at SDSS with affiliations in the computer science and mathematics departments in the College of Arts and Sciences, said that collaboration helps generate progress beyond a single academic field.
“You don’t want to do just math sitting in your office and have it never see the light of the day,” Lee said. “You want to be working with experts in other scientific or social science domains. You want to advance the theory and methods of data science while also advancing how it’s used outside your immediate field.”
In recent interviews about their first year at SDSS, Lee, McAvoy and Olivella shared thoughts about the school, the opportunity to teach students and how data science impacts people in important or unexpected ways.
What does the “and Society” part of the school’s name mean to you?
- Olivella: “I think the school wants to emphasize that when we think about data science, we think of it in tandem with the human and social aspects of the discipline. The vast majority of applications of data science are designed to better understand people and their social interactions. For example, companies want to understand customers to put better products in front of them, or political candidates want to understand voters to run better campaigns. So leaving people out of the equation is to the detriment of the discipline.”
- McAvoy: “When I joined the school, I really liked the fact that this was part of the name. A lot of what I study centers around populations, and that’s essentially what our society is. I take a lot of inspiration from what we observe in populations and use tools of data science to translate that into a deeper understanding of how we interact with each other. To me, it’s about using the tools we have in data science to serve society.”
SDSS is offering its first undergraduate courses this fall (2024). Can you share a little bit about what you will be teaching?
- Lee: “Next semester, I’m teaching Intro to Data Science. Back in college, I actually changed my major and my whole career because of one introductory course. So I am very passionate about making this a good experience that is exciting and hands-on, and it’s not just the students sitting in a classroom and listening to me talk for an hour. We don’t expect you to have any computational knowledge. Just come with an appetite, and hopefully we’ll teach you enough to want to learn even more.”
Why is it important for students to learn about data science?
- McAvoy: “I think people use data and sometimes misinterpret it. So one example could be in baseball, where Player A has a higher batting average than Player B for five seasons in a row. It might surprise a lot of people, but it would be possible for Player B to have the higher average overall. We’ve seen court cases and other areas where this idea was very important. So even if you’re not opening up a computer and running a lot of tests or writing code, it’s important to understand that paradoxes like that can exist.”
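The pattern McAvoy describes is commonly known as Simpson's paradox. A minimal sketch of the arithmetic, using two seasons instead of five and entirely hypothetical hit and at-bat counts (the players, the numbers and the helper names below are assumptions for illustration, not real statistics):

```python
from fractions import Fraction

# Hypothetical (hits, at_bats) per season; these are NOT real player statistics.
player_a = [(4, 10), (25, 100)]   # few at-bats in the good season
player_b = [(35, 100), (2, 10)]   # many at-bats in the good season


def season_averages(seasons):
    """Batting average for each season, kept as exact fractions."""
    return [Fraction(hits, at_bats) for hits, at_bats in seasons]


def overall_average(seasons):
    """Batting average over all seasons combined (total hits / total at-bats)."""
    total_hits = sum(hits for hits, _ in seasons)
    total_at_bats = sum(at_bats for _, at_bats in seasons)
    return Fraction(total_hits, total_at_bats)


# Player A is ahead in every individual season...
a_wins_every_season = all(
    a > b for a, b in zip(season_averages(player_a), season_averages(player_b))
)
# ...yet Player B is ahead once the seasons are pooled.
b_wins_overall = overall_average(player_b) > overall_average(player_a)

print("A higher in every season:", a_wins_every_season)
print("B higher overall:", b_wins_overall)
```

With these made-up numbers, Player A hits .400 and .250 against Player B's .350 and .200, but the combined averages are 29/110 (about .264) for A and 37/110 (about .336) for B. The reversal comes from the uneven at-bat counts: each player's overall average is weighted toward the season in which that player batted most.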
Can you share a little bit about your research?
- Lee: “Nowadays, we’re flooded with data that comes from advanced sensing or the internet. And these data are no longer this well curated, very clean, very small data set that we used to work with. Understanding that kind of data is very difficult. So my work asks, ‘Well, what if we know something else about the data?’ Maybe that is a relationship between the data samples. Or maybe it is that the data we are working on can be represented with fewer numbers that can give us information to understand it better. Some of my work has been with pediatric sleep studies where you get lots of data that are noisy and difficult to interpret. Another is scientific publications where there is a lot of text data and metadata. It’s such a firehose that it’s hard to see trends and get something useful out of it.”
- McAvoy: “I’m interested in conflicts of interest and social dilemmas. So you and I might interact and coordinate on an outcome that is better than if we don’t coordinate. But we may also be tempted to exploit each other and do better for ourselves. So I’m really interested in populations and behaviors that can help us overcome that and get to more collaborative behavior.”
What’s one way people encounter or use data science in their everyday lives?
- Olivella: “There are few things that are so innocuously named, that also have so much power, as the third-party cookie. You hear the word and you think, ‘It’s just a cookie!’ But the truth is that it’s a powerful source of data and it has fueled the way we use the internet. It has fueled us as a product for companies. It has fueled many of our experiences on the internet to be as good or bad as they are.”
- Lee: “I’m not a neuroscientist, but our brains are processing data that our eyes and our hands and our ears catch all the time. There’s a lot of steps to correct things that don’t seem right. Ignoring some information, enhancing some information, and combining all the different sensory information that we have into the way we see the world. And that is our brain doing data science in the best way possible, and much better than a lot of models can do it right now.”
Have you ever ignored the data in a decision you made? What were the results?
- Lee: “I made decisions about which graduate schools to apply to and where to go, and honestly I didn’t use much data. I was like, ‘Oh, I like these students. I could be friends with them! Oh, I think I can fit in here. And oh, I like this potential advisor!’ And I just kinda went with it. And I think it’s mostly worked out for me.”
- Olivella: “Well, I can think of one time I followed the data with disastrous results. I was in Colombia with my family, and Google Maps told us to follow a route that took us on a road that wasn’t a road, but just this barely drivable stretch of beach. So it led me astray that one time, but for the most part I follow the data and it doesn’t let me down.”
Do you have one book or podcast recommendation for people interested in learning more about data science?
- Lee: “I like a podcast called ‘Stats + Stories.’ I enjoy it because it talks about statistics in the headlines and then talks to the researcher about what it means and really gets behind the story. And then my students tell me they like a YouTube series called ‘StatQuest,’ which is made by a former researcher at UNC. I haven’t personally checked it out, but I’ve heard good things.”
- McAvoy: “A book I like is ‘Calling Bullshit’ by Carl Bergstrom and Jevin West. I’m planning on assigning it as part of my class this fall. They’re both professors at the University of Washington. It’s all written in a way that you don’t need to be an expert with much technical background to get a lot out of this book. Another is ‘Weapons of Math Destruction’ by Cathy O’Neil. It’s similar in that it reads more like a nonfiction book than a textbook.”
- Olivella: “There’s a book called ‘Strength in Numbers’ by G. Elliott Morris, and it’s the history of how statistics and data science have been used in politics. Obviously, there is a negative taste around politics in general, and I think polling is a very important part of that perception of politics being broken. And so I think the book does a really good job of not just laying out how data science has been used in polling and politics in general, but also how it can be used to make politics better.”