Ivan Smirnov, PhD

computational social scientist

I am a computational social scientist at the University of Technology Sydney, where I use a variety of computational methods (such as Natural Language Processing, Machine Learning, and Network Science) to explore questions fundamental to the well-being of communities. My research focuses on inequality and segregation, particularly gender inequality; on combining digital traces with survey data; and on understanding and modelling human behaviour.

Prior to moving to Australia, I worked as an Assistant Professor at the University of Mannheim, and before that, I led a research group at the Higher School of Economics.

My research has been regularly presented at flagship conferences in my field, such as IC2S2 and ICWSM; published in Proceedings of the National Academy of Sciences, EPJ Data Science, PNAS Nexus, and Royal Society Open Science; and covered by leading Australian and international media including ABC TV, MIT Technology Review, The Times, and Nature.

In teaching, my goal is to give social science students the computational tools they need to tackle their research interests effectively. At the same time, I aim to help computational science students appreciate the complexities of the social sciences and become inspired by their big challenges. I also strongly believe in including meta-scientific knowledge in my classes, such as discussions of the nuances of publishing a paper.

To empower the next generation of computational social scientists, I twice co-organised and taught at the Summer Institute in Computational Social Science. I have also been involved in social entrepreneurship, helping to launch Teach for Russia (RU), a program modelled on Teach for America, and the mentoring program Sci.STEPS.

I share my thoughts on academia, migrant life, and other topics I care about on Substack, and I try to organise these thoughts in my Digital Garden.


Email: ivan@ismirnov.eu
Curriculum Vitae        

Research Highlights

Toxic comments are associated with reduced activity of volunteer editors on Wikipedia

For a project that relies entirely on volunteer work, Wikipedia’s success is remarkable. It is the fourth most popular website on the internet, behind only such giants as Google, YouTube and Facebook. Every day, millions of people worldwide use it for quick fact-checks or in-depth research. What happens to Wikipedia also matters beyond the platform itself because of its central role in the online information infrastructure. Given Wikipedia’s encyclopedic status, few suspect that discussions between editors can become quite heated. For example, one editor wrote to another: “i will find u in real life and slit your throat”. Our research reveals not only that toxicity is present on Wikipedia but also that it is associated with a significant reduction in editors’ activity. This could degrade the quality of Wikipedia content and threaten the long-term viability of the project.

Read the paper

Schools are segregated by educational outcomes in the digital space

While there are many studies of friendships between students, most of them focus on students from a single educational institution, that is, they study friendship ties within one school or one university. As a result, little is known about social connections between students from different schools. In this paper, I used digital traces to investigate interschool friendships at the scale of an entire city. I analysed data on 37,000 students from 590 schools and their friendship links on VK, and found that students from similarly performing schools tend to become online friends. One might assume that this is a trivial consequence of the geographical segregation of schools. However, by adding data on school locations and apartment prices, I was able to show that segregation in the digital space is in fact much stronger than geographical segregation.

Read the paper

Predicting academic performance from social media posts

In this paper, I built a model to predict students’ academic performance from their posts on social media. I combined unsupervised learning of word embeddings on a large corpus of social media posts with a supervised model trained on data from a nationally representative sample of young adults. This data set contains students’ academic performance, measured by a standardized test, as well as information on their public activity on social media. The continuous-vocabulary approach I used made it possible to achieve high accuracy with a relatively small training data set. It also makes it possible to compute interpretable scores for millions of words, which are fun to explore!

Read the paper

Parents mention sons more often than daughters on social media

Parents’ preference for sons is a well-known phenomenon that manifests in various forms, from sex-selective abortions to higher investments in sons. In this paper, we used public posts made by 635,665 users on a popular Russian social networking site to investigate public mentions of daughters and sons on social media. We found that both men and women mention sons more often than daughters in their posts. We also found that posts featuring sons receive more “likes” on average. Our results indicate that girls are underrepresented in parents’ digital narratives about their children. Previous studies have shown that female characters are underrepresented in children’s books, textbooks, movies, and on Wikipedia. Gender imbalance in public posts may send yet another message that girls are less important and interesting than boys and deserve less attention, thus presenting an invisible obstacle to gender equality.

Read the paper

Selected Talks