Analyzing Domestic Abuse using Natural Language Processing on Social Media Data

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Engineering.

13 July 2015


Social media and social networking play a major role in billions of lives. Posts made publicly available on websites such as Twitter, Reddit, Tumblr, and Facebook can contain deeply personal accounts of users' lives and the crises they face. Health woes, family concerns, accounts of bullying, and any number of other issues that people face every day are detailed on a massive scale online. With natural language processing and machine learning techniques, these data can be analyzed to understand societal and public health issues. Automatic understanding of social media data makes large-scale, expensive surveys unnecessary, allowing faster, more cost-effective data collection and analysis that can shed light on sociologically important problems.

In this thesis, discussions of domestic abuse in social media are analyzed. The efficacy of classifiers that detect text discussing abuse is examined, and the detected texts are analyzed for a comprehensive view into the dynamics of abusive relationships. The analysis reveals micro-narratives in the reasons given for staying in versus leaving an abusive relationship, as well as the stakeholders and actions in these relationships. Findings are consistent across various methods, correspond to observations in clinical literature, and affirm the relevance of natural language processing for exploring issues of social importance in social media.


This thesis can be read in full through RIT Scholar Works. The data used in this thesis are available for download.