Overview

As part of my research for my thesis and as a member of the computational linguistics and speech processing lab (CLASP) at RIT, I began research into the #WhyIStayed trend on Twitter shortly after it began. Eventually, I published my findings at NAACL-HLT 2015, presenting a short paper poster during NAACL’s June 2015 conference.

Published paper:

#WhyIStayed, #WhyILeft: Microblogging to Make Sense of Domestic Abuse

@InProceedings{schrading-EtAl:2015:NAACL-HLT,
  author    = {Schrading, Nicolas  and  Ovesdotter Alm, Cecilia  and  Ptucha, Raymond  and  Homan, Christopher},
  title     = {\#WhyIStayed, \#WhyILeft: Microblogging to Make Sense of Domestic Abuse},
  booktitle = {Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  month     = {May--June},
  year      = {2015},
  address   = {Denver, Colorado},
  publisher = {Association for Computational Linguistics},
  pages     = {1281--1286},
  url       = {http://www.aclweb.org/anthology/N15-1139}
}

Abstract

In September 2014, Twitter users unequivocally reacted to the Ray Rice assault scandal by unleashing personal stories of domestic abuse via the hashtags #WhyIStayed or #WhyILeft. We explore at a macro-level firsthand accounts of domestic abuse from a substantial, balanced corpus of tweeted instances designated with these tags. To seek insights into the reasons victims give for staying in vs. leaving abusive relationships, we analyze the corpus using linguistically motivated methods. We also report on an annotation study for corpus assessment. We perform classification, contributing a classifier that discriminates between the two hashtags exceptionally well at 82% accuracy with a substantial error reduction over its baseline.

Technologies

I used primarily Python, Scikit-learn, NLTK, and TurboParser. I also utilized some tools in MATLAB for experimentation with dimensionality reduction. Later research integrated spaCy rather than NLTK and TurboParser for faster, more robust natural language processing, but this was after the NAACL paper was already published.

Presentations

I have presented this work at 3 different venues:

University of Rochester Medical Center’s Office of Mental Health Promotion: Community Counts Lunch and Discussion

Rochester, NY
July 31, 2015

Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Denver, Colorado
June 02, 2015

Rochester Institute of Technologies Graduate Research Symposium

Rochester, NY
February 27, 2015

Data

Get the data used in this study here.