Toxicity of comments to votes in Request for Adminship on English Wikipedia

17 Jun 2019, 23:49

R / toxicity / Wikipedia / peRspective / API / RfA

This post was written during a research visit at the Department of Computer Science at Aalto University, Finland, supported by the Helsinki Institute for Information Technology.

Perspective is an API that uses machine learning models to predict the impact of a comment on the conversation. One of the models predicts the extent to which the comment might be perceived as toxic. A toxic comment is defined as “a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion.” The documentation of the Perspective API is available here. The brand new (at the time of writing) peRspective package by Fabio Votta makes it easy to access the Perspective API from R.
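The post itself calls the API from R through peRspective. To show what such a call involves underneath, here is a minimal Python sketch of one raw REST request, using only the standard library. It assumes the `v1alpha1` `comments:analyze` endpoint and the request/response layout described in the Perspective API documentation, and it assumes you have your own API key. The function names are hypothetical, and this is not the code used for this post.

```python
import json
import urllib.request

# Assumed endpoint of the Perspective API's comments:analyze method.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request_body(text: str) -> dict:
    """Request only the TOXICITY attribute for an English comment."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }


def parse_toxicity(response: dict) -> float:
    """Extract the summary TOXICITY score (between 0 and 1) from a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_comment(text: str, api_key: str) -> float:
    """Send one comment to the API and return its toxicity score."""
    data = json.dumps(build_request_body(text)).encode("utf-8")
    request = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return parse_toxicity(json.load(reply))
```

Scoring a whole dataset this way means one request per comment, so you would likely also need to throttle requests to stay within your API quota.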

In this post I use the Perspective API to assess the toxicity of voters’ comments that accompany votes in Requests for Adminship (RfA) on English Wikipedia. According to Wikipedia itself, “[R]equests for adminship (RfA) is the process by which the Wikipedia community decides who will become administrators (also known as admins or sysops), who are users with access to additional technical features that aid in maintenance.” Voting takes a week, and any Wikipedia editor can cast a supporting, neutral, or opposing vote. Votes are accompanied by textual comments, which are the object of this analysis.

The code used in this post is available here.

Data come from the Wikipedia Requests for Adminship (with text) dataset available from the Stanford Network Analysis Project. The dataset includes votes and comments in RfAs between mid-2003 and mid-2013. Below is an extract of the data with the year of the RfA, the source of the vote (the voter), the target of the vote (the admin candidate), the vote itself (-1 for oppose, 1 for support, and 0 for neutral), and the text of the comment.
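The raw SNAP file is not laid out as a table like the extract below. A minimal Python parser might look like the following. It assumes the file stores one vote per block of `KEY:value` lines (`SRC`, `TGT`, `VOT`, `RES`, `YEA`, `DAT`, `TXT`) separated by blank lines, so check that layout against the file you download before relying on it. The function name `parse_rfa` is hypothetical.

```python
from typing import Dict, List


def parse_rfa(raw: str) -> List[Dict[str, object]]:
    """Parse blank-line-separated blocks of KEY:value lines into dicts."""
    records: List[Dict[str, object]] = []
    current: Dict[str, object] = {}
    for line in raw.splitlines():
        if not line.strip():
            # A blank line closes the current vote record.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so colons inside DAT or TXT survive.
        key, _, value = line.partition(":")
        current[key.strip().lower()] = value.strip()
    if current:  # the file may not end with a blank line
        records.append(current)
    for record in records:
        # Convert vote, result and year fields to integers where present.
        for field in ("vot", "res", "yea"):
            if field in record:
                record[field] = int(record[field])
    return records
```

Lower-casing the keys yields the same field names (`yea`, `src`, `tgt`, `vot`, `txt`) as the extract.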

yea	src	tgt	vot	txt
2008	Acalamari	^demon	1	Was a great admin before, and will be again. Experienced with images.
2008	Qst	^demon	1	'''Support'''. What Acalamari said ;).
2008	Mr.Z-man	^demon	1	Of course. <font face=Broadway>
2008	Rjd0060	^demon	1	Why not give them back? -
2008	Jj137	^demon	1	'''Support''' - definitely.   '''

In the period covered by the data, the number of RfAs declined substantially, as shown below (the data cover only parts of 2003 and 2013, so those years are excluded). The numbers of both successful and unsuccessful RfAs have been declining. However, while in the early years more RfAs ended in promotion than in rejection, the trend later reversed, and more RfAs were rejected than approved.
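Counting RfAs per year requires grouping individual votes into RfAs. The Python sketch below makes the simplifying assumption that each (candidate, year) pair is one RfA. This miscounts candidates who ran more than once in the same year. It also assumes the result is stored in a `res` field coded 1 for success and -1 for failure, repeated on every vote. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def rfas_per_year(
    votes: Iterable[dict], first: int = 2004, last: int = 2012
) -> Dict[Tuple[int, int], int]:
    """Count RfAs per (year, result), skipping partially covered years."""
    # Simplification: one RfA per (candidate, year); every vote carries the result.
    rfa_results = {(v["tgt"], v["yea"]): v["res"] for v in votes}
    counts = Counter(
        (year, res)
        for (_, year), res in rfa_results.items()
        if first <= year <= last
    )
    return dict(counts)
```

The default `first` and `last` years drop the partially covered years 2003 and 2013.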

In the English Wikipedia, RfA results are decided by a bureaucrat, who considers all the votes and comments and determines whether there is consensus that adminship should be granted. Wikipedia emphasizes that the RfA process is a discussion and not a simple vote count. In practice, however, successful RfAs are those in which supporting votes form a substantial majority, at least 75% of all supporting and opposing votes, as shown in the figure below.
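The 75% pattern refers to the share of supporting votes among supporting and opposing votes only, so neutral votes drop out of the denominator. For example, 3 supports, 1 oppose and 1 neutral give 3 / (3 + 1) = 0.75. A small Python sketch of that arithmetic follows. The threshold is an empirical regularity rather than a rule, since the bureaucrat makes the decision. The function names are hypothetical.

```python
from typing import Iterable


def support_ratio(votes: Iterable[int]) -> float:
    """Share of supports (1) among supports and opposes (-1); neutral (0) ignored."""
    votes = list(votes)
    support = votes.count(1)
    oppose = votes.count(-1)
    if support + oppose == 0:
        raise ValueError("no supporting or opposing votes")
    return support / (support + oppose)


def above_threshold(votes: Iterable[int], threshold: float = 0.75) -> bool:
    """True if the support ratio reaches the empirical 75% mark."""
    return support_ratio(votes) >= threshold
```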

Some of the decline in the number of RfAs has been attributed to the increasing unpleasantness of the RfA process, during which candidates often come under attack for any real or perceived wrongdoing in their editing history (cf. Common Knowledge? An Ethnography of Wikipedia by Dariusz Jemielniak).

The chart below shows the distribution of toxicity in comments accompanying votes in RfAs between 2004 and 2012, depending on the result of the RfA. Toxicity estimates provided by the Perspective API range from 0, indicating no toxicity, to 1, indicating maximum toxicity. A majority of comments in both successful and unsuccessful RfAs have relatively low toxicity, well below 0.25. Comments in unsuccessful RfAs tend to be slightly more toxic, but the medians in both groups are similar. All distributions are strongly skewed, with long positive tails corresponding to rare, extremely toxic comments. Overall, the toxicity of comments appears to have been stable over time, with no clear differences between the early and late years.
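The summaries read off the chart (medians, and the share of comments below 0.25) can also be computed per RfA result. The following Python sketch assumes each vote record already carries a `toxicity` score obtained from the API and a `res` result field. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable


def summarize_by_result(records: Iterable[dict], cutoff: float = 0.25) -> Dict[int, dict]:
    """Median toxicity and share of comments below `cutoff`, per RfA result."""
    scores = defaultdict(list)
    for record in records:
        scores[record["res"]].append(record["toxicity"])
    summary = {}
    for res, values in scores.items():
        summary[res] = {
            "median": median(values),
            "share_below_cutoff": sum(v < cutoff for v in values) / len(values),
        }
    return summary
```

Medians suit this data better than means, because the long right tails of rare, very toxic comments would pull a mean upward.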

The graph below shows the distribution of toxicity in comments accompanying votes in RfAs between 2004 and 2012, depending on the type of vote, i.e., whether it supported or opposed the request, or was neutral. In all years, the median toxicity of opposing comments was higher than that of supporting comments, with neutral comments in between.
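The year-by-year comparison of opposing and supporting comments can be checked numerically along the same lines. This Python sketch assumes records with `yea`, `vot` and `toxicity` fields. It computes the median toxicity for each (year, vote) group and then reports whether opposing comments had the higher median in each year that has both kinds of vote. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def medians_by_year_and_vote(records: Iterable[dict]) -> Dict[Tuple[int, int], float]:
    """Median toxicity for each (year, vote) group; vote is -1, 0 or 1."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["yea"], record["vot"])].append(record["toxicity"])
    return {key: median(values) for key, values in groups.items()}


def oppose_exceeds_support(medians: Dict[Tuple[int, int], float]) -> Dict[int, bool]:
    """Per year, whether opposing comments have a higher median than supporting ones."""
    years = sorted({year for year, _ in medians})
    return {
        year: medians[(year, -1)] > medians[(year, 1)]
        for year in years
        if (year, -1) in medians and (year, 1) in medians
    }
```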

The pattern of average toxicity in RfA comments matches the intuition that comments should be more toxic in elections that failed to promote the candidate to admin, and in opposing votes. However, the differences are very small, and there is little change over time. If the Perspective API accurately estimates the tone of the comments, toxicity is an unlikely cause of the decline in adminship requests. Alternatively, the API may not capture the tone of the comments as the admin candidate perceives them.