This post was written during a research visit at the Department of Computer Science at Aalto University, Finland, supported by the Helsinki Institute for Information Technology.

Perspective is an API that uses machine learning models to predict the impact of a comment on the conversation. One of the models predicts the extent to which the comment might be perceived as toxic. A toxic comment is defined as “a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion.” The documentation of the Perspective API is available here. The brand new (at the time of writing) peRspective package by Fabio Votta makes it easy to access the Perspective API from R.
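For readers not working in R, here is a minimal Python sketch of a direct call to the Perspective REST API. It shows the kind of request a client such as peRspective sends. It assumes the `v1alpha1` `comments:analyze` endpoint and an API key obtained from Google. The helper names (`build_request_body`, `extract_toxicity`, `score_comment`) are hypothetical, and the example response is hand-made, not real API output.

```python
import json
import urllib.request

# Perspective API "analyze" endpoint (v1alpha1); an API key must be appended.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request_body(text: str) -> dict:
    """Request body asking for the TOXICITY attribute of one English comment."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }


def extract_toxicity(response: dict) -> float:
    """Return the summary TOXICITY score (between 0 and 1) from a parsed response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_comment(text: str, api_key: str) -> float:
    """POST one comment to the API and return its toxicity (needs network and a key)."""
    payload = json.dumps(build_request_body(text)).encode("utf-8")
    request = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_toxicity(json.load(response))


# Offline check of the parsing step on a hand-made response (not real API output).
fake_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.05, "type": "PROBABILITY"}}
    }
}
print(extract_toxicity(fake_response))  # 0.05
```

Scoring a whole dataset this way means one request per comment, so the API's usage quota has to be respected, for example by pausing between calls.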

The code used in this post is available here.

Data come from the Wikipedia Requests for Adminship (with text) dataset available from the Stanford Network Analysis Project. The dataset includes votes and comments in RfAs between mid-2003 and mid-2013. Below is an extract of the data showing the year of the RfA (yea), the source of the vote, i.e. the voter (src), the target of the vote, i.e. the admin candidate (tgt), the vote itself (vot: -1 for oppose, 1 for support, 0 for neutral) and the text of the comment (txt).

yea   src        tgt     vot  txt
2008  Acalamari  ^demon  1    Was a great admin before, and will be again. Experienced with images.
2008  Qst        ^demon  1    '''Support'''. What Acalamari said ;).
2008  Mr.Z-man   ^demon  1    Of course. <font face=Broadway>
2008  Rjd0060    ^demon  1    Why not give them back? -
2008  Jj137      ^demon  1    '''Support''' - definitely. &nbsp; '''
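Here is a minimal Python sketch of how such an extract can be produced from the raw file. It assumes the SNAP file stores each vote as a block of `KEY:value` lines (SRC, TGT, VOT, RES, YEA, DAT, TXT) separated by blank lines. The lower-cased keys then match the column names above. The function name `parse_rfa` and the sample records are hypothetical, not taken from the dataset.

```python
def parse_rfa(lines):
    """Group KEY:value lines into one dict per vote; blank lines separate votes.

    Assumed layout of one vote: SRC, TGT, VOT, RES, YEA, DAT and TXT lines,
    each written as KEY:value on its own line.
    """
    records, current = [], {}
    for raw in lines:
        line = raw.strip()
        if not line:  # a blank line closes the current record
            if current:
                records.append(current)
                current = {}
            continue
        # Split at the first colon only, so colons inside the comment survive.
        key, _, value = line.partition(":")
        current[key.lower()] = value
    if current:  # the last record may not be followed by a blank line
        records.append(current)
    for rec in records:  # convert the numeric fields that are present
        for field in ("vot", "res", "yea"):
            if field in rec:
                rec[field] = int(rec[field])
    return records


# Two invented votes in the assumed format (hypothetical users, not real data).
sample = """SRC:UserA
TGT:UserB
VOT:1
RES:1
YEA:2008
TXT:Support, solid track record.

SRC:UserC
TGT:UserB
VOT:-1
RES:1
YEA:2008
TXT:Oppose: too few edits.""".splitlines()

for rec in parse_rfa(sample):
    print(rec["yea"], rec["src"], rec["tgt"], rec["vot"], rec["txt"])
```

If the downloaded file is gzip-compressed, its lines can be passed in directly from `gzip.open(path, "rt")`.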

In the period covered by the data, the number of RfAs declined substantially, as shown below (the data cover only parts of 2003 and 2013, so those years are excluded). The numbers of both successful and unsuccessful RfAs have been declining. In the early years, however, more RfAs ended in promotion than in rejection, while in later years the pattern reversed and more RfAs were rejected than approved.
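The yearly counts behind such a chart can be sketched in Python as follows. The sketch assumes vote-level records with a result field `res` that is 1 for a successful RfA and -1 for a failed one. It also makes a simplification: a (candidate, year) pair is treated as one RfA, which would merge two RfAs by the same candidate in the same year. The function name and toy records are hypothetical.

```python
from collections import Counter


def outcomes_per_year(records, first_year=2004, last_year=2012):
    """Count successful and failed RfAs per year from vote-level records.

    Each record is a dict with 'tgt' (candidate), 'yea' (year) and 'res'
    (assumed 1 for a successful RfA, -1 for a failed one). Simplification:
    a (candidate, year) pair is treated as a single RfA.
    """
    rfas = {}
    for rec in records:
        if first_year <= rec["yea"] <= last_year:  # drop partial years 2003, 2013
            rfas[(rec["tgt"], rec["yea"])] = rec["res"]
    counts = Counter()
    for (_, year), res in rfas.items():
        counts[(year, "success" if res == 1 else "failure")] += 1
    return counts


# Invented vote-level records: two votes in one RfA, plus RfAs in other years.
toy = [
    {"tgt": "A", "yea": 2004, "res": 1},
    {"tgt": "A", "yea": 2004, "res": 1},
    {"tgt": "B", "yea": 2004, "res": -1},
    {"tgt": "C", "yea": 2012, "res": -1},
    {"tgt": "D", "yea": 2003, "res": 1},  # excluded: partial year
]
print(sorted(outcomes_per_year(toy).items()))
```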

In the English Wikipedia, the outcome of an RfA is decided by a bureaucrat who considers all the votes and comments and determines whether there is consensus to grant adminship. Wikipedia emphasizes that the RfA process is a discussion and not a simple vote count. In practice, however, successful RfAs are those in which supporting votes make up a substantial majority, roughly 75% or more of all supporting and opposing votes, as shown in the figure below.
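The support ratio described above can be written down in a few lines of Python: supporting votes divided by supporting plus opposing votes, with neutral votes left out. The 0.75 threshold is only the empirical rule of thumb from the text, not an official rule, since the bureaucrat makes the decision. The function names and the vote counts are hypothetical.

```python
def support_ratio(votes):
    """Share of supporting votes among supporting and opposing votes.

    votes: iterable of -1 (oppose), 0 (neutral), 1 (support); neutral votes are ignored.
    Returns None when there are no supporting or opposing votes.
    """
    votes = list(votes)
    support = votes.count(1)
    oppose = votes.count(-1)
    if support + oppose == 0:
        return None
    return support / (support + oppose)


def above_threshold(votes, threshold=0.75):
    """Rough rule of thumb only: the bureaucrat, not this ratio, decides the outcome."""
    ratio = support_ratio(votes)
    return ratio is not None and ratio >= threshold


votes = [1] * 30 + [-1] * 10 + [0] * 5  # hypothetical RfA
print(support_ratio(votes))    # 0.75
print(above_threshold(votes))  # True
```

Here the five neutral votes do not change the ratio: 30 supporting out of 40 supporting and opposing votes gives exactly 0.75.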

Some of the decline in the number of RfAs has been attributed to the increasing unpleasantness of the RfA process, during which candidates often come under attack for any real or perceived wrongdoing in their editing history (cf. Common Knowledge? An Ethnography of Wikipedia by Dariusz Jemielniak).

The chart below shows the distribution of toxicity in comments accompanying votes in RfAs between 2004 and 2012, broken down by the result of the RfA. Toxicity estimates provided by the Perspective API range from 0, indicating no toxicity, to 1, indicating maximum toxicity. Most comments in both successful and unsuccessful RfAs have low toxicity, well below 0.25. Comments in unsuccessful RfAs tend to be slightly more toxic, but the medians of the two groups are similar. All distributions are strongly skewed, with long positive tails corresponding to rare cases of extremely toxic comments. Overall, the toxicity of comments appears to have been stable over time, with no clear differences between the early and late years.
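Here is a minimal Python sketch of the per-group summaries this comparison rests on: median, mean and the share of comments below 0.25 for each (year, result) group. It assumes the toxicity scores have already been joined to the RfA results as `(year, result, toxicity)` tuples. The function names and scores are invented for illustration. The toy group also shows why medians are the safer comparison for skewed data: one very toxic comment pulls the mean well above the median.

```python
from statistics import mean, median


def summarize(scores, cutoff=0.25):
    """Median, mean and share of scores below a cutoff for one group of comments."""
    return {
        "median": median(scores),
        "mean": mean(scores),
        "share_below_cutoff": sum(s < cutoff for s in scores) / len(scores),
    }


def summarize_by_group(rows):
    """rows: (year, result, toxicity) tuples; returns one summary per (year, result)."""
    groups = {}
    for year, result, score in rows:
        groups.setdefault((year, result), []).append(score)
    return {key: summarize(scores) for key, scores in groups.items()}


# Invented scores for one group: one very toxic comment pulls the mean
# above the median, as in a right-skewed distribution.
rows = [(2010, "failed", s) for s in (0.02, 0.03, 0.05, 0.04, 0.90)]
print(summarize_by_group(rows)[(2010, "failed")])
```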

This graph shows the distribution of toxicity in comments accompanying votes in RfAs between 2004 and 2012 by type of vote, i.e., whether the vote supported or opposed the request, or was neutral. In all years, the median toxicity of opposing votes was higher than that of supporting votes, with neutral votes in between.
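The ordering described here can be checked with a short Python sketch: compute the median toxicity for each (year, vote) group, then test whether oppose > neutral > support in a given year. The sketch assumes scores are available as `(year, vote, toxicity)` tuples. The function names and scores are hypothetical, not results from the data.

```python
from statistics import median


def median_by_vote(rows):
    """rows: (year, vote, toxicity) with vote -1 (oppose), 0 (neutral), 1 (support)."""
    groups = {}
    for year, vote, score in rows:
        groups.setdefault((year, vote), []).append(score)
    return {key: median(scores) for key, scores in groups.items()}


def oppose_neutral_support_order(medians, year):
    """True if, in the given year, median toxicity is oppose > neutral > support."""
    return medians[(year, -1)] > medians[(year, 0)] > medians[(year, 1)]


# Invented scores for a single year.
rows = (
    [(2005, -1, s) for s in (0.20, 0.30, 0.10)]
    + [(2005, 0, s) for s in (0.10, 0.08)]
    + [(2005, 1, s) for s in (0.02, 0.05, 0.03)]
)
medians = median_by_vote(rows)
print(medians[(2005, -1)], medians[(2005, 0)], medians[(2005, 1)])
print(oppose_neutral_support_order(medians, 2005))  # True
```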

The pattern of toxicity in RfA comments matches the intuition that comments should be more toxic in elections that failed to promote the candidate to admin, and in opposing votes. However, the differences are very small, and there is little change over time. If the Perspective API accurately estimates the tone of the comments, toxicity is an unlikely cause of the decline in adminship requests. Alternatively, the API may not capture the tone of the comments as it is perceived by admin candidates.