Wikipedia:Wikipedia Signpost/2013-02-25/Recent research

Recent research

Wikipedia not so novel after all, except to UK university lecturers; EPOV instead of NPOV

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Wikipedia in historic context: "Stigmergic accumulation" is not new

Page containing the entry "Encyclopédie" from Diderot and D’Alembert’s Encyclopédie. The work was the result of a collaboration among more than 100 contributors.

"Wikipedia and Encyclopedic Production"[1] by Jeff Loveland (a historian of encyclopedias) and Joseph Reagle situates Wikipedia within the historical context of encyclopedic production, arguing that the features many claim to be unique about Wikipedia actually have roots in encyclopedias of the past. Loveland and Reagle criticize characterizations of Wikipedia that they believe to be ahistorical and exaggerated, laying special blame on authors who compare Wikipedia’s anonymous production to Encyclopedia Britannica’s production by named experts, and thus ignore the rich tradition of encyclopedic production through the centuries. The authors then characterize the history of encyclopedic production as composed of three overlapping forms: compulsive collection, stigmergic accumulation, and corporate production.

"Compulsive collection" refers to the work of compiling encyclopedias that has traditionally been done by a few dedicated, tireless, detail-oriented individuals. Loveland and Reagle point out that, although Wikipedians share this compulsive behavior with past encyclopedists, there is a crucial distinction: the vast majority of those encyclopedists were motivated by money (even if this motive coexisted with more idealistic ones), whereas Wikipedia editors are unpaid.

Loveland and Reagle use the term "stigmergic accumulation" to refer to production by accretion onto a previous text. Even those responsible for a singly authored encyclopedia relied on predecessors, the authors argue, "building on their work and using the cumulative character of texts and knowledge as a ladder of sorts". Such reuse included drawing on a previous edition of an encyclopedia that ran to multiple editions, and borrowing between different encyclopedias, a practice that was sometimes illegal but more often viewed as "piratical", i.e. morally wrong.

Loveland and Reagle use the category of "corporate production" to describe encyclopedic editing by a group; such groups topped a thousand contributors in the 20th century. Editors of early encyclopedias such as Diderot and D’Alembert’s Encyclopédie in the 1700s faced the challenge of coordinating the work of about 140 contributors. Like Wikipedia, they had to confront issues of consistency that led to debates about how important a subject must be to merit an article. In contrast to other encyclopedias, write Loveland and Reagle, Wikipedia settles these debates openly, through community decision-making. The authors also note that previous encyclopedias didn’t always recruit on the basis of expertise. Some editors recognized that it would be cheaper, and sometimes more accurate, to have non-experts summarize the works of experts.

The authors conclude that although Wikipedia is in some respects unique in its size, its reliance on volunteers and its nonprofit nature, the continuities between Wikipedia and past encyclopedias are "numerous and significant". Their hope is that this paper "will help scholars avoid ahistorical claims about Wikipedia, identify historical material germane to the social scientist’s concerns (such as the motivations of encyclopedia-producers), and show that contemporary questions about Wikipedia (such as what exactly should be counted as a contribution) have a lifespan exceeding the past decade."

The collaboration between the two authors began after Reagle had learned (via a summary in this research report) of a review in which Loveland had criticized Reagle's 2010 book Good Faith Collaboration: The Culture of Wikipedia for having "one major weakness, namely in historical contextualization". The resulting paper has received media attention, starting with an article in The Atlantic (see the February 4 issue of The Signpost).

UK university lecturers still skeptical and uninformed about Wikipedia

Work has been ongoing for a while along the lines of recommendation 3: a brochure by the Wikipedia Education Program explains how to assess Wikipedia article quality. A number of similar materials exist on how to use Wikipedia in educational contexts, some also in other languages.

A study titled "Exploring the Cautionary Attitude Toward Wikipedia in Higher Education: Implications for Higher Education Institutions"[2] uses qualitative methods to examine the attitudes of five British university lecturers toward Wikipedia. The methodology consisted of 90-minute interviews with lecturers who described themselves as familiar with Web 2.0 educational tools, and an analysis of university documentation, primarily “unofficial policy” on students' use and evaluation of Wikipedia, as no official policy on Wikipedia existed. The author finds that the interviewed educators still treat Wikipedia with suspicion. The reasons are 1) a lack of understanding of Wikipedia, 2) a negative attitude toward collaborative knowledge produced outside academia, and 3) the perceived detrimental effects of using Web 2.0 applications not included in the university suite. Particular concerns included 1) "the difficulty of knowing if an article is correct," 2) doubts about the quality of information produced by anonymous contributors, 3) doubts about whether Wikipedia's crowdsourcing model of knowledge production can really outperform the experts (reviewer note: but of course...), 4) concerns that Wikipedia makes research "too easy" for students, and 5) misunderstanding of Wikipedia's non-profit nature. On this last point, two interviewees were suspicious of Wikipedia, expressing concerns such as "this is a commercial business; it wants to make money" and "they are obviously doing it from a business perspective", and another was concerned about the "politics and motivation behind [Wikipedia's dominant position on the Internet]".
All the interviewed teachers hoped that library staff would be able to guide students in evaluating when to use Wikipedia. However, correspondence with the library staff showed that their guidelines are mostly inapplicable to Wikipedia, and that they do not address Wikipedia in their literacy teaching sessions. The research also found that all five lecturers use Wikipedia in their personal lives, and four use it in professional research; two of them commented that they feel a bit hypocritical using the same tool they warn their students about. None of the interviewed lecturers had contributed to Wikipedia, and only one was aware of any outreach from Wikipedia to academics. As the author notes, Wikipedia is still alien to academic culture, and while attitudes are shifting, there is still much misunderstanding about Wikipedia's reliability, quality and non-profit mission. The author concludes that Wikipedia should address these concerns through three recommendations: 1) "Increase understanding of Wikipedia, its policies and processes." 2) "Increase understanding of the nature of open and free collaboratively produced knowledge." and 3) "Make available Wikipedia guidance and evaluation criteria to students and teaching staff."

Saint Petersburg has more sisters than any other city in the world

Connections between twin cities. Brighter colours indicate shorter distance.

A research team from Barcelona Media Lab recently posted a preprint on arXiv[3] analyzing sister-city relationships. Although the paper has little to say about Wikipedia itself, its dataset is extracted from the English Wikipedia, which, according to the paper, hosts "the most extensive but certainly not complete collection of this kind of relationships". This claim is debatable, and the paper presents no evidence to support it. Still, it is striking that the mass collaboration of Wikipedians can provide such an interesting body of information on a global scale.

After describing the data extraction process, the paper presents a set of standard complex network analyses, e.g. degree distribution, clustering, and average separation between nodes. While the network analysis yields no surprises, the interesting conclusion concerns the effect of geographical distance between sister cities: "the geographical distance has only a negligible influence when a city selects a sister city."
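To make the standard measures named above concrete, here is a minimal Python sketch (not the authors' code) for an undirected graph. It computes a degree distribution, the average local clustering coefficient and the average separation (mean shortest-path length in hops). The toy edge list with cities labelled A to D is hypothetical, and the function names are illustrative. A real analysis would load the sister-city pairs extracted from Wikipedia and would probably use a dedicated network library.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (city, city) pairs; self-loops are dropped."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def degree_distribution(adj):
    """Number of cities having each degree (count of sister cities)."""
    return Counter(len(neighbours) for neighbours in adj.values())


def local_clustering(adj, node):
    """Share of pairs of a city's sisters that are also sisters of each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * linked / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all cities."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def average_separation(adj):
    """Mean shortest-path length in hops over all reachable ordered pairs (BFS from every node)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy network, not data from the paper.
adj = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
print(dict(degree_distribution(adj)))      # {2: 2, 3: 1, 1: 1}
print(round(average_clustering(adj), 3))   # 0.583
print(round(average_separation(adj), 3))   # 1.333
```

The paper's distance finding would additionally require each city's coordinates. This sketch only covers the topological measures.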

"Distributed Wiki" proposal to replace NPOV with "every point of view (EPOV)"

In a paper titled "Towards Content Neutrality in Wiki Systems"[4] (an extended version of a conference keynote), a German computer scientist criticizes Wikipedia's "Neutral Point of View" policy as resting on "an objectivist point of view". According to the author, this view is prominent in parts of the natural sciences but is challenged in the fields of quantum physics, psychology and the social sciences. He offers differences between language versions of the Wikipedia articles on Osama Bin Laden and the Mossad as further arguments against the NPOV concept. He admits that "two examples, based on machine translation and subjective classification by an author who wants to prove his point do not show anything". However, he claims that even more systematic studies "would suffer the same objection", proving "that the object under consideration, i.e., a neutral point of view, logically may be considered an ill-defined concept". Nevertheless, the concept is still used on Wikipedia and is even ingrained in its architecture, where the "linear version history evokes the illusion that there is one 'currently best' version of an article". To explain why so many Wikipedians do not accept his own logical conclusion, the author offers the psychological diagnosis of cognitive dissonance: "Accepting the illusion of NPOV, one does not have to live with never ending edit-wars on the ultimately right article and one does not have to suffer dissonant feelings in every article".

Heading back towards his own area of academic expertise, the author then outlines "a variant-augmented Wiki system" that modifies the linear versioning of conventional wikis. He bases it on a proposed concept of "content neutrality", modeled after network neutrality, which requires "knowledge management platforms" "to store and provide content without any evaluation of its merit". He argues that "a collection of every-point-of-view, contradictory, possibly emotionally charged articles may provide a better approximation to reality than a synthetic and illusionary neutral point-of-view". He hopes that such a "content-neutral, EPOV knowledge base [has] a real chance of becoming a truly helpful instrument for science." Various possible problems with such an architecture are outlined, together with some suggested solutions, but the discussion remains brief and without detail ("The formal design and security analysis of low-level protocols are left to a later paper"). To protect against malicious site operators, the concept is extended to "distributed Wiki hubs". The paper mentions the existence of "a preliminary implementation as a MediaWiki plug-in prototype" and ongoing work on a use case "which can be described as a Wiki / Blog merge".

A section on related work cites a few earlier papers discussing distributed wikis or Wikipedia forks (out of many more, see e.g. this reviewer's "Timeline of distributed Wikipedia proposals").

Briefly

  • Mildly negative feedback makes newbies work harder: A paper[5] to be presented at the upcoming Conference on Human Factors in Computing Systems (CHI'13) reports on a "field experiment on Wikipedia to test the effects of different feedback types (positive feedback, negative feedback, directive feedback, and social feedback) on members’ contribution." The team from Carnegie Mellon University left user talk page messages for 703 English Wikipedia editors who had recently created a new article. These messages (example) were varied to test the effect of the different feedback types on the users' subsequent activity. To the researchers' surprise, none of the feedback types had a significant effect on experienced users. But for new editors, "positive feedback and social messages increase people’s general motivation to work". Negative feedback and directive feedback still increased newbies' edits on the corresponding articles. The researchers note that their negative messages (example: "I noticed there are some holes that may need filling: the references in the article do not follow Wikipedia guidelines") "were intentionally designed to be milder than negative feedback messages actually sent between Wikipedia editors". Other research has found such harsher messages to decrease participation. In a 2011 paper, three of the authors had studied similar phenomena in a non-participatory analysis (review: "How different kinds of leadership messages increase or decrease participation").
  • Students editing at PhD level in APS Wikipedia initiative: Another conference paper for CHI'13[6] reports on a project where 640 undergraduate and graduate students edited Wikipedia articles on scientific topics in 36 university courses. This classroom editing project was part of the US Association for Psychological Science's Wikipedia Initiative. The authors found that the "students substantially improved the scientific content of over 800 articles, at a level of quality indistinguishable from content written by PhD experts" (as measured by a content persistence metric).
  • Students unimpressed by professors' disapproval of Wikipedia: A paper on the factors affecting Wikipedia use,[7] based on a survey of 184 undergraduate students in Singapore, has a few suggestive findings:
    Female students were more likely to use Wikipedia.
    Believing that authority figures (such as professors) disapprove of Wikipedia use did not affect students' likelihood of using Wikipedia.
    Peer influence (whether or not students think their friends use Wikipedia) was significantly correlated with Wikipedia use.
    Whether these findings hold true for larger and more diverse groups of students is an open question.
  • Voting open for most important paper in Wikipedia research: As reported earlier, the French Wikimedia chapter is providing an award for the most influential paper published between 2003 and 2011. Out of more than 30 submissions, a jury of researchers has now selected five finalists, and until March 11 all Wikimedians are invited to vote for the winning paper among them. The winning paper's authors will receive a grant of 2500 Euros.
  • Teahouse compared to Help Desk: A workshop paper titled "How to ask questions the n00b way: designing social Q&A for new users"[8], presented at the CSCW 2013 "Workshop on Social Media Question Asking", covers the Teahouse, the support space for new editors launched on the English Wikipedia in 2012. It summarizes results from a longer paper presented at the same conference, which was covered in the December issue of this research report. These include a survey of participants indicating that the Teahouse was generally well received. They also include a comparison of edit rates showing, somewhat unsurprisingly, that newbies who accepted an invitation to join the Teahouse tended to make more edits than those who ignored it. (No attempt was made to measure the effect of invitations directly by comparing with non-invited newbies, because Teahouse hosts may avoid inviting editors whose first edits do not seem productive, which would bias such a comparison.) Questions in the Teahouse generated more responses than those on the Help Desk, the English Wikipedia's longer-running help forum, which is less focused on social elements and new editors.
  • Predicting admin elections based on social network analysis: A paper[9] modeled all admin elections on the Polish Wikipedia (since 2005) based on "multidimensional behavioral social networks derived from the Wikipedia edit history" of candidates and voters, finding that "we can classify the votes in the RfA procedures using this model with an accuracy level that should be sufficient to recommend candidates." (See also our review of an earlier paper by the same authors: "What it takes to become an admin: Insights from the Polish Wikipedia")
  • "Analysing the Entire Wikipedia History with Database Supported Haskell": Four researchers from Germany describe[10] a technique that allowed them to scale a quantitative analysis from a WikiSym 2010 paper (which had examined collaboration on a smaller sample of 4,733 articles and 4,679 users) to the entire revision history of the German Wikipedia.
  • Traffic analysis report and research ethics: The Signpost's special report titled "Examining the popularity of Wikipedia articles: catalysts, trends, and applications"[11] gave an overview of several research topics regarding pageviews on the English Wikipedia. These included a list of "the most viewed pages on Wikipedia in a one hour period" since 2010, which generated media attention, e.g. in The Atlantic ("If You Want Your Wikipedia Page to Get a TON of Traffic, Die While Performing at the Super Bowl Half-Time Show"). The report also discusses possible causes of such pageview spikes, and applications of pageview research, such as focusing efforts to improve article quality and assessing the impact of vandalism. The latter idea drew on earlier studies by one of the authors, including an experiment that had generated controversy in 2010 and caused the English Wikipedia's ArbCom to block the author temporarily. The Signpost reported on it at the time: "Large scale vandalism revealed to be 'study' by university researcher". A recent paper on research ethics that the author coauthored with his doctoral advisor and two others[12] criticized the University of Pennsylvania's institutional review board for its hesitant approval of the 2010 experiment. It also justified not notifying the Wikipedia community in advance (because doing so would have biased the results of the experiment). Finally, it discussed the "extremely mixed" response from reviewers of the resulting paper (reviewed in the September 2011 issue of this research report: "Link spam research with controversial genesis but useful results").
  • "Faces of Wikipedia" dataset: Two researchers from the École Polytechnique de Montréal have compiled[13] a database with facial images of over 50,000 Wikipedia article subjects, used to test facial recognition algorithms.
  • Detecting news events from Wikipedia edits: A paper to be presented at the annual European Conference on Information Retrieval[14] describes the detection of "real-world events such as political conflicts, natural catastrophes, and new scientific findings" from Wikipedia edits. Apart from bursts (peaks) in the editing activity of an article, another indicator used is the appearance of a current or recent date in the diff of an edit.
  • Evidence for damage caused by personal attacks and wikilawyering: A preprint titled "Stay on the Wikipedia Task: When task-related disagreements slip into personal and procedural conflicts"[15] reported on an analysis of 96 Wikipedia articles and the corresponding talk pages. It found "that when group members’ disagreements – originally task-related – escalate into personal attacks or hinge on procedure, these disagreements impede group performance."
  • Corpus of 200+ research papers on Wikipedia from 2012: With this monthly research update having completed its second volume recently, we have released a bibliographical dataset listing all of the more than 200 academic publications that were covered in 2012.
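The news-event detection item above (Georgescu et al.) combines two signals: bursts in an article's editing activity, and a current or recent date appearing in the text added by an edit. A rough Python sketch of both signals follows. It is not the paper's method. The trailing-window burst rule, the window length, the thresholds k and min_edits, the assumed "25 February 2013" date style, the function names and the toy inputs are all illustrative assumptions.

```python
import re
from datetime import date, timedelta
from statistics import mean, pstdev

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Assumed date style "25 February 2013" (day month year); other formats are not matched.
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def burst_days(daily_counts, window=7, k=2.0, min_edits=5):
    """Indices of days whose edit count exceeds mean + k * std of the preceding window."""
    bursts = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        # min_edits keeps tiny fluctuations on quiet articles from counting as bursts.
        if daily_counts[i] > threshold and daily_counts[i] >= min_edits:
            bursts.append(i)
    return bursts


def mentions_recent_date(added_text, edit_day, max_age_days=7):
    """True if text added by an edit names a date on, or up to max_age_days before, the edit day."""
    for day, month, year in DATE_RE.findall(added_text):
        try:
            mentioned = date(int(year), MONTHS.index(month) + 1, int(day))
        except ValueError:  # impossible dates such as "31 February 2013"
            continue
        if timedelta(0) <= edit_day - mentioned <= timedelta(days=max_age_days):
            return True
    return False


# Hypothetical inputs for illustration only.
counts = [1, 2, 1, 1, 2, 1, 1, 15, 2, 1]
print(burst_days(counts))  # [7]
print(mentions_recent_date("Elected on 24 February 2013.", date(2013, 2, 25)))  # True
print(mentions_recent_date("Born 3 March 1950.", date(2013, 2, 25)))  # False
```

Requiring both signals on the same day would give a stricter, lower-recall event detector. This sketch leaves that combination choice open.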

Notes

  1. ^ Loveland, J.; Reagle, J. (2013). "Wikipedia and encyclopedic production". New Media & Society. doi:10.1177/1461444812470428. (closed access)
  2. ^ Bayliss, G. (2013). "Exploring the Cautionary Attitude Toward Wikipedia in Higher Education: Implications for Higher Education Institutions". New Review of Academic Librarianship 19 (1). doi:10.1080/13614533.2012.740439. (closed access)
  3. ^ Kaltenbrunner, A.; Aragón, P.; Laniado, D.; Volkovich, Y. (2013). "Not all paths lead to Rome: Analysing the network of sister cities". arXiv:1301.6900v1 [cs.SI]. (open access)
  4. ^ Cap, C. H. (2012). "Towards Content Neutrality in Wiki Systems". Future Internet 4 (4): 1086-1104. doi:10.3390/fi4041086.
  5. ^ Zhu, H.; Zhang, A.; He, J.; Kraut, R. E.; Kittur, A. (2013). "Effects of Peer Feedback on Contribution: A Field Experiment in Wikipedia". CHI 2013, April 27–May 2, 2013, Paris, France. PDF
  6. ^ Farzan, R.; Kraut, R. E. (2013). "Wikipedia Classroom Experiment: bidirectional benefits of students’ engagement in online production communities". CHI 2013, April 27–May 2, 2013, Paris, France. PDF
  7. ^ Chung, S. (August 2012). "Cognitive and Social Factors Affecting the Use of Wikipedia and Information Seeking". Canadian Journal of Learning and Technology 38 (3). Retrieved 26 February 2013.
  8. ^ Morgan, J. T. (2013). "How to ask questions the n00b way: designing social Q&A for new users". CSCW 2013 Workshop on Social Media Question Asking. http://research.microsoft.com/en-us/events/cscw2013smqaworkshop/morgan.pdf
  9. ^ Jankowski-Lorek, M.; Ostrowski, L.; Turek, P.; Wierzbicki, A. (2013). "Modeling Wikipedia admin elections using multidimensional behavioral social networks". Social Network Analysis and Mining. doi:10.1007/s13278-012-0092-6. (open access)
  10. ^ Giorgidze, G.; Grust, T.; Halatchliyski, I.; Kummer, M. (2013). "Analysing the Entire Wikipedia History with Database Supported Haskell". Fifteenth International Symposium on Practical Aspects of Declarative Languages (PADL'13), Rome, Italy, January 21-22, 2013. PDF
  11. ^ West.andrew.g; Milowent (2013). "Examining the popularity of Wikipedia articles: catalysts, trends, and applications". Wikipedia Signpost, February 4, 2013. Wikipedia:Wikipedia Signpost/2013-02-04/Special report
  12. ^ West, A. G.; Hayati, P.; Potdar, V.; Lee, I. (2012). "Spamming for Science: Active Measurement in Web 2.0 Abuse Research". In WECSR '12: Proceedings of the 3rd Workshop on Ethics in Computer Security Research, LNCS 7398 (J. Blythe, S. Dietrich, and L.J. Camp eds.), pp. 98-111. Kralendijk, Bonaire. PDF
  13. ^ Hasan, Md. K.; Pal, C. J. "Creating a Big Data Resource from the Faces of Wikipedia". http://www.professeurs.polymtl.ca/christopher.pal/BigVision12/HasanBigVision12.pdf
  14. ^ Georgescu, M.; Kanhabua, N.; Krause, D.; Nejdl, W.; Siersdorfer, S. (2013). "Extracting Event-Related Information from Article Updates in Wikipedia". ECIR 2013. PDF
  15. ^ Arazy, O.; Yeo, L. (2013). "Stay on the Wikipedia Task: When task-related disagreements slip into personal and procedural conflicts". To appear in: Journal of the American Society for Information Science and Technology. PDF