
User:TPK/Random page meanderings: Difference between revisions

moved from Wikipedia Village pump
m Blanked the page
 
== Random page meanderings ==

I've just done a quick 50-page survey using the random page link. The results are at [[User:TPK/Random]]. I have to say they're not very encouraging... though I'll do a larger survey eventually of course, (50 pages doesn't give a proper indication of WP as a whole), but the numbers I got here still aren't that great. Essentially, half of the 50 pages were stubs or sub-stubs (and half of those again lacked <nowiki>{{stub}} or {{substub}}</nowiki>), 2/3 lacked at least one category, only 3 had a see also section, only 16 had any external links, only 6 had an image or diagram... though 44 were properly wikified, if that's any consolation. Not that we didn't already know, but these few numbers show how far WP has to go. Also, for the record, the majority of the articles were either biographical or 'other'.

I know some other people have done surveys like this, so are there any other results to share? Also, if there are other things I should be looking for in my next random meander, do tell.

[[User:TPK|TPK]] 22:13, 15 Sep 2004 (UTC) <small>[[User talk:TPK|Talk]]</small>

:Some comments:
:Were all the so-called stubs really stubs? I personally find it annoying and counter-productive when stub templates are placed on perfectly good short articles. Yes, many short articles could be expanded, but many don't especially need it compared to many longer articles which are more incomplete in respect to what they ought to cover. A short article is often quite adequate as is, in which case absence of a stub template is a good thing. As to a "see also" section, that's often not necessary either if there are well-chosen links within the article. External links are also not needed for many articles. Pictures and diagrams are nice, but not as important as basic accuracy which you, understandably, don't get into.

::Agree that the stub notice is often wrongly added to an article just because it is short. This makes the notice far less useful than it would otherwise be. [[User:Andrewa|Andrewa]] 07:13, 16 Sep 2004 (UTC)

::Well, not being able to tell immediately whether a given article ''could'' have been longer or not, I took any short article (i.e. less than one paragraph) as a stub. I agree that some articles simply can't be made very long, but I tend to think that a single paragraph can always be expanded in some way or another. Also, as no-one can immediately tell if an article is expandable (unless it happens to be a topic they know about), a stub needs to be brought to the attention of others, and if it is as long as it's going to get, then someone who knows can remove the tag. (Problem being someone else might replace it - perhaps this is a deeper problem with the whole stub/article length subject). External links - given the nebulousness of the web, there's bound to be at least one decent link for practically every article, so I believe that xlinks should be more common than they are; see-alsos though, might not be necessary, as you said if there are good internal links, as well as categories. [[User:TPK|TPK]] 10:34, 16 Sep 2004 (UTC)
:::There are more short articles, whether marked stub or not, than anyone can deal with in any reasonable time. And they keep increasing. Random marking of some of these for expansion helps no-one. My experience is that articles marked for expansion are in fact often better served not by expansion but by having content merged with another article and being changed into redirects. Or just left alone. But someone who knows nothing about the topic ignorantly stamps them with the information that they need expansion because it is easy to do. Yet articles unmarked are just as likely on the average to ''need'' expansion as articles marked, or just as unlikely. Indiscriminate labelling of short articles by editors who know nothing about the topics of the articles has destroyed any usefulness that stub marking might have had. And I disagree that it is more necessary to bring the fact that an article is short to people's attention than any other supposed defect in an article. Anyone can ''see'' that an article is short. Shortness doesn't ''need'' to be labeled. But because it is so easy to see, it is shortness that is labeled, by labelling the article as a stub for expansion, whether an article especially needs expansion or not. And far more serious defects are not noticed at all. Stub labelling as implemented in Wikipedia is, in my opinion, more harmful than good.

::::You got me thinking - yes, the way stubs are marked isn't really that useful in that an article might never be expandable (though I still think most stubs could be expanded by a sentence here or there, but some articles will just always hover at the stub limit). The first problem is that everyone has a different idea of how long a "stub" is: either it's 1 paragraph or less, or 2 or less, or anything under 300 words, or 200, or 100, or a certain number of lines or sentences or an exact number of characters (excluding spaces and punctuation) if you're especially anally-retentive. Anyway, stubs are subjective. That's Problem 1 with the current stub system. Secondly, as some articles might never expand, the problem of stubs is more of "is the article structurally, informatively, content-wise, ''complete''" (or near-to), rather than "is the article longer than xyz characters". Problem 2. So for stubs to be useful, we need to move away from counting characters, and move towards finding an objective way to tell if an article is hopelessly short ''for its subject'', or if it's as long as it's going to get. That's nigh on impossible to do though. Problem 3. Perhaps the stub system is useful in that it brings short articles to the attention of others, who might be able to expand them. But if they can't be expanded, they will remain as a perpetual stub, even if they're effectively complete. Problem the Fourth. Perhaps the stub system, for that reason and those above, needs to be completely overhauled. Either way, it needs to be looked at more closely, rather than being taken for granted. [[User:TPK|TPK]] 22:58, 16 Sep 2004 (UTC)

:::I've also created and expanded many articles with no external links. The links were all in related articles. Or the subject was not covered decently on the web in a form that fitted a distinct link or not decently covered at all on the web. For example, in an article about a Greek mythological figure, one should provide sources, many of which are found on the web. But normally in such an article the first mention of a particular source is also an internal link to another Wikipedia article about the source itself. It is the Wikipedia article about the source that contains the external links to sites on the web where those sources can be found. The article on the mythological figure is likely to contain no external links at all. Maintaining a set of external links that are balanced and valid is difficult enough without attempting needlessly to maintain many of the same links in forty or fifty different articles. If a new translation of the ''Aeneid'' appears on the web or a site that once had such a translation has vanished, one doesn't want to have to update thirty or forty articles to update Wikipedia. One fix in one article should be sufficient.
:::[[User:Jallan|Jallan]] 20:55, 16 Sep 2004 (UTC)

::::OK, well external links don't need to be duplicated across a number of crosslinked articles, but what I would like to see is xlinks placed on standalone articles - pages that don't link through a see-also to another page that does contain the relevant xlink(s). But where a page is a standalone, I think more of them ought to have some link or another. Yes, a lot of topics will be so obscure the only info on the web will be: A) imaginary, B) a copy of what's already on the page in question, or C) otherwise worthless. But I still think there are a lot of articles that could benefit from external linking - WP isn't and never will be a one-stop shop for information on every topic. More xlinks are needed, ''in some places'', but not all, I'll give you that. [[User:TPK|TPK]] 22:58, 16 Sep 2004 (UTC)
:::::I am in ''total'' agreement. My argument was really against the criteria being used to judge the 50 articles in question, that they weren't sufficient. There seemed to be an assumption that lack of a stub template on a short article was bad. Not always, especially considering the indiscriminate labelling of articles as stubs that need expansion. There seemed to be an assumption that lack of external references was a defect. That is not always true. A better experiment might be for someone to randomly select 100 articles and put them on a special page for comment. Not fixup. The copies would be protected during the length of the experiment. (The actual articles could still be fixed up or changed according to normal Wikipedia procedure.) After comments had been made on the copies, perhaps after two weeks, people could rate the articles in different ways, e.g. from 1 to 10 for accuracy, from 1 to 10 for formatting, from 1 to 10 for attaining NPOV, from 1 to 10 for excellence of the writing, from 1 to 10 for use of links and so forth. Then we might get a better true picture of how Wikipedia rates. I'd rather have in Wikipedia, for example, 100 short stubs needing expansion that are accurate and well written and balanced as they stand than 100 long articles that are untrustworthy in terms of factual accuracy or POV, which are more serious but less easily spotted flaws than simply being brief. [[User:Jallan|Jallan]] 14:07, 17 Sep 2004 (UTC)

:I went through about 50 random articles myself yesterday and found one obvious copyvio that has been hanging around since January 2003 though edited 12 times since, one speedy deletion candidate (a short article about a high school supposedly founded last year with a list of about five notable people who had attended it: obvious joke vanity), and a badly disguised advertisement article that I placed on VfD.
:[[User:Jallan|Jallan]] 02:52, 16 Sep 2004 (UTC)

:#Categories are new enough that the lack of them should be no surprise at all.
:#So did you add stub labels to the previously unmarked stubs? -- [[User:Jmabel|Jmabel]] 03:39, Sep 16, 2004 (UTC)

::Well, I did say the numbers show how far WP has to go, not that it was a surprise. As for editing, I only made changes to pages that absolutely needed them. I didn't add stub tags, no. [[User:TPK|TPK]] 10:34, 16 Sep 2004 (UTC)

*I have repeated TPK's experiment: see [[User:Stormie/Random]] for my results. &mdash;[[User:Stormie|Stormie]] 05:54, Sep 16, 2004 (UTC)

As an aside, how many pages do most people think would be enough for a properly representative survey? I think 50 was too few; perhaps 100? More? [[User:TPK|TPK]] 10:34, 16 Sep 2004 (UTC)
:It depends on what margin of error you are willing to accept. A sample size of 50 will give about +/- 14%. 100 is +/- 10%. 500 at +/- 4% would be good, but would take a long time. It should be easy enough to calculate the number of tagged stubs as a % of total articles. Problem is that not all tagged stubs are stubs and not all genuine stubs are tagged. [[User:Filiocht|Filiocht]] 10:50, 16 Sep 2004 (UTC)
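The figures quoted above are consistent with the usual normal-approximation margin of error for a sampled proportion. The following is only a minimal sketch under assumptions not stated in the thread: a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and simple random sampling from a population much larger than the sample. The function name `margin_of_error` is made up for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from n samples.

    Assumptions (not from the discussion): normal approximation,
    95% confidence (z = 1.96), worst-case proportion p = 0.5.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Sample sizes mentioned in the discussion
for n in (50, 100, 500):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f}%")
# n=50: +/- 13.9%
# n=100: +/- 9.8%
# n=500: +/- 4.4%
```

Because the margin shrinks only with the square root of the sample size, halving it takes roughly four times as many pages.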


Some time ago I did a 250 page sample: [[User:Pjacobi/Random]]. I'd suggest that someone using a bot and having a local copy produce a list of 500 or 1000 random page titles, perhaps with some info (Categories?) already extracted. This sample can be divided between collaborators and a previously agreed-on breakdown be done. [[User:Pjacobi|Pjacobi]] 14:01, 16 Sep 2004 (UTC)

:I think it would be useful to have statistics about the 'average' Wikipedia article. For example, how many links does it contain? What is the proportion of live links to non-existent links that it contains? For the statistically minded, I think it would be useful to know the 5th, 50th and 95th percentile values in each case. On the specific point of stubs, I hope that you have all seen the topbanana reports:
:*[[Wikipedia:Offline reports/Should this be a stub?]]
:*[[Wikipedia:Offline reports/Is this really a stub?]]
:[[User:Bobblewik|Bobblewik]]&nbsp;&nbsp;[[User talk:Bobblewik|(talk)]] 14:39, 16 Sep 2004 (UTC)


::Yes, seeing some percentiles or better a cumulative graph on some measures would be very enlightening information. My favorite ones are ''number of edits'' and ''time since last non-minor edit''. The weekly stats give about 14 (?) edits per article but I fear this average is composed of some articles with 300 edits and a large number of articles with fewer than five edits. -- [[User:Pjacobi|Pjacobi]] 20:24, 17 Sep 2004 (UTC)
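The concern about a skewed average can be illustrated with a small sketch. The edit counts below are entirely made up (they are not Wikipedia data), chosen only so that the mean comes out at 14; the helper `percentile` is a hypothetical nearest-rank implementation, one of several common percentile definitions.

```python
from statistics import mean, median


def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data is less than or equal to it (p is an integer 1..100)."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical edit counts for 20 articles (illustrative only)
edit_counts = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 8, 10, 12, 40, 158]

print("mean:", mean(edit_counts))      # mean: 14
print("median:", median(edit_counts))  # median: 4.0
for p in (5, 50, 95):
    print(f"{p}th percentile:", percentile(edit_counts, p))
# 5th percentile: 1
# 50th percentile: 4
# 95th percentile: 40
```

In this made-up sample two heavily edited articles pull the mean up to 14 while the typical article has about four edits, which is why percentiles or a cumulative graph would say more than the average alone.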

== Random page experiment ==
Following on from [[User:Jallan|Jallan]]'s idea in [[#Random page meanderings|Random page meanderings]] above, as well as earlier, simpler random page surveys, I have created a quick proposal/mockup/brainstorm of a large-scale random page experiment at [[User:TPK/Drafts/RPE]]. The gist of the proposal is that x randomly-selected articles (where x was proposed at 100, but that may be too many &ndash; or too few) are copied into (presumably my) User: space, or into some Wikipedia: space, and left ''in situ'' for a month or so. On the "clone" articles' talk pages, there are a number of topics, such as Formatting, Length, Content, Spelling and Grammar, POV, et cetera, and users are invited to look at the clone article, then give it a score from 0 to 10 for each topic. During all this, the original pages will remain untouched, and can be edited as usual (although a link would be left on the talk page to the scoring page of the clone). Given enough time (and ratings), each article would be given a final "Wikiscore" from 0 to 10, which would rate how "perfect" the community perceives that article to be. This would give us some ideas about paradigm articles &ndash; the best and the worst &ndash; as well as giving us an idea of the state of WP's "average article". I don't know what else could be gained from the experience, or if it's really that useful at all. It's only a vague idea at this stage (and again I give credit to Jallan). Have a look at the draft, suggest what topics you would use for scoring, how the results might be used or collated, whether this is all a waste of time, how the articles might be selected other than randomly (perhaps some previous featured articles should be randomly selected and included to see how they score), and anything else. Thank you for ''your'' time. '''[[User talk:TPK|T.]]'''[[User:TPK|P.K.]] <small>07:44, 19 Sep 2004 (UTC)</small>
:I believe that's the use of the 'validate' function in 1.4 (see [http://test.wikipedia.org/ Testwiki]) [[User:Ilyanep| ]] &mdash; [[User:Ilyanep|<font color="gray">Il&gamma;&alpha;&eta;&epsilon;&rho;</font>]] [[User talk:Ilyanep|<font color="#333333">(T&alpha;l&kappa;)</font>]] 14:16, 19 Sep 2004 (UTC)

Latest revision as of 07:05, 12 February 2023