Friday, March 23, 2007

And for some big news....

Tito's note: This is the third in a series of articles about the Version 1.0 Editorial Team in the English Wikipedia.

Let's say that we have some big news...

A static release of Wikipedia is now a reality. :)

Copying the words of Martin Walker, the WP:1.0 coordinator for Version 0.5:

Assuming there are no last minute hitches, the Wikipedia:Version 0.5 CD should be going on sale on March 26 at for around 10 Euros/$US13-14 (a portion goes to the Foundation). It will also be made available for free download. It consists of 1964 articles and a set of navigation pages, with an open source (GPL) search engine, Kiwix, developed by Linterweb. We now have an ongoing collaboration with Wikimedia France, and User:Kelson wrote many of the scripts for Version 0.5. This CD will make a great birthday present for your loved one! Walkerma 05:54, 22 March 2007 (UTC)

Remember that this is a *test* release. Please check it out, have a look at it, and give us feedback about it.

By the way, all of the articles included in Version 0.5 will be available in Version 0.7, which is being prepared right now. So come join the Version 1.0 Editorial Team to help out!


Monday, March 19, 2007


Tito's note: This is the second in a series of articles about the Version 1.0 Editorial Team in the English Wikipedia.

I was planning on making a longer article, but due to time constraints, I'm not able to do so. However, during a 1.0 meeting earlier today, there was an interesting proposition, and I wanted to ask the greater Wikimedia community about it:

The German Wikipedia has published WikiReaders before. Why cannot the English Wikipedia do so?

There is a page in enwiki that is... pretty much dead nowadays, but that has a few suggestions about potential WikiReader content. While the suggestions are old (and probably not a good idea now, as the articles may have decayed in quality), the page did bring up a troubling question: why is the English Wikipedia not producing them? There were two answers that came to mind:

  1. No one knows about them. If users don't know that they can make WikiReaders, they won't try to make them.
  2. More importantly, users do not know how to make them. The user who proposes a WikiReader will probably not know who to contact, and what issues follow from there.
Problem No. 1 is relatively easy to solve. No one knew about Version 0.5 until the Version 1.0 Editorial Team began doing some legwork. However, the second problem is much more of a community-wide issue that cannot be addressed by a group of users alone.

So, I pose the following questions to everyone:
  1. Do you have a group of articles that you would like to see in a WikiReader?
  2. We are beginning to consider how we can make publication of selected subsets of Wikipedia (or even Wikibooks, even though it is a bit outside our scope) articles easier. What suggestions do you have for the process?
  3. What would you want to see as the end result?
  4. Do you even want us to consider this, or do you think this is a waste of time?
  5. Would you be willing to help us at any stage in the process?
Note that the idea is not half-baked... it is about a quarter-baked. :P However, if you consider that the best way to encourage users to write is to showcase their work (see [[WP:FA]], [[WP:TFA]] and Raul654's thoughts about the issue, for the prime example), publication of WikiReaders is a great way to encourage editorial improvement of Wikipedia. It also should be orders of magnitude less complicated than a full release, such as V0.5.

Anyways, I'm eagerly interested in hearing all of your opinions, either here, or at 1.0's talk.


Friday, March 16, 2007

WP:1.0 assessment scale

Tito's note: This is the first in a series of articles about the Version 1.0 Editorial Team in the English Wikipedia.

As of today, there are a total of 385,469 assessed articles in the English Wikipedia. If we use the figure from Special:Statistics of 1,688,879 articles (sorry, no permalink here), that means that 22.8% of articles have been looked at by someone and classified according to their quality. (If we count the number of unassessed articles that are stored in the 1.0 database, we have 40.7% of the article base covered.) While the proportions themselves make for interesting observations, most users do not know where those numbers come from, or how they are processed. Since I was involved in the design of the WP:1.0 bot framework, I thought it would be a good idea to explain how it works. It is quite a fascinating process, if I'm allowed to say so myself... ;)
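For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the first percentage from the two figures quoted above. The variable names are mine, purely for illustration.

```python
# Figures quoted in the post.
assessed_articles = 385_469
total_articles = 1_688_879

# Share of all articles that have received a quality assessment.
assessed_share = assessed_articles / total_articles

print(f"{assessed_share:.1%}")  # prints 22.8%
```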

First, the WikiProject needs to do a bit of legwork. The WP1.0 bot uses the Mathbot code, which uses Perl to determine if there were any additions or subtractions to a particular category. Therefore, the vast majority of the processing is just a matter of, "Was an article added to this category? Was one removed? Was there an article in a category that was in a different category yesterday?" Before those operations can be done, the categories to process must, well, exist. Picking on WikiProject Tropical cyclones (as usual), the categories to make are:
  • Category:Hurricane articles by quality
    • Category:FA-Class hurricane articles
    • Category:A-Class hurricane articles
    • Category:GA-Class hurricane articles
    • Category:B-Class hurricane articles
    • Category:Start-Class hurricane articles
    • Category:Stub-Class hurricane articles
    • Category:Unassessed-Class hurricane articles
All the "class" categories must be subcategories of the "by quality" category. Otherwise, the bot will not find them. Also, the bot will not find the entire category tree unless the "by quality" category is itself a subcategory of Category:Wikipedia 1.0 assessments. The WP:1.0 bot reads this category and begins spidering from there.
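To make the "spidering" idea concrete, here is a rough sketch in Python (the real bot is written in Perl, and this is not its code). It assumes a small in-memory dictionary standing in for the wiki's category graph; the "orphaned" categories are hypothetical and are there only to show that anything not linked under the root category is never found.

```python
from collections import deque

ROOT = "Category:Wikipedia 1.0 assessments"

# Hypothetical stand-in for the wiki's category graph:
# parent category -> list of direct subcategories.
SUBCATEGORIES = {
    ROOT: ["Category:Hurricane articles by quality"],
    "Category:Hurricane articles by quality": [
        "Category:FA-Class hurricane articles",
        "Category:B-Class hurricane articles",
        "Category:Unassessed-Class hurricane articles",
    ],
    # Hypothetical tree that nobody linked under ROOT, so it is never reached.
    "Category:Orphaned articles by quality": [
        "Category:Stub-Class orphaned articles",
    ],
}


def spider(root, graph):
    """Breadth-first walk returning every category reachable from root."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


found = spider(ROOT, SUBCATEGORIES)
print(sorted(found))
```

The hurricane categories are found because they hang off the root; the orphaned tree exists in the graph but is invisible to the walk, which is exactly the failure mode described above.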

Projects also have the ability to categorize pages by importance or priority. This is slightly more controversial (take WikiProject Biography, for example: how can you say that someone is of "Low importance" without upsetting the person?) and is not required. However, for the projects that desire to use this portion of the framework, there's another category setup to do, parallel to the quality categories:
  • Category:Hurricane articles by importance
    • Category:Top-importance hurricane articles
    • Category:High-importance hurricane articles
    • Category:Mid-importance hurricane articles
    • Category:Low-importance hurricane articles
    • Category:No-importance hurricane articles
As with the "by quality" categories, these categories form a tree that needs to be a subcategory of Category:Wikipedia 1.0 assessments. (N.B.: You can have the bot do this for you, but you'll need the help of your friendly neighborhood admin.)

Once this is done, the way most projects feed their articles onto the bot framework is by adding a "class" parameter to their WikiProject banner, and optionally, an "importance" parameter as well. Once the MediaWiki job queue does its thing, all of the articles are in the "Unassessed" category.
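The mapping from banner parameters to categories can be sketched roughly as follows. This is only an illustration in Python of the behavior described above, not the actual template logic (banners are written in wikitext); the function name and its arguments are my own assumptions.

```python
QUALITY_CLASSES = {"FA", "A", "GA", "B", "Start", "Stub"}
IMPORTANCE_LEVELS = {"Top", "High", "Mid", "Low"}


def banner_categories(topic, cls="", importance=None):
    """Return the assessment categories a talk page would land in.

    topic:      word used in the category names, e.g. "hurricane"
    cls:        value of the banner's "class" parameter (may be empty)
    importance: value of the optional "importance" parameter, or None
                if the project does not use importance ratings
    """
    # A missing or unrecognized class falls back to "Unassessed".
    quality = cls if cls in QUALITY_CLASSES else "Unassessed"
    categories = [f"Category:{quality}-Class {topic} articles"]

    if importance is not None:
        # A missing or unrecognized importance falls back to "No".
        level = importance if importance in IMPORTANCE_LEVELS else "No"
        categories.append(f"Category:{level}-importance {topic} articles")
    return categories


print(banner_categories("hurricane"))
print(banner_categories("hurricane", cls="FA", importance="Top"))
```

A freshly tagged page with no "class" value ends up in the Unassessed category, which matches the starting state of the example that follows.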

Class        Articles
FA           (none)
A            (none)
GA           (none)
B            (none)
Start        (none)
Stub         (none)
Unassessed   Hurricane Nora (1997), Hurricane Katrina, Hurricane Wilma

Assuming for a moment that these articles are brand-new and unassessed (which they aren't), at this point, the bot has still not done its daily run. At about 3:00 UTC that day, the bot starts its run. It reads the category tree, and copies them inside its internal database. The table above is now mirrored in the hard drive of the bot's computer. The bot also spits out a log, and updates both the statistics table for the project, and the global database.

Over the course of the day, these articles are assessed. This is done by updating the parameters on the WikiProject banner. An example of this would be the following, on Talk:Hurricane Katrina:


That same day, the three articles are assessed: Hurricanes Nora and Katrina to FA-Class, and Wilma to B-Class. So, our categories in the wiki are now the following:

Class        Articles
FA           Hurricane Nora (1997), Hurricane Katrina
A            (none)
GA           (none)
B            Hurricane Wilma
Start        (none)
Stub         (none)
Unassessed   (none)

At this point, the bot runs. Again, all it does is compare the current categories and the previous category snapshot. The bot sees that the FA-Class was formerly empty, and now contains two articles: Hurricane Katrina and Hurricane Nora; the B-Class category contains Hurricane Wilma, and the Unassessed category is now empty. The bot now updates the statistics tables and the log, and updates its own snapshot with the current data.
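The comparison step boils down to set differences between two snapshots. Here is a minimal Python sketch using the three example articles; the data layout (a dictionary of category name to set of titles) is an assumption for illustration, not the bot's real internal format.

```python
ARTICLES = {"Hurricane Nora (1997)", "Hurricane Katrina", "Hurricane Wilma"}

# Yesterday's snapshot: everything is still unassessed.
previous = {"FA": set(), "B": set(), "Unassessed": set(ARTICLES)}

# Today's categories on the wiki, after the assessments.
current = {
    "FA": {"Hurricane Nora (1997)", "Hurricane Katrina"},
    "B": {"Hurricane Wilma"},
    "Unassessed": set(),
}


def diff_snapshots(old, new):
    """Return, per category, the titles added and removed since the last run."""
    changes = {}
    for category in sorted(set(old) | set(new)):
        before = old.get(category, set())
        after = new.get(category, set())
        added, removed = after - before, before - after
        if added or removed:
            changes[category] = {"added": added, "removed": removed}
    return changes


changes = diff_snapshots(previous, current)
for category, delta in changes.items():
    print(category, "added:", sorted(delta["added"]), "removed:", sorted(delta["removed"]))

# Once the run is finished, today's data becomes the new snapshot.
previous = current
```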

The internal logic of the bot allows it to see that in these two cases, the articles were upgraded from one class to another, and the log will reflect that. If the bot sees a new article, then the log will also identify it as a new addition. If an article was removed, then the bot will flag its removal in the log as well. A recent modification to the code now allows page moves to be adequately recorded, instead of being recorded as an addition and a deletion.
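The log logic can be sketched by looking at each article's class before and after the run. This is a simplified Python illustration: the log wording and the two "Hypothetical" articles are made up for the example, and page moves (which the real bot now handles) are not covered here.

```python
def classify_changes(old, new):
    """Compare two {title: class} mappings and return log lines."""
    events = []
    for title in sorted(set(old) | set(new)):
        before, after = old.get(title), new.get(title)
        if before is None:
            # Not in yesterday's snapshot: a new addition.
            events.append(f"{title}: added as {after}")
        elif after is None:
            # In yesterday's snapshot but gone today: a removal.
            events.append(f"{title}: removed (was {before})")
        elif before != after:
            # Present both days, but in a different class.
            events.append(f"{title}: reassessed from {before} to {after}")
    return events


old_classes = {
    "Hurricane Nora (1997)": "Unassessed",
    "Hurricane Katrina": "Unassessed",
    "Hurricane Wilma": "Unassessed",
    "Hypothetical old article": "Stub",
}
new_classes = {
    "Hurricane Nora (1997)": "FA",
    "Hurricane Katrina": "FA",
    "Hurricane Wilma": "B",
    "Hypothetical new article": "Start",
}

events = classify_changes(old_classes, new_classes)
for line in events:
    print(line)
```

Without extra handling, a page move would show up here as one removal plus one addition, which is the behavior the recent modification to the bot was meant to fix.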

This is the bot structure that 427 WikiProjects, task forces, Regional noticeboards, etc. use nowadays to monitor article quality and editorial progress. It works flawlessly, except that it cannot scale forever. The bot is now run every other day, as each one of its runs took approximately 36 hours to finish. This bot framework is also used by the Hungarian Wikipedia, and the Spanish Wikipedia as well; so if a friendly developer wants to make this part of the Wikimedia wiki farm's software arsenal, a lot of people would be happy... :)


Monday, March 12, 2007

Wikipedia:Miscellany for deletion/Autograph books

Apparently, I missed the fun of the MFD debate, but situations like these raise troubling questions. Many of the arguments to delete the page boiled down to, "Pages like these do not help in building the encyclopedia. Therefore, they must be deleted." Let's ponder on that line of thought for a second, and analyze it, to see how problematic it really is.

The first portion of the position indicates that Wikipedia is only an encyclopedia, and that ancillary activities such as signing autograph books are tantamount to wasting time. While Wikipedia is primarily an encyclopedia, any editor who stays on the site for a significant amount of time recognizes that it is simply impossible to run and maintain the encyclopedia without any socialization with others. One of the site's raisons d'être is that whatever mistake a user makes, a second user will correct; in order to maximize the efficiency of this self-corrective process, it is necessary to allow users some (note: not complete) leeway to socialize with other users.

Other users indicated that while they saw no harm in the pages, they also saw no use, so they should be deleted. While the "no use" is a personal judgment that I respect, I have to disagree with the "no harm" assertion. If a group of productive users is doing something such as maintaining a page to keep an autograph book, or an "office bracketology pool", or something that is otherwise innocuous, I don't see how it is productive to go ahead and say, "No, you are violating policy." The reason? There are two possible outcomes to this. The first is that the user stops participating in the side activity, but is also forced to go elsewhere, which means that he will spend less time editing Wikipedia. (Remember, the assumption here is that we're talking about productive users, not users whose entire purpose here is to use Wikipedia as a chatroom.) The other outcome is that the user resists and goes on the defensive, which makes a heated situation more likely.

Finally, it is inherent in human nature to try to personalize one's space, to make it one's own, and to make it a place where one can feel comfortable. From my own personal experience, I remember that back when I was a n00b, the first thing I did was to make a user page, so I would not feel as much of an outsider in a new place. Some users have previously indicated that they do not understand why this is an issue at all; however, for me, it was similar to extending one's arm for a handshake upon entering a new space. If I had had my user page prodded in the month between my first edit to my user page and my first edit to Dubya, I would have considered Wikipedia to be a hostile place, and would have never returned. The last thing we want to do as Wikipedians is to make Wikipedia appear as a contentious environment.

Our success depends on how many editors are comfortable editing here, and taking actions that are "anti-User" or "anti-Community" on the surface does not help us retain badly-needed users. Our success depends on whether we can produce a culture that nurtures collaborative processes between editors, perhaps even more than on the quality of our articles. As a result, if I had seen the debate while it was open, I would have opposed the nomination, and !voted keep.



All right, let's see how much I actually use this thing... please bear with me as I learn how to customize it, and most of all, how to actually use it... :P

Currently, not much going on. After the storm due to the credentials controversy subsided, everything else seems to have returned to normal. Vandals still are wasting our time, hurricane articles need to be improved and sent to FAC, how to fix RFA keeps being discussed, etc. All the usual things are going on.

I'm trying to see if I can make a patch for Bug 471. Let's see how that goes.

Signing off for now,