Wikipedia talk:WikiProject Missing encyclopedic articles/Hot/1
Reciprocal list of topics
Is there a reciprocal list of topics that Wikipedia has (substantial) articles on that are not found in the Encyclopedia Britannica? Courtland 02:56, 2005 Feb 28 (UTC)
- It would be at least 400,000 articles in length for English WP (480,000 vs 80,000), and not of much use for helping fill in WP gaps, so no. Stan 05:08, 28 Feb 2005 (UTC)
Is there a convention for editing these pages, for example by checking off "done" or listing the redirect page name? I notice the dash after the titles, which suggests we should add something. Or is it enough to notice that a title has gone blue? David Brooks 18:13, 8 Mar 2005 (UTC)
- If you delete the entries that are done, the count will go down and you can see your progress, can be a useful motivator ("just five more before bed, to get below 800!"). Dash was needed as a separator before Danny numbered the entries, now they're just detritus. One could add annotations I suppose, although IMHO if you know enough to add a note, you know enough to create a worthwhile stub. Another suggestion is to edit the lists to lower-case terms that are not proper names - no one is ever ever going to make a link to "Poultry Processing" instead of "poultry processing", so those redirs will always be pointless. Changing those links will show that we have a lot more of these articles than is apparent now. Stan 00:02, 9 Mar 2005 (UTC)
- Well, looking at a random sample (page 1) it seems that Danny set the convention by using the "delete when it turns blue" rule, while some other links are turning blue and being left there. For example Abram Stevens Hewitt was created just a few days ago (I don't know if it was in response to this list or coincidence), while I have just created a couple of redirects for items 1 and 3. Having the page gradually going blue is almost as good a motivator as having it shrink, and saves waiting for the database to update :-) But I'll delete if that's the best idea; should it be listed as a convention on this project page, or even on the Category:Wikipedia missing topics page? David Brooks 01:12, 9 Mar 2005 (UTC)
- If the page never shrinks, then the numbers are kind of pointless... Lots of articles do show up due to other activity. I find it most efficient to do the prune once in a while. Wouldn't hurt to mention somewhere. Stan 13:32, 9 Mar 2005 (UTC)
And another style question: is it OK to knock off one of these articles by creating a redirect to an existing WP article that deals with the EB subject in an embedded section? For example, "Abo Blood Group System" is dealt with perfectly adequately in a section of Blood type; would it be acceptable to create a redirect from "ABO Blood Group System" (correcting the capitalization) to Blood type? (I understand you can't redirect to an interior section). Or is it enough to assert that the subject is covered and delete the entry? David Brooks 01:49, 9 Mar 2005 (UTC)
- I take a moment to think - what are the chances that someone will ever create a link using that phrase as the title? The chances are nonzero, because you're looking at one :-), but there are other circumstances. For instance, last name only, without accents, umlauts, etc is what a US reader will likely type into the search box. If the odd title seems more like an artifact of the software's crunching of titles ("Abo" instead of "ABO" for instance), I'm less inclined for the redir. If "ABO Blood Group System" is an official name, then the redir is good, because people do Google official names of things. Stan 13:32, 9 Mar 2005 (UTC)
- OK, got it; will delete the few I know about on page 1. It might help, though, if someone could run another bot trying a case-insensitive match of the titles. It would catch a fair number and would be useful to verify deletions. David Brooks 16:49, 9 Mar 2005 (UTC)
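The case-insensitive match suggested above could look roughly like the following. This is only a minimal sketch, not the bot anyone actually ran: the function name and the two sample lists are hypothetical, and a real run would need the full list of existing article titles from a database dump.

```python
def case_insensitive_matches(missing_titles, existing_titles):
    """Map each missing title to existing titles that differ only in letter case."""
    # Index existing titles by their case-folded form (hypothetical helper structure).
    index = {}
    for title in existing_titles:
        index.setdefault(title.casefold(), []).append(title)
    # Keep only the missing titles that have at least one case-insensitive hit.
    return {
        title: index[title.casefold()]
        for title in missing_titles
        if title.casefold() in index
    }


# Illustrative sample data only; not taken from the real lists.
missing = ["Poultry Processing", "Abo Blood Group System", "Some Unwritten Topic"]
existing = ["Poultry processing", "ABO blood group system", "Blood type"]
matches = case_insensitive_matches(missing, existing)
```

Each hit would still need a human check that the two titles really are the same subject before the entry is deleted or a redirect is made.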
I consolidated these discussions, plus a few more things that came up, onto the main page. Take it out again if you find it too noisy. David Brooks 23:02, 10 Mar 2005 (UTC)
Several "lists of things Wikipedia hasn't got"
We have Wikipedia:2004 Encyclopedia topics, Wikipedia:1911 Encyclopedia topics and Wikipedia:List of encyclopedia topics (generated from various sources). Beyond a mere "tidying up" exercise I can see several advantages to merging the lists. Does anyone object if I do that? Pcb21| Pete 17:55, 28 Mar 2005 (UTC)
- I'd rather keep them separate - for the 1911 topics it's always possible to tell what is meant by looking at the online copies, while for 2004 you have to have access to a modern EB. The big list, it's not always possible to tell what the intended subject is supposed to be. This is not as arcane an issue as one might think; two examples that I bump into a lot are German names of now-Polish towns, and last names that could mean both a modern person and an older one. If the lists are mixed up, it's going to take more time to study multiple sources (almost by definition, the missing topics are all pretty obscure to begin with). It's also easier to keep score when you can compare EB topic lists without having a lot of other stuff interpolated. Stan 00:46, 30 Mar 2005 (UTC)
- Oops sorry missed your response the other day Stan. As I said at Wikipedia_talk:List of encyclopedia topics I would retain origin information in the merged list. I set up an example of how it might work at User:Pcb21/list temp. Pcb21| Pete 15:44, 1 Apr 2005 (UTC)
Redirects and capitalization
4.250.132.69's change said: redirects are important ESPECIALLY for "truly trivial differences such as capitalization" but gave no reason why. In the discussion above, Stan says: no one is ever ever going to make a link to "Poultry Processing" instead of "poultry processing", so those redirs will always be pointless.
I'd side with Stan because of the behavior of the Go button. If someone enters "Poultry Processing" in the box, the software finds "Poultry processing" for them. Proper capitalization is only important for people who try to construct a link, and how often does that cause a problem? Last night I changed "Arpad Dynasty" to "Arpad dynasty" just to make sure the link turned blue. Does anyone else have an argument here? Danny? David Brooks 16:26, 29 Mar 2005 (UTC)
- I err on the side of creating the orphan redirects because although the value they add is tiny, the downside is even smaller (0). Pcb21| Pete 18:15, 29 Mar 2005 (UTC)
- Not quite zero - if an article is moved, each redir has to be updated separately. Stan 00:48, 30 Mar 2005 (UTC)
- I support making those redirects. People often use improper capitalization in articles (see the bird articles, for instance, or any other animal articles), and the links may appear empty, causing someone to create a duplicate article with slightly different capitalization. It doesn't really hurt to have them, and at worst, it causes pipes, which are easily fixed when found. BTW, some hints for places to look for what the articles are. I use http://www.britannica.com and http://www.probert-encyclopaedia.co.uk. Both have long lists that can be used to identify subjects, decide if it needs a disambiguation page or a redirect, etc. Also, the information can sometimes be used to start a stub. Danny 01:33, 30 Mar 2005 (UTC)
- Thanks, Danny, makes sense. I'll fix the places where I deleted an entry on the basis of a differently capitalized article - if I can find them in the histories. David Brooks 01:52, 30 Mar 2005 (UTC)
- Also, if you make a redir with different capitalization, let's put something in the edit summary (I've just been using "EB2004"). I read recently that some people think a redir page that differs only in caps, without explanation, is a candidate for speedy deletion. David Brooks 03:05, 30 Mar 2005 (UTC)
- (re-indenting for readability) - I used to believe that we could knock out a link so long as we are convinced that the same subject is covered under a slightly (or not-so-slightly) different name, but I'm persuaded by the above arguments. In addition, creating the semi-redundant links provides an element of provability and repeatability. The aim should be that, if the list is re-generated, it comes up empty, and we don't have to go back and figure out all over again that "Treaty of Erewhon" and "Erewhon Agreement" are the same thing. In addition, it would catch an accidental deletion of the wrong line. Nobody will ever look up, or link, Yehudi Menuhin as Yehudi, Lord Menuhin of Stoke D'abernon Menuhin (that, and some more contorted examples, are artefacts of processing the EB titles) but it does no harm. David Brooks 16:16, 1 Apr 2005 (UTC)
- I was wondering about where the contorted examples came from... if they arise out of some intermediate step of processing, then aren't the redirects less valuable than if they were from the actual naming convention(s) EB uses in its articles and indexes? Is it remotely feasible and valuable to repeat the generation process whilst refining the processing process? Pcb21| Pete 16:47, 1 Apr 2005 (UTC)
- Some of the garbles clearly look like software bugs - "Nim, or Neem" becomes "Neem Nim" for instance. If anyone ever plans to re-generate the list, it would be better to fix those bugs than to create artificial redirects to accommodate. Personally I don't see much value in re-generation; if we actually get WP articles for all but 20-30 EB articles that slip through the cracks due to clerical error, everybody will be completely impressed. Note also that 1911EB has hundreds of dicdefs for which we don't ever want to create articles. Stan 01:01, 2 Apr 2005 (UTC)
- And, after all, finding matching articles already in Wikipedia is the easy part, and we shouldn't be fussing about it. Lots of articles remain to be written, and that's what takes the effort (I noticed Antequera in the list and spent almost a week of some very limited spare time translating the German article). Even a decent stub can take time. David Brooks 01:09, 2 Apr 2005 (UTC)
Pages of the Week
Hi. I want to nominate page 26 as the focus of this week. Let's see how much of this page we can get done by Tuesday. Danny 12:46, 31 Mar 2005 (UTC)
Well, it's Tuesday, and we got over 40% done. Great job, everyone! For this week, I want to nominate Page 14. This should actually be very easy, since a lot of the redirects will simply depend on diacritics and hyphens. Good luck, everyone! Danny 11:21, 5 Apr 2005 (UTC)
- We treat diacritics like capitals, I guess. Suppose both EB and WP have Flügel (I made that up), but because of the processing, it shows up as Flugel on this list. Well, we already met the real goal, but it does no harm to add a redirect from Flugel, especially as English readers are likely to drop the accent when looking it up. (Wicked thought: should we also make a Fluegel redirect for the special case of German?) David Brooks 17:09, 5 Apr 2005 (UTC)
- Yes, because spellings like "Fluegel" are often seen as attempts to represent "Flügel" in ASCII, for instance in mid-20th-century conference papers produced on typewriters. Stan 19:45, 5 Apr 2005 (UTC)
- Actually it's more official than that; the e is still a valid but less preferable alternative in Germany for umlaut (as is ss for ß). And the French have, I believe, banished the œ, which is why it's not in Latin-1. But I don't think there are any other examples of valid re-spellings: for example, neither Malmo nor Malmoe is correct Swedish. David Brooks 20:01, 5 Apr 2005 (UTC)
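For the German special case discussed above, a "Fluegel"-style redirect title can be derived mechanically. The sketch below assumes the usual ae/oe/ue/ss respellings; the table and function names are made up for illustration, and it should only be applied to titles already known to be German, since the same respelling is wrong for other languages (Malmö, for example).

```python
# Hypothetical translation table for the conventional German ASCII respellings.
GERMAN_ASCII = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def german_ascii_variant(title):
    """Return the title with umlauts and ß written out in plain ASCII."""
    return title.translate(GERMAN_ASCII)


variant = german_ascii_variant("Flügel")
```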
Highlighting real holes
90% of this list can either be redirected to an already existing article or be left untouched - but what about real coverage holes that are found? I've only seen one (Affair of the Poisons). Can we all agree to put a "(look)" mark next to the term? Lotsofissues 07:15, 9 Apr 2005 (UTC)
- I think only 10% missing coverage is very wishful thinking. What sort of thing do you think can "be left untouched"? Pcb21| Pete 09:27, 9 Apr 2005 (UTC)
- I'm not clear what a "real coverage hole" is. Are you distinguishing major topic areas from insignificant topics, or just saying that the topic is already handled somehow? So far I have written, from scratch, 13 articles that were more than just "see xxx#yyy for more details", and some of them were significant figures in their fields. All you have to do is work down the list; there are plenty of them. David Brooks 18:05, 9 Apr 2005 (UTC)
Tuesday again, and time to select a new page. For this week, I nominate Page 8. Danny 10:16, 12 Apr 2005 (UTC)
Tuesday again, and time to select a new page. For this week, I nominate Page 21. This time, however, it is a little more difficult, since it is 0% done. Danny 10:30, 19 Apr 2005 (UTC)
This week's page is 17. The goal is at least 30%. Danny 11:00, 26 Apr 2005 (UTC)
- I think we hit the goal already. 723 in the list and 39 of them are blue. David Brooks 19:35, 28 Apr 2005 (UTC)
Low hanging fruit?
My feeling is that there aren't many of us working on this project... (though it is difficult to tell for sure as the red links tend to turn blue "by magic")
On the other hand however, we have made good progress - nearly 6,000 red links have been turned blue. My guess is therefore that these have generally been the "low-hanging fruit", such as redirects to pre-existing articles. As time goes on, it looks like we can do 40% of the links this way. How do we do the remaining 60%? Is the current team big enough? Should we be shouting about this project from the rooftops to get more input? Surely the "manpower" is out there (much as so many of us like arguing about policy etc, there must surely be plenty of appetite to create content on unimpeachably encyclopedic topics!) but they just haven't found out about the project yet? On the other hand, do we really care about finishing this project in any particular length of time? Wouldn't we prefer to just carry on chipping away amongst ourselves, having fun finding out about all sorts of oddball things, and generally getting a real feeling of contributing useful content? Pcb21| Pete 14:11, 26 Apr 2005 (UTC)
- While there mightn't be that many people actively working on this project, I think it does provide a good motivator for some of those (like me) who don't really have much time to write articles to chip in and help sometimes. I've done one of these already, and I'm sure I'll try to do more in the future. That said, I agree - shouting from the rooftops would be useful, yet at the same time, chipping away also works well. :) Ambi 14:21, 26 Apr 2005 (UTC)
- It is daunting though. I have created 16 original articles under this project, and most of the bios are more than stubs but less than complete. Although the process is straightforward, research and wikifying takes serious time, and I don't have a lot of wiki-time (on the upside, I now know all I care to know about Belgian literary history!). There's a long way to go with such an apparently small team. David Brooks 17:46, 26 Apr 2005 (UTC)
At the moment, the amount of "real useful content" being generated is not that high. See User:Eugene_van_der_Pijll/EB2004: of the first 80 articles removed from page 8 since it was page of the week, only 7 were completely new articles. Or, to say it differently, most of the information was already in wikipedia: 55 of the "articles" are redirects.
For me, making redirects and copying articles from EB1911 is most of the fun. After that is done, we could begin to move articles to Wikipedia:Most wanted articles. Eugene van der Pijll 19:51, 26 Apr 2005 (UTC)
- Re the page in your user area: It is interesting that three of the 204 articles are still red. Could be a mistake in your list, but more likely a mistake by people working the list. I think that rate of error is acceptable because we can always start over :). Pcb21| Pete 07:43, 27 Apr 2005 (UTC)
- The red links are because some links are corrected before an article is created. For example: the original link Emanuele D' Astorga (capital D, space before A) was first changed to Emanuele d'Astorga, and then an article was written (in this case, copied from EB1911). This happens because some of the links are clearly wrong; not because EB has them wrong, but because of the program that created the list. Most clearly: EB "Aldrin, Jr., Edwin Eugene" was converted to Edwin Eugene, Jr. Aldrin. The correct redirect is at Edwin Eugene Aldrin, Jr.. Eugene van der Pijll 22:03, 27 Apr 2005 (UTC)
- Thanks for the info... my personal policy is to personally create as many redirects as I can think of.... including redirects from the faintly nonsensical names created from the filtering algorithm. Redirecting from nonsense does not harm and in the long-term probably does a little bit of good. Pcb21| Pete 00:33, 28 Apr 2005 (UTC)
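The Aldrin example suggests the list generator mishandled inverted index entries that carry a suffix. A minimal sketch of a more careful conversion is below. It is a guess at what a fixed step might do, not the program that built the lists: the suffix set and the function name are assumptions, and entries whose commas are not name inversions (such as "Nim, or Neem") would still need separate handling.

```python
# Assumed set of name suffixes that should stay at the end, after a comma.
SUFFIXES = {"Jr.", "Sr.", "II", "III"}


def uninvert_name(index_entry):
    """Turn 'Surname, Suffix, Given names' into 'Given names Surname, Suffix'."""
    parts = [part.strip() for part in index_entry.split(",")]
    if len(parts) < 2:
        # No comma: nothing to reorder.
        return index_entry.strip()
    surname = parts[0]
    suffixes = [part for part in parts[1:] if part in SUFFIXES]
    given = [part for part in parts[1:] if part not in SUFFIXES]
    name = " ".join(given + [surname])
    if suffixes:
        name += ", " + ", ".join(suffixes)
    return name


fixed = uninvert_name("Aldrin, Jr., Edwin Eugene")
```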
I tend to focus on the 1911EB list, because the lifting process is more productive, and because many of the articles also appear in the 2004 version. I believe that there are very few 2004EB topics that are not either in WP already or in 1911EB, because those will usually be 20th-century topics, and WP is already pretty comprehensive for the 20th century. Stan 20:54, 26 Apr 2005 (UTC)
- It would be nice to think so, but I don't think that is the case for bios. My own meager efforts (those 16 original articles) have included 4 or 5 20th century bios, a couple of them significant in their fields. The other problem I sometimes run into is that an EB topic is dealt with in a few sentences in the bowels of a WP topic; that needs to be addressed case-by-case (new article with some duplication? Stub-sized article with a "q.v."? Redirect and let the reader figure it out?) David Brooks 00:17, 27 Apr 2005 (UTC)
- Interesting - well, I guess that's why we're working with these lists! Stan 13:01, 27 Apr 2005 (UTC)
Missing diacritics
I really should have brought this up before, but... the lists here have lost diacritics, although they are present in EB. For example, Louis-Honore Frechette is on page 17, but it wouldn't be there if it hadn't dropped the accents; the EB and WP articles on Louis-Honoré Fréchette are spelled the same way. My feeling is that this may have been unintended (Danny?) but it's a good mistake because English-speakers are quite likely to try a lookup without striving to type those odd characters. So, a redirect it is. David Brooks 00:29, 27 Apr 2005 (UTC)
- I found this list in an index file (in binary format -- I had to make a program to extract it) on the Britannica CD and they were without diacriticals (I assume this list was used only for search and not for display). bogdan ʤjuʃkə | Talk 12:07, 27 Apr 2005 (UTC)
- Yep, I'd say as a general principle that every title with diacritics should have a redir from the same title without. Stan 12:58, 27 Apr 2005 (UTC)
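The general principle above (every title with diacritics gets a redirect from the same title without) can be sketched with Python's standard unicodedata module. This is an illustration under assumptions, not a tool used on the project: the function name is made up, and letters that have no combining-mark decomposition (ß, ø, œ and the like) pass through unchanged, so they would need their own rules.

```python
import unicodedata


def strip_diacritics(title):
    """Return the title with combining accents removed, e.g. é -> e."""
    # NFD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", title)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


plain = strip_diacritics("Louis-Honoré Fréchette")
```

If the stripped form differs from the original title, it is a candidate redirect; if it is identical, there is nothing to create.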
Make lots of redirects. Variant spellings, different conventions for titles, different numbering schemes, middle name, middle initials, etc etc. It's quite frustrating to search carefully to see if there's an article on some subject, decide there isn't, research and write your own, and then discover that there already was an article! This happened to me today with Marie Louise Élisabeth Vigée-Lebrun — I later found an article at Elisabeth Vigée-Lebrun (with inconsistent accents). I made a dozen redirects so future editors are less likely to make the same mistake. Gdr 16:58, 2005 Apr 27 (UTC)
- Yeah, that's happened to me a number of times. The birth/death categories are useful this way ... assuming that the birth or death years are known accurately, not always the case. Stan 21:20, 27 Apr 2005 (UTC)
Are we duplicating effort?
I really like Danny's idea of concentrating on a particular page for a week as it creates a real sense of progress. However it occurs to me that as a side-effect we may be duplicating each other's work. E.g. we each look down the list looking at potential redirect candidates - doing searches etc. Would it be useful to mark areas we have scoured with, say, "CFR" to mean "I looked at numbers X through Y and Couldn't Find Redirect"? Pcb21| Pete 10:34, 30 Apr 2005 (UTC)
- Do you mean to tag each line (preferably in batches)? One problem is that I might not be able to find a RD, but you could be better at it :-). Another suggestion: "1911" meaning I found a 1911 entry, but don't have the time right now to copyedit and wikify it. David Brooks 18:33, 30 Apr 2005 (UTC)
- I like the 1911 idea. You've hit the nail on the head when trying to figure out the best way to implement this. Tagging each line is laborious, whereas tagging in groups is more prone to error - i.e. redirects are easy to miss. The current practice of writing a few words next to a red name is also a good one if the writer doesn't feel the energy to write the stub. Pcb21| Pete 21:57, 30 Apr 2005 (UTC)
- I wasn't even contemplating errors. We've had the experience of one of us failing to find the existing article, and then another finding it with a more inspired search. Still, I think the advantage of avoiding duplicate effort is worth the possible missed one. David Brooks 04:55, 1 May 2005 (UTC)
Great job on page 17! 40 percent done. Our next goal is Page 9. Let's see how far we can take this one. Can we match 40 percent? Danny 11:24, 3 May 2005 (UTC)
- Yes we can. Eugene van der Pijll 02:58, 4 May 2005 (UTC)
- It must have been a typo. He meant 50%. David Brooks 04:08, 4 May 2005 (UTC)
- Seems feasible, there are still a good number of easy red topics on 9. Pcb21| Pete 07:07, 4 May 2005 (UTC)
This week we really came close to 50 percent. In fact, page 9 is now the page that is furthest along. Today, I am nominating Page 2. Let's try and match our record! Oh, and as an extra challenge, see what you can knock out on this Page 2 as well. Danny 10:14, 10 May 2005 (UTC)
After another great week, the new list for this week will be Page 12. This time, we are striving to break 50 percent. Danny 10:55, 17 May 2005 (UTC)
Get some popularity!
What can we do to improve the visibility of this project? It seems to me that it is really important, yet I would guess many people don't even know it exists. Is there anywhere we could link to it to get some more traffic?
Could we link to it on one of these templates: Template:specialpageslist or Template:Resources for collaboration?
Bluemoose 18:39, 21 May 2005 (UTC)
- OK i bit the bullet and did it anyway - but there must be other ways of advertising this important project? Bluemoose 18:46, 21 May 2005 (UTC)
- One way is to make sure it's always in your edit summaries. Any work I do on this - whether it's creating articles or redirects - has [[Wikipedia:2004 Encyclopedia topics|EB2004]] in the edit summary. I think that was how I found this - someone else had done that, and I spotted it in Recent Changes. OpenToppedBus 13:11, May 23, 2005 (UTC)
- That's a good idea - when we started doing that in project stub sorting things really started to happen. Bluemoose 13:34, 23 May 2005 (UTC)
- Another thing that helps: some other Wikipedians watch certain topic areas and assist with our stubs, even if they don't know about the project. I've added a few articles about minerals, about which I know nothing, and found an enthusiast occasionally following me around and tidying up. Yesterday I contributed sanxian (san-hsien in EB, but WP seems to use pinyin as standard) and worried that I didn't know enough to give the ideograms and other romanizations. Within 10 hours, User:Defrosted had helpfully added them. So don't be put off if your stub is going to look bare - the community will come to the rescue. David Brooks 17:28, 5 Jun 2005 (UTC)
We are getting good at this! For this week, let's collaborate on Page 19. We still have to break fifty percent on at least one page. Danny 10:05, 24 May 2005 (UTC)
- As Hercules, aka Eugene, points out actually /26 has just passed that landmark. Great work guys! Pcb21| Pete 10:09, 24 May 2005 (UTC)
Automated low-hanging fruit
Friends, I used a simple Perl script to produce a version of the POTW that tries to make wikilinks out of various common variations on the EB name. This is an attempt to automate some of the searches I'm sure we are all making. Look at User:DavidBrooks/sandbox/EBfruit. Anything in blue down the left side is a WP article; if its "original" line is red, there's a potential hit. But you still often have to verify it is the same topic.
The attempts are:
- If there is a hyphen, use the string lowercased, and then use it replacing hyphens with spaces
- If there are >2 words, take first and last, then first and second words
- If there is >1 word, lowercase all but the first
If you think any other simple algorithm would be useful, let me know. I can't think of a way of adding a search for diacritics that wouldn't make the file too huge.
It no longer gets confused if you tag the line with another link. David Brooks 20:30, 24 May 2005 (UTC)
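For anyone who wants to try the same idea without Perl, the rules listed above might be sketched in Python as follows. This is one reading of the description, not the actual script: the function names are invented, and in particular treating "the string lowercased" as "keep the first character, lowercase the rest" is an assumption.

```python
def lower_after_first_char(text):
    """Keep the first character as-is and lowercase everything after it."""
    return text[:1] + text[1:].lower()


def candidate_titles(title):
    """Generate alternative titles to test as wikilinks, in rule order."""
    candidates = []
    words = title.split()
    if "-" in title:
        # Hyphen rule: lowercased string, then the same with hyphens as spaces.
        lowered = lower_after_first_char(title)
        candidates.append(lowered)
        candidates.append(lowered.replace("-", " "))
    if len(words) > 2:
        # First and last word, then first and second word.
        candidates.append(f"{words[0]} {words[-1]}")
        candidates.append(f"{words[0]} {words[1]}")
    if len(words) > 1:
        # Lowercase all but the first word.
        candidates.append(" ".join([words[0]] + [w.lower() for w in words[1:]]))
    # Drop duplicates and the original title, keeping the order.
    seen = {title}
    unique = []
    for candidate in candidates:
        if candidate not in seen:
            seen.add(candidate)
            unique.append(candidate)
    return unique


variants = candidate_titles("Muhammad Hafiz Ibrahim")
```

A blue link for any generated candidate is only a potential hit; shortened forms in particular can easily point at a different person or topic.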
- Wow. That is some clever thinking there David. I am seriously impressed. I have nearly knocked out all of the matches for that page though so now we need something similar for every page :). (Perhaps hold back in case others have bright ideas for extensions before you do that though :). Pcb21| Pete 21:07, 24 May 2005 (UTC)
- Thanks. BTW I noticed that Muhammad Hafiz Ibrahim is not the same as Muhammad Ibrahim. That's one example of the need to verify the same topic. David Brooks 23:00, 24 May 2005 (UTC)
- Exactly. Slow down and verify. Someone made Napoleon-Joseph-Charles-Paul Bonaparte into a redirect to Napoleon Bonaparte. Not only did that create a double redirect, but it was also the wrong person! OpenToppedBus 09:12, May 25, 2005 (UTC)
- The person who made that mistake (ahem) didn't make it because of David's page. But of course we all must check the suggestions for correctness and redirectness but this is no big deal. Pcb21| Pete 10:06, 25 May 2005 (UTC)
- When you are sure you have the right article, some of the other generated possibilities should probably be created as redirects as well. OpenToppedBus 09:31, May 25, 2005 (UTC)
- I just refreshed User:DavidBrooks/sandbox/EBfruit. Not much left, I guess. Time to start writing! David Brooks 17:33, 25 May 2005 (UTC)
- Page 24 has arrived.
Before the next POTW arrives, has anyone thought of another useful rule to add? I wonder if a version of the string without ' characters would be useful. David Brooks 23:41, 6 Jun 2005 (UTC)
Page 5
I just added Britannica searches to page 5. What do you think? Worth doing other pages too? Bluemoose 16:52, 25 May 2005 (UTC)
- Not sure that putting the "referrer" headers in their web logs is such a good idea - although I suppose it's likely that there's someone at EB aware of this project. David Brooks 17:29, 25 May 2005 (UTC)
- I personally don't like it to be honest. My preference is to avoid looking at the Britannica (and other general reference sites) pages at all costs because it is hard to truly "forget" the Britannica-stub if you decide to write your own stub. On some occasions it appears that Britannica is the only site on the web covering a particular topic, but this is rare and I simply leave those topics red. Your addition might encourage plagiarism. However the idea is good - but with Google searches instead of Britannica ones. Just my 2p. Pcb21| Pete 23:41, 25 May 2005 (UTC)
I actually like the Britannica searches. It helped me identify a lot of people rather easily. Maybe Britannica and Google searches would be good. Danny 01:44, 31 May 2005 (UTC)
- That would be pretty easy to do; if there is no opposition I'll add it to the collaboration page. In response to Pete above, I would urge everyone to always check the EB articles, or you may get confused, as the Napoleon business in the conversation above demonstrates nicely. Thanks Bluemoose 13:25, 1 Jun 2005 (UTC)
- OK, I went crazy and did it anyway (to page 24). It can always be reverted if unpopular, although with Google searches as well I think it's quite useful. If it is helpful I'll add it to future collaboration pages. Bluemoose 14:00, 1 Jun 2005 (UTC)
- If I can get over how ugly it looks :-)... I also often search English WP and sometimes all WP's. The EN search (with google or the WP engine) can find articles that are different only in accents, or articles that contain significant material on the searched topic. The global one can find translatable articles, although they are considerable effort. David Brooks 18:51, 2 Jun 2005 (UTC)
- Britannica searches are useful, but I don't know if I'll use your links much. But they don't hurt.
- My method is to open three browser windows in parallel; one with the EB topics list, one with Special:Allpages, and one with britannica.com. If there are two similar links in the first two windows, I open them in the background (I'm using Opera, a tabbed browser; Firefox would also work, probably), and later I search in britannica to see if its article has the same subject of the existing wikipedia article. I estimate that I find redirects for about 15% of the red links that way. (I've done the first 4 sections of page 24 (see my contributions), all of the other "pages of the weeks", and small parts of some other pages. If anyone wants to take it over, that would be OK with me.) Eugene van der Pijll 19:10, 2 Jun 2005 (UTC)
EB's selection criteria is sometimes bizarre
National Women's History Week Project - a recently founded org with only 4,600 dues payers? The eighth and current president of Duke?
A notability criteria that can't be explained. lots of issues | leave me a message 01:11, 31 May 2005 (UTC)
- Might be partially explained by Britannica's special "featured series" on particular subtopics - one of which is women in history. Seems like some of the specialist material there has been fed back into the main encyclopedia. Pcb21| Pete 09:12, 31 May 2005 (UTC)
Since I won't be around tomorrow morning, I am switching collaborations early. This was another outstanding week. In the new week, let's collaborate on Page 24. Danny 01:46, 31 May 2005 (UTC)
Mucking around with timelines
Any interest in me polishing this up into a usable-on-proper-page state? Pcb21| Pete 18:16, 2 Jun 2005 (UTC)
We are really moving ahead with this. This page should be easy. The challenge is to bring it up to 50%. We have a week. Let's see how far we can get on Page 6. One helpful hint to get the Chinese names is to check the pinyin spellings that are given in EB. Danny 01:30, 7 Jun 2005 (UTC)
- Also, I believe EB standard is the Wade-Giles romanization, while WP prefers pinyin. New articles should probably have their main page in pinyin with a redirect from the EB version. David Brooks 20:45, 7 Jun 2005 (UTC)
- Low-hanging fruit list uploaded. There are very few hits. David Brooks 20:45, 7 Jun 2005 (UTC)
- And - one more time - unless it's blindingly obvious, check the EB entry before making a redirect, to make sure it is the same topic! David Brooks 23:37, 7 Jun 2005 (UTC)
No Letter Left Behind
I have begun making an additional list by letter of the alphabet. For this week, I pose the additional challenge of getting rid of all the potential Q articles--there are all of 50 of them. While it may require some research, it is not unreasonable. So, when you get frustrated with page 6, see what kind of dent you can make in the Q's. Danny 11:30, 9 Jun 2005 (UTC)
- Are we going to keep at Q until it's done, or go on to a new letter, as it has been more than a week now? Bluemoose 11:21, 17 Jun 2005 (UTC)
- Personally, I'm not finding the alternative listing all that helpful. The problem comes when you're annotating the lists rather than just creating the articles straight off - do you have to annotate both lists, or do you expect that people who come after you will check both? OpenToppedBus - Talk 13:00, Jun 17, 2005 (UTC)
- I know what you mean; I don't think we should bother pruning the letter lists. But I think it is good to have, because casual users can work on 2 focus pages rather than get bored with one and give up. Also, Q has been quite successful. Bluemoose 13:30, 17 Jun 2005 (UTC)
- It's not so much pruning as annotating. If you look at Q you'll see that someone has made some notes about alternative spellings for Qasr'amrah. But those notes aren't on 22, meaning that someone working on that page is likely to end up repeating research. I take your point about offering variety, but there's nothing to stop people working on any page if they get bored with the POTW. OpenToppedBus - Talk 13:51, Jun 17, 2005 (UTC)
We need a project!
I firmly believe what we are doing here is really important. There were 28,000 missing articles (now something like 19,000), and we have 293,000 registered users on wiki; that was 1 article for every tenth person, which is nothing! So what's our problem? Well, we all work hard, but there seem to be only about 8-9 people actually here frequently (me, Danny, Opentoppedbus, Eugene, Pcb21, Peta, and a few others). I think we should make a project page (possibly Wikipedia:WikiProject Missing encyclopedic articles); this would encompass Wikipedia:1911 Encyclopedia topics, Wikipedia:2004 Encyclopedia topics and Wikipedia:List of encyclopedia topics. The advantages would be greater coordination (for example, the tips on finding/making articles on the 2004 page can be applied to the other 2 pages as well) and, most importantly, greater publicity: having a focus page (listed here) would no doubt encourage a lot more people to get involved, and once they are involved, it will allow them to work better as well. Please, even if you don't have anything to add, say whether you think it would be a good idea or not. Thanks - Bluemoose 18:37, 10 Jun 2005 (UTC)
- I think that would be a great idea. If we can get a few more people involved, we would be finished with this list in no time. Danny 00:29, 11 Jun 2005 (UTC)
- This is a very interesting project (I haven't contributed yet, but I plan to soon) and it would surely benefit from increased exposure. AиDя01DTALKEMAIL 00:33, Jun 11, 2005 (UTC)
- I made a preliminary project page (Wikipedia:WikiProject Missing encyclopedic articles). Loads can be added to it, such as pcb's timeline chart thingy above; policy will also be quite important. E.g. I believe it is policy to create redirects to articles that don't strictly need them (due to different capitalisation etc.), as there is no negative consequence, and it will help in determining which articles we do and don't have. Bluemoose 10:14, 11 Jun 2005 (UTC)
- Did some copyediting and added my two-penn'orth. David Brooks 20:20, 11 Jun 2005 (UTC)
- We now have instructions on Wikipedia:2004 Encyclopedia topics and Wikipedia:WikiProject Missing encyclopedic articles — and they are different already. I know this because I just edited one of them. Do we nuke most of this page and point to the Wikiproject page? Is that the convention?
- That's what I would like to do; I didn't want to do it without some agreement, though. Bluemoose 19:18, 14 Jun 2005 (UTC)
- Sounds like a good idea. By the way, thanks for picking a new page while I was out of town. Danny 03:58, 15 Jun 2005 (UTC)
- Just out of interest, if you look at what links here for the template, it gives an idea of how popular the project is. I am hoping that use of the template will propagate throughout wiki, hopefully in a non-linear fashion. Bluemoose 07:48, 15 Jun 2005 (UTC)
What about dic defs?
I think most people would argue that an entry for "University extension", modelled after the EB "University Extension", would be a dic def, and not appropriate for WP. Niteowlneils 21:09, 10 Jun 2005 (UTC)
- The article names don't have to be 1-to-1 if you think a topic is too small for a WP topic (I often find this to be the case too). You can redirect to a suitable supra-topic and edit that topic to make explicit the sub-things that redirect there. Pcb21| Pete 07:28, 11 Jun 2005 (UTC)
Bluemoose has chosen a new POTW. Low-hanging fruit uploaded. David Brooks 02:02, 15 Jun 2005 (UTC)
- I'm pretty sure I've picked off all the fruit. OpenToppedBus - Talk 09:12, Jun 15, 2005 (UTC)
General Comments
I was doing some work on the page and getting others involved, but I was wondering why there are no WP, Britannica and Google links on the other pages. Also, are we supposed to delete the link once we have finished, or do we leave it? Maybe someone could do a header to this effect, and a bot on all the pages for links to helpful resources.--Adric 13:01, 16 Jun 2005 (UTC)
- Thanks for getting involved! There are no Britannica and Google links on other pages because no-one has yet got around to generating them (I think someone's got some sort of script to do it). You can delete the link when you've created it if you really want to, but you don't have to bother - various people prune the lists from time to time. There are a lot more hints and tips at Wikipedia:WikiProject_Missing_encyclopedic_articles. OpenToppedBus - Talk 13:10, Jun 16, 2005 (UTC)
- I second the recommendation to read the tips. They have been developed on the basis of several people's experience and debate. David Brooks 17:15, 16 Jun 2005 (UTC)
- You can delete links if you want; it doesn't really matter. The external links are made by using a few tricks in Microsoft Excel. I haven't done all pages yet because I haven't had time. I can email the spreadsheet so you can see how it's done - anyone with a reasonable knowledge of Excel [or Perl -- DB, or Python! Pcb] would be able to do it. Bluemoose 15:03, 16 Jun 2005 (UTC)
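For anyone who would rather script the links than use a spreadsheet, here is a minimal Python sketch of the idea. It is not the spreadsheet method described above: the function name `list_entry`, the output line format and the choice of links (an internal Wikipedia search plus a plain Google search) are assumptions made for illustration only.

```python
from urllib.parse import quote_plus


def list_entry(title: str) -> str:
    """Build one numbered wikitext list line for a missing topic.

    The line format is an assumption for illustration; it is not
    necessarily the format used on the project pages.
    """
    phrase = quote_plus(f'"{title}"')  # exact-phrase query, URL-encoded
    wp_search = "https://en.wikipedia.org/w/index.php?search=" + quote_plus(title)
    google = "https://www.google.com/search?q=" + phrase
    return f"# [[{title}]] - [{wp_search} WP] [{google} Google]"


if __name__ == "__main__":
    for topic in ["Abram Stevens Hewitt", "Poultry processing"]:
        print(list_entry(topic))
```

Pasting the printed lines into a list page would give each entry its red or blue link followed by two search links.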
Tips
One strategy I've found useful is clicking through to the EB article, which often displays alternate spellings of names and places, especially those transliterated from languages using other alphabets. I've had the most success with Arabic, for which EB seems to list three, four, or even five alternate spellings, one of which often has a WP article already there. In those cases I try to create redirects from all of EB's alternate spellings.
Also useful for biology articles is searching by scientific name -- WP seems to mix and match; some articles use common name while others use genus or species names. Searching by those can help locate existing articles -- this is especially true for groups of animals where the EB article says something like "Shoofly refers to any member of the family shooflyus". - Bryan is Bantman 18:58, Jun 16, 2005 (UTC)
- You've hit one of my pet peeves (pun intended :) on the scientific names issue. I just posted to Talk:Insect to see if there is going to be a consensus to get rid of that. My preference is to have the scientific names built out, and make the common names redirects. Wikibofh 20:31, 21 Jun 2005 (UTC)
- True, but for full credit check whether one encyclopedia article is about a genus while the other is about one of its species. Sometimes the "genus" article focuses almost exclusively on one species; then it's a judgement call. David Brooks 20:50, 16 Jun 2005 (UTC)
I apologize for the removal of Narodnost from the listing. I was mistaken about the purpose of this page. - Mike Rosoft 18:03, 21 Jun 2005 (UTC)
Additional helpful link next to all entries
How about a Google search link that queries the term paired with site:en.wikipedia.org? I have found Google search is often more helpful than our internal app.
lots of issues | leave me a message 30 June 2005 13:03 (UTC)
- I would second that, and request a fourth one: a Google search with site:wikipedia.org (no en.) - sometimes another language wikipedia has content on one of the articles... User:Dragonsflight might be able to help with eir bot... JesseW 4 July 2005 08:10 (UTC)
- I didn't want to put more than 3 links on as it gets a bit messy; which 3 would be the best ones is a different matter. Bluemoose 4 July 2005 08:48 (UTC)
- I am concerned that when I type "x britannica" into google to research topics, these pages are appearing in the results. We do not trade off Britannica's name, and we should not give the appearance of doing so... but maybe the only way of stopping it is to remove the eb search links? Pcb21| Pete 4 July 2005 09:26 (UTC)
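The two site-restricted searches suggested in this thread could be built roughly as follows. This is only a sketch: the helper name `site_search_url` is made up for illustration, and the only thing assumed about Google is its ordinary `q=` search parameter and `site:` operator.

```python
from urllib.parse import urlencode


def site_search_url(term: str, site: str = "en.wikipedia.org") -> str:
    """Return a Google search URL for an exact phrase restricted to one site."""
    query = f'"{term}" site:{site}'
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # English Wikipedia only, as in the original suggestion
    print(site_search_url("Narodnost"))
    # All language editions, as in the follow-up suggestion
    print(site_search_url("Narodnost", site="wikipedia.org"))
```

Dropping the `en.` prefix widens the search to every language edition, which is the point of the second suggestion.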
really awful EB names
I have some problems with "every article in EB should have at least a redirect". Consider Severocesky, Zapadocesky (and other instances) - these are adjectives of former Czech region names (no longer in official use) transliterated from Czech. The most relevant redirect (I made them) would be to Czechoslovakia#Administrative_divisions (where I added these former names); however, this doesn't show up properly, so it isn't very helpful. The Zapadocesky article is completely wrong, btw. Anyway, one could live with that. But Stredni Slovensko and Vychodni Slovensko? These are informal names of regions (even transliterated from Czech, not Slovak, I think), in English 'Central Slovakia' and 'East Slovakia' (however, they correspond to former regions in Czechoslovakia, as above). I wouldn't mind having an article on these, which would list current regions there, but for instance, there is no such thing as North Germany (actually there is, but not quite what you would expect). How should the redirects work in this case? In these particular cases, EB is wrong (trust me, I am Czech); should we make WP also wrong in order to be compliant? Samohyl Jan 11:05, 14 July 2005 (UTC)
- Why do you say "every article in EB should have at least a redirect"? I don't think that is any kind of project policy, as far as I am aware. Also, why shouldn't Severocesky (original name according to Czechoslovakia) link to Ústí nad Labem Region (current name according to Czechoslovakia)? You'll have to forgive my ignorance of Czech geography! Bluemoose 11:16, 14 July 2005 (UTC)
- Well, I thought this was a policy to prevent misunderstanding in the future (and it is a good thing to prevent it, imho). It could, but "Severočeský kraj" was split and only part (about 3/4) of it is now Ústí nad Labem Region. The case of Zapadocesky is the same, and Vychodocesky is an even bigger problem, because it was mostly split in half. Samohyl Jan 11:32, 14 July 2005 (UTC)
- Why not write an article on the old regions as well, if they're changed from the new ones? That'd be quite helpful and quite interesting to read, too. Ambi 12:06, 14 July 2005 (UTC)
- I put them all into one article, since it would be too much clutter. But there is still a problem with the Slovak region names. Samohyl Jan 04:24, 15 July 2005 (UTC)
Lovetoknow Flowers
Lovetoknow appears to have a scanned public domain plant encyclopedia at http://www.lovetoknow.com/Flowers/flowers.htm but it does not say the name/year. Does anybody know which one it is? bogdan ʤjuʃkə | Talk 19:03, 21 July 2005 (UTC)
- That would be great. We do not have lots of articles about flowers. Danny 01:30, 29 July 2005 (UTC)
- I can't find where they've ripped their text from. But there is this book in Project Gutenberg, Gardening for the Million by Alfred Pink, which has a similar standard of content. Be careful with the taxonomy though, since names have probably changed in a lot of cases.--nixie 01:48, 29 July 2005 (UTC)
- Yes, I know. But the descriptions of plants have not changed. For the taxonomy, a good resource is Flora Europaea. bogdan | Talk 08:30, 29 July 2005 (UTC)
- I can't find where it's from either - they seem to have OCRed a book that nobody else has, and didn't care to leave any mention of where it came from. Ambi 02:36, 29 July 2005 (UTC)
- They probably would tell you if you asked via email, it is (I believe) in the public domain anyway. Bluemoose 08:07, 29 July 2005 (UTC)
- Would they? Their notice about how they have the copyright on stuff presented at 1911encyclopedia.org? They don't seem to play that nice. Pcb21| Pete 13:41, 7 August 2005 (UTC)
- Couldn't someone just make a list of the articles here? Ambi 04:41, 27 August 2005 (UTC)
An observation
It seems to me that the small lists get done really quickly, while we plod through the longer ones. Look how well we did on the various small letters in 2004 (Q, U, X, Z). With that in mind, and given how much harder it is becoming to find links, I would like to propose that we focus on smaller amounts in upcoming collaborations of the week, for instance Section 3 of 2004 page 6. With a more specified concentration, we are sure to make a larger dent in good time. So, whaddya think? Danny 01:30, 29 July 2005 (UTC)
- I'd prefer we continue with the current course until all of the original pages (1-29) have been "focused" for one week. That process does a pretty good (though not perfect) job of killing redirects, and will be completed in 10 more weeks after the current collaboration. After that, we can focus on the shortest lists, to pick up on the effect you've noted. I suspect though that your observation is only partially true -- finishing 50 articles on a long page might only move us from 30% to 35%, but doing the same 50 on a short list might cut it in half, or finish it altogether. - Bantman 01:46, July 29, 2005 (UTC)
- I'd agree with Bantman - let's finish the current process of focussing and then reassess where we are after that. OpenToppedBus - My Talk 08:25, July 29, 2005 (UTC)
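The arithmetic behind Bantman's point can be checked with a few lines of Python. The list sizes here (1,000 entries for a long page, 100 for a short list) are assumptions chosen only to reproduce the percentages quoted above; they are not actual page counts.

```python
def percent_done(done: int, total: int) -> float:
    """Share of a list that has been completed, as a percentage."""
    return 100.0 * done / total


# Hypothetical long page: 1,000 entries, 300 already done.
long_before = percent_done(300, 1000)   # 30.0
long_after = percent_done(350, 1000)    # 35.0 after 50 more articles

# Hypothetical short list: 100 entries, none done yet.
short_before = percent_done(0, 100)     # 0.0
short_after = percent_done(50, 100)     # 50.0 after the same 50 articles

print(long_before, long_after, short_before, short_after)
```

The same 50 articles move the long page by five percentage points and the short list by fifty, so the apparent speed of short lists is partly an effect of the denominator.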
Relevancy of see also links
I've now twice removed a see also link to a page about where Wikipedia is more comprehensive than Britannica so feel duty bound to explain myself a bit here. Admittedly I don't hold out much hope for that page - there are literally thousands of topics where Wikipedia beats Britannica - how do you pick?
But more importantly I don't see how that page is important to the missing topics project. It doesn't help us get the topics filled in - the reason why this project has got over 12,000 links filled in is because we have a good, tight, focused project, unfettered by all of Wikipedia's usual problems and chock-full of excellent editors. I am ultra-sensitive to any changes that might weaken that excellent focus and have it become a regular moribund wikipedia: namespace page. Yes I'm probably over-reacting about just one link, but experience shows that time and time again the slippery slope argument is valid on Wikipedia. :/ Pcb21| Pete 23:23, 29 July 2005 (UTC)
- Ok. You care about the link more than I do, so I bow to your concern. Now let's go write some more stubs! :-) JesseW 23:29, 29 July 2005 (UTC)
Capitalisation on Page 26
Just wondering how and/or why we lost all capitalization at Wikipedia:2004 Encyclopedia topics/26. Danny 22:46, 10 August 2005 (UTC)
- It seems to have happened when the searches were put in. Since capitalisation is significant, I've reverted the change. (Could someone else please re-insert the searches?) Bluap 06:59, 11 August 2005 (UTC)
- Quadell did it (perhaps by accident, I don't know) when he put the searches in. I've reverted the page I've been working on and will do the others as I visit them. Presumably everyone agrees that all caps is better than no caps even though neither is absolutely ideal? Pcb21| Pete 07:55, 11 August 2005 (UTC)
- I'll do it properly later today. Martin (Bluemoose) 09:19, 11 August 2005 (UTC)
- Done some - which others need doing? Martin (Bluemoose) 11:35, 12 August 2005 (UTC)
- I've fixed the rest. I think they were my mistakes. Sorry! – Quadell (talk) (sleuth) 14:38, August 12, 2005 (UTC)
Question from someone at Britannica
How was this list generated?
- From 216.146.93.139 (talk · contribs) which routes to corp.eb.com, i.e. the corporate office of Encyclopedia Britannica. I have no opinion on what the reply should be, and in fact do not even know the answer, but I figure that recognizing who is asking the question could be important here. Dragons flight 00:28, August 24, 2005 (UTC)
A message from alinktothepast was posted here, and deleted by Lotsofissues, saying "I have delete this message by alinktothepast, I apologize for this act of censorship witout permission. I will speak to you. I would like to remind ppl in the know that this is Britannica interrogating...please don't reveal sensitive info that would get them in trouble.lots of issues | leave me a message 01:09, 24 August 2005 (UTC)"
- What are you talking about? Martin - The non-blue non-moose 18:04, 23 August 2005 (UTC)
- I was thinking the same thing. The 2004 topics list is the Britannica list. You may be thinking of the Encarta list, but where did you get that information from? - Taxman Talk 22:49, August 23, 2005 (UTC)
- Ack'd. I didn't realize the protocol w.r.t. the talk page. Can someone help me find out more about how this list (or the Encarta one) was generated? There seems to be a great deal of specific numbers available on it (e.g., "The list began with 28,360 articles on twenty-nine pages each...") and the "2004" bit suggests a specific version of Britannica from which it was derived, so surely someone can provide more information on where the list came from? Is it from the print, online or CD version of EB?
- Erm, IANAL, but I would suggest it might be wise to tread lightly here; methinks an inquiry from corporate EB could be a forerunner to some sort of copyright issue over who owns the list and what acceptable uses for it are. - Bantman 00:36, August 24, 2005 (UTC)
- While I have no specific information with regard to the creation of this 2004 Encyclopedia topics list, such a list could be created (or re-created) simply by typing in the entries one at a time, checking Wikipedia:Allpages for the Wikipedia ones and http://www.britannica.com/eb/index for EB. The comparing could be semi-automated by a simple script that downloaded the respective pages, split them by line, and added missing ones to an output list. (I could write such a script in an hour or so.) As for legal issues, if there are any claims of copyright in the arrangement of the partial output list (which does not resemble anything published by EB, as it is missing thousands of topics listed on the web page mentioned above), this could be simply fixed by re-sorting the list alphabetically by the third letter of the word. No such list has been published by anyone, EB or otherwise, so no copyright issue should apply. Hopefully this will provide some useful guidance; however, IANAL, and as I said above, I have no idea if this list was generated in the way I lay out above. Thanks for your interest. JesseW 07:47, 24 August 2005 (UTC)
- While I have no idea how the lists were generated, it is relatively easy to work out who posted them on Wikipedia, and ask them. In terms of copyright, it is worth pointing out that this is _not_ a complete list of topics in Britannica, but a subset of that list, that has been generated by wikipedia-specific criteria. Bluap 09:19, 24 August 2005 (UTC)
- It was about 28% of all EB articles (now it is half that). I doubt it was copied from anywhere (rather, it was probably generated somehow), and the names have been altered from EB: they list things as Lastname, Firstname, and they have been altered to Firstname Lastname. I don't think there is any legal issue here at all; rather, it is just some guy from EB looking through Wiki (note their contributions) and even (strangely) making some improvements. Martin - The non-blue non-moose 09:29, 24 August 2005 (UTC)
Well that was a hopeless attempt at deception. I even created the wrong impression. The message was intended to suggest Encarta provided the Britannica list.
So what's up at Britannica? lots of issues | leave me a message 19:28, 24 August 2005 (UTC)
- What the hell? Why are you attempting to be deceptive? How is that helpful? I don't know how the list was originally generated, but I'm pretty damn sure that we don't have anything to hide from anyone, and if we did, deception would hardly be the best way to go about it. OpenToppedBus - Talk to the driver 08:43, August 25, 2005 (UTC)
- We will not be helpful. 1. EB initiated a rivalry. 2. For motives unknown, this employee has more than a casual interest in finding out the source of this list. He has contacted four ppl repeating and then repeating the question. lots of issues | leave me a message 23:43, 25 August 2005 (UTC)
- I think that you're reading too much into the question - personally I think that it's a single EB employee, looking through wikipedia in their spare time. Let's be sensible about this, and just continue with the project. If EB have a genuine issue, they'll let us know. Bluap 08:34, 26 August 2005 (UTC)
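The comparison JesseW describes in this thread (split two title lists by line, keep what is missing from the second) might look roughly like the sketch below. Nobody in the thread says the list was actually produced this way, so treat it as an illustration only. The function names are made up, the inputs are plain text with one title per line, and the only normalisation applied is upper-casing the first letter, on the assumption that titles differing only in first-letter case are the same page.

```python
def normalise(title: str) -> str:
    """Trim whitespace and upper-case the first character only."""
    title = title.strip()
    return title[:1].upper() + title[1:]


def missing_topics(reference_text: str, existing_text: str) -> list[str]:
    """Return reference titles (one per line) absent from the existing titles."""
    existing = {normalise(line) for line in existing_text.splitlines() if line.strip()}
    missing = []
    for line in reference_text.splitlines():
        title = normalise(line)
        if title and title not in existing:
            missing.append(title)
    return missing


if __name__ == "__main__":
    # Placeholder titles, not real data from either site.
    reference = "Alpha\nbeta\n\nGamma"
    existing = "Beta\nGamma\n"
    print(missing_topics(reference, existing))
```

Capitalisation after the first letter is deliberately left alone, since the capitalisation thread above notes that it is significant.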
New progress chart
I liked the above chart, but I've contacted the user that made it and he never got back to me. So I made my own version. Is this good enough to go on the main page? I think the big coloured section gives a much better sense of our progress than the numbers alone.
Big is right. I think it looks UGLY Reflex Reaction 04:30, 27 August 2005 (UTC)
- I don't like it at all... Bluap 08:54, 27 August 2005 (UTC)
- Neither. Ambi 09:14, 27 August 2005 (UTC)
- Thanks for registering your opinion so politely. It's great to be appreciated. Soo 11:23, 27 August 2005 (UTC)
- Colours and sizes are changeable, if that's the problem. Pcb21| Pete 11:29, 27 August 2005 (UTC)
- Please, no more statistics! Martin - The non-blue non-moose 11:30, 27 August 2005 (UTC)
- It's a different way of displaying the same statistics as already there. Pcb21| Pete 11:57, 27 August 2005 (UTC)
- Yeah, but it means yet more unproductive work. Martin - The non-blue non-moose 14:25, 27 August 2005 (UTC)
- Agree with Martin, et al. It also seems to be broken, as of now. But thanks for trying. ;-) JesseW 15:58, 27 August 2005 (UTC)
- It's precisely the same amount of work as what we already do though.... we update the percentage now and then. Come on admit it, you just think the pink looks crap ;). Pcb21| Pete 16:31, 27 August 2005 (UTC)
I updated the chart so it is not broken, and less huge and ugly, but I agree with others that it should live here and not join the myriad of things in need of updating. Dragons flight 16:39, August 27, 2005 (UTC)
- I wrote a Perl script so the chart is generated automatically from the current statistics page. The new colour scheme is much prettier, thanks to all those who altered it. I for one like it. As for unproductive work, err, this is Wikipedia, not a job. In accordance with popular opinion, I'll keep it on this page. Soo 16:47, 27 August 2005 (UTC)
Page deleted
Jimbo suddenly asked for the Encarta page to be deleted. This page has been deleted too. See the thread beginning at http://mail.wikimedia.org/pipermail/wikien-l/2005-August/027848.html. Pcb21| Pete 15:39, 28 August 2005 (UTC)
- And I was the one who deleted the pages in question. Since Jimbo wanted the Encarta one gone for legal reasons, I did not want to take any chances with this list being active (especially due to the above chatter). Zscout370 (Sound Off) 15:55, 28 August 2005 (UTC)
- With the page active for six months, I feel that action was perhaps a bit trigger-happy. I hope Jimbo will be able to publish whatever caused him to make the Encarta announcement. Pcb21| Pete 16:12, 28 August 2005 (UTC)
- While I do not know that he can, I am restoring the project page and the A-Z index so this "merging" can take place. Zscout370 (Sound Off) 16:34, 28 August 2005 (UTC)
As someone who has long been in the minority on this issue, I have always believed these lists were likely a problem. Which is to say that I have believed that EB et al. do have a defensible copyright on the list of topics in their encyclopedia, that the lists posted here were unambiguously derived works, and that we did not have a sustainable claim to fair use. Every time I have brought it up, it has led nowhere, so I wonder what changed. However, I would like to call attention to the fact that my same reading would make the following additional pages also copyright problems: Wikipedia:Hutchinson_Encyclopedia_topics, Wikipedia:Encyclopedia_of_Modern_Jewish_Culture_topics, Wikipedia:Missing_science_topics, Wikipedia:List_of_missing_Middle_Eastern_topics, Wikipedia:Canadian_wikipedians'_notice_board/Dictionary_of_Canadian_Biography, Wikipedia:Evangelical_Dictionary_of_Theology, Wikipedia:MacTutor_archive. I don't know if people want to get trigger happy about these others, but the basic pattern of facts is essentially the same only on a smaller scale. Dragons flight 16:52, August 28, 2005 (UTC)
- I really feel sad at this deletion, particularly so, as I found the list like a gold mine to find topics to create at least stubs. OK pals, I will continue to fill the "gaps" in the wikipedia, with edit summary reading like: created a stub for a similar entry in EB. Please suggest a better and shorter summary. Thanks. --Bhadani 15:50, 1 September 2005 (UTC)
What is and isn't a potential copyright violation
I would like to point out something. By hosting the EB index page (and perhaps similar pages above), Wikipedia could be seen as violating EB's copyright. If EB took Wikipedia to court, it's possible that the judge would rule in Wikipedia's favor (which is what I suspect would happen), but it would be expensive, and a hassle, and would distract attention from Wikipedia's core goals. From what I can tell, this is Jimbo's reason for asking that the page be deleted, and I support it, although reluctantly.
But, and this is the important bit, it is not a potential copyvio for a Wikipedian to use such a list. It's only problematic to publish (or copy) such a list. Some of us could hypothetically look through the list in the history pages (or download the list to our local computers, in case the history pages are deleted), and use these lists to find holes in Wikipedia. While it may or may not be a copyright violation for you to do so, it would be vanishingly unlikely that an individual would be prosecuted for such an act, and it would not make Wikipedia liable in any way. I'm not openly advocating this; but it is possible for this work to go on in this way. – Quadell (talk) (sleuth) 19:13, August 28, 2005 (UTC)
- A solution for this problem would be hosting the list outside the Wikimedia servers and the US; in some countries (Sweden, for example), publishing the list would in itself be fair use and would not be punishable as copyright infringement, because it does not create any loss of profit to Britannica (required for prosecution under their copyright laws), since nobody buys an encyclopedia only for the index. bogdan | Talk 19:20, 28 August 2005 (UTC)
- IANAL, but my understanding of copyright law is that we would be better protected if we combine all of our lists (Encarta, EB 2004, EB 1911, Nuttall, and all the others), as well as accept article "suggestions" for the combined list from individuals. That way there would be no one owner of the material in the list, and it might be harder to use the "derivative work" argument if we add some "creativity" in generating our own list of encyclopedic topics not yet covered by WP. Just a thought. - Bantman 21:24, August 28, 2005 (UTC)
- While I do apologize for the deletions this morning, I was asked by others on the mailing list and on IRC to restore the pages for a merge. Personally, it does not matter to me what y'all do; just let me know what happens. Zscout370 (Sound Off) 21:37, 28 August 2005 (UTC)
- As a matter of procedure, I was under the belief that if a page qualified as copyright violation, then someone should put a copyvio tag on the page, to give people the opportunity to discuss the issue. Only last week, Bluemoose suggested that copyvios should be added to the criteria for speedy deletion, only to have the idea discounted (see [1] for details). In this particular occasion, why were the project pages speedy deleted? Bluap 21:52, 28 August 2005 (UTC)
- Because Jimbo Wales, as the head of the Wikimedia Foundation, is our benevolent dictator and gets to set the rules (or break them) as he sees fit. It is also worth noting that if there really were a lawsuit, he is the one of the people most likely to be named personally. Dragons flight 22:27, August 28, 2005 (UTC)
- Is the subpage of the list deleted as well?--Kiba 23:04, 28 August 2005 (UTC)
Sigh. Fine. I'll write the tool: a simple script to download, on the fly, a copy of the EB index published on their website, reformat the article names to Wikipedia style, remove the ones that are blue links, and display the result on the screen of the person using it. The resulting data would never be published, and so, while IANAL, I can't see that there would be any concerns. JesseW 23:07, 28 August 2005 (UTC)
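The "reformat the article names to Wikipedia style" step might be sketched as below, assuming it means the Lastname, Firstname to Firstname Lastname change mentioned earlier on this page. The function name is hypothetical and this is not JesseW's tool, which is only described here, not shown.

```python
def to_wikipedia_style(index_name: str) -> str:
    """Turn 'Lastname, Firstname' into 'Firstname Lastname'.

    Names without a comma are returned unchanged (apart from trimming).
    """
    last, sep, first = index_name.partition(", ")
    if not sep:
        return index_name.strip()
    return f"{first.strip()} {last.strip()}"


if __name__ == "__main__":
    print(to_wikipedia_style("Hewitt, Abram Stevens"))
    print(to_wikipedia_style("Poultry processing"))
```

A blind swap like this would also invert entries where the comma is not separating a surname from a given name (a place followed by its region, say), so the output would still need a human eye.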
Britannica will lodge any complaint through a letter first. EB hasn't sent an intimidating C&D because they have no cause. In any case a mere written threat would appear silly and would make great Slashdot fodder and advance onto the pages of pro-Wikipedia newspapers such as the Guardian. It would help complete the transformation of news coverage of Wikipedia. In late 2004, articles assessed our reliability and versatility in balanced fashion. Beginning in Q2 2005, notes of caution in articles about Wikipedia started to disappear. Now, we can enter another stage, where Wikipedia will be identified as the peer upstart to EB. Any EB initiated standoff would lend credence to this new image. lots of issues | leave me a message 23:46, 28 August 2005 (UTC)
- ...or it would lend credence to the (unfounded) claim that "Wikipedia just copies Britannica's information." While untrue, a lawsuit like this could color people's opinions of Wikipedia. Lawsuits are also expensive and time-consuming, even for the winning party. See also: SLAPP suit. – Quadell (talk) (sleuth) 11:13, August 29, 2005 (UTC)
Unbelievable. What wimps. Britannica didn't even address the issue. lots of issues | leave me a message 21:35, 31 August 2005 (UTC)
- Here's a link for an off-site database of the articles: http://tinyurl.com/cbwr8 bogdan | Talk 23:25, 31 August 2005 (UTC)
- Thanks bogdan - I guess my tool is not needed now. ;-) JesseW 23:48, 31 August 2005 (UTC)
Question: Non-American cities
I notice while going over this project that a substantial amount of the remaining entries are names of non-American cities, particularly Indian, Brazilian and South American ones. It could be something like 2-5% of the entries on the lists right now... and while you can find a redirect for some (looking at differences in spelling due to translation) most simply aren't in the English Wikipedia yet.
I feel like it might be a little futile to manually write something for each one, given that American cities are done mostly by bots, with a little extra information added by individual editors if it's warranted. Population, location and demographic data are really what's most helpful, and that can be automated, given a data source.
I have found the 2000 Brazil Census information for all cities [[2]], for example. I really think that would take a lot of items off these lists, and more importantly, be a good addition to Wikipedia.
But I'm not sure that's really viable. I just thought I'd point out this trend here to see what people think. --W.marsh 22:55, 30 August 2005 (UTC)
- If Ram man is still around, his User:Rambot tools could probably be rewritten to use that data. Try contacting him. - Taxman Talk 23:09, August 30, 2005 (UTC)
- Is the Brazil Census information copyright friendly to Wikipedia? --Kiba 23:11, 30 August 2005 (UTC)
- That's a good question, and I actually can't tell... much of the site is in Portuguese. I was probably getting ahead of myself with all of that anyway. My point is that a substantial number of the remaining items in this project are cities for which it might be hard to find anything beyond basic gazetteer info, in English at least. --W.marsh 03:13, 31 August 2005 (UTC)
- It doesn't matter. Raw data can't be copyrighted. bogdan | Talk 04:45, 31 August 2005 (UTC)
- I would have thought that the Brazilian census would be copyrighted by the Brazilian government. To my knowledge, it's only the USA that has the odd policy that anything produced by the government is automatically non-copyrighted.
- Raw data really can't be copyrighted. Pcb21| Pete 08:26, 31 August 2005 (UTC)
I've left a message for Ram-Man. He's out of town until the end of August, but that's now, I suppose. JesseW 10:03, 31 August 2005 (UTC)
- He's gotten back to me, and I pointed him to the link for the Brazil census. JesseW 17:54, 2 September 2005 (UTC)
- As an aside, I reckon it's considerably higher than 5%, although maybe a lot of them will have been done already because it's so easy to write a geo stub. Definitely an avenue worth pursuing though. Soo 16:12, 31 August 2005 (UTC)
- I'm currently working on a translation of the English U.S. city/county articles into other languages, which is quite complicated. If someone were willing to get the raw Brazilian data for me, I could have the rambot upload the articles much more rapidly. I did glance at the data and it all looks pretty good, except the "download" link didn't seem to work for me. Writing a web crawler to pull out the data would take a while. Anyway, if no one else does it, I can get the data sometime in the unforeseen future, but if someone else wants to do the work, I can process it with ease. — Ram-Man (comment) (talk) 18:12, September 2, 2005 (UTC)
- A simpler table, containing only the name of the municipality, the state, and the estimated population in 2004, is available at http://www.ibge.gov.br/home/estatistica/populacao/estimativa2004/estimativa.shtm?c=1 . Eugene van der Pijll 21:18, 2 September 2005 (UTC)
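A table with just the municipality name, state, and estimated 2004 population is already enough to generate basic stub text automatically. The following is a minimal sketch of that idea, not how rambot actually works: the CSV layout, the column names `municipality`, `state` and `population`, the wording of the stub sentence, and the sample row are all invented for illustration and are not taken from the IBGE site.

```python
import csv
import io


def stub_text(name: str, state: str, population: str) -> str:
    """Build a one-paragraph geo stub from three fields (assumed wording)."""
    return (
        f"{name} is a municipality in the state of {state}, Brazil. "
        f"Its estimated population in 2004 was {int(population):,}."
    )


def stubs_from_csv(csv_text: str) -> dict:
    """Map each municipality name to its stub text.

    Assumes a header row with the hypothetical columns
    'municipality', 'state' and 'population'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["municipality"]: stub_text(
            row["municipality"], row["state"], row["population"]
        )
        for row in reader
    }


if __name__ == "__main__":
    # Placeholder row, not real census data.
    sample = "municipality,state,population\nExemplo,Estado X,12345\n"
    for title, text in stubs_from_csv(sample).items():
        print(title, "->", text)
```

A real run would still need someone to check the data's licensing and to match each generated title against existing articles and redirects before uploading anything.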