Named Entity list from German Wikipedia
Wednesday, February 3, 2010 • Category: Automatic Mind
In 2008, I developed an anagram solver that is still available online in some German posts on my other blog. Strictly speaking it is an anagram solver, but it is presented as a riddle solver for a stupid type of TV quiz. To cut a long story short: never, ever call the numbers on the screen. If you just want to know the solution, use the linked pages and turn off your TV right after.
So what's interesting about anagram solvers and these riddles? Well, one needs a fairly large word list. I took the one from the Ispell dictionary, the same one I have online on the BananaSplit page. I soon found out that these riddles made heavy use of celebrity names, yet my word list back then contained only a few named entities.
I solved this issue by harvesting Wikipedia. What do all humans have in common? Right, they are born at some point. And of course, everybody has to die. Now, Wikipedia has lists of births and deaths in the entry for each calendar year. Back in May 2008, I downloaded all pages from year 1000 to year 2008 and simply extracted every person listed there as born or deceased. This gave me a list of 51,353 person names from German Wikipedia.
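The extraction step can be sketched roughly as follows. This is not the Perl script linked below, just a minimal Python illustration under a few assumptions: that a year page is available as wikitext, that births and deaths sit under headings named "Geboren" and "Gestorben", and that each list entry links the person right after an optional date link. The function name `extract_person_names` and the sample entries (including all names in them) are made up for illustration.

```python
import re

# Assumed section headings on German year pages ("born" / "died").
PERSON_SECTIONS = {"Geboren", "Gestorben"}

HEADING_RE = re.compile(r"^(={2,})\s*(.*?)\s*(=+)\s*$")
# Captures the link target of [[Target]], [[Target|Label]] or [[Target#Anchor]].
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
# Links such as [[3. Februar]] or [[1950]] are dates, not persons.
DATE_RE = re.compile(r"^(\d{1,2}\.\s+\w+|\d{1,4})$")


def extract_person_names(wikitext):
    """Return the first non-date link of every bullet in the birth/death sections."""
    names = []
    in_person_section = False
    section_level = 0
    for line in wikitext.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            level, title = len(heading.group(1)), heading.group(2)
            if title in PERSON_SECTIONS:
                in_person_section, section_level = True, level
            elif level <= section_level:
                # A heading of the same or higher rank ends the section.
                in_person_section = False
            continue
        if not in_person_section or not line.startswith("*"):
            continue
        for target in LINK_RE.findall(line):
            target = target.strip()
            if DATE_RE.match(target):
                continue
            # Drop a trailing disambiguator such as " (Komponist)".
            names.append(re.sub(r"\s+\([^)]*\)$", "", target))
            break
    return names


# Made-up sample in the assumed page layout.
SAMPLE = """\
== Ereignisse ==
* [[5. Mai]]: [[Beispielstadt]] wird gegründet.

== Geboren ==
=== Januar ===
* [[1. Januar]]: [[Erika Beispiel]], deutsche Malerin († [[1980]])
* [[Max Mustermann]], deutscher Politiker

== Gestorben ==
* [[2. März]]: [[Hans Muster (Komponist)|Hans Muster]], Komponist (* [[1870]])

== Weblinks ==
* [[Commons]]
"""

if __name__ == "__main__":
    for name in extract_person_names(SAMPLE):
        print(name)
```

Taking only the first non-date link per entry is a deliberate simplification: it skips places and professions linked later in the same line, at the cost of missing entries whose person has no link at all.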
Recently, a colleague asked me if he could have this list. So why not simply put it online? Here we go.
- Person name list from Wikipedia, CSV (bz2-compressed, 0.55 MB) – downloaded in May 2008.
- Perl script used for extracting the named entities, in case you want to do this yourself.
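If you want to feed the list into an anagram solver of your own, a minimal Python sketch could look like this. It assumes that the CSV holds the person name in its first column and is UTF-8 encoded, neither of which is guaranteed here, so check the file first. The function names and the sorted-letter lookup are illustrative choices, not a description of how my solver works.

```python
import bz2
import csv
from collections import defaultdict


def signature(text):
    """Sorted lowercase letters of `text`; anagrams share the same signature."""
    return "".join(sorted(ch for ch in text.lower() if ch.isalpha()))


def load_names(path, encoding="utf-8"):
    """Read the first column of a bz2-compressed CSV file (assumed layout)."""
    with bz2.open(path, mode="rt", encoding=encoding, newline="") as handle:
        return [row[0].strip() for row in csv.reader(handle)
                if row and row[0].strip()]


def build_anagram_index(names):
    """Map each letter signature to all names that share it."""
    index = defaultdict(list)
    for name in names:
        index[signature(name)].append(name)
    return index


if __name__ == "__main__":
    # Made-up names; replace with load_names("<your file>.csv.bz2").
    index = build_anagram_index(["Erika Beispiel", "Max Mustermann"])
    print(index.get(signature("leipsieb akire"), []))
```

Because spaces and punctuation are dropped before sorting, a scrambled string matches a full name regardless of where the word boundary falls.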
I hope this is useful for some people out there. Feel free to let me know what you're using it for or what improvements could be made. You can use the comment box below or write me an e-mail.





