Jobpy released (pp is good, jobpy is more special)

Hello world Jobpy!

As part of my diploma thesis I needed a multithreaded job producer/consumer architecture with at-least-once semantics. To make testing easier I implemented the architecture and semantics in a separate framework and made sure it worked well. The code is pretty Java-ish, which I'm not all that happy about, but I thought it might be of use to someone anyway. So I created a little sourceforge.net project for jobpy.

You may ask yourself why the hell you would use jobpy when there is also parallelpython (aka pp). Well, pp has some problems - for example, it's pretty hard to share resources like a socket or a database connection among the consumer threads, because the jobs have to be serializable. But some data structures cannot be serialized, and you would have to code around this limitation. Jobpy is not designed to replace pp; it's just a solution for the special case of multithreaded producer/consumer job processing with optionally shared resources.
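To make the shared-resource idea concrete, here is a minimal sketch of the pattern using only the standard library - the names here are illustrative, not jobpy's actual API. Each consumer thread owns one long-lived resource, so jobs never need to be serializable:

```python
import queue
import threading

def consumer(jobs, make_resource, results, lock):
    # Each consumer thread builds its own long-lived resource (a socket,
    # a DB connection, ...), so jobs themselves never have to be pickled.
    resource = make_resource()
    while True:
        job = jobs.get()
        if job is None:              # sentinel: shut this worker down
            jobs.task_done()
            break
        out = resource(job)
        with lock:
            results.append(out)
        jobs.task_done()             # acknowledge only after processing

jobs = queue.Queue()
results, lock = [], threading.Lock()
workers = [threading.Thread(target=consumer,
                            args=(jobs, lambda: str.upper, results, lock))
           for _ in range(4)]
for w in workers:
    w.start()
for word in ["hello", "world"]:
    jobs.put(word)
for _ in workers:
    jobs.put(None)                   # one sentinel per worker
jobs.join()
for w in workers:
    w.join()
print(sorted(results))               # ['HELLO', 'WORLD']
```

Marking a job done only after processing finishes is what gives the at-least-once flavor: a job is never acknowledged before it has actually been handled.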

wiki2beamer 0.7 is out

Hi all out there,

Wiki2beamer is a small tool to create LaTeX beamer presentations by converting a wiki-like syntax to LaTeX beamer code. It took a while after the 0.7 alpha 2 release, but I wanted to give it some time: wiki2beamer-0.7 is released. Major new features are:

  • Easy animated code listings via the [code] environment
  • Total escaping from wiki syntax via the [nowiki] environment
  • Template-less mode via the [autotemplate] environment
  • A setup.py
  • A man-page!!!11

So, go there, and spread the word :)

I’m on TV

Finally, after some time, Thaddäus finished cutting and converting the first recording of the Javakurs 2009. He started with LE3 - which is Dennis and me teaching our students about methods and call-by-reference vs. call-by-value in Java. I'm pretty happy with my performance, so I'm glad I can now show it to the world ;)

Here it is: Javakurs 2009 LE3, with slides

wiki2beamer 0.7 alpha 2

Hey there, wiki2beamer 0.7 alpha 2 is out. This means a dramatic increase in features, the most important of which are:

  1. autotemplate-support: You don’t have to create a template file anymore, wiki2beamer can now do the job for you and create a fully functional .tex file from the .txt sources.
  2. code-environment: There is a new code environment which allows you to create animated code listings with a very dense and easy grammar.
  3. nowiki-environment: The nowiki environment allows you to completely escape from wiki2beamer.

These features were written in a very short time, so there are probably bugs. Some feedback about the syntax and usability would also be very helpful.

MOTM

Music of the month — being (electro-something):

Paul Kalkbrenner - Sky and Sand

Daft Punk - Aerodynamic

Фузион - I’m coming! :)

wiki2beamer 0.6 is out

Do you like LaTeX? Would you also like to create your presentation slides for your talks with LaTeX beamer but fear the overhead? Then wiki2beamer probably is something for you. wiki2beamer speeds up beamer code creation by converting a MediaWiki-like syntax into functioning LaTeX beamer code.

Here’s an example of the syntax:


==== Frametitle ====
* Bullet 1
* Bullet 2

# Enumerate
#<2-> Enumerate with pop-up effect
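For comparison, the snippet above expands to roughly the following beamer code (the exact output formatting may differ between wiki2beamer versions):

```latex
\begin{frame}
  \frametitle{Frametitle}
  \begin{itemize}
    \item Bullet 1
    \item Bullet 2
  \end{itemize}
  \begin{enumerate}
    \item Enumerate
    \item<2-> Enumerate with pop-up effect
  \end{enumerate}
\end{frame}
```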

This really speeds up slide creation a lot. Now wiki2beamer 0.6 is out, which contains a small patch from me that allows you to manually close frames with [frame]> and then continue with some LaTeX code outside of the frame environment. The examples contain a demonstration of how this works and how it can be used, especially with verbatim environments and beamer.

On May 8th I will give a small talk at the Freitagsrunde TechTalks where I'll show how to efficiently create source-code-centric presentations with LaTeX beamer (and wiki2beamer). So, I'd be glad if you'd just show up if you're interested.

Just shut your mouth for once.

You know how it goes? Instead of saying nothing, listening, and considering whether there actually is a good argument for or against it that hasn't been brought up yet, people just keep babbling. Of course you know it. And anyone who occasionally catches this behavior in themselves should maybe get checked for logorrhea. Existence proof by Wikipedia. Also good: Moria is not an underground fantasy city. Those affected should blog more. It helps, and it takes the load off the people around them.

It’s worth the time.

Street photography. Rediscovered my love, once again, and spent some hours outside on the streets of Berlin. Look what I’ve seen.


Weary head

More in my gallery on deviantart.com

How many cities does the world have?

So this is what I did this weekend: for my diploma thesis I needed a list of all the cities in the world (to filter text collections). Corpora like the "Getty Thesaurus of Geographic Names" already exist, but they are not free and way beyond my needs - I just want a list.

Who could have such a list? First try: the IATA airport list from Wikipedia. Parsed it. Erroneous (typos, problems with parsing…). Second try: openstreetmap.org.

And here is where the story begins. Openstreetmap.org collects geographical knowledge in a huge database and publishes it under a free license (CC-BY-SA). The database dump can be downloaded as a huge XML file which is currently about 5GB in bzip2 compression. The expected decompression ratio is 10:1, so this is a fricking 50GB XML file. The structure is fairly simple: there are nodes, ways and relations, and these can have tags. The tags then encode information like "this is a place of the kind city" or "its name is Footown". To get this information I wrote a little Python script that iterates over every line and extracts the relevant parts with some regular expressions. A run on the 5MB bzip2ed relations file took 17s. Well … that was too slow. So I removed the regexps and did some dirty by-hand parsing. 8s. Better. But still too slow.
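The "dirty by-hand parsing" boils down to cheap substring searches instead of regular expressions. A minimal sketch of the idea (not the actual script - element and attribute names follow the OSM XML format, the rest is illustrative):

```python
def extract_places(lines):
    """Pull (place_type, name) pairs out of OSM XML without regexes.

    Relies on the dump's line-oriented layout: each <tag k="..." v="..."/>
    sits on its own line, so plain substring slicing is enough.
    """
    place = name = None
    places = []
    for line in lines:
        if '<tag k="place"' in line:
            start = line.index('v="') + 3
            place = line[start:line.index('"', start)]
        elif '<tag k="name"' in line:
            start = line.index('v="') + 3
            name = line[start:line.index('"', start)]
        elif '</node>' in line:
            if place and name:
                places.append((place, name))
            place = name = None      # reset state for the next node
    return places

sample = [
    '<node id="1" lat="52.5" lon="13.4">',
    '  <tag k="place" v="city"/>',
    '  <tag k="name" v="Berlin"/>',
    '</node>',
]
print(extract_places(sample))   # [('city', 'Berlin')]
```

Dropping the regex engine in favor of `in` and `str.index` is exactly the kind of change that roughly halved the runtime here - at the cost of assuming the input never deviates from the dump's one-tag-per-line layout.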

So, next step: C.
The evil thing!


But you have to admit - there's nothing else when it comes to performance. About six hours of coding later I got the first results: 3.5s for a 1,170,000-line XML file. Good. After some further improvements I got it down to 2.6s. Yeah! :)

On a 1.6GHz Core2 Duo the combination of bzcat and grepplaces.c (both running on one core) gives around 1.2MB/s reading speed on the bzip2ed planet-file. So a complete scan over the planet-file now takes about 70 minutes.
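The 70-minute figure checks out against the numbers above (assuming ~5GB for the compressed planet file, as stated earlier):

```python
# Sanity check of the runtime estimate, using the numbers from the post.
size_mb = 5 * 1024        # ~5GB bzip2-compressed planet file
speed_mb_s = 1.2          # bzcat + grepplaces throughput on one core
minutes = size_mb / speed_mb_s / 60
print(round(minutes))     # 71
```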

So, here’s the code: http://cleeus.de/grepplaces/

The extracted corpus will follow, as soon as it’s ready.

So long, some statistics:


$ grep "^city" places_planet.txt | wc -l
4179
$ grep "^town" places_planet.txt | wc -l
29401
$ grep "^village" places_planet.txt | wc -l
249716

Commons, baby, light my fire!

Although this will be yet another literature suggestion, and I usually do these in German, this one will (hopefully) come along nicely in English. I just started a book that appears to be a masterpiece. I can't really put into words what makes a book a masterpiece in my view, but it seems to be something between the lines. Occasionally there are books that make you feel bad and drain your energy - and by that I don't mean that they are boring. And there are masterpieces that do the exact opposite: they set your mind on fire in a positive way. Every one of their sentences gives you a starting point for a thought of your own.

One of those seems to be "The Wealth of Networks" by Yochai Benkler. Benkler is a professor at the famous Berkman Center for Internet and Society at Harvard. Fame aside, he also seems to be someone with a very beautiful writing and thinking style. I have only read the first few chapters yet, and have already, if not learned, then understood more deeply a lot of things that drive the creation and distribution of information goods. He is an advocate of free and open software and of a general information commons, and gives a lot of reasons for everyone, even hardcore economists, to embrace the emergence of a new world of culture creation. I have yet to see what is coming, but it seems to be a strong statement against strong copyright policy for simple economic reasons, and not some '69 dream of equality (not that I wouldn't like those dreams). So for everyone who already loves commons, who would like to bring light into their understanding of the motivations for information production, and who wants to get insp(f)ired, I can really recommend the read. :)