Saturday, May 24, 2008

Regular Expressions: Locale and Unicode

Today I had a very hard time figuring out how to actually use some of the options of the Python re module. I was especially interested in re.LOCALE and re.UNICODE.

I don’t understand why the Python documentation gives no examples of their use. Eventually I found a site through Google that showed this kind of syntax for re.IGNORECASE (or re.I for short):

re.split(re.compile('th',re.I),'Brillig and the Slithy Toves')
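To see what that line actually does, here is a minimal sketch, assuming a current Python 3 interpreter. The second sample string ('The Slithy Toves') is my own addition, chosen only because it contains an uppercase "Th" that makes the effect of re.I visible.

```python
import re

# The flag goes in as the second argument to re.compile.
pattern = re.compile('th', re.I)

# Same split as in the post, written as a method call on the compiled pattern.
parts = pattern.split('Brillig and the Slithy Toves')
print(parts)  # ['Brillig and ', 'e Sli', 'y Toves']

# Hypothetical second sample: it starts with an uppercase "Th".
sample = 'The Slithy Toves'
without_flag = re.compile('th').split(sample)
with_flag = re.compile('th', re.I).split(sample)

print(without_flag)  # ['The Sli', 'y Toves']
print(with_flag)     # ['', 'e Sli', 'y Toves']
```

Without the flag the leading "Th" is left alone; with re.I it is treated as a separator too, which is why the result starts with an empty string.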

From this I finally understood that these options are passed as the second argument to re.compile, like this:

rx = re.compile('some regex syntax', re.UNICODE)

or re.LOCALE instead of re.UNICODE.
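As a rough illustration of what the flag changes, here is a small sketch. It assumes Python 3, where str patterns are already Unicode-aware by default, so re.UNICODE is effectively redundant there. To make the difference visible I contrast it with re.ASCII, which is not mentioned above. The sample text is also my own. As far as I know, on Python 3.6 and later re.LOCALE is accepted only with bytes patterns, so that line uses one.

```python
import re

text = 'naïve café'  # sample string, not from the original post

# With re.UNICODE, \w also matches non-ASCII letters such as ï and é.
rx_unicode = re.compile(r'\w+', re.UNICODE)
unicode_words = rx_unicode.findall(text)

# For contrast, re.ASCII restricts \w to [a-zA-Z0-9_].
rx_ascii = re.compile(r'\w+', re.ASCII)
ascii_words = rx_ascii.findall(text)

# re.LOCALE is passed the same way; here with a bytes pattern (Python 3.6+).
rx_locale = re.compile(rb'\w+', re.LOCALE)

print(unicode_words)  # ['naïve', 'café']
print(ascii_words)    # ['na', 've', 'caf']
```

What rx_locale matches depends on the locale settings of the machine it runs on, so I give no expected output for it.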

I could not find these small syntax details spelled out anywhere in the Python documentation, and that bothers me.
