Thursday 21 May 2009

MS-DOS Codepage 850 to ISO 8859-14

Different character sets, don't you love 'em? Today I had to deal with some exported text that was DOS-encoded (Codepage 850, to be precise) but was needed in ISO 8859-14. Luckily, this sort of thing is pretty straightforward on Linux.

On the command line, glibc provides a fantastic converter called iconv. Invoking it is as simple as this:

iconv --from-code=CP850 --to-code=ISO-8859-14 \
original_file > converted_file


In my case, I needed to incorporate this into a Python script. Luckily, Python makes this very simple without having to resort to third-party tools. Once you've read in your text, decode it into unicode and then encode it into your desired charset.

# Python 2: decode the CP850 bytes to unicode, then encode as ISO 8859-14
converted_text = unicode(original_txt, 'cp850').encode('iso8859_14')
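For anyone on Python 3, where the unicode() builtin no longer exists, here is a rough sketch of the same decode-then-encode idea. The function name recode and its errors parameter are my own for illustration; the codec names cp850 and iso8859_14 are the ones that ship with the standard library. Bear in mind that CP850 contains characters (box-drawing symbols, for instance) that ISO 8859-14 doesn't have, so a strict encode can raise UnicodeEncodeError on some input.

```python
def recode(data, source="cp850", target="iso8859_14", errors="strict"):
    """Re-encode raw bytes from one charset to another.

    `recode` is a hypothetical helper name, not part of any library.
    `errors` is passed to str.encode(); "strict" raises on characters
    the target charset cannot represent, "replace" substitutes "?".
    """
    # Step 1: bytes in the source charset -> Python str (unicode)
    text = data.decode(source)
    # Step 2: str -> bytes in the target charset
    return text.encode(target, errors)


if __name__ == "__main__":
    # 0x82 is "é" in CP850; ISO 8859-14 stores it as 0xE9
    print(recode(b"caf\x82"))
```

When working with files, open them in binary mode ("rb" / "wb") so that Python hands you the raw bytes and doesn't apply its own default encoding first.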

Wednesday 6 May 2009

OpenOffice.org Proper / Title Case

Today I had to convert thousands of lines of text in OpenOffice Calc to title/proper case. I could have scripted it, but it felt like OpenOffice *should* have this sort of functionality built in. Under the Format->Change Case menu, there are Uppercase/Lowercase options, but no title/proper case.

I found a couple of old macros that purportedly did the job - they didn't work and I really didn't fancy fucking about with VBScript or whatever the hell it is. In my case, the simplest way to do this was to create a neighbouring column and enter =PROPER(A1), with A1 being the cell to convert. Copy this simple formula down the rest of the column, then copy the results and use Paste Special to paste them back as plain strings rather than formulas. Simples!

I'm sure there are more elegant ways to do this but I had a deadline and I didn't really fancy any extra legwork, just to get the values converted. Hopefully this will save someone some time dicking about with macros that don't work and other such irritants.
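If you'd rather script it outside Calc after all, a minimal Python sketch might look like the one below. It assumes the text has been exported one entry per line, and to_title_case is just a name I made up. It leans on Python's built-in str.title(), which I wouldn't assume behaves identically to PROPER() in every case; in particular it capitalises the letter after an apostrophe, so check names and contractions before trusting the output.

```python
def to_title_case(lines):
    """Return a new list with every line converted to title case.

    `to_title_case` is a hypothetical helper; str.title() upper-cases
    the first letter of each word and lower-cases the rest.
    """
    return [line.title() for line in lines]


if __name__ == "__main__":
    for converted in to_title_case(["hello world", "OPEN OFFICE calc"]):
        print(converted)
```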