Thursday 21 May 2009

MS-DOS Codepage 850 to ISO 8859-14

Different character sets, don't you love 'em? Today I had to deal with some exported text that was DOS-encoded (Codepage 850, to be precise) but was needed in ISO 8859-14. Luckily, this sort of thing is pretty straightforward on Linux.

On the command line, glibc provides a fantastic converter called iconv. Invoking it is as simple as this:

iconv --from-code=CP850 --to-code=ISO-8859-14 \
original_file > converted_file


In my case, I needed to incorporate this into a Python script. Luckily, Python makes this very simple without having to resort to third-party tools. Once you've read in your text as raw bytes, decode it to Unicode and then encode it into your desired charset.

# original_txt holds the raw CP850 bytes read from the exported file
converted_text = original_txt.decode('cp850').encode('iso8859_14')
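One thing to watch for: CP850 contains characters (the box-drawing ones, for instance) that ISO 8859-14 simply doesn't have, so the encode step can raise a UnicodeEncodeError. Here's a rough sketch of how the conversion could be wrapped up for whole files with that in mind. The names reencode and convert_file, and their parameters, are my own inventions for illustration; only the codec names and the decode/encode calls come from Python itself.

```python
def reencode(data, src_enc="cp850", dst_enc="iso8859_14", errors="strict"):
    """Convert raw bytes from one charset to another (hypothetical helper).

    errors is passed to the encode step only: "strict" raises
    UnicodeEncodeError on characters the target charset lacks,
    "replace" substitutes a question mark for them.
    """
    return data.decode(src_enc).encode(dst_enc, errors)


def convert_file(src_path, dst_path, **kwargs):
    """Re-encode a whole file (hypothetical helper).

    Binary mode is used on both ends so that line endings, such as
    DOS CRLF, pass through untouched.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(reencode(data, **kwargs))
```

Calling convert_file("original_file", "converted_file") should then do roughly what the iconv command does. Passing errors="replace" swaps anything unmappable for a question mark, which may or may not be acceptable for your data.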
