Writing dbf file with custom encoding (DBF Package)

Question:

I have some characters in Farsi and I want to write them to a dbf file using my custom code page, which uses one byte per character.
I think this problem can be solved in one of these two ways:

1- Passing my custom codepage to the dbf table.

2- Writing binary data directly to the dbf file without going through the dbf package's default code page (which is utf8).

How can I solve this problem with either of these approaches?

Here is the code:

import dbf

man = 'مرد'
woman = 'زن'
row1 = (man, woman)
row2 = (man, woman)

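# dbf.Table creates the file itself, so wrapping it in open('./file.dbf', 'w') is unnecessary.
# 'customCodePage' stands for the desired single-byte Farsi code page, which dbf does not know about.
table = dbf.Table(filename='./file.dbf',
                  field_specs='field1 C(3); field2 C(3)',
                  codepage='customCodePage', on_disk=True)
table.open(dbf.READ_WRITE)
table.append(row1)
table.append(row2)
table.close()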
Asked By: Hossein Yousefian


Answers:

dbf was designed to work with existing code pages, and so custom code pages were not considered.

If you’re adventurous:

  • add a custom number to dbf.code_pages with short and long descriptions (e.g. dbf.code_pages[0xa1] = ('farsi', 'single-byte farsi code page'))
  • register your custom code page with the codecs module so that codecs.getdecoder('farsi') and codecs.getencoder('farsi') (or whatever name you choose to use) return the appropriate decoder/encoder
  • test, test, test (with backup copies)
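A minimal sketch of the codec-registration step (second bullet), assuming Python 3 and a custom code page whose lower 128 bytes match ASCII. It uses the standard codecs.charmap_build / charmap_encode / charmap_decode helpers, the same mechanism Python's own single-byte codecs such as cp1256 are built on. Apart from پ → 148 (mentioned later in this thread), every byte assignment is a hypothetical placeholder, as are the names CUSTOM_BYTES, _encode, _decode and _search. The dbf.code_pages step from the first bullet is not shown.

```python
import codecs

# Hypothetical byte assignments for the custom single-byte Farsi code page.
# Only 148 -> 'پ' comes from this thread; the rest are placeholders.
CUSTOM_BYTES = {
    148: '\u067e',  # پ
    149: '\u0645',  # م (placeholder)
    150: '\u0631',  # ر (placeholder)
    151: '\u062f',  # د (placeholder)
    152: '\u0632',  # ز (placeholder)
    153: '\u0646',  # ن (placeholder)
}


def _build_decoding_table():
    # Bytes 0-127 decode as ASCII; '\ufffe' marks a byte as undefined.
    table = [chr(b) for b in range(128)] + ['\ufffe'] * 128
    for byte_value, char in CUSTOM_BYTES.items():
        table[byte_value] = char
    return ''.join(table)


DECODING_TABLE = _build_decoding_table()
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def _encode(text, errors='strict'):
    # Returns (bytes, number of characters consumed), as the codecs API expects.
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def _decode(data, errors='strict'):
    # Returns (str, number of bytes consumed).
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def _search(name):
    # codecs.lookup() passes the name to search functions in normalized lower case.
    if name == 'farsi':
        return codecs.CodecInfo(_encode, _decode, name='farsi')
    return None


codecs.register(_search)

print('\u0645\u0631\u062f'.encode('farsi'))  # b'\x95\x96\x97'
```

After registration, codecs.getencoder('farsi') and codecs.getdecoder('farsi') return working functions, and each character encodes to exactly one byte. Whether dbf then accepts the new code page number still has to be tested, as the last bullet says.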
Answered By: Ethan Furman

After trying to register my codec, I ended up translating my data from UTF-8 to the custom Farsi code page, and then to the windows-1256 character that has the same decimal code point. When the user reads the data with the custom codec, each windows-1256 character maps to the right code point in the custom codec. Of course, in this raw windows-1256 form the characters are not meaningful.

For example, the letter پ has the Unicode code point 1662 (U+067E), and in the custom codec it has code point 148. Code point 148 in windows-1256 is ” (U+201D), so پ translates to ” by way of three different dictionaries. I did this for every character on the Farsi keyboard.
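The translation described above can be sketched roughly like this. It is an illustration, not the answerer's actual code: it folds the three dictionaries into one hypothetical UNICODE_TO_CUSTOM table and lets Python's built-in cp1256 codec perform the final byte-to-character step. Only the پ → 148 entry comes from the answer; the function name to_cp1256_stand_in is made up, and ASCII is assumed to be identical in both code pages.

```python
# Hypothetical Unicode -> custom-code-page table. Only the پ entry (U+067E -> 148)
# comes from the answer; the real table would cover every Farsi keyboard character.
UNICODE_TO_CUSTOM = {
    '\u067e': 148,  # پ
}


def to_cp1256_stand_in(text):
    """Swap each Farsi character for the windows-1256 character whose byte
    value equals that character's code in the custom code page."""
    result = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII is assumed to be the same in both code pages.
            result.append(ch)
        else:
            # A KeyError here means the character is missing from the table.
            custom_byte = UNICODE_TO_CUSTOM[ch]
            # Decoding that single byte as windows-1256 gives the stand-in character.
            result.append(bytes([custom_byte]).decode('cp1256'))
    return ''.join(result)


stand_in = to_cp1256_stand_in('\u067e')
print(repr(stand_in))              # '”'
print(stand_in.encode('cp1256'))   # b'\x94'
```

The point of the trick is that the returned string, once encoded as windows-1256 (for example by a dbf table using that code page), should put exactly the custom code page's byte values on disk, which is why reading the file back with the custom codec recovers the original letters.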

Answered By: Hossein Yousefian