Pyarrow Join (int8 and int16)

Question:

I have two PyArrow Tables and want to join them.

A.join(
        right_table=B, keys="A_id", right_keys="B_id"
    )

This raises the following error:

{ArrowInvalid} Incompatible data types for corresponding join field keys: FieldRef.Name(A_id) of type int8 and FieldRef.Name(B_id) of type int16

What is the preferred way to solve this issue?

I could not find a way to cast a single column to either int8 or int16 in a PyArrow Table.

Thanks

Asked By: daniel guo


Answers:

You need to change the field type in one of your tables so that both join keys have the same type.

Here is how to change the 'A_id' field of table A to int16:

import pyarrow as pa

# change the type of 'A_id' to int16 so it matches 'B_id'
schema = A.schema
for num, field in enumerate(schema):
    if field.name == 'A_id':
        new_field = field.with_type(pa.int16())  # copy of the field with the new type
        schema = schema.remove(num)  # remove the old field
        schema = schema.insert(num, new_field)  # insert the new field at the same position
        break  # field found, no need to keep scanning

A = A.cast(target_schema=schema)  # cast table A to the updated schema
# join the tables now that both key columns are int16
A.join(
        right_table=B, keys="A_id", right_keys="B_id"
    )
Answered By: Lucas M. Uriarte