Importing a .csv file into a Cassandra 3.11.3 DB with Python or Ruby (production use)

Question:

We have a 7-node Cassandra 3.11.3 production cluster. Ticket details are dumped to a mid server as a .csv file, and I need to read this file and import its data into a Cassandra table. I tried Ruby code, which was easy for me to write, but it does not handle all the column values correctly: the .csv contains special characters, line breaks inside fields, UTF encoding issues, and long text descriptions (it comes from a ticketing tool), and the data varies from row to row.

I want to know whether Ruby or Python is a good choice for this in production. Does anyone have good sample code that mitigates the issues mentioned above and is suitable for a production environment?

Asked By: Hareesha

||

Answers:

Both Ruby and Python are well suited to this kind of task, but if your source file is badly formatted, any tool can fail. There is no magic-button tool that can deduce the context from a (broken) data file and fix all the problems for you automatically.

I’d suggest splitting the task into two parts: 1) fix the encoding and data quality problems (and perform any data transformations if necessary), then 2) import the clean data.
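As a rough illustration of part 1, here is a minimal Python sketch using the standard csv module, which already copes with quoted fields that contain commas and line breaks. The function name clean_csv, the expected_columns parameter and the rule "drop any row with the wrong number of columns" are assumptions made for the example, not something taken from your data; a real cleanup would probably log rejected rows rather than discard them silently.

```python
import csv


def clean_csv(src, dst, expected_columns):
    """Copy well-formed rows from src to dst; return (kept, rejected) counts.

    src and dst are text-mode file objects. expected_columns is an
    assumption of this sketch: rows with a different field count are
    treated as broken and skipped.
    """
    reader = csv.reader(src)  # handles quoted commas and embedded newlines
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    kept = rejected = 0
    for row in reader:
        if len(row) != expected_columns:
            rejected += 1
            continue
        # Normalize Windows line endings inside fields and trim whitespace.
        writer.writerow([field.replace("\r\n", "\n").strip() for field in row])
        kept += 1
    return kept, rejected
```

To deal with the encoding problems, the input file can be opened with open(path, encoding="utf-8", errors="replace", newline=""), so that undecodable bytes become replacement characters instead of aborting the run; newline="" is what the csv module documentation recommends for files passed to csv.reader and csv.writer.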

Task 2 can be done easily in almost any programming language that has a Cassandra driver available. If you have a well-formatted csv source, though, you probably don’t need any custom code at all (depending on the use case, of course): cqlsh supports the COPY ... FROM command, which imports data from a csv file directly (https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshCopy.html).
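If you do end up writing code for part 2, a minimal Python sketch could look like the one below. It relies only on the prepare() and execute() methods of a session object from the DataStax Python driver (cassandra-driver). The table name tickets, its three columns and the assumption that all of them are of type text are hypothetical; adjust the statement and convert values if your schema uses other types.

```python
import csv

# Hypothetical table and columns, assumed to be of CQL type text.
INSERT_CQL = "INSERT INTO tickets (id, title, description) VALUES (?, ?, ?)"


def import_rows(session, src, has_header=True):
    """Insert every row of the csv file object src; return the row count.

    session is expected to behave like a cassandra-driver Session,
    i.e. to provide prepare() and execute().
    """
    reader = csv.reader(src)
    if has_header:
        next(reader, None)  # skip the header line, if there is one
    prepared = session.prepare(INSERT_CQL)  # parsed once, reused per row
    count = 0
    for row in reader:
        session.execute(prepared, row)
        count += 1
    return count


# Usage against a real cluster (contact point and keyspace are placeholders):
#   from cassandra.cluster import Cluster
#   session = Cluster(["10.0.0.1"]).connect("my_keyspace")
#   with open("tickets_clean.csv", encoding="utf-8", newline="") as f:
#       print(import_rows(session, f))
```

Inserting one row at a time keeps the sketch simple and makes failures easy to pin to a specific row; for large files you would likely want concurrent or asynchronous execution instead.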

Answered By: Konstantin Strukov