convert csv to sqlite but i don't know if column valid

Question:

i want white function in python to convert csv to sqlite. in csv, i have 4 columns Setting State Comment and Path. Sometimes the real Path is in the next column or two next columns not every time in Path column.

def csv_to_sqlite(csv_file, sqlite_file):
    # Connect to the SQLite database
    connection = sqlite3.connect(sqlite_file)
    cursor = connection.cursor()

    # Read the CSV file
    with open(csv_file, 'r') as f:
        reader = csv.reader(f)
        headers = next(reader)

        # Create the table in the SQLite database
        cursor.execute(f'CREATE TABLE data ({", ".join(headers)})')

        # Get the index of the "Path" column
        path_index = headers.index("Path")

        # Insert the data from the CSV file into the SQLite database
        for row in reader:
            modified_row = row.copy()

            # Check if the "Path" column starts with ''
            if re.match(r'^\', modified_row[path_index]):
                cursor.execute(f'INSERT INTO data VALUES ({", ".join(["?" for header in headers])})', modified_row)
            else:
                # Search for the first column that starts with ''
                for i in range(path_index + 1, len(headers)):
                    if re.match(r'^\', modified_row[i]):
                        modified_row[path_index] = modified_row[i]
                        cursor.execute(f'INSERT INTO data VALUES ({", ".join(["?" for header in headers])})',
                                       modified_row)
                        break

    # Commit the changes and close the connection
    connection.commit()
    connection.close()

but i get error

    cursor.execute(f'INSERT INTO data VALUES ({", ".join(["?" for header in headers])})', modified_row)
sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 4, and there are 5 supplied.

i expect get db like csv and not error

Edit:
i try to solve this problem from pandas

    df = pd.read_csv(file_path, sep=',', encoding='cp1252')

i get error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 18, saw 5

this is my data enter image description here

example the problem enter image description here

Asked By: Topaz Mothada

||

Answers:

The error is an evidence that the current row has 5 columns while the header row only has 4. You should ignore excess columns by limiting the used length of the row:

cursor.execute(f'INSERT INTO data VALUES ({", ".join(["?" for header in headers])})',
               modified_row[:len(headers)])
Answered By: Serge Ballesta

The issue is probably due to the number of values in modified_row being different than the number of columns in data. This is likely because the code is appending extra values to modified_row when searching for the first column that starts with ''.

You can try to only include the values for the columns in data.

Answered By: stanislav-iablokov

Try to check what encoding of the file to use with pandas. After that, you will skip lines with an error. Good luck

Answered By: oren a
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.