Convert CSV to SQLite when I don't know which column holds the valid Path
Question:
I want to write a function in Python that converts a CSV file to SQLite. The CSV has 4 columns: Setting, State, Comment and Path. Sometimes the real path is not in the Path column but in the next column, or two columns further on.
import csv
import re
import sqlite3

def csv_to_sqlite(csv_file, sqlite_file):
    # Connect to the SQLite database
    connection = sqlite3.connect(sqlite_file)
    cursor = connection.cursor()
    # Read the CSV file
    with open(csv_file, 'r') as f:
        reader = csv.reader(f)
        headers = next(reader)
        # Create the table in the SQLite database
        cursor.execute(f'CREATE TABLE data ({", ".join(headers)})')
        # Get the index of the "Path" column
        path_index = headers.index("Path")
        # Insert the data from the CSV file into the SQLite database
        for row in reader:
            modified_row = row.copy()
            # Check if the "Path" column starts with '\'
            if re.match(r'^\\', modified_row[path_index]):
                cursor.execute(f'INSERT INTO data VALUES ({", ".join(["?" for header in headers])})', modified_row)
            else:
                # Search for the first column that starts with '\'
                for i in range(path_index + 1, len(headers)):
                    if re.match(r'^\\', modified_row[i]):
                        modified_row[path_index] = modified_row[i]
                        cursor.execute(f'INSERT INTO data VALUES ({", ".join(["?" for header in headers])})',
                                       modified_row)
                        break
    # Commit the changes and close the connection
    connection.commit()
    connection.close()
But I get this error:
cursor.execute(f'INSERT INTO data VALUES ({", ".join(["?" for header in headers])})', modified_row)
sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 4, and there are 5 supplied.
I expect to get a database that matches the CSV, without errors.
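A minimal reproduction of the error, using an in-memory database and a made-up row with five fields (the path sits one column to the right of Path). The INSERT has one ? per header, four in total, but five values are supplied:

```python
import sqlite3

headers = ["Setting", "State", "Comment", "Path"]
# Made-up row with five fields: the path landed one column to the right of Path
row = ["Opt1", "On", "", "", "\\\\server\\share"]

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE data ({', '.join(headers)})")
placeholders = ", ".join("?" for _ in headers)  # one "?" per header: "?, ?, ?, ?"

error_message = None
try:
    # Four placeholders, five values: sqlite3 rejects the statement
    conn.execute(f"INSERT INTO data VALUES ({placeholders})", row)
except sqlite3.ProgrammingError as exc:
    error_message = str(exc)
    print(error_message)
finally:
    conn.close()
```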
Edit:
I tried to solve this problem with pandas:
df = pd.read_csv(file_path, sep=',', encoding='cp1252')
but I get this error:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 18, saw 5
This is my data: [image not available]
An example of the problem: [image not available]
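Both errors point at the same cause: some lines have more fields than the header. A small sketch for listing those lines with the standard csv module, assuming a comma-separated, cp1252-encoded file as in the read_csv call; the function name find_bad_lines is made up for illustration:

```python
import csv

def find_bad_lines(csv_file, encoding="cp1252"):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    bad = []
    with open(csv_file, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        headers = next(reader)
        for row in reader:
            if len(row) != len(headers):
                # reader.line_num is the number of physical lines read so far
                bad.append((reader.line_num, len(row)))
    return bad
```

For example, find_bad_lines("data.csv") (a hypothetical file name) returns a list of (line number, field count) pairs. reader.line_num counts physical lines, so it can run ahead of the row count if a quoted field contains a newline.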
Answers:
The error shows that the current row has 5 fields while the header row has only 4. You can ignore the excess fields by slicing the row to the length of the header:
cursor.execute(f'INSERT INTO data VALUES ({", ".join(["?" for header in headers])})',
               modified_row[:len(headers)])
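Trimming removes the binding error, but the question's code has a second problem: Path is the last header, so range(path_index + 1, len(headers)) is empty and rows whose path has moved into an extra field are never inserted. Below is a sketch that searches every field after Path, including fields beyond the header width, and then trims the row. It assumes a path is any value starting with a backslash; the encoding parameter (defaulting to cp1252, as in the pandas call), skipping blank lines, and keeping rows with no path instead of dropping them are choices made for this sketch, not part of the original.

```python
import csv
import sqlite3

def csv_to_sqlite(csv_file, sqlite_file, encoding="cp1252"):
    """Copy csv_file into a table named data, moving a misplaced path into Path."""
    connection = sqlite3.connect(sqlite_file)
    try:
        with open(csv_file, newline="", encoding=encoding) as f:
            reader = csv.reader(f)
            headers = next(reader)
            n_cols = len(headers)
            path_index = headers.index("Path")

            # Quote column names so headers with spaces or SQL keywords still work
            columns = ", ".join('"' + h.replace('"', '""') + '"' for h in headers)
            connection.execute(f"CREATE TABLE data ({columns})")
            insert_sql = f"INSERT INTO data VALUES ({', '.join(['?'] * n_cols)})"

            for row in reader:
                if not row:
                    continue  # csv.reader yields [] for blank lines
                # Pad short rows so every header position exists
                row = row + [""] * (n_cols - len(row))
                if not row[path_index].startswith("\\"):
                    # Look for the real path in any field after Path,
                    # including extra fields beyond the header width
                    for value in row[path_index + 1:]:
                        if value.startswith("\\"):
                            row[path_index] = value
                            break
                # Rows without any backslash value are kept as they are
                # Keep exactly one value per column
                connection.execute(insert_sql, row[:n_cols])
        connection.commit()
    finally:
        connection.close()
```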
The issue is probably that the number of values in modified_row differs from the number of columns in data. modified_row is a copy of the whole CSV row, so when a row has more fields than the header (for example because the path moved into the next column), the extra value is passed to the INSERT along with the others.
You can try to include only the values for the columns in data.
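A sketch of that idea using zip, which stops at the shorter of its inputs, together with named placeholders. It assumes the header names are plain identifiers (as Setting, State, Comment and Path are); the helper name insert_row and storing missing trailing fields as NULL are assumptions for this example. On its own it only trims the row, so a path sitting in the fifth field is dropped rather than moved into Path:

```python
import sqlite3

def insert_row(connection, headers, row):
    """Insert one CSV row into data, ignoring fields beyond the header width."""
    # zip() stops at the shorter sequence, so extra fields are dropped
    values = dict(zip(headers, row))
    # Missing trailing fields become NULL instead of raising an error
    for name in headers:
        values.setdefault(name, None)
    columns = ", ".join(headers)
    placeholders = ", ".join(f":{name}" for name in headers)
    connection.execute(f"INSERT INTO data ({columns}) VALUES ({placeholders})", values)
```

Usage: create the table once, then call insert_row(conn, headers, row) for every row produced by csv.reader.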
Check which encoding the file uses and pass it to pandas. After that, skip the lines that cause errors. Good luck!
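pandas 1.3 and later accept on_bad_lines="skip" in read_csv for this. The equivalent with the standard csv module is sketched below, assuming cp1252 encoding as in the question; read_good_rows is a made-up name. Keep in mind that skipping discards precisely the rows whose path moved to another column, so it loses data rather than fixing it:

```python
import csv

def read_good_rows(csv_file, encoding="cp1252"):
    """Return (headers, rows), keeping only rows with as many fields as the header."""
    with open(csv_file, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        headers = next(reader)
        # Rows with too many or too few fields are skipped silently
        rows = [row for row in reader if len(row) == len(headers)]
    return headers, rows
```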