Problem with multiple threads and sqlite3 – getting an unexpected result
Question:
I am trying to test writing/reading data in an SQLite database from multiple threads.
Sometimes it doesn't seem to give the right result. Is that a bug?
I made two files to test it. The first one is test.py:
import threading
import master


def add():
    for i in range(10):
        num = master.get()
        tmp = num + 1
        master.update(tmp)
        print(f"add: {i}, {num}")


def sub():
    for i in range(10):
        num = master.get()
        tmp = num - 1
        master.update(tmp)
        print(f"sub: {i}, {num}")


if __name__ == "__main__":
    subThread01 = threading.Thread(target=add)
    subThread02 = threading.Thread(target=sub)
    subThread01.start()
    subThread02.start()
    subThread01.join()
    subThread02.join()
    print(master.get())
The second file is master.py:
import sqlite3
import threading

lock = threading.Lock()
conn = sqlite3.connect(':memory:', check_same_thread=False)
cur = conn.cursor()

# create table
cur.execute("""CREATE TABLE IF NOT EXISTS info ( userid INT PRIMARY KEY, data INT );""")
conn.commit()

# insert init data
db = (0, 0)
cur.execute("INSERT INTO info VALUES(?, ?);", db)
conn.commit()


# update data
def update(num):
    with lock:
        db = (num, 0)
        cur.execute("UPDATE info set data = ? where userid = ?;", db)
        conn.commit()


# get data
def get():
    with lock:
        cur.execute("SELECT data FROM info where userid = 0;")
        result = cur.fetchone()
        return result[0]
I expected the result to be 0 when I run test.py, but the actual result is random: sometimes -3, sometimes 9, etc.
Where does the problem lie?
Answers:
This is probably a feature, not a bug.
For the result to be 0, both threads would have to be scheduled to run exactly in sequence.
If you had only two threads, that might work.
However, there is a third thread (the main thread).
Without extra measures, there is no way to tell which thread will be selected to run next.
You could, however, use e.g. a Barrier instead of a Lock to enforce the threads running one after another.
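The update and get functions are thread-safe, but the add and sub functions are not: another thread can run between a thread's call to master.get() and its call to master.update(), and that creates the synchronization problem. You should also make your add and sub functions thread-safe, like this: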
def add():
    for i in range(10):
        with lock:
            num = master.get()
            tmp = num + 1
            master.update(tmp)
        print(f"add: {i}, {num}")


def sub():
    for i in range(10):
        with lock:
            num = master.get()
            tmp = num - 1
            master.update(tmp)
        print(f"sub: {i}, {num}")
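Why the lock used in add() and sub() must not simply be master.lock can be checked with a short sketch: master.get() and master.update() acquire that lock themselves, and a plain threading.Lock cannot be acquired a second time by the thread that already holds it. The helper name can_reacquire is made up for this example; it uses acquire(timeout=...) so that it returns False instead of hanging forever:

```python
import threading


def can_reacquire(lock, timeout=0.1):
    """Hold `lock`, then try to acquire it again from the same thread.

    With a plain Lock the inner acquire can never succeed; without the
    timeout this is exactly the deadlock add()/sub() would hit if they
    wrapped master.get()/master.update() in master.lock.
    """
    with lock:
        got_it = lock.acquire(timeout=timeout)
        if got_it:
            lock.release()  # undo the inner acquire (only happens for RLock)
        return got_it


print(can_reacquire(threading.Lock()))   # False: re-acquiring would deadlock
print(can_reacquire(threading.RLock()))  # True: an RLock is reentrant
```

Making master.lock an RLock and reusing it in add() and sub() would be one alternative; the answer's edit instead creates a separate lock in test.py.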
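Edit:
My answer was incomplete: I forgot to create a new lock object. It has to be a separate lock, because master.get() and master.update() already acquire master.lock, and a threading.Lock is not reentrant, so reusing master.lock here would deadlock. It should look like this: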
import threading
import master

lock = threading.Lock()


def add():
    for i in range(10):
        with lock:
            num = master.get()
            tmp = num + 1
            master.update(tmp)
        print(f"add: {i}, {num}")


def sub():
    for i in range(10):
        with lock:
            num = master.get()
            tmp = num - 1
            master.update(tmp)
        print(f"sub: {i}, {num}")
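Putting both files together, here is a single-file sketch of the fixed version. The database part mirrors master.py; the names db_lock, rmw_lock, change and run are made up for this example. The key point is that rmw_lock covers the whole read-modify-write (get, then update), while db_lock only protects each individual database call:

```python
import sqlite3
import threading

# --- stand-in for master.py (same logic, kept in one file for the sketch) ---
db_lock = threading.Lock()  # protects the shared connection and cursor
conn = sqlite3.connect(":memory:", check_same_thread=False)
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS info (userid INT PRIMARY KEY, data INT)")
cur.execute("INSERT INTO info VALUES (?, ?)", (0, 0))
conn.commit()


def update(num):
    with db_lock:
        cur.execute("UPDATE info SET data = ? WHERE userid = ?", (num, 0))
        conn.commit()


def get():
    with db_lock:
        cur.execute("SELECT data FROM info WHERE userid = 0")
        return cur.fetchone()[0]


# --- stand-in for test.py ---
rmw_lock = threading.Lock()  # separate lock guarding the read-modify-write


def change(delta, iterations):
    for _ in range(iterations):
        with rmw_lock:  # get() and update() now happen as one step
            update(get() + delta)


def run(iterations=1000):
    update(0)  # reset so run() can be called repeatedly
    threads = [
        threading.Thread(target=change, args=(+1, iterations)),
        threading.Thread(target=change, args=(-1, iterations)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return get()


if __name__ == "__main__":
    print(run())  # 0: every increment and decrement is applied
```

Because no other thread can run between a get() and its matching update() while rmw_lock is held, every +1 and -1 is applied exactly once and the final value is 0 regardless of scheduling.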
Edit 2 (in answer to the OP's comment):
Let's examine it (please read the comments in the add function):
def add():
    for i in range(10):
        num = master.get()  # let's say num == 0
        tmp = num + 1
        # Now tmp == 1. Suppose the GIL is released here and the OS
        # switches to subThread02. At the moment of the switch i == 0;
        # this is where we left off.
        master.update(tmp)
Continue with subThread02:
def sub():
    for i in range(10):
        num = master.get()
        tmp = num - 1
        master.update(tmp)
Suppose the GIL is not released and this for loop finishes without any interruption. Its last operation will be master.update(-10).
After that operation, the GIL is released and the operating system switches back to subThread01.
The add function continues where it left off, so master.update(1) (note this) is evaluated with the stale tmp == 1, overwriting -10. The for loop then iterates 9 more times and its last call is master.update(10). That is the synchronization problem: print(master.get()) will show 10 in this case, but with other interleavings the result can vary, maybe 5, -3, or 0.
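The schedule described above can be forced deterministically with two threading.Event objects, which makes the lost update reproducible on every run. This sketch replaces SQLite with a hypothetical lock-protected dictionary (get() and update() are still individually thread-safe, as in master.py) and pauses add() exactly where the comment says the switch happens:

```python
import threading

# Hypothetical stand-in for master.py: each call is thread-safe on its own.
_lock = threading.Lock()
_store = {"data": 0}


def get():
    with _lock:
        return _store["data"]


def update(num):
    with _lock:
        _store["data"] = num


def replay_interleaving():
    update(0)
    add_paused = threading.Event()    # add() has read num but not written yet
    sub_finished = threading.Event()  # sub() has done all 10 iterations

    def add():
        for i in range(10):
            num = get()
            tmp = num + 1
            if i == 0:
                add_paused.set()      # "GIL released, OS switches to subThread02"
                sub_finished.wait()   # stay paused until sub() is done
            update(tmp)               # first write is update(1), overwriting -10

    def sub():
        add_paused.wait()             # start only after add() has read 0
        for _ in range(10):
            update(get() - 1)         # the last write is update(-10)
        sub_finished.set()

    t1 = threading.Thread(target=add)
    t2 = threading.Thread(target=sub)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return get()


print(replay_interleaving())  # 10: all ten decrements were lost
```

Each call to get() or update() is protected, yet the result is 10 instead of 0, which shows that per-call locking alone does not make the read-modify-write in add() and sub() safe.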
You also said: "I removed sqlite and used a plain variable, then tested it, and there were no synchronization problems." Please change for i in range(100): to for i in range(100000): in both threads. With range(100), the loop usually finishes before any thread switch happens, so you see the correct result, but that is not guaranteed: a switch can happen at any time. With more iterations you will see wrong results (run it more than once to see them).