The Problem of Multiple Threads for Sqlite3 – got an unexpected result

Question:

I am trying to test writing and reading data in a SQLite database from multiple threads.

Sometimes I don't get the right result. Is that a bug?

I made two files to test it. The first one is test.py.

import threading
import master

def add():
    for i in range(10):
        num = master.get()
        tmp = num + 1
        master.update(tmp)
        print(f"add: {i}, {num}")


def sub():
    for i in range(10):
        num = master.get()
        tmp = num - 1
        master.update(tmp)
        print(f"sub: {i}, {num}")

if __name__ == "__main__":
    subThread01 = threading.Thread(target=add)
    subThread02 = threading.Thread(target=sub)
    subThread01.start()
    subThread02.start()
    subThread01.join()
    subThread02.join()
    print(master.get())

The second file is master.py.

import sqlite3
import threading

lock = threading.Lock()

conn = sqlite3.connect(':memory:', check_same_thread=False)
cur = conn.cursor()

# create table
cur.execute("""CREATE TABLE IF NOT EXISTS info ( userid INT PRIMARY KEY, data INT );""")
conn.commit()

# insert init data
db = (0, 0)
cur.execute("INSERT INTO info VALUES(?, ?);", db)
conn.commit()

# update data
def update(num):
    with lock:
        db = (num, 0)
        cur.execute("UPDATE info set data = ? where userid = ?;", db)
        conn.commit()

# get data
def get():
    with lock:
        cur.execute("SELECT data FROM info where userid = 0;")
        result = cur.fetchone()
        return result[0]

The result I expected when running test.py was 0, but the actual result is random: sometimes -3, sometimes 9, etc.

Where does the problem lie?

Asked By: Brian

Answers:

This is probably a feature, not a bug.

For the result to be 0, both threads would have to be scheduled for running exactly in sequence.
And if you had only two threads, that might work.

However, there is a third thread (the main thread).
Without extra measures, there is no way to tell which thread will be selected for running after that.

You could however use e.g. a Barrier instead of a Lock to enforce the threads running one after another.

Answered By: Roland Smith

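The update and get functions are thread-safe, but the add and sub functions are not: another thread can run between the master.get() call and the master.update() call, so the value being written may already be stale. This creates synchronization problems. You should make your add and sub functions thread-safe as well, like this: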

def add():
    for i in range(10):
        with lock:
            num = master.get()
            tmp = num + 1
            master.update(tmp)
            print(f"add: {i}, {num}")


def sub():
    for i in range(10):
        with lock:
            num = master.get()
            tmp = num - 1
            master.update(tmp)
            print(f"sub: {i}, {num}")

Edit:
My answer was incomplete: I forgot to create a new lock object. It should look like this:

import threading
import master

lock=threading.Lock()

def add():
    for i in range(10):
        with lock:
            num = master.get()
            tmp = num + 1
            master.update(tmp)
            print(f"add: {i}, {num}")


def sub():
    for i in range(10):
        with lock:
            num = master.get()
            tmp = num - 1
            master.update(tmp)
            print(f"sub: {i}, {num}")
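As an alternative sketch (not part of the answer above), the read-modify-write can be moved into a single SQL statement, so there is no gap between reading and writing for another thread to slip into. The helper names change, worker and run are hypothetical, and the table layout is assumed to match master.py from the question.

```python
import sqlite3
import threading

lock = threading.Lock()

# Same schema as master.py in the question; one shared connection,
# with every access serialized by the lock.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE info (userid INT PRIMARY KEY, data INT)")
conn.execute("INSERT INTO info VALUES (0, 0)")
conn.commit()


def change(delta):
    """Hypothetical helper: apply delta in one locked UPDATE statement."""
    with lock:
        conn.execute("UPDATE info SET data = data + ? WHERE userid = 0", (delta,))
        conn.commit()


def get():
    with lock:
        row = conn.execute("SELECT data FROM info WHERE userid = 0").fetchone()
        return row[0]


def worker(delta, times):
    for _ in range(times):
        change(delta)


def run():
    threads = [
        threading.Thread(target=worker, args=(1, 10)),   # plays the role of add
        threading.Thread(target=worker, args=(-1, 10)),  # plays the role of sub
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return get()


if __name__ == "__main__":
    print(run())  # 0
```

Because each increment or decrement happens entirely inside the lock, the order in which the threads are scheduled no longer affects the final value.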

Edit 2 (in answer to OP's comment):

Let's walk through it (please read the comments in the add function):

def add():
    for i in range(10):
        num = master.get()  # let's say num == 0
        tmp = num + 1
        # Now tmp == 1. Suppose the GIL is released here and the OS
        # switches to subThread02. We are paused at i == 0.
        master.update(tmp)

Execution continues in subThread02:

def sub():
    for i in range(10):
        num = master.get()
        tmp = num - 1
        master.update(tmp)

Suppose the GIL is not released and the for loop finishes without any interruption. The last operation will be master.update(-10).

After that last operation, the GIL is released and the operating system switches back to subThread01.

The add function continues where it left off: master.update(1) is evaluated (note that tmp still holds the stale value 1, so this overwrites the -10), then the for loop iterates 9 more times and finally calls master.update(10). So a synchronization problem occurs and print(master.get()) shows 10, but the result can vary: it may be 5, -3, or even 0.
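The interleaving described above can be replayed deterministically without any threads. The following sketch is an illustration only: a plain variable stands in for the database row, and the get and update helpers here are hypothetical stand-ins for master.get and master.update.

```python
value = 0  # stands in for the "data" column of the row


def get():
    return value


def update(num):
    global value
    value = num


# subThread01 (add), first iteration: read and compute, then get suspended.
num = get()      # num == 0
tmp = num + 1    # tmp == 1, not yet written

# subThread02 (sub) runs all 10 iterations without interruption.
for _ in range(10):
    update(get() - 1)
after_sub = get()

# subThread01 resumes and writes its stale tmp, discarding sub's work.
update(tmp)
after_resume = get()

# subThread01 finishes its remaining 9 iterations.
for _ in range(9):
    update(get() + 1)
final = get()

print(after_sub, after_resume, final)  # -10 1 10
```

The single stale write is enough to throw away all ten decrements, which is why the final value depends on where the switch happens.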

You also say that after removing sqlite and using a plain variable you saw no synchronization problems. Try changing for i in range(100): to for i in range(100000): in both threads. With for i in range(100): the loop finishes almost immediately, without any interruption, so you see the correct result, but this is not guaranteed because interrupts can happen at any time. With the larger range you will see wrong results (run it more than once to see one).
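A minimal sketch of that experiment with a plain variable instead of sqlite is shown below. The names step, worker and run are hypothetical. The unlocked run may or may not end at 0, depending on the interpreter version and on scheduling, so no particular wrong value should be expected; only the locked run is guaranteed to end at 0.

```python
import threading

value = 0
lock = threading.Lock()


def step(delta, use_lock):
    global value
    if use_lock:
        with lock:
            value = value + delta
    else:
        tmp = value + delta  # read and compute
        value = tmp          # write; another thread may have run in between


def worker(delta, n, use_lock):
    for _ in range(n):
        step(delta, use_lock)


def run(n, use_lock):
    """Reset the variable, run one adding and one subtracting thread."""
    global value
    value = 0
    threads = [
        threading.Thread(target=worker, args=(1, n, use_lock)),
        threading.Thread(target=worker, args=(-1, n, use_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return value


if __name__ == "__main__":
    print("unlocked:", run(100_000, use_lock=False))  # not guaranteed to be 0
    print("locked:", run(100_000, use_lock=True))     # always 0
```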

Please look at this also.

Answered By: Veysel Olgun