Python: How does os.fork() work?

Question:

I am learning multiprocessing in Python. After reading the source code of the multiprocessing module, I found that it uses os.fork(), so I wrote some code to test os.fork(), but I am stuck. My code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import time

for i in range(2):
    print '**********%d***********' % i
    pid = os.fork()
    print "Pid %d" % pid

I expected each print to be executed twice, but they are executed three times. I can’t understand how this works.
I read this: Need to know how fork works?
According to that article, it should also be executed twice, so I am stuck…

Asked By: tudouya


Answers:

To answer the question directly, os.fork() works by calling the underlying OS function fork().

But you are surely interested in what this does. Well this creates another process which will resume at exactly the same place as this one. So within the first loop run, you get a fork after which you have two processes, the “original one” (which gets a pid value of the PID of the child process) and the forked one (which gets a pid value of 0).

They both print their pid value and go on with the 2nd loop run, which they both print. Then they both fork, leaving you with 4 processes which all print their respective pid values. Two of them should be 0, the other two should be the PIDs of the child they just created.

Changing the code to

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import time

for i in range(2):
    print '**********%d***********' % i
    pid = os.fork()
    if pid == 0:
        # We are in the child process.
        print "%d (child) just was created by %d." % (os.getpid(), os.getppid())
    else:
        # We are in the parent process.
        print "%d (parent) just created %d." % (os.getpid(), pid)

you will see better what happens: each process will tell you its own PID and what happened on the fork.

Answered By: glglgl

First of all, remove that print '******...' line. It just confuses everyone. Instead, let’s try this code…

import os
import time

for i in range(2):
    print("I'm about to be a dad!")
    time.sleep(5)
    pid = os.fork()
    if pid == 0:
        print("I'm {}, a newborn that knows to write to the terminal!".format(os.getpid()))
    else:
        print("I'm the dad of {}, and he knows to use the terminal!".format(pid))
        os.waitpid(pid, 0)

Okay, first of all, what is “fork”? Fork is a feature of modern and standard-compliant operating systems (except M$ Windows: that joke of an OS is anything but modern and standard-compliant) that allows a process (a.k.a. “program”, and that includes the Python interpreter!) to literally make an exact duplicate of itself, effectively creating a new process (another instance of the “program”). Once that magic is done, both processes are independent. Changing anything in one of them does not affect the other one.

The process responsible for spelling out this dark and ancient incantation is known as the parent process. The soulless result of this immoral abomination towards life itself is known as the child process.

As shall be obvious to all, including those for which it isn’t, you can become a member of that select group of programmers who have sold their soul by means of os.fork(). This function performs a fork operation, and thus results in a second process being created out of thin air.

Now, what does this function return, or more importantly, how does it even return? If you want not to become insane, please don’t go and read the Linux kernel’s /kernel/fork.c file! Once the kernel does what we know it has to do, but we don’t want to accept it, os.fork() returns in the two processes! Yes, even the call stack is copied on!

So, if they are exact copies, how does one differentiate between parent and child? Simple. If the result of os.fork() is zero, then you’re working in the child. Otherwise, you’re working in the parent, and the return value is the PID (Process IDentifier) of the child. Anyway, the child can get its own PID from os.getpid(), no?

Now, taking this into account, and the fact that doing fork() inside a loop is the recipe for mess, this is what happens. Let’s call the original process the “master” process…

  • Master: i = 0, forks into child-#1-of-master
    • Child-#1-of-master: i = 1, forks into child-#1-of-child-#1-of-master
    • Child-#1-of-child-#1-of-master: for loop over, exits
    • Child-#1-of-master: for loop over, exits
  • Master: i = 1, forks into child-#2-of-master
    • Child-#2-of-master: for loop over, exits
  • Master: for loop over, exits

As you can see, there are 3 forks spread over 4 unique processes. Each fork produces one print in the parent and one in the child, so besides the “I’m about to be a dad!” lines you get 6 lines of output, something like…

I’m the dad of 12120, and he knows to use the terminal!

I’m 12120, a newborn that knows to write to the terminal!

I’m the dad of 12121, and he knows to use the terminal!

I’m 12121, a newborn that knows to write to the terminal!

I’m the dad of 12122, and he knows to use the terminal!

I’m 12122, a newborn that knows to write to the terminal!

But that’s just arbitrary, it could have printed this instead…

I’m 12120, a newborn that knows to write to the terminal!

I’m the dad of 12120, and he knows to use the terminal!

I’m 12121, a newborn that knows to write to the terminal!

I’m the dad of 12121, and he knows to use the terminal!

I’m 12122, a newborn that knows to write to the terminal!

I’m the dad of 12122, and he knows to use the terminal!

Or anything other than that. The OS (and your motherboard’s funky clocks) is solely responsible for the order in which processes get timeslices, so go blame Torvalds (and expect no self-esteem when you come back) if you dislike how the kernel manages to organize your processes ;).

I hope this has shed some light on this!
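If you don’t trust the process count worked out above, you can have the processes count themselves. The following is only a sketch, assuming Python 3 on a Unix-like system: count_processes is a made-up helper name, and the pipe is just one convenient way for the descendants to report back to the master.

```python
import os

def count_processes(rounds=2):
    """Run the fork-in-a-loop pattern and return how many processes took part."""
    read_fd, write_fd = os.pipe()
    root = os.getpid()
    children = []  # PIDs of the children forked by *this* process

    for _ in range(rounds):
        pid = os.fork()
        if pid == 0:
            # Child: it starts without children of its own.
            children = []
        else:
            # Parent: remember the child so we can wait for it later.
            children.append(pid)

    # Every process gets here and first waits for its own children.
    for pid in children:
        os.waitpid(pid, 0)

    if os.getpid() != root:
        # Descendant: report our PID through the pipe and exit immediately,
        # so that we never return into the caller's code.
        os.write(write_fd, b"%d\n" % os.getpid())
        os._exit(0)

    # Master: all descendants have exited; read everything they reported.
    os.close(write_fd)
    data = b""
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        data += chunk
    os.close(read_fd)

    descendants = {int(line) for line in data.split()}
    return len(descendants) + 1  # +1 for the master itself
```

With rounds=2 this reports 4 processes, matching the tree above. Every extra loop iteration doubles the count, which is why fork() inside a loop gets out of hand so quickly.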

Answered By: 3442

Both answers cover almost everything about how os.fork() works, so I just want to add some more specific information regarding this case.

The fork system call creates a new process, called the child process, which runs concurrently with the process that made the fork() call (the parent process). fork is a Unix system call, and a system call is the way a program requests a service from the kernel. fork() takes no parameters and returns an integer value. After a successful call to fork(), the child process is essentially an exact duplicate of the parent: it gets a copy of the parent’s text, data, stack, and heap, together with the same execution state (including the processor registers), so both processes continue with the instruction that follows the fork() call. The return value tells them apart: it is 0 in the child and greater than 0 (the PID of the child) in the parent. If the process cannot be created, for example because the system is out of memory or a process limit has been reached, the C function returns a negative value, and Python’s os.fork() raises OSError instead. Because both processes run concurrently, we achieve parallelism here, but keep in mind that everything after the fork call is executed by both of them.
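The three possible outcomes described above (failure, child, parent) can be handled roughly as follows. This is a minimal sketch, assuming Python 3 on a Unix-like system. run_in_child is a hypothetical helper name, and using the exit code to report success is only one possible convention.

```python
import os

def run_in_child(task):
    """Run task() in a forked child.

    Returns the child's exit code, or None if the child could not be
    created or did not exit normally.
    """
    try:
        pid = os.fork()
    except OSError as exc:
        # No child was created; we are still the only process.
        print("fork failed:", exc)
        return None

    if pid == 0:
        # Child: os.fork() returned 0.
        try:
            task()
            code = 0
        except BaseException:
            code = 1
        # Leave without running the parent's cleanup handlers.
        os._exit(code)

    # Parent: pid is the PID of the child; wait for it to finish.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None
```

Calling os._exit() in the child matters here: without it the child would return from the function and carry on executing the rest of the parent’s program as well.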

Calling fork duplicates the calling process, which is referred to as the parent. After the fork there are two identical copies of the address space, one for the child and one for the parent. Both processes are exact copies of each other except for the PID and a few other attributes. Since both processes have identical but separate address spaces, variables initialized before the fork() call have the same values in both address spaces. Since every process has its own address space, any modifications will be independent of the others. In other words, if the parent changes the value of its variable, the modification will only affect the variable in the parent process’s address space. Other address spaces created by fork() calls will not be affected even though they have identical variable names.
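A small experiment makes the separate address spaces visible. Again this is just a sketch, assuming Python 3 on a Unix-like system. The function name is made up for this illustration, and the pipe is only there so that the child can tell the parent which value it ended up with.

```python
import os

def child_change_is_private():
    """Return (value seen by the parent, value seen by the child)."""
    counter = [0]  # initialized before the fork, so both processes start with 0
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: modify our own copy and report the new value.
        os.close(read_fd)
        counter[0] += 100
        os.write(write_fd, str(counter[0]).encode())
        os._exit(0)

    # Parent: read the child's value, then look at our own copy.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 64))
    os.close(read_fd)
    os.waitpid(pid, 0)
    return counter[0], child_value
```

The parent still sees 0 while the child reports 100: the variable has the same name in both processes, but after the fork each process works on its own copy.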

Upon fork(), all resources owned by the parent are duplicated and the copy is given to the child. In Linux, fork() is implemented through the use of copy-on-write pages: parent and child share the same memory pages until one of them writes to a page, and only then is that page actually copied. Copy-on-write, or simply CoW, is a resource management technique. More about CoW

You can use fork if you want to create a parallel program with multiple processes.
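As an illustration of such a program, here is a minimal sketch, assuming Python 3 on a Unix-like system, in which each forked worker computes one result and sends it back through its own pipe. parallel_squares is a hypothetical name. In real code the multiprocessing module mentioned in the question handles this bookkeeping for you.

```python
import os

def parallel_squares(numbers):
    """Square each number in its own forked worker process."""
    workers = []
    for n in numbers:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Worker: compute, send the result to the parent, and exit.
            os.close(read_fd)
            os.write(write_fd, str(n * n).encode())
            os._exit(0)
        # Parent: keep only the read end and go on to start the next worker.
        os.close(write_fd)
        workers.append((pid, read_fd))

    results = []
    for pid, read_fd in workers:
        # read() returns once the worker has exited and closed its write end.
        with os.fdopen(read_fd) as pipe:
            results.append(int(pipe.read()))
        os.waitpid(pid, 0)
    return results
```

All workers are started before any result is collected, so they run concurrently. The results are gathered in the order the workers were started, not in the order they finish.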

Answered By: Sabuhi Shukurov