Recursive SQL CTE query in Pandas?

Question:

how do I elegantly port the below recursive SQL query to Pandas python code?
Somehow I don’t see a straightforward way without writing own recursive function?

Python sample code:

import datetime
import numpy as np
import pandas as pd
import pandas.io.data
from pandas import Series, DataFrame

data = {
        'ID': [1,2,3,4,5,6,7,8],
        'Name': ['Keith','Josh','Robin','Raja','Tridip','Arijit','Amit','Dev'],
        'MgrID': [0,1,1,2,0,5,5,6]
    }

df = pd.DataFrame.from_dict(data)
df.set_index('ID', inplace=True, drop=False, append=False)
df.ix[df.query('MgrID >0')['MgrID']]

Trying to get this:

nLevel      ID          Name
================================
1           6            Arijit
2           8               Dev
1           1            Keith
2           2               Josh
2           3               Robin
3           4                 Raja
1           5            Tridip
2           7               Amit

Recursive SQL Query:

;WITH Employee (ID, Name, MgrID) AS 
(
    SELECT 1,      'Keith',      NULL   UNION ALL
    SELECT 2,      'Josh',       1      UNION ALL
    SELECT 3,      'Robin',      1      UNION ALL
    SELECT 4,      'Raja',       2      UNION ALL
    SELECT 5,      'Tridip',     NULL   UNION ALL
    SELECT 6,      'Arijit',     NULL      UNION ALL
    SELECT 7,      'Amit',       5      UNION ALL
    SELECT 8,      'Dev',        6   
)
,Hierarchy AS
(
    --  Anchor
    SELECT   ID
            ,Name
            ,MgrID
            ,nLevel = 1
            ,Family = ROW_NUMBER() OVER (ORDER BY Name)
    FROM Employee
    WHERE MgrID IS NULL

    UNION ALL
    --  Recursive query
    SELECT   E.ID
            ,E.Name
            ,E.MgrID
            ,H.nLevel+1
            ,Family
    FROM Employee   E
    JOIN Hierarchy  H ON E.MgrID = H.ID
)
SELECT nLevel ,ID,space(nLevel+(CASE WHEN nLevel > 1 THEN nLevel ELSE 0 END))+Name Name FROM Hierarchy ORDER BY Family, nLevel
Asked By: user3615048

||

Answers:

First, you need to correct the typo in python code MgrID list:
[0,1,1,2,0,0,5,6]

Second, if this job is done recursively in SQL, why do you expect Python/Pandas can do it without recursive method? It isn’t too hard:

def nlevel(id, mgr_dict=df.MgrID, _cache={0:0}):
    if id in _cache:
        return _cache[id]

    return 1+nlevel(mgr_dict[id],mgr_dict)

df['nLevel'] = df.ID.map(nlevel)

print df[['nLevel','ID','Name']]

Then the output(nLevel) is what you need (except the order, which I don’t understand from your SQL):

    nLevel  ID    Name
ID                    
1        1   1   Keith
2        2   2    Josh
3        2   3   Robin
4        3   4    Raja
5        1   5  Tridip
6        1   6  Arijit
7        2   7    Amit
8        2   8     Dev

[8 rows x 3 columns]
Answered By: Happy001

Can we run the recursive SQL as is in the python script using pandasql package, which allows to run sql queries within python program?

Answered By: Adarsh A
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.