Create an R or Python function to generate sets based on the value of C
Question:
I need help creating an R or Python function that generates sets based on the value of C. Here’s an example to illustrate what I’m looking for:
For C = 2, I have 2^2 sets, including the sets: Cl_atypique, Cl_1, Cl_2, Cl_incertains
For C = 3, I have 2^3 sets, including the sets: Cl_atypique, Cl_1, Cl_2, Cl_1_2, Cl_3, Cl_1_3, Cl_2_3, Cl_incertains
For C = 4, I have 2^4 sets, including the sets: Cl_atypique, Cl_1, Cl_2, Cl_1_2, Cl_3, Cl_1_3, Cl_2_3, Cl_1_2_3, Cl_4, Cl_1_4, Cl_2_4, Cl_1_2_4, Cl_3_4, Cl_1_3_4, Cl_2_3_4, Cl_incertains
I would like to create an R or Python function that takes the value of C as input and returns a vector containing the sets in the specified order. This vector will be used to name a table later on.
I have this function that works, but the problem is with the order of the sets.
get_ensembles <- function(C) {
  ensembles <- c()
  ensembles <- c(ensembles, "Cl_atypique")
  for (i in 1:(C-1)) {
    subsets <- combn(1:C, i)
    for (j in 1:ncol(subsets)) {
      ensemble <- subsets[, j]
      ensembles <- c(ensembles, paste0("Cl_", paste0(ensemble, collapse = "_")))
    }
  }
  ensembles <- c(ensembles, "Cl_incertains")
  return(ensembles)
}
get_ensembles(3)
[1] "Cl_atypique" "Cl_1" "Cl_2" "Cl_3" "Cl_1_2"
[6] "Cl_1_3" "Cl_2_3" "Cl_incertains"
As the output shows, Cl_3 and Cl_1_2 are swapped compared with the order I want. I can't find a solution.
Thank you!
Answers:
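This Python solution orders the names by their last index, which matches your example for C = 3. For C = 4 it places Cl_3_4 before Cl_1_2_4, as the output further down shows.
Note: it only works for C up to 9, i.e. up to get_ensembles(9), because the sort key is the last character of each name.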
from itertools import combinations
from pprint import pprint

def get_ensembles(C):
    L = []
    # All subsets of sizes 1 .. C-1, generated size by size
    for i in range(1, C):
        for c in combinations(range(1, C + 1), i):
            L.append('_'.join(['Cl', *map(str, c)]))
    # Stable sort on the last character, i.e. the largest index (single digits only)
    L.sort(key=lambda x: x[-1])
    L.insert(0, 'Cl_atypique')
    L += ['Cl_incertains']
    return L

pprint(get_ensembles(4))
Output is:
['Cl_atypique',
'Cl_1',
'Cl_2',
'Cl_1_2',
'Cl_3',
'Cl_1_3',
'Cl_2_3',
'Cl_1_2_3',
'Cl_4',
'Cl_1_4',
'Cl_2_4',
'Cl_3_4',
'Cl_1_2_4',
'Cl_1_3_4',
'Cl_2_3_4',
'Cl_incertains']
To sort correctly when C > 9, the key has to be the trailing number rather than the last character. A small helper function, parse(s), extracts it.
(This handles any C from 2 upwards.)
from itertools import combinations
from pprint import pprint
import re

def parse(s):
    # Trailing run of digits after the last underscore, e.g. 'Cl_2_10' -> 10
    m = re.search(r'(?<=_)(\d+)$', s)
    return int(m.group(1))

def get_ensembles(C):
    L = []
    for i in range(1, C):
        for c in combinations(range(1, C + 1), i):
            L.append('_'.join(['Cl', *map(str, c)]))
    # Stable sort on the largest index of each name
    L.sort(key=parse)
    L.insert(0, 'Cl_atypique')
    L += ['Cl_incertains']
    return L

pprint(get_ensembles(10))  # prints 1024 names, ordered by their last index
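Both versions above sort on the last index only, so names that share a last index keep the size-by-size order that combinations produced. The following is a minimal sketch, assuming the target is exactly the order listed in the question (Cl_1_2_4 before Cl_3_4). It sorts the integer tuples themselves, compared from their largest element downwards, so no string parsing is needed and C > 9 is not a special case. The function name get_ensembles_sorted is hypothetical.

```python
from itertools import combinations

def get_ensembles_sorted(C):
    """Return the names in the order listed in the question, for C >= 2."""
    # Keep the subsets as integer tuples; sizes run from 1 to C-1.
    combos = [c
              for size in range(1, C)
              for c in combinations(range(1, C + 1), size)]
    # Compare subsets from their largest element downwards:
    # (1, 2, 4) becomes (4, 2, 1), which sorts before (4, 3).
    combos.sort(key=lambda c: c[::-1])
    names = ['Cl_' + '_'.join(map(str, c)) for c in combos]
    return ['Cl_atypique', *names, 'Cl_incertains']

print(get_ensembles_sorted(4))
```

Reversing each tuple works because two subsets are then compared on their largest differing element first, which is the same ordering as counting in binary with class k as bit k-1.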
A Python version that builds the list directly, without sorting:
from itertools import combinations

def get_ensembles(C):
    ensembles = ['Cl_atypique']
    ensembles_set = set()
    for numCount in range(1, C+1):
        for pickCount in range(1, C):
            for pick in combinations(range(1, numCount+1), pickCount):
                if pick not in ensembles_set:
                    ensembles_set.add(pick)
                    ensembles.append('Cl_'+'_'.join(map(str, pick)))
    ensembles.append('Cl_incertains')
    return ensembles
If you just want to iterate over the (possibly long) list, use a generator:
from itertools import combinations

def get_ensembles(C):
    yield 'Cl_atypique'
    ensembles_set = set()
    for numCount in range(1, C+1):
        for pickCount in range(1, C):
            for pick in combinations(range(1, numCount+1), pickCount):
                if pick not in ensembles_set:
                    ensembles_set.add(pick)
                    yield 'Cl_'+'_'.join(map(str, pick))
    yield 'Cl_incertains'
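The generator above still stores every combination it has yielded in ensembles_set, so its memory use grows with the number of names. As a sketch of an alternative, assuming the goal is the binary-counting order from the question, each subset can be read off the bits of a counter, which needs no bookkeeping at all. The name iter_ensembles is hypothetical.

```python
from itertools import islice

def iter_ensembles(C):
    """Yield the names one at a time without remembering earlier subsets."""
    yield 'Cl_atypique'
    # mask runs over 1 .. 2**C - 2; bit (k - 1) set means class k is in the subset.
    # Mask 0 (empty set) and mask 2**C - 1 (full set) are the two special names.
    for mask in range(1, 2**C - 1):
        members = [str(k) for k in range(1, C + 1) if (mask >> (k - 1)) & 1]
        yield 'Cl_' + '_'.join(members)
    yield 'Cl_incertains'

# Take only the first few names of a large family without building the rest.
print(list(islice(iter_ensembles(20), 5)))
```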
In R, you could build the names recursively:
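get_ensembles <- function(C){
  # v holds the subsets built so far; each step appends the next index alone,
  # then that index pasted onto every earlier subset
  cbs <- function(x, v = x[1]){
    if(length(x) < 2) v
    else cbs(x[-1], c(v, x[2], paste(v, x[2], sep = '_')))
  }
  # Drop the last element (the full set 1_2_..._C) before adding the prefix
  paste0('Cl_', c('atypique', cbs(seq.int(C))[-(2^C - 1)], 'incertains'))
}
get_ensembles(4)
 [1] "Cl_atypique"   "Cl_1"          "Cl_2"          "Cl_1_2"
 [5] "Cl_3"          "Cl_1_3"        "Cl_2_3"        "Cl_1_2_3"
 [9] "Cl_4"          "Cl_1_4"        "Cl_2_4"        "Cl_1_2_4"
[13] "Cl_3_4"        "Cl_1_3_4"      "Cl_2_3_4"      "Cl_incertains"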
You could translate the R code above directly into Python with NumPy:
import numpy as np

def get_ensembles(C):
    def cbs(x, v=np.array('1')):
        if x.size < 2:
            return v
        # v, then the next index alone, then that index appended to every entry of v
        return cbs(x[1:], np.r_[v, np.array(x[1]),
                                np.char.add(v, np.char.add('_', x[1]))])
    # Drop the last entry (the full set) and prefix every remaining name
    names = np.char.add('Cl_', cbs(np.arange(1, C + 1).astype(str))[:-1])
    return np.r_[np.array('Cl_atypique'), names, np.array('Cl_incertains')]

get_ensembles(4)
array(['Cl_atypique', 'Cl_1', 'Cl_2', 'Cl_1_2', 'Cl_3', 'Cl_1_3',
       'Cl_2_3', 'Cl_1_2_3', 'Cl_4', 'Cl_1_4', 'Cl_2_4', 'Cl_1_2_4',
       'Cl_3_4', 'Cl_1_3_4', 'Cl_2_3_4', 'Cl_incertains'], dtype='<U13')
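The doubling step inside cbs does not depend on NumPy. A minimal sketch of the same idea with plain Python lists follows; the name get_ensembles_doubling is hypothetical, and C >= 2 is assumed, as in the question.

```python
def get_ensembles_doubling(C):
    """Build the subset labels by doubling the list once per class index."""
    subsets = []
    for k in range(1, C + 1):
        # Class k contributes itself, then k appended to every earlier subset.
        # The right-hand side is evaluated before the list is extended.
        subsets += [str(k)] + [f'{s}_{k}' for s in subsets]
    # The last label is the full set 1_2_..._C, which the question
    # replaces with Cl_incertains, so it is dropped here.
    middle = ['Cl_' + s for s in subsets[:-1]]
    return ['Cl_atypique'] + middle + ['Cl_incertains']

print(get_ensembles_doubling(3))
```

Each pass doubles the list and adds one entry, so after C passes it holds 2^C - 1 labels, and the result has 2^C names once the full set is swapped for the two special ones.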