Create an R or Python function to generate sets based on the value of C

Question:

I need help creating an R or Python function that generates sets based on the value of C. Here’s an example to illustrate what I’m looking for:

For C = 2, I have 2^2 sets, including the sets: Cl_atypique, Cl_1, Cl_2, Cl_incertains

For C = 3, I have 2^3 sets, including the sets: Cl_atypique, Cl_1, Cl_2, Cl_1_2, Cl_3, Cl_1_3, Cl_2_3, Cl_incertains

For C = 4, I have 2^4 sets, including the sets: Cl_atypique, Cl_1, Cl_2, Cl_1_2, Cl_3, Cl_1_3, Cl_2_3, Cl_1_2_3, Cl_4, Cl_1_4, Cl_2_4, Cl_1_2_4, Cl_3_4, Cl_1_3_4, Cl_2_3_4, Cl_incertains

I would like to create an R or Python function that takes the value of C as input and returns a vector containing the sets in the specified order. This vector will be used to name a table later on.

I have this function that works, but the problem is with the order of the sets.

get_ensembles <- function(C) {
  ensembles <- c()
  ensembles <- c(ensembles, "Cl_atypique")
  for (i in 1:(C-1)) {
    subsets <- combn(1:C, i)
    for (j in 1:ncol(subsets)) {
      ensemble <- subsets[, j]
      ensembles <- c(ensembles, paste0("Cl_", paste0(ensemble, collapse = "_")))
    }
  }
  ensembles <- c(ensembles, "Cl_incertains")
  return(ensembles)
}
get_ensembles(3)

get_ensembles(3)
[1] "Cl_atypique" "Cl_1" "Cl_2" "Cl_3" "Cl_1_2"
[6] "Cl_1_3" "Cl_2_3" "Cl_incertains"

As the output shows, Cl_3 and Cl_1_2 are swapped: Cl_1_2 should come before Cl_3. I can't find a solution.

Thank you!

Asked By: Armel Soubeiga


Answers:

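This Python solution produces the order you need for C = 3. For C = 4 it places Cl_3_4 before Cl_1_2_4 (see the output below), which differs slightly from the order listed in the question.

Note: it only works for C up to 9, i.e. get_ensembles(9), because the sort key is the last character of each name.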

from itertools import combinations
from pprint import pprint

def get_ensembles(C):
    L = []
    
    for i in range(1,C):
        for c in combinations(range(1, C+1),i):
            L.append('_'.join(['Cl', *map(str,c)]))
    
    L.sort(key=lambda x: x[-1])
    L.insert(0,'Cl_atypique')
    L += ['Cl_incertains']
    
    return L

pprint(get_ensembles(4))

Output is:

['Cl_atypique',
 'Cl_1',
 'Cl_2',
 'Cl_1_2',
 'Cl_3',
 'Cl_1_3',
 'Cl_2_3',
 'Cl_1_2_3',
 'Cl_4',
 'Cl_1_4',
 'Cl_2_4',
 'Cl_3_4',
 'Cl_1_2_4',
 'Cl_1_3_4',
 'Cl_2_3_4',
 'Cl_incertains']

For C > 9, the sort key has to compare the whole trailing number rather than its last digit. A small helper function, parse(s), takes care of this (it handles C values from 2 upward):

from itertools import combinations
from pprint import pprint
import re

def parse(s):
    m = re.search(r'(?<=_)(\d+)$', s)
    return int(m.group(1))

def get_ensembles(C):
    L = []
    
    for i in range(1,C):
        for c in combinations(range(1, C+1),i):
            L.append('_'.join(['Cl', *map(str,c)]))
    
    L.sort(key=lambda x: parse(x))
    L.insert(0,'Cl_atypique')
    L += ['Cl_incertains']
    
    return L

pprint(get_ensembles(10)) # prints 1024 lines in the desired order
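
Sorting on the last number alone leaves names that share it in generation order (smaller subsets first), which is why Cl_3_4 comes before Cl_1_2_4. The order listed in the question matches counting in binary, where element i corresponds to bit i-1. As a minimal sketch under that assumption, the sort key can be the integer built from those bits; the names mask and get_ensembles_ordered are hypothetical and not part of the answer above.

```python
from itertools import combinations

def mask(name):
    # 'Cl_1_3' -> elements 1 and 3 -> bits 0 and 2 set -> 0b101 == 5
    return sum(1 << (int(tok) - 1) for tok in name.split('_')[1:])

def get_ensembles_ordered(C):
    # Build every proper, non-empty subset name, as in the answer above
    names = ['_'.join(['Cl', *map(str, c)])
             for i in range(1, C)
             for c in combinations(range(1, C + 1), i)]
    # Each subset has a distinct mask, so the resulting order is unambiguous
    names.sort(key=mask)
    return ['Cl_atypique', *names, 'Cl_incertains']
```

Because the key parses every number in the name, it also works for C > 9 without a regular expression.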
Answered By: Chris Charley

A Python version:

from itertools import combinations

def get_ensembles(C):
  ensembles = ['Cl_atypique']
  ensembles_set = set()
  for numCount in range(1, C+1):
    for pickCount in range(1, C):
      for pick in combinations(range(1, numCount+1), pickCount):
        if pick not in ensembles_set:
          ensembles_set.add(pick)
          ensembles.append('Cl_'+'_'.join(map(str, pick)))
  ensembles.append('Cl_incertains')
  return ensembles

If you only want to iterate over the (possibly very long) list, use a generator:

from itertools import combinations

def get_ensembles(C):
  yield 'Cl_atypique'
  ensembles_set = set()
  for numCount in range(1, C+1):
    for pickCount in range(1, C):
      for pick in combinations(range(1, numCount+1), pickCount):
        if pick not in ensembles_set:
          ensembles_set.add(pick)
          yield 'Cl_'+'_'.join(map(str, pick))
  yield 'Cl_incertains'
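
The generator above still keeps every combination it has seen in ensembles_set, so its memory use grows with the output. A possible alternative, sketched here under the assumption that the order in the question is binary counting order (element i is bit i-1 of a counter), derives each name directly from the counter and stores nothing. The name iter_ensembles is hypothetical.

```python
from itertools import islice

def iter_ensembles(C):
    yield 'Cl_atypique'                      # counter value 0: empty set
    for k in range(1, 2**C - 1):             # skip 0 and the full set
        # element i+1 belongs to the subset when bit i of k is set
        members = [str(i + 1) for i in range(C) if (k >> i) & 1]
        yield 'Cl_' + '_'.join(members)
    yield 'Cl_incertains'                    # counter value 2**C - 1: full set

# Take only the first few names of a very large sequence
first_five = list(islice(iter_ensembles(20), 5))
# first_five == ['Cl_atypique', 'Cl_1', 'Cl_2', 'Cl_1_2', 'Cl_3']
```

For C = 4 this yields Cl_1_2_4 before Cl_3_4, as in the question, whereas the generator above yields Cl_3_4 first.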
Answered By: rioV8

In R you could do:

get_ensembles <- function(C){
  cbs <- function(x,v = x[1]){
      if(length(x)<2) v
      else cbs(x[-1], c(v, x[2], paste(v, x[2], sep = '_')))
  }
  paste0('CL_', c('atypique', cbs(seq.int(C))[-2^C+1], 'incertains'))
}

get_ensembles(4)
 [1] "CL_atypique"   "CL_1"          "CL_2"          "CL_1_2"       
 [5] "CL_3"          "CL_1_3"        "CL_2_3"        "CL_1_2_3"     
 [9] "CL_4"          "CL_1_4"        "CL_2_4"        "CL_1_2_4"     
[13] "CL_3_4"        "CL_1_3_4"      "CL_2_3_4"      "CL_incertains"
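
The recursion in cbs doubles the label vector at each step: for every new element it appends the element on its own, then the element joined onto each label built so far. The same idea can be sketched as a plain loop in Python without NumPy. The function name get_ensembles_doubling is hypothetical, and the 'CL_' prefix is kept only to match the R output above.

```python
def get_ensembles_doubling(C):
    labels = []
    for i in range(1, C + 1):
        # The right-hand side is built from the current labels before they
        # are extended: the new element alone, then joined onto each label
        labels += [str(i)] + [f'{s}_{i}' for s in labels]
    # labels[-1] is the full set 1_2_..._C; 'incertains' takes its place
    return ['CL_' + s for s in ['atypique', *labels[:-1], 'incertains']]
```

Calling get_ensembles_doubling(4) returns the same 16 names, in the same order, as the R output above.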

You could directly convert the code above into Python:

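import numpy as np

def get_ensembles(C):
    # cbs doubles v at each step: for the next element x[1] it appends x[1]
    # alone, then x[1] joined onto every label built so far
    def cbs(x, v = np.array('1')):
        if x.size < 2 : return v
        return cbs(x[1:], np.r_[v, np.array(x[1]),
            np.char.add(v, np.char.add('_',x[1]))])
    # drop the last label (the full set) and prefix the rest with 'CL_'
    return np.r_[np.array('CL_atypique'),
        np.char.add('CL_', cbs(np.arange(1,C+1).astype(str))[:-1]),
        np.array('CL_incertains')]

get_ensembles(4)
array(['CL_atypique', 'CL_1', 'CL_2', 'CL_1_2', 'CL_3', 'CL_1_3',
       'CL_2_3', 'CL_1_2_3', 'CL_4', 'CL_1_4', 'CL_2_4', 'CL_1_2_4',
       'CL_3_4', 'CL_1_3_4', 'CL_2_3_4', 'CL_incertains'], dtype='<U13')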
Answered By: Onyambu