Subsetting columns and counting the 1's (TURF analysis?)

Question:

The aim is to count, for each subset of two or more columns, the rows that have 1’s in those columns:

        0   2   4
    0   0   1   0
    1   1   1   1
    2   1   0   0
    3   1   1   0
    4   1   0   0
    ... ... ... ...

In the above example there would be 4 subsets. The idea is then to summarize these counts in a bar plot where each bar is labelled according to its subset.
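A minimal sketch of the counting described above. It assumes that a row counts towards a subset when every column in that subset is 1, and that subsets have at least two columns (which gives the 4 subsets of the example). The helper name `subset_counts` is hypothetical. Note that these counts are inclusive: the row with 1's in columns 0, 2 and 4 is counted for every subset, whereas an UpSet plot counts each row only under its exact combination.

```python
from itertools import combinations

import pandas as pd

# Example data from the question (index 0-4, columns named 0, 2 and 4).
df = pd.DataFrame(
    [[0, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]],
    columns=[0, 2, 4],
)


def subset_counts(frame: pd.DataFrame, min_size: int = 2) -> pd.Series:
    """Hypothetical helper: for every subset of at least `min_size` columns,
    count the rows in which all columns of that subset are 1."""
    counts = {}
    for size in range(min_size, frame.shape[1] + 1):
        for cols in combinations(frame.columns, size):
            label = "&".join(map(str, cols))  # bar label, e.g. "0&2"
            counts[label] = int((frame[list(cols)] == 1).all(axis=1).sum())
    return pd.Series(counts)


counts = subset_counts(df)
print(counts)
# counts.plot.bar() draws one labelled bar per subset (requires matplotlib)
```

The resulting Series has one entry per subset, so its length is the number of bars in the plot.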

The aim is to make an UpSet plot.


Asked By: Sean_TBI_Research


Answers:

It looks like you’re looking for an UpSetPlot:

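# pip install upsetplot
import matplotlib.pyplot as plt
import upsetplot

# value_counts() gives a Series indexed by each observed True/False pattern
upsetplot.plot(df.astype(bool).value_counts())
plt.show()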

Output: [image: UpSet plot]
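To see what `upsetplot.plot` receives here, the `value_counts()` step can be run on its own. A minimal sketch, assuming the DataFrame holds only the five example rows from the question: the result is a Series with one entry per observed True/False pattern, indexed by a MultiIndex whose level names are the column names, and upsetplot uses those level names as the set labels.

```python
import pandas as pd

# The five example rows from the question (assumed to be the whole DataFrame).
df = pd.DataFrame(
    [[0, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]],
    columns=[0, 2, 4],
)

# One entry per observed True/False pattern, e.g. (True, False, False) -> 2.
counts = df.astype(bool).value_counts()
print(counts)
print(list(counts.index.names))  # level names are the column names
```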

With all combinations (unobserved ones shown with a count of 0):

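from itertools import product
import upsetplot

# add every True/False combination, with a count of 0 for unobserved ones
upsetplot.plot(df.astype(bool).value_counts()
                 .reindex(product([True, False], repeat=df.shape[1]), fill_value=0))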

Output: [image: UpSet plot including the combinations with a count of 0]
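The reindexing step can also be checked with pandas alone. The sketch below, assuming the same five example rows, enumerates all 2**3 = 8 patterns with `pd.MultiIndex.from_product` instead of `itertools.product` (a swapped-in but equivalent way to list them, which also names the levels after the columns explicitly); `fill_value=0` gives the unobserved patterns a count of 0.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]],
    columns=[0, 2, 4],
)

observed = df.astype(bool).value_counts()

# Every True/False pattern over the columns (2**3 = 8 here), with the
# level names set to the column names so upsetplot keeps the set labels.
all_patterns = pd.MultiIndex.from_product(
    [[True, False]] * df.shape[1], names=list(df.columns)
)
counts = observed.reindex(all_patterns, fill_value=0)
print(counts)
```

Passing `counts` to `upsetplot.plot` should then draw the same chart as the `itertools.product` version.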

Older answer:

It looks like you might want something like:

df.value_counts().plot.bar()

Output: [image: bar plot of the row counts per 0/1 pattern]
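With this version each bar is labelled with the 0/1 pattern itself. A small sketch, assuming the question's five example rows, that renames each pattern after the columns that are 1 (e.g. `0&2`), which is closer to the subset labels asked for; the `&` separator and the `none` label for an all-zero row are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]],
    columns=[0, 2, 4],
)

counts = df.value_counts()
# Each index entry is a 0/1 tuple such as (1, 1, 0); replace it with the
# names of the columns that are 1, e.g. "0&2" ("none" for an all-zero row).
counts.index = [
    "&".join(str(col) for col, v in zip(df.columns, pattern) if v == 1) or "none"
    for pattern in counts.index
]
print(counts)
# counts.plot.bar() would then show one readable label per bar (requires matplotlib)
```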

Or, labelling each bar with the set of column names that are 1 in a row:

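(df.reset_index().melt('index', var_name='cols')   # long format: one row per cell
   .query('value == 1')                            # keep only the cells equal to 1
   .groupby('index')['cols'].agg(frozenset)        # set of 1-columns for each row
   .value_counts().plot.bar()                      # count and plot each distinct set
)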

Output: [image: bar plot of the row counts per set of columns]
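One thing to be aware of in the `melt`/`query` version: a row with no 1 at all never reaches the `groupby`, so it is silently left out of the counts. A minimal sketch of a row-by-row alternative (a swapped-in technique, not the answer's code) that keeps such rows as an empty frozenset, using the question's rows plus a hypothetical all-zero row:

```python
import pandas as pd

# The question's rows plus a hypothetical all-zero row (index 5) to show
# how rows without any 1 are handled.
df = pd.DataFrame(
    [[0, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0]],
    columns=[0, 2, 4],
)

# The set of column names equal to 1, computed row by row; the all-zero row
# gives frozenset() instead of being dropped as in the melt/query version.
row_sets = pd.Series(
    [frozenset(df.columns[row == 1]) for row in df.to_numpy()],
    index=df.index,
)
counts = row_sets.value_counts()
print(counts)
```

Whether the empty set should appear as a bar depends on the analysis; dropping it afterwards is a single `counts.drop(frozenset())`.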

Answered By: mozway