Rearranging a list to get the 2nd column entries as rows

Question:

I have a list of letters associated with strings, as follows:

A   string1^description1`string2^description2`string3^description3
B   string4^description4
C   string1^description1`string5^description5`string3^description3
D   .
E   string6^description6`string1^description1
F   string7^description7
G   string1^description1`string4^description4`string5^description5

I would like to swap the first and second columns, so that the strings in the 2nd column become the main list and the letters from the former 1st column are listed after each string, as follows:

string1^description1    A   C   E   G
string2^description2    A
string3^description3    A   C
string4^description4    B   G
string5^description5    C   G
string6^description6    E
string7^description7    F

I have struggled with this and can’t come up with anything. I am new to scripting.

Asked By: alex kiarie
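Reading the sample, each line seems to hold a single-letter label, some whitespace, and then one or more `string^description` pairs joined by backticks, with a lone `.` (line D) meaning the label has no pairs. A minimal Python sketch of parsing one line under that assumption; `parse_line` is a hypothetical helper name, not part of any answer below:

```python
def parse_line(line: str) -> tuple[str, list[str]]:
    """Return (label, pairs) for one input line, e.g. 'A   p1`p2' -> ('A', ['p1', 'p2'])."""
    # Split off the label at the first run of whitespace (spaces or tabs).
    parts = line.split(maxsplit=1)
    if not parts:                     # blank line: no label, no pairs
        return "", []
    label = parts[0]
    rest = parts[1].strip() if len(parts) > 1 else ""
    # A lone "." (like line D) is assumed to be a placeholder for "no pairs".
    if rest in ("", "."):
        return label, []
    # Pairs are separated by backticks; each pair keeps its "^description" part.
    return label, rest.split("`")


print(parse_line("A   string1^description1`string2^description2"))
print(parse_line("D   ."))
```

Once every line is parsed this way, swapping the columns amounts to appending each label to a list stored under each of its pairs.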


Answers:

from collections import defaultdict
data = '''A   string1^description1`string2^description2`string3^description3
B   string4^description4
C   string1^description1`string5^description5`string3^description3
D   .
E   string6^description6`string1^description1
F   string7^description7
G   string1^description1`string4^description4`string5^description5'''

d = defaultdict(list)
for line in data.splitlines():  # split the input data into lines
    char, info = line.split()  # in each line get the char and info
    for desc in info.split('`'):  # get the categories separated by `
        if desc == '.':        # skip placeholder lines like D, which have no data
            continue
        d[desc].append(char)

for k, v in d.items():
    print(f"{k} {' '.join(v)}")

Output:

string1^description1 A C E G
string2^description2 A
string3^description3 A C
string4^description4 B G
string5^description5 C G
string6^description6 E
string7^description7 F
Answered By: Jay
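The Python answer prints keys in first-seen order, because dictionaries preserve insertion order (Python 3.7+). That happens to match the desired output for this data, but not in general. If the keys should always come out sorted, as in the awk and Perl answers, and spaced like the question's sample, a small variation might look like this. The spacing (four spaces after the key, three between letters) is read off the question's example and is an assumption; `format_inverted` is a hypothetical helper name.

```python
from collections import defaultdict


def format_inverted(d: dict[str, list[str]]) -> list[str]:
    """Render key -> labels as text rows, sorted by key, spaced like the question's sample."""
    # sorted() makes the row order independent of the order the input lines arrived in.
    return [key + "    " + "   ".join(labels) for key, labels in sorted(d.items())]


# Build a small mapping whose insertion order differs from sorted order.
d = defaultdict(list)
for label, pair in [("E", "string6^description6"),
                    ("A", "string1^description1"),
                    ("E", "string1^description1")]:
    d[pair].append(label)

for row in format_inverted(d):
    print(row)
```

Note that this is plain string ordering, so a key such as `string10^...` would sort before `string2^...`.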

An AWK solution:

#! /usr/bin/env bash

INPUT_FILE="$1"

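awk \
'
BEGIN {
    FS=" "
}
{
    key=$1
    $1=""
    gsub(/^ */, "")
    n=split($0, a, /`/)
    for (i=1; i<=n; i++) {
        if (a[i] != ".") {
            hash[a[i]]=hash[a[i]] "   " key
        }
    }
}
END {
    # PROCINFO["sorted_in"] is a GNU awk (gawk) extension; other awks treat it
    # as an ordinary array and print the keys in an unspecified order.
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (elem in hash) {
        print elem " " hash[elem]
    }
}
' \
< "${INPUT_FILE}"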

Output:

string1^description1    A   C   E   G
string2^description2    A
string3^description3    A   C
string4^description4    B   G
string5^description5    C   G
string6^description6    E
string7^description7    F

Answered By: Arnaud Valmary

Since we seem to be working through the question's tags, here's a Perl solution:

#!/usr/bin/env perl
use v5.10;
use strict;
use warnings;
my %labels;
while (<>) {
  chomp;
  my ($label, $rest) = split ' ',$_,2;
  foreach my $key (split '`', $rest) {
    push @{$labels{$key}}, $label unless $key eq '.'
  }
}

foreach my $key (sort keys %labels) {
  say "$key\t", join("\t", @{$labels{$key}});
}

All of these follow the same idea. Split each line on whitespace to separate the initial letter from the string/description pairs, then split those pairs on the backtick to extract the individual ones. Each pair becomes a key in an associative array (dict/hash) whose value is the list of letters on whose lines that pair was seen.

Answered By: Mark Reed
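The general recipe from the last answer can also be packaged so that, like Perl's `<>` loop, it reads the files named on the command line or standard input. A minimal Python sketch using the standard `fileinput` module; `invert_lines` and `main` are illustrative names, and the tab-separated output mirrors the Perl answer rather than any requirement from the question:

```python
import fileinput
from collections import defaultdict
from collections.abc import Iterable


def invert_lines(lines: Iterable[str]) -> dict[str, list[str]]:
    """Map each string^description pair to the labels of the lines it appears on."""
    labels: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        parts = line.split(maxsplit=1)      # label, then the rest of the line
        if len(parts) < 2:                  # blank line, or a label with nothing after it
            continue
        label, rest = parts[0], parts[1].strip()
        for pair in rest.split("`"):        # pairs are separated by backticks
            if pair and pair != ".":        # "." marks a line with no pairs (line D)
                labels[pair].append(label)
    return labels


def main(argv: list[str]) -> None:
    # fileinput reads the files listed in argv, or stdin when argv is empty,
    # much like Perl's <> operator.
    with fileinput.input(files=argv) as lines:
        for pair, labs in sorted(invert_lines(lines).items()):
            print(pair, *labs, sep="\t")


# Demo on a few lines from the question; a script would call main(sys.argv[1:]).
sample = ["A   string1^description1`string3^description3", "D   .", "C   string1^description1"]
for pair, labs in sorted(invert_lines(sample).items()):
    print(pair, *labs, sep="\t")
```

Saved as a script (any name) with `import sys` and a `main(sys.argv[1:])` call at the bottom, it could then be run either with the input file as an argument or with the file redirected to standard input.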