Putting brackets around a specific column

Question:

With a bash script, I extracted a .conllu file into a three-column .txt file containing the lemma, POS and meaning, so it works as a kind of dictionary. Now I am trying to make it prettier by putting the second column (POS) in brackets.

It looks like:

ami NOUN    mother
amo VERB    sleep
asima   NOUN    younger_sister
ati NOUN    older_sister

The columns are separated by tabs.

I want it to look like this:

ami (NOUN)  mother
amo (VERB)  sleep
asima   (NOUN)  younger_sister
ati (NOUN)  older_sister

and ideally:

ami (NOUN)  - mother
amo (VERB)  - sleep
asima   (NOUN)  - younger_sister
ati (NOUN)  - older_sister

I tried a regex with sed:

sed -e 's/[a-zA-Z]+ /(/g' -e 's+[a-zA-Z]+=[a-zA-Z]+/)/g' dictjaa.txt > test.txt

but failed unfortunately.

Asked By: DIC

||

Answers:

Using sed:

sed -E 's/([^[:alpha:]]+)([^[:blank:]]*)/\1(\2) -/' input_file
ami (NOUN) -  mother
amo (VERB) -  sleep
asima   (NOUN) -  younger_sister
ati (NOUN) -  older_sister
Answered By: HatLess

If the POS tag always consists of uppercase characters A-Z:

sed -E 's/([[:blank:]])([A-Z]+)[[:blank:]]+/\1(\2)  - /' dictjaa.txt > test.txt

The pattern matches:

  • ([[:blank:]]) Capture group 1, match either a space or tab
  • ([A-Z]+) Capture group 2, match 1+ uppercase chars A-Z
  • [[:blank:]]+ Match 1+ occurrences of either a space or tab
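
To see the pattern at work without touching dictjaa.txt, here is a minimal sketch that feeds a few tab-separated rows from the question through the same substitution. The variable names sample and result are only illustrative, and it assumes a sed that supports -E (GNU and BSD sed both do).

```shell
# Three rows from the question, with real tab characters between the columns.
sample=$(printf 'ami\tNOUN\tmother\namo\tVERB\tsleep\nasima\tNOUN\tyounger_sister')

# \1 puts back the blank in front of the POS tag, \2 is the tag itself;
# the blanks after the tag are replaced by two spaces and "- ".
result=$(printf '%s\n' "$sample" | sed -E 's/([[:blank:]])([A-Z]+)[[:blank:]]+/\1(\2)  - /')

printf '%s\n' "$result"
```

Because the substitution has no g flag, only the first match on each line is rewritten, so later text such as younger_sister is left alone.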

The content of test.txt:

ami (NOUN)  - mother
amo (VERB)  - sleep
asima   (NOUN)  - younger_sister
ati (NOUN)  - older_sister
Answered By: The fourth bird