How to get a value by key from a text file
Question:
I have a text file in the following format:
; Generated for TDD
[Document]
Mainline = PQRS.N - Holdings Corp conference call, Jun. 22, 2006 / 10:00AM ET
DocDate = 20060622100000
CtbDocUID = T234567
Action = ADD
CtbId = 4567
UserId = ftp_contribution
Password = PASSWORD
Attachment.MainFile.CtbName = T_1234.xml
SynopsisFile.CtbName = T_1234.tdm
MainLanguage = en
WorldReg[0].MxpCode = NAM
Country[0].MxpCode = USA
Currency[0].MxpCode = USD
Distribution.GroupID[0] = 3
Author[0].MxpCode = 5GOW
[EndOfFile]
The left side is the key and the right side is the value.
We need a way to get a value based on its key.
For example, when we provide DocDate as the key, we should get 20060622100000 as the value.
I am not sure how to do this.
The spacing is not uniform or fixed; only the keys are always the same.
Please suggest a way.
Java, Python, or maybe a regex would all be fine.
Answers:
It can be done like this:
import json

INPUT_FILE = 'test.txt'

with open(INPUT_FILE) as f:
    lines = f.readlines()

data = {}
for line in lines:
    # Split on the first '=' only, so values that contain '=' stay whole
    parts = line.split('=', 1)
    if len(parts) == 2:
        data[parts[0].strip()] = parts[1].strip()

print(json.dumps(data, indent=' '))
Result:
{
 "Mainline": "PQRS.N - Holdings Corp conference call, Jun. 22, 2006 / 10:00AM ET",
 "DocDate": "20060622100000",
 "CtbDocUID": "T234567",
 "Action": "ADD",
 "CtbId": "4567",
 "UserId": "ftp_contribution",
 "Password": "PASSWORD",
 "Attachment.MainFile.CtbName": "T_1234.xml",
 "SynopsisFile.CtbName": "T_1234.tdm",
 "MainLanguage": "en",
 "WorldReg[0].MxpCode": "NAM",
 "Country[0].MxpCode": "USA",
 "Currency[0].MxpCode": "USD",
 "Distribution.GroupID[0]": "3",
 "Author[0].MxpCode": "5GOW"
}
Do you also need to infer data types?
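If type inference is wanted, one possible approach is to try numeric conversions and fall back to the original string. The sketch below is only an assumption about what "inferring types" would mean here; the helper name infer_type and the small raw dictionary are illustrative and not part of the answer above.

```python
def infer_type(value: str):
    """Return an int or float when the text parses as one, else the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass  # not parseable with this cast, try the next one
    return value


# A few entries in the shape produced by the parsing code (all values are strings)
raw = {
    "DocDate": "20060622100000",
    "CtbId": "4567",
    "Action": "ADD",
    "CtbDocUID": "T234567",
}

typed = {key: infer_type(value) for key, value in raw.items()}
print(typed["CtbId"] + 1)  # 4568, because CtbId is now an int
print(typed["Action"])     # ADD, left as a string
```

A value such as DocDate looks like a timestamp rather than a quantity, so converting it to an integer may not be what you want; in that case it could be left out of the conversion.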
Try this (Java):
// Needs java.nio.file.{Files, Path, Paths}, java.util.Map and
// java.util.stream.{Collectors, Stream}, plus JUnit and Hamcrest for the test itself.
@Test
public void t() throws Exception {
    Path path = Paths.get("/path/to/fileTest.txt");
    Map<String, String> result;
    // try-with-resources closes the stream even if parsing fails
    try (Stream<String> lines = Files.lines(path)) {
        result = lines
            .filter(row -> row.contains("="))
            .map(row -> row.split("=", 2)) // split on the first '=' only
            .collect(Collectors.toMap(
                a -> a[0].trim(), // key
                a -> a[1].trim()  // value
            ));
    }
    assertThat(result.get("DocDate"), Is.is("20060622100000"));
    assertThat(result.get("Mainline"), Is.is("PQRS.N - Holdings Corp conference call, Jun. 22, 2006 / 10:00AM ET"));
}
Please suggest a way(…)Python(..)fine
I suggest taking a look at configparser. Let file.txt contain:
[Document]
Mainline = PQRS.N - Holdings Corp conference call, Jun. 22, 2006 / 10:00AM ET
DocDate = 20060622100000
CtbDocUID = T234567
Action = ADD
CtbId = 4567
UserId = ftp_contribution
Password = PASSWORD
Attachment.MainFile.CtbName = T_1234.xml
SynopsisFile.CtbName = T_1234.tdm
MainLanguage = en
WorldReg[0].MxpCode = NAM
Country[0].MxpCode = USA
Currency[0].MxpCode = USD
Distribution.GroupID[0] = 3
Author[0].MxpCode = 5GOW
[EndOfFile]
then
import configparser
config = configparser.ConfigParser()
config.read("file.txt")
print(config["Document"]["DocDate"])
gives output
20060622100000
configparser is part of the standard library, so you do not have to install anything. If you want to know more, read the linked docs.
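One detail worth knowing: by default configparser lowercases option names, so iterating over a section yields docdate rather than DocDate (lookups still work, because the lookup key is lowercased as well). The sketch below shows one way to keep the keys as written and to supply a fallback for a missing key. It uses read_string with an inline SAMPLE string, an assumption made only so that the example runs without a file on disk.

```python
import configparser

# Inline stand-in for file.txt, with deliberately uneven spacing
SAMPLE = """\
; Generated for TDD
[Document]
DocDate   =   20060622100000
Author[0].MxpCode = 5GOW
[EndOfFile]
"""

config = configparser.ConfigParser()
config.optionxform = str  # keep keys exactly as written instead of lowercasing them
config.read_string(SAMPLE)

document = config["Document"]
print(list(document))                           # keys in file order, original case
print(document["DocDate"])                      # 20060622100000
print(document.get("Missing", fallback="n/a"))  # n/a
```

With optionxform set to str, lookups become case-sensitive, so the key has to be spelled exactly as it appears in the file.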
Python: Build a dictionary so that you can search for all/any keys. For example:
db = {}
with open('tdd.txt') as infile:
    for line in map(str.strip, infile):
        # Split on the first '=' only, so values that contain '=' stay whole
        if len(t := line.split('=', 1)) == 2:
            db[t[0].strip()] = t[1].strip()
print(db['DocDate'])
Output:
20060622100000
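Since the question also mentions regex, here is a minimal sketch of a single-key lookup with Python's re module. The function name lookup and the inline SAMPLE text are assumptions made for illustration; in practice the text would come from reading the file. The key is passed through re.escape because keys such as Author[0].MxpCode contain characters that are special in a regex.

```python
import re

# Inline stand-in for the file contents, with deliberately uneven spacing
SAMPLE = """\
; Generated for TDD
[Document]
Mainline =   PQRS.N - Holdings Corp conference call, Jun. 22, 2006 / 10:00AM ET
DocDate      = 20060622100000
Author[0].MxpCode=5GOW
[EndOfFile]
"""


def lookup(text, key):
    """Return the value for an exact key, or None when the key is absent."""
    pattern = rf"^[ \t]*{re.escape(key)}[ \t]*=[ \t]*(.*?)[ \t]*$"
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None


print(lookup(SAMPLE, "DocDate"))            # 20060622100000
print(lookup(SAMPLE, "Author[0].MxpCode"))  # 5GOW
print(lookup(SAMPLE, "Doc"))                # None, partial keys do not match
```

This scans the text once per lookup, so for many lookups the dictionary approach above is likely the better fit.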