How to search for numbers/strings in a column and populate numbers accordingly in a different column?
Question:
I’m annotating keystroke logging data (in Excel) where I need to create a trial number (Trial column) for each logged key (Input column), depending on the number logged at the beginning of each line (after RETURN key). As shown in the table below, for the line "10. red" the trial number for each key should be 10.
Could anyone suggest a way to fill out the Trial column automatically? Even if it’s not 100% accurate, it will save me a lot of time. Thanks in advance!!
Input | Trial |
---|---|
RETURN | 9 |
1 | 10 |
0 | 10 |
. | 10 |
r | 10 |
e | 10 |
d | 10 |
RETURN | 10 |
1 | 11 |
1 | 11 |
. | 11 |
b | 11 |
l | 11 |
u | 11 |
e | 11 |
RETURN | 11 |
1 | 12 |
2 | 12 |
I imagine python could help but I don’t know what functions exactly.
Answers:
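import pandas as pd
# Tab-separated sample from the question, one "Input<TAB>Trial" record per line
s = 'Input\tTrial\nRETURN\t9\n1\t10\n0\t10\n.\t10\nr\t10\ne\t10\nd\t10\nRETURN\t10\n1\t11\n1\t11\n.\t11\nb\t11\nl\t11\nu\t11\ne\t11\nRETURN\t11\n1\t12\n2\t12'
# Keep only the Input value of each record, skipping the header row
data = [r.split('\t')[0] for r in s.split('\n')[1:]]
df = pd.DataFrame(data, columns=['Input'])
start_trial = 9
df['Trial'] = start_trial
# Add the running count of RETURN keys to every row after the first
df.loc[1:, 'Trial'] += ((df.Input == 'RETURN').cumsum())[1:]
print(df)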
prints
Input Trial
0 RETURN 9
1 1 10
2 0 10
3 . 10
4 r 10
5 e 10
6 d 10
7 RETURN 11
8 1 11
9 1 11
10 . 11
11 b 11
12 l 11
13 u 11
14 e 11
15 RETURN 12
16 1 12
17 2 12
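Note that the output above moves each RETURN row into the next trial (index 7 gets 11), whereas the table in the question keeps RETURN with the line it ends (10). Here is a minimal sketch of one way to match the question's table, written in plain Python so it runs without pandas. The `keys` list and `start_trial` value are copied from the sample and are assumptions about your data:

```python
from itertools import accumulate

# Sample Input column from the question (assumed to be already loaded as a list).
keys = ["RETURN", "1", "0", ".", "r", "e", "d",
        "RETURN", "1", "1", ".", "b", "l", "u", "e",
        "RETURN", "1", "2"]
start_trial = 9  # assumed: trial number of the first logged row

# 1 for a RETURN row, 0 for any other key.
is_return = [int(k == "RETURN") for k in keys]
# Running count of RETURN rows, including the current row.
returns_so_far = list(accumulate(is_return))
# Subtract the row's own flag so a RETURN row stays in the trial it closes.
trials = [start_trial + c - f for c, f in zip(returns_so_far, is_return)]

for key, trial in zip(keys, trials):
    print(key, trial)
```

With pandas, the same adjustment would probably be to subtract the RETURN indicator from the cumulative sum, which is what the base R answer does.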
With base R:
dt <- structure(list(V1 = c("RETURN", "1", "0", ".", "r", "e", "d",
"RETURN", "1", "1", ".", "b", "l", "u", "e", "RETURN", "1", "2"
), V2 = c(9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 12L, 12L)), class = "data.frame", row.names = c(NA,
-18L))
dt$V3 <- 9 + cumsum(dt$V1 == "RETURN") - as.numeric(dt$V1 == "RETURN")
dt
#> V1 V2 V3
#> 1 RETURN 9 9
#> 2 1 10 10
#> 3 0 10 10
#> 4 . 10 10
#> 5 r 10 10
#> 6 e 10 10
#> 7 d 10 10
#> 8 RETURN 10 10
#> 9 1 11 11
#> 10 1 11 11
#> 11 . 11 11
#> 12 b 11 11
#> 13 l 11 11
#> 14 u 11 11
#> 15 e 11 11
#> 16 RETURN 11 11
#> 17 1 12 12
#> 18 2 12 12
Created on 2023-05-18 with reprex v2.0.2
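Both answers above count RETURN keys from a known starting trial. The question also mentions using the number typed at the beginning of each line, so here is a rough sketch of that idea in plain Python. The function name `trials_from_typed_numbers` is hypothetical, and the sketch assumes each line starts with its digits and ends with RETURN; a mistyped or corrected number would give a wrong trial, so treat it as a first pass to check by eye:

```python
def trials_from_typed_numbers(keys):
    """Give every row the number typed at the start of its line.

    A line is the run of keys up to and including the next RETURN.
    Rows in a line with no leading digits get None.
    """
    trials = [None] * len(keys)
    line_start = 0
    for i, key in enumerate(keys):
        is_last_row = i == len(keys) - 1
        if key == "RETURN" or is_last_row:
            # Collect the digits at the start of this line, stopping at the
            # first key that is not a single digit (".", a letter, RETURN).
            digits = ""
            for k in keys[line_start:i + 1]:
                if len(k) == 1 and k.isdecimal():
                    digits += k
                else:
                    break
            number = int(digits) if digits else None
            for j in range(line_start, i + 1):
                trials[j] = number
            line_start = i + 1
    return trials


keys = ["RETURN", "1", "0", ".", "r", "e", "d",
        "RETURN", "1", "1", ".", "b", "l", "u", "e",
        "RETURN", "1", "2"]
trials = trials_from_typed_numbers(keys)
```

On the sample, the first RETURN has no typed number before it, so its trial is left as None and would need to be filled in by hand.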
In Excel use IF below your first line:
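As a sketch of the kind of row-by-row rule an IF can express, here is the logic in Python: each row looks at the row above it and adds 1 to the trial if that row's key was RETURN, otherwise it copies the trial down. This is an assumption about the intended rule, not the answerer's formula, and the name `fill_trials` is hypothetical:

```python
def fill_trials(keys, first_trial):
    """Fill the Trial column one row at a time, looking only at the row above."""
    trials = []
    for i in range(len(keys)):
        if i == 0:
            # The first row is typed in by hand.
            trials.append(first_trial)
        elif keys[i - 1] == "RETURN":
            # The previous row ended a line, so a new trial starts here.
            trials.append(trials[-1] + 1)
        else:
            # Same line as the row above: copy its trial number.
            trials.append(trials[-1])
    return trials


keys = ["RETURN", "1", "0", ".", "r", "e", "d",
        "RETURN", "1", "1", ".", "b", "l", "u", "e",
        "RETURN", "1", "2"]
trials = fill_trials(keys, first_trial=9)
```

Because each row depends only on the one above, the same rule can be filled down a spreadsheet column once the first trial number has been entered.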