How to select SPECIFIC data points for testing and training?

Question:

I have the following dataset and want to split it into 25% for testing and the rest for training. I also want the last 15 data points to always be part of the randomly generated training set. How can I do that?

import pandas as pd
from sklearn import preprocessing

# data_set is the DataFrame holding the raw data (loading not shown)
scaler = preprocessing.MinMaxScaler()
names = data_set.columns
d = scaler.fit_transform(data_set)
scaled_df = pd.DataFrame(d, columns=names)

feature_cols = [
    "Part's Z-Height (mm)",
    "Part's Weight (N)",
    "Part's Volume (cm^3)",
    "Part's Surface Area (cm^2)",
    "Layer Height (mm)",
    "Infill Density (%)",
    "Printing/Scanning Speed (mm/s)",
    "Part's Orientation (Support's height) (mm)",
    "Part's Orientation (Support's volume) (cm^3)",
]
target_cols = [
    "Climate change (kg CO2 eq.)",
    "Climate change, incl biogenic carbon (kg CO2 eq.)",
    "Fine Particulate Matter Formation (kg PM2.5 eq.)",
    "Fossil depletion (kg oil eq.)",
    "Freshwater Consumption (m^3)",
    "Freshwater ecotoxicity (kg 1,4-DB eq.)",
    "Freshwater Eutrophication (kg P eq.)",
    "Human toxicity, cancer (kg 1,4-DB eq.)",
    "Human toxicity, non-cancer (kg 1,4-DB eq.)",
    "Ionizing Radiation (Bq. C-60 eq. to air)",
    "Land use (Annual crop eq. yr)",
    "Marine ecotoxicity (kg 1,4-DB eq.)",
    "Marine Eutrophication (kg N eq.)",
    "Metal depletion (kg Cu eq.)",
    "Photochemical Ozone Formation, Ecosystem (kg NOx eq.)",
    "Photochemical Ozone Formation, Human Health (kg NOx eq.)",
    "Stratospheric Ozone Depletion (kg CFC-11 eq.)",
    "Terrestrial Acidification (kg SO2 eq.)",
    "Terrestrial ecotoxicity (kg 1,4-DB eq.)",
]
X, y = scaled_df[feature_cols], scaled_df[target_cols]
scaled_df.head(34)



Part's Z-Height (mm)    Part's Weight (N)   Part's Volume (cm^3)    Part's Surface Area (cm^2)  Layer Height (mm)   Infill Density (%)  Printing/Scanning Speed (mm/s)  Part's Orientation (Support's height) (mm)  Part's Orientation (Support's volume) (cm^3)    Climate change (kg CO2 eq.) Climate change, incl biogenic carbon (kg CO2 eq.)   Fine Particulate Matter Formation (kg PM2.5 eq.)    Fossil depletion (kg oil eq.)   Freshwater Consumption (m^3)    Freshwater ecotoxicity (kg 1,4-DB eq.)  Freshwater Eutrophication (kg P eq.)    Human toxicity, cancer (kg 1,4-DB eq.)  Human toxicity, non-cancer (kg 1,4-DB eq.)  Ionizing Radiation (Bq. C-60 eq. to air)    Land use (Annual crop eq. yr)   Marine ecotoxicity (kg 1,4-DB eq.)  Marine Eutrophication (kg N eq.)    Metal depletion (kg Cu eq.) Photochemical Ozone Formation, Ecosystem (kg NOx eq.)   Photochemical Ozone Formation, Human Health (kg NOx eq.)    Stratospheric Ozone Depletion (kg CFC-11 eq.)   Terrestrial Acidification (kg SO2 eq.)  Terrestrial ecotoxicity (kg 1,4-DB eq.)
0   0.419716    0.009814    0.009814    0.000000    0.0 0.00    0.666667    0.380679    0.039133    0.048122    0.097393    0.015227    0.010446    0.160780    0.006969    0.008898    0.161179    0.030612    0.009631    0.007965    0.008285    0.008061    0.110836    0.011551    0.012469    0.008489    0.012848    0.070063
1   0.419716    0.009814    0.009814    0.000000    0.2 0.00    0.666667    0.380679    0.035639    0.019322    0.039110    0.006289    0.004875    0.064476    0.004181    0.005085    0.064558    0.012911    0.004815    0.004707    0.004751    0.004478    0.044499    0.004950    0.004988    0.004669    0.005353    0.028190
2   0.419716    0.018646    0.018646    0.000000    0.4 0.00    0.666667    0.161784    0.030049    0.010755    0.020706    0.004303    0.003482    0.033060    0.004878    0.004661    0.033140    0.007705    0.004357    0.004707    0.004508    0.004478    0.023486    0.003300    0.003325    0.004527    0.003212    0.015331
3   0.419716    0.017664    0.017664    0.000000    0.6 0.00    0.666667    0.161784    0.030049    0.006562    0.011887    0.003310    0.002786    0.018275    0.004878    0.004237    0.018291    0.004581    0.004127    0.004345    0.004142    0.004478    0.013185    0.002475    0.002494    0.004244    0.002141    0.008902
4   0.419716    0.019627    0.019627    0.000000    1.0 0.00    0.666667    0.380679    0.030748    0.002917    0.003067    0.002979    0.003482    0.003080    0.005575    0.005085    0.003013    0.002499    0.004586    0.005069    0.004630    0.004926    0.002884    0.002475    0.002494    0.004810    0.002141    0.002967
5   0.213663    0.008832    0.008832    0.000000    0.0 0.00    0.666667    0.135417    0.029350    0.036274    0.079371    0.007613    0.003482    0.134702    0.000697    0.002119    0.134926    0.021241    0.002752    0.001448    0.002071    0.001343    0.091059    0.004950    0.004988    0.001839    0.005353    0.055391
6   0.213663    0.008832    0.008832    0.000000    0.2 0.00    0.666667    0.135417    0.028651    0.012760    0.030291    0.000993    0.000000    0.052772    0.000000    0.000000    0.052937    0.006664    0.000000    0.000000    0.000000    0.000000    0.035023    0.000000    0.000000    0.000000    0.000000    0.020607
7   0.213663    0.019627    0.019627    0.000000    0.4 0.00    0.666667    0.135417    0.027952    0.009114    0.018021    0.003310    0.002786    0.028953    0.004181    0.003814    0.029051    0.005623    0.003440    0.003621    0.003533    0.003583    0.020190    0.002475    0.002494    0.003679    0.002141    0.013023
8   0.213663    0.019627    0.019627    0.000000    0.6 0.00    0.666667    0.135417    0.027254    0.004739    0.009202    0.001986    0.001393    0.014579    0.003484    0.002966    0.014633    0.003540    0.002752    0.003259    0.002802    0.003135    0.010301    0.001650    0.001663    0.002971    0.001071    0.006759
9   0.213663    0.018646    0.018646    0.000000    1.0 0.00    0.666667    0.135417    0.026555    0.000000    0.000000    0.000000    0.000000    0.000000    0.002787    0.002119    0.000000    0.000000    0.001605    0.002172    0.001706    0.001791    0.000000    0.000000    0.000000    0.001981    0.000000    0.000000
10  0.125603    0.008832    0.008832    0.039373    0.0 0.00    0.000000    0.137778    0.053110    0.063799    0.121166    0.025157    0.019499    0.195072    0.014634    0.017373    0.195180    0.043107    0.018344    0.016293    0.016813    0.016122    0.136794    0.021452    0.021613    0.016836    0.022484    0.090010
11  0.125603    0.008832    0.008832    0.039373    0.0 0.00    0.333333    0.137778    0.053110    0.061065    0.115414    0.024495    0.019499    0.186858    0.014634    0.017373    0.186572    0.042066    0.017886    0.016293    0.016813    0.016122    0.130202    0.020627    0.020781    0.016695    0.022484    0.085064
12  0.125603    0.008832    0.008832    0.039373    0.0 0.00    0.866667    0.137778    0.053110    0.057783    0.108512    0.023502    0.018802    0.174333    0.014634    0.016949    0.174736    0.039983    0.017657    0.015930    0.016813    0.016122    0.122373    0.020627    0.020781    0.016553    0.021413    0.080448
13  0.125603    0.008832    0.008832    0.039373    0.0 0.00    1.000000    0.137778    0.053110    0.057237    0.107745    0.023502    0.018802    0.172485    0.014634    0.016949    0.172800    0.039983    0.017657    0.015930    0.016813    0.016122    0.121137    0.019802    0.019950    0.016412    0.021413    0.079624
14  0.771175    0.000000    0.000000    0.039373    0.0 0.00    0.000000    0.786031    0.073375    0.079657    0.145706    0.035419    0.029248    0.229979    0.022997    0.026271    0.231762    0.056643    0.027287    0.024982    0.026559    0.025078    0.163576    0.031353    0.031588    0.025750    0.032120    0.109792
15  0.771175    0.000000    0.000000    0.039373    0.0 0.00    0.333333    0.786031    0.073375    0.077470    0.140721    0.034757    0.028552    0.221766    0.022997    0.026271    0.223155    0.055602    0.027058    0.024982    0.026559    0.025078    0.157808    0.030528    0.030756    0.025608    0.032120    0.104847
16  0.771175    0.000000    0.000000    0.039373    0.0 0.00    0.866667    0.786031    0.073375    0.074918    0.135736    0.034426    0.028552    0.213552    0.022997    0.025847    0.214547    0.053519    0.026829    0.024620    0.026559    0.024631    0.152040    0.030528    0.030756    0.025467    0.031049    0.101550
17  0.771175    0.000000    0.000000    0.039373    0.0 0.00    1.000000    0.786031    0.073375    0.074553    0.134586    0.034095    0.028552    0.211499    0.022997    0.025847    0.212395    0.053519    0.026829    0.024620    0.025341    0.024631    0.151215    0.030528    0.030756    0.025467    0.031049    0.101550
18  1.000000    0.402355    0.402355    0.239088    0.0 0.25    0.666667    0.695845    0.083857    0.228946    0.263804    0.206885    0.202646    0.303901    0.200697    0.202542    0.309232    0.218034    0.202018    0.202028    0.201998    0.201970    0.273589    0.203795    0.204489    0.202037    0.204497    0.244972
19  1.000000    0.586850    0.586850    0.239088    0.0 0.50    0.666667    0.695845    0.083857    0.303682    0.328988    0.285005    0.280641    0.367556    0.279443    0.280932    0.367334    0.294044    0.282275    0.279508    0.281189    0.280340    0.339102    0.282178    0.282627    0.281268    0.282655    0.315859
20  1.000000    0.770363    0.770363    0.239088    0.0 0.75    0.666667    0.695845    0.083857    0.376595    0.398006    0.361139    0.358635    0.429158    0.357491    0.360169    0.429740    0.370054    0.360238    0.359160    0.359162    0.359606    0.406675    0.359736    0.359933    0.359083    0.359743    0.386746
21  1.000000    0.952895    0.952895    0.239088    0.0 1.00    0.666667    0.695845    0.083857    0.511484    0.597393    0.453823    0.444986    0.706366    0.435540    0.440678    0.707338    0.482507    0.442788    0.438812    0.440789    0.440215    0.620931    0.448020    0.448878    0.441143    0.450749    0.549951
22  0.000000    0.409225    0.409225    0.239088    0.0 0.25    0.666667    0.000000    0.000000    0.177907    0.207439    0.158226    0.155292    0.246407    0.153310    0.154661    0.246826    0.168055    0.156157    0.154598    0.154483    0.154501    0.215904    0.155941    0.156276    0.153933    0.156317    0.190570
23  0.000000    0.590775    0.590775    0.239088    0.0 0.50    0.666667    0.000000    0.000000    0.252643    0.279141    0.235353    0.231894    0.314168    0.230662    0.232203    0.313536    0.244065    0.231828    0.232078    0.232456    0.231975    0.287186    0.232673    0.233583    0.231749    0.233405    0.264754
24  0.000000    0.773307    0.773307    0.239088    0.0 0.75    0.666667    0.000000    0.000000    0.327379    0.348160    0.311486    0.309192    0.377823    0.308711    0.309322    0.378093    0.320075    0.309791    0.308472    0.310429    0.309449    0.357231    0.310231    0.310889    0.309564    0.310493    0.337290
25  0.000000    0.952895    0.952895    0.239088    0.0 1.00    0.666667    0.000000    0.000000    0.456799    0.536043    0.404171    0.393454    0.638604    0.385366    0.389831    0.640628    0.429404    0.392341    0.388125    0.390838    0.390954    0.559126    0.396865    0.398171    0.390209    0.399358    0.492252
26  0.812500    0.682041    0.682041    1.000000    0.0 0.00    0.666667    0.031409    0.000000    0.310973    0.359663    0.278385    0.273677    0.420945    0.269686    0.272458    0.421132    0.295085    0.273102    0.272266    0.272661    0.271832    0.373712    0.275578    0.275977    0.272779    0.277302    0.332344
27  0.812500    0.682041    0.682041    1.000000    0.0 0.00    0.000000    0.031409    0.000000    0.331024    0.401840    0.285005    0.276462    0.490760    0.269686    0.273729    0.492145    0.307580    0.275396    0.272266    0.273879    0.272727    0.419036    0.279703    0.280133    0.274194    0.281585    0.362018
28  0.812500    1.000000    1.000000    1.000000    0.0 0.25    0.666667    0.031409    0.000000    0.445862    0.493865    0.414101    0.409471    0.554415    0.405575    0.406780    0.554551    0.431487    0.408393    0.406227    0.409113    0.408867    0.505562    0.411716    0.412303    0.408602    0.412206    0.467524
29  0.813166    0.663395    0.663395    1.000000    0.0 0.00    0.666667    1.000000    1.000000    0.863288    0.861963    0.860973    0.860724    0.866530    0.860627    0.860169    0.866581    0.864640    0.862417    0.862419    0.862329    0.861173    0.868150    0.859736    0.858687    0.862762    0.860814    0.863172
30  0.813166    0.663395    0.663395    1.000000    0.0 0.00    0.000000    1.000000    1.000000    0.883339    0.904141    0.867594    0.860724    0.936345    0.860627    0.864407    0.939746    0.877135    0.864710    0.862419    0.863548    0.861173    0.913473    0.867987    0.866999    0.864177    0.865096    0.894494
31  0.813166    0.986261    0.986261    1.000000    0.0 0.25    0.666667    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000

ABS_FOR_TRAINING = scaled_df[-15:]

ABS_FOR_TRAINING

Part's Z-Height (mm)    Part's Weight (N)   Part's Volume (cm^3)    Part's Surface Area (cm^2)  Layer Height (mm)   Infill Density (%)  Printing/Scanning Speed (mm/s)  Part's Orientation (Support's height) (mm)  Part's Orientation (Support's volume) (cm^3)    Climate change (kg CO2 eq.) Climate change, incl biogenic carbon (kg CO2 eq.)   Fine Particulate Matter Formation (kg PM2.5 eq.)    Fossil depletion (kg oil eq.)   Freshwater Consumption (m^3)    Freshwater ecotoxicity (kg 1,4-DB eq.)  Freshwater Eutrophication (kg P eq.)    Human toxicity, cancer (kg 1,4-DB eq.)  Human toxicity, non-cancer (kg 1,4-DB eq.)  Ionizing Radiation (Bq. C-60 eq. to air)    Land use (Annual crop eq. yr)   Marine ecotoxicity (kg 1,4-DB eq.)  Marine Eutrophication (kg N eq.)    Metal depletion (kg Cu eq.) Photochemical Ozone Formation, Ecosystem (kg NOx eq.)   Photochemical Ozone Formation, Human Health (kg NOx eq.)    Stratospheric Ozone Depletion (kg CFC-11 eq.)   Terrestrial Acidification (kg SO2 eq.)  Terrestrial ecotoxicity (kg 1,4-DB eq.)
17  0.771175    0.000000    0.000000    0.039373    0.0 0.00    1.000000    0.786031    0.073375    0.074553    0.134586    0.034095    0.028552    0.211499    0.022997    0.025847    0.212395    0.053519    0.026829    0.024620    0.025341    0.024631    0.151215    0.030528    0.030756    0.025467    0.031049    0.101550
18  1.000000    0.402355    0.402355    0.239088    0.0 0.25    0.666667    0.695845    0.083857    0.228946    0.263804    0.206885    0.202646    0.303901    0.200697    0.202542    0.309232    0.218034    0.202018    0.202028    0.201998    0.201970    0.273589    0.203795    0.204489    0.202037    0.204497    0.244972
19  1.000000    0.586850    0.586850    0.239088    0.0 0.50    0.666667    0.695845    0.083857    0.303682    0.328988    0.285005    0.280641    0.367556    0.279443    0.280932    0.367334    0.294044    0.282275    0.279508    0.281189    0.280340    0.339102    0.282178    0.282627    0.281268    0.282655    0.315859
20  1.000000    0.770363    0.770363    0.239088    0.0 0.75    0.666667    0.695845    0.083857    0.376595    0.398006    0.361139    0.358635    0.429158    0.357491    0.360169    0.429740    0.370054    0.360238    0.359160    0.359162    0.359606    0.406675    0.359736    0.359933    0.359083    0.359743    0.386746
21  1.000000    0.952895    0.952895    0.239088    0.0 1.00    0.666667    0.695845    0.083857    0.511484    0.597393    0.453823    0.444986    0.706366    0.435540    0.440678    0.707338    0.482507    0.442788    0.438812    0.440789    0.440215    0.620931    0.448020    0.448878    0.441143    0.450749    0.549951
22  0.000000    0.409225    0.409225    0.239088    0.0 0.25    0.666667    0.000000    0.000000    0.177907    0.207439    0.158226    0.155292    0.246407    0.153310    0.154661    0.246826    0.168055    0.156157    0.154598    0.154483    0.154501    0.215904    0.155941    0.156276    0.153933    0.156317    0.190570
23  0.000000    0.590775    0.590775    0.239088    0.0 0.50    0.666667    0.000000    0.000000    0.252643    0.279141    0.235353    0.231894    0.314168    0.230662    0.232203    0.313536    0.244065    0.231828    0.232078    0.232456    0.231975    0.287186    0.232673    0.233583    0.231749    0.233405    0.264754
24  0.000000    0.773307    0.773307    0.239088    0.0 0.75    0.666667    0.000000    0.000000    0.327379    0.348160    0.311486    0.309192    0.377823    0.308711    0.309322    0.378093    0.320075    0.309791    0.308472    0.310429    0.309449    0.357231    0.310231    0.310889    0.309564    0.310493    0.337290
25  0.000000    0.952895    0.952895    0.239088    0.0 1.00    0.666667    0.000000    0.000000    0.456799    0.536043    0.404171    0.393454    0.638604    0.385366    0.389831    0.640628    0.429404    0.392341    0.388125    0.390838    0.390954    0.559126    0.396865    0.398171    0.390209    0.399358    0.492252
26  0.812500    0.682041    0.682041    1.000000    0.0 0.00    0.666667    0.031409    0.000000    0.310973    0.359663    0.278385    0.273677    0.420945    0.269686    0.272458    0.421132    0.295085    0.273102    0.272266    0.272661    0.271832    0.373712    0.275578    0.275977    0.272779    0.277302    0.332344
27  0.812500    0.682041    0.682041    1.000000    0.0 0.00    0.000000    0.031409    0.000000    0.331024    0.401840    0.285005    0.276462    0.490760    0.269686    0.273729    0.492145    0.307580    0.275396    0.272266    0.273879    0.272727    0.419036    0.279703    0.280133    0.274194    0.281585    0.362018
28  0.812500    1.000000    1.000000    1.000000    0.0 0.25    0.666667    0.031409    0.000000    0.445862    0.493865    0.414101    0.409471    0.554415    0.405575    0.406780    0.554551    0.431487    0.408393    0.406227    0.409113    0.408867    0.505562    0.411716    0.412303    0.408602    0.412206    0.467524
29  0.813166    0.663395    0.663395    1.000000    0.0 0.00    0.666667    1.000000    1.000000    0.863288    0.861963    0.860973    0.860724    0.866530    0.860627    0.860169    0.866581    0.864640    0.862417    0.862419    0.862329    0.861173    0.868150    0.859736    0.858687    0.862762    0.860814    0.863172
30  0.813166    0.663395    0.663395    1.000000    0.0 0.00    0.000000    1.000000    1.000000    0.883339    0.904141    0.867594    0.860724    0.936345    0.860627    0.864407    0.939746    0.877135    0.864710    0.862419    0.863548    0.861173    0.913473    0.867987    0.866999    0.864177    0.865096    0.894494
31  0.813166    0.986261    0.986261    1.000000    0.0 0.25    0.666667    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    1.00000
Asked By: JZ0

Answers:

Create a sample DataFrame:

import pandas as pd

df = pd.DataFrame({
    'feature1': list(range(1, 21)),
    'feature2': list(range(21, 41)),
    'target': list(range(31, 51))
})

It looks like this:

    feature1  feature2  target
0          1        21      31
1          2        22      32
2          3        23      33
3          4        24      34
4          5        25      35
5          6        26      36
6          7        27      37
7          8        28      38
8          9        29      39
9         10        30      40
10        11        31      41
11        12        32      42
12        13        33      43
13        14        34      44
14        15        35      45
15        16        36      46
16        17        37      47
17        18        38      48
18        19        39      49
19        20        40      50

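Extract the last rows. Here n is negative, so df[n:] selects the last 5 rows and df[:n] selects everything before them (change n to suit your data):

n = -5

# extract the last 5 rows; .copy() keeps later edits from touching df
to_be_added = df[n:].copy()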

Use train_test_split to split the remaining rows:

from sklearn.model_selection import train_test_split

# all rows except the last 5
remaining = df[:n]
Y = remaining['target']

# drop returns a new frame, so df itself keeps its 'target' column
X = remaining.drop(columns='target')

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25)

After splitting, y_train looks like this (the split is random, so the rows selected will vary between runs):

0     31
1     32
3     34
4     35
7     38
8     39
10    41
11    42
12    43
13    44
14    45
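
The rows that end up in y_train change on every run because train_test_split shuffles by default. If a repeatable split is wanted, a minimal sketch is to pass random_state; the seed 42 below is an arbitrary choice for illustration, and the frame is rebuilt here so the snippet runs on its own:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "feature1": range(1, 21),
    "feature2": range(21, 41),
    "target": range(31, 51),
})

n = -5
rest = df[:n]                          # the 15 rows that take part in the random split
X_rest = rest.drop(columns="target")
y_rest = rest["target"]

# random_state=42 is an arbitrary seed; any fixed integer gives a repeatable split
split_a = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)
split_b = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

# train_test_split returns [X_train, X_test, y_train, y_test]
y_train_a, y_train_b = split_a[2], split_b[2]
print(len(y_train_a), len(split_a[3]))           # 11 4
print(y_train_a.index.equals(y_train_b.index))   # True
```

With 15 rows and test_size=0.25, scikit-learn rounds the test set up to 4 rows, which leaves the 11 training rows seen in the output above.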

Now concatenate X_train and y_train with the extracted rows:

y_train = pd.concat([y_train, to_be_added["target"]])
# drop returns a new frame, so to_be_added is left unchanged
X_train = pd.concat([X_train, to_be_added.drop(columns="target")])

print(y_train)

0     31
1     32
3     34
4     35
7     38
8     39
10    41
11    42
12    43
13    44
14    45
15    46
16    47
17    48
18    49
19    50

The last 5 rows (indices 15 to 19) are now part of the training set.
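
For the numbers in the question (32 rows, the last 15 forced into training), the steps in this answer can be wrapped into one function. This is only a sketch under stated assumptions: split_with_pinned_rows is a hypothetical helper name, the small frame stands in for the asker's data, and "25% for testing" is read as 25% of the whole dataset, so the test size is passed to train_test_split as an absolute row count rather than a fraction of the remaining rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_with_pinned_rows(df, target_cols, n_pinned, test_fraction=0.25, seed=None):
    """Randomly split df, forcing the last n_pinned rows into the training set.

    Hypothetical helper: the test set holds test_fraction of the WHOLE frame
    and is drawn only from the rows that are not pinned.
    """
    if n_pinned <= 0:
        raise ValueError("n_pinned must be positive")

    pinned = df.iloc[-n_pinned:]   # rows that must end up in training
    rest = df.iloc[:-n_pinned]     # rows that are split at random

    n_test = round(test_fraction * len(df))   # absolute number of test rows
    if not 0 < n_test < len(rest):
        raise ValueError("not enough unpinned rows for the requested test size")

    # an integer test_size is interpreted as a number of rows
    rest_train, test = train_test_split(rest, test_size=n_test, random_state=seed)
    train = pd.concat([rest_train, pinned])

    X_train, y_train = train.drop(columns=target_cols), train[target_cols]
    X_test, y_test = test.drop(columns=target_cols), test[target_cols]
    return X_train, X_test, y_train, y_test


# Stand-in for the asker's 32-row frame (these column names are made up)
df = pd.DataFrame({
    "feature1": range(32),
    "feature2": range(100, 132),
    "target": range(200, 232),
})

X_train, X_test, y_train, y_test = split_with_pinned_rows(
    df, target_cols=["target"], n_pinned=15, seed=0
)
print(len(X_train), len(X_test))                    # 24 8
print(set(df.index[-15:]) <= set(X_train.index))    # True
```

For the scaled data in the question, the call would presumably take scaled_df and the list of 19 target column names in place of the stand-in frame. If 25% of the remaining rows is what is wanted instead, pass test_size=0.25 to train_test_split as the answer does.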

Answered By: srinath