Taking two samples from the data but with different observations

Question:

My data consists of about 9000 observations and 20 features (edit: it is a pandas DataFrame). I’ve taken a sample of 200 observations like this and run some analysis on it:

sample_data = data.sample(n = 200)

Now I want to randomly take a sample of 1000 observations from the original data, with none of the observations that showed up in the previous n = 200 sample. How do I do that?

Asked By: Kev

Answers:

If your data is a pandas.DataFrame, you can drop the rows that were already sampled and then draw the 1000 new observations from the remaining data:

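# index labels of the rows already drawn in the first sample
prev_sample_index = sample_data.index
# everything in the original data except those rows
filtered_data = data.drop(prev_sample_index)
# the second sample cannot overlap with the first one
new_sample = filtered_data.sample(n = 1000)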
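
To check that the two samples really are disjoint, here is a minimal self-contained sketch of the same idea. The synthetic DataFrame, the column names and the random_state values are assumptions made up for illustration and are not from the question. It also assumes the index labels of data are unique (true for the default RangeIndex); if they are not, calling data.reset_index(drop=True) first is one way to get unique labels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 9000 rows, 20 features.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(9000, 20)),
    columns=[f"f{i}" for i in range(20)],
)

# First sample; random_state is only here to make the run reproducible.
sample_data = data.sample(n=200, random_state=1)

# Keep only the rows whose index label is NOT in the first sample.
# With a unique index this selects the same rows as data.drop(sample_data.index).
remaining = data.loc[~data.index.isin(sample_data.index)]

# Second sample, drawn only from the rows that are left.
new_sample = remaining.sample(n=1000, random_state=2)

# Index labels shared by both samples; this should be empty.
overlap = new_sample.index.intersection(sample_data.index)
print(len(remaining), len(new_sample), len(overlap))
```

If both sample sizes are known up front, another option is to draw 1200 rows in a single sample call and split them into 200 and 1000, which also gives two non-overlapping samples.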
Answered By: JayPeerachai