Problem reading data from a file with pandas in Python (pandas.io.parsers.TextFileReader)
Question:
I want to read a dataset from a file with pandas. When I use pd.read_csv(), the file is read, but when I try to look at the DataFrame, this appears instead:
pandas.io.parsers.TextFileReader at 0x1b3b6b3e198
Additionally, the file is very large (around 9 GB). It uses the vertical bar (|) as a separator. I tried using chunksize, but it doesn't work:
import pandas as pd
df = pd.read_csv(r"C:\Users\dguerr\Documents\files\Automotive\target_file", iterator=True, sep='|', chunksize=1000)
I want to load my data as a regular pandas DataFrame.
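A minimal reproduction of this behaviour, assuming a tiny made-up pipe-separated sample held in memory (io.StringIO) in place of the real 9 GB file:

```python
import io

import pandas as pd

# Hypothetical pipe-separated sample standing in for the real file.
data = "a|b\n1|x\n2|y\n3|z\n"

# With chunksize (or iterator=True), read_csv returns a TextFileReader,
# not a DataFrame; displaying it only shows the object's repr.
reader = pd.read_csv(io.StringIO(data), sep="|", chunksize=2)
print(type(reader).__name__)  # TextFileReader

# Iterating the reader yields ordinary DataFrames of up to `chunksize` rows.
sizes = [len(chunk) for chunk in reader]
print(sizes)  # [2, 1]
```

Each chunk is a normal DataFrame; the TextFileReader itself only describes how to read them.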
Answers:
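Passing chunksize (or iterator=True) makes pd.read_csv return a TextFileReader, an iterator that yields DataFrames, instead of a DataFrame; that object is what you are seeing. You can load the file chunk by chunk by iterating over it: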
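import pandas as pd

path_to_file = "C:/Users/dguerr/Documents/Acxiom files/Automotive/auto_model_target_file"
chunk_size = 1000

# The file is pipe-separated, so pass sep='|'; each chunk is an ordinary DataFrame
for chunk in pd.read_csv(path_to_file, sep='|', chunksize=chunk_size):
    # do your stuff with `chunk` here (up to chunk_size rows at a time)
    print(chunk.shape)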
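A sketch of the two usual ways to use those chunks, again on a small hypothetical in-memory sample (the columns `make` and `price` are made up for illustration). Concatenating rebuilds one regular DataFrame but needs the whole dataset to fit in RAM, which may not hold for a 9 GB file; reducing each chunk as it arrives keeps memory bounded.

```python
import io

import pandas as pd

# Hypothetical pipe-separated sample; replace io.StringIO(data) with the real path.
data = "make|price\nford|100\nbmw|250\nford|150\n"

# Option 1: stitch all chunks back into a single DataFrame.
# Only viable if the full dataset fits in memory.
reader = pd.read_csv(io.StringIO(data), sep="|", chunksize=2)
df = pd.concat(reader, ignore_index=True)
print(df.shape)  # (3, 2)

# Option 2: reduce each chunk as it is read and keep only the running result.
# Here: total price per make, accumulated across chunks.
totals = None
for chunk in pd.read_csv(io.StringIO(data), sep="|", chunksize=2):
    part = chunk.groupby("make")["price"].sum()
    totals = part if totals is None else totals.add(part, fill_value=0)
print(totals.to_dict())
```

Option 2 is the safer default for very large files; switch to option 1 only once you know the result fits in memory.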
You might also want to check the file's encoding. pd.read_csv defaults to UTF-8; if the file is actually encoded as Latin-1, for instance, reading it as UTF-8 can fail with decoding errors.
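import pandas as pd

reader = pd.read_csv('C:/Users/dguerr/Documents/Acxiom files/Automotive/auto_model_target_file',
                     sep='|', encoding='latin-1', chunksize=1000)
# With chunksize this is still a TextFileReader; combine the chunks if the data fits in memory
df = pd.concat(reader, ignore_index=True)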
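A small sketch of the encoding problem, assuming a hypothetical Latin-1 encoded sample built in memory (the real file's encoding is not stated in the question). In Latin-1, "ë" is the single byte 0xEB, which is not valid UTF-8 when followed by an ordinary ASCII character:

```python
import io

import pandas as pd

# Hypothetical pipe-separated sample, encoded as Latin-1 bytes.
raw = "make|model\nrenault|clio\ncitroën|c3\n".encode("latin-1")

# Default encoding (UTF-8) cannot decode the 0xEB byte.
try:
    pd.read_csv(io.BytesIO(raw), sep="|")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Declaring the actual encoding reads the file correctly.
df = pd.read_csv(io.BytesIO(raw), sep="|", encoding="latin-1")
print(utf8_failed)
print(df["make"].tolist())
```

If you are unsure of the encoding, trying a small sample with nrows= before reading the whole file is cheaper than failing partway through 9 GB.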