bigdata

Problem reading data from a file with pandas in Python (pandas.io.parsers.TextFileReader)

Question: I want to read a dataset from a file with pandas. pd.read_csv() reads the file without error, but when I try to view the dataframe, this appears instead: pandas.io.parsers.TextFileReader at 0x1b3b6b3e198. As additional information, the file is very large (around 9 GB) …

Total answers: 2

Sklearn-GMM on large datasets

Question: I have a large dataset (I can’t fit the entire data in memory). I want to fit a GMM on this dataset. Can I use GMM.fit() (sklearn.mixture.GMM) repeatedly on mini-batches of data? Asked by: abilng. Answers: There is no reason to fit it repeatedly. Just randomly …

Total answers: 4

sklearn and large datasets

Question: I have a dataset of 22 GB. I would like to process it on my laptop. Of course I can’t load it in memory. I use sklearn a lot, but for much smaller datasets. In this situation the classical approach should be something like: read only part of the data …

Total answers: 4

Would SQLite for Python be helpful for a file this size?

Question: I have a file of around 60 million lines. I am trying to constantly query the file to find information for a list of names. Each line in the file contains a name followed by relevant information. I tried to build a dictionary …

Total answers: 1

Querying relational data in a reasonable amount of time

Question: I have a spreadsheet with about 1.7m lines, totalling 1 GB, and need to perform queries on it. Being most comfortable with Python, my first approach was to hack together a bunch of dictionaries keyed in a way that would facilitate the queries I was …

Total answers: 3

How to get started with Big Data Analysis

Question: I’ve been a long-time user of R and have recently started working with Python. Using conventional RDBMS systems for data warehousing, and R/Python for number-crunching, I now feel the need to get my hands dirty with Big Data Analysis. I’d like to know how to …

Total answers: 2