large-data

How to efficiently filter a large Python list?

Question: I have a relatively large list called allListings and want to pull out all rows where row[14] == listingID. This is the code I am using: tempRows = list(filter(lambda x: x[14] == listingID, allListings)) The filtering is repeated in a for loop for every distinct listingID …
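Since the same list is re-scanned once per listingID, a common speedup is to bucket the rows by ID in a single pass and then do O(1) lookups. A minimal sketch; the toy rows are placeholders:

    from collections import defaultdict

    # Toy stand-in for allListings: rows whose index-14 field holds the ID.
    allListings = [list(range(14)) + [lid] for lid in ("A", "B", "A", "C")]

    # Bucket the rows by listing ID in one O(n) pass instead of
    # re-scanning the whole list with filter() for every listingID.
    rows_by_listing = defaultdict(list)
    for row in allListings:
        rows_by_listing[row[14]].append(row)

    # Each per-ID lookup is now O(1):
    tempRows = rows_by_listing["A"]
    print(len(tempRows))  # -> 2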

Total answers: 2

How to delete table rows but keep the top X rows

Question: I have a large table in MySQL, incident_archive, with millions of records. I want to sort the rows by the created column, keep the top X rows, and delete the rest. What is the most efficient way to do this? So far …
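One workable pattern (a sketch, not necessarily what the answer recommends): look up the created timestamp of the X-th newest row, then delete strictly older rows in small batches so locks and transactions stay short. The connection details, KEEP value, and batch size are placeholders, and mysql-connector-python is an assumed driver:

    import mysql.connector  # assumed driver; any DB-API connector works

    KEEP = 1000  # hypothetical number of newest rows to keep

    conn = mysql.connector.connect(
        user="user", password="secret", database="mydb"  # placeholders
    )
    cur = conn.cursor()

    # Cutoff: the `created` value of the KEEP-th newest row.
    cur.execute(
        "SELECT created FROM incident_archive "
        "ORDER BY created DESC LIMIT 1 OFFSET %s",
        (KEEP - 1,),
    )
    (cutoff,) = cur.fetchone()

    # Delete strictly older rows in batches (ties at the cutoff are kept).
    while True:
        cur.execute(
            "DELETE FROM incident_archive WHERE created < %s LIMIT 10000",
            (cutoff,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break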

Total answers: 1

Radial heatmap from similarity matrix in Python

Question: Summary: I have a 2880×2880 similarity matrix (~8.3 million points). My attempt with Holoviews resulted in a 500 MB HTML file which never finishes "opening". So how do I make a round heatmap of the matrix? Details: I had data from 10 different places, measured over 1 …
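A 500 MB HTML file suggests every cell was serialized into the page; rendering to a static raster sidesteps that entirely. A sketch using matplotlib's polar pcolormesh, mapping one matrix axis to the angle and the other to the radius (the random matrix stands in for the real one):

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical stand-in for the 2880x2880 similarity matrix.
    sim = np.random.rand(2880, 2880)

    # Cell edges: angle needs ncols+1 values, radius needs nrows+1.
    theta = np.linspace(0, 2 * np.pi, sim.shape[1] + 1)
    r = np.arange(sim.shape[0] + 1)

    fig = plt.figure(figsize=(8, 8))
    ax = fig.add_subplot(projection="polar")
    ax.pcolormesh(theta, r, sim, cmap="viridis")
    ax.set_yticklabels([])  # radial tick labels add clutter at this size
    fig.savefig("radial_heatmap.png", dpi=150)  # static file, not 500 MB of HTML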

Total answers: 2

How to handle a large volume of data in a single array using xarray?

How to handle a large volume of data in a single array using xarray? Question: I have 16 years of daily meteorological data in NetCDF; each day's data is a grid of 501 x 572, so each year has dimensions of 365 x 501 x 572. I converted it into a one-dimensional …
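Rather than flattening the data, xarray plus dask can present the 16 yearly files as one lazily chunked array that never has to fit in RAM at once. A sketch; the file pattern and variable name are assumptions:

    import xarray as xr

    # Lazily combine the yearly NetCDF files along the time axis,
    # reading in dask chunks of one year at a time.
    ds = xr.open_mfdataset(
        "data_*.nc",
        combine="by_coords",
        chunks={"time": 365},
    )

    # Operations stay lazy until .compute() is called, e.g. a
    # long-term daily mean over the whole 501 x 572 grid:
    climatology = ds["precip"].mean(dim="time").compute()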

Total answers: 1

Problem reading data from a file with pandas in Python (pandas.io.parsers.TextFileReader)

Question: I want to read a dataset from a file with pandas, but when I use pd.read_csv() the program reads it, and when I try to view the dataframe I get: pandas.io.parsers.TextFileReader at 0x1b3b6b3e198. As additional information, the file is very large (around 9 GB) …
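pd.read_csv returns a pandas.io.parsers.TextFileReader instead of a DataFrame whenever chunksize= (or iterator=True) is passed; the object is meant to be iterated over. A sketch for a file too big to load whole; the path and the filter on a "value" column are placeholders:

    import pandas as pd

    # chunksize= makes read_csv yield DataFrames of 100k rows each
    # rather than loading all ~9 GB at once.
    reader = pd.read_csv("data.csv", chunksize=100_000)

    parts = []
    for chunk in reader:
        parts.append(chunk[chunk["value"] > 0])  # keep only rows of interest

    # Concatenate only the reduced pieces into one DataFrame.
    df = pd.concat(parts, ignore_index=True)
    print(df.head())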

Total answers: 2

Writing large Pandas DataFrames to a CSV file in chunks

Question: How do I write large data files out to a CSV file in chunks? I have a set of large data files (1M rows x 20 cols), but only 5 or so columns of each file are of interest to me. I want …
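A sketch of one chunked approach, assuming the five interesting columns are known by name; all column names and paths here are hypothetical:

    import pandas as pd

    # Stream: read only the needed columns chunk by chunk and append
    # each chunk to the output CSV, so neither file is ever fully in RAM.
    cols = ["a", "b", "c", "d", "e"]

    with open("subset.csv", "w", newline="") as out:
        for i, chunk in enumerate(
            pd.read_csv("big_input.csv", usecols=cols, chunksize=200_000)
        ):
            chunk.to_csv(out, header=(i == 0), index=False)  # header only once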

Total answers: 3

Python tools for out-of-core computation/data mining

Python tools for out-of-core computation/data mining Question: I am interested in using Python to mine data sets too big to fit in RAM but small enough to sit on a single HD. I understand that I can export the data as HDF5 files using PyTables, and numexpr allows for some basic out-of-core computation. What would come next? Mini-batching when …
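Since this question was asked, dask has become a common next step after PyTables/numexpr: it partitions on-disk data and streams partitions through RAM, executing the computation graph only on .compute(). A sketch; the HDF5 path, key, and column names are assumptions, and the store must be in table format for chunked reads:

    import dask.dataframe as dd

    # Out-of-core DataFrame backed by a chunked HDF5 store.
    df = dd.read_hdf("store.h5", key="/table")

    # Lazy groupby-aggregate over data larger than RAM.
    means = df.groupby("category")["value"].mean().compute()
    print(means)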

Total answers: 4

"Large data" workflows using pandas

"Large data" workflows using pandas Question: I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work and it is great for it’s out-of-core support. However, SAS is horrible as a piece of software for numerous other reasons. One day I hope …

Total answers: 16

Shared memory in multiprocessing

Question: I have three large lists. The first contains bitarrays (module bitarray 0.8.0) and the other two contain arrays of integers. l1 = [bitarray 1, bitarray 2, …, bitarray n] l2 = [array 1, array 2, …, array n] l3 = [array 1, array 2, …, array n] These data structures take quite a bit of …
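One way to stop every worker process from copying the big lists (Python 3.8+): put the numeric data in a multiprocessing.shared_memory block and wrap it in NumPy views. A sketch for the integer arrays; the bitarrays could be shared similarly via their raw bytes:

    import numpy as np
    from multiprocessing import Process
    from multiprocessing.shared_memory import SharedMemory

    def worker(name, shape, dtype):
        # Attach to the existing block; this is a view, not a copy.
        shm = SharedMemory(name=name)
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        print("worker sees sum:", arr.sum())
        del arr      # release the view before closing the block
        shm.close()

    if __name__ == "__main__":
        data = np.arange(1_000_000, dtype=np.int64)
        shm = SharedMemory(create=True, size=data.nbytes)
        shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
        shared[:] = data  # single copy into shared memory

        p = Process(target=worker, args=(shm.name, data.shape, data.dtype))
        p.start()
        p.join()

        del shared
        shm.close()
        shm.unlink()  # free the block once all processes are done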

Total answers: 5