Python Pandas equivalent in JavaScript

Question:

With this CSV example:

   Source,col1,col2,col3
   foo,1,2,3
   bar,3,4,5

The standard way I use Pandas is this:

  1. Parse CSV

  2. Select columns into a data frame (col1 and col3)

  3. Process the columns (e.g. average the values of col1 and col3)

Is there a JavaScript library that does that like Pandas?

Asked By: neversaint


Answers:

Caveat: the following applies only to d3 v3, not the latest d3 v4!

I am partial to d3.js, and while it won’t be a total replacement for Pandas, if you spend some time learning its paradigm, it should be able to take care of all your data wrangling for you. (And if you wind up wanting to display results in the browser, it’s ideally suited to that.)

Example. My CSV file data.csv:

name,age,color
Mickey,65,black
Donald,58,white
Pluto,64,orange

In the same directory, create an index.html containing the following:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8"/>
    <title>My D3 demo</title>

    <script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>
  </head>
  <body>

      <script charset="utf-8" src="demo.js"></script>
  </body>
</html>

and also a demo.js file containing the following:

d3.csv('/data.csv',

       // How to format each row. Since the CSV file has a header, `row` will be
       // an object with keys derived from the header.
       function(row) {
         return {name : row.name, age : +row.age, color : row.color};
       },

       // Callback to run once all data's loaded and ready.
       function(data) {
         // Log the data to the JavaScript console
         console.log(data);

         // Compute some interesting results
         var averageAge = data.reduce(function(prev, curr) {
           return prev + curr.age;
         }, 0) / data.length;

         // Also, display it
         var ulSelection = d3.select('body').append('ul');
         var valuesSelection =
             ulSelection.selectAll('li').data(data).enter().append('li').text(
                 function(d) { return d.age; });
         var totalSelection =
             ulSelection.append('li').text('Average: ' + averageAge);
       });

In the directory, run python -m SimpleHTTPServer 8181 (with Python 3, the equivalent is python3 -m http.server 8181), and open http://localhost:8181 in your browser to see a simple listing of the ages and their average.

This simple example shows a few relevant features of d3:

  • Excellent support for ingesting online data (CSV, TSV, JSON, etc.)
  • Data wrangling smarts baked in
  • Data-driven DOM manipulation (maybe the hardest thing to wrap one’s head around): your data gets transformed into DOM elements.
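
The accessor-plus-reduce pattern from demo.js carries over to the CSV in the question. Here is a minimal sketch in plain JavaScript, assuming the rows have already been parsed into objects with string values, the way d3.csv hands them to the row accessor. The helper names selectColumns and columnMean are hypothetical and not part of d3.

```javascript
// Rows as a CSV parser would deliver them: one object per line, all values strings.
const rawRows = [
  { Source: 'foo', col1: '1', col2: '2', col3: '3' },
  { Source: 'bar', col1: '3', col2: '4', col3: '5' },
];

// Plays the role of the row accessor: keep col1 and col3, coerce to numbers.
function selectColumns(row) {
  return { col1: +row.col1, col3: +row.col3 };
}

// Hypothetical helper: arithmetic mean of one numeric column.
function columnMean(rows, column) {
  const total = rows.reduce((sum, row) => sum + row[column], 0);
  return total / rows.length;
}

const rows = rawRows.map(selectColumns);
const averages = {
  col1: columnMean(rows, 'col1'),
  col3: columnMean(rows, 'col3'),
};
console.log(averages); // { col1: 2, col3: 4 }
```
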
Answered By: Ahmed Fasih

It’s pretty easy to parse CSV in JavaScript because each line is already essentially a JavaScript array. If you load your CSV into an array of strings (one per line), it’s pretty easy to load an array of arrays with the values:

var pivot = function(data){
    var result = [];
    for (var i = 0; i < data.length; i++){
        for (var j=0; j < data[i].length; j++){
            if (i === 0){
                result[j] = [];
            }
            result[j][i] = data[i][j];
        }
    }
    return result;
};

var getData = function() {
    var csvString = $(".myText").val();
    var csvLines = csvString.split(/\n?$/m);

    var dataTable = [];

    for (var i = 0; i < csvLines.length; i++){
        var values;
        eval("values = [" + csvLines[i] + "]");
        dataTable[i] = values;
    }

    return pivot(dataTable);
};

Then getData() returns a multidimensional array of values by column.

I’ve demonstrated this in a jsFiddle for you.

Of course, you can’t do it quite this easily if you don’t trust the input – if there could be script in your data which eval might pick up, etc.
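
If the input can't be trusted, one way to avoid eval is to split each line on commas instead. This is only a sketch under simplifying assumptions: no quoted fields, no embedded commas, and every row has the same number of cells. The function names parseSimpleCsv and toColumns are hypothetical.

```javascript
// Split CSV text into rows of cells without eval. Assumes no quoted fields.
function parseSimpleCsv(csvString) {
  return csvString
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => line.split(',').map((cell) => {
      const trimmed = cell.trim();
      // Convert numeric-looking cells to numbers; leave other text as strings.
      return trimmed !== '' && !Number.isNaN(Number(trimmed))
        ? Number(trimmed)
        : trimmed;
    }));
}

// Same idea as pivot() above: turn an array of rows into an array of columns.
function toColumns(rows) {
  return rows[0].map((_, j) => rows.map((row) => row[j]));
}

const columns = toColumns(
  parseSimpleCsv('Source,col1,col2,col3\nfoo,1,2,3\nbar,3,4,5\n')
);
console.log(columns[1]); // [ 'col1', 1, 3 ]
```
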

Answered By: Steve K

I think the closest things are libraries like Recline and Miso.

Recline in particular has a Dataset object with a structure somewhat similar to Pandas data frames. It then allows you to connect your data with “Views” such as a data grid, graphing, maps etc. Views are usually thin wrappers around existing best of breed visualization libraries such as D3, Flot, SlickGrid etc.

Here’s an example for Recline:

// Load some data
var dataset = new recline.Model.Dataset({
  records: [
    { value: 1, date: '2012-08-07' },
    { value: 5, date: '2013-09-07' }
  ]
  // Load CSV data instead
  // (And Recline has support for many more data source types)
  // url: 'my-local-csv-file.csv',
  // backend: 'csv'
});

// get an element from your HTML for the viewer
var $el = $('#data-viewer');

var allInOneDataViewer = new recline.View.MultiView({
  model: dataset,
  el: $el
});
// Your new Data Viewer will be live!
Answered By: Rufus Pollock

I’ve been working on a data wrangling library for JavaScript called data-forge. It’s inspired by LINQ and Pandas.

It can be installed like this:

npm install --save data-forge

Your example would work like this:

var csvData = "Source,col1,col2,col3\n" +
    "foo,1,2,3\n" +
    "bar,3,4,5\n";

var dataForge = require('data-forge');
var dataFrame = 
    dataForge.fromCSV(csvData)
        .parseInts([ "col1", "col2", "col3" ])
        ;

If your data was in a CSV file you could load it like this:

var dataFrame = dataForge.readFileSync(fileName)
    .parseCSV()
    .parseInts([ "col1", "col2", "col3" ])
    ;

You can use the select method to transform rows.

You can extract a column using getSeries then use the select method to transform values in that column.

You get your data back out of the data-frame like this:

var data = dataFrame.toArray();

To average a column:

 var avg = dataFrame.getSeries("col1").average();

There is much more you can do with this.

You can find more documentation on npm.

Answered By: Ashley Davis

This wiki will summarize and compare many pandas-like Javascript libraries.

In general, you should check out the d3 Javascript library. d3 is a very useful "swiss army knife" for handling data in Javascript, just like pandas is helpful for Python. You may see d3 used frequently like pandas, even if d3 is not exactly a DataFrame/Pandas replacement (i.e. d3 doesn’t have the same API; d3 does not have Series / DataFrame classes with methods that match the pandas behavior).

Ahmed’s answer explains how d3 can be used to achieve some DataFrame functionality, and some of the libraries below were inspired by things like LearnJsData which uses d3 and lodash.

As for DataFrame-style data transformation (splitting, joining, group by, etc.), here is a quick list of some of the Javascript libraries.

Note the libraries are written in different languages, including…

  • browser-compatible aka client-side Javascript
  • Node.js aka Server-side Javascript
  • Typescript
  • Some even use CPython transpiled to WebAssembly (but work with Node.js and/or browsers)

…so use the option that’s right for you:

  • Pyodide (browser-support AND Nodejs-support)
  • danfo-js (browser-support AND NodeJS-support)
    • From Vignesh’s answer

    • danfo (which is often imported and aliased as dfd); has a basic DataFrame-type data structure, with the ability to plot directly

    • Introduced on the TensorFlow blog: "One of the main goals of Danfo.js is to bring data processing, machine learning and AI tools to JavaScript developers. … Open-source libraries like Numpy and Pandas…"

    • pandas is built on top of numpy; likewise danfo-js is built on tensorflow-js

    • please note danfo may not (yet?) support multi-column indexes

  • pandas-js
    • UPDATE: The pandas-js repo has not been updated in a while
    • From STEEL’s and Feras’ answers
    • "pandas.js is an open source (experimental) library mimicking the Python pandas library. It relies on Immutable.js as the NumPy logical equivalent. The main data objects in pandas.js are, like in Python pandas, the Series and the DataFrame."
  • dataframe-js
    • "DataFrame-js provides an immutable data structure for javascript and datascience, the DataFrame, which allows to work on rows and columns with a sql and functional programming inspired api."
  • data-forge
  • jsdataframe
    • "Jsdataframe is a JavaScript data wrangling library inspired by data frame functionality in R and Python Pandas."
  • dataframe
    • "explore data by grouping and reducing."
  • SQL Frames
    • "DataFrames meet SQL, in the Browser"
    • "SQL Frames is a low code data management framework that can be directly embedded in the browser to provide rich data visualization and UX. Complex DataFrames can be composed using familiar SQL constructs. With its powerful built-in analytics engine, data sources can come in any shape, form and frequency and they can be analyzed directly within the browser. It allows scaling to big data backends by transpiling the composed DataFrame logic to SQL."
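
To make "group by" concrete for readers comparing these libraries, here is a minimal plain-JavaScript sketch of what a grouped mean does. It does not use any of the libraries listed; the sample rows and the groupMean helper are hypothetical.

```javascript
// Hypothetical sample data: one object per row.
const sampleRows = [
  { Source: 'foo', col1: 1 },
  { Source: 'bar', col1: 3 },
  { Source: 'foo', col1: 5 },
];

// Group rows by a key column, then reduce each group to the mean of a value column.
function groupMean(rows, keyColumn, valueColumn) {
  const groups = new Map();
  for (const row of rows) {
    const group = groups.get(row[keyColumn]) || { sum: 0, count: 0 };
    group.sum += row[valueColumn];
    group.count += 1;
    groups.set(row[keyColumn], group);
  }
  const result = {};
  for (const [key, { sum, count }] of groups) {
    result[key] = sum / count;
  }
  return result;
}

const meansBySource = groupMean(sampleRows, 'Source', 'col1');
console.log(meansBySource); // { foo: 3, bar: 3 }
```
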

Then after coming to this question, checking other answers here and doing more searching, I found options like:

  • Apache Arrow in JS
    • Thanks to user Back2Basics’ suggestion:
    • "Apache Arrow is a columnar memory layout specification for encoding vectors and table-like containers of flat and nested data. Apache Arrow is the emerging standard for large in-memory columnar data (Spark, Pandas, Drill, Graphistry, …)"
  • polars
  • Observable
    • At first glance, seems like a JS alternative to the IPython/Jupyter "notebooks"
    • Observable’s page promises: "Reactive programming", a "Community", on a "Web Platform"
    • See 5 minute intro here
  • portal.js (formerly recline; from Rufus’ answer)
    • MAY BE OUTDATED: Does not use a "DataFrame" API
    • MAY BE OUTDATED: Instead emphasizes its "Multiview" (the UI) API, (similar to jQuery/DOM model) which doesn’t require jQuery but does require a browser! More examples
    • MAY BE OUTDATED: Also emphasizes its MVC-ish architecture; including back-end stuff (i.e. database connections)
  • js-data
    • Really more of an ORM! Most of its modules correspond to different data storage questions (js-data-mongodb, js-data-redis, js-data-cloud-datastore), sorting, filtering, etc.
    • On plus-side does work on Node.js as a first-priority; "Works in Node.js and in the Browser."
  • miso (another suggestion from Rufus)
  • AlaSQL
    • "AlaSQL is an open source SQL database for Javascript with a strong focus on query speed and data source flexibility for both relational data and schemaless data. It works in your browser, Node.js, and Cordova."
  • Some thought experiments:

Here are the criteria we used to consider the above choices

  • General Criteria
    • Language (NodeJS vs browser JS vs Typescript)
    • Dependencies (i.e. whether it uses an underlying library / AJAX / remote APIs)
    • Actively supported (active user-base, active source repository, etc)
    • Size/speed of JS library
  • Pandas’ criteria in its R comparison
    • Performance
    • Functionality/flexibility
    • Ease-of-use
  • Similarity to Pandas / DataFrame APIs
    • Specifically hits on their main features
    • Data-science emphasis
    • Built-in visualization functions
    • Demonstrated integration in combination with other tools like Jupyter
      (interactive notebooks), etc
Answered By: The Red Pea

Here is a dynamic approach, assuming a header exists on line 1. The CSV is loaded with d3.js.

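function csvToColumnArrays(csv) {
    // Column names come from the keys of the first parsed row.
    var mainObj = {},
        header = Object.keys(csv[0]);

    for (var i = 0; i < header.length; i++) {
        mainObj[header[i]] = [];
    }

    // Append each row's value to the matching column array.
    csv.forEach(function(d) {
        for (var key in mainObj) {
            mainObj[key].push(d[key]);
        }
    });

    return mainObj;
}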


d3.csv(path, function(csv) {

    var df = csvToColumnArrays(csv);         

});

Then you are able to access each column of the data, similar to an R, Python or Matlab data frame, with df.column_header[row_number].
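
As a usage sketch, here is how a column average could be computed on that structure. It assumes the question's CSV and that no row accessor was passed to d3.csv, so every value is still a string. The df literal stands in for the result of csvToColumnArrays, and meanOfColumn is a hypothetical helper.

```javascript
// Shape that csvToColumnArrays would return for the question's CSV
// (values stay strings when no row accessor converts them).
const df = {
  Source: ['foo', 'bar'],
  col1: ['1', '3'],
  col2: ['2', '4'],
  col3: ['3', '5'],
};

// Hypothetical helper: mean of one column, coercing strings to numbers first.
function meanOfColumn(frame, columnName) {
  const values = frame[columnName].map(Number);
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

console.log(df.col3[1]);               // 5 (still a string)
console.log(meanOfColumn(df, 'col1')); // 2
console.log(meanOfColumn(df, 'col3')); // 4
```
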

Answered By: Manuel

Below is an example in Python with numpy and pandas:

```
import numpy as np
import pandas as pd

data_frame = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], [1, 2, 3, 4])

# Add a new column labelled 5 with random integers
data_frame[5] = np.random.randint(1, 50, 5)

# Select rows C and D, columns 2 and 3
print(data_frame.loc[['C', 'D'], [2, 3]])

# axis=1 drops a column; axis=0 would drop a row
data_frame.drop(5, axis=1, inplace=True)

print(data_frame)
```

The same can be achieved in JavaScript (note: numjs works only with Node.js).
D3.js has much more advanced options for loading data files. Both numjs and pandas-js are still works in progress.

import np from 'numjs';
import { DataFrame } from 'pandas-js';

const df = new DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], [1, 2, 3, 4])

// df
/*

          1         2         3         4
A  0.023126  1.078130 -0.521409 -1.480726
B  0.920194 -0.201019  0.028180  0.558041
C -0.650564 -0.505693 -0.533010  0.441858
D -0.973549  0.095626 -1.302843  1.109872
E -0.989123 -1.382969 -1.682573 -0.637132

*/

Answered By: STEEL

Pandas.js is an experimental library at the moment, but it seems very promising. Under the hood it uses immutable.js and NumPy-like logic, and both data objects, Series and DataFrame, are there.

Update (10-Feb-2021): as @jarthur mentioned, this repo does not seem to have been updated in the last 4 years.

Answered By: 0xFK

@neversaint, your wait is over. Say welcome to Danfo.js, a pandas-like JavaScript library built on tensorflow.js that supports tensors out of the box. This means you can convert Danfo data structures to tensors, and you can do groupby, merging, joining, plotting and other data processing.

Answered By: Vignesh Prajapati

Arquero is a library for handling relational data, with syntax similar to the popular R package dplyr (which is itself somewhat SQL-like).
https://observablehq.com/@uwdata/introducing-arquero

Answered By: jason mogg

It’s the year 2023 of the Common Era, and we have Pyodide, which is a cross-compilation of the entire CPython kernel, plus a large chunk of the numeric Python ecosystem, to WebAssembly (wasm) in order to run in browsers/Node.js. This of course includes Pandas, so see for example "Pandas Tutor: Using Pyodide to Teach Data Science at Scale" for what you can build with Pandas + Pyodide, and the full list of packages that come out-of-the-box with Pyodide.

For Node, this is it. Pandas via Pyodide is the answer. For browsers… there are some challenges that have more to do with the fundamental nature of browsers than anything with the Pyodide stack. For example, the easiest way to load a CSV file into Pandas is to do a web request (see https://github.com/joyceerhl/vscode-pyolite/issues/12 and references therein), and similarly there are challenges with "saving" data (if you can, POST it to a server, but that’s extra setup—no doubt Dropbox/Google Drive/etc. integrations are coming soon).

But in the nearly 8 years since this question was asked, no JavaScript package has even come close to bringing us Pandas, and here we are: Pandas has come to us. Hallelujah.

Answered By: Ahmed Fasih