Make a dataframe from the elements of a list that share a common column

Question

I am trying to organise a set of data that comes as a set of CSV files.

The problem is to join all the dataframes in a list into a single new dataframe that keeps one shared date column, even though each dataframe in the list covers its own date range.

The following code snippet creates two lists, dflistcompleto and dflistpriceusd, and uses a for loop to iterate over a list of previously fetched names (the "nombres" object is a list with the name of each CSV file in the directory). Within the loop, each CSV file is read with the pandas read_csv function and the result is stored in a temporary dataframe called temp_df.

Then the statement if "PriceUSD" in temp_df.columns checks whether the "PriceUSD" column is present in the temporary dataframe. If it is, the CSV file is read again, but this time only the "time" and "PriceUSD" columns are loaded, using the usecols argument. The result is stored in another temporary dataframe called temp_df_priceusd, and its "PriceUSD" column is renamed to the corresponding name from the name list. This dataframe is then appended to the dflistpriceusd list. If the "PriceUSD" column is not present in the temporary dataframe, the rest of the loop body is skipped with the continue statement.

Finally, the original temporary dataframe is appended to the dflistcompleto list. At the end of the loop, both lists contain dataframes read from the CSV files that have a "PriceUSD" column.

Attached is a "schematic" of how I intend to organise the data.

Many thanks in advance

import pandas as pd

# "nombres" is the previously built list of CSV file names (without extension)
dflistcompleto = []
dflistpriceusd = []
for nombre in nombres:
    temp_df = pd.read_csv(filepath_or_buffer = "csv221022/" + nombre + ".csv",
                          header = 0,
                          sep = ",")
    if "PriceUSD" in temp_df.columns:
        # Re-read the file, keeping only the date and price columns
        temp_df_priceusd = pd.read_csv(filepath_or_buffer = "csv221022/" + nombre + ".csv",
                          header = 0,
                          usecols = ["time", "PriceUSD"],
                          sep = ",")
        # Name the price column after the file it came from
        temp_df_priceusd.rename(columns = {'PriceUSD': nombre}, inplace = True)
        dflistpriceusd.append(temp_df_priceusd)

    else:
        continue
    dflistcompleto.append(temp_df)
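
The loop described above parses every file in full before checking for the "PriceUSD" column. A possible variant, sketched here under the assumption that only the price frames are needed, reads just the header row first. The helper name load_price_frames and its parameters are hypothetical, not part of the original code.

```python
from pathlib import Path

import pandas as pd


def load_price_frames(directory, names):
    """Return one (time, <name>) dataframe per CSV that has a PriceUSD column."""
    frames = []
    for name in names:
        path = Path(directory) / f"{name}.csv"
        # nrows=0 parses only the header row, so the column check stays cheap.
        header = pd.read_csv(path, nrows=0)
        if "PriceUSD" not in header.columns:
            continue
        prices = pd.read_csv(path, usecols=["time", "PriceUSD"])
        # Name the price column after the file it came from.
        frames.append(prices.rename(columns={"PriceUSD": name}))
    return frames


# Possible usage with the names from the question (assumed to exist on disk):
# dflistpriceusd = load_price_frames("csv221022", nombres)
```

This keeps the renaming step from the question but does not build the list of complete dataframes.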

Asked By: Luis Bleriot


Answer 1

In R:

## Packages that provide arrange(), full_join() and reduce()
library(dplyr)
library(purrr)

## Loop that creates a separate object for each CSV
## ("files" is assumed to be a character vector of CSV file names)
for(file in files){
  dataframe <- read.csv(file, stringsAsFactors = FALSE)
  for(col in colnames(dataframe)){ # check the column name
    if(col == "PriceUSD"){
      ## First build the dataframe and keep only two columns
      dataframe2 <- read.csv(file, stringsAsFactors = FALSE)
      dataframe2 <- dataframe2[c("time", "PriceUSD")]
      ## Set the object name to "df_<file>"
      dfnames2 <- as.character(gsub("\\.csv$", "", file))
      prefijo <- as.character("df_")
      dfnames_final <- paste(c(prefijo, dfnames2), collapse = "")
      ## Rename the "PriceUSD" column
      prefijo = "Price"
      dfcolnames <- paste(c(prefijo, dfnames2), collapse = "")
      colnames(dataframe2)[colnames(dataframe2) == "PriceUSD"] <- dfcolnames
      ## Create the object
      assign(dfnames_final, dataframe2)
    }
  }
}

## Remove the leftover objects from the loop above from the environment
rm(dataframe, dataframe2, col, dfcolnames, dfnames_final, dfnames2, file, prefijo)

## Build a list of the objects in the environment that are dataframes
environmentlist <- Filter(is.data.frame, mget(ls()))

## Sort each dataframe by date
environmentlist <- lapply(environmentlist, function(x) arrange(x, time))

## Join all the dataframes into a single one
df_final <- reduce(environmentlist, full_join, by = "time")

## Convert the date column to Date format
df_final$time <- as.Date(df_final$time, format = "%Y-%m-%d")

## Sort the final dataframe by date
df_final <- arrange(df_final, time)
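
Since the question uses pandas, the same join step can probably be sketched in Python as well. This is a minimal sketch, not a tested port of the R code: the helper join_on_time is a hypothetical name, and the two small frames are made-up stand-ins for the entries of dflistpriceusd.

```python
from functools import reduce

import pandas as pd


def join_on_time(frames):
    """Outer-join a list of dataframes on their shared "time" column."""
    merged = reduce(
        lambda left, right: pd.merge(left, right, on="time", how="outer"),
        frames,
    )
    # Parse the dates, then sort so the combined range runs chronologically.
    merged["time"] = pd.to_datetime(merged["time"], format="%Y-%m-%d")
    return merged.sort_values("time").reset_index(drop=True)


# Made-up frames with different date ranges (illustration only).
a = pd.DataFrame({"time": ["2022-01-02", "2022-01-01"], "btc": [2.0, 1.0]})
b = pd.DataFrame({"time": ["2022-01-02", "2022-01-03"], "eth": [20.0, 30.0]})
df_final = join_on_time([a, b])
```

The outer join keeps every date that appears in any frame, so a column holds NaN on dates its file does not cover.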

Answered By: Luis Bleriot
