How to read an H5 file containing satellite data in Python?

Question:

As part of a project I’m exploring satellite data that is available in H5 format. I’m new to this format and unable to process the data. I can open the file in a program called Panoply, where the DHI value is shown as a Geo2D variable. Is there any way to extract the data into CSV format as shown below:

X Y GHI
X1 Y1
X2 Y2

Attaching screenshots of the file opened in Panoply alongside.

Link to the file: https://drive.google.com/file/d/1xQHNgrlrbyNcb6UyV36xh-7zTfg3f8OQ/view

I tried the following code to read the data. I’m able to store the values as a 2D NumPy array, but I can’t get the matching X and Y locations alongside them.

`

import h5py
import numpy as np
import pandas as pd
import geopandas as gpd

#%%
f = h5py.File('mer.h5', 'r')

# List the root-level objects in the HDF5 file (groups or datasets) and their types.
for key in f.keys():
    print(key)
    print(type(f[key]))
ls = list(f.keys())

key = 'X'

masterdf = pd.DataFrame()

data = f.get(key)
dataset1 = np.array(data)
masterdf = dataset1

np.savetxt("FILENAME.csv", dataset1, delimiter=",")

#masterdf.to_csv('new.csv')

`

[Screenshots of the file opened in Panoply]

Asked By: Rishikesh Sreehari

||

Answers:

I found a way to read the data, convert it to a DataFrame, and reproject the coordinates from the file’s projection to latitude/longitude.

Code is tracked here: https://github.com/rishikeshsreehari/boring-stuff-with-python/blob/main/data-from-hdf5-file/final_converter.py

Code is as follows:

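import time

import h5py
import pandas as pd
from pyproj import Transformer

# EPSG codes of the source projection and of the target (WGS 84 lon/lat).
input_epsg = 24378
output_epsg = 4326

start_time = time.time()

with h5py.File("mer.h5", "r") as file:
    # The last two X entries are dropped, both from X and from the DHI grid.
    df_X = pd.DataFrame(file.get("X")[:-2], columns=["X"])
    df_Y = pd.DataFrame(file.get("Y"), columns=["Y"])
    # Take the first slice of DHI and flatten it row by row (Y outer, X inner).
    DHI = file.get("DHI")[0][:, :-2].reshape(-1)

# The cross join (pandas >= 1.2) pairs every Y with every X,
# in the same order as the flattened DHI values.
final = df_Y.merge(df_X, how="cross").assign(DHI=DHI)[["X", "Y", "DHI"]]

# pyproj.transform is deprecated; Transformer is the supported replacement.
transformer = Transformer.from_crs(input_epsg, output_epsg, always_xy=True)
final["X2"], final["Y2"] = transformer.transform(
    final["X"].to_numpy(), final["Y"].to_numpy()
)

# final.to_csv("final_converted1.csv", index=False)

print("--- %s seconds ---" % (time.time() - start_time))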
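
The cross join in the code above only lines up with the flattened DHI values if the grid is stored with Y along the rows and X along the columns. Here is a minimal sketch of the same flattening on synthetic data, assuming that layout. `grid_to_long` is a hypothetical helper written for illustration and is not part of the original script, and the numbers are made up rather than read from `mer.h5`.

```python
import numpy as np
import pandas as pd


def grid_to_long(x, y, values):
    """Flatten a (len(y), len(x)) grid into a long X/Y/DHI table."""
    x = np.asarray(x)
    y = np.asarray(y)
    values = np.asarray(values)
    if values.shape != (y.size, x.size):
        raise ValueError(f"expected shape {(y.size, x.size)}, got {values.shape}")
    # indexing="xy" yields arrays of shape (len(y), len(x)), matching the grid.
    xx, yy = np.meshgrid(x, y, indexing="xy")
    # ravel() walks the grid row by row, so X varies fastest.
    return pd.DataFrame({"X": xx.ravel(), "Y": yy.ravel(), "DHI": values.ravel()})


# Synthetic stand-in for the file contents (not real satellite data).
x = np.array([10.0, 20.0, 30.0])
y = np.array([1.0, 2.0])
dhi = np.array([[100, 101, 102],
                [200, 201, 202]])

long_table = grid_to_long(x, y, dhi)
print(long_table)
```

The shape check is the useful part: if the grid turns out to be stored as (X, Y) instead, it fails loudly rather than silently pairing values with the wrong coordinates.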
Answered By: Rishikesh Sreehari