Pandas DataFrame column numerical integration

Question

Currently I have a DataFrame as shown below:

Device   TimeSec  Current  
 1       0.1      0.02
 1       0.25     0.05
 1       0.32     0.07
 1       0.45     0.12
 1       1.32     0.34
 1       2.37     2.24
 2       0.22     0.56
 2       0.34     0.79
 2       1.87     2.76
 2       3.21     3.11
 3       0.16     1.87
 3       1.12     2.33
 3       2.45     3.21
 3       3.45     5.11
 ......

I would like to numerically integrate Current over TimeSec (∫I dt) for each Device and collect the results into a new DataFrame like this:

Device   IntegratedCurrent  
 1         x
 2         y
 3         z

The problem is that the time intervals are uneven, and the number of data points differs from device to device.

Asked By: FunkyMore

Answer 1

Use a numerical integration function, e.g., scipy.integrate.trapezoid (called trapz in older SciPy versions):

from scipy import integrate

result = (df.groupby('Device')[['TimeSec', 'Current']]
            .apply(lambda g: integrate.trapezoid(g.Current, x=g.TimeSec))  # ∫I dt per device
            .reset_index(name='IntegratedCurrent'))

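Note that this function applies the trapezoidal rule and lets you pass the x values explicitly, so the uneven TimeSec spacing and the different number of samples per device are handled correctly.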
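To show what the trapezoidal rule does with uneven spacing, here is a sketch that applies it by hand with pandas only: each interval contributes (t_i − t_{i−1}) · (I_i + I_{i−1}) / 2. It assumes only the sample rows shown in the question (the "......" rows are left out), and `trapezoid_by_hand` is a hypothetical helper name, not a library function. It should agree with the scipy call above up to floating-point rounding.

```python
import pandas as pd

# Sample rows from the question only; the elided "......" rows are not included.
df = pd.DataFrame({
    "Device":  [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "TimeSec": [0.1, 0.25, 0.32, 0.45, 1.32, 2.37, 0.22, 0.34, 1.87, 3.21,
                0.16, 1.12, 2.45, 3.45],
    "Current": [0.02, 0.05, 0.07, 0.12, 0.34, 2.24, 0.56, 0.79, 2.76, 3.11,
                1.87, 2.33, 3.21, 5.11],
})


def trapezoid_by_hand(g: pd.DataFrame) -> float:
    """Hypothetical helper: trapezoidal rule for one device, uneven steps allowed."""
    g = g.sort_values("TimeSec")
    dt = g["TimeSec"].diff()                 # t_i - t_{i-1}; NaN on the first row
    mean_i = g["Current"].rolling(2).mean()  # (I_i + I_{i-1}) / 2; NaN on the first row
    return float((dt * mean_i).sum())        # sum() skips the NaN product


result = (
    df.groupby("Device")[["TimeSec", "Current"]]
      .apply(trapezoid_by_hand)
      .reset_index(name="IntegratedCurrent")
)
print(result)
```

Because every device gets its own dt series, devices with different sample counts and spacings need no special handling.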

Answered By: Ami Tavory

Answer 2

Integrals are linked to the mean value theorem. If Device 1 draws an average of 0.02 current units between 0.1 s and 0.25 s, the transferred charge is 0.02 * (0.25 - 0.1) current units × seconds (coulombs, if the current unit is amperes). When the average current of Device 1 changes to 0.05 current units for the next (0.32 - 0.25) seconds, the charge transferred is 0.05 * (0.32 - 0.25) units of charge. In this context, the dataframe correctly represents current as a step function, and there is no need for Simpson's rule. That is how it normally goes with current threshold algorithms, and current (direct or alternating, RMS) is mostly a stepwise function.

If, however, the Current values in the dataframe are instantaneous and vary with time according to some parabolic rule, then Simpson's rule should give a better estimate of the transferred charge. Considering all this, stepwise integration of current over elapsed time is the better choice here. Note that the last sample of Devices 1, 2, 3 and the others is unused, since in principle you do not know how long that current lasts.
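Written out, the stepwise view is a left Riemann sum. As a sketch (the symbols t_i, I_i and n are introduced here, not by the answer), for one device with samples (t_i, I_i), i = 0, …, n−1, sorted by time, the last sample drops out because its duration is unknown:

```latex
Q \approx \sum_{i=0}^{n-2} I_i \,(t_{i+1} - t_i)

% Device 1 from the question:
Q_1 \approx 0.02(0.15) + 0.05(0.07) + 0.07(0.13) + 0.12(0.87) + 0.34(1.05)
    = 0.003 + 0.0035 + 0.0091 + 0.1044 + 0.357 = 0.477
```

For comparison, the trapezoidal rule gives about 1.58 for the same device, mostly because its last interval averages 0.34 and 2.24 instead of dropping the final sample.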

Let d be your dataframe, then:

import numpy as np

d['dt'] = -d.groupby('Device').TimeSec.diff(-1)  # time until the next sample of the same device
d.loc[~(d['dt'] > 0), 'dt'] = np.nan  # non-positive or missing intervals are meaningless -> NaN
d['ct'] = d.dt * d.Current  # charge transferred during each step
devs = d.Device.unique()  # devices present in d.Device
tct = [d[d.Device == dev].ct.sum() for dev in devs]  # total charge transfer (NaNs are skipped)

The list tct then holds the total charge transfer for each device, in the same order as devs.
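A self-contained version of the same forward-step calculation on the question's sample rows might look like this. It is a sketch under the assumption that times are sorted within each device; it uses a grouped sum instead of the list comprehension, which is a stylistic swap, not the answer's original code.

```python
import pandas as pd

# Sample rows from the question only; the elided "......" rows are not included.
df = pd.DataFrame({
    "Device":  [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "TimeSec": [0.1, 0.25, 0.32, 0.45, 1.32, 2.37, 0.22, 0.34, 1.87, 3.21,
                0.16, 1.12, 2.45, 3.45],
    "Current": [0.02, 0.05, 0.07, 0.12, 0.34, 2.24, 0.56, 0.79, 2.76, 3.11,
                1.87, 2.33, 3.21, 5.11],
})

# Forward interval: time until the next sample of the same device (NaN on each device's last row).
dt = -df.groupby("Device")["TimeSec"].diff(-1)
dt = dt.where(dt > 0)                     # non-positive intervals -> NaN
charge = dt * df["Current"]               # constant current held for dt seconds
tct = charge.groupby(df["Device"]).sum()  # NaNs (the unused last samples) are skipped
print(tct)
```

Grouping the diff by Device keeps one device's last time from being subtracted from the next device's first time.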

Another way to treat those NaNs is to assume that each device starts at time zero, replacing every non-positive or missing d.dt with the corresponding d.TimeSec value. This also resolves the issue and is a reasonable assumption for this problem. Under this view, dt should be calculated as a backward difference with .diff(1):

d['dt'] = d.groupby('Device').TimeSec.diff()  # backward difference within each device; diff(1) is the default

There will be a NaN at d.dt[0]; to fix it:

d.loc[d.index[0], 'dt'] = d.TimeSec.iloc[0]  # first device starts at t = 0; avoids chained assignment

You can also replace the negative or missing values in d.dt with the corresponding d.TimeSec values using a list comprehension. For instance, first turn them into NaNs and then fill them in:

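import numpy as np

d['dt'] = d.groupby('Device').TimeSec.diff()  # time since the previous sample of the same device
d.loc[~(d['dt'] > 0), 'dt'] = np.nan  # each device's first sample (and any non-positive gap) -> NaN
d['dt'] = [ts if np.isnan(dt) else dt for ts, dt in zip(d.TimeSec, d.dt)]  # device starts at t = 0
d['ct'] = d.dt * d.Current  # charge transferred during each step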

The same devs and tct lines from above then give the total charge transfer per device for this alternative stepwise representation.
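Here is a self-contained sketch of this backward-step variant on the question's sample rows, in which each current value applies over the interval ending at its own timestamp and each device is assumed to start at t = 0. It uses fillna in place of the list comprehension (an equivalent substitution for this data, not the answer's original code) and assumes times are sorted within each device.

```python
import pandas as pd

# Sample rows from the question only; the elided "......" rows are not included.
df = pd.DataFrame({
    "Device":  [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "TimeSec": [0.1, 0.25, 0.32, 0.45, 1.32, 2.37, 0.22, 0.34, 1.87, 3.21,
                0.16, 1.12, 2.45, 3.45],
    "Current": [0.02, 0.05, 0.07, 0.12, 0.34, 2.24, 0.56, 0.79, 2.76, 3.11,
                1.87, 2.33, 3.21, 5.11],
})
df = df.sort_values(["Device", "TimeSec"])

dt = df.groupby("Device")["TimeSec"].diff()  # t_i - t_{i-1}; NaN only on each device's first row
dt = dt.fillna(df["TimeSec"])                # assumption: every device starts at t = 0
charge = dt * df["Current"]                  # charge transferred during each step
tct = charge.groupby(df["Device"]).sum()
print(tct)
```

Unlike the forward-step version, every sample is used here, including each device's last one.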

Answered By: ePuntel
