Orange Python Script create custom timestamp (Orange Data Mining Windows 10)

Question:

I am trying to write a script that creates an Orange data table with a single column containing a custom timestamp.

Use case: I need a complete timestamp so I can merge some other CSV files later on. I’m working in the Orange GUI, by the way, not in a Python shell or any other IDE (in case that makes a difference).

Here’s what I have come up with so far:

from Orange.data import Domain, Table, TimeVariable
import numpy as np

domain = Domain([TimeVariable("Timestamp")])

# Timestamps from 2022-03-08 up to (but not including) 2022-03-15 in one-minute steps
arr = np.arange("2022-03-08", "2022-03-15", dtype="datetime64[m]")

# Reshape to a single column, since the table expects a 2D matrix
arr = arr.reshape(-1,1)

out_data = Table.from_numpy(domain, arr)

However, the results do not match:

>>> print(arr)
[['2022-03-08T00:00']
 ['2022-03-08T00:01']
 ['2022-03-08T00:02']
 ...
 ['2022-03-14T23:57']
 ['2022-03-14T23:58']
 ['2022-03-14T23:59']]

>>> print(out_data)
[[27444960.0],
 [27444961.0],
 [27444962.0],
 ...
 [27455037.0],
 [27455038.0],
 [27455039.0]]

Obviously I’m missing something when handing over the data from numpy, but I’m having a really hard time understanding the documentation.

I’ve also found this post which seems to tackle a similar issue, but I haven’t figured out how to apply the solution on my problem.

I would be really glad if anyone could help me out here. Please try to use simple terms and concepts.

Asked By: user20123884

||

Answers:

Thank you for the question, and apologies for the weak documentation of the TimeVariable.

You need to change two things in your code to make it work.
First, you have to specify whether the TimeVariable holds date and/or time data:

  • TimeVariable("Timestamp", have_date=True) stores only date information — it is analogous to datetime.date
  • TimeVariable("Timestamp", have_time=True) stores only time information (without date) — it is analogous to datetime.time
  • TimeVariable("Timestamp", have_time=True, have_date=True) stores date and time — it is analogous to datetime.datetime

You didn’t set that information in your example, so both were False by default. For your case, you must set both to True since your attribute will hold the date-time values.
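
The analogy with Python's datetime types can be illustrated with the standard library alone. This is only a sketch of the analogy; it does not call Orange, and the example value is an assumption chosen to match the second row of the question's data:

```python
from datetime import datetime, timezone

# One instant as seconds since 1970-01-01 UTC (2022-03-08 00:01:00 UTC).
epoch_seconds = 1646697660

stamp = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

date_only = stamp.date()  # date part only, like datetime.date
time_only = stamp.time()  # time part only, like datetime.time

print(date_only)
print(time_only)
print(stamp)  # both parts, like datetime.datetime
```

The same stored number can be read as a date, a time, or both, which is why the variable has to be told which parts it carries.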

The other issue is that Orange’s Table stores date-time values as UNIX epoch time (seconds since 1970-01-01), so Table.from_numpy expects values in this format as well. The values in your current arr array are in minutes instead, so in the code below I convert the dtype to seconds.
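
To see the unit mismatch concretely, here is a small standard-library check on the first number printed in the question (27444960). It is a sketch that assumes this number is a count of minutes since the epoch, which is what a datetime64[m] array holds:

```python
from datetime import datetime, timezone

first_value = 27444960  # first row of out_data in the question

# Read as seconds, the value lands in 1970.
as_seconds = datetime.fromtimestamp(first_value, tz=timezone.utc)

# Read as minutes (multiplied by 60 to get seconds), it is the intended date.
as_minutes = datetime.fromtimestamp(first_value * 60, tz=timezone.utc)

print(as_seconds)
print(as_minutes)
```

Only the second reading gives 2022-03-08, so the values have to be converted to seconds before they reach the table.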

Here is the working code:

from Orange.data import Domain, Table, TimeVariable
import numpy as np

# Important: set whether TimeVariable contains time and/or date
domain = Domain([TimeVariable("Timestamp", have_time=True, have_date=True)])

# Timestamps from 2022-03-08 up to (but not including) 2022-03-15 in one-minute steps
arr = np.arange("2022-03-08", "2022-03-15", dtype="datetime64[m]").astype("datetime64[s]")

# necessary to achieve a correct format for the matrix
arr = arr.reshape(-1,1)
out_data = Table.from_numpy(domain, arr)
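
If you want to check the numbers before passing them to Orange, a numpy-only sketch like the one below may help. The explicit cast to float is an assumption on my side, made to mirror the float values shown in the question's output; it is not something the code above requires:

```python
import numpy as np

# Same range as above: one value per minute, end date excluded.
minutes = np.arange("2022-03-08", "2022-03-15", dtype="datetime64[m]")

# Convert the unit to seconds, then expose the raw epoch numbers.
seconds = minutes.astype("datetime64[s]")
epoch = seconds.astype("int64").astype(float).reshape(-1, 1)

print(epoch.shape)   # 7 days * 1440 minutes per day
print(epoch[0, 0])   # epoch seconds for 2022-03-08T00:00
print(epoch[-1, 0])  # epoch seconds for 2022-03-14T23:59
```

Consecutive rows should differ by 60, which confirms that the values are now in seconds with one-minute steps.
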
Answered By: Primoz