RTTM file format

Question:

I am currently developing a program making use of the RTTM file format. However, there does not seem to be documentation on the contents. Does anyone have specific elaborations on the fields indicated in this file format?

Asked By: Zen


Answers:

You can find this in Appendix A of NIST’s The 2009 (RT-09) Rich Transcription Meeting Recognition Evaluation Plan (archived version; the original link is dead).

Answered By: bunbun

There is also https://catalog.ldc.upenn.edu/docs/LDC2004T12/RTTM-format-v13.pdf, which differs from the spec given by @bernlim mostly in omitting the tenth field, “SLAT”. I found that some tools do output 9 fields rather than 10, and that md-eval-v21.pl, which is used for scoring diarizations, does not read the tenth field.

Answered By: qbolec

RTTM

Rich Transcription Time Marked (RTTM) files are space-delimited text files containing one turn per line, each line containing ten fields:

  • Type: segment type; should always be SPEAKER
  • File ID: file name; basename of the recording minus extension (e.g., rec1_a)
  • Channel ID: channel (1-indexed) that the turn is on; should always be 1
  • Turn Onset: onset of the turn in seconds from the beginning of the recording
  • Turn Duration: duration of the turn in seconds
  • Orthography Field: should always be <NA>
  • Speaker Type: should always be <NA>
  • Speaker Name: name of the speaker of the turn; should be unique within the scope of each file
  • Confidence Score: system confidence (probability) that the information is correct; should always be <NA>
  • Signal Lookahead Time: should always be <NA>

For instance:

SPEAKER CMU_20020319-1400_d01_NONE 1 130.430000 2.350 <NA> <NA> juliet <NA> <NA>
SPEAKER CMU_20020319-1400_d01_NONE 1 157.610000 3.060 <NA> <NA> tbc <NA> <NA>
SPEAKER CMU_20020319-1400_d01_NONE 1 130.490000 0.450 <NA> <NA> chek <NA> <NA>
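
Reading such lines back is mostly a matter of splitting on whitespace. Below is a minimal sketch of a parser, not taken from any particular tool: the names Turn, parse_rttm_line, and load_rttm_lines are made up for illustration. Because some tools reportedly emit only nine fields, it accepts either 9 or 10 fields and ignores the trailing ones.

```python
from typing import Iterable, List, NamedTuple


class Turn(NamedTuple):
    """Hypothetical container for the fields of one SPEAKER line."""
    file_id: str
    channel: str
    onset: float
    dur: float
    speaker_id: str


def parse_rttm_line(line: str) -> Turn:
    """Parse one SPEAKER line with 9 or 10 whitespace-separated fields."""
    fields = line.split()
    if len(fields) not in (9, 10):
        raise ValueError('expected 9 or 10 fields, got %d' % len(fields))
    if fields[0] != 'SPEAKER':
        raise ValueError('unsupported segment type: %s' % fields[0])
    # Fields 6, 7, 9 and 10 are expected to be <NA> and are not kept.
    return Turn(file_id=fields[1], channel=fields[2],
                onset=float(fields[3]), dur=float(fields[4]),
                speaker_id=fields[7])


def load_rttm_lines(lines: Iterable[str]) -> List[Turn]:
    """Parse an iterable of lines (e.g. an open file), skipping blank lines."""
    return [parse_rttm_line(ln) for ln in lines if ln.strip()]
```

For example, load_rttm_lines(open('rec1_a.rttm', encoding='utf-8')) would return one Turn per line, where rec1_a.rttm is a placeholder file name.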

To write an RTTM file:

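# Assumes each turn has file_id, onset, dur, and speaker_id attributes, and that
# format_float(x, n_digits) returns x as a fixed-point string (as in dscore).
with open(rttmf, 'wb') as f:
    for turn in turns:
        fields = ['SPEAKER', turn.file_id, '1',
                  format_float(turn.onset, n_digits), format_float(turn.dur, n_digits),
                  '<NA>', '<NA>', turn.speaker_id, '<NA>', '<NA>']
        line = ' '.join(fields)
        f.write(line.encode('utf-8'))
        f.write(b'\n')  # newline terminator; one turn per line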
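
The snippet above depends on a format_float helper and a turn object that are not shown here. Below is a self-contained sketch of the same idea. The Turn tuple, this format_float, and the turn_to_line and write_rttm functions are illustrative stand-ins, not the actual dscore definitions.

```python
from collections import namedtuple

# Hypothetical turn record; real tools may carry more attributes.
Turn = namedtuple('Turn', ['file_id', 'onset', 'dur', 'speaker_id'])


def format_float(x, n_digits=3):
    """Format x as a fixed-point string with n_digits decimal places."""
    return '%.*f' % (n_digits, x)


def turn_to_line(turn, n_digits=3):
    """Build one ten-field SPEAKER line (without the trailing newline)."""
    fields = ['SPEAKER', turn.file_id, '1',
              format_float(turn.onset, n_digits),
              format_float(turn.dur, n_digits),
              '<NA>', '<NA>', turn.speaker_id, '<NA>', '<NA>']
    return ' '.join(fields)


def write_rttm(rttmf, turns, n_digits=3):
    """Write one line per turn to the path rttmf, encoded as UTF-8."""
    with open(rttmf, 'w', encoding='utf-8') as f:
        for turn in turns:
            f.write(turn_to_line(turn, n_digits) + '\n')
```

For example, write_rttm('out.rttm', [Turn('rec1_a', 130.43, 2.35, 'juliet')]) would write a single SPEAKER line, where out.rttm is a placeholder path.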

Reference URLs:
https://github.com/nryant/dscore
https://github.com/nryant/dscore/blob/824f126ae9e78cf889e582eec07941ffe3a7d134/scorelib/rttm.py#L103

Answered By: zeeshan