Speech Ocean Guidelines
1. Transcribe What Was Said
Transcribe exactly what the speaker said. This is not necessarily the formal version of the word, and it may be ungrammatical or differ from the given text.
E.g. I want to go to the mall vs. I wanna go to the mall
If the speaker said “wanna”, the transcription will be “wanna”, not “want to”.
If a speaker utters the plural form “bonds” in the sample sentence below, transcribe it exactly as
“bonds”: find a bonds with a ten year maturity date
2. Case
Transcriptions are CASE SENSITIVE, unless specified otherwise in the documentation of the database. All
words and names should be transcribed in mixed case.
Proper nouns (e.g. names, addresses, countries, organizations, months, etc.) begin with a capital
letter, such as India, Microsoft, Virat, Gurgaon, January, etc.
Brand names and trademarks are transcribed in their original format, including their case form (e.g.
MySpace, Hotmail dot com, KFC, IBM, NASA, Amazon, Flipkart, etc.).
Unless specified otherwise, a word should not be capitalized just because it is at the beginning of a sentence.
Words should only be capitalized if they would normally be capitalized mid-sentence.
3. Numbers
Number sequences (flight numbers, times, dates, aircraft types, money amounts, etc.) will be spelled out
to reflect what was said ("flight six one three"; "seven thirty"; "August twenty first"; "seven forty seven";
"four hundred and ten dollars").
If digits have alternative dictionary forms (e.g. "zero", "oh" or "naught" in English), use the
alternative that reflects the form actually pronounced.
Long numbers may be written together, or with blanks between parts in order to reduce the lexicon size.
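For numbers read out digit by digit (such as "flight six one three"), the spelling rule can be sketched in Python as below. This is only an illustration: `spell_digits` and its `zero_word` parameter are hypothetical names, and the transcriber still has to pass whichever zero form ("zero", "oh" or "naught") was actually pronounced. Numbers read as whole quantities ("seven thirty", "four hundred and ten dollars") are not covered by this sketch.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(digits: str, zero_word: str = "zero") -> str:
    """Spell a digit string one digit at a time (hypothetical helper)."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected only the digits 0-9, got {digits!r}")
    words = []
    for d in digits:
        # Use the zero form the speaker actually said ("zero", "oh", "naught")
        words.append(zero_word if d == "0" else DIGIT_WORDS[int(d)])
    return " ".join(words)


print(spell_digits("613"))                  # six one three
print(spell_digits("205", zero_word="oh"))  # two oh five
```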
4. Letter Sequences
Letter sequences occur in spelled words, ZIP codes, acronyms and abbreviations ("D F W"; "A P slash
eighty"; "P M"; "C O"; "I B M"; etc.). Letters should be in upper case, separated by a space.
The AM and PM of times (e.g., "five thirty P M") will be treated as examples of letter sequences, i.e.,
upper case and separated by a space.
Example: my name is mister Tom T O M (here the speaker says his name and then spells it out)
5. Acronyms
Acronyms are terms formed from the initial letters of their various elements and are spoken as words.
They should be transcribed as words in upper case, without spaces between the letters.
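The difference between letter sequences and acronyms can be sketched in Python as follows. `transcribe_initials` is a hypothetical helper; whether the letters were spoken as a word or one by one must be decided by listening, and the code only applies the formatting rule that follows from that decision.

```python
def transcribe_initials(letters: str, spoken_as_word: bool) -> str:
    """Format initials as an acronym or as a letter sequence (hypothetical helper)."""
    # Keep letters only, in upper case, so "p.m." and "d-f-w" are handled too
    cleaned = [ch.upper() for ch in letters if ch.isalpha()]
    if spoken_as_word:
        # Acronym: pronounced as a word, so no spaces between the letters
        return "".join(cleaned)
    # Letter sequence: each letter pronounced separately, so one space between letters
    return " ".join(cleaned)


print(transcribe_initials("nasa", spoken_as_word=True))                    # NASA
print(transcribe_initials("d-f-w", spoken_as_word=False))                  # D F W
print("five thirty " + transcribe_initials("p.m.", spoken_as_word=False))  # five thirty P M
```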
6. Abbreviations
Do not introduce abbreviations in the transcription. Always use the spelled-out form (full word) when
pronounced as such.
E.g.
"Mrs. Smith this way please." = "missus Smith, this way please."
“Then they drove to St. Paul.” = “then they drove to Saint Paul.”
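A finished transcript can be screened for abbreviations that should have been written out, as in the examples above. The Python sketch below assumes a small, hypothetical list `ABBREVIATIONS` (a real check would need a longer, language-specific list). It only flags matches rather than expanding them, because the correct full form (e.g. "St." as "Saint" or "Street") depends on what was actually pronounced.

```python
import re

# Hypothetical, non-exhaustive list of abbreviations that should not appear in a transcript
ABBREVIATIONS = ("Mr.", "Mrs.", "Dr.", "St.")

# Match any listed abbreviation that is not glued to a preceding word character
PATTERN = re.compile(r"(?<!\w)(?:" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")")


def find_abbreviations(transcript: str) -> list[str]:
    """Return the abbreviations found in the transcript, in order of appearance."""
    return PATTERN.findall(transcript)


print(find_abbreviations("Mrs. Smith this way please."))     # ['Mrs.']
print(find_abbreviations("then they drove to St. Paul."))    # ['St.']
print(find_abbreviations("missus Smith, this way please."))  # []
```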
7. Punctuation
Use end punctuation (full stop, question mark, exclamation mark) to indicate the end of a complete
sentence.
Use punctuation symbols that are an essential part of a word, such as apostrophes and hyphens.
Use commas to break up long stretches of speech. This is to facilitate reader comprehension.
AVOID: semi-colons, quotation marks
If a speaker pronounces a special character, write the corresponding word in lower case instead of the
character (e.g. “Hotmail dot com”). Only do this when it is certain that the speaker has spoken a character;
the transcription should reflect exactly what was said.
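The punctuation rules above can be partly checked mechanically. In this Python sketch, `punctuation_issues` is a hypothetical helper; `AVOIDED` covers the semicolons and quotation marks the guidelines ask you to avoid, while `SPOKEN_CHARACTERS` is an assumed, illustrative mapping of special characters to their lower-case word forms. Apostrophes and hyphens are deliberately not flagged, since they are allowed.

```python
# Punctuation the guidelines say to avoid (straight and curly double quotes)
AVOIDED = {";": "semicolon", '"': "quotation mark", "“": "quotation mark", "”": "quotation mark"}

# Hypothetical examples of special characters and the lower-case words to write instead
SPOKEN_CHARACTERS = {"/": "slash", "@": "at", "&": "and"}


def punctuation_issues(transcript: str) -> list[str]:
    """List characters that break the punctuation rules (hypothetical checker)."""
    issues = []
    for ch in transcript:
        if ch in AVOIDED:
            issues.append(f"avoid {AVOIDED[ch]}: {ch}")
        elif ch in SPOKEN_CHARACTERS:
            # Only a problem if the character was spoken; the transcriber must confirm
            issues.append(f"write {ch!r} as the word {SPOKEN_CHARACTERS[ch]!r} if it was spoken")
    return issues


print(punctuation_issues("then they drove to Saint Paul."))  # []
print(punctuation_issues("A P / eighty; ok"))
```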
8. Unintelligible Words
Words or stretches of speech that are completely unintelligible are transcribed as “**”. The “**” marker
is separated from neighbouring intelligible words by spaces.
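The spacing rule for the “**” marker can be sketched in Python as a small normalization step. `space_unintelligible_markers` is a hypothetical name; note that it also collapses any repeated whitespace in the line into single spaces.

```python
def space_unintelligible_markers(transcript: str) -> str:
    """Ensure every "**" marker is separated from neighbouring words by spaces (hypothetical helper)."""
    # Surround each marker with spaces, then collapse runs of whitespace to one space
    spaced = transcript.replace("**", " ** ")
    return " ".join(spaced.split())


print(space_unintelligible_markers("I want to go to the**mall"))   # I want to go to the ** mall
print(space_unintelligible_markers("I want ** to the mall"))       # I want ** to the mall
```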
9. Filler Words
Filler words are “words” that speakers use to indicate hesitation or to maintain control of a conversation
while thinking of what to say next. Each language has a limited set of filler words that speakers can use.
The spelling of filler words should not be altered to reflect how the speaker pronounces them, and
each filler word should be preceded by a hash sign (#).
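Marking filler words can be sketched in Python as below. Two assumptions are made here: `FILLERS` is a hypothetical English filler set (the real set is language-specific and is not given in these guidelines), and the “#” is written directly before the filler with no space (e.g. “#uh”). `mark_fillers` is likewise a hypothetical name.

```python
# Hypothetical English filler set; each language has its own limited set
FILLERS = {"uh", "um", "er", "hmm"}


def mark_fillers(transcript: str) -> str:
    """Prefix known filler words with "#" (assumes no space after "#")."""
    tokens = []
    for token in transcript.split():
        # Compare without trailing punctuation so "um," is still recognised
        core = token.rstrip(",.?!")
        if core.lower() in FILLERS:
            token = "#" + token
        tokens.append(token)
    return " ".join(tokens)


print(mark_fillers("um, I want to go uh to the mall"))  # #um, I want to go #uh to the mall
```

Tokens that already start with “#” are left unchanged, so running the function twice does not add a second “#”.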
Five categories of non-speech acoustic events must be transcribed. Events will only be transcribed if they
are clearly distinguishable. Very low-level, i.e. non-intrusive events will be ignored.
The event will be transcribed at the place of occurrence, using the defined symbols in angle brackets. For
noise events that occur over a span of one or more words, the transcription should indicate the beginning
of the noise, just before the first word it affects.
The first category of acoustic events, <SPK/>, originates from the speaker; the other categories
originate from other sources. Sounds originating from the speaker usually do not overlap with the
target speech, while sounds from other sources may occur simultaneously with the speech.
1. [SPK]
Sudden and obvious noise from the speaker, such as laughing, coughing or heavy breathing. Do not use
in an invalid blank. This tag is not frequently used.
2. [NON]
Sudden non-human noise, such as knocking, ringing or bumping a table. Do not use in an invalid blank.
This tag is not frequently used.
3. [NPS]
Voice of a person other than the speaker.
4. [STA]
Sudden, long background noise, such as wind, rain or music. Continuous background noise throughout
the audio shall be ignored. This tag is not frequently used.
5. [Z]
Indicates a part where somebody is talking, but the whole part consists of unclear, unintelligible
words or a third language. A blank marked with this tag shall be invalid. [Z] shall not appear in valid
parts; it appears alone in a blank.
6. [S]
Indicates a part where nobody is talking. A part or blank marked with this tag shall be invalid.
[S] shall not appear in valid parts; it appears alone in a blank.
7. #
Use # before a filler word. # does not appear alone.
8. **
Indicates that a word cannot be heard clearly and therefore cannot be written.
Remark:
<STA/> and <NON/> should only be used if the sounds are not inherent to the environment as
such.
E.g. in a car, stationary background noise and street noise are expected as part of the recording
environment. These noises should not be transcribed. Only obvious and salient deviations from this
background should be marked.
<STA/> is usually placed at the beginning of an utterance. <NPS/> is likewise NOT preferred for marking
other people speaking nearby in “restaurant” or “street” environments.
If <SPK/> or <NON/> begins within a word, the symbol is placed before the first word affected.
The symbols are always separated from the surrounding words by spaces.
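The tag rules above can be sketched as a simple per-segment checker in Python. This is an illustration under assumptions: the guidelines write the tags both in angle brackets (<SPK/>) and in square brackets ([SPK]), and this sketch assumes the square-bracket form from the tag list; `check_segment` and `TAG` are hypothetical names. It checks three rules: [Z] and [S] appear alone in their blank, # does not appear alone, and tags are separated from surrounding words by spaces.

```python
import re

# Square-bracket tags from the tag list (assumed to be the form used in the portal)
TAG = re.compile(r"\[(?:SPK|NON|NPS|STA|Z|S)\]")


def check_segment(segment: str) -> list[str]:
    """Return tag-rule violations in one segment's transcription (hypothetical checker)."""
    problems = []
    tokens = segment.split()
    # [Z] and [S] mark invalid blanks and must be the only thing in the blank
    for tag in ("[Z]", "[S]"):
        if tag in segment and tokens != [tag]:
            problems.append(f"{tag} must appear alone in its blank")
    # "#" marks a filler word and never appears on its own
    if "#" in tokens:
        problems.append("# must not appear alone")
    # Every tag must be separated from the surrounding words by spaces
    for token in tokens:
        if TAG.search(token) and not TAG.fullmatch(token):
            problems.append(f"tag not separated by spaces: {token}")
    return problems


print(check_segment("[SPK] I want to go to the mall"))  # []
print(check_segment("I want to go [Z]"))                 # ['[Z] must appear alone in its blank']
```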
Below is a screenshot of the work portal where you will work. The work is simple: listen to the audio
and write what you hear. The audio quality is good. You don’t need to create any segments, because the
audio you receive is already segmented. Simply listen to each segment and write what you hear,
following the guidelines.
Each ID or individual should complete at least 1 hour of transcription to claim their first payment.
The project volume is currently 550 hours, and we are planning to onboard a large number of team
members, so if you have a team or friends who can work with you, they are most welcome.