
Speech Ocean Guidelines

The document provides guidelines for transcribing audio recordings from Hindi language sources. It outlines 10 general rules for transcription, including transcribing what is said verbatim, capitalizing proper nouns and acronyms, spelling out numbers and letters, punctuating according to grammar rules, and indicating unintelligible speech, filler words, and non-speech acoustic events using tags. Transcribers are expected to work online, listen to 2-3 minute audio clips in Hindi and submit accurate transcriptions for quality review.

Uploaded by

Ayushi Rajput

SPEECHOCEAN & SHININGSPIN CONFIDENTIAL

Speech-Ocean Transcription Guidelines - Hindi


Transcription Procedure & Requirements
The following recommendations and requirements apply to the transcription procedure:
1. Use a good-quality headset or earphones so that you can hear the audio properly, and transcribe in a quiet environment so that you can stay focused.
2. You need a reliable internet connection because this is online work: the audio is delivered through our portal, and you listen to it and write down exactly what is said.
3. Most important: you must know Hindi very well, because the audio is in Hindi and you have to write down exactly what you hear.
4. Each audio file is only 2-3 minutes long. Submit it and start working on the next file immediately. Every submitted file goes through a Quality Check, and if the quality team finds any error, the file is sent back to you for rework.

General Transcription Rules:


1. Transcriptions should reflect what the speaker actually says. This means you write what you hear.

What you hear is not necessarily the formal version of a word; it may be ungrammatical or differ from the given content.
E.g. I want to go to the mall vs. I wanna go to the mall
If the speaker said “wanna”, the transcription is “wanna”, not “want to”.
If a speaker utters the plural form “bonds” in the sample sentence below, transcribe it exactly as
“bonds”: find a bonds with a ten year maturity date

2. Case

Transcriptions are CASE SENSITIVE, unless specified otherwise in the documentation of the database. All words and names should be transcribed in mixed case.

Proper nouns (e.g. names, addresses, countries, organizations, months, etc.) begin with a capital letter, such as India, Microsoft, Virat, Gurgaon, January, etc.

Brand names and trademarks are transcribed in their original format, including their case (e.g. MySpace, Hotmail dot com, KFC, IBM, NASA, Amazon, Flipkart, etc.).

Unless otherwise specified, a word should not be capitalized just because it is at the beginning of a sentence. Words should only be capitalized if they are usually capitalized in mid-sentence.

3. Number Sequences. Numbers should be written in words.

Number sequences (flight numbers, times, dates, aircraft types, money amounts, etc.) are spelled out to reflect what was said ("flight six one three"; "seven thirty"; "August twenty first"; "seven forty seven"; "four hundred and ten dollars").
If digits have alternate dictionary forms (e.g. "zero" or "oh" or "naught" in English), use the alternative that reflects the form actually pronounced.

Long numbers may be written together, or with blanks between parts in order to reduce the lexicon size.
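
As an illustration of the digit-by-digit case ("flight six one three"), here is a minimal Python sketch. The function name `spell_digits` and its `zero_word` parameter are hypothetical and not part of any project tool. It only fits audio where the speaker reads each digit separately; forms such as "seven thirty" must still be written as heard.

```python
# Hypothetical helper: spells a digit string one digit at a time.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_digits(number: str, zero_word: str = "zero") -> str:
    """Return the digits of `number` as words, separated by spaces.

    `zero_word` lets the transcriber pick the form actually pronounced
    ("zero", "oh", "naught").
    """
    words = []
    for ch in number:
        if ch not in DIGIT_WORDS:
            raise ValueError(f"not a digit: {ch!r}")
        words.append(zero_word if ch == "0" else DIGIT_WORDS[ch])
    return " ".join(words)


print("flight " + spell_digits("613"))
print(spell_digits("707", zero_word="oh"))
```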

4. Letter Sequences

Letter sequences occur in spelled words, ZIP codes, acronyms and abbreviations ("D F W"; "A P slash eighty"; "P M"; "C O"; "I B M", etc.). Letters should be in upper case, separated by a space.

The AM and PM of times (e.g. "five thirty P M") are treated as letter sequences, i.e. upper case and separated by a space.

Example: my name is mister Tom T O M (here the speaker says his name and then spells it out)
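
The letter-sequence format can be sketched in a few lines of Python. The helper name `spell_out_letters` is hypothetical and is shown only to make the expected output format concrete.

```python
def spell_out_letters(token: str) -> str:
    """Return the letters of `token` in upper case, separated by single spaces.

    Non-letter characters (dots, digits, spaces) are dropped.
    """
    letters = [ch.upper() for ch in token if ch.isalpha()]
    return " ".join(letters)


print(spell_out_letters("DFW"))
print(spell_out_letters("p.m."))
print("my name is mister Tom " + spell_out_letters("Tom"))
```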

5. Acronyms

Acronyms are terms formed from the initial letters of their elements and spoken as words.
They should be transcribed as words in upper case, without white space between the letters.

E.g.

"I work for NASA."

"AIDS has a great impact on society."

6. Abbreviations

Do not introduce abbreviations in the transcription. Always use the spelled-out form (full word) when
pronounced as such.

E.g.

“This is Dr. Smith.” = “this is doctor Smith.”

"Mrs. Smith this way please." = "missus Smith, this way please."

“Then they drove to St. Paul.” = “then they drove to Saint Paul.”
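
A minimal Python sketch of the same idea follows. The `ABBREVIATIONS` table is an assumption built only from the three examples above and is not an official list. An abbreviation such as "St." can stand for more than one word, so the spoken form always decides.

```python
# Assumed mapping, taken from the three examples in this section only.
ABBREVIATIONS = {
    "Dr.": "doctor",
    "Mrs.": "missus",
    "St.": "Saint",
}


def expand_abbreviations(text: str) -> str:
    """Replace each whitespace-separated token found in ABBREVIATIONS
    by its spelled-out form; all other tokens are kept unchanged."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in text.split())


print(expand_abbreviations("this is Dr. Smith."))
print(expand_abbreviations("then they drove to St. Paul."))
```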

7. Punctuation

Use punctuation as required by the grammar rules.

• Use end punctuation (full stop, question mark, exclamation mark) to indicate the end of a complete sentence.
• Use punctuation symbols that are an essential part of the word, such as apostrophes and hyphens.
• Use commas to break up long stretches of speech. This is to facilitate reader comprehension.
• AVOID: semicolons, quotation marks

If someone speaks a special character, replace the character with the corresponding word (lower case). This should only be done when it is certain that the speaker has spoken a character. The transcription should reflect exactly what was said.

E.g.

“Pictures + Camera” = “pictures and camera”

"My email is m-golden@" = "my email is M dash golden at."

"1 + 1 = 2." = "one plus one equals two."
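
Before submitting, a transcript can be checked for leftover characters that this section says to avoid or to write out as words. The sketch below is only an illustration: the `FORBIDDEN` set is an assumption based on the examples above (semicolons, quotation marks, "+", "=", "@"), and the function name is hypothetical.

```python
# Assumed set of characters that should not remain in a finished transcript.
FORBIDDEN = set(';"“”+=@')


def find_forbidden_characters(transcript: str) -> list:
    """Return the sorted list of distinct forbidden characters in `transcript`."""
    return sorted({ch for ch in transcript if ch in FORBIDDEN})


print(find_forbidden_characters("pictures + camera"))
print(find_forbidden_characters("pictures and camera"))
```

An empty list means none of these characters were found. It does not prove that the transcript matches the audio.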

8. Unintelligible Words

Unintelligible speech, i.e. words or stretches of speech that are completely unintelligible, is transcribed as “**”. The “**” marker is separated from neighbouring intelligible words by spaces.

E.g.

“Stop the mu… the music.” = “stop the ** the music.”

"Play ??? on Spotify." = "play ** on Spotify."

9. Filler Words

Filler words are “words” that speakers use to indicate hesitation or to maintain control of a conversation while thinking of what to say next. Each language has a limited set of filler words that speakers can use. The spelling of a filler word should not be altered to reflect how the speaker pronounces it, and each filler word should be preceded by a hash sign (#).

E.g.

“but #um I like it.”

"#hmm perhaps you're right."

“#ah I’ve got it.”
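
The filler rule can be sketched in Python as below. The `FILLERS` set is an assumption that contains only the three fillers from the examples above; the real set is fixed per language by the project. The function name is hypothetical.

```python
# Assumed filler set, taken from the examples in this section only.
FILLERS = {"um", "hmm", "ah"}


def tag_fillers(text: str) -> str:
    """Prefix every whitespace-separated token that is a known filler with '#'.

    Tokens that already start with '#' or carry punctuation are left unchanged.
    """
    tagged = []
    for token in text.split():
        if token.lower() in FILLERS:
            tagged.append("#" + token)
        else:
            tagged.append(token)
    return " ".join(tagged)


print(tag_fillers("but um I like it."))
print(tag_fillers("#hmm perhaps you're right."))
```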

10. Non-Speech Acoustic Events

Five categories of non-speech acoustic events must be transcribed. Events will only be transcribed if they
are clearly distinguishable. Very low-level, i.e. non-intrusive events will be ignored.
The event will be transcribed at the place of occurrence, using the defined symbols in angle brackets. For
noise events that occur over a span of one or more words, the transcription should indicate the beginning
of the noise, just before the first word it affects.

The first category of acoustic events, <SPK/>, originates from the speaker; the other categories originate from other sources. Sounds originating from the speaker usually do not overlap with the target speech, while sounds originating from other sources can occur simultaneously with the speech.

TAGS, Definition and Example

Tag: <SPK/>
Definition: Speaker noise: The various sounds and noises made by the speaker that are not part of the prompted text, e.g. lip smack, cough, grunt, throat clear, tongue click, loud breath, laugh, loud sigh. This marker should also be used when the speaker blows into the microphone, before or after a word. Only loud lip smacks and breaths should be transcribed.
Example: <SPK/> Vikas

Tag: <STA/>
Definition: Stationary noise: This category contains background noise that is not intermittent and has a more or less stable amplitude spectrum over some time. Examples: voice babble (cocktail-party noise), background noise, sirens, wind, rain, loud car noise from the outside. Music is also marked as stationary noise if it is audible but not there by design. This mark should rarely be needed in a quiet desktop environment.
Example: <STA/> how are you doing ?

Tag: <NON/>
Definition: Non-human noise: This category contains noise of an intermittent nature. These noises typically occur only once, like a door slam, something being dropped or a mouse click.
Example: there is no evidence to suggest <NON/> he received a bribe.

Tag: <NPS/>
Definition: Non-primary speaker: Noise from other humans is transcribed as <NPS/>. Lip smacking, coughing, throat clearing, tongue clicks, loud breathing, laughing and speech not from the primary speaker are all included.
Example: she is already being placed <NPS/> under medical examination for her treatment.

TAGS explanation in short:
There are 8 tags:

1. [SPK]
Sudden and obvious noise from the speaker: laughing, coughing, heavy breathing. Do not use it in an invalid blank. This tag is not used frequently.
2. [NON]
Sudden non-human noise: knocking, ringing, bumping the table, etc. Do not use it in an invalid blank. This tag is not used frequently.
3. [NPS]
Voice of a person other than the speaker.
4. [STA]
Sudden, long background noise: wind, rain, music, etc. Continuous background noise in the audio shall be ignored. This tag is not used frequently.
5. [Z]
Indicates a part where somebody is talking but the whole part consists of unclear, unintelligible words or a third language. A blank that uses this tag is invalid. [Z] shall not appear in valid parts; it appears alone in a blank.
6. [S]
Indicates a part where nobody is talking. A part or blank that uses this tag is invalid. [S] shall not appear in valid parts; it appears alone in a blank.
7. #
Put # before a filler word. # does not appear alone.
8. **
Indicates that a word cannot be heard clearly and cannot be written down.
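
Three of the rules in this short list are mechanical enough to check automatically: [Z] and [S] must stand alone, # must be attached to a filler, and ** must be separated by spaces. The Python sketch below is a rough illustration under those assumptions. The function name `check_segment` and its messages are hypothetical and are not part of the work portal.

```python
import re

# Tags that must be the only content of a segment (blank).
STANDALONE_TAGS = ("[Z]", "[S]")


def check_segment(text: str) -> list:
    """Return a list of rule violations found in one transcribed segment."""
    problems = []
    tokens = text.split()

    # [Z] and [S] appear in a blank alone.
    for tag in STANDALONE_TAGS:
        if tag in text and tokens != [tag]:
            problems.append(f"{tag} must be the only content of the segment")

    # '#' does not appear alone; it is attached to a filler word.
    if "#" in tokens:
        problems.append("# must be attached to a filler word")

    # '**' is separated from neighbouring words by spaces.
    if re.search(r"\S\*\*|\*\*\S", text):
        problems.append("** must be separated from other words by spaces")

    return problems


print(check_segment("play ** on Spotify"))
print(check_segment("hello [Z]"))
```

An empty result only means that these three rules hold. Whether the transcription matches the audio is still decided by the Quality Check.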

Remark:

• <STA/> and <NON/> should only be used if the sounds are not inherent to the recording environment.

E.g. in a car, stationary background noise and street noise can be expected as part of the recording environment. These noises should not be transcribed. Only obvious and salient deviations from the given background should be marked.

<STA/> is usually put in the initial position of the utterance. Likewise, <NPS/> should NOT be used in a “restaurant” or “street” environment to mark other people speaking in the vicinity.

• If <SPK/> or <NON/> begins within a word, the symbol is put before the first word affected. The symbols are always separated from the surrounding words by spaces.
Below is a screenshot of the work portal where you will work. The work is quite simple and doable: you just need to listen to the audio and write what you hear. The audio quality is good. You do not need to create any segments, because the audio you receive is already segmented. So it is simple: listen to each segment and write what you hear, following the guidelines.

General Work Terms & Conditions:

• Each individual will be provided with their own ID and is expected to complete at least 20 to 30 minutes of transcription per day, which means a minimum of 10 audio files daily, with no upper limit.

• Each ID or individual should complete at least 1 hour of transcription to claim their first payment.

• The project volume is currently 550 hours and we are planning to onboard a large number of team members, so if you have a team or friends who can work along with you, they are most welcome.

• Payment will be made monthly, in 30 days, via UPI/NEFT/Bank Transfer.
