Speech Ocean Guidelines

The document provides guidelines for transcribing audio recordings from Hindi language sources. It outlines 10 general rules for transcription, including transcribing what is said verbatim, capitalizing proper nouns and acronyms, spelling out numbers and letters, punctuating according to grammar rules, and indicating unintelligible speech, filler words, and non-speech acoustic events using tags. Transcribers are expected to work online, listen to 2-3 minute audio clips in Hindi and submit accurate transcriptions for quality review.

Uploaded by

Ayushi Rajput


SPEECHOCEAN & SHININGSPIN CONFIDENTIAL

Speech-Ocean Transcription Guidelines - Hindi

Transcription Procedure & Requirements
The following recommendations and requirements apply to the transcription procedure:
1. Use a good-quality headset or earphones so that you can hear the audio properly, and transcribe in a
quiet environment so that you can stay focused.
2. You need a reliable internet connection, because the work is done online: the audio is provided in our
portal, and you listen to it and write exactly what you hear.
3. Most important: you must know Hindi very well, because the audio is in Hindi and you must write down
exactly what is said.
4. Each audio file is only 2-3 minutes long. Submit it and start working on the next audio file
immediately. Every submitted audio file goes through a quality check, and if the quality team finds any
error, they return that audio file to you for rework.

General Transcription Rules:

1. Transcriptions should reflect what the speaker really says. This means you write what you hear.

What you hear is not necessarily the formal version of the word; it may be ungrammatical or may not
appear in the given content.
E.g. I want to go to the mall vs. I wanna go to the mall
If the speaker said “wanna”, the transcription is “wanna”, not “want to”.
If a speaker utters the plural form “bonds” in the sample sentence below, transcribe it exactly as
“bonds”: find a bonds with a ten year maturity date

2. Case

Transcriptions are CASE SENSITIVE, unless specified otherwise in the documentation of the database. All
words and names should be transcribed in mixed case.

Proper nouns (e.g. names, addresses, countries, organizations, months, etc.) begin with a capital
letter, such as India, Microsoft, Virat, Gurgaon, January, etc.

Brand names and trademarks are transcribed in their original format, including their case (e.g.
MySpace, Hotmail dot com, KFC, IBM, NASA, Amazon, Flipkart, etc.).

Unless otherwise specified, a word should not be capitalized just because it is at the beginning of a
sentence. Words should only be capitalized if they are usually capitalized in mid-sentence.

3. Number Sequences. Numbers should be in words.

Number sequences (flight numbers, times, dates, aircraft types, money amounts, etc.) are spelled out
to reflect what was said ("flight six one three"; "seven thirty"; "August twenty first"; "seven forty seven";
"four hundred and ten dollars").
If digits have alternate dictionary forms (e.g. "zero" or "oh" or "naught" in English), use the
alternative that reflects the form actually pronounced.

Long numbers may be written together, or with blanks between parts in order to reduce the lexicon size.
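
The number rule can be checked mechanically before submission. The following is a minimal sketch, not part of the portal: the function name `find_digit_tokens` is hypothetical, and it assumes that any token containing an ASCII or Devanagari digit violates the "numbers in words" rule.

```python
import re

# Matches ASCII digits 0-9 and Devanagari digits (U+0966 to U+096F).
DIGIT = re.compile(r"[0-9\u0966-\u096F]")

def find_digit_tokens(transcript: str) -> list[str]:
    """Return every whitespace-separated token that contains a digit."""
    return [tok for tok in transcript.split() if DIGIT.search(tok)]

print(find_digit_tokens("flight 613 leaves at seven thirty"))
print(find_digit_tokens("flight six one three"))
```

A non-empty result lists the tokens that still need to be written out as they were spoken.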

4. Letter Sequences

Letter sequences occur in spelled words, ZIP codes, acronyms and abbreviations ("D F W"; "A P slash
eighty"; "P M"; "C O"; "I B M", etc.). Letters should be in upper case, separated by a space.

The AM and PM of times (e.g., "five thirty P M") are treated as letter sequences, i.e.,
upper case and separated by a space.

Example: my name is mister Tom T O M (here the speaker says his name and then spells it out)
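
As an illustration of the letter-sequence format, here is a small sketch. The helper name `spell_out` is hypothetical, and it assumes the spelled letters are Latin letters; any dots or other separators in the input are dropped.

```python
def spell_out(letters: str) -> str:
    """Format spelled-out letters as upper case, separated by single spaces."""
    return " ".join(ch.upper() for ch in letters if ch.isalpha())

print(spell_out("tom"))     # the spelled name from the example above
print(spell_out("p.m."))    # the PM of a time
```

Use this form only when the speaker actually pronounces the letters one by one.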

5. Acronyms

Acronyms are terms formed from the initial letters of their various elements and spoken as words.
They should be transcribed as words in upper case, without white space between the letters.

E.g.

"I work for NASA."

"AIDS has a great impact on society."

6. Abbreviations

Do not introduce abbreviations in the transcription. Always use the spelled-out form (full word) when
pronounced as such.

E.g.

“This is Dr. Smith.” = “this is doctor Smith.”

"Mrs. Smith this way please." = "missus Smith, this way please."

“Then they drove to St. Paul.” = “then they drove to Saint Paul.”
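
The abbreviation rule can be sketched as a lookup table. This is only an illustration under stated assumptions: the table `EXPANSIONS` holds just the three examples above, the function name is hypothetical, and the right expansion always depends on what was actually pronounced (for instance, "St." may also be spoken as "street").

```python
import re

# Hypothetical table built from the three examples in this section.
EXPANSIONS = {"Dr.": "doctor", "Mrs.": "missus", "St.": "Saint"}

# Match any listed abbreviation that starts at a word boundary.
PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in EXPANSIONS) + r")")

def expand_abbreviations(text: str) -> str:
    """Replace each listed abbreviation with its spelled-out form."""
    return PATTERN.sub(lambda m: EXPANSIONS[m.group(0)], text)

print(expand_abbreviations("this is Dr. Smith."))
print(expand_abbreviations("then they drove to St. Paul."))
```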

7. Punctuation

Use punctuation as required by the grammar rules.

• Use end punctuation (full stop, question mark, exclamation mark) to indicate the end of a complete
sentence.

• Use punctuation symbols that are an essential part of the word, such as apostrophes and hyphens.
• Use commas to break up long stretches of speech. This is to facilitate reader comprehension.
• AVOID: semicolons, quotation marks

If someone speaks a special character, replace the character with the corresponding word (lower case).
This should only be done when it is certain the speaker has spoken a character. The transcription should
reflect exactly what was said.

E.g.

“Pictures + Camera” = “pictures and camera”

"My email is m-@" = "my email is M dash golden at."

"1 + 1 = 2." = "one plus one equals two."

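The punctuation rules above can be checked with a short scan of the transcript. This is a sketch under stated assumptions: the sets `AVOIDED` and `SYMBOLS` and the function name are hypothetical, and the symbol set is only a guess at which characters are normally written as spoken words. The `#` and `*` characters are deliberately left out because the guidelines use them as markers.

```python
AVOIDED = set(';"“”')      # semicolons and quotation marks
SYMBOLS = set("+=@&%$")    # assumed set of characters to write as words

def punctuation_issues(transcript: str) -> list[tuple[int, str, str]]:
    """Return (position, character, reason) for each questionable character."""
    issues = []
    for i, ch in enumerate(transcript):
        if ch in AVOIDED:
            issues.append((i, ch, "avoided punctuation"))
        elif ch in SYMBOLS:
            issues.append((i, ch, "write the spoken word instead"))
    return issues

print(punctuation_issues("one plus one equals two."))
print(punctuation_issues("1 + 1; ok"))
```

The check only flags characters; which word replaces a symbol ("and" or "plus", for example) depends on what the speaker said.
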
8. Unintelligible Words

Unintelligible speech, i.e. words or stretches of speech that are completely unintelligible, is
transcribed as “**”. The “**” marker is separated from neighbouring intelligible words by spaces.

E.g.

“Stop the mu… the music.” = “stop the ** the music.”

"Play ??? on Spotify." = "play ** on Spotify."

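The spacing requirement for the “**” marker can be verified token by token. A minimal sketch, assuming that sentence punctuation directly after the marker is acceptable (the guidelines do not say either way); the function name is hypothetical.

```python
def misplaced_unintelligible_markers(transcript: str) -> list[str]:
    """Return tokens where '**' is glued to other characters."""
    return [
        tok for tok in transcript.split()
        if "**" in tok and tok.strip(".,?!") != "**"
    ]

print(misplaced_unintelligible_markers("play ** on Spotify."))
print(misplaced_unintelligible_markers("play **on Spotify."))
```
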
9. Filler Words

Filler words are “words” that speakers use to indicate hesitation or to maintain control of a conversation
while thinking of what to say next. Each language has a limited set of filler words that speakers can use.
The spelling of filler words should not be altered to reflect how the speaker pronounces the word, and
each filler word should be preceded with a hashtag (#).

E.g.

“but #um I like it.”

"#hmm perhaps you're right."

“#ah I’ve got it.”
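
The filler rule lends itself to a simple consistency check. The sketch below is illustrative only: the set `FILLERS` contains just the three fillers from the examples above, whereas a real project would supply its own fixed list for the language, and the function name is hypothetical.

```python
# Assumed filler list, taken from the three examples above.
FILLERS = {"um", "hmm", "ah"}

def filler_issues(transcript: str) -> list[str]:
    """Report fillers written without '#' and '#' used without a known filler."""
    issues = []
    for tok in transcript.split():
        word = tok.strip(".,?!")
        if word.lower() in FILLERS:
            issues.append(f"missing # before filler: {word}")
        elif word.startswith("#") and word[1:].lower() not in FILLERS:
            issues.append(f"unknown or empty filler after #: {word}")
    return issues

print(filler_issues("but #um I like it."))
print(filler_issues("but um I like it."))
```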

10. Non-Speech Acoustic Events

Five categories of non-speech acoustic events must be transcribed. Events will only be transcribed if they
are clearly distinguishable. Very low-level, i.e. non-intrusive events will be ignored.
The event will be transcribed at the place of occurrence, using the defined symbols in angle brackets. For
noise events that occur over a span of one or more words, the transcription should indicate the beginning
of the noise, just before the first word it affects.

The first category of acoustic events, <SPK/>, originates from the speaker; the other categories
originate from another source. Sounds originating from the speaker usually do not overlap with the
target speech, while sounds originating from other sources can occur simultaneously with the
speech.

1. [SPK]
Sudden and obvious noise from the speaker, such as laughing, coughing, or heavy breathing. Do not use it
in an invalid blank. This tag is not used frequently.
2. [NON]
Sudden non-human noise, such as knocking, ringing, or bumping the table. Do not use it in an invalid
blank. This tag is not used frequently.
3. [NPS]
Voice of a person other than the speaker.
4. [STA]
Sudden long background noise, such as wind, rain, or music. Continuous background noise in the
audio shall be ignored. This tag is not used frequently.
5. [Z]
Indicates a part where somebody is talking but the whole part consists of unclear and unintelligible
words or a third language. A blank that uses this tag is invalid. [Z] shall not appear in valid parts; it
appears alone in a blank.
6. [S]
Indicates a part where nobody is talking. Parts and blanks that use this tag are invalid.
[S] shall not appear in valid parts; it appears alone in a blank.
7. #
Use # before a filler. # does not appear alone.
8. **
Indicates that a word cannot be heard clearly and cannot be written.
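
The tag list above can be turned into a small validator. This is a sketch under stated assumptions: it uses the square-bracket spelling from the list, treats one "blank" as one segment string, and the names `KNOWN_TAGS`, `STANDALONE_TAGS` and `tag_issues` are hypothetical.

```python
import re

KNOWN_TAGS = {"[SPK]", "[NON]", "[NPS]", "[STA]", "[Z]", "[S]"}
STANDALONE_TAGS = {"[Z]", "[S]"}   # these must appear alone in a blank

def tag_issues(segment: str) -> list[str]:
    """Report unknown tags and [Z]/[S] tags mixed with other content."""
    issues = []
    for tag in re.findall(r"\[[^\]\s]*\]", segment):
        if tag not in KNOWN_TAGS:
            issues.append(f"unknown tag: {tag}")
        elif tag in STANDALONE_TAGS and segment.strip() != tag:
            issues.append(f"{tag} must be the only content of its segment")
    return issues

print(tag_issues("[S]"))
print(tag_issues("namaste [Z]"))
```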

Remark:

• <STA/> and <NON/> should only be used if the sounds are not inherent to the environment as
such.

E.g. in a car, stationary background noise and street noises can be expected as part of
the recording environment. These noises should not be transcribed. Only obvious and
salient deviations from the given background should be marked.

<STA/> is usually put in the initial position of the utterance. Likewise, <NPS/> should preferably not
be used in a “restaurant” or “street” environment to mark other people speaking in the vicinity.

• If <SPK/> or <NON/> begins within a word, the symbol is put before the first word affected.
The symbols are always separated from the surrounding words by spaces.

Below is a screenshot of the work portal. The work is quite simple and doable: you just need to
listen to the audio and write what you hear. The audio quality is good. You don't need to create any
segments, because the audio you receive is already segmented. You simply listen to each segment and
write what you hear, following the guidelines.

General Work Terms & Conditions:

• Each individual will be provided with their own ID and is expected to complete at least 20 to
30 minutes of transcription per day, which means a minimum of 10 audio files daily, with no upper
limit.

• Each ID or individual must complete at least 1 hour of transcription to claim their first payment.

• The project volume is currently 550 hours and we are planning to onboard a large number of team
members, so if you have a team or friends who can work along with you, they are most welcome.

• Payment will be made monthly, in 30 days, via UPI/NEFT/bank transfer.
