Transcript Style | Editor supplement | Sample Transcript | Rare Situations Supplement

Main style guide for transcribing and editing: the basics

How to use this guide

It’s crucial that you read this guide over completely before beginning your first transcription HIT with CastingWords (CW). There is, quite honestly, no way you can get our highest bonuses if you do not. This goes even if you have done other Turk transcription work before.

We know this guide is long. However, it is a complete reference for matters of CW style, and we have tried to make it as clear and easy to use as possible by breaking it into sections. You are welcome to make a personal quick-reference guide, and we encourage you to do so. Creating one for yourself is always a great test of whether you understand a particular style well.

Besides the headings, everything that we show in boldface in this style guide is a specific example of correct style for CW. You can follow those examples exactly.

What we need you to know about CW style

Confidentiality is of utmost importance. Do not ever post to the Internet any transcripts or audio that you receive or create for us, or post these anywhere else that they’ll be accessible by anyone besides you and CW.

Beyond that, the special instructions on the coversheet of the HIT are the second most important determiner of how you handle the transcript. These are our rules for those.

● No special instructions can ever counteract certain CW requirements, so always continue to do the following four items, even if special instructions request that you do otherwise. (And let CW know if the special instructions conflict with these.)

● All instructions in this style guide and other style documents assume that there are no special instructions otherwise. Aside from the exceptions above, when your special instructions contradict the style here, follow the special instructions. Otherwise, not following them is reason to downgrade your transcript.

● If the special instructions seem very unusual for CastingWords HITs, contact CW to ask. Sometimes, clients leave instructions that CW has not actually seen. However, as long as the unusual special instructions don’t violate the exceptions given above, go ahead and follow them until you hear back. You can submit your transcript before you get a response. Just note in the comments that you requested clarification about those instructions.

The transcript you submit must meet the following criteria, or we may reject it:

Non-verbatim transcripts: must be cleaned up, not verbatim (many examples throughout this guide). Our standard transcription jobs are all non-verbatim.

● Uses standard English, not netspeak or phonetics (amirite? cuz u know its not profeshnul);

● Does not paraphrase (do not type the gist of what you heard, but the words actually used);

● Includes nothing typed in that wasn’t on the audio, including your comments or the job number or title (there is a separate area in the HIT for you to type your comments); and

● Uses word wrap.

Other major rules

Do not make up words. There are two ways in which we mean this:

  1. Do not spell words phonetically. All words should be spellchecked and must be actual English words, unless the speaker was deliberately making up words, such as "what awesome majorness!" Otherwise, your transcript will be heavily downgraded for using phonetic spelling, such as “coco van” for “coq au vin.” (More on this in point 2.)

  2. Do not include words just because they sound similar to the syllables that were spoken. This is one of the co-owners’ pet peeves. These tips should prevent that problem:

○ Read your transcript before submitting, as if you were reading an article or story. If the words you used do not make sense in each sentence, they are probably not the words the speaker was saying.

○ For example, is it really likely that a TSA officer repeatedly talked about "hot gowns" while discussing his work? That may sound similar to what was said, but it’s more likely that “pat-downs” was his phrase, since that’s what he probably does all day. Same with “coco van,” when it would make much more sense for that chef being interviewed to be talking about “coq au vin.” Always consider the context of the conversation; use common sense to interpret the audio.

Tag any words that you are uncertain about or can’t get. It is not enough to mention in the job’s Comments that words are missing or that you are unsure. You must tag the spots where they should have been or where you don’t think you have the right word(s).

● If you give it a few tries but the words you’re hearing still don’t make sense for the transcript, use [xx] or [?] instead. See the Tags section of this guide for more details about how the two should be used, but we prefer you use such tags instead of just leaving in words that sound close but don’t make sense.

● Your transcript won’t be downgraded for using [xx] or [?] unless it’s clear that you’re doing it to avoid making a little effort.

Use all the information you have available. There are a few major ways in which you can get extra information, and we ask that you use them:

● The coversheet for the HIT may contain clues such as the speakers’ names or the correct spelling for certain terms mentioned in the audio. Not including information from the coversheet in the transcript may get your work downgraded, so please check that coversheet!

● The audio itself can give you new information. For example, if at the end, the interviewer says, "Thanks, Dave, for this interview!", and the interviewee’s response clearly indicates that he is Dave, then you must go back and relabel the interviewee as Dave: throughout the transcript. See the Sample Transcript for an example of this having been done.

● Use our Word List for correct spellings of various common and not-so-common terms, and check our Rare Situations Supplement if something comes up you’re not sure how to handle. Our Editor Supplement is useful if you’re doing our editor work, and we’ve designed our Sample Transcript to be useful to everyone. If all else fails, contact CW directly to get your question answered. We are happy to help, but it is important to us to see that you have used our documentation before asking.

Remove filler words, including "ah, er, um, uh, mm-hmm," unless they are absolutely necessary to indicate meaning. Do the same for “And, But, Or, So” when they start a sentence, as well as for whatever other words a particular speaker uses (some use “like” and others use “Yeah,” for example) to start a sentence without adding meaning.

● When one person is speaking and another says nothing but "uh-huh" or “mm-hmm” in the meantime, leave out all those murmuring noises as long as they aren’t an answer to a question from the speaker. Do this: “I went downtown today.” “Uh-huh.” “I like it there.” “Mm-hmm.” “And I think I’m getting better at finding parking spots!” becomes just I went downtown today. I like it there. I think I’m getting better at finding parking spots!, because none of what the other person said was important...and neither was the starting “And”.

● If the "Mm-hmm" is in response to a direct question from the speaker, and the speaker waited for that answer, then include it to keep the transcript making sense. See the John-Jerry conversation in the Sample Transcript for details.
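As a rough self-check for the filler rules above, a short script can flag candidate filler sounds for review before you submit. This is only a sketch: the word list and the name find_fillers are assumptions made for illustration, not a CW tool, and you still have to decide whether each flagged word is needed for meaning or answers a direct question.

```python
import re

# Filler sounds named in this guide. Extend per speaker as needed (assumption).
FILLERS = ["ah", "er", "um", "uh", "mm-hmm", "uh-huh"]

# Longer entries come first so "uh-huh" is tried before "uh".
_ALTERNATIVES = "|".join(
    re.escape(word) for word in sorted(FILLERS, key=len, reverse=True)
)

# Match fillers only as whole words, ignoring case, so "her" or "number"
# are not flagged just because they contain "er".
_PATTERN = re.compile(r"(?<![\w-])(" + _ALTERNATIVES + r")(?![\w-])", re.IGNORECASE)


def find_fillers(text):
    """Return (position, word) pairs for every candidate filler in text."""
    return [(match.start(), match.group(1)) for match in _PATTERN.finditer(text)]
```

Calling find_fillers on a draft gives you a list of places to look at. It deliberately does not delete anything, because the guide leaves the final call to your judgment.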

Don’t include false starts, unless they add information to the transcript that will be missing otherwise. Compare these examples:

● "What did I do with the dog’s...I need to get to the bank before it closes" becomes What did I do with the dog’s...I need to get to the bank before it closes, because the false start adds information that the rest of the sentence does not, so it should be included.

● "I need to...need, um...I need to get to the bank before it closes" becomes I need to get to the bank before it closes, because the false start added no new information.

Verbatim Jobs

Verbatim transcripts: write down every utterance exactly as you hear it. Transcribe stutters as accurately as possible. Do not leave out any filler words unless a customer specifically instructs you to do so. The customer may request any level of verbatim for a transcription, and any such instructions will be in the "Notes:" section of the job. Our verbatim jobs are clearly marked as being verbatim.


● Word wrap must be turned on.

○ Do not use software that does not have word wrap.

○ Surefire clue that you didn’t have word wrap on: the transcript looks wrongly formatted when pasted into the HIT, even if it looked OK on your screen before. Reformat in a word processor before submitting.

● Do not indent paragraphs.

● Start every sentence with a capital letter.

● Put a blank line between paragraphs.

● Each change in speakers should be placed on a new line. Add a blank line before the changed speaker.

● Break sentences into smaller ones where possible, always trying to maintain grammatically complete sentences rather than creating sentence fragments. However, if a sentence is ridiculously long, go ahead and break it into fragments.

● Use short paragraphs. Paragraphs should be no more than 400 characters each, which is just a couple or three average sentences.
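If you want to check the paragraph rules above mechanically, something like the following could work. It is a minimal sketch that assumes paragraphs are separated by exactly one blank line. The name check_paragraphs is hypothetical, and only the 400-character limit and the no-indent rule are checked.

```python
MAX_PARAGRAPH_CHARS = 400  # limit stated in this guide


def check_paragraphs(transcript):
    """Return (paragraph_number, problem) tuples for a transcript.

    Paragraphs are assumed to be separated by one blank line.
    """
    problems = []
    paragraphs = transcript.split("\n\n")
    for number, paragraph in enumerate(paragraphs, start=1):
        # The guide says not to indent paragraphs.
        if paragraph.startswith((" ", "\t")):
            problems.append((number, "indented"))
        # The guide caps paragraphs at 400 characters.
        if len(paragraph) > MAX_PARAGRAPH_CHARS:
            problems.append((number, "longer than 400 characters"))
    return problems
```

An empty list means none of these two problems were found. It says nothing about sentence breaks or speaker changes, which still need a read-through.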

Speaker labels

● Standard format: a complete speaker label includes a colon after the label as well as a space after that colon. Do this: Woman 2:

● CW calls these "speaker labels," not “speaker tags.” We consider only items that have brackets around them to be tags.

● When to use: whenever the speakers change, or whenever something happens on a separate line (like [laughter]) in the middle of a person speaking (even if the same person keeps speaking after).

● Order of preference for labels: use names whenever possible, then roles, then use gender as the last resort.

● Full names: when you have information about a speaker’s full name (from the coversheet or because they state their name or are announced by name), use that the first time they appear in your audio chunk. After that, use only their first name. If they have a title, always use their title, but after they first show up, use only their first name OR their last name with the title, depending on your best guess about which one that particular person prefers. Like this: Dr. Jane Michaels: the first time becomes Dr. Michaels: later, or Pastor Linda Thomas: becomes Pastor Linda: later.

● Never use: Speaker: , Speaker 1: , Female: , Lady: , Male: , Dude: , People: . Nothing like these is OK.

● Descriptiveness: make each speaker’s label as informative as possible about the person’s role in the audio. Except in the case of large groups (see special subsection, later in this section), labels must be useful for telling one person from another. Woman 1: is acceptable, but Interviewer: or Host: is much better. Other roles that may apply (use your judgment): Congregant:, Audience Member: , Passerby: , Announcer: , Interviewee: .

● Adding gender: use Male and Female only as adjectives for roles, never by themselves. Only mention gender at all if people of different genders have the same role in the audio. Like this: Male Host: , Female Host: , but two female hosts would just be Host 1: and Host 2: .

● Adding numbers and cutting down on clutter: Always use numbers with "Man" or “Woman” labels. Do not use numbers if the speaker has a role other than just “Man” or “Woman,” unless the audio includes two or more people of the same gender who are playing that same role. Like this: Woman 1: and Man 2: , or (if there are two male hosts and one female one) Female Host: , Male Host 1: , and Male Host 2:.
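Some of the speaker-label rules above are mechanical enough to check with a few lines of code. The sketch below is an illustration under assumptions: label_problems is a hypothetical helper, it treats a numbered variant of a never-use label (such as "Female 2") as forbidden too, and it cannot judge whether a label is descriptive enough.

```python
import re

# Labels this guide says never to use, in lowercase for comparison.
FORBIDDEN = {"speaker", "female", "lady", "male", "dude", "people"}


def label_problems(label):
    """Return a list of problems found in one speaker label, e.g. 'Woman 2: '."""
    problems = []
    # A complete label ends with a colon followed by one space.
    if not label.endswith(": "):
        problems.append("must end with a colon and one space")
    name = label.rstrip().rstrip(":").strip()
    base = re.sub(r"\s+\d+$", "", name)  # drop a trailing number, if any
    if base.lower() in FORBIDDEN:
        problems.append("label is on the never-use list")
    # "Man" and "Woman" are only acceptable with a number.
    if name.lower() in {"man", "woman"}:
        problems.append("Man/Woman labels always need a number")
    return problems
```

For example, "Male Host: " passes because Male is used as an adjective for a role, while "Male: " on its own is rejected.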

Special rules for speaker-labeling a large group

Audience: is the label for an audience as a whole, unless they are gathered in a church or other place of worship, which makes them a Congregation:.

● A single member of that group is labeled Audience Member: or Congregant: .

● If there are already two or more other speakers in your audio, don’t worry about telling the audience or congregation apart. Each one will just be Audience Member: or Congregant: , with no numbers or gender needed.

● However, if there is only one main speaker on the audio, then be more detailed in specifying the first two group members who speak. We prefer you do this by mentioning gender (if they have different genders from each other): Female Audience Member:. If they are both the same gender, then add a number to their labels instead: Audience Member 2: .

Special rules for speaker-labeling jobs with the tags, Focus Group, Panel or Round Table

● The group leader is to be labeled Facilitator.

● The participants are to be labeled Female Participant or Male Participant.


Type 1: Use these when you can’t get a word/phrase

General rules

● Do not use square brackets around a word because you aren’t sure about it, or to highlight spoken word(s) for any other reason. Only use square brackets with our designated tags and with noise descriptions, all of which are detailed in this section.

● Use the [xx] tag when you feel unable to even guess at what was said. We do not penalize you for using [xx], unless you seem to do it for words that sound clear or obvious to us. This tag shows us that you knew words were there but couldn’t hear them. It also shows us that you aren’t just randomly leaving words out, which is really important to your grades.

● Use [?] when you are willing to guess at what you’re hearing, but your best guess makes the sentence into nonsense. It’s your way of showing us when you know something looks like nonsense. It’s important to us that you be able to tell!

● Both [?] and [sp] go before the word(s) you’re not sure about -- this alerts the editor that they're about to see a word that needs to be verified or corrected. Each must be used at every instance of those words. If the same word or phrase repeats five times in the transcript, tag it all five times anyway.

● [crosstalk] is used when you weren’t able to get all of someone’s words because another speaker talked over them. Please get as much of what is said as possible, but go ahead and resort to this tag when you must.

Tags list for things you couldn’t make out, with brief definitions

[xx]: Indecipherable audio that seems to be in English.

[foreign word]: Word (or [foreign words]) was spoken in a language that isn’t English.

[?]: This is your best guess about the word or words, but it doesn’t really make sense in the sentence, so you’d like someone to give it a close look. (He hurt his knee playing [?] Monopoly.)

[sp]: This is the right word, but you tried Googling and still can’t confirm your spelling (should be rare).

[crosstalk]: Two or more speakers are talking over one another. Include ellipses to show where one person stopped being intelligible and the other started being intelligible; see the John and Jerry conversation in the Sample Transcript for an example.

Type 2: Use these for sound events

● When there are no real speakers: [background sounds only], [background conversations] or [silence] are completely OK to use as the complete transcript for an audio. But if that happens, also leave CW a comment to confirm that’s what you mean. If the audiofile is completely silent, email us at Support before submitting -- it may be an incomplete or corrupted audiofile.

● Signs that an audiofile may be corrupted include: all static; high-pitched squealing; high-speed, high-pitched voices, etcetera.

● Simply put, if it's out of the ordinary, email Support.

● Indicating how the audio starts: Most of our audio starts abruptly, so don’t indicate this by adding tags, thank you. Just use an ellipsis to indicate that the speaker was in the middle of talking at the audio’s start.

● Acceptable for ending a transcript: [cuts off]. However, we prefer you use an ellipsis.

● Two important tags that aren’t sounds but are necessary to show how the speaker meant something, to help the transcript make sense: [sarcastically], [jokingly].

● Tags to show events that the speakers may be reacting to or speaking over: [applause], [phone rings], [siren], and anything else that comes up. You should make up your own tags here as needed, and make them specific: [noise] is too vague, for example. Only use these tags in the following two cases:

■ a single speaker [laughs] (this tag always appears inline with dialogue); two or more people makes it [laughter] (always appears on its own line, with one line of space above and below).

[music] includes all music that occurs at the beginning, end, or anywhere else in the audio that nobody is talking over. If it is talked over, then it’s [background music]. Usually, background music only appears in the middle, because if no one’s speaking over it, it’s not actually "background."

● No need to tag or otherwise acknowledge: Coughs and sneezes. You also don’t need to transcribe the actual contents of a commercial break that shows up on the audio. Just tag as [commercial break].

● Finally, don't be wordy -- keep tags as simple as possible. Do this: [fire alarm], not this: [alarm for fire drill goes off].

Other Audio Problems -- what to check before reporting

● If the audio is all silence or there is talking only on the left or right channel, i.e., you hear only silence where a question or response should be -- check all of your audio settings, connections, cables and equipment (headphones, speakers) before sending a report to Support -- it may be a stereo recording where all the talking is on the left (or right) and silent on the right (or left).

● If all settings, connections and equipment check out, and there is still silence where there would normally be talking, it could be an improperly converted QuickTime MOV format -- do not transcribe, report it immediately, then wait for us to give you the OK to return the job (if you return without our OK, it might be reposted with the silence).

Punctuation and grammar


Semicolons: too formal for most transcripts, so they should only be used rarely.

Colons: Use only for speaker labels (Man 1:), for times of day (2:45 PM or 3:00 AM), or Bible (or similar) verses, e.g., "Ecclesiastes 3:1-8". Colons must always have a space after them when they end a speaker label. However, for CW’s purposes, they must not have spaces after them anywhere else. So, in cases other than the three mentioned above, find other punctuation to use instead.

● Hyphens (-) are only for joining words, numbers and other symbols. They are not an all-purpose substitute for em dashes, ellipses or commas. Never use them to start or end sentences. See the section on basic grammar for more details.

● Use an em dash (--) to place special emphasis on a phrase. Always put a space before and after an em dash. Do not use an em dash where a word/phrase would normally be italicized (italics are disabled on CW). Do not use em dashes where a speaker simply stresses a word. Do this: That point is not relative to this discussion, not this: That point is -- not -- relative to this discussion.

● Use an ellipsis (...) to mark a break, such as a speaker breaking off mid-sentence or the audio ending in the middle of someone’s sentence. Do not add spaces before or after an ellipsis, except when it comes right after a speaker label. Do this: Man 1: ...said she wasn’t sure about that. But let me think... or (if the audio ends abruptly; we prefer an ellipsis to a tag for that) Interviewee: One thing we must consider is how we...

● Do not include partial words anywhere in a transcript (unless it's a verbatim job). Just use an ellipsis before the first or after the last full word.

● Speaking of spacing...Only use a single space between sentences and after speaker labels. Never use double spaces anywhere in a CastingWords transcript unless a customer specifically requests them.

● Do not include periods in abbreviations: TV, US, NAFTA, UK, UN, DC (Washington)

● Put quotation marks (not apostrophes) around the titles of media, including books, magazines, movies, television shows. Only use quotation marks the first time a title appears. Do this: Yesterday, I fell asleep during "The Daily Show." That was very unusual...Most nights I laugh too hard to drift off during The Daily Show.

● When commas and periods happen right after a quotation, move them inside the quotation marks. Do this: I think my roommate took my copy of “Blender,” because it’s not on my chair now. The position of question marks and exclamation marks varies -- if they are not part of the quotation, they belong outside the quotation marks. Do this: Did you read today’s "New York Times"? If they are part of the quotation do this: My roommate asked me "Did you read today’s "New York Times?"

● Speaking of quotation marks: all quoted lines, sentences, sayings, etc., should be verbatim, whether the job is verbatim or non-verbatim. For many reasons, it's important to be very accurate when it comes to quotations.

Grammar highlights

These are not the only grammar rules you are expected to follow. You should use all the grammar knowledge you have at your disposal. We are just highlighting some important grammar issues here that are especially confusing for transcribers.

Hyphen reminders

● Use with two or more words that precede and modify a noun as a unit, especially if the words include a past participle, a present participle, a single letter, or a number. Do this: line-by-line scrolling, read-only memory, I-beam construction, eight-sided polygon, copy-protected disc, free-moving electrons.

● Do not use a hyphen between an adverb ending in -ly and the word it modifies. Do this: Windows is a highly graphical interface ("highly-graphical" is wrong).

Hints for apostrophes and contractions

● Use of an apostrophe automatically indicates a contraction. If you don’t mean to contract anything, you probably don’t need an apostrophe.

● If you think you usually get these right, reading this section might just confuse you, so don’t worry about it. But if we let you know that you may be having trouble telling apart any of these pairs, please read our hints here.

its versus it's: "It’s" is short for "it is" or "it has." If the sentence makes sense when you substitute "his," then use "its," no apostrophe.

your versus you're: "You're" is short only for "you are." If the sentence makes sense when you substitute "you are," then keep the apostrophe. If you can’t, use "your."

were versus we’re: "We’re" is short for “we are.” If the sentence makes sense when you substitute "we are," then keep the apostrophe. If not, use "were."

lets versus let’s: "Let’s" is always a contraction of “let us”. If the sentence makes sense when you substitute "let us," then keep the apostrophe. If not, use "lets."

Spelling out

Use the spelled-out versions of these words, always:

● going to, want to, have to (not gonna, wanna, haveta -- even if the speaker says them). The exception is quoted material, which should always be verbatim, to protect the integrity of the quote.

● and (not &)

● percent (not %)

Keep every apostrophe-based contraction that the speaker uses. Also, don't contract when the speaker didn't. Do this: "isn't" stays isn't, "don't" stays don't, and "would not" stays would not.

If the speaker is spelling out words letter by letter, please check the Rare Situations Supplement to find out what to do.

Verbatim Transcripts

Numbers: large numbers, fractions, decimals, time and money

Use only words to represent numbers in these cases:

● Positive, whole numbers (numbers with no fractions and no decimals) between zero and nine, including money (which has a special section below) and ages. Do this: zero, six, nine, five years old, eight-year-old girl.

● Really big numbers that have special names. Do this: million, billion, zillion.

● Simple fractions (please add a hyphen, too). Do this: two-fifths.

● The names of mathematical operations, except when the audio is a math lesson (see the Rare Situations Supplement for details). Do this: "square root of eighteen point four" becomes square root of 18.4, because the operation itself stays spelled out.

Change the words to numerals in all other cases:

● All numbers from 10 on up. In numbers of at least four digits, include commas every three digits. Count that starting with the digit farthest to the right, and don’t allow numbers to have any spaces between digits. Do this: "nine thousand" is 9,000, and “thirty thousand” is 30,000. Also, “sixteen year old” is 16-year-old, and “two hundred fifty” is 250.

● Any decimal value, no matter how large. When the number is at least 0.1 but less than 1, put in a 0 to the left of the decimal. Otherwise, don’t add any zeros. Do this: 0.1, .005, 0.927, 1.9874, and 13,350.2.

● All negative values. This is true whether they are decimals or not, and no matter how large they are. Do this: -1, -15, -123.5.

● Times of day and dates. Always capitalize AM and PM. Do this: 2:45 PM, 5:00 AM. In dates, use numerals for the day and year, but spell out the name of the month. Do this: November 11, 2005. This applies to ordinal numbers as well: April 1st, 1562, July 4th, 1776, etc. (The use of "o'clock" by a speaker should not be changed. Do this: I'll meet you at six o'clock, not this: I'll meet you at 6:00.)
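The whole-number, decimal, and negative-value rules above can be sketched in code. This is an illustration under assumptions, not an official CW formatter: format_number is a hypothetical name, the input is assumed to be a plain numeric string such as "9000" or "0.005", and ages, money, dates, times, and words like "million" are out of scope.

```python
from decimal import Decimal

SMALL_NUMBERS = ["zero", "one", "two", "three", "four",
                 "five", "six", "seven", "eight", "nine"]


def format_number(text):
    """Format a plain numeric string the way this guide describes."""
    value = Decimal(text)
    if value == value.to_integral_value():
        whole = int(value)
        # Whole numbers from zero to nine are spelled out.
        if 0 <= whole <= 9:
            return SMALL_NUMBERS[whole]
        # Everything else (10 and up, or negative) is a numeral with commas.
        return f"{whole:,}"
    # Decimals are always numerals, with commas every three digits.
    digits = f"{abs(value):,f}"
    # Keep the leading 0 only from 0.1 up to (not including) 1.
    if abs(value) < Decimal("0.1"):
        digits = digits[1:]
    return ("-" if value < 0 else "") + digits
```

Using Decimal rather than float keeps the digits exactly as typed, so "0.005" comes out as .005 instead of picking up rounding noise.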

Exception for number series:

Series spoken as "ten, twenty, or thirty thousand," where the speaker clearly means "10,000, 20,000, or 30,000" should be spelled out more than usual to minimize confusion. Do this: 10, 20, or 30 thousand.

Times in years, including age ranges:

Decades are shown with an apostrophe in front of them but nowhere else. Giving someone's age range as a decade of life involves no apostrophes anywhere. Do this: "They partied during the seventies" becomes They partied during the '70s, "Eighties Flashback Friday" becomes '80s Flashback Friday, and "My grandma is in her sixties" becomes My grandma is in her 60s.
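The decade rule above comes down to one decision: is the speaker naming an era or an age range? A minimal sketch, assuming a hypothetical helper named format_decade and a lookup table that only covers the twenties through the nineties:

```python
DECADES = {
    "twenties": 20, "thirties": 30, "forties": 40, "fifties": 50,
    "sixties": 60, "seventies": 70, "eighties": 80, "nineties": 90,
}


def format_decade(word, is_age=False):
    """Turn a spoken decade into CW style.

    Eras get a leading apostrophe ('70s); age ranges get none (60s).
    """
    number = DECADES[word.lower()]
    return f"{number}s" if is_age else f"'{number}s"
```

So format_decade("seventies") gives '70s, while format_decade("sixties", is_age=True) gives 60s.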

Special rules for money:

● Spell out in words when in quantities of nine or less, or when the amount given is "a" instead of “one” or some other precise value. Do this: three dollars, nine euros, six cents; and “about a thousand dollars” stays about a thousand dollars.

● Use numerals when a decimal value is given. Do this: "eighteen dollars and five cents" becomes $18.05, and “1.5 million clams” becomes 1.5 million clams.

● With whole numbers, show currency symbols only if the formal name of the currency is stated AND the total quantity of money involved is 10 or more. Do this: two dollars, 10 bucks, but "twenty-five hundred dollars" becomes $2,500; and “5 billion euros” should be €5 billion.

● When a specific whole number is given along with "hundred" or “thousand,” then use numerals. Do this: “nearly three thousand dollars” becomes nearly $3,000; “something like a hundred bucks” stays something like a hundred bucks.

● Use international monetary symbols where appropriate. These include, but are not limited to: $, ¢, £, ¥, €. In Windows XP, 7 or 8, symbols that are not on the English keyboard will be in the Character Map. We suggest creating a Desktop shortcut so the map is always at hand. For Mac OS X, Google "monetary symbol keyboard shortcuts for Mac".

● Spell out "million, billion," or “trillion” always. Any number given alongside any of these is spelled out too, unless it’s a decimal value, or a currency is specified, or it’s larger than nine on its own. Do this: “he spent almost five trillion” stays he spent almost five trillion; “two trillion euros were missing” becomes €2 trillion were missing; “one point five billion bucks” becomes 1.5 billion bucks; “eighteen million was in the vault” becomes 18 million was in the vault.
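For whole-dollar amounts, the money rules above reduce to two questions: is the amount nine or less, and did the speaker use the formal currency name? The sketch below covers only that slice. The name format_whole_dollars is hypothetical, it assumes US dollars, and it does not handle decimals, "a thousand," or million/billion/trillion amounts.

```python
SMALL_NUMBERS = ["zero", "one", "two", "three", "four",
                 "five", "six", "seven", "eight", "nine"]

# Formal currency names that earn a symbol. Slang such as "bucks" does not.
FORMAL_DOLLAR_NAMES = {"dollar", "dollars"}


def format_whole_dollars(amount, spoken_unit):
    """Format a whole amount plus the unit word the speaker actually used."""
    if amount < 0:
        raise ValueError("this sketch only handles amounts of zero or more")
    # Nine or less: spell everything out.
    if amount <= 9:
        return f"{SMALL_NUMBERS[amount]} {spoken_unit}"
    # 10 or more with the formal name: symbol plus numeral.
    if spoken_unit.lower() in FORMAL_DOLLAR_NAMES:
        return f"${amount:,}"
    # 10 or more with slang: numeral, but keep the speaker's word.
    return f"{amount:,} {spoken_unit}"
```

This reproduces the guide's own examples: two dollars, 10 bucks, and $2,500.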

One more special numbers rule

● When two or more related numbers are mentioned in the same sentence, if any one of them is shown as a numeral for style reasons, then show all the others as numerals too, no matter how small they are. Do this: "he had four dollars and she had fifteen dollars" becomes he had $4 and she had $15, “the values were six, two, and point seven five” becomes the values were 6, 2, and 0.75, and “I have a two year old and an eleven year old” becomes I have a 2-year-old and an 11-year-old.

Word list

If a word or term is on this list and is spoken in the audio, please make sure it gets into your transcript, spelled just this way. Capitalize any of these when starting a sentence.

Special Note: abbreviations (other than common all-caps acronyms) are never to be used.

à la

à la carte

a lot (always two words; "allot" is a separate term with a separate meaning, and “alot” is not standard English)

all right (never "alright")


because (never 'cause or 'cuz, except on verbatim)


color (added "u" as in "colour" -- applies to other words as well -- is the UK spelling, incorrect for CW)

coq au vin

crème de la crème


dialog (added "ue" as in "dialogue" -- applies to other words as well -- is the UK spelling, incorrect for CW)


etcetera (never et cetera or etc.)


faux pas

foie gras



gray (with "e" as in "grey" -- applies to other words as well -- is the UK spelling, incorrect for CW)


in lieu

Internet (always capitalized)

iPod, iPad, iTunes (all Apple products have a lowercase "i" but uppercase the next letter)





nouveau riche

OK (always all caps)




raison d'être



Shih Tzu (dog breed)

traveling (with "ll", as in "travelling" (applies to other words as well), is the UK spelling, incorrect for CW)

tsetse (a type of fly or disease)

Ubuntu (always capitalized when referring to the Linux OS distribution)




Web (always capitalized)

website (note that it is not capitalized)
