Summary Ch2 - External Book: Human and Machine Translation - The History of Machine Translation

Chapter 2 discusses the evolution of human and machine translation, emphasizing that translation involves converting texts between languages rather than translating languages themselves. It outlines the criteria for effective translation, the role of translation memory, and the impact of machine translation technologies, including statistical and neural approaches. The chapter also highlights the importance of human translators in maintaining quality and the challenges faced by machine translation systems in achieving accuracy and understanding context.


Chapter 2

Human and machine translation


The History of Machine Translation

What is translation? What is a text? What criteria should a text meet?

• Translation is the production of a text in one language, the target language, on the basis of a text in another language, the source language, while a text refers to a real use of spoken or written language. Texts are expected to meet certain criteria:
1. They should be coherent.
2. They should serve some kind of purpose.
3. We also usually have particular expectations regarding what texts will or should be like, given the particular language and context.
What do we translate?

• Most people believe that we translate languages. In fact, we translate texts, not languages. Languages are vast, complicated, abstract systems that are used to produce infinite examples of human communication and expression. Texts are concrete instances of language in use.
Translation is a relationship between...?
• A second element of the definition of translation given above is the
contention that translation involves the production of a text on the basis of
another, pre-existing text. This clearly establishes translation as involving a
relationship between two texts, commonly known as the source text and the
target text.
Could the translation process add more information?

• Sometimes a translation might say more than its source text, which means our translation can add information that does not exist in the source text. For example, the English "I was born" could be translated into French as "je suis née" for a female speaker or "je suis né" for a male speaker; in this translation we add information about gender that is not mentioned in the source text.

What is equivalence?
Could the target text have the same meaning as the source text?
• Many scholars are reluctant to say that a source and target text have the same meaning, because languages differ in many ways that sometimes cannot be conveyed in the target text. The right term for this relationship is "equivalence". Equivalence is the relationship that emerges from the decision-making of a translator and arises between two text snippets because the translator has deemed them to be of equal value in their respective co-texts and contexts.

(Translation, in short: converting a source-language (SL) text into a target-language (TL) text so that the TL text conveys the message intended in the SL text.)

• How do translators normally solve translation problems?


When professional translators do not understand something in the source text, cannot recall a specialized term in the target language, or are struggling to come up with a way of formulating an idea in the target language, they might use some of the following strategies to solve these problems:
1. They may go to the website of various local authorities to see how they
explain the technology involved.
2. They might access one of the many publicly available termbanks to find an
equivalent for a given term.
3. They might consult other documentation produced by their client’s company
or speak to engineers at the company.
4. They could consult with their colleagues, if they have any, or post a query to a
translator’s forum.

• The main thing is that most professional translators will realize when they
have a gap in their knowledge, or need inspiration, and they will conduct
conscientious research to address that gap, solve the translation problem and
move on.
What role do humans play in the machine translation process?
1. Human translation sets the standard by which machine translation is judged,
and anything that contributes to the maintenance of high quality in human
translation is ultimately of relevance to machine translation. Likewise, human
translation processes can help to put into sharp relief occasional deficits in
machine translation.
2. Most contemporary machine translation relies on translations completed by
humans to learn how to translate in the first place.

Translation memory
When did translation memory start?
In the 1990s translators working in the burgeoning software localization industry
found themselves translating texts that were either extremely repetitive in
themselves or that repeated verbatim whole sections of earlier versions of a
document. This was the case, for example, with software manuals that had to be
updated any time there was a new release of the software. Rather than translate
each sentence from scratch, as if it had never been translated before, they invented a
tool that would store previous translations in a so-called translation memory.

How does TM work?


• It does three things to the source text:
1. It divides the source text into segments.
2. It then compares each of these segments with the source-language segments already stored in memory.
3. If an exact match or a very similar segment is found, the corresponding target-language segment is offered to the translator for re-use, with or without editing.
• While translators are doing their job, they will get hits from the translation
memory, and they can accept, reject or edit the existing translation.
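The matching step above can be sketched with a simple string-similarity measure. This is an illustrative sketch, not a real TM tool: the segments, translations and the 0.7 threshold are all invented, and commercial systems use more sophisticated fuzzy-match scoring.

```python
from difflib import SequenceMatcher

# A toy translation memory: source segments paired with their translations.
# Both segments and translations are invented for illustration.
memory = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Restart the application.": "Redémarrez l'application.",
}

def lookup(segment, tm, threshold=0.7):
    """Return the best (source, translation, score) at or above threshold, else None."""
    best = None
    for source, target in tm.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best

# An exact match scores 1.0 and the stored translation is offered for re-use.
exact = lookup("Click the Save button.", memory)
print(exact[1], exact[2])

# A near-match (a "fuzzy match") scores below 1.0 but may still be offered
# to the translator for editing.
fuzzy = lookup("Click the Print button.", memory)
print(fuzzy[0], round(fuzzy[2], 2))
```

The threshold mirrors how TM tools let translators choose the minimum fuzzy-match percentage below which no suggestion is shown.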

What are translation units? How did they come about?
As translation memories grew extremely large, some companies that were early adopters of the technology built up translation memories containing hundreds of thousands, and then millions, of translation units, that is, source-language segments aligned with their target-language segments.
What happens if translations are made without TM? Does TM contribute to the machine translation industry, and how?
Translation memory tools enabled translation data to be created in great quantities and in a
format that could be easily used in machine translation development. Translation memories
can be seen as a special type of parallel corpus, that is a collection of source texts aligned at
sentence level with their target texts. In cases where translations were created without the
use of a translation memory tool, translated texts could still be aligned with their source texts
after the fact.

Could MT make a mistake? What other technologies help MT?


What is machine translation?

• Machine translation involves the automatic production of a target-language text on the basis of a source-language text: the automated translation of a source-language text into a target-language text. It might produce a translation with a slightly different meaning from the source text, since we accept that human translators do the same. It needs the help of other technologies such as automatic speech recognition and speech synthesis, or optical character recognition and digital image processing.

• Machine translation industry went through four main phases:


1. After the Second World War, MT was one of the first non-numerical applications of
the digital computers.
2. Automatic translation systems were in operation primarily in defense, government
and international organizations by the late 1960s and 1970s, and by the end of the
century their use was expanding in commercial settings.
3. The technology became available to millions of internet users in 1997, when the American search engine AltaVista started giving access to free, online machine translation under the Babel Fish name.
4. As the internet expanded rapidly, the number of MT users increased dramatically. For example, by 2016 the online machine translation system Google Translate was reported to have over half a billion users, translating over 100 billion words per day and supporting 103 languages.
Do audio-visual translation and subscription video streaming affect the MT industry?

• As subscription video streaming services thrive, audio-visual content is becoming just the latest in a long line of commercial products whose markets can be expanded through machine translation. In the seventy or so years since its inception, machine translation has thus moved from being the preserve of governments and international organizations to being a mass consumer good.

Does MT make mistakes? How can this problem be solved?
• Just like human translators, machine translation can make mistakes. Some errors are serious, for example in healthcare, news translation or international diplomacy. Much research is conducted to estimate the quality of translations produced by machine translation systems, to evaluate particular outputs, and to design ways of correcting errors by post-editing machine translation output or pre-editing source texts to make them easier to translate.
What does MT literacy consist of?

• Machine translation literacy consists of two components:

1. The first component is being able to determine when to use machine translation.
2. The second component is having a good understanding of how machine translation works and of the wider societal, economic and environmental implications of its use.

There are two uses of machine translation:


1- Using machine translation for assimilation which refers to a private use of the
translated text, with little risk of reputational or other damage.
2- Using machine translation for dissemination which means using it to publish
your work, for example, your blog in a second language, or to advertise your
business.

Development of machine translation technology


• The first public demonstration of an automated translation system, which
translated 250 words between Russian and English, was held in the US in
1954 (Quah, 2006, p.70). This first-generation architecture is dictionary-based and attempts to match the source language to the target language word for word, i.e. translating directly. “This approach was simple and cheap but the output results were poor and mimic…the syntactic structures of the source language” (Quah, 2006, p.70).
• By the mid-1960s, new research into rule-based approaches: transfer and
interlingua, saw the beginnings of second generation machine translation
systems. While automated translation systems had proven unsuitable for
replacing human translators on a general level, it was observed that they
were quite accurate when the language input was limited or very specific.
The problem was an inability to create “a truly language-neutral
representation that represents ‘all’ possible aspects of syntax and semantics
for ‘all’ known languages” (Quah, 2006, p.73).
• In the late 1970s and early 1980s, research focused more on the transfer
approach. In this architecture, the source text is analyzed by a source
language dictionary and converted into an abstract form. This form is
translated to an abstract form of the target text via a bilingual dictionary
then it is converted into a target text using a target language dictionary. This
rule-based approach was less complicated than interlingua and more suited
to working with multiple languages than direct translation. Problems arose
where dictionaries contained insufficient knowledge to deal with
ambiguities. Programming and updating dictionaries for machine
translation is a time-consuming and expensive process. They need to contain
huge amounts of information to deal with issues such as lexical ambiguity,
complex syntactic structures, idiomatic language and anaphora across
numerous languages.
• In the 1990s, research led to a third generation of machine translation
system: corpus-based architectures, namely the statistical and example based
approaches. The example based approach imitates combinations of
examples of pre-translated data in its database. For this approach to be
successful, the database must contain close matches to the source text. This
approach forms the basis of translation memory tools.
• On 15 November 2016, Google announced that it was putting neural machine translation into action in its Google Translate tool, rolling it out with a total of eight language pairs between English and French, German, Spanish, Portuguese, Chinese, Japanese, Korean and Turkish.
Artificial intelligence, machine learning and machine translation
• Artificial intelligence (AI) is often defined as the branch of computer science
that aims to create computer programs which have the ability to solve
problems that would normally require human intelligence.
What are the types of AI?
• Types of AI:
1. Narrow AI is designed to solve narrow problems, such as recognizing faces.
2. Strong AI is a more aspirational undertaking. It would involve general AI, in which machines would have human-like intelligence, be self-aware, and be able to learn and plan for the future.
3. Superintelligence: AI which exceeds the intelligence of any human.
• It is fair to say that translation, as practiced by professional translators,
requires the kind of intelligence that strong AI aspires to, but that such
intelligence still remains beyond the capacity of machine translation systems.
Is there AI machine translation that exceeds professional translators?

Rule-based machine translation


What is the RBMT approach?
• The rule-based machine translation (RBMT) approach is to give a computer program all the knowledge it would need to solve a particular problem, along with rules (as algorithms) that specify how it can manipulate this knowledge. For example, you can give the program a list of all the words in each of the source and target languages, along with rules on how they can combine to create well-formed structures.
• You can then specify how the words and structures of one language can map onto the words and structures of the other language, and give the machine some step-by-step instructions (an algorithm) on how to use all this information to create translated sentences.

What was the first freely available online MT based on?
• When free online machine translation first became available in 1997, for example, it was based on RBMT.
RBMT was beset by a number of problems:
1. It was very expensive to develop.
2. It required highly skilled linguists to write the rules for each language pair.
3. It suffered from knowledge bottlenecks: it was simply impossible in many
cases to anticipate all the knowledge necessary to make RBMT systems work
as desired.
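As a toy illustration of the rule-based idea (not taken from the chapter; the three-word lexicon and the single reordering rule are invented), a direct RBMT sketch might pair a bilingual dictionary with a hand-written structural rule:

```python
# A toy direct/rule-based sketch: a bilingual dictionary plus one reordering
# rule (adjective-noun becomes noun-adjective, as when going English to French).
lexicon = {"the": "le", "red": "rouge", "car": "voiture"}
adjectives = {"red"}

def translate(sentence):
    words = sentence.split()
    out = []
    i = 0
    while i < len(words):
        # Rule: swap an adjective with the noun that follows it.
        if words[i] in adjectives and i + 1 < len(words):
            out += [words[i + 1], words[i]]
            i += 2
        else:
            out.append(words[i])
            i += 1
    # Word-for-word dictionary lookup on the reordered sentence.
    return " ".join(lexicon.get(w, w) for w in out)

print(translate("the red car"))  # le voiture rouge
```

Note the deliberate flaw: the lexicon carries no gender information, so the output is "le voiture" rather than the correct "la voiture". That is a miniature instance of the knowledge bottleneck described above: every such fact must be anticipated and encoded by hand.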

Data-driven machine translation


Data-driven machine translation centers on the idea of machine learning: rather than telling a machine – or, more precisely, a computer program – everything it needs to know from the outset, it is better to let the machine acquire its own knowledge. This approach depends on the translation units that constitute the training data from which contemporary machine translation systems learn.
What are the types of data-driven MT?
Data-driven machine translation is divided into two types:
1. statistical machine translation SMT
2. neural machine translation NMT

What is SMT? Does SMT learn from data? What is this data called?

Statistical Machine Translation

Statistical Machine Translation (SMT) systems build two types of statistical models based on the training data (the mathematical representation of observed data):
1- The first model is known as the translation model. This is a bilingual model which divides the source text into segments and stores them, alongside their translations, in a table known as a phrase table, using statistical evidence and distortion probabilities to choose the most appropriate translation. The word "phrase" in "phrase table" is inaccurate, as the strings don't necessarily correspond to phrases as commonly understood in linguistics. Rather, they are n-grams, which are one way to help a machine understand a word in its context. N-grams are strings of one, two, three or n words that appear adjacent to each other in the training data; in that sentence, for example, "appear adjacent" is a bigram and "appear adjacent to" is a trigram. (The original chapter includes an excerpt from such a phrase table.)
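The n-gram notion can be made concrete in a few lines. This is a sketch with naive whitespace tokenization, not how an SMT toolkit actually segments text:

```python
def ngrams(tokens, n):
    """All n-grams (tuples of n adjacent tokens) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "strings of words that appear adjacent to each other".split()
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)

print(("appear", "adjacent") in bigrams)         # True
print(("appear", "adjacent", "to") in trigrams)  # True
```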

2- The second model, known as the language model, is a monolingual model (or combination of models) of the target language. It is built from monolingual data in the output language and helps find the best choice among the probable translations based on the target language. A trigram target language model, for example, would give the probability of seeing a particular word in the target language, given that you had already seen the two words in front of it. It could tell you the probability of seeing the word "gorgonzola" if you had already seen "I like" in the Europarl corpus, for example. This turns out to be 0.024, which means that while "I like gorgonzola" does occur in the training data (it actually occurs four times), there are many words other than "gorgonzola" that are much more likely to follow "I like".
• What is the difference between the translation model and the language model? The translation model is supposed to capture knowledge about how individual words and n-grams are likely to be translated into the target language, while the language model tells you what is likely to occur in the target language in the first place.
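A trigram language model of this kind can be estimated from simple counts. The tiny corpus below is invented (standing in for something like Europarl), so the probability it produces differs from the 0.024 reported above, but the mechanics are the same:

```python
from collections import Counter

# A tiny invented monolingual corpus standing in for something like Europarl.
corpus = "i like gorgonzola . i like cheese . i like cheese . we like cheese .".split()

# Count every trigram and bigram of adjacent tokens.
trigram_counts = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigram_counts = Counter(zip(corpus, corpus[1:]))

def p_next(w1, w2, w3):
    """P(w3 | w1 w2): relative frequency of w3 after the bigram w1 w2."""
    if bigram_counts[(w1, w2)] == 0:
        return 0.0
    return trigram_counts[(w1, w2, w3)] / bigram_counts[(w1, w2)]

# "i like" occurs three times here; it is followed by "gorgonzola" once.
print(p_next("i", "like", "gorgonzola"))  # 1/3
print(p_next("i", "like", "cheese"))      # 2/3
```

Real language models smooth these estimates so that unseen trigrams do not get probability zero; this sketch skips that step.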

• SMT goes through three phases:

1. The first phase, training, is when the machine learns directly from the training data.
2. In a second phase, called tuning, system developers work out the
weight that should be assigned to each model to get the best output.
Once the system is trained and tuned, it is ready to translate
previously unseen source sentences.
3. Translation is called decoding in SMT. It generally involves generating
many thousands of hypothetical translations for the input sentence,
and calculating which one is the most probable, given the particular
source sentence, the models the system has learned, and the weights
assigned to them.
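The weighting and scoring described in tuning and decoding can be sketched as follows. The two candidate "hypotheses", their probabilities and the model weights are all invented for illustration; a real decoder scores many thousands of hypotheses built up word by word:

```python
import math

# Invented log-probabilities for two candidate translations of one source
# sentence, plus invented model weights of the kind found during tuning.
weights = {"translation_model": 0.6, "language_model": 0.4}
candidates = {
    "hypothesis A": {"translation_model": math.log(0.020),
                     "language_model": math.log(0.300)},
    "hypothesis B": {"translation_model": math.log(0.030),
                     "language_model": math.log(0.050)},
}

def score(features):
    """Weighted sum of log-probabilities: higher means more probable."""
    return sum(weights[name] * value for name, value in features.items())

# Decoding picks the highest-scoring hypothesis.
best = max(candidates, key=lambda c: score(candidates[c]))
print(best)  # hypothesis A
```

Note how the weights matter: hypothesis B has the better translation-model score, but hypothesis A wins because the language model strongly prefers it.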

• Reasons for the decline of SMT:

SMT performed particularly poorly on agglutinative and highly inflected languages. Other problems included word drop, where a system simply failed to translate a word, and inconsistency, where the same source-language word was translated in two different ways, sometimes in the same sentence.

Is SMT still used nowadays?


SMT is still used in the translation industry, albeit in limited contexts: a supplier of machine translation services might, for example, first create an SMT system to see how viable a project is and whether or not it is worthwhile investing time and effort in subsequently developing a neural system.
Neural Machine Translation NMT
NMT systems use neural networks in order to predict the likelihood of a set of words
in sequence. The Stanford success heralded the beginning of what Bentivogli et al.
(2016) call “the new NMT era.” The excitement was palpable among researchers and especially in the press.

How do we test NMT?
Data-driven machine translation is based on technologies developed to solve problems that were already known and solved by humans. Such correct answers may be provided in the training data, or they may be arrived at through generalization from the training data. When a machine translation system is tested to check its efficiency, it is given a problem to which we already know the answer: we ask it to predict the translation of several sentences for which we already have good (human) translations, set aside specifically for this purpose.

Why is NMT so much better than SMT, if it is simply learning from data? Is that not
what SMT was already doing?
The answer lies in the kind of representations that NMT systems use and in the kind
of models they learn.

How does NMT work? What is a computational model of translation?

Models in NMT
• A computer model is an abstract, mathematical representation of some real-
life event, system or phenomenon. A computational model of translation, for
example, should be able to predict a target-language sentence given a
previously unseen source-language sentence.
• SMT systems use probabilistic models of translation and the target language that are encapsulated in phrase tables and n-gram probabilities. NMT systems, in contrast, use models that are inspired by the human brain: artificial neural networks, in which thousands of artificial
neurons are linked to thousands of other artificial neurons. Each neuron is
activated depending on the stimuli received from other neurons, and the
strength or weight of the connections between neurons. As Forcada (2017)
explains, the activation states of individual neurons do not make much sense
by themselves. It is, instead, the activation states of large sets of connected
neurons that can be understood as representing individual words and their
relationships with other words. Like other machine learning systems, NMT needs to be exposed to vast quantities of parallel data in order to build itself.

Representing words in NMT


What is an embedding? How are words represented in NMT?

• In NMT, we use a vector to represent a word. A vector is a fixed-size list of numbers, and its values depend on the corpora. For example, the word "apple" could be represented by a vector like [1.20, 2.80, 6.10]. The vector [1.20, 2.80, 5.50], in turn, could be the vector for the word "pear". It differs from the vector for "apple" in just the last number, which makes these two words very close to each other. The vector-based representations of words that the machine learns are called word embeddings. Since "apple" and "pear" tend to keep appearing in the same or similar co-texts (both occur very regularly before the word "tree"), they end up with similar embeddings.
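Closeness between embeddings is usually measured with cosine similarity. Using the toy apple/pear vectors from the text, plus an invented vector for an unrelated word:

```python
import math

# The apple/pear vectors are the illustrative ones from the text;
# the vector for "run" is invented as an unrelated contrast.
embeddings = {
    "apple": [1.20, 2.80, 6.10],
    "pear":  [1.20, 2.80, 5.50],
    "run":   [7.40, 0.30, 1.10],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, lower when dissimilar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# "apple" is far more similar to "pear" than to the unrelated word.
print(cosine(embeddings["apple"], embeddings["pear"]) >
      cosine(embeddings["apple"], embeddings["run"]))  # True
```

Real NMT embeddings have hundreds of dimensions rather than three, but the geometry works the same way: words that occur in similar co-texts point in similar directions.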
• An artificial neural network that has multiple layers sandwiched between its
external layers is known as a deep neural network.

The advantages and disadvantages of neural machine translation

• Advantages of NMT:
1. NMT can build up very rich representations of words as they appear in a given source text, taking the full source sentence into account, rather than mere n-grams.
2. Since NMT deals with full sentences, it is better at dealing with tricky
linguistic features like discontinuous dependencies and it handles all
sorts of agreement phenomena better than SMT.

• Disadvantages of NMT:
1. NMT systems are restricted to the sentence level, not the full text, which means they do not look beyond the current sentence. They therefore have difficulty retrieving information from a previous sentence in order to figure out what a pronoun like “it” refers to in the current sentence.
2. NMT can also output words that don’t actually exist in the target language. Its output can be fluent but inaccurate.
3. Like other technologies trained on large quantities of existing text, it
can also amplify biases encountered in the training data.
4. NMT systems take much longer and much more computing power to
train than their predecessors and use up vast quantities of energy in
the process. They usually require expensive hardware and massive
quantities of training data, which are not available for every language
pair.

Systems, engines and custom NMT


• Machine translation system often refers to a machine translation product or
service made available by a single supplier or developer. Google Translate is
thus understood as Google’s machine translation system; while Microsoft has
a system called Microsoft Translator.
• Machine translation engine is basically a machine translation program (or
even a “model”) that has been trained to deal with a particular language pair
and, often, domain or genre. For example, a commercial machine translation company may offer its customers access to an English-French engine that was trained on a parallel corpus of financial statements.
• Custom machine translation is discussed in greater depth by Ramírez-Sánchez
(2022 [this volume]), and the MultiTraiNMT project has developed a bespoke
pedagogical interface that allows students to train their own NMT engines.
Do different systems output the same results? Could a single system’s output change?

Four last things you need to know about machine translation
Many readers tend to use only free, online machine translation and the engines built for their language pair. These readers should be interested to learn that:
1. different systems may output different translations;
2. different engines in the same system may output different translations;
3. a single system may output different translations for the same input depending on the co-text;
4. a single system’s outputs may change over time.

Conclusions
Some people might claim that there is no need to learn foreign languages or train human translators; however, they forget that NMT still depends on human translations, or at least translations validated by humans, as training data. NMT, just like other machine translation, needs human intervention, as its output must sometimes be evaluated and improved by expert translators.
