Dissertation
There are five basic elements of multimedia: text, images, audio, video, and
animation. Examples include text in faxes, photographic images, geographic
information system maps, voice commands, audio messages, music, graphics, moving
graphics animation, full-motion stored and live video, and holographic images.
Machine learning (ML), reorganized and recognized as its own field, started to flourish in the
1990s. The field changed its goal from achieving artificial intelligence to tackling solvable
problems of a practical nature. It shifted focus away from the symbolic approaches it had
inherited from AI, and toward methods and models borrowed from statistics, fuzzy logic,
and probability theory.[25]
Data mining
Machine learning and data mining often employ the same methods and overlap significantly,
but while machine learning focuses on prediction, based on known properties learned from
the training data, data mining focuses on the discovery of (previously) unknown properties in
the data (this is the analysis step of knowledge discovery in databases). Data mining uses
many machine learning methods, but with different goals; on the other hand, machine
learning also employs data mining methods as "unsupervised learning" or as a
preprocessing step to improve learner accuracy. Much of the confusion between these two
research communities (which do often have separate conferences and separate
journals, ECML PKDD being a major exception) comes from the basic assumptions they
work with: in machine learning, performance is usually evaluated with respect to the ability
to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the
key task is the discovery of previously unknown knowledge. Evaluated with respect to known
knowledge, an uninformed (unsupervised) method will easily be outperformed by other
supervised methods, while in a typical KDD task, supervised methods cannot be used due to
the unavailability of training data.
Machine learning also has intimate ties to optimization: many learning problems are
formulated as minimization of some loss function on a training set of examples. Loss
functions express the discrepancy between the predictions of the model being trained and
the actual problem instances (for example, in classification, one wants to assign a label to
instances, and models are trained to correctly predict the pre-assigned labels of a set of
examples).
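As a concrete illustration, the following sketch (in Python, with an invented toy training set and learning rate) frames learning as loss minimization: a line is fitted by gradient descent on the mean squared error over the training examples.

import numpy as np

# Hypothetical training set: inputs x and pre-assigned labels y (roughly y = 2x + 1).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.9, 5.1, 7.2, 8.8])

w, b = 0.0, 0.0   # model parameters
lr = 0.02         # learning rate (chosen arbitrarily)

for step in range(2000):
    pred = w * x + b                  # model predictions
    error = pred - y                  # discrepancy with the actual instances
    loss = np.mean(error ** 2)        # squared-error loss on the training set
    grad_w = 2 * np.mean(error * x)   # gradient of the loss with respect to w
    grad_b = 2 * np.mean(error)       # gradient of the loss with respect to b
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}, final training loss={loss:.4f}")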
Generalization
The difference between optimization and machine learning arises from the goal
of generalization: while optimization algorithms can minimize the loss on a training set,
machine learning is concerned with minimizing the loss on unseen samples. Characterizing
the generalization of various learning algorithms is an active topic of current research,
especially for deep learning algorithms.
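A minimal sketch of this distinction, using invented synthetic data, is shown below: the least-squares fit minimizes the loss on the training set only, while the quantity of interest is the loss measured on held-out samples.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3 * x + rng.normal(scale=0.3, size=200)   # samples from an unknown distribution

# Split into a training set (used for optimization) and unseen test samples.
x_train, y_train = x[:150], y[:150]
x_test,  y_test  = x[150:], y[150:]

# The least-squares fit minimizes the squared loss on the training set only.
w, b = np.polyfit(x_train, y_train, deg=1)

train_loss = np.mean((w * x_train + b - y_train) ** 2)
test_loss  = np.mean((w * x_test  + b - y_test) ** 2)
print(f"training loss={train_loss:.3f}, loss on unseen samples={test_loss:.3f}")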
Statistics
Machine learning and statistics are closely related fields in terms of methods, but distinct in
their principal goal: statistics draws population inferences from a sample, while machine
learning finds generalizable predictive patterns. According to Michael I. Jordan, the ideas of
machine learning, from methodological principles to theoretical tools, have had a long pre-
history in statistics. He also suggested the term data science as a placeholder to call the
overall field.
Conventional statistical analyses require the a priori selection of a model most suitable for
the study data set. In addition, only significant or theoretically relevant variables based on
previous experience are included for analysis. In contrast, machine learning is not built on a
pre-structured model; rather, the data shape the model by revealing underlying patterns. In
general, the more informative the input variables used to train the model, the more accurate
the resulting model can be.
Leo Breiman distinguished two statistical modeling paradigms: the data model and the
algorithmic model, wherein "algorithmic model" means, more or less, machine learning
algorithms such as random forests.
Some statisticians have adopted methods from machine learning, leading to a combined field
that they call statistical learning.
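To illustrate the contrast, the following sketch (assuming scikit-learn is installed; the data and parameters are invented) fits both a pre-structured linear "data model" and a random forest "algorithmic model" to the same nonlinear data and compares their predictions.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)   # nonlinear ground truth plus noise

linear = LinearRegression().fit(X, y)          # assumes a straight-line relationship a priori
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)   # lets the data shape the fit

X_grid = np.linspace(-3, 3, 5).reshape(-1, 1)
print("linear model:  ", np.round(linear.predict(X_grid), 2))
print("random forest: ", np.round(forest.predict(X_grid), 2))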
Physics
Analytical and computational techniques derived from deep-rooted physics of disordered
systems can be extended to large-scale problems, including machine learning, e.g., to
analyze the weight space of deep neural networks. Statistical physics is thus finding
applications in the area of medical diagnostics.
Theory
A core objective of a learner is to generalize from its experience. Generalization in this
context is the ability of a learning machine to perform accurately on new, unseen
examples/tasks after having experienced a learning data set. The training examples come
from some generally unknown probability distribution (considered representative of the space
of occurrences) and the learner has to build a general model about this space that enables it
to produce sufficiently accurate predictions in new cases.
For the best performance in the context of generalization, the complexity of the hypothesis
should match the complexity of the function underlying the data. If the hypothesis is less
complex than the function, then the model has underfitted the data. If the complexity of the
model is increased in response, then the training error decreases. But if the hypothesis is too
complex, then the model is subject to overfitting and generalization will be poorer.
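The following sketch, using invented synthetic data, illustrates this trade-off: a degree-1 polynomial underfits, a very high degree overfits, and an intermediate degree generalizes best.

import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-1, 1, 40))
y = np.sin(3 * x) + rng.normal(scale=0.15, size=40)   # underlying function plus noise

x_train, y_train = x[::2], y[::2]    # half the points for training
x_test,  y_test  = x[1::2], y[1::2]  # the rest held out for generalization

for degree in (1, 4, 10):
    coeffs = np.polyfit(x_train, y_train, degree)        # minimize the training error
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err  = np.mean((np.polyval(coeffs, x_test)  - y_test) ** 2)
    print(f"degree {degree:2d}: train error {train_err:.3f}, test error {test_err:.3f}")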
In addition to performance bounds, learning theorists study the time complexity and
feasibility of learning. In computational learning theory, a computation is considered feasible
if it can be done in polynomial time. There are two kinds of time complexity results: Positive
results show that a certain class of functions can be learned in polynomial time. Negative
results show that certain classes cannot be learned in polynomial time.
Approaches
Machine learning approaches are traditionally divided into three broad categories, which
correspond to learning paradigms, depending on the nature of the "signal" or "feedback"
available to the learning system:
Supervised learning: The computer is presented with example inputs and their desired
outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to
outputs.
Unsupervised learning: No labels are given to the learning algorithm, leaving it on its
own to find structure in its input. Unsupervised learning can be a goal in itself
(discovering hidden patterns in data) or a means towards an end (feature learning).
Reinforcement learning: A computer program interacts with a dynamic environment in
which it must perform a certain goal (such as driving a vehicle or playing a game against
an opponent). As it navigates its problem space, the program is provided feedback that's
analogous to rewards, which it tries to maximize.[6] Although each algorithm has
advantages and limitations, no single algorithm works for all problems.
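As a rough illustration of the first two paradigms (assuming scikit-learn; the data are invented), a classifier can be trained on teacher-provided labels, while a clustering algorithm must discover the same structure without them.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Two hypothetical groups of 2-D points.
group_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[3, 3], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])
labels = np.array([0] * 50 + [1] * 50)        # the "teacher's" desired outputs

# Supervised: learn a general rule mapping inputs to the given labels.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
print("supervised prediction for [2.8, 2.9]:", clf.predict([[2.8, 2.9]]))

# Unsupervised: no labels; discover the two clusters from the inputs alone.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes found without labels:", np.bincount(clusters))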
With indexing, the audience can easily find a specific location inside a
presentation or lecture without having to search through it by hand, which
becomes increasingly inconvenient as the content grows.
Augmented reality (AR): With AR technology, digital images can be placed
over the actual studio space, seamlessly integrating them with the live-action footage.
LED Walls and Volumes: Rather than using green screens to portray virtual
surroundings, some studios use LED walls. This technique allows for more realistic
lighting and reflections, as well as more natural interactions between the actors and
their surroundings.
Consider a discussion on a news feed or a Twitter hashtag: the system can
dynamically include such content in the livestream by detecting and learning
important keywords from a variety of digital media platforms.
Machine learning can help automate each of these video indexing methods,
helping to save tremendous costs by reducing the need for manual
transcription. Human operators can instead use their time to verify
transcribed/converted text, thereby helping the software learn new words and
correct any grammar issues.
Possibilities for machine learning include:
• Transcribing audio to text and using that text as a basis for indexing important points in the VOD;
• Using OCR to turn overlays, lower thirds, and other on-screen text into searchable data and automatically indexing important points in the video;
• Learning visual and audio cues for each video source or layout;
• Switching to each video source or layout based on learned cues.
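As a rough sketch of the first possibility, the snippet below (with a hypothetical hard-coded transcript and keyword list) scans timestamped transcript segments for keywords and emits index points a viewer could jump to; a real system would obtain the transcript from a speech-to-text or OCR engine.

# Hypothetical timestamped transcript segments (seconds, text).
transcript = [
    (12.5,  "welcome everyone to the quarterly review"),
    (95.0,  "let's look at the revenue results for this quarter"),
    (240.0, "now a quick demo of the new dashboard"),
    (610.0, "questions from the audience"),
]
keywords = {"revenue", "demo", "questions"}   # assumed terms worth indexing

def build_index(segments, terms):
    """Return (timestamp, matched term, text) for segments containing a keyword."""
    index = []
    for start, text in segments:
        matched = terms.intersection(text.lower().split())
        if matched:
            index.append((start, sorted(matched)[0], text))
    return index

for start, term, text in build_index(transcript, keywords):
    minutes, seconds = divmod(int(start), 60)
    print(f"{minutes:02d}:{seconds:02d}  [{term}]  {text}")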
5. Dynamic calibration of images
For viewers to see a presentation that is clearly visible, live streams and recordings
need to have their image settings (such as exposure and white balance) calibrated
to perfection. Picture calibration can be a challenging procedure, especially if
users lack the knowledge to make the required corrections or if environmental
conditions (such as lighting) are unpredictable. By identifying the existing picture
settings and applying adjustments to enhance picture quality, machine learning
can expedite the calibration process.
Machine learning possibilities:
• Offer recommendations for enhancing the current image (or even configure the settings automatically).
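A minimal sketch of this idea follows (assuming NumPy and an 8-bit RGB frame; the correction rules are simple fixed heuristics standing in for a learned model): the current picture is inspected, then a gray-world white balance and an exposure gain are applied automatically.

import numpy as np

def auto_calibrate(frame: np.ndarray, target_brightness: float = 110.0) -> np.ndarray:
    """frame: H x W x 3 uint8 RGB image; returns a corrected copy."""
    img = frame.astype(np.float64)

    # Gray-world white balance: scale each channel toward the overall mean.
    channel_means = img.reshape(-1, 3).mean(axis=0)
    img *= channel_means.mean() / channel_means

    # Exposure: scale overall brightness toward an assumed target level.
    img *= target_brightness / max(img.mean(), 1e-6)

    return np.clip(img, 0, 255).astype(np.uint8)

# Usage on a synthetic, underexposed and blue-tinted frame.
frame = np.random.default_rng(4).integers(0, 80, size=(120, 160, 3), dtype=np.uint8)
frame[..., 2] = np.clip(frame[..., 2].astype(int) + 40, 0, 255)   # add a blue cast
corrected = auto_calibrate(frame)
print("mean RGB before:", frame.reshape(-1, 3).mean(axis=0).round(1))
print("mean RGB after: ", corrected.reshape(-1, 3).mean(axis=0).round(1))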
There may be occasional issues with lectures or webcasts, including speaker
changeovers, delays in preparing presentation materials, minor technical
problems, and so on. In a recorded presentation, professionals can eliminate
such downtime and provide viewers with a polished, professional end product.
Machine learning can help automate this process by identifying and removing
gaps in the recorded content during post-production.
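One simple way to approach this, sketched below with a synthetic audio envelope and invented thresholds, is to flag stretches where the audio level stays quiet for more than a few seconds as gaps to cut in post-production.

import numpy as np

def find_gaps(levels, frame_rate=10, threshold=0.02, min_gap_seconds=3.0):
    """levels: per-frame audio amplitude; returns (start_s, end_s) gaps to remove."""
    quiet = levels < threshold
    gaps, start = [], None
    for i, is_quiet in enumerate(np.append(quiet, False)):   # sentinel closes a trailing gap
        if is_quiet and start is None:
            start = i
        elif not is_quiet and start is not None:
            if (i - start) / frame_rate >= min_gap_seconds:
                gaps.append((start / frame_rate, i / frame_rate))
            start = None
    return gaps

# Usage: 60 s of assumed levels with a silent stretch from 20 s to 28 s.
rng = np.random.default_rng(5)
levels = rng.uniform(0.1, 0.5, size=600)
levels[200:280] = 0.005
print("gaps to cut:", find_gaps(levels))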
10. Highlight reels
The last in this list of machine learning applications for video production is
highlight reels. A recorded presentation can be repurposed as marketing collateral
by editing the original material down to only the presentation highlights, such as a
speaker's key points or important moments in the event. Machine learning can be
applied to automatically search for and isolate key moments in the recorded video(s)
using visual cues (e.g., transcribed text) and audio cues (e.g., audience applause).
The software can then help assemble a highlight reel from these isolated clips for
video editors to review. This is particularly helpful for saving video editors time
and effort on routine post-production tasks in a high-volume video setting.
Machine learning possibilities:
• Learn visual and audio cues that correspond to an important moment, such as
audience applause or keywords within transcribed text.
• Automatically isolate video clips based on learned cues.
• Combine the isolated clips to form a highlight reel.
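A rough sketch of these three steps follows (all cue data, keywords, and weights are hypothetical): moments are scored by keyword hits and an applause flag, high-scoring moments are isolated with padding, and the resulting clips form a candidate reel for an editor to review.

# Hypothetical cue data: (timestamp in seconds, transcript snippet, applause detected).
moments = [
    (30.0,   "thanks for joining us today", False),
    (410.0,  "our key result is a 40 percent speedup", False),
    (900.0,  "and that concludes the demo", True),
    (1500.0, "one final announcement about the roadmap", True),
]
keywords = {"key", "result", "announcement", "demo"}   # assumed important terms
CLIP_PADDING = 10.0   # seconds to keep on each side of a moment

def score(text, applause):
    hits = len(keywords.intersection(text.lower().split()))
    return hits + (2 if applause else 0)   # applause weighted more heavily

highlights = [
    (max(t - CLIP_PADDING, 0.0), t + CLIP_PADDING, text)
    for t, text, applause in moments
    if score(text, applause) >= 2          # keep only high-scoring moments
]
for start, end, text in highlights:
    print(f"clip {start:7.1f}s - {end:7.1f}s : {text}")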
Machine learning offers many opportunities for automating, streamlining, and
personalizing your live streams and recordings. Whether you're a content creator,
an AV technician at an educational institution, or a live event specialist, machine
learning can help improve your live video experience. With the rapid advancement of
science and technology, and with scholars investing effort in exploring AI and its
technologies like machine learning as applied to multimedia, new branches will open
and bring more convenience to human life.
References:
TechTarget: https://www.techtarget.com/searchenterpriseai/tip/The-history-of-artificial-intelligence-Complete-AI-timeline
MIT Technology Review: https://www.technologyreview.com/2019/12/18/102365/the-future-of-ais-impact-on-society/
Wikipedia: https://en.wikipedia.org/wiki/Multimedia
IBM: https://www.ibm.com/cloud/learn/machine-learning
Epiphan: https://www.epiphan.com/blog/machine-learning-applications/