Syllabus of Course Big Data Integration
1. Course Information
The course is part of the academic unit: Escuela de Ciencias Básicas, Tecnología e
Ingeniería (ECBTI)
Course Description:
The Big Data Integration course is part of the disciplinary training field of the Specialization
in Data Science and Analytics and is articulated with the problem core NP1: “Information
requirements of the organization”. To obtain information, organizing databases is a crucial
step that helps identify absences, failures, opportunities, or improvements in their
management and in the development of information systems.
Information requirements apply to all areas of science and technology and to any type of
organization. In data science, systemic research is understood as the phase prior to analysis;
it includes the stages of collecting and classifying information to meet needs or solve
problems in organizations and production systems, through real or potential actions that
contribute efficiently to that goal. Likewise, we support students in obtaining an international
certification in Introduction to Big Data through our partnership with IBM.
The course is designed so that the student can understand Big Data and the way it impacts
business and society through the tools and systems used by data scientists. It also guides
the student in the basic use of Hadoop with MapReduce. Initially, fundamental concepts of
data science that address the problems posed by Big Data are presented. The course also
analyzes how Big Data is the result of efforts in two areas: machine learning and cloud
computing. In addition, it shows how the velocity aspect of Big Data demands analytical
algorithms that can operate on data in motion and make decisions about the process with
heavy data summaries.
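As a minimal sketch of what operating on data in motion can look like, the following Python
snippet keeps a constant-memory summary (count, mean, variance) of a stream with Welford's
online algorithm and flags a reading as soon as it arrives, without storing the stream. The
course materials do not prescribe this algorithm; the names `RunningSummary` and `alerts`,
the sample readings, and the threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningSummary:
    """Constant-memory summary of a numeric stream (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one new value into the summary without keeping past values.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until at least one value has arrived.
        return self.m2 / self.count if self.count else 0.0


def alerts(stream, threshold):
    """Yield each reading that exceeds the running mean by more than `threshold`."""
    summary = RunningSummary()
    for x in stream:
        # Decide using only the summary of what has been seen so far.
        if summary.count and x - summary.mean > threshold:
            yield x
        summary.update(x)


readings = [10.0, 11.0, 9.0, 10.0, 30.0, 10.0]  # hypothetical sensor values
flagged = list(alerts(readings, threshold=5.0))
```

The decision for each reading depends only on three numbers, so memory use stays constant
no matter how long the stream runs.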
As for its structure, the course is methodological in type and carries two (2) academic
credits, distributed over the academic period in accordance with its offer period.
The learning strategy used in the course is Problem-Based Learning (PBL). The first unit is
based on a historical review of Big Data, the massive volume of data, the domain of
statistics, and its relationship with Machine Learning. The second unit focuses on
Hadoop for Cloud Computing; in this section, two codes for data management are
worked on.
Represents the Big Data landscape with statistical and technological tools related to cloud
computing and Machine Learning, through real-world problems that include the key sources
of Big Data (people, organizations, and sensors), with correct data handling.
3. Learning Outcomes:
Learning Outcome 1: Understands the V’s of Big Data through the collection, analysis, and
reporting of data to achieve a low computational cost in the process of Data Handling.
Learning Outcome 2: Distinguishes which problems are Big Data problems and which are not
by reformulating them from a data science perspective, for a correct choice of model
predictions based on data.
Learning Outcome 3: Understands the features of core Hadoop stack components, such as the
job management system and the MapReduce programming model, for cloud computing in
Big Data analytics.
Learning Outcome 4: Runs Hadoop on simple hardware for the analysis of massive volumes
of data.
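To make the MapReduce programming model named in Learning Outcome 3 more concrete, here is
a minimal single-process sketch of the classic word-count job in Python. It only imitates the
three phases (map, shuffle, reduce) in memory; it does not use Hadoop or any Hadoop API. The
function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count`, as well as the sample
lines, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


lines = ["big data is big", "data in motion"]  # hypothetical input
counts = word_count(lines)
```

On a real cluster the map and reduce functions run in parallel on different nodes and the
framework performs the shuffle; this sketch shows only the flow of data between the phases.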
4. Learning Strategy:
Problem-Based Learning (PBL) is the strategy proposed for this course.
This learning strategy is based on the following: the student can address the course
contents through individual and collaborative activities in a realistic environment. In this
case, the student faces a real problem, confronting previous knowledge with the new
concepts to be built.
Adopting this model prepares students for professional growth and increases their
academic motivation, because the tasks respond to situations in working life and the
professor takes on the role of learning manager (Liang, 2008).
5. Course Contents and Bibliographic References
Casas Roma, J., Nin Guerrero, J. & Julbe López, F. (2019). Big data: análisis de datos en
entornos masivos. Editorial UOC.
https://elibro-net.bibliotecavirtual.unad.edu.co/es/lc/unad/titulos/117744
Holmes, D. E. (2018). Big Data: una breve introducción. Antoni Bosch editor.
https://elibro-net.bibliotecavirtual.unad.edu.co/es/lc/unad/titulos/122682
https://cognitiveclass.ai/courses/what-is-big-data/
López Murphy, J. J. & Zarza, G. (2017). La ingeniería del big data: cómo trabajar con datos.
Editorial UOC.
https://elibro-net.bibliotecavirtual.unad.edu.co/es/lc/unad/titulos/59093
Ríos Insua, D. & Gómez-Ullate Oteiza, D. (2019). Big data: conceptos, tecnologías y
aplicaciones. Editorial CSIC Consejo Superior de Investigaciones Científicas.
https://elibro-net.bibliotecavirtual.unad.edu.co/es/lc/unad/titulos/122031
• Big Data Analytics and Cloud Computing with Hadoop and MapReduce
• Google File System and HDFS.
• Flink and Data Process Engines.
• Getting Started with Hadoop.
• Copy the data into the Hadoop Distributed File System (HDFS).
López, I. A. (2022). Introducción a Hadoop. [Podcast]. Repositorio Institucional UNAD.
https://repository.unad.edu.co/handle/10596/50650
Initial Moment:
The activities are: Mind map about the main concepts of Big Data and the distinction of
Business Analytics for Big Data definitions.
The highest score for this activity is 25 points, corresponding to 5% of the course
evaluation.
Intermediate Moment
Step 2: Big Data Analytics and Machine Learning
To be developed from week 3 to week 8
This responds to Learning Outcome 2
The activities are: Mind map about the main concepts of Business Analytics, Data,
and Statistical Methods and description of Data processing, analysis and
visualization.
Evaluation of Step 2
Evaluation of Step 3
The evaluation criteria for this activity are:
• Mind map about the Big Data systems.
• Getting Started with Hadoop.
• Word Counter with Hadoop.
• The Hadoop Ecosystem
• Socialization of Exercise 2 in the forum.
The highest score for this evaluation moment is 350 points, corresponding to 70% of
the course evaluation.
Final Moment
The activities are: Mind map about the main relations between Big Data Analytics
and Cloud Computing
Evaluation of Step 4
The evaluation criteria for this activity are:
• Mind map about the main relations between Big Data Analytics and Cloud
Computing.
• Description of cloud computing.
• Debates about MapReduce and distinct tools.
• Distinction of the advantages of Machine Learning for computer
processing.
• Reflection on the role of Machine Learning and Cloud Computing for Big
Data with Hadoop.
• Socialization of Exercise 5 in the forum.
The highest score for this activity is 125 points, corresponding to 25% of the course
evaluation.
7. Teacher’s Support
To develop the course activities, you will have the support of a teacher or tutor. The
options for this academic support are: