1.2. Preparing Machine Learning Environment: Installation of Python (In Windows OS)
Installation of ML Tools
To install machine learning tools in Python, you can install libraries and packages, either one by
one or many at once:
− Install one by one in command prompt:
o pip install matplotlib
o pip install numpy
o pip install keras
o pip install gekko
− Install many packages at once in command prompt (separate the package names with spaces):
o pip install matplotlib numpy gekko pandas plotly
Gekko provides an interface to gradient-based solvers for machine learning and
optimization of mixed-integer, differential algebraic equations.
– pip install gekko
matplotlib generates plots in Python.
– pip install matplotlib
Numpy is a numerical computing package for mathematics, science, and engineering.
– pip install numpy
OpenCV is a package for real-time computer vision.
– pip install opencv-python
Pandas visualizes and manipulates data tables.
– pip install pandas
Plotly renders interactive plots with HTML and JavaScript.
– pip install plotly
PyTorch enables deep learning, computer vision, and natural language processing.
– pip install torch
Beautiful Soup is a Python package for extracting (scraping) information from web
pages.
– pip install beautifulsoup4 lxml
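After running the install commands above, it can help to confirm which packages are actually present in the current Python environment. The following is a minimal sketch, assuming Python 3.8 or later (for the standard-library importlib.metadata module); the helper name installed_versions is illustrative and not part of any library.

```python
from importlib import metadata

# Distribution names exactly as passed to `pip install` in the list above.
PACKAGES = ["gekko", "matplotlib", "numpy", "opencv-python", "pandas",
            "plotly", "torch", "beautifulsoup4", "lxml"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Any package reported as "not installed" can be added with the matching pip install command from the list.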
2. Why is it important to use a package manager like pip or conda in a machine learning project?
a) To monitor model performance in production
b) To ensure that all dependencies and libraries are installed and updated correctly
c) To visualize data and model performance
d) To manage containerized applications
3. How can Jupyter Notebooks assist in the machine learning development process?
a) By managing machine learning experiments and deployments
b) By providing an interactive platform for code execution, data visualization, and documentation
c) By creating containerized environments for running models
d) By handling version control of code and data
1.3. DATA COLLECTION AND ACQUISITION
Data collection is the process of gathering and measuring information on variables of interest, in
an established systematic fashion that enables one to answer stated research questions, test
hypotheses, and evaluate outcomes.
The main techniques for gathering data are observation, interviews, questionnaires, schedules,
and surveys.
Key terms and their descriptions:
– Data: the set of observations or measurements that can be used to train
a machine-learning model. Types of data for ML are: numerical data, categorical
data, time series data, and text data.
– Information: data that has been interpreted and processed so that it carries
meaning for its users.
– Dataset: a collection of data that is used to train and test algorithms and models.
– Data warehouse: a single database that combines data from different sources,
which can then be used for analysis and decision making.
– Big data: a collection of data from various sources, often characterized by
volume, variety, velocity…
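The "Dataset" entry mentions using data to both train and test a model. A minimal sketch of that idea, using only the standard library, is shown here; the function train_test_split, the 20% test fraction, and the sample rows are illustrative assumptions, not something prescribed by the text.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset: (hours_studied, passed_exam) pairs, made up for illustration.
dataset = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1),
           (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]

train, test = train_test_split(dataset, test_fraction=0.2, seed=42)
print(len(train), "training rows,", len(test), "test rows")
```

The model is fitted on the training rows only, and the held-back test rows are used afterwards to estimate how well it handles data it has not seen.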
user profiles, Hashtags and Keywords (trending topics), geolocation data, Engagement
Metrics (likes, shares, comments)
Transactional data: Data related to transactions or interactions, such as:
o E-commerce Transactions: Purchase records, shopping cart data.
o Financial Transactions: Credit card transactions, bank transfers.
Computer: computers generate different kinds of data, such as text, images,
audio, and video:
o Text Data: Articles, books, social media posts, emails, and other text sources used
in natural language processing tasks.
o Images: Photos, diagrams, and graphics used in computer vision tasks. Formats
include JPEG, PNG, GIF, etc.
o Audio Data: Speech recordings, music, and other sound data used in speech
recognition and audio analysis.
o Video Data: Video files used for tasks like action recognition and video
classification.
Smartphone: smartphones generate a wealth of data that can be valuable for machine
learning applications, such as:
o Sensor data: accelerometer, gyroscope, magnetometer, GPS, proximity,
temperature and humidity, light, camera, microphone…
o Text data: user input such as text messages, emails, or notes, which can be analyzed
for sentiment analysis, language modeling, or personalized recommendations.
o Usage data: call and SMS logs, app usage…
Numerical Data
• Data representing quantities and measurable values.
• Use Cases: Regression tasks, statistical analysis, and feature engineering.
Description of types of data
Various types of data are used depending on the problem and the model being applied.
The common types of data are:
– Structured data: This type of data is highly organized and easily searchable. It includes:
o Tabular Data: Data organized in rows and columns, like spreadsheets or SQL databases.
E.g.: customer information, financial records, and sensor readings.
o Time-Series Data: Data collected at successive time points.
E.g.: stock prices, weather data, and website traffic logs.
– Semi-structured data: This type of data does not fit neatly into tables but contains some
organizational properties, such as:
o XML Data: Data marked up with tags, often used in web services.
o JSON Data: Data in JavaScript Object Notation format, commonly used in web
applications.
– Unstructured data: This type of data does not follow a specific format or structure. It
includes:
o Text Data: Data in the form of text, such as articles, emails, or social media posts.
o Image Data: Data consisting of visual information, used in tasks like image classification
or object detection.
o Audio Data: Data captured in audio formats, such as speech and music, used in speech
recognition or audio classification.
o Video Data: Data in the form of video sequences, which is used in video classification or
activity recognition.
– Categorical data: Data that represents categories or labels, such as colors, types of products,
or user demographics. It can be:
o Nominal: Categories without any inherent order, like types of animals.
o Ordinal: Categories with a specific order, like customer satisfaction ratings.
– Numerical data: Data represented by numbers, which can be:
o Discrete: Countable values, such as the number of visitors.
o Continuous: Measurable values, such as temperature or height.
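The data types listed above can be illustrated with a short sketch. It parses a small JSON text (semi-structured) into records, then turns each record into a numeric row: the discrete and continuous fields are kept as numbers, the ordinal field is mapped to a rank, and the nominal field is one-hot encoded. The field names, sample values, and the order chosen for the satisfaction levels are all assumptions made up for illustration.

```python
import json

# Hypothetical semi-structured input: a JSON array of customer records.
RAW_JSON = """
[
  {"visits": 3, "height_cm": 172.5, "pet": "cat", "satisfaction": "high"},
  {"visits": 1, "height_cm": 160.0, "pet": "dog", "satisfaction": "low"},
  {"visits": 7, "height_cm": 181.2, "pet": "cat", "satisfaction": "medium"}
]
"""

# Ordinal categories have an order, so each level maps to a rank (assumed order).
SATISFACTION_RANK = {"low": 0, "medium": 1, "high": 2}


def one_hot(value, categories):
    """Encode a nominal value as a 0/1 list with one slot per category."""
    return [1 if value == category else 0 for category in categories]


def encode(records):
    """Turn parsed records into purely numeric feature rows."""
    # Nominal categories: sorted only to get a stable column order, no ranking implied.
    pets = sorted({record["pet"] for record in records})
    rows = []
    for record in records:
        row = [
            record["visits"],                            # discrete numerical
            record["height_cm"],                         # continuous numerical
            SATISFACTION_RANK[record["satisfaction"]],   # ordinal
        ]
        row.extend(one_hot(record["pet"], pets))         # nominal
        rows.append(row)
    return pets, rows


records = json.loads(RAW_JSON)
pets, rows = encode(records)
print("columns: visits, height_cm, satisfaction_rank,",
      ", ".join(f"pet_{p}" for p in pets))
for row in rows:
    print(row)
```

One-hot encoding is used for the nominal field because assigning it a single number would suggest an order that the categories do not have.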
10. In which scenario would you use "categorical data" in machine learning?
a) Predicting house prices based on historical data
b) Classifying emails into different categories like spam or not spam
c) Forecasting future stock prices based on past trends
d) Analyzing customer reviews for sentiment