Tutorial+1
TA: Benedict Tan
Final Year Business Analytics Student
Anaconda Installation
Different projects may need different versions of Python and of their dependencies. Anaconda
creates a "virtual environment", an isolated library of dependencies for each project, so that
projects do not run into version conflicts.
What is Jupyter Notebook?
• Jupyter Notebook is a common tool in data science that makes it easy to explore
and plot data.
• Text cells are written in Markdown; see the Jupyter documentation for how to write it.
• It is a very popular editor for fast prototyping of data science and analytics
code, but not a substitute for a proper IDE (e.g., PyCharm) for developing .py files.
Install Jupyter Notebook
Remove kernel
• jupyter kernelspec list
• jupyter kernelspec uninstall unwanted-kernel
Jupyter Notebook Kernels: How to Add, Change, Remove
How to open Jupyter Notebook
Mac
1. Open Terminal
2. Enter command `jupyter notebook`
Windows
1. Open PowerShell
2. Enter command `jupyter notebook`
Data Serialisation formats:
1. CSV
2. JSON
3. XML
4. Avro
How it is transferred:
1. HTTP (all formats)
2. SOAP (XML)
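To make the formats concrete, here is a minimal sketch that writes the same record as CSV, JSON, and XML using only the Python standard library. The record and the tag name `person` are made up for illustration; Avro is left out because it needs a third-party package.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
record = {"name": "Alice", "age": 30}

# CSV: a header row followed by one row of values.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record), lineterminator="\n")
writer.writeheader()
writer.writerow(record)
csv_text = buffer.getvalue()

# JSON: key/value pairs; numbers keep their type.
json_text = json.dumps(record)

# XML: every field becomes a tagged element; all values are stored as text.
root = ET.Element("person")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
xml_text = ET.tostring(root, encoding="unicode")

print(csv_text)
print(json_text)
print(xml_text)
```

Note that JSON keeps `age` as a number, while CSV and XML store it as text that the reader has to convert back.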
How to install package in Jupyter Notebook?
Example:
# In a notebook cell, %pip installs into the environment the kernel runs in
%pip install pandas
import pandas as pd
pd.show_versions()
How to read csv in Jupyter Notebook?
import pandas as pd
mpg = pd.read_csv('Path_Route')
Example:
import pandas as pd
mydata = pd.read_csv('C:/Users/Mike/Documents/mpg.csv')
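If you do not have a CSV file at hand, the sketch below may help you try `read_csv` anyway. It assumes pandas is installed and reads a small made-up table from an in-memory string instead of a file path; the column names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as mpg.csv.
csv_text = "model,mpg\nA,21.0\nB,33.5\n"

# read_csv accepts a file path or any file-like object.
mydata = pd.read_csv(io.StringIO(csv_text))

print(mydata.shape)   # (rows, columns)
print(mydata.dtypes)  # column types inferred by pandas
print(mydata.head())  # first rows of the table
```

Checking `shape`, `dtypes`, and `head()` right after loading is a quick way to confirm that the file was parsed as expected.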
How to read json in Jupyter Notebook?
import pandas as pd
df = pd.read_json('Path_Route')
Example:
import pandas as pd
df = pd.read_json("FILE_JSON.json")
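The same idea can be tried without a file. This is a small sketch, assuming pandas is installed, that reads a made-up list of JSON records from an in-memory string; the field names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical JSON content: a list of records, one per row.
json_text = '[{"model": "A", "mpg": 21.0}, {"model": "B", "mpg": 33.5}]'

# orient="records" tells pandas that each list item is one row.
df = pd.read_json(io.StringIO(json_text), orient="records")

print(df)
```

JSON files can be laid out in several ways, so if the DataFrame looks wrong, the `orient` argument is usually the first thing to check.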
How to read xml in Jupyter Notebook?
Example:
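from bs4 import BeautifulSoup  # the "xml" parser also needs the lxml package
# 'FILE_XML.xml' and the tag name 'unique' are placeholders; use your own
with open('FILE_XML.xml', 'r') as f:
    data = f.read()
Bs_data = BeautifulSoup(data, "xml")
b_unique = Bs_data.find_all('unique')  # every <unique> element in the file
print(b_unique)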
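As an alternative that needs no extra packages, here is a sketch using the standard library's `xml.etree.ElementTree` instead of BeautifulSoup. The XML content and the tag names `cars`, `car`, `model`, and `mpg` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML content standing in for a file on disk.
xml_text = """
<cars>
  <car><model>A</model><mpg>21.0</mpg></car>
  <car><model>B</model><mpg>33.5</mpg></car>
</cars>
""".strip()

root = ET.fromstring(xml_text)

# Turn every <car> element into a dict of {child tag: child text}.
rows = [{child.tag: child.text for child in car} for car in root.findall("car")]

print(rows)
```

All values come back as strings, so numeric fields such as `mpg` still need converting, for example with `float(...)`.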
What is Avro?
1. Another serialisation format; it serialises data into a compact binary format.
2. Efficient for both storage and transmission.
3. Schema-based: the data is written and read against a declared schema.
How to read avro file in Jupyter Notebook?
Example:
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
schema = avro.schema.parse(open("user.avsc").read())
XML & HTML
XML Example
• HTML is the primary, standardized language for web development. It is platform-agnostic and works in all
browsers and applications that support it.
• HTML uses a simple markup syntax made of tags and attributes. These tags are predefined.
• HTML is not case-sensitive, and browsers will display a page even if it contains typos and syntax errors.
• On its own, HTML creates static web pages that don’t update or change.
• HTML can integrate with other web languages such as CSS, XML, and back-end languages.
XML vs. HTML
Key Difference
• XML stands for Extensible Markup Language, whereas HTML stands for HyperText Markup
Language.
• XML mainly focuses on the transfer of data, while HTML focuses on the presentation of data.
• XML tags are user-defined and extensible, whereas HTML has a fixed set of predefined tags.
• XML is case-sensitive, while HTML is case-insensitive.
Summary
• XML’s primary function is storing and transporting data; it isn’t concerned with displaying the data.
• HTML is the primary language used for coding the front end of a website. While it is commonly used alongside
other languages such as CSS, XML, and back-end languages such as Ruby and Python, HTML
is primarily responsible for a website’s layout and basic appearance.
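The case-sensitivity difference can be seen with Python's standard library. This is an illustrative sketch, not how browsers work internally: it feeds the same made-up snippet, whose opening and closing tags differ only in case, to an XML parser and to an HTML parser. The class name `TagCollector` is hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Opening and closing tags differ only in letter case.
mixed_case = "<Note>hello</note>"

# XML: <Note> and </note> are different tags, so parsing fails.
try:
    ET.fromstring(mixed_case)
    xml_ok = True
except ET.ParseError:
    xml_ok = False


class TagCollector(HTMLParser):
    """Records the tag names the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))


# HTML: tag names are lowercased, so both tags count as "note".
collector = TagCollector()
collector.feed(mixed_case)
collector.close()
html_tags = collector.tags

print("XML parsed:", xml_ok)
print("HTML tags:", html_tags)
```

The XML parser rejects the snippet, while the HTML parser treats both tags as the same element.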
Human language
• needs external linguistic knowledge or data to process
• sometimes even needs a model or training
• essential for enabling high-level analysis
NLTK is a leading platform for building Python programs to work with human language data.
It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet,
along with a suite of text processing libraries for classification, tokenization, stemming,
tagging, parsing, and semantic reasoning, as well as wrappers for industrial-strength NLP libraries.