
UNIT – 2

HADOOP AND PYTHON


CLOUD COMPUTING
Dr. R. Mahammad Shafi
TABLE OF CONTENTS
HADOOP ECOSYSTEM
MAPREDUCE
HADOOP SCHEDULERS
DESIGN CONSIDERATIONS FOR CLOUD APPLICATIONS
SERVICE ORIENTED ARCHITECTURE
CLOUD COMPONENT MODEL
SOA Vs CCM
MODEL VIEW CONTROLLER
RELATIONAL DATABASES
PYTHON
NUMBERS IN PYTHON
STRINGS IN PYTHON
LISTS IN PYTHON
TUPLES IN PYTHON
DICTIONARIES
TYPE CONVERSIONS IN PYTHON
CONTROL FLOW – IF STATEMENT
CONTROL FLOW – FOR STATEMENT
CONTROL FLOW – WHILE STATEMENT
CONTROL FLOW – RANGE STATEMENT
CONTROL FLOW – BREAK/ CONTINUE STATEMENTS
CONTROL FLOW – PASS STATEMENT
FUNCTIONS
MODULES
PACKAGES
FILE HANDLING
DATE/ TIME OPERATORS
CLASSES
SUMMARY
HADOOP ECOSYSTEM

Apache Hadoop is an open source framework for distributed batch processing of big data.
Hadoop Ecosystem includes:
• Hadoop MapReduce
• HDFS
• YARN
• HBase
• Zookeeper
• Pig
• Hive
• Mahout
• Chukwa
• Cassandra
• Avro
• Oozie
• Flume
• Sqoop

Fig-2.1: Hadoop Eco System


APACHE HADOOP
A Hadoop cluster comprises a master node, a backup node, and a number of slave nodes. The master
node runs the NameNode and JobTracker processes and the slave nodes run the DataNode and
TaskTracker components of Hadoop. The backup node runs the Secondary NameNode process.
NameNode: NameNode keeps the directory tree of all files in the file system, and tracks where across
the cluster the file data is kept. It does not store the data of these files itself. Client applications talk to
the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file.
Secondary NameNode: NameNode is a Single Point of Failure for the HDFS Cluster. An optional
Secondary NameNode which is hosted on a separate machine creates checkpoints of the namespace.
JobTracker: The JobTracker is the service within Hadoop that distributes MapReduce tasks to specific
nodes in the cluster, ideally the nodes that have the data, or at least are in the same rack.

Fig-2.2: APACHE Hadoop Cluster

TaskTracker: TaskTracker is a node in a Hadoop cluster that accepts Map, Reduce and Shuffle tasks
from the JobTracker. Each TaskTracker has a defined number of slots which indicate the number of
tasks that it can accept.
DataNode: A DataNode stores data in an HDFS file system. A functional HDFS filesystem has more than
one DataNode, with data replicated across them. DataNodes respond to requests from the NameNode
for filesystem operations. Client applications can talk directly to a DataNode, once the NameNode has


provided the location of the data. Similarly, MapReduce operations assigned to TaskTracker instances
near a DataNode, talk directly to the DataNode to access the files. TaskTracker instances can be
deployed on the same servers that host DataNode instances, so that MapReduce operations are
performed close to the data.


MAPREDUCE

MapReduce job consists of two phases:


Map: In the Map phase, data is read from a distributed file system and partitioned among a set of
computing nodes in the cluster. The data is sent to the nodes as a set of key-value pairs. The Map tasks
process the input records independently of each other and produce intermediate results as key-value
pairs. The intermediate results are stored on the local disk of the node running the Map task.
Reduce: When all the Map tasks are completed, the Reduce phase begins in which the intermediate
data with the same key is aggregated.
Optional Combine Task: An optional Combine task can be used to perform data aggregation on the
intermediate data of the same key for the output of the mapper before transferring the output to the
Reduce task.

Fig-2.3: MapReduce
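
To make the two phases described above concrete, the sketch below (not part of the original text) shows the classic word-count job as two Python scripts in the style used with Hadoop Streaming: mapper.py emits intermediate key-value pairs and reducer.py aggregates the values for each key. The script names and the tab-separated record format are illustrative assumptions.

# mapper.py - reads lines from standard input and emits a (word, 1)
# key-value pair per word, one pair per line, separated by a tab.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))

# reducer.py - aggregates the counts for each word; the framework delivers
# the mapper output sorted by key, so identical words arrive one after another.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.strip().split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))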
MAPREDUCE JOB EXECUTION WORKFLOW
MapReduce job execution starts when the client applications submit jobs to the JobTracker. The
JobTracker returns a JobID to the client application. The JobTracker talks to the NameNode to determine
the location of the data. The JobTracker locates TaskTracker nodes with available slots at or near the
data.


The TaskTrackers send out heartbeat messages to the JobTracker, usually every few seconds, to
reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the
number of available slots, so the JobTracker can stay up to date with where in the cluster new work can
be delegated.
The JobTracker submits the work to the TaskTracker nodes when they poll for tasks. To choose a task
for a TaskTracker, the JobTracker uses various scheduling algorithms (default is FIFO). The TaskTracker
nodes are monitored using the heartbeat signals that are sent by the TaskTrackers to JobTracker.
The TaskTracker spawns a separate JVM process for each task so that any task failure does not bring
down the TaskTracker. The TaskTracker monitors these spawned processes while capturing the output
and exit codes. When the process finishes, successfully or not, the TaskTracker notifies the JobTracker.
When the job is completed, the JobTracker updates its status.

Fig-2.4: MapReduce to Job execution workflow

MAPREDUCE 2.0 – YARN


In Hadoop 2.0 the original processing engine of Hadoop (MapReduce) has been separated from the
resource management (which is now part of YARN).
This makes YARN effectively an operating system for Hadoop that supports different processing engines
on a Hadoop cluster such as MapReduce for batch processing, Apache Tez for interactive queries,
Apache Storm for stream processing, etc.
The YARN architecture divides the two major functions of the JobTracker - resource management and
job life-cycle management - into separate components:
• ResourceManager
• ApplicationMaster


Fig-2.5: Hadoop 2.0


Resource Manager (RM): RM manages the global assignment of compute resources to applications.
RM consists of two main services:
Scheduler: Scheduler is a pluggable service that manages and enforces the resource scheduling policy
in the cluster.
Applications Manager (AsM): AsM manages the running Application Masters in the cluster. AsM is
responsible for starting application masters and for monitoring and restarting them on different nodes in
case of failures.
Application Master (AM): A per-application AM manages the application’s life cycle. AM is responsible
for negotiating resources from the RM and working with the NMs to execute and monitor the tasks.
Node Manager (NM): A per-machine NM manages the user processes on that machine.
Containers: Container is a bundle of resources allocated by RM (memory, CPU, network, etc.). A
container is a conceptual entity that grants an application the privilege to use a certain amount of
resources on a given machine to run a component task.


HADOOP SCHEDULERS

The Hadoop scheduler is a pluggable component, which makes it possible to support different scheduling
algorithms. The default scheduler in Hadoop is FIFO.
Two advanced schedulers are also available - the Fair Scheduler, developed at Facebook, and the
Capacity Scheduler, developed at Yahoo.
The pluggable scheduler framework provides the flexibility to support a variety of workloads with varying
priority and performance constraints.
Efficient job scheduling makes Hadoop a multi-tasking system that can process multiple data sets for
multiple jobs for multiple users simultaneously.

FIFO SCHEDULER
FIFO is the default scheduler in Hadoop. It maintains a work queue in which the jobs are queued. The
scheduler pulls jobs in a first-in, first-out manner (oldest job first) for scheduling. There is no concept of
priority or size of a job in the FIFO scheduler.

FAIR SCHEDULER
The Fair Scheduler allocates resources evenly between multiple jobs and also provides capacity
guarantees. The Fair Scheduler assigns resources to jobs such that each job gets an equal share of the
available resources on average over time. Task slots that are free are assigned to new jobs, so that
each job gets roughly the same amount of CPU time.
Job Pools: The Fair Scheduler maintains a set of pools into which jobs are placed. Each pool has a
guaranteed capacity. When there is a single job running, all the resources are assigned to that job. When
there are multiple jobs in the pools, each pool gets at least as many task slots as guaranteed. Each pool
receives at least the minimum share. When a pool does not require the guaranteed share the excess
capacity is split between other jobs.
Fairness: The scheduler computes periodically the difference between the computing time received by
each job and the time it should have received in ideal scheduling. The job which has the highest deficit
of the compute time received is scheduled next.
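
The deficit-based selection can be sketched in a few lines of Python (an illustration of the idea only, not Hadoop's actual implementation; the job records and numbers are made up):

# Pick the job whose received compute time lags furthest behind its fair share.
def pick_next_job(jobs, elapsed_time):
    fair_share = elapsed_time / float(len(jobs))          # ideal time per job
    def deficit(job):
        return fair_share - job['received']               # how far behind it is
    return max(jobs, key=deficit)

jobs = [{'name': 'jobA', 'received': 40.0},
        {'name': 'jobB', 'received': 10.0},
        {'name': 'jobC', 'received': 25.0}]
print(pick_next_job(jobs, 75.0)['name'])                  # prints: jobB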

CAPACITY SCHEDULER
The Capacity Scheduler has similar functionality to the Fair Scheduler but adopts a different scheduling
philosophy.


Queues: In Capacity Scheduler, you define a number of named queues each with a configurable number
of map and reduce slots. Each queue is also assigned a guaranteed capacity. The Capacity Scheduler
gives each queue its capacity when it contains jobs, and shares any unused capacity between the
queues. Within each queue FIFO scheduling with priority is used.
Fairness: For fairness, it is possible to place a limit on the percentage of running tasks per user, so that
users share a cluster equally. A wait time for each queue can be configured. When a queue is not
scheduled for more than the wait time, it can preempt tasks of other queues to get its fair share.

Note:
Watch this Youtube Video: https://www.youtube.com/live/JrM-EcSQ2YE?feature=shared


DESIGN CONSIDERATIONS FOR CLOUD APPLICATIONS

Scalability: Scalability is an important factor that drives application designers to move to cloud
computing environments. Building applications that can serve millions of users without taking a hit on
their performance has always been challenging. With the growth of cloud computing, application
designers can provision adequate resources to meet their workload levels.
Reliability & Availability: Reliability of a system is defined as the probability that a system will perform
the intended functions under stated conditions for a specified amount of time. Availability is the probability
that a system will perform a specified function under given conditions at a prescribed time.
Security: Security is an important design consideration for cloud applications given the outsourced
nature of cloud computing environments.
Maintenance & Upgradation: To achieve a rapid time-to-market, businesses typically launch their
applications with a core set of features ready and then incrementally add new features as and when they
are complete. In such scenarios, it is important to design applications with low maintenance and
upgradation costs.
Performance: Applications should be designed while keeping the performance requirements in mind.

REFERENCE ARCHITECTURE - e-Commerce, Business-to-Business, Banking and Financial apps

Fig-2.6: Reference Architecture for designing Cloud Applications


Load Balancing Tier: Load balancing tier consists of one or more load balancers.
Application Tier: For this tier, it is recommended to configure auto scaling. Auto scaling can be triggered
when the recorded values for any of the specified metrics, such as CPU usage, memory usage, etc.,
go above defined thresholds.
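
A simple threshold check of this kind could look like the following sketch (cloud-provider APIs are deliberately omitted; the metric and thresholds are illustrative):

# Decide whether the application tier should scale out, scale in, or stay put.
def autoscale_decision(cpu_percent, scale_out_at=75.0, scale_in_at=25.0):
    if cpu_percent > scale_out_at:
        return "add-instance"
    if cpu_percent < scale_in_at:
        return "remove-instance"
    return "no-change"

print(autoscale_decision(82.0))   # prints: add-instance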
Database Tier: The database tier includes a master database instance and multiple slave instances.
The master node serves all the write requests and the read requests are served from the slave nodes.
This improves the throughput for the database tier since most applications have a higher number of read
requests than write requests.
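
The read/write split can be illustrated with a small routing sketch (the connection names are placeholders; a production system would use a real database driver):

import itertools

# Send writes to the master and spread reads across the slave replicas.
class DatabaseRouter(object):
    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)       # simple round-robin
    def route(self, query):
        if query.strip().lower().startswith(("insert", "update", "delete")):
            return self.master
        return next(self._slaves)

router = DatabaseRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.route("SELECT * FROM orders"))          # a slave replica
print(router.route("UPDATE orders SET status='ok'")) # db-master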

REFERENCE ARCHITECTURES – Content delivery Apps

Fig-2.7: Reference Architecture for Content delivery Apps

Figure-2.7 shows a typical deployment architecture for content delivery applications such as online
photo albums, video webcasting, etc. Both relational and non-relational data stores are shown in this
deployment. A content delivery network (CDN), which consists of a global network of edge locations, is
used for media delivery. The CDN is used to speed up the delivery of static content such as images and
videos.


REFERENCE ARCHITECTURES – Analytics App

Fig-2.8: Reference Architecture for Analytics App


Figure-2.8 shows a typical deployment architecture for compute intensive applications such as Data
Analytics, Media Transcoding, etc. It comprises web, application, storage, computing/analytics and
database tiers. The analytics tier consists of cloud-based distributed batch processing frameworks such
as Hadoop which are suitable for analyzing big data.
Data analysis jobs (such as MapReduce jobs) are submitted to the analytics tier from the application
servers. The jobs are queued for execution and upon completion the analyzed data is presented from
the application servers.


SERVICE ORIENTED ARCHITECTURE

Service Oriented Architecture (SOA) is a well-established architectural approach for designing and
developing applications in the form of services that can be shared and reused. SOA is a collection of
discrete software modules or services that form a part of an application and collectively provide the
functionality of an application.
SOA services are developed as loosely coupled modules with no hard-wired calls embedded in the
services. The services communicate with each other by passing messages. Services are described using
the Web Services Description Language (WSDL). WSDL is an XML-based web services description
language that is used to create service descriptions containing information on the functions performed
by a service and the inputs and outputs of the service.

Fig-2.9: Service Oriented Architecture


SOA LAYERS
Business Systems: This layer consists of custom-built applications and legacy systems such as
Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Supply Chain
Management (SCM), etc.


Service Components: The service components allow the layers above to interact with the business
systems. The service components are responsible for realizing the functionality of the services
exposed.
Composite Services: These are coarse-grained services which are composed of two or more service
components. Composite services can be used to create enterprise scale components or business-unit
specific components.
Orchestrated Business Processes: Composite services can be orchestrated to create higher level
business processes. In this layer, the compositions and orchestrations of the composite services are
defined to create business processes.
Presentation Services: This is the topmost layer that includes user interfaces that expose the
services and the orchestrated business processes to the users.
Enterprise Service Bus: This layer integrates the services through adapters, routing, transformation
and messaging mechanisms.

Fig-2.10: Service Oriented Architecture – Layers


CLOUD COMPONENT MODEL

Cloud Component Model (CCM) is an application design methodology that provides a flexible way of
creating cloud applications in a rapid, convenient and platform independent manner. CCM is an
architectural approach for cloud applications that is not tied to any specific programming language or
cloud platform.
Cloud applications designed with the CCM approach can have innovative hybrid deployments in which
different components of an application can be deployed on cloud infrastructure and platforms of different
cloud vendors.
Applications designed using CCM have better portability and interoperability. CCM based applications
have better scalability, achieved by decoupling application components and providing asynchronous
communication mechanisms.

CCM APPLICATION DESIGN METHODOLOGY


CCM approach for application design involves:
• Component Design
• Architecture Design
• Deployment Design

Fig-2.11: (a) CCM approaches (b) Component Design


CCM COMPONENT DESIGN


A cloud component model is created for the application based on a comprehensive analysis of the
application's functions and building blocks. The cloud component model allows identifying the building
blocks of a cloud application, which are classified based on the functions performed and the type of cloud
resources required.
Each building block performs a set of actions to produce the desired outputs for other components. Each
component takes specific inputs, performs a pre-defined set of actions and produces the desired
outputs. Components offer their functions as services through a functional interface which can be used
by other components. Components report their performance to a performance database through a
performance interface.

Fig-2.12: CCM map for an e-Commerce application

CCM ARCHITECTURE DESIGN


In Architecture Design step, interactions between the application components are defined. CCM
components have the following characteristics:
Loose Coupling: Components in the Cloud Component Model are loosely coupled.
Asynchronous Communication: By allowing asynchronous communication between components, it is
possible to add capacity by adding additional servers when the application load increases. Asynchronous
communication is made possible by using messaging queues.
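
As an illustration of queue-based asynchronous communication (a local sketch only; in a CCM deployment the queue would typically be a hosted messaging service, and the component names are made up):

import threading
try:
    import queue              # Python 3
except ImportError:
    import Queue as queue     # Python 2

orders = queue.Queue()

def order_processor():
    # Consumer component: processes messages whenever they arrive.
    while True:
        order = orders.get()              # blocks until a message is available
        if order is None:                 # sentinel used to stop the worker
            break
        print("processing %s" % order)

worker = threading.Thread(target=order_processor)
worker.start()
orders.put("order-1001")                  # producer component just enqueues
orders.put("order-1002")                  # and continues without waiting
orders.put(None)
worker.join()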


Fig-2.13: Architecture design of an e-Commerce application

CCM DEPLOYMENT DESIGN


In Deployment Design step, application components are mapped to specific cloud resources such as
web servers, application servers, database servers, etc. Since the application components are designed
to be loosely coupled and stateless with asynchronous communication, components can be deployed
independently of each other.
This approach makes it easy to migrate application components from one cloud to the other. With this
flexibility in application design and deployment, the application developers can ensure that the
applications meet the performance and cost requirements with changing contexts.

Fig-2.14: Deployment design of an e-Commerce application


SOA Vs CCM

Table-2.1: Similarities between SOA and CCM

Standardization & Re-use:
SOA - SOA advocates principles of reuse and a well-defined relationship between service provider and service consumer.
CCM - CCM is based on reusable components which can be used by multiple cloud applications.

Loose coupling:
SOA - SOA is based on loosely coupled services that minimize dependencies.
CCM - CCM is based on loosely coupled components that communicate asynchronously.

Statelessness:
SOA - SOA services minimize resource consumption by deferring the management of state information.
CCM - CCM components are stateless. State is stored outside of the components.

Table-2.2: Differences between SOA and CCM

End points:
SOA - SOA services have a small and well-defined set of endpoints through which many types of data can pass.
CCM - CCM components have a very large number of endpoints. There is an endpoint for each resource in a component, identified by a URI.

Messaging:
SOA - SOA uses a messaging layer above HTTP by using SOAP, which imposes prohibitive constraints on developers.
CCM - CCM components use HTTP and REST for messaging.

Security:
SOA - SOA uses WS-Security, SAML and other standards for security.
CCM - CCM components use HTTPS for security.

Interfacing:
SOA - SOA uses XML for interfacing.
CCM - CCM allows resources in components to be represented in different formats for interfacing (HTML, XML, JSON, etc.).

Consumption:
SOA - Consuming traditional SOA services in a browser is cumbersome.
CCM - CCM components and the underlying component resources are exposed as XML, JSON (and other formats) over HTTP or REST, and are thus easy to consume in the browser.


MODEL VIEW CONTROLLER

Model View Controller (MVC) is a popular software design pattern for web applications.
Model: Model manages the data and the behavior of the applications. Model processes events sent by
the controller. Model has no information about the views and controllers. Model responds to the requests
for information about its state (from the view) and responds to the instructions to change state (from
controller).
View: View prepares the interface which is shown to the user. Users interact with the application through
views. Views present the information that the model or controller tells the view to present to the user, and
also handle user requests and send them to the controller.
Controller: Controller glues the model to the view. Controller processes user requests and updates the
model when the user manipulates the view. Controller also updates the view when the model changes.

Fig-2.15: Model View Controller (MVC)
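
A minimal, framework-free sketch of these three roles is shown below (class and method names are illustrative, not taken from any particular framework):

class Model(object):
    def __init__(self):
        self._items = []
    def add_item(self, item):            # state changes arrive via the controller
        self._items.append(item)
    def get_items(self):                 # state queries come from the view
        return list(self._items)

class View(object):
    def render(self, items):             # presents the model's data to the user
        for i, item in enumerate(items, 1):
            print("%d. %s" % (i, item))

class Controller(object):
    def __init__(self, model, view):
        self.model, self.view = model, view
    def handle_add(self, item):          # user request -> update model -> refresh view
        self.model.add_item(item)
        self.view.render(self.model.get_items())

controller = Controller(Model(), View())
controller.handle_add("task: deploy application")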

RESTful Web Services


Representational State Transfer (REST) is a set of architectural principles by which you can design web
services and web APIs that focus on a system’s resources and how resource states are addressed and
transferred. The REST architectural constraints apply to the components, connectors, and data
elements, within a distributed hypermedia system.


A RESTful web service is a web API implemented using HTTP and REST principles. The REST
architectural constraints are as follows:

• Client-Server
• Stateless
• Cacheable
• Layered System
• Uniform Interface
• Code on demand
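
A small RESTful web service can be sketched with the Flask micro-framework as below (this assumes Flask is installed; the /students resource and the in-memory data are illustrative):

from flask import Flask, jsonify, request

app = Flask(__name__)
students = {"1": {"name": "Mary", "major": "CS"}}     # toy in-memory store

@app.route("/students/<sid>", methods=["GET"])
def get_student(sid):
    # Each student is a resource addressed by its own URI.
    if sid not in students:
        return jsonify({"error": "not found"}), 404
    return jsonify(students[sid])

@app.route("/students/<sid>", methods=["PUT"])
def put_student(sid):
    # Stateless: everything needed to serve the request is in the request itself.
    students[sid] = request.get_json()
    return jsonify(students[sid])

if __name__ == "__main__":
    app.run()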


RELATIONAL DATABASES

A relational database is a database that conforms to the relational model proposed by Edgar
Codd in 1970.
The 12 rules that Codd introduced for relational databases include:
1. Information rule
2. Guaranteed access rule
3. Systematic treatment of null values
4. Dynamic online catalog based on relational model
5. Comprehensive sub-language rule
6. View updating rule
7. High level insert, update, delete
8. Physical data independence
9. Logical data independence
10. Integrity independence
11. Distribution independence
12. Non-subversion rule

Relations: A relational database has a collection of relations (or tables). A relation is a set of tuples (or
rows).
Schema: Each relation has a fixed schema that defines the set of attributes (or columns in a table) and
the constraints on the attributes.
Tuples: Each tuple in a relation has the same attributes (columns). The tuples in a relation can have any
order and the relation is not sensitive to the ordering of the tuples.
Attributes: Each attribute has a domain, which is the set of possible values for the attribute.
Insert/Update/Delete: Relations can be modified using insert, update and delete operations. Every
relation has a primary key that uniquely identifies each tuple in the relation.
Primary Key: An attribute can be made a primary key if it does not have repeated values in different
tuples.
ACID GUARANTEES
Relational databases provide ACID guarantees.
Atomicity: Atomicity property ensures that each transaction is either “all or nothing”. An atomic
transaction ensures that all parts of the transaction complete or the database state is left unchanged.


Consistency: Consistency property ensures that each transaction brings the database from one valid
state to another. In other words, the data in a database always conforms to the defined schema and
constraints.
Isolation: Isolation property ensures that the database state obtained after a set of concurrent
transactions is the same as would have been if the transactions were executed serially. This provides
concurrency control, i.e. the results of incomplete transactions are not visible to other transactions. The
transactions are isolated from each other until they finish.
Durability: Durability property ensures that once a transaction is committed, the data remains as it is,
i.e. it is not affected by system outages such as power loss. Durability guarantees that the database can
keep track of changes and can recover from abnormal terminations.
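
Atomicity can be demonstrated with the sqlite3 module from the Python standard library (the table and amounts are illustrative): either both updates of the transfer commit together, or the rollback leaves the balances untouched.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
    conn.commit()                 # both changes become durable together
except sqlite3.Error:
    conn.rollback()               # on any failure, neither change is applied

print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())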

NON-RELATIONAL DATABASES
Non-relational databases (or popularly called No-SQL databases) are becoming popular with the growth
of cloud computing. Non-relational databases have better horizontal scaling capability and improved
performance for big data at the cost of less rigorous consistency models.
Unlike relational databases, non-relational databases do not provide ACID guarantees. Most non-
relational databases offer “eventual” consistency, which means that given a sufficiently long period of
time over which no updates are made, all updates can be expected to propagate eventually through the
system and the replicas will be consistent.
The driving force behind the non-relational databases is the need for databases that can achieve high
scalability, fault tolerance and availability. These databases can be distributed on a large cluster of
machines. Fault tolerance is provided by storing multiple replicas of data on different machines.
Types of Non-relational Databases:
Key-value store: Key-value store databases are suited for applications that require storing unstructured
data without a fixed schema. Most key-value stores have support for native programming language data
types.
Document store: Document store databases store semi-structured data in the form of documents which
are encoded in different standards such as JSON, XML, BSON, YAML, etc.
Graph store: Graph stores are designed for storing data that has graph structure (nodes and edges).
These solutions are suitable for applications that involve graph data such as social networks,
transportation systems, etc.
Object store: Object store solutions are designed for storing data in the form of objects defined in an
object-oriented programming language.
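
The access pattern of a key-value store can be illustrated with a toy in-memory sketch (illustrative only; real key-value databases add replication, persistence and partitioning):

class KeyValueStore(object):
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value                # any object, no fixed schema
    def get(self, key, default=None):
        return self._data.get(key, default)
    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("session:42", {"user": "mary", "cart": ["book", "pen"]})
print(store.get("session:42"))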


PYTHON

Python is a general-purpose, high-level programming language that is well suited for providing a solid
foundation to the reader in the area of cloud computing.
The main characteristics of Python are:
Multi-paradigm programming language: Python supports more than one programming paradigms
including object-oriented programming and structured programming
Interpreted Language: Python is an interpreted language and does not require an explicit compilation
step. The Python interpreter executes the program source code directly, statement by statement, as a
processor or scripting engine does.
Interactive Language: Python provides an interactive mode in which the user can submit commands
at the Python prompt and interact with the interpreter directly.

PYTHON – BENEFITS
Easy-to-learn, read and maintain: Python is a minimalistic language with relatively few keywords,
uses English keywords and has fewer syntactical constructions as compared to other languages.
Reading Python programs feels like English with pseudo-code like constructs. Python is easy to learn
yet an extremely powerful language for a wide range of applications.
Object and Procedure Oriented: Python supports both procedure-oriented programming and object-
oriented programming. The procedure-oriented paradigm allows programs to be written around procedures
or functions that allow reuse of code. The object-oriented paradigm allows programs to be written
around objects that include both data and functionality.
Extendable: Python is an extendable language and allows integration of low-level modules written in
languages such as C/C++. This is useful when you want to speed up a critical portion of a program.
Scalable: Due to the minimalistic nature of Python, it provides a manageable structure for large
programs.
Portable: Since Python is an interpreted language, programmers do not have to worry about
compilation, linking and loading of programs. Python programs can be directly executed from source.
Broad Library Support: Python has broad library support and works on various platforms such as
Windows, Linux, Mac, etc.


NUMBERS IN PYTHON

Number data type is used to store numeric values. Numbers are immutable data types, therefore
changing the value of a number data type results in a newly allocated object.

#Integer
>>>a=5
>>>type(a)
<type 'int'>

#Floating Point
>>>b=2.5
>>>type(b)
<type 'float'>

#Long
>>>x=9898878787676L
>>>type(x)
<type 'long'>

#Complex
>>>y=2+5j
>>>y
(2+5j)
>>>type(y)
<type 'complex'>
>>>y.real
2.0
>>>y.imag
5.0

#Addition
>>>c=a+b
>>>c
7.5
>>>type(c)
<type 'float'>

#Subtraction
>>>d=a-b
>>>d
2.5
>>>type(d)
<type 'float'>

#Multiplication
>>>e=a*b
>>>e
12.5
>>>type(e)
<type 'float'>

#Division
>>>f=b/a
>>>f
0.5
>>>type(f)
<type 'float'>

#Power
>>>g=a**2
>>>g
25


STRINGS IN PYTHON

A string is simply a list of characters in order. There are no limits to the number of characters you can
have in a string.

#Create string
>>>s="Hello World!"
>>>type(s)
<type 'str'>

#String concatenation
>>>t="This is sample program."
>>>r = s+t
>>>r
'Hello World!This is sample program.'

#Get length of string
>>>len(s)
12

#Convert string to integer
>>>x="100"
>>>type(x)
<type 'str'>
>>>y=int(x)
>>>y
100

#Print string
>>>print s
Hello World!

#Formatting output
>>>print "The string (%s) has %d characters" % (s, len(s))
The string (Hello World!) has 12 characters

#Convert to upper/lower case
>>>s.upper()
'HELLO WORLD!'
>>>s.lower()
'hello world!'

#Accessing sub-strings
>>>s[0]
'H'
>>>s[6:]
'World!'
>>>s[6:-1]
'World'

#strip: Returns a copy of the string with the
#leading and trailing characters removed.
>>>s.strip("!")
'Hello World'


LISTS IN PYTHON

A list is a compound data type used to group together other values. List items need not all have the same
type. A list contains items separated by commas and enclosed within square brackets.

#Create List
>>>fruits=['apple','orange','banana','mango']
>>>type(fruits)
<type 'list'>

#Get Length of List
>>>len(fruits)
4

#Access List Elements
>>>fruits[1]
'orange'
>>>fruits[1:3]
['orange', 'banana']
>>>fruits[1:]
['orange', 'banana', 'mango']

#Appending an item to a list
>>>fruits.append('pear')
>>>fruits
['apple', 'orange', 'banana', 'mango', 'pear']

#Removing an item from a list
>>>fruits.remove('mango')
>>>fruits
['apple', 'orange', 'banana', 'pear']

#Inserting an item to a list
>>>fruits.insert(1,'mango')
>>>fruits
['apple', 'mango', 'orange', 'banana', 'pear']

#Combining lists
>>>vegetables=['potato','carrot','onion','beans','radish']
>>>vegetables
['potato', 'carrot', 'onion', 'beans', 'radish']
>>>eatables=fruits+vegetables
>>>eatables
['apple', 'mango', 'orange', 'banana', 'pear', 'potato', 'carrot', 'onion', 'beans', 'radish']

#Mixed data types in a list
>>>mixed=['data',5,100.1,8287398L]
>>>type(mixed)
<type 'list'>
>>>type(mixed[0])
<type 'str'>
>>>type(mixed[1])
<type 'int'>
>>>type(mixed[2])
<type 'float'>
>>>type(mixed[3])
<type 'long'>

#Change individual elements of a list
>>>mixed[0]=mixed[0]+" items"
>>>mixed[1]=mixed[1]+1
>>>mixed[2]=mixed[2]+0.05
>>>mixed
['data items', 6, 100.14999999999999, 8287398L]

#Lists can be nested
>>>nested=[fruits,vegetables]
>>>nested
[['apple', 'mango', 'orange', 'banana', 'pear'], ['potato', 'carrot', 'onion', 'beans', 'radish']]


TUPLES IN PYTHON

A tuple is a sequence data type that is similar to the list. A tuple consists of a number of values separated
by commas and enclosed within parentheses. Unlike lists, the elements of tuples cannot be changed, so
tuples can be thought of as read-only lists.

#Create a Tuple
>>>fruits=("apple","mango","banana","pineapple")
>>>fruits
('apple', 'mango', 'banana', 'pineapple')
>>>type(fruits)
<type 'tuple'>

#Get length of tuple
>>>len(fruits)
4

#Get an element from a tuple
>>>fruits[0]
'apple'
>>>fruits[:2]
('apple', 'mango')

#Combining tuples
>>>vegetables=('potato','carrot','onion','radish')
>>>eatables=fruits+vegetables
>>>eatables
('apple', 'mango', 'banana', 'pineapple', 'potato', 'carrot', 'onion', 'radish')


DICTIONARIES

Dictionary is a mapping data type or a kind of hash table that maps keys to values. Keys in a dictionary
can be of any data type, though numbers and strings are commonly used for keys. Values in a dictionary
can be any data type or object.

#Create a dictionary
>>>student={'name':'Mary','id':'8776','major':'CS'}
>>>student
{'major': 'CS', 'name': 'Mary', 'id': '8776'}
>>>type(student)
<type 'dict'>

#Get length of a dictionary
>>>len(student)
3

#Get the value of a key in dictionary
>>>student['name']
'Mary'

#Add new key-value pair
>>>student['gender']='female'
>>>student
{'gender': 'female', 'major': 'CS', 'name': 'Mary', 'id': '8776'}

#Get all items in a dictionary
>>>student.items()
[('gender', 'female'), ('major', 'CS'), ('name', 'Mary'), ('id', '8776')]

#Get all keys in a dictionary
>>>student.keys()
['gender', 'major', 'name', 'id']

#Get all values in a dictionary
>>>student.values()
['female', 'CS', 'Mary', '8776']

#A value in a dictionary can be another dictionary
>>>student1={'name':'David','id':'9876','major':'ECE'}
>>>students={'1': student,'2':student1}
>>>students
{'1': {'gender': 'female', 'major': 'CS', 'name': 'Mary', 'id': '8776'}, '2': {'major': 'ECE', 'name': 'David', 'id': '9876'}}

#Check if dictionary has a key
>>>student.has_key('name')
True
>>>student.has_key('grade')
False


TYPE CONVERSIONS IN PYTHON

Type Conversion Examples:

#Convert to string
>>>a=10000
>>>str(a)
'10000'

#Convert to int
>>>b="2013"
>>>int(b)
2013

#Convert to float
>>>float(b)
2013.0

#Convert to long
>>>long(b)
2013L

#Convert to list
>>>s="aeiou"
>>>list(s)
['a', 'e', 'i', 'o', 'u']

#Convert to set
>>>x=['mango','apple','banana','mango','banana']
>>>set(x)
set(['mango', 'apple', 'banana'])


CONTROL FLOW – IF STATEMENT

The if statement in Python is similar to the if statement in other languages.

>>>a = 25**5
>>>if a>10000:
    print "More"
else:
    print "Less"
More

>>>if a>10000:
    if a<1000000:
        print "Between 10k and 100k"
    else:
        print "More than 100k"
elif a==10000:
    print "Equal to 10k"
else:
    print "Less than 10k"
More than 100k

>>>s="Hello World"
>>>if "World" in s:
    s=s+"!"
    print s
Hello World!

>>>student={'name':'Mary','id':'8776'}
>>>if not student.has_key('major'):
    student['major']='CS'
>>>student
{'major': 'CS', 'name': 'Mary', 'id': '8776'}


CONTROL FLOW – FOR STATEMENT

The for statement in Python iterates over items of any sequence (list, string, etc.) in the order in which
they appear in the sequence.
This behavior is different from the for statement in other languages such as C in which an initialization,
incrementing and stopping criteria are provided.

#Looping over characters in a string
helloString = "Hello World"
for c in helloString:
    print c

#Looping over items in a list
fruits=['apple','orange','banana','mango']
i=0
for item in fruits:
    print "Fruit-%d: %s" % (i,item)
    i=i+1


CONTROL FLOW – WHILE STATEMENT

The while statement in Python executes the statements within the while loop as long as the while
condition is true.

#Prints even numbers upto 100
>>>i = 0
>>>while i<=100:
    if i%2 == 0:
        print i
    i = i+1


CONTROL FLOW – RANGE STATEMENT

The range statement in Python generates a list of numbers in arithmetic progression.

#Generate a list of numbers from 0 - 9
>>>range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

#Generate a list of numbers from 10 - 100 with increments of 10
>>>range(10,110,10)
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]


CONTROL FLOW – BREAK/ CONTINUE STATEMENTS

The break and continue statements in Python are similar to the statements in C.

break Statement
Break statement breaks out of the for/while loop

#Break statement example
>>>y=1
>>>for x in range(4,256,4):
    y=y*x
    if y > 512:
        break
    print y
4
32
384

continue Statement
Continue statement continues with the next iteration

#Continue statement example
>>>fruits=['apple','orange','banana','mango']
>>>for item in fruits:
    if item == "banana":
        continue
    else:
        print item
apple
orange
mango


CONTROL FLOW – PASS STATEMENT

The pass statement in Python is a null operation.


The pass statement is used when a statement is required syntactically but you do not want any command
or code to execute.

>>>fruits=['apple','orange','banana','mango']
>>>for item in fruits:
    if item == "banana":
        pass
    else:
        print item
apple
orange
mango


FUNCTIONS

A function is a block of code that takes information in (in the form of parameters), does some
computation, and returns a new piece of information based on the parameter information. A function in
Python is a block of code that begins with the keyword def followed by the function name and
parentheses. The function parameters are enclosed within the parentheses.
The code block within a function begins after a colon that comes after the parentheses enclosing the
parameters. The first statement of the function body can optionally be a documentation string or
docstring.

students = { '1': {'name': 'Bob', 'grade': 2.5},
             '2': {'name': 'Mary', 'grade': 3.5},
             '3': {'name': 'David', 'grade': 4.2},
             '4': {'name': 'John', 'grade': 4.1},
             '5': {'name': 'Alex', 'grade': 3.8}}

def averageGrade(students):
    "This function computes the average grade"
    sum = 0.0
    for key in students:
        sum = sum + students[key]['grade']
    average = sum/len(students)
    return average

avg = averageGrade(students)
print "The average grade is: %0.2f" % (avg)

FUNCTIONS – DEFAULT ARGUMENTS

Functions can have default values of the parameters. If a function with default values is called with
fewer parameters or without any parameter, the default values of the parameters are used.

>>>def displayFruits(fruits=['apple','orange']):
    print "There are %d fruits in the list" % (len(fruits))
    for item in fruits:
        print item

#Using default arguments
>>>displayFruits()
There are 2 fruits in the list
apple
orange

#Passing a list explicitly
>>>fruits = ['banana', 'pear', 'mango']
>>>displayFruits(fruits)
There are 3 fruits in the list
banana
pear
mango

FUNCTIONS – PASSING BY REFERENCE


All parameters in Python functions are passed by reference. If a parameter is changed within a
function, the change is also reflected back in the calling function.

>>>def displayFruits(fruits):
    print "There are %d fruits in the list" % (len(fruits))
    for item in fruits:
        print item
    print "Adding one more fruit"
    fruits.append('mango')

>>>fruits = ['banana', 'pear', 'apple']
>>>displayFruits(fruits)
There are 3 fruits in the list
banana
pear
apple
Adding one more fruit

#The list in the calling context now has one more item
>>>print "There are %d fruits in the list" % (len(fruits))
There are 4 fruits in the list

FUNCTIONS – KEY ARGUMENTS


Functions can also be called using keyword arguments that identify the arguments by the parameter
name when the function is called.

>>>def printStudentRecords(name,age=20,major='CS'):
    print "Name: " + name
    print "Age: " + str(age)
    print "Major: " + major

#This will give an error as name is a required argument
>>>printStudentRecords()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: printStudentRecords() takes at least 1 argument (0 given)

#Correct use
>>>printStudentRecords(name='Alex')
Name: Alex
Age: 20
Major: CS

>>>printStudentRecords(name='Bob',age=22,major='ECE')
Name: Bob
Age: 22
Major: ECE

>>>printStudentRecords(name='Alan',major='ECE')
Name: Alan
Age: 20
Major: ECE


#name is a formal argument.
#**kwargs is a keyword argument that receives all
#arguments except the formal argument as a dictionary.
>>>def student(name, **kwargs):
    print "Student Name: " + name
    for key in kwargs:
        print key + ': ' + kwargs[key]

>>>student(name='Bob', age='20', major='CS')
Student Name: Bob
age: 20
major: CS

FUNCTIONS – VARIABLE LENGTH ARGUMENTS


Python functions can have variable length arguments. The variable length arguments are passed as a
tuple to the function, with the argument prefixed with an asterisk (*).

>>>def student(name, *varargs):
    print "Student Name: " + name
    for item in varargs:
        print item

>>>student('Nav')
Student Name: Nav

>>>student('Amy', 'Age: 24')
Student Name: Amy
Age: 24

>>>student('Bob', 'Age: 20', 'Major: CS')
Student Name: Bob
Age: 20
Major: CS


MODULES

Python allows organizing the program code into different modules which improves the code readability
and management.
A module is a Python file that defines some functionality in the form of functions or classes. Modules
can be imported using the import keyword. Modules to be imported must be present in the search path.

#student module - saved as student.py
def averageGrade(students):
    sum = 0.0
    for key in students:
        sum = sum + students[key]['grade']
    average = sum/len(students)
    return average

def printRecords(students):
    print "There are %d students" % (len(students))
    i=1
    for key in students:
        print "Student-%d: " % (i)
        print "Name: " + students[key]['name']
        print "Grade: " + str(students[key]['grade'])
        i = i+1

#Using student module
>>>import student
>>>students = {'1': {'name': 'Bob', 'grade': 2.5},
               '2': {'name': 'Mary', 'grade': 3.5},
               '3': {'name': 'David', 'grade': 4.2},
               '4': {'name': 'John', 'grade': 4.1},
               '5': {'name': 'Alex', 'grade': 3.8}}
>>>student.printRecords(students)
There are 5 students
Student-1:
Name: Bob
Grade: 2.5
Student-2:
Name: David
Grade: 4.2
Student-3:
Name: Mary
Grade: 3.5
Student-4:
Name: Alex
Grade: 3.8
Student-5:
Name: John
Grade: 4.1

>>>avg = student.averageGrade(students)
>>>print "The average grade is: %0.2f" % (avg)
3.62

# Importing a specific function from a module
>>>from student import averageGrade

# Listing all names defined in a module
>>>dir(student)


PACKAGES

A Python package is a hierarchical file structure that consists of modules and subpackages. Packages
allow better organization of modules related to a single application environment.

# skimage package listing
skimage/                  Top level package
    __init__.py           Treat directory as a package
    color/                color subpackage
        __init__.py
        colorconv.py
        colorlabel.py
        rgb_colors.py
    draw/                 draw subpackage
        __init__.py
        draw.py
        setup.py
    exposure/             exposure subpackage
        __init__.py
        _adapthist.py
        exposure.py
    feature/              feature subpackage
        __init__.py
        _brief.py
        _daisy.py
        ...
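
Assuming a package laid out as above is on the Python search path, it can be imported in the usual ways, for example:

import skimage.color                  # import a subpackage
from skimage import draw              # bind the subpackage name directly
from skimage.color import colorconv   # import a specific module from a subpackage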


FILE HANDLING

Python allows reading and writing to files using the file object. The open(filename, mode) function is
used to get a file object. The mode can be read (r), write (w), append (a), read and write (r+ or w+),
read-binary (rb), write-binary (wb), etc.
After the file contents have been read, the close function is called, which closes the file object.
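
A minimal sketch of the calls described above (sample.txt is an illustrative file name):

# Write, read and append using the file object returned by open()
f = open("sample.txt", "w")            # write mode creates/truncates the file
f.write("Hello World!\n")
f.write("This is a sample file.\n")
f.close()

f = open("sample.txt", "r")            # read mode
contents = f.read()                    # read the entire file into a string
f.close()
print(contents)

f = open("sample.txt", "a")            # append mode adds to the end of the file
f.write("One more line.\n")
f.close()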


DATE/ TIME OPERATORS

Python provides several functions for date and time access and conversions. The datetime module allows
manipulating date and time in several ways. The time module in Python provides various time-related
functions.
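
A few representative calls from the datetime and time modules are sketched below (the output varies with the current date and time):

import datetime
import time

now = datetime.datetime.now()                  # current local date and time
print(now.strftime("%Y-%m-%d %H:%M:%S"))       # format as a string

today = datetime.date.today()
print("%d-%d-%d" % (today.year, today.month, today.day))

delta = datetime.timedelta(days=7)
print(today + delta)                           # date arithmetic: one week ahead

print(time.time())                             # seconds since the epoch
time.sleep(1)                                  # pause execution for one second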


CLASSES

Python is an Object-Oriented Programming (OOP) language. Python provides all the standard features
of Object-Oriented Programming such as classes, class variables, class methods, inheritance, function
overloading, and operator overloading.
Class: A class is simply a representation of a type of object and user-defined prototype for an object that
is composed of three things: a name, attributes, and operations/methods.
Instance/Object: Object is an instance of the data structure defined by a class.
Inheritance: Inheritance is the process of forming a new class from an existing class or base class.
Function overloading: Function overloading is a form of polymorphism that allows a function to have
different meanings, depending on its context.
Operator overloading: Operator overloading is a form of polymorphism that allows assignment of more
than one function to a particular operator.
Function overriding: Function overriding allows a child class to provide a specific implementation of a
function that is already provided by the base class. Child class implementation of the overridden function
has the same name, parameters and return type as the function in the base class.

Class Example

The variable studentCount is a class variable that is shared by all instances of the class Student and is
accessed by Student.studentCount. The variables name, id and grades are instance variables which are
specific to each instance of the class.
There is a special method by the name __init__() which is the class constructor. The class constructor
initializes a new instance when it is created. The function __del__() is the class destructor.


# Example of a class
class Student:
    studentCount = 0

    def __init__(self, name, id):
        print "Constructor called"
        self.name = name
        self.id = id
        Student.studentCount = Student.studentCount + 1
        self.grades={}

    def __del__(self):
        print "Destructor called"

    def getStudentCount(self):
        return Student.studentCount

    def addGrade(self,key,value):
        self.grades[key]=value

    def getGrade(self,key):
        return self.grades[key]

    def printGrades(self):
        for key in self.grades:
            print key + ": " + self.grades[key]

>>>s = Student('Steve','98928')
Constructor called
>>>s.addGrade('Math','90')
>>>s.addGrade('Physics','85')
>>>s.printGrades()
Physics: 85
Math: 90
>>>mathgrade = s.getGrade('Math')
>>>print mathgrade
90
>>>count = s.getStudentCount()
>>>print count
1
>>>del s
Destructor called


SUMMARY

Hadoop is an open-source software framework that is used for storing and processing large amounts of
data in a distributed computing environment. It is designed to handle big data and is based on the
MapReduce programming model, which allows for the parallel processing of large datasets.

Hadoop has two main components:

HDFS (Hadoop Distributed File System): This is the storage component of Hadoop, which allows for
the storage of large amounts of data across multiple machines. It is designed to work with commodity
hardware, which makes it cost-effective.

YARN (Yet Another Resource Negotiator): This is the resource management component of Hadoop,
which manages the allocation of resources (such as CPU and memory) for processing the data stored in
HDFS.

Hadoop also includes several additional modules that provide additional functionality, such as Hive (a
SQL-like query language), Pig (a high-level platform for creating MapReduce programs), and HBase (a
non-relational, distributed database). Hadoop is commonly used in big data scenarios such as data
warehousing, business intelligence, and machine learning. It’s also used for data processing, data
analysis, and data mining. It enables the distributed processing of large data sets across clusters of
computers using a simple programming model.

When designing applications for the cloud, irrespective of the chosen platform, it is often useful to
consider four specific topics: scalability, availability, manageability and feasibility. The consideration of
these four topics will help you discover areas in your application that require some cloud-specific thought,
especially in the early stages of a project.

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its
high-level built in data structures, combined with dynamic typing and dynamic binding, make it very
attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect
existing components together. Python's simple, easy to learn syntax emphasizes readability and
therefore reduces the cost of program maintenance. Python supports modules and packages, which
encourages program modularity and code reuse. The Python interpreter and the extensive standard
library are available in source or binary form without charge for all major platforms, and can be freely
distributed.


Often, programmers fall in love with Python because of the increased productivity it provides. Since there
is no compilation step, the edit-test-debug cycle is incredibly fast. Debugging Python programs is easy:
a bug or bad input will never cause a segmentation fault. Instead, when the interpreter discovers an error,
it raises an exception. When the program doesn't catch the exception, the interpreter prints a stack trace.
A source level debugger allows inspection of local and global variables, evaluation of arbitrary
expressions, setting breakpoints, stepping through the code a line at a time, and so on. The debugger is
written in Python itself, testifying to Python's introspective power. On the other hand, often the quickest
way to debug a program is to add a few print statements to the source: the fast edit-test-debug cycle
makes this simple approach very effective.

VISION – DEPT. OF AI & DS

"To be a Centre of Excellence in the domain of Artificial Intelligence and Data Science for addressing local and global challenges."

MISSION – DEPT. OF AI & DS

▪ To impart quality education in the areas of existing and AI & DS techniques.
▪ To groom students technologically superior and ethically stronger.
▪ To equip students with interdisciplinary skills to solve real-world problems and make them life-long learners.

Aditya College of Engineering, Madanapalle, under the umbrella of Veda Educational Society, was established in the year 2009 on lofty
and noble ideals to impart excellent technical and value-based education under the able and dynamic leadership of
Sri R. Ramamohan Reddy, Secretary & Correspondent, and Sri. M. Nagamalla Reddy, President, and under the visionary guidance of
Dr. S. Ramalinga Reddy, Director.
The Institution is approved by AICTE and affiliated to Jawaharlal Nehru Technological University Anantapur. The Institution is beautifully
nestled against an array of mountains and lush greenery about 10 km from the heart of Madanapalle. Madanapalle is famous for
agricultural products such as Tomato, Mango, Groundnut, Tamarind, etc. Madanapalle has the biggest tomato market in Asia.
Bharath Ratna Rabindranath Tagore translated the National Anthem from Bengali to English and also set it to music in Madanapalle.
