Questions Data

The document discusses different Python data structures and packages used for data analysis. It covers key differences between lists and tuples, mutable and immutable data types, and packages like NumPy, Pandas, Scikit-learn. Random number generation and lambda functions are also explained.

Uploaded by

Mosheer Khan
Copyright
© All Rights Reserved

PYTHON QUESTION

Q1. What are the different data structures present in Python?


The different data structures in Python are:
• Lists
• Strings
• Tuples
• Sets
• Dictionaries

Q2. Out of these data structures, which are mutable and which are immutable?
Mutable data structures are those whose contents can be changed after they are created,
whereas immutable data structures are those whose contents cannot be altered once they
are created. Among the data structures mentioned, the mutable ones are lists, sets,
and dictionaries. Strings and tuples, on the other hand, are immutable.
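
The distinction above can be checked with a minimal sketch. The variable names are illustrative assumptions; the point is that in-place changes succeed on lists, sets, and dictionaries but raise a TypeError on tuples and strings.

```python
# Mutable objects can be changed in place.
numbers = [1, 2, 3]
numbers[0] = 10          # lists support item assignment
tags = {"a", "b"}
tags.add("c")            # sets can grow
ages = {"Ann": 30}
ages["Bob"] = 25         # dictionaries accept new keys

# Immutable objects raise TypeError on item assignment.
point = (1, 2)
name = "python"
errors = []
for label, obj in (("tuple", point), ("str", name)):
    try:
        obj[0] = "x"
    except TypeError:
        errors.append(label)

print(numbers, sorted(tags), ages)
print(errors)
```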

Q3. What is the difference between a list and a tuple?


The main differences between lists and tuples are:
1. Lists are mutable whereas tuples are immutable.
2. Tuples are written using round brackets (parentheses), i.e. (), whereas square brackets, i.e. [],
are used in the case of lists.
3. Tuples can be used as keys for dictionaries (provided all their elements are hashable) whereas
lists cannot, i.e. the following declaration of a dictionary is valid since it is using tuples as keys,
d = {(1,1): 'Sumit', (1,2): 'Shruti'}
whereas the following declaration is invalid and raises a TypeError, because lists are unhashable:
d = {[1,1]: 'Sumit', [1,2]: 'Shruti'}
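
As a small sketch of point 3, the snippet below builds the valid dictionary and catches the error raised by the invalid one. The variable names (coords, lookup, message) are illustrative assumptions.

```python
# Tuples are hashable, so they work as dictionary keys.
coords = {(1, 1): "Sumit", (1, 2): "Shruti"}
lookup = coords[(1, 2)]

# Lists are unhashable, so using one as a key raises TypeError.
try:
    bad = {[1, 1]: "Sumit"}
    message = ""
except TypeError as exc:
    message = str(exc)

print(lookup)
print(message)
```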

Q4. What are dictionaries in Python? How do you access the keys in a dictionary?
Dictionaries are a data structure present in Python which store key-value pairs. The idea is
that essentially any value in the dictionary can be accessed using its corresponding key.
Let's look at the following example:
d = {'Name': 'Kaustubh', 'Age': 25, 'Blood Group': 'B+'}
Now, using d['Name'] will return 'Kaustubh', d['Age'] will return 25, and d['Blood
Group'] will return 'B+'.
The keys, on the other hand, can be accessed using the '.keys()' method. In the dictionary
above, d.keys() returns a view of the keys, dict_keys(['Name', 'Age', 'Blood Group']) in
Python 3, which can be converted to a list using list(d.keys()).
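
The accesses described above can be tried with the following sketch. The 'City' key and the default value 'unknown' are hypothetical additions, used only to show d.get() on a missing key.

```python
d = {"Name": "Kaustubh", "Age": 25, "Blood Group": "B+"}

name = d["Name"]                    # value lookup by key
keys = list(d.keys())               # keys view converted to a list
values = list(d.values())           # the matching values
missing = d.get("City", "unknown")  # get() avoids a KeyError for absent keys
pairs = [f"{k}={v}" for k, v in d.items()]

print(name, keys, values, missing, pairs)
```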

Q5. What are comprehensions in Python?


List/dictionary comprehensions provide a concise way to create lists/dictionaries. Common
applications are to make new lists where each element is the result of some operation
applied to each member of another sequence or iterable, or to create a subsequence of
those elements that satisfy a certain condition. Let's take a simple example. Suppose you
want to create a list containing the squares of the first 'n' natural numbers. Ordinary code
for this would look like the following:
l = []
for i in range(1, n+1):
    l.append(i**2)
But it can be done much more concisely by using a list comprehension:
l = [x**2 for x in range(1, n+1)]
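
The two other uses mentioned above, filtering by a condition and building a dictionary, could look like the sketch below. The value n = 5 and the variable names are assumptions for illustration.

```python
n = 5

# Plain list comprehension: squares of 1..n.
squares = [x**2 for x in range(1, n + 1)]

# With a condition: keep only the squares of even numbers.
even_squares = [x**2 for x in range(1, n + 1) if x % 2 == 0]

# Dictionary comprehension: map each number to its square.
square_of = {x: x**2 for x in range(1, n + 1)}

print(squares, even_squares, square_of)
```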

Q6. What is negative indexing in Python and how is it used?


Indexing in Python can be done using both positive and negative integers. In negative
indexing, the indexing starts from -1, which represents the last element of a sequence such
as a list, tuple or string. For example, if you have a list l = [1, 2, 3, 4, 5], l[-1] will return
the last element, i.e. 5, l[-2] will return the second last element, i.e. 4, and so on.
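
A short sketch of negative indexing on the same list follows. The slicing lines go slightly beyond the text above and are included only to show that negative indices also work in slices.

```python
l = [1, 2, 3, 4, 5]

last = l[-1]                      # last element
second_last = l[-2]               # second last element
same = l[-1] == l[len(l) - 1]     # -1 is equivalent to len(l) - 1

last_two = l[-2:]                 # slice of the final two elements
reversed_copy = l[::-1]           # negative step walks the list backwards

print(last, second_last, same, last_two, reversed_copy)
```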

Q7. How can you generate random numbers in Python?


Python has a module named 'random' which can be used to generate pseudo-random numbers. It
can be imported using the following code:
import random
After you have imported the random module, you can use random.random() to generate a
random floating point number in the range [0.0, 1.0). Apart from this, there are also a few
other functions available to customise your method of random number generation. These are:
1. random.choice(seq): You can pass any non-empty sequence (such as a list, tuple or string)
and it will select one of the elements from it at random. For example, if you have a list
l = [3, 4, 1, 9, 6] and you use random.choice(l), it will randomly select any one element from the list.
2. random.randrange(a, b): This function helps you generate a random integer from the
range [a, b), i.e. including a but excluding b.
3. random.uniform(a, b): This function is similar to the above, the difference being that
it returns a floating point number between a and b instead of an integer.
4. random.normalvariate(μ, σ): This function helps you generate a random number from a normal
distribution whose mean is μ and standard deviation is σ.
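
The functions listed above can be exercised as in this sketch. The seed value and the argument ranges are assumptions; the seed is only there to make repeated runs give the same numbers.

```python
import random

random.seed(42)  # assumption: fixed seed for repeatable output

u = random.random()                     # float in [0.0, 1.0)
pick = random.choice([3, 4, 1, 9, 6])   # one element of the list
k = random.randrange(1, 10)             # integer in [1, 10)
x = random.uniform(1, 10)               # float between 1 and 10
g = random.normalvariate(0, 1)          # normal distribution, mean 0, std dev 1

print(u, pick, k, x, g)
```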

Q8. When will you use a list vs tuple vs set?


A list is the most common type of data structure which can store a sequence of elements.
Also, lists are mutable by nature. So a list is a good choice when you
want to store a series of elements which you can change later.
Tuples are similar to lists, the main difference being that tuples are immutable; they also
typically use less memory than lists. Tuples can be used when you want a sequence of
constants which cannot be altered.
Sets store a collection of unique elements in no particular order. Sets are
a good option when you want to store many elements while avoiding duplication.
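
A rough sketch of the three choices is given below. The sample data is made up, and the memory comparison relies on sys.getsizeof as reported by CPython, so the exact sizes are an implementation detail.

```python
import sys

# List: an ordered series that is expected to change.
readings = [3, 1, 3, 2, 1]
readings.append(5)

# Tuple: a fixed sequence of constants.
weekdays = ("Mon", "Tue", "Wed")

# Set: duplicates are dropped automatically.
unique_readings = set(readings)

# In CPython a tuple is typically smaller than a list with the same items.
tuple_is_smaller = sys.getsizeof((1, 2, 3)) < sys.getsizeof([1, 2, 3])

print(readings, weekdays, sorted(unique_readings), tuple_is_smaller)
```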

Q9. What are lambda functions in Python? How are they used?
Lambda functions are single-expression anonymous functions which can be created on the
go. Lambda functions can have any number of arguments like a normal function and
provide a very concise way to write short functions. The syntax of a lambda
function is very simple. For example, if you want to write a lambda function that finds the
maximum of two integers, it can be simply written as:
f = lambda x, y: x if x > y else y
f(9, 5)
The call above will simply return 9. If you had written a normal function, on
the other hand, you would have needed a separate def statement with an explicit return.
Another feature of a lambda function is that it doesn't require a 'return' statement: the
value of its single expression is returned automatically.
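
For comparison, here is the same lambda next to an equivalent def, plus a typical use as a sort key. The function name larger and the people list are illustrative assumptions.

```python
# The lambda from the text.
f = lambda x, y: x if x > y else y

# The equivalent named function needs def and return.
def larger(x, y):
    return x if x > y else y

# Lambdas are often passed inline, e.g. as a sort key.
people = [("Sumit", 31), ("Shruti", 27), ("Kaustubh", 25)]
by_age = sorted(people, key=lambda person: person[1])

print(f(9, 5), larger(9, 5), by_age)
```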

Q10. What are global and local variables?


If a variable is assigned a value within some function, it is said to be a local variable whereas
if a variable is assigned outside the scope of any function, i.e. in the main program, it is said
to be a global variable. For example, in the following code:
x = 5
def my_func():
    x = 4
    print(x)
print(x)
If the function my_func() is called, the value of x will be printed as 4 since it is defined
within the function, but when the print statement which is present outside of the function
is executed, the value will be printed as 5 since x is defined as 5 in the main program. The
value x = 4 is limited to the function my_func(), whereas x = 5 is valid throughout.
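
The behaviour described above can be confirmed with a variant of the same code that returns the local value instead of printing it. The names inside and outside are assumptions added for the check.

```python
x = 5  # global variable

def my_func():
    x = 4      # local variable; it does not touch the global x
    return x

inside = my_func()   # value seen inside the function
outside = x          # the global value is unchanged

print(inside, outside)
```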

Q1. What are the different packages that are necessary for data analysis in Python?
The following are the most important packages that are used for data analysis in Python:
• Numerical/Mathematical Packages:
1. NumPy (or, Numerical Python)
2. SciPy

• Dataframe Manipulation:
1. Pandas

• Data Visualisation:
1. Matplotlib
2. Seaborn

• Model Building:
1. scikit-learn
2. Statsmodels

Q2. What is the difference between NumPy and SciPy packages?


In an ideal world, NumPy would contain nothing but the array data type and the most basic
operations: indexing, sorting, reshaping, basic elementwise functions, et cetera. All
numerical code would reside in SciPy. However, one of NumPy’s important goals is
compatibility, so NumPy tries to retain all features supported by either of its predecessors.
Thus NumPy contains some linear algebra functions, even though these more properly
belong in SciPy. In any case, SciPy contains more fully-featured versions of the linear
algebra modules, as well as many other numerical algorithms. If you are doing scientific
computing with Python, you should probably install both NumPy and SciPy. Most new
features belong in SciPy rather than NumPy.

Q3. What is the advantage of NumPy arrays over lists?


While lists are a flexible data structure which allows storing values of different
types and supports mutation, insertion, deletion, etc., they do have some disadvantages
compared to the NumPy array.
1. Lists do not support vectorised operations like NumPy arrays. With NumPy arrays,
operators such as + and * are applied element by element, whereas on lists + concatenates
and * repeats. This makes numerical programming using NumPy arrays very convenient.
2. Since NumPy's core is implemented in C and its arrays store elements of a single type
in contiguous memory, numerical operations on NumPy arrays are typically significantly
faster than the equivalent loops over Python lists.
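
The difference in operator behaviour can be seen in the following sketch, which assumes NumPy is installed. The sample prices are made up.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]
arr = np.array(prices)

# On a list, * repeats the list; on an array, * multiplies each element.
doubled_list = prices * 2
doubled_arr = arr * 2

# Element-wise addition of two arrays, with no explicit loop.
total = (arr + arr).tolist()

print(doubled_list)
print(doubled_arr.tolist(), total)
```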

Q4. What is the difference between long and wide format?


In wide format, all the features of an instance are provided in a single row, with each
feature in a separate column, whereas in long format each instance is repeated across
several rows, one row per attribute, i.e. each row specifies only one attribute and its
value. The following tables are examples of wide and long
formats:
Wide Format
Name        Age  Blood Group
Siddhartha  23   AB+
Sahil       22   A-

Long Format
Name        Attribute    Value
Siddhartha  Age          23
Siddhartha  Blood Group  AB+
Sahil       Age          22
Sahil       Blood Group  A-
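
One way to convert between the two formats, assuming Pandas is installed, is melt() for wide to long and pivot() for long to wide. This is a sketch on the example data above, not the only possible approach.

```python
import pandas as pd

wide = pd.DataFrame({
    "Name": ["Siddhartha", "Sahil"],
    "Age": [23, 22],
    "Blood Group": ["AB+", "A-"],
})

# Wide to long: one row per (Name, Attribute) pair.
long = wide.melt(id_vars="Name", var_name="Attribute", value_name="Value")

# Long back to wide: attributes become columns again.
back = long.pivot(index="Name", columns="Attribute", values="Value")

print(long)
print(back)
```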

Q5. What is the difference between a Pandas series and a Pandas dataframe?
According to the official Pandas documentation, a dataframe is a two-dimensional,
size-mutable, potentially heterogeneous tabular data structure with labelled axes (rows and
columns). Arithmetic operations align on both row and column labels. It can be thought of
as a dict-like container for Series objects.
A series primarily contains just one column of values, usually of a single data type, with
corresponding indices. So, a series can be thought of as the data structure for a single
column of the Pandas dataframe. Conceptually, a dataframe behaves as a collection of series
sharing one index. If you use type() on any single column of a dataframe, you will see that
the output is pandas.core.series.Series (i.e. pd.Series).
An analogy to understand this is arrays and matrices, or 1D array and 2D arrays. 1D arrays
are like the building blocks of 2D arrays, and 2D arrays cannot be made without the
existence of 1D arrays.
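
The relationship can be checked with a short sketch, assuming Pandas is installed. The sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Siddhartha", "Sahil"], "Age": [23, 22]})

# Selecting one column gives a Series.
ages = df["Age"]
is_series = isinstance(ages, pd.Series)

# A dataframe can be rebuilt from its column series.
rebuilt = pd.DataFrame({"Name": df["Name"], "Age": ages})

print(type(ages))
print(rebuilt.equals(df))
```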

SQL QUESTION

Q1. What are the different types of statements supported by SQL?


SQL statements are commonly grouped into three types (some references add further groups,
such as TCL for transaction control):
1) DDL (Data Definition Language): DDL consists of commands that can be used to create
or modify the database schema of a database system. They can be used to create or edit
the structure of the objects of the database, such as tables, views, indexes, etc.
Some of the DDL commands are listed below:
CREATE: It is used to create database objects; for example, CREATE TABLE creates a table.
ALTER: The ALTER TABLE statement is used to modify an existing table object in the
database.
DROP: The DROP TABLE statement is used to drop an existing table in a database.
2) DML (Data Manipulation Language): These statements are used to manipulate data in
records. The most commonly used DML statements are INSERT, UPDATE, and DELETE.
The SELECT statement is often grouped with DML even though it does not modify any data
(some references classify it separately as DQL, Data Query Language). It is used
to select all or relevant records in the table and display them as the output.
3) DCL (Data Control Language): These statements are used to control privileges, for example
GRANT and REVOKE, which give or withdraw database access permissions for a specific user.
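
The DDL and DML statements above can be tried from Python with the standard-library sqlite3 module, as in this sketch. The students table and its data are hypothetical. DCL statements are left out because SQLite has no GRANT or REVOKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL: define and change the schema.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("ALTER TABLE students ADD COLUMN age INTEGER")

# DML: insert, update and delete rows.
cur.execute("INSERT INTO students (name, age) VALUES ('Sumit', 24)")
cur.execute("INSERT INTO students (name, age) VALUES ('Shruti', 23)")
cur.execute("UPDATE students SET age = 25 WHERE name = 'Sumit'")
cur.execute("DELETE FROM students WHERE name = 'Shruti'")

# SELECT reads the remaining data.
rows = cur.execute("SELECT name, age FROM students").fetchall()

# DDL again: remove the table itself.
cur.execute("DROP TABLE students")
tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
conn.close()

print(rows, tables)
```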

Q2. What do you mean by Data Definition Language? What do you mean by Data
Manipulation Language?
Data Definition Language, or DDL, consists of the commands with which a user can create
or modify a database schema. These can be used to create or edit the structure of the
objects of a database, such as tables, indexes, etc.
Data Manipulation Language, or DML, consists of the commands with which a user can
access or manipulate data in the database. It allows you to perform the following functions:
• Insert data or rows in a database
• Delete data from a database
• Retrieve or fetch data
• Update data in a database

Q3. What is a transaction and what are its controls?


A transaction can be defined as a sequence of tasks performed on a database as a single
logical unit of work to achieve a certain result. Operations such as creating, updating, and
deleting records in a database are carried out within transactions.
In simple words, a transaction refers to a group of SQL statements executed together on
database records.
There are four transaction controls:
• COMMIT: It is used to save all changes made through a transaction.
• ROLLBACK: It is used to roll back a transaction so that all changes made by the
transaction are undone and the database is reverted to its state before the transaction began.
• SET TRANSACTION: It is used to set the characteristics of a transaction, such as its
isolation level or read/write access mode (some systems, e.g. Oracle, also allow naming it).
• SAVEPOINT: It is used to set a point within a transaction to which it can later be rolled back.
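
A minimal sketch of COMMIT, ROLLBACK and SAVEPOINT using the standard-library sqlite3 module is shown below. The accounts table is hypothetical, and isolation_level=None is an assumption made so that the transaction statements can be issued explicitly. SET TRANSACTION is not shown because SQLite does not provide it.

```python
import sqlite3

# isolation_level=None: the module does not open transactions on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")

# First transaction: keep alice, undo bob via a savepoint, then commit.
cur.execute("BEGIN")
cur.execute("INSERT INTO accounts VALUES ('alice', 100)")
cur.execute("SAVEPOINT after_alice")
cur.execute("INSERT INTO accounts VALUES ('bob', 50)")
cur.execute("ROLLBACK TO SAVEPOINT after_alice")  # bob's row is undone
cur.execute("COMMIT")                             # alice's row is saved

# Second transaction: everything is undone by ROLLBACK.
cur.execute("BEGIN")
cur.execute("DELETE FROM accounts")
cur.execute("ROLLBACK")

names = [row[0] for row in cur.execute("SELECT name FROM accounts")]
conn.close()

print(names)
```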

Q4. What are the properties of a transaction?
The properties of a transaction are abbreviated as ACID properties, and they are as follows:
• Atomicity: This ensures that a transaction is treated as a single unit: either all of its
operations are completed successfully or none of them is applied. If any operation fails,
the transaction is aborted at the failure point and the changes it has already made are
rolled back, so the database returns to its state before the transaction started.
• Consistency: This ensures that a successful transaction takes the database from one valid
state to another, i.e. all changes are reflected properly and all defined rules and
constraints still hold.
• Isolation: This ensures that transactions running at the same time are performed
independently and the intermediate changes made by one transaction are not visible to
another. The changes become visible only after they are committed.
• Durability: This ensures that the changes made in the database with committed
transactions persist as they are even after a system failure. Thus, the effects of
committed transactions are never lost.
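
Atomicity in particular can be sketched with sqlite3: a transfer whose second step violates a constraint leaves no trace of its first step. The accounts table, the CHECK constraint and the amounts are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.execute("INSERT INTO accounts VALUES ('bob', 50)")
conn.commit()

# Try to move 500 from alice to bob. "with conn" commits on success
# and rolls back the whole transaction if an exception is raised.
try:
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + 500 WHERE name = 'bob'")
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")
    transferred = True
except sqlite3.IntegrityError:
    transferred = False  # the CHECK constraint rejected alice's new balance

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
conn.close()

print(transferred, balances)
```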

Q5. Define 'view' in SQL. Write the syntax for creating a view in SQL.
A 'view' can be defined as a virtual table, defined by a stored query, that presents rows and
columns with fields from one or more tables.
Syntax:
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;
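
The syntax above can be tried with sqlite3 as in this sketch. The employees table and the it_staff view are hypothetical names. Note that the view reflects rows added to the base table after the view was created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Sales", 50000), ("Meena", "IT", 90000)],
)

# The view stores the query, not a copy of the data.
conn.execute(
    "CREATE VIEW it_staff AS "
    "SELECT name, salary FROM employees WHERE dept = 'IT'"
)
rows_before = conn.execute("SELECT name FROM it_staff ORDER BY name").fetchall()

# New rows in the base table show up through the view.
conn.execute("INSERT INTO employees VALUES ('Karan', 'IT', 60000)")
rows_after = conn.execute("SELECT name FROM it_staff ORDER BY name").fetchall()
conn.close()

print(rows_before, rows_after)
```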

Q6. What is an index?


A database index is a data structure that speeds up data retrieval operations on a database
table, at the cost of additional storage space for the index (which holds a copy of the
indexed column values) and extra work whenever the table is written to.
The table data can be stored on disk in only one order. So, to support faster access
by other values, a sorted structure that allows efficient searching (such as binary search
or a B-tree lookup) is needed. For this purpose, indexes are created on tables. These
indexes need extra space on disk, but they allow faster search according to different values.
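
The effect of an index can be observed in SQLite through EXPLAIN QUERY PLAN, as sketched below. The users table, the generated email addresses and the index name idx_users_email are assumptions; the exact wording of the plan text varies between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (email, city) VALUES (?, ?)",
    [(f"user{i}@example.com", "Pune") for i in range(1000)],
)

query = "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'user500@example.com'"

# Without an index the whole table is scanned.
plan_before = conn.execute(query).fetchall()[0][3]

# With an index on email the lookup goes through the index.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = conn.execute(query).fetchall()[0][3]
conn.close()

print(plan_before)
print(plan_after)
```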

Q7. What is a clustered index? What is a non-clustered index? What is the difference
between clustered and non-clustered indexes?
Clustered indexes sort and store the data rows in the table or view based on their key
values. These are the columns included in the index definition. There can be only one
clustered index per table because the data rows can be stored in only one order.
In a non-clustered index, there is a second list that has pointers to the physical rows. You
can have many non-clustered indexes, although each new index will increase the time it
takes to write new records.
The differences between these two types of index are as follows:
• One table can have only one clustered index but multiple non-clustered indexes.
• Clustered indexes can typically be read more rapidly than non-clustered indexes, because
no extra lookup from the index to the data row is needed.
• Clustered indexes store data physically in the table or view, but non-clustered
indexes don't store data in the table as they have a separate structure from data
rows.

Q8. What is the difference between DELETE and TRUNCATE? What is the difference
between DROP and TRUNCATE?
• The basic difference between DELETE and TRUNCATE is that DELETE is a DML
command, while TRUNCATE is a DDL command.
• DELETE is used to delete specific rows from the table, whereas TRUNCATE removes
all the rows from the table.
• DELETE can be used with the WHERE clause, which cannot be used with TRUNCATE.
TRUNCATE removes all rows from the table but keeps the table itself, while DROP removes
the entire table, including its structure, from the database.

Q9. What do you mean by subquery? How many row comparison operators are used
while working with a subquery?
A query within another query is called a subquery. A subquery is also called an inner query,
which returns the output to be used by another query.
There are three multiple-row comparison operators used with subqueries, namely IN, ANY, and ALL.

Q10. What is the difference between a nested subquery and a correlated subquery?
A subquery within another subquery is called a nested subquery. If the output of a
subquery depends on the column values of the parent query's current row, the subquery is
called a correlated subquery.
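
The sketch below runs a subquery with IN and a correlated subquery through sqlite3. The employees and active_depts tables and their data are hypothetical. ANY and ALL are not shown because SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employees VALUES ('Asha', 'Sales', 50000);
INSERT INTO employees VALUES ('Ravi', 'Sales', 70000);
INSERT INTO employees VALUES ('Meena', 'IT', 90000);
INSERT INTO employees VALUES ('Karan', 'IT', 60000);
CREATE TABLE active_depts (dept TEXT);
INSERT INTO active_depts VALUES ('IT');
""")

# Subquery with IN: the inner query runs independently of the outer one.
in_rows = conn.execute(
    "SELECT name FROM employees "
    "WHERE dept IN (SELECT dept FROM active_depts) ORDER BY name"
).fetchall()

# Correlated subquery: the inner query refers to the outer row (e.dept).
above_dept_avg = conn.execute(
    "SELECT name FROM employees AS e "
    "WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept = e.dept) "
    "ORDER BY name"
).fetchall()
conn.close()

print(in_rows, above_dept_avg)
```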

Q11. What is the difference between SQL and MySQL?


SQL, or Structured Query Language, is a standard language that is used for manipulating
and accessing relational databases. MySQL, on the other hand, is a relational database
management system (RDBMS) that uses SQL as its database language.

Q12. How many aggregate functions are available in SQL?


An SQL aggregate function calculates a single value from the values of a column across
multiple rows of a table.
The following seven aggregate functions are commonly listed:
• AVG(): This returns the average value from a specified column.
• COUNT(): This returns the number of rows that match the query.
• MAX(): This returns the largest value among the records.
• MIN(): This returns the smallest value among the records.
• SUM(): This returns the sum of specified column values.
• FIRST(): This returns the first value. It is not available in every database system (it is
supported in MS Access, for example).
• LAST(): This returns the last value. Like FIRST(), it is not available in every database system.
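
The first five functions can be tried with sqlite3 as sketched below. The scores table and its values are hypothetical. FIRST() and LAST() are left out because SQLite does not provide them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Asha", 70), ("Ravi", 80), ("Meena", 90)],
)

# Each aggregate collapses the three rows into one value.
avg_marks, n_rows, max_marks, min_marks, total = conn.execute(
    "SELECT AVG(marks), COUNT(*), MAX(marks), MIN(marks), SUM(marks) FROM scores"
).fetchone()
conn.close()

print(avg_marks, n_rows, max_marks, min_marks, total)
```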

Q13. What are scalar functions in SQL?


A scalar function is used to return a single value based on the input values.
The following are some commonly listed scalar functions (the exact names vary between
database systems; others use UPPER(), LOWER(), SUBSTRING() and LENGTH(), for example):
• UCASE(): This converts the specified field to upper case.
• LCASE(): This converts the specified field to lower case.
• MID(): This extracts and returns characters from the text field.
• FORMAT(): This specifies the display format.
• LEN(): This returns the length of the text field.
• ROUND(): This rounds a numeric field to the specified number of decimal places.
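
A sketch using sqlite3 follows. SQLite names these functions UPPER(), LOWER(), SUBSTR() and LENGTH() rather than UCASE(), LCASE(), MID() and LEN(), so the SQLite names are used here; the input literals are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each scalar function returns one value per input value.
upper, lower, part, length, rounded = conn.execute(
    "SELECT UPPER('sql'), LOWER('SQL'), SUBSTR('database', 1, 4), "
    "LENGTH('database'), ROUND(3.14159, 2)"
).fetchone()
conn.close()

print(upper, lower, part, length, rounded)
```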

Q14. What is the difference between CHAR and VARCHAR2 data type in SQL?
Both these data types are used for characters, but VARCHAR2 (an Oracle data type) is used
for character strings of variable length, whereas CHAR is used for character strings of a
fixed length. For example, if we specify the type as CHAR(5), every value is stored with
exactly 5 characters: shorter strings are padded with trailing spaces, and longer strings
are rejected. However, if we specify the type as VARCHAR2(5), we can store strings of any
length up to 5 characters, and they are stored without padding.

Q15. What is UNIQUE constraint in MySQL? What is the difference between a PRIMARY
KEY and UNIQUE constraints?
A UNIQUE constraint is applied to a single field or a combination of fields and ensures that
no two records have the same value(s) in them. Some of the fields can contain null values as
long as the combination of values is unique.
The differences between PRIMARY KEY and UNIQUE constraints are:
1. A PRIMARY KEY cannot have NULL values, while UNIQUE constraints can have NULL
values.
2. There is only one PRIMARY KEY in a table, but there can be multiple UNIQUE
constraints. By this, we mean that in a table, there can be only one PRIMARY KEY, but
the UNIQUE constraint can be put on as many columns as we want.
3. The PRIMARY KEY typically becomes the clustered index automatically (as in MySQL's
InnoDB storage engine), whereas a UNIQUE constraint normally creates a non-clustered index.
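
Points 1 and 2 can be sketched with sqlite3, used here as a stand-in for MySQL. The users table is hypothetical. Point 3 is not shown, since index clustering is specific to the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One PRIMARY KEY, but two separate UNIQUE constraints.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, phone TEXT UNIQUE)"
)

# Two rows with NULL phone are accepted by the UNIQUE constraint.
conn.execute("INSERT INTO users VALUES (1, 'a@example.com', NULL)")
conn.execute("INSERT INTO users VALUES (2, 'b@example.com', NULL)")

rejected = []
for label, row in (
    ("duplicate email", (3, "a@example.com", "111")),
    ("duplicate id", (1, "c@example.com", "222")),
):
    try:
        conn.execute("INSERT INTO users VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(label)

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
conn.close()

print(rejected, row_count)
```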

Q16. What do you mean by FOREIGN KEY?


A FOREIGN KEY is a key (field) used to link two tables together. A foreign key is a field (or a
collection of fields) in one table that refers to the PRIMARY KEY in another table.

Q17. What is a join in SQL? What are the types of joins?


The JOIN command combines tables based on the common field (column) between them.
There are several ways to join tables. These are:
• INNER JOIN: The INNER JOIN keyword selects the rows from both the tables
where the condition is satisfied, i.e. the value of the common field, using which the
join is executed, is the same.
• LEFT JOIN: The LEFT JOIN returns an output table comprising all the records
of the table on the left side of the join and the corresponding matching records on
the right side. Here, the term 'matching records' means that the given condition is
satisfied. If there are no matching records in the right-side table, the output table will
contain NULL in the right-side columns. LEFT JOIN is also known as LEFT OUTER JOIN.
• RIGHT JOIN: RIGHT JOIN is similar to LEFT JOIN. The RIGHT JOIN keyword returns an
output table comprising all the records of the table on the right side of the join and
the corresponding matching records on the left side. If there are no matching records
in the left-side table, the output table will contain NULL in the left-side columns.
RIGHT JOIN is also known as RIGHT OUTER JOIN.
• FULL OUTER JOIN: FULL OUTER JOIN returns an output table comprising all the
records from both the tables. If there are no matching records on the other side, the
output table will contain NULL values. It can be seen as a combination of both LEFT
and RIGHT joins.
• CROSS JOIN: The CROSS JOIN returns an output table in which the number of rows is
equal to the number of rows in the first table times the number of rows in the second
table if the WHERE clause is not used along with CROSS JOIN. When a WHERE
clause comparing the common field is used, the CROSS JOIN functions like an INNER JOIN.
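
Three of the joins above are sketched below with sqlite3. The customers and orders tables are hypothetical. RIGHT JOIN and FULL OUTER JOIN are omitted because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO customers VALUES (1, 'Asha');
INSERT INTO customers VALUES (2, 'Ravi');
INSERT INTO customers VALUES (3, 'Meena');
CREATE TABLE orders (customer_id INTEGER, item TEXT);
INSERT INTO orders VALUES (1, 'Book');
INSERT INTO orders VALUES (1, 'Pen');
INSERT INTO orders VALUES (3, 'Lamp');
""")

# INNER JOIN: only customers that have matching orders.
inner = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "INNER JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.item"
).fetchall()

# LEFT JOIN: every customer; NULL (None) where no order matches.
left = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.item"
).fetchall()

# CROSS JOIN: every customer paired with every order (3 x 3 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders"
).fetchone()[0]
conn.close()

print(inner, left, cross_count)
```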

Q18. What is the Cartesian product of tables?


The output of a CROSS JOIN is called the Cartesian product. It returns rows combining each
row from the first table with each row of the second table. For example, if we join two tables
with 25 and 20 rows respectively, the Cartesian product of the two tables will have
25×20 = 500 rows.

Q19. What is the difference between UNION and UNION ALL?


Both UNION and UNION ALL merge the results of two structurally-compatible queries (having the
same number of columns with compatible data types) into a single, combined result. The
difference between them is that
UNION ALL includes duplicate records, while UNION omits duplicate entries.
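
The difference can be seen in a minimal sqlite3 sketch. The tables a and b and their values are hypothetical; the value 'y' appears in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (v TEXT);
INSERT INTO a VALUES ('x');
INSERT INTO a VALUES ('y');
CREATE TABLE b (v TEXT);
INSERT INTO b VALUES ('y');
INSERT INTO b VALUES ('z');
""")

# UNION removes the duplicate 'y'.
union_rows = [r[0] for r in conn.execute(
    "SELECT v FROM a UNION SELECT v FROM b ORDER BY v"
)]

# UNION ALL keeps both copies of 'y'.
union_all_rows = [r[0] for r in conn.execute(
    "SELECT v FROM a UNION ALL SELECT v FROM b ORDER BY v"
)]
conn.close()

print(union_rows, union_all_rows)
```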

Q20. What is the difference between JOIN and UNION?


Both JOIN and UNION are used to combine data from different tables; the difference lies in
the way they combine data. JOIN combines data from two tables column-wise, i.e. data
from both the tables will be side by side in different columns. UNION, on the other hand,
combines data from two tables row-wise, i.e. data from both the tables will be one above
the other in sets of rows.
