Data Analysis Interview (1)
a. Inquisitive
b. Curiosity
f. Problem-solving skills
g. Mathematical ability
c. Working capital
d. Current ratio
e. Quick ratio
f. Leverage
h. Inventory turnover
j. Return on equity
k. Return on assets
3. If there are 5 people in a room and each chooses a number from 1 to 10 at random, what is
the probability that two or more people have the same number?
a. First work out the probability that no two people share a number. In this case,
imagine each person picks their random number in turn. From person one
through five, the probabilities of picking a number no one has picked yet are:
Person one: 10/10 (since no numbers have been picked yet)
Person two: 9/10 (since one number has been picked by person one)
Person three: 8/10 (assume that persons one and two have picked unique numbers,
since we are looking for the probability that ALL have picked unique numbers)
Person four: 7/10
Person five: 6/10
Multiply together: 30,240/100,000 = 30.24%. Subtract from 100% to get 69.76%.
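The complement argument above can be checked numerically; this is a small sketch (plain Python, no particular library assumed):

```python
from math import prod

# Probability that all 5 people pick distinct numbers from 1..10:
# 10/10 * 9/10 * 8/10 * 7/10 * 6/10
p_all_unique = prod((10 - i) / 10 for i in range(5))

# Probability that at least two people share a number is the complement.
p_shared = 1 - p_all_unique

print(round(p_all_unique, 4))  # 0.3024
print(round(p_shared, 4))      # 0.6976
```

The same complement trick generalises to the classic birthday problem (replace 10 with 365).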
4. What are some factors you will consider for defining KPIs for a project?
a. Focus on priorities
a. Problem definition
b. Data Exploration
c. Data Preparation
d. Modelling
e. Data Validation
a. A statistical method used in finance, investing, and other disciplines that attempts
to determine the strength and character of the relationship between two
variables (conventionally plotted on the X and Y axes).
9. What is an outlier?
a. A data point that lies far outside the normal range of the data. There are 2 types:
i. Univariate
ii. Multivariate
MS EXCEL
1. How would you clear all formatting without removing the cell contents?
a. Home tab – Editing group – Clear – Clear Formats.
b. To hide cell contents instead, use a custom format – open the Format Cells
dialog box and type ;;; in the Custom option field.
a. BODMAS or PEMDAS
b. HLOOKUP requires the lookup value to be in the top row of the dataset.
a. Scenario Manager
b. Goal Seek
c. Data table
a. Whenever you copy formulas in Excel, the addresses of the referenced cells are
adjusted automatically to match the position to which the formula is copied
(e.g., =A1 becomes =B1 when copied one column to the right). This behaviour
is called Relative Cell Addressing.
b. If you do not want Excel to change the addresses when you copy formulas, you
must use Absolute Cell Addresses, written with dollar signs (e.g., =$A$1). With
absolute references, the row and column addresses are not modified and remain
the same.
a. Excel lets you automate tasks you do regularly by recording them as macros. A
macro is an action, or a set of actions, that you can record once and then run any
number of times.
MySQL/SQL
b. One to Many and Many to One: a record in one table is connected to several
records in another. This is the most commonly used relationship.
c. Many to Many: used when defining a relationship that requires several instances
on both sides.
a. Select
b. Insert
c. Update
d. Delete
e. Create database
f. Alter database
a. Database Management
b. Structuring a database
SELECT ID, COUNT(ID)
FROM table
GROUP BY ID
HAVING COUNT(ID) > 1;
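A duplicate-finding query of this shape can be tried out with SQLite through Python's sqlite3 module; the `events` table and its values below are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ID INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)",
                 [(1,), (2,), (2,), (3,), (3,), (3,)])

# Rows whose ID appears more than once: GROUP BY collapses equal IDs,
# HAVING filters the groups by their aggregate count.
rows = conn.execute("""
    SELECT ID, COUNT(ID)
    FROM events
    GROUP BY ID
    HAVING COUNT(ID) > 1
    ORDER BY ID
""").fetchall()

print(rows)  # [(2, 2), (3, 3)]
```

Note that the filter must go in HAVING, not WHERE: WHERE runs before grouping, so aggregate counts are not yet available there.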
i. AVG()
ii. COUNT()
iii. MAX()
iv. MIN()
v. SUM()
vi. FIRST()
vii. LAST()
a. Inner
b. Left Outer
c. Right outer
d. Full outer
e. Cross
TABLEAU
1. Can we automate Tableau Reports? If yes, how?
a. Yes, we can automate reports in Tableau. First, we have to
publish the Tableau report to Tableau Server. At the time of
publishing, there is an option to schedule reports. In that section, we
specify the time at which we want the data to refresh.
2. Which Tableau data types are preferable while dealing with structured data?
a. We can prefer using Text (string) values and numerical values as the two
popular data types while dealing with structured data in Tableau. Tableau
Desktop works best with structured data because the data remains
arranged in a tabular format (in rows and columns).
3. What are Measures in Tableau?
a. The data that we can measure or are quantifiable comes under
Measures. These are numerical metrics that remain stored in tables. They
have foreign keys that refer to their interconnected dimension tables
uniquely. For example, an employee table will have an employee ID,
customer key, projects delivered, etc., belonging to a specific project or
event.
4. What are Dimensions in Tableau? What will be the different dimensions of a web
app project?
a. Dimensions represent, descriptively, the various characteristics, values,
and attributes of a particular project or product. The multiple
dimensions of a web app project will be its project name, project type,
budget, size, number of developers required, delivery date, etc.
5. What are the different platforms from where you can pull data to process
visualisation?
a. Tableau allows us to connect and pull data from a broad spectrum of
platforms. Tableau can extract data from simple data storage systems
such as MS. Excel or MS. Access and intricate database systems like
Oracle. It can also pull data from cloud services like Microsoft Azure SQL
database, Amazon Web services, or Google Cloud SQL.
6. How will you define the Tableau Dashboard?
a. Tableau dashboard is a combination of different data views. These data
views are various forms of visualisations that data analysts produce using
Tableau. If the BI analyst or the data analyst makes specific changes in
the data, it gets directly reflected in the dashboard.
7. What are the two different ways of sorting data in Tableau?
a. We can sort Tableau data using manual sorting and computed sorting. In
manual sorting, we drag the dimension field order and rearrange them in
an ad hoc fashion. In computed sorting, we apply the sort button on an
axis to sort the data.
8. Can you name the different joins available in Tableau?
a. Tableau joins are the same as that of SQL. These are:
i. Left join
ii. Right join
iii. Inner join
iv. Full outer join
9. What is the highest number of tables you can join in Tableau?
a. We can join up to 32 tables in Tableau. However, the size of a table in Tableau
is limited to 255 fields (columns).
10. What does Tableau's analytics pane give us?
a. The Analytics pane in Tableau gives us easy access to our everyday
analytics objects. It allows dragging trend lines, reference lines, outliers,
forecasts, and other elements from the Analytics pane onto the view.
POWER BI
1. What is DAX?
a. Data Analysis Expressions (DAX) – the formula language used in Power BI to
build calculations such as:
b. Calculated Column
c. Calculated Measure
a. Using slicers, a user may quickly sort and filter through a large report to get
just the information they need. When examining a report, users may prefer
slicers over filters, since slicers remain visible on the report itself while
characteristics are being selected.
a. Data connectivity and data preparation technology that enables end users to
seamlessly import and reshape data from within a wide range of Microsoft
products
a. When users click on the Get Data icon in Power BI, a drop-down menu appears
and it shows all data sources from which data can be ingested. Data can actually
be directly ingested from any source including files in Excel, CSV, XML, JSON,
PDF, and SharePoint formats and databases such as SQL, Access, SQL Server
Analysis Services, Oracle, IBM, MySQL, and much more.
c. Database administrators
d. Report consumers
b. Custom Visualization
c. Ease of use.
a. Microsoft Bing.
b. Advantage over Tableau – the user does not need to provide latitude and
longitude coordinates.
a. Visualizations
b. Datasets
c. Reports
d. Dashboards
e. Tiles
Section 2: SQL (Structured Query Language)
1. What is Database?
2. What is DBMS?
RDBMS stands for Relational Database Management System. The key difference
here, compared to DBMS, is that RDBMS stores data in the form of a collection of
tables, and relations can be defined between the common fields of these tables.
Most modern database management systems like MySQL, Microsoft SQL Server,
Oracle, IBM DB2, and Amazon Redshift are based on RDBMS.
4. What is SQL?
SQL stands for Structured Query Language. It is the standard language for relational
database management systems. It is especially useful in handling organized data
comprised of entities (variables) and relations between different entities of the data.
A table is an organized collection of data stored in the form of rows and columns.
Columns are the vertical entities and rows the horizontal ones. The columns in a
table are called fields, while the rows are referred to as records.
● NOT NULL - Restricts NULL value from being inserted into a column.
● CHECK - Verifies that all values in a field satisfy a condition.
● DEFAULT - Automatically assigns a default value if no value has been
specified for the field.
● UNIQUE - Ensures unique values to be inserted into the field.
● INDEX - Indexes a field providing faster retrieval of records.
● PRIMARY KEY - Uniquely identifies each record in a table.
● FOREIGN KEY - Ensures referential integrity for a record in another table.
The PRIMARY KEY constraint uniquely identifies each row in a table. It must contain
UNIQUE values and has an implicit NOT NULL constraint.
A table in SQL is strictly restricted to have one and only one primary key, which is
comprised of single or multiple fields (columns).
CREATE TABLE Students ( /* Create table with a single field as primary key */
ID INT NOT NULL,
Name VARCHAR(255),
PRIMARY KEY (ID)
);
CREATE TABLE Students ( /* Create table with multiple fields as primary key */
ID INT NOT NULL,
LastName VARCHAR(255),
CONSTRAINT PK_Student PRIMARY KEY (ID, LastName)
);
ALTER TABLE Students /* Set a column as primary key */
ADD PRIMARY KEY (ID);
Write a SQL statement to add primary key 't_id' to the table 'teachers'.
Write a SQL statement to add primary key constraint 'pk_a' for table 'table_a' and
fields 'col_b, col_c'.
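Primary-key enforcement is easy to see in action; this sketch uses SQLite via Python's sqlite3 (the composite key and the sample values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key on (ID, LastName), analogous to the
# CONSTRAINT PK_Student example above.
conn.execute("""
    CREATE TABLE Students (
        ID INTEGER NOT NULL,
        LastName TEXT NOT NULL,
        PRIMARY KEY (ID, LastName)
    )
""")
conn.execute("INSERT INTO Students VALUES (1, 'Stoker')")

try:
    # Same (ID, LastName) pair again: violates the primary key.
    conn.execute("INSERT INTO Students VALUES (1, 'Stoker')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

The implicit NOT NULL of a primary key plus this uniqueness check is exactly what distinguishes it from a plain UNIQUE constraint.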
A UNIQUE constraint ensures that all values in a column are different. This provides
uniqueness for the column(s) and helps identify each row uniquely. Unlike primary
key, there can be multiple unique constraints defined per table. The code syntax for
UNIQUE is quite similar to that of PRIMARY KEY and can be used interchangeably.
CREATE TABLE Students ( /* Create table with a single field as unique */
ID INT NOT NULL UNIQUE,
Name VARCHAR(255)
);
CREATE TABLE Students ( /* Create table with multiple fields as unique */
ID INT NOT NULL,
LastName VARCHAR(255),
CONSTRAINT PK_Student UNIQUE (ID, LastName)
);
A FOREIGN KEY references the PRIMARY KEY of another table, ensuring referential
integrity between the two:
CREATE TABLE Students ( /* Create table with a foreign key */
ID INT NOT NULL,
Name VARCHAR(255),
LibraryID INT,
PRIMARY KEY (ID),
FOREIGN KEY (LibraryID) REFERENCES Library (LibraryID)
);
Write a SQL statement to add a FOREIGN KEY 'col_fk' in 'table_y' that references
'col_pk' in 'table_x'.
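Referential integrity can be demonstrated with SQLite through Python's sqlite3, reusing the table/column names from the exercise above (SQLite only enforces foreign keys when the pragma is switched on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE table_x (col_pk INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE table_y (
        id INTEGER PRIMARY KEY,
        col_fk INTEGER,
        FOREIGN KEY (col_fk) REFERENCES table_x (col_pk)
    )
""")
conn.execute("INSERT INTO table_x VALUES (10)")
conn.execute("INSERT INTO table_y VALUES (1, 10)")   # valid reference

try:
    conn.execute("INSERT INTO table_y VALUES (2, 99)")  # 99 not in table_x
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(fk_enforced)  # True
```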
The SQL Join clause is used to combine records (rows) from two or more tables in a
SQL database based on a related column between the two.
● (INNER) JOIN: Retrieves records that have matching values in both tables
involved in the join. This is the most widely used join for queries.
SELECT *
FROM Table_A A
JOIN Table_B B
ON A.col = B.col;
● LEFT (OUTER) JOIN: Retrieves all the records/rows from the left table and the
matched records/rows from the right table.
SELECT *
FROM Table_A A
LEFT JOIN Table_B B
ON A.col = B.col;
● RIGHT (OUTER) JOIN: Retrieves all the records/rows from the right table and the
matched records/rows from the left table.
SELECT *
FROM Table_A A
RIGHT JOIN Table_B B
ON A.col = B.col;
● FULL (OUTER) JOIN: Retrieves all the records where there is a match in
either the left or the right table.
SELECT *
FROM Table_A A
FULL OUTER JOIN Table_B B
ON A.col = B.col;
A self JOIN is a case of regular join where a table is joined to itself based on some
relation between its own column(s). Self-join uses the INNER JOIN or LEFT JOIN
clause and a table alias is used to assign different names to the table within the
query.
Cross join can be defined as a cartesian product of the two tables included in the
join. The table after join contains the same number of rows as in the cross-product of
the number of rows in the two tables. If a WHERE clause is used in cross join then
the query will work like an INNER JOIN.
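The join types above can be compared side by side with SQLite via Python's sqlite3 (FULL OUTER JOIN is omitted because older SQLite versions lack it; Table_A, Table_B, and the values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_A (col INTEGER)")
conn.execute("CREATE TABLE Table_B (col INTEGER)")
conn.executemany("INSERT INTO Table_A VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO Table_B VALUES (?)", [(2,), (3,), (4,)])

# INNER JOIN: only values present in both tables.
inner = conn.execute(
    "SELECT A.col FROM Table_A A JOIN Table_B B ON A.col = B.col ORDER BY A.col"
).fetchall()
# LEFT JOIN: every left row; unmatched right side comes back as NULL (None).
left = conn.execute(
    "SELECT A.col, B.col FROM Table_A A LEFT JOIN Table_B B "
    "ON A.col = B.col ORDER BY A.col"
).fetchall()
# CROSS JOIN: cartesian product, 3 x 3 = 9 rows.
cross = conn.execute("SELECT COUNT(*) FROM Table_A CROSS JOIN Table_B").fetchone()

print(inner)  # [(2,), (3,)]
print(left)   # [(1, None), (2, 2), (3, 3)]
print(cross)  # (9,)
```

Adding `WHERE A.col = B.col` to the cross join would shrink it back down to the inner-join result, as the paragraph above notes.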
Write a SQL statement to CROSS JOIN 'table_1' with 'table_2' and fetch 'col_1' from
table_1 & 'col_2' from table_2 respectively. Do not use alias.
Write a SQL statement to perform SELF JOIN for 'Table_X' with alias 'Table_1' and
'Table_2', on columns 'Col_1' and 'Col_2' respectively.
A database index is a data structure that provides a quick lookup of data in a column
or columns of a table. It enhances the speed of operations accessing data from a
database table at the cost of additional writes and memory to maintain the index data
structure.
CREATE INDEX index_name /* Create Index */
ON table_name (column_1, column_2);
There are different types of indexes that can be created for different purposes:
Unique indexes are indexes that help maintain data integrity by ensuring that no two
rows of data in a table have identical key values. Once a unique index has been
defined for a table, uniqueness is enforced whenever keys are added or changed
within the index.
CREATE UNIQUE INDEX idx_enroll_no /* Create a unique index */
ON students (enroll_no);
Non-unique indexes, on the other hand, are not used to enforce constraints on the
tables with which they are associated. Instead, non-unique indexes are used solely
to improve query performance by maintaining a sorted order of data values that are
used frequently.
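The difference between the two index kinds can be sketched with SQLite via Python's sqlite3; the `students` table, index names, and values are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (enroll_no INTEGER, name TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_enroll ON students (enroll_no)")  # unique
conn.execute("CREATE INDEX idx_name ON students (name)")                # non-unique

conn.execute("INSERT INTO students VALUES (1, 'Ana')")
conn.execute("INSERT INTO students VALUES (2, 'Ana')")  # duplicate name is fine

try:
    # Duplicate enroll_no violates the unique index.
    conn.execute("INSERT INTO students VALUES (1, 'Ben')")
    unique_enforced = False
except sqlite3.IntegrityError:
    unique_enforced = True

print(unique_enforced)  # True
```

Both indexes speed up lookups on their columns; only the unique one additionally acts as a constraint.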
Clustered indexes are indexes whose order of the rows in the database corresponds
to the order of the rows in the index. This is why only one clustered index can exist in
a given table, whereas, multiple non-clustered indexes can exist in the table.
The only difference between clustered and non-clustered indexes is that the
database manager attempts to keep the data in the database in the same order as
the corresponding keys appear in the clustered index.
Clustering indexes can improve the performance of most query operations because
they provide a linear-access path to data stored in the database.
As explained above, the differences can be broken down into three small factors -
● Clustered index modifies the way records are stored in a database based on
the indexed column. A non-clustered index creates a separate entity within
the table which references the original table.
● Clustered index is used for easy and speedy retrieval of data from the
database, whereas, fetching records from the non-clustered index is relatively
slower.
● In SQL, a table can have a single clustered index whereas it can have
multiple non-clustered indexes.
Data Integrity is the assurance of accuracy and consistency of data over its entire
life-cycle and is a critical aspect of the design, implementation, and usage of any
system which stores, processes, or retrieves data. It also defines integrity constraints
to enforce business rules on the data when it is entered into an application or a
database.
SELECT *
FROM myDb.students
WHERE student_id = 1;
A subquery is a query within another query, also known as a nested query or inner
query. It is used to restrict or enhance the data to be queried by the main query,
thus restricting or enhancing the output of the main query respectively. For example,
here we fetch the contact information for students who have enrolled for the maths
subject:
SELECT name
FROM myDb.contacts
WHERE roll_no IN (
SELECT roll_no
FROM myDb.students
WHERE subject = 'Maths');
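The nested query above can be run end to end with SQLite via Python's sqlite3; the table contents are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (roll_no INTEGER, subject TEXT)")
conn.execute("CREATE TABLE contacts (roll_no INTEGER, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1, 'Maths'), (2, 'Physics'), (3, 'Maths')])
conn.executemany("INSERT INTO contacts VALUES (?, ?)",
                 [(1, 'Ana'), (2, 'Ben'), (3, 'Cara')])

# The inner query picks the roll numbers enrolled in Maths; the outer
# query restricts contacts to just those rows.
names = conn.execute("""
    SELECT name FROM contacts
    WHERE roll_no IN (SELECT roll_no FROM students WHERE subject = 'Maths')
    ORDER BY roll_no
""").fetchall()

print(names)  # [('Ana',), ('Cara',)]
```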
Write a SQL query to update the field "status" in table "applications" from 0 to 1.
Write a SQL query to select the field "app_id" in table "applications" where "app_id"
is less than 1000.
Write a SQL query to fetch the field "app_name" from "apps" where "apps.id" is
equal to the above collection of "app_id".
SELECT operator in SQL is used to select data from a database. The data returned
is stored in a result table, called the result-set.
20. What are some common clauses used with SELECT query in SQL?
Some common SQL clauses used in conjunction with a SELECT query are as follows:
● WHERE clause in SQL is used to filter records that are necessary, based on
specific conditions.
● ORDER BY clause in SQL is used to sort the records based on some field(s)
in ascending (ASC) or descending order (DESC).
● GROUP BY clause in SQL is used to group records with identical data and can
be used with aggregation functions to produce summarised results, for example:
SELECT COUNT(student_id), country
FROM myDB.students
GROUP BY country;
● HAVING clause in SQL is used to filter records in combination with the GROUP
BY clause; unlike WHERE, it can be applied to aggregated results.
The UNION operator combines and returns the result-set retrieved by two or more
SELECT statements.
The MINUS operator in SQL returns the rows from the result-set of the first
SELECT query that do not appear in the result-set of the second SELECT query,
i.e., it filters the first result-set by removing any rows also found in the second.
The INTERSECT clause in SQL combines the result-set fetched by the two
SELECT statements where records from one match the other and then returns this
intersection of result-sets.
Certain conditions need to be met before executing either of the above statements in
SQL -
● Each SELECT statement within the clause must have the same number of
columns
● The columns must also have similar data types
● The columns in each SELECT statement should necessarily have the same
order
SELECT name FROM Students /* Fetch the union of queries, duplicates removed */
UNION
SELECT name FROM Contacts;
SELECT name FROM Students /* Fetch the union of queries with duplicates */
UNION ALL
SELECT name FROM Contacts;
Write a SQL query to fetch "names" that are present in either table "accounts" or in
table "registry".
Write a SQL query to fetch "names" that are present in "accounts" but not in table
"registry".
Write a SQL query to fetch "names" from table "contacts" that are neither present in
"accounts.name" nor in "registry.name".
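The three exercises above map directly onto compound queries; this sketch uses SQLite via Python's sqlite3 (SQLite spells MINUS as EXCEPT; the `accounts`/`registry` contents are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT)")
conn.execute("CREATE TABLE registry (name TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?)", [('Ana',), ('Ben',)])
conn.executemany("INSERT INTO registry VALUES (?)", [('Ben',), ('Cara',)])

# UNION: names in either table, duplicates removed.
union = conn.execute(
    "SELECT name FROM accounts UNION SELECT name FROM registry ORDER BY name"
).fetchall()
# EXCEPT (MINUS): names in accounts but not in registry.
minus = conn.execute(
    "SELECT name FROM accounts EXCEPT SELECT name FROM registry"
).fetchall()
# INTERSECT: names present in both tables.
intersect = conn.execute(
    "SELECT name FROM accounts INTERSECT SELECT name FROM registry"
).fetchall()

print(union)      # [('Ana',), ('Ben',), ('Cara',)]
print(minus)      # [('Ana',)]
print(intersect)  # [('Ben',)]
```

All three require the column-count, type, and order conditions listed above, since the engine lines the two result-sets up column by column.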
A database cursor is a control structure that allows for the traversal of records in a
database. Cursors, in addition, facilitate processing after traversal, such as
retrieval, addition, and deletion of database records. They can be viewed as a
pointer to one row in a set of rows.
DECLARE db_cursor CURSOR FOR /* Declare the cursor over a result set */
SELECT name
FROM myDB.students
OPEN db_cursor /* Open the cursor */
FETCH NEXT
FROM db_cursor
INTO @name /* Fetch one row into the variable */
CLOSE db_cursor /* Close the cursor and deallocate the resources */
DEALLOCATE db_cursor
Entity: An entity can be a real-world object, either tangible or intangible, that can be
easily identifiable. For example, in a college database, students, professors,
workers, departments, and projects can be referred to as entities. Each entity has
some associated properties that provide it an identity.
An alias is represented explicitly by the AS keyword but in some cases, the same
can be performed without it as well. Nevertheless, using the AS keyword is always a
good practice.
SELECT A.emp_name AS "Employee", /* Alias "A" for the employee row */
B.emp_name AS "Supervisor" /* Alias "B" for the supervisor row */
FROM employee A, employee B
WHERE A.emp_sup = B.emp_id;
A view in SQL is a virtual table based on the result-set of an SQL statement. A view
contains rows and columns, just like a real table. The fields in a view are fields from
one or more real tables in the database.
Normal Forms are used to eliminate or reduce redundancy in database tables. The
different forms are as follows:
Students Table

Student | Address | Books Issued | Salutation
Ansh | Windsor Street 777 | Dracula (Bram Stoker), … | Mr.
As we can observe, the Books Issued field has more than one value per record, and
to convert it into 1NF, this has to be resolved into separate individual records for
each book issued. Check the following table in 1NF form -
A relation is in second normal form if it satisfies the conditions for the first normal
form and does not contain any partial dependency. A relation in 2NF has no partial
dependency, i.e., it has no non-prime attribute that depends on any proper subset of
any candidate key of the table. Often, specifying a single column Primary Key is the
solution to the problem. Examples -
Example 1 - Consider the above example. As we can observe, the Students Table in
the 1NF form has a candidate key in the form of [Student, Address] that can uniquely
identify all records in the table. The field Books Issued (non-prime attribute) depends
partially on the Student field. Hence, the table is not in 2NF. To convert it into the
2nd Normal Form, we will partition the tables into two while specifying a new
Primary Key attribute to identify the individual records in the Students table. The
Foreign Key constraint will be set on the other table to ensure referential integrity.
Here, WX is the only candidate key and there is no partial dependency, i.e., any
proper subset of WX doesn’t determine any non-prime attribute in the relation.
● Third Normal Form
A relation is said to be in the third normal form, if it satisfies the conditions for the
second normal form and there is no transitive dependency between the non-prime
attributes, i.e., all non-prime attributes are determined only by the candidate keys of
the relation and not by any other non-prime attribute.
Example 1 - Consider the Students Table in the above example. As we can observe,
the Students Table in the 2NF form has a single candidate key Student_ID (primary
key) that can uniquely identify all records in the table. The field Salutation (non-prime
attribute), however, depends on the Student Field rather than the candidate key.
Hence, the table is not in 3NF. To convert it into the 3rd Normal Form, we will once
again partition the tables into two while specifying a new Foreign Key constraint to
identify the salutations for individual records in the Students table. The Primary Key
constraint for the same will be set on the Salutations table to identify each record
uniquely.
Salutation_ID | Salutation
1 | Ms.
2 | Mr.
3 | Mrs.
Q -> S
T -> P
For the above relation to exist in 3NF, all possible candidate keys in the above
relation should be {P, RS, QR, T}.
A relation is in Boyce-Codd Normal Form if it satisfies the conditions for third normal
form and, for every functional dependency, the left-hand side is a super key. In other
words, a relation in BCNF has non-trivial functional dependencies of the form X -> Y,
such that X is always a super key. For example - in the above example, Student_ID
serves as the sole unique identifier for the Students Table and Salutation_ID for the
Salutations Table, thus these tables exist in BCNF. The same cannot be said for the
Books Table, as there can be several books with common Book Names and the
same Student_ID.
TRUNCATE command is used to delete all the rows from the table and free the
space containing the table.
DROP command is used to remove an object from the database. If you drop a table,
all the rows in the table are deleted and the table structure is removed from the
database.
Write a SQL query to remove first 1000 records from table 'Temporary' based on 'id'.
Write a SQL statement to delete the table 'Temporary' while keeping its relations
intact.
If a table is dropped, all things associated with the tables are dropped as well. This
includes - the relationships defined on the table with other tables, the integrity checks
and constraints, access privileges and other grants that the table has. To create and
use the table again in its original form, all these relations, checks, constraints,
privileges and relationships need to be redefined. However, if a table is truncated,
none of the above problems exist and the table retains its original structure.
The TRUNCATE command is used to delete all the rows from the table and free the
space containing the table.
The DELETE command deletes only the rows from the table based on the condition
given in the where clause or deletes all the rows from the table if no condition is
specified. But it does not free the space containing the table.
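The practical difference between row deletion and dropping the table can be seen with SQLite via Python's sqlite3 (SQLite has no TRUNCATE; an unconditional DELETE is its closest analogue, and the `Temporary` table here is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Temporary (id INTEGER)")
conn.executemany("INSERT INTO Temporary VALUES (?)", [(i,) for i in range(5)])

# DELETE removes the rows but keeps the table structure intact.
conn.execute("DELETE FROM Temporary")
count = conn.execute("SELECT COUNT(*) FROM Temporary").fetchone()[0]
print(count)  # 0 -- the table is still queryable

# DROP removes the table itself; further queries on it fail.
conn.execute("DROP TABLE Temporary")
try:
    conn.execute("SELECT COUNT(*) FROM Temporary")
    table_gone = False
except sqlite3.OperationalError:
    table_gone = True
print(table_gone)  # True
```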
Note: All aggregate functions described above ignore NULL values except for the
COUNT function.
A scalar function returns a single value based on the input value. Following are the
widely used SQL scalar functions:
The user-defined functions in SQL are like functions in any other programming
language that accept parameters, perform complex calculations, and return a value.
They are written to use the logic repetitively whenever required. There are two types
of SQL user-defined functions:
OLAP stands for Online Analytical Processing, a class of software programs that
are characterized by the relatively low frequency of online transactions. Queries are
often too complex and involve a bunch of aggregations. For OLAP systems, the
effectiveness measure relies highly on response time. Such systems are widely used
for data mining or maintaining aggregated, historical data, usually in multi-
dimensional schemas.
37. What is Collation? What are the different types of Collation Sensitivity?
Collation refers to a set of rules that determine how data is sorted and compared.
Rules defining the correct character sequence are used to sort the character data. It
incorporates options for specifying case sensitivity, accent marks, kana character
types, and character width. Below are the different types of collation sensitivity:
DELIMITER $$
CREATE PROCEDURE FetchAllStudents()
BEGIN
SELECT * FROM myDB.students; /* Procedure body: return all student records */
END $$
DELIMITER ;
A stored procedure that calls itself until a boundary condition is reached, is called a
recursive stored procedure. This recursive function helps the programmers to deploy
the same set of code several times as and when required. Some SQL programming
languages limit the recursion depth to prevent an infinite loop of procedure calls from
causing a stack overflow, which slows down the system and may lead to system
crashes.
CREATE PROCEDURE CalcTotal( /* Recursive procedure: sum scores for jobs 1..num */
IN num INT,
OUT total INT
) BEGIN
DECLARE score INT DEFAULT NULL; /* Set the default value => "score" */
SELECT job_score INTO score FROM jobs WHERE job_id = num;
IF score IS NULL THEN
SET total = 0; /* Boundary condition: stop recursing */
ELSE
CALL CalcTotal(num - 1, total); /* Recursive call on the smaller problem */
SET total = total + score;
END IF;
END;
Creating empty tables with the same structure can be done smartly by fetching the
records of one table into a new table using the INTO operator while fixing a WHERE
clause to be false for all records. Hence, SQL prepares the new table with a
duplicate structure to accept the fetched records but since no records get fetched
due to the WHERE clause in action, nothing is inserted into the new table.
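The always-false-WHERE trick can be tried with SQLite via Python's sqlite3 (SQLite uses CREATE TABLE ... AS SELECT where SQL Server would use SELECT ... INTO; the `Students` table is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)", [(1, 'Ana'), (2, 'Ben')])

# WHERE 1 = 2 is false for every row, so the new table copies the
# column structure but receives no records.
conn.execute("CREATE TABLE Students_copy AS SELECT * FROM Students WHERE 1 = 2")

cols = [row[1] for row in conn.execute("PRAGMA table_info(Students_copy)")]
count = conn.execute("SELECT COUNT(*) FROM Students_copy").fetchone()[0]

print(cols)   # ['id', 'name']
print(count)  # 0
```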
SQL pattern matching provides for pattern search in data if you have no clue as to
what that word should be. This kind of SQL query uses wildcards to match a string
pattern, rather than writing the exact word. The LIKE operator is used in conjunction
with SQL Wildcards to fetch the required information.
The % wildcard matches zero or more characters of any type and can be used to
define wildcards both before and after the pattern. Search a student in your database
with first name beginning with the letter K:
SELECT *
FROM students
WHERE first_name LIKE 'K%';
Use the NOT keyword to select records that don't match the pattern. This query
returns all students whose first name does not begin with K.
SELECT *
FROM students
WHERE first_name NOT LIKE 'K%';
Search for a student in the database where he/she has a K in his/her first name.
SELECT *
FROM students
WHERE first_name LIKE '%K%';
● Using the _ wildcard to match pattern at a specific position
The _ wildcard matches exactly one character of any type. It can be used in
conjunction with % wildcard. This query fetches all students with letter K at the third
position in their first name.
SELECT *
FROM students
WHERE first_name LIKE '__K%';
The _ wildcard plays an important role as a limitation when it matches exactly one
character. It limits the length and position of the matched results. For example -
SELECT * /* Matches first names with exactly three letters */
FROM students
WHERE first_name LIKE '___';
SELECT * /* Matches first names starting with A and ending with T */
FROM students
WHERE first_name LIKE 'A__T';
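The % and _ wildcards can be exercised with SQLite via Python's sqlite3 (LIKE is case-insensitive for ASCII in SQLite by default; the names below are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (first_name TEXT)")
conn.executemany("INSERT INTO students VALUES (?)",
                 [('Kate',), ('Mike',), ('Anka',), ('Ken',)])

# % matches zero or more characters: names beginning with K.
starts_with_k = conn.execute(
    "SELECT first_name FROM students "
    "WHERE first_name LIKE 'K%' ORDER BY first_name"
).fetchall()
# _ matches exactly one character: K must be the third letter.
k_third = conn.execute(
    "SELECT first_name FROM students "
    "WHERE first_name LIKE '__k%' ORDER BY first_name"
).fetchall()

print(starts_with_k)  # [('Kate',), ('Ken',)]
print(k_third)        # [('Anka',), ('Mike',)]
```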
PostgreSQL was first called Postgres and was developed by a team led by
Computer Science Professor Michael Stonebraker in 1986. It was developed to help
developers build enterprise-level applications by upholding data integrity by making
systems fault-tolerant. PostgreSQL is therefore an enterprise-level, flexible, robust,
open-source, and object-relational DBMS that supports flexible workloads along with
handling concurrent users. It has been consistently supported by the global
developer community. Due to its fault-tolerant nature, PostgreSQL has gained
widespread popularity among developers.
Indexes are built-in structures in PostgreSQL which are used by queries to
perform searches more efficiently on a table in the database. Consider a
table with thousands of records and a query whose condition only a few
records can satisfy: it will take a lot of time to find and return
those rows, because the engine has to perform the search
operation on every single row to check the condition. This is undoubtedly
inefficient for a system dealing with huge data. If this system had an index on the
column being searched, it could use an efficient method for identifying matching
rows by walking through only a few levels. This is called indexing.
This can be done by using the ALTER TABLE statement as shown below:
Syntax:
The first step of using PostgreSQL is to create a database. This is done by using the
createdb command as shown below: createdb db_name
After running the above command, if the database creation was successful, then the
below message is shown:
CREATE DATABASE
46. How can we start, restart and stop the PostgreSQL server?
● To start the PostgreSQL server, we run:
service postgresql start
ok
● To restart the PostgreSQL server, we run:
service postgresql restart
ok
● To stop the server, we run the command:
service postgresql stop
ok
Partitioned tables are logical structures that are used for dividing large tables into
smaller structures that are called partitions. This approach is used for effectively
increasing the query performance while dealing with large database tables. To
create a partition, a key called partition key which is usually a table column or an
expression, and a partitioning method needs to be defined. There are three types of
inbuilt partitioning methods provided by Postgres:
The type of partition key and the partitioning method used determine how much
the performance and manageability of the partitioned table improve.
TRUNCATE TABLE table_name
RESTART IDENTITY;
We can also use the statement for removing data from multiple tables all at once by
mentioning the table names separated by comma as shown below:
TRUNCATE TABLE
table_1,
table_2,
table_3;
To get the next number 101 from the sequence, we use the nextval() method as
shown below:
SELECT nextval('serial_num');
We can also use this sequence while inserting new records using the INSERT
command, e.g.: INSERT INTO table_name (id) VALUES (nextval('serial_num'));
They are character sequences bound within single quotes. These are used during
insertion or updation of character data in the database.
There are special string constants that are quoted in dollars. Syntax:
$tag$<string_constant>$tag$ The tag in the constant is optional and when we are
not specifying the tag, the constant is called a double-dollar string literal.
This can be done by using the command \l -> backslash followed by the lower-case
letter L.
This can be done by using the DROP DATABASE command as shown in the syntax
below:
If the database has been deleted successfully, then the following message would be
shown:
DROP DATABASE
ACID stands for Atomicity, Consistency, Isolation, Durability. They are database
transaction properties which are used for guaranteeing data validity in case of errors
and failures.
The command enable-debug is used for enabling the compilation of all libraries and
applications. When this is enabled, system processes are hindered and the size of
the binary file also generally increases. Hence, it is not recommended to switch this
on in the production environment. It is most commonly used by developers to
debug their scripts and help them spot issues.
59. How do you check the rows affected as part of previous transactions?
SQL standards state that the following three phenomena should be prevented while
transactions run concurrently. To deal with these phenomena, SQL standards define
4 levels of transaction isolation. They are as follows:
The following table clearly explains which type of unwanted reads each level avoids:

Isolation Level | Dirty Read | Non-repeatable Read | Phantom Read
Read Uncommitted | Might occur | Might occur | Might occur
Read Committed | Prevented | Might occur | Might occur
Repeatable Read | Prevented | Prevented | Might occur
Serializable | Prevented | Prevented | Prevented
60. What can you tell about WAL (Write Ahead Logging)?
Write Ahead Logging is a feature that increases database reliability by logging
changes before any changes are made to the database. It ensures that enough
information is available when a database crash occurs, by helping to pinpoint
the point up to which work was completed and thereby giving a starting point
from which the work can be resumed.
61. What is the main disadvantage of deleting data from an existing table using
the DROP TABLE command?
The DROP TABLE command deletes the complete data from the table along with
removing the complete table structure too. In case our requirement entails just
removing the data, we would then need to recreate the table to store data in it. In
such cases, it is advised to use the TRUNCATE command instead.
Example of a case-insensitive regular-expression match:
'interviewbit' ~* '.*INTervIewBit.*'
We can achieve this by using the pg_dump tool for dumping all object contents in the
database into a single file. The steps are as follows:
Step 2: Execute pg_dump program to take the dump of data to a .tar folder as
shown below:
The database dump will be stored in the sample_data.tar file on the location
specified.
The COMMIT action ensures that the data consistency of the transaction is
maintained, and it ends the current transaction. COMMIT adds a new record to the
in-memory log that describes the commit. A checkpoint, by contrast, writes all
changes committed up to a given SCN (System Change Number) to disk, and that SCN
is recorded in the datafile headers and control files.
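The transaction-ending role of COMMIT can be sketched with two connections to the same SQLite file: the second connection sees the inserted row only after the first connection commits (the checkpoint itself is an internal engine operation and is not shown):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE logs (msg TEXT)")
writer.commit()  # make the table visible to other connections

writer.execute("INSERT INTO logs VALUES ('pending')")
# Uncommitted: the reader does not see the row yet.
print(reader.execute("SELECT COUNT(*) FROM logs").fetchone()[0])  # -> 0

writer.commit()  # COMMIT ends the transaction and makes the change durable
print(reader.execute("SELECT COUNT(*) FROM logs").fetchone()[0])  # -> 1
```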
Conclusion:
SQL is the language of the database. It has a vast scope and a robust capability
for creating and manipulating a variety of database objects using commands like
CREATE, ALTER, and DROP, and for loading database objects using commands like
INSERT. It also provides options for data manipulation using commands like
DELETE and TRUNCATE, and for effective retrieval of data using commands like
SELECT and the cursor command FETCH. These commands give the programmer a large
amount of control to interact with the database efficiently without wasting
resources. SQL has become so popular that almost every programmer relies on it to
implement an application's storage functionality, making it a rewarding language
to learn. Learning it gives the developer an understanding of the data structures
used to store an organization's data, along with an additional level of control
and in-depth understanding of the application.
Section 3.1: Power BI
Detail your approach to cleaning and preparing data efficiently in Power BI.
Implement and manage live data connections for real-time insights in Power BI reports.
Diagnose and address performance issues to optimize Power BI report loading and
refreshing times.
1. Can you walk me through your experience with Power BI and how you've used it in your
previous roles?
2. What are some key benefits of using Power BI for data analysis and visualization?
3. Can you explain the difference between Power BI Desktop and Power BI Service?
4. How do you handle large datasets in Power BI to ensure optimal performance?
5. Have you utilized Power BI's data modeling capabilities in your projects? If so, can you
provide an example?
6. What strategies do you employ for data cleansing and manipulation in Power BI?
7. Can you discuss a challenging data visualization problem you've encountered in Power BI
and how you solved it?
8. How do you approach creating interactive dashboards in Power BI to meet user
requirements?
9. Have you worked with Power BI's integration with other Microsoft tools like Excel or SQL
Server? If so, can you elaborate on your experience?
10. How do you stay updated with new features and updates in Power BI?
11. Can you provide an example of a complex calculation you've implemented using DAX in
Power BI?
12. What are some common pitfalls to avoid when designing Power BI reports and dashboards?
13. How do you ensure data security and compliance when working with sensitive data in Power
BI?
14. Have you worked on any projects involving real-time data streaming in Power BI? If so, how
did you set it up?
15. Can you discuss a time when you had to troubleshoot and resolve a technical issue in Power
BI?
Section 3.2: Tableau
a. What do dimensions and measures usually contain?
b. What are discrete and continuous fields, and how are they displayed?
c. Explain the difference between discrete date parts and continuous date values.
d. Explain why Tableau aggregates measures.
1. Which of the following is the best reason to use an extract instead of a live connection?
Answer: You need to apply an aggregation that takes too long when using a live
connection.
2. You created a group by selecting field labels in a view. How can you remove members from the
group?
Answer: In the Data pane, right-click the group and select Edit Group.
Interactive elements that you can add to a dashboard for users include ______.
5. Using the Stocks 2010-2013 table, create a chart to see the monthly change in volumes of
stocks from the beginning of 2010 to the end of 2013. Which two consecutive months saw the
least fluctuation in increase or decrease?
The correct answer is March 2012 - April 2012.
The answer to this question can be found by placing the continuous month date value (the
option labeled "May 2015" in Tableau's date menu) on Columns, and placing SUM(Volume) on
Rows.
6. Using the Stocks 2010-2013 table, create a crosstab showing the sum of Volume per
Company per Year, then add grand totals to the view. What was the total volume for
Apple in 2013 and the total volume for Apple for 2010 through 2013, respectively?
The correct answer is 25,606,397,999 and 127,322,019,216.
The answer to this question can be found by placing YEAR(Date) on Columns and
Company on Rows, then dragging SUM(Volume) to Label on the Marks card.
7. Using the Stocks 2010-2013 table, create a chart that shows the percent difference
in Volume for each company by year and quarter. How many quarters did Biogen
Idec show a positive percent difference in volume?
The correct answer is 6.
The answer to this question can be found by placing Company and Volume on Rows,
and Date on Columns. On Columns, click the plus sign on YEAR(Date) to add
QUARTER(Date) to the view.
8. Using the Flights table, create a bar chart showing the average of Minutes of Delay per Flight
broken down by Carrier Name, and filtered by State to only show Minnesota (MN). What was the
average minutes of delay per flight for United in Minnesota?