Cognizant Data Analyst Interview Questions 1745235888
INTERVIEW QUESTIONS
0-3 YOE
Cleaning and preparing data is a critical first step in the data analysis process, ensuring
data is accurate, consistent, and usable. Here’s how I typically approach it:
Step-by-Step Approach:
o Review column names, data types, and the number of unique values.
2. Remove Duplicates:
6. Fix Inconsistencies:
8. Outlier Detection:
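As a rough illustration of the duplicate removal, inconsistency fixing, and outlier detection steps, here is a minimal sketch in plain Python. The records, the column meanings, and the choice of the 1.5 × IQR rule are assumptions made for this example, not a prescribed method.

```python
from statistics import quantiles

# Hypothetical raw records: (customer, city, amount)
rows = [
    ("Asha", " mumbai ", 120.0),
    ("Asha", " mumbai ", 120.0),   # exact duplicate
    ("Ravi", "Mumbai", 135.0),
    ("Meena", "DELHI", 110.0),
    ("John", "delhi", 128.0),
    ("Kiran", "Pune", 115.0),
    ("Leela", "pune ", 125.0),
    ("Omar", "Delhi", 130.0),
    ("Sara", "Pune", 5000.0),      # suspiciously large amount
]

# Remove duplicates while preserving the original order.
deduped = list(dict.fromkeys(rows))

# Fix inconsistencies: trim whitespace and standardize city casing.
cleaned = [(name, city.strip().title(), amount) for name, city, amount in deduped]

# Outlier detection with the 1.5 * IQR rule (an assumed, commonly used threshold).
amounts = [amount for _, _, amount in cleaned]
q1, _, q3 = quantiles(amounts, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [row for row in cleaned if not low <= row[2] <= high]

print(len(rows), "->", len(deduped), "rows after de-duplication")
print("Outliers:", outliers)
```

Whether a flagged value is an error or a genuine extreme is a judgment call; the sketch only surfaces candidates for review.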
SQL (Structured Query Language) is a fundamental tool in data analysis, especially when
working with relational databases.
1. Data Extraction:
4. Joining Tables:
o Use INNER JOIN, LEFT JOIN, etc., to merge data from multiple tables.
5. Window Functions:
o Use nested queries and WITH statements for better readability and
reusability.
o Check duplicates, data ranges, invalid entries directly in SQL before analysis.
Real-Life Scenario:
FROM orders
GROUP BY region
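To make the query patterns above concrete, here is a small sketch that runs them against an in-memory SQLite database through Python's sqlite3 module. The orders table and region column come from the scenario; the customers table, the amount column, and all sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         region TEXT, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meena');
    INSERT INTO orders VALUES
        (1, 1, 'North', 100.0), (2, 1, 'North', 50.0),
        (3, 2, 'South', 200.0), (4, 2, 'North', 25.0);
""")

# Aggregate sales per region (extraction + GROUP BY).
region_totals = conn.execute("""
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
    ORDER BY region
""").fetchall()

# A WITH statement (CTE) feeding a LEFT JOIN: customers without orders
# are kept in the result with a total of 0.
customer_totals = conn.execute("""
    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.name, COALESCE(t.total, 0) AS total
    FROM customers AS c
    LEFT JOIN totals AS t ON t.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()
conn.close()

print(region_totals)
print(customer_totals)
```

Swapping the LEFT JOIN for an INNER JOIN would drop the customer with no orders, which is often the quickest way to see the difference between the two.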
Handling missing or incomplete data is crucial to ensure the accuracy and robustness of
your analysis.
o In Python: df.isnull().sum()
Drop rows/columns: when missing values are very few and insignificant
5. Document Assumptions:
o Keep track of what and how you imputed or removed data for reproducibility.
Example in Python:
# Assign the result back: fillna(..., inplace=True) on a selected column warns in recent pandas.
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])
Important Consideration:
Always assess the impact of missing data on your analysis before deciding on a strategy.
Sometimes, missingness itself can be an insight (e.g., customers not responding to a
question).
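The point about missingness being informative can be sketched without pandas. The example below counts missing values, imputes them, and keeps a flag recording which rows were originally missing. The records and the Age_was_missing field name are hypothetical.

```python
from statistics import mean, mode

# Hypothetical survey records; None marks a missing answer.
records = [
    {"Age": 25, "Gender": "F"},
    {"Age": None, "Gender": "M"},
    {"Age": 35, "Gender": None},
    {"Age": 30, "Gender": "F"},
]

# Count missing values per column (plain-Python analogue of df.isnull().sum()).
missing_counts = {
    col: sum(r[col] is None for r in records) for col in ("Age", "Gender")
}

# Impute Age with the mean and Gender with the mode of the observed values,
# keeping a flag so the fact that a value was missing is not lost.
age_fill = mean(r["Age"] for r in records if r["Age"] is not None)
gender_fill = mode(r["Gender"] for r in records if r["Gender"] is not None)
for r in records:
    r["Age_was_missing"] = r["Age"] is None
    if r["Age"] is None:
        r["Age"] = age_fill
    if r["Gender"] is None:
        r["Gender"] = gender_fill

print(missing_counts)
print(records)
```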
Data normalization refers to the process of transforming data into a common scale without
distorting differences in the ranges of values. It ensures that each feature contributes
equally to the analysis or model.
It can also refer to a database design technique that organizes tables to reduce
redundancy and improve data integrity.
1. Min-Max Normalization:
o Formula:
x′=(x−min(x))/(max(x)−min(x))
o Formula:
x′=(x−μ)/σ
3. Decimal Scaling:
o Moves the decimal point of values to bring them within a standard range.
o Distance-based algorithms like KNN and SVM, and models trained with
Gradient Descent, are sensitive to scale.
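The three scaling techniques can be written as short functions. This is a minimal sketch: the function names are my own, the z-score uses the population standard deviation, and the code assumes a non-constant column whose largest absolute value is at least 1.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale to [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def z_score(values):
    """Standardize: x' = (x - mu) / sigma, using the population sigma."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

def decimal_scaling(values):
    """Divide by 10**j, where j is the digit count of the largest |x|."""
    j = len(str(int(max(abs(x) for x in values))))
    return [x / 10 ** j for x in values]

data = [10, 20, 30, 40, 50]
print(min_max(data))
print(z_score(data))
print(decimal_scaling(data))
```

Min-max keeps the original shape of the distribution but is sensitive to extreme values, which is one reason z-scores are often preferred when outliers are present.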
As a Data Analyst, data visualization is key to presenting findings in a clear, compelling way.
Best Practices:
Correlation:
• Example: Ice cream sales and temperature tend to rise together (positive
correlation).
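To put a number on the ice cream example, the Pearson correlation coefficient can be computed by hand. The temperature and sales figures below are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical daily observations.
temperature = [20, 24, 28, 32, 36]
ice_cream_sales = [110, 135, 160, 190, 205]

r = pearson(temperature, ice_cream_sales)
print(f"r = {r:.3f}")
```

A value of r close to 1 indicates a strong positive linear relationship. On its own it says nothing about whether one variable causes the other.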
Key Differences:
Step-by-Step Approach:
1. Formulate Hypotheses:
else:
Both Primary Key and Foreign Key are used to establish and enforce relationships
between tables in a relational database.
Feature      Primary Key                        Foreign Key
Uniqueness   Must be unique and non-null        Can have duplicates and nulls
Location     Defined within the current table   Refers to a primary key in another table
Real-Life Example:
-- Primary Key
Name VARCHAR(50)
);
-- Foreign Key
EmployeeID INT,
);
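The behaviour described in the comparison can be checked with a small runnable sketch using Python's sqlite3 module. The table names Employees and Projects and the ProjectID column are assumptions for this example; only Name VARCHAR(50) and EmployeeID INT come from the snippet above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Employees (
        EmployeeID INT PRIMARY KEY,
        Name VARCHAR(50)
    );
    CREATE TABLE Projects (
        ProjectID INT PRIMARY KEY,
        EmployeeID INT,
        FOREIGN KEY (EmployeeID) REFERENCES Employees(EmployeeID)
    );
    INSERT INTO Employees VALUES (1, 'Asha');
    INSERT INTO Projects VALUES (10, 1);      -- valid reference
    INSERT INTO Projects VALUES (11, NULL);   -- a foreign key may be NULL
""")

def try_insert(sql):
    """Run an INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

dup_pk = try_insert("INSERT INTO Employees VALUES (1, 'Ravi')")  # duplicate primary key
bad_fk = try_insert("INSERT INTO Projects VALUES (12, 99)")      # no such employee
dup_fk = try_insert("INSERT INTO Projects VALUES (13, 1)")       # repeated FK value is fine
conn.close()

print(dup_pk, bad_fk, dup_fk, sep="\n")
```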
Steps I Follow:
o Ensure each column has the correct data type (e.g., date columns are in
datetime format).
6. Outlier Detection:
8. Automated Testing:
o Use scripts or data validation tools to enforce rules (e.g., no negative sales
values).
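A validation script of the kind mentioned in the automated testing step might look like the sketch below. The field names order_date and sales, the date format, and the two rules are assumptions chosen to mirror the examples in the steps.

```python
from datetime import datetime

def validate(rows):
    """Return a list of (row_index, problem) for rows that break a rule."""
    problems = []
    for i, row in enumerate(rows):
        # Rule 1: date columns must parse as real calendar dates.
        try:
            datetime.strptime(row["order_date"], "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append((i, "order_date is not a valid YYYY-MM-DD date"))
        # Rule 2: no negative (or non-numeric) sales values.
        if not isinstance(row["sales"], (int, float)) or row["sales"] < 0:
            problems.append((i, "sales must be a non-negative number"))
    return problems

# Hypothetical records to check.
rows = [
    {"order_date": "2024-01-15", "sales": 250.0},
    {"order_date": "2024-02-30", "sales": 100.0},   # February 30 does not exist
    {"order_date": "2024-03-01", "sales": -40.0},   # negative sales
]

problems = validate(rows)
for index, problem in problems:
    print(f"row {index}: {problem}")
```

Returning the problems instead of raising on the first one makes it easier to report every issue in a batch at once.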
10 )
Write a query to find customers who have made more than one purchase on
different dates and calculate their total purchase amount.
SELECT
customer_id,
SUM(sale_amount) AS total_purchase_amount
FROM
Sales
GROUP BY
customer_id
HAVING
Explanation:
1001 2 900
1002 2 900
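One way such a query could be written in full is sketched below, run through Python's sqlite3 module. The sale_date column name is an assumption (the date column is not named in the query text), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- sale_date is a hypothetical column name for the purchase date.
    CREATE TABLE Sales (customer_id INTEGER, sale_date TEXT, sale_amount REAL);
    INSERT INTO Sales VALUES
        (1001, '2024-01-05', 400), (1001, '2024-02-10', 500),
        (1002, '2024-01-07', 300), (1002, '2024-03-01', 600),
        (1003, '2024-01-09', 250), (1003, '2024-01-09', 150),  -- same day twice
        (1004, '2024-02-02', 700);                             -- single purchase
""")

result = conn.execute("""
    SELECT
        customer_id,
        COUNT(DISTINCT sale_date) AS purchase_days,
        SUM(sale_amount) AS total_purchase_amount
    FROM Sales
    GROUP BY customer_id
    HAVING COUNT(DISTINCT sale_date) > 1
    ORDER BY customer_id
""").fetchall()
conn.close()

for row in result:
    print(row)
```

Counting distinct dates, not rows, is what excludes a customer who bought twice on the same day.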
Table B
7
7
7
A INNER JOIN B =
A LEFT JOIN B =
A UNION B =
A UNION ALL B =
A EXCEPT B =
A INTERSECT B =
A CROSS JOIN B WHERE A.ID=7 =
Table A:
7 → 4 rows (value 7)
Table B:
7 → 3 rows (value 7)
We're assuming there's an implicit ID column in each table with values = 7. Now let's
analyze each SQL operation:
A INNER JOIN B
Each row from A joins with each matching row in B (on value = 7).
Result = 4 × 3 = 12 rows
A LEFT JOIN B
Result = 4 × 3 = 12 rows
A UNION B
All values = 7
Only one unique value = 7
Result = 1 row
A UNION ALL B
• 4 from A
• 3 from B
Result = 4 + 3 = 7 rows
A EXCEPT B
The only value in A (7) also appears in B, so no rows remain.
Result = 0 rows
A INTERSECT B
Result = 1 row
Assuming CROSS JOIN is filtered with WHERE A.ID = 7 (and all values in A are 7), it's the
same as:
SELECT *
FROM A
CROSS JOIN B
WHERE A.ID = 7;
All rows in A meet the condition (4 rows), and all 3 in B combine with each → 4 × 3 = 12 rows
Result = 12 rows
Query            Output Rows
A INNER JOIN B   12
A LEFT JOIN B    12
A UNION B        1
A UNION ALL B    7
A EXCEPT B       0
A INTERSECT B    1
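These row counts can be verified with an in-memory SQLite database. The sketch below assumes, as the analysis does, a single ID column in each table and joins on A.ID = B.ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER);
    CREATE TABLE B (ID INTEGER);
    INSERT INTO A VALUES (7), (7), (7), (7);   -- 4 rows
    INSERT INTO B VALUES (7), (7), (7);        -- 3 rows
""")

queries = {
    "A INNER JOIN B": "SELECT * FROM A INNER JOIN B ON A.ID = B.ID",
    "A LEFT JOIN B": "SELECT * FROM A LEFT JOIN B ON A.ID = B.ID",
    "A UNION B": "SELECT ID FROM A UNION SELECT ID FROM B",
    "A UNION ALL B": "SELECT ID FROM A UNION ALL SELECT ID FROM B",
    "A EXCEPT B": "SELECT ID FROM A EXCEPT SELECT ID FROM B",
    "A INTERSECT B": "SELECT ID FROM A INTERSECT SELECT ID FROM B",
    "A CROSS JOIN B WHERE A.ID=7": "SELECT * FROM A CROSS JOIN B WHERE A.ID = 7",
}

# Count the rows each query returns.
row_counts = {name: len(conn.execute(sql).fetchall()) for name, sql in queries.items()}
conn.close()

for name, count in row_counts.items():
    print(f"{name}: {count}")
```

UNION, EXCEPT, and INTERSECT remove duplicates from their result, which is why they collapse to at most one row here while the joins and UNION ALL do not.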