PwC Data Analyst Interview Questions and Answers

PwC Data Analyst Interview Experience (1–3 Years)

Are you preparing for a data analyst role at PwC or a similar organization? Here is my recent experience with some challenging SQL and Python questions from the selection process for a PwC Data Analyst role. They test both foundational knowledge and problem-solving skills, and below is how I approached each one.

SQL Questions

1. How Indexing Works in SQL

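An index speeds up queries by letting the engine locate rows without scanning the whole table, at the cost of extra storage and slower writes. A clustered index determines the physical order of the table's rows (so a table can have only one), while a non-clustered index is a separate structure that stores key values with pointers to the rows. Index columns that appear often in WHERE or JOIN clauses, such as CustomerID in a Transactions table.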
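To see the effect of an index on a lookup, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a convenient stand-in: its plan wording and index internals differ from SQL Server or MySQL, and the index name idx_transactions_customer and the sample data are made up for illustration.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' text of each step SQLite plans for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # detail is the last column

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions ("
    "TransactionID INTEGER PRIMARY KEY, CustomerID INTEGER, Amount REAL)"
)
# Hypothetical sample data: 1,000 transactions spread over 100 customers
conn.executemany(
    "INSERT INTO Transactions (CustomerID, Amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

lookup = "SELECT * FROM Transactions WHERE CustomerID = 42"

plan_before = query_plan(conn, lookup)  # no index yet: full table scan
conn.execute(
    "CREATE INDEX idx_transactions_customer ON Transactions (CustomerID)"
)
plan_after = query_plan(conn, lookup)   # planner can now search the index

print("Before:", plan_before)
print("After: ", plan_after)
```

The plan text before the index should describe a scan of the table, and the plan after it should mention the new index by name.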

2. Identify Customers with Purchases in Consecutive Months

Using window functions:

sql
WITH ConsecutivePurchases AS (
    SELECT 
        CustomerID, 
        -- Year * 12 + month gives a running month number, so December -> January counts as consecutive
        YEAR(TransactionDate) * 12 + MONTH(TransactionDate) AS MonthIndex, 
        LAG(YEAR(TransactionDate) * 12 + MONTH(TransactionDate))
            OVER (PARTITION BY CustomerID ORDER BY TransactionDate) AS PrevMonthIndex
    FROM Transactions
)
SELECT DISTINCT CustomerID
FROM ConsecutivePurchases
WHERE MonthIndex - PrevMonthIndex = 1;

This query checks for customers with transactions in back-to-back months.
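One thing to watch with this kind of check is the year boundary: month numbers alone wrap from 12 back to 1, and the same two month numbers can be a year apart. A rough Python sketch of the same idea, assuming transactions arrive as (customer_id, date) pairs (an illustrative format, not one from the interview), converts each date to a running month index first:

```python
from collections import defaultdict
from datetime import date

def customers_with_consecutive_months(transactions):
    """Return customers who purchased in two back-to-back calendar months.

    transactions: iterable of (customer_id, datetime.date) pairs.
    """
    months = defaultdict(set)  # customer -> set of running month indexes
    for customer_id, tx_date in transactions:
        months[customer_id].add(tx_date.year * 12 + tx_date.month)

    return {
        customer_id
        for customer_id, idxs in months.items()
        if any(m + 1 in idxs for m in idxs)
    }

# Example usage with made-up data:
sample = [
    ("a", date(2023, 12, 5)), ("a", date(2024, 1, 9)),   # Dec -> Jan: consecutive
    ("b", date(2023, 1, 3)), ("b", date(2024, 2, 3)),    # Jan 2023, Feb 2024: not consecutive
]
print(customers_with_consecutive_months(sample))  # {'a'}
```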

3. Monthly User Retention Rate

Calculate the number of returning users per month:

sql
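WITH MonthlyUsers AS (
    -- One row per user per calendar month
    SELECT DISTINCT
        UserID,
        YEAR(TransactionDate) AS Yr,
        MONTH(TransactionDate) AS Mon,
        YEAR(TransactionDate) * 12 + MONTH(TransactionDate) AS MonthIndex
    FROM Transactions
),
Flagged AS (
    SELECT *,
        LAG(MonthIndex) OVER (PARTITION BY UserID ORDER BY MonthIndex) AS PrevMonthIndex
    FROM MonthlyUsers
)
SELECT Yr, Mon,
       COUNT(*) AS TotalUsers,
       100.0 * COUNT(CASE WHEN PrevMonthIndex = MonthIndex - 1 THEN 1 END) / COUNT(*) AS RetentionRate
FROM Flagged
GROUP BY Yr, Mon
ORDER BY Yr, Mon;

For each month, this computes the percentage of active users who also made a transaction in the previous month. MonthIndex (year * 12 + month) keeps the comparison correct across year boundaries.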
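Retention has several competing definitions, so it helps to state one explicitly. The sketch below assumes that a month's retention rate is the share of that month's active users who were also active in the immediately preceding month, and that transactions arrive as (user_id, date) pairs. Both are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

def monthly_retention(transactions):
    """Return {(year, month): retention %} from (user_id, date) pairs."""
    active = defaultdict(set)  # running month index -> users active that month
    for user_id, tx_date in transactions:
        # year * 12 + (month - 1), so idx // 12 is the year and idx % 12 + 1 the month
        active[tx_date.year * 12 + tx_date.month - 1].add(user_id)

    rates = {}
    for idx in sorted(active):
        users = active[idx]
        retained = users & active.get(idx - 1, set())  # also active last month
        rates[(idx // 12, idx % 12 + 1)] = 100.0 * len(retained) / len(users)
    return rates

# Example usage with made-up data:
sample = [
    ("u1", date(2023, 12, 3)), ("u2", date(2023, 12, 20)),
    ("u1", date(2024, 1, 5)), ("u3", date(2024, 1, 8)),
]
print(monthly_retention(sample))  # {(2023, 12): 0.0, (2024, 1): 50.0}
```

The first month in the data always shows 0% under this definition, because there is no earlier month to compare against.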

4. Nth Highest Salary (Dynamic n)

To find the nth highest salary dynamically:

sql
SELECT DISTINCT Salary 
FROM Employees 
ORDER BY Salary DESC 
LIMIT 1 OFFSET :n - 1;

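Here, :n is a bind parameter supplied at run time, and DISTINCT makes tied salaries count once. LIMIT ... OFFSET is MySQL, PostgreSQL, and SQLite syntax; MySQL does not accept an arithmetic expression in OFFSET, so compute n - 1 in the calling code there. In SQL Server, use OFFSET ... ROWS FETCH NEXT 1 ROWS ONLY instead.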
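As a sketch of how the parameter might be passed from application code, here is the same query run through Python's sqlite3 module. The offset n - 1 is computed in Python and bound with a ? placeholder; the helper name nth_highest_salary and the sample rows are illustrative.

```python
import sqlite3

def nth_highest_salary(conn, n):
    """Return the nth highest distinct salary, or None if fewer than n exist."""
    row = conn.execute(
        "SELECT DISTINCT Salary FROM Employees "
        "ORDER BY Salary DESC LIMIT 1 OFFSET ?",
        (n - 1,),  # offset computed in Python, bound as a parameter
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Ana", 90), ("Ben", 120), ("Cy", 120), ("Dee", 75)],
)

print(nth_highest_salary(conn, 2))  # 90 (the two 120s count once)
print(nth_highest_salary(conn, 4))  # None (only three distinct salaries)
```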

5. Top 5 Products by Sales Volume Excluding Recent Zero Sales

sql
SELECT TOP 5 ProductID, SUM(SalesVolume) AS TotalSales
FROM Sales
WHERE ProductID NOT IN (
    SELECT ProductID 
    FROM Sales 
    WHERE TransactionDate >= DATEADD(MONTH, -3, GETDATE()) AND SalesVolume = 0
)
GROUP BY ProductID
ORDER BY TotalSales DESC;

This query uses SQL Server syntax (TOP, DATEADD, GETDATE); in MySQL or PostgreSQL, use LIMIT 5 with that dialect's date arithmetic. It excludes any product that has a row with SalesVolume = 0 in the last three months.

6. Cumulative Revenue by Month for Each Product Category

Using SUM() with window functions:

sql
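SELECT 
    CategoryID, 
    YEAR(SaleDate) AS SaleYear,
    MONTH(SaleDate) AS SaleMonth, 
    SUM(SUM(Revenue)) OVER (PARTITION BY CategoryID 
                            ORDER BY YEAR(SaleDate), MONTH(SaleDate)) AS CumulativeRevenue
FROM Sales
GROUP BY CategoryID, YEAR(SaleDate), MONTH(SaleDate)
ORDER BY CategoryID, SaleYear, SaleMonth;

This first aggregates revenue per category and month, then keeps a running total of those monthly sums within each category.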
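A cumulative figure by month involves two steps: sum revenue per category and month, then accumulate those monthly sums in date order within each category. A small Python sketch of that two-step logic, assuming sales come as (category_id, date, revenue) tuples (an illustrative format), looks like this:

```python
from collections import defaultdict
from datetime import date

def cumulative_revenue(sales):
    """Return (category, year, month, running_total) rows in sorted order."""
    # Step 1: total revenue per category and calendar month
    monthly = defaultdict(float)
    for category_id, sale_date, revenue in sales:
        monthly[(category_id, sale_date.year, sale_date.month)] += revenue

    # Step 2: running total of the monthly sums within each category
    running = defaultdict(float)
    rows = []
    for category_id, year, month in sorted(monthly):
        running[category_id] += monthly[(category_id, year, month)]
        rows.append((category_id, year, month, running[category_id]))
    return rows

# Example usage with made-up data:
sample = [
    ("A", date(2024, 1, 3), 10), ("A", date(2024, 1, 20), 5),
    ("A", date(2024, 2, 1), 20), ("B", date(2024, 1, 9), 7),
]
for row in cumulative_revenue(sample):
    print(row)
```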

7. Differences Between SQL Joins

LEFT JOIN: Returns all rows from the left table plus the matching rows from the right table; right-side columns are NULL where there is no match. Use it when unmatched rows from the left table must be kept.
RIGHT JOIN: The mirror image of LEFT JOIN; it keeps every row from the right table, including unmatched ones.
FULL OUTER JOIN: Returns all rows from both tables, matched where possible and NULL-padded where not. Use it when you need complete data from both sides.
For instance, a FULL OUTER JOIN is helpful for reconciling two datasets with missing values on either side.
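The reconciliation case can be sketched with Python's sqlite3 module. The Ledger and Bank tables are invented for illustration. Because older SQLite versions lack FULL OUTER JOIN, the sketch emulates it as a LEFT JOIN plus the right-side rows that found no match; on engines that support it, a plain FULL OUTER JOIN would express the same thing directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Ledger (InvoiceID INTEGER, Amount INTEGER);
    CREATE TABLE Bank   (InvoiceID INTEGER, Paid INTEGER);
    INSERT INTO Ledger VALUES (1, 100), (2, 200);
    INSERT INTO Bank   VALUES (2, 200), (3, 300);
""")

# LEFT JOIN: every ledger row, with Paid = NULL (None) when the bank has no match
left_rows = conn.execute("""
    SELECT l.InvoiceID, l.Amount, b.Paid
    FROM Ledger AS l LEFT JOIN Bank AS b ON b.InvoiceID = l.InvoiceID
    ORDER BY l.InvoiceID
""").fetchall()

# Emulated FULL OUTER JOIN: the left join above, plus bank rows missing from the ledger
full_rows = conn.execute("""
    SELECT l.InvoiceID, l.Amount, b.Paid
    FROM Ledger AS l LEFT JOIN Bank AS b ON b.InvoiceID = l.InvoiceID
    UNION ALL
    SELECT b.InvoiceID, NULL, b.Paid
    FROM Bank AS b LEFT JOIN Ledger AS l ON l.InvoiceID = b.InvoiceID
    WHERE l.InvoiceID IS NULL
    ORDER BY 1
""").fetchall()

print(left_rows)  # [(1, 100, None), (2, 200, 200)]
print(full_rows)  # [(1, 100, None), (2, 200, 200), (3, None, 300)]
```

Invoice 1 is missing from the bank side and invoice 3 is missing from the ledger side; only the full outer result surfaces both gaps.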

8. HAVING vs. WHERE

WHERE: Filters rows before aggregation.
HAVING: Filters groups after aggregation.
Example: To find departments with more than 10 employees:

sql
SELECT DepartmentID, COUNT(*) AS EmployeeCount 
FROM Employees 
GROUP BY DepartmentID 
HAVING COUNT(*) > 10;
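The two clauses are often combined in one query. The sketch below, run through Python's sqlite3 module, first drops inactive employees with WHERE and then keeps only departments with more than two remaining employees with HAVING. The Status column, the threshold, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (DepartmentID INTEGER, Status TEXT)")
# Department 1: 3 active + 2 inactive; department 2: 2 active
rows = [(1, "active")] * 3 + [(1, "inactive")] * 2 + [(2, "active")] * 2
conn.executemany("INSERT INTO Employees VALUES (?, ?)", rows)

result = conn.execute("""
    SELECT DepartmentID, COUNT(*) AS EmployeeCount
    FROM Employees
    WHERE Status = 'active'      -- row filter, applied before grouping
    GROUP BY DepartmentID
    HAVING COUNT(*) > 2          -- group filter, applied after aggregation
""").fetchall()

print(result)  # [(1, 3)]
```

Department 1 has five employees in total, but the count is 3 because WHERE removed the inactive rows before COUNT(*) ran.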

Python Questions

1. Palindrome Checker

python
def is_palindrome(s):
    cleaned = ''.join(char.lower() for char in s if char.isalnum())
    return cleaned == cleaned[::-1]

# Example usage:
print(is_palindrome("A man, a plan, a canal: Panama"))  # True

2. Deep Copy vs. Shallow Copy

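Shallow Copy: Creates a new outer object but shares references to the nested objects inside it (e.g., copy.copy() or slicing a list), so mutating a nested object is visible through both copies.
Deep Copy: Recursively copies the object and everything nested inside it (e.g., copy.deepcopy()), so the copy is fully independent of the original.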
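A short demonstration makes the difference concrete. The dictionary below is made-up sample data; the point is what happens to each copy when the original's nested list is mutated.

```python
import copy

original = {"name": "Q1", "scores": [1, 2, 3]}
shallow = copy.copy(original)      # new dict, same nested list object
deep = copy.deepcopy(original)     # new dict, new nested list

original["scores"].append(4)       # mutate the nested list in place
original["name"] = "Q2"            # rebind a top-level key on the original only

print(shallow["scores"])  # [1, 2, 3, 4] (shared with the original)
print(deep["scores"])     # [1, 2, 3]    (independent copy)
print(shallow["name"])    # Q1           (top-level keys are not shared)
```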

3. Find Unique Pairs with Target Sum

python
def find_pairs(nums, target):
    seen = set()
    pairs = set()
    for num in nums:
        complement = target - num
        if complement in seen:
            pairs.add((min(num, complement), max(num, complement)))
        seen.add(num)
    return pairs

# Example usage:
print(find_pairs([1, 2, 3, 4, 5], 5))  # {(1, 4), (2, 3)} (set order may vary)

4. Python Decorators

A decorator is a callable that takes a function and returns a replacement, typically a wrapper that adds behavior without changing the original function's code.
Example: Logging execution time.

python
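import functools
import time

def timer(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic clock suited to measuring durations
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"Execution time: {end - start:.2f} seconds")
        return result
    return wrapper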

@timer
def example_function():
    time.sleep(2)
    print("Function executed!")

example_function()

Reflections and Key Takeaways

This interview experience reinforced the importance of:

SQL Optimization: Structuring queries for efficiency.
Python Mastery: Handling data manipulation and algorithmic challenges.
Conceptual Clarity: Understanding core database and programming principles.

Preparing for such questions not only boosts confidence but also sharpens real-world problem-solving skills.

Your Turn!
How would you approach these questions? Share your solutions below!

#Data_Analytics #SQL #Python #CareerDevelopment
