PwC Data Analyst Interview question and its answer

PwC Data Analyst Interview Experience (1–3 Years)

Are you preparing for a data analyst role at PwC or a similar organization? Here’s my recent experience tackling some challenging SQL and Python interview questions during the selection process for a PwC Data Analyst role. These questions test both foundational knowledge and problem-solving skills. Here's how I approached them.



SQL Questions

1. How Indexing Works in SQL

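Indexing improves query performance by letting the database locate matching rows without scanning the whole table. A clustered index determines the physical order of the rows, so a table can have only one, while a non-clustered index is a separate structure that stores the indexed values along with pointers to the rows. Index the columns that appear most often in WHERE or JOIN clauses, such as CustomerID in a Transactions table, and keep in mind that every index adds overhead to inserts and updates.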
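
As a rough illustration of the effect, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table layout, the index name idx_transactions_customer, and the sample rows are assumptions made for the demo. SQLite does not have SQL Server's clustered/non-clustered distinction, so this only shows the planner switching from a full scan to an index lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions ("
    "TransactionID INTEGER PRIMARY KEY, CustomerID INTEGER, Amount REAL)"
)
conn.executemany(
    "INSERT INTO Transactions (CustomerID, Amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

def query_plan(connection):
    """Return SQLite's plan description for a lookup by CustomerID."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM Transactions WHERE CustomerID = 42"
    ).fetchall()
    # The last column of each plan row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)

plan_before = query_plan(conn)  # expected to describe a full table scan
conn.execute(
    "CREATE INDEX idx_transactions_customer ON Transactions (CustomerID)"
)
plan_after = query_plan(conn)   # expected to mention the new index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, which is why no expected output is shown.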


2. Identify Customers with Purchases in Consecutive Months

Using window functions:

sql
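WITH ConsecutivePurchases AS (
    SELECT
        CustomerID,
        -- Month index that keeps increasing across year boundaries
        YEAR(TransactionDate) * 12 + MONTH(TransactionDate) AS MonthIndex,
        LAG(YEAR(TransactionDate) * 12 + MONTH(TransactionDate))
            OVER (PARTITION BY CustomerID ORDER BY TransactionDate) AS PrevMonthIndex
    FROM Transactions
)
SELECT DISTINCT CustomerID
FROM ConsecutivePurchases
WHERE MonthIndex - PrevMonthIndex = 1;

This query checks for customers with transactions in back-to-back months. Comparing a combined year-and-month index, rather than MONTH() alone, means December followed by January counts as consecutive, while March 2023 followed by April 2024 does not.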
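
A minimal Python sketch of the same check can help validate the SQL on a small sample. It numbers months as year * 12 + month so that a December purchase followed by a January purchase counts as consecutive. The function name and the sample data are hypothetical.

```python
from datetime import date

def customers_with_consecutive_months(transactions):
    """transactions: iterable of (customer_id, date) pairs."""
    months_by_customer = {}
    for customer_id, tx_date in transactions:
        # year * 12 + month keeps increasing across December -> January.
        month_index = tx_date.year * 12 + tx_date.month
        months_by_customer.setdefault(customer_id, set()).add(month_index)
    return {
        customer_id
        for customer_id, months in months_by_customer.items()
        if any(m + 1 in months for m in months)
    }

sample = [
    ("A", date(2023, 12, 20)), ("A", date(2024, 1, 3)),  # Dec -> Jan
    ("B", date(2024, 3, 1)), ("B", date(2024, 5, 9)),    # one-month gap
    ("C", date(2023, 3, 1)), ("C", date(2024, 4, 1)),    # a year apart
]
result = customers_with_consecutive_months(sample)
print(result)  # {'A'}
```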


3. Monthly User Retention Rate

Calculate the number of returning users per month:

sql
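WITH MonthlyUsers AS (
    -- One row per user per calendar month of activity
    SELECT DISTINCT UserID,
           YEAR(TransactionDate) AS Yr,
           MONTH(TransactionDate) AS Mon
    FROM Transactions
),
Flagged AS (
    -- PrevYr is NULL only for a user's first active month
    SELECT UserID, Yr, Mon,
           LAG(Yr) OVER (PARTITION BY UserID ORDER BY Yr, Mon) AS PrevYr
    FROM MonthlyUsers
)
SELECT Yr, Mon,
       COUNT(*) AS TotalUsers,
       COUNT(PrevYr) AS RetainedUsers,
       100.0 * COUNT(PrevYr) / COUNT(*) AS RetentionRate
FROM Flagged
GROUP BY Yr, Mon
ORDER BY Yr, Mon;

This computes the percentage of retained users for each month. A user counts as retained in a month if they also transacted in any earlier month; COUNT(PrevYr) skips NULLs, so it counts exactly those returning users.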
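
To sanity-check a retention query, the same figure can be computed in plain Python. This is a sketch under one stated assumption: a user is "retained" in a month if they were active in any earlier month. The function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date

def monthly_retention(transactions):
    """Return {(year, month): retention %} from (user_id, date) pairs."""
    users_by_month = defaultdict(set)
    for user_id, tx_date in transactions:
        users_by_month[(tx_date.year, tx_date.month)].add(user_id)

    seen_before = set()  # users active in any earlier month
    rates = {}
    for month in sorted(users_by_month):
        active = users_by_month[month]
        retained = active & seen_before
        rates[month] = 100.0 * len(retained) / len(active)
        seen_before |= active
    return rates

sample = [
    (1, date(2024, 1, 5)), (2, date(2024, 1, 9)),
    (1, date(2024, 2, 2)), (3, date(2024, 2, 14)),
    (1, date(2024, 3, 1)), (2, date(2024, 3, 7)),
    (3, date(2024, 3, 8)), (4, date(2024, 3, 9)),
]
rates = monthly_retention(sample)
print(rates)  # {(2024, 1): 0.0, (2024, 2): 50.0, (2024, 3): 75.0}
```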


4. Nth Highest Salary (Dynamic n)

To find the nth highest salary dynamically:

sql
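SELECT DISTINCT Salary
FROM Employees
ORDER BY Salary DESC
LIMIT 1 OFFSET :n - 1;

Here, :n is a bind parameter supplied at run time. DISTINCT collapses duplicate salaries, and the OFFSET skips the n - 1 higher values. LIMIT/OFFSET is the PostgreSQL, MySQL, and SQLite form, and not every engine accepts an expression in OFFSET; if yours does not, compute n - 1 in the application and bind that value instead.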
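
The sketch below shows one way to pass n from application code, using Python's built-in sqlite3 module. The in-memory table, the sample salaries, and the helper name nth_highest_salary are assumptions for the demo. The offset is computed in Python and bound as a named parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees (Salary) VALUES (?)",
    [(100,), (200,), (200,), (300,)],
)

def nth_highest_salary(connection, n):
    """Return the nth highest distinct salary, or None if fewer than n exist."""
    row = connection.execute(
        "SELECT DISTINCT Salary FROM Employees "
        "ORDER BY Salary DESC LIMIT 1 OFFSET :offset",
        {"offset": n - 1},  # n - 1 is computed here, then bound by name
    ).fetchone()
    return row[0] if row else None

print(nth_highest_salary(conn, 2))  # 200
print(nth_highest_salary(conn, 4))  # None
```

Because of DISTINCT, the two employees earning 200 count as a single rank, so the third highest salary here is 100.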


5. Top 5 Products by Sales Volume Excluding Recent Zero Sales

sql
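SELECT TOP 5 ProductID, SUM(SalesVolume) AS TotalSales
FROM Sales
WHERE ProductID NOT IN (
    SELECT ProductID
    FROM Sales
    WHERE TransactionDate >= DATEADD(MONTH, -3, GETDATE())
      AND SalesVolume = 0
)
GROUP BY ProductID
ORDER BY TotalSales DESC;

This query excludes every product that has a zero-volume sales row in the last three months, then returns the five remaining products with the highest total volume. It uses SQL Server syntax throughout (TOP, DATEADD, GETDATE); in MySQL or PostgreSQL, use LIMIT 5 and that engine's date arithmetic instead.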


6. Cumulative Revenue by Month for Each Product Category

Using SUM() with window functions:

sql
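SELECT
    CategoryID,
    YEAR(SaleDate) AS SaleYear,
    MONTH(SaleDate) AS SaleMonth,
    -- Inner SUM totals each month; outer SUM accumulates those totals
    SUM(SUM(Revenue)) OVER (
        PARTITION BY CategoryID
        ORDER BY YEAR(SaleDate), MONTH(SaleDate)
    ) AS CumulativeRevenue
FROM Sales
GROUP BY CategoryID, YEAR(SaleDate), MONTH(SaleDate);

This first totals revenue per product category and month, then accumulates those monthly totals within each category. Including the year keeps the running total in chronological order when the data spans more than one year.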
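
Cumulative revenue by month is a two-step computation: total the revenue per category and month, then keep a running sum of those totals within each category. The Python sketch below spells out both steps. The function name, the 'YYYY-MM' month strings, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from itertools import accumulate

def cumulative_revenue(sales):
    """sales: iterable of (category_id, 'YYYY-MM', revenue) tuples."""
    monthly = defaultdict(float)
    for category_id, month, revenue in sales:
        # Step 1: total revenue per category and month.
        monthly[(category_id, month)] += revenue

    result = {}
    for category_id in sorted({cat for cat, _ in monthly}):
        months = sorted(m for cat, m in monthly if cat == category_id)
        # Step 2: running sum of the monthly totals, in month order.
        totals = accumulate(monthly[(category_id, m)] for m in months)
        result[category_id] = list(zip(months, totals))
    return result

sample = [
    ("Toys", "2024-01", 100.0), ("Toys", "2024-01", 50.0),
    ("Toys", "2024-02", 25.0),
    ("Books", "2024-01", 10.0), ("Books", "2024-03", 30.0),
]
running = cumulative_revenue(sample)
print(running["Toys"])   # [('2024-01', 150.0), ('2024-02', 175.0)]
print(running["Books"])  # [('2024-01', 10.0), ('2024-03', 40.0)]
```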


7. Differences Between SQL Joins

  • LEFT JOIN: Returns all rows from the left table plus the matching rows from the right table; right-side columns are NULL where there is no match. Use it when unmatched rows from the left table must be kept.
  • RIGHT JOIN: The mirror image of LEFT JOIN; it keeps all rows from the right table, including unmatched ones.
  • FULL OUTER JOIN: Returns all rows from both tables, matched where possible and filled with NULLs where not. Use it when you need complete data from both.
    For instance, a FULL OUTER JOIN is helpful for reconciling two datasets with missing values on either side.
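
The difference is easiest to see on a tiny dataset. The sketch below uses Python's built-in sqlite3 module; the Customers and Orders tables and their rows are hypothetical. Because older SQLite builds lack FULL OUTER JOIN, the full join is emulated here by appending the right-only rows to the LEFT JOIN result, which is an assumption of this demo rather than how other engines need to be queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO Orders VALUES (10, 1), (11, 3);
""")

# LEFT JOIN: every customer, with NULL (None) where no order matches.
left_rows = conn.execute("""
    SELECT c.CustomerID, o.OrderID
    FROM Customers c
    LEFT JOIN Orders o ON o.CustomerID = c.CustomerID
    ORDER BY c.CustomerID
""").fetchall()

# Emulated FULL OUTER JOIN: LEFT JOIN rows plus orders with no customer.
right_only = conn.execute("""
    SELECT NULL, o.OrderID
    FROM Orders o
    LEFT JOIN Customers c ON c.CustomerID = o.CustomerID
    WHERE c.CustomerID IS NULL
""").fetchall()
full_rows = left_rows + right_only

print(left_rows)  # [(1, 10), (2, None)]
print(full_rows)  # [(1, 10), (2, None), (None, 11)]
```

Customer 2 has no order and order 11 has no customer, so only the full join surfaces both gaps.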

8. HAVING vs. WHERE

  • WHERE: Filters rows before aggregation.
  • HAVING: Filters groups after aggregation.
    Example: To find departments with more than 10 employees:
sql
SELECT DepartmentID, COUNT(*) AS EmployeeCount
FROM Employees
GROUP BY DepartmentID
HAVING COUNT(*) > 10;
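
To see both filters in one query, here is a small sketch using Python's built-in sqlite3 module. The Salary column, the thresholds, and the sample rows are assumptions added for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, DepartmentID INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees (DepartmentID, Salary) VALUES (?, ?)",
    [(1, 40), (1, 60), (1, 80), (2, 90), (2, 30)],
)

rows = conn.execute("""
    SELECT DepartmentID, COUNT(*) AS EmployeeCount
    FROM Employees
    WHERE Salary >= 50        -- row filter: applied before grouping
    GROUP BY DepartmentID
    HAVING COUNT(*) >= 2      -- group filter: applied to the counts
    ORDER BY DepartmentID
""").fetchall()

print(rows)  # [(1, 2)]
```

WHERE removes the two low-salary rows first, so department 2 is left with one employee and is then dropped by HAVING.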

Python Questions

1. Palindrome Checker

python
def is_palindrome(s):
    # Keep only letters and digits, lowercased, then compare with the reverse
    cleaned = ''.join(char.lower() for char in s if char.isalnum())
    return cleaned == cleaned[::-1]

# Example usage:
print(is_palindrome("A man, a plan, a canal: Panama"))  # True

2. Deep Copy vs. Shallow Copy

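  • Shallow Copy: Creates a new outer object but fills it with references to the same nested objects (e.g., copy.copy() or slicing a list), so a change to a nested object shows up in both copies.
  • Deep Copy: Recursively copies the object and everything nested inside it (e.g., copy.deepcopy()), so the copy is fully independent.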
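
A short sketch makes the difference visible. The nested list used here is an arbitrary example.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)    # new outer list, same inner lists
deep = copy.deepcopy(original)   # new outer list and new inner lists

original[0].append(99)           # mutate a nested object

print(shallow[0])  # [1, 2, 99]
print(deep[0])     # [1, 2]
print(shallow[0] is original[0], deep[0] is original[0])  # True False
```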

3. Find Unique Pairs with Target Sum

python
def find_pairs(nums, target):
    seen = set()   # numbers visited so far
    pairs = set()  # unique (smaller, larger) pairs that sum to target
    for num in nums:
        complement = target - num
        if complement in seen:
            pairs.add((min(num, complement), max(num, complement)))
        seen.add(num)
    return pairs

# Example usage (sorted, because set ordering is not guaranteed):
print(sorted(find_pairs([1, 2, 3, 4, 5], 5)))  # [(1, 4), (2, 3)]

4. Python Decorators

A decorator modifies a function’s behavior without changing its code.
Example: Logging execution time.

python
import functools
import time

def timer(func):
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"Execution time: {end - start:.2f} seconds")
        return result
    return wrapper

@timer
def example_function():
    time.sleep(2)
    print("Function executed!")

example_function()

Reflections and Key Takeaways

This interview experience reinforced the importance of:

  1. SQL Optimization: Structuring queries for efficiency.
  2. Python Mastery: Handling data manipulation and algorithmic challenges.
  3. Conceptual Clarity: Understanding core database and programming principles.

Preparing for such questions not only boosts confidence but also sharpens real-world problem-solving skills.

Your Turn!
How would you approach these questions? Share your solutions below!

#Data_Analytics #SQL #Python #CareerDevelopment
