SQL Questions Asked in an American Express Interview

How I Would Solve These Tricky SQL Questions Asked in American Express Interview

SQL is a fundamental skill for any data analyst, and mastering complex queries is key to standing out in interviews. Below, I break down how I would approach solving the tricky SQL questions mentioned. Each of these challenges is designed to test both your technical proficiency and your problem-solving ability. Let’s dive into the solutions.




1. Find the Second-Highest Salary in a Table Without Using LIMIT or TOP

This is a classic problem that requires creativity. My solution:

sql
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Here, the subquery finds the maximum salary, and the outer query selects the highest salary below that.
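
To see how this behaves with ties, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. The sample rows and the two-column table layout are assumptions made up for illustration, not data from the interview.

```python
import sqlite3

# Hypothetical sample data; the top salary appears twice on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 900), (2, 1200), (3, 1200), (4, 700)],
)

SECOND_HIGHEST_SQL = """
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees)
"""

# The duplicated top salary (1200) is skipped as a whole, leaving 900.
second_highest = conn.execute(SECOND_HIGHEST_SQL).fetchone()[0]

# With a single distinct salary left, nothing is below the maximum,
# so MAX() over zero rows comes back as NULL (None in Python).
conn.execute("DELETE FROM employees WHERE salary < 1200")
no_second = conn.execute(SECOND_HIGHEST_SQL).fetchone()[0]

print(second_highest, no_second)
```

The NULL result for a table with only one distinct salary is worth mentioning in an interview, since it is usually the follow-up question.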


2. Find All Employees Who Earn More Than Their Managers

Joining the table to itself is the key here:

sql
SELECT e1.employee_name
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.employee_id
WHERE e1.salary > e2.salary;

This query compares employees' salaries with their managers' salaries.
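
The self-join can be tried out on a small in-memory table. This is only a sketch: the names, salaries, and the four-column layout are hypothetical, and SQLite stands in for whatever database the interviewer has in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT,
        salary        INTEGER,
        manager_id    INTEGER  -- NULL for the top of the hierarchy
    )
    """
)
# Hypothetical rows: Asha manages Bilal and Chen, Chen manages Dana.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", 100, None),
        (2, "Bilal", 120, 1),
        (3, "Chen", 90, 1),
        (4, "Dana", 95, 3),
    ],
)

rows = conn.execute(
    """
    SELECT e1.employee_name
    FROM employees e1
    JOIN employees e2 ON e1.manager_id = e2.employee_id
    WHERE e1.salary > e2.salary
    ORDER BY e1.employee_name
    """
).fetchall()

# e1 is the employee, e2 is that employee's manager.
higher_paid = [name for (name,) in rows]
print(higher_paid)
```

Because this is an inner join, Asha (who has no manager) never appears in the comparison at all.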


3. Find Duplicate Rows Without Using GROUP BY

Use a ROW_NUMBER() window function to isolate duplicates:

sql
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
    FROM table_name
)
SELECT *
FROM cte
WHERE row_num > 1;

This returns every copy after the first within each (column1, column2) group, so duplicates are identified without relying on GROUP BY.
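
A small run makes it clear which rows come back. The sketch below assumes an integer id column and made-up values, and it needs SQLite 3.25 or newer for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)"
)
# Hypothetical rows: ('a', 'x') appears three times, everything else once.
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"), (4, "a", "x"), (5, "b", "z")],
)

rows = conn.execute(
    """
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
        FROM table_name
    )
    SELECT id, column1, column2, row_num
    FROM cte
    WHERE row_num > 1
    ORDER BY id
    """
).fetchall()

# Only the second and third copies of ('a', 'x') are reported.
duplicate_ids = [row[0] for row in rows]
print(duplicate_ids)
```

The first occurrence (id 1) is kept out of the result, which is exactly what is needed when the next step is deleting the extra copies.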


4. Find the Top 10% Earners in a Table

Using PERCENTILE_CONT:

sql
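WITH percentiles AS (
    SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY salary) AS percentile_90
    FROM employees
)
SELECT *
FROM employees
CROSS JOIN percentiles
WHERE salary > percentile_90;

This query computes the 90th percentile of salary and selects employees earning above that threshold. The aggregate form of PERCENTILE_CONT shown here works in PostgreSQL and Oracle; SQL Server supports it only as a window function with an OVER clause.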
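
PERCENTILE_CONT interpolates between neighbouring values rather than picking an existing salary. SQLite has no such function, so the sketch below reproduces the interpolation in plain Python instead of running the SQL. The helper name percentile_cont and the salary list are illustrative assumptions.

```python
import math


def percentile_cont(values, fraction):
    """Linearly interpolated percentile, mirroring how PERCENTILE_CONT is defined."""
    ordered = sorted(values)
    if not ordered:
        return None
    # 0-based fractional position of the requested percentile in the sorted list.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    # Blend the two neighbouring values according to the fractional part.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Hypothetical salaries (in thousands).
salaries = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

threshold = percentile_cont(salaries, 0.9)
top_earners = [s for s in salaries if s > threshold]
print(threshold, top_earners)
```

With ten salaries the threshold lands roughly a tenth of the way between the ninth and tenth values, so only the highest salary clears the strict `>` comparison.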


5. Find the Cumulative Sum of a Column

A window function simplifies this task:

sql
SELECT employee_id,
       salary,
       SUM(salary) OVER (ORDER BY employee_id) AS cumulative_sum
FROM employees;

SUM() with OVER (ORDER BY employee_id) computes a running total: each row's value is the sum of all salaries up to and including that row, in employee_id order.
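
The running total is easy to verify by hand on three rows. This sketch uses made-up salaries and assumes SQLite 3.25 or newer for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER)")
# Hypothetical salaries.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 100), (2, 200), (3, 50)],
)

running_totals = conn.execute(
    """
    SELECT employee_id,
           salary,
           SUM(salary) OVER (ORDER BY employee_id) AS cumulative_sum
    FROM employees
    ORDER BY employee_id
    """
).fetchall()

# Each tuple is (employee_id, salary, cumulative_sum).
for row in running_totals:
    print(row)
```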


6. Find Employees Who Have Never Taken a Leave

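A NOT IN query solves this, provided the subquery cannot return NULL:

sql
SELECT *
FROM employees
WHERE employee_id NOT IN (
    SELECT employee_id
    FROM leaves
    WHERE employee_id IS NOT NULL
);

This selects only employees without leave records. The IS NOT NULL filter matters: a single NULL in the subquery result makes NOT IN return no rows at all.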
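
One caveat with NOT IN is how it reacts to NULL values in the subquery. The sketch below compares an unfiltered NOT IN against NOT EXISTS on made-up data in which one leave record has a NULL employee_id. The table layouts are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, employee_name TEXT)")
conn.execute("CREATE TABLE leaves (employee_id INTEGER, leave_date TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "Asha"), (2, "Bilal"), (3, "Chen")],
)
# Hypothetical leave records; the second one has no employee_id.
conn.executemany(
    "INSERT INTO leaves VALUES (?, ?)",
    [(1, "2024-03-04"), (None, "2024-03-05")],
)

# Unfiltered NOT IN: the NULL turns every comparison into "unknown".
not_in_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT employee_id
        FROM employees
        WHERE employee_id NOT IN (SELECT employee_id FROM leaves)
        ORDER BY employee_id
        """
    )
]

# NOT EXISTS only checks for matching rows, so NULLs do not interfere.
not_exists_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT e.employee_id
        FROM employees e
        WHERE NOT EXISTS (
            SELECT 1 FROM leaves l WHERE l.employee_id = e.employee_id
        )
        ORDER BY e.employee_id
        """
    )
]

print(not_in_ids, not_exists_ids)
```

Either filtering NULLs out of the subquery or switching to NOT EXISTS gives the expected employees.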


7. Difference Between Current Row and Next Row

Using the LEAD() window function:

sql
SELECT employee_id,
       salary,
       salary - LEAD(salary) OVER (ORDER BY employee_id) AS diff
FROM employees;

The LEAD() function retrieves the next row's value for the comparison. The last row has no next row, so its diff is NULL.
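
Running the query on three made-up rows shows both the sign of the difference and what happens on the final row. As with the other window function sketches, this assumes SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER)")
# Hypothetical salaries.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 100), (2, 150), (3, 120)],
)

diffs = conn.execute(
    """
    SELECT employee_id,
           salary,
           salary - LEAD(salary) OVER (ORDER BY employee_id) AS diff
    FROM employees
    ORDER BY employee_id
    """
).fetchall()

# diff is current salary minus the next row's salary; None marks the last row.
for row in diffs:
    print(row)
```

If a NULL on the last row is not acceptable, LEAD() also accepts an offset and a default value as its second and third arguments.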


8. Find Departments With More Than One Employee

Classic aggregation with a HAVING clause:

sql
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 1;

This identifies departments with multiple employees.
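
A short sketch with made-up department names shows HAVING dropping the single-person department after the grouping has happened. The table layout is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department TEXT)")
# Hypothetical staffing: Eng has three people, Sales two, HR one.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "Sales"), (2, "Sales"), (3, "HR"), (4, "Eng"), (5, "Eng"), (6, "Eng")],
)

multi_person_departments = conn.execute(
    """
    SELECT department, COUNT(*) AS employee_count
    FROM employees
    GROUP BY department
    HAVING COUNT(*) > 1
    ORDER BY department
    """
).fetchall()

print(multi_person_departments)
```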


9. Maximum Value for Each Group Without Using GROUP BY

Using a window function:

sql
SELECT *
FROM (
    SELECT *,
           MAX(column_name) OVER (PARTITION BY group_column) AS max_in_group
    FROM table_name
) subquery
WHERE column_name = max_in_group;

This method avoids the need for explicit GROUP BY.
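
Unlike GROUP BY, this approach returns whole rows, and it returns more than one row per group when the maximum is tied. The sketch below uses made-up values to show that, again assuming SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (group_column TEXT, column_name INTEGER)")
# Hypothetical rows: group 'b' has two rows tied at its maximum.
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [("a", 10), ("a", 30), ("b", 20), ("b", 20), ("c", 5)],
)

max_rows = conn.execute(
    """
    SELECT group_column, column_name, max_in_group
    FROM (
        SELECT *,
               MAX(column_name) OVER (PARTITION BY group_column) AS max_in_group
        FROM table_name
    ) subquery
    WHERE column_name = max_in_group
    ORDER BY group_column
    """
).fetchall()

# Both tied rows of group 'b' survive the filter.
for row in max_rows:
    print(row)
```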


10. Employees Taking More Than 3 Leaves in a Month

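Group the leave data by employee, year, and month, then filter with HAVING:

sql
SELECT employee_id,
       YEAR(leave_date) AS leave_year,
       MONTH(leave_date) AS leave_month,
       COUNT(*) AS leave_count
FROM leaves
GROUP BY employee_id, YEAR(leave_date), MONTH(leave_date)
HAVING COUNT(*) > 3;

This query isolates employees with more than three leaves in a single calendar month. Grouping by year as well as month keeps leaves from, say, January 2024 and January 2025 from being counted together. YEAR() and MONTH() are available in MySQL and SQL Server; other databases use EXTRACT or a date-formatting function instead.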
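
One thing worth checking here is whether the grouping key includes the year. The sketch below runs both variants in SQLite, which has no MONTH() function, so strftime is used as a stand-in. The dates and table layout are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leaves (employee_id INTEGER, leave_date TEXT)")
# Hypothetical records: employee 1 takes four leaves in January 2024;
# employee 2 takes two in January 2024 and two in January 2025.
conn.executemany(
    "INSERT INTO leaves VALUES (?, ?)",
    [
        (1, "2024-01-03"), (1, "2024-01-10"), (1, "2024-01-17"), (1, "2024-01-24"),
        (2, "2024-01-05"), (2, "2024-01-12"), (2, "2025-01-06"), (2, "2025-01-13"),
    ],
)

# Grouping by year and month together.
by_year_month = conn.execute(
    """
    SELECT employee_id, strftime('%Y-%m', leave_date) AS leave_month, COUNT(*) AS leave_count
    FROM leaves
    GROUP BY employee_id, strftime('%Y-%m', leave_date)
    HAVING COUNT(*) > 3
    ORDER BY employee_id
    """
).fetchall()

# Grouping by month number only merges the two Januaries of employee 2.
by_month_only = conn.execute(
    """
    SELECT employee_id, strftime('%m', leave_date) AS leave_month, COUNT(*) AS leave_count
    FROM leaves
    GROUP BY employee_id, strftime('%m', leave_date)
    HAVING COUNT(*) > 3
    ORDER BY employee_id
    """
).fetchall()

print(by_year_month)
print(by_month_only)
```

Only the year-and-month grouping matches the intent of "more than 3 leaves in a month" once the table spans more than one year.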


Reflections and Key Takeaways

Each of these questions tests a specific SQL skill, from basic operations to advanced window functions and subqueries. Here’s how I prepared for such challenges:

  • Practice Daily: I honed my skills on platforms like HackerRank and LeetCode.
  • Master Window Functions: Functions like ROW_NUMBER(), LEAD(), and SUM() OVER are crucial for handling complex scenarios.
  • Understand Business Context: These questions often simulate real-world scenarios, so understanding the "why" behind the query is essential.

Pro Tip: In interviews, explain your thought process clearly. Employers value logical problem-solving as much as correct answers.

If you're preparing for SQL interviews, focus on efficiency, creativity, and understanding the intent behind queries. Master these skills, and you’ll be ready to ace your interview! 🚀


Like, Share, and Comment!

Let me know if you want me to break down more such tricky interview questions. 😊
#SQL #InterviewExperience #AmericanExpress #DataAnalytics #CareerGrowth
