Meesho Data Analyst Interview question and answer (0-3 Years)

Meesho Data Analyst Interview Experience (0-3 Years)

Recently, I interviewed for a Data Analyst position at Meesho, and I encountered an engaging mix of Power BI and SQL questions. Below, I’ve outlined how I approached and answered these questions to help others preparing for similar roles.


Power BI Questions

1️⃣ Explain the concept of context transition in DAX and provide an example.

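Context transition is the conversion of an existing row context into an equivalent filter context. It happens when CALCULATE (or CALCULATETABLE) is evaluated inside a row context, and also when a measure is referenced there, because a measure reference is implicitly wrapped in CALCULATE. For example:

DAX

SalesTable =
ADDCOLUMNS(
    VALUES( Orders[CustomerID] ),
    "TotalSales", CALCULATE( SUM( Orders[SalesAmount] ) )
)

Here, ADDCOLUMNS iterates the customers one row at a time. CALCULATE turns the current row (a specific customer) into a filter on Orders, so SUM returns that customer's total. Without CALCULATE, every row would show the grand total, because a row context alone does not filter the table.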
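
For readers more comfortable with Python than DAX, the effect can be mimicked in plain Python. This is only an analogy on made-up sample rows, not a description of how the DAX engine works internally: the first dictionary shows an aggregate that ignores the row being iterated, the second shows the same aggregate after the current row has been turned into a filter.

```python
# Hypothetical sample rows standing in for the Orders table.
orders = [
    {"CustomerID": "C1", "SalesAmount": 100},
    {"CustomerID": "C1", "SalesAmount": 50},
    {"CustomerID": "C2", "SalesAmount": 70},
]

def total_sales(rows):
    """Plays the role of SUM(Orders[SalesAmount]) over whatever rows are visible."""
    return sum(r["SalesAmount"] for r in rows)

customers = sorted({r["CustomerID"] for r in orders})

# No context transition: the row being iterated does not filter the table,
# so every customer sees the grand total.
without_transition = {c: total_sales(orders) for c in customers}

# Context transition (the role CALCULATE plays): the current row's CustomerID
# becomes a filter on the table before the aggregate runs.
with_transition = {
    c: total_sales([r for r in orders if r["CustomerID"] == c])
    for c in customers
}

print(without_transition)  # {'C1': 220, 'C2': 220}
print(with_transition)     # {'C1': 150, 'C2': 70}
```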


2️⃣ How would you optimize a complex Power BI report for faster performance?

Some key optimization techniques include:

  • Reducing the model size: Remove unused columns and reduce the granularity of data.
  • Efficient DAX formulas: Avoid unnecessary iterators (such as SUMX or FILTER over large tables) and use variables (VAR) to store intermediate results so they are evaluated once.
  • Aggregations: Pre-aggregate data at the database level.
  • Optimize visuals: Limit the number of visuals per page and avoid tables and matrices that render very large numbers of rows.

3️⃣ Describe the process of creating and using calculation groups in Power BI.

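Calculation groups let you apply the same calculation logic (typically time intelligence) to any measure without creating a separate variant of each measure:

  1. Create a calculation group using Tabular Editor.
  2. Define calculation items (e.g., YTD, MTD, QoQ), each written against SELECTEDMEASURE() rather than a specific measure.
  3. Use the calculation group column in a slicer or visual to apply the items consistently across measures.
    Example: Adding a "Year-To-Date" calculation to multiple measures using one group.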

4️⃣ Explain how you would handle large datasets in Power BI without compromising performance.

  • Use DirectQuery for near-real-time data and aggregation tables for high-level metrics.
  • Implement incremental refresh so that only new or changed data is reloaded instead of the full history.
  • Push transformations to the source where possible (query folding in Power Query) so that less data is ingested.
  • Leverage partitioning for large tables.

5️⃣ What is a composite model in Power BI, and how can it be used effectively?

A composite model combines Import and DirectQuery storage modes in a single model, so each table can use the mode that suits it. Example: Use Import for frequently queried data and DirectQuery for real-time updates.
Effective use: Linking aggregated historical sales (Import) with live inventory data (DirectQuery).


6️⃣ How does the USERELATIONSHIP function work, and when would you use it?

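USERELATIONSHIP activates an inactive relationship for the duration of a single CALCULATE evaluation, overriding the active relationship between the same tables. Example:

DAX

TotalSales =
CALCULATE(
    SUM( Sales[Amount] ),
    USERELATIONSHIP( Sales[OrderDate], Calendar[Date] )
)

Use it when a table has several relationships to the same dimension (for example, an order date and a ship date both pointing to one Calendar table) and you need to analyze data through the non-default one.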


7️⃣ Describe how to use Power Query M language for advanced data transformations.

Power Query M allows you to manipulate data effectively:

  • Column transformations: Use Table.AddColumn.
  • Pivot/unpivot: Use Table.Pivot or Table.Unpivot.
  • Parameterization: Create reusable queries for dynamic filtering.

Example: Splitting a full name column into first and last names using Text.Split.


8️⃣ Explain the difference between CROSSFILTER and TREATAS in DAX.

  • CROSSFILTER: Controls the direction of relationship filtering.
    DAX

    CALCULATE(SUM(Sales[Amount]), CROSSFILTER(Orders[OrderID], Sales[OrderID], BOTH))
  • TREATAS: Applies a table as a filter without creating a relationship.
    DAX

    CALCULATE(SUM(Sales[Amount]), TREATAS({1, 2, 3}, Products[CategoryID]))

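Use CROSSFILTER to change the filter direction of an existing relationship within one calculation, and TREATAS to filter through a virtual relationship where no physical one exists.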


SQL Questions

1️⃣ How would you optimize a slow-running query with multiple joins?

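  • Indexes: Add indexes to columns used in joins and filters.
  • Reduce result set: Filter data early using WHERE or JOIN conditions, and select only the columns you need.
  • Query execution plan: Analyze it for bottlenecks such as full table scans or poor row estimates.
  • Simplify logic: Break a complex query into temp tables or CTEs; materializing an intermediate result in a temp table can also help the optimizer choose a better join strategy.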
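
The first and third points can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, and the tables, the query and the index name idx_orders_customer are hypothetical; the plan wording differs between engines and SQLite versions, so the point is only to compare the plan before and after adding the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
""")

query = """
    SELECT c.name, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = 42
"""

def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(query)

# Index the join column on the larger table, then ask for the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

In SQL Server the equivalent check is done with the graphical or XML execution plan rather than EXPLAIN QUERY PLAN.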

2️⃣ What is a recursive CTE, and can you provide an example of when to use it?

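A recursive CTE is a common table expression that references itself. It consists of an anchor member (the starting rows) and a recursive member that joins back to the CTE, combined with UNION ALL, which makes it well suited to traversing hierarchical data. Example: Employee hierarchy.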

sql

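WITH RecursiveCTE AS (
    -- Anchor member: employees with no manager (top of the hierarchy)
    SELECT EmployeeID, ManagerID, Name
    FROM Employees
    WHERE ManagerID IS NULL
    UNION ALL
    -- Recursive member: employees reporting to rows already in the CTE
    SELECT e.EmployeeID, e.ManagerID, e.Name
    FROM Employees e
    INNER JOIN RecursiveCTE r ON e.ManagerID = r.EmployeeID
)
SELECT * FROM RecursiveCTE;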
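
To see the traversal in action, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The four employee rows and the extra Level column are assumptions added for illustration, and SQLite is given the RECURSIVE keyword, which SQL Server omits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, ManagerID INTEGER, Name TEXT);
    INSERT INTO Employees VALUES
        (1, NULL, 'Asha'),   -- top of the hierarchy
        (2, 1,    'Bilal'),
        (3, 1,    'Chen'),
        (4, 2,    'Divya');
""")

hierarchy = conn.execute("""
    WITH RECURSIVE RecursiveCTE AS (
        SELECT EmployeeID, ManagerID, Name, 0 AS Level
        FROM Employees
        WHERE ManagerID IS NULL
        UNION ALL
        SELECT e.EmployeeID, e.ManagerID, e.Name, r.Level + 1
        FROM Employees AS e
        INNER JOIN RecursiveCTE AS r ON e.ManagerID = r.EmployeeID
    )
    SELECT Name, Level FROM RecursiveCTE ORDER BY Level, EmployeeID
""").fetchall()

# Indent each name by its depth in the hierarchy.
for name, level in hierarchy:
    print("  " * level + name)
```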

3️⃣ Explain the difference between clustered and non-clustered indexes and when to use each.

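  • Clustered index: Determines the physical order in which table rows are stored, so a table can have only one (in SQL Server, a PRIMARY KEY creates one by default).
  • Non-clustered index: A separate structure that stores the indexed key values with pointers to the underlying rows; a table can have many.
    Use a clustered index for the column most often used for range scans or ordering (typically a narrow, unique ID) and non-clustered indexes for other columns frequently used in WHERE or JOIN conditions.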

4️⃣ Write a query to find the second highest salary in each department.

sql

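-- DENSE_RANK (not RANK) so that a tie for the top salary still leaves a rank 2
WITH RankedSalaries AS (
    SELECT DepartmentID, Salary,
           DENSE_RANK() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS SalaryRank
    FROM Employees
)
SELECT DISTINCT DepartmentID, Salary
FROM RankedSalaries
WHERE SalaryRank = 2;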
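
The choice of ranking function matters when salaries tie. The sketch below, run on SQLite through Python with made-up rows (department 10 has two employees sharing the top salary), compares RANK and DENSE_RANK in an otherwise identical query. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, DepartmentID INTEGER, Salary INTEGER);
    INSERT INTO Employees VALUES
        (1, 10, 900), (2, 10, 900), (3, 10, 700),
        (4, 20, 500), (5, 20, 400);
""")

def second_highest(rank_function):
    """Run the second-highest-salary query with the given ranking function."""
    sql = f"""
        WITH RankedSalaries AS (
            SELECT DepartmentID, Salary,
                   {rank_function}() OVER (
                       PARTITION BY DepartmentID ORDER BY Salary DESC
                   ) AS SalaryRank
            FROM Employees
        )
        SELECT DISTINCT DepartmentID, Salary
        FROM RankedSalaries
        WHERE SalaryRank = 2
        ORDER BY DepartmentID
    """
    return conn.execute(sql).fetchall()

with_rank = second_highest("RANK")
with_dense_rank = second_highest("DENSE_RANK")

print(with_rank)        # [(20, 400)]  department 10 is missing: ranks are 1, 1, 3
print(with_dense_rank)  # [(10, 700), (20, 400)]
```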

5️⃣ How would you detect and resolve deadlocks in SQL?

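  • Detection: Capture the deadlock graph with SQL Server Profiler or Extended Events (the built-in system_health session records deadlock reports), and inspect current locks with system views such as sys.dm_tran_locks.
  • Resolution:
    • Reduce lock time by using shorter transactions.
    • Access tables in the same order in every transaction to avoid cyclic waits.
    • Use NOLOCK only where dirty reads are acceptable; a row-versioning isolation level (e.g., READ COMMITTED SNAPSHOT) is a safer way to keep readers from blocking writers.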
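
The advice about avoiding cycles comes down to acquiring resources in a consistent order. As a loose analogy only (Python threads and in-process locks, not SQL Server's lock manager; the lock and transaction names are hypothetical), two workers that always take the same locks in the same order cannot end up waiting on each other in a cycle:

```python
import threading

lock_accounts = threading.Lock()  # stands in for a lock on an Accounts table
lock_orders = threading.Lock()    # stands in for a lock on an Orders table
completed = []

def transaction(name):
    # Both "transactions" take the locks in the same order, so neither can
    # hold one lock while waiting for the other in a cycle.
    with lock_accounts:
        with lock_orders:
            completed.append(name)

threads = [threading.Thread(target=transaction, args=(n,)) for n in ("T1", "T2")]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(sorted(completed))  # ['T1', 'T2']
```

If one worker took lock_orders first and the other took lock_accounts first, each could end up holding one lock while waiting for the other, which is the cyclic wait a database reports as a deadlock.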

6️⃣ Explain window functions and provide examples of ROW_NUMBER, RANK, and DENSE_RANK.

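Window functions perform calculations across a set of rows related to the current row, without collapsing those rows the way GROUP BY does:

  • ROW_NUMBER: Assigns a unique sequential number to each row, even when values tie.
  • RANK: Gives tied rows the same rank and leaves gaps after ties (1, 2, 2, 4).
  • DENSE_RANK: Gives tied rows the same rank without gaps (1, 2, 2, 3).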

Example:

sql

SELECT EmployeeID,
       ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNum,
       RANK()       OVER (ORDER BY Salary DESC) AS SalaryRank,
       DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseRank
FROM Employees;
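
The difference between the three functions only shows up when values tie. Here is a small sketch on SQLite through Python, with four made-up salaries of which two are equal. EmployeeID is added as a tie-breaker for ROW_NUMBER so the output is deterministic; that is an assumption beyond the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Salary INTEGER);
    INSERT INTO Employees VALUES (1, 900), (2, 800), (3, 800), (4, 700);
""")

rows = conn.execute("""
    SELECT EmployeeID, Salary,
           ROW_NUMBER() OVER (ORDER BY Salary DESC, EmployeeID) AS RowNum,
           RANK()       OVER (ORDER BY Salary DESC)             AS SalaryRank,
           DENSE_RANK() OVER (ORDER BY Salary DESC)             AS DenseRank
    FROM Employees
    ORDER BY RowNum
""").fetchall()

# Columns: EmployeeID, Salary, RowNum, SalaryRank, DenseRank
for row in rows:
    print(row)
# (1, 900, 1, 1, 1)
# (2, 800, 2, 2, 2)
# (3, 800, 3, 2, 2)
# (4, 700, 4, 4, 3)
```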

7️⃣ Describe the ACID properties in database transactions and their significance.

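  • Atomicity: The entire transaction succeeds or fails as a unit; partial changes are rolled back.
  • Consistency: Every transaction moves the database from one valid state to another, respecting its constraints.
  • Isolation: Concurrent transactions do not see each other's uncommitted changes (to the degree set by the isolation level).
  • Durability: Committed data survives crashes and restarts.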
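
Atomicity is the easiest of the four to demonstrate. In this sketch (SQLite through Python; the accounts table and the transfer amounts are hypothetical), the second statement of a transfer violates a CHECK constraint, so the first statement is rolled back along with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
        # This debit would make alice's balance negative and violates the CHECK.
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    print("transfer rejected, transaction rolled back")

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}
```

The credit to bob succeeded on its own, yet it does not survive, because the transaction as a whole failed.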

8️⃣ Write a query to calculate a running total with partitions based on specific conditions.

sql

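-- EmployeeID breaks ties on JoinDate; the explicit ROWS frame keeps tied rows from sharing one total
SELECT DepartmentID, EmployeeID, Salary,
       SUM(Salary) OVER (
           PARTITION BY DepartmentID
           ORDER BY JoinDate, EmployeeID
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS RunningTotal
FROM Employees;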
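
One detail worth knowing for a running total: when ORDER BY is given without an explicit frame, the default frame is RANGE-based, so rows that tie on the ordering column all receive the same total. The sketch below (SQLite through Python, made-up rows where two employees share a JoinDate) compares the default frame with an explicit ROWS frame plus a tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY, DepartmentID INTEGER,
        Salary INTEGER, JoinDate TEXT
    );
    INSERT INTO Employees VALUES
        (1, 10, 100, '2021-01-01'),
        (2, 10, 200, '2021-01-01'),  -- same JoinDate as employee 1
        (3, 10, 300, '2022-06-01'),
        (4, 20, 400, '2020-03-15');
""")

def running_totals(order_and_frame):
    """Return (EmployeeID, RunningTotal) pairs for the given window ordering/frame."""
    sql = f"""
        SELECT EmployeeID,
               SUM(Salary) OVER (
                   PARTITION BY DepartmentID {order_and_frame}
               ) AS RunningTotal
        FROM Employees
        ORDER BY EmployeeID
    """
    return conn.execute(sql).fetchall()

default_frame = running_totals("ORDER BY JoinDate")
rows_frame = running_totals(
    "ORDER BY JoinDate, EmployeeID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
)

print(default_frame)  # [(1, 300), (2, 300), (3, 600), (4, 400)]
print(rows_frame)     # [(1, 100), (2, 300), (3, 600), (4, 400)]
```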

Final Thoughts

The Meesho interview thoroughly tested my skills in SQL, DAX, and Power BI. These questions required a strong understanding of fundamentals and real-world application. Sharing these insights will hopefully help others ace their interviews. Best of luck! 🚀

#Meesho #PowerBI #SQL #InterviewExperience #DataAnalyst
