# Meesho PySpark Interview Questions for Data Engineers in 2025

Preparing for a PySpark interview? Let's tackle some commonly asked questions, along with practical answers and insights, to help you ace your next Data Engineering interview at Meesho or any top-tier tech company.

## 1. Explain how caching and persistence work in PySpark. When would you use `cache()` versus `persist()`, and what are their performance implications?

**Answer:**

**Caching:** Stores data at the default storage level for faster retrieval. Use `cache()` when you need to reuse a DataFrame or RDD multiple times in a session without specifying a storage level.

Example:

```python
df.cache()
df.count()  # Triggers caching
```

**Persistence:** Lets you specify a storage level (e.g., memory, disk, or a combination). Use `persist()` when memory is limited and you want a fallback to disk storage.

Example:

```python
from pyspark import StorageLevel

df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()  # Triggers persistence
```

**Performance implications:** `cache()` is ...
# Meesho Data Analyst Interview Experience (0-3 Years)

Recently, I interviewed for a Data Analyst position at Meesho, and I encountered an engaging mix of Power BI and SQL questions. Below, I've outlined how I approached and answered these questions to help others preparing for similar roles.

## Power BI Questions

### 1️⃣ Explain the concept of context transition in DAX and provide an example.

Context transition is the conversion of row context into filter context when using certain functions, such as `CALCULATE`. For example:

```DAX
SalesTable = SUMMARIZE(
    Orders,
    Orders[CustomerID],
    "TotalSales", CALCULATE(SUM(Orders[SalesAmount]))
)
```

Here, `CALCULATE` changes the row context (a specific customer) into a filter context, allowing aggregate functions like `SUM` to work accurately.

### 2️⃣ How would you optimize a complex Power BI report for faster performance?

Some key optimization techniques include:

- **Reducing the model size:** Remove unused columns and reduce the granularity o...
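The context-transition behaviour described in question 1️⃣ is easiest to see in a calculated column. This is a minimal sketch assuming a hypothetical model with a `Customers` dimension table related to a `Sales` fact table that has an `Amount` column:

```DAX
-- Calculated column on Customers. Without CALCULATE, SUM ignores the
-- current row context and returns the grand total for every row:
AllSales = SUM(Sales[Amount])

-- CALCULATE converts the current Customers row into a filter context,
-- which flows through the relationship to Sales, so each row shows
-- only that customer's total:
CustomerSales = CALCULATE(SUM(Sales[Amount]))
```

The only difference between the two columns is the `CALCULATE` wrapper; the change in result comes entirely from context transition.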