# Meesho PySpark Interview Questions for Data Engineers in 2025

Preparing for a PySpark interview? Let's tackle some commonly asked questions, along with practical answers and insights to ace your next Data Engineering interview at Meesho or any top-tier tech company.

## 1. Explain how caching and persistence work in PySpark. When would you use cache() versus persist(), and what are their performance implications?

**Answer:**

**Caching**: Stores data in memory (the default) for faster retrieval. Use `cache()` when you need to reuse a DataFrame or RDD multiple times in a session without specifying storage levels.

Example:

```python
df.cache()
df.count()  # Triggers caching
```

**Persistence**: Allows you to specify storage levels (e.g., memory, disk, or a combination). Use `persist()` when memory is limited and you want a fallback to disk storage.

Example:

```python
from pyspark import StorageLevel

df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()  # Triggers persistence
```

**Performance implications**: `cache()` is ...
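To make the comparison concrete, here is a minimal, self-contained sketch that runs both approaches side by side and adds the `unpersist()` cleanup step. The local SparkSession and the toy DataFrame are assumptions added for illustration; they are not part of the original question.

```python
from pyspark.sql import SparkSession
from pyspark import StorageLevel

# Illustrative local session and toy data (not from the original post).
spark = SparkSession.builder.master("local[*]").appName("cache-vs-persist").getOrCreate()
df = spark.range(1_000_000).withColumnRenamed("id", "order_id")

# cache(): no storage level argument. Note that for DataFrames, recent Spark
# versions back cache() with MEMORY_AND_DISK, while RDD.cache() defaults to MEMORY_ONLY.
df.cache()
df.count()                              # first action materializes the cache
df.filter("order_id % 2 = 0").count()   # reuses the cached data

df.unpersist()                          # release storage once the DataFrame is no longer reused

# persist(): explicit storage level, e.g. spill to disk when memory is tight.
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()
df.unpersist()

spark.stop()
```

Caching and persistence are both lazy: nothing is stored until an action such as `count()` forces evaluation, which is why both snippets above trigger an action right after marking the DataFrame.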
# Flipkart Business Analyst Interview Experience (1-3 Years)

Recently, I appeared for an interview at Flipkart for the position of Business Analyst, and I'm excited to share the questions asked during the process along with how I would approach answering them. The interview covered various domains such as SQL, guesstimates, case studies, managerial scenarios, and Python. Here's how I would have tackled each question:

## SQL Questions

### 1️⃣ What are window functions, and how do they differ from aggregate functions? Can you give a use case?

**Answer:** Window functions perform calculations across a set of table rows related to the current row, without collapsing the result set into a single value the way aggregate functions do.

Example:

```sql
SELECT
    CustomerID,
    OrderID,
    OrderDate,
    ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS OrderRank
FROM Orders;
```

Use case: Finding the latest order per customer without grouping data (a pandas sketch of this pattern follows at the end of this excerpt).

### 2️⃣ Explain indexing. When could an i...
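Since the round also covered Python, here is a small pandas cross-check of the "latest order per customer" pattern from the window-function question above. It is not part of the original interview answer; the column names (CustomerID, OrderID, OrderDate) mirror the SQL example, and the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical Orders data mirroring the SQL example above.
orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2],
    "OrderID":    [101, 102, 201, 202, 203],
    "OrderDate":  pd.to_datetime(
        ["2025-01-05", "2025-02-10", "2025-01-20", "2025-03-01", "2025-02-15"]
    ),
})

# Equivalent of ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC):
# number each customer's orders from most recent to oldest, then keep rank 1.
orders["OrderRank"] = (
    orders.sort_values("OrderDate", ascending=False)
          .groupby("CustomerID")
          .cumcount() + 1
)
latest_orders = orders[orders["OrderRank"] == 1]
print(latest_orders)
```

As in the SQL version, the key point is that every row keeps its identity and simply gains a rank, rather than being collapsed into one aggregated row per customer.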