Meesho PySpark Interview Questions for Data Engineers in 2025

Preparing for a PySpark interview? Let's tackle some commonly asked questions, along with practical answers and insights to ace your next Data Engineering interview at Meesho or any top-tier tech company.

1. Explain how caching and persistence work in PySpark. When would you use cache() versus persist(), and what are their performance implications?

Answer:

Caching: Stores data in memory (the default) for faster retrieval. Use cache() when you need to reuse a DataFrame or RDD multiple times in a session without specifying storage levels.

Example:

```python
df.cache()
df.count()  # Triggers caching
```

Persistence: Lets you specify storage levels (e.g., memory, disk, or a combination). Use persist() when memory is limited and you want a fallback to disk storage.

Example:

```python
from pyspark import StorageLevel

df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()  # Triggers persistence
```

Performance implications: cache() is ...
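To make the cache()-versus-persist() trade-off concrete, here is a minimal, self-contained sketch. The app name and the generated DataFrame are illustrative stand-ins for an expensive read or join; the API calls (cache, persist, unpersist, storageLevel) are standard PySpark.

```python
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("cache-vs-persist-demo").getOrCreate()

# Illustrative DataFrame; in practice this would be an expensive read or join.
df = spark.range(1_000_000).withColumnRenamed("id", "order_id")

# cache() uses the default storage level (MEMORY_AND_DISK for DataFrames).
df.cache()
df.count()        # First action materializes the cache.
df.count()        # Later actions reuse the cached data.
df.unpersist()    # Release memory once the DataFrame is no longer reused.

# persist() makes the storage level explicit, e.g. spill everything to disk.
df.persist(StorageLevel.DISK_ONLY)
df.count()
print(df.storageLevel)  # Inspect which storage level is in effect.
df.unpersist()

spark.stop()
```

The main design choice is whether eviction to disk is acceptable: cache() accepts the default, while persist() with DISK_ONLY or MEMORY_ONLY makes the trade-off explicit. Either way, pair the call with unpersist() once the reuse window ends.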
BlackRock Data Analyst Interview Questions and Answers

BlackRock's Data Analyst interview process is known for its intensity and its focus on technical expertise, especially in SQL and Python. The questions were a mix of practical problems, theoretical knowledge, and real-world financial scenarios, reflecting BlackRock's emphasis on analytical rigor and financial acumen. Here's a breakdown of the questions I encountered and my approach to solving them.

SQL Questions

1️⃣ Identify customers who have invested in at least two funds with opposite performance trends over the last 6 months.

Answer:

```sql
WITH FundPerformance AS (
    SELECT
        FundID,
        CASE WHEN AVG(Return) > 0 THEN 'Increasing' ELSE 'Decreasing' END AS Trend
    FROM FundReturns
    WHERE Date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
    GROUP BY FundID
),
CustomerInvestments AS (
    SELECT CustomerID, FundID
    FROM Investments
)
SELECT ci.CustomerID
FR...
```
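The SQL above is cut off, so as a complement (and to tie it back to PySpark) here is a hedged DataFrame-API sketch of the same problem: classify each fund's trend over the last six months, join to customer investments, and keep customers holding at least one 'Increasing' and one 'Decreasing' fund. The `fund_returns` and `investments` DataFrames, their sample rows, and the column names are assumptions mirroring the tables in the query; this is one possible way to express the logic, not the original interview answer.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("opposite-trend-customers").getOrCreate()

# Hypothetical inputs mirroring the tables referenced in the SQL above.
fund_returns = spark.createDataFrame(
    [("F1", 0.02, "2025-01-15"), ("F1", 0.01, "2025-02-15"),
     ("F2", -0.03, "2025-01-20"), ("F2", -0.01, "2025-03-10")],
    ["FundID", "Return", "Date"],
)
investments = spark.createDataFrame(
    [("C1", "F1"), ("C1", "F2"), ("C2", "F1")],
    ["CustomerID", "FundID"],
)

# Trend per fund over the last 6 months (sample dates are illustrative).
six_months_ago = F.add_months(F.current_date(), -6)
fund_trend = (
    fund_returns
    .where(F.to_date(F.col("Date")) >= six_months_ago)
    .groupBy("FundID")
    .agg(
        F.when(F.avg("Return") > 0, "Increasing")
         .otherwise("Decreasing")
         .alias("Trend")
    )
)

# Customers holding funds in both trend buckets, i.e. at least two funds
# with opposite performance trends.
result = (
    investments.join(fund_trend, "FundID")
    .groupBy("CustomerID")
    .agg(F.countDistinct("Trend").alias("trend_count"))
    .where(F.col("trend_count") == 2)
    .select("CustomerID")
)
result.show()
```

Requiring `countDistinct("Trend") == 2` is the key step: it guarantees the customer holds at least one fund in each trend bucket, which is exactly the "opposite trends" condition the question asks for.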