Meesho PySpark Interview Questions for Data Engineers in 2025

Preparing for a PySpark interview? Let’s tackle some commonly asked questions, along with practical answers and insights to help you ace your next Data Engineering interview at Meesho or any top-tier tech company.

1. Explain how caching and persistence work in PySpark. When would you use cache() versus persist(), and what are their performance implications?

Answer:

Caching: cache() stores data at the default storage level for faster retrieval (memory only for RDDs; memory with spill to disk for DataFrames). Use cache() when you need to reuse a DataFrame or RDD multiple times in a session without specifying a storage level.

Example (Python):

    df.cache()
    df.count()  # Triggers caching

Persistence: persist() lets you specify the storage level (e.g., memory, disk, or a combination). Use persist() when memory is limited and you want a fallback to disk storage.

Example (Python):

    from pyspark import StorageLevel

    df.persist(StorageLevel.MEMORY_AND_DISK)
    df.count()  # Triggers persistence

Performance Implications: cache() is ...
Interview Framework at Paytm for a Business Analyst Role (For Freshers & Experienced Candidates)

In this blog, I’ll share a detailed breakdown of the Paytm interview process for the Business Analyst role, including insights into the technical rounds, expectations, and how to answer key questions. If you’re preparing for this role, this blog is for you!

Round 1: Technical Interview (With Analysts)

Duration: 1 Hour

Structure: This round is conducted by analysts from the team you'll work with. It focuses on SQL, Python, Excel, and visualization tools like Power BI or Tableau. Here’s a breakdown of the typical questions and how I would approach them:

1. SQL Questions

Question: Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK().

Answer:

ROW_NUMBER() assigns a unique number to each row, starting from 1, without caring about duplicates.

RANK() assigns ranks, but if there are ties, the next rank skips numbers. For example: 1, 2, 2, 4.

DENSE_RANK() is sim...
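To see the three ranking functions side by side, here is a minimal sketch that runs them against an in-memory SQLite database from Python. The `scores` table, its rows, and the `rank_demo` helper are hypothetical, made up purely for illustration, and the sketch assumes your Python build bundles SQLite 3.25 or newer (the first version with window functions). A tie-breaker on `name` is added to ROW_NUMBER() so its numbering is deterministic.

```python
import sqlite3


def rank_demo():
    """Return (name, score, row_number, rank, dense_rank) for a toy table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data: two rows tie on score 90.
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [("Asha", 100), ("Bala", 90), ("Chitra", 90), ("Dev", 80)],
    )
    rows = conn.execute(
        """
        SELECT name,
               score,
               -- name breaks ties so the numbering is repeatable
               ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
               RANK()       OVER (ORDER BY score DESC)       AS rnk,
               DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk
        FROM scores
        ORDER BY score DESC, name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in rank_demo():
        print(row)
```

With the tied scores of 90, ROW_NUMBER() still gives every row a distinct number, RANK() produces the 1, 2, 2, 4 pattern mentioned above, and DENSE_RANK() gives the tied rows the same rank without leaving a gap afterwards.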