Meesho PySpark Interview Questions for Data Engineers in 2025

Preparing for a PySpark interview? Let’s tackle some commonly asked questions, along with practical answers and insights to ace your next Data Engineering interview at Meesho or any top-tier tech company.

1. Explain how caching and persistence work in PySpark. When would you use cache() versus persist(), and what are their performance implications?

Answer:

Caching: Stores data at the default storage level for faster retrieval (memory only for RDDs; memory with spill to disk for DataFrames). Use cache() when you need to reuse a DataFrame or RDD multiple times in a session without specifying a storage level.

Example (Python):

    # df is an existing DataFrame
    df.cache()   # Lazy: only marks df for caching
    df.count()   # The first action triggers caching

Persistence: Allows you to specify the storage level explicitly (e.g., memory, disk, or a combination). Use persist() when memory is limited and you want to control the fallback to disk storage.

Example (Python):

    from pyspark import StorageLevel

    df.persist(StorageLevel.MEMORY_AND_DISK)  # Keep in memory, spill to disk if needed
    df.count()  # The first action triggers persistence

Performance Implications: cache() is ...
Power BI Developer Interview at Novartis: My Approach to the Questions

Excited to share how I would answer the questions asked in a recent interview for a Power BI Developer role at Novartis. These questions cover both technical concepts and practical applications, so let’s dive in!

1️⃣ Introduce Yourself

Answer: I’m a passionate data professional with [X years] of experience in data visualization, analytics, and reporting. I specialize in Power BI, SQL, and Python, having worked on projects involving dashboard creation, data modeling, and KPI analysis to drive business insights. My experience includes collaborating with cross-functional teams and delivering actionable insights for data-driven decision-making.

2️⃣ Explain Merge and Append Queries

Answer:

Merge Queries: Used to join two tables based on a common column (like SQL joins). It’s useful for combining data from different sources.

Append Queries: Used to stack or union tables vertically, adding rows from one table to another....
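
To make the Merge versus Append distinction concrete, here is a minimal plain-Python sketch of the two behaviors. It is not Power Query code: in Power BI these steps are typically generated in M as Table.NestedJoin and Table.Combine. The sample tables, the column names, and the helpers merge_left and append_rows are all hypothetical, and the sketch assumes a left outer join with a unique key in the right-hand table.

```python
# Hypothetical sample tables, represented as lists of row dictionaries.
orders = [
    {"customer_id": 1, "amount": 250},
    {"customer_id": 2, "amount": 100},
    {"customer_id": 3, "amount": 75},
]
customers = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
]
orders_2024 = [
    {"customer_id": 4, "amount": 300},
]


def merge_left(left, right, key):
    """Merge: left outer join that adds the right table's columns to each left row."""
    # Assumes `key` is unique in `right`; duplicates would overwrite each other here.
    lookup = {row[key]: row for row in right}
    right_cols = [col for col in (right[0] if right else {}) if col != key]
    merged = []
    for row in left:
        match = lookup.get(row[key])
        # Unmatched left rows get None in the right-hand columns.
        extra = {col: (match[col] if match else None) for col in right_cols}
        merged.append({**row, **extra})
    return merged


def append_rows(*tables):
    """Append: stack the rows of each table vertically, in order."""
    return [dict(row) for table in tables for row in table]


merged = merge_left(orders, customers, "customer_id")  # adds columns
appended = append_rows(orders, orders_2024)            # adds rows

for row in merged:
    print(row)
for row in appended:
    print(row)
```

Merge keeps the row count of the left table and widens it with new columns, while Append keeps the columns and increases the row count. In Power Query the merged columns first arrive as a nested table column that you expand; the sketch skips that step and flattens them directly.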