Meesho PySpark Interview Questions for Data Engineers in 2025

Preparing for a PySpark interview? Let's tackle some commonly asked questions, with practical answers and insights to help you ace your next Data Engineering interview at Meesho or any top-tier tech company.

1. Explain how caching and persistence work in PySpark. When would you use cache() versus persist(), and what are their performance implications?

Answer:

Caching: Stores data at the default storage level for faster reuse. For RDDs the default is memory only; for DataFrames it is memory with spillover to disk. Use cache() when you need to reuse a DataFrame or RDD multiple times in a session and the default storage level is acceptable.

Example (Python):

    df.cache()
    df.count()  # Action that triggers caching

Persistence: Allows you to specify the storage level (e.g., memory only, disk only, or a combination). Use persist() when you need a level other than the default, for example when memory is limited and you want an explicit fallback to disk storage.

Example (Python):

    from pyspark import StorageLevel

    df.persist(StorageLevel.MEMORY_AND_DISK)
    df.count()  # Action that triggers persistence

Performance Implications: cache() is ...
Recently Asked Power BI Developer Interview Question at Indegene

If you are a Power BI enthusiast or developer, expect interview questions that delve into the technical intricacies of DAX (Data Analysis Expressions). Here's a deep dive into a commonly asked question, recently posed to a candidate with 2+ years of experience for the Power BI Developer role at Indegene.

1. What is the difference between the ALL, ALLSELECTED, and ALLEXCEPT functions?

Understanding these functions is key to managing filters effectively in your calculations.

ALL ➡️ Removes all filters applied to a table or column, including those from slicers, visuals, and external filters.

Example (DAX):

    TotalSalesWithoutFilters = SUMX(ALL(Sales), Sales[Amount])

If filters are applied to Region and Product, using ALL(Sales) ignores both.

One-liner: Removes all filters from the data.

ALLSELECTED ➡️ Removes filters inside a visual but respects filters from slicers or external visuals.

Example (DAX):

    SalesInSlicerContext = SUMX(ALLSELECTED(Sales), Sales[Amount])

If a slic...
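
To make the contrast between ALL and ALLSELECTED concrete, here is a minimal Python sketch that imitates filter context with plain dictionaries. It is a toy model only, not how the Power BI engine works: the `sales` rows, the `slicer` and `visual` filters, and the helper names `matches` and `total` are all hypothetical and chosen for illustration.

```python
# Toy model of DAX filter context. All data and names here are hypothetical.
sales = [
    {"Region": "East", "Product": "A", "Amount": 100},
    {"Region": "East", "Product": "B", "Amount": 50},
    {"Region": "West", "Product": "A", "Amount": 70},
    {"Region": "West", "Product": "B", "Amount": 30},
]


def matches(row, filters):
    """Return True if the row satisfies every column filter in the dict."""
    return all(row[col] in allowed for col, allowed in filters.items())


def total(rows, *filter_sets):
    """Sum Amount over the rows that pass every supplied filter set."""
    return sum(
        r["Amount"] for r in rows if all(matches(r, f) for f in filter_sets)
    )


slicer = {"Region": {"East"}}   # filter coming from a slicer (outer context)
visual = {"Product": {"A"}}     # filter coming from the visual's own row

plain_total = total(sales, slicer, visual)  # normal measure: both filters apply
all_selected_total = total(sales, slicer)   # ALLSELECTED-like: keep slicer only
all_total = total(sales)                    # ALL-like: ignore every filter

print(plain_total, all_selected_total, all_total)
```

In this model the ALL-like total is the same for every row of the visual, while the ALLSELECTED-like total changes only when the slicer selection changes, which is why ALLSELECTED is commonly used for "percent of visible total" style calculations.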