Posts

Showing posts with the label Shell Data Analyst 2024

Meesho PySpark Interview Questions for Data Engineers in 2025

Preparing for a PySpark interview? Let’s tackle some commonly asked questions, along with practical answers and insights, to help you ace your next Data Engineering interview at Meesho or any top-tier tech company.

1. Explain how caching and persistence work in PySpark. When would you use cache() versus persist(), and what are their performance implications?

Answer:

Caching: cache() stores the data at the default storage level for faster retrieval. That default is MEMORY_ONLY for RDDs and MEMORY_AND_DISK for DataFrames. Use cache() when you need to reuse a DataFrame or RDD multiple times in a session without specifying a storage level.

Example (python):

    df.cache()
    df.count()  # Triggers caching

Persistence: persist() lets you specify the storage level (e.g., memory, disk, or a combination). Use persist() when memory is limited and you want explicit control over where the data lives, such as a fallback to disk storage.

Example (python):

    from pyspark import StorageLevel
    df.persist(StorageLevel.MEMORY_AND_DISK)
    df.count()  # Triggers persistence

Performance Implications: cache() is ...

Shell Data Analyst Interview question and answer December 2024

Shell Data Analyst Interview Experience: CTC - 18 LPA

Shell’s Data Analyst role demands strong SQL, Python, and Power BI skills alongside the ability to align technical insights with business strategy. Below, I’ve shared the questions asked during my interview process and how I would have answered them.

SQL Questions

1️⃣ Write a query to calculate the cumulative revenue per customer for each month in the last year.

Answer (sql):

    -- Aggregate to one row per customer and month,
    -- then accumulate the monthly totals in month order.
    SELECT CustomerID,
           DATE_FORMAT(Date, '%Y-%m') AS Month,
           SUM(SUM(Amount)) OVER (
               PARTITION BY CustomerID
               ORDER BY DATE_FORMAT(Date, '%Y-%m')
           ) AS CumulativeRevenue
    FROM Transactions
    WHERE Date >= DATE_SUB(CURDATE(), INTERVAL 1 YEAR)
    GROUP BY CustomerID, DATE_FORMAT(Date, '%Y-%m');

2️⃣ Identify plants that consistently exceeded their daily average output for at least 20 days in a given month.

Answer (sql):

    WITH DailyAvg AS (
        SELECT PlantID, AVG(Output) AS AvgOutput
        FROM Production
        GROUP BY PlantID
    ),
    ExceedDay...
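The cumulative revenue logic from question 1 can be tried out without a database server. The sketch below runs it from Python against an in-memory SQLite database, with several assumptions: the sample rows are made up, SQLite's strftime stands in for MySQL's DATE_FORMAT, the last-year filter is left out so the result does not depend on the current date, and a CTE computes the monthly totals before the window function accumulates them. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data; table and column names follow the interview question.
conn.executescript("""
CREATE TABLE Transactions (CustomerID INTEGER, Date TEXT, Amount REAL);
INSERT INTO Transactions VALUES
  (1, '2024-01-05', 100.0),
  (1, '2024-01-20', 50.0),
  (1, '2024-02-03', 25.0),
  (2, '2024-01-11', 10.0),
  (2, '2024-03-02', 40.0);
""")

query = """
WITH Monthly AS (
    -- One row per customer and month.
    SELECT CustomerID,
           strftime('%Y-%m', Date) AS Month,
           SUM(Amount) AS MonthlyRevenue
    FROM Transactions
    GROUP BY CustomerID, strftime('%Y-%m', Date)
)
SELECT CustomerID,
       Month,
       -- Running total of the monthly figures within each customer.
       SUM(MonthlyRevenue) OVER (
           PARTITION BY CustomerID
           ORDER BY Month
       ) AS CumulativeRevenue
FROM Monthly
ORDER BY CustomerID, Month;
"""

rows = conn.execute(query).fetchall()
for customer_id, month, cumulative in rows:
    print(customer_id, month, cumulative)

conn.close()
```

Each customer's running total restarts at their first month, and months with no transactions simply do not appear in the result; filling those gaps would need a calendar table, which is outside what the question asks for.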
