
Shell Data Analyst Interview Questions and Answers, December 2024

Shell Data Analyst Interview Experience: CTC - 18 LPA

Shell’s Data Analyst role demands strong SQL, Python, and Power BI skills alongside the ability to align technical insights with business strategy. Below, I’ve shared the questions asked during my interview process and how I would have answered them.




SQL Questions

1️⃣ Write a query to calculate the cumulative revenue per customer for each month in the last year.

  • Answer:
    sql

    SELECT CustomerID,
           DATE_FORMAT(Date, '%Y-%m') AS Month,
           SUM(SUM(Amount)) OVER (
               PARTITION BY CustomerID
               ORDER BY DATE_FORMAT(Date, '%Y-%m')
           ) AS CumulativeRevenue
    FROM Transactions
    WHERE Date >= DATE_SUB(CURDATE(), INTERVAL 1 YEAR)
    GROUP BY CustomerID, DATE_FORMAT(Date, '%Y-%m');

2️⃣ Identify plants that consistently exceeded their daily average output for at least 20 days in a given month.

  • Answer:
    sql

    WITH DailyAvg AS (
        SELECT PlantID, AVG(Output) AS AvgOutput
        FROM Production
        GROUP BY PlantID
    ),
    ExceedDays AS (
        SELECT p.PlantID, COUNT(DISTINCT DATE(p.Date)) AS ExceedCount
        FROM Production p
        JOIN DailyAvg da ON p.PlantID = da.PlantID
        WHERE p.Output > da.AvgOutput
          AND DATE_FORMAT(p.Date, '%Y-%m') = @target_month  -- month of interest (placeholder)
        GROUP BY p.PlantID
    )
    SELECT PlantID
    FROM ExceedDays
    WHERE ExceedCount >= 20;

3️⃣ Find employees with the highest consecutive absences in the last quarter.

  • Answer:
    sql

    WITH AbsenceGroups AS (
        SELECT EmployeeID,
               Date,
               DATE_SUB(Date, INTERVAL ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY Date) DAY) AS GrpDate
        FROM EmployeeAttendance
        WHERE Status = 'Absent'
          AND Date >= DATE_SUB(CURDATE(), INTERVAL 3 MONTH)
    )
    SELECT EmployeeID, COUNT(*) AS ConsecutiveAbsences
    FROM AbsenceGroups
    GROUP BY EmployeeID, GrpDate
    ORDER BY ConsecutiveAbsences DESC
    LIMIT 1;

4️⃣ Pros and cons of using indexes in SQL, and when would you avoid using them?

  • Answer:
    Pros: Indexes speed up reads (filters, joins, and sorts on the indexed columns), especially on large tables.
    Cons: They slow down INSERT/UPDATE/DELETE operations and increase storage requirements, since every write must also maintain the index.
    Avoid: On write-heavy tables, very small tables, or columns that are rarely used in query predicates (see the short sketch below).
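
To make the read/write trade-off concrete, here is a minimal, illustrative Python sketch using the standard-library sqlite3 module (the table, column names, and data are made up for the example). Creating an index switches the lookup from a full table scan to an index search, at the cost of extra storage and slower writes.

python

    import sqlite3

    conn = sqlite3.connect(':memory:')
    cur = conn.cursor()
    cur.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, i % 1000, i * 1.5) for i in range(100_000)],
    )

    # Without an index, this lookup is a full table scan
    print(cur.execute(
        "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM orders WHERE customer_id = 42"
    ).fetchall())

    # With an index, the same lookup becomes an index search
    cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    print(cur.execute(
        "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM orders WHERE customer_id = 42"
    ).fetchall())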

5️⃣ Differences between window and aggregate functions with examples.

  • Answer:
    Window functions compute a value over a window of related rows and return a result for every input row, whereas aggregate functions (used with GROUP BY) collapse each group of rows into a single row.
    • Window Function Example: Cumulative sales for each customer.
      sql
      SELECT CustomerID, SUM(Sales) OVER (PARTITION BY CustomerID ORDER BY Date) AS CumulativeSales
      FROM Orders;
    • Aggregate Function Example: Total sales per customer.
      sql
      SELECT CustomerID, SUM(Sales) AS TotalSales
      FROM Orders GROUP BY CustomerID;

Python Questions

6️⃣ Merge multiple CSV files and clean the data.

  • Answer:
    python

    import os
    import pandas as pd

    def merge_csv(directory):
        all_files = [f for f in os.listdir(directory) if f.endswith('.csv')]
        dataframes = [pd.read_csv(os.path.join(directory, file)) for file in all_files]
        merged_df = pd.concat(dataframes)
        # Basic cleaning
        merged_df.drop_duplicates(inplace=True)
        merged_df.fillna(0, inplace=True)
        merged_df.to_csv('merged_file.csv', index=False)

    merge_csv('path_to_directory')

7️⃣ Group a list of dictionaries by a key and calculate summary statistics.

  • Answer:
    python

    from collections import defaultdict

    def group_data(data, key):
        grouped = defaultdict(list)
        for item in data:
            grouped[item[key]].append(item)
        # The summary here is simply the number of records per key
        summary = {k: len(v) for k, v in grouped.items()}
        return summary

    data = [{'Category': 'A', 'Value': 10}, {'Category': 'B', 'Value': 20}, {'Category': 'A', 'Value': 15}]
    print(group_data(data, 'Category'))  # {'A': 2, 'B': 1}
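
If the interviewer wants richer summary statistics than a record count, a small extension (assuming the same sample data and a numeric 'Value' field) could report count, min, max, and mean per group:

python

    from collections import defaultdict

    def summarize(data, key, value_key):
        grouped = defaultdict(list)
        for item in data:
            grouped[item[key]].append(item[value_key])
        # Per-group count, minimum, maximum, and mean of the numeric field
        return {
            k: {'count': len(v), 'min': min(v), 'max': max(v), 'mean': sum(v) / len(v)}
            for k, v in grouped.items()
        }

    data = [{'Category': 'A', 'Value': 10}, {'Category': 'B', 'Value': 20}, {'Category': 'A', 'Value': 15}]
    print(summarize(data, 'Category', 'Value'))
    # {'A': {'count': 2, 'min': 10, 'max': 15, 'mean': 12.5}, 'B': {'count': 1, 'min': 20, 'max': 20, 'mean': 20.0}}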

8️⃣ Difference between list, tuple, and dictionary with examples.

  • Answer:
    • List: Mutable, ordered collection (e.g., [1, 2, 3]).
    • Tuple: Immutable, ordered collection (e.g., (1, 2, 3)).
    • Dictionary: Key-value pairs, insertion-ordered since Python 3.7 (e.g., {'key': 'value'}); see the short sketch below.
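
A minimal sketch of the practical difference (the values are purely illustrative):

python

    nums_list = [1, 2, 3]
    nums_tuple = (1, 2, 3)
    lookup = {'a': 1, 'b': 2}

    nums_list.append(4)       # lists are mutable: [1, 2, 3, 4]
    # nums_tuple[0] = 99      # TypeError: tuples do not support item assignment
    lookup['c'] = 3           # dicts map keys to values: {'a': 1, 'b': 2, 'c': 3}
    print(nums_list, nums_tuple, lookup)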

9️⃣ Automate the generation of monthly reports from an Excel dataset.

  • Answer:
    python

    import pandas as pd

    def generate_reports(file_path):
        data = pd.read_excel(file_path)
        grouped = data.groupby('Month')
        for month, group in grouped:
            group.to_excel(f'{month}_report.xlsx', index=False)

    generate_reports('sales_data.xlsx')

Power BI Questions

🔟 Create a dashboard to track production plant efficiency.

  • Use measures like OEE (Overall Equipment Effectiveness = Availability × Performance × Quality), visualize headline KPIs with cards, and use line charts for trends; a small sketch of the OEE calculation follows below.
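
As a rough illustration of the OEE measure itself, here is a minimal Python sketch (the plant data and column names are hypothetical; in the dashboard the same logic would be written as DAX measures):

python

    import pandas as pd

    # Hypothetical per-plant figures for one day
    plants = pd.DataFrame({
        'PlantID':     [1, 2],
        'RunTime':     [430, 400],    # minutes actually producing
        'PlannedTime': [480, 480],    # planned production minutes
        'TotalUnits':  [900, 850],    # units produced
        'IdealUnits':  [1000, 1000],  # units possible at ideal speed during RunTime
        'GoodUnits':   [880, 800],    # units with no defects
    })

    availability = plants['RunTime'] / plants['PlannedTime']
    performance = plants['TotalUnits'] / plants['IdealUnits']
    quality = plants['GoodUnits'] / plants['TotalUnits']
    plants['OEE'] = availability * performance * quality
    print(plants[['PlantID', 'OEE']])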

1️⃣1️⃣ Handle data source refresh delays.

  • Optimize the source queries, consider DirectQuery or incremental refresh for large datasets, and schedule refreshes during off-peak hours over a reliable gateway connection.

1️⃣2️⃣ Row-level vs. role-level security.

  • Row-level: Controls data access at the row level, so each user sees only the rows they are allowed to.
  • Role-level: Groups users into roles so security rules are defined once and applied to everyone in the role.

1️⃣3️⃣ Visualize trends and outliers in daily sales data.

  • Use scatter plots and line charts with dynamic filters to highlight anomalies; a small outlier-flagging sketch follows below.
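
To make "highlight anomalies" concrete, here is a small illustrative Python sketch that flags daily sales outliers with a simple z-score rule (the data, column names, and 2-standard-deviation threshold are assumptions); the same flag could be built as a calculated column and used to color points in the Power BI visual:

python

    import pandas as pd

    # Hypothetical daily sales figures
    sales = pd.DataFrame({
        'Date': pd.date_range('2024-01-01', periods=10, freq='D'),
        'Sales': [100, 110, 95, 105, 400, 98, 102, 97, 15, 101],
    })

    mean, std = sales['Sales'].mean(), sales['Sales'].std()
    # Flag anything more than 2 standard deviations from the mean as an outlier
    sales['IsOutlier'] = (sales['Sales'] - mean).abs() > 2 * std
    print(sales[sales['IsOutlier']])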

1️⃣4️⃣ Create a calculated measure for YoY growth.

  • Answer (assuming a Sales[Amount] column and a 'Date'[Date] column on a marked date table):
    DAX

    YoY Growth =
        ( SUM(Sales[Amount]) - CALCULATE(SUM(Sales[Amount]), SAMEPERIODLASTYEAR('Date'[Date])) )
            / CALCULATE(SUM(Sales[Amount]), SAMEPERIODLASTYEAR('Date'[Date]))


General Questions

1️⃣5️⃣ Data-driven insights example.

  • In my previous role, I analyzed customer purchase patterns and introduced a targeted discount strategy that increased sales by 15%.

1️⃣6️⃣ Prioritizing tasks in high-pressure environments.

  • Use frameworks like the Eisenhower Matrix, and communicate regularly with stakeholders to manage expectations.

1️⃣7️⃣ Why join Shell?

  • Shell’s commitment to sustainability aligns with my values, and my expertise in SQL, Python, and BI tools will help drive data-driven decision-making in support of Shell’s operational efficiency goals.

Pro Tip

Stay confident, structure your answers, and tie them to business impact wherever possible.


#Shell #DataAnalyst #InterviewExperience #SQL #Python #PowerBI
