Skip to main content

Meesho PySpark Interview Questions for Data Engineers in 2025

Meesho PySpark Interview Questions for Data Engineers in 2025 Preparing for a PySpark interview? Let’s tackle some commonly asked questions, along with practical answers and insights to ace your next Data Engineering interview at Meesho or any top-tier tech company. 1. Explain how caching and persistence work in PySpark. When would you use cache() versus persist() and what are their performance implications? Answer : Caching : Stores data in memory (default) for faster retrieval. Use cache() when you need to reuse a DataFrame or RDD multiple times in a session without specifying storage levels. Example: python df.cache() df.count() # Triggers caching Persistence : Allows you to specify storage levels (e.g., memory, disk, or a combination). Use persist() when memory is limited, and you want a fallback to disk storage. Example: python from pyspark import StorageLevel df.persist(StorageLevel.MEMORY_AND_DISK) df.count() # Triggers persistence Performance Implications : cache() is ...

Ad

Deloitte Data Analyst Interview Questions and Answer

Deloitte Data Analyst Interview Questions: Insights and My Personal Approach to Answering Them


1. Tell us about yourself and your current job responsibilities.

Example Answer: "I am currently working as a Data Analyst at [Company Name], where I manage and analyze large datasets to drive business insights. My responsibilities include creating and maintaining Power BI dashboards, performing advanced SQL queries to extract and transform data, and collaborating with cross-functional teams to improve data-driven decision-making. Recently, I worked on a project where I streamlined reporting processes using DAX measures and optimized SQL queries, reducing report generation time by 30%."


2. Can you share some challenges you encountered in your recent project involving Power BI dashboards, and how did you resolve them?

Example Challenge: In a recent project, one of the key challenges was handling complex relationships between multiple datasets, which caused performance issues and incorrect aggregations in Power BI.

Resolution: To address this:

  • I optimized data models by reducing unnecessary columns and creating a star schema for better relationships.
  • Used summarized tables and DAX functions like SUMX and CALCULATE to simplify calculations.
  • Implemented row-level security for user-specific data, which reduced redundancy and improved dashboard performance.


3. What distinguishes a KPI (Key Performance Indicator) from a dimension?

  • KPI: A KPI is a measurable value that indicates performance relative to a goal (e.g., "Revenue Growth %," "Customer Retention Rate").

    • Example: Monthly Sales Target Achievement = 90%.
    • Example: Region = "North America," Product Category = "Electronics."

  • Dimension: A dimension provides descriptive information and is used for categorization or filtering (e.g., "Region," "Product Category").

Key Difference: KPIs measure performance, while dimensions provide context.


4. Write a SQL query to find the third highest salary from an employee table with the following columns: EID, ESalary.

sql

SELECT DISTINCT ESalary FROM Employee ORDER BY ESalary DESC LIMIT 1 OFFSET 2;

Alternatively (if LIMIT is not supported):

sql

SELECT MAX(ESalary) FROM Employee WHERE ESalary < ( SELECT MAX(ESalary) FROM Employee WHERE ESalary < ( SELECT MAX(ESalary) FROM Employee ) );

5. Create a SQL procedure using ESalary as a parameter that selects all EIDs from the Employee table where ESalary is less than 50,000.

sql

CREATE PROCEDURE GetLowSalaryEmployees(IN salaryLimit INT) BEGIN SELECT EID FROM Employee WHERE ESalary < salaryLimit; END;

Call Example:

sql

CALL GetLowSalaryEmployees(50000);

6. For the Employee table (columns EID, ESalary), retrieve all EIDs with odd salaries and join this with another table, empdetails (columns EID, EDOB), to obtain EDOB.

sql

SELECT e.EID, ed.EDOB FROM Employee e JOIN empdetails ed ON e.EID = ed.EID WHERE e.ESalary % 2 = 1;

7. How would you use the LEAD or LAG function in SQL to compare week-over-week data?

Example: To calculate the week-over-week sales difference:

sql

SELECT Week, Sales, LEAD(Sales) OVER (ORDER BY Week) AS NextWeekSales, Sales - LAG(Sales) OVER (ORDER BY Week) AS WeekOverWeekDifference FROM SalesData;
  • LEAD fetches the value for the next row.
  • LAG fetches the value for the previous row.

8. Can you explain how you would create a DAX measure in Power BI to calculate the year-over-year growth for a specific metric?

DAX Measure Example:

DAX

YoY Growth = DIVIDE( [Total Sales] - CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Date'[Date])), CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Date'[Date])) )
  • SAMEPERIODLASTYEAR: Compares sales for the same period last year.
  • DIVIDE: Calculates the percentage growth.

9. Identify a unique chart type in Power BI that differs from standard charts and explain its purpose.

Example Chart: Decomposition Tree

  • Purpose: Breaks down a key metric (e.g., Total Sales) by various dimensions (e.g., Region, Product, Time).
  • Use Case: Helps identify root causes for trends or variances.

10. Describe how you would implement a time intelligence feature in Power BI to analyze sales trends over different time periods.

Steps:

  1. Create a Date Table: Use DAX to generate a continuous date table with columns for Year, Quarter, Month, Week, etc.

    DAX

    DateTable = CALENDAR(DATE(2022, 1, 1), DATE(2024, 12, 31))
  2. Relate to Sales Data: Link the Date column in the date table to the sales dataset.

  3. Create Time Intelligence Measures:

    • Sales Last Month:
      DAX

      SalesLastMonth = CALCULATE([Total Sales], PREVIOUSMONTH('Date'[Date]))
    • Sales Trend:
      DAX

      SalesTrend = CALCULATE([Total Sales], DATESYTD('Date'[Date]))
  4. Visualize Trends: Use line charts or time slicers to showcase data over time (e.g., year-to-date, month-over-month).


These examples should give you a strong foundation for preparing responses to Deloitte's interview questions!



Comments

Ad

Popular posts from this blog

Deloitte Recent Interview Questions for Data Analyst Position November 2024

Deloitte Recent Interview Insights for a Data Analyst Position (0-3 Years) When preparing for an interview with a firm like Deloitte, particularly for a data analyst role, it's crucial to combine technical proficiency with real-world experiences. Below are my personalized insights into common interview questions. 1. Tell us about yourself and your current job responsibilities. Hi, I’m [Your Name], currently working as a Sr. Data Analyst with over 3.5 years of experience. I specialize in creating interactive dashboards, analyzing large datasets, and automating workflows. My responsibilities include developing Power BI dashboards for financial and operational reporting, analyzing trends in customer churn rates, and collaborating with cross-functional teams to implement data-driven solutions. Here’s a quick glimpse of my professional journey: Reporting financial metrics using Power BI, Excel, and SQL. Designing dashboards to track sales and marketing KPIs. Teaching data analysis conce...

EXL Interview question and answer for Power BI Developer (3 Years of Experience)

EXL Interview Experience for Power BI Developer (3 Years of Experience) I recently appeared for an interview at EXL for the role of Power BI Developer . The selection process consisted of three rounds: 2 Technical Rounds 1 Managerial Round Here, I’ll share the key technical questions I encountered, along with my approach to answering them. SQL Questions 1️⃣ Write a SQL query to find the second most recent order date for each customer from a table Orders ( OrderID , CustomerID , OrderDate ). To solve this, I used the ROW_NUMBER() window function: sql WITH RankedOrders AS ( SELECT CustomerID, OrderDate, ROW_NUMBER () OVER ( PARTITION BY CustomerID ORDER BY OrderDate DESC ) AS RowNum FROM Orders ) SELECT CustomerID, OrderDate AS SecondMostRecentOrderDate FROM RankedOrders WHERE RowNum = 2 ; 2️⃣ Write a query to find the nth highest salary from a table Employees with columns ( EmployeeID , Name , Salary ). The DENSE_RANK() fu...