Flipkart Business Analyst Interview Questions and Answers Asked in December 2024

Flipkart Business Analyst Interview Experience (1-3 Years)

Recently, I appeared for an interview at Flipkart for the position of Business Analyst, and I’m excited to share the questions asked during the process along with how I would approach answering them. The interview covered various domains such as SQL, guesstimates, case studies, managerial scenarios, and Python.

Here’s how I would have tackled each question:


SQL Questions

1️⃣ What are window functions, and how do they differ from aggregate functions? Can you give a use case?

  • Answer:
    Window functions compute a value for each row over a set of related rows (the window) and keep every row in the result. Aggregate functions used with GROUP BY collapse each group into a single row.
    • Example:
      sql

      SELECT CustomerID, OrderID, OrderDate,
             ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS OrderRank
      FROM Orders;
      Use case: Finding the latest order per customer while keeping the order-level columns. The rows with OrderRank = 1 are each customer's most recent order.
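To see the difference on real rows, here is a minimal sketch that runs an aggregate and the window query above side by side. It assumes an in-memory SQLite database (window functions need SQLite 3.25 or newer) and a few made-up orders; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 101, "2024-11-01"), (2, 101, "2024-12-05"), (3, 102, "2024-10-20")],
)

# Aggregate: one row per customer, so order-level detail is lost.
aggregated = conn.execute(
    "SELECT CustomerID, COUNT(*) FROM Orders GROUP BY CustomerID ORDER BY CustomerID"
).fetchall()

# Window function: every order row survives and gets a rank within its customer.
ranked = conn.execute(
    """
    SELECT CustomerID, OrderID, OrderDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS OrderRank
    FROM Orders
    ORDER BY CustomerID, OrderRank
    """
).fetchall()

# Latest order per customer: keep only the rows ranked first.
latest = [row for row in ranked if row[3] == 1]

print(aggregated)
print(latest)
```

The aggregate returns two rows for three orders, while the window query returns all three and lets you pick the top-ranked row per customer afterwards.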

2️⃣ Explain indexing. When could an index potentially reduce performance, and how would you approach indexing for a large dataset?

  • Answer:
    Indexing speeds up query retrieval by creating a data structure on columns. However, it can reduce performance during write operations (INSERT, UPDATE, DELETE) due to the overhead of updating indexes.
    • For large datasets, I’d:
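      • Create indexes only on columns that are frequently used in filters, joins, or sorts.
      • Use covering indexes (indexes that contain every column a query needs) where possible.
      • Avoid excessive indexing, as every extra index adds storage and slows down writes.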
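A covering index is easiest to understand by looking at a query plan. The sketch below is an assumption-laden illustration: it uses SQLite rather than a production database, and the `Orders` columns and index name are invented for the example. The idea is the same elsewhere, but plan output differs by engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and index, for illustration only.
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, "
    "OrderDate TEXT, Amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_customer_date ON Orders (CustomerID, OrderDate)")


def plan_for(sql, params=()):
    """Return SQLite's query plan description as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)  # the 4th column holds the plan text


# Both selected columns live in the index, so the table itself is never read.
covered_plan = plan_for(
    "SELECT CustomerID, OrderDate FROM Orders WHERE CustomerID = ?", (101,)
)

# Amount is not in the index, so each match needs an extra lookup in the table.
uncovered_plan = plan_for("SELECT Amount FROM Orders WHERE CustomerID = ?", (101,))

print(covered_plan)
print(uncovered_plan)
```

In an interview, I'd mention that I check the plan before and after adding an index rather than assuming it helps.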

3️⃣ Write a query to retrieve customers who made purchases in the last 30 days but didn’t purchase anything in the previous 30 days.

  • Answer:
    sql

    SELECT DISTINCT o.CustomerID
    FROM Orders AS o
    WHERE o.OrderDate >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
      AND NOT EXISTS (
        -- No order in the 30 days before that. Half-open ranges keep the
        -- boundary day from being counted in both windows.
        SELECT 1
        FROM Orders AS p
        WHERE p.CustomerID = o.CustomerID
          AND p.OrderDate >= DATE_SUB(CURDATE(), INTERVAL 60 DAY)
          AND p.OrderDate < DATE_SUB(CURDATE(), INTERVAL 30 DAY)
      );

4️⃣ Given a table of transactions, find the top 3 most purchased products for each category.

  • Answer:
    sql

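    -- A window-function alias cannot be filtered in HAVING, so rank inside a CTE and filter outside it.
    -- RANK() keeps ties, so a category can return more than 3 rows; use ROW_NUMBER() for exactly 3.
    WITH RankedProducts AS (
      SELECT Category, ProductID, SUM(Quantity) AS TotalQuantity,
             RANK() OVER (PARTITION BY Category ORDER BY SUM(Quantity) DESC) AS SalesRank
      FROM Transactions
      GROUP BY Category, ProductID
    )
    SELECT Category, ProductID, TotalQuantity, SalesRank
    FROM RankedProducts
    WHERE SalesRank <= 3;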
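A top-N-per-group query is worth checking on a tiny table. The sketch below assumes SQLite and invented transactions. It splits the work into two CTEs (total first, then rank) and filters in the outer query, because a window function's result cannot be filtered in the same SELECT that computes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (Category TEXT, ProductID TEXT, Quantity INTEGER)")
# Hypothetical rows: Books has four products, Toys has two.
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [
        ("Books", "B1", 5), ("Books", "B1", 3), ("Books", "B2", 6),
        ("Books", "B3", 4), ("Books", "B4", 1),
        ("Toys", "T1", 2), ("Toys", "T2", 9),
    ],
)

top_products = conn.execute(
    """
    WITH Totals AS (
        SELECT Category, ProductID, SUM(Quantity) AS TotalQuantity
        FROM Transactions
        GROUP BY Category, ProductID
    ),
    Ranked AS (
        SELECT Category, ProductID, TotalQuantity,
               RANK() OVER (PARTITION BY Category ORDER BY TotalQuantity DESC) AS SalesRank
        FROM Totals
    )
    SELECT Category, ProductID, TotalQuantity, SalesRank
    FROM Ranked
    WHERE SalesRank <= 3
    ORDER BY Category, SalesRank
    """
).fetchall()

for row in top_products:
    print(row)
```

Product B4 drops out of Books, while Toys returns both of its products because it has fewer than three.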

5️⃣ How would you identify duplicate records in a large dataset and remove only the duplicates, retaining the first occurrence?

  • Answer:
    sql

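    -- The derived table (Keepers) is needed in MySQL, which cannot delete from a table that the subquery selects from directly.
    DELETE FROM TableName
    WHERE ID NOT IN (
      SELECT KeepID FROM (
        SELECT MIN(ID) AS KeepID
        FROM TableName
        GROUP BY DuplicateColumn1, DuplicateColumn2
      ) AS Keepers
    );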
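The question has two parts, identify and then remove, so I'd show both. Here is a minimal sketch on SQLite with a made-up `Customers` table, where a duplicate means the same `Name` and `Email`. SQLite accepts a plain `NOT IN` subquery on the table being deleted from; other engines may need a different form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: rows 3 and 4 duplicate row 1.
conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Asha", "asha@example.com"),
        (2, "Ravi", "ravi@example.com"),
        (3, "Asha", "asha@example.com"),
        (4, "Asha", "asha@example.com"),
        (5, "Meena", "meena@example.com"),
    ],
)

# Step 1: identify which key combinations occur more than once.
duplicates = conn.execute(
    """
    SELECT Name, Email, COUNT(*) AS Copies
    FROM Customers
    GROUP BY Name, Email
    HAVING COUNT(*) > 1
    """
).fetchall()

# Step 2: delete everything except the lowest ID in each group.
conn.execute(
    """
    DELETE FROM Customers
    WHERE ID NOT IN (SELECT MIN(ID) FROM Customers GROUP BY Name, Email)
    """
)
remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM Customers ORDER BY ID")]

print(duplicates)
print(remaining_ids)
```

On a large production table I'd run step 1 first and review the counts before deleting anything.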

Guesstimates

1️⃣ Estimate the number of online food delivery orders in a large metropolitan city over a month.

  • Answer:
    • Population: 10 million
    • Percentage ordering food online: 30%
    • Average orders per person per month: 5
    • Total orders = 10M * 30% * 5 = 15 million orders/month

2️⃣ How many customer service calls would a telecom company receive daily for a customer base of 1 million?

  • Answer:
    • Assume 5% of users call customer service daily.
    • Calls = 1M * 5% = 50,000 calls/day
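Both estimates above are a base population multiplied by a chain of assumed rates, so they are easy to script and stress-test. The sketch below reproduces the two numbers and adds a low and high scenario for the food delivery case. The low and high rates are my own illustrative assumptions, not figures from the interview.

```python
def estimate(base, *factors):
    """Multiply a base population by a chain of assumed rates, rounded to a whole count."""
    total = base
    for factor in factors:
        total *= factor
    return round(total)


# Food delivery: 10M people * 30% who order online * 5 orders per month.
monthly_food_orders = estimate(10_000_000, 0.30, 5)

# Telecom: 1M customers * 5% who call on a given day.
daily_support_calls = estimate(1_000_000, 0.05)

# Sensitivity check with hypothetical pessimistic and optimistic assumptions.
low_food_orders = estimate(10_000_000, 0.20, 4)
high_food_orders = estimate(10_000_000, 0.40, 6)

print(monthly_food_orders, daily_support_calls)
print(low_food_orders, high_food_orders)
```

Quoting a range alongside the point estimate shows the interviewer which assumption moves the answer most.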

Case Studies

1️⃣ A sudden decrease in conversion rate is observed in a popular product category. How would you investigate the cause and propose solutions?

  • Answer:
    • Investigate:
      • Analyze traffic trends (source, location).
      • Check product availability and pricing.
      • Review customer feedback.
    • Propose solutions:
      • Optimize pricing strategy.
      • Resolve technical issues on the website.
      • Enhance product descriptions or images.
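The first investigation step, breaking the drop down by segment, can be sketched in a few lines. Everything in this example is hypothetical: the traffic sources, the visit and order counts, and the helper name `conversion_by_segment`. It shows how an overall dip can turn out to be concentrated in one segment.

```python
from collections import defaultdict

# Hypothetical data: (period, traffic_source, visits, orders)
sessions = [
    ("before", "organic", 10_000, 500),
    ("before", "paid", 5_000, 250),
    ("after", "organic", 10_000, 490),
    ("after", "paid", 5_000, 100),
]


def conversion_by_segment(rows):
    """Return {(period, source): orders / visits} after summing rows per segment."""
    totals = defaultdict(lambda: [0, 0])  # (period, source) -> [visits, orders]
    for period, source, visits, orders in rows:
        totals[(period, source)][0] += visits
        totals[(period, source)][1] += orders
    return {key: orders / visits for key, (visits, orders) in totals.items()}


rates = conversion_by_segment(sessions)
sources = sorted({source for _, source, _, _ in sessions})

# Change in conversion rate per source between the two periods.
changes = {s: rates[("after", s)] - rates[("before", s)] for s in sources}
worst_segment = min(changes, key=changes.get)

for s in sources:
    print(s, f"{rates[('before', s)]:.1%} -> {rates[('after', s)]:.1%}")
print("largest drop:", worst_segment)
```

The same breakdown can be repeated by location, device, or product to narrow the cause before proposing a fix.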

2️⃣ The company is considering adding a new subscription model. How would you evaluate its potential impact on customer lifetime value and revenue?

  • Answer:
    • Analyze historical data to determine potential upsell opportunities.
    • Conduct surveys to gauge customer interest.
    • Simulate subscription revenue based on adoption rates and churn predictions.
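The simulation step can start as a very small model. The sketch below assumes a single cohort that subscribes in month 1, a flat monthly price, and a constant monthly churn rate. All inputs are placeholder numbers, and the lifetime value formula (price divided by churn) ignores discounting and costs.

```python
def simulate_subscription_revenue(customers, adoption_rate, monthly_price, monthly_churn, months):
    """Project monthly revenue for one cohort that subscribes in month 1."""
    subscribers = customers * adoption_rate
    revenue = []
    for _ in range(months):
        revenue.append(subscribers * monthly_price)
        subscribers *= 1 - monthly_churn  # a fixed share cancels each month
    return revenue


def expected_lifetime_value(monthly_price, monthly_churn):
    """Expected revenue per subscriber under constant churn: price / churn."""
    return monthly_price / monthly_churn


# Hypothetical inputs: 100k customers, 10% adoption, price of 100 per month, 5% churn.
revenue = simulate_subscription_revenue(100_000, 0.10, 100, 0.05, months=12)
ltv = expected_lifetime_value(100, 0.05)

print([round(r) for r in revenue[:3]])
print(round(sum(revenue)), round(ltv))
```

Rerunning it with different adoption and churn values gives a quick feel for which assumption the revenue forecast is most sensitive to.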

Managerial Questions

1️⃣ Describe a time when you faced conflicting priorities on a project. How did you manage your workload to meet deadlines?

  • Answer:
    • I prioritize tasks by impact and urgency using the Eisenhower Matrix. I then communicate expectations and negotiate timelines with stakeholders, which has helped me deliver the most important work on time.

2️⃣ How would you handle a disagreement within the team on an analytical approach?

  • Answer:
    • Facilitate a discussion to ensure everyone’s perspective is heard. Use data to support decision-making and align the team towards a common goal.

Python Questions

1️⃣ Write a Python function to find the longest consecutive sequence of unique numbers in a list.

  • Answer:
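    python
    def longest_unique_sequence(nums):
        """Return the length of the longest run of consecutive integers in nums."""
        unique_nums = set(nums)  # O(1) membership tests; duplicates are ignored
        longest = 0
        for num in unique_nums:  # iterate the set so repeated values are not rescanned
            if num - 1 not in unique_nums:  # num is the start of a run
                current = num
                length = 1
                while current + 1 in unique_nums:
                    current += 1
                    length += 1
                longest = max(longest, length)
        return longest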
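The wording of this question is ambiguous, so I'd confirm the intent with the interviewer. The answer above reads it as the longest run of consecutive integers (such as 1, 2, 3, 4). If the interviewer instead means the longest contiguous stretch of the list with no repeated value, a sliding window is a common approach. This is a sketch of that alternative reading, and the function name is my own.

```python
def longest_distinct_run(nums):
    """Length of the longest contiguous slice of nums with no repeated value."""
    last_seen = {}  # value -> index of its most recent occurrence
    start = 0       # left edge of the current window
    best = 0
    for i, value in enumerate(nums):
        # If value already occurs inside the window, move the left edge past it.
        if value in last_seen and last_seen[value] >= start:
            start = last_seen[value] + 1
        last_seen[value] = i
        best = max(best, i - start + 1)
    return best


print(longest_distinct_run([1, 2, 1, 3, 4, 2]))
```

Both versions run in linear time, and asking which one is wanted shows you read the problem carefully.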


2️⃣ If you’re working with a large dataset with missing values, what Python libraries would you use to handle missing data, and why?

  • Answer:
    • Libraries:
      • Pandas: To detect and fill/drop missing values.
      • Scikit-learn: For advanced imputation methods like KNNImputer.
    • I’d choose based on the dataset and the need for accuracy vs. simplicity.
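Here is a short sketch of what those options look like in practice. It assumes pandas, NumPy, and scikit-learn are installed, and uses a tiny made-up table with `age` and `income` columns. The choice of median fill and `n_neighbors=2` is illustrative rather than a recommendation.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical data with one gap in each column.
df = pd.DataFrame(
    {
        "age": [25, np.nan, 31, 40, 29],
        "income": [50_000, 62_000, np.nan, 90_000, 58_000],
    }
)

# Detect: count missing values per column.
missing_per_column = df.isna().sum()

# Simple options: drop incomplete rows, or fill each gap with the column median.
dropped = df.dropna()
median_filled = df.fillna(df.median())

# More advanced: estimate each gap from the 2 most similar rows.
knn_filled = pd.DataFrame(
    KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns
)

print(missing_per_column)
print(median_filled)
print(knn_filled)
```

Dropping rows is the simplest option but loses data, median fill keeps every row at the cost of flattening variation, and KNN imputation uses relationships between columns but costs more to compute.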

Pro Tip

  • Always structure your answers logically, especially for guesstimates and case studies.
  • Highlight problem-solving skills and focus on clarity.

Follow for more interview experiences and actionable tips!
Hashtag: #Flipkart #BusinessAnalyst #InterviewExperience #DataAnalysis
