Power BI Interview Questions Asked in the ScatterPie Analytics 1st Round (November 2024)

My Power BI Journey: Answering the #ScatterPie Analytics 1st-Round Questions

Today, I’m sharing my responses to a series of Power BI questions that were asked in the #ScatterPie Analytics 1st-round interview. These questions not only tested my technical expertise but also explored how I approach problem-solving in real-world scenarios. Here's how I responded:


1. Walk Me Through Your Profile

I am a Sr. Data Analyst with over 3.5 years of experience specializing in data visualization and reporting. I have developed dynamic dashboards and reports in Power BI, leveraging my expertise in DAX, SQL, and data modeling. My work spans domains like sales, marketing, operations, and finance, delivering actionable insights that drive business decisions. I’ve also managed Row Level Security (RLS) implementations and optimized performance for datasets with millions of rows.


2. How Many Dashboards Have You Developed So Far?

I have developed 20+ dashboards, catering to diverse business needs. For instance, I’ve created interactive sales performance dashboards, real-time marketing ROI trackers, and operational efficiency reports. Each dashboard is tailored to deliver key insights efficiently and visually.


3. Describe the Project Development Process

a) Data Sourcing:

I typically source data from SQL databases, Excel files, APIs, and cloud sources such as Azure and Google Sheets.

b) Reading and Processing Data:

Using Power Query, I clean, transform, and load the data into Power BI, ensuring it’s optimized for analysis.

c) Gathering Requirements and Defining KPIs:

I collaborate closely with stakeholders to understand business needs, define KPIs, and map the data architecture accordingly.


4. What Is Your Role in the Workspace?

In my workspace, I:

  • Manage datasets and reports.
  • Create dashboards with interactive visuals.
  • Implement Row Level Security (RLS) so each user sees only the data they are authorized to access.
  • Collaborate with team members for seamless report delivery.

5. What Is Row Level Security (RLS)? Provide Examples

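RLS restricts which rows of data a user can see, based on the security role assigned to that user.

  • Static RLS: Predefined roles with hard-coded DAX filter values, e.g., a role that restricts sales data to a single department.
  • Dynamic RLS: A single role whose DAX filter uses the signed-in user's identity (typically via USERPRINCIPALNAME()), e.g., showing sales only for the logged-in user’s region.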
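To make the dynamic case concrete, here is a minimal Python simulation. It is not Power BI code. It assumes a hypothetical user-to-region mapping table and keeps only the fact rows that belong to the signed-in user. A role filter built on USERPRINCIPALNAME() achieves roughly the same thing inside the model. The emails, regions and the `visible_rows` helper are illustrative assumptions.

```python
# Hypothetical user-to-region mapping table (a model table in Power BI).
USER_REGION = {
    "asha@example.com": "East",
    "ravi@example.com": "West",
}

# Hypothetical fact rows.
SALES = [
    {"region": "East", "category": "A", "sales": 1200},
    {"region": "West", "category": "A", "sales": 800},
    {"region": "East", "category": "B", "sales": 300},
]


def visible_rows(user_principal_name, rows):
    """Return only the rows the signed-in user may see (the dynamic RLS idea).

    A user with no mapping gets no rows, just as a role filter that
    matches nothing hides all data.
    """
    region = USER_REGION.get(user_principal_name)
    return [row for row in rows if row["region"] == region]


east_rows = visible_rows("asha@example.com", SALES)
print(sum(row["sales"] for row in east_rows))  # 1500
```

The design point is that one role serves every user. The mapping table, not a separate role per region, decides what each person sees.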

6. Does RLS Apply to a Member Role in the Workspace?

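No. RLS is enforced only for users with Viewer permissions. It does not apply to workspace Admins, Members, or Contributors, because those roles have edit permissions in the workspace. RLS takes effect when users open the report as workspace Viewers, through an app, or through a shared link.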


7. Why Do We Need a Master Calendar?

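A master calendar (date table) is a dimension table with one row per date. It ensures consistent time-based analysis across the model.

  • Used as a Dimension: Fact tables relate to it through their date columns. This enables filtering, grouping, and DAX time-intelligence calculations by date.
  • Not Merged into Fact Tables: Fact tables hold transactional data. Copying calendar attributes such as year and month into every fact table would add redundancy.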
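A master calendar is usually generated with one row per day and no gaps. The Python sketch below builds such a table for an assumed date range. In Power BI the same table is typically created with the DAX `CALENDAR` function or in Power Query, so treat the column names here as illustrative.

```python
from datetime import date, timedelta


def build_calendar(start, end):
    """Return one row per day from start to end (inclusive), with no gaps."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date": current,
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "year_month": f"{current.year}-{current.month:02d}",
        })
        current += timedelta(days=1)
    return rows


calendar = build_calendar(date(2024, 1, 1), date(2024, 12, 31))
print(len(calendar))  # 366, because 2024 is a leap year
```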

8. Difference Between Measure and Calculated Column

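  • Measure: Not stored. It is evaluated at query time in the current filter context, so its result changes with slicers, filters, and visual grouping.
  • Calculated Column: Evaluated row by row (row context) when the data is refreshed, then stored in the model. It uses memory and does not react to report filters.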
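The difference in when each one is evaluated can be sketched in plain Python. This is an analogy, not DAX:

- The calculated column is computed once per row and stored with that row. Here it is a hypothetical Profit = Sales - Cost.
- The measure stores nothing. It is a function re-evaluated over whatever rows the current filter leaves.

```python
# Hypothetical fact rows.
rows = [
    {"category": "A", "sales": 100, "cost": 60},
    {"category": "A", "sales": 50, "cost": 20},
    {"category": "B", "sales": 80, "cost": 70},
]

# Calculated column: evaluated row by row at refresh time and stored with each row.
for row in rows:
    row["profit"] = row["sales"] - row["cost"]


# Measure: nothing is stored; it is recomputed for the rows in the current filter context.
def total_profit(filtered_rows):
    return sum(row["profit"] for row in filtered_rows)


print(total_profit(rows))  # 80 (no filter)
print(total_profit([r for r in rows if r["category"] == "A"]))  # 70
```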

9. Difference Between HAVING and WHERE Clause

  • WHERE: Filters individual rows before grouping and aggregation.
  • HAVING: Filters groups after aggregation, so it can use aggregate functions such as SUM().

Example:

sql

SELECT Category, SUM(Sales) FROM SalesData WHERE Region = 'East' GROUP BY Category HAVING SUM(Sales) > 5000;
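To see both clauses at work, the query above can be run unchanged against a small in-memory SQLite table from Python. The rows are made-up sample data. The WHERE clause drops the West row before grouping. HAVING then discards any category whose East total is 5000 or less.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesData (Category TEXT, Region TEXT, Sales REAL)")
conn.executemany(
    "INSERT INTO SalesData VALUES (?, ?, ?)",
    [
        ("Electronics", "East", 4000),
        ("Electronics", "East", 2500),   # Electronics East total: 6500
        ("Electronics", "West", 9000),   # removed by WHERE before grouping
        ("Furniture", "East", 3000),     # Furniture East total: 3000, removed by HAVING
    ],
)

query = """
SELECT Category, SUM(Sales)
FROM SalesData
WHERE Region = 'East'
GROUP BY Category
HAVING SUM(Sales) > 5000;
"""
result = conn.execute(query).fetchall()
print(result)  # [('Electronics', 6500.0)]
```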

10. Measure vs. Calculated Column Output

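When both are placed in a table visual grouped by category, assuming each is defined as a SUM over the sales column:

  • Measure: Shows the sum of sales for each category, recalculated for every row of the visual.
  • Calculated Column: Shows the same grand-total sales value on every row (when the column is not summarized). SUM evaluated in a calculated column's row context sees no filter, so it returns the total for the whole table.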

11. DAX Formula for Sales of Category A Only

DAX

Sales Category A = CALCULATE(SUM(Sales[Sales]), Sales[Category] = "A") // assumes a table named Sales with Sales and Category columns
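A boolean filter such as Sales[Category] = "A" inside CALCULATE replaces any existing filter on that Category column. As a result, in a table visual split by Sales[Category], such a measure shows Category A's total on every row.

The Python sketch below only mimics that filter-override behaviour on made-up rows. The filter-context dictionary is an illustrative stand-in, not how the DAX engine works internally.

```python
# Made-up fact rows.
rows = [
    {"category": "A", "city": "Pune", "sales": 100},
    {"category": "A", "city": "Delhi", "sales": 50},
    {"category": "B", "city": "Pune", "sales": 70},
]


def sum_sales(filter_context):
    """SUM of sales over the rows that match every filter in the context."""
    return sum(
        row["sales"]
        for row in rows
        if all(row[col] == value for col, value in filter_context.items())
    )


def sales_category_a(filter_context):
    """Mimic CALCULATE(..., Sales[Category] = "A"): override any Category filter."""
    new_context = dict(filter_context)
    new_context["category"] = "A"  # replaces, rather than intersects, the existing filter
    return sum_sales(new_context)


for category in ("A", "B"):
    print(category, sales_category_a({"category": category}))  # A 150, then B 150
```

Filters on other columns, such as city, are still respected. Only the Category filter is overwritten.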

12. Adding a City Column: Output Change

  • Measure: Recalculates to show sales for each category and city combination.
  • Calculated Column: Still repeats the same grand-total value on every row, because its stored value does not respond to the Category or City grouping.
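The outputs described in questions 10 and 12 can be reproduced with a small Python simulation, under these assumptions:

- The calculated column is defined as SUM over the sales column. With no filter context, it stores the grand total in every row.
- The column is shown in the visual without summarization.
- The measure is re-evaluated for each group of the visual.

The data and the `table_visual` helper are hypothetical.

```python
from collections import defaultdict

# Made-up fact rows.
rows = [
    {"category": "A", "city": "Pune", "sales": 100},
    {"category": "A", "city": "Delhi", "sales": 50},
    {"category": "B", "city": "Pune", "sales": 70},
]

# Calculated column "Total Sales = SUM(...)": evaluated per row with no filter,
# so every row stores the grand total (220).
grand_total = sum(row["sales"] for row in rows)
for row in rows:
    row["total_sales_column"] = grand_total


def table_visual(group_by):
    """Return {group: (measure value, calculated-column value)} for a table visual.

    The measure is re-evaluated per group; the unsummarized column simply
    repeats its stored value.
    """
    measure = defaultdict(int)
    column = {}
    for row in rows:
        key = tuple(row[col] for col in group_by)
        measure[key] += row["sales"]
        column[key] = row["total_sales_column"]
    return {key: (measure[key], column[key]) for key in measure}


print(table_visual(["category"]))
# {('A',): (150, 220), ('B',): (70, 220)}
print(table_visual(["category", "city"]))
# {('A', 'Pune'): (100, 220), ('A', 'Delhi'): (50, 220), ('B', 'Pune'): (70, 220)}
```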

13. What Are Dataverse and Power Automate?

  • Dataverse: Microsoft's secure, cloud-based data platform for the Power Platform. It stores data in tables that Power Apps, Power Automate, and Power BI can use.
  • Power Automate: A tool for automating workflows across apps and services.

14. Star Schema vs. Snowflake Schema

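  • Star Schema: A central fact table joined directly to denormalized dimension tables. It needs fewer joins and has a simpler design, which is why it is the generally recommended model in Power BI.
  • Snowflake Schema: Dimension tables normalized into related sub-tables (e.g., Product, then Subcategory, then Category). It has less redundancy but needs more joins and more relationships to maintain.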
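The structural difference shows up in the joins a query needs. This sqlite3 sketch uses hypothetical tables:

- The star version reads the category directly from a denormalized product dimension.
- The snowflake version needs an extra hop through a separate category table.

Both return the same totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Fact table shared by both designs.
CREATE TABLE FactSales (ProductKey INTEGER, Sales REAL);
INSERT INTO FactSales VALUES (1, 100), (2, 50), (3, 70);

-- Star: one denormalized product dimension that already holds the category name.
CREATE TABLE DimProductStar (ProductKey INTEGER, Product TEXT, Category TEXT);
INSERT INTO DimProductStar VALUES
    (1, 'Phone', 'Electronics'), (2, 'Cable', 'Electronics'), (3, 'Desk', 'Furniture');

-- Snowflake: the category is normalized into its own table.
CREATE TABLE DimProductSnow (ProductKey INTEGER, Product TEXT, CategoryKey INTEGER);
CREATE TABLE DimCategory (CategoryKey INTEGER, Category TEXT);
INSERT INTO DimProductSnow VALUES (1, 'Phone', 10), (2, 'Cable', 10), (3, 'Desk', 20);
INSERT INTO DimCategory VALUES (10, 'Electronics'), (20, 'Furniture');
""")

star = conn.execute("""
SELECT p.Category, SUM(f.Sales)
FROM FactSales f
JOIN DimProductStar p ON f.ProductKey = p.ProductKey
GROUP BY p.Category ORDER BY p.Category
""").fetchall()

snowflake = conn.execute("""
SELECT c.Category, SUM(f.Sales)
FROM FactSales f
JOIN DimProductSnow p ON f.ProductKey = p.ProductKey
JOIN DimCategory c ON p.CategoryKey = c.CategoryKey  -- the extra join
GROUP BY c.Category ORDER BY c.Category
""").fetchall()

print(star == snowflake, star)  # True [('Electronics', 150.0), ('Furniture', 70.0)]
```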

15. Techniques to Reduce Load in Power BI

  • Import only necessary columns.
  • Filter data at the source using SQL queries.
  • Use incremental refresh and aggregations for large datasets.
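The first two techniques amount to sending a narrower query to the source. Incremental refresh applies the same idea per partition: Power BI passes the RangeStart and RangeEnd date/time parameters into the source filter, so only recent partitions are reloaded.

The sqlite3 sketch below imitates that with a hypothetical Orders table. The column names and the refresh window are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER, OrderDate TEXT, Region TEXT, "
    "Sales REAL, InternalNotes TEXT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-09-30", "East", 120, "legacy"),
        (2, "2024-10-05", "West", 80, "n/a"),
        (3, "2024-10-20", "East", 200, "n/a"),
        (4, "2024-11-01", "East", 50, "n/a"),
    ],
)

# Load only the columns the report needs (InternalNotes is skipped) and only
# rows inside the refresh window. The window is half-open, [range_start, range_end),
# so a boundary row is never loaded into two partitions.
range_start, range_end = "2024-10-01", "2024-11-01"
loaded = conn.execute(
    "SELECT OrderID, OrderDate, Region, Sales FROM Orders "
    "WHERE OrderDate >= ? AND OrderDate < ? ORDER BY OrderID",
    (range_start, range_end),
).fetchall()

print(loaded)
```

ISO-formatted date strings compare correctly as text, which is why the string bounds work here.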

16. Challenges I’ve Faced in Projects

One challenge was optimizing query performance for a dashboard handling 10 million+ rows of sales data. I resolved this by:

  • Creating aggregate tables.
  • Implementing incremental refresh to reduce load times.
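An aggregate table trades detail for size: transaction-level rows are pre-summarized to the grain the visuals actually use. The sqlite3 sketch below builds such a table from a hypothetical detail table. In Power BI a table like this is typically imported and mapped to the detail table through the aggregations feature, so only detail-level queries hit the large table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesDetail (OrderDate TEXT, Category TEXT, Sales REAL)")
conn.executemany(
    "INSERT INTO SalesDetail VALUES (?, ?, ?)",
    [
        ("2024-10-01", "A", 10),
        ("2024-10-15", "A", 20),
        ("2024-10-20", "B", 5),
        ("2024-11-02", "A", 7),
    ],
)

# Pre-aggregate to month x category, the grain the dashboard visuals use.
conn.execute("""
CREATE TABLE SalesMonthlyAgg AS
SELECT substr(OrderDate, 1, 7) AS YearMonth,
       Category,
       SUM(Sales) AS TotalSales,
       COUNT(*) AS OrderCount
FROM SalesDetail
GROUP BY YearMonth, Category
""")

agg = conn.execute(
    "SELECT YearMonth, Category, TotalSales, OrderCount "
    "FROM SalesMonthlyAgg ORDER BY YearMonth, Category"
).fetchall()
print(agg)
# [('2024-10', 'A', 30.0, 2), ('2024-10', 'B', 5.0, 1), ('2024-11', 'A', 7.0, 1)]
```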

17. Do You Have Any Questions?

I asked:

“What are the immediate challenges this team is facing, and how can I contribute to overcoming them?”
