Meesho PySpark Interview Questions for Data Engineers in 2025

Preparing for a PySpark interview? Let’s tackle some commonly asked questions, along with practical answers and insights to help you ace your next Data Engineering interview at Meesho or any top-tier tech company.

1. Explain how caching and persistence work in PySpark. When would you use cache() versus persist(), and what are their performance implications?

Answer:

Caching: cache() stores data at the default storage level for faster retrieval. For an RDD that default is memory only; for a DataFrame the default level also spills to disk when memory runs out. Use cache() when you need to reuse a DataFrame or RDD multiple times in a session without specifying a storage level. Example (Python):

    df.cache()
    df.count()  # An action such as count() triggers the caching

Persistence: persist() lets you specify the storage level (e.g., memory, disk, or a combination). Use persist() when memory is limited and you want a fallback to disk storage. Example (Python):

    from pyspark import StorageLevel

    df.persist(StorageLevel.MEMORY_AND_DISK)
    df.count()  # An action such as count() triggers the persistence

Performance Implications: cache() is ...
How I Would Solve These Tricky SQL Questions Asked in American Express Interview

SQL is a fundamental skill for any data analyst, and mastering complex queries is key to standing out in interviews. Below, I break down how I would approach the tricky SQL questions mentioned. Each of these challenges is designed to test both your technical proficiency and your problem-solving ability. Let’s dive into the solutions.

1. Find the Second-Highest Salary in a Table Without Using LIMIT or TOP

This is a classic problem that requires creativity. My solution (SQL):

    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees);

Here, the subquery finds the maximum salary, and the outer query selects the highest salary below that.

2. Find All Employees Who Earn More Than Their Managers

Joining the table to itself is the key here (SQL):

    SELECT e1.employee_name
    FROM employees e1
    JOIN employees e2 ON e1.manager_id = e2.employee_id
    WHERE e1.salary > ...
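To sanity-check the second-highest salary query from question 1, here is a minimal sketch that runs it against an in-memory SQLite database using Python's built-in sqlite3 module. The sample rows and the two-column version of the employees table are assumptions made up for illustration; they are not from the original interview question. The sketch also shows two edge cases worth mentioning in an interview: ties at the top salary do not break the query, and when there is only one distinct salary the result is NULL.

```python
import sqlite3

# Hypothetical sample data: a trimmed-down employees table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 90000), ("Ben", 120000), ("Chen", 120000), ("Dana", 75000)],
)

# Same query as in question 1: the highest salary strictly below the maximum.
SECOND_HIGHEST = """
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees)
"""

# Two employees tie at the top (120000), so the answer is the next distinct value.
second_highest = conn.execute(SECOND_HIGHEST).fetchone()[0]
print(second_highest)  # 90000

# Edge case: with only one distinct salary left, no row passes the filter,
# and MAX over an empty set yields NULL (None in Python).
conn.execute("DELETE FROM employees WHERE salary < 120000")
only_one_distinct = conn.execute(SECOND_HIGHEST).fetchone()[0]
print(only_one_distinct)  # None
```

Because the filter uses a strict `<`, the query returns the second-highest distinct salary, which is usually what the interviewer means. It is worth confirming that reading before answering.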