...

SQL Interview Preparation Questions for Data Analysts in 2024

5/5 - (1 vote)

Table of Contents

Basic Level SQL Interview Preparation Questions

As the demand for skilled data analysts continues to rise in 2024, proficiency in SQL (Structured Query Language) remains a crucial skill set sought after by employers across various industries. SQL serves as the backbone of data analysis, enabling professionals to retrieve, manipulate, and analyze data efficiently. With the evolving landscape of technology and data analytics, staying abreast of the latest SQL Interview Preparation Questions for Data Analysts in 2024 and techniques is paramount for aspiring data analysts.

  1. What is the Primary Key?

A primary key is a special field in a database table that uniquely identifies each record within that table. It serves as a unique identifier for each row, ensuring that no two rows have the same key value. This helps maintain data integrity and enables efficient data retrieval and management within the database system. In the below Table, You can see Bank Account Numbers will be different for Different People. So the Account number will not match with others.

2. What is a foreign key?

A foreign key is a field in a table that refers to the primary key in another table. It establishes a link between the two tables. By Using Foreign Key you can Join Two tables.

SELECT d.EmpNo, e.DName from Department d JOIN Employee e ON d.DepNo=e.DepNo

Intermediate SQL Questions:

1. Write a SQL query to retrieve all employees from the “employees” table with a salary greater than $50,000.

Explanation: you have to retrieve all employees, so you have to use (*) and the ‘WHERE’ condition will be used to filter data for salary > $50,000

Answer:

  SELECT * FROM employees WHERE salary > 50000;

2. Explain the difference between INNER JOIN and LEFT JOIN.

INNER JOIN returns only the rows where there is a match in both tables, while LEFT JOIN returns all rows from the left table and the matched rows from the right table.

3. What is the purpose of the GROUP BY clause?

The GROUP BY clause is used to group rows that have the same values in specified columns into summary rows.

4. Write a SQL query to find the second highest salary from the “employees” table.

SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);

Explanation: from WHERE subquery, we will get max_salary, and Main Query we will get max which is max of subquery.

5. What is a subquery, and how is it different from a JOIN?

A subquery is a query nested inside another query. It can be used to return a single value or a set of values. A JOIN is used to combine rows from two or more tables based on a related column.

6. Explain the ACID properties in the context of database transactions.

ACID stands for Atomicity, Consistency, Isolation, and Durability. These properties ensure that database transactions are processed reliably.

7. Write a SQL query to calculate the total revenue for each category from the “sales” table.

SELECT category, SUM(revenue) FROM sales GROUP BY category;

Explanation:

  1. SELECT category, SUM(revenue):
    • The SELECT statement specifies the columns to be included in the result set. In this case, it selects the ‘category’ column along with the sum of ‘revenue’ for each category.
    • The SUM(revenue) function calculates the total revenue for each category.
  2. FROM sales:
    • Specifies the table from which to retrieve the data. In this case, it’s the ‘Sales’ table.
  3. GROUP BY category:
    • Groups the rows of the result set by the values in the ‘category’ column.
    • This means that all rows with the same ‘category’ value will be combined into a single group.
    • The aggregate function (SUM) is then applied to each group separately, resulting in the total revenue for each category being calculated.

8. What is normalization, and why is it important in database design?

Normalization is the process of organizing data to reduce redundancy and improve data integrity. It ensures that data is stored efficiently without unnecessary duplication.

9. How to find duplicates in a given table?

SELECT emp_id, COUNT(*) FROM emp GROUP BY emp_id HAVING COUNT(*) > 1;

Explanation:

  1. SELECT emp_id, COUNT(*):
    • The SELECT statement specifies the columns to be included in the result set. In this case, it selects the ’emp_id’ column and calculates the count of occurrences for each ’emp_id’.
    • The COUNT(*) function counts the number of rows for each ’emp_id’ group.
  2. FROM emp:
    • Indicates the source table from which to retrieve the data. Here, it’s the ’emp’ table.
  3. GROUP BY emp_id:
    • Groups the rows of the result set by the values in the ’emp_id’ column.
    • This means that all rows with the same ’emp_id’ value will be combined into a single group.
  4. HAVING COUNT(*) > 1:
    • The HAVING clause filters the grouped rows based on a specified condition after the GROUP BY operation has been performed.
    • In this case, it specifies that only groups where the count of occurrences of ’emp_id’ is greater than 1 will be included in the result set.
    • This condition effectively identifies ’emp_id’ values that appear more than once in the ’emp’ table.

10. How to delete duplicates?

#By USING WIth CTE:-
WITH cte AS 
                  ( SELECT *, ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY emp_id) AS rn 
               FROM emp)
DELETE FROM cte WHERE rn > 1;

Explanation:

  1. WITH cte AS (…):
    • This begins the definition of a Common Table Expression (CTE) named ‘cte’.
    • Within the CTE, a SELECT statement is used to retrieve all columns (*) from the ’emp’ table along with an additional column named ‘rn’.
    • The ROW_NUMBER() function assigns a unique sequential number to each row within a partition defined by the ’emp_id’ column.
    • The PARTITION BY clause divides the result set into partitions based on distinct ’emp_id’ values.
    • The ORDER BY clause orders the rows within each partition by ’emp_id’.
  2. DELETE FROM cte WHERE rn > 1:
    • This is the main DELETE statement that operates on the CTE ‘cte’.
    • It specifies to delete rows from the ‘cte’ CTE where the row number ‘rn’ is greater than 1.
    • Since the ROW_NUMBER() function assigns the number 1 to the first row in each partition, any row with a row number greater than 1 represents a duplicate row within its ’emp_id’ partition.

#By Using UNION:-

SELECT emp_name FROM employees
UNION
SELECT customer_name FROM customers;

Explanation: The UNION operator combines the results of two SELECT statements and removes duplicate rows from the combined result set. In this example, the query selects names from emp_name in the employees table and customer_name in the customers table, and any duplicate names are removed from the final result set.

-#By Using UNION ALL

SELECT emp_name FROM employees
UNION ALL
SELECT customer_name FROM customers;

Explanation: The UNION ALL operator also combines the results of two SELECT statements but does not remove duplicate rows from the combined result set. In this example, all rows from both emp_name in employees and customer_name in customers are included in the final result set, including duplicates.

11. What is the Difference between RANK, ROW_NUMBER, and DENSE_RANK? Explain with the help of SQL Query.

#By using RANK

SELEC Temp_id, emp_name, salary,
          RANK() OVER(ORDER BY salary DESC) AS rnk
FROM  employees;

#By using ROW_NUMBER

SELECT emp_id,emp_name,salary,
ROW_NUMBER() OVER(ORDER BY salary DESC) AS rn
FROM employees;

# By using DENSE_RANK

SELEC Temp_id, emp_name, salary,
DENSE_RANK() OVER(ORDER BY salary DESC) AS rn
FROM employees;

Explanation:

  1. RANK(): The RANK() function assigns a unique rank to each distinct row in the result set.If multiple rows have the same values and are assigned the same rank, the next rank will skip the number of tied rows and continue from there. In other words, it leaves gaps in the ranking sequence for tied rows. It assigns the same rank to tied rows but doesn’t provide unique row numbers.
  2. ROW_NUMBER(): The ROW_NUMBER() function assigns a unique row number to each row in the result set.It doesn’t consider ties. If multiple rows have the same values, they will still receive unique row numbers.It doesn’t leave gaps in the row numbering sequence.
  3. DENSE_RANK(): The DENSE_RANK() function assigns a unique rank to each distinct row in the result set.If multiple rows have the same values, they are assigned the same rank, and the next rank does not skip any values. It provides a dense ranking without gaps for tied rows. It assigns the same rank to tied rows but doesn’t provide unique row numbers. These window functions are useful for assigning ranks or row numbers to rows in a result set based on the specified ordering, and understanding their differences is crucial for performing analytical tasks efficiently in SQL.

12. How to find employees who are not present at the department table?

SELECT *FROM employees AS e 
LEFT JOIN
departments AS d ON e.department_id = d.dept_id
WHERE d.dept_id IS NULL;

Explanation:

  • In this SQL query, we’re using a LEFT JOIN between the employees table (aliased as e) and the departments table (aliased as d) based on the department_id.
  • The LEFT JOIN ensures that all rows from the employees table are returned, along with matching rows from the departments table. If there is no match, NULL values are returned for the columns from the departments table.
  • The WHERE clause filters the result set to only include rows where the dept_id column from the departments table is NULL. This indicates that the employee is not associated with any department.
  • By using LEFT JOIN and checking for NULL values in the department table, we can find employees who are not present in the department table. This query is useful for identifying employees who have not been assigned to any department, which can be important for organizational management and analysis.

13. How to find the second-highest salary in each department?

SELECT * FROM 
                (SELEC Temp_name,salary,department_id,
               DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
               FROM employees) AS a
               WHERE rn = 2;

Explanation:

  • In this SQL query, we’re using a window function DENSE_RANK() over a specific partition defined by the department_id column to assign ranks to salaries within each department.
  • The ORDER BY clause orders the salaries within each department in descending order.
  • The DENSE_RANK() function assigns a dense rank to each salary within its respective department, ensuring that tied salaries receive the same rank without any gaps.
  • By filtering the results to WHERE rn = 2, we’re selecting rows where the rank (rn) is equal to 2. This effectively gives us the employees with the second-highest salary in each department.
  • The outer query then selects all columns from the derived table a, which contains the employees with the second-highest salary in each department. This query is useful for identifying employees with the second-highest salary within their respective departments, which can be valuable for various HR and management purposes.

14 . Find all the transactions done by customers named “DEEPAK”.

SELECT *FROM orders 
WHERE UPPER(customer_name) = 'DEEPAK';

Explanation:

  • In this SQL query, we’re selecting all columns from the orders table.
  • The WHERE clause filters the result set to include only rows where the customer_name column matches ‘RAKESH’ after converting it to uppercase using the UPPER() function.
  • Using UPPER() ensures that the comparison is case-insensitive, allowing us to match variations in the case of the customer name. This query retrieves all transactions made by customers named “DEEPAK” from the orders table.

15. How to perform a self-join to find cases where Employees who earn more than their managers?

Example:

SELECT e1.emp_id, e1.emp_name, 
e2.emp_name AS manager_name,
e1.salary AS employee_salary,
e2.salary AS manager_salary
FROMemployees AS e1
INNER JOIN employees AS e2  ON e1.manager_id = e2.emp_id
WHERE e2.salary > e1.salary;

Explanation:

  • In this SQL query, we perform a self-join on the employees table.
  • We join the table on the condition that the manager_id of an employee (e1) matches the emp_id of another employee (e2). This joins an employee with their manager.
  • The SELECT statement retrieves the emp_id, emp_name, and salary of the employee (e1), along with the emp_name and salary of the manager (e2).
  • The WHERE clause filters the result set to only include rows where the salary of the manager (e2.salary) is greater than the salary of the employee (e1.salary).
  • This query helps identify cases where the manager earns more than the employee, which might be useful for performance evaluation or salary adjustment purposes.

16 How to find all transactions done by female customers?Example:

SELECT * FROM orders 
WHERE UPPER(customer_gender) = 'FEMALE';

Explanation:

  • In this SQL query, we’re selecting all columns from the orders table.
  • The WHERE clause filters the result set to include only rows where the customer_gender column matches ‘FEMALE’ after converting it to uppercase using the UPPER() function.
  • Using UPPER() ensures that the comparison is case-insensitive, allowing us to match variations in the case of the gender. This query retrieves all transactions made by female customers from the orders table, which can be useful for analyzing sales patterns or targeting specific customer segments.

17. How to find employees who are not assigned to any department?

Example:

SELECT *FROM employees AS e
LEFT JOIN 
departments AS d ON e.department_id = d.dept_id
WHERE d.dept_id IS NULL;

Explanation:

  • In this SQL query, we’re performing a LEFT JOIN between the employees table (aliased as e) and the departments table (aliased as d) based on the department_id.
  • The LEFT JOIN ensures that all rows from the employees table are returned, along with matching rows from the departments table. If there is no match, NULL values are returned for the columns from the departments table.
  • The WHERE clause filters the result set to only include rows where the dept_id column from the departments table is NULL. This indicates that the employee is not associated with any department.
  • This query helps identify employees who have not been assigned to any department, which can be important for organizational management and analysis.

18 How to calculate the total number of orders placed on each order date?

Example:

SELECT order_date, COUNT(*) AS total_orders
FROM orders
GROUP BY order_date;

Explanation:

  • In this SQL query, we’re selecting the order_date column from the orders table along with the count of orders placed on each date.
  • The COUNT(*) function is used to count the number of rows for each distinct order_date.
  • Using the GROUP BY clause, we group the rows in the orders table by the order_date column. This groups the orders based on their order dates.
  • As a result, we get one row for each unique order date along with the total number of orders placed on that date.
  • This query helps to analyze the distribution of orders over different dates, which can be useful for understanding trends and patterns in customer behavior or business operations.

19 . How to identify employees with the highest salary in each department?

Example:

SELECT *FROM 
               (SELECT emp_name,
                salary,department_id,
                ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
                FROM employees) AS a
WHERE rn = 1;

Explanation:

  • In this SQL query, we’re using a window function ROW_NUMBER() over a specific partition defined by the department_id column to assign row numbers based on the salary in descending order within each department.
  • The PARTITION BY clause divides the result set into partitions, one for each unique department_id.
  • The ORDER BY clause orders the rows within each partition by salary in descending order.
  • The ROW_NUMBER() function assigns a unique row number to each row within its partition.
  • By filtering the results to WHERE rn = 1, we’re selecting rows where the row number (rn) is equal to 1. This effectively gives us the employees with the highest salary in each department.
  • The outer query then selects all columns from the derived table a, which contains the employees with the highest salary in each department. This query is useful for identifying top earners in each department, which can be valuable for various HR and management purposes.

20. How to find the total salary expenditure for each department?

Example:

SELECT d.dept_name,
SUM(e.salary) AS total_salary_expenditure
FROM employees e 
JOIN departments d ON e.department_id = d.dept_id
GROUP BY d.dept_name;

Explanation:

  • In this SQL query, we’re joining the employees table (e) with the departments table (d) based on the department_id and dept_id columns, respectively.
  • The JOIN operation ensures that we have access to the department names (dept_name) from the departments table.
  • The SUM() function is used to calculate the total salary expenditure for each department by summing up the salaries of all employees within the department.
  • Using the GROUP BY clause, we group the result set by department names (dept_name). This allows us to calculate the total salary expenditure for each department.
  • As a result, we obtain one row for each department, containing the department name and the total salary expenditure for that department.
  • This query helps organizations understand and manage their payroll expenses across different departments, facilitating budgeting and financial analysis.

21. How to identify employees who have the same salary as their managers?

Example:

SELECT e.emp_id, 
              e.emp_name,
              e.salary AS employee_salary,
              m.emp_name AS manager_name,
              m.salary AS manager_salary
      FROM employees e
JOIN   employees m ON e.manager_id = m.emp_id
          WHEREe.salary = m.salary;

Explanation:

  • In this SQL query, we’re joining the employee’s table (e) with itself (aliased as m) based on the manager_id and emp_id columns to establish a self-join.
  • The JOIN operation ensures that we pair each employee with their corresponding manager.
  • The WHERE clause filters the result set to include only rows where the salary of the employee (e.salary) is equal to the salary of their manager (m.salary).
  • As a result, we obtain rows where employees have the same salary as their managers, allowing organizations to identify such cases for review or analysis.
  • This query can help in assessing the fairness and consistency of salary structures within the organization.

22. Update query to swap gender.

Example:

UPDATE employees
SET gender = 
CASE WHEN gender = 'Male' THEN 'Female'
WHEN gender = 'Female' THEN 'Male'
ELSE gender
END;

Explanation:

  • The UPDATE statement is used to modify existing records in the employees table.
  • The SET clause specifies the column to be updated, which is gender in this case.
  • The CASE statement evaluates each row’s current gender value: If the gender is ‘Male’, it updates it to ‘Female’.If the gender is ‘Female’, it updates it to ‘Male’.For any other gender values, it leaves the gender unchanged.
  • The UPDATE statement applies the changes to all records in the employee’s table.
  • This query efficiently swaps the gender values for all employees in the table, ensuring data consistency.

23. Write an SQL query to find all dates’ id with higher temperatures compared to their previous dates (yesterday). use Windows function and without Windows Function

# With Windows Function
SELECT id, date
FROM (
  SELECT id, date, temperature,
         LAG(temperature) OVER (ORDER BY date) AS previous_temperature
  FROM your_table_name
) AS subquery
WHERE temperature > previous_temperature

In this query:

  1. Replace your_table_name with the actual name of your table.
  2. The subquery selects the ID, date, and temperature from the table, and also includes a window function LAG() to fetch the previous day’s temperature.
  3. The outer query filters the results by selecting only the rows where the temperature is higher than the previous day’s temperature.

This query will give you the IDs and dates of the days with higher temperatures compared to the previous day.

# Without Windows Function

SELECT t1.id, t1.date
FROM your_table_name t1
JOIN your_table_name t2 ON t1.date = DATE_ADD(t2.date, INTERVAL 1 DAY)
WHERE t1.temperature > t2.temperature;

In this query:

  1. Replace your_table_name with the actual name of your table.
  2. The query uses a self-join to join the table with itself. The first occurrence of the table is aliased as t1, representing the current date, and the second occurrence is aliased as t2, representing the previous day.
  3. The ON clause specifies the condition for joining the tables, which is that the date of t1 should be equal to the date of t2 plus one day (yesterday).
  4. The WHERE clause filters the results by selecting only the rows where the temperature of t1 is higher than the temperature of t2.

This query will give you the IDs and dates of the days with higher temperatures compared to the previous day.

Employees who earn more than their managers

24. Write an SQL query to find all numbers that appear at least three times consecutively.

Return the result table in any order.

id is the primary key for this table.

SELECT a.Num as ConsecutiveNums
FROM Logs a 
JOIN Logs b ON a.id = b.id+1 AND a.num = b.num
JOIN Logs c ON a.id = c.id+2 AND a.num = c.num;

Input Table and Result

25. Find the Month-on-Month Revenue change for each product i.e. display the % sales growth for each product for each month compared to the sales in the previous month. If there is no previous month sales for a product, the % growth can be shown as null.

month, product, Monthly_sales, previous_Month_sales, REVENUE_CHANGE_PER_MONTH

‘June’, ‘Apple’, ‘422980’, NULL, NULL
‘April’, ‘Banana’, ‘1856602’, NULL, NULL
‘Feb’, ‘Banana’, ‘322698’, ‘1856602’, ‘-82.62’
‘Jan’, ‘Banana’, ‘126358’, ‘322698’, ‘-60.84’
‘June’, ‘Banana’, ‘3504818’, ‘126358’, ‘2673.72’
‘Mar’, ‘Banana’, ‘238342’, ‘3504818’, ‘-93.20’
‘May’, ‘Banana’, ‘2289032’, ‘238342’, ‘860.40’
‘June’, ‘Citrus’, ‘88459’, NULL, NULL
‘April’, ‘Mango’, ‘426065’, NULL, NULL
‘June’, ‘Mango’, ‘450441’, ‘426065’, ‘5.72’
‘May’, ‘Mango’, ‘336425’, ‘450441’, ‘-25.31’
‘April’, ‘Pomo’, ‘253787’, NULL, NUL

# SQL Query
WITH SALES AS (
WITH CTE1 AS (
 SELECT month,product,
sum(Total_Invoice_Value) as Monthly_sales
FROM 
practice1.horti_sales
group by Month,product
 )
SELECT *,
LAG(Monthly_sales,1,NULL) OVER (PARTITION BY product ORDER BY month) as 
previous_Month_sales
FROM CTE1
) 
SELECT *,
ROUND ((((Monthly_sales - previous_Month_sales)*100)/previous_Month_sales ),2) AS 
REVENUE_CHANGE_PER_MONTH
FROM SALES

This SQL query analyzes sales data to calculate the revenue change percentage between consecutive months for each product. Let’s break it down step by step:

  1. Common Table Expressions (CTEs):
    • The query starts by defining a CTE named CTE1, which selects the month, product, and the total invoice value summed up as monthly sales from the practice1.horti_sales table. It groups this data by month and product.
    • Then, there’s another CTE named SALES, which further refines the data from CTE1. It selects all columns from CTE1 and adds an additional column previous_Month_sales using the LAG function. This function retrieves the value of Monthly_sales from the previous row within the same partition, ordered by month, and sets it as the value of previous_Month_sales.
  2. Main Query:
    • The main query selects all columns from the SALES CTE and adds another calculated column named REVENUE_CHANGE_PER_MONTH. This column calculates the percentage change in revenue between the current month’s sales and the previous month’s sales, for each product. It uses a formula to calculate this percentage change: Percentage Change=Current Month Sales−Previous Month SalesPrevious Month Sales×100Percentage Change=Previous Month SalesCurrent Month Sales−Previous Month Sales​×100
    • The ROUND function is used to round the calculated percentage change to two decimal places.

In essence, this query generates a report that shows the monthly sales data for each product, along with the percentage change in revenue between consecutive months. It’s a useful tool for analyzing sales trends over time.

Conclusion:

In the fast-paced world of data analytics, proficiency in SQL remains an indispensable skill for aspiring data analysts. Throughout this article, we’ve navigated through a comprehensive array of SQL interview preparation questions tailored for data analysts in 2024. From fundamental concepts to advanced techniques, these questions encompass the breadth and depth of SQL knowledge essential for success in interviews and real-world data analysis scenarios.

Spread the love

Leave a Comment

Seraphinite AcceleratorOptimized by Seraphinite Accelerator
Turns on site high speed to be attractive for people and search engines.