Data Engineering Interview Crash Workbook: Answer Guide Part 2

This is Part 2 of the workbook companion. It is a separate page from Part 1 and focuses on:

full worked answers for the remaining coding problem list
mock interview questions with spoken answers
a final 3-day revision checklist

How To Use This Page

For each coding problem:

say the brute-force idea first
say the optimized idea
explain the data structure choice
code the clean solution
test one happy case and one edge case

Remaining Coding Answers

Arrays And Strings

154. Three Sum

Brute force: try all triplets, O(n^3).
Optimized idea: sort, fix one number, then use two pointers on the rest.

def three_sum(nums: list[int]) -> list[list[int]]:
    nums.sort()
    result = []
    n = len(nums)

    for i in range(n - 2):
        if i > 0 and nums[i] == nums[i - 1]:
            continue

        left, right = i + 1, n - 1
        while left < right:
            total = nums[i] + nums[left] + nums[right]
            if total == 0:
                result.append([nums[i], nums[left], nums[right]])
                left += 1
                right -= 1
                while left < right and nums[left] == nums[left - 1]:
                    left += 1
                while left < right and nums[right] == nums[right + 1]:
                    right -= 1
            elif total < 0:
                left += 1
            else:
                right -= 1

    return result

Complexity: O(n^2) time, O(1) extra space excluding output.
Edge cases: duplicates, all positives, all negatives.

157. Merge Sorted Arrays

Brute force: concatenate then sort.
Optimized idea: fill from the end because the first array has extra space.

def merge(nums1: list[int], m: int, nums2: list[int], n: int) -> None:
    i, j, write = m - 1, n - 1, m + n - 1

    while j >= 0:
        if i >= 0 and nums1[i] > nums2[j]:
            nums1[write] = nums1[i]
            i -= 1
        else:
            nums1[write] = nums2[j]
            j -= 1
        write -= 1

Complexity: O(m + n) time, O(1) extra space.

158. Remove Duplicates From Sorted Array

def remove_duplicates(nums: list[int]) -> int:
    if not nums:
        return 0

    write = 1
    for read in range(1, len(nums)):
        if nums[read] != nums[write - 1]:
            nums[write] = nums[read]
            write += 1

    return write

Idea: the write pointer marks the next location for a new unique value.
Complexity: O(n) time, O(1) space.

159. Rotate Array

Optimized idea: reverse whole array, reverse first k, reverse rest.

def rotate(nums: list[int], k: int) -> None:
    k %= len(nums)

    def reverse(left: int, right: int) -> None:
        while left < right:
            nums[left], nums[right] = nums[right], nums[left]
            left += 1
            right -= 1

    reverse(0, len(nums) - 1)
    reverse(0, k - 1)
    reverse(k, len(nums) - 1)

160. Prefix Sum

Use case: fast range-sum queries.

def build_prefix(nums: list[int]) -> list[int]:
    prefix = [0]
    for num in nums:
        prefix.append(prefix[-1] + num)
    return prefix


def range_sum(prefix: list[int], left: int, right: int) -> int:
    return prefix[right + 1] - prefix[left]

Interview line: precompute once, answer many range queries in O(1).

161. Difference Array Idea

Use case: many range updates, one final reconstruction.

def apply_updates(length: int, updates: list[tuple[int, int, int]]) -> list[int]:
    diff = [0] * (length + 1)

    for left, right, delta in updates:
        diff[left] += delta
        if right + 1 < len(diff):
            diff[right + 1] -= delta

    result = [0] * length
    running = 0
    for i in range(length):
        running += diff[i]
        result[i] = running
    return result

162. Group Anagrams

from collections import defaultdict


def group_anagrams(words: list[str]) -> list[list[str]]:
    groups = defaultdict(list)
    for word in words:
        key = "".join(sorted(word))
        groups[key].append(word)
    return list(groups.values())

Complexity: O(n * k log k) where k is word length.
Alternative: use a 26-count tuple to avoid sorting.

163. Valid Anagram

from collections import Counter


def is_anagram(s: str, t: str) -> bool:
    return Counter(s) == Counter(t)

Interview note: if only lowercase letters exist, a fixed-size frequency array is faster than a hash map.

164. String Compression

def compress(chars: list[str]) -> int:
    write = 0
    read = 0

    while read < len(chars):
        ch = chars[read]
        start = read
        while read < len(chars) and chars[read] == ch:
            read += 1

        chars[write] = ch
        write += 1
        count = read - start
        if count > 1:
            for digit in str(count):
                chars[write] = digit
                write += 1

    return write

165. Character Frequency Map

def char_frequency(s: str) -> dict[str, int]:
    freq = {}
    for ch in s:
        freq[ch] = freq.get(ch, 0) + 1
    return freq

Why interviewers ask this: it checks whether you naturally reach for a hash map.

166. Substring Search

Simple interview-safe solution:

def str_str(haystack: str, needle: str) -> int:
    if needle == "":
        return 0

    for i in range(len(haystack) - len(needle) + 1):
        if haystack[i:i + len(needle)] == needle:
            return i
    return -1

Complexity: O((n-m+1) * m) worst case.
If pushed further: mention KMP or Rabin-Karp.

167. Pattern Matching

For interview level, this often means mapping pattern symbols to words.

def word_pattern(pattern: str, s: str) -> bool:
    words = s.split()
    if len(pattern) != len(words):
        return False

    p_to_w = {}
    w_to_p = {}

    for p, w in zip(pattern, words):
        if p in p_to_w and p_to_w[p] != w:
            return False
        if w in w_to_p and w_to_p[w] != p:
            return False
        p_to_w[p] = w
        w_to_p[w] = p

    return True

168. Sliding Window Max

Optimized idea: deque keeps candidate indices in decreasing value order.

from collections import deque


def max_sliding_window(nums: list[int], k: int) -> list[int]:
    dq = deque()
    result = []

    for i, num in enumerate(nums):
        while dq and dq[0] <= i - k:
            dq.popleft()
        while dq and nums[dq[-1]] <= num:
            dq.pop()

        dq.append(i)
        if i >= k - 1:
            result.append(nums[dq[0]])

    return result

169. Longest Common Prefix

def longest_common_prefix(strs: list[str]) -> str:
    if not strs:
        return ""

    prefix = strs[0]
    for word in strs[1:]:
        while not word.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""
    return prefix

170. Roman To Integer

def roman_to_int(s: str) -> int:
    values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    total = 0

    for i in range(len(s)):
        if i + 1 < len(s) and values[s[i]] < values[s[i + 1]]:
            total -= values[s[i]]
        else:
            total += values[s[i]]
    return total

171. Integer To Roman

def int_to_roman(num: int) -> str:
    values = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]

    result = []
    for value, symbol in values:
        while num >= value:
            result.append(symbol)
            num -= value
    return "".join(result)

172. String To Integer

def my_atoi(s: str) -> int:
    s = s.lstrip()
    if not s:
        return 0

    sign = 1
    i = 0
    if s[i] in "+-":
        sign = -1 if s[i] == "-" else 1
        i += 1

    result = 0
    while i < len(s) and s[i].isdigit():
        result = result * 10 + int(s[i])
        i += 1

    result *= sign
    min_int, max_int = -(2 ** 31), 2 ** 31 - 1
    if result < min_int:
        return min_int
    if result > max_int:
        return max_int
    return result

173. Reverse Words In Sentence

def reverse_words(s: str) -> str:
    words = s.split()
    return " ".join(reversed(words))

175. Find Missing Number

def missing_number(nums: list[int]) -> int:
    expected = len(nums) * (len(nums) + 1) // 2
    return expected - sum(nums)

Alternative: XOR.
Complexity: O(n) time, O(1) space.

176. Best Time To Buy And Sell Stock

def max_profit(prices: list[int]) -> int:
    best_buy = float("inf")
    best_profit = 0

    for price in prices:
        best_buy = min(best_buy, price)
        best_profit = max(best_profit, price - best_buy)

    return best_profit

178. Merge Intervals

def merge_intervals(intervals: list[list[int]]) -> list[list[int]]:
    if not intervals:
        return []

    intervals.sort(key=lambda x: x[0])
    merged = [intervals[0]]

    for start, end in intervals[1:]:
        last = merged[-1]
        if start <= last[1]:
            last[1] = max(last[1], end)
        else:
            merged.append([start, end])

    return merged

179. Insert Interval

def insert_interval(intervals: list[list[int]], new_interval: list[int]) -> list[list[int]]:
    result = []
    i = 0

    while i < len(intervals) and intervals[i][1] < new_interval[0]:
        result.append(intervals[i])
        i += 1

    while i < len(intervals) and intervals[i][0] <= new_interval[1]:
        new_interval[0] = min(new_interval[0], intervals[i][0])
        new_interval[1] = max(new_interval[1], intervals[i][1])
        i += 1

    result.append(new_interval)

    while i < len(intervals):
        result.append(intervals[i])
        i += 1

    return result

180. Meeting Rooms Overlap

def can_attend_meetings(intervals: list[list[int]]) -> bool:
    intervals.sort(key=lambda x: x[0])
    for i in range(1, len(intervals)):
        if intervals[i][0] < intervals[i - 1][1]:
            return False
    return True

Stack, Queue, Linked List

182. Min Stack

class MinStack:
    def __init__(self):
        self.stack = []

    def push(self, val: int) -> None:
        current_min = val if not self.stack else min(val, self.stack[-1][1])
        self.stack.append((val, current_min))

    def pop(self) -> None:
        self.stack.pop()

    def top(self) -> int:
        return self.stack[-1][0]

    def get_min(self) -> int:
        return self.stack[-1][1]

Key idea: each entry stores both value and min so far.

183. Queue Using Two Stacks

class MyQueue:
    def __init__(self):
        self.in_stack = []
        self.out_stack = []

    def push(self, x: int) -> None:
        self.in_stack.append(x)

    def _move(self) -> None:
        if not self.out_stack:
            while self.in_stack:
                self.out_stack.append(self.in_stack.pop())

    def pop(self) -> int:
        self._move()
        return self.out_stack.pop()

    def peek(self) -> int:
        self._move()
        return self.out_stack[-1]

    def empty(self) -> bool:
        return not self.in_stack and not self.out_stack

184. Stack Using Queues

from collections import deque


class MyStack:
    def __init__(self):
        self.q = deque()

    def push(self, x: int) -> None:
        self.q.append(x)
        for _ in range(len(self.q) - 1):
            self.q.append(self.q.popleft())

    def pop(self) -> int:
        return self.q.popleft()

    def top(self) -> int:
        return self.q[0]

    def empty(self) -> bool:
        return not self.q

187. Merge Two Sorted Linked Lists

def merge_two_lists(list1, list2):
    dummy = ListNode()
    tail = dummy

    while list1 and list2:
        if list1.val <= list2.val:
            tail.next = list1
            list1 = list1.next
        else:
            tail.next = list2
            list2 = list2.next
        tail = tail.next

    tail.next = list1 if list1 else list2
    return dummy.next

188. Find Middle Of Linked List

def middle_node(head):
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    return slow

Why it works: fast moves twice as fast, so slow reaches the middle when fast ends.

189. Remove Nth Node From End

def remove_nth_from_end(head, n: int):
    dummy = ListNode(0, head)
    slow = fast = dummy

    for _ in range(n + 1):
        fast = fast.next

    while fast:
        slow = slow.next
        fast = fast.next

    slow.next = slow.next.next
    return dummy.next

190. Reorder List

Pattern: find middle, reverse second half, then merge alternating nodes.

def reorder_list(head) -> None:
    if not head or not head.next:
        return

    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next

    prev = None
    curr = slow.next
    slow.next = None
    while curr:
        nxt = curr.next
        curr.next = prev
        prev = curr
        curr = nxt

    first, second = head, prev
    while second:
        tmp1, tmp2 = first.next, second.next
        first.next = second
        second.next = tmp1
        first, second = tmp1, tmp2

191. LRU Cache Concept

Spoken answer: an LRU cache needs O(1) get and put. That normally means a hash map for lookup plus a doubly linked list to maintain recency order. On access, move the node to the front. On capacity overflow, evict the tail.

192. Circular Queue

class CircularQueue:
    def __init__(self, k: int):
        self.q = [0] * k
        self.head = 0
        self.count = 0
        self.k = k

    def en_queue(self, value: int) -> bool:
        if self.count == self.k:
            return False
        tail = (self.head + self.count) % self.k
        self.q[tail] = value
        self.count += 1
        return True

    def de_queue(self) -> bool:
        if self.count == 0:
            return False
        self.head = (self.head + 1) % self.k
        self.count -= 1
        return True

193. Deque Use Cases

Spoken answer: a deque supports insertion and deletion at both ends in O(1). It is useful for sliding-window maximum, BFS, palindrome checks, task scheduling, and monotonic queue patterns.

194. Monotonic Stack

Spoken answer: a monotonic stack maintains elements in increasing or decreasing order so each new item can efficiently resolve previous items. It is ideal for next-greater-element, daily temperatures, and histogram-area problems.

195. Next Greater Element

def next_greater(nums: list[int]) -> list[int]:
    result = [-1] * len(nums)
    stack = []

    for i, num in enumerate(nums):
        while stack and nums[stack[-1]] < num:
            idx = stack.pop()
            result[idx] = num
        stack.append(i)

    return result

196. Daily Temperatures

def daily_temperatures(temperatures: list[int]) -> list[int]:
    result = [0] * len(temperatures)
    stack = []

    for i, temp in enumerate(temperatures):
        while stack and temperatures[stack[-1]] < temp:
            prev = stack.pop()
            result[prev] = i - prev
        stack.append(i)

    return result

197. Largest Rectangle In Histogram

def largest_rectangle_area(heights: list[int]) -> int:
    stack = []
    best = 0

    for i, h in enumerate(heights + [0]):
        while stack and heights[stack[-1]] > h:
            height = heights[stack.pop()]
            left = stack[-1] if stack else -1
            width = i - left - 1
            best = max(best, height * width)
        stack.append(i)

    return best

198. Trapping Rain Water

def trap(height: list[int]) -> int:
    left, right = 0, len(height) - 1
    left_max = right_max = 0
    water = 0

    while left < right:
        if height[left] < height[right]:
            left_max = max(left_max, height[left])
            water += left_max - height[left]
            left += 1
        else:
            right_max = max(right_max, height[right])
            water += right_max - height[right]
            right -= 1

    return water

199. Evaluate Reverse Polish Notation

def eval_rpn(tokens: list[str]) -> int:
    stack = []

    for token in tokens:
        if token not in {"+", "-", "*", "/"}:
            stack.append(int(token))
            continue

        b = stack.pop()
        a = stack.pop()
        if token == "+":
            stack.append(a + b)
        elif token == "-":
            stack.append(a - b)
        elif token == "*":
            stack.append(a * b)
        else:
            stack.append(int(a / b))

    return stack[-1]

200. Sliding Window With Deque

Spoken answer: use a deque when the window needs both order and fast max or min lookup. Keep only useful candidate indices, remove expired ones from the front, and remove weaker candidates from the back.

Trees And Graphs

213. Maximum Depth Of Binary Tree

def max_depth(root) -> int:
    if not root:
        return 0
    return 1 + max(max_depth(root.left), max_depth(root.right))

214. Invert Binary Tree

def invert_tree(root):
    if not root:
        return None
    root.left, root.right = invert_tree(root.right), invert_tree(root.left)
    return root

215. Validate BST

def is_valid_bst(root) -> bool:
    def dfs(node, low, high):
        if not node:
            return True
        if not (low < node.val < high):
            return False
        return dfs(node.left, low, node.val) and dfs(node.right, node.val, high)

    return dfs(root, float("-inf"), float("inf"))

216. Lowest Common Ancestor

def lowest_common_ancestor(root, p, q):
    if not root or root == p or root == q:
        return root

    left = lowest_common_ancestor(root.left, p, q)
    right = lowest_common_ancestor(root.right, p, q)

    if left and right:
        return root
    return left if left else right

217. Level Order Traversal

from collections import deque


def level_order(root) -> list[list[int]]:
    if not root:
        return []

    result = []
    q = deque([root])

    while q:
        level = []
        for _ in range(len(q)):
            node = q.popleft()
            level.append(node.val)
            if node.left:
                q.append(node.left)
            if node.right:
                q.append(node.right)
        result.append(level)

    return result

218. Tree Diameter

def diameter_of_binary_tree(root) -> int:
    best = 0

    def depth(node):
        nonlocal best
        if not node:
            return 0
        left = depth(node.left)
        right = depth(node.right)
        best = max(best, left + right)
        return 1 + max(left, right)

    depth(root)
    return best

219. Path Sum

def has_path_sum(root, target_sum: int) -> bool:
    if not root:
        return False
    if not root.left and not root.right:
        return target_sum == root.val
    return (
        has_path_sum(root.left, target_sum - root.val)
        or has_path_sum(root.right, target_sum - root.val)
    )

220. Serialize And Deserialize Tree

from collections import deque


def serialize(root) -> str:
    values = []

    def dfs(node):
        if not node:
            values.append("#")
            return
        values.append(str(node.val))
        dfs(node.left)
        dfs(node.right)

    dfs(root)
    return ",".join(values)


def deserialize(data: str):
    values = deque(data.split(","))

    def dfs():
        val = values.popleft()
        if val == "#":
            return None
        node = TreeNode(int(val))
        node.left = dfs()
        node.right = dfs()
        return node

    return dfs()

222. Clone Graph

def clone_graph(node):
    if not node:
        return None

    copies = {}

    def dfs(curr):
        if curr in copies:
            return copies[curr]

        copy = Node(curr.val)
        copies[curr] = copy
        for neighbor in curr.neighbors:
            copy.neighbors.append(dfs(neighbor))
        return copy

    return dfs(node)

223. Course Schedule

from collections import defaultdict, deque


def can_finish(num_courses: int, prerequisites: list[list[int]]) -> bool:
    graph = defaultdict(list)
    indegree = [0] * num_courses

    for course, prereq in prerequisites:
        graph[prereq].append(course)
        indegree[course] += 1

    q = deque([i for i in range(num_courses) if indegree[i] == 0])
    taken = 0

    while q:
        course = q.popleft()
        taken += 1
        for nxt in graph[course]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                q.append(nxt)

    return taken == num_courses

224. Topological Sort

Spoken answer: topological sort orders nodes so every prerequisite comes before dependent nodes. It works only for DAGs. Use indegree plus queue or DFS postorder.

225. Detect Cycle In Graph

For a directed graph, use DFS states.

def has_cycle_directed(graph: dict[int, list[int]]) -> bool:
    state = {}

    def dfs(node):
        if node in state:
            return state[node] == 1

        state[node] = 1
        for neighbor in graph.get(node, []):
            if dfs(neighbor):
                return True
        state[node] = 2
        return False

    for node in graph:
        if dfs(node):
            return True
    return False

226. Connected Components

def count_components(n: int, edges: list[list[int]]) -> int:
    graph = {i: [] for i in range(n)}
    for a, b in edges:
        graph[a].append(b)
        graph[b].append(a)

    seen = set()

    def dfs(node):
        if node in seen:
            return
        seen.add(node)
        for nxt in graph[node]:
            dfs(nxt)

    count = 0
    for node in range(n):
        if node not in seen:
            count += 1
            dfs(node)
    return count

227. Shortest Path In Unweighted Graph

def shortest_path_unweighted(graph: dict[int, list[int]], start: int, target: int) -> int:
    q = deque([(start, 0)])
    seen = {start}

    while q:
        node, dist = q.popleft()
        if node == target:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                q.append((nxt, dist + 1))
    return -1

Why BFS: each edge has equal cost.

228. Dijkstra Basics

Spoken answer: Dijkstra finds shortest paths in a graph with non-negative edge weights. Use a min-heap keyed by current best distance. When a node is popped with its smallest known distance, relax its outgoing edges.

229. Union Find Basics

class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a: int, b: int) -> bool:
        root_a = self.find(a)
        root_b = self.find(b)
        if root_a == root_b:
            return False

        if self.rank[root_a] < self.rank[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        if self.rank[root_a] == self.rank[root_b]:
            self.rank[root_a] += 1
        return True

Use cases: cycle detection, connected components, MST.

230. Bipartite Graph Check

def is_bipartite(graph: list[list[int]]) -> bool:
    color = {}

    for start in range(len(graph)):
        if start in color:
            continue

        q = deque([start])
        color[start] = 0

        while q:
            node = q.popleft()
            for neighbor in graph[node]:
                if neighbor in color:
                    if color[neighbor] == color[node]:
                        return False
                else:
                    color[neighbor] = 1 - color[node]
                    q.append(neighbor)

    return True

Recursion, Backtracking, DP, Search

262. Climbing Stairs

def climb_stairs(n: int) -> int:
    if n <= 2:
        return n

    a, b = 1, 2
    for _ in range(3, n + 1):
        a, b = b, a + b
    return b

264. Longest Common Subsequence

def longest_common_subsequence(text1: str, text2: str) -> int:
    rows, cols = len(text1), len(text2)
    dp = [[0] * (cols + 1) for _ in range(rows + 1)]

    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if text1[i] == text2[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    return dp[0][0]

265. Edit Distance

def min_distance(word1: str, word2: str) -> int:
    rows, cols = len(word1), len(word2)
    dp = [[0] * (cols + 1) for _ in range(rows + 1)]

    for i in range(rows + 1):
        dp[i][cols] = rows - i
    for j in range(cols + 1):
        dp[rows][j] = cols - j

    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if word1[i] == word2[j]:
                dp[i][j] = dp[i + 1][j + 1]
            else:
                dp[i][j] = 1 + min(
                    dp[i + 1][j],
                    dp[i][j + 1],
                    dp[i + 1][j + 1],
                )

    return dp[0][0]

266. Knapsack Idea

Spoken answer: in 0/1 knapsack, each item can be taken once or skipped. The DP state is typically dp[i][capacity], meaning the best value using items from i onward with remaining capacity. The recurrence compares taking the item versus skipping it.

267. Subsets

def subsets(nums: list[int]) -> list[list[int]]:
    result = []

    def backtrack(index: int, path: list[int]) -> None:
        if index == len(nums):
            result.append(path[:])
            return
        path.append(nums[index])
        backtrack(index + 1, path)
        path.pop()
        backtrack(index + 1, path)

    backtrack(0, [])
    return result

268. Permutations

def permute(nums: list[int]) -> list[list[int]]:
    result = []

    def backtrack(start: int) -> None:
        if start == len(nums):
            result.append(nums[:])
            return
        for i in range(start, len(nums)):
            nums[start], nums[i] = nums[i], nums[start]
            backtrack(start + 1)
            nums[start], nums[i] = nums[i], nums[start]

    backtrack(0)
    return result

269. Combination Sum

def combination_sum(candidates: list[int], target: int) -> list[list[int]]:
    result = []

    def backtrack(index: int, remaining: int, path: list[int]) -> None:
        if remaining == 0:
            result.append(path[:])
            return
        if remaining < 0 or index == len(candidates):
            return

        path.append(candidates[index])
        backtrack(index, remaining - candidates[index], path)
        path.pop()
        backtrack(index + 1, remaining, path)

    backtrack(0, target, [])
    return result

270. N-Queens Idea

Spoken answer: place queens row by row. At each row, try safe columns only. Track used columns, diagonals, and anti-diagonals in sets. Backtrack when a placement creates conflict.

272. Search In Rotated Sorted Array

def search_rotated(nums: list[int], target: int) -> int:
    left, right = 0, len(nums) - 1

    while left <= right:
        mid = left + (right - left) // 2
        if nums[mid] == target:
            return mid

        if nums[left] <= nums[mid]:
            if nums[left] <= target < nums[mid]:
                right = mid - 1
            else:
                left = mid + 1
        else:
            if nums[mid] < target <= nums[right]:
                left = mid + 1
            else:
                right = mid - 1

    return -1

273. Find First And Last Position

def search_range(nums: list[int], target: int) -> list[int]:
    def bound(find_first: bool) -> int:
        left, right = 0, len(nums) - 1
        answer = -1
        while left <= right:
            mid = left + (right - left) // 2
            if nums[mid] == target:
                answer = mid
                if find_first:
                    right = mid - 1
                else:
                    left = mid + 1
            elif nums[mid] < target:
                left = mid + 1
            else:
                right = mid - 1
        return answer

    return [bound(True), bound(False)]

274. Median Of Two Sorted Arrays Idea

Spoken answer: the optimal solution does a binary search on the smaller array to find a partition where left-side elements are less than or equal to right-side elements across both arrays. The median then comes from the border values. This gives O(log(min(m, n))).

275. Kth Largest Element Using Heap

import heapq


def find_kth_largest(nums: list[int], k: int) -> int:
    heap = []
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            heapq.heappop(heap)
    return heap[0]

276. Top K Frequent Elements

from collections import Counter
import heapq


def top_k_frequent(nums: list[int], k: int) -> list[int]:
    counts = Counter(nums)
    return [item for item, _ in heapq.nlargest(k, counts.items(), key=lambda x: x[1])]

277. Heap Basics

Spoken answer: a heap is a partial ordering where the smallest element in a min-heap is always at the root. It is ideal when you repeatedly need the smallest or largest element but do not need a fully sorted structure.

278. Recursion Tree Thinking

Spoken answer: recursion-tree thinking means visualizing how each call branches into smaller calls, how deep the tree goes, and how much repeated work happens. It helps estimate time complexity and identify when memoization is needed.

279. Memoization vs Tabulation

Memoization: top-down recursion with caching.
Tabulation: bottom-up iteration.

Spoken answer: memoization is often easier to derive from the recurrence. Tabulation usually gives tighter control over iteration order and avoids recursion overhead.

280. State Definition In DP

Spoken answer: the most important DP step is defining what dp[i] or dp[i][j] means. If the state is ambiguous, the recurrence will be wrong. I try to define the state in plain English before writing transitions.

Mock Interview Questions With Spoken Answers

These are intentionally written in a speaking style, not an essay style.

1. Walk Me Through How You Solve A Coding Problem In A Live Interview

Spoken answer:
First I want to restate the problem and confirm the input and output assumptions. Then I usually start with the brute-force approach so we have a correct baseline. After that I look for the pattern, such as hash map, sliding window, two pointers, stack, BFS, or dynamic programming. Once I have the optimized approach, I will explain the complexity, code it cleanly, and test at least one normal case and one edge case out loud.

2. How Do You Decide Between BFS And DFS?

Spoken answer:
I usually ask what the problem is optimizing for. If I need the shortest path in an unweighted graph or I need strict level-order traversal, BFS is the natural choice. If I need exhaustive exploration, path checks, recursion over structure, or backtracking, DFS is often simpler. I also consider memory, because BFS can hold an entire frontier at once.

3. Explain Why You Used A Hash Map In Two Sum

Spoken answer:
The brute-force version compares every pair, which is O(n^2). A hash map lets me remember what I have already seen, so for each number I can ask in constant time whether the complement already exists. That reduces the time complexity to O(n) with O(n) extra space.

4. How Would You Explain Dynamic Programming To A Non-Expert Interviewer?

Spoken answer:
I think of dynamic programming as solving a large problem by storing answers to overlapping smaller problems so I do not recompute them. The core steps are defining the state, writing the recurrence, choosing base cases, and deciding whether top-down memoization or bottom-up tabulation is cleaner.

5. What Is The Most Common Mistake In SQL Interviews?

Spoken answer:
The most common mistake is not being explicit about grain. People jump into joins and aggregations without saying what one output row represents. That leads to row explosion, wrong counts, and incorrect sums. I try to say the output grain first, then build the query around that.

6. How Do You Debug Wrong Results In A SQL Query?

Spoken answer:
I reduce the problem step by step. I check row counts at each stage, verify key uniqueness before joins, inspect null behavior, and test the logic on a very small sample I can reason about by hand. If there is a join, I check whether I accidentally changed the grain. If there is an aggregation, I validate totals before and after.

7. How Would You Explain SCD Type 2 In An Interview?

Spoken answer:
SCD Type 2 is how I preserve full history for changing dimension attributes. Instead of overwriting the row, I insert a new version with effective start and end timestamps and mark the current row. That lets analysts answer questions like what the customer segment was at the time of an order, not just what it is today.

8. How Do You Answer A System Design Question For A Data Platform?

Spoken answer:
I start with requirements, especially scale, latency, reliability, and governance constraints. Then I walk left to right through sources, ingestion, storage, processing, and serving. After that I add observability, security, and failure handling. I always close with trade-offs, because architecture is about choosing the right compromise for the business need.

9. When Would You Choose Batch Over Streaming?

Spoken answer:
I choose batch when the business does not need second-level latency and simplicity matters more than immediacy. It is usually cheaper, easier to operate, and easier to reason about. I choose streaming when the data has to drive near-real-time decisions such as fraud detection, alerting, or live operational workflows.

10. How Do You Handle A Problem When You Forget The Exact Algorithm?

Spoken answer:
I do not try to hide that. I restate the brute-force approach, solve the problem correctly first, and then look for structure that improves it. Interviewers usually care more about problem-solving discipline than whether I instantly remember the optimal named algorithm.

11. Give Me An Example Of Explaining Trade-Offs Clearly

Spoken answer:
If I compare Spark and Trino, I would say Spark is strong for heavy transformations and large compute jobs, while Trino is strong for interactive SQL and federation. Spark may be better for pipeline compute, and Trino may be better for low-latency ad hoc analytics. The right choice depends on workload pattern, concurrency, governance needs, and operational cost.

12. How Do You Keep Python Code Readable Under Interview Pressure?

Spoken answer:
I keep it boring on purpose. I avoid clever one-liners and use simple loops, dictionaries, sets, and helper functions. I optimize for correctness and readability first, because interview code needs to be explained live, not just run.

Final 3-Day Revision Checklist

Day 1: SQL And Modeling

Morning

practice 3 window-function problems
practice 3 join-debugging problems
solve 2 aggregation problems with GROUP BY, HAVING, and conditional logic
explain ROW_NUMBER, RANK, and DENSE_RANK without notes

Afternoon

define grain for 5 sample business questions
explain fact vs dimension out loud
explain star vs snowflake out loud
explain SCD Type 1, 2, and 3 with one example each
draw one ecommerce star schema on paper

Evening

solve one end-to-end SQL problem with CTE + join + window
review row explosion, null behavior, and duplicate join keys
rehearse a spoken answer for late-arriving data and unknown dimension members

Day 2: Coding

Morning

arrays and strings:
- Two Sum
- Three Sum
- Longest Substring Without Repeating Characters
- Product of Array Except Self
- Merge Intervals

Afternoon

linked list and stack:
- Reverse Linked List
- Detect Cycle
- Valid Parentheses
- Daily Temperatures
- LRU Cache concept

Evening

trees and graphs:
- Maximum Depth
- Validate BST
- Number of Islands
- Course Schedule
- BFS vs DFS explanation

Night Review

say this exact structure for every coding problem:
- brute force
- optimized idea
- complexity
- edge cases
- quick test case

Day 3: Architecture And Interview Delivery

Morning

explain batch vs streaming
explain Kafka topic, partition, offset, consumer group
explain CDC ingestion design
explain API ingestion design
explain Spark vs Trino vs lakehouse roles

Afternoon

do one whiteboard architecture drill:
- sources
- ingestion
- storage
- processing
- serving
- governance
- observability
- failure handling

Evening

practice 10 mock spoken answers from this page
rehearse one project story with measurable impact
rehearse one failure story and how you fixed it
rehearse one trade-off discussion

Final Night Before Interview

do not learn new topics
review only your must-win set
write 5 coding patterns from memory
write 5 SQL patterns from memory
sleep instead of cramming

Must-Win Set

If time is limited, review these first:

SQL: windows, joins, deduplication, CTEs, aggregations
coding: hash map, sliding window, stack, linked list, BFS/DFS, binary search
modeling: grain, fact vs dimension, SCD2, snapshots
architecture: batch vs stream, Kafka basics, CDC, lakehouse, governance

Interview-Day Speaking Reminders

start simple
state assumptions
keep the grain explicit
do not jump into code too early
close with complexity or trade-off summary
if stuck, fall back to brute force and improve

Suggested Next File After This

If you want a Part 3, the most useful next separate file would be:

Salesforce-focused behavioral answers
data engineering system design whiteboard templates
20 timed mock interview exercises with answer keys