Skip List: A Probabilistic Data Structure for Search, Insert and Delete

Skip List: A Probabilistic Data Structure for Search, Insert and Delete
Photo by Dominic Kurniawan Suryaputra / Unsplash

Skip List is a probabilstic data structure that gives us O(log n) average complexity (and O(n) for worst case for all) for search, insert and delete operations.

Why do we need this?

If we think about the sorted linked list, when we search some target value we need to search with O(n) complexity for all the time.

On the other hand, if we use a sorted array instead of now, we sacrifice from the insert operation overhead since any of the insertions would cause n-1 shifting (or recreating a new array according to implementation)

To break this requirement and give ourselves space to find the target more quickly, we need structural changes that bring us to this structure as one of the answers.

How does it work?

Let's give an analogy that focuses on the search part to help you understand this algorithm better.

Now, imagine that you are in a line for buying a brand new phone that is limited or something else and one of your friends telling you on the phone that he/she is already in line but unable to describe where it is. To find your friend, you have to traverse all the lines one by one and ask them if they have seen someone like your friend. (Let's consider they all know by resembling your friend's description)

This issue may end with endless searching and to solve this issue, you are creating "shortcut lines" on top of the main line. These lines can be more than one, and to simplify, you may think the shortcut lines like;

  • First shortcut: we pick every second person and let them form a new and shorter line,
  • Second shortcut: we pick every fourth person and make an even shorter line.
  • and so on until the searching would be super fast (we can set a probabilistic value, but for this story, let's just say "enough" number of lines)

Now we have shorter lines, and it is time to search by following the four basic steps,

  1. Start at the topmost shortcut line (the fastest one).
  2. Look ahead: If the next person in the shortcut line is still before your friend, keep moving forward.
  3. If the next person is past your friend, drop down to the next lower shortcut line.
  4. Repeat until you reach your friend!
skip list presentation from https://en.wikipedia.org/wiki/Skip_list
💡
While checking the presentation consider the horizontal lines are the actual lines and you will start from the head's top element by checking the diff with the current node and

Deletion

The delete operations are simple when we understand how we search within this data structure. The operation just follows almost the same logic that we have for the basic linked list.

There are also some edge cases we need to consider, like removing the topmost level if there is no item when we delete the target item.

Here is an example from geeks for geeks;

delete operation from https://www.geeksforgeeks.org/skip-list-set-3-searching-deletion/

As we see, when we want to delete item 6, we just delete that node and pass the references to the next one accordingly. After this step, we know that level 3 has no item, meaning we no longer need that. So, we delete that as well (shown in last section in the figure)

Implementation Example


class Comparable(Protocol):
    def __lt__(self, other: Any) -> bool: ...


T = TypeVar("T", bound=Comparable)


@dataclass
class Node(Generic[T]):
    """Node class for Skip List that can hold any comparable value."""

    value: Optional[T]
    forward: List[Optional["Node[T]"]]

    def __init__(self, value: Optional[T], level: int):
        self.value = value
        self.forward = [None] * level


class SkipList(Generic[T]):
    """
    Skip List implementation with O(log n) average case complexity for search, insert, and delete operations.
    """

    def __init__(self, max_level: int = 16, probability: float = 0.5):
        """
        Initialize Skip List.

        Args:
            max_level: Maximum level of the skip list
            probability: Probability for level promotion (default: 0.5)
        """
        self.max_level = max_level
        self.probability = probability
        self.level = 0
        self.head = Node[T](None, max_level)  # Sentinel node
        self.size = 0

    def _random_level(self) -> int:
        """
        Generate a random level for a new node.

        Returns:
            int: Random level between 1 and max_level
        """
        level = 1
        while random.random() < self.probability and level < self.max_level:
            level += 1
        return level

    def insert(self, value: T) -> None:
        """
        Insert a value into the skip list.

        Args:
            value: Value to insert
        """
        # Create update array to store nodes that need to be updated
        update: List[Optional[Node[T]]] = [None] * self.max_level
        current = self.head

        # Find the position to insert the new node
        for i in range(self.level - 1, -1, -1):
            while (
                current.forward[i] is not None
                and current.forward[i].value is not None
                and current.forward[i].value < value
            ):
                current = current.forward[i]
            update[i] = current

        # Generate random level for the new node
        new_level = self._random_level()

        # Update the skip list level if necessary
        if new_level > self.level:
            for i in range(self.level, new_level):
                update[i] = self.head
            self.level = new_level

        # Create and insert the new node
        new_node = Node(value, new_level)
        for i in range(new_level):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

        self.size += 1

    def search(self, value: T) -> Optional[T]:
        """
        Search for a value in the skip list.

        Args:
            value: Value to search for

        Returns:
            Optional[T]: The value if found, None otherwise
        """
        current = self.head

        # Start from the highest level and move down
        for i in range(self.level - 1, -1, -1):
            while (
                current.forward[i] is not None
                and current.forward[i].value is not None
                and current.forward[i].value < value
            ):
                current = current.forward[i]

            if current.forward[i] is not None and current.forward[i].value == value:
                return current.forward[i].value

        return None

    def delete(self, value: T) -> bool:
        """
        Delete a value from the skip list.

        Args:
            value: Value to delete

        Returns:
            bool: True if value was found and deleted, False otherwise
        """
        update: List[Optional[Node[T]]] = [None] * self.max_level
        current = self.head

        # Find the node to delete
        for i in range(self.level - 1, -1, -1):
            while (
                current.forward[i] is not None
                and current.forward[i].value is not None
                and current.forward[i].value < value
            ):
                current = current.forward[i]
            update[i] = current

        # Check if the value exists
        if current.forward[0] is None or current.forward[0].value != value:
            return False

        # Delete the node
        node_to_delete = current.forward[0]
        for i in range(self.level):
            if update[i].forward[i] != node_to_delete:
                break
            update[i].forward[i] = node_to_delete.forward[i]

        # Update the skip list level
        while self.level > 0 and self.head.forward[self.level - 1] is None:
            self.level -= 1

        self.size -= 1
        return True

    def __len__(self) -> int:
        """
        Get the number of elements in the skip list.

        Returns:
            int: Number of elements
        """
        return self.size

    def __str__(self) -> str:
        """
        String representation of the skip list.

        Returns:
            str: String representation
        """
        result = []
        current = self.head.forward[0]
        while current is not None and current.value is not None:
            result.append(str(current.value))
            current = current.forward[0]
        return " -> ".join(result) if result else "Empty Skip List"

Conclusion

Although there is no such data structure that solves every problem, a skip list is a good tool when you need to use a linked list and you don't want to create too much effort and time consumption for mostly the search cases and you don't have any concern about the memory usage.

Reference

Skip list - Wikipedia
Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). The proposed solution is fully graph-based, without any need for additional search structures, which are typically used at the coarse search stage of the most proximity graph techniques. Hierarchical NSW incrementally builds a multi-layer structure consisting from hierarchical set of proximity graphs (layers) for nested subsets of the stored elements. The maximum layer in which an element is present is selected randomly with an exponentially decaying probability distribution. This allows producing graphs similar to the previously studied Navigable Small World (NSW) structures while additionally having the links separated by their characteristic distance scales. Starting search from the upper layer together with utilizing the scale separation boosts the performance compared to NSW and allows a logarithmic complexity scaling. Additional employment of a heuristic for selecting proximity graph neighbors significantly increases performance at high recall and in case of highly clustered data. Performance evaluation has demonstrated that the proposed general metric space search index is able to strongly outperform previous opensource state-of-the-art vector-only approaches. Similarity of the algorithm to the skip list structure allows straightforward balanced distributed implementation.
Skip List | Set 3 (Searching and Deletion) - GeeksforGeeks
A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.