Comprehensive comparative analysis of the principles, complexity and stability of bubble, selection, insertion, Hill, quick sort, merge, heap sort, counting, bucket, and radix sort.
Sorting is one of the most basic and important algorithmic problems in computer science. Whether it is the construction of database indexes, the presentation of search engine results, or the scheduling of operating system processes, sorting algorithms play an indispensable role. In his classic book The Art of Computer Programming, Donald Knuth noted that about 25% of a computer's running time is spent sorting. An in-depth understanding of the principles, performance characteristics and applicable scenarios of various sorting algorithms is a required course for every programmer.
This article will conduct a comprehensive and in-depth comparative analysis of ten classic sorting algorithms, covering principle explanation, complexity analysis, code implementation, and practical application scenarios.
1. Bubble Sort
Brief description of the principle: Bubble sorting repeatedly traverses the sequence to be sorted, comparing two adjacent elements each time, and swaps them if they are in the wrong order. Each round of traversal will "bubble" the largest element of the current unsorted part to the correct position.
Complexity Analysis:
- Best time complexity: O(n) (the sequence is ordered and only traversed once)
- Worst time complexity: O(n²) (complete reverse order of the sequence)
- Average time complexity: O(n²)
- Space complexity: O(1)
- Stability: Stable
def bubble_sort(arr): n = len(arr) for i in range(n): swapped = False for j in range(0, n - i - 1): if arr[j] > arr[j + 1]: arr[j], arr[j + 1] = arr[j + 1], arr[j] swapped = True if not swapped: # 若本轮无交换,说明已有序 break return arr二、选择排序(Selection Sort)
原理简述: 选择排序每次从未排序部分中找到最小(或最大)元素,将其放到已排序部分的末尾。算法维护两个子序列:已排序部分和未排序部分。
复杂度分析:
- 最好时间复杂度:O(n²)
- 最坏时间复杂度:O(n²)
- 平均时间复杂度:O(n²)
- 空间复杂度:O(1)
- 稳定性:不稳定(交换可能打破相同元素的相对顺序)
def selection_sort(arr): n = len(arr) for i in range(n): min_idx = i for j in range(i + 1, n): if arr[j] < arr[min_idx]: min_idx = j arr[i], arr[min_idx] = arr[min_idx], arr[i] return arr3. Insertion Sort
Brief description of the principle: Insertion sort divides the array into two parts, sorted and unsorted. Each time, the first element of the unsorted part is taken, and a suitable position is found in the sorted part for insertion. It is similar to the process we use to organize our hands when playing cards.
Complexity Analysis:
- Best time complexity: O(n) (sequence is ordered)
- Worst time complexity: O(n²) (complete reverse order of the sequence)
- Average time complexity: O(n²)
- Space complexity: O(1)
- Stability: Stable
def insertion_sort(arr): for i in range(1, len(arr)): key = arr[i] j = i - 1 while j >= 0 and arr[j] > key: arr[j + 1] = arr[j] j -= 1 arr[j + 1] = key return arr插入排序在小规模数据和近乎有序的数据上表现极为优秀,许多高级排序算法(如 Timsort)在处理小规模子数组时会回退到插入排序。
四、希尔排序(Shell Sort)
原理简述: 希尔排序是插入排序的改进版本,核心思想是通过设定一个递减的间隔序列(gap sequence),先对间隔较大的元素进行排序,逐步缩小间隔,最终在间隔为 1 时退化为标准插入排序。由于前期的大间隔排序已使数据趋于有序,最终的插入排序效率大大提高。
复杂度分析:
- 最好时间复杂度:O(n log n)(取决于间隔序列)
- 最坏时间复杂度:O(n²)(使用原始 Shell 间隔序列)至 O(n^(4/3))(使用 Sedgewick 序列)
- 平均时间复杂度:取决于间隔序列,通常介于 O(n log n) 和 O(n²) 之间
- 空间复杂度:O(1)
- 稳定性:不稳定
def shell_sort(arr): n = len(arr) gap = n // 2 while gap > 0: for i in range(gap, n): temp = arr[i] j = i while j >= gap and arr[j - gap] > temp: arr[j] = arr[j - gap] j -= gap arr[j] = temp gap //= 2 return arr5. Quick Sort
Brief description of the principle: Quick sort uses the divide-and-conquer strategy, selects a pivot element, divides the array into two parts - elements smaller than the pivot and elements larger than the pivot, and then sorts the two parts recursively. The core of quick sort lies in the partition operation.
Complexity Analysis:
- Best time complexity: O(n log n) (divided evenly every time)
- Worst time complexity: O(n²) (take the best value each time as the benchmark, such as taking the first element of a sorted array)
- Average time complexity: O(n log n)
- Space complexity: O(log n) (recursion stack depth)
- Stability: Unstable
def quick_sort(arr, low=0, high=None): if high is None: high = len(arr) - 1 if low < high: pivot_idx = partition(arr, low, high) quick_sort(arr, low, pivot_idx - 1) quick_sort(arr, pivot_idx + 1, high) return arr def partition(arr, low, high): pivot = arr[high] # 选择最后一个元素作为基准 i = low - 1 for j in range(low, high): if arr[j] <= pivot: i += 1 arr[i], arr[j] = arr[j], arr[i] arr[i + 1], arr[high] = arr[high], arr[i + 1] return i + 1快速排序是实践中最常用的排序算法之一,其平均性能极为优秀。通过随机化选取 pivot 或三数取中法(median-of-three),可以有效避免最坏情况。
六、归并排序(Merge Sort)
原理简述: 归并排序同样采用分治策略,将数组递归地分成两半,分别排序后再合并(merge)。合并操作是归并排序的核心,它将两个有序子数组合并为一个有序数组。
复杂度分析:
- 最好时间复杂度:O(n log n)
- 最坏时间复杂度:O(n log n)
- 平均时间复杂度:O(n log n)
- 空间复杂度:O(n)(需要额外数组存放合并结果)
- 稳定性:稳定
def merge_sort(arr): if len(arr) <= 1: return arr mid = len(arr) // 2 left = merge_sort(arr[:mid]) right = merge_sort(arr[mid:]) return merge(left, right) def merge(left, right): result = [] i = j = 0 while i < len(left) and j < len(right): if left[i] <= right[j]: # <= ensure stability result.append(left[i]) i += 1 else: result.append(right[j]) j += 1 result.extend(left[i:]) result.extend(right[j:]) return resultMerge sort has stable performance and is not affected by the distribution of input data. It is the preferred algorithm for external sorting (when the data volume exceeds memory).
7. Heap Sort
Brief description of the principle: Heap sort uses the data structure of a binary heap (usually a maximum heap). First, the array is constructed as a max-heap, and then the top (maximum value) and end elements of the heap are repeatedly exchanged to reduce the scope of the heap and readjust the heap structure to achieve sorting.
Complexity Analysis:
- Best time complexity: O(n log n)
- Worst time complexity: O(n log n)
- Average time complexity: O(n log n)
- Space complexity: O(1)
- Stability: Unstable
def heap_sort(arr): n = len(arr) # 构建最大堆 for i in range(n // 2 - 1, -1, -1): heapify(arr, n, i) # 逐个提取堆顶元素 for i in range(n - 1, 0, -1): arr[0], arr[i] = arr[i], arr[0] heapify(arr, i, 0) return arr def heapify(arr, n, i): largest = i left = 2 * i + 1 right = 2 * i + 2 if left < n and arr[left] > arr[largest]: largest = left if right < n and arr[right] > arr[largest]: largest = right if largest != i: arr[i], arr[largest] = arr[largest], arr[i] heapify(arr, n, largest)堆排序的优势在于最坏情况下仍保证 O(n log n) 且只需 O(1) 额外空间,但由于缓存不友好(跳跃式内存访问),实际运行速度通常不如快速排序。
八、计数排序(Counting Sort)
原理简述: 计数排序是一种非比较排序算法。它通过统计每个元素出现的次数,然后根据统计信息将元素放到正确位置。适用于数据范围有限的整数排序。
复杂度分析:
- 时间复杂度:O(n + k),其中 k 为数据范围
- 空间复杂度:O(n + k)
- 稳定性:稳定
def counting_sort(arr): if not arr: return arr max_val = max(arr) min_val = min(arr) range_val = max_val - min_val + 1 count = [0] * range_val for num in arr: # Cumulative count # Traverse from back to front to maintain stability return output ## 9. Bucket Sort **Brief description of the principle:** Bucket sorting distributes data into several "buckets", each bucket is sorted independently (usually using insertion sort or other sorting algorithms), and finally the results of all buckets are spliced together. Bucket sort is extremely efficient when the data is evenly distributed. **Complexity Analysis:**- Best time complexity: O(n + k) (data is uniformly distributed, k is the number of buckets)- Worst time complexity: O(n²) (all data falls into the same bucket)- Average time complexity: O(n + n²/k + k), O(n) when k ≈ n- Space complexity: O(n + k)- Stability: **Stable** (depends on the bucket sorting algorithm) ```pythondef bucket_sort(arr, bucket_count=10): if not arr: return arr min_val, max_val = min(arr), max(arr) bucket_range = (max_val - min_val) / bucket_count + 1 buckets = [[] for _ in range(bucket_count)] for num in arr: idx = int((num - min_val) / bucket_range) buckets[idx].append(num) result = [] for bucket in buckets: result.extend(insertion_sort(bucket)) # 桶内使用插入排序 return result十、基数排序(Radix Sort)
原理简述: 基数排序按照位数从低位到高位(LSD)或从高位到低位(MSD)依次进行排序,每一位的排序使用稳定的排序算法(通常是计数排序)。适用于整数或等长字符串排序。
复杂度分析:
- 时间复杂度:O(d × (n + k)),d 为最大位数,k 为基数(如十进制 k=10)
- 空间复杂度:O(n + k)
- 稳定性:稳定
def radix_sort(arr): if not arr: return arr max_val = max(arr) exp = 1 while max_val // exp > 0: counting_sort_by_digit(arr, exp) exp *= 10 return arr def counting_sort_by_digit(arr, exp): n = len(arr) output = [0] * n count = [0] * 10 for num in arr: for i in range(1, 10): for i in range(n - 1, -1, -1): for i in range(n): ## Full comparison table | Sorting algorithm | Best time | Average time | Worst time | Space complexity | Stability ||---------|---------|---------|---------|----------|--------|| Bubble sort | O(n) | O(n²) | O(n²) | O(1) | Stable || Selection sort | O(n²) | O(n²) | O(n²) | O(1) | Unstable || Insertion sort | O(n) | O(n²) | O(n²) | O(1) | Stable || Hill sort | O(n log n) | O(n^1.3) | O(n²) | O(1) | Unstable || Quick sort | O(n log n) | O(n log n) | O(n²) | O(log n) | Unstable || Merge sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Stable || Heap sort | O(n log n) | O(n log n) | O(n log n) | O(1) | Unstable || Counting sort | O(n+k) | O(n+k) | O(n+k) | O(n+k) | Stable || Bucket sort | O(n+k) | O(n+k) | O(n²) | O(n+k) | Stable || Radix sort | O(d(n+k)) | O(d(n+k)) | O(d(n+k)) | O(n+k) | Stable | > Note: k is the data range or the number of buckets, and d is the maximum number of digits. ## Actual application scenario analysis **Small size data (n < 50):** Insertion sort is the best choice. The constant factor is small and extremely efficient for nearly ordered data. **Medium-Scale General Purpose Sort:** Quicksort is the fastest comparison sorting algorithm in most scenarios, with a small constant factor in the average case. **Stable sort required:** Merge sort is an O(n log n) algorithm that is stable and has predictable performance. Stability is crucial in scenarios where the original order of equal elements needs to be preserved (such as multi-key sorting). **Memory limited:** Heap sort requires only O(1) extra space, suitable for embedded systems with tight memory. **Integer sorting with limited data range:** Counting sort or radix sort can achieve linear time complexity that far exceeds the O(n log n) theoretical lower bound of comparison sort. **Large-scale external sorting:** Merge sort is a natural fit for external sorting because its sequential access pattern lends itself well to disk I/O. **Floating point sorting with uniform data distribution:** Bucket sorting can achieve near-linear sorting efficiency. ## The underlying implementation of sorting in each language’s standard library ### Python — Timsort Python's built-in `sort()` and `sorted()` use the **Timsort** algorithm, designed by Tim Peters in 2002. Timsort is a hybrid sorting algorithm that combines the advantages of merge sort and insertion sort: - First identify an ordered subsequence in the data (called a "run") The time complexity of Timsort is O(n log n), and it performs particularly well on partially sorted data, and can reach O(n) in the best case. ### C — qsort The C standard library's `qsort()` is usually implemented based on quick sort, but the specific implementation varies between compilers. glibc's implementation uses merge sort (when there is enough memory) or quick sort (when there is not enough memory), and uses insertion sort optimization on small arrays. MSVC's implementation uses introspective sorting (Introsort), combining quick sort, heap sort, and insertion sort. ### Java — Dual-Pivot Quicksort and Timsort Java uses **Dual-Pivot Quicksort** (dual-pivot quicksort, designed by Vladimir Yaroslavskiy) for arrays of primitive types, choosing two pivots to divide the array into three parts, which is faster in practice than classic quicksort. For object arrays, Java uses **Timsort** to ensure the stability of sorting. ### C++ — Introsort C++ STL's `std::sort` usually uses **Introsort** (introspective sort), proposed by David Musser in 1997. It starts with quick sort, switches to heap sort when the recursion depth exceeds 2×log₂(n) to guarantee the worst case of O(n log n), and uses insertion sort for small arrays. ### Go — Pattern-Defeating Quicksort (pdqsort) The Go language standard library uses **pdqsort**, which is a modern hybrid sorting algorithm that is close to the performance of quick sort on random data and can efficiently handle data in special patterns such as sorted, reverse order, and large number of repetitions. ## Summary and selection suggestions There is no absolute "best" sorting algorithm, only the one that is most suitable for a specific scenario. Here is a brief decision-making guide: 1. **When in doubt, use the language standard library**: The sorting implementation of the modern language standard library has been carefully optimized and is the best choice for most scenarios. Understanding these algorithms not only helps to choose appropriate sorting solutions, but also cultivates the thinking ability to analyze problems and design algorithms. The theoretical lower bound of comparative sorting is O(n log n), but non-comparative sorting can break through this limit under certain conditions - this idea itself is an exquisite example of "breaking conventional constraints" in algorithm design.