Frequency Count Sort: A Near-Linear Time Distribution Sort for Low-Entropy Datasets with Frequent Duplicates | IJCT Volume 13 – Issue 3 | IJCT-V13I3P122

International Journal of Computer Techniques
ISSN 2394-2231
Volume 13, Issue 3  |  Published: May – June 2026

Author

Ogunbor Anthony Aikhomu

Abstract

Many real-world datasets exhibit structural low entropy: a small number of unique values u repeated across a large input of size n, where u is far less than n (u << n). Classic examples include election results, student grades, national exam, genomic sequences, sensor telemetry, and survey responses. Despite the prevalence of this data profile, very few sorting algorithms in common use, e.g., the 3-way Quicksort, have been designed explicitly for the dataset domain represented in this paper as typically u <= 100 and n >= 100,000. This paper introduces Frequency Count Sort (FC-Sort), a distribution-based sorting algorithm targeting low-entropy, high-duplicate datasets. FC-Sort operates in three stages: (1) a single O(n) frequency count, building a map of u unique keys to occurrence counts; (2) a sort of unique keys alone, using insertion sort and completing in at most u(u-1)/2 comparisons in the worst case – a fixed constant independent of n; and (3) a linear O(n) left-to-right reconstruction with a total complexity of O(n + u²). Empirical benchmark using optimized FC sort indicates that the algorithm consistently outperforms one of the two best performing algorithms in the problem space defined – Radix sort, while remaining competitive with the other – Counting sort and showing consistent superiority for smaller values of u. The algorithm outperforms all O(nlogn) algorithms by a factor of 2x to over one order of magnitude across the dataset space tested. The exception being 3-way quicksort which at very small u (u<=5), can marginally outperform FC sort.

Keywords

Fc-sort; homogeneity test heuristic (HTH); low-entropy datasets; distribution sorting; empirical algorithm analysis; adaptive sorting algorithms

Conclusion

This paper introduces Frequency Count Sort (FC-Sort), a distribution-based sorting algorithm designed explicitly for low-entropy datasets in which the number of unique keys u is substantially smaller than the total number of observations n. By reducing the sorting problem to the unique keys in the input list, that is, a frequency map of u unique keys, a sort of those keys alone, and then a linear reconstruction, FC-Sort achieves near-linear time complexity O(n + u²) that degrades gracefully as u grows and converges to O(n) within the defined problem domain of u ≤ 100. Empirical benchmarks across a broad configuration space (n ∈ {100,000 to 2,500,000}, u ∈ {5 to 100}) demonstrate and confirm that FC-Sort ranks first or remains statistically equivalent to the best-performing competitor (Counting Sort) in every optimized configuration tested within this domain. It has been established in the algorithms literature that 3-way Quicksort represents the optimal comparison-based approach for sorting all-identical key datasets, owing to its single equal-zone partition pass making it achieve linear O(n) time [1], [7]. This optimality is bounded by the requirement to examine every element/key. This paper demonstrates that FC-Sort, deploying a post-frequency-count early-exit heuristic, termed Homogeneity Test Heuristic (HTH), breaks that bound. In this specific all-equal keys domain, it achieves a strictly lower constant-factor cost by eliminating the sort and reconstruction phases entirely upon detection of u = 1. This reduces the total sorting task to a single O(n) frequency count pass — the theoretical minimum, since any algorithm must examine every element at least once. FC-Sort does not merely match the optimal comparison-based baseline of 3-Way Quicksort in this single key domain, it dominates it, achieving work that is asymptotically less and, for large n. The algorithm is intentionally simple: three stages, implementable from pseudocode in under 20 lines and portable across any language supporting hash-based associative structures. This simplicity is a feature and it lowers the barrier to adoption, auditing, reproducibility and verification. Two directions for future work are identified. First, considering that the algorithm demonstrates efficiency with some specialized dataset with equal keys, a fusion with another algorithm with O(nlogn) complexity in a hybrid design could potentially result in a very competitive O(nlogn) algorithm. Second, the algorithm design practically condenses the input list to the much smaller sub-list of unique keys, and given that the frequency count of all unique elements are known before the left to right serial expansion, a significant time incurred in serial writing to target list could be reduced under parallelism. This would mean that the writing to the target list of each or some unique elements could be assigned to separate processors in the machine such that writing operation occurs simultaneously saving time and thus improving runtime.

References

[1] R. Sedgewick, “Quicksort with Equal Keys,” SIAM Journal on Computing, Vol. 6, pp. 240 – 267, June 1977 https://doi.org/10.1137/0206018 [2] T.H. Cormen, C.E. Leiserson, R. L. Rivest, and C. Stein, “Introduction to Algorithms, 4th Edition”, MIT Press, pp. 89, 313, – 314, 290 – 295, 2022 [3] C.A.R. Hoare, “Quicksort”, The Computer Journal, vol 5, pp. 10– 16, 1961, doi.org/10.1093/comjnl/5.1.10 [4] R. Sedgewick, “The Analysis of Quicksort Programs”, ACTA Informatica, Vol 7, pp. 327–355, 1977, doi.org/10.1007/BF00289467 [5] E. Dijistrar, “A Discipline of Programming, The Dutch National Flag Problem”, Prentice-Hall, Chapter 14, 1976 [6] J. L. Bentley and R. Sedgewick, “Fast Algorithms for Sorting and Searching Strings”, Proceedings of the 8th Annual ACM–SIAM Symposium on Discrete Algorithms, pp. 360–369, New Orleans, 1997 [7] S. Wild, “Quicksort is Optimal for Many Equal Keys”, Proceedings of the Fifteenth Meeting on Analytic Algorithmics and Combinatorics (ANALCO), SIAM, 201. doi.org//10.1137/1.9781611975062.2

How to Cite This Paper

Ogunbor Anthony Aikhomu (2026). Frequency Count Sort: A Near-Linear Time Distribution Sort for Low-Entropy Datasets with Frequent Duplicates. International Journal of Computer Techniques, 13(3). ISSN: 2394-2231.

© 2026 International Journal of Computer Techniques (IJCT). All rights reserved.

Submit Your Paper