ArKanjo: A Tool for Detecting Function-Level Code Duplication in the Linux Kernel
Speaker: David Tadokoro
Track: Academic
Type: Academic paper
Room: Petit amphi
Time: Jul 15 (Tue): 15:30
Duration: 0:20
The Linux kernel’s massive scale (over 28 M LoC and over 20 K contributors) presents unique maintenance challenges. Surprisingly, code duplication remains a persistent issue in the kernel’s codebase, which could hinder its evolution and patching. Academic approaches often focus on pairwise comparison of code artifacts and are not directly applicable to analyses of an entire codebase. Free software tools used in practice frequently offer limited functionality, such as primitive textual matching, prove too narrow in scope, or fail to deliver effective results on complex, large-scale codebases. Existing solutions generally fail to address the Linux kernel’s specific needs: (1) scalability to handle its size, (2) actionable results for developers, and (3) integration with kernel development workflows. This paper presents ArKanjo, a novel command-line tool for Linux kernel maintenance designed to detect and analyze function-level duplications. Released under the MIT license, ArKanjo employs a two-stage architecture consisting of a Preprocessor and a Query Responder, which separates the computationally intensive analysis from efficient querying for duplications within large codebases. Key advantages of ArKanjo over existing solutions include: (1) optimization for C codebases with kernel-specific patterns; (2) preprocessing that enables rapid queries without redundant analysis; and (3) prioritization of duplicates that impact maintainability, such as copied buggy logic. We evaluate ArKanjo against real-world duplication cases in recent kernel versions, demonstrating its effectiveness in identifying problematic clones that generic tools often overlook. By identifying well-defined, manageable duplication instances, ArKanjo also lowers the barrier for new contributors, as evidenced by its role in guiding students to make their first code improvements to the kernel.
ArKanjo offers immediate value to kernel maintainers and serves as a replicable model for clone detection in other large-scale free software projects.
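The split between an expensive preprocessing stage and a cheap query stage can be illustrated with a minimal Python sketch. This is not ArKanjo's implementation: the function names (`normalize`, `preprocess`, `query`), the sample function bodies, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration, and the comment stripping is deliberately naive.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(body: str) -> str:
    """Strip C comments and collapse whitespace (naive: ignores string literals)."""
    body = re.sub(r"/\*.*?\*/", " ", body, flags=re.DOTALL)  # block comments
    body = re.sub(r"//[^\n]*", " ", body)                    # line comments
    return " ".join(body.split())


def preprocess(functions: dict[str, str], threshold: float = 0.8):
    """Expensive stage: compare every pair of functions once and keep
    only the pairs whose similarity reaches the (assumed) threshold."""
    normalized = {name: normalize(body) for name, body in functions.items()}
    index: dict[str, list[tuple[str, float]]] = {name: [] for name in functions}
    for a, b in combinations(normalized, 2):
        score = SequenceMatcher(None, normalized[a], normalized[b]).ratio()
        if score >= threshold:
            index[a].append((b, score))
            index[b].append((a, score))
    for matches in index.values():
        matches.sort(key=lambda m: m[1], reverse=True)  # most similar first
    return index


def query(index, name: str):
    """Cheap stage: a dictionary lookup, with no re-analysis of the code."""
    return index.get(name, [])


# Hypothetical function bodies keyed by made-up identifiers.
functions = {
    "drv_a_clamp": "int clamp(int v) { /* bound */ if (v < 0) return 0; "
                   "if (v > 255) return 255; return v; }",
    "drv_b_clamp": "int clamp(int v) {\n  if (v < 0) return 0;\n"
                   "  if (v > 255) return 255; // max\n  return v;\n}",
    "drv_c_init": "void init(struct dev *d) { d->ready = 1; }",
}

index = preprocess(functions)   # run once
print(query(index, "drv_a_clamp"))  # repeated queries reuse the index
```

The pairwise loop in `preprocess` is quadratic in the number of functions, which is why paying that cost once and answering later queries from the stored index matters at kernel scale.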