Understanding Checksums and Cyclic Redundancy Checks
Ebook · 723 pages · 6 hours


About this ebook

Understanding Checksums and Cyclic Redundancy Checks:
This book gives practical, comprehensive answers to common questions about checksums and CRCs. Descriptions are based mainly on intuitive rather than mathematical explanations for both algorithmic operation and limitations, improving accessibility to non-specialists. Coverage includes single-sum checksums, dual-sum checksums (Fletcher checksum, DualX, DualXP), the new Koopman checksum, Cyclic Redundancy Checks (CRCs), and system-level usage considerations.

For decades much of the practical use of checksums and CRCs was based, at least in part, on folklore. This book provides a solid, comprehensive foundation for addressing core issues such as the comparative fault detection effectiveness of each technique, insight into speed differences, intuitive explanations for how speed-up techniques work, which CRC polynomial you should use for any particular situation, and source code examples for each approach.

This is the most comprehensive treatment of checksums and CRCs to date. The emphasis is on intuitive explanations and empirical validation of insights for practical application of these techniques to addressing real-world problems.

Chapters:
1. Introduction
2. Decimal checksum examples
3. Checksum operation and terminology
4. Checksum fault model
5. Single-sum checksums
6. Dual-sum checksums (Fletcher, Adler, DualX)
7. Koopman checksum
8. Checksum plus parity for HD=4 (KoopmanP, DualXP)
9. Cyclic Redundancy Check (CRC)
10. CRC effectiveness
11. Other checksum considerations
12. System-level considerations
13. Resources
14. Conclusions
15. Appendix: Good CRC Polynomials
16. Glossary

Includes example source code in C for each type of checksum.
This e-book is best viewed in color, with some color-coded text and dozens of color illustrations.

Language: English
Release date: Mar 20, 2024
ISBN: 9798224117970
Author

Philip Koopman

Prof. Philip Koopman is an internationally recognized expert on Autonomous Vehicle (AV) safety whose work in that area spans over 25 years. He is also actively involved with AV policy and standards as well as more general embedded system design and software quality. His pioneering research work includes software robustness testing and run time monitoring of autonomous systems to identify how they break and how to fix them. He has extensive experience in software safety and software quality across numerous transportation, industrial, and defense application domains including conventional automotive software and hardware systems. He was the principal technical contributor to the UL 4600 standard for autonomous system safety issued in 2020. He is a faculty member of the Carnegie Mellon University ECE department where he teaches software skills for mission-critical systems. In 2018 he was awarded the highly selective IEEE-SSIT Carl Barus Award for outstanding service in the public interest for his work in promoting automotive computer-based system safety. In 2022 he was named to the National Safety Council's Mobility Safety Advisory Group. He is the author of the books: Better Embedded System Software (2010), How Safe is Safe Enough: measuring and predicting autonomous vehicle safety (2022), and The UL 4600 Guidebook (2022).


    Book preview

    Understanding Checksums and Cyclic Redundancy Checks - Philip Koopman

    Preface

    Checksums and Cyclic Redundancy Checks (CRCs) have been around for a long time. But surprisingly, it has taken decades to really understand them.

    There is a significant divide between mathematical treatments of the topic and practical applications to networking and lightweight data integrity checks. This book comes down firmly on the side of practical implementation, and intentionally minimizes the mathematical aspects. There are plenty of places to read about the math. My goals here are to build non-mathematical intuition and emphasize empirical validation of insights.

    An important realization I had early on was that a purely mathematical approach does not necessarily get you where you need to go on this topic. The math is seductive. But it tends to run out of steam just when things get interesting, and in particular does so for CRCs with a Hamming Distance of greater than 4 (that is when you want to be able to guarantee detection of 4 or more bit faults in a single network message or data structure). So I turned to a brute force evaluation approach that depended on dramatic algorithmic optimizations. Despite daunting odds that it would ever be efficient enough to be useful, a brute-force approach with suitable optimizations turned out to be the key to unlocking significant new discoveries and insights.

    I’ve spent many hours puzzling through coding theory books. While I can understand what they say, my brain has never really worked math-first for this topic – it is more math-last I guess! I build intuition, and often see the bit movement patterns in my head before the math makes sense. Not everyone thinks that way, and certainly that is not how the vast majority of CRC and checksum literature is written. But it turns out that this way of thinking has been helpful for achieving a more comprehensive understanding of the practical aspects of checksums and CRCs.

    Along the way, I have occasionally noted and commented upon mistakes, typos, or limitations of previous work. It is important to recognize that creating anything defect-free in this field is immensely difficult. Indeed, I have been hesitating to press the publish button for this book because I know there will inevitably be mistakes here too. Nonetheless, knowledge improves over time by identifying and correcting previous over-simplifications, false paths, gaps, and outright mistakes. I hope that this book provides a solid step in that direction.

    Any book must have limits to its scope. For this book the topic is generally limited to checksums (including CRCs) used for fault detection. This leaves out error correction codes, and also codes more complex than checksums and CRCs. Other books for the most part quickly gloss over the mundane topic of checksums to concentrate on more complex and sophisticated error codes. However, in many practical applications checksums or CRCs are the right tool for the job, so this book concentrates on them rather than fancier coding theory. Hash functions, which might be thought of as cousins to checksums, are similarly out of scope.

    One checksum I decided not to dig into deeper is the Weighted Sum Codes approach (Feldmeier 1995). While it is said to offer an interesting speed vs. fault detection tradeoff point, the computational algorithm is more complex than the codes I consider, and I have not seen any practical adoptions. So that is left as an area of future study for the reader.

    A topic area I discuss with a curtailed scope is proper vs. improper CRCs (see section 10.10). What I found on that topic raised more questions than it answered for practical applications, primarily in the area of what happens when bit error rates might be greater than 50%.¹ Again, this is left as an area of future study.

    For printing production I decided to go with color, because the graphs are very challenging to do well in black and white. That makes the printed version a bit more expensive (and for the hard cover edition, the raw printing cost is much more expensive). But let’s face it – this topic is not cut out to be a best seller beach read for the masses! So I’m going for presentation quality rather than cheaper printing cost. Additionally, I expect many copies of this book will be distributed as an e-book edition, where color just costs some file size. (As an aside, this is not a free book. If you didn’t pay for your copy, then you are reading an illicit copy. Please go buy a legitimate copy from the e-book store in the nearest convenient country. Thanks!)

    The journey to finally being in a position to write this book was a long one. I give grateful acknowledgement to students, colleagues, and others who helped along the way with research in this area, including: Tridib Chakravarty, Kobey DeVale, Kevin Driscoll, Brendan Hall, Beth Osyk, Theresa (Theta) Maxino, Jennifer Black, Michael Paulitsch, Justin Ray, and Eushiuan Tran. Additional contributions came from: Mark Adler, Bob DiSilvestro, Berthold Gick, Hubert Kirrmann, and Pierre Zuber. I appreciate the additional personal support from Charles Kilgore, Barbara Lingberg, and Priya Narasimhan.

    Some of the earliest work on this topic using an empirical evaluation approach was done by Dr. G. Funk of ASEA-Brown Boveri (ABB). His ABB colleagues pointed me toward his work early in my investigation of this topic, and it had a large impact on my thinking.

    While writing this book itself did not have research sponsorship, previous related research did, so I want to acknowledge supporters again from those previous research projects that paved the way: the General Motors Collaborative Research Laboratory at Carnegie Mellon University, Robert Bosch GmbH, Bombardier Transportation, Honeywell, the Federal Aviation Administration, the Office of Naval Research, US Department of Transportation, and the Pennsylvania Infrastructure Technology Alliance (PITA). Earlier work also benefitted from equipment grants from Digital Equipment Corporation and Intel. The opinions and findings in this book are entirely my own, and do not reflect upon those historical research sponsors.²

    Comments and corrections are welcome via email. Thanks to Steve Simpson for catching typos.

    I maintain an updated zoo of good CRC polynomials here:

    https://users.ece.cmu.edu/~koopman/crc/

    I also have a checksum and CRC blog with occasional entries. Look for an entry about this book that has errata and a clickable list of all web pointers to save having to try to type them in for those of you reading the print version:

    https://checksumcrc.blogspot.com/p/book.html

    Checksums and CRCs occupy a peculiar niche in the world of computer dependability. The classic techniques are decades old and ubiquitous. Yet they are arguably under-studied for such a core part of our computing infrastructure, and use of them is often (but decreasingly) based on folklore rather than solid engineering tradeoffs. I hope that publishing this book is a useful step in the direction of improving the practice of deploying appropriate and effective checksums and CRCs.

    Philip Koopman

    Pittsburgh, PA, February 2024

    koopman@cmu.edu


    1. Introduction

    The vast majority of data transferred between computers of all types has at some point been protected from potential accidental corruption by a checksum or Cyclic Redundancy Check (CRC).

    Any data transmitted on a computer network or placed into storage is subject to some level of corruption. Over time, the bits just rot, so to speak. The probability of any particular bit being corrupted is small, but the risk adds up as the amount of data and the exposure to potential sources of corruption increase. A number stored in memory comes back altered when it is read later. A message sent at one end of a network comes out a different message at the other end. This tendency toward data corruption applies to all data values, whether numbers, text, photos, videos, music, or otherwise.

    For the most part, data corruption and methods to mitigate its effects are invisible to end users. But those who build any computer system need to choose the right mitigation method and implement it in a robust way to achieve a happy outcome of apparent fault-free operation.

    The area of research and practice known as coding theory³ arose to deal with this issue of data corruption and related matters. This book does not attempt to be yet another book that tries to broadly cover that complex and vast topic. Rather, this book limits its coverage to the commonly used – but too often under-discussed – area of checksums and CRCs.

    Our scope covers variations on the following algorithms:

    Modular addition checksums (including parity computations)

    Dual-sum checksums (including the Fletcher and Adler checksums), plus an extended block size DualX version

    Parity versions of DualX and Koopman checksums

    The Koopman checksum, which is a new point in the complexity/error detection performance tradeoff space based on a pure modular reduction operation

    Cyclic Redundancy Checks (CRCs)

    Modular addition checksums, dual-sum checksums, and CRCs are the workhorse error detection codes widely used in different computing applications. Understanding the tradeoffs is especially important for embedded computer system designers, who struggle with meager resources in both computing power and bandwidth while creating highly reliable products.

    While checksums and CRCs are typically dismissed in a few pages of a deeply theoretical textbook, there is more to it than is typically discussed in such treatments.⁵ More importantly, that extra information can easily make the difference between a checksum or CRC that is effective for error detection, and one that is not.

    There is plenty of misinformation on what makes an effective checksum or CRC that has been circulated over the decades.⁶ All of it is well-intentioned, but much of it is incorrect nonetheless. Some of the issues are just typos, but even a single incorrect bit value can change an excellent CRC into a terrible one. Some are just advice that seemed reasonable in a previous era given a lack of robust quantitative analysis results, but have turned out to be inaccurate. And some of it is just engineers frustrated at a lack of solid research results doing the best they can given insufficient information.

    This book strives to provide comprehensive, detailed information that is generally applicable to a wide variety of uses for both checksums and CRCs. Moreover, it does so in a way that is intentionally less mathematical than typical coding theory texts so that it might serve a much broader audience of practicing engineers. Think of it as intuition-building with a software-forward instead of a math-forward approach.

    Specific design choices made in creating this book include:

    Using the least amount of mathematical notation that is practical, consistent with describing what is essentially a mathematical computation. This is enforced by limiting math to what can be expressed readily without using an equation editor. This means some of the mathematical rigor and proofs typically found in a coding theory approach are missing. But see the next point…

    Emphasizing a description-by-example approach rather than a mathematical proof approach. This makes the material accessible to a wider audience. If you already understand the math from coding theory treatments, there is still material here for you in terms of practical application. And you might find that the insight-based rather than mathematical-based treatments still provide value.

    Describing various topics example-first, then generalizing. This can make concepts more concrete than a math-first approach to many readers.

    Using an empirical evaluation approach to selecting moduli rather than a theoretical approach.⁷ This has been the key to unlocking deeper insights for both checksums and CRCs.

    Emphasizing the contributions of each specific number of fault bits (e.g., 2-bit, 3-bit, 4-bit, … faults) rather than varying the probability of random bit faults (bit error ratio) in analysis. We have found this provides useful insights.

    Conforming to established terminology as much as possible, while attempting to smooth the seams where terminology from diverse disciplines leads to conflicting terms. For example, one might say there is a bit fault or a bit error in a particular datum depending on technical background and point of view, but mean the same thing.

    To ensure we are not accused of complete heresy regarding the lack of fancy equations, indeed there will be a discussion of polynomial division regarding CRCs. But that is just the start of the discussion, not the end. And the emphasis is on intuition-building and algorithms, not rigorous mathematical notation.

    There will still need to be some math, and we assume the reader is familiar with basic probability calculations.⁸ Ultimately, evaluating checksums and CRC fault detection effectiveness is a probabilistic numbers game, so a minimum amount of math is required. That and long division.

    This section is a warm-up of terminology to prepare the reader for the introduction to checksum concepts in chapter 2. We revisit terminology in more detail in chapter 3. Also see the glossary at the end of the book if you need to quickly check the definition of a term.

    The basic arrangement of a checksum or CRC calculation as it relates to data in memory or a network message is shown in figure 1.1.


    Figure 1.1. Simplified representation of a code word that contains both a data word (data to be protected) and a check value that is the result of a checksum or CRC computation.


    A data word is the data we want to safeguard by using a checksum. It might be one bit, or it might be a megabyte or more of data.¹⁰ The idea is that it is some set of data that we want to ensure has not been corrupted. To be clear, for our purposes we cannot prevent it from being corrupted. But we can try our best to ensure that corrupted data does not sneak through unnoticed to create incorrect computational results.

    A check value is computed by running the data word through a checksum or CRC calculation. This mixes and condenses the data word into a resultant check value that is often 8, 16, or 32 bits in size. The check value is computed in a way that we hope makes detecting any corruption likely. But no such calculation can be perfect at detecting all possible faults. In particular, there will always be some fault patterns in the data word (potentially combined with faults in the check value) that in effect cancel each other out and result in an undetectably corrupted data word.

    A code word is the combination of a data word concatenated with a check value. The entire code word is handled as a single entity at the system level. For example, a single network message is likely to contain a code word. When a code word is received, the CRC or checksum algorithm is run again to see if the data word still matches the expected check value. If not, there has been a detected fault (i.e., the code word is corrupted, and that corruption has been detected). If they do match then either (1) there has been no corruption, or (2) there has been a corruption severe enough to be beyond the ability of the checksum or CRC algorithm to detect. Note that we consider corruption of the entire code word, not just corruption of the data word, because various causes of corruption at the bit-by-bit level generally do not know the structure of the code word – they see a string of bits the recipient happens to decide to treat as a particular type of code word.
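
    As a concrete illustration of that recompute-and-compare step, here is a minimal sketch in C (my own illustration, with hypothetical names and a trivial placeholder check value computation, not code from this book) of how a receiver might validate a code word that carries a 16-bit check value at its end:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Placeholder check value computation (a simple XOR of bytes); any of the
       checksum or CRC algorithms discussed in this book could be substituted. */
    static uint16_t compute_check_value(const uint8_t *data_word, size_t length) {
        uint16_t check = 0;
        for (size_t i = 0; i < length; i++) {
            check ^= data_word[i];
        }
        return check;
    }

    /* A code word is the data word followed by its check value. Returns true if
       the recomputed check value matches the stored one; false is a detected
       fault. True means either no corruption, or a corruption this check value
       cannot detect. */
    bool code_word_is_valid(const uint8_t *code_word, size_t code_word_length) {
        if (code_word_length < 2) {
            return false;  /* too short to hold a 16-bit check value */
        }
        size_t data_word_length = code_word_length - 2;
        uint16_t stored = (uint16_t)((code_word[data_word_length] << 8) |
                                     code_word[data_word_length + 1]);
        return compute_check_value(code_word, data_word_length) == stored;
    }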

    The Hamming Distance is the smallest number of bit faults that must occur to corrupt a code word to the point that it is potentially undetectable by a particular checksum or CRC. For example, HD=4 means Hamming Distance of 4, in which all 1-, 2- and 3-bit faults are detected, but some 4-bit faults are not.

    There are significant tradeoffs regarding the computational cost vs. fault detection capability of any particular checksum or CRC algorithm. In general, more computational cost gives better fault detection, but more sophisticated approaches have limits to the maximum size data word for which they are effective. Exploring that tradeoff space is a primary goal of this book.

    There are two basic mechanisms for checksums discussed in this book.

    Modular addition checksum. This computes a sum of blocks of data across the length of the data word to produce a check value. For example, a checksum might be a two’s complement 32-bit addition of all 32-bit values in a data word. The modular part comes into play in that a modulo operation (remainder after division) is performed after each addition so that the result fits within the defined check value size, even if the sum would otherwise be too big to fit. The modular addition mechanism is used by both single-sum and dual-sum checksums.
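
    To make that concrete, here is a minimal C sketch of such a single-sum checksum (one reasonable illustration, not necessarily any specific listing from this book). The modulo operation is implicit because unsigned 32-bit arithmetic in C wraps around, i.e., is performed mod 2³²:

    #include <stddef.h>
    #include <stdint.h>

    /* Single-sum checksum: 32-bit addition of all 32-bit blocks in the data
       word. The sum wraps around (implicit mod 2^32), so the result always
       fits within the 32-bit check value size. */
    uint32_t single_sum_checksum32(const uint32_t *blocks, size_t block_count) {
        uint32_t sum = 0;
        for (size_t i = 0; i < block_count; i++) {
            sum += blocks[i];  /* unsigned overflow wraps: implicit mod 2^32 */
        }
        return sum;
    }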

    Modular reduction checksum.  In this approach, the entire data word is treated as a single large number that is divided by some modulus. The check value is the remainder after division. This mechanism is used by pure modular reduction, Koopman checksums, and CRCs.
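
    A correspondingly minimal C sketch of the modular reduction mechanism (again just an illustration with an arbitrary caller-chosen modulus, not a Koopman checksum or CRC implementation) treats the data word as one large base-256 number and keeps only the running remainder, so intermediate values never grow too large:

    #include <stddef.h>
    #include <stdint.h>

    /* Modular reduction checksum: treat the whole data word as one large
       base-256 number and return its remainder after division by the modulus.
       Reducing after every byte is mathematically equivalent to dividing the
       entire number at the end, but avoids needing a huge integer. */
    uint32_t modular_reduction_checksum(const uint8_t *data_word, size_t length,
                                        uint32_t modulus) {
        uint64_t remainder = 0;
        for (size_t i = 0; i < length; i++) {
            remainder = ((remainder << 8) | data_word[i]) % modulus;
        }
        return (uint32_t)remainder;
    }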

    For both modular addition and modular reduction, there are two possible types of mathematical operations used for the addition and division operations: normal integer arithmetic for most checksums, and polynomial operations with 1-bit coefficients for CRCs. The details of each will be described as we go.

    Beyond the scope of this book are other types of error codes. They certainly have their uses. To understand those, have a look at a more traditional book on coding theory, with several of the usual suspects for that listed in the resources chapter. We’re here to take a deep dive into checksums and CRCs.

    The goal of this book is to be an accessible guide for practicing engineers. The style reflects this.

    Many of the references are to Wikipedia. Wikipedia is not a universally authoritative source. However, in our experience, the topics we refer to are pretty reasonable in content and style. Wikipedia provides references to useful, often authoritative sources for anyone who wants to dig further into a particular topic. Just as importantly, much of the work in this area is buried behind academic publication paywalls and might not be available to many readers. So we refer to Wikipedia in most cases to ensure that backup information is readily available, with the caution that the reader should be wary of any potential inaccuracies that might affect some particular point.¹¹

    In some places, the recommendations in this book are different than those found in our earlier publications. That does not mean the earlier publications are wrong (unless we note that explicitly). Rather, it means that after decades of studying this topic we have even better advice to offer. In particular, CRC polynomials previously recommended still perform as advertised at the time of their recommendation. But in a few cases, we have decided that recommendations can be changed to be even better.

    For those who want to dive deeper, Chapter 13 has a list of resources that have informed this work and are worth looking at for more details, especially regarding mathematical and theoretical topics that we do not cover in depth.

    Here is a summary of our coverage:

    Chapter 2 gives some decimal checksum examples, including some examples in use in the real world such as the ISBN 10 book numbering system and off-line credit card number validation. This serves as a gentle introduction to the concepts used in more general checksum examples. It also provides a quick survey of some practical checksum uses beyond binary numeric applications.

    Chapter 3 gives a more thorough set of terminology and definitions. It also presents a generic description of how a checksum/CRC assures data integrity.

    Chapter 4 covers fault models. If we are going to talk about how well any particular algorithm can detect faults, we need a solid basis for understanding what different fault characteristics might be detected. This includes a discussion of the notion of a Hamming Distance (HD), which is a key attribute of fault detection effectiveness.

    Chapter 5 covers single-sum checksums, which are what people often mean when they are mentioning a checksum without being more specific. A single running sum is computed across bytes, 32-bit integers, or other blocks of the data word to create a check value.

    Chapter 6 covers dual-sum checksums. These use a pair of cascaded running sums across the data word to provide better fault detection compared to single-sum checksums. The Fletcher and Adler checksums are examples of this technique. We introduce DualX checksums, which have an expanded block size. DualX checksums provide more capable fault detection and can run significantly faster on larger processors.

    Chapter 7 covers the Koopman checksum, which is a novel approach to computing a modular reduction operation that involves reformulating it to be a modular addition loop. This provides superior fault detection effectiveness for larger data words than a dual-sum checksum while still making use of any native CPU support for integer division. It is for practical purposes as effective as a particular class of CRCs (HD=3) for data word lengths of up to about half as long as a corresponding good CRC with the same check value size, but with much faster computational speed.

    Chapter 8 covers variations of DualX and Koopman checksums that include a parity bit in the check value. This provides fault detection effectiveness at HD=4, albeit for data word lengths approximately half as long as their non-parity equivalents.

    Chapter 9 explains how Cyclic Redundancy Check (CRC) computations work, including both left-shift and right-shift algorithms, as well as the table lookup computational speedup technique.

    Chapter 10 examines CRC effectiveness, which is independent of which CRC computation algorithm is being used.

    Chapter 11 covers loose ends that have arisen in previous chapters: the effect of data values on fault detection effectiveness, whether large moduli are better, what happens for large numbers of bit faults, and effectiveness for burst faults rather than random independent single-bit faults.

    Chapter 12 discusses practical system-level considerations for using checksums and CRCs.

    Chapter 13 is an annotated list of resources that might prove useful to readers interested in exploring further.

    Chapter 14 is a brief conclusion.

    The Appendix lists good general-purpose CRC polynomials based on CRC size, data word length, and maximum achievable HD. The polynomials listed provide one-stop shopping for good CRC polynomial selection.

    A Glossary has concise definitions for bold-faced terms defined throughout the book.

    Figure 1.2 is a preview of checksum and CRC performance for some of the checksum algorithms discussed in later chapters. The vertical axis is the probability of undetected fault Pud at a particular bit error rate. (Pud is the probability of an undetected faulty code word at a particular assumed bit error rate. The details of what that might mean are discussed in chapter 4. For now, up is bad; down is good; and the vertical axis is on a log scale.)

    With a span of many orders of magnitude of Pud performance between the most and least effective checksums on this chart, there is plenty of room for trading off better Pud vs. algorithmic complexity and computation speed. The Simple curves are notional references for an idealized checksum operating at the indicated HD. The balance of the book revisits all these curves with explanations, so this is just a preview of things to come.


    Figure 1.2. Survey of Pud for selected checksums. 16-bit check values; BER = 10⁻⁶


    Speed differences are discussed in section 12.1, with the general finding that CRCs are slowest, but provide the best fault detection capabilities. Checksums that are increasingly faster but less capable are, in order: KoopmanP, Koopman, DualXP, DualX, and single sum/parity. Dual-sum checksums (e.g., the Fletcher and Adler checksums) are in many cases both slower and less capable than the corresponding DualX checksum alternatives.

    Chapter 2 covers decimal digit checksums while subsequent chapters switch to computer-friendly binary number-based checksum algorithms.

    2. Decimal checksum examples

    In this chapter, we work some examples of decimal digit checksums to illustrate the general ideas that come into play. Subsequent chapters will revisit these ideas in a more rigorous way using binary integers.

    The emphasis here is on building insight to ground the more detailed discussions that follow. The examples are based on decimal notation that is generally suitable for manual data entry and, at least in principle, pencil-and-paper arithmetic.¹²

    A classic use of a checksum is detecting mis-typed numeric data entered via a keyboard. If a person is asked to enter a long string of arbitrary numbers, there is a chance they will make some sort of typographical error doing so. Some problems may come from pressing the wrong key, while others come from visually getting lost in the long string of digits. Swapping the order of pairs of adjacent digits during their trip through a person’s short-term memory is an especially common fault to guard against.

    There are notational tricks to help with the visual digit recognition task, such as breaking phone numbers into groups of digits. For example, using the North American phone numbering system, 4125550123 might be more difficult to interpret visually than 412.555.0123.¹³

    However, typos will still happen. To help detect such typos, we can use a checksum.

    Given a data word of a 10-digit phone number, the most straightforward checksum calculation is to add the digits of a number, taking only the lowest digit of the result to form a one-digit check value (algorithm 2.1). For a running example of 412.555.0123, we would get:

    4 + 1 + 2 + 5 + 5 + 5 + 0 + 1 + 2 + 3 = 28 ⇒ checksum of 8

    We’ll note that taking only the lowest digit is mathematically equivalent to taking the remainder after division by 10, or computing the sum modulo 10 (often written as mod 10).¹⁴

    We could then append this value "8" as a check digit to the number, creating a more robust number (a code word): 412.555.0123.8. The point of doing this is that if anyone enters this number with a wrong-digit typographical fault, there is a reasonable chance the check digit (the check value) will not match the phone number (the data word), and the fact that there has been a typo can be detected.


    Algorithm 2.1 Decimal addition mod 10.
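
    Since the algorithm listing itself does not reproduce in this preview, the following minimal C sketch (my reconstruction from the description above, not necessarily the book's exact listing) computes the mod-10 digit-sum check value for a string of decimal digits, skipping separator characters such as the dots:

    #include <stdio.h>

    /* Mod-10 digit-sum checksum: add the decimal digits and keep only the
       lowest digit of the sum (the sum modulo 10). Non-digit characters such
       as the '.' separators are ignored. */
    unsigned int decimal_checksum_mod10(const char *digits) {
        unsigned int sum = 0;
        for (const char *p = digits; *p != '\0'; p++) {
            if (*p >= '0' && *p <= '9') {
                sum += (unsigned int)(*p - '0');
            }
        }
        return sum % 10u;
    }

    int main(void) {
        /* 4+1+2+5+5+5+0+1+2+3 = 28, so the check digit is 8 */
        printf("%u\n", decimal_checksum_mod10("412.555.0123"));  /* prints 8 */
        return 0;
    }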


    With a checksum-based phone number, rather than bothering someone by reaching a wrong number, the caller can get a system error most of the time¹⁵ and try again. For example, let’s say someone mis-wrote the above phone number as: 412.556.0123.8 (the underline is there for the reader, but the person making a call would not know there was a "5"⇒"6" typo). When using this number to make a call, the phone system would compute the checksum of the first 10 digits:

    Intended number: 412.555.0123.8

    Faulty number:    412.556.0123.8

    Faulty number: 4 + 1 + 2 + 5 + 5 + 6 + 0 + 1 + 2 + 3 = 29

    29 mod 10 ⇒ computed checksum of 9

    That computed checksum "9" could be compared to the dialed checksum (the final "8" in the dialed number) to determine if there must be some typo in the number somewhere, making the mis-typed phone number invalid.

    For this example, we have the following situation, in which dialing a wrong number successfully would require two digits to be incorrect, resulting in a check digit that happens to match up with the main phone number:

    412.555.0123.8 valid phone number

    412.556.0123.8 invalid checksum means invalid number

    412.556.0123.9 some other phone number that is valid

    412.546.0123.8 some other phone number that is valid

    Given those examples, we can infer some fault detection properties of this checksum approach:

    Any single-digit fault (any incorrect single decimal digit within the number, including the check digit) will be detectable. There are two cases:

    If the digit fault is in the main phone number, it will give an incorrect sum that does not match the check value digit at the end of the phone number.

    If the check digit is incorrect, it will not match the sum of the correct phone number digits.

    Many, but not all, two-digit faults will be undetectable. There are two cases of undetectable faults:

    A one-digit change in the phone number is paired with a change to the check value, with that pair resulting in a correct checksum match. In the example above, 412.556.0123.9 might be an intended legitimate phone number, or might be the result of changing 5 to 6 as well as a second change of the check digit 8 to 9, which results in a check value that matches the computed checksum result.

    A two-digit change in the phone number in which those two digits happen to result in the same sum. In the example above, 412.546.0123.8 has a valid checksum, but that might be due to corrupting 555 to 546, which has the same net effect on the checksum, giving an overall checksum result of 8.

    Most multi-digit faults will be detectable. In general, entering a completely random 11-digit number has only a 10% chance of forming a valid phone number. As a simple thought experiment, if you choose a 10-digit primary phone number at random, then choose a random check digit, there is a 1 in 10 chance that the check digit just happens to match whatever one of the ten possible sums was created based on the primary digits.

    No ordering faults (digits moved to different positions) are detectable. If you change the order of the digits, the sum is the same. This is a significant weakness in practice, because reversing the order of adjacent digits is a somewhat common data entry fault. (In the example above, 412.555.1023.8 is a valid phone number, but might be the result of a transposition fault of 0123 to 1023.)

    An important fault detection characteristic is that this checksum algorithm is especially good at detecting single-digit faults (100% detection). If we think that in practice single-digit faults are the most common, this is an attractive property, even if the average performance across more random faults is only a 90% fault detection rate.

    Why modern phone numbers don’t have check digits is not obvious, but might have its roots in the evolution of phone numbers. Early phone systems used word/number pairs, presumably to aid in the ability to keep the whole number in a person’s short-term memory.¹⁶ Adding a check digit would have increased the burden of dealing with numbers, and would likely not have had any practical benefit until automated switching was pervasive – by which time the older numbering system would have been well established. Even though most North American calls now require a full 10 digits, with an eleventh digit potentially being a minor incremental burden, the legacy numbering system is far too entrenched to add check digits at this point.¹⁷

    An alternate approach to creating a checksum is to use division instead of addition as the primary operation. This is obviously not really a sum and feels more like a hash function,¹⁸ but can serve the purpose of a checksum in some applications, and lays important groundwork for later discussions.

    For a simple division-based checksum, the data value is divided by some fixed number, and the remainder of that division is used as the checksum value. Taking a remainder after division is called modular reduction (also known as a modulo operation) with the number being used as the divisor being the modulus.

    For example, if we use a divisor of 9, we get:

    4125550123 mod 9 = 1 ⇒ checksum value of 1

    In this system, a valid phone number would be treated as a single large integer for the checksum operation, with an added checksum digit that is the mod 9 result of "1", giving: 412.555.0123.1
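
    A minimal C sketch of this division-based check (again my own illustration, not the book's listing) reduces the running value after each digit, which is equivalent to dividing the full ten-digit number by 9 and keeping the remainder:

    #include <stdio.h>

    /* Division-based decimal checksum: treat the digit string as one large
       integer and return its remainder modulo 9. Reducing after each digit
       is equivalent to dividing the whole number at once and never overflows. */
    unsigned int decimal_checksum_mod9(const char *digits) {
        unsigned int remainder = 0;
        for (const char *p = digits; *p != '\0'; p++) {
            if (*p >= '0' && *p <= '9') {
                remainder = (remainder * 10u + (unsigned int)(*p - '0')) % 9u;
            }
        }
        return remainder;
    }

    int main(void) {
        printf("%u\n", decimal_checksum_mod9("4125550123"));  /* prints 1 */
        printf("%u\n", decimal_checksum_mod9("4125559123"));  /* also 1; see the third limitation below */
        return 0;
    }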

    While this approach works moderately well, it has three important limitations:

    It is more complex to compute than simple addition.

    It only uses nine out of the ten possibilities for the check digit (a mod 9 computation can only generate a check digit of {0..8}, with the value 9 being unused). This in turn means that its average effectiveness for randomly corrupted values is only 8/9 = 88.9% instead of the 90% for decimal digits with a range of {0..9}. That is a small difference, but a difference nonetheless.

    It misses some single-digit corruptions. In particular, it will miss any 0 changed to a 9, or any 9 changed to a 0, because either digit’s contribution to the remainder is zero. For example, 4125559123 mod 9 = 1, which is unchanged even though the 0 has been corrupted to a 9.

    So far, we
