The Joys of Hashing: Hash Table Programming with C

Ebook, 238 pages, 1 hour

Name: The Joys of Hashing: Hash Table Programming with C
Author: Thomas Mailund
ISBN: 9781484240663

About this ebook

Build working implementations of hash tables, written in the C programming language. This book starts with simple first attempts devoid of collision resolution strategies, and moves through improvements and extensions illustrating different design ideas and approaches, followed by experiments to validate the choices.
Hash tables, when implemented and used appropriately, are exceptionally efficient data structures for representing sets and lookup tables, providing low-overhead, constant-time insertion, deletion, and lookup operations.
The Joys of Hashing walks you through the implementation of efficient hash tables and the pros and cons of different design choices when building tables. The source code used in the book is available on GitHub for your re-use and experiments.
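As a rough illustration of what a "first attempt devoid of collision resolution" might look like, here is a minimal sketch of a set of unsigned integer keys. It is not taken from the book or its GitHub repository: the names `struct bin`, `struct hash_set`, and the `hash_set_*` functions are hypothetical, and using `key % size` as the hash is an assumption made for brevity. Each key maps to exactly one bin, so a second key that lands in an occupied bin is simply rejected.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; an illustration, not the book's code. */
struct bin {
    bool occupied;
    uint32_t key;
};

struct hash_set {
    struct bin *bins;
    size_t size;              /* number of bins */
};

struct hash_set *hash_set_new(size_t size)
{
    if (size == 0) return NULL;
    struct hash_set *set = malloc(sizeof *set);
    if (!set) return NULL;
    set->bins = calloc(size, sizeof *set->bins);  /* zeroed: every bin empty */
    if (!set->bins) { free(set); return NULL; }
    set->size = size;
    return set;
}

void hash_set_free(struct hash_set *set)
{
    if (!set) return;
    free(set->bins);
    free(set);
}

/* Returns false when the bin is already held by a different key:
 * with no collision resolution, that key cannot be stored. */
bool hash_set_insert(struct hash_set *set, uint32_t key)
{
    struct bin *b = &set->bins[key % set->size];
    if (b->occupied && b->key != key) return false;
    b->occupied = true;
    b->key = key;
    return true;
}

bool hash_set_contains(const struct hash_set *set, uint32_t key)
{
    const struct bin *b = &set->bins[key % set->size];
    return b->occupied && b->key == key;
}

void hash_set_delete(struct hash_set *set, uint32_t key)
{
    struct bin *b = &set->bins[key % set->size];
    if (b->occupied && b->key == key) b->occupied = false;
}
```

With eight bins, inserting 3 and then 11 fails on the second call because both keys map to bin 3. Every operation touches a single bin, which is where the constant-time behavior comes from, and the rejected insert is the limitation that collision resolution strategies address.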
What You Will Learn

Master the basic ideas behind hash tables
Carry out collision resolution, including strategies for handling collisions and their consequences for performance
Resize tables, growing and shrinking them as needed
Store values alongside keys to build general sets and maps
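The topics in the list above can be sketched together in one small example. This is a sketch under stated assumptions rather than the book's implementation: it picks chaining (linked lists per bin) as the collision resolution strategy, grows when the load factor exceeds 1, shrinks when it drops below 1/4, and stores an `int` value with each `uint32_t` key. All names (`struct link`, `struct hash_map`, `hash_map_*`) and the thresholds are hypothetical choices.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names and thresholds, chosen for illustration only. */
struct link { uint32_t key; int value; struct link *next; };

struct hash_map {
    struct link **bins;   /* each bin heads a list of colliding keys */
    size_t size;          /* number of bins */
    size_t used;          /* number of stored keys */
};

struct hash_map *hash_map_new(size_t size)
{
    struct hash_map *map = malloc(sizeof *map);
    if (!map) return NULL;
    map->size = size ? size : 1;
    map->used = 0;
    map->bins = calloc(map->size, sizeof *map->bins);
    if (!map->bins) { free(map); return NULL; }
    return map;
}

/* Relink every entry into a new bin array; keep the old one if calloc fails. */
static void hash_map_resize(struct hash_map *map, size_t new_size)
{
    struct link **new_bins = calloc(new_size, sizeof *new_bins);
    if (!new_bins) return;
    for (size_t i = 0; i < map->size; i++) {
        for (struct link *l = map->bins[i], *next; l; l = next) {
            next = l->next;
            size_t j = l->key % new_size;
            l->next = new_bins[j];
            new_bins[j] = l;
        }
    }
    free(map->bins);
    map->bins = new_bins;
    map->size = new_size;
}

/* Insert or overwrite; returns false only if memory runs out. */
bool hash_map_put(struct hash_map *map, uint32_t key, int value)
{
    size_t i = key % map->size;
    for (struct link *l = map->bins[i]; l; l = l->next)
        if (l->key == key) { l->value = value; return true; }
    struct link *l = malloc(sizeof *l);
    if (!l) return false;
    l->key = key; l->value = value; l->next = map->bins[i];
    map->bins[i] = l;
    if (++map->used > map->size)            /* load factor above 1: grow */
        hash_map_resize(map, 2 * map->size);
    return true;
}

bool hash_map_get(const struct hash_map *map, uint32_t key, int *value)
{
    for (const struct link *l = map->bins[key % map->size]; l; l = l->next)
        if (l->key == key) { *value = l->value; return true; }
    return false;
}

bool hash_map_remove(struct hash_map *map, uint32_t key)
{
    for (struct link **p = &map->bins[key % map->size]; *p; p = &(*p)->next) {
        if ((*p)->key != key) continue;
        struct link *dead = *p;
        *p = dead->next;
        free(dead);
        if (--map->used < map->size / 4)    /* sparse table: shrink */
            hash_map_resize(map, map->size / 2);
        return true;
    }
    return false;
}

void hash_map_free(struct hash_map *map)
{
    if (!map) return;
    for (size_t i = 0; i < map->size; i++)
        for (struct link *l = map->bins[i], *next; l; l = next) {
            next = l->next;
            free(l);
        }
    free(map->bins);
    free(map);
}
```

Chaining is only one option; open addressing with linear or double hashing probes is another common family, and the trade-offs between them are the kind of design choice the book says it evaluates experimentally.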

Who This Book Is For
Those with at least some prior programming experience, especially in C programming.

Programming

Language: English
Publisher: Apress
Release date: Feb 9, 2019


byData Engineering Podcast
0 ratings
0% found this document useful
Thomas Huckle and Tobias Neckel, "Bits and Bugs: A Scientific and Historical Review of Software Failures in Computational Science" (SIAM, 2019): An interview with Thomas Huckle and Tobias Neckel
Podcast episode
Thomas Huckle and Tobias Neckel, "Bits and Bugs: A Scientific and Historical Review of Software Failures in Computational Science" (SIAM, 2019): An interview with Thomas Huckle and Tobias Neckel
byNew Books in the History of Science
0 ratings
0% found this document useful



The Joys of Hashing - Thomas Mailund

Thomas Mailund, The Joys of Hashing, https://doi.org/10.1007/978-1-4842-4066-3_1

1. The Joys of Hashing

Thomas Mailund (Aarhus N, Denmark)

This book is an introduction to the hash table data structure. When implemented and used appropriately, hash tables are exceptionally efficient data structures for representing sets and lookup tables. They provide constant-time, low-overhead insertion, deletion, and lookup. The book assumes the reader is familiar with programming and the C programming language. For the theoretical parts of the book, it also assumes some familiarity with probability theory and algorithmic theory.

Hash tables are constructed from two basic ideas: reducing application keys to a hash key, a number in the range from 0 to some N – 1, and mapping that number into a smaller range from 0 to m – 1, m ≪ N. We can use the small range to index into an array with constant time access. Both ideas are simple, but how they are implemented in practice affects the efficiency of hash tables.

Consider Figure 1-1. This figure illustrates the main components of storing values in a hash table: application values, which are potentially complex, are mapped to hash keys, which are integer values in a range of size N, usually zero to N – 1. In the figure, N = 64. Doing this simplifies the representation of the values; now you only have integers as keys, and if N is small, you can store them in an array of size N, using their hash keys as indices into the array. However, if N is large, this is not feasible. If, for example, the space of hash keys is the 32-bit integers, then N = 2^32 = 4,294,967,296, slightly more than four billion. An array of bytes of this size would, therefore, take up more than four gigabytes of space. To store simple objects such as pointers or integers, you would need between four and eight times as much memory. It is impractical to use an array of this size to store some application keys.

Figure 1-1. Values map to hash keys that then map to table bins

Even if N is considerably smaller than the range of four-byte words, you waste a lot of space on the array if you plan to store only n ≪ N keys. Since the array must be allocated and initialized, merely creating it costs O(N). Even if you get constant-time insertion and deletion in such an array, the cost of creating it can easily swamp the time your algorithm spends using it. If you want a table that is efficient, you should be able to both initialize it and use it to insert or delete n keys, all in time O(n). Therefore, N should be in O(n).

The typical solution is to keep N large but add a second step that reduces the hash key range to a smaller bin range of size m with m ∈ O(n); in the example, m = 8. If you keep m small, as in O(n), you can allocate and initialize the array of bins in linear time, and you can access any bin in constant time. To insert, check, or delete an element in the table, you map the application value to its hash key and then map the hash key to a bin index.
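The two steps can be sketched in C as follows. This is only an illustration: the function names string_hash_key and bin_index are hypothetical, and the multiply-and-add string hash is a stand-in for whatever the application would actually use, not a function taken from the book.

```c
#include <assert.h>
#include <stdint.h>

/* Step 1 (application-specific, hypothetical): reduce a string to a
   32-bit hash key. A simple multiply-and-add over the bytes stands in
   for the application's real hash function. */
uint32_t string_hash_key(const char *s)
{
    uint32_t key = 0;
    for (; *s != '\0'; ++s) {
        key = key * 31u + (unsigned char)*s; /* wraps modulo 2^32 */
    }
    return key;
}

/* Step 2 (part of the table): reduce a hash key in [N] to a bin
   index in [m] by taking the remainder modulo m. */
uint32_t bin_index(uint32_t key, uint32_t m)
{
    assert(m > 0);
    return key % m;
}
```

With m = 8, as in the figure, bin_index(string_hash_key("ab"), 8) first computes the hash key 97 * 31 + 98 = 3105 and then the bin 3105 % 8 = 1. Only the second function depends on m, which is why it can live inside the table implementation.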

You reduce values to bin indices in two steps because the first step, mapping data from your application domain to a number, is program-specific and cannot be part of a general hash table implementation.¹ Moving from a large integer interval to a smaller one, however, can be implemented as part of the hash table. If you resize the table to adapt it to the number of keys you store in it, you need to change m, and you do not want the application programmer to provide a separate function for each m. You can think of the hash key space, [N] = [0, ..., N – 1], as the interface between the application and the data structure. The hash table itself can map from this space to indices in an array, [m] = [0, ..., m – 1].

The primary responsibility of the first step is to reduce potentially complicated application values to simpler hash keys, for example, mapping application-relevant information like positions on a board game or connections in a network down to integers. These integers can then be handled by the hash table data structure. A second responsibility of the function is to make the hash keys uniformly distributed in the range [N]. The binning strategy for mapping hash keys to bins assumes uniformly distributed hash keys in order to distribute keys evenly across bins. If this assumption is violated, the data structure does not guarantee (expected) constant time operations.

Here, you can add a third, middle step that maps from [N] → [N] and scrambles the application hash keys into hash keys with a better distribution; see Figure 1-2. These functions can be application-independent and part of a hash table library. You will return to such functions in Chapter 6 and Chapter 7. Having a middle step does not eliminate the need for application hash functions. You still need to map complex data into integers. The middle step only alleviates the need for an even distribution of keys. The map from application keys to hash keys still has some responsibility for this, though. If it maps different data to the same hash key, then the middle step cannot do anything but map the same input to the same output.
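As an illustration of what a middle step from [N] to [N] can look like, the sketch below uses the 32-bit finalizer mix from MurmurHash3. This is a swapped-in example, not one of the functions the book develops in Chapters 6 and 7, and the name scramble is hypothetical.

```c
#include <stdint.h>

/* Middle step: scramble a 32-bit hash key into another 32-bit hash key.
   Each operation (xor with a right shift, multiplication by an odd
   constant) is invertible modulo 2^32, so distinct inputs always give
   distinct outputs. */
uint32_t scramble(uint32_t key)
{
    key ^= key >> 16;
    key *= 0x85ebca6bu;
    key ^= key >> 13;
    key *= 0xc2b2ae35u;
    key ^= key >> 16;
    return key;
}
```

Because the function is deterministic, two application values that already share a hash key still share it after scrambling, which is the limitation described above. What the step can help with is keys that are distinct but badly spread, such as keys that are all multiples of the table size.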

Figure 1-2. If the application maps values to keys, but they are not uniformly distributed, then a hashing step between the application and the binning can be added

Strictly speaking, you do not need the distribution of hash keys to be uniform as long as two different values are highly unlikely to map to the same key. The goal is to have uniformly distributed hash keys, as these are easiest to work with when analyzing theoretical performance. The runtime results referred to in Chapter 3 assume this, and therefore, we will as well. In Chapter 7, you will learn techniques for achieving similar results without the assumption.

The book is primarily about implementing the hash table data structure and only secondarily about hash functions. The concern when implementing hash tables is this: given hash keys with application values attached to them, how do you represent the data such that you can update and query tables in constant time? The fundamental idea is, of course, to reduce hash keys to bins and then use an array of bins containing values. In the purest form, you can store your data values directly in the array at the index that the hash and binning functions provide, but if m is relatively small compared to the number of data values, then you are likely to have collisions: cases where two hash keys map to the same bin. Although different values are unlikely to hash to the same key in the range [N], this does not mean that collisions are unlikely in the range [m] if m is smaller than N. Collisions become increasingly likely as the number of keys you insert in the table, n, approaches m, and they are guaranteed once n exceeds m. Dealing with collisions is a crucial aspect of implementing hash tables, and a topic we will deal with for a sizeable part of this book.
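To make the notion of a collision concrete, the following sketch counts how many keys in a list land in a bin that an earlier key already occupies. The function name count_collisions and the choice of key % m as the binning function are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Count how many of the n keys map to a bin that an earlier key in
   the list already occupies, using key % m as the binning function. */
uint32_t count_collisions(const uint32_t *keys, uint32_t n, uint32_t m)
{
    assert(m > 0);
    bool *occupied = calloc(m, sizeof *occupied); /* all bins start free */
    if (occupied == NULL) {
        abort(); /* out of memory; a sketch has no better recovery */
    }
    uint32_t collisions = 0;
    for (uint32_t i = 0; i < n; ++i) {
        uint32_t bin = keys[i] % m;
        if (occupied[bin]) {
            ++collisions;
        } else {
            occupied[bin] = true;
        }
    }
    free(occupied);
    return collisions;
}
```

With m = 8, the distinct keys 3, 11, and 19 all map to bin 3, so the second and third collide with the first. Any list of nine keys must produce at least one collision in eight bins, whatever the keys are.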

Footnotes

¹ In some textbooks, you will see the hashing step and the binning step combined and called hashing. Then you have a single function that maps application-specific keys directly to bins. I prefer to consider this as two or three separate functions, and it usually is implemented as such.

Thomas Mailund, The Joys of Hashing, https://doi.org/10.1007/978-1-4842-4066-3_2

2. Hash Keys, Indices, and Collisions


As mentioned in the introduction, this book is primarily about implementing hash tables and not hash functions. So to simplify the exposition, assume that the data values stored in tables are identical to the hash keys. In Chapter 5, you will address which changes you have to make to store application data together with keys, but for most of the theory of hash tables you only need to consider hash keys; everywhere else, you will view additional data as black box data and just store their keys. The code snippets below cover everything you need to implement the concepts in the chapter, but they are not easy to compile straight from the book; you can download the complete code listings from https://github.com/mailund/JoyChapter2.

Assume that the keys are uniformly distributed in the interval [N] = [0, ..., N – 1] where N is the maximum uint32_t and consider the most straightforward hash table. It consists of an array where you can store keys and a number holding the size of the table, m. To be able to map from the range [N] to the range [m], you need to remember m. You store this number in the variable size in the structure below. Since every uint32_t value is a valid key, you cannot use a special key to indicate that an entry in the table is not occupied, so you will use a structure called struct bin that contains a flag for this.

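#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct bin {
    // Unsigned, so the one-bit field holds 0 or 1 rather than 0 or -1.
    unsigned int is_free : 1;
    uint32_t key;
};

struct hash_table {
    struct bin *table; // array of size bins
    uint32_t size;     // number of bins, m
};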

Functions for allocating and deallocating tables can then look like this:

struct hash_table *empty_table(uint32_t size)
{
    struct hash_table *table =
        (struct hash_table *)malloc(sizeof(struct hash_table));
    if (!table) return NULL;
    table->table = (struct bin *)malloc(size * sizeof(struct bin));
    if (!table->table) {
        free(table);
        return NULL;
    }
    // Mark every bin as free; the keys are left uninitialized.
    for (uint32_t i = 0; i < size; ++i) {
        struct bin *bin = &table->table[i];
        bin->is_free = true;
    }
    table->size = size;
    return table;
}

void delete_table(struct hash_table *table)
{
    free(table->table);
    free(table);
}

The operations you want to implement on hash tables are the insertion and deletion of keys and queries to test if a table holds a given key. You use the following interface for the three operations:

void insert_key(struct hash_table *table, uint32_t key);
bool contains_key(struct hash_table *table, uint32_t key);
void delete_key(struct hash_table *table, uint32_t key);

Mapping from Keys to Indices

When you have to map a hash key from [N] down to the range of the indices in the array, [m], the most straightforward approach is to take the remainder of a division by m:

unsigned int index = key % table->size;

You then use that index to access the array. Assuming that you never have collisions when doing this, the implementation of the three operations would then be as simple as this:

void insert_key(struct hash_table *table, uint32_t key)
{
    uint32_t index = key % table->size;
    struct bin *bin = &table->table[index];
    if (bin->is_free) {
        bin->key = key;
        bin->is_free = false;
    } else {
        // There is already a key here, so we have a
        // collision. We cannot deal with this yet.
    }
}

bool contains_key(struct hash_table *table, uint32_t key)
{
    uint32_t index = key % table->size;
    struct bin *bin = &table->table[index];
    if (!bin->is_free && bin->key == key) {
        return true;
    } else {
        return false;
    }
}

void delete_key(struct hash_table
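The three operations under the no-collision assumption can be put together in one small self-contained sketch. The struct and function names follow the listings in this chapter, but several details are assumptions made for this illustration rather than the book's code: the body of delete_key (it frees a bin only if the bin holds exactly the given key), the plain bool flag in place of the bit-field, and the NULL checks in empty_table.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct bin {
    bool is_free;
    uint32_t key;
};

struct hash_table {
    struct bin *table;
    uint32_t size;
};

struct hash_table *empty_table(uint32_t size)
{
    struct hash_table *table = malloc(sizeof *table);
    if (table == NULL) return NULL;
    table->table = malloc(size * sizeof *table->table);
    if (table->table == NULL) {
        free(table);
        return NULL;
    }
    for (uint32_t i = 0; i < size; ++i) {
        table->table[i].is_free = true;
    }
    table->size = size;
    return table;
}

void delete_table(struct hash_table *table)
{
    free(table->table);
    free(table);
}

void insert_key(struct hash_table *table, uint32_t key)
{
    struct bin *bin = &table->table[key % table->size];
    if (bin->is_free) {
        bin->key = key;
        bin->is_free = false;
    }
    /* Otherwise the bin is taken: a collision, silently ignored here. */
}

bool contains_key(struct hash_table *table, uint32_t key)
{
    struct bin *bin = &table->table[key % table->size];
    return !bin->is_free && bin->key == key;
}

/* Assumed behavior: free the bin only if it holds exactly this key. */
void delete_key(struct hash_table *table, uint32_t key)
{
    struct bin *bin = &table->table[key % table->size];
    if (!bin->is_free && bin->key == key) {
        bin->is_free = true;
    }
}
```

In a table of size 8, inserting 3 and then 11 leaves only 3 in the table, since both keys map to bin 3 and the second insertion is dropped. This shows why collisions cannot simply be ignored in a real implementation.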
