String Algorithms in C: Efficient Text Representation and Search

About this ebook

Implement practical data structures and algorithms for text search and discover how they are used inside larger applications. This unique in-depth guide explains string algorithms using the C programming language. String Algorithms in C teaches you the following algorithms and how to use them: classical exact search algorithms; tries and compact tries; suffix trees and arrays; approximative pattern searches; and more.

In this book, author Thomas Mailund provides a library with implementations of all the algorithms presented, along with source code that you can use in your own programs, so there are plenty of examples.

You’ll see how string algorithms are used in applications such as image processing, computer vision, text analytics (from data science to web applications), information retrieval from databases, network security, and much more.

What You Will Learn

  • Use classical exact search algorithms including naive search, borders/border search, Knuth-Morris-Pratt, and Boyer-Moore with or without Horspool
  • Search in trees, use tries and compact tries, and work with the Aho-Corasick algorithm
  • Process suffix trees including the use and development of McCreight’s algorithm
  • Work with suffix arrays including binary searches, naive construction by sorting, suffix tree construction, the skew algorithm, and the Burrows-Wheeler transform (BWT)
  • Deal with enhanced suffix arrays including longest common prefix (LCP)
  • Carry out approximative pattern searches among suffix trees and approximative BWT searches

Who This Book Is For 

Readers with at least some prior programming experience in C or assembly and some prior experience programming algorithms.

Language: English
Publisher: Apress
Release date: Aug 28, 2020
ISBN: 9781484259207

String Algorithms in C - Thomas Mailund

    © Thomas Mailund 2020

    T. Mailund, String Algorithms in C, https://doi.org/10.1007/978-1-4842-5920-7_1

    1. Introduction

    Thomas Mailund, Aarhus N, Denmark

    Algorithms operating on strings are fundamental to many computer programs, and in particular searching for one string in another is the core of many algorithms. An example is searching for a word in a text document, where we want to know everywhere it occurs. This search can be exact, meaning that we are looking for the positions where the word occurs verbatim, or approximative, where we allow for some spelling mistakes.

    This book will teach you fundamental algorithms and data structures for exact and approximative search. The goal of the book is not to cover the theory behind the material in great detail. However, we will see theoretical considerations where relevant. The purpose of the book is to give you examples of how the algorithms can be implemented. For every algorithm and data structure in the book, I will present working C code, and nowhere will I use pseudocode. When I argue for the correctness and running time of algorithms, I intentionally do so informally. I aim at giving you an idea of why the algorithms solve a specific problem in a given time, but I will not prove it mathematically.

    You can copy all the algorithms and data structures in this book from the pages, but they are also available in a library on GitHub: https://github.com/mailund/stralg. You can download and link against the library or copy snippets of code into your own projects. On GitHub you can also find all the programs I have used for time measurement experiments so you can compare the algorithms' performance on your own machine and in your own runtime environment.

    Notation and conventions

    Unless otherwise stated, we use x, y, and p to refer to strings and i, j, k, l, and h to denote indices. We use 𝜖 to denote the empty string. We use a, b, and c for single characters. As in C, we do not distinguish between strings and pointers to a sequence of characters. Since the book is about algorithms in C, the notation we use matches that which is used for strings, pointers, and arrays in C. Arrays and strings are indexed from zero, that is, A[0] is the first value in array A (and x[0] is the first character in string x). The ith character in a string is at index i − 1.

    When we refer to a substring, we define it using two indices, i and j with i ≤ j, and we write x[i, j] for the substring. The first index is included and the second is not, that is, x[i, j] = x[i]x[i + 1] · · · x[j − 1]. If a string has length n, then the substring x[0, n] is the full string. If we have a character a and a string x, then ax denotes the string that has a as its first character and is then followed by the string x. We use aᵏ to denote a sequence of as of length k. The string a³x has a as its first three characters and is then followed by x. A substring that starts at index 0, x[0, i], is a prefix of the string, and it is a proper prefix if it is neither the empty string x[0, 0] = 𝜖 nor the full string x[0, n]. A substring that ends in n, x[i, n], is a suffix, and it is a proper suffix if it is neither the empty string nor the full string. We will sometimes use x[i, ] for this suffix.
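
    To make the half-open substring convention concrete, here is a minimal sketch (my own example, with a hypothetical print_substring helper, not part of the book's library) that prints x[i, j], that is, the characters from index i up to, but not including, index j:

    #include <stdio.h>
    #include <stdint.h>

    // Hypothetical helper: print the half-open substring x[i, j].
    static void print_substring(const char *x, uint32_t i, uint32_t j)
    {
        for (uint32_t k = i; k < j; k++)
            putchar(x[k]);
        putchar('\n');
    }

    int main(void)
    {
        const char *x = "mississippi";   // length n = 11
        print_substring(x, 0, 4);        // prefix x[0, 4]  -> "miss"
        print_substring(x, 4, 11);       // suffix x[4, 11] -> "issippi"
        return 0;
    }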

    We use $ to denote a sentinel in a string, that is, it is a character that is not found in the rest of the string. It is typically placed at the end of the string. The zero-terminated C strings have the zero byte as their termination sentinel, and unless otherwise stated, $ refers to that. All C strings x have a zero sentinel at index n if the string has length n, x = x[0]x[1] · · · x[n − 1]0. For some algorithms, the sentinel is essential; in others, it is not. We will leave it out of the notation when a sentinel isn’t needed for an algorithm, but naturally include the sentinel when it is necessary.

    Graphical notation

    Most data structures and algorithmic ideas are simpler to grasp if we use drawings to capture the structure of strings rather than textual notation. Because of this, I have chosen to provide more figures in this book than you will typically see in a book on algorithms. I hope you will appreciate it. If there is anything you find unclear about an algorithm, I suggest you try to draw key strings yourself and work out the properties you have problems with.

    In figures, we represent strings as rectangles. We show indices into a string as arrows pointing to the index in the string; see Figure 1-1. In this notation, we do not distinguish between pointers and indices. If a variable is an index j and it points into x, then what it points to is x[ j], naturally. If the variable is a pointer, y, then what it points to is ∗y. Whether we are working with pointers or indices should be clear from the context. It will undoubtedly be clear from the C implementations. We represent substrings by boxes of a different color inside the original string-rectangle. If we specify the indices defining the substring, we include their start and stop index (where the stop index points one after the end of the substring).

    Figure 1-1. Graphical string notation
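
    As a small, self-contained illustration of the index/pointer equivalence (my own example, not code from the book's library):

    #include <assert.h>

    int main(void)
    {
        const char *x = "example";
        unsigned j = 3;          // index into x
        const char *y = x + j;   // pointer to the same position
        assert(*y == x[j]);      // both refer to the character 'm'
        return 0;
    }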

    When we compare two strings, we imagine that we align the boxes representing them, so the parts we are comparing are on top of each other. For example, if we compare the character at index j in a string x with the character at index i in another string p, then we draw a box representing x over a box representing p, and we draw pointers for the two indices; see Figure 1-2. Since we are comparing the characters in the two indices, the two pointers are pointing at each other. Conceptually, we imagine that p is aligned under x starting at position j − i.

    Figure 1-2. Graphical notation for comparing indices in two different strings

    Code conventions

    There is a trade-off between long variable and type names and the line width of a book. In many cases, I have had to use an indentation style that you might not be used to. In function prototypes and function definitions, I will generally write one variable per line, indented under the function return type and name, for example:

    void compute_z_array(
        const unsigned char *x,
        uint32_t n,
        uint32_t *Z
    );

    void compute_reverse_z_array(
        const unsigned char *x,
        uint32_t m,
        uint32_t *Z
    );

    If a return type name is long, I will put it on a separate line:

    static inline uint32_t
    edge_length(struct suffix_tree_node *n) {
        return range_length(n->range);
    }

    struct suffix_tree *
    mccreight_suffix_tree(
        const unsigned char *string
    );

    struct suffix_tree *
    lcp_suffix_tree(
        const unsigned char *string,
        uint32_t *sa,
        uint32_t *lcp
    );

    struct suffix_tree_node *
    st_search(
        struct suffix_tree *st,
        const char *pattern
    );

    I make an exception for functions that take no arguments, that is, have void as their argument type.

    There are many places where an algorithm needs to use characters to look up in arrays. If you use the conventional C string type, char *, then the character can be either signed or unsigned, depending on your compiler, and you have to cast the type to avoid warnings. In a couple of places, we also have to make assumptions about the alphabet size. Because of this, I use arrays of uint8_t with a zero termination sentinel as strings. On practically all platforms, char is 8 bits, so this type is, for all intents and purposes, a C string. We are just guaranteed that we can use it unsigned and that the alphabet size is 256. Occasionally it is necessary to cast a uint8_t * string to a C string. A direct cast, (char *)x, will most likely work unless you are on an exotic platform. If it doesn't, you have to build a char buffer and copy characters byte by byte. It has to be a very exotic platform if you cannot store 8 bits in a char! Because I assume that you can always cast to char *, I will use the C library string functions (with a cast) when this is appropriate. It is a small matter to write your own if it is necessary.
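
    As a small illustration of this convention (my own example, not from the book's library), we can store a string as zero-terminated uint8_t characters, cast when calling the C library, and use the characters directly as indices into a 256-entry table:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        // Zero-terminated string of unsigned 8-bit characters.
        const uint8_t *x = (const uint8_t *)"abracadabra";

        // Cast back to char * when calling the C library.
        uint32_t n = (uint32_t)strlen((const char *)x);

        // The characters can index a table of size 256 directly.
        uint32_t counts[256] = { 0 };
        for (uint32_t i = 0; i < n; i++)
            counts[x[i]]++;

        printf("length = %u, 'a' occurs %u times\n", n, counts['a']);
        return 0;
    }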

    I will use uint32_t for indices, assuming that strings are short enough that we can index them with 32 bits. You can change it as needed, but I find it a good trade-off between likely lengths of strings and the space I need for data structures. I work in bioinformatics, so hundreds of millions of characters are usually the longest I encounter.

    Reporting a sequence of results

    In search algorithms, we report each occurrence of a pattern. This sounds straightforward, but there is a design choice in how we report the occurrences. Consider the following algorithm. It is the Boyer-Moore-Horspool (BMH) algorithm that you will see in the next chapter. It takes a string, x, and a pattern, p, and searches for all occurrences of p in x. First, it does some preprocessing, and then it searches. This is a general pattern for the algorithms in the next chapter. In the search, when it has found an occurrence of p, it reports the position by calling the REPORT(j) function.

    void bmh_search(
        const uint8_t *x,
        const uint8_t *p
    ) {
        uint32_t n = strlen((char *)x);
        uint32_t m = strlen((char *)p);

        // Preprocessing
        int jump_table[256];
        for (int k = 0; k < 256; k++) {
            jump_table[k] = m;
        }
        for (int k = 0; k < m - 1; k++) {
            jump_table[p[k]] = m - k - 1;
        }

        // Searching
        for (uint32_t j = 0;
             j < n - m + 1;
             j += jump_table[x[j + m - 1]]) {
            int i = m - 1;
            while (i > 0 && p[i] == x[j + i])
                --i;
            if (i == 0 && p[0] == x[j]) {
                REPORT(j);
            }
        }
    }

    If a global report function is all you need in your program, then this is an excellent solution. Often, however, we need different reporting functions for separate calls to the search function. Or we need the report function to collect data for further processing (and preferably not use global variables). We need some handle to choose different report functions and to provide them with data.

    One approach is using callbacks: Provide a report function and data argument to the search function and call the report function with the data when we find an occurrence. In the following implementation, I am assuming we have defined the function type for reporting, report_function, and the type for data we can add to it, report_function_data, somewhere outside of the search function.
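
    The book defines these types elsewhere; a plausible sketch of what they could look like (the names come from the text, but the definitions here are my assumption) is:

    #include <stdint.h>

    // Assumed definitions: a generic data handle and a callback that
    // receives the position of a match together with that handle.
    typedef void *report_function_data;
    typedef void (*report_function)(uint32_t position,
                                    report_function_data data);

    With definitions along these lines, a caller can, for example, pass a function that appends each reported position to an array it owns, avoiding global state.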

    void bmh_search_callback(
        const uint8_t *x,
        const uint8_t *p,
        report_function report,
        report_function_data data
    ) {
        uint32_t n = strlen((char *)x);
        uint32_t m = strlen((char *)p);

        // Preprocessing
        uint32_t jump_table[256];
        for (int k = 0; k < 256; k++) {
            jump_table[k] = m;
        }
        for (int k = 0; k < m - 1; k++) {
            jump_table[p[k]] = m - k - 1;
        }

        // Searching
        for (uint32_t j = 0;
             j < n - m + 1;
             j += jump_table[x[j + m - 1]]) {
            int i = m - 1;
            while (i > 0 && p[i] == x[j + i])
                --i;
            if (i == 0 && p[0] == x[j]) {
                report(j, data);
            }
        }
    }

    Callback functions have their uses, especially for handling events in interactive programs, but they also have some substantial drawbacks. To use them, you have to split the control flow of your program into different functions, which hurts readability, especially if you need to handle nested loops, for example, iterating over all nodes in a tree and, for each node, iterating over the leaves in another tree, where for each node-leaf pair you find occurrences (the example here is made up, but there are plenty of real algorithms with nested loops, and we will see some later in the book).

    We can get the control flow back to the calling function using the iterator design pattern. We define an iterator structure that holds information about the loop state, and we provide functions for setting it up, progressing to the next point in the loop, and reporting a match and then a function for freeing resources once the iterator is done.

    The general pattern for using an iterator looks like this:

    struct iterator iter;
    struct match match;
    iter_init(&iter, data);
    while (next_func(&iter, &match)) {
        // Process occurrence
    }
    iter_dealloc(&iter);

    The iterator structure contains the loop information. That means it must save the preprocessing data from when we create it and information about how to resume the loop after each time it is suspended. To report occurrences, it takes a match structure through which it can inform the caller about where matches occur. The iterator is initialized with data that determines what it should loop over. The loop is handled using a next function that returns true if there is another match (and if it does it will have filled out match). If there are no more matches, and the loop terminates, then it returns false. The iterator might contain allocated resources, so there should always be a function for freeing those.

    In an iterator for the BMH, we would keep the string, pattern, and table we build in the preprocessing.

    struct bmh_match_iter {
        const uint8_t *x; uint32_t n;
        const uint8_t *p; uint32_t m;
        int jump_table[256];
        uint32_t j;
    };

    struct match {
        uint32_t pos;
    };

    We put the preprocessing in the iterator initialization function

    void init_bmh_match_iter(
        struct bmh_match_iter *iter,
        const uint8_t *x, uint32_t n,
        const uint8_t *p, uint32_t m
    ) {
        // Preprocessing
        iter->j = 0;
        iter->x = x; iter->n = n;
        iter->p = p; iter->m = m;
        for (int k = 0; k < 256; k++) {
            iter->jump_table[k] = m;
        }
        for (int k = 0; k < m - 1; k++) {
            iter->jump_table[p[k]] = m - k - 1;
        }
    }

    and in the next function we do the search

    bool next_bmh_match(
        struct bmh_match_iter *iter,
        struct match *match
    ) {
        const uint8_t *x = iter->x;
        const uint8_t *p = iter->p;
        uint32_t n = iter->n;
        uint32_t m = iter->m;
        int *jump_table = iter->jump_table;

        // Searching
        for (uint32_t j = iter->j;
             j < n - m + 1;
             j += jump_table[x[j + m - 1]]) {
            int i = m - 1;
            while (i > 0 && p[i] == x[j + i]) {
                i--;
            }
            if (i == 0 && p[0] == x[j]) {
                match->pos = j;
                iter->j = j + jump_table[x[j + m - 1]];
                return true;
            }
        }
        return false;
    }

    We set up the loop with information from the iterator and search from there. If we find an occurrence, we store the new loop information in the iterator and the match information in the match structure and return true. If we reach the end of the loop, we report false.

    We have not allocated any resources when we initialized the iterator, so we do not need to free anything.

    void dealloc_bmh_match_iter(
        struct bmh_match_iter *iter
    ) {
        // Nothing to do here
    }

    Since the deallocation function doesn’t do anything, we could leave it out. Still, consistency in the use of iterators helps avoid problems. Plus, should we at some point add resources to the iterator, then it is easier to update one function than change all the places in the code that should now call a deallocation function.

    Iterators complicate the implementation of algorithms, especially if they are recursive and the iterator needs to keep track of a stack. Still, they greatly simplify the user interface to your algorithms, which makes it worthwhile to spend a little extra time implementing them. In this book, I will use iterators throughout.

    © Thomas Mailund 2020

    T. Mailund, String Algorithms in C, https://doi.org/10.1007/978-1-4842-5920-7_2

    2. Classical algorithms for exact search

    Thomas Mailund, Aarhus N, Denmark

    We kick the book off by looking at classical algorithms for exact search, that is, finding positions in a string where a pattern string matches precisely. This problem is so fundamental that it received much attention in the very early days of computing, and by now, there are tens if not hundreds of approaches. In this chapter, we see a few classics.

    Recall that we use iterators whenever we have an algorithm that loops over results that should be reported. All iterators must be initialized, and the resources they hold must be deallocated when we no longer need the iterator. When we loop, we have a function that returns true when there is something to report and false when the loop is done. The values the iterator reports are put in a structure that we pass along to the function that iterates to the next value to report. For the algorithms in this chapter, we initialize the iterators with the string in which we search, the pattern we search for, and the lengths of the two strings. Iterating over all occurrences of the pattern follows this structure:

    struct iterator iter;
    struct match match;
    iter_init(&iter, x, strlen(x), p, strlen(p));
    while (next_func(&iter, &match)) {
        // Process occurrence
    }
    iter_dealloc(&iter);

    When we report an occurrence, we get the position of the match, so the structure the iterator uses for reporting is this:

    struct match {
        uint32_t pos;
    };

    Naïve algorithm

    The simplest way imaginable for exact search is to iteratively move through the string x, with an index j that conceptually runs the pattern p along x, and at each index start matching the pattern against the string using another index, i (see Figure 2-1). The algorithm has two loops, one that iterates j through x and one that iterates i through p, matching x[i + j] against p[i] along the way. We run the inner loop until we see a mismatch or until we reach the end of the pattern. In the former case, we move p one step forward and try matching again. In the latter case, we report an occurrence at position j and then increment the index so we can start matching at the next position. We stop the outer loop when index j is greater than n − m; past that point, there isn't room for a match that doesn't run past the end of x.

    Figure 2-1. Exact search with the naïve approach

    We terminate the comparison of x[i + j] and p[i] when we see a mismatch, so in the best case, where the first character in p never matches a character in x, the algorithm runs in time O(n) where n is the length of x. In the worst case, we match all the way to the end of p at each position, and in that case, the running time is O(nm) where m is the length of p.
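
    Before we get to the iterator version, the core double loop can be sketched as a plain function (a condensed sketch of the naïve approach with a hypothetical REPORT macro, not the book's implementation):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    // Hypothetical reporting macro for this sketch.
    #define REPORT(j) printf("match at %u\n", (j))

    void naive_search(const uint8_t *x, const uint8_t *p)
    {
        uint32_t n = (uint32_t)strlen((const char *)x);
        uint32_t m = (uint32_t)strlen((const char *)p);
        if (m > n) return;                      // no room for a match

        for (uint32_t j = 0; j <= n - m; j++) { // slide p along x
            uint32_t i = 0;
            while (i < m && x[j + i] == p[i])   // compare p against x[j, j + m]
                i++;
            if (i == m)                         // matched all of p
                REPORT(j);
        }
    }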

    To implement the algorithm using an iterator, the iterator needs to remember the string to search in and the pattern to search for—so we do not need to pass these along each time we increment the iterator, with the potential for errors if we use the wrong strings—and we keep track
