Practical Apache Lucene 8: Uncover the Search Capabilities of Your Application
By Atri Sharma
About this ebook
Gain a thorough knowledge of Lucene's capabilities and use it to develop your own search applications. This book explores the Java-based, high-performance text search engine library used to build search capabilities in your applications.
Starting with the basics of Lucene and searching, you will learn about the types of queries it uses and take a look at scoring models. Applying this basic knowledge, you will develop a hello world app using basic Lucene queries and explore functions like scoring and document level boosting. Along the way you will also uncover the concepts of partial searching and matching in Lucene and then learn how to integrate geographical information (geospatial data) in Lucene using spatial queries and n-dimensional indexing. This will prepare you to build a location-aware search engine with a representative data set that allows location constraints to be specified during a search. You’ll also develop a text classifier using Lucene and Apache Mahout, a popular machine learning framework.
After a detailed review of performance benchmarking and the common issues associated with it, you’ll learn some best practices for tuning the performance of your application. By the end of the book you’ll be able to build your first Lucene patch, where you will not only write your patch, but also test it and ensure it adheres to community coding standards.
What You’ll Learn
- Master the basics of Apache Lucene
- Utilize different query types in Apache Lucene
- Explore scoring and document level boosting
- Integrate geospatial data into your application
Who This Book Is For
Developers wanting to learn the finer details of Apache Lucene by developing a series of projects with it.
Book preview
© Atri Sharma 2020
A. Sharma, Practical Apache Lucene 8, https://doi.org/10.1007/978-1-4842-6345-7_1
1. Hola, Lucene!
Atri Sharma (Bengaluru, Karnataka, India)
Welcome to your journey into the mystic and fun world of search!
This chapter discusses information-retrieval (IR) systems, focusing on Lucene, and includes a number of terms and key bits of information that you will find handy as we delve deeper into the innards of Lucene and build some cool applications.
Before we begin, let’s review some commonly asked questions and their answers.
Is Lucene a search engine?
I would generally respond, “Well, yes and no.”
Lucene is a library and platform used to build a variety of IR systems, search engines being the best known. Other areas where Lucene applies include the following:
- Document analytics: Loading and traversing text documents based on some given criteria, finding the top terms from documents, aggregating on specific fields.
- Log analytics: Analyzing application logs with high-performance dashboards built directly on top of Lucene.
- Geospatial search: With the recent advent of geospatial queries and data structures, Lucene is fast becoming a popular choice for indexing latitude and longitude data and for running queries such as colocation searches.
Do I need a background in information retrieval?
A background in IR is good to have, but it is not mandatory in your quest to understand Lucene. Although this book covers some background theory, any complex information is explained in such a way as to allow you to build a contextual understanding.
Note
Information retrieval is a vast subject, and this book does not aim to be exhaustively authoritative on the topic. Instead, this book focuses on enhancing your understanding of Lucene. For a more robust understanding of IR in general, consult an IR book to complement this one. Introduction to Information Retrieval by Manning, Raghavan, and Schütze is a good reference book to start with.
Does Lucene support SQL or SQL dialects?
No. Lucene has its own set of supported queries (discussed in later chapters). You can, however, use those queries to construct execution plans derived from SQL-like languages. Some engines have done exactly that, but Lucene has no native support for it.
Lucene is a library that enables you to build your search application. Use Lucene when you need fast indexing and search capabilities in your application. Lucene puts a lot of power in the user’s hands, but with great power comes great responsibility. So, it is crucial that the discerning user understand trade-offs and the best-fit cases for Lucene’s more advanced features (as discussed in later chapters).
Key Features of Lucene
Lucene has been around for a while, and a number of its features and capabilities have made it quite popular, including the following:
- Scalable, high-performance indexing: Lucene enables very fast indexing (over 150 GB/hour on modern hardware).
- Incremental indexing: Indexes are added as new documents come in, with no need to modify existing indexes (thus avoiding excess index churn).
- Top N queries: Lucene is efficient at scanning through large volumes of data and returning the top N documents that match the query, ranked by the scoring function used.
- Myriad query types: Phrase queries, wildcard queries, proximity queries, range queries, and more.
- Single-field and multifield searching: Lucene allows searching on a single field or on multiple fields, with ranking across those fields.
- Sorting and faceting: Lucene allows ordering results on a specific field (think of SQL ORDER BY). Lucene also allows faceting on different attributes (think of SQL GROUP BY).
- Multi-index searches: Lucene allows a single query to run against multiple indices and then merges the results from all of them into a final result set.
- Concurrent indexing and searching: Lucene allows using multiple threads for a single indexing or search request, which can speed up that request significantly.
- Highlighting, joins, and result grouping: Lucene allows joins across different indices with certain conditions.
- Pluggable ranking models: Including the vector space model and Okapi BM25.
- Custom codecs for storage: It is possible to implement and use custom storage formats in Lucene, thus allowing flexibility when using Lucene in the search application.
Although by no means comprehensive, the preceding list highlights some of the more popular features available in Lucene that enable you to build high-performance systems while maintaining a high degree of relevance in the returned results.
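To make the "top N" idea from the list above concrete, here is a minimal sketch of one common way to keep only the N best-scoring hits while scanning many candidates: a bounded min-heap. It uses only the Java standard library and is not Lucene's implementation or API; the class name TopNSketch, the method topN, and the document ids and scores are all hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative only: a generic top-N selection, not Lucene's own code or API.
class TopNSketch {

    // Returns the ids of the n highest-scoring documents, best first.
    static List<String> topN(Map<String, Double> scores, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap ordered by score: the weakest retained hit sits at the head,
        // so it can be evicted cheaply when a better hit arrives.
        Comparator<Map.Entry<String, Double>> byScore = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(n, byScore);

        for (Map.Entry<String, Double> hit : scores.entrySet()) {
            if (heap.size() < n) {
                heap.add(hit);
            } else if (hit.getValue() > heap.peek().getValue()) {
                heap.poll();   // drop the current weakest hit
                heap.add(hit); // keep the stronger one
            }
        }

        // Draining the heap yields ascending scores; reverse to get best first.
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical scores, as if produced by some scoring function.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("doc1", 1.2);
        scores.put("doc2", 0.4);
        scores.put("doc3", 3.1);
        scores.put("doc4", 0.9);
        System.out.println(topN(scores, 2)); // prints [doc3, doc1]
    }
}
```

The heap never holds more than N entries, so memory stays proportional to N rather than to the number of matching documents, which is the property that makes top N retrieval practical over large result sets.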
Information Retrieval Basics
Before delving deeper into the innards of Lucene, let’s review what search in IR systems is really about. Although this discussion does not go into the full complexity of IR, it should allow you to grasp the finer details of Lucene as we