Automatic Text Summarization

Ebook · 530 pages · 5 hours


About this ebook

Textual information in the form of digital documents quickly accumulates into huge amounts of data. The majority of these documents are unstructured: they consist of unrestricted text that has not been organized into traditional databases. Processing such documents is therefore often perfunctory, mostly due to a lack of standards, and automatic text analysis has become extremely difficult to implement. Automatic Text Summarization (ATS), which condenses a text while preserving its relevant information, can help to process this ever-increasing, difficult-to-handle mass of information.

This book examines the motivations behind ATS and the different algorithms used for it. The author presents the recent state of the art before describing the main problems of ATS, as well as the difficulties and solutions provided by the community. The book covers recent advances in ATS, as well as current applications and trends. The approaches described are statistical, linguistic and symbolic, and several examples are included to clarify the theoretical concepts.

Language: English
Publisher: Wiley
Release date: September 25, 2014
ISBN: 9781119044079


    Book preview

    Automatic Text Summarization - Juan-Manuel Torres-Moreno

    Contents

    Foreword by A. Zamora and R. Salvador

    Foreword by H. Saggion

    Notation

    Introduction

    PART 1 Foundations

    1 Why Summarize Texts?

    1.1. The need for automatic summarization

    1.2. Definitions of text summarization

    1.3. Categorizing automatic summaries

    1.4. Applications of automatic text summarization

    1.5. About automatic text summarization

    1.6. Conclusion

    2 Automatic Text Summarization: Some Important Concepts

    2.1. Processes before the process

    2.2. Extraction, abstraction or compression?

    2.3. Extraction-based summarization

    2.4. Abstract summarization

    2.5. Sentence compression and fusion

    2.6. The limits of extraction

    2.7. The evolution of automatic text summarization tasks

    2.8. Evaluating summaries

    2.9. Conclusion

    3 Single-document Summarization

    3.1. Historical approaches

    3.2. Machine learning approaches

    3.3. State-of-the-art approaches

    3.4. Latent semantic analysis

    3.5. Graph-based approaches

    3.6. DIVTEX: a summarizer based on the divergence of probability distribution

    3.7. CORTEX

    3.8. ARTEX: another summarizer based on the vectorial model

    3.9. ENERTEX: a summarization system based on textual energy

    3.10. Approaches using rhetorical analysis

    3.11. Summarization by lexical chains

    3.12. Conclusion

    4 Guided Multi-Document Summarization

    4.1. Introduction

    4.2. The problems of multidocument summarization

    4.3. The DUC/TAC tasks for multidocument summarization and INEX Tweet Contextualization

    4.4. The taxonomy of multidocument summarization methods

    4.5. Some multi-document summarization systems and algorithms

    4.6. Update summarization

    4.7. Multi-document summarization by polytopes

    4.8. Redundancy

    4.9. Conclusion

    5 Multi- and Cross-lingual Summarization

    5.1. Multilingualism, the web and automatic summarization

    5.2. Automatic multilingual summarization

    5.3. MEAD

    5.4. SUMMARIST

    5.5. COLUMBIA NEWSBLASTER

    5.6. NEWSEXPLORER

    5.7. GOOGLE NEWS

    5.8. CAPS

    5.9. Automatic cross-lingual summarization

    5.10. Conclusion

    6 Source and Domain-Specific Summarization

    6.1. Genre, specialized documents and automatic summarization

    6.2. Automatic summarization and organic chemistry

    6.3. Automatic summarization and biomedicine

    6.4. Summarizing court decisions

    6.5. Opinion summarization

    6.6. Web summarization

    6.7. Conclusion

    7 Text Abstracting

    7.1. Abstraction-based automatic summarization

    7.2. Systems using natural language generation

    7.3. An abstract generator using information extraction

    7.4. Guided summarization and a fully abstractive approach

    7.5. Abstraction-based summarization via conceptual graphs

    7.6. Multisentence fusion

    7.7. Sentence compression

    7.8. Conclusion

    8 Evaluating Document Summaries

    8.1. How can summaries be evaluated?

    8.2. Extrinsic evaluations

    8.3. Intrinsic evaluations

    8.4. TIPSTER SUMMAC evaluation campaigns

    8.5. NTCIR evaluation campaigns

    8.6. DUC/TAC evaluation campaigns

    8.7. CLEF-INEX evaluation campaigns

    8.8. Semi-automatic methods for evaluating summaries

    8.9. Automatic evaluation via information theory

    8.10. Conclusion

    Conclusion

    Appendix 1 Information Retrieval, NLP and ATS

    A.1. Text preprocessing

    A.2. The vector space model

    A.3. Precision, recall, F-measure and accuracy

    Appendix 2 Automatic Text Summarization Resources

    Bibliography

    Index


    First published 2014 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

    Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

    ISTE Ltd

    27-37 St George’s Road

    London SW19 4EU

    UK

    www.iste.co.uk

    John Wiley & Sons, Inc.

    111 River Street

    Hoboken, NJ 07030

    USA

    www.wiley.com

    © ISTE Ltd 2014

    The rights of Juan-Manuel Torres-Moreno to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

    Library of Congress Control Number: 2014947781

    British Library Cataloguing-in-Publication Data

    A CIP record for this book is available from the British Library

    ISBN 978-1-84821-668-6

    Foreword by A. Zamora and R. Salvador

    Foreword

    The need to identify important information

    Throughout history, the sheer amount of printed information and the scarce availability of time to read it have always been two major obstacles in the search for knowledge. The famous novelist W. Somerset Maugham wrote that it was customary for someone who read a book to skip paragraphs, pages or even whole sections of it, but that skipping paragraphs or pages without suffering losses was a dangerous and very difficult thing to do unless one had a natural gift for it, or a peculiar facility for an ongoing recognition of interesting things as well as for bypassing invalid or uninteresting matters.¹ Somerset Maugham called this the art of skipping pages, and he himself received an offer from a North American publisher to re-edit old books in abbreviated form. The publisher wanted him to omit everything except the argument, main ideas and personages created by the author.

    The problem of information storage

    In the November 1961 issue of the Library Journal, Malcolm M. Ferguson, Reference Librarian at the Massachusetts Institute of Technology, wrote that the December 12, 1960, issue of Time magazine included a statement that, in his opinion, might provoke discussion and perplexity. The article reported that Richard P. Feynman, Professor of Physics at the California Institute of Technology (who would later receive a Nobel prize), had predicted that an explosion of information and of its storage would soon occur on planet Earth, and had argued that it would be convenient to reduce the amount and size of existing information so as to be able to store all the world's basic knowledge in the equivalent of a pocket-sized pamphlet. Feynman went on to offer a prize to anyone who could reduce the information of one page of a book to one twenty-five-thousandth of the linear scale of the original, in such a way that it could still be read with an electron microscope.

    Shortly after Ferguson's article, Hal Draper of the University of California published a satirical story called MS FND IN A LBRY in the December 1961 issue of The Magazine of Fantasy & Science Fiction. Draper poked fun at the idea of coping with Feynman's predicted information explosion by compressing data to microscopic levels for storage and by developing indexes of indexes in order to retrieve it.

    The information explosion was, and still is, a real problem, but the exponential growth in the capacity of new electronic processors has overcome the barrier imposed by old paper archives. Electronic book readers, such as Amazon's Kindle, can now store hundreds of books in a device the size of a paperback. Encoding information has even been taken to the molecular level: the synthetic organism created through genetic engineering at the J. Craig Venter Institute used nucleotides in its DNA to encode a message containing the names of the authors and contributors, a message that replicates whenever the organism multiplies.

    Automatic size reduction

    Since the dawn of the computer age, various attempts have been made to automatically shrink documents into a human-readable condensed format. Draper suggested one experimental method which consisted of reducing the cumbersome alphabet to mainly consonantal elements (thus: thr cmbrsm alfbt ws rdsd t mnl cnsntl elmnts), but this was intended to facilitate quick reading, and only incidentally would it cut down the mass of documents and books to address the information explosion. More sophisticated methods attempted to identify, select and extract important information through statistical analysis, by correlating words from the title with passages in the text and by assigning importance to sentences according to their position within the document. We (Antonio Zamora and Ricardo Salvador) worked at Chemical Abstracts Service (CAS), where manual abstracting and indexing was our daily job². Realizing that it was difficult to recognize what was important in a document, we developed a computer program that instead tried to discover what was not important, such as clichés, empty phrases, repetitive expressions, tables and grammatical subterfuges that were not essential for understanding the article. This technique of eliminating non-significant, unessential, unsubstantial, trivial, useless, duplicated and obvious sentences reduced the articles to the salient and interesting points of the document.
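    The scoring ideas sketched above (correlating title words with sentences, weighting sentences by position and discarding content-free sentences) can be illustrated with a short example. The sketch below is purely illustrative and is not the CAS program: it assumes naive sentence splitting, a tiny hypothetical stop-word list and arbitrary weights.

        # Minimal sketch of title-overlap + position scoring for extractive
        # summarization. Hypothetical stop-word list and arbitrary weights;
        # not the CAS system described in this foreword.
        import re

        STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "it"}

        def tokenize(text):
            return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

        def summarize(title, document, n_sentences=2):
            sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
            title_words = set(tokenize(title))
            scored = []
            for i, sent in enumerate(sentences):
                words = tokenize(sent)
                if not words:
                    continue  # skip sentences that carry no content words
                overlap = len(title_words & set(words)) / max(len(title_words), 1)
                position = 1.0 - i / len(sentences)  # earlier sentences score higher
                scored.append((0.7 * overlap + 0.3 * position, i, sent))
            top = sorted(scored, reverse=True)[:n_sentences]
            return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))

    With a title such as Automatic Text Summarization, sentences sharing the words automatic or summarization would be promoted over sentences made up mostly of stop words or boilerplate.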

    By the late 1970s, we could produce indicative abstracts for a fraction of a dollar. These abstracts contained 60–70% of the same sentences chosen by professional abstractors, and some professional abstractors began to worry that they might lose their jobs to a machine. However, primary journals started providing abstracts prepared by the authors themselves, so there was little demand for automatic abstracting. The Internet has changed all that: so many news feeds with unabridged text have become available that they can overwhelm anyone looking for information. Today, there is a real need for automatic abstracting.

    The future

    Today's smartphones have more computational power than many mainframe computers of the 20th century. Speech recognition and automatic translation have evolved from experimental curiosities into tools, such as Google's, that we use every day. We are at the threshold of artificial intelligence. IBM's Watson program won a one-million-dollar Jeopardy! contest against the two best human champions. Cloud-based computing has removed the memory-size constraints of our portable devices, and it is now possible to access extensive knowledge bases with simple protocols. The ease with which dictionaries can be accessed allows us to use synonyms in our contextual searches, so that a query for bird flu will also retrieve avian influenza. Great advances in automatic abstracting have been made during the last 40 years, and it is quite possible that within the next quarter of a century there will be computer programs with enough cognition to answer questions such as: what were the important points of this article? This is exactly what automatic abstracting strives to accomplish. For specific tasks, the behavior of these new programs will be indistinguishable from that of humans. These expectations are not merely dreams about a distant future; we may actually live to see them become reality.

    This book by Juan-Manuel Torres-Moreno presents the approaches that have been used in the past for automatic text summarization and describes the new algorithms and techniques of state-of-the-art programs.

    Antonio Zamora

    Ricardo Salvador

    August 2014

    1. Ten Novels and Their Authors by W. Somerset Maugham.

    2. See sections 1.5 and 3.1.4.

    Foreword by H. Saggion

    Automatic Text Summarization

    Juan-Manuel Torres-Moreno

    Text summarization, the reduction of a text to its essential content, is a task that requires linguistic competence, world knowledge and intelligence. Automatic text summarization, the production of summaries by computers, is therefore a very difficult task. One may wonder whether machines will ever be able to produce summaries that are indistinguishable from human summaries, a kind of Turing test and a motivation to advance the state of the art in natural language processing. Text summarization algorithms have often ignored the cognitive processes and the knowledge that go into text understanding and that are essential for proper summarization.

    In Automatic Text Summarization, Juan-Manuel Torres-Moreno offers a comprehensive overview of methods and techniques used in automatic text summarization research, from the first attempts to the most recent trends in the field (e.g. opinion and tweet summarization).

    Torres-Moreno does an excellent job of covering the various summarization problems. Starting from the motivations behind this interesting subject, he takes the reader on a research journey that spans more than 50 years. The book is organized into more or less traditional topics: single- and multi-document summarization, domain-specific summarization, and multilingual and cross-lingual summarization. Systems, algorithms and methods are explained in detail, often with illustrations and an assessment of their performance and limitations. Torres-Moreno pays particular attention to intrinsic summarization evaluation metrics based on vocabulary comparison and to international evaluation programs in text summarization such as the Document Understanding Conference and the Text Analysis Conference. Part of the book is dedicated to text abstracting, the ultimate goal of text summarization research, which consists of producing summaries that are not a mere copy of sentences and words from the input text.

    While various books exist on this subject, Torres-Moreno's covers interesting systems and research ideas rarely cited in the literature. This book is a very valuable source of information, offering in my view the most complete account of automatic text summarization research to date. Because of its detailed content, clarity of exposition and inquisitive style, this work will become a very valuable resource for teachers and researchers alike. I hope readers will learn from this book and enjoy it as much as I have.

    Horacio Saggion

    Research Professor at Universitat Pompeu Fabra

    Barcelona, August 2014

    Notation

    The main notations used in this book are the following:

    Introduction

    "Tormented by the cursed ambition always to put a whole book in a page, a whole page in a sentence, and this sentence in a word. I am speaking of myself."¹ Joseph Joubert (1754–1824), Pensées, essais et maximes.

    Gallica http://www.bnf.fr

    The need to summarize texts

    Textual information in the form of digital documents quickly accumulates into huge amounts of data. Most of these documents are unstructured: they consist of unrestricted text that has not been organized into traditional databases. Processing such documents is therefore often perfunctory, mostly due to the lack of standards, and consequently it has become extremely difficult to implement automatic text analysis tasks. Automatic text summarization (ATS), by condensing the text while maintaining the relevant information, can help to process this ever-increasing, difficult-to-handle mass of information.

    Summaries are the most obvious way of reducing the length of a document. In books, abstracts and tables of contents are different ways of representing a condensed form of the document. But what exactly is a text summary? The literature provides several definitions. One definition states that the summary of a document is a reduced, though precise, representation of the text which seeks to render the exact idea of its contents. Its principal objective is to give information about and provide privileged access to the source documents. Summarization is automatic when it is generated by software or an algorithm. ATS is a process of compression with loss of information, unlike conventional text compression methods and software, such as those of the gzip family². Information which has been discarded during the summarization process is not considered representative or relevant. In fact, determining the relevance of the information included in documents is one of the major challenges of automatic summarization.
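    The contrast with lossless compression can be made concrete with a small example. In the sketch below, which is purely illustrative, gzip (Python's standard module) restores the original text exactly, whereas a crude one-sentence extract, standing in for a summary, discards information that cannot be recovered.

        import gzip

        text = ("Automatic text summarization condenses documents. "
                "It keeps relevant information and discards the rest. "
                "Discarded content cannot be recovered from the summary.")

        # Lossless compression: decompressing returns the original text exactly.
        compressed = gzip.compress(text.encode("utf-8"))
        assert gzip.decompress(compressed).decode("utf-8") == text

        # Lossy reduction: a crude one-sentence extract loses information for good.
        summary = text.split(". ")[0] + "."
        assert len(summary) < len(text)
        assert summary != text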

    The summarization process

    For human beings, summarizing documents to generate an adequate abstract is a cognitive process which requires that the text be understood. Yet, after an interval of several weeks, the same person may write very different summaries of the same document. This demonstrates, in part, the difficulty of automating the task. Generating a summary requires considerable cognitive effort from the summarizer (whether a human being or an artificial system): different fragments of a text must be selected, reformulated and assembled according to their relevance, and the coherence of the information included in the summary must also be taken into account. In any case, there is a general consensus that the process of summarizing documents is, for humans, a difficult cognitive task.

    Fortunately, automatic summarization is an application requiring only an extremely limited understanding of the text. Current ATS systems therefore set out to replicate the results of the abstracting process, not the process itself, of which we still have a limited understanding. Although great progress has been made in automatic summarization in recent years, a great deal remains to be achieved.

    From the user's perspective, people are not always looking for the same type of summary. There is also another type of user: automatic systems which use the results of a summarization system as the foundation for other tasks. Many different types and sources of documents exist (textual and/or multimedia), such as legal, literary, scientific and technical documents, e-mails, tweets, videos, audio and images. As a result, there is no such thing as a single type of summary. Sources and user expectations have prompted many applications to be created. Even for text documents, a large number of automatic summarization applications exist (for people or for machines):

    – generic summarization;

    – multi-document summarization;

    – specialized document summarization: biomedical, legal texts, etc.;

    – web page summarization;

    – meeting, report, etc., summarization;

    – biographical extracts;

    – e-mail and e-mail thread summarization;

    – news, rich site summary (RSS) and blog summarization;

    – automatic extraction of titles;

    – tweet summarization;

    – opinion summarization;

    – improving the performance of information retrieval systems, and so on.

    Automatic text summarization

    ATS became a discipline in 1958, following H.P. Luhn's research into scientific text summarization. Two or three important works [EDM 61, EDM 69, RUS 71] were completed before 1978, but they were followed by some 20 years of silence. In the early 1990s, however, the works of K. Spärck Jones and J. Kupiec changed this landscape. Currently, ATS is the subject of intensive research in several fields, including natural language processing (NLP) and other related areas.

    ATS has benefited from the expertise of a range of fields of research: information retrieval and information extraction, natural language generation, discourse studies, machine learning and the techniques used by professional summarizers. Answers have been found to several questions concerning ATS, but many more remain unsolved. Indeed, it appears that 50 years have not sufficed to resolve all the issues concerning ATS. For instance, although generating a summary is a difficult task in itself, evaluating the quality of a summary is another matter altogether. How can we objectively determine that the summary of one text is better than another? Does a perfect summary exist for each document? What objective criteria should be used to evaluate the content and form of summaries? The community has yet to find answers to these questions.

    About this book

    Since 1971, roughly ten books have been published about document summarization, half of which are concerned with automatic summarization. This book is aimed at people who are interested in automatic summarization algorithms: researchers, undergraduate and postgraduate students in NLP, PhD students, engineers, linguists, computer scientists, mathematicians and specialists in the digital humanities. Far from being exhaustive, this book aims to provide an introduction to ATS. It offers an overview of ATS theories and techniques, so that readers can deepen their knowledge of the subject.

    The book is divided into two parts, consisting of four chapters each.

    – I) Foundations:

    - Chapter 1. Why Summarize Texts?

    - Chapter 2. Automatic Text Summarization

    - Chapter 3. Single-Document Summarization

    - Chapter 4. Guided Multi-Document Summarization

    – II) Emerging Systems:

    - Chapter 5. Multi- and Cross-Lingual Summarization

    - Chapter 6. Source and Domain-Specific Summarization

    - Chapter 7. Text Abstracting

    - Chapter 8. Evaluating Document Summaries

    The conclusion and two appendices complete this book. The first appendix deals with NLP and information retrieval (IR) techniques that are useful for a better understanding of the rest of the book: text preprocessing, the vector space model and relevance measures. The second appendix lists several resources for ATS: software, evaluation systems and scientific conferences. A website providing readers with examples, software and resources accompanies this book: http://ats.talne.eu.

    This book is first and foremost a pragmatic look at what is eminently an applied science. A coherent overview of the field will be given, though chronology will not always be respected.

    Juan-Manuel TORRES-MORENO

    Laboratoire Informatique d’Avignon

    Université d’Avignon et des Pays de Vaucluse

    France, August 2014

    1. In the original French: S'il est un homme tourmenté par la maudite ambition de mettre tout un livre dans une page, toute une page dans une phrase, et cette phrase dans un mot, c'est moi.

    2. For more information, see http://www.gzip.org/.

    PART 1

    Foundations

    1

    Why Summarize Texts?

    In the 1780s, Joseph Joubert¹ was already tormented by his ambition to summarize texts and condense sentences. Though he did not know it, he was a visionary of the field of automatic text summarization, which was born some two centuries later with the arrival of the Internet and the subsequent surge in the number of documents. Despite this surge, the number of documents which have been annotated (with Standard Generalized Markup Language (SGML), Extensible Markup Language (XML) or their dialects) remains small compared to the number of unstructured text documents, and this huge volume of unstructured text keeps accumulating. As a result, text documents are often analyzed in a perfunctory and very superficial way. In addition, different types of documents, such as administrative notes, technical reports, medical documents and legal and scientific texts, follow very different writing standards. Automatic text analysis and text mining tasks² [BER 04, FEL 07, MIN 02], such as exploration, information extraction (IE), categorization and classification, among others, are therefore becoming increasingly difficult to implement [MAN 99b].

    1.1. The need for automatic summarization

    The expression "too much information kills information" is as relevant today as it has ever been. The fact that the Internet exists in multiple languages does nothing but increase the aforementioned difficulties of document analysis. Automatic text summarization helps us to efficiently process the ever-growing volume of information, which humans are simply incapable of handling on their own. To be efficient, it is essential that the storage of documents is linked to their distribution. In fact, providing summaries alongside source documents is an interesting idea: summaries would become an exclusive way of accessing the content of the source document [MIN 01]. Unfortunately, however, this is not always possible.

    Summaries written by the authors of online documents are not always available: either they do not exist or they have been written by somebody else. In fact, summaries can be written by the document's author, by professional summarizers³ or by a third party. Minel et al. [MIN 01] have questioned why we are not happy with summaries written by professional summarizers. According to the authors, there are a number of reasons: "[…] because the cost of production of a summary by a professional is very high. […] Finally, the reliability of this kind of summary is very controversial." Knowing how to write documents does not always equate with knowing how to write correct summaries. This is even more true when the source document(s) relate to a specialized domain.

    Why summarize texts? There are several valid reasons in favor of the – automatic – summarization of documents. Here are just a few [ARC 13]:

    1) Summaries reduce reading time.

    2) When researching documents, summaries make the selection process easier.

    3) Automatic summarization improves the effectiveness of indexing.

    4) Automatic summarization algorithms are less biased than human summarizers.

    5) Personalized summaries are useful in question-answering systems as they provide personalized information.

    6) Using automatic or semi-automatic summarization systems enables commercial abstract services to increase the number of texts they are able to process.

    In addition to the above, the American National Standards Institute⁴ (ANSI) [ANS 79] states that a well-prepared abstract enables readers to identify the basic content of a document quickly and accurately, to determine its relevance to their interests, and thus to decide whether they need to read the document in its entirety. Indeed, the 2002 SUMMAC report supports this assertion, demonstrating that summaries as short as 17% of the full text length sped up decision-making by almost a factor of two, with no statistically significant degradation in accuracy [MAN 02].

    1.2. Definitions of text summarization

    The literature provides various definitions of text summarization. In 1979, the ANSI provided a concise definition [ANS 79]:

    DEFINITION 1.1.– [An abstract] is an abbreviated, accurate representation of the contents of a document, preferably prepared by its author(s) for publication with it. Such abstracts are useful in access publications and machine-readable databases.

    According to van Dijk [DIJ 80]:

    DEFINITION 1.2.– The primary function of abstracts is to indicate and predict the structure and content of the text.

    According to Cleveland [CLE 83]:

    DEFINITION 1.3.– An abstract summarizes the essential contents of a particular knowledge record, and it is a true surrogate of the document.

    Nevertheless, it is important to understand that these definitions describe summaries produced by people. Definitions of automatic summarization are considerably less ambitious. For instance, automatic text summarization is defined in the Oxford English Dictionary⁵ as:

    DEFINITION 1.4.– The creation of a shortened version
