
Multimodal Behavior Analysis in the Wild: Advances and Challenges

Ebook, 941 pages (9 hours)


About this ebook

Multimodal Behavior Analysis in the Wild: Advances and Challenges presents the state-of-the-art in behavioral signal processing using different data modalities, with a special focus on identifying the strengths and limitations of current technologies. The book focuses on audio and video modalities, while also emphasizing emerging modalities, such as accelerometer or proximity data. It covers tasks at different levels of complexity, from low level (speaker detection, sensorimotor links, source separation), through middle level (conversational group detection, addresser and addressee identification), to high level (personality and emotion recognition), providing insights on how to exploit inter-level and intra-level links.

This is a valuable resource on the state-of-the-art and future research challenges of multimodal behavioral analysis in the wild. It is suitable for researchers and graduate students in the fields of computer vision, audio processing, pattern recognition, machine learning and social signal processing.

  • Gives a comprehensive collection of information on the state-of-the-art, limitations, and challenges associated with extracting behavioral cues from real-world scenarios
  • Presents numerous applications on how different behavioral cues have been successfully extracted from different data sources
  • Provides a wide variety of methodologies used to extract behavioral cues from multi-modal data
Language: English
Release date: Nov 13, 2018
ISBN: 9780128146026


    Book preview

    Multimodal Behavior Analysis in the Wild - Xavier Alameda-Pineda


    Multimodal behavior analysis in the wild: An introduction

    Xavier Alameda-Pineda⁎; Elisa Ricci†,‡; Nicu Sebe†

    ⁎Inria Grenoble Rhône-Alpes, Perception Team, France

    †University of Trento, Department of Information Engineering and Computer Science, Italy

    ‡Fondazione Bruno Kessler, Technology of Vision, Italy

    Abstract

    The nature of human behavior is complex and multifaceted. Behavioral expressions vary significantly across individuals and are influenced by many factors. People act differently according to their physical and mental state, to their age, gender and socio-cultural background, to the nature of the tasks they are engaged in, to the environment where they operate, to the behavior of other individuals, etc. All these factors make the automatic analysis of human behavior an extremely challenging problem.

    Despite its complexity, human behavior understanding has attracted considerable attention due to its many applications, e.g., in health care, conflict and people management, sociology, marketing and surveillance. In the last decades many researchers have invested effort into developing computational approaches that enable one to automatically describe the behavior of individuals and groups. Generally speaking, the extraction of behavioral information involves methods operating at different levels of granularity, from low level (e.g., people detection, motion estimation) to high level (e.g., emotional states, personality traits), and requires accurate methods operating at each level. This book describes some recent research efforts in the area of human behavior analysis, presenting methodologies for extracting behavioral cues at different levels. Special emphasis is given to recent approaches considering multimodal data to robustly extract behavioral information in real-world settings. Besides covering state-of-the-art research, the book also outlines some open challenges in the field as well as promising future research directions.

    Keywords

    Multimodal data; Human behavior analysis; Realistic conditions

    0.1 Analyzing human behavior in the wild from multimodal data

    Due to its importance in many applications, the automatic analysis of human behavior has been a popular research topic in the last decades. Understanding human behavior is relevant in many fields, such as assistive robotics, human–computer interaction, surveillance and security, to cite only a few.

    The automatic extraction of behavioral cues is an extremely challenging task involving several disciplines, ranging from machine learning and signal processing to computer vision and social psychology. Thanks to the recent progress in the area of Artificial Intelligence and deep learning, significant advances have been made in the last few years in the development of systems for human behavior analysis. For instance, technologies for speech recognition and machine translation have significantly improved and they are now able to work in a wide range of real-world settings. Similarly, several advances have been made in the robotics field, witnessed by the advent on the market of robots which accurately recognize and mimic human emotions. More surprisingly, in recent years technologies have appeared which are able to interpret people's behaviors even more precisely than human observers. For instance, computer vision researchers have developed systems which can estimate physiological signals (e.g. heart and respiration rate) by analyzing subtle skin color variations in face videos [18], or which can track the position of a moving person behind a wall from the shadow arising on the ground at the base of the wall's edge [6]. Despite this progress, many current technologies for human behavior analysis still have limited applicability and are not robust enough to operate in arbitrary conditions and real-world settings. In other words, the path towards automatically understanding human behaviors ‘in the wild’ is still to be discovered.

    It is a well-known fact that the automatic analysis of human behavior can benefit from harnessing multiple modalities. While earlier work on behavior understanding focused on an unimodal setting, typically considering only visual or audio data, more recent approaches leverage multimodal information. Investigating methods to process multimodal data is of utmost importance, as multiple modalities provide a more complete representation of human behavior. Moreover, data gathered with different sensors can be incomplete or corrupted by noise. In short, unimodal approaches fail to provide a robust and accurate representation of behavioral patterns, and smart multimodal fusion strategies are required. However, from the machine learning point of view, multimodal human behavior analysis is a challenging task, since learning the complex relationships across modalities is non-trivial.
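    As a toy illustration of why fusion helps (a hedged sketch, not a method from this book: the modality names, confidence scores and weighting scheme are invented for the example), confidence-weighted late fusion lets a reliable modality compensate for a noisy or missing one:

```python
# Toy confidence-weighted late fusion of per-modality class scores.
# Modalities, scores and weights are illustrative only.
from typing import Dict, Optional


def late_fusion(predictions: Dict[str, Optional[Dict[str, float]]],
                confidences: Dict[str, float]) -> Dict[str, float]:
    """Combine per-modality class scores; a missing modality (None) is skipped."""
    fused: Dict[str, float] = {}
    total_weight = 0.0
    for modality, scores in predictions.items():
        if scores is None:                       # e.g. occluded camera, muted microphone
            continue
        weight = confidences.get(modality, 0.0)
        total_weight += weight
        for label, score in scores.items():
            fused[label] = fused.get(label, 0.0) + weight * score
    return {label: value / total_weight for label, value in fused.items()} if total_weight else {}


# Video is unreliable here (low confidence), audio is trusted more.
print(late_fusion(
    predictions={"audio": {"speaking": 0.9, "silent": 0.1},
                 "video": {"speaking": 0.4, "silent": 0.6},
                 "accelerometer": None},          # sensor dropped out
    confidences={"audio": 0.8, "video": 0.2},
))
```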

    The analysis of human behavior from multimodal data has been encouraged in the last few years by the emergence on the market of novel devices, such as smartwatches and smartphones. These devices typically include several sensors (e.g. camera, microphone, accelerometer, Bluetooth, etc.), i.e., they are inherently multimodal. Additionally, the diffusion at the consumer level of systems such as drones and low-cost infrared or wearable cameras has opened the possibility of studying human behavior considering other types of data, such as images in an ego-vision setting, 2.5D data, or bird's-eye view videos. These technologies provide complementary information to traditional sensing modalities, such as distributed camera and microphone networks. For instance, when analyzing social scenes, wearable sensing devices can be exploited in association with data from traditional cameras to localize people and estimate their interactions [3]. Similarly, egocentric videos can be used together with images from surveillance cameras for the purpose of automatic activity recognition, or analyzed jointly with audio signals for robust people re-identification [7].

    Besides the widespread diffusion of novel devices, in the last decade the study of human behavior from multimodal data has also been encouraged by the emergence of new methodologies. In particular, research work from social signal processing [20,8] has enabled significant advances in the field. Studies have clearly stressed the importance of non-verbal cues such as gestures, gaze, and emotional patterns in human communication, and the need for methodologies that infer these cues by processing multimodal data. Similar studies have demonstrated the benefit of fusing auditory and visual data for lower-level tasks [4,5]. Furthermore, social signal processing has helped improve technologies for the automatic analysis of human behavior thanks to the integration of concepts from social psychology into machine learning algorithms (e.g. the concepts of proxemics and kinesics are now used in many approaches for automatically detecting groups in social scenes [11,19]).

    Significant advances in the field of understanding human behavior have also been achieved thanks to the (re-)discovery of deep neural networks. Deep learning has significantly improved the accuracy of many systems for extracting behavioral cues under real-world conditions. For instance, in computer vision deep models have been successfully applied to the tasks of activity recognition [22], gaze estimation [16], group analysis, etc. Some of the deep learning-based technologies described in this book have been deployed in real-world settings (e.g. the audio-visual systems described in Chapter 8 have been used by museum visitors). In addition, several research studies have proposed deep learning-based strategies for fusing multimodal data, outperforming previous approaches based on traditional machine learning models.

    The fast and broad progress in Artificial Intelligence has not only enabled great advances in the analysis of human behavior but has also opened new possibilities for generating realistic human-like behavioral data [23,14,17]. Notable examples relate to the synthesis of realistic-looking images of people, to the generation of human-sounding speech, as well as to the design of robots that emulate human emotions. Furthermore, the successes achieved with deep learning have also encouraged the research community to address new challenges. For instance, several recent studies have tackled the problem of activity forecasting and behavior prediction [1,21]. Other work has focused on rethinking the action–perception loop and devising end-to-end trainable architectures that directly predict actions from raw data [24,15]. In the area of human behavior analysis these studies are extremely relevant and could ultimately lead, for instance, to unified models for jointly analyzing human social behaviors and controlling intelligent vehicles and social robots.

    While the progress in the study of human behaviors has been considerable, recent work has also pointed out many limitations of current methodologies and systems. For instance, the adoption of deep learning in several applications has highlighted the need for large-scale datasets. Indeed, data availability can be a very limiting issue depending on the application at hand, for different reasons such as labeling cost, privacy, and synchronization problems. The research community is addressing this problem, and several datasets have been made available in the last few years for studying human behaviors in the wild. Notable attempts are, for instance, the efforts made by researchers involved in the Chalearn initiative [13] or in other dataset collection campaigns [10,12,9,2]. Besides the issues with data, several open challenges involve the design of algorithms for inferring behavioral cues. In particular, understanding human behavior requires approaches which operate at different levels of granularity and which are able to infer both low-level cues (e.g. people position or pose) and high-level information (e.g. group dynamics and social interactions, emotional patterns). However, devising methods which deal with tasks at different levels of granularity in isolation is largely suboptimal. Future research efforts should be devoted to addressing the problem of human behavior analysis in a more holistic manner.

    0.2 Scope of the book

    The main objective of this book is to present an overview of recent advances in the area of behavioral signal processing. A special focus is given to describing the strengths and weaknesses of current methods and technologies which (i) analyze human behaviors by exploiting different data modalities and (ii) are deployed in real-world scenarios. In other words, the two prominent characteristics of the book are the multimodality and the in the wild perspective. Regarding multimodality, the book presents state-of-the-art human behavior understanding methods which exploit information coming from different sensors. Audio and video being the most popular modalities used for analyzing human behavior and activities, they have a privileged role in the manuscript. However, the book also covers methodologies and applications where emerging modalities such as accelerometer or proximity data are exploited for behavior understanding. Regarding the in the wild aspect, the book aims to describe the current usage, limitations and challenges of systems combining multimodal data, signal processing and machine learning for the understanding of behavioral cues in real-world scenarios. Importantly, the book covers tasks at different levels of complexity, from low level (speaker detection, sensorimotor links, source separation), through middle level (conversational group detection, activity recognition) to high level (affect and emotion recognition).

    This book is intended to be a resource for experts and practitioners interested in the state of the art and future research challenges of multimodal behavioral analysis in the wild. It is suitable for researchers and graduate students in the fields of computer vision, audio processing, pattern recognition, multimedia analysis, machine learning, robotics, and social signal processing. The chapters of the book are organized according to three main directions, corresponding to three different application domains as illustrated in Fig. 0.1.

    Figure 0.1 Overview of the structure of the book chapters.

    The first series of chapters mostly deals with the problem of behavior understanding and multimodal fusion in the context of robotics and Human–Robot Interaction (HRI). In particular, Chapter 1 focuses on the development of dialog systems for robotic platforms and addresses two important challenges: how to move from closed-domain to open-domain dialogues, and how to create multimodal (audio-visual) dialog systems. The authors describe an approach to jointly tackle these two problems by proposing a Constructive Dialog Model and show how they handle topic shifts using Wikipedia as an external resource. Chapter 2 describes a robust methodology for audio-motor integration applied to robot hearing. In robotics, audio signal processing in the wild amounts to dealing with sounds recorded by a system that moves and whose actuators produce noise. This creates additional challenges in sound source localization, signal enhancement and recognition. But the specificity of such platforms also brings interesting opportunities: can information about the robot actuators' states be meaningfully integrated into the audio processing pipeline to improve performance and efficiency? While robot audition has grown into an established field, methods that explicitly use motor-state information as a complementary modality to audio are still scarce. This chapter proposes a unified view of this endeavor, referred to as audio-motor integration. A literature review and two learning-based methods for audio-motor integration in robot audition are presented, with application to single-microphone sound source localization and ego-noise reduction on real data. Chapter 3 reviews the literature related to multichannel audio source separation in real-life environments. The authors explore some of the major achievements in the field and discuss some of the remaining challenges. Several important issues, e.g. moving sources and/or microphones, varying numbers of sources and sensors, high reverberation levels, spatially diffuse sources, and synchronization, are extensively discussed. Many application scenarios, such as smart assistants, cellular phones, hearing aids and robots, are presented together with the most prominent associated methodologies. The chapter concludes with open challenges and promising future directions on the topic.

    A second series of chapters describes methodologies for fusing multimodal data collected with wearable technologies. In particular, Chapter 4 describes the development of novel wearable glasses aimed at assisting users with limited technology skills and disabilities. The glasses process audio-visual data and are equipped with technologies for visual object recognition to support users with low vision, as well as with algorithms for enhancing speech signals for people with hearing impairments. The chapter further illustrates the results of a user study conducted with people with disabilities in real-world settings. Chapter 5 also focuses on analyzing audio-visual data from wearable sensors and describes an approach for person re-identification where information from audio signals is exploited to complement image streams under challenging conditions (e.g., rapid changes in camera pose, self-occlusions, motion blur, etc.). Similarly, Chapter 6 considers video streams collected with wearable cameras. The authors address the problem of recognizing activities from visual lifelogs and, after outlining the main challenges of this task, they perform a detailed review of state-of-the-art methods and show the results of an extensive experimental comparison. An interesting application of visual lifelogs is described in Chapter 7. The authors present an approach to automatically analyzing images collected from wearable cameras in order to extract a nonredundant set of frames useful for memory stimulation in patients with neurodegenerative diseases (Alzheimer's disease, mild cognitive impairment, etc.). Chapters 8 and 9 describe how wearable technologies, alone or in combination with more traditional static and distributed sensors, can be used to analyze visitor behavior in museums. In particular, these chapters address the challenges of interpreting raw multimodal data for the purpose of visitor tracking and improving the tourist experience.

    Chapters 10–15 mostly describe recent methodologies and open problems in the analysis of social scenes. Specifically, Chapters 10 and 11 present approaches which exploit data from wearable sensors for the purpose of understanding social interactions. In particular, Chapter 10 addresses the problem of discovering conversational groups (more precisely, F-formations) in egocentric videos depicting social gatherings and presents an algorithm based on Structural SVM. Chapter 11 also focuses on the challenges of analyzing conversational scenes and, in particular, illustrates the limitations of the datasets currently publicly available for the automated analysis of human social behavior. The authors also describe the conceptual and practical issues inherent to the data collection process, with a specific focus on the multimodality and the ‘in the wild’ perspective. The problem of analyzing social interactions and detecting conversational groups is also addressed in Chapter 12. In particular, a methodology for recognizing F-formations derived from game theory is described but, differently from Chapter 10, the approach is tested on data from static surveillance cameras. Chapter 13 also addresses the problem of analyzing social interactions. The authors point out that understanding nonverbal behavioral cues (e.g., facial expressions, gaze, gestures, etc.) is important both from a human science perspective, as it helps to understand how people work, and from a technological point of view, because it allows one to design systems that can make sense of social and psychological phenomena. Chapter 14 focuses on crowd analysis, in contrast to the work presented in the previous chapters, which deals with social scenes involving a small number of people. The chapter describes the challenges of understanding crowd behaviors in realistic settings and provides an overview of state-of-the-art approaches for analyzing visual data and detecting motion patterns, tracking people, recognizing activities and spotting anomalous behaviors. A methodological contribution is presented in Chapter 15, where the problem of learning robust and invariant representations in visual recognition systems is considered. This issue is of utmost importance when deploying systems operating in the wild.

    The last series of chapters describes methodologies for detecting affective and emotional patterns in real-world settings. In particular, Chapter 16 focuses on the problem of visual affect recognition. The chapter addresses the challenges of bridging the ‘affective gap’ between visual features and semantic concepts. Following the Adjective Noun Pair (ANP) paradigm, i.e. considering mid-level representations based on adjective–noun pairs such as ‘scary face’ or ‘beautiful women’, the authors present an approach for sentiment and emotion prediction which operates by embedding ANP constructs in a latent space.

    Chapter 17 addresses the problem of video-based emotion recognition ‘in the wild’ and describes an approach for fusing audio-visual data. The method uses summarizing functionals of complementary visual descriptors in conjunction with audio features. The audio and visual data are fused within a least squares classifier framework. The authors report state-of-the-art results on the EmotiW Challenge. Chapter 18 also considers the problem of emotion recognition from audiovisual signals in real-world environments. This chapter highlights the differences between affect recognition in real-world and laboratory settings, provides an overview of state-of-the-art methodologies, and illustrates an audio-visual continuous emotion recognition system based on deep learning. Similarly, Chapter 19 focuses on affective facial computing with a special emphasis on the ‘in the wild’ aspect. Specifically, the authors consider the generalizability of facial computing technologies across various domains and provide a review of several previous studies on the topic. The outcome of their study is that the ability of current systems to generalize across domains is limited and that, in this context, transfer learning and domain adaptation methodologies are a precious resource. Finally, Chapter 20 discusses the problem of emotion recognition from behavioral data, with special emphasis on distinguishing between self-reported and perceived emotions. The authors further analyze how this aspect influences the design of systems for emotion recognition and outline recent advances and challenges in this research topic.

    0.3 Summary of important points

    This book aims to describe recent works in the area of human behavior analysis, with special emphasis on studies considering multimodal data. Besides providing an overview on state-of-the-art research in the field, the book highlights the main challenges associated with the automatic analysis of human behaviors in real-world scenarios, discussing the limitations of existing technologies. The chapters of the book describe a large variety of methodologies to extract behavioral cues from multimodal data and consider different applications. This clearly demonstrates that the topic addressed by the book can be of interest for a large set of researchers and graduate students working in different fields.

    References

    [1] Alexandre Alahi, Vignesh Ramanathan, Kratarth Goel, Alexandre Robicquet, Amir A. Sadeghian, Li Fei-Fei, Silvio Savarese, Learning to predict human behavior in crowded scenes, Group and Crowd Behavior for Computer Vision. 2017:183–207.

    [2] Xavier Alameda-Pineda, Jacopo Staiano, Ramanathan Subramanian, Ligia Batrinca, Elisa Ricci, Bruno Lepri, Oswald Lanz, Nicu Sebe, Salsa: a novel dataset for multimodal group behavior analysis, IEEE Trans. Pattern Anal. Mach. Intell. 2016;38(8):1707–1720.

    [3] Xavier Alameda-Pineda, Yan Yan, Elisa Ricci, Oswald Lanz, Nicu Sebe, Analyzing free-standing conversational groups: a multimodal approach, ACM International Conference on Multimedia. 2015:5–14.

    [4] Yutong Ban, Laurent Girin, Xavier Alameda-Pineda, Radu Horaud, Exploiting the complementarity of audio and visual data in multi-speaker tracking, IEEE/CVF ICCV Workshop on Computer Vision for Audio-Visual Media. 2017.

    [5] Yutong Ban, Xiaofei Li, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud, Accounting for room acoustics in audio-visual multi-speaker tracking, IEEE International Conference on Acoustic, Speech and Signal Processing. 2018.

    [6] Katherine L. Bouman, Vickie Ye, Adam B. Yedidia, Frédo Durand, Gregory W. Wornell, Antonio Torralba, William T. Freeman, Turning corners into cameras: principles and methods, International Conference on Computer Vision. 2017.

    [7] Alessio Brutti, Andrea Cavallaro, Online cross-modal adaptation for audio-visual person identification with wearable cameras, IEEE Trans. Human-Mach. Syst. 2017;47(1):40–51.

    [8] Judee K. Burgoon, Nadia Magnenat-Thalmann, Maja Pantic, Alessandro Vinciarelli, Social Signal Processing. Cambridge University Press; 2017.

    [9] Laura Cabrera-Quiros, Andrew Demetriou, Ekin Gedik, Leander van der Meij, Hayley Hung, The MatchNMingle dataset: a novel multi-sensor resource for the analysis of social interactions and group dynamics in-the-wild during free-standing conversations and speed dates, IEEE Trans. Affect. Comput. 2018 10.1109/TAFFC.2018.2848914.

    [10] Tatjana Chavdarova, Pierre Baqué, Stéphane Bouquet, Andrii Maksai, Cijo Jose, Louis Lettry, Pascal Fua, Luc Van Gool, François Fleuret, The wildtrack multi-camera person dataset, arXiv preprint arXiv:1707.09299; 2017.

    [11] Marco Cristani, R. Raghavendra, Alessio Del Bue, Vittorio Murino, Human behavior analysis in video surveillance: a social signal processing perspective, Neurocomputing 2013;100:86–97.

    [12] Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, et al., Scaling egocentric vision: the epic-kitchens dataset, arXiv preprint arXiv:1804.02748; 2018.

    [13] Sergio Escalera, Xavier Baró, Hugo Jair Escalante, Isabelle Guyon, Chalearn looking at people: a review of events and resources, International Joint Conference on Neural Networks. 2017:1594–1601.

    [14] Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, Alexandre Alahi, Social GAN: socially acceptable trajectories with generative adversarial networks, IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018.

    [15] Guan-Horng Liu, Avinash Siravuru, Sai Prabhakar, Manuela Veloso, George Kantor, Learning end-to-end multimodal sensor policies for autonomous navigation, arXiv preprint arXiv:1705.10422; 2017.

    [16] Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba, Where are they looking? Advances in Neural Information Processing Systems. 2015:199–207.

    [17] Aliaksandr Siarohin, Enver Sangineto, Stéphane Lathuilière, Nicu Sebe, Deformable GANs for pose-based human image generation, IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018.

    [18] Sergey Tulyakov, Xavier Alameda-Pineda, Elisa Ricci, Lijun Yin, Jeffrey F. Cohn, Nicu Sebe, Self-adaptive matrix completion for heart rate estimation from face videos under realistic conditions, IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2016:2396–2404.

    [19] Jagannadan Varadarajan, Ramanathan Subramanian, Samuel Rota Bulò, Narendra Ahuja, Oswald Lanz, Elisa Ricci, Joint estimation of human pose and conversational groups from social scenes, Int. J. Comput. Vis. 2018;126(2–4):410–429.

    [20] Alessandro Vinciarelli, Maja Pantic, Hervé Bourlard, Social signal processing: survey of an emerging domain, Image Vis. Comput. 2009;27(12):1743–1759.

    [21] Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba, Predicting motivations of actions by leveraging text, IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2016:2997–3005.

    [22] Jindong Wang, Yiqiang Chen, Shuji Hao, Xiaohui Peng, Lisha Hu, Deep learning for sensor-based activity recognition: a survey, Pattern Recognit. Lett. 2018.

    [23] Wei Wang, Xavier Alameda-Pineda, Dan Xu, Pascal Fua, Elisa Ricci, Nicu Sebe, Every smile is unique: landmark-guided diverse smile generation, IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018:7083–7092.

    [24] Huazhe Xu, Yang Gao, Fisher Yu, Trevor Darrell, End-to-end learning of driving models from large-scale video datasets, IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2017.

    Chapter 1

    Multimodal open-domain conversations with robotic platforms

    Kristiina Jokinen⁎; Graham Wilcock†

    ⁎AIST/AIRC, Tokyo, Japan

    †CDM Interact, Helsinki, Finland

    Abstract

    The chapter discusses how to move from closed-domain dialogs to open-domain dialogs, and from speech-based dialogs to multimodal dialogs with speech, gestures, and gaze, using robot agents. We briefly describe the Constructive Dialog Model, the foundation for our work. Management of topic shifts is one of the challenges for open-domain dialogs, and we describe how Wikipedia can be used both for handling topic shifts and as an open-domain knowledge source. Multimodal issues are illustrated by our multimodal WikiTalk open-domain robot dialog system. Two future research directions are discussed: the use of domain ontologies in dialog systems and the need to integrate robots with the Internet of Things.

    Keywords

    Multimodal communication; Constructive dialog model; Human–robot interaction; Open-domain dialogs

    Chapter Outline

    1.1  Introduction

    1.1.1  Constructive Dialog Model

    1.2  Open-domain dialogs

    1.2.1  Topic shifts and topic trees

    1.2.2  Dialogs using Wikipedia

    1.3  Multimodal dialogs

    1.3.1  Multimodal WikiTalk for robots

    1.3.2  Multimodal topic modeling

    1.4  Future directions

    1.4.1  Dialogs using domain ontologies

    1.4.2  IoT and an integrated robot architecture

    1.5  Conclusion

    References

    1.1 Introduction

    From a historical point of view, the development of natural language conversational systems has accelerated in recent years due to advances in computational facilities and multimodal dialog modeling, availability of big data, statistical modeling and deep learning, and increased demand in commercial applications. In a somewhat simplified manner, we can say that the capability of conversational systems has improved in roughly 20-year time-spans if seen from the viewpoint of technological advancements: from ELIZA's imitation of human-like properties in the early 1960s, via systems that understand spoken natural language and various multimodal acts in limited domains developed in the 1980s, to interactive systems that are part of everyday environments in the 21st century. Embodied conversational agents, chatbots, Siri, Amazon Alexa, Google Home, etc. are examples of the multitude of interactive systems that aim to provide natural language capabilities for question answering and to search for useful information in the cloud.

    The rapid development of robot technology has had a huge impact on interaction research and, in particular, on developing social robotics, i.e. human–robot applications where the robot can communicate with a user in natural language and is able to observe and understand the user's needs and emotions. This enables a novel type of interaction where the robot is not just a tool to do things, but an agent to communicate with: social robots can interact with human users in natural language, and support companionship and peer-type assistance which feature information-providing as well as chatting and sensitivity to social aspects of interaction. Co-located acting and free observations of the partner are both beneficial and challenging for interaction modeling. Interaction becomes richer and more natural, but also more complicated, since the system must learn the various social signals and construct a shared context for the interaction (cf. [15]).

    Social robotics emphasizes the robot's communication skills besides its autonomous decision-making and its ability to move around in the environment. Social robots show more human-like interaction and try to act in a proactive manner so as to support human interest and activity. Consequently, social robotics has had a huge impact on interaction technology. Communication is simultaneously visual, verbal, and vocal, i.e. humans not only utter words, but use various vocalizations (laughs, coughs), head, gaze, hands, and the whole body to convey messages. In order to understand human behavior and communicative needs, the robot should observe the user's multimodal signals and be able to generate reasonable behavior patterns in interactive situations. The main hypothesis is that the more engaging the interaction is in terms of communicative competence, the better results the interaction produces, whether the task that the user is involved in with the robot is friendly chatting or some more structured task.

    The chapter is structured as follows. The next section describes the Constructive Dialog Model which forms the foundation for our work. Section 1.2 discusses issues in moving from closed-domain dialogs to open-domain dialogs, including how to manage topic shifts and how to use Wikipedia as a knowledge source. Section 1.3 addresses multimodal interaction with the Nao robot by speech, gesturing and face-tracking, and multimodal aspects of topic modeling. Section 1.4 briefly presents two future research directions, the use of domain ontologies in dialog systems and the need to integrate robots with the Internet of Things. Section 1.5 presents conclusions.

    1.1.1 Constructive Dialog Model

    Conversational interactions are cooperative activities through which the interlocutors build common ground (Clark and Schaefer [8]). Cooperation indicates the interlocutors' basic willingness to engage in the conversation, and manifests itself in smooth turn-taking and coherent replies. The agents react to the situation according to their beliefs, intentions and interpretation of the situation, and they use multimodal signals to indicate how the basic enablements of communication are fulfilled.

    In Fig. 1.1, interaction in the Constructive Dialog Model (CDM) [15] is seen as a cycle which starts with the participants being in contact, observing the partner's intent to communicate, interpreting the partner's communicative content, and producing their own reaction to the message in an appropriate manner. Fig. 1.1 shows the communication cycle with the basic enablements of communication, which concern Contact, Perception, Understanding, and Reaction (Allwood [1], Jokinen [15]).

    Figure 1.1 The communication cycle in the Constructive Dialog Model.

    Contact refers to the participants' mutual awareness of their intention to communicate, i.e. being close enough to be able to communicate or having a means to communicate such as a phone or skype if not in a face-to-face situation. Perception relates to the participants' perception of communicative signals as a message with an intent. Understanding concerns the participants' cognitive processes to interpret the message in the given context. Reaction is the speakers' observable behavior which manifests their reaction to the new changed situation in which the agents find themselves.

    In the CDM system architecture in Fig. 1.2, signal detection and signal analysis modules implement Contact and Perception, respectively, for speech, gesture, and gaze recognition, and these components are responsible for interpreting the user awareness. Understanding is implemented by the decision-making and related modules, while Reaction corresponds to the production and coordination of utterances and motoric actions, including internal updates. Together these two are responsible for the system's engagement with the user. The dialog management is based on dialog states (also called mental states) which are representations of the situation the robot is in and the situation it believes the user is in.

    Figure 1.2 An implementation of the CDM model in a system architecture.
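    To make the mapping from enablements to modules concrete, the following schematic sketch (module and field names are hypothetical simplifications, not the actual system of Fig. 1.2) organizes one pass of the cycle as detect, analyze, understand, and react, operating on a shared dialog state:

```python
# Schematic sketch of a CDM-style processing cycle (Contact, Perception,
# Understanding, Reaction). Names and logic are illustrative, not the
# architecture of Fig. 1.2 itself.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DialogState:
    """The robot's representation of the situation it and the user are in."""
    user_present: bool = False
    last_user_utterance: str = ""
    topic: Optional[str] = None
    history: List[str] = field(default_factory=list)


class CDMCycle:
    def run_once(self, sensor_frame: dict, state: DialogState) -> str:
        signals = self.detect(sensor_frame)        # Contact: is someone there to talk to?
        percept = self.analyze(signals)            # Perception: speech, gesture, gaze signals
        intent = self.understand(percept, state)   # Understanding: interpret in context
        return self.react(intent, state)           # Reaction: produce speech/motor actions

    def detect(self, sensor_frame: dict) -> dict:
        return {"face": sensor_frame.get("face"), "audio": sensor_frame.get("audio")}

    def analyze(self, signals: dict) -> dict:
        return {"speech": signals.get("audio") or "",
                "user_present": signals.get("face") is not None}

    def understand(self, percept: dict, state: DialogState) -> str:
        state.user_present = percept["user_present"]
        state.last_user_utterance = percept["speech"]
        return "greet" if state.user_present and not state.history else "continue_topic"

    def react(self, intent: str, state: DialogState) -> str:
        utterance = ("Hello! Shall we talk about something?" if intent == "greet"
                     else f"Shall I continue about {state.topic or 'the current topic'}?")
        state.history.append(utterance)            # internal update of the dialog state
        return utterance


if __name__ == "__main__":
    state = DialogState()
    print(CDMCycle().run_once({"face": "user_1", "audio": "hi robot"}, state))
```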

    Many neurocognitive studies show how activation in the brain is triggered by the mere appearance of a human in the vicinity of a person, while attention is directed to a human face (Levitski et al. [22]). The robot agent obtains information from the environment via its sensors, and the dialog component integrates this information into the system knowledge base through its recognition and decision-making processes. The perception of the partner concerns recognition of the signals as having some communicative meaning: the face belongs to a particular person, the sounds belong to a particular language, and gesturing has communicative content.

    Interpretation of the signals concerns their further processing to form a meaningful semantic representation in the given context. The new information entered into the system will trigger a reaction, i.e. cognitive processes which evaluate the new information with respect to the agent's own goals and the decision-making process which results in carrying out an action that in turn triggers a similar analysis and generation process in the partner. If the speaker is repeatedly exposed to a certain kind of communicative situation and if the speaker's communicative action results in a successful goal achievement, the same action will be used again, to maximize benefits in the future.

    To construct common ground, the interlocutors thus pay attention to signals that indicate the partner's reaction to the conveyed message, their emotional state, and the new information in the partner's speech. Non-verbal signals such as pauses, intonation, nods, smiles, frowns, eye-gaze, gesturing etc. are effectively used to signal the speaker's understanding and emotions (Feldman and Rimé [12]). Studies on embodied conversational agents have widely focused on various aspects of interaction, multimodality, and culturally competent communication (see e.g. André and Pelachaud [4], Jokinen et al. [16,17]). For instance, in human–human interactions, gestures and head movements complement language expressions and enable the interlocutors to coordinate the dialog in a tacit manner. Gesturing is synchronized with speech, and besides the pointing gestures and iconic gestures that refer to objects and describe events, gesturing also provides rhythmic accompaniment to speech (co-gesturing) which contributes to the fluency of expression and construction of shared understanding.

    A related question is the robot's understanding of the relation between language and the physical environment, i.e. the embodiment of linguistic knowledge via action and interaction. The connection between linguistic expressions and objects in the environment is called grounding. In dialog modeling, the term grounding is usually used to refer to the interlocutors' actions that enable them to build mutual understanding of the referential elements. Grounding in interactions can be studied with respect to the notion of affordance (Jokinen [15]): the interlocutors' actions (speech and gesturing) should readily support a natural and smooth way of communication. Interactions between humans and robots as well as between humans and intelligent environments should enable easy recognition of various communicatively important objects, and the different objects must be distinguished from each other so as to be correctly referred to.

    The CDM framework is applied in the WikiTalk open-domain robot dialog system (Jokinen and Wilcock [21]) where both the human and the robot can initiate topics and ask questions on a wide variety of topics. It is also applied in human–robot interaction such as newspaper-reading or story-telling in elder-care and educational activities. It aims at acquiring a good level of knowledge about the user and his/her context and can thus enable an open-domain conversation with the user, presenting useful and interesting information. As the robot can observe user behavior, it can infer a user's emotion and interest levels, and tailor its presentation accordingly.

    1.2 Open-domain dialogs

    In traditional closed-domain dialog systems, such as flight reservation systems, the system asks questions in order to achieve a specified dialog goal. Finite state machines can be used for this kind of closed-domain form-filling dialog: the system asks questions, which are predefined for the specific domain, in order to achieve the dialog goal by filling in the required fields in the form. It is easy to change the information within the domain database, for example about flights and destinations, but it is very difficult to change to a different domain because the questions are specific to the domain.
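    As a minimal illustration of this kind of closed-domain, form-filling behavior (the slot names and prompts below are invented for the example, not taken from a particular system), the dialog manager simply walks through a fixed set of domain-specific questions until the form is complete:

```python
# Minimal sketch of a closed-domain, form-filling dialog manager.
# Slot names and prompts are hypothetical; a real flight-reservation
# system would add validation, confirmation and error-recovery states.

FORM_SLOTS = ["origin", "destination", "date"]

PROMPTS = {
    "origin": "Which city are you flying from?",
    "destination": "Which city are you flying to?",
    "date": "On which date would you like to travel?",
}


def form_filling_dialog() -> dict:
    """Ask domain-specific questions until every slot in the form is filled."""
    form = {slot: None for slot in FORM_SLOTS}
    for slot in FORM_SLOTS:            # fixed, domain-specific question order
        while form[slot] is None:
            answer = input(PROMPTS[slot] + " ").strip()
            if answer:                 # a real system would validate the value here
                form[slot] = answer
    print(f"Booking a flight from {form['origin']} to {form['destination']} "
          f"on {form['date']}.")
    return form


if __name__ == "__main__":
    form_filling_dialog()
```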

    In order to advance from closed-domain to open-domain dialogs, WikiTalk [26] uses Wikipedia as its source of world knowledge. By exploiting ready-made paragraphs and sentences from Wikipedia, the system enables a robot to talk about thousands of different topics (there are 5 million articles in English Wikipedia). WikiTalk is open-domain in the sense that the currently talked-about topic can be any topic that Wikipedia has an article about, and the user can switch topics at any time and as often as desired. In an open-domain system it is extremely important to keep track of the current topic and to have smooth mechanisms for changing to new topics.
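    To make the idea concrete, the following sketch (an illustrative stand-in, not the actual WikiTalk implementation) retrieves the plain-text introduction of an arbitrary article through the public MediaWiki API and splits it into sentences that a robot could read out one at a time:

```python
# Sketch of Wikipedia-backed content retrieval in the spirit of WikiTalk.
# Illustrative only: WikiTalk itself is not implemented this way.
import re
import requests

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_intro(topic: str) -> str:
    """Return the plain-text introduction of the Wikipedia article on `topic`."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,          # introduction section only
        "explaintext": 1,      # strip wiki markup
        "redirects": 1,
        "titles": topic,
        "format": "json",
    }
    pages = requests.get(API_URL, params=params, timeout=10).json()["query"]["pages"]
    return next(iter(pages.values())).get("extract", "")


def sentences_for_speech(text: str):
    """Naively split the extract into sentences the robot can utter one by one."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


if __name__ == "__main__":
    for sentence in sentences_for_speech(fetch_intro("Robotics"))[:3]:
        print(sentence)
```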

    1.2.1 Topic shifts and topic trees

    An important feature that enables an interactive system to manage dialogs in a natural manner is its ability to handle smooth topic shifts, i.e. to be able to provide a relevant continuation in the current dialog state. The underlying problem is that knowing what is relevant depends on the overall organization of knowledge.

    The organization of knowledge into related topics has often been done with the help of topic trees. Originally focus trees were proposed by McCoy and Cheng [23] to trace foci in natural language generation systems. The branches of the tree describe what sort of topic shifts are cognitively easy to process and can be expected to occur in dialogs: random jumps from one branch to another are not very likely to occur, and if they do, they should be appropriately marked. McCoy and Cheng [23] dealt with different types of focusing phenomena by referring to a model of the conceptual structure of the domain of discourse. They introduced the notion of focus tree and argued that the tree structure is more flexible in managing focus shifts than a stack: instead of pushing and popping foci in a particular order into and from the stack, the tree allows traversal of the branches in a different order, and the coherence of the text can be determined on the basis of the distance of the focus nodes in the tree.

    The focus tree is a subgraph of the world knowledge, built in the course of the discourse on the basis of the utterances that have occurred so far. The tree both constrains and enables prediction of what is likely to be talked about next, and thus provides a top-down approach to dialog coherence. The topic (focus) is a means to describe thematically coherent discourse structure, and its use has been mainly supported by arguments regarding anaphora resolution and processing effort. Focus shifting rules are expressed in terms of the types of relationships which occur in the domain. In language generation, they provide information about whether or not a topic shift is easy to process (and, similarly, whether or not the hearer will expect some kind of marker), and in language analysis they help to decide what sort of topic shifts are likely to occur. Jokinen, Tanaka and Yokoo [19] applied the idea of the focus tree to spoken dialog processing. They made the distinction between topical and non-topical informational units, i.e. what the utterance is about vs. what is in the background, and between new and old information in the dialog context.
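    As a toy rendering of this idea (the tree content and the threshold below are invented for the example; this is not the formulation of [23] or [19]), the coherence of a proposed topic shift can be approximated by the distance between the current and the proposed focus nodes in the tree:

```python
# Toy topic-tree sketch: the coherence of a topic shift is approximated by
# the distance between focus nodes in the tree. Tree content is illustrative.

TOPIC_TREE = {                      # child -> parent
    "jazz": "music",
    "opera": "music",
    "music": "arts",
    "painting": "arts",
    "arts": "root",
    "football": "sport",
    "sport": "root",
}


def path_to_root(node: str) -> list:
    path = [node]
    while node in TOPIC_TREE:
        node = TOPIC_TREE[node]
        path.append(node)
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges between two topic nodes (smaller = more coherent shift)."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = next(n for n in pa if n in pb)          # lowest common ancestor
    return pa.index(common) + pb.index(common)


def needs_marker(current: str, proposed: str, threshold: int = 2) -> bool:
    """A distant jump should be explicitly marked ('by the way, ...')."""
    return tree_distance(current, proposed) > threshold


print(tree_distance("jazz", "opera"))      # 2: sibling topics, smooth shift
print(needs_marker("jazz", "football"))    # True: cross-branch jump, mark it
```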

    Grosz, Weinstein and Joshi [14] distinguished between global and local coherence, as well as between global focus and centering, respectively. The former refers to the ways in which larger segments of discourse relate to each other, and accordingly, global focus refers to a set of entities that are relevant to the overall discourse. The latter deals with individual sentences and their combinations into larger discourse segments, and accordingly, centering refers to a more local focusing process which identifies a single entity as the most central one in an individual sentence. Each sentence can thus be associated with a single backward-looking center which encodes the notion of global focusing and a set of forward-looking centers which encodes the notion of centering.
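    As a simplified concrete illustration of these notions (a sketch of the commonly used definition, not the full centering framework of Grosz, Weinstein and Joshi [14]), the backward-looking center of an utterance can be computed as the highest-ranked forward-looking center of the previous utterance that is realized in the current one:

```python
# Simplified illustration of centering: the backward-looking center Cb of an
# utterance is taken to be the highest-ranked entity in the previous
# utterance's forward-looking centers Cf that also appears in the current one.
from typing import List, Optional


def backward_looking_center(prev_cf: List[str],
                            current_entities: List[str]) -> Optional[str]:
    current = set(current_entities)
    for entity in prev_cf:            # prev_cf is ranked by salience (subject first, etc.)
        if entity in current:
            return entity
    return None


# "John saw Mary."   -> Cf = [John, Mary]
# "She waved at him." -> realizes both; Cb = John (highest-ranked of previous Cf)
print(backward_looking_center(["John", "Mary"], ["Mary", "John"]))   # John
```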

    The organization of knowledge has always been one of the big questions, but we can now look for help with this question from the internet. In fact we can assume that world knowledge is somehow stored in the internet and we wish to take advantage of this. Previously, topic trees were hand-coded which is time-consuming and subjective. Automatic clustering programs were also used but were not entirely satisfactory. Our approach to topic trees exploits the organization of domain knowledge in terms of topic types found in the web, and more specifically in Wikipedia.

    We use topic information in predicting the likely content of the next utterance, and thus we are more interested in the topic types that describe the information conveyed by utterances than in the actual topic entity. Consequently, instead of tracing salient entities in the dialog and providing heuristics for different shifts of attention, we seek a formalization of the information structure of utterances in terms of the new information that is exchanged. Wikipedia provides an extensive, freely available, open-domain and constantly growing knowledge source. We therefore use Wikipedia to produce robot contributions in open-domain dialogs.
