OpenCL in Action: How to accelerate graphics and computations
About this ebook
OpenCL in Action is a thorough, hands-on presentation of OpenCL, with an eye toward showing developers how to build high-performance applications of their own. It begins by presenting the core concepts behind OpenCL, including vector computing, parallel programming, and multi-threaded operations, and then guides you step-by-step from simple data structures to complex functions.
About the Technology
Whatever system you have, it probably has more raw processing power than you're using. OpenCL is a high-performance programming language that maximizes computational power by executing on CPUs, graphics processors, and other number-crunching devices. It's perfect for speed-sensitive tasks like vector computing, matrix operations, and graphics acceleration.
About this Book
OpenCL in Action blends the theory of parallel computing with the practical reality of building high-performance applications using OpenCL. It first guides you through the fundamental data structures in an intuitive manner. Then, it explains techniques for high-speed sorting, image processing, matrix operations, and fast Fourier transform. The book concludes with a deep look at the all-important subject of graphics acceleration. Numerous challenging examples give you different ways to experiment with working code.
A background in C or C++ is helpful, but no prior exposure to OpenCL is needed.
Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. All of the book's code is also available for download.
What's Inside
- Learn OpenCL step by step
- Tons of annotated code
- Tested algorithms for maximum performance
Table of Contents
PART 1 FOUNDATIONS OF OPENCL PROGRAMMING
- Introducing OpenCL
- Host programming: fundamental data structures
- Host programming: data transfer and partitioning
- Kernel programming: data types and device memory
- Kernel programming: operators and functions
- Image processing
- Events, profiling, and synchronization
- Development with C++
- Development with Java and Python
- General coding principles
PART 2 CODING PRACTICAL ALGORITHMS IN OPENCL
- Reduction and sorting
- Matrices and QR decomposition
- Sparse matrices
- Signal processing and the fast Fourier transform
PART 3 ACCELERATING OPENGL WITH OPENCL
- Combining OpenCL and OpenGL
- Textures and renderbuffers
OpenCL in Action - Matthew Scarpino
Copyright
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email: orders@manning.com
©2012 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 16 15 14 13 12 11
Brief Table of Contents
Copyright
Brief Table of Contents
Table of Contents
Preface
Acknowledgments
About this Book
1. Foundations of OpenCL programming
Chapter 1. Introducing OpenCL
Chapter 2. Host programming: fundamental data structures
Chapter 3. Host programming: data transfer and partitioning
Chapter 4. Kernel programming: data types and device memory
Chapter 5. Kernel programming: operators and functions
Chapter 6. Image processing
Chapter 7. Events, profiling, and synchronization
Chapter 8. Development with C++
Chapter 9. Development with Java and Python
Chapter 10. General coding principles
2. Coding practical algorithms in OpenCL
Chapter 11. Reduction and sorting
Chapter 12. Matrices and QR decomposition
Chapter 13. Sparse matrices
Chapter 14. Signal processing and the fast Fourier transform
3. Accelerating OpenGL with OpenCL
Chapter 15. Combining OpenCL and OpenGL
Chapter 16. Textures and renderbuffers
Appendix A. Installing and using a software development kit
Appendix B. Real-time rendering with OpenGL
Appendix C. The minimalist GNU for Windows and OpenCL
Appendix D. OpenCL on mobile devices
Index
List of Figures
List of Tables
List of Listings
Table of Contents
Copyright
Brief Table of Contents
Table of Contents
Preface
Acknowledgments
About this Book
1. Foundations of OpenCL programming
Chapter 1. Introducing OpenCL
1.1. The dawn of OpenCL
1.2. Why OpenCL?
1.2.1. Portability
1.2.2. Standardized vector processing
1.2.3. Parallel programming
1.3. Analogy: OpenCL processing and a game of cards
1.4. A first look at an OpenCL application
1.5. The OpenCL standard and extensions
1.6. Frameworks and software development kits (SDKs)
1.7. Summary
Chapter 2. Host programming: fundamental data structures
2.1. Primitive data types
2.2. Accessing platforms
2.2.1. Creating platform structures
2.2.2. Obtaining platform information
2.2.3. Code example: testing platform extensions
2.3. Accessing installed devices
2.3.1. Creating device structures
2.3.2. Obtaining device information
2.3.3. Code example: testing device extensions
2.4. Managing devices with contexts
2.4.1. Creating contexts
2.4.2. Obtaining context information
2.4.3. Contexts and the reference count
2.4.4. Code example: checking a context’s reference count
2.5. Storing device code in programs
2.5.1. Creating programs
2.5.2. Building programs
2.5.3. Obtaining program information
2.5.4. Code example: building a program from multiple source files
2.6. Packaging functions in kernels
2.6.1. Creating kernels
2.6.2. Obtaining kernel information
2.6.3. Code example: obtaining kernel information
2.7. Collecting kernels in a command queue
2.7.1. Creating command queues
2.7.2. Enqueuing kernel execution commands
2.8. Summary
Chapter 3. Host programming: data transfer and partitioning
3.1. Setting kernel arguments
3.2. Buffer objects
3.2.1. Allocating buffer objects
3.2.2. Creating subbuffer objects
3.3. Image objects
3.3.1. Creating image objects
3.3.2. Obtaining information about image objects
3.4. Obtaining information about buffer objects
3.5. Memory object transfer commands
3.5.1. Read/write data transfer
3.5.2. Mapping memory objects
3.5.3. Copying data between memory objects
3.6. Data partitioning
3.6.1. Loops and work-items
3.6.2. Work sizes and offsets
3.6.3. A simple one-dimensional example
3.6.4. Work-groups and compute units
3.7. Summary
Chapter 4. Kernel programming: data types and device memory
4.1. Introducing kernel coding
4.2. Scalar data types
4.2.1. Accessing the double data type
4.2.2. Byte order
4.3. Floating-point computing
4.3.1. The float data type
4.3.2. The double data type
4.3.3. The half data type
4.3.4. Checking IEEE-754 compliance
4.4. Vector data types
4.4.1. Preferred vector widths
4.4.2. Initializing vectors
4.4.3. Reading and modifying vector components
4.4.4. Endianness and memory access
4.5. The OpenCL device model
4.5.1. Device model analogy part 1: math students in school
4.5.2. Device model analogy part 2: work-items in a device
4.5.3. Address spaces in code
4.5.4. Memory alignment
4.6. Local and private kernel arguments
4.6.1. Local arguments
4.6.2. Private arguments
4.7. Summary
Chapter 5. Kernel programming: operators and functions
5.1. Operators
5.2. Work-item and work-group functions
5.2.1. Dimensions and work-items
5.2.2. Work-groups
5.2.3. An example application
5.3. Data transfer operations
5.3.1. Loading and storing data of the same type
5.3.2. Loading vectors from a scalar array
5.3.3. Storing vectors to a scalar array
5.4. Floating-point functions
5.4.1. Arithmetic and rounding functions
5.4.2. Comparison functions
5.4.3. Exponential and logarithmic functions
5.4.4. Trigonometric functions
5.4.5. Miscellaneous floating-point functions
5.5. Integer functions
5.5.1. Adding and subtracting integers
5.5.2. Multiplication
5.5.3. Miscellaneous integer functions
5.6. Shuffle and select functions
5.6.1. Shuffle functions
5.6.2. Select functions
5.7. Vector test functions
5.8. Geometric functions
5.9. Summary
Chapter 6. Image processing
6.1. Image objects and samplers
6.1.1. Image objects on the host: cl_mem
6.1.2. Samplers on the host: cl_sampler
6.1.3. Image objects on the device: image2d_t and image3d_t
6.1.4. Samplers on the device: sampler_t
6.2. Image processing functions
6.2.1. Image read functions
6.2.2. Image write functions
6.2.3. Image information functions
6.2.4. A simple example
6.3. Image scaling and interpolation
6.3.1. Nearest-neighbor interpolation
6.3.2. Bilinear interpolation
6.3.3. Image enlargement in OpenCL
6.4. Summary
Chapter 7. Events, profiling, and synchronization
7.1. Host notification events
7.1.1. Associating an event with a command
7.1.2. Associating an event with a callback function
7.1.3. A host notification example
7.2. Command synchronization events
7.2.1. Wait lists and command events
7.2.2. Wait lists and user events
7.2.3. Additional command synchronization functions
7.2.4. Obtaining data associated with events
7.3. Profiling events
7.3.1. Configuring command profiling
7.3.2. Profiling data transfer
7.3.3. Profiling data partitioning
7.4. Work-item synchronization
7.4.1. Barriers and fences
7.4.2. Atomic operations
7.4.3. Atomic commands and mutexes
7.4.4. Asynchronous data transfer
7.5. Summary
Chapter 8. Development with C++
8.1. Preliminary concerns
8.1.1. Vectors and strings
8.1.2. Exceptions
8.2. Creating kernels
8.2.1. Platforms, devices, and contexts
8.2.2. Programs and kernels
8.3. Kernel arguments and memory objects
8.3.1. Memory objects
8.3.2. General data arguments
8.3.3. Local space arguments
8.4. Command queues
8.4.1. Creating CommandQueue objects
8.4.2. Enqueuing kernel-execution commands
8.4.3. Read/write commands
8.4.4. Memory mapping and copy commands
8.5. Event processing
8.5.1. Host notification
8.5.2. Command synchronization
8.5.3. Profiling events
8.5.4. Additional event functions
8.6. Summary
Chapter 9. Development with Java and Python
9.1. Aparapi
9.1.1. Aparapi installation
9.1.2. The Kernel class
9.1.3. Work-items and work-groups
9.2. JavaCL
9.2.1. JavaCL installation
9.2.2. Overview of JavaCL development
9.2.3. Creating kernels with JavaCL
9.2.4. Setting arguments and enqueuing commands
9.3. PyOpenCL
9.3.1. PyOpenCL installation and licensing
9.3.2. Overview of PyOpenCL development
9.3.3. Creating kernels with PyOpenCL
9.3.4. Setting arguments and executing kernels
9.4. Summary
Chapter 10. General coding principles
10.1. Global size and local size
10.1.1. Finding the maximum work-group size
10.1.2. Testing kernels and devices
10.2. Numerical reduction
10.2.1. OpenCL reduction
10.2.2. Improving reduction speed with vectors
10.3. Synchronizing work-groups
10.4. Ten tips for high-performance kernels
10.5. Summary
2. Coding practical algorithms in OpenCL
Chapter 11. Reduction and sorting
11.1. MapReduce
11.1.1. Introduction to MapReduce
11.1.2. MapReduce and OpenCL
11.1.3. MapReduce example: searching for text
11.2. The bitonic sort
11.2.1. Understanding the bitonic sort
11.2.2. Implementing the bitonic sort in OpenCL
11.3. The radix sort
11.3.1. Understanding the radix sort
11.3.2. Implementing the radix sort with vectors
11.4. Summary
Chapter 12. Matrices and QR decomposition
12.1. Matrix transposition
12.1.1. Introduction to matrices
12.1.2. Theory and implementation of matrix transposition
12.2. Matrix multiplication
12.2.1. The theory of matrix multiplication
12.2.2. Implementing matrix multiplication in OpenCL
12.3. The Householder transformation
12.3.1. Vector projection
12.3.2. Vector reflection
12.3.3. Outer products and Householder matrices
12.3.4. Vector reflection in OpenCL
12.4. The QR decomposition
12.4.1. Finding the Householder vectors and R
12.4.2. Finding the Householder matrices and Q
12.4.3. Implementing QR decomposition in OpenCL
12.5. Summary
Chapter 13. Sparse matrices
13.1. Differential equations and sparse matrices
13.2. Sparse matrix storage and the Harwell-Boeing collection
13.2.1. Introducing the Harwell-Boeing collection
13.2.2. Accessing data in Matrix Market files
13.3. The method of steepest descent
13.3.1. Positive-definite matrices
13.3.2. Theory of the method of steepest descent
13.3.3. Implementing SD in OpenCL
13.4. The conjugate gradient method
13.4.1. Orthogonalization and conjugacy
13.4.2. The conjugate gradient method
13.5. Summary
Chapter 14. Signal processing and the fast Fourier transform
14.1. Introducing frequency analysis
14.2. The discrete Fourier transform
14.2.1. Theory behind the DFT
14.2.2. OpenCL and the DFT
14.3. The fast Fourier transform
14.3.1. Three properties of the DFT
14.3.2. Constructing the fast Fourier transform
14.3.3. Implementing the FFT with OpenCL
14.4. Summary
3. Accelerating OpenGL with OpenCL
Chapter 15. Combining OpenCL and OpenGL
15.1. Sharing data between OpenGL and OpenCL
15.1.1. Creating the OpenCL context
15.1.2. Sharing data between OpenGL and OpenCL
15.1.3. Synchronizing access to shared data
15.2. Obtaining information
15.2.1. Obtaining OpenGL object and texture information
15.2.2. Obtaining information about the OpenGL context
15.3. Basic interoperability example
15.3.1. Initializing OpenGL operation
15.3.2. Initializing OpenCL operation
15.3.3. Creating data objects
15.3.4. Executing the kernel
15.3.5. Rendering graphics
15.4. Interoperability and animation
15.4.1. Specifying vertex data
15.4.2. Animation and display
15.4.3. Executing the kernel
15.5. Summary
Chapter 16. Textures and renderbuffers
16.1. Image filtering
16.1.1. The Gaussian blur
16.1.2. Image sharpening
16.1.3. Image embossing
16.2. Filtering textures with OpenCL
16.2.1. The init_gl function
16.2.2. The init_cl function
16.2.3. The configure_shared_data function
16.2.4. The execute_kernel function
16.2.5. The display function
16.3. Summary
Appendix A. Installing and using a software development kit
A.1. Understanding OpenCL SDKs
A.1.1. Checking device compliance
A.1.2. OpenCL header files and libraries
A.2. OpenCL on Windows
A.2.1. Windows installation with an AMD graphics card
A.2.2. Building Windows applications with an AMD graphics card
A.2.3. Windows installation with an Nvidia graphics card
A.2.4. Building Windows applications with an Nvidia graphics card
A.3. OpenCL on Linux
A.3.1. Linux installation with an AMD graphics card
A.3.2. Linux installation with an Nvidia graphics card
A.3.3. Building OpenCL applications for Linux
A.4. OpenCL on Mac OS
A.5. Summary
Appendix B. Real-time rendering with OpenGL
B.1. Installing OpenGL
B.1.1. OpenGL installation on Windows
B.1.2. OpenGL installation on Linux
B.1.3. OpenGL installation on Mac OS
B.2. OpenGL development on the host
B.2.1. Placing data in vertex buffer objects (VBOs)
B.2.2. Configuring vertex attributes
B.2.3. Compiling and deploying shaders
B.2.4. Launching the rendering process
B.3. Shader development
B.3.1. Introduction to shader coding
B.3.2. Vertex shaders
B.3.3. Fragment shaders
B.4. Creating the OpenGL window with GLUT
B.4.1. Configuring and creating a window
B.4.2. Event handling
B.4.3. Displaying a window
B.5. Combining OpenGL and GLUT
B.5.1. GLUT/OpenGL initialization
B.5.2. Setting the viewport
B.5.3. Rendering the model
B.6. Adding texture
B.6.1. Creating textures in the host application
B.6.2. Texture mapping in the vertex shader
B.6.3. Applying textures in the fragment shader
B.7. Summary
Appendix C. The minimalist GNU for Windows and OpenCL
C.1. Installing MinGW on Windows
C.1.1. Obtaining and running the graphical installer
C.1.2. Installing new tools in MinGW
C.2. Building MinGW executables
C.2.1. Building Hello World! with MinGW
C.2.2. The GNU compiler
C.3. Makefiles
C.3.1. Structure of a GNU makefile
C.3.2. Targets and phony targets
C.3.3. Simple example makefile
C.4. Building OpenCL applications
C.5. Summary
Appendix D. OpenCL on mobile devices
D.1. Numerical processing
D.2. Image processing
D.3. Summary
Index
List of Figures
List of Tables
List of Listings
Preface
In the summer of 1997, I was terrified. Instead of working as an intern in my major (microelectronic engineering), the best job I could find was at a research laboratory devoted to high-speed signal processing. My job was to program the two-dimensional fast Fourier transform (FFT) using C and the Message Passing Interface (MPI), and get it running as quickly as possible. The good news was that the lab had sixteen brand new SPARCstations. The bad news was that I knew absolutely nothing about MPI or the FFT.
Thanks to books purchased from a strange new site called Amazon.com, I managed to understand the basics of MPI: the application deploys one set of instructions to multiple computers, and each processor accesses data according to its ID. As each processor finishes its task, it sends its output to the processor whose ID equals 0.
It took me time to grasp the finer details of MPI (blocking versus nonblocking data transfer, synchronous versus asynchronous communication), but as I worked more with the language, I fell in love with distributed computing. I loved the fact that I could get sixteen monstrous computers to process data in lockstep, working together like athletes on a playing field. I felt like a choreographer arranging a dance or a composer writing a symphony for an orchestra. By the end of the internship, I coded multiple versions of the 2-D FFT in MPI, but the lab’s researchers decided that network latency made the computation impractical.
Since that summer, I’ve always gravitated toward high-performance computing, and I’ve had the pleasure of working with digital signal processors, field-programmable gate arrays, and the Cell processor, which serves as the brain of Sony’s PlayStation 3. But nothing beats programming graphics processing units (GPUs) with OpenCL. As today’s supercomputers have shown, no CPU provides the same number-crunching power per watt as a GPU. And no language can target as wide a range of devices as OpenCL.
When AMD released its OpenCL development tools in 2009, I fell in love again. Not only does OpenCL provide new vector types and a wealth of math functions, but it also resembles MPI in many respects. Both toolsets are freely available and their routines can be called in C or C++. In both cases, applications deliver instructions to multiple devices whose processing units rely on IDs to determine which data they should access. MPI and OpenCL also make it possible to send data using similar types of blocking/non-blocking transfers and synchronous/asynchronous communication.
OpenCL is still new in the world of high-performance computing, and many programmers don’t know it exists. To help spread the word about this incredible language, I decided to write OpenCL in Action. I’ve enjoyed working on this book a great deal, and I hope it helps newcomers take advantage of the power of OpenCL and distributed computing in general.
As I write this in the summer of 2011, I feel as though I’ve come full circle. Last night, I put the finishing touches on the FFT application presented in chapter 14. It brought back many pleasant memories of my work with MPI, but I’m amazed by how much the technology has changed. In 1997, the sixteen SPARCstations in my lab took nearly a minute to perform a 32k FFT. In 2011, my $300 graphics card can perform an FFT on millions of data points in seconds.
The technology changes, but the enjoyment remains the same. The learning curve can be steep in the world of distributed computing, but the rewards more than make up for the effort expended.
Acknowledgments
I started writing my first book for Manning Publications in 2003, and though much has changed, they are still as devoted to publishing high-quality books now as they were then. I’d like to thank all of Manning’s professionals for their hard work and dedication, but I’d like to acknowledge the following folks in particular:
First, I’d like to thank Maria Townsley, who worked as developmental editor. Maria is one of the most hands-on editors I’ve worked with, and she went beyond the call of duty in recommending ways to improve the book’s organization and clarity. I bristled and whined, but in the end, she turned out to be absolutely right. In addition, despite my frequent rewriting of the table of contents, her pleasant disposition never flagged for a moment.
I’d like to extend my deep gratitude to the entire Manning production team. In particular, I’d like to thank Andy Carroll for going above and beyond the call of duty in copyediting this book. His comments and insight have not only dramatically improved the polish of the text, but his technical expertise has made the content more accessible. Similarly, I’d like to thank Maureen Spencer and Katie Tennant for their eagle-eyed proofreading of the final copy and Gordan Salinovic for his painstaking labor in dealing with the book’s images and layout. I’d also like to thank Mary Piergies for masterminding the production process and making sure the final product lives up to Manning’s high standards.
Jörn Dinkla is, simply put, the best technical editor I’ve ever worked with. I tested the book’s example code on Linux and Mac OS, but he went further and tested the code with software development kits from Linux, AMD, and Nvidia. Not only did he catch quite a few errors I missed, but in many cases, he took the time to find out why the error had occurred. I shudder to think what would have happened without his assistance, and I’m beyond grateful for the work he put into improving the quality of this book’s code.
I’d like to thank Candace Gilhooley for spreading the word about the book’s publication. Given OpenCL’s youth, the audience isn’t as easy to reach as the audience for Manning’s many Java books. But between setting up web articles, presentations, and conference attendance, Candace has done an exemplary job in marketing OpenCL in Action.
One of Manning’s greatest strengths is its reliance on constant feedback. During development and production, Karen Tegtmeyer and Ozren Harlovic sought out reviewers for this book and organized a number of review cycles. Thanks to the feedback from the following reviewers, this book includes a number of important subjects that I wouldn’t otherwise have considered: Olivier Chafik, Martin Beckett, Benjamin Ducke, Alan Commike, Nathan Levesque, David Strong, Seth Price, John J. Ryan III, and John Griffin.
Last but not least, I’d like to thank Jan Bednarczuk of Jandex Indexing for her meticulous work in indexing the content of this book. She not only created a thorough, professional index in a short amount of time, but she also caught quite a few typos in the process. Thanks again.
About this Book
OpenCL is a complex subject. To code even the simplest of applications, a developer needs to understand host programming, device programming, and the mechanisms that transfer data between the host and device. The goal of this book is to show how these tasks are accomplished and how to put them to use in practical applications.
The format of this book is tutorial-based. That is, each new concept is followed by example code that demonstrates how the theory is used in an application. Many of the early applications are trivially basic, and some do nothing more than obtain information about devices and data structures. But as the book progresses, the code becomes more involved and makes fuller use of both the host and the target device. In the later chapters, the focus shifts from learning how OpenCL works to putting OpenCL to use in processing vast amounts of data at high speed.
Audience
In writing this book, I’ve assumed that readers have never heard of OpenCL and know nothing about distributed computing or high-performance computing. I’ve done my best to present concepts like task-parallelism and SIMD (single instruction, multiple data) development as simply and as straightforwardly as possible.
But because the OpenCL API is based on C, this book presumes that the reader has a solid understanding of C fundamentals. Readers should be intimately familiar with pointers, arrays, and memory access functions like malloc and free. It also helps to be cognizant of the C functions declared in the common math library, as most of the kernel functions have similar names and usages.
OpenCL applications can run on many different types of devices, but one of the framework's chief advantages is that it can be used to program graphics processing units (GPUs). Therefore, to get the most out of this book, it helps to have a graphics card attached to your computer or a hybrid CPU-GPU device such as AMD's Fusion.
Roadmap
This book is divided into three parts. The first part, which consists of chapters 1–10, focuses on exploring the OpenCL language and its capabilities. The second part, which consists of chapters 11–14, shows how OpenCL can be used to perform large-scale tasks commonly encountered in the field of high-performance computing. The last part, which consists of chapters 15 and 16, shows how OpenCL can be used to accelerate OpenGL applications.
The chapters of part 1 have been structured to serve the needs of a programmer who has never coded a line of OpenCL. Chapter 1 introduces the topic of OpenCL, explaining what it is, where it came from, and the basics of its operation. Chapters 2 and 3 explain how to code applications that run on the host, and chapters 4 and 5 show how to code kernels that run on compliant devices. Chapters 6 and 7 explore advanced topics that involve both host programming and kernel coding. Specifically, chapter 6 presents image processing and chapter 7 discusses the important topics of event processing and synchronization.
Chapters 8 and 9 discuss the concepts first presented in chapters 2 through 5, but using languages other than C. Chapter 8 discusses host/kernel coding in C++, and chapter 9 explains how to build OpenCL applications in Java and Python. If you aren’t obligated to program in C, I recommend that you use one of the toolsets discussed in these chapters.
Chapter 10 serves as a bridge between parts 1 and 2. It demonstrates how to take full advantage of OpenCL’s parallelism by implementing a simple reduction algorithm that adds together one million data points. It also presents helpful guidelines for coding practical OpenCL applications.
Chapters 11–14 get into the heavy-duty usage of OpenCL, where applications commonly operate on millions of data points. Chapter 11 discusses the implementation of MapReduce and two sorting algorithms: the bitonic sort and the radix sort. Chapter 12 covers operations on dense matrices, and chapter 13 explores operations on sparse matrices. Chapter 14 explains how OpenCL can be used to implement the fast Fourier transform (FFT).
Chapters 15 and 16 are my personal favorites. One of OpenCL’s great strengths is that it can be used to accelerate three-dimensional rendering, a topic of central interest in game development and scientific visualization. Chapter 15 introduces the topic of OpenCL-OpenGL interoperability and shows how the two toolsets can share data corresponding to vertex attributes. Chapter 16 expands on this and shows how OpenCL can accelerate OpenGL texture processing. These chapters require an understanding of OpenGL 3.3 and shader development, and both of these topics are explored in appendix B.
At the end of the book, the appendixes provide helpful information related to OpenCL, but the material isn’t directly used in common OpenCL development. Appendix A discusses the all-important topic of software development kits (SDKs), and explains how to install the SDKs provided by AMD and Nvidia. Appendix B discusses the basics of OpenGL and shader development. Appendix C explains how to install and use the Minimalist GNU for Windows (MinGW), which provides a GNU-like environment for building executables on the Windows operating system. Lastly, appendix D discusses the specification for embedded OpenCL.
Obtaining and compiling the example code
In the end, it’s the code that matters. This book contains working code for over 60 OpenCL applications, and you can download the source code from the publisher’s website at www.manning.com/OpenCLinAction or www.manning.com/scarpino2/.
The download site provides a link pointing to an archive that contains code intended to be compiled with GNU-based build tools. This archive contains one folder for each chapter/appendix of the book, and each top-level folder has subfolders for example projects. For example, if you look in the Ch5/shuffle_test directory, you'll find the source code for chapter 5's shuffle_test project.
As far as dependencies go, every project requires that the OpenCL library (OpenCL.lib on Windows, libOpenCL.so on *nix systems) be available on the development system. Appendix A discusses how to obtain this library by installing an appropriate software development kit (SDK).
In addition, chapters 6 and 16 discuss images, and the source code in these chapters makes use of the open-source PNG library. Chapter 6 explains how to obtain this library for different systems. Appendix B and chapters 15 and 16 all require access to OpenGL, and appendix B explains how to obtain and install this toolset.
Code conventions
As lazy as this may sound, I prefer to copy and paste working code into my applications rather than write code from scratch. This not only saves time, but also reduces the likelihood of producing bugs through typographical errors. All the code in this book is public domain, so you’re free to download and copy and paste portions of it into your applications. But before you do, it’s a good idea to understand the conventions I’ve used:
Host data structures are named after their data type. That is, each cl_platform_id structure is called platform, each cl_device_id structure is called device, each cl_context structure is called context, and so on.
In the host applications, the main function calls on two functions: create_device returns a cl_device_id, and build_program creates and compiles a cl_program. Note that create_device searches for a GPU associated with the first available platform. If it can't find a GPU, it searches for the first compliant CPU.
Host applications identify the program file and the kernel function using macros declared at the start of the source file. Specifically, the PROGRAM_FILE macro identifies the program file and KERNEL_FUNC identifies the kernel function.
All my program files end with the .cl suffix. If the program file only contains one kernel function, that function has the same name as the file.
For GNU code, every makefile assumes that libraries and header files can be found at locations identified by environment variables. Specifically, the makefile searches for AMDAPPSDKROOT on AMD platforms and CUDA on Nvidia platforms.
Author Online
Nobody’s perfect. If I failed to convey my subject material clearly or (gasp) made a mistake, feel free to add a comment through Manning’s Author Online system. You can find the Author Online forum for this book by going to www.manning.com/OpenCLinAction and clicking the Author Online link.
Simple questions and concerns get rapid responses. In contrast, if you’re unhappy with line 402 of my bitonic sort implementation, it may take me some time to get back to you. I’m always happy to discuss general issues related to OpenCL, but if you’re looking for something complex and specific, such as help debugging a custom FFT, I will have to recommend that you find a professional consultant.
About the cover illustration
The figure on the cover of OpenCL in Action is captioned a Kranjac,
or an inhabitant of the Carniola region in the Slovenian Alps. This illustration is taken from a recent reprint of Balthasar Hacquet’s Images and Descriptions of Southwestern and Eastern Wenda, Illyrians, and Slavs published by the Ethnographic Museum in Split, Croatia, in 2008. Hacquet (1739–1815) was an Austrian physician and scientist who spent many years studying the botany, geology, and ethnography of the Julian Alps, the mountain range that stretches from northeastern Italy to Slovenia and that is named after Julius Caesar. Hand drawn illustrations accompany the many scientific papers and books that Hacquet published.
The rich diversity of the drawings in Hacquet’s publications speaks vividly of the uniqueness and individuality of the eastern Alpine regions just 200 years ago. This was a time when the dress codes of two villages separated by a few miles identified people uniquely as belonging to one or the other, and when members of a social class or trade could be easily distinguished by what they were wearing. Dress codes have changed since then and the diversity by region, so rich at the time, has faded away. It is now often hard to tell the inhabitant of one continent from another and today the inhabitants of the picturesque towns and villages in the Slovenian Alps are not readily distinguishable from the residents of other parts of Slovenia or the rest of Europe.
We at Manning celebrate the inventiveness, the initiative, and the fun of the computer business with book covers based on costumes from two centuries ago brought back to life by illustrations such as this one.
Part 1. Foundations of OpenCL programming
Part 1 presents the OpenCL language. We’ll explore OpenCL’s data structures and functions in detail and look at example applications that demonstrate their usage in code.
Chapter 1 introduces OpenCL, explaining what it’s used for and how it works. Chapters 2 and 3 explain how host applications are coded, and chapters 4 and 5 discuss kernel coding. Chapters 6 and 7 explore the advanced topics of image processing and event handling.
Chapters 8 and 9 discuss how OpenCL is coded in languages other than C, such as C++, Java, and Python. Chapter 10 explains how OpenCL’s capabilities can be used to develop large-scale applications.
Chapter 1. Introducing OpenCL
This chapter covers
Understanding the purpose and benefits of OpenCL
Introducing OpenCL operation: hosts and kernels
Implementing an OpenCL application in code
In October 2010, a revolution took place in the world of high-performance computing. The Tianhe-1A, constructed by China’s National Supercomputing Center in Tianjin, came from total obscurity to seize the leading position among the world’s best performing supercomputers. With a maximum recorded computing speed of 2,566 TFLOPS (trillion floating-point operations per second), it performs nearly 50 percent faster than the second-place finisher, Cray’s Jaguar supercomputer. Table 1.1 lists the top three supercomputers.
Table 1.1. Top three supercomputers of 2010 (source: www.top500.org)
What's so revolutionary is the presence of GPUs (graphics processing units) in both the Tianhe-1A and the Nebulae. In 2009, none of the top three supercomputers had GPUs, and only one system in the top 20 had any GPUs at all. As the table makes clear, the two systems with GPUs provide not only excellent performance but also impressive power efficiency.
Using GPUs to perform nongraphical routines is called general-purpose GPU computing, or GPGPU computing. Before 2010, GPGPU computing was considered a novelty in the world of high-performance computing and not worthy of serious attention. But today, engineers and academics are reaching the conclusion that CPU/GPU systems represent the future of supercomputing.
Now an important question arises: how can you program these new hybrid devices? Traditional C and C++ only target traditional CPUs. The same holds true for Cray’s proprietary Chapel language and the Cray Assembly Language (CAL). Nvidia’s CUDA (Compute Unified Device Architecture) can be used to program Nvidia’s GPUs, but not CPUs.
The answer is OpenCL (Open Computing Language). OpenCL routines can be executed on GPUs and CPUs from major manufacturers like AMD, Nvidia, and Intel, and will even run on Sony’s PlayStation 3. OpenCL is nonproprietary—it’s based on a public standard, and you can freely download all the development tools you need. When you code routines in OpenCL, you don’t have to worry about which company designed the processor or how many cores it contains. Your code will compile and execute on AMD’s latest Fusion processors, Intel’s Core processors, Nvidia’s Fermi processors, and IBM’s Cell Broadband Engine.
The goal of this book is to explain how to program these cross-platform applications and take maximum benefit from the underlying hardware. But the goal of this chapter is to provide a basic overview of the OpenCL language. The discussion will start by focusing on OpenCL's advantages and operation, and then proceed to describing a complete application. But first, it's important to understand OpenCL's origin. Corporations have spent a great deal of time developing this language, and once you see why, you'll have a better idea of why learning OpenCL is worth your while.
1.1. The dawn of OpenCL
The x86 architecture enjoys a dominant position in the world of personal computing, but there is no prevailing architecture in the fields of graphical and high-performance computing. Despite their common purpose, there is little similarity between Nvidia’s line of Fermi processors, AMD’s line of Evergreen processors, and IBM’s Cell Broadband Engine. Each of these devices has its own instruction set, and before OpenCL, if you wanted to program them, you had to learn three different languages.
Enter Apple. For those of you who have been living as recluses, Apple Inc. produces an insanely popular line of consumer electronic products: the iPhone, the iPad, the iPod, and the Mac line of personal computers. But Apple doesn’t make processors for the Mac computers. Instead, it selects devices from other companies. If Apple chooses a graphics processor from Company A for its new gadget, then Company A will see a tremendous rise in market share and developer interest. This is why everyone is so nice to Apple.
Important events in OpenCL and multicore computing history
2001— IBM releases POWER4, the first multicore processor.
2005— First multicore processors for desktop computers released: AMD’s Athlon 64 X2 and Intel’s Pentium D.
June 2008— The OpenCL Working Group forms as part of the Khronos Group.
December 2008— The OpenCL Working Group releases version 1.0 of the OpenCL specification.
April 2009— Nvidia releases OpenCL SDK for Nvidia graphics cards.
August 2009— ATI (now AMD) releases OpenCL SDK for ATI graphics cards. Apple includes OpenCL support in its Mac OS 10.6 (Snow Leopard) release.
June 2010— The OpenCL Working Group releases version 1.1 of the OpenCL specification.
In 2008, Apple turned to its vendors and asked, "Why don't we make a common interface so that developers can program your devices without having to learn multiple languages?"
If anyone else had raised this question, cutthroat competitors like Nvidia, AMD, Intel, and IBM might have laughed. But no one laughs at Apple. It took time, but everyone put their heads together, and they produced the first draft of OpenCL later that year.
To manage OpenCL’s progress and development, Apple and its friends formed the OpenCL Working Group. This is one of many working groups in the Khronos Group, a consortium of companies whose aim is to advance graphics and graphical media. Since its formation, the OpenCL Working Group has released two formal specifications: OpenCL version 1.0 was released in 2008, and OpenCL version 1.1 was released in 2010. OpenCL 2.0 is planned for 2012.
This section has explained why businesses think highly of OpenCL, but I wouldn’t be surprised if you’re still sitting on the fence. The next section, however, explains the technical merits of OpenCL in greater depth. As you read, I hope you’ll better understand the advantages of OpenCL as compared to traditional programming languages.
1.2. Why OpenCL?
You may hear OpenCL referred to as its own separate language, but this isn’t accurate. The OpenCL standard defines a set of data types, data structures, and functions that augment C and C++. Developers have created OpenCL ports for Java and Python, but the standard only requires that OpenCL frameworks provide libraries in C and C++.
Here’s the million-dollar question: what can you do with OpenCL that you can’t do with regular C and C++? It will take this entire book to answer this question in full, but for now, let’s look at three of OpenCL’s chief advantages: portability, standardized vector processing, and parallel programming.
1.2.1. Portability
Java is one of the most popular programming languages in the world, and it owes a large part of its success to its motto: "Write once, run everywhere."
With Java, you don’t have to rewrite your code for different operating systems. As long as the operating system supports a compliant Java Virtual Machine (JVM), your code will run.
OpenCL adopts a similar philosophy, but a more suitable motto might be, "Write once, run on anything."
Every vendor that provides OpenCL-compliant hardware also provides the tools that compile OpenCL code to run on the hardware. This means you can write your OpenCL routines once and compile them for any compliant device, whether it’s a multicore processor or a graphics card. This is a great advantage over regular high-performance computing, in which you have to learn vendor-specific languages to program vendor-specific hardware.
There’s more to this advantage than just running on any type of compliant hardware. OpenCL applications can target multiple devices at once, and these devices don’t have to have the same architecture or even the same vendor. As long as all the devices are OpenCL-compliant, the functions will run. This is impossible with regular C/C++ programming, in which an executable can only target one device at a time.
Here’s a concrete example. Suppose you have a multicore processor from AMD, a graphics card from Nvidia, and a PCI-connected accelerator from IBM. Normally, you’d never be able to build an application that targets all three systems at once because each requires a separate compiler and linker. But a single OpenCL program can deploy executable code to all three devices. This means you can unify your hardware to perform a common task with a single program. If you connect more compliant devices, you’ll have to rebuild the program, but you won’t have to rewrite your code.
1.2.2. Standardized vector processing
Standardized vector processing is one of the greatest advantages of OpenCL, but before I explain why, I need to define precisely what I’m talking about. The term vector is going to get a lot of mileage in this book, and it may be used in one of three different (though essentially similar) ways:
Physical or geometric vector—An entity with a magnitude and direction. This is used frequently in physics to identify force, velocity, heat transfer, and so on. In graphics, vectors are employed to identify directions.
Mathematical vector—An ordered, one-dimensional collection of elements. This is distinguished from a two-dimensional collection of elements, called a matrix.
Computational vector—A data structure that contains multiple elements of the same data type. During a vector operation, each element (called a component) is operated upon in the same clock cycle.
This last usage is important to OpenCL because high-performance processors operate on multiple values at once. If you’ve heard the terms superscalar processor or vector processor, this is the type of device being referred to. Nearly all modern processors are capable of processing vectors, but ANSI C/C++ doesn’t define any basic vector data types. This may seem odd, but there’s a clear problem: vector instructions are usually vendor-specific. Intel processors use SSE extensions, Nvidia devices require PTX instructions, and IBM devices rely on AltiVec instructions to process vectors. These instruction sets have nothing in common.
But with OpenCL, you can code your vector routines once and run them on any compliant processor. When you compile your application, Nvidia’s OpenCL compiler will produce PTX instructions. An IBM compiler for OpenCL will produce AltiVec instructions. Clearly, if you intend to make your high-performance application available on multiple platforms, coding with OpenCL will save you a great deal of time. Chapter 4 discusses OpenCL’s vector data types and chapter 5 presents the functions available to operate on vectors.
1.2.3. Parallel programming
If you’ve ever coded large-scale applications, you’re probably familiar with the concept of concurrency, in which a single processing element shares its resources among processes and threads. OpenCL includes aspects of concurrency, but one of its great advantages is that it enables parallel programming. Parallel programming assigns computational tasks to multiple processing elements to be performed at the same time.
In OpenCL parlance, these tasks are called kernels. A kernel is a specially coded function that’s intended to be executed by one or more OpenCL-compliant devices. Kernels are sent to their intended device or devices by host applications. A host application is a regular C/C++ application running on the user’s development system, which we’ll call the host. For many developers, the host dispatches kernels to a single device: the GPU on the computer’s graphics card. But kernels can also be executed by the same CPU on which the host application is running.
Host applications manage their connected devices using a container called a context. Figure 1.1 shows how hosts interact with kernels and devices.
Figure 1.1. Kernel distribution among OpenCL-compliant devices
To create a kernel, the host selects a function from a kernel container called a program. Then it associates the kernel with argument data and dispatches it to a structure called a command queue. The command queue is the mechanism through which the host tells devices what to do, and when a kernel is enqueued, the device will execute the corresponding function.
An OpenCL application can configure different devices to perform different tasks, and each task can operate on different data. In other words, OpenCL provides full task-parallelism. This is an important advantage over many