XProc 3.0 Programmer Reference
By Erik Siegel
About this ebook
XProc 3.0 is a programming language for processing XML, JSON, and other documents in pipelines. XProc chains conversions and other steps, allowing for potentially complex processing. XProc is especially useful for applications, such as publishing, where content may come from multiple input sources, pass through multiple processing steps and result in multiple output streams.
XProc 3.0 Programmer Reference is aimed at programmers and others who process XML. It explains the language in detail, provides examples, and contains a set of example use cases. Anyone who uses the XProc language will find a wealth of information in this book.
Erik Siegel
Erik Siegel runs Xatapult, a consultancy that offers coaching, training, applications, and more to the publishing world.
XProc 3.0 Programmer Reference
Table of Contents
Preface
Who is this book for?
How to use this book
The XProc logo
Conventions used
File extension
Explaining XML structures
Special data types
Using and finding code examples
Additional resources
XProc and underlying standards
Other information
Contact information
Acknowledgements
1. Introduction
1.1. What is XProc?
1.2. A first example
1.3. The language specification, step library, and this book
1.4. A little history
1.5. Who is using XProc and for what?
1.6. Alternatives
2. Getting started with XProc
2.1. Installing and using an XProc processor
2.1.1. XML Calabash
2.1.2. MorganaXProc
2.1.3. Hello XProc world
2.1.4. Integration in oXygen
2.2. XProc 101
2.2.1. Converting to HTML
2.2.1.1. Basic conversion
2.2.2. Adding additional header information
2.2.2.1. Using an option instead of a variable
2.2.2.2. Loading from another input port
2.2.2.3. Supplying defaults for input ports
2.2.2.4. Indenting the result XML
2.2.3. Inserting additional data
2.2.4. Generic data conversion to HTML
3. XProc fundamentals
3.1. Pipelines and steps
3.2. Documents
3.2.1. Representations and properties
3.2.2. Document types
3.2.2.1. XML Documents
3.2.2.2. HTML documents
3.2.2.3. JSON documents
3.2.2.4. Text documents
3.2.2.5. Other documents
3.3. Steps
3.3.1. Properties of a step
3.3.1.1. Type
3.3.1.2. Ports
3.3.1.3. Options
3.3.2. Kinds of steps
3.4. Ports
3.4.1. Properties of ports
3.4.2. Connecting or binding ports
3.4.2.1. Implicit bindings
3.4.2.2. Explicit binding
3.4.2.3. The Default Readable Port
4. Programming concepts
4.1. XPath in XProc
4.1.1. Extension functions
4.2. XPath expressions and step options
4.3. QName magic
4.4. Map usage
4.5. Document details
4.5.1. Document properties
4.5.1.1. content-type
4.5.1.2. base-uri
4.5.1.3. serialization
4.5.2. JSON documents
4.5.3. Documents and step results
4.6. Attribute and Text Value Templates
4.6.1. Attribute Value Templates in the XProc code
4.6.2. Value Templates and the Default Readable Port
4.6.3. Value Template expansion in inline content
4.6.3.1. Turning Value Templates on or off for inline content
4.6.3.2. TVT expansion in inline content
4.7. Common attributes
4.7.1. The [p:] notation for attributes
4.7.2. The [p:]expand-text attribute
4.7.3. The [p:]use-when attribute
4.7.4. The xml:base attribute
4.7.5. The xml:id attribute
4.8. Documentation and annotations
4.8.1. Documentation
4.8.2. Annotations
5. Defining steps: Prolog
5.1. Declaring a step:
5.1.1. Declaring an atomic step
5.2. Declaring ports
5.2.1. Declaring input ports:
5.2.2. Declaring output ports:
5.2.3. Specifying connections for input and output port declarations
5.2.4. Specifying serialization
5.2.5. Specifying content types
5.3. Declaring options:
5.3.1. Specifying an option’s acceptable values
5.3.2. Static options
5.3.3. Visibility of options
5.4. Importing steps and step libraries:
5.5. Importing function libraries:
5.6. Building step libraries:
6. Populating steps
6.1. A word upfront: understandable steps
6.2. Anatomy of a step’s body
6.3. Invoking steps
6.3.1. Default step names
6.3.2. Step dependencies: [p:]depends attribute
6.4. Connecting input ports:
6.5. Setting options:
6.6. Declaring variables:
6.6.1. Addressing multiple documents
6.6.2. Visibility of variables
6.7. Common connection constructs
6.7.1. Specifying nothing:
6.7.2. Reading external documents:
6.7.3. Explicitly connecting to another port:
6.7.4. Specifying documents inline:
6.7.4.1. Excluding namespace prefixes
7. Core steps
7.1. Looping:
7.2. Acting on a part of a document:
7.3. Making decisions:
7.3.1. Multiple decisions:
7.3.1.1. p:when
7.3.1.2. p:otherwise
7.3.2. Single decision:
7.4. Grouping:
7.5. Error handling:
7.5.1. p:catch
7.5.2. p:finally
7.5.3. Error specification
7.6. Output ports of subpipelines
7.6.1. The subpipeline’s anonymous output port
7.6.2. Explicitly defining output ports in subpipelines
8. Built-in steps
8.1. Classification of built-in steps
8.2. The standard step library
8.3. Some commonly used steps
8.3.1. Doing nothing:
8.3.2. XSLT Transformation:
8.3.3. Adding attributes:
8.3.4. Deleting stuff:
8.3.5. Inserting stuff:
8.3.6. Wrapping a sequence:
8.3.7. Resolving XIncludes:
8.3.8. Storing documents:
8.3.9. Changing a document’s type:
9. Extension functions
9.1. Environment information related functions
9.2. Iteration related functions
9.3. Document properties related functions
9.4. Other functions
10. XProc examples and recipes
10.1. Using
10.2. Working with document-properties
10.3. Working with JSON documents
10.4. Working with text documents
10.5. Working with other documents
10.6. Working with zip archives
10.6.1. Inspect a zip archive
10.6.2. Extract and change a file from a zip archive
10.6.3. Creating a new zip archive
10.7. Debugging hints
10.7.1. Follow progress using messages
10.7.2. See intermediate results using
10.7.3. Turn code on/off with the [p:]use-when attribute and static options
A. Standard step library
A.1. p:add-attribute
A.2. p:add-xml-base
A.3. p:archive
A.3.1. The archive manifest
A.3.2. Handling of ZIP archives
A.4. p:archive-manifest
A.5. p:cast-content-type
A.5.1. Casting from an XML media type
A.5.2. Casting from an HTML media type
A.5.3. Casting from a JSON media type
A.5.4. Casting from a text media type
A.5.5. Casting from any other media type
A.6. p:compare
A.7. p:compress
A.8. p:count
A.9. p:delete
A.10. p:error
A.11. p:filter
A.12. p:hash
A.13. p:http-request
A.13.1. Construction of a multipart request
A.13.2. Managing a multipart response
A.14. p:identity
A.15. p:in-scope-names
A.16. p:insert
A.17. p:json-join
A.18. p:json-merge
A.19. p:label-elements
A.20. p:load
A.20.1. Loading XML data
A.20.2. Loading text data
A.20.3. Loading JSON data
A.20.4. Loading HTML data
A.20.5. Loading binary data
A.21. p:make-absolute-uris
A.22. p:namespace-delete
A.23. p:namespace-rename
A.24. p:pack
A.25. p:parameters
A.25.1. The c:param element
A.25.2. The c:param-set element
A.26. p:rename
A.27. p:replace
A.28. p:set-attributes
A.29. p:set-properties
A.30. p:sink
A.31. p:split-sequence
A.32. p:store
A.33. p:string-replace
A.34. p:text-count
A.35. p:text-head
A.36. p:text-join
A.37. p:text-replace
A.38. p:text-sort
A.39. p:text-tail
A.40. p:unarchive
A.41. p:uncompress
A.42. p:unwrap
A.43. p:uuid
A.44. p:wrap
A.45. p:wrap-sequence
A.46. p:www-form-urldecode
A.47. p:www-form-urlencode
A.48. p:xinclude
A.49. p:xquery
A.49.1. Example
A.50. p:xslt
A.50.1. Invoking an XSLT 3.0 stylesheet
A.50.2. Invoking an XSLT 2.0 stylesheet
A.50.3. Invoking an XSLT 1.0 stylesheet
B. Optional built-in steps overview
B.1. Dynamic pipeline execution
B.1.1. p:run
B.2. File steps
B.2.1. p:directory-list
B.2.1.1. Directory list details
B.2.2. p:file-copy
B.2.3. p:file-create-tempfile
B.2.4. p:file-delete
B.2.5. p:file-info
B.2.6. p:file-mkdir
B.2.7. p:file-move
B.2.8. p:file-touch
B.3. Operating system steps
B.3.1. p:os-exec
B.3.2. p:os-info
B.4. Mail steps
B.4.1. p:send-mail
B.5. Paged media steps
B.5.1. p:css-formatter
B.5.2. p:xsl-formatter
B.6. Text steps
B.6.1. p:markdown-to-html
B.7. Validation steps
B.7.1. Validate with NVDL
B.7.2. Validate with RELAX NG
B.7.3. Validate with Schematron
B.7.4. Validate with XML Schema
C. Namespaces
D. Copyright credits
Step Index
E. Copyright and Legal Notices
Preface
Welcome to this book about XProc 3.0, a programming language for processing XML (and other) documents in pipelines. XProc chains conversions and other steps together to achieve potentially complex results.
XProc is currently neither well known nor widely used. Most systems that process XML are relatively simple. A single XSLT stylesheet, or even some DOM manipulation in a general-purpose programming language, often suffices. There is no need to bother with yet another processor and programming language.
However, there are areas where XML processing gets complicated. One example is publishing, where XML is widely used to mark up content. Most publishers struggle with consolidating input from various sources (Word, InDesign, PDF, XML, HTML, etc.) into a central XML format such as DITA or DocBook. From this XML they create their output: books, magazines, newspapers, websites, etc. Both sides of this process—to and from the central XML format—involve heavy XML processing, and this is where XProc can play a starring role and sometimes even make the difference between impracticable and feasible.
But people love what they know, and unknown means unloved. The syntax of XProc version 1.0 also didn’t help: it put off a lot of people, even seasoned XML programmers who were used to other pipeline tools such as Cocoon. In addition, introductory material for learning the language was hard to find.
XProc 3.0 was designed with these shortcomings in mind. The syntax was simplified and new features added. These language advancements, backed up by this programmer’s manual, will hopefully make it a lot easier for people to learn and use XProc.
Heavy XML processing is a niche in the programming world, so the XProc language will probably never become really mainstream. However, I hope that our efforts on the specification, together with the availability of this book, will make it more likely that people will use XProc.
Who is this book for?
XProc is a programming language originating from the XML world. So, no surprise, this book is aimed at programmers and others who do XML processing and use, or are considering using, XProc. It explains the language in detail, provides examples, and contains a selected set of example use cases (Chapter 10).
Readers should have some background knowledge of XML and XML processing. Knowing, even superficially, languages like XSLT, XQuery, or XPath will certainly help you understand XProc better. This book does not set out to teach these other languages; it concentrates on XProc only. Consult the section called “Additional resources” if you want to know more about the underlying technologies, tools, and standards.
How to use this book
There are several ways you can use this book, including the following:
If you want to learn it all, read it all. Skim the appendices and use them for reference.
If you want an introduction, a basic understanding of what XProc is, what it can be used for, and how it does its magic, read Chapter 1 and Chapter 3.
Use it as a reference guide as you use XProc.
Find inspiration in the use cases in Chapter 10.
From my own experience in learning a new programming language, I know it helps to read, or at least glance at, the entire book. Simply find out what’s there so you know what the capabilities are, even if you don’t initially dig into all of the details.
A personal sorry tale: I did some foolish and heavy programming to get loop counters in
Here is an overview of the structure of the book:
Chapter 1 introduces XProc and provides a little (non-technical) background information.
Chapter 2 helps you get started using XProc. It tells you how to install and run the appropriate software. Examples range from the proverbial “Hello world!” to a 101 crash course.
Chapter 3 provides background information about XProc and how it sees the world without delving into the syntax just yet: what is a pipeline and a document, what are their main properties, how do you connect steps, etc.
Chapter 4 explains the various programming concepts in XProc: the use of XPath, maps, Attribute and Text Value Templates (AVT/TVT), etc.
Chapters 5–7 are key in creating XProc pipelines. They explain the full language:
Chapter 5 explains how to declare a step: what does a pipeline document look like and how do you declare input ports, output ports, and options. It’s about the prolog of a pipeline.
Chapter 6 tells you how to populate your pipelines with functionality: chaining steps, using variables, etc. It’s about the body of a pipeline.
Chapter 7 handles the core steps in XProc, including looping, branching, etc.
Chapter 8 lists the built-in steps that populate pipelines. There are many built-in steps. This chapter doesn’t handle them in detail; it mainly provides an overview.
Chapter 9 lists the XProc extension functions that are available for use in the XPath expressions in your pipeline. Among many others, it contains functions for iteration information.
Chapter 10 provides some examples of XProc pipelines for specific use cases.
There are several appendices with reference information and a Step Index.
The XProc logo
To make absolutely sure XProc will be taken seriously in the XML standards world, it has a logo (designed by Bethan Tovey-Walsh):
Figure 1. The Kanava XProc logo
We even gave it a name: Kanava. This not only sounds funny and is easy to pronounce, but also appears to be Finnish for pipeline!
Conventions used
File extension
This book and its code examples use the preferred file extension for XProc documents: .xpl.
Explaining XML structures
This book uses a specific notation to explain XML structures. Although this notation looks like XML, it is not well-formed! So don’t copy/paste any of these examples directly into your code and expect them to work.
Figure 2 shows an XML structure for the fictitious
Figure 2. Sample XML structure for a fictitious element
attribute-2-required = (type)
attribute-3-fixed-values? = value-1
| value-2
| value-3
…
attribute-4-avt? = { (type) } >
(
Each example is followed by a table that describes the attributes and child elements in the example.
Here are some details about the syntax used in Figure 2:
Occurrences are given in DTD fashion:
Attributes are followed by a data type (for instance xs:string or xs:boolean) or by the list of values it can have (like the attribute-3-fixed-values attribute in Figure 2).
The xs namespace prefix for data types must be bound to the namespace http://www.w3.org/2001/XMLSchema (xmlns:xs="http://www.w3.org/2001/XMLSchema").
There are some special, XProc-specific data types, for example SelPattern or XPathType. These are explained in the section called “Special data types”.
When the type of an attribute is in curly braces {…} (like the attribute-4-avt attribute in the example), its value is an Attribute Value Template (AVT, see Section 4.6, “Attribute and Text Value Templates”). It can contain XPath expressions between curly braces, {…}, which will be evaluated and expanded.
Child elements can be grouped, like the
In Figure 2, the occurrence indicator on this group is * (zero-or-more), so any combination of both elements would be valid, as would an empty value.
Special data types
Some attributes have a special data type. Table 1 shows the data-type indicators used in XProc.
Table 1 – Special data type indicators
Using and finding code examples
The code examples presented in this book, especially the ones in Chapters 2 and 10, can be found on GitHub at https://github.com/eriksiegel/XProc-3.0-book-sources.git. These examples are offered under the MIT License (see Appendix D). Feel free to download and use them.
Because I don’t know where you’ll install these source files, every reference to them is prefixed with $SOURCES, for example $SOURCES/hello-world/hello-world.xpl.
Additional resources
This section contains additional information resources. It’s compiled based on my personal preferences combined with suggestions from others. Although incomplete and biased, it is nevertheless a good place to start exploring.
XProc and underlying standards
The XProc 3.0 specification (http://spec.xproc.org/)
The XProc specification is written in the style of a W3C recommendation, using the terse prose necessary to reach the level of exactness a standard needs. This book was written based on the specification.
If during your XProc adventures you’re ever in doubt who is right, this book or the specification, stop asking. The specification is always right.
XPath 3.1 standard (https://www.w3.org/TR/xpath-31), XPath and XQuery Functions and Operators 3.1 (https://www.w3.org/TR/xpath-functions-31)
XProc uses XPath 3.1 as an underlying standard. These links bring you to the official standard documents.
If you’re new to XPath, I suggest you read Michael Kay’s book XSLT 2.0 and XPath 2.0. Although written for a previous version, it still stands. Its core concepts are explained well, and since 3.1 is backwards compatible with 2.0, anything you read is still true.
Another well-written and informative resource for learning XPath is Priscilla Walmsley’s book XQuery: Search Across a Variety of XML Data. This book is very complete, covering difficult subjects such as higher-order functions, maps, and arrays. However, it doesn’t make the distinction between what’s XPath and what’s XQuery as clear as it could. So you might end up trying XQuery constructs (like FLWOR expressions) as XPath expressions. That won’t work inside an XProc pipeline. The book also includes a full list of available XPath functions.
Other information
The World Wide Web Consortium (W3C) (http://www.w3c.org)
The W3C is the body that manages, among other things, the XProc specification and related XML standards. Their website is easy to use and informative.
W3 schools (http://w3schools.com)
W3Schools is a developer site with tutorials and references on web development languages, including XSLT, XQuery and XPath.[1]
XSLT 2.0 and XPath 2.0: Programmer’s Reference by Michael Kay (Wiley, 2008)
Although this book is a bit outdated (XSLT is on version 3.0 and XPath is on version 3.1), it is still an excellent resource and a good place to start with XPath.
XSLT, 2nd Edition by Doug Tidwell (O’Reilly, 2008)
A good book to learn and explore XSLT. It’s a bit outdated but still useful for the basics.
XSLT Cookbook, 2nd Edition by Sal Mangano (O’Reilly, 2006)
XSLT examples in cookbook style. Useful when you have an XSLT problem to crack, and you need an example to start from.
XSL-List (https://www.mulberrytech.com/xsl/xsl-list/)
The XSL-List is a place to ask XSLT questions and receive help. The list is usually very responsive. It also has an archive.
XQuery: Search Across a Variety of XML Data by Priscilla Walmsley (O’Reilly, 2016)
Considered the authoritative resource for programming XQuery. Also a very good place to start when you need information about the underlying XPath standard.
Contact information
This book was written by Erik Siegel (Xatapult, http://www.xatapult.com). You can reach me at erik@xatapult.nl.
I would definitely like to hear from you. Whatever you have to say about this book, suggestions, omissions, errors, praise, likes, dislikes, disgust, please send me an e-mail. Knowing that there are people out there actually using what I’ve written will help me stay motivated.
Acknowledgements
I would like to thank the other members of the XProc 3.0 editorial team for their support, discussions, and encouragement: Achim Berndzen, Gerrit Imsieke, and Norman Tovey-Walsh. Extra kudos for Achim and Norman because they somehow managed to work on an XProc processor in addition to their daytime jobs and working on the specification.
Besides the XProc editors, a lot of people participated in the discussions and meetings around XProc 3.0. This community group, chaired by Ari Nordström, varied in attendance and so I’m not going to mention other names, because I will undoubtedly forget someone. But if you were there you know it: thanks!
And there were, unbelievably, people who took the time and the effort to actually read all this and helped me with detailed feedback and criticism: my fellow XProc editorial crew-mates Achim, Gerrit, and Norman as well as Pieter Masereeuw, who looked at it from an outsider’s viewpoint. Thanks for all your time and effort!
[1] Be aware, W3Schools is a controversial website. Just mentioning it here caused a surprising amount of criticism from reviewers. It apparently has a history of errors and is infamous for heavy advertising. Nonetheless, I personally consult it regularly and find the information well-presented. Judge it for yourself.
Chapter 1. Introduction
1.1. What is XProc?
Let’s try to answer this question with an overview of XProc’s main high-level characteristics:
XProc is a programming language, expressed in XML, in which you can write pipelines.
An XProc pipeline takes data as its input (often XML) and passes it through specialized steps to produce end results.
Steps range from simple tasks, like reading and writing data, to complex ones like splitting/combining/pruning data, transforming data with XSLT or XQuery, and validating data.
Within a pipeline you can work with variables, create branches and loops, catch errors, etc. All of this processing is based on the data flowing through the pipeline.
XProc pipelines are not limited to a linear succession of steps. They can fork and merge.
XProc allows you to create custom steps by combining other steps. These custom steps can be used just like any other step. Custom steps can be collected into libraries.
XProc supports housekeeping functions such as inspecting directories, reading documents from zip files, writing data to disk, etc.
There is software—XProc processors—that can execute these pipelines.
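The idea behind the characteristics above can be sketched outside XProc. The following Python fragment is only an analogy, not XProc code: it treats a step as a plain function from one document to another and shows how steps chain and how a custom step can be built from other steps. All names in it (pipeline, strip_whitespace, wrap_in_paragraph, wrap_in_body) are hypothetical and exist for illustration only; real XProc steps have ports, options, and document properties that this sketch ignores.

```python
from functools import reduce
from typing import Callable

# In this analogy a "step" is a function from one document (a string) to another.
Step = Callable[[str], str]


def pipeline(*steps: Step) -> Step:
    """Combine steps into a new step that runs them in the given order."""
    def run(document: str) -> str:
        return reduce(lambda doc, step: step(doc), steps, document)
    return run


# Three hypothetical steps; they only manipulate strings.
def strip_whitespace(doc: str) -> str:
    return doc.strip()


def wrap_in_paragraph(doc: str) -> str:
    return f"<p>{doc}</p>"


def wrap_in_body(doc: str) -> str:
    return f"<body>{doc}</body>"


# A custom step built from other steps can be reused like any other step.
to_paragraph = pipeline(strip_whitespace, wrap_in_paragraph)
to_html_body = pipeline(to_paragraph, wrap_in_body)

print(to_html_body("  Hello XProc world  "))
# prints <body><p>Hello XProc world</p></body>
```

The point of the sketch is the composition: to_html_body does not care that to_paragraph is itself a small pipeline, which mirrors how a custom XProc step is used like any other step.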
Why and when would these capabilities be useful? In the physical world, pipelining and working in specialized steps is not unusual. For instance, an oil refinery takes crude oil as input and, through a series of steps and intermediate products, produces petrol/gasoline, kerosene, diesel, etc. Refineries take the word pipeline literally.
Another classic example of working in pipelines (although not literal) with specialized steps is a factory producing cars. This usually consists of a conveyor belt (the main pipeline) that takes the cars-to-be from one specialized assembly station to the next (the steps). There will probably be sub-conveyor belts (subpipelines) for parts like the engine, the cabling, the interior, etc. At the end you have a complete and functioning car.
Yet another example is the UNIX pipe. A command produces some output and you pipe that output to another command—for example grep or tail—which does further processing to get the desired result. The character used for chaining steps, |, is even called the “pipe” character.
So why do this in the world of information and document processing? One of the main reasons is that data is often not in the format you need it to be. Here are some examples:
You have XML coming from some data source but need HTML for your website.
You have data coming from multiple weather stations that needs to be merged into a single consolidated view. From this you produce a map with the information nicely laid out.
Word processors produce zip files with lots of XML documents inside (most word processors do nowadays). You need the text in some other format so you have to inspect the zip file, combine the XML documents inside, and transform the result into what you need.
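To give a feel for the last example, here is a rough sketch of the same kind of work done by hand in Python, using only the standard library. It is an illustration under simplifying assumptions, not how XProc does it: the archive layout, the file names (part1.xml, part2.xml, thumbnail.png), and the text element are all made up, and real word-processor files are considerably more complex.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_sample_archive() -> bytes:
    """Create a small in-memory zip that stands in for a word-processor file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("part1.xml", "<part><text>Hello</text></part>")
        archive.writestr("part2.xml", "<part><text>world</text></part>")
        archive.writestr("thumbnail.png", b"not xml")
    return buffer.getvalue()


def extract_text(zip_bytes: bytes) -> str:
    """Inspect the archive, parse its XML members, and combine their text."""
    words = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Housekeeping: find out what is inside and select the XML documents.
        xml_names = sorted(n for n in archive.namelist() if n.endswith(".xml"))
        for name in xml_names:
            root = ET.fromstring(archive.read(name))
            # Transformation: keep only the text content we care about.
            words.extend(el.text or "" for el in root.iter("text"))
    return " ".join(words)


print(extract_text(build_sample_archive()))
# prints Hello world
```

Even in this toy version, most of the code is housekeeping (opening the archive, selecting members, combining results) rather than transformation.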
For straight transformation of XML data you can use languages such as XSLT and XQuery. But often tasks such as chaining, splitting, or merging are more complex than can be handled in a single transformation. For such tasks, you may need to perform housekeeping functions such as determining where to read from or write to, inspecting directories, creating zip files,