Hands-On Julia Programming: An Authoritative Guide to the Production-Ready Systems in Julia
Ebook, 757 pages, 4 hours

About this ebook

The Julia programming language enables data scientists and programmers to create prototypes without sacrificing performance. Nonetheless, skeptics question its readiness for production deployments, since it is a new platform whose 1.0 release came in 2018. This book addresses these doubts and offers a comprehensive look at the language's use in developing and deploying production-ready applications.

The first part of the book teaches experienced programmers and scientists about the Julia language features in great detail. The second part consists of gaining hands-on experience with the development environment, debugging, programming guidelines, package management, and cloud deployment strategies. In the final section, readers are introduced to a variety of third-party packages available in the Julia ecosystem for Data Processing, Text Analytics, and developing Deep Learning models.

This book provides an extensive overview of the programming language and broadens understanding of the Julia ecosystem. As a result, it assists programmers, scientists, and information architects in selecting Julia for their next production deployments.
Language: English
Release date: Oct 20, 2021
ISBN: 9789391030919

    Book preview

    Hands-On Julia Programming - Sambit Kumar Dash

    CHAPTER 1

    Getting Started

    Introduction

    Julia is a fast, dynamic language developed with high performance as its core objective. We will outline the capabilities of the language, and we look forward to programmers and researchers using it in production environments.

    Structure

    In this chapter, we will cover the following topics:

    Objectives

    Purpose of Julia

    A brief history

    The book outline

    Setting up Julia

    Read-Eval-Print Loop (REPL)

    Interactive Julia and Jupyter

    Julia programs

    Objectives

    The readers will be able to understand the design considerations of the Julia language. They will get a brief history of the language and the outline of the book. They will also be able to set up the environment and try some simple Julia programs.

    Purpose of Julia

    A getting-started chapter that does not open with a hello_world program! Do not worry; we will get there soon. First, however, we would like you to learn a little about Julia’s concepts and conventions so that you understand why the language is designed the way it is. This book is intended as a practitioner’s manual that helps you understand the Julia programming platform. We urge you to think of Julia as a complete platform, not just a programming language, for implementing practical, production-quality software. There are a dozen reasons to consider Julia as an alternative to your current scientific programming tools.

    1. Performance is the core

    Unlike other dynamic languages such as Python, R, SPSS, or MATLAB, Julia is designed with performance as the central focus of its development. Type determination and conversion take significant time in most interpreted languages. Julia is fundamentally a Just-in-Time (JIT) compiled language: types are determined at runtime, and code segments are compiled specifically for those types. A multiple-dispatch architecture¹ based on data types ensures that the most efficient implementation is invoked. Julia performance benchmarks are carried out with compiled languages like C/C++ in mind, and in many of them Julia comes close to the performance levels² of C/C++. Because of these performance gains, developers can write high-performance libraries in native Julia. This is unlike languages such as Python or R, which serve as excellent front-end interfaces but leave the performance-critical backend code to natively compiled languages like C/C++.
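
    As a rough illustration of dispatch on data types, the sketch below defines one function name with several methods. All names (Shape, Circle, Rectangle, area, kind) are made up for this example and are not taken from the book.

```julia
# Hypothetical types and functions, for illustration only.
abstract type Shape end

struct Circle <: Shape
    radius::Float64
end

struct Rectangle <: Shape
    width::Float64
    height::Float64
end

# One function name, two methods; the method is selected by argument type.
area(c::Circle) = pi * c.radius^2
area(r::Rectangle) = r.width * r.height

# Dispatch considers the types of all arguments, not only the first.
kind(a::Number, b::Number) = "two numbers"
kind(a::AbstractString, b::Number) = "a string and a number"
kind(a, b) = "something else"

shapes = Shape[Circle(1.0), Rectangle(2.0, 3.0)]
total_area = sum(area, shapes)   # calls the matching method for each element
```

    Each method is compiled the first time it is called with a given combination of argument types, which is the on-demand specialization the paragraph above describes.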

    2. Modern programming language

    In the world of programming, where COBOL and FORTRAN programs written about half a century ago are still in use, a decade is a short period of time. Consider some of the competing languages: Python (1991), R (1993), S (1976), SPSS (1968), and MATLAB (the late 1970s). None of these languages was developed as a production-ready platform for data science applications. Python started as a general-purpose scripting language that provided easy programming interfaces for developing quick solutions, even for occasional programmers, while R, S, SPSS, and MATLAB provided excellent prototyping infrastructures for researchers. Its first appearance in 2012 gave Julia a new-kid-on-the-block image. As you read along, you will realize Julia is really a smart brat. Maintaining backward compatibility with old systems may not always be practical. Python 3.x had many features that were incompatible with Python 2.x, so migrating was about as difficult for programmers as rewriting in a new language. Julia is modern and built on newer programming paradigms with researchers and data scientists in mind³; yet, it integrates effortlessly with native platforms.

    3. Integration with other platforms

    Good platforms provide great interfaces and APIs for building large, flexible systems. Better ones embrace industry best practices rather than reinventing them. Julia is a perfect example of the second kind. It has native integration with LLVM (compiler infrastructure), OpenBLAS (linear algebra), LAPACK (linear algebra), Intel MKL, OpenLibm, Nvidia CUDA, PCRE (regular expressions), and so on. This ensures that the platform is built on proven technologies. Moreover, Julia provides foreign function interfaces to languages like C, C++, Python, Java, and FORTRAN. If you are used to the flexibility or power of any such libraries, you can continue to use them in Julia. For example, the Julia plotting interface can easily use Python matplotlib or GR as a backend. Packages are continuously being built and maintained for TensorFlow⁴ for machine learning, Cairo⁵ for low-level graphics, Qt⁶ for windowing, Electron JS[⁷,⁸] for HTML-based client applications, and many more. All of these are de facto industry-standard frameworks. Moreover, Julia wrapper packages are being built for commonly used native binary libraries⁹ such as Zlib, OpenSSL, and Qt, and they are provided as binary distributions.

    4. Designed for researchers

    Julia is designed for researchers who would like to see their contributions go into production code rather than be rewritten and redesigned for performance and maintainability. The functional programming paradigm ensures that any program can be easily extended through composition rather than by rigidly adhering to pure object-oriented hierarchies. Data encapsulation is a guideline, and language constructs do not rigidly block access to the underlying attributes of an object. Multiple dispatch provides polymorphic designs as easily as an object-oriented system does. As in most functional programming languages, recursion is encouraged where needed, while iterative programming is fully supported. The lack of native support for tail-call optimization can be considered a minor bottleneck. Julia supports Unicode variable names and native set-theoretic expressions, which help researchers write complete mathematical expressions in Julia. Many researchers in the community develop applications with LaTeX document management systems so that they can present their research in formats exportable for publication. Moreover, some of the work in the Julia community is published in various research publications¹⁰. Interactive modes like IJulia and Jupyter provide easy prototyping alongside the console-based REPL.
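
    A minimal sketch of the Unicode variable names and set-theoretic operators mentioned above. The variable names and values are arbitrary and chosen only for illustration.

```julia
# Unicode identifiers (in the REPL, type \alpha then press TAB, and so on).
α = 0.5
β = 2.0
θ = α * β^2                     # 0.5 * 4.0

A = Set([1, 2, 3])
B = Set([3, 4])

in_A = 2 ∈ A                    # same as in(2, A)
not_in_B = 2 ∉ B                # negated membership
both = A ∩ B                    # same as intersect(A, B)
either = A ∪ B                  # same as union(A, B)
is_subset = Set([1, 2]) ⊆ A     # same as issubset(Set([1, 2]), A)
```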

    5. Metaprogramming

    Metaprogramming is about generating code and executing it dynamically. In some languages like C++, templates are based on types, and the code is generated for the exact data type during compilation. The same exists in Julia in the form of parametric data types. While such generic programs are very useful in an object-oriented programming language like C++, Julia goes a step further: it can build a complete syntax tree of Julia code and execute that tree based on input data. This is an extremely flexible kind of metaprogramming, of the sort supported in languages like Lisp or Scheme. Julia has a strong Lisp heritage and ships with FemtoLisp, which it uses for some of its low-level parsing functionality. This not only makes programs easy to write but also makes functional programming very powerful; the downside is that the code can become too complex for novice programmers.
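
    To make the idea of a syntax tree concrete, here is a small sketch that quotes an expression, inspects it, builds another one by hand, and evaluates it. The expressions are arbitrary examples, not code from the book.

```julia
# Quote an expression instead of evaluating it.
ex = :(a + b * 2)
head = ex.head             # :call
args = ex.args             # Any[:+, :a, :(b * 2)]

# Build a tree programmatically, then evaluate it.
built = Expr(:call, :+, 1, Expr(:call, :*, 3, 2))
value = eval(built)        # evaluates 1 + 3 * 2

# Parse source text into a tree, then evaluate it.
parsed = Meta.parse("10 - 4")
parsed_value = eval(parsed)
```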

    6. Development toolset

    The Julia platform is self-sufficient as far as the entire code generation infrastructure is concerned. You do not have to install anything additional if you are installing the platform for normal development work. However, there are integrations with editors like Emacs, vi, and Sublime¹¹. Fully integrated and managed IDEs are implemented with Atom and Microsoft Visual Studio Code. You can use these IDEs for function navigation, line-by-line debugging, showing interactive plots, and so on. The toolset integrates seamlessly with source control (Git hosting services like GitHub and GitLab), coverage tools (Coveralls, Codecov), continuous integration systems (GitHub Actions), and so on.

    7. Package ecosystem

    Any platform that is not extensible is incomplete in today’s world. Julia understands that need and provides a complete package development and management ecosystem. You can use JuliaHub¹² to search for the packages you need and deploy them in your environment. With 4500+ packages, the number is fairly limited in comparison to other platforms like Python, R, MATLAB, and so on. However, for a language that reached its stable 1.0 interface in August 2018, this number shows the continued interest in enhancing the platform ecosystem. While some functionality may not exist in Julia, it is not hard to find equivalent C/C++ or Java binaries and integrate them with your Julia code using the native interfaces. The Julia package infrastructure is based on git, which makes it easy for people to maintain their own private repositories. Moreover, Pkg, a full package and environment management system, provides self-contained environments so that compatibility across packages is maintained.

    8. Server platforms support

    When you start using Julia, you will realize the system is well suited for cloud-based application development. Julia works in the backend as the computation layer, leaving rendering and connection management to well-established application and web servers. While some web application servers are developed in Julia, they are mostly experimental. Most code written in Julia is kept in source form, amenable to JIT compilation. Thus, statically compiled code is not the most common deployment practice in Julia, although the flexibility exists through certain packages¹³. As a server-based platform, Julia with its collection of packages can be deployed with ease on cloud platforms like AWS and Azure, or be delivered as Docker images.

    9. Open source and flexible licensing

    Julia is developed as a collection of open-source projects. We insist on the term collection of open-source projects because most of the packages used are distributed as open source but may have different licensing than the Julia language itself. The Julia language is developed under the MIT License¹⁴, which provides significant flexibility in using and extending the sources. It also makes it easy for anyone to include Julia, complete or in derived form, as part of other applications they are developing. Most packages developed in Julia follow similar licensing schemes. Make sure you review the licensing terms of the packages¹⁵ before you use them in your commercially licensed products.

    10. Enterprise-friendly architecture

    Just like its licensing, Julia’s architecture is designed for federated enterprise environments. One can use private repositories for applications and packages, deploy on enterprise-centric internal cloud infrastructures, and use packages that integrate with enterprise federated authentication and authorization schemes. The Julia language also maintains major versions like 1.0 as long-term support (LTS) releases to cater to enterprise support requirements¹⁶. Most importantly, Julia is supported by Julia Computing Inc.¹⁷, an enterprise that provides tools, supports Julia, and is actively involved in the community activities of Julia programming. Enterprises interested in tools from Julia Computing can obtain various support plans from them. All founding members of the Julia language are currently employed at Julia Computing.

    11. Active community

    Any open-source software is only as good as the community’s continued interest in it. Starting with just three researchers and their advisor, the team expanded as many academic institutes and researchers showed active interest in the language. The language’s main repository has about a thousand contributors, 30,000+ stars¹⁸ and 4200+ forks¹⁹. Beyond the main language repository, many developers contribute to the 4500+ packages available in the Julia package ecosystem. The repositories are maintained on GitHub. While most communication is carried out through pull requests and issue tracking on GitHub, specific project groups also interact on Discourse, Slack, and GitHub projects. The Julia community conducts JuliaCon²⁰, a conference dedicated to developing and evangelizing the Julia ecosystem. There are many meetup groups where communities interact and exchange ideas. Free and paid courses are available for understanding the ecosystem²¹ better.

    12. Get under the hood

    The most important advantage of working with a relatively new programming language like Julia is the freedom to look into the code and understand the implementation, suggest changes by raising issues, submit PRs, and actively contribute to the environment. More discerning programmers should certainly explore the opportunity to contribute and bring about the changes they desire. For students, contributions to Julia can be funded through Google Summer of Code sponsorship²² projects, thus meeting their professional internship needs. If you are a developer who enjoys getting into the thick of the action, here is a platform that you can work on as well as contribute to. The community is also one of the most welcoming in terms of encouraging changes, even from near-novice programmers. Contributing is also a quick way to learn and understand the system better.

    A brief history

    Julia was conceived about a decade ago, but it came to prominence in 2012 as an open-source project. Started by three computer scientists, Jeff Bezanson, Stefan Karpinski, and Viral B. Shah, under the guidance of Prof. Alan Edelman of MIT, Julia had a very humble beginning. A significant problem they recognized was that researchers first had to develop prototypes and then productize the code in a lower-level, precompiled language to get the extra performance of a compiled language. Julia tries to break that barrier between prototype and production. The authors of Julia chose the name Julia as a tribute to the mathematical concept of Julia sets. They ask that the language be referred to as the Julia programming language or the Julia language. When used as a proper noun, Julia must be treated as gender-neutral. We will respect the same convention in this book.

    The book outline

    The author started working with the system about three years ago, in search of a programming language designed specifically for data science projects. With Julia 1.0 still in development at the time, understanding the language required deeper engagement. Surprisingly, for a language that is just a few years old, quite a few books have been written, and good-quality documentation is already available. However, many books written for pre-1.0 versions are no longer relevant because the interfaces of the language have changed. Moreover, no book provides the exposure to the entire Julia ecosystem that an experienced professional will look for. This book is an attempt to address both these needs.

    The book is composed of 3 modules and 17 chapters.

    Module 1. The Julia language

    This module is focused on understanding the Julia programming language.

    Chapter 1. Getting started

    This chapter states the rationale and history behind the Julia language, walks through the initial installation and configuration process, and showcases some simple examples toward the end of the chapter.

    Chapter 2. Data types

    Julia is an optionally typed language. There is no need to specify a data type in a program; types are inferred automatically during on-demand JIT compilation. Yet Julia provides the flexibility to specify data types. First, this enables behavioral polymorphism in a dynamically typed language; second, dispatch to a specific method becomes easier, which enables faster execution and thus performance gains. Types can be simple types like integers, floats, and Booleans, or user-defined types like structures. They can also be immutable objects like tuples. This chapter takes a quick look at all these data types.
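
    The following sketch shows inferred types next to optional annotations. The type and function names (Point, Counter, origin_distance) are hypothetical and serve only as an illustration.

```julia
# The type is inferred from the literal; no annotation is needed.
ratio = 0.75                      # a Float64

# Optional annotations on a user-defined, immutable structure.
struct Point
    x::Float64
    y::Float64
end

# A mutable structure allows its fields to be reassigned.
mutable struct Counter
    count::Int
end

# The annotation restricts this method to Point arguments.
origin_distance(p::Point) = sqrt(p.x^2 + p.y^2)

p = Point(3, 4)                   # the default constructor converts to Float64
c = Counter(0)
c.count += 1                      # allowed because Counter is mutable

t = (1, "one", 1.0)               # a tuple is immutable
```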

    Chapter 3. Conditions, control flow, and iterations

    Programming is incomplete without a clear-cut control flow of data and instructions. In most cases, the flow is controlled by whether a condition is fulfilled. Sometimes the condition is a simple comparison of a variable with another value; sometimes a block is repeated until a particular condition is met. Control flow logic may also be needed to take remedial action when an exceptional condition occurs. These varied control flows are achieved with specialized language constructs. In Julia, if…else is used for conditional branching, for and while are used for loops, and try…catch is used for exceptional conditions. Similarly, Julia provides task-based control flow for cooperative multitasking and threads for preemptive multitasking in parallel execution.
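
    A compact sketch of these constructs follows. The function names are invented for this example, and the task at the end uses @async as one possible way to start a cooperative task.

```julia
# Conditional branching.
function classify(n::Integer)
    if n < 0
        return "negative"
    elseif n == 0
        return "zero"
    else
        return "positive"
    end
end

# A for loop over a range.
function sum_to(n)
    total = 0
    for i in 1:n
        total += i
    end
    return total
end

# A while loop that repeats until the condition fails.
function halvings(x)
    steps = 0
    while x > 1
        x ÷= 2                # integer division
        steps += 1
    end
    return steps
end

# Exception handling: sqrt throws a DomainError for negative reals.
function safe_sqrt(x)
    try
        return sqrt(x)
    catch err
        err isa DomainError || rethrow()
        return nothing
    end
end

# A task scheduled cooperatively; fetch waits for its result.
t = @async sum_to(100)
task_result = fetch(t)
```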

    Chapter 4. Functions and methods

    If data types are the nouns of communication, then functions play the role of verbs. Functions provide a mechanism for writing small chunks of code that meaningfully accomplish a task; the task can be repetitive or can represent a logically separable concept. Functions can take multiple inputs and return one or more output values as a tuple. The multiple dispatch architecture of Julia allows different implementations of a function to be invoked based on the data types of the input parameters. These implementations are called methods, and they give functions polymorphic behavior. Polymorphic behavior combined with parametric data types enables the implementation of reusable algorithms. Julia does not automatically promote function arguments across data types; instead, it lets programmers define their own type conversion and promotion rules. We will take a look at recursion, which is considered a fundamental method of control flow in many algorithmic tasks and is therefore emphasized in functional languages. Lastly, we will look at anonymous functions, which are similar to the concept of lambda in languages like Lisp. Anonymous functions are used as predicates or callbacks in many functional language applications.
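
    The sketch below touches on multiple return values, recursion, anonymous functions, and a parametric method. The function names (min_max, fact, running_total) are illustrative assumptions rather than examples from the book.

```julia
# Multiple return values come back as a tuple.
function min_max(xs)
    return minimum(xs), maximum(xs)
end

lo, hi = min_max([3, 1, 4, 1, 5])

# Recursion.
fact(n::Integer) = n <= 1 ? one(n) : n * fact(n - 1)

# Anonymous functions used as a predicate and as a callback.
evens = filter(x -> x % 2 == 0, 1:10)
squares = map(x -> x^2, [1, 2, 3])

# A parametric method: reusable for any numeric element type T.
function running_total(xs::Vector{T}) where {T<:Number}
    acc = zero(T)
    out = T[]
    for x in xs
        acc += x
        push!(out, acc)
    end
    return out
end
```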

    Chapter 5. Collections

    First, iterations are carried out on a collection of objects to perform a repetitive task on the elements that fulfill a certain predicate. Second, objects are stored in a collection so that certain operations are as efficient as possible. For example, linear integer-based iteration along a vector is very efficient, while a dictionary is ideal for looking up the value for a given key. Similarly, sets are useful for checking membership. Julia provides certain standard data types like Array, Dict, and Set. While these address the needs of a specific class of computational problems, they are insufficient for the vast majority of problems faced in scientific computing. Hence, Julia provides an iteration architecture that custom data structures can implement. The DataStructures.jl package provides a set of commonly used data structures.
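
    As a sketch of the standard collections and of a custom type that implements the iteration interface, consider the following. Countdown is a hypothetical type invented for this example.

```julia
v = [10, 20, 30]                       # Array (a Vector{Int})
d = Dict("a" => 1, "b" => 2)           # lookup of a value by key
s = Set([1, 2, 3])                     # membership tests

# A hypothetical custom collection that implements the iteration interface.
struct Countdown
    from::Int
end

# iterate returns (item, next_state), or nothing when the iteration is done.
Base.iterate(c::Countdown, state = c.from) =
    state < 1 ? nothing : (state, state - 1)
Base.length(c::Countdown) = max(c.from, 0)
Base.eltype(::Type{Countdown}) = Int

collected = collect(Countdown(3))      # relies on iterate, length, and eltype
```

    Once iterate is defined, generic functions such as sum and for loops work on the custom type without further code.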

    Chapter 6. Arrays

    Programming for mathematics or any scientific discipline is incomplete without matrices and matrix operations. Arrays are the foundation of that concept. Arrays can be one-, two-, or multi-dimensional; one-dimensional arrays are called Vectors and two-dimensional arrays are called Matrices. Arrays can also be sparse: for example, a diagonal matrix has only the diagonal elements filled in. In contrast, dense representations of arrays store every element, most of which may be filled in. It is important to understand the array representations, the indexing schemes that arrays support, how to access a small subset of an array and perform localized operations on it, and the representational aspects that make linear algebra operations easier.
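
    A brief sketch of dense, diagonal, and sparse representations, along with indexing and views, using the LinearAlgebra and SparseArrays standard libraries. The values are arbitrary.

```julia
using LinearAlgebra, SparseArrays

v = [1.0, 2.0, 3.0]              # Vector{Float64}, a one-dimensional Array
M = [1.0 2.0; 3.0 4.0]           # Matrix{Float64}, a two-dimensional Array
T = zeros(2, 3, 4)               # a three-dimensional Array

D = Diagonal([1.0, 2.0, 3.0])    # stores only the diagonal elements
S = sparse([1, 3], [1, 2], [5.0, 7.0], 3, 3)   # stores only the listed entries

col = M[:, 1]                    # indexing with a slice copies the first column
block = @view M[1:2, 2:2]        # a view shares memory with M
block .= 0.0                     # localized, in-place update of M

y = D * v                        # linear algebra works across representations
```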

    Chapter 7. Strings

    Strings are used for the textual presentation of information; when no other structure can represent the data effectively, the data can be presented as strings. Julia strings follow the Unicode specification and are encoded using UTF-8. However, transformations into other encodings can be carried out with ease. Strings can also be considered collections of characters. In this chapter, we will look at methods applicable to strings and their applications. Non-standard strings can be developed for custom applications while continuing to support the String framework. Regular expressions are another important concept, used extensively for text pattern matching. Julia utilizes the PCRE engine for regular expressions. Lastly, we shall see how formatted output can be created for objects when writing to terminals or file systems.
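
    A short sketch of UTF-8 strings, regular expressions, and simple formatting. The sample strings are arbitrary; the accented characters are written as escapes so that the encoding is unambiguous.

```julia
s = "na\u00efve caf\u00e9"       # two of the characters need 2 bytes in UTF-8
nchars = length(s)               # number of characters
nbytes = ncodeunits(s)           # number of UTF-8 code units (bytes)

# Strings behave as collections of characters.
first_char = first(s)
letters = collect("abc")

# Regular expressions use the r"..." literal.
m = match(r"(\d{4})-(\d{2})", "released 2021-10")
year = m.captures[1]

# Simple formatted output.
padded = lpad(string(42), 5, '0')
message = "total = $(1 + 1)"     # string interpolation
```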

    Chapter 8. Metaprogramming

    Metaprogramming is conceptually a mechanism for dynamically generating code that can be run later. This late-binding approach provides flexibility in creating code segments based on the dynamic state of the application. Most such code is written as macros. In this chapter, we will analyze how to write macros. Some macros, like the string literals, have special meanings and aid the compact representation of string-like sequences. We will also review generated functions, which can help create complex non-standard dispatch rules. Lastly, we will look at certain commonly used macros and understand their intent and implementation.
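
    A minimal sketch of a macro and of a non-standard string literal. Both @twice and the shout"..." literal are hypothetical names made up for this example.

```julia
# Hypothetical macro: emits the given expression two times.
macro twice(ex)
    return quote
        $(esc(ex))
        $(esc(ex))
    end
end

counter = Ref(0)
@twice counter[] += 1            # expands to two increments

# A non-standard string literal: shout"..." calls the macro @shout_str.
macro shout_str(s)
    return uppercase(s)
end

loud = shout"hello"
```

    The built-in r"..." regular expression literal follows the same naming convention and is handled by the @r_str macro.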

    Chapter 9. Standard libraries

    Standard libraries are the heart of a programming language. Julia functionality is arranged in various modules; Core, Base, and Main are some of the predefined ones. In this chapter, we will touch upon a few APIs from the Base module and then look at additional modules that make up the standard libraries. Julia has a rich set of predefined functionality for commonly used tasks.

    Module 2. The development environment

    This module familiarizes the reader with the development environment.

    Chapter 10. Programming guidelines in Julia

    Every programming language has certain suggested guidelines for writing programs with better readability, maintainability, and performance. Some are purely style guides suggested by the language development practices; some are design guidelines that help keep the code extensible and maintainable over its complete life cycle; and, lastly, some are guidelines that help in writing high-performance code. We will discuss the first two in this chapter and the last one as we elaborate on performance management in the next chapter.

    Chapter 11. Performance management

    Performance is the key to scientific computing, and for Julia it is one of the fundamental design considerations²³. Hence, there is the utmost focus on ensuring that any code written in Julia can easily be timed and reviewed for anomalies that can lead to a performance penalty. Julia provides general guidelines for performance improvement, along with many tools to analyze performance bottlenecks and carry out benchmarks against standards. In this chapter, we will review many such tools that aid in improving code performance.

    Chapter 12. IDE and debugging

    Integrated development environments (IDEs) are becoming common in today’s application development. Besides providing a wide variety of views of multiple files and their interrelations, they also help in accomplishing tasks like compiling, running the code, and viewing results with visualizations. Julia is a scientific computing language, so the applications developed in it are often algorithm-driven. The REPL is used extensively to try out small chunks of code before they are finally added to the larger application. Debugging line by line with a line debugger is also carried out in some circumstances. We will look at Julia’s integration with editors like Emacs and vi. We will also work with Atom and Microsoft Visual Studio Code for a complete IDE experience.

    Chapter 13. Package management

    Although named package management, this chapter is about code organization for easier management and deployment of applications. Managing code may become complex as it comes to depend on other components. There may be dependency issues due to version incompatibilities. The code needs to be tested automatically on check-in, continuous integration frameworks should be able to run, and any version management needed should be carried out with ease. In Julia, code is typically composed of projects, packages, and modules. Packages are well-defined organizations of code that help manage continuous integration, automated testing, document creation, and deployment with the least effort. There is an inherent link between package management and the git APIs that makes package management part of the development process. Moreover, Julia provides environments that can be used to establish isolated and consistent package sets for specific application requirements.

    Chapter 14. Deployment

    An application is only as good as its deployment in a production environment in the cloud. We will review how effectively Julia can be used to invoke AWS services and deploy solutions using the AWS cloud services. Similarly, Julia can be used easily in the Azure environment. There is also a strong integration framework with Kubernetes.

    Module 3. Packages in Julia

    The last module introduces a few packages that are commonly used in data science and related applications.

    Chapter 15. Data transformations

    Data is central to all data science applications. Data may be available in various file types. We will begin with FileIO, a set of data acquisition APIs that help read data from various file types into Julia. We will also look into PDFIO, a package designed to read PDF files. Further, the data can be read into DataFrames for specific data transformation tasks. We will also see how data can be sampled effectively for various data-centric applications. Finally, we will review plotting in Julia for the visualization of data.

    Chapter 16. Text analytics

    Text Analytics is the science of obtaining meaningful insights from unstructured text data. While natural language processing is a form of text analytics, text analytics can be carried out even with simpler techniques and rules. Simple pattern matching and regular expressions can provide some minimal level of text analytics. TextAnalysis.jl and related packages will be utilized to show some advanced text analysis in Julia.

    Chapter 17. Deep learning

    Neural networks have become the lingua franca of machine learning techniques. The Flux.jl package in Julia provides excellent APIs to model neural networks of various complexities. As part of this chapter, we will delve into how to write simple deep learning models and try to solve some simple problems using neural networks. Most of the models are
