Mastering Java through Biology: A Bioinformatics Project Book
Ebook, 643 pages, 7 hours

About this ebook

Learn programming by solving key problems in bioinformatics and computational biology. You will simulate a neural membrane, align biological sequences, model chemotaxis and genetic drift, and find blood cells in an image.
In the process you will become proficient in Java, including the many new features of Java 8.
This practical book will help you find scientific software online and learn to release usable, high quality code.
Language: English
Publisher: BookBaby
Release date: Jul 22, 2014
ISBN: 9781483534404

    Book preview

    Mastering Java through Biology - Peter Garst

    Mastering Java through Biology

    A bioinformatics project book

    Peter Garst

    Copyright © 2013 by Peter Garst

    All rights reserved

    License Notes

    This ebook is licensed for your personal enjoyment only. This ebook may not be re-sold or given away to other people. If you would like to share this book with another person, please purchase an additional copy for each recipient. If you’re reading this book and did not purchase it, or it was not purchased for your use only, then please return to your favorite ebook retailer and purchase your own copy. Thank you for respecting the hard work of this author.

    Errata and other information about this book are available at www.petergarst.com/JavaBio.

    Cover design by Kit Foster, www.kitfosterdesign.com.

    Please consider leaving a review for this book at your favorite retailer.

    Introduction

    Learning how to program by solving serious scientific problems is much cooler and much more fun than learning how to program by solving little made up examples. With the same time investment you can learn programming, plus a lot of computational biology, plus some algorithm design and other things.

    Furthermore, knowing computing and something else is an excellent place to be in life: computing and physics, computing and chemistry, computing and economics, or, in the case of this book, computing and biology.

    This is a systematic programming book, covering everything you need to know to be a very competent Java programmer. If you master the material here you will be fully ready to start solving problems, and to do more advanced work in software engineering.

    As a biology book it is a sampler. We will do programming projects in many interesting areas of computational biology, such as neural simulations and systems biology. We devote a chapter to each such topic, though any one of them would take a full book or more to explore fully. When you finish this book you will have an overview of, and experience with, key problems in some of the most active areas of computational biology.

    Who this book is for

    This book is for students and professionals in biology and related disciplines who want to use computers to help process and understand their information.

    This book is also for people with established computer skills who want to get a broader view of what they can do with those skills, and learn about some marvelous applications in the biological sciences. Of course, it is also for people who are just starting out in both areas.

    It assumes no previous programming experience, starting at the Hello, World level for the Java programming language, but people who already know some Java can safely skip the introductory material.

    People who took some science in high school should be ready to go on this book. If you know a little bit more, for example a little about genes and proteins and calculus, some parts of the book will be clearer.

    What's different about this book

    It's about more than programming

    Every useful program is about more than software. It's also about finding friends, or predicting movie preferences, or, in the case of computational biology, modeling neural behavior or understanding relations between genes.

    Most programming books focus solely on program structure and in the process trivialize the things the programs are about. In this book we accept a little less organization on the software side, but in return we try hard to make each program we write about something which is evidently interesting and important.

    I believe this approach makes this a superior way to learn programming even if you have only a passing interest in biology. If you are aiming to be a physicist or even a software engineer, learning from this book rather than another will give you greater respect and appreciation for the things you can do with software.

    It's about scientific computing

    This book lays a foundation for scientific programming in any discipline. We learn aspects of software that are basic to many scientific applications, like numerical programming and random number utilities, and we discuss a number of software packages which scientific programmers will find useful.

    We also learn about some fundamental algorithms which are crucial in computational biology and in other disciplines as well.

    It's about how to do things

    I tried hard to include everything you need to know to accomplish useful work. Like every programming book, it teaches you about the language and the structure of programs. It also fills in the many pieces which generally lie between writing a solid program and solving the problem.

    These days the first step in solving a computational problem is often to find someone else who has already done it, and download their solution. We talk about many useful software packages and other resources you can download and use in your work.

    We also discuss testing, documentation, license agreements, and how to release your work. We discuss finding and fixing bottlenecks in your program. My goal is that people who finish this material should be ready to get things done.

    Computational biology

    Biologists are confronted by a tidal wave of information. Unfortunately, few of them know how to swim.

    Economist, June 24, 1999.

    Biology is becoming an information intensive discipline and a systems oriented discipline.

    The best known example of information intensive biology is the human genome project. The human genome, printed out, would fill thousands of volumes. There is no way to make sense of it without computational tools. Here are a few of the ways computer programs help.

    Computers store and retrieve sequence information. You can pick some gene, say BRCA2, one of the breast cancer susceptibility genes, and quickly get its DNA sequence from any internet connected computer. This alone represents a large investment in technology and an important tool for scientists.

    Genes, the parts of the DNA that encode proteins, actually make up a small fraction of the DNA. Programs help distinguish genes from the much larger volume of noncoding DNA. There are systematic statistical differences between the genes and the noncoding DNA, and computer programs can learn those differences and attempt to classify unknown segments of DNA.

    Biologists reconstruct evolutionary trees by measuring the similarity of genes across species. They use computer programs to calculate similarity.

    Biologists are very interested in what parts of the genome are preserved across many species, and which parts are changing, and computer programs help search for this information. A highly preserved section may indicate a critical biological function. The human gene KCNJ3, which codes for a protein used in a potassium channel, critical for neural signalling and other functions, has a 96% match with the comparable gene in the Coelacanth, a primitive fish. Rapidly changing genes are also interesting. Human genes relating to digestion and brain structure have been evolving rapidly in recent history.
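
    To make the idea of a percentage match concrete, here is a minimal sketch of one simple similarity measure: the share of positions at which two sequences carry the same letter. It assumes the sequences are already lined up and have equal length, which real genes rarely do. The class name, the method name and the short DNA fragments are made up for illustration, and this is not necessarily how a figure like the 96% above was computed.

```java
public class PercentIdentity {

    /**
     * Returns the percentage of positions at which two
     * already-aligned, equal-length sequences agree.
     */
    static double percentIdentity(String a, String b) {
        if (a.length() != b.length() || a.isEmpty()) {
            throw new IllegalArgumentException(
                "sequences must be non-empty and of equal length");
        }
        int matches = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) == b.charAt(i)) {
                matches++;
            }
        }
        return 100.0 * matches / a.length();
    }

    public static void main(String[] args) {
        // Made-up fragments, not real gene data.
        String first  = "ATGGCCATTG";
        String second = "ATGGCGATTG";
        // The fragments differ at one of ten positions.
        System.out.println(percentIdentity(first, second));
    }
}
```

    Real comparisons must also cope with inserted and deleted sections, which is the sequence alignment problem we take up later in the book.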

    Computational tools are very important in other areas of biology also. For example, the three dimensional structure of a protein or RNA molecule is critical to its function. Predicting the three dimensional structure from the sequence information is an important unsolved problem in biology. Many groups are working on computer programs to help make these predictions.

    Scientists use computer programs to simulate neural systems. A simulation tests our understanding of how a single cell works. If we write a computer program modeling a neuron, and the program predicts the same behavior under a specific set of conditions that the bench scientist observes in the lab, then in some sense we can claim to understand it. This line of work goes back to Hodgkin and Huxley, who used pencils and paper and desk calculators to work out a computational model of the giant axon of the squid. Scientists are working on programs to simulate large systems of neurons to try to understand how the properties of brains are built out of the properties of neurons.

    Systems biology is another area where computers help biologists. Cells have very complex regulatory networks for their genes and proteins. An external signal may activate a protein that binds to a gene regulator so that another protein is synthesized, and so on and on. Systems biology studies these interlocking systems, in effect studying how individual molecular interactions lead to the behavior of the cell as a whole.

    Ecology and population biology often use computer simulations to test understanding of natural systems, and to predict their future course. Many of these systems are too big for lab experiments. Computer models also have great practical value, for example in predicting how different actions may affect wildlife populations.

    Many lab instruments include dedicated computers which help run them, interpret the results and communicate them to the scientist or another computer. These are called embedded systems.

    This is a sampling of the most important areas where computers are helping biologists. These are the kinds of problems this book prepares you to tackle.

    Programming is something you learn by doing. This book includes many exercises and projects, and to master the subject you should spend more time doing those than you do reading the book.

    Most chapters in this book have a scientific topic and a programming topic or topics. The chapter title describes the scientific subject and the subtitle the programming subjects.

    This book has sample code at https://github.com/pgarst/JavaBio. It includes all the code used in this book in downloadable form, which you may want to use as a starting point for your own projects. It also includes solutions for many of the exercises and projects, but you should always attempt your own solution before peeking.

    You should read this book at the computer for the most part. On almost every page you will want to check the online documentation or try a small example of a new feature of Java.

    Programming involves many picky details, but avoid getting lost in them. The essence of programming isn’t making sure you get all the semicolons in the right places, but in feeling out the structure of problems, and finding robust, usable ways to solve them.

    You can learn Java from this book without any particular background, but a solid high school level science background will help. Each chapter includes brief notes on the science used in the chapter, with pointers to more detailed information. There are occasional advanced notes which put some topic in context for those with a somewhat deeper background.

    This book should be excellent background for more advanced work in computer science or computational biology. Compared to other Java books this one puts more emphasis on topics useful in scientific and technical computing, such as numerical methods, simulations, random number facilities and available repositories of scientific software.

    A program is a list of instructions telling the computer precisely what to do, written in a computer language. Java is one among many languages you can use to write that list of instructions. Each language has very specific rules and structures.

    Here is one statement in Java:

    System.out.println("Darwin rules!");

    Here is the same instruction in Python:

    print("Darwin rules!")

    In Perl:

    print "Darwin rules!\n";

    In C++ (pronounced c plus plus):

    cout << "Darwin rules!" << endl;

    Why bother with all these languages? Why doesn't everybody just use one language? Most people would agree that there are more computer languages than are strictly necessary, but like animals they evolve and multiply. Dennis Ritchie designed a programming language called C at Bell Labs in 1972, which was a successor to a programming language called B. It was well designed and became widely used. Starting in 1979 Bjarne Stroustrup at Bell Labs thought about C, what worked well and what could be improved, and C++ was born. In 1995 James Gosling at Sun Microsystems did this again. He thought about C++ and how it could be improved, and developed Java.

    The process still continues. Scala is a more powerful successor language to Java which is generating a lot of buzz. In ten years people in your position might be reading Scala for Bioinformatics, or might be reading about some other language.

    Java programs are platform independent: they run on Windows, Macs and Linux machines, and others as well. After chapter 2, where we walk you through installing the tools you will need to program in Java, the rest of the book works equally well for different platforms with only very minor customizations. The other major computer languages used for biology have followed this example, but for earlier languages making programs available on different machines could be a huge pain in the neck.

    Different languages have different strengths. Some produce faster programs; some are easier to use to quickly throw a small program together; some include resources for building big software systems. There are special purpose languages designed for writing programs that might run on thousands of computers at once.

    There are three languages in wide use in biology: Java, Python and Perl. Perl was adopted early on, and so there is a lot of biological material in that language, but both Java and Python are better languages, and I’d recommend using one of them if you have a choice. Your choice might depend on what people in your lab use, or the language of some application you want to work with.

    Biologists also use many other languages, albeit less commonly.

    Let's list some of the good and bad points of Java relative to Python. The relative merits of computer languages can be a religious issue, and some people will disagree with these claims.

    Java is more widely used in software engineering than Python, so it would be a good choice if you are thinking of studying more computer science.

    In many cases Java programs run significantly faster than Python programs (but slower than C++ programs).

    Java is a strongly typed language. This means you have some extra work up front to more carefully specify your program elements, but in return your program should be more reliable and easier to maintain. My view is that this is a disadvantage for very small programs but a major advantage for large programs, especially if several people work on them.

    Java is well suited to large, complex applications, including web based programs, and programs that have many contributors.

    Java has more overhead. The smallest Java program is bigger and more complex than the smallest Perl or Python program.

    In some cases these considerations will incline you for or against Java, but for most biological software they are not decisive, and either would serve well.

    My opinion is that using a well structured, highly disciplined language like Java pays substantial dividends in producing higher quality code, and for a beginning programmer also provides a much firmer understanding of how software works.
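
    As a small illustration of what strong typing means in practice, consider the sketch below. The class and method names are hypothetical, chosen only for this example. Every variable, parameter and return value has a declared type, and the compiler checks that they fit together before the program ever runs.

```java
public class TypedCounts {

    /** Counts how often one base occurs in a DNA string. */
    static int countBase(String dna, char base) {
        int count = 0;
        for (int i = 0; i < dna.length(); i++) {
            if (dna.charAt(i) == base) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int guanines = countBase("GATTACA", 'G');
        System.out.println(guanines);

        // The compiler rejects each of these lines, because
        // the types do not match the declaration above:
        // int wrong = countBase("GATTACA", "G");
        // String text = countBase("GATTACA", 'G');
    }
}
```

    The extra work is in writing out String, char and int. The payoff is that the two mistakes in the comments are caught when you compile, rather than surfacing later as puzzling behavior in a running program.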

    Learning to solve problems with programs is more than learning about Java or some other language. For example, suppose we want to write a program to align two protein sequences. We may have the amino acid sequence for a human protein and for a corresponding mouse protein. We want to find the corresponding parts of the two proteins, and places where sections have been inserted or deleted in one relative to the other. We will write this program later in the book.

    There are three main elements we need to solve this problem. One is the programming language, Java in our case.

    The second main element is data structures: we need effective ways to represent the information. For this problem, we need to represent protein sequences, and we need to represent an alignment between two sequences. What would that look like? Perhaps a list of units, each of which might say something like a specific section of the mouse protein is deleted in the human protein?

    We will see that Java puts a number of effective data structures at our disposal, with names like sets, trees, hash tables, and so on. These structures are available in any modern computer language, and learning how to use them is more general than learning Java. A good programmer will spot the known underlying structure in her data representation problem, and work to adapt and modify it for the specific case.
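
    Here is one way the list of units idea might look in Java. This is only a sketch under assumptions of my own: the names Kind, Unit and mouseLength are hypothetical, the lengths are invented, and the representation we develop later in the book may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class AlignmentSketch {

    /** What one stretch of the alignment says. */
    enum Kind { MATCH, DELETED_IN_HUMAN, INSERTED_IN_HUMAN }

    /** One unit: a kind plus the number of residues covered. */
    static class Unit {
        final Kind kind;
        final int length;

        Unit(Kind kind, int length) {
            this.kind = kind;
            this.length = length;
        }
    }

    /** Number of mouse residues the units account for. */
    static int mouseLength(List<Unit> units) {
        int total = 0;
        for (Unit u : units) {
            // A section inserted in the human protein has no
            // counterpart in the mouse protein.
            if (u.kind != Kind.INSERTED_IN_HUMAN) {
                total += u.length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Invented numbers, for illustration only.
        List<Unit> alignment = new ArrayList<>();
        alignment.add(new Unit(Kind.MATCH, 40));
        alignment.add(new Unit(Kind.DELETED_IN_HUMAN, 5));
        alignment.add(new Unit(Kind.MATCH, 25));
        System.out.println(mouseLength(alignment));
    }
}
```

    The underlying structure here is a plain list, one of the standard structures Java provides. The part specific to our problem is the small Unit class that says what each list entry means.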

    The third element is algorithms. Given two sequences, how do we go about finding the sections we want to align? What steps do we take? What do we do first? Just as there are common data structures which are the common property of programmers, there are common algorithms which provide useful ways to approach many problems.

    As you work on programming you will find that thinking of data structures and algorithms as elements separate from the language will help you organize your thoughts, and will also make it easier for you to pick up other languages in the future.
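
    To give a first taste of what an algorithm looks like when written down, here is a deliberately naive sketch. It slides a short sequence along a longer one, counts matching letters at each position, and reports the position with the most matches. It cannot handle inserted or deleted sections, so it is not the alignment method we develop later; the names and sequences are made up for illustration, and it assumes the second sequence is no longer than the first.

```java
public class BestOffset {

    /** Counts matching letters with 'shorter' laid at 'offset'. */
    static int matchesAt(String longer, String shorter,
                         int offset) {
        int matches = 0;
        for (int i = 0; i < shorter.length(); i++) {
            if (longer.charAt(offset + i) == shorter.charAt(i)) {
                matches++;
            }
        }
        return matches;
    }

    /** Tries every offset; returns the first with most matches. */
    static int bestOffset(String longer, String shorter) {
        int best = 0;
        int bestMatches = -1;
        int last = longer.length() - shorter.length();
        for (int off = 0; off <= last; off++) {
            int m = matchesAt(longer, shorter, off);
            if (m > bestMatches) {
                bestMatches = m;
                best = off;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sequences, for illustration only.
        System.out.println(bestOffset("GGATTACAGG", "TTAGA"));
    }
}
```

    Notice that the steps (try each position, score it, remember the best) could be written in any language. That is the sense in which the algorithm is separate from Java.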

    Here's what's in store for you.

    Chapter 2 helps you set up a Java development environment. You will download and install a number of tools, and at the end of the chapter you'll be able to write, debug and run a very small Java program.

    Chapter 3 covers the basics of the language: what a Java program looks like, and how it handles numbers, strings and other basic elements. By the end of this chapter you'll be able to write a program that summarizes experimental data and prints summary statistics.

    Chapter 4 introduces classes and objects, which are the real foundation of Java. The chapter is based on a project which simulates the regulation of a single gene.

    Chapter 5 provides more practice with classes and objects, and also teaches you how to read and write files and use data structures to organize your information. This chapter develops a simulation of gene regulation networks.

    Chapter 6 is about inheritance, but in the Java sense, not the biological sense. This is all about how different classes are related to each other, and how you can write code for the common parts of different things just once. In this chapter we write a model of a neural membrane.

    Chapter 7 is about biological sequences and how they are represented in files. We will download and install software to help work with sequences.

    Chapter 8 discusses web services: how you can write programs to pull biological sequence data and other information off of servers from around the world.

    Chapter 9 provides more practice with files and data structures, and also teaches you how to use random number tools. This chapter is based on a typing monkey program which uses statistical constraints on random sequences. If the monkey is using a regular English typewriter you might get Hamlet; if it is using a four letter typewriter you might get Shakespeare's genome.

    Chapter 10 works with sequence alignment algorithms. We will learn how to analyze algorithms and make them more efficient.

    Chapter 11 teaches you how to build graphical user interfaces, the interfaces you are used to on most programs, with buttons, text fields, menus and so on. We develop a genetic drift simulation which shows how gene frequencies vary randomly in a population over time.

    Chapter 12 discusses threading, a technique which allows your program to do several things at once. We apply it to a simulation of bacterial chemotaxis.

    Chapter 13 covers reflection, which allows your Java program to examine itself. We use it to build a plugin for the ImageJ image processing system.

    Chapter 14 provides a selection of projects you can try to explore other areas of computational biology. These are more ambitious and open ended than the projects in previous chapters.

    The appendix provides scientific reference information used throughout the book.

    Your Java environment

    When you write a Java program you create the source code, consisting of one or more text files containing Java language instructions. Most of this book teaches you what to put in those files to solve your problem as efficiently and effectively as possible.

    In this chapter we mostly ignore the source code and focus on the larger programming context. What steps are required to actually develop and run your program? What tools are available to help you? We will outline the tasks required over the life of the program, and recommend tools for tackling them.

    By the end of this chapter you will have your computer set up to run the sample programs and write your own programs.

    We'll start by installing the main tool we will use throughout this book.

    You have many, many choices for the tools you use to work with Java. This section guides you through downloading and installing NetBeans, a good, free IDE (integrated development environment). Later we discuss some of the alternatives to consider. The NetBeans web site at https://www.netbeans.org has much more information about the system.

    The main criterion in choosing a development environment is whether you find it to be productive: whether it fits your working style and makes software development as easy as possible. There are also more down to earth criteria, and we'll list the reasons we chose NetBeans for this book so you can see the sorts of considerations that come into play.

    NetBeans works on all three major platforms: Windows, Macs and Linux.

    It is relatively easy to install. You can download a single file with everything you need.

    It is free.

    An IDE like NetBeans is an easier way to get started than separate editors and debuggers.

    NetBeans includes a GUI builder.

    A GUI is a graphical user interface, the set of buttons, fields, menus and so on a program presents to the user, if it is not one you run from the command line. You will get practice designing and building one later in this book. A GUI builder allows you to construct your interface by dragging a button to the place you want it in your interface, for example. Without this ability coding all the details about which controls go where can be quite tedious.

    To follow the steps in this section you must have permission to install applications on your computer. If you don't, for example if you are using a computer owned by a school or college, you will need to talk to the system administrator about installing Java development tools if they are not already available.

    The most convenient download location for NetBeans and basic Java tools is http://www.oracle.com/technetwork/java/javase/downloads/index.html. If you download the JDK and NetBeans bundle you will have current versions of everything you need to write and run Java programs. JDK stands for Java Development Kit. If you have an older version of Java installed (and this includes you if you have a Mac with the standard Apple version of Java) then I recommend getting the bundle.

    You can also download NetBeans from http://www.netbeans.org/downloads/index.html if you are sure you already have a current Java system on your machine. The web site should give you a version suitable for your machine. The examples and images in this book are based on NetBeans 7.4, but by the time you read this there may be a more recent version available.

    Note that download locations and other installation details may change as new versions of these systems become available. If you do not find what you need at these addresses, look at the main NetBeans web site, or check this book's web site for recent changes.

    Run the installer after the download is complete. Detailed instructions are available at http://netbeans.org/community/releases/80/install.html, or look in the community section of the NetBeans web site for a more recent release.

    Before you start writing programs and using data we need to talk about where to put everything. This includes all the programs and data available for download from the book’s web site, and also the programs you will write. Here is one suggestion for how to organize things. The terminology differs somewhat on different platforms, but the plan is the same.

    Download the JavaBio.zip file from the book’s source repository at https://github.com/pgarst/JavaBio and extract all the files into some convenient location. This will create a new JavaBio directory (or folder, for those on Windows) to contain all your work related to this book. Inside you will find a directory called Programs containing all the sample programs for the book. There will be another directory called MyWork, and inside that are directories called Chapter2, Chapter3 and so on. These are empty containers for all the great work you are going to do while reading this book.

    There are corresponding directories with names like Programs/Chapter2 which contain the completed programs we use in this book.

    Here is approximately how this area is laid out:

    JavaBio/
        Data/              Data for use in the projects
        MyWork/
            Chapter2/      Put your chapter 2 work here
            ...
        Programs/
            Chapter2/      Sample code for chapter 2
            ...
        Solutions/
        Tools/

    Many of the illustrations in this book are in color, and you will have a better experience on a compatible device.

    You will have the best reading experience if your ebook reader displays full lines of the code samples without wrapping. The code samples use up to 59 characters per line, as in the following, which should fit well for anything but quite small readers or quite large font sizes.

    1234567890123456789012345678901234567890123456789012345678X

    If you can see the whole line without wrapping, including the X at the end, then you can skip to the next section. The code in the book is sometimes a condensed version of the code in the download.

    If you do not see the whole line, consider these options:

    If you are using a small device, like a phone, try using a larger device or reading it on your computer.

    Switch from portrait to landscape mode.

    If you have a double page display, switch to a single page display. Make it wider if you can.

    Use a smaller font size.

    Read the code in NetBeans rather than in the ebook.

    Time for your first program! We’ll follow the same basic steps to start a new program many times.

    Start NetBeans following the appropriate steps for your platform. On Mac and Windows you should have a desktop icon after installation. On Linux you can just type netbeans in a command window.

    Once you have created and run one program, many of the dialogs and much of the boilerplate will be familiar, and in our future discussions we'll skip that material and just look at the new bits. For this first outing, though, we'll look at all the details and explain what they mean. For later projects you can look back here to remind yourself of the basic steps; or if you have done this before you may want to skim this part.

    NetBeans organizes your work into projects. All the elements related to one program will go into one project. Later we will see how to share material between projects.

    Choose the File menu, and then choose the New Project item. You can also use the New Project icon in the toolbar; the result will be the same. A new project wizard like Figure 2-1 will open, although the details of the look and the options available will depend on your system and NetBeans version.

    Figure 2-1. New project dialog

    Choose Java as the category in the first column, and Java Application as the project type in the second column, as shown in Figure 2-1. Then push the Next button to get the dialog shown in Figure 2-2.

    Figure 2-2. Choose project name and location

    On the next screen, enter Darwin in the project name box. Enter the place you want to put this program in the project location box. The location shown in Figure 2-2 is consistent with the suggested layout above, for my own Linux machine with the root of the JavaBio material in the directory /windows/C/jbook3. It will be a little different for you, depending on your computer system and where you chose to put the downloaded material.

    After filling in those two fields you can let the other elements of the dialog take the default values.

    Click the Finish button. NetBeans will give you a file with an empty Java program: that is, a program that does nothing.

    After you click the Finish button, the NetBeans window should look something like Figure 2-3.

    Figure 2-3. The NetBeans interface

    The section of the interface marked 1 is the toolbar. You can use the icons there to quickly access some of the most frequently used actions. Run your mouse over the icons to see what they do.

    Section 2 shows the structure of your projects. There is just one project now, called Darwin, and at the innermost level just one source file called Darwin.java. Later our projects will be more complex.

    Section 3 is an editor. In Figure 2-3 it shows the source code file generated by NetBeans. You will use the editor here to modify the file, and in future projects to write all your programs.

    Section 4 shows the logical structure of your program elements. Currently the only thing to show is the main method. In more complex programs this panel will help you navigate in your program.

    As you use NetBeans you will become familiar with the other elements of the interface. It is a complex tool, but your work will be much more efficient if you learn to use it well.

    Now look at the complete program listing shown in Listing 2-1, taken from the NetBeans interface in Figure 2-3.

    Listing 2-1. Darwin/src/darwin/Darwin.java

    /*
     * To change this template, choose Tools | Templates
     * and open the template in the editor.
     */
    package darwin;

    /**
     *
     * @author peterg
     */
    public class Darwin {

        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) {
            // TODO code application logic here
        }
    }

    Observe that NetBeans colors different parts of the file differently. The grayed out parts, the TODO line and some other parts of the file, are comments. These are just for your information; the compiler ignores them. Any time you want to make a note to yourself you can use one.

    The boilerplate file produced by NetBeans has four comments. The first is part of the file template NetBeans uses, and as it says you can change it if you wish. You could put a copyright notice there, for example.

    The next two comments, which begin with /**, are part of the Javadoc documentation system which we will discuss in chapter 6.

    The final TODO comment is in the section of code called the main method. Every Java application program you write will have one. We have discussed how a program is a list of instructions. When you run the program, the Java virtual machine looks in the braces of the main method and starts executing the instructions it finds there. Because there are no instructions there (just a comment, which it ignores) this program does nothing yet.

    The tab which shows this file is a text editor. Use it to enter a new line, replacing the TODO comment line:

    System.out.println("Darwin rules!");

    As you type you'll notice NetBeans providing suggestions for things you might want to enter, and you'll notice a red error symbol on the left until you finish the line and type the semicolon. Java treats upper and lower case as different, so be sure to match the case when you enter this.

    Click the run icon, the green triangle in the row of icons at the top of the window; its tooltip reads Run Main Project.

    You'll see the results in the Output window below the editor window.

    Congratulations! You're a programmer.

    We'll shortly discuss the other parts of this file, the package line and the public class line. For now you can ignore them.
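    For reference, the edited program as a whole might look like the sketch below. It makes two assumptions that differ from Listing 2-1: the package line and the public modifier are left out so that the snippet compiles as a single stand-alone file whatever the file is called. The template comments are also dropped for brevity.

```java
// Sketch of the Darwin program after replacing the TODO comment.
// Assumption: "package darwin;" and "public" are omitted here so the
// file compiles on its own; NetBeans keeps both in the generated file.
class Darwin {

    /**
     * @param args the command line arguments (unused here)
     */
    public static void main(String[] args) {
        // The JVM starts here and runs the statements in order.
        System.out.println("Darwin rules!");
    }
}
```

    Running it prints the single line Darwin rules! to the Output window.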

    This program is a break with tradition. The first program in a new language usually prints "Hello, world", although I know one depressive programmer who starts each new language with a "goodbye, world" program. James Iry claims (Iry) that in 1801 Joseph Marie Jacquard used his punch-card-controlled loom, one of the early precursors of modern computers, to weave "hello, world" into a tapestry, but I think he made that up.

    See http://en.wikipedia.org/wiki/Hello_world_program for more about this tradition.

    Think about the underlying steps here.

    NetBeans created a text file for us, containing Java source code.

    We edited the source code to add our instructions. NetBeans checks the code as we write, and flags any problems it detects.

    When we pushed the run button NetBeans started the Java virtual machine, which we'll discuss below, and used it to run our program instructions.

    Thus our program ran and printed its output.

    Look through the files created by NetBeans in your project location. There are lots, some of which we'll explore in more detail. The important one for now is Darwin/src/darwin/Darwin.java, which is the source file for your program. Java source files always use the .java extension.

    This basic approach will take us through our initial study of Java. We will create new projects, put some Java instructions into the main method, and let it rip.

    Let's step back and think about what just happened.

    Your computer does not know how to run Java directly. The stuff the machine understands is called variously object code, machine code or binary code, in contrast to the source code you originally wrote, and it makes very little sense to us. Your original program might have included an algebraic expression like

    a*x + b

    (where * means multiplication). Before the computer can find the value of this expression in your program, it needs to understand it as a series of individual operations expressed in machine code. The low level instructions might be something like

    movl -20(%rbp), %eax
    imull -24(%rbp), %eax
    addl -28(%rbp), %eax

    This is actually assembly language, which is one step up from machine code.

    The first programmers wrote code like this, but Java and other higher level languages allow you to write something much closer to a human readable description of what you want the computer to do.
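    As a minimal sketch of the higher level version, here is the same expression written in Java. The class name, method name and sample values are illustrative assumptions, not taken from the book's projects.

```java
// The expression a*x + b written in Java.
// LinearExpression, evaluate and the sample values are hypothetical.
class LinearExpression {

    // Multiplication is done before addition, as in ordinary algebra.
    static int evaluate(int a, int x, int b) {
        return a * x + b;
    }

    public static void main(String[] args) {
        int a = 3;
        int x = 4;
        int b = 5;
        System.out.println(evaluate(a, x, b)); // prints 17
    }
}
```

    The single line a * x + b stands in for the load, multiply and add steps shown in the assembly listing; working out those steps is left to the tools rather than to you.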

    Different languages take different approaches to bridging the gap between the code you write and the object code the computer runs.

    Some languages are compiled. Examples are C and C++. A program called a compiler translates your instructions into the low level code that the machine actually runs. Using a compiled language involves these steps:

    Write program instructions in your chosen language.

    Run the compiler to translate your source code to object code.

    Your computer can then run the object code directly.

    There are advantages and disadvantages to compiled languages. The programs can be fast, and once compilation is done, running the program is simple. On the other hand, the object code is not portable. If you want people to run your program on Windows, Mac and Linux machines, for example, you have to compile it on each of those platforms, and then distribute separate copies of the object code to people who use the different platforms.

    There is another class of languages called interpreted languages. Python and Perl are two interpreted languages widely used in biology. For these languages you run a program called an interpreter which reads your program source code and makes the computer do what it says, without translating it into object code first. This is simple: just write your code, start the interpreter and run it.

    There are also advantages and disadvantages to interpreted languages. Programs written in interpreted languages may be slower than those written in compiled languages. For example, when you use an algebraic expression in an interpreted language the interpreter needs to figure out the structure of the expression every time you run the program; in a compiled language the compiler would do it once, before the first time you run the program. In most cases computers are so fast that speed isn't a problem, but there are a few cases in which interpreted languages are too slow. One major advantage of interpreted languages is that they are portable. You can send the same program out to your users without caring what kind of machine they use. This assumes that there is an interpreter for the language available for that machine, but for the widely used languages we're thinking about that won't be a problem.
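    The difference between working out an expression's structure every time and working it out once can be sketched with a toy evaluator. This is only an illustration of the point above, under simplifying assumptions: it handles nothing but sums of products of names and non-negative integers, all the names in it (ParseOnceDemo, Expr, compile, interpret) are hypothetical, and it is not how Python, Perl or Java are actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Toy evaluator for expressions such as "a*x + b".
class ParseOnceDemo {

    // The analysed form of an expression: something we can evaluate repeatedly.
    interface Expr {
        int eval(Map<String, Integer> env);
    }

    private final String src;
    private int pos = 0;

    private ParseOnceDemo(String source) {
        this.src = source.replace(" ", "");
    }

    // "Compiled" style: analyse the text once and return a reusable Expr.
    static Expr compile(String source) {
        ParseOnceDemo parser = new ParseOnceDemo(source);
        Expr result = parser.sum();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Unexpected input in: " + source);
        }
        return result;
    }

    // "Interpreted" style: analyse the text again on every evaluation.
    static int interpret(String source, Map<String, Integer> env) {
        return compile(source).eval(env);
    }

    // sum := product ('+' product)*
    private Expr sum() {
        Expr left = product();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++;
            Expr l = left;
            Expr r = product();
            left = env -> l.eval(env) + r.eval(env);
        }
        return left;
    }

    // product := operand ('*' operand)*
    private Expr product() {
        Expr left = operand();
        while (pos < src.length() && src.charAt(pos) == '*') {
            pos++;
            Expr l = left;
            Expr r = operand();
            left = env -> l.eval(env) * r.eval(env);
        }
        return left;
    }

    // operand := a variable name or a non-negative integer
    private Expr operand() {
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a name or number in: " + src);
        }
        String token = src.substring(start, pos);
        if (Character.isDigit(token.charAt(0))) {
            int value = Integer.parseInt(token);
            return env -> value;
        }
        return env -> env.get(token);
    }

    public static void main(String[] args) {
        Map<String, Integer> env = new HashMap<>();
        env.put("a", 3);
        env.put("x", 4);
        env.put("b", 5);

        Expr compiled = compile("a*x + b");               // structure worked out once
        System.out.println(compiled.eval(env));           // prints 17
        System.out.println(interpret("a*x + b", env));    // prints 17, after re-analysing the text
    }
}
```

    Both calls give the same answer. The difference is that interpret repeats the analysis of the text on each call, while the Expr returned by compile can be evaluated as often as you like without looking at the text again.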

    Java is halfway between these two groups. After writing your Java source code, you use the Java compiler to translate it into a form called byte code (sometimes called object code, even though it does not run directly on a machine as compiled language object code does). Then you use an interpreter, called the Java virtual machine or JVM, to run the byte code. In return for the extra complexity of using both a compiler and an interpreter you get the main advantages of each group: a Java program is generally almost as fast as a compiled program, but it is still portable between different machines.

    The name Java virtual machine tells you how to think about this arrangement. Java byte code is the machine language not for a real machine, like a Linux x86 computer, but for an abstract machine, a virtual machine, which is created by the JVM program. This abstract machine can sit on top of different real machines, and thus provide portability.
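    One way to see the virtual machine sitting on top of a real machine is to ask it where it is running. The sketch below is an illustration with a hypothetical class name (WhereAmI); it relies only on the standard system properties os.name, os.arch and java.vm.name. The comments assume the standard javac and java command line tools, which NetBeans normally runs for you.

```java
// Reports which real machine and which JVM this byte code is running on.
// Outside NetBeans you could compile it with:  javac WhereAmI.java
// which produces the byte code file WhereAmI.class, and run it with:  java WhereAmI
class WhereAmI {

    // Builds a one-line description from standard system properties.
    static String describe() {
        return "OS: " + System.getProperty("os.name")
                + " (" + System.getProperty("os.arch") + ")"
                + ", JVM: " + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

    The same WhereAmI.class file should report different operating systems when copied to different computers, because only the JVM underneath it changes.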

    The bird's eye view of software development goes like this. You have a problem to solve, and you come up with an approach that you think will work.

    You write Java source code to implement your solution. You could use whatever program you use for writing text on your computer, perhaps Microsoft Word or Open Office, but you really don't want to do that. A good programmer's editor provides a lot of help by showing you the structure of your program as you type, finding matching brackets, and helping in many other ways.

    Next you run the Java compiler to translate your source code into byte code. Most of the time this will fail at first: the compiler will find and report problems with your program. Go back to the editor and fix these until you can get
