4  Introducing Python and the Jupyter notebook

This chapter introduces you to the technology we will use throughout the book. By technology, we mean two things: the Python programming language, and the Jupyter notebook.

Using Python on the web and on your computer

In this chapter, we concentrate on the technology we use for the interactive version of the book. The interactive version allows you to run Python code as interactive notebooks in your web browser.

Either now, or later, you should also consider running the code on your own computer — see Section 4.10.

The chapter describes Python and its packages, and then works through an example notebook that shows you the basics of Python and the Jupyter Notebook. If you have not used Python before, the example notebook will get you started. The example also shows how we will be using notebooks through the rest of the book.

4.1 Python and its packages

This version of the book uses the Python[1] programming language to implement resampling algorithms.

Python is a programming language that can be used for many tasks. It is a popular language for teaching, and it has also become a standard language in industry and academia. It is one of the most widely used programming languages in the world, and the most popular language for data science.

For many of the initial examples, we will also be using the Numpy[2] package for Python. A Python package is a library of Python code and data. Numpy is a package that makes it easier to work with sequences of data values, such as sequences of numbers. These are typical in probability and statistics.
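As a first taste of Numpy, here is a minimal sketch. It is not code from later chapters. It stores the two meal prices from the example later in this chapter in a Numpy array, and then works with the whole sequence at once. The variable names prices, total and tips are our own, chosen for illustration.

```python
# Import Numpy, using the conventional short name np.
import numpy as np

# A Numpy array: a sequence of numbers, here two meal prices in pounds.
prices = np.array([10.50, 9.25])

# Numpy functions and operators work on the whole sequence at once.
total = np.sum(prices)  # add up all the values in the array
tips = prices * 0.15    # multiply every value in the array by 0.15

print(total)  # 19.75
print(tips)
```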

Later, we will be using the Matplotlib[3] package. This is the main Python package for producing plots, such as bar charts, histograms, and scatter plots. See the rest of the book for more details on these plots.

Still further on in the book, we will use more specialized libraries for data manipulation and analysis. Pandas[4] is the standard Python package for loading data files and working with data tables. Scipy[5] is a package that houses a wide range of numerical routines, including some simple statistical methods. The Statsmodels[6] package has code for many more statistical procedures. We will find ourselves comparing the results of our own resampling algorithms to those in Scipy and Statsmodels.
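To give a feel for a data table, here is a minimal sketch that builds a small Pandas DataFrame by hand. The column names are our own invention, and the prices come from the example later in this chapter. In practice you would usually load a table like this from a data file.

```python
import pandas as pd

# A small data table: one row per diner, one column per kind of information.
meals = pd.DataFrame({
    "diner": ["Alex", "Billie"],
    "dish": ["fish", "chicken"],
    "price": [10.50, 9.25],
})
print(meals)

# Select the "price" column, and add up its values.
print(meals["price"].sum())
```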

It is very important that Python is a programming language and not a set of canned routines for “doing statistics”. It means that we can explore the ideas of probability and statistics using the language of Python to express those ideas. It also means that you, and we, and anyone else in the world, can write new code to share with others, so they can benefit from our work, understand it, and improve it. This book is one example; we have written the Python code in this book as clearly as we can to make it easy to follow, and to explain the underlying ideas. We hope you will help us by testing what we have done and sending us suggestions for ways we could improve. Please see Section 6 for more information about how to do that.

4.2 The environment

Many of the chapters have sections with code for you to run and experiment with. These sections contain Jupyter notebooks. Jupyter notebooks are interactive web pages that allow you to read, write and run Python code. We mark the start of each notebook in the text with a note and link heading like the one you see below. In the web edition of this book, you can click on the Download link in this header to download the section as a notebook. You can also click on the Interact link in this header to open the notebook in your web browser, using a system called JupyterLite. You can then run the code, and experiment by making changes.

About JupyterLite

JupyterLite[7] is a version of the Jupyter notebook and Python that will automatically download and run inside your web browser.

When you click on the “Interact” link, your browser goes to a web address that makes it download the JupyterLite system, along with compatible versions of Python and its standard packages. The web page that opens allows you to run the Python code in the notebook inside your browser.

In the print version of the book, we point you to the web version, where you will find the links.

At the end of this chapter, we explain how to run these notebooks on your own computer. In the next section you will see an example notebook; to start with, you might want to run this in your browser using the “Interact” link.

4.3 Getting started with the notebook

The next section contains a notebook called “Billie’s Bill”. If you are looking at the web edition, you will see links to interact with this notebook in your browser, or download it to your computer.

Note 4.1: Notebook: Billie’s Bill

The text in this notebook section assumes you have opened the page as an interactive notebook on the web, or on your own computer (see Section 4.10).

A notebook can contain blocks of text — like this one — as well as code, and the results from running the code.

Jupyter Notebooks are made up of cells.

Jupyter cells can contain text or code.

Notebook text can have formatting, such as links.

For example, this sentence ends with a link to the earlier second edition of this book.

If you are in the interactive notebook interface (rather than reading this in the textbook), you will see the Jupyter menu bar near the top of the page, with headings “File”, “Edit” and so on.

By default, Jupyter may show a row of icons, the “Toolbar”, underneath the menu bar (“File”, “Edit” and so on).

In the Jupyter toolbar, you may see icons to run the current cell, among others.

To run the current cell and move to the next one, you can click the run icon in the toolbar. It is more efficient to hold down the Shift key and press Enter. We will write this as Shift-Enter.

In this, our first notebook, we will be using Python to solve one of those difficult and troubling problems in life — working out the bill in a restaurant.

4.4 The meal in question

Alex and Billie are at a restaurant, getting ready to order. They do not have much money, so they are calculating the expected bill before they order.

Alex is thinking of having the fish for £10.50, and Billie is leaning towards the chicken, at £9.25. First they calculate their combined bill.

Below this text you see a code cell. It contains the Python code to calculate the total bill. Press Shift-Enter in the cell below to see the total.

10.50 + 9.25
19.75

The contents of the cell above are Python code. As you would predict, Python understands numbers like 10.50, and it understands + between the numbers as an instruction to add the numbers.

When you press Shift-Enter, Python finds 10.50, realizes it is a number, and stores that number somewhere in memory. It does the same thing for 9.25, and then it runs the addition operation on these two numbers in memory, which gives the number 19.75.

Finally, Python sends the resulting number (19.75) back to the notebook for display. The notebook detects that Python sent back a value, and shows it to us.

This is exactly what a calculator would do.
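In a notebook, a code cell automatically shows the value of the expression on its last line. A Python script run from the terminal does not do this. There, we ask Python to show values with the built-in print function. Here is a minimal sketch, which also works in a notebook cell:

```python
# print displays each value, whether we are in a notebook or a script.
# Without print, a notebook cell would only show the last value.
print(10.50)
print(9.25)
print(10.50 + 9.25)  # 19.75
```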

4.5 Comments

Unlike a calculator, we can also put notes next to our calculations, to remind us what they are for. One way of doing this is to use a “comment”. You have already seen comments in the previous chapter.

A comment is some text that the computer will ignore. In Python, you can make a comment by starting a line with the # (hash) character. For example, the next cell is a code cell, but when you run it, it does not show any result. In this case, that is because the computer sees the # at the beginning of the line, and then ignores the rest.

# This bit of text is for us to read, and the computer to ignore.

Many of the code cells you see will have comments in them, to explain what the code is doing.

Practice writing comments for your own code. It is a very good habit to get into. You will find that experienced programmers write many comments on their code. They do not do this to show off, but because they have a lot of experience in reading code, and they know that comments make it much easier to read and understand code.
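A comment does not have to start at the beginning of a line. Python ignores everything from the # character to the end of the line, so you can also put a short comment after some code. Here is a minimal sketch:

```python
# This whole line is a comment, so Python ignores it.
print(10.50 + 9.25)  # Python runs the code before the #, and ignores this note.

# Putting a # in front of code "comments it out", so Python skips it:
# print(10.50 * 100)
```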

4.6 More calculations

Let us continue with the struggle that Alex and Billie are having with their bill.

They realize that they will also need to pay a tip.

They think it would be reasonable to leave a 15% tip. Now they need to multiply their total bill by 0.15, to get the tip. The bill is about £20, so they know that the tip will be about £3.

In Python * means multiplication. This is the equivalent of the “×” key on a calculator.

What about this, for the correct calculation?

# The tip - with a nasty mistake.
10.50 + 9.25 * 0.15
11.8875

Oh dear, no, that isn’t doing the right calculation.

Python follows the normal rules of precedence with calculations. These rules tell us to do multiplication before addition.

See https://en.wikipedia.org/wiki/Order_of_operations for more detail on the standard rules.

In the case above the rules tell Python to first calculate 9.25 * 0.15 (to get 1.3875) and then to add the result to 10.50, giving 11.8875.
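To see the precedence rule at work, we can write out the brackets that Python applies for us, and check that we get the same answer. Here is a minimal sketch:

```python
# Python does the multiplication first, so these two lines give the same answer.
print(10.50 + 9.25 * 0.15)    # 11.8875
print(10.50 + (9.25 * 0.15))  # 11.8875
```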

We need to tell Python we want it to do the addition and then the multiplication. We do this with round brackets (parentheses):

Note 4.2: Three types of brackets in Python

There are three types of brackets in Python.

These are:

  • round brackets or parentheses: ();
  • square brackets: [];
  • curly brackets: {}.

Each type of bracket has a different meaning in Python. In the examples, pay close attention to the type of brackets we are using.

# The tip - mistake fixed.
(10.50 + 9.25) * 0.15
2.9625

The obvious next step is to calculate the bill including the tip.

# The bill, including the tip
10.50 + 9.25 + (10.50 + 9.25) * 0.15
22.7125

At this stage we start to feel that we are doing too much typing. Notice that we had to type out 10.50 + 9.25 twice there. That is a little boring, but it also makes it easier to make mistakes. The more we have to type, the greater the chance we have to make a mistake.

4.7 Variables

To make things simpler, we would like to be able to store the result of the calculation 10.50 + 9.25, and then re-use this value, to calculate the tip.

This is the role of variables. A variable is a value with a name.

Here is a variable:

# The cost of Alex's meal.
a = 10.50

a is a name we give to the value 10.50. You can read the line above as “The variable a gets the value 10.50”. We can also talk of setting the variable. Here we are setting a to equal 10.50.

Now, when we use a in code, it refers to the value we gave it. For example, we can put a on a line on its own, and Python will show us the value of a:

# The value of a
a
10.5

We did not have to use the name a — we can choose almost any name we like. For example, we could have chosen alex_meal instead:

# The cost of Alex's meal.
# alex_meal gets the value 10.50
alex_meal = 10.50
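We say "almost" any name, because Python has a few rules for variable names. Here is a minimal sketch of the main rules. It uses the standard string method isidentifier and the standard keyword module to check names. The names meal2 and Alex_meal are hypothetical, chosen for illustration.

```python
import keyword

# Variable names may use letters, digits and underscores,
# but must not start with a digit.
alex_meal = 10.50
meal2 = 9.25
print("alex_meal".isidentifier())  # True
print("2meal".isidentifier())      # False: starts with a digit

# Names are case-sensitive: Alex_meal is a different variable from alex_meal.
Alex_meal = 12.0
print(alex_meal, Alex_meal)        # 10.5 12.0

# Python's reserved words, such as "for", cannot be used as names.
# (isidentifier does not check this, so we use the keyword module.)
print(keyword.iskeyword("for"))    # True
```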

We often set variables like this, and then display the result, all in the same cell. We do this by first setting the variable, as above, and then, on the final line of the cell, we put the variable name on a line on its own, to ask Python to show us the value of the variable. Here we set billie_meal to have the value 9.25, and then show the value of billie_meal, all in the same cell.

# The cost of Billie's meal.
billie_meal = 9.25
# Show the value of billie_meal
billie_meal
9.25

Of course, here, we did not learn much, but we often set variable values with the results of a calculation. For example:

# The cost of both meals, before tip.
bill_before_tip = 10.50 + 9.25
# Show the value of both meals.
bill_before_tip
19.75

But wait — we can do better than typing in the calculation like this. We can use the values of our variables, instead of typing in the values again.

# The cost of both meals, before tip, using variables.
bill_before_tip = alex_meal + billie_meal
# Show the value of both meals.
bill_before_tip
19.75

We make the calculation clearer by writing the calculation this way — we are calculating the bill before the tip by adding the cost of Alex’s and Billie’s meal — and that’s what the code looks like. But this also allows us to change the variable value, and recalculate. For example, say Alex decided to go for the hummus plate, at £7.75. Now we can tell Python that we want alex_meal to have the value 7.75 instead of 10.50:

# The new cost of Alex's meal.
# alex_meal gets the value 7.75
alex_meal = 7.75
# Show the value of alex_meal
alex_meal
7.75

Notice that alex_meal now has a new value. It was 10.50, but now it is 7.75. We have reset the value of alex_meal. In order to use the new value for alex_meal, we must recalculate the bill before tip with exactly the same code as before:

# The new cost of both meals, before tip.
bill_before_tip = alex_meal + billie_meal
# Show the value of both meals.
bill_before_tip
17.0

Notice that, now we have rerun this calculation, we have reset the value for bill_before_tip to the correct value corresponding to the new value for alex_meal.
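Why must we rerun the calculation? When Python runs bill_before_tip = alex_meal + billie_meal, it works out the sum and stores the resulting number in bill_before_tip. It does not remember how that number was calculated. This short sketch shows the old total persisting until we recalculate:

```python
# Set up the meal prices and the bill, as in the text above.
alex_meal = 10.50
billie_meal = 9.25
bill_before_tip = alex_meal + billie_meal
print(bill_before_tip)  # 19.75

# Alex switches to the hummus plate.
alex_meal = 7.75

# bill_before_tip has NOT changed: Python stored the result of the
# addition (19.75), not the recipe for calculating it.
print(bill_before_tip)  # 19.75

# Re-running the calculation picks up the new value of alex_meal.
bill_before_tip = alex_meal + billie_meal
print(bill_before_tip)  # 17.0
```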

All that remains is to recalculate the bill plus tip, using the new value for the variable:

# The cost of both meals, after tip.
bill_after_tip = bill_before_tip + bill_before_tip * 0.15
# Show the value of both meals, after tip.
bill_after_tip
19.55

Now that we are using variables with relevant names, the calculation looks right to our eye. The code expresses the calculation as we mean it: the bill after tip is equal to the bill before the tip, plus the bill before the tip times 0.15.
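Adding 15% of the bill to the bill is the same as multiplying the bill by 1.15, because b + b × 0.15 = b × (1 + 0.15). Here is a minimal sketch of both versions. It assumes a new variable, tip_rate (our own name, for illustration), for the tip proportion. It uses a Python f-string to show the result to two decimal places, as pounds and pence.

```python
import math

bill_before_tip = 17.0  # the new bill before tip, from the text above
tip_rate = 0.15         # a 15% tip, as a proportion

# Bill plus 15% of the bill, as in the text.
bill_after_tip = bill_before_tip + bill_before_tip * tip_rate

# The same calculation, written as the bill times 1.15.
bill_times_factor = bill_before_tip * (1 + tip_rate)

# The two may differ in the last few binary digits, because numbers such
# as 0.15 are stored only approximately, so compare with math.isclose.
print(math.isclose(bill_after_tip, bill_times_factor))  # True

# Show the bill in pounds and pence: two digits after the decimal point.
print(f"£{bill_after_tip:.2f}")  # £19.55
```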

4.8 And so, on

Now you have done some practice with the notebook, and with variables, you are ready for a new problem in probability and statistics, in the next chapter.

4.9 Saving your work

If you are running this notebook via the “Interact” button, you are running it using the JupyterLite system. Bear in mind that your browser keeps all the notebooks you run in JupyterLite in its browser cache, a private and temporary store that the browser maintains somewhere on your system. If you want to keep any changes you make to notebooks you have run with the “Interact” JupyterLite system, you may want to save a copy of the notebook outside the browser cache. To do this, look in the pane to the left of the notebook for the name of the notebook. The name of this particular notebook is “billies_bill”, and you will see the notebook file in the left pane listed as billies_bill.ipynb. To save a copy to your computer, first use the “File” menu, and the “Save” option, to save your notebook. This saves the notebook to your browser’s private store (the cache). Next, right-click on billies_bill.ipynb in the left pane (see Figure 4.1), and choose “Download”. Save the file somewhere memorable on your computer. You can work with the downloaded notebook again by following the instructions in Section 4.10.

End of notebook: Billie’s Bill

billies_bill starts at Note 4.1.

Figure 4.1: Downloading files in JupyterLite

4.10 Running the code on your own computer

Many people, including your humble authors, like to be able to run code examples on their own computers. This section explains how to set up your computer to run the notebooks.

Once you have done this setup, you can use the “download” link that you will see for each notebook, to download the notebook to your machine. From there, you can open the notebook in Jupyter.

You will need to install the Python language on your computer, and then install the following packages:

  • Numpy - to work with arrays;
  • Matplotlib - for plots;
  • Scipy - a collection of modules for scientific computing;
  • Pandas - for loading, saving and manipulating data tables;
  • Statsmodels - for traditional statistical analysis;
  • Jupyter - to run the Jupyter Notebook on your own computer.

One way to install Python and the packages you need is to install Python from the Python website[8], and then use the Pip[9] installer to install the packages.

To install the Python packages, first start a terminal application on your computer. On Windows, you can press the Start key, type “cmd”, and press Enter. On Mac, you can press the Command key and the space bar together, type “Terminal”, and press Enter. At the terminal prompt, type a command such as:

pip install numpy matplotlib scipy pandas statsmodels jupyter

Now you should be able to start the Jupyter notebook application. See the Jupyter documentation for how to start Jupyter. Open the notebook you downloaded for the chapter; you will now be able to run the code on your own computer, and experiment by making changes.

You can run any of the code notebooks in this textbook on your own machine by downloading the notebook, via the download link at the top of each notebook section, and then opening the resulting notebook in Jupyter.
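If Python reports that it cannot find one of these packages, you can check what is installed. The sketch below is one way to do that. It uses the standard library module importlib.metadata, which is available in Python 3.8 and later. You can save it as a script, or run it in a notebook cell. The function name installed_versions is our own, chosen for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names as you would give them to pip.
PACKAGES = ["numpy", "matplotlib", "scipy", "pandas", "statsmodels", "jupyter"]


def installed_versions(names):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```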


  1. https://www.python.org

  2. https://numpy.org

  3. https://matplotlib.org

  4. https://pandas.pydata.org

  5. https://scipy.org

  6. https://www.statsmodels.org

  7. https://jupyterlite.readthedocs.io

  8. https://www.python.org

  9. https://pip.pypa.io