This chapter introduces you to the technology we will use throughout the book. By technology, we mean two things:
In this chapter, we concentrate on the technology we use for the interactive version of the book. The interactive version allows you to run Python code as interactive notebooks in your web browser.
Either now, or later, you should also consider running the code on your own computer — see Section 4.10.
The chapter introduces Python and its packages, and then gives an example to introduce Python and the Jupyter Notebook. If you have not used Python before, the example notebook will get you started. The example also shows how we will be using notebooks through the rest of the book.
This version of the book uses the Python1 programming language to implement resampling algorithms.
Python is a programming language that can be used for many tasks. It is a popular language for teaching, but has also become standard in industry and academia. It is one of the most widely used programming languages in the world, and the most popular language for data science.
For many of the initial examples, we will also be using the Numpy2 package for Python. A Python package is a library of Python code and data. Numpy is a package that makes it easier to work with sequences of data values, such as sequences of numbers. These are typical in probability and statistics.
Later, we will be using the Matplotlib3 package. This is the main Python package with code for producing plots, such as bar charts, histograms, and scatter plots. See the rest of the book for more details on these plots.
Still further on in the book, we will use more specialized libraries for data manipulation and analysis. Pandas4 is the standard Python package for loading data files and working with data tables. Scipy5 is a package that houses a wide range of numerical routines, including some simple statistical methods. The Statsmodels6 package has code for many more statistical procedures. We will find ourselves comparing the results of our own resampling algorithms to those in Scipy and Statsmodels.
It is crucial to our purpose that Python is a programming language and not a set of canned routines for “doing statistics”. It means that we can explore the ideas of probability and statistics using the language of Python to express those ideas. It also means that you, and we, and anyone else in the world, can write new code to share with others, so they can use our work, understand it, and improve it. This book is one example; we have written the Python code in this book as clearly as we can to make it easy to follow, and to explain the underlying ideas. We hope you will help us by testing what we have done and sending us suggestions for ways we could improve. Please see Section 6 for more information about how to do that.
Many of the chapters have sections with code for you to run, and experiment with. These sections contain Jupyter notebooks[^jupyter-nb]. Jupyter notebooks are interactive web pages that allow you to read, write and run Python code. We mark the start of each notebook in the text with a note-and-link heading like the one you see below. In the web edition of this book, you can click on the Download link in this heading to download the section as a notebook. You can also click on the Interact link to open the notebook in your web browser, using a system called JupyterLite. JupyterLite allows you to run the code in your browser, and experiment by making changes.
JupyterLite7 is a version of the Jupyter notebook and Python that will automatically download and run inside your web browser.
When you click on the “Interact” link, it will take you to a web address that has the effect of making your browser download the JupyterLite system, along with compatible versions of Python and its standard packages. The web page that opens allows you to run the Python code in the notebook inside your browser.
In the print version of the book, we point you to the web version to get the “Download” and “Interact” links.
At the end of this chapter, we explain how to run these notebooks on your own computer. In the next section you will see an example notebook; to start with, you might want to run this in your browser using the “Interact” link.
The next section contains a notebook called “Billie’s Bill”. If you are looking at the web edition, you will see links to interact with this notebook in your browser, or download it to your computer.
The text in this notebook section assumes you have opened the page as an interactive notebook on the web, or on your own computer (see Section 4.10).
A notebook can contain blocks of text — like this one — as well as code, and the results from running the code.
Jupyter Notebooks are made up of cells.
Jupyter cells can contain text or code.
Notebook text can have formatting, such as links.
For example, this sentence ends with a link to the earlier second edition of this book.
If you are in the interactive notebook interface (rather than reading this in the textbook), you will see the Jupyter menu bar near the top of the page, with headings “File”, “Edit” and so on.
In Jupyter, underneath the File … menu bar, by default, you may see a row of icons - the “Toolbar”.
In the Jupyter toolbar, you may see icons to run the current cell, among others.
To move from one cell to the next, you can click the run icon in the toolbar, but it is more efficient to press the Shift key, and press Enter (with Shift still held down). We will write this as Shift-Enter.
In this, our first notebook, we will be using Python to solve one of those difficult and troubling problems in life — working out the bill in a restaurant.
Alex and Billie are at a restaurant, getting ready to order. They do not have much money, so they are calculating the expected bill before they order.
Alex is thinking of having the fish for £10.50, and Billie is leaning towards the chicken, at £9.25. First they calculate their combined bill.
Below this text you see a code cell. It contains the Python code to calculate the total bill. Press Shift-Enter in the cell below, to see the total.
10.50 + 9.25
19.75
The contents of the cell above is Python code. As you would predict, Python understands numbers like 10.50, and it understands + between the numbers as an instruction to add the numbers.
When you press Shift-Enter, Python finds 10.50, realizes it is a number, and stores that number somewhere in memory. It does the same thing for 9.25, and then it runs the addition operation on these two numbers in memory, which gives the number 19.75.
Finally, Python sends the resulting number (19.75) back to the notebook for display. The notebook detects that Python sent back a value, and shows it to us.
This is exactly what a calculator would do.
Let us continue with the struggle that Alex and Billie are having with their bill.
They realize that they will also need to pay a tip.
They think it would be reasonable to leave a 15% tip. Now they need to multiply their total bill by 0.15, to get the tip. The bill is about £20, so they know that the tip will be about £3.
In Python, * means multiplication. This is the equivalent of the “×” key on a calculator.
What about this, for the correct calculation?
# The tip - with a nasty mistake.
10.50 + 9.25 * 0.15
11.8875
Oh dear, no, that isn’t doing the right calculation.
Python follows the normal rules of precedence with calculations. These rules tell us to do multiplication before addition.
See https://en.wikipedia.org/wiki/Order_of_operations for more detail on the standard rules.
In the case above the rules tell Python to first calculate 9.25 * 0.15 (to get 1.3875) and then to add the result to 10.50, giving 11.8875.
We need to tell Python we want it to do the addition and then the multiplication. We do this with round brackets (parentheses):
There are three types of brackets in Python.
These are (), [] and {}. Each type of bracket has a different meaning in Python. In the examples, pay close attention to the type of brackets we are using.
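As a rough illustration of how the three bracket types differ, the sketch below shows one common use of each. Only the round brackets matter for this chapter. The list and dictionary examples, and all the names in them, are illustrative assumptions and not something this notebook relies on.

```python
# Round brackets (parentheses) group a calculation, so it runs first.
grouped = (2 + 3) * 4

# Square brackets make a list: an ordered sequence of values.
# (meal_prices is a hypothetical name, for illustration only.)
meal_prices = [10.50, 9.25]
# Square brackets after a name fetch an item by position (0 is first).
first_price = meal_prices[0]

# Curly brackets make a dictionary: values looked up by a label.
# (menu is also a hypothetical name.)
menu = {"fish": 10.50, "chicken": 9.25}
chicken_price = menu["chicken"]

print(grouped, first_price, chicken_price)
```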
# The tip - mistake fixed.
(10.50 + 9.25) * 0.15
2.9625
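To see the effect of the brackets side by side, here is a small sketch that runs both versions of the calculation in one cell. It uses print, a built-in Python function that displays a value, so that a single cell can show several results. The last line checks that the version without brackets really is the same as multiplying first.

```python
# Without brackets: multiplication happens first, so only the 9.25
# is multiplied by 0.15 (the text shows this gives 11.8875).
print(10.50 + 9.25 * 0.15)

# With brackets: the addition happens first, and then the sum is
# multiplied by 0.15 (the text shows this gives 2.9625).
print((10.50 + 9.25) * 0.15)

# Writing the multiplication in its own brackets changes nothing,
# because Python was already doing it first. This displays True.
print(10.50 + 9.25 * 0.15 == 10.50 + (9.25 * 0.15))
```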
The obvious next step is to calculate the bill including the tip.
# The bill, including the tip
10.50 + 9.25 + (10.50 + 9.25) * 0.15
22.7125
At this stage we start to feel that we are doing too much typing. Notice that we had to type out 10.50 + 9.25 twice there. That is a little boring, but it also makes it easier to make mistakes. The more we have to type, the greater the chance we have to make a mistake.
To make things simpler, we would like to be able to store the result of the calculation 10.50 + 9.25, and then re-use this value, to calculate the tip.
This is the role of variables. A variable is a value with a name.
Here is a variable:
# The cost of Alex's meal.
a = 10.50
a is a name we give to the value 10.50. You can read the line above as “The variable a gets the value 10.50”. We can also talk of setting the variable. Here we are setting a to equal 10.50.
Now, when we use a in code, it refers to the value we gave it. For example, we can put a on a line on its own, and Python will show us the value of a:
# The value of a
a
10.5
We did not have to use the name a; we can choose almost any name we like. For example, we could have chosen alex_meal instead:
# The cost of Alex's meal.
# alex_meal gets the value 10.50
alex_meal = 10.50
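The “almost” matters: a variable name can contain letters, digits and underscores, but it cannot start with a digit, cannot contain spaces, and cannot be one of Python's reserved words such as for. The sketch below checks a few candidate names using Python's own tests. The candidate names are hypothetical examples, and the loop is only there to avoid repeating the check by hand.

```python
import keyword

# Candidate names to check; all are hypothetical examples.
candidates = ["alex_meal", "a", "meal2", "2nd_meal", "alex meal", "for"]

# For each candidate, record whether it could be used as a variable name.
results = {}
for name in candidates:
    # A usable name must be a valid identifier and not a reserved word.
    results[name] = name.isidentifier() and not keyword.iskeyword(name)

for name in candidates:
    print(name, results[name])
```

Here alex_meal, a and meal2 pass the check, and the other three do not.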
We often set variables like this, and then display the result, all in the same cell. We do this by first setting the variable, as above, and then, on the final line of the cell, we put the variable name on a line on its own, to ask Python to show us the value of the variable. Here we set billie_meal to have the value 9.25, and then show the value of billie_meal, all in the same cell.
# The cost of Billie's meal.
billie_meal = 9.25
# Show the value of billie_meal
billie_meal
9.25
Of course, here, we did not learn much, but we often set variable values with the results of a calculation. For example:
# The cost of both meals, before tip.
bill_before_tip = 10.50 + 9.25
# Show the value of both meals.
bill_before_tip
19.75
But wait — we can do better than typing in the calculation like this. We can use the values of our variables, instead of typing in the values again.
# The cost of both meals, before tip, using variables.
bill_before_tip = alex_meal + billie_meal
# Show the value of both meals.
bill_before_tip
19.75
We make the calculation clearer by writing it this way: we are calculating the bill before the tip by adding the cost of Alex’s and Billie’s meals, and that is what the code looks like. But this also allows us to change the variable value, and recalculate. For example, say Alex decided to go for the hummus plate, at £7.75. Now we can tell Python that we want alex_meal to have the value 7.75 instead of 10.50:
# The new cost of Alex's meal.
# alex_meal gets the value 7.75
alex_meal = 7.75
# Show the value of alex_meal
alex_meal
7.75
Notice that alex_meal now has a new value. It was 10.50, but now it is 7.75. We have reset the value of alex_meal. In order to use the new value for alex_meal, we must recalculate the bill before tip with exactly the same code as before:
# The new cost of both meals, before tip.
bill_before_tip = alex_meal + billie_meal
# Show the value of both meals.
bill_before_tip
17.0
Notice that, now we have rerun this calculation, we have reset the value for bill_before_tip to the correct value corresponding to the new value for alex_meal.
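It is worth seeing why the recalculation is needed. A variable keeps the value it was given until we set it again, so changing alex_meal does not, on its own, change bill_before_tip. The sketch below repeats the steps in a single cell. The name stale_bill is our own, added only to hold on to the out-of-date value so we can display it.

```python
alex_meal = 10.50
billie_meal = 9.25
bill_before_tip = alex_meal + billie_meal

# Change the cost of Alex's meal.
alex_meal = 7.75

# bill_before_tip still holds the old result at this point.
# (stale_bill is a hypothetical name, for illustration only.)
stale_bill = bill_before_tip

# The value only changes when we run the calculation again.
bill_before_tip = alex_meal + billie_meal

# Displays 19.75 17.0
print(stale_bill, bill_before_tip)
```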
All that remains is to recalculate the bill plus tip, using the new value for the variable:
# The cost of both meals, after tip.
bill_after_tip = bill_before_tip + bill_before_tip * 0.15
# Show the value of both meals, after tip.
bill_after_tip
19.55
Now we are using variables with relevant names, the calculation looks right to our eye. The code expresses the calculation as we mean it: the bill after tip is equal to the bill before the tip, plus the bill before the tip times 0.15.
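One possible next step, sketched below, is to give the tip proportion a name as well, so that a different tip needs only one change. The names tip_rate and tip are our own assumptions and are not used elsewhere in the book. The built-in round function rounds to two decimal places here, for display in pounds and pence.

```python
# The bill before tip, from the calculation above.
bill_before_tip = 17.0

# tip_rate is a hypothetical name for the 15% tip.
tip_rate = 0.15

# The tip, and then the bill including the tip.
tip = bill_before_tip * tip_rate
bill_after_tip = bill_before_tip + tip

# Displays 2.55 19.55
print(round(tip, 2), round(bill_after_tip, 2))
```

To try a 20% tip, you would change tip_rate to 0.20 and run the cell again.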
Now you have done some practice with the notebook, and with variables, you are ready for a new problem in probability and statistics, in the next chapter.
If you are running this notebook via the “Interact” button, you are running it using the JupyterLite system. JupyterLite keeps all its notebooks in your browser cache. This is a private and temporary store that the browser keeps somewhere on your system. This can be a problem if you find yourself clearing your browser cache for some reason, or if you start using another browser, which has a different cache.
If you want to make sure you have a copy of any changes you make to notebooks you ran with the “Interact” JupyterLite system, you might want to save a copy of the notebook outside the browser cache. To do this, look in the pane to the left of the notebook for the name of the notebook. The name of this particular notebook is “billies_bill”, and you will see the notebook file in the left pane listed as billies_bill.ipynb.
If you want to save a copy to your computer, first use the “File” menu, and the “Save” option, to save your notebook. This saves the notebook to your browser’s private store (the cache). Next right-click on billies_bill.ipynb in the left pane (see Figure 4.1), and choose “Download”. Save the file somewhere memorable on your computer. You can go back to the notebook by following the instructions in Section 4.10.
You can use this copy by re-uploading it to the “Interact” JupyterLite system. Go to the upload button near the top-left of the JupyterLite interface (see Figure 4.2). Select the .ipynb (Jupyter notebook) file you want to upload; once done, you can open the notebook using the file listing panel to the left of the interface.
billies_bill starts at Note 4.1.
Many people, including your humble authors, like to be able to run code examples on their own computers. This section explains how you can set up to run the notebooks on your own computer.
Once you have done this setup, you can use the “download” link that you will see for each notebook, to download the notebook to your machine. From there, you can open the notebook in Jupyter.
Most of the download links in this book will trigger a download of the notebook file. This is a file with extension .ipynb, that you can open with Jupyter.
Later in the book, you will see examples where the notebook loads a data file. In that case, the download link for the notebook points to a .zip file containing the notebook and the data file. Unzip the .zip file to get the notebook and data file, and then open the resulting notebook in Jupyter.
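Most computers will unzip a file when you double-click it, but you can also do it from Python. The sketch below uses Python's standard zipfile module. So that it runs anywhere, it first builds a small stand-in archive in a temporary folder; the file names are hypothetical, and with a real download you would point zip_path at the file you saved and skip that step.

```python
import tempfile
import zipfile
from pathlib import Path

# Work in a temporary folder so the sketch does not touch your files.
work_dir = Path(tempfile.mkdtemp())

# Stand-in for a downloaded archive. The file names are hypothetical.
zip_path = work_dir / "example_notebook.zip"
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("example_notebook.ipynb", "{}")
    archive.writestr("example_data.csv", "x,y\n1,2\n")

# Unzip: extract everything into a folder next to the archive.
out_dir = work_dir / "example_notebook"
with zipfile.ZipFile(zip_path) as archive:
    archive.extractall(out_dir)

# List the extracted files, in alphabetical order.
extracted = sorted(path.name for path in out_dir.iterdir())
print(extracted)
```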
You will need to install the Python language on your computer, and then install the following packages:
One way to install Python and the packages you need is to install Python from the Python website8. Then use the Pip9 installer to install the packages you need.
To install the Python packages, first start a terminal application on your computer. To do this, you can use the Start key, “cmd” in Windows, or the Command key and space then “Terminal” on Mac. At the terminal prompt, type the following command:
Now you should be able to start the Jupyter notebook application. See the Jupyter documentation for how to start Jupyter. Open the notebook you downloaded for the chapter; you will now be able to run the code on your own computer, and experiment by making changes.
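If you would like to check your installation, one option is to ask Python whether it can find each package. The sketch below does this with the standard importlib module. The list of names is an assumption based on the packages mentioned earlier in this chapter, so adjust it to match what you installed.

```python
import importlib.util

# Package names as they are imported in Python. This list is an
# assumption based on the packages named in this chapter.
package_names = ["numpy", "matplotlib", "pandas", "scipy", "statsmodels"]

# For each package, record whether Python can find it.
found = {}
for name in package_names:
    found[name] = importlib.util.find_spec(name) is not None

# Report the result for each package.
for name in package_names:
    print(name, "found" if found[name] else "missing")
```

Any package reported as missing has not been installed for the copy of Python that is running the check.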
You can run any of the code notebooks in this textbook on your own machine by downloading the notebook, via the download link at the top of each notebook section, and then opening the resulting notebook in Jupyter.
Notice that the notebooks for download are in the same .ipynb (Jupyter) format as the notebooks in the “Interact” system (see Section 4.9). That means you can upload the .ipynb files from the download links into the Interact system and work with them there, and conversely, you can download the .ipynb files from the Interact system and work with them in Jupyter.
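The reason the same file works in both places is that an .ipynb file is plain text in the JSON format, listing the cells of the notebook in order. The sketch below builds a much-simplified notebook with one text cell and one code cell, following version 4 of the notebook format. The cell contents are illustrative, and real notebooks usually carry more metadata than this.

```python
import json

# A minimal notebook with one text (Markdown) cell and one code cell.
# The cell contents are illustrative.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["Alex and Billie work out their bill."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["10.50 + 9.25"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# This is the kind of text that is saved in an .ipynb file.
notebook_text = json.dumps(notebook, indent=1)

# Reading the text back gives the same structure.
loaded = json.loads(notebook_text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```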
4.5 Comments
Unlike a calculator, we can also put notes next to our calculations, to remind us what they are for. One way of doing this is to use a “comment”. You have already seen comments in the previous chapter.
A comment is some text that the computer will ignore. In Python, you can make a comment by starting a line with the # (hash) character. For example, the next cell is a code cell, but when you run it, it does not show any result. In this case, that is because the computer sees the # at the beginning of the line, and then ignores the rest.
Many of the code cells you see will have comments in them, to explain what the code is doing.
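As a small sketch of comments in use, the cell below starts with a line that is only a comment, and then uses comments to explain a calculation. The wording of the comments is our own illustration. Python also ignores a # and anything after it when it follows code on the same line, as the last calculation shows.

```python
# This line is a comment, so Python ignores it.

# A comment can sit above a calculation, to explain it.
# The cost of both meals, before tip.
bill_before_tip = 10.50 + 9.25

# Text after a # on the same line as code is ignored too.
tip = bill_before_tip * 0.15  # A 15% tip.

print(bill_before_tip)
```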
Practice writing comments for your own code. It is a very good habit to get into. You will find that experienced programmers write many comments on their code. They do not do this to show off, but because they have a lot of experience in reading code, and they know that comments make it much easier to read and understand code.