Chapter 22 Reading and writing data-files

In this Chapter we will consider how to read and write data to and from various types of files. To keep things simple we will assume from now on that all of the files that we wish to access or create are in the same directory as our code.

Normally when learning Python one starts with the simplest file type (.txt) and then moves on to more sophisticated formats. Given the constraints of time, and that most data we are likely to be interested in will not be stored in plain text files, we did not consider this case in lectures. We include it here for completeness at the end of the Chapter.

22.1 CSV files

We have already explained the basic read and write commands for CSV files in lectures. If we had covered the dictionary data type in this module then we could have replaced the reader and writer commands with more powerful versions called DictReader and DictWriter. This kind of application is a great example of why the dictionary data type is so useful; details can be found in Chapter 13 of Sweigert’s Automate the Boring Stuff With Python.

Although (as the name suggests) a CSV file has the items in it separated by commas, it is possible to choose a different separator such as a tab to create a .tsv file. To do this we add the optional argument delimiter="\t" insider the csv.writer function which inserts tabs between entries.

22.2 Excel files

In lectures we covered the basics of reading and writing Excel files. Here we will focus on more advanced topics, mostly around how to change how the spreadsheet looks and behaves. These are not as important as the topics covered in lectures, but can come in useful.

We can change the font (and other aspects of the text style) using a new function from the module openpyxl.styles as in the following example.

import openpyxl as xl
from openpyxl.styles import Font

wb = xl.Workbook()
mysheet = wb.active
my_new_font = Font(name="Calibri", size=16, bold=True, italic=True)
mysheet["A1"].font = my_new_font
mysheet["A1"] = "Hello World"
wb.save("OurFirstSpreadsheet.xlsx")

The format should be fairly self-explanatory. All of the arguments in the Font command are optional, and there are many aspects that can be included here as you can see by typing help(Font) in the iPython console.

We can write Excel formulas into a cell by writing a string containing the relevant command (starting with an = sign just as we would in Excel itself). For example if we want to have the sum of cells A1:D5 in cell E6 of the sheet my_sheet then we would enter the line

my_sheet["E6"] = "=SUM(A1:D5)"

To change the height of a row or the width of a column is also easy. The example below changes the height of row 3 and the width of column C. The choice of values to use is best determined by experimentation.

import openpyxl as xl

wb = xl.Workbook()
mysheet = wb.active
mysheet.row_dimensions[3].height = 50
mysheet.column_dimensions["C"].width = 30

wb.save("OurFirstSpreadsheet.xlsx")

Merging and unmerging cells can be done as in the following pair of example (where the sheet being considered is labelled by the variable my_sheet:

my_sheet.merge_cells("B2:D6") \(\quad\quad\quad\) my_sheet.unmerge_cells("B2:D6")

In Excel it is possible to freeze some rows and columns to the left or top of the sheet that remain on screen even as the user scrolls around the worksheet. These areas are called freeze panes. Adding a line of the form

my_sheet.freeze_panes = "C2"

will free all the columns to the left of, and rows above, the given cell (in this case C2).

It is even possible to create an Excel chart (such as a bar chart, scatter plot, pie chart, or line graph) using openpyxl. This is a little more complicated as there are many options to set; details can be found in Chapter 13 of Sweigert’s Automate the Boring Stuff With Python.

22.3 Text files

To see how to work with text files, we will illustrate the basic concepts with a file called example.txt consisting of the five lines shown in Figure 22.1.

An example of a .txt file

Figure 22.1: An example of a .txt file

This is easy to create using the Notepad app in Windows.

The following code will import all of the contents of this file as a single string and print the result.

with open("example.txt") as f:
    result = f.read()
print(result)

This prints exactly what we saw on the previous slide: 5 lines of text. Here the open command opens the file, and is rather similar to the csv examples we saw in class.

With the file opened the read method copies the entire contents of the file as a single string, in this case to the string called result.

How does a string get printed as file lines? Well, our text file includes an invisible newline character \n at the end of each line. We will see that this causes a few complications.

We will often want to consider each line in our file separately. If we use the readlines method instead of read in our code above then we will get a list containing each line of the file as a separate string. Unfortunately when we do that the resulting list is

["10\n", "20\n", "30\n", "Hello World\n", "1, 2, 3, 4, 5\n"].

This is not what we expected! The invisible newline characters have appeared as part of each string.

To get a list as we would have hoped for we can use code as below.

def read_data(filename):
    """ makes a list containing the lines of the files as strings"""
    datavals = []
    with open(filename) as f:
        for line in f:
            datavals.append(line.strip())
    return datavals
results = read_data("example.txt")
print(results)

When we loop over an open file we run over the different lines in the file. Each of these is a string (as everything in a text file is regarded as a string), and we add each line as a new object to our list.

But we do not want the newline symbols, so we use the strip method to remove these. We get

["10", "20", "30", "Hello World", "1, 2, 3, 4, 5"].

If we had a file with one number per line, we could easily create a list of numbers by using the float function instead of the strip method to convert the strings to numbers. But in our example we see that it is more complicated, with a combination of numbers, words, and collections of numbers.

Writing to a text file is similar. We use the open function again, but now with an additional argument of w or a (indicating whether to or to the file). We then use the write method to add a new line to the file as illustrated below.

with open("output.txt", "w") as f:
    f.write("first line to add\n")

with open("output.txt", "a") as f:
    f.write("second line to add\n")

Note: We can only use the write method to add a string to a file. Also, if we do not include the newline symbol then the string will be added to the end of the final line in the file.