Data Structures

Today, we’re discussing section 5 of The Python Tutorial. The subsections are

Intro

The goal of this section is to get a handle on the different built-in data structures. This does not include pandas dataframes.

A data structure is a variable that contains other variables.

Different data structures have slightly different behaviour,

The main examples that we’ll cover:

Data structure Indexing Ordering Mutability
Lists list By position (0, 1, 2, … ) Ordered Mutable
Tuples tuple By position (0, 1, 2, … ) Ordered Immutable
Sets set Not possible Unordered Mutable
Dictionaries dict | By key (key1,key2,key3`, … ) Insertion-ordered Mutable

5.1 More on Lists

Lists, like all Python variables, come with a set of ‘methods’ or functions generic to all lists.

Let’s make an example list and look at a few:

fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']
fruits.count('apple')
2
fruits.count('tangerine')
0
fruits.index('banana')
3
fruits.index('banana', 4)  # Find next banana starting at position 4
6
fruits.reverse()
fruits
['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange']
fruits.append('grape')
fruits
['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange', 'grape']
fruits.sort()
fruits
['apple', 'apple', 'banana', 'banana', 'grape', 'kiwi', 'orange', 'pear']
fruits.pop()
'pear'
fruits
['apple', 'apple', 'banana', 'banana', 'grape', 'kiwi', 'orange']

5.1.3 List comprehensions

These are concise ways to create lists from other lists. They look like loops, but don’t work the same way.

one_to_ten = range(10)

squares = [x**2 for x in one_to_ten]
squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

That second line has the list comprehension. It makes a new list, populating each element with x**2, each time pulling x from one_to_ten.

You can add a conditional,

odd_squares = [x**2 for x in one_to_ten if x % 2 == 1]
odd_squares
[1, 9, 25, 49, 81]

You can also nest comprehensions, which gets quite confusing. This one combines x and y into a list if they’re different, making a bigger list of all the combinations.

[[x, y] for x in [1,2,3] for y in [3,1,4] if x != y]
[[1, 3], [1, 4], [2, 3], [2, 1], [2, 4], [3, 1], [3, 4]]

5.2 The del statement

To delete elements of a list, or indeed the whole list, you can use the del statement:

fruits
['apple', 'apple', 'banana', 'banana', 'grape', 'kiwi', 'orange']
del fruits[0]
fruits
['apple', 'banana', 'banana', 'grape', 'kiwi', 'orange']
del fruits[3:5]
fruits
['apple', 'banana', 'banana']
del fruits
fruits
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 fruits

NameError: name 'fruits' is not defined

5.3 Tuples and Sequences

Tuples are closely related to lists. The main difference is that they are immutable - you can’t modify elements in-place, or append to them, or delete things. In that sense they are ‘constant’ but don’t be fooled: you can always overwrite them.

To make a tuple, don’t use any brackets (or uses parentheses):

veg = "carrot", "broccoli", "peas"
veg
('carrot', 'broccoli', 'peas')
veg = ("carrot", "broccoli", "peas")
veg
('carrot', 'broccoli', 'peas')

We index them like lists, but cannot modify the elements:

veg[0]
'carrot'
veg[0] = "potato"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[27], line 1
----> 1 veg[0] = "potato"

TypeError: 'tuple' object does not support item assignment

Empty or one-element lists are a bit strange: “ugly, but effective”:

empty = ()
empty
()
singleton = "hello",    # <-- note trailing comma
singleton
('hello',)

5.4 Sets

Set’s are a bit different, they represent the mathematical concept of a set, which is an unordered collection of different things.

Use curly braces, {...}, to create sets.

bread = {"white", "brown", "white", "wholemeal", "multigrain", "wholemeal"}
bread
{'brown', 'multigrain', 'white', 'wholemeal'}

Notice that the order has changed and duplicates have been removed.

We can use the in and not in keywords to check membership (this works for lists and tuples too!)

"white" in bread
True
"apple" in bread
False

Sets have special logical operators. Here, a and b are both sets.

Operator Name Description
a - b Difference Letters in a but not in b
a \| b Union Letters in a or b (or both)
a & b Intersection Letters only in both
a ^ b Symmetric difference Letters in a or b but not both

Set comprehensions also exist:

a = {x for x in 'abracadabra' if x not in 'abc'}
a
{'d', 'r'}

5.5 Dictionaries

Dictionaries are formed by key: value pairs between curly braces:

en_fr = {"apple": "pomme", 
         "banana": "banane", 
         "cherry": "cerise"}

Index them by the keys:

en_fr["apple"]
'pomme'

Dictionary comprehensions exist:

{x: x**2 for x in (2,4,6)}
{2: 4, 4: 16, 6: 36}

5.6 Looping Techniques

For loops are useful for running code on each element in a data structure.

for bread_type in bread:
    print(bread_type)
multigrain
brown
wholemeal
white

Looping through dictionaries goes through the keys by default:

for en_fruit in en_fr:
    print(en_fruit)
apple
banana
cherry

so use .items() to loop through both:

for en, fr in en_fr.items():
    print(en)
    print(fr)
    print()
apple
pomme

banana
banane

cherry
cerise

Notice that we assign to two variables in the for loop.

Use enumerate() to get the index when you loop through a list or tuple:

for i, vegetable in enumerate(veg):
    print(i, vegetable)
0 carrot
1 broccoli
2 peas

Loop through two things at once with zip()

questions = ['name', 'quest', 'favorite color']
answers = ['lancelot', 'the holy grail', 'blue']

for q, a in zip(questions, answers):
    print(f'What is your {q}?  It is {a}.')
What is your name?  It is lancelot.
What is your quest?  It is the holy grail.
What is your favorite color?  It is blue.

5.7 More on Conditions

A few bits and pieces here.

  • The conditions used in while and if statements can contain any operators, not just comparisons.
  • Operators in and not in test membership. Operators is and not is test identity.
  • Comparisons can be chained: a < b == c tests whether a is less than b and moreover b == c
  • Comparisons can be combined with and and or and negated with not.
  • Operators and and or are short-circuit operators: they stop as soon as the outcome is known.

5.8 Comparing Sequences and Other Types

You can compare multiple data structures (rather than their elements), using lexographical ordering. If all items are the same, they are equal, and if they are different length, only the length of the shorter one is checked.

(1, 2, 3) < (1, 2, 4)
True
[1, 2, 3] < [1, 2, 4]
True
'ABC' < 'C' < 'Pascal' < 'Python'
True
(1, 2, 3, 4) < (1, 2, 4)
True
(1, 2)  < (1, 2, -1)
True
(1, 2, 3) == (1.0, 2.0, 3.0)
True
(1, 2, ('aa', 'ab')) < (1, 2, ('abc', 'a'), 4)
True