2024 Mar 7th – UQ PUG 6

Welcome to UQ Python User Group! Check out our general information for details about who we are and what we do.

Overview

Structure

  1. We start today by adding our names to the table below
  2. Add your questions to this page
  3. This month’s presentation
  4. Finally, we spend the rest of the session answering the questions you’ve brought!

Mailing list

If you would like to be on the mailing list and receive the latest PUG updates, please sign up here:

https://forms.office.com/r/6qvfFX0qGr

Feel free to send this link to anyone you think may benefit.

Training Resources

We offer Python training sessions and resources, you can find our introductory guide here.

Today’s Presentation

Today we looked at the popular plotting package matplotlib. You can find the Jupyter notebook we worked from on our GitHub page.

Introduce yourself

What’s your name? Where are you from? Why are you here?
Nick Wiggins UQ Library Here to help (and learn!)
Cameron West UQ Library / SMP To learn and help
Dennis SMP To learn and get help
Karen Fang UQ BEL To learn
Valentina UQ Library Here to say hello!
Theophilus Mensah QAAFI here to learn
Y Allo SMP Learn this sofware
Senn Oon UQ Library Placement student observing!
Jay UQ to learn python

Questions

If you have any Python questions you’d like to explore with the group, please put your question and name in the sections below.

If you think you can help, feel free to contribute to the answers section!

Question 1 - Tuple or List use - Nida

when to use tuple and when to use list?

## when we create a table using pandas DataFrame, we can create a tuple like below:
df = pd.DataFrame(columns=("number", "name", "major"))

## or a list like below:
df = pd.DataFrame(columns=["number", "name", "major"])

## so in what occassion are we using tuple & in what occassion are we using list? Thanks so much

Answers

  • When you need to ensure that the elements shouldn’t change. Tuples cannot be edited (immutable), you can’t change their elements, so only when this is necessary.
  • They can be faster, since they are a constant size – according to StackOverflow

Question 2 - matplotlib - Nida

I have tried matplotlib for making a single line graph, but I still don’t know how to use it for working with multiple data as follows

# I have these Optical Density in 750nm data that show the growth curve for each species
OD750_data = [["Species","8 Dec 2023","15 Dec 2023", "18 Dec 2023"],
            ["Nannochloropsis", 0.2615, 1.0385, 0.822],
            ["Phaeodactylum", 0.208, 0.603, 0.499],
            ["Tisochrysis", 0.1015, 0.135, 0.1265]]
table1 = tabulate(OD750_data)
print(table1)

table2 = tabulate(OD750_data,headers='firstrow')
print(table2)

# how do I turn this table into line graphs with 3 growth curves for 3 species?

Answers

  • Great question Nida. Some of the difficulty lies in the data shape, the easiest way to use data for these things is in long form. I’ve used a few steps below
    1. Turn it into a pandas dataframe
    2. Transform it into three columns and nine rows with pd.melt (long form). Run print(OD750_data) to see this
    3. Plot each line by subsetting OD750_data["Species"] ==. ```python import pandas as pd

I have these Optical Density in 750nm data that show the growth curve for each species

OD750_data = [[“Species”,8,15, 18], [“Nannochloropsis”, 0.2615, 1.0385, 0.822], [“Phaeodactylum”, 0.208, 0.603, 0.499], [“Tisochrysis”, 0.1015, 0.135, 0.1265]]

OD750_data = pd.DataFrame(columns = OD750_data[0], data = OD750_data[1:])

OD750_data = pd.melt(OD750_data, “Species”, var_name = “time”)

plt.plot(“time”, “value”, data = OD750_data[OD750_data[“Species”] == “Nannochloropsis”]) plt.plot(“time”, “value”, data = OD750_data[OD750_data[“Species”] == “Phaeodactylum”]) plt.plot(“time”, “value”, data = OD750_data[OD750_data[“Species”] == “Tisochrysis”])

plt.show()


### Question 3 - Question - Dennis
 Reserve[:,n+1] = Reserve[:,n].reshape(M,1) + (P - S).reshape(M,1)
 
 ValueError: could not broadcast input array from shape (1000,1) into shape (1000,)
 
 Comment: I am pretty sure the broadcast dimension matches. I just dont understand this ".reshape" thing.


```python
## Code for Q3
import numpy as np

#%%
def total_claim_amount(paths, lambda_parameter, mu_parameter, step_size):

    #Simulate the number of claims (N): Poisson distribution
    num_claims = np.random.poisson(lam = lambda_parameter*step_size, 
                                    size = paths)
    dime = np.max(num_claims) 
    
    #Simulate claim sizes (X): Exponential distribution
    # claim_sizes = np.random.exponential(scale = mu_parameter, 
    #                                     size = (paths, dime))
    # claim_sizes = np.random.pareto(a = mu_parameter, 
    #                                     size = (paths, dime))
    claim_sizes = np.random.weibull(a = mu_parameter, 
                                        size = (paths, dime))

    #check to create True (1) and False (0)
        #So, only the correct number of claims are counted
    check = np.arange(dime) < num_claims[:, None]

    #Calculate total claim amount for all paths
        #sum across the rows: axis = 1
        #total was a row vector: reshaped into column vector
    total = np.sum(claim_sizes * check, axis=1).reshape(paths, 1)

    #Calculate mean and standard deviation of total claim amounts
    mean_total = np.mean(total)
    std_total = np.std(total)
    
    # print("Mean total claim amount:", mean_total)
    # print("Standard deviation of total claim amount:", std_total)
    return total, mean_total, std_total

def premium(lambda_parameter, mu_parameter, step_size):
    p = lambda_parameter * mu_parameter * step_size
    return p

#%% Parameters
M = 10**3  #Number of paths
N = 360 #Number of time steps (days)
lambda_parameter = 12 #Poisson distribution parameter for claim number process
                            #1 claim month
mu_parameter = 1 #Exponential distribution parameter for claim size process
                    #mean claim size is $1

Reserve = np.zeros((M,N))
Reserve[:,0] = 100 * mu_parameter * np.ones((M,))

#Compute premium income
P = premium(lambda_parameter, mu_parameter, 1/N) * np.ones((M,1))

#%%
for n in range(N - 1):
    
    #Compute total claim amount process
    S,_,_ = total_claim_amount(M, lambda_parameter, mu_parameter, 1/N)
    
    #Compute reserve for the next time step
    Reserve[:,n+1] = Reserve[:,n].reshape(M,1) + (P - S).reshape(M,1)

Answers

  • Hi Dennis, might need a few more details about what you need here, but I can tell you why the reshaping isn’t working. Essentially, the .reshape method is trying to take your column Reserve[:,n] and turn it into a row. It works! Run Reserve[:,n].reshape(M,1) + (P - S).reshape(M,1) by itself to see. The problem is that your new row can’t be assigned to Reserve[:,n+1] because that is a column of Reserve. Do you need to reshape at all here?

  • In context, every column in Reserve matrix is a time step (eg: t=1, t=2 etc); and every row is just a different trial (trial 1, 2, 3 …). What I want to achieve is using the column t=n value to calculate column t=n+1 value. This is why I used a loop. And my intend to reshape is because I am making sure the dimensions of Left and Right are matching.

    This is why I am confused by the error message because I essentially have the correct dimension, but somehow I cant broadcast it correctly??? - Dennis

Question 4 - Example of Conditionals - Nida

Add more details here

## Code for Q4

Answers

# What are conditionals?

# 'if' statements. These are the control blocks for your code, they're logical filters.

a = 10
b = 20

print(a>b)

# Checks if a > b
if a > b:
    print("a is greater than b!")
    print("Everything that is indented will run")


    print("see!")
    
# elif statements only check if the previous statements failed! 
elif a == b:
    print("We also checked if they were the same. They are!")
    
elif a < 0:
    print("a is a negative number and less than b")

# else statements capture everything that failed. They don't have a condition
else:
    print("Everything failed. a must be greater than 0 and less than b")