02: Python Basics Continued, and Arrays and Lists

Notebook Overview

This notebook introduces fundamental Python concepts, focusing on variables, conditional statements, and basic algorithms. It then explores the differences between Python lists, NumPy arrays, and tuples, highlighting their properties, operations, and use cases. The notebook also covers indexing and slicing techniques for these data structures, providing practical examples to illustrate each concept.

## Import numpy
import numpy as np

Our first Python algorithm

Before we dive in, let’s clarify some key concepts:

What is an algorithm?
An algorithm is a step-by-step set of instructions or rules designed to solve a specific problem or perform a task. Algorithms are fundamental to programming and computer science, as they provide a clear method for processing data and making decisions.

What is algorithmic thinking?
Algorithmic thinking is the process of breaking down problems into logical, manageable steps that can be expressed as an algorithm. It involves identifying the sequence of actions needed to achieve a desired outcome and considering all possible scenarios.


Example Algorithm:
Let’s demonstrate algorithmic thinking by solving a simple problem:
Problem: Decide if a number a is even or odd (i.e., is a divisible by 2?).

## Set a equal to some number
a = 27
#################################
## Version 1 of the algorithm: ##
#################################


## We create a variable, evenorodd, which is the remainder after dividing a by 2:
evenorodd = a % 2 ## np.mod(a, 2) gives the same result.

## If evenorodd equals 1, a is odd
if evenorodd == 1: ## The colon : is the equivalent of the "then" statement
    print('a is odd')
## Otherwise a is even:
else:
    print('a is even')
a is odd

Algorithms, Conditional Statements, and Operators

In the previous cell, we learned how to use algorithms and conditional statements to solve a simple problem: determining if a number is even or odd.

  • Algorithm: We followed a step-by-step process to check if the variable a (which is 27) is even or odd.

  • Operators: We used the modulo operator % to compute the remainder when a is divided by 2. If the result (evenorodd) is 1, the number is odd; if it is 0, the number is even.

  • Conditional Statements: We used an if-else statement to check the value of evenorodd. If it equals 1, we print that a is odd; otherwise, we print that a is even.

This demonstrates how algorithms break problems into logical steps, how operators perform calculations, and how conditional statements allow our code to make decisions based on those calculations.
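The three ideas above can also be packaged into a reusable function. The sketch below is one possible way to do it; the function name `parity` is a hypothetical helper introduced here for illustration and is not part of the notebook. It relies on the fact that Python's `%` returns a non-negative remainder when the divisor is positive, so negative odd numbers are handled too.

```python
def parity(n):
    """Return 'even' or 'odd' for an integer n (hypothetical helper)."""
    remainder = n % 2          # operator: remainder after dividing by 2
    if remainder == 1:         # conditional statement: branch on the result
        return 'odd'
    else:
        return 'even'

# Try the function on a few sample values, including 0 and a negative number
for value in [27, 10, 0, -3]:
    print(value, 'is', parity(value))
```

Wrapping the check in a function means the same algorithm can be applied to any number without retyping the if-else block.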

#################################
## Version 2 of the algorithm: ##
#################################

## Start off by assuming that a is even:
oddprintstatement = 'a is even'
print('first print statement')

## We create a variable, evenorodd, which is the remainder after dividing a by 2:
evenorodd = a % 2 ## np.mod(a, 2) gives the same result.

## If evenorodd equals 1, a is odd
if evenorodd == 1: ## The colon : is the equivalent of the "then" statement
    oddprintstatement = 'a is odd'
    print('entered decision structure')

print(a)
print(oddprintstatement)
first print statement
entered decision structure
27
a is odd

Is Version 2 of the Algorithm Preferable?

Version 2 of the algorithm starts by assuming that the number a is even and sets the variable oddprintstatement to 'a is even'. It then checks if a is actually odd by evaluating the remainder when a is divided by 2 (evenorodd = a % 2). If the remainder is 1, it updates oddprintstatement to 'a is odd'. Finally, it prints the value of a and the result.

Why might this version be considered better?

  • Clarity: By initializing the result variable (oddprintstatement) at the start, the code makes it clear what the default assumption is.

  • Single Output Statement: The result is printed once at the end, which can make the output logic cleaner, especially if more actions are needed after the decision.

  • Easier to Extend: If you want to add more logic after determining if a is even or odd, this structure is easier to build upon.

Why might it not be better?

  • Slightly More Complex: It introduces an extra variable and an initial assignment, which may not be necessary for such a simple task.

  • Less Direct: For very simple decisions, the straightforward if-else with immediate print statements (as in Version 1) can be easier to read.

In summary, Version 2 is more flexible and scalable for larger programs, but for very simple cases, Version 1 may be more concise.
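To illustrate the "easier to extend" point, here is a minimal sketch of the Version 2 pattern with one extra check added after the decision. The variable name `description` and the divisibility-by-3 check are assumptions made for this example only; they do not appear in the notebook.

```python
a = 27

# Default assumption, as in Version 2
description = 'a is even'

# Override the default only when the check says otherwise
if a % 2 == 1:
    description = 'a is odd'

# Extra logic added after the decision, without touching the branch above
if a % 3 == 0:
    description = description + ' and divisible by 3'

# Single output statement at the end
print(description)
```

Because the result lives in a variable rather than inside a print call, later steps can keep building on it before anything is printed.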

Switching Gears: Arrays and Lists in Scientific Programming

In Python, lists and arrays are fundamental tools for storing and manipulating collections of data. Understanding the differences between them is crucial, especially in scientific programming, where efficient data handling and computation are essential.

Why Do These Concepts Matter?

  • Data Storage: Scientific problems often involve working with large datasets—measurements, simulation results, or experimental observations. Choosing the right data structure can make your code more efficient and easier to understand.

  • Performance: Arrays (especially those from the NumPy library) are optimized for numerical operations. Their elements share one data type and are processed in compiled code, so large computations typically run much faster than the equivalent loop over a standard Python list.

  • Functionality: Arrays support a wide range of mathematical operations directly, enabling concise and readable code for complex calculations.

Lists vs. Arrays: An Overview

Similarities:

  • Both store collections of data (numbers, strings, etc.).

  • Both can be indexed and iterated through.

Key Differences:

  • Lists: Flexible, can store elements of different data types, but are slower for numerical computations.

  • Arrays (NumPy): Require elements to be of the same data type, but support fast, element-wise mathematical operations and are ideal for scientific computing.

In the following sections, we’ll explore these differences in detail and see why arrays are often the preferred choice in scientific programming.
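The performance difference mentioned above can be checked with a rough timing sketch like the one below. It uses the standard-library `timeit` module; the array size and repeat count are arbitrary choices for this example, and the actual timings depend on your machine, so no specific speedup is claimed here.

```python
import timeit
import numpy as np

n = 100_000                      # arbitrary size chosen for this sketch
values_list = list(range(n))     # plain Python list
values_array = np.arange(n)      # NumPy array with the same values

# Time doubling every element, repeated 20 times for each approach
list_time = timeit.timeit(lambda: [v * 2 for v in values_list], number=20)
array_time = timeit.timeit(lambda: values_array * 2, number=20)

print(f'list comprehension: {list_time:.4f} s')
print(f'NumPy array:        {array_time:.4f} s')
```

Running this on your own computer is a quick way to see how large the gap is for your data sizes.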

## Difference #1: How we make them
## To make a list:
mylist = [5,2,3,4,5]
print(mylist)

## to make an array, we need the package numpy (which we imported above)
myarray = np.array([5,2,3,4,5])
print(myarray)
[5, 2, 3, 4, 5]
[5 2 3 4 5]
## Difference #2: The operations we can perform on them are different.
## Example: You can divide an array by 3 and that divides each element by 3:
print(myarray/3)
[1.66666667 0.66666667 1.         1.33333333 1.66666667]
## This doesn't work with lists:
print(mylist/3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-7e59d3dfab24> in <module>
      1 ## This doesn't work with lists:
----> 2 print(mylist/3)

TypeError: unsupported operand type(s) for /: 'list' and 'int'

Moral of the Story

  • You cannot perform elementwise mathematical operations directly on a Python list.

  • You can perform elementwise operations on a NumPy array, which makes arrays much more powerful and convenient for numerical computations.

Tip: Use arrays (from NumPy) when you need to do math with collections of numbers!
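If you do need to divide every element of a list, you have to loop over it yourself. The sketch below shows one common way, a list comprehension, next to the array version; the variable names `divided_list` and `divided_array` are introduced here for illustration.

```python
import numpy as np

mylist = [5, 2, 3, 4, 5]

# A list needs an explicit loop (here, a list comprehension)
divided_list = [x / 3 for x in mylist]

# Converting to an array allows the division in one step
divided_array = np.array(mylist) / 3

print(divided_list)
print(divided_array)
```

Both approaches produce the same numbers; the array version is shorter and keeps the math looking like math.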

## Difference #3: Lists can contain elements with different data types, arrays cannot:
list2 = ['favoritelunch',5,3.2,3e-3]
print(list2)
['favoritelunch', 5, 3.2, 0.003]
## Try with an array:
array2 = np.array(['favoritelunch',5,3.2,3e-3])
print(array2)
['favoritelunch' '5' '3.2' '0.003']

Notice what happened with the array

Although we created the array with different data types (a string and several numbers), NumPy automatically converted all the elements to strings. This is because NumPy arrays require all elements to be of the same data type. When mixed types are provided, NumPy upcasts them to a common type—in this case, strings—to ensure consistency within the array.

Key takeaway:
NumPy arrays enforce a single data type for all elements, which can lead to automatic type conversion if you mix types when creating the array.
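One way to see which type NumPy settled on is to inspect the array's `dtype` attribute. The sketch below uses made-up example arrays; the exact integer width reported for the first array can differ between platforms, so treat the comments as approximate.

```python
import numpy as np

ints = np.array([1, 2, 3])                            # all integers
mixed_numbers = np.array([1, 2.5, 3])                 # integers and a float
mixed_with_text = np.array(['favoritelunch', 5, 3.2]) # text and numbers

print(ints.dtype)             # an integer type (width depends on platform)
print(mixed_numbers.dtype)    # float64: the integers were converted to floats
print(mixed_with_text.dtype)  # a Unicode string type: everything became text
```

Checking `dtype` right after creating an array is a cheap way to catch accidental conversions, such as numbers silently turning into strings.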

## Difference #4: Adding lists together makes a bigger list; adding arrays adds the elements (the array size doesn't change)
## Test it out:
## Make 2 lists:
list1 = [2,3,4]
list2 = [5,6,7]

## Add them together:
biggerlist = list1 + list2
print(biggerlist)
[2, 3, 4, 5, 6, 7]
## Do same thing with arrays:
array1 = np.array([2,3,4])
array2 = np.array([5,6,7])

biggerarray = array1 + array2
print(biggerarray)
[ 7  9 11]
## Difference #5: Appending to arrays is different than with lists
## We saw above that adding two lists together joins them into a longer list.
## Adding two arrays does not do this; it adds them element by element.

## To make a bigger array, use np.append:
appendedarray = np.append(array1,array2)
print(appendedarray)
[2 3 4 5 6 7]
## Difference #6: Arrays can be multidimensional; a plain list is one-dimensional (unless you nest lists)
## So we can join arrays either end to end or side by side as columns

## To join arrays end to end (along the first axis), use np.r_:
np.r_[array1,array2]
array([2, 3, 4, 5, 6, 7])
## Append them column-wise:
np.c_[array1,array2]
array([[2, 5],
       [3, 6],
       [4, 7]])

In the last few cells, we explored the differences between Python lists and NumPy arrays, focusing on how they store and manipulate data. We learned that:

  • Lists are flexible and can hold elements of different data types, but do not support elementwise mathematical operations directly.

  • NumPy arrays require all elements to be of the same data type and allow efficient, elementwise operations, making them ideal for scientific computing.

  • Adding two lists concatenates them, while adding two arrays performs elementwise addition.

  • Arrays can be joined with np.append, np.r_, and np.c_, and they support multidimensional data. A single list is one-dimensional, although lists can be nested.

  • We also saw how NumPy handles mixed data types by upcasting all elements to a common type (strings, in our example).

These concepts are fundamental for choosing the right data structure for different programming and data analysis tasks.
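The dimensionality point can be made concrete by looking at an array's `shape`. The sketch below uses `np.concatenate` and `np.vstack`, two standard NumPy functions that the notebook itself does not use, alongside a nested list for comparison.

```python
import numpy as np

array1 = np.array([2, 3, 4])
array2 = np.array([5, 6, 7])

joined = np.concatenate([array1, array2])   # one longer 1-D array
stacked = np.vstack([array1, array2])       # 2-D array, one input per row

print(joined.shape)    # (6,)
print(stacked.shape)   # (2, 3)

# A nested list can mimic rows, but it is still a list of lists
nested = [[2, 3, 4], [5, 6, 7]]
print(len(nested), len(nested[0]))
```

The array knows its own shape as a single property, while the nested list only knows how many inner lists it holds.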

The last storage type we’ll talk about: Tuples

Tuples are similar to lists and arrays in that they can store collections of items, but they have a unique property: they are immutable. This means that once a tuple is created, its elements cannot be changed, added, or removed.

The Oddity of Tuples

  • Immutability: Unlike lists (which you can modify), tuples are fixed after creation. If you try to change an element, Python will raise a TypeError.

  • Syntax: Tuples are created using parentheses (), while lists use square brackets [].

  • Use Cases: Because they can’t be changed, tuples are often used to store data that should remain constant throughout a program, such as coordinates, RGB color values, or fixed configuration options.

Do We Use Tuples in Scientific Programming?

  • Rarely for Data Storage: In scientific programming, we usually work with large, mutable datasets that need to be updated or manipulated—tasks better suited for lists or NumPy arrays.

  • When Are Tuples Useful?

    • As keys in dictionaries (since lists and arrays can’t be used as dictionary keys).

    • For returning multiple values from a function.

    • For representing fixed-size, constant data (e.g., a point in 3D space).

Summary:
Tuples are less common in scientific programming compared to lists and arrays, but their immutability can be useful in situations where you want to ensure data does not change. For most numerical and data analysis tasks, lists and especially NumPy arrays are preferred due to their flexibility and powerful mathematical capabilities.
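Two of the tuple use cases listed above, returning multiple values and acting as dictionary keys, might look like the sketch below. The function `min_and_max`, the dictionary `temperature_at`, and all the values are hypothetical and exist only for this illustration.

```python
def min_and_max(values):
    """Return the smallest and largest value as a tuple (hypothetical helper)."""
    return min(values), max(values)   # the comma builds a tuple

lowest, highest = min_and_max([3, 9, 1, 7])   # tuple unpacking
print(lowest, highest)

# Tuples can serve as dictionary keys; here, made-up (x, y) grid points
temperature_at = {(0, 0): 21.5, (0, 1): 22.1}
print(temperature_at[(0, 1)])

# A list in the same position is rejected because lists are mutable
try:
    temperature_at[[1, 1]] = 20.0
except TypeError as err:
    print('TypeError:', err)
```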

## Example: Arrays are mutable. Change the first element of array1 to be 1 instead of 2:
print(array1)
[2 3 4]
array1[0]
2
## change the value of the first element:
array1[0] = 1
print(array1)
[1 3 4]
## With a tuple, we make them with parentheses (not brackets)
tuple1 = (2,3,4)
print(tuple1)
(2, 3, 4)
tuple1[0]
2
## Try and change the first element:
tuple1[0] = 1
print(tuple1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-b95b1785d8c1> in <module>
      1 ## Try and change the first element:
----> 2 tuple1[0] = 1
      3 print(tuple1)

TypeError: 'tuple' object does not support item assignment
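Since a tuple cannot be modified in place, the usual workaround is to build a new tuple from pieces of the old one. A minimal sketch, with `tuple2` as a name introduced here for illustration:

```python
tuple1 = (2, 3, 4)

# Build a new tuple: a one-element tuple (1,) plus everything after index 0
tuple2 = (1,) + tuple1[1:]

print(tuple1)   # the original is untouched
print(tuple2)
```

Note the trailing comma in `(1,)`: without it, Python reads `(1)` as the plain integer 1 rather than a tuple.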

Everyone’s Favorite: Indexing

Indexing is the process of accessing individual elements within a collection—such as a list, array, or tuple—using their position (or “index”) in that collection.

Why Does Indexing Matter in Scientific Programming?

  • Data Access: Scientific data is often stored in large collections (like arrays or lists). Indexing allows you to quickly retrieve, analyze, or modify specific data points.

  • Efficiency: With indexing, you can loop through data, extract subsets, or perform calculations on selected elements without having to process the entire dataset.

  • Flexibility: Indexing supports powerful operations like slicing, which lets you work with ranges of data (e.g., every third measurement, or all values after a certain point).

How Indexing Works

  • Python starts counting at 0. The first element is at index 0, the second at index 1, and so on.

  • You can use positive indices (from the start) or negative indices (from the end) to access elements.

  • Indexing is essential for tasks like data cleaning, analysis, and visualization.


Let’s play with this concept by creating a list to represent lunchboxes and see how indexing helps us access and manipulate the contents!

lunchboxes = ['cheetos','apples','leftovers','gushers','sandwich','chicken','coffee','burrito']
print(lunchboxes)
['cheetos', 'apples', 'leftovers', 'gushers', 'sandwich', 'chicken', 'coffee', 'burrito']
## NOTE: Python starts counting at 0.
## A normal person might say they want the contents of the third lunchbox for lunch...
lunchboxes[3]
'gushers'
## That actually returned the contents of the 4th lunchbox. For the third, use index 2:
lunchboxes[2]
'leftovers'
## What if someone asks you how many lunchboxes you have?
## We want the "length" of the list, use "len"
len(lunchboxes)
8
## will this run?
lunchboxes[8]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-45-54878cad40e2> in <module>
      1 ## will this run?
----> 2 lunchboxes[8]

IndexError: list index out of range
## If I want the last lunchbox:
print(lunchboxes[7])
print(lunchboxes[-1])
burrito
burrito
## What if I want the second through fourth lunchboxes?
lunchboxes[1:3] ## this will give me the second and third, but not fourth (index = 3)
['apples', 'leftovers']
## We actually have to do:
lunchboxes[1:4]
['apples', 'leftovers', 'gushers']
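A few more slice patterns follow the same start:stop rule, with an optional third number for the step. The sketch below reuses the lunchbox list; the `readings` array contains made-up values and is included only to show that the same syntax carries over to NumPy arrays.

```python
import numpy as np

lunchboxes = ['cheetos', 'apples', 'leftovers', 'gushers',
              'sandwich', 'chicken', 'coffee', 'burrito']

print(lunchboxes[:3])     # first three items (indices 0, 1, 2)
print(lunchboxes[5:])     # everything from index 5 onward
print(lunchboxes[::3])    # every third item, starting at index 0
print(lunchboxes[-2:])    # last two items

# The same slice syntax works on NumPy arrays
readings = np.array([10, 20, 30, 40, 50])   # made-up values
print(readings[1:4])      # indices 1, 2, 3; the stop index is excluded
```

In every case the stop index is excluded, which is why `lunchboxes[1:4]` was needed above to include the fourth lunchbox.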

Summary of This Notebook

This notebook provides an introduction to fundamental Python programming concepts, with a focus on their application in scientific computing. The main topics covered include:

  • Algorithms and Algorithmic Thinking:
    The notebook begins by defining algorithms and demonstrating algorithmic thinking through the example of determining if a number is even or odd using conditional statements and operators.

  • Conditional Statements and Operators:
    It explains how to use if-else statements and the modulo operator % to make decisions in code.

  • Python Data Structures:
    The differences between Python lists, NumPy arrays, and tuples are explored in detail:

    • Lists: Flexible, can store mixed data types, but do not support elementwise mathematical operations directly.

    • NumPy Arrays: Require a single data type, support efficient elementwise operations, and are ideal for numerical computations.

    • Tuples: Immutable collections, useful for storing constant data and as dictionary keys.

  • Operations on Lists and Arrays:
    The notebook demonstrates how lists and arrays behave differently when performing operations like addition, appending, and mathematical calculations.

  • Indexing and Slicing:
    It covers how to access and manipulate elements in lists, arrays, and tuples using indexing and slicing, emphasizing the importance of zero-based indexing in Python.

  • Best Practices in Scientific Programming:
    Throughout, the notebook highlights why NumPy arrays are preferred for scientific tasks and when other data structures like lists and tuples are appropriate.

Overall, this notebook equips readers with foundational knowledge of Python data structures and control flow, preparing them for more advanced topics in scientific programming and data analysis.