07: Introduction to 2D Array Slicing in NumPy

This notebook explores how to work with multidimensional arrays using NumPy, focusing on slicing and referencing elements in 2D arrays and beyond. Understanding how to access and manipulate data in arrays is fundamental for data analysis, scientific computing, and machine learning.

We will cover:

  • The basics of array indexing and slicing in one and two dimensions.

  • How to select specific elements, rows, columns, or subarrays using different indexing techniques.

  • Using advanced indexing methods, such as boolean and integer array indexing, to extract or modify data.

  • Practical examples to reinforce these concepts, including referencing elements by position, extracting ranges, and using conditions to find and manipulate data.

By the end of this notebook, you will have a solid understanding of how to efficiently access and manipulate data in multidimensional NumPy arrays.

import numpy as np

## First, we need an array of random integers to test things out on.
## Specify the shape of the array we want - in rows,columns
random_shape = (13,6)

## Now make a random array with this shape. np.random.randint takes the lowest integer
## that can appear, then the highest integer (excluded), then the shape of the array.
randomarray = np.random.randint(0,51,random_shape)
print(randomarray)
[[24  0 46 18 50 26]
 [11 42  3  4 32 13]
 [39 11 29 44 28 22]
 [37 30 11 20  2 17]
 [ 8 41  5 19 45 36]
 [23 19 17 12 33 14]
 [25 14 16 41 25 17]
 [13 32 38 15 40  9]
 [ 9 42  5  8 34 48]
 [12 25 46 24 11 17]
 [43  5 37 50 11 18]
 [17 36 43 37  9 21]
 [48 20 33 18  3 35]]
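
The array above changes on every run, so the printed values will not match a fresh execution. A minimal sketch of one way to get a repeatable array, assuming NumPy's Generator interface and an arbitrarily chosen seed of 0 (the name seeded_array is hypothetical, and its values will differ from the array printed above):

```python
import numpy as np

# Assumption: seed 0 is arbitrary; any fixed integer gives a repeatable array.
rng = np.random.default_rng(0)

# Like np.random.randint, integers() excludes the upper bound by default,
# so the values fall in the range 0..50.
seeded_array = rng.integers(0, 51, size=(13, 6))

print(seeded_array.shape)                             # (13, 6)
print(seeded_array.min() >= 0, seeded_array.max() <= 50)  # True True
```

The rest of the notebook keeps using randomarray as printed above.
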
## What if we wanted to reference the number 29 that's in the third row, and third column?
## row index, i=2; column index, j=2
randomarray[2,2]
29
## what about the number 20 in the fourth row?
## row i = 3, column j = 3
randomarray[3,3]
20
## What about everything in the third column?
## I want everything in column three, j = 2
randomarray[:,2] ## the colon says give me all the rows (indices in the dimension you're referencing)
## Note that i = :
array([46,  3, 29, 11,  5, 17, 16, 38,  5, 46, 37, 43, 33])
## What if I wanted the values 29, 11, and 5 in the third column?
## j = 2
## i = 2, 3, 4, written as the slice 2:5 (the stop index 5 is excluded)
randomarray[2:5,2]
array([29, 11,  5])
## Another way to write it, listing the row indices explicitly:
randomarray[(2,3,4),2]
array([29, 11,  5])
## If I wanted:
## i = 3,6,8
## j = 2,3
randomarray[[3,6,8],2:4]   ## a tuple of row indices also works: randomarray[(3,6,8),2:4]
array([[11, 20],
       [16, 41],
       [ 5,  8]])

Key Points to Remember:

  1. Indexing starts at 0:
    The first element is at index 0, not 1.

  2. Order is row, then column:
    When accessing elements, the first index is the row, the second is the column.

  3. Colon (:) as a wildcard:
    Using : selects all elements along that axis (all rows or all columns).

  4. Ranges and slices:
    You can use ranges like 3:5 to select multiple rows or columns, e.g., array[3:5, 3] selects rows 3 and 4 from column 3.

  5. Specific indices:
    You can provide lists or tuples of specific indices, e.g., array[[3, 6, 9], 2] selects rows 3, 6, and 9 from column 2.
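
The five key points can be checked on a small array whose values are easy to predict. This is a minimal sketch, assuming a hypothetical 4x5 array named demo in which the value at row i, column j is 10*i + j:

```python
import numpy as np

# Hypothetical demo array: the value at [i, j] is 10*i + j.
demo = np.array([[10 * i + j for j in range(5)] for i in range(4)])

first = demo[0, 0]             # key point 1: indexing starts at 0
row_then_col = demo[2, 4]      # key point 2: row 2, then column 4
whole_column = demo[:, 3]      # key point 3: every row of column 3
row_range = demo[1:3, 3]       # key point 4: rows 1 and 2 of column 3
picked_rows = demo[[0, 3], 2]  # key point 5: rows 0 and 3 of column 2

print(first)                   # 0
print(row_then_col)            # 24
print(whole_column.tolist())   # [3, 13, 23, 33]
print(row_range.tolist())      # [13, 23]
print(picked_rows.tolist())    # [2, 32]
```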

To pick individual elements that do not form a regular block of rows and columns, we have to give the exact (row, column) position of each one: one sequence of row indices and one sequence of column indices, which NumPy pairs up element by element.

For example, a tuple or list with three row indices and three column indices returns an array with three elements:

randomarray[((2,5,3),(0,1,5))]
array([39, 19, 17])
randomarray[(1,1,3,3,4,4,),(1,2,1,2,1,2)]
array([42,  3, 30, 11, 41,  5])
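
To make the pairing explicit, the sketch below compares paired indexing with looking up each (row, column) position one at a time. It assumes a hypothetical 4x5 array named demo; the names rows, cols, paired, and one_by_one are illustrative only:

```python
import numpy as np

# Hypothetical demo array: the value at [i, j] is 5*i + j.
demo = np.arange(20).reshape(4, 5)

rows = [0, 2, 3]
cols = [1, 4, 0]

# Paired indexing: picks demo[0, 1], demo[2, 4], and demo[3, 0].
paired = demo[rows, cols]

# The same lookups written out one position at a time.
one_by_one = np.array([demo[r, c] for r, c in zip(rows, cols)])

print(paired.tolist())                     # [1, 14, 15]
print(np.array_equal(paired, one_by_one))  # True
```

The two index sequences must have the same length (or be broadcastable to a common shape), because each row index needs a matching column index.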

We can also store the indices in variables and then use those variables to index the array.

## For example:
row_i = 0
column_j = 4
randomarray[row_i,column_j]
50
## The np.where function finds the indices in an array where a condition is satisfied.
## For example, what if we wanted the positions/indices in randomarray where the value is greater than 40?
elements_gt_40 = np.where(randomarray > 40)
print(elements_gt_40)
(array([ 0,  0,  1,  2,  4,  4,  6,  8,  8,  9, 10, 10, 11, 12],
      dtype=int64), array([2, 4, 1, 3, 1, 4, 3, 1, 5, 2, 0, 3, 2, 0], dtype=int64))
## One way is to index with the first array of elements_gt_40 (the row indices) and the second (the column indices)
randomarray[elements_gt_40[0],elements_gt_40[1]]
array([46, 50, 42, 44, 41, 45, 41, 42, 48, 46, 43, 50, 43, 48])
## A shorter way is to pass the whole tuple as the index:
randomarray[elements_gt_40]
array([46, 50, 42, 44, 41, 45, 41, 42, 48, 46, 43, 50, 43, 48])
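
The condition can also be used directly as a boolean index, without going through np.where. A minimal sketch, assuming a small hand-written array named demo (its values are the top-left corner of the array printed earlier) and a hypothetical cap of 40 for the modification step:

```python
import numpy as np

demo = np.array([[24,  0, 46],
                 [11, 42,  3],
                 [39, 11, 29]])

# The comparison gives a boolean array with the same shape as demo.
mask = demo > 40

# Indexing with the mask returns the matching values in row-major order.
selected = demo[mask]
print(selected.tolist())   # [46, 42]

# The same mask can be used to modify the matching elements in place.
capped = demo.copy()
capped[mask] = 40
print(capped.tolist())     # [[24, 0, 40], [11, 40, 3], [39, 11, 29]]

# Indexing with the np.where result selects the same elements.
print(np.array_equal(selected, demo[np.where(mask)]))   # True
```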

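Note that the condition needs a NumPy array: a plain Python list does not support an elementwise comparison such as > 40, so convert the list with np.array first.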
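
A minimal sketch of the list case, assuming a hypothetical list named values. Comparing the list itself to a number raises a TypeError in Python 3, while the converted array works with np.where:

```python
import numpy as np

values = [12, 45, 7, 50]   # hypothetical plain Python list

# A list cannot be compared to a number element by element.
try:
    values > 40
except TypeError:
    list_comparison_works = False
else:
    list_comparison_works = True
print(list_comparison_works)     # False

# Converting to an array makes the elementwise condition possible.
arr = np.array(values)
positions = np.where(arr > 40)   # for 1D input: a tuple holding one index array
print(positions[0].tolist())     # [1, 3]
```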

Shapes and lengths of arrays

We can use the function np.shape to get the shape of an array, that is, its size along each dimension.

## What is the shape of our random array that we already made?
np.shape(randomarray)
(13, 6)
## Try making a 1D array:
rand1d = randomarray[:,0]
print(rand1d)
[24 11 39 37  8 23 25 13  9 12 43 17 48]
## shape of it:
np.shape(rand1d)
(13,)
rand1d_shape = np.shape(rand1d)
print(rand1d_shape)
(13,)
## What about the length?
len(rand1d)
13
## Ask for length of 2D array - gives the number of rows:
len(randomarray)
13
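
The same information is available as attributes of the array itself. A minimal sketch, assuming a hypothetical array of zeros named demo with the same (13, 6) shape as randomarray:

```python
import numpy as np

demo = np.zeros((13, 6), dtype=int)

# Unpack the shape to get the row and column counts separately.
n_rows, n_cols = demo.shape
print(n_rows, n_cols)        # 13 6

print(len(demo))             # 13, the size of the first axis (rows)
print(demo.ndim)             # 2, the number of dimensions
print(demo.size)             # 78, the total number of elements

# An integer column index drops a dimension; a list keeps it.
print(demo[:, 0].shape)      # (13,)
print(demo[:, [0]].shape)    # (13, 1)
```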

Slicing lets you read and modify subsets of a large array without writing loops. Basic slices (those written with a colon) return views of the original data rather than copies, which keeps memory use low, so slicing is a core tool in numerical computing and data science workflows.
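
One practical consequence is worth checking: a basic slice shares memory with the original array, while indexing with a list of integers returns a copy. A minimal sketch, assuming a hypothetical 3x4 array named demo:

```python
import numpy as np

# Hypothetical demo array: the value at [i, j] is 4*i + j.
demo = np.arange(12).reshape(3, 4)

view = demo[0:2, 1]       # basic slice: a view onto demo
picked = demo[[0, 1], 1]  # integer-list index: an independent copy

print(np.shares_memory(demo, view))    # True
print(np.shares_memory(demo, picked))  # False

# Writing through the view changes the original array.
view[0] = 99
print(demo[0, 1])         # 99

# Writing to the copy leaves the original untouched.
picked[1] = -1
print(demo[1, 1])         # 5
```

If an independent array is needed from a slice, calling .copy() on it avoids modifying the original by accident.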