Big O Notation

Some algorithms can be completed in a set number of steps no matter how many elements there are to work with (efficient), whilst others take many more steps as the data grows (less efficient).

Research source for this article: A Common-Sense Guide to Data Structures and Algorithms by Jay Wengrow

Feedback Is Awesome!

After sharing this article the other day on Twitter, lots of people responded and offered their own insights, feedback and suggestions. This is one of the reasons I love Twitter, because without that engagement all of those learning opportunities might not have happened, especially not so quickly.

I updated this post to incorporate all of the feedback by embedding the tweets directly in the post (that was fun!).

What is Big O Notation?

Big O Notation is a mathematical concept that allows us to easily categorize how efficient an algorithm is.

The number of steps that an algorithm takes is the main indicator of how efficient it is. If we have an array of 100 elements, and we want to find out the index of a specific value in that array, it could take up to 100 steps for us to find the value by checking each index individually (linear search), or it could take us 7 steps if the array was ordered and we took a higher or lower guessing game approach to finding the number (binary search).

The number of steps it takes to complete a linear search, binary search, or any other algorithm will depend on how big the array is, among other factors.
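To make that concrete, here's a rough sketch (the 100-element figure and the variable names are just illustrative) comparing the worst-case step counts of the two approaches:

import math

n = 100  # number of elements in the array

# Linear search checks each index in turn, so the worst case is n steps.
linear_steps = n

# Binary search halves the remaining range with each guess,
# so the worst case is roughly log2(n) steps, rounded up.
binary_steps = math.ceil(math.log2(n))

print(linear_steps)  # 100
print(binary_steps)  # 7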

When is Big O Useful?

Big O is useful when you are working with extremely large input sizes. With smaller inputs, the difference in efficiency between algorithms is pretty much irrelevant.

If you look at the graph below, you'll see that the fewer elements there are, the closer the efficiency lines are. The difference in efficiency only becomes marked once the number of elements has increased significantly.

graph showing efficiency matters more when there are more elements
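If you'd rather see the numbers behind the graph, a quick sketch like this (the input sizes are arbitrary) shows how close the step counts sit for small inputs and how far apart they drift for large ones:

import math

for n in [10, 100, 10_000, 1_000_000]:
  constant = 1                           # O(1)
  logarithmic = math.ceil(math.log2(n))  # O(log N)
  linear = n                             # O(N)
  print(n, constant, logarithmic, linear)

# 10       1  4   10
# 100      1  7   100
# 10000    1  14  10000
# 1000000  1  20  1000000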

For more information on this, I found the C2 wiki article Big Oh to be helpful.

Huge thanks to Matteo Vaccari and Marco Consolaro, whose tweets helped me address this gap in the article!

Big O: Count the steps

Big O achieves consistency by focusing only on the number of steps that an algorithm takes.

When we read data from an array by specifying its index, that only takes one step, because the computer knows the memory address for each index in an array and can jump straight to it. The way to express this in Big O Notation is:

O(1)

Some people read this as "Big Oh of 1", or "Order of 1".

O(1) means that an algorithm takes the same number of steps no matter how much data there is. In this case, reading from an array always takes just one step no matter how much data the array contains.

Other operations that fall under the category of O(1) are the insertion and deletion of a value at the end of an array.
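In Python terms, with a list standing in for the array, each of these looks something like the sketch below (append is effectively one step, though strictly speaking it's amortized O(1)):

things = ["apples", "baboons", "cribs"]

value = things[1]       # reading by index: one step, O(1)
things.append("dates")  # inserting at the end: one step, O(1)
things.pop()            # deleting from the end: one step, O(1)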

In the case of a linear search, the number of steps it takes to complete in the worst-case scenario is the same as the number of elements in the array. So, for N elements in the array, linear search can take up to a maximum of N steps.

O(N)

"Oh of N" is the Big O way of saying that for N elements inside an array, the algorithm would take up to N steps to complete.
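Here's one possible linear search sketch in Python (linear_search is a made-up helper, not a built-in) with a step counter, so you can watch the worst case reach N:

def linear_search(array, target):
  steps = 0
  for index, value in enumerate(array):
    steps += 1
    if value == target:
      return index, steps
  return None, steps  # worst case: steps == len(array)

print(linear_search(list(range(100)), 99))  # (99, 100)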

Big O Notation does more than describe the number of steps that an algorithm takes as a hard number, such as 22 or 400. It describes how many steps an algorithm takes based on the number of data elements that the algorithm is acting upon.

Another way of saying this is that Big O answers the following question: How does the number of steps change as the data increases?

When an array increases in size by one element, an O(N) algorithm will increase by one step, whilst an algorithm that is O(1) will take the same number of steps no matter how large the array gets.

When plotted on a graph (see below), O(N) makes a perfect diagonal line, because for every additional piece of data the algorithm takes one additional step. O(N) is also known as linear time.

O(1), by contrast, is a perfect horizontal line, because the number of steps in the algorithm remains constant no matter how much data there is.

big o notation graph showing O(N) and O(1)

Big O is primarily concerned with how an algorithm performs across varying amounts of data. This means that as long as an algorithm always takes the same number of steps regardless of how much data there is, even if that constant number of steps is greater than one, the algorithm is described as O(1).

If we are looking for a value that happens to be in the first index of an array (the best-case scenario), then the efficiency of the linear search in that example could be described as O(1), as it only took one step to complete. However, Big O notation generally refers to worst-case scenarios unless otherwise specified.

Knowing exactly how inefficient an algorithm can get in a worst-case scenario prepares us for the worst and may have a strong impact on our choices.
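Reusing the hypothetical linear_search sketch from above, the best and worst cases look like this:

array = list(range(100))

# Best case: the target is at index 0, so only one step is taken.
print(linear_search(array, 0))   # (0, 1)

# Worst case: the target is at the last index, taking all 100 steps.
print(linear_search(array, 99))  # (99, 100)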

O(log N)

We can't describe a binary search as being O(1), because the number of steps increases as the data increases. It also doesn't fit into the category of O(N), because the number of steps is far fewer than the number of elements it searches: only seven steps for an array containing one hundred elements.

Binary search seems to fall somewhere in between O(1) and O(N). In Big O, we describe binary search as having a time complexity of O(log N). An algorithm like this is also described as running in log time.

O(log N) is the Big O way of describing an algorithm that takes one additional step each time the data is doubled.
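Here's a minimal binary search sketch (binary_search is again a made-up name, and it assumes the array is already sorted) using the same step-counting idea:

def binary_search(array, target):
  steps = 0
  low, high = 0, len(array) - 1
  while low <= high:
    steps += 1
    middle = (low + high) // 2  # halve the remaining range
    if array[middle] == target:
      return middle, steps
    elif array[middle] < target:
      low = middle + 1
    else:
      high = middle - 1
  return None, steps

# 100 sorted elements: at most 7 steps, as described above.
print(binary_search(list(range(100)), 99))  # (99, 7)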

Logarithms

Logarithms are the inverse of exponents. An exponent raises a number to a given power: 2³, for example, means 2 * 2 * 2, which is 8.

log₂ 8 is the inverse of the above. It asks: how many times do you have to multiply 2 by itself to get a result of 8?

Another example: 2⁶ translates to 2 * 2 * 2 * 2 * 2 * 2 = 64.

Since we had to multiply 2 by itself 6 times to get 64, log₂ 64 = 6.

Another way of explaining log₂ 8 is: if we kept dividing 8 by 2 until we ended up with 1, how many 2s would we have in our equation? (8 / 2 / 2 / 2 = 1.) In this example it takes 3 divisions, so log₂ 8 = 3.

Or, we could explain log₂ 64 as: how many times do we need to halve 64 until we end up with 1? (6 times)
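That halving description translates directly into code; a tiny sketch (halvings_to_one is an illustrative name):

def halvings_to_one(n):
  # Counts how many times n can be halved before reaching 1, i.e. log2(n).
  count = 0
  while n > 1:
    n = n // 2
    count += 1
  return count

print(halvings_to_one(8))   # 3
print(halvings_to_one(64))  # 6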

Back to O(log N)

Whenever we say O(log N), it's actually shorthand for saying O(log₂ N). We're just omitting the small 2 for convenience.

O(log N) means that for N data elements, the algorithm would take log₂ N steps. If there are 8 elements, the algorithm would take three steps, since log₂ 8 = 3.

Said another way, if we keep dividing the eight elements in half, it would take us three steps until we end up with one element.

This is exactly what happens with binary search. As we search for a particular item, we keep dividing the array's cells in half until we narrow it down to the correct number.

O(log N) means that the algorithm takes as many steps as it takes to keep halving the data elements until we remain with one.
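Reusing the hypothetical binary_search sketch from earlier, you can watch the step count grow by one each time the data doubles (these counts include the final equality check, so they sit one above log₂ N, but the doubling pattern is the point):

for size in [8, 16, 32, 64, 128]:
  # Searching for the largest element is close to the worst case here.
  index, steps = binary_search(list(range(size)), size - 1)
  print(size, steps)

# 8    4
# 16   5
# 32   6
# 64   7
# 128  8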

Practical Examples

things = ["apples", "baboons", "cribs", "pineapples"]
for thing in things:
  print("Here's a thing: %s" % thing)

We would describe the efficiency of this algorithm in Big O Notation as O(N), because the number of times we print an element depends on how many elements are in the array.

print('Hello world!')

We would describe the efficiency of this algorithm as O(1), because this print statement consistently takes only one step.

def is_prime(number):
  for i in range(2, number):
    if number % i == 0:
      return False
  return True

We would describe the efficiency of this algorithm as O(N), because the number of times the program loops through the numbers in the range depends on which number is given as the end of the range.
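To see that O(N) behaviour, you could instrument the function to count its loop iterations (this version is just for illustration):

def is_prime_steps(number):
  steps = 0
  for i in range(2, number):
    steps += 1
    if number % i == 0:
      return False, steps
  return True, steps

print(is_prime_steps(7))    # (True, 5), roughly N steps for a prime
print(is_prime_steps(101))  # (True, 99)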

What About Other O Notations?

Thank you to Tommaso Previero for pointing out that there are a few Big O notation types not covered in this article.

And thank you to Joyce Kung for adding that Big O measures the upper bound of the steps an algorithm might take. In other words, Big O describes the efficiency of an algorithm in the worst-case scenario. For example, if the value you were searching for happened to be the last item in a ten-element array, the search might take 10 steps instead of the single step it would take if the value were the first item. Big O assumes that all 10 steps might be taken.