Tag: data structure

Interview Basics: Multidimensional Arrays

Interview Basics: Multidimensional Arrays

Did you read last week’s article on static vs dynamic arrays? If not, catch up now! This week we’re continuing the arrays theme and discussing multidimensional arrays. This is another data structure that you absolutely have to know to succeed in technical interviews, which is why several of our array questions in Interview Practice feature them.

Multidimensional Array Basics

What makes a multidimensional array different from a plain old array? And just what is a multidimensional array, anyway? 

Basically, a multidimensional array is also a random-access data structure, but in this case the data is accessed by more than one index.

Some languages like C# have built-in support for multidimensional arrays. But other languages support multiple indices by creating an “array of arrays”. A multidimensional array in which you need </span><span style="font-weight: 400;">N numbers to reference a particular piece of data is an </span><span style="font-weight: 400;">N-dimensional array. In other words, N is the number of indices needed to locate a single element in the array.

How do you access elements in a multidimensional array? In the “array of arrays” model, the elements at arr[i] are arrays themselves. And element arr[i][j] is the jth element in the array arr[i]. These arrays can be dynamic or static.

The good: The item lookup by index for multidimensional arrays is </span><span style="font-weight: 400;">O(1), and it’s easy to iterate over every element stored in a multidimensional array. You can use multiple indices that make sense for the problem you’re solving (consider using rows and columns for a problem about a chess board, for example), which makes it easier for you to read and maintain your code.

The bad: It’s not easy or quick to rearrange the elements in a multidimensional array, and it requires a long time to change its size.

Meet the 2D Array

Technically, it’s possible to have a multidimensional array of any size. But for technical interviews, it’s most important for you to be familiar with 2D arrays. 2D arrays are often used to implement:

  • Matrices, where matrix[i][j] represents the element at row i and column j;
  • Game boards like chess or checkers boards, where board[row][col] represents the state of the board at the location (row, col);
  • Maps, in which the map is divided into cells to represent different locations.

Jagged vs Regular Arrays

As we mentioned earlier, different languages have different ways of representing multidimensional arrays. When implementing a 2D array as an “array of arrays” (i.e. </span><span style="font-weight: 400;">arr[i] is itself an array), there is no technical reason why </span><span style="font-weight: 400;">len(array[i]) == len(array[j]). If the subarrays have different lengths, we call this a jagged array.

jagged multidimensional array
A jagged multidimensional array

But when we talk about multidimensional arrays, we often mean regular (or rectangular) arrays, in which each dimension has a fixed length.

regular multidimensional array
A regular multidimensional array

TL;DR? Watch this video instead!

More of a visual learner? We get it. Watch this video to get up to speed on the basics of multidimensional arrays!

Up to speed on this handy data structure? Head on over to the arrays section of Interview Practice to solve some real multidimensional array-based technical interview questions!

Interview Basics: Static vs. Dynamic Arrays

Interview Basics: Static vs. Dynamic Arrays

Arrays are one of the most basic data structures in computer science. But they form the basis of some difficult technical interview questions! So no matter where you are in your programming career, you need to be very familiar with arrays and how to use them. Read on to review the basics of static and dynamic arrays, then watch our video to get clear on the differences between them. And once you’re ready to solve some array-based technical interview questions on your own, head to Interview Practice. You’ll be able to practice solving array interview questions that have been asked at Google, LinkedIn, Apple, Uber, and Palantir. 

First Things First

Whatever programming language you choose use in an interview, make sure you know its array methods very well. Like, forwards and backwards well. Interviewers will definitely be judging you on this! If you don’t know something basic like this in the language you’ve chosen to interview with, they’ll question how well you know that language at all… Not to mention how well you can program, period. So spend some time with your language’s documentation to make sure that you’ve got a handle on its basic and advanced array methods.

Arrays

The simplest definition of an array is that it’s a data structure that contains a group of elements. In interviews, you’ll get a lot of questions about static arrays, which are the most basic implementation of this data structure. You’ll also get questions about dynamic arrays. We’re going to focus on static and dynamic arrays in this article. (You’ll also get questions about multidimensional arrays in technical interviews, which we’re going to cover in an upcoming article.)

Static Arrays

The most basic implementation of an array is a static array. A static array is a random-access data structure with a fixed size. You have to declare its size when you first initialize it. To add or remove elements from a static array, you have to create a new appropriately-sized contiguous chunk of memory. Then you have to copy the elements that you want to keep into this new array.

The good: The item lookup by index for static arrays is really quick (O(1)), and they have a very low memory footprint.

The bad: Adding elements to or deleting elements from the array is an O(n) operation, where n is the number of elements in the array. Searching the array for a particular element is also O(n). Arrays don’t allow for the quick rearrangement of their elements.

Dynamic Arrays

A dynamic array is a random-access, variable-sized list data structure that elements can be added to or removed from. They’re basically like static arrays, but with the addition of space that’s reserved for adding more elements. When dealing with a dynamic array, there are two important factors to consider. There’s the logical size of the array (the number of elements being used by the array’s contents) and its physical size (the size of the underlying array). The physical size is counted in either the number of elements the array has room for, or the number of bytes used. In a technical interview, you should use a dynamic array if you know you’ll need to add or delete information.

The good: On average, it’s quick to insert elements to the end of the dynamic array. And item lookup by index is O(1).

The bad: They don’t allow for the quick rearrangement of their elements. And they have inconsistent runtimes for adding elements to the array, so they’re not good for real-time systems.

Diving Deeper

Now that you’re clear on the basics of static and dynamic arrays, watch this video for a deeper dive on the differences between the two.

Bonus Array Joke

  1. Why did the programmer leave his job?
  2. Because he didn’t get arrays.

(Get it? Arrays, “a raise”! Stop groaning. It’s a great joke.)

Interview Basics: Reverse a Linked List

Interview Basics: Reverse a Linked List

Reversing a linked list is a very common task in technical interviews. That’s why we devote a whole section of Interview Practice to them! Being able to solve this question easily is a great way to demonstrate to an interviewer that you’re a capable, competent programmer. Let’s discuss two different ways of reversing the direction of a linked list: recursively and iteratively.Linked list refresher

A linked list can be modeled by Node objects, where each node stores a value and a next pointer to the next node in the list. The last node in the list points to NULL, to indicate that you’ve reached the end of the list. You access the linked list by having access to a head pointer at the beginning of the list.

Basic Linked List
A basic linked list

You need to apply a function reverse(head) that does two things:

  1. Reverses the direction of the arrows, changing the structure of the linked list, and
  2. Returns a pointer to the new beginning of the list.

So all the nodes will stay in place in memory, and only the direction of the pointers will change.

Linked List Reversed
What we want to get

You probably wouldn’t implement a linked list this way in an actual program. There might be a lot of different references to the linked list before you reverse it.

Linked List Multiple References
A linked list with multiple references

If you call reverse(head1), you would update head1. But all the other pointers would then be stuck at the end of the linked list, and only head1 would be a good head pointer.

Linked List Reversing With Multiple References
Only head1 was reversed

To get around this, you could make a LinkedList class, or a sentinel head that external references could point to that contains the pointer to the beginning of the linked list. Then you could safely update this reference without breaking any external references.

Linked List with Sentinel Head
A linked list with a sentinel head

But so that we can focus on just the core algorithm, the solutions in this video use a simple version of the linked list.

Why is reversing a linked list tricky?

Linked lists are designed to be easy to traverse in the forward direction, and once a node passes a particular point there is no way of going back to it. If you were in a position where reversing order was something you needed to do often, then a linked list would be a poor choice of data structure and you’d be better off choosing something else.

So this isn’t a problem that you’d really ever encounter on the job. In technical interviews, though, this kind of question is asked as a way of assessing how well you can perform tricky manipulations. That’s why sometimes the problems that interviewers ask can feel kind of artificial.

In this video, I demonstrate two different approaches to reversing a linked list: a recursive one, and an iterative one.

The code

Introducing the CodeFights Arcade Database World

Introducing the CodeFights Arcade Database World

The ability to access, understand, and manipulate data is extremely important in the current engineering job market. Employers are increasingly listing SQL as a “required skill” in job listings, whether the job is a back-end engineering job or not.

Data scientists, data analysts, researchers, and many others also need to be able to access information that’s stored in relational databases. No matter what your job title is, the ability to dive deep into an organized collection of data to answer questions, uncover patterns, and discover information is incredibly useful!

But unless you already work with a relational database, whether at a job or at school, how do you practice writing efficient and accurate queries?

Database World

At CodeFights, we wanted to give people a fun, easy way to practice writing queries and manipulating data. That’s why we created Database World, our latest addition to the CodeFights Arcade!

Database World has over 80+ tasks spread out over 16 levels. The questions start out easy… But they ramp up in difficulty as you progress. This means you’ll never get bored as you practice!

Database World is a unique way of learning database skills that’s both educational and entertaining. While you write more and more queries, you’ll start to know what to look for and how to pinpoint the information you need. You’ll get faster and have less hesitation. In other words, writing amazing SQL queries will become second nature for you. You’re going to be able to leverage this into being highly competitive in the job market!

Back up – what’s the Arcade?

There are five different worlds in the CodeFights Arcade. Each world has tons of levels and tasks to complete that help you master a particular concept.

  • Start tackling tasks and get familiar with Arcade in the The Intro.
  • Get up to speed on programming fundamentals in The Core.
  • Python World focuses on helping you master programming in Python.
  • Solve graph theory questions in Graphs World.
  • And now, practice your SQL and database skills with Database World!

Get started!

Are you ready to become a database guru? If so, get started on Database World with projectList, the first task in the introductory level!

 

Intro to Hash Tables

Intro to Hash Tables

Hash tables are a must-know data structure for technical interviews. They’re used to store unordered collections of (key, value)pairs, in which the key must be unique. Item lookup by key, inserting a new item, or deleting an item are all fast operations – approximately O(1). Because they give you quick and cheap insertion, deletion, and lookups, you’ll be able to use them to solve many different types of interview questions.

When should you use a hash table?

Some common interview questions in which you should probably use a hash table are:

  • When there’s a unique, non-arbitrary identifier that you can use as a key for lookup. (Example: A caller ID function that uses a person’s phone number to retrieve their information.)
  • You need to quickly determine whether an element belongs to a collection. In these cases, you can represent the collection as a hash table with the elements as keys. (Example: A Scrabble checker that determines if a word is valid or not.)
  • You’re solving a problem about invariants. (Example: Checking whether a word has an anagram.)

To get a good basic introduction (or reintroduction) to hash tables and how you can use them, watch our Intro to Hash Tables tutorial video!

(For even more information about hash tables, check out the Hash Tables section in any of our Interview Practice learning plans.)

Practice makes perfect!

When you feel ready to practice on some real interview questions about this data structure, head to Interview Practice on CodeFights. Sign up for one of our technical interview learning plans. (No matter which one you choose, you’ll have access to the hash tables section.) You’ll learn more about them in our hash tables topic overview, and you’ll solve real interview questions about them. All that practice will get you good at recognizing when to use them and how to implement them during technical interviews!

CodeFights Solves It: findSubstrings

CodeFights Solves It: findSubstrings

The Interview Practice Task of the Week was findSubstrings. This is an Uber interview question, and their technical interviews are notoriously lengthy and difficult. And the solution implements tries, a data structure that you’ll see a lot in interview questions. It’s a double whammy!

You know the drill by now: If you haven’t solved this challenge yet, hop on over to CodeFights and do it! Then come back here and we’ll walk through solving it together. We’ll start with a naive implementation and then work on optimizing it.

…Done? Okay!

Technical interview question

The goal of this Uber interview question is to find the longest substring that appears in a given list parts in a set of words. You’ll return the list of words, with the substrings enclosed in square brackets. If there is a tie for the longest substring, you’ll mark the one that appears earliest in the input string.

On a word-by-word basis, if our list is parts = ["An","a","men", "ing", "ie","mel","me"] then we would indicate substrings in the following way:

  • “word” becomes “word” (there is no substring of “word” in parts)
  • “anticipating” becomes “anticipat[ing]” (the substrings “a” and “ing” both appear, but “ing” is longer)
  • “interest” becomes “interest”
  • “metro” becomes “[me]tro”
  • “melrose” becomes “[mel]rose”
  • “melting” becomes “[mel]ting” (“mel” and “ing” are the same length, but “mel” appears first in the word)
  • “ingoltsmelt” becomes “[ing]oltsmelt”

Our function, findSubstrings, should take a list of strings words and another list of strings parts, and return the list of words with the longest substring indicated. For example, with words = ["word", "anticipating", "ingolt", "melting"], and parts defined as before we would get:

Naive solution

Let’s try approaching this problem directly. This would mean going through our list of words, one at a time, and then checking each substring in parts to see if it appears in our word. We will keep track of the longest part that occurs earliest in the string.

We can try running this function on our example, and it works. So we have a running implementation.

But is it good enough to pass the CodeFights time tests, or will it take too long to execute? We submit it and… it passes!

(I’m going to add an important caveat here and say that it passes at the time when I wrote this. Our engineers add new tests to challenges fairly frequently, based on user feedback and challenge results, so there’s a chance that when you read this my naive solution might not pass anymore. But the explanation below of its run time is still really useful, so don’t just skip to the optimized solution!)

What’s the run time?

We should take a moment to think about the running time of this code. If we call W the number of words, and P the number of parts, then we can tell that the running time is at least O(WP) from our nested loop. With a high-level language like Python, we also want to look for hidden loops in function calls. Inside the p loop, we call if p in w and w.index(p) multiple times. These functions are scanning the word w to look for p, and will have a running time of O(len(W)). We could speed up the function a little bit by calling w.index(p) once and saving the result, but we still have a running time that looks like O(WPN), where N is the length of a typical word in w.

So how bad is O(WPN)? From the constraints page:
* We have at most 5000 words, so in the worst-case scenario W = 5000;
* Each word is at most 30 characters, so the worst case is N = 30;
* We have at most 5000 substrings in parts, so the worst-case is P = 5000.

Putting this together, this gives us 750 million loops. This means each loop would have to run in about 5 picoseconds to be done in 4 seconds! Put another way, we would have to have PetaHz processors for the worst case scenario.

Can we argue that we solved the challenge anyway, and that the CodeFights test showed us that the cases we will be seeing aren’t the worst possible cases? Yes, you can absolutely argue that! When companies are hiring, they want people who can write code quickly to solve today’s problems. A working solution today is better than a perfect solution in a month!

But you should make sure you demonstrate to the interviewer that you know how to think about algorithmic complexity. Microsoft famously had an exponential algorithm for determining which software patches were needed for Windows XP, but it was working fast enough for 10 years(!), and no one realized that the algorithm was exponential.

So you can tell the interviewer that you know this is an algorithm with three multiplicative factors (length of a typical word, number of words, and number of substrings). You should explain that you pass the tests, but that your algorithm won’t handle the worst case in reasonable time. Go on to explain that this isn’t an exponential algorithm, so getting more powerful computers or waiting a little longer isn’t unreasonable.

An interviewer may ask you to write a faster algorithm anyway, so if you really want to impress her you can preemptively mention that you think there are better algorithms that you would use if there was a good business case for spending the extra time on this. Remember that premature optimization is the root of all evil. (Maybe that’s a little over-dramatic… but it’s still bad!)

Giving it our best Trie

Instead of the naive implementation above, we will use a special type of tree data structure called a trie to store the substrings in parts. Like all trees in programming, a trie has a root node. Every other node keeps track of a letter, and getting to this point in the trie completes a substring from parts. Taking our example of parts = ["An","a","men", "ing", "ie","mel","me"] from earlier, we can think about the trie pictorially as:

trie data structure

Note there are 7 red letters in the picture of the trie, which correspond to the 7 words we have in parts. Every leaf is a red letter, but the e below the m is also red to indicate that me is in parts.

To check if a string s is in parts, we start at the root and move to the first level in the trie that matches the first letter of s. If we are trying to match me, for example, we would start by moving from the root to m, and then from m to e. Since we end up on a red node, we know that our word is on the list.

trie data structure

To see if in is a member of parts, we would start at the root and follow the path to i, then to n. Since n is not red, we haven’t completed a string from parts, so we would conclude that in is not a member of parts.

trie data structure

If we want to see whether dog was in parts, we would see there is no path from the root to a d, so we could report right away that dog isn’t in parts. We will be scanning a word character-by-character, and instead of having to search through all of parts, we only have to search through the subset of parts that matches what we have already seen.

You have seen this data structure in action before – in fact, every time you see auto-complete working every time you use a browser or text!

Implementing it in code

Now we have to translate our picture into code. We will make a class TrieNode to store all the information in a node:

To create a trie with the words “cat”, “cam”, and “cha” and “chat”, we could write the following (tedious) code:

or, pictorially, we have

trie data structure uber interview question

Let’s make a function that adds a word to the trie so that we don’t have to do it manually. Our method will be to start at the root, reusing existing nodes along our path (or making new nodes as needed). The code for this is:

Using this function, we can condense the code to create our trie to:

Using the trie

I am going to move most of the hard work – that is, finding the longest substring and enclosing it in square brackets – to a function findSubstringInWord(word, root). The function findSubstrings(words, parts) will be responsible for making the trie and collecting all the words together at the end.

Here is our strategy for findSubstringInWord:

  1. Initialize our lenLongSubstr to 0.
  2. Go through word one character at a time, using that character as a starting point for matching substrings in parts.
  3. For each character, use the trie to find the longest substring from parts starting from this character. If it is longer than the longest one we’ve seen so far, record the current position in the string and the current length.
  4. Use the length of the longestSeenSubstring and its starting position to put the square brackets in the right place.

Note how going through the string in order, and only updating the position when we see a strictly longer string, automatically encodes our tie-breaking condition (yay!). The trickiest part is finding the the longest substring from parts starting at a given location, because we need to know the last terminal node we passed though before failing to find a match on the trie. It would be a lot easier if we just had to keep track of the last node we were on before we failed to match.

The function findSubstrings is simple by comparison! Note that it uses our earlier method of constructing the trie directly from parts.

Running time for the trie

Our algorithm still looped through each word. The constraint that we haven’t used yet is the length of the each element in parts. The worst case scenario is having to check 5 levels at each position at each word. In the worst case, we have:

  • The number of words, W = 5000;
  • The number of letters in a word, N = 30;
  • The number of levels to check in the trie, P = 5.

This is smaller than the naive implementation by a factor of 1000.

The final code!

Our code was spread over a few functions. Collecting it all together, we have:

We did it! High fives for everyone!

uber interview question trie data structure

Tell us…

How did you solve this Uber interview question? Have you ever seen one like it in a coding interview? Let us know on the CodeFights user forum!

CodeFights Solves It: isTreeSymmetric

CodeFights Solves It: isTreeSymmetric

The most recent Interview Practice Challenge of the Week was isTreeSymmetric. This interview question from LinkedIn and Microsoft is checking to see how comfortable you are with the tree data structure. Trees have a wide variety of applications, so your interviewer is going to make sure that you know your stuff!

Have you solved it yet? If not, head over to CodeFights and give it a shot. Once you have a solution, come back here and I’ll walk you through how to answer this interview question. (Need to review first? Check out the CodeFights Explainer article about trees.)

solve this tree challenge
This cat is definitely solving isTreeSymmetric right now. You should too!

The technical interview question

Done? Okay, let’s get started!

Our job is to determine if a (binary) tree is a mirror image of itself. That is, we want to know if the left side mirrors the right side. Here is an example of a symmetric tree:

In contrast, this is not a symmetric tree:

The tree on the bottom is the same on the left and the right. In other words, starting at each 2, you get exactly the same tree. But it’s not a mirror image because the left and right sides — the 3 and 4 — don’t swap positions.

We are going to implement a tree in Python with the following data structure:

Subtrees and recursion

A subtree is what we get by starting at any node, and treating it as the root of its own tree. For example, consider the tree:

The subtree under node 2 is:

While the subtree under node 6 is:

A simpler question: Are left and right sides equal?

Let’s start with a slightly simpler problem. We’ll ask if the left-hand side of the tree is equal to the right-hand side of the tree, and we’ll assume that we have at least three nodes. With these assumptions, this is the same as asking whether the subtree under the first left node is the equal to the subtree under the first right node. Since subtrees are trees in their own right, what we would like is a function areTreesEqual(treeLeft, treeRight).

To figure out how to determine if two trees tree1 and tree2 are equal, we’ll turn to recursion. We have two trees, tree1 and tree2. Suppose that each tree has at the first two levels full (i.e. we have the root and both a left and right branch on level 1). Then tree1 looks like the following:

with a similar structure for tree2. We can think of tree1 as:

tree1 and tree2 are equal if and only if all of the following are true:

  1. A1 and A2 are equal;
  2. The tree {subtree L1} is equal to the {subtree L2} (determined by calling areTreesEqual(tree1.left, tree2.left));
  3. The tree {subtree R1} is equal to the {subtree R2} (determined by calling areTreesEqual(tree1.right, tree2.right)).

Now we need to work out the base case, so that the recursion will terminate. We assumed above that the first two levels would be full. This certainly isn’t true for leaf nodes, and even close to the root we may not have two branches. Here are the cases we need to consider that don’t fall into the recursion above:

  • Both trees are at a leaf (i.e. no children on either side). Just check to see if the values of the nodes are the same; if they are then these two subtrees are equal.
  • Either the left or right side is missing from both nodes. If the sides are the same, just check that side. If the sides are different (e.g. tree1 has no left side, but tree2 has no right side) then the trees are not equal.
  • The left side or right side is missing from only one node. Then the two trees are automatically not equal.

We now have enough information to write our function areTreesEqual:

A function areBranchesEqual, which tests if the two branches of a tree are equal, could then be written to first check if we have a root node, then use areTreesEqual on the two subtrees root.left and root.right.

The original problem: isTreeSymmetric?

The big difference between isTreeSymmetric and areBranchesEqual is implementing the mirroring. When we investigate the left hand path of a node on one side, we have to investigate the right hand side on the other. Our full code is:

The problem with recursion

So far we have a really neat and elegant solution for this interview question. It passes all of the tests on CodeFights, including the hidden ones. A lot of tree problems lend themselves really well to being solved with recursion. However, in a technical interview, the interviewer might point out that the constraints of the problem told us that we were guaranteed the tree would be smaller than 5 * 10^4 values. Could we trigger a maximum recursion depth error in Python? (In other languages, the general term for this is “run out of stack space”).

Our algorithm is a depth-first search check, so if we run into problems it will be on very deep trees. Whenever we move down a level, before calling the recursive step we check if the current left side and right side are equal. So our worst case scenario tree would be:

The depth of this tree is 24999, and the (default) maximum recursion depth in Python is only 1000. Thankfully CodeFights didn’t give us this “torture test” for our algorithm! But how would we deal with it if our interviewer asked us to solve these trees?

The solution is to convert the recursive algorithm into an iterative one.

Phew! Though the recursion version is a lot easier to read, it’s a good idea to practice rewriting recursive algorithms as iterative ones since there are limitations on how many successive recursive calls you can make before returning.

Tell us…

How did you solve this Interview Practice challenge? Have you ever encountered it (or one like it!) in an interview? Let us know over on the forum!

CodeFights Explainer: Trees

CodeFights Explainer: Trees

Trees. They’re considered to be a computer science basic, but if it’s been a while since you’ve dealt with them – or if you’ve never encountered them before at all – they can be daunting. But tree questions are extremely common in technical interviews, so they’ll come up while you’re finding a job. And then on the job, they’re a wildly useful data structure. So let’s get you up to speed on trees!

A tree is a data structure that is particularly useful for representing data that’s sorted or in hierarchical order, and where we want to be able to retrieve data quickly. I’m going to dive into the idea of trees by introducing one of the most common use cases for them: the binary search tree.

Example: An address book app

Suppose we’re designing an address book application. You type in the name of the person you’re looking for, and if they are in your address book the program retrieves some information about them, like their address, email, and telephone number. If they aren’t in your contact book, you’ll get a “User not Found” message. For the moment, let’s just look at the list of names of people in our sorted address book:

Suppose you want to know if David is in your contact list. You could scan through the names, one by one, and then keep track of whether or not you have seen David yet. If you haven’t seen it by the time you reach the end of the list, then you can confidently report he isn’t one of your contacts.

Better yet, since you started with a sorted list, you can stop as soon as you see any name that would appear after David in an alphabetically ordered list. In this particular example, once you reach the 4th element in the list, Flaka, you know that David isn’t on this list.

Even with the improvement of the “stop once you reach an element bigger than the one you are looking for” approach, this algorithm is still O(N), where N is the number of items in our list. If you were checking to see if Zoe was on your list, then you would have to check Every. Single. User. Not cool.

A better approach would be to start in the middle, with Flaka. If you are looking for a name that appears earlier in the alphabet than Flaka (like David) then you know it is in the left half of the list (or it is not there at all). If you are looking for Zoe, you know that it appears on the right half of the list. For either name, you only have to search one half of the list. Let’s focus on the steps you would use to search for David:

  1. Start halfway through the list. The list of contacts has 7 items, so start at the 4th item, Flaka.
  2. Since David appears earlier in the alphabet than Flaka (i.e. as strings, David < Flaka), then you know that if David is in the list at all he is first, second, or third. Pick the middle element again, in this case Amy.
  3. Since Amy < David, if David is in this list he has to be the third element. But the third element is Colin, so David isn’t in this list.
list vs binary search
Visualizing the search: Look at 3 nodes, two jumps.

For David, we see that it takes 3 searches to find that he is not on the list, rather than 4. For Zoe, the difference is more impressive, because we can also exclude her name in 3 searches rather than 7! Here’s what searching for Zoe in your contact list would look like:

list vs binary search, part 2

At each step, we are halving the number of names you have to search through, so the number of searches you have to do to determine if a name is on the list is O(log N), which is a vast improvement over our original O(N)!

Using a tree

Instead of thinking about the list of contacts as a flat list, let’s try to visualize it the way that our sorting algorithm goes through it:

names in a tree
Visualizing the contacts as a tree.

Regardless of what name n we search for, we will always start at Flaka. If n is Flaka, then we’re done! If we are looking for a name that appears earlier in the alphabet, we will look at indices less than Flaka (to the left). Otherwise, we’ll move to the right. We keep moving down the diagram, looking for n, using the convention that if n is smaller than the current word we move to the left, and if it is larger we move to the right. If we reach the names at the bottom of the diagram and still haven’t found the name n, we can conclude that it isn’t in our diagram.

A tree is a way of implementing this data structure directly, rather than imagining this data structure on a simple list.

Each box in the diagram is called a node. Nodes keep track of the following information:

Properties of a node

Property name Type Description
children Pointers to other nodes We show all children of a node are connected to this node via arrows. The node Flaka in our example has two children: Amy and Madison.
value Data This is the data that is actually stored in the node. We are just storing the names at the moment, but a more useful address book would contain a reference to a person object so that we could get the email address, phone number, etc.

Continuing the family-tree analogy, we also have the notion of a parent node. Node X is a parent of node Y if node Y is a child of node X. We’ve already stated that the Flaka node had two children: Amy and Madison. This means that the parent node of Amy is Flaka.

There are also special types of nodes in a tree:

The root node is the only node on the tree that doesn’t have a parent, so in this tree the root is Flaka.

Then there are leaf nodes, which are the nodes that don’t have any children. In this case, the leaf nodes are Amanda, Colin, Gustav and Naomi.

Each node also a level, which is how far this node is from the root node. So Flaka (the root) is at level 1, Amy and Madison are at level 2, and the leaf nodes are all at level 3 in our example.

For a collection of nodes to be a non-empty tree (as used in computer programming), they need to satisfy 4 conditions:

  1. There needs to be exactly one root node (i.e. one node with no parent).
  2. Each node that is not the root node has only one parent node (i.e. no node is a child of two or more nodes)
  3. You need to be able to reach each node by starting at the root and following a sequence of “child” pointers.
  4. A node at level x cannot have a child at a level less than x.
    (In fact, all children of a node at level x have a level of x+1, because children of a node at level x are one extra step from the root.)

When we draw the nodes, we typically draw level 1 (the root) at the top of the page, and increase levels as we move down to the leaves. For some reason, computer scientists have decided to draw all of their trees “upside down”, with the root at top and the leaves at the bottom! Also note that we are jumping between two metaphors when labeling things for trees. We use a lot of terms inspired by family trees such as “parent” and “child”, as well as a lot of terms inspired by botanical trees such as “root” and “leaves”.

To get an idea of what some of the restrictions mean, let’s take a moment to look at two versions of our address book that are not trees. The problem with the data structure on the top is that Colin is a child of Amy and Madison, which breaks rule 2. The problem with the graph on the bottom is that Madison (level 2) is a child of Amanda (level 3).

not a tree
not a tree

These rules guarantee that we cannot have cycles while moving along the arrows, because either we will have no children nodes to visit (i.e. we are at a leaf), or we are guaranteed to be going to a higher level. Since we are moving along children links, all the nodes we have seen already are at a lower level than the current level.

All of this is pretty cool, but it seems like we’ve gone to a lot of work and just reproduced the O(log N) search of my contacts that I could have done with a flat (ordered) list! But one of the big advantages that the tree has over a flat list is that it is more flexible. Continuing on with the address book example, suppose I contact Amanda and Gustav a lot more than I call any of the other people. That means that every time I open my address book, my phone is usually doing three searches in the tree. We can rearrange the tree a little to get the following:

unbalanced tree
A tree with the nodes rearranged to make the average search quicker

In the longest searches, my app needs to do four steps now instead of three. But the most frequently searched-for names are now close to the top with the goal of reducing the average search time. Of course, we can also design a (bad) tree that is just a linked list:

a tree pretending to be a linked list
A tree that’s acting like a linked list.

The process of finding the best organization of the tree for the expected use of the data is called balancing the tree.

Other uses of trees

We have used trees to order data by value (i.e. sort it) as a way of replacing a list. A different way of using trees is to manage objects that are clustered or grouped by some attribute. Consider the organization of CodeFights Cafe. We have an owner, a head of accounting (Alice), and a store manager (Katherine). We also have some employees that belong either to the accounting department, or are managed by Katherine. We might structure a tree for CodeFights Cafe in the following way:

tree organizational structure
The organizational structure of CodeFights Cafe.

This tree arranges the nodes by responsibility, which means that any person in this chart manages their descendants. So Katherine can tell her assistant manager Fred what to do, as well give instructions to Claire and Jim. But she can’t give instructions to Bob, for example, because even though he is at a lower level he’s part of a different department.

It’s clear how this structure is a useful way for us to visualize how the Cafe is organized. It also makes certain types of operations easy to implement recursively. Suppose we wanted to send a message to the entire accounting department. We might have a function tell( node, msg) that will use the information in node to get the contact information for the person and email them the message msg. Suppose each node n has its children stored in a list n.children. We could send a message to n and all of its descendants (those nodes that n can reach that are at a lower level than it) with the following function:

If we want to message the entire company “Happy New Year!”, we can simply call tellAllDescendants( owner_node, "Happy New Year"). If we want to tell the service side that we need to bake more cookies, we can call tellAllDescendants( katherine_node, "Pls bake more cookies") and the message will be relayed to Katherine, Fred, Claire, and Jim.

This same organizational structure can answer other questions for us too, such as quickly finding out what the labor costs are in each department. Suppose we wanted to know what the total cost of the salaries in the service side of the business are (so Katherine’s salary, plus that of all the employees under her). If each node n stored that person’s salary as n.salary, we could write this function:

Note that us just asking for the sum of the node.salary and the sum of salaries of all of nodes children doesn’t work. When applied to the node for Katherine, this would add her salary to Fred’s, but it would miss the salaries of Claire and Jim. To find the salary for the service side of the business, we could now call sumSalaries(katherine_node). As a bonus, if we wanted to determine the entire salary cost for the business, we could call sumSalaries(owner_node). The fact that every employee node has only one parent ensures that each node is counted exactly once. (You can see that if Claire worked in the accounting department as well, then her salary would be counted twice by this code).

Our sumSalaries code is an example of a depth-first recursive algorithm. When calling sumSalaries(katherine_node), here is what our algorithm is doing:

  • Start processing sumSalaries on the node for Katherine
    • Before returning, we call sumSalaries on the node for Fred (Katherine’s child)
    • Before returning, we call sumSalaries on the node for Claire (Fred’s child)
      • Claire’s node has no children, so now it returns.
    • Before returning, we call sumSalaries on the node for Jim (Fred’s child)
      • Jim’s node has no children, so now it returns.
    • We have now summed the salaries of all Fred’s node’s children, so the call to sumSalaries on node for Fred returns.
  • We have now finished summing the salaries of all Katherine’s node’s children, so call to SumSalaries on Katherine returns.

We had to reach a leaf node before we started having functions returning values. When using depth-first recursive algorithms, we have to be careful that we don’t run out of room on the stack! In this case, we had three function calls “stacked” until we finally started returning answers.

We can also use depth-first recursion to search for a particular node (but we’ll discuss a better approach to this problem below). So far, I have been taking it for granted that I have a variable katherine_node lying around. What if I only have the root, owner_node, and I want to find the total salary of the service side of the business? Suppose nodes stored the department’s name as node.department. One way we could do this is to do a depth-first search, stopping the first time we find a department with the right name:

Using depth-first search (DFS) doesn’t make much sense in this example. If we start searching by moving from the owner node to the node for Alice, DFS will keep search through the entire accounting department before giving up. Since we are looking for the highest-ranking person in the service department, it would make more sense to search the tree level by level. This is called breadth first search (BFS), and is implemented below:

The question of which type of search strategy, DFS or BFS, is better depends on the structure of the tree, and what information you are looking for. (I’m going to cover this issue in depth in a future article – stay tuned!)

Summary

We’ve seen that when we have ordered data, trees can give us an O(log N) way of searching the tree for the data. We’ve also seen that trees are more flexible than using simple arrays because we can balance the tree.

Trees are particularly useful when:

  1. You have data that is sorted in some way, and you want to do a lot of searches on it.
  2. You have data that is “grouped” in some way, such as the organizational tree for CodeFights Cafe. File systems are a common hierarchy, where the nodes are either files or directories, and node X is a child of node Y if node X is contained in Y. In this example, all ordinary files are leaves, every parent is a directory, and / is the root. A web-based example is HTML’s document object model, which is tree-based. Front end developers even talk about “child elements” and “parent elements”!

We’ve seen that with hierarchical data, we can still search through it. We can iterate through the entire tree to get information from each node (commonly by going either depth-first or breadth-first), while ensuring that we visit each node exactly once. This process is called tree traversal. We looked closely at a related topic, which was searching in a hierarchical tree. This is like tree traversal, except we can stop whenever we find what we are looking for, so we no longer guarantee that every node will be looked at.

Ready to tackle some trees?

As I mentioned earlier, tree questions are extremely common in technical interviews. You’ll definitely want to practice them while you’re looking for jobs and preparing for interviews! But you don’t have to hunt for good questions. CodeFights Interview Practice has you covered! When you’ve solved all of these questions, take a look through the solutions that other CodeFighters have posted. It’s always really instructive to see how other users have solved interview questions – you’ll definitely learn something new!

Tell us…

Have you encountered any tree questions in interviews? How did you tackle them? Head on over to the CodeFights forum and let us know!