Month: April 2017

April Marathon and Live Broadcast

April Marathon and Live Broadcast

You might not know that every month we run a huge coding tournament. We call them Marathons, but unlike a running marathon, they only last one hour. (And your legs won’t hurt tomorrow!) Each Marathon consists of 10 questions that have been specially created by our team of Content Engineers. They’re fun, they’re tricky, and they’ll get your brain in gear! And if you’ve never participated in a competitive coding event before, this is the perfect introduction.

Participate

If you want to compete in the Marathon, sign up on CodeFights. As of this second, 730 people have signed up, but thanks to the magic of the internet we’ve got enough room for everyone who wants to join us. You can register even after the Marathon starts, but don’t wait too long!

Watch

Even if you don’t want to participate in the Marathon, we’ve got a great way for you to join us. CodeFights CEO Tigran Sloyan and Content Engineer Damien Martin will be broadcasting live and providing commentary the whole time! Watching other people code, and getting play-by-play explanations of how the coders are solving the challenges, isn’t just fun. It’s also a great way to learn about coding and level up your own skills and understanding! To watch the live commentary, head over to our Facebook page.

Join us for this coding Marathon!

So whether you’re participating in the April Marathon tomorrow or just want to watch the action as it unfolds, we’ve got lots of coding goodness for you tomorrow! Both the Marathon and the live broadcast start tomorrow, April 29, at 10AM PDT.

Previously on CodeFight On!

March Marathon Recap

 

CodeFights Solves It: findSubstrings

CodeFights Solves It: findSubstrings

The Interview Practice Task of the Week was findSubstrings. This is an Uber interview question, and their technical interviews are notoriously lengthy and difficult. And the solution implements tries, a data structure that you’ll see a lot in interview questions. It’s a double whammy!

You know the drill by now: If you haven’t solved this challenge yet, hop on over to CodeFights and do it! Then come back here and we’ll walk through solving it together. We’ll start with a naive implementation and then work on optimizing it.

…Done? Okay!

Technical interview question

The goal of this Uber interview question is to find the longest substring that appears in a given list parts in a set of words. You’ll return the list of words, with the substrings enclosed in square brackets. If there is a tie for the longest substring, you’ll mark the one that appears earliest in the input string.

On a word-by-word basis, if our list is parts = ["An","a","men", "ing", "ie","mel","me"] then we would indicate substrings in the following way:

  • “word” becomes “word” (there is no substring of “word” in parts)
  • “anticipating” becomes “anticipat[ing]” (the substrings “a” and “ing” both appear, but “ing” is longer)
  • “interest” becomes “interest”
  • “metro” becomes “[me]tro”
  • “melrose” becomes “[mel]rose”
  • “melting” becomes “[mel]ting” (“mel” and “ing” are the same length, but “mel” appears first in the word)
  • “ingoltsmelt” becomes “[ing]oltsmelt”

Our function, findSubstrings, should take a list of strings words and another list of strings parts, and return the list of words with the longest substring indicated. For example, with words = ["word", "anticipating", "ingolt", "melting"], and parts defined as before we would get:

Naive solution

Let’s try approaching this problem directly. This would mean going through our list of words, one at a time, and then checking each substring in parts to see if it appears in our word. We will keep track of the longest part that occurs earliest in the string.

We can try running this function on our example, and it works. So we have a running implementation.

But is it good enough to pass the CodeFights time tests, or will it take too long to execute? We submit it and… it passes!

(I’m going to add an important caveat here and say that it passes at the time when I wrote this. Our engineers add new tests to challenges fairly frequently, based on user feedback and challenge results, so there’s a chance that when you read this my naive solution might not pass anymore. But the explanation below of its run time is still really useful, so don’t just skip to the optimized solution!)

What’s the run time?

We should take a moment to think about the running time of this code. If we call W the number of words, and P the number of parts, then we can tell that the running time is at least O(WP) from our nested loop. With a high-level language like Python, we also want to look for hidden loops in function calls. Inside the p loop, we call if p in w and w.index(p) multiple times. These functions are scanning the word w to look for p, and will have a running time of O(len(W)). We could speed up the function a little bit by calling w.index(p) once and saving the result, but we still have a running time that looks like O(WPN), where N is the length of a typical word in w.

So how bad is O(WPN)? From the constraints page:
* We have at most 5000 words, so in the worst-case scenario W = 5000;
* Each word is at most 30 characters, so the worst case is N = 30;
* We have at most 5000 substrings in parts, so the worst-case is P = 5000.

Putting this together, this gives us 750 million loops. This means each loop would have to run in about 5 picoseconds to be done in 4 seconds! Put another way, we would have to have PetaHz processors for the worst case scenario.

Can we argue that we solved the challenge anyway, and that the CodeFights test showed us that the cases we will be seeing aren’t the worst possible cases? Yes, you can absolutely argue that! When companies are hiring, they want people who can write code quickly to solve today’s problems. A working solution today is better than a perfect solution in a month!

But you should make sure you demonstrate to the interviewer that you know how to think about algorithmic complexity. Microsoft famously had an exponential algorithm for determining which software patches were needed for Windows XP, but it was working fast enough for 10 years(!), and no one realized that the algorithm was exponential.

So you can tell the interviewer that you know this is an algorithm with three multiplicative factors (length of a typical word, number of words, and number of substrings). You should explain that you pass the tests, but that your algorithm won’t handle the worst case in reasonable time. Go on to explain that this isn’t an exponential algorithm, so getting more powerful computers or waiting a little longer isn’t unreasonable.

An interviewer may ask you to write a faster algorithm anyway, so if you really want to impress her you can preemptively mention that you think there are better algorithms that you would use if there was a good business case for spending the extra time on this. Remember that premature optimization is the root of all evil. (Maybe that’s a little over-dramatic… but it’s still bad!)

Giving it our best Trie

Instead of the naive implementation above, we will use a special type of tree data structure called a trie to store the substrings in parts. Like all trees in programming, a trie has a root node. Every other node keeps track of a letter, and getting to this point in the trie completes a substring from parts. Taking our example of parts = ["An","a","men", "ing", "ie","mel","me"] from earlier, we can think about the trie pictorially as:

trie data structure

Note there are 7 red letters in the picture of the trie, which correspond to the 7 words we have in parts. Every leaf is a red letter, but the e below the m is also red to indicate that me is in parts.

To check if a string s is in parts, we start at the root and move to the first level in the trie that matches the first letter of s. If we are trying to match me, for example, we would start by moving from the root to m, and then from m to e. Since we end up on a red node, we know that our word is on the list.

trie data structure

To see if in is a member of parts, we would start at the root and follow the path to i, then to n. Since n is not red, we haven’t completed a string from parts, so we would conclude that in is not a member of parts.

trie data structure

If we want to see whether dog was in parts, we would see there is no path from the root to a d, so we could report right away that dog isn’t in parts. We will be scanning a word character-by-character, and instead of having to search through all of parts, we only have to search through the subset of parts that matches what we have already seen.

You have seen this data structure in action before – in fact, every time you see auto-complete working every time you use a browser or text!

Implementing it in code

Now we have to translate our picture into code. We will make a class TrieNode to store all the information in a node:

To create a trie with the words “cat”, “cam”, and “cha” and “chat”, we could write the following (tedious) code:

or, pictorially, we have

trie data structure uber interview question

Let’s make a function that adds a word to the trie so that we don’t have to do it manually. Our method will be to start at the root, reusing existing nodes along our path (or making new nodes as needed). The code for this is:

Using this function, we can condense the code to create our trie to:

Using the trie

I am going to move most of the hard work – that is, finding the longest substring and enclosing it in square brackets – to a function findSubstringInWord(word, root). The function findSubstrings(words, parts) will be responsible for making the trie and collecting all the words together at the end.

Here is our strategy for findSubstringInWord:

  1. Initialize our lenLongSubstr to 0.
  2. Go through word one character at a time, using that character as a starting point for matching substrings in parts.
  3. For each character, use the trie to find the longest substring from parts starting from this character. If it is longer than the longest one we’ve seen so far, record the current position in the string and the current length.
  4. Use the length of the longestSeenSubstring and its starting position to put the square brackets in the right place.

Note how going through the string in order, and only updating the position when we see a strictly longer string, automatically encodes our tie-breaking condition (yay!). The trickiest part is finding the the longest substring from parts starting at a given location, because we need to know the last terminal node we passed though before failing to find a match on the trie. It would be a lot easier if we just had to keep track of the last node we were on before we failed to match.

The function findSubstrings is simple by comparison! Note that it uses our earlier method of constructing the trie directly from parts.

Running time for the trie

Our algorithm still looped through each word. The constraint that we haven’t used yet is the length of the each element in parts. The worst case scenario is having to check 5 levels at each position at each word. In the worst case, we have:

  • The number of words, W = 5000;
  • The number of letters in a word, N = 30;
  • The number of levels to check in the trie, P = 5.

This is smaller than the naive implementation by a factor of 1000.

The final code!

Our code was spread over a few functions. Collecting it all together, we have:

We did it! High fives for everyone!

uber interview question trie data structure

Tell us…

How did you solve this Uber interview question? Have you ever seen one like it in a coding interview? Let us know on the CodeFights user forum!

San Francisco Live Tournament Recap

San Francisco Live Tournament Recap

On Saturday, April 22, CodeFighters from all over the Bay Area packed into the new CodeFights office and made history at the first-ever live San Francisco tournament.

Some long-time CodeFighters already knew each other, but for many it was their first time meeting up with other users in real life. The room rang with friendly chatter… right up until the tournament started, and then everyone got down to business.

CodeFights office during live coding tournament

The packed room was nearly silent during the hour-long tournament. CodeFighter xiaowuc1 pulled ahead early, solving all 10 challenges in under 15 minutes.

12 people were able to complete all 10 questions in the hour allotted, while 4 users finished 9 questions, 3 finished 8 questions, and the rest of the pack finished 7 or fewer. The questions started off fairly innocently and then ramped up in difficulty quickly! The last few were especially tricky:

  1. isEarlier
  2. onlyEvenNumbers
  3. lrSegmentNumber
  4. sequencePeakElement
  5. parkingCost
  6. findPath
  7. numberOfCarries
  8. holesErasing
  9. rangeCollapseRepresentations
  10. subdomains

Check out xiaowuc1 explain how he solved subdomains using tries:

Our winners!

1st Place

Taking top position by a wide margin, xiaowuc1 (Nick Wu) solved all 10 challenges in a blistering fast 14 minutes 28 seconds. His language of choice for solving these questions was Java.

xiaowuc1, live CodeFights tournament first place

2nd Place

Coming in second, cows_go_moo (Anton Cao) completed the tournament in 27 minutes and 39 seconds. He solved the challenges in Ruby. Keep an eye on this guy – he’s going to be an intern at CodeFights this summer!

cows_go_moo, CodeFights live tournament second place

3rd Place

Our third place competitor was ahmed_aly (Ahmed Aly), who finished the challenges in 38 minutes and 36 seconds using C++.

ahmed_aly, CodeFights live tournament third place

Special Shoutout

While he wasn’t able to make it into the city to compete in person, CodeFighter Pedro_O (Pedro Osório) participated remotely, solving the challenges in Python. He finished in 33 minutes and 33 seconds.

Thanks!

We had so much fun hosting this! Thanks to everyone who came out to compete, meet other CodeFighters, and hang out with the CodeFights team. It was great meeting you all!

We’ll definitely be holding more live coding tournaments in the future. In fact, it’s going to become a regular event! We’ll let you know when the next one’s scheduled – and hopefully we’ll see you there!

CodeFight on!

 

 

CodeFights Solves It: isTreeSymmetric

CodeFights Solves It: isTreeSymmetric

The most recent Interview Practice Challenge of the Week was isTreeSymmetric. This interview question from LinkedIn and Microsoft is checking to see how comfortable you are with the tree data structure. Trees have a wide variety of applications, so your interviewer is going to make sure that you know your stuff!

Have you solved it yet? If not, head over to CodeFights and give it a shot. Once you have a solution, come back here and I’ll walk you through how to answer this interview question. (Need to review first? Check out the CodeFights Explainer article about trees.)

solve this tree challenge
This cat is definitely solving isTreeSymmetric right now. You should too!

The technical interview question

Done? Okay, let’s get started!

Our job is to determine if a (binary) tree is a mirror image of itself. That is, we want to know if the left side mirrors the right side. Here is an example of a symmetric tree:

In contrast, this is not a symmetric tree:

The tree on the bottom is the same on the left and the right. In other words, starting at each 2, you get exactly the same tree. But it’s not a mirror image because the left and right sides — the 3 and 4 — don’t swap positions.

We are going to implement a tree in Python with the following data structure:

Subtrees and recursion

A subtree is what we get by starting at any node, and treating it as the root of its own tree. For example, consider the tree:

The subtree under node 2 is:

While the subtree under node 6 is:

A simpler question: Are left and right sides equal?

Let’s start with a slightly simpler problem. We’ll ask if the left-hand side of the tree is equal to the right-hand side of the tree, and we’ll assume that we have at least three nodes. With these assumptions, this is the same as asking whether the subtree under the first left node is the equal to the subtree under the first right node. Since subtrees are trees in their own right, what we would like is a function areTreesEqual(treeLeft, treeRight).

To figure out how to determine if two trees tree1 and tree2 are equal, we’ll turn to recursion. We have two trees, tree1 and tree2. Suppose that each tree has at the first two levels full (i.e. we have the root and both a left and right branch on level 1). Then tree1 looks like the following:

with a similar structure for tree2. We can think of tree1 as:

tree1 and tree2 are equal if and only if all of the following are true:

  1. A1 and A2 are equal;
  2. The tree {subtree L1} is equal to the {subtree L2} (determined by calling areTreesEqual(tree1.left, tree2.left));
  3. The tree {subtree R1} is equal to the {subtree R2} (determined by calling areTreesEqual(tree1.right, tree2.right)).

Now we need to work out the base case, so that the recursion will terminate. We assumed above that the first two levels would be full. This certainly isn’t true for leaf nodes, and even close to the root we may not have two branches. Here are the cases we need to consider that don’t fall into the recursion above:

  • Both trees are at a leaf (i.e. no children on either side). Just check to see if the values of the nodes are the same; if they are then these two subtrees are equal.
  • Either the left or right side is missing from both nodes. If the sides are the same, just check that side. If the sides are different (e.g. tree1 has no left side, but tree2 has no right side) then the trees are not equal.
  • The left side or right side is missing from only one node. Then the two trees are automatically not equal.

We now have enough information to write our function areTreesEqual:

A function areBranchesEqual, which tests if the two branches of a tree are equal, could then be written to first check if we have a root node, then use areTreesEqual on the two subtrees root.left and root.right.

The original problem: isTreeSymmetric?

The big difference between isTreeSymmetric and areBranchesEqual is implementing the mirroring. When we investigate the left hand path of a node on one side, we have to investigate the right hand side on the other. Our full code is:

The problem with recursion

So far we have a really neat and elegant solution for this interview question. It passes all of the tests on CodeFights, including the hidden ones. A lot of tree problems lend themselves really well to being solved with recursion. However, in a technical interview, the interviewer might point out that the constraints of the problem told us that we were guaranteed the tree would be smaller than 5 * 10^4 values. Could we trigger a maximum recursion depth error in Python? (In other languages, the general term for this is “run out of stack space”).

Our algorithm is a depth-first search check, so if we run into problems it will be on very deep trees. Whenever we move down a level, before calling the recursive step we check if the current left side and right side are equal. So our worst case scenario tree would be:

The depth of this tree is 24999, and the (default) maximum recursion depth in Python is only 1000. Thankfully CodeFights didn’t give us this “torture test” for our algorithm! But how would we deal with it if our interviewer asked us to solve these trees?

The solution is to convert the recursive algorithm into an iterative one.

Phew! Though the recursion version is a lot easier to read, it’s a good idea to practice rewriting recursive algorithms as iterative ones since there are limitations on how many successive recursive calls you can make before returning.

Tell us…

How did you solve this Interview Practice challenge? Have you ever encountered it (or one like it!) in an interview? Let us know over on the forum!

Interview Practice Task of the Week

Interview Practice Task of the Week

Have you been using our Interview Practice module to prepare for technical interviews? If not, you definitely should be!

Interview Practice has over 100 questions that get asked by top companies during real interviews. (Which companies, you ask? Oh, you know. Just lil’ companies like Apple, Facebook, Google, Microsoft, Twitter, and more.) The topics are ones that you’ll definitely encounter – coding patterns, algorithms, and data structures. You need to be strong these topics in order to do well in technical interviews, whether you’re a junior developer or have been in the field for years.

We’re adding more questions on more topics from more companies all the time, so you’re not going to run out of questions any time soon. In fact, it can be a little bit overwhelming – which challenge should you tackle first? A general rule of thumb is to choose questions based on where you’re interviewing and on topics that you need to study.

Where you’re interviewing

Interviewing at Facebook? Then it’s a given that you should practice using the Facebook Interview Practice questions!

But if we don’t have questions from the company that you’re interviewing with, use the resources at your disposal to find out what sort of questions typically get asked during their interviews. Glassdoor often has this kind of info, so it’s a good place to start researching. And leverage your professional network! Do you know anyone who’s interviewed at (or worked for) the company? Reach out to them and get the inside scoop.

Topics that you need to study

The list of things that you need to study will be largely informed by the research that you’ve done about the company you’re interviewing with. If you know that a company is likely to ask you questions about graph traversal, you can start working on graph traversal interview questions to prepare.

This is an instance in which knowing and being honest with yourself about your skills is critical too. For example, you may have known basic data structures inside and out as a computer science undergraduate. But if you’ve been working as a front-end developer for the past two years you might be a little rusty. Time to start practicing!

Studying != cheating

It’s pretty unlikely that you’ll be asked the exact same questions that we have here in Interview Practice or that are listed on Glassdoor. But researching questions that have been asked in the past is just plain smart, and a great way to set yourself up for success. The goal here isn’t to memorize the answers to a narrow set of questions. You’re not cheating! And really, even if that was your goal, interviewers pick up on that sort of thing really quickly. Rather, the goal is to be as prepared as possible for your interview.

Task of the Week

If you’re not sure where to start, we suggest that you tackle the Interview Practice Task of the Week! Every week, CodeFights content engineers select a new challenge to feature. They choose these interview questions because they’re tough ones that will get you into an interviewing mindset. Once you’ve solved the challenge, check out other CodeFighter’s solutions. It can be extremely instructive to see how other people have solved the same problem. Speaking of which…

CodeFights Explains It

Every Thursday, our content engineer Damien Martin writes a CodeFights Explains It editorial that breaks down how to solve the past week’s Task of the Week. He also tackles issues that are likely to come up during an interview. (Think optimizing your solution and working with interviewer-imposed restraints.) Make sure that you’re reading this series every week! It’s a great supplement to your interview preparation routine.

Tell us…

Have you solved an Interview Practice Task of the Week yet? How are you fitting Interview Practice (and CodeFights in general) into your interview preparation? Let us know over on the CodeFights forum!

CodeFights Explainer: Trees

CodeFights Explainer: Trees

Trees. They’re considered to be a computer science basic, but if it’s been a while since you’ve dealt with them – or if you’ve never encountered them before at all – they can be daunting. But tree questions are extremely common in technical interviews, so they’ll come up while you’re finding a job. And then on the job, they’re a wildly useful data structure. So let’s get you up to speed on trees!

A tree is a data structure that is particularly useful for representing data that’s sorted or in hierarchical order, and where we want to be able to retrieve data quickly. I’m going to dive into the idea of trees by introducing one of the most common use cases for them: the binary search tree.

Example: An address book app

Suppose we’re designing an address book application. You type in the name of the person you’re looking for, and if they are in your address book the program retrieves some information about them, like their address, email, and telephone number. If they aren’t in your contact book, you’ll get a “User not Found” message. For the moment, let’s just look at the list of names of people in our sorted address book:

Suppose you want to know if David is in your contact list. You could scan through the names, one by one, and then keep track of whether or not you have seen David yet. If you haven’t seen it by the time you reach the end of the list, then you can confidently report he isn’t one of your contacts.

Better yet, since you started with a sorted list, you can stop as soon as you see any name that would appear after David in an alphabetically ordered list. In this particular example, once you reach the 4th element in the list, Flaka, you know that David isn’t on this list.

Even with the improvement of the “stop once you reach an element bigger than the one you are looking for” approach, this algorithm is still O(N), where N is the number of items in our list. If you were checking to see if Zoe was on your list, then you would have to check Every. Single. User. Not cool.

A better approach would be to start in the middle, with Flaka. If you are looking for a name that appears earlier in the alphabet than Flaka (like David) then you know it is in the left half of the list (or it is not there at all). If you are looking for Zoe, you know that it appears on the right half of the list. For either name, you only have to search one half of the list. Let’s focus on the steps you would use to search for David:

  1. Start halfway through the list. The list of contacts has 7 items, so start at the 4th item, Flaka.
  2. Since David appears earlier in the alphabet than Flaka (i.e. as strings, David < Flaka), then you know that if David is in the list at all he is first, second, or third. Pick the middle element again, in this case Amy.
  3. Since Amy < David, if David is in this list he has to be the third element. But the third element is Colin, so David isn’t in this list.
list vs binary search
Visualizing the search: Look at 3 nodes, two jumps.

For David, we see that it takes 3 searches to find that he is not on the list, rather than 4. For Zoe, the difference is more impressive, because we can also exclude her name in 3 searches rather than 7! Here’s what searching for Zoe in your contact list would look like:

list vs binary search, part 2

At each step, we are halving the number of names you have to search through, so the number of searches you have to do to determine if a name is on the list is O(log N), which is a vast improvement over our original O(N)!

Using a tree

Instead of thinking about the list of contacts as a flat list, let’s try to visualize it the way that our sorting algorithm goes through it:

names in a tree
Visualizing the contacts as a tree.

Regardless of what name n we search for, we will always start at Flaka. If n is Flaka, then we’re done! If we are looking for a name that appears earlier in the alphabet, we will look at indices less than Flaka (to the left). Otherwise, we’ll move to the right. We keep moving down the diagram, looking for n, using the convention that if n is smaller than the current word we move to the left, and if it is larger we move to the right. If we reach the names at the bottom of the diagram and still haven’t found the name n, we can conclude that it isn’t in our diagram.

A tree is a way of implementing this data structure directly, rather than imagining this data structure on a simple list.

Each box in the diagram is called a node. Nodes keep track of the following information:

Properties of a node

Property name Type Description
children Pointers to other nodes We show all children of a node are connected to this node via arrows. The node Flaka in our example has two children: Amy and Madison.
value Data This is the data that is actually stored in the node. We are just storing the names at the moment, but a more useful address book would contain a reference to a person object so that we could get the email address, phone number, etc.

Continuing the family-tree analogy, we also have the notion of a parent node. Node X is a parent of node Y if node Y is a child of node X. We’ve already stated that the Flaka node had two children: Amy and Madison. This means that the parent node of Amy is Flaka.

There are also special types of nodes in a tree:

The root node is the only node on the tree that doesn’t have a parent, so in this tree the root is Flaka.

Then there are leaf nodes, which are the nodes that don’t have any children. In this case, the leaf nodes are Amanda, Colin, Gustav and Naomi.

Each node also a level, which is how far this node is from the root node. So Flaka (the root) is at level 1, Amy and Madison are at level 2, and the leaf nodes are all at level 3 in our example.

For a collection of nodes to be a non-empty tree (as used in computer programming), they need to satisfy 4 conditions:

  1. There needs to be exactly one root node (i.e. one node with no parent).
  2. Each node that is not the root node has only one parent node (i.e. no node is a child of two or more nodes)
  3. You need to be able to reach each node by starting at the root and following a sequence of “child” pointers.
  4. A node at level x cannot have a child at a level less than x.
    (In fact, all children of a node at level x have a level of x+1, because children of a node at level x are one extra step from the root.)

When we draw the nodes, we typically draw level 1 (the root) at the top of the page, and increase levels as we move down to the leaves. For some reason, computer scientists have decided to draw all of their trees “upside down”, with the root at top and the leaves at the bottom! Also note that we are jumping between two metaphors when labeling things for trees. We use a lot of terms inspired by family trees such as “parent” and “child”, as well as a lot of terms inspired by botanical trees such as “root” and “leaves”.

To get an idea of what some of the restrictions mean, let’s take a moment to look at two versions of our address book that are not trees. The problem with the data structure on the top is that Colin is a child of Amy and Madison, which breaks rule 2. The problem with the graph on the bottom is that Madison (level 2) is a child of Amanda (level 3).

not a tree
not a tree

These rules guarantee that we cannot have cycles while moving along the arrows, because either we will have no children nodes to visit (i.e. we are at a leaf), or we are guaranteed to be going to a higher level. Since we are moving along children links, all the nodes we have seen already are at a lower level than the current level.

All of this is pretty cool, but it seems like we’ve gone to a lot of work and just reproduced the O(log N) search of my contacts that I could have done with a flat (ordered) list! But one of the big advantages that the tree has over a flat list is that it is more flexible. Continuing on with the address book example, suppose I contact Amanda and Gustav a lot more than I call any of the other people. That means that every time I open my address book, my phone is usually doing three searches in the tree. We can rearrange the tree a little to get the following:

unbalanced tree
A tree with the nodes rearranged to make the average search quicker

In the longest searches, my app needs to do four steps now instead of three. But the most frequently searched-for names are now close to the top with the goal of reducing the average search time. Of course, we can also design a (bad) tree that is just a linked list:

a tree pretending to be a linked list
A tree that’s acting like a linked list.

The process of finding the best organization of the tree for the expected use of the data is called balancing the tree.

Other uses of trees

We have used trees to order data by value (i.e. sort it) as a way of replacing a list. A different way of using trees is to manage objects that are clustered or grouped by some attribute. Consider the organization of CodeFights Cafe. We have an owner, a head of accounting (Alice), and a store manager (Katherine). We also have some employees that belong either to the accounting department, or are managed by Katherine. We might structure a tree for CodeFights Cafe in the following way:

tree organizational structure
The organizational structure of CodeFights Cafe.

This tree arranges the nodes by responsibility, which means that any person in this chart manages their descendants. So Katherine can tell her assistant manager Fred what to do, as well give instructions to Claire and Jim. But she can’t give instructions to Bob, for example, because even though he is at a lower level he’s part of a different department.

It’s clear how this structure is a useful way for us to visualize how the Cafe is organized. It also makes certain types of operations easy to implement recursively. Suppose we wanted to send a message to the entire accounting department. We might have a function tell( node, msg) that will use the information in node to get the contact information for the person and email them the message msg. Suppose each node n has its children stored in a list n.children. We could send a message to n and all of its descendants (those nodes that n can reach that are at a lower level than it) with the following function:

If we want to message the entire company “Happy New Year!”, we can simply call tellAllDescendants( owner_node, "Happy New Year"). If we want to tell the service side that we need to bake more cookies, we can call tellAllDescendants( katherine_node, "Pls bake more cookies") and the message will be relayed to Katherine, Fred, Claire, and Jim.

This same organizational structure can answer other questions for us too, such as quickly finding out what the labor costs are in each department. Suppose we wanted to know what the total cost of the salaries in the service side of the business are (so Katherine’s salary, plus that of all the employees under her). If each node n stored that person’s salary as n.salary, we could write this function:

Note that us just asking for the sum of the node.salary and the sum of salaries of all of nodes children doesn’t work. When applied to the node for Katherine, this would add her salary to Fred’s, but it would miss the salaries of Claire and Jim. To find the salary for the service side of the business, we could now call sumSalaries(katherine_node). As a bonus, if we wanted to determine the entire salary cost for the business, we could call sumSalaries(owner_node). The fact that every employee node has only one parent ensures that each node is counted exactly once. (You can see that if Claire worked in the accounting department as well, then her salary would be counted twice by this code).

Our sumSalaries code is an example of a depth-first recursive algorithm. When calling sumSalaries(katherine_node), here is what our algorithm is doing:

  • Start processing sumSalaries on the node for Katherine
    • Before returning, we call sumSalaries on the node for Fred (Katherine’s child)
    • Before returning, we call sumSalaries on the node for Claire (Fred’s child)
      • Claire’s node has no children, so now it returns.
    • Before returning, we call sumSalaries on the node for Jim (Fred’s child)
      • Jim’s node has no children, so now it returns.
    • We have now summed the salaries of all Fred’s node’s children, so the call to sumSalaries on node for Fred returns.
  • We have now finished summing the salaries of all Katherine’s node’s children, so call to SumSalaries on Katherine returns.

We had to reach a leaf node before we started having functions returning values. When using depth-first recursive algorithms, we have to be careful that we don’t run out of room on the stack! In this case, we had three function calls “stacked” until we finally started returning answers.

We can also use depth-first recursion to search for a particular node (but we’ll discuss a better approach to this problem below). So far, I have been taking it for granted that I have a variable katherine_node lying around. What if I only have the root, owner_node, and I want to find the total salary of the service side of the business? Suppose nodes stored the department’s name as node.department. One way we could do this is to do a depth-first search, stopping the first time we find a department with the right name:

Using depth-first search (DFS) doesn’t make much sense in this example. If we start searching by moving from the owner node to the node for Alice, DFS will keep search through the entire accounting department before giving up. Since we are looking for the highest-ranking person in the service department, it would make more sense to search the tree level by level. This is called breadth first search (BFS), and is implemented below:

The question of which type of search strategy, DFS or BFS, is better depends on the structure of the tree, and what information you are looking for. (I’m going to cover this issue in depth in a future article – stay tuned!)

Summary

We’ve seen that when we have ordered data, trees can give us an O(log N) way of searching the tree for the data. We’ve also seen that trees are more flexible than using simple arrays because we can balance the tree.

Trees are particularly useful when:

  1. You have data that is sorted in some way, and you want to do a lot of searches on it.
  2. You have data that is “grouped” in some way, such as the organizational tree for CodeFights Cafe. File systems are a common hierarchy, where the nodes are either files or directories, and node X is a child of node Y if node X is contained in Y. In this example, all ordinary files are leaves, every parent is a directory, and / is the root. A web-based example is HTML’s document object model, which is tree-based. Front end developers even talk about “child elements” and “parent elements”!

We’ve seen that with hierarchical data, we can still search through it. We can iterate through the entire tree to get information from each node (commonly by going either depth-first or breadth-first), while ensuring that we visit each node exactly once. This process is called tree traversal. We looked closely at a related topic, which was searching in a hierarchical tree. This is like tree traversal, except we can stop whenever we find what we are looking for, so we no longer guarantee that every node will be looked at.

Ready to tackle some trees?

As I mentioned earlier, tree questions are extremely common in technical interviews. You’ll definitely want to practice them while you’re looking for jobs and preparing for interviews! But you don’t have to hunt for good questions. CodeFights Interview Practice has you covered! When you’ve solved all of these questions, take a look through the solutions that other CodeFighters have posted. It’s always really instructive to see how other users have solved interview questions – you’ll definitely learn something new!

Tell us…

Have you encountered any tree questions in interviews? How did you tackle them? Head on over to the CodeFights forum and let us know!

Live Tournament in San Francisco

Live Tournament in San Francisco

Bay Area CodeFighters, come join us on Saturday, April 22, for a special event: We’re breaking in our new San Francisco office by hosting a special live one-hour tournament!

Meet the CodeFights team, hang out with other Bay Area coders, and compete for bragging rights, prizes, and CodeFights swag.

The top three participants will all get $50 Amazon gift cards, and the first place winner will also get a special CodeFights gift box.

Tournament Details

2:30 – Check in, hang out, eat snacks
2:50 – Welcome from the CodeFights team
3:00 – Tourney begins
4:00 – Tourney ends
4:10 – More hanging out, more snacks, prizes!
4:30 – Doors close

Sign up on our Facebook event page.

See you there!

CodeFights Solves It, Interview Practice Edition: groupsOfAnagrams

CodeFights Solves It, Interview Practice Edition: groupsOfAnagrams

This week’s featured Interview Practice challenge was groupsOfAnagrams. It’s a Twitter interview question, and they’re known for their difficult technical interviews! If you haven’t solved it yet, go ahead and do that first, then head back here and we’ll discuss how to get to a solution with a good runtime.

…Done? Okay, let’s dig into this coding challenge!

The technical interview challenge

You are given a list of n words. Two words are grouped together if they are anagrams of each other. For this interview question, our task is to determine how many groups of anagrams there are.

For example, if the list words = [listen, admirer, tea, eta, eat, ate, silent, tinsel, enlist, codefights, married], we should have groupsOfAnagrams(words) return 4 because there are 4 groups:

  • listen, silent, tinsel, enlist are all anagrams of each other.

  • tea, eta, eat, ate are all anagrams of each other.

  • married and admirer are anagrams of each other.

  • codefights is not an anagram of anything in words except itself.

In the challenge, we are guaranteed that each word only consists of lowercase letters. So we don’t need to worry about the rules regarding spaces and punctuation. Each word is guaranteed to be at most 10 letters. The hard part is that we have to do this on lists containing up to 10^5 words!

First attempt: Determining if two words are anagrams

There are a few standard ways to determine whether or not words are anagrams of each other. One way is to go through each letter of the alphabet, and check that it appears the same number of times in each word.

What is the runtime of this algorithm? Suppose a typical word has L characters. In calling word.count(char), we are asking the computer to scan word to count the number of times character char appears. This is an O(L) operation. Technically, this makes areAnagrams an O(L) algorithm because we do this count operation for each letter of the alphabet, which doesn’t depend on the size of the word. However, big O notation suppresses the really bad constants hiding in the runtime! Consider that our words have at most 10 letters, and some may be repeats. That means that of our 26 iterations of the loop, at least 16 of them are wasted looking for letters that don’t appear. If we call the number of letters in the alphabet A, then the runtime of this algorithm is O(AL).

Interview Tip:

Asymptotic, or “big O”, notation misses constants. Analysis such as treating the size of the alphabet as a parameter is useful to estimate the size of the constants. This is particularly true in this case, where L is guaranteed to be less than the size of the alphabet A.

If the interviewer questions why you’re doing this (after all, English is almost certainly going to have 26 lowercase letters for our lifetimes!) you have the chance to impress them by saying that you’re considering what happens if punctuation is allowed, or if the company decides to look at large unicode character sets. This shows that you’re forward thinking, and have design on your mind – two things that it’s important to emphasize in technical interviews.

Basically, if using a constraint in the problem helps you significantly reduce the problem, feel free to use it. If you need time while you’re working on your solution, or are doing something odd like considering the number of letters in the alphabet as a variable, you can always defend your move as considering the design implications of the constraint!

We can reduce this runtime of our solution significantly by going through each letter in word1 one at a time. This is at most 10 letters, rather than 26. Here’s one way of implementing it:

While the code is longer, and it’s still O(L), it is no longer O(AL). Note that we are no longer assuming the strings are only made of lowercase letters!

Approach 1: Works, but too slow…

We have an anagram checker that will tell us if any pair of words are anagrams.

What is the runtime here? We run through the loop O(N) times, where N is the number of words. If we are lucky, we remove a lot of words early. However, if each word is not an anagram of any other word, the while loop runs exactly N times. Inside the loop, we scan the remaining words for anagrams. So this is an O(N^2) algorithm. Since we can have up to 10^5 words in a list, O(N^2) probably won’t work. Sure enough, this code doesn’t pass the tests on CodeFights – and it’s not an answer that will knock the interviewer’s socks off during a technical interview. We can do better!

Approach 2: Precomputing invariants

We are going to find an invariant that all the words that are anagrams share with each other, but don’t share with any words they are not anagrams of. We will use these as keys in hash tables with dummy values. At the end, we will simply count the number of keys in the hash table.

What are the invariants that we can use? A simple invariant is the number of as, the number of bs, …, etc. Any two words that are anagrams will have the same invariant, and any two words that aren’t anagrams will have different invariants.

For each word, this is an O(AL) process, as we are iterating through both the letters in the word and in the alphabet. The gain is going to occur because we only calculate it once per word.

Our algorithm is then:

Because calcInvariant is O(AL), where L is the number of letters in the word, we see this algorithm has time complexity O(NAL). Note that we never compare invariants. Instead, we use the magic of hash tables to set the value of 1 to the appropriate entry. If that key already exists, we overwrite the value. If it doesn’t exist, a new value is created. It was the comparison between keys that gave us our problematic O(N^2).

A cleaner, faster, invariant

There is another natural invariant for the words: sorting the letters that occur in the words. We could use:

This has the advantage of being easy to understand, and not making any assumptions about the underlying algorithm. It is a sorting algorithm, so it is O(L log L) instead of O(AL). (Technically we can’t compare algorithms using big O notation like this, because we don’t know the constants involved.) The sorted string can then be used for the invariant. In fact, the code for groupsOfAnagrams doesn’t have to change at all!

This is faster because our sorting algorithm on small words is much faster than our O(AL) dictionary. If the words are long (e.g. we are comparing novels) then our compression of the string to count the number of times the characters are repeated would win. When accessing the hash table hash_of_invariants we need to use the hash function, which depends on the length of the keys. Since L < 10, we know that each key is at most 10 characters long, so the sorting invariant is elegant and understandable.

Dictionaries where we don’t care about the value: Sets

In our hash table, we didn’t really care about the values assigned to the keys. We were only interested in the number of distinct keys we had, as that told us how many groups of anagrams we had.

There is nothing wrong with this, except that someone reading your code may wonder what the significance of the 1 is when you wrote:

The comments certainly help. But Python has a built-in data structure for this case called set. If we made a set of invariants instead, we would simply have:

There is not a visible dummy value that might confuse future maintainers of your code. Under the hood, the Python set is implemented as a dictionary (hash table) with hidden dummy values with a few optimizations, so the real reason for doing this is readability. Because hash tables are used more frequently than sets, there is a valid argument that the hash table is more readable for more programmers.

The final solution

Here is our final, compact solution! Any interviewer would be happy to see this solution during a technical interview.

Timing all the methods

In an interview, you always have to consider the runtime. As you could see, a lot of the effort in creating a solution for this interview question was finding an algorithm that was quick enough! I timed the algorithms for two cases:

  • Our first example, words = [listen, admirer, tea, eta, eat, ate, silent, tinsel, enlist, codefights, married]

  • A pathological example, where each word was its own group of 10 letters, starting with abcdefghij, abcdefghik, …, to abgilnoz (this is a total of 10^5 words)

I didn’t include the raw times, as that tells you more about my computer than the algorithm. Instead I normalized to the time for Method D on the list words (i.e. Method D has a time of 1 by definition).

Method Description Time for words (compared to D on words) Time for 10^5 words (compared to D on words)
A O(N^2) method 5.4 Too Long to Time
B Using precalculated tuple invariant for hash table 6.4 70300
C Using sorted words for hash tables 1.2 10450
D Using sorted words for sets 1.0 7000

Tell us…

Did your solution for this Interview Practice challenge look different than mine? How did you approach the problem? Let us know over on the CodeFights forum!

Ace Your Phone Screen By Telling Your Story, Pt. 2

Ace Your Phone Screen By Telling Your Story, Pt. 2

In Part 1 of this series, you learned how to craft a Story that will resonate with recruiters. Now we’ll talk about exactly how to tell the recruiter your Story during your phone screen.

So you’ve prepared your Story and practiced it a few times. You’ve got your next phone screen scheduled. Now you just need to wow the recruiter!

Be ready

When it’s time for the phone screen, be ready at the agreed-upon time. The recruiter may call a few minutes late, and you shouldn’t take this personally, but on your end you should be 100% ready. Make sure that you’re in a quiet spot and that your phone is fully charged! While this might seem obvious, every recruiter can tell you stories about candidates who took the phone call on the subway, or in a too-loud coffee shop, or… Well, you get the picture. Somewhere other than a quiet, calm place with no background interference that might make it hard for the recruiter to concentrate on what you’re saying. And give yourself a little padding at the end of the scheduled time. If the phone screen is going well and runs a little long, you don’t want to have to cut it short because you have another appointment.

You want to have enough time to finish up this conversation!

Tell your Story

Recruiters will often lead with an open ended question like “Tell me a little bit about yourself.” The purpose of this is two-fold: they want to put you at ease, and they want to get a sense of who you are and what you’re about. The recruiter likely has your resume and your LinkedIn profile in front of them while you’re talking, so don’t just start reciting bullet points. Instead, this is when you’re going to tell the recruiter your Story. At this point, keep the narrative at a high level (think generalities, not specifics). You can dive into the details later if they’re relevant and will drive your Story forward.

Let’s discuss

Your Story isn’t a monologue. Instead, it’s an invitation for the recruiter to ask questions! If you find that you get lost when the recruiter asks a question, it can be worthwhile to keep a list of the high points on hand during the phone screen so that you don’t miss entire portions of your personal narrative.

Talking tech

Even though recruiters tend not to be very technical, as the gatekeepers of the interview process they need to hear that you’re technically competent enough to get to the next round. Be prepared to talk about languages, frameworks, etc. so that they can get a sense of your proficiency level. Part of your Story should be a quantification of how well you know the tools that you have listed on your resume and LinkedIn. Be honest about this stuff! “Familiar but rusty” isn’t going to disqualify you in most cases, and it’s going to come out sooner rather than later if you’ve overstated your skills.

Context, context, context

Never forget why you’re talking to the recruiter – you are interested in a particular position! This context will help you tailor your Story to the specific role and company in question. For instance, if you’re interviewing for a role at a startup, discuss projects or anecdotes that highlight your flexibility, agility, and sense of urgency. Or, if you’re interviewing for a role at a larger company, highlight your commitment to iteration, optimization, and process. Think about why you’re excited about the role or company, and this will come through in your answers.

Stay positive

Never trash talk employers, even when it’s deserved! Keep things positive and professional at all times. Negativity is a big red flag for recruiters.

Check in

While most recruiter phone screens tend to take between 30 minutes to an hour, sometimes they can seem to last forever. Talking about yourself for that long can be hard! It’s okay to check in with the recruiter if you feel like you’ve been talking too much. Don’t be afraid to stop and ask if there’s anything else they want to know about.

Question everything

Always be prepared with some questions! Be sure to do some preliminary homework on the company. Google them to find some recent articles, and spend some time on their website. This will definitely guide a few specific questions. A few good generic ones:

  • “What will role be immediately responsible for/what would I be working on first?”
  • “Is this role new? If so, how is <company> building out the team?”
  • “Can you tell me about professional development at <company>?”
  • “What does the career path/growth for <role> look like?”
  • “What are you most excited about for <company> this year? What brought you there? What keeps you there?”

And finally, never ask about money first. If that’s what you lead with, that’s what you seem to care about most.

Finish strong

The end of your conversation with the recruiter is the perfect opportunity to seal the deal! Tie elements of your Story into specifics about the role and company: “After chatting with you, I’m really excited about x,y,z because it fits in with a,b,c that I’m bringing to the table.” Emphasize that you’re really interested. Now’s not the time to play it cool!

Do you want that job?
The recruiter should already be able to tell you want the job. Don’t make them ask.

And always ask about what the next steps are and what you can do to prepare for them. This shows that you’re proactive, and it’s always a great signal to recruiters.

Congratulations!

You made it past the recruiter gatekeeper! You’re not out of the interview labyrinth – heck, you really just got started – but you’re one step closer to getting that job offer. Put the time and effort into crafting a cohesive, compelling Story before you start off into the interview labyrinth. It’s going to pay off. Not only will you be able to use it in in phone screens, as we’ve discussed in this article, but you’ll be able to use large parts of it in the actual interview as well.

You’re reading an article about how to ace recruiter phone screens, so my spidey senses tell me you might be looking for a job! Did you know that CodeFights can connect you with hundreds of tech companies that are actively seeking qualified engineers – all with only one application? Head to codefights.com/jobs and start finding that dream job today!  

Tell us…

Do you have any tried-and-true tips for doing well on recruiter phone screens? Tell us over on the CodeFights forum!

CodeFights Explainer: Prefix Sums

CodeFights Explainer: Prefix Sums

Prefix sums, or cumulative sums, allow us to quickly find the sum of any contiguous slice of an array.

As a quick example, suppose you copy down some driving directions. They tell you to drive down Penny Lane for 3 miles, then make a left onto Abbey Road, which we travel down for 5 miles before continuing on Lake Shore Drive for 4 miles, and so on. We convert these directions into the number of miles driven on each road.

The prefix sum for the distances is:

This tells us that it took 3 miles to get to the end of the first direction (Penny Lane), and 8 miles in total to get to the end of the second direction (Abby Road). If we want to know how long it took to get from the end of Abbey Road (mile 8 of our trip) to the end of Sunset Blvd (mile 18), we do one subtraction on prefix_distances rather than two additions on distances:

A useful example of prefix sums would be calculating a moving average of an array, which is designed to remove periodic fluctuations in data. For example, if we knew the amount of money brought in by sales of cookies at the CodeFights Cafe per day, we might get something like:

We might guess there is fluctuation based on the day of the week. (Maybe it’s in an area that slows down over the weekend.) The seven day moving average would contain the average of the first seven values (4,5,8,10,12,0,0), while the next value would be the overlapping average of the next seven values (5,8,10,12,0,0,5). The moving average is:

In order to calculate the moving_ave we can start with the prefix sums:

To get the cookies sold for the first seven days, we can look at prefix_cookies[6]. To get the total number of cookies sold from day 2 to day 8 we can calculate prefix_cookies[7]-prefix_cookies[1]. Once we have the prefix sums, calculating the total number of cookies sold in any seven day period becomes trivial. Once we know the number of cookies sold in a week, we can divide by seven to find the average.

The nice thing about the prefix sums approach to moving averages is that the method is agnostic about the period used to average over. If CodeFights Cafe found that there was a monthly cycle to their cookie sales, it would be easy to use the same array prefix_cookies to average over 30 days instead.

Other applications of prefix sums are:

  • Calculating the average value of blocks of pixels (useful in noise removal).

  • Calculating whether one point is visible from another, given an array of heights (called the line of sight problem).

  • Finding cumulative distributions (for example, working out what percentage of income the top 1% of earners make from an array of incomes).

Maybe more impressive is the fact that prefix sums can be performed in parallel. This leads to some really useful algorithmic tricks.

Parallelizing prefix sums

If we are trying to calculate the prefix sum of an array L , we can give a linear time algorithm. To calculate the prefix[n], simply take:

This algorithm seems simple and fast, but it is also clear that because prefix[n] depends on prefix[n-1] the problem seems embarrassingly serial. That is, it doesn’t seem like we can take a huge array and split it in two, and have each computer calculate half the prefix sum, and then easily join the results together in a way that saves time.

The amazing thing is that there is a parallel algorithm for prefix sum! The method involves passing through the array twice:

  1. Bottom-up
    Takes the array and builds a binary tree from the array, making pairwise sums along the way.
  2. Top-down
    Takes the array of sums, and determines the prefix sums.

Let’s look at how this works with our array of distances, [3,5,4,6,2].

First we build a tree, where each element is a leaf at the bottom. Each node keeps track of the index range it comes from (to help us put results from different processes together), the sum of all the elements in index range, and one other element that we’ll ignore for right now.

Prefix Sums Figure 1
Notice that the sum stored in each node can be obtained by adding the sum of its two children. This means that we can split the job off to different machines, and then combine them at the end. What we end up with at the top node (or root node) is the sum of all the elements in the array.

To get the prefix sums, we will define left for a node with an index range [a,b) to be the sum of all the elements of the array with an index of less than a. In other words, this is the sum of all the elements that appear to the left of the first element included in this node. We start at the root node and make our way down the tree.

Prefix Sums Figure 2
To convince ourselves that we can really construct the left attribute this way, we will concentrate on the red boxed square. Moving from the parent (index = [0,4)) to the left child is easy: Since the left child has the same lower limit, it just copies the left value (0 in this case, because these are the elements that start at the beginning of the array). Moving to the right child (index = [2,4)) takes a little more thought. The right child knows that indices from [2,4) add up to 10 by looking at its own sum attribute. By knowing the sum in the parent element is 18, we can deduce that the sum of all elements with indices less than 2 must be 18 - 10 = 8.

To get the prefix sum requires taking the left element of all the leaves (except the first one, which is trivially zero) and the sum of the entire array.

Prefix Sums Figure 3
so our prefix_distances are found to be:

Key points

  • Prefix sums are interesting in their own right, in terms of just precomputing results and then allowing you to rapidly calculate any contiguous slice of an array.

  • Demonstrates a pattern in computer science of breaking a seemingly serial task into one that can be parallelized. Can be generalized: build up a binary tree, then move down the tree from the top to separate off the contribution from the “left hand side” of the tree.

Tell us…

Have you ever encountered a problem in an interview that you solved using prefix sums? Or better yet, encountered one in real life? Let us know over on the CodeFights forum!