Our Interview Practice challenge this week is `goodStringsCount`

, a *combinatorics* problem. As the name suggests, combinatorics deals with combinations of objects that belong to finite sets, and it’s one of those topics that come up *a lot* in technical interviews. This specific coding problem is from Apple, which makes sense since they’re known for asking combinatorics questions in their technical interviews!

If this isn’t your first CodeFights Solves It rodeo, I bet you know what I’m going to ask next: Have you solved goodStringsCount yet? If not, hop to it! Once you’ve got your solution, come back and we’ll walk through the problem together.

Done? Great! Let’s get into it.

## The technical interview problem

For this programming interview question, our job is to write a function that counts “good strings” of length `n`

. A “good string” is defined as a string that satisfies these three properties:

- The strings only contain the (English) lowercase letters a – z.
- Each character in the string is unique.
- Exactly one letter in the string is lexicographically greater than the one before it.

The first two conditions tell us that any string made up of lowercase letters qualifies. For example, there are no good strings of length 1, because the single letter doesn’t have anything to be greater than! In other words, `goodStringsCount(1)`

should return 0.

## Looking at good strings of length 3

Strings of length 2 are actually a little misleading, because they don’t really allow us to dig into the third condition on good strings. It just tells us that the characters have to appear in ascending order, as the second character has to be greater than the first. So let’s start by looking for good strings of length 3 instead!

The third condition tells us that “bfg” is **not** a good string because all the letters appear in ascending order. But “bgf” **is** a good string since:

- looking at the first two characters: ‘b’ < ‘g’ (so there is a letter greater than the one that precedes it)
- looking at the next two characters: ‘g’ > ‘f’ (so there is only one letter greater than the one that precedes it)

From the letters `b`

, `f`

, and `g`

, there are six possible strings, of which 4 are “good strings”.

“bfg”,

“bgf”,“fbg”,“fgb”,“gbf”, “gfb”

There is nothing particularly special about the characters `b`

, `f`

, and `g`

. We could use `a`

, `h`

, and `k`

as well (just replace `b`

with `a`

, `h`

with `k`

, and `g`

with `k`

in the list above). In fact, *any* three distinct letters would do, since the only property we used was their relative order. So we have:

where is ** n choose r**, the function for counting the number of ways of choosing

`r`

elements from `n`

. This is a common function in combinatorics problems, so I’ve included a discussion of it as a footnote to this article! We’ll be using this function a lot, so if you haven’t come across it before skip to the end and read about it now.## Overall strategy for this problem

After looking at a specific case, we get an idea of the structure of this problem. For finding the good strings of length `n`

:

- We need to find the number of ways of picking
`n`

letters, which is`26 choose n`

. - Given the numbers 1, 2, …,
`n`

, we need to find the number of ways we can arrange them in a “good sequence”. A “good sequence” is when exactly one number is greater than the one before it. We will call the number of good sequences of 1, 2, …, n`good(n)`

.

The idea is that for each set of `n`

letters, we can put them in alphabetical order, and label the letters from 1 to `n`

by their order in this array. Any good string then gives a “good sequence” of 1 to $n$, and any “good sequence” corresponds to a unique string (from these letters). For example, using `n=3`

and the letters `b`

,`f`

, `g`

:

* The good sequence `[2,3,1]`

corresponds to the good string “fgb”

* The good string “gbf” corresponds to the good sequence `[3,1,2]`

The number of good strings of length `n`

is then .

## Finding `good(n)`

Let’s call our sequence `[ a[1], a[2] , ..., a[n] ]`

, where each `a[i]`

is distinct and takes a value from 1 to `n`

, inclusive. If this is a “good sequence”, then there is exactly one element `a[j-1]`

that is greater than the element before it, `a[j]`

. Note that this means that:

`a[1]`

>`a[2]`

>`a[3]`

> … >`a[j]`

, i.e. the elements before`a[j+1]`

make a decreasing sequence`a[j+1]`

>`a[j+2]`

>`a[j+3]`

> … >`a[n]`

, i.e. the elements from`a[j+1]`

onward make a decreasing sequence

For example, a good sequence `[4,2,5,3,1]`

can be broken into the two decreasing sequences `[4,2]`

followed by `[5,3,1]`

. In the notation above the value of `j`

in `[4,2,5,3,1]`

is `2`

because `a[2+1] = 5`

is the only element bigger than the previous element.

Given the list of `n`

numbers, there are `n-1`

choices for `a[j]`

. (You can’t use the very last element, because there is no `a[j+1]`

.) Another way of seeing this is that we are splitting the array into two decreasing subarrays, and the only places that we can split on are the locations of the commas!

For now, let’s pick a particular value of `j`

and ask how many good sequences of 1 through `n`

we can make with this value of `j`

. Given a set of distinct numbers, there is only one way of making them into a decreasing sequence, so really we just need to pick which `j`

numbers go in the first decreasing sub-array, and which `n-j`

numbers go in the second decreasing array. Note that once we pick the `j`

numbers for the first array, the remaining numbers automatically go into the second array, so the number of distinct ways of splitting these numbers up is really just . (We have counted one problematic sequence in here, which we’ll come back to.)

In case you got lost in the

`n`

and`j`

s, here’s an example to help you out. Let’s pick`n = 5`

and`j=2`

. Instead of counting all the good sequences, we are just counting the ones where the split happens at position 3. So we have:

`[a[1], a[2]]`

make a decreasing sequence, and`[a[3],a[4],a[5]]`

make a decreasing sequence.How many sequences like this are there? I need to split

`1,2,3,4,5`

up into two pieces: two elements (i.e.`j`

) become`a[1]`

and`a[2]`

, while the remaining three elements (`n-j`

) become`a[3]`

through`a[5]`

. But once I pick`a[1]`

and`a[2]`

, whatever is left over has to go into the other array. There are ways of picking`a[1]`

and`a[2]`

:

*`a[1]`

and`a[2]`

are 1,2: the sequences are`[2,1]`

`[5,4,3]`

->`[2,1,5,4,3]`

*`a[1]`

and`a[2]`

are 1,3: the sequences are`[3,1]`

`[5,4,2]`

->`[3,1,5,4,2]`

*`a[1]`

and`a[2]`

are 1,4: the sequences are`[4,1]`

`[5,3,2]`

->`[4,1,5,3,2]`

* …

*`a[1]`

and`a[2]`

are 3,5: the sequences are`[5,3]`

`[4,2,1]`

->`[5,3,4,2,1]`

*`a[1]`

and`a[2]`

are 4,5: the sequences are`[5,4]`

`[3,2,1]`

->`[5,4,3,2,1]`

(UH OH!)

The last one is problematic, because all the numbers are decreasing! So while there are 10 ways of making the split into two decreasing sequences with `n=5`

and `j=2`

, there are only 9 good sequences (with `n=5`

and `j=2`

) because these is one sequence where all the numbers are decreasing.

The example shows us the problem with the way we have counted by splitting into two decreasing arrays: We never made sure that `a[j+1]`

was bigger than `a[j]`

. Fortunately there is only one arrangement that doesn’t work: the arrangement `[n,n-1,...,1]`

, so we have over-counted by exactly one. The number of good sequences split of 1 through `n`

, split into arrays of length `j`

and `n-j`

is therefore .

To determine `good(n)`

, we just need to add up all the different good sequences we can get by putting the cutoff between the two arrays at different locations `j`

. We get:

This gets us to a point where we can write a program that will run quickly enough, so if you’re totally “mathed out” you can stop here! We can do a little bit better, through. Here are a couple of clever tricks:

- , because there is 1 way of not choosing anything from
`n`

objects regardless of`n`

– just don’t pick any of them! – and 1 way of choosing all`n`

objects from`n`

objects. So putting`j=0`

and`j=n`

into gives zero. Now we can rewrite:

- You might remember from Pascal’s triangle that

One way of seeing this result is: The sum is asking us about the number of ways we can select a subset of`n`

elements by figuring out the number of subsets with 0 elements, the number of subsets with 1 element, et cetera. Since each element is either in a subset or not, there are`2^n`

different subsets.

After all this work, we get:

## Putting it all together

From our overall strategy, we had ways of picking the `n`

characters, and for each of our choices we had `good(n)`

ways of arranging those characters into good strings. So the number of good strings of length `n`

is:

## …and now, some code!

The hard part about this problem isn’t writing the code, but figuring out how to count efficiently. The code is pretty small in this case:

1 2 3 4 5 6 7 8 9 10 11 |
from math import factorial as fact def nChooseR(n,r): # there are better ways that don't involve using # dealing with large integers. For the moment, # I am opting for pragmatism. return fact(n)/((fact(n-r)*fact(r))) def goodStringsCount(n): return nChooseR(26,n)*(2**n - n - 1) |

## Footnote: Ways of choosing r objects from n objects

How do we get the formula for in the first place? Suppose we have `n=6`

objects: a flag, a cookie, a laptop, a pen, a cup of coffee, and an orange. Note that we have picked easy-to-distinguish items! We want to select `r = 4`

of them. How many different sets of items could we come up with?

We have `n`

choices for which item we pick first. We have `n-1`

choices for the second item, and so on. So it *seems* like the result is `n(n-1)(n-2)(n-3)`

. This becomes a little inconvenient to write for a general `r`

(in this case, we know that `r = 4`

), but notice that:

This formula over-counts the number of different sets of items we could have because selecting the laptop, then the coffee, then the orange, and then the pen would give you the same set of items as selecting the coffee, followed by the orange, the pen, and the laptop. In the formula above, we have counted these two arrangements separately! (This is called a *permutation* of selecting 4 items from `n`

, and is another useful formula to have under your belt).

How many different orders are there for selecting these 4 items? This is the number of times we have over-counted each set of items we could end up with. We’ll have 4 choices for whichever one we could have picked first (laptop, coffee, orange, or pen) without affecting the items we end up with. With the first item selected, we have 3 items to choose from for the second item, 2 choices for the third item, and then the last item is the 1 thing left over. So we can rearrange these 4 items in ways. The number of combinations for picking `r=4`

things from `n`

items is:

You can (and should!) run through this argument for a general value of `r`

, and convince yourself that the number of ways of choosing `r`

items from `n`

is:

## Tell us…

How did you solve this combinatorics interview question? Did you solution differ from mine? Let us know over on the CodeFights forum!