8 June 2016

A few days ago I came across an article from a while back that makes several arguments against using cycle detection as an interview question. If you're unfamiliar with it, here is the problem:

Given a linked list, determine if it contains a cycle.

Among the arguments listed in the article is a claim that it is unreasonable to expect someone to solve this problem during the course of an interview, because linked lists were devised in 1955 but the cycle detection problem didn't have its now-standard tortoise-and-hare solution published until many years later. As a side note, the article claims that it was published by Floyd in 1967, but that paper actually describes a solution to a very different problem. Apparently the earliest known published mention of this algorithm is by Knuth in 1969, but Knuth attributed it to Floyd, so the algorithm was probably reasonably well-known in academic circles beforehand.
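
For concreteness, here is a minimal sketch of the tortoise-and-hare idea on a singly linked list. The Node class and names are my own, chosen for illustration, not taken from any of the sources above.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def has_cycle(head):
    # Advance a slow pointer by one node and a fast pointer by two.
    # If there is a cycle, the fast pointer eventually laps the slow one
    # and the two meet; otherwise the fast pointer falls off the end.
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False
```

The appeal of the algorithm is that it uses constant extra space; a hash set of visited nodes would also work but needs space proportional to the length of the list.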

I have seen a similar sentiment recently with regard to Dijkstra's algorithm. These two reactions are alike in that their main evidence that a problem is hard comes from the history of how its solution came to be. In the case of cycle detection, the claim is that because a solution wasn't published for 12 years, it must be a hard problem. In the case of Dijkstra's algorithm, it's because Dijkstra is regarded as one of the greatest computer scientists.

While I don't believe that either of these problems makes a good interview question, the argument against them is much worse. If someone asked me whether I expect a candidate to be able to come up with Dijkstra's algorithm within a short time span, I'd say absolutely yes. Two weeks ago I wrote about a path one might take to derive Dijkstra's algorithm from scratch. The standard interview length of an hour might be a little too short, but I'd certainly expect a thought process like that to play out if a strong candidate were given, say, four hours.

But regardless of how hard (or easy) I think these problems are, the fact remains that the tortoise-and-hare algorithm wasn't published until 12 years after linked lists were, and Dijkstra's algorithm was, in fact, invented by Dijkstra. However, there are several reasons why those facts don't imply that these problems are unreasonably difficult.

Some answers aren't given because the question hasn't been asked.

A huge advantage that the candidate has over the historical solvers is that the question has already been formulated and asked. I think the authors of the reactions I linked above both significantly underestimate the power of asking a well-formed question.

Cycle detection is especially interesting, because judging from its applications, it seems unlikely that it was conceived in the context of linked lists as a data structure. Instead, it was probably thought of in terms of repeated function application: given a function \( f : S \to S \), does the sequence \( x, f(x), f(f(x)), f(f(f(x))), \ldots \) eventually become periodic? In that light, the publication date of linked lists as a data structure becomes a clear red herring.
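
To make the reformulation concrete, here is a rough sketch of the same tortoise-and-hare idea phrased purely in terms of repeated function application; the names and the step cap are mine, added so the sketch terminates even on an infinite state space.

```python
def orbit_is_periodic(f, x0, max_steps=10**6):
    # Tortoise-and-hare on the orbit x0, f(x0), f(f(x0)), ...
    # If the orbit enters a cycle (always true when the state space is
    # finite), the two iterates eventually coincide.
    tortoise = f(x0)
    hare = f(f(x0))
    for _ in range(max_steps):
        if tortoise == hare:
            return True
        tortoise = f(tortoise)
        hare = f(f(hare))
    return False
```

For example, orbit_is_periodic(lambda x: (x * x + 1) % 255, 3) returns True, since any orbit over a finite set must eventually repeat. Notice that no linked list appears anywhere in this formulation.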

But wait, you might say: the concept of a function dates back to the 17th century, so does that mean this problem was open not for something under 14 years, but for over 200 years? Remember, though, that a problem isn't open merely because all of its prerequisites exist. A problem is open only once it has been asked. So when was the cycle detection question first asked? We can't know for sure, but my guess is that it was about a day before the tortoise-and-hare algorithm was devised.

Some answers aren't given because the question was asked in the wrong way.

Another major advantage for a candidate being asked one of these questions is that the question comes in a form that has the benefit of hindsight. A lot of mathematical questions were originally resolved in relatively complex ways, but over time connections are drawn between those questions and others, and they start looking much simpler.

For example, if you look at the history of group theory, you can see that while a lot of work was done with groups in their original formulation as groups of permutations, once the concept was boiled down to its essence, the abstract formulation that exists today, a lot more work was done a lot more quickly. This is despite the fact that every group can be expressed as a group of permutations.

Problems become easier to solve over time.

That problems get asked and that their formulations get refined are just two aspects of a general trend: humans become better equipped to solve problems over time. The fact that a problem was hard in 1970 does not mean that the same problem is still hard today. When a new technique is used to solve a problem, mathematicians get excited; they remember the technique and try using it to solve other problems. Sometimes that new technique becomes standard, and the original problem starts to look easy. Using Dijkstra's algorithm as an example, if you have experience with dynamic programming in general, then it should not be a problem to apply that experience to the case of graphs.
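
As a concrete illustration (my own sketch, not something from the posts linked above), the "settle the cheapest unexplored node first, then relax its neighbors" idea turns into Dijkstra's algorithm fairly directly; the adjacency-list representation here is an assumption made for the example.

```python
import heapq

def dijkstra(graph, source):
    # graph: dict mapping each node to a list of (neighbor, weight) pairs,
    # with non-negative weights. Returns shortest distances from source.
    dist = {source: 0}
    heap = [(0, source)]  # (best distance found so far, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry; u was already settled via a shorter path
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist
```

The greedy choice of always settling the closest remaining node is exactly the step that non-negative edge weights make safe, which is why someone comfortable with dynamic programming can plausibly rediscover it once the question is posed.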

The fact that dynamic programming was developed by Richard Bellman in the 1940s doesn't make the technique any less fundamental today, and certainly won't convince me that I should hire someone unfamiliar with the technique, even if I don't need someone as accomplished as Bellman was. Advancing human knowledge is all about building on top of the knowledge that others have created before. As a result, knowing about and understanding the work in the literature is a relevant skill for anyone who is interested in solving new problems.

It's tempting to draw a line between academia and the tech industry and say that while academics are truly solving new problems and this knowledge is certainly useful there, software engineers in industry can get by with taking algorithms developed by academics and merely implementing them. However, I think it's not too difficult to see that engineers in industry need to solve different problems than academics do, so not everything will have a ready-to-go solution, and sometimes new questions arise that academics had no reason to ask.

A great example is Facebook's work on Haxl. The whole idea of Haxl is to create a language (in this case embedded in Haskell) that can be learned as quickly as a custom macro language, but is both flexible and able to handle parallelism automatically. Facebook's easy-to-learn criterion is something that probably wouldn't be thought of in academia, and even if it were, it would be much harder to test and verify there; Facebook, by contrast, can simply give the language to actual Facebook employees and see how they fare. Furthermore, during this work the Haxl team at Facebook uncovered performance issues that had previously gone unnoticed for lack of scale, including some problems with the garbage collector.

I mentioned at the start of the post that I don't think asking a candidate to implement Dijkstra's algorithm or asking them about cycle detection makes for a good interview question, but that I disagree with most of the criticisms of these questions. To me, their big weak point is that they are too well known. It is very plausible (perhaps even expected) that a good candidate will have already seen both the problems and their solutions. So what do you gain from asking these questions?

In other words, interview questions need to be more complex, and chosen carefully so that candidates are unlikely to have seen the problem before. Of course, one hour is a very short time in which to actually solve a problem from scratch. But the answer to that shouldn't be to give up on distinguishing among the top end. Why not give the candidate more time? I'd love to see someone try an interview format where the candidate is given four hours to work on a problem using whatever resources they want, and presents whatever they have at the end, whether it's complete code or just ideas toward a solution. Some care would be needed to handle the possibility of someone else solving the problem for them, but I would try this myself if I were in a position to do so. Unfortunately, I don't expect anyone else to run the experiment for me, because I have no evidence to suggest that it's a good idea other than my own thoughts.