# The Computer Science Problem that Could Cure Cancer (or at least help Amazon deliver packages quicker)

My dad works at Amazon. He commands a fleet of truck drivers who are in charge of delivering packages to residents living in certain zip codes. Each day, these drivers report to the Amazon warehouse, where they load packages into their trucks, prior to proceeding on their journey. On average, drivers can expect to deliver to the residences of 150 or more people before they can return to the warehouse, which marks the end of their long and tiring day.

Amazon expects that almost 100% (and by almost I mean almost - around 98.5% and up) of the packages that the drivers are given in the morning to by received by the people who ordered them. The key to achieving this rate of delivery is plotting an efficient path from the warehouse to each of the houses/apartments and back to the warehouse again. Trying to devise such a route is the essence of the Traveling Salesman Problem, or as I’ll be referring to it, TSP. Solving this problem could cure cancer (we’ll discuss how later) , and, more importantly, get us our Amazon orders quicker.

A more formal definition of TSP is as follows: given a collection of cities (or houses, it really doesn’t matter) and the distances between them, find the shortest route that visits each city and returns to its starting point. Now, there are many variants of TSP but the one that we’re going to be looking at for reasons you’ll see later is Metric TSP, where the distances between the cities act in accordance to the triangle inequality. Essentially, this says that the distance (z) from city A to city C is at most a distance (x) from city from city A to city B + a distance (y) from city B to city C. You can visualize this relationship in the image below. In any instance where the three cities aren’t arranged in a straight line (here, z is equal to x +y), distance z is always going to be smaller than x + y. Attempting TSP without the lens of the Triangle Inequality doesn’t make sense in the real world, as you can see that there is no way for z to be greater than x + y.

Another assumption that we’re making today is that the distances between cities are symmetric. If you were to measure the distance traveling from x to y, you should compute the same value as if you were to measure the distance traveling from y to x. In the real world, this is not always the case. For example, if a city is on a hill, then you’ll probably travel slower while approaching the city because you’ll be climbing uphill. Nevertheless, Symmetric Metric TSP (that’s fun to say, isn’t it?) is what we’re going to be looking at and it still has many beautiful applications.

Problems in Computer Science are unique in the fact that they are difficult to solve exactly and TSP is no exception. At least, that’s what we will continue to believe until somebody presents a flawless algorithm to one of these problems. Our best “exact” algorithms can only solve these problems in exponential time, meaning with n amount of cities, it will take 2^n steps to compute. This is very troublesome because instances with a ton of cities may take hundreds, thousands, or even millions of years and ain’t nobody got time for that.

Instead, in an effort to save precious hours, computer scientists decided to solve these problems approximately. This approach is far superior because it runs in polynomial time. In the case of TSP, if there are n number of cities, an algorithm that runs in polynomial time might solve it in n⁵ steps. When n gets larger and the problem becomes more complex, you can see why this would be beneficial (you would much rather have 1000⁵ steps than 5¹⁰⁰⁰ steps). Next, we will look at some of TSP’s popular approximation algorithms (which are basically just algorithms that work in polynomial time as opposed to exponential time).

By far the most famous approximate solution to TSP is the Christofides algorithm, discovered in 1976 by the British mathematician Nicos Christofides. This algorithm is based on the idea that a TSP instance (i.e a collection of cities) can be represented as a series of nodes on a graph connected by edges which have a value equal to the distance between the nodes they connect. Given such a graph, the Cristofides algorithm generates what is known as the minimum spanning tree. A regular spanning tree is a set of edges that link all the cities, and the minimum spanning tree is very similar, except for the fact that it is arranged in such a way that the route it produces is as short as possible.

It’s not logical to solely rely on the the minimum spanning tree, however, due to the fact that it doesn’t eliminate the possibility of some of the cities having only one edge connected to them, or what we call odd degree. This is a problem because if you have an edge leading into a city, you also need an edge that allows you to exit it. To fix this issue, the Cristofides Algorithm uses this trick called minimum cost matching. All this does is “match” the cities with odd degree and draw an edge between them, effectively creating a way to leave each one. After completing this matching, a eulerian tour (a path which traverses each edge only once) is generated. Following this tour all the way back to the origin is Cristofides’s solution to TSP.

I recently had the honor of talking to Nathan Klein, graduate student at the University of Washington, on my podcast. He brought to my attention that there are certain cases where the the Cristofides Algorithm does this annoying thing where every single city has odd degree in the minimum cost cost spanning tree. As you can imagine, this is very problematic and even with minimum cost matching, the results are very sub optimal. He goes on to explain how he along with Anna Karlin and Shayan Oveis Gharan created a better algorithm (I won’t go into the details, but if anyone is interested the full interview is here). Though their algorithm, which slightly adjusted an earlier algorithm by Oveis Gharan, Saberi, and Singh, didn’t solve TSP exactly, Nathan explained what breakthroughs a perfect solution to TSP would lead to.

One such breakthrough could be the ability to cure cancer. How is this even possible? Well, if someone discovered an algorithm that solved TSP exactly while also working in polynomial time, we would be able to show that problems that we currently think can only be solved approximately can be solved quickly and exactly. Now what does that have anything to do with curing cancer? To figure out if a cell is cancerous, you have to compare its DNA signature to that of a cancer-free cell. The drug that cures cancer would have to only destroy the cells with the cancerous DNA while leaving the other cells unharmed.

Imagine that someone comes along and claims they have a drug that can do just that. How can we verify that what they’re saying is true? Well, we’d have to simulate the behavior of the drug once it enters a cancer patients body. As far as we know, there is no fast way to run such a simulation exactly, but, if we were to solve TSP, there would be some hope. And even if these problems can never be solved exactly and quickly at the same time, at least improved approximation algorithms will get us our Amazon packages in a more timely fashion.