YouTube Videos

A Simple Neural Network
KotlinConf 2018 - Mathematical Modeling
Creating a Sudoku Solver from Scratch
Traveling Salesman Problem
Text Categorization w/ Naive Bayes
Monty Hall Problem
Solving World's Hardest Sudoku

Saturday, December 1, 2018

Is Deep Learning Already Hitting Its Limitations?



And Is Another AI Winter Coming?

Many believed an algorithm would transcend humanity with cognitive awareness. Machines would discern and learn tasks without human intervention and replace workers in droves. They quite literally would be able to “think”. Many people even raised the question whether we could have robots for spouses.

But I am not talking about today. What if I told you this idea was widely marketed in the 1960’s, and AI pioneers Jerome Wiesner, Oliver Selfridge, and Claude Shannon insisted this could happen in their near future? If you find this surprising, watch this video and be amazed how familiar these sentiments are.

Fast forward to 1973, and the hype and exaggeration of AI backfired. The U.K. Parliament sent Sir James Lighthill to get a status report of A.I. research in the U.K. The report criticized the failure of artificial intelligence research to live up to its sensational claims. Interestingly, Lighthill also pointed out how specialized programs (or people) performed better than their “AI” counterparts, and had no prospects in real-world environments. Consequently, all AI research funding was cancelled by the British government.

Across the pond, the United States Department of Defense was invested heavily in AI research, but then cancelled nearly all funding over the same frustrations: exaggeration of AI ability, high costs with no return, and dubious value in the real world.

In the 1980’s, Japan enthusiastically attempted a bold stab at “AI” with the Fifth Generation Project. However, that ended up being a costly $850 million failure as well.

The First AI Winter

The end of the 1980’s brought forth an A.I. Winter, a dark period in computer science where “artificial intelligence” research burned organizations and governments with delivery failures and sunk costs. Such failures would terminate AI research for decades.

By the time the 1990’s rolled around, “AI” became a dirty word and continued to be in the 2000’s. It was widely accepted that “AI just didn’t work”. Software companies who wrote seemingly intelligent programs would use terms like “search algorithms”, “business rule engines”, “constraint solvers”, and “operations research”. It is worth mentioning that these invaluable tools indeed came from AI research, but they were rebranded since they failed to live up to their grander purposes.

But around 2010, something started to change. There was rapidly growing interest in A.I. again and competitions in categorizing images caught the media’s eye. Silicon Valley was sitting on huge amounts of data, and for the first time there was enough to possibly make neural networks useful.

By 2015, “AI” research commanded huge budgets of many Fortune 500 companies. Oftentimes, these companies were driven by FOMO rather than practical use cases, worried that they would be left behind by their automated competitors. After all, having a neural network identify objects in images is nothing short of impressive! To the layperson, SkyNet capabilities must surely be next.

But is this really a step towards true AI? Or is history repeating itself, but this time emboldened by a handful of successful use cases?

What is AI Anyway?

For a long time, I have never liked the term “artificial intelligence”. It is vague and far-reaching, and defined more by marketing folks than scientists. Of course, marketing and buzzwords are arguably necessary to spur positive change and embrace new mindsets. However, buzzword campaigns inevitably lead to confusion. My new ASUS smart phone has an “AI Ringtone” feature that dynamically adjusts the ring volume to be just loud enough over ambient noise. I guess something that literally could be programmed with a series of if conditions, or a simple linear function, is called “AI”. Alrighty then.

In light of that, it is probably no surprise the definition of “AI” is widely disputed. I like Geoffrey De Smet’s definition, which states AI solutions are for problems with a nondeterministic answer and/or an inevitable margin of error. This would include a wide array of tools from machine learning to probability and search algorithms.

It can also be said that the definition of AI evolves and only includes ground-breaking developments, while yesterday’s successes (like optical character recognition or language translators) are no longer considered “AI”. So “artificial intelligence” can be a relative term and hardly absolute.

In recent years, “AI” has often been associated with “neural networks” which is what this article will focus on. There are other “AI” solutions out there, from other machine learning models (Naive Bayes, Support Vector Machines, XGBoost) to search algorithms. However, neural networks are arguably the hottest and most hyped technology at the moment. If you want to learn more about neural networks, I posted my video below.

If you want a more thorough explanation, check out Grant Sanderson’s amazing video series on neural networks here:



An AI Renaissance?

The resurgence of AI hype after 2010 is simply due to a new class of tasks being mastered: categorization. More specifically, scientists have developed effective ways to categorize most types of data including images and natural language thanks to neural networks. Even self-driving cars are categorization tasks, where each image of the surrounding road translates into a set of discrete actions (gas, break, turn left, turn right, etc). To get a simplified idea of how this works, watch this tutorial showing how to make a video game AI.

In my opinion, Natural language processing is more impressive than pure categorization though. It is easy to believe these algorithms are sentient, but if you study them carefully you can tell they are relying on language patterns rather than consciously-constructed thoughts. These can lead to some entertaining results, like these bots that will troll scammers for you:

Probably the most impressive feat of natural language processing is Google Duplex, which allows your Android phone to make phone calls on your behalf, specifically for appointments. However, you have to consider that Google trained, structured, and perhaps even hardcoded the “AI” just for that task. And sure, the fake caller sounds natural with pauses, “ahhs”, and “uhms”… but again, this was done through operations on speech patterns, not actual reasoning and thoughts.

This is all very impressive, and definitely has some useful applications. But we really need to temper our expectations and stop hyping “deep learning” capabilities. If we don’t, we may find ourselves in another AI Winter.

History Repeats Itself

Gary Marcus at Cornell wrote an interesting article on the limitations of deep learning, and poses several sobering points (he also wrote an equally interesting follow-up after the article went viral). Neural networks are data-hungry and even today, data is finite. This is also why “game” AI examples you see on YouTube (like this one as well as this one) often requires days of constant losing gameplay until the neural network finds a pattern that allows it to win.

Neural networks are “deep” in that they technically have several layers of nodes, not because it develops deep understanding about the problem. These layers also make the neural networks difficult to understand, even for its developer. Most importantly, neural networks are experiencing diminishing return when they venture out into other problem spaces, such as the Traveling Salesman Problem. And this makes sense. Why in the world would I solve the Traveling Salesman Problem with a neural network when a search algorithm will be much more effective, scalable, and economical? Of course, there are folks looking to generalize more problem spaces into neural networks, and while that is interesting it rarely seems to outperform any specialized algorithms.

Luke Hewitt at MIT puts it best in this article:

It is a bad idea to intuit how broadly intelligent a machine must be, or have the capacity to be, based solely on a single task. The checkers-playing machines of the 1950s amazed researchers and many considered these a huge leap towards human-level reasoning, yet we now appreciate that achieving human or superhuman performance in this game is far easier than achieving human-level general intelligence. In fact, even the best humans can easily be defeated by a search algorithm with simple heuristics. Human or superhuman performance in one task is not necessarily a stepping-stone towards near-human performance across most tasks. - Luke Hewitt

I think it is also worth pointing out that neural networks require vast amounts of hardware and energy to train. To me, that just does not feel sustainable. Of course, a neural network will predict much more efficiently than it trains. However I do think the ambitions people have for neural networks will demand constant training and therefore require exponential energy and costs.

It is for these reasons I think another AI Winter is coming. Companies are sparing little expense in getting the best “deep learning” and “AI” talent, and I think it is a matter of time before many companies realize deep learning is not what they need. Even worse, if your company does not have Google’s research budget, the PhD talent, or massive data store it collected from users, you can quickly find your practical “deep learning” prospects very limited. This was best captured in this scene from the HBO show Silicon Valley:

Each AI Winter is preceded with scientists exaggerating and hyping the potential of their creations. It is not enough to say their algorithm can do one task well. They want it to ambitiously adapt to any task. I have a theory why people sensationalize despite obvious limitations, and it is more philosophical than scientific. I will save that for the end of the article.

So What’s Next?

Of course, not every company using "machine learning" or "AI" is actually using "deep learning." A good data scientist may have been hired to build a neural network, but when she actually studies the problem she more appropriately builds a Naive Bayes classifier instead. For the companies that are successfully using image recognition and language processing, they will continue to do so happily. But I do think neural networks are not going to progress far from those problem spaces.

The AI Winters of the past were devastating in pushing the boundaries of computer science. It is worth pointing out that useful things came out of such research, like search algorithms which can effectively win at chess or minimize costs in transportation problems. Simply put, innovative algorithms emerged that often excelled at one particular task.

The point I am making is there are many proven solutions out there for many types of problems. To avoid getting put out in the cold by an AI Winter, the best thing you can do is be specific about the problem you are trying to solve and understand its nature. After that, find approaches that provide an intuitive path to a solution for that particular problem. If you want to categorize text messages, you probably want to use Naive Bayes. If you are trying to optimize your transportation network, you likely should use Discrete Optimization. No matter the peer pressure, you are allowed to approach convoluted models with a healthy amount of skepticism, and question whether it is the right approach.

Hopefully this article made it abundantly clear deep learning is not the right approach for most problems. Do not hit the obstacle of seeking a generalized AI solution for all your problems, because you are not going to find one.

Are Our Thoughts Really Dot Products? Philosophy vs Science

One last point I want to throw in this article, and it is more philosophical than scientific. Is every thought and feeling we have simply a bunch of numbers being multiplied and added in linear algebra fashion? Are our brains, in fact, simply a neural network doing dot products all day? That sounds almost like a Pythagorean philosophy that reduces our consciousness to a matrix of numbers. Perhaps this is why so many scientists believe general artificial intelligence is possible, as being human is no different than being a computer. (I’m just pointing this out, not commenting whether this worldview is right or wrong).

If you do not buy into this philosophy, then the best you can strive for is have AI “simulate” actions that give the illusion it has sentiments and thoughts. A translation program does not understand Chinese. It “simulates” the illusion of understanding Chinese by finding probabilistic patterns. When your smartphone “recognizes” a picture of a dog, does it really recognize a dog? Or does it see a pattern of numbers it saw before?

Saturday, October 6, 2018

Animating the Traveling Salesman Problem with JavaFX

Untitled Document.md

Animation can be a powerful tool. It is one thing to explain a complex topic in words or even in pictures, but visuals in motion have an amazing quality to bring abstract ideas to life. This can be especially helpful in complex areas of computer science like optimization and machine learning.

I recently gave a talk at KotlinConf on optimization and machine learning. Of the several examples, one was the Traveling Salesman Problem (a.k.a. “TSP”). This is such a fun and fascinating problem and it often serves as a benchmark for optimization and even machine learning algorithms. However, explaining some of the algorithms (like k-opt and simulated annealing) is less intuitive without a visual aid. Therefore, I made this an open-source project using JavaFX via TornadoFX.

A lot of folks at the conference and online were surprised by the animated visuals, remarking how “integrated” and slick it looked. The truth is I hacked this application together and JavaFX was instrumental in the heavy lifting. It took care of the animation for me so I could focus on the algorithm itself. This is what I want to blog about in this article.

You can watch the video walkthrough of this application (with a thorough explanation of the TSP) here. I recommend watching this before reading on!

The focus of this blog post will be on the animation and how it was achieved. For an in-depth explanation of the TSP and how it can be solved, please watch the above video.

The Structure

To build this, let’s first lay out the structure of our visual framework. I am going to express this in the Kotlin language, and leverage JavaFX via TornadoFX. Thankfully, TornadoFX does not hide or supresss any of JavaFX’s functionality, but rather augments it with expressive Kotlin DSL’s. So you can do this in Java, Kotlin, Scala, or any JVM language that can use JavaFX.

The first thing I’m going to do in my application is declare a Pane, and place in it an ImageView with a simple map image of Europe. Then from my domain model, I’ll import my City objects and place a red Circle on the x and y screen coordinates relative to the Europe map. Finally, I’ll import the Edge objects from my domain where each one is tied to a City, and bind each one to a Line. Each Edge represents a connection between two cities, and it is initialized with the start point and end point being the same city. Therefore, the Line will initialize by resting inside the Circle as a little dot. The Line will also be bound to the startX, endX, startY, and endY properties of its owning Edge.

pane {
    imageview(Image("europe.png")) {
        fitHeight = 1000.0
        fitWidth = 1000.0

        CitiesAndDistances.cities.forEach { city ->
            circle(city.x,city.y,10.0) {
                fill = Color.RED
            }
        }

        Model.edges.forEach { edge ->
            line {
                startXProperty().bind(edge.edgeStartX)
                startYProperty().bind(edge.edgeStartY)
                endXProperty().bind(edge.edgeEndX)
                endYProperty().bind(edge.edgeEndY)
                strokeWidth = 3.0
                stroke = Color.RED
            }
        }
    }
}

At this point, I should have something that renders like this:

When we animate, we will change each Edge’s startX, endX, startY, and endY properties. When we want to connect two cities, for instance, I can change the endX and endY properties to make that line extend to that other city’s coordinates.

Planning the Animation

With this structure in place, I did have to make a few considerations next. Should I animate the algorithm in live time or queue up the animations and make them replayable? Did I want to animate every single thing the algorithm did or filter out the noise and animate only the key events?

These decisions may seem unimportant at first glance, and I even told myself “why not animate everything?”. Of course, this backfired quickly because the animations already slow down the algorithm… and animating unproductive events in the algorithm only added noise. This also made the animation incredingly long and boring.

So what are unproductive events, you ask? Well, the algorithm works by doing thousands of random Edge swaps as explained in the video. When the swap did not improve the solution (or it failed the coin flip in the Simulated Annealing approach), I would undo the swap and put everything back. I learned it was best to not animate these events because most iterations were failed swaps, and it was better to animate successes to show progress rather than every iteration including failures.

Another adapatation I ultimtely made is running the algorithm first, and then animating the results. This had the benefit of being able to replay the results without having to run the entire process again. The key utility I needed in the JavaFX library is the SequentialTransition, which allows me to queue up animations and have them played in order (rather than all at once). I can then have my algorithm add animations to the SequentialTransition and it can be played later when it is done.

I stored each algorithm (“GREEDY”, “TWO-OPT”, “SIMULATED_ANNEALING”, etc) as an enumerable so I gave each one its own SequentialTransition. I also created some convenient extension functions so I could use += operators to add animations.

enum class SearchStrategy {

    RANDOM {
        ... 
    },

    GREEDY {
        ... 
    },

    REMOVE_OVERLAPS {
        ... 
    },
    TWO_OPT {
        ... 
    },

    SIMULATED_ANNEALING {
        ... 
    }
    
    val animationQueue = SequentialTransition()

    abstract fun execute()
}


// extension functions for SequentialTransition
operator fun SequentialTransition.plusAssign(timeline: Timeline) { children += timeline }
fun SequentialTransition.clear() = children.clear()
operator fun SequentialTransition.plusAssign(other: SequentialTransition) { children.addAll(other) }

And of course, I set a speed as a constant that defines how long each animation frame takes.

// animation parameters
var speed = 200.millis

Executing a Path Traversal

On the domain model side, I have Edgeitems that initially belong to one City. However, the startCity and endCity can be mutated and on each mutation, the Edge has an animateChange() function returning a deferred Timeline that will play that change.

But here is the interesting design decision I ended up doing. I created the edgeStartX, edgeStartY, edgeEndX, and edgeEndY to not be synchronized to their respective startCity and endCity. Rather, these are used purely for animation execution. When I decide to animate a change in the startCity or endCity, I call animateChange() to create a Timeline that animates the coordinate changes. It will take the current value in each JavaFX property holding the coordinate values, and animate it by gradually increasing/decreasing to the specified value in that amount of time (which is the speed of the KeyFrame).

Note though this Timeline does not execute, that is up to the function caller on how to use that animation.

class Edge(city: City) {

    val startCityProperty = SimpleObjectProperty(city)
    var startCity by startCityProperty

    val endCityProperty = SimpleObjectProperty(city)
    var endCity by endCityProperty

    val distance get() = CitiesAndDistances.distances[CityPair(startCity.id, endCity.id)]?:0.0

    // animated properties
    val edgeStartX = SimpleDoubleProperty(startCity.x)
    val edgeStartY = SimpleDoubleProperty(startCity.y)
    val edgeEndX = SimpleDoubleProperty(startCity.x)
    val edgeEndY = SimpleDoubleProperty(startCity.y)

    fun animateChange() = timeline(play = false) {
            keyframe(speed) {
                keyvalue(edgeStartX, startCity?.x ?: 0.0)
                keyvalue(edgeStartY, startCity?.y ?: 0.0)
                keyvalue(edgeEndX, endCity?.x ?: 0.0)
                keyvalue(edgeEndY, endCity?.y ?: 0.0)
                keyvalue(Model.distanceProperty, Model.totalDistance)
            }
        }
}

This particular function is used to expand an Edge for the first time to another city, which happens in the GREEDY and RANDOM algorithms. Stiching these together in a sequence results in a path traversing slickly to create a round-trip. Here is how the animateChange() function is leveraged in the RANDOM algorithm. Note how when I traverse to each random City, I connect each consecutive Edge pairs by their startcity and endCity respectively. Then I call animateChange() to return a Timeline and add it to the animationQueue.

RANDOM {
    override fun execute() {
        animationQueue.clear()

        val capturedCities = mutableSetOf<Int>()

        val startingEdge = Model.edges.sample()
        var edge = startingEdge

        while(capturedCities.size < CitiesAndDistances.cities.size) {
            capturedCities += edge.startCity.id

            val nextRandom = Model.edges.asSequence()
                    .filter { it.startCity.id !in capturedCities }
                    .sampleOrNull()?:startingEdge

            edge.endCity = nextRandom.startCity
            animationQueue += edge.animateChange()
            edge = nextRandom
        }
        
        Model.bestDistanceProperty.set(Model.totalDistance)
    }
}

My UI can then call animationQueue.play() to execute that change when the green play button is pressed.

Executing a Swap

Swaps are a bit more tricky than animating a path traversal. When TWO_OPT or SIMULATED_ANNEALING algorithms select random edges and try to swap their cities (vertices) somehow, sometimes it will fail and sometimes it will succeed. A failure can happen if a swap breaks the tour, and the reverse() function will be called. If it is successful, an animate() function can be called and return a Timeline that waits to be queued or executed.


class TwoSwap(val city1: City,
          val city2: City,
          val edge1: Edge,
          val edge2: Edge
) {

fun execute() {
    edge1.let { sequenceOf(it.startCityProperty, it.endCityProperty) }.first { it.get() == city1 }.set(city2)
    edge2.let { sequenceOf(it.startCityProperty, it.endCityProperty) }.first { it.get() == city2 }.set(city1)
}
fun reverse() {
    edge1.let { sequenceOf(it.startCityProperty, it.endCityProperty) }.first { it.get() == city2 }.set(city1)
    edge2.let { sequenceOf(it.startCityProperty, it.endCityProperty) }.first { it.get() == city1 }.set(city2)
}


fun animate() = timeline(play = false) {
        keyframe(speed) {
            sequenceOf(edge1,edge2).forEach {
                keyvalue(it.edgeStartX, it.startCity?.x ?: 0.0)
                keyvalue(it.edgeStartY, it.startCity?.y ?: 0.0)
                keyvalue(it.edgeEndX, it.endCity?.x ?: 0.0)
                keyvalue(it.edgeEndY, it.endCity?.y ?: 0.0)
            }
        }
        keyframe(1.millis) {
            sequenceOf(edge1,edge2).forEach {
                keyvalue(Model.distanceProperty, Model.totalDistance)
            }
        }
    }

}

fun attemptTwoSwap(otherEdge: Edge): TwoSwap? {

    val e1 = this
    val e2 = otherEdge

    val startCity1 = startCity
    val endCity1 = endCity
    val startCity2 = otherEdge.startCity
    val endCity2 = otherEdge.endCity

    return sequenceOf(
        TwoSwap(startCity1, startCity2, e1, e2),
        TwoSwap(endCity1, endCity2, e1, e2),

        TwoSwap(startCity1, endCity2, e1, e2),
        TwoSwap(endCity1, startCity2, e1, e2)

    ).filter {
        it.edge1.startCity !in it.edge2.let { setOf(it.startCity, it.endCity) } &&
            it.edge1.endCity !in it.edge2.let { setOf(it.startCity, it.endCity) }
    }
    .firstOrNull { swap ->
        swap.execute()
        val result = Model.tourMaintained
        if (!result) {
        swap.reverse()
        }
        result
    }
}

This can then be used for the TWO_OPT and SIMULATED_ANNEALING algorithms. Note that for both these algorithms I start by cleaning the animationQueue, execute the RANDOM algorithm and take all of its animations, and add them to this algorithm’s animations. For the TWO_OPT, I then attempt 2000 random swaps and only add animations that improve the distance of the tour. Otehrwise I call reverse() and do not animate the swap (as if it never happened).

TWO_OPT {
    override fun execute() {
        animationQueue.clear()
        SearchStrategy.RANDOM.execute()
        animationQueue += SearchStrategy.RANDOM.animationQueue

        (1..2000).forEach { iteration ->
        Model.edges.sampleDistinct(2).toList()
                .let { it.first() to it.last() }
                .also { (e1,e2) ->

                    val oldDistance = Model.totalDistance
                    e1.attemptTwoSwap(e2)?.also {
                        when {
                            oldDistance <= Model.totalDistance -> it.reverse()
                            oldDistance > Model.totalDistance -> animationQueue += it.animate()
                        }
                    }
                }
        }
        Model.distanceProperty.set(Model.totalDistance)
        Model.bestDistanceProperty.set(Model.totalDistance)

        println("TWO-OPT BEST DISTANCE: ${Model.totalDistance}")

    }
}

The SIMULATED_ANNEALING, at least on the animation front, is not much different. I essentially add a further condition to animate inferior moves that pass the coin flip. Swaps that do not are again reversed.

// Desmos graph for intuition: https://www.desmos.com/calculator/rpfpfiq7ce
if (weightedCoinFlip(
                exp((-(newDistance - bestDistance)) / temperature)
        )
) {
    animationQueue += swap.animate()

} else {
    swap.reverse()
}

As soon as the algorithm is done, call animationQueue.play() on the desired SearchStrategy enumerable and watch the fireworks.