Jumping Sumo – Testing the Network

Once the network has been trained, it’s time to take it for a test run. The necessary activity is PilotActivity. After connecting to the Jumping Sumo (JS), it then loads the network and the analyst (necessary for input normalization and output denormalization) from the files placed in the raw directory (these are created by Encog after training). I’ve used a singleton pattern here for the network because the load is time expensive and doing it multiple times gets annoying really fast.

The inner workings of this activity is quite simple. When the user hits start, successive frames are fed to the network which then uses that as well as the previous speed values to calculate the next values. I setup two views so I could see the actual image as well as what the thresholded image looks like. The activity is setup such that if a new frame comes in before any previous frame is used for prediction, the old frame is overwritten.I’ve also used a timer to schedule calls to the autopilot’s move function every 40ms (this is by pure experimentation and probably is not the best way to go).

Whenever the AutoPilot takes/consumes a frame, it sets nextFrame field to null. This is important because the move method may be called before another frame is available. In this case I’ve taken a naive approach and simply dampened the output speeds by 1/2 its previous value. Now you may be thinking is this the best way to go? Honestly I don’t think so. This is my first foray into messing with or controlling hardware. I just needed to see the JS make a complete lap and I would be validated. I’ve recently heard about control loops and how they’re appropriate for these kinds of tasks and may incorporate one in future. For now, the code produces a satisfactory result and I was/am happy. Once again, here’s what it looks like:

These are links to the related posts:
Getting Ready
Collecting Training Data
Training the Neural Network
Testing the Neural Network

The full source code can also be found here and here.

Please leave a comment if you have a question.

Jumping Sumo – Collecting Training Data

For this tutorial, we’ll be using a form of training known as supervised learning. In other words, you provide the network with inputs and the corresponding expected outputs and it is to learn how to produce a correct output (or close enough) given some input. The next step is to decide what our inputs and outputs will be.

To control the drone, you programmatically set the turn and speed values and then set a flag which causes it to take those values into account. We will be tracking previous and current values of the turn and speed with the MotionData class. The JS also has a camera from which we can receive successive frames. Whenever we receive a frame, we add it along with a copy of the current MotionData as a pair to the MotionRecorder’s queue. MotionRecorder works hand in hand with the Consumer class (in a producer-consumer like fashion) to store data for different runs on the external memory of the device. The previous values in the MotionData along with the image will be used as inputs to the network. The expected output will be the current values.

For easy parsing, image files are saved in this format: runn1_n2_n3_n4_n5_n6.png.
n1 is the run number. You can have multiple training runs.
n2 is the zero based index of the image.
n3 and n4 are the previous values of the turn and speed respectively (network inputs)
n5 and n6 are the current values of the turn and speed respectively (network outputs)

Controlling the Drone using a Controller

If you’re like me, you’ll find that controlling a drone via a phone’s gyroscope is quite daunting. After several attempts at that, I opted to use a controller instead. I used a ps4 controller but any that is Bluetooth enabled should work as well. Android treats controller input like any other input. It then becomes necessary to verify that the input is indeed from a controller and then proceed accordingly.

// TrainActivity.java
public boolean onGenericMotionEvent(MotionEvent event) {
    // Check that input came from a game controller
    if ((event.getSource() & InputDevice.SOURCE_JOYSTICK) == InputDevice.SOURCE_JOYSTICK &&
            event.getAction() == MotionEvent.ACTION_MOVE) {
        // Process all historical movement samples in the batch
        final int historySize = event.getHistorySize();
        for (int i = 0; i < historySize; i++) {
            processJoystickIInput(event, i);

        // Process the current movement sample in the batch (position -1)
        processJoystickIInput(event, -1);

        return true;

    return super.onGenericMotionEvent(event);

private void processJoystickIInput(MotionEvent event, int historyPos) {
    InputDevice input = event.getDevice();

    // Calculate the horizontal distance to move by using the input value from the right joystick
    float x = getCenteredAxis(event, input, MotionEvent.AXIS_Z, historyPos);
    float y = getCenteredAxis(event, input, MotionEvent.AXIS_Y, historyPos);

    // Move drone
    byte turnSpeed = (byte) (MAX_TURN_SPEED * x);
    byte forwardSpeed = (byte) (MAX_FORWARD_SPEED * -y);

    // Stop if no joystick motion
    if (x == 0 && y == 0) {
        mJSDrone.setFlag((byte) 0);
    } else {
        mJSDrone.setFlag((byte) 1);

    motionData.updateMotion(forwardSpeed, turnSpeed);

getCenteredAxis() returns the displacement of the stick indicated from its center. It uses the default implementation in the android developer website.

Processing Images Before Storing

I read somewhere on the internet that neural networks that work with images sometimes do better with grayscale images ūüôā As a result, the images that are received are decoded to a grayscale representation using the Imgcodecs.imdecode() from OpenCV.

By default, the JS takes captures frames at a 640px x 480px resolution. If we were to use it at this scale, training would be extremely slow. Each pixel corresponds to a network input so there would be 307200 pixels for each image and each training run saves several hundred images. As a result, I resized the downscaled image to 32px x 24px using  Imgproc.resize().

Finally, I noticed that training in areas with different lighting led to different results. To reduce this discrepancy, I used a technique known as binary thresholding to set all pixels below a certain threshold (arbitrarily chosen as 160 in this case) to black and all above to white. This way, the white paper used to form the track stands out from its surroundings. The image is then stored in the previously mentioned format using Imgcodecs.imwrite().

// Consumer.java
private void process(Pair<MotionData, byte[]> pair) {
    MotionData motion = pair.first;
    byte[] data = pair.second;

    String fileName = String.format(Locale.US, "run%d_%d_%d_%d_%d_%d.png",
            runNumber, imageNumber,
            motion.getPrevTurnSpeed(), motion.getPrevForwardSpeed(),
            motion.getTurnSpeed(), motion.getForwardSpeed());
    File file = new File(outDir, fileName);

    Mat img = Imgcodecs.imdecode(new MatOfByte(data), Imgcodecs.CV_LOAD_IMAGE_GRAYSCALE);
    Imgproc.resize(img, img, new Size(32, 24));
    Thresholder.Threshold(img, img);
    Imgcodecs.imwrite(file.getAbsolutePath(), img);
    Log.i(TAG, "Wrote data: " + fileName);


    // Notify listeners
    for (MotionRecorder.QueueUpdateListener listener : listeners) {
// Thresholder.java
public class Thresholder {
    public static final int MAXVAL = 255;
    public static final int THRESHOLD = 160;

    public static void Threshold(Mat src, Mat dst) {
        Imgproc.threshold(src, dst, THRESHOLD, MAXVAL, Imgproc.THRESH_BINARY);

Here’s what the initial image looks like:

Original image before any processing

Here’s the image after thresholding has been applied:

Image after it’s been grayscaled and thresholding has been applied

Once the training data has been collected, the next step is to design the neural network and train it.

These are links to the related posts:
Getting Ready
Collecting Training Data
Training the Neural Network
Testing the Neural Network

The full source code can also be found here and here.

Please leave a comment if you have a question.

Jumping Sumo – Getting Ready

Jumping Sumo

To begin with, you’ll need the Jumping Sumo drone by Parrot which can be purchased here. You connect to it via WiFi and can control it with the FreeFlight 3 app available on the app store. For this tutorial, I’ll be using an android device. The Parrot SDK allows you to control the drone via code and you can add it to you android project by following the instructions here.


Encog is a machine learning framework by Jeff Heaton with support for various types of neural networks and training algorithms. It is fully implemented in Java with support for parallel processing which speeds up the algorithms. There are several other frameworks available like Neuroph, DeepLearning4j, Tensorflow… I chose Encog because Jeff’s books got me interested in Neural Networks in the first place. To use it, download the sources off GitHub.


OpenCV stands for Open Computer Vision and is a library of functions that are useful for computer vision. To keep things simple, whenever I used a function from this library, I’ll try to explain exactly what it does and why it was used to the best of my knowledge. See here for instructions on adding OpenCV to your android project.

Complete Source

I find that skipping introductions and going straight to code can be helpful sometimes. If this applies to you, the source code for the tutorial can be found here.

These are links to the related posts:
Getting Ready
Collecting Training Data
Training the Neural Network
Testing the Neural Network

The full source code can also be found here and here.

Please leave a comment if you have a question.

Jumping Sumo – Introduction

Having been recently introduced to neural networks, my fascination for the concept has grown quite a lot. I find it “interesting” that you can train a network to output reasonable results whilst not fully grasping how exactly the results are obtained. Having watched a lot of YouTube videos demonstrating robots of various kinds and shapes being controlled autonomously, I figured it was time I attempted one of my own.

I know very little about hardware (perhaps nothing other than a Modern Digital Systems Design class I took several years ago) and did not want to mess around with it. I needed a device which was programmable out of the box. Ergo, the Jumping Sumo by Parrot.I wanted to reproduce something similar to this but just have the drone move around the track (just once would be a confirmation that my efforts were not wasted). In the next few posts, I’ll describe the various steps involved in setting up, collecting data, training a neural network and finally testing it out. For eye candy, here are my results (not the best but hey, it’s a first).

These are links to the related posts:
Getting Ready
Collecting Training Data
Training the Neural Network
Testing the Neural Network

The full source code can also be found here and here.

Please leave a comment if you have a question.

Drawing binary trees


In this post, I’ll lay out a simple algorithm which can be used to draw binary trees – it can be extended to draw other tree types as well. To be clear,¬†a binary tree is a tree data structure in which each node has at most two children (Wikipedia).

As is convention, we’ll be drawing the tree top-down with the root at the top and I’ll use “left” and “right” to designate the left and right children of a node respectively. To make things easier, let us assume each node has a fixed diameter,¬†d. We also want each node in the same level of the tree to be centered at the same vertical position, so we’ll define h to represent the separation between levels. Finally, we want¬†subtrees to be at a distance w from each other (assume w is always greater than d). The following diagram clarifies the convention I’ve chosen.

Diameter (d), level height (h), and child separation (w)

Notice that I’ve drawn boxes around the nodes. I’ll refer to those as bounding boxes and you’ll understand their usage shortly.



In order to position a node, we need to know how large its children are so that we can center it. This naturally tends to recursion because to know how large a child is, we will need to know how large its children are also and so on till we get to a leaf node whose size is d x d. We will use the width and height of a bounding box to represent this size. Do remember that the bounding boxes of the left and right children are separated by a distance w.

In other words, we need to do a post-order traversal of the tree and in each step, we obtain the size of a sub-tree. If you’re not familiar with it, this means you visit the left node first, then the right node and then the current node (L-R-C). Here’s the pseudocode:

function setSize() : void
  if left is not null then left.setSize()
  if right is not null then right.setSize()

  if both children are not null then
    width = left.width + w + right.width
    height = h + max(left.height, right.height)
  else if left is not null then # left child only
    width = left.width + (w + d) / 2
    height = left.height + h
  else if right is not null then # right child only
    width = right.width + (w + d) / 2
    height = right.height + h
  else # leaf node
    width = height = d
  end if
end function


As an illustration, consider the diagram above. The labels indicate the order in which the bounding rectangles are measured.

  1. At the root, you call left.setSize() which then calls left.setSize() and finally we get to node H. Since H is a leaf node, it’s size (width x height)¬†is d x d.
  2. Since D has no left node, we add¬† half the separation and half the diameter to get its width. It’s height is simply H’s height + h.
  3. Since this is a post-order traversal, the next node to measure is E and since it’s a leaf node, it’s size is also d x d.
  4. The width of B is then sum of the widths of it’s sub-trees and w and it’s height is the greater of the heights of its children plus h.
  5. We then go to the right sub-tree of A (the tree rooted at C). This calls left.setSize() and since F is a leaf node, it’s size is d x d.
  6. At the right of C is G and we go to it’s left I which is a leaf node – d x d.
  7. The size of G is calculated in a fashion similar to that of D
  8. C’s size is calculated similar to B’s.
  9. A’s size (which is the size of the tree) is calculated in a manner similar to B.

Positioning and Drawing

Once the tree’s size has been accurately determined, the next step is to position and draw each node. The algorithm we use here assumes that the x and y coordinates of the center of the current node is specified. Using this information, we then figure out how to place the children so that the current node appears to be in the middle of it’s bounding box.

Here’s a summary:

  1. If a node has a left child only or right child only the horizontal distance between the center of the node and that child (w + d) / 2
  2. If it has both children, then the placement of the children, depends on the placement of nature of the subtree rooted at its children.
    1. If the child has both children, then place it such that it is centered in its bounding box.
    2. If it has a single child, then it should be positioned at either the left or right end of its bounding box depending on which child it has (see image above for clarification.
    3. If it a leaf, center it in its bounding box.
function drawNode(x: real, y: real)
  # draw this node
  if both children are not null then
    # next level is h below current
    childY = y + h

    # center both children in their bounding boxes
    # left end of bounding box
    leftStart = x - width
    if left has both children then
      leftX = leftStart + left.width / 2
    else if left has a left child then
      leftX = leftStart + left.width - d / 2
      leftX = leftStart + d / 2 # position at top-left of bounding box
    end if

    # right end of bounding box
    rightEnd = leftStart + width
    if right has both children then
      rightX = rightEnd - right.width / 2
    else if right has a right child then
      rightX = rightEnd - right.width  + d / 2
      rightX = rightEnd - d / 2
    end if

    left.drawNode(leftX, childY)
    right.drawNode(rightX, childY)
    # draw lines connecting the children to the current node
  else if left is not null then
    leftX = x - (w + d) / 2
    left.drawNode(leftX, childY)
    # draw line connecting left node to this node
  else if right is not null then
    rightX = x + (w + d) / 2
    right.drawNode(rightX, childY)
    #draw line connecting right node to this node
  end if
end function

Finally, since we all love to see proofs of concept, here’s a¬†GitHub repository that demonstrates a working version of the algorithm implemented in¬†C# using WPF. I hope you found this to be helpful!