Jumping Sumo – Testing the Network

Once the network has been trained, it’s time to take it for a test run. The activity responsible for this is PilotActivity. After connecting to the Jumping Sumo (JS), it loads the network and the analyst (needed for input normalization and output denormalization) from files placed in the raw directory (these are created by Encog after training). I’ve used a singleton pattern for the network because loading it is time-expensive and doing it multiple times gets annoying really fast.
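
Here’s a minimal sketch of that singleton idea, assuming the files are bundled as raw resources and loaded with Encog’s EncogDirectoryPersistence and EncogAnalyst stream loaders; the class name and resource names (R.raw.network, R.raw.analyst) are hypothetical:

// NetworkHolder.java (hypothetical sketch) – caches the network and analyst
// so the expensive deserialization happens only once per process
import java.io.InputStream;
import org.encog.app.analyst.EncogAnalyst;
import org.encog.neural.networks.BasicNetwork;
import org.encog.persist.EncogDirectoryPersistence;
import android.content.Context;

public final class NetworkHolder {
    private static BasicNetwork network;
    private static EncogAnalyst analyst;

    private NetworkHolder() { }

    public static synchronized BasicNetwork getNetwork(Context ctx) {
        if (network == null) {
            InputStream in = ctx.getResources().openRawResource(R.raw.network);
            network = (BasicNetwork) EncogDirectoryPersistence.loadObject(in);
        }
        return network;
    }

    public static synchronized EncogAnalyst getAnalyst(Context ctx) {
        if (analyst == null) {
            analyst = new EncogAnalyst();
            analyst.load(ctx.getResources().openRawResource(R.raw.analyst));
        }
        return analyst;
    }
}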

The inner workings of this activity are quite simple. When the user hits start, successive frames are fed to the network, which uses them along with the previous speed values to calculate the next values. I set up two views so I could see the actual image as well as what the thresholded image looks like. The activity is set up such that if a new frame comes in before the previous frame has been used for prediction, the old frame is overwritten. I’ve also used a timer to schedule calls to the autopilot’s move function every 40ms (this value came from pure experimentation and is probably not the best way to go).
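
The scheduling itself is just a java.util.Timer; roughly (autoPilot here is a hypothetical shorthand for the autopilot instance):

// Schedule the autopilot's move() every 40 ms (experimentally chosen)
Timer timer = new Timer();
timer.scheduleAtFixedRate(new TimerTask() {
    @Override
    public void run() {
        autoPilot.move();
    }
}, 0, 40);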

Whenever the AutoPilot consumes a frame, it sets the nextFrame field to null. This is important because the move method may be called before another frame is available. In that case I’ve taken a naive approach and simply dampened each output speed to half its previous value. Now you may be thinking: is this the best way to go? Honestly, I don’t think so. This is my first foray into controlling hardware; I just needed to see the JS make a complete lap and I would be validated. I’ve recently heard about control loops and how they’re appropriate for these kinds of tasks, and I may incorporate one in the future. For now, the code produces a satisfactory result and I was/am happy. Once again, here’s what it looks like:
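
To make the damping concrete, here’s a sketch of the idea behind move(); the helpers (buildInput, denormalizeTurn, denormalizeSpeed) are hypothetical stand-ins for the analyst-backed normalization code:

// AutoPilot.java (sketch) – called every 40 ms by the timer
public void move() {
    byte[] frame;
    synchronized (this) {
        frame = nextFrame;
        nextFrame = null; // consume the frame so it's never predicted on twice
    }

    if (frame == null) {
        // No new frame since the last call: dampen the outputs to half
        turn = (byte) (turn / 2);
        speed = (byte) (speed / 2);
    } else {
        // Normalize previous speeds + pixels via the analyst, run the
        // network, then denormalize the two outputs
        double[] input = buildInput(frame, turn, speed);
        double[] output = new double[2];
        network.compute(input, output);
        turn = denormalizeTurn(output[0]);
        speed = denormalizeSpeed(output[1]);
    }

    mJSDrone.setTurn(turn);
    mJSDrone.setSpeed(speed);
    mJSDrone.setFlag((byte) 1);
}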


These are links to the related posts:
Introduction
Getting Ready
Collecting Training Data
Training the Neural Network
Testing the Neural Network

The full source code can also be found here and here.

Please leave a comment if you have a question.


Jumping Sumo – Training the Network

Adding False Images to the Dataset

When training a network, valid datasets teach the neural network how it should behave. However, if all it ever sees is valid data, the network will probably not perform well when presented with new or unexpected data. In our scenario, on-track data is valid and everything else is invalid. It would be helpful if we could get the minidrone to stop when it is unable to identify a track.

For this purpose, I went around taking random pictures. These were then resized, thresholded and saved in the same format as the valid data but with zeros for the previous and current values of turn and speed. See InvalidConverter.java.
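A sketch of what that conversion amounts to, reusing the same OpenCV calls as the training-data pipeline (paths and counters here are illustrative):

// InvalidConverter.java (sketch) – turn a random photo into an "off-track"
// sample whose motion values are all zero (i.e. "stop")
Mat img = Imgcodecs.imread(srcPath, Imgcodecs.CV_LOAD_IMAGE_GRAYSCALE);
Imgproc.resize(img, img, new Size(32, 24));
Thresholder.Threshold(img, img);
String name = String.format(Locale.US, "run%d_%d_0_0_0_0.png", runNumber, imageNumber++);
Imgcodecs.imwrite(new File(outDir, name).getAbsolutePath(), img);
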

Saving the Dataset to a CSV File

Once all the data is in the same format, the next step is to convert it to a .csv (comma separated value) file with each line representing one data item. The first two numbers are the previous turn and speed values. The next 768 values are the pixels of the 32 × 24 image, and the last two are the current values of turn and speed. The code in Generator.java performs this conversion. I’ve chosen the .csv format because that’s the easiest format to feed Encog with (particularly the workbench – a GUI that simplifies interacting with Encog).
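
As a sketch, building one CSV row from an image and its motion values looks roughly like this (writer and the motion variables are assumed to be set up elsewhere):

// One row: prev turn, prev speed, 768 pixels, current turn, current speed
StringBuilder row = new StringBuilder();
row.append(prevTurn).append(',').append(prevSpeed);
for (int y = 0; y < img.rows(); y++) {
    for (int x = 0; x < img.cols(); x++) {
        row.append(',').append((int) img.get(y, x)[0]); // 0 or 255 after thresholding
    }
}
row.append(',').append(turn).append(',').append(speed);
writer.write(row.toString());
writer.newLine();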

Choice of Neural Network

Now that the data is in a consistent format, it’s time to pick a network and a training method. For this example, we are interested in predicting the next values of turn and speed based on the previous values as well as the camera image from the drone. This means 770 inputs and 2 outputs. I have chosen a feedforward network trained using resilient propagation (there are several other training methods; I’ve chosen this one because it eliminates the need to choose a learning rate or momentum). Based on the number of inputs and outputs, the network will have 770 neurons in the input layer and 2 neurons in the output layer. The big question becomes: how do we choose the number of hidden layers and the number of neurons in each of them?

Sadly, there’s no established method of doing this other than experimentation. I initially started out with a 770-1155-2 network with HyperTan activation in the hidden layer and linear activation in the output layer. However, varying the number of neurons in the hidden layer did not yield any positive results; I always had above 100% error. This is the part where patience comes in handy. After several days of no luck I almost gave up on the project. Eventually I decided to switch to a network with 2 hidden layers. After several trials, I settled on a 770-500-100-2 network and was able to get below 5% error.
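
For readers who prefer code to the workbench, here’s roughly what that final network and training setup correspond to in Encog (a sketch assuming TANH activations in the hidden layers, as in my earlier attempts, and a trainingSet already built from the CSV):

BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, 770));                  // input
network.addLayer(new BasicLayer(new ActivationTANH(), true, 500));  // hidden 1
network.addLayer(new BasicLayer(new ActivationTANH(), true, 100));  // hidden 2
network.addLayer(new BasicLayer(new ActivationLinear(), false, 2)); // output
network.getStructure().finalizeStructure();
network.reset();

// Resilient propagation: no learning rate or momentum to tune
ResilientPropagation train = new ResilientPropagation(network, trainingSet);
do {
    train.iteration();
} while (train.getError() > 0.05); // stop once below 5% error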

Training with the Encog Analyst

One of the convenient things about Encog is that it can analyze your data file and normalize it for you. Using the workbench, you can pick a goal (in our case regression) and have it generate a .ega (Encog Analyst) file which describes the data as well as several tasks to perform (like splitting it into training and validation sets, randomizing it and so on).

To use the workbench, simply download the sources and run it. Create a project and then drag the .csv file with your data into it. Right-clicking on the data file gives you the option to analyze it, and a window comes up allowing you to set any necessary options.

encog_options

Once you hit the OK button, an analyst file is generated. Sadly, Encog sometimes misclassifies some inputs as classes rather than continuous values. You can fix these by manually editing the file. It’s also necessary to specify the correct inputs and outputs. The analyst file also contains the network definition and target error. Here’s a copy of the analyst file that I used for training. Once it’s all set up, hitting execute performs all the defined tasks and trains the network. Encog also allows you to stop a command (e.g., training) once you think the error is acceptable. I was satisfied with a 5% error.
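
If you’d rather skip the workbench entirely, Encog’s AnalystWizard can generate an equivalent analyst script in code; a rough sketch (file names are illustrative, and I’m assuming "task-full" as the default task name the wizard generates):

EncogAnalyst analyst = new EncogAnalyst();
AnalystWizard wizard = new AnalystWizard(analyst);
wizard.setGoal(AnalystGoal.Regression);
wizard.wizard(new File("training.csv"), false, AnalystFileFormat.DECPNT_COMMA);
analyst.save(new File("training.ega"));
analyst.executeTask("task-full"); // runs the same tasks the workbench would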

These are links to the related posts:
Introduction
Getting Ready
Collecting Training Data
Training the Neural Network
Testing the Neural Network

The full source code can also be found here and here.

Please leave a comment if you have a question.

Jumping Sumo – Collecting Training Data

For this tutorial, we’ll be using a form of training known as supervised learning. In other words, you provide the network with inputs and the corresponding expected outputs, and it learns to produce a correct output (or close enough) given some input. The next step is to decide what our inputs and outputs will be.

To control the drone, you programmatically set the turn and speed values and then set a flag which causes the drone to take those values into account. We track the previous and current values of the turn and speed with the MotionData class (sketched below). The JS also has a camera from which we can receive successive frames. Whenever we receive a frame, we add it along with a copy of the current MotionData as a pair to the MotionRecorder’s queue. MotionRecorder works hand in hand with the Consumer class (in producer-consumer fashion) to store data for different runs in the device’s external storage. The previous values in the MotionData along with the image will be used as inputs to the network; the expected outputs will be the current values.
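
Here’s roughly the shape of MotionData (a sketch; see the linked source for the real class):

// MotionData.java (sketch) – holds the current and previous turn/speed pairs
public class MotionData {
    private byte prevTurnSpeed, prevForwardSpeed;
    private byte turnSpeed, forwardSpeed;

    public synchronized void updateMotion(byte forward, byte turn) {
        prevTurnSpeed = turnSpeed;       // current values become previous...
        prevForwardSpeed = forwardSpeed;
        turnSpeed = turn;                // ...and the new values become current
        forwardSpeed = forward;
    }

    public byte getPrevTurnSpeed() { return prevTurnSpeed; }
    public byte getPrevForwardSpeed() { return prevForwardSpeed; }
    public byte getTurnSpeed() { return turnSpeed; }
    public byte getForwardSpeed() { return forwardSpeed; }
}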

For easy parsing, image files are saved in this format: run<n1>_<n2>_<n3>_<n4>_<n5>_<n6>.png (a parsing sketch follows the list).
n1 is the run number. You can have multiple training runs.
n2 is the zero-based index of the image.
n3 and n4 are the previous values of the turn and speed respectively (network inputs).
n5 and n6 are the current values of the turn and speed respectively (network outputs).
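
Pulling the fields back out of a name is then trivial; an illustrative sketch (the values are made up):

// e.g. run1_42_-30_25_-28_25.png
String name = "run1_42_-30_25_-28_25.png";
String[] parts = name.substring(3, name.length() - 4).split("_"); // strip "run" and ".png"
int run = Integer.parseInt(parts[0]);        // n1
int index = Integer.parseInt(parts[1]);      // n2
byte prevTurn = Byte.parseByte(parts[2]);    // n3
byte prevSpeed = Byte.parseByte(parts[3]);   // n4
byte turn = Byte.parseByte(parts[4]);        // n5
byte speed = Byte.parseByte(parts[5]);       // n6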

Controlling the Drone using a Controller

If you’re like me, you’ll find that controlling a drone via a phone’s gyroscope is quite daunting. After several attempts at that, I opted to use a controller instead. I used a PS4 controller, but any Bluetooth-enabled controller should work as well. Android treats controller input like any other input, so it becomes necessary to verify that the input is indeed from a controller and then proceed accordingly.

// TrainActivity.java
public boolean onGenericMotionEvent(MotionEvent event) {
    // Check that input came from a game controller
    if ((event.getSource() & InputDevice.SOURCE_JOYSTICK) == InputDevice.SOURCE_JOYSTICK &&
            event.getAction() == MotionEvent.ACTION_MOVE) {
        // Process all historical movement samples in the batch
        final int historySize = event.getHistorySize();
        for (int i = 0; i < historySize; i++) {
            processJoystickInput(event, i);
        }

        // Process the current movement sample in the batch (position -1)
        processJoystickInput(event, -1);

        return true;
    }

    return super.onGenericMotionEvent(event);
}

private void processJoystickInput(MotionEvent event, int historyPos) {
    InputDevice input = event.getDevice();

    // Read the stick displacements: AXIS_Z for horizontal (turn),
    // AXIS_Y for vertical (forward speed)
    float x = getCenteredAxis(event, input, MotionEvent.AXIS_Z, historyPos);
    float y = getCenteredAxis(event, input, MotionEvent.AXIS_Y, historyPos);

    // Move drone
    byte turnSpeed = (byte) (MAX_TURN_SPEED * x);
    byte forwardSpeed = (byte) (MAX_FORWARD_SPEED * -y);

    // Stop if no joystick motion
    if (x == 0 && y == 0) {
        mJSDrone.setFlag((byte) 0);
    } else {
        mJSDrone.setSpeed(forwardSpeed);
        mJSDrone.setTurn(turnSpeed);
        mJSDrone.setFlag((byte) 1);
    }

    motionData.updateMotion(forwardSpeed, turnSpeed);
}

getCenteredAxis() returns the displacement of the indicated stick axis from its center. It uses the default implementation from the Android developer website.
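
For completeness, this is what that implementation looks like (adapted from the Android documentation):

private static float getCenteredAxis(MotionEvent event, InputDevice device,
        int axis, int historyPos) {
    final InputDevice.MotionRange range = device.getMotionRange(axis, event.getSource());

    // A joystick at rest does not always report an absolute position of
    // (0,0). Use getFlat() to find the range of values that bounds the
    // joystick axis center.
    if (range != null) {
        final float flat = range.getFlat();
        final float value = historyPos < 0
                ? event.getAxisValue(axis)
                : event.getHistoricalAxisValue(axis, historyPos);

        // Ignore axis values within the 'flat' region around the center
        if (Math.abs(value) > flat) {
            return value;
        }
    }
    return 0;
}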

Processing Images Before Storing

I read somewhere on the internet that neural networks that work with images sometimes do better with grayscale images 🙂 As a result, incoming frames are decoded to a grayscale representation using Imgcodecs.imdecode() from OpenCV.

By default, the JS captures frames at a 640px × 480px resolution. If we were to use them at this scale, training would be extremely slow: each pixel corresponds to a network input, so there would be 307,200 inputs for each image, and each training run saves several hundred images. As a result, I resized the grayscale image down to 32px × 24px using Imgproc.resize().

Finally, I noticed that training in areas with different lighting led to different results. To reduce this discrepancy, I used a technique known as binary thresholding to set all pixels below a certain threshold (arbitrarily chosen as 160 in this case) to black and all above to white. This way, the white paper used to form the track stands out from its surroundings. The image is then stored in the previously mentioned format using Imgcodecs.imwrite().

// Consumer.java
private void process(Pair<MotionData, byte[]> pair) {
    MotionData motion = pair.first;
    byte[] data = pair.second;

    String fileName = String.format(Locale.US, "run%d_%d_%d_%d_%d_%d.png",
            runNumber, imageNumber,
            motion.getPrevTurnSpeed(), motion.getPrevForwardSpeed(),
            motion.getTurnSpeed(), motion.getForwardSpeed());
    File file = new File(outDir, fileName);

    Mat img = Imgcodecs.imdecode(new MatOfByte(data), Imgcodecs.CV_LOAD_IMAGE_GRAYSCALE);
    Imgproc.resize(img, img, new Size(32, 24));
    Thresholder.Threshold(img, img);
    Imgcodecs.imwrite(file.getAbsolutePath(), img);
    Log.i(TAG, "Wrote data: " + fileName);

    imageNumber++;

    // Notify listeners
    for (MotionRecorder.QueueUpdateListener listener : listeners) {
        listener.onItemConsumed();
    }
}

// Thresholder.java
public class Thresholder {
    public static final int MAXVAL = 255;
    public static final int THRESHOLD = 160;

    public static void Threshold(Mat src, Mat dst) {
        Imgproc.threshold(src, dst, THRESHOLD, MAXVAL, Imgproc.THRESH_BINARY);
    }
}

Here’s what the initial image looks like:

sumo_view
Original image before any processing

Here’s the image after thresholding has been applied:

thresholded
Image after it’s been grayscaled and thresholding has been applied

Once the training data has been collected, the next step is to design the neural network and train it.

These are links to the related posts:
Introduction
Getting Ready
Collecting Training Data
Training the Neural Network
Testing the Neural Network

The full source code can also be found here and here.

Please leave a comment if you have a question.

Jumping Sumo – Getting Ready

Jumping Sumo

To begin with, you’ll need the Jumping Sumo drone by Parrot, which can be purchased here. You connect to it via WiFi and can control it with the FreeFlight 3 app available on the app store. For this tutorial, I’ll be using an Android device. The Parrot SDK allows you to control the drone via code, and you can add it to your Android project by following the instructions here.

Encog

Encog is a machine learning framework by Jeff Heaton with support for various types of neural networks and training algorithms. It is fully implemented in Java with support for parallel processing, which speeds up the algorithms. There are several other frameworks available, like Neuroph, DeepLearning4j and TensorFlow. I chose Encog because Jeff’s books got me interested in neural networks in the first place. To use it, download the sources off GitHub.

OpenCV

OpenCV stands for Open Source Computer Vision and is a library of functions that are useful for computer vision. To keep things simple, whenever I use a function from this library, I’ll try to explain exactly what it does and why it was used, to the best of my knowledge. See here for instructions on adding OpenCV to your Android project.

Complete Source

I find that skipping introductions and going straight to code can be helpful sometimes. If this applies to you, the source code for the tutorial can be found here.

These are links to the related posts:
Introduction
Getting Ready
Collecting Training Data
Training the Neural Network
Testing the Neural Network

The full source code can also be found here and here.

Please leave a comment if you have a question.

Jumping Sumo – Introduction

Having been recently introduced to neural networks, my fascination for the concept has grown quite a lot. I find it “interesting” that you can train a network to output reasonable results whilst not fully grasping how exactly the results are obtained. Having watched a lot of YouTube videos demonstrating robots of various kinds and shapes being controlled autonomously, I figured it was time I attempted one of my own.

I know very little about hardware (perhaps nothing beyond a Modern Digital Systems Design class I took several years ago) and did not want to mess around with it. I needed a device that was programmable out of the box; ergo, the Jumping Sumo by Parrot. I wanted to reproduce something similar to this, but just have the drone move around the track (even once would confirm that my efforts were not wasted). In the next few posts, I’ll describe the various steps involved in setting up, collecting data, training a neural network and finally testing it out. For eye candy, here are my results (not the best, but hey, it’s a first).


These are links to the related posts:
Introduction
Getting Ready
Collecting Training Data
Training the Neural Network
Testing the Neural Network

The full source code can also be found here and here.

Please leave a comment if you have a question.