Jumping Sumo – Collecting Training Data

For this tutorial, we’ll be using a form of training known as supervised learning. In other words, you provide the network with inputs and their corresponding expected outputs, and it learns to produce a correct output (or close enough) for a given input. The next step is to decide what our inputs and outputs will be.

To control the drone, you programmatically set the turn and speed values and then set a flag that causes the drone to take those values into account. We track the previous and current turn and speed values with the MotionData class. The JS also has a camera from which we receive successive frames. Whenever a frame arrives, we pair it with a copy of the current MotionData and add the pair to the MotionRecorder’s queue. MotionRecorder works hand in hand with the Consumer class (in producer-consumer fashion) to store the data for each run on the device’s external storage. The previous values in the MotionData, along with the image, will be the network’s inputs; the current values will be the expected outputs.
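
To make this concrete, here’s a minimal sketch of what a MotionData class along those lines could look like. The method names match those used by the code later in this post, but the body is my own illustration; the actual class (and whether it needs synchronization) may differ:

// MotionData.java (illustrative sketch)
public class MotionData {
    private byte prevForwardSpeed, prevTurnSpeed;
    private byte forwardSpeed, turnSpeed;

    // Shift the current values into the "previous" slots, then store the new ones
    public synchronized void updateMotion(byte newForwardSpeed, byte newTurnSpeed) {
        prevForwardSpeed = forwardSpeed;
        prevTurnSpeed = turnSpeed;
        forwardSpeed = newForwardSpeed;
        turnSpeed = newTurnSpeed;
    }

    public synchronized byte getPrevForwardSpeed() { return prevForwardSpeed; }
    public synchronized byte getPrevTurnSpeed() { return prevTurnSpeed; }
    public synchronized byte getForwardSpeed() { return forwardSpeed; }
    public synchronized byte getTurnSpeed() { return turnSpeed; }
}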

For easy parsing, image files are saved in this format: run<n1>_<n2>_<n3>_<n4>_<n5>_<n6>.png, where:
<n1> is the run number (you can have multiple training runs).
<n2> is the zero-based index of the image.
<n3> and <n4> are the previous values of the turn and speed respectively (network inputs).
<n5> and <n6> are the current values of the turn and speed respectively (network outputs).
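
When the training set is later built, these names can be split back into their numeric fields. A hypothetical parser (not part of the original code) might look like this:

// Parses e.g. "run3_17_-20_40_-10_40.png" into its six numeric fields.
// Illustrative only; the actual training pipeline may parse names differently.
public static int[] parseFileName(String fileName) {
    String stripped = fileName.substring("run".length(), fileName.lastIndexOf(".png"));
    String[] parts = stripped.split("_");
    int[] values = new int[parts.length];
    for (int i = 0; i < parts.length; i++) {
        values[i] = Integer.parseInt(parts[i]);
    }
    return values; // {run, index, prevTurn, prevSpeed, turn, speed}
}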

Controlling the Drone using a Controller

If you’re like me, you’ll find that controlling a drone via a phone’s gyroscope is quite daunting. After several attempts at that, I opted to use a controller instead. I used a PS4 controller, but any Bluetooth-enabled controller should work as well. Android treats controller input like any other input, so it becomes necessary to verify that the input did indeed come from a controller and then proceed accordingly.

// TrainActivity.java
public boolean onGenericMotionEvent(MotionEvent event) {
    // Check that input came from a game controller
    if ((event.getSource() & InputDevice.SOURCE_JOYSTICK) == InputDevice.SOURCE_JOYSTICK &&
            event.getAction() == MotionEvent.ACTION_MOVE) {
        // Process all historical movement samples in the batch
        final int historySize = event.getHistorySize();
        for (int i = 0; i < historySize; i++) {
            processJoystickInput(event, i);
        }

        // Process the current movement sample in the batch (position -1)
        processJoystickInput(event, -1);

        return true;
    }

    return super.onGenericMotionEvent(event);
}

private void processJoystickInput(MotionEvent event, int historyPos) {
    InputDevice input = event.getDevice();

    // Horizontal displacement of the right stick controls the turn
    float x = getCenteredAxis(event, input, MotionEvent.AXIS_Z, historyPos);
    // Vertical displacement of the left stick controls the forward speed
    float y = getCenteredAxis(event, input, MotionEvent.AXIS_Y, historyPos);

    // Move drone
    byte turnSpeed = (byte) (MAX_TURN_SPEED * x);
    byte forwardSpeed = (byte) (MAX_FORWARD_SPEED * -y);

    // Stop if no joystick motion
    if (x == 0 && y == 0) {
        mJSDrone.setFlag((byte) 0);
    } else {
        mJSDrone.setSpeed(forwardSpeed);
        mJSDrone.setTurn(turnSpeed);
        mJSDrone.setFlag((byte) 1);
    }

    motionData.updateMotion(forwardSpeed, turnSpeed);
}

getCenteredAxis() returns the displacement of the indicated stick axis from its center, ignoring values that fall within the controller’s flat (dead zone) region. It follows the reference implementation from the Android developer website:
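
// Reference implementation, as given in the Android developer documentation
private static float getCenteredAxis(MotionEvent event, InputDevice device,
                                     int axis, int historyPos) {
    final InputDevice.MotionRange range = device.getMotionRange(axis, event.getSource());
    if (range != null) {
        // The flat region defines a dead zone around the stick's center
        final float flat = range.getFlat();
        final float value = historyPos < 0
                ? event.getAxisValue(axis)
                : event.getHistoricalAxisValue(axis, historyPos);
        // Ignore displacements inside the dead zone to avoid drift
        if (Math.abs(value) > flat) {
            return value;
        }
    }
    return 0;
}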

Processing Images Before Storing

I read somewhere on the internet that neural networks that work with images sometimes do better with grayscale images 🙂 As a result, each received frame is decoded to a grayscale representation using Imgcodecs.imdecode() from OpenCV.

By default, the JS captures frames at a 640px x 480px resolution. If we were to use images at this scale, training would be extremely slow: each pixel corresponds to a network input, so there would be 307,200 inputs per image, and each training run saves several hundred images. As a result, I downscaled the grayscale images to 32px x 24px (just 768 pixels) using Imgproc.resize().

Finally, I noticed that training in areas with different lighting led to different results. To reduce this discrepancy, I used a technique known as binary thresholding: all pixels below a certain threshold (arbitrarily chosen as 160 in this case) are set to black and all pixels above it to white. This way, the white paper that forms the track stands out from its surroundings. The image is then stored under the previously mentioned file name format using Imgcodecs.imwrite().

// Consumer.java
private void process(Pair<MotionData, byte[]> pair) {
    MotionData motion = pair.first;
    byte[] data = pair.second;

    String fileName = String.format(Locale.US, "run%d_%d_%d_%d_%d_%d.png",
            runNumber, imageNumber,
            motion.getPrevTurnSpeed(), motion.getPrevForwardSpeed(),
            motion.getTurnSpeed(), motion.getForwardSpeed());
    File file = new File(outDir, fileName);

    Mat img = Imgcodecs.imdecode(new MatOfByte(data), Imgcodecs.CV_LOAD_IMAGE_GRAYSCALE);
    Imgproc.resize(img, img, new Size(32, 24));
    Thresholder.Threshold(img, img);
    Imgcodecs.imwrite(file.getAbsolutePath(), img);
    Log.i(TAG, "Wrote data: " + fileName);

    imageNumber++;

    // Notify listeners
    for (MotionRecorder.QueueUpdateListener listener : listeners) {
        listener.onItemConsumed();
    }
}

// Thresholder.java
public class Thresholder {
    public static final int MAXVAL = 255;
    public static final int THRESHOLD = 160;

    public static void Threshold(Mat src, Mat dst) {
        Imgproc.threshold(src, dst, THRESHOLD, MAXVAL, Imgproc.THRESH_BINARY);
    }
}

Here’s what the initial image looks like:

[Image: original image before any processing]

Here’s the image after thresholding has been applied:

[Image: after grayscaling and thresholding have been applied]

Once the training data has been collected, the next step is to design the neural network and train it.

These are links to the related posts:
Introduction
Getting Ready
Collecting Training Data
Training the Neural Network
Testing the Neural Network

The full source code can also be found here and here.

Please leave a comment if you have a question.
