Building Lightweight OCR Applications on Android with TFLite: Leveraging 1-Channel Grayscale Input with Keras Custom Dataset
In today’s mobile applications, optical character recognition (OCR) has become a powerful tool for extracting and utilizing text from images. Whether it’s for scanning receipts, translating text, or digitizing documents, integrating OCR into mobile apps enhances their functionality and user experience. TensorFlow Lite (TFLite) offers a streamlined way to deploy machine learning models on mobile devices, making it ideal for implementing OCR solutions. In this article, we will explore how to integrate a pre-trained and fine-tuned Keras OCR model into an Android application using TensorFlow Lite. We will walk through the necessary setup, preprocessing steps, and code implementation to achieve efficient and accurate text recognition on Android devices.
Introduction
Optical Character Recognition (OCR) technology enables applications to convert images of text into machine-encoded text. The Keras OCR model, available on GitHub, is a robust solution for this purpose, leveraging deep learning techniques to recognize text from images effectively. TensorFlow Lite, a lightweight version of TensorFlow, allows for the efficient deployment of these models on mobile devices, ensuring high performance and low latency.
In this guide, we will integrate a Keras OCR model into an Android app using TensorFlow Lite. This involves several steps:
- Setting Up Dependencies: We will configure the Gradle build file to include the required TensorFlow Lite dependencies.
- Preparing Your Model: We will place the pre-trained Keras OCR model in the appropriate directory of the Android project.
- Understanding the Keras OCR Model: We will review the model’s input and output formats to ensure proper handling.
- Image Preprocessing: We will preprocess the images by cropping, converting to grayscale, and adapting the format to suit the model’s input requirements.
- Code Implementation: We will provide code snippets to demonstrate how to load the model, preprocess images, run inference, and handle the output.
By following this guide, you will learn how to leverage TensorFlow Lite for efficient OCR processing on Android devices, making your applications more intelligent and versatile.
Setting Up Dependencies
To get started with TensorFlow Lite in your Android project, include the necessary dependencies in your build.gradle file. Add the following lines:
implementation("org.tensorflow:tensorflow-lite:2.13.0")
implementation("org.tensorflow:tensorflow-lite-support:0.4.4")
implementation("org.tensorflow:tensorflow-lite-gpu:2.13.0")
implementation("org.tensorflow:tensorflow-lite-select-tf-ops:2.13.0")
These dependencies provide support for TensorFlow Lite, including GPU acceleration and TensorFlow operations.
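If you want to use the GPU delegate, you can wire it up when creating the interpreter. Below is a minimal sketch, assuming the MappedByteBuffer comes from the loadModelFile() helper shown later in this article; note that with the select-tf-ops dependency on the classpath, the Flex delegate for TensorFlow ops is linked automatically when the interpreter initializes.
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate

// Sketch: create an interpreter with GPU acceleration enabled.
// modelBuffer is the MappedByteBuffer returned by loadModelFile() shown later.
val gpuDelegate = GpuDelegate()
val options = Interpreter.Options().apply {
    addDelegate(gpuDelegate) // GPU acceleration
    setNumThreads(4)         // CPU threads for ops the delegate does not handle
}
val interpreter = Interpreter(modelBuffer, options)
// Remember to call gpuDelegate.close() when the interpreter is no longer needed.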
Preparing Your Model
Place your pre-trained and fine-tuned Keras OCR model in the assets directory of your Android project. The model file should be named {name_your_model}.tflite, for example ocr_model.tflite. The path to this file should be app/src/main/assets/ocr_model.tflite.
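One easy-to-miss detail: because the model is memory-mapped from assets (see loadModelFile below), the .tflite file must not be compressed by AAPT. A small addition to the android block of build.gradle prevents that (syntax may vary slightly with your Android Gradle Plugin version):
android {
    // Keep .tflite assets uncompressed so they can be memory-mapped at runtime
    aaptOptions {
        noCompress("tflite")
    }
}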
Understanding the Keras OCR Model
The Keras OCR model is designed to detect and recognize text from images using deep learning techniques. To effectively integrate this model into your Android application, it’s crucial to understand its input and output formats.
The model expects input images with the shape (None, 31, 200, 1), which can be broken down as follows:
- None represents the batch size, allowing a variable number of images to be processed simultaneously.
- 31 is the height of the input image.
- 200 is the width of the input image.
- 1 indicates that the image is grayscale (a single channel).
For the output, the model produces a tensor with the shape (None, 48), detailed as:
- None again represents the batch size.
- 48 is the maximum length of the predicted character sequence; each element is the index of a character in the model's alphabet, or -1 for a blank/padding position, reflecting the sequential nature of text prediction.
Input and Output Example
- Input: The input image must be preprocessed to fit the dimensions expected by the model. This involves resizing or padding the image to 200x31 pixels (width x height) and converting it to a single-channel grayscale image.
- Output: The output tensor is a sequence of 48 character indices. Each element corresponds to one predicted character position; mapping the indices through the model's alphabet (and skipping blanks) yields the recognized string.
Understanding these dimensions helps in properly preparing the input data and interpreting the model’s predictions, ensuring accurate text recognition in your application.
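If you are unsure what shapes your exported .tflite file actually uses, the interpreter can tell you at runtime. Here is a quick sanity check, assuming the interpreter has already been created as shown later in this article:
// Sketch: log the model's actual tensor shapes to verify the assumptions above
val inputShape = interpreter.getInputTensor(0).shape()   // e.g. [1, 31, 200, 1]
val outputShape = interpreter.getOutputTensor(0).shape() // e.g. [1, 48]
Log.d("OCR", "input=${inputShape.contentToString()} output=${outputShape.contentToString()}")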
Image Preprocessing Steps
Before feeding an image into the TensorFlow Lite model, you need to preprocess it. Here’s a step-by-step approach:
- Crop the Bitmap: Extract the area of interest from the image based on bounding boxes. This focuses the OCR process on relevant parts of the image (a small crop helper is sketched after this list).
- Grayscale the Bitmap: Convert the image to grayscale. This simplifies the image by removing color information, which can be helpful for OCR.
- Convert to 1-Channel Input: The model expects a single-channel (grayscale) image. Convert the RGB image (3 channels) to a single channel.
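Cropping itself is a one-liner with Bitmap.createBitmap. Here is a minimal sketch of such a helper; the Rect is assumed to come from whatever text detector you use upstream, and the coercions simply clamp the box to the bitmap's bounds:
import android.graphics.Bitmap
import android.graphics.Rect

// Sketch: crop a detected text region out of the source bitmap.
// The Rect is an assumed input from your text detector; clamp it to stay in bounds.
fun cropToBox(src: Bitmap, box: Rect): Bitmap {
    val left = box.left.coerceIn(0, src.width - 1)
    val top = box.top.coerceIn(0, src.height - 1)
    val width = box.width().coerceAtMost(src.width - left)
    val height = box.height().coerceAtMost(src.height - top)
    return Bitmap.createBitmap(src, left, top, width, height)
}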
Code Example for Input and Output
Here’s how you can handle the image preprocessing and model inference in your Android application:
// Memory-map the TFLite model from the assets directory
fun loadModelFile(context: Context, modelFile: String): MappedByteBuffer {
    val fileDescriptor = context.assets.openFd(modelFile)
    val inputStream = FileInputStream(fileDescriptor.fileDescriptor)
    val fileChannel = inputStream.channel
    val startOffset = fileDescriptor.startOffset
    val declaredLength = fileDescriptor.declaredLength
    val mappedFile = fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength)
    fileDescriptor.close()
    return mappedFile
}
// Load the model
val interpreter = Interpreter(loadModelFile(context, "ocr_model.tflite"))
// Preprocess the image
fun preprocessImage(bmpOriginal: Bitmap): ByteBuffer {
    // Convert to grayscale
    val height: Int = bmpOriginal.height
    val width: Int = bmpOriginal.width
    val bmpGrayscale = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888)
    val canvas = Canvas(bmpGrayscale)
    val paint = Paint()
    // Based on OpenCV conversion to grayscale
    // https://docs.opencv.org/master/de/d25/imgproc_color_conversions.html#color_convert_rgb_gray
    // and info from
    // https://medium.com/mobile-app-development-publication/android-image-color-change-with-colormatrix-e927d7fb6eb4
    val matrix = floatArrayOf(
        0.299f, 0.587f, 0.114f, 0f, 0f,
        0.299f, 0.587f, 0.114f, 0f, 0f,
        0.299f, 0.587f, 0.114f, 0f, 0f,
        0f, 0f, 0f, 1f, 0f
    )
    paint.colorFilter = ColorMatrixColorFilter(matrix)
    canvas.drawBitmap(bmpOriginal, 0f, 0f, paint)
    // Resize to the model's expected input size
    val scaled = Bitmap.createScaledBitmap(
        bmpGrayscale,
        200, // model input width
        31,  // model input height
        true
    )
    val scaledWidth = scaled.width
    val scaledHeight = scaled.height
    // 4 bytes per float, 1 channel for grayscale
    val mImgData: ByteBuffer = ByteBuffer.allocateDirect(1 * scaledWidth * scaledHeight * 1 * 4)
    mImgData.order(ByteOrder.nativeOrder())
    val pixels = IntArray(scaledWidth * scaledHeight)
    scaled.getPixels(pixels, 0, scaledWidth, 0, 0, scaledWidth, scaledHeight)
    for (pixel in pixels) {
        // All three channels are equal after the grayscale pass, so any one will do
        mImgData.putFloat(Color.blue(pixel).toFloat() / 255.0f)
    }
    return mImgData
}
// Run inference
val inputBuffer = preprocessImage(bitmap)
val outputMap = Array(1) { LongArray(48) } // 48 comes from the model's output shape
interpreter.run(inputBuffer, outputMap)

// Decode the predicted character indices into a string
val alphabets = "0123456789abcdefghijklmnopqrstuvwxyz."
val outputArray = outputMap[0]
val sb = StringBuilder()
for (index in outputArray) {
    if (index != -1L) { // -1 marks a blank/padding position
        sb.append(alphabets[index.toInt()])
    }
}
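Putting the pieces together, a full pass over one detected text region looks roughly like this. This is a sketch: cropToBox is the helper outlined earlier, and photo and box are assumed inputs from your own capture and detection code.
// Sketch: end-to-end recognition of one detected region.
// photo and box are assumed inputs from your camera pipeline and text detector.
val cropped = cropToBox(photo, box)
val input = preprocessImage(cropped)
val output = Array(1) { LongArray(48) }
interpreter.run(input, output)
val text = output[0]
    .filter { it != -1L }          // drop blank/padding positions
    .map { alphabets[it.toInt()] } // map indices to characters
    .joinToString("")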
Result
After processing the image through the model, you will have the recognized text as a string. You can display it directly, overlay it on the original image, or process it further based on your application's requirements.