Google Vision Image recognition for Android

Recently Google announced a new API for image recognition. It is one of the most exciting solutions from Google, and the pricing is quite affordable.
So I decided to switch from the CamFind API to Google's, because the latter is much more efficient and faster. CamFind worked quite well, but uploading the image and waiting for recognition took time; the whole process took more than a minute. Google Vision takes about 5 seconds at most.

1. Get a Google Cloud API key.
You need to create a key in the Google Cloud Platform console.
Note: create a Server API key and use it for all platforms. The following solution is Java-based and does not work properly with an Android key.
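You will also need the Google API client libraries in your app. A minimal sketch of the dependencies, assuming the app-module build.gradle; the version numbers are from around the time of writing and may need updating:

```groovy
// app/build.gradle — versions shown are assumptions; check Maven for current ones.
dependencies {
    compile 'com.google.api-client:google-api-client-android:1.22.0'
    compile 'com.google.apis:google-api-services-vision:v1-rev16-1.22.0'
}
```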

2. Functions to get an image from the camera
Add the following functions to your activity to launch the camera and handle its output.

public void startCamera() {
    // Ask for storage and camera permissions first; PermissionUtils is a small
    // helper class (as used in Google's sample code).
    if (PermissionUtils.requestPermission(this, CAMERA_PERMISSIONS_REQUEST,
            Manifest.permission.READ_EXTERNAL_STORAGE, Manifest.permission.CAMERA)) {
        Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
        // Tell the camera app to write the full-size picture to our file.
        intent.putExtra(MediaStore.EXTRA_OUTPUT, Uri.fromFile(getCameraFile()));
        startActivityForResult(intent, CAMERA_IMAGE_REQUEST);
    }
}

public File getCameraFile() {
    File dir = Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_PICTURES);
    return new File(dir, FILE_NAME);
}

After the picture is taken, retrieve it in onActivityResult:

@Override
public void onActivityResult(int requestCode, int resultCode, Intent intent) {
    if (requestCode == GALLERY_IMAGE_REQUEST && resultCode == RESULT_OK && intent != null) {
        // Image picked from the gallery.
        uploadImage(intent.getData());
    } else if (requestCode == CAMERA_IMAGE_REQUEST && resultCode == RESULT_OK) {
        // Image taken with the camera; it was written to our EXTRA_OUTPUT file.
        uploadImage(Uri.fromFile(getCameraFile()));
    }
}

public void uploadImage(Uri uri) {
    new CloudVisionTask(this, uri).execute();
}

3. The most interesting part: the image recognition task.

public class CloudVisionTask extends AsyncTask<Object, Void, String> {
    private static final String CLOUD_VISION_API_KEY = "YOUR SERVER API KEY";
    private Uri uri;
    private MainActivity mainActivity;
    private ProgressDialog dialog;
    public CloudVisionTask(MainActivity mainActivity, Uri uri)
    {
        this.uri = uri;
        this.mainActivity=mainActivity;
        this.dialog = new ProgressDialog(mainActivity);
    }
    @Override
    protected String doInBackground(Object... params) {
        Bitmap bitmap = null;
        if (uri != null) {

            try {
                // Scale the image down to 1200px on its longest side to save bandwidth.
                bitmap = scaleBitmapDown(MediaStore.Images.Media.getBitmap(mainActivity.getContentResolver(), uri), 1200);

            } catch (IOException e) {
                Log.d(Helpers.TAG, "Image picking failed because " + e.getMessage());
                // Toasts must be shown on the UI thread; doInBackground runs on a worker thread.
                mainActivity.runOnUiThread(new Runnable() {
                    public void run() { Toast.makeText(mainActivity, R.string.image_picker_error, Toast.LENGTH_LONG).show(); }
                });
            }
        } else {
            Log.d(Helpers.TAG, "Image picker gave us a null image.");
            mainActivity.runOnUiThread(new Runnable() {
                public void run() { Toast.makeText(mainActivity, R.string.image_picker_error, Toast.LENGTH_LONG).show(); }
            });
        }
        if (bitmap!=null) {
            try {
                HttpTransport httpTransport = AndroidHttp.newCompatibleTransport();
                JsonFactory jsonFactory = GsonFactory.getDefaultInstance();

                Vision.Builder builder = new Vision.Builder(httpTransport, jsonFactory, null);
                builder.setVisionRequestInitializer(new
                        VisionRequestInitializer(CLOUD_VISION_API_KEY));
                Vision vision = builder.build();

                BatchAnnotateImagesRequest batchAnnotateImagesRequest =
                        new BatchAnnotateImagesRequest();
                final Bitmap finalBitmap = bitmap;
                batchAnnotateImagesRequest.setRequests(new ArrayList<AnnotateImageRequest>() {{
                    AnnotateImageRequest annotateImageRequest = new AnnotateImageRequest();

                    // Add the image
                    Image base64EncodedImage = new Image();
                    // Convert the bitmap to a JPEG, in case the original is in a
                    // format that Android understands but Cloud Vision does not.
                    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
                    finalBitmap.compress(Bitmap.CompressFormat.JPEG, 90, byteArrayOutputStream);
                    byte[] imageBytes = byteArrayOutputStream.toByteArray();

                    // Base64 encode the JPEG
                    base64EncodedImage.encodeContent(imageBytes);
                    annotateImageRequest.setImage(base64EncodedImage);

                    // add the features we want
                    annotateImageRequest.setFeatures(new ArrayList<Feature>() {{
                        Feature logoDetection = new Feature();
                        logoDetection.setType("LOGO_DETECTION");
                        logoDetection.setMaxResults(10);
                        add(logoDetection);
                        Feature textDetection = new Feature();
                        textDetection.setType("TEXT_DETECTION");
                        textDetection.setMaxResults(10);
                        add(textDetection);
                        Feature labelDetection = new Feature();
                        labelDetection.setType("LABEL_DETECTION");
                        labelDetection.setMaxResults(10);
                        add(labelDetection);
                    }});

                    // Add the list of one thing to the request
                    add(annotateImageRequest);
                }});

                Vision.Images.Annotate annotateRequest =
                        vision.images().annotate(batchAnnotateImagesRequest);
                // Due to a bug: requests to Vision API containing large images fail when GZipped.
                annotateRequest.setDisableGZipContent(true);
                Log.d(Helpers.TAG, "created Cloud Vision request object, sending request");

                BatchAnnotateImagesResponse response = annotateRequest.execute();
                return convertResponseToString(response);

            } catch (GoogleJsonResponseException e) {
                Log.d(Helpers.TAG, "failed to make API request because " + e.getContent());
            } catch (IOException e) {
                Log.d(Helpers.TAG, "failed to make API request because of other IOException " +
                        e.getMessage());
            }
        }
        return "Cloud Vision API request failed. Check logs for details.";
    }

    @Override
    protected void onPreExecute() {
        super.onPreExecute();

        dialog.setTitle(R.string.ContactingServers);
        dialog.setMessage(mainActivity.getResources().getString(R.string.image_picker_task));
        dialog.setIndeterminate(false);
        dialog.setCancelable(true);
        if (mainActivity!=null)
            dialog.show();
    }

    protected void onPostExecute(String result) {
        if (dialog.isShowing()) {
            dialog.dismiss();
        }
        mainActivity.setEditText(result);
        mainActivity.obtainAll();
    }

    private String convertResponseToString(BatchAnnotateImagesResponse response) {
        String message = "";

        List<EntityAnnotation> logos = response.getResponses().get(0).getLogoAnnotations();
        if (logos != null && !logos.isEmpty()) {
            EntityAnnotation logo = logos.get(0);
            Log.d(Helpers.TAG, logo.toString());
            message += logo.getDescription().replaceAll("\\n", " ");
        }
        if (message.equals("")) {
            List<EntityAnnotation> texts = response.getResponses().get(0).getTextAnnotations();
            if (texts != null && !texts.isEmpty()) {
                EntityAnnotation text = texts.get(0);
                Log.d(Helpers.TAG, text.toString());
                message += text.getDescription().replaceAll("\\n", " ");
            }
        }
        if (message.equals("")) {
            List<EntityAnnotation> labels = response.getResponses().get(0).getLabelAnnotations();
            if (labels != null) {
                for (EntityAnnotation label : labels) {
                    Log.d(Helpers.TAG, label.toString());
                    // Labels come back sorted by score, so stop at the first weak one.
                    if (label.getScore() >= 0.7) {
                        message += label.getDescription() + " ";
                    } else {
                        break;
                    }
                }
            }
        }

        return message;
    }

    public Bitmap scaleBitmapDown(Bitmap bitmap, int maxDimension) {
        int originalWidth = bitmap.getWidth();
        int originalHeight = bitmap.getHeight();
        // Constrain the longer side to maxDimension, preserving the aspect ratio.
        int resizedWidth = maxDimension;
        int resizedHeight = maxDimension;

        if (originalHeight > originalWidth) {
            resizedWidth = (int) (maxDimension * (float) originalWidth / (float) originalHeight);
        } else if (originalWidth > originalHeight) {
            resizedHeight = (int) (maxDimension * (float) originalHeight / (float) originalWidth);
        }
        return Bitmap.createScaledBitmap(bitmap, resizedWidth, resizedHeight, false);
    }
}
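As a side note, the aspect-ratio arithmetic in scaleBitmapDown can be checked in isolation. The helper below is a Bitmap-free restatement of the same logic (the class and method names are mine, for illustration); note that, like the original, it also scales *up* images smaller than maxDimension:

```java
public class ScaleMath {
    // Mirrors scaleBitmapDown: the longer side becomes maxDimension and the
    // shorter side is scaled to preserve aspect ratio, truncating like the
    // original's int cast. Returns {width, height}.
    public static int[] scaledSize(int originalWidth, int originalHeight, int maxDimension) {
        int resizedWidth = maxDimension;
        int resizedHeight = maxDimension;
        if (originalHeight > originalWidth) {
            resizedWidth = (int) (maxDimension * (float) originalWidth / (float) originalHeight);
        } else if (originalWidth > originalHeight) {
            resizedHeight = (int) (maxDimension * (float) originalHeight / (float) originalWidth);
        }
        return new int[] { resizedWidth, resizedHeight };
    }
}
```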

As you can see, in my case I try to detect a logo first, then fall back to OCR text, and only then to labels. This way I can be reasonably sure that I will get some result to search with.
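That fallback order can be sketched as plain Java, independent of the Vision response classes. The class, method, and list-of-lists shape below are illustrative helpers of mine, not part of the API:

```java
import java.util.List;

public class AnnotationFallback {
    // Tries each candidate list in priority order (e.g. logo descriptions,
    // then OCR text, then labels) and returns the first description found,
    // with newlines flattened, or "" if every list is null or empty.
    public static String pick(List<List<String>> candidatesByPriority) {
        for (List<String> candidates : candidatesByPriority) {
            if (candidates != null && !candidates.isEmpty()) {
                return candidates.get(0).replaceAll("\\n", " ");
            }
        }
        return "";
    }
}
```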

And here are some results from these experiments:


The feature is currently in development, but it will soon be available in the “Is it kosher?” and “Is it gluten free?” projects.
