Introducing the Einstein Vision Add-on for Image Recognition

The most innovative apps augment our human senses, intuition, and logic with machine learning. Deep learning, modelled after the neural networks of the human brain, continues to grow as one of the most powerful types of machine learning. When applied to images, deep learning enables powerful computer vision features like visual search, product identification, and brand detection.

Today, we bring you the Einstein Vision add-on (beta), allowing Heroku developers to easily connect to and use Einstein Vision, a set of powerful new APIs for building AI-powered apps. With this release, Salesforce is making it easy for you to embed image recognition directly into your apps. Rather than building and managing the specialized infrastructure needed to host deep learning models, simply connect to Einstein Vision's HTTP/REST API for custom image recognition with little development overhead.

Use Einstein Vision to discover your products across your social media channels, analyze observational data in healthcare and life science applications, and enable visual search in eCommerce apps to delight your customers. Get started quickly with pre-trained image classifiers that are automatically available to you when you install the Einstein Vision add-on.

A Simple Workflow for Custom Image Recognition

The true strength of Einstein Vision is its ability to train custom models to recognize the things you care about. Creating a custom model takes just a few steps:

  1. Plan
  2. Collect
  3. Train & Evaluate
  4. Query

Let's walk through the workflow to create a brand recognizer for Heroku logos and artwork, which is based on the Einstein Vision example app in Node.js.

Plan the Model: Label All the Things

In machine learning, the “model” is the brain that answers questions, and “labels” are the possible answers. To have Einstein Vision recognize specific objects, we will train the model using example images for each label. For the example brand recognizer app, labels represent visual aspects of the Heroku brand.

Start with the labels of primary interest:

  • Heroku logo: isolated logos
  • Heroku artwork: various supporting imagery
  • Heroku swag: t-shirts, socks, water bottles, etc.

Then, think about images that do not contain one of the objects we want to recognize. How will the model answer those questions? Let's plan a negative, catch-all label representing the infinite world of objects beyond our target labels:

  • Unknown: a random set of things we don't care about

The unknown set is a curiosity at first. Remember that the model can only answer questions it's been trained to answer. If you want a clear indication of the model not matching a label, then train negative labels as well.

Collect Example Images

Before diving into the actual machine learning, we must gather example images that represent each of the planned labels. Each label needs a variety of example images: in isolation, in normal surroundings, from various angles, with various compositions, and with 2D & 3D representations. This variety helps avoid overfitting the model, so it classifies unseen images more flexibly. We collect examples by sorting them into a directory named for each label, preparing them for zip upload into a dataset.

While a model can be trained with very few images per label, more training examples will dramatically improve prediction accuracy for unseen images. We've built demos with just a few dozen examples per label, but at least a thousand images per label is recommended for high-confidence predictions.
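
Before zipping a dataset, it can help to check how many examples each label directory holds. Here is a minimal Python sketch, assuming one subdirectory per label; the function names, the list of image suffixes, and the `./dataset` path are illustrative assumptions, not part of the add-on:

```python
from pathlib import Path

# Assumed set of image suffixes, for illustration only.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_examples(dataset_dir):
    """Map each label (subdirectory name) to its number of image files."""
    counts = {}
    for label_dir in sorted(Path(dataset_dir).iterdir()):
        if label_dir.is_dir():
            counts[label_dir.name] = sum(
                1
                for f in label_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def sparse_labels(counts, minimum):
    """Return the labels with fewer than `minimum` examples, sorted by name."""
    return sorted(label for label, n in counts.items() if n < minimum)


# Hypothetical usage against a local directory of sorted examples:
#   counts = count_examples("./dataset")
#   print(sparse_labels(counts, minimum=1000))
```

Labels reported by `sparse_labels` are the ones most likely to benefit from more example images before training.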

Train the Model

Once the example images are collected, we will use the REST/HTTP API provided by the add-on to upload the dataset.

The steps to train a model are:

  1. Upload example images
  2. Initiate training
  3. Check training status
  4. Inspect the model's metrics

Walk through the API flow with our example.
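
Training runs asynchronously, so step 3 usually becomes a polling loop. The Python sketch below shows one way to structure it. `get_status` is a hypothetical callable standing in for whatever request your app makes to check training status, and the status names are assumptions for illustration:

```python
import time


def wait_for_training(get_status, interval_seconds=5.0, max_checks=120,
                      sleep=time.sleep):
    """Poll `get_status()` until training leaves the in-progress states.

    `get_status` is a hypothetical callable that returns a status string.
    The status names below are assumed, not taken from the API reference.
    """
    in_progress = {"QUEUED", "RUNNING"}
    for _ in range(max_checks):
        status = get_status()
        if status not in in_progress:
            return status  # a terminal state, successful or not
        sleep(interval_seconds)
    raise TimeoutError("training did not finish within the allotted checks")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays. The caller still needs to check whether the returned status means success before fetching metrics.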

Performance Evaluation

After training, Einstein Vision automatically evaluates the model using cross-validation. It withholds a random 10% (k-fold = 10) of the example data to test the model. The accuracy of predictions on that unseen portion, the testAccuracy, indicates how the model will perform in the real world.
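
To make the idea concrete, here is a small Python sketch of withholding 10% of the examples and scoring predictions against them. It illustrates the general technique only and is not Einstein Vision's internal implementation; the function names and the fixed seed are assumptions:

```python
import random


def holdout_split(examples, test_fraction=0.1, seed=0):
    """Shuffle the examples and withhold a fraction of them for testing.

    Returns a (training, withheld) pair of lists.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


def accuracy(expected, predicted):
    """Fraction of predictions that match the expected labels."""
    if not expected or len(expected) != len(predicted):
        raise ValueError("expected and predicted must be non-empty and equal in length")
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return correct / len(expected)
```

Because the withheld examples never take part in training, their accuracy is a fairer estimate of real-world performance than accuracy measured on the training examples themselves.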

Fetch model metrics from the API for any trained model to get its testAccuracy. Additional metrics returned may indicate issues with examples confusing the algorithm or errors reducing the useful dataset.
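
One way to see which examples confuse a model is to tally the wrong predictions by label pair. The Python sketch below shows that tally on hand-written data. It is an illustration of the idea, and it makes no assumption about the exact shape of the metrics the API returns:

```python
from collections import Counter


def confusion_counts(expected, predicted):
    """Count each (expected, predicted) pair where the prediction was wrong."""
    return Counter(
        (e, p) for e, p in zip(expected, predicted) if e != p
    )


# Hand-written example data, not real model output.
expected = ["Heroku Logo", "Heroku Logo", "Heroku Swag", "unknown"]
predicted = ["Heroku Artwork", "Heroku Artwork", "Heroku Swag", "Heroku Logo"]
most_confused = confusion_counts(expected, predicted).most_common(1)
```

A pair that dominates the tally, such as logos predicted as artwork here, points at the labels whose example images need more variety or cleanup.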

To tune a model, revise the source dataset to address any issues and then create, train, and evaluate a new model. After tuning, the model with superior metrics may be considered production-ready.

Query the Model

Once training is complete, the new model will answer queries to classify images by URL reference or direct upload. Here's an example query using the curl command-line tool:

$ curl -X POST \
  -F "sampleContent=@./path/to/image.jpg" \
  -F "modelId=YYYYY" \
  -H "Authorization: Bearer XXXXX" \
  -H "Content-Type: multipart/form-data" \

Example JSON response:

{
  "probabilities": [
    {
      "label": "Heroku Artwork",
      "probability": 0.53223926
    },
    {
      "label": "unknown",
      "probability": 0.46305126
    },
    {
      "label": "Heroku Swag",
      "probability": 0.0038324401
    },
    {
      "label": "Heroku Logo",
      "probability": 0.0008770062
    }
  ],
  "object": "predictresponse"
}

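An app usually needs a single answer rather than the full list of probabilities. Here is a Python sketch of one way to pick it from a response shaped like the example above. The `best_match` helper, the 0.5 threshold, and the handling of the negative label are assumptions for illustration, not part of the API:

```python
import json

# The example response from above, embedded so the sketch is self-contained.
RESPONSE = """
{
  "probabilities": [
    {"label": "Heroku Artwork", "probability": 0.53223926},
    {"label": "unknown", "probability": 0.46305126},
    {"label": "Heroku Swag", "probability": 0.0038324401},
    {"label": "Heroku Logo", "probability": 0.0008770062}
  ],
  "object": "predictresponse"
}
"""


def best_match(response, min_probability=0.5, negative_labels=("unknown",)):
    """Return (label, probability) for the top prediction.

    Returns None when the top prediction is a negative label or falls
    below `min_probability`.
    """
    top = max(response["probabilities"], key=lambda p: p["probability"])
    if top["label"] in negative_labels or top["probability"] < min_probability:
        return None
    return top["label"], top["probability"]


match = best_match(json.loads(RESPONSE))
```

In this example the two leading labels are close, so raising `min_probability` would turn the result into "no match", which is often the safer behavior for a brand recognizer.
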
Pipeline to Production

One of the wonderful things about Heroku apps is that once a proof of concept is running, you can add the app to a Pipeline to enable enterprise-grade continuous delivery, including Review Apps, CI Tests (in private beta), and elegant promotion to production.

To share a model created in one app with other apps in a Heroku Pipeline, for example when promoting an app from Review App to Staging and finally to Production, share the add-on between those apps.

Only the Beginning

We can’t wait to see what you build with the Einstein Vision add-on (beta). Einstein Vision is free to get started, and we plan to introduce paid plans at GA in a few weeks. Check out the add-on documentation, then dive in with our Node example app or add it to your own app to try it out.
