SEATTLE—We’ve been tracking Microsoft’s work to bring its machine learning platform to more developers and more applications over the last several years. What started as narrowly focused, specialized services have grown into a wider range of features that are more capable and more flexible, while also being more approachable to developers who aren’t experts in the field of machine learning.
This year is no different. The core family of APIs covers the same ground as it has for a while—language recognition, translation, and image and video recognition—with Microsoft taking steps to make the services more capable and easier to integrate into applications.
The company’s focus this year is on two main areas: customization and edge deployment. All the machine learning services operate in broadly the same way, with two distinct phases. The first phase is building a model: a set of training data (for example, text in one language and its translation into another language, or photos of animals along with information about which animal they are) is used to train neural networks to construct the model. The second phase is using the model: new data (say, untranslated text or an image of an unknown animal) is fed into the model, which produces an output according to what the neural nets learned (the translation, or the kind of animal pictured).
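To make the two phases concrete, here is a toy sketch. It swaps in a simple nearest-centroid classifier for the neural networks the services actually use. The two numeric "features" per animal photo are hypothetical stand-ins for real image data. Only the split between building a model and using it is meant to carry over.

```python
"""Two-phase machine learning, illustrated with a nearest-centroid classifier.

Toy stand-in only: the real services train neural networks, not centroids,
but the train-then-predict split is the same.
"""
from math import dist


def train(examples: list[tuple[list[float], str]]) -> dict[str, list[float]]:
    """Phase 1: build a model (one mean feature vector per label)."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(model: dict[str, list[float]], features: list[float]) -> str:
    """Phase 2: label new data by the closest learned centroid."""
    return min(model, key=lambda label: dist(model[label], features))


if __name__ == "__main__":
    # Hypothetical two-number feature vectors extracted from labeled animal photos.
    training_data = [
        ([0.9, 0.8], "cat"), ([1.0, 0.9], "cat"),
        ([4.0, 0.3], "dog"), ([4.2, 0.4], "dog"),
    ]
    model = train(training_data)          # expensive phase, done once
    print(predict(model, [1.1, 0.7]))     # cheap phase, done per new image
```

The asymmetry discussed later in the article shows up even here. Training has to walk every example, but each prediction only compares one input against the stored model.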
Microsoft’s services come with prebuilt models, and customization allows these to be extended by training them on business-specific data. For example, the translation service might be extended with custom translations for the phrases and jargon that matter to a given industry and need to be translated a specific way; speech recognition might be customized for particular accents and vocabulary; and text-to-speech might use a customized acoustic model to change the sound of the voice it produces.
The customization works in much the same way as the original training and model-building: developers train the system using their own corpus of training data, building on the preexisting models.
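Customization can be pictured in the same toy terms. In this hypothetical sketch, a "prebuilt" model is just a mean vector and an example count per label. Customer data is folded in with an incremental mean, so existing knowledge is kept while new labels (here an invented "pallet" class) are added. It is an analogy under stated assumptions, not how Microsoft's services actually fine-tune their neural networks.

```python
"""Customizing a prebuilt model with business-specific data (toy sketch).

Assumption: the "model" is a set of per-label mean vectors with example counts.
Real services fine-tune neural networks; this only illustrates building on
existing learned state instead of training from scratch.
"""
from math import dist

Model = dict[str, tuple[list[float], int]]  # label -> (centroid, examples seen)


def customize(base: Model, examples: list[tuple[list[float], str]]) -> Model:
    """Fold new labeled examples into a copy of the prebuilt model."""
    model = {label: (list(c), n) for label, (c, n) in base.items()}
    for features, label in examples:
        centroid, n = model.get(label, ([0.0] * len(features), 0))
        # Incremental mean: prior knowledge is kept and nudged by each new example.
        updated = [c + (f - c) / (n + 1) for c, f in zip(centroid, features)]
        model[label] = (updated, n + 1)
    return model


def predict(model: Model, features: list[float]) -> str:
    """Pick the label whose centroid is closest to the input."""
    return min(model, key=lambda label: dist(model[label][0], features))


if __name__ == "__main__":
    # Hypothetical prebuilt model, as if trained on many generic examples.
    prebuilt: Model = {"cat": ([1.0, 1.0], 1000), "dog": ([4.0, 0.5], 1000)}
    # Hypothetical business-specific data: a label the prebuilt model never saw.
    custom = customize(prebuilt, [([8.0, 3.0], "pallet"), ([8.2, 3.2], "pallet")])
    print(predict(custom, [8.1, 3.1]))
    print(predict(custom, [1.1, 0.9]))
```

Weighting each update by the number of examples already seen means a handful of customer examples barely shifts a label backed by a large prebuilt corpus. A brand-new label, however, is defined entirely by the customer's data.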
The two phases of machine learning systems have different compute requirements. The initial training and model-building is extremely compute intensive, often accelerated with GPUs or even dedicated machine learning processors. Using the model, by contrast, is comparatively lightweight. That’s not to say it’s trivial—doing complex image recognition on motion video, for example, will likely require GPU acceleration—but the workloads can be small enough that it makes sense to run them on client systems.
This is where Microsoft’s focus on edge deployment comes in. Models can be deployed onto, for example, Windows Machine Learning or the Azure IoT Edge runtime, allowing these tasks to be pushed out onto phones, PCs, and embedded devices. Doing this reduces latency, allows disconnected operation (for example, performing image recognition in a drone or a non-networked industrial system), and reduces the cloud compute resources (and hence, the monthly cloud computing bills) that developers need.
The Vision service will be the first to support this edge deployment. The models themselves use ONNX (Open Neural Network Exchange), a format developed by Microsoft, Facebook, and Amazon Web Services and supported by Nvidia, Qualcomm, Intel, and AMD.
Microsoft is also adding new services. There’s a forecasting service that will make predictions, a custom search service that will try to establish connections and cross-references within a data set, and a Bing-powered visual search to retrieve images of objects that are similar to a search image.
The company is also continuing to invest in bots, which are carving a niche for themselves in customer service scenarios. Progressive Insurance’s Flo bot on Facebook Messenger, for example, can handle the entire process of selling an insurance policy end to end, and companies are using bots internally for tasks such as improving the interface of self-service HR systems. Microsoft showed a smart demo of a bot seamlessly handling different languages, holding a contextual conversation (allowing questions that refer back to previous answers), and integrating with the website hosting it, so that the conversation could navigate to different pages or perform searches.
New bot services include a conversation learner, which allows the bot framework to learn the patterns of conversations from existing transcripts; a project to give bots more personality; and the general availability of QnA Maker, a tool for building question-and-answer bots without writing code.
Enabling new kinds of application
Machine learning and artificial intelligence have become major buzzwords in the computing industry, engendering much cynicism toward both. Much of this cynicism is probably justified, but Microsoft believes that these features will become common to nearly all the applications and services we use.
Machine learning is also enabling applications that simply wouldn’t have been practical before. An example is Iris, a company with the mission of ending preventable blindness. Diabetic retinopathy is a complication of diabetes in which the central part of the retina becomes damaged, causing loss of central vision. With millions of diabetics in the US alone, it’s a significant issue.
Detecting and diagnosing retinopathy requires an ophthalmologist. Iris’ system allows primary care providers to photograph a patient’s retina and get a first-pass examination. It uses a machine-learning image-recognition system that analyzes each image in three steps: detecting whether the image shows a left eye or a right eye, determining whether the photo quality is good enough for further analysis, and, finally, detecting whether there are signs of retinopathy. A physician can take a photograph of the retina and, within 37 seconds, know with high confidence (some 97 to 99 percent) whether the patient needs to be referred to an ophthalmologist.
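The three checks form a simple pipeline, with a quality gate in the middle that keeps poor photos away from the diagnostic step. The sketch below is hypothetical. The stage functions, the 0.5 referral threshold, and the result fields are invented for illustration and are not Iris's actual system. In practice each stage would be its own trained model.

```python
"""Three-stage retinal screening pipeline, sketched with stand-in classifiers.

Assumptions: each stage is a separate trained model passed in as a function;
the names, the 0.5 threshold, and the result fields are hypothetical.
"""
from dataclasses import dataclass
from typing import Callable, Optional

Image = bytes  # placeholder type for a retinal photograph


@dataclass
class ScreeningResult:
    eye: str                     # "left" or "right"
    gradable: bool               # was the photo good enough to analyze?
    refer: Optional[bool]        # None when the photo must be retaken
    confidence: Optional[float]  # model's confidence in the refer decision


def screen(
    image: Image,
    classify_eye: Callable[[Image], str],
    quality_ok: Callable[[Image], bool],
    retinopathy_score: Callable[[Image], float],
    threshold: float = 0.5,
) -> ScreeningResult:
    # Stage 1: which eye is this?
    eye = classify_eye(image)
    # Stage 2: gate on image quality before attempting any diagnosis.
    if not quality_ok(image):
        return ScreeningResult(eye, gradable=False, refer=None, confidence=None)
    # Stage 3: estimated probability that signs of retinopathy are present.
    p = retinopathy_score(image)
    refer = p >= threshold
    return ScreeningResult(eye, True, refer, p if refer else 1.0 - p)


if __name__ == "__main__":
    # Stub models standing in for trained classifiers.
    result = screen(
        b"placeholder-image-bytes",
        classify_eye=lambda img: "left",
        quality_ok=lambda img: True,
        retinopathy_score=lambda img: 0.92,
    )
    print(result)
```

Returning early on an ungradable photo mirrors the ordering described above. The physician is asked for a new photo rather than being given a diagnosis based on an image the system cannot read reliably.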
Providing such a service to potentially every doctor in the US, or even beyond, would be infeasible with human review alone: Iris would need an enormous number of trained ophthalmologists to sift through the pictures. With machine learning, the heavy lifting can be done by a computer.
The improvements to machine learning have also sped up development of the service. In the early days, the company said, it took a week to train a model using 5,000 training images, which made for slow turnaround during development. Now, with GPU acceleration, a model can be trained using 10,000 images in just two days. As this kind of acceleration improves, the capabilities and accuracy of systems such as this will only get better.
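The quoted figures imply a large jump in training throughput. This calculation treats "a week" as seven days. It also assumes the per-image training work is comparable between the two runs, which the company did not specify.

```python
"""Back-of-the-envelope training throughput from the quoted figures.

Assumptions: "a week" = 7 days, and per-image training work is comparable
between the early and GPU-accelerated runs.
"""
from fractions import Fraction

before = Fraction(5_000, 7)   # images per day in the early days
after = Fraction(10_000, 2)   # images per day with GPU acceleration
speedup = after / before      # how many times more images per day

print(f"before:  {float(before):.0f} images/day")
print(f"after:   {float(after):.0f} images/day")
print(f"speedup: {float(speedup):.0f}x")
```

Using `Fraction` keeps the ratio exact. The result is roughly 714 images per day before and 5,000 after, a sevenfold increase in images processed per day.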