In my previous entry, 08: Mobile AR Hardware Technologies, I discussed the hardware technologies that are converging to enable mobile augmented reality experiences. Software and services are the other components crucial to delivering mobile AR experiences, and I cover both in this post. There are several layers of software involved: some have to do with how developers create experiences, while others are about delivering the experience to the user. Services are applications that provide data or information that AR applications consume over a network (typically the Internet). This data is much too diverse and vast to be stored on the device, so it is made available on demand. While hardware and software are responsible for rendering the AR experience, it is services that truly bring value to the user.

In this post I will be discussing concepts in general terms without getting into the details about the various vendors that provide the AR software and services. Vendor offerings will be covered in future posts.


Operating System (OS)
An OS is a collection of software that manages computer hardware resources and provides common services for computer programs. The operating system is an essential component of the system software in a computer system. Application programs require an operating system to function.[Wikipedia] Augmented Reality applications and hardware are no exception. The modern mobile OSes that enable AR experiences include Android, iOS, Windows Phone, and BlackBerry OS.

AR Platform
In the context of Augmented Reality, a platform is a software framework that provides ready-made functionality. It is a set of subsystems and interfaces that form a common infrastructure on top of which a set of related products (i.e. apps) are developed. An AR platform provides the means to author content through SDKs and/or APIs (see below) as well as the interoperability to run on different devices and operating systems.

The advent of mobile devices has ushered in the era of the “app”. The term is derived from “application” and in this context refers to the lightweight programs that are installed on mobile devices to perform specific functions. Though a mobile device may have the hardware components necessary to perform AR, an app is required to tie these components together to deliver an AR experience. An AR app is generally created using a platform SDK, though it is also possible to develop one from scratch by leveraging APIs. For the most part, apps are produced by individuals or companies that are independent of the device and operating system makers, and are downloaded through a virtual store or marketplace.

AR Browsers
In many cases, the platform is accessible to users only through its own app, from which different AR experiences are made available as content. This is often referred to as an AR browser. AR browsers provide the means to search for specific content, to surf content that has been categorized into channels, or to be served location-specific experiences. Whereas an app store allows one to download standalone AR apps, the AR browser is an app that also hosts the content or experience. An apt metaphor would be that AR experiences are to an AR browser what videos are to YouTube.

SDK (software development kit)
An SDK is a set of tools and components that allows one to create apps for an AR platform or content for an AR browser. An AR platform created with the intent of fostering a community of programmers to create content for it makes SDKs available as a means to simplify the development process. SDKs typically include an integrated development environment (IDE), which serves as the central programming interface. The IDE may include a programming window for writing source code, a debugger for fixing program errors, and a visual editor, which allows developers to create and edit the program’s graphical user interface (GUI). IDEs also include a compiler, which is used to create applications from source code files. Most SDKs are intuitive to use, include extensive documentation and contain sample code, all of which help developers learn how to build basic programs with the SDK and incentivize program development.[Techterms]

API (application programming interface)
An API is a set of commands, functions, and protocols which programmers can use when building software for a specific (AR) platform. Instead of writing functions from scratch, the API allows programmers to use predefined functions to interact with the system. Operating systems also provide their own APIs. While the API makes the programmer’s job easier, it also benefits the end user, since it ensures all programs using the same API will have a similar user interface.[Techterms] APIs are typically leveraged through the SDK but they may also stand alone as a programming resource.


Services are self-contained utilities created to be integrated into apps to enhance the functionality of the app in some way. This integration is achieved through a service’s API. There are many categories of services that apps may leverage. The services I cover in this posting are specific to the context of augmented reality experiences. They require the AR system (e.g. a mobile device or head mounted display (HMD)) to have Internet connectivity in order to interact with them.

The following scenario describes how many of the services I discuss below can be woven together to deliver an AR experience:

An art museum offers an app that uses proximity radio signals to trigger the presentation of information to the user’s head mounted display (HMD) as they move about an exhibit. The information is presented as audio to the earphones and as overlaid images on the lenses that highlight various features in the artwork as they are being discussed. The app lets the user explore the subject matter further by speaking a trigger word to invoke a search, followed by the subject they wish to search on. A spoken query such as “SEARCH pointillism” is captured by the microphone and sent through the museum’s Wi-Fi network and the Internet to a voice recognition service that filters out background noise, accounts for accents, then returns text. The text is then passed to the app’s cloud-based search service, which houses the museum’s curated database of art terms. The search engine parses the term to identify a number of related subjects it anticipates the user may want to see, such as Impressionism, Divisionism, Georges Seurat, Paul Signac, Camille Pissarro, The Windmills at Overschie by Paul Signac, Banks of Seine by Georges Seurat, etc. The results are displayed on the user’s lenses, with certain results highlighted to indicate that the museum has relevant exhibit content elsewhere that the user may visit in person. A result is selected using gesture detection of the user’s finger or by speaking the result number. The user then follows directions projected on the lenses to navigate to the exhibit where the additional content is presented (or, if there is no relevant museum content, the information is presented to the user as audio and images that are not in register with anything physical).
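The flow of data in this scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every function name and data structure here is hypothetical, the "services" are local stand-ins rather than real cloud calls, and the curated database is reduced to a small dictionary.

```python
# Hypothetical stand-ins for the cloud services in the museum scenario.
# None of these names come from a real SDK; they only show how data flows.

TRIGGER_WORD = "SEARCH"

# Toy version of the museum's curated database: term -> related subjects.
CURATED_TERMS = {
    "pointillism": ["Impressionism", "Divisionism", "Georges Seurat", "Paul Signac"],
}

# Subjects for which the museum has an exhibit the user could walk to.
ON_SITE_EXHIBITS = {"Georges Seurat", "Paul Signac"}


def recognize_speech(audio_transcript):
    """Stand-in for the voice recognition service: returns text for the audio."""
    # A real service would receive encoded audio; here the "audio" is already text.
    return audio_transcript.strip()


def parse_command(text):
    """Split 'SEARCH pointillism' into the trigger word and the query."""
    word, _, query = text.partition(" ")
    if word.upper() != TRIGGER_WORD or not query:
        return None
    return query.strip().lower()


def search(query):
    """Stand-in for the search service: look the term up in the curated database."""
    return CURATED_TERMS.get(query, [])


def build_display(results):
    """Number the results and flag those with on-site exhibit content."""
    return [
        {"number": i, "subject": s, "on_site": s in ON_SITE_EXHIBITS}
        for i, s in enumerate(results, start=1)
    ]


def handle_utterance(audio_transcript):
    """Run one spoken command through recognition, search and display."""
    text = recognize_speech(audio_transcript)
    query = parse_command(text)
    if query is None:
        return []
    return build_display(search(query))
```

Calling handle_utterance("SEARCH pointillism") returns a numbered list in which the Seurat and Signac entries are flagged as on site, which is the information the app would need to highlight them on the lenses. An utterance without the trigger word returns an empty list.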

Cloud refers to the storage of data and files, or the hosting of programs (apps), on remote servers whose names and locations are unknown to the device and person accessing them. The advent of cloud computing reduces the barriers to entry for developers, who need not own and manage their own servers in order to deliver an app or service to the masses. Developers can deliver apps faster because much of the plumbing is already in place for them to readily incorporate (e.g. database, security, web services, scalability, load balancing, etc.). In addition to being a resource for hosting apps, services and data, the cloud can also serve as storage for files and data created by AR app users. Storing such information remotely rather than on the device not only provides extensive space, but also makes it easier to share that information with other users when desired. So the cloud itself is a service provided to developers, who may leverage it to provide other services to AR apps.

In the reference scenario, all services are being hosted in the cloud.


AR Content Delivery
There are several means by which augmented reality experiences can be triggered on mobile devices (which I will be discussing in my next post). Triggering may be accomplished through machine readable markers known as “fiducials” (e.g. QR codes), “natural feature tracking” or shape recognition using the device’s camera, geotracking using GPS to determine one’s location, or low energy radio transmitter proximity sensing (e.g. iBeacons). In the most common triggering scheme, the trigger resolves to a URL that the app requests from a cloud-based server, which returns the augmented reality content to the device where it can be executed by the app. The AR platform provides the API that allows the app to access this content.

In the reference scenario, the proximity radio signals are being used to trigger the delivery of content to the app about the exhibit the user is viewing.
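As a minimal sketch of the URL-based triggering scheme, the snippet below turns a detected trigger into the URL an app might request. The server address, the parameter names and the function name are all assumptions made for illustration; a real AR platform would define its own API for this.

```python
from urllib.parse import urlencode

# Hypothetical content server; example.com is a placeholder, not a real AR service.
CONTENT_SERVER = "https://example.com/ar/content"


def build_content_url(trigger_type, trigger_id):
    """Build the URL an app might request after a trigger fires.

    trigger_type: e.g. "beacon", "marker" or "geo" (names assumed for this sketch).
    trigger_id:   the identifier read from the beacon, marker or location.
    """
    query = urlencode({"type": trigger_type, "id": trigger_id})
    return f"{CONTENT_SERVER}?{query}"
```

The app would then fetch that URL and hand the response to the platform for rendering. urlencode takes care of escaping any characters in the identifier that are not safe to place in a URL.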

Search services allow a program to send a query to a search engine to return relevant results that may aid in providing context. Generally this applies to text but may also encompass many other types of content such as sound, images and video. By leveraging a search service, a developer can easily add custom search capabilities to their application without needing to be a search expert or worrying about hardware provisioning, setup, and maintenance.

In the reference scenario, search services are being used to extend the user’s experience by retrieving contextually relevant information.

Voice Recognition
The sounds of speech are encoded into a compact digital form that preserves their information. The connected device relays this to a server in the cloud that contains a series of speech recognition models. The server compares the speech against a statistical model to estimate, based on the sounds spoken and the order in which they were spoken, what letters might constitute it. The highest-probability estimates are selected and the speech is then run through a language model, which estimates the words that make up the speech. Given a sufficient level of confidence, the computer then creates a candidate list of interpretations for what the sequence of words in the speech might mean. The device then uses this meaning to execute a command or provide input to the AR app (as in my Search example above) within a matter of seconds.[SmartPlanet]

In the reference scenario, voice recognition is being used to turn the user’s voice command and query into ASCII text, a form that the app and its services can process.
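The last step of that process, choosing among candidate interpretations, can be sketched as follows. This assumes the recognition service hands back a list of (text, confidence) pairs; that response shape, the function name and the 0.6 threshold are assumptions for the sake of the example, not the behavior of any particular service.

```python
def pick_interpretation(candidates, min_confidence=0.6):
    """Return the most probable transcription, or None if none is confident enough.

    candidates: list of (text, confidence) pairs, with confidence in [0, 1].
    min_confidence: assumed cut-off below which the app should ask the user to repeat.
    """
    if not candidates:
        return None
    # Keep the candidate with the highest confidence score.
    text, confidence = max(candidates, key=lambda c: c[1])
    return text if confidence >= min_confidence else None
```

Returning None rather than a low-confidence guess lets the app prompt the user to repeat the command instead of acting on a misheard one.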

Image Recognition
The idea of image recognition is to identify what the user is gazing upon in order to feature it in the augmented reality experience. The device camera is used to capture the image, and built-in software is used to plot points along edges, corners and features. Distances between points and relative angles are measured and recorded before these metrics are sent along to the app’s image recognition service. An app built for a specific purpose will have a limited universe of images and objects it is designed to recognize, and these metrics are compared to that universe of templates to narrow down and select the most likely subject of the gaze. This enables the correct augmented content to be served to the app user.

In the reference scenario, image recognition may be used in place of proximity radio signals to identify which art piece the user is viewing.
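The comparison of measured metrics against a limited universe of templates can be illustrated with a simple nearest-neighbor match. This is a deliberately simplified sketch: real recognition services use far richer feature descriptors, and the function name, the flat list of numbers and the distance threshold are all assumptions. It uses math.dist, which requires Python 3.8 or later.

```python
import math


def closest_template(metrics, templates, max_distance=1.0):
    """Return the name of the template nearest to the measured metrics.

    metrics:   sequence of floats measured from the camera image.
    templates: dict mapping a subject name to a same-length sequence of floats.
    Returns None when no template is within max_distance (assumed threshold).
    """
    best_name, best_distance = None, float("inf")
    for name, reference in templates.items():
        distance = math.dist(metrics, reference)  # Euclidean distance
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else None
```

The threshold matters because an app with a limited universe of templates should report no match when the user is looking at something it was never designed to recognize, rather than serving the content of whichever template happens to be least different.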

Facial Recognition
Facial recognition technology is a form of biometrics in which a 2D or 3D image is used as the basis for a number of facial feature measurements that uniquely identify one’s face, ideally at different pose angles. This information is stored in a cloud database against which an app service can compare images of faces it encounters to identify them. Facial recognition services in the cloud will allow AR apps to identify persons captured by the device camera and then use separate services to return identity, biographical or security information on the view screen. An example of how it can be used as an AR service would be in a healthcare setting, to help people with compromised memories know the names and relationships of friends, loved ones and caregivers.


A mock-up of how facial recognition may be incorporated into a mobile app

Social Media
Social media services provided by companies such as Facebook, Twitter and Google are poised to be an integral part of augmented reality experiences by tying one’s social graph into their AR experiences. An example would be an app that allows one to leave suggestions, comments, reviews, rants and raves about items on a menu at a restaurant. When friends visit the establishment and use the app to gaze at the menu they will see the comments of the people they have relationships with. In addition to accessing one’s social graph through a service, geolocation and image recognition services would come into play here as well.

Location and Navigation
Knowing where the user is allows an app to be smarter and deliver better information. Location-aware applications can use the device’s GPS sensor or, particularly indoors, a network location provider service to acquire the user’s location. Once the app knows the user’s location coordinates it can interact with services for maps, street views, places, satellite imagery, elevation, etc. Navigation is one of the strongest use cases for AR, through the overlay of visual navigational prompts on either head mounted displays or windshield projection, along with verbal prompts.


Windscreen augmented reality navigation concept from MicroVision
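Once an app has the user’s coordinates, even a simple calculation can make it location aware. The sketch below uses the haversine formula to find the nearest entry in a list of places. The function names and the shape of the places dictionary are assumptions for illustration; a real app would more likely query a places service than keep its own list.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_place(lat, lon, places):
    """Return the name of the closest place, or None if the dict is empty.

    places: dict mapping a place name to a (latitude, longitude) tuple.
    """
    if not places:
        return None
    return min(places, key=lambda name: haversine_km(lat, lon, *places[name]))
```

The haversine formula treats the Earth as a sphere, which is accurate enough for deciding which nearby place to feature but not for survey-grade distances.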

3D Models
There are many AR use cases that call for in-app 3D models. An example of this is depicted in the photo below where the various auto parts that the mechanic must work with are rendered in 3D along with the pair of hands that demonstrate how the parts are to be held and manipulated. The models would be called by the app from the 3D model service based on the make, model, year and repair being performed.


Cloud-based messaging services allow data to be sent from a server to users’ mobile devices, and messages to be received from those devices over the same connection. The service handles all aspects of queuing of messages and delivery to the target application running on the target device.[Android Developers] The sending and receiving of messages through one’s head mounted display will be a key social component, allowing users to provide spontaneous updates on their status while receiving the same from others by whom they care to be updated.

The purchasing of goods and services from within AR applications is a key capability that can also be integrated through a service API. This is essentially a cloud-based means to access and store payment information for app creators. Such a service can streamline the purchase process and reduce the amount of information customers need to enter.[Android Developers]

A common model for app development today is to provide the app and/or content for free in exchange for subjecting the user to occasional interstitial or peripherally placed advertising. Rather than reinventing the means to integrate the ads and source their own paying advertisers, app developers can use a service to provide the advertisements which are often contextual with respect to the user’s profile, location or activity. Ad services allow a developer to easily monetize their work while providing users with free content.

Authentication services allow users to sign in with an existing universal passport account, customize their user experience in accordance with their master profile settings and connect with other content that shares the same service. Once signed in, an app can welcome the user by name, display their picture, connect them with friends, and lots more.[Android Developers]

Context Awareness
Context awareness is a term that describes the ability of an app to understand what is important to a user at any given point in time or place, then serve them relevant information. Google Now is an example of this service. This app scours one’s email, calendar, web browser history and search history to discern what is relevant. It also uses the device’s GPS as a means to provide locationally relevant information. It then informs you if your flight is delayed or if the traffic to the airport is heavy so you can leave earlier. When you arrive at the airport it provides a map to the hotel you have reserved, then reminds you about your dinner reservation and maps the route to get there. It even knows to provide useful foreign language phrases and currency conversion when traveling abroad. It provides weather and public transit information based on where you are. It tells you the sports scores of your favorite teams and gossip about your favorite celebs. Context awareness services have enormous potential to enhance augmented reality experiences. As AR becomes a device utility like SMS and email, context awareness will proactively engage to provide the AR content that you want and need without you having to activate it. This is integral to the future of AR.

This list of services is just a sampling of the services available to mobile apps. For a more complete accounting of the hundreds that are available check out ProgrammableWeb’s API Directory.
