In my previous two posts I discussed how hardware, software and services come together to deliver augmented reality experiences. Many AR apps are designed to perform specific functions within specific contexts. For instance, an app that provides enriched information about an art museum exhibit is useful only when visiting that exhibit and would be of little or no value anywhere else. Furthermore, it would detract from the art if the visitor were constantly bombarded by AR throughout the tour. In this example the AR experience should be triggered at certain places in the exhibit and disappear at other times. So how does an app know when to perform its intended function? In this post I will discuss the different ways an app can understand context and thus trigger the intended AR experience or content.
Marker Based AR
Graphic images, often referred to as “fiducials,” are created to be easily recognized by common machine vision techniques. These marker images serve as triggers to display AR content on or over the surface where they are printed or displayed.
One common place where Marker Based AR can be found is in print advertising campaigns. For example, a magazine ad for an automobile may entice the reader to download an AR app. The consumer then points their device camera at the page. The app locates and recognizes the fiducial printed on the page and transmits a query to its server, which returns a data set to the mobile device. That data contains a 3D rendering of the automobile that the user sees on their viewscreen. While the camera remains pointed at the page, the fiducial serves as a registration reference for the position and orientation of the image. This enables the user to rotate the magazine or walk around it to view the car from any angle. The 3D content is likely to be accompanied by music and/or promotional voiceover and links to additional content such as YouTube videos and local dealership contact info. Similar techniques have been used in games, children’s books, and scores of other examples.
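The client-side flow described above can be sketched roughly as follows. All of the function names, the catalog, and the marker ID are hypothetical stand-ins for a real computer-vision library and content server, not any particular vendor's API.

```python
# Minimal sketch of the marker-based AR flow: recognize a fiducial in a
# camera frame, fetch the matching experience, register it to the page.

# Hypothetical server-side catalog keyed by marker ID.
AD_CATALOG = {
    "car-ad-2024": {"model": "sedan.obj", "audio": "promo.mp3",
                    "links": ["youtube-spot", "dealer-locator"]},
}

def recognize_fiducial(frame):
    """Stand-in for a vision routine that finds a marker ID in a frame."""
    return frame.get("marker_id")          # a real app would scan pixels

def fetch_experience(marker_id):
    """Stand-in for the query/response round trip to the content server."""
    return AD_CATALOG.get(marker_id)

def on_camera_frame(frame):
    marker_id = recognize_fiducial(frame)
    if marker_id is None:
        return None                        # no marker: show plain camera view
    # The marker's corners would also give the pose used to keep the 3D
    # model registered to the page as the user moves the camera.
    return fetch_experience(marker_id)

print(on_camera_frame({"marker_id": "car-ad-2024"})["model"])  # sedan.obj
```

The key point is that the fiducial itself carries only an identifier; the actual 3D content comes back from the server once the marker is recognized.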
The three types of markers are:
Barcode (UPC)
Barcodes are not commonly used fiducials because they hold only a small amount of information. They typically appear on the back or bottom of products, which is not conducive to being easily scanned in place. One case where it may make sense to use a UPC as a fiducial is when the AR producer needs to tie content to products with which they have no business relationship. A simple app could provide allergy info for grocery items based on the UPC. The user takes the product from the shelf and points the camera at the UPC; the app translates the image to a code which is sent to a database, and the database returns data displayed on the viewscreen as warnings tailored to the user’s own sensitivities. The one-dimensional nature of a barcode places significant limitations on the AR experience it can facilitate, as it cannot be used to effectively register content to the image in 3D space. In this example, however, the allergy information does not need to rotate or move along with the package.
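The allergy example above reduces to a simple lookup once the UPC has been decoded. This is a hedged sketch; the product database, UPC values and allergen lists are invented for illustration.

```python
# Sketch of the UPC allergy app: scanned code -> database lookup ->
# warnings filtered by the user's own sensitivities.

PRODUCT_DB = {  # hypothetical grocery database
    "012345678905": {"name": "Trail Mix", "allergens": {"peanuts", "tree nuts"}},
    "036000291452": {"name": "Oat Cereal", "allergens": {"gluten"}},
}

def allergy_warnings(upc, user_sensitivities):
    """Translate a scanned UPC into warnings tailored to the user."""
    product = PRODUCT_DB.get(upc)
    if product is None:
        return ["Product not found"]
    hits = product["allergens"] & user_sensitivities
    return [f"{product['name']} contains {a}" for a in sorted(hits)]

print(allergy_warnings("012345678905", {"peanuts", "soy"}))
# ['Trail Mix contains peanuts']
```

Note that nothing here needs the barcode's position or orientation, which is why a 1-D code suffices: the warning is a flat overlay, not content registered to the package in 3D.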
2-D Code (QR Code, Microsoft Tag, etc.)
2-D barcodes (sometimes called matrix codes) carry information in two directions: vertically and horizontally. This gives them the capacity to hold hundreds of times as much information as 1-D UPC barcodes. For instance, one of the most popular 2-D barcode formats, Denso Wave’s QR Code, can hold more than 7,000 digits or 4,000 characters of text, whereas even the most complex 1-D codes top out around 20 characters.[HowStuffWorks] Though this is still not enough data to transmit the AR experience itself, these tags have built-in features that lend themselves to machine vision techniques that can provide orientation and perspective feedback to the application, making them far superior to the 1-D UPC for AR.
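To put the capacity gap in concrete terms: the QR figures below are the published maximums for the largest (version 40) symbol at the lowest error-correction level, while the 1-D figure is the rough practical limit quoted above, not a hard standard.

```python
# Illustrative capacity comparison between 1-D and 2-D codes.

CAPACITY = {
    "1-D barcode (typical)": {"digits": 20},
    "QR Code (version 40, level L)": {"digits": 7089, "characters": 4296},
}

ratio = (CAPACITY["QR Code (version 40, level L)"]["digits"]
         / CAPACITY["1-D barcode (typical)"]["digits"])
print(f"QR holds ~{ratio:.0f}x as many digits as a typical 1-D code")
```

Even so, both formats carry identifiers rather than the AR content itself; the difference for AR is the QR symbol's finder patterns, which also reveal its orientation and perspective to the camera.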
2-D Image Recognition
As QR code technology has begun to suffer from overexposure, image-driven activation has grown tremendously in the publishing space.[nellymoser] In this mode, the fiducial is not necessarily apparent to the user, as it may take the form of a photo, logo, illustration or invisible digital watermark. The app’s process architecture is similar to that of the other marker types in that it sends information about the fiducial to a database that returns the corresponding AR experience data, but image recognition requires advanced algorithms and more robust computing capabilities.
Natural Feature Tracking (NFT)
NFT recognizes shape features in practically anything (packages, devices, structures, photos, patterns, faces) and uses them as markers to deliver and control an augmented reality experience. Maintaining accurate registration between real and computer-generated objects is one of the most critical requirements for creating an augmented reality experience. As the user moves his or her head and viewpoint, the computer-generated objects must remain aligned with the 3D locations and orientations of real objects. Alignment depends on tracking (or measuring) the real world and viewing pose accurately. The tracked viewing pose defines the projection of 3D graphics into the real-world image in 6 degrees of freedom, so tracking accuracy determines the accuracy of alignment.[Ulrich Neumann et al.] NFT AR is accomplished in three phases: detection of the targeted feature, tracking of the object as it or the viewer moves in 3D space, and rendering of the image or information in registration with the feature. This cycle must run continuously at high speed to maintain an acceptable experience. Mobile devices have only become capable of doing so effectively in the most recent generations of chips, thanks to the integration of purpose-built GPUs (graphics processing units) into the circuitry.
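The three-phase cycle can be sketched as a per-frame loop. The vision calls here are stubs driven by canned frame data; a real pipeline would run feature detectors and 6-DoF pose estimation on the GPU.

```python
# Sketch of the NFT cycle: detect -> track -> render, once per camera frame.

def detect(frame, target):
    """Phase 1: look for the target feature's contours in the frame."""
    return target if target in frame["features"] else None

def track(previous_pose, frame):
    """Phase 2: update the feature's pose as the camera or object moves."""
    return frame["pose"]          # a real tracker would refine previous_pose

def render(overlay, pose):
    """Phase 3: draw the overlay registered to the tracked pose."""
    return {"overlay": overlay, "pose": pose}

def nft_cycle(frames, target, overlay):
    pose, output = None, []
    for frame in frames:
        if pose is None and detect(frame, target) is None:
            continue                          # target not yet found
        pose = track(pose, frame)             # tracking phase
        output.append(render(overlay, pose))  # rendering phase
    return output

frames = [{"features": [], "pose": None},               # nothing detected yet
          {"features": ["filter-nut"], "pose": (0, 0, 0)},
          {"features": ["filter-nut"], "pose": (5, 2, 0)}]
results = nft_cycle(frames, "filter-nut", "wrench-icon")
print(len(results))  # 2: the overlay appears only once the nut is detected
```

The loop illustrates why throughput matters: every displayed frame requires a fresh tracking and rendering pass, which is what the recent purpose-built mobile GPUs make feasible.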
Consider an auto repair app that a consumer would use to learn how to change the air filter in their car. The app would need to know the make, model and year of the car to acquire the requisite instruction set, perhaps automated by pointing the camera at the VIN, which submits a query to an automotive database. After instructing the user in proper safety precautions, the app would tell them to open the hood and point the camera at the engine compartment. The app would then compare the image to the retrieved visual data and seek out familiar contours that identify the nut securing the air filter cover (detection). It would display a bright highlight superimposed over the nut on the viewscreen along with a wrench icon rotating counterclockwise (rendering). As the user moves the camera, the highlight and the wrench icon remain fixed over the nut despite changes in angle, orientation and lighting (tracking). The app would then continue to walk the user through the subsequent steps of the maintenance task with additional AR.
Location Based AR
Location Based AR combines the positioning capabilities of the mobile device with its video camera to overlay location-relevant information. Several components contribute to accurate positioning. The Global Positioning System (GPS) provides coordinates in latitude, longitude and altitude from orbiting satellites, but these are only accurate to within about 20 meters due to ionospheric distortion. When a GPS signal is unavailable, geolocation apps can use information from cell towers to triangulate an approximate position, a method that isn’t as accurate as GPS but has greatly improved in recent years. Some geolocation systems use GPS and cell site triangulation (and in some instances, local Wi-Fi networks) in combination to zero in on the location of a device. These and other techniques, known as “Assisted GPS,” can be accurate to within a meter. A geolocation database can use these coordinates to provide information such as country, region, city, postal code and time zone to an application.
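One simple way to picture how these sources combine is to select whichever fix currently reports the smallest error radius. This is a sketch under assumed accuracy figures (the rough numbers quoted above), not how any particular Assisted GPS implementation works.

```python
# Sketch: pick the best available position estimate by error radius.

def best_fix(fixes):
    """Return the available fix with the smallest estimated error, or None."""
    available = [f for f in fixes if f["position"] is not None]
    return min(available, key=lambda f: f["accuracy_m"], default=None)

fixes = [  # hypothetical simultaneous readings
    {"source": "gps",  "position": (40.7128, -74.0060), "accuracy_m": 20},
    {"source": "cell", "position": (40.7130, -74.0050), "accuracy_m": 150},
    {"source": "wifi", "position": (40.7129, -74.0061), "accuracy_m": 10},
]
print(best_fix(fixes)["source"])  # wifi
```

Real systems go further and fuse the sources statistically rather than picking one, but the principle of weighting by estimated accuracy is the same.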
Geolocation based AR depends on a priori knowledge of space and features, including landmarks or other points of interest. The device’s magnetometer (digital compass) is used to determine the direction of the camera’s gaze and the accelerometer its angle, both of which contribute to an app’s ability to know what the user is looking at. Natural feature tracking or Simultaneous Localization and Mapping (SLAM) may then be employed to complete the overlay of information.
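The compass-plus-position step can be sketched with standard geometry: compute the bearing from the user to a known point of interest and check whether it falls inside the camera's field of view. The coordinates and the 60-degree FOV below are illustrative assumptions.

```python
import math

# Sketch: does a known POI fall inside the camera's field of view,
# given the user's position and the magnetometer's heading?

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from the user to the POI, in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360

def in_view(user, poi, heading_deg, fov_deg=60):
    """True if the POI's bearing is within half the FOV of the gaze heading."""
    offset = (bearing_deg(*user, *poi) - heading_deg + 180) % 360 - 180
    return abs(offset) <= fov_deg / 2      # signed angular difference

user = (48.8583, 2.2944)   # hypothetical user position
poi  = (48.8606, 2.3376)   # hypothetical landmark roughly to the east
print(in_view(user, poi, heading_deg=90))   # True: facing east
print(in_view(user, poi, heading_deg=270))  # False: facing west
```

The accelerometer's tilt angle would extend the same test vertically; a production app would then hand off to NFT or SLAM to lock the overlay to what the camera actually sees.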
There are a number of applications for geolocation in augmented reality:
- directions and navigation of streets, buildings, trade shows or other venues
- restaurants and business information
- sightseeing and tourism
- point-in-time viewing of historic structures and milieu
- social network tagging of businesses and locales
- geography, geology or anthropology educational info
An example of a geolocation based AR app is the Civil War Augmented Reality Project. This is an educational game that engages users at various historical sites by detecting their location and gaze and superimposing 3D images of things that existed in that place and time, such as buildings, artillery and supplies. This is accompanied by info bubbles that pop up with explanations of the events depicted and the people involved in them. Players are then given clues that lead them to discover additional sites. Teams of players can compete in a scavenger hunt of sites and activities and be rewarded for their success by unlocking additional content.
Radio Based AR
Mobile devices contain several radio technologies such as Wi-Fi, Bluetooth and NFC (Near Field Communication) in addition to multiple generations of cellular radios such as 3G and 4G. Of these, Bluetooth and NFC are highly localized technologies that are well suited to triggering AR experiences.
The newest form of geolocation technology is the use of Bluetooth LE as an indoor positioning system. Small, inexpensive battery-powered beacons broadcast signals that can pinpoint your location more quickly and accurately than GPS. Apple’s iBeacon is an example of such a technology currently in use. It works by using proximity sensing to transmit a unique identifier which, when picked up by a compatible app or operating system, can determine a physical location or trigger a specific action on the device. In addition to performing tasks such as check-ins on social media, it can also be used to enhance shopping and payments. For augmented reality, it can provide a more personalized experience than ordinary GPS based geolocation.
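Proximity sensing boils down to estimating distance from received signal strength (RSSI). The log-distance path-loss model below is a common technique for this, not something the iBeacon specification mandates; the calibrated 1-meter power and the path-loss exponent are assumptions that real deployments measure on site. The zone names mirror the immediate/near/far categories iOS reports.

```python
# Sketch: map a beacon's RSSI reading to a rough distance and zone.

def estimate_distance_m(rssi, tx_power=-59, n=2.0):
    """Log-distance path-loss model: d = 10^((txPower - RSSI) / (10 * n)).
    tx_power is the calibrated RSSI at 1 m; n is the path-loss exponent."""
    return 10 ** ((tx_power - rssi) / (10 * n))

def proximity_zone(rssi):
    d = estimate_distance_m(rssi)
    if d < 0.5:
        return "immediate"
    if d < 4:
        return "near"
    return "far"

print(round(estimate_distance_m(-59), 1))  # 1.0 (at the calibrated power)
print(proximity_zone(-65))                 # near
```

Because RSSI fluctuates indoors, real systems smooth readings over time before acting on a zone change, for example before triggering an AR experience at a store shelf.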
NFC is a form of contactless communication between devices like smartphones or tablets. Devices establish radio communication with each other by touching them together or bringing them into close proximity (within 4 centimeters). The technology is primarily used for contactless transactions, data exchange, and simplified setup of more complex communications such as Wi-Fi. Communication is also possible between an NFC device and an unpowered NFC chip, called a “tag”.[NearFieldCommunication] NFC differs from RFID in that it is a two-way peer-to-peer data exchange. Once a link is established, the active device acts as the initiator and the passive device acts as a target, exchanging data using a secure protocol.[Ars Technica] An NFC enabled mobile device could act as an RFID reader, reading information from tags placed in posters, ads, or other items to link to a website for further information or to instantly grab a coupon code. One example is a tag on a poster that loads the Fandango website, which uses location services to show theaters and times for the particular movie in your area. Another example might be reading a tag on a display at Starbucks, which adds a coupon code that could be redeemed at the counter for a discount.[Ars Technica]
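What a phone actually reads from such a poster tag is typically a tiny NDEF URI record: one byte selecting a standard URL-prefix abbreviation, followed by the rest of the address. The prefix table below is from the NFC Forum URI record type; the example payload and URL are invented.

```python
# Sketch: decode the URI record an app might read from an NFC poster tag.

# First few NFC Forum URI abbreviation codes (byte 0 of the payload).
URI_PREFIXES = {0x00: "", 0x01: "http://www.", 0x02: "https://www.",
                0x03: "http://", 0x04: "https://"}

def decode_uri_record(payload: bytes) -> str:
    """Expand the one-byte prefix code, then append the UTF-8 remainder."""
    prefix = URI_PREFIXES.get(payload[0], "")
    return prefix + payload[1:].decode("utf-8")

tag_payload = bytes([0x01]) + b"fandango.com/movie-times"  # hypothetical tag
print(decode_uri_record(tag_payload))
# http://www.fandango.com/movie-times
```

The abbreviation scheme exists because tags hold very little data; like the other markers in this post, the tag carries a pointer, and the richer experience is fetched over the network.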
NFC tag technology is well suited to be a catalyst for augmented reality experiences. IT consulting firm Avanade has developed an interactive shopping solution called Grab & Go that combines NFC, QR codes, augmented reality and the Kinect motion sensor technology. The solution enables users to view and select products from a main video screen then transfer items to their mobile device for review or purchase using hand gestures. Consumers stand in front of the video screen and tap their device to an NFC tag to enable Grab & Go to connect with their device and launch a web app. They can then use simple hand gestures to point to items of interest and have them transferred to their mobile phone.[NFC World]