In my previous entry, 07: Modes of Visually Implementing AR, I discussed the various means by which an augmented reality experience can be visually rendered. I emphasized mobile AR since this is clearly where the compelling use cases lie and therefore where the commercial opportunities exist. While visual rendering is at the heart of AR, many other technologies must be integrated into an AR solution for it to be engaging, effective and attractive to the marketplace. In this entry I review the other hardware technologies that are converging to make mobile AR happen. Software and services also play a crucial role and will be covered in upcoming posts.
We already carry around devices in the form of smart phones and tablets that are packed with sensors, radios and powerful processors that need not necessarily all be duplicated in head mounted displays (HMD) where they would add weight, bulk and battery consumption. As long as the HMD can communicate with the device (via cable or radio) with sufficient speed, practicality dictates that as much work as possible should be offloaded to the mobile device. This review includes technologies that may participate in the rendering of AR for HMDs and mobile devices, but not all of these technologies need be present to do so. At some point in the near future the technologies will be miniaturized and optimized to the point where we will see integration into a single device worn on the head that will replace the need to have both. Let me know in the comments if there are other hardware technologies you can think of that may contribute to AR experiences.
Tracking, the process of locating a user’s position in an environment, is critical to AR applications because accurate registration of augmented content against the real world produces more realistic results. It usually includes determining the location, position and orientation of the AR user and the direction of their gaze.
An accelerometer detects forces along a single axis; three are combined to detect acceleration along the x, y and z axes. Accelerometers are common in many devices, where they are used to determine when the phone is on its side so that the screen can be rotated.[Techoutbreak] In AR, an accelerometer should be located in the HMD, where it can be used to stabilize an image overlay when the angle of the wearer’s head changes. For instance, when using an app that guides one through an automobile oil change by highlighting different engine components and explaining actions to be taken, the user’s head is likely to be shifting constantly as they duck under the hood and look at the engine from different angles. The accelerometer can help to maintain a highlight on the intended component during this exercise.
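To make the accelerometer’s role concrete, here is a minimal Python sketch of how head tilt could be estimated from a single three-axis reading while the head is roughly still, so that gravity dominates the measurement. The function name and the axis convention (a level HMD reads roughly +1 g on z) are my own assumptions for illustration and are not taken from any particular device.

```python
import math

def tilt_from_accel(ax, ay, az):
    """Estimate (pitch, roll) in degrees from one accelerometer reading.

    Assumption: the head is nearly still, so the reading is mostly gravity,
    and a level HMD reads roughly (0, 0, +1) in units of g.
    """
    roll = math.atan2(ay, az)                     # rotation about the x axis
    pitch = math.atan2(-ax, math.hypot(ay, az))   # rotation about the y axis
    return math.degrees(pitch), math.degrees(roll)

# Example: a level reading should give no tilt on either axis.
pitch_deg, roll_deg = tilt_from_accel(0.0, 0.0, 1.0)
```

An overlay renderer could counter-rotate its graphics by these angles so a highlight stays on the same engine component as the head tilts.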
While the accelerometer can determine movement in space, it doesn’t know which direction it is facing. A digital compass or magnetometer can be used to determine directional orientation. Magnetometers are found in most mobile devices, where they measure the strength and direction of magnetic fields. However, since the mobile device will likely remain in the pocket during an AR experience, it is necessary to have one in the HMD as well. Using a compass’s sense of magnetic north as a reference is quite useful in AR as it provides a vector in a known direction, which can be used in conjunction with an accelerometer to achieve full orientation tracking.[Techoutbreak]
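As a rough illustration of how a magnetometer reading becomes a compass heading, the Python sketch below handles the simplest case of a device held level. The axis convention (x forward, y to the right) and the function name are assumptions of mine; real sensors differ, and a tilted HMD would first need the accelerometer’s tilt estimate to level the reading.

```python
import math

def heading_from_magnetometer(mx, my):
    """Return heading in degrees clockwise from magnetic north (0 to 360).

    Assumptions: the device is level, x points forward and y points right.
    mx and my are the horizontal magnetic field components in any unit.
    """
    return math.degrees(math.atan2(-my, mx)) % 360.0

# Example: field entirely along the forward axis means facing magnetic north.
north = heading_from_magnetometer(1.0, 0.0)
```

Only the ratio of the two components matters, so the sensor’s units do not affect the result.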
Gyro sensors, also known as angular rate sensors or angular velocity sensors, sense angular velocity. In simple terms, angular velocity is the change in rotational angle per unit of time, typically expressed in degrees per second. Though gyro sensors are also present in mobile devices, it is necessary to have them in the HMD for AR. Used in conjunction with an accelerometer, a gyro sensor can produce smooth rotation detection in AR applications so that the graphical presentation can be adjusted accordingly.[Techoutbreak]
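One common way to combine a gyro with an accelerometer is a complementary filter, sketched below in Python. This is a generic technique that I am using for illustration, not necessarily what any given HMD does. The blend factor of 0.98 and the function name are assumptions.

```python
def complementary_filter(prev_angle_deg, gyro_rate_dps, accel_angle_deg,
                         dt_s, alpha=0.98):
    """Blend a gyro-integrated angle with an accelerometer-derived angle.

    The gyro term is smooth but drifts over time; the accelerometer term is
    noisy but does not drift. alpha (an assumed value) weights the gyro.
    """
    gyro_angle = prev_angle_deg + gyro_rate_dps * dt_s
    return alpha * gyro_angle + (1.0 - alpha) * accel_angle_deg

# Example: the head is held still at a 10 degree tilt, sampled at 100 Hz.
angle = 0.0
for _ in range(500):
    angle = complementary_filter(angle, 0.0, 10.0, 0.01)
# After five seconds the estimate has settled very close to 10 degrees.
```

A higher alpha gives smoother output but takes longer to correct drift, so the value is a tuning choice.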
GPS (Global Positioning System)
GPS is a satellite based service that is used to determine the location of a device using universal grid coordinates to determine latitude, longitude and elevation. The ubiquity and maturity of satellite-based global positioning technology on mobile devices allow AR apps to be context-aware to provide users with information relevant to their physical location or to determine which experience to trigger. GPS hardware and processing can be performed by the mobile device and therefore need not be present in the HMD.
Barometers are integrated into advanced mobile devices to help them acquire a GPS lock more rapidly by supplying an altitude estimate alongside the latitude and longitude that the GPS solution must compute. The barometer therefore serves as an altimeter, which can be leveraged by location-aware AR apps in cities where high-rise buildings must be taken into account when determining a user’s position in space and thus where augmented features should be displayed. Barometers need not be present in the HMD.
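For a sense of how a pressure reading turns into an altitude, here is a small Python sketch using the widely published international barometric formula. The sea-level reference of 1013.25 hPa is the standard-atmosphere value and the 3 m storey height is purely my own assumption. Real apps would need to calibrate against local weather.

```python
def altitude_from_pressure(pressure_hpa, sea_level_hpa=1013.25):
    """Approximate altitude in metres from air pressure in hectopascals.

    Uses the international barometric formula; sea_level_hpa is an assumed
    reference and changes with the weather.
    """
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))

def storeys_between(pressure_low_hpa, pressure_high_hpa, storey_height_m=3.0):
    """Rough number of storeys between two readings (storey height assumed)."""
    climb = (altitude_from_pressure(pressure_high_hpa)
             - altitude_from_pressure(pressure_low_hpa))
    return climb / storey_height_m

# Example: pressure drops from 1013.25 hPa at street level to 1009.0 hPa.
floors = storeys_between(1013.25, 1009.0)
```

Because the calculation only compares two readings taken close together in time, the weather-dependent reference pressure matters much less for the floor estimate than for absolute altitude.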
Computer Vision technology gives devices the ability to understand their short range surroundings in terms of 3D depth, feature shapes, distance and motion of the camera or subject. This is key to many markerless AR applications that require the user to interact with their surroundings such as objects, surfaces and gestures. Computer vision typically requires an array of emitters and sensors which must be located as close to the user’s point of view as possible thus requiring them to be mounted on the HMD, but the processing overhead may be offloaded to the robust video processing chips found in the mobile device. There are several approaches that accomplish computer vision in different ways that are too technical to get into in this blog. Most intellectual property in this space is centered around the processing algorithms rather than hardware, but since this post is about the hardware I shall provide a crude overview.
The process begins by emitting a pulsed signal at an interval sufficient for tracking moving objects. This signal may come from an infrared LED, laser or ultrasonic speaker (listed in order of depth capability) or any combination thereof. The signal spreads out across the target area which may be just a few feet for gesture recognition with infrared or 10 – 30 feet with lasers and sound.
The emitted signal is reflected off the target environment and bounces back to sensors tuned specifically to the emitted frequencies. Light is sensed by a CMOS camera sensor while sound is picked up by a microphone. Stereoscopic sensing is often used to improve depth cues.
The signal processing is performed by firmware and/or software, often on a dedicated ASIC, which parses the patterns to determine shapes and surfaces and parses the delay of the returned signal to interpret depth. This processed information is then input into the AR software that weaves it together with contextual imagery and data to render the AR experience visually.
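The depth part of the process above comes down to simple arithmetic: the signal travels out and back, so the distance is half the round trip. The Python sketch below covers only that delay-to-depth step, not the pattern parsing, and the names and the speed-of-sound figure (dry air at about 20 °C) are my own assumptions.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees Celsius

def depth_from_round_trip(delay_s, speed_m_s):
    """Distance to the reflecting surface given the round-trip delay."""
    return speed_m_s * delay_s / 2.0

# Example: an ultrasonic echo returning after 20 milliseconds.
sound_depth_m = depth_from_round_trip(0.020, SPEED_OF_SOUND_M_S)

# Example: an infrared or laser pulse returning after 20 nanoseconds.
light_depth_m = depth_from_round_trip(20e-9, SPEED_OF_LIGHT_M_S)
```

Both examples work out to roughly three metres. The difference is that the light-based sensor must resolve nanoseconds, which is one reason dedicated silicon is often used for the timing.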
Human interfaces are how people interact with a (mobile) AR system and how that system interacts with them. This is a two-way street: the human provides input to the system and the system provides feedback to the human. The following components accomplish this.
Microphones are a necessary component of the HMD for voice command operations as well as for applications that require voice communications. An example of the latter would be a help desk app that uses the built in camera to allow a technician to see what the user is seeing and provide guidance in a repair or assembly task by highlighting certain elements in the user’s viewscreen.
Audio output in the form of headphones is key to communicating the spoken instruction and information provided by apps. However, covering or plugging the ears of the user is often not desirable for natural interaction with the outside world. An alternative is bone-conduction technology, which transmits audio to the inner ear through the skull using the vibration of a piezoelectric bone-conduction speaker. The transducers can be placed on the cheekbones or behind the ears, thus leaving the ears free to hear ambient sounds while listening to the vocal content from an AR application. This vocal content can be heard in high noise environments (even while wearing ear plugs) yet remains virtually inaudible to bystanders. While not integral to AR experiences, bone conduction technology is an adjunct that is sure to contribute to the effectiveness of AR experiences and has already been built into head mounted displays such as Google Glass, which uses the entire frame to conduct sound.
While computer vision is a viable means for controlling the AR experience through hand gestures (see above), it does have limitations and weaknesses. The requisite sensor array adds weight, bulk and battery consumption to the HMD and the gestures must be performed in full view of the sensors to be effective. Alternatively, gesture control can be accomplished through separate devices worn as a ring, bracelet or armband. These devices have their own power source and may interface with either the HMD or the mobile device through a radio communication protocol such as Bluetooth. They contain accelerometers, gyroscopes and magnetometers for determining orientation in space. They may also use electromyography (EMG) to interpret the muscle movements made by specific gestures or capacitive touch to enable certain gesturing.
While gesture tracking is all about controlling the AR experience through the sensing of hand and finger movement, many of the same capabilities can be accomplished by tracking the eyes thus leaving the hands free to accomplish tasks. Technology already exists for using one’s eyes as a mouse pointer to interact with a PC or phone by simply gazing at different areas of the display. The form factor of this technology is now being reduced to fit in HMDs to bring these same capabilities to AR. In addition to controlling the user interface through winks and other deliberate eye movements, the eyes’ gaze can also be used to determine where in the field of vision to place the augmented digital content.[Kemal Doğuş] By bouncing harmless infrared light off of the eye it can be determined where the eye is looking and at what distance it is focused. The incoming image can then be re-focused to accommodate where, and at what depth, the wearer is looking. Or it could be used to determine that the wearer is looking away, and the image can be turned off altogether.
HMDs must contain buttons or multi-function touch pads for functions that may not be convenient or practical to control through the other input interfaces such as on/off, wake, call answer/hang-up and volume control.
Motion detection can be used to interact with an HMD. A head nod or shake may activate or deactivate features and operations.
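As a simple illustration of how a nod might be picked out of gyro data, the Python sketch below looks for a strong pitch swing in one direction followed by a strong swing in the other. The 60 degrees per second threshold and the function name are hypothetical values chosen for the example, and a real product would need filtering and timing limits on top of this.

```python
def detect_nod(pitch_rates_dps, threshold_dps=60.0):
    """Return True if the samples contain two opposite strong pitch swings.

    pitch_rates_dps is a sequence of gyro pitch-rate samples in degrees per
    second; threshold_dps is an assumed cut-off for a deliberate movement.
    """
    direction = 0   # +1 or -1 for the last strong swing seen, 0 for none
    swings = 0
    for rate in pitch_rates_dps:
        if rate > threshold_dps and direction != 1:
            direction = 1
            swings += 1
        elif rate < -threshold_dps and direction != -1:
            direction = -1
            swings += 1
    return swings >= 2

# Example: head dips down quickly, then comes back up.
nodded = detect_nod([0.0, -80.0, -20.0, 90.0, 0.0])
```

The same logic applied to the yaw axis would pick up a head shake instead of a nod.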
There are several approaches to providing visual feedback in a mobile AR system. These are covered in depth in my post 07: Modes of Visually Implementing AR, where I discuss near-eye microdisplays, projected image overlay, dual focus contact lenses and near-eye light fields. The display is perhaps the most important human interface in an AR system.
Mobile devices and HMDs are essentially computer hardware. In this section I cover, at a high level, the basic system components that enable processing. There are many other components and subsystems that enable computer system processing that are too involved to call out here.
Mobile devices use compact system on a chip (SoC) integrated circuits that are incredibly powerful, small and energy efficient, and that use reduced instruction set computing (RISC) technology in the form of the ARM processor architecture. “SoC” implies that the data processing needs of subsystems such as graphics processing, high definition video, data modem, power management, external interfaces (e.g. USB), data encryption, RAM and 3D rendering are handled on a single chip in addition to CPU duties. The current generation offers dual and quad core architectures that rival the capabilities of recent generation desktop computers. Next generation chips are now being deployed to meet the specific demands of augmented reality systems. Processors demand a significant amount of power, so as much of the processing as is practical should be offloaded to the more powerful chips in the mobile device. This allows the size and power consumption of the chip in the HMD to be kept to a minimum.
Smart phones are now offering up to 64 GB of solid state disk capacity plus the support of SD cards of equal size. This enables the devices to accommodate larger programs (apps) of the complexity and detail required by AR. HMDs must also have many GBs of on board storage for their own programs, but the majority of the data input that AR applications require comes from the Cloud.
Up to 2 GB of RAM is now being built into mobile devices which is also key to the support of the complexity of AR app function. HMDs also require a similar amount of memory which is likely to include both flash and DDR types.
Other Hardware Features
The following are other hardware technology features required for AR systems.
Ambient Light Sensor
Light sensors adjust the brightness of the screen to optimize both viewability and the consumption of power in the HMD. This has implications for advanced AR apps that strive to seamlessly blend augmented content with their surroundings. For instance, an app that superimposes images of historical structures on the sites where they used to stand could use an ambient light sensor to more realistically shade the 3D buildings based on the lighting when being viewed.
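To illustrate the brightness side of this, here is a minimal Python sketch that maps an ambient light reading in lux to a display brightness level. The logarithmic mapping, the 10,000 lux ceiling and the 10% floor are all assumptions of mine for the example rather than values from any real HMD.

```python
import math

def brightness_from_lux(lux, min_level=0.1, max_level=1.0, max_lux=10000.0):
    """Map ambient light in lux to a brightness level between min and max.

    A log scale is assumed because perceived brightness grows roughly with
    the logarithm of light intensity. Readings are clamped to [1, max_lux].
    """
    clamped = max(1.0, min(lux, max_lux))
    fraction = math.log10(clamped) / math.log10(max_lux)
    return min_level + fraction * (max_level - min_level)

# Example: a dim room versus a bright outdoor reading.
indoor = brightness_from_lux(100.0)
outdoor = brightness_from_lux(10000.0)
```

The same reading could also be fed to the renderer as a lighting intensity for shading the 3D buildings in the historical-site example.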
Bluetooth, Wi-Fi, NFC, WiMAX, 3G and 4G LTE provide many avenues for mobile devices to acquire the contextual data required for real-time AR apps. The challenge is transmitting visual data from the HMD to the mobile device, processing and integrating it with augmented data, then sending it back to the HMD display quickly enough to prevent perceptible lag. The preferred wireless protocol for doing so would be Bluetooth 4 because of its reduced energy consumption, but its throughput maxes out at 24 megabits per second. Using video compression makes Bluetooth feasible; however, even the best VPUs suffer some latency when decompressing the image. Until this can be worked out, a USB cable connection may be the optimal way to pair the HMD with the mobile device.
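A little arithmetic shows why compression is unavoidable over a 24 megabit per second link. The Python sketch below assumes a 1280 x 720 video stream at 30 frames per second with 24 bits per pixel. Those figures are my own illustrative assumptions, and only the 24 Mbps ceiling comes from the discussion above.

```python
def raw_video_mbps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video bitrate in megabits per second."""
    return width * height * bits_per_pixel * fps / 1_000_000

def required_compression_ratio(raw_mbps, link_mbps):
    """How many times smaller the stream must be to fit the link."""
    return raw_mbps / link_mbps

# Assumed stream: 720p at 30 frames per second, one direction only.
raw = raw_video_mbps(1280, 720, 30)            # about 663.6 Mbps
ratio = required_compression_ratio(raw, 24.0)  # about 27.6 to 1
```

That ratio covers a single direction with no protocol overhead, so the real requirement for a round trip between HMD and mobile device would be higher.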
Batteries continue to get smaller, recharge faster and last longer. The demands of the constant HD video input, processing and output necessary for AR are very taxing on batteries, and maintaining sufficient charge is a significant challenge to overcome for both the HMD and the mobile device. While battery technology improvements historically come in small increments rather than leaps, the availability of charging facilities is rising to meet the needs of the mobile world. Wireless inductive technology promises to make device charging more convenient.
Camera (front facing)
Video and still cameras are being incorporated into HMDs as a hardware feature that is not necessary for AR experiences. The feature allows the wearer to use the HMD as a hands free point of view camera. The inclusion of this on Google Glass has caused many to react negatively to what may be perceived as a privacy intrusion when used surreptitiously.