In the minds of many, augmented reality began with Google Glass. But that is far from the truth. AR has been evolving since the 1950s through incremental advances in thought and technology. In this entry, I recount select events that I find important to this evolution. Like the evolution of all technologies, advances were slow at first but then began to build on each other more rapidly. This, in turn, leads to more parties getting involved and more resources being devoted to solving the problems. Today, the advancement curve is just beginning its rapid ascent, on the cusp of major acceleration as big tech companies start significantly investing in AR. This is surely the dawning of a new age. This survey is by no means a complete accounting. What events have I omitted that you find to be important?
The heads-up display (HUD) was adapted from WWII-era gyro gunsight technology in the late 1950s with the objective of supplying basic flight information on the windscreens of military planes, minimizing the need to glance down and change focus from the horizon.[D.N. Jarett] HUDs entered commercial aviation in the 1970s and were first put into service on the Dassault Mercure.[Glenn Conor]
General Motors was the first auto manufacturer to develop a HUD for cars, in 1988. By the late 1990s many Buicks were equipped with HUDs, and in 2003 BMW introduced its system to Europe.[David Merline]
The term Augmented Reality is believed to have been coined in 1990 by Thomas Caudell, a former Boeing researcher. The concept first appeared in a 1992 research paper describing the design and prototyping of a heads-up, see-through, head-mounted display, and how this technology could cut costs and improve efficiency in many human-involved aircraft-manufacturing operations by eliminating templates, formboard diagrams, and other masking devices.
Ronald Azuma presents the first survey on Augmented Reality. In his 1997 publication, Azuma provides the seminal definition for AR, as identified by three characteristics:
1. it combines real and virtual
2. it is interactive in real time
3. it is registered in 3D
A group led by Steven K. Feiner developed the first mobile augmented reality system, dubbed the Touring Machine, in 1997. It consisted of a see-through head-worn display with a built-in orientation tracker and a backpack holding a computer, GPS, and a digital radio for wireless network access. A hand-held computer with a touchpad interface provided command input.
Sportvision, a company based in New York City, debuted its “1st and Ten” system during a game between the Bengals and the Ravens, broadcast on ESPN on September 27, 1998.[Shell Brannon] The virtual yellow 1st & Ten line system is a patented video overlay technology that creates the illusion that a yellow first-down line or a down-and-distance arrow is actually painted on the field, even allowing players to cross over and stand on it as if it were real. This is a technology most people are familiar with but likely don’t think of as augmented reality.
In 1998 Henry Fuchs et al. publish their research into prototyping a three-dimensional visualization system to assist with laparoscopic surgical procedures. The system uses 3D visualization, depth extraction from laparoscopic images, and six degree-of-freedom head and laparoscope tracking to display a merged real and synthetic image in the surgeon’s video-see-through head-mounted display. The study postulates that “viewing laparoscopic [AR] images from outside the body, as if there were an opening into the patient, will be more intuitive than observing laparoscopic imagery on a video monitor or on a stereo video monitor.” While the physician’s “use of the laparoscope is somewhat akin to exploring a dark room with a flashlight, [AR] will provide the added benefit of visual persistence of the regions of the scene that were previously illuminated.” This research laid the groundwork for many key advances in the use of AR for surgical and medical procedures.
The first method for tracking fiducial markers using an optical see-through head mounted display is presented by Hirokazu Kato et al. at the IEEE International Workshop on Augmented Reality in 1999. The result is ARToolKit, an open source software library for building six-degrees-of-freedom AR applications that solve the problem of changing the view of the augmented content based on the user’s viewpoint. This software continues to evolve today through collaborative open source development.
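The marker-matching step at the heart of a toolkit like ARToolKit can be sketched in a few lines: threshold the square pattern found in the video frame into a binary grid, then compare it against a library of known markers under all four in-plane rotations. The grid values, marker name, and function names below are hypothetical illustrations of the idea, not ARToolKit’s actual API:

```python
def rotate_ccw(grid):
    """Rotate a square binary grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]

def identify_marker(grid, library):
    """Match a thresholded marker interior against a library of known
    patterns, trying all four in-plane rotations. Returns (marker_id,
    rotations) or None. The matched rotation is the first step toward
    recovering the marker's orientation relative to the camera."""
    for marker_id, pattern in library.items():
        rotated = [list(row) for row in grid]
        for rot in range(4):
            if rotated == pattern:
                return marker_id, rot
            rotated = rotate_ccw(rotated)
    return None

# A toy 3x3 marker "library" (real ARToolKit patterns are much finer).
library = {"hiro": [[1, 0, 0], [0, 1, 0], [0, 0, 0]]}
```

Once the marker is identified, the known 3D positions of the square’s four corners and their observed 2D image positions are what allow the six-degrees-of-freedom camera pose to be estimated.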
The first AR game was presented in a paper by Ben Close et al. called “ARQuake: An Outdoor/Indoor Augmented Reality First Person Application” in 2000. The team took Quake, a desktop first-person shooter, and converted it into a mobile augmented reality application played by people moving around in physical space. Their architecture provided a low-cost, moderately accurate six-degrees-of-freedom tracking system based on GPS, a digital compass, and fiducial vision-based tracking.
Archeoguide was developed in 2001 with the objective of providing new, compelling, user-friendly ways to access information at cultural heritage sites through advanced technology including augmented reality, 3D visualization, mobile computing, and multi-modal interaction. Mobile head-mounted displays equipped with a camera, microphone, and earphone, plus a lightweight portable computer with a simple input device, give visitors to Olympia, Greece contextual information about the site, adaptive navigation, 3D visualization of missing and damaged artifacts, and multi-modal interaction for obtaining information on real and virtual objects through gestures and speech.
Mathias Möhring et al. present the first running video see-through augmented reality system in 2004 on a consumer cell-phone. “It supports the detection and differentiation of different markers, and correct integration of rendered 3D graphics into the live video stream…”.
In a paper by Anders Henrysson et al., a method is devised for mobile phones to support face-to-face collaborative AR gaming. The team created a custom port of the ARToolKit library to the Symbian mobile phone operating system, enabling the development of the collaborative AR-Tennis game, which went on to receive the Independent Mobile Gaming best game award for 2005.
In a 2005 paper, Reitmayr et al. present a model-based hybrid tracking system for outdoor augmented reality in urban environments, enabling accurate, real-time overlays on a handheld device. The system combines an edge-based tracker for accurate localization, gyroscope measurements to deal with fast motions, measurements of gravity and magnetic field to avoid drift, and a back store of reference frames with online frame selection to re-initialize automatically after dynamic occlusions or failures.
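The fusion principle behind such hybrid trackers — a fast but drift-prone sensor corrected by slower absolute measurements — can be illustrated with a simple complementary filter on heading. This is a generic sketch of the idea, not the paper’s actual algorithm, and the parameter values are illustrative:

```python
def fuse_heading(heading, gyro_rate, mag_heading, dt, alpha=0.98):
    """One step of a complementary filter: the gyro (fast but drifting)
    predicts the new heading, and the magnetometer (slow but absolute)
    nudges the estimate back toward truth, suppressing drift.
    Headings are in degrees; gyro_rate is in degrees per second."""
    predicted = (heading + gyro_rate * dt) % 360
    # Shortest signed angular difference between the absolute and
    # predicted headings, in [-180, 180).
    error = (mag_heading - predicted + 180) % 360 - 180
    return (predicted + (1 - alpha) * error) % 360
```

With alpha close to 1 the filter trusts the gyro over short time scales, so fast motions are followed immediately, while the small magnetometer correction removes accumulated drift over many frames.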
In 2006 Eyemagnet developed a ‘virtual dressing room’ for Hallenstein’s clothing stores in New Zealand. Their Motion Detection system uses cameras to track users’ gestures and translate them into movement that the application registers. The camera positions the user on the screen, creating a virtual mirror. Selected clothing items are then superimposed on the shopper, allowing them to try more options than they likely would in an actual dressing room and increasing the conversion rate from shoppers to buyers. The system is complemented by a mobile application extending the user’s brand experience beyond the retail environment. This was the first of many innovations in “VDR” that have since spread to home PC and mobile applications.
HIT Lab NZ worked with Saatchi and Saatchi to deliver the world’s first mobile phone based AR advertising application for the Wellington Zoo in 2007. Mobile phone users point their phones at the printed box to see a 3D model of a giraffe, cheetah or Malayan sun bear – all animals that can be seen at the zoo. The ad, placed in a major newspaper, reached 750,000 people, leading to a 32% growth in visitors at Wellington Zoo.
Gartner names Augmented Reality one of the Top Ten Disruptive Technologies for 2008 to 2012. This is according to Gartner Fellow David Cearley who sees it as one of many converging technologies that CIOs must keep on top of in order to remain competitive.
Miyashita et al. present a 2008 paper on extending existing AR technology to improve the ubiquitous museum audio-tour experience, using markerless tracking, hybrid tracking, and an ultra-mobile PC. The system was put in place for a six-month exhibition on Islamic art at the Musée du Louvre in Paris, the result of a three-year collaboration that included the AR company Metaio GmbH, with the intent of exploring and developing these technologies for this application.
Yelp debuts Monocle as the first AR app on the iPhone in 2009. It was initially implemented as an “easter egg”, a hidden feature buried within the app that required activation by shaking the phone three times after launching it. This was done to circumvent the ban Apple had in place at the time on AR apps. Holding the device up with the camera pointing in any direction populates the screen with nearby Yelp-listed restaurants; users can then tap a listing to see the full Yelp reviews.
Mobilizy GmbH was founded in May 2009 by Philipp Breuss-Schneeweis as a startup specializing in mobile Augmented Reality software for smartphones. The Wikitude World Browser overlays information on the real-time camera view of a smartphone by combining GPS and compass data with Wikipedia entries.
SPRXmobile launches Layar in June 2009 as “the world’s first mobile Augmented Reality browser”, displaying real-time digital information on top of reality in the camera screen of the mobile phone. While looking through the phone’s camera lens, a user can see houses for sale, popular bars and shops, jobs, healthcare providers, and ATMs. Layar is built on location-based services and works on Android mobile phones: GPS determines the location of the phone and the compass determines the direction the phone is facing. Each developer partner provides a set of location coordinates with relevant information, which forms a digital layer. The user switches between layers by tapping the side of the screen.
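The core geometry behind a browser like Layar is straightforward: compute the compass bearing from the phone to each POI, and draw only the POIs whose bearing falls within the camera’s horizontal field of view. A minimal sketch of that logic (the 60-degree field of view and the function names are assumptions for illustration, not Layar’s implementation):

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from the device to a POI, in
    degrees clockwise from north (0-360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(y, x)) + 360) % 360

def visible(poi_bearing, compass_heading, fov_deg=60.0):
    """True when the POI lies inside the camera's horizontal field of view."""
    diff = (poi_bearing - compass_heading + 180) % 360 - 180
    return abs(diff) <= fov_deg / 2
```

The horizontal screen position of each visible POI then follows from the same angular difference, scaled to the screen width.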
Quest Visual was founded by Otavio Good in 2009 in San Francisco, CA to leverage the increasingly powerful computer vision capabilities of mobile devices. Their breakthrough product, the Word Lens app, translates printed words from one language to another using a smartphone’s video camera in real time. OCR technology was initially employed to translate Spanish to English, but other languages soon followed. It is designed to read the small number of plainly stated words found on a sign rather than longer texts or handwriting. Quest Visual was acquired by Google in 2014.
AR Conference debuts in San Francisco in April 2010 as “the world’s first and largest commercial augmented reality event” with a stated purpose “to accelerate the industry in order to make the physical world our platform.” The conference showcased participation by many of the fledgling industry’s luminaries, entrepreneurs, developers and scientists and was sponsored by a constellation of tangentially related technology and communication companies. This conference has since changed names to Augmented World Expo.
Vuzix debuts the first consumer AR glasses in 2010 with the Wrap 920AR, described as a wearable display with a 67-inch virtual screen (as viewed from ten feet), stereo video capture, six-degrees-of-freedom head tracking, VGA connectivity for a computer, and plug-in software to bring Autodesk 3ds Max characters to life; it boasted 1504 x 480 resolution. It was selected as a 2010 CES Innovations Award winner. According to Vuzix CEO Paul Travers, “This state-of-the-art video eyewear, offers consumers an AR solution only available in handheld devices to date, merging the real world with the digital in a pair of glasses makes for world changing experiences from industry to gaming.” [marketwired.com]
Google Goggles is a downloadable image recognition app launched in 2010 for searches based on pictures taken by handheld devices. The system can identify various labels or landmarks, allowing users to learn about such items without needing a text-based search. The system can identify products, barcodes or labels that allow users to search for similar products and prices and save codes for future reference. The intent was to deliver a universal visual search tool but capabilities were initially limited by the scant amount of data originally available. It was first released for the Android OS but later made available for iOS and Blackberry. Though in itself Google Goggles is not considered to be AR, applications such as this are considered to be essential services to a future AR ecosystem that will identify things within one’s view then provide useful contextual information.
Kinect is a motion-sensing input device by Microsoft developed for the Xbox 360 video game console and Windows PCs that was released in 2010. The device features an RGB camera, a depth sensor, and a multi-array microphone, which together provide full-body 3D motion capture, facial recognition, and voice recognition capabilities. It enables users to control and interact with the Xbox 360 without touching a game controller, through a natural user interface using gestures and spoken commands. Kinect is significant because it was the first commercially available gesture-sensing device. Such technology is shrinking in size and is expected to become a ubiquitous feature on smartphones to supplement touch-screen input. Gestural sensors are certain to become integral to mobile AR by enabling interactivity where other input methods are not practical, such as interpreting sign language.
The use of pico projectors as an alternative to head mounted displays for rendering of AR is presented in a paper by Johannes Schöning et al. in 2010. The researchers foresaw the miniaturization of projector technology as enabling a novel means to implement unique applications of AR with mobile devices. The paper describes how physical objects in the environment can be augmented with additional information by this means (as opposed to having the information overlaid by ocular devices). They identify different application classes of such interfaces, namely object-adaptive applications, context-adaptive applications, and camera-controlled applications. One example they offer is the augmentation of a paper map with projected contextual information and graphics. Today there are rumors of pico projectors being integrated into the next generation of mobile devices.
In 2011, researchers at IBM in Haifa developed a mobile app for shoppers that provides immediate product information and comparisons based on their preferences as they move throughout a store. It uses the smart phone device’s camera along with advanced image processing technologies to identify a product on the shelves. Once recognized, it will display information above the product images and rank them based on a number of criteria, such as price and nutritional value. It will also provide the shopper with any loyalty rewards or incentives that may apply and suggest complementary items based on what the customer has already viewed. The app also allows retailers to better understand the shopping preferences and habits of consumers.
Nestor is a real-time planar shape recognition and camera pose estimation system for mobile devices developed by Nate Hagbi et al. at The Visual Media Lab, Ben-Gurion University, Israel in 2011. The system can read shape files, or perform a learning step in which the user shows a new shape to the camera. The shape is analyzed and inserted into a library, which is used to maintain the set of shapes to be tracked and their properties, such as the models assigned to them. When a learned shape is recognized at runtime, its pose is estimated in each frame and augmentation can take place. Experiments show the system performs robust recognition and registration, maintains accurate tracking, and operates at interactive frame rates on a mobile phone.
A study by Steven Henderson in 2011 definitively establishes AR for Maintenance and Repair (ARMAR) as being viable in the execution of procedural tasks in the maintenance and repair domain. The principal objective of the project was to determine how real-time computer graphics, overlaid on and registered with the actual equipment being repaired, can improve the productivity, accuracy, and safety of maintenance personnel. Head-worn, motion-tracked displays augment the user’s physical view of the system with information such as sub-component labeling, guided maintenance steps, real-time diagnostic data, and safety warnings. The virtualization of the user and maintenance environment allows off-site collaborators to monitor and assist with repairs. Additionally, the integration of real-world knowledge bases with detailed 3D models provides opportunities to use the system as a maintenance simulator/training tool. This study confirms what has long been expected to be one of the most practical applications for AR technology and paves the way for this domain to be exploited.
A paper by Brian F. G. Katz et al. in 2012 introduced the NAVIG augmented reality assistance system for the visually impaired, with the aim of increasing autonomy and mobility in both sensing the immediate environment and pedestrian navigation. Combining satellite positioning, rapid image recognition, and spatial audio rendering, detailed trajectories can be determined and presented to the user for attaining macro- or micronavigational destinations. This kind of assistive device does not replace traditional mobility aids such as the cane or the guide dog, but provides the user with important information for spatial cognition including guidance and spatial description. It may also help users generate a functional mental map of the environment. The 3D sound module that provides cues through headphones may later be adapted for bone conduction headphones that won’t obstruct users’ access to the natural sounds of the environment.
Google’s Project Glass was named by Time Magazine as one of the Best Inventions of 2012. Though not widely available to consumers until late 2013, Google’s cachet helped bring AR an attention and awareness it had not previously enjoyed over its decades of existence. Google Glass is a head-mounted display that includes a camera with HD video, bone-conduction speakers, a microphone, Wi-Fi, Bluetooth, GPS, power meters/sensors, a thermometer, and a tiny prism display that sits above the right-side eyeline. It is designed to work in conjunction with an Android device and is thus expected to integrate with many existing Android apps such as Google Maps (navigation), Google+ (social media), Gmail notifications (calendar and email), Google Goggles (image recognition), etc.
An augmented reality mobile application that delivers personalized and location-based recommendations by continuously analyzing social media streams was detailed in a 2012 paper by Marco Balduini et al. Every day, hundreds of thousands of tweets carry the live opinions of tens of thousands of users about restaurants, bars, cafes, and many other semi-public points of interest. Trusting the power of crowdsourcing to be a solid base for novel commercial and social services, the BOTTARI system was conceived as an AR application that offers personalized and localized recommendations based on the temporally weighted opinions of the social media community. By pointing the device to frame the surrounding environment, users see annotations for the recommended points of interest based on their search preferences. POI types are indicated with different icons, and their reputation is indicated by thumbs-up and thumbs-down icons. This innovation is significant for its effective integration of social media, big data, and AR.
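The “temporally weighted opinions” idea can be sketched as a time-decayed score in which recent tweets count more than old ones. The half-life, the data shape, and the function name below are illustrative assumptions, not BOTTARI’s actual model:

```python
import math

def poi_score(opinions, now, half_life_days=30.0):
    """Aggregate (timestamp_in_days, +1/-1) opinions about one POI into
    a single score, halving each opinion's weight every half_life_days.
    Positive scores suggest a thumbs-up, negative a thumbs-down."""
    decay = math.log(2) / half_life_days
    return sum(sign * math.exp(-decay * (now - t)) for t, sign in opinions)

# Two fresh thumbs-up outweigh one thumbs-down from three months ago.
opinions = [(100.0, 1), (99.0, 1), (10.0, -1)]
```

Ranking nearby POIs by such a score, filtered by the user’s search preferences, is the essence of the recommendation layer the paper describes.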
In April 2013, Google announced that its venture arm, Google Ventures, was partnering with the venture capital firms Andreessen Horowitz and Kleiner Perkins Caufield & Byers to form the Glass Collective. The Collective will provide seed funding to developers and startups looking to build applications for Glass. The announcement coincided with the imminent shipment of the first batch of glasses to 8,000 developers who paid for the privilege of being “Glass Explorers”, with the intent of seeding the market with the first generation of apps. The commitment by the highest-profile VC firms in the Bay Area to invest in promising ideas is designed to incentivize the Glass Explorers to invest their own time in developing promising concepts in hopes of receiving funding.