Facebook Looking at AI Systems That Will Remember, Hear and See Everything You Do


    Facebook is investing heavily in augmented reality, and its efforts include building its own AR glasses with Ray-Ban. The gadgets can only record and share imagery at the moment, but Facebook plans to use them for other functions in the future.

    Andy Jacob, CEO of DotCom Magazine, says, “AI systems will change the landscape of how people live their lives. Facebook will find the balance between freaking people out with ‘Big Brother’ type surveillance and being helpful. While Big Tech needs to answer to their shareholders, they need to also be cautious that they don’t step over the line. I expect Facebook to expertly walk the fine line between being Big Brother and offering the types of service that their users will demand in the future.”

    Facebook’s AI team is running a new research project that gives some indication of the scope of the company’s ideas. It envisages AI systems that constantly analyze people’s lives through video, recording what they hear, do, and see in order to assist them with everyday tasks. The research team has defined a number of skills these systems should develop, including remembering who said what and when (audio-visual diarization) and answering questions such as “Where are my keys?” (episodic memory).

    No AI system can reliably achieve the tasks described above at the moment, and Facebook has clearly stated that this is a research project rather than a commercial development. It is clear, however, that the firm sees functionality like this as the future of AR computing. Kristen Grauman, a Facebook AI research scientist, told The Verge that the team was thinking about augmented reality and what they would be able to do with it. She added that there were possibilities in the future where they would be able to leverage this type of research.

    These ambitions have massive implications for privacy. Privacy experts are already concerned that wearers of Facebook’s AR glasses can record members of the public covertly. Those concerns will grow if future versions of the hardware not only record footage but also transcribe and analyze it, as this would turn wearers into walking surveillance machines.

    Facebook’s research project is known as Ego4D, referring to the analysis of “egocentric,” or first-person video. It comprises two major components: a number of benchmarks that Facebook feels AI systems will be able to handle in the future and an open dataset of egocentric videos.

    This dataset is the largest of its type ever created, and Facebook collected the data by partnering with 13 universities across the globe. Some 3,205 hours of footage were recorded by 855 participants in nine countries. The universities, not Facebook, were responsible for collecting the data. Participants wore GoPro cameras and AR glasses to record videos of unscripted activities, ranging from baking to construction work to socializing with friends and playing with pets. The universities de-identified all footage, which included removing any personally identifiable information and blurring the faces of bystanders.

    According to Grauman, the dataset is the first of its type in both diversity and scale. She added that the nearest comparable project contains 100 hours of first-person footage shot entirely in kitchens. Where earlier datasets exposed AI systems only to kitchens in Sicily and the UK, Ego4D includes footage from Tokyo, Saudi Arabia, Colombia, and Los Angeles.

    Ego4D’s second component is several tasks, or benchmarks, that the company wants researchers around the globe to try and solve by using AI systems that have been trained on the dataset.

    Facebook describes these as follows:

    – Forecasting: What will I likely do next, e.g., have I already added sugar to this recipe?

    – Episodic memory: When did what happen, e.g., where did I leave my phone?

    – Audio-visual diarization: What was said when, e.g., what topic was discussed during the meeting?

    – Hand and object manipulation: What can I do, e.g., show me how to play the piano?

    – Social interaction: Who interacted with whom, e.g., help me hear that person better when they talk to me in a noisy environment?

    Although AI systems would find it very difficult to tackle any of these problems at the moment, creating benchmarks and datasets is a proven method of pushing AI development forward.

    The creation of one specific dataset and an annual competition associated with it, known as ImageNet, is in fact often credited with kick-starting the recent boom in AI. The ImageNet dataset contains pictures of a massive variety of objects which AI systems were trained to identify. The competition’s winning entry in 2012 used a particular method of deep learning to beat its rivals, initiating the current research era.

    Facebook is hopeful that the Ego4D project will have a similar effect on the augmented reality world. The company believes systems trained on Ego4D may one day be used not only in wearable cameras but also in home assistant robots, as these also use first-person cameras to navigate the world around them.

    Grauman says the project has the possibility to catalyze work in this field in a manner that hasn’t yet been possible. This will move the field from its ability to analyze piles of images and videos that were taken by humans with a specific purpose, to an ongoing, fluid, first-person visual stream that AR systems can understand in the context of constant activity.

    Even though the benchmarks outlined by Facebook appear to be practical, the company’s interest in this area will cause concern for many. Facebook’s privacy record is appalling, spanning a $5 billion FTC fine and repeated data leaks. It has also been shown time and again that the company values engagement and growth above users’ well-being in many areas. Against this backdrop, it is concerning that the Ego4D benchmarks don’t include noticeable privacy safeguards. The “audio-visual diarization” task, for example, does not specify that data about people who refuse to be recorded should be removed.

    When The Verge asked Facebook about these issues, a spokesperson said that privacy safeguards would likely be introduced later in the project. The spokesperson said the company expected that when companies started using the benchmarks and dataset to develop commercial applications, they would also develop the safeguards for those applications.

    Before AR glasses could, for example, enhance a person’s voice, a protocol might require the device to ask another individual’s glasses for permission, or the device’s range could be limited so that it only picks up sound from people who are nearby or already part of the conversation.

    For now, however, these safeguards remain purely hypothetical.