Sunday, June 25, 2023

A Recap of AI News This Week: Machine Learning Tools Garner Billions in Investments


Keeping pace with the AI landscape takes considerable effort. Until an AI assistant can do it for you, here is an overview of the past week's news, advancements, and notable research in machine learning, along with significant experiments that were not covered on their own.

The competition within the AI industry, especially in the burgeoning field of generative AI, is increasingly evident. This week, Dropbox announced its first corporate venture fund, Dropbox Ventures, which aims to support startups developing AI-powered products that shape the future of work. Not to be overshadowed, AWS launched a $100 million program to fund generative AI projects led by its partners and customers.

Undoubtedly, the AI sector is witnessing a substantial influx of capital. Salesforce Ventures, the venture capital arm of Salesforce, plans to allocate $500 million to startups developing generative AI technologies. Workday recently added $250 million to its existing venture capital fund, specifically to back AI and machine learning startups. Accenture and PwC plan to invest $3 billion and $1 billion in AI, respectively.

But can pouring money into AI truly solve the complex challenges the field currently faces?

At a Bloomberg conference in San Francisco this week, a panel discussion featured Meredith Whittaker, the president of the secure messaging app Signal. Whittaker raised a critical point about the increasingly opaque technology behind some of today's most popular AI applications. To illustrate the lack of transparency, she gave the example of a person who walks into a bank seeking a loan. That person might be unaware of the system operating in the background, potentially using a Microsoft API and scraped social media data to determine their creditworthiness, and has no mechanism for accessing this information.

According to Whittaker, the challenge lies not in the availability of capital but rather in the prevailing power hierarchy.

Whittaker added that, having sat in on these discussions for some 15 to 20 years, she can attest that a seat at the table without any real power accomplishes little.

Effecting structural change is harder than seeking financial resources, especially when such change may not align with the existing power dynamics. Whittaker warned of what could happen without sufficient pushback.

As AI advances rapidly, its societal ramifications are intensifying, and we continue down a path of exaggerated enthusiasm for AI. Whittaker expressed concern that power is being consolidated and normalized under the pretext of intelligence, leading to extensive surveillance that severely limits our individual and collective autonomy.

These considerations warrant serious reflection within the industry. However, whether this reflection will actually occur remains uncertain. The upcoming Disrupt conference in September may provide a platform for discussing and addressing these concerns.

Noteworthy AI Headlines from the Past Few Days:

DeepMind's AI Takes the Reins in Robot Control: DeepMind has announced the development of RoboCat, an AI model capable of executing various tasks across different robotic arm models. While this achievement may not be groundbreaking on its own, DeepMind asserts that RoboCat is the first model capable of solving and adapting to multiple tasks while operating with diverse real-world robots.

Robots Acquire Knowledge from YouTube: During a recent presentation, Deepak Pathak, an assistant professor at CMU's Robotics Institute, unveiled VRB (Vision-Robotics Bridge), an AI system developed to train robots through human demonstrations. VRB lets robots watch recordings of humans performing a task, focus on crucial details such as contact points and trajectory, and then replicate the task.

Otter Ventures into the Chatbot Arena: Otter, the automatic transcription service, unveiled a new AI-powered chatbot this week. The chatbot is specifically designed to facilitate seamless collaboration among participants during and after meetings by allowing them to ask questions and engage with their teammates.

EU Advocates for Regulation of AI: European regulators are at a crossroads over how AI, in both its commercial and noncommercial applications, should be regulated in the region. Adding to the discourse, the European Consumer Organisation (BEUC), the leading consumer group in the EU, has released its position. BEUC advocates swift action and calls for urgent investigations into the risks posed by generative AI.

Vimeo Introduces AI-Powered Features: Vimeo, a leading video platform, unveiled a comprehensive range of AI-driven tools this week. Specifically designed to enhance user experience, these tools offer advanced capabilities such as script creation, integrated teleprompter functionality for seamless recording, and automated removal of prolonged pauses and unwanted disfluencies such as filler words ('ahs' and 'ums') from recordings.

Capital for Synthetic Voices: ElevenLabs, an AI-powered platform for creating synthetic voices, recently closed a $19 million funding round. Since its launch in late January, ElevenLabs has gained momentum quickly. However, the platform has also struggled to manage misuse by malicious actors, which has drawn unfavorable attention.

Converting Audio to Text: Gladia, an innovative AI startup based in France, has introduced a cutting-edge platform that utilizes OpenAI's Whisper transcription model. Through an accessible API, Gladia's platform can swiftly convert audio content into text, almost in real time. Promising remarkable affordability and efficiency, Gladia claims to transcribe an hour of audio for a mere $0.61, with the entire process taking approximately 60 seconds.
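As a rough back-of-the-envelope check on those figures, the sketch below estimates cost and turnaround for a batch of audio. It assumes that the quoted $0.61 per hour and roughly 60 seconds of processing per hour of audio scale linearly, which is an assumption rather than anything Gladia has stated here. The function name and constants are illustrative and are not part of Gladia's API.

```python
# Back-of-the-envelope estimate based on the figures quoted above.
# Assumption: cost and processing time scale linearly with audio length.
PRICE_PER_AUDIO_HOUR_USD = 0.61            # claimed price per hour of audio
PROCESSING_SECONDS_PER_AUDIO_HOUR = 60.0   # claimed turnaround per hour of audio

# Speed relative to real time implied by the claim: 3600 s of audio in 60 s.
SPEEDUP_VS_REAL_TIME = 3600.0 / PROCESSING_SECONDS_PER_AUDIO_HOUR


def estimate_transcription(audio_hours: float) -> tuple[float, float]:
    """Return (estimated cost in USD, estimated processing time in seconds)."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    cost = audio_hours * PRICE_PER_AUDIO_HOUR_USD
    seconds = audio_hours * PROCESSING_SECONDS_PER_AUDIO_HOUR
    return cost, seconds


if __name__ == "__main__":
    cost, seconds = estimate_transcription(100)
    print(f"100 hours of audio: about ${cost:.2f}, about {seconds / 60:.0f} minutes")
    print(f"Implied speed: {SPEEDUP_VS_REAL_TIME:.0f}x real time")
```

If the linear assumption holds, a 100-hour archive would cost on the order of tens of dollars and finish in well under two hours, which is the kind of comparison the quoted numbers invite.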

Harness Leverages Generative AI Capabilities: This week, Harness, a forward-thinking startup focused on improving developer operations, announced the integration of AI into its platform. Through this advancement, Harness enables automated resolution of build and deployment failures, proactive detection and remediation of security vulnerabilities, and insightful recommendations for optimizing cloud costs. By harnessing the power of AI, the platform empowers developers to enhance efficiency and streamline their operations.

Other machine learnings

The CVPR conference, held this week in Vancouver, Canada, brought together leading experts in the field of computer vision and pattern recognition. Although I was unable to attend, the talks and research papers presented at the event piqued my interest. For those with limited time, I suggest watching Yejin Choi's keynote address, where she delves into the potential, challenges, and paradoxes associated with AI.

During her presentation, the University of Washington professor and MacArthur Genius grant recipient shed light on some unexpected constraints of today's most advanced models. Notably, GPT-4 has significant difficulty with multiplication: it frequently fails to calculate the product of two three-digit numbers correctly, although with some guidance it reaches a correct answer 95% of the time. Why does it matter that a language model cannot do math? Because the current AI landscape rests on the widespread belief that language models generalize well across a range of complex tasks, such as tax preparation or accounting. Choi's key message is that identifying and understanding the limitations of AI models gives us deeper insight into their true capabilities.
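A claim like this can be checked with a small harness. The sketch below is an illustration, not Choi's methodology: `answer_fn` is a hypothetical stand-in for whatever model is being tested, and the two toy functions exist only to show the harness running.

```python
import random
from typing import Callable


def multiplication_accuracy(
    answer_fn: Callable[[int, int], int],
    trials: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of random three-digit products that answer_fn gets exactly right."""
    rng = random.Random(seed)  # seeded so repeated runs use the same problems
    correct = 0
    for _ in range(trials):
        a = rng.randint(100, 999)
        b = rng.randint(100, 999)
        if answer_fn(a, b) == a * b:
            correct += 1
    return correct / trials


def exact(a: int, b: int) -> int:
    """Reference answerer that always multiplies correctly."""
    return a * b


def wrong_when_both_odd(a: int, b: int) -> int:
    """Toy flawed answerer (hypothetical): off by 10 when both operands are odd."""
    if a % 2 == 1 and b % 2 == 1:
        return a * b + 10
    return a * b


if __name__ == "__main__":
    print("exact:", multiplication_accuracy(exact))
    print("flawed:", multiplication_accuracy(wrong_when_both_odd))
```

To test a real model, `answer_fn` would wrap a call to that model and parse its reply into an integer; that wrapper is left out here because it depends on the model and prompt being used.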

The remaining segments of her presentation were equally captivating and intellectually stimulating. The complete talk is available for viewing online.

Renowned as a 'slayer of hype,' Rod Brooks delivered a captivating presentation in which he delved into the historical origins of fundamental concepts in machine learning. He emphasized that these concepts, which may appear novel to many practitioners today, were actually pioneered decades ago. Brooks traced the roots back to influential figures such as McCulloch, Minsky, and Hebb, demonstrating how their ideas have endured and remained relevant over time. This retrospective serves as a valuable reminder that the field of machine learning is built upon the profound contributions of visionary thinkers, spanning back to the postwar era.

CVPR witnessed a substantial influx of paper submissions, encompassing a wide array of research endeavors. While it is crucial to appreciate the comprehensive body of work presented, this news roundup focuses on the papers that captured the attention of the conference judges as the most compelling contributions. Although these selected papers offer an insightful perspective, they should not be regarded as an exhaustive literature review.


VISPROG, developed by researchers at AI2, is a meta-model designed to execute intricate visual manipulation tasks using a versatile code toolbox. For instance, given an image of a grizzly bear on grass, simply instructing it to 'replace the bear with a polar bear on snow' starts the process. VISPROG identifies the different elements within the image, visually isolates them, searches for a suitable replacement or generates one if needed, and reassembles the whole composition seamlessly. This capability makes even the famed 'enhance' interface from Blade Runner look relatively ordinary.

A collaborative Chinese research group has introduced a concept called 'planning-oriented autonomous driving' to address the fragmented approach typically adopted in the development of self-driving cars. Traditionally, the process involves discrete stages of 'perception, prediction, and planning,' each comprising numerous sub-tasks such as person segmentation and obstacle identification. The group's model aims to consolidate these stages into a single comprehensive framework, akin to the multi-modal models capable of processing various input modalities like text, audio, or images. By doing so, this model streamlines the intricate interdependencies found in modern autonomous driving systems.


DynIBaR introduces an advanced approach to video manipulation utilizing 'dynamic Neural Radiance Fields' (NeRFs) to achieve high-quality and robust interactions with video content. By gaining a deep understanding of the objects within the video, this technique enables capabilities such as stabilization, dolly movements, and other functionalities that are typically considered unattainable after recording. Once again, we witness the transformative power of video enhancement. It is worth noting that this innovation aligns with the kind of breakthrough that technology giants like Apple seek to incorporate and showcase at prestigious events such as WWDC.

DreamBooth, which may ring a bell from its release earlier this year, is a significant advancement in image manipulation, particularly in the creation of deepfakes. The system is valuable and powerful for such image operations, offering both practical applications and entertainment. Research teams, including those at Google, are working to make the results more seamless and realistic. The potential consequences of these advancements remain a topic for future consideration.

The best student paper award went to a method for comparing and matching meshes or 3D point clouds. The technical intricacies are beyond my expertise to explain, but this capability matters for real-world perception, and improvements in this area are welcome. For concrete examples and comprehensive details, I recommend referring to the paper directly.

Intel unveiled LDM3D, a model designed to generate 360-degree 3D imagery, particularly for virtual environments. Users can request a specific scene, such as an overgrown ruin in the jungle, and have a freshly generated environment created on demand. The implications for immersive experiences, particularly in the metaverse, are promising.

Meta, the company formerly known as Facebook, introduced Voicebox, a voice synthesis tool that excels at extracting voice features and replicating them, even when the input is not pristine. Voice replication typically requires a large and diverse set of clean voice recordings, but Voicebox performs better with less data, often as little as two seconds of audio. Meta has taken measures to ensure responsible use of this technology. Those interested in voice cloning can look to Acapela.
