AI advancements were monumental in 2023, and even greater strides are anticipated in 2024. Here are the trends we foresee:
- Shift from Text, Image, and Audio to Video, Vision, and 3D: While practical applications have so far centered on text, image, and audio models for content creation, 2024 is expected to bring a much stronger emphasis on video, vision, and 3D.
- Transition from Single Models to Multimodal Models: This will be akin to the consolidation of various devices (GPS, calculator, route planner, phone, iPod) into a single smartphone. In a similar fashion, separate models for text, image, audio, video, and 3D will converge into multimodal models capable of handling diverse inputs and outputs. You can already partly see this in models like GPT-4 and Apple's Ferret. It will be interesting to see which new modalities are added next. I'm rooting for an 'emotion' modality!
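To make the convergence a bit more tangible, here is a minimal sketch of sending text and an image to one vision-capable model in a single request, assuming the OpenAI Python client; the model name, image URL, and exact parameters are placeholders and may differ from whichever API you actually use.

```python
# Minimal sketch: one request, two modalities (text + image) handled by one model.
# Assumes the OpenAI Python client and an OPENAI_API_KEY in the environment;
# the model name and image URL are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder for a vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is happening in this picture?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/scene.jpg"}},
        ],
    }],
    max_tokens=200,
)
print(response.choices[0].message.content)
```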
- Submodels for Specialized Knowledge Areas: To address the challenge of growing model sizes and data requirements, submodels specializing in specific knowledge areas are expected to become more prevalent. This approach, demonstrated by models like Mistral's Mixtral mixture-of-experts model, not only yields promising results but also reduces the computational resources needed to generate a response.
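To illustrate the idea of routing work to specialized submodels, here is a small, self-contained PyTorch sketch of a mixture-of-experts layer: a gate scores the experts for each input and only the top-scoring ones are evaluated. The layer sizes and expert counts are illustrative, not taken from Mixtral.

```python
# Illustrative mixture-of-experts layer: a gate picks the top-k experts per input,
# so only a fraction of the total parameters is used for any single token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)  # scores each expert per input
        self.top_k = top_k

    def forward(self, x):  # x: (batch, dim)
        scores = self.gate(x)                           # (batch, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for b in range(x.size(0)):                  # plain loop for clarity, not speed
                expert = self.experts[int(idx[b, slot])]
                out[b] += weights[b, slot] * expert(x[b:b + 1]).squeeze(0)
        return out

x = torch.randn(4, 64)
print(TinyMoE()(x).shape)  # torch.Size([4, 64])
```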
- Development of Physical AI: The rise of agent or behavioral models that process inputs such as text, audio, video, and touch will continue. These multimodal models are envisioned to possess a comprehensive understanding of the world, enabling them to perceive, evaluate, and plan based on real-time information. The simultaneous evolution of hardware, exemplified by Tesla Optimus, Figure 01, and Digit, indicates a leap forward in the integration of AI with robotics. These robots are rumored to be deployed in Tesla factories and Amazon warehouses in 2024. The models can be trained on synthetic data (such as Nvidia Omniverse or digital-twin data), on video, or by mirroring human movement.
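As a rough illustration of the perceive, evaluate, and plan loop such models imply, the sketch below wires sensor readings and a goal into a single policy step. Every class here (SensorRig, PolicyModel, Actuators) is a hypothetical stand-in for real hardware drivers and a trained behavioral model, not an existing API.

```python
# Hypothetical perceive -> evaluate/plan -> act loop for a physical AI agent.
# SensorRig, PolicyModel, and Actuators are illustrative stand-ins, not a real robotics API.
from dataclasses import dataclass

@dataclass
class Observation:
    image: bytes   # latest camera frame
    audio: bytes   # microphone buffer
    touch: list    # force-sensor readings

class SensorRig:
    def read(self) -> Observation:
        return Observation(image=b"", audio=b"", touch=[0.0])  # dummy data

class PolicyModel:
    def plan(self, obs: Observation, goal: str) -> list:
        # A real behavioral model would map multimodal input plus a goal to actions.
        return ["move_arm_towards_target"]

class Actuators:
    def execute(self, action: str) -> None:
        print("executing:", action)

def control_loop(sensors, policy, actuators, goal, steps=3):
    for _ in range(steps):
        obs = sensors.read()              # perceive the environment
        actions = policy.plan(obs, goal)  # evaluate and plan from live input
        actuators.execute(actions[0])     # act, then re-plan on the next tick

control_loop(SensorRig(), PolicyModel(), Actuators(), goal="stack the boxes")
```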
- Spatial Computing / XR Growth: The recent strides in generative AI, particularly in 3D object creation and virtual worlds, are extending beyond gaming and robot training to impact spatial computing. Affordable headsets like the Meta Quest 3 and XREAL glasses, along with higher-end hardware such as the Apple Vision Pro, are making spatial computing more accessible.
- Closing Gap Between Open Source and Closed Source Models: The disparity between open source and closed source AI models is diminishing, with major players, including Meta, releasing their models to undermine competitors. The real battleground is shifting towards data ownership, where companies with proprietary datasets are poised to become dominant players, potentially leading to monopolies.
- Neuromorphic Computing and On-Chip Memory: The progression from GPUs to dedicated server farms built around chips like the NVIDIA A100 is now advancing towards a closer integration of processors and memory. Neuromorphic computing, which resembles neural network architecture, and on-chip memory, which places memory next to the processing units, hold promise for improved energy efficiency and processing speed, exemplified by recent developments like IBM's NorthPole chip.
- Breakthrough of Local Models and Edge Computing: The emergence of local models and edge computing heralds a new era of AI with distinct advantages. As model sizes shrink, exemplified by innovations like Apple's Ferret, and consumer hardware, notably Apple's M-series chips, gains processing power, local models become a cost-effective, fast, and self-sufficient option. Free of per-request costs, they respond quickly and operate independently of the cloud. This not only promises cost savings but also opens avenues for new business models and applications, reshaping both software and hardware innovation.
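As a minimal sketch of what "no cost per request" looks like in practice, the snippet below runs a quantized model entirely on local hardware, assuming the llama-cpp-python package and a GGUF model file already downloaded to disk; the file path and model choice are placeholders.

```python
# Runs entirely on local hardware: no API key, no per-request cost, no network dependency.
# Assumes `pip install llama-cpp-python` and a quantized GGUF model file on disk
# (the path below is a placeholder).
from llama_cpp import Llama

llm = Llama(model_path="models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)

result = llm("List three advantages of running language models locally:", max_tokens=128)
print(result["choices"][0]["text"])
```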
- Autonomous Agents Poised to Revolutionize Code Creation: Generative AI as a tool for code suggestions has already made a significant impact. The advent of autonomous agents, exemplified by platforms like Open Interpreter, represents the next evolutionary leap in self-reflective and self-learning capabilities. These agents go beyond mere code suggestions: they can compile code, run it, and interpret the results, enabling them to actively improve the codebase. This marks a transformative step in how we approach and enhance code development.
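At its core, such an agent is a generate, run, inspect, and revise loop. The sketch below shows that loop in its simplest form using only the Python standard library; generate_fix() is a hypothetical placeholder for the call to a code-generating model and simply returns the code unchanged here.

```python
# Minimal generate -> run -> inspect -> revise loop for a self-correcting code agent.
# Only the execution-and-feedback plumbing is real; generate_fix() is a placeholder
# for a call to a code-generating model.
import subprocess
import sys
import tempfile

def run_snippet(code: str):
    """Write the candidate code to a temp file, execute it, and capture the output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=30)
    return result.returncode, result.stdout + result.stderr

def generate_fix(task: str, code: str, feedback: str) -> str:
    # Hypothetical: a real agent would prompt an LLM with the task, the current code,
    # and the captured error output. Returned unchanged here to keep the sketch self-contained.
    return code

def agent_loop(task: str, initial_code: str, max_iters: int = 5) -> str:
    code = initial_code
    for _ in range(max_iters):
        status, output = run_snippet(code)       # run the code and inspect the result
        if status == 0:
            return code                          # it works: keep this version
        code = generate_fix(task, code, output)  # feed the errors back into the model
    return code

print(agent_loop("print a greeting", 'print("hello")'))
```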