Apple unveils AI model for instruction-based image editing
Apple has unveiled a new open-source AI model named “MGIE,” designed to edit images according to natural language instructions.
MGIE was developed as a collaborative effort between Apple and researchers from the University of California, Santa Barbara.
MGIE, short for MLLM-Guided Image Editing, is a new approach that applies multimodal large language models (MLLMs) — AI models that process both text and images — to instruction-based image editing.
MLLMs are central to the method: they convert short or ambiguous text prompts into precise, detailed instructions for the photo editor to execute.
While MLLMs have demonstrated exceptional cross-modal understanding and visual-aware response generation, their application to image editing tasks has remained relatively untapped until now.
MGIE handles a broad spectrum of editing needs, from basic colour corrections to intricate object manipulations. It can also perform both global and localized edits, tailoring the editing process to the user's request.
MGIE employs MLLMs in two key capacities. First, it uses them to derive precise instructions from user input, producing clear and concise guidance for editing. For instance, an input like "enhance the sky's blueness" might yield an instruction such as "boost the sky's saturation by 20%."
Second, MGIE uses MLLMs to produce a visual imagination — a latent representation that captures the essence of the desired edit and guides the pixel-level manipulation. MGIE is trained end-to-end, optimizing the instruction derivation, visual imagination, and image editing modules simultaneously.
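The two-stage flow described above can be sketched in a few lines. This is an illustrative toy, not MGIE's actual code: the real system uses a trained MLLM to derive the expressive instruction and a diffusion-based editor for the pixel work, whereas here a lookup table stands in for the MLLM and a simple brightness/saturation scale stands in for the editor. All function names are hypothetical.

```python
# Toy sketch of MGIE's two-stage idea (assumed structure, not the real model).

# Stand-in for the MLLM stage: MGIE's MLLM would derive an explicit,
# expressive instruction from a terse or ambiguous prompt. A lookup
# table simulates that behaviour here.
_EXPRESSIVE_RULES = {
    "enhance the sky's blueness": "boost the sky's saturation by 20%",
}

def derive_instruction(prompt: str) -> str:
    """Stage 1: turn a vague prompt into an explicit edit instruction."""
    return _EXPRESSIVE_RULES.get(prompt.lower(), prompt)

def apply_edit(image: list[list[float]], instruction: str) -> list[list[float]]:
    """Stage 2: pixel-level edit guided by the derived instruction.

    The toy 'editor' scales pixel values by 1.2 (clamped to 1.0) when the
    instruction asks for a 20% saturation boost; a real editor would
    condition a generative model on the instruction instead.
    """
    if "saturation by 20%" in instruction:
        return [[min(1.0, px * 1.2) for px in row] for row in image]
    return image

# End-to-end: vague prompt -> explicit instruction -> edited pixels.
instruction = derive_instruction("Enhance the sky's blueness")
edited = apply_edit([[0.5, 0.9]], instruction)
```

The point of the sketch is the separation of concerns: the language stage resolves ambiguity into something actionable before any pixels are touched, which is what lets MGIE handle terse human prompts.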
The model was presented in a paper accepted at the prestigious International Conference on Learning Representations (ICLR) 2024, a leading venue for AI research. The paper highlights MGIE’s effectiveness in improving both automated metrics and human evaluation, all while maintaining competitive inference efficiency.
MGIE is available as an open-source initiative on GitHub, offering users access to its code, datasets, and pre-trained models. Additionally, the project features a demonstration notebook illustrating MGIE’s utility across different editing tasks. For added convenience, users can explore MGIE via an online web demo hosted on Hugging Face Spaces, a collaborative platform for machine learning (ML) projects.