Generative AI tool helps 3D print personal items that sustain daily use

Generative artificial intelligence models have left such an indelible impact on digital content creation that it's getting harder to recall what the internet was like before them. You can call on these AI tools for clever projects such as videos and photos — but their flair for the creative hasn't quite crossed over into the physical world just yet.

So why haven’t we seen generative AI-enabled personalized objects, such as phone cases and pots, in places like homes, offices, and stores yet? According to MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers, a key issue is the mechanical integrity of the 3D model.

While AI can help generate personalized 3D models that you can fabricate, those systems often don't consider the physical properties of the 3D model. MIT Department of Electrical Engineering and Computer Science (EECS) PhD student and CSAIL engineer Faraz Faruqi has explored this trade-off, creating one generative AI-based system that makes aesthetic changes to designs while preserving functionality, and another that modifies structures to produce the tactile properties users want to feel.

Making it real 

Together with researchers at Google, Stability AI, and Northeastern University, Faruqi has now found a way to make real-world objects with AI, creating items that are durable and that exhibit the user's intended appearance and texture. With the AI-powered "MechStyle" system, users simply upload a 3D model or select a preset asset, such as a vase or hook, and prompt the tool with images or text to create a personalized version. A generative AI model then modifies the 3D geometry, while MechStyle simulates how those changes will impact particular parts, ensuring vulnerable areas remain structurally sound. When you're happy with this AI-enhanced blueprint, you can 3D print it and use it in the real world.

You could select a model of, say, a wall hook, and the material you'll be printing it with (for example, plastics like polylactic acid). Then, you can prompt the system to create a personalized version, with directions like, "generate a cactus-like hook." The AI model will work in tandem with the simulation module to generate a 3D model that resembles a cactus while retaining the structural properties of a hook. This green, ridged accessory can then be used to hang up mugs, coats, and backpacks. Such creations are possible thanks, in part, to a stylization process, in which the system changes a model's geometry based on its understanding of the text prompt, guided by feedback from the simulation module.

According to CSAIL researchers, 3D stylization used to come with unintended consequences. Their formative study revealed that only about 26 percent of 3D models remained structurally viable after they were modified, meaning that the AI system didn’t understand the physics of the models it was modifying.

“We want to use AI to create models that you can actually fabricate and use in the real world,” says Faruqi, who is a lead author on a paper presenting the project. “So MechStyle actually simulates how GenAI-based changes will impact a structure. Our system allows you to personalize the tactile experience for your item, incorporating your personal style into it while ensuring the object can sustain everyday use.”

This computational thoroughness could eventually help users personalize their belongings, creating, for example, a unique pair of glasses with speckled blue and beige dots resembling fish scales. The system has also produced a pillbox with a rocky texture that's checkered with pink and aqua spots. Its potential extends to crafting unique home and office decor, like a lampshade resembling red magma. It can even design assistive technology fit to users' specifications, such as finger splints for injuries that affect dexterity and utensil grips for people with motor impairments.

In the future, MechStyle could also be useful in creating prototypes for accessories and other handheld products you might sell in a toy shop, hardware store, or craft boutique. The goal, CSAIL researchers say, is for both expert and novice designers to spend more time brainstorming and testing out different 3D designs, instead of assembling and customizing items by hand.

Staying strong

To ensure MechStyle’s creations could withstand daily use, the researchers augmented their generative AI technology with a type of physics simulation called a finite element analysis (FEA). You can imagine a 3D model of an item, such as a pair of glasses, with a sort of heat map indicating which regions are structurally viable under a realistic amount of weight, and which ones aren’t. As AI refines this model, the physics simulations highlight which parts of the model are getting weaker and prevent further changes.

Faruqi adds that running these simulations every time a change is made drastically slows down the AI process, so MechStyle is designed to know when and where to do additional structural analyses. “MechStyle’s adaptive scheduling strategy keeps track of what changes are happening in specific points in the model. When the genAI system makes tweaks that endanger certain regions of the model, our approach simulates the physics of the design again. MechStyle will make subsequent modifications to make sure the model doesn’t break after fabrication.”

Combining the FEA process with adaptive scheduling allowed MechStyle to generate objects that were structurally viable up to 100 percent of the time. Testing 30 different 3D models with styles resembling things like bricks, stones, and cacti, the team found that the most efficient way to create structurally viable objects was to dynamically identify weak regions and adjust the generative AI process to mitigate their effect. In these scenarios, the researchers found that they could either stop stylization completely when a particular stress threshold was reached, or gradually make smaller refinements to prevent at-risk areas from approaching that mark.
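As a simplified illustration of that check-and-adjust loop (a sketch based on the description above, not MechStyle's actual code), the stylization process might periodically re-run the simulation and back off once stress nears the threshold. The helper functions here are hypothetical stand-ins for the generative model and the FEA solver:

```python
import numpy as np

def stylize_step(mesh, prompt):
    """Hypothetical stand-in for one GenAI geometry update toward the prompt."""
    return mesh + 0.01 * np.random.randn(*mesh.shape)

def run_fea(mesh):
    """Hypothetical stand-in for finite element analysis; returns peak stress."""
    return float(np.abs(mesh).max())

def stylize_with_checks(mesh, prompt, steps=50, check_every=5, stress_limit=1.0):
    last_safe = mesh
    for step in range(steps):
        mesh = stylize_step(mesh, prompt)
        if step % check_every == 0:          # adaptive scheduling: simulate only sometimes
            if run_fea(mesh) > stress_limit:
                mesh = last_safe             # roll back, or make smaller refinements
                break                        # stop stylizing once the threshold is reached
            last_safe = mesh
    return mesh

hook = stylize_with_checks(np.zeros((200, 3)), "generate a cactus-like hook")
```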

The system also offers two different modes: a freestyle feature that allows AI to quickly visualize different styles on your 3D model, and a MechStyle one that carefully analyzes the structural impacts of your tweaks. You can explore different ideas, then try the MechStyle mode to see how those artistic flourishes will affect the durability of particular regions of the model.

CSAIL researchers add that while their system can ensure a model remains structurally sound before it is 3D printed, it's not yet able to improve 3D models that weren't viable to begin with. If you upload such a file to MechStyle, you'll receive an error message, but Faruqi and his colleagues intend to improve the durability of those faulty models in the future.

What’s more, the team hopes to use generative AI to create 3D models for users, instead of stylizing presets and user-uploaded designs. This would make the system even more user-friendly, so that those who are less familiar with 3D models, or can’t find their design online, can simply generate it from scratch. Let’s say you wanted to fabricate a unique type of bowl, and that 3D model wasn’t available in a repository; AI could create it for you instead.

"While style-transfer for 2D images works incredibly well, not many works have explored how this transfers to 3D," says Google Research Scientist Fabian Manhardt, who wasn't involved in the paper. "Essentially, 3D is a much more difficult task, as training data is scarce and changing the object's geometry can harm its structure, rendering it unusable in the real world. MechStyle helps solve this problem, allowing for 3D stylization without breaking the object's structural integrity via simulation. This gives people the power to be creative and better express themselves through products that are tailored towards them."

Faruqi wrote the paper with senior author Stefanie Mueller, who is an MIT associate professor and CSAIL principal investigator, and two other CSAIL colleagues: researcher Leandra Tejedor SM '24 and postdoc Jiaji Li. Their co-authors are Amira Abdel-Rahman PhD '25, now an assistant professor at Cornell University, and Martin Nisser SM '19, PhD '24; Google researcher Vrushank Phadnis; Stability AI Vice President of Research Varun Jampani; MIT Professor and Center for Bits and Atoms Director Neil Gershenfeld; and Northeastern University Assistant Professor Megan Hofmann.

Their work was supported by the MIT-Google Program for Computing Innovation. It was presented at the Association for Computing Machinery’s Symposium on Computational Fabrication in November.

3 Questions: How AI could optimize the power grid

Artificial intelligence has captured headlines recently for its rapidly growing energy demands, and particularly the surging electricity usage of data centers that enable the training and deployment of the latest generative AI models. But it’s not all bad news — some AI tools have the potential to reduce some forms of energy consumption and enable cleaner grids.

One of the most promising applications is using AI to optimize the power grid, which would improve efficiency, increase resilience to extreme weather, and enable the integration of more renewable energy. To learn more, MIT News spoke with Priya Donti, the Silverman Family Career Development Professor in the MIT Department of Electrical Engineering and Computer Science (EECS) and a principal investigator at the Laboratory for Information and Decision Systems (LIDS), whose work focuses on applying machine learning to optimize the power grid.

Q: Why does the power grid need to be optimized in the first place?

A: We need to maintain an exact balance between the amount of power that is put into the grid and the amount that comes out at every moment in time. But on the demand side, we have some uncertainty. Power companies don’t ask customers to pre-register the amount of energy they are going to use ahead of time, so some estimation and prediction must be done.

Then, on the supply side, there is typically some variation in costs and fuel availability that grid managers need to be responsive to. That has become an even bigger issue because of the integration of energy from time-varying renewable sources, like solar and wind, where uncertainty in the weather can have a major impact on how much power is available. Then, at the same time, depending on how power is flowing in the grid, there is some power lost through resistive heat on the power lines. So, as a grid operator, how do you make sure all that is working all the time? That is where optimization comes in.

Q: How can AI be most useful in power grid optimization?

A: One way AI can be helpful is to use a combination of historical and real-time data to make more precise predictions about how much renewable energy will be available at a certain time. This could lead to a cleaner power grid by allowing us to handle and better utilize these resources.

AI could also help tackle the complex optimization problems that power grid operators must solve to balance supply and demand in a way that also reduces costs. These optimization problems are used to determine which power generators should produce power, how much they should produce, and when they should produce it, as well as when batteries should be charged and discharged, and whether we can leverage flexibility in power loads. These problems are so computationally expensive that operators use approximations so they can solve them in a feasible amount of time. But these approximations are often wrong, and when we integrate more renewable energy into the grid, they are thrown off even further. AI can help by providing more accurate approximations in a faster manner, which can be deployed in real time to help grid operators responsively and proactively manage the grid.
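As a drastically simplified, single-hour illustration of the kind of dispatch problem described here (a toy sketch, not an operator's model; the generator costs, limits, and demand figure are made up), a small linear program can pick the cheapest mix of generation that meets demand:

```python
from scipy.optimize import linprog

costs = [20.0, 35.0, 50.0]               # $/MWh for three hypothetical generators
bounds = [(0, 400), (0, 300), (0, 200)]  # MW output limits for each generator
demand = 650.0                           # MW of load to serve this hour

# Minimize total cost subject to: total generation equals demand (lossless toy grid).
result = linprog(c=costs, A_eq=[[1.0, 1.0, 1.0]], b_eq=[demand], bounds=bounds)
print(result.x)  # cheapest dispatch, e.g., [400., 250., 0.] MW
```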

AI could also be useful in the planning of next-generation power grids. Planning for power grids requires one to use huge simulation models, so AI can play a big role in running those models more efficiently. The technology can also help with predictive maintenance by detecting where anomalous behavior on the grid is likely to happen, reducing inefficiencies that come from outages. More broadly, AI could also be applied to accelerate experimentation aimed at creating better batteries, which would allow the integration of more energy from renewable sources into the grid.

Q: How should we think about the pros and cons of AI, from an energy sector perspective?

A: One important thing to remember is that AI refers to a heterogeneous set of technologies. There are different types and sizes of models that are used, and different ways that models are used. If you are using a model that is trained on a smaller amount of data with a smaller number of parameters, that is going to consume much less energy than a large, general-purpose model.

In the context of the energy sector, there are a lot of places where, if you use these application-specific AI models for the applications they are intended for, the cost-benefit tradeoff works out in your favor. In these cases, the applications are enabling benefits from a sustainability perspective — like incorporating more renewables into the grid and supporting decarbonization strategies.

Overall, it’s important to think about whether the types of investments we are making into AI are actually matched with the benefits we want from AI. On a societal level, I think the answer to that question right now is “no.” There is a lot of development and expansion of a particular subset of AI technologies, and these are not the technologies that will have the biggest benefits across energy and climate applications. I’m not saying these technologies are useless, but they are incredibly resource-intensive, while also not being responsible for the lion’s share of the benefits that could be felt in the energy sector.

I'm excited to develop AI algorithms that respect the physical constraints of the power grid so that we can credibly deploy them. This is a hard problem to solve. If an LLM says something that is slightly incorrect, as humans, we can usually correct for that in our heads. But if you make a mistake of the same magnitude when you are optimizing a power grid, that can cause a large-scale blackout. We need to build models differently, but this also provides an opportunity to benefit from our knowledge of how the physics of the power grid works.

And more broadly, I think it’s critical that those of us in the technical community put our efforts toward fostering a more democratized system of AI development and deployment, and that it’s done in a way that is aligned with the needs of on-the-ground applications.

Decoding the Arctic to predict winter weather

Every autumn, as the Northern Hemisphere moves toward winter, Judah Cohen starts to piece together a complex atmospheric puzzle. Cohen, a research scientist in MIT’s Department of Civil and Environmental Engineering (CEE), has spent decades studying how conditions in the Arctic set the course for winter weather throughout Europe, Asia, and North America. His research dates back to his postdoctoral work with Bacardi and Stockholm Water Foundations Professor Dara Entekhabi that looked at snow cover in the Siberian region and its connection with winter forecasting.

Cohen's outlook for the 2025–26 winter highlights a season shaped by indicators emerging from the Arctic, interpreted with a new generation of artificial intelligence tools that help develop the full atmospheric picture.

Looking beyond the usual climate drivers

Winter forecasts rely heavily on El Niño–Southern Oscillation (ENSO) diagnostics, which are the tropical Pacific Ocean and atmosphere conditions that influence weather around the world. However, Cohen notes that ENSO is relatively weak this year.

"When ENSO is weak, that's when climate indicators from the Arctic become especially important," Cohen says.

Cohen monitors high-latitude diagnostics in his subseasonal forecasting, such as October snow cover in Siberia, early-season temperature changes, Arctic sea-ice extent, and the stability of the polar vortex. “These indicators can tell a surprisingly detailed story about the upcoming winter,” he says. 

One of Cohen's most consistent predictors is October's weather in Siberia. This year, while the Northern Hemisphere experienced an unusually warm October, Siberia was colder than normal, with an early snowfall. "Cold temperatures paired with early snow cover tend to strengthen the formation of cold air masses that can later spill into Europe and North America," says Cohen — weather patterns that are historically linked to more frequent cold spells later in winter.

Warm ocean temperatures in the Barents–Kara Sea and an “easterly” phase of the quasi-biennial oscillation also suggest a potentially weaker polar vortex in early winter. When this disturbance couples with surface conditions in December, it leads to lower-than-normal temperatures across parts of Eurasia and North America earlier in the season.

AI subseasonal forecasting

While AI weather models have made impressive strides in short-range (one- to 10-day) forecasts, those advances have not yet extended to longer periods. Subseasonal prediction, covering two to six weeks, remains one of the toughest challenges in the field.

That gap is why this year could be a turning point for subseasonal weather forecasting. A team of researchers working with Cohen won first place for the fall season in the 2025 AI WeatherQuest subseasonal forecasting competition, held by the European Centre for Medium-Range Weather Forecasts (ECMWF). The challenge evaluates how well AI models capture temperature patterns over multiple weeks, where forecasting has been historically limited.

The winning model combined machine-learning pattern recognition with the same Arctic diagnostics Cohen has refined over decades. The system demonstrated significant gains in multi-week forecasting, surpassing leading AI and statistical baselines.

"If this level of performance holds across multiple seasons, it could represent a real step forward for subseasonal prediction," Cohen says.

The model also detected a potential cold surge in mid-December for the U.S. East Coast much earlier than usual, weeks before such signals typically arise. The forecast was widely publicized in the media in real time. If validated, Cohen explains, it would show how combining Arctic indicators with AI could extend the lead time for predicting impactful weather.

“Flagging a potential extreme event three to four weeks in advance would be a watershed moment,” he adds. “It would give utilities, transportation systems, and public agencies more time to prepare.”

What this winter may hold

Cohen’s model shows a greater chance of colder-than-normal conditions across parts of Eurasia and central North America later in the winter, with the strongest anomalies likely mid-season.

“We’re still early, and patterns can shift,” Cohen says. “But the ingredients for a colder winter pattern are there.”

As Arctic warming speeds up, its impact on winter behavior is becoming more evident, making it increasingly important to understand these connections for energy planning, transportation, and public safety. Cohen’s work shows that the Arctic holds untapped subseasonal forecasting power, and AI may help unlock it for time frames that have long been challenging for traditional models.

In November, Cohen even appeared as a clue in The Washington Post crossword, a small sign of how widely his research has entered public conversations about winter weather.

“For me, the Arctic has always been the place to watch,” he says. “Now AI is giving us new ways to interpret its signals.”

Cohen will continue to update his outlook throughout the season on his blog.

MIT scientists investigate memorization risk in the age of clinical AI

What is patient privacy for? The Hippocratic Oath, thought to be one of the earliest and most widely known medical ethics texts in the world, reads: “Whatever I see or hear in the lives of my patients, whether in connection with my professional practice or not, which ought not to be spoken of outside, I will keep secret, as considering all such things to be private.” 

As privacy becomes increasingly scarce in the age of data-hungry algorithms and cyberattacks, medicine is one of the few remaining domains where confidentiality remains central to practice, enabling patients to trust their physicians with sensitive information.

But a paper co-authored by MIT researchers investigates how artificial intelligence models trained on de-identified electronic health records (EHRs) can memorize patient-specific information. The work, which was recently presented at the 2025 Conference on Neural Information Processing Systems (NeurIPS), recommends a rigorous testing setup to ensure targeted prompts cannot reveal information, emphasizing that leakage must be evaluated in a health care context to determine whether it meaningfully compromises patient privacy.

Foundation models trained on EHRs should normally generalize knowledge to make better predictions, drawing upon many patient records. But in "memorization," the model draws upon a single patient record to deliver its output, potentially violating patient privacy. Notably, foundation models are already known to be prone to data leakage.

“Knowledge in these high-capacity models can be a resource for many communities, but adversarial attackers can prompt a model to extract information on training data,” says Sana Tonekaboni, a postdoc at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard and first author of the paper. Given the risk that foundation models could also memorize private data, she notes, “this work is a step towards ensuring there are practical evaluation steps our community can take before releasing models.”

To conduct research on the potential risk EHR foundation models could pose in medicine, Tonekaboni approached MIT Associate Professor Marzyeh Ghassemi, who is a principal investigator at the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). Ghassemi, a faculty member in the MIT Department of Electrical Engineering and Computer Science and the Institute for Medical Engineering and Science, runs the Healthy ML group, which focuses on robust machine learning in health.

Just how much information does a bad actor need to expose sensitive data, and what are the risks associated with the leaked information? To assess this, the research team developed a series of tests that they hope will lay the groundwork for future privacy evaluations. These tests are designed to measure various types of uncertainty and to assess the practical risk to patients across different tiers of attack possibility.

“We really tried to emphasize practicality here; if an attacker has to know the date and value of a dozen laboratory tests from your record in order to extract information, there is very little risk of harm. If I already have access to that level of protected source data, why would I need to attack a large foundation model for more?” says Ghassemi. 

With the inevitable digitization of medical records, data breaches have become more commonplace. In the past 24 months, the U.S. Department of Health and Human Services has recorded 747 data breaches of health information affecting more than 500 individuals, with the majority categorized as hacking/IT incidents.

Patients with unique conditions are especially vulnerable, given how easy it is to pick them out. “Even with de-identified data, it depends on what sort of information you leak about the individual,” Tonekaboni says. “Once you identify them, you know a lot more.”

In their structured tests, the researchers found that the more information the attacker has about a particular patient, the more likely the model is to leak information. They demonstrated how to distinguish model generalization cases from patient-level memorization, to properly assess privacy risk. 

The paper also emphasized that some leaks are more harmful than others. For instance, a model revealing a patient’s age or demographics could be characterized as a more benign leakage than the model revealing more sensitive information, like an HIV diagnosis or alcohol abuse. 

Such patients, the researchers note, may require higher levels of protection. The team plans to expand the work to become more interdisciplinary, adding clinicians and privacy experts as well as legal experts.

“There’s a reason our health data is private,” Tonekaboni says. “There’s no reason for others to know about it.”

This work was supported by the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, Wallenberg AI, the Knut and Alice Wallenberg Foundation, the U.S. National Science Foundation (NSF), a Gordon and Betty Moore Foundation award, a Google Research Scholar award, and the AI2050 Program at Schmidt Sciences. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.

Guided learning lets “untrainable” neural networks realize their potential

Even networks long considered “untrainable” can learn effectively with a bit of a helping hand. Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have shown that a brief period of alignment between neural networks, a method they call guidance, can dramatically improve the performance of architectures previously thought unsuitable for modern tasks.

Their findings suggest that many so-called "ineffective" networks may simply begin from less-than-ideal starting points, and that short-term guidance can place them in a spot that makes learning easier for the network.

The team’s guidance method works by encouraging a target network to match the internal representations of a guide network during training. Unlike traditional methods like knowledge distillation, which focus on mimicking a teacher’s outputs, guidance transfers structural knowledge directly from one network to another. This means the target learns how the guide organizes information within each layer, rather than simply copying its behavior. Remarkably, even untrained networks contain architectural biases that can be transferred, while trained guides additionally convey learned patterns. 
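As a rough sketch of this representation-matching idea (our reading of the description above, not the team's code), a guidance term could penalize the distance between the two networks' layer-wise activations during a brief alignment phase:

```python
import torch
import torch.nn.functional as F

def guidance_loss(target_feats, guide_feats):
    """Distance between (normalized) hidden representations at aligned layers.

    target_feats, guide_feats: lists of activation tensors, one per layer pair.
    """
    loss = torch.tensor(0.0)
    for t, g in zip(target_feats, guide_feats):
        t = F.normalize(t.flatten(1), dim=-1)
        g = F.normalize(g.flatten(1), dim=-1)
        loss = loss + F.mse_loss(t, g.detach())  # the guide network is not updated
    return loss

# During warmup, one might optimize: task_loss + lambda_g * guidance_loss(...)
# and then continue training on the task loss alone.
```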

“We found these results pretty surprising,” says Vighnesh Subramaniam ’23, MEng ’24, MIT Department of Electrical Engineering and Computer Science (EECS) PhD student and CSAIL researcher, who is a lead author on a paper presenting these findings. “It’s impressive that we could use representational similarity to make these traditionally ‘crappy’ networks actually work.”

Guide-ian angel 

A central question was whether guidance must continue throughout training, or if its primary effect is to provide a better initialization. To explore this, the researchers performed an experiment with deep fully connected networks (FCNs). Before training on the real problem, the network spent a few steps practicing with another network using random noise, like stretching before exercise. The results were striking: Networks that typically overfit immediately remained stable, achieved lower training loss, and avoided the classic performance degradation seen in standard FCNs. This alignment acted like a helpful warmup for the network, showing that even a short practice session can have lasting benefits without needing constant guidance.

The study also compared guidance to knowledge distillation, a popular approach in which a student network attempts to mimic a teacher’s outputs. When the teacher network was untrained, distillation failed completely, since the outputs contained no meaningful signal. Guidance, by contrast, still produced strong improvements because it leverages internal representations rather than final predictions. This result underscores a key insight: Untrained networks already encode valuable architectural biases that can steer other networks toward effective learning.

Beyond the experimental results, the findings have broad implications for understanding neural network architecture. The researchers suggest that success — or failure — often depends less on task-specific data, and more on the network’s position in parameter space. By aligning with a guide network, it’s possible to separate the contributions of architectural biases from those of learned knowledge. This allows scientists to identify which features of a network’s design support effective learning, and which challenges stem simply from poor initialization.

Guidance also opens new avenues for studying relationships between architectures. By measuring how easily one network can guide another, researchers can probe distances between functional designs and reexamine theories of neural network optimization. Since the method relies on representational similarity, it may reveal previously hidden structures in network design, helping to identify which components contribute most to learning and which do not.

Salvaging the hopeless

Ultimately, the work shows that so-called “untrainable” networks are not inherently doomed. With guidance, failure modes can be eliminated, overfitting avoided, and previously ineffective architectures brought into line with modern performance standards. The CSAIL team plans to explore which architectural elements are most responsible for these improvements and how these insights can influence future network design. By revealing the hidden potential of even the most stubborn networks, guidance provides a powerful new tool for understanding — and hopefully shaping — the foundations of machine learning.

“It’s generally assumed that different neural network architectures have particular strengths and weaknesses,” says Leyla Isik, Johns Hopkins University assistant professor of cognitive science, who wasn’t involved in the research. “This exciting research shows that one type of network can inherit the advantages of another architecture, without losing its original capabilities. Remarkably, the authors show this can be done using small, untrained ‘guide’ networks. This paper introduces a novel and concrete way to add different inductive biases into neural networks, which is critical for developing more efficient and human-aligned AI.”

Subramaniam wrote the paper with CSAIL colleagues: Research Scientist Brian Cheung; PhD student David Mayo ’18, MEng ’19; Research Associate Colin Conwell; principal investigators Boris Katz, a CSAIL principal research scientist, and Tomaso Poggio, an MIT professor in brain and cognitive sciences; and former CSAIL research scientist Andrei Barbu. Their work was supported, in part, by the Center for Brains, Minds, and Machines, the National Science Foundation, the MIT CSAIL Machine Learning Applications Initiative, the MIT-IBM Watson AI Lab, the U.S. Defense Advanced Research Projects Agency (DARPA), the U.S. Department of the Air Force Artificial Intelligence Accelerator, and the U.S. Air Force Office of Scientific Research.

Their work was recently presented at the Conference and Workshop on Neural Information Processing Systems (NeurIPS).

A “scientific sandbox” lets researchers explore the evolution of vision systems

Why did humans evolve the eyes we have today?

While scientists can’t go back in time to study the environmental pressures that shaped the evolution of the diverse vision systems that exist in nature, a new computational framework developed by MIT researchers allows them to explore this evolution in artificial intelligence agents.

The framework they developed, in which embodied AI agents evolve eyes and learn to see over many generations, is like a “scientific sandbox” that allows researchers to recreate different evolutionary trees. The user does this by changing the structure of the world and the tasks AI agents complete, such as finding food or telling objects apart.

This allows them to study why one animal may have evolved simple, light-sensitive patches as eyes, while another has complex, camera-type eyes.

The researchers’ experiments with this framework showcase how tasks drove eye evolution in the agents. For instance, they found that navigation tasks often led to the evolution of compound eyes with many individual units, like the eyes of insects and crustaceans.

On the other hand, if agents focused on object discrimination, they were more likely to evolve camera-type eyes with irises and retinas.

This framework could enable scientists to probe “what-if” questions about vision systems that are difficult to study experimentally. It could also guide the design of novel sensors and cameras for robots, drones, and wearable devices that balance performance with real-world constraints like energy efficiency and manufacturability.

"While we can never go back and figure out every detail of how evolution took place, in this work we've created an environment where we can, in a sense, recreate evolution and probe the environment in all these different ways. This method of doing science opens the door to a lot of possibilities," says Kushagra Tiwary, a graduate student at the MIT Media Lab and co-lead author of a paper on this research.

He is joined on the paper by co-lead author and fellow graduate student Aaron Young; graduate student Tzofi Klinghoffer; former postdoc Akshat Dave, who is now an assistant professor at Stony Brook University; Tomaso Poggio, the Eugene McDermott Professor in the Department of Brain and Cognitive Sciences, an investigator in the McGovern Institute, and co-director of the Center for Brains, Minds, and Machines; co-senior authors Brian Cheung, a postdoc in the Center for Brains, Minds, and Machines and an incoming assistant professor at the University of California San Francisco; and Ramesh Raskar, associate professor of media arts and sciences and leader of the Camera Culture Group at MIT; as well as others at Rice University and Lund University. The research appears today in Science Advances.

Building a scientific sandbox

The paper began as a conversation among the researchers about discovering new vision systems that could be useful in different fields, like robotics. To test their “what-if” questions, the researchers decided to use AI to explore the many evolutionary possibilities.

“What-if questions inspired me when I was growing up to study science. With AI, we have a unique opportunity to create these embodied agents that allow us to ask the kinds of questions that would usually be impossible to answer,” Tiwary says.

To build this evolutionary sandbox, the researchers took all the elements of a camera, like the sensors, lenses, apertures, and processors, and converted them into parameters that an embodied AI agent could learn.

They used those building blocks as the starting point for an algorithmic learning mechanism an agent would use as it evolved eyes over time.

“We couldn’t simulate the entire universe atom-by-atom. It was challenging to determine which ingredients we needed, which ingredients we didn’t need, and how to allocate resources over those different elements,” Cheung says.

In their framework, this evolutionary algorithm can choose which elements to evolve based on the constraints of the environment and the task of the agent.

Each environment has a single task, such as navigation, food identification, or prey tracking, designed to mimic real visual tasks animals must overcome to survive. The agents start with a single photoreceptor that looks out at the world and an associated neural network model that processes visual information.

Then, over each agent’s lifetime, it is trained using reinforcement learning, a trial-and-error technique where the agent is rewarded for accomplishing the goal of its task. The environment also incorporates constraints, like a certain number of pixels for an agent’s visual sensors.

“These constraints drive the design process, the same way we have physical constraints in our world, like the physics of light, that have driven the design of our own eyes,” Tiwary says.

Over many generations, agents evolve different elements of vision systems that maximize rewards.

Their framework uses a genetic encoding mechanism to computationally mimic evolution, where individual genes mutate to control an agent’s development.

For instance, morphological genes capture how the agent views the environment and control eye placement; optical genes determine how the eye interacts with light and dictate the number of photoreceptors; and neural genes control the learning capacity of the agents.
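To make that gene-level description concrete, here is a toy sketch of how such a genome and its mutations might be encoded (our simplification for illustration, not the authors' framework; the gene names and ranges are hypothetical):

```python
import random
from dataclasses import dataclass, replace

@dataclass
class EyeGenome:
    num_eyes: int = 1          # morphological: eye count and placement
    photoreceptors: int = 1    # optical: how many light-sensing units per eye
    aperture: float = 0.5      # optical: how much light the eye admits
    hidden_units: int = 16     # neural: learning capacity of the agent's network

def mutate(genome, rate=0.2):
    """Randomly perturb each gene with probability `rate`."""
    g = replace(genome)
    if random.random() < rate:
        g.num_eyes = max(1, g.num_eyes + random.choice([-1, 1]))
    if random.random() < rate:
        g.photoreceptors = max(1, g.photoreceptors * random.choice([1, 2]))
    if random.random() < rate:
        g.aperture = min(1.0, max(0.05, g.aperture + random.uniform(-0.1, 0.1)))
    if random.random() < rate:
        g.hidden_units = max(4, g.hidden_units + random.choice([-8, 8]))
    return g

# Each generation: train agents with reinforcement learning for a lifetime, keep the
# genomes with the highest task reward, and mutate them to form the next generation.
```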

Testing hypotheses

When the researchers set up experiments in this framework, they found that tasks had a major influence on the vision systems the agents evolved.

For instance, agents that were focused on navigation tasks developed eyes designed to maximize spatial awareness through low-resolution sensing, while agents tasked with detecting objects developed eyes focused more on frontal acuity, rather than peripheral vision.

Another experiment indicated that a bigger brain isn’t always better when it comes to processing visual information. Only so much visual information can go into the system at a time, based on physical constraints like the number of photoreceptors in the eyes.

“At some point a bigger brain doesn’t help the agents at all, and in nature that would be a waste of resources,” Cheung says.

In the future, the researchers want to use this simulator to explore the best vision systems for specific applications, which could help scientists develop task-specific sensors and cameras. They also want to integrate LLMs into their framework to make it easier for users to ask “what-if” questions and study additional possibilities.

“There’s a real benefit that comes from asking questions in a more imaginative way. I hope this inspires others to create larger frameworks, where instead of focusing on narrow questions that cover a specific area, they are looking to answer questions with a much wider scope,” Cheung says.

This work was supported, in part, by the Center for Brains, Minds, and Machines and the Defense Advanced Research Projects Agency (DARPA) Mathematics for the Discovery of Algorithms and Architectures (DIAL) program.

A new way to increase the capabilities of large language models

Most languages rely on word position and sentence structure to convey meaning. For example, "The cat sat on the box" is not the same as "The box was on the cat." Over a long text, like a financial document or a novel, the syntax of these words likely evolves.

Similarly, a person might be tracking variables in a piece of code or following instructions that have conditional actions. These are examples of state changes and sequential reasoning that we expect state-of-the-art artificial intelligence systems to excel at; however, the existing, cutting-edge attention mechanism within transformers — the primary architecture used in large language models (LLMs) for determining the importance of words — has theoretical and empirical limitations when it comes to such capabilities.

An attention mechanism allows an LLM to look back at earlier parts of a query or document and, based on its training, determine which details and words matter most; however, this mechanism alone does not understand word order. It “sees” all of the input words, a.k.a. tokens, at the same time and handles them in the order that they’re presented, so researchers have developed techniques to encode position information. This is key for domains that are highly structured, like language. But the predominant position-encoding method, called rotary position encoding (RoPE), only takes into account the relative distance between tokens in a sequence and is independent of the input data. This means that, for example, words that are four positions apart, like “cat” and “box” in the example above, will all receive the same fixed mathematical rotation specific to that relative distance. 
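For illustration, here is a minimal sketch of rotary position encoding for one attention head, showing that the rotation applied to a token depends only on its position, never on its content (a toy rendering of the general idea, not any particular library's implementation):

```python
import torch

def rope(x):
    """x: (seq_len, head_dim) query or key vectors; head_dim must be even."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = 1.0 / (10000 ** (torch.arange(half) / half))       # one frequency per pair
    angles = torch.arange(seq_len)[:, None] * freqs[None, :]   # angle = position * frequency
    cos, sin = torch.cos(angles), torch.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Each (x1, x2) pair is rotated by an angle fixed by position alone, so any two
    # tokens four positions apart always differ by the same relative rotation.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```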

Now research led by MIT and the MIT-IBM Watson AI Lab has produced an encoding technique known as “PaTH Attention” that makes positional information adaptive and context-aware rather than static, as with RoPE.

“Transformers enable accurate and scalable modeling of many domains, but they have these limitations vis-a-vis state tracking, a class of phenomena that is thought to underlie important capabilities that we want in our AI systems. So, the important question is: How can we maintain the scalability and efficiency of transformers, while enabling state tracking?” says the paper’s senior author Yoon Kim, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and a researcher with the MIT-IBM Watson AI Lab.

A new paper on this work was presented earlier this month at the Conference on Neural Information Processing Systems (NeurIPS). Kim’s co-authors include lead author Songlin Yang, an EECS graduate student and former MIT-IBM Watson AI Lab Summer Program intern; Kaiyue Wen of Stanford University; Liliang Ren of Microsoft; and Yikang Shen, Shawn Tan, Mayank Mishra, and Rameswar Panda of IBM Research and the MIT-IBM Watson AI Lab.

Path to understanding 

Instead of assigning every word a fixed rotation based on relative distance between tokens, as RoPE does, PaTH Attention is flexible, treating the in-between words as a path made up of small, data-dependent transformations. Each transformation, based on a mathematical operation called a Householder reflection, acts like a tiny mirror that adjusts depending on the content of each token it passes. Each step in a sequence can influence how the model interprets information later on. The cumulative effect lets the system model how meaning changes along the path between words, not just how far apart they are. This approach allows transformers to keep track of how entities and relationships change over time, giving them a sense of "positional memory." Think of this as walking a path while experiencing your environment and how it affects you. The team also developed a hardware-efficient algorithm that compresses the cumulative transformation and breaks it into smaller computations, so that attention scores between every pair of tokens can be computed quickly on GPUs.
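As a rough sketch of that idea (our reading of the description above, not the authors' implementation), the fixed rotation can be replaced by a product of token-dependent Householder reflections accumulated along the path between two positions:

```python
import torch

def householder(v):
    """Return H = I - 2 v v^T / (v^T v), a reflection defined by a token's vector v."""
    v = v / (v.norm() + 1e-8)
    return torch.eye(v.numel()) - 2.0 * torch.outer(v, v)

def path_transform(token_vectors, i, j):
    """Accumulate reflections contributed by the tokens between positions i and j."""
    dim = token_vectors.shape[-1]
    transform = torch.eye(dim)
    for k in range(i + 1, j):
        transform = householder(token_vectors[k]) @ transform
    return transform  # applied when comparing position i's query with position j's key
```

Because each reflection depends on the token it comes from, the same distance between two positions can yield different transformations in different contexts, which is the content-awareness described above.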

The MIT-IBM researchers then explored PaTH Attention's performance on synthetic and real-world tasks, including reasoning, long-context benchmarks, and full LLM training, to see whether it improved a model's ability to track information over time. The team tested its ability to follow the most recent "write" command despite many distracting steps, as well as on multi-step recall tests, tasks that are difficult for standard positional encoding methods like RoPE. The researchers also trained mid-size LLMs and compared them against other methods. PaTH Attention improved perplexity and outcompeted other methods on reasoning benchmarks it wasn't trained on. They also evaluated retrieval, reasoning, and stability with inputs of tens of thousands of tokens, where PaTH Attention consistently demonstrated its content-awareness.

“We found that both on diagnostic tasks that are designed to test the limitations of transformers and on real-world language modeling tasks, our new approach was able to outperform existing attention mechanisms, while maintaining their efficiency,” says Kim. Further, “I’d be excited to see whether these types of data-dependent position encodings, like PATH, improve the performance of transformers on structured domains like biology, in [analyzing] proteins or DNA.”

Thinking bigger and more efficiently 

The researchers then investigated how the PaTH Attention mechanism would perform if it more similarly mimicked human cognition, where we ignore old or less-relevant information when making decisions. To do this, they combined PaTH Attention with another position encoding scheme known as the Forgetting Transformer (FoX), which allows models to selectively “forget.” The resulting PaTH-FoX system adds a way to down-weight information in a data-dependent way, achieving strong results across reasoning, long-context understanding, and language modeling benchmarks. In this way, PaTH Attention extends the expressive power of transformer architectures. 

Kim says research like this is part of a broader effort to develop the “next big thing” in AI. He explains that a major driver of both the deep learning and generative AI revolutions has been the creation of “general-purpose building blocks that can be applied to wide domains,” such as “convolution layers, RNN [recurrent neural network] layers,” and, most recently, transformers. Looking ahead, Kim notes that considerations like accuracy, expressivity, flexibility, and hardware scalability have been and will be essential. As he puts it, “the core enterprise of modern architecture research is trying to come up with these new primitives that maintain or improve the expressivity, while also being scalable.”

This work was supported, in part, by the MIT-IBM Watson AI Lab and the AI2050 program at Schmidt Sciences.

“Robot, make me a chair”

Computer-aided design (CAD) systems are tried-and-true tools used to design many of the physical objects we use each day. But CAD software requires extensive expertise to master, and many tools incorporate such a high level of detail they don’t lend themselves to brainstorming or rapid prototyping.

In an effort to make design faster and more accessible for non-experts, researchers from MIT and elsewhere developed an AI-driven robotic assembly system that allows people to build physical objects by simply describing them in words.

Their system uses a generative AI model to build a 3D representation of an object’s geometry based on the user’s prompt. Then, a second generative AI model reasons about the desired object and figures out where different components should go, according to the object’s function and geometry.

The system can automatically build the object from a set of prefabricated parts using robotic assembly. It can also iterate on the design based on feedback from the user.

The researchers used this end-to-end system to fabricate furniture, including chairs and shelves, from two types of premade components. The components can be disassembled and reassembled at will, reducing the amount of waste generated through the fabrication process.

They evaluated these designs through a user study and found that more than 90 percent of participants preferred the objects made by their AI-driven system, as compared to different approaches.

While this work is an initial demonstration, the framework could be especially useful for rapid prototyping complex objects like aerospace components and architectural objects. In the longer term, it could be used in homes to fabricate furniture or other objects locally, without the need to have bulky products shipped from a central facility.

“Sooner or later, we want to be able to communicate and talk to a robot and AI system the same way we talk to each other to make things together. Our system is a first step toward enabling that future,” says lead author Alex Kyaw, a graduate student in the MIT departments of Electrical Engineering and Computer Science (EECS) and Architecture.

Kyaw is joined on the paper by Richa Gupta, an MIT architecture graduate student; Faez Ahmed, associate professor of mechanical engineering; Lawrence Sass, professor and chair of the Computation Group in the Department of Architecture; senior author Randall Davis, an EECS professor and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); as well as others at Google DeepMind and Autodesk Research. The paper was recently presented at the Conference on Neural Information Processing Systems.

Generating a multicomponent design

While generative AI models are good at generating 3D representations, known as meshes, from text prompts, most do not produce uniform representations of an object's geometry that have the component-level details needed for robotic assembly.

Separating these meshes into components is challenging for a model because assigning components depends on the geometry and functionality of the object and its parts.

The researchers tackled these challenges using a vision-language model (VLM), a powerful generative AI model that has been pre-trained to understand images and text. They task the VLM with figuring out how two types of prefabricated parts, structural components and panel components, should fit together to form an object.

“There are many ways we can put panels on a physical object, but the robot needs to see the geometry and reason over that geometry to make a decision about it. By serving as both the eyes and brain of the robot, the VLM enables the robot to do this,” Kyaw says.

A user prompts the system with text, perhaps by typing “make me a chair,” and gives it an AI-generated image of a chair to start.

Then, the VLM reasons about the chair and determines where panel components go on top of structural components, based on the functionality of many example objects it has seen before. For instance, the model can determine that the seat and backrest should have panels to have surfaces for someone sitting and leaning on the chair.

It outputs this information as text, such as “seat” or “backrest.” Each surface of the chair is then labeled with numbers, and the information is fed back to the VLM.

Then the VLM chooses the labels that correspond to the geometric parts of the chair that should receive panels on the 3D mesh to complete the design.
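A hypothetical sketch of that panel-assignment step might look like the following; `query_vlm` is a placeholder for whichever vision-language model the system calls, and the prompt wording and surface numbering are illustrative, not taken from the paper:

```python
def query_vlm(image, text):
    """Placeholder for a call to a pretrained vision-language model."""
    return "2, 5"  # a real VLM would answer based on the rendered mesh and prompt

def assign_panels(mesh_image, surface_ids, user_prompt):
    """Ask the VLM which numbered surfaces of the generated mesh should get panels."""
    prompt = (
        f"The user asked: '{user_prompt}'. "
        f"The object's surfaces are numbered {surface_ids}. "
        "Which surfaces need flat panels for the object to function, "
        "such as surfaces a person sits or leans on? Reply with the numbers."
    )
    reply = query_vlm(image=mesh_image, text=prompt)
    return [int(s) for s in reply.split(",")]  # e.g., [2, 5] for seat and backrest

print(assign_panels(mesh_image=None, surface_ids=list(range(1, 7)), user_prompt="make me a chair"))
```

The returned surface IDs would then be mapped back onto the 3D mesh before robotic assembly attaches panels to those faces.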

Human-AI co-design

The user remains in the loop throughout this process and can refine the design by giving the model a new prompt, such as “only use panels on the backrest, not the seat.”

“The design space is very big, so we narrow it down through user feedback. We believe this is the best way to do it because people have different preferences, and building an idealized model for everyone would be impossible,” Kyaw says.

“The human‑in‑the‑loop process allows the users to steer the AI‑generated designs and have a sense of ownership in the final result,” adds Gupta.

Once the 3D mesh is finalized, a robotic assembly system builds the object using prefabricated parts. These reusable parts can be disassembled and reassembled into different configurations.

The researchers compared the results of their method with an algorithm that places panels on all horizontal surfaces that are facing up, and an algorithm that places panels randomly. In a user study, more than 90 percent of individuals preferred the designs made by their system.

They also asked the VLM to explain why it chose to put panels in those areas.

“We learned that the vision language model is able to understand some degree of the functional aspects of a chair, like leaning and sitting, to understand why it is placing panels on the seat and backrest. It isn’t just randomly spitting out these assignments,” Kyaw says.

In the future, the researchers want to enhance their system to handle more complex and nuanced user prompts, such as a table made out of glass and metal. In addition, they want to incorporate additional prefabricated components, such as gears, hinges, or other moving parts, so objects could have more functionality.

“Our hope is to drastically lower the barrier of access to design tools. We have shown that we can use generative AI and robotics to turn ideas into physical objects in a fast, accessible, and sustainable manner,” says Davis.

3 Questions: Using computation to study the world’s best single-celled chemists

Today, out of an estimated 1 trillion species on Earth, 99.999 percent are considered microbial — bacteria, archaea, viruses, and single-celled eukaryotes. For much of our planet’s history, microbes ruled the Earth, able to live and thrive in the most extreme of environments. Researchers have only just begun in the last few decades to contend with the diversity of microbes — it’s estimated that less than 1 percent of known genes have laboratory-validated functions. Computational approaches offer researchers the opportunity to strategically parse this truly astounding amount of information.

An environmental microbiologist and computer scientist by training, new MIT faculty member Yunha Hwang is interested in the novel biology revealed by the most diverse and prolific life form on Earth. In a shared faculty position as the Samuel A. Goldblith Career Development Professor in the Department of Biology, as well as an assistant professor in the Department of Electrical Engineering and Computer Science and the MIT Schwarzman College of Computing, Hwang is exploring the intersection of computation and biology.

Q: What drew you to research microbes in extreme environments, and what are the challenges in studying them? 

A: Extreme environments are great places to look for interesting biology. I wanted to be an astronaut growing up, and the closest thing to astrobiology is examining extreme environments on Earth. And the only things that live in those extreme environments are microbes. During a sampling expedition that I took part in off the coast of Mexico, we discovered a colorful microbial mat about 2 kilometers underwater that flourished because the bacteria breathed sulfur instead of oxygen — but none of the microbes I was hoping to study would grow in the lab.

The biggest challenge in studying microbes is that a majority of them cannot be cultivated, which means that the only way to study their biology is through a method called metagenomics. My latest work is genomic language modeling. We’re hoping to develop a computational system so we can probe the organism as much as possible “in silico,” just using sequence data. A genomic language model is technically a large language model, except the language is DNA as opposed to human language. It’s trained in a similar way, just in biological language as opposed to English or French. If our objective is to learn the language of biology, we should leverage the diversity of microbial genomes. Even though we have a lot of data, and even as more samples become available, we’ve just scratched the surface of microbial diversity. 
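As a toy illustration of treating DNA as a language (not Hwang's actual pipeline), one common tokenization scheme splits a genome into overlapping k-mers, much as a text model splits sentences into word pieces:

```python
def kmer_tokens(sequence, k=6):
    """Split a DNA string into overlapping k-mer tokens."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

print(kmer_tokens("ATGCGTACCTGA"))
# ['ATGCGT', 'TGCGTA', 'GCGTAC', 'CGTACC', 'GTACCT', 'TACCTG', 'ACCTGA']
```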

Q: Given how diverse microbes are and how little we understand about them, how can studying microbes in silico, using genomic language modeling, advance our understanding of the microbial genome? 

A: A genome is many millions of letters. A human cannot possibly look at that and make sense of it. We can program a machine, though, to segment data into pieces that are useful. That’s sort of how bioinformatics works with a single genome. But if you’re looking at a gram of soil, which can contain thousands of unique genomes, that’s just too much data to work with — a human and a computer together are necessary in order to grapple with that data. 

During my PhD and master’s degree, we were only just discovering new genomes and new lineages that were so different from anything that had been characterized or grown in the lab. These were things that we just called “microbial dark matter.” When there are a lot of uncharacterized things, that’s where machine learning can be really useful, because we’re just looking for patterns — but that’s not the end goal. What we hope to do is to map these patterns to evolutionary relationships between each genome, each microbe, and each instance of life. 

Previously, we’ve been thinking about proteins as a standalone entity — that gets us to a decent degree of information because proteins are related by homology, and therefore things that are evolutionarily related might have a similar function. 

What is known about microbiology is that proteins are encoded into genomes, and the context in which a protein is embedded — what regions come before and after — is evolutionarily conserved, especially if there is a functional coupling. This makes total sense because when you have three proteins that need to be expressed together because they form a unit, then you might want them located right next to each other.

What I want to do is incorporate more of that genomic context in the way that we search for and annotate proteins and understand protein function, so that we can go beyond sequence or structural similarity to add contextual information to how we understand proteins and hypothesize about their functions. 
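
As a rough sketch of what adding genomic context could look like in practice, each gene can be described together with the genes immediately upstream and downstream of it. The record format and window size below are assumptions for illustration, not Hwang’s method.

```python
# Minimal sketch: annotating each gene with its genomic neighborhood, so that
# downstream comparisons can use context as well as the gene's own sequence.
# The window size and record format are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Gene:
    gene_id: str
    sequence: str  # nucleotide sequence of the gene itself

def with_context(genome: list[Gene], window: int = 2) -> list[dict]:
    """For each gene, collect the ids of up to `window` genes before and after it."""
    records = []
    for i, gene in enumerate(genome):
        upstream = [g.gene_id for g in genome[max(0, i - window):i]]
        downstream = [g.gene_id for g in genome[i + 1:i + 1 + window]]
        records.append({
            "gene": gene.gene_id,
            "sequence": gene.sequence,
            "upstream": upstream,      # conserved neighbors can hint at shared function
            "downstream": downstream,
        })
    return records

genome = [Gene("gA", "ATG..."), Gene("gB", "ATG..."), Gene("gC", "ATG...")]
print(with_context(genome)[1])  # gB annotated with gA upstream and gC downstream
```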

Q: How can your research be applied to harnessing the functional potential of microbes? 

A: Microbes are possibly the world’s best chemists. Leveraging microbial metabolism and biochemistry will lead to more sustainable and more efficient methods for producing new materials, new therapeutics, and new types of polymers. 

But it’s not just about efficiency — microbes are doing chemistry we don’t even know how to think about. Understanding how microbes work, and being able to understand their genomic makeup and their functional capacity, will also be really important as we think about how our world and climate are changing. A majority of carbon sequestration and nutrient cycling is undertaken by microbes; if we don’t understand how a given microbe is able to fix nitrogen or carbon, then we will face difficulties in modeling the nutrient fluxes of the Earth. 

On the more therapeutic side, infectious diseases are a real and growing threat. Understanding how microbes behave in diverse environments relative to the rest of our microbiome is really important as we think about the future and combating microbial pathogens. 

Deep-learning model predicts how fruit flies form, cell by cell

During early development, tissues and organs begin to bloom through the shifting, splitting, and growing of many thousands of cells.

A team of MIT engineers has now developed a way to predict, minute by minute, how individual cells will fold, divide, and rearrange during a fruit fly’s earliest stage of growth. The new method may one day be applied to predict the development of more complex tissues, organs, and organisms. It could also help scientists identify cell patterns that correspond to early-onset diseases, such as asthma and cancer.

In a study appearing today in the journal Nature Methods, the team presents a new deep-learning model that learns, then predicts, how certain geometric properties of individual cells will change as a fruit fly develops. The model records and tracks properties such as a cell’s position, and whether it is touching a neighboring cell at a given moment.

The team applied the model to videos of developing fruit fly embryos, each of which starts as a cluster of about 5,000 cells. They found the model could predict, with 90 percent accuracy, how each of the 5,000 cells would fold, shift, and rearrange, minute by minute, during the first hour of development, as the embryo morphs from a smooth, uniform shape into more defined structures and features.

“This very initial phase is known as gastrulation, which takes place over roughly one hour, when individual cells are rearranging on a time scale of minutes,” says study author Ming Guo, associate professor of mechanical engineering at MIT. “By accurately modeling this early period, we can start to uncover how local cell interactions give rise to global tissues and organisms.”

The researchers hope to apply the model to predict the cell-by-cell development of other species, such as zebrafish and mice. Then, they can begin to identify patterns that are common across species. The team also envisions that the method could be used to discern early patterns of disease, such as in asthma. Lung tissue in people with asthma looks markedly different from healthy lung tissue. How asthma-prone tissue initially develops is an unknown process that the team’s new method could potentially reveal.

“Asthmatic tissues show different cell dynamics when imaged live,” says co-author and MIT graduate student Haiqian Yang. “We envision that our model could capture these subtle dynamical differences and provide a more comprehensive representation of tissue behavior, potentially improving diagnostics or drug-screening assays.”

The study’s co-authors are Markus Buehler, the McAfee Professor of Engineering in MIT’s Department of Civil and Environmental Engineering; George Roy and Tomer Stern of the University of Michigan; and Anh Nguyen and Dapeng Bi of Northeastern University.

Points and foams

Scientists typically model how an embryo develops in one of two ways: as a point cloud, where each point represents an individual cell that moves over time; or as a “foam,” which represents individual cells as bubbles that shift and slide against each other, similar to the bubbles in shaving foam.

Rather than choose between the two approaches, Guo and Yang embraced both.

“There’s a debate about whether to model as a point cloud or a foam,” Yang says. “But both of them are essentially different ways of modeling the same underlying graph, which is an elegant way to represent living tissues. By combining these as one graph, we can highlight more structural information, like how cells are connected to each other as they rearrange over time.”

At the heart of the new model is a “dual-graph” structure that represents a developing embryo as both moving points and bubbles. Through this dual representation, the researchers hoped to capture more detailed geometric properties of individual cells, such as the location of a cell’s nucleus, whether a cell is touching a neighboring cell, and whether it is folding or dividing at a given moment in time.
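
As a rough sketch of that kind of representation (not the authors’ implementation), one frame of an embryo can be stored as a graph whose nodes are cells carrying geometric features and whose edges mark which cells currently touch. The feature names and the use of the networkx library below are assumptions for illustration.

```python
# Minimal sketch: a graph representation of an embryo at one time point.
# Nodes are cells with geometric features; edges connect cells that share a boundary.

import networkx as nx

def build_cell_graph(cells: dict[int, dict], contacts: list[tuple[int, int]]) -> nx.Graph:
    """cells maps a cell id to its features; contacts lists pairs of touching cells."""
    g = nx.Graph()
    for cell_id, features in cells.items():
        g.add_node(cell_id, **features)  # e.g. nucleus position, dividing/folding flags
    g.add_edges_from(contacts)           # an edge exists while two cells touch
    return g

# One frame of a toy three-cell embryo; a model would see one such graph per minute
# and predict how node features and edges change in the next frame.
cells = {
    0: {"nucleus": (0.0, 0.0, 0.0), "dividing": False},
    1: {"nucleus": (1.0, 0.1, 0.0), "dividing": False},
    2: {"nucleus": (0.5, 0.9, 0.0), "dividing": True},
}
contacts = [(0, 1), (1, 2)]
frame = build_cell_graph(cells, contacts)
print(frame.number_of_nodes(), frame.number_of_edges())  # 3 2
```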

As a proof of principle, the team trained the new model to “learn” how individual cells change over time during fruit fly gastrulation.

“The overall shape of the fruit fly at this stage is roughly an ellipsoid, but there are gigantic dynamics going on at the surface during gastrulation,” Guo says. “It goes from entirely smooth to forming a number of folds at different angles. And we want to predict all of those dynamics, moment to moment, and cell by cell.”

Where and when

For their new study, the researchers applied the new model to high-quality videos of fruit fly gastrulation taken by their collaborators at the University of Michigan. The videos are one-hour recordings of developing fruit flies, taken at single-cell resolution. What’s more, the videos contain labels of individual cells’ edges and nuclei — data that are incredibly detailed and difficult to come by.

“These videos are of extremely high quality,” Yang says. “This data is very rare, where you get submicron resolution of the whole 3D volume at a pretty fast frame rate.”

The team trained the new model with data from three of four fruit fly embryo videos, such that the model might “learn” how individual cells interact and change as an embryo develops. They then tested the model on an entirely new fruit fly video, and found that it was able to predict with high accuracy how most of the embryo’s 5,000 cells changed from minute to minute.

Specifically, the model could predict properties of individual cells, such as whether they will fold, divide, or continue sharing an edge with a neighboring cell, with about 90 percent accuracy.
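
For intuition, the sketch below scores a set of per-cell binary predictions against observed outcomes; the toy labels are illustrative, and the 0.9 result simply mirrors the accuracy figure reported above rather than the study’s own evaluation code.

```python
# Minimal sketch: scoring per-cell binary predictions (e.g. "will these two cells
# still share an edge next minute?") against ground-truth observations.

def per_cell_accuracy(predicted: list[bool], observed: list[bool]) -> float:
    """Fraction of cells for which the predicted event matched what actually happened."""
    assert len(predicted) == len(observed)
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)

predicted = [True, True, False, True, False, True, True, False, True, True]
observed  = [True, True, False, True, True,  True, True, False, True, True]
print(per_cell_accuracy(predicted, observed))  # 0.9
```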

“We end up predicting not only whether these things will happen, but also when,” Guo says. “For instance, will this cell detach from this cell seven minutes from now, or eight? We can tell when that will happen.”

The team believes that, in principle, the new model and the dual-graph approach should be able to predict the cell-by-cell development of other multicellular systems, such as more complex species, and even some human tissues and organs. The limiting factor is the availability of high-quality video data.

“From the model perspective, I think it’s ready,” Guo says. “The real bottleneck is the data. If we have good quality data of specific tissues, the model could be directly applied to predict the development of many more structures.”

This work is supported, in part, by the U.S. National Institutes of Health.