A crossroads for computing at MIT

On Vassar Street, in the heart of MIT’s campus, the MIT Stephen A. Schwarzman College of Computing recently opened the doors to its new headquarters in Building 45. The building’s central location and welcoming design will help form a new cluster of connectivity at MIT and enable the space to have a multifaceted role. 

“The college has a broad mandate for computing across MIT,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science. “The building is designed to be the computing crossroads of the campus. It’s a place to bring a mix of people together to connect, engage, and catalyze collaborations in computing, and a home to a related set of computing research groups from multiple departments and labs.”

“Computing is the defining technology of our time and it will continue to be, well into the future,” says MIT President Sally Kornbluth. “As the people of MIT make progress in high-impact fields from AI to climate, this fantastic new building will enable collaboration across computing, engineering, biological science, economics, and countless other fields, encouraging the cross-pollination of ideas that inspires us to generate fresh solutions. The college has opened its doors at just the right time.”

A physical embodiment

An approximately 178,000-square-foot, eight-floor structure, the building is designed to be a physical embodiment of the MIT Schwarzman College of Computing’s three-fold mission: strengthen core computer science and artificial intelligence; infuse the forefront of computing with disciplines across MIT; and advance the social, ethical, and policy dimensions of computing.

Oriented for the campus community and the public to come in and engage with the college, the first two floors of the building encompass multiple convening areas, including a 60-seat classroom, a 250-seat lecture hall, and an assortment of spaces for studying and social interactions.

Academic activity has commenced in both the lecture hall and classroom this semester with 13 classes for undergraduate and graduate students. Subjects include 6.C35/6.C85 (Interactive Data Visualization and Society), a class taught by faculty from the departments of Electrical Engineering and Computer Science (EECS) and Urban Studies and Planning. The class was created as part of the Common Ground for Computing Education, a cross-cutting initiative of the college that brings multiple departments together to develop and teach new courses and launch new programs that blend computing with other disciplines.

“The new college building is catering not only to educational and research needs, but also fostering extensive community connections. It has been particularly exciting to see faculty teaching classes in the building and the lobby bustling with students on any given day, engrossed in their studies or just enjoying the space while taking a break,” says Asu Ozdaglar, deputy dean of the MIT Schwarzman College of Computing and head of EECS.

The building will also accommodate 50 computing research groups, which correspond to the number of new faculty the college is hiring — 25 in core computing positions and 25 in shared positions with departments at MIT. These groups bring together a mix of new and existing teams in related research areas spanning floors four through seven of the building.

In mid-January, the initial two dozen research groups moved into the building, including faculty from the departments of EECS; Aeronautics and Astronautics; Brain and Cognitive Sciences; Mechanical Engineering; and Economics who are affiliated with the Computer Science and Artificial Intelligence Laboratory and the Laboratory for Information and Decision Systems. The research groups form a coherent overall cluster in deep learning and generative AI, natural language processing, computer vision, robotics, reinforcement learning, game theoretic methods, and societal impact of AI.

More will follow suit, including some of the 10 faculty who have been hired into shared positions by the college with the departments of Brain and Cognitive Sciences; Chemical Engineering; Comparative Media Studies and Writing; Earth, Atmospheric and Planetary Sciences; Music and Theater Arts; Mechanical Engineering; Nuclear Science and Engineering; Political Science; and the MIT Sloan School of Management.

“I eagerly anticipate the building’s expansion of opportunities, facilitating the development of even deeper connections the college has made so far spanning all five schools,” says Anantha Chandrakasan, chief innovation and strategy officer, dean of the School of Engineering, and the Vannevar Bush Professor of Electrical Engineering and Computer Science.

Other college programs and activities that are being supported in the building include the MIT Quest for Intelligence, Center for Computational Science and Engineering, and MIT-IBM Watson AI Lab. There are also dedicated areas for the dean’s office, as well as for the cross-cutting areas of the college — the Social and Ethical Responsibilities of Computing, Common Ground, and Special Semester Topics in Computing, a new experimental program designed to bring MIT researchers and visitors together in a common space for a semester around areas of interest.

Additional spaces include conference rooms on the third floor that are available for use by any college unit. These rooms are accessible to both residents and nonresidents of the building to host weekly group meetings or other computing-related activities.

For the MIT community at large, the building’s main event space, along with three conference rooms, is available for meetings, events, and conferences. Located on the eighth and top floor, with striking views across Cambridge and Boston and of the Great Dome, the event space is already in demand, with bookings through next fall, and has quickly become a popular destination on campus.

The college inaugurated the event space over the January Independent Activities Period, welcoming students, faculty, and visitors to the building for Expanding Horizons in Computing — a weeklong series of bootcamps, workshops, short talks, panels, and roundtable discussions. Organized by various MIT faculty, the 12 sessions in the series delved into exciting areas of computing and AI, with topics ranging from security, intelligence, and deep learning to design, sustainability, and policy.

Form and function

Designed by Skidmore, Owings & Merrill, the state-of-the-art space for education, research, and collaboration took shape over four years of design and construction.

“In the design of a new multifunctional building like this, I view my job as the dean being to make sure that the building fulfills the functional needs of the college mission,” says Huttenlocher. “I think what has been most rewarding for me, now that the building is finished, is to see its form supporting its wide range of intended functions.”

In keeping with MIT’s commitment to environmental sustainability, the building is designed to meet Leadership in Energy and Environmental Design (LEED) Gold certification. The final review with the U.S. Green Building Council is tracking toward a Platinum certification.

The glass shingles on the building’s south-facing side serve a dual purpose: they let abundant natural light in, and they form a double-skin façade of interlocking units that creates a deep sealed cavity, which is anticipated to notably lower energy consumption.

Other sustainability features include embodied carbon tracking, on-site stormwater management, fixtures that reduce indoor potable water usage, and a large green roof. The building is also the first to draw heat from a newly completed utilities plant built on top of Building 42, which converted the campus’s conventional steam-based distribution systems into more efficient hot-water systems, allowing the building to deliver medium-temperature hot water across the entire facility.

Grand unveiling

A dedication ceremony for the building is planned for the spring.

The momentous event will mark the official completion and opening of the new building and celebrate the culmination of hard work, commitment, and collaboration in bringing it to fruition.

It will also celebrate the 2018 foundational gift that established the college from Stephen A. Schwarzman, the chair, CEO, and co-founder of Blackstone, the global asset management and financial services firm. In addition, it will acknowledge Sebastian Man ’79, SM ’80, the first donor to support the building after Schwarzman. Man’s gift will be recognized with the naming of a key space in the building that will enrich the academic and research activities of the MIT Schwarzman College of Computing and the Institute.

Growing our donated organ supply

For those in need of one, an organ transplant is a matter of life and death. 

Every year, the medical procedure gives thousands of people with advanced or end-stage diseases extended life. This “second chance” is heavily dependent on the availability, compatibility, and proximity of a precious resource that can’t be simply bought, grown, or manufactured — at least not yet.

Instead, organs must be given — cut from one body and implanted into another. And because living organ donation is only viable in certain cases, many organs are only available for donation after the donor’s death.

Unsurprisingly, the logistical and ethical complexity of distributing a limited number of transplant organs to a growing wait list of patients has received much attention. There’s an important part of the process that has received less focus, however, and which may hold significant untapped potential: organ procurement itself.

“If you have a donated organ, who should you give it to? This question has been extensively studied in operations research, economics, and even applied computer science,” says Hammaad Adam, a graduate student in the Social and Engineering Systems (SES) doctoral program at the MIT Institute for Data, Systems, and Society (IDSS). “But there’s been a lot less research on where that organ comes from in the first place.”

In the United States, nonprofits called organ procurement organizations, or OPOs, are responsible for finding and evaluating potential donors, interacting with grieving families and hospital administrations, and recovering and delivering organs — all while following the federal laws that serve as both their mandate and guardrails. Recent studies estimate that obstacles and inefficiencies lead to thousands of organs going uncollected every year, even as the demand for transplants continues to grow.

“There’s been little transparent data on organ procurement,” argues Adam. Working with MIT computer science professors Marzyeh Ghassemi and Ashia Wilson, and in collaboration with stakeholders in organ procurement, Adam led a project to create a dataset called ORCHID: Organ Retrieval and Collection of Health Information for Donation. ORCHID contains a decade of clinical, financial, and administrative data from six OPOs.
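ORCHID is openly shared with the research community. As a hint of the kind of analysis it enables, a minimal sketch might compute procurement outcomes by OPO; the file name and columns below are illustrative assumptions, not the dataset’s actual schema.

```python
import pandas as pd

# Hypothetical file and column names -- illustrative only, not ORCHID's schema.
referrals = pd.read_csv("orchid_referrals.csv")

# Procurement rate by OPO: of all referred potential donors,
# what fraction resulted in at least one organ recovered?
rates = (referrals
         .groupby("opo_id")["organ_recovered"]
         .mean()
         .sort_values())
print(rates)

# Where in the pipeline are potential donors lost?
funnel = referrals[["approached", "authorized", "procured"]].mean()
print(funnel)
```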

“Our goal is for the ORCHID database to have an impact in how organ procurement is understood, internally and externally,” says Ghassemi.

Efficiency and equity 

A desire to make an impact is what drew Adam to SES and MIT. With a background in applied math and experience in strategy consulting, solving problems with technical components sits squarely in his wheelhouse.

“I really missed challenging technical problems from a statistics and machine learning standpoint,” he says of his time in consulting. “So I went back and got a master’s in data science, and over the course of my master’s got involved in a bunch of academic research projects in a few different fields, including biology, management science, and public policy. What I enjoyed most were some of the more social science-focused projects that had immediate impact.”

As a grad student in SES, Adam’s research focuses on using statistical tools to uncover health-care inequities, and developing machine learning approaches to address them. “Part of my dissertation research focuses on building tools that can improve equity in clinical trials and other randomized experiments,” he explains.

One recent example of Adam’s work: developing a novel method to stop clinical trials early if the treatment has an unintended harmful effect for a minority group of participants. “I’ve also been thinking about ways to increase minority representation in clinical trials through improved patient recruitment,” he adds.
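Adam’s published method is more involved, but the core idea of interim subgroup monitoring can be sketched in a few lines. The test and threshold below are generic placeholders, not his actual stopping rule.

```python
import numpy as np
from scipy import stats

def flag_subgroup_harm(treated, control, alpha=0.01):
    """Interim check: is the treated arm faring significantly worse than
    control within one subgroup? Outcomes are coded so higher is better.
    A generic one-sided Welch t-test stands in for the real stopping rule."""
    t_stat, p_value = stats.ttest_ind(treated, control,
                                      equal_var=False, alternative="less")
    return p_value < alpha  # True -> flag the trial for possible early stop

# Run at each interim analysis, once per prespecified subgroup
rng = np.random.default_rng(1)
stop = flag_subgroup_harm(rng.normal(-0.4, 1, 80), rng.normal(0.0, 1, 120))
```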

Racial inequities in health care extend into organ transplantation, where a majority of wait-listed patients are not white — far in excess of these groups’ share of the overall population. Yet organ donations from many of these communities are comparatively few, due to obstacles that must be better understood if they are to be overcome.

“My work in organ transplantation began on the allocation side,” explains Adam. “In work under review, we examined the role of race in the acceptance of heart, liver, and lung transplant offers by physicians on behalf of their patients. We found that Black race of the patient was associated with significantly lower odds of organ offer acceptance — in other words, transplant doctors seemed more likely to turn down organs offered to Black patients. This trend may have multiple explanations, but it is nevertheless concerning.”

Adam’s research has also found that donor-candidate race match was associated with significantly higher odds of offer acceptance, an association that Adam says “highlights the importance of organ donation from racial minority communities, and has motivated our work on equitable organ procurement.”

Working with Ghassemi through the IDSS Initiative on Combatting Systemic Racism, Adam was introduced to OPO stakeholders looking to collaborate. “It’s this opportunity to impact not only health-care efficiency, but also health-care equity, that really got me interested in this research,” says Adam.

Making an impact

Creating a database like ORCHID means solving problems in multiple domains, from the technical to the political. Some efforts never overcome the first step: getting data in the first place. Thankfully, several OPOs were already seeking collaborations and looking to improve their performance.

“We have been lucky to have a strong partnership with the OPOs, and we hope to work together to find important insights to improve efficiency and equity,” says Ghassemi.

The value of a database like ORCHID lies in its potential for generating new insights, especially through quantitative analysis with statistics and computing tools like machine learning. That potential was recognized with an MIT Prize for Open Data, an MIT Libraries award highlighting the importance and impact of openly shared research data.

“It’s nice that the work got some recognition,” says Adam of the prize. “And it was cool to see some of the other great open data work that’s happening at MIT. I think there’s real impact in releasing publicly available data in an important and understudied domain.”

All the same, Adam knows that building the database is only the first step.

“I’m very interested in understanding the bottlenecks in the organ procurement process,” he explains. “As part of my thesis research, I’m exploring this by modeling OPO decision-making using causal inference and structural econometrics.”

Using insights from this research, Adam also aims to evaluate policy changes that can improve both equity and efficiency in organ procurement. “And we’re hoping to recruit more OPOs, and increase the amount of data we’re releasing,” he says. “The dream state is every OPO joins our collaboration and provides updated data every year.”

Adam is excited to see how other researchers might use the data to address inefficiencies in organ procurement. “Every organ donor saves between three and four lives,” he says. “So every research project that comes out of this dataset could make a real impact.”

New AI method captures uncertainty in medical images

In biomedicine, segmentation involves annotating pixels from an important structure in a medical image, like an organ or cell. Artificial intelligence models can help clinicians by highlighting pixels that may show signs of a certain disease or anomaly.

However, these models typically only provide one answer, while the problem of medical image segmentation is often far from black and white. Five expert human annotators might provide five different segmentations, perhaps disagreeing on the existence or extent of the borders of a nodule in a lung CT image.

“Having options can help in decision-making. Even just seeing that there is uncertainty in a medical image can influence someone’s decisions, so it is important to take this uncertainty into account,” says Marianne Rakic, an MIT computer science PhD candidate.

Rakic is lead author of a paper with others at MIT, the Broad Institute of MIT and Harvard, and Massachusetts General Hospital that introduces a new AI tool that can capture the uncertainty in a medical image.

Known as Tyche (named for the Greek divinity of chance), the system provides multiple plausible segmentations that each highlight slightly different areas of a medical image. A user can specify how many options Tyche outputs and select the most appropriate one for their purpose.

Importantly, Tyche can tackle new segmentation tasks without needing to be retrained. Training is a data-intensive process that involves showing a model many examples and requires extensive machine-learning experience.

Because it doesn’t need retraining, Tyche could be easier for clinicians and biomedical researchers to use than some other methods. It could be applied “out of the box” for a variety of tasks, from identifying lesions in a lung X-ray to pinpointing anomalies in a brain MRI.

Ultimately, this system could improve diagnoses or aid in biomedical research by calling attention to potentially crucial information that other AI tools might miss.

“Ambiguity has been understudied. If your model completely misses a nodule that three experts say is there and two experts say is not, that is probably something you should pay attention to,” adds senior author Adrian Dalca, an assistant professor at Harvard Medical School and MGH, and a research scientist in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

Their co-authors include Hallee Wong, a graduate student in electrical engineering and computer science; Jose Javier Gonzalez Ortiz PhD ’23; Beth Cimini, associate director for bioimage analysis at the Broad Institute; and John Guttag, the Dugald C. Jackson Professor of Computer Science and Electrical Engineering. Rakic will present Tyche at the IEEE Conference on Computer Vision and Pattern Recognition, where Tyche has been selected as a highlight.

Addressing ambiguity

AI systems for medical image segmentation typically use neural networks. Loosely based on the human brain, neural networks are machine-learning models comprising many interconnected layers of nodes, or neurons, that process data.

After speaking with collaborators at the Broad Institute and MGH who use these systems, the researchers realized two major issues limit their effectiveness. The models cannot capture uncertainty and they must be retrained for even a slightly different segmentation task.

Some methods try to overcome one pitfall, but tackling both problems with a single solution has proven especially tricky, Rakic says. 

“If you want to take ambiguity into account, you often have to use an extremely complicated model. With the method we propose, our goal is to make it easy to use with a relatively small model so that it can make predictions quickly,” she says.

The researchers built Tyche by modifying a straightforward neural network architecture.

A user first feeds Tyche a few examples that show the segmentation task. For instance, examples could include several images of lesions in a heart MRI that have been segmented by different human experts so the model can learn the task and see that there is ambiguity.

The researchers found that a “context set” of just 16 example images is enough for the model to make good predictions, and there is no limit to the number of examples one can use. The context set enables Tyche to solve new tasks without retraining.

For Tyche to capture uncertainty, the researchers modified the neural network so it outputs multiple predictions based on one medical image input and the context set. They adjusted the network’s layers so that, as data move from layer to layer, the candidate segmentations produced at each step can “talk” to each other and the examples in the context set.

In this way, the model can ensure that candidate segmentations are all a bit different, but still solve the task.

“It is like rolling dice. If your model can roll a two, three, or four, but doesn’t know you have a two and a four already, then either one might appear again,” she says.

They also modified the training process so that the model is rewarded for maximizing the quality of its best prediction.
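A minimal sketch of such a best-candidate objective, using a soft Dice score and assuming the network already returns K candidate masks, might look like the following (shapes and names are illustrative, not the paper’s exact loss):

```python
import torch

def best_candidate_loss(candidates, target, eps=1e-6):
    """Soft-Dice loss of the best of K candidate segmentations.

    candidates: (K, H, W) tensor of predicted foreground probabilities
    target:     (H, W)    one expert's binary annotation

    Only the best candidate is penalized, so the remaining candidates stay
    free to cover other plausible annotations.
    """
    target = target.unsqueeze(0)                  # broadcast to (1, H, W)
    inter = (candidates * target).sum(dim=(1, 2))
    union = candidates.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    dice = (2 * inter + eps) / (union + eps)      # per-candidate score, (K,)
    return 1 - dice.max()                         # train only the best one
```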

If the user asks for five predictions, they can see all five medical image segmentations Tyche produced, even though one might be better than the others.

The researchers also developed a version of Tyche that can be used with an existing, pretrained model for medical image segmentation. In this case, Tyche enables the model to output multiple candidates by making slight transformations to images.

Better, faster predictions

When the researchers tested Tyche with datasets of annotated medical images, they found that its predictions captured the diversity of human annotators, and that its best predictions were better than any from the baseline models. Tyche also performed faster than most models.

“Outputting multiple candidates and ensuring they are different from one another really gives you an edge,” Rakic says.

The researchers also saw that Tyche could outperform more complex models that have been trained using a large, specialized dataset.

For future work, they plan to try using a more flexible context set, perhaps including text or multiple types of images. In addition, they want to explore methods that could improve Tyche’s worst predictions and enhance the system so it can recommend the best segmentation candidates.

This research is funded, in part, by the National Institutes of Health, the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, and Quanta Computer.

A faster, better way to prevent an AI chatbot from giving toxic responses

A user could ask ChatGPT to write a computer program or summarize an article, and the AI chatbot would likely be able to generate useful code or write a cogent synopsis. However, someone could also ask for instructions to build a bomb, and the chatbot might be able to provide those, too.

To prevent this and other safety issues, companies that build large language models typically safeguard them using a process called red-teaming. Teams of human testers write prompts aimed at triggering unsafe or toxic text from the model being tested. These prompts are used to teach the chatbot to avoid such responses.

But this only works effectively if engineers know which toxic prompts to use. If human testers miss some prompts, which is likely given the number of possibilities, a chatbot regarded as safe might still be capable of generating unsafe answers.

Researchers from Improbable AI Lab at MIT and the MIT-IBM Watson AI Lab used machine learning to improve red-teaming. They developed a technique to train a red-team large language model to automatically generate diverse prompts that trigger a wider range of undesirable responses from the chatbot being tested.

They do this by teaching the red-team model to be curious when it writes prompts, and to focus on novel prompts that evoke toxic responses from the target model.

The technique outperformed human testers and other machine-learning approaches by generating more distinct prompts that elicited increasingly toxic responses. Not only does their method significantly improve the coverage of inputs being tested compared to other automated methods, but it can also draw out toxic responses from a chatbot that had safeguards built into it by human experts.

“Right now, every large language model has to undergo a very lengthy period of red-teaming to ensure its safety. That is not going to be sustainable if we want to update these models in rapidly changing environments. Our method provides a faster and more effective way to do this quality assurance,” says Zhang-Wei Hong, an electrical engineering and computer science (EECS) graduate student in the Improbable AI lab and lead author of a paper on this red-teaming approach.

Hong’s co-authors include EECS graduate students Idan Shenfield, Tsun-Hsuan Wang, and Yung-Sung Chuang; Aldo Pareja and Akash Srivastava, research scientists at the MIT-IBM Watson AI Lab; James Glass, senior research scientist and head of the Spoken Language Systems Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Pulkit Agrawal, director of Improbable AI Lab and an assistant professor in CSAIL. The research will be presented at the International Conference on Learning Representations.

Automated red-teaming 

Large language models, like those that power AI chatbots, are often trained by showing them enormous amounts of text from billions of public websites. So, not only can they learn to generate toxic words or describe illegal activities, the models could also leak personal information they may have picked up.

The tedious and costly nature of human red-teaming, which is often ineffective at generating a wide enough variety of prompts to fully safeguard a model, has encouraged researchers to automate the process using machine learning.

Such techniques often train a red-team model using reinforcement learning. This trial-and-error process rewards the red-team model for generating prompts that trigger toxic responses from the chatbot being tested.

But due to the way reinforcement learning works, the red-team model will often keep generating a few similar prompts that are highly toxic to maximize its reward.

For their reinforcement learning approach, the MIT researchers utilized a technique called curiosity-driven exploration. The red-team model is incentivized to be curious about the consequences of each prompt it generates, so it will try prompts with different words, sentence patterns, or meanings.

“If the red-team model has already seen a specific prompt, then reproducing it will not generate any curiosity in the red-team model, so it will be pushed to create new prompts,” Hong says.

During its training process, the red-team model generates a prompt and interacts with the chatbot. The chatbot responds, and a safety classifier rates the toxicity of its response, rewarding the red-team model based on that rating.

Rewarding curiosity

The red-team model’s objective is to maximize its reward by eliciting an even more toxic response with a novel prompt. The researchers enable curiosity in the red-team model by modifying the reward signal in the reinforcement learning setup.

First, in addition to maximizing toxicity, they include an entropy bonus that encourages the red-team model to be more random as it explores different prompts. Second, to make the agent curious, they include two novelty rewards: one rewards the model based on the similarity of words in its prompts, and the other based on semantic similarity. (In both cases, less similarity yields a higher reward.)

To prevent the red-team model from generating random, nonsensical text, which can trick the classifier into awarding a high toxicity score, the researchers also added a naturalistic language bonus to the training objective.
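Putting those pieces together, a combined reward might be sketched as below. Every scoring function is a placeholder for the components described above, and the entropy bonus, which acts on the policy’s token distribution during the RL update, is omitted.

```python
import numpy as np

def redteam_reward(prompt, history_embeddings, toxicity_fn, embed_fn,
                   word_overlap_fn, naturalness_fn,
                   w_tox=1.0, w_novel=0.5, w_nat=0.3):
    """Illustrative reward for one red-team prompt; all *_fn arguments are
    placeholder scoring functions, not the paper's exact components."""
    tox = toxicity_fn(prompt)  # safety classifier's toxicity rating in [0, 1]

    # Semantic novelty: low cosine similarity to past prompts earns reward.
    emb = embed_fn(prompt)
    if history_embeddings:
        sims = [np.dot(emb, h) / (np.linalg.norm(emb) * np.linalg.norm(h))
                for h in history_embeddings]
        semantic_novelty = 1.0 - max(sims)
    else:
        semantic_novelty = 1.0

    # Lexical novelty: low word-level overlap with past prompts earns reward.
    lexical_novelty = 1.0 - word_overlap_fn(prompt)

    # Naturalness bonus discourages gibberish that games the classifier.
    nat = naturalness_fn(prompt)

    return (w_tox * tox
            + w_novel * (semantic_novelty + lexical_novelty)
            + w_nat * nat)
```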

With these additions in place, the researchers compared the toxicity and diversity of responses their red-team model generated with other automated techniques. Their model outperformed the baselines on both metrics.

They also used their red-team model to test a chatbot that had been fine-tuned with human feedback so it would not give toxic replies. Their curiosity-driven approach was able to quickly produce 196 prompts that elicited toxic responses from this “safe” chatbot.

“We are seeing a surge of models, which is only expected to rise. Imagine thousands of models or even more and companies/labs pushing model updates frequently. These models are going to be an integral part of our lives and it’s important that they are verified before released for public consumption. Manual verification of models is simply not scalable, and our work is an attempt to reduce the human effort to ensure a safer and trustworthy AI future,” says Agrawal.  

In the future, the researchers want to enable the red-team model to generate prompts about a wider variety of topics. They also want to explore the use of a large language model as the toxicity classifier. In this way, a user could train the toxicity classifier using a company policy document, for instance, so a red-team model could test a chatbot for company policy violations.

“If you are releasing a new AI model and are concerned about whether it will behave as expected, consider using curiosity-driven red-teaming,” says Agrawal.

This research is funded, in part, by Hyundai Motor Company, Quanta Computer Inc., the MIT-IBM Watson AI Lab, an Amazon Web Services MLRA research grant, the U.S. Army Research Office, the U.S. Defense Advanced Research Projects Agency Machine Common Sense Program, the U.S. Office of Naval Research, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator.

When an antibiotic fails: MIT scientists are using AI to target “sleeper” bacteria

Since the 1970s, modern antibiotic discovery has been experiencing a lull. Now the World Health Organization has declared antimicrobial resistance one of the top 10 global public health threats.

When an infection is treated repeatedly, clinicians run the risk of bacteria becoming resistant to the antibiotics. But why would an infection return after proper antibiotic treatment? One well-documented possibility is that the bacteria become metabolically inert, escaping traditional antibiotics, which work only on bacteria with active metabolisms. When the danger has passed, the bacteria return to life and the infection reappears.

“Resistance is happening more over time, and recurring infections are due to this dormancy,” says Jackie Valeri, a former MIT-Takeda Fellow (a fellowship based within the MIT Abdul Latif Jameel Clinic for Machine Learning in Health) who recently earned her PhD in biological engineering in the Collins Lab. Valeri is the first author of a new paper published in this month’s print issue of Cell Chemical Biology that demonstrates how machine learning could help screen compounds that are lethal to dormant bacteria.

Tales of bacterial “sleeper-like” resilience are hardly news to the scientific community — bacterial strains up to 100 million years old have been discovered in recent years, alive in an energy-saving state on the seafloor of the Pacific Ocean.

James J. Collins, the Termeer Professor of Medical Engineering and Science in MIT’s Institute for Medical Engineering and Science and Department of Biological Engineering, and the MIT Jameel Clinic’s life sciences faculty lead, recently made headlines for using AI to discover a new class of antibiotics, part of the group’s larger mission to use AI to dramatically expand the arsenal of available antibiotics.

According to a paper published in The Lancet, 1.27 million deaths in 2019 could have been prevented had the infections been susceptible to drugs; one of the many challenges researchers are up against is finding antibiotics able to target metabolically dormant bacteria.

In this case, researchers in the Collins Lab employed AI to speed up the search for antibiotic properties in known drug compounds. Screening millions of molecules can take years, but thanks to AI’s ability to perform high-throughput screening, the researchers identified a compound called semapimod over a weekend.
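Conceptually, that screening step amounts to scoring a large compound library with a trained activity model and keeping the top-ranked molecules for laboratory validation. Below is a minimal sketch assuming a scikit-learn-style classifier trained on fingerprints of compounds labeled active or inactive against stationary-phase bacteria; the model and threshold are illustrative, not the Collins Lab’s pipeline.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def screen_library(smiles_list, model, threshold=0.5):
    """Rank a compound library by a trained activity model's score.
    `model` is an assumed scikit-learn-style classifier with predict_proba."""
    hits = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparsable structures
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
        score = model.predict_proba([np.array(fp)])[0, 1]
        if score >= threshold:
            hits.append((smi, score))
    return sorted(hits, key=lambda pair: -pair[1])  # best candidates first
```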

Researchers discovered that semapimod, an anti-inflammatory drug typically used for Crohn’s disease, was also effective against stationary-phase Escherichia coli and Acinetobacter baumannii.

Another revelation was semapimod’s ability to disrupt the membranes of so-called “Gram-negative” bacteria, which are known for their high intrinsic resistance to antibiotics thanks to an outer membrane that is difficult to penetrate.

Examples of Gram-negative bacteria include E. coli, A. baumannii, Salmonella, and Pseudomonas, all of which are challenging to find new antibiotics for.

“One of the ways we figured out the mechanism of sema [sic] was that its structure was really big, and it reminded us of other things that target the outer membrane,” Valeri explains. “When you start working with a lot of small molecules … to our eyes, it’s a pretty unique structure.” 

By disrupting a component of the outer membrane, semapimod sensitizes Gram-negative bacteria to drugs that are typically only active against Gram-positive bacteria. 

Valeri recalls a quote from a 2013 paper published in Trends in Biotechnology: “For Gram-positive infections, we need better drugs, but for Gram-negative infections we need any drugs.”

Researchers create “The Consensus Game” to elevate AI’s text comprehension and generation skills

Imagine you and a friend are playing a game where your goal is to communicate secret messages to each other using only cryptic sentences. Your friend’s job is to guess the secret message behind your sentences. Sometimes, you give clues directly, and other times, your friend has to guess the message by asking yes-or-no questions about the clues you’ve given. The challenge is, both of you want to make sure you’re understanding each other correctly and agreeing on the secret message.

MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers have created a similar “game” to help improve how AI understands and generates text. The “Consensus Game” involves two parts of an AI system — one part tries to generate sentences (like giving clues), and the other part tries to understand and evaluate those sentences (like guessing the secret message).

The researchers discovered that by treating this interaction as a game, where both parts of the AI work together under specific rules to agree on the right message, they could significantly improve the AI’s ability to give correct and coherent answers to questions. They tested this new game-like approach on a variety of tasks, such as reading comprehension, solving math problems, and carrying on conversations, and found that it helped the AI perform better across the board.

Traditionally, language models (LMs) answer in one of two ways: generating answers directly from the model (generative querying) or using the model to score a set of predefined answers (discriminative querying), approaches that can lead to differing and sometimes incompatible results. With the generative approach, “Who is the President of the United States?” might yield a straightforward answer like “Joe Biden.” However, a discriminative query of the same model could incorrectly dispute this fact when asked to evaluate an answer such as “Barack Obama.”

So, how do we reconcile mutually incompatible scoring procedures to achieve coherent, efficient predictions? 

“Imagine a new way to help language models understand and generate text, like a game. We’ve developed a training-free, game-theoretic method that treats the whole process as a complex game of clues and signals, where a generator tries to send the right message to a discriminator using natural language. Instead of chess pieces, they’re using words and sentences,” says MIT CSAIL PhD student Athul Jacob. “Our way to navigate this game is finding the ‘approximate equilibria,’ leading to a new decoding algorithm called ‘Equilibrium Ranking.’ It’s a pretty exciting demonstration of how bringing game-theoretic strategies into the mix can tackle some big challenges in making language models more reliable and consistent.”

When tested across many tasks, like reading comprehension, commonsense reasoning, math problem-solving, and dialogue, the team’s algorithm consistently improved how well these models performed. Using the ER algorithm with the LLaMA-7B model even outshone the results from much larger models. “Given that they are already competitive, that people have been working on it for a while, but the level of improvements we saw being able to outperform a model that’s 10 times the size was a pleasant surprise,” says Jacob. 

Game on

Diplomacy, a strategic board game set in pre-World War I Europe, where players negotiate alliances, betray friends, and conquer territories without the use of dice — relying purely on skill, strategy, and interpersonal manipulation — recently had a second coming. In November 2022, computer scientists, including Jacob, developed “Cicero,” an AI agent that achieves human-level capabilities in the mixed-motive seven-player game, which requires the same aforementioned skills, but with natural language. The math behind this partially inspired The Consensus Game.

While the history of AI agents long predates November 2022, when OpenAI’s software entered the chat (and never looked back), it’s well documented that they can still cosplay as your well-meaning yet pathologically lying friend.

The Consensus Game system reaches equilibrium as an agreement, ensuring accuracy and fidelity to the model’s original insights. To achieve this, the method iteratively adjusts the interactions between the generative and discriminative components until they reach a consensus on an answer that accurately reflects reality and aligns with their initial beliefs. This approach effectively bridges the gap between the two querying methods. 
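A toy version of this consensus procedure over a fixed set of candidate answers is sketched below. The update rule is a simplified stand-in for the paper’s regularized no-regret (piKL) dynamics: each player shifts probability toward candidates the other currently favors, while a penalty anchors it to its initial beliefs.

```python
import numpy as np

def normalized(logp):
    """Exponentiate and normalize a vector of log-probabilities."""
    p = np.exp(logp - logp.max())
    return p / p.sum()

def equilibrium_ranking(gen_prior, disc_prior, iters=200, eta=0.5, lam=0.1):
    """Toy consensus-game decoding over N candidate answers.

    gen_prior:  generator's initial distribution over the candidates
    disc_prior: discriminator's initial distribution over the candidates
    """
    g0 = np.asarray(gen_prior, dtype=float)
    d0 = np.asarray(disc_prior, dtype=float)
    g, d = g0.copy(), d0.copy()
    for _ in range(iters):
        # payoff = the other player's current probabilities;
        # lam pulls each player back toward its initial beliefs
        g = normalized((1 - lam) * (np.log(g) + eta * d) + lam * np.log(g0))
        d = normalized((1 - lam) * (np.log(d) + eta * g) + lam * np.log(d0))
    return g * d  # rank candidates by the players' agreement

# Example: three candidates on which the two querying modes disagree
scores = equilibrium_ranking([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
print(scores.argmax())  # index of the consensus answer
```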

In practice, implementing the Consensus Game approach to language model querying, especially for question-answering tasks, does involve significant computational challenges. For example, when using datasets like MMLU, which have thousands of questions and multiple-choice answers, the model must apply the mechanism to each query. Then, it must reach a consensus between the generative and discriminative components for every question and its possible answers. 

The system did struggle with a grade-school rite of passage: math word problems. It couldn’t generate wrong answers, which is a critical component of understanding the process of coming up with the right one.

“The last few years have seen really impressive progress in both strategic decision-making and language generation from AI systems, but we’re just starting to figure out how to put the two together. Equilibrium ranking is a first step in this direction, but I think there’s a lot we’ll be able to do to scale this up to more complex problems.”   

An avenue of future work involves enhancing the base model by integrating the outputs of the current method. This is particularly promising since it can yield more factual and consistent answers across various tasks, including factuality and open-ended generation. The potential for such a method to significantly improve the base model’s performance is high, which could result in more reliable and factual outputs from ChatGPT and similar language models that people use daily. 

“Even though modern language models, such as ChatGPT and Gemini, have led to solving various tasks through chat interfaces, the statistical decoding process that generates a response from such models has remained unchanged for decades,” says Google research scientist Ahmad Beirami. “The proposal by the MIT researchers is an innovative game-theoretic framework for decoding from language models through solving the equilibrium of a consensus game. The significant performance gains reported in the research paper are promising, opening the door to a potential paradigm shift in language model decoding that may fuel a flurry of new applications.”

Jacob wrote the paper with MIT-IBM Watson Lab researcher Yikang Shen and MIT Department of Electrical Engineering and Computer Science assistant professors Gabriele Farina and Jacob Andreas, who is also a CSAIL member. They will present their work at the International Conference on Learning Representations (ICLR) this May. The research received a “best paper award” at the NeurIPS R0-FoMo Workshop in December and will also be highlighted as a “spotlight paper” at ICLR.

A first-ever complete map for elastic strain engineering

Without a map, it can be just about impossible to know not just where you are, but where you’re going, and that’s especially true when it comes to materials properties.

For decades, scientists have understood that while bulk materials behave in certain ways, those rules can break down for materials at the micro- and nano-scales, often in surprising ways. One of those surprises was the finding that, for some materials, applying even modest elastic strains — a concept known as elastic strain engineering — can dramatically improve certain properties, provided those strains stay elastic and do not relax away through plasticity, fracture, or phase transformations. Micro- and nano-scale materials are especially good at holding applied strains in the elastic form.

Precisely how to apply those elastic strains (or equivalently, residual stress) to achieve certain material properties, however, had been less clear — until recently.

Using a combination of first principles calculations and machine learning, a team of MIT researchers has developed the first-ever map of how to tune crystalline materials to produce specific thermal and electronic properties.

Led by Ju Li, the Battelle Energy Alliance Professor in Nuclear Engineering and professor of materials science and engineering, the team described a framework for understanding precisely how changing the elastic strains on a material can fine-tune properties like thermal and electrical conductivity. The work is described in an open-access paper published in PNAS.

“For the first time, by using machine learning, we’ve been able to delineate the complete six-dimensional boundary of ideal strength, which is the upper limit to elastic strain engineering, and create a map for these electronic and phononic properties,” Li says. “We can now use this approach to explore many other materials. Traditionally, people create new materials by changing the chemistry.”

“For example, with a ternary alloy, you can change the percentage of two elements, so you have two degrees of freedom,” he continues. “What we’ve shown is that diamond, with just one element, is equivalent to a six-component alloy, because you have six degrees of elastic strain freedom you can tune independently.”

Small strains, big material benefits

The paper builds on a foundation laid as far back as the 1980s, when researchers first discovered that the performance of semiconductor materials doubled when a small — just 1 percent — elastic strain was applied to the material.

While that discovery was quickly commercialized by the semiconductor industry and today is used to increase the performance of microchips in everything from laptops to cellphones, that level of strain is very small compared to what we can achieve now, says Subra Suresh, the Vannevar Bush Professor of Engineering Emeritus.

In a 2018 Science paper, Suresh, MIT principal research scientist Ming Dao, and colleagues demonstrated that 1 percent strain was just the tip of the iceberg: diamond nanoneedles could withstand elastic strains of as much as 9 percent and still return to their original state. Several groups have since independently confirmed that microscale diamond can elastically deform by approximately 7 percent in tension.

“Once we showed we could bend nanoscale diamonds and create strains on the order of 9 or 10 percent, the question was, what do you do with it,” Suresh says. “It turns out diamond is a very good semiconductor material … and one of our questions was, if we can mechanically strain diamond, can we reduce the band gap from 5.6 electron-volts to two or three? Or can we get it all the way down to zero, where it begins to conduct like a metal?”

To answer those questions, the team first turned to machine learning in an effort to get a more precise picture of exactly how strain altered material properties.

“Strain is a big space,” Li explains. “You can have tensile strain, you can have shear strain in multiple directions, so it’s a six-dimensional space, and the phonon band is three-dimensional, so in total there are nine tunable parameters. So, we’re using machine learning, for the first time, to create a complete map for navigating the electronic and phononic properties and identify the boundaries.”
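In spirit, the machine-learning map replaces expensive first-principles calculations with a cheap surrogate over that six-dimensional strain space. The sketch below illustrates the idea with a random-forest surrogate and toy band-gap labels standing in for the team’s first-principles data; the numbers and model are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Six independent strain components in Voigt notation:
# [eps_xx, eps_yy, eps_zz, gamma_yz, gamma_xz, gamma_xy]
rng = np.random.default_rng(0)
strains = rng.uniform(-0.1, 0.1, size=(5000, 6))  # up to +/-10% strain

# Toy stand-in for first-principles band-gap labels (eV); in the real
# workflow these come from DFT calculations at each sampled strain point.
band_gap = np.maximum(0.0, 5.6 - 60.0 * np.abs(strains).mean(axis=1))

surrogate = RandomForestRegressor(n_estimators=200).fit(strains, band_gap)

# The cheap surrogate can now scan the 6D strain space for target
# properties, e.g. strain states predicted to nearly close the gap.
queries = rng.uniform(-0.1, 0.1, size=(100_000, 6))
metallic_like = queries[surrogate.predict(queries) < 0.1]
```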

Armed with that map, the team subsequently demonstrated how strain could be used to dramatically alter diamond’s semiconductor properties.

“Diamond is like the Mt. Everest of electronic materials,” Li says, “because it has very high thermal conductivity, very high dielectric breakdown strengths, a very big carrier mobility. What we have shown is we can controllably squish Mt. Everest down … so we show that by strain engineering you can either improve diamond’s thermal conductivity by a factor of two, or make it much worse by a factor of 20.”

New map, new applications

Going forward, the findings could be used to explore a host of exotic material properties, Li says, from dramatically reduced thermal conductivity to superconductivity.

“Experimentally, these properties are already accessible with nanoneedles and even microbridges,” he says. “And we have seen exotic properties, like reducing diamond’s (thermal conductivity) to only a few hundred watts per meter-Kelvin. Recently, people have shown that you can produce room-temperature superconductors with hydrides if you squeeze them to a few hundred gigapascals, so we have found all kinds of exotic behavior once we have the map.”

The results could also influence the design of next-generation computer chips capable of running much faster and cooler than today’s processors, as well as quantum sensors and communication devices. As the semiconductor manufacturing industry moves to denser and denser architectures, Suresh says the ability to tune a material’s thermal conductivity will be particularly important for heat dissipation.

While the paper could inform the design of future generations of microchips, Zhe Shi, a postdoc in Li’s lab and first author of the paper, says more work will be needed before those chips find their way into the average laptop or cellphone.

“We know that 1 percent strain can give you an order of magnitude increase in the clock speed of your CPU,” Shi says. “There are a lot of manufacturing and device problems that need to be solved in order for this to become realistic, but I think it’s definitely a great start. It’s an exciting beginning to what could lead to significant strides in technology.”

This work was supported with funding from the Defense Threat Reduction Agency, an NSF Graduate Research Fellowship, the Nanyang Technological University School of Biological Sciences, the National Science Foundation (NSF), the MIT Vannevar Bush Professorship, and a Nanyang Technological University Distinguished University Professorship.

Second round of seed grants awarded to MIT scholars studying the impact and applications of generative AI

Last summer, MIT President Sally Kornbluth and Provost Cynthia Barnhart issued a call for papers to “articulate effective roadmaps, policy recommendations, and calls for action across the broad domain of generative AI.” The response far exceeded expectations, with 75 proposals submitted. Of those, 27 were selected for seed funding.

In light of this enthusiastic response, Kornbluth and Barnhart announced a second call for proposals this fall.

“The groundswell of interest and the caliber of the ideas overall made clear that a second round was in order,” they said in their email to MIT’s research community this fall. This second call for proposals resulted in 53 submissions.

Following the second call, the faculty committee from the first round considered the proposals and selected 16 proposals to receive exploratory funding. Co-authored by interdisciplinary teams of faculty and researchers affiliated with all five of the Institute’s schools and the MIT Schwarzman College of Computing, the proposals offer insights and perspectives on the potential impact and applications of generative AI across a broad range of topics and disciplines.

Each selected research group will receive between $50,000 and $70,000 to create 10-page impact papers. Those papers will be shared widely via a publication venue managed and hosted by the MIT Press under the auspices of the MIT Open Publishing Services program.

As with the first round of papers, Thomas Tull, a member of the MIT School of Engineering Dean’s Advisory Council and a former innovation scholar at the School of Engineering, contributed funding to support the effort.

The selected papers are:

“A Road-map for End-to-end Privacy and Verifiability in Generative AI,” led by Alex Pentland, Srini Devadas, Lalana Kagal, and Vinod Vaikuntanathan;
“A Virtuous Cycle: Generative AI and Discovery in the Physical Sciences,” led by Philip Harris and Phiala Shanahan;
“Artificial Cambrian Intelligence: Generating New Forms of Visual Intelligence,” led by Ramesh Raskar and Tomaso A. Poggio;
“Artificial Fictions and the Value of AI-Generated Art,” led by Justin Khoo;
“GenAI for Improving Human-to-human Interactions with a Focus on Negotiations,” led by Lawrence Susskind;
“Generative AI as a New Applications Platform and Ecosystem,” led by Michael Cusumano;
“Generative AI for Cities: A Civic Engagement Playbook,” led by Sarah Williams, Sara Beery, and Eden Medina;
“Generative AI for Textile Engineering: Advanced Materials from Heritage Lace Craft,” led by Svetlana V. Boriskina;
“Generative AI Impact for Biomedical Innovation and Drug Discovery,” led by Manolis Kellis, Brad Pentelute, and Marinka Zitnik;
“Impact of Generative AI on the Creative Economy,” led by Ashia Wilson and Dylan Hadfield-Menell;
“Redefining Virtuosity: The Role of Generative AI in Live Music Performances,” led by Joseph A. Paradiso and Eran Egozy;
“Reflection-based Learning with Generative AI,” led by Stefanie Mueller;
“Robust and Reliable Systems for Generative AI,” led by Shafi Goldwasser, Yael Kalai, and Vinod Vaikuntanathan;
“Supporting the Aging Population with Generative AI,” led by Pattie Maes;
“The Science of Language in the Era of Generative AI,” led by Danny Fox, Yoon Kim, and Roger Levy; and
“Visual Artists, Technological Shock, and Generative AI,” led by Caroline Jones and Huma Gupta.

MIT-derived algorithm helps forecast the frequency of extreme weather

To assess a community’s risk of extreme weather, policymakers rely first on global climate models that can be run decades, and even centuries, forward in time, but only at a coarse resolution. These models might be used to gauge, for instance, future climate conditions for the northeastern U.S., but not specifically for Boston.

To estimate Boston’s future risk of extreme weather such as flooding, policymakers can combine a coarse model’s large-scale predictions with a finer-resolution model, tuned to estimate how often Boston is likely to experience damaging floods as the climate warms. But this risk analysis is only as accurate as the predictions from that first, coarser climate model.

“If you get those wrong for large-scale environments, then you miss everything in terms of what extreme events will look like at smaller scales, such as over individual cities,” says Themistoklis Sapsis, the William I. Koch Professor and director of the Center for Ocean Engineering in MIT’s Department of Mechanical Engineering.

Sapsis and his colleagues have now developed a method to “correct” the predictions from coarse climate models. By combining machine learning with dynamical systems theory, the team’s approach “nudges” a climate model’s simulations into more realistic patterns over large scales. When paired with smaller-scale models to predict specific weather events such as tropical cyclones or floods, the team’s approach produced more accurate predictions for how often specific locations will experience those events over the next few decades, compared to predictions made without the correction scheme.

Sapsis says the new correction scheme is general in form and can be applied to any global climate model. Once corrected, the models can help to determine where and how often extreme weather will strike as global temperatures rise over the coming years. 

“Climate change will have an effect on every aspect of human life, and every type of life on the planet, from biodiversity to food security to the economy,” Sapsis says. “If we have capabilities to know accurately how extreme weather will change, especially over specific locations, it can make a lot of difference in terms of preparation and doing the right engineering to come up with solutions. This is the method that can open the way to do that.”

The team’s results appear today in the Journal of Advances in Modeling Earth Systems. The study’s MIT co-authors include postdoc Benedikt Barthel Sorensen and Alexis-Tzianni Charalampopoulos SM ’19, PhD ’23, with Shixuan Zhang, Bryce Harrop, and Ruby Leung of the Pacific Northwest National Laboratory in Washington state.

Over the hood

Today’s large-scale climate models simulate weather features such as the average temperature, humidity, and precipitation around the world, on a grid-by-grid basis. Running simulations of these models takes enormous computing power, and in order to simulate how weather features will interact and evolve over periods of decades or longer, models average out features every 100 kilometers or so.

“It’s a very heavy computation requiring supercomputers,” Sapsis notes. “But these models still do not resolve very important processes like clouds or storms, which occur over smaller scales of a kilometer or less.”

To improve the resolution of these coarse climate models, scientists typically have gone under the hood to try and fix a model’s underlying dynamical equations, which describe how phenomena in the atmosphere and oceans should physically interact.

“People have tried to dissect into climate model codes that have been developed over the last 20 to 30 years, which is a nightmare, because you can lose a lot of stability in your simulation,” Sapsis explains. “What we’re doing is a completely different approach, in that we’re not trying to correct the equations but instead correct the model’s output.”

The team’s new approach takes a model’s output, or simulation, and overlays an algorithm that nudges the simulation toward something that more closely represents real-world conditions. The algorithm is based on a machine-learning scheme that takes in data, such as past information for temperature and humidity around the world, and learns associations within the data that represent fundamental dynamics among weather features. The algorithm then uses these learned associations to correct a model’s predictions.
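The shape of such an output-correction scheme can be sketched as a small network trained on paired coarse-model and reference fields (for example, reanalysis data). The convolutional architecture below is an illustrative stand-in; the team’s actual scheme is grounded in dynamical systems theory rather than this generic design.

```python
import torch
import torch.nn as nn

class CorrectionOperator(nn.Module):
    """Maps coarse-model fields to a nudged, more realistic state.
    Channel count (e.g., temperature, humidity, wind) is illustrative."""
    def __init__(self, channels=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, coarse_state):
        # Predict an increment that nudges the simulation toward reality.
        return coarse_state + self.net(coarse_state)

model = CorrectionOperator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(coarse_batch, reference_batch):
    """One update on paired (coarse simulation, reference truth) fields."""
    opt.zero_grad()
    loss = loss_fn(model(coarse_batch), reference_batch)
    loss.backward()
    opt.step()
    return loss.item()
```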

“What we’re doing is trying to correct dynamics, as in how an extreme weather feature, such as the windspeeds during a Hurricane Sandy event, will look like in the coarse model, versus in reality,” Sapsis says. “The method learns dynamics, and dynamics are universal. Having the correct dynamics eventually leads to correct statistics, for example, frequency of rare extreme events.”

Climate correction

As a first test of their new approach, the team used the machine-learning scheme to correct simulations produced by the Energy Exascale Earth System Model (E3SM), a climate model run by the U.S. Department of Energy that simulates climate patterns around the world at a resolution of 110 kilometers. The researchers used eight years of past data for temperature, humidity, and wind speed to train their new algorithm, which learned dynamical associations between the measured weather features and the E3SM model. They then ran the climate model forward in time for about 36 years and applied the trained algorithm to its simulations. They found that the corrected version produced climate patterns that more closely matched real-world observations from the last 36 years, which were not used for training.

“We’re not talking about huge differences in absolute terms,” Sapsis says. “An extreme event in the uncorrected simulation might be 105 degrees Fahrenheit, versus 115 degrees with our corrections. But for humans experiencing this, that is a big difference.”

When the team then paired the corrected coarse model with a specific, finer-resolution model of tropical cyclones, they found the approach accurately reproduced the frequency of extreme storms in specific locations around the world.

“We now have a coarse model that can get you the right frequency of events, for the present climate. It’s much more improved,” Sapsis says. “Once we correct the dynamics, this is a relevant correction, even when you have a different average global temperature, and it can be used for understanding how forest fires, flooding events, and heat waves will look in a future climate. Our ongoing work is focusing on analyzing future climate scenarios.”

“The results are particularly impressive as the method shows promising results on E3SM, a state-of-the-art climate model,” says Pedram Hassanzadeh, an associate professor who leads the Climate Extremes Theory and Data group at the University of Chicago and was not involved with the study. “It would be interesting to see what climate change projections this framework yields once future greenhouse-gas emission scenarios are incorporated.”

This work was supported, in part, by the U.S. Defense Advanced Research Projects Agency.

Engineering household robots to have a little common sense

From wiping up spills to serving up food, robots are being taught to carry out increasingly complicated household tasks. Many such home-bot trainees are learning through imitation; they are programmed to copy the motions that a human physically guides them through.

It turns out that robots are excellent mimics. But unless engineers also program them to adjust to every possible bump and nudge, robots don’t necessarily know how to handle such disruptions, short of starting their task from the top.

Now MIT engineers are aiming to give robots a bit of common sense when faced with situations that push them off their trained path. They’ve developed a method that connects robot motion data with the “common sense knowledge” of large language models, or LLMs.

Their approach enables a robot to logically parse a given household task into subtasks, and to physically adjust to disruptions within a subtask, so that the robot can move on without having to go back and start the task from scratch, and without engineers having to explicitly program fixes for every possible failure along the way.

“Imitation learning is a mainstream approach enabling household robots. But if a robot is blindly mimicking a human’s motion trajectories, tiny errors can accumulate and eventually derail the rest of the execution,” says Yanwei Wang, a graduate student in MIT’s Department of Electrical Engineering and Computer Science (EECS). “With our method, a robot can self-correct execution errors and improve overall task success.”

Wang and his colleagues detail their new approach in a study they will present at the International Conference on Learning Representations (ICLR) in May. The study’s co-authors include EECS graduate students Tsun-Hsuan Wang and Jiayuan Mao; Michael Hagenow, a postdoc in MIT’s Department of Aeronautics and Astronautics (AeroAstro); and Julie Shah, the H.N. Slater Professor in Aeronautics and Astronautics at MIT.

Language task

The researchers illustrate their new approach with a simple chore: scooping marbles from one bowl and pouring them into another. To accomplish this task, engineers would typically move a robot through the motions of scooping and pouring — all in one fluid trajectory. They might do this multiple times, to give the robot a number of human demonstrations to mimic.

“But the human demonstration is one long, continuous trajectory,” Wang says.

The team realized that, while a human might demonstrate a single task in one go, that task depends on a sequence of subtasks, or trajectories. For instance, the robot has to reach into a bowl before it can scoop, and it must scoop up marbles before moving to the empty bowl, and so forth. If a robot is pushed or nudged into a mistake during any of these subtasks, its only recourse is to stop and start from the beginning, unless engineers explicitly label each subtask and program or collect new demonstrations showing the robot how to recover from each possible failure, so that it can self-correct in the moment.

“That level of planning is very tedious,” Wang says.

Instead, he and his colleagues found that some of this work could be done automatically by LLMs. These deep learning models process immense libraries of text, which they use to establish connections between words, sentences, and paragraphs. Through these connections, an LLM can then generate new sentences based on what it has learned about which word is likely to follow the last.

For their part, the researchers found that in addition to sentences and paragraphs, an LLM can be prompted to produce a logical list of subtasks that would be involved in a given task. For instance, if queried to list the actions involved in scooping marbles from one bowl into another, an LLM might produce a sequence of verbs such as “reach,” “scoop,” “transport,” and “pour.”
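A prompt along these lines is enough to elicit such a decomposition. In the sketch below, `query_llm` is a hypothetical stand-in for whatever language-model API is at hand, stubbed out so the example runs on its own:

```python
def query_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call; returns a canned answer
    # of the kind an LLM typically produces for this prompt.
    return "reach, scoop, transport, pour"

prompt = (
    "List, in order, one-word verb labels for the subtasks involved in "
    "scooping marbles from one bowl and pouring them into another. "
    "Answer only with a comma-separated list."
)
subtasks = [s.strip() for s in query_llm(prompt).split(",")]
print(subtasks)  # ['reach', 'scoop', 'transport', 'pour']
```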

“LLMs have a way to tell you how to do each step of a task, in natural language. A human’s continuous demonstration is the embodiment of those steps, in physical space,” Wang says. “And we wanted to connect the two, so that a robot would automatically know what stage it is in a task, and be able to replan and recover on its own.”

Mapping marbles

For their new approach, the team developed an algorithm to automatically connect an LLM’s natural language label for a particular subtask with a robot’s position in physical space or an image that encodes the robot state. Mapping a robot’s physical coordinates, or an image of the robot state, to a natural language label is known as “grounding.” The team’s new algorithm is designed to learn a grounding “classifier,” meaning that it learns to automatically identify what semantic subtask a robot is in — for example, “reach” versus “scoop” — given its physical coordinates or an image view.
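As a toy illustration of grounding, the sketch below trains a classifier to map a hypothetical end-effector position to one of the LLM’s subtask labels. The team’s classifier can also take an image of the robot’s state as input, which this simplified version omits, and the cluster centers here are invented for the example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
subtasks = ["reach", "scoop", "transport", "pour"]

# Hypothetical demonstration data: end-effector positions recorded along
# trajectories, with each segment labeled by the subtask it belongs to.
centers = {"reach": [0.2, 0.0, 0.3], "scoop": [0.2, 0.0, 0.1],
           "transport": [0.5, 0.2, 0.4], "pour": [0.8, 0.4, 0.3]}
X = np.vstack([rng.normal(centers[s], 0.03, size=(200, 3)) for s in subtasks])
y = np.repeat(subtasks, 200)

# The grounding classifier: physical state in, semantic subtask label out.
grounding = LogisticRegression(max_iter=1000).fit(X, y)

# At run time, the classifier reports which subtask the robot is in.
print(grounding.predict([[0.5, 0.2, 0.4]]))  # e.g., ['transport']
```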

“The grounding classifier facilitates this dialogue between what the robot is doing in the physical space and what the LLM knows about the subtasks, and the constraints you have to pay attention to within each subtask,” Wang explains.

The team demonstrated the approach in experiments with a robotic arm, which they trained on a marble-scooping task by physically guiding it through the motions of reaching into a bowl, scooping up marbles, transporting them over an empty bowl, and pouring them in. After a few demonstrations, the team used a pretrained LLM and asked the model to list the steps involved in scooping marbles from one bowl to another. The researchers then used their new algorithm to connect the LLM’s defined subtasks with the robot’s motion trajectory data. The algorithm automatically learned to map the robot’s physical coordinates in the trajectories, and the corresponding image view, to a given subtask.

The team then let the robot carry out the scooping task on its own, using the newly learned grounding classifiers. As the robot moved through the steps of the task, the experimenters pushed and nudged the bot off its path, and knocked marbles off its spoon at various points. Rather than stop and start from the beginning again, or continue blindly with no marbles on its spoon, the bot was able to self-correct, and completed each subtask before moving on to the next. (For instance, it would make sure that it successfully scooped marbles before transporting them to the empty bowl.)
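The runtime behavior boils down to a simple loop: after each step, the grounding classifier reports which subtask the robot is actually in, and execution resumes from there rather than from the beginning. Below is a toy version, with the classifier and controller stubbed out and one disturbance injected by hand, much as the experimenters nudged the real robot:

```python
# A toy self-correcting execution loop; `classify` stands in for the
# learned grounding classifier, `run_subtask` for the robot's controller.
SUBTASKS = ["reach", "scoop", "transport", "pour"]

def classify(state: str) -> str:
    return state  # stand-in: the grounding classifier labels the state

def run_subtask(subtask: str, step: int) -> str:
    # Normally the controller finishes the subtask; we inject one
    # disturbance, like marbles being knocked off the spoon mid-transport.
    if step == 2 and subtask == "transport":
        return "scoop"
    return subtask

step, i = 0, 0
while i < len(SUBTASKS):
    outcome = classify(run_subtask(SUBTASKS[i], step))
    step += 1
    if outcome == SUBTASKS[i]:
        print(f"step {step}: completed '{outcome}'")
        i += 1
    else:
        # Disruption detected: resume from where the robot actually is,
        # instead of restarting the whole task.
        i = SUBTASKS.index(outcome)
        print(f"step {step}: disrupted, resuming from '{outcome}'")
```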

“With our method, when the robot is making mistakes, we don’t need to ask humans to program or give extra demonstrations of how to recover from failures,” Wang says. “That’s super exciting because there’s a huge effort now toward training household robots with data collected on teleoperation systems. Our algorithm can now convert that training data into robust robot behavior that can do complex tasks, despite external perturbations.”