Defining the public interest in new technologies

How are waves of disruptive technologies, such as more advanced versions of artificial intelligence systems, changing the way we work, live, and play? Are there pathways that academics, practitioners, innovators, and entrepreneurs ought to be pursuing to ensure that the largest share of the benefits associated with new technologies uplift the most marginalized populations? What professional training is needed to ensure that this happens? What responsibility do creators of new or repurposed technologies have when they, and their organizations, create products or systems that might have adverse societal consequences? We are in an era in need of clear professional guidelines and norms, to say nothing of laws and regulations regarding the social impacts of new technologies.

Public interest technology, as an emerging field, aims to help shift the scholarly focus from the technologies to the technologists. To support this nascent field, students, faculty, and staff at MIT are launching a conversation to encourage technologists from different fields to confront the ethical and moral dilemmas that require them to redefine best practices in the face of ever-changing societal needs and norms. 

The Public Interest Technologist (TPIT) is a new online publication that seeks to bring together the MIT community to define and discuss the social responsibilities of individuals who design, implement, and evaluate technologies, especially in new fields. The editorial team for this publication has identified public interest technology as a new multidisciplinary field that emphasizes the benefits that could flow from both old and new technologies as they are developed in the most responsible fashion.

“As the pace of technology innovation quickens, the impacts, often unexpected, of new technologies generate gains and losses. Past experience with technological innovation has demonstrated that those diverse gains and losses are distributed unequally,” says Lawrence Susskind, the Ford Professor of Urban and Environmental Planning at the MIT Department of Urban Studies and Planning and a member of TPIT’s editorial team. “I think that those of us who care, and those of us with leadership roles in this field, have a responsibility to take concerted action to minimize the most harmful effects while ensuring that benefits reach those most in need. We see this publication as a means to move in this direction.” 

Framing the public interest in tech design and development

As one example of technology’s recent impact on society, the Covid-19 pandemic dramatically changed how we work and commute. Among other shifts, public transit agencies have been forced to contend with a new normal.

In an interview with Emilie Flamme, an MIT graduate student in city planning and a TPIT editor, Jim Aloisi and Jinhua Zhao of MIT’s Transit Lab propose a way to modernize and optimize transit for a workforce currently experiencing labor shortages. To implement this process, they underscore that defining the public interest involves co-defining, with the public, the questions that public agencies and their staff must answer. Aloisi and Zhao note that their Transit Lab puts the question of what is in the public interest at the center of its work. Public interest technology is at the heart of what they do, and Zhao wonders whether students receiving technological training get enough exposure to, and education about, the public interest.

Fostering conversations, both at MIT and beyond

At MIT, TPIT’s editorial team seeks to provoke a campus-wide conversation: How do public interest technologists define their social responsibilities? Is it reasonable to assume that those who invent or implement new technologies will take some responsibility for the impacts or effects these technologies have? Who should decide what these responsibilities should be? Do norms need to be enforced?

Members of 63 universities, including MIT, have formed a coalition, with the support of the New America Foundation, to share ideas about public interest technology (PIT). Should it be the focus of new degree programs? What research questions regarding PIT deserve the highest priority? The PIT-UN coalition provides grants and organizes an annual convening, including the 2023 PIT-UN Convening at Boston University in October. The Public Interest Technologist is an extension of MIT’s involvement with the PIT-UN network.

The editorial team at TPIT hopes to involve all MIT community members in shaping its current and future content. The team invites nominations for prospective interviewees from across the MIT community, article suggestions, and already published materials that will support a broader discussion of public interest technology at MIT. Community members are also invited to attend the PIT-UN annual meeting at Boston University this fall.

A step toward safe and reliable autopilots for flying

In the film “Top Gun: Maverick,” Maverick, played by Tom Cruise, is charged with training young pilots to complete a seemingly impossible mission — to fly their jets deep into a rocky canyon, staying so low to the ground they cannot be detected by radar, then rapidly climb out of the canyon at an extreme angle, avoiding the rock walls. Spoiler alert: With Maverick’s help, these human pilots accomplish their mission.

A machine, on the other hand, would struggle to complete the same pulse-pounding task. To an autonomous aircraft, for instance, the most straightforward path toward the target conflicts with what the machine needs to do to avoid colliding with the canyon walls or to stay undetected. Many existing AI methods aren’t able to overcome this conflict, known as the stabilize-avoid problem, and would be unable to reach their goal safely.

MIT researchers have developed a new technique that can solve complex stabilize-avoid problems better than other methods. Their machine-learning approach matches or exceeds the safety of existing methods while providing a tenfold increase in stability, meaning the agent reaches and remains stable within its goal region.

In an experiment that would make Maverick proud, their technique effectively piloted a simulated jet aircraft through a narrow corridor without crashing into the ground. 

“This has been a longstanding, challenging problem. A lot of people have looked at it but didn’t know how to handle such high-dimensional and complex dynamics,” says Chuchu Fan, the Wilson Assistant Professor of Aeronautics and Astronautics, a member of the Laboratory for Information and Decision Systems (LIDS), and senior author of a new paper on this technique.

Fan is joined by lead author Oswin So, a graduate student. The paper will be presented at the Robotics: Science and Systems conference.

The stabilize-avoid challenge

Many approaches tackle complex stabilize-avoid problems by simplifying the system so they can solve it with straightforward math, but the simplified results often don’t hold up to real-world dynamics.

More effective techniques use reinforcement learning, a machine-learning method where an agent learns by trial-and-error with a reward for behavior that gets it closer to a goal. But there are really two goals here — remain stable and avoid obstacles — and finding the right balance is tedious.

The MIT researchers broke the problem down into two steps. First, they reframe the stabilize-avoid problem as a constrained optimization problem. In this setup, solving the optimization enables the agent to reach and stabilize to its goal, meaning it stays within a certain region. By applying constraints, they ensure the agent avoids obstacles, So explains. 

Then for the second step, they reformulate that constrained optimization problem into a mathematical representation known as the epigraph form and solve it using a deep reinforcement learning algorithm. The epigraph form lets them bypass the difficulties other methods face when using reinforcement learning. 

“But deep reinforcement learning isn’t designed to solve the epigraph form of an optimization problem, so we couldn’t just plug it into our problem. We had to derive the mathematical expressions that work for our system. Once we had those new derivations, we combined them with some existing engineering tricks used by other methods,” So says.
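For readers who want the general shape of that reformulation, the standard epigraph construction looks like the following (generic notation, not the paper’s exact objective, constraints, or reward design):

```latex
% Generic epigraph reformulation (illustrative notation, not the paper's):
%   original problem:  minimize f(x)  subject to  g(x) <= 0
%   epigraph form:     minimize z     subject to  f(x) <= z  and  g(x) <= 0
\min_{x}\; f(x) \quad \text{s.t.}\quad g(x) \le 0
\qquad\Longleftrightarrow\qquad
\min_{x,\, z}\; z \quad \text{s.t.}\quad f(x) \le z,\; g(x) \le 0
```

In the stabilize-avoid setting, f would play the role of the stabilization objective and g the obstacle-avoidance constraints; the auxiliary variable z becomes one more quantity the learning agent must reason about, which is why the researchers had to derive new expressions before a deep reinforcement learning algorithm could be applied.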

No points for second place

To test their approach, they designed a number of control experiments with different initial conditions. For instance, in some simulations, the autonomous agent needs to reach and stay inside a goal region while making drastic maneuvers to avoid obstacles that are on a collision course with it.

When compared with several baselines, their approach was the only one that could stabilize all trajectories while maintaining safety. To push their method even further, they used it to fly a simulated jet aircraft in a scenario one might see in a “Top Gun” movie. The jet had to stabilize to a target near the ground while maintaining a very low altitude and staying within a narrow flight corridor.

This simulated jet model, open-sourced in 2018, had been designed by flight control experts as a testing challenge: Could researchers create a scenario that their controller could not fly? But the model was so complicated that it was difficult to work with, and it still couldn’t handle complex scenarios, Fan says.

The MIT researchers’ controller was able to prevent the jet from crashing or stalling while stabilizing to the goal far better than any of the baselines.

In the future, this technique could be a starting point for designing controllers for highly dynamic robots that must meet safety and stability requirements, like autonomous delivery drones. Or it could be implemented as part of a larger system. Perhaps the algorithm would only be activated when a car skids on a snowy road, helping the driver safely return to a stable trajectory.

Navigating extreme scenarios that a human wouldn’t be able to handle is where their approach really shines, So adds.

“We believe that a goal we should strive for as a field is to give reinforcement learning the safety and stability guarantees that we will need to provide us with assurance when we deploy these controllers on mission-critical systems. We think this is a promising first step toward achieving that goal,” he says.

Moving forward, the researchers want to enhance their technique so it is better able to take uncertainty into account when solving the optimization. They also want to investigate how well the algorithm works when deployed on hardware, since there will be mismatches between the dynamics of the model and those in the real world.

“Professor Fan’s team has improved reinforcement learning performance for dynamical systems where safety matters. Instead of just hitting a goal, they create controllers that ensure the system can reach its target safely and stay there indefinitely,” says Stanley Bak, an assistant professor in the Department of Computer Science at Stony Brook University, who was not involved with this research. “Their improved formulation allows the successful generation of safe controllers for complex scenarios, including a 17-state nonlinear jet aircraft model designed in part by researchers from the Air Force Research Lab (AFRL), which incorporates nonlinear differential equations with lift and drag tables.”

The work is funded, in part, by MIT Lincoln Laboratory under the Safety in Aerobatic Flight Regimes program.

Bringing the social and ethical responsibilities of computing to the forefront

There has been a remarkable surge in the use of algorithms and artificial intelligence to address a wide range of problems and challenges. While their adoption, particularly with the rise of AI, is reshaping nearly every industry sector, discipline, and area of research, such innovations often expose unexpected consequences that involve new norms, new expectations, and new rules and laws.

To facilitate deeper understanding, the Social and Ethical Responsibilities of Computing (SERC), a cross-cutting initiative in the MIT Schwarzman College of Computing, recently brought together social scientists and humanists with computer scientists, engineers, and other computing faculty for an exploration of the ways in which the broad applicability of algorithms and AI has presented both opportunities and challenges in many aspects of society.

“The very nature of our reality is changing. AI has the ability to do things that until recently were solely the realm of human intelligence — things that can challenge our understanding of what it means to be human,” remarked Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing, in his opening address at the inaugural SERC Symposium. “This poses philosophical, conceptual, and practical questions on a scale not experienced since the start of the Enlightenment. In the face of such profound change, we need new conceptual maps for navigating the change.”

The symposium offered a glimpse into the vision and activities of SERC in both research and education. “We believe our responsibility with SERC is to educate and equip our students and enable our faculty to contribute to responsible technology development and deployment,” said Georgia Perakis, the William F. Pounds Professor of Management in the MIT Sloan School of Management, co-associate dean of SERC, and the lead organizer of the symposium. “We’re drawing from the many strengths and diversity of disciplines across MIT and beyond and bringing them together to gain multiple viewpoints.”

Through a succession of panels and sessions, the symposium delved into a variety of topics related to the societal and ethical dimensions of computing. In addition, 37 undergraduate and graduate students from a range of majors, including urban studies and planning, political science, mathematics, biology, electrical engineering and computer science, and brain and cognitive sciences, participated in a poster session to exhibit their research in this space, covering such topics as quantum ethics, AI collusion in storage markets, computing waste, and empowering users on social platforms for better content credibility.

Showcasing a diversity of work

In three sessions devoted to themes of beneficent and fair computing, equitable and personalized health, and algorithms and humans, the SERC Symposium showcased work by 12 faculty members across these domains.

One such project from a multidisciplinary team of archaeologists, architects, digital artists, and computational social scientists aimed to preserve endangered heritage sites in Afghanistan with digital twins. The project team produced highly detailed interrogable 3D models of the heritage sites, in addition to extended reality and virtual reality experiences, as learning resources for audiences that cannot access these sites.

In a project for the United Network for Organ Sharing, researchers showed how they used applied analytics to optimize various facets of an organ allocation system in the United States that is currently undergoing a major overhaul in order to make it more efficient, equitable, and inclusive for different racial, age, and gender groups, among others.

Another talk discussed an area that has not yet received adequate public attention: the broader implications for equity that biased sensor data holds for the next generation of models in computing and health care.

A talk on bias in algorithms considered both human bias and algorithmic bias, and the potential for improving results by taking into account differences in the nature of the two kinds of bias.

Other highlighted research included the interaction between online platforms and human psychology; a study on whether decision-makers make systemic prediction mistakes on the available information; and an illustration of how advanced analytics and computation can be leveraged to inform supply chain management, operations, and regulatory work in the food and pharmaceutical industries.

Improving the algorithms of tomorrow

“Algorithms are, without question, impacting every aspect of our lives,” said Asu Ozdaglar, deputy dean of academics for the MIT Schwarzman College of Computing and head of the Department of Electrical Engineering and Computer Science, in kicking off a panel she moderated on the implications of data and algorithms.

“Whether it’s in the context of social media, online commerce, automated tasks, and now a much wider range of creative interactions with the advent of generative AI tools and large language models, there’s little doubt that much more is to come,” Ozdaglar said. “While the promise is evident to all of us, there’s a lot to be concerned about as well. This is very much the time for imaginative thinking and careful deliberation to improve the algorithms of tomorrow.”

Turning to the panel, Ozdaglar asked experts from computing, social science, and data science for insights on how to understand what is to come and shape it to enrich outcomes for the majority of humanity.

Sarah Williams, associate professor of technology and urban planning at MIT, emphasized the critical importance of comprehending the process of how datasets are assembled, as data are the foundation for all models. She also stressed the need for research to address the potential implication of biases in algorithms that often find their way in through their creators and the data used in their development. “It’s up to us to think about our own ethical solutions to these problems,” she said. “Just as it’s important to progress with the technology, we need to start the field of looking at these questions of what biases are in the algorithms? What biases are in the data, or in that data’s journey?”

Shifting focus to generative models and whether the development and use of these technologies should be regulated, the panelists — who also included MIT’s Srini Devadas, professor of electrical engineering and computer science; John Horton, professor of information technology; and Simon Johnson, professor of entrepreneurship — all concurred that regulating open-source algorithms, which are publicly accessible, would be difficult, given that regulators are still catching up and struggling even to set guardrails for technology that is now 20 years old.

Returning to the question of how to effectively regulate the use of these technologies, Johnson proposed a progressive corporate tax system as a potential solution. He recommends basing companies’ tax payments on their profits, especially for large corporations whose massive earnings go largely untaxed due to offshore banking. Such a tax, Johnson said, could serve as a regulatory mechanism, imposing disincentives that discourage companies from trying to “own the entire world.”

The role of ethics in computing education

As computing continues to advance with no signs of slowing down, it is critical to educate students to be intentional about the social impact of the technologies they will be developing and deploying into the world. But can one actually be taught such things? If so, how?

Caspar Hare, professor of philosophy at MIT and co-associate dean of SERC, posed this looming question to faculty on a panel he moderated on the role of ethics in computing education. All experienced in teaching ethics and thinking about the social implications of computing, each panelist shared their perspective and approach.

A strong advocate for the importance of learning from history, Eden Medina, associate professor of science, technology, and society at MIT, said that “often the way we frame computing is that everything is new. One of the things that I do in my teaching is look at how people have confronted these issues in the past and try to draw from them as a way to think about possible ways forward.” Medina regularly uses case studies in her classes. She pointed to a paper by Yale University science historian Joanna Radin on the Pima Indian Diabetes Dataset, which raised ethical issues about the history of that particular data collection that many don’t consider, as an example of how decisions around technology and data can grow out of very specific contexts.

Milo Phillips-Brown, associate professor of philosophy at Oxford University, talked about the Ethical Computing Protocol that he co-created while he was a SERC postdoc at MIT. The protocol, a four-step approach to building technology responsibly, is designed to train computer science students to think in a better and more accurate way about the social implications of technology by breaking the process down into more manageable steps. “The basic approach that we take very much draws on the fields of value-sensitive design, responsible research and innovation, participatory design as guiding insights, and then is also fundamentally interdisciplinary,” he said.

Fields such as biomedicine and law have an ethics ecosystem that distributes the function of ethical reasoning in these areas. Oversight and regulation are provided to guide front-line stakeholders and decision-makers when issues arise, as are training programs and access to interdisciplinary expertise that they can draw from. “In this space, we have none of that,” said John Basl, associate professor of philosophy at Northeastern University. “For current generations of computer scientists and other decision-makers, we’re actually making them do the ethical reasoning on their own.” Basl commented further that teaching core ethical reasoning skills across the curriculum, not just in philosophy classes, is essential, and that the goal shouldn’t be for every computer scientist to be a professional ethicist, but for them to know enough of the landscape to be able to ask the right questions and seek out the relevant expertise and resources that exist.

After the final session, interdisciplinary groups of faculty, students, and researchers engaged in animated discussions related to the issues covered throughout the day during a reception that marked the conclusion of the symposium.

New model offers a way to speed up drug discovery

Huge libraries of drug compounds may hold potential treatments for a variety of diseases, such as cancer or heart disease. Ideally, scientists would like to experimentally test each of these compounds against all possible targets, but doing that kind of screen is prohibitively time-consuming.

In recent years, researchers have begun using computational methods to screen those libraries in hopes of speeding up drug discovery. However, many of those methods also take a long time, as most of them calculate each target protein’s three-dimensional structure from its amino-acid sequence, then use those structures to predict which drug molecules it will interact with.

Researchers at MIT and Tufts University have now devised an alternative computational approach based on a type of artificial intelligence algorithm known as a large language model. These models — one well-known example is ChatGPT — can analyze huge amounts of text and figure out which words (or, in this case, amino acids) are most likely to appear together. The new model, known as ConPLex, can match target proteins with potential drug molecules without having to perform the computationally intensive step of calculating the molecules’ structures.

Using this method, the researchers can screen more than 100 million compounds in a single day — much more than any existing model.

“This work addresses the need for efficient and accurate in silico screening of potential drug candidates, and the scalability of the model enables large-scale screens for assessing off-target effects, drug repurposing, and determining the impact of mutations on drug binding,” says Bonnie Berger, the Simons Professor of Mathematics, head of the Computation and Biology group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and one of the senior authors of the new study.

Lenore Cowen, a professor of computer science at Tufts University, is also a senior author of the paper, which appears this week in the Proceedings of the National Academy of Sciences. Rohit Singh, a CSAIL research scientist, and Samuel Sledzieski, an MIT graduate student, are the lead authors of the paper, and Bryan Bryson, an associate professor of biological engineering at MIT and a member of the Ragon Institute of MGH, MIT, and Harvard, is also an author. In addition to the paper, the researchers have made their model available online for other scientists to use.

Making predictions

In recent years, computational scientists have made great advances in developing models that can predict the structures of proteins based on their amino-acid sequences. However, using these models to predict how a large library of potential drugs might interact with a cancerous protein, for example, has proven challenging, mainly because calculating the three-dimensional structures of the proteins requires a great deal of time and computing power.

An additional obstacle is that these kinds of models don’t have a good track record for eliminating compounds known as decoys, which are very similar to a successful drug but don’t actually interact well with the target.

“One of the longstanding challenges in the field has been that these methods are fragile, in the sense that if I gave the model a drug or a small molecule that looked almost like the true thing, but it was slightly different in some subtle way, the model might still predict that they will interact, even though it should not,” Singh says.

Researchers have designed models that can overcome this kind of fragility, but they are usually tailored to just one class of drug molecules, and they aren’t well-suited to large-scale screens because the computations take too long. 

The MIT team decided to take an alternative approach, based on a protein model they first developed in 2019. Working with a database of more than 20,000 proteins, the language model encodes this information into meaningful numerical representations of each amino-acid sequence that capture associations between sequence and structure.

“With these language models, even proteins that have very different sequences but potentially have similar structures or similar functions can be represented in a similar way in this language space, and we’re able to take advantage of that to make our predictions,” Sledzieski says.

In their new study, the researchers applied the protein model to the task of figuring out which protein sequences will interact with specific drug molecules, both of which have numerical representations that are transformed into a common, shared space by a neural network. They trained the network on known protein-drug interactions, which allowed it to learn to associate specific features of the proteins with drug-binding ability, without having to calculate the 3D structure of any of the molecules.

“With this high-quality numerical representation, the model can short-circuit the atomic representation entirely, and from these numbers predict whether or not this drug will bind,” Singh says. “The advantage of this is that you avoid the need to go through an atomic representation, but the numbers still have all of the information that you need.”
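To make the shared-space idea concrete, here is a minimal sketch of a co-embedding scorer written in PyTorch. It is our own illustration, not the released ConPLex code; the class name, layer sizes, and embedding dimensions are all assumptions.

```python
import torch
import torch.nn as nn

class SharedSpaceScorer(nn.Module):
    """Hypothetical sketch: project precomputed protein and drug embeddings
    into one shared space and score a pair by cosine similarity. The layer
    sizes and dimensions are illustrative, not those of the published model."""

    def __init__(self, protein_dim=1024, drug_dim=2048, shared_dim=256):
        super().__init__()
        self.protein_proj = nn.Sequential(
            nn.Linear(protein_dim, shared_dim), nn.ReLU(), nn.Linear(shared_dim, shared_dim)
        )
        self.drug_proj = nn.Sequential(
            nn.Linear(drug_dim, shared_dim), nn.ReLU(), nn.Linear(shared_dim, shared_dim)
        )

    def forward(self, protein_emb, drug_emb):
        p = self.protein_proj(protein_emb)  # (batch, shared_dim)
        d = self.drug_proj(drug_emb)        # (batch, shared_dim)
        # Similarity in the shared space stands in for binding likelihood.
        return torch.cosine_similarity(p, d, dim=-1)

# Training on known interactions would pair each score with a binary
# interaction label, e.g. via a sigmoid and binary cross-entropy loss.
```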

Another advantage of this approach is that it takes into account the flexibility of protein structures, which can be “wiggly” and take on slightly different shapes when interacting with a drug molecule.

High affinity

To make their model less likely to be fooled by decoy drug molecules, the researchers also incorporated a training stage based on the concept of contrastive learning. Under this approach, the researchers give the model examples of “real” drugs and imposters and teach it to distinguish between them.
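One simple way to realize such a contrastive stage, assuming a pair scorer like the sketch above, is a margin-based objective that forces true binders to score higher than decoys. Again, this is an illustrative sketch, not the paper’s exact loss:

```python
import torch

def decoy_contrastive_loss(score_true, score_decoy, margin=0.5):
    """Illustrative triplet-style objective: for a given target, the score of
    a true binder should exceed the score of a decoy by at least `margin`."""
    return torch.clamp(margin - (score_true - score_decoy), min=0.0).mean()

# Usage sketch: scores come from the shared-space scorer above, evaluated on
# (target, true drug) and (target, decoy) pairs of the same batch size.
# loss = decoy_contrastive_loss(scorer(prot, true_drug), scorer(prot, decoy))
```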

The researchers then tested their model by screening a library of about 4,700 candidate drug molecules for their ability to bind to a set of 51 enzymes known as protein kinases.

From the top hits, the researchers chose 19 drug-protein pairs to test experimentally. The experiments revealed that of the 19 hits, 12 had strong binding affinity (in the nanomolar range), whereas nearly all of the many other possible drug-protein pairs would have no affinity. Four of these pairs bound with extremely high, sub-nanomolar affinity (so strong that a tiny drug concentration, on the order of parts per billion, will inhibit the protein).

While the researchers focused mainly on screening small-molecule drugs in this study, they are now working on applying this approach to other types of drugs, such as therapeutic antibodies. This kind of modeling could also prove useful for running toxicity screens of potential drug compounds, to make sure they don’t have any unwanted side effects before testing them in animal models.

“Part of the reason why drug discovery is so expensive is because it has high failure rates. If we can reduce those failure rates by saying upfront that this drug is not likely to work out, that could go a long way in lowering the cost of drug discovery,” Singh says.

This new approach “represents a significant breakthrough in drug-target interaction prediction and opens up additional opportunities for future research to further enhance its capabilities,” says Eytan Ruppin, chief of the Cancer Data Science Laboratory at the National Cancer Institute, who was not involved in the study. “For example, incorporating structural information into the latent space or exploring molecular generation methods for generating decoys could further improve predictions.”

The research was funded by the National Institutes of Health, the National Science Foundation, and the Phillip and Susan Ragon Foundation.

MIT researchers make language models scalable self-learners

Socrates once said: “It is not the size of a thing, but the quality that truly matters. For it is in the nature of substance, not its volume, that true value is found.”

Does size always matter for large language models (LLMs)? In a technological landscape bedazzled by LLMs taking center stage, a team of MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers think smaller models shouldn’t be overlooked, especially for natural language understanding products widely deployed in the industry.

To that end, the researchers cooked up an approach to long-standing problems of inefficiency and privacy associated with big, text-based AI models — a logic-aware model that outperforms 500-times-bigger counterparts on some language understanding tasks without human-generated annotations, while preserving privacy and robustness with high performance.

LLMs, which have shown some promising skills in generating language, art, and code, are computationally expensive, and their data requirements can risk privacy leaks when using application programming interfaces for data upload. Smaller models have been historically less capable, particularly in multitasking and weakly supervised tasks, compared to their larger counterparts.

So what’s helping these smaller models act so mighty, then? Something called “textual entailment,” a way to help these models understand a variety of language tasks: if one sentence (the premise) is true, then the other sentence (the hypothesis) is likely to be true as well. For example, if the premise is “all cats have tails,” then the hypothesis “a tabby cat has a tail” would be entailed by the premise. In the team’s previous research, this concept was used to train an “entailment model” that proved to be less biased than other language models. The team then created “prompts” that the models can use to figure out whether certain information is entailed by a given sentence or phrase according to different tasks. This method improved the model’s ability to adapt to different tasks without any additional training, known as zero-shot adaptation.

In the realm of “natural language understanding,” there are various applications that hinge on determining the relationship between two pieces of text. For example, in sentiment classification, a statement like “I think the movie is good” can be inferred or entailed from a movie review that says, “I like the story and the acting is great,” indicating a positive sentiment. Another is news classification, where the topic of a news article can be inferred from its content. For example, a statement like “the news article is about sports” can be entailed if the main content of the article reports on an NBA game. The key insight was that many existing natural language understanding tasks could be recast as an entailment (i.e., logical inference in natural language) task. 
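This recasting is the same idea behind off-the-shelf zero-shot classification with a natural language inference (entailment) model. The snippet below, using the Hugging Face transformers library, illustrates the general technique rather than the team’s specific 350-million-parameter models:

```python
from transformers import pipeline

# A natural language inference model doubles as a zero-shot classifier:
# each candidate label is turned into a hypothesis sentence and scored for
# entailment against the input text.
classifier = pipeline("zero-shot-classification")

result = classifier(
    "The Celtics pulled away in the fourth quarter to win game seven.",
    candidate_labels=["sports", "politics", "technology"],
)
print(result["labels"][0])  # highest-scoring label; expected: "sports"
```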

“Our research is about improving the ability of computer programs to understand and process natural language — the way humans speak and write. Our self-trained, 350-million-parameter entailment models, without human-generated labels, outperform supervised language models with 137 to 175 billion parameters,” says MIT CSAIL postdoc Hongyin Luo, lead author on a new paper about the study. “This has potential to reshape the landscape of AI and machine learning, providing a more scalable, trustworthy, and cost-effective solution to language modeling,” says Luo. “By proving that smaller models can perform at the same level as larger ones for language understanding, this work paves the way for more sustainable and privacy-preserving AI technologies.” 

The team discovered that they could improve the model’s performance even more by using a technique called “self-training,” where the model uses its own predictions to teach itself, effectively learning without human supervision or additional annotated training data. The self-training method significantly improved performance on a set of downstream tasks, including sentiment analysis, question-answering, and news classification. It outperformed Google’s LaMDA and FLAN in zero-shot capabilities, as well as GPT models and other supervised algorithms.

However, one challenge with self-training is that the model can sometimes generate incorrect or noisy labels that harm performance. To overcome this, the researchers developed a new algorithm called “SimPLE” (Simple Pseudo-Label Editing), a process for reviewing and modifying the pseudo-labels made in initial rounds of learning. By correcting any mislabeled instances, it improved the overall quality of the self-generated labels. This not only made the models more effective at understanding language, but also more robust when faced with adversarial data.
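A bare-bones version of pseudo-label self-training with confidence-based filtering is sketched below. SimPLE’s actual label-editing procedure is more sophisticated, and the `predict_proba` interface is a hypothetical stand-in for whatever classifier is being trained:

```python
import numpy as np

def self_train_round(model, unlabeled_texts, threshold=0.9):
    """Generic pseudo-labeling sketch (not the published SimPLE algorithm):
    predict labels for unlabeled data, keep only confident predictions, and
    return them as extra training examples. `model.predict_proba` is a
    hypothetical interface for the classifier being trained."""
    pseudo_examples = []
    for text in unlabeled_texts:
        probs = model.predict_proba(text)   # hypothetical: class probabilities
        label = int(np.argmax(probs))
        if probs[label] >= threshold:       # confidence-based filtering
            pseudo_examples.append((text, label))
    return pseudo_examples

# The model is then fine-tuned on its original data plus the returned
# pseudo-labeled examples, and the round can be repeated.
```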

As with most research, there are some limitations. The self-training on multi-class classification tasks didn’t perform as well as on binary natural language understanding tasks, indicating the challenge of applying entailment models to multi-choice tasks.

“This research presents an efficient and effective way to train large language models (LLMs) by formulating natural language understanding tasks as contextual entailment problems and employing a pseudo-labeling self-training mechanism to incorporate large quantities of unlabelled text data in the training process,” adds CSAIL Senior Research Scientist James Glass, who is also an author on the paper. “While the field of LLMs is undergoing rapid and dramatic changes, this research shows that it is possible to produce relatively compact language models that perform very well on benchmark understanding tasks compared to their peers of roughly the same size, or even much larger language models.”

“The entailment task is a popular proxy to evaluate ‘understanding’ of a given context by an AI model,” says Leonid Karlinsky, research staff member at the MIT-IBM Watson AI Lab. “It is used in many areas analyzing models with unimodal inputs, like LLMs, and multimodal inputs, like VLMs [visual language models], simplifying the task of question-answering about a given input context to a binary classification problem — does this context entail a certain (e.g., text) conclusion or not? This paper makes two contributions in this space. First, it proposes a way to improve the zero-shot (without additional tuning) NLU performance and robustness to adversarial attacks via tuning with synthesized (specialized) entailment tasks generated for the primal NLU task. Second, it offers a self-supervised SimPLE method including pseudo-labeling and confidence-based filtering to further improve large LLMs’ NLU performance.”

Luo and Glass wrote the paper with Yoon Kim, a CSAIL member and assistant professor in MIT’s Department of Electrical Engineering and Computer Science, and Jiaxin Ge of Peking University. Their work will be presented at the meeting of the Association for Computational Linguistics in Toronto, Ontario this July. This research was supported by a grant from the Hong Kong Innovation AI program.

Scaling audio-visual learning without labels

Researchers from MIT, the MIT-IBM Watson AI Lab, IBM Research, and elsewhere have developed a new technique for analyzing unlabeled audio and visual data that could improve the performance of machine-learning models used in applications like speech recognition and object detection. The work, for the first time, combines two architectures of self-supervised learning, contrastive learning and masked data modeling, in an effort to scale machine-learning tasks like event classification in single- and multimodal data without the need for annotation, thereby replicating how humans understand and perceive our world.

“A larger portion of human knowledge is learned in a self-supervised way, because we don’t always get supervision signals, and we want to enable the machine-learning model to have the same ability,” says Yuan Gong, an MIT postdoc in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

“So, another way to put it is that self-supervised learning often forms the foundation of an initial model, because it can learn on vast amounts of unlabeled data. And then you can use classical, supervised learning or reinforcement learning to fine tune the model to something particular if you want to,” says Jim Glass, an MIT senior research scientist and member of the MIT-IBM Watson AI Lab.

The technique, called the contrastive audio-visual masked autoencoder (CAV-MAE), is a type of neural network that can learn to extract and map meaningful latent representations into a high-dimensional space from acoustic and visual data, by training on large YouTube datasets of 10-second audio and video clips. The researchers say the technique is more effective than previous approaches because it explicitly models the relationships between audio and visual data in a way that other methods do not.

Joining Gong and Glass on the study are graduate students Andrew Rouditchenko and Alexander H. Liu of MIT, David Harwath PhD ’18 of the University of Texas at Austin, and MIT-IBM Watson AI Lab members Leonid Karlinsky and Hilde Kuehne. Kuehne is also affiliated with Goethe University Frankfurt. The method was recently presented at the International Conference on Learning Representations.

A joint and coordinated approach

The CAV-MAE works by “learning by prediction” and “learning by comparison,” says Gong. The masked data modeling, or the prediction method, takes a video along with its coordinated audio waveform, converts the audio to a spectrogram, and masks 75 percent of both. The unmasked data is tokenized, then fed into separate audio and visual encoders before entering a joint encoder/decoder, where the model is asked to recover the missing data. The difference (reconstruction loss) between the resulting reconstructed prediction and the original audio-visual combination is then used to train the model for better performance. An example of this would be covering part of a video of a piano and part of a spectrogram of piano music, and then asking the model to try to determine the masked inputs. Unfortunately, this method may not capture the association between the video and audio pair, whereas contrastive learning leverages this, but may discard some modality-unique information, like the background in a video.

Contrastive learning aims to map representations that are similar close to each other. For example, the model will attempt to place different video and audio data of different parrots close to each other and further away from pairs of video and audio of guitars playing. In a similar fashion to masked autoencoding, audio-visual pairs are passed into separate modality encoders; however, the audio and visual components are kept separately within the joint encoder before the model performs pooling and contrastive loss. In this way, contrastive learning tries to identify the parts of each audio or video that are most relevant to the other. For example, if a video shows someone speaking and the corresponding audio clip contains speech, the autoencoder will learn to associate the mouth movements of the speaker with the words being spoken. It will then adjust the model’s parameters so that those inputs are represented close to each other. Ultimately, the CAV-MAE method combines both techniques with multiple forward data streams with masking as a first step, modality-specific encoders, and layer normalization so that the representation strengths are similar.
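At a high level, combining the two objectives amounts to summing a masked-reconstruction loss with a contrastive loss over pooled per-clip audio and video embeddings. The sketch below is a simplified illustration under assumed tensor shapes and an assumed weighting, not the released CAV-MAE code:

```python
import torch
import torch.nn.functional as F

def cav_mae_style_loss(recon, target, audio_emb, video_emb,
                       temperature=0.07, contrastive_weight=0.01):
    """Simplified combination of the two objectives (illustrative weights).
    recon/target: reconstructed and original masked patches, same shape.
    audio_emb/video_emb: pooled per-clip embeddings, shape (batch, dim)."""
    # Masked data modeling: predict the missing patches.
    reconstruction_loss = F.mse_loss(recon, target)

    # Contrastive learning: matching audio-video pairs (the diagonal of the
    # similarity matrix) should score higher than mismatched pairs.
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    contrastive_loss = F.cross_entropy(logits, labels)

    return reconstruction_loss + contrastive_weight * contrastive_loss
```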

“We [then] wanted to compare the proposed CAV-MAE with a model trained only with a masked autoencoder and a model trained only with contrastive learning, because we want to show that by combining masked autoencoder and contrastive learning, we can get some performance improvement,” says Gong, “and the results support our hypothesis that there’s obvious improvement.”

The researchers tested CAV-MAE — as well as their method without contrastive loss or a masked autoencoder — against other state-of-the-art methods on audio-visual retrieval and audio-visual event classification tasks using standard AudioSet (20K and 2M) and VGGSound datasets — labeled, realistic short clips, which could include multiple sounds. Audio-visual retrieval means that the model sees either the audio or visual component of a query pair and searches for the missing one; event classification includes identifying actions or sounds within data, like a person singing or a car driving.

Overall, they found that contrastive learning and masked data modeling are complementary methods. CAV-MAE was able to outperform previous techniques (with fully self-supervised pre-training) by about 2 percent for event classification performance versus models with comparable computation and, more impressively, kept pace with or outperformed models with industry-level computational resources. The team’s model ranked similarly to models trained with only the contrastive loss. And surprisingly, the team says, the incorporation of multi-modal data into CAV-MAE pre-training greatly improves both the fine-tuning of single-modality representations via supervised learning (with some labeled data) and performance on audio-only event classification tasks. This demonstrates that, as with humans, multi-modal information provides an additional “soft label” boost even for audio-only or visual-only tasks; for instance, it helps the model to understand whether it’s looking for an electric or acoustic guitar — a richer supervision signal.

“I think people like the elegance of this model for combining information in the different audio and visual streams. It has the contrastive and the reconstruction loss, and compared to models that have been evaluated with similar data, it clearly does very well across a range of these tasks,” says Glass.

Building on this, “one special thing is, our model can do both classification and the retrieval, which is not common,” Gong adds. “Before this work, these methods were used separately, but after this work, I see that most of the audio-visual learning frameworks use contrastive loss and the masked autoencoder together, implicitly or explicitly.”

Bringing self-supervised audio-visual learning into our world

The researchers see their contribution of the contrastive audio-visual masked autoencoder (CAV-MAE) as an important milestone and a step forward for applications, which are increasingly moving from single modality to multi-modality and which require or leverage audio-visual fusion. They hypothesize that one day it could be used for action recognition in realms like sports, education, entertainment, motor vehicles, and public safety. It could also, one day, extend to other modalities. At this time, the fact that “this only applies to audio-visual data may be a limitation, but we are targeting multi-modal learning, which is the trend of machine learning,” says Gong. “As humans, we have multi-modalities — we have smell, touch — many more things than just audio-visual. So, when we try to build AI, we try to mimic humans somehow, not necessarily from the biological perspective, and this method could [potentially be] generalized to other unexplored modalities.”

As machine-learning models continue to play an increasingly important role in our lives, techniques like this one will become increasingly valuable.

This research was supported by the MIT-IBM Watson AI Lab.

New tool helps people choose the right method for evaluating AI models

When machine-learning models are deployed in real-world situations, perhaps to flag potential disease in X-rays for a radiologist to review, human users need to know when to trust the model’s predictions.

But machine-learning models are so large and complex that even the scientists who design them don’t understand exactly how the models make predictions. So, they create techniques known as saliency methods that seek to explain model behavior.

With new methods being released all the time, researchers from MIT and IBM Research created a tool to help users choose the best saliency method for their particular task. They developed saliency cards, which provide standardized documentation of how a method operates, including its strengths and weaknesses and explanations to help users interpret it correctly.

They hope that, armed with this information, users can deliberately select an appropriate saliency method for both the type of machine-learning model they are using and the task that model is performing, explains co-lead author Angie Boggust, a graduate student in electrical engineering and computer science at MIT and member of the Visualization Group of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

Interviews with AI researchers and experts from other fields revealed that the cards help people quickly conduct a side-by-side comparison of different methods and pick a task-appropriate technique. Choosing the right method gives users a more accurate picture of how their model is behaving, so they are better equipped to correctly interpret its predictions.

“Saliency cards are designed to give a quick, glanceable summary of a saliency method and also break it down into the most critical, human-centric attributes. They are really designed for everyone, from machine-learning researchers to lay users who are trying to understand which method to use and choose one for the first time,” says Boggust.

Joining Boggust on the paper are co-lead author Harini Suresh, an MIT postdoc; Hendrik Strobelt, a senior research scientist at IBM Research; John Guttag, the Dugald C. Jackson Professor of Computer Science and Electrical Engineering at MIT; and senior author Arvind Satyanarayan, associate professor of computer science at MIT who leads the Visualization Group in CSAIL. The research will be presented at the ACM Conference on Fairness, Accountability, and Transparency.

Picking the right method

The researchers have previously evaluated saliency methods using the notion of faithfulness. In this context, faithfulness captures how accurately a method reflects a model’s decision-making process.

But faithfulness is not black-and-white, Boggust explains. A method might perform well under one test of faithfulness, but fail another. With so many saliency methods, and so many possible evaluations, users often settle on a method because it is popular or a colleague has used it.

However, picking the “wrong” method can have serious consequences. For instance, one saliency method, known as integrated gradients, compares the importance of features in an image to a meaningless baseline. The features with the largest importance over the baseline are most meaningful to the model’s prediction. This method typically uses all 0s as the baseline, but if applied to images, all 0s equates to the color black.

“It will tell you that any black pixels in your image aren’t important, even if they are, because they are identical to that meaningless baseline. This could be a big deal if you are looking at X-rays since black could be meaningful to clinicians,” says Boggust. 
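The arithmetic behind that failure mode is straightforward: integrated gradients multiplies each feature’s averaged gradient by the difference between the input value and the baseline, so any pixel that already equals the baseline receives zero attribution. A small illustrative sketch (our own, with a hypothetical `model_grad` function, not the saliency-cards code):

```python
import numpy as np

def integrated_gradients(model_grad, x, baseline, steps=50):
    """Textbook integrated gradients: attribution_i equals
    (x_i - baseline_i) times the average gradient along the straight path
    from the baseline to the input. `model_grad` is a hypothetical function
    returning the gradient of the model output with respect to its input."""
    path = [baseline + (k / steps) * (x - baseline) for k in range(1, steps + 1)]
    avg_grad = np.mean([model_grad(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

# With an all-zeros baseline, any input pixel that is itself 0 (pure black)
# receives an attribution of exactly 0, no matter how large its gradient is.
```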

Saliency cards can help users avoid these types of problems by summarizing how a saliency method works in terms of 10 user-focused attributes. The attributes capture the way saliency is calculated, the relationship between the saliency method and the model, and how a user perceives its outputs.

For example, one attribute is hyperparameter dependence, which measures how sensitive that saliency method is to user-specified parameters. A saliency card for integrated gradients would describe its parameters and how they affect its performance. With the card, a user could quickly see that the default parameters — a baseline of all 0s — might generate misleading results when evaluating X-rays.

The cards could also be useful for scientists by exposing gaps in the research space. For instance, the MIT researchers were unable to identify a saliency method that was computationally efficient, but could also be applied to any machine-learning model.

“Can we fill that gap? Is there a saliency method that can do both things? Or maybe these two ideas are theoretically in conflict with one another,” Boggust says.

Showing their cards

Once they had created several cards, the team conducted a user study with eight domain experts, from computer scientists to a radiologist who was unfamiliar with machine learning. During interviews, all participants said the concise descriptions helped them prioritize attributes and compare methods. And even though he was unfamiliar with machine learning, the radiologist was able to understand the cards and use them to take part in the process of choosing a saliency method, Boggust says.

The interviews also revealed a few surprises. Researchers often expect that clinicians want a method that is sharp, meaning it focuses on a particular object in a medical image. But the clinician in this study actually preferred some noise in medical images to help them attenuate uncertainty.

“As we broke it down into these different attributes and asked people, not a single person had the same priorities as anyone else in the study, even when they were in the same role,” she says.

Moving forward, the researchers want to explore some of the more under-evaluated attributes and perhaps design task-specific saliency methods. They also want to develop a better understanding of how people perceive saliency method outputs, which could lead to better visualizations. In addition, they are hosting their work on a public repository so others can provide feedback that will drive future work, Boggust says.

“We are really hopeful that these will be living documents that grow as new saliency methods and evaluations are developed. In the end, this is really just the start of a larger conversation around what the attributes of a saliency method are and how those play into different tasks,” she says.

The research was supported, in part, by the MIT-IBM Watson AI Lab, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator.

A more effective way to train machines for uncertain, real-world situations

Someone learning to play tennis might hire a teacher to help them learn faster. Because this teacher is (hopefully) a great tennis player, there are times when trying to exactly mimic the teacher won’t help the student learn. Perhaps the teacher leaps high into the air to deftly return a volley. The student, unable to copy that, might instead try a few other moves on her own until she has mastered the skills she needs to return volleys.

Computer scientists can also use “teacher” systems to train another machine to complete a task. But just like with human learning, the student machine faces a dilemma of knowing when to follow the teacher and when to explore on its own. To this end, researchers from MIT and Technion, the Israel Institute of Technology, have developed an algorithm that automatically and independently determines when the student should mimic the teacher (known as imitation learning) and when it should instead learn through trial and error (known as reinforcement learning).

Their dynamic approach allows the student to diverge from copying the teacher when the teacher is either too good or not good enough, but then return to following the teacher at a later point in the training process if doing so would achieve better results and faster learning.

When the researchers tested this approach in simulations, they found that their combination of trial-and-error learning and imitation learning enabled students to learn tasks more effectively than methods that used only one type of learning.

This method could help researchers improve the training process for machines that will be deployed in uncertain real-world situations, like a robot being trained to navigate inside a building it has never seen before.

“This combination of learning by trial-and-error and following a teacher is very powerful. It gives our algorithm the ability to solve very difficult tasks that cannot be solved by using either technique individually,” says Idan Shenfeld, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.

Shenfeld wrote the paper with coauthors Zhang-Wei Hong, an EECS graduate student; Aviv Tamar, assistant professor of electrical engineering and computer science at Technion; and senior author Pulkit Agrawal, director of the Improbable AI Lab and an assistant professor in the Computer Science and Artificial Intelligence Laboratory. The research will be presented at the International Conference on Machine Learning.

Striking a balance

Many existing methods that seek to strike a balance between imitation learning and reinforcement learning do so through brute force trial-and-error. Researchers pick a weighted combination of the two learning methods, run the entire training procedure, and then repeat the process until they find the optimal balance. This is inefficient and often so computationally expensive it isn’t even feasible.

“We want algorithms that are principled, involve tuning of as few knobs as possible, and achieve high performance — these principles have driven our research,” says Agrawal.

To achieve this, the team approached the problem differently than prior work. Their solution involves training two students: one with a weighted combination of reinforcement learning and imitation learning, and a second that can only use reinforcement learning to learn the same task.

The main idea is to automatically and dynamically adjust the weighting of the reinforcement and imitation learning objectives of the first student. Here is where the second student comes into play. The researchers’ algorithm continually compares the two students: If the one using the teacher is doing better, the algorithm puts more weight on imitation learning to train the student, but if the one using only trial and error is starting to get better results, it shifts more weight onto reinforcement learning.

By dynamically determining which method achieves better results, the algorithm is adaptive and can pick the best technique throughout the training process. Thanks to this innovation, it is able to more effectively teach students than other methods that aren’t adaptive, Shenfeld says.
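In pseudocode, that dynamic adjustment might look like the sketch below, where the imitation coefficient drifts toward whichever student is currently earning higher returns. This is an illustrative rule under our own assumptions, not the published algorithm:

```python
def update_imitation_weight(weight, return_combined, return_rl_only, step=0.05):
    """Illustrative rule (not the paper's): if the student that learns from the
    teacher is outperforming the RL-only student, lean further toward imitation;
    otherwise shift weight toward reinforcement learning. `weight` is the
    imitation coefficient in the combined objective, kept in [0, 1]."""
    if return_combined >= return_rl_only:
        return min(1.0, weight + step)
    return max(0.0, weight - step)

# The first student's training loss at each iteration might then be:
# loss = weight * imitation_loss + (1 - weight) * reinforcement_learning_loss
```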

“One of the main challenges in developing this algorithm was that it took us some time to realize that we should not train the two students independently. It became clear that we needed to connect the agents to make them share information, and then find the right way to technically ground this intuition,” Shenfeld says.

Solving tough problems

To test their approach, the researchers set up many simulated teacher-student training experiments, such as navigating through a maze of lava to reach the other corner of a grid. In this case, the teacher has a map of the entire grid while the student can only see a patch in front of it. Their algorithm achieved an almost perfect success rate across all testing environments, and was much faster than other methods.

To give their algorithm an even more difficult test, they set up a simulation involving a robotic hand with touch sensors but no vision that must reorient a pen to the correct pose. The teacher had access to the actual orientation of the pen, while the student could only use touch sensors to determine the pen’s orientation.

Their method outperformed others that used either only imitation learning or only reinforcement learning.

Reorienting objects is one of many manipulation tasks that a future home robot would need to perform, a vision the Improbable AI Lab is working toward, Agrawal adds.

Teacher-student learning has successfully been applied to train robots to perform complex object manipulation and locomotion in simulation and then transfer the learned skills into the real world. In these methods, the teacher has privileged information accessible from the simulation that the student won’t have when it is deployed in the real world. For example, the teacher will know the detailed map of a building that the student robot is being trained to navigate using only images captured by its camera.

“Current methods for student-teacher learning in robotics don’t account for the inability of the student to mimic the teacher and thus are performance-limited. The new method paves a path for building superior robots,” says Agrawal.

Apart from better robots, the researchers believe their algorithm has the potential to improve performance in diverse applications where imitation or reinforcement learning is being used. For example, large language models such as GPT-4 are very good at accomplishing a wide range of tasks, so perhaps one could use the large model as a teacher to train a smaller, student model to be even “better” at one particular task. Another exciting direction is to investigate the similarities and differences between machines and humans learning from their respective teachers. Such analysis might help improve the learning experience, the researchers say.

“What’s interesting about [this method] compared to related methods is how robust it seems to various parameter choices, and the variety of domains it shows promising results in,” says Abhishek Gupta, an assistant professor at the University of Washington, who was not involved with this work. “While the current set of results are largely in simulation, I am very excited about the future possibilities of applying this work to problems involving memory and reasoning with different modalities such as tactile sensing.” 

“This work presents an interesting approach to reuse prior computational work in reinforcement learning. Particularly, their proposed method can leverage suboptimal teacher policies as a guide while avoiding careful hyperparameter schedules required by prior methods for balancing the objectives of mimicking the teacher versus optimizing the task reward,” adds Rishabh Agarwal, a senior research scientist at Google Brain, who was also not involved in this research. “Hopefully, this work would make reincarnating reinforcement learning with learned policies less cumbersome.”  

This research was supported, in part, by the MIT-IBM Watson AI Lab, Hyundai Motor Company, the DARPA Machine Common Sense Program, and the Office of Naval Research.

Celebrating the impact of IDSS

The “interdisciplinary approach” has been lauded for decades for its ability to break down silos and create new, integrated approaches to research.

For Munther Dahleh, founding director of the MIT Institute for Data, Systems, and Society (IDSS), showing the community that data science and statistics can transcend individual disciplines and form a new holistic approach to addressing complex societal challenges has been crucial to the institute’s success.

“From the very beginning, it was critical that we recognized the areas of data science, statistics, AI, and, in a way, computing, as transdisciplinary,” says Dahleh, who is the William A. Coolidge Professor in Electrical Engineering and Computer Science. “We made that point over and over — these are areas that embed in your field. It is not ours; this organization is here for everyone.”

On April 14-15, researchers from across and beyond MIT joined together to celebrate the accomplishments and impact IDSS has had on research and education since its inception in 2015. Taking the place of SDSCon, IDSS’s annual statistics and data science conference, the celebration also doubled as a way to recognize Dahleh for his work creating and executing the vision of IDSS as he prepares to step down from his director position this summer.

In addition to talks and panels on statistics and computation, smart systems, automation, and artificial intelligence, conference participants discussed issues ranging from climate change and health care to misinformation. Nobel Prize winner and IDSS affiliate Professor Esther Duflo spoke on large-scale immunization efforts, former MLK Visiting Professor Craig Watkins joined a panel on equity and justice in AI, and IDSS Associate Director Alberto Abadie discussed synthetic controls for policy evaluation. Other policy questions were explored through lightning talks, including those by students from the Technology and Policy Program (TPP) within IDSS.

A place to call home

The list of IDSS accomplishments over the last eight years is long and growing. From creating a home for 21st-century statistics at MIT after earlier unsuccessful attempts, to launching a new PhD program that prepares “trilingual” students who are experts in data science and social science in the context of a specific domain, to playing a key role in determining an effective process for Covid testing in the early days of the pandemic, IDSS has left its mark on MIT. More recently, IDSS launched an initiative using big data to help effect structural and normative change toward racial equity, and it will continue to explore societal challenges through the lenses of statistics, social science, and science and engineering.

“I’m very proud of what we’ve done and of all the people who have contributed to this. The leadership team has been phenomenal in their commitment and their creativity,” Dahleh says. “I always say it doesn’t take one person, it takes the village to do what we have done, and I am very proud of that.”

Prior to the institute’s formation, Dahleh and others at MIT were brought together to answer one key question: How would MIT prepare for the future of systems and data?

“Data science is a complex area because in some ways it’s everywhere and it belongs to everyone, similar to statistics and AI,” Dahleh says. “The most important part of creating an organization to support it was making it clear that it was an organization for everyone.” The response the team came back with was to build an Institute: a department that could cross all other departments and schools.

While Dahleh and others on the committee were creating this blueprint for the future, the events that would lead early IDSS hires like Caroline Uhler to join the team were also beginning to take shape. Uhler, now an MIT professor of computer science and co-director of the Eric and Wendy Schmidt Center at the Broad Institute, was a panelist at the celebration discussing statistics and human health.

In 2015, Uhler was a faculty member at the Institute of Science and Technology Austria looking to move back to the U.S. “I was looking for positions in all different types of departments related to statistics, including electrical engineering and computer science, which were areas not related to my degree,” Uhler says. “What really got me to MIT was Munther’s vision for building a modern type of statistics, and the unique opportunity to be part of building what statistics should be moving forward.”

The breadth of the Statistics and Data Science Center has given it a unique and robust character that makes for an attractive collaborative environment at MIT. “A lot of IDSS’s impact has been in giving people like me a home,” Uhler adds. “By building an institute for statistics that is across all schools instead of housed within a single department, it has created a home for everyone who is interested in the field.”

Filling the gap

For Ali Jadbabaie, former IDSS associate director and another early IDSS hire, being in the right place at the right time landed him in the center of it all. A control theory expert and network scientist by training, Jadbabaie first came to MIT during a sabbatical from his position as a professor at the University of Pennsylvania.

“My time at MIT coincided with the early discussions around forming IDSS and given my experience they asked me to stay and help with its creation,” Jadbabaie says. He is now head of the Department of Civil and Environmental Engineering at MIT, and he spoke at the celebration about a new MIT major in climate system science and engineering.

A critical early accomplishment of IDSS was the creation of a doctoral program in social and engineering systems (SES), which has the goal of educating and fostering the success of a new type of PhD student, says Jadbabaie.

“We realized we had this opportunity to educate a new type of PhD student who was conversant in the math of information sciences and statistics, in addition to an understanding of a domain — infrastructures, climate, political polarization — in which problems arise,” he says. “This program would provide training in statistics and data science, the math of information sciences, and a branch of social science that is relevant to their domain.”

“SES has been filling a gap,” adds Jadbabaie. “We wanted to bring quantitative reasoning to areas in social sciences, particularly as they interact with complex engineering systems.”

“My first year at MIT really broadened my horizon in terms of what was available and exciting,” says Manxi Wu, a member of the first cohort of students in the SES program after starting out in the Master of Science in Transportation (MST) program. “My advisor introduced me to a number of interesting topics at the intersection of game theory, economics, and engineering systems, and in my second year I realized my interest was really about the societal scale systems, with transportation as my go-to application area when I think about how to make an impact in the real world.”

Wu, now an assistant professor in the School of Operations Research and Information Engineering at Cornell, was a panelist at the celebration’s session on smart infrastructure systems. She says that the beauty of the SES program lies in its ability to create common ground among students and researchers who have different application interests but share an eagerness to sharpen their technical skills.

“While we may be working on very different application areas, the core methodologies, such as mathematical tools for data science and probability optimization, create a common language,” Wu says. “We are all capable of speaking the technical language, and our diversified interests give us even more to talk about.”

In addition to the PhD program, IDSS has helped bring quality MIT programming to people around the globe with its MicroMasters Program in Statistics and Data Science (SDS), which recently celebrated the certification of over 1,000 learners. The MicroMasters is just one offering in the newly minted IDSSx, a collection of online learning opportunities for learners at different skill levels and interests.

“The impact of branding what MIT-IDSS does across the globe has been great,” Dahleh says. “In addition, we’ve created smaller online programs for continued education in data science and machine learning, which I think is also critical in educating the community at large.”

Hopes for the future

Through all of its accomplishments, the core mission of IDSS has never changed.

“The belief was always to create an institute focused on how data science can be used to solve pressing societal problems,” Dahleh says. “The organizational structure of IDSS as an MIT Institute has enabled it to promote data and systems as a transdisciplinary area that embeds in every domain to support its mission. This reverse ownership structure will continue to strengthen the presence of IDSS in MIT and will make it an essential unit within the Schwarzman College of Computing.”

As Dahleh prepares to step down from his role, and Professor Martin Wainwright gets ready to fill his (very big) shoes as director, Dahleh’s colleagues say the real key to the success of IDSS all started with his passion and vision.

“Creating a new academic unit within MIT is actually next to impossible,” Jadbabaie says. “It requires structural changes, as well as someone who has a strong understanding of multiple areas, who knows how to get people to work together collectively, and who has a mission.”

“The most important thing is that he was inclusive,” he adds. “He didn’t try to create a gate around it and say these people are in and these people are not. I don’t think this would have ever happened without Munther at the helm.”

Using AI, scientists find a drug that could combat drug-resistant infections

Using an artificial intelligence algorithm, researchers at MIT and McMaster University have identified a new antibiotic that can kill a type of bacteria that is responsible for many drug-resistant infections.

If developed for use in patients, the drug could help to combat Acinetobacter baumannii, a species of bacteria that is often found in hospitals and can lead to pneumonia, meningitis, and other serious infections. The microbe is also a leading cause of infections in wounded soldiers in Iraq and Afghanistan.

“Acinetobacter can survive on hospital doorknobs and equipment for long periods of time, and it can take up antibiotic resistance genes from its environment. It’s really common now to find A. baumannii isolates that are resistant to nearly every antibiotic,” says Jonathan Stokes, a former MIT postdoc who is now an assistant professor of biochemistry and biomedical sciences at McMaster University.

The researchers identified the new drug from a library of nearly 7,000 potential drug compounds using a machine-learning model that they trained to evaluate whether a chemical compound will inhibit the growth of A. baumannii.

“This finding further supports the premise that AI can significantly accelerate and expand our search for novel antibiotics,” says James Collins, the Termeer Professor of Medical Engineering and Science in MIT’s Institute for Medical Engineering and Science (IMES) and Department of Biological Engineering. “I’m excited that this work shows that we can use AI to help combat problematic pathogens such as A. baumannii.”

Collins and Stokes are the senior authors of the new study, which appears today in Nature Chemical Biology. The paper’s lead authors are McMaster University graduate students Gary Liu and Denise Catacutan and recent McMaster graduate Khushi Rathod.

Drug discovery

Over the past several decades, many pathogenic bacteria have become increasingly resistant to existing antibiotics, while very few new antibiotics have been developed.

Several years ago, Collins, Stokes, and MIT Professor Regina Barzilay (who is also an author on the new study), set out to combat this growing problem by using machine learning, a type of artificial intelligence that can learn to recognize patterns in vast amounts of data. Collins and Barzilay, who co-direct MIT’s Abdul Latif Jameel Clinic for Machine Learning in Health, hoped this approach could be used to identify new antibiotics whose chemical structures are different from any existing drugs.

In their initial demonstration, the researchers trained a machine-learning algorithm to identify chemical structures that could inhibit growth of E. coli. In a screen of more than 100 million compounds, that algorithm yielded a molecule that the researchers called halicin, after the fictional artificial intelligence system from “2001: A Space Odyssey.” This molecule, they showed, could kill not only E. coli but several other bacterial species that are resistant to treatment.

“After that paper, when we showed that these machine-learning approaches can work well for complex antibiotic discovery tasks, we turned our attention to what I perceive to be public enemy No. 1 for multidrug-resistant bacterial infections, which is Acinetobacter,” Stokes says.

To obtain training data for their computational model, the researchers first exposed A. baumannii grown in a lab dish to about 7,500 different chemical compounds to see which ones could inhibit growth of the microbe. Then they fed the structure of each molecule into the model. They also told the model whether each structure could inhibit bacterial growth or not. This allowed the algorithm to learn chemical features associated with growth inhibition.

Once the model was trained, the researchers used it to analyze a set of 6,680 compounds it had not seen before, which came from the Drug Repurposing Hub at the Broad Institute. This analysis, which took less than two hours, yielded a few hundred top hits. Of these, the researchers chose 240 to test experimentally in the lab, focusing on compounds with structures that were different from those of existing antibiotics or molecules from the training data.
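As an illustration of this screen-train-rank workflow, the following minimal sketch trains a classifier on labeled compounds and then ranks an unseen library by predicted inhibition. The random features, random labels, and the random-forest model are placeholders chosen for demonstration; the study used its own model and real assay data.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative sketch of the screen -> train -> rank workflow described above.
rng = np.random.default_rng(0)

# Placeholder featurization: in practice each molecule would be encoded from its
# chemical structure (for example, a fingerprint or graph), not random numbers.
def featurize(n_molecules, n_features=128):
    return rng.random((n_molecules, n_features))

# Training set: ~7,500 screened compounds with inhibit / not-inhibit labels.
X_train = featurize(7500)
y_train = rng.integers(0, 2, size=7500)   # stand-in for measured assay results

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Screening set: 6,680 unseen compounds (for example, a repurposing library).
X_screen = featurize(6680)
scores = model.predict_proba(X_screen)[:, 1]   # predicted probability of inhibition

# Rank the compounds and keep the top hits for experimental follow-up.
top_hits = np.argsort(scores)[::-1][:240]
print("indices of top-ranked compounds:", top_hits[:10])

In the real pipeline, the model’s score is only a triage step; the top-ranked compounds still have to be validated experimentally, as the researchers did with the 240 hits they chose to test.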

Those tests yielded nine antibiotics, including one that was very potent. This compound, which was originally explored as a potential diabetes drug, turned out to be extremely effective at killing A. baumannii but had no effect on other species of bacteria including Pseudomonas aeruginosa, Staphylococcus aureus, and carbapenem-resistant Enterobacteriaceae.

This “narrow spectrum” killing ability is a desirable feature for antibiotics because it minimizes the risk of bacteria rapidly spreading resistance against the drug. Another advantage is that the drug would likely spare the beneficial bacteria that live in the human gut and help to suppress opportunistic infections such as Clostridium difficile.

“Antibiotics often have to be administered systemically, and the last thing you want to do is cause significant dysbiosis and open up these already sick patients to secondary infections,” Stokes says.

A novel mechanism

In studies in mice, the researchers showed that the drug, which they named abaucin, could treat wound infections caused by A. baumannii. They also showed, in lab tests, that it works against a variety of drug-resistant A. baumannii strains isolated from human patients.

Further experiments revealed that the drug kills cells by interfering with a process known as lipoprotein trafficking, which cells use to transport proteins from the interior of the cell to the cell envelope. Specifically, the drug appears to inhibit LolE, a protein involved in this process.

All Gram-negative bacteria express this enzyme, so the researchers were surprised to find that abaucin is so selective in targeting A. baumannii. They hypothesize that slight differences in how A. baumannii performs this task might account for the drug’s selectivity.

“We haven’t finalized the experimental data acquisition yet, but we think it’s because A. baumannii does lipoprotein trafficking a little bit differently than other Gram-negative species. We believe that’s why we’re getting this narrow spectrum activity,” Stokes says.

Stokes’ lab is now working with other researchers at McMaster to optimize the medicinal properties of the compound, in hopes of developing it for eventual use in patients.

The researchers also plan to use their modeling approach to identify potential antibiotics for other types of drug-resistant infections, including those caused by Staphylococcus aureus and Pseudomonas aeruginosa.

The research was funded by the David Braley Center for Antibiotic Discovery, the Weston Family Foundation, the Audacious Project, the C3.ai Digital Transformation Institute, the Abdul Latif Jameel Clinic for Machine Learning in Health, the DTRA Discovery of Medical Countermeasures Against New and Emerging Threats program, the DARPA Accelerated Molecular Discovery program, the Canadian Institutes of Health Research, Genome Canada, the Faculty of Health Sciences of McMaster University, the Boris Family, a Marshall Scholarship, and the Department of Energy Biological and Environmental Research program.