Detailed images from space offer clearer picture of drought effects on plants

“MIT is a place where dreams come true,” says César Terrer, an assistant professor in the Department of Civil and Environmental Engineering. Here at MIT, Terrer says he’s given the resources needed to explore ideas he finds most exciting, and at the top of his list is climate science. In particular, he is interested in plant-soil interactions, and how the two can mitigate impacts of climate change. In 2022, Terrer received seed grant funding from the Abdul Latif Jameel Water and Food Systems Lab (J-WAFS) to produce drought monitoring systems for farmers. The project is leveraging a new generation of remote sensing devices to provide high-resolution estimates of plant water stress at regional to global scales.

Growing up in Granada, Spain, Terrer always had an aptitude and passion for science. He studied environmental science at the University of Murcia, where he interned in the Department of Ecology. Using computational analysis tools, he worked on modeling species distribution in response to human development. Early on in his undergraduate experience, Terrer says he regarded his professors as “superheroes” with a kind of scholarly prowess. He knew he wanted to follow in their footsteps by one day working as a faculty member in academia. Of course, there would be many steps along the way before achieving that dream. 

Upon completing his undergraduate studies, Terrer set his sights on exciting and adventurous research roles. He thought perhaps he would conduct field work in the Amazon, engaging with native communities. But when the opportunity arose to work in Australia on a state-of-the-art climate change experiment that simulates future levels of carbon dioxide, he headed south to study how plants react to CO2 in a biome of native Australian eucalyptus trees. It was during this experience that Terrer started to take a keen interest in the carbon cycle and the capacity of ecosystems to buffer rising levels of CO2 caused by human activity.

Around 2014, he delved deeper into the carbon cycle as he began his doctoral studies at Imperial College London. The primary question Terrer sought to answer during his PhD was “will plants be able to absorb predicted future levels of CO2 in the atmosphere?” To answer the question, Terrer became an early adopter of artificial intelligence, machine learning, and remote sensing to analyze data from real-life, global climate change experiments. His findings from these “ground truth” values and observations resulted in a paper in the journal Science. In it, he claimed that climate models most likely overestimated how much carbon plants will be able to absorb by the end of the century, by a factor of three.

After postdoctoral positions at Stanford University and the Universitat Autonoma de Barcelona, followed by a prestigious Lawrence Fellowship, Terrer says he had “too many ideas and not enough time to accomplish all those ideas.” He knew it was time to lead his own group. Not long after applying for faculty positions, he landed at MIT. 

New ways to monitor drought

Terrer is employing similar methods to those he used during his PhD to analyze data from all over the world for his J-WAFS project. He and postdoc Wenzhe Jiao collect data from remote sensing satellites and field experiments and use machine learning to come up with new ways to monitor drought. Terrer says Jiao is a “remote sensing wizard,” who fuses data from different satellite products to understand the water cycle. With Jiao’s hydrology expertise and Terrer’s knowledge of plants, soil, and the carbon cycle, the duo is a formidable team to tackle this project.

According to the U.N. World Meteorological Organization, the number and duration of droughts have increased by 29 percent since 2000, as compared to the two previous decades. From the Horn of Africa to the Western United States, drought is devastating vegetation and severely stressing water supplies, compromising food production and spiking food insecurity. Drought monitoring can offer fundamental information on drought location, frequency, and severity, but assessing the impact of drought on vegetation is extremely challenging. This is because plants’ sensitivity to water deficits varies across species and ecosystems.

Terrer and Jiao are able to obtain a clearer picture of how drought is affecting plants by employing the latest generation of remote sensing observations, which offer images of the planet with incredible spatial and temporal resolution. Satellite products such as Sentinel, Landsat, and Planet can provide daily images from space with such high resolution that individual trees can be discerned. Along with the images and datasets from satellites, the team is using ground-based observations from meteorological data. They are also using the MIT SuperCloud at MIT Lincoln Laboratory to process and analyze all of the data sets. The J-WAFS project is among the first to leverage high-resolution data to quantitatively measure plant drought impacts in the United States, with the hope of expanding to a global assessment in the future.

Assisting farmers and resource managers 

Every week, the U.S. Drought Monitor provides a map of drought conditions in the United States. The map offers only coarse resolution and is more of a drought recap or summary, unable to predict future drought scenarios. The lack of a comprehensive spatiotemporal evaluation of historic and future drought impacts on global vegetation productivity is detrimental to farmers both in the United States and worldwide.

Terrer and Jiao plan to generate metrics for plant water stress at an unprecedented resolution of 10-30 meters. This means that they will be able to provide drought monitoring maps at the scale of a typical U.S. farm, giving farmers more precise, useful data every one to two days. The team will use the information from the satellites to monitor plant growth and soil moisture, as well as the time lag of plant growth response to soil moisture. In this way, Terrer and Jiao say they will eventually be able to create a kind of “plant water stress forecast” that may be able to predict adverse impacts of drought four weeks in advance. “According to the current soil moisture and lagged response time, we hope to predict plant water stress in the future,” says Jiao. 
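
To make the idea concrete, here is a minimal sketch, in Python, of the kind of lagged relationship the team describes: today’s soil moisture is used to anticipate vegetation condition several weeks out. The weekly values, the four-week lag, and the simple linear model are illustrative assumptions, not the researchers’ actual pipeline.

```python
# Minimal sketch (not the team's pipeline): relate a vegetation index to soil
# moisture observed several weeks earlier, then use the fitted lag relationship
# as a simple "water stress forecast." All values here are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
weeks = 120
soil_moisture = rng.uniform(0.10, 0.40, size=weeks)      # weekly volumetric soil moisture
lag = 4                                                   # assumed plant response lag, in weeks

# Synthetic vegetation index that responds to soil moisture `lag` weeks earlier.
ndvi = 0.2 + 1.5 * np.roll(soil_moisture, lag) + rng.normal(0, 0.02, size=weeks)

# Fit: this week's vegetation state as a function of soil moisture `lag` weeks ago.
X = soil_moisture[:-lag].reshape(-1, 1)
y = ndvi[lag:]
model = LinearRegression().fit(X, y)

# "Forecast": today's soil moisture implies plant condition roughly `lag` weeks ahead.
todays_soil_moisture = np.array([[0.15]])
forecast = model.predict(todays_soil_moisture)[0]
print(f"Predicted vegetation index in {lag} weeks: {forecast:.2f}")
```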

The expected outcomes of this project will give farmers, land and water resource managers, and decision-makers more accurate data at the farm-specific level, allowing for better drought preparation, mitigation, and adaptation. “We expect to make our data open-access online, after we finish the project, so that farmers and other stakeholders can use the maps as tools,” says Jiao. 

Terrer adds that the project “has the potential to help us better understand the future states of climate systems, and also identify the regional hot spots more likely to experience water crises at the national, state, local, and tribal government scales.” He also expects the project will enhance our understanding of global carbon-water-energy cycle responses to drought, with applications in determining climate change impacts on natural ecosystems as a whole.

Peter Baddoo, Department of Mathematics instructor, dies at 29

Peter Baddoo, an instructor in the Department of Mathematics, passed away suddenly on Feb. 15 while playing basketball on campus.

Baddoo joined the MIT Department of Mathematics in January 2021. Prior to this, he was an EPSRC Doctoral Prize Fellow at Imperial College London. He studied mathematics as an undergraduate at the University of Oxford and received his PhD from Cambridge University.

An accomplished applied mathematician, Baddoo had broad research interests and activities spanning complex function theory, fluid dynamics, and machine learning and data-driven methods. His book, “Analytic Solutions for Flows Through Cascades” (Springer, 2020), received praise for its “exceptionally clear presentation with beautiful figures.”

“Peter was an outstanding, self-propelling researcher, a master of complex function theory with a burgeoning interest in machine learning, and had several collaborations within the U.S. and farther afield. He had an exceptionally promising future in academia. He was a deeply respected and valued member of my research group and the broader applied math community. He will be sorely missed,” says Professor John Bush, his faculty mentor.

In addition to his research, Baddoo was an exemplary teacher who gave generously of his time in assisting colleagues, graduate students, and undergraduates. 

“Peter was an excellent lecturer — clear, composed, thoughtful, and kind. He was extremely popular among his students,” says Michel Goemans, the RSA Professor of Mathematics and Department of Mathematics head. One of Baddoo’s students in class 18.04 (Complex Variables with Applications) says that “I took Peter’s class, and I walked out of that class actually liking math. I was assured that I want to study more of math and pursue a minor.”

Aside from his work as a scholar and teacher, Baddoo brought the department together by organizing social events for postdocs and instructors; for these and other efforts he received a Math Community Service Award. His interests extended well beyond mathematics and included music and sports such as basketball and lacrosse — which he played at Oxford and Cambridge universities, and as a member of the Senior England Men’s training squad. He was also a devoted and active member of Park Street Church.

In his honor, the Department of Mathematics will be endowing a Peter Baddoo Prize to recognize outstanding contributions to community-building within the department.

Peter Baddoo is survived by his parents, Jim and Nancy; his sisters, Kate and Harriet; and his fiancée, Yuna Kim.  

Mining the right transition metals in a vast chemical space

Swift and significant gains against climate change require the creation of novel, environmentally benign, and energy-efficient materials. One of the richest veins researchers hope to tap in creating such useful compounds is a vast chemical space where molecular combinations that offer remarkable optical, conductive, magnetic, and heat transfer properties await discovery.

But finding these new materials has been slow going.

“While computational modeling has enabled us to discover and predict properties of new materials much faster than experimentation, these models aren’t always trustworthy,” says Heather J. Kulik PhD ’09, associate professor in the departments of Chemical Engineering and Chemistry. “In order to accelerate computational discovery of materials, we need better methods for removing uncertainty and making our predictions more accurate.”

A team from Kulik’s lab, including Chenru Duan PhD ’22, set out to address these challenges.

A tool for building trust

Kulik and her group focus on transition metal complexes, molecules in which a metal from the middle of the periodic table is surrounded by organic ligands. These complexes can be extremely reactive, which gives them a central role in catalyzing natural and industrial processes. By altering the organic and metal components in these molecules, scientists can generate materials with properties that can improve such applications as artificial photosynthesis, solar energy absorption and storage, higher efficiency OLEDs (organic light-emitting diodes), and device miniaturization.

“Characterizing these complexes and discovering new materials currently happens slowly, often driven by a researcher’s intuition,” says Kulik. “And the process involves trade-offs: You might find a material that has good light-emitting properties, but the metal at the center may be something like iridium, which is exceedingly rare and toxic.”

Researchers attempting to identify nontoxic, earth-abundant transition metal complexes with useful properties tend to pursue a limited set of features, with only modest assurance that they are on the right track. “People continue to iterate on a particular ligand, and get stuck in local areas of opportunity, rather than conduct large-scale discovery,” says Kulik.

To address these screening inefficiencies, Kulik’s team developed a new approach — a machine-learning based “recommender” that lets researchers know the optimal model for pursuing their search. Their description of this tool was the subject of a paper in Nature Computational Science in December.

“This method outperforms all prior approaches and can tell people when to use methods and when they’ll be trustworthy,” says Kulik.

The team, led by Duan, began by investigating ways to improve the conventional screening approach, density functional theory (DFT), which is based on computational quantum mechanics. He built a machine learning platform to determine how accurate density functional models were in predicting structure and behavior of transition metal molecules.

“This tool learned which density functionals were the most reliable for specific material complexes,” says Kulik. “We verified this by testing the tool against materials it had never encountered before, where it in fact chose the most accurate density functionals for predicting the material’s property.”

A critical breakthrough for the team was its decision to use the electron density — a fundamental quantum mechanical property of atoms — as a machine learning input. This unique identifier, as well as the use of a neural network model to carry out the mapping, creates a powerful and efficient aide for researchers who want to determine whether they are using the appropriate density functional for characterizing their target transition metal complex. “A calculation that would take days or weeks, which makes computational screening nearly infeasible, can instead take only hours to produce a trustworthy result.”
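
As a rough illustration of what such a recommender can look like in code, the sketch below trains a small neural network to map density-derived descriptors of a complex to the candidate functional expected to be most reliable for it. The descriptors, labels, and list of functionals are placeholders; the published tool is considerably more sophisticated.

```python
# Illustrative sketch only (not the published recommender): train a small neural
# network that maps electron-density-derived features of a complex to the
# density functional expected to be most accurate for it. Feature vectors,
# functional names, and labels here are placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier

functionals = ["B3LYP", "PBE", "SCAN", "M06-L"]           # example set of candidate functionals

rng = np.random.default_rng(1)
n_complexes, n_features = 200, 16
X = rng.normal(size=(n_complexes, n_features))            # stand-in density-derived descriptors
y = rng.integers(0, len(functionals), size=n_complexes)   # stand-in "most reliable functional" labels

recommender = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
recommender.fit(X, y)

new_complex = rng.normal(size=(1, n_features))            # descriptors of an unseen complex
print("Recommended functional:", functionals[recommender.predict(new_complex)[0]])
```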

Kulik has incorporated this tool into molSimplify, an open source code on the lab’s website, enabling researchers anywhere in the world to predict properties and model transition metal complexes.

Optimizing for multiple properties

In a related research thrust, which they showcased in a recent publication in JACS Au, Kulik’s group demonstrated an approach for quickly homing in on transition metal complexes with specific properties in a large chemical space.

Their work springboarded off a 2021 paper showing that agreement about the properties of a target molecule among a group of different density functionals significantly reduced the uncertainty of a model’s predictions.

In a first, Kulik’s team exploited this insight to demonstrate multi-objective optimization. In their study, they successfully identified molecules that are easy to synthesize, have significant light-absorbing properties, and use earth-abundant metals. They searched 32 million candidate materials, one of the largest spaces ever searched for this application. “We took apart complexes that are already in known, experimentally synthesized materials, and we recombined them in new ways, which allowed us to maintain some synthetic realism,” says Kulik.

After collecting DFT results on 100 compounds in this giant chemical domain, the group trained machine learning models to make predictions on the entire 32 million-compound space, with an eye to achieving their specific design goals. They repeated this process generation after generation to winnow out compounds with the explicit properties they wanted.
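
A schematic version of that loop, written with stand-in surrogate models, invented properties, and a far smaller candidate space than the study’s 32 million complexes, might look like the following; it is meant only to convey the shape of the generation-by-generation winnowing.

```python
# Sketch (assumptions throughout): label a small batch with expensive calculations,
# train surrogate models, score the full candidate space on two made-up objectives,
# keep the most promising candidates, and repeat generation after generation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
space = rng.normal(size=(100_000, 12))            # stand-in descriptors for candidate complexes

def run_dft(X):
    """Placeholder for expensive DFT labeling of a small batch (two toy properties)."""
    absorption = X[:, 0] + rng.normal(0, 0.1, len(X))
    synth_ease = -np.abs(X[:, 1]) + rng.normal(0, 0.1, len(X))
    return absorption, synth_ease

idx = rng.choice(len(space), size=100, replace=False)
X_lab = space[idx]
y_abs, y_syn = run_dft(X_lab)

for generation in range(3):
    m_abs = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_lab, y_abs)
    m_syn = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_lab, y_syn)
    # Combine objectives (equal weights here); the study balanced several design goals.
    score = m_abs.predict(space) + m_syn.predict(space)
    top = np.argsort(score)[-50:]                 # most promising candidates this generation
    new_abs, new_syn = run_dft(space[top])        # label them and fold into the training data
    X_lab = np.vstack([X_lab, space[top]])
    y_abs = np.concatenate([y_abs, new_abs])
    y_syn = np.concatenate([y_syn, new_syn])

print("Indices of the top 9 predicted candidates:", np.argsort(score)[-9:])
```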

“In the end we found nine of the most promising compounds, and discovered that the specific compounds we picked through machine learning contained pieces (ligands) that had been experimentally synthesized for other applications requiring optical properties, ones with favorable light absorption spectra,” says Kulik.

Applications with impact

While Kulik’s overarching goal involves overcoming limitations in computational modeling, her lab is taking full advantage of its own tools to streamline the discovery and design of new, potentially impactful materials.

In one notable example, “We are actively working on the optimization of metal–organic frameworks for the direct conversion of methane to methanol,” says Kulik. “This is a holy grail reaction that folks have wanted to catalyze for decades, but have been unable to do efficiently.” 

The possibility of a fast path for transforming a very potent greenhouse gas into a liquid that is easily transported and could be used as a fuel or a value-added chemical holds great appeal for Kulik. “It represents one of those needle-in-a-haystack challenges that multi-objective optimization and screening of millions of candidate catalysts is well-positioned to solve, an outstanding challenge that’s been around for so long.”

A new method to boost the speed of online databases

Hashing is a core operation in most online databases, like a library catalogue or an e-commerce website. A hash function generates codes that replace data inputs. Since these codes are shorter than the actual data, and usually a fixed length, this makes it easier to find and retrieve the original information.

However, because traditional hash functions assign codes essentially at random, two pieces of data can sometimes be hashed to the same value. This causes collisions — a search for one item points the user to many pieces of data with the same hash value. It takes much longer to find the right one, resulting in slower searches and reduced performance.

Certain types of hash functions, known as perfect hash functions, are designed to sort data in a way that prevents collisions. But they must be specially constructed for each dataset and take more time to compute than traditional hash functions.

Since hashing is used in so many applications, from database indexing to data compression to cryptography, fast and efficient hash functions are critical. So, researchers from MIT and elsewhere set out to see if they could use machine learning to build better hash functions.

They found that, in certain situations, using learned models instead of traditional hash functions could result in half as many collisions. Learned models are those that have been created by running a machine-learning algorithm on a dataset. Their experiments also showed that learned models were often more computationally efficient than perfect hash functions.

“What we found in this work is that in some situations we can come up with a better tradeoff between the computation of the hash function and the collisions we will face. We can increase the computational time for the hash function a bit, but at the same time we can reduce collisions very significantly in certain situations,” says Ibrahim Sabek, a postdoc in the MIT Data Systems Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Their research, which will be presented at the International Conference on Very Large Databases, demonstrates how a hash function can be designed to significantly speed up searches in a huge database. For instance, their technique could accelerate computational systems that scientists use to store and analyze DNA, amino acid sequences, or other biological information.

Sabek is co-lead author of the paper with electrical engineering and computer science (EECS) graduate student Kapil Vaidya. They are joined by co-authors Dominick Horn, a graduate student at the Technical University of Munich; Andreas Kipf, an MIT postdoc; Michael Mitzenmacher, professor of computer science at the Harvard John A. Paulson School of Engineering and Applied Sciences; and senior author Tim Kraska, associate professor of EECS at MIT and co-director of the Data Systems and AI Lab.

Hashing it out

Given a data input, or key, a traditional hash function generates a random number, or code, that corresponds to the slot where that key will be stored. To use a simple example, if there are 10 keys to be put into 10 slots, the function would generate a random integer between 1 and 10 for each input. It is highly probable that two keys will end up in the same slot, causing collisions.
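
With 10 keys and 10 slots, the chance of at least one collision is in fact above 99.9 percent. The toy snippet below, which uses Python’s built-in hash as a stand-in for a traditional hash function, makes that concrete (Python randomizes string hashing, so the exact slots vary between runs):

```python
# Toy illustration: distributing 10 keys into 10 slots with an ordinary hash
# almost always produces at least one collision.
keys = [f"key{i}" for i in range(10)]
slots = {}
for key in keys:
    slot = hash(key) % 10          # traditional hash: the slot looks random for each key
    slots.setdefault(slot, []).append(key)

collisions = {s: ks for s, ks in slots.items() if len(ks) > 1}
print(f"{len(collisions)} slot(s) hold more than one key:", collisions)
```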

Perfect hash functions provide a collision-free alternative. Researchers give the function some extra knowledge, such as the number of slots the data are to be placed into. Then it can perform additional computations to figure out where to put each key to avoid collisions. However, these added computations make the function harder to create and less efficient.

“We were wondering, if we know more about the data — that it will come from a particular distribution — can we use learned models to build a hash function that can actually reduce collisions?” Vaidya says.

A data distribution shows all possible values in a dataset, and how often each value occurs. The distribution can be used to calculate the probability that a particular value is in a data sample.

The researchers took a small sample from a dataset and used machine learning to approximate the shape of the data’s distribution, or how the data are spread out. The learned model then uses the approximation to predict the location of a key in the dataset.
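
The sketch below illustrates the idea under simplifying assumptions: keys drawn from a predictable, nearly evenly spaced distribution, a plain linear model standing in for the learned model, and Python’s built-in hash standing in for a traditional one. Because the learned mapping sends each key close to its true rank, collisions are rare; the traditional hash scatters keys at random and collides far more often.

```python
# Sketch of a learned hash under simplifying assumptions: approximate the data's
# distribution (key -> rank) from a small sample, then use the predicted rank as
# the slot. Keys here are nearly evenly spaced, i.e., "predictably distributed."
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 10_000
keys = np.sort(17.0 * np.arange(n) + rng.normal(0, 2.0, size=n))
n_slots = n

# Learn the key -> rank mapping (an approximate cumulative distribution) from a 1% sample.
sample_idx = np.sort(rng.choice(n, size=100, replace=False))
cdf_model = LinearRegression().fit(keys[sample_idx].reshape(-1, 1), sample_idx)

# Learned hash: predicted rank, rounded and clipped to a valid slot.
learned_slots = np.clip(np.rint(cdf_model.predict(keys.reshape(-1, 1))), 0, n_slots - 1).astype(int)

# Traditional hash: effectively random slot for each key.
traditional_slots = [hash(float(k)) % n_slots for k in keys]

print("collisions, learned model   :", n - len(set(learned_slots.tolist())))
print("collisions, traditional hash:", n - len(set(traditional_slots)))
```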

They found that learned models were easier to build and faster to run than perfect hash functions and that they led to fewer collisions than traditional hash functions if data are distributed in a predictable way. But if the data are not predictably distributed, because gaps between data points vary too widely, using learned models might cause more collisions.

“We may have a huge number of data inputs, and each one has a different gap between it and the next one, so learning that is quite difficult,” Sabek explains.

Fewer collisions, faster results

When data were predictably distributed, learned models could reduce the ratio of colliding keys in a dataset from 30 percent to 15 percent, compared with traditional hash functions. They were also able to achieve better throughput than perfect hash functions. In the best cases, learned models reduced the runtime by nearly 30 percent.

As they explored the use of learned models for hashing, the researchers also found that throughput was impacted most by the number of sub-models. Each learned model is composed of smaller linear models that approximate the data distribution. With more sub-models, the learned model produces a more accurate approximation, but it takes more time.

“At a certain threshold of sub-models, you get enough information to build the approximation that you need for the hash function. But after that, it won’t lead to more improvement in collision reduction,” Sabek says.

Building off this analysis, the researchers want to use learned models to design hash functions for other types of data. They also plan to explore learned hashing for databases in which data can be inserted or deleted. When data are updated in this way, the model needs to change accordingly, but changing the model while maintaining accuracy is a difficult problem.

“We want to encourage the community to use machine learning inside more fundamental data structures and operations. Any kind of core data structure presents us with an opportunity to use machine learning to capture data properties and get better performance. There is still a lot we can explore,” Sabek says.

This work was supported, in part, by Google, Intel, Microsoft, the National Science Foundation, the United States Air Force Research Laboratory, and the United States Air Force Artificial Intelligence Accelerator.

MIT professor to Congress: “We are at an inflection point” with AI

Government should not “abdicate” its responsibilities and leave the future path of artificial intelligence solely to Big Tech, Aleksander Mądry, the Cadence Design Systems Professor of Computing at MIT and director of the MIT Center for Deployable Machine Learning, told a Congressional panel on Wednesday. 

Rather, Mądry said, government should be asking questions about the purpose and explainability of the algorithms corporations are using, as a precursor to regulation, which he described as “an important tool” in ensuring that AI is consistent with society’s goals. If the government doesn’t start asking questions, then “I am extremely worried” about the future of AI, Mądry said in response to a question from Rep. Gerald Connolly.

Mądry, a leading expert on explainability and AI, was testifying at a hearing titled “Advances in AI: Are We Ready for a Tech Revolution?” before the House Subcommittee on Cybersecurity, Information Technology, and Government Innovation, a panel of the House Committee on Oversight and Accountability. The other witnesses at the hearing were former Google CEO Eric Schmidt, IBM Vice President Scott Crowder, and Center for AI and Digital Policy Senior Research Director Merve Hickok.

In her opening remarks, Subcommittee Chair Rep. Nancy Mace cited the book “The Age of AI: And Our Human Future” by Schmidt, Henry Kissinger, and Dan Huttenlocher, the dean of the MIT Schwarzman College of Computing. She also called attention to a March 3 op-ed in The Wall Street Journal by the three authors that summarized the book while discussing ChatGPT. Mace said her formal opening remarks had been entirely written by ChatGPT.

In his prepared remarks, Mądry raised three overarching points. First, he noted that AI is “no longer a matter of science fiction” or confined to research labs. It is out in the world, where it can bring enormous benefits but also poses risks.

Second, he said AI exposes us to “interactions that go against our intuition.” He said because AI tools like ChatGPT mimic human communication, people are too likely to unquestioningly believe what such large language models produce. In the worst case, Mądry warned, human analytical skills will atrophy. He also said it would be a mistake to regulate AI as if it were human — for example, by asking AI to explain its reasoning and assuming that the resulting answers are credible.

Finally, he said too little attention has been paid to problems that will result from the nature of the AI “supply chain” — the way AI systems are built on top of each other. At the base are general systems like ChatGPT, which can be developed by only a few companies because they are so expensive and complex to build. Layered on top of such systems are many AI systems designed to handle a particular task, like figuring out whom a company should hire. 

Mądry said this layering raised several “policy-relevant” concerns. First, the entire system of AI is subject to whatever vulnerabilities or biases are in the large system at its base, and is dependent on the work of a few, large companies. Second, the interaction of AI systems is not well-understood from a technical standpoint, making the results of AI even more difficult to predict or explain, and making the tools difficult to “audit.” Finally, the mix of AI tools makes it difficult to know whom to hold responsible when a problem results — who should be legally liable and who should address the concern.

In the written material submitted to the subcommittee, Mądry concluded, “AI technology is not particularly well-suited for deployment through complex supply chains,” even though that is exactly how it is being deployed.

Mądry ended his testimony by calling on Congress to probe AI issues and to be prepared to act. “We are at an inflection point in terms of what future AI will bring. Seizing this opportunity means discussing the role of AI, what exactly we want it to do for us, and how to ensure it benefits us all. This will be a difficult conversation but we do need to have it, and have it now,” he told the subcommittee.

The testimony of all the hearing witnesses and a video of the hearing, which lasted about two hours, are available at https://oversight.house.gov/hearing/advances-in-ai-are-we-ready-for-a-tech-revolution/.

Creating a versatile vaccine to take on Covid-19 in its many guises

One of the 12 labors of Hercules, according to ancient lore, was to destroy a nine-headed monster called the Hydra. The challenge was that when Hercules used his sword to chop off one of the monster’s heads, two would grow back in its place. He therefore needed an additional weapon, a torch, to vanquish his foe.

There are parallels between this legend and our three-years-and-counting battle with SARS-CoV-2, the virus that causes Covid-19. Every time scientists have thought they’d subdued one strain of the virus — be it alpha, beta, delta, or omicron — another variant or subvariant emerged a short while later.

For this reason, researchers at MIT and other institutions are preparing a new strategy against the virus — a novel vaccine that, unlike those in use today, could potentially counteract all variants of the virus. This property, called “pan-variance,” could circumvent the need for a different booster shot every time a new strain comes into circulation. In a paper published today in the journal Frontiers in Immunology, the team reports on experiments with mice that demonstrate the vaccine’s effectiveness in preventing death from Covid-19 infection.

Viral vaccines typically work by exposing the immune system to a small piece of the virus. That can create learned responses that protect people later when they’re exposed to the actual virus. The premise of standard Covid-19 vaccines, such as those produced by Moderna and Pfizer, is to activate the part of the immune system that releases neutralizing antibodies. They do this by providing cells with instructions (in the form of mRNA molecules) for making the spike protein — a protein found on the surface of the Covid-19 virus whose presence can trigger an immune reaction. “The problem with that approach is that the target keeps changing” — the spike protein itself can vary among different viral strains — “and that can make the vaccine ineffective,” says David Gifford, an MIT professor in electrical engineering and computer science and biological engineering, as well as a coauthor of the Frontiers paper.

He and his colleagues, accordingly, have taken a different approach, selecting a different target for their vaccine: activating the part of the immune system that unleashes “killer” T cells, which attack cells infected with the virus. A vaccine of this sort will not keep people from getting Covid-19, but it could keep them from getting very sick or dying.

A key innovation made by this group — which included researchers from MIT, the University of Texas, Boston University, Tufts University, Massachusetts General Hospital, and Acuitas Therapeutics — was to bring machine learning techniques into the vaccine design process. A critical aspect of that process involves determining which parts of SARS-CoV-2, which peptides (chains of amino acids that are the building blocks of proteins), should go into the vaccine. That entails sifting through thousands of peptides in the virus and picking out just 30 or so that should be incorporated.

But that decision has to take into account so-called HLA molecules — protein fragments on the surface of cells that serve as “billboards,” telling immune cells (which lack X-ray vision) what is going on inside other cells. The display of specific protein fragments can indicate, for instance, that a certain cell is infected by SARS-CoV-2 and should be gotten rid of.

Machine learning algorithms were used to solve a complicated set of “optimization problems,” notes Brandon Carter, a PhD student in MIT’s Department of Electrical Engineering and Computer Science, an affiliate of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), and a lead author of the new paper. The overriding goal is to select peptides that are present, or “conserved,” in all variants of the virus. But those peptides also need to be associated with HLA molecules that have a high likelihood of being displayed so they can alert the immune system. “You want this to happen in as many people as possible to get maximum population coverage from your vaccine,” Carter says. Furthermore, you want each individual to be covered multiple times by the vaccine, he adds. “This means that more than one peptide in the vaccine is predicted to be displayed by some HLA in each person.” Achieving these various objectives is a task that can be significantly expedited by machine learning tools.
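
The toy snippet below conveys the flavor of that selection step: a greedy loop that keeps adding conserved peptides until each HLA type in a made-up panel is predicted to display several of them. The peptides, display predictions, and coverage target are random placeholders and do not reflect the study’s data or its actual algorithm.

```python
# Toy sketch of the peptide-selection objective (not the study's algorithm): greedily
# pick conserved peptides so that as many HLA types as possible are predicted to
# display at least a few of them.
import numpy as np

rng = np.random.default_rng(4)
n_peptides, n_hla = 500, 30
conserved = rng.random(n_peptides) > 0.3            # is the peptide present in all variants?
displays = rng.random((n_peptides, n_hla)) > 0.9    # is the peptide predicted to be displayed by each HLA?

target_hits_per_hla = 3                             # want each HLA type covered several times
budget = 30                                         # roughly the size of the final vaccine

chosen = []
hits = np.zeros(n_hla)
candidates = [p for p in range(n_peptides) if conserved[p]]
for _ in range(budget):
    # Pick the conserved peptide that adds the most still-needed HLA coverage.
    gains = [np.minimum(hits + displays[p], target_hits_per_hla).sum() for p in candidates]
    best = candidates[int(np.argmax(gains))]
    chosen.append(best)
    hits = np.minimum(hits + displays[best], target_hits_per_hla)
    candidates.remove(best)

coverage = (hits >= target_hits_per_hla).mean()
print(f"Selected {len(chosen)} peptides; {coverage:.0%} of HLA types covered {target_hits_per_hla}+ times")
```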

While that touches on the theoretical end of this project, the latest results came from experiments carried out by collaborators at the University of Texas Medical Branch in Galveston, which showed a strong immune response in mice given the vaccine. The mice in this experiment did not die; they were “humanized,” meaning that they had an HLA molecule found in human cells. “This study,” Carter says, “offers proof in a living system, an actual mouse, that the vaccines we devised using machine learning can afford protection from the Covid virus.” Gifford characterizes their work as “the first experimental evidence that a vaccine formulated in this fashion would be effective.”

Paul Offit, a professor of pediatrics in the Division of Infectious Diseases at Children’s Hospital of Philadelphia, finds the results encouraging. “A lot of people wonder about what approaches will be used to make Covid-19 vaccines in the future,” Offit says. “Given that T cells are critical in protection against severe Covid-19, future vaccines that focus on inducing the broadest T cell responses will be an important step forward in the next generation of vaccines.”

More animal studies — and eventual human studies — would have to be done before this work can usher in the “next generation of vaccines.” The fact that 24 percent of the lung cells in vaccinated mice were T cells, Gifford says, “showed that their immune systems were poised to fight viral infection.” But one has to be careful to avoid too strong of an immune response, he cautions, so as not to cause lung damage.

Other questions abound. Should T-cell vaccines be used instead of, or in combination with, standard spike protein vaccines? While it might be possible to enhance existing vaccines by including a T-cell component, Gifford says, “putting two things together may not be strictly additive, as one part of the vaccine could mask the other.”

Nevertheless, he and his colleagues believe their T-cell vaccine has the potential to help immunocompromised individuals who cannot produce neutralizing antibodies and thus may not benefit from traditional Covid vaccines. Their vaccine may also alleviate suffering from “long Covid” in people who continue to harbor reservoirs of the virus well after their initial infection.

The mechanism behind current flu vaccines, like current Covid-19 vaccines, is to induce neutralizing antibodies, but those vaccines don’t always work for different influenza strains. Carter sees potential for flu vaccines based on a T-cell response, “which may prove to be more effective, providing broader coverage, because of their pan-variance.”

Nor are the methods they are developing limited to Covid-19 or the flu, he maintains, as they might someday be applied to cancer. Gifford agrees, saying that a T-cell vaccine — designed to maximize immune protection both within an individual and among the greatest number of individuals — could become a key asset in the fight against cancer. “That’s not within the scope of our present study,” he says, “but it could be the subject of future work.”

Other MIT contributors to the work were Ge Liu and Alexander Dimitrakakis. The work was supported, in part, by Schmidt Futures and a C3.ai grant to David Gifford.

New insights into training dynamics of deep classifiers

A new study from researchers at MIT and Brown University characterizes several properties that emerge during the training of deep classifiers, a type of artificial neural network commonly used for classification tasks such as image classification, speech recognition, and natural language processing.

The paper, “Dynamics in Deep Classifiers trained with the Square Loss: Normalization, Low Rank, Neural Collapse and Generalization Bounds,” published today in the journal Research, is the first of its kind to theoretically explore the dynamics of training deep classifiers with the square loss and how properties such as rank minimization, neural collapse, and dualities between the activation of neurons and the weights of the layers are intertwined.

In the study, the authors focused on two types of deep classifiers: fully connected deep networks and convolutional neural networks (CNNs).

A previous study examined the structural properties that develop in large neural networks at the final stages of training. That study focused on the last layer of the network and found that deep networks trained to fit a training dataset will eventually reach a state known as “neural collapse.” When neural collapse occurs, the network maps multiple examples of a particular class (such as images of cats) to a single template of that class. Ideally, the templates for each class should be as far apart from each other as possible, allowing the network to accurately classify new examples.

An MIT group based at the MIT Center for Brains, Minds and Machines studied the conditions under which networks can achieve neural collapse. Deep networks that have the three ingredients of stochastic gradient descent (SGD), weight decay regularization (WD), and weight normalization (WN) will display neural collapse if they are trained to fit their training data. The MIT group has taken a theoretical approach — as compared to the empirical approach of the earlier study — proving that neural collapse emerges from the minimization of the square loss using SGD, WD, and WN.

Co-author and MIT McGovern Institute postdoc Akshay Rangamani states, “Our analysis shows that neural collapse emerges from the minimization of the square loss with highly expressive deep neural networks. It also highlights the key roles played by weight decay regularization and stochastic gradient descent in driving solutions towards neural collapse.”

Weight decay is a regularization technique that prevents the network from over-fitting the training data by reducing the magnitude of the weights. Weight normalization scales the weight matrices of a network so that they have a similar scale. Low rank refers to a property of a matrix where it has a small number of non-zero singular values. Generalization bounds offer guarantees about the ability of a network to accurately predict new examples that it has not seen during training.
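
For readers who want to see those ingredients together, here is a small, self-contained sketch with toy data, assumed hyperparameters, and full-batch updates for brevity. It trains a network with the square loss using SGD, weight decay, and weight normalization, then checks how tightly the last-layer features of each class gather around their class means, which is the hallmark of neural collapse.

```python
# Toy sketch (assumed hyperparameters, synthetic data, full-batch updates):
# train with the square loss using SGD, weight decay (WD), and weight
# normalization (WN), then measure how tightly last-layer features of each
# class cluster around their class mean relative to the spread between classes.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n_classes, dim, n_per_class = 4, 20, 200

# Synthetic data: one Gaussian blob per class.
means = 3.0 * torch.randn(n_classes, dim)
X = torch.cat([means[c] + torch.randn(n_per_class, dim) for c in range(n_classes)])
y = torch.arange(n_classes).repeat_interleave(n_per_class)
targets = F.one_hot(y, n_classes).float()

feature_net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
classifier = nn.utils.weight_norm(nn.Linear(64, n_classes))            # weight normalization
model = nn.Sequential(feature_net, classifier)
opt = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=5e-4)  # SGD + weight decay

for step in range(2000):
    opt.zero_grad()
    loss = F.mse_loss(model(X), targets)                               # square loss
    loss.backward()
    opt.step()

# Neural-collapse check: within-class feature scatter vs. between-class scatter.
with torch.no_grad():
    feats = feature_net(X)
    class_means = torch.stack([feats[y == c].mean(0) for c in range(n_classes)])
    within = torch.stack([feats[y == c].var(0).sum() for c in range(n_classes)]).mean()
    between = class_means.var(0).sum()
    print(f"final loss {loss.item():.4f}, within/between scatter {(within / between).item():.3f}")
```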

The authors found that the same theoretical observation that predicts a low-rank bias also predicts the existence of an intrinsic SGD noise in the weight matrices and in the output of the network. This noise is not generated by the randomness of the SGD algorithm but by an interesting dynamic trade-off between rank minimization and fitting of the data, which provides an intrinsic source of noise similar to what happens in dynamic systems in the chaotic regime. Such a random-like search may be beneficial for generalization because it may prevent over-fitting.

“Interestingly, this result validates the classical theory of generalization showing that traditional bounds are meaningful. It also provides a theoretical explanation for the superior performance in many tasks of sparse networks, such as CNNs, with respect to dense networks,” comments co-author and MIT McGovern Institute postdoc Tomer Galanti. In fact, the authors prove new norm-based generalization bounds for CNNs with localized kernels, that is, networks with sparse connectivity in their weight matrices.

In this case, generalization can be orders of magnitude better than for densely connected networks. This result validates the classical theory of generalization, showing that its bounds are meaningful, and goes against a number of recent papers expressing doubts about past approaches to generalization. It also provides a theoretical explanation for the superior performance of sparse networks, such as CNNs, with respect to dense networks. Thus far, the fact that CNNs and not dense networks represent the success story of deep networks has been almost completely ignored by machine learning theory. Instead, the theory presented here suggests that this is an important insight into why deep networks work as well as they do.

“This study provides one of the first theoretical analyses covering optimization, generalization, and approximation in deep networks and offers new insights into the properties that emerge during training,” says co-author Tomaso Poggio, the Eugene McDermott Professor at the Department of Brain and Cognitive Sciences at MIT and co-director of the Center for Brains, Minds and Machines. “Our results have the potential to advance our understanding of why deep learning works as well as it does.”

Large language models are biased. Can logic help save them?

Turns out, even language models “think” they’re biased. When ChatGPT was prompted, the response was as follows: “Yes, language models can have biases, because the training data reflects the biases present in society from which that data was collected. For example, gender and racial biases are prevalent in many real-world datasets, and if a language model is trained on that, it can perpetuate and amplify these biases in its predictions.” A well-known but dangerous problem.

Humans (typically) can dabble with both logical and stereotypical reasoning when learning. Still, language models mainly mimic the latter, an unfortunate narrative we’ve seen play out ad nauseam when the ability to employ reasoning and critical thinking is absent. So would injecting logic into the fray be enough to mitigate such behavior? 

Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) had an inkling that it might, so they set off to examine whether logic-aware language models could significantly avoid more harmful stereotypes. They trained a language model to predict the relationship between two sentences, based on context and semantic meaning, using a dataset with labels for text snippets detailing whether a second phrase “entails,” “contradicts,” or is neutral with respect to the first one. Using this natural language inference dataset, they found that the newly trained models were significantly less biased than other baselines, without any extra data, data editing, or additional training algorithms.

For example, with the premise “the person is a doctor” and the hypothesis “the person is masculine,” using these logic-trained models, the relationship would be classified as “neutral,” since there’s no logic that says the person is a man. With more common language models, two sentences might seem to be correlated due to some bias in training data, like “doctor” might be pinged with “masculine,” even when there’s no evidence that the statement is true. 
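
In code, querying an entailment model on that example looks like the sketch below. It uses an off-the-shelf natural language inference model from the Hugging Face hub rather than the CSAIL group’s own model; the paper’s logic-trained model would be expected to label this premise-hypothesis pair as neutral.

```python
# Illustration only: an off-the-shelf NLI model (roberta-large-mnli), not the
# 350-million-parameter model from the paper, classifying a premise-hypothesis
# pair as contradiction, neutral, or entailment.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"                    # public baseline NLI model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "The person is a doctor."
hypothesis = "The person is masculine."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

# A logic-trained model, as described above, should favor "neutral" for this pair.
for i, p in enumerate(probs):
    print(f"{model.config.id2label[i]}: {p.item():.2f}")
```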

At this point, the omnipresent nature of language models is well-known: Applications in natural language processing, speech recognition, conversational AI, and generative tasks abound. While it is not a nascent field of research, growing pains can take a front seat as language models increase in complexity and capability.

“Current language models suffer from issues with fairness, computational resources, and privacy,” says MIT CSAIL postdoc Hongyin Luo, the lead author of a new paper about the work. “Many estimates say that the CO2 emission of training a language model can be higher than the lifelong emission of a car. Running these large language models is also very expensive because of the amount of parameters and the computational resources they need. With privacy, state-of-the-art language models developed by places like ChatGPT or GPT-3 have their APIs where you must upload your language, but there’s no place for sensitive information regarding things like health care or finance. To solve these challenges, we proposed a logical language model that we qualitatively measured as fair, is 500 times smaller than the state-of-the-art models, can be deployed locally, and with no human-annotated training samples for downstream tasks. Our model uses 1/400 the parameters compared with the largest language models, has better performance on some tasks, and significantly saves computation resources.” 

This model, which has 350 million parameters, outperformed some very large-scale language models with 100 billion parameters on logic-language understanding tasks. The team evaluated, for example, popular BERT pretrained language models against their “textual entailment” ones on stereotype, profession, and emotion bias tests. The entailment models showed significantly lower bias than the others, while preserving language modeling ability. The “fairness” was evaluated with something called ideal context association (iCAT) tests, where higher iCAT scores mean fewer stereotypes. The model had iCAT scores above 90 percent, while other strong language understanding models scored between 40 and 80.

Luo wrote the paper alongside MIT Senior Research Scientist James Glass. They will present the work at the Conference of the European Chapter of the Association for Computational Linguistics in Croatia. 

Unsurprisingly, the original pretrained language models the team examined were teeming with bias, confirmed by a slew of reasoning tests demonstrating how professional and emotion terms are significantly biased to the feminine or masculine words in the gender vocabulary. 

With professions, a language model (which is biased) thinks that “flight attendant,” “secretary,” and “physician’s assistant” are feminine jobs, while “fisherman,” “lawyer,” and “judge” are masculine. Concerning emotions, a language model thinks that “anxious,” “depressed,” and “devastated” are feminine.

While we may still be far away from a neutral language model utopia, this research continues that pursuit. Currently, the model is just for language understanding, so it’s based on reasoning among existing sentences. Unfortunately, it can’t generate sentences for now, so the next step for the researchers would be targeting the uber-popular generative models built with logical learning to ensure more fairness with computational efficiency.

“Although stereotypical reasoning is a natural part of human recognition, fairness-aware people conduct reasoning with logic rather than stereotypes when necessary,” says Luo. “We show that language models have similar properties. A language model without explicit logic learning makes plenty of biased reasoning, but adding logic learning can significantly mitigate such behavior. Furthermore, with demonstrated robust zero-shot adaptation ability, the model can be directly deployed to different tasks with more fairness, privacy, and better speed.”

Robot armies duke it out in Battlecode’s epic on-screen battles

In a packed room in MIT’s Stata Center, hundreds of digital robots collide across a giant screen projected at the front of the room. A crowd of students in the audience gasps and cheers as the battle’s outcome hangs in the balance. In an upper corner of the screen, the people who have programmed the robot armies’ strategies narrate the action in real time.

This isn’t the latest e-sports event, it’s MIT’s long-running Battlecode competition. Open to student teams around the world, Battlecode tasks participants with writing the code to program entire armies — not just individual bots — before they duke it out. The resulting dramatic, often-unexpected outcomes are decided based on whose programming strategy aligns best with the parameters of the game and the circumstances of the battle.

The unique competition pushes teams to spend hours coding and refining their armies in a quest for the perfectly crafted game plan. Since 2007, the competition has involved high school and college students from around the world, upping the intellectual ante as people with diverse backgrounds tackle the open-ended challenge.

“We change it every year, so there’s new rules, new types of robots, new actions they can do against each other, and a new goal for how to win,” Battlecode co-president and MIT sophomore Serena Li said before this year’s final match on Feb. 5. “The strategies change every year because the game changes.”

MIT was especially well-represented in this year’s final tournament. Of the 16 finalist teams, three were made up entirely of MIT students, while another included three MIT students and one Yale University student. The winners were a pair of students from Carnegie Mellon University.

Although this year’s competition is officially closed, the hard work and long hours required for success in Battlecode often create a bond among participants that lasts far beyond the tight timeline of the competition.

“The spirit of the competitors is what makes the program so great,” fellow co-president and MIT junior Andy Wang says. “There’s always teams looking to create more and more advanced robots and heuristics to solve this thing, and people are putting in all this work and dedication, only to be matched by competitors doing the same thing. It creates a really incredible atmosphere every year.”

Setting the code

Since the early 2000s, Battlecode has given students a specified amount of time and computing power to write a program for armies of bots that battle in a video-game-style tournament.

When the program kicks off in January, participants are given the Battlecode software and the year’s game parameters. Throughout Independent Activities Period (IAP), which MIT students can take for course credit, participants learn to use artificial intelligence, pathfinding, distributed algorithms, and more to make the best possible strategy.

“This is a game that’s too complicated to play manually,” explains MIT senior Isaac Liao, who won the main tournament last year. “You can’t control every unit because there are hundreds of them and you’re going for 2,000 turns.”

Battlecode includes tracks for first-time MIT participants, U.S. college students (including MIT students who have competed before), international college students, and high school teams.

“The ability for anyone to compete really opens up the opportunity for everyone to try their skills on an even playing field,” Wang says. “High schoolers and international students do really well, and it’s cool because a lot of these teams will stick together and keep contacting each other even after high school.”

Following a month of refining their strategies, teams begin competing in tournament matches that lead up to the final event. Battlecode’s organizers fly in the international finalists and set them up in a hotel, where they often meet in person for the first time after weeks of online back and forth. Liao, who has competed for several years, says he still keeps in touch with former competitors.

The final battle is played out in front of a live audience at MIT, with the top teams receiving cash prizes.

Over the years, there have been many memorable events. One year an MIT student broke the game by figuring out how to leave the software space designed for contestants. (He kindly informed organizers of the flaw before the actual tournament). Another year organizers threw a new variable into the battles: zombies. A team made the finals by hiding a bot in the corner of the screen and letting the rest of the bots turn to zombies to consume the opposition.

This year’s total prize pool was over $20,000. Organizers made about 200 T-shirts to give out before the final event and quickly ran out.

The unpredictable final match makes for a tense scene as competitors are given a mic to explain the strategies unfolding on screen in real time.

Wang says organizing the event, which has increased in complexity with the inclusion of international players, is hectic but fun.

“The Battlecode members are all really friendly and welcoming, and it’s a great time running the actual event and meeting all these new people and seeing this project you work on all semester come together,” Wang says.

Indeed, the ultimate legacy of Battlecode might be the friendships formed through the intense competition.

“A lot of teams are made of students who haven’t worked together too closely,” Wang says. “They found each other through the team-building process or they know each other casually, but a lot of them end up sticking together and go on to do a lot of things together. It’s a way to form these lifetime acquaintances.”

Skills that last a lifetime

A number of current and former players noted that the skills required to succeed in Battlecode transfer well to startups.

“Rather than other competitions where it’s just you in front of a computer, there’s a lot to be gained from teamwork in Battlecode,” says senior and former president Jerry Mao. “That really transfers into industry and into the real world.”

This year’s sponsors included Dropbox and Regression Games, which were both founded by past participants of Battlecode. Another past sponsor, Amplitude, was founded by Spenser Skates ’10 and Curtis Liu ’10, who met during Battlecode and have been working together ever since.

“There are a lot of parallels between what you’re trying to do in Battlecode and what you end up having to do in the early stages of a startup,” Liu says. “You have limited resources, limited time, and you’re trying to accomplish a goal. What we found is trying a lot of different things, putting our ideas out there and testing them with real data, really helped us focus on the things that actually mattered. That method of iteration and continual improvement set the foundation for how we approach building products and startups.”

Beyond startups, participants and organizers said Battlecode can prepare students for a number of careers, from quantitative trading to training AI systems to conducting research. Perhaps that’s why students keep coming back.

“The most important skills for success are a lot of iteration and perseverance and willingness to adapt on the fly — basically to change how you’re working quickly,” Wang says. “You see what other teams are doing and you’re not just competing but also talking to them, studying what they’re doing well, and adding their strengths to your bots. I think those skills are important anywhere, whether you’re building a startup or doing research or working in a big company.”

Phiala Shanahan is seeking fundamental answers about our physical world

In 2010, Phiala Shanahan was an undergraduate at the University of Adelaide, wrapping up a degree in computational physics, when she heard of an unexpected discovery in particle physics. The news had nothing to do with any of the rare, exotic particles that physicists were searching for at the time. Rather, the revelation revolved around the mundane, ubiquitous proton.

That year, scientists had measured the proton’s radius and discovered that the particle was ever so slightly smaller than what previous experiments had reported. This new measurement threw into question what physicists had assumed was well-understood: What exactly was the size of the proton?

What would then be coined the “proton radius puzzle” immediately drew Shanahan’s interest, prompting a more fundamental question: What else don’t we know about this seemingly straightforward particle?

“Protons and neutrons make up 99 percent of visible matter in the universe,” she says. “I assumed that, just like the mass of the proton is known very precisely, the size would be too. That was one moment fairly early on when I realized, there really are fundamental questions that we still have no answers to.”

The proton puzzle was one impetus that propelled Shanahan to pursue theoretical particle and nuclear physics. Today, she is the Class of 1957 Career Development Associate Professor of Physics at MIT, having recently received tenure at the Institute.

In her research, she seeks a fundamental understanding of our physical world. Using the equations of the Standard Model of Physics as her guide, she is looking for fundamental bridges — concrete, mathematical connections between the behavior of elementary particles, such as the quarks and gluons within a single proton, and the interactions between multiple protons, which coalesce into the visible matter we see around us.

Tracing these fundamental connections will ultimately help physicists recognize breaks in our understanding, such as instances when a proton interacts with dark matter, which is thought to make up 85 percent of the total mass in the universe and for which the Standard Model — our best representation of our physical understanding — has no explanation.

“We are trying to understand how you can bridge understanding from our most fundamental theory — this beautiful predictive theory of fundamental particles — all the way up to nuclear physics,” Shanahan says.

Up for a challenge

Shanahan was born in Sydney, Australia, and spent most of her childhood and early education in the suburbs of Adelaide, where she earned a scholarship to attend an all-girls school. She quickly took to studying math and science, learning new languages, and playing a variety of instruments.

“At the time, I don’t think you could’ve picked me for a scientist rather than a musician or a linguist,” she says.

After high school, Shanahan stayed local, attending the University of Adelaide, where she took classes in Latin and ancient Greek, and played in a cover band on the weekends. She also pursued a bachelor’s degree in high-performance computational physics, which she chose almost as a personal challenge.

“It was the hardest degree to get into at the time, and I thought, ‘I want something challenging,’” she recalls.

Her interest in physics began to crystallize after hearing of the proton radius puzzle one day in a research seminar. She also discovered that she enjoyed research, after accepting an offer to work as a summer assistant in the lab of her undergraduate advisor, Anthony Thomas, who specialized in nuclear physics. She continued working with Thomas through graduate school, also at the University of Adelaide, where she earned a PhD in theoretical nuclear physics.

“I’d already been caught by this idea that we didn’t know nearly as much about the proton as I thought, so my PhD was about understanding in great detail the structure of the proton and what we could add to that understanding,” Shanahan says.

A direct trace

After finishing her education in Australia, Shanahan looked to take her next step, outside the country. With funds from a traveling fellowship, she planned out a two-month tour of physics departments and facilities across Europe and the United States, including at MIT. The experience was a whirlwind, as Shanahan was introduced at every stop to new ideas and avenues of research.

“The mind expansion was really exciting,” she says.

When she came home to Australia, she found she was keen to keep on the research track, and to live abroad. Soon, she packed her bags for a postdoc position in MIT’s Department of Physics. She arrived at the Institute in 2015 and spent the next two years researching the interactions of gluons, the elementary, force-carrying particles that bind to quarks to form a proton.

“It’s very difficult to measure experimentally certain aspects of the gluon structure of a proton,” Shanahan says. “I wanted to see what we could calculate, which at the time was quite a new thing.”

Until then, Shanahan considered herself a mostly “pen-and-paper” theorist. But she wanted to see how far the behavior of gluons — interactions known as quantum chromodynamics — could be directly traced using the equations of the Standard Model. To do so would require large-scale numerical calculations, and she found herself learning a new set of computational tools and exploring ways to search for fundamental interactions among gluons using machine learning — a novel approach that Shanahan was one of the first to adopt, and which she continues to pursue today.

A creative space

After finishing her postdoc, she spent a year as a faculty member at the College of William and Mary and as a senior staff scientist at the Thomas Jefferson National Accelerator Facility before returning to MIT in 2018 as an assistant professor in the Center for Theoretical Physics. Before she put down campus roots, Shanahan spent the fall semester at the Perimeter Institute for Theoretical Physics in Ontario, Canada, as part of a fellowship that supports female physicists. The program provided food and board for fellows, and also delivered meals to their offices — all with the goal of freeing the physicists to focus on their work.

“That program really gave me the launchpad for what became my research agenda as a new faculty member,” she says. “It all started from that quiet time where I could actually think for hours at a time. That was incredibly valuable, and it gave me the space to be creative.”

At MIT, she continues to study the equations of the Standard Model to understand the quantum dynamics of gluons and quarks, and the structure of the proton, as well as the interactions that underpin nuclear physics, and what the fundamental behavior of certain nuclei can tell us about the conditions of the early universe.

She is also focusing on nuclei that are used in dark matter experiments, and is looking to map out the space of nuclear interactions that can be explained concretely through the Standard Model. Any interactions outside of this fundamentally derived space could then be a sign of dark matter or other phenomena beyond what the Standard Model can explain.

“Now my research group is going in all sorts of directions,” she says. “We are using every tool at our disposal, from pen-and-paper calculations, to designing and running new algorithms on supercomputers, to really understand new aspects of the structure and interactions of the matter that makes up our universe.”