The curse of variety in transportation systems

Cathy Wu has always delighted in systems that run smoothly. In high school, she designed a project to optimize the best route for getting to class on time. Her research interests and career track are evidence of a propensity for organizing and optimizing, coupled with a strong sense of responsibility to contribute to society instilled by her parents at a young age.

As an undergraduate at MIT, Wu explored domains like agriculture, energy, and education, eventually homing in on transportation. “Transportation touches each of our lives,” she says. “Every day, we experience the inefficiencies and safety issues as well as the environmental harms associated with our transportation systems. I believe we can and should do better.”

But doing so is complicated. Consider the long-standing issue of traffic systems control. Wu explains that it is not one problem, but more accurately a family of control problems impacted by variables like time of day, weather, and vehicle type — not to mention the types of sensing and communication technologies used to measure roadway information. Every differentiating factor introduces an exponentially larger set of control problems. There are thousands of control-problem variations and hundreds, if not thousands, of studies and papers dedicated to each problem. Wu refers to the sheer number of variations as the curse of variety — and it is hindering innovation.
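To make the growth concrete, here is a minimal sketch of how the number of scenarios multiplies as factors are added. The factor names and levels are hypothetical, chosen only for illustration; they are not taken from Wu's work.

```python
from itertools import product

# Hypothetical factors and levels (illustrative assumptions only).
factors = {
    "time_of_day": ["am_peak", "midday", "pm_peak", "night"],
    "weather": ["clear", "rain", "snow"],
    "vehicle_mix": ["cars_only", "cars_and_trucks", "with_buses"],
    "sensing": ["loop_detectors", "cameras", "connected_vehicles"],
}


def count_variations(factors):
    """The number of distinct scenarios is the product of the level counts."""
    total = 1
    for levels in factors.values():
        total *= len(levels)
    return total


# Enumerate every combination explicitly; each tuple is one control problem.
scenarios = list(product(*factors.values()))

print(count_variations(factors))  # 4 * 3 * 3 * 3 = 108
```

Adding one more factor with three levels would triple the count again, which is why the family of problems grows so quickly.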

“To prove that a new control strategy can be safely deployed on our streets can take years. As time lags, we lose opportunities to improve safety and equity while mitigating environmental impacts. Accelerating this process has huge potential,” says Wu.  

Which is why she and her group in the MIT Laboratory for Information and Decision Systems are devising machine learning-based methods to solve not just a single control problem or a single optimization problem, but families of control and optimization problems at scale. “In our case, we’re examining emerging transportation problems that people have spent decades trying to solve with classical approaches. It seems to me that we need a different approach.”

Optimizing intersections

Currently, Wu’s largest research endeavor is called Project Greenwave. There are many sectors that directly contribute to climate change, but transportation is responsible for the largest share of greenhouse gas emissions — 29 percent, of which 81 percent is due to land transportation. And while much of the conversation around mitigating environmental impacts related to mobility is focused on electric vehicles (EVs), electrification has its drawbacks. EV fleet turnover is time-consuming (“on the order of decades,” says Wu), and limited global access to the technology presents a significant barrier to widespread adoption.

Wu’s research, on the other hand, addresses traffic control problems by leveraging deep reinforcement learning. Specifically, she is looking at traffic intersections — and for good reason. In the United States alone, there are more than 300,000 signalized intersections where vehicles must stop or slow down before re-accelerating. And every re-acceleration burns fossil fuels and contributes to greenhouse gas emissions.

Highlighting the magnitude of the issue, Wu says, “We have done preliminary analysis indicating that up to 15 percent of land transportation CO2 is wasted through energy spent idling and re-accelerating at intersections.”
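A rough back-of-the-envelope calculation shows how these percentages combine. It is only a sketch: it assumes the shares can simply be multiplied, even though the first figure refers to greenhouse gases and the last to CO2, and it treats the preliminary "up to 15 percent" as an upper bound.

```python
# Figures quoted in the article.
transport_share_of_ghg = 0.29     # transportation's share of greenhouse gas emissions
land_share_of_transport = 0.81    # portion of that share due to land transportation
intersection_waste_upper = 0.15   # "up to 15 percent", a preliminary estimate

# Assumption: shares compose multiplicatively.
land_share = transport_share_of_ghg * land_share_of_transport
intersection_share_upper = land_share * intersection_waste_upper

print(f"{land_share:.1%}")                # 23.5%
print(f"{intersection_share_upper:.1%}")  # 3.5%
```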

To date, she and her group have modeled 30,000 different intersections across 10 major metropolitan areas in the United States. That is 30,000 different configurations, roadway topologies (e.g., grade of road or elevation), different weather conditions, and variations in travel demand and fuel mix. Each intersection, together with its corresponding scenarios, represents a unique multi-agent control problem.

Wu and her team are devising techniques that can solve not just one, but a whole family of problems comprising tens of thousands of scenarios. Put simply, the idea is to coordinate the timing of vehicles so they arrive at intersections when traffic lights are green, thereby eliminating the cycle of stopping, starting, and re-accelerating. Along the way, they are building an ecosystem of tools, datasets, and methods to enable roadway interventions and impact assessments of strategies to significantly reduce carbon-intense urban driving.
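The arrive-on-green idea can be illustrated for a single vehicle and a single signal. The sketch below is a simple kinematic calculation, not the deep reinforcement learning approach the group uses; the function name, parameters, and speed limits are all assumptions made for illustration.

```python
def advisory_speed(distance_m, time_to_green_s, green_duration_s,
                   v_min=3.0, v_max=15.0):
    """Return a constant speed (m/s) that reaches the stop line during green.

    Returns None if no speed within [v_min, v_max] arrives inside the
    green window. All parameters are hypothetical inputs.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")

    earliest = distance_m / v_max   # arrival time at the highest allowed speed
    latest = distance_m / v_min     # arrival time at the lowest allowed speed
    window_start = time_to_green_s
    window_end = time_to_green_s + green_duration_s

    # Arrive as early as possible while still hitting the green window.
    arrival = max(earliest, window_start)
    if arrival > min(latest, window_end):
        return None
    return distance_m / arrival


# 200 m from a light that turns green in 20 s and stays green for 30 s.
print(advisory_speed(200, 20, 30))  # 10.0
```

A vehicle that holds 10 m/s here never has to stop, whereas one that drives at the 15 m/s limit would arrive early, brake, and re-accelerate.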

Their collaborator on the project is the Utah Department of Transportation, which Wu says has played an essential role, in part by sharing data and practical knowledge that she and her group otherwise would not have been able to access publicly.

“I appreciate industry and public sector collaborations,” says Wu. “When it comes to important societal problems, one really needs grounding with practitioners. One needs to be able to hear the perspectives in the field. My interactions with practitioners expand my horizons and help ground my research. You never know when you’ll hear the perspective that is the key to the solution, or perhaps the key to understanding the problem.”

Finding the best routes

In a similar vein, she and her research group are tackling large coordination problems, such as vehicle routing. “Every day, delivery trucks route more than a hundred thousand packages for the city of Boston alone,” says Wu. Accomplishing the task requires, among other things, figuring out which trucks to use, which packages to deliver, and the order in which to deliver them as efficiently as possible. If and when the trucks are electrified, they will need to be charged, adding another wrinkle to the process and further complicating route optimization.

The vehicle routing problem, and therefore the scope of Wu’s work, extends beyond truck routing for package delivery. Ride-hailing cars may need to pick up objects as well as drop them off; and what if delivery is done by bicycle or drone? In partnership with Amazon, for example, Wu and her team addressed routing and path planning for hundreds of robots (up to 800) in their warehouses.

Every variation requires custom heuristics that are expensive and time-consuming to develop. Again, this is really a family of problems — each one complicated, time-consuming, and currently unsolved by classical techniques — and they are all variations of a central routing problem. The curse of variety meets operations and logistics.
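For a sense of what a hand-written heuristic looks like, consider the classic nearest-neighbor rule for a single truck: always drive to the closest undelivered stop. This is a textbook baseline shown for illustration, not one of the heuristics Wu's group learns, and the coordinates are made up.

```python
import math


def nearest_neighbor_route(depot, stops):
    """Greedy heuristic: repeatedly visit the closest unvisited stop.

    Returns the visiting order as a list of indices into `stops`.
    """
    remaining = list(range(len(stops)))
    order = []
    current = depot
    while remaining:
        nxt = min(remaining, key=lambda i: math.dist(current, stops[i]))
        remaining.remove(nxt)
        order.append(nxt)
        current = stops[nxt]
    return order


def route_length(depot, stops, order):
    """Total distance of depot -> stops in `order` -> depot."""
    path = [depot] + [stops[i] for i in order] + [depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


# Hypothetical depot and delivery locations on a line.
depot = (0.0, 0.0)
stops = [(5.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

order = nearest_neighbor_route(depot, stops)
print(order)                              # [1, 2, 0]
print(route_length(depot, stops, order))  # 10.0
```

Adding charging stops, pickups, or multiple vehicles would each require reworking a rule like this by hand, which is the cost the learned approach aims to avoid.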

By combining classical approaches with modern deep-learning methods, Wu is looking for a way to automatically identify heuristics that can effectively solve all of these vehicle routing problems. So far, her approach has proved successful.

“We’ve contributed hybrid learning approaches that take existing solution methods for small problems and incorporate them into our learning framework to scale and accelerate that existing solver for large problems. And we’re able to do this in a way that can automatically identify heuristics for specialized variations of the vehicle routing problem.” The next step, says Wu, is applying a similar approach to multi-agent robotics problems in automated warehouses.

Wu and her group are making big strides, in part due to their dedication to use-inspired basic research. Rather than applying known methods or science to a problem, they develop new methods, new science, to address problems. The methods she and her team employ are necessitated by societal problems with practical implications. The inspiration for the approach? None other than Louis Pasteur, whose research style gave its name to “Pasteur’s Quadrant,” the term Donald Stokes coined for use-inspired basic research. Anthrax was decimating the sheep population, and Pasteur wanted to better understand why and what could be done about it. The tools of the time could not solve the problem, so he invented a new field, microbiology, not out of curiosity but out of necessity.

AI model can help determine where a patient’s cancer arose

For a small percentage of cancer patients, doctors are unable to determine where their cancer originated. This makes it much more difficult to choose a treatment for those patients, because many cancer drugs are typically developed for specific cancer types.

A new approach developed by researchers at MIT and Dana-Farber Cancer Institute may make it easier to identify the sites of origin for those enigmatic cancers. Using machine learning, the researchers created a computational model that can analyze the sequence of about 400 genes and use that information to predict where a given tumor originated in the body.

Using this model, the researchers showed that they could accurately classify at least 40 percent of tumors of unknown origin with high confidence, in a dataset of about 900 patients. This approach enabled a 2.2-fold increase in the number of patients who could have been eligible for a genomically guided, targeted treatment, based on where their cancer originated.

“That was the most important finding in our paper, that this model could be potentially used to aid treatment decisions, guiding doctors toward personalized treatments for patients with cancers of unknown primary origin,” says Intae Moon, an MIT graduate student in electrical engineering and computer science who is the lead author of the new study.

Alexander Gusev, an associate professor of medicine at Harvard Medical School and Dana-Farber Cancer Institute, is the senior author of the paper, which appears today in Nature Medicine.

Mysterious origins

In 3 to 5 percent of cancer patients, particularly in cases where tumors have metastasized throughout the body, oncologists don’t have an easy way to determine where the cancer originated. These tumors are classified as cancers of unknown primary (CUP).

This lack of knowledge often prevents doctors from being able to give patients “precision” drugs, which are typically approved for specific cancer types where they are known to work. These targeted treatments tend to be more effective and have fewer side effects than treatments that are used for a broad spectrum of cancers, which are commonly prescribed to CUP patients.

“A sizeable number of individuals develop these cancers of unknown primary every year, and because most therapies are approved in a site-specific way, where you have to know the primary site to deploy them, they have very limited treatment options,” Gusev says.

Moon, an affiliate of the Computer Science and Artificial Intelligence Laboratory who is co-advised by Gusev, decided to analyze genetic data that is routinely collected at Dana-Farber to see if it could be used to predict cancer type. The data consist of genetic sequences for about 400 genes that are often mutated in cancer. The researchers trained a machine-learning model on data from nearly 30,000 patients who had been diagnosed with one of 22 known cancer types. That set of data included patients from Memorial Sloan Kettering Cancer Center and Vanderbilt-Ingram Cancer Center, as well as Dana-Farber.

The researchers then tested the resulting model on about 7,000 tumors that it hadn’t seen before, but whose site of origin was known. The model, which the researchers named OncoNPC, was able to predict their origins with about 80 percent accuracy. For tumors with high-confidence predictions, which constituted about 65 percent of the total, its accuracy rose to roughly 95 percent.
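The difference between overall accuracy and accuracy on high-confidence predictions can be sketched as follows. The records and the 0.8 threshold are invented for illustration; they are not OncoNPC outputs, and the article does not state the threshold the researchers used.

```python
def accuracy_report(predictions, threshold):
    """Summarize accuracy overall and on the high-confidence subset.

    `predictions` is a list of (predicted_label, confidence, true_label).
    """
    correct_all = sum(p == t for p, _, t in predictions)
    confident = [(p, t) for p, c, t in predictions if c >= threshold]
    correct_conf = sum(p == t for p, t in confident)
    return {
        "overall_accuracy": correct_all / len(predictions),
        # Fraction of cases the model is willing to call with high confidence.
        "coverage": len(confident) / len(predictions),
        "confident_accuracy": (
            correct_conf / len(confident) if confident else None
        ),
    }


# Toy records (hypothetical), not data from the study.
toy = [
    ("lung", 0.95, "lung"),
    ("breast", 0.90, "breast"),
    ("colon", 0.55, "lung"),
    ("lung", 0.60, "lung"),
    ("pancreas", 0.85, "pancreas"),
]

report = accuracy_report(toy, threshold=0.8)
print(report)
```

Raising the threshold typically trades coverage for accuracy, which mirrors the pattern the researchers report: fewer tumors classified, but with fewer mistakes.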

After those encouraging results, the researchers used the model to analyze a set of about 900 tumors from patients with CUP, which were all from Dana-Farber. They found that for 40 percent of these tumors, the model was able to make high-confidence predictions.

The researchers then compared the model’s predictions with an analysis of the germline, or inherited, mutations in a subset of tumors with available data, which can reveal whether the patients have a genetic predisposition to develop a particular type of cancer. The researchers found that the model’s predictions were much more likely to match the type of cancer most strongly predicted by the germline mutations than any other type of cancer.

Guiding drug decisions

To further validate the model’s predictions, the researchers compared data on the CUP patients’ survival time with the typical prognosis for the type of cancer that the model predicted. They found that CUP patients who were predicted to have cancer with a poor prognosis, such as pancreatic cancer, showed correspondingly shorter survival times. Meanwhile, CUP patients who were predicted to have cancers that typically have better prognoses, such as neuroendocrine tumors, had longer survival times.

Another indication that the model’s predictions could be useful came from looking at the types of treatments that CUP patients analyzed in the study had received. About 10 percent of these patients had received a targeted treatment, based on their oncologists’ best guess about where their cancer had originated. Among those patients, those who received a treatment consistent with the type of cancer that the model predicted for them fared better than patients who received a treatment typically given for a different type of cancer than what the model predicted for them.

Using this model, the researchers also identified an additional 15 percent of patients (2.2-fold increase) who could have received an existing targeted treatment, if their cancer type had been known. Instead, those patients ended up receiving more general chemotherapy drugs.

“That potentially makes these findings more clinically actionable because we’re not requiring a new drug to be approved. What we’re saying is that this population can now be eligible for precision treatments that already exist,” Gusev says.

The researchers now hope to expand their model to include other types of data, such as pathology images and radiology images, to provide a more comprehensive prediction using multiple data modalities. This would also provide the model with a comprehensive perspective of tumors, enabling it to predict not just the type of tumor and patient outcome, but potentially even the optimal treatment.

The research was funded by the National Institutes of Health, the Louis B. Mayer Foundation, the Doris Duke Charitable Foundation, the Phi Beta Psi Sorority, and the Emerson Collective.

Using AI to protect against AI image manipulation

As we enter a new era where technologies powered by artificial intelligence can craft and manipulate images with a precision that blurs the line between reality and fabrication, the specter of misuse looms large. Recently, advanced generative models such as DALL-E and Midjourney, celebrated for their impressive precision and user-friendly interfaces, have made the production of hyper-realistic images relatively effortless. With the barriers to entry lowered, even inexperienced users can generate and manipulate high-quality images from simple text descriptions, with results ranging from innocent image alterations to malicious changes. Techniques like watermarking offer a promising solution, but misuse requires a preemptive (as opposed to only post hoc) measure.

In the quest to create such a new measure, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) developed “PhotoGuard,” a technique that uses perturbations — minuscule alterations in pixel values invisible to the human eye but detectable by computer models — that effectively disrupt the model’s ability to manipulate the image.

PhotoGuard uses two different “attack” methods to generate these perturbations. The more straightforward “encoder” attack targets the image’s latent representation in the AI model, causing the model to perceive the image as a random entity. The more sophisticated “diffusion” one defines a target image and optimizes the perturbations to make the final image resemble the target as closely as possible.

“Consider the possibility of fraudulent propagation of fake catastrophic events, like an explosion at a significant landmark. This deception can manipulate market trends and public sentiment, but the risks are not limited to the public sphere. Personal images can be inappropriately altered and used for blackmail, resulting in significant financial implications when executed on a large scale,” says Hadi Salman, an MIT graduate student in electrical engineering and computer science (EECS), affiliate of MIT CSAIL, and lead author of a new paper about PhotoGuard.

“In more extreme scenarios, these models could simulate voices and images for staging false crimes, inflicting psychological distress and financial loss. The swift nature of these actions compounds the problem. Even when the deception is eventually uncovered, the damage — whether reputational, emotional, or financial — has often already happened. This is a reality for victims at all levels, from individuals bullied at school to society-wide manipulation.”

PhotoGuard in practice

AI models view an image differently from how humans do. They see an image as a complex set of mathematical data points that describe every pixel’s color and position; this is the image’s latent representation. The encoder attack introduces minor adjustments into this mathematical representation, causing the AI model to perceive the image as a random entity. As a result, any attempt to manipulate the image using the model becomes nearly impossible. The changes introduced are so minute that they are invisible to the human eye, thus preserving the image’s visual integrity while ensuring its protection.
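The mechanics of such a perturbation can be sketched with a toy example. This is not PhotoGuard's implementation: the "encoder" is a small hand-written linear map rather than a real image model, the target latent is an arbitrary assumption, and the optimizer is signed-gradient projected gradient descent, a standard technique from the adversarial examples literature.

```python
def encode(W, x):
    """Toy linear 'encoder': latent = W @ x (a stand-in for a real model)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def latent_distance(W, x, target):
    """Squared distance between the latent of x and the target latent."""
    return sum((z - t) ** 2 for z, t in zip(encode(W, x), target))


def encoder_attack(W, image, target_latent, eps=0.05, step=0.01, iters=50):
    """Find a perturbation with max-norm <= eps that moves the image's
    latent toward target_latent. Pixels are assumed to lie in [0, 1]."""
    delta = [0.0] * len(image)
    for _ in range(iters):
        x_adv = [xi + di for xi, di in zip(image, delta)]
        residual = [z - t for z, t in zip(encode(W, x_adv), target_latent)]
        # Gradient of the squared distance w.r.t. the input: 2 * W^T residual.
        grad = [2 * sum(W[r][j] * residual[r] for r in range(len(W)))
                for j in range(len(image))]
        # Signed step, then project back into the eps-ball.
        delta = [max(-eps, min(eps, d - step * ((g > 0) - (g < 0))))
                 for d, g in zip(delta, grad)]
    # Keep the protected pixels in the valid range.
    return [max(0.0, min(1.0, xi + di)) for xi, di in zip(image, delta)]


# Hypothetical 4-pixel "image" and 2-dimensional latent space.
W = [[1.0, -1.0, 0.5, 0.0],
     [0.0, 0.5, -1.0, 1.0]]
image = [0.2, 0.4, 0.6, 0.8]
target_latent = [0.0, 0.0]

protected = encoder_attack(W, image, target_latent)
print(latent_distance(W, image, target_latent))
print(latent_distance(W, protected, target_latent))
```

No pixel moves by more than `eps`, yet the latent shifts measurably toward the target, which is the property the encoder attack relies on.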

The second and decidedly more intricate “diffusion” attack strategically targets the entire diffusion model end-to-end. This involves determining a desired target image, and then initiating an optimization process with the intention of closely aligning the generated image with this preselected target.

To implement the attack, the team created perturbations within the input space of the original image. These perturbations are then applied to the images at the inference stage, offering a robust defense against unauthorized manipulation.

“The progress in AI that we are witnessing is truly breathtaking, but it enables beneficial and malicious uses of AI alike,” says MIT professor of EECS and CSAIL principal investigator Aleksander Madry, who is also an author on the paper. “It is thus urgent that we work towards identifying and mitigating the latter. I view PhotoGuard as our small contribution to that important effort.”

The diffusion attack is more computationally intensive than its simpler sibling, and requires significant GPU memory. The team says that approximating the diffusion process with fewer steps mitigates the issue, thus making the technique more practical.

To better illustrate the attack, consider an art project. The original image is a drawing, and the target image is another drawing that’s completely different. The diffusion attack is like making tiny, invisible changes to the first drawing so that, to an AI model, it begins to resemble the second drawing. However, to the human eye, the original drawing remains unchanged.

By doing this, any AI model attempting to modify the original image will now inadvertently make changes as if dealing with the target image, thereby protecting the original image from intended manipulation. The result is a picture that remains visually unaltered for human observers, but protects against unauthorized edits by AI models.

As a concrete example with PhotoGuard, consider an image with multiple faces. You could mask any faces you don’t want to modify, and then prompt with “two men attending a wedding.” Upon submission, the system will adjust the image accordingly, creating a plausible depiction of two men participating in a wedding ceremony.

Now, consider safeguarding the image from being edited; adding perturbations to the image before upload can immunize it against modifications. In this case, the final output will lack realism compared to the original, non-immunized image.

All hands on deck

Key allies in the fight against image manipulation are the creators of the image-editing models, says the team. For PhotoGuard to be effective, an integrated response from all stakeholders is necessary. “Policymakers should consider implementing regulations that mandate companies to protect user data from such manipulations. Developers of these AI models could design APIs that automatically add perturbations to users’ images, providing an added layer of protection against unauthorized edits,” says Salman.

Despite PhotoGuard’s promise, it’s not a panacea. Once an image is online, individuals with malicious intent could attempt to reverse engineer the protective measures by applying noise, cropping, or rotating the image. However, there is plenty of previous work from the adversarial examples literature that can be utilized here to implement robust perturbations that resist common image manipulations.

“A collaborative approach involving model developers, social media platforms, and policymakers presents a robust defense against unauthorized image manipulation. Working on this pressing issue is of paramount importance today,” says Salman. “And while I am glad to contribute towards this solution, much work is needed to make this protection practical. Companies that develop these models need to invest in engineering robust immunizations against the possible threats posed by these AI tools. As we tread into this new era of generative models, let’s strive for potential and protection in equal measures.”

“The prospect of using attacks on machine learning to protect us from abusive uses of this technology is very compelling,” says Florian Tramèr, an assistant professor at ETH Zürich. “The paper has a nice insight that the developers of generative AI models have strong incentives to provide such immunization protections to their users, which could even be a legal requirement in the future. However, designing image protections that effectively resist circumvention attempts is a challenging problem: Once the generative AI company commits to an immunization mechanism and people start applying it to their online images, we need to ensure that this protection will work against motivated adversaries who might even use better generative AI models developed in the near future. Designing such robust protections is a hard open problem, and this paper makes a compelling case that generative AI companies should be working on solving it.”

Salman wrote the paper alongside fellow lead authors Alaa Khaddaj and Guillaume Leclerc MS ’18, as well as Andrew Ilyas ’18, MEng ’18; all three are EECS graduate students and MIT CSAIL affiliates. The team’s work was partially done on the MIT Supercloud compute cluster, supported by U.S. National Science Foundation grants and Open Philanthropy, and based upon work supported by the U.S. Defense Advanced Research Projects Agency. It was presented at the International Conference on Machine Learning this July.

A simpler method for learning to control a robot

Researchers from MIT and Stanford University have devised a new machine-learning approach that could be used to control a robot, such as a drone or autonomous vehicle, more effectively and efficiently in dynamic environments where conditions can change rapidly.

This technique could help an autonomous vehicle learn to compensate for slippery road conditions to avoid going into a skid, allow a robotic free-flyer to tow different objects in space, or enable a drone to closely follow a downhill skier despite being buffeted by strong winds.

The researchers’ approach incorporates certain structure from control theory into the process for learning a model in a way that leads to an effective method of controlling complex dynamics, such as those caused by impacts of wind on the trajectory of a flying vehicle. One way to think about this structure is as a hint that can help guide how to control a system.

“The focus of our work is to learn intrinsic structure in the dynamics of the system that can be leveraged to design more effective, stabilizing controllers,” says Navid Azizan, the Esther and Harold E. Edgerton Assistant Professor in the MIT Department of Mechanical Engineering and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS). “By jointly learning the system’s dynamics and these unique control-oriented structures from data, we’re able to naturally create controllers that function much more effectively in the real world.”

Using this structure in a learned model, the researchers’ technique immediately extracts an effective controller from the model, as opposed to other machine-learning methods that require a controller to be derived or learned separately with additional steps. With this structure, their approach is also able to learn an effective controller using fewer data than other approaches. This could help their learning-based control system achieve better performance faster in rapidly changing environments.

“This work tries to strike a balance between identifying structure in your system and just learning a model from data,” says lead author Spencer M. Richards, a graduate student at Stanford University. “Our approach is inspired by how roboticists use physics to derive simpler models for robots. Physical analysis of these models often yields a useful structure for the purposes of control — one that you might miss if you just tried to naively fit a model to data. Instead, we try to identify similarly useful structure from data that indicates how to implement your control logic.”

Additional authors of the paper are Jean-Jacques Slotine, professor of mechanical engineering and of brain and cognitive sciences at MIT, and Marco Pavone, associate professor of aeronautics and astronautics at Stanford. The research will be presented at the International Conference on Machine Learning (ICML).

Learning a controller

Determining the best way to control a robot to accomplish a given task can be a difficult problem, even when researchers know how to model everything about the system.

A controller is the logic that enables a drone to follow a desired trajectory, for example. This controller would tell the drone how to adjust its rotor forces to compensate for the effect of winds that can knock it off a stable path to reach its goal.

This drone is a dynamical system — a physical system that evolves over time. In this case, its position and velocity change as it flies through the environment. If such a system is simple enough, engineers can derive a controller by hand. 

Modeling a system by hand intrinsically captures a certain structure based on the physics of the system. For instance, if a robot were modeled manually using differential equations, these would capture the relationship between velocity, acceleration, and force. Acceleration is the rate of change in velocity over time, which is determined by the mass of and forces applied to the robot.
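That relationship can be written as a short simulation. The sketch below integrates a one-dimensional point mass under a constant force using a semi-implicit Euler step; the mass, force, and time step are illustrative values, not parameters from the paper.

```python
def simulate_point_mass(mass, force, dt, steps, position=0.0, velocity=0.0):
    """Integrate Newton's second law, a = F / m, with semi-implicit Euler.

    Velocity is updated from the acceleration first, and position is then
    updated from the new velocity.
    """
    for _ in range(steps):
        acceleration = force / mass
        velocity += acceleration * dt
        position += velocity * dt
    return position, velocity


# Hypothetical robot: 2 kg pushed by 4 N for 1 second.
position, velocity = simulate_point_mass(mass=2.0, force=4.0, dt=0.01, steps=100)
print(velocity)  # close to 2.0 m/s, since a = 2 m/s^2 for 1 s
print(position)  # close to 1.0 m, with a small discretization error
```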

But often the system is too complex to be exactly modeled by hand. Aerodynamic effects, like the way swirling wind pushes a flying vehicle, are notoriously difficult to derive manually, Richards explains. Researchers would instead take measurements of the drone’s position, velocity, and rotor speeds over time, and use machine learning to fit a model of this dynamical system to the data. But these approaches typically don’t learn a control-based structure. This structure is useful in determining how to best set the rotor speeds to direct the motion of the drone over time.

Once they have modeled the dynamical system, many existing approaches also use data to learn a separate controller for the system.

“Other approaches that try to learn dynamics and a controller from data as separate entities are a bit detached philosophically from the way we normally do it for simpler systems. Our approach is more reminiscent of deriving models by hand from physics and linking that to control,” Richards says.

Identifying structure

The team from MIT and Stanford developed a technique that uses machine learning to learn the dynamics model, but in such a way that the model has some prescribed structure that is useful for controlling the system.

With this structure, they can extract a controller directly from the dynamics model, rather than using data to learn an entirely separate model for the controller.

“We found that beyond learning the dynamics, it’s also essential to learn the control-oriented structure that supports effective controller design. Our approach of learning state-dependent coefficient factorizations of the dynamics has outperformed the baselines in terms of data efficiency and tracking capability, proving to be successful in efficiently and effectively controlling the system’s trajectory,” Azizan says. 
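To see why a state-dependent coefficient factorization is useful for control, consider a scalar toy system. The article does not spell out how the controller is extracted, so the sketch below uses one classical option, a state-dependent Riccati equation (SDRE) controller, and the dynamics are written by hand rather than learned from data. Everything here is an assumption made for illustration.

```python
import math


def sdc_coefficient(x):
    """A(x) in the factorization x_dot = A(x) * x + B * u, for f(x) = x**3."""
    return x * x


def sdre_gain(a, b=1.0, q=1.0, r=1.0):
    """Solve the scalar Riccati equation 2*a*p - (b*p)**2 / r + q = 0
    for its positive root p, and return the feedback gain k = b * p / r."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


def simulate(x0, dt=0.001, steps=5000):
    """Forward-Euler rollout of the closed loop with u = -k(x) * x and B = 1."""
    x = x0
    for _ in range(steps):
        a = sdc_coefficient(x)
        u = -sdre_gain(a) * x
        x += dt * (a * x + u)
    return x


# x_dot = x**3 is unstable on its own; the feedback drives x toward zero.
final_state = simulate(1.5)
print(final_state)
```

Because the gain is computed directly from A(x), no separate controller has to be trained: once the factorization is known, the control law follows from it.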

When they tested this approach, their controller closely followed desired trajectories, outpacing all the baseline methods. The controller extracted from their learned model nearly matched the performance of a ground-truth controller, which is built using the exact dynamics of the system.

“By making simpler assumptions, we got something that actually worked better than other complicated baseline approaches,” Richards adds.

The researchers also found that their method was data-efficient, which means it achieved high performance even with few data. For instance, it could effectively model a highly dynamic rotor-driven vehicle using only 100 data points. Methods that used multiple learned components saw their performance drop much faster with smaller datasets.

This efficiency could make their technique especially useful in situations where a drone or robot needs to learn quickly in rapidly changing conditions.

Plus, their approach is general and could be applied to many types of dynamical systems, from robotic arms to free-flying spacecraft operating in low-gravity environments.

In the future, the researchers are interested in developing models that are more physically interpretable, and that would be able to identify very specific information about a dynamical system, Richards says. This could lead to better-performing controllers.

“Despite its ubiquity and importance, nonlinear feedback control remains an art, making it especially suitable for data-driven and learning-based methods. This paper makes a significant contribution to this area by proposing a method that jointly learns system dynamics, a controller, and control-oriented structure,” says Nikolai Matni, an assistant professor in the Department of Electrical and Systems Engineering at the University of Pennsylvania, who was not involved with this work. “What I found particularly exciting and compelling was the integration of these components into a joint learning algorithm, such that control-oriented structure acts as an inductive bias in the learning process. The result is a data-efficient learning process that outputs dynamic models that enjoy intrinsic structure that enables effective, stable, and robust control. While the technical contributions of the paper are excellent themselves, it is this conceptual contribution that I view as most exciting and significant.”

This research is supported, in part, by the NASA University Leadership Initiative and the Natural Sciences and Engineering Research Council of Canada.

A faster way to teach a robot

Imagine purchasing a robot to perform household tasks. This robot was built and trained in a factory on a certain set of tasks and has never seen the items in your home. When you ask it to pick up a mug from your kitchen table, it might not recognize your mug (perhaps because this mug is painted with an unusual image, say, of MIT’s mascot, Tim the Beaver). So, the robot fails.

“Right now, the way we train these robots, when they fail, we don’t really know why. So you would just throw up your hands and say, ‘OK, I guess we have to start over.’ A critical component that is missing from this system is enabling the robot to demonstrate why it is failing so the user can give it feedback,” says Andi Peng, an electrical engineering and computer science (EECS) graduate student at MIT.

Peng and her collaborators at MIT, New York University, and the University of California at Berkeley created a framework that enables humans to quickly teach a robot what they want it to do, with a minimal amount of effort.

When a robot fails, the system uses an algorithm to generate counterfactual explanations that describe what needed to change for the robot to succeed. For instance, maybe the robot would have been able to pick up the mug if the mug were a certain color. It shows these counterfactuals to the human and asks for feedback on why the robot failed. Then the system utilizes this feedback and the counterfactual explanations to generate new data it uses to fine-tune the robot.

Fine-tuning involves tweaking a machine-learning model that has already been trained to perform one task, so it can perform a second, similar task.

The researchers tested this technique in simulations and found that it could teach a robot more efficiently than other methods. The robots trained with this framework performed better, while the training process consumed less of a human’s time.

This framework could help robots learn faster in new environments without requiring a user to have technical knowledge. In the long run, this could be a step toward enabling general-purpose robots to efficiently perform daily tasks for the elderly or individuals with disabilities in a variety of settings.

Peng, the lead author, is joined by co-authors Aviv Netanyahu, an EECS graduate student; Mark Ho, an assistant professor at the Stevens Institute of Technology; Tianmin Shu, an MIT postdoc; Andreea Bobu, a graduate student at UC Berkeley; and senior authors Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, a professor in CSAIL. The research will be presented at the International Conference on Machine Learning.

On-the-job training

Robots often fail due to distribution shift — the robot is presented with objects and spaces it did not see during training, and it doesn’t understand what to do in this new environment.

One way to retrain a robot for a specific task is imitation learning. The user could demonstrate the correct task to teach the robot what to do. If a user tries to teach a robot to pick up a mug, but demonstrates with a white mug, the robot could learn that all mugs are white. It may then fail to pick up a red, blue, or “Tim-the-Beaver-brown” mug.

Training a robot to recognize that a mug is a mug, regardless of its color, could take thousands of demonstrations.

“I don’t want to have to demonstrate with 30,000 mugs. I want to demonstrate with just one mug. But then I need to teach the robot so it recognizes that it can pick up a mug of any color,” Peng says.

To accomplish this, the researchers’ system determines what specific object the user cares about (a mug) and what elements aren’t important for the task (perhaps the color of the mug doesn’t matter). It uses this information to generate new, synthetic data by changing these “unimportant” visual concepts. This process is known as data augmentation.

The framework has three steps. First, it shows the task that caused the robot to fail. Then it collects a demonstration from the user of the desired actions and generates counterfactuals by searching over all features in the space that show what needed to change for the robot to succeed.

The system shows these counterfactuals to the user and asks for feedback to determine which visual concepts do not impact the desired action. Then it uses this human feedback to generate many new augmented demonstrations.

In this way, the user could demonstrate picking up one mug, but the system would produce demonstrations showing the desired action with thousands of different mugs by altering the color. It uses these data to fine-tune the robot.
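As a rough illustration of this augmentation step, the sketch below takes a single demonstration and resamples only the features a user has flagged as irrelevant. It is a toy under stated assumptions, not the researchers' implementation: the `Demonstration` record, its fields, and the `augment` helper are all hypothetical names, and a real system would alter images rather than labels.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Demonstration:
    # Hypothetical record of one user demonstration.
    object_type: str  # the concept the user cares about (e.g. a mug)
    color: str        # a visual concept the user may mark as irrelevant
    action: str       # the demonstrated action


def augment(demo, irrelevant, choices, n, seed=0):
    """Return n copies of demo with only the irrelevant fields resampled."""
    rng = random.Random(seed)
    augmented_demos = []
    for _ in range(n):
        changes = {name: rng.choice(choices[name]) for name in irrelevant}
        augmented_demos.append(replace(demo, **changes))
    return augmented_demos


# One demonstration with a white mug; user feedback says color does not matter.
demo = Demonstration(object_type="mug", color="white", action="pick_up")
colors = ["white", "red", "blue", "brown"]
augmented = augment(demo, irrelevant=["color"], choices={"color": colors}, n=1000)
print(len(augmented), "augmented demonstrations")
```

The object type and the action are left untouched, so every synthetic demonstration still shows the same task; only the feature marked as unimportant varies.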

Creating counterfactual explanations and soliciting feedback from the user are critical for the technique to succeed, Peng says.

From human reasoning to robot reasoning

Because their work seeks to put the human in the training loop, the researchers tested their technique with human users. They first conducted a study in which they asked people if counterfactual explanations helped them identify elements that could be changed without affecting the task.

“It was so clear right off the bat. Humans are so good at this type of counterfactual reasoning. And this counterfactual step is what allows human reasoning to be translated into robot reasoning in a way that makes sense,” she says.

Then they applied their framework to three simulations where robots were tasked with: navigating to a goal object, picking up a key and unlocking a door, and picking up a desired object then placing it on a tabletop. In each instance, their method enabled the robot to learn faster than with other techniques, while requiring fewer demonstrations from users.

Moving forward, the researchers hope to test this framework on real robots. They also want to focus on reducing the time it takes the system to create new data using generative machine-learning models.

“We want robots to do what humans do, and we want them to do it in a semantically meaningful way. Humans tend to operate in this abstract space, where they don’t think about every single property in an image. At the end of the day, this is really about enabling a robot to learn a good, human-like representation at an abstract level,” Peng says.

This research is supported, in part, by a National Science Foundation Graduate Research Fellowship, Open Philanthropy, an Apple AI/ML Fellowship, Hyundai Motor Corporation, the MIT-IBM Watson AI Lab, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions.

Armando Solar-Lezama named inaugural Distinguished College of Computing Professor

The MIT Stephen A. Schwarzman College of Computing named Armando Solar-Lezama as the inaugural Distinguished College of Computing Professor, effective July 1. 

Solar-Lezama is the first person appointed to this position generously endowed by Professor Jae S. Lim of the Department of Electrical Engineering and Computer Science (EECS). Established in the MIT Schwarzman College of Computing, the chair is being awarded to Solar-Lezama for being an outstanding faculty member who is recognized as a leader and innovator.

“I’m pleased to make this appointment and recognize Armando for his remarkable contributions to MIT and the scientific community,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science. “I’m greatly appreciative of Professor Lim for his thoughtful gesture in creating this new chair in the college, providing us with the opportunity to acknowledge the accomplishments of our faculty.”

Solar-Lezama, a professor of electrical engineering and computer science, leads the Computer-Aided Programming Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL) that focuses on program synthesis, an area of research that lies at the intersection of programming systems and artificial intelligence. The group’s research ranges from designing new analysis techniques and automated reasoning mechanisms to developing new programming models that automate challenging aspects of programming.

A member of the EECS faculty since 2008, Solar-Lezama, who also serves as the associate director and chief operating officer for CSAIL, is most interested in software synthesis and its applications to particular program domains such as high-performance computing. He first found this niche area of program synthesis as a graduate student at the University of California at Berkeley, where his thesis project, a language called Sketch, treated program synthesis as a search problem in which the algorithms pare down the search space to make the search faster and more efficient. Since then, program synthesis research has greatly expanded into the active field it is today.

AI helps household robots cut planning time in half

Your brand new household robot is delivered to your house, and you ask it to make you a cup of coffee. Although it knows some basic skills from previous practice in simulated kitchens, there are way too many actions it could possibly take — turning on the faucet, flushing the toilet, emptying out the flour container, and so on. But there’s a tiny number of actions that could possibly be useful. How is the robot to figure out what steps are sensible in a new situation?

It could use PIGINet, a new system that aims to efficiently enhance the problem-solving capabilities of household robots. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are using machine learning to cut down on the typical iterative process of task planning that considers all possible actions. PIGINet eliminates task plans that can’t satisfy collision-free requirements, and reduces planning time by 50-80 percent when trained on only 300-500 problems. 

Typically, robots attempt various task plans and iteratively refine their moves until they find a feasible solution, which can be inefficient and time-consuming, especially when there are movable and articulated obstacles. Maybe after cooking, for example, you want to put all the sauces in the cabinet. That problem might take two to eight steps depending on what the world looks like at that moment. Does the robot need to open multiple cabinet doors, or are there any obstacles inside the cabinet that need to be relocated in order to make space? You don’t want your robot to be annoyingly slow — and it will be worse if it burns dinner while it’s thinking.

Household robots are usually thought of as following predefined recipes for performing tasks, which isn’t always suitable for diverse or changing environments. So, how does PIGINet avoid those predefined rules? PIGINet is a neural network that takes in “Plans, Images, Goal, and Initial facts,” then predicts the probability that a task plan can be refined to find feasible motion plans. In simple terms, it employs a transformer encoder, a versatile and state-of-the-art model designed to operate on data sequences. The input sequence, in this case, is information about which task plan it is considering, images of the environment, and symbolic encodings of the initial state and the desired goal. The encoder combines the task plans, image, and text to generate a prediction regarding the feasibility of the selected task plan. 
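The filtering idea can be sketched in a few lines: score each candidate task plan with a feasibility predictor, discard the unlikely ones, and hand the rest to the motion planner in order of promise. This is a minimal sketch, not PIGINet itself. The `toy_predictor` function is a hand-written stand-in for the learned transformer model, and the action names and the 0.5 threshold are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Plan = Tuple[str, ...]


def rank_plans(
    plans: Sequence[Plan],
    predict_feasibility: Callable[[Plan], float],
    threshold: float = 0.5,
) -> List[Plan]:
    """Keep plans scored at or above threshold, most promising first."""
    scored = [(predict_feasibility(plan), plan) for plan in plans]
    kept = [(score, plan) for score, plan in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [plan for _, plan in kept]


def toy_predictor(plan: Plan) -> float:
    # Stand-in for a learned model (assumption): a plan that places the sauce
    # in the cabinet without first opening the door is scored as unlikely.
    if "place_sauce_in_cabinet" in plan:
        step = plan.index("place_sauce_in_cabinet")
        if "open_cabinet_door" not in plan[:step]:
            return 0.1
    return 0.9


blocked = ("pick_sauce", "place_sauce_in_cabinet")
feasible = ("open_cabinet_door", "pick_sauce", "place_sauce_in_cabinet")
promising = rank_plans([blocked, feasible], toy_predictor)
print(promising)
```

Only the plans that survive this filter would be passed on for the slower step of refinement into motion plans, which is where the time savings would come from.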

Keeping things in the kitchen, the team created hundreds of simulated environments, each with different layouts and specific tasks that require objects to be rearranged among counters, fridges, cabinets, sinks, and cooking pots. By measuring the time taken to solve problems, they compared PIGINet against prior approaches. One correct task plan may include opening the left fridge door, removing a pot lid, moving the cabbage from pot to fridge, moving a potato to the fridge, picking up the bottle from the sink, placing the bottle in the sink, picking up the tomato, or placing the tomato. PIGINet significantly reduced planning time by 80 percent in simpler scenarios and 20-50 percent in more complex scenarios that have longer plan sequences and less training data.

“Systems such as PIGINet, which use the power of data-driven methods to handle familiar cases efficiently, but can still fall back on ‘first-principles’ planning methods to verify learning-based suggestions and solve novel problems, offer the best of both worlds, providing reliable and efficient general-purpose solutions to a wide variety of problems,” says MIT Professor and CSAIL Principal Investigator Leslie Pack Kaelbling.

PIGINet’s use of multimodal embeddings in the input sequence allowed for better representation and understanding of complex geometric relationships. Using image data helped the model to grasp spatial arrangements and object configurations without knowing the object 3D meshes for precise collision checking, enabling fast decision-making in different environments. 

One of the major challenges faced during the development of PIGINet was the scarcity of good training data, as all feasible and infeasible plans need to be generated by traditional planners, which is slow in the first place. However, by using pretrained vision language models and data augmentation tricks, the team was able to address this challenge, showing impressive plan time reduction not only on problems with seen objects, but also zero-shot generalization to previously unseen objects.

“Because everyone’s home is different, robots should be adaptable problem-solvers instead of just recipe followers. Our key idea is to let a general-purpose task planner generate candidate task plans and use a deep learning model to select the promising ones. The result is a more efficient, adaptable, and practical household robot, one that can nimbly navigate even complex and dynamic environments. Moreover, the practical applications of PIGINet are not confined to households,” says Zhutian Yang, MIT CSAIL PhD student and lead author on the work. “Our future aim is to further refine PIGINet to suggest alternate task plans after identifying infeasible actions, which will further speed up the generation of feasible task plans without the need of big datasets for training a general-purpose planner from scratch. We believe that this could revolutionize the way robots are trained during development and then applied to everyone’s homes.” 

“This paper addresses the fundamental challenge in implementing a general-purpose robot: how to learn from past experience to speed up the decision-making process in unstructured environments filled with a large number of articulated and movable obstacles,” says Beomjoon Kim PhD ’20, assistant professor in the Graduate School of AI at Korea Advanced Institute of Science and Technology (KAIST). “The core bottleneck in such problems is how to determine a high-level task plan such that there exists a low-level motion plan that realizes the high-level plan. Typically, you have to oscillate between motion and task planning, which causes significant computational inefficiency. Zhutian’s work tackles this by using learning to eliminate infeasible task plans, and is a step in a promising direction.”

Yang wrote the paper with NVIDIA research scientist Caelan Garrett SB ’15, MEng ’15, PhD ’21; MIT Department of Electrical Engineering and Computer Science professors and CSAIL members Tomás Lozano-Pérez and Leslie Kaelbling; and Senior Director of Robotics Research at NVIDIA and University of Washington Professor Dieter Fox. The team was supported by AI Singapore and grants from National Science Foundation, the Air Force Office of Scientific Research, and the Army Research Office. This project was partially conducted while Yang was an intern at NVIDIA Research. Their research will be presented in July at the conference Robotics: Science and Systems.

Study finds ChatGPT boosts worker productivity for some writing tasks

Amid a huge amount of hype around generative AI, a new study from researchers at MIT sheds light on the technology’s impact on work, finding that it increased productivity for workers assigned tasks like writing cover letters, delicate emails, and cost-benefit analyses.

The tasks in the study weren’t quite replicas of real work: They didn’t require precise factual accuracy or context about things like a company’s goals or a customer’s preferences. Still, a number of the study’s participants said the assignments were similar to things they’d written in their real jobs — and the benefits were substantial. Access to the assistive chatbot ChatGPT decreased the time it took workers to complete the tasks by 40 percent, and output quality, as measured by independent evaluators, rose by 18 percent.

The researchers hope the study, which appears today in open-access form in the journal Science, helps people understand the impact that AI tools like ChatGPT can have on the workforce.

“What we can say for sure is generative AI is going to have a big effect on white collar work,” says Shakked Noy, a PhD student in MIT’s Department of Economics, who co-authored the paper with fellow PhD student Whitney Zhang ’21. “I think what our study shows is that this kind of technology has important applications in white collar work. It’s a useful technology. But it’s still too early to tell if it will be good or bad, or how exactly it’s going to cause society to adjust.”

Simulating work for chatbots

For centuries, people have worried that new technological advancements would lead to mass automation and job loss. But new technologies also create new jobs, and when they increase worker productivity, they can have a net positive effect on the economy.

“Productivity is front of mind for economists when thinking of new technological developments,” Noy says. “The classical view in economics is that the most important thing that technological advancement does is raise productivity, in the sense of letting us produce economic output more efficiently.”

To study generative AI’s effect on worker productivity, the researchers gave 453 college-educated marketers, grant writers, consultants, data analysts, human resource professionals, and managers two writing tasks specific to their occupation. The 20- to 30-minute tasks included writing cover letters for grant applications, emails about organizational restructuring, and plans for analyses helping a company decide which customers to send push notifications to based on given customer data. Experienced professionals in the same occupations as each participant evaluated each submission as if they were encountering it in a work setting. Evaluators did not know which submissions were created with the help of ChatGPT.

Half of participants were given access to the chatbot ChatGPT-3.5, developed by the company OpenAI, for the second assignment. Those users finished tasks 11 minutes faster than the control group, while their average quality evaluations increased by 18 percent.

The data also showed that performance inequality between workers decreased, meaning workers who received a lower grade in the first task benefitted more from using ChatGPT for the second task.

The researchers say the tasks were broadly representative of assignments such professionals see in their real jobs, but they noted a number of limitations. Because they were using anonymous participants, the researchers couldn’t require contextual knowledge about a specific company or customer. They also had to give explicit instructions for each assignment, whereas real-world tasks may be more open-ended. Additionally, the researchers didn’t think it was feasible to hire fact-checkers to evaluate the accuracy of the outputs. Accuracy is a major problem for today’s generative AI technologies.

The researchers said those limitations could lessen ChatGPT’s productivity-boosting potential in the real world. Still, they believe the results show the technology’s promise — an idea supported by another of the study’s findings: Workers exposed to ChatGPT during the experiment were twice as likely to report using it in their real job two weeks after the experiment.

“The experiment demonstrates that it does bring significant speed benefits, even if those speed benefits are lesser in the real world because you need to spend time fact-checking and writing the prompts,” Noy says.

Taking the macro view

The study offered a close-up look at the impact that tools like ChatGPT can have on certain writing tasks. But extrapolating that impact out to understand generative AI’s effect on the economy is more difficult. That’s what the researchers hope to work on next.

“There are so many other factors that are going to affect wages, employment, and shifts across sectors that would require pieces of evidence that aren’t in our paper,” Zhang says. “But the magnitude of time saved and quality increases are very large in our paper, so it does seem like this is pretty revolutionary, at least for certain types of work.”

Both researchers agree that, even if it’s accepted that ChatGPT will increase many workers’ productivity, much work remains to be done to figure out how society should respond to generative AI’s proliferation.

“The policy needed to adjust to these technologies can be very different depending on what future research finds,” Zhang says. “If we think this will boost wages for lower-paid workers, that’s a very different implication than if it’s going to increase wage inequality by boosting the wages of already high earners. I think there’s a lot of downstream economic and political effects that are important to pin down.”

The study was supported by an Emergent Ventures grant, the Mercatus Center, George Mason University, a George and Obie Shultz Fund grant, the MIT Department of Economics, and a National Science Foundation Graduate Research Fellowship Grant.

A new way to look at data privacy

Imagine that a team of scientists has developed a machine-learning model that can predict whether a patient has cancer from lung scan images. They want to share this model with hospitals around the world so clinicians can start using it in diagnosis.

But there’s a problem. To teach their model how to predict cancer, they showed it millions of real lung scan images, a process called training. Those sensitive data, which are now encoded into the inner workings of the model, could potentially be extracted by a malicious agent. The scientists can prevent this by adding noise, or more generic randomness, to the model that makes it harder for an adversary to guess the original data. However, perturbation reduces a model’s accuracy, so the less noise one can add, the better.

MIT researchers have developed a technique that enables the user to potentially add the smallest amount of noise possible, while still ensuring the sensitive data are protected.

The researchers created a new privacy metric, which they call Probably Approximately Correct (PAC) Privacy, and built a framework based on this metric that can automatically determine the minimal amount of noise that needs to be added. Moreover, this framework does not need knowledge of the inner workings of a model or its training process, which makes it easier to use for different types of models and applications.

In several cases, the researchers show that the amount of noise required to protect sensitive data from adversaries is far less with PAC Privacy than with other approaches. This could help engineers create machine-learning models that provably hide training data, while maintaining accuracy in real-world settings.

“PAC Privacy exploits the uncertainty or entropy of the sensitive data in a meaningful way, and this allows us to add, in many cases, an order of magnitude less noise. This framework allows us to understand the characteristics of arbitrary data processing and privatize it automatically without artificial modifications. While we are in the early days and we are doing simple examples, we are excited about the promise of this technique,” says Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering and co-author of a new paper on PAC Privacy.

Devadas wrote the paper with lead author Hanshen Xiao, an electrical engineering and computer science graduate student. The research will be presented at the International Cryptography Conference (Crypto 2023).

Defining privacy

A fundamental question in data privacy is: How much sensitive data could an adversary recover from a machine-learning model with noise added to it?

Differential Privacy, one popular privacy definition, says privacy is achieved if an adversary who observes the released model cannot infer whether an arbitrary individual’s data was used in the training process. But provably preventing an adversary from distinguishing data usage often requires large amounts of noise to obscure it. This noise reduces the model’s accuracy.

PAC Privacy looks at the problem a bit differently. It characterizes how hard it would be for an adversary to reconstruct any part of randomly sampled or generated sensitive data after noise has been added, rather than only focusing on the distinguishability problem.

For instance, if the sensitive data are images of human faces, differential privacy would focus on whether the adversary can tell if someone’s face was in the dataset. PAC Privacy, on the other hand, could look at whether an adversary could extract a silhouette — an approximation — that someone could recognize as a particular individual’s face.

Once they established the definition of PAC Privacy, the researchers created an algorithm that automatically tells the user how much noise to add to a model to prevent an adversary from confidently reconstructing a close approximation of the sensitive data. This algorithm guarantees privacy even if the adversary has infinite computing power, Xiao says.

To find the optimal amount of noise, the PAC Privacy algorithm relies on the uncertainty, or entropy, in the original data from the viewpoint of the adversary.

This automatic technique takes samples randomly from a data distribution or a large data pool and runs the user’s machine-learning training algorithm on that subsampled data to produce an output learned model. It does this many times on different subsamplings and compares the variance across all outputs. This variance determines how much noise one must add — a smaller variance means less noise is needed.
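The subsample-and-compare loop described above might look something like the following sketch. It is only an illustration under loud assumptions: the "training algorithm" is a stand-in that returns a sample mean, and the rule that sets the noise level (a user-chosen `noise_scale` times the standard deviation of the outputs) is a placeholder, since the actual calibration used by PAC Privacy is not described here.

```python
import random
import statistics


def train_mean(sample):
    """Stand-in training algorithm (assumption): the 'model' is the sample mean."""
    return statistics.fmean(sample)


def output_variance(pool, train, subsample_size, trials, rng):
    """Train on many random subsamples and measure how much the outputs vary."""
    outputs = [train(rng.sample(pool, subsample_size)) for _ in range(trials)]
    return statistics.pvariance(outputs)


def noisy_release(pool, train, subsample_size, trials, noise_scale, seed=0):
    """Return (noisy model, noise standard deviation)."""
    rng = random.Random(seed)
    variance = output_variance(pool, train, subsample_size, trials, rng)
    # ASSUMPTION: noise grows with the spread of the outputs. The real
    # PAC Privacy calibration is not given in the text above.
    sigma = noise_scale * variance ** 0.5
    model = train(rng.sample(pool, subsample_size))
    return model + rng.gauss(0.0, sigma), sigma


stable_pool = [10.0 + 0.01 * i for i in range(100)]   # outputs barely change
spread_pool = [float(i * i) for i in range(100)]      # outputs change a lot
_, sigma_stable = noisy_release(stable_pool, train_mean, 20, 200, noise_scale=1.0)
_, sigma_spread = noisy_release(spread_pool, train_mean, 20, 200, noise_scale=1.0)
print(sigma_stable, sigma_spread)
```

Note that the sketch treats the training function as a black box: it only calls it and looks at the outputs, never at how it works inside.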

Algorithm advantages

Different from other privacy approaches, the PAC Privacy algorithm does not need knowledge of the inner workings of a model, or the training process.

When implementing PAC Privacy, a user can specify their desired level of confidence at the outset. For instance, perhaps the user wants a guarantee that an adversary will not be more than 1 percent confident that they have successfully reconstructed the sensitive data to within 5 percent of its actual value. The PAC Privacy algorithm automatically tells the user the optimal amount of noise that needs to be added to the output model before it is shared publicly, in order to achieve those goals.

“The noise is optimal, in the sense that if you add less than we tell you, all bets could be off. But the effect of adding noise to neural network parameters is complicated, and we are making no promises on the utility drop the model may experience with the added noise,” Xiao says.

This points to one limitation of PAC Privacy — the technique does not tell the user how much accuracy the model will lose once the noise is added. PAC Privacy also involves repeatedly training a machine-learning model on many subsamplings of data, so it can be computationally expensive.  

To improve PAC Privacy, one approach is to modify a user’s machine-learning training process so it is more stable, meaning that the output model it produces does not change very much when the input data is subsampled from a data pool. This stability would create smaller variances between subsample outputs, so not only would the PAC Privacy algorithm need to be run fewer times to identify the optimal amount of noise, but it would also need to add less noise.

An added benefit of stabler models is that they often have less generalization error, which means they can make more accurate predictions on previously unseen data, a win-win situation between machine learning and privacy, Devadas adds.

“In the next few years, we would love to look a little deeper into this relationship between stability and privacy, and the relationship between privacy and generalization error. We are knocking on a door here, but it is not clear yet where the door leads,” he says.

This research is funded, in part, by DSTA Singapore, Cisco Systems, Capital One, and a MathWorks Fellowship.

Generative AI imagines new protein structures

Biology is a wondrous yet delicate tapestry. At the heart is DNA, the master weaver that encodes proteins, responsible for orchestrating the many biological functions that sustain life within the human body. However, our body is akin to a finely tuned instrument, susceptible to losing its harmony. After all, we’re faced with an ever-changing and relentless natural world: pathogens, viruses, diseases, and cancer. 

Imagine if we could expedite the process of creating vaccines or drugs for newly emerged pathogens. What if we had gene editing technology capable of automatically producing proteins to rectify DNA errors that cause cancer? The quest to identify proteins that can strongly bind to targets or speed up chemical reactions is vital for drug development, diagnostics, and numerous industrial applications, yet it is often a protracted and costly endeavor.

To advance our capabilities in protein engineering, MIT CSAIL researchers came up with “FrameDiff,” a computational tool for creating new protein structures beyond what nature has produced. The machine learning approach generates “frames” that align with the inherent properties of protein structures, enabling it to construct novel proteins independently of preexisting designs, facilitating unprecedented protein structures.

“In nature, protein design is a slow-burning process that takes millions of years. Our technique aims to provide an answer to tackling human-made problems that evolve much faster than nature’s pace,” says MIT CSAIL PhD student Jason Yim, a lead author on a new paper about the work. “The aim, with respect to this new capacity of generating synthetic protein structures, opens up a myriad of enhanced capabilities, such as better binders. This means engineering proteins that can attach to other molecules more efficiently and selectively, with widespread implications related to targeted drug delivery and biotechnology, where it could result in the development of better biosensors. It could also have implications for the field of biomedicine and beyond, offering possibilities such as developing more efficient photosynthesis proteins, creating more effective antibodies, and engineering nanoparticles for gene therapy.” 

Framing FrameDiff

Proteins have complex structures, made up of many atoms connected by chemical bonds. The most important atoms that determine the protein’s 3D shape are called the “backbone,” kind of like the spine of the protein. Every triplet of atoms along the backbone shares the same pattern of bonds and atom types. Researchers noticed this pattern can be exploited to build machine learning algorithms using ideas from differential geometry and probability. This is where the frames come in: Mathematically, these triplets can be modeled as rigid bodies called “frames” (common in physics) that have a position and rotation in 3D. 

These frames equip each triplet with enough information to know about its spatial surroundings. The task is then for a machine learning algorithm to learn how to move each frame to construct a protein backbone. By learning to construct existing proteins, the algorithm hopefully will generalize and be able to create new proteins never seen before in nature.

Training a model to construct proteins via “diffusion” involves injecting noise that randomly moves all the frames and blurs what the original protein looked like. The algorithm’s job is to move and rotate each frame until it looks like the original protein. Though the idea is simple, developing diffusion on frames requires techniques from stochastic calculus on Riemannian manifolds. On the theory side, the researchers developed “SE(3) diffusion” for learning probability distributions that nontrivially connect the translation and rotation components of each frame.
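To make the noising step concrete, the sketch below represents each frame as a 3D rotation matrix plus a position and perturbs both. It is a toy illustration, not the SE(3) diffusion developed by the researchers: the Gaussian-angle rotation about a random axis, the noise scales, and all function names are assumptions chosen to keep the example short.

```python
import math
import random


def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: 3x3 rotation matrix for a unit axis and an angle (radians)."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [t * x * x + c, t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c, t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def random_unit_vector(rng):
    """Draw a direction uniformly at random by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def add_noise(frame, rng, trans_sigma, rot_sigma):
    """Randomly rotate and shift one frame (rotation matrix, position)."""
    rotation, translation = frame
    # ASSUMPTION: a Gaussian-distributed angle about a random axis stands in
    # for whatever noise distribution on rotations the real method uses.
    kick = rotation_from_axis_angle(random_unit_vector(rng), rng.gauss(0.0, rot_sigma))
    noisy_rotation = matmul(kick, rotation)
    noisy_translation = [t + rng.gauss(0.0, trans_sigma) for t in translation]
    return noisy_rotation, noisy_translation


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# A toy "backbone": five frames spaced along the x-axis, all with the same orientation.
backbone = [(identity, [3.8 * i, 0.0, 0.0]) for i in range(5)]
rng = random.Random(0)
noised = [add_noise(frame, rng, trans_sigma=1.0, rot_sigma=0.5) for frame in backbone]
```

Each noised frame is still a valid rigid body, since its rotation remains a proper rotation matrix. A model trained in this setting would be asked to undo such perturbations step by step.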

The subtle art of diffusion

In 2021, DeepMind introduced AlphaFold2, a deep learning algorithm for predicting 3D protein structures from their sequences. When creating synthetic proteins, there are two essential steps: generation and prediction. Generation means the creation of new protein structures and sequences, while “prediction” means figuring out what the 3D structure of a sequence is. It’s no coincidence that AlphaFold2 also used frames to model proteins. SE(3) diffusion and FrameDiff were inspired to take the idea of frames further by incorporating frames into diffusion models, a generative AI technique that has become immensely popular in image generation, like Midjourney, for example. 

The shared frames and principles between protein structure generation and prediction meant the best models from both ends were compatible. In collaboration with the Institute for Protein Design at the University of Washington, SE(3) diffusion is already being used to create and experimentally validate novel proteins. Specifically, they combined SE(3) diffusion with RoseTTAFold2, a protein structure prediction tool much like AlphaFold2, which led to “RFdiffusion.” This new tool brought protein designers closer to solving crucial problems in biotechnology, including the development of highly specific protein binders for accelerated vaccine design, engineering of symmetric proteins for gene delivery, and robust motif scaffolding for precise enzyme design.

Future work on FrameDiff involves generalizing it to problems that combine multiple requirements for biologics such as drugs. Another extension is to generalize the models to all biological modalities, including DNA and small molecules. The team posits that by training FrameDiff on more data and improving its optimization process, it could generate foundational structures with design capabilities on par with RFdiffusion, all while preserving the inherent simplicity of FrameDiff.

“Discarding a pretrained structure prediction model [in FrameDiff] opens up possibilities for rapidly generating structures extending to large lengths,” says Harvard University computational biologist Sergey Ovchinnikov. “The researchers’ innovative approach offers a promising step toward overcoming the limitations of current structure prediction models. Even though it’s still preliminary work, it’s an encouraging stride in the right direction. As such, the vision of protein design playing a pivotal role in addressing humanity’s most pressing challenges seems increasingly within reach, thanks to the pioneering work of this MIT research team.”

Yim wrote the paper alongside Columbia University postdoc Brian Trippe; Valentin De Bortoli, a researcher at the Center for Science of Data at the French National Center for Scientific Research in Paris; Cambridge University postdoc Emile Mathieu; and Arnaud Doucet, an Oxford University professor of statistics and senior research scientist at DeepMind. MIT professors Regina Barzilay and Tommi Jaakkola advised the research.

The team’s work was supported, in part, by the MIT Abdul Latif Jameel Clinic for Machine Learning in Health, EPSRC grants and a Prosperity Partnership between Microsoft Research and Cambridge University, the National Science Foundation Graduate Research Fellowship Program, an NSF Expeditions grant, the Machine Learning for Pharmaceutical Discovery and Synthesis consortium, the DTRA Discovery of Medical Countermeasures Against New and Emerging Threats program, the DARPA Accelerated Molecular Discovery program, and the Sanofi Computational Antibody Design grant. This research will be presented at the International Conference on Machine Learning in July.