AI Use Cases – Page 5 – KIBU – KI Community Bayerischer Untermain

Six MIT students selected as spring 2024 MIT-Pillar AI Collective Fellows

The MIT-Pillar AI Collective has announced six fellows for the spring 2024 semester. With support from the program, the graduate students, who are in their final year of a master’s or PhD program, will conduct research in the areas of AI, machine learning, and data science with the aim of commercializing their innovations.

Launched by MIT’s School of Engineering and Pillar VC in 2022, the MIT-Pillar AI Collective supports faculty, postdocs, and students conducting research on AI, machine learning, and data science. Supported by a gift from Pillar VC and administered by the MIT Deshpande Center for Technological Innovation, the mission of the program is to advance research toward commercialization.

The spring 2024 MIT-Pillar AI Collective Fellows are:

Yasmeen AlFaraj

Yasmeen AlFaraj is a PhD candidate in chemistry whose interest is in the application of data science and machine learning to soft materials design to enable next-generation, sustainable plastics, rubber, and composite materials. More specifically, she is applying machine learning to the design of novel molecular additives to enable the low-cost manufacturing of chemically deconstructable thermosets and composites. AlFaraj’s work has led to the discovery of scalable, translatable new materials that could address thermoset plastic waste. As a Pillar Fellow, she will pursue bringing this technology to market, initially focusing on wind turbine blade manufacturing and conformal coatings. Through the Deshpande Center for Technological Innovation, AlFaraj serves as a lead for a team developing a spinout focused on recyclable versions of existing high-performance thermosets by incorporating small quantities of a degradable co-monomer. In addition, she participated in the National Science Foundation Innovation Corps program and recently graduated from the Clean Tech Open, where she focused on enhancing her business plan, analyzing potential markets, ensuring a complete IP portfolio, and connecting with potential funders. AlFaraj earned a BS in chemistry from University of California at Berkeley.

Ruben Castro Ornelas

Ruben Castro Ornelas is a PhD student in mechanical engineering who is passionate about the future of multipurpose robots and designing the hardware to use them with AI control solutions. Combining his expertise in programming, embedded systems, machine design, reinforcement learning, and AI, he designed a dexterous robotic hand capable of carrying out useful everyday tasks without sacrificing size, durability, complexity, or simulatability. Ornelas’s innovative design holds significant commercial potential in domestic, industrial, and health-care applications because it could be adapted to hold everything from kitchenware to delicate objects. As a Pillar Fellow, he will focus on identifying potential commercial markets, determining the optimal approach for business-to-business sales, and identifying critical advisors. Ornelas served as co-director of StartLabs, an undergraduate entrepreneurship club at MIT, where he earned a BS in mechanical engineering.

Keeley Erhardt

Keeley Erhardt is a PhD candidate in media arts and sciences whose research interests lie in the transformative potential of AI in network analysis, particularly for entity correlation and hidden link detection within and across domains. She has designed machine learning algorithms to identify and track temporal correlations and hidden signals in large-scale networks, uncovering online influence campaigns originating from multiple countries. She has similarly demonstrated the use of graph neural networks to identify coordinated cryptocurrency accounts by analyzing financial time series data and transaction dynamics. As a Pillar Fellow, Erhardt will pursue the potential commercial applications of her work, such as detecting fraud, propaganda, money laundering, and other covert activity in the finance, energy, and national security sectors. She has had internships at Google, Facebook, and Apple and held software engineering roles at multiple tech unicorns. Erhardt earned an MEng in electrical engineering and computer science and a BS in computer science, both from MIT.

Vineet Jagadeesan Nair

Vineet Jagadeesan Nair is a PhD candidate in mechanical engineering whose research focuses on modeling power grids and designing electricity markets to integrate renewables, batteries, and electric vehicles. He is broadly interested in developing computational tools to tackle climate change. As a Pillar Fellow, Nair will explore the application of machine learning and data science to power systems. Specifically, he will experiment with approaches to improve the accuracy of forecasting electricity demand and supply with high spatial-temporal resolution. In collaboration with Project Tapestry @ Google X, he is also working on fusing physics-informed machine learning with conventional numerical methods to increase the speed and accuracy of high-fidelity simulations. Nair’s work could help realize future grids with high penetrations of renewables and other clean, distributed energy resources. Outside academics, Nair is active in entrepreneurship, most recently helping to organize the 2023 MIT Global Startup Workshop in Greece. He earned an MS in computational science and engineering from MIT, an MPhil in energy technologies from Cambridge University as a Gates Scholar, and a BS in mechanical engineering and a BA in economics from University of California at Berkeley.

Mahdi Ramadan

Mahdi Ramadan is a PhD candidate in brain and cognitive sciences whose research interests lie at the intersection of cognitive science, computational modeling, and neural technologies. His work uses novel unsupervised methods for learning and generating interpretable representations of neural dynamics, capitalizing on recent advances in AI, specifically contrastive and geometric deep learning techniques capable of uncovering the latent dynamics underlying neural processes with high fidelity. As a Pillar Fellow, he will leverage these methods to gain a better understanding of dynamical models of muscle signals for generative motor control. By supplementing current spinal prosthetics with generative AI motor models that can streamline, speed up, and correct limb muscle activations in real time, as well as potentially using multimodal vision-language models to infer the patients’ high-level intentions, Ramadan aspires to build truly scalable, accessible, and capable commercial neuroprosthetics. Ramadan’s entrepreneurial experience includes being the co-founder of UltraNeuro, a neurotechnology startup, and co-founder of Presizely, a computer vision startup. He earned a BS in neurobiology from University of Washington.

Rui (Raymond) Zhou

Rui (Raymond) Zhou is a PhD candidate in mechanical engineering whose research focuses on multimodal AI for engineering design. As a Pillar Fellow, he will advance models that could enable designers to translate information in any modality or combination of modalities into comprehensive 2D and 3D designs, including parametric data, component visuals, assembly graphs, and sketches. These models could also optimize existing human designs to accomplish goals such as improving ergonomics or reducing drag coefficient. Ultimately, Zhou aims to translate his work into a software-as-a-service platform that redefines product design across various sectors, from automotive to consumer electronics. His efforts have the potential to not only accelerate the design process but also reduce costs, opening the door to unprecedented levels of customization, idea generation, and rapid prototyping. Beyond his academic pursuits, Zhou founded UrsaTech, a startup that integrates AI into education and engineering design. He earned a BS in electrical engineering and computer sciences from University of California at Berkeley.

How symmetry can come to the aid of machine learning

Behrooz Tahmasebi — an MIT PhD student in the Department of Electrical Engineering and Computer Science (EECS) and an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL) — was taking a mathematics course on differential equations in late 2021 when a glimmer of inspiration struck. In that class, he learned for the first time about Weyl’s law, which had been formulated 110 years earlier by the German mathematician Hermann Weyl. Tahmasebi realized it might have some relevance to the computer science problem he was then wrestling with, even though the connection appeared — on the surface — to be thin, at best. Weyl’s law, he says, provides a formula that measures the complexity of the spectral information, or data, contained within the fundamental frequencies of a drum head or guitar string.
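As a hedged illustration of the kind of counting Weyl’s law does, consider an idealized string of length L with fixed ends. Its vibration modes have eigenvalues (nπ/L)², and the standard one-dimensional form of the law says the number of eigenvalues up to a threshold λ grows like (L/π)√λ. That formula is textbook material, not a quote from the article, and the function names below are hypothetical. The sketch does not reproduce the symmetry-aware version developed in the paper.

```python
import math


def count_string_modes(length: float, lam: float) -> int:
    """Count eigenvalues (n*pi/length)**2 <= lam for n = 1, 2, ... by direct enumeration."""
    n = 0
    while ((n + 1) * math.pi / length) ** 2 <= lam:
        n += 1
    return n


def weyl_estimate_1d(length: float, lam: float) -> float:
    """Leading term of Weyl's law in one dimension: (length / pi) * sqrt(lam)."""
    return length * math.sqrt(lam) / math.pi


if __name__ == "__main__":
    # Compare the exact count with the asymptotic estimate for a unit-length string.
    for lam in (1e2, 1e4, 1e6):
        exact = count_string_modes(1.0, lam)
        estimate = weyl_estimate_1d(1.0, lam)
        print(f"lambda={lam:.0e}  exact={exact}  weyl={estimate:.2f}")
```

In this simple setting the exact count and the estimate never differ by more than one, which is why the law serves as a measure of how much spectral information sits below a given frequency.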

Tahmasebi was, at the same time, thinking about measuring the complexity of the input data to a neural network, wondering whether that complexity could be reduced by taking into account some of the symmetries inherent to the dataset. Such a reduction, in turn, could facilitate — as well as speed up — machine learning processes.

Weyl’s law, conceived about a century before the boom in machine learning, had traditionally been applied to very different physical situations — such as those concerning the vibrations of a string or the spectrum of electromagnetic (black-body) radiation given off by a heated object. Nevertheless, Tahmasebi believed that a customized version of that law might help with the machine learning problem he was pursuing. And if the approach panned out, the payoff could be considerable.

He spoke with his advisor, Stefanie Jegelka — an associate professor in EECS and affiliate of CSAIL and the MIT Institute for Data, Systems, and Society — who believed the idea was definitely worth looking into. As Tahmasebi saw it, Weyl’s law had to do with gauging the complexity of data, and so did this project. But Weyl’s law, in its original form, said nothing about symmetry.

He and Jegelka have now succeeded in modifying Weyl’s law so that symmetry can be factored into the assessment of a dataset’s complexity. “To the best of my knowledge,” Tahmasebi says, “this is the first time Weyl’s law has been used to determine how machine learning can be enhanced by symmetry.”

The paper he and Jegelka wrote earned a “Spotlight” designation when it was presented at the December 2023 conference on Neural Information Processing Systems — widely regarded as the world’s top conference on machine learning.

This work, comments Soledad Villar, an applied mathematician at Johns Hopkins University, “shows that models that satisfy the symmetries of the problem are not only correct but also can produce predictions with smaller errors, using a small amount of training points. [This] is especially important in scientific domains, like computational chemistry, where training data can be scarce.”

In their paper, Tahmasebi and Jegelka explored the ways in which symmetries, or so-called “invariances,” could benefit machine learning. Suppose, for example, the goal of a particular computer run is to pick out every image that contains the numeral 3. That task can be a lot easier, and go a lot quicker, if the algorithm can identify the 3 regardless of where it is placed in the box (whether it’s exactly in the center or off to the side) and whether it is pointed right-side up, upside down, or oriented at a random angle. An algorithm equipped with that capability can take advantage of the symmetries of translation and rotation, meaning that a 3, or any other object, is not changed in itself by altering its position or by rotating it around an arbitrary axis. Such an object is said to be invariant to those shifts. The same logic can be applied to algorithms charged with identifying dogs or cats. A dog is a dog is a dog, one might say, irrespective of how it is embedded within an image.
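One simple way to make a computation invariant to such shifts is to map every input to a canonical representative of all its transformed copies. The sketch below is a toy illustration only, not the method from the paper. It assumes small square grids, translations that wrap around the edges, and rotations in 90-degree steps, and all function names are hypothetical.

```python
def rotate90(img):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def shift(img, dr, dc):
    """Circularly shift a square grid by dr rows and dc columns (wrap-around)."""
    n = len(img)
    return [[img[(r - dr) % n][(c - dc) % n] for c in range(n)] for r in range(n)]


def canonical_form(img):
    """Return the smallest grid (as nested tuples) among all shifted and rotated copies."""
    n = len(img)
    candidates = []
    current = img
    for _ in range(4):  # the four 90-degree rotations
        for dr in range(n):
            for dc in range(n):
                moved = shift(current, dr, dc)
                candidates.append(tuple(tuple(row) for row in moved))
        current = rotate90(current)
    return min(candidates)


# A small L-shaped pattern and a shifted, rotated copy of it.
PATTERN = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
MOVED = shift(rotate90(PATTERN), 1, 2)
```

Both `PATTERN` and `MOVED` reduce to the same canonical form, so anything computed from that form cannot tell them apart. Practical systems usually build the invariance into the model rather than enumerating every transformed copy.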

The point of the entire exercise, the authors explain, is to exploit a dataset’s intrinsic symmetries in order to reduce the complexity of machine learning tasks. That, in turn, can lead to a reduction in the amount of data needed for learning. Concretely, the new work answers the question: How many fewer data are needed to train a machine learning model if the data contain symmetries?

There are two ways of achieving a gain, or benefit, by capitalizing on the symmetries present. The first has to do with the size of the sample to be looked at. Let’s imagine that you are charged, for instance, with analyzing an image that has mirror symmetry — the right side being an exact replica, or mirror image, of the left. In that case, you don’t have to look at every pixel; you can get all the information you need from half of the image — a factor of two improvement. If, on the other hand, the image can be partitioned into 10 identical parts, you can get a factor of 10 improvement. This kind of boosting effect is linear.
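The factor-of-two saving for a mirror-symmetric image can be sketched directly: keep only the left half and rebuild the rest by reflection. This is a minimal illustration under the assumption that the image is an exactly left-right symmetric list of pixel rows, and the function names are hypothetical.

```python
def left_half(img):
    """Keep the left half of each row (including the middle column if the width is odd)."""
    width = len(img[0])
    return [row[: (width + 1) // 2] for row in img]


def rebuild_from_half(half, width):
    """Reconstruct a mirror-symmetric image of the given width from its left half."""
    rebuilt = []
    for row in half:
        mirrored = row[: width // 2][::-1]  # reflect, skipping the middle column if any
        rebuilt.append(row + mirrored)
    return rebuilt


# A 2 x 6 image whose right side mirrors its left side.
IMAGE = [
    [1, 2, 3, 3, 2, 1],
    [4, 5, 6, 6, 5, 4],
]
HALF = left_half(IMAGE)
```

Here `HALF` holds 6 pixels instead of 12, and `rebuild_from_half(HALF, 6)` recovers the full image.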

To take another example, imagine you are sifting through a dataset, trying to find sequences of blocks that have seven different colors: black, blue, green, purple, red, white, and yellow. Your job becomes much easier if you don’t care about the order in which the blocks are arranged. If the order mattered, there would be 5,040 different orderings (7 factorial) to look for. But if all you care about are sequences of blocks in which all seven colors appear, then you have reduced the number of things, or sequences, you are searching for from 5,040 to just one.
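The count in the seven-color example can be checked in a few lines. The sketch enumerates every arrangement of the seven colors, then collapses them with an order-free key (a sorted tuple). The key function is a hypothetical helper chosen for illustration.

```python
import math
from itertools import permutations

COLORS = ("black", "blue", "green", "purple", "red", "white", "yellow")


def ordered_targets(colors):
    """Number of distinct arrangements when order matters: n factorial."""
    return math.factorial(len(colors))


def order_free_key(sequence):
    """Map a sequence to a key that ignores the order of its elements."""
    return tuple(sorted(sequence))


# Every arrangement of the seven colors collapses to the same order-free key.
ALL_ARRANGEMENTS = set(permutations(COLORS))
DISTINCT_KEYS = {order_free_key(p) for p in ALL_ARRANGEMENTS}
```

`ALL_ARRANGEMENTS` has 5,040 entries, matching `ordered_targets(COLORS)`, while `DISTINCT_KEYS` contains a single entry.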

Tahmasebi and Jegelka discovered that it is possible to achieve a different kind of gain — one that is exponential — that can be reaped for symmetries that operate over many dimensions. This advantage is related to the notion that the complexity of a learning task grows exponentially with the dimensionality of the data space. Making use of a multidimensional symmetry can therefore yield a disproportionately large return. “This is a new contribution that is basically telling us that symmetries of higher dimension are more important because they can give us an exponential gain,” Tahmasebi says.
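A toy counting argument shows why removing dimensions pays off exponentially. It is an assumption made for illustration and not the formula from the paper. Suppose a data space is covered by a grid with a fixed number of points along each axis. The number of grid cells is that number raised to the dimension, so a symmetry that effectively removes some dimensions divides the cell count by the same number raised to the count of removed dimensions. The function names are hypothetical.

```python
def grid_cells(points_per_axis: int, dims: int) -> int:
    """Cells needed to cover a dims-dimensional cube at a fixed per-axis resolution."""
    return points_per_axis ** dims


def gain_from_symmetry(points_per_axis: int, dims: int, symmetry_dims: int) -> int:
    """Factor by which the cell count shrinks if a symmetry removes symmetry_dims dimensions."""
    full = grid_cells(points_per_axis, dims)
    reduced = grid_cells(points_per_axis, dims - symmetry_dims)
    return full // reduced


if __name__ == "__main__":
    # With 10 points per axis in 6 dimensions, each removed dimension saves another factor of 10.
    for removed in (1, 2, 3):
        print(removed, gain_from_symmetry(10, 6, removed))
```

In this toy model, removing one dimension gives a factor of 10, while removing three gives a factor of 1,000. That contrasts with the fixed, linear factor obtained from the sample-size effect described earlier.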

The NeurIPS 2023 paper that he wrote with Jegelka contains two theorems that were proved mathematically. “The first theorem shows that an improvement in sample complexity is achievable with the general algorithm we provide,” Tahmasebi says. The second theorem complements the first, he adds, “showing that this is the best possible gain you can get; nothing else is achievable.”

He and Jegelka have provided a formula that predicts the gain one can obtain from a particular symmetry in a given application. A virtue of this formula is its generality, Tahmasebi notes. “It works for any symmetry and any input space.” It works not only for symmetries that are known today, but it could also be applied in the future to symmetries that are yet to be discovered. The latter prospect is not too farfetched to consider, given that the search for new symmetries has long been a major thrust in physics. That suggests that, as more symmetries are found, the methodology introduced by Tahmasebi and Jegelka should only get better over time.

According to Haggai Maron, a computer scientist at Technion (the Israel Institute of Technology) and NVIDIA who was not involved in the work, the approach presented in the paper “diverges substantially from related previous works, adopting a geometric perspective and employing tools from differential geometry. This theoretical contribution lends mathematical support to the emerging subfield of ‘Geometric Deep Learning,’ which has applications in graph learning, 3D data, and more. The paper helps establish a theoretical basis to guide further developments in this rapidly expanding research area.”

Creating new skills and new connections with MIT’s Quantitative Methods Workshop

Starting on New Year’s Day, when many people were still clinging to holiday revelry, scores of students and faculty members from about a dozen partner universities instead flipped open their laptops for MIT’s Quantitative Methods Workshop, a jam-packed, weeklong introduction to how computational and mathematical techniques can be applied to neuroscience and biology research. But don’t think of QMW as a “crash course.” Instead, the program’s purpose is to help elevate each participant’s scientific outlook, both through the skills and concepts it imparts and the community it creates.

“It broadens their horizons, it shows them significant applications they’ve never thought of, and introduces them to people whom as researchers they will come to know and perhaps collaborate with one day,” says Susan L. Epstein, a Hunter College computer science professor and education coordinator of MIT’s Center for Brains, Minds, and Machines, which hosts the program with the departments of Biology and Brain and Cognitive Sciences and The Picower Institute for Learning and Memory. “It is a model of interdisciplinary scholarship.”

This year 83 undergraduates and faculty members from institutions that primarily serve groups underrepresented in STEM fields took part in the QMW, says organizer Mandana Sassanfar, senior lecturer and director of diversity and science outreach across the four hosting MIT entities. Since the workshop launched in 2010, it has engaged more than 1,000 participants, of whom more than 170 have gone on to participate in MIT Summer Research Programs (such as MSRP-BIO), and 39 have come to MIT for graduate school.

Individual goals, shared experience

Undergraduates and faculty in various STEM disciplines often come to QMW to gain an understanding of, or expand their expertise in, computational and mathematical data analysis. Computer science- and statistics-minded participants come to learn more about how such techniques can be applied in life sciences fields. In lectures; in hands-on labs where they used the computer programming language Python to process, analyze, and visualize data; and in less formal settings such as tours and lunches with MIT faculty, participants worked and learned together, and informed each other’s perspectives.

And regardless of their field of study, participants made connections with each other and with the MIT students and faculty who taught and spoke over the course of the week.

Hunter College computer science sophomore Vlad Vostrikov says that while he has already worked with machine learning and other programming concepts, he was interested to “branch out” by seeing how they are used to analyze scientific datasets. He also valued the chance to learn the experiences of the graduate students who teach QMW’s hands-on labs.

“This was a good way to explore computational biology and neuroscience,” Vostrikov says. “I also really enjoy hearing from the people who teach us. It’s interesting to hear where they come from and what they are doing.”

Jariatu Kargbo, a biology and chemistry sophomore at University of Maryland Baltimore County, says when she first learned of the QMW she wasn’t sure it was for her. It seemed very computation-focused. But her advisor Holly Willoughby encouraged Kargbo to attend to learn about how programming could be useful in future research — currently she is taking part in research on the retina at UMBC. More than that, Kargbo also realized it would be a good opportunity to make connections at MIT in advance of perhaps applying for MSRP this summer.

“I thought this would be a great way to meet up with faculty and see what the environment is like here because I’ve never been to MIT before,” Kargbo says. “It’s always good to meet other people in your field and grow your network.”

QMW is not just for students. It’s also for their professors, who said they can gain valuable professional education for their research and teaching.

Fayuan Wen, an assistant professor of biology at Howard University, is no stranger to computational biology, having performed big data genetic analyses of sickle cell disease (SCD). But she’s mostly worked with the R programming language, and QMW’s focus is on Python. As she looks ahead to projects in which she wants to analyze genomic data to help predict disease outcomes in SCD and HIV, she says a QMW session delivered by biology graduate student Hannah Jacobs was perfectly on point.

“This workshop has the skills I want to have,” Wen says.

Moreover, Wen says she is looking to start a machine-learning class in the Howard biology department and was inspired by some of the teaching materials she encountered at QMW — for example, online curriculum modules developed by Taylor Baum, an MIT graduate student in electrical engineering and computer science and Picower Institute labs, and Paloma Sánchez-Jáuregui, a coordinator who works with Sassanfar.

Tiziana Ligorio, a Hunter College computer science doctoral lecturer who together with Epstein teaches a deep machine-learning class at the City University of New York campus, felt similarly. Rather than require a bunch of prerequisites that might drive students away from the class, Ligorio was looking to QMW’s intense but introductory curriculum as a resource for designing a more inclusive way of getting students ready for the class.

Instructive interactions

Each day ran from 9 a.m. to 5 p.m. and included morning and afternoon lectures and hands-on sessions. Class topics ranged from statistical data analysis and machine learning to brain-computer interfaces, brain imaging, signal processing of neural activity data, and cryogenic electron microscopy.

“This workshop could not happen without dedicated instructors (grad students, postdocs, and faculty) who volunteer to give lectures, design and teach hands-on computer labs, and meet with students during the very first week of January,” Sassanfar says.

The sessions surround student lunches with MIT faculty members. For example, at midday Jan. 2, assistant professor of biology Brady Weissbourd, an investigator in the Picower Institute, sat down with seven students on one of Building 46’s curved sofas to field questions about his neuroscience research in jellyfish and how he uses quantitative techniques as part of that work. He also described what it’s like to be a professor and addressed other topics that came to the students’ minds.

Then the participants all crossed Vassar Street to Building 26’s Room 152, where they formed different but similarly sized groups for the hands-on lab “Machine learning applications to studying the brain,” taught by Baum. She guided the class through Python exercises she developed illustrating “supervised” and “unsupervised” forms of machine learning, including how the latter method can be used to discern what a person is seeing based on magnetic readings of brain activity.

As students worked through the exercises, tablemates helped each other by supplementing Baum’s instruction. Ligorio, Vostrikov, and Kayla Blincow, assistant professor of biology at the University of the Virgin Islands, for instance, all leapt to their feet to help at their tables.

At the end of the class, when Baum asked students what they had learned, they offered a litany of new knowledge. Survey data that Sassanfar and Sánchez-Jáuregui use to anonymously track QMW outcomes revealed many more such attestations of the value of the sessions. Asked how they might apply what they had learned, one respondent wrote: “Pursue a research career or endeavor in which I apply the concepts of computer science and neuroscience together.”

Enduring connections

While some new QMW attendees might only be able to speculate about how they’ll apply their new skills and relationships, Luis Miguel de Jesús Astacio could testify to how attending QMW as an undergraduate back in 2014 figured into a career where he is now a faculty member in physics at the University of Puerto Rico Rio Piedras Campus. After QMW, he returned to MIT that summer as a student in the lab of neuroscientist and Picower Professor Susumu Tonegawa. He came back again in 2016 to the lab of physicist and Francis Friedman Professor Mehran Kardar. What’s endured for the decade has been his connection to Sassanfar. So while he was once a student at QMW, this year he was back with a cohort of undergraduates as a faculty member.

Michael Aldarondo-Jeffries, director of academic advancement programs at the University of Central Florida, seconded the value of the networking that takes place at QMW. He has brought students for a decade, including four this year. What he’s observed is that as students come together in settings like QMW or UCF’s McNair program, which helps to prepare students for graduate school, they become inspired about a potential future as researchers.

“The thing that stands out is just the community that’s formed,” he says. “For many of the students, it’s the first time that they’re in a group that understands what they’re moving toward. They don’t have to explain why they’re excited to read papers on a Friday night.”

Or why they are excited to spend a week including New Year’s Day at MIT learning how to apply quantitative methods to life sciences data.

Entrepreneur creates career pathways with MIT OpenCourseWare

When June Odongo interviewed early-career electrical engineer Cynthia Wacheke for a software engineering position at her company, Wacheke lacked knowledge of computer science theory but showed potential in complex problem-solving.

Determined to give Wacheke a shot, Odongo turned to MIT OpenCourseWare to create a six-month “bridging course” modeled after the classes she once took as a computer science student. Part of MIT Open Learning, OpenCourseWare offers free, online, open educational resources from more than 2,500 courses that span the MIT undergraduate and graduate curriculum.

“Wacheke had the potential and interest to do the work that needed to be done, so the way to solve this was for me to literally create a path for her to get that work done,” says Odongo, founder and CEO of Senga Technologies.

Developers, Odongo says, are not easy to find. The OpenCourseWare educational resources provided a way to close that gap. “We put Wacheke through the course last year, and she is so impressive,” Odongo says. “Right now, she is doing our first machine learning models. It’s insane how good of a team member she is. She has done so much in such a short time.”

Making high-quality candidates job-ready

Wacheke, who holds a bachelor’s degree in electrical engineering from the University of Nairobi, started her professional career as a hardware engineer. She discovered a passion for software while working on a dashboard design project, and decided to pivot from hardware to software engineering. That’s when she discovered Senga Technologies, a logistics software and services company in Kenya catering to businesses that ship in Africa.

Odongo founded Senga with the goal of simplifying and easing the supply chain and logistics experience, from the movement of goods to software tools. Senga’s ultimate goal, Odongo says, is to have most of their services driven by software. That means employees — and candidates — need to be able to think through complex problems using computer science theory.

“A lot of people are focused on programming, but we care less about programming and more about problem-solving,” says Odongo, who received a bachelor’s degree in computer science from the University of Massachusetts at Lowell and an MBA from Harvard Business School. “We actually apply the things people learn in computer science programs.”

Wacheke started the bridging course in June 2022 and was given six months to complete the curriculum on the MIT OpenCourseWare website. She took nine courses, including: Introduction to Algorithms; Mathematics for Computer Science; Design and Analysis of Algorithms; Elements of Software Construction; Automata, Computability, and Complexity; Database Systems; Principles of Autonomy and Decision Making; Introduction to Machine Learning; and Networks.

“The bridging course helped me learn how to think through things,” Wacheke says. “It’s one thing to know how to do something, but it’s another to design that thing from scratch and implement it.”

During the bridging course, Wacheke was paired with a software engineer at Senga, who mentored her and answered questions along the way. She learned Ruby on Rails, a server-side web application framework under the MIT License. Wacheke also completed other projects to complement the theory she was learning. She created a new website that included an integration to channel external requests to Slack, a cross-platform team communication tool used by the company’s employees.

Continuous learning for team members

The bridging course concluded with a presentation to Senga employees, during which Wacheke explained how the company could use graph theory for decision-making. “If you want to get from point A to B, there are algorithms you can use to find the shortest path,” Wacheke says. “Since we’re a logistics company, I thought we could use this when we’re deciding which routes our trucks take.”
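The shortest-path idea Wacheke describes can be sketched with Dijkstra’s algorithm, one of several algorithms that fit her description. The article does not say which one Senga uses. The road network below is invented for illustration, with hypothetical stop names and non-negative travel costs. The function name is also hypothetical.

```python
import heapq


def shortest_path(graph, start, goal):
    """Return (total_cost, path) via Dijkstra's algorithm, or (inf, []) if goal is unreachable.

    graph maps each node to a dict of {neighbor: non-negative edge cost}.
    """
    best = {start: 0.0}          # cheapest known cost to reach each node
    previous = {}                # back-pointers used to rebuild the path
    queue = [(0.0, start)]       # priority queue of (cost so far, node)
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue             # stale queue entry; a cheaper route was already settled
        visited.add(node)
        if node == goal:
            break
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]


# Hypothetical one-way road network: costs could stand for hours or kilometers.
ROADS = {
    "A": {"C": 4, "D": 1},
    "D": {"C": 2, "B": 7},
    "C": {"B": 3},
    "B": {},
}
```

Calling `shortest_path(ROADS, "A", "B")` finds the route A, D, C, B with a total cost of 6, which beats both the direct-looking A, C, B (cost 7) and A, D, B (cost 8). Real routing would also need to account for factors such as traffic, vehicle capacity, and delivery windows, which this sketch ignores.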

The presentation, which is the final requirement for the bridging course, is also a professional development opportunity for Senga employees. “This process is helpful for our team members, particularly those who have been out of school for a while,” Odongo says. “The candidates present what they’ve learned in relation to Senga. It’s a way of doing continuous learning for the existing team members.”

After successfully completing the bridging course in November 2022, Wacheke transitioned to a full-time software engineer role. She is currently developing a “machine” that can interpret and categorize hundreds of documents, including delivery notes, cash flows, and receipts.

“The goal is to enable our customers to simply feed those documents into our machine, and then we can more accurately read and convert them to digital formats to drive automation,” Odongo says. “The machine will also enable someone to ask a document a question, such as ‘What did I deliver to retailer X on date Y?’ or ‘What is the total price of the goods delivered?’”

The bridging course, which was initially custom-designed for Wacheke, is now a permanent program at Senga. A second team member completed the course in October 2023 and has joined the software team full time.

“Developers are not easy to find, and you also want high-quality developers,” Odongo says. “At least when we do this, we know that the person has gone through what we need.”

Q&A: A blueprint for sustainable innovation

Atacama Biomaterials is a startup combining architecture, machine learning, and chemical engineering to create eco-friendly materials with multiple applications. Passionate about sustainable innovation, its co-founder Paloma Gonzalez-Rojas SM ’15, PhD ’21 highlights here how MIT has supported the project through several of its entrepreneurship initiatives, and reflects on the role of design in building a holistic vision for an expanding business.

Q: What role do you see your startup playing in the sustainable materials space?

A: Atacama Biomaterials is a venture dedicated to advancing sustainable materials through state-of-the-art technology. With my co-founder Jose Tomas Dominguez, we have been working on developing our technology since 2019. We initially started the company in 2020 under another name and received Sandbox funds the next year. In 2021, we went through The Engine’s accelerator, Blueprint, and changed our name to Atacama Biomaterials in 2022 during the MITdesignX program.

This technology we have developed allows us to create our own data and material library using artificial intelligence and machine learning, and serves as a platform applicable to various industries horizontally — biofuels, biological drugs, and even mining. Vertically, we produce inexpensive, regionally sourced, and environmentally friendly bio-based polymers and packaging — that is, naturally compostable plastics as a flagship product, along with AI products.

Q: What motivated you to venture into biomaterials and found Atacama?

A: I’m from Chile, a country with a beautiful, rich geography and nature where we can see all the problems stemming from industry, waste management, and pollution. We named our company Atacama Biomaterials because the Atacama Desert in Chile, one of the places where you can best see the stars in the world, is becoming a plastic dump, like many other places on Earth. I care deeply about sustainability, and I have an emotional attachment to stopping these problems. Considering that manufacturing accounts for 29 percent of global carbon emissions, it is clear that sustainability has a role in how we define technology and entrepreneurship, as well as a socio-economic dimension.

When I first came to MIT, it was to develop software in the Department of Architecture’s Design and Computation Group, with MIT professors Svafa Grönfeldt as co-advisor and Regina Barzilay as committee member. During my PhD, I studied machine-learning methods simulating pedestrian motion to understand how people move in space. In my work, I would use lots of plastics for 3D printing and I couldn’t stop thinking about sustainability and climate change, so I reached out to material science and mechanical engineering professors to look into biopolymers and degradable bio-based materials. This is how I met my co-founder, as we were both working with MIT Professor Neil Gershenfeld. Together, we were part of one of the first teams in the world to 3D print wood fibers, which is difficult (it’s slow and expensive), and quickly pivoted to sustainable packaging.

I then won a fellowship from MCSC [the MIT Climate and Sustainability Consortium], which gave me freedom to explore further, and I eventually got a postdoc in MIT chemical engineering, guided by MIT Professor Gregory Rutledge, a polymer physicist. This was unexpected in my career path. Winning Nucleate Eco Track 2022 and the MITdesignX Innovation Award in 2022 profiled Atacama Biomaterials as one of the rising startups in Boston’s biotechnology and climate-tech scene.

Q: What is your process to develop new biomaterials?

A: My PhD research, coupled with my background in material development and molecular dynamics, sparked the realization that principles I studied simulating pedestrian motion could also apply to molecular engineering. This connection may seem unconventional, but for me, it was a natural progression. Early in my career, I developed an intuition for materials, understanding their mechanics and physics.

Using my experience and skills, and leveraging machine learning as a technology jump, I applied a similar conceptual framework to simulate the trajectories of molecules and find potential applications in biomaterials. Making that parallel and shift was amazing. It allowed me to optimize a state-of-the-art molecular dynamics software to run twice as fast as more traditional technologies through my algorithm presented at the International Conference on Machine Learning this year. This is very important, because this kind of simulation usually takes a week, so narrowing it down to two days has major implications for scientists and industry, in material science, chemical engineering, computer science and related fields. Such work greatly influenced the foundation of Atacama Biomaterials, where we developed our own AI to deploy our materials. In an effort to mitigate the environmental impact of manufacturing, Atacama is targeting a 16.7 percent reduction in carbon dioxide emissions associated with the manufacturing process of its polymers, through the use of renewable energy.

Another thing is that I was trained as an architect in Chile, and my degree had a design component. I think design allows me to understand problems at a very high level, and how things interconnect. It contributed to developing a holistic vision for Atacama, because it allowed me to jump from one technology or discipline to another and understand broader applications on a conceptual level. Our design approach also meant that sustainability came to the center of our work from the very beginning, not just a plus or an added cost.

Q: What was the role of MITdesignX in Atacama’s development?

A: I have known Svafa Grönfeldt, MITdesignX’s faculty director, for almost six years. She was the co-advisor of my PhD, and we had a mentor-mentee relationship. I admire the fact that she created a space for people interested in business and entrepreneurship to grow within the Department of Architecture. She and Executive Director Gilad Rosenzweig gave us fantastic advice, and we received significant support from mentors. For example, Daniel Tsai helped us with intellectual property, including a crucial patent for Atacama. And we’re still in touch with the rest of the cohort. I really like this “design your company” approach, which I find quite unique, because it gives us the opportunity to reflect on who we want to be as designers, technologists, and entrepreneurs. Studying user insights also allowed us to understand the broad applicability of our research, and align our vision with market demands, ultimately shaping Atacama into a company with a holistic perspective on sustainable material development.

Q: How does Atacama approach scaling, and what are the immediate next steps for the company?

A: When I think about accomplishing our vision, I feel really inspired by my 3-year-old daughter. I want her to experience a world with trees and wildlife when she’s 100 years old, and I hope Atacama will contribute to such a future.

Going back to the designer’s perspective, we designed the whole process holistically, from feedstock to material development, incorporating AI and advanced manufacturing. Having proved that there is a demand for the materials we are developing, and having tested our products, manufacturing process, and technology in critical environments, we are now ready to scale. Our level of technology-readiness is comparable to the one used by NASA (level 4).

We have proof of concept: a biodegradable and recyclable packaging material which is cost- and energy-efficient as a clean energy enabler in large-scale manufacturing. We have received pre-seed funding, and are sustainably scaling by taking advantage of available resources around the world, like repurposing machinery from the paper industry. As presented in the MIT Industrial Liaison and STEX Program’s recent Sustainability Conference, unlike our competitors, we have cost-parity with current packaging materials, as well as low-energy processes. And we also proved the demand for our products, which was an important milestone. Our next steps involve strategically expanding our manufacturing capabilities and research facilities and we are currently evaluating building a factory in Chile and establishing an R&D lab plus a manufacturing plant in the U.S.

What to do about AI in health?

Before a drug is approved by the U.S. Food and Drug Administration (FDA), it must demonstrate both safety and efficacy. However, the FDA does not require an understanding of a drug’s mechanism of action for approval. This acceptance of results without explanation raises the question of whether the “black box” decision-making process of a safe and effective artificial intelligence model must be fully explained in order to secure FDA approval.

This topic was one of many discussion points addressed on Monday, Dec. 4 during the MIT Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) AI and Health Regulatory Policy Conference, which ignited a series of discussions and debates amongst faculty; regulators from the United States, EU, and Nigeria; and industry experts concerning the regulation of AI in health.

As machine learning continues to evolve rapidly, uncertainty persists as to whether regulators can keep up and still reduce the likelihood of harmful impact while ensuring that their respective countries remain competitive in innovation. To promote an environment of frank and open discussion, attendance at the Jameel Clinic event was highly curated: an audience of 100 attendees debated under the Chatham House Rule, which allows speakers to voice controversial opinions and arguments without being identified as the source.

Rather than hosting an event to generate buzz around AI in health, the Jameel Clinic’s goal was to create a space to keep regulators apprised of the most cutting-edge advancements in AI, while allowing faculty and industry experts to propose new or different approaches to regulatory frameworks for AI in health, especially for AI use in clinical settings and in drug development.

AI’s role in medicine is more relevant than ever, as the industry struggles with a post-pandemic labor shortage, increased costs (“Not a salary issue, despite common belief,” said one speaker), as well as high rates of burnout and resignations among health care professionals. One speaker suggested that priorities for clinical AI deployment should be focused more on operational tooling rather than patient diagnosis and treatment.

One attendee pointed out a “clear lack of education across all constituents — not just amongst developer communities and health care systems, but with patients and regulators as well.” Given that medical doctors are often the primary users of clinical AI tools, a number of the medical doctors present pleaded with regulators to consult them before taking action.

Data availability was a key issue for the majority of AI researchers in attendance. They lamented the lack of data to make their AI tools work effectively. Many faced barriers such as intellectual property barring access or simply a dearth of large, high-quality datasets. “Developers can’t spend billions creating data, but the FDA can,” a speaker pointed out during the event. “There’s a price uncertainty that could lead to underinvestment in AI.” Speakers from the EU touted the development of a system obligating governments to make health data available for AI researchers.

By the end of the daylong event, many of the attendees suggested prolonging the discussion and praised the selective curation and closed environment, which created a unique space conducive to open and productive discussions on AI regulation in health. Once future follow-up events are confirmed, the Jameel Clinic will develop additional workshops of a similar nature to maintain the momentum and keep regulators in the loop on the latest developments in the field.

“The North Star for any regulatory system is safety,” acknowledged one attendee. “Generational thought stems from that, then works downstream.”

New hope for early pancreatic cancer intervention via AI-based risk prediction

The first documented case of pancreatic cancer dates back to the 18th century. Since then, researchers have undertaken a protracted and challenging odyssey to understand the elusive and deadly disease. To date, there is no better cancer treatment than early intervention. Unfortunately, the pancreas, nestled deep within the abdomen, is particularly elusive for early detection.

MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) scientists, alongside Limor Appelbaum, a staff scientist in the Department of Radiation Oncology at Beth Israel Deaconess Medical Center (BIDMC), were eager to better identify potential high-risk patients. They set out to develop two machine-learning models for early detection of pancreatic ductal adenocarcinoma (PDAC), the most common form of the cancer. To access a broad and diverse database, the team synced up with a federated network company, using electronic health record data from various institutions across the United States. This vast pool of data helped ensure the models’ reliability and generalizability, making them applicable across a wide range of populations, geographical locations, and demographic groups.

The two models, the “PRISM” neural network and the logistic regression model (a statistical technique for probability), outperformed current methods. The team’s comparison showed that while standard screening criteria identify about 10 percent of PDAC cases using a five-times higher relative risk threshold, PRISM can detect 35 percent of PDAC cases at this same threshold.
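To make the comparison concrete, here is a minimal Python sketch of how a "share of cases detected at a relative risk threshold" figure can be computed. The article does not define relative risk, so the definition used here (incidence among flagged patients divided by incidence in the whole cohort) is an assumption, and the toy cohort is invented purely so the numbers echo the 35 percent and five-times figures above. It is not the study's data or code.

```python
def screening_summary(records):
    """Summarize a screening rule over (was_flagged, is_case) boolean pairs.

    Returns (sensitivity, relative_risk), where relative risk is ASSUMED to
    mean: incidence among flagged patients / incidence in the whole cohort.
    """
    total = len(records)
    cases = sum(1 for _, is_case in records if is_case)
    flagged_outcomes = [is_case for was_flagged, is_case in records if was_flagged]
    flagged_cases = sum(1 for is_case in flagged_outcomes if is_case)

    sensitivity = flagged_cases / cases
    relative_risk = (flagged_cases / len(flagged_outcomes)) / (cases / total)
    return sensitivity, relative_risk


# Invented toy cohort: 1,000 patients, 20 cases, 70 patients flagged.
TOY_COHORT = (
    [(True, True)] * 7        # flagged, later diagnosed
    + [(True, False)] * 63    # flagged, not diagnosed
    + [(False, True)] * 13    # missed cases
    + [(False, False)] * 917  # not flagged, not diagnosed
)

sensitivity, relative_risk = screening_summary(TOY_COHORT)
print(f"sensitivity={sensitivity:.2f}, relative risk={relative_risk:.1f}")
```

In this toy cohort the rule flags 7 of the 20 cases, and the flagged group carries about five times the cohort's baseline incidence.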

Using AI to detect cancer risk is not a new phenomenon: algorithms analyze mammograms and CT scans for lung cancer, and assist in the analysis of Pap smear tests and HPV testing, to name a few applications. “The PRISM models stand out for their development and validation on an extensive database of over 5 million patients, surpassing the scale of most prior research in the field,” says Kai Jia, an MIT PhD student in electrical engineering and computer science (EECS), MIT CSAIL affiliate, and first author on an open-access paper in eBioMedicine outlining the new work. “The model uses routine clinical and lab data to make its predictions, and the diversity of the U.S. population is a significant advancement over other PDAC models, which are usually confined to specific geographic regions, like a few health-care centers in the U.S. Additionally, using a unique regularization technique in the training process enhanced the models’ generalizability and interpretability.”

“This report outlines a powerful approach to use big data and artificial intelligence algorithms to refine our approach to identifying risk profiles for cancer,” says David Avigan, a Harvard Medical School professor and the cancer center director and chief of hematology and hematologic malignancies at BIDMC, who was not involved in the study. “This approach may lead to novel strategies to identify patients with high risk for malignancy that may benefit from focused screening with the potential for early intervention.”

Prismatic perspectives

The journey toward the development of PRISM began over six years ago, fueled by firsthand experiences with the limitations of current diagnostic practices. “Approximately 80-85 percent of pancreatic cancer patients are diagnosed at advanced stages, where cure is no longer an option,” says senior author Appelbaum, who is also a Harvard Medical School instructor and a radiation oncologist. “This clinical frustration sparked the idea to delve into the wealth of data available in electronic health records (EHRs).”

The CSAIL group’s close collaboration with Appelbaum made it possible to understand the combined medical and machine learning aspects of the problem better, eventually leading to a much more accurate and transparent model. “The hypothesis was that these records contained hidden clues — subtle signs and symptoms that could act as early warning signals of pancreatic cancer,” she adds. “This guided our use of federated EHR networks in developing these models, for a scalable approach for deploying risk prediction tools in health care.”

Both PrismNN and PrismLR models analyze EHR data, including patient demographics, diagnoses, medications, and lab results, to assess PDAC risk. PrismNN uses artificial neural networks to detect intricate patterns in data features like age, medical history, and lab results, yielding a risk score for PDAC likelihood. PrismLR uses logistic regression for a simpler analysis, generating a probability score of PDAC based on these features. Together, the models offer a thorough evaluation of different approaches in predicting PDAC risk from the same EHR data.
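As a rough illustration of the logistic regression idea behind a model like PrismLR, the Python sketch below turns a handful of binary EHR-style features into a probability. The feature names, weights, and bias are hypothetical values chosen for illustration. They are not the published PRISM coefficients, and a real model would be fit to data.

```python
import math

# Hypothetical weights for illustration only; NOT the PRISM coefficients.
WEIGHTS = {
    "age_over_60": 0.8,
    "diabetes_diagnosis": 1.1,
    "frequent_physician_visits": 0.5,
}
BIAS = -4.0  # hypothetical intercept; keeps baseline risk low


def risk_probability(features):
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_high_risk(features, threshold):
    """Flag a patient whose predicted probability meets the threshold."""
    return risk_probability(features) >= threshold


baseline = {"age_over_60": 0, "diabetes_diagnosis": 0, "frequent_physician_visits": 0}
elevated = {"age_over_60": 1, "diabetes_diagnosis": 1, "frequent_physician_visits": 1}
print(risk_probability(baseline), risk_probability(elevated))
```

Because each feature contributes a single additive weight, a clinician can read off which factors pushed a score up, which is why logistic regression is generally considered easier to interpret than a neural network.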

One paramount point for gaining the trust of physicians, the team notes, is better understanding how the models work, known in the field as interpretability. The scientists pointed out that while logistic regression models are inherently easier to interpret, recent advancements have made deep neural networks somewhat more transparent. This helped the team to refine the thousands of potentially predictive features derived from EHR of a single patient to approximately 85 critical indicators. These indicators, which include patient age, diabetes diagnosis, and an increased frequency of visits to physicians, are automatically discovered by the model but match physicians’ understanding of risk factors associated with pancreatic cancer.

The path forward

Despite the promise of the PRISM models, as with all research, some parts are still a work in progress. U.S. data alone are the current diet for the models, necessitating testing and adaptation for global use. The path forward, the team notes, includes expanding the model’s applicability to international datasets and integrating additional biomarkers for more refined risk assessment.

“A subsequent aim for us is to facilitate the models’ implementation in routine health care settings. The vision is to have these models function seamlessly in the background of health care systems, automatically analyzing patient data and alerting physicians to high-risk cases without adding to their workload,” says Jia. “A machine-learning model integrated with the EHR system could empower physicians with early alerts for high-risk patients, potentially enabling interventions well before symptoms manifest. We are eager to deploy our techniques in the real world to help all individuals enjoy longer, healthier lives.”

Jia wrote the paper alongside Appelbaum and MIT EECS Professor and CSAIL Principal Investigator Martin Rinard, who are both senior authors of the paper. Researchers on the paper were supported during their time at MIT CSAIL, in part, by the Defense Advanced Research Projects Agency, Boeing, the National Science Foundation, and Aarno Labs. TriNetX provided resources for the project, and the Prevent Cancer Foundation also supported the team.

Reasoning and reliability in AI

In order for natural language to be an effective form of communication, the parties involved need to be able to understand words and their context, assume that the content is largely shared in good faith and is trustworthy, reason about the information being shared, and then apply it to real-world scenarios. MIT PhD students interning with the MIT-IBM Watson AI Lab — Athul Paul Jacob SM ’22, Maohao Shen SM ’23, Victor Butoi, and Andi Peng SM ’23 — are working to attack each step of this process that’s baked into natural language models, so that the AI systems can be more dependable and accurate for users.

To achieve this, Jacob’s research strikes at the heart of existing natural language models to improve the output, using game theory. His interests, he says, are two-fold: “One is understanding how humans behave, using the lens of multi-agent systems and language understanding, and the second thing is, ‘How do you use that as an insight to build better AI systems?’” His work stems from the board game “Diplomacy,” where his research team developed a system that could learn and predict human behaviors and negotiate strategically to achieve a desired, optimal outcome.

“This was a game where you need to build trust; you need to communicate using language. You need to also play against six other players at the same time, which were very different from all the kinds of task domains people were tackling in the past,” says Jacob, referring to other games like poker and Go that researchers put to neural networks. “In doing so, there were a lot of research challenges. One was, ‘How do you model humans? How do you know when humans tend to act irrationally?’” Jacob and his research mentors (including Associate Professor Jacob Andreas and Assistant Professor Gabriele Farina of the MIT Department of Electrical Engineering and Computer Science (EECS), and the MIT-IBM Watson AI Lab’s Yikang Shen) recast the problem of language generation as a two-player game.

Using “generator” and “discriminator” models, Jacob’s team developed a natural language system to produce answers to questions and then observe the answers and determine if they are correct. If they are, the AI system receives a point; if not, no point is awarded. Language models notoriously tend to hallucinate, making them less trustworthy; this no-regret learning algorithm collaboratively takes a natural language model and encourages the system’s answers to be more truthful and reliable, while keeping the solutions close to the pre-trained language model’s priors. Jacob says that using this technique in conjunction with a smaller language model could likely make its performance competitive with that of a model many times bigger.

Once a language model generates a result, researchers ideally want its confidence in its generation to align with its accuracy, but this frequently isn’t the case. Hallucinations can occur with the model reporting high confidence when it should be low. Maohao Shen and his group, with mentors Gregory Wornell, Sumitomo Professor of Engineering in EECS, and lab researchers with IBM Research Subhro Das, Prasanna Sattigeri, and Soumya Ghosh, are looking to fix this through uncertainty quantification (UQ). “Our project aims to calibrate language models when they are poorly calibrated,” says Shen. Specifically, they’re looking at the classification problem. For this, Shen allows a language model to generate free text, which is then converted into a multiple-choice classification task. For instance, they might ask the model to solve a math problem and then ask it if the answer it generated is correct as “yes, no, or maybe.” This helps to determine if the model is over- or under-confident.

Automating this, the team developed a technique that helps tune the confidence output by a pre-trained language model. The researchers trained an auxiliary model using the ground-truth information in order for their system to be able to correct the language model. “If your model is over-confident in its prediction, we are able to detect it and make it less confident, and vice versa,” explains Shen. The team evaluated their technique on multiple popular benchmark datasets to show how well it generalizes to unseen tasks to realign the accuracy and confidence of language model predictions. “After training, you can just plug in and apply this technique to new tasks without any other supervision,” says Shen. “The only thing you need is the data for that new task.”
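One common way to quantify the gap between confidence and accuracy that Shen describes is expected calibration error (ECE). The Python sketch below is a standard, simplified version of that metric, included only to illustrate what "over-confident" means numerically. It is not the team's auxiliary-model technique, and the bin count and sample values are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average of |mean confidence - accuracy| over confidence bins.

    confidences: floats in [0, 1]; correct: booleans of the same length.
    A value near 0 means confidence tracks accuracy closely.
    """
    bins = [[] for _ in range(n_bins)]
    for confidence, is_correct in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, is_correct))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(mean_confidence - accuracy)
    return ece


# An over-confident toy model: claims 90 percent confidence, is right half the time.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, False, False]
)
print(overconfident)
```

A calibration method of the kind described above would aim to lower this number by pulling reported confidence toward observed accuracy.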

Victor Butoi is also enhancing model capability, in his case for vision-language models. His lab team (which includes John Guttag, the Dugald C. Jackson Professor of Computer Science and Electrical Engineering in EECS; lab researchers Leonid Karlinsky and Rogerio Feris of IBM Research; and lab affiliates Hilde Kühne of the University of Bonn and Wei Lin of Graz University of Technology) is creating techniques to allow vision-language models to reason about what they’re seeing, and is designing prompts to unlock new learning abilities and understand key phrases.

Compositional reasoning is just another aspect of the decision-making process that we ask machine-learning models to perform in order for them to be helpful in real-world situations, explains Butoi. “You need to be able to think about problems compositionally and solve subtasks,” says Butoi, “like, if you’re saying the chair is to the left of the person, you need to recognize both the chair and the person. You need to understand directions.” And then once the model understands “left,” the research team wants the model to be able to answer other questions involving “left.”

Surprisingly, vision-language models do not reason well about composition, Butoi explains, but they can be helped to, using a model that can “lead the witness,” if you will. The team developed a model that was tweaked using a technique called low-rank adaptation of large language models (LoRA) and trained on an annotated dataset called Visual Genome, which has objects in an image and arrows denoting relationships, like directions. In this case, the trained LoRA model would be guided to say something about “left” relationships, and this caption output would then be used to provide context and prompt the vision-language model, making it a “significantly easier task,” says Butoi.
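For readers unfamiliar with LoRA, the core idea is to leave a large pre-trained weight matrix frozen and learn only a small low-rank correction to it. The Python sketch below shows that arithmetic on a tiny matrix using plain lists. The matrix sizes, values, and the `scale` parameter are illustrative assumptions, and this is the generic LoRA update, not the team's trained model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_weight(frozen_w, b, a, scale=1.0):
    """Effective weight = frozen W + scale * (B @ A), with B and A low-rank."""
    delta = matmul(b, a)
    return [
        [frozen_w[i][j] + scale * delta[i][j] for j in range(len(frozen_w[0]))]
        for i in range(len(frozen_w))
    ]


# Frozen 3x3 weight (9 values) adapted with rank 1: B is 3x1, A is 1x3 (6 values).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [2.0]]
A = [[0.5, 0.0, 1.0]]

adapted = lora_weight(W, B, A)
print(adapted)
```

Only `B` and `A` would be trained. At realistic sizes the saving is far larger than in this toy case, which is what makes it practical to tune a big model on a dataset such as Visual Genome.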

In the world of robotics, AI systems also engage with their surroundings using computer vision and language. The settings may range from warehouses to the home. Andi Peng and mentors MIT’s H.N. Slater Professor in Aeronautics and Astronautics Julie Shah and Chuang Gan, of the lab and the University of Massachusetts at Amherst, are focusing on assisting people with physical constraints, using virtual worlds. For this, Peng’s group is developing two embodied AI models, a “human” that needs support and a helper agent, in a simulated environment called ThreeDWorld. Focusing on human/robot interactions, the team leverages semantic priors captured by large language models to help the helper AI infer what tasks the “human” agent might not be able to perform and the motivation behind the “human” agent’s actions, using natural language. The team is looking to strengthen the helper’s sequential decision-making, bidirectional communication, understanding of the physical scene, and sense of how best to contribute.

“A lot of people think that AI programs should be autonomous, but I think that an important part of the process is that we build robots and systems for humans, and we want to convey human knowledge,” says Peng. “We don’t want a system to do something in a weird way; we want them to do it in a human way that we can understand.”

Stratospheric safety standards: How aviation could steer regulation of AI in health

What is the likelihood of dying in a plane crash? According to a 2022 report released by the International Air Transport Association, the industry fatality risk is 0.11. In other words, on average, a person would need to take a flight every day for 25,214 years to have a 100 percent chance of experiencing a fatal accident. Long touted as one of the safest modes of transportation, the highly regulated aviation industry has MIT scientists thinking that it may hold the key to regulating artificial intelligence in health care.
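The two figures in the paragraph above can be checked against each other with a few lines of Python. The article does not state the unit of the 0.11 figure, so this sketch assumes it is a rate per million flights. It also reads the "100 percent chance" loosely, as roughly one expected fatal accident over that many daily flights rather than a literal certainty.

```python
def implied_risk_per_million(years_of_daily_flights, days_per_year=365.25):
    """Risk per million flights implied by one expected accident over the period."""
    flights = years_of_daily_flights * days_per_year
    return 1_000_000 / flights


# 25,214 years of one flight per day is roughly 9.2 million flights.
risk = implied_risk_per_million(25_214)
print(f"{risk:.3f} per million flights")
```

Under that assumption, the 25,214-year figure and the 0.11 fatality risk agree to two decimal places.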

Marzyeh Ghassemi, an assistant professor at the MIT Department of Electrical Engineering and Computer Science (EECS) and Institute for Medical Engineering and Science, and Julie Shah, the H.N. Slater Professor of Aeronautics and Astronautics at MIT, share an interest in the challenges of transparency in AI models. After chatting in early 2023, they realized that aviation could serve as a model to ensure that marginalized patients are not harmed by biased AI models.

Ghassemi, who is also a principal investigator at the MIT Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) and the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Shah then recruited a cross-disciplinary team of researchers, attorneys, and policy analysts across MIT, Stanford University, the Federation of American Scientists, Emory University, University of Adelaide, Microsoft, and the University of California San Francisco to kick off a research project, the results of which were recently accepted to the Equity and Access in Algorithms, Mechanisms and Optimization Conference.

“I think I can speak for both Marzyeh and myself when I say that we’re really excited to see kind of excitement around AI starting to come about in society,” says first author Elizabeth Bondi-Kelly, now an assistant professor of EECS at the University of Michigan who was a postdoc in Ghassemi’s lab when the project began. “But we’re also a little bit cautious and want to try to make sure that it’s possible we can have frameworks in place to manage potential risks as these deployments start to happen, so we were looking for inspiration for ways to try to facilitate that.”

AI in health today bears a resemblance to where the aviation industry was a century ago, says co-author Lindsay Sanneman, a PhD student in the Department of Aeronautics and Astronautics at MIT. Though the 1920s were known as “the Golden Age of Aviation,” fatal accidents were “disturbingly numerous,” according to the Mackinac Center for Public Policy.

Jeff Marcus, the current chief of the National Transportation Safety Board (NTSB) Safety Recommendations Division, recently published a National Aviation Month blog post noting that while a number of fatal accidents occurred in the 1920s, 1929 remains the “worst year on record” for fatal aviation accidents, with 51 reported accidents. By today’s standards that would be 7,000 accidents per year, or 20 per day. In response to the high number of fatal accidents in the 1920s, President Calvin Coolidge signed landmark legislation in 1926 known as the Air Commerce Act, which would regulate air travel via the Department of Commerce.

But the parallels do not stop there — aviation’s subsequent path into automation is similar to AI’s. AI explainability has been a contentious topic given AI’s notorious “black box” problem, which has AI researchers debating how much an AI model must “explain” its result to the user before potentially biasing them to blindly follow the model’s guidance.

“In the 1970s there was an increasing amount of automation … autopilot systems that take care of warning pilots about risks,” Sanneman adds. “There were some growing pains as automation entered the aviation space in terms of human interaction with the autonomous system — potential confusion that arises when the pilot doesn’t have keen awareness about what the automation is doing.”

Today, becoming a commercial airline captain requires 1,500 hours of logged flight time along with instrument training. According to the researchers’ paper, this rigorous and comprehensive process takes approximately 15 years, including a bachelor’s degree and co-piloting. Researchers believe the success of extensive pilot training could be a potential model for training medical doctors on using AI tools in clinical settings.

The paper also proposes encouraging reports of unsafe health AI tools in the way the Federal Aviation Administration (FAA) does for pilots: via “limited immunity,” which allows pilots to retain their license after doing something unsafe, as long as it was unintentional.

According to a 2023 report published by the World Health Organization, on average, one in every 10 patients is harmed by an adverse event (i.e., “medical errors”) while receiving hospital care in high-income countries.

Yet in current health care practice, clinicians and health care workers often fear reporting medical errors, not only because of concerns related to guilt and self-criticism, but also due to negative consequences that emphasize the punishment of individuals, such as a revoked medical license, rather than reforming the system that made medical error more likely to occur.

“In health, when the hammer misses, patients suffer,” wrote Ghassemi in a recent comment published in Nature Human Behaviour. “This reality presents an unacceptable ethical risk for medical AI communities who are already grappling with complex care issues, staffing shortages, and overburdened systems.”

Grace Wickerson, co-author and health equity policy manager at the Federation of American Scientists, sees this new paper as a critical addition to a broader governance framework that is not yet in place. “I think there’s a lot that we can do with existing government authority,” they say. “There’s different ways that Medicare and Medicaid can pay for health AI that makes sure that equity is considered in their purchasing or reimbursement technologies, the NIH [National Institutes of Health] can fund more research in making algorithms more equitable and build standards for these algorithms that could then be used by the FDA [Food and Drug Administration] as they’re trying to figure out what health equity means and how they’re regulated within their current authorities.”

Among others, the paper lists six primary existing government agencies that could help regulate health AI, including the FDA, the Federal Trade Commission (FTC), the recently established Advanced Research Projects Agency for Health, the Agency for Healthcare Research and Quality, the Centers for Medicare and Medicaid Services, the Department of Health and Human Services, and the Office for Civil Rights (OCR).

But Wickerson says that more needs to be done. The most challenging part of writing the paper, in Wickerson’s view, was “imagining what we don’t have yet.”

Rather than relying solely on existing regulatory bodies, the paper proposes creating an independent auditing authority, similar to the NTSB, that would conduct safety audits of malfunctioning health AI systems.

“I think that’s the current question for tech governance — we haven’t really had an entity that’s been assessing the impact of technology since the ’90s,” Wickerson adds. “There used to be an Office of Technology Assessment … before the digital era even started, this office existed and then the federal government allowed it to sunset.”

Zach Harned, co-author and recent graduate of Stanford Law School, believes a primary challenge in emerging technology is having technological development outpace regulation. “However, the importance of AI technology and the potential benefits and risks it poses, especially in the health-care arena, has led to a flurry of regulatory efforts,” Harned says. “The FDA is clearly the primary player here, and they’ve consistently issued guidances and white papers attempting to illustrate their evolving position on AI; however, privacy will be another important area to watch, with enforcement from OCR on the HIPAA [Health Insurance Portability and Accountability Act] side and the FTC enforcing privacy violations for non-HIPAA covered entities.”

Harned notes that the area is evolving fast, including developments such as the recent White House Executive Order 14110 on the safe and trustworthy development of AI, as well as regulatory activity in the European Union (EU), including the capstone EU AI Act that is nearing finalization. “It’s certainly an exciting time to see this important technology get developed and regulated to ensure safety while also not stifling innovation,” he says.

In addition to regulatory activities, the paper suggests other opportunities to create incentives for safer health AI tools such as a pay-for-performance program, in which insurance companies reward hospitals for good performance (though researchers recognize that this approach would require additional oversight to be equitable).

So just how long do researchers think it would take to create a working regulatory system for health AI? According to the paper, “the NTSB and FAA system, where investigations and enforcement are in two different bodies, was created by Congress over decades.”

Bondi-Kelly hopes that the paper is a piece of the puzzle of AI regulation. In her mind, “the dream scenario would be that all of us read the paper and are super inspired and able to apply some of the helpful lessons from aviation to help AI to prevent some of the potential harm that might come about.”

In addition to Ghassemi, Shah, Bondi-Kelly, and Sanneman, MIT co-authors on the work include Senior Research Scientist Leo Anthony Celi and former postdocs Thomas Hartvigsen and Swami Sankaranarayanan. Funding for the work came, in part, from an MIT CSAIL METEOR Fellowship, Quanta Computing, the Volkswagen Foundation, the National Institutes of Health, the Herman L. F. von Helmholtz Career Development Professorship, and a CIFAR Azrieli Global Scholar award.

Multiple AI models help robots execute complex plans more transparently

Your daily to-do list is likely pretty straightforward: wash the dishes, buy groceries, and other minutiae. It’s unlikely you wrote out “pick up the first dirty dish,” or “wash that plate with a sponge,” because each of these miniature steps within the chore feels intuitive. While we can routinely complete each step without much thought, a robot requires a complex plan that involves more detailed outlines.

MIT’s Improbable AI Lab, a group within the Computer Science and Artificial Intelligence Laboratory (CSAIL), has offered these machines a helping hand with a new multimodal framework: Compositional Foundation Models for Hierarchical Planning (HiP), which develops detailed, feasible plans with the expertise of three different foundation models. Like OpenAI’s GPT-4, the foundation model that ChatGPT and Bing Chat were built upon, these foundation models are trained on massive quantities of data for applications like generating images, translating text, and robotics.

Unlike RT-2 and other multimodal models that are trained on paired vision, language, and action data, HiP uses three different foundation models, each trained on a different data modality. Each foundation model captures a different part of the decision-making process, and the three work together when it’s time to make decisions. HiP removes the need for access to paired vision, language, and action data, which is difficult to obtain. HiP also makes the reasoning process more transparent.

What’s considered a daily chore for a human can be a robot’s “long-horizon goal” — an overarching objective that involves completing many smaller steps first — requiring sufficient data to plan, understand, and execute objectives. While computer vision researchers have attempted to build monolithic foundation models for this problem, pairing language, visual, and action data is expensive. Instead, HiP represents a different, multimodal recipe: a trio that cheaply incorporates linguistic, physical, and environmental intelligence into a robot.

“Foundation models do not have to be monolithic,” says NVIDIA AI researcher Jim Fan, who was not involved in the paper. “This work decomposes the complex task of embodied agent planning into three constituent models: a language reasoner, a visual world model, and an action planner. It makes a difficult decision-making problem more tractable and transparent.”

The team believes that their system could help these machines accomplish household chores, such as putting away a book or placing a bowl in the dishwasher. Additionally, HiP could assist with multistep construction and manufacturing tasks, like stacking and placing different materials in specific sequences.

Evaluating HiP

The CSAIL team tested HiP’s acuity on three manipulation tasks, where it outperformed comparable frameworks. The system reasoned by developing intelligent plans that adapt to new information.

First, the researchers requested that it stack different-colored blocks on each other and then place others nearby. The catch: Some of the correct colors weren’t present, so the robot had to place white blocks in a color bowl to paint them. HiP often adapted to these changes accurately, especially compared to state-of-the-art task planning systems like Transformer BC and Action Diffuser, by adjusting its plans to stack and place each block as needed.

Another test: arranging objects such as candy and a hammer in a brown box while ignoring other items. Some of the objects it needed to move were dirty, so HiP adjusted its plans to place them in a cleaning box, and then into the brown container. In a third demonstration, the bot was able to ignore unnecessary objects to complete kitchen sub-goals such as opening a microwave, clearing a kettle out of the way, and turning on a light. Some of the prompted steps had already been completed, so the robot adapted by skipping those directions.

A three-pronged hierarchy

HiP’s three-pronged planning process operates as a hierarchy, with the ability to pre-train each of its components on different sets of data, including information outside of robotics. At the bottom of that order is a large language model (LLM), which starts to ideate by capturing all the symbolic information needed and developing an abstract task plan. Applying the common sense knowledge it finds on the internet, the model breaks its objective into sub-goals. For example, “making a cup of tea” turns into “filling a pot with water,” “boiling the pot,” and the subsequent actions required.

“All we want to do is take existing pre-trained models and have them successfully interface with each other,” says Anurag Ajay, a PhD student in the MIT Department of Electrical Engineering and Computer Science (EECS) and a CSAIL affiliate. “Instead of pushing for one model to do everything, we combine multiple ones that leverage different modalities of internet data. When used in tandem, they help with robotic decision-making and can potentially aid with tasks in homes, factories, and construction sites.”

These models also need some form of “eyes” to understand the environment they’re operating in and correctly execute each sub-goal. To augment the initial planning completed by the LLM, the team used a large video diffusion model, which collects geometric and physical information about the world from footage on the internet. In turn, the video model generates an observation trajectory plan, refining the LLM’s outline to incorporate new physical knowledge.

This process, known as iterative refinement, allows HiP to reason about its ideas, taking in feedback at each stage to generate a more practical outline. The flow of feedback is similar to writing an article, where an author may send their draft to an editor, and with those revisions incorporated, the publisher reviews it for any last changes and finalizes it.
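The feedback loop described above can be sketched roughly as follows. This is a minimal illustration, not the HiP implementation: `refine`, `toy_propose`, `toy_critique`, and the candidate plans are hypothetical names invented here, and reducing feedback to a single numeric score is an assumption made for brevity.

```python
from typing import Callable, List

def refine(
    propose: Callable[[str, List[str]], str],
    critique: Callable[[str], float],
    goal: str,
    rounds: int = 3,
) -> str:
    """Propose a plan, score it, and pass earlier attempts back into the next proposal."""
    best_plan, best_score = "", float("-inf")
    tried: List[str] = []
    for _ in range(rounds):
        plan = propose(goal, tried)   # a new draft, informed by earlier attempts
        score = critique(plan)        # feedback from the next stage (hypothetical score)
        if score > best_score:
            best_plan, best_score = plan, score
        tried.append(plan)
    return best_plan

# Toy stand-ins so the sketch runs; a real system would call learned models here.
CANDIDATES = ["boil empty pot", "fill pot, then boil", "boil pot, then fill"]

def toy_propose(goal: str, tried: List[str]) -> str:
    remaining = [c for c in CANDIDATES if c not in tried]
    return remaining[0] if remaining else tried[-1]

def toy_critique(plan: str) -> float:
    # Assumed feasibility rule: the pot must be filled before it is boiled.
    return 1.0 if plan.startswith("fill") else 0.0

best = refine(toy_propose, toy_critique, "make tea")
print(best)
```

In this toy run, the loop keeps whichever draft the critic scores highest, much as the editor and publisher in the analogy each get a chance to improve the author's draft.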

In HiP, the top of the hierarchy is an egocentric action model, which uses a sequence of first-person images to infer which actions should take place based on the robot’s surroundings. During this stage, the observation plan from the video model is mapped over the space visible to the robot, helping the machine decide how to execute each task within the long-horizon goal. If a robot uses HiP to make tea, this means it will have mapped out exactly where the pot, sink, and other key visual elements are, and will begin completing each sub-goal.
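Taken together, the three levels can be pictured as a simple pipeline. The sketch below is illustrative only and does not reproduce the team's code: `HierarchicalPlanner` and the three toy planner functions are hypothetical, observations are plain strings standing in for images, and inferring one action from each pair of consecutive observations is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List

SubGoal = str
Observation = str  # stand-in for a first-person image frame
Action = str

@dataclass
class HierarchicalPlanner:
    """Chains three independently built components (all hypothetical here)."""
    task_planner: Callable[[str], List[SubGoal]]                  # language level
    visual_planner: Callable[[SubGoal], List[Observation]]        # video level
    action_planner: Callable[[Observation, Observation], Action]  # action level

    def plan(self, goal: str) -> List[Action]:
        actions: List[Action] = []
        for sub_goal in self.task_planner(goal):
            frames = self.visual_planner(sub_goal)
            # Assumption: one action per pair of consecutive imagined observations.
            for current, nxt in zip(frames, frames[1:]):
                actions.append(self.action_planner(current, nxt))
        return actions

# Toy stand-ins so the sketch runs end to end.
def toy_task_planner(goal: str) -> List[SubGoal]:
    table = {"make tea": ["fill pot with water", "boil pot"]}
    return table.get(goal, [goal])

def toy_visual_planner(sub_goal: SubGoal) -> List[Observation]:
    return [f"{sub_goal}: start", f"{sub_goal}: midway", f"{sub_goal}: done"]

def toy_action_planner(current: Observation, nxt: Observation) -> Action:
    return f"move from [{current}] to [{nxt}]"

planner = HierarchicalPlanner(toy_task_planner, toy_visual_planner, toy_action_planner)
tea_actions = planner.plan("make tea")
for step in tea_actions:
    print(step)
```

Because each component is passed in as a plain function, any one of them could be swapped for a different pre-trained model without touching the other two, which is the modularity the researchers emphasize.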

Still, the multimodal work is limited by the lack of high-quality video foundation models. Once available, they could interface with HiP’s small-scale video models to further enhance visual sequence prediction and robot action generation. A higher-quality version would also reduce the current data requirements of the video models.

That being said, the CSAIL team’s approach used only a small amount of data overall. Moreover, HiP was cheap to train and demonstrated the potential of using readily available foundation models to complete long-horizon tasks. “What Anurag has demonstrated is proof-of-concept of how we can take models trained on separate tasks and data modalities and combine them into models for robotic planning. In the future, HiP could be augmented with pre-trained models that can process touch and sound to make better plans,” says senior author Pulkit Agrawal, MIT assistant professor in EECS and director of the Improbable AI Lab. The group is also considering applying HiP to solving real-world long-horizon tasks in robotics.

Ajay and Agrawal are lead authors on a paper describing the work. They are joined by MIT professors and CSAIL principal investigators Tommi Jaakkola, Joshua Tenenbaum, and Leslie Pack Kaelbling; CSAIL research affiliate and MIT-IBM Watson AI Lab research manager Akash Srivastava; graduate students Seungwook Han and Yilun Du ’19; former postdoc Abhishek Gupta, who is now assistant professor at the University of Washington; and former graduate student Shuang Li PhD ’23.

The team’s work was supported, in part, by the National Science Foundation, the U.S. Defense Advanced Research Projects Agency, the U.S. Army Research Office, the U.S. Office of Naval Research Multidisciplinary University Research Initiatives, and the MIT-IBM Watson AI Lab. Their findings were presented at the 2023 Conference on Neural Information Processing Systems (NeurIPS).

