Startup Pens Generative AI Success Story With NVIDIA NeMo

Machine learning helped Waseem Alshikh plow through textbooks in college. Now he’s putting generative AI to work, creating content for hundreds of companies.

Born and raised in Syria, Alshikh spoke no English, but he was fluent in software, a talent that served him well when he arrived at college in Lebanon.

“The first day they gave me a stack of textbooks, each one a thousand pages thick, and all of it in English,” he recalled.

So, he wrote a program — a crude but effective statistical classifier that summarized the books — then he studied the summaries.

From Concept to Company

In 2014, he shared his story with May Habib, an entrepreneur he met while working in Dubai. They agreed to create a startup that could help marketing departments — which are always pressured to do more with less — use machine learning to quickly create copy for their web pages, blogs, ads and more.

“Initially, the tech was not there, until transformer models were announced — that was something we could build on,” said Alshikh, the startup’s CTO.

Writer co-founders Habib, CEO, and Alshikh, CTO.

“We found a few engineers and spent almost six months building our first model, a neural network that barely worked and had about 128 million parameters,” he said. Parameter count is an often-used measure of an AI model’s capability.

Along the way, the young company won some business, changed its name to Writer and connected with NVIDIA.

A Startup Accelerated

“Once we got introduced to NVIDIA NeMo, we were able to build industrial-strength models with three, then 20 and now 40 billion parameters, and we’re still scaling,” he said.

NeMo is an application framework that helps companies curate their training datasets, build and customize large language models (LLMs), and run them in production at scale. Organizations everywhere from Korea to Sweden are using it to customize LLMs for their local languages and industries.

“Before NeMo, it took us four and a half months to build a new billion-parameter model. Now we can do it in 16 days — this is mind blowing,” Alshikh said.
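
For a sense of what that workflow looks like in code, below is a minimal sketch of restoring a customized NeMo GPT checkpoint and prompting it from Python. The checkpoint filename and prompt are hypothetical placeholders, and the exact import path, trainer setup and generate() parameters can differ between NeMo releases, so treat this as an illustration rather than Writer’s actual pipeline.

```python
# Minimal sketch: load a customized NeMo GPT checkpoint and generate text.
# Assumes nemo_toolkit[nlp] is installed; the .nemo file below is a
# hypothetical placeholder, and API details vary across NeMo versions.
from pytorch_lightning import Trainer
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import (
    MegatronGPTModel,
)

trainer = Trainer(accelerator="gpu", devices=1)  # Megatron models expect a trainer
model = MegatronGPTModel.restore_from(
    restore_path="writer_custom_llm.nemo",  # hypothetical checkpoint
    trainer=trainer,
)
model.eval()

# Ask the model for a short piece of marketing copy.
output = model.generate(
    inputs=["Write a two-sentence product blurb for a travel app:"],
    length_params={"max_length": 64, "min_length": 8},
)
print(output)
```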

Models Make Opportunities

In the first six months of this year, the startup’s team of fewer than 20 AI engineers used NeMo to develop 10 models, each with 30 billion parameters or more.

That translates into big opportunities. Hundreds of businesses now use Writer’s models, customized with NeMo for finance, healthcare, retail and other vertical markets.

Writer’s Recap tool creates written summaries from audio recordings of an interview or event.

The startup’s customer list includes household names like Deloitte, L’Oreal, Intuit, Uber and many Fortune 500 companies.

Writer’s success with NeMo is just the start of the story. Dozens of other companies have already downloaded NeMo.

The software will be available soon for anyone to use. It’s part of NVIDIA AI Enterprise, full-stack software optimized to accelerate generative AI workloads and backed by enterprise-grade support, security and application programming interface stability.

Writer offers a full-stack platform for enterprise users.

A Trillion API Calls a Month

Some customers run Writer’s models on their own systems or cloud services. Others ask Writer to host the models, or they use Writer’s API.

“Our cloud infrastructure, managed basically by two people, hosts a trillion API calls a month — we’re generating 90,000 words a second,” Alshikh said. “We’re delivering high-quality models that compete with products from companies with larger teams and bigger budgets.”

NVIDIA NeMo supports an end-to-end flow for generative AI from data curation to inference.

Writer uses the Triton Inference Server that’s packaged with NeMo to run models in production for its customers. Alshikh reports that Triton, used by many companies running LLMs, enables lower latency and greater throughput than alternative programs.

“This means you can run a service for $20,000, instead of $100,000, so we can invest more in building meaningful features,” he said.
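
As a rough illustration of that serving path, the sketch below queries a model hosted by Triton Inference Server over HTTP with the tritonclient package. The model name, tensor names, shapes and data types are placeholders; a real client must match whatever is declared in the served model’s config.pbtxt.

```python
# Minimal sketch: send an inference request to Triton Inference Server over HTTP.
# The model name and tensor names/dtypes below are placeholders that must
# match the served model's configuration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# A single prompt, already tokenized into IDs (hypothetical values).
token_ids = np.array([[101, 2023, 2003, 1037, 3231, 102]], dtype=np.int32)

infer_input = httpclient.InferInput("input_ids", token_ids.shape, "INT32")
infer_input.set_data_from_numpy(token_ids)
requested_output = httpclient.InferRequestedOutput("output_ids")

response = client.infer(
    model_name="writer_llm",  # placeholder model name
    inputs=[infer_input],
    outputs=[requested_output],
)
print(response.as_numpy("output_ids"))
```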

A Wide Horizon

Writer is also a member of NVIDIA Inception, a program that nurtures cutting-edge startups. “Thanks to Inception, we got early access to NeMo and some amazing people who guided us through the process of finding and using the tools we need,” he said.

Now that Writer’s text products are getting traction, Alshikh, who splits his time between homes in Florida and California, is searching the horizon for what’s next. In today’s broad frontier of generative AI, he sees opportunities in images, audio, video, 3D — maybe all of the above.

“We see multimodality as the future,” he said.

Check out the NVIDIA NeMo page to get started, and learn about the early-access program for multimodal NeMo.

And if you enjoyed this story, let folks on social networks know using the following, a summary suggested by Writer:

“Learn how startup Writer uses NVIDIA NeMo software to generate content for hundreds of companies and rack up impressive revenues with a small staff and budget.”

NVIDIA Makes Extended-Reality Streaming More Scalable, Customizable for Enterprises and Developers

Organizations across industries are using extended reality (XR) to redesign workflows and boost productivity, whether for immersive training or collaborative design reviews.

With the growing use of all-in-one (AIO) headsets, more teams have adopted and integrated XR. But while AIO headsets make XR easier to use, their modest onboard compute and rendering power can limit the graphics quality of immersive experiences.

NVIDIA is enabling more enterprises and developers to adopt high-quality XR with its CloudXR Suite. Built to greatly simplify streaming, CloudXR enables anyone with an AIO headset or mobile XR device to experience high-fidelity, immersive environments from any location.

CloudXR Suite combines the power of NVIDIA RTX GPUs and NVIDIA RTX Virtual Workstation (vWS) software to stream high-fidelity XR applications to Android and iOS devices. By dynamically adjusting to network conditions, CloudXR maximizes image quality and frame rates to power next-level, wireless augmented-reality and virtual-reality experiences.

With CloudXR, enterprises can gain the flexibility to effectively orchestrate and scale XR workloads, and developers can use the advanced platform to create custom XR products for their users. The suite offers high-quality streaming across both public and private networks.

Ericsson and VMware are among the first companies to use CloudXR.

Taking XR Workflows to the Next Level

CloudXR Suite offers performance that’s comparable to tethered VR experiences.

It comprises three components, including several updates:

CloudXR Essentials, the suite’s underlying streaming layer, brings new improvements such as 5G L4S optimizations, QoS algorithms and enhanced logging tools. Essentials also includes the SteamVR plug-in, along with sample clients and a new server-side application programming interface.
CloudXR Server Extensions improves server-side interfaces with a source-code addition to the Monado OpenXR runtime. The new CloudXR Server API contained in CloudXR Essentials and the OpenXR API represent the gateway to scaling XR distribution for orchestration partners.
CloudXR Client Extensions include, as a first offering, a CloudXR plug-in built for the Unity Editor. This lets developers build custom CloudXR client applications using familiar Unity development tools. With the plug-in, Unity developers can also more easily add branded custom interfaces and lobbies that users see before connecting to their CloudXR streaming server.

Teams can tap into the power of NVIDIA RTX GPUs to achieve ultimate graphics performance on mobile devices. Enterprises can scale to data center and edge networks, and stream to concurrent users with NVIDIA RTX vWS software.

In addition, users can stream stunning XR content from any OpenVR or OpenXR application at the edge using high-bandwidth, low-latency 5G signals.

Partners Experience Enterprise-Grade XR Streaming

Organizations across industries use XR streaming to advance their workflows.

To provide optimal streaming performance, NVIDIA is working with leading companies like Ericsson to implement low-latency, low-loss scalable throughput (L4S) in NVIDIA CloudXR. L4S helps reduce lag in interactive, cloud-based video streaming, so CloudXR users will be able to experience photorealistic XR environments on high-bandwidth, low-latency networks.

“At Ericsson, we believe innovations like L4S are fundamental building blocks to enable latency-critical applications,” said Sibel Tombaz, head of product line for 5G Radio Access Network at Ericsson. “As a key part of Ericsson’s Time-Critical Communication capabilities, L4S will significantly improve user experience for use-cases like cloud gaming, and it’s great news that NVIDIA is making L4S a production element of CloudXR. We’re excited to be working with NVIDIA to further enhance the XR experience for enterprises, developers and consumers.”

More professionals can elevate XR streaming from the cloud with VMware Workspace ONE XR Hub, which includes an integration of CloudXR.

Workspace ONE XR Hub enhances user experiences with VR headsets through advanced authentication and customization options. Combined with the streaming capabilities of CloudXR, Workspace ONE XR Hub allows teams across industries to quickly and securely access complex immersive environments using AIO headsets.

“With this new integration, access to high-fidelity immersive experiences is even easier because streaming lets users tap into the power of RTX GPUs from anywhere,” said Matt Coppinger, director of product management for end-user computing at VMware. “Workspace ONE XR Hub and CloudXR will allow our customers to stream rich XR content, and more teams can boost productivity and integrate realistic, virtual experiences into their workflows.”

Availability

CloudXR Suite will be available to download soon, so users can stream a wide range of XR applications over the network without worrying about demanding graphics requirements.

For example, independent software vendors (ISVs) can create a single, high-quality version of their application that’s built to take advantage of powerful GPUs. And with CloudXR streaming, ISVs can target users with mobile XR devices.

Mobile-device manufacturers can also offer their ISV partners and end users access to high-performance GPU acceleration for unparalleled graphics experiences.

In addition, cloud service providers, orchestrators and system integrators can extend their GPU services with interactive graphics to support next-generation XR applications.

Learn more about NVIDIA CloudXR Suite.

Extended Cut: NVIDIA Expands Maxine for Video Editing, Showcases 3D Virtual Conferencing Research

Professionals, teams, creators and others can tap into the power of AI to create high-quality audio and video effects — even using standard microphones and webcams — with the help of NVIDIA Maxine.

The suite of GPU-accelerated software development kits and cloud-native microservices lets users deploy AI features that enhance audio, video and augmented-reality effects for real-time communications services and platforms. Maxine will also expand features for video editing, enabling teams to reach new heights in video communication.

Plus, an NVIDIA Research demo at this week’s SIGGRAPH conference shows how AI can take video conferencing to the next level with 3D features.

NVIDIA Maxine Features Expand to Video Editing

Wireless connectivity has enabled people to join virtual meetings from more locations than ever. Typically, audio and video quality are heavily impacted when a caller is on the move or in a location with poor connectivity.

Advanced, real-time Maxine features — such as Background Noise Removal, Super Resolution and Eye Contact — allow remote users to enhance interpersonal communication experiences.

In addition, Maxine can now be used for video editing. NVIDIA partners are transforming this professional workflow with the same Maxine features that elevate video conferencing. The goal when editing a video, whether a sales pitch or a webinar, is to engage the broadest audience possible. Using Maxine, professionals can tap into AI features that enhance audio and video signals.

With Maxine, a spokesperson can look away from the screen to reference notes or a script while their gaze remains as if looking directly into the camera. Users can also film videos in low resolution and enhance the quality later. Plus, Maxine lets people record videos in several different languages and export the video in English.

Maxine features to be released in early access this year include:

Interpreter: Translates from simplified Chinese, Russian, French, German and Spanish to English while animating the user’s image to show them speaking English.
Voice Font: Enables users to apply characteristics of a speaker’s voice and map it to the audio output.
Audio Super Resolution: Improves audio quality by increasing the temporal resolution of the audio signal and extending bandwidth. It currently supports upsampling from 8,000Hz to 16,000Hz as well as from 16,000Hz to 48,000Hz. This feature is also updated with more than 50% reduction in latency and up to 2x better throughput.
Maxine Client: Brings the AI capabilities of Maxine’s microservices to video-conferencing sessions on PCs. The application is optimized for low-latency streaming and will use the cloud for all of its GPU compute requirements. The thin client will be available on Windows this fall, with additional OS support to follow.

Maxine can be deployed in the cloud, on premises or at the edge, meaning quality communication can be accessible from nearly anywhere.

Taking Video Conferencing to New Heights

Many partners and customers are experiencing high-quality video conferencing and editing with Maxine. Two features of Maxine — Eye Contact and Live Portrait — are now available in production releases on the NVIDIA AI Enterprise software platform. Eye Contact simulates direct eye contact with the camera by estimating and aligning the user’s gaze with the camera. And Live Portrait animates a person’s portrait photo through their live video feed.

Software company Descript aims to make video a staple of every communicator’s toolkit, alongside docs and slides. With NVIDIA Maxine, professionals and beginners who use Descript can access AI features that improve their video-content workflows.

“With the NVIDIA Maxine Eye Contact feature, users no longer have to worry about memorizing scripts or doing tedious video retakes,” said Jay LeBoeuf, head of business and corporate development at Descript. “They can maintain a perfect on-screen presence while nailing their script every time.”

Reincubate’s Camo app aims to broaden access to great video by taking advantage of the hardware and devices people already own. It does this by giving users greater control over their image and by implementing a powerful, efficient processing pipeline for video effects and transformation. Using technologies enabled by NVIDIA Maxine, Camo can offer users an easier way to achieve incredible video creation.

“Integrating NVIDIA Maxine into Camo couldn’t have been easier, and it’s enabled us to get high performance from users’ RTX GPUs right out of the box,” said Aidan Fitzpatrick, founder and CEO of Reincubate. “With Maxine, the team’s been able to move faster and with more confidence.”

Quicklink’s Cre8 is a powerful video production platform for creating professional, on-brand productions, as well as virtual and hybrid live events. The user-friendly interface combines an intuitive design with all the tools needed to build, edit and customize a professional-looking production. Cre8 incorporates NVIDIA Maxine technology to maximize productivity and the quality of video productions, offering complete control to the operator.

“Quicklink Cre8 now offers the most advanced video production platform on the planet,” said Richard Rees, CEO of Quicklink. “With NVIDIA Maxine, we were able to add advanced features, including Auto Framing, Video Noise Removal, Noise and Echo Cancellation, and Eye Contact Simulation.”

Los Angeles-based company gemelo.ai provides a platform for creating AI twins that can scale a user’s voice, content and interactions. Using Maxine’s Live Portrait feature, the gemelo.ai team can unlock new opportunities for scaled, personalized content and one-on-one interactions.

“The realism of Live Portrait has been a game-changer, unlocking new realms of potential for our AI twins,” said Paul Jaski, CEO of gemelo.ai. “Our customers can now design and deploy incredibly realistic digital twins with the superpowers of unlimited scalability in content production and interaction across apps, websites and mixed-reality experiences.”

NVIDIA Research Shows How 3D Video Enhances Immersive Communication

In addition to powering the advanced features of Maxine, NVIDIA AI enhances video communication with 3D. NVIDIA Research recently published a paper demonstrating how AI could power a 3D video-conferencing system with minimal capture equipment.

3D telepresence systems are typically expensive, require a large space or production studio, and use high-bandwidth, volumetric video streaming — all of which limits the technology’s accessibility. NVIDIA Research shared a new method, which runs on a novel vision transformer-based encoder, that takes 2D video input from a standard webcam and turns it into a 3D video representation. Instead of requiring 3D data to be passed back and forth between the participants in a conference, AI allows the call’s bandwidth requirements to stay the same as for a 2D conference.

The technology takes a user’s 2D video and automatically creates a 3D representation called a neural radiance field, or NeRF, using volumetric rendering. As a result, participants can stream 2D videos, like they would for traditional video conferencing, while decoding high-quality 3D representations that can be rendered in real time. And with Maxine’s Live Portrait, users can bring their portraits to life in 3D.

AI-mediated 3D video conferencing could significantly reduce the cost for 3D capture, provide a high-fidelity 3D representation, accommodate photorealistic or stylized avatars, and enable mutual eye contact in video conferencing. Related research projects show how AI can help elevate communications and virtual interactions, as well as inform future NVIDIA technologies for video conferencing.

SIGGRAPH attendees can see the system in action at the Emerging Technologies booth, where groups will be able to simultaneously view the live demo on a 3D display designed by New York-based company Looking Glass.

Availability

Learn more about NVIDIA Maxine, which is now available on NVIDIA AI Enterprise.

And see more of the research behind the 3D video conference project.


Content Creation ‘In the NVIDIA Studio’ Gets Boost From New Professional GPUs, AI Tools, Omniverse and OpenUSD Collaboration Features

AI and accelerated computing were in the spotlight at SIGGRAPH — the world’s largest gathering of computer graphics experts — as NVIDIA founder and CEO Jensen Huang announced during his keynote address updates to NVIDIA Omniverse, a platform for building and connecting 3D tools and applications, as well as acceleration for Universal Scene Description (known as OpenUSD), the open and extensible ecosystem for 3D worlds.

This follows the recent announcement of NVIDIA joining Pixar, Adobe, Apple and Autodesk to form the Alliance for OpenUSD. It marks a major leap toward unlocking the next era of 3D graphics, design and simulation by ensuring compatibility in 3D tools and content for digitalization across industries.

NVIDIA launched three new desktop workstation Ada Generation GPUs — the NVIDIA RTX 5000, RTX 4500 and RTX 4000 — which deliver the latest AI, graphics and real-time rendering technology to professionals worldwide.

Shutterstock is bringing generative AI to 3D scene backgrounds with a foundation model trained using NVIDIA Picasso, a cloud-based foundry for building visual generative AI models. Picasso-trained models can now generate photorealistic, 8K, 360-degree high-dynamic-range imaging (HDRi) environment maps for quicker scene development. Autodesk will also integrate generative AI content-creation services — developed using foundation models in Picasso — with its popular Autodesk Maya software.

Each month, NVIDIA Studio Driver releases provide artists, creators and 3D developers with the best performance and reliability when working with creative applications. Available today, the August NVIDIA Studio Driver gives creators peak reliability for using their favorite creative apps. It includes support for updates to Omniverse, XSplit Broadcaster and Reallusion iClone.

Plus, this week’s featured In the NVIDIA Studio artist Andrew Averkin shows how AI influenced his process in building a delightful cup of joe for his Natural Coffee piece.

Omniverse Expands

Omniverse received a major upgrade, bringing new connectors and advancements to the platform.

These updates are showcased in Omniverse foundation applications, which are fully customizable reference applications that creators, enterprises and developers can copy, extend or enhance.

Upgraded Omniverse applications include Omniverse USD Composer, which lets 3D users assemble large-scale, OpenUSD-based scenes. Omniverse Audio2Face — which provides generative AI application programming interfaces that create realistic facial animations and gestures from only an audio file — now includes multilingual support and a new female base model.

The update brings boosted efficiency and an improved user experience. New rendering optimizations take full advantage of the NVIDIA Ada Lovelace architecture enhancements in NVIDIA RTX GPUs with DLSS 3 technology fully integrated into the Omniverse RTX Renderer. In addition, a new AI denoiser enables real-time 4K path tracing of massive industrial scenes.

New application and experience templates give developers getting started with OpenUSD and Omniverse a major head start with minimal coding.

A new Omniverse Kit Extension Registry, a central repository for accessing, sharing and managing Omniverse extensions, lets developers easily turn functionality on and off in their application, making it easier than ever to build custom apps from over 500 core Omniverse extensions provided by NVIDIA.
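
For developers new to the ecosystem, a Kit extension of the kind shared through the registry is essentially a small Python package with a declared entry point. The skeleton below is an illustrative sketch only: the class name and messages are made up, and a real extension also ships an extension.toml manifest describing its ID and dependencies.

```python
# Minimal sketch of an Omniverse Kit extension entry point, the kind of
# building block distributed through the Kit Extension Registry. The class
# name and messages are illustrative; a real extension also needs an
# extension.toml manifest.
import omni.ext


class ExampleStudioToolsExtension(omni.ext.IExt):
    def on_startup(self, ext_id: str):
        # Called when the extension is enabled in a Kit-based application.
        print(f"[example.studio.tools] started: {ext_id}")

    def on_shutdown(self):
        # Called when the extension is disabled; release resources here.
        print("[example.studio.tools] shut down")
```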

New extended-reality developer tools let users build spatial-computing options natively into their Omniverse-based applications, giving users the flexibility to experience their 3D projects and virtual worlds however they like.

Expanding their collaboration across Adobe Substance 3D, generative AI and OpenUSD initiatives, Adobe and NVIDIA announced plans to make Adobe Firefly, Adobe’s family of creative generative AI models, available as APIs in Omniverse, enabling developers and creators to enhance their design processes.

Developers and industrial enterprises have new foundation apps and services to optimize and enhance 3D pipelines with the OpenUSD framework and generative AI.

Studio professionals can connect the world of generative AI to their workflows to accelerate entire projects — from environment creation and character animation to scene-setting and more. With Kit AI Agent, OpenUSD Connectors and extensions to prompt top generative AI tools and APIs, Omniverse can aggregate the final result in a unified viewport — collectively reducing the time from conception to creation.

RTX: The Next Generation

The new NVIDIA RTX 5000, RTX 4500 and RTX 4000 Ada Generation professional desktop GPUs feature the latest NVIDIA Ada Lovelace architecture technologies, including DLSS 3, for smoother rendering and real-time interactivity in 3D applications such as Unreal Engine.

These workstation-class GPUs feature third-generation RT Cores with up to 2x the throughput of the previous generation. This enables users to work with larger, higher-fidelity images in real time, helping artists and designers maintain their creative flow.

Fourth-generation Tensor Cores deliver up to 2x the AI performance of the previous generation for AI training and development as well as inferencing and generative AI workloads. Large GPU memory enables AI-augmented multi-application workflows with the latest generative AI-enabled tools and applications.

The Ada architecture provides these new GPUs with up to twice the video encode and decode capability of the previous generation, encoding up to 8K60 video in real time, with support for AV1 encode and decode. Combined with next-generation AI performance, these capabilities make the new professional GPUs ideal for multi-stream video editing workflows with high-resolution content using AI-augmented video editing applications such as Adobe Premiere and DaVinci Resolve.

Designed for high-end creative, multi-application professional workflows that require large models and datasets, these new GPUs provide large GDDR6 memory: 20GB for the RTX 4000, 24GB for the RTX 4500 and 32GB for the RTX 5000 — all supporting error-correcting code for error-free computing.

A Modern-Day Picasso

3D artists regularly face the monumental task of bringing scenes to life by artistically mixing hero assets with props, materials, backgrounds and lighting. Generative AI technologies can help streamline this workflow by generating secondary assets, like environment maps that light the scene.

At SIGGRAPH, Shutterstock announced that it’s tapping into NVIDIA Picasso to train a generative AI model that can create 360 HDRi photorealistic environment maps. The model is built using Shutterstock’s responsibly licensed libraries.

Shutterstock using NVIDIA Picasso to create 360 HDRi photorealistic environment maps.

Previously, artists needed to use expensive 360-degree cameras to create backgrounds and environment maps from scratch, or choose from fixed options that may not precisely match their 3D scene. Now, from simple prompts or using their desired background as a reference, the Shutterstock generative AI feature will quickly generate custom 360-degree, 8K-resolution, HDRi environment maps, which artists can use to set a background and light a scene. This allows more time to work on hero 3D assets, which are the primary assets of a 3D scene that viewers will focus on.

Autodesk also announced that it will integrate generative AI content-creation services — developed using foundation models in Picasso — with its popular 3D software Autodesk Maya.

Autodesk Maya generative AI content-creation services developed using foundation models in Picasso.

August Studio Driver Delivers

The August Studio Driver supports these updates and more, including the latest release of XSplit Broadcaster, the popular streaming software that lets users simultaneously stream to multiple platforms.

XSplit Broadcaster 4.5 introduces NVIDIA Encoder (NVENC) AV1 support. GeForce and NVIDIA RTX 40 Series GPU users can now stream high-quality 4K at 60 frames per second directly to YouTube Live, dramatically improving video quality.

XSplit Broadcaster 4.5 adds AV1 livestreaming support for YouTube.

Streaming in AV1 with RTX GPUs provides 40% better efficiency than H.264, reducing bandwidth requirements for livestreaming or reducing file size for high-quality local captures.

H.264 vs. AV1: 4K60 source encoded at 8 Mbps.

An update to the Reallusion iClone Omniverse Connector includes new features such as real-time synchronization of projects, as well as enhanced import functionality for OpenUSD. This makes work between iClone and Omniverse quicker, smoother and more efficient.

Brew-tiful Artwork

Words can’t espresso the stunning 3D scene Natural Coffee.

Do they accept reservations?

NVIDIA artist Andrew Averkin has over 15 years of experience in the creative field. He finds joy in a continuous journey — blending art and technology — to bring his vivid imagination to life.

His work, Natural Coffee, has a compelling origin story. Once upon a time, in a bustling office, there was a cup of “natural coffee” known for its legendary powers. It gave artists nerves of steel at work, improved performance across the board and, as a small bonus, offered magical music therapy.

Averkin used an image generator to quickly cycle through visual ideas created from simple text-based prompts. Using AI to brainstorm imagery at the beginning of a creative workflow is becoming increasingly popular with artists looking to save time on iteration.

Averkin iterates for inspiration.

With a visual foundation in place, Averkin sped up the process by acquiring 3D assets from online stores to quickly build a 3D blockout of the scene, a rough-draft layout built from simple 3D shapes without fine detail or polish.

Next, Averkin polished individual assets in Autodesk 3ds Max, sculpting models with fine detail, testing and applying different textures and materials. His GeForce RTX 4090 GPU unlocked RTX-accelerated AI denoising — with the default Autodesk Arnold renderer — delivering interactive 3D modeling, which helped tremendously while composing the scene.

Averkin working in Autodesk 3ds Max.

“I chose a GeForce RTX graphics card for quality, speed and safety, plain and simple,” said Averkin.

Averkin then exported Natural Coffee to the NVIDIA Omniverse USD Composer app via the Autodesk 3ds Max Connector. “Inside USD Composer I added more details, played a lot with a built-in collection of materials, plus did a lot of lighting work to make composition look more realistic,” he explained.

Real-time rendering in Omniverse USD Composer.

One of the biggest benefits of USD Composer is the ability to review scenes rendered in real time with photorealistic light, shadows, textures and more. This dramatically improves the process of editing massive 3D scenes, making it quicker and easier. Averkin was even able to add a camera fly animation, further elevating the scene.

The final step was to add a few touch-ups in Adobe Photoshop. Over 30 GPU-accelerated features gave Averkin plenty of options for playing with colors and contrast, and making final image adjustments smoothly and quickly.

Averkin encourages advanced 3D artists to experiment with the OpenUSD framework. “I use it a lot in my work at NVIDIA and in personal projects,” he said. “OpenUSD is very powerful. It helps with work in multiple creative apps in a non-destructive way, and other great features make the entire process easier and more flexible.”

NVIDIA artist Andrew Averkin.

Check out Averkin’s portfolio on ArtStation.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.

Shutterstock Brings Generative AI to 3D Scene Backgrounds With NVIDIA Picasso

Picture this: Creators can quickly create and customize 3D scene backgrounds with the help of generative AI, thanks to cutting-edge tools from Shutterstock.

The visual-content provider is building services using NVIDIA Picasso — a cloud-based foundry for developing generative AI models for visual design.

The work incorporates Picasso’s latest feature — announced today during NVIDIA founder and CEO Jensen Huang’s SIGGRAPH keynote — which will help artists enhance and light 3D scenes based on simple text or image prompts, all with AI models built using fully licensed, rights-reserved data.

From these prompts, the new gen AI feature quickly generates custom 360-degree, 8K-resolution, high-dynamic-range imaging (HDRi) environment maps, which artists can use to set a background and light a scene.

This expands on NVIDIA’s collaboration with Shutterstock to empower the next generation of digital content-creation tools and accelerate 3D model generation.

To meet a surging demand for immersive visuals in films, games, virtual worlds, advertising and more, the 3D artist community is rapidly expanding, with over 20% growth in the past year.

Many of these artists are tapping generative AI to bolster their complex workflows — and will be able to use the technology to quickly create and customize environment maps. This allows more time to work on hero 3D assets, which are the primary assets of a 3D scene that viewers will focus on. It makes a panoramic difference when creating compelling 3D visuals.

“We’re committed to hyper-enabling 3D artists and collaborators — helping them build the immersive environments they envision faster than ever before and streamlining their content-creation workflows using NVIDIA Picasso,” said Dade Orgeron, vice president of 3D innovation at Shutterstock.

Generating Photorealistic Environment Maps

Previously, artists needed to buy expensive 360-degree cameras to create backgrounds and environment maps from scratch, or choose from fixed options that may not precisely match their 3D scene.

Now, users can simply provide a prompt — whether that’s text or a reference image — and the 360 HDRi services built on Picasso will quickly generate panoramic images. Plus, thanks to generative AI, the custom environment map can automatically match the background image that’s inputted as a prompt.

Users can then customize the maps and quickly iterate on ideas until they achieve the vision they want.

Collaboration to Boost 3D World-Building

Autodesk, a provider of 3D software and tools for creators in media and entertainment, is focused on giving artists the creative freedom to inspire and delight audiences worldwide.

Enabling artists to trade mundane tasks for unbridled creativity, Autodesk will integrate generative AI content-creation services — developed using foundation models in Picasso — with its popular 3D software Maya.

Supercharging Autodesk customer workflows with AI allows artists to focus on creating — and to ultimately produce content faster.

Generative AI Model Foundry

Picasso is part of NVIDIA AI Foundations, which advances enterprise-level generative AI for text, visual content and even biology.

The foundry will also adopt new NVIDIA research to generate physics-based rendering materials from text and image prompts, demonstrated at SIGGRAPH’s Real-Time Live competition. This will enable content providers to create 3D services, software and tools that enhance and expedite the simulation of diverse physical materials, such as tiles, metals and wood — complete with texture-mapping techniques, including normal, roughness and ambient occlusion.

Picasso runs on the NVIDIA Omniverse Cloud platform-as-a-service and is accessible via a serverless application programming interface that content and service providers like Shutterstock can easily connect to their websites and applications.

Learn about the latest advances in generative AI, graphics and more by joining NVIDIA at SIGGRAPH, running through Thursday, Aug. 10.

A Textured Approach: NVIDIA Research Shows How Gen AI Helps Create and Edit Photorealistic Materials

NVIDIA researchers are taking the stage at SIGGRAPH, the world’s largest computer graphics conference, to demonstrate a generative AI workflow that helps artists rapidly create and iterate on materials for 3D scenes.

The research demo, which will be presented today at the show’s Real-Time Live event, showcases how artists can use text or image prompts to generate custom textured materials — such as fabric, wood and stone — faster and with finer creative control. These capabilities will be coming to NVIDIA Picasso, allowing enterprises, software creators and service providers to create custom generative AI models for materials, developed using their own fully licensed data.

This set of AI models will facilitate iterative creating and editing of materials, enabling companies to offer new tools that’ll help artists rapidly refine a 3D object’s appearance until they achieve the desired result.

In the demo, NVIDIA researchers experiment with a living-room scene, like an interior designer assisted by AI might do in any 3D rendering application. In this case, researchers use NVIDIA Omniverse USD Composer — a reference application for scene assembly and composition using Universal Scene Description, known as OpenUSD — to add a brick-textured wall, to create and modify fabric choices for the sofa and throw pillows, and to incorporate an abstract animal design in a specific area of the wall.

Generative AI Enables Iterative Design 

The Real-Time Live demo combines several optimized AI models — a palette of tools that developers using Picasso will be able to customize and integrate into creative applications for artists.

Once integrated into creative applications, these features will allow artists to enter a brief text prompt to generate materials — such as a brick or a mosaic pattern — that are tileable, meaning they can be seamlessly replicated over a surface of any size. Or, they can import a reference image, such as a swatch of flannel fabric, and apply it to any object in the virtual scene.

An AI editing tool lets artists modify a specific area of the material they’re working on, such as the center of a coffee table texture.

The AI-generated materials support physics-based rendering, responding realistically to changes in the scene’s lighting. They include normal, roughness and ambient occlusion maps — features that are critical to creating and fine-tuning materials for photorealistic 3D scenes.

When accelerated on NVIDIA Tensor Core GPUs, materials can be generated in near real time, and can be upscaled in the background, achieving up to 4K resolution while creators continue to refine other parts of the scene.

Across creative industries — including architecture, game development and interior design — these capabilities could help artists quickly explore ideas and experiment with different aesthetic styles to create multiple versions of a scene.

A game developer, for example, could use these generative AI features to speed up the process of designing an open world environment or creating a character’s wardrobe. An architect could experiment with different styles of building facades in various lighting environments.

Build Generative AI Services With NVIDIA Picasso 

These capabilities for physics-based material generation will be made available in NVIDIA Picasso, a cloud-based foundry that allows companies to build, optimize and fine-tune their own generative AI foundational models for visual content.

Picasso enables content providers to develop generative AI tools and services trained on fully licensed, rights-reserved data. It’s part of NVIDIA AI Foundations, a set of model-making services that advance generative AI across text, visual content and biology.

At today’s SIGGRAPH keynote, NVIDIA founder and CEO Jensen Huang also announced a new Picasso feature to generate photorealistic 360 HDRi environment maps to light 3D scenes using simple text or image prompts.

See This Research at SIGGRAPH’s Real-Time Live 

Real-Time Live is one of the most anticipated events at SIGGRAPH. This year, the showcase features more than a dozen jury-reviewed projects, including those from teams at Roblox, the University of Utah and Metaphysic, a member of the NVIDIA Inception program for cutting-edge startups.

At the event, NVIDIA researchers will present this interactive materials research live, including a demo of the super resolution tool. Conference attendees can catch the session today at 6 p.m. PT in West Hall B at the Los Angeles Convention Center.

Learn about the latest advances in generative AI, graphics and more by joining NVIDIA at SIGGRAPH, running through Thursday, Aug. 10.

DENZA Collaborates With WPP to Build and Deploy Advanced Car Configurators on NVIDIA Omniverse Cloud

DENZA, the luxury EV brand joint venture between BYD and Mercedes-Benz, has collaborated with marketing and communications giant WPP and NVIDIA Omniverse Cloud to build and deploy its next generation of car configurators, NVIDIA founder and CEO Jensen Huang announced at SIGGRAPH.

WPP is using Omniverse Cloud — a platform for developing, deploying and managing industrial digitalization applications — to help unify the automaker’s highly complex design and marketing pipeline.

Omniverse Cloud enables WPP to build a single, physically accurate, real-time digital twin of the DENZA N7 model by integrating full-fidelity design data from the EV maker’s preferred computer-aided design tools via Universal Scene Description, or OpenUSD.

OpenUSD is a 3D framework that enables interoperability between software tools and data types for the building of virtual worlds.

The implementation of a new unified asset pipeline breaks down proprietary data silos, fostering enhanced data accessibility and facilitating collaborative, iterative reviews for the organization’s large design teams and stakeholders. It enables WPP to work on launch campaigns earlier in the design process, making iterations faster and less costly.

Unifying Asset Pipelines With Omniverse Cloud

Using Omniverse Cloud, WPP’s teams can connect their own pipeline of OpenUSD-enabled design and content creation tools, such as Autodesk Maya and Adobe Substance 3D Painter, to develop a new configurator for the DENZA N7. With a unified asset pipeline in Omniverse, WPP’s artists can edit and iterate on a real-time, path-traced view of the DENZA N7’s full engineering dataset — ensuring the virtual car accurately represents the physical one.

Traditional car configurators require hundreds of thousands of images to be prerendered to represent all possible options and variants. OpenUSD makes it possible for WPP to create a digital twin of the car that includes all possible variants in one single asset. No prerendered images are required.
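
A simple way to picture this is OpenUSD’s variant sets, which let a single asset carry every option and switch between them at runtime. The sketch below, written with the OpenUSD Python API, is illustrative only: the prim paths, variant-set name and paint colors are hypothetical stand-ins, not DENZA’s production data.

```python
# Minimal sketch: pack configurator options into one OpenUSD asset using a
# variant set. Prim paths, the variant-set name and the colors are
# hypothetical stand-ins for the automaker's real data.
from pxr import Usd, UsdGeom, Gf

stage = Usd.Stage.CreateNew("car_configurator.usda")
car = UsdGeom.Xform.Define(stage, "/Car").GetPrim()
stage.SetDefaultPrim(car)
body = UsdGeom.Mesh.Define(stage, "/Car/Body")  # stand-in for the real body mesh

paint = car.GetVariantSets().AddVariantSet("paintColor")
for name, rgb in [("PolarWhite", (0.95, 0.95, 0.95)),
                  ("MidnightBlue", (0.05, 0.08, 0.25))]:
    paint.AddVariant(name)
    paint.SetVariantSelection(name)
    with paint.GetVariantEditContext():
        # Opinions authored here live inside this variant only.
        body.CreateDisplayColorAttr([Gf.Vec3f(*rgb)])

# A configurator just flips the selection; no prerendered images are needed.
paint.SetVariantSelection("MidnightBlue")
stage.GetRootLayer().Save()
```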

In parallel, WPP’s environmental artists create fully interactive, live 3D virtual sets. These can start with a scan of a real-world environment, such as those WPP captures with their robot dog, or tap into generative AI tools from providers such as Shutterstock to instantly generate 360-degree HDRi backgrounds to maximize opportunity for personalization.

Shutterstock is using NVIDIA Picasso — a foundry for building generative AI visual models — to develop a variety of generative AI services to accelerate 3D workflows. At SIGGRAPH, Shutterstock announced the first offering of these new services, 360 HDRi, to create photorealistic HDR environment maps to relight a scene. With this feature, artists can rapidly create custom environments that fit their needs.

One-Click Publish to GDN

Once the 3D experience is complete, with just one click, WPP can publish it to Graphics Delivery Network (GDN), part of NVIDIA Omniverse Cloud. GDN is a network of data centers built to serve real-time, high-fidelity 3D content to nearly any web device, enabling interactive experiences in the dealer showroom as well as on consumers’ mobile devices.

This eliminates the tedious process of manually packaging, deploying, hosting and managing the experience themselves. If updates are needed, just like with the initial deployment, WPP can publish them with a single click.

Learn more about Omniverse Cloud and GDN.

NVIDIA H100 Tensor Core GPU Used on New Microsoft Azure Virtual Machine Series Now Generally Available

Microsoft Azure users can now turn to the latest NVIDIA accelerated computing technology to train and deploy their generative AI applications.

Available today, the Microsoft Azure ND H100 v5 VMs — using NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking — enable scaling of generative AI, high performance computing (HPC) and other applications with a click from a browser.

Available to customers across the U.S., the new instance arrives as developers and researchers are using large language models (LLMs) and accelerated computing to uncover new consumer and business use cases.

The NVIDIA H100 GPU delivers supercomputing-class performance through architectural innovations, including fourth-generation Tensor Cores, a new Transformer Engine for accelerating LLMs and the latest NVLink technology that lets GPUs talk to each other at 900GB/sec.

The inclusion of NVIDIA Quantum-2 CX7 InfiniBand with 3,200 Gbps cross-node bandwidth ensures seamless performance across the GPUs at massive scale, matching the capabilities of top-performing supercomputers globally.

Scaling With v5 VMs

ND H100 v5 VMs are ideal for training and running inference for increasingly complex LLMs and computer vision models. These neural networks drive the most demanding and compute-intensive generative AI applications, including question answering, code generation, audio, video and image generation, speech recognition and more.

The ND H100 v5 VMs achieve up to 2x speedup in LLMs like the BLOOM 175B model for inference versus previous generation instances, demonstrating their potential to further optimize AI applications.

NVIDIA and Azure

NVIDIA H100 Tensor Core GPUs on Azure provide enterprises the performance, versatility and scale to supercharge their AI training and inference workloads. The combination streamlines the development and deployment of production AI with the NVIDIA AI Enterprise software suite integrated with Azure Machine Learning for MLOps, and delivers record-setting AI performance in industry-standard MLPerf benchmarks.

In addition, by connecting the NVIDIA Omniverse platform to Azure, NVIDIA and Microsoft are providing hundreds of millions of Microsoft enterprise users with access to powerful industrial digitalization and AI supercomputing resources.

Learn more about new Azure v5 instances powered by NVIDIA H100 GPUs.

NVIDIA CEO Jensen Huang Returns to SIGGRAPH

One pandemic and one generative AI revolution later, NVIDIA founder and CEO Jensen Huang returns to the SIGGRAPH stage next week to deliver a live keynote at the world’s largest professional graphics conference.

The address, slated for Tuesday, Aug. 8, at 8 a.m. PT in Los Angeles, will feature an exclusive look at some of NVIDIA’s newest breakthroughs, including award-winning research, OpenUSD developments and the latest AI-powered solutions for content creation.

NVIDIA founder and CEO Jensen Huang.

Huang’s address comes after NVIDIA joined forces last week with Pixar, Adobe, Apple and Autodesk to found the Alliance for OpenUSD, a major leap toward unlocking the next era of interoperability in 3D graphics, design and simulation.

The group will standardize and extend OpenUSD, the open-source Universal Scene Description framework that’s the foundation of interoperable 3D applications and projects ranging from visual effects to industrial digital twins.

Huang will also offer a perspective on what’s been a raucous year for AI, with wildly popular new generative AI applications — including ChatGPT and Midjourney — providing a taste of what’s to come as developers worldwide get to work.

Throughout the conference, NVIDIA will participate in sessions on immersive visualization, 3D interoperability and AI-mediated video conferencing, and will present 20 research papers. Attendees will also get the opportunity to join hands-on labs.

Join SIGGRAPH to witness the evolution of AI and visual computing, and watch the keynote live.

 


Meet the Maker: Developer Taps NVIDIA Jetson as Force Behind AI-Powered Pit Droid

Goran Vuksic is the brain behind a project to build a real-world pit droid, a type of Star Wars bot that repairs and maintains podracers, which zoom across the much-loved film series.

The edge AI Jedi used an NVIDIA Jetson Orin Nano Developer Kit as the brain of the droid itself. The devkit enables the bot, which is a little less than four feet tall and has a simple webcam for eyes, to identify and move its head toward objects.

Vuksic — originally from Croatia and now based in Malmö, Sweden — recently traveled with the pit droid across Belgium and the Netherlands to several tech conferences. He presented to hundreds of people on computer vision and AI, using the droid as an engaging real-world demo.

The pit droid’s first look at the world.

A self-described Star Wars fanatic, he’s upgrading the droid’s capabilities in his free time, when not engrossed in his work as an engineering manager at a Copenhagen-based company. He’s also co-founder and chief technology officer of syntheticAIdata, a member of the NVIDIA Inception program for cutting-edge startups.

The company, which creates vision AI models with cost-effective synthetic data, uses a connector to the NVIDIA Omniverse platform for building and operating 3D tools and applications.

About the Maker

Named a Jetson AI Specialist by NVIDIA and an AI “Most Valuable Professional” by Microsoft, Vuksic got started with artificial intelligence and IT about a decade ago when working for a startup that classified tattoos with vision AI.

Since then, he’s worked as an engineering and technical manager, among other roles, developing IT strategies and solutions for various companies.

Robotics has always interested him, as he was a huge sci-fi fan growing up.

“Watching Star Wars and other films, I imagined how robots might be able to see and do stuff in the real world,” said Vuksic, also a member of the NVIDIA Developer Program.

Now, he’s enabling just that with the pit droid project powered by the NVIDIA Jetson platform, which the developer has used since the launch of its first product nearly a decade ago.

Vuksic reads to the pit droid.

Apart from tinkering with computers and bots, Vuksic enjoys playing the bass guitar in a band with his friends.

His Inspiration

Vuksic built the pit droid for both fun and educational purposes.

As a frequent speaker at tech conferences, he takes the pit droid on stage to engage with his audience, demonstrate how it works and inspire others to build something similar, he said.

Vuksic, his startup co-founder Sherry List and the pit droid present at the Techorama conference in Antwerp, Belgium.

“We live in a connected world — all the things around us are exchanging data and becoming more and more automated,” he added. “I think this is super exciting, and we’ll likely have even more robots to help humans with tasks.”

Using the NVIDIA Jetson platform, Vuksic is at the forefront of robotics innovation, along with an ecosystem of developers using edge AI.

His Jetson Project

Vuksic’s pit droid project, which took him four months, began with 3D printing its body parts and putting them all together.

He then equipped the bot with the Jetson Orin Nano Developer Kit as the brain in its head, which can move in all directions thanks to two motors.

Vuksic places an NVIDIA Jetson Orin Nano Developer Kit in the pit droid’s head.

The Jetson Orin Nano enables real-time processing of the camera feed. “It’s truly, truly amazing to have this processing power in such a small box that fits in the droid’s head,” said Vuksic.

He also uses Microsoft Azure to process the data in the cloud for object-detection training.

“My favorite part of the project was definitely connecting it to the Jetson Orin Nano, which made it easy to run the AI and make the droid move according to what it sees,” said Vuksic, who wrote a step-by-step technical guide to building the bot, so others can try it themselves.
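
As a rough sketch of that loop, the example below reads the webcam, detects an object and converts its horizontal offset into a pan command. It swaps in OpenCV’s bundled face detector as a stand-in for Vuksic’s Azure-trained object-detection model, and pan_motor() is a hypothetical placeholder for his motor-control code.

```python
# Minimal sketch of a "turn the head toward what you see" loop on a
# Jetson-class device. OpenCV's bundled face detector stands in for the
# project's Azure-trained object detector; pan_motor() is a hypothetical
# placeholder for the droid's motor-control code.
import cv2


def pan_motor(offset: float) -> None:
    # Placeholder: map a normalized horizontal offset (-1 to 1) to a pan command.
    print(f"pan command: {offset:+.2f}")


detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
cap = cv2.VideoCapture(0)  # the droid's webcam

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detections = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(detections) > 0:
        x, y, w, h = max(detections, key=lambda d: d[2] * d[3])  # largest hit
        center_x = x + w / 2.0
        half_width = frame.shape[1] / 2.0
        pan_motor((center_x - half_width) / half_width)

cap.release()
```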

“The most challenging part was traveling with the droid — there was a bit of explanation necessary when I was passing security and opened my bag which contained the robot in parts,” the developer mused. “I said, ‘This is just my big toy!’”

Learn more about the NVIDIA Jetson platform.