Ahorra un 25 % (o incluso más) en tus costes de Kafka | Acepta el reto del ahorro con Kafka de Confluent

Sep 26, 2024Lecturas: 3 min

Atomic Tessellator: Revolutionizing Computational Chemistry with Data Streaming

Escrito por

Will StoltonProduct Marketing Manager, Confluent
Tim GraczewskiGlobal Head of Confluent for Startups

Sep 26, 2024Lecturas: 3 min

Computational chemistry relies on large volumes of complex data in order to provide insights into new applications, whether it’s for electric vehicles or new battery development. With the emergence of generative AI (GenAI), the rapid, scalable processing of this data has become possible and critical to investigate previously unexplored areas in catalysis and materials science. One company that’s applying GenAI to computational chemistry is Atomic Tessellator, a New Zealand-based startup and finalist in Confluent’s inaugural Data Streaming Startup Challenge. With Confluent sitting at the core of their data infrastructure, Atomic Tessellator provides a powerful platform for molecular research backed by computational methods, focusing on catalyst discovery. Read on to learn how data streaming plays a central role in their technology.

The problem: huge, siloed databases and legacy messaging

Molecular research consists of two processes: search and evaluation. The search element involves exploring huge databases of molecular structures or configurations and identifying candidates (i.e., for a specific aim, such as testing the performance characteristics of a material for a relevant application). The evaluation step then applies filtering methods and quantum chemistry calculations to these candidates and surfacing results.

The traditional data infrastructure and tooling underpinning these processes is slow, siloed, incompatible, and fragile. The source databases are large (i.e., billions of records) and contain multitudes of file types (e.g., HD5 and SDF), often with 10,000s of dimensions which is orders of magnitude more challenging than the requirements of most industries. Atomic Tessellator has possibly the largest collection of theoretical and experimental catalyst datasets in the world.

Prior to adopting data streaming, Atomic Tessellator relied on batch-based ETL using RabbitMQ, Celery, and a Django app to retrieve molecular data from its databases. While the setup was functional, it was far from ideal; the team had to spend a significant amount of time on state maintenance and manually provisioning compute resources to deal with variable workloads. If there was a failure, it took a long time to figure out what had caused it. Ultimately, this resulted in a data infrastructure that was slow and cumbersome to maintain—a barrier to the rapid processing of huge volumes of the data that was needed.

The solution: GenAI and data streaming

In order to create a more flexible, scalable platform, Atomic Tessellator turned to Confluent and Apache Kafka®.

Atomic Tessellator uses AI to push volumes of complex molecular data from various databases and generative models into Confluent, and from there streams the data to a downstream “evaluation” application to quickly and reliably replicate chemical reactions in order to discover new catalysts with high commercial value. Experiments are also augmented with LLMs so that the vast amount of the experiment design, execution, and debugging is done automatically.

Fig. 1. – Atomic Tessellator’s logical data flow architecture

(See full-size image)

Business benefits

With a data streaming architecture built on Kafka, Atomic Tessellator is able to scale their pipelines to meet the temporary demands of experiments, avoiding the need to over-provision compute resources. They’re also able to tweak components of their pipelines midway through a workload, rather than disrupt or redeploy it; if generative components are emitting too many similar results, for instance, they can alter the “entropy injection rate” in order to increase candidate diversity (i.e., improving the quality of the experiment) in the moment.

Fig. 2. – Electron density simulation enabled by GenAI and data streaming

What’s more, the team has clear visibility over the status of their production pipelines. The ability to inspect data streams in real time via JMX (e.g., throughputs, latencies, durability, and availability) allows them to develop faster and shorten the time to market for new applications.

As the team innovates further, they plan to utilize the full capabilities of Confluent’s data streaming platform. Over the next year, for instance, they’re exploring the use of Apache Flink® in building out a new “pipeline templates” system, which functions as a turnkey set of common simulation configurations.

Summary: Blazing a trail in computational chemistry, powered by Confluent

By moving away from legacy messaging technologies and toward data streaming with Confluent, Atomic Tessellator has been able to build a computational chemistry platform that is fast, reliable, and flexible. They can deploy streaming data pipelines in minutes (rather than days or weeks), sync complex molecular data into downstream evaluation applications, and rapidly integrate new, innovative services into their platform (e.g., Fig. 2. – a custom LLM-powered electron density simulation, which generates quantum chemistry input files into an interactive 3D image). With Confluent as the backbone of their data infrastructure, Atomic Tessellator has developed a powerful platform to facilitate the rapid discovery of new catalysts for a wide range of industries.

Further resources

Webpage: Making Real-Time data Real Easy for Startups
Webpage: Generative Artificial Intelligence (GenAI) | Confluent
Website: Atomic Tessellator

Will Stolton is a Product Marketing Manager at Confluent, focusing on core data streaming. He previously worked in product marketing at Snowplow, having transitioned from financial auditing with BDO.
Tim Graczewski is the Global Head of Confluent for Startups. A two-time venture backed entrepreneur, Tim has also held senior strategy and business development roles at Oracle and Intuit.

¿Te ha gustado esta publicación? Compártela ahora

Focal Systems: Boosting Store Performance with an AI Retail Operating System and Real-Time Data

Feb 4, 2026

Discover how Focal Systems uses computer vision, AI, and data streaming to improve retail store performance, shelf availability, and real-time inventory accuracy.

Thunai Automates Customer Support with AI Agents and Data Streaming

Jan 23, 2026

Learn how Thunai uses real-time data streaming to power agentic AI, achieving 70–80% L1 support deflection and cutting resolution time from hours to minutes.

Atomic Tessellator: Revolutionizing Computational Chemistry with Data Streaming

Get Started with Confluent Cloud

Escrito por

The problem: huge, siloed databases and legacy messaging

The solution: GenAI and data streaming

Business benefits

Summary: Blazing a trail in computational chemistry, powered by Confluent

Further resources

Get Started with Confluent Cloud

¿Te ha gustado esta publicación? Compártela ahora

Focal Systems: Boosting Store Performance with an AI Retail Operating System and Real-Time Data

Thunai Automates Customer Support with AI Agents and Data Streaming

The problem: huge, siloed databases and legacy messaging

The solution: GenAI and data streaming

Business benefits

Summary: Blazing a trail in computational chemistry, powered by Confluent

Further resources

Get Started with Confluent Cloud

¿Te ha gustado esta publicación? Compártela ahora

Suscríbete al blog de Confluent

Focal Systems: Boosting Store Performance with an AI Retail Operating System and Real-Time Data

Thunai Automates Customer Support with AI Agents and Data Streaming