How Robotics Works Today

For most of robotics history, a robot was a programmed machine. Robotics engineers wrote software that specified exactly what the robot should do: move to this position, close the gripper with this force, move to the next position, repeat.

This approach transformed industries like automotive manufacturing and electronics assembly, where robots perform the same task millions of times in tightly controlled environments. It’s far less effective in environments where every object is different. Food manufacturing is one example. Ingredients are stacked up differently in every container. They vary in shape, density, and moisture content. Conveyor speeds fluctuate. Trays arrive slightly misaligned. No two servings are exactly the same.

Over the past several years, robotics has undergone a fundamental shift. Rather than programming every behavior, engineers increasingly train robots with data. To understand modern robotics, we need to understand this progression.

Robotics 0.0: Mechatronic automation

Before industrial robots existed, factories were automated using purpose-built machines.

These systems combined mechanical design with actuators, sensors, pneumatics, hydraulics, and programmable logic controllers (PLCs) to perform a single task at extremely high speed and reliability. Filling bottles, sealing packages, conveying products, dispensing sauces, sorting parts, and labeling products have all been automated this way for decades.

In food manufacturing, examples include multihead weighers, depositors, dispensers, thermoformers, packaging machines, and conveyors. These machines often incorporate sophisticated engineering, including servo-controlled motors, precision motion systems, force sensors, pressure feedback, and closed-loop control. Many remain state-of-the-art today.

What they generally lack is decision-making. A depositor dispenses a programmed volume. A conveyor moves at a programmed speed. A packaging machine seals a tray using programmed timing and force. If the product changes unexpectedly, the machine typically can’t adapt beyond the range anticipated by its control logic.

The intelligence lives almost entirely in the machine’s mechanical design and control system rather than in software that reasons about the environment. For highly repetitive manufacturing tasks, this approach remains very effective, and in many factories mechatronic automation still performs the vast majority of physical work.

Robotics 1.0: Programmed motion

The first generation of industrial robotics was built around deterministic programs. An engineer specified every waypoint, every gripper command, and every transition in between. A simple version of one of these programs might look like this:

Move to point A
Move to point B
Close the gripper
Move to point C
Open the gripper
Repeat
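
As a rough illustration, the sequence above could be written as a short Python sketch. The `ScriptedArm` class and its methods are hypothetical stand-ins that only log commands; they are not a real robot vendor API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScriptedArm:
    """Hypothetical stand-in for a robot controller; it only logs commands."""
    log: List[str] = field(default_factory=list)

    def move_to(self, point: str) -> None:
        self.log.append(f"move:{point}")

    def set_gripper(self, closed: bool) -> None:
        self.log.append("close" if closed else "open")


def run_cycle(arm: ScriptedArm) -> None:
    """One pass through the fixed pick-and-place sequence."""
    arm.move_to("A")
    arm.move_to("B")
    arm.set_gripper(closed=True)
    arm.move_to("C")
    arm.set_gripper(closed=False)


arm = ScriptedArm()
for _ in range(2):  # "Repeat": every cycle issues exactly the same commands
    run_cycle(arm)
```

Nothing in `run_cycle` looks at the world, which is the defining property of this generation: the second cycle is identical to the first no matter what is on the line.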

Industrial robots became extraordinarily good at executing these programs. Modern six-degree-of-freedom (6-DOF) robot arms from companies like ABB, FANUC, KUKA, and Yaskawa achieve sub-millimeter repeatability while operating around the clock.

This approach works remarkably well when the world is predictable. Automotive welding, palletizing, packaging, and machine tending all fit this model because the robot encounters essentially the same situation every cycle.

The limitation was never the hardware. Industrial robot arms, servo systems, and motion controllers have been highly sophisticated for decades. The limitation was that a programmed robot could only execute the program it had been given. If even one parameter in its environment changed, the robot could not adapt.

Robotics 2.0: Learned heuristics

As robots began operating in less structured environments, engineers introduced another layer: heuristics. Instead of programming only motion, they programmed decision rules. For example:

If the meal tray is shifted left, offset the ingredient deposit by 5 mm.
If the previous scoop was underweight, scoop slightly deeper.
If the ingredient level falls below a threshold, adjust the approach angle.
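
A minimal sketch of how such rules might look in code is shown below. Only the 5 mm offset comes from the rules above; every other threshold, magnitude, and field name is a hypothetical value chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    tray_shifted_left: bool
    last_scoop_weight_g: float
    ingredient_level_mm: float


@dataclass
class Adjustment:
    deposit_offset_mm: float = 0.0
    scoop_depth_delta_mm: float = 0.0
    approach_angle_delta_deg: float = 0.0


TRAY_OFFSET_MM = 5.0      # from the first rule; the sign convention is assumed
TARGET_WEIGHT_G = 100.0   # hypothetical target portion weight
DEEPER_SCOOP_MM = 2.0     # hypothetical meaning of "slightly deeper"
LOW_LEVEL_MM = 30.0       # hypothetical ingredient-level threshold
ANGLE_ADJUST_DEG = 10.0   # hypothetical approach-angle change


def apply_rules(obs: Observation) -> Adjustment:
    """Evaluate each hand-written rule and collect the resulting corrections."""
    adj = Adjustment()
    if obs.tray_shifted_left:
        adj.deposit_offset_mm = TRAY_OFFSET_MM
    if obs.last_scoop_weight_g < TARGET_WEIGHT_G:
        adj.scoop_depth_delta_mm = DEEPER_SCOOP_MM
    if obs.ingredient_level_mm < LOW_LEVEL_MM:
        adj.approach_angle_delta_deg = ANGLE_ADJUST_DEG
    return adj


adjustment = apply_rules(Observation(True, 92.0, 12.0))
```

Each constant in this sketch is a number that someone has to choose and retune by hand, and each new source of variability adds another branch.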

Many food automation systems operate this way today. Traditional depositors, dispensers, and vision systems often contain hundreds or thousands of carefully tuned rules that account for expected sources of variability.

Heuristics significantly expanded what robots could accomplish, but they introduced a different challenge. Every new ingredient, container, or production line required additional engineering effort. Over time, robotics software became a growing collection of special cases and exceptions. Heuristic-driven robots could adapt, but only within the range of situations that engineers had anticipated.

Robotics 3.0: Hybrid AI

Modern robotics no longer relies entirely on handwritten rules. Instead, engineers increasingly replace individual components of the robotics stack with learned AI models. Most production robots today still follow a modular architecture, because every autonomous robot needs to perform the same three tasks:

See the environment
Think about what action to take
Act to execute that action

These tasks are usually implemented as separate modules. A perception module detects objects and estimates their pose. A planning module determines what the robot should do next. A control module converts that plan into motor commands that move the robot.
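
The sketch below shows one way these three modules might be wired together. All three functions are simplified stand-ins: the "camera frame" is already a labeled dictionary, the planner is a two-waypoint rule, and the controller only formats commands. The names, the pose convention, and the 10 cm hover height are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Pose = Tuple[float, float, float]  # (x, y, z) in metres; an assumed convention


@dataclass
class Detection:
    label: str
    pose: Pose


def perceive(frame: Dict[str, Pose]) -> List[Detection]:
    """See: stand-in for a detector. The 'frame' here is already labeled."""
    return [Detection(label, pose) for label, pose in frame.items()]


def plan(detections: List[Detection], target: str) -> List[Pose]:
    """Think: hover 10 cm above the target, then descend to it."""
    for det in detections:
        if det.label == target:
            x, y, z = det.pose
            return [(x, y, z + 0.10), (x, y, z)]
    return []  # nothing to do if the target was not seen


def control(waypoints: List[Pose]) -> List[str]:
    """Act: stand-in for a motion controller; one command per waypoint."""
    return [f"move_to({x:.2f}, {y:.2f}, {z:.2f})" for x, y, z in waypoints]


frame = {"tomato": (0.30, 0.10, 0.02), "bun": (0.10, 0.20, 0.03)}
commands = control(plan(perceive(frame), target="tomato"))
```

Because each stage only depends on the previous stage's output type, any one of them can be swapped for a different implementation without touching the other two.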

What has changed is that these modules are increasingly learned rather than programmed. For example, a robot may use a neural network to detect ingredients, another model to determine where to grasp them, and a traditional motion planner to execute the resulting trajectory. Rather than replacing the entire robotics stack with a single AI model, learning is introduced where it provides the greatest benefit.

This hybrid approach combines decades of advances in robotics engineering with modern machine learning (ML). It allows robots to adapt to variability that would be difficult to capture with handwritten rules while still benefiting from the reliability and interpretability of modular software.

Robotics 4.0: End-to-end learning

More recently, “end-to-end” models have begun to replace the separate see, think, and act modules with a single model. They take in pixels and output control signals (e.g., joint angles for an articulated 6-DOF arm, or a steering wheel angle and a brake scalar for a vehicle).

These models, also called policies, are powered entirely by data: their behavior is learned from examples rather than written as code.
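
To make the "pixels in, control out" interface concrete, here is a deliberately tiny sketch. It assumes a 16x16 RGB image and six joints, and it uses random, untrained weights as a placeholder for a trained neural network, so the outputs are meaningless; only the shape of the mapping is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C, N_JOINTS = 16, 16, 3, 6   # assumed image size and joint count
JOINT_LIMIT_RAD = np.pi            # assumed symmetric joint limit

# Placeholder parameters. A real policy would be a deep network whose
# weights come from training on demonstrations, not from a random draw.
weights = rng.normal(scale=0.01, size=(N_JOINTS, H * W * C))


def policy(image: np.ndarray) -> np.ndarray:
    """Map raw pixels directly to joint-angle targets in radians."""
    pixels = image.astype(np.float32).reshape(-1) / 255.0
    return JOINT_LIMIT_RAD * np.tanh(weights @ pixels)


image = rng.integers(0, 256, size=(H, W, C), dtype=np.uint8)
joint_targets = policy(image)
```

There is no separate detection, grasp-selection, or planning step here: a single function takes the image and returns the command.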

Powered by data

If robotics is becoming a data discipline, the obvious question is: where does the data come from? One route is demonstrations.

Instead of programming a robot to perform a task, an operator performs the task while the robot observes. Over dozens or hundreds of demonstrations, the robot sees slightly different versions of the same activity and learns the underlying behavior.

Imagine teaching a robot to assemble a burger. Rather than writing code that specifies how to pick up a burger bun, place a slice of cheese, or reposition a piece of lettuce that’s hanging off the edge, an operator simply performs those actions repeatedly. Every demonstration is slightly different. The cheese may stick to the slice underneath it. The lettuce may be rotated differently. The burger patty may land a little off center. The robot learns from those variations.

This process is called learning from demonstration, or imitation learning. Instead of encoding human intuition into handwritten software, the intuition becomes part of the model itself. Decisions such as how to angle the gripper to separate two slices of cheese or where to grasp a tomato slice so it doesn’t slip are learned directly from demonstrations.
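
The simplest form of imitation learning is behavior cloning: treat the demonstrations as a supervised dataset and fit a function from observations to actions. The sketch below does this with a linear least-squares fit on synthetic data. The "operator" here is a made-up gain matrix plus a little noise, and real systems would use images and a neural network rather than 2D positions and a linear model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "demonstrations": the operator's hidden behavior is to reach
# toward the observed object position with a fixed gain, plus small jitter.
true_gain = np.array([[1.0, 0.0], [0.0, 0.8]])
observations = rng.uniform(-1.0, 1.0, size=(200, 2))
actions = observations @ true_gain + rng.normal(scale=0.01, size=(200, 2))

# Behavior cloning: supervised regression from observations to actions.
learned_gain, *_ = np.linalg.lstsq(observations, actions, rcond=None)


def policy(observation: np.ndarray) -> np.ndarray:
    """Imitate the operator on an observation not seen during training."""
    return observation @ learned_gain


predicted_action = policy(np.array([0.5, -0.25]))
```

Nobody wrote the gain into the policy. It was recovered from the demonstrations, which is the sense in which the intuition becomes part of the model.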

One increasingly common method for collecting demonstrations is leader-follower teleoperation, originally popularized by Stanford’s ALOHA system. An operator controls a leader arm while a follower arm mirrors the motion in real time. Cameras mounted on the follower observe the workspace, while the robot records the actions required to complete the task.
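
A recording loop for this kind of setup might look roughly like the sketch below. It assumes that each recorded action is the leader arm's joint position and that each observation pairs a camera frame with the follower's joint state. The `FollowerArm` class, the string "frames", and the two-joint trajectories are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    image: str                    # placeholder for a camera frame
    follower_joints: List[float]  # robot state when the frame was captured
    action: List[float]           # leader joint positions commanded at this step


class FollowerArm:
    """Hypothetical follower that reaches any commanded joint target instantly."""

    def __init__(self, joints: Sequence[float]) -> None:
        self.joints = list(joints)

    def command(self, target: Sequence[float]) -> None:
        self.joints = list(target)


def record_episode(
    leader_trajectory: Sequence[Sequence[float]],
    follower: FollowerArm,
    camera: Callable[[int], str],
) -> List[Step]:
    """Log (observation, action) pairs while the follower mirrors the leader."""
    episode = []
    for t, leader_joints in enumerate(leader_trajectory):
        step = Step(camera(t), list(follower.joints), list(leader_joints))
        follower.command(leader_joints)  # the follower mirrors the leader
        episode.append(step)
    return episode


leader_trajectory = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]
episode = record_episode(
    leader_trajectory, FollowerArm([0.0, 0.0]), camera=lambda t: f"frame_{t}"
)
```

Each `Step` is one training example: what the robot saw and felt, and what the operator did next.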

Leader-follower teleoperation has several advantages. Because every demonstration is performed on the actual robot hardware, every motion is physically executable. The robot’s cameras observe the same scene they will encounter during deployment rather than one partially blocked by a human operator. Most importantly, the demonstrations capture real interaction with real food. The dataset contains not only successful trajectories, but also the subtle corrections required to manipulate deformable, slippery, sticky, and highly variable ingredients.

Learning a policy

Collecting data is only half the problem. The robot still needs to learn a policy that converts observations into actions. Today, one of the most successful approaches is diffusion policy, introduced by Chi et al. in 2023.

Readers familiar with image and video generation models like DALL-E or Sora will recognize the underlying idea: start from random noise and refine it step by step into a coherent output. Rather than generating pixels, a diffusion policy generates a sequence of robot motions. Given camera images and a short history of the robot’s recent motion, the model predicts where the robot’s end effector should move next. Traditional robotics required engineers to specify these motions explicitly. A learned policy discovers them directly from data.
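
The sampling side of this idea can be sketched with a standard DDPM-style reverse loop: start from Gaussian noise and repeatedly subtract predicted noise until an action sequence remains. In the sketch below, `predict_noise` is not a trained network. It is an analytic stand-in that is exact only when the training data is a single demonstration, and it ignores the observation. The noise schedule, step count, and trajectory are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50                              # number of denoising steps (assumed)
betas = np.linspace(1e-4, 0.05, T)  # noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# One demonstrated action sequence: 8 future end-effector (x, y) targets.
demo_actions = np.stack(
    [np.linspace(0.0, 0.7, 8), np.linspace(0.0, 0.35, 8)], axis=1
)


def predict_noise(noisy_actions, t, observation):
    """Stand-in for the trained network: the exact noise estimate when the
    dataset contains only demo_actions. A real policy would condition on
    camera images and robot state instead of ignoring the observation."""
    scaled_demo = np.sqrt(alpha_bars[t]) * demo_actions
    return (noisy_actions - scaled_demo) / np.sqrt(1.0 - alpha_bars[t])


def sample_actions(observation):
    """DDPM reverse process: denoise pure noise into an action sequence."""
    actions = rng.normal(size=demo_actions.shape)
    for t in reversed(range(T)):
        eps = predict_noise(actions, t, observation)
        mean = (actions - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps)
        mean = mean / np.sqrt(alphas[t])
        noise = rng.normal(size=actions.shape) if t > 0 else 0.0
        actions = mean + np.sqrt(betas[t]) * noise
    return actions


planned = sample_actions(observation=None)
```

With a single demonstration the loop simply recovers that demonstration. A trained network, fit on many demonstrations and conditioned on what the cameras see, would instead produce a motion appropriate to the current scene.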

Why data matters more than code

Food manipulation is a unique challenge because organic, deformable materials are extremely difficult to simulate. Domains dominated by rigid objects can often be simulated accurately (think autonomous vehicles or warehouse robots). Food ingredients can’t. Rice behaves differently depending on how it was cooked. Mashed potatoes change in consistency as they cool. Pasta, shredded chicken, and leafy greens all deform differently under identical manipulation.

As a result, food robotics relies much more heavily on real-world demonstrations than other robotics domains. This creates an important flywheel: every production deployment generates new data with new examples of variability. Those examples become new training data, which improves future models and enables the robot to better handle situations it has never encountered before.

A modern robotics company improves its systems differently from a traditional robotics company. Rather than primarily writing new software, it collects better demonstrations, curates larger datasets, and trains better models.

The rise of robotics foundation models

Today’s diffusion policies are typically trained to perform a single task. The frontier is moving toward robotics foundation models that can generalize across many tasks, environments, and embodiments using a single set of model weights.

One direction is vision-language-action (VLA) models. VLAs combine large vision-language models with robot action decoders. Rather than training a separate policy for every manipulation task, a VLA can take a natural-language instruction such as “assemble this burger” or “place the tomatoes” and generate the corresponding robot actions. Models such as Physical Intelligence’s π₀, NVIDIA GR00T, Gemini Robotics, and OpenVLA all explore this direction.

A second direction is world models. Rather than directly predicting robot actions, world models learn to predict how the physical world evolves over time. By developing an internal understanding of physics, they can support planning, simulation, and reasoning about the consequences of different actions. For food robotics, this could eventually enable robots to reason about what happens when rice compresses, mashed potatoes cool, or shredded chicken shifts under a gripper; in other words, to develop an understanding of food physics.
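
As a toy illustration of planning with a world model, the sketch below rolls candidate action sequences through a dynamics function and keeps the one whose predicted outcome lands closest to a goal. The `world_model` here is a hand-written placeholder (position plus action), not a learned model, and exhaustive search over five discrete actions stands in for a real planner. All names and values are hypothetical.

```python
import itertools
import numpy as np


def world_model(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Placeholder dynamics: predicts the next state given an action.
    A learned world model would be trained from data instead."""
    return state + action


ACTIONS = [
    np.array(a, dtype=float)
    for a in [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]
]


def plan(state: np.ndarray, goal: np.ndarray, horizon: int = 3):
    """Imagine every action sequence and keep the best predicted outcome."""
    best_cost, best_sequence = float("inf"), None
    for sequence in itertools.product(range(len(ACTIONS)), repeat=horizon):
        predicted = state
        for index in sequence:
            predicted = world_model(predicted, ACTIONS[index])
        cost = float(np.linalg.norm(predicted - goal))
        if cost < best_cost:
            best_cost, best_sequence = cost, sequence
    return [ACTIONS[i] for i in best_sequence], best_cost


state = np.array([0.0, 0.0])
goal = np.array([2.0, 1.0])
actions, cost = plan(state, goal)
```

The planner never touches the real world while it searches. Every candidate is evaluated inside the model, so the quality of the plan depends on how well the model predicts what will actually happen.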

These approaches aren’t mutually exclusive. We expect future physical AI systems to combine fast action policies with increasingly capable world models, allowing robots to both react quickly and reason more effectively about the world around them.

At Chef, we’re building toward that future. Today, our Food Foundation Model (FFM) applies VLA-based techniques to learn new meals and ingredients from demonstrations rather than new code. Looking ahead, we believe world models will become an increasingly important component of physical AI, particularly for manipulating highly variable, deformable materials like food.

Building production-ready foundation models also introduces new engineering challenges, such as latency, since robot control loops operate much faster than today’s large AI models can perform inference. Those implementation details matter, but they’re ultimately in service of a much larger trend: robotics is becoming increasingly driven by learned models.

From code to data

The most significant change in robotics over the past decade hasn’t been better robot arms or better cameras. Those technologies have steadily improved for years. The largest change is that robot behavior is no longer defined primarily by code.

Historically, robotics engineers improved robots by writing new programs and adding new heuristics. Today, they increasingly improve robots by collecting better demonstrations, curating larger datasets, and training more capable models. In other words, robotics is becoming a data discipline.

The progression from mechatronic automation to programmed motion, heuristics, hybrid AI, and ultimately end-to-end learning reflects a broader shift in how intelligence is built into machines. Rather than explicitly telling robots what to do, we increasingly teach them through experience. That shift from code to data is what distinguishes today’s generation of physical AI systems from the industrial robots that came before them.

To learn more about the work of our engineering team, read our engineering blog and contact us with any questions.