Saturday, November 1, 2025

CandyOps: Over-Engineering Halloween with AI and Robots

A few weeks ago my wife sent me a link to this candy throwing robot and said it looked like a fun project for our kids' upcoming Trunk or Treat. I of course immediately took that as permission to buy a new Raspberry Pi 5 and all the necessary parts (including a few unnecessary ones) for the SO101 Robot Arm under the thinly veiled excuse of a great father/daughter project (and totally not for my own amusement).

Project Goals

"We do these things not because they are easy, but because we thought they would be easy." 

As with most personal projects, the end state and project goals have evolved over time. I originally set out to just replicate the linked candy throwing robot project, but we quickly decided we would get more out of it if we tried to build our own thing as opposed to copy/pasting someone else's project. 

At a high level, my goals for this project were:

  • Share the learning and discovery process of building something with my kids.
  • Automate the act of picking up a piece of candy (ideally out of a bowl) and handing it to a trick-or-treater.
  • Automate the triggering of the hand-out process through some visual or audio input.
  • Build a portable assembly that could be moved easily and did not require internet access.

Assembly

I opted to buy the motor kit rather than individually sourcing each part, and we 3D printed the parts using this model that is optimized for our printer. I also picked up a Raspberry Pi 5 to control everything, although given the number of AI models we tested and the amount of training we did for this project, I do wonder if the Jetson Orin Nano would have been a better choice.

Printing the robot linkages

Other than a few self-inflicted issues, the printing and assembly were pretty straightforward. I also added the overhead and wrist-mounted cameras to better support the data collection for imitation learning.
Assembling the robot

Once assembled, we followed the instructions for motor configuration and calibration. I did manage to burn out one of the motor boards during calibration when a cable came loose and shorted the circuit, so I highly recommend double-checking that the motor cables are fully seated and running each joint through its full range of rotation BEFORE applying power.

After a quick test to make sure everything worked, we mounted the Pi, power strip, and both arms to a scrap piece of plywood for stability and to make the whole thing easy to move around. This had the added benefit of giving a very consistent environment when recording and training the models.

Behold my mediocre cable management
Fully assembled robot and work area

Training

I knew from reading the LeRobot documentation, and from my own experience with projects like this, that jumping straight to complex imitation learning with neural networks was going to cause a lot of headaches. Luckily, the docs lay out a series of incremental steps that helped us get more familiar with operating the robot, and each step doubled as a fallback that, if nothing else worked, would still make for a fun experience at the Trunk or Treat.

Manual Operation

The first step was just to get the robot working with manual teleoperation. In this scenario the user controls the Leader arm (the one on the right with the handle) and the Follower arm imitates the movements. LeRobot provides some handy scripts to get this all working, and in no time we were able to practice moving the robot around. Honestly, we could have stopped at this point and the kids would have been super happy just trying to guide the robot to pick up candy out of a bowl.

 

Manual Tele-operation of the robot

Record/Replay

The next step was to practice making a recording and then replaying it. In this scenario you operate the robot much like you do with manual teleoperation, except this time the software records all of the servo positions over time into a dataset that can be played back later with no manual intervention. We practiced recording the arm reaching into the bucket and handing someone some candy, then replaying that recording. This mode of operation proved to be the most reliable and was easy to string together with some basic control logic to replay specific episodes based on desired conditions (sketched below).

Episode Replay with Motor Positioning Data
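The control logic we ended up with was nothing fancier than mapping a trigger to a recorded episode and replaying it. The sketch below is illustrative rather than the exact code we ran; the replay script name and its arguments are placeholders, not the real LeRobot CLI.

```python
# Minimal sketch of replay-on-trigger control logic. The replay script name
# and its arguments are placeholders; swap in whatever replay entry point
# your LeRobot version provides.
import subprocess

# Map a trigger word to the recorded episode we want to replay.
EPISODES = {
    "trick or treat": 0,  # reach into the bucket and hand out candy
    "thank you": 1,       # wave goodbye
}

def replay(trigger: str) -> None:
    """Replay the episode associated with a trigger, if we know it."""
    episode = EPISODES.get(trigger)
    if episode is None:
        return
    subprocess.run(
        ["python", "replay_episode.py", "--episode", str(episode)],  # placeholder script
        check=True,
    )

if __name__ == "__main__":
    replay("trick or treat")
```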

 

Traditional Kinematics

The traditional way of controlling robots would be to model the physical robot and then develop forward and inverse kinematic solvers. That lets you provide a set of normalized coordinates and generate the necessary motor positions, or provide a set of motor positions and know where the arm is in coordinate space. The LeRobot and SO101 libraries come with some helpful tools for this (URDF models, MuJoCo for simulation, etc.), but we skipped this mode entirely as it felt a bit too complex for a first robotics project. I do plan to revisit it for future projects.
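To give a flavor of what a kinematic solver does, here is a toy forward-kinematics function for a two-link planar arm (not the SO101, and the link lengths are made up): given joint angles, it computes where the end effector sits in coordinate space.

```python
# Toy forward kinematics for a 2-link planar arm (illustrative only; the
# SO101 has more joints and its real geometry lives in the URDF model).
import math

def forward_kinematics(theta1: float, theta2: float,
                       l1: float = 0.12, l2: float = 0.14) -> tuple[float, float]:
    """Return the (x, y) position of the end effector for joint angles in radians."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

# Shoulder at 30 degrees, elbow at 45 degrees
print(forward_kinematics(math.radians(30), math.radians(45)))
```

Inverse kinematics runs the other direction, solving for the joint angles given a target (x, y), and is where tools like the URDF model and MuJoCo start to earn their keep.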

Neural Network 

Because LeRobot is a Hugging Face project, it has a lot of scripts and utilities for recording datasets and training a neural network on them. The trained network can then be fed into the Robot Processor Pipeline to take observations (motor positions, camera feeds) and infer future state. This is definitely the most interesting mode, as it has the potential for a lot of flexibility in operations like Pick and Place of an object.

Wandb output of our first training run

We made two attempts at this process - the first was with the full candy bowl. That turned out to have way too many variables, and the resulting model did not work at all. The second attempt was a much more pared-down pick and place with a single piece of candy. This one got closer, but the movement was still very jittery, and it often missed the candy altogether. I will say, it was very cool to see it miss the candy, start moving away, realize it wasn't holding the candy, and then go back to try again.

One of the rare successful attempts using our neural net model

We had a lot of fun making the datasets and brainstorming why it was exhibiting certain behaviors after training, but because of the slow process of creating datasets, training (about an hour on an A100 I rented), and the actual pickup time, we ultimately decided not to use this mode for our Trunk or Treat.

My initial sense is that, just like with traditional LLMs, the premise here is very exciting for the field of robotics, but the scale required for datasets and training is probably beyond the reach of all but the largest organizations.

Putting it all together

Up to this point we had mostly been controlling the robot directly from a laptop by running scripts manually. We knew this wouldn't be very useful in a Trunk or Treat setting, so we set out to tie it all together. For this part I wanted my daughter to start dipping her toes into coding, so I set up VSCode with Copilot and walked her through some basics of AI-assisted coding.

Our first thought was to have the robot detect a person through the camera and then run the pass-out-candy replay. We tested a small YOLO model, which was surprisingly easy to get working in Python and ran really well on the Pi 5 (about 1-2 Hz). We quickly decided this would be difficult in a real-life setting: there are often dozens of people walking around during a typical Trunk or Treat, so it would likely ALWAYS be detecting a person.
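Roughly, the detection test looked like the sketch below. The model file, camera index, and trigger print are illustrative stand-ins; we used one of the small YOLO variants via the ultralytics package.

```python
# Person-detection loop, roughly as we tested it on the Pi 5.
# Model file and camera index are whatever you have on hand.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")     # small model; ran at roughly 1-2 Hz on the Pi 5
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)
    # Class 0 in the COCO-trained YOLO models is "person"
    if any(int(box.cls) == 0 for box in results[0].boxes):
        print("Person detected - this is where we would have triggered the replay")
```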

Our second attempt was to use a voice command. Again, I was surprised at both how well Copilot took some very basic requests and turned them into working code, and how well the Vosk voice models ran on the Pi. This mode seemed to work better, as we could give it specific commands to wait for and piece together different replays based on different commands.

Finally, we worked with Copilot to build a web frontend so we could control the robot from a cell phone. We ended up with a simple Flask-based Python web frontend, and even got it themed for Halloween (a simplified sketch is below).

Command and Control interface
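For anyone curious, a stripped-down version of the frontend looks something like this; the route names and inline page are simplified stand-ins for the themed version we actually ran.

```python
# Minimal Flask control page, simplified from the Halloween-themed version.
from flask import Flask, render_template_string

app = Flask(__name__)

PAGE = """
<h1>CandyOps Control</h1>
<form action="/handout" method="post"><button>Hand out candy</button></form>
"""

@app.route("/")
def index():
    # Serve the single control page
    return render_template_string(PAGE)

@app.route("/handout", methods=["POST"])
def handout():
    # In the real app this kicked off the preloaded episode replay
    print("Replay requested from the web UI")
    return ("", 204)

if __name__ == "__main__":
    # Listen on all interfaces so a phone on the same network can reach it
    app.run(host="0.0.0.0", port=5000)
```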


At this point, I took over to clean up the AI-generated code and make it a bit more robust:

  • The first iteration simply called the lerobot scripts for replaying a dataset episode. This worked fine, but we found there was a few-second delay as the script set everything up and loaded the dataset. I updated the code to do all of that loading and setup during the initialization function so that each replay call was much more responsive.
  • The voice model worked OK, but had a lot of false positives, which would only get worse in a crowded, noisy environment. I found that the KaldiRecognizer class can be given a list of phrases and will only recognize those. This greatly reduced the false positives and seemed much easier than fine-tuning the model (see the sketch after this list).
  • Not so much a code change, but I found during the Trunk or Treat that it was pretty easy for me to naturally work in the registered key words ('Trick or Treat', 'please', etc.) when talking to kids if a phrase didn't register right away. This made it feel natural even when the voice activation didn't work as expected.
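Here is roughly what the grammar-restricted recognizer looks like; the model path and phrase list are just examples, and the microphone plumbing uses sounddevice, which may differ from your setup.

```python
# Vosk with a restricted grammar: passing a JSON list of phrases as the third
# argument to KaldiRecognizer limits what it will recognize, which cut way
# down on false positives for us. Model path and phrases are examples.
import json
import queue

import sounddevice as sd
from vosk import Model, KaldiRecognizer

PHRASES = ["trick or treat", "please", "[unk]"]
model = Model("vosk-model-small-en-us-0.15")
recognizer = KaldiRecognizer(model, 16000, json.dumps(PHRASES))

audio = queue.Queue()

def on_audio(indata, frames, time, status):
    # Push raw microphone bytes onto a queue for the main loop
    audio.put(bytes(indata))

with sd.RawInputStream(samplerate=16000, blocksize=8000, dtype="int16",
                       channels=1, callback=on_audio):
    while True:
        data = audio.get()
        if recognizer.AcceptWaveform(data):
            text = json.loads(recognizer.Result()).get("text", "")
            if "trick or treat" in text:
                print("Heard the magic words - trigger the hand-out replay")
```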

Conclusion

Just like any project, whether DIY or professional, there comes a point where there are still 100 things you would like to do and about 10 times as many ways it could all go sideways, but time marches on and deadlines come quicker than you want. Fortunately, we had tested and planned enough that our candy passing robot was a huge success at the Trunk or Treat event.

Trunk or Treat!

Whether it is your kid, your mentor, or the new junior on the team, I believe there is value in not just teaching the next generation what you already know, but also finding opportunities to experience the learning process together. We both had a ton of fun on this project, and my hope is that I've been able to move the needle from "my dad knows how to build robots" toward "my dad and I figured out how to build a robot and I'll bet if I had to, I could do it again".