Zachary Clement

A place to host my thoughts and side projects

Baby Cry Translator App: Methodology and Retrospective

Posted on Apr 11, 2026

Over the past couple of weeks, I built a web app to detect and interpret baby cries. It is probably not ready for general use, but it was a fun learning experience.

Engineering Approach

Vibe Coding

I used Claude Code to develop this app. Before writing any code, I developed a fairly detailed plan. I find that it is much easier to write new code with vibe coding than to modify existing code, so I wanted as much as possible written down beforehand to reduce the need for extensive changes later.

ML Predictions

To make cry predictions, I first created an embedding for each cry from the hidden states of two audio-processing neural networks. I used the hidden state of emotion2vec because I believed that some vocal features that convey emotion in adult speech (like pitch, intensity, and cadence) are also present in babies. In addition, I used the hidden state of Whisper because I believed that babies probably use some vocal features similar to speech (for example, when a baby is hungry, their cries frequently sound like they start with an "n").
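The post doesn't show how the two hidden states are combined, so here is a minimal sketch of one common approach: mean-pool each model's (frames, dim) hidden-state matrix over time, L2-normalize, and concatenate. The array shapes and the pooling choice are my assumptions, not the app's actual code; the model outputs are stand-in random arrays.

```python
import numpy as np

def pool_and_normalize(hidden_states: np.ndarray) -> np.ndarray:
    """Mean-pool a (frames, dim) hidden-state matrix into one vector,
    then L2-normalize so cosine similarity behaves well downstream."""
    vec = hidden_states.mean(axis=0)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Stand-ins for real model outputs; dimensions here are illustrative.
emotion_hidden = np.random.rand(100, 768)  # e.g. emotion2vec frames
whisper_hidden = np.random.rand(50, 512)   # e.g. Whisper encoder frames

embedding = np.concatenate([pool_and_normalize(emotion_hidden),
                            pool_and_normalize(whisper_hidden)])
print(embedding.shape)  # (1280,)
```

Normalizing each half separately keeps one model's larger activation scale from dominating the similarity search.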

After normalizing the hidden states, I used a vector database (Chroma) to get the three most similar cries to the current one. These cries, along with their associated cry reasons and solutions, were fed into the LLM, which then produced a cry reason and a suggested solution for the new cry. This approach extends classic K-Nearest-Neighbors to allow prediction over the free-text entries that people put in the app.

Database

I used SQLite for my database, as I did not anticipate any users other than myself. I used a star schema with normalized dimension tables.
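A star schema here would look something like the following: a central fact table of cry events surrounded by dimension tables. The table and column names are illustrative guesses, not the app's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (names are illustrative, not the app's real schema).
CREATE TABLE babies  (baby_id   INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE reasons (reason_id INTEGER PRIMARY KEY, label TEXT);

-- Fact table at the center of the star: one row per recorded cry.
CREATE TABLE cry_events (
    event_id    INTEGER PRIMARY KEY,
    baby_id     INTEGER NOT NULL REFERENCES babies(baby_id),
    reason_id   INTEGER REFERENCES reasons(reason_id),
    recorded_at TEXT,
    solution    TEXT
);

INSERT INTO babies  VALUES (1, 'Ada');
INSERT INTO reasons VALUES (1, 'hungry');
INSERT INTO cry_events VALUES (1, 1, 1, '2026-04-10T03:00:00', 'fed a bottle');
""")

# Joining the fact table back to its dimensions recovers the full record.
row = conn.execute("""
    SELECT b.name, r.label, e.solution
    FROM cry_events e
    JOIN babies  b ON b.baby_id   = e.baby_id
    JOIN reasons r ON r.reason_id = e.reason_id
""").fetchone()
print(row)  # ('Ada', 'hungry', 'fed a bottle')
```

SQLite's single-file, zero-configuration design is a good fit for a single-user app like this one.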

Learnings

Consider the “default state” before building an ML system

When I started building out this web app, I didn't have much experience as a parent, and I didn't really consider how parents respond to cries. Now I realize that it is pretty quick to iterate through all the reasons a baby might cry; it doesn't take long to check their diaper, for example. So an ML system for cry detection needs to respond very quickly to provide any added value to parents.


Future Work

Using External Data for Cry Interpretation

In the future, I plan to improve the quality of my app by using external recordings to interpret cries. I found this dataset that I plan to use to provide initial predictions while building up an individualized dataset for a given baby.

Integrating Multiple Cry Reasons

I did not realize this when I designed the app, but it is very common for babies to cry for multiple reasons at once. For example, a baby might need a diaper change and need to be fed at the same time. In the future, I would probably use discrete categories for cry reasons and allow users to select multiple reasons. I would then use a different algorithm to select the top one or two reasons that match a given cry.
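One simple way to do that selection, sketched below under my own assumptions (the post doesn't specify the algorithm): score each discrete reason, then keep up to two reasons that clear a confidence threshold. The category names and scores are made up for illustration.

```python
def top_reasons(scores: dict, k: int = 2, min_score: float = 0.3):
    """Pick up to k cry reasons whose scores clear a threshold,
    so one cry can map to e.g. both 'hungry' and 'wet diaper'."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [reason for reason, s in ranked[:k] if s >= min_score]

# Hypothetical per-category scores for one cry.
scores = {"hungry": 0.8, "wet diaper": 0.55, "tired": 0.2}
print(top_reasons(scores))  # ['hungry', 'wet diaper']
```

The threshold keeps the app from suggesting a second reason when the cry really only has one strong match.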

Integrating Additional Data Streams

As a parent responding to my baby's cries, much of my ability to soothe her comes from knowing what happened recently. I'll think of the last time she was fed or had her diaper changed, and if either event happened long ago, I'll consider it a likely cry reason. To make cry detection better, I would have people enter previous feeding and diaper-change times into the app to improve predictions.