Step 1 of 7 — Welcome
Sound Comfort Study

Welcome to the prototype evaluation

This takes around 10–15 minutes. You'll set your sound preferences, listen to audio samples, see model predictions, then complete a short survey.

What happens
1. Set your preferences
Tag every top-level sound category as comfortable, neutral, or uncomfortable. You can be as specific as you like.
2. Rate audio clips
Listen to short clips and a longer recording, marking how each sounds to you.
3. See model predictions
The AI's 0–1 comfort/discomfort scores are revealed only after you've finished rating.
4. Complete the survey
Tell us how well the model matched your judgement and how usable the system feels.
Model predictions are hidden until after you complete all ratings.
Step 2 of 7

Your sound preferences

By default, all sounds are tagged as Neutral. Please specify which sounds make you feel comfortable or uncomfortable. Some categories belong inside others or overlap — for example, using Tag all on "Alarm" will also affect "Car alarm" in the "Vehicle → Motor vehicle (road) → Car" category.

Active rating mode
Click a mode to select it, then click any sound to tag it:
Category checklist — all must be tagged
Your profile so far
Comfortable
none yet
Neutral
none yet
Uncomfortable
none yet
Step 3 of 7

Rate the audio clips

Listen to each clip fully, then mark whether it sounds comfortable, neutral, or uncomfortable to you.

Listen before rating. Your honest impression matters — not what you think the model expects.
Step 4 of 7

Annotate the longer recording

Press play, select a label mode, then click and drag on the timeline to mark sections. Each section can have only one label — painting over an existing section replaces it.

Recording

Loading audio…

Step 5 of 7

Model predictions

The AI's comfort and discomfort scores are shown below, based on your preferences. Scores run from 0 to 1. Compare them with your own ratings, shown above each card.

Personalising model…
Step 6 of 7

Evaluation survey

Please answer based on what you just experienced.

Accuracy
1. The model's comfort scores matched how I actually experienced the sounds.
Strongly disagree – Strongly agree
2. The model's discomfort scores matched how I actually experienced the sounds.
Strongly disagree – Strongly agree
3. Were there clips where the model was clearly wrong? Which ones and why?
Preference expression
4. I was able to express my sound preferences accurately using the category system.
5. The sound categories were at the right level of detail.
Too coarse – Too granular
6. Were there sounds you care about that you couldn't find or express?
Usability
7. The overall interface was easy to navigate.
8. The timeline tool for marking the longer recording felt natural.
Usefulness & trust
9. I would find this system useful in real earbuds.
10. I would trust the system to automatically adjust sounds based on my preferences.
11. I would prefer to give feedback after each clip rather than setting preferences upfront.
i.e. reactive feedback vs. proactive preference setting
Personalisation
12. The model's predictions felt personalised to me specifically, not just generic.
13. I understood what the 0–1 scores meant.
14. I would be comfortable with this system running passively in the background without my input.
Open feedback
15. What would most improve the model's accuracy for you?
16. Any other comments?

Thank you!

Your responses have been recorded. We really appreciate your time and feedback.

If you would like to follow up on the results of this research, feel free to get in touch at sally.choi@student.uva.nl
