Review of the article "ViVo: Piano Learning Through Visualizing Vocalizations on a Lighted Keyboard" by Maya Caren

The paper introduces ViVo, a tool for instrument learning that combines vocalization and visualization, two methods that are already well researched but have so far been used separately. It has two modes: one that visualizes the user's real-time vocalizations by illuminating the corresponding piano keys, and one in which the user can practise with visualizations of pre-recorded vocalizations.

In the introduction, the author describes the technique of vocalizing, an important tool for learning instruments and for improving general musical sense as well as improvisation and composition skills. Its effectiveness can be attributed to the fact that vocalized melodies are memorized better than instrumental melodies. Despite this, vocalization is not used to its full potential in instrument learning environments. Visualization, on the other hand, is widely used in music education, guiding learners through real-time instructions and feedback. ViVo aims to combine both learning methods to create a more effective and intuitive tool.

The tool consists of a microphone that records the vocalizations, a processor and an electronic piano with illuminable keys. A separate section of the paper gives more detail on the hardware used.

The goals of ViVo are described as follows: portability, low material costs, effective real-time operation, pitch accuracy, pitch range and note onset/offset fidelity.

The author then describes the algorithms used for ViVo, starting with two pitch detection methods (the YIN algorithm and an FFT-based heuristic approach) that extract pitch from vocal input. While standard interpolation techniques (like parabolic fitting) are already known, ViVo adapts them specifically for singing. Its modified method focuses on a typical vocal pitch range (100–1000 Hz), calculates the average energy within that range, and then scans for the first three consecutive frequency bins exceeding this average. It fits a parabola to these bins to estimate the fundamental frequency (f₀), optionally smoothing the result with a filter. These adjustments improve robustness to noise and reduce common errors, such as detecting the wrong octave when harmonics are stronger than the fundamental frequency.
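To make the FFT-based heuristic easier to follow, here is a minimal sketch of the steps as I understood them from the paper. It is not the author's implementation: the function name estimate_f0, the Hann window, the direct DFT restricted to the vocal range, and the fallback and clamping for a badly shaped fit are all my own assumptions, and the optional smoothing filter is left out.

```python
import cmath
import math


def estimate_f0(samples, sample_rate, f_lo=100.0, f_hi=1000.0):
    """Estimate the fundamental frequency of one audio frame in Hz, or None."""
    n = len(samples)
    # Hann window to limit spectral leakage (assumption, not from the paper).
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(samples)]

    # Evaluate only the DFT bins that fall inside the vocal range.
    k_lo = math.ceil(f_lo * n / sample_rate)
    k_hi = math.floor(f_hi * n / sample_rate)
    energy = []
    for k in range(k_lo, k_hi + 1):
        step = -2j * math.pi * k / n
        x_k = sum(v * cmath.exp(step * i) for i, v in enumerate(windowed))
        energy.append(abs(x_k) ** 2)
    if not energy:
        return None

    mean_energy = sum(energy) / len(energy)

    # Scan upwards for the first three consecutive bins above the average.
    for j in range(len(energy) - 2):
        a, b, c = energy[j:j + 3]
        if min(a, b, c) > mean_energy:
            denom = a - 2 * b + c
            if denom < 0:
                # Vertex of the parabola through the three bins,
                # measured in bins relative to the middle one.
                offset = 0.5 * (a - c) / denom
                # Clamp so the estimate stays within the three bins (assumption).
                offset = max(-1.0, min(1.0, offset))
            else:
                # Fit has no maximum: fall back to the strongest bin (assumption).
                offset = (a, b, c).index(max(a, b, c)) - 1
            return (k_lo + j + 1 + offset) * sample_rate / n
    return None


# Synthetic test frame: a 220 Hz fundamental with a stronger 440 Hz harmonic.
sample_rate, n = 8000, 1024
frame = [0.6 * math.sin(2 * math.pi * 220 * i / sample_rate)
         + 1.0 * math.sin(2 * math.pi * 440 * i / sample_rate)
         for i in range(n)]
f0 = estimate_f0(frame, sample_rate)
```

Because the scan runs from low to high frequencies and stops at the first run of bins above the average, a weaker fundamental can still be picked before a stronger harmonic, which is how I read the claim about fewer octave errors. The synthetic frame above only illustrates that idea and says nothing about performance on real singing.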

To test whether ViVo fulfils the previously set goals, the author played a voice library through a speaker placed 70 cm from the microphone to simulate the expected conditions of use.

In the following sections, the methods for pitch detection, pitch range and note onset/offset are described in more detail. After this, the author gives an outlook on future work. Planned features include extending the detection system to recognize polyphonic melodies, which would make it possible to use recorded music instead of relying only on unison vocals. The use case will be extended to larger music education contexts such as classrooms. There will also be studies on how the tool can help with learning jazz improvisation. Another step mentioned is developing ViVo for other instruments such as the guitar. Finally, the author illustrates how the tool could be used in live performances.

The last part of the paper contains a section on ethical standards. The author declares that she used low-cost materials and open-source software, which she funded herself. She also states that there were no conflicts of interest and that all participation in the project was voluntary. This part is followed by the references.

I found the approach of combining vocalization and visualization as learning methods very interesting. The prototype shown in a picture seems a bit raw and unfinished, but I think it has potential to be developed further. I wonder if the idea of this tool can also be applied to instruments that do not have a visible scale, such as the trumpet, saxophone or French horn. In my own experience of learning the flute, trumpet, French horn and guitar, I never worked with visualization, but I sometimes used vocalization with my French horn teacher. I think it helps to establish a feeling for pitch and intervals and makes it easier to hit the right notes. This is especially crucial for brass players, since those instruments usually have only 3-4 keys, so the same key combinations are used for various notes, which have to be controlled through air pressure.

The paper itself was well structured, although I would have put some of the sections in a different order. In my opinion, it would make more sense to put the goals of the tool right after the introduction, which describes the concept of the tool. Other than that, I found it easy to understand the idea and how ViVo works.
