Annotator: The Position You May Have Never Heard Of

Andhika S Pratama
Data Folks Indonesia
4 min read · May 30, 2020


You have probably never heard of this position

What comes to mind when you hear the word “Annotator”? … That’s right, not many people know. That is why I am writing this story, in the hope that more of you will get to know what a Data Annotator does.

It all started in early February of 2019. I was in my last semester, working on my undergraduate thesis. I was looking for an internship opportunity on a job portal when I stumbled upon a position at Warung Pintar called “Annotator Internship”.

Fortunately, the requirements for this job were a perfect fit for someone like me who majored in Language. That mattered because, in this digital era, those who majored in IT tend to have more opportunities.

Very excited about my first job, I applied. After going through the interview, I finally got the internship.

Long story short, I eventually became a full-time Annotator at Warung Pintar. M Jemmi Visgun was my mentor in annotation before he moved to another company.

What have I learned so far, and what exactly does an Annotator do? Well, this answer is based only on my personal experience as an Annotator at Warung Pintar.

⦿ Audio Recording

For Warung Pintar to develop Automatic Speech Recognition (ASR), the Machine Learning team needs recorded voices to improve their ASR model. My job is to record as many voices as possible and deliver them to the Machine Learning team. In the recordings, the talents order things from a minimarket ‘warung’, and we capture how they pronounce the brand, the quantity, and the unit of measurement. The talents use a script (there is a word script and a sentence script) that was created based on how Indonesian people pronounce those words. As for the tool, I use Audacity simply because it is free (lol) and easy to use.

⦿ Audio Labeling

Before the recorded audio is delivered to the ML team, it has to be labeled in a spreadsheet so that the engineer can cut it into several files based on the order of the words or sentences. Here is how it looks:

The hardest part of this process is matching the sound with the time template, as you can see in the “start” and “end” columns. The time template was created so that the audio labeling process can be done a lot faster.
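For readers curious how a spreadsheet like this can drive the cutting step, here is a minimal sketch in Python. It is not the script the engineers at Warung Pintar used. The column names `label`, `start`, and `end` (in seconds), the function names, and the output file naming are all assumptions made for illustration, and it only handles uncompressed WAV files through the standard library.

```python
import csv
import io
import wave
from pathlib import Path


def read_labels(csv_text):
    """Parse rows with the assumed columns: label, start, end (in seconds)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["label"], float(row["start"]), float(row["end"])) for row in reader]


def cut_wav(wav_path, labels, out_dir):
    """Write one WAV file per (label, start, end) row and return the new paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(wav_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for index, (label, start, end) in enumerate(labels):
            first = int(start * rate)          # first frame of the clip
            count = int(end * rate) - first    # number of frames in the clip
            src.setpos(first)
            frames = src.readframes(count)
            out_path = out_dir / f"{index:03d}_{label}.wav"
            with wave.open(str(out_path), "wb") as dst:
                # Same channels, sample width, and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

With a spreadsheet exported as CSV, something like `cut_wav("session.wav", read_labels(open("labels.csv").read()), "clips")` would produce one small file per labeled word or sentence (both file names here are hypothetical).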

⦿ Image Labeling

When I first got the task of labeling images, I had little to no knowledge of image labeling, what tools to use, or how to install them. I labeled the items displayed in the minimarket ‘warung’, and those labels are later used by the Machine Learning engineer to develop brand recognition. I mainly used the labelImg tool to label the images. There are many free tools besides labelImg, and they are listed here along with their GitHub links: https://www.datasetlist.com/tools/

Here is an example of my image labeling work:

Image labeling using labelimg
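To give an idea of what the engineer receives: by default, labelImg saves each image's boxes as a Pascal VOC XML file. The sketch below reads such a file with Python's standard library and lists the labeled boxes. The sample annotation, its label names, and the function name `read_boxes` are made up for illustration and are not taken from my actual work files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A made-up annotation in the Pascal VOC layout (hypothetical labels and file name).
SAMPLE = """<annotation>
  <filename>shelf_001.jpg</filename>
  <object>
    <name>brand_a</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>brand_b</name>
    <bndbox><xmin>120</xmin><ymin>25</ymin><xmax>200</xmax><ymax>210</ymax></bndbox>
  </object>
  <object>
    <name>brand_a</name>
    <bndbox><xmin>210</xmin><ymin>22</ymin><xmax>300</xmax><ymax>215</ymax></bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return (label, xmin, ymin, xmax, ymax) for every <object> in the file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return boxes


boxes = read_boxes(SAMPLE)
# Count how many boxes were drawn for each label.
label_counts = Counter(label for label, *_ in boxes)
```

A quick count like `label_counts` is also a handy way to spot labels that were mistyped during annotation.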

Besides those three tasks, I sometimes help the NLP engineer sort out or fix the dataset they want to use.

I enjoy my time as an Annotator because I have come to understand (though still only a little) Machine Learning and its use in this modern digital era.

Aaaand that’s about it! I hope this story was worth reading *smile*.

Hopefully, “Annotator” or “Data Annotator” will get more recognition, especially in Indonesia, as the use of Machine Learning increases, because for a machine to work autonomously, it first needs to learn.

Here is one more picture of me with my team:

RnD team (-IoT chapter)


Hi there! I’m currently a Data Annotator at Tictag.io with an interest in writing, such as Copywriting and UX Writing.