ApplyingML


Alexey Grigorev - Principal Data Scientist @ OLX Group

Learn more about Alexey on his Twitter and LinkedIn, as well as the community he runs at DataTalks.Club.

Please share a bit about yourself: your current role, where you work, and what you do?

I work as a principal data scientist at OLX Group and I lead a small team of two people. Our main focus right now is helping our data scientists be more effective. Mostly, it’s about solving engineering challenges that come with model deployment.

I also go into other areas of the process — from problem definition to model evaluation. One of the projects I’m doing now is standardizing how we productionize machine learning in our data science department.

After work, I run DataTalks.Club — a community of people who love data. We have weekly events and amazing discussions in our Slack.

What was your path towards working with machine learning? What factors helped along the way?

I started my career as a Java developer. I worked at a bank and my colleagues told me about this new exciting course on Coursera by Andrew Ng. Then I took more courses and eventually did a master's in business intelligence. At the same time, I was freelancing, doing machine learning projects in Java. That and my master's thesis helped me build a good portfolio of projects, which was enough to get my first data science job.

At my first job, my colleague convinced me to try Kaggle. It was a competition about finding the correct answers to a set of multiple-choice questions. I failed miserably in that competition, but I also learned a lot. Most importantly, I learned that all the theoretical knowledge I had from my master's and online courses was quite useless for applied machine learning problems. I took part in more competitions and this is when I really learned machine learning.

After some time, I joined a startup. In a startup, there’s always more work than people. There, I would do everything: work on the roadmap, set up data pipelines, write scrapers, and buy groceries. It was an amazing experience and I realized that being a generalist is more interesting for me than being a specialist in one particular area. When I joined OLX, I saw that many of my colleagues don’t like deploying machine learning projects; they’d rather focus on modelling. But I liked the deployment part, so I started helping my colleagues with that from my first days.

Now I work as a principal data scientist on a variety of things, ranging from identifying the most impactful projects to unifying how we do machine learning across the organization.

How do you spend your time day-to-day?

I do a number of things:

  • Coordinating work
  • Mentoring
  • Creating courses and tutorials
  • Writing documentation
  • Creating proofs of concept
  • Identifying inefficiencies in our work and trying to address them
  • Identifying things that work well and scaling them to other projects

I spend most of my day in meetings.

Imagine you're given a new, unfamiliar problem to solve with machine learning. How would you approach it?

  • Spend a lot of time talking to stakeholders and subject matter experts. Figure out the problem we are trying to model and solve. How is it solved now? What kind of data is generated? Do we capture this data properly?
  • Define KPIs: understand how to measure the success of this project.
  • Come up with a baseline that doesn’t involve any machine learning. Get feedback from the stakeholders on whether we’re moving in the right direction. If they are satisfied, we can already go ahead and deploy this baseline.
  • Gradually increase the complexity of the solution. Start with simple linear or tree-based models. Favor explainability over performance. Keep the stakeholders in the loop.

It’s also important to document all these steps. I like creating a “project journal”: a document that contains everything related to a project, including the problem description, the KPIs, and notes from meetings (who was there, the decisions, and the next steps).

How does your organization or team enable rapid iteration on machine learning experiments and systems?

  • We have a set of guidelines for creating new solutions, so we can start a new project quickly.
  • We have a data catalog to understand what data is available.
  • We have an internal tool based on Airflow that makes it easy to schedule DAGs with data preparation, model training and model scoring.
  • Also, there’s a tool for A/B tests that we can use to evaluate the impact of our solutions on the product.

How do you quantify the impact of your work? What was the greatest impact you made?

Experimentation is probably the best way of doing it. Usually we do it with A/B tests.

After shipping your ML project, how do you operate and maintain it sustainably?

We usually have monitoring and on-call for important projects.

Think of people who are able to apply ML effectively. What skills or traits do you think contributed to that?

They can do things end-to-end. Also, they are good communicators and can convince others to help them.

Do you have any lessons or advice about applying ML that's especially helpful? Anything that you didn't learn at school or via a book (i.e., only at work)?

A successful machine learning project involves a lot of talking. At the beginning you need to understand the problem well. Once you have a model, you need to explain how it works. If others don’t understand how it works — they won’t trust your solution. Finally, when your solution is ready, you also need to convince others to use it. That involves a lot of talking as well.

Another thing school didn’t emphasise enough was the importance of setting up a cross-validation framework. I think this is the most important machine learning skill. You can answer almost any modelling question by setting up a cross-validation framework and then experimenting.

For example:

  • Is XGBoost better than logistic regression?
  • Do I need to apply log transformation to this feature?
  • Is this feature useful?
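
As a rough illustration of the idea (not the setup Alexey describes), the sketch below uses a small k-fold cross-validation harness to answer the second question on synthetic data. Everything in it is an assumption made for the example: the data is generated so that the target depends on the logarithm of the feature, the model is a one-feature least-squares line, and the helper names k_fold_indices, fit_line and cross_val_rmse are hypothetical.

```python
import math
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 once and split them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit on one feature; returns a predict function."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cross_val_rmse(xs, ys, transform, k=5, seed=0):
    """Mean held-out RMSE over k folds for a given feature transform."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training folds, score only on the held-out fold.
        predict = fit_line([transform(xs[i]) for i in train],
                           [ys[i] for i in train])
        errors = [(predict(transform(xs[i])) - ys[i]) ** 2 for i in held_out]
        scores.append(math.sqrt(statistics.fmean(errors)))
    return statistics.fmean(scores)


# Synthetic data (assumption): the target depends on log(feature) plus noise.
rng = random.Random(42)
xs = [rng.uniform(1, 1000) for _ in range(200)]
ys = [3.0 * math.log(x) + rng.gauss(0, 0.5) for x in xs]

# Same folds, same model, one change: the experiment answers the question.
rmse_raw = cross_val_rmse(xs, ys, transform=lambda x: x)
rmse_log = cross_val_rmse(xs, ys, transform=math.log)
print(f"raw feature RMSE: {rmse_raw:.3f}")
print(f"log feature RMSE: {rmse_log:.3f}")
```

The point of the design is that both variants are scored on identical folds, so the only difference between the two numbers is the change being tested. The same harness could compare two models or a feature set with and without one feature.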

How do you learn continuously? What are some resources or role models that you've learned from?

I often do “just-in-time learning” (this is what Eugene called it during our interview). When I don’t know how to solve something, I start digging in: I do a lot of googling, read blogs and papers, and eventually find the solution. This type of learning works best for me because I focus on the problem and learn by solving it.

Also, I have access to Udemy at work, and I like watching courses there. I’ve finished courses on product and project management, web development, cloud services, vector graphics, marketing, copywriting, public speaking, and many other areas. When possible, I try to use these skills, because if I don’t, I forget the content of the course within a month.



© Eugene Yan 2024