Aditya Nambiar - Founding Eng @ Fennel / Ex-Tech Lead @ Meta

Learn more about Aditya at LinkedIn and Twitter.

Please share a bit about yourself: your current role, where you work, and what you do?

Hey, I'm Aditya, founding engineer at Fennel.ai. I was previously a Tech Lead on Instagram Ads and Community Integrity at Facebook, and prior to that I worked on research teams at Google. Currently, I am responsible for building the core ML infrastructure at Fennel.ai.

What was your path towards working with machine learning? What factors helped along the way?

I earned my bachelor's degree in Computer Science & Engineering from IIT Bombay and pursued a minor in Statistics. This provided me with a solid foundation in the mathematical principles underlying machine learning.

My journey towards working with machine learning began during my time as a software engineer at Google. I was initially working on projects that used Information Retrieval techniques to solve problems, but I became increasingly interested in the potential of machine learning to tackle complex challenges that traditional software engineering approaches couldn't solve.

Fortunately, I had the opportunity to switch to a team that was using machine learning to address similar problems. This team was at the forefront of developing the TTSN (Two Tower Sparse Neural Networks) architecture for retrieval, which provided me with my first exposure to machine learning. I was fascinated by the power of ML and the potential it held for transforming the way we solve problems.
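For readers unfamiliar with the idea, here is a minimal, hypothetical sketch of a two-tower retrieval model in PyTorch (illustrative only, not Meta's actual TTSN implementation): a query tower and an item tower each pool sparse feature ids into an embedding, and relevance is scored with the dot product of the two embeddings.

    import torch
    import torch.nn as nn

    class Tower(nn.Module):
        """One tower: pools sparse feature ids into a dense embedding."""
        def __init__(self, vocab_size, embed_dim=64, out_dim=32):
            super().__init__()
            self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
            self.proj = nn.Sequential(
                nn.Linear(embed_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim)
            )

        def forward(self, ids):
            return self.proj(self.embedding(ids))

    class TwoTower(nn.Module):
        """Query tower and item tower; relevance = dot product of the two embeddings."""
        def __init__(self, query_vocab, item_vocab):
            super().__init__()
            self.query_tower = Tower(query_vocab)
            self.item_tower = Tower(item_vocab)

        def forward(self, query_ids, item_ids):
            q = self.query_tower(query_ids)  # (batch, out_dim)
            v = self.item_tower(item_ids)    # (batch, out_dim)
            return (q * v).sum(dim=-1)       # one score per (query, item) pair

    # Toy usage: score 4 (query, item) pairs, each with 5 random sparse feature ids.
    model = TwoTower(query_vocab=1000, item_vocab=5000)
    queries = torch.randint(0, 1000, (4, 5))
    items = torch.randint(0, 5000, (4, 5))
    print(model(queries, items).shape)  # torch.Size([4])

In production retrieval, item embeddings are usually precomputed and candidates are fetched with approximate nearest-neighbor search; that part is omitted here.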

From there, I moved on to become a Tech Lead at Meta, where I helped build the ML platform that used deep learning to validate the authenticity of identification documents, parse them, and match them to users' profiles. This was a challenging problem that required heavy use of computer vision and natural language processing. Later on, I joined Instagram, where I led the development of ads on new surfaces such as Reels and Explore. This experience gave me the chance to apply my machine learning expertise to a new set of challenges and further deepen my understanding of the field.

How do you spend your time day-to-day?

As a founding engineer at a startup, my day-to-day work involves a broad range of responsibilities. I spend a significant amount of time building the core ML infrastructure that powers our platform. In addition to that, I work closely with our customers to understand their needs and ensure that our platform is meeting their requirements.

Some days are spent purely brainstorming a range of topics such as go-to-market strategy, sales and marketing initiatives, and the product roadmap, or just meeting potential customers and understanding their needs.

It's a lot of fun since you get to wear multiple hats and learn a lot of new things, especially around topics that I previously had zero exposure to. It has definitely been a very rewarding and fulfilling experience.

How do you work with business to identify and define problems suited for machine learning? How do you align ML projects with business objectives?

Working with business stakeholders to identify and define problems suited for machine learning requires a collaborative and iterative approach. Typically, I start by sitting down with product managers and engineering managers to understand the problem at hand and how it relates to the broader business objectives. From there, it's important to understand what the business metrics are and how they are measured. This helps us determine whether machine learning is the right approach for solving the problem, and if so, what modeling approach to use. In some cases, a simple rule-based solution may be more appropriate than a machine learning model. However, if machine learning is the right approach, we can then start to iterate on an initial solution and improve it over time.

To align machine learning projects with business objectives, it's essential to choose the right objective function and metrics that align with the business goals. This requires ongoing communication and collaboration between the technical team and the business stakeholders. It's important to ensure that everyone is aligned on the metrics and that we're tracking progress towards the desired outcomes. This helps to ensure that the machine learning projects are delivering value to the business and are aligned with its broader objectives.

Imagine you're given a new, unfamiliar problem to solve with machine learning. How would you approach it?

When faced with a new, unfamiliar problem to solve with machine learning, I would approach it in a structured and iterative manner. Here's the process I would follow:

  • Understand the problem: I would start by gaining a deep understanding of the problem at hand, including its context, scope, and requirements. This would involve working closely with stakeholders to understand their needs, as well as conducting research on similar problems that have been solved with machine learning.
  • Understand the data: Once I have a clear understanding of the problem, I would dive into the data to understand its structure, quality, and availability. This would involve data exploration, cleaning, and preprocessing, as well as feature engineering if necessary.
  • Define the metrics: Next, I would work with stakeholders to define the business metrics that will be used to measure the success of the machine learning solution. This would ensure that the solution is aligned with business goals and objectives.
  • Build a baseline solution: If there is no existing solution, I would start by building a simple rule-based solution to solve the problem. This would provide a baseline against which to compare the performance of the machine learning solution.
  • Build and evaluate the model: I would then build a simple machine learning model to solve the problem and compare its performance against the baseline solution (see the sketch after this list). If the machine learning model performs better, I would then iterate on the model to improve its performance.
  • Iterate and refine: Finally, I would iterate on the model to improve its performance, incorporating feedback from stakeholders and fine-tuning the model as necessary. This process would continue until the model meets the desired performance metrics and is ready for deployment.
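To make the baseline and evaluation steps concrete, here is a minimal sketch; the data, the hand-written rule, and the choice of AUC as the metric are all hypothetical stand-ins. The key point is that the rule-based baseline and the model are scored with the same metric on the same held-out set.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Hypothetical tabular data: two features and a binary label.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 2))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=5000) > 0).astype(int)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Rule-based baseline: "predict positive when feature 0 is above its training mean".
    baseline_scores = (X_test[:, 0] > X_train[:, 0].mean()).astype(float)
    print("baseline AUC:", roc_auc_score(y_test, baseline_scores))

    # Simple ML model, evaluated on the same metric and the same held-out data.
    model = LogisticRegression().fit(X_train, y_train)
    model_scores = model.predict_proba(X_test)[:, 1]
    print("model AUC:", roc_auc_score(y_test, model_scores))

    # Only invest in iterating on the model if it beats the baseline by a meaningful margin.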

Designing, building, and operating ML systems is a big effort. Who do you collaborate with? How do you scale yourself?

Depending on the problem, I would collaborate with different people:

  • Research Teams: If the problem is hard enough that it could take several months to solve, I would collaborate with research teams on it.
  • Data Engineers: They help source the data and build the pipelines that deliver it.
  • Labeling/Operations Team: They help with labeling the data and building the infrastructure to do so.
  • Product Managers: They help with understanding the issue and the business metrics.
  • Platform Engineers: They help with building and deploying the infrastructure to serve the models.

How does your organization or team enable rapid iteration on machine learning experiments and systems?

Enabling rapid iteration on machine learning experiments and systems is critical to achieving success in this field. At many organizations, teams rely on a combination of technical infrastructure and processes to facilitate rapid iteration.

One key component of this infrastructure is a strong feature engineering platform. Feature engineering is a critical component of developing high-performing machine learning models, and having a platform that enables real-time feature engineering can help teams save time, introduce fewer bugs, and iterate more quickly. For example, at Facebook we had an entire org devoted to this; in fact, they were rewriting the feature engineering platform into something called F3 (Feature Framework at Facebook).

At Fennel.ai, we are also building a platform that enables teams to quickly build, test, and deploy new features in real time. It helps teams focus on the high-level aspects of machine learning and deliver value to their customers more quickly.
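As a rough illustration (this is generic pandas, not Fennel's actual API), here is the kind of feature such a platform manages: a windowed aggregate that must stay fresh and point-in-time correct. The event log, column names, and window size are all hypothetical.

    import pandas as pd

    # Hypothetical click-event log; a real-time feature platform would keep features
    # like this fresh as events stream in, instead of recomputing them in batch.
    events = pd.DataFrame({
        "user_id": [1, 1, 2, 1, 2],
        "ts": pd.to_datetime([
            "2024-01-01 10:00", "2024-01-03 12:00", "2024-01-04 09:00",
            "2024-01-06 18:00", "2024-01-07 08:00",
        ]),
    })

    def clicks_last_7d(events, user_id, as_of):
        """Feature: number of clicks by `user_id` in the 7 days before `as_of`.

        Using only events strictly before `as_of` keeps the feature point-in-time
        correct, so no future information leaks into training data.
        """
        window = events[
            (events["user_id"] == user_id)
            & (events["ts"] < as_of)
            & (events["ts"] >= as_of - pd.Timedelta(days=7))
        ]
        return len(window)

    print(clicks_last_7d(events, user_id=1, as_of=pd.Timestamp("2024-01-07")))  # 3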

In addition to a strong feature engineering platform, organizations that enable rapid iteration on machine learning experiments and systems often invest in scalable training infrastructure to make it easier to train several models, and experiment tracking to help with reproducibility and collaboration.

What processes, tools, or artifacts have you found helpful in the machine learning lifecycle? What would you introduce if you joined a new team?

For me, the most important things are speed of iteration and reproducibility, so I would introduce tools that help with those two things. That includes the tools mentioned above: a feature engineering platform, scalable training infrastructure, and experiment tracking, with good examples being Weights & Biases, Metaflow, or MLflow.
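As an example of the experiment-tracking piece, here is a minimal sketch using MLflow, one of the tools mentioned above; the run name, parameters, and metric values are made up.

    import mlflow

    # Record the hyperparameters and evaluation metrics of a single run so it can be
    # compared against other runs and reproduced later.
    with mlflow.start_run(run_name="baseline-logreg"):
        mlflow.log_param("model_type", "logistic_regression")
        mlflow.log_param("learning_rate", 0.01)
        mlflow.log_metric("val_auc", 0.91)
        mlflow.log_metric("val_logloss", 0.34)
        # Plots, data snapshots, or the serialized model can also be attached, e.g.:
        # mlflow.log_artifact("roc_curve.png")

Weights & Biases and Metaflow cover similar ground with their own APIs.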

I would also introduce tools that help with monitoring, logging, and alerting, since it is much easier to introduce bugs in machine learning systems than in traditional software systems. There are a bunch of useful tools now on the market, such as Arize AI or whylogs.
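As an illustration of the kind of check these tools automate, here is a small, self-contained sketch that flags feature drift by comparing a production distribution against the training distribution with a population stability index; the distributions and the 0.2 alert threshold are assumptions for the example, not values from any specific tool.

    import numpy as np

    def population_stability_index(expected, actual, bins=10):
        """Rough drift score between a training-time and a production distribution."""
        edges = np.histogram_bin_edges(expected, bins=bins)
        exp_counts, _ = np.histogram(expected, bins=edges)
        act_counts, _ = np.histogram(actual, bins=edges)
        # Convert counts to proportions, with a small epsilon to avoid log(0).
        exp_frac = exp_counts / exp_counts.sum() + 1e-6
        act_frac = act_counts / act_counts.sum() + 1e-6
        return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

    rng = np.random.default_rng(0)
    train_feature = rng.normal(0.0, 1.0, size=10_000)  # distribution at training time
    prod_feature = rng.normal(0.6, 1.0, size=10_000)   # shifted distribution in production

    psi = population_stability_index(train_feature, prod_feature)
    if psi > 0.2:  # rule-of-thumb threshold; tune per feature
        print(f"ALERT: feature drift detected (PSI={psi:.2f})")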

During the development cycle, I have always found it helpful to maintain a running doc detailing the goals, why we are solving the problem, the steps that have been taken, ideas that have been discussed or discarded, and the progress that has been made. A good paper trail has proven immensely useful throughout my career.

How do you quantify the impact of your work? What was the greatest impact you made?

I think the best way to quantify the impact of your work is to look at the business metrics. For example, if you are working on a ranking problem, you can look at metrics such as CTR, CPC, etc.

The largest impact I have made was helping build the Identity ML platform at Meta. This platform used deep learning to validate the authenticity of identification documents, parse and extract information from them, and match them to the user's profile. This helped to reduce fraud and improve the user experience. We were able to break down the problem into multiple ML problems and build a system that correctly validated and parsed the documents 75% of the time and matched them to the user's profile 90% of the time. This was a huge improvement over the previous system, and I was invited to present the work at the Integrity All Hands at Meta.

After shipping your ML project, how do you monitor performance in production? Did you have to update pipelines or retrain models—how manual or automatic was this?

Monitoring the performance of machine learning models in production is crucial to ensure that they continue to perform as expected over time. There are several key areas to focus on when monitoring ML projects in production.

First, it's important to monitor the data pipelines that generate the training data. This helps to detect and correct any issues with the data that could lead to data skew and negatively impact model performance.

Second, monitoring the performance of the model itself is essential. This can be achieved by tracking a range of performance metrics and displaying them on a dashboard. At Facebook, we used a tool called Unidash to display multiple metrics in a single dashboard, which made it easy to monitor model performance over time.

Finally, it's important to set up alerts based on the performance metrics being monitored. However, it's crucial to avoid optimizing solely for recall, as this can lead to too many false alerts that are ultimately ignored. Instead, it's important to strike a balance between detecting real issues and minimizing false alarms.
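As a concrete illustration of that last point, here is a small, hypothetical sketch (not Unidash or any Meta-internal tooling) of one way to cut down on false alarms: only fire when a metric breaches its threshold for several consecutive windows. The 0.80 AUC threshold and the hourly readings are made-up numbers.

    from collections import deque

    class MetricAlert:
        """Fire only when a metric breaches its threshold for N consecutive windows,
        trading a little detection latency for far fewer false alarms."""

        def __init__(self, threshold, consecutive=3):
            self.threshold = threshold
            self.recent = deque(maxlen=consecutive)

        def observe(self, value):
            self.recent.append(value < self.threshold)  # True = breach (metric too low)
            return len(self.recent) == self.recent.maxlen and all(self.recent)

    # Hypothetical hourly AUC readings for a deployed ranking model.
    alert = MetricAlert(threshold=0.80, consecutive=3)
    for hour, auc in enumerate([0.84, 0.79, 0.83, 0.78, 0.77, 0.76]):
        if alert.observe(auc):
            print(f"hour {hour}: sustained AUC drop, paging on-call")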

In terms of updating pipelines or retraining models, this can be a manual or automatic process depending on the specific project and its requirements. For example, in Instagram Ads we used online learning to update the models in production, but this came with a lot of overhead and we had to build a lot of infrastructure to support it.
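For illustration only, here is a minimal sketch of the online-learning idea using scikit-learn's partial_fit on synthetic data; the real Instagram Ads setup involved far more infrastructure around data freshness, labeling delay, and model rollout.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)

    def next_batch(n=256):
        """Stand-in for a stream of freshly labeled examples (e.g. impressions and clicks)."""
        X = rng.normal(size=(n, 5))
        y = (X[:, 0] + 0.3 * X[:, 1] > 0).astype(int)
        return X, y

    # A model that supports incremental updates; each new batch of labels updates the
    # live model in place instead of triggering a full retrain.
    model = SGDClassifier(loss="log_loss", random_state=0)
    for step in range(10):
        X_batch, y_batch = next_batch()
        model.partial_fit(X_batch, y_batch, classes=np.array([0, 1]) if step == 0 else None)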

Think of people who are able to apply ML effectively–what skills or traits do you think contributed to that?

Effective application of machine learning requires a combination of technical skills and personal traits. Here are some skills and traits that I believe contribute to success in this field:

  • Strong technical skills: Machine learning is a highly technical field, and proficiency in programming, data analysis, and mathematics is essential.
  • Curiosity: Machine learning is a rapidly evolving field, and it's important to stay up-to-date with the latest research and developments.
  • Creativity: Machine learning is often used to solve complex problems, and individuals who are able to think creatively and approach problems from multiple angles are more likely to find effective solutions.

Do you have any lessons or advice about applying ML that's especially helpful? Anything that you didn't learn at school or via a book (i.e., only at work)?

Some lessons I have learnt over the years are:

  • Modelling is generally the easiest part of the problem. The hardest part is getting the data and the infrastructure right. So focus on that.
  • Have strong data quality checks in place. You might want to look into using something like Great Expectations. This will save you a lot of time and effort in the long run.
  • Converting offline gains to online gains is hard; there are a ton of reasons why your offline gains might not translate to online gains. A blog post where we talk about this in detail is here.
  • Be mindful of how you create your test and train split; it is easy to leak information from the test set to the train set (see the sketch after this list).
  • Track your experiments and be able to reproduce them. This is a skill that is often overlooked but is very important.
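As a concrete illustration of the train/test split point above, here is a minimal sketch with hypothetical data: splitting on a time cutoff instead of randomly keeps future information out of the training set.

    import pandas as pd

    # Hypothetical event-level dataset with one timestamp per example.
    df = pd.DataFrame({
        "ts": pd.date_range("2024-01-01", periods=10, freq="D"),
        "feature": range(10),
        "label": [0, 1, 0, 1, 1, 0, 0, 1, 0, 1],
    })

    # Split on time rather than randomly: a random split mixes future and past examples
    # and can leak information the model would never have at prediction time.
    cutoff = pd.Timestamp("2024-01-08")
    train = df[df["ts"] < cutoff]
    test = df[df["ts"] >= cutoff]
    print(len(train), len(test))  # 7 3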

How do you learn continuously? What are some resources or role models that you've learned from?

Staying up-to-date with the latest developments in machine learning is essential for continuous learning and professional growth. To this end, I leverage various resources to learn from, including Hacker News, the Hacker Newsletter, and Twitter. These platforms provide valuable insights and discussions on the latest trends and best practices in machine learning.

In addition to these online resources, I also read engineering blogs from companies that I admire. This helps me understand how other organizations are leveraging machine learning to solve complex problems and improve their products and services.


