Hey, I'm Aditya, a founding engineer at Fennel.ai. I was previously a Tech Lead on Instagram Ads and Community Integrity at Facebook, and before that I worked on research teams at Google. Currently, I am responsible for building the core ML infrastructure at Fennel.ai.
I earned my bachelor's degree in Computer Science & Engineering from IIT Bombay and pursued a minor in Statistics. This provided me with a solid foundation in the mathematical principles underlying machine learning.
My journey into machine learning began during my time as a software engineer at Google. I initially worked on projects that used Information Retrieval techniques, but I became increasingly interested in the potential of machine learning to tackle complex challenges that traditional software engineering approaches couldn't solve.
Fortunately, I had the opportunity to switch to a team that was using machine learning to address similar problems. This team was at the forefront of developing the TTSN (Two-Tower Sparse Neural Network) architecture for retrieval, which provided me with my first exposure to machine learning. I was fascinated by the power of ML and the potential it held for transforming the way we solve problems.
From there, I moved on to become a Tech Lead at Meta, where I helped build the ML platform that used deep learning to validate the authenticity of identification documents, parse them, and match them to users' profiles. This was a challenging problem that required heavy use of computer vision and natural language processing. Later, I joined Instagram, where I led the development of ads on new surfaces such as Reels and Explore. This experience gave me the chance to apply my machine learning expertise to a new set of challenges and further deepen my understanding of the field.
As a founding engineer at a startup, my day-to-day work involves a broad range of responsibilities. I spend a significant amount of time building the core ML infrastructure that powers our platform. In addition to that, I work closely with our customers to understand their needs and ensure that our platform is meeting their requirements.
Some days are spent purely brainstorming a range of topics, such as go-to-market strategy, sales and marketing initiatives, building the product roadmap, or meeting potential customers and understanding their needs.
It's a lot of fun, since you get to wear multiple hats and learn a lot of new things, especially in areas I previously had zero exposure to. It has definitely been a very rewarding and fulfilling experience.
Working with business stakeholders to identify and define problems suited for machine learning requires a collaborative and iterative approach. Typically, I start by sitting down with product managers and engineering managers to understand the problem at hand and how it relates to the broader business objectives. From there, it's important to understand what the business metrics are and how they are measured. This helps us determine whether machine learning is the right approach for solving the problem, and if so, what modeling approach to use. In some cases, a simple rule-based solution may be more appropriate than a machine learning model. However, if machine learning is the right approach, we can then start to iterate on an initial solution and improve it over time.
To align machine learning projects with business objectives, it's essential to choose the right objective function and metrics that align with the business goals. This requires ongoing communication and collaboration between the technical team and the business stakeholders. It's important to ensure that everyone is aligned on the metrics and that we're tracking progress towards the desired outcomes. This helps to ensure that the machine learning projects are delivering value to the business and are aligned with its broader objectives.
When faced with a new, unfamiliar problem to solve with machine learning, I approach it in a structured and iterative manner, collaborating with different people depending on the problem.
Enabling rapid iteration on machine learning experiments and systems is critical to achieving success in this field. At many organizations, teams rely on a combination of technical infrastructure and processes to facilitate rapid iteration.
One key component of this infrastructure is a strong feature engineering platform. Feature engineering is a critical part of developing high-performing machine learning models, and a platform that enables real-time feature engineering helps teams save time, introduce fewer bugs, and iterate more quickly. For example, at Facebook we had an entire org devoted to this; in fact, they were rewriting the feature engineering platform as something called F3 (Feature Framework at Facebook).
We at Fennel.ai are also building a platform that enables teams to quickly build, test, and deploy new features in real time. Fennel.ai helps teams focus on the high-level aspects of machine learning and deliver value to their customers more quickly.
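To make the idea concrete, here is a generic sketch of one of the most common real-time features, a sliding-window count (e.g. "clicks by this user in the last hour"). This is an illustration in plain Python, not Fennel's actual API:

```python
from collections import defaultdict, deque

class RollingCountFeature:
    """Counts events per key over a sliding time window -- the kind of
    signal a real-time feature platform computes and serves."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> event timestamps

    def record(self, key, ts):
        self.events[key].append(ts)

    def value(self, key, now):
        q = self.events[key]
        # Evict timestamps that have fallen out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)

clicks = RollingCountFeature(window_seconds=3600)
clicks.record("user_42", ts=1000)
clicks.record("user_42", ts=2000)
clicks.record("user_42", ts=4000)
print(clicks.value("user_42", now=5000))  # 2: the ts=1000 event expired
```

A production platform does the same bookkeeping at scale, with persistence and consistent online/offline semantics, which is exactly the part that is tedious and bug-prone to build yourself.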
In addition to a strong feature engineering platform, organizations that enable rapid iteration on machine learning experiments and systems often invest in scalable training infrastructure to make it easier to train several models, and experiment tracking to help with reproducibility and collaboration.
For me, the most important things are speed of iteration and reproducibility, so I would introduce tools that help with those two. Beyond the categories mentioned above (a feature engineering platform, scalable training infrastructure, and experiment tracking), some good off-the-shelf examples are Weights & Biases, Metaflow, and MLflow.
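To show what experiment tracking buys you, here is a toy stand-in for what tools like MLflow or Weights & Biases provide: each run's parameters and metrics are logged to disk so results stay reproducible and comparable. The names below are illustrative only, not any real tool's API:

```python
import json
import time
import uuid
from pathlib import Path

class RunTracker:
    """Minimal experiment tracker: one directory per run, with params
    and metric history saved as JSON for later comparison."""

    def __init__(self, root="runs"):
        self.run_id = uuid.uuid4().hex[:8]
        self.dir = Path(root) / self.run_id
        self.dir.mkdir(parents=True, exist_ok=True)
        self.data = {"run_id": self.run_id, "start": time.time(),
                     "params": {}, "metrics": []}

    def log_param(self, key, value):
        self.data["params"][key] = value

    def log_metric(self, key, value, step):
        self.data["metrics"].append({"key": key, "value": value, "step": step})

    def finish(self):
        (self.dir / "run.json").write_text(json.dumps(self.data, indent=2))

tracker = RunTracker()
tracker.log_param("learning_rate", 0.01)
for step in range(3):
    tracker.log_metric("loss", 1.0 / (step + 1), step)
tracker.finish()
```

Real trackers add UI, artifact storage, and collaboration on top, but the core contract is this simple: every run leaves a complete, queryable record behind.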
I would also introduce tools that help with monitoring, logging, and alerting, since it is much easier to introduce bugs in machine learning systems than in traditional software systems. There are a bunch of useful tools now on the market, such as Arize AI or whylogs.
During the development cycle, I have always found it helpful to maintain a running doc detailing the goals, why we are solving the problem, the steps taken so far, ideas that have been discussed or discarded, and the progress made. A good paper trail has proven immensely useful throughout my career.
I think the best way to quantify the impact of your work is to look at business metrics. For example, if you are working on a ranking problem, you can look at metrics such as CTR and CPC.
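These metrics are simple ratios; the numbers below are made up purely for illustration:

```python
def ctr(clicks, impressions):
    """Click-through rate: clicks per impression."""
    return clicks / impressions if impressions else 0.0

def cpc(spend, clicks):
    """Cost per click: total spend divided by clicks."""
    return spend / clicks if clicks else 0.0

print(ctr(50, 10_000))  # 0.005 -> a 0.5% CTR
print(cpc(120.0, 50))   # 2.4  -> $2.40 per click
```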
The largest impact I have made is helping build the Identity ML platform at Meta. This platform used deep learning to validate the authenticity of identification documents, parse and extract information from them, and match them to the user's profile, which helped reduce fraud and improve the user experience. We broke the problem down into multiple ML problems and built a system that correctly validated and parsed the documents 75% of the time and matched them to the user's profile 90% of the time. This was a huge improvement over the previous system, and I was invited to present the work at the Integrity All Hands at Meta.
Monitoring the performance of machine learning models in production is crucial to ensure that they continue to perform as expected over time. There are several key areas to focus on when monitoring ML projects in production.
First, it's important to monitor the data pipelines that generate the training data. This helps to detect and correct any issues with the data that could lead to data skew and negatively impact model performance. Second, monitoring the performance of the model itself is essential. This can be achieved by tracking a range of performance metrics and displaying them on a dashboard. At Facebook, we used a tool called Unidash to display multiple metrics in a single dashboard, which made it easy to monitor model performance over time. Finally, it's important to set up alerts based on the performance metrics being monitored. However, it's crucial to avoid optimizing solely for recall, as this can lead to too many false alerts that are ultimately ignored. Instead, it's important to strike a balance between detecting real issues and minimizing false alarms.
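One common way to quantify data skew between a training-time feature sample and what the model sees in production is the Population Stability Index (PSI). Here is a minimal pure-Python sketch (not the tooling we used at Facebook), with the usual rough interpretation thresholds noted in the docstring:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample ('expected',
    e.g. training data) and a production sample ('actual').
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def frac(sample, i):
        # Outer bins are open-ended so no value falls outside all bins.
        left = lo + i * width if i > 0 else float("-inf")
        right = lo + (i + 1) * width if i < bins - 1 else float("inf")
        n = sum(left <= x < right for x in sample)
        return max(n / len(sample), 1e-6)  # clamp to avoid log(0)

    return sum(
        (frac(actual, i) - frac(expected, i))
        * math.log(frac(actual, i) / frac(expected, i))
        for i in range(bins)
    )

train_sample = [float(x) for x in range(100)]
prod_sample = [float(x) + 50 for x in range(100)]
print(psi(train_sample, train_sample))         # 0.0 -- no drift
print(psi(train_sample, prod_sample) > 0.25)   # True -- significant drift
```

An alert wired to a threshold like PSI > 0.25 per feature is one concrete way to catch the data skew described above before it silently degrades the model.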
In terms of updating pipelines or retraining models, this can be a manual or an automatic process depending on the specific project and its requirements. For example, in Instagram Ads we used online learning to update the models in production, but this came with a lot of overhead, and we had to build a lot of infrastructure to support it.
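The core idea of online learning can be sketched as a model that takes one gradient step per incoming example instead of retraining in batches. Here is a toy logistic-regression version (illustrative only, not the production setup we used at Instagram):

```python
import math

class OnlineLogisticModel:
    """Logistic regression updated one example at a time -- the basic
    mechanism behind online learning for production models."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One SGD step on the log loss for a single (x, y) example."""
        err = self.predict(x) - y  # gradient of log loss w.r.t. the logit
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi
        self.b -= self.lr * err

model = OnlineLogisticModel(n_features=2)
# Simulated event stream: label is 1 when the first feature is positive.
stream = [([1.0, 0.2], 1), ([-1.0, 0.1], 0),
          ([0.8, -0.3], 1), ([-0.9, 0.0], 0)] * 50
for x, y in stream:
    model.update(x, y)
print(model.predict([1.0, 0.0]) > 0.5)   # True
print(model.predict([-1.0, 0.0]) < 0.5)  # True
```

The overhead in production comes from everything around this loop: streaming labeled data reliably, guarding against feedback loops and bad updates, and checkpointing model state.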
Effective application of machine learning requires a combination of technical skills and personal traits. Here are some skills and traits that I believe contribute to success in this field:
Some lessons I have learnt over the years are:
Staying up-to-date with the latest developments in machine learning is essential for continuous learning and professional growth. To this end, I leverage various resources to learn from, including Hacker News, the Hacker Newsletter, and Twitter. These platforms provide valuable insights and discussions on the latest trends and best practices in machine learning.
In addition to these online resources, I also read engineering blogs from companies that I admire. This helps me understand how other organizations are leveraging machine learning to solve complex problems and improve their products and services.