ApplyingML


Shreya Jain - Lead Data Scientist @ Bidgely

Learn more about Shreya on her LinkedIn and Medium.

Please share a bit about yourself: your current role, where you work, and what you do?

I work as a Lead Data Scientist at Bidgely, a clean energy-focused, utility-based B2B2C startup. Here, I work on products related to recommendation engines, electric vehicles, and other IP initiatives based on computer vision and machine learning. As a side hustle, I write articles on data science in business.

What was your path towards working with machine learning? What factors helped along the way?

I started out as a software engineer in the cloud applications division for the Photo Gallery app at Samsung. In retrospect, computer vision was an indispensable part of designing the gallery application, and that is when I started working directly on deep learning-related POCs while taking online courses on the side: CS231n (CNNs) and deeplearning.ai (Andrew Ng), to name a few.

Though the learning was satisfying, as I was developing cool projects based on GANs, VAEs, etc., I started seeing gaps in my understanding due to missing machine learning fundamentals. This led me to move to an early-stage advertising tech start-up and learn linear algebra, optimization, and machine learning basics from scratch (books: ESL, Pattern Recognition; courses: Linear Algebra by Gilbert Strang). Across the many diverse projects I completed over three years with the start-up, I worked on algorithms ranging from statistics and NLP to deep learning.

Factors that helped along the way:

  • Discussions with colleagues, along with working on side projects with specific objectives, always helped me gain a deeper understanding of the algorithms.
  • Referring back to and clarifying business objectives during the development stages of the algorithms helped me work out solutions that were practical rather than merely ideal.

How do you spend your time day-to-day?

I start my day by checking on major updates in the start-up world and the finance section. There are a lot of high-quality newsletters and Medium articles that give a daily dose of insight. I divide the day into 3 parts:

IC work: Projects/tasks that have little dependency on other folks. This could include working on an individual project, improving an existing algorithm, researching state-of-the-art solutions, writing research papers, documentation, preparing a presentation, etc.

Scheduled meetings: Daily sync-up, weekly updates, product meetings, project presentations, client meetings, etc.

Collaborations: Meetings that involve brainstorming with the team, discussions on improvements in algorithms/pipelines, streamlining tasks, timeline discussions, etc.

How do you work with business to identify and define problems suited for machine learning? How do you align ML projects with business objectives?

Aligning business objectives with ML projects is one of the most important tasks and should be thought through early on. This helps in deciding the primary feature set and evaluation metrics, and vastly aids in streamlining efforts in the right direction. For this, I discuss the business objectives in detail with the stakeholders, with respect to the short-term and long-term vision of the company.

For instance, take a project to devise a new pricing strategy using ML. For this, one needs to understand the demand-supply dynamics of how sales work in your industry and your start-up. New information unfolds in these discussions, such as what percentage of the process should be automated and the margin of error the sales team is willing to risk, and it should feed directly into the ML model.

In all, a deep understanding of the industry you're working in helps you ask the business team the right questions, and embedding that information in your algorithm makes all the difference.

Machine learning systems can be several steps removed from users, relative to product and UI. How do you maintain empathy with your end-users?

Even before beginning the ML work of a project, one should spend some time on customer discovery. This information is usually procured from the product team. As an ML practitioner, you should have a solid understanding of the top 3 pain points you're trying to alleviate for the end-user. The difficult job is to embed that understanding in your algorithm. For example:

  • If end-users are tired of the generic insights they get from your product, invest time in making the recommendation engine more personalized.
  • If end-users don't want to go through an arduous 10-step process, invest in automation to simplify it.

Imagine you're given a new, unfamiliar problem to solve with machine learning. How would you approach it?

  • Understand the objective of the problem:
    • Aids in feature set determination (addition or removal of certain features)
    • Guides the choice of algorithm: offline vs. run-time, scalable on distributed setups or not, expected performance, etc.
    • Helps set evaluation metrics
  • Research existing solutions: Given the product constraints, research the advantages and drawbacks of existing solutions.
  • Brainstorm with the team to get new ideas.
  • Conduct a POC with a set of potential algorithms on a sample set and assess the benefits and drawbacks (see the sketch after this list).
  • Start experimentation at scale.
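
To make the POC step concrete, here is a minimal sketch of how a handful of candidate algorithms could be compared on a sample set against an agreed evaluation metric. The dataset, candidate models, and metric below are illustrative assumptions, not from any specific project.

```python
# Minimal POC sketch: compare candidate algorithms on a sample set.
# Dataset, models, and metric (ROC AUC) are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in for a sampled slice of the real dataset
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Score each candidate on the metric agreed with the business, then weigh
# accuracy against run-time, scalability, and other product constraints.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f} (+/- {scores.std():.3f})")
```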

Designing, building, and operating ML systems is a big effort. Who do you collaborate with? How do you scale yourself?

The extent to which an ML practitioner is involved in designing and building ML pipelines varies from firm to firm - depending on whether there’s a dedicated ML Engineering team in your start-up. Nevertheless, these are some important points that need to be kept in mind while designing an ML framework:

  • The scale of data you’re working with will grow by leaps and bounds. So your solution should be extensible and be able to support the evolving dataset in the long term.
  • The choice of machines, databases, queues, and dashboard integrations becomes extremely important. Enough focus should be given to interoperability and the existence of a strong online community for the respective choices.
  • The design should be iterated many times with all possible corner cases and future work before finalizing one.
  • Lastly, the cost factor should be an important consideration as this is one of the primary reasons why migration from one framework to another takes place.

There are many ways to structure DS/ML teams—what have you seen work, or not work?

In my opinion, there’s no one-size-fits-all solution especially when it comes to managing people. Each employee has a different set of incentives that make them perform to their potential. Having said that, there are definitely some guidelines that a manager should abide by while structuring their team:

  • For each project, there should be a team brainstorming session in the POC phase. It invites new ideas, helps in overall learning, and reduces any form of redundant work.
  • For each project, there should be a separate project owner and an executor. The overall solution improves drastically with a constant feedback cycle between the two.
  • The manager should have a fair idea of the growth trajectories of each of their subordinates, as motivation plays a big role in producing outcomes.

How does your organization or team enable rapid iteration on machine learning experiments and systems?

  • Pre-empting potential iterations aids in timely execution. This way, the solution is also modularized from the very beginning.
  • Clear prioritization, with everyone aware of their goals, helps with smooth execution.
  • Performance benchmarking on all important metrics stabilizes the flow of experimentation.

What processes, tools, or artifacts have you found helpful in the machine learning lifecycle? What would you introduce if you joined a new team?

  • Data preparation: tagging and division into train/test/validation sets (see the sketch after this list).
  • EDA (exploratory data analysis): profiling and visualizations, e.g., pandas-profiling.
  • Code versioning and reviews: GitHub. Coding guidelines: PEP 8.
  • Model tracking: Elastic Observability, GCP AI Platform.
  • Evaluation metrics.
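
As a rough illustration of the data-preparation and EDA items above, here is a minimal sketch, assuming a hypothetical CSV file, that splits data into train/validation/test sets and generates a pandas-profiling report (the library is published as ydata-profiling in newer releases).

```python
# Minimal sketch of data preparation and EDA; the file name is hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from pandas_profiling import ProfileReport  # ydata_profiling in newer releases

df = pd.read_csv("energy_usage.csv")  # hypothetical dataset

# Split into train/validation/test sets (roughly 60/20/20)
train_df, temp_df = train_test_split(df, test_size=0.4, random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)

# Quick profiling report on the training slice for EDA
report = ProfileReport(train_df, title="Training data profile", minimal=True)
report.to_file("train_profile.html")
```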

How do you quantify the impact of your work? What was the greatest impact you made?

The impact of the work should not be measured by how fancy the algorithm is. My work is successful if the algorithm metrics and business metrics are above expectations when the solution goes to production, and stay that way; when most of the corner cases are thought of beforehand and the solution faces little or no bugs in production; and when the code is readable and understandable by a new employee, so that knowledge transfer is a cakewalk.

The greatest impact I made was designing an end-to-end ML pipeline that modularized all the operations, from cleaning and data imputation to performance benchmarking. It automated most aspects of the flow, for example, narrowing the choice of algorithm based on the use case, data distribution, and dataset size. The impact was quantified in terms of effective data segregation for different clients, parallel processing, and far fewer redundant lines of code.
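
The pipeline itself isn't public, but a minimal sketch of the general idea, modular steps for imputation and scaling plus a simple rule that narrows the model choice by dataset size, might look like the following. The selection rule and model families are assumptions made purely for illustration.

```python
# Sketch of a modular pipeline: imputation, scaling, and a size-based model
# choice. The selection rule and model families are illustrative assumptions.
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def choose_model(n_rows: int):
    """Narrow the model family by dataset size (hypothetical rule)."""
    if n_rows < 10_000:
        return LogisticRegression(max_iter=1000)
    return RandomForestClassifier(n_estimators=200)


def build_pipeline(n_rows: int) -> Pipeline:
    """Assemble the modular steps: imputation -> scaling -> model."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", choose_model(n_rows)),
    ])

# Usage: pipeline = build_pipeline(len(X_train)); pipeline.fit(X_train, y_train)
```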

After shipping your ML project, how do you monitor performance in production? Did you have to update pipelines or retrain models—how manual or automatic was this?

For projects deployed on one of the popular cloud platforms, such as GCP or AWS, monitoring is pretty straightforward. We took advantage of the AI Platform framework for this.

For cases where projects operated separately through an automated workflow, I'd store the updated metrics in a database. From there, this information would be surfaced on dashboards such as Tableau.
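
As a rough sketch of that pattern, assuming SQLite as a stand-in for whatever database the dashboard actually reads from, the automated workflow could append metric rows like this:

```python
# Sketch of logging run metrics to a database that a dashboard (e.g., Tableau)
# reads from. SQLite and the schema are stand-in assumptions.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("ml_metrics.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS model_metrics (
        run_at TEXT,
        model_name TEXT,
        metric_name TEXT,
        metric_value REAL
    )
""")


def log_metric(model_name: str, metric_name: str, value: float) -> None:
    """Append one metric row for the current run."""
    conn.execute(
        "INSERT INTO model_metrics VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), model_name, metric_name, value),
    )
    conn.commit()


log_metric("ev_detector", "precision", 0.91)  # illustrative values
```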

Think of people who are able to apply ML effectively–what skills or traits do you think contributed to that?

Data science is not a career, it's a lifestyle; that means you'll always have to be reading, discussing, and keeping yourself updated with the state of the art as well as the machine learning fundamentals, like linear algebra, optimization strategies, etc. The second most important trait is applying what you learn to real-life applications and justifying the business end of it.

Do you have any lessons or advice about applying ML that's especially helpful? Anything that you didn't learn at school or via a book (i.e., only at work)?

  • Frequent discussions on ML questions and fundamentals.
  • Replicating research papers through to execution, with modifications depending on your use case.
  • ML engineering: making decisions on choosing the right framework given use-case and budget constraints.

How do you learn continuously? What are some resources or role models that you've learned from?

  • Research papers/posts on classic solutions: the journey of NLP, computer vision architectures like Inception and ResNets, and generative models like GANs and VAEs.
  • Fundamentals - Linear Algebra (Gilbert Strang), Elements of Statistical Learning
  • Online courses - deep learning: fast.ai, deeplearning.ai on Coursera
