Learn more about Shreya on her LinkedIn and Medium.
I work as a Lead Data Scientist at Bidgely, a clean-energy-focused, utility-based B2B2C startup. There, I work on products related to recommendation engines, electric vehicles, and other IP initiatives based on computer vision and machine learning. As a side hustle, I write articles on data science in business.
I started out as a software engineer in Samsung's cloud applications division, working on the Photo Gallery app. In hindsight, computer vision seemed indispensable to the gallery application I was designing. That is when I started working directly on deep learning proofs of concept (POCs) while taking online courses on the subject, such as CS231n (CNNs) and deeplearning.ai (Andrew Ng), to name a few.
The learning was satisfying, and I was building cool projects based on GANs, VAEs, etc. Still, I started seeing gaps in my understanding because I was missing machine learning fundamentals. This led me to move to an early-stage advertising tech startup and begin learning linear algebra, optimization, and machine learning basics from scratch. Resources: books (ESL, Pattern Recognition) and courses (Linear Algebra by Gilbert Strang). Across the many diverse projects I completed during three years at that startup, I worked on algorithms ranging from statistics and NLP to deep learning.
Factors that helped along the way:
I start my day by checking major updates in the startup world and the finance news. Many high-quality newsletters and Medium articles provide a daily dose of gratification. I divide the day into three parts:
IC (individual contributor) work: Projects and tasks with little dependency on others. This could include working on an IC project, improving an existing algorithm, researching state-of-the-art solutions, writing research papers, documentation, preparing a presentation, etc.
Scheduled meetings: Daily sync-up, weekly updates, product meetings, project presentations, client meetings, etc.
Collaborations: Meetings that involve brainstorming with the team, discussions on improvements in algorithms/pipelines, streamlining tasks, timeline discussions, etc.
Aligning business objectives with ML projects is one of the most important tasks, and it should be thought through early. It helps decide the primary feature set and evaluation metrics, and it goes a long way toward pointing the team's efforts in the right direction. To do this, I discuss the business objectives in detail with stakeholders in the context of the company's short-term and long-term vision.
For instance, consider a project to devise a new pricing strategy using ML. Here, one needs to understand the supply-and-demand dynamics of how sales work in your industry and at your startup. New information then surfaces, such as how much of the pricing process should be automated and what margin of error the sales team is willing to risk, and it should be a direct input to the ML model.
In all, a deep understanding of the industry you work in helps you ask the business team the right questions, and embedding those answers in your algorithm makes all the difference.
Even before beginning the ML work on a project, one should spend some time on customer discovery. This information usually comes from the product team. As an ML practitioner, you should have a solid understanding of the top three pain points you're trying to alleviate for the end user. The difficult part is addressing them through your algorithm.
The extent to which an ML practitioner is involved in designing and building ML pipelines varies from firm to firm, depending on whether your startup has a dedicated ML engineering team. Nevertheless, here are some important points to keep in mind while designing an ML framework:
In my opinion, there's no one-size-fits-all solution, especially when it comes to managing people. Each employee has a different set of incentives that drive them to perform to their potential. That said, there are some guidelines a manager should follow when structuring their team:
The impact of the work should not be measured by how fancy the algorithm is. I consider my work successful when the algorithm metrics and business metrics exceed expectations once the solution is in production, and keep doing so; when most corner cases are anticipated beforehand, so the solution hits few or no bugs in production; and when the code is readable enough that a new employee can understand it and knowledge transfer (KT) is a cakewalk.
My greatest impact came from designing an end-to-end ML pipeline that modularized every operation, from data cleaning and imputation to performance benchmarking. It automated most of the flow, for example, choosing the algorithm based on the use case, data distribution, and dataset size. The impact showed up as effective data segregation across clients, parallel processing, and far fewer redundant lines of code.
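The interview doesn't show the pipeline itself. As a rough illustration of the idea, here is a minimal Python sketch, assuming tabular data held as lists of floats with `None` for missing values. Each stage (cleaning, imputation, benchmarking) is a plain function that can be swapped independently, and the algorithm is picked by rules keyed on use case, dataset size, and target skew. All function names, thresholds, and algorithm labels are hypothetical, and the selector only returns a label rather than training a model.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Optional

Row = List[Optional[float]]
Stage = Callable[[List[Row]], List[Row]]


def drop_empty_rows(rows: List[Row]) -> List[Row]:
    """Cleaning stage (hypothetical rule): drop rows where every value is missing."""
    return [r for r in rows if any(v is not None for v in r)]


def impute_median(rows: List[Row]) -> List[Row]:
    """Imputation stage: replace each missing value with its column's median."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed) if observed else 0.0)
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def skewness(values: List[float]) -> float:
    """Population skewness of a column; 0.0 for a constant column."""
    m = mean(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    if var == 0:
        return 0.0
    return sum((v - m) ** 3 for v in values) / len(values) / var ** 1.5


def choose_algorithm(use_case: str, n_rows: int, target_skew: float) -> str:
    """Hypothetical rules keyed on use case, dataset size, and target distribution."""
    if n_rows < 1_000:
        # Small data: prefer a simple, low-variance model.
        return "logistic_regression" if use_case == "classification" else "linear_regression"
    if use_case == "regression" and abs(target_skew) > 1.0:
        # Heavily skewed target: tree ensemble on a log-transformed target.
        return "gradient_boosting_log_target"
    return "gradient_boosting"


def mean_baseline_mae(target: List[float]) -> float:
    """Benchmarking stage: MAE of always predicting the mean, a floor any model should beat."""
    m = mean(target)
    return mean(abs(t - m) for t in target)


def run_pipeline(rows: List[Row], target_col: int, use_case: str,
                 stages: List[Stage]) -> Dict[str, object]:
    """Apply each data stage in order, then select an algorithm and compute the baseline."""
    for stage in stages:
        rows = stage(rows)
    target = [r[target_col] for r in rows]
    return {
        "n_rows": len(rows),
        "algorithm": choose_algorithm(use_case, len(rows), skewness(target)),
        "baseline_mae": mean_baseline_mae(target),
    }


if __name__ == "__main__":
    data = [[1.0, 10.0], [None, None], [3.0, None], [2.0, 30.0]]
    print(run_pipeline(data, target_col=1, use_case="regression",
                       stages=[drop_empty_rows, impute_median]))
```

Keeping stages as interchangeable functions is what makes per-client variations cheap in this sketch: a client with different data quality gets a different `stages` list rather than a forked pipeline.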
For projects deployed on popular cloud platforms such as GCP and AWS, monitoring is fairly straightforward. We took advantage of AI Platform for this.
For projects that ran separately through an automated workflow, I'd store the updated metrics in a database. From there, the metrics would be loaded into dashboards such as Tableau.
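A minimal sketch of the "metrics in a database, dashboard on top" pattern follows, assuming each automated run appends one row per metric to a long-format table that a BI tool then reads. The table name `model_metrics`, its columns, and both helper functions are hypothetical. `sqlite3` is used only so the example is self-contained; in practice this would more likely be a shared database such as PostgreSQL that Tableau connects to.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# Hypothetical long-format table: one row per (run, project, client, metric).
SCHEMA = """
CREATE TABLE IF NOT EXISTS model_metrics (
    run_at  TEXT NOT NULL,
    project TEXT NOT NULL,
    client  TEXT NOT NULL,
    metric  TEXT NOT NULL,
    value   REAL NOT NULL
)
"""


def log_metrics(conn: sqlite3.Connection, project: str, client: str,
                metrics: Dict[str, float], run_at: Optional[str] = None) -> None:
    """Append one row per metric for a single workflow run (committed as one transaction)."""
    run_at = run_at or datetime.now(timezone.utc).isoformat()
    with conn:
        conn.executemany(
            "INSERT INTO model_metrics (run_at, project, client, metric, value) "
            "VALUES (?, ?, ?, ?, ?)",
            [(run_at, project, client, name, float(v)) for name, v in metrics.items()],
        )


def latest_metrics(conn: sqlite3.Connection, project: str) -> List[Tuple[str, str, float]]:
    """Most recent value of each metric per client: the shape a dashboard tile might read."""
    return conn.execute(
        """
        SELECT client, metric, value FROM model_metrics AS m
        WHERE project = ? AND run_at = (
            SELECT MAX(run_at) FROM model_metrics
            WHERE project = m.project AND client = m.client AND metric = m.metric
        )
        ORDER BY client, metric
        """,
        (project,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    log_metrics(conn, "pricing", "client_a", {"mae": 4.2, "mape": 0.08},
                run_at="2023-01-01T00:00:00+00:00")
    log_metrics(conn, "pricing", "client_a", {"mae": 3.9, "mape": 0.07},
                run_at="2023-01-02T00:00:00+00:00")
    print(latest_metrics(conn, "pricing"))
```

The long format (one row per metric) is a design choice that lets new metrics be added without schema changes, and it keeps the full history so the dashboard can plot trends as well as the latest values.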
Data science is not a career; it's a lifestyle. That means you'll always be reading, discussing, and keeping yourself updated on the state of the art as well as machine learning fundamentals like linear algebra, optimization strategies, etc. The second most important trait is applying what you learn to real-life problems and justifying the business value.
© Eugene Yan 2023