Learn more about Eugene on his site, Twitter, and LinkedIn.
I’m an Applied Scientist at Amazon, part of the Books organization. We work to help users read more, and get more out of reading. My team builds machine learning and recommendation systems to help customers discover books through recommendations, search, and interactive widgets. We also have other teams that focus on subscriptions, pricing, etc.
My path is fairly unusual and I wouldn't recommend that others try to replicate it. I studied psychology in college because I was interested in people. I focused on social and cognitive psychology: how do people perceive, make decisions, and behave? This included conducting lab experiments and writing a thesis on how some people performed better under competition while others performed worse. Along the way, I picked up skills in data analysis, statistics, and experimental design.
After graduation, I wasn't sure what to do and joined the government, because well, it's a stable job. I worked on investment policy for a bit but started to miss working with data. Thus, in my free time, I took courses on data analysis, machine learning, and programming on Coursera and edX.
I also started interviewing for entry-level data positions and eventually landed a role at IBM as a data analyst. After a year, I joined the workforce analytics team. Here, I developed job demand forecasts and contributed to an internal job recommendation engine. This was my first role involving machine learning. Since then, I’ve had various roles in healthcare and e-commerce. Though the domain might differ across roles, the focus has always been on understanding and helping people better. (I've also written more about my journey here.)
Note that this will differ across various roles, as well as across different stages of the machine learning life cycle. Nonetheless, across a project, the bulk of my time is spent on design and implementation.
Design includes researching how others have solved similar problems, understanding the infra and components available, building prototypes to assess feasibility, and finally, writing a design doc and gathering feedback. A good design doc ensures that the solution meets the business requirements (e.g., customer value proposition, cost) and technical requirements (e.g., latency, throughput). This reduces the risk of building the wrong thing, or having to backtrack due to an ill-defined problem statement or tech choice that doesn’t scale.
Once the high-level design is finalized, implementation begins. The bulk of the work is technical and includes writing data pipelines, building model servers, training and deployment workflows, orchestration, monitoring, etc. There are also non-technical tasks such as setting up customer targeting, A/B tests, and getting permissions to launch our new system.
In addition, I spend significant time communicating via daily stand-ups and project syncs. The scientists in our team also have a paper reading group. Every two weeks, someone presents a paper and we discuss how it might be applicable to our work. I also have 1-on-1s with folks in the team and wider organization. This helps me keep up to date with the problems that others are working on and stay in touch, especially since we’re all working from home now.
Usually, it’s the business coming to me with problems they want solved. We then work together to identify the right problem to solve, and the best way to solve it. For example, in a previous role, someone from the logistics team asked: “Could you boost the rank of products that are Fulfilled By Lazada (FBL)?” To better understand their intent, I probed further.
“Because FBL products are delivered faster.” While I thought the customer benefit was clear, I asked why it was important to them.
“Because when it’s delivered faster, we get fewer complaints about late deliveries.” This was when we realized the solution wasn’t getting products to customers faster; it was giving customers more accurate delivery estimates. This wasn’t a problem that should be solved by ranking products differently, because that wouldn’t address the root cause: forecasts that underestimated delivery times. Thus, we reframed the problem and solved it by improving our forecasting algorithm instead.
Sometimes, the data team might identify problems or opportunities that the organization might have overlooked. How to align this depends on the audience. Some audiences like a structured document identifying the customer’s pain point, expected benefits, and proposed solution. Others like a prototype that demonstrates the customer experience. Nonetheless, the underlying question is the same—how will it help the customer and business?
More on how to influence without authority and align with business.
I'm fortunate that my work mostly involves customer-facing ML systems. Thus, I can experience and use what we build as a customer myself. We also get tickets (read: complaints and feedback) from customers via customer support on what can be improved.
In addition, customers tell us how we’re doing. We have user studies, customer feedback and anecdotes, and friends and family who use our product. I can also analyze the data to understand customer needs. How often do customers have a bad shopping experience (e.g., product returns, poor review) due to poor product quality? How often do customers search for something and not find what they’re looking for?
It’s also a good idea to work backwards from the customer. While we’re defining the problem or designing the solution, we should ask ourselves: “How will this help the customer?”
I would start with clarifying the intent (why), desired outcomes (what), and constraints (how not to). This ensures I'm aligned on the problem, and provides freedom to solve the problem in the best way, as long as it meets the intent, desired outcome, and constraints.
Next, I would research how others have solved similar problems. Learning what worked, or didn’t work, reduces the search space and accelerates my design and implementation process. I’ve had to do this so often that I’ve curated a list of papers and tech blogs on applied machine learning. I try to time-box this step to a week or two.
Then, I would write a doc. This could be a research doc or a design doc. Writing ensures I understand the problem space, can summarize my research, and consider the trade-offs. For example, recommendations can be refreshed daily, or generated in real-time. And to generate recommendations in real-time, we can use raw EC2 and load balancers, SageMaker, or something else. Writing the doc makes me think through these trade-offs, explain my decision rationale, and get feedback.
I mostly collaborate with product managers and software engineers. Product managers help with defining requirements, getting permissions to experiment, etc. Software engineers help with integrating our ML systems into the overall platform, setting up monitors and alarms, etc.
The main way I scale myself is through writing. This starts with writing one-pagers and design docs to get feedback and ensure everyone’s on the same page. I also try to make my systems and data artifacts easy to consume and reuse. This way, others can build on my work and what I built goes further and drives more impact.
Haha I've a clear bias on this and have written about it here.
One way is to give easy access to data. For example, we have a data lake that makes it easy to discover, publish, and consume quality data. Within my first week, I was able to access and analyze data on user reading behavior, which I then used to guide improvements to our recommendation system.
We also try to have infra mostly self-service. I can spin up a Spark cluster on my own to analyze data, set up beefy notebook instances, and kick off intensive training jobs. Having it self-service removes the friction of data scientists having to deal with infra, or having to wait for an infra engineer to help them. This greatly reduces the barriers to analyzing large amounts of data or running a quick experiment.
The steps I shared on how I approach a new problem have been invaluable. One-pagers help to clarify the intent and desired outcome, ensuring that we’re solving the right problem for customers. Conducting a literature review to understand how others solved similar problems reduces the search space and speeds up the design process. Writing design docs helps to clarify my thinking and scale sharing and feedback.
In terms of tools, I'm a big fan of tools that enable fast prototyping, and have written about using Jupyter, Papermill, and MLflow.
I also like to build prototypes (with a front-end) via FastAPI and some basic HTML and CSS. Having something visual makes it easier for non-data folks to understand, interact with, and give feedback on our recommendation or machine learning work.
The most straightforward way is to measure metrics via an A/B test. These can be business metrics involving revenue and cost, or online metrics such as click-through rate, conversion, etc.
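As a rough illustration of comparing an online metric such as click-through rate between control and treatment, here is a minimal sketch of a two-proportion z-test. The function name and the example counts are assumptions made for illustration; they do not come from any experiment described here, and a real A/B platform would typically handle this (plus guardrails such as multiple-testing corrections) for you.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (absolute lift, z statistic, two-sided p-value) for B vs. A."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that A and B share one true rate
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Hypothetical counts: control (A) vs. treatment (B)
lift, z, p_value = two_proportion_z_test(1000, 10000, 1100, 10000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.3f}")
```

The same function works for any binary outcome (e.g., conversion); revenue-style metrics need a different test because they are not proportions.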
You might also need to develop your own metrics. One problem I tackled was prioritizing product quality. Some sellers try to game the system by selling poor-quality products at exceptionally low prices; these products sell well, but the customer experience is bad. To quantify the impact of my work, I tracked customer NPS specific to product quality (increased 8-12%) and return rates (reduced 8-12%). This was one of the most meaningful projects I shipped as it had a direct benefit on customer experience.
You might also work on projects that aren’t customer-facing. For example, in a prior role, I realized that the team had to write a lot of boilerplate code to clean data, and train and deploy models. To address this, we built a library and Docker templates. We measured how much development time was required before running the first experiment and then deploying the first prototype model. The new library reduced the development cycle by two-thirds. In this case, I quantified impact via improvements to team productivity, which allows us to do more for the customer and business.
The greatest impact I made on the business was probably during my time at Lazada. My work on ranking and push notifications increased customer engagement (conversion up 5-8%, revenue per session up 15-20%), and my work to automate manual processes (e.g., product and review classification) saved manual effort (>95%) and reduced the lead time for new products to become available online.
Aside from that, I think my work at Lazada to introduce new sellers/products and prioritize product quality was most meaningful. The former helped new sellers grow and increased CTR and add-to-cart of new products (30-80%), while the latter reduced return rates by 8-12%. The same goes for my current work at Amazon to make it easier for customers to find and enjoy new books.
It starts with monitoring the input data. If your data pipelines are established and robust, this might not seem as important. Nonetheless, there was one instance where I had to rely on external data from healthcare providers. The data schemas were inconsistent and the data would drift due to seasonality and healthcare policy changes. In that instance, we built a simple pipeline to run basic schema and statistical checks on the input data and flag anomalies.
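To make "basic schema and statistical checks" concrete, here is a minimal sketch using only the standard library. It is not the pipeline described above: the field names, the schema, and the z-score threshold are all hypothetical. Because the data drifts with seasonality, the baseline would probably need to come from a comparable period (e.g., the same month last year) rather than a fixed constant.

```python
import math
from statistics import mean

# Hypothetical schema for illustration: field name -> expected Python type
EXPECTED_SCHEMA = {"patient_id": str, "visit_date": str, "bill_amount": float}


def check_schema(rows, schema=EXPECTED_SCHEMA):
    """Return a list of (row index, problem) for missing or mistyped fields."""
    problems = []
    for i, row in enumerate(rows):
        for field, expected_type in schema.items():
            if field not in row:
                problems.append((i, f"missing field: {field}"))
            elif not isinstance(row[field], expected_type):
                actual = type(row[field]).__name__
                problems.append((i, f"bad type for {field}: {actual}"))
    return problems


def check_mean_shift(values, baseline_mean, baseline_std, max_z=3.0):
    """Return (flagged, z). flagged is True when the batch mean is more than
    max_z standard errors away from the baseline mean."""
    if not values:
        raise ValueError("cannot check an empty batch")
    std_err = baseline_std / math.sqrt(len(values))
    z = (mean(values) - baseline_mean) / std_err
    return abs(z) > max_z, z


batch = [
    {"patient_id": "a1", "visit_date": "2020-01-01", "bill_amount": 10.0},
    {"patient_id": "b2", "bill_amount": "12"},  # missing date, amount is a string
]
print(check_schema(batch))
print(check_mean_shift([110.0] * 25, baseline_mean=100.0, baseline_std=10.0))
```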
When retraining models, I have a test set based on a slice of the most recent data. When models are refreshed, I evaluate them on the test set. If the model evaluation metrics don’t meet the threshold, the refreshed model doesn’t get deployed. Better to have a slightly stale model than a new but bad model.
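The gating logic above can be sketched in a few lines. This is only an illustration under assumptions: `evaluate`, the metric name `recall_at_10`, and the threshold value are hypothetical stand-ins, and the models are represented as plain strings.

```python
def should_deploy(candidate_metrics, thresholds):
    """Return (ok, failures). ok is True only if every metric meets its minimum."""
    failures = {
        name: (candidate_metrics.get(name), minimum)
        for name, minimum in thresholds.items()
        if candidate_metrics.get(name) is None or candidate_metrics[name] < minimum
    }
    return not failures, failures


def refresh(current_model, candidate_model, evaluate, recent_test_set, thresholds):
    """Keep serving the current model unless the refreshed one passes the gate."""
    metrics = evaluate(candidate_model, recent_test_set)
    ok, failures = should_deploy(metrics, thresholds)
    if not ok:
        print(f"Refreshed model rejected: {failures}")
    return candidate_model if ok else current_model


# Hypothetical evaluation function and threshold
def fake_evaluate(model, test_set):
    return {"recall_at_10": 0.30 if model == "refreshed-good" else 0.10}


thresholds = {"recall_at_10": 0.25}
print(refresh("current", "refreshed-good", fake_evaluate, [], thresholds))
print(refresh("current", "refreshed-bad", fake_evaluate, [], thresholds))
```

A missing metric counts as a failure, so a broken evaluation job cannot accidentally promote a model.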
At Amazon, we set up monitors and alarms for our recommenders. For example, we get alarms when latency increases beyond a threshold for a prolonged period, or when model traffic drops below a minimum level. We also have dashboards for basic metrics such as impressions, click-through rate, etc., as well as an on-call rotation.
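To illustrate the idea of alarming only on a prolonged breach rather than a single spike, here is a minimal sketch. It is not how the monitoring described above is implemented (that would normally be configured in a monitoring service); the function name, thresholds, and sample values are assumptions.

```python
def sustained_breach(values, threshold, min_consecutive, above=True):
    """Return True if `values` breach `threshold` for at least
    `min_consecutive` data points in a row."""
    run = 0
    for value in values:
        breached = value > threshold if above else value < threshold
        run = run + 1 if breached else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical per-minute p99 latency in milliseconds: alarm above 250 ms for 3 minutes
latency_ms = [120, 260, 270, 280, 110]
print(sustained_breach(latency_ms, threshold=250, min_consecutive=3))

# Hypothetical per-minute request counts: alarm below 10 requests for 3 minutes
traffic = [50, 8, 7, 9]
print(sustained_breach(traffic, threshold=10, min_consecutive=3, above=False))
```

Requiring consecutive breaches trades a little detection delay for fewer false alarms from one-off spikes.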
More about maintaining machine learning in production here.
There was a project where the goal was to generate tags describing the attributes of our products. The team initially tried unsupervised and semi-supervised ML such as clustering, topic modelling, and belief propagation. Unfortunately, this required significant manual labour to name the clusters/topics, as well as audit the relevancy and appropriateness of tags.
For one of my attempts, I tried to be lazy—instead of generating labels, I looked for labels that existed in the data. For example, users may search for "abcde for children". We can then associate their clicked or purchased product with "for children". They might also organize items into named groups—we can use these names as tags for items in those groups.
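A minimal sketch of the search-query idea, assuming the log is available as (query, product_id) pairs for clicked or purchased products. The regular expression, the `min_count` cutoff, and the sample events are hypothetical; a production version would need a richer set of patterns and the relevancy and appropriateness audits mentioned earlier.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern: treat a trailing "for <word>" phrase as a candidate tag
TAG_PATTERN = re.compile(r"\b(for \w+)$")


def mine_tags(search_events, min_count=2):
    """search_events: iterable of (query, clicked_or_purchased_product_id).
    Returns {product_id: sorted list of tags seen at least min_count times}."""
    counts = defaultdict(Counter)
    for query, product_id in search_events:
        match = TAG_PATTERN.search(query.strip().lower())
        if match:
            counts[product_id][match.group(1)] += 1
    tags = {}
    for product_id, tag_counts in counts.items():
        kept = sorted(tag for tag, n in tag_counts.items() if n >= min_count)
        if kept:
            tags[product_id] = kept
    return tags


events = [
    ("abcde for children", "p1"),
    ("Puzzle for children", "p1"),
    ("abcde for adults", "p1"),  # seen only once, so dropped by min_count
    ("abcde", "p2"),             # no "for ..." phrase, so no tag
]
print(mine_tags(events))
```

The `min_count` cutoff is one simple way to filter noise from stray clicks; the named-groups idea would follow the same shape, with group names in place of the extracted phrase.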
This "lazy" approach turned out to be far more effective and efficient, and is a great example where creative sourcing and use of data trumps sophisticated ML. The approach is currently live and is a foundational source of tags.
Empathy with the end-user helps. This ensures that the machine learning work is tied to the end user’s needs and benefits. Data scientists who keep the customer top of mind tend to build systems that have better outcomes for the customer, and thus, the business.
It’s also good to have a strong sense of ownership. Trying to apply ML in a real-world setting involves a lot of work. Sometimes, you need to stretch by working with the business to define the right problem, or figure out how to deploy and integrate your system yourself. People who have ownership and a nothing-is-not-my-job attitude tend to roll up their sleeves and get it done, instead of waiting for help. This helps them iterate and ship faster.
Start with the problem, not the technology. Choosing the right problem is half the battle won. Don't solve problems that won't matter to customers or the business, no matter how exciting the technology is. Also, take the time to frame the problem in various ways to see which works best. Framing the problem the right way can have an outsized impact on outcomes.
It’s also helpful to focus more on system and training data design (instead of just model design). A simple model in a well-architected system will have a greater impact than a sophisticated model that can’t be served reliably. Similarly, how we design training data and labels often has a bigger impact than the machine learning model itself. Try to frame your problem and data so you can benefit from a self-supervised approach. Take care when generating negative labels so they aren't too easy or difficult.
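One way to read the point about negative labels: purely random negatives are often too easy, while negatives that are nearly identical to the positive risk being false negatives. The sketch below mixes "harder" negatives from the same category with "easier" ones from other categories. This is an assumed strategy chosen for illustration, not a method described in the text; the catalog structure and all names are hypothetical.

```python
import random


def sample_negatives(positive_item, user_items, catalog, n_hard, n_easy, rng):
    """catalog: dict of item_id -> category. Returns a list of negative item ids,
    harder (same-category) negatives first, then easier (other-category) ones."""
    category = catalog[positive_item]
    # Never use items the user has interacted with as negatives
    candidates = [i for i in catalog if i not in user_items and i != positive_item]
    hard_pool = [i for i in candidates if catalog[i] == category]
    easy_pool = [i for i in candidates if catalog[i] != category]
    hard = rng.sample(hard_pool, min(n_hard, len(hard_pool)))
    easy = rng.sample(easy_pool, min(n_easy, len(easy_pool)))
    return hard + easy


catalog = {
    "a": "fiction", "b": "fiction", "c": "fiction",
    "d": "cooking", "e": "cooking", "f": "travel",
}
rng = random.Random(0)  # seeded for reproducibility
print(sample_negatives("a", {"a", "b"}, catalog, n_hard=1, n_easy=2, rng=rng))
```

The ratio of hard to easy negatives is a tuning knob; the right mix depends on the data and would need to be validated offline and online.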
I try to read and write more. For example, when I wanted to catch up on NLP, I read one or two papers a week. Then, I wrote a summary, starting from RNNs all the way to T5. Similarly, when I wanted to learn about data discovery platforms and feature stores, I read whatever I could find about their design, implementation, and use cases, and wrote teardowns about them.
© Eugene Yan 2023