Project Portfolio
Machine Learning Projects
Churn prediction
Why
- At an OTT company, roughly 20% of paid subscribers were churning every month, and the company wanted to reduce churn and retain more customers.
Role
- As a Data Scientist, my target was to reduce monthly customer churn from 20% to 17% within six months.
Action
- Talking to different teams to understand their perspectives on churn and gather as much domain knowledge as possible
- Analyzing churn patterns, identifying the available data sources, and talking to data owners to understand the data
- Exploratory data analysis
- Feature engineering
- Creating a model to predict churn (see the modeling sketch after this list)
- Evaluating results
- Model deployment
- Measuring the impact of the predictions and making changes for improvement
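A minimal sketch of what the modeling and evaluation step could look like, assuming a table of engineered subscriber features; the column names, path, and model choice (gradient boosting) are illustrative, not the exact production setup:

```python
# Minimal churn-model sketch (column names and path are hypothetical).
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, classification_report
from sklearn.model_selection import train_test_split

# Engineered features per subscriber, e.g. watch time, recency, payment failures
df = pd.read_parquet("s3://bucket/churn_features.parquet")  # hypothetical path
features = ["watch_hours_30d", "days_since_last_login", "payment_failures_90d", "plan_price"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, stratify=df["churned"], random_state=42
)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

# Ranking quality matters because retention teams act on the top-risk deciles
scores = model.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, scores))
print(classification_report(y_test, (scores > 0.5).astype(int)))
```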
Result
- We were able to reduce churn by 4%, thereby increasing customer retention and revenue for the organisation.
Tools Used
- AWS S3, Athena, AWS Glue, EC2, ECR
- Databricks
- Sisense / Tableau for Dashboards
Content Moderation, Tagging & Monitoring
Why
- At a social media company, we wanted to keep the platform clean and safe to increase the satisfaction of all our customers. We also wanted to increase their enjoyment by recommending videos that users are likely to prefer but find difficult to discover on the platform due to the plethora of choices.
Role
- As a Data Science manager, my target was to blacklist NSFW content and reduce the workload of manual moderators by 60%, and to tag the videos and improve video views and watch time by 5%.
Action
- Brainstorming what is meant by NSFW, reading the content moderation guidelines
- Understanding what should not be published on the platform.
- Creating computer vision models and using open-source models to detect NSFW content and tag its category
- Evaluate and deploy the models
- Using the open-source tool FiftyOne to automate sampling videos, collecting ground truth, calculating ML metrics, and publishing them on the dashboard (see the sketch below)
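A rough sketch of the FiftyOne-based sampling and evaluation loop; the dataset directory, field names, and sample size are hypothetical:

```python
# Minimal FiftyOne sketch for sampling videos and scoring moderation predictions
# (dataset path and field names are hypothetical).
import fiftyone as fo

# Load uploaded videos into a FiftyOne dataset
dataset = fo.Dataset.from_dir(
    dataset_dir="/data/uploaded_videos",
    dataset_type=fo.types.VideoDirectory,
    name="moderation_sample",
)

# Take a random sample for manual ground-truth labelling
sample_view = dataset.take(200, seed=51)

# After annotators fill `ground_truth` and the model fills `predictions`
# (both classification fields), compute the standard metrics
results = sample_view.evaluate_classifications(
    "predictions", gt_field="ground_truth", eval_key="moderation_eval"
)
results.print_report()  # per-class precision/recall, published to the dashboard downstream
```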
Result
- We reduced moderation and tagging effort by more than 80%, and increased watch time and video views by 3% on average.
Tools Used
- Snowflake, EC2, Ray Serve, Grafana, Fiftyone
Modeling as a Service for look-alike Models
Why
- At an Adtech organization, partner companies have data on survey participants, but the number of survey participants is very small. The goal was to find customers in our company’s database who are similar to the survey participants, for ad targeting and monetisation.
Role
- As a Data Science manager, my role was to develop a framework for data integration with partners, automate the pipeline for building look-alike models, segment the customers, and send the data for campaigns.
Action
- Coordinating with different teams for data integration
- Designing a framework for data ingestion, feature engineering, model creation, customer segmentation, and evaluation of results (see the sketch after this list)
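One common way to build a look-alike model is to treat the partner's survey participants as positives and a random sample of the wider customer base as background, then score and segment every customer. A minimal sketch under that assumption, with hypothetical column names:

```python
# Look-alike scoring sketch: seed (survey participants) vs. background sample
# (file paths, column names and thresholds are hypothetical).
import pandas as pd
from sklearn.linear_model import LogisticRegression

seed = pd.read_parquet("seed_customers.parquet")           # partner's survey participants
population = pd.read_parquet("customer_features.parquet")  # full customer base

features = ["age", "sessions_30d", "avg_order_value", "categories_browsed"]
background = population.sample(n=len(seed) * 10, random_state=0)

X = pd.concat([seed[features], background[features]])
y = [1] * len(seed) + [0] * len(background)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Score every customer and bucket them into similarity segments for campaigns
population["lookalike_score"] = model.predict_proba(population[features])[:, 1]
population["segment"] = pd.qcut(population["lookalike_score"], q=10, labels=False, duplicates="drop")
top_segments = population[population["segment"] >= 8]  # e.g. the most similar ~20%
```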
Result
- Created a framework to automate data ingestion, build hundreds of look-alike models, segment the customers, and deploy them. The framework was reused for integration with different partners, increasing campaign revenue by more than 2%.
- The complete deployment was automated using MLOps.
Tools Used
- AWS S3, Glue, SageMaker, EMR, EC2, MLflow
- Snowflake
- Airflow
- Apache Superset for Dashboarding
Placement prediction
Why
- At an Edtech organization, student enrollment for a course is higher when students are informed in advance about the placement outcomes of the course.
Role
- As a Data Scientist, my target was to create a model that would help increase enrollment by 10%.
Action
- Collecting data on students’ past educational performance, work experience, etc.
- Creating and evaluating the model (see the sketch below)
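A minimal sketch of how the predictions could be surfaced to prospective students with Streamlit; the model file, input fields, and feature names are hypothetical:

```python
# Minimal Streamlit sketch for showing predicted placement outcomes
# (model file and input fields are hypothetical).
import joblib
import pandas as pd
import streamlit as st

# A scikit-learn pipeline (preprocessing + classifier) trained offline
model = joblib.load("placement_model.joblib")

st.title("Placement outcome estimator")
gpa = st.number_input("Undergraduate GPA", min_value=0.0, max_value=10.0, value=7.0)
experience = st.number_input("Work experience (years)", min_value=0, max_value=30, value=2)
degree = st.selectbox("Highest degree", ["Bachelors", "Masters", "Other"])

if st.button("Estimate"):
    row = pd.DataFrame([{"gpa": gpa, "experience_years": experience, "degree": degree}])
    prob = model.predict_proba(row)[0, 1]
    st.write(f"Estimated placement probability: {prob:.0%}")
```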
Result
- Student enrollment increased by 12% when students were informed of their predicted placement outcomes in advance.
Tools Used
- AWS & Streamlit
Recommendation systems
Why
- At a social media company, the goal was to enable content discovery and provide informed choices for content consumption based on user preferences.
Role
- As a Data Science manager, my role was to design a framework for recommendation engines, execute the project, and evaluate the results. The recommendation engines were expected to increase video views and watch time by 5%.
Action
- Designing a recommendation system based on candidate retrieval and ranking system
- Creating the pipelines for data processing
- Executing candidate retrieval using Elasticsearch
- Modeling and ranking the videos for the user (see the retrieval-and-ranking sketch after this list)
- Evaluation of the recommendation system
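A rough sketch of the two-stage design: Elasticsearch returns a broad candidate set, and a learned model re-ranks it per user. The index name, fields, and ranking model are hypothetical:

```python
# Candidate retrieval + ranking sketch (index, fields and ranking model are hypothetical).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def retrieve_candidates(user_tags, size=200):
    """Stage 1: pull a broad candidate set of videos matching the user's preferred tags."""
    resp = es.search(
        index="videos",
        query={"terms": {"tags": user_tags}},
        size=size,
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]

def rank(candidates, user_features, model):
    """Stage 2: score each candidate with a learned model and sort descending."""
    scored = [
        (model.predict_proba([user_features + [c["popularity"], c["freshness"]]])[0, 1], c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, key=lambda x: x[0], reverse=True)]
```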
Result
- The recommendation system is currently in the deployment stage.
Tools Used
- Elasticsearch, FastAPI, AWS, Snowflake
Distributed computing, Data cleaning and Data Quality
Building Datalake
Why
- At an OTT company, data is generated by numerous tools and used by different departments in the organisation. Accessing the data generated by these tools was becoming difficult, and different numbers were being reported at different levels.
Role
- As a Data Science manager, my role was to create a single source of truth for the data, automate 50% of the organization’s data processing requirements, and reduce infrastructure cost by 30%.
Action
- Collecting all the data generated by different tools at a central location.
- Designing a framework for automation, cleaning and processing of the data.
- Scheduling of workflows and creating alerts for failures
- Creating different marts for the data (see the PySpark sketch after this list)
- Bronze tables - Cleaned and transformed data
- Silver tables - Joining different tables, calculating and storing the features required for analysis and different machine learning models
- Gold tables - Calculating and storing metrics for business reporting and dashboards
- Setting up expectations for data quality, monitoring validations, and raising alerts in case of any discrepancy
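A minimal PySpark sketch of the bronze/silver/gold layering described above; the paths, schemas, and metrics are hypothetical:

```python
# Bronze/silver/gold mart sketch in PySpark (paths, schemas and metrics are hypothetical).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("datalake_marts").getOrCreate()

# Bronze: cleaned and transformed copies of the raw tool exports
raw = spark.read.json("s3://datalake/raw/playback_events/")
bronze = raw.dropDuplicates(["event_id"]).filter(F.col("user_id").isNotNull())
bronze.write.mode("overwrite").parquet("s3://datalake/bronze/playback_events/")

# Silver: joined tables and features used by analysis and ML models
users = spark.read.parquet("s3://datalake/bronze/users/")
silver = (
    bronze.join(users, "user_id")
    .groupBy("user_id", "plan")
    .agg(F.sum("watch_seconds").alias("watch_seconds_30d"))
)
silver.write.mode("overwrite").parquet("s3://datalake/silver/user_watch_features/")

# Gold: business metrics for reporting and dashboards
gold = silver.groupBy("plan").agg(F.avg("watch_seconds_30d").alias("avg_watch_seconds"))
gold.write.mode("overwrite").parquet("s3://datalake/gold/plan_watch_metrics/")
```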
Result
- Automated more than 50% of the organisation’s data processing requirements using data engineering and MLOps
Tools Used
- AWS S3, EMR, Athena, Glue, Databricks, PySpark, Hive metastore, Airflow, etc.
- Great Expectations for monitoring Data quality
- Dashboards for monitoring the data processed and results of quality checks
Data Discovery
Why
- Data scientists in the organization were facing challenges in discovering existing data sources and understanding their definitions and meaning. Knowledge existed in silos, as the information generated by one data scientist was not shared with others.
Role
- To create an interactive platform where information about data sources is searchable and feature definitions are available, and to create a knowledge repository to share all analyses and reports.
Action
- Searching for a data discovery platform that could easily integrate with AWS
- Starting and executing a POC for a data discovery platform (see the sketch after this list)
- Providing access to the platform to all data scientists in the organization and collecting feedback.
- Implementation in production
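A minimal sketch of how a dataset could be published with Quilt so that its metadata becomes searchable in the catalog; the bucket, package name, and metadata are hypothetical:

```python
# Minimal Quilt sketch for publishing a documented, searchable dataset
# (bucket, package name and metadata are hypothetical).
import quilt3

pkg = quilt3.Package()
pkg.set("churn_features.parquet", "local/churn_features.parquet")
pkg.set_meta({
    "owner": "data-science",
    "description": "Monthly churn features per subscriber",
    "columns": {"watch_hours_30d": "Total hours watched in the last 30 days"},
})

# Pushing to the S3 registry makes the package and its metadata searchable
# in the Quilt catalog (backed by Elasticsearch)
pkg.push("analytics/churn-features", registry="s3://company-quilt-registry")

# Other data scientists can then browse and pull the same versioned package
pkg2 = quilt3.Package.browse("analytics/churn-features", registry="s3://company-quilt-registry")
```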
Result
- Increase in work satisfaction and productivity of all the data scientists in the organization.
Tools Used
- Quilt, Elasticsearch
Visualization
Building dashboards
Why
At an OTT company, business teams wanted to see a few metrics with dynamic date ranges, which were extremely difficult to show on the dashboards. For example, unique users on the platform over a dynamic date range.
Role
To do a POC on how to show these metrics on the dashboard such that they render very fast and are computationally cheap.
Action
- Conducting a study on how to process the data at scale to calculate these metrics and provide the information to the dashboard quickly
- Reading about sketching algorithms that can provide approximate results very quickly (see the code sketch after this list)
- Undertaking a POC
- Creating a dashboard and presenting the results
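A minimal sketch of the sketching-algorithm idea using HyperLogLog from the `datasketch` library: pre-compute one sketch per day, then merge the sketches on demand for whatever date range the dashboard requests. The data and parameters below are illustrative:

```python
# Approximate unique users over arbitrary date ranges with HyperLogLog
# (daily pre-aggregation is the assumed design; data is illustrative).
from datasketch import HyperLogLog

events_by_day = {
    "2024-01-01": ["u1", "u2", "u3"],
    "2024-01-02": ["u2", "u4"],
}

# Pre-compute one HLL per day from the raw event stream
daily_sketches = {}
for day, user_ids in events_by_day.items():
    hll = HyperLogLog(p=14)  # roughly 0.8% relative error
    for uid in user_ids:
        hll.update(uid.encode("utf-8"))
    daily_sketches[day] = hll

def unique_users(days):
    """Merge the pre-computed daily sketches for any date range the dashboard asks for."""
    merged = HyperLogLog(p=14)
    for day in days:
        merged.merge(daily_sketches[day])
    return merged.count()

print(unique_users(["2024-01-01", "2024-01-02"]))  # ~4 unique users
```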
Result
- Able to provide the metrics required by the business team on the dashboard for dynamic date ranges
Tools Used
- Dremio, Druid, Apache Superset