Mihir Ojha

Personal Projects

This portfolio is a collection of notebooks that I have worked on throughout my data science journey. It contains projects on topics that interest me, and I plan to keep adding both personal and professional projects. (I am still on the quest to land my first job as a data scientist.)

Basic churn prediction using EDA and Classification algorithms.

jupyter webapp

Customer churn is a top priority for any subscription-based company. Ideally, churn would be 0.00%, but that only happens in an ideal world where unicorns exist. In reality, the goal is to keep churn as low as possible, because acquiring a new customer costs more than retaining an existing one. This is especially problematic for businesses that rely on a steady stream of new customers to keep their revenue flowing. It is therefore important for companies to maintain a loyal customer base and, at the same time, to identify the factors in the customer experience that lead customers to churn. Having that information helps a company make data-driven decisions to keep the churn rate minimal. A high churn rate can also be an indicator of problems with the company's products or services, which can be addressed through targeted improvements.

This project tries to determine the factors that affect churn. It involves predicting whether a customer is likely to switch to a different service provider. Classification algorithms such as random forest, K-nearest neighbors, logistic regression, and support vector machines can be used to predict telecom churn. You can also try the webapp and enter custom values to see whether a customer is predicted to churn.
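To make the modelling step concrete, here is a minimal sketch of comparing the four classifier families named above with scikit-learn. It is not the notebook's actual code: the data is synthetic, and the feature names (`tenure_months`, `monthly_charges`, `support_calls`), the churn rule, and the hyperparameters are all assumptions made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a telecom dataset (hypothetical features).
rng = np.random.default_rng(0)
n = 1000
tenure_months = rng.integers(1, 72, size=n)
monthly_charges = rng.uniform(20, 120, size=n)
support_calls = rng.poisson(2, size=n)
X = np.column_stack([tenure_months, monthly_charges, support_calls])

# Made-up rule: short tenure, high charges, and many support calls
# raise the probability of churn.
logit = -0.05 * tenure_months + 0.03 * monthly_charges + 0.4 * support_calls - 2.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Distance- and margin-based models get a scaler; the forest does not need one.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out customers
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

On a real churn dataset the classes are usually imbalanced, so accuracy alone can be misleading; recall or ROC AUC on the churned class is often a more useful comparison.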

Creating multiple playlists using K-Means Clustering, Spotipy, and Plotly.

nbviewer

Spotify is a digital music, podcast, and video streaming service that was founded in 2006. It allows users to browse and stream a wide variety of music, podcasts, and other audio content, as well as create and share playlists. Spotify is available in more than 92 countries and has over 345 million active users, making it one of the most popular streaming services in the world. These days, Spotify and music have become synonymous. I use Spotify daily, and having tried Apple Music and Amazon Music Unlimited, I feel its recommendation algorithm is a cut above the other music streaming services at a price point of just $10 a month. I always wondered how it makes such accurate recommendations from my streaming behavior. I wanted to tackle this from a data science perspective, so I set out to create some playlists for myself.

This project uses Spotipy, a Python client for the Spotify Web API, to access the top 1000 songs from 2017 to 2022. It takes a detailed look at how a track's audio features affect the way tracks are grouped. The project uses PCA to reduce the dimensionality of the audio features before running a K-Means model to cluster the most played tracks into custom playlists. You can access the playlists here:
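The PCA and K-Means steps described for this project can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: the tracks are randomly generated rather than fetched through Spotipy, the five feature names are only a plausible subset of Spotify's audio features, and the choice of 2 components and 3 clusters is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed feature columns; the real project may use a different set.
feature_names = ["danceability", "energy", "valence", "acousticness", "tempo"]

# Fake "tracks": three loose groups with different typical feature values.
rng = np.random.default_rng(42)
centers = np.array([
    [0.8, 0.8, 0.7, 0.10, 125.0],  # upbeat dance tracks
    [0.4, 0.3, 0.3, 0.80, 85.0],   # quiet acoustic tracks
    [0.6, 0.9, 0.4, 0.05, 150.0],  # fast, high-energy tracks
])
spread = np.array([0.05, 0.05, 0.08, 0.05, 6.0])
tracks = np.vstack([rng.normal(c, spread, size=(100, 5)) for c in centers])

# Scale first (tempo is on a very different scale), then reduce, then cluster.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KMeans(n_clusters=3, n_init=10, random_state=42),
)
labels = pipeline.fit_predict(tracks)

# Each cluster becomes one candidate playlist (row indices into `tracks`).
playlists = {k: np.flatnonzero(labels == k) for k in range(3)}
for k, idx in playlists.items():
    means = dict(zip(feature_names, tracks[idx].mean(axis=0).round(2)))
    print(f"playlist {k}: {len(idx)} tracks, mean features = {means}")
```

In practice the number of clusters would be chosen with something like an elbow plot or silhouette score rather than fixed in advance.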

Classifying racist and homophobic tweets.

nbviewer webapp

Twitter is a social media platform that allows users to send and read short messages called "tweets." Tweets are limited to 280 characters and can include text, images, and links to other websites. Twitter was founded in 2006 and has since become a popular platform for celebrities, politicians, journalists, and other public figures to share their thoughts and updates with their followers. Twitter has millions of active users and is available in more than 40 languages. With such an active user base sharing opinions on the platform, moderating tweets is an arduous task. One particular difficulty for Twitter is flagging hate tweets.

Hate speech is any speech, conduct, writing, or expression that may incite violence or prejudicial action against a particular individual or group, or that disparages or intimidates them. It is generally considered harmful and unacceptable, as it can contribute to a culture of intolerance and discrimination. In many countries, hate speech is regulated by law and may be punishable by fines or imprisonment. However, the definition of hate speech varies and is often a subject of debate, as it can be difficult to draw a line between protected free speech and harmful expression.

Hate speech detection matters to Twitter because it helps the platform create a safer and more inclusive environment: detecting and removing such content protects the individuals and groups it targets. It also matters from a business perspective, as it helps maintain the platform's reputation and user trust. In recent years, there has been increased pressure on social media platforms to take action against hate speech and other forms of harmful content, and Twitter has implemented several policies and tools to help identify and address this type of content.

This project is just the tip of the tip of the tip of the iceberg (yes, I typed it three times). It uses natural language processing (NLP), a field of artificial intelligence that focuses on the interaction between computers and humans through natural language. NLP involves developing algorithms and models that can understand, interpret, and generate human language, and it can be used to classify hateful or abusive tweets as part of an effort to combat hate speech on social media platforms. This is still a complex task, because labeling tweets as hateful or not is itself a Pandora's box. I took a very simple approach, which involved cleaning and formatting the raw text data to make it more suitable for analysis. This included tokenization (splitting the text into individual words or symbols), stemming (reducing words to their base form), and removing stop words (common words that carry little meaning). I then classified tweets that are either racist or homophobic. I also created a webapp where you can type in a tweet and find out whether it is classified as a hate tweet. Imagine a much more advanced version of the app that could serve as a guiding light on hate speech.
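The cleaning steps mentioned above (tokenization, stop-word removal, stemming) can be sketched with the standard library alone. This is a deliberately naive illustration, not the project's code: the `STOP_WORDS` set is a tiny hand-picked list, and `naive_stem` is a hypothetical suffix stripper standing in for a proper library stemmer such as a Porter stemmer.

```python
import re

# Tiny hand-picked stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}

# Suffixes tried in order by the naive stemmer below.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def tokenize(text):
    """Lowercase the text, drop URLs and @mentions, and split into word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return re.findall(r"[a-z']+", text)


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, remove stop words, then stem what is left."""
    return [naive_stem(tok) for tok in tokenize(text) if tok not in STOP_WORDS]


print(preprocess("@user The players are walking and jumping! https://example.com"))
# ['player', 'walk', 'jump']
```

The resulting token lists would typically be turned into numeric features (for example bag-of-words or TF-IDF counts) before being passed to a classifier.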
