Image by Author | Canva
The moment I see a house price prediction or an image classification project, I start yawning. I can’t help it. Everybody does the same projects, and it’s so effing boring!
Sure, you have no reason to care about me, but why insist on boring yourself? What, data science shouldn’t be fun? Who said that?
There’s really no need to do the same projects everybody does and stifle your creativity. When you get creative, you can have fun while also sharpening your skills with impactful projects. It also helps you stand out from other candidates.
Here are some uncommon projects for shaking things up!
1. AI-Powered Art Generation
If you want to get more creative in data science, what’s more logical than using it to create art? (We’ll leave the discussion about what art is to philosophers.)
Project Idea: Teaching AI to Be the Next Picasso With GANs
Instead of just analyzing existing artworks, why not create new ones using a GAN? Train it on a dataset of paintings and try to create images that mimic a certain style or your favorite artist.
Project Approach
- Collect a Dataset: Gather a collection of paintings to serve as training data. Possible sources include the DELAUNAY dataset, the Abstract Paintings Dataset, and ArtEmis: Affective Language for Visual Art. Aim for a diverse dataset with a variety of painting styles and color schemes.
- Preprocess the Data: Resize and normalize the images to ensure consistency.
- Train the GAN: Select a suitable GAN architecture to learn the patterns and features of paintings. Example architectures include Deep Convolutional GAN (DCGAN), Wasserstein GAN (WGAN), Progressive Growing GAN (PGGAN), StyleGAN (Style-Based GAN), and BigGAN (Large-Scale GAN).
- Generate Art: Use the trained GAN to create new images that reflect the dataset’s characteristics.
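The preprocessing step is easy to get subtly wrong, so here is a minimal, dependency-free sketch of it. It assumes 8-bit pixel values (0 to 255) and scales them to the range [-1, 1], which is a common choice when the generator ends in a tanh layer, as in DCGAN. The function names and the tiny 2x2 "painting" are made up for illustration; in a real project you’d most likely let an imaging library handle the resizing and normalizing.

```python
def normalize(pixels):
    """Scale 8-bit pixel values (0..255) to the range [-1.0, 1.0]."""
    return [[p / 127.5 - 1.0 for p in row] for row in pixels]


def denormalize(pixels):
    """Map values in [-1.0, 1.0] back to 8-bit pixel values (0..255)."""
    return [
        [min(255, max(0, round((p + 1.0) * 127.5))) for p in row]
        for row in pixels
    ]


def resize_nearest(pixels, new_h, new_w):
    """Resize a 2D grid of pixels with nearest-neighbor sampling."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [pixels[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


# Hypothetical 2x2 grayscale "painting", upscaled to 4x4 and normalized.
tiny = [[0, 255], [51, 204]]
resized = resize_nearest(tiny, 4, 4)   # every image ends up the same size
scaled = normalize(resized)            # what the GAN would train on
restored = denormalize(scaled)         # how generated output becomes pixels again
```

The `denormalize` helper matters at the other end of the pipeline: whatever the generator produces in [-1, 1] has to be mapped back to pixel values before you can look at your new masterpiece.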
Why Is This Useful?
With this project, you will:
- Expand your creativity by combining AI and artistic expression
- Learn deep learning and GANs in a practical way
- Experiment with different art styles by modifying training data
2. Data-Driven Storytelling
Raw data is useless to most people unless it’s communicated effectively, and many data scientists fall short in this department. Learning data storytelling will set you apart from others, because it lets you engage decision-makers and persuade them to take action.
Project Idea: What Are Musicians Really Singing About?
If your brain hurts from the banality of modern pop music lyrics, maybe you can discover why that is by exploring song lyrics over time. You could uncover interesting (!?) patterns in language, sentiment, and themes. In this project, you will collect, analyze, and visualize data to create a compelling narrative from your findings.
Project Approach
- Data Collection: Use Python’s BeautifulSoup and requests to extract lyrics from sources such as AZLyrics, Genius, and Lyrics. You can also use APIs, such as Genius API, Musixmatch Developer API, and ChartLyrics Lyric API.
- Perform Text Analysis to Find Recurring Themes: Common approaches to text analysis include sentiment analysis, keyword extraction, TF-IDF analysis, and topic modeling (LDA).
- Visualize Results: Make your findings more interesting by creating bar charts, word clouds, and line graphs. Use Tableau, Power BI, or Plotly Dash to create interactive dashboards so users can explore lyrical trends dynamically, e.g., the most common words, changes in sentiment across the decades, genre comparison, etc.
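To make the text-analysis step more concrete, here is a minimal sketch of TF-IDF written with only the standard library. The three "lyrics" are invented placeholders, not real songs, and the helper names are hypothetical. On a real corpus, you’d probably reach for an established library rather than hand-rolling this.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(documents):
    """Return one {word: score} dict per document.

    tf  = count of the word in the document / number of words in the document
    idf = log(number of documents / number of documents containing the word)
    """
    tokenized = [tokenize(doc) for doc in documents]
    # Count each word once per document to get its document frequency.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(documents)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Invented placeholder lyrics, one string per song.
songs = [
    "baby baby I love you",
    "highway highway I love you",
    "dance dance dance all night",
]
scores = tf_idf(songs)
# The highest-scoring word in each song is its most distinctive one.
top_words = [max(s, key=s.get) for s in scores]
```

Words that are frequent in one song but rare elsewhere score highest, while a word that appears in every song gets an idf of zero. That is exactly what you want when hunting for the themes that distinguish one decade or genre from another.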
Why Is This Useful?
In this project:
- You identify cultural shifts in music trends over time
- You reveal hidden patterns in songwriting styles
- You can turn data into a compelling narrative
3. Automated Social Media Analysis
It’s not a wild guess that you probably spend too much time on social media. That is not something I can recommend, but it’s reality. So, why not replace the brain rot with something useful? Maybe try a ‘brain refresh’ by automating the process of analyzing trends, sentiments, and engagement metrics on social media.
Project Idea: The Trump Tweet-O-Meter™
In this project, you could build a real-time sentiment analysis tool for Twitter (sorry, X) posts about Donald Trump (sorry, POTUS). The Trump Tweet-O-Meter™ is a real-time natural language processing (NLP) pipeline designed to ingest, analyze, and visualize sentiment trends in those posts. It employs streaming data, sentiment classification, and time-series analysis to gain insights into public opinion shifts, political discourse patterns, and major sentiment fluctuations in response to real-world events.
Project Approach
- Real-Time Data Ingestion: Fetch posts using the Twitter API.
- NLP Classification: Apply TextBlob or VADER sentiment analysis to classify tweets into sentiment categories.
- Tracking Sentiment Shifts: Store tweets and sentiment scores in a time-indexed database (e.g., SQLite, PostgreSQL, or MongoDB), detect sentiment trends with time series smoothing, and identify spikes and dips after major Trump-related events (you won’t lack these, for sure).
- Visualization: Generate dynamic sentiment charts, such as word clouds, heatmaps of geolocated Trump tweets, or stacked bar charts to compare Trump’s sentiment to, say, Zelenskyy’s.
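The storage and trend-tracking steps can be sketched with nothing but the standard library. The example below is a rough illustration, not a full pipeline: the timestamps and scores are made up and stand in for the output of a sentiment classifier (assumed to lie between -1 and 1, like VADER’s compound score), and the table, column, and function names are hypothetical.

```python
import sqlite3
from statistics import mean

# Made-up (timestamp, sentiment score) pairs standing in for classifier output.
rows = [
    ("2025-01-01T10:00", 0.10),
    ("2025-01-01T11:00", 0.20),
    ("2025-01-01T12:00", 0.15),
    ("2025-01-01T13:00", -0.70),
    ("2025-01-01T14:00", -0.60),
    ("2025-01-01T15:00", 0.05),
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing hits the disk
conn.execute("CREATE TABLE sentiment (posted_at TEXT PRIMARY KEY, score REAL)")
conn.executemany("INSERT INTO sentiment VALUES (?, ?)", rows)

# ISO 8601 timestamps sort correctly as plain text.
query = "SELECT score FROM sentiment ORDER BY posted_at"
scores = [score for (score,) in conn.execute(query)]
conn.close()


def moving_average(values, window):
    """Trailing moving average; early points use as many values as exist."""
    return [
        mean(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


def find_shifts(values, threshold):
    """Indexes where the score moved by more than `threshold` in one step."""
    return [
        i for i in range(1, len(values))
        if abs(values[i] - values[i - 1]) > threshold
    ]


smoothed = moving_average(scores, window=3)   # for plotting the trend line
shifts = find_shifts(scores, threshold=0.5)   # candidate "something happened" moments
```

The smoothed series is what you’d plot to show the overall trend, while the indexes in `shifts` point to the moments worth cross-checking against the news. The window size and threshold here are arbitrary; you’d tune both to how noisy your data turns out to be.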
Why Is This Useful?
- You do political research, but make it fun
- You automate real-time public sentiment tracking
- Journalists can use this for fact-based reporting on public opinion
- You can predict Trump’s impeachment, “so help you God”
4. Niche Predictive Models
When you see a predictive modeling project, it will probably be about stock or house prices, weather forecasts, or customer behavior. (Yes, the sound you just heard was me yawning.) You can make this more niche, for example, by predicting book popularity (to become a rich writer) or meme trends (to become a meme king). At the same time, you build your machine learning skills.
Project Idea: Will Your Book Be the Next Bestseller or Just Kindling for a BBQ?
Build a machine learning model that predicts a book’s success based on factors such as genre, title length, page count, and reader ratings.
Project Approach
- Collect Book Data: Collect the data through the Hardcover API, the New York Times Bestsellers API, and Amazon Books scraping.
- Data Preprocessing: Clean the collected data and prepare it for machine learning.
- Train an ML Model: Split the data into training and testing sets, and use a classification model (e.g., logistic regression, random forest, gradient boosting) to predict if a book will be a bestseller.
- Evaluate Model: Use metrics such as accuracy score, confusion matrix, and feature importance to ensure your model is accurate and generalizable. Improve its performance through hyperparameter tuning, e.g., GridSearchCV.
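One reason to look at the confusion matrix and not just the accuracy score: if bestsellers are rare in your data, a model can score well on accuracy while being useless. The sketch below works through that arithmetic by hand on made-up labels. The function names are hypothetical, and in practice a machine learning library’s metrics module would compute all of this for you.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for binary labels (1 = bestseller)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def summarize(counts):
    """Derive accuracy, precision, and recall from the confusion counts."""
    total = sum(counts.values())
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total,
        "precision": counts["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": counts["tp"] / actual_pos if actual_pos else 0.0,
    }


# Made-up test-set labels: 2 bestsellers among 10 books.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A lazy model that predicts "not a bestseller" for every book.
y_lazy = [0] * 10
# A model that finds one bestseller and raises one false alarm.
y_model = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

lazy = summarize(confusion_counts(y_true, y_lazy))
model = summarize(confusion_counts(y_true, y_model))
```

Both models reach the same accuracy on this toy data, but the lazy one never identifies a single bestseller, so its precision and recall are zero. Only the confusion matrix makes that difference visible.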
Why Is This Useful?
You can:
- Understand what makes books successful
- Avoid developing predictive models for overused datasets
- Extend it to recommend book pricing and marketing strategies
Conclusion
Data science projects and fun are not mutually exclusive concepts. With these four project suggestions, you can learn and practice fundamental data science skills while also having fun.
The main point is to go beyond the handful of standard datasets and topics everybody uses. Keep that in mind, and you will easily think of many more uncommon projects. As a bonus, you will stand out from the crowd.
Nate Rosidi is a data scientist who also works in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.