Feel the music.
5 AI models. One vibe.
MODIFY.AI now reads your face, your words, your voice, and your photos — and turns your emotions into the perfect Spotify playlist, every time.
Every model captures emotion differently. Use one or combine all five for the most accurate mood detection ever.
Facial Emotion AI
Real-time webcam analysis detects 7 core emotions — happy, sad, angry, surprised, fearful, disgusted, neutral — using simulated deep learning facial feature extraction.
Text Sentiment NLP
Type how you feel — or paste a journal entry, tweet, or message. The NLP model reads linguistic patterns, tone, and keywords to map your text to an emotional state and playlist.
Image Mood Detection
Upload any photo — a selfie, a landscape, a painting. The vision model analyses colour palette, scene composition, and visual cues to extract the emotional atmosphere of the image.
Voice Tone Analysis
Speak into your mic for 5 seconds. The audio model analyses pitch, tempo, energy, and timbre to understand the emotional signature of your voice — no words needed.
AI Lyric Generator
Choose a mood and music genre — the generative AI writes original song lyrics tailored exactly to your emotional state. Export them, share them, or use them as a playlist prompt.
Fusion Mode
The most powerful feature. Combine all four detection models simultaneously — face + text + voice + image — and the ensemble model votes on your dominant emotion for maximum accuracy.
Pick any model and experience emotion-based music personalisation in real time.
Input Collection (Multi-modal)
Face (webcam), text (typed), voice (mic), image (uploaded) — each captured independently and preprocessed into an emotion probability vector.
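Each modality's raw scores end up in the same shape before fusion. A minimal sketch of that preprocessing step — the emotion order matches the seven emotions above, but the helper name and fallback behaviour are illustrative assumptions, not the project's actual code:

```python
# Sketch: normalise one model's raw emotion scores into the shared
# 7-emotion probability vector. Helper name and fallback are illustrative.

EMOTIONS = ("happy", "sad", "angry", "surprised", "fearful", "disgusted", "neutral")

def to_probability_vector(raw_scores: dict[str, float]) -> list[float]:
    """Project raw scores onto the shared emotion order, defaulting
    missing emotions to 0 and normalising so the vector sums to 1."""
    scores = [max(raw_scores.get(e, 0.0), 0.0) for e in EMOTIONS]
    total = sum(scores)
    if total == 0:
        # No signal at all: fall back to a uniform distribution.
        return [1.0 / len(EMOTIONS)] * len(EMOTIONS)
    return [s / total for s in scores]
```

Because every pipeline emits this same vector shape, the ensemble layer can fuse them without knowing which modality produced which.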
Per-model Inference (Parallel)
Each model runs its own emotion classification: CNN for face, VADER/BERT for text, spectral analysis for voice, colour histogram for image.
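To illustrate the text branch: VADER is a lexicon-and-rules sentiment model, and the real pipeline would call the `vaderSentiment` package's `SentimentIntensityAnalyzer`. The toy version below uses the same lexicon idea with made-up word lists, purely to show how a sentiment score maps to an emotion label:

```python
# Toy lexicon-style sentiment classifier in the spirit of VADER.
# The word lists and emotion mapping are illustrative, not the real lexicon.

POSITIVE = {"happy", "great", "love", "excited", "joy"}
NEGATIVE = {"sad", "angry", "hate", "tired", "lonely"}

def text_to_emotion(text: str) -> str:
    """Score words against the lexicons and map the net score to an emotion."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "happy"
    if score < 0:
        return "sad"
    return "neutral"
```

The CNN, spectral, and colour-histogram branches follow the same contract: raw input in, emotion label (plus probability vector) out.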
Ensemble Voting (Weighted)
The ensemble layer applies learned weights to each model's output based on its historical accuracy, then soft-votes on the final emotion label.
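The soft-vote itself is a weighted element-wise sum over the per-model probability vectors. A sketch, assuming each model has already emitted the shared 7-emotion vector; the weights here are example values standing in for the learned ones:

```python
# Sketch of weighted soft-voting across per-model emotion distributions.
# In the real system the weights are learned from historical accuracy;
# here they are example constants.

EMOTIONS = ("happy", "sad", "angry", "surprised", "fearful", "disgusted", "neutral")

def soft_vote(outputs: dict[str, list[float]], weights: dict[str, float]) -> str:
    """Weight each model's probability vector, sum element-wise,
    and return the emotion with the highest fused score."""
    fused = [0.0] * len(EMOTIONS)
    for model, probs in outputs.items():
        w = weights.get(model, 1.0)  # unknown models default to weight 1
        for i, p in enumerate(probs):
            fused[i] += w * p
    return EMOTIONS[max(range(len(EMOTIONS)), key=fused.__getitem__)]
```

Soft voting (summing probabilities) rather than hard voting (counting argmax labels) lets a confident minority model outvote several uncertain ones.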
Playlist Recommendation (Spotify API)
The fused emotion label and confidence score are used to query Spotify's API with emotion-optimised parameters (valence, energy, danceability).
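A sketch of that query construction. The parameter names (`target_valence`, `target_energy`, `target_danceability`, `seed_genres`) come from Spotify's Get Recommendations endpoint; the per-emotion target values below are illustrative, not tuned:

```python
# Sketch: translate the fused emotion label into Spotify recommendation
# parameters. Per-emotion target values are example numbers, not tuned.

from urllib.parse import urlencode

EMOTION_TARGETS = {
    "happy":   {"target_valence": 0.9, "target_energy": 0.8, "target_danceability": 0.8},
    "sad":     {"target_valence": 0.2, "target_energy": 0.3, "target_danceability": 0.3},
    "angry":   {"target_valence": 0.3, "target_energy": 0.9, "target_danceability": 0.5},
    "neutral": {"target_valence": 0.5, "target_energy": 0.5, "target_danceability": 0.5},
}

def build_query(emotion: str, seed_genre: str = "pop") -> str:
    """Build a recommendations URL with emotion-optimised audio-feature targets."""
    params = {"seed_genres": seed_genre, "limit": 20, **EMOTION_TARGETS[emotion]}
    return "https://api.spotify.com/v1/recommendations?" + urlencode(params)
```

The confidence score can additionally widen or narrow these targets (e.g. switching from `target_*` to `min_*`/`max_*` ranges when confidence is low).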
A layered architecture where each modal pipeline feeds into a shared emotion space before playlist resolution.
Frontend Layer
React.js SPA with modular panel components, real-time webcam stream, Web Audio API integration, and animated UI powered by CSS keyframes.
AI Model Layer
Five parallel emotion classifiers — CNN (face), VADER NLP (text), spectral ML (voice), colour ML (image), and a meta-ensemble for fusion mode.
Spotify API Layer
Emotion labels map to Spotify audio features: valence (happiness), energy, tempo, danceability. Playlist queries are dynamically constructed per emotion profile.
Emotion Space
All five models output into a shared 7-dimensional emotion vector (happy, sad, angry, fearful, disgusted, surprised, neutral) before playlist resolution.
Generative Layer
A GPT-style language model fine-tuned on lyric datasets generates mood-matched song lyrics using mood + genre + theme as conditioning inputs.
Fusion Ensemble
Weighted soft-voting across available model outputs. Confidence scores modulate weights so high-confidence models contribute more to the final label.
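The confidence modulation can be sketched as scaling each base weight by the model's reported confidence and renormalising, so uncertain models fade out of the vote (function name and guard behaviour are illustrative):

```python
# Sketch: scale each model's base weight by its confidence, then renormalise
# so the weights sum to 1. High-confidence models dominate the soft vote.

def confidence_weights(base: dict[str, float],
                       confidence: dict[str, float]) -> dict[str, float]:
    scaled = {m: base[m] * confidence.get(m, 0.0) for m in base}
    total = sum(scaled.values()) or 1.0  # guard against all-zero confidence
    return {m: w / total for m, w in scaled.items()}
```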
| Feature | Model Type | Midterm v1 | End Term v2 |
|---|---|---|---|
| Facial Emotion Detection | CNN / Face API | ✓ Existed | ✓ Improved |
| Spotify Playlist Recommendation | Rule-based | ✓ Existed | ✓ Improved |
| Futuristic Animated UI | CSS / React | ✓ Existed | ✓ Enhanced |
| Text Sentiment Analysis | VADER NLP | ✗ | ✓ Added |
| Image Mood Detection | Vision ML | ✗ | ✓ Added |
| Voice Tone Analysis | Audio DSP | ✗ | ✓ Added |
| AI Lyric Generator | Generative LLM | ✗ | ✓ Added |
| Multimodal Fusion Mode | Ensemble | ✗ | ✓ Added |
| Per-model Confidence Scores | — | ✗ | ✓ Added |
| Real-time Audio Waveform | Web Audio API | ✗ | ✓ Added |