End Term Project — Multimodal AI

Feel the music.
5 AI models. One vibe.

MODIFY.AI now reads your face, your words, your voice, and your photos — and turns your emotions into the perfect Spotify playlist, every time.

5 AI Models · 7 Emotions Detected · Playlist Combos · Real-time Processing
5 ways to feel the music

Every model captures emotion differently. Use one or combine all five for the most accurate mood detection ever.

v1 — Existing
📷

Facial Emotion AI

Real-time webcam analysis detects 7 core emotions — happy, sad, angry, surprised, fearful, disgusted, neutral — using simulated deep-learning facial feature extraction.

React · Webcam API · Face Detection · Spotify API
v2 — New
💬

Text Sentiment NLP

Type how you feel — or paste a journal entry, tweet, or message. The NLP model reads linguistic patterns, tone, and keywords to map your text to an emotional state and playlist.

NLP · VADER Sentiment · Keyword Extraction · Emotion Mapping
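As a minimal, self-contained illustration of this step, the sketch below scores text against a tiny hand-made valence lexicon and maps the aggregate to an emotion label. It is a toy stand-in for VADER (the lexicon, thresholds, and labels here are assumptions for illustration; real VADER also handles negation, intensifiers, and emoji):

```python
# Toy stand-in for the text-sentiment step: score words against a tiny
# valence lexicon, average over the text, and bucket into an emotion.
# Lexicon values and thresholds are illustrative assumptions.

LEXICON = {
    "happy": 2.0, "love": 2.5, "great": 1.8, "joy": 2.2,
    "sad": -2.0, "lonely": -1.8, "terrible": -2.2,
    "angry": -2.5, "furious": -3.0,
}

def text_to_emotion(text: str) -> tuple[str, float]:
    """Return (emotion label, mean valence score) for a piece of text."""
    words = text.lower().split()
    hits = [LEXICON[w] for w in words if w in LEXICON]
    score = sum(hits) / len(words) if words else 0.0
    if score > 0.3:
        return "happy", score
    if score < -0.3:
        return "sad", score
    return "neutral", score
```

For example, `text_to_emotion("so sad and lonely tonight")` returns the label `"sad"` with a negative mean score.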
v2 — New
🖼️

Image Mood Detection

Upload any photo — a selfie, a landscape, a painting. The vision model analyses colour palette, scene composition, and visual cues to extract the emotional atmosphere of the image.

Computer Vision · Colour Analysis · Scene Recognition
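The colour-palette part of this analysis can be sketched in a few lines: reduce the image to an average colour, convert to HSV, and bucket hue/saturation/value into a mood. The thresholds below are illustrative assumptions, not the tuned model:

```python
import colorsys

# Heuristic sketch of colour-based mood detection: average the pixels,
# convert to HSV, and map brightness/saturation/hue to a mood bucket.

def average_rgb(pixels):
    """pixels: iterable of (r, g, b) tuples in 0..255. Returns 0..1 floats."""
    n = rs = gs = bs = 0
    for r, g, b in pixels:
        rs += r; gs += g; bs += b; n += 1
    return (rs / n / 255, gs / n / 255, bs / n / 255)

def colour_mood(pixels) -> str:
    h, s, v = colorsys.rgb_to_hsv(*average_rgb(pixels))
    if v < 0.25:
        return "sad"          # very dark scenes read as sombre
    if s < 0.15:
        return "neutral"      # washed-out / grey scenes
    if h < 0.17 or h > 0.92:
        return "happy"        # warm reds, oranges, yellows
    if 0.5 < h < 0.75:
        return "sad"          # cool blues
    return "neutral"
```

A warm yellow image (`[(255, 200, 0)]`) maps to "happy"; a deep blue one (`[(20, 80, 200)]`) maps to "sad".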
v2 — New
🎙️

Voice Tone Analysis

Speak into your mic for 5 seconds. The audio model analyses pitch, tempo, energy, and timbre to understand the emotional signature of your voice — no words needed.

Web Audio API · Pitch Detection · Prosody Analysis
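Two of the simplest prosody features, energy and zero-crossing rate, can be computed directly from a raw sample buffer. The sketch below is a simplified stand-in for the audio model (the thresholds and the two-way "excited"/"calm" mapping are assumptions; the described model adds pitch and tempo tracking on top of features like these):

```python
import math

# Sketch of the prosody step: from a mono sample buffer in [-1, 1],
# compute RMS energy (loudness) and zero-crossing rate (a rough
# brightness/pitch proxy), then make a crude tone guess.

def rms_energy(samples) -> float:
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_rate(samples) -> float:
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(samples) - 1)

def tone_guess(samples) -> str:
    """Loud and busy -> 'excited'; otherwise 'calm'. Thresholds are illustrative."""
    if rms_energy(samples) > 0.3 and zero_crossing_rate(samples) > 0.2:
        return "excited"
    return "calm"
```

In the browser, the equivalent buffer would come from the Web Audio API's analyser node over the 5-second capture window.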
v2 — New
✍️

AI Lyric Generator

Choose a mood and music genre — the generative AI writes original song lyrics tailored exactly to your emotional state. Export them, share them, or use them as a playlist prompt.

Generative AI · GPT-style LLM · Multi-genre
v2 — New
🔀

Fusion Mode

The most powerful feature. Combine the four detection models simultaneously — face + text + voice + image — and the ensemble model votes on your dominant emotion for maximum accuracy.

Ensemble Model · Multi-modal Fusion · Weighted Voting
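Weighted soft-voting reduces to a few lines once every model emits a probability vector over the same emotions. A minimal sketch, assuming per-model reliability weights (the app derives these from historical accuracy):

```python
# Sketch of fusion mode: weighted soft-voting over per-model emotion
# probability vectors. Each active model contributes its vector scaled
# by a (normalised) reliability weight; the argmax wins.

EMOTIONS = ["happy", "sad", "angry", "surprised", "fearful", "disgusted", "neutral"]

def fuse(predictions: dict[str, list[float]], weights: dict[str, float]) -> str:
    """predictions: model name -> 7-dim probability vector in EMOTIONS order."""
    total = [0.0] * len(EMOTIONS)
    weight_sum = sum(weights[m] for m in predictions)  # only active models count
    for model, probs in predictions.items():
        w = weights[model] / weight_sum
        for i, p in enumerate(probs):
            total[i] += w * p
    return EMOTIONS[max(range(len(EMOTIONS)), key=total.__getitem__)]
```

With `fuse({"face": [0.7, 0.1, 0, 0, 0, 0, 0.2], "text": [0.2, 0.6, 0, 0, 0, 0, 0.2]}, {"face": 0.7, "text": 0.3})` the face model's higher weight carries the vote and the result is "happy"; flipping the weights flips the result to "sad".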
Try it live

Pick any model and experience emotion-based music personalisation in real time.

[Live demo panels: webcam capture with per-emotion confidence meters (Happy, Sad, Angry, Surprised, Fearful, Neutral); text analysis showing detected keywords, matched playlists, and a sample result of "Happy, 92% confidence, positive sentiment"; image upload (JPG, PNG, GIF, WebP) with visual mood analysis and recommended playlists; and a 5-second mic recording with detected audio features and matching playlists.]
Fusion mode combines all available signals. Activate the inputs you want to use, then run the fusion analysis. Each active model contributes a weighted vote.
1. Input Collection (Multi-modal): Face (webcam), text (typed), voice (mic), image (uploaded). Each signal is captured independently and preprocessed into an emotion probability vector.

2. Per-model Inference (Parallel): Each model runs its own emotion classification: CNN for face, VADER/BERT for text, spectral analysis for voice, colour histogram for image.

3. Ensemble Voting (Weighted): The ensemble layer applies learned weights to each model's output based on its historical accuracy, then soft-votes on the final emotion label.

4. Playlist Recommendation (Spotify API): The fused emotion label and confidence score are used to query Spotify's API with emotion-optimised parameters (valence, energy, danceability).
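The final step can be sketched as a lookup from emotion label to query parameters. `target_valence`, `target_energy`, and `target_danceability` are real parameters of Spotify's recommendations endpoint; the per-emotion values and the confidence-blending rule below are illustrative assumptions, not the app's tuned profile:

```python
# Sketch of the playlist step: map the fused emotion + confidence to
# Spotify recommendation parameters. Per-emotion targets are assumed
# values for illustration.

EMOTION_PROFILES = {
    "happy":   {"target_valence": 0.9, "target_energy": 0.8, "target_danceability": 0.8},
    "sad":     {"target_valence": 0.2, "target_energy": 0.3, "target_danceability": 0.3},
    "angry":   {"target_valence": 0.3, "target_energy": 0.9, "target_danceability": 0.5},
    "neutral": {"target_valence": 0.5, "target_energy": 0.5, "target_danceability": 0.5},
}

def playlist_query(emotion: str, confidence: float, genre: str = "pop") -> dict:
    """Build a query-parameter dict for a recommendations request."""
    params = dict(EMOTION_PROFILES.get(emotion, EMOTION_PROFILES["neutral"]))
    params["seed_genres"] = genre
    params["limit"] = 20
    # Low-confidence fusion results get pulled toward the neutral profile.
    if confidence < 0.5:
        for k in ("target_valence", "target_energy", "target_danceability"):
            params[k] = 0.5 + (params[k] - 0.5) * confidence
    return params
```

A confident "happy" result queries with full valence 0.9; at confidence 0.4 the same label is softened to valence 0.66.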

How it's built

A layered architecture where each modal pipeline feeds into a shared emotion space before playlist resolution.

🖥️

Frontend Layer

React.js SPA with modular panel components, real-time webcam stream, Web Audio API integration, and animated UI powered by CSS keyframes.

🧠

AI Model Layer

Five parallel emotion classifiers — CNN (face), VADER NLP (text), spectral ML (voice), colour ML (image), and a meta-ensemble for fusion mode.

🎵

Spotify API Layer

Emotion labels map to Spotify audio features: valence (happiness), energy, tempo, danceability. Playlist queries are dynamically constructed per emotion profile.

📊

Emotion Space

All five models output into a shared 7-dimensional emotion vector (joy, sadness, anger, fear, disgust, surprise, neutral) before playlist resolution.
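One way to realise this shared space, assuming each model emits raw scores over the same seven axes, is a softmax into a common probability vector (a sketch, not the app's exact normalisation):

```python
import math

# Sketch of the shared emotion space: raw per-axis scores from any
# model are softmaxed into one comparable 7-dim probability vector.

AXES = ["joy", "sadness", "anger", "fear", "disgust", "surprise", "neutral"]

def to_emotion_vector(raw_scores: dict[str, float]) -> dict[str, float]:
    """Map raw scores (missing axes default to 0) to a probability vector."""
    logits = [raw_scores.get(a, 0.0) for a in AXES]
    m = max(logits)                      # subtract max for numeric stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return {a: e / z for a, e in zip(AXES, exps)}
```

Every downstream consumer, fusion voting and playlist resolution alike, then works on the same normalised representation.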

✍️

Generative Layer

A GPT-style language model fine-tuned on lyric datasets generates mood-matched song lyrics using mood + genre + theme as conditioning inputs.

🔀

Fusion Ensemble

Weighted soft-voting across available model outputs. Confidence scores modulate weights so high-confidence models contribute more to the final label.

What's new in v2
| Feature                         | Model Type           | Midterm v1 | End Term v2 |
|---------------------------------|----------------------|------------|-------------|
| Facial Emotion Detection        | CNN / Face API       | ✓ Existed  | ✓ Improved  |
| Spotify Playlist Recommendation | Rule-based           | ✓ Existed  | ✓ Improved  |
| Futuristic Animated UI          | CSS / React          | ✓ Existed  | ✓ Enhanced  |
| Text Sentiment Analysis         | VADER NLP (NEW)      | –          | ✓ Added     |
| Image Mood Detection            | Vision ML (NEW)      | –          | ✓ Added     |
| Voice Tone Analysis             | Audio DSP (NEW)      | –          | ✓ Added     |
| AI Lyric Generator              | Generative LLM (NEW) | –          | ✓ Added     |
| Multimodal Fusion Mode          | Ensemble (NEW)       | –          | ✓ Added     |
| Per-model Confidence Scores     | NEW                  | –          | ✓ Added     |
| Real-time Audio Waveform        | Web Audio API (NEW)  | –          | ✓ Added     |