End Term Project — Multimodal AI

Feel the music.
5 AI models. One vibe.

MODIFY.AI now reads your face, your words, your voice, and your photos — and turns your emotions into the perfect Spotify playlist, every time.

5 AI Models · 7 Emotions Detected · Playlist Combos · Real-time Processing
5 ways to feel the music

Every model captures emotion differently. Use one or combine all five for the most accurate mood detection ever.

v1 — Existing
📷

Facial Emotion AI

Real-time webcam analysis detects 7 core emotions — happy, sad, angry, surprised, fearful, disgusted, neutral — using simulated deep-learning facial feature extraction.

React · Webcam API · Face Detection · Spotify API
v2 — New
💬

Text Sentiment NLP

Type how you feel — or paste a journal entry, tweet, or message. The NLP model reads linguistic patterns, tone, and keywords to map your text to an emotional state and playlist.

NLP · VADER Sentiment · Keyword Extraction · Emotion Mapping
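As a minimal, self-contained illustration of this step, the sketch below scores text against a tiny hand-made valence lexicon and maps the aggregate to an emotion label. It is a toy stand-in for VADER (the lexicon, thresholds, and labels here are assumptions for illustration; real VADER also handles negation, intensifiers, and emoji):

```python
# Toy stand-in for the text-sentiment step: score words against a tiny
# valence lexicon, average over the text, and bucket into an emotion.
# Lexicon values and thresholds are illustrative assumptions.

LEXICON = {
    "happy": 2.0, "love": 2.5, "great": 1.8, "joy": 2.2,
    "sad": -2.0, "lonely": -1.8, "terrible": -2.2,
    "angry": -2.5, "furious": -3.0,
}

def text_to_emotion(text: str) -> tuple[str, float]:
    """Return (emotion label, mean valence score) for a piece of text."""
    words = text.lower().split()
    hits = [LEXICON[w] for w in words if w in LEXICON]
    score = sum(hits) / len(words) if words else 0.0
    if score > 0.3:
        return "happy", score
    if score < -0.3:
        return "sad", score
    return "neutral", score
```

For example, `text_to_emotion("so sad and lonely tonight")` returns the label `"sad"` with a negative mean score.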
v2 — New
🖼️

Image Mood Detection

Upload any photo — a selfie, a landscape, a painting. The vision model analyses colour palette, scene composition, and visual cues to extract the emotional atmosphere of the image.

Computer Vision · Colour Analysis · Scene Recognition
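The colour-palette part of this analysis can be sketched in a few lines: reduce the image to an average colour, convert to HSV, and bucket hue/saturation/value into a mood. The thresholds below are illustrative assumptions, not the tuned model:

```python
import colorsys

# Heuristic sketch of colour-based mood detection: average the pixels,
# convert to HSV, and map brightness/saturation/hue to a mood bucket.

def average_rgb(pixels):
    """pixels: iterable of (r, g, b) tuples in 0..255. Returns 0..1 floats."""
    n = rs = gs = bs = 0
    for r, g, b in pixels:
        rs += r; gs += g; bs += b; n += 1
    return (rs / n / 255, gs / n / 255, bs / n / 255)

def colour_mood(pixels) -> str:
    h, s, v = colorsys.rgb_to_hsv(*average_rgb(pixels))
    if v < 0.25:
        return "sad"          # very dark scenes read as sombre
    if s < 0.15:
        return "neutral"      # washed-out / grey scenes
    if h < 0.17 or h > 0.92:
        return "happy"        # warm reds, oranges, yellows
    if 0.5 < h < 0.75:
        return "sad"          # cool blues
    return "neutral"
```

A warm yellow image (`[(255, 200, 0)]`) maps to "happy"; a deep blue one (`[(20, 80, 200)]`) maps to "sad".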
v2 — New
🎙️

Voice Tone Analysis

Speak into your mic for 5 seconds. The audio model analyses pitch, tempo, energy, and timbre to understand the emotional signature of your voice — no words needed.

Web Audio API · Pitch Detection · Prosody Analysis
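Two of the simplest prosody features, energy and zero-crossing rate, can be computed directly from a raw sample buffer. The sketch below is a simplified stand-in for the audio model (the thresholds and the two-way "excited"/"calm" mapping are assumptions; the described model adds pitch and tempo tracking on top of features like these):

```python
import math

# Sketch of the prosody step: from a mono sample buffer in [-1, 1],
# compute RMS energy (loudness) and zero-crossing rate (a rough
# brightness/pitch proxy), then make a crude tone guess.

def rms_energy(samples) -> float:
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_rate(samples) -> float:
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(samples) - 1)

def tone_guess(samples) -> str:
    """Loud and busy -> 'excited'; otherwise 'calm'. Thresholds are illustrative."""
    if rms_energy(samples) > 0.3 and zero_crossing_rate(samples) > 0.2:
        return "excited"
    return "calm"
```

In the browser, the equivalent buffer would come from the Web Audio API's analyser node over the 5-second capture window.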
v2 — New
✍️

AI Lyric Generator

Choose a mood and music genre — the generative AI writes original song lyrics tailored exactly to your emotional state. Export them, share them, or use them as a playlist prompt.

Generative AI · GPT-style LLM · Multi-genre
v2 — New
🔀

Fusion Mode

The most powerful feature. Combine the four detection models simultaneously — face + text + voice + image — and the ensemble model votes on your dominant emotion for maximum accuracy.

Ensemble Model · Multi-modal Fusion · Weighted Voting
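Weighted soft-voting reduces to a few lines once every model emits a probability vector over the same emotions. A minimal sketch, assuming per-model reliability weights (the app derives these from historical accuracy):

```python
# Sketch of fusion mode: weighted soft-voting over per-model emotion
# probability vectors. Each active model contributes its vector scaled
# by a (normalised) reliability weight; the argmax wins.

EMOTIONS = ["happy", "sad", "angry", "surprised", "fearful", "disgusted", "neutral"]

def fuse(predictions: dict[str, list[float]], weights: dict[str, float]) -> str:
    """predictions: model name -> 7-dim probability vector in EMOTIONS order."""
    total = [0.0] * len(EMOTIONS)
    weight_sum = sum(weights[m] for m in predictions)  # only active models count
    for model, probs in predictions.items():
        w = weights[model] / weight_sum
        for i, p in enumerate(probs):
            total[i] += w * p
    return EMOTIONS[max(range(len(EMOTIONS)), key=total.__getitem__)]
```

With `fuse({"face": [0.7, 0.1, 0, 0, 0, 0, 0.2], "text": [0.2, 0.6, 0, 0, 0, 0, 0.2]}, {"face": 0.7, "text": 0.3})` the face model's higher weight carries the vote and the result is "happy"; flipping the weights flips the result to "sad".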
Try it live

Pick any model and experience emotion-based music personalisation in real time.

[Live demo panels: webcam capture with per-emotion confidence meters (Happy, Sad, Angry, Surprised, Fearful, Neutral); text analysis showing detected keywords, matched playlists, and a sample result of "Happy, 92% confidence, positive sentiment"; image upload (JPG, PNG, GIF, WebP) with visual mood analysis and recommended playlists; and a 5-second mic recording with detected audio features and matching playlists.]
Fusion mode combines all available signals. Activate the inputs you want to use, then run the fusion analysis. Each active model contributes a weighted vote.
1. Input Collection (Multi-modal): Face (webcam), text (typed), voice (mic), image (uploaded). Each signal is captured independently and preprocessed into an emotion probability vector.

2. Per-model Inference (Parallel): Each model runs its own emotion classification: CNN for face, VADER/BERT for text, spectral analysis for voice, colour histogram for image.

3. Ensemble Voting (Weighted): The ensemble layer applies learned weights to each model's output based on its historical accuracy, then soft-votes on the final emotion label.

4. Playlist Recommendation (Spotify API): The fused emotion label and confidence score are used to query Spotify's API with emotion-optimised parameters (valence, energy, danceability).
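The final step can be sketched as a lookup from emotion label to query parameters. `target_valence`, `target_energy`, and `target_danceability` are real parameters of Spotify's recommendations endpoint; the per-emotion values and the confidence-blending rule below are illustrative assumptions, not the app's tuned profile:

```python
# Sketch of the playlist step: map the fused emotion + confidence to
# Spotify recommendation parameters. Per-emotion targets are assumed
# values for illustration.

EMOTION_PROFILES = {
    "happy":   {"target_valence": 0.9, "target_energy": 0.8, "target_danceability": 0.8},
    "sad":     {"target_valence": 0.2, "target_energy": 0.3, "target_danceability": 0.3},
    "angry":   {"target_valence": 0.3, "target_energy": 0.9, "target_danceability": 0.5},
    "neutral": {"target_valence": 0.5, "target_energy": 0.5, "target_danceability": 0.5},
}

def playlist_query(emotion: str, confidence: float, genre: str = "pop") -> dict:
    """Build a query-parameter dict for a recommendations request."""
    params = dict(EMOTION_PROFILES.get(emotion, EMOTION_PROFILES["neutral"]))
    params["seed_genres"] = genre
    params["limit"] = 20
    # Low-confidence fusion results get pulled toward the neutral profile.
    if confidence < 0.5:
        for k in ("target_valence", "target_energy", "target_danceability"):
            params[k] = 0.5 + (params[k] - 0.5) * confidence
    return params
```

A confident "happy" result queries with full valence 0.9; at confidence 0.4 the same label is softened to valence 0.66.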

How it's built

A layered architecture where each modal pipeline feeds into a shared emotion space before playlist resolution.

🖥️

Frontend Layer

React.js SPA with modular panel components, real-time webcam stream, Web Audio API integration, and animated UI powered by CSS keyframes.

🧠

AI Model Layer

Five parallel emotion classifiers — CNN (face), VADER NLP (text), spectral ML (voice), colour ML (image), and a meta-ensemble for fusion mode.

🎵

Spotify API Layer

Emotion labels map to Spotify audio features: valence (happiness), energy, tempo, danceability. Playlist queries are dynamically constructed per emotion profile.

📊

Emotion Space

All five models output into a shared 7-dimensional emotion vector (joy, sadness, anger, fear, disgust, surprise, neutral) before playlist resolution.
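One way to realise this shared space, assuming each model emits raw scores over the same seven axes, is a softmax into a common probability vector (a sketch, not the app's exact normalisation):

```python
import math

# Sketch of the shared emotion space: raw per-axis scores from any
# model are softmaxed into one comparable 7-dim probability vector.

AXES = ["joy", "sadness", "anger", "fear", "disgust", "surprise", "neutral"]

def to_emotion_vector(raw_scores: dict[str, float]) -> dict[str, float]:
    """Map raw scores (missing axes default to 0) to a probability vector."""
    logits = [raw_scores.get(a, 0.0) for a in AXES]
    m = max(logits)                      # subtract max for numeric stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return {a: e / z for a, e in zip(AXES, exps)}
```

Every downstream consumer, fusion voting and playlist resolution alike, then works on the same normalised representation.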

✍️

Generative Layer

A GPT-style language model fine-tuned on lyric datasets generates mood-matched song lyrics using mood + genre + theme as conditioning inputs.

🔀

Fusion Ensemble

Weighted soft-voting across available model outputs. Confidence scores modulate weights so high-confidence models contribute more to the final label.

What's new in v2
| Feature                         | Model Type           | Midterm v1 | End Term v2 |
|---------------------------------|----------------------|------------|-------------|
| Facial Emotion Detection        | CNN / Face API       | ✓ Existed  | ✓ Improved  |
| Spotify Playlist Recommendation | Rule-based           | ✓ Existed  | ✓ Improved  |
| Futuristic Animated UI          | CSS / React          | ✓ Existed  | ✓ Enhanced  |
| Text Sentiment Analysis         | VADER NLP (NEW)      | –          | ✓ Added     |
| Image Mood Detection            | Vision ML (NEW)      | –          | ✓ Added     |
| Voice Tone Analysis             | Audio DSP (NEW)      | –          | ✓ Added     |
| AI Lyric Generator              | Generative LLM (NEW) | –          | ✓ Added     |
| Multimodal Fusion Mode          | Ensemble (NEW)       | –          | ✓ Added     |
| Per-model Confidence Scores     | NEW                  | –          | ✓ Added     |
| Real-time Audio Waveform        | Web Audio API (NEW)  | –          | ✓ Added     |