The Challenge: Personalized Learning at Scale
At CSEWhy, we faced a classic ed-tech problem: how do you generate personalized mock tests for UPSC aspirants in real time?
UPSC preparation is notorious for its vast syllabus spanning history, geography, polity, economics, science, and current affairs. Each aspirant has different strengths and weaknesses, and they need targeted practice that adapts to their learning progress. Traditional static question banks simply don’t cut it.
The problem statement was clear:
Generate personalized mock tests for UPSC students at runtime that adapt to their learning patterns and knowledge gaps.
Why This Matters for Developers
Before diving into our solution, let me explain why this case study matters for the unprompt.dev community. This isn’t just about ed-tech; it’s about practical AI implementation in production environments. The challenges we faced and solutions we developed apply to any scenario where you need:
- Real-time content generation
- Domain-specific AI models
- Scalable personalization
- Quality control for AI-generated content
Our Solution Architecture
Phase 1: Foundation Model Selection
We started with GPT-3.5 Turbo as our foundation model. Why GPT-3.5 and not GPT-4?
// Cost-effectiveness calculation
const gpt35Cost = 0.002; // per 1K tokens
const gpt4Cost = 0.03; // per 1K tokens
const dailyQuestions = 10000;
const avgTokensPerQuestion = 150;
// Monthly cost comparison
const gpt35Monthly = (dailyQuestions * avgTokensPerQuestion * 30 * gpt35Cost) / 1000;
const gpt4Monthly = (dailyQuestions * avgTokensPerQuestion * 30 * gpt4Cost) / 1000;
console.log(`GPT-3.5: $${gpt35Monthly}`); // ~$90
console.log(`GPT-4: $${gpt4Monthly}`); // ~$1,350
For our use case, GPT-3.5’s quality was sufficient, and the cost difference was massive at scale.
Phase 2: Dataset Curation
This is where the magic happened. We didn’t just rely on the foundation model; we created our own curated dataset.
Step 1: Historical Data Collection
- Collected 15+ years of UPSC Prelims questions
- Organized by subject, year, and difficulty level
- Tagged questions by topic and subtopic
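To give a sense of what “tagged” meant in practice, here is a rough sketch of the record shape we aimed for per question. The field names are illustrative for this post, not our exact schema.
// Illustrative shape of a tagged historical question record
// (field names are assumptions, not the exact production schema)
interface HistoricalQuestion {
  id: string;
  year: number;              // Prelims year, e.g. 2017
  subject: string;           // History, Geography, Polity, Economics, Science, Current Affairs
  topic: string;             // e.g. "Fundamental Rights"
  subtopic?: string;
  difficulty: 'easy' | 'medium' | 'hard';
  stem: string;              // the question text
  options: [string, string, string, string];
  correctOption: 0 | 1 | 2 | 3;
  explanation?: string;
}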
Step 2: Pattern Analysis
# Simplified pattern analysis approach; each helper runs over the
# tagged historical question bank from Step 1
patterns = {
    'question_structure': analyze_question_formats(),       # stem styles and option phrasing
    'answer_distribution': analyze_option_patterns(),       # how correct answers spread across options
    'difficulty_progression': analyze_difficulty_curves(),  # difficulty mix within and across papers
    'topic_correlation': analyze_topic_relationships()      # topics that tend to appear together
}
Step 3: Synthetic Dataset Generation
Using our pattern analysis, we generated synthetic questions that maintained the authentic UPSC style while covering knowledge gaps in our historical dataset.
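To make that concrete, here is a rough sketch of how a generation prompt could be assembled from the pattern analysis plus a handful of real past questions. TopicGap, buildSyntheticPrompt, and the prompt wording are illustrative, not our production code.
// Sketch of assembling a synthetic-question prompt from an identified gap
// and a few real past questions used as style anchors
interface TopicGap {
  subject: string;
  topic: string;
  targetDifficulty: 'easy' | 'medium' | 'hard';
}

const buildSyntheticPrompt = (gap: TopicGap, exampleQuestions: string[]) => `
You are generating a UPSC Prelims-style multiple choice question.
Subject: ${gap.subject}. Topic: ${gap.topic}. Difficulty: ${gap.targetDifficulty}.
Match the structure and tone of these real past questions:
${exampleQuestions.join('\n---\n')}
Return the question, four options, the correct option, and a short explanation.`;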
Phase 3: Fine-tuning Strategy
Here’s where we moved from generic AI to domain-specific intelligence:
# Training Configuration
model: gpt-3.5-turbo
training_data:
- upsc_prelims_2009_2024.jsonl
- synthetic_questions_v2.jsonl
- explanations_dataset.jsonl
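For readers who haven’t fine-tuned gpt-3.5-turbo before: each line of those .jsonl files is a chat-formatted training example. The snippet below shows how one might be constructed; the system prompt, question, and output shape are illustrative, not our actual training data.
// Sketch of a single training example in OpenAI's chat fine-tuning JSONL format
const trainingExample = {
  messages: [
    { role: 'system', content: 'You generate UPSC Prelims questions with four options and an explanation.' },
    { role: 'user', content: 'Subject: Polity, Topic: Fundamental Rights, Difficulty: medium' },
    { role: 'assistant', content: JSON.stringify({
        question: 'Which Article of the Constitution deals with the Right to Constitutional Remedies?',
        options: ['Article 19', 'Article 21', 'Article 32', 'Article 44'],
        correct: 2,
        explanation: 'Article 32 guarantees the right to move the Supreme Court for enforcement of Fundamental Rights.'
      }) },
  ],
};

// Each example becomes one line of the .jsonl file
console.log(JSON.stringify(trainingExample));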
Our fine-tuned model learned to:
- Generate questions in authentic UPSC format
- Create plausible distractors (wrong options)
- Provide detailed explanations
- Maintain appropriate difficulty levels
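Concretely, we parse each completion into a structured question object. The shape below is an illustrative sketch (field names are assumptions for this post, not our exact production types); it is also the GeneratedQuestion referenced in the validation snippet further down.
// Illustrative shape the model's output is parsed into
interface GeneratedQuestion {
  question: string;
  options: [string, string, string, string];  // one correct answer, three plausible distractors
  correctOption: 0 | 1 | 2 | 3;
  explanation: string;
  difficulty: 'easy' | 'medium' | 'hard';
  subject: string;
  topic: string;
}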
The Results: Numbers That Matter
After fine-tuning, our model achieved:
- 88% expert approval rating from UPSC mentors
- 40% improvement in mock test engagement
- 25% increase in average study session duration
- 80% reduction in content creation time for our team
Technical Implementation Insights
Challenge 1: Quality Control
Problem: How do you ensure AI-generated questions meet UPSC standards?
Solution: Multi-layer validation pipeline
interface QuestionValidation {
  factualAccuracy: boolean;
  upscRelevance: number; // 0-1 score
  difficultyLevel: 'easy' | 'medium' | 'hard';
  subjectAlignment: boolean;
}
const validateQuestion = async (question: GeneratedQuestion): Promise<QuestionValidation> => {
  // Implementation details in the next blog post; stubbed so the snippet type-checks
  throw new Error('validateQuestion: covered in the next post');
};
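The full pipeline is for the next post, but as a rough sketch of how the validation result gates a question before it reaches students (the 0.7 relevance threshold is an illustrative value, not our production setting):
// Sketch of the publish/reject decision driven by the validation result
const shouldPublish = (v: QuestionValidation, targetDifficulty: QuestionValidation['difficultyLevel']) =>
  v.factualAccuracy &&
  v.subjectAlignment &&
  v.upscRelevance >= 0.7 &&
  v.difficultyLevel === targetDifficulty;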
Challenge 2: Personalization
Problem: How do you make each test unique to the user’s learning pattern?
Solution: Dynamic prompt engineering based on user analytics
const generatePersonalizedPrompt = (userProfile: UserAnalytics) => {
  return `Generate a UPSC Prelims question focusing on ${userProfile.weakSubjects.join(', ')}
with difficulty level ${userProfile.currentLevel}
avoiding topics: ${userProfile.masteredTopics.join(', ')}`;
};
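For context, UserAnalytics is roughly the per-user summary our analytics layer produces. The shape and the sample profile below are illustrative assumptions:
// Illustrative shape of the analytics object feeding the prompt above
interface UserAnalytics {
  weakSubjects: string[];     // e.g. subjects with low accuracy in recent tests
  masteredTopics: string[];   // topics to skip for this user
  currentLevel: 'easy' | 'medium' | 'hard';
}

const prompt = generatePersonalizedPrompt({
  weakSubjects: ['Economics', 'Environment'],
  masteredTopics: ['Fundamental Rights', 'Mughal Administration'],
  currentLevel: 'medium',
});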
Key Takeaways for Developers
- Start with Business Goals: We didn’t build AI for AI’s sake; we solved a specific user problem
- Domain Data is King: Generic models + domain-specific data > Powerful models alone
- Cost-Performance Balance: Sometimes the “best” model isn’t the right model for production
- Validation is Critical: AI-generated content needs rigorous quality controls
- User Analytics Drive Personalization: The AI is only as good as the data you feed it
What’s Next?
This was just the model training part. In our next blog post, I’ll dive into:
- Production deployment strategies: How we integrated this model into our React Native app
- Caching and optimization: Handling 10K+ daily requests efficiently
- Real-time personalization: Dynamic prompt engineering based on user behavior
- Monitoring and maintenance: Keeping AI quality high in production
The Bigger Picture
Building bhAi taught us that successful AI integration isn’t about using the latest model; it’s about understanding your domain, curating quality data, and solving real user problems.
This experience directly inspired unprompt.dev. Developers need practical, battle-tested approaches to AI implementation, not just theoretical knowledge.
Have you implemented AI in production? What challenges did you face with quality control and personalization? Share your experiences. I’d love to learn from your journey.
Next week: “From Model to Mobile: Deploying AI in React Native at Scale” - Stay tuned!