Mellifera: Agricultural ML Platform
Production ML classifiers for precision beekeeping with voice-first data collection and TensorFlow.js edge inference
The ML Problem
Agricultural Decision Support Under Uncertainty
Beekeeping presents a compelling ML challenge: sparse, noisy observations (inspections every 1-2 weeks), delayed feedback (winter survival validated 3-6 months later), and high-stakes decisions (wrong treatment timing can kill a colony). The problem structure mirrors clinical decision support.
Key challenges: class imbalance (most colonies survive), temporal dependencies (treatment history matters), geospatial variation (climate affects optimal timing), and the need for interpretable predictions.
Why this matters: Honeybees pollinate ~35% of global food crops. North American beekeepers lose 30-40% of colonies annually. Better prediction models directly impact food security by enabling targeted intervention on high-risk colonies.
Mite Treatment Timing Classifier
Binary classification with temporal and geospatial features
Feature Engineering
- ⢠Temporal: Day of year (cyclical sin/cos encoding), days since treatment
- ⢠Geospatial: Lat/lon (sin/cos encoded), elevationācaptures regional climate
- ⢠Time series: Mite count trajectory with weighted averaging by sample size
Model Output
- ⢠Primary: treat_now vs wait (binary)
- ⢠Auxiliary: Optimal treatment window
- ⢠Calibrated confidence via Platt scaling
Why Geospatial Features Matter
Optimal treatment timing varies by 2-4 weeks between southern Texas and northern Minnesota. Encoding latitude/longitude allows the model to learn regional patterns without explicit climate zone labels. Elevation further refines predictions: mountain apiaries have shorter seasons than valley locations.
Feature Encoding Mathematics
Cyclical Temporal Encoding:

$x_{\sin} = \sin(2\pi d / 365), \qquad x_{\cos} = \cos(2\pi d / 365)$

where $d$ = day of year. This encoding ensures December 31 is close to January 1 in feature space.
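The encoding can be sketched in TypeScript (function names are illustrative, not from the codebase):

```typescript
// Map day of year (1-365) onto the unit circle so the calendar wraps smoothly.
function encodeDayOfYear(day: number): [number, number] {
  const angle = (2 * Math.PI * day) / 365;
  return [Math.sin(angle), Math.cos(angle)];
}

// Euclidean distance between two days in the encoded feature space.
function encodedDistance(a: number, b: number): number {
  const [s1, c1] = encodeDayOfYear(a);
  const [s2, c2] = encodeDayOfYear(b);
  return Math.hypot(s1 - s2, c1 - c2);
}
```

Day 365 and day 1 land about 0.017 apart on the circle, while days six months apart are nearly distance 2, which is exactly the neighborhood structure a raw 1-365 feature fails to provide.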
Geospatial Encoding:

$[\,\sin\phi,\ \cos\phi,\ \sin\lambda,\ \cos\lambda,\ e/2000\,]$

where $\phi$ = latitude, $\lambda$ = longitude, $e$ = elevation (meters). Sin/cos encoding handles the wraparound at ±180° longitude.
Weighted Mite Load:

$\bar{m} = \dfrac{\sum_i \sqrt{n_i}\, m_i}{\sum_i \sqrt{n_i}}$

where $m_i$ = mite count at time $i$, $n_i$ = sample size. Square root weighting prevents large samples from dominating.
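A minimal sketch of this weighted average (the `weightedMiteLoad` helper name is illustrative; the classifier derives the inputs from `MiteFeatures`):

```typescript
// Weighted average of mite counts, down-weighting small wash samples.
// Each reading is weighted by sqrt(sampleSize), per the formula above.
function weightedMiteLoad(counts: number[], sampleSizes: number[]): number {
  let num = 0;
  let den = 0;
  for (let i = 0; i < counts.length; i++) {
    const w = Math.sqrt(sampleSizes[i]); // square-root weighting
    num += w * counts[i];
    den += w;
  }
  return den > 0 ? num / den : 0;
}
```

For example, a count of 2 from a 100-bee sample and a count of 4 from a 25-bee sample get weights 10 and 5, giving (10·2 + 5·4)/15 ≈ 2.67 rather than the unweighted mean of 3.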
// Mite Treatment Timing Classifier
// Binary classification with temporal and geospatial features

interface MiteFeatures {
  // Temporal features (derived from timestamps)
  dayOfYear: number;               // 1-365, cyclical encoding
  daysSinceLastTreatment: number;
  treatmentRoundsThisSeason: number;

  // Geospatial features (from apiary location)
  latitude: number;                // Affects season timing
  longitude: number;
  elevationMeters: number;

  // Mite trajectory (time series)
  miteCountHistory: number[];      // Last 5 readings
  miteCountSampleSizes: number[];  // Statistical confidence
  miteLoadTrend: number;           // Linear regression slope
}

class MiteTreatmentClassifier {
  private model: tf.LayersModel;   // Loaded TensorFlow.js model

  async predict(features: MiteFeatures): Promise<{
    recommendation: 'treat_now' | 'wait';
    confidence: number;
    treatmentWindow: { start: Date; end: Date };
  }> {
    // Encode geospatial features for regional patterns
    const geoEncoding = [
      Math.sin(features.latitude * Math.PI / 180),
      Math.cos(features.latitude * Math.PI / 180),
      Math.sin(features.longitude * Math.PI / 180),
      Math.cos(features.longitude * Math.PI / 180),
      features.elevationMeters / 2000
    ];

    // Cyclical encoding for day of year
    const temporalEncoding = [
      Math.sin(2 * Math.PI * features.dayOfYear / 365),
      Math.cos(2 * Math.PI * features.dayOfYear / 365),
      features.daysSinceLastTreatment / 90,
      features.treatmentRoundsThisSeason / 3
    ];

    const input = tf.tensor2d([[
      ...temporalEncoding,
      ...geoEncoding,
      this.calculateWeightedMiteLoad(features),
      features.miteLoadTrend
    ]]);

    const prediction = this.model.predict(input) as tf.Tensor;
    const [treatProb] = await prediction.data();
    input.dispose();      // Free tensor memory after inference
    prediction.dispose();

    return {
      recommendation: treatProb > 0.5 ? 'treat_now' : 'wait',
      confidence: this.calibrateConfidence(treatProb),
      treatmentWindow: this.decodeWindow(features)
    };
  }
}

Winter Survival Risk Model
Probabilistic prediction with uncertainty quantification
Feature Engineering
- ⢠Geospatial: Location determines winter severity (lat + elevation ā cold exposure)
- ⢠Temporal: Inspection date relative to first frost (location-dependent)
- ⢠Colony health: Mite load, queen age, population, brood pattern
- ⢠Resources: Honey frames, pollen stores, preparation status
Model Output
- ⢠Survival probability (calibrated via isotonic regression)
- ⢠95% confidence interval via bootstrap
- ⢠Risk factors with SHAP-like attribution
- ⢠Actionable recommendations
Handling Delayed Feedback
Ground truth arrives 3-6 months after prediction. We use temporal cross-validation (training on past seasons, validating on future ones) to prevent data leakage. The model outputs uncertainty estimates because fall predictions shouldn't claim false precision about spring outcomes.
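The season-based split can be sketched as follows, assuming each record carries a season label (the `SeasonRecord` shape and `temporalFolds` helper are illustrative, not the production schema):

```typescript
// Temporal cross-validation: each fold trains on strictly earlier seasons
// and validates on the next one, so no future data leaks into training.
interface SeasonRecord {
  season: number;      // e.g. 2022 for the 2022-23 winter
  features: number[];
  survived: boolean;
}

function temporalFolds(
  records: SeasonRecord[]
): { train: SeasonRecord[]; validate: SeasonRecord[] }[] {
  const seasons = Array.from(new Set(records.map(r => r.season))).sort((a, b) => a - b);
  const folds: { train: SeasonRecord[]; validate: SeasonRecord[] }[] = [];
  for (let i = 1; i < seasons.length; i++) {
    folds.push({
      train: records.filter(r => r.season < seasons[i]),     // only the past
      validate: records.filter(r => r.season === seasons[i]), // one future season
    });
  }
  return folds;
}
```

Unlike a random shuffle split, this mirrors deployment: the model is always scored on a winter it has never seen.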
Uncertainty Quantification
Isotonic Calibration:

$\hat{p} = g(p_{\text{raw}})$

where $g$ is a monotonic step function learned from validation data, ensuring $P(y = 1 \mid \hat{p} = p) = p$ (perfect calibration).
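At inference time, applying a fitted isotonic map reduces to a step-function lookup. A sketch, assuming the fitted steps are stored as sorted (threshold, value) pairs (the `IsotonicStep` shape is an assumption; the fitting itself, e.g. pool-adjacent-violators, happens offline on validation data):

```typescript
// A fitted isotonic calibration map: a non-decreasing step function
// represented as (threshold, calibratedValue) pairs sorted by threshold.
interface IsotonicStep {
  threshold: number;
  value: number;
}

// Return the calibrated value of the last step at or below rawProb.
function isotonicTransform(steps: IsotonicStep[], rawProb: number): number {
  let calibrated = steps[0].value;
  for (const step of steps) {
    if (rawProb >= step.threshold) calibrated = step.value;
    else break; // steps are sorted, so no later step can match
  }
  return calibrated;
}
```

Because the steps are non-decreasing, calibration can fix systematic over- or under-confidence without ever reordering predictions.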
Bootstrap Confidence Interval:

$\text{CI}_{95\%} = [\, q_{0.025},\ q_{0.975} \,]$

where $q_\alpha$ is the $\alpha$-quantile of bootstrap predictions. Each bootstrap sample resamples features with replacement.
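Once the B bootstrap predictions exist, the interval is just two empirical quantiles. A sketch of that percentile step (using a simple floor-index quantile approximation; the helper name is illustrative):

```typescript
// Percentile bootstrap CI from a set of model predictions, each produced by
// re-running the model on features resampled with replacement. This helper
// only computes the quantile bounds from those predictions.
function bootstrapCI(predictions: number[], level = 0.95): [number, number] {
  const sorted = [...predictions].sort((a, b) => a - b);
  // Simple floor-index empirical quantile, clamped to the last element.
  const q = (p: number) =>
    sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
  const alpha = 1 - level;
  return [q(alpha / 2), q(1 - alpha / 2)];
}
```

A wide interval from scattered bootstrap predictions is itself a useful signal: it tells the beekeeper the model is guessing, not just what it guessed.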
Risk Categorization:
Thresholds calibrated against historical colony loss rates in the training population.
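As one illustration, the threshold mapping might look like the sketch below; the cutoff values here are placeholders, not the calibrated production thresholds described above:

```typescript
type RiskLevel = 'low' | 'moderate' | 'high' | 'critical';

// Map a calibrated survival probability to a risk band. These cutoffs are
// hypothetical; real thresholds are fit to historical colony loss rates.
function categorizeRisk(survivalProb: number): RiskLevel {
  if (survivalProb >= 0.85) return 'low';
  if (survivalProb >= 0.65) return 'moderate';
  if (survivalProb >= 0.40) return 'high';
  return 'critical';
}
```

Discrete bands matter for the voice interface: "high risk" is actionable over TTS in a way a raw 0.47 is not.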
// Winter Survival Risk Model
// Probabilistic prediction with uncertainty quantification

interface WinterFeatures {
  // Geospatial context
  latitude: number;
  longitude: number;
  elevationMeters: number;

  // Temporal context
  inspectionDayOfYear: number;

  // Colony health metrics
  miteLoadPerHundred: number;
  queenAgeMonths: number;
  estimatedPopulation: number;

  // Resource assessment
  honeyFrames: number;
  pollenFrames: number;

  // Winter preparation
  isInsulated: boolean;
  ventilationScore: 1 | 2 | 3 | 4 | 5;
}

class WinterSurvivalPredictor {
  private model: tf.LayersModel;                               // Loaded TensorFlow.js model
  private calibrationModel: { transform(p: number): number };  // Fitted isotonic map

  async predict(features: WinterFeatures): Promise<{
    survivalProbability: number;
    confidenceInterval: [number, number];
    riskLevel: 'low' | 'moderate' | 'high' | 'critical';
    riskFactors: string[];
  }> {
    const encoded = this.encodeFeatures(features);
    const rawPred = this.model.predict(encoded) as tf.Tensor;
    const [rawProb] = await rawPred.data();
    encoded.dispose();  // Free tensor memory after inference
    rawPred.dispose();

    // Calibrate with isotonic regression
    const calibratedProb = this.calibrationModel.transform(rawProb);

    // Bootstrap confidence interval
    const ci = await this.bootstrapCI(features, 0.95);

    return {
      survivalProbability: calibratedProb,
      confidenceInterval: ci,
      riskLevel: this.categorizeRisk(calibratedProb),
      riskFactors: this.attributeRisk(features)
    };
  }
}

Feeding Recommendation System
Multi-output: feed type (classification) + amount (regression)
Feature Engineering
- ⢠Temporal: Day of year (cyclical), days since last feed
- ⢠Geospatial: Lat/lon for regional nectar flow patterns
- ⢠Environmental: 7-day temp history + forecast from weather API
- ⢠Colony state: Honey stores, brood pattern, nectar flow estimate
Model Output
- ⢠Feed type: 1:1 syrup, 2:1 syrup, candy, pollen sub, none
- ⢠Amount: Pounds or quarts (regression)
- ⢠Urgency: immediate / soon / optional / not_needed
- ⢠Next assessment date
Immediate Feedback Loop
Unlike winter survival, feeding recommendations get rapid feedback: the colony's response is visible within days. This enables faster model iteration and higher confidence. The system learns colony-specific preferences and adjusts recommendations based on historical response patterns.
Multi-Output Architecture
Shared Feature Encoder:

$h = f_{\text{enc}}(x_t, x_g, x_c)$

where $x_t$ = temporal features, $x_g$ = geospatial features, $x_c$ = colony state features.

Classification Head (Feed Type):

$P(\text{type}) = \text{softmax}(W_c h + b_c)$

Softmax over 5 classes: 1:1 syrup, 2:1 syrup, candy, pollen substitute, none.

Regression Head (Amount):

$\hat{a} = \text{ReLU}(w_r^\top h + b_r)$

ReLU activation ensures non-negative amount predictions. Output in pounds/quarts.
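The two heads can be sketched as a plain-TypeScript forward pass over a shared encoding h (weights, biases, and function names here are illustrative placeholders; the production heads run in TensorFlow.js):

```typescript
// Numerically stable softmax: subtract the max logit before exponentiating.
function softmax(logits: number[]): number[] {
  const m = Math.max(...logits);
  const exps = logits.map(x => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

function relu(x: number): number {
  return Math.max(0, x);
}

// h: shared encoder output. Wc/bc: per-class weights and biases for the
// classification head. wr/br: weights and bias for the regression head.
function multiOutputHeads(
  h: number[],
  Wc: number[][],
  bc: number[],
  wr: number[],
  br: number
): { typeProbs: number[]; amount: number } {
  const dot = (w: number[]) => w.reduce((s, wi, i) => s + wi * h[i], 0);
  const typeProbs = softmax(Wc.map((row, k) => dot(row) + bc[k])); // feed type
  const amount = relu(dot(wr) + br);                               // non-negative pounds/quarts
  return { typeProbs, amount };
}
```

Sharing the encoder lets the scarce amount labels benefit from the feature representation learned on the more plentiful feed-type labels.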
// Feeding Recommendation System
// Multi-output: feed type (classification) + amount (regression)

interface FeedingFeatures {
  // Temporal features
  dayOfYear: number;
  daysSinceLastFeed: number;

  // Geospatial features
  latitude: number;
  longitude: number;

  // Environmental (from weather API)
  avgTempLast7Days: number;
  forecastTempNext7Days: number;

  // Colony state
  honeyStores: number;
  broodPattern: 'expanding' | 'stable' | 'contracting';
  estimatedNectarFlow: 'none' | 'light' | 'moderate' | 'heavy';
}

class FeedingRecommender {
  async recommend(features: FeedingFeatures): Promise<{
    feedType: 'syrup_1_1' | 'syrup_2_1' | 'candy' | 'pollen_sub' | 'none';
    amount: number;
    urgency: 'immediate' | 'soon' | 'optional' | 'not_needed';
    nextAssessmentDate: Date;
  }> {
    // Cyclical temporal encoding
    const temporal = [
      Math.sin(2 * Math.PI * features.dayOfYear / 365),
      Math.cos(2 * Math.PI * features.dayOfYear / 365),
      features.daysSinceLastFeed / 30
    ];

    // Geospatial for regional nectar flow patterns
    const geo = this.encodeLocation(features.latitude, features.longitude);

    const encoded = tf.tensor2d([[
      ...temporal, ...geo,
      (features.avgTempLast7Days - 50) / 30,
      features.honeyStores / 10,
      this.encodeNectarFlow(features.estimatedNectarFlow)
    ]]);

    // Run both heads on the shared encoding
    const [typeProbs, amount] = await Promise.all([
      (this.typeModel.predict(encoded) as tf.Tensor).data(),
      (this.amountModel.predict(encoded) as tf.Tensor).data()
    ]);
    encoded.dispose();  // Free tensor memory after inference

    return {
      feedType: this.decodeFeedType(typeProbs),
      amount: Math.round(amount[0] * 10) / 10,
      urgency: this.calculateUrgency(features),
      nextAssessmentDate: this.calculateNextCheck(features)
    };
  }
}

Voice-First Data Collection
ML models are only as good as their training data. Mellifera solves the data collection problem with a bidirectional voice interface: beekeepers speak observations while working, and the system speaks back confirmations and ML-generated recommendations.
Traditional Data Entry
- ⢠Remove gloves to use phone
- ⢠Risk stings on exposed hands
- ⢠Squint at screen in sunlight
- ⢠~8 minutes per hive
Voice with Mellifera
- ⢠Keep gloves on, stay protected
- ⢠Speak naturally while working
- ⢠TTS speaks ML predictions back
- ⢠Fully hands-free
// Voice Command Processing with LLM-powered NLU
router.post('/voice-command', auth, async (req, res) => {
  const { audioTranscript, context } = req.body;

  const systemPrompt = `Extract structured observations from voice notes.
Context: Apiary at (${context.lat}, ${context.lon}), Hive: ${context.hiveName}

Extract if mentioned:
- Mite count and sample size
- Population estimate (frames of bees)
- Brood pattern, honey stores, queen status
- Treatments applied, feeding done

Return JSON with extracted fields and confidence scores.`;

  const extraction = await llm.complete({
    system: systemPrompt,
    user: audioTranscript,
    responseFormat: 'json'
  });

  // Trigger ML predictions with new data
  const predictions = await runMLPredictions(context.hiveId);

  res.json({ extraction, predictions });
});

System Architecture
Full-stack MERN application with voice interface, LLM-powered NLU, and ML prediction API. Containerized with Docker and deployable to Kubernetes.
Data Model & API
MongoDB schema with full REST API documented via Swagger. The data model captures the natural hierarchy of beekeeping operations.
Core Entities
- ⢠Users - Auth, preferences, OAuth
- ⢠Apiaries - Locations, metadata
- ⢠Hives - Individual colony tracking
- ⢠Boxes - Hive body components
Activity Records
- ⢠Inspections - Observations, scores
- ⢠Treatments - Mite control, medications
- ⢠Feedings - Supplemental nutrition
- ⢠Queens - Lineage, performance
API Design: Full REST API with Swagger documentation. JWT authentication with Google OAuth support. Specialized endpoints for voice commands, ML predictions, and aggregate reporting.
Designed for Real Field Conditions
Most software assumes a user sitting at a desk with keyboard and mouse. Beekeepers work in protective suits with thick leather gloves, mesh veils obscuring their vision, surrounded by thousands of stinging insects, often in direct sunlight that makes screens unreadable.
Traditional Data Entry
- ⢠Remove gloves to use phone
- ⢠Risk stings on exposed hands
- ⢠Squint at screen in sunlight
- ⢠Navigate complex forms
- ⢠~8 minutes per hive
Bidirectional Voice with Mellifera
- ⢠Keep gloves on, stay protected
- ⢠Speak naturally while working
- ⢠TTS speaks responses back
- ⢠LLM extracts structured data
- ⢠Fully hands-free operation
The UX insight: Voice interfaces aren't just convenient; they're sometimes the only viable interface. This project taught me to start with user constraints, not technology capabilities. The same principle applies to clinical settings: surgeons can't touch keyboards mid-procedure, and nurses have their hands full with patients.