Mellifera: Agricultural ML Platform
Production ML classifiers for precision beekeeping with voice-first data collection and TensorFlow.js edge inference
The ML Problem
Agricultural Decision Support Under Uncertainty
Beekeeping presents a compelling ML challenge: sparse, noisy observations (inspections every 1-2 weeks), delayed feedback (winter survival validated 3-6 months later), and high-stakes decisions (wrong treatment timing can kill a colony). The problem structure mirrors clinical decision support.
Key challenges: class imbalance (most colonies survive), temporal dependencies (treatment history matters), geospatial variation (climate affects optimal timing), and the need for interpretable predictions.
Why this matters: Honeybees pollinate ~35% of global food crops. North American beekeepers lose 30-40% of colonies annually. Better prediction models directly impact food security by enabling targeted intervention on high-risk colonies.
Mite Treatment Timing Classifier
Binary classification with temporal and geospatial features
Feature Engineering
- ⢠Temporal: Day of year (cyclical sin/cos encoding), days since treatment
- ⢠Geospatial: Lat/lon (sin/cos encoded), elevationācaptures regional climate
- ⢠Time series: Mite count trajectory with weighted averaging by sample size
Model Output
- ⢠Primary: treat_now vs wait (binary)
- ⢠Auxiliary: Optimal treatment window
- ⢠Calibrated confidence via Platt scaling
Why Geospatial Features Matter
Optimal treatment timing varies by 2-4 weeks between southern Texas and northern Minnesota. Encoding latitude/longitude allows the model to learn regional patterns without explicit climate zone labels. Elevation further refines predictions: mountain apiaries have shorter seasons than valley locations.
Feature Encoding Mathematics
Cyclical Temporal Encoding:

$x_{\sin} = \sin(2\pi d / 365), \qquad x_{\cos} = \cos(2\pi d / 365)$

where $d$ = day of year. This encoding ensures December 31 is close to January 1 in feature space.
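The encoding can be sketched in TypeScript (function names are illustrative, not from the codebase):

```typescript
// Map day of year (1-365) onto the unit circle so the calendar wraps smoothly.
function encodeDayOfYear(day: number): [number, number] {
  const angle = (2 * Math.PI * day) / 365;
  return [Math.sin(angle), Math.cos(angle)];
}

// Euclidean distance between two days in the encoded feature space.
function encodedDistance(a: number, b: number): number {
  const [s1, c1] = encodeDayOfYear(a);
  const [s2, c2] = encodeDayOfYear(b);
  return Math.hypot(s1 - s2, c1 - c2);
}
```

Day 365 and day 1 land about 0.017 apart on the circle, while days six months apart are nearly distance 2, which is exactly the neighborhood structure a raw 1-365 feature fails to provide.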
Geospatial Encoding:

$[\,\sin\phi,\ \cos\phi,\ \sin\lambda,\ \cos\lambda,\ e/2000\,]$

where $\phi$ = latitude, $\lambda$ = longitude, $e$ = elevation (meters). Sin/cos encoding handles the wraparound at ±180° longitude.
Weighted Mite Load:

$\bar{m} = \dfrac{\sum_i \sqrt{n_i}\, m_i}{\sum_i \sqrt{n_i}}$

where $m_i$ = mite count at time $i$, $n_i$ = sample size. Square root weighting prevents large samples from dominating.
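A minimal sketch of this weighted average (the `weightedMiteLoad` helper name is illustrative; the classifier derives the inputs from `MiteFeatures`):

```typescript
// Weighted average of mite counts, down-weighting small wash samples.
// Each reading is weighted by sqrt(sampleSize), per the formula above.
function weightedMiteLoad(counts: number[], sampleSizes: number[]): number {
  let num = 0;
  let den = 0;
  for (let i = 0; i < counts.length; i++) {
    const w = Math.sqrt(sampleSizes[i]); // square-root weighting
    num += w * counts[i];
    den += w;
  }
  return den > 0 ? num / den : 0;
}
```

For example, a count of 2 from a 100-bee sample and a count of 4 from a 25-bee sample get weights 10 and 5, giving (10·2 + 5·4)/15 ≈ 2.67 rather than the unweighted mean of 3.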
// Mite Treatment Timing Classifier
// Binary classification with temporal and geospatial features

interface MiteFeatures {
  // Temporal features (derived from timestamps)
  dayOfYear: number;               // 1-365, cyclical encoding
  daysSinceLastTreatment: number;
  treatmentRoundsThisSeason: number;

  // Geospatial features (from apiary location)
  latitude: number;                // Affects season timing
  longitude: number;
  elevationMeters: number;

  // Mite trajectory (time series)
  miteCountHistory: number[];      // Last 5 readings
  miteCountSampleSizes: number[];  // Statistical confidence
  miteLoadTrend: number;           // Linear regression slope
}

class MiteTreatmentClassifier {
  private model: tf.LayersModel;   // Loaded TensorFlow.js model

  async predict(features: MiteFeatures): Promise<{
    recommendation: 'treat_now' | 'wait';
    confidence: number;
    treatmentWindow: { start: Date; end: Date };
  }> {
    // Encode geospatial features for regional patterns
    const geoEncoding = [
      Math.sin(features.latitude * Math.PI / 180),
      Math.cos(features.latitude * Math.PI / 180),
      Math.sin(features.longitude * Math.PI / 180),
      Math.cos(features.longitude * Math.PI / 180),
      features.elevationMeters / 2000
    ];

    // Cyclical encoding for day of year
    const temporalEncoding = [
      Math.sin(2 * Math.PI * features.dayOfYear / 365),
      Math.cos(2 * Math.PI * features.dayOfYear / 365),
      features.daysSinceLastTreatment / 90,
      features.treatmentRoundsThisSeason / 3
    ];

    const input = tf.tensor2d([[
      ...temporalEncoding,
      ...geoEncoding,
      this.calculateWeightedMiteLoad(features),
      features.miteLoadTrend
    ]]);

    const prediction = this.model.predict(input) as tf.Tensor;
    const [treatProb] = await prediction.data();
    input.dispose();      // Free tensor memory after inference
    prediction.dispose();

    return {
      recommendation: treatProb > 0.5 ? 'treat_now' : 'wait',
      confidence: this.calibrateConfidence(treatProb),
      treatmentWindow: this.decodeWindow(features)
    };
  }
}

Winter Survival Risk Model
Probabilistic prediction with uncertainty quantification
Feature Engineering
- ⢠Geospatial: Location determines winter severity (lat + elevation ā cold exposure)
- ⢠Temporal: Inspection date relative to first frost (location-dependent)
- ⢠Colony health: Mite load, queen age, population, brood pattern
- ⢠Resources: Honey frames, pollen stores, preparation status
Model Output
- ⢠Survival probability (calibrated via isotonic regression)
- ⢠95% confidence interval via bootstrap
- ⢠Risk factors with SHAP-like attribution
- ⢠Actionable recommendations
Handling Delayed Feedback
Ground truth arrives 3-6 months after prediction. We use temporal cross-validation (training on past seasons, validating on future ones) to prevent data leakage. The model outputs uncertainty estimates because fall predictions shouldn't claim false precision about spring outcomes.
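The season-based split can be sketched as follows, assuming each record carries a season label (the `SeasonRecord` shape and `temporalFolds` helper are illustrative, not the production schema):

```typescript
// Temporal cross-validation: each fold trains on strictly earlier seasons
// and validates on the next one, so no future data leaks into training.
interface SeasonRecord {
  season: number;      // e.g. 2022 for the 2022-23 winter
  features: number[];
  survived: boolean;
}

function temporalFolds(
  records: SeasonRecord[]
): { train: SeasonRecord[]; validate: SeasonRecord[] }[] {
  const seasons = Array.from(new Set(records.map(r => r.season))).sort((a, b) => a - b);
  const folds: { train: SeasonRecord[]; validate: SeasonRecord[] }[] = [];
  for (let i = 1; i < seasons.length; i++) {
    folds.push({
      train: records.filter(r => r.season < seasons[i]),     // only the past
      validate: records.filter(r => r.season === seasons[i]), // one future season
    });
  }
  return folds;
}
```

Unlike a random shuffle split, this mirrors deployment: the model is always scored on a winter it has never seen.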
Uncertainty Quantification
Isotonic Calibration:

$\hat{p} = g(p_{\text{raw}})$

where $g$ is a monotonic step function learned from validation data, ensuring $P(y = 1 \mid \hat{p} = p) = p$ (perfect calibration).
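At inference time, applying a fitted isotonic map reduces to a step-function lookup. A sketch, assuming the fitted steps are stored as sorted (threshold, value) pairs (the `IsotonicStep` shape is an assumption; the fitting itself, e.g. pool-adjacent-violators, happens offline on validation data):

```typescript
// A fitted isotonic calibration map: a non-decreasing step function
// represented as (threshold, calibratedValue) pairs sorted by threshold.
interface IsotonicStep {
  threshold: number;
  value: number;
}

// Return the calibrated value of the last step at or below rawProb.
function isotonicTransform(steps: IsotonicStep[], rawProb: number): number {
  let calibrated = steps[0].value;
  for (const step of steps) {
    if (rawProb >= step.threshold) calibrated = step.value;
    else break; // steps are sorted, so no later step can match
  }
  return calibrated;
}
```

Because the steps are non-decreasing, calibration can fix systematic over- or under-confidence without ever reordering predictions.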
Bootstrap Confidence Interval:

$\text{CI}_{95\%} = [\, q_{0.025},\ q_{0.975} \,]$

where $q_\alpha$ is the $\alpha$-quantile of bootstrap predictions. Each bootstrap sample resamples features with replacement.
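Once the B bootstrap predictions exist, the interval is just two empirical quantiles. A sketch of that percentile step (using a simple floor-index quantile approximation; the helper name is illustrative):

```typescript
// Percentile bootstrap CI from a set of model predictions, each produced by
// re-running the model on features resampled with replacement. This helper
// only computes the quantile bounds from those predictions.
function bootstrapCI(predictions: number[], level = 0.95): [number, number] {
  const sorted = [...predictions].sort((a, b) => a - b);
  // Simple floor-index empirical quantile, clamped to the last element.
  const q = (p: number) =>
    sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
  const alpha = 1 - level;
  return [q(alpha / 2), q(1 - alpha / 2)];
}
```

A wide interval from scattered bootstrap predictions is itself a useful signal: it tells the beekeeper the model is guessing, not just what it guessed.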
Risk Categorization:
Thresholds calibrated against historical colony loss rates in the training population.
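As one illustration, the threshold mapping might look like the sketch below; the cutoff values here are placeholders, not the calibrated production thresholds described above:

```typescript
type RiskLevel = 'low' | 'moderate' | 'high' | 'critical';

// Map a calibrated survival probability to a risk band. These cutoffs are
// hypothetical; real thresholds are fit to historical colony loss rates.
function categorizeRisk(survivalProb: number): RiskLevel {
  if (survivalProb >= 0.85) return 'low';
  if (survivalProb >= 0.65) return 'moderate';
  if (survivalProb >= 0.40) return 'high';
  return 'critical';
}
```

Discrete bands matter for the voice interface: "high risk" is actionable over TTS in a way a raw 0.47 is not.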
// Winter Survival Risk Model
// Probabilistic prediction with uncertainty quantification

interface WinterFeatures {
  // Geospatial context
  latitude: number;
  longitude: number;
  elevationMeters: number;

  // Temporal context
  inspectionDayOfYear: number;

  // Colony health metrics
  miteLoadPerHundred: number;
  queenAgeMonths: number;
  estimatedPopulation: number;

  // Resource assessment
  honeyFrames: number;
  pollenFrames: number;

  // Winter preparation
  isInsulated: boolean;
  ventilationScore: 1 | 2 | 3 | 4 | 5;
}

class WinterSurvivalPredictor {
  private model: tf.LayersModel;                               // Loaded TensorFlow.js model
  private calibrationModel: { transform(p: number): number };  // Fitted isotonic map

  async predict(features: WinterFeatures): Promise<{
    survivalProbability: number;
    confidenceInterval: [number, number];
    riskLevel: 'low' | 'moderate' | 'high' | 'critical';
    riskFactors: string[];
  }> {
    const encoded = this.encodeFeatures(features);
    const rawPred = this.model.predict(encoded) as tf.Tensor;
    const [rawProb] = await rawPred.data();
    encoded.dispose();  // Free tensor memory after inference
    rawPred.dispose();

    // Calibrate with isotonic regression
    const calibratedProb = this.calibrationModel.transform(rawProb);

    // Bootstrap confidence interval
    const ci = await this.bootstrapCI(features, 0.95);

    return {
      survivalProbability: calibratedProb,
      confidenceInterval: ci,
      riskLevel: this.categorizeRisk(calibratedProb),
      riskFactors: this.attributeRisk(features)
    };
  }
}

Feeding Recommendation System
Multi-output: feed type (classification) + amount (regression)
Feature Engineering
- ⢠Temporal: Day of year (cyclical), days since last feed
- ⢠Geospatial: Lat/lon for regional nectar flow patterns
- ⢠Environmental: 7-day temp history + forecast from weather API
- ⢠Colony state: Honey stores, brood pattern, nectar flow estimate
Model Output
- ⢠Feed type: 1:1 syrup, 2:1 syrup, candy, pollen sub, none
- ⢠Amount: Pounds or quarts (regression)
- ⢠Urgency: immediate / soon / optional / not_needed
- ⢠Next assessment date
Immediate Feedback Loop
Unlike winter survival, feeding recommendations get rapid feedback: the colony's response is visible within days. This enables faster model iteration and higher confidence. The system learns colony-specific preferences and adjusts recommendations based on historical response patterns.
Multi-Output Architecture
Shared Feature Encoder:

$h = f_{\text{enc}}(x_t, x_g, x_c)$

where $x_t$ = temporal features, $x_g$ = geospatial features, $x_c$ = colony state features.

Classification Head (Feed Type):

$P(\text{type}) = \text{softmax}(W_c h + b_c)$

Softmax over 5 classes: 1:1 syrup, 2:1 syrup, candy, pollen substitute, none.

Regression Head (Amount):

$\hat{a} = \text{ReLU}(w_r^\top h + b_r)$

ReLU activation ensures non-negative amount predictions. Output in pounds/quarts.
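The two heads can be sketched as a plain-TypeScript forward pass over a shared encoding h (weights, biases, and function names here are illustrative placeholders; the production heads run in TensorFlow.js):

```typescript
// Numerically stable softmax: subtract the max logit before exponentiating.
function softmax(logits: number[]): number[] {
  const m = Math.max(...logits);
  const exps = logits.map(x => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

function relu(x: number): number {
  return Math.max(0, x);
}

// h: shared encoder output. Wc/bc: per-class weights and biases for the
// classification head. wr/br: weights and bias for the regression head.
function multiOutputHeads(
  h: number[],
  Wc: number[][],
  bc: number[],
  wr: number[],
  br: number
): { typeProbs: number[]; amount: number } {
  const dot = (w: number[]) => w.reduce((s, wi, i) => s + wi * h[i], 0);
  const typeProbs = softmax(Wc.map((row, k) => dot(row) + bc[k])); // feed type
  const amount = relu(dot(wr) + br);                               // non-negative pounds/quarts
  return { typeProbs, amount };
}
```

Sharing the encoder lets the scarce amount labels benefit from the feature representation learned on the more plentiful feed-type labels.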
// Feeding Recommendation System
// Multi-output: feed type (classification) + amount (regression)

interface FeedingFeatures {
  // Temporal features
  dayOfYear: number;
  daysSinceLastFeed: number;

  // Geospatial features
  latitude: number;
  longitude: number;

  // Environmental (from weather API)
  avgTempLast7Days: number;
  forecastTempNext7Days: number;

  // Colony state
  honeyStores: number;
  broodPattern: 'expanding' | 'stable' | 'contracting';
  estimatedNectarFlow: 'none' | 'light' | 'moderate' | 'heavy';
}

class FeedingRecommender {
  async recommend(features: FeedingFeatures): Promise<{
    feedType: 'syrup_1_1' | 'syrup_2_1' | 'candy' | 'pollen_sub' | 'none';
    amount: number;
    urgency: 'immediate' | 'soon' | 'optional' | 'not_needed';
    nextAssessmentDate: Date;
  }> {
    // Cyclical temporal encoding
    const temporal = [
      Math.sin(2 * Math.PI * features.dayOfYear / 365),
      Math.cos(2 * Math.PI * features.dayOfYear / 365),
      features.daysSinceLastFeed / 30
    ];

    // Geospatial for regional nectar flow patterns
    const geo = this.encodeLocation(features.latitude, features.longitude);

    const encoded = tf.tensor2d([[
      ...temporal, ...geo,
      (features.avgTempLast7Days - 50) / 30,
      features.honeyStores / 10,
      this.encodeNectarFlow(features.estimatedNectarFlow)
    ]]);

    // Run both heads on the shared encoding
    const [typeProbs, amount] = await Promise.all([
      (this.typeModel.predict(encoded) as tf.Tensor).data(),
      (this.amountModel.predict(encoded) as tf.Tensor).data()
    ]);
    encoded.dispose();  // Free tensor memory after inference

    return {
      feedType: this.decodeFeedType(typeProbs),
      amount: Math.round(amount[0] * 10) / 10,
      urgency: this.calculateUrgency(features),
      nextAssessmentDate: this.calculateNextCheck(features)
    };
  }
}

Voice-First Data Collection
ML models are only as good as their training data. Mellifera solves the data collection problem with a bidirectional voice interface: beekeepers speak observations while working, and the system speaks back confirmations and ML-generated recommendations.
Traditional Data Entry
- ⢠Remove gloves to use phone
- ⢠Risk stings on exposed hands
- ⢠Squint at screen in sunlight
- ⢠~8 minutes per hive
Voice with Mellifera
- ⢠Keep gloves on, stay protected
- ⢠Speak naturally while working
- ⢠TTS speaks ML predictions back
- ⢠Fully hands-free
// Voice Command Processing with LLM-powered NLU
router.post('/voice-command', auth, async (req, res) => {
  const { audioTranscript, context } = req.body;

  const systemPrompt = `Extract structured observations from voice notes.
Context: Apiary at (${context.lat}, ${context.lon}), Hive: ${context.hiveName}

Extract if mentioned:
- Mite count and sample size
- Population estimate (frames of bees)
- Brood pattern, honey stores, queen status
- Treatments applied, feeding done

Return JSON with extracted fields and confidence scores.`;

  const extraction = await llm.complete({
    system: systemPrompt,
    user: audioTranscript,
    responseFormat: 'json'
  });

  // Trigger ML predictions with new data
  const predictions = await runMLPredictions(context.hiveId);

  res.json({ extraction, predictions });
});

System Architecture
Full-stack MERN application with voice interface, LLM-powered NLU, and ML prediction API. Containerized with Docker and deployable to Kubernetes.
Data Model & API
MongoDB schema with full REST API documented via Swagger. The data model captures the natural hierarchy of beekeeping operations.
Core Entities
- ⢠Users - Auth, preferences, OAuth
- ⢠Apiaries - Locations, metadata
- ⢠Hives - Individual colony tracking
- ⢠Boxes - Hive body components
Activity Records
- ⢠Inspections - Observations, scores
- ⢠Treatments - Mite control, medications
- ⢠Feedings - Supplemental nutrition
- ⢠Queens - Lineage, performance
API Design: Full REST API with Swagger documentation. JWT authentication with Google OAuth support. Specialized endpoints for voice commands, ML predictions, and aggregate reporting.
Designed for Real Field Conditions
Most software assumes a user sitting at a desk with keyboard and mouse. Beekeepers work in protective suits with thick leather gloves, mesh veils obscuring their vision, surrounded by thousands of stinging insects, often in direct sunlight that makes screens unreadable.
Traditional Data Entry
- ⢠Remove gloves to use phone
- ⢠Risk stings on exposed hands
- ⢠Squint at screen in sunlight
- ⢠Navigate complex forms
- ⢠~8 minutes per hive
Bidirectional Voice with Mellifera
- ⢠Keep gloves on, stay protected
- ⢠Speak naturally while working
- ⢠TTS speaks responses back
- ⢠LLM extracts structured data
- ⢠Fully hands-free operation
The UX insight: Voice interfaces aren't just convenient; they're sometimes the only viable interface. This project taught me to start with user constraints, not technology capabilities. The same principle applies to clinical settings: surgeons can't touch keyboards mid-procedure, and nurses have their hands full with patients.