Overview

The quality of your conditions directly impacts detection accuracy. This guide covers how to write conditions that work well with vision models.

The Golden Rule: Yes/No Questions

Your condition must be a yes/no question. The vision-language model (VLM) will answer “yes” or “no” based on what it sees in the frame.

Good (yes/no questions):
  • “Is it snowing?”
  • “Are there people visible?”
  • “Is there a traffic accident?”
  • “Is the street wet?”
Bad (counting, descriptive, open-ended):
  • “How many cars are visible?” (counting)
  • “Describe the weather” (descriptive)
  • “What color is the sky?” (open-ended)
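The rule above can be checked mechanically before a condition is ever sent to the model. The following is a minimal sketch in POSIX shell; the function name `looks_like_yes_no` and its list of leading verbs are illustrative assumptions, not part of the product, and the check is only a heuristic.

```shell
# looks_like_yes_no CONDITION
# Heuristic: a yes/no question contains "?" and starts with an auxiliary verb.
# Hypothetical helper for illustration; the verb list is an assumption.
looks_like_yes_no() {
  case $1 in
    *[?]*) ;;
    *) echo "missing question mark"; return 1 ;;
  esac
  # Lowercase the first word so "Is" and "is" are treated the same.
  first_word=$(printf '%s\n' "$1" | awk '{ print tolower($1) }')
  case $first_word in
    is|are|was|were|do|does|did|has|have|can|could|will|should)
      echo "ok" ;;
    *)
      echo "starts with '$first_word', not a yes/no verb"; return 1 ;;
  esac
}

# Try the good and bad examples from the lists above.
for c in "Is it snowing?" "How many cars are visible?" "Describe the weather"; do
  printf '%s -> %s\n' "$c" "$(looks_like_yes_no "$c")"
done
```

A passing check does not mean the condition is good, only that it has the right shape; the remaining sections cover what the question should contain.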

Write with Context

Add details about what you’re looking for. The more specific, the better:

❌ Too Vague

"Is there snow?"
Vision models may interpret this broadly. Is a single snowflake enough? What if it’s just a reflection?

✅ Better

"Is it snowing? Look for falling snowflakes or white accumulation on surfaces."
Now the VLM knows exactly what “snow” means in your context.
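If you maintain many conditions, it can help to keep the question and its visual cues separate and join them in one place. The sketch below assumes a hypothetical helper, `build_condition`, and a fixed "Look for ..." template modeled on the example above; neither is defined by the product.

```shell
# build_condition QUESTION [CUE...]
# Joins a yes/no question with the visual cues the model should look for.
# Hypothetical helper; the "Look for ..." wording is an assumed template.
build_condition() {
  question=$1
  shift
  case $# in
    0) printf '%s\n' "$question"; return 0 ;;
    1) cues=$1 ;;
    2) cues="$1 or $2" ;;
    *)
      cues=$1
      shift
      # Append every cue except the last with a comma.
      while [ "$#" -gt 1 ]; do
        cues="$cues, $1"
        shift
      done
      cues="$cues, or $1"
      ;;
  esac
  printf '%s Look for %s.\n' "$question" "$cues"
}

build_condition "Is it snowing?" "falling snowflakes" "white accumulation on surfaces"
# prints: Is it snowing? Look for falling snowflakes or white accumulation on surfaces.
```

Keeping cues as separate arguments makes it easy to add or drop one cue at a time while testing.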

More Examples

Weather Monitoring

"Is it raining? Look for visible water droplets, wet surfaces, or people with umbrellas."

"Is it foggy? Look for reduced visibility where distant objects are obscured."

"Is it windy? Look for things being blown around: leaves, dust, or people struggling against the wind."

Traffic Monitoring

"Is there a traffic accident or collision? Look for damaged vehicles, emergency vehicles, or unusual vehicle positions."

"Is there heavy congestion? Look for bumper-to-bumper vehicles or slow-moving traffic."

"Is a traffic light red? Identify the color of the active traffic signal."

Crowd & Activity

"Are there many people visible? Look for crowds of 10+ people in the frame."

"Is anyone at this location? Look for any human figures, even distant ones."

"Are people running? Look for rapid movement or panic behavior."

Security

"Is someone at the entrance? Look for any human figure near the door."

"Is equipment being moved? Look for people carrying or moving devices."

"Is the area damaged? Look for broken windows, graffiti, or displaced items."

Context Matters

Add specifics about the location or scenario:
"Is it snowing in Pittsburgh?" ← location matters
"Is someone climbing the fence?" ← action + object
"Are cars parked illegally here?" ← location-specific

Testing Your Condition

Use Check Once to test conditions before starting a full Live Monitor job:
# Test the condition
curl -X POST https://trio.machinefi.com/check-once \
  -H "Content-Type: application/json" \
  -d '{
    "stream_url": "https://youtube.com/watch?v=YOUR_STREAM",
    "condition": "Is it snowing? Look for falling snowflakes or white accumulation."
  }'
Check the response:
  • If the explanation makes sense → the condition is good
  • If the explanation seems off-topic → refine your condition
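Conditions that contain double quotes (for example, text on a sign) will break the inline JSON in the curl example above unless they are escaped. The following is a minimal sketch of a wrapper that escapes the condition before sending it. The helper names `json_escape`, `build_payload`, and `check_once` are hypothetical, the escaping assumes the text has no newlines or other control characters, and only the endpoint and the two request fields come from the example above.

```shell
# json_escape TEXT: escape backslashes and double quotes for a JSON string.
# Assumes TEXT contains no newlines or other control characters.
json_escape() {
  printf '%s\n' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# build_payload STREAM_URL CONDITION: print the JSON request body.
build_payload() {
  printf '{"stream_url": "%s", "condition": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# check_once STREAM_URL CONDITION: POST the payload and print the raw response.
# Defined but not called here, since it needs network access and a real stream.
check_once() {
  build_payload "$1" "$2" |
    curl -sS -X POST https://trio.machinefi.com/check-once \
      -H "Content-Type: application/json" \
      --data-binary @-
}

# Show the payload that would be sent for a condition containing quotes.
build_payload "https://youtube.com/watch?v=YOUR_STREAM" \
  'Is the sign showing "OPEN"? Look for lit lettering.'
```

Calling `check_once` in a loop over a few candidate wordings lets you compare the returned explanations side by side before committing to one.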

Common Mistakes

1. Assuming the VLM Knows Your Domain

❌ “Is the status light green?”
✅ “Is the status indicator light displaying a green color?”

2. Relying on Color Alone

❌ “Is the car red?”
✅ “Is there a red car visible? Look for a vehicle with red coloring.”

3. Negation

❌ “Is the lot not empty?”
✅ “Are there people visible in the lot?”
Positive conditions work better than negations.

4. Multiple Conditions in One

❌ “Are there accidents or delays?”
✅ Just pick one: “Is there a traffic accident?”

5. Assuming Real-Time Knowledge

❌ “Has it been snowing for more than 30 minutes?”
✅ “Is it currently snowing?”
The VLM sees one frame at a time, not a time series.
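Some of these mistakes (negation, multiple conditions, and time spans) can be caught with a rough keyword check before testing. The sketch below assumes a hypothetical `lint_condition` helper with illustrative word lists. It inspects only the question itself (the text before the first "?"), so an "or" inside the "Look for ..." cues is not flagged. It will still produce false positives, for example when "or" joins two synonyms.

```shell
# lint_condition CONDITION: print one warning per suspected mistake, or "ok".
# Hypothetical helper; the keyword lists are rough, assumed heuristics.
lint_condition() {
  # Keep the text before the first "?", lowercase it, and pad it with
  # spaces so that whole words can be matched.
  question=${1%%[?]*}
  question=" $(printf '%s\n' "$question" | tr '[:upper:]' '[:lower:]') "
  warnings=0
  case $question in
    *" not "*|*"n't "*|*" no "*)
      echo "negation: rephrase as a positive question"; warnings=1 ;;
  esac
  case $question in
    *" or "*|*" and "*)
      echo "multiple conditions: ask about one thing"; warnings=1 ;;
  esac
  case $question in
    *" for more than "*|*" for the last "*|*" since "*|*" minutes"*|*" hours"*)
      echo "time span: a single frame cannot show duration"; warnings=1 ;;
  esac
  if [ "$warnings" -eq 0 ]; then
    echo "ok"
  fi
  return "$warnings"
}

# Run the check on the examples from this section.
for c in \
  "Is the lot not empty?" \
  "Are there accidents or delays?" \
  "Has it been snowing for more than 30 minutes?" \
  "Is it currently snowing?"
do
  printf '%s\n  %s\n' "$c" "$(lint_condition "$c")"
done
```

Treat the warnings as prompts to reread the condition, not as hard failures.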

Refining Over Time

Start broad, then refine:
  1. Initial: “Is there snow?”
  2. After testing: “Is there snow? Look for white precipitation.”
  3. After running: “Is it actively snowing? Look for falling snowflakes, not just snow on the ground.”
Each iteration improves accuracy.