Scheduled Tasks
Scheduled tasks run data collection automatically at preset times, enabling unattended, automated collection. AutoWDS supports flexible scheduling options to cover a wide range of automation needs.
What are Scheduled Tasks?
A scheduled task is a collection task that runs automatically according to a time rule you define: at a specific time, daily, weekly, or at a custom interval.
Application Scenarios
Price Monitoring
- Daily scheduled collection of product prices
- Monitor price change trends
- Send notifications when prices are abnormal
Content Updates
- Scheduled collection of news articles
- Monitor website content updates
- Automatically archive historical data
Data Synchronization
- Regularly sync data to database
- Update Google Sheets
- Generate periodic reports
Competitive Analysis
- Scheduled collection of competitor information
- Track market dynamics
- Generate analysis reports
Creating Scheduled Tasks
Method 1: Set When Creating New Task
1. Create Collection Task
   - Configure the collection rules
   - Test to confirm the rules are correct
2. Enable Scheduling
   - Find "Scheduled Execution" in the task settings
   - Turn on scheduling
3. Configure Execution Time
   - Select an execution frequency
   - Or enter a Cron expression
   - Set the start and end times
4. Save Task
   - Save the task configuration
   - The task will then run automatically on schedule
Method 2: Add Scheduling to Existing Task
1. Open Task List
   - Go to "Task Management"
   - Find the task you want to schedule
2. Edit Task
   - Click the task's "Edit" button
   - Open the task settings
3. Configure Scheduling
   - Find the "Schedule Settings" option
   - Configure the execution rules
4. Enable and Save
   - Turn on scheduling
   - Save the settings
Scheduling Configuration Methods
Simple Mode
Choose from preset, commonly used schedules.
Daily Execution:
Execution Time: Daily 09:00
Description: Execute once daily at 9 AM
Weekly Execution:
Execution Time: Monday 09:00
Description: Execute every Monday at 9 AM
Monthly Execution:
Execution Time: 1st of month 00:00
Description: Execute at midnight on 1st of each month
Interval Execution:
Execution Interval: Every 2 hours
Description: Execute every 2 hours
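Each Simple Mode preset corresponds to a five-field cron expression. The sketch below makes that mapping explicit; the `SimplePreset` shape and the `simpleToCron` function are hypothetical illustrations, not AutoWDS's internal representation.

```typescript
// Hypothetical preset shapes; AutoWDS's actual Simple Mode fields may differ.
type SimplePreset =
  | { kind: "daily"; hour: number; minute: number }
  | { kind: "weekly"; weekday: number; hour: number; minute: number } // 0 = Sunday, 1 = Monday
  | { kind: "monthly"; day: number; hour: number; minute: number }
  | { kind: "interval"; everyHours: number };

// Build the equivalent cron expression (minute hour day month weekday) for a preset.
function simpleToCron(preset: SimplePreset): string {
  switch (preset.kind) {
    case "daily":
      return `${preset.minute} ${preset.hour} * * *`;
    case "weekly":
      return `${preset.minute} ${preset.hour} * * ${preset.weekday}`;
    case "monthly":
      return `${preset.minute} ${preset.hour} ${preset.day} * *`;
    case "interval":
      return `0 */${preset.everyHours} * * *`;
    default: {
      const unknownPreset: never = preset;
      throw new Error(`Unsupported preset: ${JSON.stringify(unknownPreset)}`);
    }
  }
}

console.log(simpleToCron({ kind: "weekly", weekday: 1, hour: 9, minute: 0 })); // 0 9 * * 1
```

Note that a cron step such as `*/2` in the hour field fires at hours divisible by 2 counted from midnight (0:00, 2:00, 4:00, ...), not every 2 hours from the moment the task was saved. With a step that does not divide 24, such as `*/5`, the last gap of the day (20:00 to 0:00) is only 4 hours.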
Cron Expression Mode
Use Cron expressions for more flexible scheduling configuration.
Cron Expression Format
* * * * *
│ │ │ │ │
│ │ │ │ └─── Day of week (0-7, both 0 and 7 represent Sunday)
│ │ │ └───── Month (1-12)
│ │ └─────── Day of month (1-31)
│ └───────── Hour (0-23)
└─────────── Minute (0-59)
Common Expression Examples
Daily Scheduling:
# Daily at 9 AM
0 9 * * *
# Daily at 2:30 PM
30 14 * * *
# Daily at noon and 6 PM
0 12,18 * * *
Weekly Scheduling:
# Monday at 9 AM
0 9 * * 1
# Weekdays at 9 AM
0 9 * * 1-5
# Weekends at 10 AM
0 10 * * 0,6
Monthly Scheduling:
# 1st of month at midnight
0 0 1 * *
# 15th of month at noon
0 12 15 * *
# Last day of month at midnight (L is a non-standard extension)
0 0 L * *
Interval Execution:
# Every hour
0 * * * *
# Every 2 hours
0 */2 * * *
# Every 30 minutes
*/30 * * * *
# Every 15 minutes
*/15 * * * *
Specific Time Periods:
# Weekdays 9 AM to 6 PM, every hour
0 9-18 * * 1-5
# Daily 6 AM to 10 PM, every 2 hours
0 6-22/2 * * *
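To check what an expression means before saving it, the following minimal sketch tests whether a given local time matches a standard five-field expression. `expandField` and `matchesCron` are illustrative names, not part of AutoWDS; the sketch assumes local time and deliberately rejects the L, W, and # characters.

```typescript
// Expand one field such as "*", "9", "1-5", "12,18", "*/15" or "6-22/2" into the values it allows.
function expandField(field: string, min: number, max: number): Set<number> {
  const values = new Set<number>();
  for (const part of field.split(",")) {
    const [range, stepText] = part.split("/");
    const step = stepText === undefined ? 1 : Number(stepText);
    let lo: number;
    let hi: number;
    if (range === "*") {
      lo = min;
      hi = max;
    } else if (range.indexOf("-") !== -1) {
      [lo, hi] = range.split("-").map(Number);
    } else {
      lo = hi = Number(range);
    }
    const valid = [lo, hi, step].every((n) => Number.isInteger(n));
    if (!valid || step < 1 || lo < min || hi > max || lo > hi) {
      throw new Error(`Unsupported cron field "${field}"`);
    }
    for (let v = lo; v <= hi; v += step) values.add(v);
  }
  return values;
}

// True when `date` (local time) matches a standard five-field expression.
function matchesCron(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`Expected 5 fields, got ${fields.length}`);
  const [minute, hour, day, month, weekday] = fields;
  const weekdays = expandField(weekday, 0, 7);
  if (weekdays.has(7)) weekdays.add(0); // 0 and 7 both mean Sunday
  const dayOk = expandField(day, 1, 31).has(date.getDate());
  const weekdayOk = weekdays.has(date.getDay());
  // Standard cron rule: if both day fields are restricted (not "*"), matching either one is enough.
  const dateOk = day !== "*" && weekday !== "*" ? dayOk || weekdayOk : dayOk && weekdayOk;
  return (
    dateOk &&
    expandField(minute, 0, 59).has(date.getMinutes()) &&
    expandField(hour, 0, 23).has(date.getHours()) &&
    expandField(month, 1, 12).has(date.getMonth() + 1)
  );
}

const monday9am = new Date(2024, 0, 15, 9, 0); // Monday, 15 January 2024, local time
console.log(matchesCron("0 9 * * 1-5", monday9am)); // true
console.log(matchesCron("0 10 * * 0,6", monday9am)); // false
```

The "either day field" rule is how standard cron treats an expression such as `0 0 1 * 1` (the 1st of the month or any Monday). The section above does not say whether AutoWDS follows it, so it is worth testing such an expression before relying on it.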
Cron Expression Special Characters
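| Character | Description | Example | Meaning |
|---|---|---|---|
| * | Any value | * * * * * | Every minute |
| , | List of values | 0 9,12,18 * * * | At 9:00, 12:00, and 18:00 |
| - | Range of values | 0 9-17 * * * | Every hour on the hour from 9 AM to 5 PM |
| / | Step (interval) | */10 * * * * | Every 10 minutes |
| L | Last | 0 0 L * * | Last day of the month |
| W | Nearest weekday | 0 0 15W * * | Weekday closest to the 15th |
| # | Nth occurrence | 0 0 * * 1#2 | Second Monday of the month |
Note: L, W, and # are extensions found in Quartz-style schedulers; they are not part of standard five-field cron.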
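Assuming AutoWDS follows the common Quartz semantics for L, W, and #, together with the 0 = Sunday weekday numbering used above, the three characters can be sketched as plain date checks. All function names here are hypothetical.

```typescript
// "L" in the day field: true on the last day of the month.
function isLastDayOfMonth(date: Date): boolean {
  const tomorrow = new Date(date.getFullYear(), date.getMonth(), date.getDate() + 1);
  return tomorrow.getDate() === 1;
}

// "<weekday>#<n>" in the weekday field (0 = Sunday, 1 = Monday): true on the nth such weekday.
function isNthWeekday(date: Date, weekday: number, n: number): boolean {
  return date.getDay() === weekday && Math.ceil(date.getDate() / 7) === n;
}

// "<day>W" in the day field: the day the job actually runs on, i.e. the weekday (Mon-Fri)
// nearest to `day` without leaving the month. `month` is 0-based, as in the Date API.
function nearestWeekday(year: number, month: number, day: number): number {
  const lastDay = new Date(year, month + 1, 0).getDate();
  const dow = new Date(year, month, day).getDay();
  if (dow === 6) return day === 1 ? 3 : day - 1; // Saturday: Friday before, or Monday the 3rd
  if (dow === 0) return day === lastDay ? day - 2 : day + 1; // Sunday: Monday after, or Friday before
  return day;
}
```

The month-boundary rules in `nearestWeekday` are why `1W` on a Saturday runs on Monday the 3rd rather than on the last Friday of the previous month.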
Advanced Configuration
Execution Window:
{
"cron": "0 9 * * *",
"window": {
"start": "2024-01-01",
"end": "2024-12-31"
}
}
Execution Count Limit:
{
"cron": "0 */2 * * *",
"maxExecutions": 10
}
Skip Holidays:
{
"cron": "0 9 * * 1-5",
"skipHolidays": true,
"holidayCalendar": "US"
}
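One way to picture how the window, execution limit, and holiday options interact is as a gate applied after the cron expression has matched. This is a sketch under stated assumptions: the window dates are treated as inclusive local dates, `executionsSoFar` counts scheduled runs only, and `isHoliday` stands in for whatever lookup the `holidayCalendar` setting actually uses.

```typescript
// Hypothetical shape combining the three options above.
interface ScheduleOptions {
  window?: { start: string; end: string }; // "YYYY-MM-DD", both ends assumed inclusive
  maxExecutions?: number;
  skipHolidays?: boolean;
}

// Local calendar date as "YYYY-MM-DD", which compares correctly as a plain string.
function localDateString(d: Date): string {
  const pad = (n: number) => (n < 10 ? "0" : "") + n;
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
}

// Called after the cron expression matched: decide whether this run should actually start.
function shouldRun(
  opts: ScheduleOptions,
  now: Date,
  executionsSoFar: number,
  isHoliday: (d: Date) => boolean, // hypothetical lookup for the chosen holidayCalendar
): boolean {
  if (opts.window) {
    const today = localDateString(now);
    if (today < opts.window.start || today > opts.window.end) return false;
  }
  if (opts.maxExecutions !== undefined && executionsSoFar >= opts.maxExecutions) return false;
  if (opts.skipHolidays === true && isHoliday(now)) return false;
  return true;
}
```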
Task Execution Management
View Execution Schedule
View a task's next execution time and its execution history.
Execution Schedule Info:
Task Name: Product Price Monitor
Cron Expression: 0 9 * * *
Next Execution: 2024-01-16 09:00:00
Last Execution: 2024-01-15 09:00:00
Execution Status: Success
Execution History
View a task's past execution records.
History Records Include:
- Execution time
- Execution status (success/failure)
- Collected data count
- Execution duration
- Error information (if failed)
Example:
2024-01-15 09:00:00 | Success | 1000 records | 5m30s
2024-01-14 09:00:00 | Success | 980 records | 5m15s
2024-01-13 09:00:00 | Failed | 0 records | Network timeout
2024-01-12 09:00:00 | Success | 1020 records | 5m45s
Manual Trigger
Even with scheduling configured, you can manually execute tasks anytime.
- Open task list
- Find task to execute
- Click "Execute Now" button
- Task starts running
Note: Manual execution doesn't affect the schedule.
Pause and Resume
Temporarily pause a scheduled task and resume it when needed.
Pause Task:
- Open task settings
- Turn off "Enable Scheduling" switch
- Save settings
Resume Task:
- Open task settings
- Turn on "Enable Scheduling" switch
- Save settings
Delete Scheduling
Completely remove a task's scheduling configuration.
- Open task settings
- Enter "Schedule Settings"
- Click "Delete Schedule"
- Confirm deletion
Note: Deleting the schedule doesn't delete the task itself.
Execution Notifications
Notification Types
Task Start Notification:
Title: Scheduled Task Started
Content: Task "Product Price Monitor" started execution
Time: 2024-01-15 09:00:00
Task Completion Notification:
Title: Scheduled Task Completed Successfully
Content: Task "Product Price Monitor" completed execution
Data: Collected 1000 records
Duration: 5m30s
Task Failure Notification:
Title: Scheduled Task Failed
Content: Task "Product Price Monitor" execution failed
Error: Network connection timeout
Suggestion: Please check network connection and retry
Notification Methods
Browser Notification:
- Chrome native notifications
- Requires notification permission authorization
- Shown only while the browser is open
Email Notification:
{
"notification": {
"email": {
"enabled": true,
"to": "your@email.com",
"onSuccess": false,
"onFailure": true
}
}
}
Webhook Notification:
{
"notification": {
"webhook": {
"enabled": true,
"url": "https://your-webhook-url.com",
"method": "POST",
"headers": {
"Authorization": "Bearer your-token"
}
}
}
}
DingTalk/Enterprise WeChat:
{
"notification": {
"dingtalk": {
"enabled": true,
"webhook": "https://oapi.dingtalk.com/robot/send?access_token=xxx",
"atMobiles": ["13812345678"]
}
}
}
Notification Conditions
Configure when to send notifications.
Always Notify:
{
"notifyOn": "always"
}
Only on Failure:
{
"notifyOn": "failure"
}
Only on Success:
{
"notifyOn": "success"
}
Conditional Notification:
{
"notifyOn": "condition",
"conditions": [
{
"type": "dataCount",
"operator": "<",
"value": 100,
"message": "Collected data count less than expected"
},
{
"type": "duration",
"operator": ">",
"value": 600,
"message": "Execution time exceeded 10 minutes"
}
]
}
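The `notifyOn` modes and the `conditions` list can be read as a small decision function. In this sketch, durations are assumed to be in seconds (consistent with 600 meaning 10 minutes), each matching condition produces its own message, and `notificationsFor` is an illustrative name rather than an AutoWDS API.

```typescript
type NotifyOn = "always" | "failure" | "success" | "condition";

interface RunResult {
  success: boolean;
  dataCount: number;
  durationSec: number; // assumed to be seconds, matching "value": 600 = 10 minutes
}

interface NotifyCondition {
  type: "dataCount" | "duration";
  operator: "<" | ">";
  value: number;
  message: string;
}

// Messages to send for a finished run; an empty array means no notification.
function notificationsFor(notifyOn: NotifyOn, result: RunResult, conditions: NotifyCondition[] = []): string[] {
  const summary = result.success ? "Task completed successfully" : "Task failed";
  switch (notifyOn) {
    case "always":
      return [summary];
    case "failure":
      return result.success ? [] : [summary];
    case "success":
      return result.success ? [summary] : [];
    case "condition":
      return conditions
        .filter((c) => {
          const actual = c.type === "dataCount" ? result.dataCount : result.durationSec;
          return c.operator === "<" ? actual < c.value : actual > c.value;
        })
        .map((c) => c.message);
    default: {
      const unknownMode: never = notifyOn;
      throw new Error(`Unknown notifyOn value: ${unknownMode}`);
    }
  }
}
```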
Error Handling
Auto Retry
Automatically retry when task execution fails.
{
"retry": {
"enabled": true,
"maxAttempts": 3,
"interval": 300,
"backoff": "exponential"
}
}
Retry Strategies:
- Fixed Interval: Same interval for each retry
- Exponential Backoff: Gradually increasing retry intervals (5min, 10min, 20min)
- Linear Increase: Linearly increasing retry intervals (5min, 10min, 15min)
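The three strategies differ only in how the wait grows between attempts. Assuming `interval` is in seconds (300 = 5 minutes, matching the examples above), the delays can be computed as follows; the `"fixed"` and `"linear"` value names are assumptions, since only `"exponential"` appears in the configuration.

```typescript
type Backoff = "fixed" | "exponential" | "linear"; // "fixed" and "linear" names are assumed

// Seconds to wait before each retry; index 0 is the wait before the first retry.
function retryDelays(intervalSec: number, maxAttempts: number, backoff: Backoff): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (backoff === "exponential") delays.push(intervalSec * Math.pow(2, attempt - 1));
    else if (backoff === "linear") delays.push(intervalSec * attempt);
    else delays.push(intervalSec);
  }
  return delays;
}

console.log(retryDelays(300, 3, "exponential").join(", ")); // 300, 600, 1200
```

Because `maxAttempts` bounds the loop, the worst-case total wait stays finite: 300 + 600 + 1200 = 2100 seconds (35 minutes) for the configuration above.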
Failure Handling
Configure handling after task failure.
Continue Execution:
{
"onFailure": "continue",
"skipToNext": true
}
Pause Task:
{
"onFailure": "pause",
"requireManualResume": true
}
Send Alert:
{
"onFailure": "alert",
"alertChannels": ["email", "webhook"]
}
Timeout Settings
Set maximum execution time for tasks.
{
"timeout": {
"enabled": true,
"duration": 1800,
"action": "abort"
}
}
Timeout Actions:
- abort: Abort task
- continue: Continue execution
- savePartial: Save collected data
Concurrency Control
Prevent Duplicate Execution
Ensure the same task never runs more than once at the same time.
{
"concurrency": {
"preventOverlap": true,
"skipIfRunning": true
}
}
Handling Strategies:
- Skip: Skip current execution if task is running
- Queue: Wait for current execution to complete before executing
- Abort: Abort current execution and start new one
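The three strategies can be modeled as a small state machine that the scheduler consults whenever a task's schedule fires. This sketch is hypothetical: `OverlapGuard` is not an AutoWDS class, and `"preventOverlap": true` with `"skipIfRunning": true` is assumed to correspond to the `"skip"` strategy.

```typescript
type OverlapStrategy = "skip" | "queue" | "abort";
type TriggerAction = "start" | "skip" | "queue" | "abortAndRestart";

// Tracks whether a task is running and decides what to do when its schedule fires again.
class OverlapGuard {
  private running = false;
  private queued = 0;
  private readonly strategy: OverlapStrategy;

  constructor(strategy: OverlapStrategy) {
    this.strategy = strategy;
  }

  // Called each time the schedule fires.
  onTrigger(): TriggerAction {
    if (!this.running) {
      this.running = true;
      return "start";
    }
    if (this.strategy === "skip") return "skip";
    if (this.strategy === "queue") {
      this.queued++;
      return "queue";
    }
    // The replacement run keeps `running` true; the aborted run should not call onFinish.
    return "abortAndRestart";
  }

  // Called when a run ends; returns true if a queued run should start immediately.
  onFinish(): boolean {
    if (this.queued > 0) {
      this.queued--;
      return true; // the queued run takes over, so `running` stays true
    }
    this.running = false;
    return false;
  }
}
```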
Task Priority
Set execution priority for multiple scheduled tasks.
{
"priority": "high",
"maxConcurrent": 3
}
Priority Levels:
- high: High priority, execute first
- normal: Normal priority
- low: Low priority, execute last
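A minimal sketch of priority-based dispatch, assuming `maxConcurrent` limits the total number of scheduled runs in progress at once and that runs of equal priority start in the order they became due. Neither detail is specified above, and `pickRuns` is a hypothetical helper.

```typescript
type Priority = "high" | "normal" | "low";

interface PendingRun {
  taskId: string;
  priority: Priority;
  dueAt: number; // epoch milliseconds when the schedule fired
}

const PRIORITY_RANK: Record<Priority, number> = { high: 0, normal: 1, low: 2 };

// Pick the due runs to start now: higher priority first, earlier due time breaking ties,
// never exceeding `maxConcurrent` runs in total (assumed to be a global limit).
function pickRuns(pending: PendingRun[], runningCount: number, maxConcurrent: number): PendingRun[] {
  const freeSlots = Math.max(0, maxConcurrent - runningCount);
  return pending
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] || a.dueAt - b.dueAt)
    .slice(0, freeSlots);
}
```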
Resource Management
Execution Time Window Restrictions
Restrict tasks to execute only during specific time periods.
{
"executionWindow": {
"allowedHours": [0, 1, 2, 3, 4, 5, 22, 23],
"timezone": "America/New_York"
}
}
Use Cases:
- Avoid website peak hours
- Utilize nighttime periods for collection
- Comply with website access restrictions
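Because `allowedHours` is interpreted in the configured `timezone`, a check has to convert the current time into that zone rather than read the machine's local clock. A sketch using the standard `Intl.DateTimeFormat` API; the helper names are hypothetical.

```typescript
// Hour of day (0-23) at `date` in an IANA time zone such as "America/New_York".
function hourInZone(date: Date, timeZone: string): number {
  const text = new Intl.DateTimeFormat("en-US", { timeZone, hour: "numeric", hour12: false }).format(date);
  return parseInt(text, 10) % 24; // some engines render midnight as "24"
}

// True when the current hour in `timeZone` is one of `allowedHours`.
function inExecutionWindow(allowedHours: number[], timeZone: string, now: Date = new Date()): boolean {
  return allowedHours.indexOf(hourInZone(now, timeZone)) !== -1;
}

const nightHours = [0, 1, 2, 3, 4, 5, 22, 23];
// 03:30 UTC on 15 January 2024 is 22:30 in New York (EST, UTC-5).
console.log(inExecutionWindow(nightHours, "America/New_York", new Date(Date.UTC(2024, 0, 15, 3, 30)))); // true
```

Since `Intl` applies daylight saving time rules, the same `allowedHours` list keeps meaning New York night hours in both winter (UTC-5) and summer (UTC-4).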
Resource Limits
Limit system resources used by tasks.
{
"resources": {
"maxMemory": "2GB",
"maxCPU": "50%",
"maxDuration": "30m"
}
}
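`maxDuration` uses a unit suffix ("30m"), while the timeout setting uses plain seconds (1800). Assuming `h`, `m`, and `s` are the supported suffixes (only `m` appears above), a parser that normalizes such strings to seconds might look like this; `parseDuration` is a hypothetical helper, and memory and CPU limits are not covered.

```typescript
// Convert duration strings such as "30m", "1h" or "5m30s" into seconds.
function parseDuration(text: string): number {
  const unitSeconds: Record<string, number> = { h: 3600, m: 60, s: 1 };
  const token = /(\d+)([hms])/g;
  let total = 0;
  let matchedLength = 0;
  let match: RegExpExecArray | null;
  while ((match = token.exec(text)) !== null) {
    total += Number(match[1]) * unitSeconds[match[2]];
    matchedLength += match[0].length;
  }
  // Reject anything not made up entirely of number+unit tokens.
  if (matchedLength === 0 || matchedLength !== text.length) {
    throw new Error(`Unrecognized duration "${text}"`);
  }
  return total;
}

console.log(parseDuration("30m")); // 1800, the same limit as "duration": 1800 in Timeout Settings
```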
Advanced Features
Task Chains
Execute multiple tasks in sequence.
{
"taskChain": {
"enabled": true,
"tasks": [
{
"taskId": "task-1",
"waitForCompletion": true
},
{
"taskId": "task-2",
"waitForCompletion": true
},
{
"taskId": "task-3",
"waitForCompletion": false
}
]
}
}
Execution Flow:
Task1 → Wait for completion → Task2 → Wait for completion → Task3 → Don't wait
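The chain's behavior can be sketched as a loop that awaits only the steps marked `waitForCompletion: true`. `runTask` is a hypothetical callback that starts a task by ID, and stopping the chain when a waited-on step fails is an assumed policy, not documented AutoWDS behavior.

```typescript
interface ChainStep {
  taskId: string;
  waitForCompletion: boolean;
}

// Run chain steps in order. A step with waitForCompletion: true must finish before the next
// one starts; if it fails, the chain stops and the error propagates (assumed policy).
async function runChain(steps: ChainStep[], runTask: (taskId: string) => Promise<void>): Promise<void> {
  for (const step of steps) {
    const run = runTask(step.taskId);
    if (step.waitForCompletion) {
      await run;
    } else {
      // Fire and forget, but log failures instead of leaving an unhandled rejection.
      run.catch((err) => console.error(`Task ${step.taskId} failed:`, err));
    }
  }
}
```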
Conditional Execution
Decide whether to execute task based on conditions.
{
"conditionalExecution": {
"enabled": true,
"conditions": [
{
"type": "dataChange",
"source": "task-1",
"threshold": 10
},
{
"type": "timeRange",
"start": "09:00",
"end": "18:00"
}
],
"operator": "AND"
}
}
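With `"operator": "AND"`, the task runs only if every condition holds. The sketch below evaluates the two condition types under assumptions the configuration leaves open: `threshold` is read as a minimum number of changed records in the source task's latest run, `changedCount` is a hypothetical lookup, and time ranges do not cross midnight.

```typescript
type ExecCondition =
  | { type: "dataChange"; source: string; threshold: number }
  | { type: "timeRange"; start: string; end: string }; // "HH:MM", same day, inclusive

interface ExecContext {
  now: Date;
  // Hypothetical: number of records that changed in the source task's latest run.
  changedCount: (taskId: string) => number;
}

function minutesOfDay(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

function conditionHolds(c: ExecCondition, ctx: ExecContext): boolean {
  if (c.type === "dataChange") return ctx.changedCount(c.source) >= c.threshold;
  const nowMinutes = ctx.now.getHours() * 60 + ctx.now.getMinutes();
  return nowMinutes >= minutesOfDay(c.start) && nowMinutes <= minutesOfDay(c.end);
}

// Combine individual results with "AND" (all must hold) or "OR" (any may hold).
function shouldExecute(conditions: ExecCondition[], operator: "AND" | "OR", ctx: ExecContext): boolean {
  return operator === "AND"
    ? conditions.every((c) => conditionHolds(c, ctx))
    : conditions.some((c) => conditionHolds(c, ctx));
}
```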
Incremental Collection
Only collect new or changed data.
{
"incremental": {
"enabled": true,
"compareField": "id",
"lastRunData": "stored"
}
}
Working Principle:
- Record data from last collection
- Compare during current collection
- Only save new or changed data
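The three steps above amount to a keyed comparison between two runs. A minimal sketch, assuming rows are plain objects and `compareField` uniquely identifies a record; `diffRuns` is a hypothetical helper.

```typescript
type Row = Record<string, unknown>;

// Split the current run's rows into new and changed ones, matched by `compareField`.
function diffRuns(lastRun: Row[], currentRun: Row[], compareField: string): { added: Row[]; changed: Row[] } {
  // Remember a serialized copy of each previous row, keyed by its identifier.
  const previous = new Map<unknown, string>();
  for (const row of lastRun) previous.set(row[compareField], JSON.stringify(row));

  const added: Row[] = [];
  const changed: Row[] = [];
  for (const row of currentRun) {
    const before = previous.get(row[compareField]);
    if (before === undefined) added.push(row);
    else if (before !== JSON.stringify(row)) changed.push(row);
  }
  return { added, changed };
}
```

Comparing serialized rows with `JSON.stringify` is sensitive to property order, so it is reliable only when rows are built the same way on every run. Records that disappeared since the last run are not reported, since the working principle above concerns only new or changed data.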
Monitoring and Logging
Execution Monitoring
Real-time monitoring of task execution status.
Monitoring Metrics:
- Execution status
- Collection progress
- Data count statistics
- Resource usage
- Error information
Log Recording
Detailed recording of task execution process.
Log Levels:
- DEBUG: Debug information
- INFO: General information
- WARN: Warning information
- ERROR: Error information
Log Content:
[2024-01-15 09:00:00] INFO Task execution started
[2024-01-15 09:00:05] INFO Page loading completed
[2024-01-15 09:00:10] INFO Started collecting list data
[2024-01-15 09:02:30] INFO Collected page 1, got 50 records
[2024-01-15 09:05:00] INFO Collected page 2, got 48 records
[2024-01-15 09:05:30] INFO Task execution completed, collected 1000 records total
Performance Analysis
Analyze task execution performance.
Performance Metrics:
- Total execution time
- Page loading time
- Data extraction time
- Network request time
- Data processing time
Best Practices
1. Set Reasonable Execution Frequency
Recommended:
- Match the frequency to how often the data actually updates
- Avoid overly frequent execution
- Respect the website's access limits
- Schedule runs away from peak hours
Avoid:
- Running every minute
- Starting many tasks at the same time
- Ignoring the load on the target website
- Ignoring anti-scraping restrictions
2. Configure Appropriate Notifications
Recommended:
- Notify on failure
- Alert on abnormal results
- Send periodic execution reports
- Use more than one notification method
Avoid:
- Notifying on every execution
- Vague notification messages
- Relying on a single notification method
- Ignoring failed notification deliveries
3. Handle Errors Properly
Recommended:
- Configure automatic retries
- Set reasonable timeouts
- Keep detailed error logs
- Check execution status regularly
Avoid:
- Leaving errors unhandled
- Unlimited retries
- Ignoring timeouts
- Running without logs
4. Monitor Task Execution
Recommended:
- Review execution history regularly
- Analyze performance metrics
- Optimize execution efficiency
- Deal with anomalies promptly
Avoid:
- "Set and forget" scheduling
- Never checking execution results
- Ignoring performance issues
- Leaving failed tasks unresolved
Troubleshooting
Issue 1: Scheduled task not executing
Possible Causes:
- Task not enabled
- Incorrect Cron expression
- Browser not open
- Incorrect system time
Solutions:
- Check if task is enabled
- Validate Cron expression
- Ensure browser is open
- Check system time settings
Issue 2: Task execution failed
Possible Causes:
- Network connection issues
- Website structure changes
- Selector failures
- Anti-scraping restrictions
Solutions:
- Check network connection
- Update collection rules
- Adjust selectors
- Reduce collection frequency
Issue 3: Not receiving notifications
Possible Causes:
- Notifications not enabled
- Incorrect notification configuration
- Permissions not granted
- Network issues
Solutions:
- Check notification settings
- Verify configuration information
- Grant notification permissions
- Test notification functionality
Next Steps
- Batch Collection - Batch collect multiple targets
- Data Processing - Process collected data
- Third-party Integration - Integrate external services
- Tutorials - Complete automated collection cases