Workers

This page is a work-in-progress.

The worker system in Malbox lets you define specialized workers, each optimized for specific task types and optionally able to process tasks in batches. Tailoring workers to particular analysis scenarios improves efficiency and resource utilization.

Workers can be configured to handle specific task types according to their capabilities and resource requirements. Typical configurations include:

  • Lightweight Workers: Dedicated to host plugins that don’t require VMs or heavy resources
  • Specialized Workers: Optimized for specific analysis types (e.g., static analysis, YARA scanning)
  • Resource-Intensive Workers: Configured for dynamic analysis requiring full VM isolation
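
As a rough illustration of how specialization could drive task routing, the sketch below filters workers by the task types they declare. The `Worker` class, the `eligible_workers` helper, and the `general` worker are hypothetical names for this example and are not Malbox's scheduler API; only the `compatible_tasks` and `execution_mode` fields and the "empty list means all" default come from the configuration described on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    # An empty list means "accepts every task type" (the documented default).
    compatible_tasks: list = field(default_factory=list)
    execution_mode: str = "host"

    def accepts(self, task_type):
        return not self.compatible_tasks or task_type in self.compatible_tasks


def eligible_workers(workers, task_type):
    """Return the names of the workers able to process task_type."""
    return [w.name for w in workers if w.accepts(task_type)]


workers = [
    Worker("yara-scanner", ["yara"]),
    Worker("dynamic-analyzer", ["dynamic"], execution_mode="guest"),
    Worker("general"),  # hypothetical catch-all worker with no restriction
]

print(eligible_workers(workers, "yara"))     # ['yara-scanner', 'general']
print(eligible_workers(workers, "dynamic"))  # ['dynamic-analyzer', 'general']
```

A real scheduler would also weigh load and resource limits; this only shows the compatibility filter.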

Compatible workers can process multiple similar tasks in a single operation, which can improve throughput for analyses that would otherwise repeat the same initialization for every task:

  • Task Grouping: Similar tasks are collected and processed together
  • Optimized Resource Usage: Eliminates redundant initialization steps/overhead
  • Configurable Parameters: Timeouts and batch sizes can be adjusted based on workload
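
To make the saving concrete, here is a simple cost model in which a fixed initialization cost is paid once per batch rather than once per task. The function name and all numbers are made up for illustration and are not Malbox measurements.

```python
def total_time_ms(n_tasks, init_ms, per_task_ms, batch_size=1):
    """Total processing time when init_ms is paid once per batch."""
    n_batches = -(-n_tasks // batch_size)  # ceiling division
    return n_batches * init_ms + n_tasks * per_task_ms


# 100 tasks, 500 ms of setup per run, 20 ms of actual work per task.
unbatched = total_time_ms(100, init_ms=500, per_task_ms=20)
batched = total_time_ms(100, init_ms=500, per_task_ms=20, batch_size=50)

print(unbatched)  # 52000
print(batched)    # 3000
```

Under these assumed numbers almost all of the unbatched time is setup, which is the overhead that batching removes.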

Worker configuration parameters:

| Parameter | Description | Default | Example |
| --- | --- | --- | --- |
| name | Unique identifier for the worker | "worker-{id}" | "static-analysis-worker" |
| compatible_tasks | Task types this worker can process | [] (all) | ["yara", "static", "hash"] |
| execution_mode | Worker's execution environment | "host" | "host", "guest", "hybrid" |
| batch_processing | Whether batch processing is enabled | false | true |
| max_batch_size | Maximum number of tasks in a batch | 1 | 50 |
| batch_timeout | Maximum time to wait for batch collection (ms) | 500 | 2000 |
| max_concurrent_tasks | Maximum tasks the worker can handle simultaneously | 1 | 5 |
| resource_limits | Resource constraints for the worker | {} | {"memory": "4G", "cpu": 2} |
| plugin_restrictions | Specific plugins this worker can or cannot use | {} | {"allow": ["yara", "pe"], "deny": ["sandbox"]} |

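
The defaults in the table above can be expressed as a small data class. This is only a sketch: `WorkerConfig`, `from_dict`, and the validation rules are assumptions made for illustration, not Malbox's actual configuration loader, and the `max_batch_size` default of 1 is read from the table as listed.

```python
from dataclasses import dataclass, field

VALID_MODES = {"host", "guest", "hybrid"}


@dataclass
class WorkerConfig:
    name: str
    compatible_tasks: list = field(default_factory=list)  # [] means all task types
    execution_mode: str = "host"
    batch_processing: bool = False
    max_batch_size: int = 1
    batch_timeout: int = 500  # milliseconds
    max_concurrent_tasks: int = 1
    resource_limits: dict = field(default_factory=dict)
    plugin_restrictions: dict = field(default_factory=dict)

    def __post_init__(self):
        # Assumed sanity checks; the real loader may validate differently.
        if self.execution_mode not in VALID_MODES:
            raise ValueError(f"unknown execution_mode: {self.execution_mode!r}")
        if self.max_batch_size < 1 or self.max_concurrent_tasks < 1:
            raise ValueError("max_batch_size and max_concurrent_tasks must be >= 1")
        if self.batch_timeout < 0:
            raise ValueError("batch_timeout must not be negative")


def from_dict(raw, worker_id):
    """Build a WorkerConfig from one parsed worker table, filling in defaults."""
    values = {"name": f"worker-{worker_id}", **raw}
    return WorkerConfig(**values)


cfg = from_dict({"compatible_tasks": ["yara"], "batch_processing": True}, worker_id=3)
print(cfg.name, cfg.batch_timeout)  # worker-3 500
```

Passing a key that is not in the table raises a `TypeError`, which is one cheap way to catch typos in a configuration file.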

workers.toml

[[worker]]  # host-side YARA scanner that collects tasks into batches
name = "yara-scanner"
compatible_tasks = ["yara"]
execution_mode = "host"
batch_processing = true
max_batch_size = 100
batch_timeout = 1000  # milliseconds
max_concurrent_tasks = 1
resource_limits = { memory = "2G", cpu = 1 }
plugin_restrictions = { allow = ["yara"] }

[[worker]]  # guest (VM) worker for dynamic analysis, one task at a time
name = "dynamic-analyzer"
compatible_tasks = ["dynamic"]
execution_mode = "guest"
batch_processing = false
max_concurrent_tasks = 1
resource_limits = { memory = "8G", cpu = 4 }
plugin_restrictions = { allow = ["sandbox", "behavior"] }
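
This page does not spell out how `allow` and `deny` interact. One plausible reading, sketched below, is that `deny` always wins and that an empty or missing `allow` list permits every plugin that is not denied. The `plugin_allowed` helper and these semantics are assumptions for illustration, not documented Malbox behavior.

```python
def plugin_allowed(plugin, restrictions):
    """Assumed semantics: deny takes precedence; an empty allow list permits all."""
    if plugin in restrictions.get("deny", []):
        return False
    allow = restrictions.get("allow", [])
    return not allow or plugin in allow


yara_scanner = {"allow": ["yara"]}
mixed = {"allow": ["yara", "pe"], "deny": ["sandbox"]}

print(plugin_allowed("yara", yara_scanner))     # True
print(plugin_allowed("sandbox", yara_scanner))  # False
print(plugin_allowed("sandbox", mixed))         # False
print(plugin_allowed("anything", {}))           # True
```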

Batch processing works as follows:

  1. Tasks marked as batch-compatible are held in a queue until one of the following occurs:

    • The queue reaches max_batch_size
    • batch_timeout milliseconds have elapsed
    • A non-compatible task arrives and requires immediate processing

  2. The worker processes all collected tasks in a single operation

  3. Results are reported individually for each task in the batch
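
The collection step can be sketched as a small class with the three flush conditions described above. `BatchCollector` and its methods are hypothetical names. The decision to release the pending batch before the non-compatible task, and to start the timeout when the first task enters an empty queue, are assumptions that this page does not confirm.

```python
import time


class BatchCollector:
    def __init__(self, max_batch_size, batch_timeout_ms, clock=time.monotonic):
        self.max_batch_size = max_batch_size
        self.batch_timeout = batch_timeout_ms / 1000.0  # seconds
        self.clock = clock
        self.pending = []
        self.started_at = None

    def add(self, task, batch_compatible=True):
        """Queue a task and return the list of batches that are ready to run."""
        ready = []
        if not batch_compatible:
            # Release whatever is pending, then run this task on its own.
            if self.pending:
                ready.append(self._flush())
            ready.append([task])
            return ready
        if not self.pending:
            self.started_at = self.clock()  # timeout starts with the first task
        self.pending.append(task)
        if len(self.pending) >= self.max_batch_size:
            ready.append(self._flush())
        return ready

    def poll(self):
        """Return the pending batch if the timeout has elapsed, else None."""
        if self.pending and self.clock() - self.started_at >= self.batch_timeout:
            return self._flush()
        return None

    def _flush(self):
        batch, self.pending, self.started_at = self.pending, [], None
        return batch


now = [0.0]  # fake clock so the example is deterministic
collector = BatchCollector(max_batch_size=3, batch_timeout_ms=1000, clock=lambda: now[0])
collector.add("a")
collector.add("b")
print(collector.add("c"))  # [['a', 'b', 'c']]  (size limit reached)
collector.add("d")
now[0] = 1.0
print(collector.poll())    # ['d']  (timeout elapsed)
```

A caller would invoke `poll()` periodically, or from a timer, so that a partially filled batch is never held longer than `batch_timeout`.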

Batch processing is primarily suitable for:

  • Stateless Analysis: Tasks that don’t modify system state
  • File-Based Scanning: YARA, hash calculation, static analysis

Batch processing is incompatible with:

  • Dynamic analysis that executes the sample
  • Analysis that modifies system state
  • Tasks with complex dependencies

Best practices:

  • Separate Concerns: Create specialized workers for different analysis types
  • Resource Allocation: Assign resources based on task requirements
  • Batch Wisely: Only enable batch processing for truly stateless operations
  • Tune Batch Parameters: Adjust timeout and batch size based on workload characteristics
  • Monitor Worker Load: Track worker performance to identify bottlenecks
  • Consider Task Affinity: Route related tasks to the same worker when appropriate

Scaling considerations:

  • Worker Pools: Group similar workers into pools for better load distribution
  • Dynamic Scaling: Add or remove workers based on queue length and processing times
  • Resource Limits: Set appropriate memory and CPU limits to prevent resource exhaustion
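
As an example of scaling on queue length and processing time, the heuristic below estimates how many workers are needed to drain the current queue within a target window. It is an operator-side sketch, not a Malbox feature: `desired_workers`, its parameters, and the bounds are hypothetical, and it assumes each worker handles one task at a time.

```python
import math


def desired_workers(queue_length, avg_task_seconds, target_drain_seconds,
                    min_workers=1, max_workers=16):
    """Workers needed to drain the queue within target_drain_seconds, clamped."""
    needed = math.ceil(queue_length * avg_task_seconds / target_drain_seconds)
    return max(min_workers, min(max_workers, needed))


print(desired_workers(120, 2.0, 60))    # 4
print(desired_workers(0, 2.0, 60))      # 1  (never below min_workers)
print(desired_workers(10000, 2.0, 60))  # 16 (capped at max_workers)
```

The upper bound matters as much as the estimate: without it, a burst of submissions could start more workers than the configured memory and CPU limits can support.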

Limitations:

  • Batch processing adds latency: the first tasks in a batch may wait up to batch_timeout milliseconds before processing starts
  • Not all plugins support batch processing; check the plugin documentation
  • Complex dependencies between tasks may limit batch processing opportunities