---
title: TenderHub WebAI Verification Worker
colorFrom: "blue"
colorTo: "purple"
sdk: gradio
sdk_version: "4.44.0"
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
tags:
  - document-processing
  - tender-analysis
  - verification
  - multimodal-ai
short_description: Secondary verification layer using webAI-ColVec1-4b
---

# TenderHub WebAI Verification Worker

A secondary verification layer for tender document processing using the webAI-ColVec1-4b multimodal model. This worker provides an alternative analysis pipeline to cross-validate the primary worker's results.

## Architecture Overview

This worker uses a different approach than the primary worker:
- **Vision-Language Model**: webAI-ColVec1-4b for direct document understanding
- **ZeroGPU Deployment**: Leverages HF Spaces ZeroGPU for on-demand GPU access
- **Memory Optimization**: 8-bit quantization + FlashAttention-2 for minimal memory overhead
- **Verification Logic**: Cross-compares results with primary worker

## Processing Pipeline

1. **Document Ingestion**: Same document retrieval as primary worker
2. **Vision Analysis**: Direct image/text processing with webAI-ColVec1-4b
3. **Structured Extraction**: Multimodal understanding for tender analysis
4. **Comparison Engine**: Cross-validation with primary worker results
5. **Confidence Scoring**: Agreement/disagreement metrics

## Deployment Strategy

- **Platform**: Hugging Face Spaces with ZeroGPU
- **Memory Management**: 8-bit quantization + CPU fallback
- **Scaling**: On-demand GPU allocation for processing tasks
- **Cost**: Free tier with dynamic GPU provisioning

## Key Differences from Primary Worker

- **Model Architecture**: Vision-language vs text-only pipeline
- **Processing Approach**: End-to-end multimodal vs staged extraction
- **Validation**: Cross-model verification vs single-model processing
- **Memory Strategy**: GPU-accelerated vs CPU-optimized

## Integration Points

- **Database**: Reads from same processing_jobs table
- **Storage**: Shared Supabase document access
- **Results**: Stores verification metrics and comparisons
- **API**: Compatible job processing interface

## Deployment Instructions

### 1. Create HF Space

```bash
# Create a new Gradio Space on Hugging Face (run `huggingface-cli login` first)
huggingface-cli repo create tenderhub-webai-verification \
  --type space \
  --space_sdk gradio
# Then set visibility to private and hardware to ZeroGPU in the Space settings
```

### 2. Environment Variables

Set these in your HF Space settings:

```bash
DATABASE_URL=postgresql://user:pass@host:port/db
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
SUPABASE_STORAGE_BUCKET=tender-documents
```
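
A minimal sketch of how the worker could validate these variables at startup, so a missing secret fails fast instead of surfacing mid-job. The `WorkerSettings` class and `load_settings` function are hypothetical names used for illustration; only the four variable names come from the list above.

```python
import os
from dataclasses import dataclass

# Variable names as listed in the Space settings above
REQUIRED_VARS = (
    "DATABASE_URL",
    "NEXT_PUBLIC_SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
    "SUPABASE_STORAGE_BUCKET",
)


@dataclass(frozen=True)
class WorkerSettings:  # hypothetical container, not the worker's actual class
    database_url: str
    supabase_url: str
    supabase_service_role_key: str
    storage_bucket: str


def load_settings(env=None) -> WorkerSettings:
    """Read the required variables and raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return WorkerSettings(
        database_url=env["DATABASE_URL"],
        supabase_url=env["NEXT_PUBLIC_SUPABASE_URL"],
        supabase_service_role_key=env["SUPABASE_SERVICE_ROLE_KEY"],
        storage_bucket=env["SUPABASE_STORAGE_BUCKET"],
    )
```

Calling `load_settings()` once at the top of `app.py` would report every missing variable in a single error message.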

### 3. Memory Optimization

The worker automatically applies several OOM prevention strategies:

- **8-bit Quantization**: Reduces weight memory for the 4B model from ~8GB (16-bit) to ~4GB, while preserving more quality than 4-bit quantization
- **FlashAttention-2**: Optimized attention mechanism with minimal memory overhead
- **Adaptive DPI**: High DPI (200-300) for better extraction with memory-aware scaling
- **CPU Loading**: Model loads on CPU, moves to GPU only during inference
- **Batch Size 1**: Processes one document at a time
- **Aggressive Memory Cleanup**: Manual garbage collection after each document to prevent ghost memory
- **Image Resizing**: Optimized to 336x336 for webAI models
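
The ~8GB and ~4GB figures for quantization follow from simple arithmetic on parameter count and bits per parameter. The sketch below covers weights only; activations, the KV cache, and framework overhead come on top, and the function name is illustrative.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


fp16_gb = estimate_weight_memory_gb(4e9, 16)  # 4B parameters at 16 bits each
int8_gb = estimate_weight_memory_gb(4e9, 8)   # same model at 8 bits each
print(f"16-bit: ~{fp16_gb:.0f} GB, 8-bit: ~{int8_gb:.0f} GB")
```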

### 4. Memory Cleanup

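Vision tensors can leave 4GB+ of "ghost memory" behind: PyTorch's caching allocator keeps freed GPU blocks reserved, and Python does not reclaim objects caught in reference cycles until the garbage collector runs. The worker implements aggressive cleanup: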

**Cleanup Strategy:**
- **GPU Cache Clearing**: Multiple passes of `torch.cuda.empty_cache()`
- **CUDA Synchronization**: Ensures all GPU operations complete before cleanup
- **Python GC**: 3-generation garbage collection with multiple passes
- **PIL Cache**: Clears image processing caches
- **Memory Monitoring**: Tracks memory freed and cleanup effectiveness

**Cleanup Triggers:**
- After every document processing
- After WebAI model inference
- On processing failures (ensure cleanup even on errors)
- Manual cleanup available via `aggressive_memory_cleanup()`
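
The strategy and triggers above can be sketched roughly as follows. This is an illustration rather than the worker's actual `aggressive_memory_cleanup()`: the `passes` parameter and the `process_with_cleanup` wrapper are assumptions, PIL cache clearing and memory monitoring are omitted, and `torch` is treated as optional so the sketch also runs on CPU-only machines.

```python
import gc

try:
    import torch  # optional: without it the sketch falls back to GC-only cleanup
except ImportError:
    torch = None


def aggressive_memory_cleanup(passes: int = 3) -> int:
    """Run several GC passes and release cached CUDA blocks if a GPU is present.

    Returns the number of unreachable objects found by the garbage collector.
    """
    collected = 0
    cuda_ready = torch is not None and torch.cuda.is_available()
    if cuda_ready:
        torch.cuda.synchronize()  # wait for pending GPU work before freeing
    for _ in range(passes):
        collected += gc.collect()  # full collection across all generations
        if cuda_ready:
            torch.cuda.empty_cache()  # hand cached blocks back to the driver
    return collected


def process_with_cleanup(process_fn, document):
    """Run process_fn(document) and always clean up, even when it raises."""
    try:
        return process_fn(document)
    finally:
        aggressive_memory_cleanup()
```

Placing the cleanup in a `finally` block is what guarantees it also runs on processing failures.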

**Monitoring:**
```bash
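# Monitor cleanup effectiveness (assumes one JSON object per log line)
grep "memory.cleanup" /var/log/app.log | jq '.memory_freed_gb'

# Track ghost memory prevention: sum the memory freed across all cleanups
grep "memory_freed_gb" /var/log/app.log | jq -r '.memory_freed_gb // empty' | awk '{sum+=$1} END {print "Total freed: " sum "GB"}'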
```

### 5. DPI Configuration

High DPI (200-300) significantly improves extraction quality for messy documents:

**Memory Impact Analysis:**
- **200 DPI**: ~4x larger images than a 100 DPI render (~1.2MB each)
- **300 DPI**: ~9x larger images than a 100 DPI render (~2.7MB each)
- **Memory Impact**: 4-9x increase during processing, since pixel count grows with the square of the DPI
- **Quality Impact**: Dramatically better text recognition in complex documents
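
The 4x and 9x figures are consistent with pixel count scaling as the square of the DPI. The sketch below assumes a 100 DPI baseline, which is inferred from those figures rather than stated by the worker.

```python
def pixel_scale_factor(dpi: int, baseline_dpi: int = 100) -> float:
    """Ratio of pixel counts between a render at `dpi` and one at `baseline_dpi`."""
    return (dpi / baseline_dpi) ** 2


for dpi in (150, 200, 250, 300):
    print(f"{dpi} DPI -> {pixel_scale_factor(dpi):.2f}x pixels")
```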

**Adaptive DPI Scaling:**
- **12GB+ Memory**: 300 DPI (maximum quality)
- **8GB+ Memory**: 250 DPI (high quality)
- **4GB+ Memory**: 200 DPI (medium quality)
- **<4GB Memory**: 150 DPI (conservative)

**Configuration Options:**
```bash
# Set maximum DPI (default: 200)
PDF_DPI=300

# Enable adaptive scaling (default: true)
ADAPTIVE_DPI=true
```
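
One way the tiers and the two settings could combine is sketched below. Treating `PDF_DPI` as a cap on the adaptive tier, and returning it unchanged when `ADAPTIVE_DPI` is off, are assumptions about the worker's behavior; `select_dpi` is a hypothetical name.

```python
import os

# (minimum available memory in GB, DPI) tiers from the list above, highest first
DPI_TIERS = ((12.0, 300), (8.0, 250), (4.0, 200))
FALLBACK_DPI = 150  # used below 4GB


def select_dpi(available_gb: float, env=None) -> int:
    """Pick a render DPI from available memory, capped by PDF_DPI."""
    env = os.environ if env is None else env
    max_dpi = int(env.get("PDF_DPI", "200"))
    adaptive = env.get("ADAPTIVE_DPI", "true").strip().lower() == "true"
    if not adaptive:
        return max_dpi  # assumption: a fixed DPI when adaptive scaling is off
    tier_dpi = next(
        (dpi for min_gb, dpi in DPI_TIERS if available_gb >= min_gb),
        FALLBACK_DPI,
    )
    return min(tier_dpi, max_dpi)
```

With `PDF_DPI=300`, `select_dpi(9.0)` selects the 250 DPI tier; with the default cap of 200, the same call is limited to 200.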

### 6. Database Schema

Add verification tables to your PostgreSQL database:

```sql
-- WebAI verification results
CREATE TABLE public.webai_verifications (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tender_id UUID NOT NULL REFERENCES public.tenders(id),
    analysis JSONB NOT NULL,
    comparison JSONB NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
);

-- Index lookups by tender
CREATE INDEX webai_verifications_tender_id_idx
    ON public.webai_verifications (tender_id);

-- Add verification status to tenders
ALTER TABLE public.tenders
ADD COLUMN verification_status TEXT DEFAULT 'PENDING',
ADD COLUMN verification_score FLOAT DEFAULT 0.0;
```

## Usage

### Automatic Verification

The worker automatically processes verification jobs from the queue:

```sql
-- Queue a verification job
INSERT INTO public.processing_jobs (tender_id, job_type, payload)
VALUES ('tender-uuid', 'VERIFY', '{}');
```

### Manual Testing

Use the Gradio interface to test individual documents:

1. Upload a PDF or image document
2. Click "Verify Document"
3. Review the structured analysis output

### Verification Results

Access verification results via the database:

```sql
-- Get verification for a tender
SELECT
    tender_id,
    analysis->>'tenderTitle' AS title,
    (comparison->>'agreement_score')::float AS agreement_score,
    comparison->'recommendation_comparison' AS bid_comparison,
    created_at
FROM public.webai_verifications
WHERE tender_id = 'your-tender-id';
```

## Comparison Metrics

The worker provides detailed comparison metrics:

- **Agreement Score**: 0.0-1.0 overall similarity
- **Bid Decision Comparison**: Primary vs WebAI recommendations
- **Confidence Comparison**: Model confidence differences
- **Key Differences**: Discrepancies requiring human review
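
As a rough illustration of how such metrics could be derived, the sketch below scores agreement as the fraction of compared fields on which both analyses match. The scoring rule, the `recommendation`, `deadline`, and `confidence` field names, and the `confidence_delta` and `key_differences` output keys are assumptions; only `tenderTitle`, `agreement_score`, and `recommendation_comparison` appear in the queries above.

```python
def compare_analyses(primary: dict, webai: dict,
                     fields=("tenderTitle", "recommendation", "deadline")) -> dict:
    """Compare two analysis dicts field by field and summarize the result."""
    if not fields:
        raise ValueError("at least one field is required")
    # Fields where the two workers disagree are flagged for human review
    differences = [name for name in fields if primary.get(name) != webai.get(name)]
    agreement = 1.0 - len(differences) / len(fields)
    confidence_delta = abs(primary.get("confidence", 0.0) - webai.get("confidence", 0.0))
    return {
        "agreement_score": round(agreement, 3),
        "recommendation_comparison": {
            "primary": primary.get("recommendation"),
            "webai": webai.get("recommendation"),
        },
        "confidence_delta": round(confidence_delta, 3),
        "key_differences": differences,
    }


primary = {"tenderTitle": "Road Works", "recommendation": "BID",
           "deadline": "2025-01-31", "confidence": 0.9}
webai = {"tenderTitle": "Road Works", "recommendation": "NO_BID",
         "deadline": "2025-01-31", "confidence": 0.7}
result = compare_analyses(primary, webai)
print(result["agreement_score"], result["key_differences"])
```

Exact equality is a deliberately strict choice; free-text fields such as titles would likely need normalization or fuzzy matching before comparison.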

## Monitoring

Monitor worker performance through structured logs:

```bash
# View recent verification logs
grep "webai-verification-worker" /var/log/app.log | tail -20

# Check agreement score distribution
grep "agreement_score" /var/log/app.log | jq '.agreement_score'
```

## Troubleshooting

### Common Issues

1. **OOM Errors**: Check that 8-bit quantization is enabled
2. **Slow Processing**: Verify ZeroGPU is working (check HF Space logs)
3. **Parsing Errors**: WebAI responses may need post-processing
4. **Database Connection**: Ensure DATABASE_URL is accessible from HF
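
For parsing errors (item 3), one common post-processing step is to pull the first JSON object out of a response that wraps it in prose. The sketch below assumes the model is prompted to answer in JSON; `extract_json_object` is a hypothetical helper, not part of the worker.

```python
import json


def extract_json_object(text: str) -> dict:
    """Return the first JSON object embedded in text, skipping surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)  # try the next candidate brace
    raise ValueError("no JSON object found in model output")


raw = 'Here is the analysis: {"tenderTitle": "Road Works", "recommendation": "BID"} Done.'
print(extract_json_object(raw))
```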

### Performance Tips

- Use smaller images when possible
- Limit `max_new_tokens` to reduce memory usage
- Monitor GPU allocation in HF Space metrics
- Consider upgrading to paid tier for higher throughput

## Cost Optimization

- **Free Tier**: ~20-30 documents/hour with 4B model, FlashAttention-2, and adaptive DPI
- **Paid Tier**: Linear scaling with GPU allocation
- **Batch Processing**: Queue multiple jobs for efficiency
- **Caching**: Reuse cached document embeddings when possible
- **Memory Efficiency**: FlashAttention-2 reduces attention memory by ~40%
- **DPI Impact**: High DPI reduces throughput by ~15-25% but dramatically improves quality