---
title: TenderHub WebAI Verification Worker
colorFrom: "blue"
colorTo: "purple"
sdk: gradio
sdk_version: "4.44.0"
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
tags:
  - document-processing
  - tender-analysis
  - verification
  - multimodal-ai
short_description: Secondary verification layer using webAI-ColVec1-4b
---

# TenderHub WebAI Verification Worker

A secondary verification layer for tender document processing using the webAI-ColVec1-4b multimodal model. This worker provides an alternative analysis pipeline to cross-validate the primary worker's results.

## Architecture Overview

This worker takes a different approach from the primary worker:

- **Vision-Language Model**: webAI-ColVec1-4b for direct document understanding
- **ZeroGPU Deployment**: Leverages HF Spaces ZeroGPU for on-demand GPU access
- **Memory Optimization**: 8-bit quantization + FlashAttention-2 for minimal memory overhead
- **Verification Logic**: Cross-compares results with the primary worker

## Processing Pipeline

1. **Document Ingestion**: Same document retrieval as the primary worker
2. **Vision Analysis**: Direct image/text processing with webAI-ColVec1-4b
3. **Structured Extraction**: Multimodal understanding for tender analysis
4. **Comparison Engine**: Cross-validation with primary worker results
5. **Confidence Scoring**: Agreement/disagreement metrics

## Deployment Strategy

- **Platform**: Hugging Face Spaces with ZeroGPU
- **Memory Management**: 8-bit quantization + CPU fallback
- **Scaling**: On-demand GPU allocation for processing tasks
- **Cost**: Free tier with dynamic GPU provisioning

## Key Differences from Primary Worker

- **Model Architecture**: Vision-language vs text-only pipeline
- **Processing Approach**: End-to-end multimodal vs staged extraction
- **Validation**: Cross-model verification vs single-model processing
- **Memory Strategy**: GPU-accelerated vs CPU-optimized

## Integration Points

- **Database**: Reads from the same `processing_jobs` table
- **Storage**: Shared Supabase document access
- **Results**: Stores verification metrics and comparisons
- **API**: Compatible job processing interface

## Deployment Instructions

### 1. Create HF Space

```bash
# Create a new Gradio Space on Hugging Face.
# Flag names differ between huggingface_hub releases; confirm them with
# `huggingface-cli repo create --help` before running.
huggingface-cli repo create tenderhub-webai-verification \
  --type space \
  --space_sdk gradio
```

After the Space exists, set its visibility to private and choose its hardware (ZeroGPU for this worker) in the Space settings.

### 2. Environment Variables

Set these in your HF Space settings:

```bash
DATABASE_URL=postgresql://user:pass@host:port/db
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
SUPABASE_STORAGE_BUCKET=tender-documents
```

### 3. Memory Optimization

The worker automatically applies several OOM prevention strategies:

- **8-bit Quantization**: Reduces 4B model weight memory from ~8GB (16-bit) to ~4GB, with less quality loss than more aggressive quantization
- **FlashAttention-2**: Optimized attention mechanism with minimal memory overhead
- **Adaptive DPI**: High DPI (200-300) for better extraction with memory-aware scaling
- **CPU Loading**: Model loads on CPU and moves to GPU only during inference
- **Batch Size 1**: Processes one document at a time
- **Aggressive Memory Cleanup**: Manual garbage collection after each document to prevent ghost memory
- **Image Resizing**: Optimized to 336x336 for webAI models

### 4. Memory Cleanup

Vision tensors can leave 4GB+ of "ghost memory" behind because Python's garbage collection is lazy. The worker implements aggressive cleanup:

**Cleanup Strategy:**

- **GPU Cache Clearing**: Multiple passes of `torch.cuda.empty_cache()`
- **CUDA Synchronization**: Ensures all GPU operations complete before cleanup
- **Python GC**: 3-generation garbage collection with multiple passes
- **PIL Cache**: Clears image processing caches
- **Memory Monitoring**: Tracks memory freed and cleanup effectiveness

**Cleanup Triggers:**

- After every document is processed
- After WebAI model inference
- On processing failures (cleanup runs even when errors occur)
- Manual cleanup available via `aggressive_memory_cleanup()`

**Monitoring:**

```bash
# Monitor cleanup effectiveness (assumes one JSON object per log line)
grep "memory.cleanup" /var/log/app.log | jq '.memory_freed_gb'

# Track ghost memory prevention: sum the freed memory across all cleanup events
grep "memory_freed_gb" /var/log/app.log | jq -rs '"Total freed: \(map(.memory_freed_gb) | add) GB"'
```

### 5. DPI Configuration

High DPI (200-300) significantly improves extraction quality for messy documents.

**Memory Impact Analysis** (relative to a 100 DPI render, since pixel count grows with the square of DPI):

- **200 DPI**: ~4x larger images (~1.2MB each)
- **300 DPI**: ~9x larger images (~2.7MB each)
- **Memory Impact**: 4-9x increase during processing
- **Quality Impact**: Dramatically better text recognition in complex documents

**Adaptive DPI Scaling:**

- **12GB+ Memory**: 300 DPI (maximum quality)
- **8GB+ Memory**: 250 DPI (high quality)
- **4GB+ Memory**: 200 DPI (medium quality)
- **<4GB Memory**: 150 DPI (conservative)

**Configuration Options:**

```bash
# Set maximum DPI (default: 200)
PDF_DPI=300

# Enable adaptive scaling (default: true)
ADAPTIVE_DPI=true
```

### 6. Database Schema

Add verification tables to your PostgreSQL database:

```sql
-- WebAI verification results
CREATE TABLE public.webai_verifications (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tender_id UUID NOT NULL REFERENCES public.tenders(id),
  analysis JSONB NOT NULL,
  comparison JSONB NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
);

-- PostgreSQL does not accept an inline INDEX clause, so create it separately
CREATE INDEX idx_webai_verifications_tender_id
  ON public.webai_verifications (tender_id);

-- Add verification status to tenders
ALTER TABLE public.tenders
  ADD COLUMN verification_status TEXT DEFAULT 'PENDING',
  ADD COLUMN verification_score FLOAT DEFAULT 0.0;
```

## Usage

### Automatic Verification

The worker automatically processes verification jobs from the queue:

```sql
-- Queue a verification job
INSERT INTO public.processing_jobs (tender_id, job_type, payload)
VALUES ('tender-uuid', 'VERIFY', '{}');
```

### Manual Testing

Use the Gradio interface to test individual documents:

1. Upload a PDF or image document
2. Click "Verify Document"
3. Review the structured analysis output

### Verification Results

Access verification results via the database:

```sql
-- Get verification for a tender
SELECT
  tender_id,
  analysis->>'tenderTitle' AS title,
  comparison->>'agreement_score' AS agreement_score,
  comparison->'recommendation_comparison' AS bid_comparison,
  created_at
FROM public.webai_verifications
WHERE tender_id = 'your-tender-id';
```

## Comparison Metrics

The worker provides detailed comparison metrics:

- **Agreement Score**: 0.0-1.0 overall similarity
- **Bid Decision Comparison**: Primary vs WebAI recommendations
- **Confidence Comparison**: Model confidence differences
- **Key Differences**: Discrepancies requiring human review

## Monitoring

Monitor worker performance through structured logs:

```bash
# View recent verification logs
grep "webai-verification-worker" /var/log/app.log | tail -20

# Check agreement score distribution
grep "agreement_score" /var/log/app.log | jq '.agreement_score'
```

## Troubleshooting

### Common Issues

1. **OOM Errors**: Check that 8-bit quantization is enabled
2. **Slow Processing**: Verify ZeroGPU is working (check HF Space logs)
3. **Parsing Errors**: WebAI responses may need post-processing
4. **Database Connection**: Ensure `DATABASE_URL` is accessible from HF

### Performance Tips

- Use smaller images when possible
- Limit `max_new_tokens` to reduce memory usage
- Monitor GPU allocation in HF Space metrics
- Consider upgrading to a paid tier for higher throughput

## Cost Optimization

- **Free Tier**: ~20-30 documents/hour with 4B model, FlashAttention-2, and adaptive DPI
- **Paid Tier**: Linear scaling with GPU allocation
- **Batch Processing**: Queue multiple jobs for efficiency
- **Caching**: Reuse cached document embeddings when possible
- **Memory Efficiency**: FlashAttention-2 reduces attention memory by ~40%
- **DPI Impact**: High DPI reduces throughput by ~15-25% but dramatically improves quality
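
## Example: Adaptive DPI Selection

The adaptive DPI tiers listed under DPI Configuration can be sketched as a small lookup. This is a minimal illustration, not the worker's actual code: the function names `select_dpi`, `pixel_scale`, and `dpi_from_env` are hypothetical, and it assumes that `PDF_DPI` acts as an upper cap on the tier value and that `ADAPTIVE_DPI=false` means "always use `PDF_DPI`". The 100 DPI baseline in `pixel_scale` is the one implied by the 4x and 9x figures.

```python
import os

# Tiers copied from the "Adaptive DPI Scaling" list: (minimum memory in GB, DPI).
DPI_TIERS = [(12.0, 300), (8.0, 250), (4.0, 200)]
FALLBACK_DPI = 150  # used when less than 4GB is available


def select_dpi(available_gb: float, max_dpi: int = 200, adaptive: bool = True) -> int:
    """Pick a render DPI for the given memory, never exceeding max_dpi (assumed cap)."""
    if not adaptive:
        return max_dpi
    for min_gb, dpi in DPI_TIERS:
        if available_gb >= min_gb:
            return min(dpi, max_dpi)
    return min(FALLBACK_DPI, max_dpi)


def pixel_scale(dpi: int, baseline_dpi: int = 100) -> float:
    """Pixel count grows with the square of DPI: 4x at 200 and 9x at 300 vs 100."""
    return (dpi / baseline_dpi) ** 2


def dpi_from_env(available_gb: float) -> int:
    """Read PDF_DPI and ADAPTIVE_DPI with the documented defaults (200, true)."""
    max_dpi = int(os.environ.get("PDF_DPI", "200"))
    adaptive = os.environ.get("ADAPTIVE_DPI", "true").strip().lower() == "true"
    return select_dpi(available_gb, max_dpi, adaptive)
```

With `PDF_DPI=300`, a 9GB budget maps to 250 DPI, while the default cap of 200 holds every tier at or below 200 regardless of memory.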
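
## Example: Memory Cleanup Routine

The cleanup strategy described under Memory Cleanup (CUDA synchronization, repeated garbage collection, cache clearing, and cleanup on failure) might look roughly like the sketch below. Only the name `aggressive_memory_cleanup()` comes from this README; its signature, return value, and the `run_with_cleanup` helper are assumptions made for illustration. PIL cache clearing is left out because the worker's approach to it is not documented here.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def aggressive_memory_cleanup(passes: int = 3) -> dict:
    """Run several GC passes and, when CUDA is available, release cached GPU memory."""
    cuda = torch is not None and torch.cuda.is_available()
    reserved_before = torch.cuda.memory_reserved() if cuda else 0

    collected = 0
    for _ in range(passes):
        if cuda:
            torch.cuda.synchronize()  # wait for pending GPU work before freeing
        collected += gc.collect()  # full collection across all three generations
        if cuda:
            torch.cuda.empty_cache()  # hand cached blocks back to the driver

    reserved_after = torch.cuda.memory_reserved() if cuda else 0
    freed_gb = max(reserved_before - reserved_after, 0) / 1024**3
    return {"objects_collected": collected, "memory_freed_gb": round(freed_gb, 3)}


def run_with_cleanup(fn, *args, **kwargs):
    """Call fn and always clean up afterwards, including when fn raises."""
    try:
        return fn(*args, **kwargs)
    finally:
        aggressive_memory_cleanup()
```

Wrapping each document in `run_with_cleanup` covers the "on processing failures" trigger, since the `finally` block runs whether or not the call succeeds.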
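
## Example: Agreement Score

This README does not specify how the agreement score in Comparison Metrics is computed. One simple possibility, shown purely as an assumption, is the fraction of compared fields on which the two analyses match. The keys `tenderTitle`, `agreement_score`, and `recommendation_comparison` appear in the queries above; `recommendation`, `deadline`, `key_differences`, and the function names are hypothetical.

```python
def _normalize(value):
    """Compare strings case-insensitively and ignore surrounding whitespace."""
    return value.strip().lower() if isinstance(value, str) else value


def compare_analyses(primary: dict, webai: dict, fields: list[str]) -> dict:
    """Field-by-field comparison; the score is the fraction of fields that match."""
    differences = []
    matches = 0
    for field in fields:
        left, right = primary.get(field), webai.get(field)
        if _normalize(left) == _normalize(right):
            matches += 1
        else:
            differences.append({"field": field, "primary": left, "webai": right})

    score = matches / len(fields) if fields else 0.0
    return {
        "agreement_score": round(score, 3),
        "recommendation_comparison": {
            "primary": primary.get("recommendation"),  # hypothetical field name
            "webai": webai.get("recommendation"),
        },
        "key_differences": differences,  # candidates for human review
    }
```

A field-match fraction is easy to audit but treats every field as equally important; a weighted score would be a natural refinement if bid decisions matter more than titles.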