check and run again.

 


Thanks, Deviprasad. You're running two programs:

  • Program 1: Downloads and caches packages (Debian and Python), installs them, runs Python code.

  • Program 2: Uses only cached packages from Google Drive and runs the same Python code offline.

❌ Error Root Causes:

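  1. cp: cannot stat '/content/drive/MyDrive/Lib/archives/partial/*.deb'
    → The glob matched nothing. apt keeps only in-progress downloads in archives/partial; finished .deb files are stored in archives/ itself, so there was most likely nothing in partial/ to copy.

  2. dpkg: cannot access archive '/content/drive/MyDrive/Lib/debs/*.deb'
    → This follows from the failed cp: nothing was copied into debs/, so dpkg has no archives to install.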

  3. PDFInfoNotInstalledError from pdf2image
    → This means pdfinfo (from poppler-utils) is not installed or not found in PATH in Program 2.
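
A quick way to confirm the third cause is to ask Python whether the binaries are visible on PATH. This is a minimal sketch using only the standard library; the helper name `missing_binaries` and the default list of binaries are assumptions for illustration (pdfinfo and pdftoppm ship with poppler-utils, and tesseract is the OCR engine itself).

```python
import shutil

def missing_binaries(names=("pdfinfo", "pdftoppm", "tesseract")):
    """Return the names from `names` that cannot be found on PATH."""
    # shutil.which returns None when no executable with that name is found.
    return [name for name in names if shutil.which(name) is None]

missing = missing_binaries()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required binaries are available.")
```

If pdfinfo shows up in the missing list in Program 2, the .deb install step did not complete, and pdf2image will fail as described above.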


✅ Solution Strategy

To fix this, change how you cache .deb files in Program 1, and ensure pdfinfo exists and is executable in Program 2.


Program 1: Install & Cache All Required Packages

bash
# Create necessary cache and deb storage folders
!mkdir -p /content/drive/MyDrive/Lib/debs

# Update APT index
!apt-get update -qq

# Download .deb files only (no install yet)
!apt-get -qq --download-only -o=dir::cache::archives="/content/drive/MyDrive/Lib/debs" install poppler-utils tesseract-ocr-tel

# Install downloaded .deb files to make them available now
!dpkg -i /content/drive/MyDrive/Lib/debs/*.deb || true

# Fix any missing dependencies
!apt-get -f install -y
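
Before relying on the cache, it may help to confirm that the download step actually left .deb files in the Drive folder. The sketch below assumes the same debs path used above; `list_cached_debs` is a hypothetical helper name, not part of any library.

```python
from pathlib import Path

def list_cached_debs(cache_dir):
    """Return the sorted names of .deb files found directly in cache_dir."""
    folder = Path(cache_dir)
    if not folder.is_dir():
        return []
    # glob("*.deb") is not recursive, so files under partial/ are ignored.
    return sorted(p.name for p in folder.glob("*.deb"))

# Assumed path: the same folder passed to dir::cache::archives above.
debs = list_cached_debs("/content/drive/MyDrive/Lib/debs")
print(f"{len(debs)} cached .deb file(s)")
for name in debs:
    print(" ", name)
```

An empty list here means Program 2 will hit the same dpkg error, so it is worth checking before going offline.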

✅ Cache Python packages:

bash
!pip download -d /content/drive/MyDrive/Lib/ Pillow pytesseract pdf2image

Program 2: Install from Cached Packages

bash
# Install .deb files from Google Drive
!dpkg -i /content/drive/MyDrive/Lib/debs/*.deb || true

# Fix dependencies if needed (this step needs network access for anything not already cached)
!apt-get -f install -y

Install Python packages offline:

bash
!pip install --no-index --find-links=/content/drive/MyDrive/Lib/ Pillow pytesseract pdf2image
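
After the offline install, a short check can confirm that the three packages are importable. This is a sketch using the standard library only; `missing_modules` is a hypothetical helper, and the module names are the import names of the packages above (Pillow is imported as PIL).

```python
import importlib.util

def missing_modules(names=("PIL", "pytesseract", "pdf2image")):
    """Return the top-level module names that cannot be located for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules()
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All Python packages are importable.")
```

If a package is listed as missing, its wheel (or one of its dependencies) was probably not saved by the pip download step in Program 1.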

🧪 Test PDF Functionality (same xyz Python code)

Here’s a test snippet that uses pdf2image and pytesseract:

python
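from pdf2image import convert_from_path
import pytesseract

# Render each PDF page to an image (this is the step that needs poppler-utils)
images = convert_from_path("sample.pdf")

# Run OCR on each page with the Telugu language data
for i, img in enumerate(images):
    text = pytesseract.image_to_string(img, lang='tel')
    print(f"Page {i+1} Text:\n{text}")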
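
Because the snippet passes lang='tel', it can also fail if the Telugu language data is not visible to tesseract. The following sketch checks this by calling `tesseract --list-langs`; the helper names are hypothetical, and reading both stdout and stderr is a precaution, on the assumption that the stream used may differ between tesseract releases.

```python
import shutil
import subprocess

def language_listed(list_langs_output, lang="tel"):
    """Return True if `lang` appears on its own line in the given output."""
    return lang in (line.strip() for line in list_langs_output.splitlines())

def tesseract_has_language(lang="tel"):
    """Ask the tesseract binary whether it can see the given language."""
    if shutil.which("tesseract") is None:
        return False
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=False,
    )
    # Check both streams in case the list is not written to stdout.
    return language_listed(result.stdout + "\n" + result.stderr, lang)

if __name__ == "__main__":
    print("Telugu data available:", tesseract_has_language("tel"))
```

A False result points at the tesseract-ocr-tel package (or tesseract itself) rather than at poppler-utils.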

✅ Summary:

  • Use --download-only and specify dir::cache::archives directly to get .deb files.

  • pdfinfo must be installed via poppler-utils, which your second program must install from .deb, not via apt-get (which won't work offline).

  • pdf2image relies on pdfinfo, so without poppler-utils properly installed you'll always get PDFInfoNotInstalledError.

Would you like me to give you a single Colab-friendly setup that handles both modes (online and offline) with checks?
