check and run again.
Thanks, Deviprasad. You're running two programs:
- Program 1: Downloads and caches packages (Debian and Python), installs them, runs Python code.
- Program 2: Uses only cached packages from Google Drive and runs the same Python code offline.
❌ Error Root Causes:
- `cp: cannot stat '/content/drive/MyDrive/Lib/archives/partial/*.deb'`
  → This means no `.deb` files were downloaded or stored during `apt-get` with `-o=dir::cache`.
- `dpkg: cannot access archive '/content/drive/MyDrive/Lib/debs/*.deb'`
  → This is a result of the `cp` failing earlier.
- `PDFInfoNotInstalledError` from `pdf2image`
  → This means `pdfinfo` (from `poppler-utils`) is not installed or not found in PATH in Program 2.
✅ Solution Strategy
To fix this, change how you cache `.deb` files in Program 1, and ensure `pdfinfo` exists and is executable in Program 2.
✅ Program 1: Install & Cache All Required Packages
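A minimal sketch of the download step, assuming a Colab notebook with Google Drive already mounted at `/content/drive`. The cache directory reuses the `debs` path from the `dpkg` error above; the package list (`poppler-utils`, plus `tesseract-ocr` for `pytesseract`) and the helper names are assumptions, not something taken from your notebook.

```python
import os
import subprocess

# Assumed cache location on Drive (same path as in the dpkg error message).
DEB_CACHE = "/content/drive/MyDrive/Lib/debs"
# Assumed package list; poppler-utils is the package that provides pdfinfo.
APT_PACKAGES = ["poppler-utils", "tesseract-ocr"]


def build_apt_download_cmd(cache_dir, packages):
    """Build an apt-get command that only downloads .deb files into cache_dir."""
    return [
        "apt-get", "install", "-y", "--download-only",
        "-o", f"Dir::Cache::archives={cache_dir}",
        *packages,
    ]


def download_debs(cache_dir=DEB_CACHE, packages=APT_PACKAGES):
    """Download the .deb files (needs network access and root, as in Colab)."""
    # apt-get keeps in-progress downloads in a 'partial' subdirectory.
    os.makedirs(os.path.join(cache_dir, "partial"), exist_ok=True)
    subprocess.run(["apt-get", "update"], check=True)
    subprocess.run(build_apt_download_cmd(cache_dir, packages), check=True)


# In a Colab cell, call: download_debs()
```

Completed downloads land directly in the archives directory; `apt-get` uses the `partial` subdirectory only for in-progress files, so a later `cp` should read from the directory itself rather than from `partial/`. If a package is already installed, `apt-get` may skip the download; adding `--reinstall` to the command is one way to force it.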
✅ Cache Python packages:
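One way to cache the Python packages is `pip download`, sketched below. The Drive directory `/content/drive/MyDrive/Lib/pip` and the helper names are hypothetical; the package list is taken from the libraries mentioned in this thread.

```python
import subprocess
import sys

# Hypothetical cache directory on Drive for wheels and source archives.
PIP_CACHE = "/content/drive/MyDrive/Lib/pip"
# Libraries mentioned in this thread; extend the list as needed.
PY_PACKAGES = ["pdf2image", "pytesseract"]


def build_pip_download_cmd(cache_dir, packages):
    """Build a 'pip download' command that stores packages (and deps) in cache_dir."""
    return [sys.executable, "-m", "pip", "download", "--dest", cache_dir, *packages]


def cache_python_packages(cache_dir=PIP_CACHE, packages=PY_PACKAGES):
    """Download the packages and their dependencies (needs network access)."""
    subprocess.run(build_pip_download_cmd(cache_dir, packages), check=True)


# In a Colab cell, call: cache_python_packages()
```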
✅ Program 2: Install from Cached Packages
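A sketch of the offline install, assuming the `.deb` files were cached in `/content/drive/MyDrive/Lib/debs` by Program 1. The helper names are illustrative. It fails early if the cache is empty, which avoids the `dpkg: cannot access archive` error, and then checks that `pdfinfo` is actually on PATH.

```python
import glob
import os
import shutil
import subprocess

# Assumed cache location on Drive (same path as in the dpkg error message).
DEB_CACHE = "/content/drive/MyDrive/Lib/debs"


def find_debs(cache_dir):
    """Return the cached .deb files in cache_dir, sorted by name."""
    return sorted(glob.glob(os.path.join(cache_dir, "*.deb")))


def install_cached_debs(cache_dir=DEB_CACHE):
    """Install every cached .deb with dpkg, without touching the network."""
    debs = find_debs(cache_dir)
    if not debs:
        raise FileNotFoundError(f"No .deb files found in {cache_dir}")
    # Passing all files in one call lets dpkg sort out dependency order.
    subprocess.run(["dpkg", "-i", *debs], check=True)
    if shutil.which("pdfinfo") is None:
        raise RuntimeError("pdfinfo is not on PATH after installing cached packages")


# In a Colab cell, call: install_cached_debs()
```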
Install Python packages offline:
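For the Python side, `pip install` can be pointed at a local directory with `--no-index` and `--find-links`. This sketch assumes the packages were saved earlier with `pip download`; the Drive path `/content/drive/MyDrive/Lib/pip` and the helper names are hypothetical.

```python
import subprocess
import sys

# Hypothetical cache directory on Drive, filled earlier with 'pip download'.
PIP_CACHE = "/content/drive/MyDrive/Lib/pip"
# Libraries mentioned in this thread; extend the list as needed.
PY_PACKAGES = ["pdf2image", "pytesseract"]


def build_pip_offline_install_cmd(cache_dir, packages):
    """Build a pip command that installs only from cache_dir, never from PyPI."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                   # do not contact the package index
        f"--find-links={cache_dir}",    # resolve packages from the local cache
        *packages,
    ]


def install_python_packages_offline(cache_dir=PIP_CACHE, packages=PY_PACKAGES):
    """Install the cached packages without network access."""
    subprocess.run(build_pip_offline_install_cmd(cache_dir, packages), check=True)


# In a Colab cell, call: install_python_packages_offline()
```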
🧪 Test PDF Functionality (same xyz Python code)
Here’s a test snippet that uses `pdf2image` and `pytesseract`:
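A minimal sketch of such a test, assuming `pdf2image`, `pytesseract`, `poppler-utils`, and the Tesseract binary are already installed. The function names and the sample path in the usage comment are placeholders, not your actual code. It checks for `pdfinfo` first so a missing Poppler install produces a clear message.

```python
import shutil


def pdfinfo_available():
    """Return True if the pdfinfo binary (from poppler-utils) is on PATH."""
    return shutil.which("pdfinfo") is not None


def ocr_pdf(pdf_path):
    """Convert each PDF page to an image and return the OCR text per page."""
    if not pdfinfo_available():
        raise RuntimeError("pdfinfo not found; install poppler-utils first")

    # Imported here so the PATH check above works even without these packages.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path)  # list of PIL images, one per page
    return [pytesseract.image_to_string(page) for page in pages]


# Usage in a Colab cell (placeholder path):
# texts = ocr_pdf("/content/sample.pdf")
# print(texts[0][:500])
```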
✅ Summary:
- Use `--download-only` and specify `dir::cache::archives` directly to get `.deb` files.
- `pdfinfo` must be installed via `poppler-utils`, which your second program must install from the cached `.deb` files, not via `apt-get` (which won't work offline).
- `pdf2image` relies on `pdfinfo`, so without `poppler-utils` properly installed, you'll always get `PDFInfoNotInstalledError`.
Would you like me to give you a single Colab-friendly setup that handles both modes (online and offline) with checks?