Run Python code offline, similar to Colab

The [Programiz Python Online Compiler](https://www.programiz.com/python-programming/online-compiler/) is a web-based tool: it cannot run offline, and unlike Google Colab it gives you no file system to upload PDFs or libraries into. To run the PDF-to-text conversion code offline, you need a local Python environment on your own computer with the libraries downloaded from Colab (`pdf2image`, `pytesseract`, `Pillow`, and their dependencies), plus two system programs: Poppler (`poppler-utils`) and the Tesseract OCR engine. Below, I'll adapt the process: download the libraries in Colab, set up an offline Python environment, and run the code locally. The original code is mostly correct but needs adjustments for offline use and local paths. I'll also explain where to place the downloaded files and how to install `poppler-utils` offline.


### Issues with Programiz Compiler

- **Online Only**: Programiz requires an internet connection and doesn’t allow local file uploads (e.g., PDFs or library files) or offline execution.

- **No File System Access**: It lacks access to `/content/lib` or local directories, making it unsuitable for this task.

- **Solution**: Use a local Python installation (e.g., Python 3.8+ with an IDE like IDLE, VS Code, or PyCharm) as an offline compiler.


### Step-by-Step Instructions


#### Step 1: Download Library Files in Google Colab

Run the following code in Google Colab to download `pdf2image`, `pytesseract`, `Pillow`, and their dependencies to `/content/lib`. This also verifies the downloaded files.


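```python
# Create the /content/lib directory
!mkdir -p /content/lib

# Optional: install the libraries in Colab so you can test the script there first
!apt-get install -y poppler-utils
!pip install pdf2image pytesseract Pillow

# Download pdf2image, pytesseract, Pillow, and their dependencies to /content/lib.
# Without extra options, pip picks wheels for Colab itself (Linux x86_64, Colab's Python).
!pip download pdf2image pytesseract Pillow -d /content/lib

# For another OS, request matching wheels instead, e.g. 64-bit Windows with Python 3.11
# (adjust --platform and --python-version to your local machine):
# !pip download pdf2image pytesseract Pillow -d /content/lib --only-binary=:all: --platform win_amd64 --python-version 3.11

# List the downloaded files to verify
!ls /content/lib
```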


**Expected Output**: Files like:

```

pdf2image-1.17.0-py3-none-any.whl

pytesseract-0.3.13-py3-none-any.whl

Pillow-10.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

...

```


**Notes**:

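- Dependencies such as `packaging` (required by `pytesseract`) are downloaded as well.

- `pdf2image` and `pytesseract` are pure-Python wheels (`py3-none-any`) that install on any OS, but the Pillow wheel only installs on the Python version and OS it was built for (`cp311` means CPython 3.11, `manylinux` means Linux).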

- `poppler-utils` and the Tesseract OCR engine are system programs, not Python packages, so `pip download` cannot fetch them. You'll install them separately in Step 3.
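To see which downloaded wheels will install on any machine and which are tied to one platform, you can inspect the wheel file names. The sketch below uses hypothetical helper names (`wheel_tags`, `is_pure_python`) and assumes the wheels sit in a local `lib` folder. It only reads the standard wheel naming scheme `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`; it does not test compatibility with the running interpreter.

```python
from pathlib import Path
from typing import Tuple


def wheel_tags(filename: str) -> Tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) parsed from a wheel file name."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) < 5:
        raise ValueError(f"not a wheel file name: {filename}")
    # The last three dash-separated fields are the compatibility tags.
    return parts[-3], parts[-2], parts[-1]


def is_pure_python(filename: str) -> bool:
    """True if the wheel has no compiled code (abi 'none', platform 'any')."""
    _, abi, platform = wheel_tags(filename)
    return abi == "none" and platform == "any"


if __name__ == "__main__":
    lib_dir = Path("lib")  # assumed location of the downloaded wheels
    for wheel in sorted(lib_dir.glob("*.whl")):
        kind = "any platform" if is_pure_python(wheel.name) else "platform-specific"
        print(f"{wheel.name}: {kind} {wheel_tags(wheel.name)}")
```

For the sample file names shown above, `pdf2image` and `pytesseract` report `any platform`, while the Pillow wheel reports its `cp311` and `manylinux` tags.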


#### Step 2: Download the Library Folder and PDF

1. Colab’s file explorer (left sidebar) can download individual files but not folders, so first zip the folder in a cell (`!cd /content && zip -r lib.zip lib`), then right-click `lib.zip` and select “Download” to save it to your local machine.

2. If your PDF (`Pravakthalu-Yevaru.pdf`) exists only in Colab’s `/content/` directory, download it as well:

   - Right-click `Pravakthalu-Yevaru.pdf` in Colab’s file explorer and select “Download”.

3. Transfer the `lib` folder and `Pravakthalu-Yevaru.pdf` to a local directory, e.g.:

   - Windows: `C:\offline_python\lib` and `C:\offline_python\Pravakthalu-Yevaru.pdf`

   - Linux/Mac: `~/offline_python/lib` and `~/offline_python/Pravakthalu-Yevaru.pdf`
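If you prefer to zip and download from a notebook cell, here is a minimal sketch. It assumes the wheels are in `/content/lib`; `zip_folder` is a hypothetical helper built on the standard `shutil.make_archive`, and `google.colab.files.download` is only available inside Colab, so outside Colab the cell just reports where the archive was written.

```python
import os
import shutil


def zip_folder(src_dir: str, archive_base: str) -> str:
    """Zip the contents of src_dir into archive_base + '.zip' and return its path."""
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)


if os.path.isdir("/content/lib"):
    archive = zip_folder("/content/lib", "/content/lib")  # creates /content/lib.zip
    try:
        from google.colab import files  # only importable inside Colab
        files.download(archive)  # starts a browser download of the archive
    except ImportError:
        print(f"Not running in Colab; archive saved at {archive}")
```

Because `root_dir` is the `lib` folder itself, the wheels are stored at the top level of `lib.zip`, so extract the archive into a folder named `lib`.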


#### Step 3: Set Up an Offline Python Environment

1. **Install Python**:

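   - Download and install Python 3.8+ from [python.org](https://www.python.org/downloads/) or Miniconda from [conda.io](https://docs.conda.io/en/latest/miniconda.html). Use the Python version that matches the Pillow wheel you downloaded (e.g., Python 3.11 for a `cp311` wheel).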

   - Verify (on many Linux/Mac systems the command is `python3`):

     ```bash

     python --version

     ```


2. **Install poppler-utils Offline**:

   - **Windows**:

     - Download the latest `poppler` release ZIP from [GitHub](https://github.com/oschwartz10612/poppler-windows/releases) (e.g., the 24.08.0 release).

     - Extract to `C:\offline_python\poppler`.

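     - Add the `Library\bin` folder of the extracted archive (e.g., `C:\offline_python\poppler\Library\bin`; some releases add a version-named folder in between, so check the actual path) to PATH. In Command Prompt, `set` changes PATH only for the current session; for a permanent change, edit PATH under System Properties > Environment Variables:

       ```bat
       set PATH=%PATH%;C:\offline_python\poppler\Library\bin
       ```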

   - **Linux**:

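     - On a machine with internet access running the same Ubuntu release, download the `poppler-utils` package and the `libpoppler` library package it depends on, either from [packages.ubuntu.com](https://packages.ubuntu.com/) or with `apt-get download` (`apt-cache depends poppler-utils` lists the exact package names).

     - Copy the `.deb` files to the offline machine and install them together (if `dpkg` reports further missing dependencies, download those as well):

       ```bash
       sudo dpkg -i libpoppler*.deb poppler-utils*.deb
       ```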

   - **Mac**:

     - Install `poppler` with Homebrew (`brew install poppler`) or MacPorts (`sudo port install poppler`). Both need an internet connection during installation; afterwards Poppler runs offline.


3. **Install Tesseract OCR Offline**:

   - **Windows**:

     - Download the Tesseract installer from [GitHub](https://github.com/UB-Mannheim/tesseract/wiki) (the 64-bit `tesseract-ocr-w64-setup-*.exe`).

     - Install and add to PATH (e.g., `C:\Program Files\Tesseract-OCR`).

     - Download Telugu language data (`tel.traineddata`) from [tesseract-ocr/tessdata](https://github.com/tesseract-ocr/tessdata) and place in `C:\Program Files\Tesseract-OCR\tessdata`.

   - **Linux**:

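     - On a machine with internet access running the same Ubuntu release, download the packages (`apt-cache depends tesseract-ocr` lists further dependencies, such as the `libtesseract` library, that may also be needed):

       ```bash
       apt-get download tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd tesseract-ocr-tel
       ```

     - Copy the `.deb` files to the offline machine and install them together:

       ```bash
       sudo dpkg -i *.deb
       ```

     - The `tesseract-ocr-tel` package installs `tel.traineddata` into the tessdata directory (`/usr/share/tesseract-ocr/5/tessdata` for Tesseract 5, `/usr/share/tesseract-ocr/4.00/tessdata` for Tesseract 4). Copy the file there manually only if you downloaded it from GitHub instead.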

   - **Mac**:

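     - Install Tesseract with Homebrew while online (`brew install tesseract`), then add Telugu either with `brew install tesseract-lang` (all language packs) or by copying `tel.traineddata` into Homebrew's `tessdata` folder.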


4. **Create Project Directory**:

   - Create a folder, e.g., `C:\offline_python` (Windows) or `~/offline_python` (Linux/Mac).

   - Place `lib`, `Pravakthalu-Yevaru.pdf`, and the Python script here.


5. **Install Libraries Offline**:

   - Open a terminal and navigate to the project directory:

     ```bash

     cd C:\offline_python  # Windows

     cd ~/offline_python   # Linux/Mac

     ```

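   - Install the libraries from the local `lib` folder only. `--no-index` stops pip from contacting PyPI and `--find-links` points it at the wheels; this form works in every shell, including Windows Command Prompt, which does not expand wildcards such as `lib/*`:

     ```bash
     pip install --no-index --find-links lib pdf2image pytesseract Pillow
     ```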
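Before running the OCR script, a quick check that the command-line tools are reachable can save debugging time. This is a sketch under assumptions: it looks for Poppler's `pdfinfo` and `pdftoppm` (which `pdf2image` calls) and for `tesseract` on PATH using only the standard library, and it parses the output of `tesseract --list-langs`, whose first line is a header in current Tesseract versions. The helper names `find_tools` and `tesseract_languages` are hypothetical.

```python
import shutil
import subprocess
import sys
from typing import Dict, List, Optional, Sequence

# Poppler tools used by pdf2image, plus the Tesseract OCR engine
REQUIRED_TOOLS = ("pdfinfo", "pdftoppm", "tesseract")


def find_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each command name to its full path on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in tools}


def tesseract_languages() -> List[str]:
    """Return the language codes from `tesseract --list-langs` ([] if unavailable)."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Skip the header line ('List of available languages ...'); older
    # Tesseract versions print the list to stderr instead of stdout.
    lines = (result.stdout or result.stderr).splitlines()
    return [line.strip() for line in lines[1:] if line.strip()]


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "on", sys.platform)
    for tool, path in find_tools().items():
        print(f"{tool:10} {path or 'NOT FOUND on PATH'}")
    available = tesseract_languages()
    for code in ("tel", "eng"):
        print(f"Tesseract language {code}: {'ok' if code in available else 'missing'}")
```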


#### Step 4: Corrected Python Code

Save the following code as `pdf_to_text.py` in your project directory (e.g., `C:\offline_python\pdf_to_text.py`). It’s updated for local paths and offline compatibility.


```python
import os

import pytesseract
from pdf2image import convert_from_path

# Paths to the files (update for your machine; on Windows use raw strings,
# e.g. r'C:\offline_python\Pravakthalu-Yevaru.pdf')
pdf_file_path = 'Pravakthalu-Yevaru.pdf'
output_text_file_path = 'Pravakthalu-Yevaru.txt'

# Optional: set these if poppler or tesseract is not on your PATH (Windows examples)
poppler_path = None  # e.g. r'C:\offline_python\poppler\Library\bin'
# pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Verify that the PDF file exists
if not os.path.exists(pdf_file_path):
    raise FileNotFoundError(f"PDF file {pdf_file_path} not found")

# Convert every PDF page to an image (requires poppler)
images = convert_from_path(pdf_file_path, dpi=300, poppler_path=poppler_path)

# Tesseract options: default engine mode (--oem 3), single uniform text block (--psm 6)
tesseract_config = r'--oem 3 --psm 6'

# OCR each page with the Telugu + English models and write the text to one file
with open(output_text_file_path, 'w', encoding='utf-8') as output_file:
    for i, image in enumerate(images):
        text = pytesseract.image_to_string(image, lang='tel+eng', config=tesseract_config)
        output_file.write(f"Page {i + 1}\n")
        output_file.write(text)
        output_file.write("\n\n")

print(f'OCR text written to file "{output_text_file_path}"')
```


**Notes**:

- Update `pdf_file_path` and `output_text_file_path` to match your local paths. For Windows paths, use raw strings (e.g., `r'C:\offline_python\Pravakthalu-Yevaru.pdf'`) so backslashes are not treated as escape sequences.

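- Ensure the `poppler` and `tesseract` binaries are on your PATH, or pass `poppler_path=...` to `convert_from_path` and set `pytesseract.pytesseract.tesseract_cmd` to the full path of `tesseract.exe`.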
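`convert_from_path` renders every page into memory at once, which can be heavy for a long book at 300 DPI. One way around this, sketched here as an option rather than a required change, is to read the page count with `pdf2image.pdfinfo_from_path` and convert a few pages at a time with the `first_page`/`last_page` arguments. `page_ranges` and `ocr_pdf_in_batches` are hypothetical helper names, and the batch size of 10 is arbitrary.

```python
from typing import Iterator, Optional, Tuple


def page_ranges(total_pages: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (first_page, last_page) pairs covering all pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def ocr_pdf_in_batches(pdf_path: str, out_path: str, batch_size: int = 10,
                       poppler_path: Optional[str] = None,
                       lang: str = "tel+eng") -> None:
    """OCR a PDF batch by batch so only batch_size page images are in memory at once."""
    # Imported inside the function so page_ranges() works without these installed.
    import pytesseract
    from pdf2image import convert_from_path, pdfinfo_from_path

    total = pdfinfo_from_path(pdf_path, poppler_path=poppler_path)["Pages"]
    with open(out_path, "w", encoding="utf-8") as out:
        for first, last in page_ranges(total, batch_size):
            images = convert_from_path(pdf_path, dpi=300, first_page=first,
                                       last_page=last, poppler_path=poppler_path)
            for offset, image in enumerate(images):
                text = pytesseract.image_to_string(image, lang=lang,
                                                   config="--oem 3 --psm 6")
                out.write(f"Page {first + offset}\n{text}\n\n")
```

For example, `ocr_pdf_in_batches('Pravakthalu-Yevaru.pdf', 'Pravakthalu-Yevaru.txt')` writes the same `Page N` layout as the script above.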


#### Step 5: Run the Code Offline

1. Place `Pravakthalu-Yevaru.pdf` in the project directory.

2. Open a terminal and navigate to the project directory:

   ```bash

   cd C:\offline_python  # Windows

   cd ~/offline_python   # Linux/Mac

   ```

3. Run the script:

   ```bash

   python pdf_to_text.py

   ```

4. Check for `Pravakthalu-Yevaru.txt` in the directory, containing the extracted text.
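To sanity-check the result, you can count the `Page N` headers and estimate how much of the text is Telugu; a ratio near zero usually means the `tel` language data was not used. This is a rough heuristic sketch: it assumes the Unicode Telugu block (U+0C00 to U+0C7F), the helper names are hypothetical, and an OCR line that happens to read exactly `Page 12` would also be counted as a header.

```python
import os
import re
from typing import Tuple

TELUGU_FIRST, TELUGU_LAST = 0x0C00, 0x0C7F  # Unicode "Telugu" block


def telugu_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Telugu block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    telugu = sum(TELUGU_FIRST <= ord(ch) <= TELUGU_LAST for ch in chars)
    return telugu / len(chars)


def summarize_output(path: str) -> Tuple[int, float]:
    """Return (number of 'Page N' header lines, Telugu ratio) for an OCR output file."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    pages = len(re.findall(r"^Page \d+$", text, flags=re.MULTILINE))
    return pages, telugu_ratio(text)


if __name__ == "__main__":
    output_path = "Pravakthalu-Yevaru.txt"  # output file written by the script
    if os.path.exists(output_path):
        pages, ratio = summarize_output(output_path)
        print(f"{pages} pages, {ratio:.0%} of non-space characters are Telugu")
    else:
        print(f"{output_path} not found; run pdf_to_text.py first")
```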


#### Step 6: Offline Compiler Setup

Use a local Python IDE as your offline compiler instead of Programiz:

- **IDLE**: Included with Python. Open `pdf_to_text.py` and press F5 to run.

- **VS Code**: Install VS Code, add the Python extension, set the interpreter to your local Python, and run the script.

- **PyCharm**: Install PyCharm Community, open the project, and run `pdf_to_text.py`.


#### Troubleshooting

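- **Library Installation Fails**: Errors such as `is not a supported wheel on this platform` or `Could not find a version that satisfies the requirement Pillow` usually mean a wheel (typically Pillow's) was built for a different OS or Python version than yours. Re-download it in Colab with `--only-binary=:all: --platform <your platform> --python-version <your version>` (e.g., `win_amd64` and `3.11`), then install the packages one at a time to see which one fails:

  ```bash
  pip install --no-index --find-links lib Pillow
  pip install --no-index --find-links lib pytesseract
  pip install --no-index --find-links lib pdf2image
  ```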

- **poppler Error**: Ensure the `poppler` binaries are on PATH and accessible (e.g., `pdfinfo -v` prints a version).

- **Tesseract Error**: Verify that `tesseract --version` runs and that `tesseract --list-langs` includes `tel`; if it does not, place `tel.traineddata` in the tessdata directory.

- **FileNotFoundError**: Check PDF path in `pdf_to_text.py`.
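If `pip` on your PATH belongs to a different Python installation than the one running your script, the libraries end up in the wrong place. Running pip through the current interpreter (`python -m pip`) avoids this. The sketch below builds that command from `sys.executable` with the standard pip options `--no-index` and `--find-links`; the helper name `offline_install_command` and the `lib` folder location are assumptions.

```python
import os
import subprocess
import sys
from typing import List, Sequence


def offline_install_command(lib_dir: str, packages: Sequence[str]) -> List[str]:
    """Build a pip command that installs packages only from wheels in lib_dir."""
    # sys.executable -m pip: install into the interpreter running this script.
    # --no-index: never contact PyPI. --find-links: look for wheels in lib_dir.
    return [sys.executable, "-m", "pip", "install", "--no-index",
            "--find-links", lib_dir, *packages]


if __name__ == "__main__":
    lib_dir = "lib"  # assumed: wheels copied from Colab into ./lib
    cmd = offline_install_command(lib_dir, ["pdf2image", "pytesseract", "Pillow"])
    if os.path.isdir(lib_dir):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
    else:
        print(f"{lib_dir!r} folder not found; nothing installed")
```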


This setup lets the code run offline once all dependencies are installed locally. If you need help with a specific IDE or run into errors, let me know!
