Convert PDF to HTML
#

This post explains how to get a free website using GitHub and GitHub Pages. I wrote it because I needed a way to turn a portfolio PDF into a website. The script below takes a PDF and generates an index.html file that points to images of the PDF's pages. So it isn’t really converting the PDF to HTML; it converts each page to an image and then references those images from index.html.

Prerequisites
#

You need to have a GitHub account.
Install the required dependencies on your machine.

How to Install Dependencies
#

Install the required Python packages:
```
pip install pdf2image pillow
```
Install poppler-utils (required by pdf2image). On Debian or Ubuntu:
```
sudo apt-get install poppler-utils
```
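
Before running anything, it can help to confirm that both dependencies are visible from Python. The following is a minimal sketch and not part of the original script; the function name `missing_dependencies` is made up for illustration. It relies on the fact that pdf2image calls Poppler's `pdftoppm` and `pdfinfo` command-line tools, so both need to be on your PATH.

```python
import importlib.util
import shutil


def missing_dependencies():
    """Return a list of the dependencies that could not be found."""
    missing = []
    # Python packages: pdf2image, and Pillow (which is imported as "PIL")
    for module in ("pdf2image", "PIL"):
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    # Poppler command-line tools that pdf2image runs under the hood
    for tool in ("pdftoppm", "pdfinfo"):
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


missing = missing_dependencies()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All dependencies found.")
```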

Step 1: Prepare Your PDF File
#

Place your PDF file in the same directory as the script.
If your PDF is not named portfolio.pdf, either rename it to portfolio.pdf or edit the script to use your PDF file’s name.

Step 2: Run the Script
#

To generate index.html and the page images from your PDF, save the following script as convert_pdf.py:

```
# How to use
# 1. Install the required dependencies:
#    - Install Python packages:
#      pip install pdf2image pillow
#    - Install poppler-utils (required by pdf2image):
#      sudo apt-get install poppler-utils
# 2. Run the script:
#    - To use default settings:
#      python3 convert_pdf.py
#    - To specify a PDF file and output folder:
#      python3 convert_pdf.py path/to/your_pdf_file.pdf path/to/output_folder

from pdf2image import convert_from_path
import os
import sys

# Default PDF file and output folder
pdf_file = "portfolio.pdf"  # Default PDF file name
output_folder = "webpage"  # Default folder where the HTML and images will be saved

# Check for command-line arguments to override defaults
if len(sys.argv) > 1:
    pdf_file = sys.argv[1]  # First command-line argument is the PDF file
    print(f"Using provided PDF file: {pdf_file}")
else:
    print(f"No PDF file provided. Using default: {pdf_file}")

if len(sys.argv) > 2:
    output_folder = sys.argv[2]  # Second command-line argument is the output folder
    print(f"Using provided output folder: {output_folder}")
else:
    print(f"No output folder provided. Using default: {output_folder}")

# Subfolder to store the images (index.html links to it by this name)
assets_name = "assets"
assets_folder = os.path.join(output_folder, assets_name)

# Start the conversion process
print("Starting PDF to HTML conversion...")
print(f"PDF file to convert: {pdf_file}")
print(f"Output folder: {output_folder}")

# Create the output and assets folders if they don't exist
print("Creating necessary directories...")
if not os.path.exists(output_folder):
    os.makedirs(output_folder)  # Create the output directory
    print(f"Created directory: {output_folder}")
if not os.path.exists(assets_folder):
    os.makedirs(assets_folder)  # Create the subdirectory for images
    print(f"Created directory: {assets_folder}")

# Convert PDF to images
print("Converting PDF to images...")
try:
    # Convert each page of the PDF into an image at 300 DPI resolution
    pages = convert_from_path(pdf_file, dpi=300)
    print(f"Successfully converted {len(pages)} pages from PDF to images.")
except Exception as e:
    # If there is an error during conversion, print the error and exit with a failure code
    print(f"Error during PDF to image conversion: {e}")
    sys.exit(1)

# Initialize a list to store the paths of the saved images
image_files = []
for i, page in enumerate(pages):
    # Define the filename for each image (page_1.jpg, page_2.jpg, etc.)
    image_filename = os.path.join(assets_folder, f'page_{i+1}.jpg')
    try:
        # Save each page as a JPEG image in the assets folder
        page.save(image_filename, 'JPEG')
        image_files.append(image_filename)
        print(f"Saved image: {image_filename}")
    except Exception as e:
        # If there is an error saving an image, print the error
        print(f"Error saving image {image_filename}: {e}")

# Begin generating the HTML file
print("Generating HTML file...")
html_content = """
<html>
<head>
<style>
    body {
        margin: 0;
        padding: 0;
        background-color: #f0f0f0;  /* Set a light background color */
    }
    img {
        display: block;
        margin: 0 auto;  /* Center images horizontally */
    }
</style>
</head>
<body>
"""

# Add each successfully saved image to the HTML content
print("Adding images to the HTML content...")
for image_path in image_files:
    # Link relative to index.html, using "/" so the URL works on every platform
    image_file = f"{assets_name}/{os.path.basename(image_path)}"
    # Add a div for each image with 100% width and a max-width of 1200px
    html_content += f'<div><img src="{image_file}" style="width:100%; max-width:1200px;"/></div>\n'
    print(f"Added image {image_file} to HTML content.")

html_content += "</body></html>"

# Define the filename for the index HTML file
html_filename = os.path.join(output_folder, "index.html")
print(f"Saving HTML file to: {html_filename}")
try:
    # Write the generated HTML content to the output file
    with open(html_filename, "w", encoding="utf-8") as f:
        f.write(html_content)
    print(f"HTML file created successfully: {html_filename}")
except Exception as e:
    # If there is an error writing the HTML file, print the error
    print(f"Error writing HTML file: {e}")
```

To run the script:

Using default settings (PDF file named portfolio.pdf and output folder webpage):
```
python3 convert_pdf.py
```

To specify a custom PDF file and output folder:
```
python3 convert_pdf.py path/to/your_pdf_file.pdf path/to/output_folder
```
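
The script renders each page at 300 DPI, while the generated HTML caps every image at 1200px wide. A rough way to see what that means for image size is to multiply the page size in inches by the DPI. The sketch below is only illustrative: the helper name `pixels_at_dpi` is made up, and it assumes a US Letter page (8.5 x 11 inches), which your PDF may not use.

```python
def pixels_at_dpi(width_in, height_in, dpi):
    """Pixel dimensions of a page of the given size (in inches) rendered at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


# US Letter is 8.5 x 11 inches
print(pixels_at_dpi(8.5, 11, 300))  # (2550, 3300)
print(pixels_at_dpi(8.5, 11, 150))  # (1275, 1650)
```

If page load time matters more than sharpness, lowering the `dpi` value passed to `convert_from_path` is one option to try; under the Letter-size assumption, 150 DPI still produces images wider than the 1200px display cap.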

Step 3: Upload to GitHub Pages
#

Create a Repository: On GitHub, create a new repository named <YOUR USERNAME>.github.io.
Upload Files: Upload the contents of your output folder (including index.html and the assets folder) to the repository.
Publish: Once uploaded, your site will be live at https://<YOUR USERNAME>.github.io.
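
Before uploading, it can be worth checking that every image referenced by index.html actually exists in the output folder, since a wrong folder name only shows up as broken images once the site is live. The following is a minimal sketch using only the standard library; the names `ImgSrcCollector` and `missing_images` are made up for illustration and are not part of the script above.

```python
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_images(site_folder):
    """Return the image paths referenced by index.html that do not exist on disk."""
    index_path = os.path.join(site_folder, "index.html")
    with open(index_path, encoding="utf-8") as f:
        parser = ImgSrcCollector()
        parser.feed(f.read())
    # src values are relative URLs, so split on "/" before building a local path
    return [
        src for src in parser.sources
        if not os.path.isfile(os.path.join(site_folder, *src.split("/")))
    ]
```

Calling `missing_images("webpage")` on the default output folder returns a list of broken references; an empty list means every image that index.html points to is present.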

Step 4: Visit Your Website
#

After uploading the files, visit your new website by navigating to https://<YOUR USERNAME>.github.io in your web browser.
