PYTHON2DAY Telegram 6649
🔥 Полезные библиотеки Python

Python PDF Handling Tutorial
— интересная подборка скриптов для работы с PDF-файлами в Python:

Вы научитесь:
➡️ Извлекать текст и изображения из PDF файлов;
➡️ Извлекать таблицы и URL адреса из PDF файлов;
➡️ Извлекать страницы из PDF файлов как изображения;
➡️ Создавать PDF файлы;
➡️ Добавлять текст, изображения и таблицы в PDF файлы;
➡️ Выделять текст в PDF файлах и многое другое.

Пример извлечения текста:

from io import StringIO
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams

# PDFMiner Analyzers
rsrcmgr = PDFResourceManager()
sio = StringIO()
codec = "utf-8"
laparams = LAParams()
device = TextConverter(rsrcmgr, sio, codec=codec, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)

# path to our input file
pdf_file = "sample.pdf"

# Extract text
pdfFile = open(pdf_file, "rb")
for page in PDFPage.get_pages(pdfFile):
interpreter.process_page(page)
pdfFile.close()

# Return text from StringIO
text = sio.getvalue()

print(text)

# Freeing Up
device.close()
sio.close()



Пример извлечения изображений:


import fitz
import io
from PIL import Image

# path to our input file
pdf_file = "sample.pdf"

# Input PDF file
pdf_file = fitz.open(pdf_file)

for page_no in range(len(pdf_file)):
curr_page = pdf_file[page_no]
images = curr_page.getImageList()

for image_no, image in enumerate(curr_page.getImageList()):
# get the XREF of the image
xref = image[0]
# extract the image bytes
curr_image = pdf_file.extractImage(xref)
img_bytes = curr_image["image"]
# get the image extension
img_extension = curr_image["ext"]
# load it to PIL
image = Image.open(io.BytesIO(img_bytes))
# save it to local disk
image.save(open(f"page{page_no+1}_img{image_no}.{img_extension}", "wb"))


⚙️ GitHub/Инструкция

#python #soft #github
Please open Telegram to view this post
VIEW IN TELEGRAM
👍309🔥8



tgoop.com/python2day/6649
Create:
Last Update:

🔥 Полезные библиотеки Python

Python PDF Handling Tutorial
— интересная подборка скриптов для работы с PDF-файлами в Python:

Вы научитесь:
➡️ Извлекать текст и изображения из PDF файлов;
➡️ Извлекать таблицы и URL адреса из PDF файлов;
➡️ Извлекать страницы из PDF файлов как изображения;
➡️ Создавать PDF файлы;
➡️ Добавлять текст, изображения и таблицы в PDF файлы;
➡️ Выделять текст в PDF файлах и многое другое.

Пример извлечения текста:

from io import StringIO
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams

# PDFMiner Analyzers
rsrcmgr = PDFResourceManager()
sio = StringIO()
codec = "utf-8"
laparams = LAParams()
device = TextConverter(rsrcmgr, sio, codec=codec, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)

# path to our input file
pdf_file = "sample.pdf"

# Extract text
pdfFile = open(pdf_file, "rb")
for page in PDFPage.get_pages(pdfFile):
interpreter.process_page(page)
pdfFile.close()

# Return text from StringIO
text = sio.getvalue()

print(text)

# Freeing Up
device.close()
sio.close()



Пример извлечения изображений:


import fitz
import io
from PIL import Image

# path to our input file
pdf_file = "sample.pdf"

# Input PDF file
pdf_file = fitz.open(pdf_file)

for page_no in range(len(pdf_file)):
curr_page = pdf_file[page_no]
images = curr_page.getImageList()

for image_no, image in enumerate(curr_page.getImageList()):
# get the XREF of the image
xref = image[0]
# extract the image bytes
curr_image = pdf_file.extractImage(xref)
img_bytes = curr_image["image"]
# get the image extension
img_extension = curr_image["ext"]
# load it to PIL
image = Image.open(io.BytesIO(img_bytes))
# save it to local disk
image.save(open(f"page{page_no+1}_img{image_no}.{img_extension}", "wb"))


⚙️ GitHub/Инструкция

#python #soft #github

BY [PYTHON:TODAY]




Share with your friend now:
tgoop.com/python2day/6649

View MORE
Open in Telegram


Telegram News

Date: |

Co-founder of NFT renting protocol Rentable World emiliano.eth shared the group Tuesday morning on Twitter, calling out the "degenerate" community, or crypto obsessives that engage in high-risk trading. Step-by-step tutorial on desktop: Hashtags A few years ago, you had to use a special bot to run a poll on Telegram. Now you can easily do that yourself in two clicks. Hit the Menu icon and select “Create Poll.” Write your question and add up to 10 options. Running polls is a powerful strategy for getting feedback from your audience. If you’re considering the possibility of modifying your channel in any way, be sure to ask your subscribers’ opinions first. The channel also called on people to turn out for illegal assemblies and listed the things that participants should bring along with them, showing prior planning was in the works for riots. The messages also incited people to hurl toxic gas bombs at police and MTR stations, he added.
from us


Telegram [PYTHON:TODAY]
FROM American