
Convert PDF to Text on Linux

How can you convert a PDF to text on Linux? This guide walks through three ways to do it step by step, using a desktop application and the command line.


Converting a PDF to text on Linux is easy if you know a few tips and tricks for your particular distro, but what if you're new to Linux and need to turn a PDF document into a text-based equivalent? Are there Linux tools specifically designed for this? And how do you get OCR tools for Linux? This article answers these questions, so read on to learn how to convert PDF to text in Linux.

Tip: If you need to convert a PDF to text quickly without dealing with complicated command lines, consider using an online PDF converter with OCR (Optical Character Recognition). This makes it much easier to extract text from both scanned image PDFs and regular text PDFs on Linux. 

One great option is HiPDF’s Online PDF to Text Converter, which is free to use. It lets you convert PDFs to text with or without OCR on Linux. Simply upload your PDF file, and the conversion process will begin automatically.

If you're using a Windows PC or Mac, you can also try a desktop PDF tool like Wondershare PDFelement. It allows you to convert PDFs into editable formats such as Word, Excel, or plain text (TXT), offering more flexibility for document editing.


3 Methods to Convert PDF to Text on Linux

Let's look at three ways to convert PDF to text on a Linux desktop.

Method 1: Convert PDF to Text on Linux with Calibre

What you want here is to pull the text out of a non-editable PDF without altering the original file. For this, you can use a free, open-source application like Calibre. It is available in the repositories of Ubuntu, Mint, Fedora, and other popular distros. The install command depends on your distribution's package manager; on Debian-based systems such as Ubuntu and Mint it looks like this:

sudo apt install calibre
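
Since the install command differs between distributions, a small helper can suggest the right one. The following is only a sketch: the function name calibre_install_hint is hypothetical, it assumes the package is named calibre in each repository, and it only prints a command instead of running it.

```shell
# Print (not run) a Calibre install command for the detected package manager.
# Assumption: the package is called "calibre" in each distribution's repository.
calibre_install_hint() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "sudo apt-get install calibre"   # Debian, Ubuntu, Mint
  elif command -v dnf >/dev/null 2>&1; then
    echo "sudo dnf install calibre"       # Fedora
  elif command -v pacman >/dev/null 2>&1; then
    echo "sudo pacman -S calibre"         # Arch
  else
    echo "no known package manager found" >&2
    return 1
  fi
}
```

Running calibre_install_hint in a Terminal prints the suggested command, which you can then review and run yourself.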

Once installed, you can follow the flow of the process from within the application. Here's what it should look like:

  • Launch the application and click the Add Books button on the top left to import one or more non-editable PDF documents.
  • When you see the PDFs in the list below the Calibre toolbar, select the file(s) you want to convert to text and click the Convert Books option at the top.
  • In the conversion window, set the output format to TXT and click OK to convert.

You can now open the file in any text editor and edit the content as needed. The output does not retain the layout of the original, but it is a fairly faithful copy of its text. The original PDF remains unchanged, and the new file can be saved under a slightly different name such as Doc1_text, Doc2_text, and so on.
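
Calibre also installs a command-line converter, ebook-convert, which performs the same conversion without opening the application. The wrapper below is a minimal sketch: the function name calibre_pdf_to_txt is hypothetical, and it assumes ebook-convert is on your PATH. The output format is chosen from the output file's extension.

```shell
# Convert a PDF to plain text with Calibre's command-line converter.
# Usage: calibre_pdf_to_txt input.pdf [output.txt]
calibre_pdf_to_txt() {
  src="$1"
  # Default output name: same as the input, with .pdf replaced by .txt
  dst="${2:-${src%.pdf}.txt}"
  ebook-convert "$src" "$dst"
}
```

For example, calibre_pdf_to_txt Doc1.pdf would write Doc1.txt next to the original.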

Method 2: Convert PDF to Text with Terminal Commands

If you're comfortable with the command line, you can convert PDF to text directly from the Terminal. For this, you can use pdftotext. It's part of the Poppler utilities, and the package name might vary based on which distro you're using (on Debian-based systems it is poppler-utils). The first step is to install it:

1. First, type the following in Terminal and hit "Enter"

sudo apt install poppler-utils [Works for Debian, Mint, Ubuntu, etc.]

2. The next command is the one for conversion, and it should look like this:

pdftotext -layout source.pdf target.txt [Source is the original PDF and Target is the final output]

For the above command to work as written, the Terminal's current directory must be the folder that contains your source PDF file. Alternatively, you can give the full path to the source and target files within the command.

3. Hit Enter to run the command on the entire PDF document. To convert just a single range of pages within the document, modify the syntax to match the one shown below:

pdftotext -layout -f M -l N source.pdf target.txt [where M is the first page and N is the last one to be converted.]
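
For several files, the conversion command above can be wrapped in a loop. The sketch below is one possible approach, not something Poppler provides: the function name pdfs_to_text is hypothetical, and it assumes pdftotext is installed and on your PATH.

```shell
# Convert every PDF in a directory to a .txt file with the same base name.
# Usage: pdfs_to_text [directory]   (defaults to the current directory)
pdfs_to_text() {
  dir="${1:-.}"
  for pdf in "$dir"/*.pdf; do
    # Skip the unexpanded pattern when the directory has no PDFs
    [ -e "$pdf" ] || continue
    pdftotext -layout "$pdf" "${pdf%.pdf}.txt"
  done
}
```

Calling pdfs_to_text with a folder path converts each PDF in that folder and leaves the originals untouched.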


Method 3: Convert PDF to Text with OCR on Linux

The two tools mentioned above do not perform OCR, so they cannot extract text from image or scanned PDFs. If you need to convert scanned or image PDFs to editable or searchable text on Linux, you can use an open-source tool called OCRmyPDF.

1. Install OCRmyPDF on Linux

To install OCRmyPDF, enter the following command depending on your Linux system:

sudo apt-get install ocrmypdf [Works for Debian, Ubuntu, etc.]

sudo dnf install ocrmypdf [Works for Fedora]

2. OCR PDF to Text on Linux

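Once installed, you can use OCRmyPDF to add a searchable text layer to image-based PDFs. Here’s how:

ocrmypdf --output-type pdf input.pdf output.pdf [Replace input.pdf with your original scanned or image-based PDF, and output.pdf with the name or path where you want the converted file saved.]

The result is a searchable PDF rather than a text file. To also save the recognized text as plain text, add the --sidecar option: ocrmypdf --sidecar output.txt input.pdf output.pdf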

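To OCR specific pages only, use the command: ocrmypdf --pages 2,4,20-24 input.pdf output.pdf

To OCR a PDF file in a non-English language, pass the language code with -l. For a French document, for example: ocrmypdf -l fra french_doc.pdf french_doc_ocr.pdf (the matching Tesseract language data must be installed; on Debian and Ubuntu this is the tesseract-ocr-fra package).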
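
If you don't know in advance whether a PDF is scanned, the two command-line tools can be combined: try pdftotext first and fall back to OCR only when no text comes out. This is a rough sketch under stated assumptions: the function name pdf_text_or_ocr and the _ocr.pdf suffix are hypothetical, both pdftotext and ocrmypdf are assumed to be installed, and a PDF counts as scanned here simply when pdftotext produces only whitespace.

```shell
# Extract text from a PDF; fall back to OCR when it has no text layer.
# Usage: pdf_text_or_ocr input.pdf [output.txt]
pdf_text_or_ocr() {
  src="$1"
  dst="${2:-${src%.pdf}.txt}"
  pdftotext -layout "$src" "$dst" || return 1
  # No non-whitespace characters in the output: treat the PDF as scanned
  if ! grep -q '[^[:space:]]' "$dst"; then
    # --sidecar writes the recognized text to $dst; the searchable PDF
    # is saved next to the original with an _ocr suffix
    ocrmypdf --sidecar "$dst" "$src" "${src%.pdf}_ocr.pdf"
  fi
}
```

The whitespace check is a simple heuristic, so a PDF that mixes text pages and scanned pages would be handled by pdftotext alone and its scanned pages would be missed.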


How to Convert PDF to Text on Windows and Mac

Now you know how to convert PDF to text in Linux, how about Windows or Mac? Do you know how to do the same thing on these OS platforms? If not, read on to learn about a unique and robust utility to do the same job in operating systems other than Linux.

Wondershare PDFelement is a cross-platform PDF editor with desktop and mobile applications for PDF management. It is a lightweight family of PDF tools that is powerful and versatile. It is also more affordable than some of the other premium options on the market today, which makes it an attractive PDF editor for businesses that can't afford expensive alternatives. In addition, it offers these features:


  • Full editing capability for all PDF text, images, links, media, and other objects.
  • Comprehensive markup tools to annotate PDFs.
  • Strong security features for redaction, watermarking, encryption, and digital signing.
  • Advanced batch processes for conversion and OCR tasks.
  • Fully-integrated forms management: create interactive forms, convert from non-editable PDF forms, access a large template library, extract data from forms and PDFs in bulk, etc.
  • Robust ‘to and from PDF' conversion capability with very wide file-type support.
  • More accurate and faster than many premium PDF editors.

Steps for Converting PDF to Text in Windows and Mac:

Windows:

  • 1. After launching PDFelement on your Windows PC, import the file by dragging it into the software window or by clicking "File" → "Open". Even when the PDF editor is closed, you can open a document by dragging its icon over the app's icon.
  • 2. Click the "Convert" tab at the top. In the toolbar right below it, you'll see a button with an icon and the words "To Text". Its tooltip should say "Convert your PDF to text". Click the button.
  • 3. Specify your output folder. If you need to, you can also change the output file type in the "Save As" dialog box.

Mac (macOS versions including 10.15 Catalina):

PDFelement is as intuitive on a Mac as it is on Windows. You might see a lot of UI differences between the two, but each version has been designed to match the conventions of its platform. The end result is a fairly native experience on any platform, including touchscreen-based iOS and Android devices.

  • 1. PDFelement for Mac has a distinctively Mac App feel to it as soon as you install and launch the application. You can open your PDF using the same methods as for Windows - drag-and-drop or using the "File" menu.
  • 2. Again, in the "File" menu, you'll see an option called "Export To", which opens another contextual menu. Select "Text" as your option and wait for the conversion to be completed.

Now you know how to convert PDF to text on Linux, Windows, and Mac.




Margarete Cotty

Chief Editor