Parsing PDFs in Python with Tika
A few months ago, a friend asked me to help him extract data from a collection of PDFs. The PDFs held records of his financial transactions over several years, and he wanted to analyze them. Unfortunately, the Excel and plain-text versions of those records were no longer available, so the PDFs were his only source. I reviewed a few Python-based PDF parsers and decided to try Tika, a Python port of Apache Tika that works by calling the Apache Tika REST server. Tika parsed the PDFs quickly and accurately. I extracted the data my friend needed and sent it to him in CSV format so he could analyze it with the program of his choice. Tika was so fast and easy to use that I enjoyed the whole experience, enough to write a blog post about parsing PDFs with Tika.
python
Submitted by elementlist on Jan 13, 2017
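The workflow described above (PDF to text with Tika, then text to CSV) can be sketched roughly as follows. This is not the author's code, which the listing does not show. It assumes the `tika` package from PyPI, whose `parser.from_file()` returns a dict with a `"content"` key. That package needs a Java runtime and downloads and starts a Tika server on first use. The statement line format, the `TRANSACTION_RE` pattern, the helper names (`extract_text`, `parse_transactions`, `rows_to_csv`) and the output file name `transactions.csv` are all hypothetical and would need adapting to real statements.

```python
import csv
import io
import re

# Hypothetical statement line layout: "2016-03-14  Grocery Store  -45.20".
# The real PDFs' layout is unknown; adjust this pattern to match them.
TRANSACTION_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>-?\d+(?:,\d{3})*\.\d{2})$"
)


def extract_text(pdf_path):
    """Return the plain text that Tika extracts from one PDF."""
    # Imported lazily: tika-python needs Java and starts a Tika server on first use.
    from tika import parser

    parsed = parser.from_file(pdf_path)
    # "content" can be None when Tika finds no text layer (e.g. scanned images).
    return parsed.get("content") or ""


def parse_transactions(text):
    """Yield (date, description, amount) tuples for lines matching the assumed format."""
    for line in text.splitlines():
        match = TRANSACTION_RE.match(line.strip())
        if match:
            # Drop thousands separators so the amount column is numeric in a spreadsheet.
            amount = match.group("amount").replace(",", "")
            yield match.group("date"), match.group("description"), amount


def rows_to_csv(rows):
    """Render transaction rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["date", "description", "amount"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    import sys

    # Usage: python statements_to_csv.py statement1.pdf statement2.pdf ...
    all_rows = []
    for path in sys.argv[1:]:
        all_rows.extend(parse_transactions(extract_text(path)))
    with open("transactions.csv", "w", newline="") as out:
        out.write(rows_to_csv(all_rows))
```

Keeping the Tika call separate from the text parsing means the regular expression can be tuned against a pasted sample of extracted text without starting the Tika server each time.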