This question touches on two major processes: OCR and Data Capture (or parsing).
OCR stands for Optical Character Recognition. This process converts images to text. You will have to use this category of software if your PDFs are image-only PDFs (no text layer, such as scans, faxes, rasterized files, etc.). If your PDF already contains electronic text data, you 'may' be able to skip this step.
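One way to decide whether OCR is needed is to check how much text the PDF yields on its own. The sketch below is only an illustration: the helper names and the 20-character threshold are my own assumptions, and it assumes the pypdf package is installed for the actual file reading.

```python
def looks_image_only(page_texts, min_chars=20):
    """Return True if no page yields at least min_chars of non-whitespace text.

    The threshold is an arbitrary assumption; tune it for your documents.
    """
    return all(len("".join(text.split())) < min_chars for text in page_texts)


def extract_page_texts(pdf_path):
    """Read the text layer of each page (hypothetical helper, assumes pypdf)."""
    # Imported here so the check above can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() may return an empty result for scanned pages.
    return [page.extract_text() or "" for page in reader.pages]
```

If looks_image_only(extract_page_texts("your.pdf")) is true, the file most likely needs OCR. A PDF can also have a poor or partial text layer, so treat the result as a hint rather than a guarantee.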
Data Capture stands for intelligent data location and extraction, such as finding specific fields among all other text. There are specialized software packages and/or parsing processes for that (see my previous post here).
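For simple, predictable documents, a parsing process can be as small as a few regular expressions run over the extracted text. This is a minimal sketch, not a replacement for a dedicated capture package; the field names and patterns are made up for illustration and would need to match your own documents.

```python
import re

# Hypothetical fields and patterns; adjust to the labels in your documents.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:Number|No\.?|#)\s*[:\-]?\s*(\w[\w\-]*)", re.IGNORECASE
    ),
    "total": re.compile(
        r"Total\s*(?:Due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def capture_fields(text, patterns=FIELD_PATTERNS):
    """Return a dict of field name -> first matched value, or None if absent."""
    result = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result
```

Fields that are not found come back as None, so you can flag those documents for manual review. OCR output often contains misread characters, so patterns usually need to be more forgiving than the ones shown here.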
If all your docs have the same 'area' that contains your text, you can crop the images, then pass smaller zones to OCR, which in turn will simplify your text extraction logic (because there will be less text to deal with).
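The cropping idea can be sketched like this, assuming the pages are already available as image files and that Pillow and pytesseract (plus the Tesseract engine) are installed. The function names and the use of fractional zone coordinates are my own choices for illustration; any OCR engine would work in place of Tesseract.

```python
def zone_to_box(image_size, zone):
    """Convert a zone given as page fractions into a pixel box.

    zone is (left, top, right, bottom), each between 0.0 and 1.0, so the same
    zone works for pages scanned at different resolutions.
    """
    width, height = image_size
    left, top, right, bottom = zone
    return (
        round(left * width),
        round(top * height),
        round(right * width),
        round(bottom * height),
    )


def ocr_zone(image_path, zone):
    """Crop one zone from a page image and OCR only that zone (hypothetical helper)."""
    # Imported here so zone_to_box can be used without these packages installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as page:
        region = page.crop(zone_to_box(page.size, zone))
        return pytesseract.image_to_string(region)
```

For example, ocr_zone("page1.png", (0.5, 0.0, 1.0, 0.25)) would read only the top-right corner of the page. Leave a small margin around the zone, because scanned pages tend to shift and skew a little from one document to the next.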
ilya