I'm pulling articles from specific URLs and splitting them into sentences, but the extracted text body seemingly at random drops the whitespace between some sentences, resulting in:
Jane went to the store.She bought a dog. The dog was very friendly.It had no teeth.
Some of my text contains stock symbols such as (AZ.GAN), so I can't simply insert a space after every period that has no adjacent whitespace.
Jane bought several shares of (TY.JPN). She lost all her cash money."Arg!" She cried.
Blindly adding a space after every period in the above example would break the stock symbol.
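To make the constraint concrete, here is a minimal sketch of a narrower rule, written in Python rather than the PHP setup described here. It assumes that a missing sentence break always looks like a lowercase letter, then a period, then an uppercase letter or an opening double quote, and that stock symbols are all uppercase on both sides of the period. The function name respace_sentences is hypothetical, and the heuristic will miss sentences that end in an acronym, a digit, or a closing quote.

```python
import re

# Assumption: a lost sentence break is "lowercase letter, period, then an
# uppercase letter or a double quote". Symbols such as TY.JPN have an
# uppercase letter before the period, so the lookbehind skips them.
MISSING_BREAK = re.compile(r'(?<=[a-z])\.(?=["A-Z])')


def respace_sentences(text: str) -> str:
    """Insert a space after periods that match the heuristic above."""
    return MISSING_BREAK.sub('. ', text)


if __name__ == "__main__":
    sample = 'Jane bought several shares of (TY.JPN). She lost all her cash money."Arg!" She cried.'
    print(respace_sentences(sample))
```

This only patches the symptom after extraction; it does not explain why the whitespace disappears in the first place.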
Does anyone know the cause of this? I have tried several HTML and DOM parsers. I currently use Simple_DOM to grab the plaintext, although I get the same result if I do it manually or with any other parsing engine.