I'm trying to parse and match a lot of legal text, splitting it all up into individual sentences. I have the following regex, which works fine for a few lines of easy text:
[^\.\!\?\;\n]*[\.\!\?\;\n](\s+)
! and ? are pretty irrelevant here, but . and ; are quite common as separators in the texts I'm trying to work with. The problem is that the above regex only finds those delimiters when they are followed by a whitespace character. The following text, for example, would not be properly matched:
Member State law or pursuant to contract with a health professional and subject to the conditions and safeguards referred to in paragraph 3; processing is necessary for reasons of public interest in the area of public health, such as protecting against serious cross-border threats to health or ensuring high standards comparison tool at https://ec.europa.eu/ploteus/en/compare Adopted 7 comparable procedures (e. g. certifications/audits), and registered as required by the Member State. of quality and safety of health care and of medicinal products or medical devices, on the basis of Union or Member State law, which provides for suitable and specific measures to safeguard the rights and freedoms of the data subject, in particular professional secrecy; processing is...
In it, the following entire section:
processing is necessary for reasons of public interest in the area of public health, such as protecting against serious cross-border threats to health or ensuring high standards comparison tool at https://ec.europa.
would not be matched at all.
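To make the problem reproducible, here is a minimal sketch, assuming Python's re module (the engine I actually use may differ) and a shortened excerpt of the sample text. The line break inside each character class of the pattern is written as \n here; the names PATTERN, text and matches are only for illustration.

```python
import re

# The pattern from above; the line break inside each character class is written as \n.
PATTERN = re.compile(r"[^\.\!\?\;\n]*[\.\!\?\;\n](\s+)")

# Shortened excerpt of the sample text (assumption: the shortening keeps the behaviour).
text = (
    "referred to in paragraph 3; processing is necessary for reasons of "
    "public interest, ensuring high standards comparison tool at "
    "https://ec.europa.eu/ploteus/en/compare Adopted 7 comparable "
    "procedures (e. g. certifications/audits), and registered."
)

# Collect the full text of every match, in order.
matches = [m.group(0) for m in PATTERN.finditer(text)]
for s in matches:
    print(repr(s))
```

None of the printed matches contains the "processing is necessary ..." part: the dots inside the URL are not followed by whitespace, so no match can start there, and matching only resumes at "eu/ploteus/...".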
Any help in improving the above regex would be greatly appreciated!
Thanks