How to go through a long text and convert it into MySQL INSERT statements

I have a very long text that looks like this:

1- E.M. Smith, J.P. LAVERGNE, P. VIALLEFONT et J. DAUNIS. Recherches en série triazépine-1,2,4. J. Heterocyclic Chem. 12, 66 (1975).

2- M. BENCHIDMI et E.M. ESSASSI. Synthèse de bis s-triazolo [4,3-b : 4,3-d] triazépines-1,2,4. J. Heterocyclic Chem., 13, 885 (1976).

3- LAVERGNE et P. VIALLEFONT. Hydrazinolyse d'azabenzodiazépinones et d'azabenzodiazépine-thiones de type 1,5. Tetrahedron, 33, 28O7 (1977).

4- E.M. ESSASSI. "Synthèse et étude de RMN1H en présence de l'Eu(fod)3 des pyrazolo [1,5,4-ef] benzodiazépine-1,5 ones-6 Bull. Soc. Chim. Belg., 96, 399 (1987).

. . . .

The list continues for over 300 more entries. I need to extract each entry and add it to an INSERT query for MySQL, removing the list numbers and escaping all single and double quotes. I have thought about using regular expressions, but that turns out to be quite difficult for me.

The insert query should look like:

INSERT INTO PUBLICATIONS (NAME,AUTHOR,CITE,PUB_YEAR) VALUES
("Recherches en série triazépine-1,2,4.", "E.M. Smith, J.P. LAVERGNE, P. VIALLEFONT et J. DAUNIS.","J. Heterocyclic Chem. 12, 66","1975"), 
( "Synthèse de bis s-triazolo [4,3-b : 4,3-d] triazépines-1,2,4.", "M. BENCHIDMI et E.M. ESSASSI.","J. Heterocyclic Chem., 13, 885","1976" ),
etc.

I formatted the text above only to give an idea of it; the real data has no blank lines or line breaks, it is all one huge string.

What I have thought is using something like:

$string = "all my string";
$pattern = '/regex pattern/';
$replacement = 'result format';
echo preg_replace($pattern, $replacement, $string);

I realized that splitting it up automatically might be impossible, as there is no specific pattern, so maybe I could add a separator manually to split each line.

Thanks a lot!

dongyuling0312 2013-04-23 22:22
EDIT: After some observation, this kind of pattern can do the job, but I need more data to see all the possible exceptions and to better understand the "logic" of this kind of data. (The first answer, kept below, still remains an option.)

Some rules I have observed:

authors:

May begin with forename initials, followed by the name

Authors are separated by a comma and a space, the last one by " et "

End with a dot and a space

titles:

Begin with an uppercase letter, optionally preceded by a double quote

Contain no dots

End (though perhaps not always) with a digit:

preceded by a comma and followed by a dot and a space

or preceded by a - and followed by a space

(the dot at the end may be missing)

cites:

Begin with an uppercase letter

Consist of several words with an uppercase first letter, which may be abbreviated with a dot

Followed by: an optional comma, space, number, comma, space, number, space.

code

$subject = <<<LOD
1- E.M. Smith, J.P. LAVERGNE, P. VIALLEFONT et J. DAUNIS. Recherches en série triazépine-1,2,4. J. Heterocyclic Chem. 12, 66 (1975).
2- M. BENCHIDMI et E.M. ESSASSI. Synthèse de bis s-triazolo [4,3-b : 4,3-d] triazépines-1,2,4. J. Heterocyclic Chem., 13, 885 (1976).
3- LAVERGNE et P. VIALLEFONT. Hydrazinolyse d'azabenzodiazépinones et d'azabenzodiazépine-thiones de type 1,5. Tetrahedron, 33, 28O7 (1977).
4- E.M. ESSASSI. "Synthèse et étude de RMN1H en présence de l'Eu(fod)3 des pyrazolo [1,5,4-ef] benzodiazépine-1,5 ones-6 Bull. Soc. Chim. Belg., 96, 399 (1987).
1O- J.M.F. BOURGOIN-DE-LA-VILLARDIERE. Recherches en série triazepine-1,2,4: 1 - détermination de la structure de la triazolotriazépinone obtenue par action de l'acétylacétate d'éthyle sur le diamino-3,4 triazole-1,2,4 J. Heterocyclic Chem., 13, 885 (1976).
LOD;

// The x modifier allows whitespace and # comments inside the pattern,
// so each comment must sit on its own line.
$pattern = '~
# authors :
(?(DEFINE)(?<FN>(?:[A-Z]\.){0,3}+(?(?<=\.)\h)) ) # ForName
(?(DEFINE)(?<NM>[A-Z](?:[A-Z]++|[a-z]++)(?:-[A-Z](?:[A-Z]++|[a-z]++))*+)) # NaMe
[O\d]++-\h(?<author>(?&FN)(?&NM)(?>(,\h(?&FN)(?&NM))*+\het\h(?&FN)(?&NM))?+)\.\h
# titles :
"?+(?<title>[A-Z][^.]+?(?:\.|(?:,|-)\d))\h
# cites :
(?<cite>(?:[A-Z][a-z]*+\.?+\h)*[A-Z][a-z]*+\.?+,?+\h[O\d]++,\h[O\d]++)\h
# date :
\((?<date>[^)]++)\)
~x';

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

// cosmetic: keep only the named groups author, title, cite and date
foreach ($matches as &$match) {
    foreach ($match as $key => $value) {
        if (is_numeric($key) || $key == 'NM' || $key == 'FN') unset($match[$key]);
    }
}
unset($match); // break the reference left over from the loop

echo '<meta charset="UTF-8"/><pre>' . print_r($matches, true) . '</pre>';
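To get from the parsed groups to the INSERT statement the question asks for, something like the following sketch could be used. It is only an illustration: `sqlQuote` and `buildInsert` are hypothetical helper names, the sample row is shaped like one element of `$matches` from the code above, and the escaping assumes MySQL's default SQL mode (backslash escapes enabled) and a one-off import of trusted data. For queries run from an application, a prepared statement (for example via PDO) would be the safer choice.

```php
<?php
// Hypothetical helper: quote a value as a single-quoted MySQL string literal.
// Assumes backslash escapes are enabled (NO_BACKSLASH_ESCAPES is not set).
function sqlQuote(string $value): string
{
    // strtr replaces in one pass, so an inserted backslash is never escaped twice.
    return "'" . strtr($value, ["\\" => "\\\\", "'" => "\\'"]) . "'";
}

// Hypothetical helper: turn parsed rows into one multi-row INSERT statement.
function buildInsert(array $rows): string
{
    $tuples = [];
    foreach ($rows as $row) {
        $tuples[] = '(' . implode(', ', [
            sqlQuote($row['title']),
            sqlQuote($row['author']),
            sqlQuote($row['cite']),
            sqlQuote($row['date']),
        ]) . ')';
    }
    return "INSERT INTO PUBLICATIONS (NAME, AUTHOR, CITE, PUB_YEAR) VALUES\n"
        . implode(",\n", $tuples) . ';';
}

// Sample row with the same keys as one element of $matches.
$rows = [[
    'author' => 'LAVERGNE et P. VIALLEFONT',
    'title'  => "Hydrazinolyse d'azabenzodiazépinones et d'azabenzodiazépine-thiones de type 1,5.",
    'cite'   => 'Tetrahedron, 33, 28O7',
    'date'   => '1977',
]];

echo buildInsert($rows), "\n";
```

Single-quoted literals are used so that double quotes in a title need no escaping; only backslashes and apostrophes are escaped.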

--Answer before edit--

Wow, do you notice that there is absolutely nothing that marks the difference between Author, Name and Cite? One way is to slice the text by hand, putting a simple newline between Author, Name and Cite (at about 5 s per line, you will finish in less than 30 min, toutouyoutou :).

I say that because the only difference I see between Author, Name and Cite is their meaning, which a regex cannot match.

If you do this tedious work, it will be easy to build the SQL query. Example:

1- E.M. Smith, J.P. LAVERGNE, P. VIALLEFONT et J. DAUNIS.
Recherches en série triazépine-1,2,4.
J. Heterocyclic Chem. 12, 66 (1975).

That's all; there is no need to touch the number or the date, the regex can handle those. If you do this work, edit your question to get some help with the regex.
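As a rough sketch of that manual approach, the code below assumes each entry has been split by hand into exactly three lines (authors, title, cite with year) and that entries are separated by a blank line. That layout, the function name `parseEntries`, and the array keys are assumptions made for illustration, not something specified in the thread.

```php
<?php
// Assumed layout after manual splitting: three lines per entry,
// entries separated by one or more blank lines.
$text = <<<TXT
1- E.M. Smith, J.P. LAVERGNE, P. VIALLEFONT et J. DAUNIS.
Recherches en série triazépine-1,2,4.
J. Heterocyclic Chem. 12, 66 (1975).

2- M. BENCHIDMI et E.M. ESSASSI.
Synthèse de bis s-triazolo [4,3-b : 4,3-d] triazépines-1,2,4.
J. Heterocyclic Chem., 13, 885 (1976).
TXT;

// Hypothetical helper: turn the hand-split text into an array of records.
function parseEntries(string $text): array
{
    $entries = [];
    foreach (preg_split('/\R{2,}/', trim($text)) as $block) {
        $lines = preg_split('/\R/', trim($block));
        if (count($lines) !== 3) {
            continue; // skip blocks that were not split into exactly three lines
        }
        [$author, $title, $cite] = array_map('trim', $lines);
        // Strip the list number ("1- ", "1O- ") from the author line.
        $author = preg_replace('/^[O\d]+-\s*/', '', $author);
        // Separate the trailing "(year)." from the rest of the cite line.
        if (!preg_match('/^(.*?)\s*\(([^)]+)\)\.?$/', $cite, $m)) {
            continue; // no year in parentheses: leave this entry for manual review
        }
        $entries[] = [
            'author' => $author,
            'title'  => $title,
            'cite'   => $m[1],
            'date'   => $m[2],
        ];
    }
    return $entries;
}

print_r(parseEntries($text));
```

The regex work is then limited to the two easy parts, the list number and the year; entries that do not fit the assumed layout are skipped so they can be checked by hand.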
(This answer was accepted by the asker as the best answer.)
