Detecting code duplication between files and refactoring it semi-automatically

It doesn't matter whether the solution is a framework, a tool, or anything else. The problem is pretty hard to solve, and I have been fighting it for years.

Here is an example to clarify what I mean.

File1

<head>
<title>Fotografia Elenco Completo Filtri Professionali</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<META name="Language" content="it">
<META http-equiv="Revisit-After" content="2 days">
<style>
<!--
 table.MsoNormalTable
    {mso-style-parent:"";
    font-size:10.0pt;
    font-family:"Times New Roman"}
-->
</style>
</head>

File2

<head>
<title>Militari</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="keywords" content="militari, ....">
<meta name="robots" content="INDEX, FOLLOW">
<meta name="Language" content="it">
<meta http-equiv="Revisit-After" content="2 days">
<meta name="Rating" content="General">
<link rel="stylesheet" type="text/css" href="./file/stile.css">
<script language="JavaScript">

File3

<head>
<title>Cinema - Recensioni e Trame di Film</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="keywords" content="recensioni film">
<meta name="description" content="Ottimo sito di recensioni di film, trame di film cinematografice, di Videogame e Romanzi. ">
<meta name="robots" content="INDEX, FOLLOW">
<meta name="Language" content="it">
<meta http-equiv="Revisit-After" content="2 days">
<meta name="Rating" content="General">
<link rel="stylesheet" type="text/css" href="file/stile.css">
<style type="text/css">
body {
    background-color:#F0F0F0;
    text-align: center;
}
</style>

For a human being, the task of avoiding this kind of code duplication is obvious. A person can recognize that "", "" are delimiters, that the order of the lines doesn't matter, which parts can be put into variables (or stored as values in a database), and which files are similar enough to be refactored.

The whole process does not seem terribly hard to automate. But I haven't found any solution so far. Even automating the recognition of the delimiters is hard.

The best way I have found is to play with regular expression tools and go mad :D
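The human steps described above (ignore line order and tag case, then separate what is constant from what varies) can be sketched in Python. This is only an illustration under assumptions: the names `HeadFields`, `head_fields` and `split_shared_and_variable` are hypothetical, and the sketch looks only at `<title>` and `<meta>` tags, leaving styles, scripts and links aside.

```python
from html.parser import HTMLParser


class HeadFields(HTMLParser):
    """Collect <title> and <meta> values from an HTML head into a dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "content" in attrs:
            # HTMLParser lowercases tag and attribute names, so <META> and <meta> match.
            key = attrs.get("name") or attrs.get("http-equiv") or ""
            self.fields[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.fields["title"] = self.fields.get("title", "") + data


def head_fields(html):
    """Parse one file's head and return its fields; line order is irrelevant."""
    parser = HeadFields()
    parser.feed(html)
    parser.close()
    return parser.fields


def split_shared_and_variable(field_dicts):
    """Return (shared, variable): keys with one identical value in every file vs. the rest."""
    all_keys = set().union(*field_dicts)
    shared, variable = {}, set()
    for key in all_keys:
        values = {d.get(key) for d in field_dicts}
        if len(values) == 1:
            shared[key] = values.pop()
        else:
            variable.add(key)
    return shared, variable
```

Fields that land in `shared` are candidates for a common template, while those in `variable` are the ones that would become per-file values. Deciding which files are "similar enough" is still left to a human.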

After refactoring

file1

header -> PrintHeader();

file2

header -> PrintHeader();

file3

header -> PrintHeader();

GlobalFile

class header
{
    function PrintHeader
    {
        SELECT title, content-type, language, revisit-after, rating, robots, extra_text_unparsed
        into myArray
        FROM header_table
        WHERE filename = $filename

        foreach(v in myArray)
        {
            echo ....
        }
    }
}
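
For comparison, the same idea can be written as a small runnable sketch in Python using the standard `sqlite3` module. Everything here is an assumption made for illustration: the function names `make_demo_db` and `render_header`, the in-memory database, and the reduced column set (only a subset of the columns listed in the pseudocode above).

```python
import sqlite3
from html import escape


def make_demo_db():
    """Create an in-memory header_table with one example row (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE header_table ("
        "filename TEXT PRIMARY KEY, title TEXT, language TEXT, "
        "robots TEXT, extra_text_unparsed TEXT)"
    )
    conn.execute(
        "INSERT INTO header_table VALUES (?, ?, ?, ?, ?)",
        ("file2", "Militari", "it", "INDEX, FOLLOW", None),
    )
    return conn


def render_header(conn, filename):
    """Build the <head> block for one file from its row in header_table."""
    row = conn.execute(
        "SELECT title, language, robots, extra_text_unparsed "
        "FROM header_table WHERE filename = ?",
        (filename,),
    ).fetchone()
    if row is None:
        raise KeyError(filename)
    title, language, robots, extra = row
    lines = ["<head>", f"<title>{escape(title)}</title>"]
    if language:
        lines.append(f'<meta name="Language" content="{escape(language)}">')
    if robots:
        lines.append(f'<meta name="robots" content="{escape(robots)}">')
    if extra:
        # Stored verbatim: markup that was not parsed into separate columns.
        lines.append(extra)
    lines.append("</head>")
    return "\n".join(lines)
```

Calling `print(render_header(make_demo_db(), "file2"))` prints the head block for the example row. The `extra_text_unparsed` column is the escape hatch for anything that does not fit the shared template.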

Any suggestions?

douzhao9608 2015-11-08 16:15
What you want is a clone detector.

See https://en.wikipedia.org/wiki/Duplicate_code. There's a list of clone detectors there.

The key issues are:

What language does the clone detector support?

How does it detect clones?

How can such clones be removed?

Does the tool provide automation for removing clones?

Pure "string clone detection" can be language independent, but it typically cannot find removable clones because it doesn't understand the boundaries between code fragments.
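
As a rough illustration of string-based detection (a minimal sketch, not any particular tool's algorithm), the Python below reports runs of identical normalized lines that appear in more than one file. The names `normalize` and `find_line_clones` and the `window` parameter are hypothetical.

```python
from collections import defaultdict


def normalize(line):
    """Collapse whitespace and case so trivially different lines compare equal."""
    return " ".join(line.split()).lower()


def find_line_clones(files, window=3):
    """Find runs of `window` consecutive non-blank lines that occur in 2+ files.

    `files` maps a file name to its text. The result maps each repeated run
    (a tuple of normalized lines) to a list of (file name, starting line number).
    """
    seen = defaultdict(list)
    for name, text in files.items():
        numbered = [(i, normalize(raw)) for i, raw in enumerate(text.splitlines(), 1)]
        numbered = [(i, line) for i, line in numbered if line]
        for k in range(len(numbered) - window + 1):
            chunk = tuple(line for _, line in numbered[k:k + window])
            seen[chunk].append((name, numbered[k][0]))
    # Keep only runs that appear in at least two different files.
    return {c: locs for c, locs in seen.items() if len({n for n, _ in locs}) > 1}
```

The sketch also shows the weakness mentioned above: a window starts and ends at arbitrary lines rather than at element or statement boundaries, and the same lines in a different order are not matched at all.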

I build AST-based clone detectors. These detect clones based on the structure of the target language, as represented by the AST. Clones detected this way are much more natural with respect to language boundaries than those found by other detectors. A downside: these are necessarily language dependent. You need a different detector for each language. The payoff is you get high-quality clones detected across large sets of code.

Removing clones automatically is hard; each language offers its own means for abstracting code (e.g., make a subroutine, macro, include file, ...), and the tool would have to know each of them. You invented an abstraction for HTML that is outside what HTML can express (putting fragments into a database is not in HTML's vocabulary).

As a practical matter, there are basically no automated clone removers. Pretty much what you have to do is identify the clones (this is where a clone detector helps) and then remove them manually, especially to get custom effects like the one you show.

If you want to implement an automated clone removal tool, you need what amounts to a program transformation system. (See my bio for one that also happens to support clone detection.)
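
To give a feel for the AST-based approach, here is a toy sketch that uses Python's standard `ast` module on Python source. This is an assumption made purely for illustration: it is not the detector described in this answer, and it handles Python rather than HTML, which is exactly the language dependence mentioned above. The names `shape` and `find_ast_clones` are hypothetical. Two functions count as clones when their bodies have the same tree shape, regardless of identifier names and literal values.

```python
import ast
from collections import defaultdict


def shape(node):
    """Serialize a subtree as nested node-type names, ignoring identifiers and literals."""
    children = ", ".join(shape(child) for child in ast.iter_child_nodes(node))
    return f"{type(node).__name__}({children})"


def find_ast_clones(sources):
    """Group functions whose bodies share the same AST shape.

    `sources` maps a file name to Python source code. Returns a list of groups,
    each a list of (file name, function name) with at least two members.
    """
    groups = defaultdict(list)
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.FunctionDef):
                # Only the body is compared; parameter lists are ignored.
                key = tuple(shape(stmt) for stmt in node.body)
                groups[key].append((name, node.name))
    return [locs for locs in groups.values() if len(locs) > 1]
```

Because the comparison follows statement and expression boundaries, every reported clone is a complete syntactic unit, which is what makes it a realistic candidate for extraction into a shared routine.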