duanpi7578 2015-11-08 11:16

Detecting code duplication between files and refactoring it semi-automatically

It doesn't matter whether the solution is a framework, a tool, or anything else. The problem is pretty hard to solve, and I've been fighting it for years.

Here is an example to clarify what I mean.

File1

<head>
<title>Fotografia Elenco Completo Filtri Professionali</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<META name="Language" content="it">
<META http-equiv="Revisit-After" content="2 days">
<style>
<!--
 table.MsoNormalTable
    {mso-style-parent:"";
    font-size:10.0pt;
    font-family:"Times New Roman"}
-->
</style>
</head>

File2

<head>
<title>Militari</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="keywords" content="militari, ....">
<meta name="robots" content="INDEX, FOLLOW">
<meta name="Language" content="it">
<meta http-equiv="Revisit-After" content="2 days">
<meta name="Rating" content="General">
<link rel="stylesheet" type="text/css" href="./file/stile.css">
<script language="JavaScript">

File3

<head>
<title>Cinema - Recensioni e Trame di Film</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="keywords" content="recensioni film">
<meta name="description" content="Ottimo sito di recensioni di film, trame di film cinematografice, di Videogame e Romanzi. ">
<meta name="robots" content="INDEX, FOLLOW">
<meta name="Language" content="it">
<meta http-equiv="Revisit-After" content="2 days">
<meta name="Rating" content="General">
<link rel="stylesheet" type="text/css" href="file/stile.css">
<style type="text/css">
body {
    background-color:#F0F0F0;
    text-align: center;
}
</style>

For a human being, avoiding this kind of code duplication is an obvious task. A person can recognize that "<head>" and "</head>" are delimiters, that the order of the lines doesn't matter, which parts can be put into variables (or stored as values in a database), and which files are similar enough to be refactored.

The whole process doesn't seem terribly hard to automate. But I haven't found any solution so far. Even automating the recognition of the delimiters is hard.

The best approach I've found is to play with regular expression tools and go mad :D
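As a rough illustration of the "order doesn't matter, some parts are variables" idea, here is a minimal Python sketch using the standard `html.parser` module. It is not a general solution: the names `HeadCollector`, `head_fields`, and `split_constant_and_variable` are made up for this example, and it only looks at `<title>` and `<meta>` tags. It reads each page's header into a dictionary, then separates values that are identical in every file from values that differ.

```python
from html.parser import HTMLParser


class HeadCollector(HTMLParser):
    """Collects <title> and <meta> tags as an order-independent dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> matches <meta>.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("http-equiv")
            if key:
                self.fields[key.lower()] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.fields["title"] = data.strip()


def head_fields(html):
    """Parse one page and return its header fields."""
    parser = HeadCollector()
    parser.feed(html)
    return parser.fields


def split_constant_and_variable(pages):
    """pages maps filename -> HTML text. Returns (constants, variables)."""
    per_file = {name: head_fields(html) for name, html in pages.items()}
    all_keys = set().union(*(f.keys() for f in per_file.values()))
    constants, variables = {}, {}
    for key in sorted(all_keys):
        values = {name: f.get(key) for name, f in per_file.items()}
        distinct = set(values.values())
        if len(distinct) == 1:
            constants[key] = distinct.pop()   # same in every file: hard-code once
        else:
            variables[key] = values           # differs: candidate for a variable or DB column
    return constants, variables
```

Fields that land in `constants` could be written once in a shared header, while those in `variables` (with `None` marking a tag missing from a file) are the ones that would need a per-file value.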


After refactoring

file1

header -> PrintHeader();

file2

header -> PrintHeader();

file3

header -> PrintHeader();

GlobalFile

class header
{
    function PrintHeader
    {
        SELECT title, content-type, language, revisit-after, rating, robots, extra_text_unparsed
        into myArray
        FROM header_table
        WHERE filename = $filename

        foreach(v in myArray)
        {
            echo ....
        }
    }
}
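For comparison, the pseudocode above could look roughly like this as runnable code. This is only a sketch in Python with `sqlite3`, under several assumptions that are not in the question: the function name `render_header`, the `META` mapping, and underscore column names such as `content_type` (hyphenated names like `content-type` are not valid unquoted SQL identifiers).

```python
import sqlite3
from html import escape

# (attribute kind, attribute value, column in header_table); column names are assumed.
META = [
    ("http-equiv", "Content-Type", "content_type"),
    ("name", "robots", "robots"),
    ("name", "Language", "language"),
    ("http-equiv", "Revisit-After", "revisit_after"),
    ("name", "Rating", "rating"),
]


def render_header(db, filename):
    """Build the <head> block for `filename` from its row in header_table."""
    columns = ["title"] + [col for _, _, col in META] + ["extra_text_unparsed"]
    row = db.execute(
        f"SELECT {', '.join(columns)} FROM header_table WHERE filename = ?",
        (filename,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no header stored for {filename!r}")
    fields = dict(zip(columns, row))

    lines = ["<head>", f"<title>{escape(fields['title'])}</title>"]
    for kind, value, col in META:
        if fields[col] is not None:  # NULL means this page does not have the tag
            lines.append(f'<meta {kind}="{value}" content="{escape(fields[col])}">')
    if fields["extra_text_unparsed"]:
        # Emitted verbatim: whatever could not be parsed into columns (e.g. <style> blocks).
        lines.append(fields["extra_text_unparsed"])
    lines.append("</head>")
    return "\n".join(lines)
```

Each page would then call `render_header(db, "file2.html")` (or its equivalent) instead of carrying its own copy of the header.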

Any suggestions?

1 answer

  • douzhao9608 2015-11-08 16:15

    What you want is a clone detector.

    See https://en.wikipedia.org/wiki/Duplicate_code. There's a list of clone detectors there.

    The key issues are:

    • What language does the clone detector support?
    • How does it detect clones?
    • How can such clones be removed?
    • Does the tool provide automation for removing clones?

    Pure "string clone detection" can be language independent, but it typically cannot find removable clones, because such detectors don't understand the boundaries between code fragments.
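To make the limitation concrete, here is a minimal sketch of a purely string-based detector in Python. It is an illustration only, not any particular tool: `normalize` and `string_clones` are hypothetical names, and the matching is delegated to the standard `difflib.SequenceMatcher`. It reports runs of consecutive lines that two texts share, with no idea whether a run starts or ends at a sensible boundary.

```python
import difflib


def normalize(line):
    """Collapse whitespace and case so trivially different lines compare equal."""
    return " ".join(line.split()).lower()


def string_clones(text_a, text_b, min_lines=2):
    """Return runs of at least `min_lines` consecutive lines shared by both texts."""
    a = [normalize(line) for line in text_a.splitlines() if line.strip()]
    b = [normalize(line) for line in text_b.splitlines() if line.strip()]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        a[m.a:m.a + m.size]
        for m in matcher.get_matching_blocks()
        if m.size >= min_lines
    ]
```

Because the comparison is line by line and order-sensitive, the same tags written in a different order, or split across lines differently, are not reported as a clone, and a reported run may cut through the middle of an element.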

    I build AST-based clone detectors. These detect clones based on the structure of the target language, as represented by the AST. Clones detected this way are much more natural with respect to language boundaries than those found by other detectors. A downside: these detectors are necessarily language dependent, so you need a different detector for each language. The payoff is that you get high-quality clones detected across large sets of code.

    Removing clones automatically is hard; each language offers its own means of abstracting code (e.g., a subroutine, a macro, an include file, ...), and the tool would have to know each of them. You invented an abstraction for HTML that is outside what HTML can express (putting fragments into a database is not in HTML's vocabulary).

    As a practical matter, there are basically no automated clone removers. Pretty much what you have to do is to identify the clones (this is why the clone detector is good) and then manually remove them, especially to get custom effects like the one you show.

    If you want to implement an automated clone removal tool, you need what amounts to a program transformation system. (See my bio for one that also happens to support clone detection.)

    This answer was selected by the asker as the best answer.
