doushen1026 2014-01-07 18:37 Acceptance rate: 100%

Efficient string replacement in PHP

Is there any way to make this chunk of code more efficient? I'm not looking for someone to write my code for me, just to point me in the right direction...

    $string = preg_replace('/<ref[^>]*>([\s\S]*?)<\/ref[^>]*>/', '', $string);
    $string = preg_replace('/{{(.*?)\}}/s', '', $string);
    $string = preg_replace('/File:(.*?)\n/s', '', $string);
    $string = preg_replace('/==(.*?)\=\n/s', '', $string);
    $string = str_replace('|', '/', $string);
    $string = str_replace('[[', '', $string);
    $string = str_replace(']]', '', $string);
    $string = strip_tags($string);

The catch, however, is that the replacement has to happen in this order...

Sample input text:

    ===API sharing and reuse via virtual machine===
    {{Expand section|date=December 2013}}

    Some languages like those running in a [[virtual machine]] (e.g. [[List of CLI languages|.NET CLI compliant languages]] in the [[Common Language Runtime]] (CLR), and [[List of JVM languages|JVM compliant languages]] in the [[Java Virtual Machine]]) can share an API.  In this case, a virtual machine enables [[language interoperability]], by abstracting a programming language using an intermediate [[bytecode]] and its [[language binding]]s.==Web APIs==
    {{Main|Web API}}
    When used in the context of [[web development]], an API is typically defined as a set of [[Hypertext Transfer Protocol]] (HTTP) request messages, along with a definition of the structure of response messages, which is usually in an Extensible Markup Language ([[XML]]) or JavaScript Object Notation ([[JSON]]) format. While "web API" historically has been virtually synonymous for [[web service]], the recent trend (so-called [[Web 2.0]]) has been moving away from Simple Object Access Protocol ([[SOAP]]) based web services and [[service-oriented architecture]] (SOA) towards more direct [[representational state transfer]] (REST) style [[web resource]]s and [[resource-oriented architecture]] (ROA).<ref>
    {{cite web
     |first       = Djamal
     |last        = Benslimane
     |coauthors   = Schahram Dustdar, and Amit Sheth
     |title       = Services Mashups: The New Generation of Web Applications
     |url         = http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&pName=dso_level1&path=dsonline/2008/09&file=w5gei.xml&xsl=article.xsl
     |work        = IEEE Internet Computing, vol. 12, no. 5
     |publisher   = Institute of Electrical and Electronics Engineers
     |pages       = 13–15
     |year        = 2008
    }}
    </ref> Part of this trend is related to the [[Semantic Web]] movement toward [[Resource Description Framework]] (RDF), a concept to promote web-based [[ontology engineering]] technologies. Web APIs allow the combination of multiple APIs into new applications known as [[mashup (web application hybrid)|mashup]]s.<ref>
    {{citation
     |first       = James
     |last        = Niccolai
     |title       = So What Is an Enterprise Mashup, Anyway?
     |url         = http://www.pcworld.com/businesscenter/article/145039/so_what_is_an_enterprise_mashup_anyway.html
     |work        = [[PC World (magazine)|PC World]]
     |date        = 2008-04-23
    }}</ref>

Sample output (with current script):

    Some languages like those running in a virtual machine (e.g. List of CLI languages/.NET CLI compliant languages in the Common Language Runtime (CLR), and List of JVM languages/JVM compliant languages in the Java Virtual Machine) can share an API.  In this case, a virtual machine enables language interoperability, by abstracting a programming language using an intermediate bytecode and its language bindings.
    When used in the context of web development, an API is typically defined as a set of Hypertext Transfer Protocol (HTTP) request messages, along with a definition of the structure of response messages, which is usually in an Extensible Markup Language (XML) or JavaScript Object Notation (JSON) format. While "web API" historically has been virtually synonymous for web service, the recent trend (so-called Web 2.0) has been moving away from Simple Object Access Protocol (SOAP) based web services and service-oriented architecture (SOA) towards more direct representational state transfer (REST) style web resources and resource-oriented architecture (ROA). Part of this trend is related to the Semantic Web movement toward Resource Description Framework (RDF), a concept to promote web-based ontology engineering technologies. Web APIs allow the combination of multiple APIs into new applications known as mashup (web application hybrid)/mashups.

1 answer

  • dongyirong3564 2014-01-07 18:43

    Apart from replacing | with /, you are only removing things from your string (the replacement is always the empty string), so you can put all of the removals into a single preg_replace. That way the string is parsed only once instead of once per call.

    You can also optimise your subpatterns by avoiding lazy quantifiers and removing useless capturing groups.

    example:

        $str = preg_replace('~{{(?>[^}]++|}(?!}))*+}}|\||\[\[|]]~', '', $str);
    

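    will replace your second line and the three str_replace calls. Note that, as written, it deletes every | instead of turning it into /, so its output differs from the str_replace('|', '/', $string) version; if you need the /, remove \| from the pattern and keep that str_replace.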

    details:

        ~            # pattern delimiter
        {{           # literal: {{
        (?>          # open an atomic group (no backtracking inside, so a failing match fails faster)
            [^}]++   # any character except }, one or more times (possessive: same effect as an atomic group)
          |          # OR
            }(?!})   # a } not followed by another }
        )*+          # repeat the atomic group zero or more times (possessive)
        }}           # literal: }}
        |            # OR
        \|           # literal: |
        |            # OR
        \[\[         # literal: [[
        |            # OR
        ]]           # literal: ]]
        ~            # pattern delimiter
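A quick way to check that the atomic/possessive rewrite removes the same text as the question's lazy {{(.*?)\}} is to run both over a few edge cases. This is a sanity check on made-up inputs, not a proof of equivalence: an embedded single }, three closing braces, a template spanning a newline, and an unclosed template.

```php
<?php
// Quick equivalence check (not a proof): the question's lazy template
// pattern versus the answer's atomic/possessive rewrite.
$lazy   = '/{{(.*?)\}}/s';
$atomic = '~{{(?>[^}]++|}(?!}))*+}}~';

// Made-up edge cases for illustration.
$cases = [
    "a {{x}} b {{y|z}} c",            // two simple templates
    "{{a}b}} tail",                   // a single } inside a template
    "{{a}}} tail",                    // three closing braces
    "line 1 {{multi\nline}} line 2",  // template spanning a newline
    "{{unclosed",                     // no closing }}: nothing is removed
];

foreach ($cases as $case) {
    $a = preg_replace($lazy, '', $case);
    $b = preg_replace($atomic, '', $case);
    // Print whether both patterns agree, plus the (JSON-escaped) result.
    printf("%-8s %s\n", $a === $b ? 'same' : 'DIFFERS', json_encode($b));
}
```

The atomic version needs no s modifier because [^}] already matches newlines.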
    

    Now you only need to add subpatterns 1, 3 and 4 to this pattern in the same way. Note that you don't need the s modifier, since the pattern never uses the dot.
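Putting subpatterns 1, 3 and 4 into the same alternation could look like the sketch below (PHP 8). It makes a few assumptions that are not in the original: the function name cleanWikitext() is hypothetical; preg_replace_callback() is used instead of preg_replace() so that a lone | can still become / while every other match is deleted; the File: and heading subpatterns use [^\n] instead of the dot, so unlike the question's (.*?) with the s modifier they never run across a line break; and strip_tags() stays as a separate last step. Because PCRE tries the alternatives at the leftmost possible position, a whole <ref>...</ref> or {{...}} block is consumed before any |, [[ or ]] inside it can match, which keeps the question's ordering for input like the sample. It has not been checked against every edge case of the sequential version.

```php
<?php
// Hypothetical helper: a single preg_replace_callback() pass covering the
// ref, template, File: and heading removals plus the |, [[ and ]]
// replacements, followed by strip_tags() as in the question.
function cleanWikitext(string $text): string
{
    $pattern = '~
          <ref[^>]*+>                  # 1. opening <ref ...>
          [^<]*+(?:<(?!/ref)[^<]*+)*+  #    body, unrolled instead of lazy
          </ref[^>]*+>                 #    closing </ref>
        | \{\{(?>[^}]++|\}(?!\}))*+\}\}  # 2. {{template}}
        | File:[^\n]*+\n               # 3. File: up to the end of the line
        | ==[^\n]*=\n                  # 4. ==heading== (single line only)
        | \[\[ | \]\]                  # wiki-link brackets
        | \|                           # pipe, replaced by "/" below
    ~x';

    $result = preg_replace_callback(
        $pattern,
        // A lone pipe becomes "/", every other match is removed.
        static fn (array $m): string => $m[0] === '|' ? '/' : '',
        $text
    );

    if ($result === null) {
        throw new RuntimeException('PCRE error: ' . preg_last_error_msg());
    }

    return strip_tags($result);
}
```

Only the ref body uses the unrolled loop [^<]*+(?:<(?!/ref)[^<]*+)*+ in place of [\s\S]*?, following the advice about lazy quantifiers; whether that is measurably faster on your data is something to benchmark rather than assume.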

    About strip_tags:

    You could try replacing it with a subpattern too:

        $str = preg_replace('~<[^>]++>~', '', $str);
    

    But be careful with that, because your input can contain several traps, for example:

        blah blah blah <!--  blah > --> blah blah
    or
        <div theuglyattribute=">">
    

    It is possible to handle all of these cases, but the pattern becomes very long.
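To see the two traps concretely, the sketch below runs both inputs through the naive ~<[^>]++>~ pattern and through strip_tags(); the <div> example is extended with some text and a closing tag for illustration. The regex leaves fragments behind (the trailing --> of the comment, and "> from the quoted attribute value). strip_tags() is expected to remove both cleanly because its parser keeps track of comments and quoted attribute values, but it is worth confirming on your PHP version.

```php
<?php
// The two traps mentioned above, run through the naive tag regex and
// through strip_tags() for comparison.
$inputs = [
    'comment'   => 'blah blah blah <!--  blah > --> blah blah',
    'attribute' => '<div theuglyattribute=">">text</div>',
];

foreach ($inputs as $name => $html) {
    $naive = preg_replace('~<[^>]++>~', '', $html); // stops at the first >
    $php   = strip_tags($html);                     // tracks comments and quotes
    echo $name, ":\n";
    echo '  regex:      ', json_encode($naive), "\n";
    echo '  strip_tags: ', json_encode($php), "\n";
}
```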

    This answer was selected by the asker as the best answer.