dream198731 2017-11-22 09:20
Accepted

Format camel case into readable text in PHP while skipping abbreviations

So I am stuck: I have looked at tons of answers here, but none of them seems to solve my last problem.

Through a JSON API, I receive an equipment list in camel case format. I cannot change that.

I need this camel case translated into normal language.

So far I have gotten most words separated with:

$string = "SomeEquipmentHere";

$spaced = preg_replace('/([A-Z])/', ' $1', $string);
var_dump($spaced);

string ' Some Equipment Here' (length=20)

$trimmed = trim($spaced);
var_dump($trimmed);
string 'Some Equipment Here' (length=19)

This works fine, but some of the equipment names contain abbreviations:

"ABSBrakes": here ABS should be separated from Brakes.

I can't simply check for several uppercase letters next to each other, since that would keep ABS and Brakes together. There are more like this, e.g. "CDRadio".

So what I want is for the output to be:

"ABS Brakes"

Is there a way to format it so that, when there are uppercase letters next to each other, a space is only added before the last uppercase letter of that sequence?
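A minimal sketch of that rule using only look-arounds, assuming plain ASCII letters and assuming the space before the last capital of a run should only be added when that capital starts a lowercase word (the function name `splitAcronymAware` is hypothetical):

```php
<?php
// Hypothetical helper: inserts a space at two kinds of zero-width positions.
//   (?<=[a-z])(?=[A-Z])       lowercase followed by uppercase: "e|E" in SomeEquipment
//   (?<=[A-Z])(?=[A-Z][a-z])  inside an uppercase run, before the capital that
//                             starts a lowercase word: "S|B" in ABSBrakes
// Assumes plain ASCII letters; digits are not handled.
function splitAcronymAware(string $s): string
{
    return preg_replace('/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/', ' ', $s);
}

foreach (['SomeEquipmentHere', 'ABSBrakes', 'CDRadio'] as $name) {
    echo splitAcronymAware($name), "\n"; // Some Equipment Here, ABS Brakes, CD Radio
}
```

Because both alternatives are pure look-arounds, no characters are consumed and no leading space is produced, so no trim() is needed afterwards.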

I am not strong in regex.

EDIT

Both contributions are awesome; people coming here later should read both answers.

The last remaining problems are the following patterns:

"ServiceOK" becomes "Service O K"

"ESP" becomes "ES P"

The case of a string that is purely an uppercase abbreviation is fixed by a function that counts lowercase letters: if there are none, it skips the preg_replace().
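A sketch of that guard, assuming "counting lowercase letters" means counting ASCII a-z (the function name `spaceCamelCase` is hypothetical); the fallback is the original replace-then-trim approach from the question:

```php
<?php
// Hypothetical helper: skip the replacement entirely when the string has no
// lowercase letters, so a pure abbreviation such as "ESP" is returned as-is.
function spaceCamelCase(string $s): string
{
    // preg_match_all returns the number of matches (0 if none).
    if (preg_match_all('/[a-z]/', $s) === 0) {
        return $s;
    }
    // Original approach: space before every capital, then trim the leading space.
    return trim(preg_replace('/([A-Z])/', ' $1', $s));
}

echo spaceCamelCase('ESP'), "\n";       // ESP
echo spaceCamelCase('ServiceOK'), "\n"; // Service O K
```

The guard only helps strings with no lowercase letters at all; "ServiceOK" still comes out as "Service O K", which is the remaining problem described above.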

But as Flying wrote in the comments on his answer, there could be many cases not covered by his regex, and a complete answer may be impossible. I don't know whether this is too much of a challenge for a regex.

Possibly by adding a rule like "if there is no lowercase letter after the uppercase letter, no space should be inserted".
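Since regex is a sticking point, here is a loop-based sketch of that rule taken literally: a space goes before an uppercase letter (never the first character) only when the next character is lowercase. It assumes plain ASCII input, does nothing special for digits, and the function name `spaceBeforeWordStarts` is hypothetical:

```php
<?php
// Hypothetical helper implementing the proposed rule without regex:
// insert a space before an uppercase letter only if the character after it
// is lowercase. Works byte by byte, so it assumes ASCII input.
function spaceBeforeWordStarts(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];
        $nextIsLower = $i + 1 < $len && ctype_lower($s[$i + 1]);
        if ($i > 0 && ctype_upper($c) && $nextIsLower) {
            $out .= ' ';
        }
        $out .= $c;
    }
    return $out;
}

foreach (['ABSBrakes', 'CDRadio', 'ServiceOK', 'ESP'] as $name) {
    echo spaceBeforeWordStarts($name), "\n";
}
```

Taken literally, the rule still produces "ABS Brakes" and "CD Radio" but leaves "ServiceOK" and "ESP" untouched, because neither "O", "K" nor any letter of "ESP" is followed by a lowercase letter.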


2 answers

  • dongshan4878 2017-11-22 12:28

    Here is a single-call pattern that doesn't use any anchors, capture groups, or references in the replacement string: /(?:[a-z]|[A-Z]+)\K(?=[A-Z]|\d+)/


    Code:

    $tests = [
        'SomeEquipmentHere',
        'ABSBrakes',
        'CDRadio',
        'Valve14',
    ];
    foreach ($tests as $test) {
        echo preg_replace('/(?:[a-z]|[A-Z]+)\K(?=[A-Z]|\d+)/', ' ', $test), "\n";
    }
    

    Output:

    Some Equipment Here
    ABS Brakes
    CD Radio
    Valve 14
    

    This is a better method because there is nothing to mop up afterwards (no leading space to trim). If there are new strings to consider (that break my method), please leave them in a comment so that I can update my pattern.

    Pattern Explanation:

    /         #start the pattern
    (?:[a-z]  #match 1 lowercase letter
    |         #or
    [A-Z]+)   #1 or more uppercase letters
    \K        #restart the fullstring match (forget the past)
    (?=[A-Z]  #look-ahead for 1 uppercase letter
    |         #or
    \d+)      #1 or more digits
    /         #end the pattern
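The commented breakdown can live in the code itself: with PCRE's `x` (extended) modifier, whitespace in the pattern is ignored and `#` starts a comment running to the end of the line. A sketch using a nowdoc so the backslashes stay literal (the comments must not contain the `/` delimiter, since PHP looks for the closing delimiter before PCRE sees the comments):

```php
<?php
// Same pattern as above, written in extended (x) mode with inline comments.
$pattern = <<<'REGEX'
/            # start the pattern
(?:[a-z]     # match 1 lowercase letter
|            # or
[A-Z]+)      # 1 or more uppercase letters
\K           # restart the fullstring match (forget the past)
(?=[A-Z]     # look-ahead for 1 uppercase letter
|            # or
\d+)         # 1 or more digits
/x
REGEX;

foreach (['SomeEquipmentHere', 'ABSBrakes', 'CDRadio', 'Valve14'] as $test) {
    echo preg_replace($pattern, ' ', $test), "\n";
}
```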
    

    Edit:

    There are some other patterns that may provide better accuracy, including:

    /(?:[a-z]|\B[A-Z]+)\K(?=[A-Z]\B|\d+)/
    

    Granted, the above pattern will not properly handle ServiceOK.



    or this pattern with an anchor:

    /(?!^)(?=[A-Z][a-z]+|(?<=\D)\d)/
    

    The above pattern will accurately handle SomeEquipmentHere, ABSBrakes, CDRadio, Valve14, ServiceOK, and ESP as requested by the OP (ServiceOK and ESP are left unsplit).


    *Note: Pattern accuracy can be improved as more sample strings are provided.
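A quick way to compare the three patterns from this answer is to run each one over all six sample strings from the question. In every pattern the reported match is empty (through `\K` or pure look-arounds), so replacing it with `' '` simply inserts a space at each matched position. The array keys below are just labels chosen for this sketch:

```php
<?php
// Compare the three patterns from this answer on the question's samples.
$patterns = [
    'first'    => '/(?:[a-z]|[A-Z]+)\K(?=[A-Z]|\d+)/',
    'boundary' => '/(?:[a-z]|\B[A-Z]+)\K(?=[A-Z]\B|\d+)/',
    'anchored' => '/(?!^)(?=[A-Z][a-z]+|(?<=\D)\d)/',
];
$samples = ['SomeEquipmentHere', 'ABSBrakes', 'CDRadio', 'Valve14', 'ServiceOK', 'ESP'];

foreach ($samples as $sample) {
    $row = [$sample];
    foreach ($patterns as $pattern) {
        $row[] = preg_replace($pattern, ' ', $sample);
    }
    echo implode(' | ', $row), "\n";
}
```

All three agree on the first four samples. They differ on the last two: the first pattern gives "Service O K" and "ES P", the word-boundary pattern gives "Service OK" and "ESP", and the anchored pattern leaves both "ServiceOK" and "ESP" whole, which matches the "no space unless a lowercase letter follows" rule proposed in the question's edit.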

    This answer was selected by the asker as the best answer.