dqwh1209 2012-05-20 05:43

str_word_count() without HTML

I'm using str_word_count() to count the words in content from CKEditor. The content I get from CKEditor is HTML, and I need its word count. MS Word gives a count of 328, but running str_word_count() on the content with its HTML tags gives 362 words. Is there any way to remove all HTML tags from a PHP string variable? I tried strip_tags(), and it gave me 336. Is there any way to get the exact word count in PHP? Thank you in advance.
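A minimal sketch of why strip_tags() alone can still over-count: it removes tags but leaves HTML entities untouched, and str_word_count() then treats the letters inside an entity as a word. The sample markup is hypothetical; it assumes CKEditor represents an empty line as a <p>&nbsp;</p> paragraph.

```php
<?php
// strip_tags() removes the markup but keeps entities such as &nbsp; as literal text.
// The HTML below is a made-up sample, not the asker's real content.
$html = '<p>&nbsp;</p><p>Mixed School or Unisex School</p>';
$text = strip_tags($html);

echo $text, "\n";                 // &nbsp;Mixed School or Unisex School
echo str_word_count($text), "\n"; // 6: "nbsp" is counted as an extra word
```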

For example, a user entered this essay:

Mixed School or Unisex School

Have you ever think about the impact of mixed schools for students? Most of the schools in the U.S are mixed gender, which mean girls and boys are studying with each other in the same classroom. Some parents wonder about the influences of their child’s in the school either in mixed school or in unisex ones. These influences are not about the education only, the influences about their personality, behavior with the opposite sex and finally their education. In my opinion, I think the unisex schools for teenager’s students are much better than mixed schools, and this conclusion based in many reasons.

MS Word gives a word count of 107.

In PHP, the same content arrives as:

 

Mixed School or Unisex School

 

Have you ever think about the impact of mixed schools for students? Most of the schools in the U.S are mixed gender, which mean girls and boys are studying with each other in the same classroom. Some parents wonder about the influences of their child’s in the school either in mixed school or in unisex ones. These influences are not about the education only, the influences about their personality, behavior with the opposite sex and finally their education. In my opinion, I think the unisex schools for teenager’s students are much better than mixed schools, and this conclusion based in many reasons.

And the result is 114.

That is 7 extra words for a single-paragraph essay.
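To see which tokens make up the extra count, str_word_count() can return the list of words it found instead of a number (format 1). This sketch uses one sentence from the essay. str_word_count() only accepts letters, plus ' and - inside a word, so "U.S" is split into two words, whereas MS Word counts it as one. A curly apostrophe, as in child’s, is likely to split a word the same way, since it is not an ASCII letter.

```php
<?php
// Format 1 returns the words themselves, which shows where the extra ones come from.
$sentence = 'Most of the schools in the U.S are mixed gender';
$words = str_word_count($sentence, 1);

print_r($words);
// "U.S" comes back as "U" and "S" because "." is not a word character.
echo count($words), "\n"; // 11, while MS Word would count 10
```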

Edit:

After using:

    // Remove the HTML tags, then the &nbsp; entities that strip_tags() leaves behind
    $text = strip_tags($this->orginal_content);
    $text = str_replace('&nbsp;', "", $text);
    $this->orginal_content_count = str_word_count($text);
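One caveat with replacing the entity by an empty string: if &nbsp; sits between two words rather than on an otherwise empty line, the two words are glued together and counted as one. A small sketch with a made-up input string:

```php
<?php
// A hypothetical input where &nbsp; separates two words.
$text = 'Mixed&nbsp;School';

$glued  = str_replace('&nbsp;', '', $text);  // "MixedSchool"
$spaced = str_replace('&nbsp;', ' ', $text); // "Mixed School"

echo str_word_count($glued), "\n";  // 1
echo str_word_count($spaced), "\n"; // 2
```

Replacing the entity with a space is the safer default, since extra spaces never create extra words for str_word_count().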

The result is 112.

Looking at the stripped text, I found 3 extra spaces:

        Mixed School or Unisex School       Have you ever think about the impact of mixed schools for students? Most of the schools in the U.S are mixed gender, which mean girls and boys are studying with each other in the same classroom. Some parents wonder about the influences of their child’s in the school either in mixed school or in unisex ones. These influences are not about the education only, the influences about their personality, behavior with the opposite sex and finally their education. In my opinion, I think the unisex schools for teenager’s students are much better than mixed schools, and this conclusion based in many reasons. 

1 answer

  • duangang1991 2012-05-20 06:41

    Okay.

    You already know about strip_tags(). That's a good start.

    You're removing &nbsp; with str_replace(), but that only deals with that one specific entity. You would be better off using PHP's html_entity_decode() function, which converts all of the HTML entities in your string into the characters they represent.
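A minimal sketch of what html_entity_decode() does to the stripped text, using a made-up input. With the UTF-8 charset, &nbsp; is decoded to the non-breaking space character U+00A0, not an ordinary space, which matters for any whitespace cleanup done afterwards.

```php
<?php
// Strip the tags first, then decode every remaining entity (&nbsp;, &amp;, ...).
$text = strip_tags('<p>&nbsp;</p><p>Mixed&nbsp;School &amp; more</p>');
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// &nbsp; became U+00A0 (bytes C2 A0) and &amp; became a plain "&".
var_dump($decoded === "\u{00A0}Mixed\u{00A0}School & more"); // bool(true)
```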

    If extra spacing is causing you problems, you could use str_replace() or preg_replace() to get rid of it, e.g.:

    $output = preg_replace('/\s\s+/',' ',$input);
    

    This will convert every run of two or more whitespace characters into a single space.
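Putting these steps together, here is a sketch of a cleanup helper. The function name editor_html_to_text() is hypothetical, the sample HTML assumes paragraphs separated by newlines, and the pattern \s+ is a slight variant of the one above so that single newlines between paragraphs also become spaces. Decoded non-breaking spaces are swapped for ordinary spaces first because, without the /u modifier, \s only matches ASCII whitespace.

```php
<?php
// Hypothetical helper: turn editor HTML into plain text with single spaces.
// Note: adjacent tags with no whitespace between them ("</p><p>") would still
// glue words together after strip_tags(); the sample below uses newlines.
function editor_html_to_text(string $html): string
{
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Turn decoded non-breaking spaces (U+00A0) into ordinary spaces.
    $text = str_replace("\u{00A0}", ' ', $text);
    // Collapse every run of whitespace, including newlines, into one space.
    $text = preg_replace('/\s+/', ' ', $text);
    return trim($text);
}

$html = "<p>&nbsp;</p>\n<p>Mixed School or Unisex School</p>\n<p>&nbsp;</p>\n<p>Have you ever think about it?</p>";
$text = editor_html_to_text($html);
echo $text, "\n";                 // Mixed School or Unisex School Have you ever think about it?
echo str_word_count($text), "\n"; // 11
```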

    Now your word count should work a little better.
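If the goal is to match MS Word more closely, one option is to count whitespace-separated tokens instead of using str_word_count(), so that "U.S" and "child’s" each count once. This is a swapped-in technique rather than what str_word_count() does, and count_tokens() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: count whitespace-separated tokens, closer to how a word
// processor counts, instead of str_word_count()'s letters-only rule.
function count_tokens(string $text): int
{
    // /u treats the input as UTF-8; PREG_SPLIT_NO_EMPTY drops the empty strings
    // that leading or trailing whitespace would otherwise produce.
    $tokens = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($tokens === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return count($tokens);
}

$sentence = 'Most of the schools in the U.S are mixed gender';
echo count_tokens($sentence), "\n";   // 10: "U.S" is one token
echo str_word_count($sentence), "\n"; // 11: "U" and "S" are separate words
```

The trade-off is that stray punctuation surrounded by spaces, such as a lone dash, also counts as a token, so this is an approximation rather than an exact match with any particular word processor.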

    Hope that helps.

    This answer was selected by the asker as the best answer.