dpbf62565 2014-04-08 14:59

Performance tuning: PHP calling a Java process

I have a PHP web form that accepts file uploads (images and text documents), from which text is extracted (OCR for images; .pdf, .doc, etc. are stripped to plain text). The text extraction is performed by using exec to invoke a JAR file / command-line process (I am not in control of the source for either), which returns the text. During testing there is no issue; however, with 5 simultaneous PDF uploads (about 5 MB each) the server load maxes out. The entire process for each upload takes 10-15 seconds, and the load drops back to normal immediately afterwards.

I am assuming the issue is Java and the JRE being allocated for each exec call: when I invoke the JAR file manually from the command line it takes about 10 seconds, nearly the same as the response time for a single upload. Running the extraction as a background process is not possible, because the HTTP response contains the 'data' processed from the uploaded files' text. I considered forking the process, but that doesn't help with the server load (it would probably make it worse). I am hoping to avoid rewriting the service entirely in Java.

Is there a way to preload the JRE for the Java process, or to pipe successive files to the same process, or something along those lines?


  • duanlei8119 2014-04-08 15:05

    Sure, starting a JVM for each request is an extremely bad idea. That's exactly where Java is slow.

    It should be pretty easy using, e.g., a ServerSocket: start one long-running Java process and send your requests to it. It's not the fastest solution, but it's simple and guarantees a huge speedup.
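A minimal sketch of that idea in Java, under stated assumptions: the class name ExtractionServer, the port 9090, and the one-line protocol (the client sends a file path, the server replies with the text and closes the connection) are all hypothetical, and extractText is a placeholder that merely reads the file as UTF-8, because the real extractor JAR's API is unknown. The point is that the JVM starts once and every upload after that is only a request to the already-running process. A fixed-size thread pool also caps how many extractions run at the same time, which bounds the server load when several uploads arrive together; whether the real extractor is safe to call from several threads is not known, so a pool size of 1 is the cautious choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExtractionServer {

    // HYPOTHETICAL placeholder for the real extractor call (its API is unknown).
    // It just reads the file as UTF-8 so the sketch runs on its own.
    static String extractText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // One request: read a file path line, write the text back, close the socket.
    static void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8))) {
            String path = in.readLine();
            if (path == null) {
                return; // client disconnected without sending anything
            }
            try {
                out.print(extractText(Paths.get(path.trim())));
            } catch (IOException e) {
                out.print("ERROR: " + e.getMessage());
            }
            out.flush();
        } catch (IOException e) {
            System.err.println("request failed: " + e.getMessage());
        }
    }

    // Accept loop: the JVM stays warm, and the pool caps concurrent extractions.
    static void serve(ServerSocket server, int maxConcurrent) throws IOException {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            while (true) {
                Socket client = server.accept(); // throws once the server socket is closed
                pool.execute(() -> handle(client));
            }
        } finally {
            pool.shutdown(); // finish queued requests, then release the worker threads
        }
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9090; // hypothetical default
        // Bind to the loopback address only, so the service is not reachable from outside.
        try (ServerSocket server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress())) {
            serve(server, 2); // hypothetical limit; use 1 if the extractor is not thread-safe
        }
    }
}
```

It would be started once, e.g. java -cp "old.jar;." ExtractionServer 9090, and kept running; the PHP script then replaces its exec call with a socket request to localhost (for example via stream_socket_client()).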


    A JAR file is sometimes an "executable", but it's always a "library". It's actually just a renamed ZIP file, so you can easily look at what's inside (and I wouldn't call that reverse engineering). It contains a manifest file (META-INF/MANIFEST.MF) whose Main-Class entry names the main class. You can write your own class that calls the original main method, or that ignores it and uses the other classes directly.
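Because a JAR is a ZIP file with a manifest, the standard java.util.jar API can read the Main-Class entry directly. A small sketch (the class name ShowMainClass is hypothetical) that prints the class your own wrapper would call:

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class ShowMainClass {

    // Returns the Main-Class entry of the JAR's META-INF/MANIFEST.MF, or null if absent.
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest mf = jar.getManifest(); // null when the JAR has no manifest
            if (mf == null) {
                return null;
            }
            return mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ShowMainClass old.jar
        System.out.println(mainClassOf(args[0]));
    }
}
```

Without writing any code, unzip -p old.jar META-INF/MANIFEST.MF shows the same information.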

    For this you don't need to modify the original JAR at all. Just make your own; in fact, you don't even need a JAR file. For something this simple, a single class should suffice. Then you call it like this:

    java -cp "old.jar;." YourClass
    

    assuming you're using Windows (otherwise replace ; with :), YourClass is in the default package (which is usually a bad idea, but OK for a single-class project), and YourClass.class (i.e., the compiled version of your YourClass.java) is in the current working directory.
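A hedged sketch of such a YourClass: it calls the original main method through reflection inside the current JVM and captures what it prints to System.out, assuming the extractor writes its text to standard output (presumably what exec reads today). DemoExtractor is a made-up stand-in for the real main class so the sketch compiles on its own. Two caveats: if the original main calls System.exit(), it ends this JVM too, and because System.out is global to the JVM, calls are serialized with synchronized.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class YourClass {

    // Runs another class's public static main(String[]) inside THIS JVM and
    // returns whatever it printed to System.out. synchronized because
    // System.out is shared by the whole JVM.
    static synchronized String runMain(String className, String... args) throws Exception {
        Method main = Class.forName(className).getMethod("main", String[].class);
        PrintStream original = System.out;
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        try (PrintStream ps = new PrintStream(captured, true, "UTF-8")) {
            System.setOut(ps);
            // Cast to Object so the array is passed as the single String[] parameter.
            main.invoke(null, (Object) args);
        } finally {
            System.setOut(original); // always restore the real stdout
        }
        return new String(captured.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Usage: java -cp "old.jar;." YourClass <Main-Class> file.pdf
        String className = args[0];
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        System.out.print(runMain(className, rest));
    }
}

// HYPOTHETICAL stand-in for the real extractor's main class.
class DemoExtractor {
    public static void main(String[] args) {
        System.out.println("extracted: " + String.join(" ", args));
    }
}
```

Called once per JVM this saves nothing; the gain comes when a long-running process calls runMain for each request instead of starting a new JVM.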

    I wouldn't go for a faster but more complicated solution like ServerSocketChannels; it's not worth it. Starting a new JVM takes time, and on top of that the new JVM starts out interpreting bytecode and only later compiles the hot code, which is far worse than any communication overhead. A fancier solution would save only a few more microseconds.
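To make that overhead concrete: on the calling side, each request is just one local socket round trip. A minimal Java client sketch, assuming a hypothetical line-based protocol (send one file path terminated by a newline, read the reply until the server closes the connection); the class name, port argument, and 60-second timeout are illustrative choices. The PHP side would do the same with stream_socket_client(), fwrite() and stream_get_contents().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ExtractionClient {

    // Sends one file path (terminated by '\n') and returns everything the
    // server writes back before it closes the connection.
    static String request(int port, String path) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            s.setSoTimeout(60_000); // hypothetical: give up if a read blocks over 60 s
            OutputStream out = s.getOutputStream();
            out.write((path + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
            InputStream in = s.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            // Decode once at the end so multi-byte characters are never split.
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ExtractionClient 9090 /path/to/upload.pdf
        System.out.print(request(Integer.parseInt(args[0]), args[1]));
    }
}
```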

    This answer was selected by the asker as the best answer.
