dpbf62565 2014-04-08 14:59

Performance tuning: PHP invoking a Java process

I have a PHP web form that accepts file uploads (image and text), from which text is extracted (OCR for images; .pdf, .doc, etc. stripped to plain text). The extraction is performed by using exec to invoke a jar file/command-line process (I do not control the source of either), which returns the text. With a single upload there is no issue; however, with 5 simultaneous PDF uploads (each about 5 MB) the server load maxes out. The entire process takes 10-15 seconds per upload, and the load drops back to normal immediately afterwards.

I assume the issue is Java, specifically the cost of starting and allocating a JRE for each exec call; manually invoking the jar file from the command line takes about 10 seconds, nearly the same as a single upload response. Running the extraction as a background process is not possible because the HTTP response contains the data extracted from the uploaded files' text. I considered forking the process, but that doesn't help with the server load (it would probably make it worse). I am hoping to avoid rewriting the service entirely in Java.

Is there a way to pre-load the JRE, pipe successive files to the same Java process, or something similar?

2 answers

  • duanlei8119 2014-04-08 15:05

    Sure, starting a JVM for each request is an extremely bad idea. That's exactly where Java is slow.

    It should be pretty easy using, e.g., a ServerSocket: start one long-running Java process and send requests to it. It's not the fastest solution, but it's simple and a guaranteed huge speedup.
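
As a rough illustration of the ServerSocket idea, here is a minimal sketch of a long-running process that accepts one file path per connection and replies with the extracted text. Everything in it is an assumption made for illustration: the class name ExtractionServer, the one-line-per-request protocol, port 9090, and the placeholder extract method, which stands in for whatever entry point the original jar really offers. The request helper is a Java client included only to exercise the server; on the PHP side the equivalent would be a plain socket call such as fsockopen.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical wrapper: one JVM stays up and serves many extraction requests. */
public class ExtractionServer {

    // Placeholder for the real work; in practice this would call into the original jar.
    static String extract(String path) {
        return "text extracted from " + path;
    }

    // Bind to the loopback interface only, so the service is not reachable from outside.
    static ServerSocket open(int port) {
        try {
            return new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Handle one connection: read a single line (the file path), write the text, close.
    static void handle(Socket client, Function<String, String> extractor) throws IOException {
        try (Socket c = client) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8));
            Writer out = new OutputStreamWriter(c.getOutputStream(), StandardCharsets.UTF_8);
            String path = in.readLine();
            if (path != null) {
                out.write(extractor.apply(path));
                out.flush();
            }
        } // closing the socket tells the client that the reply is complete
    }

    // Accept connections one at a time until the server socket is closed.
    static void serve(ServerSocket server, Function<String, String> extractor) {
        while (!server.isClosed()) {
            try {
                handle(server.accept(), extractor);
            } catch (IOException e) {
                if (!server.isClosed()) e.printStackTrace();
            }
        }
    }

    // Test client: send one path, read the whole reply (readAllBytes needs Java 9+).
    static String request(int port, String path) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            Writer out = new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8);
            out.write(path + "\n");
            out.flush();
            return new String(s.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9090; // port is an assumption
        serve(open(port), ExtractionServer::extract);
    }
}
```

This sketch handles requests strictly one after another. That is a deliberate simplification: it also caps the load at one extraction at a time, at the cost of queueing concurrent uploads.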


    A JAR file is sometimes an "executable", but it's always a "library". It's actually just a renamed ZIP file, so you can easily look at what's inside (and I wouldn't call that reverse engineering). It contains a manifest file, META-INF/MANIFEST.MF, whose Main-Class attribute names the main class. You can write your own class that calls the original main or ignores it.
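
Instead of unzipping the JAR by hand, the main class can also be looked up programmatically. The following is a small sketch using the standard java.util.jar API; the class name MainClassFinder and the method mainClassOf are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper: prints the Main-Class entry of a JAR's manifest. */
public class MainClassFinder {

    // Returns the Main-Class attribute, or null if the JAR has no manifest or no such entry.
    static String mainClassOf(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest(); // null when META-INF/MANIFEST.MF is absent
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java MainClassFinder old.jar
        System.out.println(mainClassOf(args[0]));
    }
}
```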

    For this you don't need to modify the original JAR at all. Just make your own, though you don't even need a JAR file: for something this simple, a single class should suffice. Then you call it like

    java -cp "old.jar;." YourClass
    

    assuming you're using Windows (otherwise replace ; with :), that YourClass is in the default (unnamed) package (which is usually a bad idea, but OK for a single-class project), and that YourClass.class (i.e., the compiled version of your YourClass.java) is in the current working directory.
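
What such a class might do is sketched below: it calls the original main method by reflection and captures whatever it prints. This assumes the tool writes its result to standard output, which the question suggests but does not confirm. MainInvoker, runMain and the nested FakeExtractor are hypothetical names; FakeExtractor merely stands in for the real main class inside old.jar.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.lang.reflect.Method;

/** Hypothetical wrapper: runs another class's main() in this JVM and captures its stdout. */
public class MainInvoker {

    // synchronized because System.out is global: only one captured run at a time.
    static synchronized String runMain(String mainClass, String... args) {
        PrintStream saved = System.out;
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            Method main = Class.forName(mainClass).getMethod("main", String[].class);
            System.setOut(new PrintStream(buf, true, "UTF-8"));
            main.invoke(null, (Object) args); // cast so the array is passed as one argument
            System.out.flush();
            return buf.toString("UTF-8");
        } catch (Exception e) {
            throw new IllegalStateException("could not run " + mainClass, e);
        } finally {
            System.setOut(saved); // always restore the real stdout
        }
    }

    /** Stand-in for the original tool's main class (an assumption, not the real one). */
    public static class FakeExtractor {
        public static void main(String[] args) {
            System.out.print("text of " + args[0]);
        }
    }

    public static void main(String[] args) {
        System.out.println(runMain(FakeExtractor.class.getName(), "a.pdf"));
    }
}
```

One caveat with this approach: if the original main calls System.exit, it will terminate the whole long-running JVM, so that case would need separate handling.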

    I wouldn't go for a faster and more complicated solution like using ServerSocketChannels, as it's not worth it. Starting a new JVM takes time; moreover, it begins by interpreting the bytecode and only later compiles it, and that's far worse than some communication overhead. A faster transport would only save a few more microseconds.

    Accepted by the asker as the best answer.