drhs13583567608 2017-09-08 15:18
浏览 78
已采纳

如何在不重做数据库的情况下使用错误的数据库优化此旧版Webapp

Questions modified slightly to improve understandability

My goal is to optimize a web application which has a very bad DB design and for which I can't touch the DB (I can't alter table nor introduce a new DB). I can operate on the code itself, on the filesystem or via a proxy. By "optimizing" I mean: reduce the requests sent to the webapp, as opposed to the ones sent to filesystem directly, keep DB queries to a minimum, reduce the number of different URL calls (keep caching in mind).

Let me try to construct a fictitious example, just to provide something to talk upon. Let us imagine this scenario:

  • I have a php webapp, which exposes a database of a million different people. each person decided at some point if they are happy or sad
  • when I visit person.php?id=x {x=1,..1000000}, the page creates a link to show_picture_of_person.php?id=x. show_picture_of_person.php will go inside the db of a million rows and this db will tell me if the person is sad or happy by returning an image. I don't know what this image is, unless I extract it from the DB. If I extract it from the db, i can then analyze it and understand if it's either a sad face or a happy face. The function show_picture_of_person actually outputs an image. Also the DB stores the image itself in a blob. Images are always either sad.jpg or happy.jpg.

what I would like to have, instead of a million link to show_picture_of_person.php?id=x, is to have 2 links, one for a sad.jpg and one for a happy.jpg. Possible solutions in my mind:

  1. I write a script which calls all the possible combinations of show_picture_of_person, save all the images, understand which are the ones that are equal and then write a lookup table. I put that lookup table in a php function make_sensible_url("show_picture_of_person.php?id=x") -> happy.jpg. This function will be called in the person.php script. I am worried about performance here of the php engine itself (and array like that would be by itself a 50+MB file!)
  2. Same as above, but instead of constructing an array in PHP, I create a filesystem of a million text files, and inside each text file I will have the name of the actual static file of the image (avoiding repetitions). The function make_sensible_url("show_picture_of_person.php?id=x") will simply read and output the content of the file. I like this approach as no DB calls and reading to fs should be faster then the huge PHP array in solution 1.
  3. I change person.php so that there is no link to show_picture_of_person.php and instead I make data:images. The issue with this is that, if I have x calls to person.php, I still have 2x calls to the DB (one for person.php and one for show_picture_of_person.php). I would like to have only x calls to the DB. Also this increases the size of the page (in my real case I have ~20 images in 1 page, so lots of queries and lots of bytes)
  4. don't know what else..

How would you solve this? Thanks!


For completeness' sake, here was the original question:

This is the scenario:

  • a database with various tables with data which is not properly indexed (let's say for this argument's sake that we have 5000 unique objects represented in around 50.000 rows - so duplicates are present)
  • we are in a situation in which the database is non modifiable (this also mean that we can't insert another table)
  • we have a php app exposing these objects
  • there exists around 1 million php calls (all legitimate) which return one of those 5000 objects (e.g.: bad_url.php?id=bar, bad_url.php?id=foo, ..)
  • there is no easy way to programmatically decide which of the 5000 objects will be returned

Our goal is to somehow convert the million+ calls to calls which will be giveme.php?id=x, where x is one of the 5000 unique objects.

Just to give you an idea of a theoretical approach:

  • we could index all the millions calls and map them with which distinct object is returned
  • we can create a hash table or something and create a php function which would work as give_me_nice_url("bad_url.php?....").
  • my feeling is that creating an array with such solution would result in a 50-100MB array .. not sure how performant it would be running realtime under load.

My ask for this question is which approach would you use to solve this issue and handle the large data set? Does there exists a better way than a lookup table like in my solution? Remember I can't use a database in the final production setup.

  • 写回答

4条回答 默认 最新

  • dpl9717 2017-09-13 06:28
    关注

    I would cache the results of show_picture_of_person.php?id=x to the file system, which is similar to your approach #2.

    However you may want to consider using a caching library instead of rolling your own. Some frameworks like laravel come with caching options or you could use a 3rd party caching library instead such as https://github.com/tedious/Stash

    Here's an example for stash:

    // show_picture_of_person.php
    
    if (!isset($_GET['image_path'])) {
        // Create Driver with default options
        $driver = new Stash\Driver\FileSystem(array());
        // Inject the driver into a new Pool object.
        $pool = new Stash\Pool($driver);
        // Get a cache item.
        $item = $pool->getItem('image_path_' . $_GET['id']);
    
        // Attempt to get the data
        $image_path = $item->get();
    
        // Check to see if the data was a miss.
        if($item->isMiss())
        {
            // Let other processes know that this one is rebuilding the data.
            $item->lock();
    
            // Run intensive code
            $image_path = codeThatTakesALongTime(); // save image to disk here
    
            if ($image_path) {
                // Store the expensive to generate data.
                $pool->save($item->set($image_path));
            }
        }
    }
    else {
        $image_path = $_GET['image_path'];
    }
    
    // Continue as normal.
    useDataForStuff($image_path);
    

    In person.php?id=x you can now check the cache above for key x and if it's populated then render show_picture_of_person.php?image_path=[sad/happy].jpg and if it's not populated then render show_picture_of_person.php?id=x, which will populate the cache once clicked.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(3条)

报告相同问题?

悬赏问题

  • ¥15 ansys fluent计算闪退
  • ¥15 有关wireshark抓包的问题
  • ¥15 Ubuntu20.04无法连接GitHub
  • ¥15 需要写计算过程,不要写代码,求解答,数据都在图上
  • ¥15 向数据表用newid方式插入GUID问题
  • ¥15 multisim电路设计
  • ¥20 用keil,写代码解决两个问题,用库函数
  • ¥50 ID中开关量采样信号通道、以及程序流程的设计
  • ¥15 U-Mamba/nnunetv2固定随机数种子
  • ¥30 C++行情软件的tick数据如何高效的合成K线