Question modified slightly to improve understandability.
My goal is to optimize a web application which has a very bad DB design and whose DB I can't touch (no ALTER TABLE, no new database). I can operate on the code itself, on the filesystem, or via a proxy. By "optimizing" I mean: reduce the requests that reach the webapp (as opposed to those served directly from the filesystem), keep DB queries to a minimum, and reduce the number of distinct URLs called (keeping caching in mind).
Let me try to construct a fictitious example, just to provide something to talk about. Imagine this scenario:
- I have a PHP webapp which exposes a database of a million different people. Each person decided at some point whether they are happy or sad.
- When I visit `person.php?id=x` (x = 1, ..., 1000000), the page creates a link to `show_picture_of_person.php?id=x`.
- `show_picture_of_person.php` goes into the million-row DB, and the DB tells me whether the person is sad or happy by returning an image. I don't know what this image is unless I extract it from the DB; once extracted, I can analyze it and tell whether it shows a sad face or a happy face. The script `show_picture_of_person.php` actually outputs an image, and the DB stores the image itself in a BLOB. Images are always either sad.jpg or happy.jpg.

What I would like to have, instead of a million links to `show_picture_of_person.php?id=x`, is two links: one for sad.jpg and one for happy.jpg. Possible solutions in my mind:
- I write a script which calls all the possible combinations of `show_picture_of_person.php`, saves all the images, works out which ones are identical, and then writes a lookup table. I put that lookup table into a PHP function, `make_sensible_url("show_picture_of_person.php?id=x") -> happy.jpg`, which is called from the `person.php` script. I am worried about the performance of the PHP engine itself here: an array like that would by itself be a 50+ MB file! (A sketch of this follows the list.)
- Same as above, but instead of constructing an array in PHP, I create a filesystem of a million text files; inside each text file is the name of the actual static image file (avoiding repetitions). The function `make_sensible_url("show_picture_of_person.php?id=x")` simply reads and outputs the content of the file. I like this approach: no DB calls, and reading from the fs should be faster than the huge PHP array of solution 1. (Also sketched below.)
- I change `person.php` so that there is no link to `show_picture_of_person.php`; instead I inline `data:` images. The issue with this is that, for x visits to `person.php`, I still make 2x DB queries (one for the person data and one to extract the image blob), and I would like only x queries. It also increases the size of the page (in my real case I have ~20 images on one page, so lots of queries and lots of bytes). (Sketched last, below.)
- I don't know what else...
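To make solution 1 concrete, here is a minimal sketch of the offline part; the base URL, the output paths, and the file name `lookup.php` are placeholders of my own, not part of the real app:

```php
<?php
// One-off crawler, run offline from the CLI (never under web load):
// fetch every id, fingerprint the returned bytes, save each distinct
// image once as a static file, and emit the lookup table as PHP code.

$map  = [];   // person id  => canonical static file name
$seen = [];   // md5(bytes) => canonical static file name

for ($id = 1; $id <= 1000000; $id++) {
    $bytes = file_get_contents("http://localhost/show_picture_of_person.php?id=$id");
    $hash  = md5($bytes);
    if (!isset($seen[$hash])) {
        // First time this exact image appears: store it once on disk.
        $name = "img_$hash.jpg";                 // ends up as just 2 files here
        file_put_contents("/var/www/static/$name", $bytes);
        $seen[$hash] = $name;
    }
    $map[$id] = $seen[$hash];
}

// Write the table out as an includable PHP file.
file_put_contents("lookup.php", "<?php\nreturn " . var_export($map, true) . ";\n");
```

and the request-time side, called from `person.php`:

```php
<?php
function make_sensible_url($id) {
    static $map = null;
    if ($map === null) {
        $map = require __DIR__ . "/lookup.php";   // loaded once per request
    }
    return "/static/" . $map[$id];
}
```

Even with an opcode cache, which compiles the include only once, a million-entry PHP array typically costs more RAM in memory than the raw file size suggests, which is exactly the performance worry above.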
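For solution 2, a sketch along these lines; the `map/` directory layout and the two-digit sharding are my own assumption, added so that no single directory has to hold a million entries:

```php
<?php
// Filesystem variant: one tiny text file per id, generated offline by the
// same crawler, each containing just the canonical image name. Sharding by
// the last two digits keeps every directory to ~10,000 entries.
// Directories would be created once by the offline script.

function map_path($id) {
    $shard = substr(str_pad((string)$id, 2, "0", STR_PAD_LEFT), -2); // "00".."99"
    return "/var/www/map/$shard/$id.txt";
}

// Offline, while crawling:
//   file_put_contents(map_path($id), $canonicalName);

// At request time, in person.php:
function make_sensible_url($id) {
    return "/static/" . trim(file_get_contents(map_path($id)));
}
```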
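And solution 3 would look roughly like this; the table and column names are invented, since I don't know what `show_picture_of_person.php` really queries:

```php
<?php
// data: URI variant: person.php inlines the image bytes instead of
// emitting a link to show_picture_of_person.php.

function inline_picture(PDO $db, $id) {
    $stmt = $db->prepare("SELECT picture FROM people WHERE id = ?");
    $stmt->execute([$id]);
    $blob = $stmt->fetchColumn();

    // Second DB round-trip per person, and base64 inflates the payload
    // by roughly a third -- the two drawbacks noted above.
    return '<img src="data:image/jpeg;base64,' . base64_encode($blob) . '">';
}
```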
How would you solve this? Thanks!
For completeness' sake, here was the original question:
This is the scenario:
- a database with various tables whose data is not properly indexed (let's say, for argument's sake, that we have 5,000 unique objects represented in around 50,000 rows, so duplicates are present)
- we are in a situation in which the database is not modifiable (this also means that we can't introduce another table)
- we have a PHP app exposing these objects
- there exist around 1 million PHP calls (all legitimate), each returning one of those 5,000 objects (e.g.: bad_url.php?id=bar, bad_url.php?id=foo, ...)
- there is no easy way to programmatically decide which of the 5,000 objects will be returned

Our goal is to somehow convert the million+ calls into calls of the form giveme.php?id=x, where x is one of the 5,000 unique objects.
Just to give you an idea of a theoretical approach:
- we could index all the million calls and map each one to the distinct object it returns
- we could create a hash table or something similar, plus a PHP function which would work as give_me_nice_url("bad_url.php?....")
- my feeling is that such a solution would result in a 50-100 MB array; I'm not sure how performant that would be running in real time under load

My question is: which approach would you use to solve this issue and handle the large data set? Is there a better way than a lookup table like the one in my solution? Remember that I can't use a database in the final production setup.