Altering a page from another website

Sorry for the vague question title - I didn't know how to phrase it.

I have built a PHP engine to parse web pages and extract phone numbers, addresses etc.

This is going to be used by clients to populate an address book by simply entering a new contact's web address.

The problem I am having is usability:

At the moment the script just adds each item (landline number, fax etc.) to a different list box and the user picks the correct one. From a usability standpoint this is hard work (how do you know which is the correct contact number without looking at the site?).

So my question (finally!):

How would I achieve the functionality of

http://bartaz.github.io/sandbox.js/jquery.highlight.html

on someone else's website? (I have no problem writing this functionality myself.)

FOR CLARITY: I want to show someone else's site (their contact page, for example) on my site, BUT I want to highlight the items I have found (so, for example, add a tag around a phone number my PHP script has found).

I am aware that an iframe would normally be used to display a website that is not on your domain - but as I need to alter the page content, this is useless.

I also contemplated writing a bookmarklet that could be run on that page - but that means rewriting my parsing engine in JavaScript and exposing some of the tricks I use to make it accurate.

So I am left with pulling the page with cURL and then trying to fix up the JavaScript files, CSS files etc. that have relative URLs.

Does anyone know how best to achieve this - and what pitfalls might befall me?

I have tried using Simple HTML DOM Parser - but it is tricky to get consistent results, and I also don't know how having two sets of html tags, body tags etc. would affect sites.

If anyone has managed this before and could point me to the tools / general methods they used I would be eternally grateful!

PLEASE NOTE - I am very proficient with Google and Stack Overflow and have looked there first!

dtt78245 2014-01-04 16:44
The ideal HTML solution

The easiest way to work around the relative paths for an arbitrary site would be to use the <base> tag to specify the default location against which relative URLs are resolved. Just use the URL up to the filename: for the URL http://www.example.com/path/to/page, that would be <base href="http://www.example.com/path/to/" />. This should go at the top of the head block.
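As a minimal sketch of that step (written in Python for brevity, although the asker's engine is PHP, where the same approach applies), the snippet below derives the base URL and inserts the tag. The function names `base_href_for` and `insert_base_tag` are hypothetical, and the regex assumes the page actually has a <head> tag.

```python
import html
import re
from urllib.parse import urljoin


def base_href_for(page_url: str) -> str:
    """Return the page URL with its final path segment (the 'filename') removed."""
    # Resolving "." against the page URL yields the directory that contains it.
    return urljoin(page_url, ".")


def insert_base_tag(page_html: str, page_url: str) -> str:
    """Insert a <base> tag directly after the first opening <head> tag, if any."""
    href = html.escape(base_href_for(page_url), quote=True)
    base_tag = f'<base href="{href}" />'
    # \b stops the pattern from matching <header>; count=1 touches only the first hit.
    return re.sub(
        r"(<head\b[^>]*>)",
        lambda m: m.group(1) + base_tag,
        page_html,
        count=1,
        flags=re.IGNORECASE,
    )
```

If the fetched page already contains its own <base> tag, the first one in the document wins, so a real implementation would probably want to check for that before inserting another.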

Then you can alter the site simply by finding the relevant parts (the phone numbers, addresses etc. that your parser has found) and wrapping them in your own tag, such as a span. For the formatting of these tags, the easiest way would be to add a style attribute, but you could also try to insert a <style> tag in the <head>.
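The wrapping step could look roughly like the sketch below (Python for brevity; the asker's parser is PHP). It assumes the parser hands back the found items as plain strings exactly as they appear in the page text; `highlight_matches` and `HIGHLIGHT_STYLE` are hypothetical names. To avoid corrupting attributes, it only touches the text between tags.

```python
import re

HIGHLIGHT_STYLE = "background: yellow;"  # assumed styling, change to taste


def highlight_matches(page_html: str, found_items: list[str]) -> str:
    """Wrap each found item in a styled <span>, touching only text between tags."""
    # Longest first, so a longer item wins over a shorter one contained in it.
    alternatives = sorted(
        (re.escape(item) for item in found_items if item), key=len, reverse=True
    )
    if not alternatives:
        return page_html
    pattern = re.compile("|".join(alternatives))
    # Splitting on a capturing group keeps the tags in the list at the odd
    # indexes; the even indexes hold the text between tags.
    parts = re.split(r"(<[^>]*>)", page_html)
    for i in range(0, len(parts), 2):
        parts[i] = pattern.sub(
            lambda m: f'<span style="{HIGHLIGHT_STYLE}">{m.group(0)}</span>',
            parts[i],
        )
    return "".join(parts)
```

This is deliberately naive: it will also wrap matches inside <script> or <style> content, and it misses an item whose text is split across tags (for example a number with part of it in <b>). A DOM-based approach would handle those cases more reliably.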

Of course, you'll also need to account for badly made webpages without <html>, <head> or <body> tags. You could either wrap the source in a new set of these tags, or just put in your base and style tags, hoping that the browser will work out what to do.
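One way to handle those cases is sketched below (Python for brevity; `inject_into_head` is a hypothetical helper, and the regex checks are an assumption about what counts as "has a tag"). It inserts a snippet such as the base and style tags into an existing <head>, creates a <head> if only <html> or <body> exists, and wraps the source in a fresh set of tags only when none of them are present, so it never nests body tags.

```python
import re


def inject_into_head(page_html: str, snippet: str) -> str:
    """Place snippet inside <head>, creating the missing tags where needed."""
    head = re.search(r"<head\b[^>]*>", page_html, re.IGNORECASE)
    if head:
        # Normal case: insert right after the opening <head> tag.
        return page_html[: head.end()] + snippet + page_html[head.end():]

    html_open = re.search(r"<html\b[^>]*>", page_html, re.IGNORECASE)
    if html_open:
        # <html> but no <head>: add a head block right after <html>.
        end = html_open.end()
        return page_html[:end] + "<head>" + snippet + "</head>" + page_html[end:]

    if re.search(r"<body\b", page_html, re.IGNORECASE):
        # <body> only: prepend a head block and let the browser sort it out.
        return "<head>" + snippet + "</head>" + page_html

    # Bare fragment: wrap it in a minimal document.
    return "<html><head>" + snippet + "</head><body>" + page_html + "</body></html>"
```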

You probably also want to make this interactive, so you should wrap the highlighted items in some kind of link, and ideally insert some JavaScript to handle clicks on them via AJAX. You should also insert your own header at the top of the page, probably floating, so that users know they're using your tool. Just keep in mind that some advanced pages might conflict with your alterations (though for those cases you could have a link saying 'Is this page not displaying correctly?' that takes the user to your original basic list-box page as a backup).

The more robust solution

Clearly there are a lot of potential problems with the above, even though it is ideal. If you want to ensure robustness and avoid any problems with custom JavaScript and CSS on the page you're trying to alter, you could instead use an algorithm similar to that used in text-based browsers such as Lynx to reformat the page consistently. Then you can apply your algorithm to highlight the relevant parts of the page, and apply your own formatting as well, without the risk of it not displaying correctly. This way you can frame it really well and maintain your interface.

The problem with this is that you lose the actual look of the original page, but you should keep the context around the numbers and addresses, which is the important thing. You would also then be able to use some dynamic JavaScript to take the user to each number and address consecutively to improve the user experience. Basically, this is rigorous and gives you complete control over the user experience, but you lose the original look of the website, which may or may not confuse your users.

Personally, I'd go for the second option, but I'm not sure if anyone's created such a parser before. If not, the simplest thing you could do would be to strip the tags to get the page as plain text. The next simplest would be to convert it into some simple text markup format like Markdown, then convert it back into HTML. That way, you'd keep some basic layout such as headings, italicised and bold text, etc.
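The simplest variant, stripping the tags to plain text, might be sketched as follows using Python's standard `html.parser` module (Python for brevity; the asker's engine is PHP). The `TextExtractor` class, the `html_to_text` function and the choice of which tags count as block-level are assumptions for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, dropping script/style content."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")  # block elements start a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(page_html: str) -> str:
    """Return the page's text, one non-empty line per block, whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(page_html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)
```

The result is plain text, so it would need HTML-escaping before being placed back into your own page, after which the highlighting can be applied without interference from the original markup.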

You definitely don't want to have nested body tags. It might work, but it'll probably mess up your formatting and be inconsistent across browsers.

Here's a resource I found after a quick Google search:

https://github.com/nickcernis/html-to-markdown

There are other HTML-to-Markdown scripts, but this was the most robust of the few I found. I'm still not sure whether it can handle badly formatted pages or ones with advanced formatting, so try it out yourself.

There are quite a few Markdown-to-HTML converters, though; in fact, you could probably write a custom converter yourself quite easily to accommodate your own needs.
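To give a sense of how small such a custom converter can be, here is a minimal sketch (Python for brevity; `markdown_to_html` and `inline_markup` are hypothetical names). It assumes only a tiny subset of Markdown is needed: `#` headings, `**bold**`, `*italic*`, and paragraphs separated by blank lines. It is not a full Markdown implementation.

```python
import html
import re


def inline_markup(text: str) -> str:
    """Escape HTML, then convert **bold** and *italic* spans."""
    text = html.escape(text, quote=False)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)  # bold first
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def markdown_to_html(markdown: str) -> str:
    """Convert headings and paragraphs; blocks are separated by blank lines."""
    out = []
    for block in re.split(r"\n\s*\n", markdown.strip()):
        # A heading is a single line of 1-6 '#' characters followed by text.
        heading = re.fullmatch(r"(#{1,6})[ \t]+(.+)", block)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{inline_markup(heading.group(2))}</h{level}>")
        else:
            # Join wrapped lines of a paragraph with single spaces.
            out.append(f"<p>{inline_markup(' '.join(block.split()))}</p>")
    return "\n".join(out)
```

Because the converter emits only the handful of tags it knows about, the output is predictable and can be styled and highlighted freely.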

This answer was selected by the asker as the best answer.
