Let's say I found a website with the following markup:
<body>
<div id="paper">
<div id="contentwrapper">
<div id="rightcontent">
<h1>1967-002A</h1>
<p>
<strong>NSSDCA/COSPAR ID:</strong> 1967-002A</p>
<div class="twocol">
<div class="urone">
<h2>Description</h2>
<p>
This US Air Force photo surveillance satellite was launched from Vandenberg AFB aboard a Thor Agena D rocket. It was a KH-4A (Key Hole-4A) type satellite. The satellite had fair image quality.
</p>
</div>
<div class="urtwo">
<h2>Alternate Names</h2>
<ul>
<li>02642</li>
</ul>
<h2>Facts in Brief</h2>
<p>
<strong>Launch Date:</strong> 1967-01-14
<br/>
<strong>Launch Vehicle:</strong> Thor
<br/>
<strong>Launch Site:</strong> Vandenberg AFB, United States
<br/>
<strong>Mass:</strong> 1500.0 kg
<br/>
</p>
<h2>Funding Agency</h2>
<ul>
<li>Department of Defense-Department of the Air Force (United States)</li>
</ul>
<h2>Discipline</h2>
<ul>
<li>Surveillance and Other Military</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</body>
This page contains information such as the Description, Launch Date, Launch Vehicle, Launch Site, Mass, Funding Agency and Discipline. These could all be columns in a MySQL database.
The webpage has a URL of /spacecraftDisplay.do?id=1967-002A. I already have a database containing 1967-002A, the spacecraft identifier. So my plan is to take each identifier from my database, request the URL with the same identifier, and save the data from that page. Every page has the same structure.
I already know how to save data from an external API that returns JSON, using Guzzle. Here, instead of JSON, we are dealing with the HTML of an external website.
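To make the idea concrete, here is a rough sketch of just the parsing step. It is written in Python with only the standard library, purely for illustration; it is not Guzzle code. `FactsParser` and `SAMPLE` are names I made up, and `SAMPLE` is a trimmed copy of the markup above rather than a live response. It assumes every page keeps the `<strong>Label:</strong> value` pattern shown above.

```python
from html.parser import HTMLParser


class FactsParser(HTMLParser):
    """Collects '<strong>Label:</strong> value' pairs into a dict (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.facts = {}
        self._in_strong = False
        self._label = None  # label still waiting for its value text

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._in_strong = True
            self._label = None

    def handle_endtag(self, tag):
        if tag == "strong":
            self._in_strong = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        if self._in_strong:
            # Text inside <strong> is the label; drop the trailing colon.
            self._label = text.rstrip(":")
        elif self._label is not None:
            # First non-empty text after </strong> is the value.
            self.facts[self._label] = text
            self._label = None


# Trimmed copy of the markup above, used instead of a real HTTP request.
SAMPLE = """
<p><strong>NSSDCA/COSPAR ID:</strong> 1967-002A</p>
<h2>Facts in Brief</h2>
<p>
<strong>Launch Date:</strong> 1967-01-14
<br/>
<strong>Launch Vehicle:</strong> Thor
<br/>
<strong>Launch Site:</strong> Vandenberg AFB, United States
<br/>
<strong>Mass:</strong> 1500.0 kg
<br/>
</p>
"""

parser = FactsParser()
parser.feed(SAMPLE)
facts = parser.facts  # label -> value, one entry per future database column
print(facts)
```

Each key in `facts` would map to one column, with the identifier from my database used to build the URL and to pick the row to update. The headings followed by lists (Funding Agency, Discipline) are not covered by this sketch and would need their own handling.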
What I want to know first: is it even possible to save this data from the webpage, or are there limitations on what you can do?
</div>