Copyright © 2002 -
MCS Investments, Inc. sitemap

PHP Search Web Page for Anything—VIEW SOURCE REMOTELY

This page searches any web page on any website, one page at a time, so you can VIEW SOURCE REMOTELY. How is that different from going to the page and pressing Ctrl F? With Ctrl F you have to surf to the page first, and you still can't find the on-page CSS styling, JavaScript, linked file names, metatag description, metatag keywords, DOCTYPE, or HTML tags with their attributes, classes, ids, and styles. For those you'd have to View Source after loading the page, but this PHP page-searching app gets all of that data remotely. Of course you can search for page text with our app as well. It also finds content from includes that load in more HTML code, such as PHP includes (our favorite), because it sees the page as the server delivers it. Search for weird code like:
echo '<div class="pw"><table class="ts"><tr><
or <a HREF="http://validator.w3.org/">HTML validator</a> with our app, entering either tags or their html entity representations in your search text. If the exact code you enter is found (upper and lower case are ignored), you'll get results. Approximations are not found.

The pro of using this app is that, as far as we know, it's currently the only one on the Net to feature remote, full-flavored, complete text and code searches. The con is that its searches are good but not perfect where html entities are involved. The app's instructions spell it out: search for anything that's typeable, but if you search for text that has html entities in it, the highlighted results will be shown without the entities. For example: &lt;/span&gt; will return these results: </span>. And "%20" will only find " " but not "%20". You'll see why this was unavoidable when you read on: for Internet safety, we had to turn the web page display into html entities like &lt;, and the search term is URL-decoded, which turns %20 into a space. To clarify, the app will successfully find &lt;/span&gt; when you type in &lt;/span&gt;, but it will display it as a light-blue-highlighted </span> rather than as &lt;/span&gt;. To be fair, we've been programming for decades and have done thousands of searches, and 99.9% of the time we don't type in html entities to search for. Instead we look for function names, links, file names, image names, keywords, anchor or link text, titles, or code like
$content = file_get_contents($g.$a[$j],0,$context);
Besides, if you're a PHP-loving programmer, you'll look at the slight quirk in our app as a challenge: can YOU make a routine that finds anything and displays pages safely (like the script below) but has no html entity quirks in its results display?

Let's look at the script code. We start with the JavaScript function convert(), which the form's onsubmit event runs. It calls encodeURIComponent() on the search field before the form is POSTed to the PHP script. (As written, convert() never stores the value that encodeURIComponent() returns, so the search term is actually submitted unchanged, and the URL field is never touched at all.) Next, in the PHP section, we start with error_reporting(E_ERROR); so only fatal errors that can't be recovered from are reported. Next we read the POSTed form data and use rawurldecode() on the search term but not on the page URL. We left the URL undecoded as an experiment, and it made no difference: the URL data had no need of decoding, and the encoding was superfluous as well, since the browser already encodes form fields when it submits them and PHP decodes them before filling $_POST. On http://php.net/manual/en/function.rawurldecode.php, we learn that: "Please note that the combination encodeURIComponent (Javascript) and rawurldecode (PHP) only works well if magic quotes are turned off in php.ini (magic_quotes_gpc = Off)". We checked. Ours is set to On by our host, and this directive, listed in the PHP Core section of the phpinfo() display, can only be changed at the system level, so we merely sighed and went about our business. (Magic quotes were removed in PHP 5.4, so on newer servers the directive won't appear at all.) To check your Magic Quotes directive, run this PHP file from your public_html folder:

<?php
phpinfo();
?>
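
If you'd rather not depend on how the host has set magic quotes, the decoding step can check the setting at run time. The following is only a minimal sketch: read_search_term() is a hypothetical helper name, not part of our script. It relies on ini_get() returning false for a directive that doesn't exist, which is what happens on PHP versions without magic quotes.

```php
<?php
// Hypothetical helper (not in the original script): read a posted search term,
// stripping slashes only when magic quotes actually added them.
function read_search_term(array $post, $key = 'search')
{
    if (!isset($post[$key])) {
        return '';
    }
    $value = (string) $post[$key];
    // ini_get() returns false for a directive that does not exist and an
    // empty string or "0" when the directive is Off, so this test is only
    // true when magic_quotes_gpc is really On.
    if (ini_get('magic_quotes_gpc')) {
        $value = stripslashes($value);   // undo the added backslashes first
    }
    return rawurldecode($value);         // then decode %XX sequences
}

echo read_search_term(array('search' => 'file%20name')), "\n"; // prints "file name"
```

Stripping slashes before decoding means a backslash that arrives encoded as %5C survives, which it would not if the two calls were swapped.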

Continuing with the code, we check whether the page URL has been entered, and if not, we echo the form, which has text input fields for the page URL and the search term or phrase. Next we check the input: if the URL doesn't end in htm, html, or php, the user sees "Enter URL with htm, html, or php extension" and the page reloads. We dump tags from the URL, trim outside spaces, and change inside spaces to %20 so PHP doesn't complain about the file name. Then we use the PHP function file_get_contents() to fetch all of the web page's content from its source on the Net. If the search term is shorter than three characters or is just "the", we tell the user "Enter longer search terms." and reload the page.
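
The URL clean-up described above can be gathered into one function. This is a sketch under assumptions, not the script's own code: clean_page_url() is a hypothetical name, the http/https requirement is our addition (the script itself doesn't check the scheme), and the extension test looks at the URL's path rather than the last four characters of the whole string.

```php
<?php
// Hypothetical helper: returns a cleaned URL, or null when it is not acceptable.
function clean_page_url($raw)
{
    $url = trim(strip_tags((string) $raw));   // dump tags, then outside spaces
    $url = str_replace(' ', '%20', $url);     // inside spaces become %20

    // Assumption (not in the original script): fetch only http and https pages,
    // so a local file path can never reach file_get_contents().
    if (!preg_match('#^https?://#i', $url)) {
        return null;
    }
    // Accept only paths ending in .htm, .html, or .php (case-insensitive).
    $path = (string) parse_url($url, PHP_URL_PATH);
    if (!preg_match('/\.(html?|php)$/i', $path)) {
        return null;
    }
    return $url;
}

var_dump(clean_page_url('  http://example.com/my page.html '));
var_dump(clean_page_url('/etc/passwd'));
```

A caller would show the "Enter URL with htm, html, or php extension" message whenever the function returns null, and pass the returned string to file_get_contents() otherwise.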

We use the stripslashes() function to remove backslashes from the search term and, a little later, from the web page content. Then we convert any > or < characters in the search term to html entities, but save the result in $R rather than in $S, where the search term is. After stripping slashes from the page content, we use the PHP function str_ireplace(), which does a case-insensitive search and replace in a string. Here it wraps the search term in a span tag whose CSS style puts a light blue background around the term everywhere it is found in the page content string, and its optional count parameter counts the replacements. But we can't display the page content yet, since it's still full of tags. The reason we search for $S but replace with $R is that str_ireplace() is happy to search for > and <, but these characters won't display unless they're html entities. The browser tries to do tag things if you give it tag characters, but has no problem showing an html entity. It should be obvious why the PHP strip_tags() function is the last thing on Earth to use on the page content string or the search string. We want to view this stuff, not kill it.
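
Here is that first highlighting pass on its own, as a small sketch. The page content in $t is a made-up sample, and $open is just a convenience variable; the variable names $S, $R, $t, and $count follow the script.

```php
<?php
$S = '</span>';                          // what the user typed
$t = "<p>one</span> two</SPAN></p>";      // raw page source (made-up sample)

// $R is the display form of the term: only < and > become entities.
$R = str_replace(array('<', '>'), array('&lt;', '&gt;'), $S);

// Case-insensitive search for $S, replaced by the highlighted $R.
// The fourth argument receives the number of replacements made.
$open = '<span style="background-color:lightblue;">';
$t = str_ireplace($S, $open . $R . '</span>', $t, $count);

echo $count, "\n";   // prints 2: both </span> and </SPAN> matched
```

Note that every match is replaced by $R, so the highlighted text shows the term as the user typed it, not in the upper or lower case the page used.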

Now we run the htmlentities() function on the highlighted search string and on the page content to make them safe to display. Next we replace the fully html-entitied highlighted search string with the partially html-entitied one, in which only the user-entered search string has been entitied. This is the key to making the whole script find all findables and highlight them correctly. It cures the huge problem we get when we run htmlentities() on the page content in the $t string: the highlight spans in $t all get ruined, because their tags are turned into entities too. But when we use str_ireplace($x, $z, $t) to replace every ruined version of the highlighted search string ($x) in $t with the unruined version ($z), which displays our highlight correctly, we cure this merciless ruination. Finally, we replace all new line characters with <BR> tags. This is what makes the display look like good source code rather than an all-jammed-together mess.

So why not wait until the htmlentities() function has fixed up the page content and then search and replace on an entitied version of the search string to highlight? Tried that. As the Flashpoint TV show people say for anything not found or not working: "no joy." One has to leave the page string raw until the search and replace is done. Then converting the search string and the page string to entities in parallel, followed by str_ireplace($x, $z, $t), seems to be the perfect solution. With such liberal search term entry, any other order of function calls gave us many more display issues as well as "not found" issues, so we settled for a very effective View Source app with a tiny quirk in displaying, but not in finding, search strings. Comments?
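
To see the whole sequence in one place, the steps can be wrapped in a function. This is only a sketch: highlight_source() is a hypothetical name and the sample input is made up, but the calls and their order are the ones described above.

```php
<?php
// Hypothetical wrapper around the highlight, entity, and restore steps.
function highlight_source($t, $S, &$count = 0)
{
    $open = '<span style="background-color:lightblue;">';
    $R = str_replace(array('<', '>'), array('&lt;', '&gt;'), $S);
    $z = $open . $R . '</span>';             // the version we want displayed

    $t = str_ireplace($S, $z, $t, $count);   // 1. highlight in the raw source
    $x = htmlentities($z, ENT_QUOTES);       // 2. what step 3 will turn $z into
    $t = htmlentities($t, ENT_QUOTES);       // 3. make the whole page displayable
    $t = str_ireplace($x, $z, $t);           // 4. put the real span tags back
    return str_replace("\n", '<BR>', $t);    // 5. keep the line breaks visible
}

echo highlight_source("<b>Hi</b>\n<i>there</i>", '<b>', $n), "\n";
```

After step 3 every tag in the page, including our own span, is plain text. Step 4 works because $x is exactly what htmlentities() made of $z, so each ruined highlight can be found and swapped back.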

The str_ireplace() function has an optional count parameter that counts the replacements. We use this and put the result in the $count variable. If $count is nonzero (there were replacements), the page URL is displayed as a link to that page, followed by the number of matches, then the entire View Source-type code display of the whole page fills the browser screen, followed by another display of the page URL. If $count is zero (no replacements), the user sees "Term was not found on this site.", then the page reloads.

Summary: The reason for any replacements during the page search is to highlight the search terms that are found on the page, using the span tag and background color technique.
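
For readers who want to take up the challenge of avoiding the html entity quirk, one possible direction is to split the raw page on the search term, convert each piece to entities separately, and wrap only the matched pieces in the span. This is a hedged sketch of that idea, not the method our script uses: highlight_by_split() is a hypothetical name, it uses htmlspecialchars() and nl2br() where the script uses htmlentities() and <BR>, and it assumes the search term is not empty (the script already rejects terms shorter than three characters).

```php
<?php
// Hypothetical alternative: split the raw source on the term, escape each piece
// on its own, and wrap only the matched pieces in the highlight span.
function highlight_by_split($t, $S, &$count = 0)
{
    // preg_quote() makes the term literal; the capture group keeps the matches.
    $pattern = '/(' . preg_quote($S, '/') . ')/i';
    $parts = preg_split($pattern, $t, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    $count = 0;
    foreach ($parts as $i => $part) {
        $safe = htmlspecialchars($part, ENT_QUOTES);
        if ($i % 2 === 1) {   // odd indexes hold the matched text
            $out .= '<span style="background-color:lightblue;">' . $safe . '</span>';
            $count++;
        } else {
            $out .= $safe;
        }
    }
    return nl2br($out);       // adds a <br /> before each new line
}

echo highlight_by_split('a &lt;/span&gt; b </span>', '&lt;/span&gt;', $n), "\n";
```

Because the matched text itself is escaped, a search for &lt;/span&gt; is displayed as &lt;/span&gt;, and the highlight keeps the page's own upper and lower case.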

On to the code for the script. Save the following as search-web-page.php


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<TITLE>Search Web Page</TITLE>
<meta name="description" content="Search Web Page">
<meta name="keywords" content="Search Web Page,Search Webpage URLs,Search Page,search site page,search website page,php,CMS,javascript, dhtml, DHTML">
<style type="text/css">
BODY {margin-left:0; margin-right:0; margin-top:0;text-align:left;background-color:#ddd}
h1 {font:bold 28px Verdana; color:black;text-align:center}
td {font:normal 13px Verdana;text-align:left;background-color:#ccc}
.textbox {position:absolute;top:50px;left:190px;width:772px;}
.info {position:absolute;top:0px;left:2px;width:150px;background-color:#bbb;border:1px solid blue;padding:5px}
.ts {background-color:#8aa;border:6px solid blue;padding:6px}
.pw {position:absolute;top:150px;left:185px;width:820px;text-align:center}
</style>
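<script language="javascript">
// Runs when the form is submitted. Note: encodeURIComponent() returns the
// encoded string, and that return value is not stored here, so the field is
// submitted unchanged. The "%20" behavior described in the info box depends on this.
function convert(){
var w=document.formurl.search.value;
encodeURIComponent(w);
document.formurl.search.value=w;}
</script>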
</head>
<body>
<center><h1>Search Web Page</h1></center>
<?php
error_reporting(E_ERROR);

$f=isset($_POST['page']) ? $_POST['page'] : null;
$S=isset($_POST['search']) ? rawurldecode($_POST['search']) : '';
if (!isset($f)){
echo '<div class="pw"><table class="ts"><tr><td style="text-align:center"><form id="formurl" name="formurl" method="post" action="search-web-page.php" onsubmit="convert()"><b>web page URL (must end in php, htm, or html)</b><BR><label for="page">URL: <input type="text" id="page" name="page" size="66" maxlength="99" value=""></label><br><br><b>Search word/phrase (only exact word or phrase will be searched for)</b><BR><label for="search">Search: <input type="text" id="search" name="search" size="66" maxlength="99" value=""></label><br><br><input type="submit" value="Submit URL"><br><br><input type="reset" value="Reset"></form></td></tr></table></div>';

}else{

// Clean the URL first: dump tags, trim outside spaces, then encode inside spaces.
$f=trim(strip_tags($f));
$f = str_replace(" ", "%20", $f);

// Stop here if the extension is wrong, so the bad URL is never fetched.
if (substr($f,-4)<>".htm" && substr($f,-4)<>"html" && substr($f,-4)<>".php"){echo '<script language="javascript">alert("Enter URL with htm, html, or php extension.\n\nPress a key to submit another URL.");window.location="search-web-page.php"; </script>';exit;}

$t = file_get_contents($f);

echo "<div class='textbox'>";
echo '<table width="772" border="1"><tr><td>';

if (strlen($S)<3 || $S=="the" || $S=="The" || $S=="THE") {echo '<script language="javascript">alert("Enter longer search terms.");window.location="search-web-page.php"; </script>';

}else{

$S=stripslashes($S);
$R = str_ireplace("<", "&lt;", $S);
$R = str_ireplace(">", "&gt;", $R);
$t=stripslashes($t);
$t = str_ireplace($S, '<span style="background-color:lightblue;">'.$R.'</span>', $t, $count);
$z='<span style="background-color:lightblue;">'.$R.'</span>';
$x=htmlentities($z, ENT_QUOTES);
$t=htmlentities($t, ENT_QUOTES);
$t = str_ireplace($x, $z, $t);
$t = str_ireplace("\n", "<BR>", $t);

if ($count){
$link=htmlspecialchars($f, ENT_QUOTES); // escape the user-entered URL before echoing it
echo "<a target='_blank' HREF='".$link."'>".$link."</a><BR>";
echo "<BR><b>There were ".$count." matches.</b><BR><BR>";
echo $t."<BR><I><span style='color:green;background-color:#ddd'>".$link."</span></I>";
echo "<br><br></td></tr></table>";

}else{

echo '<script language="javascript">alert("Term was not found on this site.");window.location="search-web-page.php";</script>';}

}}
?>

</div>

<div id='info' class='info'>Search for anything that's typeable, but if you search for text that has html entities in it, the highlighted results will be shown without the entities. For example: &amp;lt;/span&amp;gt; will return these results: &lt;/span&gt;. And "%20" will only find " " but not "%20". <BR><A HREF="javascript:history.go(-1)">GO BACK</A> </div>
</body>
</html>