Regular Expression to Get All Image URLs From a Page

While working on a PHP project, have you ever wanted an array of all the images used in a chunk of HTML? I’ve been planning a web app over the past couple of months that will do a bit of RSS parsing, and I thought it would be nice to do just that when it came time to start coding. Suppose you were going to show a summary of an article from a feed, with a link to the original source. Wouldn’t it look better if you pulled an image from that article, scaled it down if necessary, and displayed it next to the summary? (Caching it, of course. Hotlinking == bad.)

I was reading an article from Cats Who Code and, lo and behold, there was a code snippet that did just that with a regular expression. (I decided to file it away to save time in the future.)

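    // $data holds the HTML to scan (for example, the body of a feed item).
    $images = array();

    // Grab every img= or src= attribute, up to its closing quote.
    preg_match_all('/(img|src)=("|\')[^"\'>]+/i', $data, $media);
    unset($data);

    // Strip the attribute name and opening quote, leaving just the URL.
    $data = preg_replace('/(img|src)("|\'|="|=\')(.*)/i', '$3', $media[0]);

    // Keep only the URLs whose file extension looks like an image.
    foreach ($data as $url)
    {
        $info = pathinfo($url);
        if (isset($info['extension']) &&
            in_array(strtolower($info['extension']), array('jpg', 'jpeg', 'gif', 'png')))
        {
            array_push($images, $url);
        }
    }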

Source: 15 PHP regular expressions for web developers
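
If you want to try the snippet on a piece of markup, it helps to wrap it in a function. What follows is only a rough sketch of how I might do that, not part of the Cats Who Code original: the function name extract_image_urls and the sample HTML are made up for illustration.

```php
<?php
// Hypothetical helper: the name is an assumption, not from the original snippet.
// Returns the image URLs found in img=/src= attributes of an HTML string.
function extract_image_urls($html)
{
    $images = array();

    // Match each attribute up to (but not including) its closing quote.
    preg_match_all('/(img|src)=("|\')[^"\'>]+/i', $html, $media);

    // Drop the attribute name and opening quote so only the URL remains.
    $urls = preg_replace('/(img|src)("|\'|="|=\')(.*)/i', '$3', $media[0]);

    foreach ($urls as $url) {
        $info = pathinfo($url);
        // Compare in lowercase so that .PNG and .png are treated alike.
        if (isset($info['extension'])
            && in_array(strtolower($info['extension']), array('jpg', 'jpeg', 'gif', 'png'))) {
            $images[] = $url;
        }
    }

    return $images;
}

// Made-up sample markup: an image, a script, and a single-quoted image.
$html = '<p><img src="http://example.com/photo.jpg" alt=""></p>'
      . '<script src="http://example.com/app.js"></script>'
      . "<img src='/thumbs/cat.PNG'>";

print_r(extract_image_urls($html));
```

With the sample markup, the array holds the two image URLs and skips the script. One caveat: a URL with a query string, such as photo.jpg?w=100, gets skipped, because pathinfo() reports its extension as jpg?w=100.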

Feed parsing is an excellent use for this, since you have just the article, without the layout-related imagery you would pick up if you were screen-scraping a web page for image URLs. I imagine Digg takes the latter route, though, when they dig up (Freudian pun unintended, honest) the thumbnails that go along with their news links.
