I’ve been working on a neat enhancement for my Tweetable WordPress plugin. Already I have a handy “Documentation” link on the plugin’s pages in the WordPress admin. When clicked, it opens a ThickBox dialog pointing to the README.txt file.
Not bad, but it had a few rough edges. Raw markdown doesn’t look stellar, and loading a plain text file into the ThickBox produced horizontal scrollbars. So I made a new script that loads the README.txt file and uses regular expressions to turn some of the more basic markdown syntax into good old HTML.
As I write this, the changes haven’t been released to the public quite yet, as I have a few more things to finish up before putting out a new patch to the plugin, but they’re on their way.
How do you pull off something like this? It’s not too hard.
First, dump a basic HTML page wrapper into your new PHP file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Documentation</title>
</head>
<body>
</body>
</html>
Now, between the two body tags, we’ll put the beginnings of our script. We need to reference wp-load.php, so we can access a few WordPress-related functions later.
<?php require_once('../../../wp-load.php');
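One thing to watch: as far as I know, PHP resolves a path that starts with ../ against the current working directory, which is usually the script's own folder when the file is requested directly, but not always. Here is a minimal sketch of building the path from the plugin folder instead. The helper name is made up for illustration, it assumes the usual wp-content/plugins/plugin-name/ layout, and the two-argument dirname() needs PHP 7 or newer.

```php
<?php
// Hypothetical helper: builds the wp-load.php path from the plugin folder,
// assuming the usual wp-content/plugins/<plugin-name>/ layout.
function tweetable_docs_wp_load_path($pluginDir)
{
    // Three levels up: plugins/ -> wp-content/ -> WordPress root.
    return dirname($pluginDir, 3) . '/wp-load.php';
}

// Example path only; in the real script you would pass dirname(__FILE__)
// and hand the result to require_once.
echo tweetable_docs_wp_load_path('/var/www/html/wp-content/plugins/tweetable'), "\n";
```

If your install keeps wp-content somewhere non-standard, neither this nor the relative path will find wp-load.php, so treat it as a starting point.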
Now it’s time to load the README.txt file. Once we dump the contents into a variable, we run them through a series of functions: wp_specialchars() to escape HTML special characters (so stray tags, PHP snippets, and other unpleasant things show up as text instead of being rendered), nl2br() to put a <br /> tag before each newline (which makes the text nice and readable, instead of a jumbled mess), and finally make_clickable() to turn any URLs into clickable links.
$readme = file_get_contents('readme.txt');
$readme = make_clickable(nl2br(wp_specialchars($readme)));
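To see what the first two steps do without a WordPress install handy, here is a rough sketch in plain PHP. It swaps in htmlspecialchars() for wp_specialchars(), which is only an approximation of the WordPress function, and it leaves out make_clickable() since that one only exists inside WordPress. The sample text is made up.

```php
<?php
// Sample readme text; the content is invented for illustration.
$readme = "Visit <b>my</b> site\nSecond line & more";

// htmlspecialchars() stands in for wp_specialchars() so this runs without WordPress.
$escaped = htmlspecialchars($readme, ENT_QUOTES, 'UTF-8');

// nl2br() inserts a <br /> before each newline; the newline itself stays in place.
$html = nl2br($escaped);

echo $html, "\n";
```

Escaping has to come first. If nl2br() ran before it, the <br /> tags it adds would be escaped too and show up as literal text.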
With that out of the way, we move on to actually parsing some of the markdown formatting. Let’s start with turning backticks (`) into HTML <code> and </code> tags.
$readme = preg_replace('/`(.*?)`/', '<code>\\1</code>', $readme);
It may look a bit…strange, but that line does just as advertised. The / characters signify the start and end of a regular expression, and the middle part isn’t too hard to guess at. The backticks are the markdown formatting we see wrapping a section of code (e.g. `echo $this;`). The part between the backticks, enclosed by the parentheses, means “zero or more of any character, but as few as possible,” so the match stops at the next backtick instead of running on to the last one on the line. The second argument of preg_replace() is what we replace each match with: code tags wrapped around whatever was inside the backticks (represented as \\1, a reference to the first parenthesized group).
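If the question mark in (.*?) seems like a small detail, here is a quick sketch of what happens with and without it. The sample sentence is made up, and the variable names are just for illustration.

```php
<?php
// Single quotes keep $this and $that as literal text rather than PHP variables.
$line = 'Use `echo $this;` or `print $that;` in a template.';

// Lazy (.*?) stops at the nearest closing backtick, so each snippet gets its own tag.
$lazy = preg_replace('/`(.*?)`/', '<code>\\1</code>', $line);

// Greedy (.*) runs to the last backtick on the line and swallows both snippets.
$greedy = preg_replace('/`(.*)`/', '<code>\\1</code>', $line);

echo $lazy, "\n", $greedy, "\n";
```

The lazy version wraps each snippet separately, while the greedy one produces a single code tag with stray backticks left in the middle.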
Now we do a similar thing for *italics* and **bold text**. It’s important to put the line for the boldface formatting before the one for the italics, otherwise you’ll have some Unexpected Results happening.
$readme = preg_replace('/[\040]\*\*(.*?)\*\*/', ' <strong>\\1</strong>', $readme);
$readme = preg_replace('/[\040]\*(.*?)\*/', ' <em>\\1</em>', $readme);
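This one looks like more of a mess, doesn’t it? That’s because we have to escape the asterisks with backslashes (i.e. \*), since an unescaped asterisk is a quantifier in a regular expression. The [\040], which represents a space character (040 is the octal code for a space), is added so the expression will only match instances where the first asterisk has a space in front of it. This is mainly a safety feature, so no code snippets break anything… It also means text that starts a line with an asterisk won’t be matched, and that’s why the replacement strings begin with a space: it puts back the one the pattern consumed.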
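To make the "bold before italics" point concrete, here is a small sketch that runs the same two patterns in both orders. The function names and the sample sentence are mine, purely for illustration.

```php
<?php
// Boldface first, then italics: the order used in the script.
function bold_then_italic($text)
{
    $text = preg_replace('/[\040]\*\*(.*?)\*\*/', ' <strong>\\1</strong>', $text);
    return preg_replace('/[\040]\*(.*?)\*/', ' <em>\\1</em>', $text);
}

// Italics first: the single-asterisk pattern matches the two opening asterisks
// of the bold marker as an empty italic span, and the bold pattern then finds nothing.
function italic_then_bold($text)
{
    $text = preg_replace('/[\040]\*(.*?)\*/', ' <em>\\1</em>', $text);
    return preg_replace('/[\040]\*\*(.*?)\*\*/', ' <strong>\\1</strong>', $text);
}

$sample = 'This is **bold** and *italic* text.';

echo bold_then_italic($sample), "\n";
echo italic_then_bold($sample), "\n";
```

In the wrong order you end up with an empty em tag and a pair of leftover asterisks after the word that was supposed to be bold.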
Next we handle headings, which are marked up as one to three equals signs on either side of a line of text.
$readme = preg_replace('/=== (.*?) ===/', '<h2>\\1</h2>', $readme);
$readme = preg_replace('/== (.*?) ==/', '<h3>\\1</h3>', $readme);
$readme = preg_replace('/= (.*?) =/', '<h4>\\1</h4>', $readme);
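Once again, the order of the lines matters: the three-sign pattern has to run first, or the shorter patterns will match inside the longer markers.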
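Here is a quick sketch of the heading patterns run in the right order, plus what the single-sign pattern does if it gets to a top-level heading first. The function name and the sample headings are made up for the example.

```php
<?php
// Longest marker first, so each heading level is claimed by the right pattern.
function readme_headings($text)
{
    $text = preg_replace('/=== (.*?) ===/', '<h2>\\1</h2>', $text);
    $text = preg_replace('/== (.*?) ==/', '<h3>\\1</h3>', $text);
    return preg_replace('/= (.*?) =/', '<h4>\\1</h4>', $text);
}

$sample = "=== Tweetable ===\n== Description ==\n= Notes =";
echo readme_headings($sample), "\n";

// Running the single-sign pattern first leaves stray equals signs around the tag.
$wrong = preg_replace('/= (.*?) =/', '<h4>\\1</h4>', '=== Tweetable ===');
echo $wrong, "\n";
```

After that wrong first step, the two longer patterns no longer find anything to match, so the stray equals signs stay in the output.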
Now all that needs to be done is to echo out the text and close our PHP block:
echo $readme; ?>
That wasn’t too hard, was it? It’s only the most basic markdown syntax that’s being parsed, but it’s light-years better than plain text.