JDT
John Dixon
Document Conversion and File Processing
Converting documents from one format to another is a common problem that many companies face today. At John Dixon Technology we have considerable experience in converting documents from applications such as Adobe FrameMaker to HTML, and in the subsequent use of Perl scripts to tidy up the HTML ready for publishing on a web site.

Converting documents into HTML

Many applications, such as Word and FrameMaker, offer excellent tools for converting documents into HTML and, in combination with a well-designed cascading style sheet (CSS), can often produce perfect results. However, sometimes the results are less than perfect and the HTML code becomes untidy, which means time has to be spent fixing problems. Messy HTML code is often caused by authors not applying paragraph tags or styles correctly in the source document.

For one or two documents, these errors can easily be resolved by opening the resulting HTML files and editing them directly. For hundreds or even thousands of documents, however, this manual approach is inefficient and can mean days of repetitive editing. At John Dixon Technology we can provide you with customised scripts (typically written in Perl) to "fix" the HTML code. Scripts can often be written in just a few hours, saving days of potential work.
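The batch approach described above might look something like the following minimal sketch. It is an illustration only, not John Dixon Technology's own code: the subroutine names fix_line and fix_file, the .bak backup suffix and the single example rule (deleting empty <p></p> paragraphs, a typical by-product of misapplied paragraph styles) are all hypothetical. The script reads each HTML file named on the command line, passes every line through fix_line, keeps the original file as a backup and writes the tidied version in its place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical clean-up rule: delete empty paragraphs such as "<p></p>"
# or "<P> </P>", which misapplied paragraph styles often leave behind.
sub fix_line {
    my ($line) = @_;
    $line =~ s{<p>\s*</p>}{}gi;
    return $line;
}

# Tidy one file: read all lines, rename the original to FILE.bak
# (an assumed backup convention), then write the fixed lines to FILE.
sub fix_file {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot read $file: $!";
    my @lines = <$in>;
    close $in;

    rename $file, "$file.bak" or die "Cannot back up $file: $!";

    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} fix_line($_) for @lines;
    close $out or die "Cannot close $file: $!";
}

# Process every file named on the command line (does nothing if none given).
fix_file($_) for @ARGV;
```

For example, perl fix_html.pl *.htm would tidy every .htm file in the current directory (the script name is an assumption). In a real project, fix_line would simply collect whichever substitutions the documents need.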
Customised Perl scripts can be used to make a range of modifications to your HTML documents, for example, to split files into frames. Changes to hundreds of files can be made automatically in this way, saving days of manual work.

Perl code snippets

The following Perl code snippets show how easy it is to use regular expressions[1] to automate HTML code changes. Each snippet modifies the HTML text held in the variable $line.

Changing the cascading style sheets used by an HTML file

The following Perl code changes the style sheets used by one or more HTML files (one style sheet for on-screen viewing and one for printing). The code searches for the line

<link rel="StyleSheet" href="standard.css" type="text/css">

and replaces it with

<link rel="StyleSheet" href="style/online_style.css" type="text/css" media="screen">
<link rel="StyleSheet" href="style/print_style.css" type="text/css" media="print">

Perl code:

$line =~ s/<link rel="StyleSheet" href="standard\.css" type="text\/css">/<link rel="StyleSheet" href="style\/online_style.css" type="text\/css" media="screen">\n<link rel="StyleSheet" href="style\/print_style.css" type="text\/css" media="print">/;

Note that the line breaks in the above code are for display purposes only.

Changing the "bold" setting for text

The following Perl code changes the formatting of bold text throughout one or more files. The code searches for any occurrences of

<span style="color: #000000; font-style: normal; font-weight: bold; text-decoration: none;

and replaces them with

<b>text</b>

Perl code:

$line =~ s/<span style="color: #000000; font-style: normal; font-weight: bold; text-decoration:

Note that the line breaks in the above code are for display purposes only.

Changing image settings

The following Perl code centres images and removes the "width" and "height" settings, along with a few other things.
The code searches for occurrences of

<table align="left, right, etc"><tr><td><img src="images/image name" height="number" width=

and replaces them with

<table align="center"><tr><td><img src="images/image name" align="center"></td></tr></table>

Perl code:

$line =~ s/<table align="(.*?)"><tr><td><img src="images\/(.*?)" height="(.*?)"

Note that the line breaks in the above code are for display purposes only.

[1] A regular expression is a string of characters that tells the searcher (in our case, Perl) which string (or strings) you are looking for.
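The bold and image patterns above show only the start of each expression, so the following is a hedged sketch of how all three substitutions might be combined into one filter script, not a copy of the original scripts. The style sheet rule follows the search and replacement text given above. The bold rule assumes each span's style attribute ends after "text-decoration: none;" (plus any further rules) and that the span is closed by </span>. The image rule assumes the height and width attributes appear in the order shown, with nothing else in the table cell; the "few other things" the original image script changes are not reproduced. The subroutine name tidy_line is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply the three (partly assumed) clean-up rules to a piece of HTML.
sub tidy_line {
    my ($line) = @_;

    # 1. Replace the single style sheet link with screen and print style sheets.
    $line =~ s{<link rel="StyleSheet" href="standard\.css" type="text/css">}
              {<link rel="StyleSheet" href="style/online_style.css" type="text/css" media="screen">\n<link rel="StyleSheet" href="style/print_style.css" type="text/css" media="print">}g;

    # 2. Replace the inline "bold" span with <b>...</b>.
    #    ASSUMPTION: the style attribute ends with "text-decoration: none;"
    #    plus optional extra rules, and the span closes with </span>.
    $line =~ s{<span style="color: #000000; font-style: normal; font-weight: bold; text-decoration: none;[^"]*">(.*?)</span>}
              {<b>$1</b>}g;

    # 3. Centre a table-wrapped image and drop its height and width.
    #    ASSUMPTION: attributes appear exactly in the order shown in the text.
    $line =~ s{<table align="[^"]*"><tr><td><img src="images/([^"]*)" height="[^"]*" width="[^"]*"></td></tr></table>}
              {<table align="center"><tr><td><img src="images/$1" align="center"></td></tr></table>}g;

    return $line;
}

# Filter the files named on the command line, e.g. perl tidy.pl old.htm > new.htm
if (@ARGV) {
    print tidy_line($_) while <>;
}
```

The sketch uses [^"]* rather than (.*?) inside attribute values so that a match cannot run past the closing quote into the next attribute.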
© 2007-2008 John Dixon Technology Ltd