Sunday, February 22, 2009

Workflow: CHM to Sony Reader PDF files

I’ve had a lot of difficulty getting CHM files to format properly for an eReader. My goals were to retain the table of contents, graphics, and tables. All content is resized for readability.

Prerequisites:

  1. CHM to HTML converter (CHM to HTML, Armenian Dictionary Software)
  2. Microsoft Word 2007 with VBA and PDF/XPS output plugin
  3. Word Macros: Download Here

General Scheme

  1. Convert CHM to a directory of HTML files
  2. Import HTML into Microsoft Word
  3. Clean it up using page breaks and search/replace
  4. Convert CHM/HTML headings into Word Headings
  5. Macro: Resize page, graphics, and tables
  6. Final formatting (line spacing, font changes, and sizing)
  7. Save as PDF

1. Convert CHM to HTML

Any number of programs can be used for this purpose, and the conversion is straightforward. There are even methods using Microsoft’s built-in .chm decompression.
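
One such built-in method, assuming it is the one meant here, is the -decompile switch of hh.exe, the HTML Help viewer that ships with Windows. The sketch below launches it from a Word macro. Both paths are hypothetical placeholders, and the macro name is mine.

```vba
Sub DecompileChm()
    ' Hypothetical paths: change both to match your machine.
    ' Keep them free of spaces to avoid command-line quoting issues.
    Const ChmFile As String = "C:\ebooks\book.chm"
    Const OutputFolder As String = "C:\ebooks\book_html"

    ' Create the output folder if it does not exist yet
    If Dir(OutputFolder, vbDirectory) = "" Then MkDir OutputFolder

    ' Syntax: hh.exe -decompile <output folder> <chm file>
    Shell "hh.exe -decompile " & OutputFolder & " " & ChmFile, vbNormalFocus
End Sub
```

Shell returns immediately, so give the extraction a moment to finish before looking in the output folder.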

2. Insert HTML into a blank Word document

Use Word’s “Insert Text File” feature to import all of the newly created HTML files from the output directory.

A few tips:

Use the search box at the top of the insert dialog to filter all the *.html or *.htm files in the directory.

Insert a few files at a time if the files aren’t conveniently numbered. Also, I would suggest leaving out the less useful portions of a book, such as some of the introductory material and the index (which will be useless).

After this step, you will have a nearly perfect version of your CHM file which is basically an HTML document. Word seems to handle this well.
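
If you would rather script the import than click through the dialog, something like the following should work. It is a minimal sketch: the folder path is a hypothetical placeholder, and files are inserted in whatever order Dir returns them, so it only helps when the file names already sort in reading order.

```vba
Sub InsertHtmlFiles()
    ' Hypothetical folder holding the converter's output (note the trailing backslash)
    Const SourceFolder As String = "C:\ebooks\book_html\"

    Dim fileNames As New Collection
    Dim htmlFile As String
    Dim entry As Variant

    ' Collect the names first so nothing disturbs the Dir() enumeration
    htmlFile = Dir(SourceFolder & "*.htm*")   ' matches both .htm and .html
    Do While htmlFile <> ""
        fileNames.Add htmlFile
        htmlFile = Dir()
    Loop

    ' Append each file at the end of the active document
    For Each entry In fileNames
        Selection.EndKey Unit:=wdStory
        Selection.InsertFile FileName:=SourceFolder & entry, ConfirmConversions:=False
    Next entry
End Sub
```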

3. Clean up the document

  • Create a title page by inserting a page break.
  • Search and replace the strange characters that sometimes show up in CHM ebooks (œ").
  • Search and replace any frequently repeated headers or footers. Use the wildcard checkbox to make this job easier.
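
The character cleanup can also be done from a macro. This is a rough sketch, not part of the downloadable macros: ReplaceEverywhere is a helper name I made up, and deleting the stray character outright is an assumption. Swap in whatever replacement your book actually needs.

```vba
Sub ReplaceEverywhere(ByVal findText As String, ByVal replaceText As String)
    ' Plain-text replace-all over the whole document body
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = True
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub CleanStrayCharacters()
    ' ChrW(339) is the "œ" character. Replacing it with an empty string
    ' deletes it; that choice is an assumption, adjust it for your book.
    ReplaceEverywhere ChrW(339), ""
End Sub
```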

4. Convert CHM headings to Word Headings to easily create bookmarks

Word has a fascinating feature where you can “search” for a certain font and size and “replace” it with a style. Usually a CHM file will have its headings in a larger font, which is sufficient for this trick.

The first step, though, is to make sure your document’s Heading styles match your CHM file’s headings. If you haven’t worked with Word styles before, this can be confusing.

Click on one of the headings in your book (“Chapter 1: etc.”), then right-click “Heading 1” in the styles gallery and pick “Update Heading 1 to Match Selection”. That’s it.

Now open search and replace. Leave the “Find” box blank and pick “Font” from the “Format” menu at the bottom of the dialog box, then match the font and size to your heading. Click in the “Replace” box and open the “Format” menu again, but this time pick “Style” and choose Heading 1. Then replace all.

You’ve now effectively transformed plain text that was distinguished only by its size into a bona fide Word heading, which will automatically translate into a PDF bookmark when you convert.
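
The same font-to-style replacement can be run from a macro. This is a sketch under assumptions: the font name and size below are placeholders, so set them to whatever your CHM headings actually use.

```vba
Sub HeadingFontToStyle()
    ' Assumed values: change these to match the headings in your book
    Const HeadingFontName As String = "Arial"
    Const HeadingFontSize As Single = 16

    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""                      ' empty: match on formatting only
        .Replacement.Text = ""
        .Font.Name = HeadingFontName
        .Font.Size = HeadingFontSize
        .Replacement.Style = ActiveDocument.Styles(wdStyleHeading1)
        .Format = True
        .Forward = True
        .Wrap = wdFindContinue
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Run it once per heading level, changing the font size and the target style (wdStyleHeading2, and so on), if your book uses several.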

5. Macro: resize the page, graphics, and tables

The following macro does the rest of the dirty work: it sets the right page size, then resizes graphics and tables. One of the submodules is from http://guru-web.blogspot.com, whose author wrote an excellent module for resizing images.

Sub DocToSonyReader()

' Set the reader's page size with minimal margins
With ActiveDocument.PageSetup
    .PageHeight = InchesToPoints(4.82)
    .PageWidth = InchesToPoints(3.57)
    .LeftMargin = CentimetersToPoints(0.2)
    .RightMargin = CentimetersToPoints(0.2)
    .TopMargin = CentimetersToPoints(0.2)
    .BottomMargin = CentimetersToPoints(0.2)
End With

ConvertShapes   ' shrink oversized pictures
ResizeTables    ' shrink tables and their text

End Sub

Private Sub ConvertShapes()
' Used from http://guru-web.blogspot.com/2008/08/converting-word-docs to-sony-ebook.html
' Shrinks every inline picture wider than 3 inches down to 3 inches wide.
    Dim shp As InlineShape
    Dim maxWidth As Single
    maxWidth = InchesToPoints(3)
    For Each shp In ActiveDocument.InlineShapes
        If shp.Width > maxWidth Then
            ' Scale the height by the same factor so the picture is not distorted
            shp.LockAspectRatio = msoFalse
            shp.Height = shp.Height * (maxWidth / shp.Width)
            shp.Width = maxWidth
        End If
    Next shp
End Sub

Private Sub ResizeTables()
' Shrinks every table to 3 inches wide with 8-point, single-spaced text.
    Dim myTable As Table
    For Each myTable In ActiveDocument.Tables
        myTable.Select
        Selection.Font.Size = 8
        ' PreferredWidth is only read as points when the width type says so
        myTable.PreferredWidthType = wdPreferredWidthPoints
        myTable.PreferredWidth = InchesToPoints(3)
        myTable.LeftPadding = 0
        myTable.Rows.Alignment = wdAlignRowCenter
        With Selection.ParagraphFormat
            .LeftIndent = InchesToPoints(0)
            .RightIndent = InchesToPoints(0)
            .SpaceBefore = 0
            .SpaceBeforeAuto = False
            .SpaceAfter = 0
            .SpaceAfterAuto = False
            .LineSpacingRule = wdLineSpaceSingle
            .WidowControl = True
            .KeepWithNext = False
            .KeepTogether = False
            .PageBreakBefore = False
            .NoLineNumber = False
            .Hyphenation = True
            .FirstLineIndent = InchesToPoints(0)
            .OutlineLevel = wdOutlineLevelBodyText
            .CharacterUnitLeftIndent = 0
            .CharacterUnitRightIndent = 0
            .CharacterUnitFirstLineIndent = 0
            .LineUnitBefore = 0
            .LineUnitAfter = 0
            .MirrorIndents = False
            .TextboxTightWrap = wdTightNone
        End With
    Next myTable
End Sub

*Before running this macro, make sure you are in Draft view. If you run it in page view, the document updating itself can make the macro unstable.

As with most Word macros, expect some difficulties. I’ve run it on an 8,000-page textbook; on a document that size it can take a while. Usually, though, it works well enough and is stable.

6. Finally, the finishing touches

  1. Select all the text (CTRL-A) and change the font to one that is friendly with the Sony Reader. Georgia and Book Antiqua are good choices.
  2. Change the font size to 12, which I find very readable.
  3. If you prefer, change the line spacing to 1.5.
  4. Use the document properties to add title and author information.
  5. Add a cover picture (from Amazon, for example) to the title page.
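
Most of these finishing touches can be scripted as well. The sketch below is an assumption-laden example rather than part of the downloadable macros: the title and author strings are placeholders, and Georgia is just one of the suggested fonts. Like CTRL-A, it changes every paragraph, headings included.

```vba
Sub FinishingTouches()
    ' Font, size, and line spacing for the whole document body
    With ActiveDocument.Content
        .Font.Name = "Georgia"
        .Font.Size = 12
        .ParagraphFormat.LineSpacingRule = wdLineSpace1pt5
    End With

    ' Placeholder metadata: replace with your book's details
    ActiveDocument.BuiltInDocumentProperties("Title").Value = "Book Title"
    ActiveDocument.BuiltInDocumentProperties("Author").Value = "Author Name"
End Sub
```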

7. Now, go to File > Save As and pick PDF/XPS. If this option isn’t available, download the plug-in from Microsoft at http://www.microsoft.com/downloads/details.aspx?FamilyId=F1FC413C-6D89-4F15-991B-63B07BA5F2E5&displaylang=en

Finally, make sure you check “Create bookmarks using Headings”.
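
For repeated conversions, the export can be done in one call instead of through the dialog. A minimal sketch, assuming the PDF/XPS plug-in is installed; the output path is a hypothetical placeholder.

```vba
Sub SaveAsBookmarkedPdf()
    ' Hypothetical output path: change to suit
    Const OutputFile As String = "C:\ebooks\book.pdf"

    ' wdExportCreateHeadingBookmarks is the macro equivalent of
    ' ticking "Create bookmarks using Headings" in the options dialog
    ActiveDocument.ExportAsFixedFormat _
        OutputFileName:=OutputFile, _
        ExportFormat:=wdExportFormatPDF, _
        OpenAfterExport:=False, _
        OptimizeFor:=wdExportOptimizeForPrint, _
        CreateBookmarks:=wdExportCreateHeadingBookmarks
End Sub
```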

8. That’s it! I preview the file in Acrobat or Foxit and then copy it directly to my reader’s memory card.

Hope this helps!

Tuesday, January 6, 2009

Small, Green, and Cheap Windows Home Server



WHS Server "Black_Rook"

If you are reading this post you've probably been hit by the fever of wanting to custom build your own Windows Home Server. Before putting my creation together, I spent weeks thinking and plotting what components I would use and what functions my server would perform. Thinking is probably too mild of a term - my significant other would freely use the word obsessed.

Finally, I began to purchase piece after piece - roaming freely through the vastness of the internet. Every few days a new box would arrive - a cooling fan, a SATA cable, a case, etc. My prototype server, an old repurposed Dell Dimension 4500, continued to moan away in the corner - its 5 IDE hard drives whining and its venerable 2.4 GHz P4 churning through watts. I knew the old workhorse could tell my embarrassment at its hefty figure by my sole use of remote desktop and its home within the closet. I longed for a svelte, sweet chassis and an elegant SATA setup.

The most interesting thing about my setup is that it is unique. I could not build another one of these if I wanted (I tried) and certainly not at the price I achieved. This is not a guide on how to build a cheap WHS server, this is how a cheap WHS server was built.

I managed to take advantage of the eBay and Microsoft Live cashback offer, thereby reducing my price on all eBay purchases by 30%. Furthermore, I painstakingly played the wicked eBay game, complete with My eBay alerts and sniping choice items when they wandered online. You might expect that such a technique would create a horrible Frankensteinish monster - complete with a beige DVD drive in a black case - but this is not so. It is indeed a sight, however.

My computer naming scheme is fairly common and involves chess pieces. "Black Rook" is the result of the inspiration of the WHS community. Let me break it down:

Part  Component                         Price  Notes
Case  InWin Mt Jade                     $50    Small case, co-designed by Intel. Only needs a CPU fan. Quality.
CPU   Core 2 Duo E4300                  $90    Found the CPU, MB, and MEM on eBay, as someone had removed them from an old Gateway media PC. Great deal.
MB    Intel "Council Bluffs" 945G VIIV
MEM   2 Gig PC-4200
HD    WD 1TB Green                      $75    Popular drive. Again, Live.com cashback off $100.
FAN   Stock Intel CPU fan               $10    Needed to purchase, not included.

What we have in the end is a nearly silent, very low power, 1 TB Windows Home Server for < $250.
What it is doing now will be another post.