douweng7233 2009-06-25 13:00
浏览 274

使用PHP快速将(.rtf | .doc)文件转换为Markdown语法

I've been manually converting articles into Markdown syntax for a few days now, and it's getting rather tedious. Some of these are 3 or 4 pages, italics and other emphasized text throughout. Is there a faster way to convert (.rtf|.doc) files to clean Markdown Syntax that I can take advantage of?

  • 写回答

7条回答 默认 最新

  • donglu9743 2011-05-09 20:53
    关注

    ProgTips has a possible solution with a Word macro (source download):

    A simple macro (source download) for converting the most trivial things automatically. This macro does:

    • Replace bold and italics
    • Replace headings (marked heading 1-6)
    • Replace numbered and bulleted lists

    It's very buggy, I believe it hangs on larger documents, however I'm NOT stating it's a stable release anyway! :-) Experimental use only, recode and reuse it as you like, post a comment if you've found a better solution.

    Source: ProgTips

    Macro source

    Installation

    • open WinWord,
    • press Alt+F11 to open the VBA editor,
    • right click the first project in the project browser
    • choose insert->module
    • paste the code from the file
    • close macro editor
    • go tools>macro>macros; run the macro named MarkDown

    Source: ProgTips

    Source

    Macro source for safe keeping if ProgTips deletes the post or the site gets wiped out:

    '*** A simple MsWord->Markdown replacement macro by Kriss Rauhvargers, 2006.02.02.
    '*** This tool does NOT implement all the markup specified in MarkDown definition by John Gruber, only
    '*** the most simple things. These are:
    '*** 1) Replaces all non-list paragraphs to ^p paragraph so MarkDown knows it is a stand-alone paragraph
    '*** 2) Converts tables to text. In fact, tables get lost.
    '*** 3) Adds a single indent to all indented paragraphs
    '*** 4) Replaces all the text in italics to _text_
    '*** 5) Replaces all the text in bold to **text**
    '*** 6) Replaces Heading1-6 to #..#Heading (Heading numbering gets lost)
    '*** 7) Replaces bulleted lists with ^p *  listitem ^p*  listitem2...
    '*** 8) Replaces numbered lists with ^p 1. listitem ^p2.  listitem2...
    '*** Feel free to use and redistribute this code
    Sub MarkDown()
        Dim bReplace As Boolean
        Dim i As Integer
        Dim oPara As Paragraph
    
    
        'remove formatting from paragraph sign so that we dont get **blablabla^p** but rather **blablabla**^p
        Call RemoveBoldEnters
    
    
        For i = Selection.Document.Tables.Count To 1 Step -1
                Call Selection.Document.Tables(i).ConvertToText
        Next
    
        'simple text indent + extra paragraphs for non-numbered paragraphs
        For i = Selection.Document.Paragraphs.Count To 1 Step -1
            Set oPara = Selection.Document.Paragraphs(i)
            If oPara.Range.ListFormat.ListType = wdListNoNumbering Then
                If oPara.LeftIndent > 0 Then
                    oPara.Range.InsertBefore (">")
                End If
                oPara.Range.InsertBefore (vbCrLf)
            End If
    
    
        Next
    
        'italic -> _italic_
        Selection.HomeKey Unit:=wdStory
        bReplace = ReplaceOneItalic  'first replacement
        While bReplace 'other replacements
            bReplace = ReplaceOneItalic
        Wend
    
        'bold-> **bold**
        Selection.HomeKey Unit:=wdStory
        bReplace = ReplaceOneBold 'first replacement
        While bReplace
            bReplace = ReplaceOneBold 'other replacements
        Wend
    
    
    
        'Heading -> ##heading
        For i = 1 To 6 'heading1 to heading6
            Selection.HomeKey Unit:=wdStory
            bReplace = ReplaceH(i) 'first replacement
            While bReplace
                bReplace = ReplaceH(i) 'other replacements
            Wend
        Next
    
        Call ReplaceLists
    
    
        Selection.HomeKey Unit:=wdStory
    End Sub
    
    
    '***************************************************************
    ' Function to replace bold with _bold_, only the first occurance
    ' Returns true if any occurance found, false otherwise
    ' Originally recorded by WinWord macro recorder, probably contains
    ' quite a lot of useless code
    '***************************************************************
    Function ReplaceOneBold() As Boolean
        Dim bReturn As Boolean
    
        Selection.Find.ClearFormatting
        With Selection.Find
            .Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .Font.Bold = True
            .Format = True
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
    
        bReturn = False
        While Selection.Find.Execute = True
            bReturn = True
            Selection.Text = "**" & Selection.Text & "**"
            Selection.Font.Bold = False
            Selection.Find.Execute
        Wend
    
        ReplaceOneBold = bReturn
    End Function
    
    '*******************************************************************
    ' Function to replace italic with _italic_, only the first occurance
    ' Returns true if any occurance found, false otherwise
    ' Originally recorded by WinWord macro recorder, probably contains
    ' quite a lot of useless code
    '********************************************************************
    Function ReplaceOneItalic() As Boolean
        Dim bReturn As Boolean
    
            Selection.Find.ClearFormatting
    
        With Selection.Find
            .Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .Font.Italic = True
            .Format = True
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
    
        bReturn = False
        While Selection.Find.Execute = True
            bReturn = True
            Selection.Text = "_" & Selection.Text & "_"
            Selection.Font.Italic = False
            Selection.Find.Execute
        Wend
        ReplaceOneItalic = bReturn
    End Function
    
    '*********************************************************************
    ' Function to replace headingX with #heading, only the first occurance
    ' Returns true if any occurance found, false otherwise
    ' Originally recorded by WinWord macro recorder, probably contains
    ' quite a lot of useless code
    '*********************************************************************
    Function ReplaceH(ByVal ipNumber As Integer) As Boolean
        Dim sReplacement As String
    
        Select Case ipNumber
        Case 1: sReplacement = "#"
        Case 2: sReplacement = "##"
        Case 3: sReplacement = "###"
        Case 4: sReplacement = "####"
        Case 5: sReplacement = "#####"
        Case 6: sReplacement = "######"
        End Select
    
        Selection.Find.ClearFormatting
        Selection.Find.Style = ActiveDocument.Styles("Heading " & ipNumber)
        With Selection.Find
            .Text = ""
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .Format = True
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
    
    
         bReturn = False
        While Selection.Find.Execute = True
            bReturn = True
            Selection.Range.InsertBefore (vbCrLf & sReplacement & " ")
            Selection.Style = ActiveDocument.Styles("Normal")
            Selection.Find.Execute
        Wend
    
        ReplaceH = bReturn
    End Function
    
    
    
    '***************************************************************
    ' A fix-up for paragraph marks that ar are bold or italic
    '***************************************************************
    Sub RemoveBoldEnters()
        Selection.HomeKey Unit:=wdStory
        Selection.Find.ClearFormatting
        Selection.Find.Font.Italic = True
        Selection.Find.Replacement.ClearFormatting
        Selection.Find.Replacement.Font.Bold = False
        Selection.Find.Replacement.Font.Italic = False
        With Selection.Find
            .Text = "^p"
            .Replacement.Text = "^p"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = True
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
    
        Selection.HomeKey Unit:=wdStory
        Selection.Find.ClearFormatting
        Selection.Find.Font.Bold = True
        Selection.Find.Replacement.ClearFormatting
        Selection.Find.Replacement.Font.Bold = False
        Selection.Find.Replacement.Font.Italic = False
        With Selection.Find
            .Text = "^p"
            .Replacement.Text = "^p"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = True
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
    End Sub
    
    '***************************************************************
    ' Function to replace bold with _bold_, only the first occurance
    ' Returns true if any occurance found, false otherwise
    ' Originally recorded by WinWord macro recorder, probably contains
    ' quite a lot of useless code
    '***************************************************************
    Sub ReplaceLists()
        Dim i As Integer
        Dim j As Integer
        Dim Para As Paragraph
    
        Selection.HomeKey Unit:=wdStory
    
        'iterate through all the lists in the document
        For i = Selection.Document.Lists.Count To 1 Step -1
            'check each paragraph in the list
            For j = Selection.Document.Lists(i).ListParagraphs.Count To 1 Step -1
                Set Para = Selection.Document.Lists(i).ListParagraphs(j)
                'if it's a bulleted list
                If Para.Range.ListFormat.ListType = wdListBullet Then
                            Para.Range.InsertBefore (ListIndent(Para.Range.ListFormat.ListLevelNumber, "*"))
                'if it's a numbered list
                ElseIf Para.Range.ListFormat.ListType = wdListSimpleNumbering Or _
                                                        wdListMixedNumbering Or _
                                                        wdListListNumOnly Then
                    Para.Range.InsertBefore (Para.Range.ListFormat.ListValue & ".  ")
                End If
            Next j
            'inserts paragraph marks before and after, removes the list itself
            Selection.Document.Lists(i).Range.InsertParagraphBefore
            Selection.Document.Lists(i).Range.InsertParagraphAfter
            Selection.Document.Lists(i).RemoveNumbers
        Next i
    End Sub
    
    '***********************************************************
    ' Returns the MarkDown indent text
    '***********************************************************
    Function ListIndent(ByVal ipNumber As Integer, ByVal spChar As String) As String
        Dim i  As Integer
        For i = 1 To ipNumber - 1
            ListIndent = ListIndent & "    "
        Next
        ListIndent = ListIndent & spChar & "    "
    End Function
    

    Source: ProgTips

    评论

报告相同问题?

悬赏问题

  • ¥20 BAPI_PR_CHANGE how to add account assignment information for service line
  • ¥500 火焰左右视图、视差(基于双目相机)
  • ¥100 set_link_state
  • ¥15 虚幻5 UE美术毛发渲染
  • ¥15 CVRP 图论 物流运输优化
  • ¥15 Tableau online 嵌入ppt失败
  • ¥100 支付宝网页转账系统不识别账号
  • ¥15 基于单片机的靶位控制系统
  • ¥15 真我手机蓝牙传输进度消息被关闭了,怎么打开?(关键词-消息通知)
  • ¥15 装 pytorch 的时候出了好多问题,遇到这种情况怎么处理?