.TH HTML2MARKDOWN "1" "July 2015" "html2markdown 2015.6.21" "User Commands"
.SH NAME
html2markdown \- convert a page of HTML into Markdown
.SH SYNOPSIS
.B html2markdown
[options...] [(\fIfilename\fR|\fIurl\fR) [\fIencoding\fR]]
.SH DESCRIPTION
\fBhtml2markdown\fR reads the specified HTML page and converts it to text marked up with Markdown.
The source HTML page may be a local file or a remote URL.
If no source is specified, the page is read from standard input.
The output is printed to standard output.
.P
If an \fIencoding\fR is specified, it overrides any encoding information provided by the HTTP server.
When no encoding is specified, \fBpython-feedparser\fR (if available) is used to determine the source encoding.
If it is not available, or when reading local files, the encoding is assumed to be UTF-8.
.SH OPTIONS
.TP
\fB\-\-default\-image\-alt\fR=\fITEXT\fR
The default alt text for images that lack one.
.TP
.B \-\-pad\-tables
Pad the cells to equal column width in tables.
.TP
.B \-\-no\-wrap\-links
Don't wrap long links.
.TP
.B \-\-wrap\-list\-items
Wrap long list items.
.TP
.B \-\-wrap\-tables
Wrap long table rows.
.TP
.B \-\-ignore\-emphasis
Don't include any formatting for emphasis.
.TP
.B \-\-reference\-links
Use reference style links instead of in\-line links.
.TP
.B \-\-ignore\-links
Don't include any formatting for links.
.TP
.B \-\-ignore\-mailto\-links
Don't include any formatting for mailto: links.
.TP
.B \-\-protect\-links
Protect links from line breaks by surrounding them with angle brackets.
.TP
.B \-\-ignore\-images
Don't include any formatting for images.
.TP
.B \-\-images\-as\-html
Always write image tags as raw HTML; preserves \fIheight\fR, \fIwidth\fR and \fIalt\fR if possible.
.TP
.B \-\-images\-to\-alt
Discard image data, only keep alt text.
.TP
.B \-\-images\-with\-size
Write image tags with height and width attributes as raw HTML to retain dimensions.
.TP
.BR \-g ", " \-\-google\-doc
Convert an HTML-exported Google Document.
.TP
.BR \-d ", " \-\-dash\-unordered\-list
Use a dash rather than a star for unordered list items.
.TP
.BR \-e ", " \-\-asterisk\-emphasis
Use an asterisk rather than an underscore for emphasized text.
.TP
\fB\-b\fR \fIBODY_WIDTH\fR, \fB\-\-body\-width\fR=\fIBODY_WIDTH\fR
Number of characters per output line, \fB0\fR for no wrap.
.TP
\fB\-i\fR \fILIST_INDENT\fR, \fB\-\-google\-list\-indent\fR=\fILIST_INDENT\fR
Number of pixels Google indents nested lists.
.TP
.BR \-s ", " \-\-hide\-strikethrough
Hide strike-through text.
Only relevant when \fB\-g\fR is specified as well.
.TP
.B \-\-escape\-all
Escape all special characters.
Output is less readable, but avoids corner case formatting issues.
.TP
.B \-\-bypass\-tables
Format tables in HTML rather than Markdown syntax.
.TP
.B \-\-ignore\-tables
Ignore table-related tags (table, th, td, tr) while keeping rows.
.TP
.B \-\-single\-line\-break
Use a single line break after a block element rather than two line breaks.
.B NOTE:
Requires \fB\-\-body\-width\fR=\fB0\fR.
.TP
.B \-\-unicode\-snob
Use Unicode throughout the document.
.TP
.B \-\-no\-automatic\-links
Do not use automatic links wherever applicable.
.TP
.B \-\-no\-skip\-internal\-links
Do not skip internal links.
.TP
.B \-\-links\-after\-para
Put links after each paragraph instead of at the end of the document.
.TP
.B \-\-mark\-code
Mark program code blocks with \fB[code]\fI...\fB[/code]\fR.
.TP
\fB\-\-decode\-errors\fR=\fIDECODE_ERRORS\fR
What to do in case of decode errors.
\fBignore\fR, \fBstrict\fR, and \fBreplace\fR are acceptable values.
.TP
\fB\-\-open\-quote\fR=\fICHAR\fR
The character used to open quotes.
.TP
\fB\-\-close\-quote\fR=\fICHAR\fR
The character used to close quotes.
.TP
.B \-\-include\-sup\-sub
Include the sup and sub tags.
.TP
.B \-\-version
Show program's version number and exit.
.TP
.BR \-h ", " \-\-help
Show a help message and exit.
.SH AUTHOR
This manpage was written for Debian, by Stefano Rivera.
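.SH EXAMPLES
The invocations below are illustrative sketches built only from the synopsis and options documented above.
The names \fIpage.html\fR and \fIpage.md\fR, the \fIhttp://example.com/\fR URL and the \fIiso\-8859\-1\fR encoding are placeholders chosen for illustration, not values taken from this manual.
.P
Convert a local file and save the Markdown that is printed to standard output:
.P
.RS
.nf
html2markdown page.html > page.md
.fi
.RE
.P
Read the HTML from standard input and drop all image formatting:
.P
.RS
.nf
html2markdown \-\-ignore\-images < page.html
.fi
.RE
.P
Disable line wrapping so that single line breaks can be used after block elements:
.P
.RS
.nf
html2markdown \-\-body\-width=0 \-\-single\-line\-break page.html
.fi
.RE
.P
Fetch a remote page, override the encoding reported by the HTTP server, and emit reference style links:
.P
.RS
.nf
html2markdown \-\-reference\-links http://example.com/ iso\-8859\-1
.fi
.RE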