Convert HTML to PDF with Special Characters using Java

I am using Flying Saucer with iText 2.1.7 to convert HTML to PDF. It works fine, except when the HTML contains Chinese, Korean, or similar characters.

I get unexpected characters in my PDF instead of the expected Chinese characters.

I found an open issue about this, so I assume there is currently no way to make Flying Saucer render the PDF correctly?

PS: I also found this issue, but I can’t understand the solution they have provided.

This is the code that I am using

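import java.io.FileOutputStream;
import java.io.OutputStream;
import com.lowagie.text.pdf.BaseFont;
import org.xhtmlrenderer.pdf.ITextRenderer;

// 'file' is the java.io.File pointing at the source HTML
String doc = file.toURI().toURL().toString();
ITextRenderer renderer = new ITextRenderer();
// Register a Unicode-capable font, embedded with Identity-H encoding
renderer.getFontResolver().addFont(
    "C://ARIALUNI.TTF",
    BaseFont.IDENTITY_H,
    BaseFont.EMBEDDED
);
renderer.setDocument(doc);
renderer.layout();

String outputFile = "report.pdf";
// try-with-resources flushes and closes the stream even if rendering fails
try (OutputStream os = new FileOutputStream(outputFile)) {
    renderer.createPDF(os);
}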

Here, file is the HTML file that I am trying to convert.

Is there some other way or library to do the same?

This is the CSS that I am using:

@font-face {
    font-family: "Arial";
    src: url("media/arialuni.ttf");
    -fs-pdf-font-embed: embed;
    -fs-pdf-font-encoding: Identity-H;
}

The HTML file that I need to convert

These are the recompiled Flying Saucer JARs compatible with iText 2.1.x

Answers

Try this:

font.addFont(Html2Pdfs.class.getResource("SIMSUN.TTC").toString().substring(6), BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
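
A note on the substring(6) call above: it strips the leading "file:/" from the resource URL, which tends to give a usable path only on Windows and leaves URL escapes such as %20 in place. A more portable conversion might look like the following minimal sketch, which uses only the JDK. The FontPaths class and toFilePath method are hypothetical names, and the sketch assumes the font is a plain file on disk rather than an entry packed inside a JAR.

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

final class FontPaths {
    private FontPaths() {
    }

    /**
     * Converts a file: URL (for example, one returned by Class.getResource)
     * into a local filesystem path. Going through URI decodes escapes such
     * as %20 and keeps the leading slash on Unix-like systems.
     * Assumption: the URL uses the file: scheme.
     */
    static String toFilePath(URL resource) throws URISyntaxException {
        Path path = Paths.get(resource.toURI());
        return path.toString();
    }
}
```

The result could then be passed to addFont in place of the substring expression, for example FontPaths.toFilePath(Html2Pdfs.class.getResource("SIMSUN.TTC")).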

Your font is probably not embedded in the PDF file (see "How do I know if the fonts in a PDF file are embedded or not?").

Every font has a name. ARIALUNI.TTF defines "Arial Unicode MS", so that is the font-family name you should use.

So change this:

@font-face {
    font-family: Arial1;
    src: url("arialuni.ttf");
    -fs-pdf-font-embed: embed;
    -fs-pdf-font-encoding: Identity-H;
}

* {
    font-family: Arial1;
}

To this:

@font-face {
    font-family: Arial Unicode MS;
    src: url("arialuni.ttf");
    -fs-pdf-font-embed: embed;
    -fs-pdf-font-encoding: Identity-H;
}

* {
    font-family: Arial Unicode MS;
}

This way the font will be embedded.

You also don’t need to call renderer.getFontResolver().addFont; the CSS is enough.
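
To show how the corrected rule fits into a complete document, here is a minimal sketch that uses only the JDK to build an XHTML file containing the @font-face rule from above and write it out as UTF-8. The XhtmlWithFont class, its method names, and the idea of generating the file from Java are assumptions for illustration; the font file is assumed to sit next to the generated document, and the body text is assumed to be well-formed XHTML already.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class XhtmlWithFont {
    private XhtmlWithFont() {
    }

    /**
     * Builds an XHTML document whose stylesheet declares the font with
     * @font-face and applies it to every element.
     * Assumption: bodyXhtml is well-formed, already-escaped XHTML.
     */
    static String build(String fontFamily, String fontUrl, String bodyXhtml) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
             + "<head>\n"
             + "<style type=\"text/css\">\n"
             + "@font-face {\n"
             + "    font-family: \"" + fontFamily + "\";\n"
             + "    src: url(\"" + fontUrl + "\");\n"
             + "    -fs-pdf-font-embed: embed;\n"
             + "    -fs-pdf-font-encoding: Identity-H;\n"
             + "}\n"
             + "* { font-family: \"" + fontFamily + "\"; }\n"
             + "</style>\n"
             + "</head>\n"
             + "<body>" + bodyXhtml + "</body>\n"
             + "</html>\n";
    }

    /** Writes the document as UTF-8 so the bytes match the XML declaration. */
    static void write(Path target, String xhtml) throws IOException {
        Files.write(target, xhtml.getBytes(StandardCharsets.UTF_8));
    }
}
```

A call such as XhtmlWithFont.build("Arial Unicode MS", "arialuni.ttf", "<p>...</p>") produces a document that can be handed to renderer.setDocument as in the question. Writing the file explicitly as UTF-8 is a deliberate choice here, since a mismatch between the declared and actual encoding is another common cause of garbled characters.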