# How do I get the font file or PDFont of each word in a PDF file?

Is there a way to get the font of each word of a PDF file using PDFBox? I have tried this but it just lists all the fonts used on that page.

```java
PDDocument pdfDocument = PDDocument.load(new File("xxofd.pdf"));

PDPageTree pages = pdfDocument.getDocumentCatalog().getPages();
for (PDPage page : pages) {
    PDResources res = page.getResources();

    for (COSName fontName : res.getFontNames()) {
        PDFont font = null;
        try {
            font = res.getFont(fontName);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

The PDF file contains many different characters, and different characters may use different fonts. I want to extract a subset of these fonts that contains only the fonts of the words that actually appear in the PDF file, which would make the font file smaller. So I want to get the font file or `PDFont` structure for each word of the PDF file. Is there any way to do this? Thanks.

Given a PDF file whose text uses more than one font, this code:

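```java
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

PDDocument pdfDocument = PDDocument.load(new File("/home/josejuan/tmp/fonts.pdf"));

PDFTextStripper pdfStripper = new PDFTextStripper() {
    // Called once for every piece of text the stripper encounters
    @Override
    protected void processTextPosition(TextPosition text) {
        System.out.println("Text `" + text.getUnicode() + "` with font `" + text.getFont().getName() + "`");
    }
};

// getText() forces the document to be parsed, which triggers the callback above
pdfStripper.getText(pdfDocument);
```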

produces the expected output:

```
Text `E` with font `BAAAAA+LiberationSerif`
Text `x` with font `BAAAAA+LiberationSerif`
Text `a` with font `CAAAAA+CantarellRegular`
Text `m` with font `CAAAAA+CantarellRegular`
Text `p` with font `BAAAAA+LiberationSerif`
....
```

(You can, of course, group the results by font.)
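The grouping step can be sketched without PDFBox. This is only a minimal illustration, assuming each `processTextPosition` call has been recorded as a `{text, fontName}` pair (the two values printed above). The class `FontGrouping` and the method `charsByFont` are hypothetical names, not part of the PDFBox API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of PDFBox.
class FontGrouping {

    // Each entry is {unicodeText, fontName}, as printed by processTextPosition.
    // Returns, for every font name, the distinct pieces of text seen with it,
    // in order of first appearance.
    static Map<String, Set<String>> charsByFont(List<String[]> observed) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String[] entry : observed) {
            String text = entry[0];
            String fontName = entry[1];
            // Create the set the first time a font is seen; the set drops duplicates.
            result.computeIfAbsent(fontName, k -> new LinkedHashSet<>()).add(text);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data copied from the output shown above.
        List<String[]> observed = Arrays.asList(
            new String[] {"E", "BAAAAA+LiberationSerif"},
            new String[] {"x", "BAAAAA+LiberationSerif"},
            new String[] {"a", "CAAAAA+CantarellRegular"},
            new String[] {"m", "CAAAAA+CantarellRegular"},
            new String[] {"p", "BAAAAA+LiberationSerif"});

        // prints {BAAAAA+LiberationSerif=[E, x, p], CAAAAA+CantarellRegular=[a, m]}
        System.out.println(charsByFont(observed));
    }
}
```

In real code the pairs would be collected inside the `processTextPosition` override instead of being printed. Keying on the font name is a simplification: two distinct font objects could in principle share a name, in which case keying on the `PDFont` instance itself may be safer.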

From that code you can inspect every character of the text. For example, if you need the font file:

```java
text.getFont().getFontDescriptor().getFontFile()
```

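However, depending on what exactly you are looking for, it may be better to work with `PDFont`, `PDFontDescriptor`, `PDStream`, … Note that `getFontFile()` only returns an embedded Type 1 font program; an embedded TrueType program is returned by `getFontFile2()`, and other formats such as CFF by `getFontFile3()`. The font descriptor can also be `null`, so check it before dereferencing.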