I want to implement a feature which allows the user to double-click to highlight a word in a PDF document using the JPedal library. This would be trivial to do if I could get a word’s bounding rectangle and see if the MouseEvent location falls within it; the following snippet demonstrates how to highlight a region:
    private void highlightText() {
        Rectangle highlightRectangle = new Rectangle(
                firstPoint.x, firstPoint.y,
                secondPoint.x - firstPoint.x, secondPoint.y - firstPoint.y);
        pdfDecoder.getTextLines().addHighlights(new Rectangle[]{highlightRectangle}, false, currentPage);
        pdfDecoder.repaint();
    }
However, I can only find plain-text extraction examples in the documentation, nothing that returns a word's position on the page.
Answer
After looking at Mark’s examples I managed to get it working. There are a few quirks, so I’ll explain how it all works in case it helps someone else. The key method is extractTextAsWordlist, which returns a List<String> of the form {word1, w1_x1, w1_y1, w1_x2, w1_y2, word2, w2_x1, ...} when given a region to extract from. Step-by-step instructions are listed below.
Firstly, you need to transform the MouseEvent's Component/screen coordinates to PDF page coordinates and correct for scaling:
    /**
     * Transforms Component coordinates to page coordinates, correcting for
     * scaling and panning.
     *
     * @param x Component x-coordinate
     * @param y Component y-coordinate
     * @return Point on the PDF page
     */
    private Point getPageCoordinates(int x, int y) {
        float scaling = pdfDecoder.getScaling();
        int x_offset = ((pdfDecoder.getWidth() - pdfDecoder.getPDFWidth()) / 2);
        int y_offset = pdfDecoder.getPDFHeight();
        int correctedX = (int)((x - x_offset + viewportOffset.x) / scaling);
        int correctedY = (int)((y_offset - (y + viewportOffset.y)) / scaling);
        return new Point(correctedX, correctedY);
    }
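To sanity-check the arithmetic without a running viewer, here is a minimal sketch of the same transform as a pure function. The class name, method name, and parameters are hypothetical: they stand in for the values that the method above reads from pdfDecoder and viewportOffset, and the sketch assumes the PDF width and height passed in are the on-screen (already scaled) page dimensions.

```java
import java.awt.Point;

final class PageCoordinateSketch {

    /**
     * Same arithmetic as getPageCoordinates, with the decoder state passed in
     * as plain arguments (all parameter names are assumptions for this sketch).
     */
    static Point toPageCoordinates(int x, int y, float scaling,
                                   int componentWidth, int pdfWidth, int pdfHeight,
                                   Point viewportOffset) {
        // The page is centred horizontally, so half the spare width is margin.
        int xOffset = (componentWidth - pdfWidth) / 2;
        int correctedX = (int) ((x - xOffset + viewportOffset.x) / scaling);
        // Screen y grows downwards, page y grows upwards, hence the flip.
        int correctedY = (int) ((pdfHeight - (y + viewportOffset.y)) / scaling);
        return new Point(correctedX, correctedY);
    }

    public static void main(String[] args) {
        // A click at (300, 100) on a 1000 px wide component showing an
        // 800 x 1200 px page at 2x zoom, with no panning.
        Point p = toPageCoordinates(300, 100, 2.0f, 1000, 800, 1200, new Point(0, 0));
        System.out.println(p.x + ", " + p.y); // 100, 550
    }
}
```

The y-axis flip is the part that is easiest to get wrong: a click near the top of the component maps to a large page y value.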
Next, create a box to scan for text. I chose to make this the width of the page and +/- 20 page units vertically (this is a fairly arbitrary number), centered at the MouseEvent:
    /**
     * Scans for all the words located within a box the width of the page and
     * 40 points high, centered at the supplied point.
     *
     * @param p Point to centre the scan box around
     * @return A List of words within the scan box
     * @throws PdfException if the text extraction fails
     */
    private List<String> scanForWords(Point p) throws PdfException {
        List<String> result = Collections.emptyList();
        if (pdfDecoder.getlastPageDecoded() > 0) {
            PdfGroupingAlgorithms currentGrouping = pdfDecoder.getGroupingObject();
            PdfPageData currentPageData = pdfDecoder.getPdfPageData();
            int x1 = currentPageData.getMediaBoxX(currentPage);
            int x2 = currentPageData.getMediaBoxWidth(currentPage) + x1;
            int y1 = p.y + 20; // top of the scan box (page y grows upwards)
            int y2 = p.y - 20; // bottom of the scan box
            result = currentGrouping.extractTextAsWordlist(x1, y1, x2, y2, currentPage, true, "");
        }
        return result;
    }
Then I parsed this into a sequence of Rectangles:
    /**
     * Parse a String sequence of:
     * {word1, w1_x1, w1_y1, w1_x2, w1_y2, word2, w2_x1, ...}
     *
     * Into a sequence of Rectangles.
     *
     * @param wordList Word list sequence to parse
     * @return A List of Rectangles
     */
    private List<Rectangle> parseWordBounds(List<String> wordList) {
        List<Rectangle> wordBounds = new LinkedList<Rectangle>();
        Iterator<String> wordListIterator = wordList.iterator();
        while(wordListIterator.hasNext()) {
            // sequences are: {word, x1, y1, x2, y2}
            wordListIterator.next(); // skip the word
            int x1 = (int) Float.parseFloat(wordListIterator.next());
            int y1 = (int) Float.parseFloat(wordListIterator.next());
            int x2 = (int) Float.parseFloat(wordListIterator.next());
            int y2 = (int) Float.parseFloat(wordListIterator.next());
            wordBounds.add(new Rectangle(x1, y2, x2 - x1, y1 - y2)); // in page, not screen coordinates
        }
        return wordBounds;
    }
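If you also need the text of the word and not only its bounds, a variant along these lines might help. It is a sketch, not part of JPedal: WordListSketch and WordBox are hypothetical names, and the sample values are made up to match the {word, x1, y1, x2, y2} layout described above. It also skips an incomplete trailing group rather than throwing.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WordListSketch {

    /** Hypothetical holder pairing a word with its bounds in page coordinates. */
    static final class WordBox {
        final String text;
        final Rectangle bounds;

        WordBox(String text, Rectangle bounds) {
            this.text = text;
            this.bounds = bounds;
        }
    }

    static List<WordBox> parse(List<String> wordList) {
        List<WordBox> boxes = new ArrayList<>();
        // Step five entries at a time; an incomplete trailing group is ignored.
        for (int i = 0; i + 4 < wordList.size(); i += 5) {
            String text = wordList.get(i);
            int x1 = (int) Float.parseFloat(wordList.get(i + 1));
            int y1 = (int) Float.parseFloat(wordList.get(i + 2));
            int x2 = (int) Float.parseFloat(wordList.get(i + 3));
            int y2 = (int) Float.parseFloat(wordList.get(i + 4));
            // As in the parser above, y1 is the upper edge and y2 the lower one,
            // so y2 becomes the Rectangle origin and y1 - y2 its height.
            boxes.add(new WordBox(text, new Rectangle(x1, y2, x2 - x1, y1 - y2)));
        }
        return boxes;
    }

    public static void main(String[] args) {
        // Made-up sample data in the flat five-entries-per-word layout.
        List<String> sample = Arrays.asList(
                "Hello", "10.0", "700.0", "50.0", "688.0",
                "world", "55.0", "700.0", "95.0", "688.0");
        for (WordBox box : parse(sample)) {
            System.out.println(box.text + " -> " + box.bounds);
        }
    }
}
```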
Then I identified which Rectangle the MouseEvent fell within:
    /**
     * Finds the bounding Rectangle of a word located at a Point.
     *
     * @param p Point to find word bounds
     * @param wordBounds List of word boundaries to search
     * @return A Rectangle that bounds a word and contains a point, or null if
     *         there is no word located at the point
     */
    private Rectangle findWordBoundsAtPoint(Point p, List<Rectangle> wordBounds) {
        Rectangle result = null;
        for (Rectangle wordBound : wordBounds) {
            if (wordBound.contains(p)) {
                result = wordBound;
                break;
            }
        }
        return result;
    }
For some reason, just passing this Rectangle to the highlighting method didn’t work. After some tinkering, I found that shrinking the Rectangle by a point on each side resolved the problem:
    /**
     * Contracts a Rectangle to enable it to be highlighted.
     *
     * @return A contracted Highlight Rectangle
     */
    private Rectangle contractHighlight(Rectangle highlight) {
        int x = highlight.x + 1;
        int y = highlight.y + 1;
        int width = highlight.width - 2;
        int height = highlight.height - 2;
        return new Rectangle(x, y, width, height);
    }
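The hit test and the one-point contraction can be tried on their own with plain java.awt types. The sketch below uses made-up word rectangles and a hypothetical HitTestSketch class; it also shows that Rectangle.grow with negative arguments produces the same result as the manual arithmetic above, which may be a tidier way to write it.

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.util.Arrays;
import java.util.List;

final class HitTestSketch {

    /** Returns the first rectangle containing p, or null if none does. */
    static Rectangle findAt(Point p, List<Rectangle> bounds) {
        for (Rectangle r : bounds) {
            if (r.contains(p)) {
                return r;
            }
        }
        return null;
    }

    /** Shrinks a copy of r by one unit on every side, leaving r untouched. */
    static Rectangle contract(Rectangle r) {
        Rectangle copy = new Rectangle(r);
        copy.grow(-1, -1); // negative values shrink the rectangle
        return copy;
    }

    public static void main(String[] args) {
        // Two made-up word boxes on the same line, in page coordinates.
        List<Rectangle> words = Arrays.asList(
                new Rectangle(10, 688, 40, 12),
                new Rectangle(55, 688, 40, 12));
        Rectangle hit = findAt(new Point(60, 690), words);
        System.out.println(contract(hit)); // bounds become x=56, y=689, 38 wide, 10 high
        System.out.println(findAt(new Point(52, 690), words)); // null: the gap between words
    }
}
```

Note that a click in the gap between two words returns null, so the caller has to handle the no-word case.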
Then I just passed it to this method to add highlights:
    /**
     * Highlights text on the document
     */
    private void highlightText(Rectangle highlightRectangle) {
        pdfDecoder.getTextLines().addHighlights(new Rectangle[]{highlightRectangle}, false, currentPage);
        pdfDecoder.repaint();
    }
Finally, all the above calls are packed into this convenient method:
    /**
     * Highlights the word at the given point.
     *
     * @param p Point where word is located, in page coordinates
     */
    private void highlightWordAtPoint(Point p) {
        try {
            Rectangle wordBounds = findWordBoundsAtPoint(p, parseWordBounds(scanForWords(p)));
            if (wordBounds != null) {
                highlightText(contractHighlight(wordBounds));
            }
        } catch (PdfException e) {
            // Extraction failed; report it and leave the page unhighlighted
            e.printStackTrace();
        }
    }
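The steps above don't show how the double-click itself is detected. One way to wire it up is a MouseAdapter that checks MouseEvent.getClickCount(). This is only a sketch: DoubleClickSketch and its two constructor arguments are hypothetical, and the commented registration line assumes pdfDecoder is an AWT/Swing component, which its getWidth() and repaint() calls suggest.

```java
import java.awt.Component;
import java.awt.Point;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

final class DoubleClickSketch extends MouseAdapter {
    private final UnaryOperator<Point> toPage; // component -> page coordinates
    private final Consumer<Point> onWord;      // e.g. highlightWordAtPoint

    DoubleClickSketch(UnaryOperator<Point> toPage, Consumer<Point> onWord) {
        this.toPage = toPage;
        this.onWord = onWord;
    }

    @Override
    public void mouseClicked(MouseEvent e) {
        // React only to a left-button double-click.
        if (e.getClickCount() == 2 && e.getButton() == MouseEvent.BUTTON1) {
            onWord.accept(toPage.apply(e.getPoint()));
        }
    }

    public static void main(String[] args) {
        // Possible registration in the viewer (assumed, untested against JPedal):
        // pdfDecoder.addMouseListener(new DoubleClickSketch(
        //         p -> getPageCoordinates(p.x, p.y), this::highlightWordAtPoint));
        DoubleClickSketch listener = new DoubleClickSketch(
                p -> new Point(p.x / 2, p.y / 2), // stand-in for the real transform
                p -> System.out.println("double-click at page point " + p.x + ", " + p.y));
        Component source = new Component() { }; // bare component, never shown
        listener.mouseClicked(new MouseEvent(source, MouseEvent.MOUSE_CLICKED,
                0L, 0, 40, 60, 2, false, MouseEvent.BUTTON1));
    }
}
```

Passing the transform and the highlight action in as functions keeps the listener independent of the decoder, so it can be exercised without opening a PDF.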