With neural search seeing rapid adoption, more people are looking at using it to index and search their unstructured data. I know several folks already building PDF search engines powered by AI, so I figured I'd give it a stab too. This will be part 1 of n posts that walk you through creating a PDF neural search engine using Python.

In this post we'll cover how to extract the images and text from PDFs, process them, and store them in a sane way. In the next post we'll look at feeding these into CLIP, a deep learning model that "understands" text and images. From our PDF's extracted text and images, CLIP will generate a semantically useful index that we can search by giving it an image or text as input (and it'll understand the input semantically, not just match keywords or pixels).