this post was submitted on 28 Sep 2024
LinkedinLunatics
Not necessarily. CVs have complicated formatting. Nobody writes (or should write) blocks of text, and you don't know how many columns the candidate is using. Is the candidate using a dedicated section to show a star-based skill rating, or a word-based one? So you can still search for individual keywords, but if you copy the whole PDF and paste it into a txt file (which is what gets forwarded to the ATS), it doesn't make much sense. The structure is too complicated to extract where you studied, what you studied and your grade, what other experience you have, how long you worked there, etc.
Extracting structured data is a field of science in its own right. There is plenty of recent research on extracting structured data from academic PDFs (I was working on this at a research institute in Germany around 2022). Even when LLMs are used, it can get really complicated, to the point that there are specialized LLMs for just that.
But ATS systems are too cheap, or not a high enough priority, to even use OCR, let alone LLMs, so unfortunately the responsibility of making an easily parsable CV comes down to the candidate.
Try this next time you look at your CV: copy its text into a txt file, then think about whether you could write a program that reliably extracts your experience, education, interests, etc. It's going to be super difficult, and even then it won't generalize to thousands of other CVs.
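To make that concrete, here is a rough sketch of the kind of naive extractor you might start with. Everything in it is made up for illustration: the heading list, the `split_sections` helper, and both sample texts are assumptions, not how any real ATS works. It only shows why a heading-based approach that handles a single-column CV can fall apart once two columns get merged line by line during copy-paste.

```python
# Hypothetical heading names; real CVs use many other labels and languages.
SECTION_HEADINGS = ("experience", "education", "skills", "interests")


def split_sections(text: str) -> dict[str, list[str]]:
    """Group lines under the most recent line that is exactly a known heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key = line.lower().rstrip(":")
        if key in SECTION_HEADINGS:
            # A line that is nothing but a known heading starts a new section.
            current = key
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
        # Lines seen before any recognized heading are silently dropped.
    return sections


# Invented sample: a simple single-column CV.
single_column = """Experience
Acme Corp, Developer, 2019-2023
Education
MSc Computer Science, 2018
"""

# Invented sample: what a two-column layout can look like after copy-paste.
# The columns are merged line by line, so each heading shares its line
# with a heading or entry from the other column.
two_column = """Experience Skills
Acme Corp, Developer, 2019-2023 Python 4/5
Education Interests
MSc Computer Science, 2018 Hiking
"""

print(split_sections(single_column))
print(split_sections(two_column))
```

With these samples the single-column text splits into `experience` and `education`, while the two-column text yields an empty dict because no line is a clean heading anymore. You could patch that one case, but every patch is tuned to one layout, which is the generalization problem described above.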
All those "problems" apply to Word too. Maybe you use tables, maybe you use lists, maybe you use stars, maybe ... So there's no advantage in forcing people to use Word "because the machine can understand it better". Because that's a lie.