Large language models "versus" small: embrace diversity

A short example showing how natural language understanding (NLU) benefits from a mix of models.

👉 A critical task in NLU is understanding what "she" and "that choice" refer to in the following sentences: "Janet doesn't have a car. She said if she were to buy one, it would be red. Bob criticized that choice." (Multiple words or phrases referring to the same thing are called coreferences.) Without the ability to associate references with each other, a large part of the meaning is lost.

Ask GPT 3.5 Turbo, "In those sentences, what color car doesn't he like?" and it will answer correctly, "Bob doesn't like red cars." Not only did it correctly connect the references in the input sentences, it also resolved "he" in the question back to "Bob". Great!

👉 Job done? Turbo can identify coreferences? A huge step toward extracting maximum information from documents such as the following? (Taken from "Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright © 2021.")

<document>Victoria Chen, CFO of Megabucks Banking, saw her pay jump to $2.3 million, as the 38-year-old became the company's president. It is widely known that she came to Megabucks from rival Lotsabucks.</document>

(Can you spot the four references to Victoria Chen and the three references to Megabucks?)

👉 Let's try it!

User: "Resolve all coreferences in the following document. <Insert Chen document here>"

3.5 Turbo answer: "<document>Victoria Chen, CFO of Megabucks Banking, saw her pay jump to $2.3 million, as the 38-year-old became the company's president. It is widely known that Victoria Chen came to Megabucks from rival Lotsabucks.</document>"

Turbo resolved only one of the five references, correctly replacing "she" in the second sentence.

👉 Back to the drawing board

Let's try a much smaller but fine-tuned model on this particular task. The Fastcoref Python library's model is about 600M parameters, roughly 33× smaller than 3.5 Turbo.
(A guess at the size of the GPT 3.5 Turbo model is 20 billion parameters; only OpenAI knows for sure.)

Fastcoref answer: "Victoria Chen, CFO of Megabucks Banking, saw Victoria Chen, CFO of Megabucks Banking's pay jump to $2.3 million, as Victoria Chen, CFO of Megabucks Banking became Megabucks Banking's president. It is widely known that Victoria Chen, CFO of Megabucks Banking came to Megabucks Banking from rival Lotsabucks."

Fastcoref replaced all references to both Chen and Megabucks with the longest, most specific name. (Interestingly, it folded "Megabucks Banking" into Chen's longest name. Good or bad?)

👉 Wrapping up

Modern cars incorporate 50 to 100 microprocessors of all sizes and functions. In the future, I expect line-of-business apps to use dozens, if not hundreds, of models, from large and general-purpose to small and dedicated. The first microprocessors appeared in cars in the late 1970s. What will the ecosystem of models look like in 40 years?
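To make the replacement strategy concrete, here is a minimal pure-Python sketch of the rewrite step: given coreference clusters as character spans, replace every mention with the longest (most specific) mention in its cluster. The clusters below are hand-annotated for illustration only; in practice they would come from a coreference model such as fastcoref.

```python
def resolve(text, clusters):
    """Replace each mention with the longest mention in its cluster.

    clusters: list of clusters, each a list of (start, end) character spans.
    """
    replacements = []
    for cluster in clusters:
        # Pick the longest mention as the canonical, most specific name.
        canonical = max((text[s:e] for s, e in cluster), key=len)
        replacements.extend((s, e, canonical) for s, e in cluster)
    # Apply replacements right to left so earlier offsets stay valid.
    for s, e, canonical in sorted(replacements, reverse=True):
        text = text[:s] + canonical + text[e:]
    return text

doc = "Janet doesn't have a car. She said if she were to buy one, it would be red."
# Hand-annotated spans for "Janet", "She", "she" (illustrative only).
clusters = [[(0, 5), (26, 29), (38, 41)]]
print(resolve(doc, clusters))
# Janet doesn't have a car. Janet said if Janet were to buy one, it would be red.
```

The right-to-left substitution is the one design point worth noting: replacing a span changes the length of the text, so working from the end keeps the remaining spans' offsets correct without any bookkeeping.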