At least 10% of research may already be co-authored by AI

Was this paper written with the help of AI? It is a question ever more readers of scientific papers are asking. Large language models (LLMs) are now more than good enough to help write a scientific paper. They can breathe life into dense scientific prose and speed up the drafting process, especially for non-native English speakers. Such use also comes with risks: LLMs are particularly susceptible to reproducing biases, for example, and can churn out vast amounts of plausible nonsense. Just how widespread such use is, though, has been unclear.

In a preprint posted recently on arXiv, researchers based at the University of Tübingen in Germany and Northwestern University in America provide some clarity. Their research, which has not yet been peer-reviewed, suggests that at least one in ten new scientific papers contains material produced by an LLM. That means over 100,000 such papers will be published this year alone. And that is a lower bound. In some fields, such as computer science, over 20% of research abstracts are estimated to contain LLM-generated text. Among papers from Chinese computer scientists, the figure is one in three.

Spotting LLM-generated text is not easy. Researchers have typically relied on one of two methods: detection algorithms trained to recognise the tell-tale rhythms of human prose, and a more straightforward hunt for suspicious words disproportionately favoured by LLMs, such as “pivotal” or “realm”. Both approaches rely on “ground truth” data: one pile of texts written by humans and one written by machines. These are surprisingly hard to collect, because both human- and machine-generated text change over time as languages evolve and models are updated. Moreover, researchers typically collect LLM text by prompting the models themselves, and the way they do so may differ from the way scientists actually use them.
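
To make the word-hunting approach concrete, here is a minimal Python sketch, not any published detector. It assumes the two ground-truth piles are already in hand and flags words that turn up in a much larger share of machine-written texts than of human-written ones. The function names, the `min_ratio` threshold, the `smoothing` constant and the toy texts are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def doc_frequencies(texts):
    """Return, for each word, the fraction of texts that contain it at least once."""
    counts = Counter()
    for text in texts:
        # Count each word once per text, so long texts do not dominate.
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    return {word: n / len(texts) for word, n in counts.items()}


def suspicious_words(human_texts, machine_texts, min_ratio=2.0, smoothing=1e-3):
    """Flag words whose share of machine-written texts is at least `min_ratio`
    times their share of human-written texts.

    `smoothing` is added to both shares so that words absent from the human
    pile do not cause a division by zero.
    """
    human = doc_frequencies(human_texts)
    machine = doc_frequencies(machine_texts)
    ratios = {
        word: (share + smoothing) / (human.get(word, 0.0) + smoothing)
        for word, share in machine.items()
    }
    flagged = [word for word, ratio in ratios.items() if ratio >= min_ratio]
    # Most disproportionately "machine" words first.
    return sorted(flagged, key=lambda word: ratios[word], reverse=True)


# Two tiny, made-up ground-truth piles (hypothetical data for illustration).
human_pile = ["We measured the effect of the drug.", "The results show a small effect."]
machine_pile = ["We delve into the pivotal realm of the drug.", "This study delves into pivotal insights."]
print(suspicious_words(human_pile, machine_pile))
```

With piles this small, even a neutral word such as “into” gets flagged alongside “pivotal” and “realm”, which hints at why large, representative ground-truth corpora matter so much to this approach.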

The latest research, by Dmitry Kobak of the University of Tübingen and his colleagues, offers a third way that bypasses the need for ground-truth data altogether. The team’s method is inspired by demographic work on excess deaths, which allows the mortality associated with an event to be estimated from the difference between expected and observed death counts. Just as the excess-deaths method looks for abnormal death rates, their excess-vocabulary method looks for abnormal word use. Specifically, the researchers looked for words that appeared in scientific abstracts significantly more often than their frequency in the earlier literature would predict (see chart 1). The corpus they chose to analyse consisted of the abstracts of virtually all English-language papers available on PubMed, a search engine for biomedical research, published between January 2010 and March 2024: some 14.2m in all.
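
The logic of excess vocabulary can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors’ code: it projects each word’s expected frequency by extending the trend between two earlier years in a straight line (treating falling trends as flat), then compares the observed frequency with that projection, much as excess-mortality studies compare observed with expected deaths. The function names, thresholds and frequencies below are hypothetical.

```python
def expected_frequency(earlier, later, years_ahead=1):
    """Counterfactual share of abstracts containing a word, projected by
    extending the trend between two earlier years in a straight line.
    Falling trends are treated as flat (an assumption of this sketch)."""
    slope = max(later - earlier, 0.0)
    return later + years_ahead * slope


def excess(observed, earlier, later, years_ahead=1):
    """Gap and ratio between observed and expected frequency, analogous
    to observed minus expected deaths in excess-mortality studies."""
    expected = expected_frequency(earlier, later, years_ahead)
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return gap, ratio


def flag_excess_words(freqs, min_gap=0.01, min_ratio=1.5):
    """Return words whose observed frequency clearly outruns expectation.

    `freqs` maps each word to (two years ago, last year, this year)
    frequencies; both thresholds are arbitrary choices for illustration.
    """
    flagged = []
    for word, (earlier, later, observed) in freqs.items():
        gap, ratio = excess(observed, earlier, later)
        if gap >= min_gap or ratio >= min_ratio:
            flagged.append(word)
    return flagged


# Hypothetical frequencies (share of abstracts containing the word),
# listed as (two years ago, last year, this year).
freqs = {
    "delves": (0.0004, 0.0004, 0.011),
    "patients": (0.30, 0.30, 0.30),
    "potential": (0.20, 0.21, 0.26),
}
print(flag_excess_words(freqs))  # ['delves', 'potential']
```

Checking both an absolute gap and a ratio lets the sketch catch a common word that rises modestly, such as “potential” here, as well as a rare word that grows many times over from a low base, such as “delves”.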

The researchers found that in most years, word usage was relatively stable: in no year from 2013-19 did a word increase in frequency beyond expectation by more than 1%. That changed in 2020, when “SARS”, “coronavirus”, “pandemic”, “disease”, “patients” and “severe” all exploded. (Covid-related words continued to merit abnormally high usage until 2022.)

By early 2024, about a year after LLMs like ChatGPT had become widely available, a different set of words took off. Of the 774 words whose use increased significantly between 2013 and 2024, 329 took off in the first three months of 2024. Fully 280 of these were related to style rather than subject matter. Notable examples include “delves”, “potential”, “intricate”, “meticulously”, “crucial”, “significant” and “insights” (see chart 2).

The most likely reason for such increases, say the researchers, is help from LLMs. When they estimated the share of abstracts which used at least one of the excess words (omitting words which are widely used anyway), they found that at least 10% probably had LLM input. As PubMed indexes about 1.5m papers annually, that would mean that more than 150,000 papers per year are currently written with LLM assistance.
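
The lower-bound arithmetic can be sketched as follows, again as a hypothetical illustration rather than the study’s code. The share of abstracts using at least one marker word is measured, the share expected without LLMs is subtracted, and the remainder is taken as a floor on LLM use, since assisted abstracts that happen to avoid every marker are missed. The marker list, the abstracts and the 15% and 5% shares are made up; only the 10% and 1.5m figures come from the text above.

```python
import re


def share_with_any(abstracts, markers):
    """Fraction of abstracts containing at least one marker word."""
    hits = sum(
        1 for abstract in abstracts
        if markers & set(re.findall(r"[a-z]+", abstract.lower()))
    )
    return hits / len(abstracts)


def lower_bound_llm_share(observed_share, expected_share):
    """Excess share of abstracts using marker words. This is only a lower
    bound: LLM-assisted abstracts that avoid every marker go uncounted."""
    return max(observed_share - expected_share, 0.0)


# Toy abstracts and markers (hypothetical); real markers would exclude
# words that are widely used anyway, as the study did.
markers = {"delves", "intricate", "meticulously"}
abstracts = [
    "This study delves into tumour growth.",
    "We measured blood pressure in 200 adults.",
    "An intricate signalling pathway is described.",
    "Results are reported for three cohorts.",
]
print(share_with_any(abstracts, markers))  # 0.5

# Hypothetical observed and expected shares of marker-using abstracts.
print(round(lower_bound_llm_share(0.15, 0.05), 2))  # 0.1

# The article's arithmetic: a 10% lower bound on ~1.5m papers a year.
share = 0.10
papers_indexed_per_year = 1_500_000
print(round(share * papers_indexed_per_year))  # 150000
```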

LLM assistance seems to be more widespread in some fields than others. The researchers found that computer science had the most use, at over 20%, whereas ecology had the least, with a lower bound below 5%. There was also variation by geography: scientists from Taiwan, South Korea, Indonesia and China were the most frequent users, and those from Britain and New Zealand the least frequent (see chart 3). (Researchers from other English-speaking countries also used LLMs infrequently.) Different journals also yielded different results. Those in the Nature family, as well as other prestigious publications like Science and Cell, appear to have a low LLM-assistance rate (below 10%), while Sensors (a journal about, unimaginatively, sensors) exceeded 24%.

The excess-vocabulary method’s results are roughly consistent with those from older detection algorithms, which looked at smaller samples from more limited sources. For instance, in a preprint released in April 2024, a team at Stanford found that 17.5% of sentences in computer-science abstracts were likely to be LLM-generated. They also found a lower prevalence in Nature publications and mathematics papers (LLMs are terrible at maths). The excess vocabulary identified also fits with existing lists of suspicious words.

Such results should not be overly surprising. Researchers routinely acknowledge the use of LLMs to write papers. In one survey of 1,600 researchers conducted in September 2023, over 25% told Nature they used LLMs to write manuscripts. The largest benefit identified by the interviewees, many of whom studied or used AI in their own work, was to help with editing and translation for those who did not have English as their first language. Faster and easier coding came joint second, together with the simplification of administrative tasks; summarising or trawling the scientific literature; and, tellingly, speeding up the writing of research manuscripts.

For all these benefits, using LLMs to write manuscripts is not without risks. Scientific papers rely on the precise communication of uncertainty, for example, which is an area where the capabilities of LLMs remain murky. Hallucination—whereby LLMs confidently assert fantasies—remains common, as does a tendency to regurgitate other people’s words, verbatim and without attribution.

Studies also indicate that LLMs preferentially cite papers that are already highly cited in a field, potentially reinforcing existing biases and limiting creativity. As algorithms, they also cannot be listed as authors on a paper or be held accountable for the errors they introduce. Perhaps most worrying, the speed at which LLMs can churn out prose risks flooding the scientific world with low-quality publications.

Academic policies on LLM use are in flux. Some journals ban it outright. Others have changed their minds. Until November 2023, Science labelled all LLM text as plagiarism, saying: “Ultimately the product must come from—and be expressed by—the wonderful computers in our heads.” It has since amended its policy: LLM text is now permitted if detailed notes on how the models were used are provided in the methods section of papers, as well as in accompanying cover letters. Nature and Cell also allow its use, as long as it is acknowledged clearly.

How enforceable such policies will be is not clear. For now, no reliable method exists to flush out LLM prose. Even the excess-vocabulary method, though useful at spotting large-scale trends, cannot tell if a specific abstract had LLM input. And researchers need only avoid certain words to evade detection altogether. As the new preprint puts it, these are challenges that must be meticulously delved into.

© 2024, The Economist Newspaper Limited. All rights reserved. From The Economist, published under licence. The original content can be found on www.economist.com