Researchers are figuring out how large language models work

LLMs are built using a technique called deep learning, in which a network of billions of neurons, simulated in software and modelled on the structure of the human brain, is exposed to trillions of examples of something to discover inherent patterns. Trained on text strings, LLMs can hold conversations, generate text in a variety of styles, write software code, translate between languages and more besides.

Models are essentially grown, rather than designed, says Josh Batson, a researcher at Anthropic, an AI startup. Because LLMs are not explicitly programmed, nobody is entirely sure why they have such extraordinary abilities. Nor do they know why LLMs sometimes misbehave, or give wrong or made-up answers, known as “hallucinations”. LLMs really are black boxes. This is worrying, given that they and other deep-learning systems are starting to be used for all kinds of things, from offering customer support to preparing document summaries to writing software code.

It would be helpful to be able to poke around inside an LLM to see what is going on, just as it is possible, given the right tools, to do with a car engine or a microprocessor. Being able to understand a model’s inner workings in bottom-up, forensic detail is called “mechanistic interpretability”. But it is a daunting task for networks with billions of internal neurons. That has not stopped people trying, including Dr Batson and his colleagues. In a paper published in May, they explained how they have gained new insight into the workings of one of Anthropic’s LLMs.

One might think individual neurons inside an LLM would correspond to specific words. Unfortunately, things are not that simple. Instead, individual words or concepts are associated with the activation of complex patterns of neurons, and individual neurons may be activated by many different words or concepts. This problem was pointed out in earlier work by researchers at Anthropic, published in 2022. They proposed—and subsequently tried—various workarounds, achieving good results on very small language models in 2023 with a so-called “sparse autoencoder”. In their latest results they have scaled up this approach to work with Claude 3 Sonnet, a full-sized LLM.

A sparse autoencoder is, essentially, a second, smaller neural network that is trained on the activity of an LLM, looking for distinct patterns in activity when “sparse” (ie, very small) groups of its neurons fire together. Once many such patterns, known as features, have been identified, the researchers can determine which words trigger which features. The Anthropic team found individual features that corresponded to specific cities, people, animals and chemical elements, as well as higher-level concepts such as transport infrastructure, famous female tennis players, or the notion of secrecy. They performed this exercise three times, identifying 1m, 4m and, on the last go, 34m features within the Sonnet LLM.
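To make the idea concrete, here is a minimal sketch of the forward pass and training objective of a sparse autoencoder, written in plain Python. It is an illustration under stated assumptions, not Anthropic's implementation: the function names, the tiny hand-picked weights and the `l1_coeff` penalty weight are all hypothetical, and a real system would learn its weights from a very large number of recorded LLM activations.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def encode(activations: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """Map LLM activations to feature strengths; ReLU zeroes out negatives."""
    pre = matvec(w_enc, activations)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]


def decode(features: Vector, w_dec: Matrix) -> Vector:
    """Rebuild an approximation of the activations from the features."""
    return matvec(w_dec, features)


def sae_loss(activations: Vector, w_enc: Matrix, b_enc: Vector,
             w_dec: Matrix, l1_coeff: float) -> float:
    """Reconstruction error plus an L1 penalty that rewards few active features."""
    features = encode(activations, w_enc, b_enc)
    recon = decode(features, w_dec)
    reconstruction_error = sum((a - r) ** 2 for a, r in zip(activations, recon))
    sparsity_penalty = sum(abs(f) for f in features)
    return reconstruction_error + l1_coeff * sparsity_penalty


# Hypothetical toy weights: 2 "neurons" in the LLM, 3 candidate features.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
B_ENC = [0.0, 0.0, 0.0]
W_DEC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

toy_activations = [2.0, 0.0]
toy_features = encode(toy_activations, W_ENC, B_ENC)
toy_loss = sae_loss(toy_activations, W_ENC, B_ENC, W_DEC, l1_coeff=0.1)
```

In this toy case only one of the three features is active for the input, and the activations are rebuilt exactly, so the loss consists of the sparsity penalty alone. Training nudges the weights towards that kind of sparse, readable code.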

The result is a sort of mind-map of the LLM, showing a small fraction of the concepts it has learned about from its training data. Places in the San Francisco Bay Area that are close geographically are also “close” to each other in the concept space, as are related concepts, such as diseases or emotions. “This is exciting because we have a partial conceptual map, a hazy one, of what’s happening,” says Dr Batson. “And that’s the starting point—we can enrich that map and branch out from there.”

Focus the mind

As well as seeing parts of the LLM light up, as it were, in response to specific concepts, it is also possible to change its behaviour by manipulating individual features. Anthropic tested this idea by “spiking” (ie, turning up) a feature associated with the Golden Gate Bridge. The result was a version of Claude that was obsessed with the bridge, and mentioned it at any opportunity. When asked how to spend $10, for example, it suggested paying the toll and driving over the bridge; when asked to write a love story, it made up one about a lovelorn car that could not wait to cross it.
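One plausible way to picture “spiking” a feature is as nudging the model's internal activations along the direction associated with that feature. The sketch below assumes exactly that and nothing more: the `steer` function, the two-number activation vector and the `bridge_direction` values are hypothetical stand-ins, and the procedure Anthropic actually used may differ in its details.

```python
from typing import List

Vector = List[float]


def steer(activations: Vector, feature_direction: Vector, strength: float) -> Vector:
    """Shift activations along a feature's direction.

    A positive strength turns the feature up; a negative one turns it down.
    """
    return [a + strength * d for a, d in zip(activations, feature_direction)]


def feature_strength(activations: Vector, feature_direction: Vector) -> float:
    """How strongly the activations point along the feature (a dot product)."""
    return sum(a * d for a, d in zip(activations, feature_direction))


# Hypothetical values: a real model has thousands of dimensions per layer.
original = [1.0, 0.0]
bridge_direction = [0.0, 1.0]

spiked = steer(original, bridge_direction, strength=5.0)
suppressed = steer(spiked, bridge_direction, strength=-5.0)
```

Here `spiked` points far more strongly along the chosen direction than `original` does, and applying the opposite strength brings the activations back to where they started. That reversal is the same operation one would use to damp a feature down rather than turn it up.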

That may sound silly, but the same principle could be used to discourage the model from talking about particular topics, such as bioweapons production. “AI safety is a major goal here,” says Dr Batson. It can also be applied to behaviours. By tuning specific features, models could be made more or less sycophantic, empathetic or deceptive. Might a feature emerge that corresponds to the tendency to hallucinate? “We didn’t find a smoking gun,” says Dr Batson. Whether hallucinations have an identifiable mechanism or signature is, he says, a “million-dollar question”. And it is one addressed, by another group of researchers, in a new paper in Nature.

Sebastian Farquhar and colleagues at the University of Oxford used a measure called “semantic entropy” to assess whether a statement from an LLM is likely to be a hallucination or not. Their technique is quite straightforward: essentially, an LLM is given the same prompt several times, and its answers are then clustered by “semantic similarity” (ie, according to their meaning). The researchers’ hunch was that the “entropy” of these answers—in other words, the degree of inconsistency—corresponds to the LLM’s uncertainty, and thus the likelihood of hallucination. If all its answers are essentially variations on a theme, they are probably not hallucinations (though they may still be incorrect).
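The recipe described above (sample several answers, group them by meaning, measure how spread out the groups are) can be sketched in a few lines of Python. This is a simplified illustration, not the Oxford group's code: `toy_same_meaning` is a crude, hypothetical stand-in for a proper meaning comparison, the sample answers are invented placeholders, and the entropy is computed from the share of answers in each cluster using the natural logarithm.

```python
import math
from typing import Callable, List


def cluster_by_meaning(answers: List[str],
                       same_meaning: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedily group answers: each joins the first cluster it agrees with."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])  # no match found: start a new cluster
    return clusters


def semantic_entropy(answers: List[str],
                     same_meaning: Callable[[str, str], bool]) -> float:
    """Entropy over meaning clusters: 0 when all answers agree, higher when they scatter."""
    clusters = cluster_by_meaning(answers, same_meaning)
    total = len(answers)
    return sum((len(c) / total) * math.log(total / len(c)) for c in clusters)


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical stand-in: treat answers as equivalent if their words match."""
    def normalise(s: str) -> frozenset:
        return frozenset(s.lower().replace(".", "").replace(",", "").split())
    return normalise(a) == normalise(b)


# Invented sample answers, for illustration only.
consistent = ["Portugal", "portugal.", "Portugal"]
scattered = ["function A", "function B", "function C"]

low = semantic_entropy(consistent, toy_same_meaning)
high = semantic_entropy(scattered, toy_same_meaning)
```

With these inputs `low` is zero, because every answer lands in one cluster, while `high` equals the logarithm of three, the maximum for three answers. A real system would need a far better judge of meaning than word matching, since two answers can say the same thing in entirely different words.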

In one example, the Oxford group asked an LLM which country is associated with fado music, and it consistently replied that fado is the national music of Portugal, which is correct and not a hallucination. But when asked about the function of a protein called StarD10, the model gave several wildly different answers, which suggests hallucination. (The researchers prefer the term “confabulation”, a subset of hallucinations they define as “arbitrary and incorrect generations”.) Overall, this approach was able to distinguish between accurate statements and hallucinations 79% of the time, ten percentage points better than previous methods. This work is complementary, in many ways, to Anthropic’s.

Others have also been lifting the lid on LLMs: the “superalignment” team at OpenAI, maker of GPT-4 and ChatGPT, released its own paper on sparse autoencoders in June, though the team has now been dissolved after several researchers left the firm. But the OpenAI paper contained some innovative ideas, says Dr Batson. “We are really happy to see groups all over, working to understand models better,” he says. “We want everybody doing it.”

© 2024, The Economist Newspaper Limited. All rights reserved. From The Economist, published under licence. The original content can be found on www.economist.com
