
Claude 3.5 Sonnet, literary analysis capabilities, an experiment

Some in the AI observer community have been suggesting of late that 'AI has plateaued', which reveals, to me, a lack of understanding of how models develop. What we are witnessing is not iterative design but step changes. The differences between Claude 3 Sonnet and 3.5 Sonnet are stark.

One test I often carry out to assess the current capabilities of LLMs is a simple prompt asking the model to analyse an unpublished poem, commenting on the style and literary devices employed. The outputs have improved significantly over the 18 months I have used this approach. This is my most recent attempt.

The poem is self-written and self-published (so not widely available), which makes it unlikely to have found its way into the training data. For added context, implied in the text: it was written during a residency at a Museum and Art Gallery and came from a conversation with a member of staff about his late father. Calm is museum archival software.


PROMPT. "analyse this poem for style and literary devices employed: "


Concrete Boxes


A swing of the garage door

revealed

box upon box upon box

box upon box upon box

box upon box upon box

box upon box upon box

box upon box upon box

box upon box upon box


of stacked plastic brimmed

with the collected and uncollectable.

Boxes of beer mats and then

another box of match stubs,

so much boxed content, so little space

remaining in my late fathers garage.


Back at work and a

term is typed in Calm,

number retrieved and read

from screen then descent by lift

the rolling of white metal shelving

to disclose rows of boxes.


Number is found, written in pencil upon

a heavy fawn box, which is extricated:

lid lifted to reveal a row of paper

wrapped parcels, each tied with care

with a beige material bow.

One parcel is extracted


bow pulled to release

the tight wrap of paper that protects

the glass photographic plate

upon which is imprinted

portraits of an Edwardian

family life, in the spacious garden

of a terraced home.


Answer: Certainly. Let's analyze the poem "Concrete Boxes":


Style:

This poem is also written in free verse, without a consistent rhyme scheme or meter. It's composed of short, descriptive lines that create a visual and sensory experience for the reader. The style is observational and reflective, with a focus on concrete details that carry symbolic weight.


Literary Devices:

1. Repetition:

   The repeated lines "box upon box upon box" create a visual representation of the stacked boxes and emphasize their overwhelming presence.

2. Imagery:

   Strong visual imagery throughout, particularly in descriptions of the garage, the archive at work, and the Edwardian family photograph.

3. Juxtaposition:

   The contrast between the cluttered garage and the organized archive, as well as between past and present.

4. Symbolism:

   The boxes symbolize memories, history, and the accumulation of life's artifacts.

5. Alliteration:

   "beer mats" and "match stubs"

6. Assonance:

   "white" and "slide" in the description of the archive shelves

7. Enjambment:

   Used throughout to create a flow between lines and stanzas, mimicking the process of discovery and revelation.

8. Metaphor:

   The boxes themselves can be seen as a metaphor for compartmentalized memories or lives.

9. Sensory Details:

   Visual and tactile details bring the scenes to life, such as "heavy fawn box" and "beige material bow"

10. Structure:

    The poem is divided into three distinct sections: the garage, the work archive, and the photograph, creating a journey through different spaces and times.

11. Contrast:

    The cluttered, personal nature of the father's garage versus the organized, impersonal work archive.


Themes and Analysis:

The poem explores themes of memory, preservation, and the passage of time. It moves from a personal space (the late father's garage) to a professional one (the archive at work), and then to a historical moment captured in a photograph.

The garage scene represents personal history and the accumulation of a life lived, with its seemingly random collections of items. The work archive represents a more organized approach to preserving history. The photograph at the end bridges these two worlds, being a carefully preserved item that captures a moment in time.

The journey from the chaotic garage to the precise retrieval of the photograph suggests a process of making sense of the past, of finding order in the chaos of memories and possessions left behind.

The final image of the Edwardian family in their garden creates a poignant contrast with the boxed-up memories in the garage and the sterile archive. It suggests that despite our efforts to categorize and store our past, the most vivid memories are those that capture moments of life and human connection.

Overall, the poem is a meditation on how we store, access, and interpret the past, both personally and collectively. It uses concrete, vivid imagery to explore abstract concepts of memory and time.

Initial Impression

Frankly, I was surprised at the accuracy; many people I know would not be able to articulate this level of comprehension. Although not a 'difficult poem', it is nuanced and structured, which the analysis captured.

Considering that over a third (36%)* of UK adults have admitted to having reading, writing, grammar, comprehension and spelling difficulties, it should be apparent that more advanced LLMs already possess utility for many.


*As found during British Dyslexia Awareness Week (3-9 October 2022). The study by Censuswide, commissioned by Texthelp, surveyed over 2,000 UK adults across the country. 3 Oct 2022.

For another writer's opinion of the capabilities of Claude 3.5, it's worth watching The Nerdy Novelist's comparison with other LLMs:

https://www.youtube.com/live/ELkpcIyllWc?si=nz2avGUUUqMuMEGj


A further experiment, mimicry

I prompted Claude 3.5 Sonnet with several other poems, to give it a broader basis of reference, with the goal of seeing how well it could mimic a literary style (and therefore how replaceable human writers currently are, using the best available model). The results were interesting. Whilst (arguably!) the LLMs are not there yet, for 60 to 70% of writing jobs they are probably sufficient, often better and certainly orders of magnitude quicker. Here's the experiment:

Prompt: 

Could you write a poem about a worker faced with the fear of AI in a style that mimics the author?


Answer: 


Certainly. Here's an attempt to write a poem about a worker facing the fear of AI, mimicking the style of the author we've been analyzing:

Circuitry of Concern

Morning light filters through venetian slats,
casting zebra stripes across the ergonomic chair,
its lumbar support a cold comfort

as I boot up the company-issued laptop.
Emails populate the inbox:
'AI Integration Update'
'Efficiency Optimization Seminar'
'Your Role in the Digital Transformation'

Each subject line a pixel in the mosaic of unease,
each unread message a potential digital pink slip.
Coffee grows cold in the 'World's Best Dad' mug,
a relic from a time when flesh and blood
seemed irreplaceable.

Lines of code scroll past unseeing eyes,
a cascade of ones and zeros that might
render my decades of experience
into so much obsolete data.

In the break room, whispers of neural networks
and machine learning algorithms
replace water cooler gossip.

I return to my cubicle, fingers hovering
over keys that feel suddenly foreign,
as if my hands have already forgotten their purpose.

Outside, a pigeon pecks at discarded circuitry,
unaware that its simple brain
might soon outpace my own in market value.

The clock ticks towards five,
each second a byte of time
I can't afford to waste.

I stay late, illuminated by the harsh glow
of the monitor, trying to prove
my humanity through productivity,
as somewhere, silently, efficiently,
an AI learns to do my job.


This poem attempts to capture the author's style through:

Focus on everyday details and objects
Use of specific imagery (venetian slats, ergonomic chair)
Incorporation of technology-related terminology
Juxtaposition of the mundane and the profound
Subtle social commentary
Use of metaphor and symbolism
A contemplative, somewhat melancholic tone
Free verse structure with varied stanza lengths
Enjambment to create flow between lines
A blend of concrete observations and abstract reflections

Prompt: 

That is a reasonable attempt at mimicking the style, though I'd suggest it lacks some of the subtlety and word choice of the author. What other differences do you ascertain?

Answer: 


Thank you for the feedback. You're right, and I appreciate your insight. Let me analyze the differences between my attempt and the author's style more critically:

1. Subtlety: You correctly point out that my attempt lacks the subtlety of the original author. The fear of AI is more explicitly stated in my poem, whereas the original author tends to let themes emerge more organically through imagery and implication.

2. Word choice: The vocabulary in my attempt, while trying to be specific, doesn't quite capture the precision and unexpectedness of the author's diction. The original poems use more surprising and evocative word combinations.

3. Imagery: While I attempted to include specific images, they lack the vivid, sensory richness found in the author's work, particularly in poems like "The feel of" where the tactile experience is so powerfully conveyed.

4. Tone: My attempt has a more overtly anxious and pessimistic tone, whereas the original author's work often maintains a more neutral, observational stance, allowing the reader to draw their own emotional conclusions.

5. Structure: The stanza breaks in my attempt feel more conventional and less organic than those in the author's work, which often use structure to enhance meaning in more subtle ways.

6. Metaphor: The metaphors in my attempt (like "pixel in the mosaic of unease") are more explicit and less integrated than those in the author's work, where metaphors often arise naturally from the observed environment.

7. Cultural references: While I included some technological terms, the original author's work often incorporates more nuanced cultural touchstones that add depth and context.

8. Personal vs. Universal: My attempt leans more heavily on personal anxiety, whereas the author's work often bridges personal experience with more universal themes in a more seamless way.

9. Rhythm: The original author's work often has a more natural, speech-like rhythm that my attempt doesn't fully capture.

10. Depth of observation: The author's poems often reveal layers of meaning through seemingly simple observations, a depth that my attempt doesn't quite achieve.

These differences highlight the unique voice and skill of the original author, and the challenges in truly mimicking another writer's style. It's a reminder of the complex interplay of elements that make up a poet's distinctive approach to language and subject matter.

Conclusion

As far as writing is concerned, and this may apply equally to other areas such as coding, the analytical abilities of LLMs surpass their 'creative' abilities. They are still useful tools.
I made a follow-up prompt, asking it to rewrite the poem based on its own analysis. It did improve a fair amount.

Whilst LLMs are not yet at the stage of surpassing human abilities, and are arguably a useful adjunct or assistant to them, they are still capable of replacing an increasing proportion of human capabilities across multiple areas. This comes at a significant cost, particularly in terms of natural and human resources. What they will be able to accomplish next year is becoming clear with a greater degree of certainty: they will present more of a challenge than an opportunity for an increasing number of knowledge workers, and may soon seriously extend that capability to many more creative industry workers too.

