Gen AI Biases and Homogenization of Language: Still A Cautious Integrationist

I am continuing my journey through Sidney I. Dobrin's book AI and Writing. I've reached the section where Dobrin discusses effective prompting, uses for visuals, and the biases and other issues that come with using GenAI. Personally, I don't like that the author leaves so much of the messaging that questions GenAI's use until the end of the book. I also think the approach is too soft, too forgiving of GenAI.

So, if best practice is that we need to fact-check and stay alert for biases and hallucinations, I keep coming back to the question: why use it at all? LLMs are trained on the data they're fed. We've seen enough of how the Internet works to know they've been fed a wide range of content that is explicitly racist and sexist, for instance, which means LLMs are trained on that content and it may show up in their output. We also know they trace patterns and reproduce them, which means they reproduce mainstream voices and thinking. This has long been a topic of consideration in the field.

Specifically, there’s a body of work and a turn toward linguistic justice that pushes back against the homogenization of language (which is mentioned in the refusal-of-GenAI work happening in the field). We’ve long talked about how the books and texts that get popularized and reproduced, the voices we hear in the mainstream narrative, are often the privileged voices of white, straight men. When our history books and libraries primarily share these kinds of narratives, the result is a skewed history and view of the world. Even when minoritized voices make it into the history books or libraries, they are overwhelmed by the sheer number of voices reiterating the mainstream narrative. I mention this because the problem of LLMs reproducing mainstream voices, thinking, styles of writing, and the like has long been a problem in the field. We are yet again faced with an instance where dominant narratives, whatever those might be, are being pushed, and alternate viewpoints are difficult to find and access. The field’s turn to linguistic justice is an attempt to remedy that, though it is a subset of the work happening in the field and, from my perspective, isn’t widely accepted yet.

It’s also noteworthy that LLMs and technology have biases built into them, quite literally. Selfe and Selfe have written about the ways designers of technology embed their own assumptions and biases about the world into what they make (arguably, I would say we all do this in our work), and LLMs fall prey to the same issue. Biases will appear in the literal coding and programming that makes them work. I am not well-versed in coding and programming, though I know enough to know that the language used, and the capabilities of that language, shape the outcome. Often there are multiple ways to reach the same outcome, and in choosing one the programmer is making assumptions about which pathway they think is best; with my admittedly limited understanding, I would say it’s possible the logic of the program itself could show up as bias in its output at some point. In essence, those are rhetorical choices. This reminds me that I should look more into the rhetoric of code and programming, which I know some folks in the field are working on and exploring.

Dobrin also rightfully mentions that LLMs are limited to the data sets available to them, in part because of copyright and intellectual property. However, this leaves me concerned and wondering how copyright and intellectual property are being defined to begin with. Again, I have a limited understanding of how this works, but it sounds like massive segments of the Internet have simply been scraped. So, if a model has access to ebooks and academic articles, would that be a violation of copyright and/or intellectual property? I suppose probably not, because when publishing we grant the publisher certain permissions. But again, it calls into question the ethical use of other people’s work, and whether they even know it’s happening. Personally, it doesn’t feel good to think that anything I put out into the world can be repackaged and reused however someone likes. And yet, I come back to the idea that this is what it means to be a digital citizen: I sign over my rights to many things and become complicit in the larger system.

Relatedly, I spent time with the refusing-GenAI materials I mentioned in my last blog. There really isn’t much I disagree with, and several of their points appear here and in my previous blog (concerns about IP/copyright, homogenization of language, biases embedded into technologies and systems). Where I still end up, though, is that it’s not going away. I did hear (about a month late) a news report that GenAI (I believe ChatGPT) hasn’t found a way to be financially viable. So, I suppose it could still fall out from under us, though I think it would just make way for the next iteration and won’t truly go away.

Finally, as part of my teaching and trying to give GenAI a “fair shake,” I’ve been taking workshops and have signed up for an online course on integrating GenAI into teaching. Yesterday, a workshop attendee shared how it’s still getting things wrong in her content area (I think it was something about the immune system? She asked it to create a study guide, and it missed critical parts), so we know it’s got problems. However, I personally learned more about the importance of effective prompting to get more effective output, and I found that helpful.

I’ve heard that GenAI is useful for looking at job ads and helping to prepare job materials, so I prompted GenAI for ideas for a lesson plan introducing how GenAI could be used that way. I still feel that, while it gave me ideas I used as a starting place, it’s not producing content someone can take wholesale. I have to combine it with my own content knowledge for the course, the context of the class and students, and more to pull something together. Certainly, I could’ve gone back and forth with GenAI or included all of those details in the initial prompt, but it still feels like a more efficient use of my time, and the end product feels more “mine,” if I use the output as a starting place and make heavy revisions.

Anyway, the outcome was that several students shared that they found value in the task. I did not require that they use it, so I offered two in-class prompts (one for using GenAI and one for not using it). In looking over their work, so far most chose to use it. They shared that they found it helpful and informative, and that it helped them hone in on the key details of a job ad without getting lost in the minutiae. One student shared that they felt more confident in their own assessment and analysis of what the job ad was asking for. Another shared that it gave them the idea to mention their hobby of 3D printing in their job materials. So, it was fruitful, which is both concerning (because GenAI!) and of value, as the students who used it seemed to have generally positive responses. (I should note at least one person said they felt it would’ve been more effective and efficient to do it on their own without GenAI’s help.)

So, I’m going to plug along as a cautious integrationist and continue to read, think, and write about it all.
