Texas to use AI scoring system to grade state-mandated exams

The Texas Education Agency (TEA) is bringing Generative Artificial Intelligence (Gen AI) into its scoring system. The new grading technique reportedly relies on natural language processing, the same underlying technology that powers chatbots like OpenAI’s ChatGPT.

Texas will hire far fewer human evaluators this year because it is replacing many of them with a new AI-powered scoring system. The State of Texas Assessments of Academic Readiness (STAAR) exams could become a testbed for replacing most human graders with Gen AI.

Texas training Gen AI scoring system to replace human evaluators

The Texas Education Agency has reportedly confirmed that it is “rolling out an automated scoring engine for open-ended questions on the State of Texas Assessment of Academic Readiness for reading, writing, science and social studies.”

This year’s state-mandated exams in Texas are going to be historic. Students taking their STAAR exams this week will have far fewer human graders than last year’s students did. Most of their answers will instead be evaluated by an automated engine built on natural language processing, a technology closely related to Generative AI.

The STAAR test measures students’ understanding of the state-mandated core curriculum. Texas redesigned it last year, and the new version has far fewer multiple-choice questions. Many of them have been replaced with “open-ended questions,” also called “constructed response items.”

Texas is rolling out an “automated scoring engine” to score the STAAR test. The technology, which uses natural language processing, a building block of AI chatbots, will save the state $15-20 million. But some educators are worried.

New in @TexasTribune https://t.co/Tu36tmF5B7

— Keaton Peters (@KeatonPeters) April 10, 2024

According to the Texas Tribune, the redesigned test has “six to seven times more constructed response items” than before.

Simply put, an open-ended question can have many acceptable answers, whereas a multiple-choice question has only one. As a result, these questions take far more time and many more evaluators to score, said Jose Rios, director of student assessment at the Texas Education Agency.

In other words, open-ended questions make grading significantly more complex, and this is the workload the automated engine is meant to take on. Language models such as ChatGPT can work with free-form text at varying levels of simplicity and depth, which is what makes them attractive for this kind of task.

Texas estimates Gen AI will save $15 million to $20 million each year

The TEA trained the scoring engine on 3,000 responses. As a safety precaution, every one of those training answers had already gone through two rounds of human scoring. From them, the engine has reportedly learned the characteristics of responses, and it is programmed to assign the same scores a human would have given.

Texan children taking the STAAR test will apparently be graded by AI. If you disagree with your child's score, they'll gladly grade it again with a human grader – for $50.#STAAR #Texas https://t.co/6Qr36uCYaU

— no, this is just soup for my family🐀 (@SeanxTyler) April 10, 2024

Human graders will re-score a quarter of all computer-graded responses. In addition, answers that may confuse the AI scoring system, such as those containing slang or written in a language other than English, will be passed on to human evaluators.
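The review process described above can be pictured as a simple routing step. The sketch below is purely illustrative: the TEA has not published how its engine decides which answers go to humans, so it assumes the engine reports a per-answer confidence score and treats low-confidence answers (which might include slang or non-English text) as needing a human. Every name, field, and threshold here is hypothetical; only the one-in-four human re-scoring rate comes from the reporting.

```python
import random
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical record for one constructed response."""
    response_id: str
    text: str
    engine_confidence: float  # assumed 0.0-1.0 score reported by the engine


HUMAN_REVIEW_RATE = 0.25    # "a quarter of all computer-graded results"
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff, not a published TEA value


def needs_human_first(resp: Response) -> bool:
    """Placeholder check: low-confidence answers go straight to a human grader."""
    return resp.engine_confidence < CONFIDENCE_THRESHOLD


def route(responses, seed=0):
    """Split responses into human-only and machine-scored groups,
    then sample a quarter of the machine-scored ones for a human re-score."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    human_only, machine_scored = [], []
    for resp in responses:
        if needs_human_first(resp):
            human_only.append(resp)
        else:
            machine_scored.append(resp)
    k = round(len(machine_scored) * HUMAN_REVIEW_RATE)
    audit = rng.sample(machine_scored, k)
    return human_only, machine_scored, audit
```

With eight answers, four of them low-confidence, `route` would send four straight to humans, let the engine score the other four, and pick one of those four for a second, human read. A real system would replace the confidence check with whatever criteria the engine actually uses.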

The TEA estimates it will save $15–20 million per year by needing fewer temporary human scorers. Texas plans to hire fewer than 2,000 human graders this year, compared with about 6,000 evaluators hired for the same exam in 2023. Several educators have expressed concern about the new evaluation technique.

The digital era has dawned on the Texas #STAAR test. In December 2023, the results of the first computer-graded written responses came to light. With the new grading system, 79% of testers scored a zero. Only 8% of testers scored a zero in a previous test with human graders.#TxEd pic.twitter.com/Kwj7FYUh16

— RaiseYourHandTexas (@RYHTexas) April 10, 2024