Musk’s ExTwitter Fumbles Copyright Law, Loses Data Scraping Lawsuit

Techdirt. 2024-05-15

Is there any law that Elon Musk actually understands?

The latest is that he’s lost yet another lawsuit, this time (in part) for not understanding copyright law.

There have been a variety of lawsuits regarding data scraping over the past decade, and we’ve long argued that such scraping should be allowed under the law (though sites are free to take technical measures to try to block it). Some of these issues are at stake in the recent Section 230 lawsuit that Ethan Zuckerman filed against Meta. That one is more about middleware/API access.

But pure “scraping” has come up in a number of cases, most notably the LinkedIn / HiQ case, where the 9th Circuit has said that scraping of public information is not a violation of the CFAA, as it was not “unauthorized access.” But the follow-up to that case was that the court still blocked HiQ from scraping LinkedIn, in part because of LinkedIn’s user agreement.

This has created a near-total mess, where it is not at all clear whether scraping public data on the internet is actually allowed.

This has only become more important in the last few years with the rise of generative AI and the need to get access to as much data as possible to train on.

Internet companies have been pushing to argue that their terms of service can block all kinds of scraping, perhaps relying on the eventual injunction blocking HiQ. Both Meta and ExTwitter sued a scraping company, Bright Data, arguing that its scraping violated their terms of service.

In January, Meta’s case against Bright Data was dismissed at the summary judgment stage. The judge in that case, Edward Chen, found that Meta’s terms of service clearly do not prohibit logged-off scraping of public data.

Now, ExTwitter’s lawsuit against the same company has reached a similar conclusion.

This time, it’s Judge William Alsup, who has dismissed the case for failure to state a claim. Alsup’s decision is a bit more thorough. It highlights that there are two separate issues here: whether accessing ExTwitter’s systems for scraping violated its terms of service and, separately, whether scraping and selling the data did.

On the access side, the judge is not convinced by any of the arguments. It’s not trespass to chattels, because that requires some sort of injury, and the complaint doesn’t plausibly allege one:

Critically, the instant complaint alleges no such impairment or deprivation. X Corp. parrots elements, reciting that Bright Data’s “acts have caused injury to X Corp. and . . . will cause damage in the form of impaired condition, quality, and value of its servers, technology infrastructure, services, and reputation” (Amd. Compl. ¶ 102). Its lone deviation from that parroting — a conclusory statement that Bright Data’s “acts have diminished the server capacity that X Corp. can devote to its legitimate users” — fails to move the needle (Amd. Compl. ¶ 98). To say nothing of the fact that, as alleged, Bright Data and its customers are legitimate X users (subject to the Terms), the scraping tools and services they use are reliant on X Corp.’s servers functioning exactly as intended.

It’s not fraud under California law, because there’s no misrepresentation:

Starting with the argument that Bright Data’s technology and tools misrepresented requests, remember X Corp. does not allege that Bright Data or its customers have used their own registered accounts, or any other registered accounts, to scrape data from X, i.e., to access X by sending requests to X Corp.’s servers (for extracting and copying data). Meanwhile, X Corp. acknowledges that one does not need a registered account to access X and send such requests (see Amd. Compl. ¶ 22). X Corp. also acknowledges that X users with registered accounts can access X and send such requests without logging in to their registered accounts

And it’s not tortious interference with a contract, because, again, there’s no damage:

Among the elements of a tortious-interference claim is resulting damage. Pac. Gas & Elec., 791 P.2d at 590. The only damage that X Corp. plausibly pleaded in the instant complaint is that resulting from scraping and selling of data and, by extension, inducing scraping. X Corp. has not alleged any damage resulting from automated access to systems and, by extension, inducing automated access. As explained above, X Corp. has pleaded no impairment or deprivation of X Corp. servers resulting from sending requests to those servers. And, thin allusions to server capacity that could be devoted to “legitimate users” and reputational harm — not redressable under trespass to chattels as a matter of law — are simply too conclusory to be redressable at all. X Corp. will be allowed to seek leave to amend to allege damage (if any) resulting from automated access, as set out at the end of this order. But the instant complaint has failed to state a claim for tortious interference based on such access.

As for the scraping and selling of data, well, there’s no breach there either. And here we get into the copyright portion of the discussion. The question is who has the rights over this particular data. ExTwitter is claiming, somehow, that it has the right to stop scrapers because it has some rights over the data. But, the content is from users. Not ExTwitter. And that’s an issue.

Judge Alsup notes that ExTwitter’s terms give it a license to the content users post, but that’s a copyright license. Not a license to then do other stuff, such as suing others for copying it.

Note the rights X Corp. acquires from X users under the non-exclusive license closely track the exclusive rights of copyright owners under the Copyright Act. The license gives X Corp. rights to reproduce and copy, to adapt and modify, and to distribute and display (Terms 3–4). Section 106 of the Act gives “the owner of copyright . . . the exclusive rights to do and to authorize any of the following”: “to reproduce . . . in copies,” “to prepare derivative works,” “to distribute copies . . . to the public by sale,” and “to display . . . publicly.” 17 U.S.C. § 106. But X Corp. disclaims ownership of X users’ content and does not acquire a right to exclude others from reproducing, adapting, distributing, and displaying it under the non-exclusive license

Alsup notes that ExTwitter could, in theory, acquire the copyright on all content published on the platform instead of licensing it. However, he claims that it doesn’t do this because it would jeopardize the company’s safe harbors, including its Section 230 immunity:

One might ask why X Corp. does not just acquire ownership of X users’ content or grant itself an exclusive license under the Terms. That would jeopardize X Corp.’s safe harbors from civil liability for publishing third-party content. Under Section 230(c)(1) of the Communications Decency Act, social media companies are generally immune from claims based on the publication of information “provided by another information content provider.” 47 U.S.C. § 230(c)(1). Meanwhile, under Section 512(a) of the Digital Millenium Copyright Act (“DMCA”), social media companies can avoid liability for copyright infringement when they “act only as ‘conduits’ for the transmission of information.” Columbia Pictures Indus., Inc. v. Fung, 710 F.3d 1020, 1041 (9th Cir. 2013); 17 U.S.C. § 512(a). X Corp. wants it both ways: to keep its safe harbors yet exercise a copyright owner’s right to exclude, wresting fees from those who wish to extract and copy X users’ content.

I have to admit, I’m not sure that a copyright assignment would change the Section 230 analysis… but perhaps? Anyway, it’s a weird hypothetical to raise in this scenario.

The larger point is just that ExTwitter has no right to stop others from copying this data. That’s not part of the rights the company has over the content on the site put there by third-party users.

The upshot is that, invoking state contract and tort law, X Corp. would entrench its own private copyright system that rivals, even conflicts with, the actual copyright system enacted by Congress. X Corp. would yank into its private domain and hold for sale information open to all, exercising a copyright owner’s right to exclude where it has no such right. We are not concerned here with an arm’s length contract between two sophisticated parties in which one or the other adjusts their rights and privileges under federal copyright law. We are instead concerned with a massive regime of adhesive terms imposed by X Corp. that stands to fundamentally alter the rights and privileges of the world at large (or at least hundreds of millions of alleged X users). For the reasons that follow, this order holds that X Corp.’s state-law claims against Bright Data based on scraping and selling of data are preempted by the Copyright Act

And thus, the claims here also fail.

Arguably, this complaint was less silly than some others (and, yes, Meta made a similar — and similarly failed — complaint). The mess of the HiQ decisions means that the issue of data scraping is still kind of a big unknown under the law. Eventually, the Supreme Court may need to weigh in on scraping, and that’s going to be yet another scary Supreme Court case…