Post-Post-API Age: Studying Digital Platforms in Scant Data Access Times (2025)

Kayo Mimizuka (kayomimizuka@utexas.edu, ORCID 0000-0003-0966-5386), The University of Texas at Austin, Austin, Texas, USA; Megan A. Brown (mgnbrown@umich.edu, ORCID 0000-0002-1338-8054), University of Michigan, Ann Arbor, Michigan, USA; Kai-Cheng Yang (yang3kc@gmail.com, ORCID 0000-0003-4627-9273), Northeastern University, Boston, Massachusetts, USA; and Josephine Lukito (jlukito@utexas.edu, ORCID 0000-0002-0771-1070), The University of Texas at Austin, Austin, Texas, USA


Abstract.

Over the past decade, data provided by digital platforms has informed substantial research in HCI to understand online human interaction and communication. Following the closure of major social media APIs that previously provided free access to large-scale data (the “post-API age”), emerging data access programs required by the European Union’s Digital Services Act (DSA) have sparked optimism about increased platform transparency and renewed opportunities for comprehensive research on digital platforms, leading to the “post-post-API age.” However, it remains unclear whether platforms provide adequate data access in practice. To assess how platforms make data available under the DSA, we conducted a comprehensive survey followed by in-depth interviews with 19 researchers to understand their experiences with data access in this new era. Our findings reveal significant challenges in accessing social media data, with researchers facing multiple barriers including complex API application processes, difficulties obtaining credentials, and limited API usability. These challenges have exacerbated existing institutional, regional, and financial inequities in data access. Based on these insights, we provide actionable recommendations for platforms, researchers, and policymakers to foster more equitable and effective data access, while encouraging broader dialogue within the CSCW community around interdisciplinary and multi-stakeholder solutions.

Keywords: DSA, post-API, social media, data access, survey, interview


1. Introduction

Over the past decade, data provided by digital platforms has informed substantial research in HCI to understand online human interaction and communication. However, many platform data access programs have shuttered in recent years, leaving researchers with few official means through which to access social media data for research. In 2023, following Elon Musk’s purchase of Twitter (now X), the platform announced its plans to end free access to its Academic API (Clama, 2023). Following suit, Meta shut down CrowdTangle in 2024, ending the primary avenue by which researchers accessed and analyzed public information from Facebook and Instagram (Ortutay, 2024). Consequently, many public-interest research projects were interrupted, limiting public understanding of the social media platforms’ role in public and civic life.

Around the same time, new regulations in the E.U.’s Digital Services Act (DSA) mandate that Very Large Online Platforms (VLOPs) and Very Large Online Search Engines (VLOSEs) grant researchers access to public data. These platforms typically comply with the mandates by providing researcher data access programs. In this paradigm, the companies ultimately decide whether or not a researcher gets access, and each platform may interpret the legal requirements differently. In light of diminishing public-facing APIs, how platforms grant (or do not grant) access to platform data under the DSA is of utmost importance to the viability of API-based platform scholarship. However, it is unclear whether platforms provide adequate data access in practice.

To assess how platforms make data available under the DSA, we conducted a mixed-method study, combining survey data from 180 responses with in-depth interviews with 19 researchers about their experiences with data access under the DSA. We find that researchers overall were frustrated with data application processes through DSA-mandated public data access programs. Our survey shows that many researchers did not apply because they were unaware of the platforms’ data access programs, because they found the applications or policies problematic, or because they were not interested in the data offered by the programs. For those who did apply for data access, the majority had not heard back or had their applications rejected at the time of the survey.

The interviews further reveal that researchers are frustrated with the opacity of the data application processes implemented by platforms, with many being denied access without a sufficient reason given by the platform. Even when researchers did get access to platform data, they found that the APIs were often clunky, unusable, or did not provide the data they needed to conduct their research. Based on these insights, we provide actionable recommendations for platforms, researchers, and policymakers to foster more equitable and effective data access, while encouraging broader dialogue within the CSCW community around interdisciplinary and multi-stakeholder solutions.

2. Background

2.1. APIs for Research

Historically, researchers have relied on third parties to collect data, such as firms providing representative panels or organizations facilitating interviews. In social media research, this model has evolved distinctly—instead of traditional third-party research firms, social media companies themselves serve as the intermediaries, primarily through Application Programming Interfaces (APIs).

For these social media companies, user-generated data is a proprietary asset with economic value. Their data infrastructures are primarily designed to support business objectives rather than academic inquiry (Wu and Taneja, 2021). As a result, the APIs provided by companies are designed to enable external actors to engage with the platform, often in ways that serve the platform’s commercial interests. For example, a business might use the YouTube API to automate video uploads, contributing to the platform’s growth. When companies make APIs publicly available, they typically prioritize commercial applications, with public-interest research emerging as an incidental benefit rather than an intended goal. (A notable exception was Twitter’s Academic API, which, before Elon Musk’s acquisition of Twitter in 2022, stood out as one of the first API offerings specifically designed for academic research.)
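To make concrete what engaging with a platform through an API looks like, the minimal sketch below reads public video metadata through the YouTube Data API using the google-api-python-client package; it is an illustration rather than a description of any program discussed in this paper, and the API key and video ID are placeholders.

```python
# Illustrative sketch only: reading public video metadata through the
# YouTube Data API with google-api-python-client. API_KEY and VIDEO_ID
# are placeholders; available fields and quotas are set by the platform.
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"    # issued by the platform after registering an application
VIDEO_ID = "SOME_VIDEO_ID"  # any public video identifier

youtube = build("youtube", "v3", developerKey=API_KEY)
response = youtube.videos().list(part="snippet,statistics", id=VIDEO_ID).execute()

for item in response.get("items", []):
    # The platform decides which fields exist and how they are named.
    print(item["snippet"]["title"], item["statistics"].get("viewCount"))
```

The same infrastructure that supports such commercial integrations is what researchers repurpose for data collection, which is why changes to it ripple directly into research access.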

As social media becomes increasingly central to public and social life, researchers have increasingly depended on platform data access to study the impact of social media on civic, social, and public health outcomes. Previous efforts to study platforms in this manner have yielded important scholarship in HCI and CSCW, including the influence of social media on democratic processes (Praet et al., 2021; Prochaska et al., 2023; Starbird et al., 2019), how social media can help amplify or silence marginalized voices (Jackson et al., 2020; Wang and Ringland, 2023; Bhimdiwala et al., 2024), and the role of social platforms in public health crises (Chen et al., 2020; Karra et al., 2023; Pater et al., 2023). Despite the obvious public benefits and harms at stake, platforms often have little incentive to share their data, especially when findings may reflect poorly on the company (Persily et al., 2020).

This dynamic results in a condition that Wagner calls “independence by permission” (Wagner, 2023). In an analysis of Meta’s research collaborations during the 2020 U.S. election, Wagner illustrates this phenomenon, where access to data depended on corporate approval. Even researchers who are not directly funded by or partnered with social media platforms remain dependent on platforms’ willingness to grant access, effectively placing limits on scholarly independence in social media research.

2.2. Eras of Data Access

Researchers’ access to platform data has fluctuated significantly over time. Here we split the past two decades into four eras and illustrate them in Figure 1 with key events and developments.

Figure 1. A timeline of the eras of data access to social media platforms.

2.2.1. Pre-API and Voluntary-API Eras

With the emergence of major social media platforms like Facebook (founded in 2004), YouTube (founded in 2005), and Twitter (founded in 2006), both companies and researchers recognized the immense value of digital data as a political and economic resource (Lazer et al., 2009). However, during this early period, which we term the “pre-API age,” programmatic approaches to accessing platform data remained limited.

Starting around 2010, researchers experienced a “golden age” of data access, as many social media platforms voluntarily developed APIs (the “voluntary-API age” in Figure 1). One of the most influential examples was Twitter’s v1.1 and v2 APIs, which were free and open to all users and allowed access to a wide range of data types with minimal restrictions. The Twitter v2 API also had an “academic track” that provided increased access for verified academic researchers. Over the decade, these APIs enabled countless research projects across various academic disciplines (Murtfeldt et al., 2024) and were used to create public-service tools and educational resources. Other researcher access programs from the same era include the Social Science One program (King and Persily, 2020), access through CrowdTangle, and the Meta U.S. 2020 Election Project (Wagner, 2023). These programs offered researchers diverse data types and varying levels of access, providing unique insights into different aspects of online platform behavior and user activity.

However, all data access programs have their limitations. Although APIs offer powerful and flexible access, platforms maintain complete control over the data access process, available data types, and API functionality (Bucher, 2013). This means that platforms can modify API specifications or end the API at their discretion, potentially disrupting ongoing research projects and tools. CrowdTangle also exemplified these constraints: while it provided relatively open access to public posts from Facebook and Instagram, it covered only content from larger public pages and groups, those with over twenty-five thousand followers. The Social Science One initiative fell short of expectations, providing researchers with data of limited utility compared to what was initially promised (Alba, 2019). The project was further compromised when it was discovered that half of the dataset was missing due to production errors (Alba, 2021). At the same time, Facebook maintained control over both CrowdTangle and Social Science One, with the ability to deny data access to researchers whose work might not align with company interests. Similarly, the Meta U.S. 2020 election projects remained dependent on Facebook’s voluntary cooperation to provide data access. Notably, no subsequent projects with comparable arrangements were implemented or proposed, highlighting the unsustainable nature of this approach.

2.2.2. The Post-API Age

The aforementioned limitations collectively undermine the stability and reliability of data access approaches. A pivotal event in the history of platform data access was the Cambridge Analytica scandal in 2018. In response to this scandal, Facebook significantly restricted access to its Graph API, dealing a severe blow to the social media research community. In the influential essay “Computational Research in the Post-API Age” (Freelon, 2018), Freelon analyzed the implications of Facebook’s decision. The restrictions on the Graph API effectively eliminated all Terms of Service-compliant methods for systematic collection and analysis of Facebook data at the time. This shift led Freelon to characterize the resulting era as the “post-API” age for Facebook research.

Another driving factor behind more restricted data access is the rise of large language models (LLMs). The development of LLMs has been guided by so-called “scaling laws,” which state that the performance of an LLM improves predictably with the model’s size and the amount of training data (Hoffmann et al., 2022). This pattern makes user-generated data a valuable resource for training LLMs. As a result, online platforms have become increasingly cautious about granting data access.
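As a point of reference, the scaling relationship cited above is commonly written in the following form, following Hoffmann et al. (2022); the constants are fitted empirically in that work and are not reproduced here.

```latex
% One common parameterization of the LLM scaling law (Hoffmann et al., 2022):
% expected pre-training loss L as a function of parameter count N and
% training-token count D, with empirically fitted constants E, A, B, \alpha, \beta > 0.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Because the loss term tied to D keeps falling as more tokens become available, large corpora of user-generated text are directly valuable as training data, which helps explain platforms’ growing caution about sharing it.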

Since 2023, the data access landscape has undergone a series of increasingly restrictive changes. Twitter initiated this trend by discontinuing its free API access. While a paid API remains available, its pricing structure has become prohibitively expensive for most researchers, with costs ranging from $100 per month for 10,000 tweets to $5,000 per month for one million tweets. Following Twitter’s lead, Reddit implemented similar restrictions on its free API access. Subsequently, in 2024, Meta announced the discontinuation of its CrowdTangle platform. These changes collectively mark a definitive shift into the “post-API” era for social media research.
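To put these tiers in perspective, a back-of-the-envelope calculation using only the prices quoted above illustrates why researchers describe the paid API as prohibitive; the five-million-tweet study size is an assumed example, not a figure from our survey.

```python
# Back-of-the-envelope cost estimate based on the API tiers quoted above.
# The study size is an illustrative assumption; actual plans, caps, and
# overage rules may differ from this simplification.
tiers = {
    "basic": {"monthly_cost": 100, "tweet_cap": 10_000},      # $100 / 10k tweets
    "pro": {"monthly_cost": 5_000, "tweet_cap": 1_000_000},   # $5,000 / 1M tweets
}

study_size = 5_000_000  # hypothetical study collecting five million tweets

for name, tier in tiers.items():
    per_tweet = tier["monthly_cost"] / tier["tweet_cap"]
    months_needed = -(-study_size // tier["tweet_cap"])  # ceiling division
    total_cost = months_needed * tier["monthly_cost"]
    print(f"{name}: ${per_tweet:.4f} per tweet, {months_needed} months, ${total_cost:,} total")
```

Under these assumptions, even the larger tier implies months of waiting and tens of thousands of dollars for a single mid-sized collection.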

In response to these challenges, researchers have intensified their exploration of alternative data collection methods beyond platform-provided APIs. Freelon’s essay remains particularly salient, emphasizing that effective social media platform research necessitates diverse data collection strategies, encompassing both collaborative and adversarial approaches. Many researchers have adopted unsanctioned web scraping as an alternative method (Bruns, 2021). However, this approach raises significant concerns as it introduces complex ethical and legal challenges that researchers must carefully navigate (Brown et al., 2024). As an alternative to scraping, researchers have developed more transparent data collection methods that incorporate explicit participant consent. For instance, Breuer et al. (2023) demonstrate how researchers can gather Facebook data through voluntary data donations from study participants. Furthermore, the emergence of open-source tools for facilitating and analyzing user-donated data has established data donation as an increasingly viable research methodology in the post-API era (Araujo et al., 2022).

2.3. New Regulatory Avenues in a Post-Post API Age

While the post-API age has left researchers in limbo, new developments in the European Union signal a new model for data access, one driven by policy regulation and an expectation of transparency from platforms. An example is the Digital Services Act (DSA), which introduces a new process through which online platforms and search engines provide data access to researchers and the broader public.

Under Article 40.12, platforms and search engines classified as “very large online platforms” (VLOPs) or “very large online search engines” (VLOSEs) must grant researchers access to publicly available data. Under Article 40.4, vetted researchers may access private data. Data access under the DSA is not limited to researchers within the European Union; rather, researchers around the globe can access data through provisions in the DSA provided they meet the same qualifications for data access (namely, that they are affiliated with non-profit institutions and are conducting research that investigates systemic risks in the European Union) (Brown et al., 2024).

While the DSA represents a positive regulatory development, its implementation ultimately rests with the platforms themselves. Importantly, under Article 40.12, platforms may still screen researchers for access to public data, introducing many of the same “independence by permission” challenges highlighted in the previous sections. Moreover, platforms may interpret and comply with the DSA mandates differently. Given these uncertainties, this study aims to understand both the platforms’ implementation details and researchers’ experiences when applying for data access under the regulation.

3. Methods

To study researcher data access, we conducted a mixed-method study involving a survey and subsequent interviews. This process was approved by an Institutional Review Board (protocol number hidden for anonymity).

3.1. Survey Process

To conduct the survey, we solicited volunteer participants from seven different professional organizations across the following disciplines: computational social science, communication, political science, internet studies, and human-computer interaction. We also recruited from the following research communities: the Coalition for Independent Technology Research, the Media and Democracy Data Cooperative, the Knight Research Network, and the Center for Democracy and Technology. We ran the survey for roughly three months and received 180 responses.

3.2. Interview Data Collection and Analysis

After the survey, we conducted a total of 19 semi-structured interviews between October 2024 and February 2025 to gain deeper insight into the researchers’ experiences with data access. We recruited interviewees by reaching out to survey respondents who had indicated their willingness to participate in our interview. We obtained consent from 17 researchers and recruited two additional interviewees who did not complete the survey but were referred to us by one of the initial interviewees. It is important to note that some researchers indicated not having API access to a platform during the survey, but gained access between completing the survey and participating in the interview.

Of the 19 participants, 14 (73.7%) are academic researchers affiliated with universities, and 5 (26.3%) are non-academic researchers affiliated with for-profit companies, publicly funded research institutes, or civil-society organizations. Twelve (63.2%) are based in the E.U., 6 (31.6%) are based in the U.S., and 1 (5.3%) is based in Latin America. We assigned the researchers descriptive codes based on their affiliations (AR for academic research institutions, NR for non-academic institutions) and the regions in which they are based (E.U. for Europe, U.S. for the United States, and L.A. for Latin America). Common research fields among the participants include Information Sciences, Communication, Computer Science, Humanities, and Political Science. Details about the participants are provided in the Appendix, Table 3.

The interviews were conducted remotely over Zoom and typically lasted between 45 minutes and an hour. To refine our interview protocol and ensure that it facilitated relevant discussions, several initial interviews were conducted by two of the authors together. The remaining interviews were conducted with only one author present. After the interviews, we transcribed them for analysis. The transcripts of six interviews were generated by Zoom’s auto-transcription feature, while the other 13 interviews were transcribed using the Whisper Large v3 Turbo model released by OpenAI (https://github.com/openai/whisper). Since the Whisper model often made mistakes and could not accurately label the speakers, an author manually reviewed the transcripts, corrected the errors, and labeled the speakers.
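For readers unfamiliar with this step, a minimal transcription sketch with the open-source whisper package referenced above might look as follows; the model identifier and file names are placeholders, available model names depend on the installed package version, and speaker labeling still has to be done manually, as described.

```python
# Minimal transcription sketch using the open-source whisper package
# (https://github.com/openai/whisper). Model name and file paths are
# placeholders; speakers still need to be labeled by hand afterwards.
import whisper

model = whisper.load_model("large-v3")          # or a smaller model for quick passes
result = model.transcribe("interview_07.m4a")   # placeholder audio file

# Write timestamped segments to a text file for manual review,
# error correction, and speaker labeling.
with open("interview_07_transcript.txt", "w", encoding="utf-8") as f:
    for seg in result["segments"]:
        f.write(f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['text'].strip()}\n")
```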

We analyzed the interview transcripts in the following steps. First, three authors conducted open coding of several transcripts, reading them line by line to capture relevant themes at a granular level using the qualitative analysis software Atlas.ti (https://atlasti.com). Next, the authors discussed these initial codes as a team to agree on a final codebook. During this process, codes with similar meanings were collapsed and irrelevant codes were removed. We then applied the codebook to all of the transcripts. Based on this annotation, two authors wrote memos to explain and articulate salient themes and their potential implications. The research team then conducted axial coding, discussing relationships between codes and creating larger categories. Finally, we discussed and agreed on the most important themes and featured them in the paper.

4. Findings

4.1. Survey Results

Our questionnaire focused primarily, though not exclusively, on platforms classified as a VLOP or VLOSE at the time of the survey distribution. We asked respondents about the data access program of each platform, whether they applied, and for what reasons they chose not to apply. Note that we included an additional platform, Reddit, owing to its popularity within the U.S. social media ecosystem (Proferes et al., 2021), even though it was not classified as a VLOP under the DSA.

Table 1. Number of survey respondents who already had access to each platform’s data and their reasons for not applying, by platform.

Platform                 | Had access | Unaware | Ineligible | Problematic | Not interested
Alibaba (Ali Express)    | 0          | 20      | 2          | 1           | 24
Bing                     | 0          | 19      | 1          | 0           | 24
Booking.com              | 0          | 17      | 1          | 0           | 26
CrowdTangle (a)          | 15         | 6       | 2          | 5           | 2
Google Maps              | 0          | 17      | 0          | 1           | 24
Google Play              | 0          | 14      | 0          | 0           | 29
Google Records Request   | 0          | 5       | 1          | 0           | 6
Google Search            | 1          | 22      | 2          | 1           | 19
Google Shopping          | 1          | 13      | 2          | 0           | 27
LinkedIn                 | 0          | 17      | 1          | 1           | 20
Meta Content Library (a) | 4          | 6       | 4          | 12          | 4
Snap                     | 0          | 16      | 3          | 1           | 23
TikTok                   | 19         | 2       | 1          | 0           | 9
X (Twitter)              | 4          | 2       | 4          | 15          | 1
YouTube                  | 7          | 11      | 4          | 5           | 8

(a) Includes Facebook and Instagram.

In the survey, respondents could indicate that they did not know they could request data access, did not believe they were eligible, found the application process problematic, and/or were not interested in data from that particular platform. We summarize these findings in Table 1.

For many researchers, the primary reason they did not apply for data access from a given platform was that they were not interested in data from that particular platform. However, researchers also indicated that they did not apply for data access because they were not aware that they could, or that a data access program even existed for a given platform.

There are several notable exceptions to this. For data access to Meta platforms (Facebook and Instagram), including CrowdTangle and the Meta Content Library, researchers did not apply because they either already had access (through CrowdTangle, which was still available at the time of the survey) or were concerned about the application process (for the Meta Content Library). Similarly, researchers were deterred from Twitter’s data access program under the DSA because they found the application process time-consuming, unclear, or overly expensive. We further investigate researchers’ concerns with the application processes through our in-depth interviews.

Table 2. Application outcomes by platform.

Platform               | Applied | Denied | Accepted
Bing                   | 3       | 0      | 0
CrowdTangle            | 21      | 2 (a)  | 9
Google (Search & Map)  | 6       | 0      | 1 (b)
LinkedIn               | 2       | 0      | 0
Meta Content Library   | 19      | 1      | 6
Snap                   | 2       | 0      | 0
TikTok                 | 21      | 10     | 0
Twitter                | 31      | 9      | 9
YouTube                | 11      | 0      | 5

(a) CrowdTangle was closed.
(b) Only access to Google Maps was granted.

Next, we inquired about respondents’ experiences with platform applications and their outcomes and list the results in Table 2. For those who did apply, we gathered detailed information about both the application processes and their outcomes. Our findings reveal that the majority of applications were still pending at the time respondents completed our survey. While many researchers had been waiting for at least a month since their initial application, some reported even longer periods of uncertainty. Among those who received responses, a significant number faced rejection of their data access requests. This trend was especially pronounced for platforms like X (Twitter) and TikTok. Notably, when platforms rejected applications, they frequently did so without providing any explanation or justification for their decision.

4.2. The Process of Permitted Access

Our survey highlights several key barriers to data access for researchers. To elaborate on these barriers and understand their impact on research, we rely on our interview data to contextualize these quantitative findings. Using these combined results, we present a flowchart in Figure 2, which highlights the many factors that may hinder researchers’ ability to gain access to a platform’s data program and ultimately cause them to abandon their research. The flowchart illustrates four critical barriers researchers face: (1) lack of an official API or insufficient awareness of existing API access programs, (2) overly complex application processes, (3) significant delays by platforms or their third-party proxies in granting access, and (4) inadequate or limited data quality after access is provided. To successfully conduct their desired study, researchers must navigate and overcome all of these barriers. Otherwise, they will either have to resort to alternative data access approaches or abandon their research.

Figure 2. A flowchart of the steps and challenges researchers face when attempting to access social media data in the post-post-API era.

Our interview findings highlight the nuanced and multifaceted nature of data access procedures, the challenges posed by widespread variation in application procedures, and the subsequent impact on researchers’ willingness to apply for data at all.

4.2.1. Barriers to Applying

One of the barriers mentioned by researchers is the application process. Aligning with the survey results, some interviewees said they chose not to apply for API access for certain platforms because they found the application problematic. This sentiment was most glaringly obvious when they spoke about X (Twitter). Some researchers felt discouraged from applying or paying for the X API as they perceived the platform’s business practices under Elon Musk as going against democratic values. Bob, an information scientist in the U.S., decided to move away from research on X because he finds it ethically wrong to support the platform’s business in any way. “It’s now hostile to my values, to research, to knowledge …… it’s no longer a great research site for me to study the things I want to study,” said Bob.

When researchers did apply for APIs, they described the application processes as cumbersome, especially as each platform had a different API application process and researchers’ experiences varied. Researchers lamented that the processes often did not reflect the realities of how research is conducted, were unnecessarily laborious, and required excessive time and resources.

For example, under the current data access regime, researchers are expected to apply for API access for individual research projects rather than for long-term access. Researchers felt that this application model was not designed to facilitate flexible uses of APIs. Max, a U.S.-based academic researcher who manages a large-scale research lab, explained that this requirement makes it hard for researchers to use the APIs simultaneously to conduct multiple projects. Discussing Meta and TikTok, Max said:

“[The application processes were] kind of painful because they are adopting a model where they think that each researcher will apply for access for a specific research question. That is very, very cumbersome for a lab like us where we have many researchers working at many research questions at any given time. So it’s very limiting.” (Max)

These challenges highlight a mismatch between current data access procedures and the pragmatic practice of conducting research using social media data: data access limited to individual projects makes it difficult for researchers to deftly study rapid and recent trends in the social media ecosystem.

Other barriers to access involved the arduous application process. For example, researchers who applied for the Meta Content Library API in the early phase said they were required to have ethical approval from Institutional Review Boards (IRB) for their projects. Although this requirement was later removed, the researchers indicated that it discouraged many scholars from applying because (1) such a practice hinders researcher independence and (2) data collection, preliminary data analysis, and the process of designing research often go hand in hand. Brian, a U.S.-based information scientist, welcomed Meta’s decision to drop the IRB requirement, but added: “[T]he fact that they even required it in the first place was a very limiting thing.” Furthermore, for some social scientists who are not based in the U.S., the IRB requirement was simply difficult to meet because obtaining ethical approval is not a common practice in their countries. Elaborating on this point, Fabio, a European political communication researcher, explained: “In my university, like in many, many other universities in Europe, this kind of ethical committee [is] …… not really common in social science.”

Other researchers mentioned that they struggled with a long list of questions and what they considered unrealistic criteria they were required to meet. Bastien, a researcher at a European civil-society organization who applied for the LinkedIn API, recalled being asked around 40 questions at the time of his application. The platform “sets a very high bar” with its requirements, including what he perceived as overly strict privacy protection requirements for public data, said Bastien. While he agreed that platforms needed to ask some of these questions to ensure ethical data usage, Bastien felt that “there are almost no organizations that would actually be able to meet all these criteria.” He added, “the requirements around safety, security, privacy are absolutely disproportionate to the sensitivity of the [public] data.”

Crafting answers to these lengthy lists of questions can be highly time-consuming, which in turn slows down research. Andy, an E.U.-based media studies scholar, described the application forms for Meta’s tools as “pretty clear and straightforward” but felt that the applications were “unnecessarily onerous,” which “slows down your research project.” Exacerbating the issue, and aligning with the survey results, interviewees reported that it took them weeks, or sometimes months, before they heard back from platforms with their decisions. While researchers who obtained TikTok API access reported what they considered a relatively short time frame before being able to use the API, that was not the case with Meta: they had to sign a legal agreement with the platform, which they described as a cumbersome process requiring legal assistance and considerable time. As Max put it:

“It’s a little frustrating because, of course, there is a [legal] agreement that needs to be signed …… every time there is an agreement, it has to go to the legal office of the university, and then they interact with the legal office at Meta. Sometimes these interactions are quick and easy, and other times they stretch on for weeks or months …… For the Meta Content Library, it probably took several weeks, maybe a few months.” (Max)

In sum, although API application processes and experiences vary by platform and by researcher, our interviews highlight that applications generally seem to have become much more time-consuming and complex under the current regime, requiring researchers to invest more resources. This change made many researchers feel that platforms fell short of adequately supporting research endeavors. For researchers not based in the U.S., some requirements, such as IRB approval, made the application process even more cumbersome, as these requirements seem not to reflect research practices at non-U.S. institutions. As Ista, an E.U.-based computer scientist, put it: “They don’t understand the needs of the research community well enough, or what they need to do to meet them.”

4.2.2. Barriers to Access

Unfortunately, many interviewees were denied access to research APIs after going through these application processes. They either found the reasons for denial provided by platforms unreasonable or felt that they were not given adequate explanations, believing that their research initiatives fell under what the European Commission calls research on systemic risks affecting E.U. citizens. In the worst cases, researchers did not receive any communication from the platforms about their application status.

The majority of the rejected cases in our interviews involved trying to access X data. Erkki, a digital social scientist and Ph.D. student based in the E.U., shared that his application for X’s academic research access was declined. Erkki has extensively studied climate discourses on X since before Elon Musk acquired the platform and applied for the X API to conduct research on how parliamentarians and municipal electoral candidates communicate about climate change. Although he thought there would be no reason for his application to be declined, because he had access to previous versions of the Twitter API and used them for similar research projects, the platform denied him access on the grounds that his application was “incomplete or lacks sufficient detail to show” that he met the criteria under the DSA, according to an email from the X Developer Team shared by Erkki. Yet, “I don’t really see any reason why they would reject it this time,” said Erkki, adding, “obviously it felt like an injustice has been done.”

Ista (AR/E.U.) also applied for X’s academic research access with an eye to studying hate speech and counter-speech on the platform. After exchanging some emails with the platform to provide requested additional information, her application was denied without a “very consistent reason”:

“[I]n the end, they denied [the application] with the reasoning that the research question doesn’t fit the scope of the DSA, which I disagree with. But there is no process to actually contest the denial.” (Ista)

Some researchers never heard back from the platforms, which they interpreted as effectively a rejection. The lack of transparency and communication from the platforms left researchers in limbo, without knowing whether they would be able to carry out their research projects. Kay, a communication scholar at a public university in the U.S., applied for the Reddit API when it was first announced in 2023. But she “never heard anything back,” said Kay, adding that such a case is “not unique to me.” Corroborating Kay, Thirteen, a computer scientist in the U.S. studying social movements, said he only found out that his Reddit API application was rejected when he attended a workshop where a Reddit employee indicated that the decisions had been made. The information was only “disclosed informally at the [workshop], not in a transparent, public forum,” said Thirteen, adding that he was unsure why his application was rejected.

Researchers who are not affiliated with universities face even more limited data access because some platforms prioritize academic researchers. This differential treatment of academic and non-academic researchers is not in alignment with Article 40 of the DSA, which stipulates that researchers, including those at non-academic institutions, who are approved by national public authorities must be granted access to platform data (de Carvalho, 2024).

TikTok is one of the platforms that, at the time this research was conducted, excluded non-academic researchers from API access. Bastien (NR/E.U.) applied for the TikTok API while appealing to the regulators of the country he was based in to obtain support for pushing TikTok to approve his application. However, the application was declined on the grounds that he was not affiliated with a university: “They told us, ‘right now, it’s not available to non-academics, but we are planning on releasing it wider soon and we’ll let you know.’ They never let us know,” said Bastien. He sent a follow-up email to TikTok, but received no response. Similarly, Devin, a computational social scientist at a publicly funded European research institution, was denied API access on the same grounds. Devin thinks the reason for rejection is not justifiable because the Digital Services Coordinators he met with made it clear that publicly funded research institutions should be able to access data through platform APIs. “We’re not a university, it is true, but that’s not a reason that we do not fall under the DSA,” said Devin. This treatment of researchers not affiliated with universities is, in Bastien’s words, a “mismatch between what they offer and what the law says they have to offer [under] the DSA.”

Another interviewee, Green Wave, a researcher at a civil society organization in Latin America providing support for marginalized populations targeted by harmful social media activities, also voiced frustration about TikTok’s exclusion of non-academic researchers. To save time and resources, she decided not to apply for the TikTok API, given that it was designed mainly for academic researchers in the U.S. and E.U. However, not having access to this infrastructure affected her work, and she felt that platforms giving access to a limited pool of researchers went against the spirit of Article 40 of the DSA:

“The DSA was created to spot systemic risks that platforms created for society. Civil society is playing an important and critical role in this because they …… can see how different vulnerable communities are interacting with social platforms and service providers. I really think that it’s easier to give data access to the academics because they have the whole data process standardized. But in terms of impact and making visible the systemic risks, civil society should have a role in it and a voice.” (Green Wave)

Stepping back, our interviews highlight that the decision-making processes regarding API applications are far from transparent. In many rejected cases from X, the platform’s justification was that these research projects fell outside the scope of the DSA, without elaborating on why or how. Some researchers never received responses from the platforms, leaving them to speculate about the reasons for denial. For researchers not affiliated with a university, access is even more limited because the door is closed from the beginning. Although some researchers said they would try to apply again or find alternative ways to collect data, not having official API access can limit the amount and the kinds of data scholars can use, narrowing the scope of their research significantly (we will come back to this point later).

4.2.3. Barriers to Usability

Despite the challenges mentioned above, there are cases where researchers were granted API access. When researchers gain access to APIs, what are their experiences? Are they satisfied with how the APIs function and the data they receive? Unfortunately, the answer is currently no—most of our interviewees reported encountering various obstacles while using the APIs. We observed three primary issues that made it difficult for scholars to conduct research using APIs: (1) difficulty collecting data in the first place, (2) restrictive data caps, and (3) poor data quality.

Difficulty collecting data

One major challenge researchers highlighted was the complexity and poor usability of APIs, which creates a significant hurdle to collecting data in the first place. The platform most frequently mentioned was TikTok, aligning with previous reports (Brown, 2023). Andy (AR/E.U.) found data collection through the TikTok API particularly challenging due to frequent server errors. He explained that most of his data collection attempts have failed due to these errors, and he was “not able to get anything out of it.” Other researchers provided detailed descriptions of these errors. Fabio (AR/E.U.), who has been documenting the errors he has encountered while using the TikTok API, explained that he experienced multiple server errors, including ones that indicated the server received too many requests and problems with pagination. “In theory, you should be able to download 100,000 videos a day, but in practice, I never even remotely reached that number,” said Fabio. Max (AR/U.S.) corroborated these issues, noting that he also experienced similar server errors when using the TikTok API. “[I]t’s really not a very good quality product …… a lot of the queries get errors in return. We have to try them several times” to collect data, said Max.
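The workaround Max describes, retrying failed queries, is typically implemented as a retry loop with backoff. The sketch below is illustrative rather than drawn from any participant's code: the endpoint URL, headers, and payload are placeholders, not any platform's actual API specification.

```python
# Illustrative retry-with-backoff loop for a flaky research API.
# The URL, headers, and payload are placeholders; consult the
# platform's documentation for the real request format.
import time
import requests

def query_with_retries(url, headers, payload, max_retries=5, base_delay=2.0):
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload, timeout=30)
        if resp.status_code == 200:
            return resp.json()
        # 429 (too many requests) and 5xx responses resemble the server
        # errors interviewees described; back off exponentially and retry.
        if resp.status_code == 429 or resp.status_code >= 500:
            time.sleep(base_delay * (2 ** attempt))
            continue
        resp.raise_for_status()  # other errors are unlikely to be transient
    raise RuntimeError(f"Gave up after {max_retries} attempts on {url}")
```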

Researchers also described challenges in collecting data from other platforms. In discussing the Meta Content Library API, Fabio (AR/E.U.) said the tool did not function as described in its public documentation, making data collection difficult. He described the tool as “barely usable,” arguing that its design is “way too much focused on preserving any kind of risk of leaking the data” rather than facilitating research. Elaborating on this point, he explained that users must navigate complicated, time-consuming steps before being able to use the API, including downloading VPN software, enabling two-factor authentication multiple times, and logging into virtual environments, to name a few. Fabio said, “I don’t think anyone is actually using it …… I think we’re giving up with this third-party environment because it’s not really usable.”

Restrictive data caps

Even when researchers were able to collect some data, various data caps prevented them from obtaining adequate data to answer their research questions. Researchers acknowledged that certain limitations are necessary, but found current data caps excessively restrictive. One such limitation is the amount of data that can be retrieved. In discussing the Meta Content Library, Brian (AR/U.S.) said he has not “really used it” because of its restrictions: “It has very strong restrictions on the amount of data you can get at any given time …… that make it virtually useless for the sort of work that I want to do,” explained Brian. Although the tool allows users to pull 100,000 records at a time, he noted that “From a big data perspective, that’s not a lot at all.”

Similarly, Thirteen (AR/U.S.) found the TikTok API’s rate limit “not very generous.” He explained that although he was trying to collect data from a relatively small number of TikTok accounts (about “a few thousand”), what he deemed strict rate limits made it “challenging to manage all the necessary data requests within the required refresh window.” For large research teams, these restrictions mean multiple projects cannot run at the same time, as they need to work under a single quota limit. At Max’s (AR/U.S.) research lab, “there’s a single quota limit for all the people in the group” and “we have to make an agreement so that only one particular project can use the API at any given time because the quota is very, very limited,” said Max.
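When an entire lab shares one quota, the practical consequence is that requests have to be paced so the group never exceeds the shared allowance. A minimal sketch of such pacing is shown below; the daily quota figure is a hypothetical example, not any platform's actual limit.

```python
# Minimal pacing sketch for a shared daily request quota.
# DAILY_QUOTA is a hypothetical example, not a documented platform limit.
import time

DAILY_QUOTA = 1_000                        # assumed shared daily request cap
MIN_INTERVAL = 24 * 60 * 60 / DAILY_QUOTA  # ~86 seconds between requests

def run_paced(requests_to_send, send_fn):
    """Send requests no faster than the shared quota allows."""
    for req in requests_to_send:
        started = time.monotonic()
        send_fn(req)                       # whatever API call a project makes
        elapsed = time.monotonic() - started
        if elapsed < MIN_INTERVAL:
            time.sleep(MIN_INTERVAL - elapsed)
```

Spread across several concurrent projects, an allowance this small is quickly exhausted, which is why labs like Max's end up scheduling only one project on the API at a time.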

Another frequently mentioned limitation was restricted access to certain data fields. For instance, researchers, particularly those based in the E.U., reported that Meta’s data access rules prevented them from retrieving information from profiles or pages with fewer than a certain number of followers. This, according to researchers, posed a significant challenge for those studying countries with smaller populations than the United States, where Meta is headquartered. Meta allows researchers to download Facebook content from pages with 15,000 or more likes/followers and profiles with 25,000 or more followers (Meta Platforms, Inc., n.d.). However, Bastien (NR/E.U.) said these thresholds are a “very high bar” for the country with a relatively small population that he is interested in observing. Frank, an E.U.-based media and communication researcher, also expressed frustration that these restrictions seem to be set based on U.S. standards, making them impractical for his research: “What’s considered a major page or a big page on Facebook, or a big account on Instagram is something very different in the U.S.” Brian (AR/U.S.) saw the various restrictions on accessible data fields placed by platforms as a major obstacle to conducting novel research:

“They [platforms] are not doing the research. They’re always going to be designing the tool for last year’s research …… If you’re trying to replicate what other researchers have done, you’re not doing something novel anymore …… [some data fields are] not there because somebody decided that that wasn’t important and this other thing was important. There’s certain affordances of the tool that make it well suited to answer some questions and not others.” (Brian)

Poor data quality

Interviewees also reported inaccuracies in API data, particularly with the TikTok API. In alignment with previous literature documenting notable discrepancies between API data and the TikTok website before July 2024 (Pearson et al., 2025), nearly all of the interviewees who have used the TikTok API reported either missing components or inconsistencies in terms of publicly available videos. For example, Emily, a computational social scientist based in Europe, said the API often returned fewer, or sometimes more, transcripts of TikTok videos than were actually available on the TikTok app for a given time period. She noticed this inconsistency when she compared the API data with datasets she purchased from a third-party vendor. Emily added that the TikTok API is “not ideal” for research, as she is required to do extra work to check inconsistencies and clean the API data.

Like Emily, other researchers shared that they have regularly experienced such inconsistencies with the TikTok API. Frank (AR/E.U.) also manually compared posts of a politician collected from the TikTok API with what’s displayed on the app, and found inaccurate results:

“I’ll set up [the query to collect] all of [the politician’s] TikToks from September of this year. Run it, and it collects 10 TikToks. And then I look at his profile, scroll, and try to manually check that it’s correct. He has 12 during September of that year. I …… run the API query again, and then it only gets nine. So there’s something weird happening with the API and it doesn’t give consistent results.” (Frank)
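The check Emily and Frank describe, comparing what the API returns against what is visible on the app or in an independently collected dataset, can be scripted once a manually verified reference list exists. The sketch below is a generic illustration with hypothetical post IDs, not any participant's actual pipeline.

```python
# Illustrative consistency check: compare post IDs returned by an API query
# against a manually verified reference list for the same account and
# time window. The IDs below are hypothetical placeholders.
reference_ids = {"7301", "7302", "7303", "7305"}   # verified by hand on the app
api_ids = {"7301", "7302", "7305"}                 # returned by the research API

missing_from_api = reference_ids - api_ids
unexpected_extras = api_ids - reference_ids
coverage = len(api_ids & reference_ids) / len(reference_ids)

print(f"API coverage: {coverage:.0%}")
print(f"Missing from API: {sorted(missing_from_api)}")
print(f"Unexpected extras: {sorted(unexpected_extras)}")
```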

Although the timeframes within which researchers collected their data vary and the platform seems to have been working on fixing these problems (Pearson et al., 2025), interviewees found the API data highly unreliable and insufficient for conducting research. As a result of these discrepancies, researchers ended up “not getting into it [using the API]” (Ista), having to find “another way to get that data” (Frank), or even having “deprioritized my TikTok research” (Thirteen). Thirteen (AR/U.S.) moved away from conducting research on TikTok because he and his colleagues “ultimately did not believe that we would be able to answer the research question” using TikTok API data. The current state of data access made researchers feel that the platforms are simply trying to “check the box” (Frank) to meet the minimum regulatory requirements rather than genuinely supporting researchers. Many researchers expressed distrust in platform efforts, as Frank put it:

“I have a sneaking suspicion that platform owners, not only TikTok but X and Meta and all these different actors, they are obliged to provide something to researchers but what that something is and how it works or doesn’t work isn’t necessarily regulated. It just says they have to do something. So sometimes, my mind drifts into thinking that this is the bare minimum of what they can provide.” (Frank)

So far, we have described researchers’ experiences of applying for APIs, cases of denied API applications, and the challenges they face in collecting adequate data through APIs. Our interviews indicate that the application process and data access are in a constant state of flux, creating ambiguity. While it takes time and resources for platforms to build reliable and sustainable research infrastructures, interviewees’ experiences highlight that the current data regime has fallen significantly short of allowing researchers to conduct meaningful research under the DSA.

4.3. Alternative Approaches

Given the many factors that limit researchers’ ability to successfully traverse the process of permitted access, from submitting the application and gaining data access to using the data for research, our participants highlighted the need for individual and collective alternative data access approaches.

4.3.1. Web Scraping

Many participants noted that, despite changing data access procedures, there were still unofficial methods for collecting social media data. Peter, a European academic researcher, said, “if there are APIs, it’s nice, else I will try to find my own way of getting data.” Similarly, Sarah, a European digital studies researcher, noted that data collection is possible without APIs: “Python scripts where you can get Instagram data with. There’s also still the possibility of in-browser scraping, which is done for Twitter.” However, these alternative collection approaches also had limitations, such as “temporal gaps,” meaning “you cannot go into historical data well,” added Sarah.

One common alternative approach mentioned was data scraping. Some platforms, such as Alphabet (which owns Google and YouTube), allow permissive scraping, as Bastien (NR/E.U.) was able to get permission for: “What they grant you is the right to scrape YouTube basically from a given IP address.” However, in most cases, the legal and ethical ambiguity of scraping has raised concerns. For example, Kate, a European computer science researcher, said, “There are a lot of scraping tools even now with which you could scrape Twitter, but I’m not sure if it’s ethically okay to do it for research.” In other words, even if researchers could scrape data (and in many cases, they were able to), they wanted to ensure that they did so legally and ethically. Kay (AR/U.S.) said, “I will go to my grave saying scraping is not a crime, but it does go against the terms,” highlighting this difficult balance between doing important research on public data and adhering to terms of service obligations. She said she still collected data, but set herself some boundaries: “I do still scrape web forums that are open, that don’t require logins and websites.” Researchers like Devin (NR/E.U.) also noted that they were more comfortable scraping aggregate information (such as engagement), rather than individual-user data: “We didn’t look at user-level data. But instance-level data. So we scrape data. And that’s also data that I think is safe to consider [a] public right.”

Others said they would avoid using scraping as a data access strategy altogether because it is not the most reliable means of collecting data. For instance, Erkki (AR/E.U.) said he “really wouldn’t trust” any longitudinal or historical research relying on scraping because it is hard to obtain comprehensive data. Similarly, Fabio (AR/E.U.) said he is wary that the code used to scrape data could suddenly stop working because “when you are scraping, you are basically getting data in a way that is not supported by the platform.” Fabio added that the issue is not only the lack of official platform support for scraping but also the barriers platforms could put up to stop scrapers (e.g., rate-limiting traffic, detecting and blocking bots, or even taking legal action). With scraping, researchers are “not only not supported [by the platform] but sometimes actively fight [the platform],” said Fabio.
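For completeness, the kind of cautious, rate-limited collection some interviewees describe for openly accessible pages might be sketched as follows. The target site and user agent are placeholders, and, as the quotes above make clear, whether such collection is permissible still depends on each platform's terms of service and applicable law.

```python
# Illustrative sketch of cautious collection from an openly accessible site:
# check robots.txt, identify the client, and rate-limit requests.
# The URLs and contact address are placeholders; terms of service still apply.
import time
import requests
from urllib import robotparser

BASE = "https://forum.example.org"   # placeholder open web forum
USER_AGENT = "academic-research-bot/0.1 (contact: researcher@example.edu)"

rp = robotparser.RobotFileParser()
rp.set_url(f"{BASE}/robots.txt")
rp.read()

def fetch_if_allowed(path, delay=5.0):
    url = f"{BASE}{path}"
    if not rp.can_fetch(USER_AGENT, url):
        return None                   # respect robots.txt exclusions
    resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
    time.sleep(delay)                 # slow, polite request rate
    return resp.text if resp.ok else None

page = fetch_if_allowed("/threads/public-thread-1")
```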

4.3.2. Other Approaches

Researchers considered many other alternatives besides scraping. Some used third-party applications and tools, such as Zeeschuimer, a browser extension; Apify, a web-scraping and automation cloud platform; and Brandwatch, a social listening tool. Others used existing archives of past data. Kate (AR/E.U.), for example, noted that she was using data she had collected previously: “I’m working with archival data that I have collected because it’s no longer feasible for researchers to get data [through APIs].” While these approaches have their own individual advantages and disadvantages, they collectively rely on resources that are not democratically available to all researchers. Barriers include money (to buy access or to pay participants), coding proficiency (to build or use a scraper), and the difficulty of obtaining comprehensive historical or longitudinal data.

Because alternative data collection methods are imperfect and official access is often unavailable, many expressed concern that important research was not being conducted. Discussing the disadvantages and challenges of using alternative data access methods, Fabio (AR/E.U.) lamented, “You need to have a reliable way to assess this data …… this is really unfortunate because we don’t really have scraping [or other reliable and equitably accessible tools] and we don’t really have the data from official channels.” Bastien also expressed frustration that third-party vendors “don’t necessarily have the data we need and the format we need” and that such tools usually provide data collected in a “less systematic way than what we would have with the APIs.”

4.4. Present Experience with the DSA

Discussions about the DSA and its potential value are future-focused, with the acknowledgment that government regulation is only considered because platforms are not themselves willing to provide the data necessary to conduct independent research. In general, the participants expressed “cautious optimism” about the DSA: the future may rely on regulatory bodies to hold platforms to account, but it is not clear how this will actually occur and how data access will be provided. For example, Peter (AR/E.U.) expressed optimism that platforms do not necessarily determine what access is or is not permissible:

“One of the crucial parts of the DSA …… is that the decision of who gets the data access and what kind of data should be given access to isn’t with the platforms. In that framework, it is with what’s called the digital service coordinator. So, basically, a government agency in one of the member countries. I think that is a model that works a lot better for academic research because, of course, platforms have an interest in controlling the type of data they get you but also minimizing the amount of data they give and so on.” (Peter)

In this quote, Peter puts a lot of responsibility on the Digital Services Coordinator (DSC) as the “third party” between what platforms are willing to provide and what researchers claim to want as part of their research. (DSCs assess the eligibility of researchers under DSA Article 40.4, which concerns private data, but not under Article 40.12, the subject of the public data access programs; see https://algorithmic-transparency.ec.europa.eu/news/faqs-dsa-data-access-researchers-2023-12-13_en.)

When speaking more tangibly, researchers expressed frustration. Erkki (AR/E.U.), in particular, highlighted the difference between applying (which could be easy) and the common rejection researchers experience. For example, he applied for Twitter access and had previously had access to earlier versions of the API. However, his Twitter application under the DSA was rejected:

“To me, initially it looked pretty straightforward. I just fill a form and I send it. But, of course, the wordings and the context of some of the questions are different for the U.S. and for [the country he’s based in] or basically any European nation. E.U. countries tend to have not politically aligned national institutions and research centers and such. But the tone of the whole form, to me, indicated that it might be a mistake for me to apply with my status at the [non-academic research institute] instead of [the university he is affiliated with], which I did not consider …… at that point.” (Erkki)

This quote highlights the minutiae that can affect the acceptance or rejection of an access application.

Green Wave (NR/L.A.) saw some optimism in data access programs mandated by the DSA, noting that Article 40 of the DSA “is an interesting opportunity for researchers outside the European Union to maybe have some new possibilities in terms of data access.” However, she also argued that there is a “transnational group disparity” between the Global South and the Global North, as well as between academia and civil society. As a result, the gap between the data “haves” and the data “have-nots” can widen under the DSA, particularly if platforms continue to disregard research outside of the academy. These experiences suggest a dichotomy between what participants hope will come out of the DSA and what they are currently experiencing.

4.5. Desired Future

4.5.1. Inflection Point to do Better Research

Many interviewees shared the view that social media research is at a critical juncture with the loss of free APIs. While the current conjuncture is often described as the “API Apocalypse” (Sarah, AR/E.U.), researchers also discussed ways to conduct better research within the current limitations, critically reflecting upon how the older data access regime has shaped their research fields. Some researchers argued that social media research has tended to concentrate on platforms with easier access to data, which they said has led to overlooking other important avenues for communication. They agreed that Twitter (X) was a popular platform for research, partly because data was readily available. The following quote from Peter (AR/E.U.) illustrates this point well:

“I think there was also a tendency for researchers to just do their research on Twitter because it was convenient and you could get a lot of data there. And then that led to this research landscape where there was a lot of research that kind of took Twitter as a proxy for the internet, which I think is a problematic notion. So in a way, the fact that now we as researchers also need to maybe consider, okay, what platform should we be looking at?” (Peter)

Max (AR/U.S.) also saw the potential of producing more research on other relevant platforms as “the bright side” of the ongoing discourses around data access under the DSA. “Twitter was the platform where this data was most available …… but as people are moving to new platforms, especially because these platforms are open, then we hope we can do that again and use the kind of data on other platforms,” added Max.

Others argued that researchers should consider the possibility of small data research instead of focusing only on large datasets. For instance, Sarah (AR/E.U.) said that, with small data, researchers can make sure the data is reliable by manually checking if there are any discrepancies in data collected through APIs or scrapers. Although the loss of free APIs has limited the scope of research, it “also points out what you still can do even with small data …… you’re much more inclined to zoom into certain niches,” added Sarah. As these interviewees’ quotes indicate, some researchers are exploring new opportunities for social media research despite the lack of adequate data access under the DSA.

4.5.2. Collective Action and Policy

At the same time, however, researchers emphasized that platforms are responsible for ensuring equitable and adequate data access and that the research community needs to collectively pressure the platforms to step up their efforts to support research. They argued that platforms currently place strict limitations on who can access what kinds of data, as we have described above, and that they have not sufficiently considered the needs of the research community.

Brian (AR/U.S.) said that the current data access discourses are primarily driven by “the language of privacy, data minimization, protecting human subjects,” rhetoric that platforms can weaponize to limit data access as much as possible. While acknowledging that “all of that is obviously an important consideration,” Brian indicated that a healthy balance between these ethical issues and ensuring data access for important research is critical. “The bureaucracy is not serving the science [and] we’re not able to actually answer meaningful questions” under the current framework, added Brian. Researchers also voiced concerns that the current data access framework may compromise the independence of research from platform influence, calling for a community-wide discussion on the issue.

Of particular concern among researchers were the current API application process, which requires researchers to provide the details of their projects, and the lack of transparency in platforms’ decision-making, as platforms could easily deny API access to research that could pressure them to change their practices. Researchers should call on platforms to “make independent research possible within the regulatory framework,” said Sarah (AR/E.U.). To that end, researchers said it is paramount to build a joint effort to lobby for more ethical and sustainable data access. Fabio (AR/E.U.) argued that “the creation of a community of scholars and projects …… is super important from this perspective, because I’m always dealing with platforms in a point-to-point relationship.”

4.5.3. Types of API Data Desired

As we have described in Section 4.3, researchers are eager to explore alternative data collection methods to overcome the current limitations of APIs, but such approaches carry significant limitations of their own. For many social media scholars, reliable and sustainable API data access remains a desired means to conduct novel and creative research; researchers spoke about what types of data they wish they could access through APIs to answer their research questions. In this subsection, we list the desired API data mentioned by our interviewees in the hope that it will serve as a step toward encouraging platform-researcher dialogue on improving data access.

One type of data often mentioned by researchers is information that is already publicly accessible or data on public figures (e.g., politicians, celebrities, and news organizations). Researchers argued that platforms should be able to make public data readily available for all researchers, regardless of their affiliations, given that there is less concern about privacy. Bastien (NR/E.U.) shared that he and his colleagues have mainly requested publicly available data from different platforms, but these attempts have been unsuccessful so far, adding that the bar is higher for non-academic researchers like him. “Ideally, we would want to have access to an API that was broadly like the (older) Twitter API,” but if that’s not possible, “what we would have liked was the possibility to input one account and get a list of all of their publicly accessible content for the last N years, along with a number of interactions,” added Bastien. Emily (AR/E.U.), who studies TikTok posts of high-profile comedians, said platforms should loosen limitations on data related to public figures because “their data is in public domain” and it should be “OK for us to have it or even allow researchers to share the data with other scholars.” An important effort to develop a protocol for determining what counts as publicly accessible content has been advanced by the Knight-Georgetown Institute’s Gold Standard Working Group (see https://kgi.georgetown.edu/expert-working-groups/gold-standard-expert-working-group/gold-standard-faq).
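To make Bastien’s request more concrete, the sketch below shows what such an account-centric query could look like in practice. This is a minimal illustration under stated assumptions: the endpoint URL, parameters, and response fields are our own hypothetical choices, not any platform’s actual research API.

```python
# Hypothetical sketch of the account-centric query Bastien describes: one public
# account in, its publicly accessible posts (with interaction counts) for the
# last N years out. The endpoint, parameters, and field names are assumptions
# made for illustration only.
import requests


def fetch_public_posts(handle: str, years: int, token: str) -> list[dict]:
    posts, cursor = [], None
    while True:
        resp = requests.get(
            "https://api.example-platform.test/research/v1/account-posts",  # placeholder URL
            params={"handle": handle, "since_years": years, "cursor": cursor},
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()
        # Each post record is assumed to carry the text plus basic interaction counts.
        posts.extend(page.get("posts", []))
        cursor = page.get("next_cursor")
        if not cursor:
            return posts
```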

Another request we heard frequently is access to larger-scale data. Many researchers emphasized that large-scale data access is crucial for understanding trends, user behaviors, platform activities (e.g., content moderation), and information flows on a platform-wide scale. “Without APIs, I’m never going to be able to create very big data sets,” said Wellstone, a researcher at a for-profit company. This limitation forces researchers to work with smaller samples that may not be representative, making it hard to draw reliable and generalizable conclusions about platforms. For example, Sarah (AR/E.U.) said she has focused on small data research on TikTok and such an approach can contribute to important research findings, but scalability is hard to achieve without the API:

“The captions, the hashtags, the visuals, the sound is a very important part on TikTok where all these remixes and riffs and stuff is going on …… now I can get at the data but not at scale. And this is where I would need the API for.” (Sarah)

Like Sarah, many other researchers discussed the types of API and data they would want from TikTok, in part because the platform has grown popular in many countries. This data included visual captions embedded in the videos (Emily, AR/E.U.), commenter identities that allow researchers to build interaction networks around a given piece of content (Max, AR/U.S.), and all posts from a given account (Thirteen, AR/U.S.).
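To illustrate the kind of analysis Max has in mind, the following sketch builds a weighted commenter-to-creator network from comment records. The input schema (commenter_id, video_author_id) is a hypothetical illustration of ours, not a field set offered by any existing research API.

```python
# Sketch: constructing an interaction network from comment-level data of the kind
# Max describes. The record fields are assumed for illustration.
import networkx as nx


def build_interaction_network(comments: list[dict]) -> nx.DiGraph:
    graph = nx.DiGraph()
    for record in comments:
        commenter = record["commenter_id"]
        creator = record["video_author_id"]
        # Accumulate edge weights so repeated interactions between the same pair count.
        if graph.has_edge(commenter, creator):
            graph[commenter][creator]["weight"] += 1
        else:
            graph.add_edge(commenter, creator, weight=1)
    return graph


# Toy usage with made-up records.
toy = [
    {"commenter_id": "u1", "video_author_id": "creatorA"},
    {"commenter_id": "u2", "video_author_id": "creatorA"},
    {"commenter_id": "u1", "video_author_id": "creatorA"},
]
print(build_interaction_network(toy).edges(data=True))
```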

Overall, these findings highlight a dilemma: while researchers hope to rely less on platforms for data in order to secure independence, they also recognize the limitations of alternative tools and the potential usefulness of official APIs.

5. Discussion

Our empirical findings have highlighted that, despite platforms’ efforts to meet the data transparency requirements mandated by the DSA, their practices vary greatly, and current data access programs are far from adequate for facilitating research on digital platforms. While scholars have previously performed API audits (Pearson et al., 2025), our paper details the challenges researchers face at different stages of navigating permitted access across platforms, highlighting institutional, regional, and financial obstacles that exacerbate inequities in data access.

These results should be understood with some caveats in mind. First, both our qualitative and quantitative studies relied on purposive sampling, and the experiences of the researchers involved in this study are not representative of any research field or region. This also means our paper falls short of documenting the state of researcher data access on all VLOPs and VLOSEs equally. However, our paper provides a broad overview of the current data access regime under the DSA, uncovering cross-platform trends and issues rather than focusing on a single platform, and encapsulates the experiences of researchers from different regions and backgrounds.

Second, we are still at the very beginning of the post-post-API era, and the DSA regulatory framework has not been fully established yet. This means that what we described in this paper is subject to constant change. However, these findings may help shape global regulations in the post-post-API age. This can contribute to the CSCW research community—which has relied on social media as an important data source—by documenting the transitional phases of the current data access regime and by providing a point of comparison for future research.

To encourage a future-oriented discussion and concrete actions toward improved data access for the CSCW community and beyond, we offer recommendations for platforms, researchers, and policymakers based on our participants’ experiences documented in this paper.

5.1. Recommendations for Platforms

First and foremost, platforms should ensure transparency in the API application process. While previous research has identified inaccuracies in API data that hinder research (Pearson et al., 2025; Corso et al., 2024), our findings highlighted that obtaining API access itself remains a hurdle. Many researchers have been denied API access without detailed justifications or adequate communication from the platforms. Based on our interviews, many cases appear to fall under the DSA’s scope according to both our analysis and participants’ perspectives. However, the lack of clear guidelines and communication makes it difficult for researchers to properly frame their data access requests or determine whether reapplication would be worthwhile. For instance, while the DSA stipulates that data access should be available for research on systemic risks in Europe, there remains significant uncertainty around eligibility criteria—both in terms of which researchers qualify and what types of research projects are considered valid (Goanta et al., 2025). Furthermore, as our interviews surfaced, researchers are expected to request specific data for their projects, but they do not know what specific data is available (Goanta et al., 2025).

While these issues stem partly from the DSA’s operational limitations, platforms’ lack of coherent application procedures and ambiguous decision-making have led researchers to suspect intentional suppression of unfavorable research while only meeting minimal regulatory requirements. This has further eroded researchers’ already fragile trust in platform-provided data access (Allen et al., 2021). Given that platforms maintain complete control over data access permissions, increased transparency and accountability are essential to foster productive researcher-platform dialogue and establish ethical data access practices that genuinely serve the public interest.

We also recommend that platforms simplify the application design to facilitate more flexible uses of APIs. For example, under the current data access programs, researchers can only apply for temporary, project-based access, which some participants said significantly limits their ability to complete research, implement longer-term projects, or share the API access within a research lab. It is understandable that platforms are reluctant to provide unlimited access to their data to avoid being overburdened, but more flexible access frameworks should be possible (at least in theory), given that older programs such as the Twitter API and Meta’s CrowdTangle offered greater versatility. Ideally, platforms should allow researchers to apply for longer-term data access, rather than project-based access, so that researchers can conduct multiple research projects simultaneously as long as the overall research endeavor falls within the scope of the DSA.

Additionally, we recommend that platforms strike a more realistic balance between privacy protection and research. While the current design of API data access programs appears to be driven primarily by privacy considerations, participants saw the privacy-related criteria as disproportionately strict relative to the sensitivity of publicly accessible data. This approach is, unfortunately, discouraging many researchers from applying and preventing wider data access. As participants argued, platforms should consider less strict limitations on access to public data in terms of both quota and available data points.

For ethical and practical data access frameworks, platforms should engage in direct conversations on these issues with researchers from different disciplines and regions. An example of this is Twitter’s now-disbanded Academic Research advisory board, which served to facilitate dialogue between researchers and the platform (Blakey, 2024). More recently, Reddit has worked directly with a small group of researchers to gather feedback and improve the Reddit for Researchers (R4R) Beta Program (u/PeerRevue, [n. d.]). Although Reddit is not classified as a VLOP, such an initiative is a promising first step that can serve as a model. We argue that VLOPs and VLOSEs should invest more resources into supporting research by listening to researchers’ needs because, ultimately, doing so will contribute to improving the safety and functionality of their own platforms.

5.2. Recommendations for Researchers

Over the last decade, research in HCI and CSCW has increasingly turned to social media data as a primary source for exploring online social dynamics and human behavior, contributing to the design of new technological tools and interventions (Alvarado Garcia et al., 2025). To continue building on this significant body of research, it is paramount that researchers foster interdisciplinary coalitions to collectively advocate for better researcher access to platform data. Our interviews highlighted the need for more organized efforts to improve data access, beyond individual researchers’ attempts to appeal or work with regulators and platforms. Fortunately, in addition to research examining API performance, there are ongoing initiatives working toward informed policies and ethical, independent data access. To name a few examples, the Knight-Georgetown Institute (https://kgi.georgetown.edu) focuses on developing consistent standards for accessing publicly available platform data under DSA Article 40, offering recommendations to regulators (Chapman, 2024). The Coalition for Independent Technology Research (https://independenttechresearch.org) has been actively building an interdisciplinary coalition involving not only academics but also journalists, civil society organizations, and community scientists to advocate for data transparency and ethical research.

As underscored in this paper, there are currently notable disparities in data access between academic and non-academic researchers; between E.U.-based researchers and those outside the region, particularly in the Global South; and between those with funding and those without. Given that third-party tools and other alternatives like scraping are neither adequate nor affordable for everyone, these institutional, geopolitical, and financial data access gaps can widen further without sustainable and equal access to official API data. Researchers, particularly those affiliated with academic institutions, should therefore aim for inclusive dialogue.

At the same time, researchers should remain flexible and continue exploring different data collection approaches rather than focusing solely on past models of data access. The data collection methods discussed thus far—APIs, scraping, and other alternatives—are fundamentally platform-centric, as they rely heavily on platforms’ data access policies and infrastructure. However, an alternative paradigm has been gaining momentum: user-sourced data collection, which encompasses both data donation and tracking approaches (Ohme et al., 2024). In the data donation model, users exercise their right to data portability (guaranteed under the General Data Protection Regulation) to download their personal platform data and voluntarily share it with researchers (Carrière et al., 2025). This approach has been facilitated by major platforms implementing data export functionality to comply with regulations. The tracking model represents another user-centric approach, typically involving browser extensions that monitor users’ platform interactions and data usage patterns (Christner et al., 2022). Projects like the National Internet Observatory are developing robust infrastructure to support such tracking-based data collection and sharing among researchers under enhanced ethical frameworks (Feal et al., 2024). While these user-sourced methods face their own set of challenges, they enable novel research directions, particularly in understanding user engagement and behavior patterns, that were previously impossible with traditional platform-centric approaches.
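As a concrete illustration of the data donation workflow described above, the sketch below ingests a donated export archive and retains only the fields needed for analysis. The archive layout and field names (a “posts.json” file with “timestamp” and “text” entries) are assumptions for illustration; real exports differ across platforms and change over time, so they should be inspected before any analysis.

```python
# Minimal sketch of ingesting a donated data export (data donation model).
# The archive layout ("posts.json" containing "timestamp" and "text" fields) is an
# assumed example; actual export formats vary by platform.
import json
import zipfile


def load_donated_posts(archive_path: str) -> list[dict]:
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open("posts.json") as f:
            records = json.load(f)
    # Retain only analysis-relevant fields, in the spirit of data minimization.
    return [{"timestamp": r.get("timestamp"), "text": r.get("text")} for r in records]
```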

5.3. Recommendations for Policymakers

Building researcher coalitions and collectively lobbying for improved API data access is important, but clear regulatory frameworks are necessary when platforms refuse to respond swiftly and adequately. This is where we stand in the post-post-API era. We argue that more regulation is an inevitable step forward, particularly for independent research that holds these platforms accountable.

In Europe, as part of the DSA, the European Commission announced the Delegated Regulation to provide more clarity on how this data transparency should work in practice, focusing on specifying “the conditions under which sharing of data should take place and the purposes for which the data may be used” (European Commission, 2023). This is an important step toward addressing the “critical legal interpretational and operational challenges” (Goanta et al., 2025) that have undermined the realization of the data transparency the regulation aims for.

While this move toward more evidence-based policymaking potentially addresses the arbitrariness and lack of transparency in platforms’ decisions on data access permissions, as well as the inconsistency across platforms’ data-sharing practices, policies should also support plurality in data collection approaches. Beyond complying with basic regulatory requirements, platforms have little motivation to drastically expand the sharing of their data with researchers when they can profit from selling this data or leverage it to power their own generative AI tools.

Bearing this in mind, we argue that regulatory bodies should encourage not only permission-based data access but also more independent data access procedures. For example, it may be fruitful to consider a regulatory framework that allows scraping of public data, especially when platforms fall short of providing necessary data. As described in this paper, scraping is one of the alternative data collection methods some researchers have relied on, but litigation attempts, particularly by Elon Musk (Bond, 2023), have created chilling effects among researchers wary of violating platform terms of service. Policies supporting alternative data collection methods can help researchers pursue work that is critical of platforms without the fear of suddenly losing data access or resources.

The issue of data access becomes even more complicated from a global perspective. Social media platforms and search engines operate on a global scale and, ideally, their data should be accessible to researchers regardless of language or region. In reality, however, platforms’ regional policies diverge, with Europe seeing more promising developments through the introduction of the GDPR and the DSA, as well as regulatory bodies like Ofcom. This means Europe may lead empirical social media research going forward, while other regions, particularly the Global South, may be left behind, exacerbating existing imbalances in access to resources. Platforms’ data access policies are ephemeral, and it is neither realistic nor desirable to allow platforms to set the standards for data access. While it is unclear how much data access can be achieved under the GDPR and the DSA, these policies can serve as a potential model for other parts of the world.

6. Conclusion

In this study, we used a mixed-method approach to understand researchers’ experiences with digital platform data access, platforms’ willingness to provide data in the post-post-API age, and the data necessary for studying platforms robustly and transparently. New programs and tools mandated by the DSA initially garnered some hope among research communities, but our study indicates that they are far from providing wide researcher data access, nor do they provide data of the quality or quantity necessary for empirical research. While it is encouraging to see some platforms try to improve API access and regulators work to clarify how these data access programs are enacted, we are experiencing a significant setback from the age of mass data access.

Without mincing words, research about social media and digital platforms is in dire straits, particularly for scholarship that seeks to hold platforms accountable. What researchers are empirically able to study now largely depends on regulatory frameworks like the DSA. However, it is unclear how these policies will be enacted in practice, or whether they will hinder important alternative approaches to data collection, including scraping and user-sourced data collection. While these approaches are beyond the scope of what the DSA can consider, they are nevertheless important mechanisms through which researchers can gather publicly accessible and/or user-provided data. To encourage platform and regulatory efforts based on open and direct engagement with the research community and user base, further research and case studies are needed to document the transitional phases of the DSA’s data access regime.

References

• Alba (2019) Davey Alba. 2019. Ahead of 2020, Facebook falls short on plan to share data on disinformation. https://www.nytimes.com/2019/09/29/technology/facebook-disinformation.html [Accessed 2025-05-01].
• Alba (2021) Davey Alba. 2021. Facebook sent flawed data to misinformation researchers. https://www.nytimes.com/live/2020/2020-election-misinformation-distortions#facebook-sent-flawed-data-to-misinformation-researchers [Accessed 2025-05-01].
• Allen et al. (2021) Jennifer Allen, Markus Mobius, David M. Rothschild, and Duncan J. Watts. 2021. Research note: Examining potential bias in large-scale censored data. Harvard Kennedy School Misinformation Review (2021).
• Alvarado Garcia et al. (2025) Adriana Alvarado Garcia, Tianling Yang, and Milagros Miceli. 2025. What Knowledge Do We Produce from Social Media Data and How? Proceedings of the ACM on Human-Computer Interaction 9, 1 (2025), 1–45.
• Araujo et al. (2022) Theo Araujo, Jef Ausloos, Wouter van Atteveldt, Felicia Loecherbach, Judith Moeller, Jakob Ohme, Damian Trilling, Bob van de Velde, Claes de Vreese, and Kasper Welbers. 2022. OSD2F: An open-source data donation framework. Computational Communication Research 4, 2 (2022), 372–387.
• Bhimdiwala et al. (2024) Ayesha Bhimdiwala, Krishna Akhil Kumar Adavi, and Ahmer Arif. 2024. Fighting for Their Voice: Understanding Indian Muslim Women’s Responses to Networked Harassment. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1 (2024), 1–24.
• Blakey (2024) Elizabeth Blakey. 2024. The Day Data Transparency Died: How Twitter/X Cut Off Access for Social Research. Contexts 23, 2 (2024), 30–35.
• Bond (2023) Shannon Bond. 2023. Elon Musk sues disinformation researchers, claiming they are driving away advertisers. https://www.npr.org/2023/08/01/1191318468/elon-musk-sues-disinformation-researchers-claiming-they-are-driving-away-adverti [Accessed 2025-05-01].
• Breuer et al. (2023) Johannes Breuer, Zoltán Kmetty, Mario Haim, and Sebastian Stier. 2023. User-centric approaches for collecting Facebook data in the ‘post-API age’: Experiences from two studies and recommendations for future research. Information, Communication & Society 26, 14 (2023), 2649–2668.
• Brown (2023) Megan A. Brown. 2023. The Problem with TikTok’s New Researcher API is Not TikTok. https://www.techpolicy.press/the-problem-with-tiktoks-new-researcher-api-is-not-tiktok [Accessed 2025-05-01].
• Brown et al. (2024) Megan A. Brown, Andrew Gruen, Gabe Maldoff, Solomon Messing, Zeve Sanderson, and Michael Zimmer. 2024. Web Scraping for Research: Legal, Ethical, Institutional, and Scientific Considerations. arXiv:2410.23432 [cs.CY] https://arxiv.org/abs/2410.23432
• Bruns (2021) Axel Bruns. 2021. After the ‘APIcalypse’: Social media platforms and their fight against critical scholarly research. Disinformation and data lockdown on social platforms (2021), 14–36.
• Bucher (2013) Taina Bucher. 2013. Objects of intense feeling: The case of the Twitter API. Computational Culture 3 (2013).
• Carrière et al. (2025) Thijs C. Carrière, Laura Boeschoten, Bella Struminskaya, Heleen L. Janssen, Niek C. de Schipper, and Theo Araujo. 2025. Best practices for studies using digital data donation. Quality & Quantity 59, 1 (2025), 389–412. doi:10.1007/s11135-024-01983-x
• Chapman (2024) Peter Chapman. 2024. Laying the Foundation for Independent Platform Data Access in the EU. https://kgi.georgetown.edu/research-and-commentary/independent-platform-data-access-eu [Accessed 2025-05-01].
• Chen et al. (2020) Emily Chen, Kristina Lerman, Emilio Ferrara, et al. 2020. Tracking social media discourse about the COVID-19 pandemic: Development of a public coronavirus Twitter data set. JMIR Public Health and Surveillance 6, 2 (2020), e19273.
• Christner et al. (2022) Clara Christner, Aleksandra Urman, Silke Adam, and Michaela Maier. 2022. Automated Tracking Approaches for Studying Online Media Use: A Critical Review and Recommendations. Communication Methods and Measures 16, 2 (2022), 79–95. doi:10.1080/19312458.2021.1907841
• Clama (2023) Justine Clama. 2023. Twitter just closed the book on academic research. https://www.theverge.com/2023/5/31/23739084/twitter-elon-musk-api-policy-chilling-academic-research [Accessed 2025-05-01].
• Corso et al. (2024) Francesco Corso, Francesco Pierri, and Gianmarco De Francisci Morales. 2024. What we can learn from TikTok through its Research API. In Companion Publication of the 16th ACM Web Science Conference (Stuttgart, Germany) (WebSci Companion ’24). Association for Computing Machinery, New York, NY, USA, 110–114. doi:10.1145/3630744.3663611
• de Carvalho (2024) Mateus Correia de Carvalho. 2024. Researcher Access to Platform Data and the DSA: One Step Forward, Three Steps Back. https://www.techpolicy.press/researcher-access-to-platform-data-and-the-dsa-one-step-forward-three-steps-back [Accessed 2025-04-28].
• European Commission (2023) European Commission. 2023. Delegated Regulation on data access provided for in the Digital Services Act. https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/13817-Delegated-Regulation-on-data-access-provided-for-in-the-Digital-Services-Act_en [Accessed 2025-05-01].
• Feal et al. (2024) Alvaro Feal, Jeffrey Gleason, Pranav Goel, Jason Radford, Kai-Cheng Yang, John Basl, Michelle Meyer, David Choffnes, Christo Wilson, and David Lazer. 2024. Introduction to National Internet Observatory. In Proceedings of ICWSM Data Challenge Workshop (Buffalo, NY, USA). AAAI.
• Freelon (2018) Deen Freelon. 2018. Computational research in the post-API age. Political Communication 35, 4 (2018), 665–668.
• Goanta et al. (2025) Catalina Goanta, Savvas Zannettou, Rishabh Kaushal, Jacob van de Kerkhof, Thales Bertaglia, Taylor Annabell, Haoyang Gui, Gerasimos Spanakis, and Adriana Iamnitchi. 2025. The Great Data Standoff: Researchers vs. Platforms Under the Digital Services Act. arXiv:2505.01122 [cs.CY] https://arxiv.org/abs/2505.01122
• Hoffmann et al. (2022) Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. 2022. Training Compute-Optimal Large Language Models. arXiv:2203.15556 [cs.CL] https://arxiv.org/abs/2203.15556
• Jackson et al. (2020) Sarah J. Jackson, Moya Bailey, and Brooke Foucault Welles. 2020. #HashtagActivism: Networks of race and gender justice. MIT Press.
• Karra et al. (2023) Abhinav Reddy Karra, Ranjan Jaiswal, and Sanorita Dey. 2023. Fishing for Validation: Understanding Promises and Challenges of a Private Social Media Group for COVID-19 Long-Hauler Patients. Proceedings of the ACM on Human-Computer Interaction 7, CSCW1 (2023), 1–34.
• King and Persily (2020) Gary King and Nathaniel Persily. 2020. A new model for industry–academic partnerships. PS: Political Science & Politics 53, 4 (2020), 703–709.
• Lazer et al. (2009) David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-László Barabási, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, et al. 2009. Computational social science. Science 323, 5915 (2009), 721–723.
• Meta Platforms, Inc. (n.d.) Meta Platforms, Inc. n.d. Meta Content Library and API. https://developers.facebook.com/docs/content-library-and-api/content-library [Accessed 2025-04-28].
• Murtfeldt et al. (2024) Ryan Murtfeldt, Naomi Alterman, Ihsan Kahveci, and Jevin D. West. 2024. RIP Twitter API: A eulogy to its vast research contributions. arXiv:2404.07340 [cs.CY] https://arxiv.org/abs/2404.07340
• Ohme et al. (2024) Jakob Ohme, Theo Araujo, Laura Boeschoten, Deen Freelon, Nilam Ram, Byron B. Reeves, and Thomas N. Robinson. 2024. Digital Trace Data Collection for Social Media Effects Research: APIs, Data Donation, and (Screen) Tracking. Communication Methods and Measures 18, 2 (2024), 124–141. doi:10.1080/19312458.2023.2181319
• Ortutay (2024) Barbara Ortutay. 2024. Meta kills off misinformation tracking tool CrowdTangle despite pleas from researchers, journalists. https://apnews.com/article/meta-crowdtangle-research-misinformation-shutdown-facebook-977ece074b99adddb4887bf719f2112a [Accessed 2025-05-01].
• Pater et al. (2023) Jessica A. Pater, Amanda Coupe, Fayika Farhat Nova, Rachel Pfafman, Jeanne Carroll, Abigal Brouwer, Camden Bohn, Jason Li, Noah Todd, FenLei Chang, et al. 2023. Social Media is Not a Health Proxy: Differences Between Social Media and Electronic Health Record Reports of Post-COVID Symptoms. Proceedings of the ACM on Human-Computer Interaction 7, CSCW1 (2023), 1–25.
• Pearson et al. (2025) George D. H. Pearson, Nathan A. Silver, Jessica Y. Robinson, Mona Azadi, Barbara A. Schillo, and Jennifer M. Kreslake. 2025. Beyond the margin of error: A systematic and replicable audit of the TikTok research API. Information, Communication & Society 28, 3 (2025), 452–470.
• Persily and Tucker (2020) Nathaniel Persily and Joshua A. Tucker. 2020. Social media and democracy: The state of the field, prospects for reform. Cambridge University Press.
• Praet et al. (2021) Stiene Praet, David Martens, and Peter Van Aelst. 2021. Patterns of democracy? Social network analysis of parliamentary Twitter networks in 12 countries. Online Social Networks and Media 24 (2021), 100154.
• Prochaska et al. (2023) Stephen Prochaska, Kayla Duskin, Zarine Kharazian, Carly Minow, Stephanie Blucker, Sylvie Venuto, Jevin D. West, and Kate Starbird. 2023. Mobilizing manufactured reality: How participatory disinformation shaped deep stories to catalyze action during the 2020 US presidential election. Proceedings of the ACM on Human-Computer Interaction 7, CSCW1 (2023), 1–39.
• Proferes et al. (2021) Nicholas Proferes, Naiyan Jones, Sarah Gilbert, Casey Fiesler, and Michael Zimmer. 2021. Studying Reddit: A systematic overview of disciplines, approaches, methods, and ethics. Social Media + Society 7, 2 (2021), 20563051211019004.
• Starbird et al. (2019) Kate Starbird, Ahmer Arif, and Tom Wilson. 2019. Disinformation as collaborative work: Surfacing the participatory nature of strategic information operations. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–26.
• u/PeerRevue ([n. d.]) u/PeerRevue. [n. d.]. r/reddit4researchers. https://www.reddit.com/r/reddit4researchers/?rdt=35953
• Wagner (2023) Michael W. Wagner. 2023. Independence by permission. Science 381, 6656 (2023), 388–391.
• Wang and Ringland (2023) Yihe Wang and Kathryn E. Ringland. 2023. Weaving Autistic Voices on TikTok: Utilizing Co-Hashtag Networks for Netnography. In Companion Publication of the 2023 Conference on Computer Supported Cooperative Work and Social Computing. 254–258.
• Wu and Taneja (2021) Angela Xiao Wu and Harsh Taneja. 2021. Platform enclosure of human behavior and its measurement: Using behavioral trace data against platform episteme. New Media & Society 23, 9 (2021), 2650–2667.

Appendix A Participant Information

Pseudonym | Pronoun | Affiliation (a) | Region (b) | Research Field(s)
Andy | he/him | AR | E.U. | Media Studies
Bob | he/him | AR | U.S. | Information Sciences, Linguistics, Political Science, Sociology
Brian | he/him | AR | U.S. | Information Sciences, Psychology
Emily | she/her | AR | E.U. | Humanities, Information Sciences, Linguistics, Political Science, Sociology
Erkki | he/him | AR | E.U. | Communication, Computer Science, Information Sciences
Fabio | he/him | AR | E.U. | Communication, Political Science
Frank | he/him | AR | E.U. | Media and Communication
Ista | she/her | AR | E.U. | Computer Science, Sociology
Kate | she/her | AR | E.U. | Computer Science, Information Sciences, Psychology
Kay | she/her | AR | U.S. | Communication
Max | he/him | AR | U.S. | Computer Science
Peter | he/him | AR | E.U. | New Media Studies
Sarah | she/her | AR | E.U. | Communication, Humanities, Sociology
Thirteen | he/him | AR | U.S. | Computer Science, Information Sciences
Bastien | he/him | NR (1) | E.U. | Information Sciences
Devin | he/him | NR (2) | E.U. | Communication, Psychology
Green Wave | she/her | NR (1) | L.A. | Political Science
Philipp | he/him | NR | E.U. | Computational Social Science
Wellstone | Dr. | NR (3) | U.S. | Business, Communication, Computer Science, Humanities, Information Sciences, Linguistics, Political Science

(a) AR = Academic Researcher; NR = Non-academic Researcher
(b) E.U. = European Union; U.S. = United States; L.A. = Latin America
(1) Civil-society organization
(2) Public-funded research institute
(3) For-profit company

In Table 3, we provide detailed information about the participants in our study.
