The NG project is modernizing arXiv


For arXiv to continue to provide reliable and rapid dissemination of research, our technical infrastructure must be modernized. As reliable as our classic system has been, its codebase and technologies have become antiquated and very difficult to maintain and extend. Over more than 20 years, it has evolved organically into a complicated monolith shaped by disparate and sometimes unclear visions. Continuing to extend such a system is not sustainable and would push it beyond its limits. Therefore, in 2017, we launched the arXiv Next Generation (arXiv NG) project.

What are the goals of arXiv NG?

  • Incrementally replace the classic system. Gradually phase out components of the monolithic, mostly Perl legacy codebase by replacing them with services written in Python.
  • Improve the scaling, failure tolerance, and availability of the infrastructure. Move to a cloud-based infrastructure.
  • Adopt open source practices to have higher engagement with external developers. Nearly all NG code is in public GitHub repositories under the MIT license.
  • Modernize user interfaces. Address critical accessibility issues and provide greater compatibility with mobile devices.
  • Streamline development workflows. Operate on shorter release cycles, enabling more (and more frequent) stakeholder feedback.

What are the main attributes of the architecture and its technologies?

  • Maintainability: Adopt mainstream, well-supported technologies, like Python, Docker and Kubernetes.
  • Evolvability: Move towards a modular, service-based architecture where “sticky” interdependencies are minimized.
  • Flexibility: Move away from a monolithic data architecture. Our data architecture should align with durability and evolvability goals, which vary across the platform.
  • Better support for complex moderation and administrative workflows: Provide support for quality assurance processes and better visibility and control over the submission, moderation, and announcement processes.
  • Better API support and partner integrations: Adopt modern standards for data serialization, authentication and authorization, as well as documentation. Increase overall throughput of APIs, and expose valuable backend services to trusted clients.

How do we seek input from users about features?

As we planned the arXiv NG project, we conducted a series of surveys of our users and moderators to get feedback about the issues and improvements our community wants to see. We continue to accept feature requests and bug reports via help@arxiv.org and our feedback collectors on arxiv.org, and we also seek input from all of our stakeholders through focus groups and direct outreach during our alpha/beta testing. This feedback continues to inform our planning and priority setting.

How do we set priorities?

There are a lot of things we want to accomplish to modernize our infrastructure and improve arXiv for our community, so how do we prioritize what will be accomplished, how, and when? Three general considerations govern how we set priorities:

  • Impact on core mission: How does the work affect our core mission of supporting rapid dissemination, high service reliability, durability of data, and accessibility?
  • Technical dependencies and technical debt: Implementing solutions in the legacy codebase, and possibly out of sequence with arXiv NG, means that some work must be performed twice.
  • Opportunity cost: Is timing important due to external factors, such as the need to maintain compliance with standards and regulations?

Final decisions on IT prioritization are made by the IT Lead, with direction from the Executive Director and Scientific Director, and with input from stakeholder groups.

Helpful links