Back from Software Heritage meeting

Missing context about Software Heritage? Feel free to take a look at my personal questions and answers from a session held last year.

What a pleasant meeting with Software Heritage enthusiasts! So refreshing to team up with people from very diverse backgrounds and to collaborate all together, mixing ideas, feedback, and experience.

Many thanks to the Software Heritage team for organizing and to Mélissa from La Dérivation for facilitating. The event was very fruitful for me. Thanks!

How can we collectively define what the Software Heritage community needs to do and to be in the near future? First, think about ideas for each, then discuss them one-to-one with your nearest neighbor. Time's up: each pair must summarize its discussion in just one idea for “to do” and one for “to be”. Now form groups of four, and each group tries to merge its two ideas for each item. Time's up: the result must again boil down to only one idea for “to do” and one for “to be”. Now form groups of eight and repeat. Then group again and repeat. This merging strategy leads to a consensus, with improvements or better wording along the way. The last step of the process reads: we would like the Software Heritage community,

to be more inclusive to other skill diverse communities beyond computer science;
to be globally recognized as the reference for managing the software life cycle in all fields.

and,

to create special interest groups to get together with fixed goals and timeline;
to continuously increase outreach and usability, open to the world.

Great, isn't it? These look like excellent outcomes to me¹!

Well, time was part of the constraints. For example, with a bit more time, I would have suggested another wording for “beyond computer science”, since I do not identify myself as part of a computer science community; maybe my proposal would have been “beyond communities focused on computing”. Anyway, the interesting outcome is the collective discussion, not a perfect final wording.

Another piece of personal feedback. My initial “idea” was about being consistent with the Software Heritage community's own recommendations: how many scientists in the room have archived the source code of their last publication? And cited its SWHID? That “idea” got lost in translation in the merging branch I belonged to but, and that's what I find very interesting, it somehow appears in the other branch. The outcome is thus a consensus, no?

Then other working groups, using other strategies, led to other discussions. For instance: listing actions for editorial offices, writing a text to convince a board to archive software and use SWHIDs, developing different use cases for each kind of SWHID, etc.
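For readers who have never seen a SWHID, here is a minimal sketch of how the simplest kind, the identifier of a single file content (swh:1:cnt:…), can be computed locally. It assumes the content identifier uses the same hash as a Git blob; the function name swhid_for_content is made up for this illustration, and the other kinds of SWHID (directory, revision, release, snapshot) are computed differently and are not covered here.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return the SWHID of a file content (a "cnt" object).

    Hypothetical helper for illustration, not an official API.
    """
    # Same scheme as a Git blob: SHA-1 over a small header
    # ("blob <length>\0") followed by the raw bytes of the file.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The empty file has a well-known Git blob hash.
    print(swhid_for_content(b""))
    # prints swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier depends only on the bytes of the file, anyone can recompute it and check that the archived content is the one cited in a paper.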

All in all, I am very happy to have attended such a fruitful event! Thanks again for organizing. And last but not least, it was also a nice opportunity to meet in person the folks with whom we collaborate online and to just catch up on fresh news of life™.

Join the fun, join Guix for scientific research! Join SWH!

Edit: The next day, I attended the Software Heritage Symposium and Summit hosted at UNESCO. Thanks again to the Software Heritage team for the organization; awesome event! I am very grateful to Roberto Di Cosmo. Fun fact: Stefano Zacchiroli and I were sitting one seat apart, and we met in person for the first² time there, while we are currently exchanging emails.

The afternoon was intense! Various panels with speakers from diverse backgrounds. It is always interesting to attend events that bridge institutional, longer-term speeches and day-to-day challenges.

Let me point out two scientific challenges: graph compression and Large Language Models (LLMs). The graph compression presentation by Sebastiano Vigna echoed the one given last year by Paolo Ferragina (see slides).

About LLMs, the idea is to use the Software Heritage archive to train machine learning models that can automatically generate code to assist with software development tasks. Obviously, this raises legal and ethical questions. Hence the Software Heritage Statement on Large Language Models for Code: « We [Software Heritage] feel that the question is no longer whether LLMs for code should be built. They are already being built, independently of what we do, and there is no turning back. The real question is how they should be built and whom they should benefit.

In alignment with our mission, we believe that LLMs for code should be built in a transparent and respectful way, to the benefit of all. We hence state the following principles for acceptable machine learning use of the Software Heritage archive. »

These three principles seem to me well worth reading… and sharing!

Footnotes:

¹ Nothing is written in stone though.

² Not exactly the first time: we already had two quick, forgettable chats, one at Capitole du Libre in Toulouse in 2010 or 2011 and another at FOSDEM in 2013 or 2019. Time flies!