
Site Updates AI Considerations

We've been considering for a couple of years whether to allow AI LLMs to scrape our information.

Without naming them, some MC companies have already tried to train models against our DB without consent - one attempt seems to have been what we previously interpreted as an attempted DDoS attack.

On one hand, we want to make Medical Cannabis information as widely accessible as is possible without barriers. On the other hand, companies are increasingly trying to automatically scrape/interpret our daily updates/data for commercial purposes - when we remain a struggling non-profit without commercial backing, and without the proper bandwidth to even facilitate such constant daily requests.

For a long time now we've been considering training an LLM on our own dataset, whereby we could do a far better job overall - and it wouldn't be too hard for us to programmatically feed AI a complete summary of both our main database content and all forum posts, future reviews etc. Even transcriptions of YouTube reviews and comments could be inputted.

The monumental problem with us doing this is sheer computational cost, both in terms of AI tokens and API access tokens for third-party platforms. Given the increasingly costly dedicated hardware required, we would probably be looking at a few thousand a month in costs (even self-hosting), even with AI summaries/queries cached periodically.
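To illustrate the caching idea mentioned above, here is a minimal sketch of a time-limited cache sitting in front of an expensive summary call. Everything in it is an assumption for illustration only: `SummaryCache`, the `generate` callable standing in for whatever model or API would produce a summary, and the TTL value are hypothetical and not part of any existing MedBud code.

```python
import time
from typing import Callable, Dict, Tuple


class SummaryCache:
    """Keep generated summaries for a fixed time so repeat queries cost nothing."""

    def __init__(self, generate: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._generate = generate   # hypothetical expensive model/API call
        self._ttl = ttl_seconds     # how long a cached summary stays fresh
        self._clock = clock         # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.misses = 0             # how many times the model was really called

    def get(self, query: str) -> str:
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]         # still fresh: no model call, no token cost
        self.misses += 1
        summary = self._generate(query)
        self._entries[query] = (now, summary)
        return summary
```

With a daily TTL, a summary requested a thousand times in a day would be generated once, so the token bill would scale with the number of distinct queries rather than with traffic.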

Posting this long list of thoughts to garner further public feedback from patients and the industry, because we've been unsure how to properly handle the situation for a long time now.

We are otherwise going to post a big announcement on trying to fix our funding at some point soon.
 
We've been considering for a couple of years whether to allow AI LLMs to scrape our information: Why allow such a resource from a non-profit to become a profit-making tool for someone else?
When we remain a struggling non-profit without commercial backing, and without the proper bandwidth to even facilitate such constant daily requests: Unsustainable. You've got to ban the bots by following the latest defences. It's a bit boring as a constant battle, but everyone with useful data is in the same boat. The AI.Robots project is hopeful, as is the 2-factor challenge.
For a long time now we've been considering training an LLM on our own dataset: Why? What benefits will this give, and to whom? Big decision. The cost of paying for tokens to use the initial AIs may change over this year; I foresee free AI plugins for such things as only a matter of time now.

Couple of thoughts: I only recently realised our forum is visible to the whole net. I assumed we were private; now I see we have private areas. As we all appear to use aliases, I don't mind my forum data being used publicly. What data are the bots trying to access: both the main DB and the forum DB?
 
Re funding - with MedBud.wiki you have so much here already and so much potential for growth - be sure to shop around - find others. It all depends upon your personal motivations too, of course, and where you see yourself and x.MedBud.wiki in the future. Gain as much input or mentoring as you can during this critical growth period.
 
For those who do not know what this is about technically, a simple explanation is: LLM bots and other AI scrapers take a site's content and make the data available for other uses, like running up your own site with another's data in minutes with AI, or using the data as a content feed on another site, even rewriting it so it appears not to be stolen. Whilst this data is being scraped, the database resources are heavily used, slowing the site for other users and in some cases resembling a DDoS attack that makes the site unusable. If the stolen data is useful to someone, it can also then be sold to them, even though that data came from another source.
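One common way sites keep heavy scraping from starving other users is per-client rate limiting. The following is only a rough sketch of a token-bucket limiter, not a description of what MedBud runs: `TokenBucketLimiter`, the rate and burst numbers, and the idea of keying on a client ID such as an IP address are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Allow short bursts per client, then throttle to a steady request rate."""

    def __init__(self, rate_per_second: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_second   # tokens added back per second
        self._burst = float(burst)     # maximum tokens a client can hold
        self._clock = clock            # injectable so the limiter can be tested
        # client_id -> (tokens remaining, time of last request)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(client_id, (self._burst, now))
        # Refill in proportion to the time elapsed, capped at the burst size.
        tokens = min(self._burst, tokens + (now - last) * self._rate)
        if tokens >= 1.0:
            self._buckets[client_id] = (tokens - 1.0, now)
            return True                # serve the request
        self._buckets[client_id] = (tokens, now)
        return False                   # over the limit: reject or delay
```

A normal visitor clicking through pages never notices a limit like this, while a scraper firing hundreds of requests a second from one address is cut off after its burst allowance. It does not help against scrapers that spread requests over many addresses.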

All sites are now suffering from this LLM bot situation. For anyone using WordPress, there is a good article on it here: https://www.dev4press.com/blog/tuto...llm-bots-from-scrapping-your-website-content/
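As a rough illustration of the simplest defence, a robots.txt file can ask named AI crawlers to stay away while leaving the site open to everyone else. The sketch below uses Python's standard `urllib.robotparser` to check how such a file would be read by a crawler that honours it. The rules, the `is_allowed` helper and the example.org URL are assumptions for illustration, not MedBud's actual configuration; GPTBot and CCBot are the crawler names used by OpenAI and Common Crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two known AI crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, is_allowed(agent, "https://example.org/strains"))
```

Bear in mind that robots.txt is only a request. Well-behaved crawlers follow it, but a scraper that ignores it has to be stopped some other way, such as rate limiting or a challenge page.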
 
@Muiredach Also consider the bandwidth and support you have here in the forum - you have some very experienced and talented peeps who you've helped massively who'd like to help you as much as they can :)
 
Am I right in assuming you can sell data even as a non-profit, if the money is used on servers etc. and does not translate to a profit over the year? If so, you could perhaps sell data to the clinics to raise funds for expanding the site.
 
Also re funding - I'm sure you will do your due diligence, but always take your own independent financial / legal advice, never an interested party's. Get a valuation, and my advice would be to try to avoid the "too big" investors if you can - I see MedBud as the reference point for price and stock and ultimately trust you to do the right things for you (and us fellow patients) and MedBud. I'm sure I'm not alone in saying this - I see your business as the most trustworthy in a really shady industry, and we can keep shining a light on bad actors to protect patient rights.
 
The future, sadly, is AI and data, and the future will be hackable; there's no getting away from it.
I understand this site is a data-collecting site for patient experiences, so I knew where its future lay, and there are mega bucks in that data.

In the USA, the data that was collected from growers, patients etc. was phenomenal and worth more to big companies than people had a clue about.
Better tone down my talk on here then, pmsl. Clinics already know who patients are on here just from orders alone, so no biggie.

Do what you have to do, and hope that the data can change the standards of the shit we are getting offered; if not, there's no point.

We'll see where we are in a year, but it does take a lot to run a site like this and I am grateful for its resources.
 
When I ask ChatGPT questions about medical cannabis, it will often reference MedBud.wiki. That sounds like a good thing for the site to me.

LLMs are the Google/Search Engine of the future, so being mad that LLMs want to scrape your site in order to reference you and potentially send traffic directly to your site is kinda like being mad at Google for putting you top of the listings for free.

I know it sucks to feel like you're giving up IP for free, but I think becoming a known source among LLMs for medical cannabis is probably worth the trade, as long as the sourcing is there.

Like you say, the only way to avoid it is to do a better job of it yourself, but I have to be honest: the pace at which things like the review system are being set up means that you risk the LLMs moving on to some other source of information instead. Then you quickly fall out of the "scene" among the LLMs, and you'll struggle to get that reputation back. I myself have ended up discussing information I've read on MedBud with ChatGPT, sometimes to do some personalised analysis that MedBud can't offer, but also sometimes just to do basic filtering that should be available on MedBud already but currently isn't (like filtering flower by availability at different clinics, or by terpenes or by lineage or... by any measure, really).

I think you have to capitalise on the fact that LLMs currently want to source you. I think you'll regret it if they forget about you instead, and I think competing with them at their own game is unrealistic at their current pace, given the site's current pace.
 