In December 2023, I had to shut down my blog because it was hacked. Now I’m trying to revive it by importing the data from my old blog into a new one that runs different software. Since I still have the database with all the content, I have high hopes that I can do that.
First, I needed to export the data into a form that my new blog can understand, and XML is one of the supported formats. I thought it would be fun to write a Python script, but I don’t know Python well enough to interface with a database, and that posed a problem. I decided to have an LLM generate something for me, and I was very happy with Google’s Gemini and the script it produced. I needed to customize it, but for the most part, everything worked great.
It was really interesting to use an LLM to build a framework to start from. My Python isn’t great: I have to look up syntax for stuff (how do you slice an array again? What is the form of the ternary operator again?) and I certainly don’t know what libraries to use to access a database. Interestingly, Gemini chose mysql.connector. Later, I found that there are at least three reasonable choices for database access, and that one was actually one of the harder ones to install on Debian: it wasn’t available in the default repositories, so I needed to download a .deb package from the internet. It worked fine for what I wanted, and being a pure Python implementation meant that I didn’t have to worry about dependencies, so that’s fine.
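For anyone curious what a dump like this looks like, here is a minimal sketch, not the script Gemini actually generated. The `posts` table, its column names, and the XML element names are all placeholders I made up for illustration; a real schema and the format the importer expects will differ.

```python
import xml.etree.ElementTree as ET


def fetch_posts(host, user, password, database):
    """Read all posts from a hypothetical `posts` table as a list of dicts."""
    # Imported here so the XML helper below works without the driver installed.
    import mysql.connector

    conn = mysql.connector.connect(
        host=host, user=user, password=password, database=database
    )
    try:
        cursor = conn.cursor(dictionary=True)  # rows come back as dicts
        # Table and column names are placeholders, not the real schema.
        cursor.execute("SELECT id, title, body, published FROM posts ORDER BY id")
        return cursor.fetchall()
    finally:
        conn.close()


def posts_to_xml(rows):
    """Build an XML string with one <post> element per row."""
    root = ET.Element("posts")
    for row in rows:
        post = ET.SubElement(root, "post", id=str(row["id"]))
        for field in ("title", "body", "published"):
            child = ET.SubElement(post, field)
            value = row.get(field)
            # ElementTree escapes &, < and > in text when serializing.
            child.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")
```

Calling `posts_to_xml(fetch_posts(...))` with real connection details and writing the result to a file would give the dump. Keeping the database read and the XML building in separate functions means the XML part can be tried out on a hand-made list of dicts first.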
The LLM made a script, not a program, so I had to add a main function and move the main work into its own function, but it is still a really hacky script. Once I got the database dump (to XML) working, I tried to use an extension to import the data, but I never got the extension to work. I really want the post IDs to remain the same between my old blog and the new one so I can use a simple URL redirect to make the old URLs still work. So I thought I would insert the data directly into the database tables, but that proved to be hard.
I found a blog post about using WordPress’s REST API to insert posts, so I used that. Not bad! There were lots of annoying things to take care of before things were working well. The biggest problem was that the post IDs were about 4 out of sync by the end of the 600+ posts. It turned out my old blog had 4 posts missing, maybe deleted or something. So I needed to add dummy posts, but after that everything worked fine.
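The dummy-post trick can be sketched roughly like this. It is only an illustration under several assumptions: that the new site hands out post IDs one at a time in creation order (a real install may use up IDs for revisions, pages, or its default content), that authentication uses an Application Password over Basic auth, and that the dummy posts are saved as drafts. The function names and the `first_id` parameter are made up for the example.

```python
import base64
import json
import urllib.request


def plan_imports(old_posts, first_id=1):
    """Return payloads in ID order, with a draft placeholder for each missing ID."""
    if not old_posts:
        return []
    by_id = {post["id"]: post for post in old_posts}
    plan = []
    for post_id in range(first_id, max(by_id) + 1):
        post = by_id.get(post_id)
        if post is None:
            # Gap in the old IDs: create a dummy so later IDs stay aligned.
            plan.append(
                {"title": f"Placeholder {post_id}", "content": "", "status": "draft"}
            )
        else:
            plan.append(
                {"title": post["title"], "content": post["content"], "status": "publish"}
            )
    return plan


def create_post(site_url, username, app_password, payload):
    """POST one payload to the WordPress posts endpoint and return the new ID."""
    token = base64.b64encode(f"{username}:{app_password}".encode()).decode()
    request = urllib.request.Request(
        f"{site_url}/wp-json/wp/v2/posts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]
```

Looping over `plan_imports(...)` and comparing the ID returned by `create_post` with the expected old ID after each call would catch any drift right away, instead of 600 posts later.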
Now I face the problem that <script> tags in the raw HTML of my posts don’t work. I need to figure out what WordPress is doing to them – I use scripts from Flickr for images and from Twitter to embed tweets, and I want both to work before putting my blog back in service. I will also need to look at the CSS and the theme, but I’m less concerned about that if I can get the images with captions and the Twitter embeds working.
Anyway, looks like things are working reasonably well. I had to give up on hosting wp-content in a separate directory, but that is fine. Mod_rewrite is a real beast. There is still a lot of work to do – old posts have different CSS so I should try to see if I can do something for that, and I haven’t played with the theme at all. Still, great to have the blog back and a real history (of my own) preserved.


