dotCMS diagnosis & cure for broken identifiers, orphaned inodes and versionless links

You ever see an error like this in your dotcms.log?

[14/11/14 08:52:54:570 MST] ERROR action.ViewLinksAction: Exception e =User com.liferay.portal.model.User@d523aa66 does not have permission to read null 
com.dotmarketing.exception.DotSecurityException: User com.liferay.portal.model.User@d523aa66 does not have permission to read null

You might have broken identifiers, orphaned inodes and/or links w/o versions. There are a few other reasons for this happening, but for the sake of this article, we're going to demonstrate how you'd definitively determine that you indeed have broken identifiers, orphaned inodes or versionless links and how to fix them.

Broken identifiers and orphaned inodes are the symptom of basically the same problem: stuff that's supposed to be related in your dotCMS database isn't.

A word about identifiers and inodes. Every dotCMS piece of info boils down to an identifier. With dotCMS's versioning, one identifier can have unlimited versions. Inodes describe version numbers. So identifier:inodes relationship is one-to-many relationship.

If your inodes and identifiers are messed up, dotCMS won't run in an expected manner, those are pretty important little info bits there.

You're going to need access via a database client to diagnose and fix. The SQL here is of MySQL flavor but should easily translate in to any of the other DBs, if you get stuck w/ translation, hit me up in comments, I'm happy to help.

Before we lay in any of these fixes, you are forbidden from proceeding without a database backup.

For MySQL, dumping requires that we explicitly include stored procedures and triggers in the dump (backup) file. Without exception, I find myself regularly dumping the database in all of the dotCMS installations I work, here's a Bash shell script I keep handy to automate my database dumping needs.


Identification

How do you narrow down that you do or don't have broken identifiers, orphaned inodes and/or versionless links?

Versionless Links

Let's say for example your Links portlet (Admin > Website Browser > Links) is misbehaving, perhaps not loading at all and you're seeing the error above. Let's investigate possible versionless “links” data by running this:

SELECT COUNT(identifier) FROM links WHERE identifier NOT IN (SELECT identifier FROM link_version_info);

Here we're asking the db for a count of links records that aren't associated with link_version_info records. A number < 0 indicates we have some cleanup to do. I cover that in the "fix" section which follows.

Orphaned Inodes

SELECT * FROM inode WHERE type = 'links' AND inode NOT IN (SELECT inode FROM links);

We're asking the database, "From your links data, do we have any inode records that don't match up?"

An affirmative response would be one or more returns, if you don't get any, that's good, your links data is sound. If you do get one or more returns however, we have work to do (see the "fix" section to follow).

Broken Identifiers

SELECT COUNT(*) FROM identifier WHERE asset_type = 'links' AND id NOT IN (SELECT identifier FROM links);

In db speak: "From your links data, are you lacking any links records that we have logged in the identifier table?" Just like above, one or more returns means we have a problem (fix coming).

Another broken identifier check here:

SELECT COUNT(*) FROM identifier WHERE asset_type NOT LIKE 'links' AND id IN (SELECT identifier FROM links);

We're asking the db: "please count up 'non-link' identifiers that live in the links table".

And one last broken identifiers check, here we check the parent_path field:

SELECT COUNT(*) FROM identifier WHERE parent_path LIKE '%http%' AND id IN (SELECT identifier FROM links);

Again, like the checks above, any return > 0 indicates that we have work to do.

Data Fixes

You backed up your db, right? No moving forward without one.

Orphaned Inodes

DELETE FROM inode WHERE type = 'links' AND inode NOT IN (SELECT inode FROM links);

Broken Identifiers

  DELETE FROM identifier WHERE asset_type = 'links' AND id NOT IN (SELECT identifier FROM links); 
UPDATE identifier set asset_type = 'links' WHERE asset_type not like 'links' AND id IN (SELECT identifier FROM links);

Ensure data consistency

UPDATE identifier SET asset_type = 'file_asset' WHERE id IN (SELECT identifier FROM file_asset) AND asset_type NOT LIKE 'file_asset';

Parent Path Fix

If your parent path check above gave us a count > 0, we're going to need to fix with the following read:

SELECT * FROM identifier WHERE parent_path LIKE '%http%' AND id IN (SELECT identifier FROM links);

If you end up seeing parent paths that follow the pattern ([directory][fqdn/path],(FQDN = Fully Qualified Domain Name)) like in this example, we need this data to write a fix. With that data you just pulled, we'll need the identifier.id, identifier.parent_path and a quick + dirty find/replace in your editor of choice to remove that FQDN, like the following:

update identifier set parent_path = '/offices/studentaccounts/' where id = '1b7be8b4-2f3d-4815-a8f0-d0607c74a79e';


And you'll want to write an update statement for each record, like this.

One last versionless link check

Let's run one last check for links w/o version info, like this.

SELECT identifier, inode, title, mod_date FROM links WHERE identifier NOT IN (SELECT identifier FROM link_version_info) ORDER BY identifier ASC, mod_date DESC;

Returns indicate bad data. Consider this return. Note that the identifier is the same in each record. That means that we have one record to update. The fix would work like this:

INSERT INTO link_version_info(identifier, working_inode, live_inode, deleted, locked_by, locked_on, version_ts) VALUES('2ead13e4-5421-4962-bf7c-b66881d305c8', '1562e242-f6cc-4521-a6f5-9ae83d45e0e7', '1562e242-f6cc-4521-a6f5-9ae83d45e0e7', 0, null, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);

We're telling the link_version_info table what the working/live version is for this record.

Conclusion

It's very possible that if you're looking at this, you have some corrupt or misbehaving data. It's probable too that it's messed up in a different way than described here. If this didn't get you pointed in the right direction, you still have questions or help in any way ironing out your dotCMS data issue, please either hit us up directly or leave us a note in comments.

Say Hello

Near the Cleveland, Akron or Medina area and want to stop by our office? Let us know and we'll get the coffee and whiteboards ready. :)